-
Notifications
You must be signed in to change notification settings - Fork 150
Add Tasklet Classification #2280
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
phschaad
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some questions and minor corrections - after that will take closer look at remaining parts
Co-authored-by: Philipp Schaad <[email protected]>
| import typing | ||
|
|
||
|
|
||
| class TaskletType(Enum): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
str enum?
|
|
||
|
|
||
| def _token_split(string_to_check: str) -> Set[str]: | ||
| """ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not the correct (Sphinx) docstring format we use in the rest of dace (apply everywhere)
| def _token_split(string_to_check: str) -> Set[str]: | ||
| """ | ||
| Splits a string into a set of tokens, keeping delimiters, and returns all tokens. | ||
| The input string is split on empty space and brackets (` `, `(`, `)`, `[`, `]`). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
double backticks (``) for code comments
| if isinstance(node.op, ast.USub): | ||
| return f"-{node.operand.value}" | ||
| elif isinstance(node.op, ast.UAdd): | ||
| return str(node.operand.value) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not +value?
| return lhs_str, rhs_str | ||
|
|
||
|
|
||
| def _extract_non_connector_syms_from_tasklet(node: dace.nodes.Tasklet, state) -> typing.Set[str]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missing type hints and direct inclusion of typing.
Since dace is Python 3.10+, use set[str] instead of typing
| n_in = len(in_conns) | ||
| n_out = len(out_conns) | ||
|
|
||
| assert n_out <= 1, "Only support tasklets with at most 1 output in this pass" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same
| assert isinstance(node, dace.nodes.Tasklet) | ||
| code: CodeBlock = node.code | ||
| assert code.language == dace.dtypes.Language.Python | ||
| code_str: str = code.as_string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's inefficient that you already have an AST object and you convert it to a string and then reparse it. Explain your reasoning please.
| lhs_data = state.sdfg.arrays[lhs_data_name] | ||
|
|
||
| # Assignment operators it will return op <- `=` and always populate `rhs1` | ||
| if code_str == f"{lhs} = {rhs}" or code_str == f"{lhs} = {rhs};": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would not trust an expression like this (e.g., what about a = (b)?). Why not use isinstance(ast.Assign/AnnAssign)?
| info_dict.update({"type": ttype, "constant1": c1, "constant2": None, "op": op}) | ||
| return info_dict | ||
|
|
||
| raise NotImplementedError("Unhandled case in detect tasklet type") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this just return a cannot-classify-tasklet or "other" TaskletType result?
| return node | ||
|
|
||
|
|
||
| def rewrite_boolean_functions_to_boolean_ops(src: str) -> str: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
where is this used?
alexnick83
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very well written, but I feel that it is a bit overengineered considering the currently limited supported cases. Please find some comments and questions below, but I will probably need to have another look.
| Each pattern represents a specific combination of input types (arrays, scalars, symbols) | ||
| and operation types (assignment, binary operation, unary operation). | ||
|
|
||
| Note: inside a tasklet you always have scalars, it is about he connector types |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| Note: inside a tasklet you always have scalars, it is about he connector types | |
| Note: inside a tasklet you always have scalars, it is about the connector types |
| ARRAY_ARRAY_ASSIGNMENT: Direct array-to-array copy (e.g., a = b) | ||
| ARRAY_SYMBOL_ASSIGNMENT: Symbol/constant assignment to array (e.g., a = sym) | ||
| ARRAY_SCALAR_ASSIGNMENT: Scalar variable assignment to array (e.g., a = scl) | ||
| SCALAR_ARRAY_ASSIGNMENT: Array assignment to scalar variable (e.g., scl = a) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a valid case? Even when, e.g., you assign A[i] to a scalar b, isn't the source connector a scalar?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see now, way below (line 600+), that you are looking at the data type, not the connector type. Maybe this should be clarified in the note above (line 26).
| ---------- | ||
| string_to_check : str | ||
| The string to split into tokens. | ||
| pattern_str : str |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But the parameter is not actually kept in the signature
| # Split while keeping delimiters | ||
| tokens = re.split(r'(\s+|[()\[\]])', string_to_check) | ||
|
|
||
| # Replace tokens that exactly match src |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment doesn't make sense (copied from token_match?)
| if isinstance(node, ast.Constant): | ||
| return str(node.value) | ||
| elif isinstance(node, ast.UnaryOp) and isinstance(node.operand, ast.Constant): | ||
| if isinstance(node.op, ast.USub): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need the bitwise inversion? In that case, I would evaluate the result and convert it to a string.
|
|
||
| found = op | ||
|
|
||
| code_rhs = src.split(" = ")[-1].strip() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't it be better to convert to AST, find the assignment, and then take the value?
| """ | ||
| tdict = dict() | ||
| for ie in state.in_edges(tasklet): | ||
| if ie.data is not None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe worth testing what happens when the input comes from another tasklet ... we used to have some "hidden" scalar descriptors for that, but I am not up to date on that.
| For function calls, uses AST parsing to extract arguments in order. | ||
| For operators, splits the code by the operator symbol. | ||
| """ | ||
| code_rhs = code_str.split(" = ")[-1].strip() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above.
| n_out = len(out_conns) | ||
|
|
||
| assert n_out <= 1, "Only support tasklets with at most 1 output in this pass" | ||
| lhs = next(iter(node.out_connectors.keys())) if n_out == 1 else None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nitpicky, but why not just lhs = out_conns[0] if n_out == 1 else None?
|
|
||
| assert n_out == 1 | ||
|
|
||
| if n_in == 1: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given the many restrictions on supported tasklets, the following feels somewhat too long. If I understand correctly, you accept codes with a single expression (assignment). The RHS may be either an operand or a unary/binary operator. The operands can only be connector names, symbols, or other expressions that evaluate to a constant. You also have a special case where the binary operation is applied to the same connector and interpreted as a unary op. Are you going to support more generalized tasklets in the future?
The analysis returns a dictionary containing the left- and right-hand-side operands of a given tasklet. It returns the operator (or function), the constants involved, and whether they are data (array or scalar) or constants (hardcoded constants or symbols), etc.
I have implemented the analysis for the auto-vectorization to determine whether to use vector-vector intrinsics or vector-scalar intrinsics, etc. It only accepts tasklets with at most 2 rhs operands. SplitTasklets that I have merged before can be used to ensure that all supported operators and a subset of functions (such as min, sin, cos, etc.) are used. Python tasklets have 2 rhs operands.
The analysis was also useful for a student trying to implement blocked-FP numbers in DaCe, so I think it will have further use beyond auto-vectorization.