Skip to content

Conversation

@ThrudPrimrose
Copy link
Collaborator

The analysis returns a dictionary containing the left- and right-hand-side operands of a given tasklet. It returns the operator (or function), the constants involved, and whether they are data (array or scalar) or constants (hardcoded constants or symbols), etc.

I have implemented the analysis for the auto-vectorization to determine whether to use vector-vector intrinsics or vector-scalar intrinsics, etc. It only accepts tasklets with at most 2 rhs operands. SplitTasklets that I have merged before can be used to ensure that all supported operators and a subset of functions (such as min, sin, cos, etc.) are used. Python tasklets have 2 rhs operands.

The analysis was also useful for a student trying to implement blocked-FP numbers in DaCe, so I think it will have further use beyond auto-vectorization.

Copy link
Collaborator

@phschaad phschaad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some questions and minor corrections - after that will take closer look at remaining parts

import typing


class TaskletType(Enum):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

str enum?



def _token_split(string_to_check: str) -> Set[str]:
"""
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not the correct (Sphinx) docstring format we use in the rest of dace (apply everywhere)

def _token_split(string_to_check: str) -> Set[str]:
"""
Splits a string into a set of tokens, keeping delimiters, and returns all tokens.
The input string is split on empty space and brackets (` `, `(`, `)`, `[`, `]`).
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

double backticks (``) for code comments

if isinstance(node.op, ast.USub):
return f"-{node.operand.value}"
elif isinstance(node.op, ast.UAdd):
return str(node.operand.value)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not +value?

return lhs_str, rhs_str


def _extract_non_connector_syms_from_tasklet(node: dace.nodes.Tasklet, state) -> typing.Set[str]:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing type hints and direct inclusion of typing.
Since dace is Python 3.10+, use set[str] instead of typing

n_in = len(in_conns)
n_out = len(out_conns)

assert n_out <= 1, "Only support tasklets with at most 1 output in this pass"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

assert isinstance(node, dace.nodes.Tasklet)
code: CodeBlock = node.code
assert code.language == dace.dtypes.Language.Python
code_str: str = code.as_string
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's inefficient that you already have an AST object and you convert it to a string and then reparse it. Explain your reasoning please.

lhs_data = state.sdfg.arrays[lhs_data_name]

# Assignment operators it will return op <- `=` and always populate `rhs1`
if code_str == f"{lhs} = {rhs}" or code_str == f"{lhs} = {rhs};":
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would not trust an expression like this (e.g., what about a = (b)?). Why not use isinstance(ast.Assign/AnnAssign)?

info_dict.update({"type": ttype, "constant1": c1, "constant2": None, "op": op})
return info_dict

raise NotImplementedError("Unhandled case in detect tasklet type")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this just return a cannot-classify-tasklet or "other" TaskletType result?

return node


def rewrite_boolean_functions_to_boolean_ops(src: str) -> str:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where is this used?

Copy link
Contributor

@alexnick83 alexnick83 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very well written, but I feel that it is a bit overengineered considering the currently limited supported cases. Please find some comments and questions below, but I will probably need to have another look.

Each pattern represents a specific combination of input types (arrays, scalars, symbols)
and operation types (assignment, binary operation, unary operation).

Note: inside a tasklet you always have scalars, it is about he connector types
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Note: inside a tasklet you always have scalars, it is about he connector types
Note: inside a tasklet you always have scalars, it is about the connector types

ARRAY_ARRAY_ASSIGNMENT: Direct array-to-array copy (e.g., a = b)
ARRAY_SYMBOL_ASSIGNMENT: Symbol/constant assignment to array (e.g., a = sym)
ARRAY_SCALAR_ASSIGNMENT: Scalar variable assignment to array (e.g., a = scl)
SCALAR_ARRAY_ASSIGNMENT: Array assignment to scalar variable (e.g., scl = a)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a valid case? Even when, e.g., you assign A[i] to a scalar b, isn't the source connector a scalar?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see now, way below (line 600+), that you are looking at the data type, not the connector type. Maybe this should be clarified in the note above (line 26).

----------
string_to_check : str
The string to split into tokens.
pattern_str : str
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But the parameter is not actually kept in the signature

# Split while keeping delimiters
tokens = re.split(r'(\s+|[()\[\]])', string_to_check)

# Replace tokens that exactly match src
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment doesn't make sense (copied from token_match?)

if isinstance(node, ast.Constant):
return str(node.value)
elif isinstance(node, ast.UnaryOp) and isinstance(node.operand, ast.Constant):
if isinstance(node.op, ast.USub):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need the bitwise inversion? In that case, I would evaluate the result and convert it to a string.


found = op

code_rhs = src.split(" = ")[-1].strip()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it be better to convert to AST, find the assignment, and then take the value?

"""
tdict = dict()
for ie in state.in_edges(tasklet):
if ie.data is not None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe worth testing what happens when the input comes from another tasklet ... we used to have some "hidden" scalar descriptors for that, but I am not up to date on that.

For function calls, uses AST parsing to extract arguments in order.
For operators, splits the code by the operator symbol.
"""
code_rhs = code_str.split(" = ")[-1].strip()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above.

n_out = len(out_conns)

assert n_out <= 1, "Only support tasklets with at most 1 output in this pass"
lhs = next(iter(node.out_connectors.keys())) if n_out == 1 else None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nitpicky, but why not just lhs = out_conns[0] if n_out == 1 else None?


assert n_out == 1

if n_in == 1:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the many restrictions on supported tasklets, the following feels somewhat too long. If I understand correctly, you accept codes with a single expression (assignment). The RHS may be either an operand or a unary/binary operator. The operands can only be connector names, symbols, or other expressions that evaluate to a constant. You also have a special case where the binary operation is applied to the same connector and interpreted as a unary op. Are you going to support more generalized tasklets in the future?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants