-
Notifications
You must be signed in to change notification settings - Fork 1k
feat(snowflake)!: support transpilation of BITSHIFTLEFT and BITSHIFTRIGHT #6586
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
SQLGlot Integration Test ResultsComparing:
By Dialect
Overallmain: 5697 total, 5429 passed (pass rate: 95.3%), sqlglot version: sqlglot:feature/transpile-bitshift: 5697 total, 5429 passed (pass rate: 95.3%), sqlglot version: Difference: No change |
georgesittas
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs testing, but I'm skeptical about the approach. May need to sync in Slack about this.
@georgesittas Please see the Jira ticket for all tests that were run prior to the first push. There's about 300 test queries against Snowflake, transpiled, run against DuckDB and results checked against one another. |
|
We need to add tests. |
9f96195 to
21052b5
Compare
21052b5 to
7c4b5e3
Compare
83f7591 to
526cd22
Compare
|
@fivetran-kwoodbeck @VaggelisD take a look at the refactored PR when you get the chance, took a stab at cleaning it up a bit. |
VaggelisD
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only got one question, @fivetran-kwoodbeck if the refactored logic & tests check out feel free to merge
|
Hi @VaggelisD The tests don't pass and I'm not sure what's the best way to deal with it (a lot of back and forth). There are 2 issues, both of which can be seen here: The first is I think it needs an Which I think can be solved with an extra |
526cd22 to
bc2170a
Compare
georgesittas
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few more comments, but should be good to go soon.
| # Ensure type annotation is available for nested expressions | ||
| if not this.type: | ||
| from sqlglot.optimizer.annotate_types import annotate_types | ||
|
|
||
| this = annotate_types(this, dialect=self.dialect) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't have to annotate here, right? The assumption is that we've already annotated w/ Snowflake at this point and so the type should be available. If not, then we don't have to do anything, since DuckDB will not annotate w/ binary and the branch in L774 should not be taken.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was added to help with nested calls. Without it, this in Snowflake:
SELECT BITSHIFTRIGHT(BITSHIFTLEFT(X'0080'::BINARY, 1), 1)
was transpiling to this in DuckDB (which doesn't run):
SELECT CAST(CAST(CAST(CAST(UNHEX('0080') AS BLOB) AS BIT) << 1 AS BLOB) AS INT128) >> 1
as opposed to:
SELECT CAST(CAST(CAST(CAST(CAST(UNHEX('0080') AS BLOB) AS BIT) << 1 AS BLOB) AS BIT) >> 1 AS BLOB)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you do annotate_types(ast, dialect="snowflake").sql("duckdb")? This is how we expect the flow to look like.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When testing, I'm calling transpile(snowflake_sql, read="snowflake", write="duckdb")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
^ We can operate under the assumption that the ast being converted into duckdb will be annotated already, meaning that you can 1) parse w/ snowflake 2) annotate w/ snowflake and 3) generate to duckdb.
Using only the transpile API is not enough in some cases, hence the above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I suggest removing this branch and adapting the testing logic so that it calls annotate types first and then generates duckdb sql, there are some other examples of this in the codebase that you can mimic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I'll update my testing interface. I do see a few examples of this = annotate_types(...) in this file, which is where I got the idea from.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @georgesittas, it appears that validate_all doesn't annotate types? Without the inline annotate_types call, tests like this result in an invalid duckdb query:
self.validate_all(
"SELECT BITSHIFTLEFT(X'FF', 4)",
write={
"snowflake": "SELECT BITSHIFTLEFT(x'FF', 4)",
"duckdb": "SELECT CAST(UNHEX('FF') AS INT128) << 4",
},
)
Another option is to have a computed type property on HexString:
class HexString(Condition):
arg_types = {"this": True, "is_integer": False}
@property
def type(self) -> t.Optional[DataType]:
if self._type:
return self._type
# Infer type from is_integer arg set at parse time
if self.args.get("is_integer"):
return DataType.build("BIGINT")
return DataType.build("BINARY")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @georgesittas, it appears that validate_all doesn't annotate types
Hey, that's correct– this is intentional. This is one example of what I was referring to: https://github.com/tobymao/sqlglot/blob/94e6fe03752500ddeace317081dc758d36c78c1f/tests/dialects/test_bigquery.py#L785C21-L789.
sqlglot/typing/__init__.py
Outdated
| exp.HexString, | ||
| exp.Unhex, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we preemptively annotating HexString as BINARY here? There is a flag HEX_STRING_IS_INTEGER_TYPE that determines the output type of HexString, so the base annotator should use that to determine the type and hence needs a custom annotator.
In BigQuery and ClickHouse, the type is an integer, for example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed, does this fit the bill in terms of updating the base annotator?
a90a7e7 to
4bd8cc5
Compare
…DB with INT128 casts and precedence fixes
52ccc48 to
72741a9
Compare
DuckDB has support for left and right shift (<< and >>) up to INT128, but by default it assumes INT32 and needs casting in order to prevent it from throwing errors in various situations. See the Jira tickets for full testing.
Snowflake claims it always returns an INT128 (for INT input). In practice, it scales based on result and ranges from NUMBER(3,0) to NUMBER(38,0) depending on what number is returned. Snowflake also supports BINARY input.
Snowflake does not throw errors when you shift over the max int, DuckDB does.
Documentation:
https://docs.snowflake.com/en/sql-reference/functions/bitshiftleft
https://docs.snowflake.com/en/sql-reference/functions/bitshiftright