
Conversation

@fivetran-kwoodbeck (Collaborator) commented Dec 17, 2025

DuckDB supports left and right shift (<< and >>) up to INT128, but by default it assumes INT32, so explicit casts are needed to keep it from throwing errors in various situations. See the Jira tickets for the full testing.

Snowflake's documentation claims the functions always return an INT128 (for INT input). In practice, the return type scales with the result and ranges from NUMBER(3,0) to NUMBER(38,0) depending on the value returned. Snowflake also supports BINARY input.

Snowflake does not throw errors when you shift past the max int; DuckDB does.

Documentation:
https://docs.snowflake.com/en/sql-reference/functions/bitshiftleft
https://docs.snowflake.com/en/sql-reference/functions/bitshiftright
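
A minimal sketch of exercising the transpilation this PR adds (the example queries are illustrative; see the Jira tickets for the full test matrix run against both warehouses):

# Sketch, assuming this branch of sqlglot is installed. The generated SQL depends
# on the casting logic added here, so it is printed rather than asserted.
import sqlglot

for sql in (
    "SELECT BITSHIFTLEFT(1, 40)",     # illustrative: shifts past the INT32 range
    "SELECT BITSHIFTRIGHT(1024, 2)",  # illustrative
):
    print(sqlglot.transpile(sql, read="snowflake", write="duckdb")[0])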

github-actions bot (Contributor) commented Dec 17, 2025

SQLGlot Integration Test Results

Comparing:

  • this branch (sqlglot:feature/transpile-bitshift, sqlglot version: feature/transpile-bitshift)
  • baseline (main, sqlglot version: 28.5.1.dev77)

⚠️ Limited to dialects: snowflake, duckdb

By Dialect

| dialect | main | sqlglot:feature/transpile-bitshift | difference | links |
| --- | --- | --- | --- | --- |
| duckdb -> duckdb | 4003/4003 passed (100.0%) | 4003/4003 passed (100.0%) | No change | full result / delta |
| snowflake -> duckdb | 579/847 passed (68.4%) | 579/847 passed (68.4%) | No change | full result / delta |
| snowflake -> snowflake | 847/847 passed (100.0%) | 847/847 passed (100.0%) | No change | full result / delta |

Overall

main: 5697 total, 5429 passed (pass rate: 95.3%), sqlglot version: 28.5.1.dev77

sqlglot:feature/transpile-bitshift: 5697 total, 5429 passed (pass rate: 95.3%), sqlglot version: feature/transpile-bitshift

Difference: No change

@georgesittas (Collaborator) left a comment

This needs testing, but I'm skeptical about the approach. May need to sync in Slack about this.

@fivetran-kwoodbeck (Collaborator, Author)

> This needs testing, but I'm skeptical about the approach. May need to sync in Slack about this.

@georgesittas Please see the Jira ticket for all tests that were run prior to the first push. There's about 300 test queries against Snowflake, transpiled, run against DuckDB and results checked against one another.

@georgesittas (Collaborator)

We need to add tests.

@fivetran-kwoodbeck force-pushed the feature/transpile-bitshift branch from 9f96195 to 21052b5 on December 19, 2025 21:30
@fivetran-kwoodbeck force-pushed the feature/transpile-bitshift branch from 21052b5 to 7c4b5e3 on December 22, 2025 21:42
@georgesittas force-pushed the feature/transpile-bitshift branch from 83f7591 to 526cd22 on December 24, 2025 09:09
@georgesittas (Collaborator)

@fivetran-kwoodbeck @VaggelisD take a look at the refactored PR when you get the chance; I took a stab at cleaning it up a bit.

@VaggelisD (Collaborator) left a comment

Only got one question. @fivetran-kwoodbeck, if the refactored logic & tests check out, feel free to merge.

@fivetran-kwoodbeck (Collaborator, Author) commented Dec 24, 2025

Hi @VaggelisD, the tests don't pass and I'm not sure of the best way to deal with it (a lot of back and forth). There are two issues, both of which can be seen here:

SELECT BITSHIFTLEFT(X'0080'::BINARY, 1);
SELECT BITSHIFTRIGHT(BITSHIFTLEFT(X'0080'::BINARY, 1), 1);

The first is an INT-to-BLOB cast issue; the first query transpiles to:

SELECT CAST(CAST(CAST(128 AS BLOB) AS BIT) << 1 AS BLOB);

Conversion Error: Unimplemented type for cast (INTEGER -> BLOB)
LINE 1: SELECT CAST(CAST(CAST(128 AS BLOB) AS BIT) << 1 AS BLOB)

I think it needs an UNHEX and a cast to BIT (see below). The second query, once the hex issue is fixed, hits another problem: the first parameter is not seen as BINARY, so it is assumed to be an INT, which is wrong:

SELECT CAST(CAST(CAST(UNHEX('0080') AS BIT) << 1 AS BLOB) AS INT128) >> 1

Conversion Error: Unimplemented type for cast (BLOB -> HUGEINT)
LINE 1: SELECT CAST(CAST(CAST(UNHEX('0080') AS BIT) << 1 AS BLOB) AS INT128... ^

I think this can be solved with an extra annotate_types call on the first parameter, but I'm not entirely sure.
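
For reference, a hedged reproduction sketch of the two errors using the duckdb Python package (assumed available); the statements are the transpiled outputs quoted above:

# Both transpiled statements above are expected to raise DuckDB conversion errors
# ("Unimplemented type for cast").
import duckdb

for sql in (
    "SELECT CAST(CAST(CAST(128 AS BLOB) AS BIT) << 1 AS BLOB)",
    "SELECT CAST(CAST(CAST(UNHEX('0080') AS BIT) << 1 AS BLOB) AS INT128) >> 1",
):
    try:
        duckdb.sql(sql).fetchall()
    except duckdb.Error as error:
        print(error)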

@fivetran-kwoodbeck force-pushed the feature/transpile-bitshift branch from 526cd22 to bc2170a on December 29, 2025 23:05
@georgesittas (Collaborator) left a comment

A few more comments, but should be good to go soon.

Comment on lines +767 to +791
# Ensure type annotation is available for nested expressions
if not this.type:
    from sqlglot.optimizer.annotate_types import annotate_types

    this = annotate_types(this, dialect=self.dialect)
Collaborator

We don't have to annotate here, right? The assumption is that we've already annotated w/ Snowflake at this point and so the type should be available. If not, then we don't have to do anything, since DuckDB will not annotate w/ binary and the branch in L774 should not be taken.

@fivetran-kwoodbeck (Collaborator, Author) commented Jan 7, 2026

It was added to help with nested calls. Without it, this in Snowflake:

SELECT BITSHIFTRIGHT(BITSHIFTLEFT(X'0080'::BINARY, 1), 1)

was transpiling to this in DuckDB (which doesn't run):

SELECT CAST(CAST(CAST(CAST(UNHEX('0080') AS BLOB) AS BIT) << 1 AS BLOB) AS INT128) >> 1

as opposed to:

SELECT CAST(CAST(CAST(CAST(CAST(UNHEX('0080') AS BLOB) AS BIT) << 1 AS BLOB) AS BIT) >> 1 AS BLOB)

Collaborator

Did you do annotate_types(ast, dialect="snowflake").sql("duckdb")? This is how we expect the flow to look.

Collaborator Author

When testing, I'm calling transpile(snowflake_sql, read="snowflake", write="duckdb")

Collaborator

^ We can operate under the assumption that the ast being converted into duckdb will be annotated already, meaning that you can 1) parse w/ snowflake, 2) annotate w/ snowflake, and 3) generate to duckdb.

Using only the transpile API is not enough in some cases, hence the above.
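
A minimal sketch of that flow, using the nested BINARY example from this thread:

# 1) parse with Snowflake, 2) annotate with Snowflake, 3) generate DuckDB SQL.
import sqlglot
from sqlglot.optimizer.annotate_types import annotate_types

ast = sqlglot.parse_one(
    "SELECT BITSHIFTRIGHT(BITSHIFTLEFT(X'0080'::BINARY, 1), 1)",
    read="snowflake",
)
annotated = annotate_types(ast, dialect="snowflake")
print(annotated.sql(dialect="duckdb"))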

Collaborator

So I suggest removing this branch and adapting the testing logic so that it calls annotate_types first and then generates duckdb sql; there are some other examples of this in the codebase that you can mimic.

Collaborator Author

Ok, I'll update my testing interface. I do see a few examples of this = annotate_types(...) in this file, which is where I got the idea from.

Collaborator Author

Hey @georgesittas, it appears that validate_all doesn't annotate types? Without the inline annotate_types call, tests like this result in an invalid duckdb query:

        self.validate_all(
            "SELECT BITSHIFTLEFT(X'FF', 4)",
            write={
                "snowflake": "SELECT BITSHIFTLEFT(x'FF', 4)",
                "duckdb": "SELECT CAST(UNHEX('FF') AS INT128) << 4",
            },
        )

Another option is to have a computed type property on HexString:

class HexString(Condition):
    arg_types = {"this": True, "is_integer": False}

    @property
    def type(self) -> t.Optional[DataType]:
        if self._type:
            return self._type
        # Infer type from is_integer arg set at parse time
        if self.args.get("is_integer"):
            return DataType.build("BIGINT")
        return DataType.build("BINARY")

Collaborator

> Hey @georgesittas, it appears that validate_all doesn't annotate types

Hey, that's correct; this is intentional. This is one example of what I was referring to: https://github.com/tobymao/sqlglot/blob/94e6fe03752500ddeace317081dc758d36c78c1f/tests/dialects/test_bigquery.py#L785C21-L789.
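
For context, a sketch of what the annotate-first test shape could look like here, mirroring that bigquery example (the expected DuckDB string is the one from the validate_all snippet above, assuming that is what this branch generates):

# Annotate with the read dialect before generating DuckDB, instead of relying
# on validate_all. Expected string taken from the earlier validate_all example.
from sqlglot import parse_one
from sqlglot.optimizer.annotate_types import annotate_types

ast = annotate_types(parse_one("SELECT BITSHIFTLEFT(X'FF', 4)", read="snowflake"), dialect="snowflake")
assert ast.sql(dialect="duckdb") == "SELECT CAST(UNHEX('FF') AS INT128) << 4"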

Comment on lines 45 to 46
exp.HexString,
exp.Unhex,
Collaborator

Why are we preemptively annotating HexString as BINARY here? There is a flag HEX_STRING_IS_INTEGER_TYPE that determines the output type of HexString, so the base annotator should use that flag to determine the type, which means HexString needs a custom annotator.

In BigQuery and ClickHouse, the type is an integer, for example.
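
For illustration, a hedged sketch of the idea (the hook name and the way the dialect flag is reached are assumptions, not the actual sqlglot internals):

# Hypothetical custom annotator for HexString: pick the type from the dialect's
# HEX_STRING_IS_INTEGER_TYPE flag instead of hard-coding BINARY. The method name
# and the `self.dialect` access are illustrative assumptions; BIGINT stands in
# for "some integer type" here.
from sqlglot import exp


def _annotate_hex_string(self, expression: exp.HexString) -> exp.HexString:
    if self.dialect.HEX_STRING_IS_INTEGER_TYPE:  # e.g. BigQuery, ClickHouse
        return self._annotate_with_type(expression, exp.DataType.Type.BIGINT)
    return self._annotate_with_type(expression, exp.DataType.Type.BINARY)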

@fivetran-kwoodbeck (Collaborator, Author) commented Jan 7, 2026

Removed; does this fit the bill in terms of updating the base annotator?

@fivetran-kwoodbeck force-pushed the feature/transpile-bitshift branch from a90a7e7 to 4bd8cc5 on January 7, 2026 23:41
@fivetran-kwoodbeck force-pushed the feature/transpile-bitshift branch from 52ccc48 to 72741a9 on January 8, 2026 23:47