
Conversation

@vigneshsiva11

Which issue does this PR close?

Closes #3255

Rationale for this change

When reading Iceberg tables, timestamp columns may be dictionary-encoded in the underlying Parquet files.
In the current implementation of datediff, a scalar argument and an array argument can be converted into arrays of different lengths, which leads to a runtime error:

Cannot perform binary operation on arrays of different length

This behavior differs from Spark, which correctly broadcasts scalar inputs and handles dictionary-backed columns without error.
This change ensures Comet’s datediff implementation aligns with Spark semantics and avoids execution failures.

What changes are included in this PR?

  • Ensure both datediff arguments are converted into arrays of the same length by correctly broadcasting scalars
  • Normalize dictionary-backed inputs to Date32Array before applying the binary operation
  • Prevent array length mismatches during vectorized execution
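The broadcasting fix described above can be sketched with a simplified, std-only model. This is an illustration, not the actual Comet code: arrow's `ColumnarValue` and `Date32Array` are modeled here as a hypothetical `Column` enum over plain `Vec<Option<i32>>` (days since the Unix epoch), and the helper names are invented for the sketch.

```rust
// Simplified stand-in for arrow's ColumnarValue: a column is either a
// scalar or an array of Date32-style values (days since the Unix epoch).
#[derive(Clone)]
enum Column {
    Scalar(Option<i32>),
    Array(Vec<Option<i32>>),
}

impl Column {
    // Broadcast a scalar to `num_rows` elements; arrays must already
    // match the batch length, otherwise report the length mismatch.
    fn to_array_of_size(&self, num_rows: usize) -> Result<Vec<Option<i32>>, String> {
        match self {
            Column::Scalar(v) => Ok(vec![*v; num_rows]),
            Column::Array(a) if a.len() == num_rows => Ok(a.clone()),
            Column::Array(a) => Err(format!(
                "cannot perform binary operation on arrays of different length: {} vs {}",
                a.len(),
                num_rows
            )),
        }
    }
}

// datediff(end, start): elementwise `end - start` in days, after both
// inputs have been broadcast to the same batch length.
fn datediff(end: &Column, start: &Column, num_rows: usize) -> Result<Vec<Option<i32>>, String> {
    let end = end.to_array_of_size(num_rows)?;
    let start = start.to_array_of_size(num_rows)?;
    Ok(end
        .iter()
        .zip(start.iter())
        .map(|(e, s)| match (e, s) {
            (Some(e), Some(s)) => Some(e - s),
            _ => None,
        })
        .collect())
}
```

In the real implementation the same invariant is enforced with arrow kernels (casting dictionary-backed inputs to `Date32` and broadcasting scalars to the batch length) rather than with plain vectors.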

How are these changes tested?

  • Existing unit tests in datafusion-comet-spark-expr were run locally
  • Manual verification using Iceberg timestamp columns with to_date and datediff
  • Verified that operations succeed when Comet is enabled and match Spark behavior

Copilot AI review requested due to automatic review settings January 26, 2026 05:01

Copilot AI left a comment


Pull request overview

Fixes Comet’s datediff execution on dictionary-encoded / Iceberg-backed timestamp inputs by ensuring scalar arguments are broadcast to the correct batch length and normalizing inputs before the binary op.

Changes:

  • Broadcast scalar datediff arguments to the columnar batch length to avoid array-length mismatches.
  • Cast inputs to Date32 prior to subtraction to handle dictionary-backed arrays.


// under the License.

use arrow::array::{Array, Date32Array, Int32Array};
use arrow::compute::cast;

Copilot AI Jan 26, 2026


use arrow::compute::cast; is unused (the code calls arrow::compute::cast(...) with a fully-qualified path). This will fail CI because clippy is run with -D warnings. Either remove this import or use cast(&end_arr, &DataType::Date32) / cast(&start_arr, &DataType::Date32) so the import is actually used.

Suggested change
use arrow::compute::cast;

Comment on lines 111 to 114

Ok(ColumnarValue::Array(Arc::new(result)))
}

fn aliases(&self) -> &[String] {
&self.aliases
}
}

Copilot AI Jan 26, 2026


The aliases field still includes "datediff", but the fn aliases(&self) -> &[String] implementation was removed. If ScalarUDFImpl's default aliases() returns an empty slice (as used by other UDF impls in this crate), this will drop the datediff alias and can break Spark SQL/function resolution that relies on the alias rather than the primary name date_diff. Re-introduce aliases() (returning &self.aliases) or remove aliases entirely and ensure the UDF is registered under the intended name(s).
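The hazard the reviewer describes can be shown with a minimal, std-only sketch. This is not DataFusion's actual `ScalarUDFImpl` trait; the `Udf` trait below is a hypothetical stand-in whose `aliases()` has a default implementation returning an empty slice, so dropping the override silently loses the "datediff" alias.

```rust
// Simplified stand-in for a UDF trait whose aliases() defaults to empty.
trait Udf {
    fn name(&self) -> &str;
    fn aliases(&self) -> &[String] {
        &[] // default: no aliases, so "datediff" would not resolve
    }
}

struct DateDiff {
    aliases: Vec<String>,
}

impl Udf for DateDiff {
    fn name(&self) -> &str {
        "date_diff"
    }
    // Overriding the default keeps the "datediff" alias resolvable.
    fn aliases(&self) -> &[String] {
        &self.aliases
    }
}
```

Without the override, function resolution that looks up "datediff" via the alias list would find nothing, even though the struct still stores the string.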

@codecov-commenter

codecov-commenter commented Jan 26, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 60.13%. Comparing base (f09f8af) to head (c4a39b0).
⚠️ Report is 896 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #3278      +/-   ##
============================================
+ Coverage     56.12%   60.13%   +4.00%     
- Complexity      976     1456     +480     
============================================
  Files           119      175      +56     
  Lines         11743    16085    +4342     
  Branches       2251     2665     +414     
============================================
+ Hits           6591     9673    +3082     
- Misses         4012     5066    +1054     
- Partials       1140     1346     +206     

☔ View full report in Codecov by Sentry.


@vigneshsiva11 vigneshsiva11 changed the title Fix datediff array length mismatch for dictionary-backed timestamps fix(spark-expr): handle array length mismatch in datediff for dictionary-backed timestamps Jan 26, 2026


Development

Successfully merging this pull request may close these issues.

CometNativeException: "arrays of different length" when using to_date on Iceberg Timestamp column
