fix(spark-expr): handle array length mismatch in datediff for dictionary-backed timestamps #3278
base: main
Conversation
Pull request overview
Fixes Comet’s `datediff` execution on dictionary-encoded / Iceberg-backed timestamp inputs by ensuring scalar arguments are broadcast to the correct batch length and normalizing inputs before the binary op.

Changes (a hedged sketch of the approach follows the list):
- Broadcast scalar `datediff` arguments to the columnar batch length to avoid array-length mismatches.
- Cast inputs to `Date32` prior to subtraction to handle dictionary-backed arrays.
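The following is a minimal sketch of that approach, not the PR's actual code: the function names `date_diff` and `to_date32_array` are invented for illustration, the crate paths assume the `datafusion`/`arrow` facade crates, and `ScalarValue::to_array_of_size` returns a `Result` only in recent DataFusion versions.

```rust
use std::sync::Arc;

use arrow::array::{Array, ArrayRef, Date32Array, Int32Array};
use arrow::compute::cast;
use arrow::datatypes::DataType;
use datafusion::common::Result;
use datafusion::logical_expr::ColumnarValue;

/// Broadcast a scalar argument to `num_rows` rows and normalize the result to
/// Date32. Dictionary-backed inputs are flattened by the cast.
fn to_date32_array(value: &ColumnarValue, num_rows: usize) -> Result<ArrayRef> {
    let array = match value {
        ColumnarValue::Array(a) => Arc::clone(a),
        // Note: older DataFusion versions return the array directly here
        // instead of a `Result`.
        ColumnarValue::Scalar(s) => s.to_array_of_size(num_rows)?,
    };
    Ok(cast(&array, &DataType::Date32)?)
}

/// Days between `end` and `start`: both sides are normalized to Date32 arrays
/// of the same length before the element-wise subtraction.
fn date_diff(end: &ColumnarValue, start: &ColumnarValue, num_rows: usize) -> Result<ColumnarValue> {
    let end = to_date32_array(end, num_rows)?;
    let start = to_date32_array(start, num_rows)?;
    let end = end.as_any().downcast_ref::<Date32Array>().expect("cast to Date32");
    let start = start.as_any().downcast_ref::<Date32Array>().expect("cast to Date32");

    // Date32 stores days since the Unix epoch, so the diff is plain i32 subtraction.
    let result: Int32Array = end
        .iter()
        .zip(start.iter())
        .map(|(e, s)| match (e, s) {
            (Some(e), Some(s)) => Some(e - s),
            _ => None,
        })
        .collect();
    Ok(ColumnarValue::Array(Arc::new(result)))
}
```

Routing everything through the cast is what makes dictionary-backed inputs work in this sketch, since the cast kernel can materialize dictionary-encoded data into a plain `Date32` array of the full batch length before the subtraction.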
```rust
// under the License.

use arrow::array::{Array, Date32Array, Int32Array};
use arrow::compute::cast;
```
Copilot AI · Jan 26, 2026
`use arrow::compute::cast;` is unused (the code calls `arrow::compute::cast(...)` with a fully-qualified path). This will fail CI because clippy is run with `-D warnings`. Either remove this import or use `cast(&end_arr, &DataType::Date32)` / `cast(&start_arr, &DataType::Date32)` so the import is actually used.
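As an illustration of the second option, here is a hedged fragment: the variable names `end_arr`/`start_arr` come from the comment, but the helper function wrapping them is invented for the example.

```rust
use arrow::array::ArrayRef;
use arrow::compute::cast;
use arrow::datatypes::DataType;
use arrow::error::ArrowError;

/// Hypothetical helper: cast both datediff operands to Date32 via the
/// imported `cast`, so `use arrow::compute::cast;` is no longer unused and
/// clippy (run with `-D warnings`) stays quiet.
fn normalize_to_date32(
    end_arr: &ArrayRef,
    start_arr: &ArrayRef,
) -> Result<(ArrayRef, ArrayRef), ArrowError> {
    let end_arr = cast(end_arr, &DataType::Date32)?;
    let start_arr = cast(start_arr, &DataType::Date32)?;
    Ok((end_arr, start_arr))
}
```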
```rust
        Ok(ColumnarValue::Array(Arc::new(result)))
    }

    fn aliases(&self) -> &[String] {
        &self.aliases
    }
}
```
Copilot AI · Jan 26, 2026
The `aliases` field still includes `"datediff"`, but the `fn aliases(&self) -> &[String]` implementation was removed. If `ScalarUDFImpl`'s default `aliases()` returns an empty slice (as used by other UDF impls in this crate), this will drop the `datediff` alias and can break Spark SQL/function resolution that relies on the alias rather than the primary name `date_diff`. Re-introduce `aliases()` (returning `&self.aliases`) or remove `aliases` entirely and ensure the UDF is registered under the intended name(s).
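A minimal sketch of the first option, with the struct name, constructor, and the rest of the UDF assumed rather than taken from the PR; the point is only that the stored alias list must be surfaced through `aliases()`.

```rust
/// Hypothetical shape of the UDF struct the comment refers to; the real type
/// lives in the Comet spark-expr crate and implements ScalarUDFImpl.
struct SparkDateDiff {
    aliases: Vec<String>,
}

impl SparkDateDiff {
    fn new() -> Self {
        Self {
            // Primary name is `date_diff`; `datediff` is kept as an alias so
            // resolution of either spelling still finds the UDF.
            aliases: vec!["datediff".to_string()],
        }
    }

    /// In the real impl this is the `ScalarUDFImpl::aliases` trait method.
    /// Returning the stored slice (instead of relying on the trait's default,
    /// which is empty) keeps the `datediff` alias registered.
    fn aliases(&self) -> &[String] {
        &self.aliases
    }
}
```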
Codecov Report
✅ All modified and coverable lines are covered by tests.

Additional details and impacted files:

```
@@             Coverage Diff              @@
##               main    #3278      +/-   ##
============================================
+ Coverage     56.12%   60.13%    +4.00%
- Complexity      976     1456      +480
============================================
  Files           119      175       +56
  Lines         11743    16085     +4342
  Branches       2251     2665      +414
============================================
+ Hits           6591     9673     +3082
- Misses         4012     5066     +1054
- Partials       1140     1346      +206
```

☔ View full report in Codecov by Sentry.
Which issue does this PR close?
Closes #3255
Rationale for this change
When reading Iceberg tables, timestamp columns may be dictionary-encoded in the underlying Parquet files.
In the current implementation of `datediff`, scalar and array arguments can be converted into arrays of different lengths, which leads to a runtime error. This behavior differs from Spark, which correctly broadcasts scalar inputs and handles dictionary-backed columns without error.

This change ensures Comet’s `datediff` implementation aligns with Spark semantics and avoids execution failures.

What changes are included in this PR?
- `datediff` arguments are converted into arrays of the same length by correctly broadcasting scalars
- Inputs are cast to `Date32Array` before applying the binary operation

How are these changes tested?
- Tests in `datafusion-comet-spark-expr` were run locally, including tests for `to_date` and `datediff`
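Below is a hedged, test-style sketch of the failure scenario those tests cover (a dictionary-encoded date column paired with a scalar argument), using only public arrow/DataFusion APIs. The values and structure are illustrative, not the PR's actual tests, and the `DictionaryArray::try_new` / `to_array_of_size` signatures vary slightly across versions.

```rust
use std::sync::Arc;

use arrow::array::{Array, ArrayRef, Date32Array, DictionaryArray, Int32Array};
use arrow::compute::cast;
use arrow::datatypes::{DataType, Int32Type};
use datafusion::common::{Result, ScalarValue};

fn main() -> Result<()> {
    // A dictionary-backed Date32 column, similar to what an Iceberg/Parquet
    // reader can produce for low-cardinality date-derived data.
    let values = Date32Array::from(vec![19000, 19001]);
    let keys = Int32Array::from(vec![0, 1, 1, 0]);
    let dict: ArrayRef = Arc::new(DictionaryArray::<Int32Type>::try_new(keys, Arc::new(values))?);

    // Casting flattens the dictionary into a plain Date32 array of batch length.
    let end = cast(&dict, &DataType::Date32)?;
    assert_eq!(end.len(), 4);

    // The scalar argument is broadcast to the same length, so the later
    // element-wise subtraction sees two arrays of equal length.
    let start = ScalarValue::Date32(Some(18990)).to_array_of_size(end.len())?;
    assert_eq!(start.len(), end.len());
    Ok(())
}
```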