Skip to content

Conversation

@vigneshsiva11
Copy link

Which issue does this PR close?

Closes #3180

Rationale for this change

Spark TimestampNTZ values represent local time without any timezone
information. However, the current Comet implementation applies timezone
conversion when evaluating hour, minute, and second, which leads to
incorrect results for TimestampNTZ inputs in non-UTC session timezones.

This change aligns Comet behavior with Spark semantics by ensuring that
timezone conversion is only applied to Timestamp values with an explicit
timezone, and not to TimestampNTZ.

What changes are included in this PR?

  • Updated the Rust implementation of extract_date_part used by
    hour, minute, and second to:
    • Bypass timezone conversion for TimestampNTZ inputs
    • Preserve existing behavior for Timestamp inputs with timezone
  • Added defensive handling for unsupported input types

How are these changes tested?

  • Verified via Rust unit tests:
    cargo test -p datafusion-comet-spark-expr

Copilot AI review requested due to automatic review settings January 24, 2026 16:19
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Aligns Comet’s hour/minute/second behavior with Spark semantics by avoiding session-timezone conversion for TimestampNTZ (timestamp without timezone), while preserving the existing conversion behavior for timezone-aware timestamps.

Changes:

  • Bypass timezone conversion for DataType::Timestamp(_, None) (TimestampNTZ) inputs.
  • Preserve timezone conversion for DataType::Timestamp(_, Some(_)) inputs via array_with_timezone.
  • Add an explicit execution error for unsupported (non-timestamp) input types.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +79 to +85
let array = match array.data_type() {
// TimestampNTZ → DO NOT apply timezone conversion
DataType::Timestamp(_, None) => array.clone(),

// Timestamp with timezone → convert from UTC to session timezone
DataType::Timestamp(_, Some(_)) => array_with_timezone(
array,
Copy link

Copilot AI Jan 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

invoke_with_args now pattern-matches directly on array.data_type() and rejects DataType::Dictionary(_, _) with an execution error. This is a regression for dictionary-encoded timestamp columns (the function’s return_type explicitly supports dictionary input, and array_with_timezone also has dictionary handling). Add explicit handling for DataType::Dictionary(_, value_type) where value_type is Timestamp(_, None) (no conversion) or Timestamp(_, Some(_)) (apply array_with_timezone), instead of falling into the error branch.

Copilot uses AI. Check for mistakes.
@codecov-commenter
Copy link

codecov-commenter commented Jan 25, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 60.07%. Comparing base (f09f8af) to head (86ba72d).
⚠️ Report is 879 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #3265      +/-   ##
============================================
+ Coverage     56.12%   60.07%   +3.94%     
- Complexity      976     1426     +450     
============================================
  Files           119      172      +53     
  Lines         11743    15926    +4183     
  Branches       2251     2631     +380     
============================================
+ Hits           6591     9567    +2976     
- Misses         4012     5031    +1019     
- Partials       1140     1328     +188     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@vigneshsiva11 vigneshsiva11 force-pushed the fix-timestamp-ntz-hour-3180 branch from 86ba72d to c5cf290 Compare January 25, 2026 16:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug] hour/minute/second expressions incorrectly apply timezone conversion to TimestampNTZ inputs

2 participants