-
Notifications
You must be signed in to change notification settings - Fork 277
Description
Summary
The hour, minute, and second expressions (and potentially other timestamp extraction functions) incorrectly apply timezone conversion when the input is a TimestampNTZ (timestamp without timezone) value.
Root Cause
In native/spark-expr/src/datetime_funcs/extract_date_part.rs, the implementation uses array_with_timezone:
let array = array_with_timezone(
array,
self.timezone.clone(),
Some(&DataType::Timestamp(
Microsecond,
Some(self.timezone.clone().into()),
)),
)?;For a TimestampNTZ input (Timestamp(Microsecond, None)), this calls timestamp_ntz_to_timestamp in utils.rs, which converts the local time value to UTC by applying the session timezone offset.
Expected Behavior
For TimestampNTZ inputs:
- Values are stored as local time without any timezone information
hour(timestamp_ntz_value)should return the hour component directly from the local time value- No timezone conversion should be applied
For Timestamp (with timezone) inputs:
- Values are stored in UTC
hour(timestamp_value)should convert from UTC to the session timezone before extracting the hour- This is the current behavior and is correct
Example
-- With session timezone = America/Los_Angeles (UTC-8)
-- TimestampNTZ value representing "2024-01-15 10:30:00" local time
--
-- Expected: hour() = 10 (extract directly from local time)
-- Actual (bug): hour() = 18 (incorrectly converts assuming value is in UTC)Affected Functions
hour(SparkHour)minute(SparkMinute)second(SparkSecond)
All three use the same extract_date_part! macro with array_with_timezone.
Impact
This bug can cause incorrect results when using hour/minute/second functions on TimestampNTZ columns in non-UTC timezones. Currently the Scala serde layer does not validate input types, so TimestampNTZ inputs are not blocked.
Suggested Fix
Either:
- Add input type validation in the Scala serde layer (like
CometUnixTimestamp) to mark TimestampNTZ as unsupported and fall back to Spark - Fix the Rust implementation to handle TimestampNTZ correctly by not applying timezone conversion
Option 1 is safer as a short-term fix.
Related
- Issue Implement Spark-compatible CAST to/from TimestampNTZType #378: Implement Spark-compatible CAST to/from TimestampNTZType
- Issue [EPIC] [DISCUSS] Comet timezone handling #2733: [EPIC] Comet timezone handling
- Issue [Bug] String to TimestampNTZ cast incorrectly produces timestamp with UTC timezone #3179: String to TimestampNTZ cast bug
Note: This issue was generated with AI assistance.