Skip to content

[Bug] hour/minute/second expressions incorrectly apply timezone conversion to TimestampNTZ inputs #3180

@andygrove

Description

@andygrove

Summary

The hour, minute, and second expressions (and potentially other timestamp extraction functions) incorrectly apply timezone conversion when the input is a TimestampNTZ (timestamp without timezone) value.

Root Cause

In native/spark-expr/src/datetime_funcs/extract_date_part.rs, the implementation uses array_with_timezone:

let array = array_with_timezone(
    array,
    self.timezone.clone(),
    Some(&DataType::Timestamp(
        Microsecond,
        Some(self.timezone.clone().into()),
    )),
)?;

For a TimestampNTZ input (Timestamp(Microsecond, None)), this calls timestamp_ntz_to_timestamp in utils.rs, which converts the local time value to UTC by applying the session timezone offset.

Expected Behavior

For TimestampNTZ inputs:

  • Values are stored as local time without any timezone information
  • hour(timestamp_ntz_value) should return the hour component directly from the local time value
  • No timezone conversion should be applied

For Timestamp (with timezone) inputs:

  • Values are stored in UTC
  • hour(timestamp_value) should convert from UTC to the session timezone before extracting the hour
  • This is the current behavior and is correct

Example

-- With session timezone = America/Los_Angeles (UTC-8)
-- TimestampNTZ value representing "2024-01-15 10:30:00" local time
-- 
-- Expected: hour() = 10 (extract directly from local time)
-- Actual (bug): hour() = 18 (incorrectly converts assuming value is in UTC)

Affected Functions

  • hour (SparkHour)
  • minute (SparkMinute)
  • second (SparkSecond)

All three use the same extract_date_part! macro with array_with_timezone.

Impact

This bug can cause incorrect results when using hour/minute/second functions on TimestampNTZ columns in non-UTC timezones. Currently the Scala serde layer does not validate input types, so TimestampNTZ inputs are not blocked.

Suggested Fix

Either:

  1. Add input type validation in the Scala serde layer (like CometUnixTimestamp) to mark TimestampNTZ as unsupported and fall back to Spark
  2. Fix the Rust implementation to handle TimestampNTZ correctly by not applying timezone conversion

Option 1 is safer as a short-term fix.

Related


Note: This issue was generated with AI assistance.

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions