feat: enforce and test non-null column invariant on writes#2483
feat: enforce and test non-null column invariant on writes#2483sanujbasu merged 1 commit intodelta-io:mainfrom
Conversation
5e327f6 to
26a8b68
Compare
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #2483 +/- ##
==========================================
+ Coverage 88.47% 88.50% +0.03%
==========================================
Files 178 179 +1
Lines 58807 59031 +224
Branches 58807 59031 +224
==========================================
+ Hits 52028 52245 +217
+ Misses 4793 4785 -8
- Partials 1986 2001 +15 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
26a8b68 to
eb70829
Compare
scottsand-db
left a comment
There was a problem hiding this comment.
Looks great! Minor comments overall -- core production logic is sound.
Note claude also identified:
- Transaction::partitioned_write_context --- Says "null is valid for any primitive partition column". After this PR that is wrong. The companion docstring on validate_types was updated; the public-API surface should match.
7c554cc to
fb4cfa6
Compare
Range-diff: stack/materialize-partition-columns-cleanup (7c554cc -> fb4cfa6)
Reproduce locally: |
Range-diff: stack/materialize-partition-columns-cleanup (fb4cfa6 -> e7c821d)
Reproduce locally: |
fb4cfa6 to
e7c821d
Compare
Range-diff: stack/materialize-partition-columns-cleanup (e7c821d -> c59a39d)
Reproduce locally: |
e7c821d to
c59a39d
Compare
scottsand-db
left a comment
There was a problem hiding this comment.
Looks good but some feedback on some of the comment blocks
DrakeLin
left a comment
There was a problem hiding this comment.
let's add a test with a table with multiple partition tables with mixed nullability
c59a39d to
897cd26
Compare
Fixes delta-io#2465. Closes the write-time enforcement gap left by delta-io#2418, which auto-enables the invariants writer feature on schemas with non-null columns but does not itself reject nulls on writes. Kernel now rejects Scalar::Null partition values for partition columns with nullable: false in partitioned_write_context, matching Delta-Spark's DELTA_NOT_NULL_CONSTRAINT_VIOLATED (SQLSTATE 23502). The check sits in validate_types alongside the existing type checks. The Scalar::Null inner type continues to be ignored on nulls, so connectors can keep constructing null partition values without first looking up the schema; only the schema field's nullability is enforced. This matches Catalyst's NullType permissiveness and the type-erased on-wire format (partitionValues serializes nulls as JSON null with no type information). Adds integration tests covering all three non-null write paths: 1/ Non-materialized partitioned writes (default): kernel rejects the null Scalar at partitioned_write_context time, before serialization. Covered across all column-mapping modes via the kernel create_table builder. 2/ Materialized partitioned writes (materializePartitionColumns): kernel delegates to the engine's ParquetHandler. The default engine inherits the guarantee from arrow-rs, whose RecordBatch::try_new rejects nulls in fields with nullable: false. The table for this case is set up via the kernel create_table builder with delta.feature.materializePartitionColumns=supported (allow-listed by delta-io#2481). 3/ Plain non-null data columns: same engine-delegation seam, exercised across a representative primitive type matrix (integer, long, string, binary, boolean, timestamp, decimal). The table is created via the kernel create_table builder so the non-null schema drives the auto-enablement of the invariants writer feature end-to-end; the test asserts both the auto-enable and the downstream RecordBatch::try_new rejection. Pins the full chain documented on maybe_enable_invariants in transaction/builder/create_table.rs. Local verification: 1/ cargo +nightly fmt --check 2/ cargo clippy --workspace --benches --tests --all-features -- -D warnings 3/ cargo doc --workspace --all-features --no-deps 4/ cargo test -p delta_kernel --all-features for affected paths
Range-diff: stack/materialize-partition-columns-cleanup (897cd26 -> 063dda3)
Reproduce locally: |
897cd26 to
063dda3
Compare
What changes are proposed in this pull request?
Fixes #2465.
Closes the write-time enforcement gap left by #2418, which auto-enables the invariants writer feature on schemas with non-null columns but does not itself reject nulls on writes. Kernel now rejects Scalar::Null partition values for partition columns with nullable: false in partitioned_write_context, matching Delta-Spark's DELTA_NOT_NULL_CONSTRAINT_VIOLATED (SQLSTATE 23502).
The check sits in validate_types alongside the existing type checks. The Scalar::Null inner type continues to be ignored on nulls, so connectors can keep constructing null partition values without first looking up the schema; only the schema field's nullability is enforced. This matches Catalyst's NullType permissiveness and the type-erased on-wire format (partitionValues serializes nulls as JSON null with no type information).
How was this change tested?
Adds integration tests covering all three non-null write paths:
1/ Non-materialized partitioned writes (default): kernel rejects the null Scalar at partitioned_write_context time, before serialization. Covered across all column-mapping modes via the kernel create_table builder.
2/ Materialized partitioned writes (materializePartitionColumns): kernel delegates to the engine's ParquetHandler. The default engine inherits the guarantee from arrow-rs, whose RecordBatch::try_new rejects nulls in fields with nullable: false. The table for this case is set up via the kernel create_table builder with delta.feature.materializePartitionColumns=supported (allow-listed by #2481).
3/ Plain non-null data columns: same engine-delegation seam, exercised across a representative primitive type matrix (integer, long, string, binary, boolean, timestamp, decimal). The table is created via the kernel create_table builder so the non-null schema drives the auto-enablement of the invariants writer feature end-to-end; the test asserts both the auto-enable and the downstream RecordBatch::try_new rejection. Pins the full chain documented on maybe_enable_invariants in transaction/builder/create_table.rs.
Local verification:
1/ cargo +nightly fmt --check
2/ cargo clippy --workspace --benches --tests --all-features -- -D warnings
3/ cargo doc --workspace --all-features --no-deps
4/ cargo test -p delta_kernel --all-features for affected paths