feat: add set_nullable support for ALTER TABLE#2388
feat: add set_nullable support for ALTER TABLE#2388william-ch-databricks wants to merge 1 commit intodelta-io:mainfrom
Conversation
847c5d6 to
b5aabfe
Compare
96e546b to
da4e9c1
Compare
20641c7 to
6891ac8
Compare
6891ac8 to
7506e12
Compare
…2385) ## Stacked PR Use this [link](https://github.com/delta-io/delta-kernel-rs/pull/2385/files) to review incremental changes. - [**stack/alter-table-1-refactor-state**](#2385) [[Files changed](https://github.com/delta-io/delta-kernel-rs/pull/2385/files)] - [stack/alter-table-2-supports-data-files](#2386) [[Files changed](https://github.com/delta-io/delta-kernel-rs/pull/2386/files/029d6672081d32caea314dca61f3688e841f3036..50130c30c4378997fbf3d0ef7426ce23992e5c85)] - [stack/alter-table-3-framework-add-column](#2387) [[Files changed](https://github.com/delta-io/delta-kernel-rs/pull/2387/files/50130c30c4378997fbf3d0ef7426ce23992e5c85..eaa5277d025434aa78b79beed1a7cedfe82aa621)] - [stack/alter-table-4-set-nullable](#2388) [[Files changed](https://github.com/delta-io/delta-kernel-rs/pull/2388/files/eaa5277d025434aa78b79beed1a7cedfe82aa621..7506e1273ef349fbf628e4a6db5d0e7c1b4cbd8b)] - [stack/alter-table-5-column-mapping-add](#2389) [[Files changed](https://github.com/delta-io/delta-kernel-rs/pull/2389/files/7506e1273ef349fbf628e4a6db5d0e7c1b4cbd8b..d943ca2945da1ec9c354db57c889d93456381dee)] - [stack/alter-table-6-drop-column](#2390) [[Files changed](https://github.com/delta-io/delta-kernel-rs/pull/2390/files/d943ca2945da1ec9c354db57c889d93456381dee..40e3a1ff9e2915b456d0f0a76a50eb67f27a1ab1)] - [stack/alter-table-7-rename-column](#2391) [[Files changed](https://github.com/delta-io/delta-kernel-rs/pull/2391/files/40e3a1ff9e2915b456d0f0a76a50eb67f27a1ab1..628af3c02767f008a4e3c80a146aec7e5a3ac0b3)] --------- ## What changes are proposed in this pull request? Splits Transaction's snapshot into two concerns: - `read_snapshot_opt: Option<SnapshotRef>` -- the pre-commit table state (None for CREATE TABLE) - `effective_table_config: TableConfiguration` -- the config this commit will produce This separates "what did I read?" (conflict detection, post-commit snapshots) from "what will this commit produce?" (schema, protocol, stats, write context). Write-path call sites read from `effective_table_config`; read-path call sites use `read_snapshot()`. Also adds `should_emit_protocol` / `should_emit_metadata` flags to replace the old `is_create_table()` checks for Protocol/Metadata action emission, and replaces the synthetic pre-commit snapshot in CREATE TABLE with direct `TableConfiguration` construction. This is a pure refactor with no behaviour change. ## How was this change tested? All existing tests pass. Added unit tests for `LogSegment::new_for_version_zero` (valid input, non-zero version rejection, non-commit file rejection).
1fef96b to
6f3f954
Compare
6f3f954 to
c026d01
Compare
scottsand-db
left a comment
There was a problem hiding this comment.
Looks great! Minor comments
| } | ||
|
|
||
| // ============================================================================ | ||
| // SET NULLABLE tests |
There was a problem hiding this comment.
- can we set a partition column to be nullable?
- let's x-test this against ColumnMaping please - CM OFF, NAME, ID
- let's write the CRC (checksum) after this commit, too
- let's checkpoint after this commit, too
- let's read back the table after that, too
There was a problem hiding this comment.
all added except for CRC.
I don't think we currently support CRC right now. I've added this to the issue tracking already. Can we address in follow-up?
const INCREMENTAL_SAFE_OPS: &[&str] = &[
"WRITE", "MERGE", "UPDATE", "DELETE", "OPTIMIZE",
"CREATE TABLE", "REPLACE TABLE",
"CREATE TABLE AS SELECT", "REPLACE TABLE AS SELECT", "CREATE OR REPLACE TABLE AS SELECT",
];
There was a problem hiding this comment.
We actually do have CRC right now! See snapshot.write_checksum()
|
|
||
| // Replace the value at `key` with `f(value)`, preserving the map's insertion order. Calls | ||
| // `on_missing` to build the error if `key` is not present. | ||
| fn modify_indexmap_in_place<K, V, E>( |
There was a problem hiding this comment.
Food for thought: Claude tells me
Should Fix
S1. modify_indexmap_in_place is over-engineered; use IndexMap::get_mut
File: kernel/src/transaction/schema_evolution.rs:93-111
Flagged by: scovich-reviewer, code-review-refactor
The swap_remove_full + insert + swap_indices dance is clever but surprising. IndexMap::get_mut(&key) returns Option<&mut V> in place with no removal/reinsertion, and the only caller's modifier (|mut field| { field.nullable = true; ... Ok(field) }) only needs mutable access, not ownership transfer. The current helper
has only one caller and four tests protecting its own correctness.
fn modify_indexmap_in_place<K, V, E>(
map: &mut IndexMap<K, V>,
key: &K,
on_missing: impl FnOnce() -> E,
f: impl FnOnce(&mut V) -> Result<(), E>,
) -> Result<(), E> { ... map.get_mut(key).ok_or_else(on_missing).and_then(f) }
As a bonus, the docs-reviewer identified that the current helper leaves the map in an inconsistent state if f returns an error (key already removed, last element moved into its slot). The get_mut version eliminates that hazard.
There was a problem hiding this comment.
Hmm, this was because the inner fields weren't directly mutable. But we could expose this and make it a lot cleaner! PTAL at new code
c026d01 to
6f75bfc
Compare
| /// Intended for use in test assertions. | ||
| #[cfg(any(test, feature = "test-utils"))] | ||
| #[allow(clippy::panic, clippy::expect_used)] | ||
| pub fn field_at_path<'a>(&'a self, path: &[String]) -> &'a StructField { |
There was a problem hiding this comment.
Are there other test-only functions in this file? Could we move this to be next to them? Are we able to move this inside of a test module scope? I think that would provide a compile-time guarantee that this is only used by test code and never by production code. I want to avoid anyone using this function in production code since it panics.
There was a problem hiding this comment.
I put it as a method on StructType, because it's used in kernel/tests/alter_tables.rs and kernel/src/transaction/schema_evolution.rs. It uses #[cfg(any(test, feature = "test-utils"))] to try to enforce that this won't be used in production. We could move this into test module scope, but I put it here to avoid duplicating it. Let me know what you think
| Ok(()) | ||
| } | ||
|
|
||
| // ============================================================================ |
There was a problem hiding this comment.
Can you ask Claude to read the Delta Spark "schema evolution set nullable" test suites and see if there are any other interesting test cases we should handle?
There was a problem hiding this comment.
added clustering column test
| @@ -39,17 +131,11 @@ pub(crate) struct SchemaEvolutionResult { | |||
| /// Returns an error if any operation fails validation. The error message identifies which | |||
| /// operation failed and why. | |||
| pub(crate) fn apply_schema_operations( | |||
There was a problem hiding this comment.
Claude says (food for thought):
The previous code rebuilt via StructType::try_new(fields.into_values()), which runs ensure_no_metadata_columns_in_field recursively and rebuilds metadata_columns. The new in-place mutation skips both. For SetNullable it's safe; for AddColumn the top-level metadata-column check remains, but nested metadata columns
inside a struct field now slip through (theoretical today since they require new_unchecked).
Fix: either re-run an equivalent nested-metadata-column scan after each op, or add a comment documenting the assumption that input fields came from try_new. The PR description doesn't mention this trade-off.
There was a problem hiding this comment.
Hmm, true, we should check for adding a column. But this doesn't impact set nullable/rename/drop as it doesn't change anything except for name/nullability
| SetNullable { column: ColumnName }, | ||
| } | ||
|
|
||
| /// Signal returned by [`modify_field_at_path`] handlers describing what to do with the leaf field |
There was a problem hiding this comment.
Food for thought, claude says:
walk_column_fields_by (kernel/src/schema/mod.rs:760-792) already implements the immutable case-sensitive descent. The PR re-implements the mutable case-insensitive descent as a free function that pokes at StructType internals via field_map_mut. Move it onto StructType::modify_field_at_path(path, FnOnce(&mut
StructField) -> DeltaResult<()>) — this drops LeafAction, drops field_map_mut callers, tightens encapsulation, and brings the helper down from ~115 LOC to ~35 LOC. Could be deferred to the next stacked PR if the author files an issue, but the consolidation only gets harder once more callers exist.
There was a problem hiding this comment.
I think this could be a good idea, but I would prefer to consolidate later after the final implementation of modify_field_at_path is decided
6f75bfc to
d9c7c53
Compare
Implement the SetNullable schema operation which changes a column's nullability from NOT NULL to nullable. If the column is already nullable, this is a no-op. Only the safe direction is allowed (NOT NULL -> nullable) per the Delta protocol.
d9c7c53 to
fb29758
Compare
🥞 Stacked PR
Use this link to review incremental changes.
What changes are proposed in this pull request?
Adds the
SetNullableoperation to the ALTER TABLE framework, allowing columns to be changedfrom NOT NULL to nullable. Supports nested columns via
ColumnNamepaths.modify_field_at_pathfunction to recurse through struct and rebuild bottom-up to update nullable field.How was this change tested?
modify_field_at_path: top-level, nested, sibling preservation, nonexistentfield, non-struct traversal, case-insensitive lookup
SetNullableviaapply_schema_operations: required->nullable, alreadynullable noop, deeply nested (3 levels), chained with add_column
kernel/tests/alter_table.rs: set_nullable with schema reload,already-nullable noop, nonexistent column error, chained add_column + set_nullable