Skip to content

feat: add set_nullable support for ALTER TABLE#2388

Open
william-ch-databricks wants to merge 1 commit intodelta-io:mainfrom
william-ch-databricks:stack/alter-table-4-set-nullable
Open

feat: add set_nullable support for ALTER TABLE#2388
william-ch-databricks wants to merge 1 commit intodelta-io:mainfrom
william-ch-databricks:stack/alter-table-4-set-nullable

Conversation

@william-ch-databricks
Copy link
Copy Markdown
Collaborator

@william-ch-databricks william-ch-databricks commented Apr 14, 2026

🥞 Stacked PR

Use this link to review incremental changes.


What changes are proposed in this pull request?

Adds the SetNullable operation to the ALTER TABLE framework, allowing columns to be changed
from NOT NULL to nullable. Supports nested columns via ColumnName paths.

modify_field_at_path function to recurse through struct and rebuild bottom-up to update nullable field.

How was this change tested?

  • Unit tests for modify_field_at_path: top-level, nested, sibling preservation, nonexistent
    field, non-struct traversal, case-insensitive lookup
  • Unit tests for SetNullable via apply_schema_operations: required->nullable, already
    nullable noop, deeply nested (3 levels), chained with add_column
  • Integration tests in kernel/tests/alter_table.rs: set_nullable with schema reload,
    already-nullable noop, nonexistent column error, chained add_column + set_nullable

@william-ch-databricks william-ch-databricks force-pushed the stack/alter-table-4-set-nullable branch 8 times, most recently from 847c5d6 to b5aabfe Compare April 15, 2026 19:27
@william-ch-databricks william-ch-databricks marked this pull request as ready for review April 15, 2026 19:28
@william-ch-databricks william-ch-databricks force-pushed the stack/alter-table-4-set-nullable branch 2 times, most recently from 96e546b to da4e9c1 Compare April 16, 2026 17:06
@william-ch-databricks william-ch-databricks marked this pull request as draft April 16, 2026 19:54
@william-ch-databricks william-ch-databricks force-pushed the stack/alter-table-4-set-nullable branch 6 times, most recently from 20641c7 to 6891ac8 Compare April 17, 2026 18:33
@william-ch-databricks william-ch-databricks marked this pull request as ready for review April 17, 2026 18:33
@william-ch-databricks william-ch-databricks force-pushed the stack/alter-table-4-set-nullable branch from 6891ac8 to 7506e12 Compare April 17, 2026 21:15
github-merge-queue Bot pushed a commit that referenced this pull request Apr 17, 2026
…2385)

## Stacked PR
Use this
[link](https://github.com/delta-io/delta-kernel-rs/pull/2385/files) to
review incremental changes.
-
[**stack/alter-table-1-refactor-state**](#2385)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/2385/files)]
-
[stack/alter-table-2-supports-data-files](#2386)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/2386/files/029d6672081d32caea314dca61f3688e841f3036..50130c30c4378997fbf3d0ef7426ce23992e5c85)]
-
[stack/alter-table-3-framework-add-column](#2387)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/2387/files/50130c30c4378997fbf3d0ef7426ce23992e5c85..eaa5277d025434aa78b79beed1a7cedfe82aa621)]
-
[stack/alter-table-4-set-nullable](#2388)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/2388/files/eaa5277d025434aa78b79beed1a7cedfe82aa621..7506e1273ef349fbf628e4a6db5d0e7c1b4cbd8b)]
-
[stack/alter-table-5-column-mapping-add](#2389)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/2389/files/7506e1273ef349fbf628e4a6db5d0e7c1b4cbd8b..d943ca2945da1ec9c354db57c889d93456381dee)]
-
[stack/alter-table-6-drop-column](#2390)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/2390/files/d943ca2945da1ec9c354db57c889d93456381dee..40e3a1ff9e2915b456d0f0a76a50eb67f27a1ab1)]
-
[stack/alter-table-7-rename-column](#2391)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/2391/files/40e3a1ff9e2915b456d0f0a76a50eb67f27a1ab1..628af3c02767f008a4e3c80a146aec7e5a3ac0b3)]

---------
## What changes are proposed in this pull request?

Splits Transaction's snapshot into two concerns:
- `read_snapshot_opt: Option<SnapshotRef>` -- the pre-commit table state
(None for CREATE TABLE)
- `effective_table_config: TableConfiguration` -- the config this commit
will produce

This separates "what did I read?" (conflict detection, post-commit
snapshots) from "what will
this commit produce?" (schema, protocol, stats, write context).
Write-path call sites read from
`effective_table_config`; read-path call sites use `read_snapshot()`.

Also adds `should_emit_protocol` / `should_emit_metadata` flags to
replace the old
`is_create_table()` checks for Protocol/Metadata action emission, and
replaces the synthetic
pre-commit snapshot in CREATE TABLE with direct `TableConfiguration`
construction.

This is a pure refactor with no behaviour change.

## How was this change tested?

All existing tests pass. Added unit tests for
`LogSegment::new_for_version_zero` (valid input,
non-zero version rejection, non-commit file rejection).
@william-ch-databricks william-ch-databricks force-pushed the stack/alter-table-4-set-nullable branch 2 times, most recently from 1fef96b to 6f3f954 Compare April 24, 2026 17:43
@william-ch-databricks william-ch-databricks force-pushed the stack/alter-table-4-set-nullable branch from 6f3f954 to c026d01 Compare April 24, 2026 21:21
Copy link
Copy Markdown
Collaborator

@scottsand-db scottsand-db left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Minor comments

Comment thread kernel/src/transaction/builder/alter_table.rs Outdated
Comment thread kernel/src/transaction/builder/alter_table.rs Outdated
Comment thread kernel/src/transaction/schema_evolution.rs Outdated
Comment thread kernel/src/transaction/builder/alter_table.rs Outdated
Comment thread kernel/src/transaction/schema_evolution.rs Outdated
Comment thread kernel/src/transaction/schema_evolution.rs Outdated
Comment thread kernel/src/transaction/schema_evolution.rs Outdated
Comment thread kernel/src/transaction/schema_evolution.rs Outdated
}

// ============================================================================
// SET NULLABLE tests
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • can we set a partition column to be nullable?
  • let's x-test this against ColumnMaping please - CM OFF, NAME, ID
  • let's write the CRC (checksum) after this commit, too
  • let's checkpoint after this commit, too
  • let's read back the table after that, too

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all added except for CRC.

I don't think we currently support CRC right now. I've added this to the issue tracking already. Can we address in follow-up?

const INCREMENTAL_SAFE_OPS: &[&str] = &[                                                                                                                                                              
      "WRITE", "MERGE", "UPDATE", "DELETE", "OPTIMIZE",                                                                                                                                                 
      "CREATE TABLE", "REPLACE TABLE",                                                                                                                                                                  
      "CREATE TABLE AS SELECT", "REPLACE TABLE AS SELECT", "CREATE OR REPLACE TABLE AS SELECT",
  ]; 

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We actually do have CRC right now! See snapshot.write_checksum()

Comment thread kernel/tests/alter_table.rs
Comment thread kernel/tests/alter_table.rs

// Replace the value at `key` with `f(value)`, preserving the map's insertion order. Calls
// `on_missing` to build the error if `key` is not present.
fn modify_indexmap_in_place<K, V, E>(
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Food for thought: Claude tells me

  Should Fix

  S1. modify_indexmap_in_place is over-engineered; use IndexMap::get_mut

  File: kernel/src/transaction/schema_evolution.rs:93-111
  Flagged by: scovich-reviewer, code-review-refactor

  The swap_remove_full + insert + swap_indices dance is clever but surprising. IndexMap::get_mut(&key) returns Option<&mut V> in place with no removal/reinsertion, and the only caller's modifier (|mut field| { field.nullable = true; ... Ok(field) }) only needs mutable access, not ownership transfer. The current helper
  has only one caller and four tests protecting its own correctness.

  fn modify_indexmap_in_place<K, V, E>(
      map: &mut IndexMap<K, V>,
      key: &K,
      on_missing: impl FnOnce() -> E,
      f: impl FnOnce(&mut V) -> Result<(), E>,
  ) -> Result<(), E> { ... map.get_mut(key).ok_or_else(on_missing).and_then(f) }

  As a bonus, the docs-reviewer identified that the current helper leaves the map in an inconsistent state if f returns an error (key already removed, last element moved into its slot). The get_mut version eliminates that hazard.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, this was because the inner fields weren't directly mutable. But we could expose this and make it a lot cleaner! PTAL at new code

Comment thread kernel/src/transaction/schema_evolution.rs
@william-ch-databricks william-ch-databricks force-pushed the stack/alter-table-4-set-nullable branch from c026d01 to 6f75bfc Compare April 26, 2026 22:13
Comment thread kernel/src/schema/mod.rs
/// Intended for use in test assertions.
#[cfg(any(test, feature = "test-utils"))]
#[allow(clippy::panic, clippy::expect_used)]
pub fn field_at_path<'a>(&'a self, path: &[String]) -> &'a StructField {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there other test-only functions in this file? Could we move this to be next to them? Are we able to move this inside of a test module scope? I think that would provide a compile-time guarantee that this is only used by test code and never by production code. I want to avoid anyone using this function in production code since it panics.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I put it as a method on StructType, because it's used in kernel/tests/alter_tables.rs and kernel/src/transaction/schema_evolution.rs. It uses #[cfg(any(test, feature = "test-utils"))] to try to enforce that this won't be used in production. We could move this into test module scope, but I put it here to avoid duplicating it. Let me know what you think

Comment thread kernel/src/schema/mod.rs Outdated
Comment thread kernel/src/transaction/builder/alter_table.rs Outdated
Comment thread kernel/src/transaction/schema_evolution.rs Outdated
Comment thread kernel/tests/alter_table.rs Outdated
Comment thread kernel/tests/alter_table.rs Outdated
Comment thread kernel/tests/alter_table.rs
Ok(())
}

// ============================================================================
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you ask Claude to read the Delta Spark "schema evolution set nullable" test suites and see if there are any other interesting test cases we should handle?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added clustering column test

Comment thread kernel/src/schema/mod.rs Outdated
@@ -39,17 +131,11 @@ pub(crate) struct SchemaEvolutionResult {
/// Returns an error if any operation fails validation. The error message identifies which
/// operation failed and why.
pub(crate) fn apply_schema_operations(
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude says (food for thought):

  The previous code rebuilt via StructType::try_new(fields.into_values()), which runs ensure_no_metadata_columns_in_field recursively and rebuilds metadata_columns. The new in-place mutation skips both. For SetNullable it's safe; for AddColumn the top-level metadata-column check remains, but nested metadata columns
  inside a struct field now slip through (theoretical today since they require new_unchecked).

  Fix: either re-run an equivalent nested-metadata-column scan after each op, or add a comment documenting the assumption that input fields came from try_new. The PR description doesn't mention this trade-off.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, true, we should check for adding a column. But this doesn't impact set nullable/rename/drop as it doesn't change anything except for name/nullability

SetNullable { column: ColumnName },
}

/// Signal returned by [`modify_field_at_path`] handlers describing what to do with the leaf field
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Food for thought, claude says:

  walk_column_fields_by (kernel/src/schema/mod.rs:760-792) already implements the immutable case-sensitive descent. The PR re-implements the mutable case-insensitive descent as a free function that pokes at StructType internals via field_map_mut. Move it onto StructType::modify_field_at_path(path, FnOnce(&mut
  StructField) -> DeltaResult<()>) — this drops LeafAction, drops field_map_mut callers, tightens encapsulation, and brings the helper down from ~115 LOC to ~35 LOC. Could be deferred to the next stacked PR if the author files an issue, but the consolidation only gets harder once more callers exist.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this could be a good idea, but I would prefer to consolidate later after the final implementation of modify_field_at_path is decided

Comment thread kernel/tests/alter_table.rs Outdated
Comment thread kernel/src/transaction/schema_evolution.rs
Comment thread kernel/src/transaction/schema_evolution.rs Outdated
Comment thread kernel/src/transaction/schema_evolution.rs
Comment thread kernel/src/transaction/schema_evolution.rs
@william-ch-databricks william-ch-databricks force-pushed the stack/alter-table-4-set-nullable branch from 6f75bfc to d9c7c53 Compare April 27, 2026 06:19
Implement the SetNullable schema operation which changes a column's
nullability from NOT NULL to nullable. If the column is already nullable,
this is a no-op.

Only the safe direction is allowed (NOT NULL -> nullable) per the Delta
protocol.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants