Skip to content

feat: add drop_column support for ALTER TABLE#2390

Open
william-ch-databricks wants to merge 3 commits intodelta-io:mainfrom
william-ch-databricks:stack/alter-table-6-drop-column
Open

feat: add drop_column support for ALTER TABLE#2390
william-ch-databricks wants to merge 3 commits intodelta-io:mainfrom
william-ch-databricks:stack/alter-table-6-drop-column

Conversation

@william-ch-databricks
Copy link
Copy Markdown
Collaborator

@william-ch-databricks william-ch-databricks commented Apr 14, 2026

🥞 Stacked PR

Use this link to review incremental changes.


What changes are proposed in this pull request?

Adds ALTER TABLE ... DROP COLUMN via AlterTableTransactionBuilder::drop_column. Supports top-level and nested columns; physical Parquet data is untouched (column mapping required).

Rejects (case-insensitively): missing column mapping, partition columns, clustering columns or their ancestor structs, and drops that would leave an empty struct level.

How was this change tested?

  • Unit tests for the schema-evolution helper (modify + remove, nested paths, order preservation, no-empty-struct invariant) and the full DropColumn flow through apply_schema_operations.
  • Integration tests: drop lifecycle with scan + reload, clustered-table rejection, partition-column rejection, chained add + drop + set_nullable, and the add(foo) + drop(foo) schema-no-op edge case.

@william-ch-databricks william-ch-databricks force-pushed the stack/alter-table-6-drop-column branch 17 times, most recently from 10cc412 to 40e3a1f Compare April 17, 2026 21:15
github-merge-queue Bot pushed a commit that referenced this pull request Apr 17, 2026
…2385)

## Stacked PR
Use this
[link](https://github.com/delta-io/delta-kernel-rs/pull/2385/files) to
review incremental changes.
-
[**stack/alter-table-1-refactor-state**](#2385)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/2385/files)]
-
[stack/alter-table-2-supports-data-files](#2386)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/2386/files/029d6672081d32caea314dca61f3688e841f3036..50130c30c4378997fbf3d0ef7426ce23992e5c85)]
-
[stack/alter-table-3-framework-add-column](#2387)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/2387/files/50130c30c4378997fbf3d0ef7426ce23992e5c85..eaa5277d025434aa78b79beed1a7cedfe82aa621)]
-
[stack/alter-table-4-set-nullable](#2388)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/2388/files/eaa5277d025434aa78b79beed1a7cedfe82aa621..7506e1273ef349fbf628e4a6db5d0e7c1b4cbd8b)]
-
[stack/alter-table-5-column-mapping-add](#2389)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/2389/files/7506e1273ef349fbf628e4a6db5d0e7c1b4cbd8b..d943ca2945da1ec9c354db57c889d93456381dee)]
-
[stack/alter-table-6-drop-column](#2390)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/2390/files/d943ca2945da1ec9c354db57c889d93456381dee..40e3a1ff9e2915b456d0f0a76a50eb67f27a1ab1)]
-
[stack/alter-table-7-rename-column](#2391)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/2391/files/40e3a1ff9e2915b456d0f0a76a50eb67f27a1ab1..628af3c02767f008a4e3c80a146aec7e5a3ac0b3)]

---------
## What changes are proposed in this pull request?

Splits Transaction's snapshot into two concerns:
- `read_snapshot_opt: Option<SnapshotRef>` -- the pre-commit table state
(None for CREATE TABLE)
- `effective_table_config: TableConfiguration` -- the config this commit
will produce

This separates "what did I read?" (conflict detection, post-commit
snapshots) from "what will
this commit produce?" (schema, protocol, stats, write context).
Write-path call sites read from
`effective_table_config`; read-path call sites use `read_snapshot()`.

Also adds `should_emit_protocol` / `should_emit_metadata` flags to
replace the old
`is_create_table()` checks for Protocol/Metadata action emission, and
replaces the synthetic
pre-commit snapshot in CREATE TABLE with direct `TableConfiguration`
construction.

This is a pure refactor with no behaviour change.

## How was this change tested?

All existing tests pass. Added unit tests for
`LogSegment::new_for_version_zero` (valid input,
non-zero version rejection, non-commit file rejection).
@william-ch-databricks william-ch-databricks force-pushed the stack/alter-table-6-drop-column branch 4 times, most recently from c18ab58 to 979fb8a Compare April 22, 2026 05:06
@william-ch-databricks william-ch-databricks force-pushed the stack/alter-table-6-drop-column branch from 979fb8a to f07d63b Compare April 22, 2026 18:38
@william-ch-databricks william-ch-databricks force-pushed the stack/alter-table-6-drop-column branch 6 times, most recently from 6e91f96 to ea25e49 Compare April 23, 2026 20:22
ethan-tyler pushed a commit to ethan-tyler/delta-kernel-rs that referenced this pull request Apr 23, 2026
…a-io#2386)

## Stacked PR
Use this
[link](https://github.com/delta-io/delta-kernel-rs/pull/2386/files) to
review incremental changes.
-
[stack/alter-table-1-refactor-state](delta-io#2385)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/2385/files)]
[MERGED]
-
[**stack/alter-table-2-supports-data-files**](delta-io#2386)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/2386/files)]
-
[stack/alter-table-3-framework-add-column](delta-io#2387)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/2387/files/a71c8b8c83d33b1ab882cae984973a76aafc3a31..6dd1b32e8c5163018f2c836f733b58fc4bdc2dc2)]
-
[stack/alter-table-4-set-nullable](delta-io#2388)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/2388/files/6dd1b32e8c5163018f2c836f733b58fc4bdc2dc2..d121f77a73a37f12421c42dc5c4a5626ad2a1e10)]
-
[stack/alter-table-5-column-mapping-add](delta-io#2389)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/2389/files/d121f77a73a37f12421c42dc5c4a5626ad2a1e10..cb5ca4e01024e50ca6b5b0576bbeb9bb6b184d17)]
-
[stack/alter-table-6-drop-column](delta-io#2390)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/2390/files/cb5ca4e01024e50ca6b5b0576bbeb9bb6b184d17..6e91f960109a4d4db662e417780bb4beaac88f5f)]
-
[stack/alter-table-7-rename-column](delta-io#2391)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/2391/files/6e91f960109a4d4db662e417780bb4beaac88f5f..300d676946d785eb195018e2a9ee696c1f6623a6)]
-
[stack/alter-table-8-operation-string](delta-io#2447)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/2447/files/300d676946d785eb195018e2a9ee696c1f6623a6..fee7c029d62cfb55b809ced35c0cb0553ddaf3e7)]
-
[stack/alter-table-9-is-blind-append](delta-io#2448)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/2448/files/fee7c029d62cfb55b809ced35c0cb0553ddaf3e7..3d44b51e0d037c8f33ffbece2d7deecd800d5544)]

---------
## What changes are proposed in this pull request?

Introduce a `SupportsDataFiles` marker trait to gate data-file methods
at compile time. This prevents future metadata-only transaction types
(like `AlterTable`) from accessing methods that add, remove, or
configure data files.

**Gated behind `SupportsDataFiles` (only `ExistingTable` and
`CreateTable`):**

| Method | Reason |
|--------|--------|
| `stats_schema()` | Returns table-specific stats schema for writing
files |
| `stats_columns()` | Returns column names needing stats collection |
| `generate_logical_to_physical()` | Builds partition column transform
for file writing |
| `get_write_context()` | Returns full write context (target dir,
schemas, column mapping) |
| `add_files()` | Queues file additions |

**Kept on `impl<S> Transaction<S>` (all transaction types):**

| Method | Reason |
|--------|--------|
| `commit()` | Shared commit path; handles empty data gracefully |
| `with_data_change()` / `set_data_change()` | ALTER TABLE sets
`data_change: false` |
| `with_engine_info()` | Applicable to all commits |
| `with_commit_info()` | Applicable to all commits |
| `with_transaction_id()` | Applicable to all commits |
| `with_domain_metadata()` | Schema changes may need domain metadata |
| `add_files_schema()` | Read-only accessor; used by `generate_adds()`
which handles empty state internally. Kept ungated so it remains
available if it becomes dynamic in the future. |
| `validate_add_files_stats()` | Internal; handles empty data with early
return |
| `generate_adds()` | Internal; returns early when no files queued |
| `generate_remove_actions()` | Internal; returns early when no files
queued |
| `generate_dv_update_actions()` | Internal; returns early when no DV
updates |
| `generate_adds_for_dv_update()` | Internal; only called from
`generate_dv_update_actions` |
| `build_crc_delta()` | Internal; shared commit path |
| `into_committed()` / `into_conflicted()` / `into_retryable()` |
Internal; shared commit path |

## How was this change tested?

All existing tests pass unchanged. The trait adds compile-time safety
with no runtime behavior change. A `compile_fail` doctest will be added
in the follow-up PR that introduces `AlterTable` to verify that data
file methods are rejected at compile time.
@william-ch-databricks william-ch-databricks force-pushed the stack/alter-table-6-drop-column branch from ea25e49 to 1382e79 Compare April 23, 2026 22:03
@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 23, 2026

Codecov Report

❌ Patch coverage is 96.71429% with 23 lines in your changes missing coverage. Please review.
✅ Project coverage is 88.51%. Comparing base (6486bd2) to head (ae88d84).

Files with missing lines Patch % Lines
kernel/src/transaction/schema_evolution.rs 96.64% 12 Missing and 5 partials ⚠️
kernel/src/actions/mod.rs 96.36% 0 Missing and 2 partials ⚠️
kernel/src/schema/mod.rs 90.47% 1 Missing and 1 partial ⚠️
kernel/src/transaction/builder/alter_table.rs 92.59% 0 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2390      +/-   ##
==========================================
+ Coverage   88.40%   88.51%   +0.10%     
==========================================
  Files         178      178              
  Lines       58235    58914     +679     
  Branches    58235    58914     +679     
==========================================
+ Hits        51485    52145     +660     
- Misses       4783     4795      +12     
- Partials     1967     1974       +7     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@william-ch-databricks william-ch-databricks force-pushed the stack/alter-table-6-drop-column branch from 1382e79 to e0f3f47 Compare April 24, 2026 17:34
@william-ch-databricks william-ch-databricks marked this pull request as ready for review April 24, 2026 17:37
@william-ch-databricks william-ch-databricks force-pushed the stack/alter-table-6-drop-column branch from e0f3f47 to 4a63b71 Compare April 24, 2026 17:43
@william-ch-databricks william-ch-databricks force-pushed the stack/alter-table-6-drop-column branch from 4a63b71 to 1607acd Compare April 24, 2026 19:08
@william-ch-databricks william-ch-databricks force-pushed the stack/alter-table-6-drop-column branch 3 times, most recently from 9c29791 to bb5e516 Compare April 27, 2026 06:19
Implement the SetNullable schema operation which changes a column's
nullability from NOT NULL to nullable. If the column is already nullable,
this is a no-op.

Only the safe direction is allowed (NOT NULL -> nullable) per the Delta
protocol.
When column mapping is enabled (mode = name or id), newly added columns
via ALTER TABLE ADD COLUMN are now assigned column mapping metadata
(unique ID and UUID-based physical name). The maxColumnId table property
is updated in the evolved metadata configuration.

Reuses the existing assign_field_column_mapping function (now pub(crate))
which handles recursive assignment for nested types.
Implement the DropColumn schema operation which removes a column from
the logical schema. Column mapping must be enabled (mode = name or id)
because without it, physical Parquet files would still contain the
column data.

Validates:
- Column mapping is enabled
- Column exists in the schema
- Column is not a partition column
- Schema won't be empty after dropping
@william-ch-databricks william-ch-databricks force-pushed the stack/alter-table-6-drop-column branch from bb5e516 to ae88d84 Compare April 27, 2026 08:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants