Skip to content

feat: derive ALTER TABLE isBlindAppend from per-op policy#2448

Draft
william-ch-databricks wants to merge 6 commits intodelta-io:mainfrom
william-ch-databricks:stack/alter-table-9-is-blind-append
Draft

feat: derive ALTER TABLE isBlindAppend from per-op policy#2448
william-ch-databricks wants to merge 6 commits intodelta-io:mainfrom
william-ch-databricks:stack/alter-table-9-is-blind-append

Conversation

@william-ch-databricks
Copy link
Copy Markdown
Collaborator

@william-ch-databricks william-ch-databricks commented Apr 22, 2026

🥞 Stacked PR

Use this link to review incremental changes.


What changes are proposed in this pull request?

Replaces the hardcoded isBlindAppend=false with a per-op fold over
SchemaOperation::is_blind_append(). All current ops (add/drop/rename/
drop-not-null) are blind-append, matching delta-spark. The flag is
emitted by apply_schema_operations via SchemaEvolutionResult.is_blind_append
alongside the operation string fold, no second iteration.

How was this change tested?

Comment on lines +697 to +699
if S::IS_METADATA_ONLY {
return Ok(());
}
Copy link
Copy Markdown
Collaborator Author

@william-ch-databricks william-ch-databricks Apr 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Starting discussion:

This branch handles metadata-only operations that are blind-append (ADD/DROP/DROP NULL/RENAME). However, not all metadata-only operations are bind-append, for example, SET NOT NULL. This operations requires scanning/reading all existing Parquet data to verify no nulls exist in the target column. Despite no data change, is_blind_append is false.

There is no current way in delta-kernel-rs to express this behaviour. validate_blind_append_semantics checks are scoped to data changes.

Spark captures this as a readFiles HashSet

@william-ch-databricks william-ch-databricks force-pushed the stack/alter-table-9-is-blind-append branch 6 times, most recently from 3d44b51 to 0b22b39 Compare April 23, 2026 20:22
ethan-tyler pushed a commit to ethan-tyler/delta-kernel-rs that referenced this pull request Apr 23, 2026
…a-io#2386)

## Stacked PR
Use this
[link](https://github.com/delta-io/delta-kernel-rs/pull/2386/files) to
review incremental changes.
-
[stack/alter-table-1-refactor-state](delta-io#2385)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/2385/files)]
[MERGED]
-
[**stack/alter-table-2-supports-data-files**](delta-io#2386)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/2386/files)]
-
[stack/alter-table-3-framework-add-column](delta-io#2387)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/2387/files/a71c8b8c83d33b1ab882cae984973a76aafc3a31..6dd1b32e8c5163018f2c836f733b58fc4bdc2dc2)]
-
[stack/alter-table-4-set-nullable](delta-io#2388)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/2388/files/6dd1b32e8c5163018f2c836f733b58fc4bdc2dc2..d121f77a73a37f12421c42dc5c4a5626ad2a1e10)]
-
[stack/alter-table-5-column-mapping-add](delta-io#2389)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/2389/files/d121f77a73a37f12421c42dc5c4a5626ad2a1e10..cb5ca4e01024e50ca6b5b0576bbeb9bb6b184d17)]
-
[stack/alter-table-6-drop-column](delta-io#2390)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/2390/files/cb5ca4e01024e50ca6b5b0576bbeb9bb6b184d17..6e91f960109a4d4db662e417780bb4beaac88f5f)]
-
[stack/alter-table-7-rename-column](delta-io#2391)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/2391/files/6e91f960109a4d4db662e417780bb4beaac88f5f..300d676946d785eb195018e2a9ee696c1f6623a6)]
-
[stack/alter-table-8-operation-string](delta-io#2447)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/2447/files/300d676946d785eb195018e2a9ee696c1f6623a6..fee7c029d62cfb55b809ced35c0cb0553ddaf3e7)]
-
[stack/alter-table-9-is-blind-append](delta-io#2448)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/2448/files/fee7c029d62cfb55b809ced35c0cb0553ddaf3e7..3d44b51e0d037c8f33ffbece2d7deecd800d5544)]

---------
## What changes are proposed in this pull request?

Introduce a `SupportsDataFiles` marker trait to gate data-file methods
at compile time. This prevents future metadata-only transaction types
(like `AlterTable`) from accessing methods that add, remove, or
configure data files.

**Gated behind `SupportsDataFiles` (only `ExistingTable` and
`CreateTable`):**

| Method | Reason |
|--------|--------|
| `stats_schema()` | Returns table-specific stats schema for writing
files |
| `stats_columns()` | Returns column names needing stats collection |
| `generate_logical_to_physical()` | Builds partition column transform
for file writing |
| `get_write_context()` | Returns full write context (target dir,
schemas, column mapping) |
| `add_files()` | Queues file additions |

**Kept on `impl<S> Transaction<S>` (all transaction types):**

| Method | Reason |
|--------|--------|
| `commit()` | Shared commit path; handles empty data gracefully |
| `with_data_change()` / `set_data_change()` | ALTER TABLE sets
`data_change: false` |
| `with_engine_info()` | Applicable to all commits |
| `with_commit_info()` | Applicable to all commits |
| `with_transaction_id()` | Applicable to all commits |
| `with_domain_metadata()` | Schema changes may need domain metadata |
| `add_files_schema()` | Read-only accessor; used by `generate_adds()`
which handles empty state internally. Kept ungated so it remains
available if it becomes dynamic in the future. |
| `validate_add_files_stats()` | Internal; handles empty data with early
return |
| `generate_adds()` | Internal; returns early when no files queued |
| `generate_remove_actions()` | Internal; returns early when no files
queued |
| `generate_dv_update_actions()` | Internal; returns early when no DV
updates |
| `generate_adds_for_dv_update()` | Internal; only called from
`generate_dv_update_actions` |
| `build_crc_delta()` | Internal; shared commit path |
| `into_committed()` / `into_conflicted()` / `into_retryable()` |
Internal; shared commit path |

## How was this change tested?

All existing tests pass unchanged. The trait adds compile-time safety
with no runtime behavior change. A `compile_fail` doctest will be added
in the follow-up PR that introduces `AlterTable` to verify that data
file methods are rejected at compile time.
@william-ch-databricks william-ch-databricks force-pushed the stack/alter-table-9-is-blind-append branch from 0b22b39 to b2c0254 Compare April 23, 2026 22:03
@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 23, 2026

Codecov Report

❌ Patch coverage is 97.10445% with 28 lines in your changes missing coverage. Please review.
✅ Project coverage is 88.55%. Comparing base (6486bd2) to head (d658647).

Files with missing lines Patch % Lines
kernel/src/transaction/schema_evolution.rs 97.15% 15 Missing and 6 partials ⚠️
kernel/src/actions/mod.rs 96.36% 0 Missing and 2 partials ⚠️
kernel/src/schema/mod.rs 90.47% 1 Missing and 1 partial ⚠️
kernel/src/transaction/builder/alter_table.rs 95.65% 0 Missing and 2 partials ⚠️
kernel/src/transaction/mod.rs 91.66% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2448      +/-   ##
==========================================
+ Coverage   88.40%   88.55%   +0.14%     
==========================================
  Files         178      178              
  Lines       58235    59177     +942     
  Branches    58235    59177     +942     
==========================================
+ Hits        51485    52403     +918     
- Misses       4783     4799      +16     
- Partials     1967     1975       +8     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@william-ch-databricks william-ch-databricks force-pushed the stack/alter-table-9-is-blind-append branch 3 times, most recently from f2c3521 to 84c293b Compare April 24, 2026 19:08
@william-ch-databricks william-ch-databricks force-pushed the stack/alter-table-9-is-blind-append branch 3 times, most recently from 4d5290e to 6740458 Compare April 25, 2026 00:02
@abhiaagarwal
Copy link
Copy Markdown
Contributor

What is a blind append? I asked this on the delta months ago and it isn't mentioned in the protocol at all. delta-io/delta#5387

@william-ch-databricks william-ch-databricks force-pushed the stack/alter-table-9-is-blind-append branch 2 times, most recently from bc345c7 to 4a37947 Compare April 27, 2026 06:19
Implement the SetNullable schema operation which changes a column's
nullability from NOT NULL to nullable. If the column is already nullable,
this is a no-op.

Only the safe direction is allowed (NOT NULL -> nullable) per the Delta
protocol.
When column mapping is enabled (mode = name or id), newly added columns
via ALTER TABLE ADD COLUMN are now assigned column mapping metadata
(unique ID and UUID-based physical name). The maxColumnId table property
is updated in the evolved metadata configuration.

Reuses the existing assign_field_column_mapping function (now pub(crate))
which handles recursive assignment for nested types.
Implement the DropColumn schema operation which removes a column from
the logical schema. Column mapping must be enabled (mode = name or id)
because without it, physical Parquet files would still contain the
column data.

Validates:
- Column mapping is enabled
- Column exists in the schema
- Column is not a partition column
- Schema won't be empty after dropping
Adds RenameColumn as a terminal operation in the ALTER TABLE framework:
- Renaming type state ensures rename cannot be combined with other operations
- Buildable trait + sealed module for shared build() between Modifying and Renaming
- Case-insensitive sibling conflict check via sibling_names_at_path
- Physical identity (column ID + physical name) preserved via with_name
- Runtime defense-in-depth: RenameColumn must be the sole operation

Full integration test suite covering add/drop/rename/set_nullable, chaining,
sequential alters, checkpoints, time travel, and append-only tables.
Replaces the hardcoded "ALTER TABLE" operation string with a per-op derived
string that matches delta-spark (e.g. "ADD COLUMNS", "ADD COLUMNS +
CHANGE COLUMNS"). The operation name is emitted by `apply_schema_operations`
via `SchemaEvolutionResult.operation`; distinct names are joined with " + "
in first-occurrence order.
Replaces the hardcoded isBlindAppend=false with a per-op fold over
SchemaOperation::is_blind_append(). All current ops (add/drop/rename/
drop-not-null) are blind-append, matching delta-spark. The flag is
emitted by apply_schema_operations via SchemaEvolutionResult.is_blind_append
alongside the operation string fold, no second iteration.
@william-ch-databricks william-ch-databricks force-pushed the stack/alter-table-9-is-blind-append branch from 4a37947 to d658647 Compare April 27, 2026 08:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants