feat: derive ALTER TABLE isBlindAppend from per-op policy#2448
feat: derive ALTER TABLE isBlindAppend from per-op policy#2448william-ch-databricks wants to merge 6 commits intodelta-io:mainfrom
Conversation
| if S::IS_METADATA_ONLY { | ||
| return Ok(()); | ||
| } |
There was a problem hiding this comment.
Starting discussion:
This branch handles metadata-only operations that are blind-append (ADD/DROP/DROP NULL/RENAME). However, not all metadata-only operations are bind-append, for example, SET NOT NULL. This operations requires scanning/reading all existing Parquet data to verify no nulls exist in the target column. Despite no data change, is_blind_append is false.
There is no current way in delta-kernel-rs to express this behaviour. validate_blind_append_semantics checks are scoped to data changes.
Spark captures this as a readFiles HashSet
3d44b51 to
0b22b39
Compare
…a-io#2386) ## Stacked PR Use this [link](https://github.com/delta-io/delta-kernel-rs/pull/2386/files) to review incremental changes. - [stack/alter-table-1-refactor-state](delta-io#2385) [[Files changed](https://github.com/delta-io/delta-kernel-rs/pull/2385/files)] [MERGED] - [**stack/alter-table-2-supports-data-files**](delta-io#2386) [[Files changed](https://github.com/delta-io/delta-kernel-rs/pull/2386/files)] - [stack/alter-table-3-framework-add-column](delta-io#2387) [[Files changed](https://github.com/delta-io/delta-kernel-rs/pull/2387/files/a71c8b8c83d33b1ab882cae984973a76aafc3a31..6dd1b32e8c5163018f2c836f733b58fc4bdc2dc2)] - [stack/alter-table-4-set-nullable](delta-io#2388) [[Files changed](https://github.com/delta-io/delta-kernel-rs/pull/2388/files/6dd1b32e8c5163018f2c836f733b58fc4bdc2dc2..d121f77a73a37f12421c42dc5c4a5626ad2a1e10)] - [stack/alter-table-5-column-mapping-add](delta-io#2389) [[Files changed](https://github.com/delta-io/delta-kernel-rs/pull/2389/files/d121f77a73a37f12421c42dc5c4a5626ad2a1e10..cb5ca4e01024e50ca6b5b0576bbeb9bb6b184d17)] - [stack/alter-table-6-drop-column](delta-io#2390) [[Files changed](https://github.com/delta-io/delta-kernel-rs/pull/2390/files/cb5ca4e01024e50ca6b5b0576bbeb9bb6b184d17..6e91f960109a4d4db662e417780bb4beaac88f5f)] - [stack/alter-table-7-rename-column](delta-io#2391) [[Files changed](https://github.com/delta-io/delta-kernel-rs/pull/2391/files/6e91f960109a4d4db662e417780bb4beaac88f5f..300d676946d785eb195018e2a9ee696c1f6623a6)] - [stack/alter-table-8-operation-string](delta-io#2447) [[Files changed](https://github.com/delta-io/delta-kernel-rs/pull/2447/files/300d676946d785eb195018e2a9ee696c1f6623a6..fee7c029d62cfb55b809ced35c0cb0553ddaf3e7)] - [stack/alter-table-9-is-blind-append](delta-io#2448) [[Files changed](https://github.com/delta-io/delta-kernel-rs/pull/2448/files/fee7c029d62cfb55b809ced35c0cb0553ddaf3e7..3d44b51e0d037c8f33ffbece2d7deecd800d5544)] --------- ## What changes are proposed in this pull request? Introduce a `SupportsDataFiles` marker trait to gate data-file methods at compile time. This prevents future metadata-only transaction types (like `AlterTable`) from accessing methods that add, remove, or configure data files. **Gated behind `SupportsDataFiles` (only `ExistingTable` and `CreateTable`):** | Method | Reason | |--------|--------| | `stats_schema()` | Returns table-specific stats schema for writing files | | `stats_columns()` | Returns column names needing stats collection | | `generate_logical_to_physical()` | Builds partition column transform for file writing | | `get_write_context()` | Returns full write context (target dir, schemas, column mapping) | | `add_files()` | Queues file additions | **Kept on `impl<S> Transaction<S>` (all transaction types):** | Method | Reason | |--------|--------| | `commit()` | Shared commit path; handles empty data gracefully | | `with_data_change()` / `set_data_change()` | ALTER TABLE sets `data_change: false` | | `with_engine_info()` | Applicable to all commits | | `with_commit_info()` | Applicable to all commits | | `with_transaction_id()` | Applicable to all commits | | `with_domain_metadata()` | Schema changes may need domain metadata | | `add_files_schema()` | Read-only accessor; used by `generate_adds()` which handles empty state internally. Kept ungated so it remains available if it becomes dynamic in the future. | | `validate_add_files_stats()` | Internal; handles empty data with early return | | `generate_adds()` | Internal; returns early when no files queued | | `generate_remove_actions()` | Internal; returns early when no files queued | | `generate_dv_update_actions()` | Internal; returns early when no DV updates | | `generate_adds_for_dv_update()` | Internal; only called from `generate_dv_update_actions` | | `build_crc_delta()` | Internal; shared commit path | | `into_committed()` / `into_conflicted()` / `into_retryable()` | Internal; shared commit path | ## How was this change tested? All existing tests pass unchanged. The trait adds compile-time safety with no runtime behavior change. A `compile_fail` doctest will be added in the follow-up PR that introduces `AlterTable` to verify that data file methods are rejected at compile time.
0b22b39 to
b2c0254
Compare
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #2448 +/- ##
==========================================
+ Coverage 88.40% 88.55% +0.14%
==========================================
Files 178 178
Lines 58235 59177 +942
Branches 58235 59177 +942
==========================================
+ Hits 51485 52403 +918
- Misses 4783 4799 +16
- Partials 1967 1975 +8 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
f2c3521 to
84c293b
Compare
4d5290e to
6740458
Compare
|
What is a blind append? I asked this on the delta months ago and it isn't mentioned in the protocol at all. delta-io/delta#5387 |
bc345c7 to
4a37947
Compare
Implement the SetNullable schema operation which changes a column's nullability from NOT NULL to nullable. If the column is already nullable, this is a no-op. Only the safe direction is allowed (NOT NULL -> nullable) per the Delta protocol.
When column mapping is enabled (mode = name or id), newly added columns via ALTER TABLE ADD COLUMN are now assigned column mapping metadata (unique ID and UUID-based physical name). The maxColumnId table property is updated in the evolved metadata configuration. Reuses the existing assign_field_column_mapping function (now pub(crate)) which handles recursive assignment for nested types.
Implement the DropColumn schema operation which removes a column from the logical schema. Column mapping must be enabled (mode = name or id) because without it, physical Parquet files would still contain the column data. Validates: - Column mapping is enabled - Column exists in the schema - Column is not a partition column - Schema won't be empty after dropping
Adds RenameColumn as a terminal operation in the ALTER TABLE framework: - Renaming type state ensures rename cannot be combined with other operations - Buildable trait + sealed module for shared build() between Modifying and Renaming - Case-insensitive sibling conflict check via sibling_names_at_path - Physical identity (column ID + physical name) preserved via with_name - Runtime defense-in-depth: RenameColumn must be the sole operation Full integration test suite covering add/drop/rename/set_nullable, chaining, sequential alters, checkpoints, time travel, and append-only tables.
Replaces the hardcoded "ALTER TABLE" operation string with a per-op derived string that matches delta-spark (e.g. "ADD COLUMNS", "ADD COLUMNS + CHANGE COLUMNS"). The operation name is emitted by `apply_schema_operations` via `SchemaEvolutionResult.operation`; distinct names are joined with " + " in first-occurrence order.
Replaces the hardcoded isBlindAppend=false with a per-op fold over SchemaOperation::is_blind_append(). All current ops (add/drop/rename/ drop-not-null) are blind-append, matching delta-spark. The flag is emitted by apply_schema_operations via SchemaEvolutionResult.is_blind_append alongside the operation string fold, no second iteration.
4a37947 to
d658647
Compare
🥞 Stacked PR
Use this link to review incremental changes.
What changes are proposed in this pull request?
Replaces the hardcoded isBlindAppend=false with a per-op fold over
SchemaOperation::is_blind_append(). All current ops (add/drop/rename/
drop-not-null) are blind-append, matching delta-spark. The flag is
emitted by apply_schema_operations via SchemaEvolutionResult.is_blind_append
alongside the operation string fold, no second iteration.
How was this change tested?