
feat: Add geo schema types and table feature#2464

Open
lorenarosati wants to merge 14 commits into delta-io:main from lorenarosati:stack/schema-table-feat-geo

Conversation

lorenarosati (Collaborator) commented Apr 24, 2026

🥞 Stacked PR

Use this link to review incremental changes.


Note: don't merge until the RFC is merged.

What changes are proposed in this pull request?

This PR implements the following:

  • New schema types for geo: GeometryType and GeographyType (see the sketch below)
  • Geo schema type serialization/deserialization
  • Reader/writer support for the geospatial table feature
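
For orientation, here is a minimal sketch of the general shape such CRS-parameterized schema types could take. This is an assumption-laden illustration, not the definitions this PR adds: the field name crs, the example value, and the serde derives are all placeholders.

```rust
// Sketch only -- not the actual delta-kernel-rs definitions from this PR.
// Assumes (as in the Parquet geo logical types) that each geo type carries a
// coordinate reference system (CRS) identifier.
#[derive(Debug, Clone, PartialEq, Eq, serde::Serialize, serde::Deserialize)]
pub struct GeometryType {
    /// CRS identifier, e.g. "OGC:CRS84" (example value, assumed here).
    pub crs: String,
}

#[derive(Debug, Clone, PartialEq, Eq, serde::Serialize, serde::Deserialize)]
pub struct GeographyType {
    /// CRS identifier for geography (geodesic) coordinates.
    pub crs: String,
}

impl GeometryType {
    /// Hypothetical constructor, used by the test sketch further down.
    pub fn new(crs: impl Into<String>) -> Self {
        Self { crs: crs.into() }
    }
}
```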

How was this change tested?

All tests follow pre-existing test patterns for other types.

  • Test all valid deserialization cases for correct output - see this comment
  • Test invalid geo serialized format
  • Test roundtrip serde of geo types (see the sketch below)
  • Test feature validation
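
Building on the hypothetical GeometryType sketch above, the roundtrip case could look roughly like the following; serde_json and the assertion are illustrative assumptions, not the serialized schema format this PR actually exercises.

```rust
// Illustrative only: demonstrates the roundtrip-serde test pattern, not this PR's
// real schema serialization format or test code.
#[cfg(test)]
mod geo_roundtrip_tests {
    use super::GeometryType;

    #[test]
    fn geometry_type_roundtrips_through_serde() {
        let original = GeometryType::new("OGC:CRS84");
        let json = serde_json::to_string(&original).expect("serialize");
        let parsed: GeometryType = serde_json::from_str(&json).expect("deserialize");
        assert_eq!(original, parsed, "roundtrip should preserve the CRS");
    }
}
```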

github-actions Bot commented Apr 24, 2026

PR title does not match the required pattern. Please ensure you follow the conventional commits spec.

Your title should start with feat:, fix:, chore:, docs:, perf:, refactor:, test:, or ci: (suffixed with a ! for breaking changes, like feat!:), followed by a 1-72 character brief description of your change.

Title: geo schema type and table feat
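
For reference, the rule the bot describes can be approximated with a small check like the one below. This is an illustrative sketch based only on the message above; the repository's actual title validator may be implemented differently.

```rust
// Approximation of the conventional-commits title rule described above; not the
// repo's real CI validator. Requires the regex crate.
use regex::Regex;

fn title_matches(title: &str) -> bool {
    // type prefix, optional "!" for breaking changes, ": ", then a 1-72 character summary
    let pattern = r"^(feat|fix|chore|docs|perf|refactor|test|ci)!?: .{1,72}$";
    Regex::new(pattern).expect("valid regex").is_match(title)
}

fn main() {
    assert!(!title_matches("geo schema type and table feat")); // missing type prefix
    assert!(title_matches("feat: Add geo schema types and table feature"));
}
```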


lorenarosati (Collaborator, Author) commented:

Range-diff: main (b369f06 -> dc5d29e)
.github/workflows/build.yml
@@ -0,0 +1,75 @@
+diff --git a/.github/workflows/build.yml b/.github/workflows/build.yml
+--- a/.github/workflows/build.yml
++++ b/.github/workflows/build.yml
+ # enforce the committed Cargo.lock. This prevents CI from silently resolving a newer
+ # (potentially compromised) dependency version. If Cargo.lock is out of sync with
+ # Cargo.toml, the build fails immediately. Any dependency change must be an explicit,
+-# reviewable update to Cargo.lock in the PR. Commands that skip --locked: cargo fmt
++# reviewable update to Cargo.lock in the PR. Commands that skip --locked: cargo +nightly fmt
+ # (no dep resolution), cargo msrv verify/show (wrapper tool), cargo miri setup (tooling).
+ #
+ # Swatinem/rust-cache caches the cargo registry and target directory (~450MB per job).
+     runs-on: ubuntu-latest
+     steps:
+       - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1
+-      - name: Install minimal stable with rustfmt
++      - name: Install nightly with rustfmt
+         uses: actions-rust-lang/setup-rust-toolchain@150fca883cd4034361b621bd4e6a9d34e5143606 # v1.15.4
+         with:
+           cache: false
++          toolchain: nightly
+           components: rustfmt
+       - name: format
+-        run: cargo fmt -- --check
++        run: cargo +nightly fmt -- --check
+ 
+   msrv:
+     runs-on: ubuntu-latest
+           pushd kernel
+           echo "Testing with $(cargo msrv show --output-format minimal)"
+           cargo +$(cargo msrv show --output-format minimal) nextest run --locked
++          cargo +$(cargo msrv show --output-format minimal) test --doc
+   docs:
+     runs-on: ubuntu-latest
+     env:
+           cmake ..
+           make
+           make test
++      - name: build and run create-table test
++        run: |
++          pushd ffi/examples/create-table
++          mkdir build
++          pushd build
++          cmake ..
++          make
++          make test
++      # NOTE: write-table's ctest seeds its target table by invoking the create-table
++      # binary, so create-table must be built first (its build/ dir is preserved by the
++      # preceding step and write-table's CMakeLists references it via a relative path).
++      - name: build and run write-table test
++        run: |
++          pushd ffi/examples/write-table
++          mkdir build
++          pushd build
++          cmake ..
++          make
++          make test
++      - name: build and run read-table-changes test
++        run: |
++          pushd ffi/examples/read-table-changes
++          mkdir build
++          pushd build
++          cmake ..
++          make
++          make test
+   miri:
+     name: "Miri (shard ${{ matrix.partition }}/3)"
+     runs-on: ubuntu-latest
+       - name: Install cargo-llvm-cov
+         uses: taiki-e/install-action@2d15d02e710b40b6332201aba6af30d595b5cd96 # cargo-llvm-cov
+       - name: Generate code coverage
+-        run: cargo llvm-cov --locked --all-features --workspace --codecov --output-path codecov.json -- --skip read_table_version_hdfs
++        run: cargo llvm-cov --locked --all-features --workspace --codecov --output-path codecov.json -- --skip read_table_version_hdfs --skip handle::tests::invalid_handle_code
+       - name: Upload coverage to Codecov
+         uses: codecov/codecov-action@1af58845a975a7985b0beb0cbe6fbbb71a41dbad # v5.5.3
+         with:
\ No newline at end of file
.github/workflows/pr-body-validator.yml
@@ -0,0 +1,27 @@
+diff --git a/.github/workflows/pr-body-validator.yml b/.github/workflows/pr-body-validator.yml
+new file mode 100644
+--- /dev/null
++++ b/.github/workflows/pr-body-validator.yml
++name: Validate PR Body
++
++on:
++  pull_request:
++    types: [opened, edited]
++  merge_group:
++
++jobs:
++  validate-body:
++    runs-on: ubuntu-latest
++    steps:
++      - name: Validate PR Body
++        shell: bash
++        env:
++          PR_BODY: ${{ github.event.pull_request.body }}
++        run: |
++          if LC_ALL=C grep -q '[^[:print:][:space:]]' <<< "$PR_BODY"; then
++            echo "PR body contains non-ascii characters. Please remove them."
++            exit 1
++          else
++            echo "PR body contains ascii characters only"
++          fi
++
\ No newline at end of file
CHANGELOG.md
@@ -0,0 +1,282 @@
+diff --git a/CHANGELOG.md b/CHANGELOG.md
+--- a/CHANGELOG.md
++++ b/CHANGELOG.md
+ # Changelog
+ 
++## [v0.21.0](https://github.com/delta-io/delta-kernel-rs/tree/v0.21.0/) (2026-04-10)
++
++[Full Changelog](https://github.com/delta-io/delta-kernel-rs/compare/v0.20.0...v0.21.0)
++
++
++### 🏗️ Breaking changes
++
++1. Add partitioned variant to DataLayout enum ([#2145])
++   - Adds `Partitioned` variant to `DataLayout` enum. Update match statements to handle the new variant.
++2. Add create many API to engine ([#2070])
++   - Adds `create_many` method to `ParquetHandler` trait. Implementors must add this method. See the trait rustdocs for details.
++3. Rename uc-catalog and uc-client crates ([#2136])
++   - `delta-kernel-uc-catalog` renamed to `delta-kernel-unity-catalog`. `delta-kernel-uc-client` renamed to `unity-catalog-delta-rest-client`. Update `Cargo.toml` dependencies accordingly.
++4. Checksum and checkpoint APIs return updated Snapshot ([#2182])
++   - `Snapshot::checkpoint()` and checksum APIs now return the updated `Snapshot`. Callers must handle the returned value.
++5. Add P&M to CommitMetadata and enforce committer/table type matching ([#2250])
++   - Enforces that committer type matches table type (catalog-managed vs path-based). Use appropriate committer for your table type.
++6. Add UCCommitter validation for catalog-managed tables ([#2254])
++   - `UCCommitter` now rejects commits to non-catalog-managed tables. Use `FileSystemCommitter` for path-based tables.
++7. Refactor snapshot FFI to use builder pattern and enable snapshot reuse ([#2255])
++   - FFI snapshot creation now uses builder pattern. Update FFI callers to use the new builder APIs.
++8. Make tags and remove partition values allow null values in map ([#2281])
++   - `tags` and `partitionValues` map values are now nullable. Update code that assumes non-null values.
++9. Better naming style for column mapping related functions/variables ([#2290])
++   - Renamed: `make_physical` to `to_physical_name`, `make_physical_struct` to `to_physical_schema`, `transform_struct_for_projection` to `projection_transform`. Update call sites.
++10. Remove the catalog-managed feature flag ([#2310])
++    - The `catalog-managed` feature flag is removed. Catalog-managed table support is now always available.
++11. Update snapshot.checkpoint API to return a CheckpointResult ([#2314])
++    - `Snapshot::checkpoint()` now returns `CheckpointResult` instead of `Snapshot`. Access the snapshot via `CheckpointResult::snapshot`.
++12. Remove old non-builder snapshot FFI functions ([#2318])
++    - Removed legacy FFI snapshot functions. Use the new builder-pattern FFI functions instead.
++13. Support version 0 (table creation) commits in UCCommitter ([#2247])
++    - Connectors using `UCCommitter` for table creation must now handle post-commit finalization via the UC create table API.
++14. Pass computed ICT to CommitMetadata instead of wall-clock time ([#2319])
++    - `CommitMetadata` now uses computed in-commit timestamp instead of wall-clock time. Callers relying on wall-clock timing should update accordingly.
++15. Upgrade to arrow-58 and object_store-13, drop arrow-56 support ([#2116])
++    - Minimum supported Arrow version is now arrow-57. Update your `Cargo.toml` if using `arrow-56` feature.
++16. Crc File Histogram Read and Write Support ([#2235])
++    - Adds `AddedHistogram` and `RemovedHistogram` fields to `FileStatsDelta` struct.
++17. Add ScanMetadataCompleted metric event ([#2236])
++    - Adds `ScanMetadataCompleted` variant to `MetricEvent` enum. Update metric reporters to handle the new variant.
++18. Instrument JSON and Parquet handler reads with MetricsReporter ([#2169])
++    - Adds `JsonReadCompleted` and `ParquetReadCompleted` variants to `MetricEvent` enum. Update metric reporters to handle new variants.
++19. New transform helpers for unary and binary children ([#2150])
++    - Removes public `CowExt` trait. Remove any usages of this trait.
++20. New mod transforms for expression and schema transforms ([#2077])
++    - Moves `SchemaTransform` and `ExpressionTransform` to new `transforms` module. Update import paths.
++21. Introduce object_store compat shim ([#2111])
++    - Renames `object_store` dependency to `object_store_12`. Update any direct references.
++22. Consolidate domain metadata reads through Snapshot ([#2065])
++    - Domain metadata reads now go through `Snapshot` methods. Update callers using old free functions.
++23. Don't read or write arrow schema in parquet files ([#2025])
++    - Parquet files no longer include arrow schema metadata. Code relying on this metadata must be updated.
++24. Rename include_stats_columns to include_all_stats_columns ([#1996])
++    - Renames `ScanBuilder::include_stats_columns()` to `ScanBuilder::include_all_stats_columns()`. Update call sites.
++
++### 🚀 Features / new APIs
++
++1. Add SQL -> Kernel predicate parser to benchmark framework ([#2099])
++2. Add observability metrics for scan log replay ([#1866])
++3. Filtered engine data visitor ([#1942])
++4. Trigger benchmarking with comments ([#2089])
++5. Unify data stats and partition values in DataSkippingFilter ([#1948])
++6. Download benchmark workloads from DAT release ([#2163])
++7. Add partitioned variant to DataLayout enum ([#2145])
++8. Expose table_properties in FFI via visit_table_properties ([#2196])
++9. Allow checkpoint stats properties in CREATE TABLE ([#2210])
++10. Add crc file histogram initial struct and methods ([#2212])
++11. BinaryPredicate evaluate expression with ArrowViewType. ([#2052])
++12. Add acceptance workloads testing harness ([#2092])
++13. Enable DeletionVectors table feature in CREATE TABLE ([#2245])
++14. Checksum and checkpoint APIs return updated Snapshot ([#2182])
++15. Adding ScanBuilder FFI functions for Scans ([#2237])
++16. Add CountingReporter and fix metrics forwarding ([#2166])
++17. Instrument JSON and Parquet handler reads with MetricsReporter ([#2169])
++18. Wire CountingReporter into workload benchmarks ([#2171])
++19. Add create many API to engine ([#2070])
++20. Add ScanMetadataCompleted metric event ([#2236])
++21. Allow AppendOnly, ChangeDataFeed, and TypeWidening in CREATE TABLE ([#2279])
++22. Support max timestamp stats for data skipping ([#2249])
++23. Add list with backward checkpoint scan ([#2174])
++24. Add Snapshot::get_timestamp ([#2266])
++25. Make tags  and remove partition values allow null values in map ([#2281])
++26. Support UC credential vending and S3 benchmarks ([#2109])
++27. Add catalogManaged to allowed features in CREATE TABLE ([#2293])
++28. Add catalog-managed table creation utilities ([#2203])
++29. Support version 0 (table creation) commits in UCCommitter ([#2247])
++30. Update snapshot.checkpoint API to return a CheckpointResult ([#2314])
++31. Cached checkpoint output schema ([#2270])
++32. Refactor snapshot FFI to use builder pattern and enable snapshot reuse ([#2255])
++33. Add P&M to CommitMetadata and enforce committer/table type matching ([#2250])
++34. Add UCCommitter validation for catalog-managed tables ([#2254])
++35. Crc File Histogram Read and Write Support ([#2235])
++36. Add FFI function to expose snapshot's timestamp ([#2274])
++37. Add FFI create table DDL functions ([#2296])
++38. Add FFI remove files DML functions ([#2297])
++39. Expose Protocol and Metadata as opaque FFI handle types ([#2260])
++40. Add FFI bindings for domain metadata write operations ([#2327])
++
++### 🐛 Bug Fixes
++
++1. Treat null literal as unknown in meta-predicate evaluation ([#2097])
++2. Update TokioBackgroundExecutor to join thread instead of detaching ([#2126])
++3. Use thread pools and multi-thread tokio executor in read metadata benchmark runner ([#2044])
++4. Emit null stats for all-null columns instead of omitting them ([#2187])
++5. Allow Date/Timestamp casting for stats_parsed compatibility ([#2074])
++6. Filter evaluator input schema ([#2195])
++7. SnapshotCompleted.total_duration now includes log segment loading ([#2183])
++8. Avoid creating empty stats schemas ([#2199])
++9. Prevent dual TLS crypto backends from reqwest default features ([#2178])
++10. Vendor and pin homebrew actions ([#2243])
++11. Validate min_reader/writer_version are at least 1 ([#2202])
++12. Preserve loaded LazyCrc during incremental snapshot updates ([#2211])
++13. Detect stats_parsed in multi-part V1 checkpoints ([#2214])
++14. Downgrade per-batch data skipping log from info to debug ([#2219])
++15. Unknown table features in feature list are "supported" ([#2159])
++16. Remove debug_assert_eq before require in scan evaluator row count checks ([#2262])
++17. Adopt checkpoint written later for same-version snapshot refresh ([#2143])
++18. Return error when parquet handler returns empty data for scan files ([#2261])
++19. Refactor benchmarking workflow to not require criterion compare action ([#2264])
++20. Skip name-based validation for struct columns in expression evaluator ([#2160])
++21. Handle missing leaf columns in nested struct during parquet projection ([#2170])
++22. Pass computed ICT to CommitMetadata instead of wall-clock time ([#2319])
++23. Detect and handle empty (0-byte) log files during listing ([#2336])
++
++### 📚 Documentation
++
++1. Update claude readme to include github actions safety note ([#2190])
++2. Add line width and comment divider style rules to CLAUDE.md ([#2277])
++3. Add documentation for current tags ([#2234])
++4. Document benchmarking in CI accuracy ([#2302])
++
++### ⚡ Performance
++
++1. Pre-size dedup HashSet in ScanLogReplayProcessor ([#2186])
++2. Pre-size HashMap in ArrowEngineData::visit_rows ([#2185])
++3. Remove dead schema conversions in expression evaluators ([#2184])
++
++### 🚜 Refactor
++
++1. Finalized benchmark table names and added new tables ([#2072])
++2. New transform helpers for unary and binary children ([#2150])
++3. Remove legacy row-level partition filter path ([#2158])
++4. Restructured list log files function ([#2173])
++5. Consolidate and add testing for set transaction expiration ([#2176])
++6. Rename uc-catalog and uc-client crates ([#2136])
++7. Better naming style for column mapping related functions/variables ([#2290])
++8. Centralize computation for physical schema without partition columns ([#2142])
++9. Consolidate FFI test setup helpers into ffi_test_utils ([#2307])
++10. *(action_reconciliation)* Combine getter index and field name constants ([#1717]) ([#1774])
++11. Extract shared stat helpers from RowGroupFilter ([#2324])
++12. Extract WriteContext to its own file ([#2349])
++
++### ⚙️ Chores/CI
++
++1. Clean up arrow deps in cargo files ([#2115])
++2. Commit Cargo.lock and enforce --locked in all CI workflows ([#2240])
++3. Harden pr-title-validator a bit ([#2246])
++4. Renable semver ([#2248])
++5. Attempt fixup of semver-label job ([#2253])
++6. Use artifacts for semver label ([#2258])
++7. Remove old non-builder snapshot FFI functions ([#2318])
++8. Remove the catalog-managed feature flag ([#2310])
++9. Upgrade to arrow-58 and object_store-13, drop arrow-56 support ([#2116])
++
++### Other
++
++[#2097]: https://github.com/delta-io/delta-kernel-rs/pull/2097
++[#2099]: https://github.com/delta-io/delta-kernel-rs/pull/2099
++[#2126]: https://github.com/delta-io/delta-kernel-rs/pull/2126
++[#2115]: https://github.com/delta-io/delta-kernel-rs/pull/2115
++[#1866]: https://github.com/delta-io/delta-kernel-rs/pull/1866
++[#2044]: https://github.com/delta-io/delta-kernel-rs/pull/2044
++[#1942]: https://github.com/delta-io/delta-kernel-rs/pull/1942
++[#2072]: https://github.com/delta-io/delta-kernel-rs/pull/2072
++[#2089]: https://github.com/delta-io/delta-kernel-rs/pull/2089
++[#2187]: https://github.com/delta-io/delta-kernel-rs/pull/2187
++[#2190]: https://github.com/delta-io/delta-kernel-rs/pull/2190
++[#1948]: https://github.com/delta-io/delta-kernel-rs/pull/1948
++[#2150]: https://github.com/delta-io/delta-kernel-rs/pull/2150
++[#2074]: https://github.com/delta-io/delta-kernel-rs/pull/2074
++[#2195]: https://github.com/delta-io/delta-kernel-rs/pull/2195
++[#2158]: https://github.com/delta-io/delta-kernel-rs/pull/2158
++[#2186]: https://github.com/delta-io/delta-kernel-rs/pull/2186
++[#2185]: https://github.com/delta-io/delta-kernel-rs/pull/2185
++[#2173]: https://github.com/delta-io/delta-kernel-rs/pull/2173
++[#2163]: https://github.com/delta-io/delta-kernel-rs/pull/2163
++[#2145]: https://github.com/delta-io/delta-kernel-rs/pull/2145
++[#2184]: https://github.com/delta-io/delta-kernel-rs/pull/2184
++[#2183]: https://github.com/delta-io/delta-kernel-rs/pull/2183
++[#2199]: https://github.com/delta-io/delta-kernel-rs/pull/2199
++[#2196]: https://github.com/delta-io/delta-kernel-rs/pull/2196
++[#2210]: https://github.com/delta-io/delta-kernel-rs/pull/2210
++[#2178]: https://github.com/delta-io/delta-kernel-rs/pull/2178
++[#2240]: https://github.com/delta-io/delta-kernel-rs/pull/2240
++[#2243]: https://github.com/delta-io/delta-kernel-rs/pull/2243
++[#2202]: https://github.com/delta-io/delta-kernel-rs/pull/2202
++[#2211]: https://github.com/delta-io/delta-kernel-rs/pull/2211
++[#2214]: https://github.com/delta-io/delta-kernel-rs/pull/2214
++[#2246]: https://github.com/delta-io/delta-kernel-rs/pull/2246
++[#2219]: https://github.com/delta-io/delta-kernel-rs/pull/2219
++[#2212]: https://github.com/delta-io/delta-kernel-rs/pull/2212
++[#2176]: https://github.com/delta-io/delta-kernel-rs/pull/2176
++[#2159]: https://github.com/delta-io/delta-kernel-rs/pull/2159
++[#2248]: https://github.com/delta-io/delta-kernel-rs/pull/2248
++[#2253]: https://github.com/delta-io/delta-kernel-rs/pull/2253
++[#2052]: https://github.com/delta-io/delta-kernel-rs/pull/2052
++[#2092]: https://github.com/delta-io/delta-kernel-rs/pull/2092
++[#2258]: https://github.com/delta-io/delta-kernel-rs/pull/2258
++[#2136]: https://github.com/delta-io/delta-kernel-rs/pull/2136
++[#2245]: https://github.com/delta-io/delta-kernel-rs/pull/2245
++[#2182]: https://github.com/delta-io/delta-kernel-rs/pull/2182
++[#2262]: https://github.com/delta-io/delta-kernel-rs/pull/2262
++[#2237]: https://github.com/delta-io/delta-kernel-rs/pull/2237
++[#2166]: https://github.com/delta-io/delta-kernel-rs/pull/2166
++[#2169]: https://github.com/delta-io/delta-kernel-rs/pull/2169
++[#2171]: https://github.com/delta-io/delta-kernel-rs/pull/2171
++[#2143]: https://github.com/delta-io/delta-kernel-rs/pull/2143
++[#2070]: https://github.com/delta-io/delta-kernel-rs/pull/2070
++[#2261]: https://github.com/delta-io/delta-kernel-rs/pull/2261
++[#2277]: https://github.com/delta-io/delta-kernel-rs/pull/2277
++[#2236]: https://github.com/delta-io/delta-kernel-rs/pull/2236
++[#2279]: https://github.com/delta-io/delta-kernel-rs/pull/2279
++[#2249]: https://github.com/delta-io/delta-kernel-rs/pull/2249
++[#2290]: https://github.com/delta-io/delta-kernel-rs/pull/2290
++[#2174]: https://github.com/delta-io/delta-kernel-rs/pull/2174
++[#2264]: https://github.com/delta-io/delta-kernel-rs/pull/2264
++[#2234]: https://github.com/delta-io/delta-kernel-rs/pull/2234
++[#2302]: https://github.com/delta-io/delta-kernel-rs/pull/2302
++[#2142]: https://github.com/delta-io/delta-kernel-rs/pull/2142
++[#2266]: https://github.com/delta-io/delta-kernel-rs/pull/2266
++[#2281]: https://github.com/delta-io/delta-kernel-rs/pull/2281
++[#2109]: https://github.com/delta-io/delta-kernel-rs/pull/2109
++[#2293]: https://github.com/delta-io/delta-kernel-rs/pull/2293
++[#2203]: https://github.com/delta-io/delta-kernel-rs/pull/2203
++[#2247]: https://github.com/delta-io/delta-kernel-rs/pull/2247
++[#2160]: https://github.com/delta-io/delta-kernel-rs/pull/2160
++[#2314]: https://github.com/delta-io/delta-kernel-rs/pull/2314
++[#2270]: https://github.com/delta-io/delta-kernel-rs/pull/2270
++[#2255]: https://github.com/delta-io/delta-kernel-rs/pull/2255
++[#2250]: https://github.com/delta-io/delta-kernel-rs/pull/2250
++[#2254]: https://github.com/delta-io/delta-kernel-rs/pull/2254
++[#2307]: https://github.com/delta-io/delta-kernel-rs/pull/2307
++[#2170]: https://github.com/delta-io/delta-kernel-rs/pull/2170
++[#2235]: https://github.com/delta-io/delta-kernel-rs/pull/2235
++[#2274]: https://github.com/delta-io/delta-kernel-rs/pull/2274
++[#1774]: https://github.com/delta-io/delta-kernel-rs/pull/1774
++[#2296]: https://github.com/delta-io/delta-kernel-rs/pull/2296
++[#2318]: https://github.com/delta-io/delta-kernel-rs/pull/2318
++[#2310]: https://github.com/delta-io/delta-kernel-rs/pull/2310
++[#2297]: https://github.com/delta-io/delta-kernel-rs/pull/2297
++[#2324]: https://github.com/delta-io/delta-kernel-rs/pull/2324
++[#2260]: https://github.com/delta-io/delta-kernel-rs/pull/2260
++[#2327]: https://github.com/delta-io/delta-kernel-rs/pull/2327
++[#2319]: https://github.com/delta-io/delta-kernel-rs/pull/2319
++[#2116]: https://github.com/delta-io/delta-kernel-rs/pull/2116
++[#2349]: https://github.com/delta-io/delta-kernel-rs/pull/2349
++[#2336]: https://github.com/delta-io/delta-kernel-rs/pull/2336
++[#2077]: https://github.com/delta-io/delta-kernel-rs/pull/2077
++[#2111]: https://github.com/delta-io/delta-kernel-rs/pull/2111
++[#2065]: https://github.com/delta-io/delta-kernel-rs/pull/2065
++[#2025]: https://github.com/delta-io/delta-kernel-rs/pull/2025
++[#1996]: https://github.com/delta-io/delta-kernel-rs/pull/1996
++[#1717]: https://github.com/delta-io/delta-kernel-rs/pull/1717
++[#1922]: https://github.com/delta-io/delta-kernel-rs/pull/1922
++
+ ## [v0.20.0](https://github.com/delta-io/delta-kernel-rs/tree/v0.20.0/) (2026-02-26)
+ 
+ [Full Changelog](https://github.com/delta-io/delta-kernel-rs/compare/v0.19.2...v0.20.0)
+ 22. Implement schema diffing for flat schemas (2/5]) ([#1478])
+ 23. Add API on Scan to perform 2-phase log replay  ([#1547])
+ 24. Enable distributed log replay serde serialization for serializable scan state ([#1549])
+-25. Add InCommitTimestamp support to ChangeDataFeed ([#1670]) 
++25. Add InCommitTimestamp support to ChangeDataFeed ([#1670])
+ 26. Add include_stats_columns API and output_stats_schema field ([#1728])
+ 27. Add write support for clustered tables behind feature flag ([#1704])
+ 28. Add snapshot load instrumentation ([#1750])
\ No newline at end of file
CLAUDE.md
@@ -0,0 +1,108 @@
+diff --git a/CLAUDE.md b/CLAUDE.md
+--- a/CLAUDE.md
++++ b/CLAUDE.md
+ (`Snapshot`, `Scan`, `Transaction`) and delegates _how_ to the `Engine` trait.
+ 
+ Current capabilities: table reads with predicates, data skipping, deletion vectors, change
+-data feed, checkpoints (V1 & V2), log compaction, blind append writes, table creation
++data feed, checkpoints (V1 & V2), log compaction (disabled, #2337), blind append writes, table creation
+ (including clustered tables), and catalog-managed table support.
+ 
+ ## Build & Test Commands
+ cargo nextest run --workspace --all-features test_name_here
+ 
+ # Format, lint, and doc check (always run after code changes)
+-cargo fmt \
++cargo +nightly fmt \
+   && cargo clippy --workspace --benches --tests --all-features -- -D warnings \
+   && cargo doc --workspace --all-features --no-deps
+ 
+   --exclude delta_kernel --exclude delta_kernel_ffi --exclude delta_kernel_derive --exclude delta_kernel_ffi_macros -- -D warnings
+ 
+ # Quick pre-push check (mimics CI)
+-cargo fmt \
++cargo +nightly fmt \
+   && cargo clippy --workspace --benches --tests --all-features -- -D warnings \
+   && cargo doc --workspace --all-features --no-deps \
+   && cargo nextest run --workspace --all-features
+ 
+ ### Feature Flags
+ 
+-- `default-engine` / `default-engine-rustls` / `default-engine-native-tls` -- async
+-  Arrow/Tokio engine (pick one TLS backend)
++- `default-engine-rustls` / `default-engine-native-tls` -- async Arrow/Tokio engine (pick a TLS backend)
+ - `arrow`, `arrow-XX`, `arrow-YY` -- Arrow version selection (kernel tracks the latest two
+   major Arrow releases; `arrow` defaults to latest). Kernel itself does not depend on Arrow,
+-  but default-engine does.
++  but the default engine does.
+ - `arrow-conversion`, `arrow-expression` -- Arrow interop (auto-enabled by default engine)
+ - `prettyprint` -- enables Arrow pretty-print helpers (primarily test/example oriented)
+-- `catalog-managed` -- catalog-managed table support (experimental)
+ - `clustered-table` -- clustered table write support (experimental)
+ - `internal-api` -- unstable APIs like `parallel_scan_metadata`. Items are marked with the
+   `#[internal_api]` proc macro attribute.
+ `execute()` (simple), `scan_metadata()` (advanced/distributed),
+ `parallel_scan_metadata()` (two-phase distributed log replay).
+ 
+-**Write path:** `Snapshot` -> `Transaction` -> `commit()`. Kernel provides `WriteContext`,
+-assembles commit actions, enforces protocol compliance, delegates atomic commit to a
+-`Committer`.
++**Write path:** `Snapshot` -> `Transaction` -> `commit()`. Kernel provides `WriteContext`
++(via `partitioned_write_context` or `unpartitioned_write_context`), assembles commit
++actions, enforces protocol compliance, delegates atomic commit to a `Committer`.
+ 
+ **Engine trait:** five handlers (`StorageHandler`, `JsonHandler`, `ParquetHandler`,
+ `EvaluationHandler`, optional `MetricsReporter`). `DefaultEngine` lives in
+   or inputs. Prefer `#[case]` over duplicating test functions. When parameters are
+   independent and form a cartesian product, prefer `#[values]` over enumerating
+   every combination with `#[case]`.
++- Actively look for rstest consolidation opportunities: when writing multiple tests
++  that share the same setup/flow and differ only in configuration and expected
++  outcome, write one parameterized rstest instead of separate functions. Also check
++  whether a new test duplicates the flow of an existing nearby test and should be
++  merged into it as a new `#[case]`. A common pattern is toggling a feature (e.g.
++  column mapping on/off) and asserting success vs. error.
+ - Reuse helpers from `test_utils` instead of writing custom ones when possible.
++- **Committing in tests:** Use `txn.commit(engine)?.unwrap_committed()` to assert a
++  successful commit and get the `CommittedTransaction`. Do NOT use `match` + `panic!`
++  for this -- `unwrap_committed()` provides a clear error message on failure. Available
++  under `#[cfg(test)]` and the `test-utils` feature.
++- **Prefer snapshot/public API assertions over reading raw commit JSON.** Only read raw
++  commit JSON when the data is inaccessible via public API (e.g., system domain metadata
++  is blocked by `get_domain_metadata`). For commit JSON reads, use `read_actions_from_commit`
++  from `test_utils` -- do NOT write local helpers that duplicate this.
+ - **`add_commit` and table setup in tests:** `add_commit` takes a `table_root` string and
+   resolves it to an absolute object-store path. The `table_root` must be a proper URL string
+   with a trailing slash (e.g. `"memory:///"`, `"file:///tmp/my_table/"`). Avoid using the
+   `allowColumnDefaults`, `changeDataFeed`, `identityColumns`, `rowTracking`,
+   `domainMetadata`, `icebergCompatV1`, `icebergCompatV2`, `clustering`,
+   `inCommitTimestamp`
+-- Reader + writer: `columnMapping`, `deletionVectors`, `timestampNtz`,
+-  `v2Checkpoint`, `vacuumProtocolCheck`, `variantType`, `variantType-preview`,
+-  `typeWidening`
++- Reader + writer: `catalogManaged`, `catalogOwned-preview`, `columnMapping`,
++  `deletionVectors`, `timestampNtz`, `v2Checkpoint`, `vacuumProtocolCheck`,
++  `variantType`, `variantType-preview`, `typeWidening`
+ 
+ Keep this list updated when new protocol features are added to kernel.
+ 
+ - Code comments state intent and explain "why" -- don't restate what the code self-documents.
+ - Place `use` imports at the top of the file (for non-test code) or at the top of the
+   `mod tests` block (for test code) -- never inside function bodies.
++- Prefer `==` over `matches!` for simple single-variant enum comparisons. `matches!` is
++  for patterns with bindings or guards. For example: `self == Variant` not
++  `matches!(self, Variant)`.
++- Prefer `StructField::nullable` / `StructField::not_null` over
++  `StructField::new(name, type, bool)` when nullability is known at compile time.
++  Reserve `StructField::new` for cases where nullability is a runtime value.
+ - NEVER panic in production code -- use errors instead. Panicking
+   (including `unwrap()`, `expect()`, `panic!()`, `unreachable!()`, etc) is acceptable in test code only.
+ 
+ a newer (potentially compromised) transitive dependency. If `Cargo.lock` is out of sync with
+ `Cargo.toml`, the build fails immediately, forcing dependency changes to be explicit and
+ reviewable. See the top-level comment in `build.yml` for full rationale. Commands exempt from
+-`--locked`: `cargo fmt` (no dep resolution), `cargo msrv verify/show` (wrapper tool),
++`--locked`: `cargo +nightly fmt` (no dep resolution), `cargo msrv verify/show` (wrapper tool),
+ `cargo miri setup` (tooling setup).
+ 
+ Ensure that when writing any github action you are considering safety including thinking of
\ No newline at end of file
CLAUDE/architecture.md
@@ -0,0 +1,49 @@
+diff --git a/CLAUDE/architecture.md b/CLAUDE/architecture.md
+--- a/CLAUDE/architecture.md
++++ b/CLAUDE/architecture.md
+ 
+ Built via `Snapshot::builder_for(url).build(engine)` (latest version) or
+ `.at_version(v).build(engine)` (specific version). For catalog-managed tables,
+-`.with_log_tail(commits)` supplies recent unpublished commits from the catalog.
++`.with_log_tail(commits)` supplies recent unpublished commits from the catalog and
++`.with_max_catalog_version(v)` caps the snapshot at the latest catalog-ratified version.
+ 
+ **Snapshot loading internals:**
+ 1. **LogSegment** (`kernel/src/log_segment/`) -- discovers commits + checkpoints for the
+ 
+ `Snapshot` -> `Transaction` -> commit
+ 
+-The kernel coordinates the write transaction: it provides the write context (target directory,
+-physical schema, stats columns), assembles commit actions (CommitInfo, Add files), enforces
+-protocol compliance (table features, schema validation), and delegates the atomic commit to a
+-`Committer`.
++The kernel coordinates the write transaction: it provides the write context (validated partition
++values, recommended write directory, physical schema, stats columns), assembles commit
++actions (CommitInfo, Add files), enforces protocol compliance (table features, schema validation),
++and delegates the atomic commit to a `Committer`.
+ 
+ **Steps:**
+ 1. Create `Transaction` from a snapshot with a `Committer` (e.g. `FileSystemCommitter`)
+-2. Get `WriteContext` for target dir, physical schema, and stats columns
++2. Get `WriteContext` via `partitioned_write_context(values)` or `unpartitioned_write_context()`
+ 3. Write Parquet files (via engine), collect file metadata
+ 4. Register files via `txn.add_files(metadata)`
+ 5. Commit: returns `CommittedTransaction`, `ConflictedTransaction`, or `RetryableTransaction`
+ - `kernel/src/snapshot/` -- `Snapshot`, `SnapshotBuilder`, entry point for reads/writes
+ - `kernel/src/scan/` -- `Scan`, `ScanBuilder`, log replay, data skipping
+ - `kernel/src/transaction/` -- `Transaction`, `WriteContext`, `create_table` builder
++- `kernel/src/partition/` -- partition value validation, serialization, Hive-style path
++   encoding, URI encoding for `add.path`
+ - `kernel/src/committer/` -- `Committer` trait, `FileSystemCommitter`
+ - `kernel/src/log_segment/` -- log file discovery, Protocol/Metadata replay
+ - `kernel/src/log_replay.rs` -- file-action deduplication, `LogReplayProcessor` trait
+ 
+ Tables whose commits go through a catalog (e.g. Unity Catalog) instead of direct filesystem
+ writes. Kernel doesn't know about catalogs -- the catalog client provides a log tail via
+-`SnapshotBuilder::with_log_tail()` and a custom `Committer` for staging/ratifying/publishing
+-commits. Requires `catalog-managed` feature flag.
++`SnapshotBuilder::with_log_tail()`, caps the version via `with_max_catalog_version()`, and
++uses a custom `Committer` for staging/ratifying/publishing commits.
+ 
+ The `UCCommitter` (in the `delta-kernel-unity-catalog` crate) is the reference implementation of a catalog
+ committer for Unity Catalog. It stages commits to `_staged_commits/`, calls the UC commit API to
\ No newline at end of file
CONTRIBUTING.md
@@ -0,0 +1,19 @@
+diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
+--- a/CONTRIBUTING.md
++++ b/CONTRIBUTING.md
+    # build docs
+    cargo doc --workspace --all-features
+    # highly recommend editor that automatically formats, but in case you need to:
+-   cargo fmt
++   cargo +nightly fmt
+ 
+    # run more tests
+    cargo test --workspace --all-features -- --skip read_table_version_hdfs
+ #### General Tips
+ 
+ 1. When making your first PR, please read our contributor guidelines: https://github.com/delta-incubator/delta-kernel-rs/blob/main/CONTRIBUTING.md
+-2. Run `cargo t --all-features --all-targets` to get started testing, and run `cargo fmt`.
++2. Run `cargo t --all-features --all-targets` to get started testing, and run `cargo +nightly fmt`.
+ 3. Ensure you have added or run the appropriate tests for your PR.
+ 4. If the PR is unfinished, add '[WIP]' in your PR title, e.g., '[WIP] Your PR title ...'.
+ 5. Be sure to keep the PR description updated to reflect all changes.
\ No newline at end of file

... (truncated, output exceeded 60000 bytes)

Reproduce locally: git range-diff ac9dc19..b369f06 7866824..dc5d29e | Disable: git config gitstack.push-range-diff false

github-actions Bot added the breaking-change label (Public API change that could cause downstream compilation failures; requires a major version bump.) on Apr 24, 2026
codecov Bot commented Apr 24, 2026

Codecov Report

❌ Patch coverage is 87.40458% with 33 lines in your changes missing coverage. Please review.
✅ Project coverage is 88.50%. Comparing base (6486bd2) to head (12c1bf6).
⚠️ Report is 10 commits behind head on main.

Files with missing lines Patch % Lines
kernel/src/schema/mod.rs 89.75% 10 Missing and 7 partials ⚠️
kernel/src/transaction/stats_verifier.rs 0.00% 6 Missing ⚠️
kernel/src/expressions/scalars.rs 0.00% 3 Missing ⚠️
kernel/src/engine/parquet_row_group_skipping.rs 0.00% 2 Missing ⚠️
ffi/src/expressions/kernel_visitor.rs 0.00% 1 Missing ⚠️
ffi/src/schema.rs 0.00% 1 Missing ⚠️
kernel/src/engine/arrow_conversion/mod.rs 0.00% 1 Missing ⚠️
kernel/src/engine/arrow_expression/mod.rs 0.00% 1 Missing ⚠️
test-utils/src/table_builder.rs 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2464      +/-   ##
==========================================
+ Coverage   88.40%   88.50%   +0.09%     
==========================================
  Files         178      179       +1     
  Lines       58235    59313    +1078     
  Branches    58235    59313    +1078     
==========================================
+ Hits        51485    52495    +1010     
- Misses       4783     4800      +17     
- Partials     1967     2018      +51     

☔ View full report in Codecov by Sentry.

lorenarosati (Collaborator, Author) commented:

Range-diff: main (dc5d29e -> b8a88e5)
++21. Allow AppendOnly, ChangeDataFeed, and TypeWidening in CREATE TABLE ([#2279])
++22. Support max timestamp stats for data skipping ([#2249])
++23. Add list with backward checkpoint scan ([#2174])
++24. Add Snapshot::get_timestamp ([#2266])
++25. Make tags  and remove partition values allow null values in map ([#2281])
++26. Support UC credential vending and S3 benchmarks ([#2109])
++27. Add catalogManaged to allowed features in CREATE TABLE ([#2293])
++28. Add catalog-managed table creation utilities ([#2203])
++29. Support version 0 (table creation) commits in UCCommitter ([#2247])
++30. Update snapshot.checkpoint API to return a CheckpointResult ([#2314])
++31. Cached checkpoint output schema ([#2270])
++32. Refactor snapshot FFI to use builder pattern and enable snapshot reuse ([#2255])
++33. Add P&M to CommitMetadata and enforce committer/table type matching ([#2250])
++34. Add UCCommitter validation for catalog-managed tables ([#2254])
++35. Crc File Histogram Read and Write Support ([#2235])
++36. Add FFI function to expose snapshot's timestamp ([#2274])
++37. Add FFI create table DDL functions ([#2296])
++38. Add FFI remove files DML functions ([#2297])
++39. Expose Protocol and Metadata as opaque FFI handle types ([#2260])
++40. Add FFI bindings for domain metadata write operations ([#2327])
++
++### 🐛 Bug Fixes
++
++1. Treat null literal as unknown in meta-predicate evaluation ([#2097])
++2. Update TokioBackgroundExecutor to join thread instead of detaching ([#2126])
++3. Use thread pools and multi-thread tokio executor in read metadata benchmark runner ([#2044])
++4. Emit null stats for all-null columns instead of omitting them ([#2187])
++5. Allow Date/Timestamp casting for stats_parsed compatibility ([#2074])
++6. Filter evaluator input schema ([#2195])
++7. SnapshotCompleted.total_duration now includes log segment loading ([#2183])
++8. Avoid creating empty stats schemas ([#2199])
++9. Prevent dual TLS crypto backends from reqwest default features ([#2178])
++10. Vendor and pin homebrew actions ([#2243])
++11. Validate min_reader/writer_version are at least 1 ([#2202])
++12. Preserve loaded LazyCrc during incremental snapshot updates ([#2211])
++13. Detect stats_parsed in multi-part V1 checkpoints ([#2214])
++14. Downgrade per-batch data skipping log from info to debug ([#2219])
++15. Unknown table features in feature list are "supported" ([#2159])
++16. Remove debug_assert_eq before require in scan evaluator row count checks ([#2262])
++17. Adopt checkpoint written later for same-version snapshot refresh ([#2143])
++18. Return error when parquet handler returns empty data for scan files ([#2261])
++19. Refactor benchmarking workflow to not require criterion compare action ([#2264])
++20. Skip name-based validation for struct columns in expression evaluator ([#2160])
++21. Handle missing leaf columns in nested struct during parquet projection ([#2170])
++22. Pass computed ICT to CommitMetadata instead of wall-clock time ([#2319])
++23. Detect and handle empty (0-byte) log files during listing ([#2336])
++
++### 📚 Documentation
++
++1. Update claude readme to include github actions safety note ([#2190])
++2. Add line width and comment divider style rules to CLAUDE.md ([#2277])
++3. Add documentation for current tags ([#2234])
++4. Document benchmarking in CI accuracy ([#2302])
++
++### ⚡ Performance
++
++1. Pre-size dedup HashSet in ScanLogReplayProcessor ([#2186])
++2. Pre-size HashMap in ArrowEngineData::visit_rows ([#2185])
++3. Remove dead schema conversions in expression evaluators ([#2184])
++
++### 🚜 Refactor
++
++1. Finalized benchmark table names and added new tables ([#2072])
++2. New transform helpers for unary and binary children ([#2150])
++3. Remove legacy row-level partition filter path ([#2158])
++4. Restructured list log files function ([#2173])
++5. Consolidate and add testing for set transaction expiration ([#2176])
++6. Rename uc-catalog and uc-client crates ([#2136])
++7. Better naming style for column mapping related functions/variables ([#2290])
++8. Centralize computation for physical schema without partition columns ([#2142])
++9. Consolidate FFI test setup helpers into ffi_test_utils ([#2307])
++10. *(action_reconciliation)* Combine getter index and field name constants ([#1717]) ([#1774])
++11. Extract shared stat helpers from RowGroupFilter ([#2324])
++12. Extract WriteContext to its own file ([#2349])
++
++### ⚙️ Chores/CI
++
++1. Clean up arrow deps in cargo files ([#2115])
++2. Commit Cargo.lock and enforce --locked in all CI workflows ([#2240])
++3. Harden pr-title-validator a bit ([#2246])
++4. Renable semver ([#2248])
++5. Attempt fixup of semver-label job ([#2253])
++6. Use artifacts for semver label ([#2258])
++7. Remove old non-builder snapshot FFI functions ([#2318])
++8. Remove the catalog-managed feature flag ([#2310])
++9. Upgrade to arrow-58 and object_store-13, drop arrow-56 support ([#2116])
++
++### Other
++
++[#2097]: https://github.com/delta-io/delta-kernel-rs/pull/2097
++[#2099]: https://github.com/delta-io/delta-kernel-rs/pull/2099
++[#2126]: https://github.com/delta-io/delta-kernel-rs/pull/2126
++[#2115]: https://github.com/delta-io/delta-kernel-rs/pull/2115
++[#1866]: https://github.com/delta-io/delta-kernel-rs/pull/1866
++[#2044]: https://github.com/delta-io/delta-kernel-rs/pull/2044
++[#1942]: https://github.com/delta-io/delta-kernel-rs/pull/1942
++[#2072]: https://github.com/delta-io/delta-kernel-rs/pull/2072
++[#2089]: https://github.com/delta-io/delta-kernel-rs/pull/2089
++[#2187]: https://github.com/delta-io/delta-kernel-rs/pull/2187
++[#2190]: https://github.com/delta-io/delta-kernel-rs/pull/2190
++[#1948]: https://github.com/delta-io/delta-kernel-rs/pull/1948
++[#2150]: https://github.com/delta-io/delta-kernel-rs/pull/2150
++[#2074]: https://github.com/delta-io/delta-kernel-rs/pull/2074
++[#2195]: https://github.com/delta-io/delta-kernel-rs/pull/2195
++[#2158]: https://github.com/delta-io/delta-kernel-rs/pull/2158
++[#2186]: https://github.com/delta-io/delta-kernel-rs/pull/2186
++[#2185]: https://github.com/delta-io/delta-kernel-rs/pull/2185
++[#2173]: https://github.com/delta-io/delta-kernel-rs/pull/2173
++[#2163]: https://github.com/delta-io/delta-kernel-rs/pull/2163
++[#2145]: https://github.com/delta-io/delta-kernel-rs/pull/2145
++[#2184]: https://github.com/delta-io/delta-kernel-rs/pull/2184
++[#2183]: https://github.com/delta-io/delta-kernel-rs/pull/2183
++[#2199]: https://github.com/delta-io/delta-kernel-rs/pull/2199
++[#2196]: https://github.com/delta-io/delta-kernel-rs/pull/2196
++[#2210]: https://github.com/delta-io/delta-kernel-rs/pull/2210
++[#2178]: https://github.com/delta-io/delta-kernel-rs/pull/2178
++[#2240]: https://github.com/delta-io/delta-kernel-rs/pull/2240
++[#2243]: https://github.com/delta-io/delta-kernel-rs/pull/2243
++[#2202]: https://github.com/delta-io/delta-kernel-rs/pull/2202
++[#2211]: https://github.com/delta-io/delta-kernel-rs/pull/2211
++[#2214]: https://github.com/delta-io/delta-kernel-rs/pull/2214
++[#2246]: https://github.com/delta-io/delta-kernel-rs/pull/2246
++[#2219]: https://github.com/delta-io/delta-kernel-rs/pull/2219
++[#2212]: https://github.com/delta-io/delta-kernel-rs/pull/2212
++[#2176]: https://github.com/delta-io/delta-kernel-rs/pull/2176
++[#2159]: https://github.com/delta-io/delta-kernel-rs/pull/2159
++[#2248]: https://github.com/delta-io/delta-kernel-rs/pull/2248
++[#2253]: https://github.com/delta-io/delta-kernel-rs/pull/2253
++[#2052]: https://github.com/delta-io/delta-kernel-rs/pull/2052
++[#2092]: https://github.com/delta-io/delta-kernel-rs/pull/2092
++[#2258]: https://github.com/delta-io/delta-kernel-rs/pull/2258
++[#2136]: https://github.com/delta-io/delta-kernel-rs/pull/2136
++[#2245]: https://github.com/delta-io/delta-kernel-rs/pull/2245
++[#2182]: https://github.com/delta-io/delta-kernel-rs/pull/2182
++[#2262]: https://github.com/delta-io/delta-kernel-rs/pull/2262
++[#2237]: https://github.com/delta-io/delta-kernel-rs/pull/2237
++[#2166]: https://github.com/delta-io/delta-kernel-rs/pull/2166
++[#2169]: https://github.com/delta-io/delta-kernel-rs/pull/2169
++[#2171]: https://github.com/delta-io/delta-kernel-rs/pull/2171
++[#2143]: https://github.com/delta-io/delta-kernel-rs/pull/2143
++[#2070]: https://github.com/delta-io/delta-kernel-rs/pull/2070
++[#2261]: https://github.com/delta-io/delta-kernel-rs/pull/2261
++[#2277]: https://github.com/delta-io/delta-kernel-rs/pull/2277
++[#2236]: https://github.com/delta-io/delta-kernel-rs/pull/2236
++[#2279]: https://github.com/delta-io/delta-kernel-rs/pull/2279
++[#2249]: https://github.com/delta-io/delta-kernel-rs/pull/2249
++[#2290]: https://github.com/delta-io/delta-kernel-rs/pull/2290
++[#2174]: https://github.com/delta-io/delta-kernel-rs/pull/2174
++[#2264]: https://github.com/delta-io/delta-kernel-rs/pull/2264
++[#2234]: https://github.com/delta-io/delta-kernel-rs/pull/2234
++[#2302]: https://github.com/delta-io/delta-kernel-rs/pull/2302
++[#2142]: https://github.com/delta-io/delta-kernel-rs/pull/2142
++[#2266]: https://github.com/delta-io/delta-kernel-rs/pull/2266
++[#2281]: https://github.com/delta-io/delta-kernel-rs/pull/2281
++[#2109]: https://github.com/delta-io/delta-kernel-rs/pull/2109
++[#2293]: https://github.com/delta-io/delta-kernel-rs/pull/2293
++[#2203]: https://github.com/delta-io/delta-kernel-rs/pull/2203
++[#2247]: https://github.com/delta-io/delta-kernel-rs/pull/2247
++[#2160]: https://github.com/delta-io/delta-kernel-rs/pull/2160
++[#2314]: https://github.com/delta-io/delta-kernel-rs/pull/2314
++[#2270]: https://github.com/delta-io/delta-kernel-rs/pull/2270
++[#2255]: https://github.com/delta-io/delta-kernel-rs/pull/2255
++[#2250]: https://github.com/delta-io/delta-kernel-rs/pull/2250
++[#2254]: https://github.com/delta-io/delta-kernel-rs/pull/2254
++[#2307]: https://github.com/delta-io/delta-kernel-rs/pull/2307
++[#2170]: https://github.com/delta-io/delta-kernel-rs/pull/2170
++[#2235]: https://github.com/delta-io/delta-kernel-rs/pull/2235
++[#2274]: https://github.com/delta-io/delta-kernel-rs/pull/2274
++[#1774]: https://github.com/delta-io/delta-kernel-rs/pull/1774
++[#2296]: https://github.com/delta-io/delta-kernel-rs/pull/2296
++[#2318]: https://github.com/delta-io/delta-kernel-rs/pull/2318
++[#2310]: https://github.com/delta-io/delta-kernel-rs/pull/2310
++[#2297]: https://github.com/delta-io/delta-kernel-rs/pull/2297
++[#2324]: https://github.com/delta-io/delta-kernel-rs/pull/2324
++[#2260]: https://github.com/delta-io/delta-kernel-rs/pull/2260
++[#2327]: https://github.com/delta-io/delta-kernel-rs/pull/2327
++[#2319]: https://github.com/delta-io/delta-kernel-rs/pull/2319
++[#2116]: https://github.com/delta-io/delta-kernel-rs/pull/2116
++[#2349]: https://github.com/delta-io/delta-kernel-rs/pull/2349
++[#2336]: https://github.com/delta-io/delta-kernel-rs/pull/2336
++[#2077]: https://github.com/delta-io/delta-kernel-rs/pull/2077                                                                                               
++[#2111]: https://github.com/delta-io/delta-kernel-rs/pull/2111                                                                                                 
++[#2065]: https://github.com/delta-io/delta-kernel-rs/pull/2065                                                                                               
++[#2025]: https://github.com/delta-io/delta-kernel-rs/pull/2025                                                                                               
++[#1996]: https://github.com/delta-io/delta-kernel-rs/pull/1996
++[#1717]: https://github.com/delta-io/delta-kernel-rs/pull/1717
++[#1922]: https://github.com/delta-io/delta-kernel-rs/pull/1922
++
+ ## [v0.20.0](https://github.com/delta-io/delta-kernel-rs/tree/v0.20.0/) (2026-02-26)
+ 
+ [Full Changelog](https://github.com/delta-io/delta-kernel-rs/compare/v0.19.2...v0.20.0)
+ 22. Implement schema diffing for flat schemas (2/5]) ([#1478])
+ 23. Add API on Scan to perform 2-phase log replay  ([#1547])
+ 24. Enable distributed log replay serde serialization for serializable scan state ([#1549])
+-25. Add InCommitTimestamp support to ChangeDataFeed ([#1670]) 
++25. Add InCommitTimestamp support to ChangeDataFeed ([#1670])
+ 26. Add include_stats_columns API and output_stats_schema field ([#1728])
+ 27. Add write support for clustered tables behind feature flag ([#1704])
+ 28. Add snapshot load instrumentation ([#1750])
\ No newline at end of file
CLAUDE.md
@@ -0,0 +1,108 @@
+diff --git a/CLAUDE.md b/CLAUDE.md
+--- a/CLAUDE.md
++++ b/CLAUDE.md
+ (`Snapshot`, `Scan`, `Transaction`) and delegates _how_ to the `Engine` trait.
+ 
+ Current capabilities: table reads with predicates, data skipping, deletion vectors, change
+-data feed, checkpoints (V1 & V2), log compaction, blind append writes, table creation
++data feed, checkpoints (V1 & V2), log compaction (disabled, #2337), blind append writes, table creation
+ (including clustered tables), and catalog-managed table support.
+ 
+ ## Build & Test Commands
+ cargo nextest run --workspace --all-features test_name_here
+ 
+ # Format, lint, and doc check (always run after code changes)
+-cargo fmt \
++cargo +nightly fmt \
+   && cargo clippy --workspace --benches --tests --all-features -- -D warnings \
+   && cargo doc --workspace --all-features --no-deps
+ 
+   --exclude delta_kernel --exclude delta_kernel_ffi --exclude delta_kernel_derive --exclude delta_kernel_ffi_macros -- -D warnings
+ 
+ # Quick pre-push check (mimics CI)
+-cargo fmt \
++cargo +nightly fmt \
+   && cargo clippy --workspace --benches --tests --all-features -- -D warnings \
+   && cargo doc --workspace --all-features --no-deps \
+   && cargo nextest run --workspace --all-features
+ 
+ ### Feature Flags
+ 
+-- `default-engine` / `default-engine-rustls` / `default-engine-native-tls` -- async
+-  Arrow/Tokio engine (pick one TLS backend)
++- `default-engine-rustls` / `default-engine-native-tls` -- async Arrow/Tokio engine (pick a TLS backend)
+ - `arrow`, `arrow-XX`, `arrow-YY` -- Arrow version selection (kernel tracks the latest two
+   major Arrow releases; `arrow` defaults to latest). Kernel itself does not depend on Arrow,
+-  but default-engine does.
++  but the default engine does.
+ - `arrow-conversion`, `arrow-expression` -- Arrow interop (auto-enabled by default engine)
+ - `prettyprint` -- enables Arrow pretty-print helpers (primarily test/example oriented)
+-- `catalog-managed` -- catalog-managed table support (experimental)
+ - `clustered-table` -- clustered table write support (experimental)
+ - `internal-api` -- unstable APIs like `parallel_scan_metadata`. Items are marked with the
+   `#[internal_api]` proc macro attribute.
+ `execute()` (simple), `scan_metadata()` (advanced/distributed),
+ `parallel_scan_metadata()` (two-phase distributed log replay).
+ 
+-**Write path:** `Snapshot` -> `Transaction` -> `commit()`. Kernel provides `WriteContext`,
+-assembles commit actions, enforces protocol compliance, delegates atomic commit to a
+-`Committer`.
++**Write path:** `Snapshot` -> `Transaction` -> `commit()`. Kernel provides `WriteContext`
++(via `partitioned_write_context` or `unpartitioned_write_context`), assembles commit
++actions, enforces protocol compliance, delegates atomic commit to a `Committer`.
+ 
+ **Engine trait:** five handlers (`StorageHandler`, `JsonHandler`, `ParquetHandler`,
+ `EvaluationHandler`, optional `MetricsReporter`). `DefaultEngine` lives in
+   or inputs. Prefer `#[case]` over duplicating test functions. When parameters are
+   independent and form a cartesian product, prefer `#[values]` over enumerating
+   every combination with `#[case]`.
++- Actively look for rstest consolidation opportunities: when writing multiple tests
++  that share the same setup/flow and differ only in configuration and expected
++  outcome, write one parameterized rstest instead of separate functions. Also check
++  whether a new test duplicates the flow of an existing nearby test and should be
++  merged into it as a new `#[case]`. A common pattern is toggling a feature (e.g.
++  column mapping on/off) and asserting success vs. error.
+ - Reuse helpers from `test_utils` instead of writing custom ones when possible.
++- **Committing in tests:** Use `txn.commit(engine)?.unwrap_committed()` to assert a
++  successful commit and get the `CommittedTransaction`. Do NOT use `match` + `panic!`
++  for this -- `unwrap_committed()` provides a clear error message on failure. Available
++  under `#[cfg(test)]` and the `test-utils` feature.
++- **Prefer snapshot/public API assertions over reading raw commit JSON.** Only read raw
++  commit JSON when the data is inaccessible via public API (e.g., system domain metadata
++  is blocked by `get_domain_metadata`). For commit JSON reads, use `read_actions_from_commit`
++  from `test_utils` -- do NOT write local helpers that duplicate this.
+ - **`add_commit` and table setup in tests:** `add_commit` takes a `table_root` string and
+   resolves it to an absolute object-store path. The `table_root` must be a proper URL string
+   with a trailing slash (e.g. `"memory:///"`, `"file:///tmp/my_table/"`). Avoid using the
+   `allowColumnDefaults`, `changeDataFeed`, `identityColumns`, `rowTracking`,
+   `domainMetadata`, `icebergCompatV1`, `icebergCompatV2`, `clustering`,
+   `inCommitTimestamp`
+-- Reader + writer: `columnMapping`, `deletionVectors`, `timestampNtz`,
+-  `v2Checkpoint`, `vacuumProtocolCheck`, `variantType`, `variantType-preview`,
+-  `typeWidening`
++- Reader + writer: `catalogManaged`, `catalogOwned-preview`, `columnMapping`,
++  `deletionVectors`, `timestampNtz`, `v2Checkpoint`, `vacuumProtocolCheck`,
++  `variantType`, `variantType-preview`, `typeWidening`
+ 
+ Keep this list updated when new protocol features are added to kernel.
+ 
+ - Code comments state intent and explain "why" -- don't restate what the code self-documents.
+ - Place `use` imports at the top of the file (for non-test code) or at the top of the
+   `mod tests` block (for test code) -- never inside function bodies.
++- Prefer `==` over `matches!` for simple single-variant enum comparisons. `matches!` is
++  for patterns with bindings or guards. For example: `self == Variant` not
++  `matches!(self, Variant)`.
++- Prefer `StructField::nullable` / `StructField::not_null` over
++  `StructField::new(name, type, bool)` when nullability is known at compile time.
++  Reserve `StructField::new` for cases where nullability is a runtime value.
+ - NEVER panic in production code -- use errors instead. Panicking
+   (including `unwrap()`, `expect()`, `panic!()`, `unreachable!()`, etc) is acceptable in test code only.
+ 
+ a newer (potentially compromised) transitive dependency. If `Cargo.lock` is out of sync with
+ `Cargo.toml`, the build fails immediately, forcing dependency changes to be explicit and
+ reviewable. See the top-level comment in `build.yml` for full rationale. Commands exempt from
+-`--locked`: `cargo fmt` (no dep resolution), `cargo msrv verify/show` (wrapper tool),
++`--locked`: `cargo +nightly fmt` (no dep resolution), `cargo msrv verify/show` (wrapper tool),
+ `cargo miri setup` (tooling setup).
+ 
+ Ensure that when writing any github action you are considering safety including thinking of
\ No newline at end of file
CLAUDE/architecture.md
@@ -0,0 +1,49 @@
+diff --git a/CLAUDE/architecture.md b/CLAUDE/architecture.md
+--- a/CLAUDE/architecture.md
++++ b/CLAUDE/architecture.md
+ 
+ Built via `Snapshot::builder_for(url).build(engine)` (latest version) or
+ `.at_version(v).build(engine)` (specific version). For catalog-managed tables,
+-`.with_log_tail(commits)` supplies recent unpublished commits from the catalog.
++`.with_log_tail(commits)` supplies recent unpublished commits from the catalog and
++`.with_max_catalog_version(v)` caps the snapshot at the latest catalog-ratified version.
+ 
+ **Snapshot loading internals:**
+ 1. **LogSegment** (`kernel/src/log_segment/`) -- discovers commits + checkpoints for the
+ 
+ `Snapshot` -> `Transaction` -> commit
+ 
+-The kernel coordinates the write transaction: it provides the write context (target directory,
+-physical schema, stats columns), assembles commit actions (CommitInfo, Add files), enforces
+-protocol compliance (table features, schema validation), and delegates the atomic commit to a
+-`Committer`.
++The kernel coordinates the write transaction: it provides the write context (validated partition
++values, recommended write directory, physical schema, stats columns), assembles commit
++actions (CommitInfo, Add files), enforces protocol compliance (table features, schema validation),
++and delegates the atomic commit to a `Committer`.
+ 
+ **Steps:**
+ 1. Create `Transaction` from a snapshot with a `Committer` (e.g. `FileSystemCommitter`)
+-2. Get `WriteContext` for target dir, physical schema, and stats columns
++2. Get `WriteContext` via `partitioned_write_context(values)` or `unpartitioned_write_context()`
+ 3. Write Parquet files (via engine), collect file metadata
+ 4. Register files via `txn.add_files(metadata)`
+ 5. Commit: returns `CommittedTransaction`, `ConflictedTransaction`, or `RetryableTransaction`
+ - `kernel/src/snapshot/` -- `Snapshot`, `SnapshotBuilder`, entry point for reads/writes
+ - `kernel/src/scan/` -- `Scan`, `ScanBuilder`, log replay, data skipping
+ - `kernel/src/transaction/` -- `Transaction`, `WriteContext`, `create_table` builder
++- `kernel/src/partition/` -- partition value validation, serialization, Hive-style path
++   encoding, URI encoding for `add.path`
+ - `kernel/src/committer/` -- `Committer` trait, `FileSystemCommitter`
+ - `kernel/src/log_segment/` -- log file discovery, Protocol/Metadata replay
+ - `kernel/src/log_replay.rs` -- file-action deduplication, `LogReplayProcessor` trait
+ 
+ Tables whose commits go through a catalog (e.g. Unity Catalog) instead of direct filesystem
+ writes. Kernel doesn't know about catalogs -- the catalog client provides a log tail via
+-`SnapshotBuilder::with_log_tail()` and a custom `Committer` for staging/ratifying/publishing
+-commits. Requires `catalog-managed` feature flag.
++`SnapshotBuilder::with_log_tail()`, caps the version via `with_max_catalog_version()`, and
++uses a custom `Committer` for staging/ratifying/publishing commits.
+ 
+ The `UCCommitter` (in the `delta-kernel-unity-catalog` crate) is the reference implementation of a catalog
+ committer for Unity Catalog. It stages commits to `_staged_commits/`, calls the UC commit API to
\ No newline at end of file
CONTRIBUTING.md
@@ -0,0 +1,19 @@
+diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
+--- a/CONTRIBUTING.md
++++ b/CONTRIBUTING.md
+    # build docs
+    cargo doc --workspace --all-features
+    # highly recommend editor that automatically formats, but in case you need to:
+-   cargo fmt
++   cargo +nightly fmt
+ 
+    # run more tests
+    cargo test --workspace --all-features -- --skip read_table_version_hdfs
+ #### General Tips
+ 
+ 1. When making your first PR, please read our contributor guidelines: https://github.com/delta-incubator/delta-kernel-rs/blob/main/CONTRIBUTING.md
+-2. Run `cargo t --all-features --all-targets` to get started testing, and run `cargo fmt`.
++2. Run `cargo t --all-features --all-targets` to get started testing, and run `cargo +nightly fmt`.
+ 3. Ensure you have added or run the appropriate tests for your PR.
+ 4. If the PR is unfinished, add '[WIP]' in your PR title, e.g., '[WIP] Your PR title ...'.
+ 5. Be sure to keep the PR description updated to reflect all changes.
\ No newline at end of file

... (truncated, output exceeded 60000 bytes)

Reproduce locally: git range-diff ac9dc19..dc5d29e 7866824..b8a88e5 | Disable: git config gitstack.push-range-diff false

@lorenarosati lorenarosati marked this pull request as draft April 27, 2026 18:51
@lorenarosati lorenarosati force-pushed the stack/schema-table-feat-geo branch from b8a88e5 to dc5d29e on April 28, 2026 17:54
Comment thread kernel/src/expressions/scalars.rs Outdated
_ => unreachable!(),
}
}
// Geometry/Geography are not valid partition column types, so there is no
Collaborator Author
Geo columns should not be partition columns

Collaborator
Wonder if we want to detect this when we (1) create a transaction, or (2) create a table?

Collaborator Author
Write path considerations like this should be out of scope for now. I checked, and geo appears to follow the same level of checks as other types that can't be partition values: they all error at parse_partition_value_raw (geo is a primitive, so it will go to parse_Scalar and error there).
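
For illustration, here is a minimal sketch of the kind of check being described: a scalar parser that has no partition-value representation for geo primitives and therefore errors on them. The enum, error type, and function name below are simplified stand-ins, not the kernel's actual definitions.

```rust
// Simplified stand-ins for illustration only -- not the kernel's real types.
#[allow(dead_code)]
#[derive(Debug)]
enum PrimitiveType {
    String,
    Integer,
    Geometry,  // the real geo types also carry a CRS, omitted here
    Geography,
}

#[derive(Debug)]
struct ParseError(String);

// Sketch of a parse_scalar-style function: geo primitives have no valid
// partition-value representation, so they fall through to an error arm.
fn parse_partition_scalar(raw: &str, data_type: &PrimitiveType) -> Result<String, ParseError> {
    match data_type {
        PrimitiveType::String => Ok(raw.to_string()),
        PrimitiveType::Integer => raw
            .parse::<i64>()
            .map(|v| v.to_string())
            .map_err(|e| ParseError(e.to_string())),
        // Geometry/Geography are not valid partition column types, so any
        // attempt to parse them as a partition value is an error.
        PrimitiveType::Geometry | PrimitiveType::Geography => Err(ParseError(format!(
            "{data_type:?} is not a valid partition column type"
        ))),
    }
}

fn main() {
    assert!(parse_partition_scalar("POINT(1 2)", &PrimitiveType::Geometry).is_err());
    assert!(parse_partition_scalar("42", &PrimitiveType::Integer).is_ok());
}
```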

Collaborator
> geo follows the same level of checks that other types that can't be partition values do

Not totally the same level of checks, it seems? Map, Array, and Variant as partition columns are rejected at create table time (validate_partition_columns), but geo types are not.
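
A hedged sketch of what closing that gap might look like: a create-table-time check that rejects geo alongside the other disallowed partition column types. The enum and the signature below are simplified assumptions that only echo the shape of a validate_partition_columns-style check, not the kernel's actual one.

```rust
// Illustrative only: a simplified validate_partition_columns-style check.
#[allow(dead_code)]
#[derive(Debug)]
enum DataType {
    Integer,
    Map,
    Array,
    Variant,
    Geometry,
    Geography,
}

fn validate_partition_columns(partition_columns: &[(&str, DataType)]) -> Result<(), String> {
    for (name, data_type) in partition_columns {
        match data_type {
            // Already rejected at create-table time today: Map, Array, Variant.
            // The suggestion in this thread: reject geo types here as well.
            DataType::Map | DataType::Array | DataType::Variant
            | DataType::Geometry | DataType::Geography => {
                return Err(format!(
                    "column '{name}' of type {data_type:?} cannot be a partition column"
                ));
            }
            _ => {}
        }
    }
    Ok(())
}

fn main() {
    assert!(validate_partition_columns(&[("id", DataType::Integer)]).is_ok());
    assert!(validate_partition_columns(&[("shape", DataType::Geometry)]).is_err());
}
```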

Comment thread kernel/src/transaction/stats_verifier.rs
/// the `geospatial` feature in both reader and writer features.
pub(crate) fn validate_geospatial_feature_support(tc: &TableConfiguration) -> DeltaResult<()> {
    let protocol = tc.protocol();
    if !protocol.has_table_feature(&TableFeature::GeospatialType) {
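
For context, one plausible completion of the check quoted above, with simplified stand-ins so the sketch is self-contained; the kernel's real TableConfiguration, Protocol, and error types are richer, and the actual body in this PR may differ.

```rust
// Simplified stand-ins so the sketch compiles on its own.
#[derive(PartialEq)]
enum TableFeature {
    GeospatialType,
}

struct Protocol {
    features: Vec<TableFeature>,
}

impl Protocol {
    fn has_table_feature(&self, feature: &TableFeature) -> bool {
        self.features.contains(feature)
    }
}

struct TableConfiguration {
    protocol: Protocol,
}

impl TableConfiguration {
    fn protocol(&self) -> &Protocol {
        &self.protocol
    }
}

type DeltaResult<T> = Result<T, String>;

/// Requires the `geospatial` feature before geo columns may be used; the error
/// message is illustrative, not the kernel's actual wording.
fn validate_geospatial_feature_support(tc: &TableConfiguration) -> DeltaResult<()> {
    let protocol = tc.protocol();
    if !protocol.has_table_feature(&TableFeature::GeospatialType) {
        return Err("geo columns require the geospatial reader/writer table feature".to_string());
    }
    Ok(())
}

fn main() {
    let tc = TableConfiguration {
        protocol: Protocol { features: vec![] },
    };
    assert!(validate_geospatial_feature_support(&tc).is_err());
}
```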
@lorenarosati
Collaborator Author

Range-diff: main (dc5d29e -> ea12da4)
.github/workflows/build.yml
@@ -0,0 +1,75 @@
+diff --git a/.github/workflows/build.yml b/.github/workflows/build.yml
+--- a/.github/workflows/build.yml
++++ b/.github/workflows/build.yml
+ # enforce the committed Cargo.lock. This prevents CI from silently resolving a newer
+ # (potentially compromised) dependency version. If Cargo.lock is out of sync with
+ # Cargo.toml, the build fails immediately. Any dependency change must be an explicit,
+-# reviewable update to Cargo.lock in the PR. Commands that skip --locked: cargo fmt
++# reviewable update to Cargo.lock in the PR. Commands that skip --locked: cargo +nightly fmt
+ # (no dep resolution), cargo msrv verify/show (wrapper tool), cargo miri setup (tooling).
+ #
+ # Swatinem/rust-cache caches the cargo registry and target directory (~450MB per job).
+     runs-on: ubuntu-latest
+     steps:
+       - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1
+-      - name: Install minimal stable with rustfmt
++      - name: Install nightly with rustfmt
+         uses: actions-rust-lang/setup-rust-toolchain@150fca883cd4034361b621bd4e6a9d34e5143606 # v1.15.4
+         with:
+           cache: false
++          toolchain: nightly
+           components: rustfmt
+       - name: format
+-        run: cargo fmt -- --check
++        run: cargo +nightly fmt -- --check
+ 
+   msrv:
+     runs-on: ubuntu-latest
+           pushd kernel
+           echo "Testing with $(cargo msrv show --output-format minimal)"
+           cargo +$(cargo msrv show --output-format minimal) nextest run --locked
++          cargo +$(cargo msrv show --output-format minimal) test --doc
+   docs:
+     runs-on: ubuntu-latest
+     env:
+           cmake ..
+           make
+           make test
++      - name: build and run create-table test
++        run: |
++          pushd ffi/examples/create-table
++          mkdir build
++          pushd build
++          cmake ..
++          make
++          make test
++      # NOTE: write-table's ctest seeds its target table by invoking the create-table
++      # binary, so create-table must be built first (its build/ dir is preserved by the
++      # preceding step and write-table's CMakeLists references it via a relative path).
++      - name: build and run write-table test
++        run: |
++          pushd ffi/examples/write-table
++          mkdir build
++          pushd build
++          cmake ..
++          make
++          make test
++      - name: build and run read-table-changes test
++        run: |
++          pushd ffi/examples/read-table-changes
++          mkdir build
++          pushd build
++          cmake ..
++          make
++          make test
+   miri:
+     name: "Miri (shard ${{ matrix.partition }}/3)"
+     runs-on: ubuntu-latest
+       - name: Install cargo-llvm-cov
+         uses: taiki-e/install-action@2d15d02e710b40b6332201aba6af30d595b5cd96 # cargo-llvm-cov
+       - name: Generate code coverage
+-        run: cargo llvm-cov --locked --all-features --workspace --codecov --output-path codecov.json -- --skip read_table_version_hdfs
++        run: cargo llvm-cov --locked --all-features --workspace --codecov --output-path codecov.json -- --skip read_table_version_hdfs --skip handle::tests::invalid_handle_code
+       - name: Upload coverage to Codecov
+         uses: codecov/codecov-action@1af58845a975a7985b0beb0cbe6fbbb71a41dbad # v5.5.3
+         with:
\ No newline at end of file
.github/workflows/pr-body-validator.yml
@@ -0,0 +1,27 @@
+diff --git a/.github/workflows/pr-body-validator.yml b/.github/workflows/pr-body-validator.yml
+new file mode 100644
+--- /dev/null
++++ b/.github/workflows/pr-body-validator.yml
++name: Validate PR Body
++
++on:
++  pull_request:
++    types: [opened, edited]
++  merge_group:
++
++jobs:
++  validate-body:
++    runs-on: ubuntu-latest
++    steps:
++      - name: Validate PR Body
++        shell: bash
++        env:
++          PR_BODY: ${{ github.event.pull_request.body }}
++        run: |
++          if LC_ALL=C grep -q '[^[:print:][:space:]]' <<< "$PR_BODY"; then
++            echo "PR body contains non-ascii characters. Please remove them."
++            exit 1
++          else
++            echo "PR body contains ascii characters only"
++          fi
++
\ No newline at end of file
CHANGELOG.md
@@ -0,0 +1,282 @@
+diff --git a/CHANGELOG.md b/CHANGELOG.md
+--- a/CHANGELOG.md
++++ b/CHANGELOG.md
+ # Changelog
+ 
++## [v0.21.0](https://github.com/delta-io/delta-kernel-rs/tree/v0.21.0/) (2026-04-10)
++
++[Full Changelog](https://github.com/delta-io/delta-kernel-rs/compare/v0.20.0...v0.21.0)
++
++
++### 🏗️ Breaking changes
++
++1. Add partitioned variant to DataLayout enum ([#2145])
++   - Adds `Partitioned` variant to `DataLayout` enum. Update match statements to handle the new variant.
++2. Add create many API to engine ([#2070])
++   - Adds `create_many` method to `ParquetHandler` trait. Implementors must add this method. See the trait rustdocs for details.
++3. Rename uc-catalog and uc-client crates ([#2136])
++   - `delta-kernel-uc-catalog` renamed to `delta-kernel-unity-catalog`. `delta-kernel-uc-client` renamed to `unity-catalog-delta-rest-client`. Update `Cargo.toml` dependencies accordingly.
++4. Checksum and checkpoint APIs return updated Snapshot ([#2182])
++   - `Snapshot::checkpoint()` and checksum APIs now return the updated `Snapshot`. Callers must handle the returned value.
++5. Add P&M to CommitMetadata and enforce committer/table type matching ([#2250])
++   - Enforces that committer type matches table type (catalog-managed vs path-based). Use appropriate committer for your table type.
++6. Add UCCommitter validation for catalog-managed tables ([#2254])
++   - `UCCommitter` now rejects commits to non-catalog-managed tables. Use `FileSystemCommitter` for path-based tables.
++7. Refactor snapshot FFI to use builder pattern and enable snapshot reuse ([#2255])
++   - FFI snapshot creation now uses builder pattern. Update FFI callers to use the new builder APIs.
++8. Make tags and remove partition values allow null values in map ([#2281])
++   - `tags` and `partitionValues` map values are now nullable. Update code that assumes non-null values.
++9. Better naming style for column mapping related functions/variables ([#2290])
++   - Renamed: `make_physical` to `to_physical_name`, `make_physical_struct` to `to_physical_schema`, `transform_struct_for_projection` to `projection_transform`. Update call sites.
++10. Remove the catalog-managed feature flag ([#2310])
++    - The `catalog-managed` feature flag is removed. Catalog-managed table support is now always available.
++11. Update snapshot.checkpoint API to return a CheckpointResult ([#2314])
++    - `Snapshot::checkpoint()` now returns `CheckpointResult` instead of `Snapshot`. Access the snapshot via `CheckpointResult::snapshot`.
++12. Remove old non-builder snapshot FFI functions ([#2318])
++    - Removed legacy FFI snapshot functions. Use the new builder-pattern FFI functions instead.
++13. Support version 0 (table creation) commits in UCCommitter ([#2247])
++    - Connectors using `UCCommitter` for table creation must now handle post-commit finalization via the UC create table API.
++14. Pass computed ICT to CommitMetadata instead of wall-clock time ([#2319])
++    - `CommitMetadata` now uses computed in-commit timestamp instead of wall-clock time. Callers relying on wall-clock timing should update accordingly.
++15. Upgrade to arrow-58 and object_store-13, drop arrow-56 support ([#2116])
++    - Minimum supported Arrow version is now arrow-57. Update your `Cargo.toml` if using `arrow-56` feature.
++16. Crc File Histogram Read and Write Support ([#2235])
++    - Adds `AddedHistogram` and `RemovedHistogram` fields to `FileStatsDelta` struct.
++17. Add ScanMetadataCompleted metric event ([#2236])
++    - Adds `ScanMetadataCompleted` variant to `MetricEvent` enum. Update metric reporters to handle the new variant.
++18. Instrument JSON and Parquet handler reads with MetricsReporter ([#2169])
++    - Adds `JsonReadCompleted` and `ParquetReadCompleted` variants to `MetricEvent` enum. Update metric reporters to handle new variants.
++19. New transform helpers for unary and binary children ([#2150])
++    - Removes public `CowExt` trait. Remove any usages of this trait.
++20. New mod transforms for expression and schema transforms ([#2077])
++    - Moves `SchemaTransform` and `ExpressionTransform` to new `transforms` module. Update import paths.
++21. Introduce object_store compat shim ([#2111])
++    - Renames `object_store` dependency to `object_store_12`. Update any direct references.
++22. Consolidate domain metadata reads through Snapshot ([#2065])
++    - Domain metadata reads now go through `Snapshot` methods. Update callers using old free functions.
++23. Don't read or write arrow schema in parquet files ([#2025])
++    - Parquet files no longer include arrow schema metadata. Code relying on this metadata must be updated.
++24. Rename include_stats_columns to include_all_stats_columns ([#1996])
++    - Renames `ScanBuilder::include_stats_columns()` to `ScanBuilder::include_all_stats_columns()`. Update call sites.
++
++### 🚀 Features / new APIs
++
++1. Add SQL -> Kernel predicate parser to benchmark framework ([#2099])
++2. Add observability metrics for scan log replay ([#1866])
++3. Filtered engine data visitor ([#1942])
++4. Trigger benchmarking with comments ([#2089])
++5. Unify data stats and partition values in DataSkippingFilter ([#1948])
++6. Download benchmark workloads from DAT release ([#2163])
++7. Add partitioned variant to DataLayout enum ([#2145])
++8. Expose table_properties in FFI via visit_table_properties ([#2196])
++9. Allow checkpoint stats properties in CREATE TABLE ([#2210])
++10. Add crc file histogram initial struct and methods ([#2212])
++11. BinaryPredicate evaluate expression with ArrowViewType. ([#2052])
++12. Add acceptance workloads testing harness ([#2092])
++13. Enable DeletionVectors table feature in CREATE TABLE ([#2245])
++14. Checksum and checkpoint APIs return updated Snapshot ([#2182])
++15. Adding ScanBuilder FFI functions for Scans ([#2237])
++16. Add CountingReporter and fix metrics forwarding ([#2166])
++17. Instrument JSON and Parquet handler reads with MetricsReporter ([#2169])
++18. Wire CountingReporter into workload benchmarks ([#2171])
++19. Add create many API to engine ([#2070])
++20. Add ScanMetadataCompleted metric event ([#2236])
++21. Allow AppendOnly, ChangeDataFeed, and TypeWidening in CREATE TABLE ([#2279])
++22. Support max timestamp stats for data skipping ([#2249])
++23. Add list with backward checkpoint scan ([#2174])
++24. Add Snapshot::get_timestamp ([#2266])
++25. Make tags  and remove partition values allow null values in map ([#2281])
++26. Support UC credential vending and S3 benchmarks ([#2109])
++27. Add catalogManaged to allowed features in CREATE TABLE ([#2293])
++28. Add catalog-managed table creation utilities ([#2203])
++29. Support version 0 (table creation) commits in UCCommitter ([#2247])
++30. Update snapshot.checkpoint API to return a CheckpointResult ([#2314])
++31. Cached checkpoint output schema ([#2270])
++32. Refactor snapshot FFI to use builder pattern and enable snapshot reuse ([#2255])
++33. Add P&M to CommitMetadata and enforce committer/table type matching ([#2250])
++34. Add UCCommitter validation for catalog-managed tables ([#2254])
++35. Crc File Histogram Read and Write Support ([#2235])
++36. Add FFI function to expose snapshot's timestamp ([#2274])
++37. Add FFI create table DDL functions ([#2296])
++38. Add FFI remove files DML functions ([#2297])
++39. Expose Protocol and Metadata as opaque FFI handle types ([#2260])
++40. Add FFI bindings for domain metadata write operations ([#2327])
++
++### 🐛 Bug Fixes
++
++1. Treat null literal as unknown in meta-predicate evaluation ([#2097])
++2. Update TokioBackgroundExecutor to join thread instead of detaching ([#2126])
++3. Use thread pools and multi-thread tokio executor in read metadata benchmark runner ([#2044])
++4. Emit null stats for all-null columns instead of omitting them ([#2187])
++5. Allow Date/Timestamp casting for stats_parsed compatibility ([#2074])
++6. Filter evaluator input schema ([#2195])
++7. SnapshotCompleted.total_duration now includes log segment loading ([#2183])
++8. Avoid creating empty stats schemas ([#2199])
++9. Prevent dual TLS crypto backends from reqwest default features ([#2178])
++10. Vendor and pin homebrew actions ([#2243])
++11. Validate min_reader/writer_version are at least 1 ([#2202])
++12. Preserve loaded LazyCrc during incremental snapshot updates ([#2211])
++13. Detect stats_parsed in multi-part V1 checkpoints ([#2214])
++14. Downgrade per-batch data skipping log from info to debug ([#2219])
++15. Unknown table features in feature list are "supported" ([#2159])
++16. Remove debug_assert_eq before require in scan evaluator row count checks ([#2262])
++17. Adopt checkpoint written later for same-version snapshot refresh ([#2143])
++18. Return error when parquet handler returns empty data for scan files ([#2261])
++19. Refactor benchmarking workflow to not require criterion compare action ([#2264])
++20. Skip name-based validation for struct columns in expression evaluator ([#2160])
++21. Handle missing leaf columns in nested struct during parquet projection ([#2170])
++22. Pass computed ICT to CommitMetadata instead of wall-clock time ([#2319])
++23. Detect and handle empty (0-byte) log files during listing ([#2336])
++
++### 📚 Documentation
++
++1. Update claude readme to include github actions safety note ([#2190])
++2. Add line width and comment divider style rules to CLAUDE.md ([#2277])
++3. Add documentation for current tags ([#2234])
++4. Document benchmarking in CI accuracy ([#2302])
++
++### ⚡ Performance
++
++1. Pre-size dedup HashSet in ScanLogReplayProcessor ([#2186])
++2. Pre-size HashMap in ArrowEngineData::visit_rows ([#2185])
++3. Remove dead schema conversions in expression evaluators ([#2184])
++
++### 🚜 Refactor
++
++1. Finalized benchmark table names and added new tables ([#2072])
++2. New transform helpers for unary and binary children ([#2150])
++3. Remove legacy row-level partition filter path ([#2158])
++4. Restructured list log files function ([#2173])
++5. Consolidate and add testing for set transaction expiration ([#2176])
++6. Rename uc-catalog and uc-client crates ([#2136])
++7. Better naming style for column mapping related functions/variables ([#2290])
++8. Centralize computation for physical schema without partition columns ([#2142])
++9. Consolidate FFI test setup helpers into ffi_test_utils ([#2307])
++10. *(action_reconciliation)* Combine getter index and field name constants ([#1717]) ([#1774])
++11. Extract shared stat helpers from RowGroupFilter ([#2324])
++12. Extract WriteContext to its own file ([#2349])
++
++### ⚙️ Chores/CI
++
++1. Clean up arrow deps in cargo files ([#2115])
++2. Commit Cargo.lock and enforce --locked in all CI workflows ([#2240])
++3. Harden pr-title-validator a bit ([#2246])
++4. Renable semver ([#2248])
++5. Attempt fixup of semver-label job ([#2253])
++6. Use artifacts for semver label ([#2258])
++7. Remove old non-builder snapshot FFI functions ([#2318])
++8. Remove the catalog-managed feature flag ([#2310])
++9. Upgrade to arrow-58 and object_store-13, drop arrow-56 support ([#2116])
++
++### Other
++
++[#2097]: https://github.com/delta-io/delta-kernel-rs/pull/2097
++[#2099]: https://github.com/delta-io/delta-kernel-rs/pull/2099
++[#2126]: https://github.com/delta-io/delta-kernel-rs/pull/2126
++[#2115]: https://github.com/delta-io/delta-kernel-rs/pull/2115
++[#1866]: https://github.com/delta-io/delta-kernel-rs/pull/1866
++[#2044]: https://github.com/delta-io/delta-kernel-rs/pull/2044
++[#1942]: https://github.com/delta-io/delta-kernel-rs/pull/1942
++[#2072]: https://github.com/delta-io/delta-kernel-rs/pull/2072
++[#2089]: https://github.com/delta-io/delta-kernel-rs/pull/2089
++[#2187]: https://github.com/delta-io/delta-kernel-rs/pull/2187
++[#2190]: https://github.com/delta-io/delta-kernel-rs/pull/2190
++[#1948]: https://github.com/delta-io/delta-kernel-rs/pull/1948
++[#2150]: https://github.com/delta-io/delta-kernel-rs/pull/2150
++[#2074]: https://github.com/delta-io/delta-kernel-rs/pull/2074
++[#2195]: https://github.com/delta-io/delta-kernel-rs/pull/2195
++[#2158]: https://github.com/delta-io/delta-kernel-rs/pull/2158
++[#2186]: https://github.com/delta-io/delta-kernel-rs/pull/2186
++[#2185]: https://github.com/delta-io/delta-kernel-rs/pull/2185
++[#2173]: https://github.com/delta-io/delta-kernel-rs/pull/2173
++[#2163]: https://github.com/delta-io/delta-kernel-rs/pull/2163
++[#2145]: https://github.com/delta-io/delta-kernel-rs/pull/2145
++[#2184]: https://github.com/delta-io/delta-kernel-rs/pull/2184
++[#2183]: https://github.com/delta-io/delta-kernel-rs/pull/2183
++[#2199]: https://github.com/delta-io/delta-kernel-rs/pull/2199
++[#2196]: https://github.com/delta-io/delta-kernel-rs/pull/2196
++[#2210]: https://github.com/delta-io/delta-kernel-rs/pull/2210
++[#2178]: https://github.com/delta-io/delta-kernel-rs/pull/2178
++[#2240]: https://github.com/delta-io/delta-kernel-rs/pull/2240
++[#2243]: https://github.com/delta-io/delta-kernel-rs/pull/2243
++[#2202]: https://github.com/delta-io/delta-kernel-rs/pull/2202
++[#2211]: https://github.com/delta-io/delta-kernel-rs/pull/2211
++[#2214]: https://github.com/delta-io/delta-kernel-rs/pull/2214
++[#2246]: https://github.com/delta-io/delta-kernel-rs/pull/2246
++[#2219]: https://github.com/delta-io/delta-kernel-rs/pull/2219
++[#2212]: https://github.com/delta-io/delta-kernel-rs/pull/2212
++[#2176]: https://github.com/delta-io/delta-kernel-rs/pull/2176
++[#2159]: https://github.com/delta-io/delta-kernel-rs/pull/2159
++[#2248]: https://github.com/delta-io/delta-kernel-rs/pull/2248
++[#2253]: https://github.com/delta-io/delta-kernel-rs/pull/2253
++[#2052]: https://github.com/delta-io/delta-kernel-rs/pull/2052
++[#2092]: https://github.com/delta-io/delta-kernel-rs/pull/2092
++[#2258]: https://github.com/delta-io/delta-kernel-rs/pull/2258
++[#2136]: https://github.com/delta-io/delta-kernel-rs/pull/2136
++[#2245]: https://github.com/delta-io/delta-kernel-rs/pull/2245
++[#2182]: https://github.com/delta-io/delta-kernel-rs/pull/2182
++[#2262]: https://github.com/delta-io/delta-kernel-rs/pull/2262
++[#2237]: https://github.com/delta-io/delta-kernel-rs/pull/2237
++[#2166]: https://github.com/delta-io/delta-kernel-rs/pull/2166
++[#2169]: https://github.com/delta-io/delta-kernel-rs/pull/2169
++[#2171]: https://github.com/delta-io/delta-kernel-rs/pull/2171
++[#2143]: https://github.com/delta-io/delta-kernel-rs/pull/2143
++[#2070]: https://github.com/delta-io/delta-kernel-rs/pull/2070
++[#2261]: https://github.com/delta-io/delta-kernel-rs/pull/2261
++[#2277]: https://github.com/delta-io/delta-kernel-rs/pull/2277
++[#2236]: https://github.com/delta-io/delta-kernel-rs/pull/2236
++[#2279]: https://github.com/delta-io/delta-kernel-rs/pull/2279
++[#2249]: https://github.com/delta-io/delta-kernel-rs/pull/2249
++[#2290]: https://github.com/delta-io/delta-kernel-rs/pull/2290
++[#2174]: https://github.com/delta-io/delta-kernel-rs/pull/2174
++[#2264]: https://github.com/delta-io/delta-kernel-rs/pull/2264
++[#2234]: https://github.com/delta-io/delta-kernel-rs/pull/2234
++[#2302]: https://github.com/delta-io/delta-kernel-rs/pull/2302
++[#2142]: https://github.com/delta-io/delta-kernel-rs/pull/2142
++[#2266]: https://github.com/delta-io/delta-kernel-rs/pull/2266
++[#2281]: https://github.com/delta-io/delta-kernel-rs/pull/2281
++[#2109]: https://github.com/delta-io/delta-kernel-rs/pull/2109
++[#2293]: https://github.com/delta-io/delta-kernel-rs/pull/2293
++[#2203]: https://github.com/delta-io/delta-kernel-rs/pull/2203
++[#2247]: https://github.com/delta-io/delta-kernel-rs/pull/2247
++[#2160]: https://github.com/delta-io/delta-kernel-rs/pull/2160
++[#2314]: https://github.com/delta-io/delta-kernel-rs/pull/2314
++[#2270]: https://github.com/delta-io/delta-kernel-rs/pull/2270
++[#2255]: https://github.com/delta-io/delta-kernel-rs/pull/2255
++[#2250]: https://github.com/delta-io/delta-kernel-rs/pull/2250
++[#2254]: https://github.com/delta-io/delta-kernel-rs/pull/2254
++[#2307]: https://github.com/delta-io/delta-kernel-rs/pull/2307
++[#2170]: https://github.com/delta-io/delta-kernel-rs/pull/2170
++[#2235]: https://github.com/delta-io/delta-kernel-rs/pull/2235
++[#2274]: https://github.com/delta-io/delta-kernel-rs/pull/2274
++[#1774]: https://github.com/delta-io/delta-kernel-rs/pull/1774
++[#2296]: https://github.com/delta-io/delta-kernel-rs/pull/2296
++[#2318]: https://github.com/delta-io/delta-kernel-rs/pull/2318
++[#2310]: https://github.com/delta-io/delta-kernel-rs/pull/2310
++[#2297]: https://github.com/delta-io/delta-kernel-rs/pull/2297
++[#2324]: https://github.com/delta-io/delta-kernel-rs/pull/2324
++[#2260]: https://github.com/delta-io/delta-kernel-rs/pull/2260
++[#2327]: https://github.com/delta-io/delta-kernel-rs/pull/2327
++[#2319]: https://github.com/delta-io/delta-kernel-rs/pull/2319
++[#2116]: https://github.com/delta-io/delta-kernel-rs/pull/2116
++[#2349]: https://github.com/delta-io/delta-kernel-rs/pull/2349
++[#2336]: https://github.com/delta-io/delta-kernel-rs/pull/2336
++[#2077]: https://github.com/delta-io/delta-kernel-rs/pull/2077                                                                                               
++[#2111]: https://github.com/delta-io/delta-kernel-rs/pull/2111                                                                                                 
++[#2065]: https://github.com/delta-io/delta-kernel-rs/pull/2065                                                                                               
++[#2025]: https://github.com/delta-io/delta-kernel-rs/pull/2025                                                                                               
++[#1996]: https://github.com/delta-io/delta-kernel-rs/pull/1996
++[#1717]: https://github.com/delta-io/delta-kernel-rs/pull/1717
++[#1922]: https://github.com/delta-io/delta-kernel-rs/pull/1922
++
+ ## [v0.20.0](https://github.com/delta-io/delta-kernel-rs/tree/v0.20.0/) (2026-02-26)
+ 
+ [Full Changelog](https://github.com/delta-io/delta-kernel-rs/compare/v0.19.2...v0.20.0)
+ 22. Implement schema diffing for flat schemas (2/5]) ([#1478])
+ 23. Add API on Scan to perform 2-phase log replay  ([#1547])
+ 24. Enable distributed log replay serde serialization for serializable scan state ([#1549])
+-25. Add InCommitTimestamp support to ChangeDataFeed ([#1670]) 
++25. Add InCommitTimestamp support to ChangeDataFeed ([#1670])
+ 26. Add include_stats_columns API and output_stats_schema field ([#1728])
+ 27. Add write support for clustered tables behind feature flag ([#1704])
+ 28. Add snapshot load instrumentation ([#1750])
\ No newline at end of file
CLAUDE.md
@@ -0,0 +1,108 @@
+diff --git a/CLAUDE.md b/CLAUDE.md
+--- a/CLAUDE.md
++++ b/CLAUDE.md
+ (`Snapshot`, `Scan`, `Transaction`) and delegates _how_ to the `Engine` trait.
+ 
+ Current capabilities: table reads with predicates, data skipping, deletion vectors, change
+-data feed, checkpoints (V1 & V2), log compaction, blind append writes, table creation
++data feed, checkpoints (V1 & V2), log compaction (disabled, #2337), blind append writes, table creation
+ (including clustered tables), and catalog-managed table support.
+ 
+ ## Build & Test Commands
+ cargo nextest run --workspace --all-features test_name_here
+ 
+ # Format, lint, and doc check (always run after code changes)
+-cargo fmt \
++cargo +nightly fmt \
+   && cargo clippy --workspace --benches --tests --all-features -- -D warnings \
+   && cargo doc --workspace --all-features --no-deps
+ 
+   --exclude delta_kernel --exclude delta_kernel_ffi --exclude delta_kernel_derive --exclude delta_kernel_ffi_macros -- -D warnings
+ 
+ # Quick pre-push check (mimics CI)
+-cargo fmt \
++cargo +nightly fmt \
+   && cargo clippy --workspace --benches --tests --all-features -- -D warnings \
+   && cargo doc --workspace --all-features --no-deps \
+   && cargo nextest run --workspace --all-features
+ 
+ ### Feature Flags
+ 
+-- `default-engine` / `default-engine-rustls` / `default-engine-native-tls` -- async
+-  Arrow/Tokio engine (pick one TLS backend)
++- `default-engine-rustls` / `default-engine-native-tls` -- async Arrow/Tokio engine (pick a TLS backend)
+ - `arrow`, `arrow-XX`, `arrow-YY` -- Arrow version selection (kernel tracks the latest two
+   major Arrow releases; `arrow` defaults to latest). Kernel itself does not depend on Arrow,
+-  but default-engine does.
++  but the default engine does.
+ - `arrow-conversion`, `arrow-expression` -- Arrow interop (auto-enabled by default engine)
+ - `prettyprint` -- enables Arrow pretty-print helpers (primarily test/example oriented)
+-- `catalog-managed` -- catalog-managed table support (experimental)
+ - `clustered-table` -- clustered table write support (experimental)
+ - `internal-api` -- unstable APIs like `parallel_scan_metadata`. Items are marked with the
+   `#[internal_api]` proc macro attribute.
+ `execute()` (simple), `scan_metadata()` (advanced/distributed),
+ `parallel_scan_metadata()` (two-phase distributed log replay).
+ 
+-**Write path:** `Snapshot` -> `Transaction` -> `commit()`. Kernel provides `WriteContext`,
+-assembles commit actions, enforces protocol compliance, delegates atomic commit to a
+-`Committer`.
++**Write path:** `Snapshot` -> `Transaction` -> `commit()`. Kernel provides `WriteContext`
++(via `partitioned_write_context` or `unpartitioned_write_context`), assembles commit
++actions, enforces protocol compliance, delegates atomic commit to a `Committer`.
+ 
+ **Engine trait:** five handlers (`StorageHandler`, `JsonHandler`, `ParquetHandler`,
+ `EvaluationHandler`, optional `MetricsReporter`). `DefaultEngine` lives in
+   or inputs. Prefer `#[case]` over duplicating test functions. When parameters are
+   independent and form a cartesian product, prefer `#[values]` over enumerating
+   every combination with `#[case]`.
++- Actively look for rstest consolidation opportunities: when writing multiple tests
++  that share the same setup/flow and differ only in configuration and expected
++  outcome, write one parameterized rstest instead of separate functions. Also check
++  whether a new test duplicates the flow of an existing nearby test and should be
++  merged into it as a new `#[case]`. A common pattern is toggling a feature (e.g.
++  column mapping on/off) and asserting success vs. error.
+ - Reuse helpers from `test_utils` instead of writing custom ones when possible.
++- **Committing in tests:** Use `txn.commit(engine)?.unwrap_committed()` to assert a
++  successful commit and get the `CommittedTransaction`. Do NOT use `match` + `panic!`
++  for this -- `unwrap_committed()` provides a clear error message on failure. Available
++  under `#[cfg(test)]` and the `test-utils` feature.
++- **Prefer snapshot/public API assertions over reading raw commit JSON.** Only read raw
++  commit JSON when the data is inaccessible via public API (e.g., system domain metadata
++  is blocked by `get_domain_metadata`). For commit JSON reads, use `read_actions_from_commit`
++  from `test_utils` -- do NOT write local helpers that duplicate this.
+ - **`add_commit` and table setup in tests:** `add_commit` takes a `table_root` string and
+   resolves it to an absolute object-store path. The `table_root` must be a proper URL string
+   with a trailing slash (e.g. `"memory:///"`, `"file:///tmp/my_table/"`). Avoid using the
+   `allowColumnDefaults`, `changeDataFeed`, `identityColumns`, `rowTracking`,
+   `domainMetadata`, `icebergCompatV1`, `icebergCompatV2`, `clustering`,
+   `inCommitTimestamp`
+-- Reader + writer: `columnMapping`, `deletionVectors`, `timestampNtz`,
+-  `v2Checkpoint`, `vacuumProtocolCheck`, `variantType`, `variantType-preview`,
+-  `typeWidening`
++- Reader + writer: `catalogManaged`, `catalogOwned-preview`, `columnMapping`,
++  `deletionVectors`, `timestampNtz`, `v2Checkpoint`, `vacuumProtocolCheck`,
++  `variantType`, `variantType-preview`, `typeWidening`
+ 
+ Keep this list updated when new protocol features are added to kernel.
+ 
+ - Code comments state intent and explain "why" -- don't restate what the code self-documents.
+ - Place `use` imports at the top of the file (for non-test code) or at the top of the
+   `mod tests` block (for test code) -- never inside function bodies.
++- Prefer `==` over `matches!` for simple single-variant enum comparisons. `matches!` is
++  for patterns with bindings or guards. For example: `self == Variant` not
++  `matches!(self, Variant)`.
++- Prefer `StructField::nullable` / `StructField::not_null` over
++  `StructField::new(name, type, bool)` when nullability is known at compile time.
++  Reserve `StructField::new` for cases where nullability is a runtime value.
+ - NEVER panic in production code -- use errors instead. Panicking
+   (including `unwrap()`, `expect()`, `panic!()`, `unreachable!()`, etc) is acceptable in test code only.
+ 
+ a newer (potentially compromised) transitive dependency. If `Cargo.lock` is out of sync with
+ `Cargo.toml`, the build fails immediately, forcing dependency changes to be explicit and
+ reviewable. See the top-level comment in `build.yml` for full rationale. Commands exempt from
+-`--locked`: `cargo fmt` (no dep resolution), `cargo msrv verify/show` (wrapper tool),
++`--locked`: `cargo +nightly fmt` (no dep resolution), `cargo msrv verify/show` (wrapper tool),
+ `cargo miri setup` (tooling setup).
+ 
+ Ensure that when writing any github action you are considering safety including thinking of
\ No newline at end of file
CLAUDE/architecture.md
@@ -0,0 +1,49 @@
+diff --git a/CLAUDE/architecture.md b/CLAUDE/architecture.md
+--- a/CLAUDE/architecture.md
++++ b/CLAUDE/architecture.md
+ 
+ Built via `Snapshot::builder_for(url).build(engine)` (latest version) or
+ `.at_version(v).build(engine)` (specific version). For catalog-managed tables,
+-`.with_log_tail(commits)` supplies recent unpublished commits from the catalog.
++`.with_log_tail(commits)` supplies recent unpublished commits from the catalog and
++`.with_max_catalog_version(v)` caps the snapshot at the latest catalog-ratified version.
+ 
+ **Snapshot loading internals:**
+ 1. **LogSegment** (`kernel/src/log_segment/`) -- discovers commits + checkpoints for the
+ 
+ `Snapshot` -> `Transaction` -> commit
+ 
+-The kernel coordinates the write transaction: it provides the write context (target directory,
+-physical schema, stats columns), assembles commit actions (CommitInfo, Add files), enforces
+-protocol compliance (table features, schema validation), and delegates the atomic commit to a
+-`Committer`.
++The kernel coordinates the write transaction: it provides the write context (validated partition
++values, recommended write directory, physical schema, stats columns), assembles commit
++actions (CommitInfo, Add files), enforces protocol compliance (table features, schema validation),
++and delegates the atomic commit to a `Committer`.
+ 
+ **Steps:**
+ 1. Create `Transaction` from a snapshot with a `Committer` (e.g. `FileSystemCommitter`)
+-2. Get `WriteContext` for target dir, physical schema, and stats columns
++2. Get `WriteContext` via `partitioned_write_context(values)` or `unpartitioned_write_context()`
+ 3. Write Parquet files (via engine), collect file metadata
+ 4. Register files via `txn.add_files(metadata)`
+ 5. Commit: returns `CommittedTransaction`, `ConflictedTransaction`, or `RetryableTransaction`
+ - `kernel/src/snapshot/` -- `Snapshot`, `SnapshotBuilder`, entry point for reads/writes
+ - `kernel/src/scan/` -- `Scan`, `ScanBuilder`, log replay, data skipping
+ - `kernel/src/transaction/` -- `Transaction`, `WriteContext`, `create_table` builder
++- `kernel/src/partition/` -- partition value validation, serialization, Hive-style path
++   encoding, URI encoding for `add.path`
+ - `kernel/src/committer/` -- `Committer` trait, `FileSystemCommitter`
+ - `kernel/src/log_segment/` -- log file discovery, Protocol/Metadata replay
+ - `kernel/src/log_replay.rs` -- file-action deduplication, `LogReplayProcessor` trait
+ 
+ Tables whose commits go through a catalog (e.g. Unity Catalog) instead of direct filesystem
+ writes. Kernel doesn't know about catalogs -- the catalog client provides a log tail via
+-`SnapshotBuilder::with_log_tail()` and a custom `Committer` for staging/ratifying/publishing
+-commits. Requires `catalog-managed` feature flag.
++`SnapshotBuilder::with_log_tail()`, caps the version via `with_max_catalog_version()`, and
++uses a custom `Committer` for staging/ratifying/publishing commits.
+ 
+ The `UCCommitter` (in the `delta-kernel-unity-catalog` crate) is the reference implementation of a catalog
+ committer for Unity Catalog. It stages commits to `_staged_commits/`, calls the UC commit API to
\ No newline at end of file
CONTRIBUTING.md
@@ -0,0 +1,19 @@
+diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
+--- a/CONTRIBUTING.md
++++ b/CONTRIBUTING.md
+    # build docs
+    cargo doc --workspace --all-features
+    # highly recommend editor that automatically formats, but in case you need to:
+-   cargo fmt
++   cargo +nightly fmt
+ 
+    # run more tests
+    cargo test --workspace --all-features -- --skip read_table_version_hdfs
+ #### General Tips
+ 
+ 1. When making your first PR, please read our contributor guidelines: https://github.com/delta-incubator/delta-kernel-rs/blob/main/CONTRIBUTING.md
+-2. Run `cargo t --all-features --all-targets` to get started testing, and run `cargo fmt`.
++2. Run `cargo t --all-features --all-targets` to get started testing, and run `cargo +nightly fmt`.
+ 3. Ensure you have added or run the appropriate tests for your PR.
+ 4. If the PR is unfinished, add '[WIP]' in your PR title, e.g., '[WIP] Your PR title ...'.
+ 5. Be sure to keep the PR description updated to reflect all changes.
\ No newline at end of file

... (truncated, output exceeded 60000 bytes)

Reproduce locally: git range-diff ac9dc19..dc5d29e 7866824..ea12da4 | Disable: git config gitstack.push-range-diff false
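For reference, the rstest consolidation guidance in the CLAUDE.md diff above amounts to the following shape -- a generic sketch with made-up `Config`/`run_write` stand-ins, not kernel code:

```rust
// Generic sketch of the pattern: one parameterized test covering the
// feature-on and feature-off configurations instead of two near-duplicate
// test functions. `Config` and `run_write` are hypothetical stand-ins.
use rstest::rstest;

struct Config {
    column_mapping: bool,
}

// Toy flow whose outcome depends on the toggled feature.
fn run_write(config: &Config) -> Result<(), String> {
    if config.column_mapping {
        Ok(())
    } else {
        Err("column mapping must be enabled for this write".to_string())
    }
}

#[rstest]
#[case::column_mapping_on(true, true)]
#[case::column_mapping_off(false, false)]
fn write_respects_column_mapping(#[case] column_mapping: bool, #[case] expect_ok: bool) {
    let config = Config { column_mapping };
    assert_eq!(run_write(&config).is_ok(), expect_ok);
}
```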

Comment thread kernel/src/schema/mod.rs Outdated
Ok(Self { srid, algorithm })
}

/// Creates a new GeographyType with the given SRID and the default edge interpolation
Collaborator Author

Mirrors the constructors available in the Java kernel.
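
For reference, a minimal self-contained sketch of the constructor pair this refers to (explicit algorithm vs. default algorithm). The SRID type, error type, and names here are illustrative assumptions, not the kernel's actual API:

```rust
// Illustrative sketch only -- the i32 SRID, the String error type, and the
// method names are assumptions; the real kernel API may differ.
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
pub enum EdgeInterpolationAlgorithm {
    #[default]
    Spherical,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct GeographyType {
    srid: i32,
    algorithm: EdgeInterpolationAlgorithm,
}

impl GeographyType {
    /// Creates a new GeographyType with the given SRID and edge interpolation algorithm.
    pub fn try_new(srid: i32, algorithm: EdgeInterpolationAlgorithm) -> Result<Self, String> {
        // Real code would validate the SRID here before constructing.
        Ok(Self { srid, algorithm })
    }

    /// Creates a new GeographyType with the given SRID and the default edge
    /// interpolation algorithm, mirroring the convenience constructor above.
    pub fn try_new_with_default_algorithm(srid: i32) -> Result<Self, String> {
        Self::try_new(srid, EdgeInterpolationAlgorithm::default())
    }
}
```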

Comment thread kernel/src/schema/mod.rs Outdated
.map(PrimitiveType::Decimal)
.map_err(serde::de::Error::custom)
}
"geometry" => Ok(PrimitiveType::Geometry(GeometryType::default())),
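For context, the quoted match arm maps the bare type name `geometry` (no parameters) to a geometry type with default parameters. A small self-contained sketch of that shape, using hypothetical stand-in types rather than the kernel's real serde visitor:

```rust
// Hypothetical stand-ins for illustration; the kernel's actual deserializer
// goes through a serde visitor rather than a free function like this.
#[derive(Debug, Default, PartialEq, Eq)]
pub struct GeometryType {
    srid: i32,
}

#[derive(Debug, PartialEq, Eq)]
pub enum PrimitiveType {
    String,
    Geometry(GeometryType),
}

fn parse_primitive(name: &str) -> Result<PrimitiveType, String> {
    match name {
        "string" => Ok(PrimitiveType::String),
        // A bare "geometry" with no parameters falls back to the default GeometryType.
        "geometry" => Ok(PrimitiveType::Geometry(GeometryType::default())),
        other => Err(format!("unknown primitive type: {other}")),
    }
}
```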

@lorenarosati
Collaborator Author

Range-diff: main (ea12da4 -> 14a713c)
.github/workflows/build.yml
@@ -0,0 +1,75 @@
+diff --git a/.github/workflows/build.yml b/.github/workflows/build.yml
+--- a/.github/workflows/build.yml
++++ b/.github/workflows/build.yml
+ # enforce the committed Cargo.lock. This prevents CI from silently resolving a newer
+ # (potentially compromised) dependency version. If Cargo.lock is out of sync with
+ # Cargo.toml, the build fails immediately. Any dependency change must be an explicit,
+-# reviewable update to Cargo.lock in the PR. Commands that skip --locked: cargo fmt
++# reviewable update to Cargo.lock in the PR. Commands that skip --locked: cargo +nightly fmt
+ # (no dep resolution), cargo msrv verify/show (wrapper tool), cargo miri setup (tooling).
+ #
+ # Swatinem/rust-cache caches the cargo registry and target directory (~450MB per job).
+     runs-on: ubuntu-latest
+     steps:
+       - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1
+-      - name: Install minimal stable with rustfmt
++      - name: Install nightly with rustfmt
+         uses: actions-rust-lang/setup-rust-toolchain@150fca883cd4034361b621bd4e6a9d34e5143606 # v1.15.4
+         with:
+           cache: false
++          toolchain: nightly
+           components: rustfmt
+       - name: format
+-        run: cargo fmt -- --check
++        run: cargo +nightly fmt -- --check
+ 
+   msrv:
+     runs-on: ubuntu-latest
+           pushd kernel
+           echo "Testing with $(cargo msrv show --output-format minimal)"
+           cargo +$(cargo msrv show --output-format minimal) nextest run --locked
++          cargo +$(cargo msrv show --output-format minimal) test --doc
+   docs:
+     runs-on: ubuntu-latest
+     env:
+           cmake ..
+           make
+           make test
++      - name: build and run create-table test
++        run: |
++          pushd ffi/examples/create-table
++          mkdir build
++          pushd build
++          cmake ..
++          make
++          make test
++      # NOTE: write-table's ctest seeds its target table by invoking the create-table
++      # binary, so create-table must be built first (its build/ dir is preserved by the
++      # preceding step and write-table's CMakeLists references it via a relative path).
++      - name: build and run write-table test
++        run: |
++          pushd ffi/examples/write-table
++          mkdir build
++          pushd build
++          cmake ..
++          make
++          make test
++      - name: build and run read-table-changes test
++        run: |
++          pushd ffi/examples/read-table-changes
++          mkdir build
++          pushd build
++          cmake ..
++          make
++          make test
+   miri:
+     name: "Miri (shard ${{ matrix.partition }}/3)"
+     runs-on: ubuntu-latest
+       - name: Install cargo-llvm-cov
+         uses: taiki-e/install-action@2d15d02e710b40b6332201aba6af30d595b5cd96 # cargo-llvm-cov
+       - name: Generate code coverage
+-        run: cargo llvm-cov --locked --all-features --workspace --codecov --output-path codecov.json -- --skip read_table_version_hdfs
++        run: cargo llvm-cov --locked --all-features --workspace --codecov --output-path codecov.json -- --skip read_table_version_hdfs --skip handle::tests::invalid_handle_code
+       - name: Upload coverage to Codecov
+         uses: codecov/codecov-action@1af58845a975a7985b0beb0cbe6fbbb71a41dbad # v5.5.3
+         with:
\ No newline at end of file
.github/workflows/pr-body-validator.yml
@@ -0,0 +1,27 @@
+diff --git a/.github/workflows/pr-body-validator.yml b/.github/workflows/pr-body-validator.yml
+new file mode 100644
+--- /dev/null
++++ b/.github/workflows/pr-body-validator.yml
++name: Validate PR Body
++
++on:
++  pull_request:
++    types: [opened, edited]
++  merge_group:
++
++jobs:
++  validate-body:
++    runs-on: ubuntu-latest
++    steps:
++      - name: Validate PR Body
++        shell: bash
++        env:
++          PR_BODY: ${{ github.event.pull_request.body }}
++        run: |
++          if LC_ALL=C grep -q '[^[:print:][:space:]]' <<< "$PR_BODY"; then
++            echo "PR body contains non-ascii characters. Please remove them."
++            exit 1
++          else
++            echo "PR body contains ascii characters only"
++          fi
++
\ No newline at end of file
CHANGELOG.md
@@ -0,0 +1,282 @@
+diff --git a/CHANGELOG.md b/CHANGELOG.md
+--- a/CHANGELOG.md
++++ b/CHANGELOG.md
+ # Changelog
+ 
++## [v0.21.0](https://github.com/delta-io/delta-kernel-rs/tree/v0.21.0/) (2026-04-10)
++
++[Full Changelog](https://github.com/delta-io/delta-kernel-rs/compare/v0.20.0...v0.21.0)
++
++
++### 🏗️ Breaking changes
++
++1. Add partitioned variant to DataLayout enum ([#2145])
++   - Adds `Partitioned` variant to `DataLayout` enum. Update match statements to handle the new variant.
++2. Add create many API to engine ([#2070])
++   - Adds `create_many` method to `ParquetHandler` trait. Implementors must add this method. See the trait rustdocs for details.
++3. Rename uc-catalog and uc-client crates ([#2136])
++   - `delta-kernel-uc-catalog` renamed to `delta-kernel-unity-catalog`. `delta-kernel-uc-client` renamed to `unity-catalog-delta-rest-client`. Update `Cargo.toml` dependencies accordingly.
++4. Checksum and checkpoint APIs return updated Snapshot ([#2182])
++   - `Snapshot::checkpoint()` and checksum APIs now return the updated `Snapshot`. Callers must handle the returned value.
++5. Add P&M to CommitMetadata and enforce committer/table type matching ([#2250])
++   - Enforces that committer type matches table type (catalog-managed vs path-based). Use appropriate committer for your table type.
++6. Add UCCommitter validation for catalog-managed tables ([#2254])
++   - `UCCommitter` now rejects commits to non-catalog-managed tables. Use `FileSystemCommitter` for path-based tables.
++7. Refactor snapshot FFI to use builder pattern and enable snapshot reuse ([#2255])
++   - FFI snapshot creation now uses builder pattern. Update FFI callers to use the new builder APIs.
++8. Make tags and remove partition values allow null values in map ([#2281])
++   - `tags` and `partitionValues` map values are now nullable. Update code that assumes non-null values.
++9. Better naming style for column mapping related functions/variables ([#2290])
++   - Renamed: `make_physical` to `to_physical_name`, `make_physical_struct` to `to_physical_schema`, `transform_struct_for_projection` to `projection_transform`. Update call sites.
++10. Remove the catalog-managed feature flag ([#2310])
++    - The `catalog-managed` feature flag is removed. Catalog-managed table support is now always available.
++11. Update snapshot.checkpoint API to return a CheckpointResult ([#2314])
++    - `Snapshot::checkpoint()` now returns `CheckpointResult` instead of `Snapshot`. Access the snapshot via `CheckpointResult::snapshot`.
++12. Remove old non-builder snapshot FFI functions ([#2318])
++    - Removed legacy FFI snapshot functions. Use the new builder-pattern FFI functions instead.
++13. Support version 0 (table creation) commits in UCCommitter ([#2247])
++    - Connectors using `UCCommitter` for table creation must now handle post-commit finalization via the UC create table API.
++14. Pass computed ICT to CommitMetadata instead of wall-clock time ([#2319])
++    - `CommitMetadata` now uses computed in-commit timestamp instead of wall-clock time. Callers relying on wall-clock timing should update accordingly.
++15. Upgrade to arrow-58 and object_store-13, drop arrow-56 support ([#2116])
++    - Minimum supported Arrow version is now arrow-57. Update your `Cargo.toml` if using `arrow-56` feature.
++16. Crc File Histogram Read and Write Support ([#2235])
++    - Adds `AddedHistogram` and `RemovedHistogram` fields to `FileStatsDelta` struct.
++17. Add ScanMetadataCompleted metric event ([#2236])
++    - Adds `ScanMetadataCompleted` variant to `MetricEvent` enum. Update metric reporters to handle the new variant.
++18. Instrument JSON and Parquet handler reads with MetricsReporter ([#2169])
++    - Adds `JsonReadCompleted` and `ParquetReadCompleted` variants to `MetricEvent` enum. Update metric reporters to handle new variants.
++19. New transform helpers for unary and binary children ([#2150])
++    - Removes public `CowExt` trait. Remove any usages of this trait.
++20. New mod transforms for expression and schema transforms ([#2077])
++    - Moves `SchemaTransform` and `ExpressionTransform` to new `transforms` module. Update import paths.
++21. Introduce object_store compat shim ([#2111])
++    - Renames `object_store` dependency to `object_store_12`. Update any direct references.
++22. Consolidate domain metadata reads through Snapshot ([#2065])
++    - Domain metadata reads now go through `Snapshot` methods. Update callers using old free functions.
++23. Don't read or write arrow schema in parquet files ([#2025])
++    - Parquet files no longer include arrow schema metadata. Code relying on this metadata must be updated.
++24. Rename include_stats_columns to include_all_stats_columns ([#1996])
++    - Renames `ScanBuilder::include_stats_columns()` to `ScanBuilder::include_all_stats_columns()`. Update call sites.
++
++### 🚀 Features / new APIs
++
++1. Add SQL -> Kernel predicate parser to benchmark framework ([#2099])
++2. Add observability metrics for scan log replay ([#1866])
++3. Filtered engine data visitor ([#1942])
++4. Trigger benchmarking with comments ([#2089])
++5. Unify data stats and partition values in DataSkippingFilter ([#1948])
++6. Download benchmark workloads from DAT release ([#2163])
++7. Add partitioned variant to DataLayout enum ([#2145])
++8. Expose table_properties in FFI via visit_table_properties ([#2196])
++9. Allow checkpoint stats properties in CREATE TABLE ([#2210])
++10. Add crc file histogram initial struct and methods ([#2212])
++11. BinaryPredicate evaluate expression with ArrowViewType. ([#2052])
++12. Add acceptance workloads testing harness ([#2092])
++13. Enable DeletionVectors table feature in CREATE TABLE ([#2245])
++14. Checksum and checkpoint APIs return updated Snapshot ([#2182])
++15. Adding ScanBuilder FFI functions for Scans ([#2237])
++16. Add CountingReporter and fix metrics forwarding ([#2166])
++17. Instrument JSON and Parquet handler reads with MetricsReporter ([#2169])
++18. Wire CountingReporter into workload benchmarks ([#2171])
++19. Add create many API to engine ([#2070])
++20. Add ScanMetadataCompleted metric event ([#2236])
++21. Allow AppendOnly, ChangeDataFeed, and TypeWidening in CREATE TABLE ([#2279])
++22. Support max timestamp stats for data skipping ([#2249])
++23. Add list with backward checkpoint scan ([#2174])
++24. Add Snapshot::get_timestamp ([#2266])
++25. Make tags  and remove partition values allow null values in map ([#2281])
++26. Support UC credential vending and S3 benchmarks ([#2109])
++27. Add catalogManaged to allowed features in CREATE TABLE ([#2293])
++28. Add catalog-managed table creation utilities ([#2203])
++29. Support version 0 (table creation) commits in UCCommitter ([#2247])
++30. Update snapshot.checkpoint API to return a CheckpointResult ([#2314])
++31. Cached checkpoint output schema ([#2270])
++32. Refactor snapshot FFI to use builder pattern and enable snapshot reuse ([#2255])
++33. Add P&M to CommitMetadata and enforce committer/table type matching ([#2250])
++34. Add UCCommitter validation for catalog-managed tables ([#2254])
++35. Crc File Histogram Read and Write Support ([#2235])
++36. Add FFI function to expose snapshot's timestamp ([#2274])
++37. Add FFI create table DDL functions ([#2296])
++38. Add FFI remove files DML functions ([#2297])
++39. Expose Protocol and Metadata as opaque FFI handle types ([#2260])
++40. Add FFI bindings for domain metadata write operations ([#2327])
++
++### 🐛 Bug Fixes
++
++1. Treat null literal as unknown in meta-predicate evaluation ([#2097])
++2. Update TokioBackgroundExecutor to join thread instead of detaching ([#2126])
++3. Use thread pools and multi-thread tokio executor in read metadata benchmark runner ([#2044])
++4. Emit null stats for all-null columns instead of omitting them ([#2187])
++5. Allow Date/Timestamp casting for stats_parsed compatibility ([#2074])
++6. Filter evaluator input schema ([#2195])
++7. SnapshotCompleted.total_duration now includes log segment loading ([#2183])
++8. Avoid creating empty stats schemas ([#2199])
++9. Prevent dual TLS crypto backends from reqwest default features ([#2178])
++10. Vendor and pin homebrew actions ([#2243])
++11. Validate min_reader/writer_version are at least 1 ([#2202])
++12. Preserve loaded LazyCrc during incremental snapshot updates ([#2211])
++13. Detect stats_parsed in multi-part V1 checkpoints ([#2214])
++14. Downgrade per-batch data skipping log from info to debug ([#2219])
++15. Unknown table features in feature list are "supported" ([#2159])
++16. Remove debug_assert_eq before require in scan evaluator row count checks ([#2262])
++17. Adopt checkpoint written later for same-version snapshot refresh ([#2143])
++18. Return error when parquet handler returns empty data for scan files ([#2261])
++19. Refactor benchmarking workflow to not require criterion compare action ([#2264])
++20. Skip name-based validation for struct columns in expression evaluator ([#2160])
++21. Handle missing leaf columns in nested struct during parquet projection ([#2170])
++22. Pass computed ICT to CommitMetadata instead of wall-clock time ([#2319])
++23. Detect and handle empty (0-byte) log files during listing ([#2336])
++
++### 📚 Documentation
++
++1. Update claude readme to include github actions safety note ([#2190])
++2. Add line width and comment divider style rules to CLAUDE.md ([#2277])
++3. Add documentation for current tags ([#2234])
++4. Document benchmarking in CI accuracy ([#2302])
++
++### ⚡ Performance
++
++1. Pre-size dedup HashSet in ScanLogReplayProcessor ([#2186])
++2. Pre-size HashMap in ArrowEngineData::visit_rows ([#2185])
++3. Remove dead schema conversions in expression evaluators ([#2184])
++
++### 🚜 Refactor
++
++1. Finalized benchmark table names and added new tables ([#2072])
++2. New transform helpers for unary and binary children ([#2150])
++3. Remove legacy row-level partition filter path ([#2158])
++4. Restructured list log files function ([#2173])
++5. Consolidate and add testing for set transaction expiration ([#2176])
++6. Rename uc-catalog and uc-client crates ([#2136])
++7. Better naming style for column mapping related functions/variables ([#2290])
++8. Centralize computation for physical schema without partition columns ([#2142])
++9. Consolidate FFI test setup helpers into ffi_test_utils ([#2307])
++10. *(action_reconciliation)* Combine getter index and field name constants ([#1717]) ([#1774])
++11. Extract shared stat helpers from RowGroupFilter ([#2324])
++12. Extract WriteContext to its own file ([#2349])
++
++### ⚙️ Chores/CI
++
++1. Clean up arrow deps in cargo files ([#2115])
++2. Commit Cargo.lock and enforce --locked in all CI workflows ([#2240])
++3. Harden pr-title-validator a bit ([#2246])
++4. Renable semver ([#2248])
++5. Attempt fixup of semver-label job ([#2253])
++6. Use artifacts for semver label ([#2258])
++7. Remove old non-builder snapshot FFI functions ([#2318])
++8. Remove the catalog-managed feature flag ([#2310])
++9. Upgrade to arrow-58 and object_store-13, drop arrow-56 support ([#2116])
++
++### Other
++
++[#2097]: https://github.com/delta-io/delta-kernel-rs/pull/2097
++[#2099]: https://github.com/delta-io/delta-kernel-rs/pull/2099
++[#2126]: https://github.com/delta-io/delta-kernel-rs/pull/2126
++[#2115]: https://github.com/delta-io/delta-kernel-rs/pull/2115
++[#1866]: https://github.com/delta-io/delta-kernel-rs/pull/1866
++[#2044]: https://github.com/delta-io/delta-kernel-rs/pull/2044
++[#1942]: https://github.com/delta-io/delta-kernel-rs/pull/1942
++[#2072]: https://github.com/delta-io/delta-kernel-rs/pull/2072
++[#2089]: https://github.com/delta-io/delta-kernel-rs/pull/2089
++[#2187]: https://github.com/delta-io/delta-kernel-rs/pull/2187
++[#2190]: https://github.com/delta-io/delta-kernel-rs/pull/2190
++[#1948]: https://github.com/delta-io/delta-kernel-rs/pull/1948
++[#2150]: https://github.com/delta-io/delta-kernel-rs/pull/2150
++[#2074]: https://github.com/delta-io/delta-kernel-rs/pull/2074
++[#2195]: https://github.com/delta-io/delta-kernel-rs/pull/2195
++[#2158]: https://github.com/delta-io/delta-kernel-rs/pull/2158
++[#2186]: https://github.com/delta-io/delta-kernel-rs/pull/2186
++[#2185]: https://github.com/delta-io/delta-kernel-rs/pull/2185
++[#2173]: https://github.com/delta-io/delta-kernel-rs/pull/2173
++[#2163]: https://github.com/delta-io/delta-kernel-rs/pull/2163
++[#2145]: https://github.com/delta-io/delta-kernel-rs/pull/2145
++[#2184]: https://github.com/delta-io/delta-kernel-rs/pull/2184
++[#2183]: https://github.com/delta-io/delta-kernel-rs/pull/2183
++[#2199]: https://github.com/delta-io/delta-kernel-rs/pull/2199
++[#2196]: https://github.com/delta-io/delta-kernel-rs/pull/2196
++[#2210]: https://github.com/delta-io/delta-kernel-rs/pull/2210
++[#2178]: https://github.com/delta-io/delta-kernel-rs/pull/2178
++[#2240]: https://github.com/delta-io/delta-kernel-rs/pull/2240
++[#2243]: https://github.com/delta-io/delta-kernel-rs/pull/2243
++[#2202]: https://github.com/delta-io/delta-kernel-rs/pull/2202
++[#2211]: https://github.com/delta-io/delta-kernel-rs/pull/2211
++[#2214]: https://github.com/delta-io/delta-kernel-rs/pull/2214
++[#2246]: https://github.com/delta-io/delta-kernel-rs/pull/2246
++[#2219]: https://github.com/delta-io/delta-kernel-rs/pull/2219
++[#2212]: https://github.com/delta-io/delta-kernel-rs/pull/2212
++[#2176]: https://github.com/delta-io/delta-kernel-rs/pull/2176
++[#2159]: https://github.com/delta-io/delta-kernel-rs/pull/2159
++[#2248]: https://github.com/delta-io/delta-kernel-rs/pull/2248
++[#2253]: https://github.com/delta-io/delta-kernel-rs/pull/2253
++[#2052]: https://github.com/delta-io/delta-kernel-rs/pull/2052
++[#2092]: https://github.com/delta-io/delta-kernel-rs/pull/2092
++[#2258]: https://github.com/delta-io/delta-kernel-rs/pull/2258
++[#2136]: https://github.com/delta-io/delta-kernel-rs/pull/2136
++[#2245]: https://github.com/delta-io/delta-kernel-rs/pull/2245
++[#2182]: https://github.com/delta-io/delta-kernel-rs/pull/2182
++[#2262]: https://github.com/delta-io/delta-kernel-rs/pull/2262
++[#2237]: https://github.com/delta-io/delta-kernel-rs/pull/2237
++[#2166]: https://github.com/delta-io/delta-kernel-rs/pull/2166
++[#2169]: https://github.com/delta-io/delta-kernel-rs/pull/2169
++[#2171]: https://github.com/delta-io/delta-kernel-rs/pull/2171
++[#2143]: https://github.com/delta-io/delta-kernel-rs/pull/2143
++[#2070]: https://github.com/delta-io/delta-kernel-rs/pull/2070
++[#2261]: https://github.com/delta-io/delta-kernel-rs/pull/2261
++[#2277]: https://github.com/delta-io/delta-kernel-rs/pull/2277
++[#2236]: https://github.com/delta-io/delta-kernel-rs/pull/2236
++[#2279]: https://github.com/delta-io/delta-kernel-rs/pull/2279
++[#2249]: https://github.com/delta-io/delta-kernel-rs/pull/2249
++[#2290]: https://github.com/delta-io/delta-kernel-rs/pull/2290
++[#2174]: https://github.com/delta-io/delta-kernel-rs/pull/2174
++[#2264]: https://github.com/delta-io/delta-kernel-rs/pull/2264
++[#2234]: https://github.com/delta-io/delta-kernel-rs/pull/2234
++[#2302]: https://github.com/delta-io/delta-kernel-rs/pull/2302
++[#2142]: https://github.com/delta-io/delta-kernel-rs/pull/2142
++[#2266]: https://github.com/delta-io/delta-kernel-rs/pull/2266
++[#2281]: https://github.com/delta-io/delta-kernel-rs/pull/2281
++[#2109]: https://github.com/delta-io/delta-kernel-rs/pull/2109
++[#2293]: https://github.com/delta-io/delta-kernel-rs/pull/2293
++[#2203]: https://github.com/delta-io/delta-kernel-rs/pull/2203
++[#2247]: https://github.com/delta-io/delta-kernel-rs/pull/2247
++[#2160]: https://github.com/delta-io/delta-kernel-rs/pull/2160
++[#2314]: https://github.com/delta-io/delta-kernel-rs/pull/2314
++[#2270]: https://github.com/delta-io/delta-kernel-rs/pull/2270
++[#2255]: https://github.com/delta-io/delta-kernel-rs/pull/2255
++[#2250]: https://github.com/delta-io/delta-kernel-rs/pull/2250
++[#2254]: https://github.com/delta-io/delta-kernel-rs/pull/2254
++[#2307]: https://github.com/delta-io/delta-kernel-rs/pull/2307
++[#2170]: https://github.com/delta-io/delta-kernel-rs/pull/2170
++[#2235]: https://github.com/delta-io/delta-kernel-rs/pull/2235
++[#2274]: https://github.com/delta-io/delta-kernel-rs/pull/2274
++[#1774]: https://github.com/delta-io/delta-kernel-rs/pull/1774
++[#2296]: https://github.com/delta-io/delta-kernel-rs/pull/2296
++[#2318]: https://github.com/delta-io/delta-kernel-rs/pull/2318
++[#2310]: https://github.com/delta-io/delta-kernel-rs/pull/2310
++[#2297]: https://github.com/delta-io/delta-kernel-rs/pull/2297
++[#2324]: https://github.com/delta-io/delta-kernel-rs/pull/2324
++[#2260]: https://github.com/delta-io/delta-kernel-rs/pull/2260
++[#2327]: https://github.com/delta-io/delta-kernel-rs/pull/2327
++[#2319]: https://github.com/delta-io/delta-kernel-rs/pull/2319
++[#2116]: https://github.com/delta-io/delta-kernel-rs/pull/2116
++[#2349]: https://github.com/delta-io/delta-kernel-rs/pull/2349
++[#2336]: https://github.com/delta-io/delta-kernel-rs/pull/2336
++[#2077]: https://github.com/delta-io/delta-kernel-rs/pull/2077                                                                                               
++[#2111]: https://github.com/delta-io/delta-kernel-rs/pull/2111                                                                                                 
++[#2065]: https://github.com/delta-io/delta-kernel-rs/pull/2065                                                                                               
++[#2025]: https://github.com/delta-io/delta-kernel-rs/pull/2025                                                                                               
++[#1996]: https://github.com/delta-io/delta-kernel-rs/pull/1996
++[#1717]: https://github.com/delta-io/delta-kernel-rs/pull/1717
++[#1922]: https://github.com/delta-io/delta-kernel-rs/pull/1922
++
+ ## [v0.20.0](https://github.com/delta-io/delta-kernel-rs/tree/v0.20.0/) (2026-02-26)
+ 
+ [Full Changelog](https://github.com/delta-io/delta-kernel-rs/compare/v0.19.2...v0.20.0)
+ 22. Implement schema diffing for flat schemas (2/5]) ([#1478])
+ 23. Add API on Scan to perform 2-phase log replay  ([#1547])
+ 24. Enable distributed log replay serde serialization for serializable scan state ([#1549])
+-25. Add InCommitTimestamp support to ChangeDataFeed ([#1670]) 
++25. Add InCommitTimestamp support to ChangeDataFeed ([#1670])
+ 26. Add include_stats_columns API and output_stats_schema field ([#1728])
+ 27. Add write support for clustered tables behind feature flag ([#1704])
+ 28. Add snapshot load instrumentation ([#1750])
\ No newline at end of file

... (truncated, output exceeded 60000 bytes)

Reproduce locally: git range-diff ac9dc19..ea12da4 7866824..14a713c | Disable: git config gitstack.push-range-diff false
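
The PR-body validator added in the range-diff above rejects bodies containing bytes that are neither printable nor whitespace under the C locale. A rough Rust equivalent of that check, for illustration only (the actual CI check is the bash grep shown in the workflow):

```rust
// Rough Rust equivalent of the workflow's
// `LC_ALL=C grep -q '[^[:print:][:space:]]'` check: flag any byte that is
// neither printable ASCII nor ASCII whitespace.
fn body_is_ascii_clean(body: &str) -> bool {
    body.bytes()
        .all(|b| b.is_ascii_graphic() || b.is_ascii_whitespace())
}

fn main() {
    assert!(body_is_ascii_clean("Plain ASCII body,\nwith newlines and spaces."));
    assert!(!body_is_ascii_clean("Smart quote: \u{2019}"));
}
```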

@lorenarosati
Collaborator Author

Range-diff: main (14a713c -> 68b8cd8)
++[#2262]: https://github.com/delta-io/delta-kernel-rs/pull/2262
++[#2237]: https://github.com/delta-io/delta-kernel-rs/pull/2237
++[#2166]: https://github.com/delta-io/delta-kernel-rs/pull/2166
++[#2169]: https://github.com/delta-io/delta-kernel-rs/pull/2169
++[#2171]: https://github.com/delta-io/delta-kernel-rs/pull/2171
++[#2143]: https://github.com/delta-io/delta-kernel-rs/pull/2143
++[#2070]: https://github.com/delta-io/delta-kernel-rs/pull/2070
++[#2261]: https://github.com/delta-io/delta-kernel-rs/pull/2261
++[#2277]: https://github.com/delta-io/delta-kernel-rs/pull/2277
++[#2236]: https://github.com/delta-io/delta-kernel-rs/pull/2236
++[#2279]: https://github.com/delta-io/delta-kernel-rs/pull/2279
++[#2249]: https://github.com/delta-io/delta-kernel-rs/pull/2249
++[#2290]: https://github.com/delta-io/delta-kernel-rs/pull/2290
++[#2174]: https://github.com/delta-io/delta-kernel-rs/pull/2174
++[#2264]: https://github.com/delta-io/delta-kernel-rs/pull/2264
++[#2234]: https://github.com/delta-io/delta-kernel-rs/pull/2234
++[#2302]: https://github.com/delta-io/delta-kernel-rs/pull/2302
++[#2142]: https://github.com/delta-io/delta-kernel-rs/pull/2142
++[#2266]: https://github.com/delta-io/delta-kernel-rs/pull/2266
++[#2281]: https://github.com/delta-io/delta-kernel-rs/pull/2281
++[#2109]: https://github.com/delta-io/delta-kernel-rs/pull/2109
++[#2293]: https://github.com/delta-io/delta-kernel-rs/pull/2293
++[#2203]: https://github.com/delta-io/delta-kernel-rs/pull/2203
++[#2247]: https://github.com/delta-io/delta-kernel-rs/pull/2247
++[#2160]: https://github.com/delta-io/delta-kernel-rs/pull/2160
++[#2314]: https://github.com/delta-io/delta-kernel-rs/pull/2314
++[#2270]: https://github.com/delta-io/delta-kernel-rs/pull/2270
++[#2255]: https://github.com/delta-io/delta-kernel-rs/pull/2255
++[#2250]: https://github.com/delta-io/delta-kernel-rs/pull/2250
++[#2254]: https://github.com/delta-io/delta-kernel-rs/pull/2254
++[#2307]: https://github.com/delta-io/delta-kernel-rs/pull/2307
++[#2170]: https://github.com/delta-io/delta-kernel-rs/pull/2170
++[#2235]: https://github.com/delta-io/delta-kernel-rs/pull/2235
++[#2274]: https://github.com/delta-io/delta-kernel-rs/pull/2274
++[#1774]: https://github.com/delta-io/delta-kernel-rs/pull/1774
++[#2296]: https://github.com/delta-io/delta-kernel-rs/pull/2296
++[#2318]: https://github.com/delta-io/delta-kernel-rs/pull/2318
++[#2310]: https://github.com/delta-io/delta-kernel-rs/pull/2310
++[#2297]: https://github.com/delta-io/delta-kernel-rs/pull/2297
++[#2324]: https://github.com/delta-io/delta-kernel-rs/pull/2324
++[#2260]: https://github.com/delta-io/delta-kernel-rs/pull/2260
++[#2327]: https://github.com/delta-io/delta-kernel-rs/pull/2327
++[#2319]: https://github.com/delta-io/delta-kernel-rs/pull/2319
++[#2116]: https://github.com/delta-io/delta-kernel-rs/pull/2116
++[#2349]: https://github.com/delta-io/delta-kernel-rs/pull/2349
++[#2336]: https://github.com/delta-io/delta-kernel-rs/pull/2336
++[#2077]: https://github.com/delta-io/delta-kernel-rs/pull/2077
++[#2111]: https://github.com/delta-io/delta-kernel-rs/pull/2111
++[#2065]: https://github.com/delta-io/delta-kernel-rs/pull/2065
++[#2025]: https://github.com/delta-io/delta-kernel-rs/pull/2025
++[#1996]: https://github.com/delta-io/delta-kernel-rs/pull/1996
++[#1717]: https://github.com/delta-io/delta-kernel-rs/pull/1717
++[#1922]: https://github.com/delta-io/delta-kernel-rs/pull/1922
++
+ ## [v0.20.0](https://github.com/delta-io/delta-kernel-rs/tree/v0.20.0/) (2026-02-26)
+ 
+ [Full Changelog](https://github.com/delta-io/delta-kernel-rs/compare/v0.19.2...v0.20.0)
+ 22. Implement schema diffing for flat schemas (2/5]) ([#1478])
+ 23. Add API on Scan to perform 2-phase log replay  ([#1547])
+ 24. Enable distributed log replay serde serialization for serializable scan state ([#1549])
+-25. Add InCommitTimestamp support to ChangeDataFeed ([#1670]) 
++25. Add InCommitTimestamp support to ChangeDataFeed ([#1670])
+ 26. Add include_stats_columns API and output_stats_schema field ([#1728])
+ 27. Add write support for clustered tables behind feature flag ([#1704])
+ 28. Add snapshot load instrumentation ([#1750])
\ No newline at end of file
CLAUDE.md
@@ -0,0 +1,108 @@
+diff --git a/CLAUDE.md b/CLAUDE.md
+--- a/CLAUDE.md
++++ b/CLAUDE.md
+ (`Snapshot`, `Scan`, `Transaction`) and delegates _how_ to the `Engine` trait.
+ 
+ Current capabilities: table reads with predicates, data skipping, deletion vectors, change
+-data feed, checkpoints (V1 & V2), log compaction, blind append writes, table creation
++data feed, checkpoints (V1 & V2), log compaction (disabled, #2337), blind append writes, table creation
+ (including clustered tables), and catalog-managed table support.
+ 
+ ## Build & Test Commands
+ cargo nextest run --workspace --all-features test_name_here
+ 
+ # Format, lint, and doc check (always run after code changes)
+-cargo fmt \
++cargo +nightly fmt \
+   && cargo clippy --workspace --benches --tests --all-features -- -D warnings \
+   && cargo doc --workspace --all-features --no-deps
+ 
+   --exclude delta_kernel --exclude delta_kernel_ffi --exclude delta_kernel_derive --exclude delta_kernel_ffi_macros -- -D warnings
+ 
+ # Quick pre-push check (mimics CI)
+-cargo fmt \
++cargo +nightly fmt \
+   && cargo clippy --workspace --benches --tests --all-features -- -D warnings \
+   && cargo doc --workspace --all-features --no-deps \
+   && cargo nextest run --workspace --all-features
+ 
+ ### Feature Flags
+ 
+-- `default-engine` / `default-engine-rustls` / `default-engine-native-tls` -- async
+-  Arrow/Tokio engine (pick one TLS backend)
++- `default-engine-rustls` / `default-engine-native-tls` -- async Arrow/Tokio engine (pick a TLS backend)
+ - `arrow`, `arrow-XX`, `arrow-YY` -- Arrow version selection (kernel tracks the latest two
+   major Arrow releases; `arrow` defaults to latest). Kernel itself does not depend on Arrow,
+-  but default-engine does.
++  but the default engine does.
+ - `arrow-conversion`, `arrow-expression` -- Arrow interop (auto-enabled by default engine)
+ - `prettyprint` -- enables Arrow pretty-print helpers (primarily test/example oriented)
+-- `catalog-managed` -- catalog-managed table support (experimental)
+ - `clustered-table` -- clustered table write support (experimental)
+ - `internal-api` -- unstable APIs like `parallel_scan_metadata`. Items are marked with the
+   `#[internal_api]` proc macro attribute.
+ `execute()` (simple), `scan_metadata()` (advanced/distributed),
+ `parallel_scan_metadata()` (two-phase distributed log replay).
+ 
+-**Write path:** `Snapshot` -> `Transaction` -> `commit()`. Kernel provides `WriteContext`,
+-assembles commit actions, enforces protocol compliance, delegates atomic commit to a
+-`Committer`.
++**Write path:** `Snapshot` -> `Transaction` -> `commit()`. Kernel provides `WriteContext`
++(via `partitioned_write_context` or `unpartitioned_write_context`), assembles commit
++actions, enforces protocol compliance, delegates atomic commit to a `Committer`.
+ 
+ **Engine trait:** five handlers (`StorageHandler`, `JsonHandler`, `ParquetHandler`,
+ `EvaluationHandler`, optional `MetricsReporter`). `DefaultEngine` lives in
+   or inputs. Prefer `#[case]` over duplicating test functions. When parameters are
+   independent and form a cartesian product, prefer `#[values]` over enumerating
+   every combination with `#[case]`.
++- Actively look for rstest consolidation opportunities: when writing multiple tests
++  that share the same setup/flow and differ only in configuration and expected
++  outcome, write one parameterized rstest instead of separate functions. Also check
++  whether a new test duplicates the flow of an existing nearby test and should be
++  merged into it as a new `#[case]`. A common pattern is toggling a feature (e.g.
++  column mapping on/off) and asserting success vs. error.
+ - Reuse helpers from `test_utils` instead of writing custom ones when possible.
++- **Committing in tests:** Use `txn.commit(engine)?.unwrap_committed()` to assert a
++  successful commit and get the `CommittedTransaction`. Do NOT use `match` + `panic!`
++  for this -- `unwrap_committed()` provides a clear error message on failure. Available
++  under `#[cfg(test)]` and the `test-utils` feature.
++- **Prefer snapshot/public API assertions over reading raw commit JSON.** Only read raw
++  commit JSON when the data is inaccessible via public API (e.g., system domain metadata
++  is blocked by `get_domain_metadata`). For commit JSON reads, use `read_actions_from_commit`
++  from `test_utils` -- do NOT write local helpers that duplicate this.
+ - **`add_commit` and table setup in tests:** `add_commit` takes a `table_root` string and
+   resolves it to an absolute object-store path. The `table_root` must be a proper URL string
+   with a trailing slash (e.g. `"memory:///"`, `"file:///tmp/my_table/"`). Avoid using the
+   `allowColumnDefaults`, `changeDataFeed`, `identityColumns`, `rowTracking`,
+   `domainMetadata`, `icebergCompatV1`, `icebergCompatV2`, `clustering`,
+   `inCommitTimestamp`
+-- Reader + writer: `columnMapping`, `deletionVectors`, `timestampNtz`,
+-  `v2Checkpoint`, `vacuumProtocolCheck`, `variantType`, `variantType-preview`,
+-  `typeWidening`
++- Reader + writer: `catalogManaged`, `catalogOwned-preview`, `columnMapping`,
++  `deletionVectors`, `timestampNtz`, `v2Checkpoint`, `vacuumProtocolCheck`,
++  `variantType`, `variantType-preview`, `typeWidening`
+ 
+ Keep this list updated when new protocol features are added to kernel.
+ 
+ - Code comments state intent and explain "why" -- don't restate what the code self-documents.
+ - Place `use` imports at the top of the file (for non-test code) or at the top of the
+   `mod tests` block (for test code) -- never inside function bodies.
++- Prefer `==` over `matches!` for simple single-variant enum comparisons. `matches!` is
++  for patterns with bindings or guards. For example: `self == Variant` not
++  `matches!(self, Variant)`.
++- Prefer `StructField::nullable` / `StructField::not_null` over
++  `StructField::new(name, type, bool)` when nullability is known at compile time.
++  Reserve `StructField::new` for cases where nullability is a runtime value.
+ - NEVER panic in production code -- use errors instead. Panicking
+   (including `unwrap()`, `expect()`, `panic!()`, `unreachable!()`, etc) is acceptable in test code only.
+ 
+ a newer (potentially compromised) transitive dependency. If `Cargo.lock` is out of sync with
+ `Cargo.toml`, the build fails immediately, forcing dependency changes to be explicit and
+ reviewable. See the top-level comment in `build.yml` for full rationale. Commands exempt from
+-`--locked`: `cargo fmt` (no dep resolution), `cargo msrv verify/show` (wrapper tool),
++`--locked`: `cargo +nightly fmt` (no dep resolution), `cargo msrv verify/show` (wrapper tool),
+ `cargo miri setup` (tooling setup).
+ 
+ Ensure that when writing any github action you are considering safety including thinking of
\ No newline at end of file
CLAUDE/architecture.md
@@ -0,0 +1,49 @@
+diff --git a/CLAUDE/architecture.md b/CLAUDE/architecture.md
+--- a/CLAUDE/architecture.md
++++ b/CLAUDE/architecture.md
+ 
+ Built via `Snapshot::builder_for(url).build(engine)` (latest version) or
+ `.at_version(v).build(engine)` (specific version). For catalog-managed tables,
+-`.with_log_tail(commits)` supplies recent unpublished commits from the catalog.
++`.with_log_tail(commits)` supplies recent unpublished commits from the catalog and
++`.with_max_catalog_version(v)` caps the snapshot at the latest catalog-ratified version.
+ 
+ **Snapshot loading internals:**
+ 1. **LogSegment** (`kernel/src/log_segment/`) -- discovers commits + checkpoints for the
+ 
+ `Snapshot` -> `Transaction` -> commit
+ 
+-The kernel coordinates the write transaction: it provides the write context (target directory,
+-physical schema, stats columns), assembles commit actions (CommitInfo, Add files), enforces
+-protocol compliance (table features, schema validation), and delegates the atomic commit to a
+-`Committer`.
++The kernel coordinates the write transaction: it provides the write context (validated partition
++values, recommended write directory, physical schema, stats columns), assembles commit
++actions (CommitInfo, Add files), enforces protocol compliance (table features, schema validation),
++and delegates the atomic commit to a `Committer`.
+ 
+ **Steps:**
+ 1. Create `Transaction` from a snapshot with a `Committer` (e.g. `FileSystemCommitter`)
+-2. Get `WriteContext` for target dir, physical schema, and stats columns
++2. Get `WriteContext` via `partitioned_write_context(values)` or `unpartitioned_write_context()`
+ 3. Write Parquet files (via engine), collect file metadata
+ 4. Register files via `txn.add_files(metadata)`
+ 5. Commit: returns `CommittedTransaction`, `ConflictedTransaction`, or `RetryableTransaction`
+ - `kernel/src/snapshot/` -- `Snapshot`, `SnapshotBuilder`, entry point for reads/writes
+ - `kernel/src/scan/` -- `Scan`, `ScanBuilder`, log replay, data skipping
+ - `kernel/src/transaction/` -- `Transaction`, `WriteContext`, `create_table` builder
++- `kernel/src/partition/` -- partition value validation, serialization, Hive-style path
++   encoding, URI encoding for `add.path`
+ - `kernel/src/committer/` -- `Committer` trait, `FileSystemCommitter`
+ - `kernel/src/log_segment/` -- log file discovery, Protocol/Metadata replay
+ - `kernel/src/log_replay.rs` -- file-action deduplication, `LogReplayProcessor` trait
+ 
+ Tables whose commits go through a catalog (e.g. Unity Catalog) instead of direct filesystem
+ writes. Kernel doesn't know about catalogs -- the catalog client provides a log tail via
+-`SnapshotBuilder::with_log_tail()` and a custom `Committer` for staging/ratifying/publishing
+-commits. Requires `catalog-managed` feature flag.
++`SnapshotBuilder::with_log_tail()`, caps the version via `with_max_catalog_version()`, and
++uses a custom `Committer` for staging/ratifying/publishing commits.
+ 
+ The `UCCommitter` (in the `delta-kernel-unity-catalog` crate) is the reference implementation of a catalog
+ committer for Unity Catalog. It stages commits to `_staged_commits/`, calls the UC commit API to
\ No newline at end of file
CONTRIBUTING.md
@@ -0,0 +1,19 @@
+diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
+--- a/CONTRIBUTING.md
++++ b/CONTRIBUTING.md
+    # build docs
+    cargo doc --workspace --all-features
+    # highly recommend editor that automatically formats, but in case you need to:
+-   cargo fmt
++   cargo +nightly fmt
+ 
+    # run more tests
+    cargo test --workspace --all-features -- --skip read_table_version_hdfs
+ #### General Tips
+ 
+ 1. When making your first PR, please read our contributor guidelines: https://github.com/delta-incubator/delta-kernel-rs/blob/main/CONTRIBUTING.md
+-2. Run `cargo t --all-features --all-targets` to get started testing, and run `cargo fmt`.
++2. Run `cargo t --all-features --all-targets` to get started testing, and run `cargo +nightly fmt`.
+ 3. Ensure you have added or run the appropriate tests for your PR.
+ 4. If the PR is unfinished, add '[WIP]' in your PR title, e.g., '[WIP] Your PR title ...'.
+ 5. Be sure to keep the PR description updated to reflect all changes.
\ No newline at end of file

... (truncated, output exceeded 60000 bytes)

Reproduce locally: `git range-diff ac9dc19..14a713c 7866824..68b8cd8`. Disable: `git config gitstack.push-range-diff false`

@lorenarosati
Collaborator Author

Range-diff: main (68b8cd8 -> 848b098)
.github/workflows/build.yml
@@ -0,0 +1,75 @@
+diff --git a/.github/workflows/build.yml b/.github/workflows/build.yml
+--- a/.github/workflows/build.yml
++++ b/.github/workflows/build.yml
+ # enforce the committed Cargo.lock. This prevents CI from silently resolving a newer
+ # (potentially compromised) dependency version. If Cargo.lock is out of sync with
+ # Cargo.toml, the build fails immediately. Any dependency change must be an explicit,
+-# reviewable update to Cargo.lock in the PR. Commands that skip --locked: cargo fmt
++# reviewable update to Cargo.lock in the PR. Commands that skip --locked: cargo +nightly fmt
+ # (no dep resolution), cargo msrv verify/show (wrapper tool), cargo miri setup (tooling).
+ #
+ # Swatinem/rust-cache caches the cargo registry and target directory (~450MB per job).
+     runs-on: ubuntu-latest
+     steps:
+       - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1
+-      - name: Install minimal stable with rustfmt
++      - name: Install nightly with rustfmt
+         uses: actions-rust-lang/setup-rust-toolchain@150fca883cd4034361b621bd4e6a9d34e5143606 # v1.15.4
+         with:
+           cache: false
++          toolchain: nightly
+           components: rustfmt
+       - name: format
+-        run: cargo fmt -- --check
++        run: cargo +nightly fmt -- --check
+ 
+   msrv:
+     runs-on: ubuntu-latest
+           pushd kernel
+           echo "Testing with $(cargo msrv show --output-format minimal)"
+           cargo +$(cargo msrv show --output-format minimal) nextest run --locked
++          cargo +$(cargo msrv show --output-format minimal) test --doc
+   docs:
+     runs-on: ubuntu-latest
+     env:
+           cmake ..
+           make
+           make test
++      - name: build and run create-table test
++        run: |
++          pushd ffi/examples/create-table
++          mkdir build
++          pushd build
++          cmake ..
++          make
++          make test
++      # NOTE: write-table's ctest seeds its target table by invoking the create-table
++      # binary, so create-table must be built first (its build/ dir is preserved by the
++      # preceding step and write-table's CMakeLists references it via a relative path).
++      - name: build and run write-table test
++        run: |
++          pushd ffi/examples/write-table
++          mkdir build
++          pushd build
++          cmake ..
++          make
++          make test
++      - name: build and run read-table-changes test
++        run: |
++          pushd ffi/examples/read-table-changes
++          mkdir build
++          pushd build
++          cmake ..
++          make
++          make test
+   miri:
+     name: "Miri (shard ${{ matrix.partition }}/3)"
+     runs-on: ubuntu-latest
+       - name: Install cargo-llvm-cov
+         uses: taiki-e/install-action@2d15d02e710b40b6332201aba6af30d595b5cd96 # cargo-llvm-cov
+       - name: Generate code coverage
+-        run: cargo llvm-cov --locked --all-features --workspace --codecov --output-path codecov.json -- --skip read_table_version_hdfs
++        run: cargo llvm-cov --locked --all-features --workspace --codecov --output-path codecov.json -- --skip read_table_version_hdfs --skip handle::tests::invalid_handle_code
+       - name: Upload coverage to Codecov
+         uses: codecov/codecov-action@1af58845a975a7985b0beb0cbe6fbbb71a41dbad # v5.5.3
+         with:
\ No newline at end of file
.github/workflows/pr-body-validator.yml
@@ -0,0 +1,27 @@
+diff --git a/.github/workflows/pr-body-validator.yml b/.github/workflows/pr-body-validator.yml
+new file mode 100644
+--- /dev/null
++++ b/.github/workflows/pr-body-validator.yml
++name: Validate PR Body
++
++on:
++  pull_request:
++    types: [opened, edited]
++  merge_group:
++
++jobs:
++  validate-body:
++    runs-on: ubuntu-latest
++    steps:
++      - name: Validate PR Body
++        shell: bash
++        env:
++          PR_BODY: ${{ github.event.pull_request.body }}
++        run: |
++          if LC_ALL=C grep -q '[^[:print:][:space:]]' <<< "$PR_BODY"; then
++            echo "PR body contains non-ascii characters. Please remove them."
++            exit 1
++          else
++            echo "PR body contains ascii characters only"
++          fi
++
\ No newline at end of file
CHANGELOG.md
@@ -0,0 +1,282 @@
+diff --git a/CHANGELOG.md b/CHANGELOG.md
+--- a/CHANGELOG.md
++++ b/CHANGELOG.md
+ # Changelog
+ 
++## [v0.21.0](https://github.com/delta-io/delta-kernel-rs/tree/v0.21.0/) (2026-04-10)
++
++[Full Changelog](https://github.com/delta-io/delta-kernel-rs/compare/v0.20.0...v0.21.0)
++
++
++### 🏗️ Breaking changes
++
++1. Add partitioned variant to DataLayout enum ([#2145])
++   - Adds `Partitioned` variant to `DataLayout` enum. Update match statements to handle the new variant.
++2. Add create many API to engine ([#2070])
++   - Adds `create_many` method to `ParquetHandler` trait. Implementors must add this method. See the trait rustdocs for details.
++3. Rename uc-catalog and uc-client crates ([#2136])
++   - `delta-kernel-uc-catalog` renamed to `delta-kernel-unity-catalog`. `delta-kernel-uc-client` renamed to `unity-catalog-delta-rest-client`. Update `Cargo.toml` dependencies accordingly.
++4. Checksum and checkpoint APIs return updated Snapshot ([#2182])
++   - `Snapshot::checkpoint()` and checksum APIs now return the updated `Snapshot`. Callers must handle the returned value.
++5. Add P&M to CommitMetadata and enforce committer/table type matching ([#2250])
++   - Enforces that committer type matches table type (catalog-managed vs path-based). Use appropriate committer for your table type.
++6. Add UCCommitter validation for catalog-managed tables ([#2254])
++   - `UCCommitter` now rejects commits to non-catalog-managed tables. Use `FileSystemCommitter` for path-based tables.
++7. Refactor snapshot FFI to use builder pattern and enable snapshot reuse ([#2255])
++   - FFI snapshot creation now uses builder pattern. Update FFI callers to use the new builder APIs.
++8. Make tags and remove partition values allow null values in map ([#2281])
++   - `tags` and `partitionValues` map values are now nullable. Update code that assumes non-null values.
++9. Better naming style for column mapping related functions/variables ([#2290])
++   - Renamed: `make_physical` to `to_physical_name`, `make_physical_struct` to `to_physical_schema`, `transform_struct_for_projection` to `projection_transform`. Update call sites.
++10. Remove the catalog-managed feature flag ([#2310])
++    - The `catalog-managed` feature flag is removed. Catalog-managed table support is now always available.
++11. Update snapshot.checkpoint API to return a CheckpointResult ([#2314])
++    - `Snapshot::checkpoint()` now returns `CheckpointResult` instead of `Snapshot`. Access the snapshot via `CheckpointResult::snapshot`.
++12. Remove old non-builder snapshot FFI functions ([#2318])
++    - Removed legacy FFI snapshot functions. Use the new builder-pattern FFI functions instead.
++13. Support version 0 (table creation) commits in UCCommitter ([#2247])
++    - Connectors using `UCCommitter` for table creation must now handle post-commit finalization via the UC create table API.
++14. Pass computed ICT to CommitMetadata instead of wall-clock time ([#2319])
++    - `CommitMetadata` now uses computed in-commit timestamp instead of wall-clock time. Callers relying on wall-clock timing should update accordingly.
++15. Upgrade to arrow-58 and object_store-13, drop arrow-56 support ([#2116])
++    - Minimum supported Arrow version is now arrow-57. Update your `Cargo.toml` if using `arrow-56` feature.
++16. Crc File Histogram Read and Write Support ([#2235])
++    - Adds `AddedHistogram` and `RemovedHistogram` fields to `FileStatsDelta` struct.
++17. Add ScanMetadataCompleted metric event ([#2236])
++    - Adds `ScanMetadataCompleted` variant to `MetricEvent` enum. Update metric reporters to handle the new variant.
++18. Instrument JSON and Parquet handler reads with MetricsReporter ([#2169])
++    - Adds `JsonReadCompleted` and `ParquetReadCompleted` variants to `MetricEvent` enum. Update metric reporters to handle new variants.
++19. New transform helpers for unary and binary children ([#2150])
++    - Removes public `CowExt` trait. Remove any usages of this trait.
++20. New mod transforms for expression and schema transforms ([#2077])
++    - Moves `SchemaTransform` and `ExpressionTransform` to new `transforms` module. Update import paths.
++21. Introduce object_store compat shim ([#2111])
++    - Renames `object_store` dependency to `object_store_12`. Update any direct references.
++22. Consolidate domain metadata reads through Snapshot ([#2065])
++    - Domain metadata reads now go through `Snapshot` methods. Update callers using old free functions.
++23. Don't read or write arrow schema in parquet files ([#2025])
++    - Parquet files no longer include arrow schema metadata. Code relying on this metadata must be updated.
++24. Rename include_stats_columns to include_all_stats_columns ([#1996])
++    - Renames `ScanBuilder::include_stats_columns()` to `ScanBuilder::include_all_stats_columns()`. Update call sites.
++
++### 🚀 Features / new APIs
++
++1. Add SQL -> Kernel predicate parser to benchmark framework ([#2099])
++2. Add observability metrics for scan log replay ([#1866])
++3. Filtered engine data visitor ([#1942])
++4. Trigger benchmarking with comments ([#2089])
++5. Unify data stats and partition values in DataSkippingFilter ([#1948])
++6. Download benchmark workloads from DAT release ([#2163])
++7. Add partitioned variant to DataLayout enum ([#2145])
++8. Expose table_properties in FFI via visit_table_properties ([#2196])
++9. Allow checkpoint stats properties in CREATE TABLE ([#2210])
++10. Add crc file histogram initial struct and methods ([#2212])
++11. BinaryPredicate evaluate expression with ArrowViewType. ([#2052])
++12. Add acceptance workloads testing harness ([#2092])
++13. Enable DeletionVectors table feature in CREATE TABLE ([#2245])
++14. Checksum and checkpoint APIs return updated Snapshot ([#2182])
++15. Adding ScanBuilder FFI functions for Scans ([#2237])
++16. Add CountingReporter and fix metrics forwarding ([#2166])
++17. Instrument JSON and Parquet handler reads with MetricsReporter ([#2169])
++18. Wire CountingReporter into workload benchmarks ([#2171])
++19. Add create many API to engine ([#2070])
++20. Add ScanMetadataCompleted metric event ([#2236])
++21. Allow AppendOnly, ChangeDataFeed, and TypeWidening in CREATE TABLE ([#2279])
++22. Support max timestamp stats for data skipping ([#2249])
++23. Add list with backward checkpoint scan ([#2174])
++24. Add Snapshot::get_timestamp ([#2266])
++25. Make tags  and remove partition values allow null values in map ([#2281])
++26. Support UC credential vending and S3 benchmarks ([#2109])
++27. Add catalogManaged to allowed features in CREATE TABLE ([#2293])
++28. Add catalog-managed table creation utilities ([#2203])
++29. Support version 0 (table creation) commits in UCCommitter ([#2247])
++30. Update snapshot.checkpoint API to return a CheckpointResult ([#2314])
++31. Cached checkpoint output schema ([#2270])
++32. Refactor snapshot FFI to use builder pattern and enable snapshot reuse ([#2255])
++33. Add P&M to CommitMetadata and enforce committer/table type matching ([#2250])
++34. Add UCCommitter validation for catalog-managed tables ([#2254])
++35. Crc File Histogram Read and Write Support ([#2235])
++36. Add FFI function to expose snapshot's timestamp ([#2274])
++37. Add FFI create table DDL functions ([#2296])
++38. Add FFI remove files DML functions ([#2297])
++39. Expose Protocol and Metadata as opaque FFI handle types ([#2260])
++40. Add FFI bindings for domain metadata write operations ([#2327])
++
++### 🐛 Bug Fixes
++
++1. Treat null literal as unknown in meta-predicate evaluation ([#2097])
++2. Update TokioBackgroundExecutor to join thread instead of detaching ([#2126])
++3. Use thread pools and multi-thread tokio executor in read metadata benchmark runner ([#2044])
++4. Emit null stats for all-null columns instead of omitting them ([#2187])
++5. Allow Date/Timestamp casting for stats_parsed compatibility ([#2074])
++6. Filter evaluator input schema ([#2195])
++7. SnapshotCompleted.total_duration now includes log segment loading ([#2183])
++8. Avoid creating empty stats schemas ([#2199])
++9. Prevent dual TLS crypto backends from reqwest default features ([#2178])
++10. Vendor and pin homebrew actions ([#2243])
++11. Validate min_reader/writer_version are at least 1 ([#2202])
++12. Preserve loaded LazyCrc during incremental snapshot updates ([#2211])
++13. Detect stats_parsed in multi-part V1 checkpoints ([#2214])
++14. Downgrade per-batch data skipping log from info to debug ([#2219])
++15. Unknown table features in feature list are "supported" ([#2159])
++16. Remove debug_assert_eq before require in scan evaluator row count checks ([#2262])
++17. Adopt checkpoint written later for same-version snapshot refresh ([#2143])
++18. Return error when parquet handler returns empty data for scan files ([#2261])
++19. Refactor benchmarking workflow to not require criterion compare action ([#2264])
++20. Skip name-based validation for struct columns in expression evaluator ([#2160])
++21. Handle missing leaf columns in nested struct during parquet projection ([#2170])
++22. Pass computed ICT to CommitMetadata instead of wall-clock time ([#2319])
++23. Detect and handle empty (0-byte) log files during listing ([#2336])
++
++### 📚 Documentation
++
++1. Update claude readme to include github actions safety note ([#2190])
++2. Add line width and comment divider style rules to CLAUDE.md ([#2277])
++3. Add documentation for current tags ([#2234])
++4. Document benchmarking in CI accuracy ([#2302])
++
++### ⚡ Performance
++
++1. Pre-size dedup HashSet in ScanLogReplayProcessor ([#2186])
++2. Pre-size HashMap in ArrowEngineData::visit_rows ([#2185])
++3. Remove dead schema conversions in expression evaluators ([#2184])
++
++### 🚜 Refactor
++
++1. Finalized benchmark table names and added new tables ([#2072])
++2. New transform helpers for unary and binary children ([#2150])
++3. Remove legacy row-level partition filter path ([#2158])
++4. Restructured list log files function ([#2173])
++5. Consolidate and add testing for set transaction expiration ([#2176])
++6. Rename uc-catalog and uc-client crates ([#2136])
++7. Better naming style for column mapping related functions/variables ([#2290])
++8. Centralize computation for physical schema without partition columns ([#2142])
++9. Consolidate FFI test setup helpers into ffi_test_utils ([#2307])
++10. *(action_reconciliation)* Combine getter index and field name constants ([#1717]) ([#1774])
++11. Extract shared stat helpers from RowGroupFilter ([#2324])
++12. Extract WriteContext to its own file ([#2349])
++
++### ⚙️ Chores/CI
++
++1. Clean up arrow deps in cargo files ([#2115])
++2. Commit Cargo.lock and enforce --locked in all CI workflows ([#2240])
++3. Harden pr-title-validator a bit ([#2246])
++4. Renable semver ([#2248])
++5. Attempt fixup of semver-label job ([#2253])
++6. Use artifacts for semver label ([#2258])
++7. Remove old non-builder snapshot FFI functions ([#2318])
++8. Remove the catalog-managed feature flag ([#2310])
++9. Upgrade to arrow-58 and object_store-13, drop arrow-56 support ([#2116])
++
++### Other
++
++[#2097]: https://github.com/delta-io/delta-kernel-rs/pull/2097
++[#2099]: https://github.com/delta-io/delta-kernel-rs/pull/2099
++[#2126]: https://github.com/delta-io/delta-kernel-rs/pull/2126
++[#2115]: https://github.com/delta-io/delta-kernel-rs/pull/2115
++[#1866]: https://github.com/delta-io/delta-kernel-rs/pull/1866
++[#2044]: https://github.com/delta-io/delta-kernel-rs/pull/2044
++[#1942]: https://github.com/delta-io/delta-kernel-rs/pull/1942
++[#2072]: https://github.com/delta-io/delta-kernel-rs/pull/2072
++[#2089]: https://github.com/delta-io/delta-kernel-rs/pull/2089
++[#2187]: https://github.com/delta-io/delta-kernel-rs/pull/2187
++[#2190]: https://github.com/delta-io/delta-kernel-rs/pull/2190
++[#1948]: https://github.com/delta-io/delta-kernel-rs/pull/1948
++[#2150]: https://github.com/delta-io/delta-kernel-rs/pull/2150
++[#2074]: https://github.com/delta-io/delta-kernel-rs/pull/2074
++[#2195]: https://github.com/delta-io/delta-kernel-rs/pull/2195
++[#2158]: https://github.com/delta-io/delta-kernel-rs/pull/2158
++[#2186]: https://github.com/delta-io/delta-kernel-rs/pull/2186
++[#2185]: https://github.com/delta-io/delta-kernel-rs/pull/2185
++[#2173]: https://github.com/delta-io/delta-kernel-rs/pull/2173
++[#2163]: https://github.com/delta-io/delta-kernel-rs/pull/2163
++[#2145]: https://github.com/delta-io/delta-kernel-rs/pull/2145
++[#2184]: https://github.com/delta-io/delta-kernel-rs/pull/2184
++[#2183]: https://github.com/delta-io/delta-kernel-rs/pull/2183
++[#2199]: https://github.com/delta-io/delta-kernel-rs/pull/2199
++[#2196]: https://github.com/delta-io/delta-kernel-rs/pull/2196
++[#2210]: https://github.com/delta-io/delta-kernel-rs/pull/2210
++[#2178]: https://github.com/delta-io/delta-kernel-rs/pull/2178
++[#2240]: https://github.com/delta-io/delta-kernel-rs/pull/2240
++[#2243]: https://github.com/delta-io/delta-kernel-rs/pull/2243
++[#2202]: https://github.com/delta-io/delta-kernel-rs/pull/2202
++[#2211]: https://github.com/delta-io/delta-kernel-rs/pull/2211
++[#2214]: https://github.com/delta-io/delta-kernel-rs/pull/2214
++[#2246]: https://github.com/delta-io/delta-kernel-rs/pull/2246
++[#2219]: https://github.com/delta-io/delta-kernel-rs/pull/2219
++[#2212]: https://github.com/delta-io/delta-kernel-rs/pull/2212
++[#2176]: https://github.com/delta-io/delta-kernel-rs/pull/2176
++[#2159]: https://github.com/delta-io/delta-kernel-rs/pull/2159
++[#2248]: https://github.com/delta-io/delta-kernel-rs/pull/2248
++[#2253]: https://github.com/delta-io/delta-kernel-rs/pull/2253
++[#2052]: https://github.com/delta-io/delta-kernel-rs/pull/2052
++[#2092]: https://github.com/delta-io/delta-kernel-rs/pull/2092
++[#2258]: https://github.com/delta-io/delta-kernel-rs/pull/2258
++[#2136]: https://github.com/delta-io/delta-kernel-rs/pull/2136
++[#2245]: https://github.com/delta-io/delta-kernel-rs/pull/2245
++[#2182]: https://github.com/delta-io/delta-kernel-rs/pull/2182
++[#2262]: https://github.com/delta-io/delta-kernel-rs/pull/2262
++[#2237]: https://github.com/delta-io/delta-kernel-rs/pull/2237
++[#2166]: https://github.com/delta-io/delta-kernel-rs/pull/2166
++[#2169]: https://github.com/delta-io/delta-kernel-rs/pull/2169
++[#2171]: https://github.com/delta-io/delta-kernel-rs/pull/2171
++[#2143]: https://github.com/delta-io/delta-kernel-rs/pull/2143
++[#2070]: https://github.com/delta-io/delta-kernel-rs/pull/2070
++[#2261]: https://github.com/delta-io/delta-kernel-rs/pull/2261
++[#2277]: https://github.com/delta-io/delta-kernel-rs/pull/2277
++[#2236]: https://github.com/delta-io/delta-kernel-rs/pull/2236
++[#2279]: https://github.com/delta-io/delta-kernel-rs/pull/2279
++[#2249]: https://github.com/delta-io/delta-kernel-rs/pull/2249
++[#2290]: https://github.com/delta-io/delta-kernel-rs/pull/2290
++[#2174]: https://github.com/delta-io/delta-kernel-rs/pull/2174
++[#2264]: https://github.com/delta-io/delta-kernel-rs/pull/2264
++[#2234]: https://github.com/delta-io/delta-kernel-rs/pull/2234
++[#2302]: https://github.com/delta-io/delta-kernel-rs/pull/2302
++[#2142]: https://github.com/delta-io/delta-kernel-rs/pull/2142
++[#2266]: https://github.com/delta-io/delta-kernel-rs/pull/2266
++[#2281]: https://github.com/delta-io/delta-kernel-rs/pull/2281
++[#2109]: https://github.com/delta-io/delta-kernel-rs/pull/2109
++[#2293]: https://github.com/delta-io/delta-kernel-rs/pull/2293
++[#2203]: https://github.com/delta-io/delta-kernel-rs/pull/2203
++[#2247]: https://github.com/delta-io/delta-kernel-rs/pull/2247
++[#2160]: https://github.com/delta-io/delta-kernel-rs/pull/2160
++[#2314]: https://github.com/delta-io/delta-kernel-rs/pull/2314
++[#2270]: https://github.com/delta-io/delta-kernel-rs/pull/2270
++[#2255]: https://github.com/delta-io/delta-kernel-rs/pull/2255
++[#2250]: https://github.com/delta-io/delta-kernel-rs/pull/2250
++[#2254]: https://github.com/delta-io/delta-kernel-rs/pull/2254
++[#2307]: https://github.com/delta-io/delta-kernel-rs/pull/2307
++[#2170]: https://github.com/delta-io/delta-kernel-rs/pull/2170
++[#2235]: https://github.com/delta-io/delta-kernel-rs/pull/2235
++[#2274]: https://github.com/delta-io/delta-kernel-rs/pull/2274
++[#1774]: https://github.com/delta-io/delta-kernel-rs/pull/1774
++[#2296]: https://github.com/delta-io/delta-kernel-rs/pull/2296
++[#2318]: https://github.com/delta-io/delta-kernel-rs/pull/2318
++[#2310]: https://github.com/delta-io/delta-kernel-rs/pull/2310
++[#2297]: https://github.com/delta-io/delta-kernel-rs/pull/2297
++[#2324]: https://github.com/delta-io/delta-kernel-rs/pull/2324
++[#2260]: https://github.com/delta-io/delta-kernel-rs/pull/2260
++[#2327]: https://github.com/delta-io/delta-kernel-rs/pull/2327
++[#2319]: https://github.com/delta-io/delta-kernel-rs/pull/2319
++[#2116]: https://github.com/delta-io/delta-kernel-rs/pull/2116
++[#2349]: https://github.com/delta-io/delta-kernel-rs/pull/2349
++[#2336]: https://github.com/delta-io/delta-kernel-rs/pull/2336
++[#2077]: https://github.com/delta-io/delta-kernel-rs/pull/2077
++[#2111]: https://github.com/delta-io/delta-kernel-rs/pull/2111
++[#2065]: https://github.com/delta-io/delta-kernel-rs/pull/2065
++[#2025]: https://github.com/delta-io/delta-kernel-rs/pull/2025
++[#1996]: https://github.com/delta-io/delta-kernel-rs/pull/1996
++[#1717]: https://github.com/delta-io/delta-kernel-rs/pull/1717
++[#1922]: https://github.com/delta-io/delta-kernel-rs/pull/1922
++
+ ## [v0.20.0](https://github.com/delta-io/delta-kernel-rs/tree/v0.20.0/) (2026-02-26)
+ 
+ [Full Changelog](https://github.com/delta-io/delta-kernel-rs/compare/v0.19.2...v0.20.0)
+ 22. Implement schema diffing for flat schemas (2/5]) ([#1478])
+ 23. Add API on Scan to perform 2-phase log replay  ([#1547])
+ 24. Enable distributed log replay serde serialization for serializable scan state ([#1549])
+-25. Add InCommitTimestamp support to ChangeDataFeed ([#1670]) 
++25. Add InCommitTimestamp support to ChangeDataFeed ([#1670])
+ 26. Add include_stats_columns API and output_stats_schema field ([#1728])
+ 27. Add write support for clustered tables behind feature flag ([#1704])
+ 28. Add snapshot load instrumentation ([#1750])
\ No newline at end of file

... (truncated, output exceeded 60000 bytes)

Reproduce locally: `git range-diff ac9dc19..68b8cd8 7866824..848b098`. Disable: `git config gitstack.push-range-diff false`

@lorenarosati lorenarosati changed the title geo schema type and table feat feat: Add geo schema types and table feature Apr 28, 2026
@lorenarosati lorenarosati force-pushed the stack/schema-table-feat-geo branch from 848b098 to 06499d0 Compare April 28, 2026 22:39
@lorenarosati
Collaborator Author

Range-diff: main (06499d0 -> 3d37746)
kernel/src/schema/mod.rs
@@ -165,9 +165,9 @@
      #[serde(serialize_with = "serialize_decimal", untagged)]
      Decimal(DecimalType),
 +    #[serde(serialize_with = "serialize_geometry", untagged)]
-+    Geometry(GeometryType),
++    Geometry(Box<GeometryType>),
 +    #[serde(serialize_with = "serialize_geography", untagged)]
-+    Geography(GeographyType),
++    Geography(Box<GeographyType>),
  }
  
  impl PrimitiveType {
@@ -194,14 +194,15 @@
                      .map(PrimitiveType::Decimal)
                      .map_err(serde::de::Error::custom)
              }
-+            "geometry" => Ok(PrimitiveType::Geometry(GeometryType::default())),
++            "geometry" => Ok(PrimitiveType::Geometry(Box::default())),
 +            geo_str if geo_str.starts_with("geometry(") && geo_str.ends_with(')') => {
 +                let srid = &geo_str[9..geo_str.len() - 1];
 +                GeometryType::try_new(srid.trim())
++                    .map(Box::new)
 +                    .map(PrimitiveType::Geometry)
 +                    .map_err(serde::de::Error::custom)
 +            }
-+            "geography" => Ok(PrimitiveType::Geography(GeographyType::default())),
++            "geography" => Ok(PrimitiveType::Geography(Box::default())),
 +            geo_str if geo_str.starts_with("geography(") && geo_str.ends_with(')') => {
 +                let inner = &geo_str[10..geo_str.len() - 1];
 +                // Three accepted shapes:
@@ -215,6 +216,7 @@
 +                        let algorithm: EdgeInterpolationAlgorithm =
 +                            algo_str.parse().map_err(serde::de::Error::custom)?;
 +                        GeographyType::try_new(srid, algorithm)
++                            .map(Box::new)
 +                            .map(PrimitiveType::Geography)
 +                            .map_err(serde::de::Error::custom)
 +                    }
@@ -222,12 +224,14 @@
 +                        let trimmed = inner.trim();
 +                        if trimmed.contains(':') {
 +                            GeographyType::try_new_with_srid(trimmed)
++                                .map(Box::new)
 +                                .map(PrimitiveType::Geography)
 +                                .map_err(serde::de::Error::custom)
 +                        } else {
 +                            let algorithm: EdgeInterpolationAlgorithm =
 +                                trimmed.parse().map_err(serde::de::Error::custom)?;
 +                            GeographyType::try_new_with_algorithm(algorithm)
++                                .map(Box::new)
 +                                .map(PrimitiveType::Geography)
 +                                .map_err(serde::de::Error::custom)
 +                        }
@@ -250,7 +254,7 @@
  }
 +impl From<GeometryType> for PrimitiveType {
 +    fn from(gtype: GeometryType) -> Self {
-+        PrimitiveType::Geometry(gtype)
++        PrimitiveType::Geometry(Box::new(gtype))
 +    }
 +}
 +impl From<GeometryType> for DataType {
@@ -260,7 +264,7 @@
 +}
 +impl From<GeographyType> for PrimitiveType {
 +    fn from(gtype: GeographyType) -> Self {
-+        PrimitiveType::Geography(gtype)
++        PrimitiveType::Geography(Box::new(gtype))
 +    }
 +}
 +impl From<GeographyType> for DataType {
@@ -287,9 +291,9 @@
 +        let field: StructField = serde_json::from_str(data).unwrap();
 +        assert_eq!(
 +            field.data_type,
-+            DataType::Primitive(PrimitiveType::Geometry(
++            DataType::Primitive(PrimitiveType::Geometry(Box::new(
 +                GeometryType::try_new("EPSG:4326").unwrap()
-+            ))
++            )))
 +        );
 +
 +        let json_str = serde_json::to_string(&field).unwrap();
@@ -312,9 +316,9 @@
 +        let field: StructField = serde_json::from_str(data).unwrap();
 +        assert_eq!(
 +            field.data_type,
-+            DataType::Primitive(PrimitiveType::Geography(
++            DataType::Primitive(PrimitiveType::Geography(Box::new(
 +                GeographyType::try_new("EPSG:4326", EdgeInterpolationAlgorithm::Vincenty).unwrap()
-+            ))
++            )))
 +        );
 +
 +        let json_str = serde_json::to_string(&field).unwrap();
@@ -331,27 +335,27 @@
      }
  
 +    #[rstest]
-+    #[case("geometry", PrimitiveType::Geometry(GeometryType::default()))]
-+    #[case("geography", PrimitiveType::Geography(GeographyType::default()))]
++    #[case("geometry", PrimitiveType::Geometry(Box::default()))]
++    #[case("geography", PrimitiveType::Geography(Box::default()))]
 +    #[case(
 +        "geometry(EPSG:4326)",
-+        PrimitiveType::Geometry(GeometryType::try_new("EPSG:4326").unwrap())
++        PrimitiveType::Geometry(Box::new(GeometryType::try_new("EPSG:4326").unwrap()))
 +    )]
 +    #[case(
 +        "geography(EPSG:4326)",
-+        PrimitiveType::Geography(GeographyType::try_new_with_srid("EPSG:4326").unwrap())
++        PrimitiveType::Geography(Box::new(GeographyType::try_new_with_srid("EPSG:4326").unwrap()))
 +    )]
 +    #[case(
 +        "geography(EPSG:4326, vincenty)",
-+        PrimitiveType::Geography(
++        PrimitiveType::Geography(Box::new(
 +            GeographyType::try_new("EPSG:4326", EdgeInterpolationAlgorithm::Vincenty).unwrap()
-+        )
++        ))
 +    )]
 +    #[case(
 +        "geography(vincenty)",
-+        PrimitiveType::Geography(
++        PrimitiveType::Geography(Box::new(
 +            GeographyType::try_new_with_algorithm(EdgeInterpolationAlgorithm::Vincenty).unwrap()
-+        )
++        ))
 +    )]
 +    fn test_geo_deserialize_defaults(#[case] type_str: &str, #[case] expected: PrimitiveType) {
 +        let json = format!(r#"{{"name":"g","type":"{type_str}","nullable":true,"metadata":{{}}}}"#);
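
Aside for reviewers: the deserializer above accepts six string shapes, bare `geometry`, `geometry(<srid>)`, bare `geography`, and `geography(...)` with srid plus algorithm, srid only, or algorithm only. A minimal standalone sketch of that shape dispatch follows; `GeoType`, `parse_geo`, and the default constants are hypothetical stand-ins, not the kernel's actual types or default values.

```rust
// Hypothetical stand-ins for the kernel's GeometryType/GeographyType; the defaults are
// placeholders for illustration only.
#[derive(Debug, PartialEq)]
enum GeoType {
    Geometry { srid: String },
    Geography { srid: String, algorithm: String },
}

const DEFAULT_SRID: &str = "OGC:CRS84"; // assumed default, for illustration only
const DEFAULT_ALGO: &str = "spherical"; // assumed default, for illustration only

fn parse_geo(s: &str) -> Option<GeoType> {
    if s == "geometry" {
        return Some(GeoType::Geometry { srid: DEFAULT_SRID.into() });
    }
    if let Some(inner) = s.strip_prefix("geometry(").and_then(|r| r.strip_suffix(')')) {
        return Some(GeoType::Geometry { srid: inner.trim().into() });
    }
    if s == "geography" {
        return Some(GeoType::Geography {
            srid: DEFAULT_SRID.into(),
            algorithm: DEFAULT_ALGO.into(),
        });
    }
    if let Some(inner) = s.strip_prefix("geography(").and_then(|r| r.strip_suffix(')')) {
        // Three accepted inner shapes: "<srid>, <algorithm>", "<srid>", or "<algorithm>".
        // An SRID is recognized by the ':' it contains (e.g. "EPSG:4326").
        return Some(match inner.split_once(',') {
            Some((srid, algo)) => GeoType::Geography {
                srid: srid.trim().into(),
                algorithm: algo.trim().into(),
            },
            None if inner.contains(':') => GeoType::Geography {
                srid: inner.trim().into(),
                algorithm: DEFAULT_ALGO.into(),
            },
            None => GeoType::Geography {
                srid: DEFAULT_SRID.into(),
                algorithm: inner.trim().into(),
            },
        });
    }
    None
}

fn main() {
    assert_eq!(
        parse_geo("geography(EPSG:4326, vincenty)"),
        Some(GeoType::Geography { srid: "EPSG:4326".into(), algorithm: "vincenty".into() })
    );
    assert_eq!(parse_geo("point(1 2)"), None);
}
```
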
kernel/src/table_configuration.rs
@@ -9,38 +9,6 @@
      validate_timestamp_ntz_feature_support, ColumnMappingMode, EnablementCheck, FeatureRequirement,
      FeatureType, KernelSupport, Operation, TableFeature, LEGACY_READER_FEATURES,
      LEGACY_WRITER_FEATURES, MAX_VALID_READER_VERSION, MAX_VALID_WRITER_VERSION,
-         version: Version,
-     ) -> DeltaResult<Self> {
-         let logical_schema = Arc::new(metadata.parse_schema()?);
-+        Self::try_new_inner(metadata, protocol, table_root, version, logical_schema)
-+    }
-+
-+    /// Like [`try_new`](Self::try_new), but reuses `base`'s protocol, table root, and version
-+    /// and takes a pre-parsed `logical_schema`.
-+    pub(crate) fn try_new_with_schema(
-+        base: &Self,
-+        metadata: Metadata,
-+        logical_schema: SchemaRef,
-+    ) -> DeltaResult<Self> {
-+        Self::try_new_inner(
-+            metadata,
-+            base.protocol.clone(),
-+            base.table_root.clone(),
-+            base.version,
-+            logical_schema,
-+        )
-+    }
-+
-+    fn try_new_inner(
-+        metadata: Metadata,
-+        protocol: Protocol,
-+        table_root: Url,
-+        version: Version,
-+        logical_schema: SchemaRef,
-+    ) -> DeltaResult<Self> {
-         let table_properties = metadata.parse_table_properties();
-         let column_mapping_mode = column_mapping_mode(&protocol, &table_properties);
- 
  
          // Validate schema against protocol features now that we have a TC instance.
          validate_timestamp_ntz_feature_support(&table_config)?;
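
The constructor refactor above is a delegation split: the existing public path parses the schema itself, the new path reuses a base instance plus a pre-parsed schema, and both funnel into one private constructor. A tiny standalone sketch of that shape, all names hypothetical:

```rust
// Hypothetical miniature of the try_new / try_new_with_schema / try_new_inner split.
#[derive(Debug, Clone)]
struct Config {
    schema: String, // stands in for a parsed SchemaRef
    version: u64,
}

impl Config {
    // Public path: parses (here: just trims) the raw schema itself.
    fn try_new(raw_schema: &str, version: u64) -> Result<Self, String> {
        let schema = raw_schema.trim().to_string();
        Self::try_new_inner(schema, version)
    }

    // Reuses an existing instance's version and takes a pre-parsed schema,
    // avoiding a second parse when the caller already holds one.
    fn try_new_with_schema(base: &Self, schema: String) -> Result<Self, String> {
        Self::try_new_inner(schema, base.version)
    }

    fn try_new_inner(schema: String, version: u64) -> Result<Self, String> {
        if schema.is_empty() {
            return Err("schema must be non-empty".to_string());
        }
        Ok(Config { schema, version })
    }
}

fn main() {
    let base = Config::try_new(" {\"fields\":[]} ", 3).unwrap();
    let evolved = Config::try_new_with_schema(&base, "{\"fields\":[]}".to_string()).unwrap();
    assert_eq!(evolved.version, base.version);
    assert!(evolved.schema.contains("fields"));
}
```
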
kernel/src/table_features/geospatial.rs
@@ -53,9 +53,7 @@
 +#[cfg(test)]
 +mod tests {
 +    use crate::actions::Protocol;
-+    use crate::schema::{
-+        DataType, GeographyType, GeometryType, PrimitiveType, StructField, StructType,
-+    };
++    use crate::schema::{DataType, PrimitiveType, StructField, StructType};
 +    use crate::table_features::TableFeature;
 +    use crate::utils::test_utils::{
 +        assert_result_error_with_message, assert_schema_feature_validation,
@@ -67,7 +65,7 @@
 +            StructField::new("id", DataType::INTEGER, false),
 +            StructField::new(
 +                "geom",
-+                DataType::Primitive(PrimitiveType::Geometry(GeometryType::default())),
++                DataType::Primitive(PrimitiveType::Geometry(Box::default())),
 +                true,
 +            ),
 +        ]);
@@ -81,7 +79,7 @@
 +                "nested",
 +                DataType::Struct(Box::new(StructType::new_unchecked([StructField::new(
 +                    "inner_geo",
-+                    DataType::Primitive(PrimitiveType::Geography(GeographyType::default())),
++                    DataType::Primitive(PrimitiveType::Geography(Box::default())),
 +                    true,
 +                )]))),
 +                true,
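
These tests exercise geo columns nested inside structs; conceptually the feature check is a recursive walk of the schema. A standalone sketch using a hypothetical `MiniType`, not the kernel's schema types:

```rust
// Hypothetical miniature of "does this schema contain a geometry/geography column
// anywhere, including nested structs?", the condition under which the geospatial
// table feature must be present.
enum MiniType {
    Geometry,
    Geography,
    Other,
    Struct(Vec<MiniType>),
}

fn contains_geo(t: &MiniType) -> bool {
    match t {
        MiniType::Geometry | MiniType::Geography => true,
        MiniType::Struct(children) => children.iter().any(contains_geo),
        MiniType::Other => false,
    }
}

fn main() {
    let schema = MiniType::Struct(vec![
        MiniType::Other,                             // id
        MiniType::Struct(vec![MiniType::Geography]), // nested.inner_geo
    ]);
    assert!(contains_geo(&schema));
}
```
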
kernel/src/actions/mod.rs
@@ -1,32 +0,0 @@
-diff --git a/kernel/src/actions/mod.rs b/kernel/src/actions/mod.rs
---- a/kernel/src/actions/mod.rs
-+++ b/kernel/src/actions/mod.rs
- }
- 
- // Serde derives are needed for CRC file deserialization (see `crc::reader`).
-+//
-+// TODO(#2446): `Metadata` stores the schema only as a JSON string. Callers that already hold
-+// a parsed `SchemaRef` (e.g. CREATE TABLE) serialize into `schema_string` and then re-parse
-+// downstream in `TableConfiguration::try_new` via `parse_schema()`. Caching the parsed schema
-+// on `Metadata` would eliminate the round-trip.
- #[derive(Debug, Default, Clone, PartialEq, Eq, Serialize, Deserialize, ToSchema)]
- #[serde(rename_all = "camelCase")]
- #[internal_api]
-         TableProperties::from(self.configuration.iter())
-     }
- 
-+    /// Returns a new Metadata with the schema replaced, preserving all other fields.
-+    ///
-+    /// # Errors
-+    ///
-+    /// Returns an error if schema serialization fails.
-+    pub(crate) fn with_schema(self, schema: SchemaRef) -> DeltaResult<Self> {
-+        Ok(Self {
-+            schema_string: serde_json::to_string(&schema)?,
-+            ..self
-+        })
-+    }
-+
-     #[cfg(test)]
-     #[allow(clippy::too_many_arguments)]
-     pub(crate) fn new_unchecked(
\ No newline at end of file
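
The `with_schema` helper relies on Rust's `..self` struct-update syntax: one field is replaced, the rest move across unchanged. A trivial standalone illustration with a hypothetical `Meta` type:

```rust
// Hypothetical illustration of the `..self` struct-update pattern: only schema_string
// changes; every other field is moved across unchanged.
#[derive(Debug, PartialEq)]
struct Meta {
    schema_string: String,
    name: Option<String>,
}

impl Meta {
    fn with_schema_string(self, schema_string: String) -> Self {
        Self { schema_string, ..self }
    }
}

fn main() {
    let m = Meta { schema_string: "{}".into(), name: Some("t".into()) };
    let m2 = m.with_schema_string(r#"{"fields":[]}"#.into());
    assert_eq!(m2.name, Some("t".to_string()));
    assert_eq!(m2.schema_string, r#"{"fields":[]}"#);
}
```
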
kernel/src/engine/arrow_expression/evaluate_expression.rs
@@ -1,154 +0,0 @@
-diff --git a/kernel/src/engine/arrow_expression/evaluate_expression.rs b/kernel/src/engine/arrow_expression/evaluate_expression.rs
---- a/kernel/src/engine/arrow_expression/evaluate_expression.rs
-+++ b/kernel/src/engine/arrow_expression/evaluate_expression.rs
-         (Literal(scalar), _) => {
-             validate_array_type(scalar.to_array(batch.num_rows())?, result_type)
-         }
--        (Column(name), _) => {
--            // Column extraction uses ordinal-based struct validation because column mapping
--            // can cause physical/logical name mismatches. apply_schema handles renaming.
--            let arr = extract_column(batch, name)?;
--            if let Some(expected) = result_type {
--                ensure_data_types(expected, arr.data_type(), ValidationMode::TypesOnly)?;
--            }
--            Ok(arr)
--        }
-+        (Column(name), _) => validate_array_type(extract_column(batch, name)?, result_type),
-         (Struct(fields, nullability), Some(DataType::Struct(output_schema))) => {
-             evaluate_struct_expression(fields, batch, output_schema, nullability.as_ref())
-         }
-     }
- 
-     #[test]
--    fn column_extract_struct_with_mismatched_field_names() {
-+    fn column_extract_struct_rejects_mismatched_field_names() {
-         let batch = make_struct_batch(
-             vec![
-                 ArrowField::new("col-abc-001", ArrowDataType::Int64, true),
-             ],
-         );
- 
--        // Logical names differ from physical names due to column mapping
-         let logical_type = DataType::try_struct_type([
-             StructField::nullable("my_column", DataType::LONG),
-             StructField::nullable("other_column", DataType::LONG),
- 
-         let expr = column_expr!("stats");
-         let result = evaluate_expression(&expr, &batch, Some(&logical_type));
--
--        // Ordinal-based validation passes: same field count and types by position.
--        // The downstream apply_schema transformation handles renaming.
--        let arr = result.expect("should succeed with mismatched names but matching types");
--        let struct_arr = arr.as_any().downcast_ref::<StructArray>().unwrap();
--        assert_eq!(struct_arr.num_columns(), 2);
--        assert_eq!(struct_arr.len(), 2);
--    }
--
--    #[test]
--    fn column_extract_struct_rejects_mismatched_field_count() {
--        let batch = make_struct_batch(
--            vec![ArrowField::new("col-abc-001", ArrowDataType::Int64, true)],
--            vec![Arc::new(Int64Array::from(vec![Some(1), Some(2)]))],
--        );
--
--        let logical_type = DataType::try_struct_type([
--            StructField::nullable("a", DataType::LONG),
--            StructField::nullable("b", DataType::LONG),
--        ])
--        .unwrap();
--
--        let expr = column_expr!("stats");
--        let result = evaluate_expression(&expr, &batch, Some(&logical_type));
--        assert_result_error_with_message(result, "Struct field count mismatch");
-+        assert_result_error_with_message(result, "Missing Struct fields");
-     }
- 
-     #[test]
-     fn column_extract_struct_rejects_mismatched_child_types() {
-         let batch = make_struct_batch(
-             vec![
--                ArrowField::new("col-abc-001", ArrowDataType::Int64, true),
--                ArrowField::new("col-abc-002", ArrowDataType::Utf8, true),
-+                ArrowField::new("a", ArrowDataType::Int64, true),
-+                ArrowField::new("b", ArrowDataType::Utf8, true),
-             ],
-             vec![
-                 Arc::new(Int64Array::from(vec![Some(1)])),
-             ],
-         );
- 
--        // Expect two LONG columns, but the second arrow field is Utf8
-         let logical_type = DataType::try_struct_type([
-             StructField::nullable("a", DataType::LONG),
-             StructField::nullable("b", DataType::LONG),
-     }
- 
-     #[test]
--    fn column_extract_struct_with_matching_names_still_works() {
-+    fn column_extract_struct_with_matching_names_works() {
-         let batch = make_struct_batch(
-             vec![
-                 ArrowField::new("a", ArrowDataType::Int64, true),
-         assert!(result.is_ok());
-     }
- 
--    /// Exercises the exact code path from `get_add_transform_expr` where a `struct_from`
--    /// expression wraps `column_expr!("add.stats_parsed")`. When the checkpoint parquet has
--    /// stats_parsed with physical column names (e.g. `col-abc-001`) but the output schema
--    /// uses logical names (e.g. `id`), `evaluate_struct_expression` calls
--    /// `evaluate_expression(Column, struct_result_type)` with mismatched field names.
--    /// Without ordinal-based validation this fails with a name mismatch error.
-+    /// When a `struct_from` expression wraps a `Column` referencing stats_parsed, and the
-+    /// checkpoint parquet has physical column names (e.g. `col-abc-001`) but the output schema
-+    /// uses logical names (e.g. `id`), name-based validation correctly rejects the mismatch.
-     #[test]
--    fn struct_from_with_column_tolerates_nested_name_mismatch() {
--        // Build a batch mimicking checkpoint data: add.stats_parsed uses physical names
-+    fn struct_from_with_column_rejects_nested_name_mismatch() {
-         let stats_fields: Vec<ArrowField> = vec![
-             ArrowField::new("col-abc-001", ArrowDataType::Int64, true),
-             ArrowField::new("col-abc-002", ArrowDataType::Int64, true),
-         )]);
-         let batch = RecordBatch::try_new(Arc::new(schema), vec![Arc::new(add_struct)]).unwrap();
- 
--        // struct_from mimicking get_add_transform_expr: wraps a Column referencing stats_parsed
-         let expr = Expr::struct_from([
-             column_expr_ref!("add.path"),
-             column_expr_ref!("add.stats_parsed"),
-         .unwrap();
- 
-         let result = evaluate_expression(&expr, &batch, Some(&output_type));
--        result.expect("struct_from with Column sub-expression should tolerate field name mismatch");
--    }
--
--    #[test]
--    fn column_extract_nested_struct_with_mismatched_names() {
--        let inner_fields = vec![ArrowField::new("phys-inner", ArrowDataType::Int64, true)];
--        let inner_struct = ArrowDataType::Struct(inner_fields.clone().into());
--        let batch = make_struct_batch(
--            vec![ArrowField::new("phys-outer", inner_struct, true)],
--            vec![Arc::new(
--                StructArray::try_new(
--                    inner_fields.into(),
--                    vec![Arc::new(Int64Array::from(vec![Some(42)]))],
--                    None,
--                )
--                .unwrap(),
--            )],
--        );
--
--        let logical_type = DataType::try_struct_type([StructField::nullable(
--            "logical_outer",
--            DataType::struct_type_unchecked([StructField::nullable(
--                "logical_inner",
--                DataType::LONG,
--            )]),
--        )])
--        .unwrap();
--
--        let expr = column_expr!("stats");
--        let result = evaluate_expression(&expr, &batch, Some(&logical_type));
--        assert!(result.is_ok());
-+        assert_result_error_with_message(result, "Missing Struct fields");
-     }
- }
\ No newline at end of file
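
The rewritten tests expect name-based validation, failing with "Missing Struct fields" when expected logical names are absent, rather than tolerating mismatches by ordinal. A standalone sketch of that kind of check (hypothetical helper, not the kernel's `ensure_data_types`):

```rust
// Hypothetical sketch of name-based struct field validation: every expected field name
// must be present in the actual struct; otherwise report the missing ones.
fn check_struct_fields(expected: &[&str], actual: &[&str]) -> Result<(), String> {
    let missing: Vec<&str> = expected
        .iter()
        .copied()
        .filter(|name| !actual.contains(name))
        .collect();
    if missing.is_empty() {
        Ok(())
    } else {
        Err(format!("Missing Struct fields: {}", missing.join(", ")))
    }
}

fn main() {
    // Physical (column-mapped) names on disk vs. logical names in the expected schema.
    let err = check_struct_fields(&["my_column", "other_column"], &["col-abc-001"]).unwrap_err();
    assert!(err.contains("Missing Struct fields"));
    assert!(check_struct_fields(&["a", "b"], &["a", "b", "extra"]).is_ok());
}
```
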
kernel/src/engine/ensure_data_types.rs
@@ -1,13 +0,0 @@
-diff --git a/kernel/src/engine/ensure_data_types.rs b/kernel/src/engine/ensure_data_types.rs
---- a/kernel/src/engine/ensure_data_types.rs
-+++ b/kernel/src/engine/ensure_data_types.rs
- #[internal_api]
- pub(crate) enum ValidationMode {
-     /// Check types only. Struct fields are matched by ordinal position, not by name.
--    /// Nullability and metadata are not checked. Used by the expression evaluator where
--    /// column mapping can cause physical/logical name mismatches.
-+    /// Nullability and metadata are not checked.
-+    #[allow(dead_code)]
-     TypesOnly,
-     /// Check types and match struct fields by name, but skip nullability and metadata.
-     /// Used by the parquet reader where fields are already resolved by name upstream.
\ No newline at end of file
kernel/src/schema/validation.rs
@@ -1,48 +0,0 @@
-diff --git a/kernel/src/schema/validation.rs b/kernel/src/schema/validation.rs
---- a/kernel/src/schema/validation.rs
-+++ b/kernel/src/schema/validation.rs
--//! Schema validation utilities for Delta table creation.
-+//! Schema validation utilities shared by table creation and schema evolution.
- //!
- //! Validates schemas per the Delta protocol specification.
- 
- /// These characters have special meaning in Parquet schema syntax.
- const INVALID_PARQUET_CHARS: &[char] = &[' ', ',', ';', '{', '}', '(', ')', '\n', '\t', '='];
- 
--/// Validates a schema for table creation.
-+/// Validates a schema for CREATE TABLE or ALTER TABLE.
- ///
- /// Performs the following checks:
- /// 1. Schema is non-empty
- /// 3. Column names contain only valid characters
- /// 4. Rejects fields with `delta.invariants` metadata (SQL expression invariants are not supported
- ///    by kernel; see `TableConfiguration::ensure_write_supported`)
--pub(crate) fn validate_schema_for_create(
-+pub(crate) fn validate_schema(
-     schema: &StructType,
-     column_mapping_mode: ColumnMappingMode,
- ) -> DeltaResult<()> {
-     #[case::dot_in_name_with_cm(schema_with_dot(), ColumnMappingMode::Name)]
-     #[case::different_struct_children(schema_different_struct_children(), ColumnMappingMode::None)]
-     fn valid_schema_accepted(#[case] schema: StructType, #[case] cm: ColumnMappingMode) {
--        assert!(validate_schema_for_create(&schema, cm).is_ok());
-+        assert!(validate_schema(&schema, cm).is_ok());
-     }
- 
-     // === Invalid schemas ===
-         #[case] cm: ColumnMappingMode,
-         #[case] expected_errs: &[&str],
-     ) {
--        let result = validate_schema_for_create(&schema, cm);
-+        let result = validate_schema(&schema, cm);
-         assert!(result.is_err());
-         let err = result.unwrap_err().to_string();
-         for expected in expected_errs {
-     #[case::array_nested(schema_array_nested_invariant(), "arr.child")]
-     #[case::map_nested(schema_map_nested_invariant(), "map.child")]
-     fn invariants_metadata_rejected(#[case] schema: StructType, #[case] expected_path: &str) {
--        let result = validate_schema_for_create(&schema, ColumnMappingMode::None);
-+        let result = validate_schema(&schema, ColumnMappingMode::None);
-         let err = result.expect_err("expected delta.invariants metadata rejection");
-         let msg = err.to_string();
-         assert!(
\ No newline at end of file
kernel/src/snapshot/mod.rs
@@ -1,27 +0,0 @@
-diff --git a/kernel/src/snapshot/mod.rs b/kernel/src/snapshot/mod.rs
---- a/kernel/src/snapshot/mod.rs
-+++ b/kernel/src/snapshot/mod.rs
- use crate::table_configuration::{InCommitTimestampEnablement, TableConfiguration};
- use crate::table_features::{physical_to_logical_column_name, ColumnMappingMode, TableFeature};
- use crate::table_properties::TableProperties;
-+use crate::transaction::builder::alter_table::AlterTableTransactionBuilder;
- use crate::transaction::Transaction;
- use crate::utils::require;
- use crate::{DeltaResult, Engine, Error, LogCompactionWriter, Version};
-         Transaction::try_new_existing_table(self, committer, engine)
-     }
- 
-+    /// Creates a builder for altering this table's metadata. Currently supports schema change
-+    /// operations.
-+    ///
-+    /// The returned builder allows chaining operations before building an
-+    /// [`AlterTableTransaction`] that can be committed.
-+    ///
-+    /// [`AlterTableTransaction`]: crate::transaction::AlterTableTransaction
-+    pub fn alter_table(self: Arc<Self>) -> AlterTableTransactionBuilder {
-+        AlterTableTransactionBuilder::new(self)
-+    }
-+
-     /// Fetch the latest version of the provided `application_id` for this snapshot. Filters the
-     /// txn based on the delta.setTransactionRetentionDuration property and lastUpdated.
-     ///
\ No newline at end of file
kernel/src/transaction/alter_table.rs
@@ -1,81 +0,0 @@
-diff --git a/kernel/src/transaction/alter_table.rs b/kernel/src/transaction/alter_table.rs
-new file mode 100644
---- /dev/null
-+++ b/kernel/src/transaction/alter_table.rs
-+//! Alter table transaction types and constructor.
-+//!
-+//! This module defines the [`AlterTableTransaction`] type alias and the
-+//! [`try_new_alter_table`](AlterTableTransaction::try_new_alter_table) constructor.
-+//! The builder logic lives in [`builder::alter_table`](super::builder::alter_table).
-+
-+#![allow(unreachable_pub)]
-+
-+use std::marker::PhantomData;
-+use std::sync::OnceLock;
-+
-+use crate::committer::Committer;
-+use crate::snapshot::SnapshotRef;
-+use crate::table_configuration::TableConfiguration;
-+use crate::transaction::{AlterTable, Transaction};
-+use crate::utils::current_time_ms;
-+use crate::DeltaResult;
-+
-+/// A type alias for alter-table transactions.
-+///
-+/// This provides a restricted API surface that only exposes operations valid during ALTER
-+/// commands. Data file operations are not available at compile time because `AlterTable`
-+/// does not implement [`SupportsDataFiles`](super::SupportsDataFiles).
-+pub type AlterTableTransaction = Transaction<AlterTable>;
-+
-+impl AlterTableTransaction {
-+    /// Create a new transaction for altering a table's schema. Produces a metadata-only commit
-+    /// that emits an updated Metadata action with the evolved schema.
-+    ///
-+    /// The `effective_table_config` is the evolved table configuration (new schema, same
-+    /// protocol). It must be fully validated before calling this constructor (e.g. schema
-+    /// operations applied, protocol feature checks passed). The `read_snapshot` provides the
-+    /// pre-commit table state (version, previous protocol/metadata, ICT timestamps) used for
-+    /// commit versioning and post-commit snapshots.
-+    ///
-+    /// This is typically called via `AlterTableTransactionBuilder::build()` rather than directly.
-+    pub(crate) fn try_new_alter_table(
-+        read_snapshot: SnapshotRef,
-+        effective_table_config: TableConfiguration,
-+        committer: Box<dyn Committer>,
-+    ) -> DeltaResult<Self> {
-+        let span = tracing::info_span!(
-+            "txn",
-+            path = %read_snapshot.table_root(),
-+            read_version = read_snapshot.version(),
-+            operation = "ALTER TABLE",
-+        );
-+
-+        Ok(Transaction {
-+            span,
-+            read_snapshot_opt: Some(read_snapshot),
-+            effective_table_config,
-+            should_emit_protocol: false,
-+            should_emit_metadata: true,
-+            committer,
-+            operation: Some("ALTER TABLE".to_string()),
-+            engine_info: None,
-+            add_files_metadata: vec![],
-+            remove_files_metadata: vec![],
-+            set_transactions: vec![],
-+            commit_timestamp: current_time_ms()?,
-+            user_domain_metadata_additions: vec![],
-+            system_domain_metadata_additions: vec![],
-+            user_domain_removals: vec![],
-+            data_change: false,
-+            shared_write_state: OnceLock::new(),
-+            engine_commit_info: None,
-+            // TODO(#2446): match delta-spark's per-op isBlindAppend policy
-+            // (ADD/DROP/DROP NOT NULL -> true, SET NOT NULL -> false). Hardcoded false for
-+            // now: safe, but misses the true-case optimization delta-spark applies.
-+            is_blind_append: false,
-+            dv_matched_files: vec![],
-+            physical_clustering_columns: None,
-+            _state: PhantomData,
-+        })
-+    }
-+}
\ No newline at end of file
kernel/src/transaction/builder/alter_table.rs
@@ -1,168 +0,0 @@
-diff --git a/kernel/src/transaction/builder/alter_table.rs b/kernel/src/transaction/builder/alter_table.rs
-new file mode 100644
---- /dev/null
-+++ b/kernel/src/transaction/builder/alter_table.rs
-+//! Builder for ALTER TABLE (schema evolution) transactions.
-+//!
-+//! This module contains [`AlterTableTransactionBuilder`], which uses a type-state pattern to
-+//! enforce valid operation chaining at compile time.
-+//!
-+//! # Type States
-+//!
-+//! - [`Ready`]: Initial state. Operations are available, but `build()` is not (at least one
-+//!   operation is required).
-+//! - [`Modifying`]: After any chainable schema operation. More ops can be chained, and `build()` is
-+//!   available. See [`AlterTableTransactionBuilder<Modifying>`] for ops.
-+//!
-+//! # Transitions
-+//!
-+//! Each `impl` block below is gated by a state bound and documents which operations that
-+//! state enables. Chainable schema operations live on `impl<S: Chainable>` and transition
-+//! the builder to a chainable state; `build()` lives on states that are buildable.
-+//!
-+//! ```ignore
-+//! // Allowed: at least one op queued before build().
-+//! snapshot.alter_table().add_column(field).build(engine, committer)?;
-+//!
-+//! // Not allowed: build() is not defined on Ready (no ops queued).
-+//! snapshot.alter_table().build(engine, committer)?;  // compile error
-+//! ```
-+
-+use std::marker::PhantomData;
-+use std::sync::Arc;
-+
-+use crate::committer::Committer;
-+use crate::schema::StructField;
-+use crate::snapshot::SnapshotRef;
-+use crate::table_configuration::TableConfiguration;
-+use crate::table_features::Operation;
-+use crate::transaction::alter_table::AlterTableTransaction;
-+use crate::transaction::schema_evolution::{
-+    apply_schema_operations, SchemaEvolutionResult, SchemaOperation,
-+};
-+use crate::{DeltaResult, Engine};
-+
-+/// Initial state: `build()` is not yet available (at least one operation is required).
-+/// See [`Chainable`] for the operations available on this state.
-+pub struct Ready;
-+
-+/// State after at least one operation has been added. `build()` is available.
-+/// See [`Chainable`] for the operations available on this state.
-+pub struct Modifying;
-+
-+/// Marker trait for builder states that accept chainable schema operations. Grouping states
-+/// under one bound lets each op (like `add_column`) live on a single `impl<S: Chainable>`
-+/// block -- chainable states share the body rather than duplicating it per state.
-+///
-+/// Sealed: external types cannot implement this, keeping the set of chainable states closed.
-+pub trait Chainable: sealed::Sealed {}
-+impl Chainable for Ready {}
-+impl Chainable for Modifying {}
-+
-+mod sealed {
-+    pub trait Sealed {}
-+    impl Sealed for super::Ready {}
-+    impl Sealed for super::Modifying {}
-+}
-+
-+/// Builder for constructing an [`AlterTableTransaction`] with schema evolution operations.
-+///
-+/// Uses a type-state pattern (`S`) to enforce at compile time:
-+/// - At least one schema operation must be queued before `build()` is callable.
-+/// - Only operations valid for the current state can be chained. This disallows incompatible
-+///   chaining.
-+pub struct AlterTableTransactionBuilder<S = Ready> {
-+    snapshot: SnapshotRef,
-+    operations: Vec<SchemaOperation>,
-+    // PhantomData marker for builder state (Ready or Modifying).
-+    // Zero-sized; only affects which methods are available at compile time.
-+    _state: PhantomData<S>,
-+}
-+
-+impl<S> AlterTableTransactionBuilder<S> {
-+    // Reconstructs the builder with a different PhantomData marker, changing which methods
-+    // are available at compile time (e.g. Ready -> Modifying enables `build()`). All real
-+    // fields are moved as-is; only the zero-sized type state changes.
-+    //
-+    // `T` (distinct from the struct's `S`) lets the caller pick the target state:
-+    // `self.transition::<Modifying>()` returns `AlterTableTransactionBuilder<Modifying>`.
-+    fn transition<T>(self) -> AlterTableTransactionBuilder<T> {
-+        AlterTableTransactionBuilder {
-+            snapshot: self.snapshot,
-+            operations: self.operations,
-+            _state: PhantomData,
-+        }
-+    }
-+}
-+
-+impl AlterTableTransactionBuilder<Ready> {
-+    /// Create a new builder from a snapshot.
-+    pub(crate) fn new(snapshot: SnapshotRef) -> Self {
-+        AlterTableTransactionBuilder {
-+            snapshot,
-+            operations: Vec::new(),
-+            _state: PhantomData,
-+        }
-+    }
-+}
-+
-+impl<S: Chainable> AlterTableTransactionBuilder<S> {
-+    /// Add a new top-level column to the table schema.
-+    ///
-+    /// The field must not already exist in the schema (case-insensitive). The field must be
-+    /// nullable because existing data files do not contain this column and will read NULL for it.
-+    /// These constraints are validated during [`build()`](AlterTableTransactionBuilder::build).
-+    pub fn add_column(mut self, field: StructField) -> AlterTableTransactionBuilder<Modifying> {
-+        self.operations.push(SchemaOperation::AddColumn { field });
-+        self.transition()
-+    }
-+}
-+
-+impl AlterTableTransactionBuilder<Modifying> {
-+    /// Validate and apply schema operations, then build the [`AlterTableTransaction`].
-+    ///
-+    /// This method:
-+    /// 1. Validates the table supports writes
-+    /// 2. Applies each operation sequentially against the evolving schema
-+    /// 3. Constructs new Metadata action with evolved schema
-+    /// 4. Builds the evolved table configuration
-+    /// 5. Creates the transaction
-+    ///
-+    /// # Errors
-+    ///
-+    /// - Any individual operation fails validation (see per-method errors above)
-+    /// - Table does not support writes (unsupported features)
-+    /// - The evolved schema requires protocol features not enabled on the table (e.g. adding a
-+    ///   `timestampNtz` column without the `timestampNtz` feature)
-+    pub fn build(
-+        self,
-+        _engine: &dyn Engine,
-+        committer: Box<dyn Committer>,
-+    ) -> DeltaResult<AlterTableTransaction> {
-+        let table_config = self.snapshot.table_configuration();
-+        // Rejects writes to tables kernel can't safely commit to: writer version out of
-+        // kernel's supported range, unsupported writer features, or schemas with SQL-expression
-+        // invariants. Runs on the pre-alter snapshot; future ALTER variants that change the
-+        // protocol must also re-check this on the evolved `TableConfiguration`.
-+        table_config.ensure_operation_supported(Operation::Write)?;
-+
-+        let schema = Arc::unwrap_or_clone(table_config.logical_schema());
-+        let SchemaEvolutionResult {
-+            schema: evolved_schema,
-+        } = apply_schema_operations(schema, self.operations, table_config.column_mapping_mode())?;
-+
-+        let evolved_metadata = table_config
-+            .metadata()
-+            .clone()
-+            .with_schema(evolved_schema.clone())?;
-+
-+        // Validates the evolved metadata against the protocol.
-+        let evolved_table_config = TableConfiguration::try_new_with_schema(
-+            table_config,
-+            evolved_metadata,
-+            evolved_schema,
-+        )?;
-+
-+        AlterTableTransaction::try_new_alter_table(self.snapshot, evolved_table_config, committer)
-+    }
-+}
\ No newline at end of file
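
The builder's type-state pattern is the interesting part here: `build()` is only defined on the `Modifying` state, so a builder with zero queued operations cannot even compile a call to `build()`. A compilable standalone miniature of the same idea, with hypothetical names and the sealed `Chainable` bound omitted for brevity:

```rust
// Hypothetical miniature of the Ready/Modifying type-state builder: `build()` only
// exists on Builder<Modifying>, which is only reachable by queueing an operation.
use std::marker::PhantomData;

struct Ready;
struct Modifying;

struct Builder<S = Ready> {
    ops: Vec<String>,
    _state: PhantomData<S>,
}

impl Builder<Ready> {
    fn new() -> Self {
        Builder { ops: Vec::new(), _state: PhantomData }
    }
}

impl<S> Builder<S> {
    // Any state may queue an operation; doing so moves the builder to `Modifying`.
    fn add_column(mut self, name: &str) -> Builder<Modifying> {
        self.ops.push(name.to_string());
        Builder { ops: self.ops, _state: PhantomData }
    }
}

impl Builder<Modifying> {
    // Only reachable once at least one operation has been queued.
    fn build(self) -> Vec<String> {
        self.ops
    }
}

fn main() {
    let ops = Builder::new().add_column("email").build();
    assert_eq!(ops, vec!["email".to_string()]);
    // Builder::new().build(); // would not compile: `build` is not defined on Builder<Ready>
}
```
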
kernel/src/transaction/builder/create_table.rs
@@ -1,27 +0,0 @@
-diff --git a/kernel/src/transaction/builder/create_table.rs b/kernel/src/transaction/builder/create_table.rs
---- a/kernel/src/transaction/builder/create_table.rs
-+++ b/kernel/src/transaction/builder/create_table.rs
- use crate::clustering::{create_clustering_domain_metadata, validate_clustering_columns};
- use crate::committer::Committer;
- use crate::expressions::ColumnName;
--use crate::schema::validation::validate_schema_for_create;
-+use crate::schema::validation::validate_schema;
- use crate::schema::variant_utils::schema_contains_variant_type;
- use crate::schema::{
-     normalize_column_names_to_schema_casing, schema_contains_non_null_fields, DataType, SchemaRef,
- /// compatible with Spark readers/writers.
- ///
- /// Explicit `delta.invariants` metadata annotations are rejected by
--/// `validate_schema_for_create`, so this only flips on the feature for nullability-driven
-+/// `validate_schema`, so this only flips on the feature for nullability-driven
- /// invariants. Kernel does not itself enforce the null mask at write time -- it relies on
- /// the engine's `ParquetHandler` to do so. Kernel's default `ParquetHandler` uses
- /// `arrow-rs`, whose `RecordBatch::try_new` rejects null values in fields marked
-             maybe_apply_column_mapping_for_table_create(&self.schema, &mut validated)?;
- 
-         // Validate schema (non-empty, column names, duplicates, no `delta.invariants` metadata)
--        validate_schema_for_create(&effective_schema, column_mapping_mode)?;
-+        validate_schema(&effective_schema, column_mapping_mode)?;
- 
-         // Validate data layout and resolve column names (physical for clustering, logical
-         // for partitioning). Adds required table features for clustering.
\ No newline at end of file
kernel/src/transaction/builder/mod.rs
@@ -1,8 +0,0 @@
-diff --git a/kernel/src/transaction/builder/mod.rs b/kernel/src/transaction/builder/mod.rs
---- a/kernel/src/transaction/builder/mod.rs
-+++ b/kernel/src/transaction/builder/mod.rs
- // and for tests. Also allow dead_code since these are used by integration tests.
- #![allow(unreachable_pub, dead_code)]
- 
-+pub mod alter_table;
- pub mod create_table;
\ No newline at end of file
kernel/src/transaction/mod.rs
@@ -1,35 +0,0 @@
-diff --git a/kernel/src/transaction/mod.rs b/kernel/src/transaction/mod.rs
---- a/kernel/src/transaction/mod.rs
-+++ b/kernel/src/transaction/mod.rs
- #[cfg(not(feature = "internal-api"))]
- pub(crate) mod data_layout;
- 
-+pub(crate) mod alter_table;
-+pub use alter_table::AlterTableTransaction;
- mod commit_info;
- mod domain_metadata;
-+pub(crate) mod schema_evolution;
- mod stats_verifier;
- mod update;
- mod write_context;
- #[derive(Debug)]
- pub struct CreateTable;
- 
-+/// Marker type for alter-table (schema evolution) transactions.
-+///
-+/// Transactions in this state perform metadata-only commits. Data file operations are not
-+/// available at compile time because `AlterTable` does not implement [`SupportsDataFiles`].
-+#[derive(Debug)]
-+pub struct AlterTable;
-+
- /// Marker trait for transaction states that support data file operations.
- ///
- /// Only transaction types that implement this trait can access methods for adding, removing, or
- 
-     // Note: Additional test coverage for partial file matching (where some files in a scan
-     // have DV updates but others don't) is provided by the end-to-end integration test
--    // kernel/tests/dv.rs and kernel/tests/write.rs, which exercises
-+    // kernel/tests/dv.rs and kernel/tests/write_remove_dv.rs, which exercise
-     // the full deletion vector write workflow including the DvMatchVisitor logic.
- 
-     #[test]
\ No newline at end of file
kernel/src/transaction/schema_evolution.rs
@@ -1,190 +0,0 @@
-diff --git a/kernel/src/transaction/schema_evolution.rs b/kernel/src/transaction/schema_evolution.rs
-new file mode 100644
---- /dev/null
-+++ b/kernel/src/transaction/schema_evolution.rs
-+//! Schema evolution operations for ALTER TABLE.
-+//!
-+//! This module defines the [`SchemaOperation`] enum and the [`apply_schema_operations`] function
-+//! that validates and applies schema changes to produce an evolved schema.
-+
-+use indexmap::IndexMap;
-+
-+use crate::error::Error;
-+use crate::schema::validation::validate_schema;
-+use crate::schema::{SchemaRef, StructField, StructType};
-+use crate::table_features::ColumnMappingMode;
-+use crate::DeltaResult;
-+
-+/// A schema evolution operation to be applied during ALTER TABLE.
-+///
-+/// Operations are validated and applied in order during
-+/// [`apply_schema_operations`]. Each operation sees the schema state after all prior operations
-+/// have been applied.
-+#[derive(Debug, Clone)]
-+pub(crate) enum SchemaOperation {
-+    /// Add a top-level column.
-+    AddColumn { field: StructField },
-+}
-+
-+/// The result of applying schema operations.
-+#[derive(Debug)]
-+pub(crate) struct SchemaEvolutionResult {
-+    /// The evolved schema after all operations are applied.
-+    pub schema: SchemaRef,
-+}
-+
-+/// Applies a sequence of schema operations to the given schema, returning the evolved schema.
-+///
-+/// Operations are applied sequentially: each one validates against and modifies the schema
-+/// produced by all preceding operations, not the original input schema.
-+///
-+/// # Errors
-+///
-+/// Returns an error if any operation fails validation. The error message identifies which
-+/// operation failed and why.
-+pub(crate) fn apply_schema_operations(
-+    schema: StructType,
-+    operations: Vec<SchemaOperation>,
-+    column_mapping_mode: ColumnMappingMode,
-+) -> DeltaResult<SchemaEvolutionResult> {
-+    let cm_enabled = column_mapping_mode != ColumnMappingMode::None;
-+    // IndexMap preserves field insertion order. Keys are lowercased for case-insensitive
-+    // duplicate detection; StructFields retain their original casing.
-+    let mut fields: IndexMap<String, StructField> = schema
-+        .into_fields()
-+        .map(|f| (f.name().to_lowercase(), f))
-+        .collect();
-+
-+    for op in operations {
-+        match op {
-+            // Protocol feature checks for the field's data type (e.g. `timestampNtz`) happen
-+            // later when the caller builds a new TableConfiguration from the evolved schema --
-+            // the alter is rejected if the table doesn't already have the required feature
-+            // enabled. This matches Spark, which also rejects with
-+            // `DELTA_FEATURES_REQUIRE_MANUAL_ENABLEMENT` and requires the user to enable the
-+            // feature explicitly before adding such a column.
-+            SchemaOperation::AddColumn { field } => {
-+                // TODO: support column mapping for add_column (assign ID + physical name,
-+                // update delta.columnMapping.maxColumnId).
-+                if cm_enabled {
-+                    return Err(Error::unsupported(
-+                        "ALTER TABLE add_column is not yet supported on tables with \
-+                         column mapping enabled",
-+                    ));
-+                }
-+                if field.is_metadata_column() {
-+                    return Err(Error::schema(format!(
-+                        "Cannot add column '{}': metadata columns are not allowed in \
-+                         a table schema",
-+                        field.name()
-+                    )));
-+                }
-+                let key = field.name().to_lowercase();
-+                if fields.contains_key(&key) {
-+                    return Err(Error::schema(format!(
-+                        "Cannot add column '{}': a column with that name already exists",
-+                        field.name()
-+                    )));
-+                }
-+                // Validate field is nullable (Delta protocol requires added columns to be
-+                // nullable so existing data files can return NULL for the new column)
-+                // NOTE: non-nullable columns depend on invariants feature
-+                if !field.is_nullable() {
-+                    return Err(Error::schema(format!(
-+                        "Cannot add non-nullable column '{}'. Added columns must be nullable \
-+                         because existing data files do not contain this column.",
-+                        field.name()
-+                    )));
-+                }
-+                fields.insert(key, field);
-+            }
-+        }
-+    }
-+
-+    let evolved_schema = StructType::try_new(fields.into_values())?;
-+
-+    validate_schema(&evolved_schema, column_mapping_mode)?;
-+    Ok(SchemaEvolutionResult {
-+        schema: evolved_schema.into(),
-+    })
-+}
-+
-+#[cfg(test)]
-+mod tests {
-+    use rstest::rstest;
-+
-+    use super::*;
-+    use crate::schema::{DataType, MetadataColumnSpec, StructField, StructType};
-+
-+    fn simple_schema() -> StructType {
-+        StructType::try_new(vec![
-+            StructField::not_null("id", DataType::INTEGER),
-+            StructField::nullable("name", DataType::STRING),
-+        ])
-+        .unwrap()
-+    }
-+
-+    fn add_col(name: &str, nullable: bool) -> SchemaOperation {
-+        let field = if nullable {
-+            StructField::nullable(name, DataType::STRING)
-+        } else {
-+            StructField::not_null(name, DataType::STRING)
-+        };
-+        SchemaOperation::AddColumn { field }
-+    }
-+
-+    // Builds a struct column whose nested leaf field has the given name. Used to prove that
-+    // `validate_schema` (not just the top-level dup check or `StructType::try_new`) is
-+    // reached from `apply_schema_operations`.
-+    fn add_struct_with_nested_leaf(name: &str, leaf_name: &str) -> SchemaOperation {
-+        let inner =
-+            StructType::try_new(vec![StructField::nullable(leaf_name, DataType::STRING)]).unwrap();
-+        SchemaOperation::AddColumn {
-+            field: StructField::nullable(name, inner),
-+        }
-+    }
-+
-+    #[rstest]
-+    #[case::dup_exact(vec![add_col("name", true)], "already exists")]
-+    #[case::dup_case_insensitive(vec![add_col("Name", true)], "already exists")]
-+    #[case::dup_within_batch(
-+        vec![add_col("email", true), add_col("email", true)],
-+        "already exists"
-+    )]
-+    #[case::non_nullable(vec![add_col("age", false)], "non-nullable")]
-+    #[case::invalid_parquet_char(vec![add_col("foo,bar", true)], "invalid character")]
-+    #[case::nested_invalid_parquet_char(
-+        vec![add_struct_with_nested_leaf("addr", "bad,leaf")],
-+        "invalid character"
-+    )]
-+    #[case::metadata_column(
-+        vec![SchemaOperation::AddColumn {
-+            field: StructField::create_metadata_column("row_idx", MetadataColumnSpec::RowIndex),
-+        }],
-+        "metadata columns are not allowed"
-+    )]
-+    fn apply_schema_operations_rejects(
-+        #[case] ops: Vec<SchemaOperation>,
-+        #[case] error_contains: &str,
-+    ) {
-+        let err =
-+            apply_schema_operations(simple_schema(), ops, ColumnMappingMode::None).unwrap_err();
-+        assert!(err.to_string().contains(error_contains));
-+    }
-+
-+    #[rstest]
-+    #[case::single(vec![add_col("email", true)], &["id", "name", "email"])]
-+    #[case::multiple(
-+        vec![add_col("email", true), add_col("age", true)],
-+        &["id", "name", "email", "age"]
-+    )]
-+    fn apply_schema_operations_succeeds(
-+        #[case] ops: Vec<SchemaOperation>,
-+        #[case] expected_names: &[&str],
-+    ) {
-+        let result =
-+            apply_schema_operations(simple_schema(), ops, ColumnMappingMode::None).unwrap();
-+        let actual: Vec<&str> = result.schema.fields().map(|f| f.name().as_str()).collect();
-+        assert_eq!(&actual, expected_names);
-+    }
-+}
\ No newline at end of file
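
The add-column rules above boil down to: keep field order, reject case-insensitive duplicates, reject non-nullable additions, then re-validate the evolved schema. A std-only standalone sketch of those checks (the real code keys an `IndexMap` by the lowercased name; this hypothetical version scans a `Vec` instead):

```rust
// Hypothetical sketch of the add-column checks: nullable-only, case-insensitive
// duplicate detection, insertion order preserved by appending.
struct Field {
    name: String,
    nullable: bool,
}

fn add_column(fields: &mut Vec<Field>, field: Field) -> Result<(), String> {
    if !field.nullable {
        return Err(format!(
            "Cannot add non-nullable column '{}': existing data files have no value for it",
            field.name
        ));
    }
    let key = field.name.to_lowercase();
    if fields.iter().any(|f| f.name.to_lowercase() == key) {
        return Err(format!(
            "Cannot add column '{}': a column with that name already exists",
            field.name
        ));
    }
    fields.push(field);
    Ok(())
}

fn main() {
    let mut fields = vec![
        Field { name: "id".into(), nullable: false },
        Field { name: "name".into(), nullable: true },
    ];
    assert!(add_column(&mut fields, Field { name: "email".into(), nullable: true }).is_ok());
    assert!(add_column(&mut fields, Field { name: "Name".into(), nullable: true }).is_err());
    assert!(add_column(&mut fields, Field { name: "age".into(), nullable: false }).is_err());
}
```
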
kernel/tests/README.md
@@ -1,31 +0,0 @@
-diff --git a/kernel/tests/README.md b/kernel/tests/README.md
---- a/kernel/tests/README.md
-+++ b/kernel/tests/README.md
- 
- | Table | Location | Schema | Protocol (R/W) | Features | Description | Tests |
- |-------|----------|--------|----------|----------|-------------|-------|
--| `table-with-dv-small` | data/ | `value: int` | v3/v7 | r:`deletionVectors` w:`deletionVectors` | 10 rows, 2 soft-deleted by DV, 8 visible. Most heavily referenced test table. | `dv.rs::test_table_scan(with_dv)`, `write.rs::test_remove_files_adds_expected_entries`, `write.rs::test_update_deletion_vectors_adds_expected_entries`, `read.rs::with_predicate_and_removes`, `path.rs::test_to_uri/test_child/test_child_escapes`, `snapshot.rs::test_snapshot_read_metadata/test_new_snapshot/test_snapshot_new_from/test_read_table_with_missing_last_checkpoint/test_log_compaction_writer`, `deletion_vector.rs` tests, `transaction/mod.rs::setup_dv_enabled_table/test_add_files_schema/test_new_deletion_vector_path`, `default/parquet.rs` read test, `default/json.rs` read test, `log_compaction/tests.rs::create_mock_snapshot`, `resolve_dvs.rs` tests |
-+| `table-with-dv-small` | data/ | `value: int` | v3/v7 | r:`deletionVectors` w:`deletionVectors` | 10 rows, 2 soft-deleted by DV, 8 visible. Most heavily referenced test table. | `dv.rs::test_table_scan(with_dv)`, `write_remove_dv.rs::test_remove_files_adds_expected_entries`, `write_remove_dv.rs::test_update_deletion_vectors_adds_expected_entries`, `read.rs::with_predicate_and_removes`, `path.rs::test_to_uri/test_child/test_child_escapes`, `snapshot.rs::test_snapshot_read_metadata/test_new_snapshot/test_snapshot_new_from/test_read_table_with_missing_last_checkpoint/test_log_compaction_writer`, `deletion_vector.rs` tests, `transaction/mod.rs::setup_dv_enabled_table/test_add_files_schema/test_new_deletion_vector_path`, `default/parquet.rs` read test, `default/json.rs` read test, `log_compaction/tests.rs::create_mock_snapshot`, `resolve_dvs.rs` tests |
- | `table-without-dv-small` | data/ | `value: long` | v1/v2 | | 10 rows, all visible. Companion to table-with-dv-small. | `dv.rs::test_table_scan(without_dv)`, `transaction/mod.rs::setup_non_dv_table/create_existing_table_txn/test_commit_io_error_returns_retryable_transaction`, `sequential_phase.rs::test_sequential_v2_with_commits_only/test_sequential_finish_before_exhaustion_error`, `parallel_phase.rs` tests, `scan/tests.rs::test_scan_metadata_paths/test_scan_metadata/test_scan_metadata_from_same_version` |
- | `with-short-dv` | data/ | `id: long, value: string, timestamp: timestamp, rand: double` | v3/v7 | r:`deletionVectors` w:`deletionVectors` | 2 files x 5 rows. First file has inline DV (`storageType="u"`) deleting 3 rows. | `read.rs::short_dv` |
- | `dv-partitioned-with-checkpoint` | golden_data/ | `value: int, part: int` partitioned by `part` | v3/v7 | r:`deletionVectors` w:`deletionVectors` | DVs on a partitioned table with a checkpoint | `golden_tables.rs::golden_test!` |
- 
- | Table | Location | Schema | Protocol (R/W) | Features | Description | Tests |
- |-------|----------|--------|----------|----------|-------------|-------|
--| `partition_cm/none` | data/ | `value: int, category: string` partitioned by `category` | v1/v1 | `columnMapping.mode=none` | Partitioned write with CM disabled | `write.rs::test_column_mapping_partitioned_write(cm_none)` |
--| `partition_cm/id` | data/ | `value: int, category: string` partitioned by `category` | v3/v7 | r:`columnMapping` w:`columnMapping`, `columnMapping.mode=id` | Partitioned write with CM id mode | `write.rs::test_column_mapping_partitioned_write(cm_id)` |
--| `partition_cm/name` | data/ | `value: int, category: string` partitioned by `category` | v3/v7 | r:`columnMapping` w:`columnMapping`, `columnMapping.mode=name` | Partitioned write with CM name mode | `write.rs::test_column_mapping_partitioned_write(cm_name)` |
-+| `partition_cm/none` | data/ | `value: int, category: string` partitioned by `category` | v1/v1 | `columnMapping.mode=none` | Partitioned write with CM disabled | `write_column_mapping.rs::test_column_mapping_partitioned_write(cm_none)` |
-+| `partition_cm/id` | data/ | `value: int, category: string` partitioned by `category` | v3/v7 | r:`columnMapping` w:`columnMapping`, `columnMapping.mode=id` | Partitioned write with CM id mode | `write_column_mapping.rs::test_column_mapping_partitioned_write(cm_id)` |
-+| `partition_cm/name` | data/ | `value: int, category: string` partitioned by `category` | v3/v7 | r:`columnMapping` w:`columnMapping`, `columnMapping.mode=name` | Partitioned write with CM name mode | `write_column_mapping.rs::test_column_mapping_partitioned_write(cm_name)` |
- | `table-with-columnmapping-mode-name` | golden_data/ | `ByteType: byte, ShortType: short, IntegerType: int, LongType: long, FloatType: float, DoubleType: double, decimal: decimal(10,2), BooleanType: boolean, StringType: string, BinaryType: binary, DateType: date, TimestampType: timestamp, nested_struct: struct{aa: string, ac: struct{aca: int}}, array_of_prims: array<int>, array_of_arrays: array<array<int>>, array_of_structs: array<struct{ab: long}>, map_of_prims: map<int,long>, map_of_rows: map<int,struct{ab: long}>, map_of_arrays: map<long,array<int>>` | v2/v5 | `columnMapping.mode=name` | Column mapping name mode | `golden_tables.rs::golden_test!` |
- | `table-with-columnmapping-mode-id` | golden_data/ | `ByteType: byte, ShortType: short, IntegerType: int, LongType: long, FloatType: float, DoubleType: double, decimal: decimal(10,2), BooleanType: boolean, StringType: string, BinaryType: binary, DateType: date, TimestampType: timestamp, nested_struct: struct{aa: string, ac: struct{aca: int}}, array_of_prims: array<int>, array_of_arrays: array<array<int>>, array_of_structs: array<struct{ab: long}>, map_of_prims: map<int,long>, map_of_rows: map<int,struct{ab: long}>, map_of_arrays: map<long,array<int>>` | v2/v5 | `columnMapping.mode=id` | Column mapping id mode | `golden_tables.rs::golden_test!` |
- 
- | Table | Location | Schema | Protocol (R/W) | Features | Description | Tests |
- |-------|----------|--------|----------|----------|-------------|-------|
- | `with_checkpoint_no_last_checkpoint` | data/ | `letter: string, int: long, date: date` | v1/v2 | `checkpointInterval=2` | Checkpoint at v2 but missing `_last_checkpoint` hint file | `snapshot.rs::test_read_table_with_checkpoint`, `scan/tests.rs::test_scan_with_checkpoint`, `sequential_phase.rs::test_sequential_checkpoint_no_commits`, `checkpoint_manifest.rs` tests, `sync/parquet.rs` test, `default/parquet.rs` test |
--| `external-table-different-nullability` | data/ | `i: int` | v1/v2 | `checkpointInterval=2` | Parquet files have different nullability than Delta schema; includes checkpoint | `write.rs::test_checkpoint_non_kernel_written_table` |
-+| `external-table-different-nullability` | data/ | `i: int` | v1/v2 | `checkpointInterval=2` | Parquet files have different nullability than Delta schema; includes checkpoint | `write_clustered.rs::test_checkpoint_non_kernel_written_table` |
- | `checkpoint` | golden_data/ | `intCol: int` | v1/v2 | | Basic checkpoint read | `golden_tables.rs::golden_test!(checkpoint_test)` |
- | `corrupted-last-checkpoint-kernel` | golden_data/ | `id: long` | v1/v2 | | Corrupted `_last_checkpoint` file | `golden_tables.rs::golden_test!` |
- | `multi-part-checkpoint` | golden_data/ | `id: long` | v1/v2 | `checkpointInterval=1` | Multi-part checkpoint files | `golden_tables.rs::golden_test!` |
\ No newline at end of file

... (truncated, output exceeded 60000 bytes)

Reproduce locally: git range-diff ac9dc19..06499d0 6486bd2..3d37746 | Disable: git config gitstack.push-range-diff false

@lorenarosati
Collaborator Author

Range-diff: main (3d37746 -> aac9b0a)
ffi/src/expressions/kernel_visitor.rs
@@ -4,8 +4,8 @@
                  PrimitiveType::Timestamp => (Self::Timestamp, 0, 0),
                  PrimitiveType::TimestampNtz => (Self::TimestampNtz, 0, 0),
                  PrimitiveType::Decimal(dt) => (Self::Decimal, dt.precision(), dt.scale()),
-+                // TODO: Once real FFI geo support lands, new NullTypeTag::Geometry / ::Geography variants
-+                // will replace this arm.
++                // TODO: Once real FFI geo support lands, new NullTypeTag::Geometry / ::Geography
++                // variants will replace this arm.
 +                PrimitiveType::Geometry(_) | PrimitiveType::Geography(_) => (Self::Binary, 0, 0),
              },
              _ => (Self::NonPrimitive, 0, 0),
ffi/src/schema.rs
@@ -5,7 +5,8 @@
              &DataType::TIMESTAMP => call!(visit_timestamp),
              &DataType::TIMESTAMP_NTZ => call!(visit_timestamp_ntz),
 +            // Geometry/Geography are WKB-encoded bytes at the FFI layer
-+            // TODO: Add visit_geometry / visit_geography callbacks carrying the SRID once real FFI geo support lands
++            // TODO: Add visit_geometry / visit_geography callbacks carrying the SRID once real FFI
++            // geo support lands
 +            DataType::Primitive(PrimitiveType::Geometry(_))
 +            | DataType::Primitive(PrimitiveType::Geography(_)) => call!(visit_binary),
          }
kernel/src/schema/mod.rs
@@ -63,8 +63,8 @@
 +    /// Creates a new GeometryType with the given SRID.
 +    ///
 +    /// # Parameters
-+    /// - `srid`: The spatial reference identifier (e.g. `"OGC:CRS84"`, `"EPSG:4326"`).
-+    ///   Must be non-empty.
++    /// - `srid`: The spatial reference identifier (e.g. `"OGC:CRS84"`, `"EPSG:4326"`). Must be
++    ///   non-empty.
 +    pub fn try_new(srid: impl Into<String>) -> DeltaResult<Self> {
 +        let srid = srid.into();
 +        if srid.is_empty() {
kernel/src/table_configuration.rs
@@ -9,38 +9,6 @@
      validate_timestamp_ntz_feature_support, ColumnMappingMode, EnablementCheck, FeatureRequirement,
      FeatureType, KernelSupport, Operation, TableFeature, LEGACY_READER_FEATURES,
      LEGACY_WRITER_FEATURES, MAX_VALID_READER_VERSION, MAX_VALID_WRITER_VERSION,
-         version: Version,
-     ) -> DeltaResult<Self> {
-         let logical_schema = Arc::new(metadata.parse_schema()?);
-+        Self::try_new_inner(metadata, protocol, table_root, version, logical_schema)
-+    }
-+
-+    /// Like [`try_new`](Self::try_new), but reuses `base`'s protocol, table root, and version
-+    /// and takes a pre-parsed `logical_schema`.
-+    pub(crate) fn try_new_with_schema(
-+        base: &Self,
-+        metadata: Metadata,
-+        logical_schema: SchemaRef,
-+    ) -> DeltaResult<Self> {
-+        Self::try_new_inner(
-+            metadata,
-+            base.protocol.clone(),
-+            base.table_root.clone(),
-+            base.version,
-+            logical_schema,
-+        )
-+    }
-+
-+    fn try_new_inner(
-+        metadata: Metadata,
-+        protocol: Protocol,
-+        table_root: Url,
-+        version: Version,
-+        logical_schema: SchemaRef,
-+    ) -> DeltaResult<Self> {
-         let table_properties = metadata.parse_table_properties();
-         let column_mapping_mode = column_mapping_mode(&protocol, &table_properties);
- 
  
          // Validate schema against protocol features now that we have a TC instance.
          validate_timestamp_ntz_feature_support(&table_config)?;
kernel/src/actions/mod.rs
@@ -1,32 +0,0 @@
-diff --git a/kernel/src/actions/mod.rs b/kernel/src/actions/mod.rs
---- a/kernel/src/actions/mod.rs
-+++ b/kernel/src/actions/mod.rs
- }
- 
- // Serde derives are needed for CRC file deserialization (see `crc::reader`).
-+//
-+// TODO(#2446): `Metadata` stores the schema only as a JSON string. Callers that already hold
-+// a parsed `SchemaRef` (e.g. CREATE TABLE) serialize into `schema_string` and then re-parse
-+// downstream in `TableConfiguration::try_new` via `parse_schema()`. Caching the parsed schema
-+// on `Metadata` would eliminate the round-trip.
- #[derive(Debug, Default, Clone, PartialEq, Eq, Serialize, Deserialize, ToSchema)]
- #[serde(rename_all = "camelCase")]
- #[internal_api]
-         TableProperties::from(self.configuration.iter())
-     }
- 
-+    /// Returns a new Metadata with the schema replaced, preserving all other fields.
-+    ///
-+    /// # Errors
-+    ///
-+    /// Returns an error if schema serialization fails.
-+    pub(crate) fn with_schema(self, schema: SchemaRef) -> DeltaResult<Self> {
-+        Ok(Self {
-+            schema_string: serde_json::to_string(&schema)?,
-+            ..self
-+        })
-+    }
-+
-     #[cfg(test)]
-     #[allow(clippy::too_many_arguments)]
-     pub(crate) fn new_unchecked(
\ No newline at end of file
kernel/src/engine/arrow_expression/evaluate_expression.rs
@@ -1,154 +0,0 @@
-diff --git a/kernel/src/engine/arrow_expression/evaluate_expression.rs b/kernel/src/engine/arrow_expression/evaluate_expression.rs
---- a/kernel/src/engine/arrow_expression/evaluate_expression.rs
-+++ b/kernel/src/engine/arrow_expression/evaluate_expression.rs
-         (Literal(scalar), _) => {
-             validate_array_type(scalar.to_array(batch.num_rows())?, result_type)
-         }
--        (Column(name), _) => {
--            // Column extraction uses ordinal-based struct validation because column mapping
--            // can cause physical/logical name mismatches. apply_schema handles renaming.
--            let arr = extract_column(batch, name)?;
--            if let Some(expected) = result_type {
--                ensure_data_types(expected, arr.data_type(), ValidationMode::TypesOnly)?;
--            }
--            Ok(arr)
--        }
-+        (Column(name), _) => validate_array_type(extract_column(batch, name)?, result_type),
-         (Struct(fields, nullability), Some(DataType::Struct(output_schema))) => {
-             evaluate_struct_expression(fields, batch, output_schema, nullability.as_ref())
-         }
-     }
- 
-     #[test]
--    fn column_extract_struct_with_mismatched_field_names() {
-+    fn column_extract_struct_rejects_mismatched_field_names() {
-         let batch = make_struct_batch(
-             vec![
-                 ArrowField::new("col-abc-001", ArrowDataType::Int64, true),
-             ],
-         );
- 
--        // Logical names differ from physical names due to column mapping
-         let logical_type = DataType::try_struct_type([
-             StructField::nullable("my_column", DataType::LONG),
-             StructField::nullable("other_column", DataType::LONG),
- 
-         let expr = column_expr!("stats");
-         let result = evaluate_expression(&expr, &batch, Some(&logical_type));
--
--        // Ordinal-based validation passes: same field count and types by position.
--        // The downstream apply_schema transformation handles renaming.
--        let arr = result.expect("should succeed with mismatched names but matching types");
--        let struct_arr = arr.as_any().downcast_ref::<StructArray>().unwrap();
--        assert_eq!(struct_arr.num_columns(), 2);
--        assert_eq!(struct_arr.len(), 2);
--    }
--
--    #[test]
--    fn column_extract_struct_rejects_mismatched_field_count() {
--        let batch = make_struct_batch(
--            vec![ArrowField::new("col-abc-001", ArrowDataType::Int64, true)],
--            vec![Arc::new(Int64Array::from(vec![Some(1), Some(2)]))],
--        );
--
--        let logical_type = DataType::try_struct_type([
--            StructField::nullable("a", DataType::LONG),
--            StructField::nullable("b", DataType::LONG),
--        ])
--        .unwrap();
--
--        let expr = column_expr!("stats");
--        let result = evaluate_expression(&expr, &batch, Some(&logical_type));
--        assert_result_error_with_message(result, "Struct field count mismatch");
-+        assert_result_error_with_message(result, "Missing Struct fields");
-     }
- 
-     #[test]
-     fn column_extract_struct_rejects_mismatched_child_types() {
-         let batch = make_struct_batch(
-             vec![
--                ArrowField::new("col-abc-001", ArrowDataType::Int64, true),
--                ArrowField::new("col-abc-002", ArrowDataType::Utf8, true),
-+                ArrowField::new("a", ArrowDataType::Int64, true),
-+                ArrowField::new("b", ArrowDataType::Utf8, true),
-             ],
-             vec![
-                 Arc::new(Int64Array::from(vec![Some(1)])),
-             ],
-         );
- 
--        // Expect two LONG columns, but the second arrow field is Utf8
-         let logical_type = DataType::try_struct_type([
-             StructField::nullable("a", DataType::LONG),
-             StructField::nullable("b", DataType::LONG),
-     }
- 
-     #[test]
--    fn column_extract_struct_with_matching_names_still_works() {
-+    fn column_extract_struct_with_matching_names_works() {
-         let batch = make_struct_batch(
-             vec![
-                 ArrowField::new("a", ArrowDataType::Int64, true),
-         assert!(result.is_ok());
-     }
- 
--    /// Exercises the exact code path from `get_add_transform_expr` where a `struct_from`
--    /// expression wraps `column_expr!("add.stats_parsed")`. When the checkpoint parquet has
--    /// stats_parsed with physical column names (e.g. `col-abc-001`) but the output schema
--    /// uses logical names (e.g. `id`), `evaluate_struct_expression` calls
--    /// `evaluate_expression(Column, struct_result_type)` with mismatched field names.
--    /// Without ordinal-based validation this fails with a name mismatch error.
-+    /// When a `struct_from` expression wraps a `Column` referencing stats_parsed, and the
-+    /// checkpoint parquet has physical column names (e.g. `col-abc-001`) but the output schema
-+    /// uses logical names (e.g. `id`), name-based validation correctly rejects the mismatch.
-     #[test]
--    fn struct_from_with_column_tolerates_nested_name_mismatch() {
--        // Build a batch mimicking checkpoint data: add.stats_parsed uses physical names
-+    fn struct_from_with_column_rejects_nested_name_mismatch() {
-         let stats_fields: Vec<ArrowField> = vec![
-             ArrowField::new("col-abc-001", ArrowDataType::Int64, true),
-             ArrowField::new("col-abc-002", ArrowDataType::Int64, true),
-         )]);
-         let batch = RecordBatch::try_new(Arc::new(schema), vec![Arc::new(add_struct)]).unwrap();
- 
--        // struct_from mimicking get_add_transform_expr: wraps a Column referencing stats_parsed
-         let expr = Expr::struct_from([
-             column_expr_ref!("add.path"),
-             column_expr_ref!("add.stats_parsed"),
-         .unwrap();
- 
-         let result = evaluate_expression(&expr, &batch, Some(&output_type));
--        result.expect("struct_from with Column sub-expression should tolerate field name mismatch");
--    }
--
--    #[test]
--    fn column_extract_nested_struct_with_mismatched_names() {
--        let inner_fields = vec![ArrowField::new("phys-inner", ArrowDataType::Int64, true)];
--        let inner_struct = ArrowDataType::Struct(inner_fields.clone().into());
--        let batch = make_struct_batch(
--            vec![ArrowField::new("phys-outer", inner_struct, true)],
--            vec![Arc::new(
--                StructArray::try_new(
--                    inner_fields.into(),
--                    vec![Arc::new(Int64Array::from(vec![Some(42)]))],
--                    None,
--                )
--                .unwrap(),
--            )],
--        );
--
--        let logical_type = DataType::try_struct_type([StructField::nullable(
--            "logical_outer",
--            DataType::struct_type_unchecked([StructField::nullable(
--                "logical_inner",
--                DataType::LONG,
--            )]),
--        )])
--        .unwrap();
--
--        let expr = column_expr!("stats");
--        let result = evaluate_expression(&expr, &batch, Some(&logical_type));
--        assert!(result.is_ok());
-+        assert_result_error_with_message(result, "Missing Struct fields");
-     }
- }
\ No newline at end of file
kernel/src/engine/ensure_data_types.rs
@@ -1,13 +0,0 @@
-diff --git a/kernel/src/engine/ensure_data_types.rs b/kernel/src/engine/ensure_data_types.rs
---- a/kernel/src/engine/ensure_data_types.rs
-+++ b/kernel/src/engine/ensure_data_types.rs
- #[internal_api]
- pub(crate) enum ValidationMode {
-     /// Check types only. Struct fields are matched by ordinal position, not by name.
--    /// Nullability and metadata are not checked. Used by the expression evaluator where
--    /// column mapping can cause physical/logical name mismatches.
-+    /// Nullability and metadata are not checked.
-+    #[allow(dead_code)]
-     TypesOnly,
-     /// Check types and match struct fields by name, but skip nullability and metadata.
-     /// Used by the parquet reader where fields are already resolved by name upstream.
\ No newline at end of file
kernel/src/schema/validation.rs
@@ -1,48 +0,0 @@
-diff --git a/kernel/src/schema/validation.rs b/kernel/src/schema/validation.rs
---- a/kernel/src/schema/validation.rs
-+++ b/kernel/src/schema/validation.rs
--//! Schema validation utilities for Delta table creation.
-+//! Schema validation utilities shared by table creation and schema evolution.
- //!
- //! Validates schemas per the Delta protocol specification.
- 
- /// These characters have special meaning in Parquet schema syntax.
- const INVALID_PARQUET_CHARS: &[char] = &[' ', ',', ';', '{', '}', '(', ')', '\n', '\t', '='];
- 
--/// Validates a schema for table creation.
-+/// Validates a schema for CREATE TABLE or ALTER TABLE.
- ///
- /// Performs the following checks:
- /// 1. Schema is non-empty
- /// 3. Column names contain only valid characters
- /// 4. Rejects fields with `delta.invariants` metadata (SQL expression invariants are not supported
- ///    by kernel; see `TableConfiguration::ensure_write_supported`)
--pub(crate) fn validate_schema_for_create(
-+pub(crate) fn validate_schema(
-     schema: &StructType,
-     column_mapping_mode: ColumnMappingMode,
- ) -> DeltaResult<()> {
-     #[case::dot_in_name_with_cm(schema_with_dot(), ColumnMappingMode::Name)]
-     #[case::different_struct_children(schema_different_struct_children(), ColumnMappingMode::None)]
-     fn valid_schema_accepted(#[case] schema: StructType, #[case] cm: ColumnMappingMode) {
--        assert!(validate_schema_for_create(&schema, cm).is_ok());
-+        assert!(validate_schema(&schema, cm).is_ok());
-     }
- 
-     // === Invalid schemas ===
-         #[case] cm: ColumnMappingMode,
-         #[case] expected_errs: &[&str],
-     ) {
--        let result = validate_schema_for_create(&schema, cm);
-+        let result = validate_schema(&schema, cm);
-         assert!(result.is_err());
-         let err = result.unwrap_err().to_string();
-         for expected in expected_errs {
-     #[case::array_nested(schema_array_nested_invariant(), "arr.child")]
-     #[case::map_nested(schema_map_nested_invariant(), "map.child")]
-     fn invariants_metadata_rejected(#[case] schema: StructType, #[case] expected_path: &str) {
--        let result = validate_schema_for_create(&schema, ColumnMappingMode::None);
-+        let result = validate_schema(&schema, ColumnMappingMode::None);
-         let err = result.expect_err("expected delta.invariants metadata rejection");
-         let msg = err.to_string();
-         assert!(
\ No newline at end of file
kernel/src/snapshot/mod.rs
@@ -1,27 +0,0 @@
-diff --git a/kernel/src/snapshot/mod.rs b/kernel/src/snapshot/mod.rs
---- a/kernel/src/snapshot/mod.rs
-+++ b/kernel/src/snapshot/mod.rs
- use crate::table_configuration::{InCommitTimestampEnablement, TableConfiguration};
- use crate::table_features::{physical_to_logical_column_name, ColumnMappingMode, TableFeature};
- use crate::table_properties::TableProperties;
-+use crate::transaction::builder::alter_table::AlterTableTransactionBuilder;
- use crate::transaction::Transaction;
- use crate::utils::require;
- use crate::{DeltaResult, Engine, Error, LogCompactionWriter, Version};
-         Transaction::try_new_existing_table(self, committer, engine)
-     }
- 
-+    /// Creates a builder for altering this table's metadata. Currently supports schema change
-+    /// operations.
-+    ///
-+    /// The returned builder allows chaining operations before building an
-+    /// [`AlterTableTransaction`] that can be committed.
-+    ///
-+    /// [`AlterTableTransaction`]: crate::transaction::AlterTableTransaction
-+    pub fn alter_table(self: Arc<Self>) -> AlterTableTransactionBuilder {
-+        AlterTableTransactionBuilder::new(self)
-+    }
-+
-     /// Fetch the latest version of the provided `application_id` for this snapshot. Filters the
-     /// txn based on the delta.setTransactionRetentionDuration property and lastUpdated.
-     ///
\ No newline at end of file
kernel/src/transaction/alter_table.rs
@@ -1,81 +0,0 @@
-diff --git a/kernel/src/transaction/alter_table.rs b/kernel/src/transaction/alter_table.rs
-new file mode 100644
---- /dev/null
-+++ b/kernel/src/transaction/alter_table.rs
-+//! Alter table transaction types and constructor.
-+//!
-+//! This module defines the [`AlterTableTransaction`] type alias and the
-+//! [`try_new_alter_table`](AlterTableTransaction::try_new_alter_table) constructor.
-+//! The builder logic lives in [`builder::alter_table`](super::builder::alter_table).
-+
-+#![allow(unreachable_pub)]
-+
-+use std::marker::PhantomData;
-+use std::sync::OnceLock;
-+
-+use crate::committer::Committer;
-+use crate::snapshot::SnapshotRef;
-+use crate::table_configuration::TableConfiguration;
-+use crate::transaction::{AlterTable, Transaction};
-+use crate::utils::current_time_ms;
-+use crate::DeltaResult;
-+
-+/// A type alias for alter-table transactions.
-+///
-+/// This provides a restricted API surface that only exposes operations valid during ALTER
-+/// commands. Data file operations are not available at compile time because `AlterTable`
-+/// does not implement [`SupportsDataFiles`](super::SupportsDataFiles).
-+pub type AlterTableTransaction = Transaction<AlterTable>;
-+
-+impl AlterTableTransaction {
-+    /// Create a new transaction for altering a table's schema. Produces a metadata-only commit
-+    /// that emits an updated Metadata action with the evolved schema.
-+    ///
-+    /// The `effective_table_config` is the evolved table configuration (new schema, same
-+    /// protocol). It must be fully validated before calling this constructor (e.g. schema
-+    /// operations applied, protocol feature checks passed). The `read_snapshot` provides the
-+    /// pre-commit table state (version, previous protocol/metadata, ICT timestamps) used for
-+    /// commit versioning and post-commit snapshots.
-+    ///
-+    /// This is typically called via `AlterTableTransactionBuilder::build()` rather than directly.
-+    pub(crate) fn try_new_alter_table(
-+        read_snapshot: SnapshotRef,
-+        effective_table_config: TableConfiguration,
-+        committer: Box<dyn Committer>,
-+    ) -> DeltaResult<Self> {
-+        let span = tracing::info_span!(
-+            "txn",
-+            path = %read_snapshot.table_root(),
-+            read_version = read_snapshot.version(),
-+            operation = "ALTER TABLE",
-+        );
-+
-+        Ok(Transaction {
-+            span,
-+            read_snapshot_opt: Some(read_snapshot),
-+            effective_table_config,
-+            should_emit_protocol: false,
-+            should_emit_metadata: true,
-+            committer,
-+            operation: Some("ALTER TABLE".to_string()),
-+            engine_info: None,
-+            add_files_metadata: vec![],
-+            remove_files_metadata: vec![],
-+            set_transactions: vec![],
-+            commit_timestamp: current_time_ms()?,
-+            user_domain_metadata_additions: vec![],
-+            system_domain_metadata_additions: vec![],
-+            user_domain_removals: vec![],
-+            data_change: false,
-+            shared_write_state: OnceLock::new(),
-+            engine_commit_info: None,
-+            // TODO(#2446): match delta-spark's per-op isBlindAppend policy
-+            // (ADD/DROP/DROP NOT NULL -> true, SET NOT NULL -> false). Hardcoded false for
-+            // now: safe, but misses the true-case optimization delta-spark applies.
-+            is_blind_append: false,
-+            dv_matched_files: vec![],
-+            physical_clustering_columns: None,
-+            _state: PhantomData,
-+        })
-+    }
-+}
\ No newline at end of file
kernel/src/transaction/builder/alter_table.rs
@@ -1,168 +0,0 @@
-diff --git a/kernel/src/transaction/builder/alter_table.rs b/kernel/src/transaction/builder/alter_table.rs
-new file mode 100644
---- /dev/null
-+++ b/kernel/src/transaction/builder/alter_table.rs
-+//! Builder for ALTER TABLE (schema evolution) transactions.
-+//!
-+//! This module contains [`AlterTableTransactionBuilder`], which uses a type-state pattern to
-+//! enforce valid operation chaining at compile time.
-+//!
-+//! # Type States
-+//!
-+//! - [`Ready`]: Initial state. Operations are available, but `build()` is not (at least one
-+//!   operation is required).
-+//! - [`Modifying`]: After any chainable schema operation. More ops can be chained, and `build()` is
-+//!   available. See [`AlterTableTransactionBuilder<Modifying>`] for ops.
-+//!
-+//! # Transitions
-+//!
-+//! Each `impl` block below is gated by a state bound and documents which operations that
-+//! state enables. Chainable schema operations live on `impl<S: Chainable>` and transition
-+//! the builder to a chainable state; `build()` lives on states that are buildable.
-+//!
-+//! ```ignore
-+//! // Allowed: at least one op queued before build().
-+//! snapshot.alter_table().add_column(field).build(engine, committer)?;
-+//!
-+//! // Not allowed: build() is not defined on Ready (no ops queued).
-+//! snapshot.alter_table().build(engine, committer)?;  // compile error
-+//! ```
-+
-+use std::marker::PhantomData;
-+use std::sync::Arc;
-+
-+use crate::committer::Committer;
-+use crate::schema::StructField;
-+use crate::snapshot::SnapshotRef;
-+use crate::table_configuration::TableConfiguration;
-+use crate::table_features::Operation;
-+use crate::transaction::alter_table::AlterTableTransaction;
-+use crate::transaction::schema_evolution::{
-+    apply_schema_operations, SchemaEvolutionResult, SchemaOperation,
-+};
-+use crate::{DeltaResult, Engine};
-+
-+/// Initial state: `build()` is not yet available (at least one operation is required).
-+/// See [`Chainable`] for the operations available on this state.
-+pub struct Ready;
-+
-+/// State after at least one operation has been added. `build()` is available.
-+/// See [`Chainable`] for the operations available on this state.
-+pub struct Modifying;
-+
-+/// Marker trait for builder states that accept chainable schema operations. Grouping states
-+/// under one bound lets each op (like `add_column`) live on a single `impl<S: Chainable>`
-+/// block -- chainable states share the body rather than duplicating it per state.
-+///
-+/// Sealed: external types cannot implement this, keeping the set of chainable states closed.
-+pub trait Chainable: sealed::Sealed {}
-+impl Chainable for Ready {}
-+impl Chainable for Modifying {}
-+
-+mod sealed {
-+    pub trait Sealed {}
-+    impl Sealed for super::Ready {}
-+    impl Sealed for super::Modifying {}
-+}
-+
-+/// Builder for constructing an [`AlterTableTransaction`] with schema evolution operations.
-+///
-+/// Uses a type-state pattern (`S`) to enforce at compile time:
-+/// - At least one schema operation must be queued before `build()` is callable.
-+/// - Only operations valid for the current state can be chained. This disallows incompatible
-+///   chaining.
-+pub struct AlterTableTransactionBuilder<S = Ready> {
-+    snapshot: SnapshotRef,
-+    operations: Vec<SchemaOperation>,
-+    // PhantomData marker for builder state (Ready or Modifying).
-+    // Zero-sized; only affects which methods are available at compile time.
-+    _state: PhantomData<S>,
-+}
-+
-+impl<S> AlterTableTransactionBuilder<S> {
-+    // Reconstructs the builder with a different PhantomData marker, changing which methods
-+    // are available at compile time (e.g. Ready -> Modifying enables `build()`). All real
-+    // fields are moved as-is; only the zero-sized type state changes.
-+    //
-+    // `T` (distinct from the struct's `S`) lets the caller pick the target state:
-+    // `self.transition::<Modifying>()` returns `AlterTableTransactionBuilder<Modifying>`.
-+    fn transition<T>(self) -> AlterTableTransactionBuilder<T> {
-+        AlterTableTransactionBuilder {
-+            snapshot: self.snapshot,
-+            operations: self.operations,
-+            _state: PhantomData,
-+        }
-+    }
-+}
-+
-+impl AlterTableTransactionBuilder<Ready> {
-+    /// Create a new builder from a snapshot.
-+    pub(crate) fn new(snapshot: SnapshotRef) -> Self {
-+        AlterTableTransactionBuilder {
-+            snapshot,
-+            operations: Vec::new(),
-+            _state: PhantomData,
-+        }
-+    }
-+}
-+
-+impl<S: Chainable> AlterTableTransactionBuilder<S> {
-+    /// Add a new top-level column to the table schema.
-+    ///
-+    /// The field must not already exist in the schema (case-insensitive). The field must be
-+    /// nullable because existing data files do not contain this column and will read NULL for it.
-+    /// These constraints are validated during [`build()`](AlterTableTransactionBuilder::build).
-+    pub fn add_column(mut self, field: StructField) -> AlterTableTransactionBuilder<Modifying> {
-+        self.operations.push(SchemaOperation::AddColumn { field });
-+        self.transition()
-+    }
-+}
-+
-+impl AlterTableTransactionBuilder<Modifying> {
-+    /// Validate and apply schema operations, then build the [`AlterTableTransaction`].
-+    ///
-+    /// This method:
-+    /// 1. Validates the table supports writes
-+    /// 2. Applies each operation sequentially against the evolving schema
-+    /// 3. Constructs new Metadata action with evolved schema
-+    /// 4. Builds the evolved table configuration
-+    /// 5. Creates the transaction
-+    ///
-+    /// # Errors
-+    ///
-+    /// - Any individual operation fails validation (see per-method errors above)
-+    /// - Table does not support writes (unsupported features)
-+    /// - The evolved schema requires protocol features not enabled on the table (e.g. adding a
-+    ///   `timestampNtz` column without the `timestampNtz` feature)
-+    pub fn build(
-+        self,
-+        _engine: &dyn Engine,
-+        committer: Box<dyn Committer>,
-+    ) -> DeltaResult<AlterTableTransaction> {
-+        let table_config = self.snapshot.table_configuration();
-+        // Rejects writes to tables kernel can't safely commit to: writer version out of
-+        // kernel's supported range, unsupported writer features, or schemas with SQL-expression
-+        // invariants. Runs on the pre-alter snapshot; future ALTER variants that change the
-+        // protocol must also re-check this on the evolved `TableConfiguration`.
-+        table_config.ensure_operation_supported(Operation::Write)?;
-+
-+        let schema = Arc::unwrap_or_clone(table_config.logical_schema());
-+        let SchemaEvolutionResult {
-+            schema: evolved_schema,
-+        } = apply_schema_operations(schema, self.operations, table_config.column_mapping_mode())?;
-+
-+        let evolved_metadata = table_config
-+            .metadata()
-+            .clone()
-+            .with_schema(evolved_schema.clone())?;
-+
-+        // Validates the evolved metadata against the protocol.
-+        let evolved_table_config = TableConfiguration::try_new_with_schema(
-+            table_config,
-+            evolved_metadata,
-+            evolved_schema,
-+        )?;
-+
-+        AlterTableTransaction::try_new_alter_table(self.snapshot, evolved_table_config, committer)
-+    }
-+}
\ No newline at end of file
kernel/src/transaction/builder/create_table.rs
@@ -1,27 +0,0 @@
-diff --git a/kernel/src/transaction/builder/create_table.rs b/kernel/src/transaction/builder/create_table.rs
---- a/kernel/src/transaction/builder/create_table.rs
-+++ b/kernel/src/transaction/builder/create_table.rs
- use crate::clustering::{create_clustering_domain_metadata, validate_clustering_columns};
- use crate::committer::Committer;
- use crate::expressions::ColumnName;
--use crate::schema::validation::validate_schema_for_create;
-+use crate::schema::validation::validate_schema;
- use crate::schema::variant_utils::schema_contains_variant_type;
- use crate::schema::{
-     normalize_column_names_to_schema_casing, schema_contains_non_null_fields, DataType, SchemaRef,
- /// compatible with Spark readers/writers.
- ///
- /// Explicit `delta.invariants` metadata annotations are rejected by
--/// `validate_schema_for_create`, so this only flips on the feature for nullability-driven
-+/// `validate_schema`, so this only flips on the feature for nullability-driven
- /// invariants. Kernel does not itself enforce the null mask at write time -- it relies on
- /// the engine's `ParquetHandler` to do so. Kernel's default `ParquetHandler` uses
- /// `arrow-rs`, whose `RecordBatch::try_new` rejects null values in fields marked
-             maybe_apply_column_mapping_for_table_create(&self.schema, &mut validated)?;
- 
-         // Validate schema (non-empty, column names, duplicates, no `delta.invariants` metadata)
--        validate_schema_for_create(&effective_schema, column_mapping_mode)?;
-+        validate_schema(&effective_schema, column_mapping_mode)?;
- 
-         // Validate data layout and resolve column names (physical for clustering, logical
-         // for partitioning). Adds required table features for clustering.
\ No newline at end of file
kernel/src/transaction/builder/mod.rs
@@ -1,8 +0,0 @@
-diff --git a/kernel/src/transaction/builder/mod.rs b/kernel/src/transaction/builder/mod.rs
---- a/kernel/src/transaction/builder/mod.rs
-+++ b/kernel/src/transaction/builder/mod.rs
- // and for tests. Also allow dead_code since these are used by integration tests.
- #![allow(unreachable_pub, dead_code)]
- 
-+pub mod alter_table;
- pub mod create_table;
\ No newline at end of file
kernel/src/transaction/mod.rs
@@ -1,35 +0,0 @@
-diff --git a/kernel/src/transaction/mod.rs b/kernel/src/transaction/mod.rs
---- a/kernel/src/transaction/mod.rs
-+++ b/kernel/src/transaction/mod.rs
- #[cfg(not(feature = "internal-api"))]
- pub(crate) mod data_layout;
- 
-+pub(crate) mod alter_table;
-+pub use alter_table::AlterTableTransaction;
- mod commit_info;
- mod domain_metadata;
-+pub(crate) mod schema_evolution;
- mod stats_verifier;
- mod update;
- mod write_context;
- #[derive(Debug)]
- pub struct CreateTable;
- 
-+/// Marker type for alter-table (schema evolution) transactions.
-+///
-+/// Transactions in this state perform metadata-only commits. Data file operations are not
-+/// available at compile time because `AlterTable` does not implement [`SupportsDataFiles`].
-+#[derive(Debug)]
-+pub struct AlterTable;
-+
- /// Marker trait for transaction states that support data file operations.
- ///
- /// Only transaction types that implement this trait can access methods for adding, removing, or
- 
-     // Note: Additional test coverage for partial file matching (where some files in a scan
-     // have DV updates but others don't) is provided by the end-to-end integration test
--    // kernel/tests/dv.rs and kernel/tests/write.rs, which exercises
-+    // kernel/tests/dv.rs and kernel/tests/write_remove_dv.rs, which exercise
-     // the full deletion vector write workflow including the DvMatchVisitor logic.
- 
-     #[test]
\ No newline at end of file
kernel/src/transaction/schema_evolution.rs
@@ -1,190 +0,0 @@
-diff --git a/kernel/src/transaction/schema_evolution.rs b/kernel/src/transaction/schema_evolution.rs
-new file mode 100644
---- /dev/null
-+++ b/kernel/src/transaction/schema_evolution.rs
-+//! Schema evolution operations for ALTER TABLE.
-+//!
-+//! This module defines the [`SchemaOperation`] enum and the [`apply_schema_operations`] function
-+//! that validates and applies schema changes to produce an evolved schema.
-+
-+use indexmap::IndexMap;
-+
-+use crate::error::Error;
-+use crate::schema::validation::validate_schema;
-+use crate::schema::{SchemaRef, StructField, StructType};
-+use crate::table_features::ColumnMappingMode;
-+use crate::DeltaResult;
-+
-+/// A schema evolution operation to be applied during ALTER TABLE.
-+///
-+/// Operations are validated and applied in order during
-+/// [`apply_schema_operations`]. Each operation sees the schema state after all prior operations
-+/// have been applied.
-+#[derive(Debug, Clone)]
-+pub(crate) enum SchemaOperation {
-+    /// Add a top-level column.
-+    AddColumn { field: StructField },
-+}
-+
-+/// The result of applying schema operations.
-+#[derive(Debug)]
-+pub(crate) struct SchemaEvolutionResult {
-+    /// The evolved schema after all operations are applied.
-+    pub schema: SchemaRef,
-+}
-+
-+/// Applies a sequence of schema operations to the given schema, returning the evolved schema.
-+///
-+/// Operations are applied sequentially: each one validates against and modifies the schema
-+/// produced by all preceding operations, not the original input schema.
-+///
-+/// # Errors
-+///
-+/// Returns an error if any operation fails validation. The error message identifies which
-+/// operation failed and why.
-+pub(crate) fn apply_schema_operations(
-+    schema: StructType,
-+    operations: Vec<SchemaOperation>,
-+    column_mapping_mode: ColumnMappingMode,
-+) -> DeltaResult<SchemaEvolutionResult> {
-+    let cm_enabled = column_mapping_mode != ColumnMappingMode::None;
-+    // IndexMap preserves field insertion order. Keys are lowercased for case-insensitive
-+    // duplicate detection; StructFields retain their original casing.
-+    let mut fields: IndexMap<String, StructField> = schema
-+        .into_fields()
-+        .map(|f| (f.name().to_lowercase(), f))
-+        .collect();
-+
-+    for op in operations {
-+        match op {
-+            // Protocol feature checks for the field's data type (e.g. `timestampNtz`) happen
-+            // later when the caller builds a new TableConfiguration from the evolved schema --
-+            // the alter is rejected if the table doesn't already have the required feature
-+            // enabled. This matches Spark, which also rejects with
-+            // `DELTA_FEATURES_REQUIRE_MANUAL_ENABLEMENT` and requires the user to enable the
-+            // feature explicitly before adding such a column.
-+            SchemaOperation::AddColumn { field } => {
-+                // TODO: support column mapping for add_column (assign ID + physical name,
-+                // update delta.columnMapping.maxColumnId).
-+                if cm_enabled {
-+                    return Err(Error::unsupported(
-+                        "ALTER TABLE add_column is not yet supported on tables with \
-+                         column mapping enabled",
-+                    ));
-+                }
-+                if field.is_metadata_column() {
-+                    return Err(Error::schema(format!(
-+                        "Cannot add column '{}': metadata columns are not allowed in \
-+                         a table schema",
-+                        field.name()
-+                    )));
-+                }
-+                let key = field.name().to_lowercase();
-+                if fields.contains_key(&key) {
-+                    return Err(Error::schema(format!(
-+                        "Cannot add column '{}': a column with that name already exists",
-+                        field.name()
-+                    )));
-+                }
-+                // Validate field is nullable (Delta protocol requires added columns to be
-+                // nullable so existing data files can return NULL for the new column)
-+                // NOTE: non-nullable columns depend on invariants feature
-+                if !field.is_nullable() {
-+                    return Err(Error::schema(format!(
-+                        "Cannot add non-nullable column '{}'. Added columns must be nullable \
-+                         because existing data files do not contain this column.",
-+                        field.name()
-+                    )));
-+                }
-+                fields.insert(key, field);
-+            }
-+        }
-+    }
-+
-+    let evolved_schema = StructType::try_new(fields.into_values())?;
-+
-+    validate_schema(&evolved_schema, column_mapping_mode)?;
-+    Ok(SchemaEvolutionResult {
-+        schema: evolved_schema.into(),
-+    })
-+}
-+
-+#[cfg(test)]
-+mod tests {
-+    use rstest::rstest;
-+
-+    use super::*;
-+    use crate::schema::{DataType, MetadataColumnSpec, StructField, StructType};
-+
-+    fn simple_schema() -> StructType {
-+        StructType::try_new(vec![
-+            StructField::not_null("id", DataType::INTEGER),
-+            StructField::nullable("name", DataType::STRING),
-+        ])
-+        .unwrap()
-+    }
-+
-+    fn add_col(name: &str, nullable: bool) -> SchemaOperation {
-+        let field = if nullable {
-+            StructField::nullable(name, DataType::STRING)
-+        } else {
-+            StructField::not_null(name, DataType::STRING)
-+        };
-+        SchemaOperation::AddColumn { field }
-+    }
-+
-+    // Builds a struct column whose nested leaf field has the given name. Used to prove that
-+    // `validate_schema` (not just the top-level dup check or `StructType::try_new`) is
-+    // reached from `apply_schema_operations`.
-+    fn add_struct_with_nested_leaf(name: &str, leaf_name: &str) -> SchemaOperation {
-+        let inner =
-+            StructType::try_new(vec![StructField::nullable(leaf_name, DataType::STRING)]).unwrap();
-+        SchemaOperation::AddColumn {
-+            field: StructField::nullable(name, inner),
-+        }
-+    }
-+
-+    #[rstest]
-+    #[case::dup_exact(vec![add_col("name", true)], "already exists")]
-+    #[case::dup_case_insensitive(vec![add_col("Name", true)], "already exists")]
-+    #[case::dup_within_batch(
-+        vec![add_col("email", true), add_col("email", true)],
-+        "already exists"
-+    )]
-+    #[case::non_nullable(vec![add_col("age", false)], "non-nullable")]
-+    #[case::invalid_parquet_char(vec![add_col("foo,bar", true)], "invalid character")]
-+    #[case::nested_invalid_parquet_char(
-+        vec![add_struct_with_nested_leaf("addr", "bad,leaf")],
-+        "invalid character"
-+    )]
-+    #[case::metadata_column(
-+        vec![SchemaOperation::AddColumn {
-+            field: StructField::create_metadata_column("row_idx", MetadataColumnSpec::RowIndex),
-+        }],
-+        "metadata columns are not allowed"
-+    )]
-+    fn apply_schema_operations_rejects(
-+        #[case] ops: Vec<SchemaOperation>,
-+        #[case] error_contains: &str,
-+    ) {
-+        let err =
-+            apply_schema_operations(simple_schema(), ops, ColumnMappingMode::None).unwrap_err();
-+        assert!(err.to_string().contains(error_contains));
-+    }
-+
-+    #[rstest]
-+    #[case::single(vec![add_col("email", true)], &["id", "name", "email"])]
-+    #[case::multiple(
-+        vec![add_col("email", true), add_col("age", true)],
-+        &["id", "name", "email", "age"]
-+    )]
-+    fn apply_schema_operations_succeeds(
-+        #[case] ops: Vec<SchemaOperation>,
-+        #[case] expected_names: &[&str],
-+    ) {
-+        let result =
-+            apply_schema_operations(simple_schema(), ops, ColumnMappingMode::None).unwrap();
-+        let actual: Vec<&str> = result.schema.fields().map(|f| f.name().as_str()).collect();
-+        assert_eq!(&actual, expected_names);
-+    }
-+}
\ No newline at end of file
kernel/tests/README.md
@@ -1,31 +0,0 @@
-diff --git a/kernel/tests/README.md b/kernel/tests/README.md
---- a/kernel/tests/README.md
-+++ b/kernel/tests/README.md
- 
- | Table | Location | Schema | Protocol (R/W) | Features | Description | Tests |
- |-------|----------|--------|----------|----------|-------------|-------|
--| `table-with-dv-small` | data/ | `value: int` | v3/v7 | r:`deletionVectors` w:`deletionVectors` | 10 rows, 2 soft-deleted by DV, 8 visible. Most heavily referenced test table. | `dv.rs::test_table_scan(with_dv)`, `write.rs::test_remove_files_adds_expected_entries`, `write.rs::test_update_deletion_vectors_adds_expected_entries`, `read.rs::with_predicate_and_removes`, `path.rs::test_to_uri/test_child/test_child_escapes`, `snapshot.rs::test_snapshot_read_metadata/test_new_snapshot/test_snapshot_new_from/test_read_table_with_missing_last_checkpoint/test_log_compaction_writer`, `deletion_vector.rs` tests, `transaction/mod.rs::setup_dv_enabled_table/test_add_files_schema/test_new_deletion_vector_path`, `default/parquet.rs` read test, `default/json.rs` read test, `log_compaction/tests.rs::create_mock_snapshot`, `resolve_dvs.rs` tests |
-+| `table-with-dv-small` | data/ | `value: int` | v3/v7 | r:`deletionVectors` w:`deletionVectors` | 10 rows, 2 soft-deleted by DV, 8 visible. Most heavily referenced test table. | `dv.rs::test_table_scan(with_dv)`, `write_remove_dv.rs::test_remove_files_adds_expected_entries`, `write_remove_dv.rs::test_update_deletion_vectors_adds_expected_entries`, `read.rs::with_predicate_and_removes`, `path.rs::test_to_uri/test_child/test_child_escapes`, `snapshot.rs::test_snapshot_read_metadata/test_new_snapshot/test_snapshot_new_from/test_read_table_with_missing_last_checkpoint/test_log_compaction_writer`, `deletion_vector.rs` tests, `transaction/mod.rs::setup_dv_enabled_table/test_add_files_schema/test_new_deletion_vector_path`, `default/parquet.rs` read test, `default/json.rs` read test, `log_compaction/tests.rs::create_mock_snapshot`, `resolve_dvs.rs` tests |
- | `table-without-dv-small` | data/ | `value: long` | v1/v2 | | 10 rows, all visible. Companion to table-with-dv-small. | `dv.rs::test_table_scan(without_dv)`, `transaction/mod.rs::setup_non_dv_table/create_existing_table_txn/test_commit_io_error_returns_retryable_transaction`, `sequential_phase.rs::test_sequential_v2_with_commits_only/test_sequential_finish_before_exhaustion_error`, `parallel_phase.rs` tests, `scan/tests.rs::test_scan_metadata_paths/test_scan_metadata/test_scan_metadata_from_same_version` |
- | `with-short-dv` | data/ | `id: long, value: string, timestamp: timestamp, rand: double` | v3/v7 | r:`deletionVectors` w:`deletionVectors` | 2 files x 5 rows. First file has inline DV (`storageType="u"`) deleting 3 rows. | `read.rs::short_dv` |
- | `dv-partitioned-with-checkpoint` | golden_data/ | `value: int, part: int` partitioned by `part` | v3/v7 | r:`deletionVectors` w:`deletionVectors` | DVs on a partitioned table with a checkpoint | `golden_tables.rs::golden_test!` |
- 
- | Table | Location | Schema | Protocol (R/W) | Features | Description | Tests |
- |-------|----------|--------|----------|----------|-------------|-------|
--| `partition_cm/none` | data/ | `value: int, category: string` partitioned by `category` | v1/v1 | `columnMapping.mode=none` | Partitioned write with CM disabled | `write.rs::test_column_mapping_partitioned_write(cm_none)` |
--| `partition_cm/id` | data/ | `value: int, category: string` partitioned by `category` | v3/v7 | r:`columnMapping` w:`columnMapping`, `columnMapping.mode=id` | Partitioned write with CM id mode | `write.rs::test_column_mapping_partitioned_write(cm_id)` |
--| `partition_cm/name` | data/ | `value: int, category: string` partitioned by `category` | v3/v7 | r:`columnMapping` w:`columnMapping`, `columnMapping.mode=name` | Partitioned write with CM name mode | `write.rs::test_column_mapping_partitioned_write(cm_name)` |
-+| `partition_cm/none` | data/ | `value: int, category: string` partitioned by `category` | v1/v1 | `columnMapping.mode=none` | Partitioned write with CM disabled | `write_column_mapping.rs::test_column_mapping_partitioned_write(cm_none)` |
-+| `partition_cm/id` | data/ | `value: int, category: string` partitioned by `category` | v3/v7 | r:`columnMapping` w:`columnMapping`, `columnMapping.mode=id` | Partitioned write with CM id mode | `write_column_mapping.rs::test_column_mapping_partitioned_write(cm_id)` |
-+| `partition_cm/name` | data/ | `value: int, category: string` partitioned by `category` | v3/v7 | r:`columnMapping` w:`columnMapping`, `columnMapping.mode=name` | Partitioned write with CM name mode | `write_column_mapping.rs::test_column_mapping_partitioned_write(cm_name)` |
- | `table-with-columnmapping-mode-name` | golden_data/ | `ByteType: byte, ShortType: short, IntegerType: int, LongType: long, FloatType: float, DoubleType: double, decimal: decimal(10,2), BooleanType: boolean, StringType: string, BinaryType: binary, DateType: date, TimestampType: timestamp, nested_struct: struct{aa: string, ac: struct{aca: int}}, array_of_prims: array<int>, array_of_arrays: array<array<int>>, array_of_structs: array<struct{ab: long}>, map_of_prims: map<int,long>, map_of_rows: map<int,struct{ab: long}>, map_of_arrays: map<long,array<int>>` | v2/v5 | `columnMapping.mode=name` | Column mapping name mode | `golden_tables.rs::golden_test!` |
- | `table-with-columnmapping-mode-id` | golden_data/ | `ByteType: byte, ShortType: short, IntegerType: int, LongType: long, FloatType: float, DoubleType: double, decimal: decimal(10,2), BooleanType: boolean, StringType: string, BinaryType: binary, DateType: date, TimestampType: timestamp, nested_struct: struct{aa: string, ac: struct{aca: int}}, array_of_prims: array<int>, array_of_arrays: array<array<int>>, array_of_structs: array<struct{ab: long}>, map_of_prims: map<int,long>, map_of_rows: map<int,struct{ab: long}>, map_of_arrays: map<long,array<int>>` | v2/v5 | `columnMapping.mode=id` | Column mapping id mode | `golden_tables.rs::golden_test!` |
- 
- | Table | Location | Schema | Protocol (R/W) | Features | Description | Tests |
- |-------|----------|--------|----------|----------|-------------|-------|
- | `with_checkpoint_no_last_checkpoint` | data/ | `letter: string, int: long, date: date` | v1/v2 | `checkpointInterval=2` | Checkpoint at v2 but missing `_last_checkpoint` hint file | `snapshot.rs::test_read_table_with_checkpoint`, `scan/tests.rs::test_scan_with_checkpoint`, `sequential_phase.rs::test_sequential_checkpoint_no_commits`, `checkpoint_manifest.rs` tests, `sync/parquet.rs` test, `default/parquet.rs` test |
--| `external-table-different-nullability` | data/ | `i: int` | v1/v2 | `checkpointInterval=2` | Parquet files have different nullability than Delta schema; includes checkpoint | `write.rs::test_checkpoint_non_kernel_written_table` |
-+| `external-table-different-nullability` | data/ | `i: int` | v1/v2 | `checkpointInterval=2` | Parquet files have different nullability than Delta schema; includes checkpoint | `write_clustered.rs::test_checkpoint_non_kernel_written_table` |
- | `checkpoint` | golden_data/ | `intCol: int` | v1/v2 | | Basic checkpoint read | `golden_tables.rs::golden_test!(checkpoint_test)` |
- | `corrupted-last-checkpoint-kernel` | golden_data/ | `id: long` | v1/v2 | | Corrupted `_last_checkpoint` file | `golden_tables.rs::golden_test!` |
- | `multi-part-checkpoint` | golden_data/ | `id: long` | v1/v2 | `checkpointInterval=1` | Multi-part checkpoint files | `golden_tables.rs::golden_test!` |
\ No newline at end of file

... (truncated, output exceeded 60000 bytes)

Reproduce locally: git range-diff ac9dc19..3d37746 6486bd2..aac9b0a | Disable: git config gitstack.push-range-diff false

Comment thread: kernel/src/schema/mod.rs
#[serde(serialize_with = "serialize_decimal", untagged)]
Decimal(DecimalType),
#[serde(serialize_with = "serialize_geometry", untagged)]
Geometry(Box<GeometryType>),
Collaborator Author


Boxed because adding the geo types made the Error enum exceed 128 bytes and fail the clippy check (this matches the existing DataType convention of boxing large types).
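
For context, a minimal standalone sketch of why boxing helps here (the `WidePayload`/`Unboxed`/`Boxed` names are hypothetical illustrations, not the kernel's real enums): an enum is at least as large as its largest variant, so storing a wide payload behind a `Box` shrinks that variant to pointer size, which is what keeps the overall size under clippy's 128-byte threshold.

```rust
// Hypothetical stand-ins, not kernel types: they only illustrate how boxing a
// large variant payload changes the size of the containing enum.
#[allow(dead_code)]
struct WidePayload {
    srid: String,      // 24 bytes on a typical 64-bit target
    extra: [u64; 16],  // deliberately wide to mimic a large variant
}

#[allow(dead_code)]
enum Unboxed {
    Small(u8),
    Wide(WidePayload), // the enum must be at least as large as this payload
}

#[allow(dead_code)]
enum Boxed {
    Small(u8),
    Wide(Box<WidePayload>), // the variant now stores only a pointer
}

fn main() {
    // The unboxed enum carries the payload inline; the boxed one is roughly
    // pointer-sized plus the discriminant.
    println!("unboxed: {} bytes", std::mem::size_of::<Unboxed>());
    println!("boxed:   {} bytes", std::mem::size_of::<Boxed>());
    assert!(std::mem::size_of::<Boxed>() < std::mem::size_of::<Unboxed>());
}
```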

@lorenarosati
Collaborator Author

Range-diff: main (aac9b0a -> 97281f0)
kernel/src/schema/mod.rs
@@ -4,21 +4,16 @@
      true
  }
  
-+/// The default spatial reference identifier for geometry and geography types.
++/// Default spatial reference identifier for geometry and geography types.
 +pub const DEFAULT_GEO_SRID: &str = "OGC:CRS84";
 +
-+/// The algorithm used to interpolate edges between two vertices of a geography path.
++/// Algorithm used to interpolate edges between two vertices of a geography path.
 +#[derive(Debug, Clone, PartialEq, Eq, Hash, Serialize, Deserialize)]
 +pub enum EdgeInterpolationAlgorithm {
-+    /// Great circle paths on a sphere.
 +    Spherical,
-+    /// Vincenty's inverse formula on an ellipsoid.
 +    Vincenty,
-+    /// Thomas's formula on an ellipsoid.
 +    Thomas,
-+    /// Andoyer's method on an ellipsoid.
 +    Andoyer,
-+    /// Karney's method on an ellipsoid.
 +    Karney,
 +}
 +
@@ -50,21 +45,13 @@
 +    }
 +}
 +
-+/// A geometry column type with an associated spatial reference identifier (SRID).
-+///
-+/// Geometry values are encoded as WKB (Well-Known Binary) bytes in the Arrow layer,
-+/// represented as a `Binary` physical array with GeoArrow field metadata.
++/// A geometry column type with an associated spatial reference identifier (SRID)
 +#[derive(Debug, Clone, PartialEq, Eq, Hash, Serialize, Deserialize)]
 +pub struct GeometryType {
 +    srid: String,
 +}
 +
 +impl GeometryType {
-+    /// Creates a new GeometryType with the given SRID.
-+    ///
-+    /// # Parameters
-+    /// - `srid`: The spatial reference identifier (e.g. `"OGC:CRS84"`, `"EPSG:4326"`). Must be
-+    ///   non-empty.
 +    pub fn try_new(srid: impl Into<String>) -> DeltaResult<Self> {
 +        let srid = srid.into();
 +        if srid.is_empty() {
@@ -73,7 +60,6 @@
 +        Ok(Self { srid })
 +    }
 +
-+    /// Returns the SRID associated with this geometry type.
 +    pub fn srid(&self) -> &str {
 +        &self.srid
 +    }
@@ -93,10 +79,7 @@
 +    }
 +}
 +
-+/// A geography column type with an associated SRID and edge interpolation algorithm.
-+///
-+/// Geography values are encoded as WKB bytes in the Arrow layer,
-+/// represented as a `Binary` physical array with GeoArrow field metadata.
++/// Geography column type with an associated SRID and edge interpolation algorithm.
 +#[derive(Debug, Clone, PartialEq, Eq, Hash, Serialize, Deserialize)]
 +pub struct GeographyType {
 +    srid: String,
@@ -104,11 +87,6 @@
 +}
 +
 +impl GeographyType {
-+    /// Creates a new GeographyType with the given SRID and edge interpolation algorithm.
-+    ///
-+    /// # Parameters
-+    /// - `srid`: The spatial reference identifier (e.g. `"OGC:CRS84"`). Must be non-empty.
-+    /// - `algorithm`: The edge interpolation algorithm to use.
 +    pub fn try_new(
 +        srid: impl Into<String>,
 +        algorithm: EdgeInterpolationAlgorithm,
@@ -120,24 +98,18 @@
 +        Ok(Self { srid, algorithm })
 +    }
 +
-+    /// Creates a new GeographyType with the given SRID and the default edge interpolation
-+    /// algorithm ([`EdgeInterpolationAlgorithm::Spherical`]).
 +    pub fn try_new_with_srid(srid: impl Into<String>) -> DeltaResult<Self> {
 +        Self::try_new(srid, EdgeInterpolationAlgorithm::Spherical)
 +    }
 +
-+    /// Creates a new GeographyType with the given edge interpolation algorithm and the
-+    /// default SRID (DEFAULT_GEO_SRID).
 +    pub fn try_new_with_algorithm(algorithm: EdgeInterpolationAlgorithm) -> DeltaResult<Self> {
 +        Self::try_new(DEFAULT_GEO_SRID, algorithm)
 +    }
 +
-+    /// Returns the SRID associated with this geography type.
 +    pub fn srid(&self) -> &str {
 +        &self.srid
 +    }
 +
-+    /// Returns the edge interpolation algorithm for this geography type.
 +    pub fn algorithm(&self) -> &EdgeInterpolationAlgorithm {
 +        &self.algorithm
 +    }
@@ -206,9 +178,9 @@
 +            geo_str if geo_str.starts_with("geography(") && geo_str.ends_with(')') => {
 +                let inner = &geo_str[10..geo_str.len() - 1];
 +                // Three accepted shapes:
-+                //   geography(<srid>, <algorithm>) -- both
-+                //   geography(<srid>)              -- SRID only (contains ':')
-+                //   geography(<algorithm>)         -- algorithm only (no ':')
++                //   geography(<srid>, <algorithm>) - both
++                //   geography(<srid>)              - SRID only (contains ':')
++                //   geography(<algorithm>)         - algorithm only (no ':')
 +                match inner.rfind(',') {
 +                    Some(pos) => {
 +                        let srid = inner[..pos].trim();
kernel/src/table_configuration.rs
@@ -9,38 +9,6 @@
      validate_timestamp_ntz_feature_support, ColumnMappingMode, EnablementCheck, FeatureRequirement,
      FeatureType, KernelSupport, Operation, TableFeature, LEGACY_READER_FEATURES,
      LEGACY_WRITER_FEATURES, MAX_VALID_READER_VERSION, MAX_VALID_WRITER_VERSION,
-         version: Version,
-     ) -> DeltaResult<Self> {
-         let logical_schema = Arc::new(metadata.parse_schema()?);
-+        Self::try_new_inner(metadata, protocol, table_root, version, logical_schema)
-+    }
-+
-+    /// Like [`try_new`](Self::try_new), but reuses `base`'s protocol, table root, and version
-+    /// and takes a pre-parsed `logical_schema`.
-+    pub(crate) fn try_new_with_schema(
-+        base: &Self,
-+        metadata: Metadata,
-+        logical_schema: SchemaRef,
-+    ) -> DeltaResult<Self> {
-+        Self::try_new_inner(
-+            metadata,
-+            base.protocol.clone(),
-+            base.table_root.clone(),
-+            base.version,
-+            logical_schema,
-+        )
-+    }
-+
-+    fn try_new_inner(
-+        metadata: Metadata,
-+        protocol: Protocol,
-+        table_root: Url,
-+        version: Version,
-+        logical_schema: SchemaRef,
-+    ) -> DeltaResult<Self> {
-         let table_properties = metadata.parse_table_properties();
-         let column_mapping_mode = column_mapping_mode(&protocol, &table_properties);
- 
  
          // Validate schema against protocol features now that we have a TC instance.
          validate_timestamp_ntz_feature_support(&table_config)?;
kernel/src/actions/mod.rs
@@ -1,32 +0,0 @@
-diff --git a/kernel/src/actions/mod.rs b/kernel/src/actions/mod.rs
---- a/kernel/src/actions/mod.rs
-+++ b/kernel/src/actions/mod.rs
- }
- 
- // Serde derives are needed for CRC file deserialization (see `crc::reader`).
-+//
-+// TODO(#2446): `Metadata` stores the schema only as a JSON string. Callers that already hold
-+// a parsed `SchemaRef` (e.g. CREATE TABLE) serialize into `schema_string` and then re-parse
-+// downstream in `TableConfiguration::try_new` via `parse_schema()`. Caching the parsed schema
-+// on `Metadata` would eliminate the round-trip.
- #[derive(Debug, Default, Clone, PartialEq, Eq, Serialize, Deserialize, ToSchema)]
- #[serde(rename_all = "camelCase")]
- #[internal_api]
-         TableProperties::from(self.configuration.iter())
-     }
- 
-+    /// Returns a new Metadata with the schema replaced, preserving all other fields.
-+    ///
-+    /// # Errors
-+    ///
-+    /// Returns an error if schema serialization fails.
-+    pub(crate) fn with_schema(self, schema: SchemaRef) -> DeltaResult<Self> {
-+        Ok(Self {
-+            schema_string: serde_json::to_string(&schema)?,
-+            ..self
-+        })
-+    }
-+
-     #[cfg(test)]
-     #[allow(clippy::too_many_arguments)]
-     pub(crate) fn new_unchecked(
\ No newline at end of file
kernel/src/engine/arrow_expression/evaluate_expression.rs
@@ -1,154 +0,0 @@
-diff --git a/kernel/src/engine/arrow_expression/evaluate_expression.rs b/kernel/src/engine/arrow_expression/evaluate_expression.rs
---- a/kernel/src/engine/arrow_expression/evaluate_expression.rs
-+++ b/kernel/src/engine/arrow_expression/evaluate_expression.rs
-         (Literal(scalar), _) => {
-             validate_array_type(scalar.to_array(batch.num_rows())?, result_type)
-         }
--        (Column(name), _) => {
--            // Column extraction uses ordinal-based struct validation because column mapping
--            // can cause physical/logical name mismatches. apply_schema handles renaming.
--            let arr = extract_column(batch, name)?;
--            if let Some(expected) = result_type {
--                ensure_data_types(expected, arr.data_type(), ValidationMode::TypesOnly)?;
--            }
--            Ok(arr)
--        }
-+        (Column(name), _) => validate_array_type(extract_column(batch, name)?, result_type),
-         (Struct(fields, nullability), Some(DataType::Struct(output_schema))) => {
-             evaluate_struct_expression(fields, batch, output_schema, nullability.as_ref())
-         }
-     }
- 
-     #[test]
--    fn column_extract_struct_with_mismatched_field_names() {
-+    fn column_extract_struct_rejects_mismatched_field_names() {
-         let batch = make_struct_batch(
-             vec![
-                 ArrowField::new("col-abc-001", ArrowDataType::Int64, true),
-             ],
-         );
- 
--        // Logical names differ from physical names due to column mapping
-         let logical_type = DataType::try_struct_type([
-             StructField::nullable("my_column", DataType::LONG),
-             StructField::nullable("other_column", DataType::LONG),
- 
-         let expr = column_expr!("stats");
-         let result = evaluate_expression(&expr, &batch, Some(&logical_type));
--
--        // Ordinal-based validation passes: same field count and types by position.
--        // The downstream apply_schema transformation handles renaming.
--        let arr = result.expect("should succeed with mismatched names but matching types");
--        let struct_arr = arr.as_any().downcast_ref::<StructArray>().unwrap();
--        assert_eq!(struct_arr.num_columns(), 2);
--        assert_eq!(struct_arr.len(), 2);
--    }
--
--    #[test]
--    fn column_extract_struct_rejects_mismatched_field_count() {
--        let batch = make_struct_batch(
--            vec![ArrowField::new("col-abc-001", ArrowDataType::Int64, true)],
--            vec![Arc::new(Int64Array::from(vec![Some(1), Some(2)]))],
--        );
--
--        let logical_type = DataType::try_struct_type([
--            StructField::nullable("a", DataType::LONG),
--            StructField::nullable("b", DataType::LONG),
--        ])
--        .unwrap();
--
--        let expr = column_expr!("stats");
--        let result = evaluate_expression(&expr, &batch, Some(&logical_type));
--        assert_result_error_with_message(result, "Struct field count mismatch");
-+        assert_result_error_with_message(result, "Missing Struct fields");
-     }
- 
-     #[test]
-     fn column_extract_struct_rejects_mismatched_child_types() {
-         let batch = make_struct_batch(
-             vec![
--                ArrowField::new("col-abc-001", ArrowDataType::Int64, true),
--                ArrowField::new("col-abc-002", ArrowDataType::Utf8, true),
-+                ArrowField::new("a", ArrowDataType::Int64, true),
-+                ArrowField::new("b", ArrowDataType::Utf8, true),
-             ],
-             vec![
-                 Arc::new(Int64Array::from(vec![Some(1)])),
-             ],
-         );
- 
--        // Expect two LONG columns, but the second arrow field is Utf8
-         let logical_type = DataType::try_struct_type([
-             StructField::nullable("a", DataType::LONG),
-             StructField::nullable("b", DataType::LONG),
-     }
- 
-     #[test]
--    fn column_extract_struct_with_matching_names_still_works() {
-+    fn column_extract_struct_with_matching_names_works() {
-         let batch = make_struct_batch(
-             vec![
-                 ArrowField::new("a", ArrowDataType::Int64, true),
-         assert!(result.is_ok());
-     }
- 
--    /// Exercises the exact code path from `get_add_transform_expr` where a `struct_from`
--    /// expression wraps `column_expr!("add.stats_parsed")`. When the checkpoint parquet has
--    /// stats_parsed with physical column names (e.g. `col-abc-001`) but the output schema
--    /// uses logical names (e.g. `id`), `evaluate_struct_expression` calls
--    /// `evaluate_expression(Column, struct_result_type)` with mismatched field names.
--    /// Without ordinal-based validation this fails with a name mismatch error.
-+    /// When a `struct_from` expression wraps a `Column` referencing stats_parsed, and the
-+    /// checkpoint parquet has physical column names (e.g. `col-abc-001`) but the output schema
-+    /// uses logical names (e.g. `id`), name-based validation correctly rejects the mismatch.
-     #[test]
--    fn struct_from_with_column_tolerates_nested_name_mismatch() {
--        // Build a batch mimicking checkpoint data: add.stats_parsed uses physical names
-+    fn struct_from_with_column_rejects_nested_name_mismatch() {
-         let stats_fields: Vec<ArrowField> = vec![
-             ArrowField::new("col-abc-001", ArrowDataType::Int64, true),
-             ArrowField::new("col-abc-002", ArrowDataType::Int64, true),
-         )]);
-         let batch = RecordBatch::try_new(Arc::new(schema), vec![Arc::new(add_struct)]).unwrap();
- 
--        // struct_from mimicking get_add_transform_expr: wraps a Column referencing stats_parsed
-         let expr = Expr::struct_from([
-             column_expr_ref!("add.path"),
-             column_expr_ref!("add.stats_parsed"),
-         .unwrap();
- 
-         let result = evaluate_expression(&expr, &batch, Some(&output_type));
--        result.expect("struct_from with Column sub-expression should tolerate field name mismatch");
--    }
--
--    #[test]
--    fn column_extract_nested_struct_with_mismatched_names() {
--        let inner_fields = vec![ArrowField::new("phys-inner", ArrowDataType::Int64, true)];
--        let inner_struct = ArrowDataType::Struct(inner_fields.clone().into());
--        let batch = make_struct_batch(
--            vec![ArrowField::new("phys-outer", inner_struct, true)],
--            vec![Arc::new(
--                StructArray::try_new(
--                    inner_fields.into(),
--                    vec![Arc::new(Int64Array::from(vec![Some(42)]))],
--                    None,
--                )
--                .unwrap(),
--            )],
--        );
--
--        let logical_type = DataType::try_struct_type([StructField::nullable(
--            "logical_outer",
--            DataType::struct_type_unchecked([StructField::nullable(
--                "logical_inner",
--                DataType::LONG,
--            )]),
--        )])
--        .unwrap();
--
--        let expr = column_expr!("stats");
--        let result = evaluate_expression(&expr, &batch, Some(&logical_type));
--        assert!(result.is_ok());
-+        assert_result_error_with_message(result, "Missing Struct fields");
-     }
- }
\ No newline at end of file
kernel/src/engine/ensure_data_types.rs
@@ -1,13 +0,0 @@
-diff --git a/kernel/src/engine/ensure_data_types.rs b/kernel/src/engine/ensure_data_types.rs
---- a/kernel/src/engine/ensure_data_types.rs
-+++ b/kernel/src/engine/ensure_data_types.rs
- #[internal_api]
- pub(crate) enum ValidationMode {
-     /// Check types only. Struct fields are matched by ordinal position, not by name.
--    /// Nullability and metadata are not checked. Used by the expression evaluator where
--    /// column mapping can cause physical/logical name mismatches.
-+    /// Nullability and metadata are not checked.
-+    #[allow(dead_code)]
-     TypesOnly,
-     /// Check types and match struct fields by name, but skip nullability and metadata.
-     /// Used by the parquet reader where fields are already resolved by name upstream.
\ No newline at end of file
kernel/src/schema/validation.rs
@@ -1,48 +0,0 @@
-diff --git a/kernel/src/schema/validation.rs b/kernel/src/schema/validation.rs
---- a/kernel/src/schema/validation.rs
-+++ b/kernel/src/schema/validation.rs
--//! Schema validation utilities for Delta table creation.
-+//! Schema validation utilities shared by table creation and schema evolution.
- //!
- //! Validates schemas per the Delta protocol specification.
- 
- /// These characters have special meaning in Parquet schema syntax.
- const INVALID_PARQUET_CHARS: &[char] = &[' ', ',', ';', '{', '}', '(', ')', '\n', '\t', '='];
- 
--/// Validates a schema for table creation.
-+/// Validates a schema for CREATE TABLE or ALTER TABLE.
- ///
- /// Performs the following checks:
- /// 1. Schema is non-empty
- /// 3. Column names contain only valid characters
- /// 4. Rejects fields with `delta.invariants` metadata (SQL expression invariants are not supported
- ///    by kernel; see `TableConfiguration::ensure_write_supported`)
--pub(crate) fn validate_schema_for_create(
-+pub(crate) fn validate_schema(
-     schema: &StructType,
-     column_mapping_mode: ColumnMappingMode,
- ) -> DeltaResult<()> {
-     #[case::dot_in_name_with_cm(schema_with_dot(), ColumnMappingMode::Name)]
-     #[case::different_struct_children(schema_different_struct_children(), ColumnMappingMode::None)]
-     fn valid_schema_accepted(#[case] schema: StructType, #[case] cm: ColumnMappingMode) {
--        assert!(validate_schema_for_create(&schema, cm).is_ok());
-+        assert!(validate_schema(&schema, cm).is_ok());
-     }
- 
-     // === Invalid schemas ===
-         #[case] cm: ColumnMappingMode,
-         #[case] expected_errs: &[&str],
-     ) {
--        let result = validate_schema_for_create(&schema, cm);
-+        let result = validate_schema(&schema, cm);
-         assert!(result.is_err());
-         let err = result.unwrap_err().to_string();
-         for expected in expected_errs {
-     #[case::array_nested(schema_array_nested_invariant(), "arr.child")]
-     #[case::map_nested(schema_map_nested_invariant(), "map.child")]
-     fn invariants_metadata_rejected(#[case] schema: StructType, #[case] expected_path: &str) {
--        let result = validate_schema_for_create(&schema, ColumnMappingMode::None);
-+        let result = validate_schema(&schema, ColumnMappingMode::None);
-         let err = result.expect_err("expected delta.invariants metadata rejection");
-         let msg = err.to_string();
-         assert!(
\ No newline at end of file
kernel/src/snapshot/mod.rs
@@ -1,27 +0,0 @@
-diff --git a/kernel/src/snapshot/mod.rs b/kernel/src/snapshot/mod.rs
---- a/kernel/src/snapshot/mod.rs
-+++ b/kernel/src/snapshot/mod.rs
- use crate::table_configuration::{InCommitTimestampEnablement, TableConfiguration};
- use crate::table_features::{physical_to_logical_column_name, ColumnMappingMode, TableFeature};
- use crate::table_properties::TableProperties;
-+use crate::transaction::builder::alter_table::AlterTableTransactionBuilder;
- use crate::transaction::Transaction;
- use crate::utils::require;
- use crate::{DeltaResult, Engine, Error, LogCompactionWriter, Version};
-         Transaction::try_new_existing_table(self, committer, engine)
-     }
- 
-+    /// Creates a builder for altering this table's metadata. Currently supports schema change
-+    /// operations.
-+    ///
-+    /// The returned builder allows chaining operations before building an
-+    /// [`AlterTableTransaction`] that can be committed.
-+    ///
-+    /// [`AlterTableTransaction`]: crate::transaction::AlterTableTransaction
-+    pub fn alter_table(self: Arc<Self>) -> AlterTableTransactionBuilder {
-+        AlterTableTransactionBuilder::new(self)
-+    }
-+
-     /// Fetch the latest version of the provided `application_id` for this snapshot. Filters the
-     /// txn based on the delta.setTransactionRetentionDuration property and lastUpdated.
-     ///
\ No newline at end of file
kernel/src/transaction/alter_table.rs
@@ -1,81 +0,0 @@
-diff --git a/kernel/src/transaction/alter_table.rs b/kernel/src/transaction/alter_table.rs
-new file mode 100644
---- /dev/null
-+++ b/kernel/src/transaction/alter_table.rs
-+//! Alter table transaction types and constructor.
-+//!
-+//! This module defines the [`AlterTableTransaction`] type alias and the
-+//! [`try_new_alter_table`](AlterTableTransaction::try_new_alter_table) constructor.
-+//! The builder logic lives in [`builder::alter_table`](super::builder::alter_table).
-+
-+#![allow(unreachable_pub)]
-+
-+use std::marker::PhantomData;
-+use std::sync::OnceLock;
-+
-+use crate::committer::Committer;
-+use crate::snapshot::SnapshotRef;
-+use crate::table_configuration::TableConfiguration;
-+use crate::transaction::{AlterTable, Transaction};
-+use crate::utils::current_time_ms;
-+use crate::DeltaResult;
-+
-+/// A type alias for alter-table transactions.
-+///
-+/// This provides a restricted API surface that only exposes operations valid during ALTER
-+/// commands. Data file operations are not available at compile time because `AlterTable`
-+/// does not implement [`SupportsDataFiles`](super::SupportsDataFiles).
-+pub type AlterTableTransaction = Transaction<AlterTable>;
-+
-+impl AlterTableTransaction {
-+    /// Create a new transaction for altering a table's schema. Produces a metadata-only commit
-+    /// that emits an updated Metadata action with the evolved schema.
-+    ///
-+    /// The `effective_table_config` is the evolved table configuration (new schema, same
-+    /// protocol). It must be fully validated before calling this constructor (e.g. schema
-+    /// operations applied, protocol feature checks passed). The `read_snapshot` provides the
-+    /// pre-commit table state (version, previous protocol/metadata, ICT timestamps) used for
-+    /// commit versioning and post-commit snapshots.
-+    ///
-+    /// This is typically called via `AlterTableTransactionBuilder::build()` rather than directly.
-+    pub(crate) fn try_new_alter_table(
-+        read_snapshot: SnapshotRef,
-+        effective_table_config: TableConfiguration,
-+        committer: Box<dyn Committer>,
-+    ) -> DeltaResult<Self> {
-+        let span = tracing::info_span!(
-+            "txn",
-+            path = %read_snapshot.table_root(),
-+            read_version = read_snapshot.version(),
-+            operation = "ALTER TABLE",
-+        );
-+
-+        Ok(Transaction {
-+            span,
-+            read_snapshot_opt: Some(read_snapshot),
-+            effective_table_config,
-+            should_emit_protocol: false,
-+            should_emit_metadata: true,
-+            committer,
-+            operation: Some("ALTER TABLE".to_string()),
-+            engine_info: None,
-+            add_files_metadata: vec![],
-+            remove_files_metadata: vec![],
-+            set_transactions: vec![],
-+            commit_timestamp: current_time_ms()?,
-+            user_domain_metadata_additions: vec![],
-+            system_domain_metadata_additions: vec![],
-+            user_domain_removals: vec![],
-+            data_change: false,
-+            shared_write_state: OnceLock::new(),
-+            engine_commit_info: None,
-+            // TODO(#2446): match delta-spark's per-op isBlindAppend policy
-+            // (ADD/DROP/DROP NOT NULL -> true, SET NOT NULL -> false). Hardcoded false for
-+            // now: safe, but misses the true-case optimization delta-spark applies.
-+            is_blind_append: false,
-+            dv_matched_files: vec![],
-+            physical_clustering_columns: None,
-+            _state: PhantomData,
-+        })
-+    }
-+}
\ No newline at end of file
kernel/src/transaction/builder/alter_table.rs
@@ -1,168 +0,0 @@
-diff --git a/kernel/src/transaction/builder/alter_table.rs b/kernel/src/transaction/builder/alter_table.rs
-new file mode 100644
---- /dev/null
-+++ b/kernel/src/transaction/builder/alter_table.rs
-+//! Builder for ALTER TABLE (schema evolution) transactions.
-+//!
-+//! This module contains [`AlterTableTransactionBuilder`], which uses a type-state pattern to
-+//! enforce valid operation chaining at compile time.
-+//!
-+//! # Type States
-+//!
-+//! - [`Ready`]: Initial state. Operations are available, but `build()` is not (at least one
-+//!   operation is required).
-+//! - [`Modifying`]: After any chainable schema operation. More ops can be chained, and `build()` is
-+//!   available. See [`AlterTableTransactionBuilder<Modifying>`] for ops.
-+//!
-+//! # Transitions
-+//!
-+//! Each `impl` block below is gated by a state bound and documents which operations that
-+//! state enables. Chainable schema operations live on `impl<S: Chainable>` and transition
-+//! the builder to a chainable state; `build()` lives on states that are buildable.
-+//!
-+//! ```ignore
-+//! // Allowed: at least one op queued before build().
-+//! snapshot.alter_table().add_column(field).build(engine, committer)?;
-+//!
-+//! // Not allowed: build() is not defined on Ready (no ops queued).
-+//! snapshot.alter_table().build(engine, committer)?;  // compile error
-+//! ```
-+
-+use std::marker::PhantomData;
-+use std::sync::Arc;
-+
-+use crate::committer::Committer;
-+use crate::schema::StructField;
-+use crate::snapshot::SnapshotRef;
-+use crate::table_configuration::TableConfiguration;
-+use crate::table_features::Operation;
-+use crate::transaction::alter_table::AlterTableTransaction;
-+use crate::transaction::schema_evolution::{
-+    apply_schema_operations, SchemaEvolutionResult, SchemaOperation,
-+};
-+use crate::{DeltaResult, Engine};
-+
-+/// Initial state: `build()` is not yet available (at least one operation is required).
-+/// See [`Chainable`] for the operations available on this state.
-+pub struct Ready;
-+
-+/// State after at least one operation has been added. `build()` is available.
-+/// See [`Chainable`] for the operations available on this state.
-+pub struct Modifying;
-+
-+/// Marker trait for builder states that accept chainable schema operations. Grouping states
-+/// under one bound lets each op (like `add_column`) live on a single `impl<S: Chainable>`
-+/// block -- chainable states share the body rather than duplicating it per state.
-+///
-+/// Sealed: external types cannot implement this, keeping the set of chainable states closed.
-+pub trait Chainable: sealed::Sealed {}
-+impl Chainable for Ready {}
-+impl Chainable for Modifying {}
-+
-+mod sealed {
-+    pub trait Sealed {}
-+    impl Sealed for super::Ready {}
-+    impl Sealed for super::Modifying {}
-+}
-+
-+/// Builder for constructing an [`AlterTableTransaction`] with schema evolution operations.
-+///
-+/// Uses a type-state pattern (`S`) to enforce at compile time:
-+/// - At least one schema operation must be queued before `build()` is callable.
-+/// - Only operations valid for the current state can be chained. This will disallow incompatible
-+///   chaining.
-+pub struct AlterTableTransactionBuilder<S = Ready> {
-+    snapshot: SnapshotRef,
-+    operations: Vec<SchemaOperation>,
-+    // PhantomData marker for builder state (Ready or Modifying).
-+    // Zero-sized; only affects which methods are available at compile time.
-+    _state: PhantomData<S>,
-+}
-+
-+impl<S> AlterTableTransactionBuilder<S> {
-+    // Reconstructs the builder with a different PhantomData marker, changing which methods
-+    // are available at compile time (e.g. Ready -> Modifying enables `build()`). All real
-+    // fields are moved as-is; only the zero-sized type state changes.
-+    //
-+    // `T` (distinct from the struct's `S`) lets the caller pick the target state:
-+    // `self.transition::<Modifying>()` returns `AlterTableTransactionBuilder<Modifying>`.
-+    fn transition<T>(self) -> AlterTableTransactionBuilder<T> {
-+        AlterTableTransactionBuilder {
-+            snapshot: self.snapshot,
-+            operations: self.operations,
-+            _state: PhantomData,
-+        }
-+    }
-+}
-+
-+impl AlterTableTransactionBuilder<Ready> {
-+    /// Create a new builder from a snapshot.
-+    pub(crate) fn new(snapshot: SnapshotRef) -> Self {
-+        AlterTableTransactionBuilder {
-+            snapshot,
-+            operations: Vec::new(),
-+            _state: PhantomData,
-+        }
-+    }
-+}
-+
-+impl<S: Chainable> AlterTableTransactionBuilder<S> {
-+    /// Add a new top-level column to the table schema.
-+    ///
-+    /// The field must not already exist in the schema (case-insensitive). The field must be
-+    /// nullable because existing data files do not contain this column and will read NULL for it.
-+    /// These constraints are validated during [`build()`](AlterTableTransactionBuilder::build).
-+    pub fn add_column(mut self, field: StructField) -> AlterTableTransactionBuilder<Modifying> {
-+        self.operations.push(SchemaOperation::AddColumn { field });
-+        self.transition()
-+    }
-+}
-+
-+impl AlterTableTransactionBuilder<Modifying> {
-+    /// Validate and apply schema operations, then build the [`AlterTableTransaction`].
-+    ///
-+    /// This method:
-+    /// 1. Validates the table supports writes
-+    /// 2. Applies each operation sequentially against the evolving schema
-+    /// 3. Constructs new Metadata action with evolved schema
-+    /// 4. Builds the evolved table configuration
-+    /// 5. Creates the transaction
-+    ///
-+    /// # Errors
-+    ///
-+    /// - Any individual operation fails validation (see per-method errors above)
-+    /// - Table does not support writes (unsupported features)
-+    /// - The evolved schema requires protocol features not enabled on the table (e.g. adding a
-+    ///   `timestampNtz` column without the `timestampNtz` feature)
-+    pub fn build(
-+        self,
-+        _engine: &dyn Engine,
-+        committer: Box<dyn Committer>,
-+    ) -> DeltaResult<AlterTableTransaction> {
-+        let table_config = self.snapshot.table_configuration();
-+        // Rejects writes to tables kernel can't safely commit to: writer version out of
-+        // kernel's supported range, unsupported writer features, or schemas with SQL-expression
-+        // invariants. Runs on the pre-alter snapshot; future ALTER variants that change the
-+        // protocol must also re-check this on the evolved `TableConfiguration`.
-+        table_config.ensure_operation_supported(Operation::Write)?;
-+
-+        let schema = Arc::unwrap_or_clone(table_config.logical_schema());
-+        let SchemaEvolutionResult {
-+            schema: evolved_schema,
-+        } = apply_schema_operations(schema, self.operations, table_config.column_mapping_mode())?;
-+
-+        let evolved_metadata = table_config
-+            .metadata()
-+            .clone()
-+            .with_schema(evolved_schema.clone())?;
-+
-+        // Validates the evolved metadata against the protocol.
-+        let evolved_table_config = TableConfiguration::try_new_with_schema(
-+            table_config,
-+            evolved_metadata,
-+            evolved_schema,
-+        )?;
-+
-+        AlterTableTransaction::try_new_alter_table(self.snapshot, evolved_table_config, committer)
-+    }
-+}
\ No newline at end of file
kernel/src/transaction/builder/create_table.rs
@@ -1,27 +0,0 @@
-diff --git a/kernel/src/transaction/builder/create_table.rs b/kernel/src/transaction/builder/create_table.rs
---- a/kernel/src/transaction/builder/create_table.rs
-+++ b/kernel/src/transaction/builder/create_table.rs
- use crate::clustering::{create_clustering_domain_metadata, validate_clustering_columns};
- use crate::committer::Committer;
- use crate::expressions::ColumnName;
--use crate::schema::validation::validate_schema_for_create;
-+use crate::schema::validation::validate_schema;
- use crate::schema::variant_utils::schema_contains_variant_type;
- use crate::schema::{
-     normalize_column_names_to_schema_casing, schema_contains_non_null_fields, DataType, SchemaRef,
- /// compatible with Spark readers/writers.
- ///
- /// Explicit `delta.invariants` metadata annotations are rejected by
--/// `validate_schema_for_create`, so this only flips on the feature for nullability-driven
-+/// `validate_schema`, so this only flips on the feature for nullability-driven
- /// invariants. Kernel does not itself enforce the null mask at write time -- it relies on
- /// the engine's `ParquetHandler` to do so. Kernel's default `ParquetHandler` uses
- /// `arrow-rs`, whose `RecordBatch::try_new` rejects null values in fields marked
-             maybe_apply_column_mapping_for_table_create(&self.schema, &mut validated)?;
- 
-         // Validate schema (non-empty, column names, duplicates, no `delta.invariants` metadata)
--        validate_schema_for_create(&effective_schema, column_mapping_mode)?;
-+        validate_schema(&effective_schema, column_mapping_mode)?;
- 
-         // Validate data layout and resolve column names (physical for clustering, logical
-         // for partitioning). Adds required table features for clustering.
\ No newline at end of file
kernel/src/transaction/builder/mod.rs
@@ -1,8 +0,0 @@
-diff --git a/kernel/src/transaction/builder/mod.rs b/kernel/src/transaction/builder/mod.rs
---- a/kernel/src/transaction/builder/mod.rs
-+++ b/kernel/src/transaction/builder/mod.rs
- // and for tests. Also allow dead_code since these are used by integration tests.
- #![allow(unreachable_pub, dead_code)]
- 
-+pub mod alter_table;
- pub mod create_table;
\ No newline at end of file
kernel/src/transaction/mod.rs
@@ -1,35 +0,0 @@
-diff --git a/kernel/src/transaction/mod.rs b/kernel/src/transaction/mod.rs
---- a/kernel/src/transaction/mod.rs
-+++ b/kernel/src/transaction/mod.rs
- #[cfg(not(feature = "internal-api"))]
- pub(crate) mod data_layout;
- 
-+pub(crate) mod alter_table;
-+pub use alter_table::AlterTableTransaction;
- mod commit_info;
- mod domain_metadata;
-+pub(crate) mod schema_evolution;
- mod stats_verifier;
- mod update;
- mod write_context;
- #[derive(Debug)]
- pub struct CreateTable;
- 
-+/// Marker type for alter-table (schema evolution) transactions.
-+///
-+/// Transactions in this state perform metadata-only commits. Data file operations are not
-+/// available at compile time because `AlterTable` does not implement [`SupportsDataFiles`].
-+#[derive(Debug)]
-+pub struct AlterTable;
-+
- /// Marker trait for transaction states that support data file operations.
- ///
- /// Only transaction types that implement this trait can access methods for adding, removing, or
- 
-     // Note: Additional test coverage for partial file matching (where some files in a scan
-     // have DV updates but others don't) is provided by the end-to-end integration test
--    // kernel/tests/dv.rs and kernel/tests/write.rs, which exercises
-+    // kernel/tests/dv.rs and kernel/tests/write_remove_dv.rs, which exercise
-     // the full deletion vector write workflow including the DvMatchVisitor logic.
- 
-     #[test]
\ No newline at end of file
kernel/src/transaction/schema_evolution.rs
@@ -1,190 +0,0 @@
-diff --git a/kernel/src/transaction/schema_evolution.rs b/kernel/src/transaction/schema_evolution.rs
-new file mode 100644
---- /dev/null
-+++ b/kernel/src/transaction/schema_evolution.rs
-+//! Schema evolution operations for ALTER TABLE.
-+//!
-+//! This module defines the [`SchemaOperation`] enum and the [`apply_schema_operations`] function
-+//! that validates and applies schema changes to produce an evolved schema.
-+
-+use indexmap::IndexMap;
-+
-+use crate::error::Error;
-+use crate::schema::validation::validate_schema;
-+use crate::schema::{SchemaRef, StructField, StructType};
-+use crate::table_features::ColumnMappingMode;
-+use crate::DeltaResult;
-+
-+/// A schema evolution operation to be applied during ALTER TABLE.
-+///
-+/// Operations are validated and applied in order during
-+/// [`apply_schema_operations`]. Each operation sees the schema state after all prior operations
-+/// have been applied.
-+#[derive(Debug, Clone)]
-+pub(crate) enum SchemaOperation {
-+    /// Add a top-level column.
-+    AddColumn { field: StructField },
-+}
-+
-+/// The result of applying schema operations.
-+#[derive(Debug)]
-+pub(crate) struct SchemaEvolutionResult {
-+    /// The evolved schema after all operations are applied.
-+    pub schema: SchemaRef,
-+}
-+
-+/// Applies a sequence of schema operations to the given schema, returning the evolved schema.
-+///
-+/// Operations are applied sequentially: each one validates against and modifies the schema
-+/// produced by all preceding operations, not the original input schema.
-+///
-+/// # Errors
-+///
-+/// Returns an error if any operation fails validation. The error message identifies which
-+/// operation failed and why.
-+pub(crate) fn apply_schema_operations(
-+    schema: StructType,
-+    operations: Vec<SchemaOperation>,
-+    column_mapping_mode: ColumnMappingMode,
-+) -> DeltaResult<SchemaEvolutionResult> {
-+    let cm_enabled = column_mapping_mode != ColumnMappingMode::None;
-+    // IndexMap preserves field insertion order. Keys are lowercased for case-insensitive
-+    // duplicate detection; StructFields retain their original casing.
-+    let mut fields: IndexMap<String, StructField> = schema
-+        .into_fields()
-+        .map(|f| (f.name().to_lowercase(), f))
-+        .collect();
-+
-+    for op in operations {
-+        match op {
-+            // Protocol feature checks for the field's data type (e.g. `timestampNtz`) happen
-+            // later when the caller builds a new TableConfiguration from the evolved schema --
-+            // the alter is rejected if the table doesn't already have the required feature
-+            // enabled. This matches Spark, which also rejects with
-+            // `DELTA_FEATURES_REQUIRE_MANUAL_ENABLEMENT` and requires the user to enable the
-+            // feature explicitly before adding such a column.
-+            SchemaOperation::AddColumn { field } => {
-+                // TODO: support column mapping for add_column (assign ID + physical name,
-+                // update delta.columnMapping.maxColumnId).
-+                if cm_enabled {
-+                    return Err(Error::unsupported(
-+                        "ALTER TABLE add_column is not yet supported on tables with \
-+                         column mapping enabled",
-+                    ));
-+                }
-+                if field.is_metadata_column() {
-+                    return Err(Error::schema(format!(
-+                        "Cannot add column '{}': metadata columns are not allowed in \
-+                         a table schema",
-+                        field.name()
-+                    )));
-+                }
-+                let key = field.name().to_lowercase();
-+                if fields.contains_key(&key) {
-+                    return Err(Error::schema(format!(
-+                        "Cannot add column '{}': a column with that name already exists",
-+                        field.name()
-+                    )));
-+                }
-+                // Validate field is nullable (Delta protocol requires added columns to be
-+                // nullable so existing data files can return NULL for the new column)
-+                // NOTE: non-nullable columns depend on invariants feature
-+                if !field.is_nullable() {
-+                    return Err(Error::schema(format!(
-+                        "Cannot add non-nullable column '{}'. Added columns must be nullable \
-+                         because existing data files do not contain this column.",
-+                        field.name()
-+                    )));
-+                }
-+                fields.insert(key, field);
-+            }
-+        }
-+    }
-+
-+    let evolved_schema = StructType::try_new(fields.into_values())?;
-+
-+    validate_schema(&evolved_schema, column_mapping_mode)?;
-+    Ok(SchemaEvolutionResult {
-+        schema: evolved_schema.into(),
-+    })
-+}
-+
-+#[cfg(test)]
-+mod tests {
-+    use rstest::rstest;
-+
-+    use super::*;
-+    use crate::schema::{DataType, MetadataColumnSpec, StructField, StructType};
-+
-+    fn simple_schema() -> StructType {
-+        StructType::try_new(vec![
-+            StructField::not_null("id", DataType::INTEGER),
-+            StructField::nullable("name", DataType::STRING),
-+        ])
-+        .unwrap()
-+    }
-+
-+    fn add_col(name: &str, nullable: bool) -> SchemaOperation {
-+        let field = if nullable {
-+            StructField::nullable(name, DataType::STRING)
-+        } else {
-+            StructField::not_null(name, DataType::STRING)
-+        };
-+        SchemaOperation::AddColumn { field }
-+    }
-+
-+    // Builds a struct column whose nested leaf field has the given name. Used to prove that
-+    // `validate_schema` (not just the top-level dup check or `StructType::try_new`) is
-+    // reached from `apply_schema_operations`.
-+    fn add_struct_with_nested_leaf(name: &str, leaf_name: &str) -> SchemaOperation {
-+        let inner =
-+            StructType::try_new(vec![StructField::nullable(leaf_name, DataType::STRING)]).unwrap();
-+        SchemaOperation::AddColumn {
-+            field: StructField::nullable(name, inner),
-+        }
-+    }
-+
-+    #[rstest]
-+    #[case::dup_exact(vec![add_col("name", true)], "already exists")]
-+    #[case::dup_case_insensitive(vec![add_col("Name", true)], "already exists")]
-+    #[case::dup_within_batch(
-+        vec![add_col("email", true), add_col("email", true)],
-+        "already exists"
-+    )]
-+    #[case::non_nullable(vec![add_col("age", false)], "non-nullable")]
-+    #[case::invalid_parquet_char(vec![add_col("foo,bar", true)], "invalid character")]
-+    #[case::nested_invalid_parquet_char(
-+        vec![add_struct_with_nested_leaf("addr", "bad,leaf")],
-+        "invalid character"
-+    )]
-+    #[case::metadata_column(
-+        vec![SchemaOperation::AddColumn {
-+            field: StructField::create_metadata_column("row_idx", MetadataColumnSpec::RowIndex),
-+        }],
-+        "metadata columns are not allowed"
-+    )]
-+    fn apply_schema_operations_rejects(
-+        #[case] ops: Vec<SchemaOperation>,
-+        #[case] error_contains: &str,
-+    ) {
-+        let err =
-+            apply_schema_operations(simple_schema(), ops, ColumnMappingMode::None).unwrap_err();
-+        assert!(err.to_string().contains(error_contains));
-+    }
-+
-+    #[rstest]
-+    #[case::single(vec![add_col("email", true)], &["id", "name", "email"])]
-+    #[case::multiple(
-+        vec![add_col("email", true), add_col("age", true)],
-+        &["id", "name", "email", "age"]
-+    )]
-+    fn apply_schema_operations_succeeds(
-+        #[case] ops: Vec<SchemaOperation>,
-+        #[case] expected_names: &[&str],
-+    ) {
-+        let result =
-+            apply_schema_operations(simple_schema(), ops, ColumnMappingMode::None).unwrap();
-+        let actual: Vec<&str> = result.schema.fields().map(|f| f.name().as_str()).collect();
-+        assert_eq!(&actual, expected_names);
-+    }
-+}
\ No newline at end of file
kernel/tests/README.md
@@ -1,31 +0,0 @@
-diff --git a/kernel/tests/README.md b/kernel/tests/README.md
---- a/kernel/tests/README.md
-+++ b/kernel/tests/README.md
- 
- | Table | Location | Schema | Protocol (R/W) | Features | Description | Tests |
- |-------|----------|--------|----------|----------|-------------|-------|
--| `table-with-dv-small` | data/ | `value: int` | v3/v7 | r:`deletionVectors` w:`deletionVectors` | 10 rows, 2 soft-deleted by DV, 8 visible. Most heavily referenced test table. | `dv.rs::test_table_scan(with_dv)`, `write.rs::test_remove_files_adds_expected_entries`, `write.rs::test_update_deletion_vectors_adds_expected_entries`, `read.rs::with_predicate_and_removes`, `path.rs::test_to_uri/test_child/test_child_escapes`, `snapshot.rs::test_snapshot_read_metadata/test_new_snapshot/test_snapshot_new_from/test_read_table_with_missing_last_checkpoint/test_log_compaction_writer`, `deletion_vector.rs` tests, `transaction/mod.rs::setup_dv_enabled_table/test_add_files_schema/test_new_deletion_vector_path`, `default/parquet.rs` read test, `default/json.rs` read test, `log_compaction/tests.rs::create_mock_snapshot`, `resolve_dvs.rs` tests |
-+| `table-with-dv-small` | data/ | `value: int` | v3/v7 | r:`deletionVectors` w:`deletionVectors` | 10 rows, 2 soft-deleted by DV, 8 visible. Most heavily referenced test table. | `dv.rs::test_table_scan(with_dv)`, `write_remove_dv.rs::test_remove_files_adds_expected_entries`, `write_remove_dv.rs::test_update_deletion_vectors_adds_expected_entries`, `read.rs::with_predicate_and_removes`, `path.rs::test_to_uri/test_child/test_child_escapes`, `snapshot.rs::test_snapshot_read_metadata/test_new_snapshot/test_snapshot_new_from/test_read_table_with_missing_last_checkpoint/test_log_compaction_writer`, `deletion_vector.rs` tests, `transaction/mod.rs::setup_dv_enabled_table/test_add_files_schema/test_new_deletion_vector_path`, `default/parquet.rs` read test, `default/json.rs` read test, `log_compaction/tests.rs::create_mock_snapshot`, `resolve_dvs.rs` tests |
- | `table-without-dv-small` | data/ | `value: long` | v1/v2 | | 10 rows, all visible. Companion to table-with-dv-small. | `dv.rs::test_table_scan(without_dv)`, `transaction/mod.rs::setup_non_dv_table/create_existing_table_txn/test_commit_io_error_returns_retryable_transaction`, `sequential_phase.rs::test_sequential_v2_with_commits_only/test_sequential_finish_before_exhaustion_error`, `parallel_phase.rs` tests, `scan/tests.rs::test_scan_metadata_paths/test_scan_metadata/test_scan_metadata_from_same_version` |
- | `with-short-dv` | data/ | `id: long, value: string, timestamp: timestamp, rand: double` | v3/v7 | r:`deletionVectors` w:`deletionVectors` | 2 files x 5 rows. First file has inline DV (`storageType="u"`) deleting 3 rows. | `read.rs::short_dv` |
- | `dv-partitioned-with-checkpoint` | golden_data/ | `value: int, part: int` partitioned by `part` | v3/v7 | r:`deletionVectors` w:`deletionVectors` | DVs on a partitioned table with a checkpoint | `golden_tables.rs::golden_test!` |
- 
- | Table | Location | Schema | Protocol (R/W) | Features | Description | Tests |
- |-------|----------|--------|----------|----------|-------------|-------|
--| `partition_cm/none` | data/ | `value: int, category: string` partitioned by `category` | v1/v1 | `columnMapping.mode=none` | Partitioned write with CM disabled | `write.rs::test_column_mapping_partitioned_write(cm_none)` |
--| `partition_cm/id` | data/ | `value: int, category: string` partitioned by `category` | v3/v7 | r:`columnMapping` w:`columnMapping`, `columnMapping.mode=id` | Partitioned write with CM id mode | `write.rs::test_column_mapping_partitioned_write(cm_id)` |
--| `partition_cm/name` | data/ | `value: int, category: string` partitioned by `category` | v3/v7 | r:`columnMapping` w:`columnMapping`, `columnMapping.mode=name` | Partitioned write with CM name mode | `write.rs::test_column_mapping_partitioned_write(cm_name)` |
-+| `partition_cm/none` | data/ | `value: int, category: string` partitioned by `category` | v1/v1 | `columnMapping.mode=none` | Partitioned write with CM disabled | `write_column_mapping.rs::test_column_mapping_partitioned_write(cm_none)` |
-+| `partition_cm/id` | data/ | `value: int, category: string` partitioned by `category` | v3/v7 | r:`columnMapping` w:`columnMapping`, `columnMapping.mode=id` | Partitioned write with CM id mode | `write_column_mapping.rs::test_column_mapping_partitioned_write(cm_id)` |
-+| `partition_cm/name` | data/ | `value: int, category: string` partitioned by `category` | v3/v7 | r:`columnMapping` w:`columnMapping`, `columnMapping.mode=name` | Partitioned write with CM name mode | `write_column_mapping.rs::test_column_mapping_partitioned_write(cm_name)` |
- | `table-with-columnmapping-mode-name` | golden_data/ | `ByteType: byte, ShortType: short, IntegerType: int, LongType: long, FloatType: float, DoubleType: double, decimal: decimal(10,2), BooleanType: boolean, StringType: string, BinaryType: binary, DateType: date, TimestampType: timestamp, nested_struct: struct{aa: string, ac: struct{aca: int}}, array_of_prims: array<int>, array_of_arrays: array<array<int>>, array_of_structs: array<struct{ab: long}>, map_of_prims: map<int,long>, map_of_rows: map<int,struct{ab: long}>, map_of_arrays: map<long,array<int>>` | v2/v5 | `columnMapping.mode=name` | Column mapping name mode | `golden_tables.rs::golden_test!` |
- | `table-with-columnmapping-mode-id` | golden_data/ | `ByteType: byte, ShortType: short, IntegerType: int, LongType: long, FloatType: float, DoubleType: double, decimal: decimal(10,2), BooleanType: boolean, StringType: string, BinaryType: binary, DateType: date, TimestampType: timestamp, nested_struct: struct{aa: string, ac: struct{aca: int}}, array_of_prims: array<int>, array_of_arrays: array<array<int>>, array_of_structs: array<struct{ab: long}>, map_of_prims: map<int,long>, map_of_rows: map<int,struct{ab: long}>, map_of_arrays: map<long,array<int>>` | v2/v5 | `columnMapping.mode=id` | Column mapping id mode | `golden_tables.rs::golden_test!` |
- 
- | Table | Location | Schema | Protocol (R/W) | Features | Description | Tests |
- |-------|----------|--------|----------|----------|-------------|-------|
- | `with_checkpoint_no_last_checkpoint` | data/ | `letter: string, int: long, date: date` | v1/v2 | `checkpointInterval=2` | Checkpoint at v2 but missing `_last_checkpoint` hint file | `snapshot.rs::test_read_table_with_checkpoint`, `scan/tests.rs::test_scan_with_checkpoint`, `sequential_phase.rs::test_sequential_checkpoint_no_commits`, `checkpoint_manifest.rs` tests, `sync/parquet.rs` test, `default/parquet.rs` test |
--| `external-table-different-nullability` | data/ | `i: int` | v1/v2 | `checkpointInterval=2` | Parquet files have different nullability than Delta schema; includes checkpoint | `write.rs::test_checkpoint_non_kernel_written_table` |
-+| `external-table-different-nullability` | data/ | `i: int` | v1/v2 | `checkpointInterval=2` | Parquet files have different nullability than Delta schema; includes checkpoint | `write_clustered.rs::test_checkpoint_non_kernel_written_table` |
- | `checkpoint` | golden_data/ | `intCol: int` | v1/v2 | | Basic checkpoint read | `golden_tables.rs::golden_test!(checkpoint_test)` |
- | `corrupted-last-checkpoint-kernel` | golden_data/ | `id: long` | v1/v2 | | Corrupted `_last_checkpoint` file | `golden_tables.rs::golden_test!` |
- | `multi-part-checkpoint` | golden_data/ | `id: long` | v1/v2 | `checkpointInterval=1` | Multi-part checkpoint files | `golden_tables.rs::golden_test!` |
\ No newline at end of file

... (truncated, output exceeded 60000 bytes)

Reproduce locally: git range-diff ac9dc19..aac9b0a 6486bd2..97281f0 | Disable: git config gitstack.push-range-diff false

Comment thread kernel/src/schema/mod.rs
"geography" => Ok(PrimitiveType::Geography(Box::default())),
geo_str if geo_str.starts_with("geography(") && geo_str.ends_with(')') => {
let inner = &geo_str[10..geo_str.len() - 1];
// Three accepted shapes:
Collaborator Author

@lorenarosati lorenarosati Apr 29, 2026

The Java kernel deserializes the same cases:

  • geometry
  • geography
  • geometry with an SRID in brackets
  • geography with SRID and alg in brackets
  • geography with SRID in brackets
  • geography with alg in brackets (differentiated from the SRID-only case by a regex pattern that checks for a colon)

See https://github.com/delta-io/delta/blob/233842fbd93521703a66dd636dc12325fa0f5513/kernel/kernel-api/src/main/java/io/delta/kernel/internal/types/DataTypeJsonSerDe.java#L442
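
For reference, a minimal Rust sketch of that dispatch over the accepted shapes; it is illustrative only, using simplified stand-in types and the defaults named elsewhere in this PR (`OGC:CRS84` SRID, `spherical` algorithm) rather than the kernel's actual `GeometryType`/`GeographyType`:

    // Illustrative stand-ins; the real kernel types carry more validation.
    #[derive(Debug, PartialEq)]
    enum GeoType {
        Geometry { srid: String },
        Geography { srid: String, algorithm: String },
    }

    const DEFAULT_GEO_SRID: &str = "OGC:CRS84";
    const DEFAULT_ALGORITHM: &str = "spherical";

    fn parse_geo(s: &str) -> Option<GeoType> {
        match s {
            // Bare names fall back to all defaults.
            "geometry" => return Some(GeoType::Geometry { srid: DEFAULT_GEO_SRID.into() }),
            "geography" => {
                return Some(GeoType::Geography {
                    srid: DEFAULT_GEO_SRID.into(),
                    algorithm: DEFAULT_ALGORITHM.into(),
                })
            }
            _ => {}
        }
        if let Some(inner) = s.strip_prefix("geometry(").and_then(|r| r.strip_suffix(')')) {
            // geometry with an SRID in brackets
            return Some(GeoType::Geometry { srid: inner.trim().into() });
        }
        let inner = s.strip_prefix("geography(").and_then(|r| r.strip_suffix(')'))?;
        Some(match inner.split_once(',') {
            // geography(srid, alg)
            Some((srid, alg)) => GeoType::Geography {
                srid: srid.trim().into(),
                algorithm: alg.trim().into(),
            },
            // Single token: a colon marks an SRID (e.g. EPSG:4326); otherwise it names
            // an algorithm (e.g. vincenty), matching the Java kernel's colon check.
            None if inner.contains(':') => GeoType::Geography {
                srid: inner.trim().into(),
                algorithm: DEFAULT_ALGORITHM.into(),
            },
            None => GeoType::Geography {
                srid: DEFAULT_GEO_SRID.into(),
                algorithm: inner.trim().into(),
            },
        })
    }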

Collaborator

I wonder what's the case for

geography with SRID in brackets
geography with alg in brackets (differentiated from just SRID case by a regex pattern that checks for a colon)

As we only serialize to geography(srid, alg). Is java-kernel trying to be compatible with some other impl? If so, a comment would be very helpful!

Collaborator Author

Good callout! I can leave this as a TODO to look into but this should be non-blocking for this PR

Collaborator

Let's add a short comment stating that this follows the convention from kernel-java, so that if we revisit it in the future we know why.

Collaborator

@dengsh12 dengsh12 left a comment

Looks good! Several comments/questions. Also, could we add integration tests to validate the write-and-read round trip?

Comment thread ffi/src/expressions/kernel_visitor.rs
Comment thread kernel/src/expressions/scalars.rs Outdated
_ => unreachable!(),
}
}
// Geometry/Geography are not valid partition column types, so there is no
Collaborator

Wonder if we want to detect this when we

  1. Create a transaction
  2. Create a table
    ?
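
If kernel does want to reject this eagerly, the check could live alongside the existing create-table schema validation. A purely hypothetical helper (none of these names are in the PR) might look like:

    // Hypothetical, not part of this PR: reject geospatial partition columns up front.
    fn ensure_no_geo_partition_columns(
        schema: &StructType,
        partition_columns: &[String],
    ) -> DeltaResult<()> {
        for field in schema.fields() {
            let is_partition = partition_columns
                .iter()
                .any(|p| p.eq_ignore_ascii_case(field.name()));
            let is_geo = matches!(
                &field.data_type,
                DataType::Primitive(PrimitiveType::Geometry(_) | PrimitiveType::Geography(_))
            );
            if is_partition && is_geo {
                return Err(Error::schema(format!(
                    "Partition column '{}' cannot have a geospatial type",
                    field.name()
                )));
            }
        }
        Ok(())
    }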

Comment thread kernel/src/expressions/scalars.rs Outdated
Comment thread kernel/src/schema/mod.rs
Comment thread kernel/src/schema/mod.rs
"geography" => Ok(PrimitiveType::Geography(Box::default())),
geo_str if geo_str.starts_with("geography(") && geo_str.ends_with(')') => {
let inner = &geo_str[10..geo_str.len() - 1];
// Three accepted shapes:
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder what's the case for

geography with SRID in brackets
geography with alg in brackets (differentiated from just SRID case by a regex pattern that checks for a colon)

As we only serialize to geography(srid, alg). Is java-kernel trying to be compatible with some other impl? If so, a comment would be very helpful!

Comment thread kernel/src/schema/mod.rs Outdated
Comment thread kernel/src/schema/mod.rs
Comment on lines +2452 to +2500
#[test]
fn test_roundtrip_geometry() {
    let data = r#"
    {
        "name": "g",
        "type": "geometry(EPSG:4326)",
        "nullable": true,
        "metadata": {}
    }
    "#;
    let field: StructField = serde_json::from_str(data).unwrap();
    assert_eq!(
        field.data_type,
        DataType::Primitive(PrimitiveType::Geometry(Box::new(
            GeometryType::try_new("EPSG:4326").unwrap()
        )))
    );

    let json_str = serde_json::to_string(&field).unwrap();
    assert_eq!(
        json_str,
        r#"{"name":"g","type":"geometry(EPSG:4326)","nullable":true,"metadata":{}}"#
    );
}

#[test]
fn test_roundtrip_geography() {
    let data = r#"
    {
        "name": "g",
        "type": "geography(EPSG:4326, vincenty)",
        "nullable": true,
        "metadata": {}
    }
    "#;
    let field: StructField = serde_json::from_str(data).unwrap();
    assert_eq!(
        field.data_type,
        DataType::Primitive(PrimitiveType::Geography(Box::new(
            GeographyType::try_new("EPSG:4326", EdgeInterpolationAlgorithm::Vincenty).unwrap()
        )))
    );

    let json_str = serde_json::to_string(&field).unwrap();
    assert_eq!(
        json_str,
        r#"{"name":"g","type":"geography(EPSG:4326, vincenty)","nullable":true,"metadata":{}}"#
    );
}
Collaborator

NIT: Seems these two can be merged into test_geo_deserialize_defaults? We can rename test_geo_deserialize_defaults to test_geo_deserialize_succeed and assert the exact deserialized value for all cases

Collaborator

Agreed. We can check round trip there
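
A sketch of what the merged, parameterized roundtrip test could look like (assuming rstest as in the existing geo cases; the canonical output for the SRID-only input is an assumption based on the serialization discussion above):

    #[rstest]
    #[case("geometry(EPSG:4326)", "geometry(EPSG:4326)")]
    #[case("geography(EPSG:4326, vincenty)", "geography(EPSG:4326, vincenty)")]
    // Assumed: the SRID-only input serializes back with the default algorithm filled in,
    // since kernel only emits the full `geography(srid, alg)` form.
    #[case("geography(EPSG:4326)", "geography(EPSG:4326, spherical)")]
    fn test_geo_roundtrip(#[case] input: &str, #[case] canonical: &str) {
        let data =
            format!(r#"{{"name":"g","type":"{input}","nullable":true,"metadata":{{}}}}"#);
        let field: StructField = serde_json::from_str(&data).unwrap();
        let expected =
            format!(r#"{{"name":"g","type":"{canonical}","nullable":true,"metadata":{{}}}}"#);
        assert_eq!(serde_json::to_string(&field).unwrap(), expected);
    }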

Comment thread kernel/src/transaction/stats_verifier.rs
@lorenarosati
Collaborator Author

Range-diff: main (97281f0 -> 7546790)
kernel/src/schema/mod.rs
@@ -52,12 +52,17 @@
 +}
 +
 +impl GeometryType {
-+    pub fn try_new(srid: impl Into<String>) -> DeltaResult<Self> {
-+        let srid = srid.into();
++    /// Constructs a [`GeometryType`] from the given SRID. Use [`GeometryType::default`] to
++    /// build with [`DEFAULT_GEO_SRID`] (`OGC:CRS84`).
++    ///
++    /// Returns `Err` if `srid` is empty.
++    pub fn try_new(srid: &str) -> DeltaResult<Self> {
 +        if srid.is_empty() {
 +            return Err(Error::invalid_geometry("SRID cannot be empty"));
 +        }
-+        Ok(Self { srid })
++        Ok(Self {
++            srid: srid.to_string(),
++        })
 +    }
 +
 +    pub fn srid(&self) -> &str {
@@ -87,23 +92,26 @@
 +}
 +
 +impl GeographyType {
++    /// Constructs a GeographyType. Pass `None` for either argument to use the default:
++    /// SRID defaults to DEFAULT_GEO_SRID (`OGC:CRS84`); algorithm defaults to
++    /// EdgeInterpolationAlgorithm::Spherical.
++    ///
++    /// Returns `Err` if `srid` is `Some("")` (empty string is not a valid SRID).
 +    pub fn try_new(
-+        srid: impl Into<String>,
-+        algorithm: EdgeInterpolationAlgorithm,
++        srid: Option<&str>,
++        algorithm: Option<EdgeInterpolationAlgorithm>,
 +    ) -> DeltaResult<Self> {
-+        let srid = srid.into();
-+        if srid.is_empty() {
-+            return Err(Error::invalid_geography("SRID cannot be empty"));
-+        }
++        let srid = match srid {
++            None => DEFAULT_GEO_SRID.to_string(),
++            Some(s) => {
++                if s.is_empty() {
++                    return Err(Error::invalid_geography("SRID cannot be empty"));
++                }
++                s.to_string()
++            }
++        };
++        let algorithm = algorithm.unwrap_or(EdgeInterpolationAlgorithm::Spherical);
 +        Ok(Self { srid, algorithm })
-+    }
-+
-+    pub fn try_new_with_srid(srid: impl Into<String>) -> DeltaResult<Self> {
-+        Self::try_new(srid, EdgeInterpolationAlgorithm::Spherical)
-+    }
-+
-+    pub fn try_new_with_algorithm(algorithm: EdgeInterpolationAlgorithm) -> DeltaResult<Self> {
-+        Self::try_new(DEFAULT_GEO_SRID, algorithm)
 +    }
 +
 +    pub fn srid(&self) -> &str {
@@ -187,7 +195,7 @@
 +                        let algo_str = inner[pos + 1..].trim();
 +                        let algorithm: EdgeInterpolationAlgorithm =
 +                            algo_str.parse().map_err(serde::de::Error::custom)?;
-+                        GeographyType::try_new(srid, algorithm)
++                        GeographyType::try_new(Some(srid), Some(algorithm))
 +                            .map(Box::new)
 +                            .map(PrimitiveType::Geography)
 +                            .map_err(serde::de::Error::custom)
@@ -195,14 +203,14 @@
 +                    None => {
 +                        let trimmed = inner.trim();
 +                        if trimmed.contains(':') {
-+                            GeographyType::try_new_with_srid(trimmed)
++                            GeographyType::try_new(Some(trimmed), None)
 +                                .map(Box::new)
 +                                .map(PrimitiveType::Geography)
 +                                .map_err(serde::de::Error::custom)
 +                        } else {
 +                            let algorithm: EdgeInterpolationAlgorithm =
 +                                trimmed.parse().map_err(serde::de::Error::custom)?;
-+                            GeographyType::try_new_with_algorithm(algorithm)
++                            GeographyType::try_new(None, Some(algorithm))
 +                                .map(Box::new)
 +                                .map(PrimitiveType::Geography)
 +                                .map_err(serde::de::Error::custom)
@@ -289,7 +297,11 @@
 +        assert_eq!(
 +            field.data_type,
 +            DataType::Primitive(PrimitiveType::Geography(Box::new(
-+                GeographyType::try_new("EPSG:4326", EdgeInterpolationAlgorithm::Vincenty).unwrap()
++                GeographyType::try_new(
++                    Some("EPSG:4326"),
++                    Some(EdgeInterpolationAlgorithm::Vincenty)
++                )
++                .unwrap()
 +            )))
 +        );
 +
@@ -315,18 +327,21 @@
 +    )]
 +    #[case(
 +        "geography(EPSG:4326)",
-+        PrimitiveType::Geography(Box::new(GeographyType::try_new_with_srid("EPSG:4326").unwrap()))
++        PrimitiveType::Geography(Box::new(
++            GeographyType::try_new(Some("EPSG:4326"), None).unwrap()
++        ))
 +    )]
 +    #[case(
 +        "geography(EPSG:4326, vincenty)",
 +        PrimitiveType::Geography(Box::new(
-+            GeographyType::try_new("EPSG:4326", EdgeInterpolationAlgorithm::Vincenty).unwrap()
++            GeographyType::try_new(Some("EPSG:4326"), Some(EdgeInterpolationAlgorithm::Vincenty))
++                .unwrap()
 +        ))
 +    )]
 +    #[case(
 +        "geography(vincenty)",
 +        PrimitiveType::Geography(Box::new(
-+            GeographyType::try_new_with_algorithm(EdgeInterpolationAlgorithm::Vincenty).unwrap()
++            GeographyType::try_new(None, Some(EdgeInterpolationAlgorithm::Vincenty)).unwrap()
 +        ))
 +    )]
 +    fn test_geo_deserialize_defaults(#[case] type_str: &str, #[case] expected: PrimitiveType) {
@@ -341,6 +356,7 @@
 +        "Unknown edge interpolation algorithm"
 +    )]
 +    #[case("geography(EPSG:4326,)", "Unknown edge interpolation algorithm")]
++    #[case("geography(unknown_algo)", "Unknown edge interpolation algorithm")]
 +    #[case("geometry(EPSG:4326", "Unsupported Delta table type")]
 +    #[case("geographyz", "Unsupported Delta table type")]
 +    #[case("geometry()", "SRID cannot be empty")]
kernel/src/table_configuration.rs
@@ -9,38 +9,6 @@
      validate_timestamp_ntz_feature_support, ColumnMappingMode, EnablementCheck, FeatureRequirement,
      FeatureType, KernelSupport, Operation, TableFeature, LEGACY_READER_FEATURES,
      LEGACY_WRITER_FEATURES, MAX_VALID_READER_VERSION, MAX_VALID_WRITER_VERSION,
-         version: Version,
-     ) -> DeltaResult<Self> {
-         let logical_schema = Arc::new(metadata.parse_schema()?);
-+        Self::try_new_inner(metadata, protocol, table_root, version, logical_schema)
-+    }
-+
-+    /// Like [`try_new`](Self::try_new), but reuses `base`'s protocol, table root, and version
-+    /// and takes a pre-parsed `logical_schema`.
-+    pub(crate) fn try_new_with_schema(
-+        base: &Self,
-+        metadata: Metadata,
-+        logical_schema: SchemaRef,
-+    ) -> DeltaResult<Self> {
-+        Self::try_new_inner(
-+            metadata,
-+            base.protocol.clone(),
-+            base.table_root.clone(),
-+            base.version,
-+            logical_schema,
-+        )
-+    }
-+
-+    fn try_new_inner(
-+        metadata: Metadata,
-+        protocol: Protocol,
-+        table_root: Url,
-+        version: Version,
-+        logical_schema: SchemaRef,
-+    ) -> DeltaResult<Self> {
-         let table_properties = metadata.parse_table_properties();
-         let column_mapping_mode = column_mapping_mode(&protocol, &table_properties);
- 
  
          // Validate schema against protocol features now that we have a TC instance.
          validate_timestamp_ntz_feature_support(&table_config)?;
kernel/src/actions/mod.rs
@@ -1,32 +0,0 @@
-diff --git a/kernel/src/actions/mod.rs b/kernel/src/actions/mod.rs
---- a/kernel/src/actions/mod.rs
-+++ b/kernel/src/actions/mod.rs
- }
- 
- // Serde derives are needed for CRC file deserialization (see `crc::reader`).
-+//
-+// TODO(#2446): `Metadata` stores the schema only as a JSON string. Callers that already hold
-+// a parsed `SchemaRef` (e.g. CREATE TABLE) serialize into `schema_string` and then re-parse
-+// downstream in `TableConfiguration::try_new` via `parse_schema()`. Caching the parsed schema
-+// on `Metadata` would eliminate the round-trip.
- #[derive(Debug, Default, Clone, PartialEq, Eq, Serialize, Deserialize, ToSchema)]
- #[serde(rename_all = "camelCase")]
- #[internal_api]
-         TableProperties::from(self.configuration.iter())
-     }
- 
-+    /// Returns a new Metadata with the schema replaced, preserving all other fields.
-+    ///
-+    /// # Errors
-+    ///
-+    /// Returns an error if schema serialization fails.
-+    pub(crate) fn with_schema(self, schema: SchemaRef) -> DeltaResult<Self> {
-+        Ok(Self {
-+            schema_string: serde_json::to_string(&schema)?,
-+            ..self
-+        })
-+    }
-+
-     #[cfg(test)]
-     #[allow(clippy::too_many_arguments)]
-     pub(crate) fn new_unchecked(
\ No newline at end of file
kernel/src/engine/arrow_expression/evaluate_expression.rs
@@ -1,154 +0,0 @@
-diff --git a/kernel/src/engine/arrow_expression/evaluate_expression.rs b/kernel/src/engine/arrow_expression/evaluate_expression.rs
---- a/kernel/src/engine/arrow_expression/evaluate_expression.rs
-+++ b/kernel/src/engine/arrow_expression/evaluate_expression.rs
-         (Literal(scalar), _) => {
-             validate_array_type(scalar.to_array(batch.num_rows())?, result_type)
-         }
--        (Column(name), _) => {
--            // Column extraction uses ordinal-based struct validation because column mapping
--            // can cause physical/logical name mismatches. apply_schema handles renaming.
--            let arr = extract_column(batch, name)?;
--            if let Some(expected) = result_type {
--                ensure_data_types(expected, arr.data_type(), ValidationMode::TypesOnly)?;
--            }
--            Ok(arr)
--        }
-+        (Column(name), _) => validate_array_type(extract_column(batch, name)?, result_type),
-         (Struct(fields, nullability), Some(DataType::Struct(output_schema))) => {
-             evaluate_struct_expression(fields, batch, output_schema, nullability.as_ref())
-         }
-     }
- 
-     #[test]
--    fn column_extract_struct_with_mismatched_field_names() {
-+    fn column_extract_struct_rejects_mismatched_field_names() {
-         let batch = make_struct_batch(
-             vec![
-                 ArrowField::new("col-abc-001", ArrowDataType::Int64, true),
-             ],
-         );
- 
--        // Logical names differ from physical names due to column mapping
-         let logical_type = DataType::try_struct_type([
-             StructField::nullable("my_column", DataType::LONG),
-             StructField::nullable("other_column", DataType::LONG),
- 
-         let expr = column_expr!("stats");
-         let result = evaluate_expression(&expr, &batch, Some(&logical_type));
--
--        // Ordinal-based validation passes: same field count and types by position.
--        // The downstream apply_schema transformation handles renaming.
--        let arr = result.expect("should succeed with mismatched names but matching types");
--        let struct_arr = arr.as_any().downcast_ref::<StructArray>().unwrap();
--        assert_eq!(struct_arr.num_columns(), 2);
--        assert_eq!(struct_arr.len(), 2);
--    }
--
--    #[test]
--    fn column_extract_struct_rejects_mismatched_field_count() {
--        let batch = make_struct_batch(
--            vec![ArrowField::new("col-abc-001", ArrowDataType::Int64, true)],
--            vec![Arc::new(Int64Array::from(vec![Some(1), Some(2)]))],
--        );
--
--        let logical_type = DataType::try_struct_type([
--            StructField::nullable("a", DataType::LONG),
--            StructField::nullable("b", DataType::LONG),
--        ])
--        .unwrap();
--
--        let expr = column_expr!("stats");
--        let result = evaluate_expression(&expr, &batch, Some(&logical_type));
--        assert_result_error_with_message(result, "Struct field count mismatch");
-+        assert_result_error_with_message(result, "Missing Struct fields");
-     }
- 
-     #[test]
-     fn column_extract_struct_rejects_mismatched_child_types() {
-         let batch = make_struct_batch(
-             vec![
--                ArrowField::new("col-abc-001", ArrowDataType::Int64, true),
--                ArrowField::new("col-abc-002", ArrowDataType::Utf8, true),
-+                ArrowField::new("a", ArrowDataType::Int64, true),
-+                ArrowField::new("b", ArrowDataType::Utf8, true),
-             ],
-             vec![
-                 Arc::new(Int64Array::from(vec![Some(1)])),
-             ],
-         );
- 
--        // Expect two LONG columns, but the second arrow field is Utf8
-         let logical_type = DataType::try_struct_type([
-             StructField::nullable("a", DataType::LONG),
-             StructField::nullable("b", DataType::LONG),
-     }
- 
-     #[test]
--    fn column_extract_struct_with_matching_names_still_works() {
-+    fn column_extract_struct_with_matching_names_works() {
-         let batch = make_struct_batch(
-             vec![
-                 ArrowField::new("a", ArrowDataType::Int64, true),
-         assert!(result.is_ok());
-     }
- 
--    /// Exercises the exact code path from `get_add_transform_expr` where a `struct_from`
--    /// expression wraps `column_expr!("add.stats_parsed")`. When the checkpoint parquet has
--    /// stats_parsed with physical column names (e.g. `col-abc-001`) but the output schema
--    /// uses logical names (e.g. `id`), `evaluate_struct_expression` calls
--    /// `evaluate_expression(Column, struct_result_type)` with mismatched field names.
--    /// Without ordinal-based validation this fails with a name mismatch error.
-+    /// When a `struct_from` expression wraps a `Column` referencing stats_parsed, and the
-+    /// checkpoint parquet has physical column names (e.g. `col-abc-001`) but the output schema
-+    /// uses logical names (e.g. `id`), name-based validation correctly rejects the mismatch.
-     #[test]
--    fn struct_from_with_column_tolerates_nested_name_mismatch() {
--        // Build a batch mimicking checkpoint data: add.stats_parsed uses physical names
-+    fn struct_from_with_column_rejects_nested_name_mismatch() {
-         let stats_fields: Vec<ArrowField> = vec![
-             ArrowField::new("col-abc-001", ArrowDataType::Int64, true),
-             ArrowField::new("col-abc-002", ArrowDataType::Int64, true),
-         )]);
-         let batch = RecordBatch::try_new(Arc::new(schema), vec![Arc::new(add_struct)]).unwrap();
- 
--        // struct_from mimicking get_add_transform_expr: wraps a Column referencing stats_parsed
-         let expr = Expr::struct_from([
-             column_expr_ref!("add.path"),
-             column_expr_ref!("add.stats_parsed"),
-         .unwrap();
- 
-         let result = evaluate_expression(&expr, &batch, Some(&output_type));
--        result.expect("struct_from with Column sub-expression should tolerate field name mismatch");
--    }
--
--    #[test]
--    fn column_extract_nested_struct_with_mismatched_names() {
--        let inner_fields = vec![ArrowField::new("phys-inner", ArrowDataType::Int64, true)];
--        let inner_struct = ArrowDataType::Struct(inner_fields.clone().into());
--        let batch = make_struct_batch(
--            vec![ArrowField::new("phys-outer", inner_struct, true)],
--            vec![Arc::new(
--                StructArray::try_new(
--                    inner_fields.into(),
--                    vec![Arc::new(Int64Array::from(vec![Some(42)]))],
--                    None,
--                )
--                .unwrap(),
--            )],
--        );
--
--        let logical_type = DataType::try_struct_type([StructField::nullable(
--            "logical_outer",
--            DataType::struct_type_unchecked([StructField::nullable(
--                "logical_inner",
--                DataType::LONG,
--            )]),
--        )])
--        .unwrap();
--
--        let expr = column_expr!("stats");
--        let result = evaluate_expression(&expr, &batch, Some(&logical_type));
--        assert!(result.is_ok());
-+        assert_result_error_with_message(result, "Missing Struct fields");
-     }
- }
\ No newline at end of file
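
The updated tests above expect name-based struct validation: extracting a `Column` now fails when the requested logical field names are missing from the physical struct. A minimal, self-contained sketch of that matching rule follows; the helper name and error text are illustrative, not the kernel's `ensure_data_types` API.

```rust
// Toy illustration of name-based struct field matching: every requested
// logical field name must exist in the physical struct, otherwise validation
// fails with a "Missing Struct fields"-style error. Hypothetical helper.
fn check_struct_fields(physical: &[&str], logical: &[&str]) -> Result<(), String> {
    let missing: Vec<&str> = logical
        .iter()
        .copied()
        .filter(|name| !physical.contains(name))
        .collect();
    if missing.is_empty() {
        Ok(())
    } else {
        Err(format!("Missing Struct fields: {missing:?}"))
    }
}

fn main() {
    // Physical column-mapping names (e.g. "col-abc-001") no longer satisfy a
    // request for logical names, so the evaluator rejects the batch.
    assert!(check_struct_fields(
        &["col-abc-001", "col-abc-002"],
        &["my_column", "other_column"]
    )
    .is_err());
    assert!(check_struct_fields(&["a", "b"], &["a", "b"]).is_ok());
}
```
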
kernel/src/engine/ensure_data_types.rs
@@ -1,13 +0,0 @@
-diff --git a/kernel/src/engine/ensure_data_types.rs b/kernel/src/engine/ensure_data_types.rs
---- a/kernel/src/engine/ensure_data_types.rs
-+++ b/kernel/src/engine/ensure_data_types.rs
- #[internal_api]
- pub(crate) enum ValidationMode {
-     /// Check types only. Struct fields are matched by ordinal position, not by name.
--    /// Nullability and metadata are not checked. Used by the expression evaluator where
--    /// column mapping can cause physical/logical name mismatches.
-+    /// Nullability and metadata are not checked.
-+    #[allow(dead_code)]
-     TypesOnly,
-     /// Check types and match struct fields by name, but skip nullability and metadata.
-     /// Used by the parquet reader where fields are already resolved by name upstream.
\ No newline at end of file
kernel/src/schema/validation.rs
@@ -1,48 +0,0 @@
-diff --git a/kernel/src/schema/validation.rs b/kernel/src/schema/validation.rs
---- a/kernel/src/schema/validation.rs
-+++ b/kernel/src/schema/validation.rs
--//! Schema validation utilities for Delta table creation.
-+//! Schema validation utilities shared by table creation and schema evolution.
- //!
- //! Validates schemas per the Delta protocol specification.
- 
- /// These characters have special meaning in Parquet schema syntax.
- const INVALID_PARQUET_CHARS: &[char] = &[' ', ',', ';', '{', '}', '(', ')', '\n', '\t', '='];
- 
--/// Validates a schema for table creation.
-+/// Validates a schema for CREATE TABLE or ALTER TABLE.
- ///
- /// Performs the following checks:
- /// 1. Schema is non-empty
- /// 3. Column names contain only valid characters
- /// 4. Rejects fields with `delta.invariants` metadata (SQL expression invariants are not supported
- ///    by kernel; see `TableConfiguration::ensure_write_supported`)
--pub(crate) fn validate_schema_for_create(
-+pub(crate) fn validate_schema(
-     schema: &StructType,
-     column_mapping_mode: ColumnMappingMode,
- ) -> DeltaResult<()> {
-     #[case::dot_in_name_with_cm(schema_with_dot(), ColumnMappingMode::Name)]
-     #[case::different_struct_children(schema_different_struct_children(), ColumnMappingMode::None)]
-     fn valid_schema_accepted(#[case] schema: StructType, #[case] cm: ColumnMappingMode) {
--        assert!(validate_schema_for_create(&schema, cm).is_ok());
-+        assert!(validate_schema(&schema, cm).is_ok());
-     }
- 
-     // === Invalid schemas ===
-         #[case] cm: ColumnMappingMode,
-         #[case] expected_errs: &[&str],
-     ) {
--        let result = validate_schema_for_create(&schema, cm);
-+        let result = validate_schema(&schema, cm);
-         assert!(result.is_err());
-         let err = result.unwrap_err().to_string();
-         for expected in expected_errs {
-     #[case::array_nested(schema_array_nested_invariant(), "arr.child")]
-     #[case::map_nested(schema_map_nested_invariant(), "map.child")]
-     fn invariants_metadata_rejected(#[case] schema: StructType, #[case] expected_path: &str) {
--        let result = validate_schema_for_create(&schema, ColumnMappingMode::None);
-+        let result = validate_schema(&schema, ColumnMappingMode::None);
-         let err = result.expect_err("expected delta.invariants metadata rejection");
-         let msg = err.to_string();
-         assert!(
\ No newline at end of file
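
For reference, a minimal runnable sketch of check 3 ("column names contain only valid characters") using the same `INVALID_PARQUET_CHARS` set as the diff; `validate_column_name` is a hypothetical helper and deliberately ignores the other checks (non-empty schema, duplicates, column-mapping rules, invariants).

```rust
// Character check only; the character set matches the diff, but the helper is
// illustrative and not the kernel's validate_schema.
const INVALID_PARQUET_CHARS: &[char] = &[' ', ',', ';', '{', '}', '(', ')', '\n', '\t', '='];

fn validate_column_name(name: &str) -> Result<(), String> {
    if name.contains(INVALID_PARQUET_CHARS) {
        return Err(format!("Column name '{name}' contains an invalid character"));
    }
    Ok(())
}

fn main() {
    assert!(validate_column_name("id").is_ok());
    assert!(validate_column_name("foo,bar").is_err());
}
```
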
kernel/src/snapshot/mod.rs
@@ -1,27 +0,0 @@
-diff --git a/kernel/src/snapshot/mod.rs b/kernel/src/snapshot/mod.rs
---- a/kernel/src/snapshot/mod.rs
-+++ b/kernel/src/snapshot/mod.rs
- use crate::table_configuration::{InCommitTimestampEnablement, TableConfiguration};
- use crate::table_features::{physical_to_logical_column_name, ColumnMappingMode, TableFeature};
- use crate::table_properties::TableProperties;
-+use crate::transaction::builder::alter_table::AlterTableTransactionBuilder;
- use crate::transaction::Transaction;
- use crate::utils::require;
- use crate::{DeltaResult, Engine, Error, LogCompactionWriter, Version};
-         Transaction::try_new_existing_table(self, committer, engine)
-     }
- 
-+    /// Creates a builder for altering this table's metadata. Currently supports schema change
-+    /// operations.
-+    ///
-+    /// The returned builder allows chaining operations before building an
-+    /// [`AlterTableTransaction`] that can be committed.
-+    ///
-+    /// [`AlterTableTransaction`]: crate::transaction::AlterTableTransaction
-+    pub fn alter_table(self: Arc<Self>) -> AlterTableTransactionBuilder {
-+        AlterTableTransactionBuilder::new(self)
-+    }
-+
-     /// Fetch the latest version of the provided `application_id` for this snapshot. Filters the
-     /// txn based on the delta.setTransactionRetentionDuration property and lastUpdated.
-     ///
\ No newline at end of file
kernel/src/transaction/alter_table.rs
@@ -1,81 +0,0 @@
-diff --git a/kernel/src/transaction/alter_table.rs b/kernel/src/transaction/alter_table.rs
-new file mode 100644
---- /dev/null
-+++ b/kernel/src/transaction/alter_table.rs
-+//! Alter table transaction types and constructor.
-+//!
-+//! This module defines the [`AlterTableTransaction`] type alias and the
-+//! [`try_new_alter_table`](AlterTableTransaction::try_new_alter_table) constructor.
-+//! The builder logic lives in [`builder::alter_table`](super::builder::alter_table).
-+
-+#![allow(unreachable_pub)]
-+
-+use std::marker::PhantomData;
-+use std::sync::OnceLock;
-+
-+use crate::committer::Committer;
-+use crate::snapshot::SnapshotRef;
-+use crate::table_configuration::TableConfiguration;
-+use crate::transaction::{AlterTable, Transaction};
-+use crate::utils::current_time_ms;
-+use crate::DeltaResult;
-+
-+/// A type alias for alter-table transactions.
-+///
-+/// This provides a restricted API surface that only exposes operations valid during ALTER
-+/// commands. Data file operations are not available at compile time because `AlterTable`
-+/// does not implement [`SupportsDataFiles`](super::SupportsDataFiles).
-+pub type AlterTableTransaction = Transaction<AlterTable>;
-+
-+impl AlterTableTransaction {
-+    /// Create a new transaction for altering a table's schema. Produces a metadata-only commit
-+    /// that emits an updated Metadata action with the evolved schema.
-+    ///
-+    /// The `effective_table_config` is the evolved table configuration (new schema, same
-+    /// protocol). It must be fully validated before calling this constructor (e.g. schema
-+    /// operations applied, protocol feature checks passed). The `read_snapshot` provides the
-+    /// pre-commit table state (version, previous protocol/metadata, ICT timestamps) used for
-+    /// commit versioning and post-commit snapshots.
-+    ///
-+    /// This is typically called via `AlterTableTransactionBuilder::build()` rather than directly.
-+    pub(crate) fn try_new_alter_table(
-+        read_snapshot: SnapshotRef,
-+        effective_table_config: TableConfiguration,
-+        committer: Box<dyn Committer>,
-+    ) -> DeltaResult<Self> {
-+        let span = tracing::info_span!(
-+            "txn",
-+            path = %read_snapshot.table_root(),
-+            read_version = read_snapshot.version(),
-+            operation = "ALTER TABLE",
-+        );
-+
-+        Ok(Transaction {
-+            span,
-+            read_snapshot_opt: Some(read_snapshot),
-+            effective_table_config,
-+            should_emit_protocol: false,
-+            should_emit_metadata: true,
-+            committer,
-+            operation: Some("ALTER TABLE".to_string()),
-+            engine_info: None,
-+            add_files_metadata: vec![],
-+            remove_files_metadata: vec![],
-+            set_transactions: vec![],
-+            commit_timestamp: current_time_ms()?,
-+            user_domain_metadata_additions: vec![],
-+            system_domain_metadata_additions: vec![],
-+            user_domain_removals: vec![],
-+            data_change: false,
-+            shared_write_state: OnceLock::new(),
-+            engine_commit_info: None,
-+            // TODO(#2446): match delta-spark's per-op isBlindAppend policy
-+            // (ADD/DROP/DROP NOT NULL -> true, SET NOT NULL -> false). Hardcoded false for
-+            // now: safe, but misses the true-case optimization delta-spark applies.
-+            is_blind_append: false,
-+            dv_matched_files: vec![],
-+            physical_clustering_columns: None,
-+            _state: PhantomData,
-+        })
-+    }
-+}
\ No newline at end of file
kernel/src/transaction/builder/alter_table.rs
@@ -1,168 +0,0 @@
-diff --git a/kernel/src/transaction/builder/alter_table.rs b/kernel/src/transaction/builder/alter_table.rs
-new file mode 100644
---- /dev/null
-+++ b/kernel/src/transaction/builder/alter_table.rs
-+//! Builder for ALTER TABLE (schema evolution) transactions.
-+//!
-+//! This module contains [`AlterTableTransactionBuilder`], which uses a type-state pattern to
-+//! enforce valid operation chaining at compile time.
-+//!
-+//! # Type States
-+//!
-+//! - [`Ready`]: Initial state. Operations are available, but `build()` is not (at least one
-+//!   operation is required).
-+//! - [`Modifying`]: After any chainable schema operation. More ops can be chained, and `build()` is
-+//!   available. See [`AlterTableTransactionBuilder<Modifying>`] for ops.
-+//!
-+//! # Transitions
-+//!
-+//! Each `impl` block below is gated by a state bound and documents which operations that
-+//! state enables. Chainable schema operations live on `impl<S: Chainable>` and transition
-+//! the builder to a chainable state; `build()` lives on states that are buildable.
-+//!
-+//! ```ignore
-+//! // Allowed: at least one op queued before build().
-+//! snapshot.alter_table().add_column(field).build(engine, committer)?;
-+//!
-+//! // Not allowed: build() is not defined on Ready (no ops queued).
-+//! snapshot.alter_table().build(engine, committer)?;  // compile error
-+//! ```
-+
-+use std::marker::PhantomData;
-+use std::sync::Arc;
-+
-+use crate::committer::Committer;
-+use crate::schema::StructField;
-+use crate::snapshot::SnapshotRef;
-+use crate::table_configuration::TableConfiguration;
-+use crate::table_features::Operation;
-+use crate::transaction::alter_table::AlterTableTransaction;
-+use crate::transaction::schema_evolution::{
-+    apply_schema_operations, SchemaEvolutionResult, SchemaOperation,
-+};
-+use crate::{DeltaResult, Engine};
-+
-+/// Initial state: `build()` is not yet available (at least one operation is required).
-+/// See [`Chainable`] for the operations available on this state.
-+pub struct Ready;
-+
-+/// State after at least one operation has been added. `build()` is available.
-+/// See [`Chainable`] for the operations available on this state.
-+pub struct Modifying;
-+
-+/// Marker trait for builder states that accept chainable schema operations. Grouping states
-+/// under one bound lets each op (like `add_column`) live on a single `impl<S: Chainable>`
-+/// block -- chainable states share the body rather than duplicating it per state.
-+///
-+/// Sealed: external types cannot implement this, keeping the set of chainable states closed.
-+pub trait Chainable: sealed::Sealed {}
-+impl Chainable for Ready {}
-+impl Chainable for Modifying {}
-+
-+mod sealed {
-+    pub trait Sealed {}
-+    impl Sealed for super::Ready {}
-+    impl Sealed for super::Modifying {}
-+}
-+
-+/// Builder for constructing an [`AlterTableTransaction`] with schema evolution operations.
-+///
-+/// Uses a type-state pattern (`S`) to enforce at compile time:
-+/// - At least one schema operation must be queued before `build()` is callable.
-+/// - Only operations valid for the current state can be chained. This disallows incompatible
-+///   chaining.
-+pub struct AlterTableTransactionBuilder<S = Ready> {
-+    snapshot: SnapshotRef,
-+    operations: Vec<SchemaOperation>,
-+    // PhantomData marker for builder state (Ready or Modifying).
-+    // Zero-sized; only affects which methods are available at compile time.
-+    _state: PhantomData<S>,
-+}
-+
-+impl<S> AlterTableTransactionBuilder<S> {
-+    // Reconstructs the builder with a different PhantomData marker, changing which methods
-+    // are available at compile time (e.g. Ready -> Modifying enables `build()`). All real
-+    // fields are moved as-is; only the zero-sized type state changes.
-+    //
-+    // `T` (distinct from the struct's `S`) lets the caller pick the target state:
-+    // `self.transition::<Modifying>()` returns `AlterTableTransactionBuilder<Modifying>`.
-+    fn transition<T>(self) -> AlterTableTransactionBuilder<T> {
-+        AlterTableTransactionBuilder {
-+            snapshot: self.snapshot,
-+            operations: self.operations,
-+            _state: PhantomData,
-+        }
-+    }
-+}
-+
-+impl AlterTableTransactionBuilder<Ready> {
-+    /// Create a new builder from a snapshot.
-+    pub(crate) fn new(snapshot: SnapshotRef) -> Self {
-+        AlterTableTransactionBuilder {
-+            snapshot,
-+            operations: Vec::new(),
-+            _state: PhantomData,
-+        }
-+    }
-+}
-+
-+impl<S: Chainable> AlterTableTransactionBuilder<S> {
-+    /// Add a new top-level column to the table schema.
-+    ///
-+    /// The field must not already exist in the schema (case-insensitive). The field must be
-+    /// nullable because existing data files do not contain this column and will read NULL for it.
-+    /// These constraints are validated during [`build()`](AlterTableTransactionBuilder::build).
-+    pub fn add_column(mut self, field: StructField) -> AlterTableTransactionBuilder<Modifying> {
-+        self.operations.push(SchemaOperation::AddColumn { field });
-+        self.transition()
-+    }
-+}
-+
-+impl AlterTableTransactionBuilder<Modifying> {
-+    /// Validate and apply schema operations, then build the [`AlterTableTransaction`].
-+    ///
-+    /// This method:
-+    /// 1. Validates the table supports writes
-+    /// 2. Applies each operation sequentially against the evolving schema
-+    /// 3. Constructs a new Metadata action with the evolved schema
-+    /// 4. Builds the evolved table configuration
-+    /// 5. Creates the transaction
-+    ///
-+    /// # Errors
-+    ///
-+    /// - Any individual operation fails validation (see per-method errors above)
-+    /// - Table does not support writes (unsupported features)
-+    /// - The evolved schema requires protocol features not enabled on the table (e.g. adding a
-+    ///   `timestampNtz` column without the `timestampNtz` feature)
-+    pub fn build(
-+        self,
-+        _engine: &dyn Engine,
-+        committer: Box<dyn Committer>,
-+    ) -> DeltaResult<AlterTableTransaction> {
-+        let table_config = self.snapshot.table_configuration();
-+        // Rejects writes to tables kernel can't safely commit to: writer version out of
-+        // kernel's supported range, unsupported writer features, or schemas with SQL-expression
-+        // invariants. Runs on the pre-alter snapshot; future ALTER variants that change the
-+        // protocol must also re-check this on the evolved `TableConfiguration`.
-+        table_config.ensure_operation_supported(Operation::Write)?;
-+
-+        let schema = Arc::unwrap_or_clone(table_config.logical_schema());
-+        let SchemaEvolutionResult {
-+            schema: evolved_schema,
-+        } = apply_schema_operations(schema, self.operations, table_config.column_mapping_mode())?;
-+
-+        let evolved_metadata = table_config
-+            .metadata()
-+            .clone()
-+            .with_schema(evolved_schema.clone())?;
-+
-+        // Validates the evolved metadata against the protocol.
-+        let evolved_table_config = TableConfiguration::try_new_with_schema(
-+            table_config,
-+            evolved_metadata,
-+            evolved_schema,
-+        )?;
-+
-+        AlterTableTransaction::try_new_alter_table(self.snapshot, evolved_table_config, committer)
-+    }
-+}
\ No newline at end of file
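
The module doc above already shows which chains compile; the self-contained toy below isolates the type-state mechanism itself (PhantomData marker, `build()` defined only on the post-operation state). Names are illustrative and this is not the kernel's builder.

```rust
// Toy type-state builder: `build` only exists once at least one operation has
// been queued, so `Builder::new().build()` is a compile error while
// `Builder::new().add_column(..).build()` compiles.
use std::marker::PhantomData;

struct Ready;
struct Modifying;

struct Builder<S = Ready> {
    ops: Vec<String>,
    _state: PhantomData<S>,
}

impl Builder<Ready> {
    fn new() -> Self {
        Builder { ops: Vec::new(), _state: PhantomData }
    }
}

impl<S> Builder<S> {
    // Chainable op: available in any state, always transitions to Modifying.
    fn add_column(mut self, name: &str) -> Builder<Modifying> {
        self.ops.push(name.to_string());
        Builder { ops: self.ops, _state: PhantomData }
    }
}

impl Builder<Modifying> {
    // Only defined in the Modifying state, i.e. after at least one op.
    fn build(self) -> Vec<String> {
        self.ops
    }
}

fn main() {
    let ops = Builder::new().add_column("email").add_column("age").build();
    assert_eq!(ops, ["email", "age"]);
    // Builder::new().build(); // would not compile: no ops queued
}
```
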
kernel/src/transaction/builder/create_table.rs
@@ -1,27 +0,0 @@
-diff --git a/kernel/src/transaction/builder/create_table.rs b/kernel/src/transaction/builder/create_table.rs
---- a/kernel/src/transaction/builder/create_table.rs
-+++ b/kernel/src/transaction/builder/create_table.rs
- use crate::clustering::{create_clustering_domain_metadata, validate_clustering_columns};
- use crate::committer::Committer;
- use crate::expressions::ColumnName;
--use crate::schema::validation::validate_schema_for_create;
-+use crate::schema::validation::validate_schema;
- use crate::schema::variant_utils::schema_contains_variant_type;
- use crate::schema::{
-     normalize_column_names_to_schema_casing, schema_contains_non_null_fields, DataType, SchemaRef,
- /// compatible with Spark readers/writers.
- ///
- /// Explicit `delta.invariants` metadata annotations are rejected by
--/// `validate_schema_for_create`, so this only flips on the feature for nullability-driven
-+/// `validate_schema`, so this only flips on the feature for nullability-driven
- /// invariants. Kernel does not itself enforce the null mask at write time -- it relies on
- /// the engine's `ParquetHandler` to do so. Kernel's default `ParquetHandler` uses
- /// `arrow-rs`, whose `RecordBatch::try_new` rejects null values in fields marked
-             maybe_apply_column_mapping_for_table_create(&self.schema, &mut validated)?;
- 
-         // Validate schema (non-empty, column names, duplicates, no `delta.invariants` metadata)
--        validate_schema_for_create(&effective_schema, column_mapping_mode)?;
-+        validate_schema(&effective_schema, column_mapping_mode)?;
- 
-         // Validate data layout and resolve column names (physical for clustering, logical
-         // for partitioning). Adds required table features for clustering.
\ No newline at end of file
kernel/src/transaction/builder/mod.rs
@@ -1,8 +0,0 @@
-diff --git a/kernel/src/transaction/builder/mod.rs b/kernel/src/transaction/builder/mod.rs
---- a/kernel/src/transaction/builder/mod.rs
-+++ b/kernel/src/transaction/builder/mod.rs
- // and for tests. Also allow dead_code since these are used by integration tests.
- #![allow(unreachable_pub, dead_code)]
- 
-+pub mod alter_table;
- pub mod create_table;
\ No newline at end of file
kernel/src/transaction/mod.rs
@@ -1,35 +0,0 @@
-diff --git a/kernel/src/transaction/mod.rs b/kernel/src/transaction/mod.rs
---- a/kernel/src/transaction/mod.rs
-+++ b/kernel/src/transaction/mod.rs
- #[cfg(not(feature = "internal-api"))]
- pub(crate) mod data_layout;
- 
-+pub(crate) mod alter_table;
-+pub use alter_table::AlterTableTransaction;
- mod commit_info;
- mod domain_metadata;
-+pub(crate) mod schema_evolution;
- mod stats_verifier;
- mod update;
- mod write_context;
- #[derive(Debug)]
- pub struct CreateTable;
- 
-+/// Marker type for alter-table (schema evolution) transactions.
-+///
-+/// Transactions in this state perform metadata-only commits. Data file operations are not
-+/// available at compile time because `AlterTable` does not implement [`SupportsDataFiles`].
-+#[derive(Debug)]
-+pub struct AlterTable;
-+
- /// Marker trait for transaction states that support data file operations.
- ///
- /// Only transaction types that implement this trait can access methods for adding, removing, or
- 
-     // Note: Additional test coverage for partial file matching (where some files in a scan
-     // have DV updates but others don't) is provided by the end-to-end integration test
--    // kernel/tests/dv.rs and kernel/tests/write.rs, which exercises
-+    // kernel/tests/dv.rs and kernel/tests/write_remove_dv.rs, which exercise
-     // the full deletion vector write workflow including the DvMatchVisitor logic.
- 
-     #[test]
\ No newline at end of file
kernel/src/transaction/schema_evolution.rs
@@ -1,190 +0,0 @@
-diff --git a/kernel/src/transaction/schema_evolution.rs b/kernel/src/transaction/schema_evolution.rs
-new file mode 100644
---- /dev/null
-+++ b/kernel/src/transaction/schema_evolution.rs
-+//! Schema evolution operations for ALTER TABLE.
-+//!
-+//! This module defines the [`SchemaOperation`] enum and the [`apply_schema_operations`] function
-+//! that validates and applies schema changes to produce an evolved schema.
-+
-+use indexmap::IndexMap;
-+
-+use crate::error::Error;
-+use crate::schema::validation::validate_schema;
-+use crate::schema::{SchemaRef, StructField, StructType};
-+use crate::table_features::ColumnMappingMode;
-+use crate::DeltaResult;
-+
-+/// A schema evolution operation to be applied during ALTER TABLE.
-+///
-+/// Operations are validated and applied in order during
-+/// [`apply_schema_operations`]. Each operation sees the schema state after all prior operations
-+/// have been applied.
-+#[derive(Debug, Clone)]
-+pub(crate) enum SchemaOperation {
-+    /// Add a top-level column.
-+    AddColumn { field: StructField },
-+}
-+
-+/// The result of applying schema operations.
-+#[derive(Debug)]
-+pub(crate) struct SchemaEvolutionResult {
-+    /// The evolved schema after all operations are applied.
-+    pub schema: SchemaRef,
-+}
-+
-+/// Applies a sequence of schema operations to the given schema, returning the evolved schema.
-+///
-+/// Operations are applied sequentially: each one validates against and modifies the schema
-+/// produced by all preceding operations, not the original input schema.
-+///
-+/// # Errors
-+///
-+/// Returns an error if any operation fails validation. The error message identifies which
-+/// operation failed and why.
-+pub(crate) fn apply_schema_operations(
-+    schema: StructType,
-+    operations: Vec<SchemaOperation>,
-+    column_mapping_mode: ColumnMappingMode,
-+) -> DeltaResult<SchemaEvolutionResult> {
-+    let cm_enabled = column_mapping_mode != ColumnMappingMode::None;
-+    // IndexMap preserves field insertion order. Keys are lowercased for case-insensitive
-+    // duplicate detection; StructFields retain their original casing.
-+    let mut fields: IndexMap<String, StructField> = schema
-+        .into_fields()
-+        .map(|f| (f.name().to_lowercase(), f))
-+        .collect();
-+
-+    for op in operations {
-+        match op {
-+            // Protocol feature checks for the field's data type (e.g. `timestampNtz`) happen
-+            // later when the caller builds a new TableConfiguration from the evolved schema --
-+            // the alter is rejected if the table doesn't already have the required feature
-+            // enabled. This matches Spark, which also rejects with
-+            // `DELTA_FEATURES_REQUIRE_MANUAL_ENABLEMENT` and requires the user to enable the
-+            // feature explicitly before adding such a column.
-+            SchemaOperation::AddColumn { field } => {
-+                // TODO: support column mapping for add_column (assign ID + physical name,
-+                // update delta.columnMapping.maxColumnId).
-+                if cm_enabled {
-+                    return Err(Error::unsupported(
-+                        "ALTER TABLE add_column is not yet supported on tables with \
-+                         column mapping enabled",
-+                    ));
-+                }
-+                if field.is_metadata_column() {
-+                    return Err(Error::schema(format!(
-+                        "Cannot add column '{}': metadata columns are not allowed in \
-+                         a table schema",
-+                        field.name()
-+                    )));
-+                }
-+                let key = field.name().to_lowercase();
-+                if fields.contains_key(&key) {
-+                    return Err(Error::schema(format!(
-+                        "Cannot add column '{}': a column with that name already exists",
-+                        field.name()
-+                    )));
-+                }
-+                // Validate field is nullable (Delta protocol requires added columns to be
-+                // nullable so existing data files can return NULL for the new column)
-+                // NOTE: non-nullable columns depend on invariants feature
-+                if !field.is_nullable() {
-+                    return Err(Error::schema(format!(
-+                        "Cannot add non-nullable column '{}'. Added columns must be nullable \
-+                         because existing data files do not contain this column.",
-+                        field.name()
-+                    )));
-+                }
-+                fields.insert(key, field);
-+            }
-+        }
-+    }
-+
-+    let evolved_schema = StructType::try_new(fields.into_values())?;
-+
-+    validate_schema(&evolved_schema, column_mapping_mode)?;
-+    Ok(SchemaEvolutionResult {
-+        schema: evolved_schema.into(),
-+    })
-+}
-+
-+#[cfg(test)]
-+mod tests {
-+    use rstest::rstest;
-+
-+    use super::*;
-+    use crate::schema::{DataType, MetadataColumnSpec, StructField, StructType};
-+
-+    fn simple_schema() -> StructType {
-+        StructType::try_new(vec![
-+            StructField::not_null("id", DataType::INTEGER),
-+            StructField::nullable("name", DataType::STRING),
-+        ])
-+        .unwrap()
-+    }
-+
-+    fn add_col(name: &str, nullable: bool) -> SchemaOperation {
-+        let field = if nullable {
-+            StructField::nullable(name, DataType::STRING)
-+        } else {
-+            StructField::not_null(name, DataType::STRING)
-+        };
-+        SchemaOperation::AddColumn { field }
-+    }
-+
-+    // Builds a struct column whose nested leaf field has the given name. Used to prove that
-+    // `validate_schema` (not just the top-level dup check or `StructType::try_new`) is
-+    // reached from `apply_schema_operations`.
-+    fn add_struct_with_nested_leaf(name: &str, leaf_name: &str) -> SchemaOperation {
-+        let inner =
-+            StructType::try_new(vec![StructField::nullable(leaf_name, DataType::STRING)]).unwrap();
-+        SchemaOperation::AddColumn {
-+            field: StructField::nullable(name, inner),
-+        }
-+    }
-+
-+    #[rstest]
-+    #[case::dup_exact(vec![add_col("name", true)], "already exists")]
-+    #[case::dup_case_insensitive(vec![add_col("Name", true)], "already exists")]
-+    #[case::dup_within_batch(
-+        vec![add_col("email", true), add_col("email", true)],
-+        "already exists"
-+    )]
-+    #[case::non_nullable(vec![add_col("age", false)], "non-nullable")]
-+    #[case::invalid_parquet_char(vec![add_col("foo,bar", true)], "invalid character")]
-+    #[case::nested_invalid_parquet_char(
-+        vec![add_struct_with_nested_leaf("addr", "bad,leaf")],
-+        "invalid character"
-+    )]
-+    #[case::metadata_column(
-+        vec![SchemaOperation::AddColumn {
-+            field: StructField::create_metadata_column("row_idx", MetadataColumnSpec::RowIndex),
-+        }],
-+        "metadata columns are not allowed"
-+    )]
-+    fn apply_schema_operations_rejects(
-+        #[case] ops: Vec<SchemaOperation>,
-+        #[case] error_contains: &str,
-+    ) {
-+        let err =
-+            apply_schema_operations(simple_schema(), ops, ColumnMappingMode::None).unwrap_err();
-+        assert!(err.to_string().contains(error_contains));
-+    }
-+
-+    #[rstest]
-+    #[case::single(vec![add_col("email", true)], &["id", "name", "email"])]
-+    #[case::multiple(
-+        vec![add_col("email", true), add_col("age", true)],
-+        &["id", "name", "email", "age"]
-+    )]
-+    fn apply_schema_operations_succeeds(
-+        #[case] ops: Vec<SchemaOperation>,
-+        #[case] expected_names: &[&str],
-+    ) {
-+        let result =
-+            apply_schema_operations(simple_schema(), ops, ColumnMappingMode::None).unwrap();
-+        let actual: Vec<&str> = result.schema.fields().map(|f| f.name().as_str()).collect();
-+        assert_eq!(&actual, expected_names);
-+    }
-+}
\ No newline at end of file
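
A small runnable sketch of the bookkeeping `apply_schema_operations` uses above: an `IndexMap` keyed by the lowercased name preserves insertion order while catching case-insensitive duplicates. Requires the `indexmap` crate; the values here are plain strings rather than `StructField`s.

```rust
// Lowercased keys give case-insensitive duplicate detection; iteration order
// matches insertion order, so existing columns stay first and added columns
// are appended at the end.
use indexmap::IndexMap;

fn main() {
    let mut fields: IndexMap<String, &str> = IndexMap::new();
    for name in ["id", "Name"] {
        fields.insert(name.to_lowercase(), name);
    }
    // Adding "NAME" would be rejected: it collides case-insensitively.
    assert!(fields.contains_key(&"NAME".to_lowercase()));
    // A genuinely new column lands at the end, preserving the original order.
    fields.insert("email".to_lowercase(), "email");
    let order: Vec<&str> = fields.values().copied().collect();
    assert_eq!(order, ["id", "Name", "email"]);
}
```
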
kernel/tests/README.md
@@ -1,31 +0,0 @@
-diff --git a/kernel/tests/README.md b/kernel/tests/README.md
---- a/kernel/tests/README.md
-+++ b/kernel/tests/README.md
- 
- | Table | Location | Schema | Protocol (R/W) | Features | Description | Tests |
- |-------|----------|--------|----------|----------|-------------|-------|
--| `table-with-dv-small` | data/ | `value: int` | v3/v7 | r:`deletionVectors` w:`deletionVectors` | 10 rows, 2 soft-deleted by DV, 8 visible. Most heavily referenced test table. | `dv.rs::test_table_scan(with_dv)`, `write.rs::test_remove_files_adds_expected_entries`, `write.rs::test_update_deletion_vectors_adds_expected_entries`, `read.rs::with_predicate_and_removes`, `path.rs::test_to_uri/test_child/test_child_escapes`, `snapshot.rs::test_snapshot_read_metadata/test_new_snapshot/test_snapshot_new_from/test_read_table_with_missing_last_checkpoint/test_log_compaction_writer`, `deletion_vector.rs` tests, `transaction/mod.rs::setup_dv_enabled_table/test_add_files_schema/test_new_deletion_vector_path`, `default/parquet.rs` read test, `default/json.rs` read test, `log_compaction/tests.rs::create_mock_snapshot`, `resolve_dvs.rs` tests |
-+| `table-with-dv-small` | data/ | `value: int` | v3/v7 | r:`deletionVectors` w:`deletionVectors` | 10 rows, 2 soft-deleted by DV, 8 visible. Most heavily referenced test table. | `dv.rs::test_table_scan(with_dv)`, `write_remove_dv.rs::test_remove_files_adds_expected_entries`, `write_remove_dv.rs::test_update_deletion_vectors_adds_expected_entries`, `read.rs::with_predicate_and_removes`, `path.rs::test_to_uri/test_child/test_child_escapes`, `snapshot.rs::test_snapshot_read_metadata/test_new_snapshot/test_snapshot_new_from/test_read_table_with_missing_last_checkpoint/test_log_compaction_writer`, `deletion_vector.rs` tests, `transaction/mod.rs::setup_dv_enabled_table/test_add_files_schema/test_new_deletion_vector_path`, `default/parquet.rs` read test, `default/json.rs` read test, `log_compaction/tests.rs::create_mock_snapshot`, `resolve_dvs.rs` tests |
- | `table-without-dv-small` | data/ | `value: long` | v1/v2 | | 10 rows, all visible. Companion to table-with-dv-small. | `dv.rs::test_table_scan(without_dv)`, `transaction/mod.rs::setup_non_dv_table/create_existing_table_txn/test_commit_io_error_returns_retryable_transaction`, `sequential_phase.rs::test_sequential_v2_with_commits_only/test_sequential_finish_before_exhaustion_error`, `parallel_phase.rs` tests, `scan/tests.rs::test_scan_metadata_paths/test_scan_metadata/test_scan_metadata_from_same_version` |
- | `with-short-dv` | data/ | `id: long, value: string, timestamp: timestamp, rand: double` | v3/v7 | r:`deletionVectors` w:`deletionVectors` | 2 files x 5 rows. First file has inline DV (`storageType="u"`) deleting 3 rows. | `read.rs::short_dv` |
- | `dv-partitioned-with-checkpoint` | golden_data/ | `value: int, part: int` partitioned by `part` | v3/v7 | r:`deletionVectors` w:`deletionVectors` | DVs on a partitioned table with a checkpoint | `golden_tables.rs::golden_test!` |
- 
- | Table | Location | Schema | Protocol (R/W) | Features | Description | Tests |
- |-------|----------|--------|----------|----------|-------------|-------|
--| `partition_cm/none` | data/ | `value: int, category: string` partitioned by `category` | v1/v1 | `columnMapping.mode=none` | Partitioned write with CM disabled | `write.rs::test_column_mapping_partitioned_write(cm_none)` |
--| `partition_cm/id` | data/ | `value: int, category: string` partitioned by `category` | v3/v7 | r:`columnMapping` w:`columnMapping`, `columnMapping.mode=id` | Partitioned write with CM id mode | `write.rs::test_column_mapping_partitioned_write(cm_id)` |
--| `partition_cm/name` | data/ | `value: int, category: string` partitioned by `category` | v3/v7 | r:`columnMapping` w:`columnMapping`, `columnMapping.mode=name` | Partitioned write with CM name mode | `write.rs::test_column_mapping_partitioned_write(cm_name)` |
-+| `partition_cm/none` | data/ | `value: int, category: string` partitioned by `category` | v1/v1 | `columnMapping.mode=none` | Partitioned write with CM disabled | `write_column_mapping.rs::test_column_mapping_partitioned_write(cm_none)` |
-+| `partition_cm/id` | data/ | `value: int, category: string` partitioned by `category` | v3/v7 | r:`columnMapping` w:`columnMapping`, `columnMapping.mode=id` | Partitioned write with CM id mode | `write_column_mapping.rs::test_column_mapping_partitioned_write(cm_id)` |
-+| `partition_cm/name` | data/ | `value: int, category: string` partitioned by `category` | v3/v7 | r:`columnMapping` w:`columnMapping`, `columnMapping.mode=name` | Partitioned write with CM name mode | `write_column_mapping.rs::test_column_mapping_partitioned_write(cm_name)` |
- | `table-with-columnmapping-mode-name` | golden_data/ | `ByteType: byte, ShortType: short, IntegerType: int, LongType: long, FloatType: float, DoubleType: double, decimal: decimal(10,2), BooleanType: boolean, StringType: string, BinaryType: binary, DateType: date, TimestampType: timestamp, nested_struct: struct{aa: string, ac: struct{aca: int}}, array_of_prims: array<int>, array_of_arrays: array<array<int>>, array_of_structs: array<struct{ab: long}>, map_of_prims: map<int,long>, map_of_rows: map<int,struct{ab: long}>, map_of_arrays: map<long,array<int>>` | v2/v5 | `columnMapping.mode=name` | Column mapping name mode | `golden_tables.rs::golden_test!` |
- | `table-with-columnmapping-mode-id` | golden_data/ | `ByteType: byte, ShortType: short, IntegerType: int, LongType: long, FloatType: float, DoubleType: double, decimal: decimal(10,2), BooleanType: boolean, StringType: string, BinaryType: binary, DateType: date, TimestampType: timestamp, nested_struct: struct{aa: string, ac: struct{aca: int}}, array_of_prims: array<int>, array_of_arrays: array<array<int>>, array_of_structs: array<struct{ab: long}>, map_of_prims: map<int,long>, map_of_rows: map<int,struct{ab: long}>, map_of_arrays: map<long,array<int>>` | v2/v5 | `columnMapping.mode=id` | Column mapping id mode | `golden_tables.rs::golden_test!` |
- 
- | Table | Location | Schema | Protocol (R/W) | Features | Description | Tests |
- |-------|----------|--------|----------|----------|-------------|-------|
- | `with_checkpoint_no_last_checkpoint` | data/ | `letter: string, int: long, date: date` | v1/v2 | `checkpointInterval=2` | Checkpoint at v2 but missing `_last_checkpoint` hint file | `snapshot.rs::test_read_table_with_checkpoint`, `scan/tests.rs::test_scan_with_checkpoint`, `sequential_phase.rs::test_sequential_checkpoint_no_commits`, `checkpoint_manifest.rs` tests, `sync/parquet.rs` test, `default/parquet.rs` test |
--| `external-table-different-nullability` | data/ | `i: int` | v1/v2 | `checkpointInterval=2` | Parquet files have different nullability than Delta schema; includes checkpoint | `write.rs::test_checkpoint_non_kernel_written_table` |
-+| `external-table-different-nullability` | data/ | `i: int` | v1/v2 | `checkpointInterval=2` | Parquet files have different nullability than Delta schema; includes checkpoint | `write_clustered.rs::test_checkpoint_non_kernel_written_table` |
- | `checkpoint` | golden_data/ | `intCol: int` | v1/v2 | | Basic checkpoint read | `golden_tables.rs::golden_test!(checkpoint_test)` |
- | `corrupted-last-checkpoint-kernel` | golden_data/ | `id: long` | v1/v2 | | Corrupted `_last_checkpoint` file | `golden_tables.rs::golden_test!` |
- | `multi-part-checkpoint` | golden_data/ | `id: long` | v1/v2 | `checkpointInterval=1` | Multi-part checkpoint files | `golden_tables.rs::golden_test!` |
\ No newline at end of file

... (truncated, output exceeded 60000 bytes)

Reproduce locally: git range-diff ac9dc19..97281f0 6486bd2..7546790 | Disable: git config gitstack.push-range-diff false

@lorenarosati
Copy link
Copy Markdown
Collaborator Author

Range-diff: main (7546790 -> 6245673)
kernel/src/table_configuration.rs
@@ -9,42 +9,23 @@
      validate_timestamp_ntz_feature_support, ColumnMappingMode, EnablementCheck, FeatureRequirement,
      FeatureType, KernelSupport, Operation, TableFeature, LEGACY_READER_FEATURES,
      LEGACY_WRITER_FEATURES, MAX_VALID_READER_VERSION, MAX_VALID_WRITER_VERSION,
-         version: Version,
-     ) -> DeltaResult<Self> {
-         let logical_schema = Arc::new(metadata.parse_schema()?);
-+        Self::try_new_inner(metadata, protocol, table_root, version, logical_schema)
-+    }
-+
-+    /// Like [`try_new`](Self::try_new), but reuses `base`'s protocol, table root, and version
-+    /// and takes a pre-parsed `logical_schema`.
-+    pub(crate) fn try_new_with_schema(
-+        base: &Self,
-+        metadata: Metadata,
-+        logical_schema: SchemaRef,
-+    ) -> DeltaResult<Self> {
-+        Self::try_new_inner(
-+            metadata,
-+            base.protocol.clone(),
-+            base.table_root.clone(),
-+            base.version,
-+            logical_schema,
-+        )
-+    }
-+
-+    fn try_new_inner(
-+        metadata: Metadata,
-+        protocol: Protocol,
-+        table_root: Url,
-+        version: Version,
-+        logical_schema: SchemaRef,
-+    ) -> DeltaResult<Self> {
-         let table_properties = metadata.parse_table_properties();
-         let column_mapping_mode = column_mapping_mode(&protocol, &table_properties);
- 
  
          // Validate schema against protocol features now that we have a TC instance.
          validate_timestamp_ntz_feature_support(&table_config)?;
 +        validate_geospatial_feature_support(&table_config)?;
          validate_variant_type_feature_support(&table_config)?;
  
-         Ok(table_config)
\ No newline at end of file
+         Ok(table_config)
+             config.ensure_operation_supported(Operation::Write),
+             r#"Feature 'typeWidening' is not supported for writes"#,
+         );
++
++        // Geospatial is not supported for writes
++        let config = create_mock_table_config(&[], &[TableFeature::GeospatialType]);
++        assert_result_error_with_message(
++            config.ensure_operation_supported(Operation::Write),
++            r#"Feature 'geospatial' is not supported for writes"#,
++        );
+     }
+ 
+     #[test]
\ No newline at end of file
kernel/src/table_features/mod.rs
@@ -32,7 +32,12 @@
 +    feature_type: FeatureType::ReaderWriter,
 +    min_legacy_version: None,
 +    feature_requirements: &[],
-+    kernel_support: KernelSupport::Supported,
++    kernel_support: KernelSupport::Custom(|_, _, op| match op {
++        Operation::Scan | Operation::Cdf => Ok(()),
++        Operation::Write => Err(Error::unsupported(
++            "Feature 'geospatial' is not supported for writes",
++        )),
++    }),
 +    enablement_check: EnablementCheck::AlwaysIfSupported,
 +};
 +
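
A standalone toy of the gating the `KernelSupport::Custom` closure above encodes: tables with the geospatial feature stay readable (Scan/Cdf) while writes are rejected. `Operation` and the free function here are stand-ins, not the kernel's types; only the error text mirrors the diff.

```rust
// Read-only support for the geospatial feature, modeled as a free function.
enum Operation {
    Scan,
    Cdf,
    Write,
}

fn check_geospatial_support(op: Operation) -> Result<(), String> {
    match op {
        Operation::Scan | Operation::Cdf => Ok(()),
        Operation::Write => Err("Feature 'geospatial' is not supported for writes".to_string()),
    }
}

fn main() {
    assert!(check_geospatial_support(Operation::Scan).is_ok());
    assert!(check_geospatial_support(Operation::Write).is_err());
}
```
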
kernel/src/actions/mod.rs
@@ -1,32 +0,0 @@
-diff --git a/kernel/src/actions/mod.rs b/kernel/src/actions/mod.rs
---- a/kernel/src/actions/mod.rs
-+++ b/kernel/src/actions/mod.rs
- }
- 
- // Serde derives are needed for CRC file deserialization (see `crc::reader`).
-+//
-+// TODO(#2446): `Metadata` stores the schema only as a JSON string. Callers that already hold
-+// a parsed `SchemaRef` (e.g. CREATE TABLE) serialize into `schema_string` and then re-parse
-+// downstream in `TableConfiguration::try_new` via `parse_schema()`. Caching the parsed schema
-+// on `Metadata` would eliminate the round-trip.
- #[derive(Debug, Default, Clone, PartialEq, Eq, Serialize, Deserialize, ToSchema)]
- #[serde(rename_all = "camelCase")]
- #[internal_api]
-         TableProperties::from(self.configuration.iter())
-     }
- 
-+    /// Returns a new Metadata with the schema replaced, preserving all other fields.
-+    ///
-+    /// # Errors
-+    ///
-+    /// Returns an error if schema serialization fails.
-+    pub(crate) fn with_schema(self, schema: SchemaRef) -> DeltaResult<Self> {
-+        Ok(Self {
-+            schema_string: serde_json::to_string(&schema)?,
-+            ..self
-+        })
-+    }
-+
-     #[cfg(test)]
-     #[allow(clippy::too_many_arguments)]
-     pub(crate) fn new_unchecked(
\ No newline at end of file
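
The `with_schema` helper added above is a struct-update-style copy that swaps only the serialized schema. A toy, self-contained version of the same pattern (stand-in `Metadata` type, not the kernel's; the real method serializes a `SchemaRef` with `serde_json` first):

```rust
// Replace one field, preserve the rest via struct-update syntax.
#[derive(Debug, Clone)]
struct Metadata {
    schema_string: String,
    name: Option<String>,
}

impl Metadata {
    fn with_schema_string(self, schema_string: String) -> Self {
        Self { schema_string, ..self }
    }
}

fn main() {
    let m = Metadata { schema_string: "{}".into(), name: Some("t".into()) };
    let m2 = m.with_schema_string(r#"{"type":"struct","fields":[]}"#.into());
    // Every field other than schema_string is preserved unchanged.
    assert_eq!(m2.name.as_deref(), Some("t"));
    assert!(m2.schema_string.contains("struct"));
}
```
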
kernel/src/engine/arrow_expression/evaluate_expression.rs
@@ -1,154 +0,0 @@
-diff --git a/kernel/src/engine/arrow_expression/evaluate_expression.rs b/kernel/src/engine/arrow_expression/evaluate_expression.rs
---- a/kernel/src/engine/arrow_expression/evaluate_expression.rs
-+++ b/kernel/src/engine/arrow_expression/evaluate_expression.rs
-         (Literal(scalar), _) => {
-             validate_array_type(scalar.to_array(batch.num_rows())?, result_type)
-         }
--        (Column(name), _) => {
--            // Column extraction uses ordinal-based struct validation because column mapping
--            // can cause physical/logical name mismatches. apply_schema handles renaming.
--            let arr = extract_column(batch, name)?;
--            if let Some(expected) = result_type {
--                ensure_data_types(expected, arr.data_type(), ValidationMode::TypesOnly)?;
--            }
--            Ok(arr)
--        }
-+        (Column(name), _) => validate_array_type(extract_column(batch, name)?, result_type),
-         (Struct(fields, nullability), Some(DataType::Struct(output_schema))) => {
-             evaluate_struct_expression(fields, batch, output_schema, nullability.as_ref())
-         }
-     }
- 
-     #[test]
--    fn column_extract_struct_with_mismatched_field_names() {
-+    fn column_extract_struct_rejects_mismatched_field_names() {
-         let batch = make_struct_batch(
-             vec![
-                 ArrowField::new("col-abc-001", ArrowDataType::Int64, true),
-             ],
-         );
- 
--        // Logical names differ from physical names due to column mapping
-         let logical_type = DataType::try_struct_type([
-             StructField::nullable("my_column", DataType::LONG),
-             StructField::nullable("other_column", DataType::LONG),
- 
-         let expr = column_expr!("stats");
-         let result = evaluate_expression(&expr, &batch, Some(&logical_type));
--
--        // Ordinal-based validation passes: same field count and types by position.
--        // The downstream apply_schema transformation handles renaming.
--        let arr = result.expect("should succeed with mismatched names but matching types");
--        let struct_arr = arr.as_any().downcast_ref::<StructArray>().unwrap();
--        assert_eq!(struct_arr.num_columns(), 2);
--        assert_eq!(struct_arr.len(), 2);
--    }
--
--    #[test]
--    fn column_extract_struct_rejects_mismatched_field_count() {
--        let batch = make_struct_batch(
--            vec![ArrowField::new("col-abc-001", ArrowDataType::Int64, true)],
--            vec![Arc::new(Int64Array::from(vec![Some(1), Some(2)]))],
--        );
--
--        let logical_type = DataType::try_struct_type([
--            StructField::nullable("a", DataType::LONG),
--            StructField::nullable("b", DataType::LONG),
--        ])
--        .unwrap();
--
--        let expr = column_expr!("stats");
--        let result = evaluate_expression(&expr, &batch, Some(&logical_type));
--        assert_result_error_with_message(result, "Struct field count mismatch");
-+        assert_result_error_with_message(result, "Missing Struct fields");
-     }
- 
-     #[test]
-     fn column_extract_struct_rejects_mismatched_child_types() {
-         let batch = make_struct_batch(
-             vec![
--                ArrowField::new("col-abc-001", ArrowDataType::Int64, true),
--                ArrowField::new("col-abc-002", ArrowDataType::Utf8, true),
-+                ArrowField::new("a", ArrowDataType::Int64, true),
-+                ArrowField::new("b", ArrowDataType::Utf8, true),
-             ],
-             vec![
-                 Arc::new(Int64Array::from(vec![Some(1)])),
-             ],
-         );
- 
--        // Expect two LONG columns, but the second arrow field is Utf8
-         let logical_type = DataType::try_struct_type([
-             StructField::nullable("a", DataType::LONG),
-             StructField::nullable("b", DataType::LONG),
-     }
- 
-     #[test]
--    fn column_extract_struct_with_matching_names_still_works() {
-+    fn column_extract_struct_with_matching_names_works() {
-         let batch = make_struct_batch(
-             vec![
-                 ArrowField::new("a", ArrowDataType::Int64, true),
-         assert!(result.is_ok());
-     }
- 
--    /// Exercises the exact code path from `get_add_transform_expr` where a `struct_from`
--    /// expression wraps `column_expr!("add.stats_parsed")`. When the checkpoint parquet has
--    /// stats_parsed with physical column names (e.g. `col-abc-001`) but the output schema
--    /// uses logical names (e.g. `id`), `evaluate_struct_expression` calls
--    /// `evaluate_expression(Column, struct_result_type)` with mismatched field names.
--    /// Without ordinal-based validation this fails with a name mismatch error.
-+    /// When a `struct_from` expression wraps a `Column` referencing stats_parsed, and the
-+    /// checkpoint parquet has physical column names (e.g. `col-abc-001`) but the output schema
-+    /// uses logical names (e.g. `id`), name-based validation correctly rejects the mismatch.
-     #[test]
--    fn struct_from_with_column_tolerates_nested_name_mismatch() {
--        // Build a batch mimicking checkpoint data: add.stats_parsed uses physical names
-+    fn struct_from_with_column_rejects_nested_name_mismatch() {
-         let stats_fields: Vec<ArrowField> = vec![
-             ArrowField::new("col-abc-001", ArrowDataType::Int64, true),
-             ArrowField::new("col-abc-002", ArrowDataType::Int64, true),
-         )]);
-         let batch = RecordBatch::try_new(Arc::new(schema), vec![Arc::new(add_struct)]).unwrap();
- 
--        // struct_from mimicking get_add_transform_expr: wraps a Column referencing stats_parsed
-         let expr = Expr::struct_from([
-             column_expr_ref!("add.path"),
-             column_expr_ref!("add.stats_parsed"),
-         .unwrap();
- 
-         let result = evaluate_expression(&expr, &batch, Some(&output_type));
--        result.expect("struct_from with Column sub-expression should tolerate field name mismatch");
--    }
--
--    #[test]
--    fn column_extract_nested_struct_with_mismatched_names() {
--        let inner_fields = vec![ArrowField::new("phys-inner", ArrowDataType::Int64, true)];
--        let inner_struct = ArrowDataType::Struct(inner_fields.clone().into());
--        let batch = make_struct_batch(
--            vec![ArrowField::new("phys-outer", inner_struct, true)],
--            vec![Arc::new(
--                StructArray::try_new(
--                    inner_fields.into(),
--                    vec![Arc::new(Int64Array::from(vec![Some(42)]))],
--                    None,
--                )
--                .unwrap(),
--            )],
--        );
--
--        let logical_type = DataType::try_struct_type([StructField::nullable(
--            "logical_outer",
--            DataType::struct_type_unchecked([StructField::nullable(
--                "logical_inner",
--                DataType::LONG,
--            )]),
--        )])
--        .unwrap();
--
--        let expr = column_expr!("stats");
--        let result = evaluate_expression(&expr, &batch, Some(&logical_type));
--        assert!(result.is_ok());
-+        assert_result_error_with_message(result, "Missing Struct fields");
-     }
- }
\ No newline at end of file
kernel/src/engine/ensure_data_types.rs
@@ -1,13 +0,0 @@
-diff --git a/kernel/src/engine/ensure_data_types.rs b/kernel/src/engine/ensure_data_types.rs
---- a/kernel/src/engine/ensure_data_types.rs
-+++ b/kernel/src/engine/ensure_data_types.rs
- #[internal_api]
- pub(crate) enum ValidationMode {
-     /// Check types only. Struct fields are matched by ordinal position, not by name.
--    /// Nullability and metadata are not checked. Used by the expression evaluator where
--    /// column mapping can cause physical/logical name mismatches.
-+    /// Nullability and metadata are not checked.
-+    #[allow(dead_code)]
-     TypesOnly,
-     /// Check types and match struct fields by name, but skip nullability and metadata.
-     /// Used by the parquet reader where fields are already resolved by name upstream.
\ No newline at end of file
kernel/src/schema/validation.rs
@@ -1,48 +0,0 @@
-diff --git a/kernel/src/schema/validation.rs b/kernel/src/schema/validation.rs
---- a/kernel/src/schema/validation.rs
-+++ b/kernel/src/schema/validation.rs
--//! Schema validation utilities for Delta table creation.
-+//! Schema validation utilities shared by table creation and schema evolution.
- //!
- //! Validates schemas per the Delta protocol specification.
- 
- /// These characters have special meaning in Parquet schema syntax.
- const INVALID_PARQUET_CHARS: &[char] = &[' ', ',', ';', '{', '}', '(', ')', '\n', '\t', '='];
- 
--/// Validates a schema for table creation.
-+/// Validates a schema for CREATE TABLE or ALTER TABLE.
- ///
- /// Performs the following checks:
- /// 1. Schema is non-empty
- /// 3. Column names contain only valid characters
- /// 4. Rejects fields with `delta.invariants` metadata (SQL expression invariants are not supported
- ///    by kernel; see `TableConfiguration::ensure_write_supported`)
--pub(crate) fn validate_schema_for_create(
-+pub(crate) fn validate_schema(
-     schema: &StructType,
-     column_mapping_mode: ColumnMappingMode,
- ) -> DeltaResult<()> {
-     #[case::dot_in_name_with_cm(schema_with_dot(), ColumnMappingMode::Name)]
-     #[case::different_struct_children(schema_different_struct_children(), ColumnMappingMode::None)]
-     fn valid_schema_accepted(#[case] schema: StructType, #[case] cm: ColumnMappingMode) {
--        assert!(validate_schema_for_create(&schema, cm).is_ok());
-+        assert!(validate_schema(&schema, cm).is_ok());
-     }
- 
-     // === Invalid schemas ===
-         #[case] cm: ColumnMappingMode,
-         #[case] expected_errs: &[&str],
-     ) {
--        let result = validate_schema_for_create(&schema, cm);
-+        let result = validate_schema(&schema, cm);
-         assert!(result.is_err());
-         let err = result.unwrap_err().to_string();
-         for expected in expected_errs {
-     #[case::array_nested(schema_array_nested_invariant(), "arr.child")]
-     #[case::map_nested(schema_map_nested_invariant(), "map.child")]
-     fn invariants_metadata_rejected(#[case] schema: StructType, #[case] expected_path: &str) {
--        let result = validate_schema_for_create(&schema, ColumnMappingMode::None);
-+        let result = validate_schema(&schema, ColumnMappingMode::None);
-         let err = result.expect_err("expected delta.invariants metadata rejection");
-         let msg = err.to_string();
-         assert!(
\ No newline at end of file
kernel/src/snapshot/mod.rs
@@ -1,27 +0,0 @@
-diff --git a/kernel/src/snapshot/mod.rs b/kernel/src/snapshot/mod.rs
---- a/kernel/src/snapshot/mod.rs
-+++ b/kernel/src/snapshot/mod.rs
- use crate::table_configuration::{InCommitTimestampEnablement, TableConfiguration};
- use crate::table_features::{physical_to_logical_column_name, ColumnMappingMode, TableFeature};
- use crate::table_properties::TableProperties;
-+use crate::transaction::builder::alter_table::AlterTableTransactionBuilder;
- use crate::transaction::Transaction;
- use crate::utils::require;
- use crate::{DeltaResult, Engine, Error, LogCompactionWriter, Version};
-         Transaction::try_new_existing_table(self, committer, engine)
-     }
- 
-+    /// Creates a builder for altering this table's metadata. Currently supports schema change
-+    /// operations.
-+    ///
-+    /// The returned builder allows chaining operations before building an
-+    /// [`AlterTableTransaction`] that can be committed.
-+    ///
-+    /// [`AlterTableTransaction`]: crate::transaction::AlterTableTransaction
-+    pub fn alter_table(self: Arc<Self>) -> AlterTableTransactionBuilder {
-+        AlterTableTransactionBuilder::new(self)
-+    }
-+
-     /// Fetch the latest version of the provided `application_id` for this snapshot. Filters the
-     /// txn based on the delta.setTransactionRetentionDuration property and lastUpdated.
-     ///
\ No newline at end of file
kernel/src/transaction/alter_table.rs
@@ -1,81 +0,0 @@
-diff --git a/kernel/src/transaction/alter_table.rs b/kernel/src/transaction/alter_table.rs
-new file mode 100644
---- /dev/null
-+++ b/kernel/src/transaction/alter_table.rs
-+//! Alter table transaction types and constructor.
-+//!
-+//! This module defines the [`AlterTableTransaction`] type alias and the
-+//! [`try_new_alter_table`](AlterTableTransaction::try_new_alter_table) constructor.
-+//! The builder logic lives in [`builder::alter_table`](super::builder::alter_table).
-+
-+#![allow(unreachable_pub)]
-+
-+use std::marker::PhantomData;
-+use std::sync::OnceLock;
-+
-+use crate::committer::Committer;
-+use crate::snapshot::SnapshotRef;
-+use crate::table_configuration::TableConfiguration;
-+use crate::transaction::{AlterTable, Transaction};
-+use crate::utils::current_time_ms;
-+use crate::DeltaResult;
-+
-+/// A type alias for alter-table transactions.
-+///
-+/// This provides a restricted API surface that only exposes operations valid during ALTER
-+/// commands. Data file operations are not available at compile time because `AlterTable`
-+/// does not implement [`SupportsDataFiles`](super::SupportsDataFiles).
-+pub type AlterTableTransaction = Transaction<AlterTable>;
-+
-+impl AlterTableTransaction {
-+    /// Create a new transaction for altering a table's schema. Produces a metadata-only commit
-+    /// that emits an updated Metadata action with the evolved schema.
-+    ///
-+    /// The `effective_table_config` is the evolved table configuration (new schema, same
-+    /// protocol). It must be fully validated before calling this constructor (e.g. schema
-+    /// operations applied, protocol feature checks passed). The `read_snapshot` provides the
-+    /// pre-commit table state (version, previous protocol/metadata, ICT timestamps) used for
-+    /// commit versioning and post-commit snapshots.
-+    ///
-+    /// This is typically called via `AlterTableTransactionBuilder::build()` rather than directly.
-+    pub(crate) fn try_new_alter_table(
-+        read_snapshot: SnapshotRef,
-+        effective_table_config: TableConfiguration,
-+        committer: Box<dyn Committer>,
-+    ) -> DeltaResult<Self> {
-+        let span = tracing::info_span!(
-+            "txn",
-+            path = %read_snapshot.table_root(),
-+            read_version = read_snapshot.version(),
-+            operation = "ALTER TABLE",
-+        );
-+
-+        Ok(Transaction {
-+            span,
-+            read_snapshot_opt: Some(read_snapshot),
-+            effective_table_config,
-+            should_emit_protocol: false,
-+            should_emit_metadata: true,
-+            committer,
-+            operation: Some("ALTER TABLE".to_string()),
-+            engine_info: None,
-+            add_files_metadata: vec![],
-+            remove_files_metadata: vec![],
-+            set_transactions: vec![],
-+            commit_timestamp: current_time_ms()?,
-+            user_domain_metadata_additions: vec![],
-+            system_domain_metadata_additions: vec![],
-+            user_domain_removals: vec![],
-+            data_change: false,
-+            shared_write_state: OnceLock::new(),
-+            engine_commit_info: None,
-+            // TODO(#2446): match delta-spark's per-op isBlindAppend policy
-+            // (ADD/DROP/DROP NOT NULL -> true, SET NOT NULL -> false). Hardcoded false for
-+            // now: safe, but misses the true-case optimization delta-spark applies.
-+            is_blind_append: false,
-+            dv_matched_files: vec![],
-+            physical_clustering_columns: None,
-+            _state: PhantomData,
-+        })
-+    }
-+}
\ No newline at end of file
kernel/src/transaction/builder/alter_table.rs
@@ -1,168 +0,0 @@
-diff --git a/kernel/src/transaction/builder/alter_table.rs b/kernel/src/transaction/builder/alter_table.rs
-new file mode 100644
---- /dev/null
-+++ b/kernel/src/transaction/builder/alter_table.rs
-+//! Builder for ALTER TABLE (schema evolution) transactions.
-+//!
-+//! This module contains [`AlterTableTransactionBuilder`], which uses a type-state pattern to
-+//! enforce valid operation chaining at compile time.
-+//!
-+//! # Type States
-+//!
-+//! - [`Ready`]: Initial state. Operations are available, but `build()` is not (at least one
-+//!   operation is required).
-+//! - [`Modifying`]: After any chainable schema operation. More ops can be chained, and `build()` is
-+//!   available. See [`AlterTableTransactionBuilder<Modifying>`] for ops.
-+//!
-+//! # Transitions
-+//!
-+//! Each `impl` block below is gated by a state bound and documents which operations that
-+//! state enables. Chainable schema operations live on `impl<S: Chainable>` and transition
-+//! the builder to a chainable state; `build()` lives on states that are buildable.
-+//!
-+//! ```ignore
-+//! // Allowed: at least one op queued before build().
-+//! snapshot.alter_table().add_column(field).build(engine, committer)?;
-+//!
-+//! // Not allowed: build() is not defined on Ready (no ops queued).
-+//! snapshot.alter_table().build(engine, committer)?;  // compile error
-+//! ```
-+
-+use std::marker::PhantomData;
-+use std::sync::Arc;
-+
-+use crate::committer::Committer;
-+use crate::schema::StructField;
-+use crate::snapshot::SnapshotRef;
-+use crate::table_configuration::TableConfiguration;
-+use crate::table_features::Operation;
-+use crate::transaction::alter_table::AlterTableTransaction;
-+use crate::transaction::schema_evolution::{
-+    apply_schema_operations, SchemaEvolutionResult, SchemaOperation,
-+};
-+use crate::{DeltaResult, Engine};
-+
-+/// Initial state: `build()` is not yet available (at least one operation is required).
-+/// See [`Chainable`] for the operations available on this state.
-+pub struct Ready;
-+
-+/// State after at least one operation has been added. `build()` is available.
-+/// See [`Chainable`] for the operations available on this state.
-+pub struct Modifying;
-+
-+/// Marker trait for builder states that accept chainable schema operations. Grouping states
-+/// under one bound lets each op (like `add_column`) live on a single `impl<S: Chainable>`
-+/// block -- chainable states share the body rather than duplicating it per state.
-+///
-+/// Sealed: external types cannot implement this, keeping the set of chainable states closed.
-+pub trait Chainable: sealed::Sealed {}
-+impl Chainable for Ready {}
-+impl Chainable for Modifying {}
-+
-+mod sealed {
-+    pub trait Sealed {}
-+    impl Sealed for super::Ready {}
-+    impl Sealed for super::Modifying {}
-+}
-+
-+/// Builder for constructing an [`AlterTableTransaction`] with schema evolution operations.
-+///
-+/// Uses a type-state pattern (`S`) to enforce at compile time:
-+/// - At least one schema operation must be queued before `build()` is callable.
-+/// - Only operations valid for the current state can be chained, which disallows incompatible
-+///   chaining.
-+pub struct AlterTableTransactionBuilder<S = Ready> {
-+    snapshot: SnapshotRef,
-+    operations: Vec<SchemaOperation>,
-+    // PhantomData marker for builder state (Ready or Modifying).
-+    // Zero-sized; only affects which methods are available at compile time.
-+    _state: PhantomData<S>,
-+}
-+
-+impl<S> AlterTableTransactionBuilder<S> {
-+    // Reconstructs the builder with a different PhantomData marker, changing which methods
-+    // are available at compile time (e.g. Ready -> Modifying enables `build()`). All real
-+    // fields are moved as-is; only the zero-sized type state changes.
-+    //
-+    // `T` (distinct from the struct's `S`) lets the caller pick the target state:
-+    // `self.transition::<Modifying>()` returns `AlterTableTransactionBuilder<Modifying>`.
-+    fn transition<T>(self) -> AlterTableTransactionBuilder<T> {
-+        AlterTableTransactionBuilder {
-+            snapshot: self.snapshot,
-+            operations: self.operations,
-+            _state: PhantomData,
-+        }
-+    }
-+}
-+
-+impl AlterTableTransactionBuilder<Ready> {
-+    /// Create a new builder from a snapshot.
-+    pub(crate) fn new(snapshot: SnapshotRef) -> Self {
-+        AlterTableTransactionBuilder {
-+            snapshot,
-+            operations: Vec::new(),
-+            _state: PhantomData,
-+        }
-+    }
-+}
-+
-+impl<S: Chainable> AlterTableTransactionBuilder<S> {
-+    /// Add a new top-level column to the table schema.
-+    ///
-+    /// The field must not already exist in the schema (case-insensitive). The field must be
-+    /// nullable because existing data files do not contain this column and will read NULL for it.
-+    /// These constraints are validated during [`build()`](AlterTableTransactionBuilder::build).
-+    pub fn add_column(mut self, field: StructField) -> AlterTableTransactionBuilder<Modifying> {
-+        self.operations.push(SchemaOperation::AddColumn { field });
-+        self.transition()
-+    }
-+}
-+
-+impl AlterTableTransactionBuilder<Modifying> {
-+    /// Validate and apply schema operations, then build the [`AlterTableTransaction`].
-+    ///
-+    /// This method:
-+    /// 1. Validates the table supports writes
-+    /// 2. Applies each operation sequentially against the evolving schema
-+    /// 3. Constructs new Metadata action with evolved schema
-+    /// 4. Builds the evolved table configuration
-+    /// 5. Creates the transaction
-+    ///
-+    /// # Errors
-+    ///
-+    /// - Any individual operation fails validation (see per-method errors above)
-+    /// - Table does not support writes (unsupported features)
-+    /// - The evolved schema requires protocol features not enabled on the table (e.g. adding a
-+    ///   `timestampNtz` column without the `timestampNtz` feature)
-+    pub fn build(
-+        self,
-+        _engine: &dyn Engine,
-+        committer: Box<dyn Committer>,
-+    ) -> DeltaResult<AlterTableTransaction> {
-+        let table_config = self.snapshot.table_configuration();
-+        // Rejects writes to tables kernel can't safely commit to: writer version out of
-+        // kernel's supported range, unsupported writer features, or schemas with SQL-expression
-+        // invariants. Runs on the pre-alter snapshot; future ALTER variants that change the
-+        // protocol must also re-check this on the evolved `TableConfiguration`.
-+        table_config.ensure_operation_supported(Operation::Write)?;
-+
-+        let schema = Arc::unwrap_or_clone(table_config.logical_schema());
-+        let SchemaEvolutionResult {
-+            schema: evolved_schema,
-+        } = apply_schema_operations(schema, self.operations, table_config.column_mapping_mode())?;
-+
-+        let evolved_metadata = table_config
-+            .metadata()
-+            .clone()
-+            .with_schema(evolved_schema.clone())?;
-+
-+        // Validates the evolved metadata against the protocol.
-+        let evolved_table_config = TableConfiguration::try_new_with_schema(
-+            table_config,
-+            evolved_metadata,
-+            evolved_schema,
-+        )?;
-+
-+        AlterTableTransaction::try_new_alter_table(self.snapshot, evolved_table_config, committer)
-+    }
-+}
\ No newline at end of file
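
For readers new to the type-state pattern used in the builder above, here is a stripped-down sketch with hypothetical names (`Builder`, `add_op`; not the kernel types) showing how the zero-sized state marker plus a generic `transition::<T>()` gate which methods are callable:

use std::marker::PhantomData;

struct Ready;
struct Modifying;

struct Builder<S = Ready> {
    ops: Vec<String>,
    _state: PhantomData<S>,
}

impl<S> Builder<S> {
    // Moves the real fields; only the zero-sized state marker changes.
    // The target state `T` is inferred from the caller's declared return type.
    fn transition<T>(self) -> Builder<T> {
        Builder { ops: self.ops, _state: PhantomData }
    }

    fn add_op(mut self, op: &str) -> Builder<Modifying> {
        self.ops.push(op.to_string());
        self.transition()
    }
}

impl Builder<Ready> {
    fn new() -> Self {
        Builder { ops: Vec::new(), _state: PhantomData }
    }
}

impl Builder<Modifying> {
    // Only reachable after at least one add_op() call.
    fn build(self) -> Vec<String> {
        self.ops
    }
}

fn main() {
    let ops = Builder::new().add_op("add_column").build();
    assert_eq!(ops, vec!["add_column".to_string()]);
    // let _ = Builder::new().build(); // compile error: no `build` on Builder<Ready>
}
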
kernel/src/transaction/builder/create_table.rs
@@ -1,27 +0,0 @@
-diff --git a/kernel/src/transaction/builder/create_table.rs b/kernel/src/transaction/builder/create_table.rs
---- a/kernel/src/transaction/builder/create_table.rs
-+++ b/kernel/src/transaction/builder/create_table.rs
- use crate::clustering::{create_clustering_domain_metadata, validate_clustering_columns};
- use crate::committer::Committer;
- use crate::expressions::ColumnName;
--use crate::schema::validation::validate_schema_for_create;
-+use crate::schema::validation::validate_schema;
- use crate::schema::variant_utils::schema_contains_variant_type;
- use crate::schema::{
-     normalize_column_names_to_schema_casing, schema_contains_non_null_fields, DataType, SchemaRef,
- /// compatible with Spark readers/writers.
- ///
- /// Explicit `delta.invariants` metadata annotations are rejected by
--/// `validate_schema_for_create`, so this only flips on the feature for nullability-driven
-+/// `validate_schema`, so this only flips on the feature for nullability-driven
- /// invariants. Kernel does not itself enforce the null mask at write time -- it relies on
- /// the engine's `ParquetHandler` to do so. Kernel's default `ParquetHandler` uses
- /// `arrow-rs`, whose `RecordBatch::try_new` rejects null values in fields marked
-             maybe_apply_column_mapping_for_table_create(&self.schema, &mut validated)?;
- 
-         // Validate schema (non-empty, column names, duplicates, no `delta.invariants` metadata)
--        validate_schema_for_create(&effective_schema, column_mapping_mode)?;
-+        validate_schema(&effective_schema, column_mapping_mode)?;
- 
-         // Validate data layout and resolve column names (physical for clustering, logical
-         // for partitioning). Adds required table features for clustering.
\ No newline at end of file
kernel/src/transaction/builder/mod.rs
@@ -1,8 +0,0 @@
-diff --git a/kernel/src/transaction/builder/mod.rs b/kernel/src/transaction/builder/mod.rs
---- a/kernel/src/transaction/builder/mod.rs
-+++ b/kernel/src/transaction/builder/mod.rs
- // and for tests. Also allow dead_code since these are used by integration tests.
- #![allow(unreachable_pub, dead_code)]
- 
-+pub mod alter_table;
- pub mod create_table;
\ No newline at end of file
kernel/src/transaction/mod.rs
@@ -1,35 +0,0 @@
-diff --git a/kernel/src/transaction/mod.rs b/kernel/src/transaction/mod.rs
---- a/kernel/src/transaction/mod.rs
-+++ b/kernel/src/transaction/mod.rs
- #[cfg(not(feature = "internal-api"))]
- pub(crate) mod data_layout;
- 
-+pub(crate) mod alter_table;
-+pub use alter_table::AlterTableTransaction;
- mod commit_info;
- mod domain_metadata;
-+pub(crate) mod schema_evolution;
- mod stats_verifier;
- mod update;
- mod write_context;
- #[derive(Debug)]
- pub struct CreateTable;
- 
-+/// Marker type for alter-table (schema evolution) transactions.
-+///
-+/// Transactions in this state perform metadata-only commits. Data file operations are not
-+/// available at compile time because `AlterTable` does not implement [`SupportsDataFiles`].
-+#[derive(Debug)]
-+pub struct AlterTable;
-+
- /// Marker trait for transaction states that support data file operations.
- ///
- /// Only transaction types that implement this trait can access methods for adding, removing, or
- 
-     // Note: Additional test coverage for partial file matching (where some files in a scan
-     // have DV updates but others don't) is provided by the end-to-end integration test
--    // kernel/tests/dv.rs and kernel/tests/write.rs, which exercises
-+    // kernel/tests/dv.rs and kernel/tests/write_remove_dv.rs, which exercise
-     // the full deletion vector write workflow including the DvMatchVisitor logic.
- 
-     #[test]
\ No newline at end of file
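
A minimal illustration of how a marker trait like `SupportsDataFiles` keeps data-file methods off the `AlterTable` state at compile time. This is a simplified sketch with hypothetical field and method names, not the real `Transaction` internals:

use std::marker::PhantomData;

struct WriteTable;   // stand-in for a data-writing transaction state
struct AlterTable;   // metadata-only state, as in the diff above

trait SupportsDataFiles {}
impl SupportsDataFiles for WriteTable {}
// Deliberately no impl for AlterTable.

struct Transaction<S> {
    files: Vec<String>,
    _state: PhantomData<S>,
}

impl<S: SupportsDataFiles> Transaction<S> {
    // Only callable when the state type implements the marker trait.
    fn add_file(&mut self, path: &str) {
        self.files.push(path.to_string());
    }
}

fn main() {
    let mut write_txn: Transaction<WriteTable> =
        Transaction { files: vec![], _state: PhantomData };
    write_txn.add_file("part-0000.parquet");

    let _alter_txn: Transaction<AlterTable> =
        Transaction { files: vec![], _state: PhantomData };
    // _alter_txn.add_file("x"); // compile error: AlterTable does not implement SupportsDataFiles
}
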
kernel/src/transaction/schema_evolution.rs
@@ -1,190 +0,0 @@
-diff --git a/kernel/src/transaction/schema_evolution.rs b/kernel/src/transaction/schema_evolution.rs
-new file mode 100644
---- /dev/null
-+++ b/kernel/src/transaction/schema_evolution.rs
-+//! Schema evolution operations for ALTER TABLE.
-+//!
-+//! This module defines the [`SchemaOperation`] enum and the [`apply_schema_operations`] function
-+//! that validates and applies schema changes to produce an evolved schema.
-+
-+use indexmap::IndexMap;
-+
-+use crate::error::Error;
-+use crate::schema::validation::validate_schema;
-+use crate::schema::{SchemaRef, StructField, StructType};
-+use crate::table_features::ColumnMappingMode;
-+use crate::DeltaResult;
-+
-+/// A schema evolution operation to be applied during ALTER TABLE.
-+///
-+/// Operations are validated and applied in order during
-+/// [`apply_schema_operations`]. Each operation sees the schema state after all prior operations
-+/// have been applied.
-+#[derive(Debug, Clone)]
-+pub(crate) enum SchemaOperation {
-+    /// Add a top-level column.
-+    AddColumn { field: StructField },
-+}
-+
-+/// The result of applying schema operations.
-+#[derive(Debug)]
-+pub(crate) struct SchemaEvolutionResult {
-+    /// The evolved schema after all operations are applied.
-+    pub schema: SchemaRef,
-+}
-+
-+/// Applies a sequence of schema operations to the given schema, returning the evolved schema.
-+///
-+/// Operations are applied sequentially: each one validates against and modifies the schema
-+/// produced by all preceding operations, not the original input schema.
-+///
-+/// # Errors
-+///
-+/// Returns an error if any operation fails validation. The error message identifies which
-+/// operation failed and why.
-+pub(crate) fn apply_schema_operations(
-+    schema: StructType,
-+    operations: Vec<SchemaOperation>,
-+    column_mapping_mode: ColumnMappingMode,
-+) -> DeltaResult<SchemaEvolutionResult> {
-+    let cm_enabled = column_mapping_mode != ColumnMappingMode::None;
-+    // IndexMap preserves field insertion order. Keys are lowercased for case-insensitive
-+    // duplicate detection; StructFields retain their original casing.
-+    let mut fields: IndexMap<String, StructField> = schema
-+        .into_fields()
-+        .map(|f| (f.name().to_lowercase(), f))
-+        .collect();
-+
-+    for op in operations {
-+        match op {
-+            // Protocol feature checks for the field's data type (e.g. `timestampNtz`) happen
-+            // later when the caller builds a new TableConfiguration from the evolved schema --
-+            // the alter is rejected if the table doesn't already have the required feature
-+            // enabled. This matches Spark, which also rejects with
-+            // `DELTA_FEATURES_REQUIRE_MANUAL_ENABLEMENT` and requires the user to enable the
-+            // feature explicitly before adding such a column.
-+            SchemaOperation::AddColumn { field } => {
-+                // TODO: support column mapping for add_column (assign ID + physical name,
-+                // update delta.columnMapping.maxColumnId).
-+                if cm_enabled {
-+                    return Err(Error::unsupported(
-+                        "ALTER TABLE add_column is not yet supported on tables with \
-+                         column mapping enabled",
-+                    ));
-+                }
-+                if field.is_metadata_column() {
-+                    return Err(Error::schema(format!(
-+                        "Cannot add column '{}': metadata columns are not allowed in \
-+                         a table schema",
-+                        field.name()
-+                    )));
-+                }
-+                let key = field.name().to_lowercase();
-+                if fields.contains_key(&key) {
-+                    return Err(Error::schema(format!(
-+                        "Cannot add column '{}': a column with that name already exists",
-+                        field.name()
-+                    )));
-+                }
-+                // Validate field is nullable (Delta protocol requires added columns to be
-+                // nullable so existing data files can return NULL for the new column)
-+                // NOTE: non-nullable columns depend on invariants feature
-+                if !field.is_nullable() {
-+                    return Err(Error::schema(format!(
-+                        "Cannot add non-nullable column '{}'. Added columns must be nullable \
-+                         because existing data files do not contain this column.",
-+                        field.name()
-+                    )));
-+                }
-+                fields.insert(key, field);
-+            }
-+        }
-+    }
-+
-+    let evolved_schema = StructType::try_new(fields.into_values())?;
-+
-+    validate_schema(&evolved_schema, column_mapping_mode)?;
-+    Ok(SchemaEvolutionResult {
-+        schema: evolved_schema.into(),
-+    })
-+}
-+
-+#[cfg(test)]
-+mod tests {
-+    use rstest::rstest;
-+
-+    use super::*;
-+    use crate::schema::{DataType, MetadataColumnSpec, StructField, StructType};
-+
-+    fn simple_schema() -> StructType {
-+        StructType::try_new(vec![
-+            StructField::not_null("id", DataType::INTEGER),
-+            StructField::nullable("name", DataType::STRING),
-+        ])
-+        .unwrap()
-+    }
-+
-+    fn add_col(name: &str, nullable: bool) -> SchemaOperation {
-+        let field = if nullable {
-+            StructField::nullable(name, DataType::STRING)
-+        } else {
-+            StructField::not_null(name, DataType::STRING)
-+        };
-+        SchemaOperation::AddColumn { field }
-+    }
-+
-+    // Builds a struct column whose nested leaf field has the given name. Used to prove that
-+    // `validate_schema` (not just the top-level dup check or `StructType::try_new`) is
-+    // reached from `apply_schema_operations`.
-+    fn add_struct_with_nested_leaf(name: &str, leaf_name: &str) -> SchemaOperation {
-+        let inner =
-+            StructType::try_new(vec![StructField::nullable(leaf_name, DataType::STRING)]).unwrap();
-+        SchemaOperation::AddColumn {
-+            field: StructField::nullable(name, inner),
-+        }
-+    }
-+
-+    #[rstest]
-+    #[case::dup_exact(vec![add_col("name", true)], "already exists")]
-+    #[case::dup_case_insensitive(vec![add_col("Name", true)], "already exists")]
-+    #[case::dup_within_batch(
-+        vec![add_col("email", true), add_col("email", true)],
-+        "already exists"
-+    )]
-+    #[case::non_nullable(vec![add_col("age", false)], "non-nullable")]
-+    #[case::invalid_parquet_char(vec![add_col("foo,bar", true)], "invalid character")]
-+    #[case::nested_invalid_parquet_char(
-+        vec![add_struct_with_nested_leaf("addr", "bad,leaf")],
-+        "invalid character"
-+    )]
-+    #[case::metadata_column(
-+        vec![SchemaOperation::AddColumn {
-+            field: StructField::create_metadata_column("row_idx", MetadataColumnSpec::RowIndex),
-+        }],
-+        "metadata columns are not allowed"
-+    )]
-+    fn apply_schema_operations_rejects(
-+        #[case] ops: Vec<SchemaOperation>,
-+        #[case] error_contains: &str,
-+    ) {
-+        let err =
-+            apply_schema_operations(simple_schema(), ops, ColumnMappingMode::None).unwrap_err();
-+        assert!(err.to_string().contains(error_contains));
-+    }
-+
-+    #[rstest]
-+    #[case::single(vec![add_col("email", true)], &["id", "name", "email"])]
-+    #[case::multiple(
-+        vec![add_col("email", true), add_col("age", true)],
-+        &["id", "name", "email", "age"]
-+    )]
-+    fn apply_schema_operations_succeeds(
-+        #[case] ops: Vec<SchemaOperation>,
-+        #[case] expected_names: &[&str],
-+    ) {
-+        let result =
-+            apply_schema_operations(simple_schema(), ops, ColumnMappingMode::None).unwrap();
-+        let actual: Vec<&str> = result.schema.fields().map(|f| f.name().as_str()).collect();
-+        assert_eq!(&actual, expected_names);
-+    }
-+}
\ No newline at end of file
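
A self-contained sketch (illustrative only, not kernel code) of the `IndexMap` trick used in `apply_schema_operations`: keys are lowercased for case-insensitive duplicate detection, while the values keep the original casing and the map preserves insertion order:

// Requires the `indexmap` crate, as in the module above.
use indexmap::IndexMap;

fn main() {
    // Existing columns, keyed by lowercased name but storing the original casing.
    let mut fields: IndexMap<String, String> = vec!["id", "Name"]
        .into_iter()
        .map(|f| (f.to_lowercase(), f.to_string()))
        .collect();

    // "name" collides with the existing "Name" because keys are compared lowercased.
    assert!(fields.contains_key(&"name".to_lowercase()));

    // "email" is new; insertion order is preserved: id, Name, email.
    fields.insert("email".to_lowercase(), "email".to_string());
    let names: Vec<String> = fields.values().cloned().collect();
    assert_eq!(names, vec!["id".to_string(), "Name".to_string(), "email".to_string()]);
}
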
kernel/tests/README.md
@@ -1,31 +0,0 @@
-diff --git a/kernel/tests/README.md b/kernel/tests/README.md
---- a/kernel/tests/README.md
-+++ b/kernel/tests/README.md
- 
- | Table | Location | Schema | Protocol (R/W) | Features | Description | Tests |
- |-------|----------|--------|----------|----------|-------------|-------|
--| `table-with-dv-small` | data/ | `value: int` | v3/v7 | r:`deletionVectors` w:`deletionVectors` | 10 rows, 2 soft-deleted by DV, 8 visible. Most heavily referenced test table. | `dv.rs::test_table_scan(with_dv)`, `write.rs::test_remove_files_adds_expected_entries`, `write.rs::test_update_deletion_vectors_adds_expected_entries`, `read.rs::with_predicate_and_removes`, `path.rs::test_to_uri/test_child/test_child_escapes`, `snapshot.rs::test_snapshot_read_metadata/test_new_snapshot/test_snapshot_new_from/test_read_table_with_missing_last_checkpoint/test_log_compaction_writer`, `deletion_vector.rs` tests, `transaction/mod.rs::setup_dv_enabled_table/test_add_files_schema/test_new_deletion_vector_path`, `default/parquet.rs` read test, `default/json.rs` read test, `log_compaction/tests.rs::create_mock_snapshot`, `resolve_dvs.rs` tests |
-+| `table-with-dv-small` | data/ | `value: int` | v3/v7 | r:`deletionVectors` w:`deletionVectors` | 10 rows, 2 soft-deleted by DV, 8 visible. Most heavily referenced test table. | `dv.rs::test_table_scan(with_dv)`, `write_remove_dv.rs::test_remove_files_adds_expected_entries`, `write_remove_dv.rs::test_update_deletion_vectors_adds_expected_entries`, `read.rs::with_predicate_and_removes`, `path.rs::test_to_uri/test_child/test_child_escapes`, `snapshot.rs::test_snapshot_read_metadata/test_new_snapshot/test_snapshot_new_from/test_read_table_with_missing_last_checkpoint/test_log_compaction_writer`, `deletion_vector.rs` tests, `transaction/mod.rs::setup_dv_enabled_table/test_add_files_schema/test_new_deletion_vector_path`, `default/parquet.rs` read test, `default/json.rs` read test, `log_compaction/tests.rs::create_mock_snapshot`, `resolve_dvs.rs` tests |
- | `table-without-dv-small` | data/ | `value: long` | v1/v2 | | 10 rows, all visible. Companion to table-with-dv-small. | `dv.rs::test_table_scan(without_dv)`, `transaction/mod.rs::setup_non_dv_table/create_existing_table_txn/test_commit_io_error_returns_retryable_transaction`, `sequential_phase.rs::test_sequential_v2_with_commits_only/test_sequential_finish_before_exhaustion_error`, `parallel_phase.rs` tests, `scan/tests.rs::test_scan_metadata_paths/test_scan_metadata/test_scan_metadata_from_same_version` |
- | `with-short-dv` | data/ | `id: long, value: string, timestamp: timestamp, rand: double` | v3/v7 | r:`deletionVectors` w:`deletionVectors` | 2 files x 5 rows. First file has inline DV (`storageType="u"`) deleting 3 rows. | `read.rs::short_dv` |
- | `dv-partitioned-with-checkpoint` | golden_data/ | `value: int, part: int` partitioned by `part` | v3/v7 | r:`deletionVectors` w:`deletionVectors` | DVs on a partitioned table with a checkpoint | `golden_tables.rs::golden_test!` |
- 
- | Table | Location | Schema | Protocol (R/W) | Features | Description | Tests |
- |-------|----------|--------|----------|----------|-------------|-------|
--| `partition_cm/none` | data/ | `value: int, category: string` partitioned by `category` | v1/v1 | `columnMapping.mode=none` | Partitioned write with CM disabled | `write.rs::test_column_mapping_partitioned_write(cm_none)` |
--| `partition_cm/id` | data/ | `value: int, category: string` partitioned by `category` | v3/v7 | r:`columnMapping` w:`columnMapping`, `columnMapping.mode=id` | Partitioned write with CM id mode | `write.rs::test_column_mapping_partitioned_write(cm_id)` |
--| `partition_cm/name` | data/ | `value: int, category: string` partitioned by `category` | v3/v7 | r:`columnMapping` w:`columnMapping`, `columnMapping.mode=name` | Partitioned write with CM name mode | `write.rs::test_column_mapping_partitioned_write(cm_name)` |
-+| `partition_cm/none` | data/ | `value: int, category: string` partitioned by `category` | v1/v1 | `columnMapping.mode=none` | Partitioned write with CM disabled | `write_column_mapping.rs::test_column_mapping_partitioned_write(cm_none)` |
-+| `partition_cm/id` | data/ | `value: int, category: string` partitioned by `category` | v3/v7 | r:`columnMapping` w:`columnMapping`, `columnMapping.mode=id` | Partitioned write with CM id mode | `write_column_mapping.rs::test_column_mapping_partitioned_write(cm_id)` |
-+| `partition_cm/name` | data/ | `value: int, category: string` partitioned by `category` | v3/v7 | r:`columnMapping` w:`columnMapping`, `columnMapping.mode=name` | Partitioned write with CM name mode | `write_column_mapping.rs::test_column_mapping_partitioned_write(cm_name)` |
- | `table-with-columnmapping-mode-name` | golden_data/ | `ByteType: byte, ShortType: short, IntegerType: int, LongType: long, FloatType: float, DoubleType: double, decimal: decimal(10,2), BooleanType: boolean, StringType: string, BinaryType: binary, DateType: date, TimestampType: timestamp, nested_struct: struct{aa: string, ac: struct{aca: int}}, array_of_prims: array<int>, array_of_arrays: array<array<int>>, array_of_structs: array<struct{ab: long}>, map_of_prims: map<int,long>, map_of_rows: map<int,struct{ab: long}>, map_of_arrays: map<long,array<int>>` | v2/v5 | `columnMapping.mode=name` | Column mapping name mode | `golden_tables.rs::golden_test!` |
- | `table-with-columnmapping-mode-id` | golden_data/ | `ByteType: byte, ShortType: short, IntegerType: int, LongType: long, FloatType: float, DoubleType: double, decimal: decimal(10,2), BooleanType: boolean, StringType: string, BinaryType: binary, DateType: date, TimestampType: timestamp, nested_struct: struct{aa: string, ac: struct{aca: int}}, array_of_prims: array<int>, array_of_arrays: array<array<int>>, array_of_structs: array<struct{ab: long}>, map_of_prims: map<int,long>, map_of_rows: map<int,struct{ab: long}>, map_of_arrays: map<long,array<int>>` | v2/v5 | `columnMapping.mode=id` | Column mapping id mode | `golden_tables.rs::golden_test!` |
- 
- | Table | Location | Schema | Protocol (R/W) | Features | Description | Tests |
- |-------|----------|--------|----------|----------|-------------|-------|
- | `with_checkpoint_no_last_checkpoint` | data/ | `letter: string, int: long, date: date` | v1/v2 | `checkpointInterval=2` | Checkpoint at v2 but missing `_last_checkpoint` hint file | `snapshot.rs::test_read_table_with_checkpoint`, `scan/tests.rs::test_scan_with_checkpoint`, `sequential_phase.rs::test_sequential_checkpoint_no_commits`, `checkpoint_manifest.rs` tests, `sync/parquet.rs` test, `default/parquet.rs` test |
--| `external-table-different-nullability` | data/ | `i: int` | v1/v2 | `checkpointInterval=2` | Parquet files have different nullability than Delta schema; includes checkpoint | `write.rs::test_checkpoint_non_kernel_written_table` |
-+| `external-table-different-nullability` | data/ | `i: int` | v1/v2 | `checkpointInterval=2` | Parquet files have different nullability than Delta schema; includes checkpoint | `write_clustered.rs::test_checkpoint_non_kernel_written_table` |
- | `checkpoint` | golden_data/ | `intCol: int` | v1/v2 | | Basic checkpoint read | `golden_tables.rs::golden_test!(checkpoint_test)` |
- | `corrupted-last-checkpoint-kernel` | golden_data/ | `id: long` | v1/v2 | | Corrupted `_last_checkpoint` file | `golden_tables.rs::golden_test!` |
- | `multi-part-checkpoint` | golden_data/ | `id: long` | v1/v2 | `checkpointInterval=1` | Multi-part checkpoint files | `golden_tables.rs::golden_test!` |
\ No newline at end of file

... (truncated, output exceeded 60000 bytes)

Reproduce locally: git range-diff ac9dc19..7546790 6486bd2..6245673 | Disable: git config gitstack.push-range-diff false

@lorenarosati
Collaborator Author

Range-diff: main (6245673 -> 12c1bf6)
kernel/src/expressions/scalars.rs
@@ -4,8 +4,7 @@
                      _ => unreachable!(),
                  }
              }
-+            // Geometry/Geography are not valid partition column types, so there is no
-+            // partition-value string format to parse here
++            // Kernel does not support parsing text into Geometry/Geography types
 +            Geometry(_) | Geography(_) => Err(Error::Unsupported(format!(
 +                "parse_scalar is not supported for {self:?}"
 +            ))),
kernel/src/table_configuration.rs
@@ -9,38 +9,6 @@
      validate_timestamp_ntz_feature_support, ColumnMappingMode, EnablementCheck, FeatureRequirement,
      FeatureType, KernelSupport, Operation, TableFeature, LEGACY_READER_FEATURES,
      LEGACY_WRITER_FEATURES, MAX_VALID_READER_VERSION, MAX_VALID_WRITER_VERSION,
-         version: Version,
-     ) -> DeltaResult<Self> {
-         let logical_schema = Arc::new(metadata.parse_schema()?);
-+        Self::try_new_inner(metadata, protocol, table_root, version, logical_schema)
-+    }
-+
-+    /// Like [`try_new`](Self::try_new), but reuses `base`'s protocol, table root, and version
-+    /// and takes a pre-parsed `logical_schema`.
-+    pub(crate) fn try_new_with_schema(
-+        base: &Self,
-+        metadata: Metadata,
-+        logical_schema: SchemaRef,
-+    ) -> DeltaResult<Self> {
-+        Self::try_new_inner(
-+            metadata,
-+            base.protocol.clone(),
-+            base.table_root.clone(),
-+            base.version,
-+            logical_schema,
-+        )
-+    }
-+
-+    fn try_new_inner(
-+        metadata: Metadata,
-+        protocol: Protocol,
-+        table_root: Url,
-+        version: Version,
-+        logical_schema: SchemaRef,
-+    ) -> DeltaResult<Self> {
-         let table_properties = metadata.parse_table_properties();
-         let column_mapping_mode = column_mapping_mode(&protocol, &table_properties);
- 
  
          // Validate schema against protocol features now that we have a TC instance.
          validate_timestamp_ntz_feature_support(&table_config)?;
kernel/src/actions/mod.rs
@@ -1,32 +0,0 @@
-diff --git a/kernel/src/actions/mod.rs b/kernel/src/actions/mod.rs
---- a/kernel/src/actions/mod.rs
-+++ b/kernel/src/actions/mod.rs
- }
- 
- // Serde derives are needed for CRC file deserialization (see `crc::reader`).
-+//
-+// TODO(#2446): `Metadata` stores the schema only as a JSON string. Callers that already hold
-+// a parsed `SchemaRef` (e.g. CREATE TABLE) serialize into `schema_string` and then re-parse
-+// downstream in `TableConfiguration::try_new` via `parse_schema()`. Caching the parsed schema
-+// on `Metadata` would eliminate the round-trip.
- #[derive(Debug, Default, Clone, PartialEq, Eq, Serialize, Deserialize, ToSchema)]
- #[serde(rename_all = "camelCase")]
- #[internal_api]
-         TableProperties::from(self.configuration.iter())
-     }
- 
-+    /// Returns a new Metadata with the schema replaced, preserving all other fields.
-+    ///
-+    /// # Errors
-+    ///
-+    /// Returns an error if schema serialization fails.
-+    pub(crate) fn with_schema(self, schema: SchemaRef) -> DeltaResult<Self> {
-+        Ok(Self {
-+            schema_string: serde_json::to_string(&schema)?,
-+            ..self
-+        })
-+    }
-+
-     #[cfg(test)]
-     #[allow(clippy::too_many_arguments)]
-     pub(crate) fn new_unchecked(
\ No newline at end of file
kernel/src/engine/arrow_expression/evaluate_expression.rs
@@ -1,154 +0,0 @@
-diff --git a/kernel/src/engine/arrow_expression/evaluate_expression.rs b/kernel/src/engine/arrow_expression/evaluate_expression.rs
---- a/kernel/src/engine/arrow_expression/evaluate_expression.rs
-+++ b/kernel/src/engine/arrow_expression/evaluate_expression.rs
-         (Literal(scalar), _) => {
-             validate_array_type(scalar.to_array(batch.num_rows())?, result_type)
-         }
--        (Column(name), _) => {
--            // Column extraction uses ordinal-based struct validation because column mapping
--            // can cause physical/logical name mismatches. apply_schema handles renaming.
--            let arr = extract_column(batch, name)?;
--            if let Some(expected) = result_type {
--                ensure_data_types(expected, arr.data_type(), ValidationMode::TypesOnly)?;
--            }
--            Ok(arr)
--        }
-+        (Column(name), _) => validate_array_type(extract_column(batch, name)?, result_type),
-         (Struct(fields, nullability), Some(DataType::Struct(output_schema))) => {
-             evaluate_struct_expression(fields, batch, output_schema, nullability.as_ref())
-         }
-     }
- 
-     #[test]
--    fn column_extract_struct_with_mismatched_field_names() {
-+    fn column_extract_struct_rejects_mismatched_field_names() {
-         let batch = make_struct_batch(
-             vec![
-                 ArrowField::new("col-abc-001", ArrowDataType::Int64, true),
-             ],
-         );
- 
--        // Logical names differ from physical names due to column mapping
-         let logical_type = DataType::try_struct_type([
-             StructField::nullable("my_column", DataType::LONG),
-             StructField::nullable("other_column", DataType::LONG),
- 
-         let expr = column_expr!("stats");
-         let result = evaluate_expression(&expr, &batch, Some(&logical_type));
--
--        // Ordinal-based validation passes: same field count and types by position.
--        // The downstream apply_schema transformation handles renaming.
--        let arr = result.expect("should succeed with mismatched names but matching types");
--        let struct_arr = arr.as_any().downcast_ref::<StructArray>().unwrap();
--        assert_eq!(struct_arr.num_columns(), 2);
--        assert_eq!(struct_arr.len(), 2);
--    }
--
--    #[test]
--    fn column_extract_struct_rejects_mismatched_field_count() {
--        let batch = make_struct_batch(
--            vec![ArrowField::new("col-abc-001", ArrowDataType::Int64, true)],
--            vec![Arc::new(Int64Array::from(vec![Some(1), Some(2)]))],
--        );
--
--        let logical_type = DataType::try_struct_type([
--            StructField::nullable("a", DataType::LONG),
--            StructField::nullable("b", DataType::LONG),
--        ])
--        .unwrap();
--
--        let expr = column_expr!("stats");
--        let result = evaluate_expression(&expr, &batch, Some(&logical_type));
--        assert_result_error_with_message(result, "Struct field count mismatch");
-+        assert_result_error_with_message(result, "Missing Struct fields");
-     }
- 
-     #[test]
-     fn column_extract_struct_rejects_mismatched_child_types() {
-         let batch = make_struct_batch(
-             vec![
--                ArrowField::new("col-abc-001", ArrowDataType::Int64, true),
--                ArrowField::new("col-abc-002", ArrowDataType::Utf8, true),
-+                ArrowField::new("a", ArrowDataType::Int64, true),
-+                ArrowField::new("b", ArrowDataType::Utf8, true),
-             ],
-             vec![
-                 Arc::new(Int64Array::from(vec![Some(1)])),
-             ],
-         );
- 
--        // Expect two LONG columns, but the second arrow field is Utf8
-         let logical_type = DataType::try_struct_type([
-             StructField::nullable("a", DataType::LONG),
-             StructField::nullable("b", DataType::LONG),
-     }
- 
-     #[test]
--    fn column_extract_struct_with_matching_names_still_works() {
-+    fn column_extract_struct_with_matching_names_works() {
-         let batch = make_struct_batch(
-             vec![
-                 ArrowField::new("a", ArrowDataType::Int64, true),
-         assert!(result.is_ok());
-     }
- 
--    /// Exercises the exact code path from `get_add_transform_expr` where a `struct_from`
--    /// expression wraps `column_expr!("add.stats_parsed")`. When the checkpoint parquet has
--    /// stats_parsed with physical column names (e.g. `col-abc-001`) but the output schema
--    /// uses logical names (e.g. `id`), `evaluate_struct_expression` calls
--    /// `evaluate_expression(Column, struct_result_type)` with mismatched field names.
--    /// Without ordinal-based validation this fails with a name mismatch error.
-+    /// When a `struct_from` expression wraps a `Column` referencing stats_parsed, and the
-+    /// checkpoint parquet has physical column names (e.g. `col-abc-001`) but the output schema
-+    /// uses logical names (e.g. `id`), name-based validation correctly rejects the mismatch.
-     #[test]
--    fn struct_from_with_column_tolerates_nested_name_mismatch() {
--        // Build a batch mimicking checkpoint data: add.stats_parsed uses physical names
-+    fn struct_from_with_column_rejects_nested_name_mismatch() {
-         let stats_fields: Vec<ArrowField> = vec![
-             ArrowField::new("col-abc-001", ArrowDataType::Int64, true),
-             ArrowField::new("col-abc-002", ArrowDataType::Int64, true),
-         )]);
-         let batch = RecordBatch::try_new(Arc::new(schema), vec![Arc::new(add_struct)]).unwrap();
- 
--        // struct_from mimicking get_add_transform_expr: wraps a Column referencing stats_parsed
-         let expr = Expr::struct_from([
-             column_expr_ref!("add.path"),
-             column_expr_ref!("add.stats_parsed"),
-         .unwrap();
- 
-         let result = evaluate_expression(&expr, &batch, Some(&output_type));
--        result.expect("struct_from with Column sub-expression should tolerate field name mismatch");
--    }
--
--    #[test]
--    fn column_extract_nested_struct_with_mismatched_names() {
--        let inner_fields = vec![ArrowField::new("phys-inner", ArrowDataType::Int64, true)];
--        let inner_struct = ArrowDataType::Struct(inner_fields.clone().into());
--        let batch = make_struct_batch(
--            vec![ArrowField::new("phys-outer", inner_struct, true)],
--            vec![Arc::new(
--                StructArray::try_new(
--                    inner_fields.into(),
--                    vec![Arc::new(Int64Array::from(vec![Some(42)]))],
--                    None,
--                )
--                .unwrap(),
--            )],
--        );
--
--        let logical_type = DataType::try_struct_type([StructField::nullable(
--            "logical_outer",
--            DataType::struct_type_unchecked([StructField::nullable(
--                "logical_inner",
--                DataType::LONG,
--            )]),
--        )])
--        .unwrap();
--
--        let expr = column_expr!("stats");
--        let result = evaluate_expression(&expr, &batch, Some(&logical_type));
--        assert!(result.is_ok());
-+        assert_result_error_with_message(result, "Missing Struct fields");
-     }
- }
\ No newline at end of file
kernel/src/engine/ensure_data_types.rs
@@ -1,13 +0,0 @@
-diff --git a/kernel/src/engine/ensure_data_types.rs b/kernel/src/engine/ensure_data_types.rs
---- a/kernel/src/engine/ensure_data_types.rs
-+++ b/kernel/src/engine/ensure_data_types.rs
- #[internal_api]
- pub(crate) enum ValidationMode {
-     /// Check types only. Struct fields are matched by ordinal position, not by name.
--    /// Nullability and metadata are not checked. Used by the expression evaluator where
--    /// column mapping can cause physical/logical name mismatches.
-+    /// Nullability and metadata are not checked.
-+    #[allow(dead_code)]
-     TypesOnly,
-     /// Check types and match struct fields by name, but skip nullability and metadata.
-     /// Used by the parquet reader where fields are already resolved by name upstream.
\ No newline at end of file
kernel/src/schema/validation.rs
@@ -1,48 +0,0 @@
-diff --git a/kernel/src/schema/validation.rs b/kernel/src/schema/validation.rs
---- a/kernel/src/schema/validation.rs
-+++ b/kernel/src/schema/validation.rs
--//! Schema validation utilities for Delta table creation.
-+//! Schema validation utilities shared by table creation and schema evolution.
- //!
- //! Validates schemas per the Delta protocol specification.
- 
- /// These characters have special meaning in Parquet schema syntax.
- const INVALID_PARQUET_CHARS: &[char] = &[' ', ',', ';', '{', '}', '(', ')', '\n', '\t', '='];
- 
--/// Validates a schema for table creation.
-+/// Validates a schema for CREATE TABLE or ALTER TABLE.
- ///
- /// Performs the following checks:
- /// 1. Schema is non-empty
- /// 3. Column names contain only valid characters
- /// 4. Rejects fields with `delta.invariants` metadata (SQL expression invariants are not supported
- ///    by kernel; see `TableConfiguration::ensure_write_supported`)
--pub(crate) fn validate_schema_for_create(
-+pub(crate) fn validate_schema(
-     schema: &StructType,
-     column_mapping_mode: ColumnMappingMode,
- ) -> DeltaResult<()> {
-     #[case::dot_in_name_with_cm(schema_with_dot(), ColumnMappingMode::Name)]
-     #[case::different_struct_children(schema_different_struct_children(), ColumnMappingMode::None)]
-     fn valid_schema_accepted(#[case] schema: StructType, #[case] cm: ColumnMappingMode) {
--        assert!(validate_schema_for_create(&schema, cm).is_ok());
-+        assert!(validate_schema(&schema, cm).is_ok());
-     }
- 
-     // === Invalid schemas ===
-         #[case] cm: ColumnMappingMode,
-         #[case] expected_errs: &[&str],
-     ) {
--        let result = validate_schema_for_create(&schema, cm);
-+        let result = validate_schema(&schema, cm);
-         assert!(result.is_err());
-         let err = result.unwrap_err().to_string();
-         for expected in expected_errs {
-     #[case::array_nested(schema_array_nested_invariant(), "arr.child")]
-     #[case::map_nested(schema_map_nested_invariant(), "map.child")]
-     fn invariants_metadata_rejected(#[case] schema: StructType, #[case] expected_path: &str) {
--        let result = validate_schema_for_create(&schema, ColumnMappingMode::None);
-+        let result = validate_schema(&schema, ColumnMappingMode::None);
-         let err = result.expect_err("expected delta.invariants metadata rejection");
-         let msg = err.to_string();
-         assert!(
\ No newline at end of file
kernel/src/snapshot/mod.rs
@@ -1,27 +0,0 @@
-diff --git a/kernel/src/snapshot/mod.rs b/kernel/src/snapshot/mod.rs
---- a/kernel/src/snapshot/mod.rs
-+++ b/kernel/src/snapshot/mod.rs
- use crate::table_configuration::{InCommitTimestampEnablement, TableConfiguration};
- use crate::table_features::{physical_to_logical_column_name, ColumnMappingMode, TableFeature};
- use crate::table_properties::TableProperties;
-+use crate::transaction::builder::alter_table::AlterTableTransactionBuilder;
- use crate::transaction::Transaction;
- use crate::utils::require;
- use crate::{DeltaResult, Engine, Error, LogCompactionWriter, Version};
-         Transaction::try_new_existing_table(self, committer, engine)
-     }
- 
-+    /// Creates a builder for altering this table's metadata. Currently supports schema change
-+    /// operations.
-+    ///
-+    /// The returned builder allows chaining operations before building an
-+    /// [`AlterTableTransaction`] that can be committed.
-+    ///
-+    /// [`AlterTableTransaction`]: crate::transaction::AlterTableTransaction
-+    pub fn alter_table(self: Arc<Self>) -> AlterTableTransactionBuilder {
-+        AlterTableTransactionBuilder::new(self)
-+    }
-+
-     /// Fetch the latest version of the provided `application_id` for this snapshot. Filters the
-     /// txn based on the delta.setTransactionRetentionDuration property and lastUpdated.
-     ///
\ No newline at end of file
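
Putting the pieces together, a hedged usage sketch of the new entry point. The `snapshot` (an `Arc<Snapshot>`), `engine`, and `committer` are assumed to exist already, and only calls shown in this diff are used; exact import paths and visibility may differ in the published crate:

// Not runnable on its own: requires an existing Delta table, engine, and committer.
let txn = snapshot
    .alter_table()
    .add_column(StructField::nullable("email", DataType::STRING))
    .build(engine, committer)?;
// `txn` is an AlterTableTransaction: a metadata-only commit carrying the evolved schema.
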
kernel/src/transaction/alter_table.rs
@@ -1,81 +0,0 @@
-diff --git a/kernel/src/transaction/alter_table.rs b/kernel/src/transaction/alter_table.rs
-new file mode 100644
---- /dev/null
-+++ b/kernel/src/transaction/alter_table.rs
-+//! Alter table transaction types and constructor.
-+//!
-+//! This module defines the [`AlterTableTransaction`] type alias and the
-+//! [`try_new_alter_table`](AlterTableTransaction::try_new_alter_table) constructor.
-+//! The builder logic lives in [`builder::alter_table`](super::builder::alter_table).
-+
-+#![allow(unreachable_pub)]
-+
-+use std::marker::PhantomData;
-+use std::sync::OnceLock;
-+
-+use crate::committer::Committer;
-+use crate::snapshot::SnapshotRef;
-+use crate::table_configuration::TableConfiguration;
-+use crate::transaction::{AlterTable, Transaction};
-+use crate::utils::current_time_ms;
-+use crate::DeltaResult;
-+
-+/// A type alias for alter-table transactions.
-+///
-+/// This provides a restricted API surface that only exposes operations valid during ALTER
-+/// commands. Data file operations are not available at compile time because `AlterTable`
-+/// does not implement [`SupportsDataFiles`](super::SupportsDataFiles).
-+pub type AlterTableTransaction = Transaction<AlterTable>;
-+
-+impl AlterTableTransaction {
-+    /// Create a new transaction for altering a table's schema. Produces a metadata-only commit
-+    /// that emits an updated Metadata action with the evolved schema.
-+    ///
-+    /// The `effective_table_config` is the evolved table configuration (new schema, same
-+    /// protocol). It must be fully validated before calling this constructor (e.g. schema
-+    /// operations applied, protocol feature checks passed). The `read_snapshot` provides the
-+    /// pre-commit table state (version, previous protocol/metadata, ICT timestamps) used for
-+    /// commit versioning and post-commit snapshots.
-+    ///
-+    /// This is typically called via `AlterTableTransactionBuilder::build()` rather than directly.
-+    pub(crate) fn try_new_alter_table(
-+        read_snapshot: SnapshotRef,
-+        effective_table_config: TableConfiguration,
-+        committer: Box<dyn Committer>,
-+    ) -> DeltaResult<Self> {
-+        let span = tracing::info_span!(
-+            "txn",
-+            path = %read_snapshot.table_root(),
-+            read_version = read_snapshot.version(),
-+            operation = "ALTER TABLE",
-+        );
-+
-+        Ok(Transaction {
-+            span,
-+            read_snapshot_opt: Some(read_snapshot),
-+            effective_table_config,
-+            should_emit_protocol: false,
-+            should_emit_metadata: true,
-+            committer,
-+            operation: Some("ALTER TABLE".to_string()),
-+            engine_info: None,
-+            add_files_metadata: vec![],
-+            remove_files_metadata: vec![],
-+            set_transactions: vec![],
-+            commit_timestamp: current_time_ms()?,
-+            user_domain_metadata_additions: vec![],
-+            system_domain_metadata_additions: vec![],
-+            user_domain_removals: vec![],
-+            data_change: false,
-+            shared_write_state: OnceLock::new(),
-+            engine_commit_info: None,
-+            // TODO(#2446): match delta-spark's per-op isBlindAppend policy
-+            // (ADD/DROP/DROP NOT NULL -> true, SET NOT NULL -> false). Hardcoded false for
-+            // now: safe, but misses the true-case optimization delta-spark applies.
-+            is_blind_append: false,
-+            dv_matched_files: vec![],
-+            physical_clustering_columns: None,
-+            _state: PhantomData,
-+        })
-+    }
-+}
\ No newline at end of file
kernel/src/transaction/builder/alter_table.rs
@@ -1,168 +0,0 @@
-diff --git a/kernel/src/transaction/builder/alter_table.rs b/kernel/src/transaction/builder/alter_table.rs
-new file mode 100644
---- /dev/null
-+++ b/kernel/src/transaction/builder/alter_table.rs
-+//! Builder for ALTER TABLE (schema evolution) transactions.
-+//!
-+//! This module contains [`AlterTableTransactionBuilder`], which uses a type-state pattern to
-+//! enforce valid operation chaining at compile time.
-+//!
-+//! # Type States
-+//!
-+//! - [`Ready`]: Initial state. Operations are available, but `build()` is not (at least one
-+//!   operation is required).
-+//! - [`Modifying`]: After any chainable schema operation. More ops can be chained, and `build()` is
-+//!   available. See [`AlterTableTransactionBuilder<Modifying>`] for ops.
-+//!
-+//! # Transitions
-+//!
-+//! Each `impl` block below is gated by a state bound and documents which operations that
-+//! state enables. Chainable schema operations live on `impl<S: Chainable>` and transition
-+//! the builder to a chainable state; `build()` lives on states that are buildable.
-+//!
-+//! ```ignore
-+//! // Allowed: at least one op queued before build().
-+//! snapshot.alter_table().add_column(field).build(engine, committer)?;
-+//!
-+//! // Not allowed: build() is not defined on Ready (no ops queued).
-+//! snapshot.alter_table().build(engine, committer)?;  // compile error
-+//! ```
-+
-+use std::marker::PhantomData;
-+use std::sync::Arc;
-+
-+use crate::committer::Committer;
-+use crate::schema::StructField;
-+use crate::snapshot::SnapshotRef;
-+use crate::table_configuration::TableConfiguration;
-+use crate::table_features::Operation;
-+use crate::transaction::alter_table::AlterTableTransaction;
-+use crate::transaction::schema_evolution::{
-+    apply_schema_operations, SchemaEvolutionResult, SchemaOperation,
-+};
-+use crate::{DeltaResult, Engine};
-+
-+/// Initial state: `build()` is not yet available (at least one operation is required).
-+/// See [`Chainable`] for the operations available on this state.
-+pub struct Ready;
-+
-+/// State after at least one operation has been added. `build()` is available.
-+/// See [`Chainable`] for the operations available on this state.
-+pub struct Modifying;
-+
-+/// Marker trait for builder states that accept chainable schema operations. Grouping states
-+/// under one bound lets each op (like `add_column`) live on a single `impl<S: Chainable>`
-+/// block -- chainable states share the body rather than duplicating it per state.
-+///
-+/// Sealed: external types cannot implement this, keeping the set of chainable states closed.
-+pub trait Chainable: sealed::Sealed {}
-+impl Chainable for Ready {}
-+impl Chainable for Modifying {}
-+
-+mod sealed {
-+    pub trait Sealed {}
-+    impl Sealed for super::Ready {}
-+    impl Sealed for super::Modifying {}
-+}
-+
-+/// Builder for constructing an [`AlterTableTransaction`] with schema evolution operations.
-+///
-+/// Uses a type-state pattern (`S`) to enforce at compile time:
-+/// - At least one schema operation must be queued before `build()` is callable.
-+/// - Only operations valid for the current state can be chained, which disallows incompatible
-+///   chaining.
-+pub struct AlterTableTransactionBuilder<S = Ready> {
-+    snapshot: SnapshotRef,
-+    operations: Vec<SchemaOperation>,
-+    // PhantomData marker for builder state (Ready or Modifying).
-+    // Zero-sized; only affects which methods are available at compile time.
-+    _state: PhantomData<S>,
-+}
-+
-+impl<S> AlterTableTransactionBuilder<S> {
-+    // Reconstructs the builder with a different PhantomData marker, changing which methods
-+    // are available at compile time (e.g. Ready -> Modifying enables `build()`). All real
-+    // fields are moved as-is; only the zero-sized type state changes.
-+    //
-+    // `T` (distinct from the struct's `S`) lets the caller pick the target state:
-+    // `self.transition::<Modifying>()` returns `AlterTableTransactionBuilder<Modifying>`.
-+    fn transition<T>(self) -> AlterTableTransactionBuilder<T> {
-+        AlterTableTransactionBuilder {
-+            snapshot: self.snapshot,
-+            operations: self.operations,
-+            _state: PhantomData,
-+        }
-+    }
-+}
-+
-+impl AlterTableTransactionBuilder<Ready> {
-+    /// Create a new builder from a snapshot.
-+    pub(crate) fn new(snapshot: SnapshotRef) -> Self {
-+        AlterTableTransactionBuilder {
-+            snapshot,
-+            operations: Vec::new(),
-+            _state: PhantomData,
-+        }
-+    }
-+}
-+
-+impl<S: Chainable> AlterTableTransactionBuilder<S> {
-+    /// Add a new top-level column to the table schema.
-+    ///
-+    /// The field must not already exist in the schema (case-insensitive). The field must be
-+    /// nullable because existing data files do not contain this column and will read NULL for it.
-+    /// These constraints are validated during [`build()`](AlterTableTransactionBuilder::build).
-+    pub fn add_column(mut self, field: StructField) -> AlterTableTransactionBuilder<Modifying> {
-+        self.operations.push(SchemaOperation::AddColumn { field });
-+        self.transition()
-+    }
-+}
-+
-+impl AlterTableTransactionBuilder<Modifying> {
-+    /// Validate and apply schema operations, then build the [`AlterTableTransaction`].
-+    ///
-+    /// This method:
-+    /// 1. Validates the table supports writes
-+    /// 2. Applies each operation sequentially against the evolving schema
-+    /// 3. Constructs new Metadata action with evolved schema
-+    /// 4. Builds the evolved table configuration
-+    /// 5. Creates the transaction
-+    ///
-+    /// # Errors
-+    ///
-+    /// - Any individual operation fails validation (see per-method errors above)
-+    /// - Table does not support writes (unsupported features)
-+    /// - The evolved schema requires protocol features not enabled on the table (e.g. adding a
-+    ///   `timestampNtz` column without the `timestampNtz` feature)
-+    pub fn build(
-+        self,
-+        _engine: &dyn Engine,
-+        committer: Box<dyn Committer>,
-+    ) -> DeltaResult<AlterTableTransaction> {
-+        let table_config = self.snapshot.table_configuration();
-+        // Rejects writes to tables kernel can't safely commit to: writer version out of
-+        // kernel's supported range, unsupported writer features, or schemas with SQL-expression
-+        // invariants. Runs on the pre-alter snapshot; future ALTER variants that change the
-+        // protocol must also re-check this on the evolved `TableConfiguration`.
-+        table_config.ensure_operation_supported(Operation::Write)?;
-+
-+        let schema = Arc::unwrap_or_clone(table_config.logical_schema());
-+        let SchemaEvolutionResult {
-+            schema: evolved_schema,
-+        } = apply_schema_operations(schema, self.operations, table_config.column_mapping_mode())?;
-+
-+        let evolved_metadata = table_config
-+            .metadata()
-+            .clone()
-+            .with_schema(evolved_schema.clone())?;
-+
-+        // Validates the evolved metadata against the protocol.
-+        let evolved_table_config = TableConfiguration::try_new_with_schema(
-+            table_config,
-+            evolved_metadata,
-+            evolved_schema,
-+        )?;
-+
-+        AlterTableTransaction::try_new_alter_table(self.snapshot, evolved_table_config, committer)
-+    }
-+}
\ No newline at end of file
kernel/src/transaction/builder/create_table.rs
@@ -1,27 +0,0 @@
-diff --git a/kernel/src/transaction/builder/create_table.rs b/kernel/src/transaction/builder/create_table.rs
---- a/kernel/src/transaction/builder/create_table.rs
-+++ b/kernel/src/transaction/builder/create_table.rs
- use crate::clustering::{create_clustering_domain_metadata, validate_clustering_columns};
- use crate::committer::Committer;
- use crate::expressions::ColumnName;
--use crate::schema::validation::validate_schema_for_create;
-+use crate::schema::validation::validate_schema;
- use crate::schema::variant_utils::schema_contains_variant_type;
- use crate::schema::{
-     normalize_column_names_to_schema_casing, schema_contains_non_null_fields, DataType, SchemaRef,
- /// compatible with Spark readers/writers.
- ///
- /// Explicit `delta.invariants` metadata annotations are rejected by
--/// `validate_schema_for_create`, so this only flips on the feature for nullability-driven
-+/// `validate_schema`, so this only flips on the feature for nullability-driven
- /// invariants. Kernel does not itself enforce the null mask at write time -- it relies on
- /// the engine's `ParquetHandler` to do so. Kernel's default `ParquetHandler` uses
- /// `arrow-rs`, whose `RecordBatch::try_new` rejects null values in fields marked
-             maybe_apply_column_mapping_for_table_create(&self.schema, &mut validated)?;
- 
-         // Validate schema (non-empty, column names, duplicates, no `delta.invariants` metadata)
--        validate_schema_for_create(&effective_schema, column_mapping_mode)?;
-+        validate_schema(&effective_schema, column_mapping_mode)?;
- 
-         // Validate data layout and resolve column names (physical for clustering, logical
-         // for partitioning). Adds required table features for clustering.
\ No newline at end of file
kernel/src/transaction/builder/mod.rs
@@ -1,8 +0,0 @@
-diff --git a/kernel/src/transaction/builder/mod.rs b/kernel/src/transaction/builder/mod.rs
---- a/kernel/src/transaction/builder/mod.rs
-+++ b/kernel/src/transaction/builder/mod.rs
- // and for tests. Also allow dead_code since these are used by integration tests.
- #![allow(unreachable_pub, dead_code)]
- 
-+pub mod alter_table;
- pub mod create_table;
\ No newline at end of file
kernel/src/transaction/mod.rs
@@ -1,35 +0,0 @@
-diff --git a/kernel/src/transaction/mod.rs b/kernel/src/transaction/mod.rs
---- a/kernel/src/transaction/mod.rs
-+++ b/kernel/src/transaction/mod.rs
- #[cfg(not(feature = "internal-api"))]
- pub(crate) mod data_layout;
- 
-+pub(crate) mod alter_table;
-+pub use alter_table::AlterTableTransaction;
- mod commit_info;
- mod domain_metadata;
-+pub(crate) mod schema_evolution;
- mod stats_verifier;
- mod update;
- mod write_context;
- #[derive(Debug)]
- pub struct CreateTable;
- 
-+/// Marker type for alter-table (schema evolution) transactions.
-+///
-+/// Transactions in this state perform metadata-only commits. Data file operations are not
-+/// available at compile time because `AlterTable` does not implement [`SupportsDataFiles`].
-+#[derive(Debug)]
-+pub struct AlterTable;
-+
- /// Marker trait for transaction states that support data file operations.
- ///
- /// Only transaction types that implement this trait can access methods for adding, removing, or
- 
-     // Note: Additional test coverage for partial file matching (where some files in a scan
-     // have DV updates but others don't) is provided by the end-to-end integration test
--    // kernel/tests/dv.rs and kernel/tests/write.rs, which exercises
-+    // kernel/tests/dv.rs and kernel/tests/write_remove_dv.rs, which exercise
-     // the full deletion vector write workflow including the DvMatchVisitor logic.
- 
-     #[test]
\ No newline at end of file
kernel/src/transaction/schema_evolution.rs
@@ -1,190 +0,0 @@
-diff --git a/kernel/src/transaction/schema_evolution.rs b/kernel/src/transaction/schema_evolution.rs
-new file mode 100644
---- /dev/null
-+++ b/kernel/src/transaction/schema_evolution.rs
-+//! Schema evolution operations for ALTER TABLE.
-+//!
-+//! This module defines the [`SchemaOperation`] enum and the [`apply_schema_operations`] function
-+//! that validates and applies schema changes to produce an evolved schema.
-+
-+use indexmap::IndexMap;
-+
-+use crate::error::Error;
-+use crate::schema::validation::validate_schema;
-+use crate::schema::{SchemaRef, StructField, StructType};
-+use crate::table_features::ColumnMappingMode;
-+use crate::DeltaResult;
-+
-+/// A schema evolution operation to be applied during ALTER TABLE.
-+///
-+/// Operations are validated and applied in order during
-+/// [`apply_schema_operations`]. Each operation sees the schema state after all prior operations
-+/// have been applied.
-+#[derive(Debug, Clone)]
-+pub(crate) enum SchemaOperation {
-+    /// Add a top-level column.
-+    AddColumn { field: StructField },
-+}
-+
-+/// The result of applying schema operations.
-+#[derive(Debug)]
-+pub(crate) struct SchemaEvolutionResult {
-+    /// The evolved schema after all operations are applied.
-+    pub schema: SchemaRef,
-+}
-+
-+/// Applies a sequence of schema operations to the given schema, returning the evolved schema.
-+///
-+/// Operations are applied sequentially: each one validates against and modifies the schema
-+/// produced by all preceding operations, not the original input schema.
-+///
-+/// # Errors
-+///
-+/// Returns an error if any operation fails validation. The error message identifies which
-+/// operation failed and why.
-+pub(crate) fn apply_schema_operations(
-+    schema: StructType,
-+    operations: Vec<SchemaOperation>,
-+    column_mapping_mode: ColumnMappingMode,
-+) -> DeltaResult<SchemaEvolutionResult> {
-+    let cm_enabled = column_mapping_mode != ColumnMappingMode::None;
-+    // IndexMap preserves field insertion order. Keys are lowercased for case-insensitive
-+    // duplicate detection; StructFields retain their original casing.
-+    let mut fields: IndexMap<String, StructField> = schema
-+        .into_fields()
-+        .map(|f| (f.name().to_lowercase(), f))
-+        .collect();
-+
-+    for op in operations {
-+        match op {
-+            // Protocol feature checks for the field's data type (e.g. `timestampNtz`) happen
-+            // later when the caller builds a new TableConfiguration from the evolved schema --
-+            // the alter is rejected if the table doesn't already have the required feature
-+            // enabled. This matches Spark, which also rejects with
-+            // `DELTA_FEATURES_REQUIRE_MANUAL_ENABLEMENT` and requires the user to enable the
-+            // feature explicitly before adding such a column.
-+            SchemaOperation::AddColumn { field } => {
-+                // TODO: support column mapping for add_column (assign ID + physical name,
-+                // update delta.columnMapping.maxColumnId).
-+                if cm_enabled {
-+                    return Err(Error::unsupported(
-+                        "ALTER TABLE add_column is not yet supported on tables with \
-+                         column mapping enabled",
-+                    ));
-+                }
-+                if field.is_metadata_column() {
-+                    return Err(Error::schema(format!(
-+                        "Cannot add column '{}': metadata columns are not allowed in \
-+                         a table schema",
-+                        field.name()
-+                    )));
-+                }
-+                let key = field.name().to_lowercase();
-+                if fields.contains_key(&key) {
-+                    return Err(Error::schema(format!(
-+                        "Cannot add column '{}': a column with that name already exists",
-+                        field.name()
-+                    )));
-+                }
-+                // Validate field is nullable (Delta protocol requires added columns to be
-+                // nullable so existing data files can return NULL for the new column)
-+                // NOTE: non-nullable columns depend on invariants feature
-+                if !field.is_nullable() {
-+                    return Err(Error::schema(format!(
-+                        "Cannot add non-nullable column '{}'. Added columns must be nullable \
-+                         because existing data files do not contain this column.",
-+                        field.name()
-+                    )));
-+                }
-+                fields.insert(key, field);
-+            }
-+        }
-+    }
-+
-+    let evolved_schema = StructType::try_new(fields.into_values())?;
-+
-+    validate_schema(&evolved_schema, column_mapping_mode)?;
-+    Ok(SchemaEvolutionResult {
-+        schema: evolved_schema.into(),
-+    })
-+}
-+
-+#[cfg(test)]
-+mod tests {
-+    use rstest::rstest;
-+
-+    use super::*;
-+    use crate::schema::{DataType, MetadataColumnSpec, StructField, StructType};
-+
-+    fn simple_schema() -> StructType {
-+        StructType::try_new(vec![
-+            StructField::not_null("id", DataType::INTEGER),
-+            StructField::nullable("name", DataType::STRING),
-+        ])
-+        .unwrap()
-+    }
-+
-+    fn add_col(name: &str, nullable: bool) -> SchemaOperation {
-+        let field = if nullable {
-+            StructField::nullable(name, DataType::STRING)
-+        } else {
-+            StructField::not_null(name, DataType::STRING)
-+        };
-+        SchemaOperation::AddColumn { field }
-+    }
-+
-+    // Builds a struct column whose nested leaf field has the given name. Used to prove that
-+    // `validate_schema` (not just the top-level dup check or `StructType::try_new`) is
-+    // reached from `apply_schema_operations`.
-+    fn add_struct_with_nested_leaf(name: &str, leaf_name: &str) -> SchemaOperation {
-+        let inner =
-+            StructType::try_new(vec![StructField::nullable(leaf_name, DataType::STRING)]).unwrap();
-+        SchemaOperation::AddColumn {
-+            field: StructField::nullable(name, inner),
-+        }
-+    }
-+
-+    #[rstest]
-+    #[case::dup_exact(vec![add_col("name", true)], "already exists")]
-+    #[case::dup_case_insensitive(vec![add_col("Name", true)], "already exists")]
-+    #[case::dup_within_batch(
-+        vec![add_col("email", true), add_col("email", true)],
-+        "already exists"
-+    )]
-+    #[case::non_nullable(vec![add_col("age", false)], "non-nullable")]
-+    #[case::invalid_parquet_char(vec![add_col("foo,bar", true)], "invalid character")]
-+    #[case::nested_invalid_parquet_char(
-+        vec![add_struct_with_nested_leaf("addr", "bad,leaf")],
-+        "invalid character"
-+    )]
-+    #[case::metadata_column(
-+        vec![SchemaOperation::AddColumn {
-+            field: StructField::create_metadata_column("row_idx", MetadataColumnSpec::RowIndex),
-+        }],
-+        "metadata columns are not allowed"
-+    )]
-+    fn apply_schema_operations_rejects(
-+        #[case] ops: Vec<SchemaOperation>,
-+        #[case] error_contains: &str,
-+    ) {
-+        let err =
-+            apply_schema_operations(simple_schema(), ops, ColumnMappingMode::None).unwrap_err();
-+        assert!(err.to_string().contains(error_contains));
-+    }
-+
-+    #[rstest]
-+    #[case::single(vec![add_col("email", true)], &["id", "name", "email"])]
-+    #[case::multiple(
-+        vec![add_col("email", true), add_col("age", true)],
-+        &["id", "name", "email", "age"]
-+    )]
-+    fn apply_schema_operations_succeeds(
-+        #[case] ops: Vec<SchemaOperation>,
-+        #[case] expected_names: &[&str],
-+    ) {
-+        let result =
-+            apply_schema_operations(simple_schema(), ops, ColumnMappingMode::None).unwrap();
-+        let actual: Vec<&str> = result.schema.fields().map(|f| f.name().as_str()).collect();
-+        assert_eq!(&actual, expected_names);
-+    }
-+}
\ No newline at end of file
kernel/tests/README.md
@@ -1,31 +0,0 @@
-diff --git a/kernel/tests/README.md b/kernel/tests/README.md
---- a/kernel/tests/README.md
-+++ b/kernel/tests/README.md
- 
- | Table | Location | Schema | Protocol (R/W) | Features | Description | Tests |
- |-------|----------|--------|----------|----------|-------------|-------|
--| `table-with-dv-small` | data/ | `value: int` | v3/v7 | r:`deletionVectors` w:`deletionVectors` | 10 rows, 2 soft-deleted by DV, 8 visible. Most heavily referenced test table. | `dv.rs::test_table_scan(with_dv)`, `write.rs::test_remove_files_adds_expected_entries`, `write.rs::test_update_deletion_vectors_adds_expected_entries`, `read.rs::with_predicate_and_removes`, `path.rs::test_to_uri/test_child/test_child_escapes`, `snapshot.rs::test_snapshot_read_metadata/test_new_snapshot/test_snapshot_new_from/test_read_table_with_missing_last_checkpoint/test_log_compaction_writer`, `deletion_vector.rs` tests, `transaction/mod.rs::setup_dv_enabled_table/test_add_files_schema/test_new_deletion_vector_path`, `default/parquet.rs` read test, `default/json.rs` read test, `log_compaction/tests.rs::create_mock_snapshot`, `resolve_dvs.rs` tests |
-+| `table-with-dv-small` | data/ | `value: int` | v3/v7 | r:`deletionVectors` w:`deletionVectors` | 10 rows, 2 soft-deleted by DV, 8 visible. Most heavily referenced test table. | `dv.rs::test_table_scan(with_dv)`, `write_remove_dv.rs::test_remove_files_adds_expected_entries`, `write_remove_dv.rs::test_update_deletion_vectors_adds_expected_entries`, `read.rs::with_predicate_and_removes`, `path.rs::test_to_uri/test_child/test_child_escapes`, `snapshot.rs::test_snapshot_read_metadata/test_new_snapshot/test_snapshot_new_from/test_read_table_with_missing_last_checkpoint/test_log_compaction_writer`, `deletion_vector.rs` tests, `transaction/mod.rs::setup_dv_enabled_table/test_add_files_schema/test_new_deletion_vector_path`, `default/parquet.rs` read test, `default/json.rs` read test, `log_compaction/tests.rs::create_mock_snapshot`, `resolve_dvs.rs` tests |
- | `table-without-dv-small` | data/ | `value: long` | v1/v2 | | 10 rows, all visible. Companion to table-with-dv-small. | `dv.rs::test_table_scan(without_dv)`, `transaction/mod.rs::setup_non_dv_table/create_existing_table_txn/test_commit_io_error_returns_retryable_transaction`, `sequential_phase.rs::test_sequential_v2_with_commits_only/test_sequential_finish_before_exhaustion_error`, `parallel_phase.rs` tests, `scan/tests.rs::test_scan_metadata_paths/test_scan_metadata/test_scan_metadata_from_same_version` |
- | `with-short-dv` | data/ | `id: long, value: string, timestamp: timestamp, rand: double` | v3/v7 | r:`deletionVectors` w:`deletionVectors` | 2 files x 5 rows. First file has inline DV (`storageType="u"`) deleting 3 rows. | `read.rs::short_dv` |
- | `dv-partitioned-with-checkpoint` | golden_data/ | `value: int, part: int` partitioned by `part` | v3/v7 | r:`deletionVectors` w:`deletionVectors` | DVs on a partitioned table with a checkpoint | `golden_tables.rs::golden_test!` |
- 
- | Table | Location | Schema | Protocol (R/W) | Features | Description | Tests |
- |-------|----------|--------|----------|----------|-------------|-------|
--| `partition_cm/none` | data/ | `value: int, category: string` partitioned by `category` | v1/v1 | `columnMapping.mode=none` | Partitioned write with CM disabled | `write.rs::test_column_mapping_partitioned_write(cm_none)` |
--| `partition_cm/id` | data/ | `value: int, category: string` partitioned by `category` | v3/v7 | r:`columnMapping` w:`columnMapping`, `columnMapping.mode=id` | Partitioned write with CM id mode | `write.rs::test_column_mapping_partitioned_write(cm_id)` |
--| `partition_cm/name` | data/ | `value: int, category: string` partitioned by `category` | v3/v7 | r:`columnMapping` w:`columnMapping`, `columnMapping.mode=name` | Partitioned write with CM name mode | `write.rs::test_column_mapping_partitioned_write(cm_name)` |
-+| `partition_cm/none` | data/ | `value: int, category: string` partitioned by `category` | v1/v1 | `columnMapping.mode=none` | Partitioned write with CM disabled | `write_column_mapping.rs::test_column_mapping_partitioned_write(cm_none)` |
-+| `partition_cm/id` | data/ | `value: int, category: string` partitioned by `category` | v3/v7 | r:`columnMapping` w:`columnMapping`, `columnMapping.mode=id` | Partitioned write with CM id mode | `write_column_mapping.rs::test_column_mapping_partitioned_write(cm_id)` |
-+| `partition_cm/name` | data/ | `value: int, category: string` partitioned by `category` | v3/v7 | r:`columnMapping` w:`columnMapping`, `columnMapping.mode=name` | Partitioned write with CM name mode | `write_column_mapping.rs::test_column_mapping_partitioned_write(cm_name)` |
- | `table-with-columnmapping-mode-name` | golden_data/ | `ByteType: byte, ShortType: short, IntegerType: int, LongType: long, FloatType: float, DoubleType: double, decimal: decimal(10,2), BooleanType: boolean, StringType: string, BinaryType: binary, DateType: date, TimestampType: timestamp, nested_struct: struct{aa: string, ac: struct{aca: int}}, array_of_prims: array<int>, array_of_arrays: array<array<int>>, array_of_structs: array<struct{ab: long}>, map_of_prims: map<int,long>, map_of_rows: map<int,struct{ab: long}>, map_of_arrays: map<long,array<int>>` | v2/v5 | `columnMapping.mode=name` | Column mapping name mode | `golden_tables.rs::golden_test!` |
- | `table-with-columnmapping-mode-id` | golden_data/ | `ByteType: byte, ShortType: short, IntegerType: int, LongType: long, FloatType: float, DoubleType: double, decimal: decimal(10,2), BooleanType: boolean, StringType: string, BinaryType: binary, DateType: date, TimestampType: timestamp, nested_struct: struct{aa: string, ac: struct{aca: int}}, array_of_prims: array<int>, array_of_arrays: array<array<int>>, array_of_structs: array<struct{ab: long}>, map_of_prims: map<int,long>, map_of_rows: map<int,struct{ab: long}>, map_of_arrays: map<long,array<int>>` | v2/v5 | `columnMapping.mode=id` | Column mapping id mode | `golden_tables.rs::golden_test!` |
- 
- | Table | Location | Schema | Protocol (R/W) | Features | Description | Tests |
- |-------|----------|--------|----------|----------|-------------|-------|
- | `with_checkpoint_no_last_checkpoint` | data/ | `letter: string, int: long, date: date` | v1/v2 | `checkpointInterval=2` | Checkpoint at v2 but missing `_last_checkpoint` hint file | `snapshot.rs::test_read_table_with_checkpoint`, `scan/tests.rs::test_scan_with_checkpoint`, `sequential_phase.rs::test_sequential_checkpoint_no_commits`, `checkpoint_manifest.rs` tests, `sync/parquet.rs` test, `default/parquet.rs` test |
--| `external-table-different-nullability` | data/ | `i: int` | v1/v2 | `checkpointInterval=2` | Parquet files have different nullability than Delta schema; includes checkpoint | `write.rs::test_checkpoint_non_kernel_written_table` |
-+| `external-table-different-nullability` | data/ | `i: int` | v1/v2 | `checkpointInterval=2` | Parquet files have different nullability than Delta schema; includes checkpoint | `write_clustered.rs::test_checkpoint_non_kernel_written_table` |
- | `checkpoint` | golden_data/ | `intCol: int` | v1/v2 | | Basic checkpoint read | `golden_tables.rs::golden_test!(checkpoint_test)` |
- | `corrupted-last-checkpoint-kernel` | golden_data/ | `id: long` | v1/v2 | | Corrupted `_last_checkpoint` file | `golden_tables.rs::golden_test!` |
- | `multi-part-checkpoint` | golden_data/ | `id: long` | v1/v2 | `checkpointInterval=1` | Multi-part checkpoint files | `golden_tables.rs::golden_test!` |
\ No newline at end of file

... (truncated, output exceeded 60000 bytes)

Reproduce locally: git range-diff ac9dc19..6245673 6486bd2..12c1bf6 | Disable: git config gitstack.push-range-diff false

@lorenarosati lorenarosati requested a review from dengsh12 May 1, 2026 00:21
Comment thread kernel/src/schema/mod.rs
Comment on lines +1562 to +1569
pub fn try_new(srid: &str) -> DeltaResult<Self> {
    if srid.is_empty() {
        return Err(Error::invalid_geometry("SRID cannot be empty"));
    }
    Ok(Self {
        srid: srid.to_string(),
    })
}
Collaborator:
This likely needs validation so that we're producing valid Geotypes.

Collaborator:
I guess this needs validation?

Collaborator:
Two options for validation:

  1. Fixed set of valid SRID strings that we check against.
  2. Parse/understand geo types, possibly using the geo crate.

If the set is small, I'd recommend 1; otherwise, we probably want 2.
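
For illustration, a minimal sketch of option 1; the allow-list contents and the free function below are placeholders, not the kernel's actual API:

```rust
// Hypothetical allow-list check (option 1). The SRID set here is a placeholder;
// the real list would be far larger and come from the geo RFC.
const KNOWN_SRIDS: &[&str] = &["EPSG:4326", "EPSG:3857", "OGC:CRS84"];

fn validate_srid(srid: &str) -> Result<(), String> {
    if srid.is_empty() {
        return Err("SRID cannot be empty".to_string());
    }
    if !KNOWN_SRIDS.contains(&srid) {
        return Err(format!("unrecognized SRID: {srid}"));
    }
    Ok(())
}

fn main() {
    assert!(validate_srid("EPSG:4326").is_ok());
    assert!(validate_srid("").is_err());
    assert!(validate_srid("EPSG:999999").is_err());
}
```

Given the later comment that there can be hundreds of recognized SRIDs, a hard-coded list like this would likely need to be generated rather than hand-maintained.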

Collaborator Author:
Discussed offline - documented future work in a comment!

Collaborator:
Chatted offline. This can be 100s of SRIDs. Let's make this a follow-up.

Collaborator:
Keeping the comment as unresolved for future readers :)

Comment thread kernel/src/schema/mod.rs
@lorenarosati

Range-diff: main (12c1bf6 -> 10b79aa)
kernel/src/schema/mod.rs
@@ -57,6 +57,8 @@
 +    ///
 +    /// Returns `Err` if `srid` is empty.
 +    pub fn try_new(srid: &str) -> DeltaResult<Self> {
++        // We only check that the SRID is non-empty; validating the value against the full set
++        // of (1000+) recognized SRIDs is future work.
 +        if srid.is_empty() {
 +            return Err(Error::invalid_geometry("SRID cannot be empty"));
 +        }
@@ -101,6 +103,8 @@
 +        srid: Option<&str>,
 +        algorithm: Option<EdgeInterpolationAlgorithm>,
 +    ) -> DeltaResult<Self> {
++        // We only check that the SRID is non-empty; validating the value against the full set
++        // of (1000+) recognized SRIDs is future work.
 +        let srid = match srid {
 +            None => DEFAULT_GEO_SRID.to_string(),
 +            Some(s) => {
kernel/src/table_configuration.rs
@@ -9,38 +9,6 @@
      validate_timestamp_ntz_feature_support, ColumnMappingMode, EnablementCheck, FeatureRequirement,
      FeatureType, KernelSupport, Operation, TableFeature, LEGACY_READER_FEATURES,
      LEGACY_WRITER_FEATURES, MAX_VALID_READER_VERSION, MAX_VALID_WRITER_VERSION,
-         version: Version,
-     ) -> DeltaResult<Self> {
-         let logical_schema = Arc::new(metadata.parse_schema()?);
-+        Self::try_new_inner(metadata, protocol, table_root, version, logical_schema)
-+    }
-+
-+    /// Like [`try_new`](Self::try_new), but reuses `base`'s protocol, table root, and version
-+    /// and takes a pre-parsed `logical_schema`.
-+    pub(crate) fn try_new_with_schema(
-+        base: &Self,
-+        metadata: Metadata,
-+        logical_schema: SchemaRef,
-+    ) -> DeltaResult<Self> {
-+        Self::try_new_inner(
-+            metadata,
-+            base.protocol.clone(),
-+            base.table_root.clone(),
-+            base.version,
-+            logical_schema,
-+        )
-+    }
-+
-+    fn try_new_inner(
-+        metadata: Metadata,
-+        protocol: Protocol,
-+        table_root: Url,
-+        version: Version,
-+        logical_schema: SchemaRef,
-+    ) -> DeltaResult<Self> {
-         let table_properties = metadata.parse_table_properties();
-         let column_mapping_mode = column_mapping_mode(&protocol, &table_properties);
- 
  
          // Validate schema against protocol features now that we have a TC instance.
          validate_timestamp_ntz_feature_support(&table_config)?;
kernel/src/actions/mod.rs
@@ -1,32 +0,0 @@
-diff --git a/kernel/src/actions/mod.rs b/kernel/src/actions/mod.rs
---- a/kernel/src/actions/mod.rs
-+++ b/kernel/src/actions/mod.rs
- }
- 
- // Serde derives are needed for CRC file deserialization (see `crc::reader`).
-+//
-+// TODO(#2446): `Metadata` stores the schema only as a JSON string. Callers that already hold
-+// a parsed `SchemaRef` (e.g. CREATE TABLE) serialize into `schema_string` and then re-parse
-+// downstream in `TableConfiguration::try_new` via `parse_schema()`. Caching the parsed schema
-+// on `Metadata` would eliminate the round-trip.
- #[derive(Debug, Default, Clone, PartialEq, Eq, Serialize, Deserialize, ToSchema)]
- #[serde(rename_all = "camelCase")]
- #[internal_api]
-         TableProperties::from(self.configuration.iter())
-     }
- 
-+    /// Returns a new Metadata with the schema replaced, preserving all other fields.
-+    ///
-+    /// # Errors
-+    ///
-+    /// Returns an error if schema serialization fails.
-+    pub(crate) fn with_schema(self, schema: SchemaRef) -> DeltaResult<Self> {
-+        Ok(Self {
-+            schema_string: serde_json::to_string(&schema)?,
-+            ..self
-+        })
-+    }
-+
-     #[cfg(test)]
-     #[allow(clippy::too_many_arguments)]
-     pub(crate) fn new_unchecked(
\ No newline at end of file
kernel/src/engine/arrow_expression/evaluate_expression.rs
@@ -1,154 +0,0 @@
-diff --git a/kernel/src/engine/arrow_expression/evaluate_expression.rs b/kernel/src/engine/arrow_expression/evaluate_expression.rs
---- a/kernel/src/engine/arrow_expression/evaluate_expression.rs
-+++ b/kernel/src/engine/arrow_expression/evaluate_expression.rs
-         (Literal(scalar), _) => {
-             validate_array_type(scalar.to_array(batch.num_rows())?, result_type)
-         }
--        (Column(name), _) => {
--            // Column extraction uses ordinal-based struct validation because column mapping
--            // can cause physical/logical name mismatches. apply_schema handles renaming.
--            let arr = extract_column(batch, name)?;
--            if let Some(expected) = result_type {
--                ensure_data_types(expected, arr.data_type(), ValidationMode::TypesOnly)?;
--            }
--            Ok(arr)
--        }
-+        (Column(name), _) => validate_array_type(extract_column(batch, name)?, result_type),
-         (Struct(fields, nullability), Some(DataType::Struct(output_schema))) => {
-             evaluate_struct_expression(fields, batch, output_schema, nullability.as_ref())
-         }
-     }
- 
-     #[test]
--    fn column_extract_struct_with_mismatched_field_names() {
-+    fn column_extract_struct_rejects_mismatched_field_names() {
-         let batch = make_struct_batch(
-             vec![
-                 ArrowField::new("col-abc-001", ArrowDataType::Int64, true),
-             ],
-         );
- 
--        // Logical names differ from physical names due to column mapping
-         let logical_type = DataType::try_struct_type([
-             StructField::nullable("my_column", DataType::LONG),
-             StructField::nullable("other_column", DataType::LONG),
- 
-         let expr = column_expr!("stats");
-         let result = evaluate_expression(&expr, &batch, Some(&logical_type));
--
--        // Ordinal-based validation passes: same field count and types by position.
--        // The downstream apply_schema transformation handles renaming.
--        let arr = result.expect("should succeed with mismatched names but matching types");
--        let struct_arr = arr.as_any().downcast_ref::<StructArray>().unwrap();
--        assert_eq!(struct_arr.num_columns(), 2);
--        assert_eq!(struct_arr.len(), 2);
--    }
--
--    #[test]
--    fn column_extract_struct_rejects_mismatched_field_count() {
--        let batch = make_struct_batch(
--            vec![ArrowField::new("col-abc-001", ArrowDataType::Int64, true)],
--            vec![Arc::new(Int64Array::from(vec![Some(1), Some(2)]))],
--        );
--
--        let logical_type = DataType::try_struct_type([
--            StructField::nullable("a", DataType::LONG),
--            StructField::nullable("b", DataType::LONG),
--        ])
--        .unwrap();
--
--        let expr = column_expr!("stats");
--        let result = evaluate_expression(&expr, &batch, Some(&logical_type));
--        assert_result_error_with_message(result, "Struct field count mismatch");
-+        assert_result_error_with_message(result, "Missing Struct fields");
-     }
- 
-     #[test]
-     fn column_extract_struct_rejects_mismatched_child_types() {
-         let batch = make_struct_batch(
-             vec![
--                ArrowField::new("col-abc-001", ArrowDataType::Int64, true),
--                ArrowField::new("col-abc-002", ArrowDataType::Utf8, true),
-+                ArrowField::new("a", ArrowDataType::Int64, true),
-+                ArrowField::new("b", ArrowDataType::Utf8, true),
-             ],
-             vec![
-                 Arc::new(Int64Array::from(vec![Some(1)])),
-             ],
-         );
- 
--        // Expect two LONG columns, but the second arrow field is Utf8
-         let logical_type = DataType::try_struct_type([
-             StructField::nullable("a", DataType::LONG),
-             StructField::nullable("b", DataType::LONG),
-     }
- 
-     #[test]
--    fn column_extract_struct_with_matching_names_still_works() {
-+    fn column_extract_struct_with_matching_names_works() {
-         let batch = make_struct_batch(
-             vec![
-                 ArrowField::new("a", ArrowDataType::Int64, true),
-         assert!(result.is_ok());
-     }
- 
--    /// Exercises the exact code path from `get_add_transform_expr` where a `struct_from`
--    /// expression wraps `column_expr!("add.stats_parsed")`. When the checkpoint parquet has
--    /// stats_parsed with physical column names (e.g. `col-abc-001`) but the output schema
--    /// uses logical names (e.g. `id`), `evaluate_struct_expression` calls
--    /// `evaluate_expression(Column, struct_result_type)` with mismatched field names.
--    /// Without ordinal-based validation this fails with a name mismatch error.
-+    /// When a `struct_from` expression wraps a `Column` referencing stats_parsed, and the
-+    /// checkpoint parquet has physical column names (e.g. `col-abc-001`) but the output schema
-+    /// uses logical names (e.g. `id`), name-based validation correctly rejects the mismatch.
-     #[test]
--    fn struct_from_with_column_tolerates_nested_name_mismatch() {
--        // Build a batch mimicking checkpoint data: add.stats_parsed uses physical names
-+    fn struct_from_with_column_rejects_nested_name_mismatch() {
-         let stats_fields: Vec<ArrowField> = vec![
-             ArrowField::new("col-abc-001", ArrowDataType::Int64, true),
-             ArrowField::new("col-abc-002", ArrowDataType::Int64, true),
-         )]);
-         let batch = RecordBatch::try_new(Arc::new(schema), vec![Arc::new(add_struct)]).unwrap();
- 
--        // struct_from mimicking get_add_transform_expr: wraps a Column referencing stats_parsed
-         let expr = Expr::struct_from([
-             column_expr_ref!("add.path"),
-             column_expr_ref!("add.stats_parsed"),
-         .unwrap();
- 
-         let result = evaluate_expression(&expr, &batch, Some(&output_type));
--        result.expect("struct_from with Column sub-expression should tolerate field name mismatch");
--    }
--
--    #[test]
--    fn column_extract_nested_struct_with_mismatched_names() {
--        let inner_fields = vec![ArrowField::new("phys-inner", ArrowDataType::Int64, true)];
--        let inner_struct = ArrowDataType::Struct(inner_fields.clone().into());
--        let batch = make_struct_batch(
--            vec![ArrowField::new("phys-outer", inner_struct, true)],
--            vec![Arc::new(
--                StructArray::try_new(
--                    inner_fields.into(),
--                    vec![Arc::new(Int64Array::from(vec![Some(42)]))],
--                    None,
--                )
--                .unwrap(),
--            )],
--        );
--
--        let logical_type = DataType::try_struct_type([StructField::nullable(
--            "logical_outer",
--            DataType::struct_type_unchecked([StructField::nullable(
--                "logical_inner",
--                DataType::LONG,
--            )]),
--        )])
--        .unwrap();
--
--        let expr = column_expr!("stats");
--        let result = evaluate_expression(&expr, &batch, Some(&logical_type));
--        assert!(result.is_ok());
-+        assert_result_error_with_message(result, "Missing Struct fields");
-     }
- }
\ No newline at end of file
kernel/src/engine/ensure_data_types.rs
@@ -1,13 +0,0 @@
-diff --git a/kernel/src/engine/ensure_data_types.rs b/kernel/src/engine/ensure_data_types.rs
---- a/kernel/src/engine/ensure_data_types.rs
-+++ b/kernel/src/engine/ensure_data_types.rs
- #[internal_api]
- pub(crate) enum ValidationMode {
-     /// Check types only. Struct fields are matched by ordinal position, not by name.
--    /// Nullability and metadata are not checked. Used by the expression evaluator where
--    /// column mapping can cause physical/logical name mismatches.
-+    /// Nullability and metadata are not checked.
-+    #[allow(dead_code)]
-     TypesOnly,
-     /// Check types and match struct fields by name, but skip nullability and metadata.
-     /// Used by the parquet reader where fields are already resolved by name upstream.
\ No newline at end of file
kernel/src/schema/validation.rs
@@ -1,48 +0,0 @@
-diff --git a/kernel/src/schema/validation.rs b/kernel/src/schema/validation.rs
---- a/kernel/src/schema/validation.rs
-+++ b/kernel/src/schema/validation.rs
--//! Schema validation utilities for Delta table creation.
-+//! Schema validation utilities shared by table creation and schema evolution.
- //!
- //! Validates schemas per the Delta protocol specification.
- 
- /// These characters have special meaning in Parquet schema syntax.
- const INVALID_PARQUET_CHARS: &[char] = &[' ', ',', ';', '{', '}', '(', ')', '\n', '\t', '='];
- 
--/// Validates a schema for table creation.
-+/// Validates a schema for CREATE TABLE or ALTER TABLE.
- ///
- /// Performs the following checks:
- /// 1. Schema is non-empty
- /// 3. Column names contain only valid characters
- /// 4. Rejects fields with `delta.invariants` metadata (SQL expression invariants are not supported
- ///    by kernel; see `TableConfiguration::ensure_write_supported`)
--pub(crate) fn validate_schema_for_create(
-+pub(crate) fn validate_schema(
-     schema: &StructType,
-     column_mapping_mode: ColumnMappingMode,
- ) -> DeltaResult<()> {
-     #[case::dot_in_name_with_cm(schema_with_dot(), ColumnMappingMode::Name)]
-     #[case::different_struct_children(schema_different_struct_children(), ColumnMappingMode::None)]
-     fn valid_schema_accepted(#[case] schema: StructType, #[case] cm: ColumnMappingMode) {
--        assert!(validate_schema_for_create(&schema, cm).is_ok());
-+        assert!(validate_schema(&schema, cm).is_ok());
-     }
- 
-     // === Invalid schemas ===
-         #[case] cm: ColumnMappingMode,
-         #[case] expected_errs: &[&str],
-     ) {
--        let result = validate_schema_for_create(&schema, cm);
-+        let result = validate_schema(&schema, cm);
-         assert!(result.is_err());
-         let err = result.unwrap_err().to_string();
-         for expected in expected_errs {
-     #[case::array_nested(schema_array_nested_invariant(), "arr.child")]
-     #[case::map_nested(schema_map_nested_invariant(), "map.child")]
-     fn invariants_metadata_rejected(#[case] schema: StructType, #[case] expected_path: &str) {
--        let result = validate_schema_for_create(&schema, ColumnMappingMode::None);
-+        let result = validate_schema(&schema, ColumnMappingMode::None);
-         let err = result.expect_err("expected delta.invariants metadata rejection");
-         let msg = err.to_string();
-         assert!(
\ No newline at end of file
kernel/src/snapshot/mod.rs
@@ -1,27 +0,0 @@
-diff --git a/kernel/src/snapshot/mod.rs b/kernel/src/snapshot/mod.rs
---- a/kernel/src/snapshot/mod.rs
-+++ b/kernel/src/snapshot/mod.rs
- use crate::table_configuration::{InCommitTimestampEnablement, TableConfiguration};
- use crate::table_features::{physical_to_logical_column_name, ColumnMappingMode, TableFeature};
- use crate::table_properties::TableProperties;
-+use crate::transaction::builder::alter_table::AlterTableTransactionBuilder;
- use crate::transaction::Transaction;
- use crate::utils::require;
- use crate::{DeltaResult, Engine, Error, LogCompactionWriter, Version};
-         Transaction::try_new_existing_table(self, committer, engine)
-     }
- 
-+    /// Creates a builder for altering this table's metadata. Currently supports schema change
-+    /// operations.
-+    ///
-+    /// The returned builder allows chaining operations before building an
-+    /// [`AlterTableTransaction`] that can be committed.
-+    ///
-+    /// [`AlterTableTransaction`]: crate::transaction::AlterTableTransaction
-+    pub fn alter_table(self: Arc<Self>) -> AlterTableTransactionBuilder {
-+        AlterTableTransactionBuilder::new(self)
-+    }
-+
-     /// Fetch the latest version of the provided `application_id` for this snapshot. Filters the
-     /// txn based on the delta.setTransactionRetentionDuration property and lastUpdated.
-     ///
\ No newline at end of file
kernel/src/transaction/alter_table.rs
@@ -1,81 +0,0 @@
-diff --git a/kernel/src/transaction/alter_table.rs b/kernel/src/transaction/alter_table.rs
-new file mode 100644
---- /dev/null
-+++ b/kernel/src/transaction/alter_table.rs
-+//! Alter table transaction types and constructor.
-+//!
-+//! This module defines the [`AlterTableTransaction`] type alias and the
-+//! [`try_new_alter_table`](AlterTableTransaction::try_new_alter_table) constructor.
-+//! The builder logic lives in [`builder::alter_table`](super::builder::alter_table).
-+
-+#![allow(unreachable_pub)]
-+
-+use std::marker::PhantomData;
-+use std::sync::OnceLock;
-+
-+use crate::committer::Committer;
-+use crate::snapshot::SnapshotRef;
-+use crate::table_configuration::TableConfiguration;
-+use crate::transaction::{AlterTable, Transaction};
-+use crate::utils::current_time_ms;
-+use crate::DeltaResult;
-+
-+/// A type alias for alter-table transactions.
-+///
-+/// This provides a restricted API surface that only exposes operations valid during ALTER
-+/// commands. Data file operations are not available at compile time because `AlterTable`
-+/// does not implement [`SupportsDataFiles`](super::SupportsDataFiles).
-+pub type AlterTableTransaction = Transaction<AlterTable>;
-+
-+impl AlterTableTransaction {
-+    /// Create a new transaction for altering a table's schema. Produces a metadata-only commit
-+    /// that emits an updated Metadata action with the evolved schema.
-+    ///
-+    /// The `effective_table_config` is the evolved table configuration (new schema, same
-+    /// protocol). It must be fully validated before calling this constructor (e.g. schema
-+    /// operations applied, protocol feature checks passed). The `read_snapshot` provides the
-+    /// pre-commit table state (version, previous protocol/metadata, ICT timestamps) used for
-+    /// commit versioning and post-commit snapshots.
-+    ///
-+    /// This is typically called via `AlterTableTransactionBuilder::build()` rather than directly.
-+    pub(crate) fn try_new_alter_table(
-+        read_snapshot: SnapshotRef,
-+        effective_table_config: TableConfiguration,
-+        committer: Box<dyn Committer>,
-+    ) -> DeltaResult<Self> {
-+        let span = tracing::info_span!(
-+            "txn",
-+            path = %read_snapshot.table_root(),
-+            read_version = read_snapshot.version(),
-+            operation = "ALTER TABLE",
-+        );
-+
-+        Ok(Transaction {
-+            span,
-+            read_snapshot_opt: Some(read_snapshot),
-+            effective_table_config,
-+            should_emit_protocol: false,
-+            should_emit_metadata: true,
-+            committer,
-+            operation: Some("ALTER TABLE".to_string()),
-+            engine_info: None,
-+            add_files_metadata: vec![],
-+            remove_files_metadata: vec![],
-+            set_transactions: vec![],
-+            commit_timestamp: current_time_ms()?,
-+            user_domain_metadata_additions: vec![],
-+            system_domain_metadata_additions: vec![],
-+            user_domain_removals: vec![],
-+            data_change: false,
-+            shared_write_state: OnceLock::new(),
-+            engine_commit_info: None,
-+            // TODO(#2446): match delta-spark's per-op isBlindAppend policy
-+            // (ADD/DROP/DROP NOT NULL -> true, SET NOT NULL -> false). Hardcoded false for
-+            // now: safe, but misses the true-case optimization delta-spark applies.
-+            is_blind_append: false,
-+            dv_matched_files: vec![],
-+            physical_clustering_columns: None,
-+            _state: PhantomData,
-+        })
-+    }
-+}
\ No newline at end of file
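
The compile-time restriction described above follows the standard marker-type pattern; a self-contained sketch with illustrative types (not the kernel's real definitions):

```rust
use std::marker::PhantomData;

// Illustrative marker states; only states implementing the marker trait get data-file methods.
struct CreateTable;
struct AlterTable;

trait SupportsDataFiles {}
impl SupportsDataFiles for CreateTable {}
// AlterTable intentionally does not implement SupportsDataFiles.

struct Transaction<S> {
    _state: PhantomData<S>,
}

impl<S> Transaction<S> {
    // Available in every state.
    fn commit(&self) {}
}

impl<S: SupportsDataFiles> Transaction<S> {
    // Only callable when the state marker implements SupportsDataFiles.
    fn add_files(&self) {}
}

fn main() {
    let create: Transaction<CreateTable> = Transaction { _state: PhantomData };
    create.add_files();
    create.commit();

    let alter: Transaction<AlterTable> = Transaction { _state: PhantomData };
    alter.commit();
    // alter.add_files(); // compile error: AlterTable does not implement SupportsDataFiles
}
```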
kernel/src/transaction/builder/alter_table.rs
@@ -1,168 +0,0 @@
-diff --git a/kernel/src/transaction/builder/alter_table.rs b/kernel/src/transaction/builder/alter_table.rs
-new file mode 100644
---- /dev/null
-+++ b/kernel/src/transaction/builder/alter_table.rs
-+//! Builder for ALTER TABLE (schema evolution) transactions.
-+//!
-+//! This module contains [`AlterTableTransactionBuilder`], which uses a type-state pattern to
-+//! enforce valid operation chaining at compile time.
-+//!
-+//! # Type States
-+//!
-+//! - [`Ready`]: Initial state. Operations are available, but `build()` is not (at least one
-+//!   operation is required).
-+//! - [`Modifying`]: After any chainable schema operation. More ops can be chained, and `build()` is
-+//!   available. See [`AlterTableTransactionBuilder<Modifying>`] for ops.
-+//!
-+//! # Transitions
-+//!
-+//! Each `impl` block below is gated by a state bound and documents which operations that
-+//! state enables. Chainable schema operations live on `impl<S: Chainable>` and transition
-+//! the builder to a chainable state; `build()` lives on states that are buildable.
-+//!
-+//! ```ignore
-+//! // Allowed: at least one op queued before build().
-+//! snapshot.alter_table().add_column(field).build(engine, committer)?;
-+//!
-+//! // Not allowed: build() is not defined on Ready (no ops queued).
-+//! snapshot.alter_table().build(engine, committer)?;  // compile error
-+//! ```
-+
-+use std::marker::PhantomData;
-+use std::sync::Arc;
-+
-+use crate::committer::Committer;
-+use crate::schema::StructField;
-+use crate::snapshot::SnapshotRef;
-+use crate::table_configuration::TableConfiguration;
-+use crate::table_features::Operation;
-+use crate::transaction::alter_table::AlterTableTransaction;
-+use crate::transaction::schema_evolution::{
-+    apply_schema_operations, SchemaEvolutionResult, SchemaOperation,
-+};
-+use crate::{DeltaResult, Engine};
-+
-+/// Initial state: `build()` is not yet available (at least one operation is required).
-+/// See [`Chainable`] for the operations available on this state.
-+pub struct Ready;
-+
-+/// State after at least one operation has been added. `build()` is available.
-+/// See [`Chainable`] for the operations available on this state.
-+pub struct Modifying;
-+
-+/// Marker trait for builder states that accept chainable schema operations. Grouping states
-+/// under one bound lets each op (like `add_column`) live on a single `impl<S: Chainable>`
-+/// block -- chainable states share the body rather than duplicating it per state.
-+///
-+/// Sealed: external types cannot implement this, keeping the set of chainable states closed.
-+pub trait Chainable: sealed::Sealed {}
-+impl Chainable for Ready {}
-+impl Chainable for Modifying {}
-+
-+mod sealed {
-+    pub trait Sealed {}
-+    impl Sealed for super::Ready {}
-+    impl Sealed for super::Modifying {}
-+}
-+
-+/// Builder for constructing an [`AlterTableTransaction`] with schema evolution operations.
-+///
-+/// Uses a type-state pattern (`S`) to enforce at compile time:
-+/// - At least one schema operation must be queued before `build()` is callable.
-+/// - Only operations valid for the current state can be chained. This will disallow incompatible
-+///   chaining.
-+pub struct AlterTableTransactionBuilder<S = Ready> {
-+    snapshot: SnapshotRef,
-+    operations: Vec<SchemaOperation>,
-+    // PhantomData marker for builder state (Ready or Modifying).
-+    // Zero-sized; only affects which methods are available at compile time.
-+    _state: PhantomData<S>,
-+}
-+
-+impl<S> AlterTableTransactionBuilder<S> {
-+    // Reconstructs the builder with a different PhantomData marker, changing which methods
-+    // are available at compile time (e.g. Ready -> Modifying enables `build()`). All real
-+    // fields are moved as-is; only the zero-sized type state changes.
-+    //
-+    // `T` (distinct from the struct's `S`) lets the caller pick the target state:
-+    // `self.transition::<Modifying>()` returns `AlterTableTransactionBuilder<Modifying>`.
-+    fn transition<T>(self) -> AlterTableTransactionBuilder<T> {
-+        AlterTableTransactionBuilder {
-+            snapshot: self.snapshot,
-+            operations: self.operations,
-+            _state: PhantomData,
-+        }
-+    }
-+}
-+
-+impl AlterTableTransactionBuilder<Ready> {
-+    /// Create a new builder from a snapshot.
-+    pub(crate) fn new(snapshot: SnapshotRef) -> Self {
-+        AlterTableTransactionBuilder {
-+            snapshot,
-+            operations: Vec::new(),
-+            _state: PhantomData,
-+        }
-+    }
-+}
-+
-+impl<S: Chainable> AlterTableTransactionBuilder<S> {
-+    /// Add a new top-level column to the table schema.
-+    ///
-+    /// The field must not already exist in the schema (case-insensitive). The field must be
-+    /// nullable because existing data files do not contain this column and will read NULL for it.
-+    /// These constraints are validated during [`build()`](AlterTableTransactionBuilder::build).
-+    pub fn add_column(mut self, field: StructField) -> AlterTableTransactionBuilder<Modifying> {
-+        self.operations.push(SchemaOperation::AddColumn { field });
-+        self.transition()
-+    }
-+}
-+
-+impl AlterTableTransactionBuilder<Modifying> {
-+    /// Validate and apply schema operations, then build the [`AlterTableTransaction`].
-+    ///
-+    /// This method:
-+    /// 1. Validates the table supports writes
-+    /// 2. Applies each operation sequentially against the evolving schema
-+    /// 3. Constructs new Metadata action with evolved schema
-+    /// 4. Builds the evolved table configuration
-+    /// 5. Creates the transaction
-+    ///
-+    /// # Errors
-+    ///
-+    /// - Any individual operation fails validation (see per-method errors above)
-+    /// - Table does not support writes (unsupported features)
-+    /// - The evolved schema requires protocol features not enabled on the table (e.g. adding a
-+    ///   `timestampNtz` column without the `timestampNtz` feature)
-+    pub fn build(
-+        self,
-+        _engine: &dyn Engine,
-+        committer: Box<dyn Committer>,
-+    ) -> DeltaResult<AlterTableTransaction> {
-+        let table_config = self.snapshot.table_configuration();
-+        // Rejects writes to tables kernel can't safely commit to: writer version out of
-+        // kernel's supported range, unsupported writer features, or schemas with SQL-expression
-+        // invariants. Runs on the pre-alter snapshot; future ALTER variants that change the
-+        // protocol must also re-check this on the evolved `TableConfiguration`.
-+        table_config.ensure_operation_supported(Operation::Write)?;
-+
-+        let schema = Arc::unwrap_or_clone(table_config.logical_schema());
-+        let SchemaEvolutionResult {
-+            schema: evolved_schema,
-+        } = apply_schema_operations(schema, self.operations, table_config.column_mapping_mode())?;
-+
-+        let evolved_metadata = table_config
-+            .metadata()
-+            .clone()
-+            .with_schema(evolved_schema.clone())?;
-+
-+        // Validates the evolved metadata against the protocol.
-+        let evolved_table_config = TableConfiguration::try_new_with_schema(
-+            table_config,
-+            evolved_metadata,
-+            evolved_schema,
-+        )?;
-+
-+        AlterTableTransaction::try_new_alter_table(self.snapshot, evolved_table_config, committer)
-+    }
-+}
\ No newline at end of file
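
End-to-end, the builder above is used roughly as in its doc example; a sketch assuming an existing `snapshot` (`Arc<Snapshot>`), `engine` (`&dyn Engine`), and `committer` (`Box<dyn Committer>`), so it is a fragment rather than a runnable program:

```rust
// Sketch only -- setup of `snapshot`, `engine`, and `committer` is assumed.
let field = StructField::nullable("email", DataType::STRING);
let txn = snapshot
    .alter_table()              // AlterTableTransactionBuilder<Ready>
    .add_column(field)          // -> AlterTableTransactionBuilder<Modifying>; build() now available
    .build(engine, committer)?; // validates the ops and yields an AlterTableTransaction
// Committing `txn` then produces a metadata-only commit carrying the evolved schema.
```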
kernel/src/transaction/builder/create_table.rs
@@ -1,27 +0,0 @@
-diff --git a/kernel/src/transaction/builder/create_table.rs b/kernel/src/transaction/builder/create_table.rs
---- a/kernel/src/transaction/builder/create_table.rs
-+++ b/kernel/src/transaction/builder/create_table.rs
- use crate::clustering::{create_clustering_domain_metadata, validate_clustering_columns};
- use crate::committer::Committer;
- use crate::expressions::ColumnName;
--use crate::schema::validation::validate_schema_for_create;
-+use crate::schema::validation::validate_schema;
- use crate::schema::variant_utils::schema_contains_variant_type;
- use crate::schema::{
-     normalize_column_names_to_schema_casing, schema_contains_non_null_fields, DataType, SchemaRef,
- /// compatible with Spark readers/writers.
- ///
- /// Explicit `delta.invariants` metadata annotations are rejected by
--/// `validate_schema_for_create`, so this only flips on the feature for nullability-driven
-+/// `validate_schema`, so this only flips on the feature for nullability-driven
- /// invariants. Kernel does not itself enforce the null mask at write time -- it relies on
- /// the engine's `ParquetHandler` to do so. Kernel's default `ParquetHandler` uses
- /// `arrow-rs`, whose `RecordBatch::try_new` rejects null values in fields marked
-             maybe_apply_column_mapping_for_table_create(&self.schema, &mut validated)?;
- 
-         // Validate schema (non-empty, column names, duplicates, no `delta.invariants` metadata)
--        validate_schema_for_create(&effective_schema, column_mapping_mode)?;
-+        validate_schema(&effective_schema, column_mapping_mode)?;
- 
-         // Validate data layout and resolve column names (physical for clustering, logical
-         // for partitioning). Adds required table features for clustering.
\ No newline at end of file
kernel/src/transaction/builder/mod.rs
@@ -1,8 +0,0 @@
-diff --git a/kernel/src/transaction/builder/mod.rs b/kernel/src/transaction/builder/mod.rs
---- a/kernel/src/transaction/builder/mod.rs
-+++ b/kernel/src/transaction/builder/mod.rs
- // and for tests. Also allow dead_code since these are used by integration tests.
- #![allow(unreachable_pub, dead_code)]
- 
-+pub mod alter_table;
- pub mod create_table;
\ No newline at end of file
kernel/src/transaction/mod.rs
@@ -1,35 +0,0 @@
-diff --git a/kernel/src/transaction/mod.rs b/kernel/src/transaction/mod.rs
---- a/kernel/src/transaction/mod.rs
-+++ b/kernel/src/transaction/mod.rs
- #[cfg(not(feature = "internal-api"))]
- pub(crate) mod data_layout;
- 
-+pub(crate) mod alter_table;
-+pub use alter_table::AlterTableTransaction;
- mod commit_info;
- mod domain_metadata;
-+pub(crate) mod schema_evolution;
- mod stats_verifier;
- mod update;
- mod write_context;
- #[derive(Debug)]
- pub struct CreateTable;
- 
-+/// Marker type for alter-table (schema evolution) transactions.
-+///
-+/// Transactions in this state perform metadata-only commits. Data file operations are not
-+/// available at compile time because `AlterTable` does not implement [`SupportsDataFiles`].
-+#[derive(Debug)]
-+pub struct AlterTable;
-+
- /// Marker trait for transaction states that support data file operations.
- ///
- /// Only transaction types that implement this trait can access methods for adding, removing, or
- 
-     // Note: Additional test coverage for partial file matching (where some files in a scan
-     // have DV updates but others don't) is provided by the end-to-end integration test
--    // kernel/tests/dv.rs and kernel/tests/write.rs, which exercises
-+    // kernel/tests/dv.rs and kernel/tests/write_remove_dv.rs, which exercise
-     // the full deletion vector write workflow including the DvMatchVisitor logic.
- 
-     #[test]
\ No newline at end of file
kernel/src/transaction/schema_evolution.rs
@@ -1,190 +0,0 @@
-diff --git a/kernel/src/transaction/schema_evolution.rs b/kernel/src/transaction/schema_evolution.rs
-new file mode 100644
---- /dev/null
-+++ b/kernel/src/transaction/schema_evolution.rs
-+//! Schema evolution operations for ALTER TABLE.
-+//!
-+//! This module defines the [`SchemaOperation`] enum and the [`apply_schema_operations`] function
-+//! that validates and applies schema changes to produce an evolved schema.
-+
-+use indexmap::IndexMap;
-+
-+use crate::error::Error;
-+use crate::schema::validation::validate_schema;
-+use crate::schema::{SchemaRef, StructField, StructType};
-+use crate::table_features::ColumnMappingMode;
-+use crate::DeltaResult;
-+
-+/// A schema evolution operation to be applied during ALTER TABLE.
-+///
-+/// Operations are validated and applied in order during
-+/// [`apply_schema_operations`]. Each operation sees the schema state after all prior operations
-+/// have been applied.
-+#[derive(Debug, Clone)]
-+pub(crate) enum SchemaOperation {
-+    /// Add a top-level column.
-+    AddColumn { field: StructField },
-+}
-+
-+/// The result of applying schema operations.
-+#[derive(Debug)]
-+pub(crate) struct SchemaEvolutionResult {
-+    /// The evolved schema after all operations are applied.
-+    pub schema: SchemaRef,
-+}
-+
-+/// Applies a sequence of schema operations to the given schema, returning the evolved schema.
-+///
-+/// Operations are applied sequentially: each one validates against and modifies the schema
-+/// produced by all preceding operations, not the original input schema.
-+///
-+/// # Errors
-+///
-+/// Returns an error if any operation fails validation. The error message identifies which
-+/// operation failed and why.
-+pub(crate) fn apply_schema_operations(
-+    schema: StructType,
-+    operations: Vec<SchemaOperation>,
-+    column_mapping_mode: ColumnMappingMode,
-+) -> DeltaResult<SchemaEvolutionResult> {
-+    let cm_enabled = column_mapping_mode != ColumnMappingMode::None;
-+    // IndexMap preserves field insertion order. Keys are lowercased for case-insensitive
-+    // duplicate detection; StructFields retain their original casing.
-+    let mut fields: IndexMap<String, StructField> = schema
-+        .into_fields()
-+        .map(|f| (f.name().to_lowercase(), f))
-+        .collect();
-+
-+    for op in operations {
-+        match op {
-+            // Protocol feature checks for the field's data type (e.g. `timestampNtz`) happen
-+            // later when the caller builds a new TableConfiguration from the evolved schema --
-+            // the alter is rejected if the table doesn't already have the required feature
-+            // enabled. This matches Spark, which also rejects with
-+            // `DELTA_FEATURES_REQUIRE_MANUAL_ENABLEMENT` and requires the user to enable the
-+            // feature explicitly before adding such a column.
-+            SchemaOperation::AddColumn { field } => {
-+                // TODO: support column mapping for add_column (assign ID + physical name,
-+                // update delta.columnMapping.maxColumnId).
-+                if cm_enabled {
-+                    return Err(Error::unsupported(
-+                        "ALTER TABLE add_column is not yet supported on tables with \
-+                         column mapping enabled",
-+                    ));
-+                }
-+                if field.is_metadata_column() {
-+                    return Err(Error::schema(format!(
-+                        "Cannot add column '{}': metadata columns are not allowed in \
-+                         a table schema",
-+                        field.name()
-+                    )));
-+                }
-+                let key = field.name().to_lowercase();
-+                if fields.contains_key(&key) {
-+                    return Err(Error::schema(format!(
-+                        "Cannot add column '{}': a column with that name already exists",
-+                        field.name()
-+                    )));
-+                }
-+                // Validate field is nullable (Delta protocol requires added columns to be
-+                // nullable so existing data files can return NULL for the new column)
-+                // NOTE: non-nullable columns depend on invariants feature
-+                if !field.is_nullable() {
-+                    return Err(Error::schema(format!(
-+                        "Cannot add non-nullable column '{}'. Added columns must be nullable \
-+                         because existing data files do not contain this column.",
-+                        field.name()
-+                    )));
-+                }
-+                fields.insert(key, field);
-+            }
-+        }
-+    }
-+
-+    let evolved_schema = StructType::try_new(fields.into_values())?;
-+
-+    validate_schema(&evolved_schema, column_mapping_mode)?;
-+    Ok(SchemaEvolutionResult {
-+        schema: evolved_schema.into(),
-+    })
-+}
-+
-+#[cfg(test)]
-+mod tests {
-+    use rstest::rstest;
-+
-+    use super::*;
-+    use crate::schema::{DataType, MetadataColumnSpec, StructField, StructType};
-+
-+    fn simple_schema() -> StructType {
-+        StructType::try_new(vec![
-+            StructField::not_null("id", DataType::INTEGER),
-+            StructField::nullable("name", DataType::STRING),
-+        ])
-+        .unwrap()
-+    }
-+
-+    fn add_col(name: &str, nullable: bool) -> SchemaOperation {
-+        let field = if nullable {
-+            StructField::nullable(name, DataType::STRING)
-+        } else {
-+            StructField::not_null(name, DataType::STRING)
-+        };
-+        SchemaOperation::AddColumn { field }
-+    }
-+
-+    // Builds a struct column whose nested leaf field has the given name. Used to prove that
-+    // `validate_schema` (not just the top-level dup check or `StructType::try_new`) is
-+    // reached from `apply_schema_operations`.
-+    fn add_struct_with_nested_leaf(name: &str, leaf_name: &str) -> SchemaOperation {
-+        let inner =
-+            StructType::try_new(vec![StructField::nullable(leaf_name, DataType::STRING)]).unwrap();
-+        SchemaOperation::AddColumn {
-+            field: StructField::nullable(name, inner),
-+        }
-+    }
-+
-+    #[rstest]
-+    #[case::dup_exact(vec![add_col("name", true)], "already exists")]
-+    #[case::dup_case_insensitive(vec![add_col("Name", true)], "already exists")]
-+    #[case::dup_within_batch(
-+        vec![add_col("email", true), add_col("email", true)],
-+        "already exists"
-+    )]
-+    #[case::non_nullable(vec![add_col("age", false)], "non-nullable")]
-+    #[case::invalid_parquet_char(vec![add_col("foo,bar", true)], "invalid character")]
-+    #[case::nested_invalid_parquet_char(
-+        vec![add_struct_with_nested_leaf("addr", "bad,leaf")],
-+        "invalid character"
-+    )]
-+    #[case::metadata_column(
-+        vec![SchemaOperation::AddColumn {
-+            field: StructField::create_metadata_column("row_idx", MetadataColumnSpec::RowIndex),
-+        }],
-+        "metadata columns are not allowed"
-+    )]
-+    fn apply_schema_operations_rejects(
-+        #[case] ops: Vec<SchemaOperation>,
-+        #[case] error_contains: &str,
-+    ) {
-+        let err =
-+            apply_schema_operations(simple_schema(), ops, ColumnMappingMode::None).unwrap_err();
-+        assert!(err.to_string().contains(error_contains));
-+    }
-+
-+    #[rstest]
-+    #[case::single(vec![add_col("email", true)], &["id", "name", "email"])]
-+    #[case::multiple(
-+        vec![add_col("email", true), add_col("age", true)],
-+        &["id", "name", "email", "age"]
-+    )]
-+    fn apply_schema_operations_succeeds(
-+        #[case] ops: Vec<SchemaOperation>,
-+        #[case] expected_names: &[&str],
-+    ) {
-+        let result =
-+            apply_schema_operations(simple_schema(), ops, ColumnMappingMode::None).unwrap();
-+        let actual: Vec<&str> = result.schema.fields().map(|f| f.name().as_str()).collect();
-+        assert_eq!(&actual, expected_names);
-+    }
-+}
\ No newline at end of file
kernel/tests/README.md
@@ -1,31 +0,0 @@
-diff --git a/kernel/tests/README.md b/kernel/tests/README.md
---- a/kernel/tests/README.md
-+++ b/kernel/tests/README.md
- 
- | Table | Location | Schema | Protocol (R/W) | Features | Description | Tests |
- |-------|----------|--------|----------|----------|-------------|-------|
--| `table-with-dv-small` | data/ | `value: int` | v3/v7 | r:`deletionVectors` w:`deletionVectors` | 10 rows, 2 soft-deleted by DV, 8 visible. Most heavily referenced test table. | `dv.rs::test_table_scan(with_dv)`, `write.rs::test_remove_files_adds_expected_entries`, `write.rs::test_update_deletion_vectors_adds_expected_entries`, `read.rs::with_predicate_and_removes`, `path.rs::test_to_uri/test_child/test_child_escapes`, `snapshot.rs::test_snapshot_read_metadata/test_new_snapshot/test_snapshot_new_from/test_read_table_with_missing_last_checkpoint/test_log_compaction_writer`, `deletion_vector.rs` tests, `transaction/mod.rs::setup_dv_enabled_table/test_add_files_schema/test_new_deletion_vector_path`, `default/parquet.rs` read test, `default/json.rs` read test, `log_compaction/tests.rs::create_mock_snapshot`, `resolve_dvs.rs` tests |
-+| `table-with-dv-small` | data/ | `value: int` | v3/v7 | r:`deletionVectors` w:`deletionVectors` | 10 rows, 2 soft-deleted by DV, 8 visible. Most heavily referenced test table. | `dv.rs::test_table_scan(with_dv)`, `write_remove_dv.rs::test_remove_files_adds_expected_entries`, `write_remove_dv.rs::test_update_deletion_vectors_adds_expected_entries`, `read.rs::with_predicate_and_removes`, `path.rs::test_to_uri/test_child/test_child_escapes`, `snapshot.rs::test_snapshot_read_metadata/test_new_snapshot/test_snapshot_new_from/test_read_table_with_missing_last_checkpoint/test_log_compaction_writer`, `deletion_vector.rs` tests, `transaction/mod.rs::setup_dv_enabled_table/test_add_files_schema/test_new_deletion_vector_path`, `default/parquet.rs` read test, `default/json.rs` read test, `log_compaction/tests.rs::create_mock_snapshot`, `resolve_dvs.rs` tests |
- | `table-without-dv-small` | data/ | `value: long` | v1/v2 | | 10 rows, all visible. Companion to table-with-dv-small. | `dv.rs::test_table_scan(without_dv)`, `transaction/mod.rs::setup_non_dv_table/create_existing_table_txn/test_commit_io_error_returns_retryable_transaction`, `sequential_phase.rs::test_sequential_v2_with_commits_only/test_sequential_finish_before_exhaustion_error`, `parallel_phase.rs` tests, `scan/tests.rs::test_scan_metadata_paths/test_scan_metadata/test_scan_metadata_from_same_version` |
- | `with-short-dv` | data/ | `id: long, value: string, timestamp: timestamp, rand: double` | v3/v7 | r:`deletionVectors` w:`deletionVectors` | 2 files x 5 rows. First file has inline DV (`storageType="u"`) deleting 3 rows. | `read.rs::short_dv` |
- | `dv-partitioned-with-checkpoint` | golden_data/ | `value: int, part: int` partitioned by `part` | v3/v7 | r:`deletionVectors` w:`deletionVectors` | DVs on a partitioned table with a checkpoint | `golden_tables.rs::golden_test!` |
- 
- | Table | Location | Schema | Protocol (R/W) | Features | Description | Tests |
- |-------|----------|--------|----------|----------|-------------|-------|
--| `partition_cm/none` | data/ | `value: int, category: string` partitioned by `category` | v1/v1 | `columnMapping.mode=none` | Partitioned write with CM disabled | `write.rs::test_column_mapping_partitioned_write(cm_none)` |
--| `partition_cm/id` | data/ | `value: int, category: string` partitioned by `category` | v3/v7 | r:`columnMapping` w:`columnMapping`, `columnMapping.mode=id` | Partitioned write with CM id mode | `write.rs::test_column_mapping_partitioned_write(cm_id)` |
--| `partition_cm/name` | data/ | `value: int, category: string` partitioned by `category` | v3/v7 | r:`columnMapping` w:`columnMapping`, `columnMapping.mode=name` | Partitioned write with CM name mode | `write.rs::test_column_mapping_partitioned_write(cm_name)` |
-+| `partition_cm/none` | data/ | `value: int, category: string` partitioned by `category` | v1/v1 | `columnMapping.mode=none` | Partitioned write with CM disabled | `write_column_mapping.rs::test_column_mapping_partitioned_write(cm_none)` |
-+| `partition_cm/id` | data/ | `value: int, category: string` partitioned by `category` | v3/v7 | r:`columnMapping` w:`columnMapping`, `columnMapping.mode=id` | Partitioned write with CM id mode | `write_column_mapping.rs::test_column_mapping_partitioned_write(cm_id)` |
-+| `partition_cm/name` | data/ | `value: int, category: string` partitioned by `category` | v3/v7 | r:`columnMapping` w:`columnMapping`, `columnMapping.mode=name` | Partitioned write with CM name mode | `write_column_mapping.rs::test_column_mapping_partitioned_write(cm_name)` |
- | `table-with-columnmapping-mode-name` | golden_data/ | `ByteType: byte, ShortType: short, IntegerType: int, LongType: long, FloatType: float, DoubleType: double, decimal: decimal(10,2), BooleanType: boolean, StringType: string, BinaryType: binary, DateType: date, TimestampType: timestamp, nested_struct: struct{aa: string, ac: struct{aca: int}}, array_of_prims: array<int>, array_of_arrays: array<array<int>>, array_of_structs: array<struct{ab: long}>, map_of_prims: map<int,long>, map_of_rows: map<int,struct{ab: long}>, map_of_arrays: map<long,array<int>>` | v2/v5 | `columnMapping.mode=name` | Column mapping name mode | `golden_tables.rs::golden_test!` |
- | `table-with-columnmapping-mode-id` | golden_data/ | `ByteType: byte, ShortType: short, IntegerType: int, LongType: long, FloatType: float, DoubleType: double, decimal: decimal(10,2), BooleanType: boolean, StringType: string, BinaryType: binary, DateType: date, TimestampType: timestamp, nested_struct: struct{aa: string, ac: struct{aca: int}}, array_of_prims: array<int>, array_of_arrays: array<array<int>>, array_of_structs: array<struct{ab: long}>, map_of_prims: map<int,long>, map_of_rows: map<int,struct{ab: long}>, map_of_arrays: map<long,array<int>>` | v2/v5 | `columnMapping.mode=id` | Column mapping id mode | `golden_tables.rs::golden_test!` |
- 
- | Table | Location | Schema | Protocol (R/W) | Features | Description | Tests |
- |-------|----------|--------|----------|----------|-------------|-------|
- | `with_checkpoint_no_last_checkpoint` | data/ | `letter: string, int: long, date: date` | v1/v2 | `checkpointInterval=2` | Checkpoint at v2 but missing `_last_checkpoint` hint file | `snapshot.rs::test_read_table_with_checkpoint`, `scan/tests.rs::test_scan_with_checkpoint`, `sequential_phase.rs::test_sequential_checkpoint_no_commits`, `checkpoint_manifest.rs` tests, `sync/parquet.rs` test, `default/parquet.rs` test |
--| `external-table-different-nullability` | data/ | `i: int` | v1/v2 | `checkpointInterval=2` | Parquet files have different nullability than Delta schema; includes checkpoint | `write.rs::test_checkpoint_non_kernel_written_table` |
-+| `external-table-different-nullability` | data/ | `i: int` | v1/v2 | `checkpointInterval=2` | Parquet files have different nullability than Delta schema; includes checkpoint | `write_clustered.rs::test_checkpoint_non_kernel_written_table` |
- | `checkpoint` | golden_data/ | `intCol: int` | v1/v2 | | Basic checkpoint read | `golden_tables.rs::golden_test!(checkpoint_test)` |
- | `corrupted-last-checkpoint-kernel` | golden_data/ | `id: long` | v1/v2 | | Corrupted `_last_checkpoint` file | `golden_tables.rs::golden_test!` |
- | `multi-part-checkpoint` | golden_data/ | `id: long` | v1/v2 | `checkpointInterval=1` | Multi-part checkpoint files | `golden_tables.rs::golden_test!` |
\ No newline at end of file

... (truncated, output exceeded 60000 bytes)

Reproduce locally: git range-diff ac9dc19..12c1bf6 6486bd2..10b79aa | Disable: git config gitstack.push-range-diff false

@lorenarosati lorenarosati requested a review from OussamaSaoudi May 1, 2026 19:04
Comment thread kernel/src/schema/mod.rs
))
)]
#[case(
"geography(EPSG:4326, vincenty)",
Collaborator

Can you add test cases for all the other interpolation algorithms?
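If it helps, here is a sketch of what one such case might look like, following the shape of the `vincenty` case above. Only `Spherical` and `Vincenty` are visible as algorithm variants in this PR, so the `"spherical"` spelling and any further variants are assumptions to check against the RFC.

```rust
// Sketch only: one additional rstest case mirroring the "vincenty" case above. The expected
// value is built with GeographyType::try_new (added by this PR); the "spherical" spelling
// is an assumption, not taken from this PR.
#[case(
    "geography(EPSG:4326, spherical)",
    DataType::Primitive(PrimitiveType::Geography(Box::new(
        GeographyType::try_new(Some("EPSG:4326"), Some(EdgeInterpolationAlgorithm::Spherical))
            .unwrap(),
    )))
)]
```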

);

// Geospatial is not supported for writes
let config = create_mock_table_config(&[], &[TableFeature::GeospatialType]);
Collaborator

There should be an equivalent test to ensure Read and CDF support. Could you add that?

Comment thread kernel/src/schema/mod.rs
.map_err(serde::de::Error::custom)
}
None => {
let trimmed = inner.trim();
Collaborator

hmm this would technically allow geography( OGC:CRS84 )

Maybe add a todo comment to reevaluate if that's an issue.
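A tiny illustration of the concern, using only standard-library behavior (not code from this PR):

```rust
// The padded form parses because trimming silently drops the interior padding around the CRS.
let geo = "geography( OGC:CRS84 )";
let inner = &geo["geography(".len()..geo.len() - 1]; // " OGC:CRS84 "
assert_eq!(inner.trim(), "OGC:CRS84"); // accepted despite the extra whitespace
```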

Comment thread kernel/src/schema/mod.rs Outdated
///
/// Returns `Err` if `srid` is empty.
pub fn try_new(srid: &str) -> DeltaResult<Self> {
// We only check that the SRID is non-empty; validating the value against the full set
Collaborator

@OussamaSaoudi OussamaSaoudi May 1, 2026

CRS value can be specified in one of the following formats:
- A standard authority and identifier (`<authority>:<identifier>`), e.g.:
    - `OGC:CRS84`
    - `EPSG:3857`
- A custom definition, which can be provided in one of two ways:
    - Using a Spatial Reference System Identifier (SRID), e.g. `srid:<number>`.
    - Using a projjson reference to a table property where the projjson string is stored, e.g. `projjson:<tableProperty>`.

add validation that it's of form authority:identifier pls.

maybe tests for srid:, foo, :, and empty string, :CRS84
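Expressed against `GeometryType::try_new`, those rejection cases would look roughly like the sketch below (the follow-up range-diff posted later on this page adds a `validate_srid` helper plus near-identical rstest cases):

```rust
// Sketch only: the rejection cases requested above, plus two accepted AUTHORITY:CODE values
// taken from the format list. GeometryType::try_new returns Err for invalid SRIDs.
for bad in ["foo", "srid:", ":", "", ":CRS84"] {
    assert!(GeometryType::try_new(bad).is_err(), "expected '{bad}' to be rejected");
}
assert!(GeometryType::try_new("OGC:CRS84").is_ok());
assert!(GeometryType::try_new("EPSG:3857").is_ok());
```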

Comment thread kernel/src/schema/mod.rs
))
)]
#[case(
"geography(vincenty)",
Collaborator

is this a valid test? Protocol just says:

In the schema the geospatial types are serialized as:

  • geometry(<crs>)
  • geography(<crs>, <algorithm>)
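
Concretely, with the CRS and algorithm values used elsewhere in this PR, those two shapes look like:

```rust
// Illustrative only: concrete instances of the two serialized shapes quoted above.
let _geometry = "geometry(OGC:CRS84)";              // geometry(<crs>)
let _geography = "geography(EPSG:4326, vincenty)";  // geography(<crs>, <algorithm>)
```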

Collaborator

I think we should just reject if single-element is not crs ==> geography(<algorithm>) not allowed

Collaborator

This also removes default CRS above.

Collaborator

@dengsh12 dengsh12 left a comment

Thanks for iterating! I agree with @OussamaSaoudi's comments, and have one more concern on the create-table side. Also, since geo is supported by scan, could we include an integration test that reads a geo table using different combinations of CRS + algorithm?

}
// Geometry/Geography are not valid partition column types, so there is no
// partition-value string format to parse here
// Kernel does not support parsing text into Geometry/Geography types
Collaborator

NIT: // Kernel does not support parsing text into Geometry/Geography types yet

Comment thread kernel/src/schema/mod.rs
"geography" => Ok(PrimitiveType::Geography(Box::default())),
geo_str if geo_str.starts_with("geography(") && geo_str.ends_with(')') => {
let inner = &geo_str[10..geo_str.len() - 1];
// Three accepted shapes:
Collaborator

Let's add a short comment stating that this follows the convention from kernel-java, so that if we revisit it in the future we know why.

r#"Feature 'typeWidening' is not supported for writes"#,
);

// Geospatial is not supported for writes
Collaborator

NIT:
// Geospatial is not supported for writes yet

Comment thread kernel/src/expressions/scalars.rs Outdated
_ => unreachable!(),
}
}
// Geometry/Geography are not valid partition column types, so there is no
Collaborator

geo follows the same level of checks as other types that can't be partition values do -

This doesn't seem to be quite the same level of checks? Map, Array, and Variant partition columns are rejected at create table (`validate_partition_columns`), but geo types are not.
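For concreteness, here is a sketch of the kind of check being suggested. `validate_partition_columns` is referenced above but its body is not shown in this PR, so the helper below is hypothetical; it only reuses type names that appear in the range-diffs (Variant is omitted because its representation is not shown here).

```rust
// Hypothetical sketch (as if inside the kernel crate): reject geo-typed partition columns
// at create-table time, alongside the other non-partitionable shapes mentioned above.
use crate::schema::{DataType, PrimitiveType};

fn is_allowed_partition_type(data_type: &DataType) -> bool {
    !matches!(
        data_type,
        DataType::Primitive(PrimitiveType::Geometry(_) | PrimitiveType::Geography(_))
            | DataType::Array(_)
            | DataType::Map(_)
    )
}
```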

@lorenarosati
Collaborator Author

Range-diff: main (10b79aa -> 52c80b6)
kernel/src/engine/ensure_data_types.rs
@@ -1,13 +1,23 @@
 diff --git a/kernel/src/engine/ensure_data_types.rs b/kernel/src/engine/ensure_data_types.rs
 --- a/kernel/src/engine/ensure_data_types.rs
 +++ b/kernel/src/engine/ensure_data_types.rs
- #[internal_api]
- pub(crate) enum ValidationMode {
-     /// Check types only. Struct fields are matched by ordinal position, not by name.
--    /// Nullability and metadata are not checked. Used by the expression evaluator where
--    /// column mapping can cause physical/logical name mismatches.
-+    /// Nullability and metadata are not checked.
-+    #[allow(dead_code)]
-     TypesOnly,
-     /// Check types and match struct fields by name, but skip nullability and metadata.
-     /// Used by the parquet reader where fields are already resolved by name upstream.
\ No newline at end of file
+ use super::arrow_conversion::TryIntoArrow as _;
+ use crate::arrow::datatypes::{DataType as ArrowDataType, Field as ArrowField, TimeUnit};
+ use crate::engine::arrow_utils::make_arrow_error;
+-use crate::schema::{DataType, MetadataValue, StructField};
++use crate::schema::{DataType, MetadataValue, PrimitiveType, StructField};
+ use crate::utils::require;
+ use crate::{DeltaResult, Error};
+ 
+             | (&DataType::BINARY, ArrowDataType::LargeBinary)
+             | (&DataType::BINARY, ArrowDataType::BinaryView)
+             | (&DataType::BINARY, ArrowDataType::Binary) => Ok(DataTypeCompat::Identical),
++            // Geometry and Geography values are stored as WKB bytes in a Binary array; the
++            // kernel schema carries the geo annotation while the physical Arrow type is Binary.
++            (
++                DataType::Primitive(PrimitiveType::Geometry(_) | PrimitiveType::Geography(_)),
++                ArrowDataType::Binary | ArrowDataType::LargeBinary | ArrowDataType::BinaryView,
++            ) => Ok(DataTypeCompat::Identical),
+             (DataType::Array(inner_type), ArrowDataType::List(arrow_list_field))
+             | (DataType::Array(inner_type), ArrowDataType::LargeList(arrow_list_field))
+             | (DataType::Array(inner_type), ArrowDataType::ListView(arrow_list_field))
\ No newline at end of file
kernel/src/error.rs
@@ -4,13 +4,9 @@
      #[error("Invalid decimal: {0}")]
      InvalidDecimal(String),
  
-+    /// Invalid srid for a GeometryType
-+    #[error("Invalid geometry: {0}")]
-+    InvalidGeometry(String),
-+
-+    /// Invalid srid for a GeographyType
-+    #[error("Invalid geography: {0}")]
-+    InvalidGeography(String),
++    /// Invalid SRID or other parameter for a Geometry / Geography type
++    #[error("Invalid geo parameters: {0}")]
++    InvalidGeoParams(String),
 +
      /// Inconsistent data passed to struct scalar
      #[error("Invalid struct data: {0}")]
@@ -18,11 +14,8 @@
      pub fn invalid_decimal(msg: impl ToString) -> Self {
          Self::InvalidDecimal(msg.to_string())
      }
-+    pub fn invalid_geometry(msg: impl ToString) -> Self {
-+        Self::InvalidGeometry(msg.to_string())
-+    }
-+    pub fn invalid_geography(msg: impl ToString) -> Self {
-+        Self::InvalidGeography(msg.to_string())
++    pub fn invalid_geo_params(msg: impl ToString) -> Self {
++        Self::InvalidGeoParams(msg.to_string())
 +    }
      pub fn invalid_struct_data(msg: impl ToString) -> Self {
          Self::InvalidStructData(msg.to_string())
kernel/src/schema/mod.rs
@@ -7,6 +7,32 @@
 +/// Default spatial reference identifier for geometry and geography types.
 +pub const DEFAULT_GEO_SRID: &str = "OGC:CRS84";
 +
++/// Validates that an SRID is in AUTHORITY:CODE form: contains a colon, has non-empty text
++/// before and after it, and has no leading or trailing whitespace. Validating the value
++/// against the full set of recognized SRIDs is future work.
++fn validate_srid(srid: &str) -> DeltaResult<()> {
++    require!(
++        srid == srid.trim(),
++        Error::invalid_geo_params(format!(
++            "SRID '{srid}' must not have leading or trailing whitespace"
++        ))
++    );
++    let (authority, code) = srid.split_once(':').ok_or_else(|| {
++        Error::invalid_geo_params(format!("SRID '{srid}' must be in 'AUTHORITY:CODE' format"))
++    })?;
++    require!(
++        !authority.is_empty(),
++        Error::invalid_geo_params(format!(
++            "SRID '{srid}' must have an authority before the colon"
++        ))
++    );
++    require!(
++        !code.is_empty(),
++        Error::invalid_geo_params(format!("SRID '{srid}' must have a code after the colon"))
++    );
++    Ok(())
++}
++
 +/// Algorithm used to interpolate edges between two vertices of a geography path.
 +#[derive(Debug, Clone, PartialEq, Eq, Hash, Serialize, Deserialize)]
 +pub enum EdgeInterpolationAlgorithm {
@@ -52,16 +78,11 @@
 +}
 +
 +impl GeometryType {
-+    /// Constructs a [`GeometryType`] from the given SRID. Use [`GeometryType::default`] to
-+    /// build with [`DEFAULT_GEO_SRID`] (`OGC:CRS84`).
-+    ///
-+    /// Returns `Err` if `srid` is empty.
++    /// Constructs a GeometryType from the given SRID, or returns an error if the SRID is
++    /// not in AUTHORITY:CODE form. Use GeometryType::default to build with the default
++    /// SRID (OGC:CRS84).
 +    pub fn try_new(srid: &str) -> DeltaResult<Self> {
-+        // We only check that the SRID is non-empty; validating the value against the full set
-+        // of (1000+) recognized SRIDs is future work.
-+        if srid.is_empty() {
-+            return Err(Error::invalid_geometry("SRID cannot be empty"));
-+        }
++        validate_srid(srid)?;
 +        Ok(Self {
 +            srid: srid.to_string(),
 +        })
@@ -94,23 +115,17 @@
 +}
 +
 +impl GeographyType {
-+    /// Constructs a GeographyType. Pass `None` for either argument to use the default:
-+    /// SRID defaults to DEFAULT_GEO_SRID (`OGC:CRS84`); algorithm defaults to
-+    /// EdgeInterpolationAlgorithm::Spherical.
-+    ///
-+    /// Returns `Err` if `srid` is `Some("")` (empty string is not a valid SRID).
++    /// Constructs a GeographyType. Pass None for either argument to use the default: SRID
++    /// defaults to OGC:CRS84; algorithm defaults to Spherical. Returns an error if srid is
++    /// Some(...) but not in AUTHORITY:CODE form.
 +    pub fn try_new(
 +        srid: Option<&str>,
 +        algorithm: Option<EdgeInterpolationAlgorithm>,
 +    ) -> DeltaResult<Self> {
-+        // We only check that the SRID is non-empty; validating the value against the full set
-+        // of (1000+) recognized SRIDs is future work.
 +        let srid = match srid {
 +            None => DEFAULT_GEO_SRID.to_string(),
 +            Some(s) => {
-+                if s.is_empty() {
-+                    return Err(Error::invalid_geography("SRID cannot be empty"));
-+                }
++                validate_srid(s)?;
 +                s.to_string()
 +            }
 +        };
@@ -363,8 +378,8 @@
 +    #[case("geography(unknown_algo)", "Unknown edge interpolation algorithm")]
 +    #[case("geometry(EPSG:4326", "Unsupported Delta table type")]
 +    #[case("geographyz", "Unsupported Delta table type")]
-+    #[case("geometry()", "SRID cannot be empty")]
-+    #[case("geography(, vincenty)", "SRID cannot be empty")]
++    #[case("geometry()", "must be in 'AUTHORITY:CODE' format")]
++    #[case("geography(, vincenty)", "must be in 'AUTHORITY:CODE' format")]
 +    fn test_invalid_geo_format(#[case] invalid_type: &str, #[case] expected_error: &str) {
 +        let data = format!(
 +            r#"{{
@@ -383,6 +398,42 @@
 +        );
 +    }
 +
++    #[rstest]
++    #[case::no_colon("foo")]
++    #[case::empty_after_colon("srid:")]
++    #[case::colon_only(":")]
++    #[case::empty("")]
++    #[case::empty_before_colon(":CRS84")]
++    #[case::leading_whitespace(" EPSG:4326")]
++    #[case::trailing_whitespace("EPSG:4326 ")]
++    #[case::surrounding_whitespace(" EPSG:4326 ")]
++    fn test_geometry_try_new_rejects_invalid_srid(#[case] srid: &str) {
++        let err =
++            GeometryType::try_new(srid).expect_err(&format!("expected '{srid}' to be rejected"));
++        assert!(
++            err.to_string().contains("SRID"),
++            "expected SRID error for '{srid}', got: {err}"
++        );
++    }
++
++    #[rstest]
++    #[case::no_colon("foo")]
++    #[case::empty_after_colon("srid:")]
++    #[case::colon_only(":")]
++    #[case::empty("")]
++    #[case::empty_before_colon(":CRS84")]
++    #[case::leading_whitespace(" EPSG:4326")]
++    #[case::trailing_whitespace("EPSG:4326 ")]
++    #[case::surrounding_whitespace(" EPSG:4326 ")]
++    fn test_geography_try_new_rejects_invalid_srid(#[case] srid: &str) {
++        let err = GeographyType::try_new(Some(srid), None)
++            .expect_err(&format!("expected '{srid}' to be rejected"));
++        assert!(
++            err.to_string().contains("SRID"),
++            "expected SRID error for '{srid}', got: {err}"
++        );
++    }
++
      #[rstest]
      #[case(
          r#"{"type": "array", "elementType": "integer", "containsNull": false}"#,
\ No newline at end of file
kernel/src/table_configuration.rs
@@ -9,38 +9,6 @@
      validate_timestamp_ntz_feature_support, ColumnMappingMode, EnablementCheck, FeatureRequirement,
      FeatureType, KernelSupport, Operation, TableFeature, LEGACY_READER_FEATURES,
      LEGACY_WRITER_FEATURES, MAX_VALID_READER_VERSION, MAX_VALID_WRITER_VERSION,
-         version: Version,
-     ) -> DeltaResult<Self> {
-         let logical_schema = Arc::new(metadata.parse_schema()?);
-+        Self::try_new_inner(metadata, protocol, table_root, version, logical_schema)
-+    }
-+
-+    /// Like [`try_new`](Self::try_new), but reuses `base`'s protocol, table root, and version
-+    /// and takes a pre-parsed `logical_schema`.
-+    pub(crate) fn try_new_with_schema(
-+        base: &Self,
-+        metadata: Metadata,
-+        logical_schema: SchemaRef,
-+    ) -> DeltaResult<Self> {
-+        Self::try_new_inner(
-+            metadata,
-+            base.protocol.clone(),
-+            base.table_root.clone(),
-+            base.version,
-+            logical_schema,
-+        )
-+    }
-+
-+    fn try_new_inner(
-+        metadata: Metadata,
-+        protocol: Protocol,
-+        table_root: Url,
-+        version: Version,
-+        logical_schema: SchemaRef,
-+    ) -> DeltaResult<Self> {
-         let table_properties = metadata.parse_table_properties();
-         let column_mapping_mode = column_mapping_mode(&protocol, &table_properties);
- 
  
          // Validate schema against protocol features now that we have a TC instance.
          validate_timestamp_ntz_feature_support(&table_config)?;
kernel/src/actions/mod.rs
@@ -1,32 +0,0 @@
-diff --git a/kernel/src/actions/mod.rs b/kernel/src/actions/mod.rs
---- a/kernel/src/actions/mod.rs
-+++ b/kernel/src/actions/mod.rs
- }
- 
- // Serde derives are needed for CRC file deserialization (see `crc::reader`).
-+//
-+// TODO(#2446): `Metadata` stores the schema only as a JSON string. Callers that already hold
-+// a parsed `SchemaRef` (e.g. CREATE TABLE) serialize into `schema_string` and then re-parse
-+// downstream in `TableConfiguration::try_new` via `parse_schema()`. Caching the parsed schema
-+// on `Metadata` would eliminate the round-trip.
- #[derive(Debug, Default, Clone, PartialEq, Eq, Serialize, Deserialize, ToSchema)]
- #[serde(rename_all = "camelCase")]
- #[internal_api]
-         TableProperties::from(self.configuration.iter())
-     }
- 
-+    /// Returns a new Metadata with the schema replaced, preserving all other fields.
-+    ///
-+    /// # Errors
-+    ///
-+    /// Returns an error if schema serialization fails.
-+    pub(crate) fn with_schema(self, schema: SchemaRef) -> DeltaResult<Self> {
-+        Ok(Self {
-+            schema_string: serde_json::to_string(&schema)?,
-+            ..self
-+        })
-+    }
-+
-     #[cfg(test)]
-     #[allow(clippy::too_many_arguments)]
-     pub(crate) fn new_unchecked(
\ No newline at end of file
kernel/src/engine/arrow_expression/evaluate_expression.rs
@@ -1,154 +0,0 @@
-diff --git a/kernel/src/engine/arrow_expression/evaluate_expression.rs b/kernel/src/engine/arrow_expression/evaluate_expression.rs
---- a/kernel/src/engine/arrow_expression/evaluate_expression.rs
-+++ b/kernel/src/engine/arrow_expression/evaluate_expression.rs
-         (Literal(scalar), _) => {
-             validate_array_type(scalar.to_array(batch.num_rows())?, result_type)
-         }
--        (Column(name), _) => {
--            // Column extraction uses ordinal-based struct validation because column mapping
--            // can cause physical/logical name mismatches. apply_schema handles renaming.
--            let arr = extract_column(batch, name)?;
--            if let Some(expected) = result_type {
--                ensure_data_types(expected, arr.data_type(), ValidationMode::TypesOnly)?;
--            }
--            Ok(arr)
--        }
-+        (Column(name), _) => validate_array_type(extract_column(batch, name)?, result_type),
-         (Struct(fields, nullability), Some(DataType::Struct(output_schema))) => {
-             evaluate_struct_expression(fields, batch, output_schema, nullability.as_ref())
-         }
-     }
- 
-     #[test]
--    fn column_extract_struct_with_mismatched_field_names() {
-+    fn column_extract_struct_rejects_mismatched_field_names() {
-         let batch = make_struct_batch(
-             vec![
-                 ArrowField::new("col-abc-001", ArrowDataType::Int64, true),
-             ],
-         );
- 
--        // Logical names differ from physical names due to column mapping
-         let logical_type = DataType::try_struct_type([
-             StructField::nullable("my_column", DataType::LONG),
-             StructField::nullable("other_column", DataType::LONG),
- 
-         let expr = column_expr!("stats");
-         let result = evaluate_expression(&expr, &batch, Some(&logical_type));
--
--        // Ordinal-based validation passes: same field count and types by position.
--        // The downstream apply_schema transformation handles renaming.
--        let arr = result.expect("should succeed with mismatched names but matching types");
--        let struct_arr = arr.as_any().downcast_ref::<StructArray>().unwrap();
--        assert_eq!(struct_arr.num_columns(), 2);
--        assert_eq!(struct_arr.len(), 2);
--    }
--
--    #[test]
--    fn column_extract_struct_rejects_mismatched_field_count() {
--        let batch = make_struct_batch(
--            vec![ArrowField::new("col-abc-001", ArrowDataType::Int64, true)],
--            vec![Arc::new(Int64Array::from(vec![Some(1), Some(2)]))],
--        );
--
--        let logical_type = DataType::try_struct_type([
--            StructField::nullable("a", DataType::LONG),
--            StructField::nullable("b", DataType::LONG),
--        ])
--        .unwrap();
--
--        let expr = column_expr!("stats");
--        let result = evaluate_expression(&expr, &batch, Some(&logical_type));
--        assert_result_error_with_message(result, "Struct field count mismatch");
-+        assert_result_error_with_message(result, "Missing Struct fields");
-     }
- 
-     #[test]
-     fn column_extract_struct_rejects_mismatched_child_types() {
-         let batch = make_struct_batch(
-             vec![
--                ArrowField::new("col-abc-001", ArrowDataType::Int64, true),
--                ArrowField::new("col-abc-002", ArrowDataType::Utf8, true),
-+                ArrowField::new("a", ArrowDataType::Int64, true),
-+                ArrowField::new("b", ArrowDataType::Utf8, true),
-             ],
-             vec![
-                 Arc::new(Int64Array::from(vec![Some(1)])),
-             ],
-         );
- 
--        // Expect two LONG columns, but the second arrow field is Utf8
-         let logical_type = DataType::try_struct_type([
-             StructField::nullable("a", DataType::LONG),
-             StructField::nullable("b", DataType::LONG),
-     }
- 
-     #[test]
--    fn column_extract_struct_with_matching_names_still_works() {
-+    fn column_extract_struct_with_matching_names_works() {
-         let batch = make_struct_batch(
-             vec![
-                 ArrowField::new("a", ArrowDataType::Int64, true),
-         assert!(result.is_ok());
-     }
- 
--    /// Exercises the exact code path from `get_add_transform_expr` where a `struct_from`
--    /// expression wraps `column_expr!("add.stats_parsed")`. When the checkpoint parquet has
--    /// stats_parsed with physical column names (e.g. `col-abc-001`) but the output schema
--    /// uses logical names (e.g. `id`), `evaluate_struct_expression` calls
--    /// `evaluate_expression(Column, struct_result_type)` with mismatched field names.
--    /// Without ordinal-based validation this fails with a name mismatch error.
-+    /// When a `struct_from` expression wraps a `Column` referencing stats_parsed, and the
-+    /// checkpoint parquet has physical column names (e.g. `col-abc-001`) but the output schema
-+    /// uses logical names (e.g. `id`), name-based validation correctly rejects the mismatch.
-     #[test]
--    fn struct_from_with_column_tolerates_nested_name_mismatch() {
--        // Build a batch mimicking checkpoint data: add.stats_parsed uses physical names
-+    fn struct_from_with_column_rejects_nested_name_mismatch() {
-         let stats_fields: Vec<ArrowField> = vec![
-             ArrowField::new("col-abc-001", ArrowDataType::Int64, true),
-             ArrowField::new("col-abc-002", ArrowDataType::Int64, true),
-         )]);
-         let batch = RecordBatch::try_new(Arc::new(schema), vec![Arc::new(add_struct)]).unwrap();
- 
--        // struct_from mimicking get_add_transform_expr: wraps a Column referencing stats_parsed
-         let expr = Expr::struct_from([
-             column_expr_ref!("add.path"),
-             column_expr_ref!("add.stats_parsed"),
-         .unwrap();
- 
-         let result = evaluate_expression(&expr, &batch, Some(&output_type));
--        result.expect("struct_from with Column sub-expression should tolerate field name mismatch");
--    }
--
--    #[test]
--    fn column_extract_nested_struct_with_mismatched_names() {
--        let inner_fields = vec![ArrowField::new("phys-inner", ArrowDataType::Int64, true)];
--        let inner_struct = ArrowDataType::Struct(inner_fields.clone().into());
--        let batch = make_struct_batch(
--            vec![ArrowField::new("phys-outer", inner_struct, true)],
--            vec![Arc::new(
--                StructArray::try_new(
--                    inner_fields.into(),
--                    vec![Arc::new(Int64Array::from(vec![Some(42)]))],
--                    None,
--                )
--                .unwrap(),
--            )],
--        );
--
--        let logical_type = DataType::try_struct_type([StructField::nullable(
--            "logical_outer",
--            DataType::struct_type_unchecked([StructField::nullable(
--                "logical_inner",
--                DataType::LONG,
--            )]),
--        )])
--        .unwrap();
--
--        let expr = column_expr!("stats");
--        let result = evaluate_expression(&expr, &batch, Some(&logical_type));
--        assert!(result.is_ok());
-+        assert_result_error_with_message(result, "Missing Struct fields");
-     }
- }
\ No newline at end of file
kernel/src/schema/validation.rs
@@ -1,48 +0,0 @@
-diff --git a/kernel/src/schema/validation.rs b/kernel/src/schema/validation.rs
---- a/kernel/src/schema/validation.rs
-+++ b/kernel/src/schema/validation.rs
--//! Schema validation utilities for Delta table creation.
-+//! Schema validation utilities shared by table creation and schema evolution.
- //!
- //! Validates schemas per the Delta protocol specification.
- 
- /// These characters have special meaning in Parquet schema syntax.
- const INVALID_PARQUET_CHARS: &[char] = &[' ', ',', ';', '{', '}', '(', ')', '\n', '\t', '='];
- 
--/// Validates a schema for table creation.
-+/// Validates a schema for CREATE TABLE or ALTER TABLE.
- ///
- /// Performs the following checks:
- /// 1. Schema is non-empty
- /// 3. Column names contain only valid characters
- /// 4. Rejects fields with `delta.invariants` metadata (SQL expression invariants are not supported
- ///    by kernel; see `TableConfiguration::ensure_write_supported`)
--pub(crate) fn validate_schema_for_create(
-+pub(crate) fn validate_schema(
-     schema: &StructType,
-     column_mapping_mode: ColumnMappingMode,
- ) -> DeltaResult<()> {
-     #[case::dot_in_name_with_cm(schema_with_dot(), ColumnMappingMode::Name)]
-     #[case::different_struct_children(schema_different_struct_children(), ColumnMappingMode::None)]
-     fn valid_schema_accepted(#[case] schema: StructType, #[case] cm: ColumnMappingMode) {
--        assert!(validate_schema_for_create(&schema, cm).is_ok());
-+        assert!(validate_schema(&schema, cm).is_ok());
-     }
- 
-     // === Invalid schemas ===
-         #[case] cm: ColumnMappingMode,
-         #[case] expected_errs: &[&str],
-     ) {
--        let result = validate_schema_for_create(&schema, cm);
-+        let result = validate_schema(&schema, cm);
-         assert!(result.is_err());
-         let err = result.unwrap_err().to_string();
-         for expected in expected_errs {
-     #[case::array_nested(schema_array_nested_invariant(), "arr.child")]
-     #[case::map_nested(schema_map_nested_invariant(), "map.child")]
-     fn invariants_metadata_rejected(#[case] schema: StructType, #[case] expected_path: &str) {
--        let result = validate_schema_for_create(&schema, ColumnMappingMode::None);
-+        let result = validate_schema(&schema, ColumnMappingMode::None);
-         let err = result.expect_err("expected delta.invariants metadata rejection");
-         let msg = err.to_string();
-         assert!(
\ No newline at end of file
kernel/src/snapshot/mod.rs
@@ -1,27 +0,0 @@
-diff --git a/kernel/src/snapshot/mod.rs b/kernel/src/snapshot/mod.rs
---- a/kernel/src/snapshot/mod.rs
-+++ b/kernel/src/snapshot/mod.rs
- use crate::table_configuration::{InCommitTimestampEnablement, TableConfiguration};
- use crate::table_features::{physical_to_logical_column_name, ColumnMappingMode, TableFeature};
- use crate::table_properties::TableProperties;
-+use crate::transaction::builder::alter_table::AlterTableTransactionBuilder;
- use crate::transaction::Transaction;
- use crate::utils::require;
- use crate::{DeltaResult, Engine, Error, LogCompactionWriter, Version};
-         Transaction::try_new_existing_table(self, committer, engine)
-     }
- 
-+    /// Creates a builder for altering this table's metadata. Currently supports schema change
-+    /// operations.
-+    ///
-+    /// The returned builder allows chaining operations before building an
-+    /// [`AlterTableTransaction`] that can be committed.
-+    ///
-+    /// [`AlterTableTransaction`]: crate::transaction::AlterTableTransaction
-+    pub fn alter_table(self: Arc<Self>) -> AlterTableTransactionBuilder {
-+        AlterTableTransactionBuilder::new(self)
-+    }
-+
-     /// Fetch the latest version of the provided `application_id` for this snapshot. Filters the
-     /// txn based on the delta.setTransactionRetentionDuration property and lastUpdated.
-     ///
\ No newline at end of file
kernel/src/transaction/alter_table.rs
@@ -1,81 +0,0 @@
-diff --git a/kernel/src/transaction/alter_table.rs b/kernel/src/transaction/alter_table.rs
-new file mode 100644
---- /dev/null
-+++ b/kernel/src/transaction/alter_table.rs
-+//! Alter table transaction types and constructor.
-+//!
-+//! This module defines the [`AlterTableTransaction`] type alias and the
-+//! [`try_new_alter_table`](AlterTableTransaction::try_new_alter_table) constructor.
-+//! The builder logic lives in [`builder::alter_table`](super::builder::alter_table).
-+
-+#![allow(unreachable_pub)]
-+
-+use std::marker::PhantomData;
-+use std::sync::OnceLock;
-+
-+use crate::committer::Committer;
-+use crate::snapshot::SnapshotRef;
-+use crate::table_configuration::TableConfiguration;
-+use crate::transaction::{AlterTable, Transaction};
-+use crate::utils::current_time_ms;
-+use crate::DeltaResult;
-+
-+/// A type alias for alter-table transactions.
-+///
-+/// This provides a restricted API surface that only exposes operations valid during ALTER
-+/// commands. Data file operations are not available at compile time because `AlterTable`
-+/// does not implement [`SupportsDataFiles`](super::SupportsDataFiles).
-+pub type AlterTableTransaction = Transaction<AlterTable>;
-+
-+impl AlterTableTransaction {
-+    /// Create a new transaction for altering a table's schema. Produces a metadata-only commit
-+    /// that emits an updated Metadata action with the evolved schema.
-+    ///
-+    /// The `effective_table_config` is the evolved table configuration (new schema, same
-+    /// protocol). It must be fully validated before calling this constructor (e.g. schema
-+    /// operations applied, protocol feature checks passed). The `read_snapshot` provides the
-+    /// pre-commit table state (version, previous protocol/metadata, ICT timestamps) used for
-+    /// commit versioning and post-commit snapshots.
-+    ///
-+    /// This is typically called via `AlterTableTransactionBuilder::build()` rather than directly.
-+    pub(crate) fn try_new_alter_table(
-+        read_snapshot: SnapshotRef,
-+        effective_table_config: TableConfiguration,
-+        committer: Box<dyn Committer>,
-+    ) -> DeltaResult<Self> {
-+        let span = tracing::info_span!(
-+            "txn",
-+            path = %read_snapshot.table_root(),
-+            read_version = read_snapshot.version(),
-+            operation = "ALTER TABLE",
-+        );
-+
-+        Ok(Transaction {
-+            span,
-+            read_snapshot_opt: Some(read_snapshot),
-+            effective_table_config,
-+            should_emit_protocol: false,
-+            should_emit_metadata: true,
-+            committer,
-+            operation: Some("ALTER TABLE".to_string()),
-+            engine_info: None,
-+            add_files_metadata: vec![],
-+            remove_files_metadata: vec![],
-+            set_transactions: vec![],
-+            commit_timestamp: current_time_ms()?,
-+            user_domain_metadata_additions: vec![],
-+            system_domain_metadata_additions: vec![],
-+            user_domain_removals: vec![],
-+            data_change: false,
-+            shared_write_state: OnceLock::new(),
-+            engine_commit_info: None,
-+            // TODO(#2446): match delta-spark's per-op isBlindAppend policy
-+            // (ADD/DROP/DROP NOT NULL -> true, SET NOT NULL -> false). Hardcoded false for
-+            // now: safe, but misses the true-case optimization delta-spark applies.
-+            is_blind_append: false,
-+            dv_matched_files: vec![],
-+            physical_clustering_columns: None,
-+            _state: PhantomData,
-+        })
-+    }
-+}
\ No newline at end of file
kernel/src/transaction/builder/alter_table.rs
@@ -1,168 +0,0 @@
-diff --git a/kernel/src/transaction/builder/alter_table.rs b/kernel/src/transaction/builder/alter_table.rs
-new file mode 100644
---- /dev/null
-+++ b/kernel/src/transaction/builder/alter_table.rs
-+//! Builder for ALTER TABLE (schema evolution) transactions.
-+//!
-+//! This module contains [`AlterTableTransactionBuilder`], which uses a type-state pattern to
-+//! enforce valid operation chaining at compile time.
-+//!
-+//! # Type States
-+//!
-+//! - [`Ready`]: Initial state. Operations are available, but `build()` is not (at least one
-+//!   operation is required).
-+//! - [`Modifying`]: After any chainable schema operation. More ops can be chained, and `build()` is
-+//!   available. See [`AlterTableTransactionBuilder<Modifying>`] for ops.
-+//!
-+//! # Transitions
-+//!
-+//! Each `impl` block below is gated by a state bound and documents which operations that
-+//! state enables. Chainable schema operations live on `impl<S: Chainable>` and transition
-+//! the builder to a chainable state; `build()` lives on states that are buildable.
-+//!
-+//! ```ignore
-+//! // Allowed: at least one op queued before build().
-+//! snapshot.alter_table().add_column(field).build(engine, committer)?;
-+//!
-+//! // Not allowed: build() is not defined on Ready (no ops queued).
-+//! snapshot.alter_table().build(engine, committer)?;  // compile error
-+//! ```
-+
-+use std::marker::PhantomData;
-+use std::sync::Arc;
-+
-+use crate::committer::Committer;
-+use crate::schema::StructField;
-+use crate::snapshot::SnapshotRef;
-+use crate::table_configuration::TableConfiguration;
-+use crate::table_features::Operation;
-+use crate::transaction::alter_table::AlterTableTransaction;
-+use crate::transaction::schema_evolution::{
-+    apply_schema_operations, SchemaEvolutionResult, SchemaOperation,
-+};
-+use crate::{DeltaResult, Engine};
-+
-+/// Initial state: `build()` is not yet available (at least one operation is required).
-+/// See [`Chainable`] for the operations available on this state.
-+pub struct Ready;
-+
-+/// State after at least one operation has been added. `build()` is available.
-+/// See [`Chainable`] for the operations available on this state.
-+pub struct Modifying;
-+
-+/// Marker trait for builder states that accept chainable schema operations. Grouping states
-+/// under one bound lets each op (like `add_column`) live on a single `impl<S: Chainable>`
-+/// block -- chainable states share the body rather than duplicating it per state.
-+///
-+/// Sealed: external types cannot implement this, keeping the set of chainable states closed.
-+pub trait Chainable: sealed::Sealed {}
-+impl Chainable for Ready {}
-+impl Chainable for Modifying {}
-+
-+mod sealed {
-+    pub trait Sealed {}
-+    impl Sealed for super::Ready {}
-+    impl Sealed for super::Modifying {}
-+}
-+
-+/// Builder for constructing an [`AlterTableTransaction`] with schema evolution operations.
-+///
-+/// Uses a type-state pattern (`S`) to enforce at compile time:
-+/// - At least one schema operation must be queued before `build()` is callable.
-+/// - Only operations valid for the current state can be chained. This will disallow incompatible
-+///   chaining.
-+pub struct AlterTableTransactionBuilder<S = Ready> {
-+    snapshot: SnapshotRef,
-+    operations: Vec<SchemaOperation>,
-+    // PhantomData marker for builder state (Ready or Modifying).
-+    // Zero-sized; only affects which methods are available at compile time.
-+    _state: PhantomData<S>,
-+}
-+
-+impl<S> AlterTableTransactionBuilder<S> {
-+    // Reconstructs the builder with a different PhantomData marker, changing which methods
-+    // are available at compile time (e.g. Ready -> Modifying enables `build()`). All real
-+    // fields are moved as-is; only the zero-sized type state changes.
-+    //
-+    // `T` (distinct from the struct's `S`) lets the caller pick the target state:
-+    // `self.transition::<Modifying>()` returns `AlterTableTransactionBuilder<Modifying>`.
-+    fn transition<T>(self) -> AlterTableTransactionBuilder<T> {
-+        AlterTableTransactionBuilder {
-+            snapshot: self.snapshot,
-+            operations: self.operations,
-+            _state: PhantomData,
-+        }
-+    }
-+}
-+
-+impl AlterTableTransactionBuilder<Ready> {
-+    /// Create a new builder from a snapshot.
-+    pub(crate) fn new(snapshot: SnapshotRef) -> Self {
-+        AlterTableTransactionBuilder {
-+            snapshot,
-+            operations: Vec::new(),
-+            _state: PhantomData,
-+        }
-+    }
-+}
-+
-+impl<S: Chainable> AlterTableTransactionBuilder<S> {
-+    /// Add a new top-level column to the table schema.
-+    ///
-+    /// The field must not already exist in the schema (case-insensitive). The field must be
-+    /// nullable because existing data files do not contain this column and will read NULL for it.
-+    /// These constraints are validated during [`build()`](AlterTableTransactionBuilder::build).
-+    pub fn add_column(mut self, field: StructField) -> AlterTableTransactionBuilder<Modifying> {
-+        self.operations.push(SchemaOperation::AddColumn { field });
-+        self.transition()
-+    }
-+}
-+
-+impl AlterTableTransactionBuilder<Modifying> {
-+    /// Validate and apply schema operations, then build the [`AlterTableTransaction`].
-+    ///
-+    /// This method:
-+    /// 1. Validates the table supports writes
-+    /// 2. Applies each operation sequentially against the evolving schema
-+    /// 3. Constructs new Metadata action with evolved schema
-+    /// 4. Builds the evolved table configuration
-+    /// 5. Creates the transaction
-+    ///
-+    /// # Errors
-+    ///
-+    /// - Any individual operation fails validation (see per-method errors above)
-+    /// - Table does not support writes (unsupported features)
-+    /// - The evolved schema requires protocol features not enabled on the table (e.g. adding a
-+    ///   `timestampNtz` column without the `timestampNtz` feature)
-+    pub fn build(
-+        self,
-+        _engine: &dyn Engine,
-+        committer: Box<dyn Committer>,
-+    ) -> DeltaResult<AlterTableTransaction> {
-+        let table_config = self.snapshot.table_configuration();
-+        // Rejects writes to tables kernel can't safely commit to: writer version out of
-+        // kernel's supported range, unsupported writer features, or schemas with SQL-expression
-+        // invariants. Runs on the pre-alter snapshot; future ALTER variants that change the
-+        // protocol must also re-check this on the evolved `TableConfiguration`.
-+        table_config.ensure_operation_supported(Operation::Write)?;
-+
-+        let schema = Arc::unwrap_or_clone(table_config.logical_schema());
-+        let SchemaEvolutionResult {
-+            schema: evolved_schema,
-+        } = apply_schema_operations(schema, self.operations, table_config.column_mapping_mode())?;
-+
-+        let evolved_metadata = table_config
-+            .metadata()
-+            .clone()
-+            .with_schema(evolved_schema.clone())?;
-+
-+        // Validates the evolved metadata against the protocol.
-+        let evolved_table_config = TableConfiguration::try_new_with_schema(
-+            table_config,
-+            evolved_metadata,
-+            evolved_schema,
-+        )?;
-+
-+        AlterTableTransaction::try_new_alter_table(self.snapshot, evolved_table_config, committer)
-+    }
-+}
\ No newline at end of file
kernel/src/transaction/builder/create_table.rs
@@ -1,27 +0,0 @@
-diff --git a/kernel/src/transaction/builder/create_table.rs b/kernel/src/transaction/builder/create_table.rs
---- a/kernel/src/transaction/builder/create_table.rs
-+++ b/kernel/src/transaction/builder/create_table.rs
- use crate::clustering::{create_clustering_domain_metadata, validate_clustering_columns};
- use crate::committer::Committer;
- use crate::expressions::ColumnName;
--use crate::schema::validation::validate_schema_for_create;
-+use crate::schema::validation::validate_schema;
- use crate::schema::variant_utils::schema_contains_variant_type;
- use crate::schema::{
-     normalize_column_names_to_schema_casing, schema_contains_non_null_fields, DataType, SchemaRef,
- /// compatible with Spark readers/writers.
- ///
- /// Explicit `delta.invariants` metadata annotations are rejected by
--/// `validate_schema_for_create`, so this only flips on the feature for nullability-driven
-+/// `validate_schema`, so this only flips on the feature for nullability-driven
- /// invariants. Kernel does not itself enforce the null mask at write time -- it relies on
- /// the engine's `ParquetHandler` to do so. Kernel's default `ParquetHandler` uses
- /// `arrow-rs`, whose `RecordBatch::try_new` rejects null values in fields marked
-             maybe_apply_column_mapping_for_table_create(&self.schema, &mut validated)?;
- 
-         // Validate schema (non-empty, column names, duplicates, no `delta.invariants` metadata)
--        validate_schema_for_create(&effective_schema, column_mapping_mode)?;
-+        validate_schema(&effective_schema, column_mapping_mode)?;
- 
-         // Validate data layout and resolve column names (physical for clustering, logical
-         // for partitioning). Adds required table features for clustering.
\ No newline at end of file
kernel/src/transaction/builder/mod.rs
@@ -1,8 +0,0 @@
-diff --git a/kernel/src/transaction/builder/mod.rs b/kernel/src/transaction/builder/mod.rs
---- a/kernel/src/transaction/builder/mod.rs
-+++ b/kernel/src/transaction/builder/mod.rs
- // and for tests. Also allow dead_code since these are used by integration tests.
- #![allow(unreachable_pub, dead_code)]
- 
-+pub mod alter_table;
- pub mod create_table;
\ No newline at end of file
kernel/src/transaction/mod.rs
@@ -1,35 +0,0 @@
-diff --git a/kernel/src/transaction/mod.rs b/kernel/src/transaction/mod.rs
---- a/kernel/src/transaction/mod.rs
-+++ b/kernel/src/transaction/mod.rs
- #[cfg(not(feature = "internal-api"))]
- pub(crate) mod data_layout;
- 
-+pub(crate) mod alter_table;
-+pub use alter_table::AlterTableTransaction;
- mod commit_info;
- mod domain_metadata;
-+pub(crate) mod schema_evolution;
- mod stats_verifier;
- mod update;
- mod write_context;
- #[derive(Debug)]
- pub struct CreateTable;
- 
-+/// Marker type for alter-table (schema evolution) transactions.
-+///
-+/// Transactions in this state perform metadata-only commits. Data file operations are not
-+/// available at compile time because `AlterTable` does not implement [`SupportsDataFiles`].
-+#[derive(Debug)]
-+pub struct AlterTable;
-+
- /// Marker trait for transaction states that support data file operations.
- ///
- /// Only transaction types that implement this trait can access methods for adding, removing, or
- 
-     // Note: Additional test coverage for partial file matching (where some files in a scan
-     // have DV updates but others don't) is provided by the end-to-end integration test
--    // kernel/tests/dv.rs and kernel/tests/write.rs, which exercises
-+    // kernel/tests/dv.rs and kernel/tests/write_remove_dv.rs, which exercise
-     // the full deletion vector write workflow including the DvMatchVisitor logic.
- 
-     #[test]
\ No newline at end of file
kernel/src/transaction/schema_evolution.rs
@@ -1,190 +0,0 @@
-diff --git a/kernel/src/transaction/schema_evolution.rs b/kernel/src/transaction/schema_evolution.rs
-new file mode 100644
---- /dev/null
-+++ b/kernel/src/transaction/schema_evolution.rs
-+//! Schema evolution operations for ALTER TABLE.
-+//!
-+//! This module defines the [`SchemaOperation`] enum and the [`apply_schema_operations`] function
-+//! that validates and applies schema changes to produce an evolved schema.
-+
-+use indexmap::IndexMap;
-+
-+use crate::error::Error;
-+use crate::schema::validation::validate_schema;
-+use crate::schema::{SchemaRef, StructField, StructType};
-+use crate::table_features::ColumnMappingMode;
-+use crate::DeltaResult;
-+
-+/// A schema evolution operation to be applied during ALTER TABLE.
-+///
-+/// Operations are validated and applied in order during
-+/// [`apply_schema_operations`]. Each operation sees the schema state after all prior operations
-+/// have been applied.
-+#[derive(Debug, Clone)]
-+pub(crate) enum SchemaOperation {
-+    /// Add a top-level column.
-+    AddColumn { field: StructField },
-+}
-+
-+/// The result of applying schema operations.
-+#[derive(Debug)]
-+pub(crate) struct SchemaEvolutionResult {
-+    /// The evolved schema after all operations are applied.
-+    pub schema: SchemaRef,
-+}
-+
-+/// Applies a sequence of schema operations to the given schema, returning the evolved schema.
-+///
-+/// Operations are applied sequentially: each one validates against and modifies the schema
-+/// produced by all preceding operations, not the original input schema.
-+///
-+/// # Errors
-+///
-+/// Returns an error if any operation fails validation. The error message identifies which
-+/// operation failed and why.
-+pub(crate) fn apply_schema_operations(
-+    schema: StructType,
-+    operations: Vec<SchemaOperation>,
-+    column_mapping_mode: ColumnMappingMode,
-+) -> DeltaResult<SchemaEvolutionResult> {
-+    let cm_enabled = column_mapping_mode != ColumnMappingMode::None;
-+    // IndexMap preserves field insertion order. Keys are lowercased for case-insensitive
-+    // duplicate detection; StructFields retain their original casing.
-+    let mut fields: IndexMap<String, StructField> = schema
-+        .into_fields()
-+        .map(|f| (f.name().to_lowercase(), f))
-+        .collect();
-+
-+    for op in operations {
-+        match op {
-+            // Protocol feature checks for the field's data type (e.g. `timestampNtz`) happen
-+            // later when the caller builds a new TableConfiguration from the evolved schema --
-+            // the alter is rejected if the table doesn't already have the required feature
-+            // enabled. This matches Spark, which also rejects with
-+            // `DELTA_FEATURES_REQUIRE_MANUAL_ENABLEMENT` and requires the user to enable the
-+            // feature explicitly before adding such a column.
-+            SchemaOperation::AddColumn { field } => {
-+                // TODO: support column mapping for add_column (assign ID + physical name,
-+                // update delta.columnMapping.maxColumnId).
-+                if cm_enabled {
-+                    return Err(Error::unsupported(
-+                        "ALTER TABLE add_column is not yet supported on tables with \
-+                         column mapping enabled",
-+                    ));
-+                }
-+                if field.is_metadata_column() {
-+                    return Err(Error::schema(format!(
-+                        "Cannot add column '{}': metadata columns are not allowed in \
-+                         a table schema",
-+                        field.name()
-+                    )));
-+                }
-+                let key = field.name().to_lowercase();
-+                if fields.contains_key(&key) {
-+                    return Err(Error::schema(format!(
-+                        "Cannot add column '{}': a column with that name already exists",
-+                        field.name()
-+                    )));
-+                }
-+                // Validate field is nullable (Delta protocol requires added columns to be
-+                // nullable so existing data files can return NULL for the new column)
-+                // NOTE: non-nullable columns depend on invariants feature
-+                if !field.is_nullable() {
-+                    return Err(Error::schema(format!(
-+                        "Cannot add non-nullable column '{}'. Added columns must be nullable \
-+                         because existing data files do not contain this column.",
-+                        field.name()
-+                    )));
-+                }
-+                fields.insert(key, field);
-+            }
-+        }
-+    }
-+
-+    let evolved_schema = StructType::try_new(fields.into_values())?;
-+
-+    validate_schema(&evolved_schema, column_mapping_mode)?;
-+    Ok(SchemaEvolutionResult {
-+        schema: evolved_schema.into(),
-+    })
-+}
-+
-+#[cfg(test)]
-+mod tests {
-+    use rstest::rstest;
-+
-+    use super::*;
-+    use crate::schema::{DataType, MetadataColumnSpec, StructField, StructType};
-+
-+    fn simple_schema() -> StructType {
-+        StructType::try_new(vec![
-+            StructField::not_null("id", DataType::INTEGER),
-+            StructField::nullable("name", DataType::STRING),
-+        ])
-+        .unwrap()
-+    }
-+
-+    fn add_col(name: &str, nullable: bool) -> SchemaOperation {
-+        let field = if nullable {
-+            StructField::nullable(name, DataType::STRING)
-+        } else {
-+            StructField::not_null(name, DataType::STRING)
-+        };
-+        SchemaOperation::AddColumn { field }
-+    }
-+
-+    // Builds a struct column whose nested leaf field has the given name. Used to prove that
-+    // `validate_schema` (not just the top-level dup check or `StructType::try_new`) is
-+    // reached from `apply_schema_operations`.
-+    fn add_struct_with_nested_leaf(name: &str, leaf_name: &str) -> SchemaOperation {
-+        let inner =
-+            StructType::try_new(vec![StructField::nullable(leaf_name, DataType::STRING)]).unwrap();
-+        SchemaOperation::AddColumn {
-+            field: StructField::nullable(name, inner),
-+        }
-+    }
-+
-+    #[rstest]
-+    #[case::dup_exact(vec![add_col("name", true)], "already exists")]
-+    #[case::dup_case_insensitive(vec![add_col("Name", true)], "already exists")]
-+    #[case::dup_within_batch(
-+        vec![add_col("email", true), add_col("email", true)],
-+        "already exists"
-+    )]
-+    #[case::non_nullable(vec![add_col("age", false)], "non-nullable")]
-+    #[case::invalid_parquet_char(vec![add_col("foo,bar", true)], "invalid character")]
-+    #[case::nested_invalid_parquet_char(
-+        vec![add_struct_with_nested_leaf("addr", "bad,leaf")],
-+        "invalid character"
-+    )]
-+    #[case::metadata_column(
-+        vec![SchemaOperation::AddColumn {
-+            field: StructField::create_metadata_column("row_idx", MetadataColumnSpec::RowIndex),
-+        }],
-+        "metadata columns are not allowed"
-+    )]
-+    fn apply_schema_operations_rejects(
-+        #[case] ops: Vec<SchemaOperation>,
-+        #[case] error_contains: &str,
-+    ) {
-+        let err =
-+            apply_schema_operations(simple_schema(), ops, ColumnMappingMode::None).unwrap_err();
-+        assert!(err.to_string().contains(error_contains));
-+    }
-+
-+    #[rstest]
-+    #[case::single(vec![add_col("email", true)], &["id", "name", "email"])]
-+    #[case::multiple(
-+        vec![add_col("email", true), add_col("age", true)],
-+        &["id", "name", "email", "age"]
-+    )]
-+    fn apply_schema_operations_succeeds(
-+        #[case] ops: Vec<SchemaOperation>,
-+        #[case] expected_names: &[&str],
-+    ) {
-+        let result =
-+            apply_schema_operations(simple_schema(), ops, ColumnMappingMode::None).unwrap();
-+        let actual: Vec<&str> = result.schema.fields().map(|f| f.name().as_str()).collect();
-+        assert_eq!(&actual, expected_names);
-+    }
-+}
\ No newline at end of file
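
A minimal sketch of how in-crate code might drive the helper above, mirroring the test helpers in the diff (the schema, column names, and `ColumnMappingMode::None` here are illustrative assumptions, not part of the change itself):

    let starting_schema = StructType::try_new(vec![
        StructField::not_null("id", DataType::INTEGER),
        StructField::nullable("name", DataType::STRING),
    ])
    .unwrap();
    // Added columns must be nullable; a non-nullable field here would be rejected.
    let ops = vec![SchemaOperation::AddColumn {
        field: StructField::nullable("email", DataType::STRING),
    }];
    let result = apply_schema_operations(starting_schema, ops, ColumnMappingMode::None).unwrap();
    let names: Vec<&str> = result.schema.fields().map(|f| f.name().as_str()).collect();
    assert_eq!(names, ["id", "name", "email"]);
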
kernel/tests/README.md
@@ -1,31 +0,0 @@
-diff --git a/kernel/tests/README.md b/kernel/tests/README.md
---- a/kernel/tests/README.md
-+++ b/kernel/tests/README.md
- 
- | Table | Location | Schema | Protocol (R/W) | Features | Description | Tests |
- |-------|----------|--------|----------|----------|-------------|-------|
--| `table-with-dv-small` | data/ | `value: int` | v3/v7 | r:`deletionVectors` w:`deletionVectors` | 10 rows, 2 soft-deleted by DV, 8 visible. Most heavily referenced test table. | `dv.rs::test_table_scan(with_dv)`, `write.rs::test_remove_files_adds_expected_entries`, `write.rs::test_update_deletion_vectors_adds_expected_entries`, `read.rs::with_predicate_and_removes`, `path.rs::test_to_uri/test_child/test_child_escapes`, `snapshot.rs::test_snapshot_read_metadata/test_new_snapshot/test_snapshot_new_from/test_read_table_with_missing_last_checkpoint/test_log_compaction_writer`, `deletion_vector.rs` tests, `transaction/mod.rs::setup_dv_enabled_table/test_add_files_schema/test_new_deletion_vector_path`, `default/parquet.rs` read test, `default/json.rs` read test, `log_compaction/tests.rs::create_mock_snapshot`, `resolve_dvs.rs` tests |
-+| `table-with-dv-small` | data/ | `value: int` | v3/v7 | r:`deletionVectors` w:`deletionVectors` | 10 rows, 2 soft-deleted by DV, 8 visible. Most heavily referenced test table. | `dv.rs::test_table_scan(with_dv)`, `write_remove_dv.rs::test_remove_files_adds_expected_entries`, `write_remove_dv.rs::test_update_deletion_vectors_adds_expected_entries`, `read.rs::with_predicate_and_removes`, `path.rs::test_to_uri/test_child/test_child_escapes`, `snapshot.rs::test_snapshot_read_metadata/test_new_snapshot/test_snapshot_new_from/test_read_table_with_missing_last_checkpoint/test_log_compaction_writer`, `deletion_vector.rs` tests, `transaction/mod.rs::setup_dv_enabled_table/test_add_files_schema/test_new_deletion_vector_path`, `default/parquet.rs` read test, `default/json.rs` read test, `log_compaction/tests.rs::create_mock_snapshot`, `resolve_dvs.rs` tests |
- | `table-without-dv-small` | data/ | `value: long` | v1/v2 | | 10 rows, all visible. Companion to table-with-dv-small. | `dv.rs::test_table_scan(without_dv)`, `transaction/mod.rs::setup_non_dv_table/create_existing_table_txn/test_commit_io_error_returns_retryable_transaction`, `sequential_phase.rs::test_sequential_v2_with_commits_only/test_sequential_finish_before_exhaustion_error`, `parallel_phase.rs` tests, `scan/tests.rs::test_scan_metadata_paths/test_scan_metadata/test_scan_metadata_from_same_version` |
- | `with-short-dv` | data/ | `id: long, value: string, timestamp: timestamp, rand: double` | v3/v7 | r:`deletionVectors` w:`deletionVectors` | 2 files x 5 rows. First file has inline DV (`storageType="u"`) deleting 3 rows. | `read.rs::short_dv` |
- | `dv-partitioned-with-checkpoint` | golden_data/ | `value: int, part: int` partitioned by `part` | v3/v7 | r:`deletionVectors` w:`deletionVectors` | DVs on a partitioned table with a checkpoint | `golden_tables.rs::golden_test!` |
- 
- | Table | Location | Schema | Protocol (R/W) | Features | Description | Tests |
- |-------|----------|--------|----------|----------|-------------|-------|
--| `partition_cm/none` | data/ | `value: int, category: string` partitioned by `category` | v1/v1 | `columnMapping.mode=none` | Partitioned write with CM disabled | `write.rs::test_column_mapping_partitioned_write(cm_none)` |
--| `partition_cm/id` | data/ | `value: int, category: string` partitioned by `category` | v3/v7 | r:`columnMapping` w:`columnMapping`, `columnMapping.mode=id` | Partitioned write with CM id mode | `write.rs::test_column_mapping_partitioned_write(cm_id)` |
--| `partition_cm/name` | data/ | `value: int, category: string` partitioned by `category` | v3/v7 | r:`columnMapping` w:`columnMapping`, `columnMapping.mode=name` | Partitioned write with CM name mode | `write.rs::test_column_mapping_partitioned_write(cm_name)` |
-+| `partition_cm/none` | data/ | `value: int, category: string` partitioned by `category` | v1/v1 | `columnMapping.mode=none` | Partitioned write with CM disabled | `write_column_mapping.rs::test_column_mapping_partitioned_write(cm_none)` |
-+| `partition_cm/id` | data/ | `value: int, category: string` partitioned by `category` | v3/v7 | r:`columnMapping` w:`columnMapping`, `columnMapping.mode=id` | Partitioned write with CM id mode | `write_column_mapping.rs::test_column_mapping_partitioned_write(cm_id)` |
-+| `partition_cm/name` | data/ | `value: int, category: string` partitioned by `category` | v3/v7 | r:`columnMapping` w:`columnMapping`, `columnMapping.mode=name` | Partitioned write with CM name mode | `write_column_mapping.rs::test_column_mapping_partitioned_write(cm_name)` |
- | `table-with-columnmapping-mode-name` | golden_data/ | `ByteType: byte, ShortType: short, IntegerType: int, LongType: long, FloatType: float, DoubleType: double, decimal: decimal(10,2), BooleanType: boolean, StringType: string, BinaryType: binary, DateType: date, TimestampType: timestamp, nested_struct: struct{aa: string, ac: struct{aca: int}}, array_of_prims: array<int>, array_of_arrays: array<array<int>>, array_of_structs: array<struct{ab: long}>, map_of_prims: map<int,long>, map_of_rows: map<int,struct{ab: long}>, map_of_arrays: map<long,array<int>>` | v2/v5 | `columnMapping.mode=name` | Column mapping name mode | `golden_tables.rs::golden_test!` |
- | `table-with-columnmapping-mode-id` | golden_data/ | `ByteType: byte, ShortType: short, IntegerType: int, LongType: long, FloatType: float, DoubleType: double, decimal: decimal(10,2), BooleanType: boolean, StringType: string, BinaryType: binary, DateType: date, TimestampType: timestamp, nested_struct: struct{aa: string, ac: struct{aca: int}}, array_of_prims: array<int>, array_of_arrays: array<array<int>>, array_of_structs: array<struct{ab: long}>, map_of_prims: map<int,long>, map_of_rows: map<int,struct{ab: long}>, map_of_arrays: map<long,array<int>>` | v2/v5 | `columnMapping.mode=id` | Column mapping id mode | `golden_tables.rs::golden_test!` |
- 
- | Table | Location | Schema | Protocol (R/W) | Features | Description | Tests |
- |-------|----------|--------|----------|----------|-------------|-------|
- | `with_checkpoint_no_last_checkpoint` | data/ | `letter: string, int: long, date: date` | v1/v2 | `checkpointInterval=2` | Checkpoint at v2 but missing `_last_checkpoint` hint file | `snapshot.rs::test_read_table_with_checkpoint`, `scan/tests.rs::test_scan_with_checkpoint`, `sequential_phase.rs::test_sequential_checkpoint_no_commits`, `checkpoint_manifest.rs` tests, `sync/parquet.rs` test, `default/parquet.rs` test |
--| `external-table-different-nullability` | data/ | `i: int` | v1/v2 | `checkpointInterval=2` | Parquet files have different nullability than Delta schema; includes checkpoint | `write.rs::test_checkpoint_non_kernel_written_table` |
-+| `external-table-different-nullability` | data/ | `i: int` | v1/v2 | `checkpointInterval=2` | Parquet files have different nullability than Delta schema; includes checkpoint | `write_clustered.rs::test_checkpoint_non_kernel_written_table` |
- | `checkpoint` | golden_data/ | `intCol: int` | v1/v2 | | Basic checkpoint read | `golden_tables.rs::golden_test!(checkpoint_test)` |
- | `corrupted-last-checkpoint-kernel` | golden_data/ | `id: long` | v1/v2 | | Corrupted `_last_checkpoint` file | `golden_tables.rs::golden_test!` |
- | `multi-part-checkpoint` | golden_data/ | `id: long` | v1/v2 | `checkpointInterval=1` | Multi-part checkpoint files | `golden_tables.rs::golden_test!` |
\ No newline at end of file

... (truncated, output exceeded 60000 bytes)

Reproduce locally: git range-diff ac9dc19..10b79aa 6486bd2..52c80b6 | Disable: git config gitstack.push-range-diff false

Comment on lines 401 to +416
pub(crate) fn is_skipping_eligible_datatype(data_type: &PrimitiveType) -> bool {
matches!(
data_type,
&PrimitiveType::Byte
| &PrimitiveType::Short
| &PrimitiveType::Integer
| &PrimitiveType::Long
| &PrimitiveType::Float
| &PrimitiveType::Double
| &PrimitiveType::Date
| &PrimitiveType::Timestamp
| &PrimitiveType::TimestampNtz
| &PrimitiveType::String
| PrimitiveType::Decimal(_)
| PrimitiveType::Geometry(_)
| PrimitiveType::Geography(_)
Collaborator

Should geo be eligible for data skipping?
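
If the answer turns out to be no, one illustrative option (a sketch only, not what this PR does) would be to drop the two geo arms so that `Geometry` and `Geography` fall through to the default `false`:

    pub(crate) fn is_skipping_eligible_datatype(data_type: &PrimitiveType) -> bool {
        // Sketch: geo variants intentionally omitted, so they are not skipping-eligible.
        matches!(
            data_type,
            &PrimitiveType::Byte
                | &PrimitiveType::Short
                | &PrimitiveType::Integer
                | &PrimitiveType::Long
                | &PrimitiveType::Float
                | &PrimitiveType::Double
                | &PrimitiveType::Date
                | &PrimitiveType::Timestamp
                | &PrimitiveType::TimestampNtz
                | &PrimitiveType::String
                | PrimitiveType::Decimal(_)
        )
    }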


Labels

breaking-change: Public API change that could cause downstream compilation failures. Requires a major version bump.
