Skip to content

refactor: move data file methods behind SupportsDataFiles trait#2386

Merged
scottsand-db merged 1 commit intodelta-io:mainfrom
william-ch-databricks:stack/alter-table-2-supports-data-files
Apr 23, 2026
Merged

refactor: move data file methods behind SupportsDataFiles trait#2386
scottsand-db merged 1 commit intodelta-io:mainfrom
william-ch-databricks:stack/alter-table-2-supports-data-files

Conversation

@william-ch-databricks
Copy link
Copy Markdown
Collaborator

@william-ch-databricks william-ch-databricks commented Apr 14, 2026

Stacked PR

Use this link to review incremental changes.


What changes are proposed in this pull request?

Introduce a SupportsDataFiles marker trait to gate data-file methods at compile time. This prevents future metadata-only transaction types (like AlterTable) from accessing methods that add, remove, or configure data files.

Gated behind SupportsDataFiles (only ExistingTable and CreateTable):

Method Reason
stats_schema() Returns table-specific stats schema for writing files
stats_columns() Returns column names needing stats collection
generate_logical_to_physical() Builds partition column transform for file writing
get_write_context() Returns full write context (target dir, schemas, column mapping)
add_files() Queues file additions

Kept on impl<S> Transaction<S> (all transaction types):

Method Reason
commit() Shared commit path; handles empty data gracefully
with_data_change() / set_data_change() ALTER TABLE sets data_change: false
with_engine_info() Applicable to all commits
with_commit_info() Applicable to all commits
with_transaction_id() Applicable to all commits
with_domain_metadata() Schema changes may need domain metadata
add_files_schema() Read-only accessor; used by generate_adds() which handles empty state internally. Kept ungated so it remains available if it becomes dynamic in the future.
validate_add_files_stats() Internal; handles empty data with early return
generate_adds() Internal; returns early when no files queued
generate_remove_actions() Internal; returns early when no files queued
generate_dv_update_actions() Internal; returns early when no DV updates
generate_adds_for_dv_update() Internal; only called from generate_dv_update_actions
build_crc_delta() Internal; shared commit path
into_committed() / into_conflicted() / into_retryable() Internal; shared commit path

How was this change tested?

All existing tests pass unchanged. The trait adds compile-time safety with no runtime behavior change. A compile_fail doctest will be added in the follow-up PR that introduces AlterTable to verify that data file methods are rejected at compile time.

@william-ch-databricks william-ch-databricks force-pushed the stack/alter-table-2-supports-data-files branch 4 times, most recently from 1c48472 to 50130c3 Compare April 17, 2026 21:15
github-merge-queue Bot pushed a commit that referenced this pull request Apr 17, 2026
…2385)

## Stacked PR
Use this
[link](https://github.com/delta-io/delta-kernel-rs/pull/2385/files) to
review incremental changes.
-
[**stack/alter-table-1-refactor-state**](#2385)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/2385/files)]
-
[stack/alter-table-2-supports-data-files](#2386)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/2386/files/029d6672081d32caea314dca61f3688e841f3036..50130c30c4378997fbf3d0ef7426ce23992e5c85)]
-
[stack/alter-table-3-framework-add-column](#2387)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/2387/files/50130c30c4378997fbf3d0ef7426ce23992e5c85..eaa5277d025434aa78b79beed1a7cedfe82aa621)]
-
[stack/alter-table-4-set-nullable](#2388)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/2388/files/eaa5277d025434aa78b79beed1a7cedfe82aa621..7506e1273ef349fbf628e4a6db5d0e7c1b4cbd8b)]
-
[stack/alter-table-5-column-mapping-add](#2389)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/2389/files/7506e1273ef349fbf628e4a6db5d0e7c1b4cbd8b..d943ca2945da1ec9c354db57c889d93456381dee)]
-
[stack/alter-table-6-drop-column](#2390)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/2390/files/d943ca2945da1ec9c354db57c889d93456381dee..40e3a1ff9e2915b456d0f0a76a50eb67f27a1ab1)]
-
[stack/alter-table-7-rename-column](#2391)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/2391/files/40e3a1ff9e2915b456d0f0a76a50eb67f27a1ab1..628af3c02767f008a4e3c80a146aec7e5a3ac0b3)]

---------
## What changes are proposed in this pull request?

Splits Transaction's snapshot into two concerns:
- `read_snapshot_opt: Option<SnapshotRef>` -- the pre-commit table state
(None for CREATE TABLE)
- `effective_table_config: TableConfiguration` -- the config this commit
will produce

This separates "what did I read?" (conflict detection, post-commit
snapshots) from "what will
this commit produce?" (schema, protocol, stats, write context).
Write-path call sites read from
`effective_table_config`; read-path call sites use `read_snapshot()`.

Also adds `should_emit_protocol` / `should_emit_metadata` flags to
replace the old
`is_create_table()` checks for Protocol/Metadata action emission, and
replaces the synthetic
pre-commit snapshot in CREATE TABLE with direct `TableConfiguration`
construction.

This is a pure refactor with no behaviour change.

## How was this change tested?

All existing tests pass. Added unit tests for
`LogSegment::new_for_version_zero` (valid input,
non-zero version rejection, non-commit file rejection).
@william-ch-databricks william-ch-databricks force-pushed the stack/alter-table-2-supports-data-files branch from 50130c3 to 25e133d Compare April 17, 2026 21:59
@william-ch-databricks william-ch-databricks force-pushed the stack/alter-table-2-supports-data-files branch from 25e133d to 3c9a347 Compare April 22, 2026 18:38
Introduce a SupportsDataFiles marker trait implemented by ExistingTable
and CreateTable (but not the future AlterTable). Move public data file
methods (add_files_schema, stats_schema, stats_columns,
get_write_context, add_files) behind this trait bound.

Internal generate_* methods remain on impl<S> since they handle empty
data gracefully and are called from the shared commit path.

This enables compile-time prevention of data file operations on
metadata-only transaction types like AlterTable.
@william-ch-databricks william-ch-databricks force-pushed the stack/alter-table-2-supports-data-files branch from 3c9a347 to a71c8b8 Compare April 23, 2026 19:11
@scottsand-db scottsand-db enabled auto-merge April 23, 2026 19:42
@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 23, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 88.40%. Comparing base (453d8c7) to head (a71c8b8).
⚠️ Report is 5 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2386   +/-   ##
=======================================
  Coverage   88.40%   88.40%           
=======================================
  Files         175      175           
  Lines       57753    57753           
  Branches    57753    57753           
=======================================
  Hits        51059    51059           
  Misses       4750     4750           
  Partials     1944     1944           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@scottsand-db scottsand-db added this pull request to the merge queue Apr 23, 2026
Merged via the queue into delta-io:main with commit 4d52261 Apr 23, 2026
24 of 25 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants