
test: migrate a test to use the create table builder #2482

Merged
sanujbasu merged 1 commit into delta-io:main from sanujbasu:stack/materialize-partition-columns-cleanup
Apr 29, 2026

Conversation

@sanujbasu (Collaborator) commented Apr 28, 2026

What changes are proposed in this pull request?

Moves test_materialized_partition_columns_excluded_from_stats from kernel/tests/write/append.rs to its proper topic module kernel/tests/write/partitioned.rs, and switches its setup from the test_utils::create_table JSON helper (which hard-codes writerFeatures = ["materializePartitionColumns"]) to the kernel create_table builder using delta.feature.materializePartitionColumns=supported.
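
Concretely, the hand-rolled setup wrote a protocol action along these lines straight into the log (a sketch reconstructed from the old helper's arguments visible in the range-diff below — reader 3 / writer 7 protocol, empty readerFeatures; exact field contents are not guaranteed):

```json
{"protocol":{"minReaderVersion":3,"minWriterVersion":7,"readerFeatures":[],"writerFeatures":["materializePartitionColumns"]}}
```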

The migration removes the hand-rolled protocol from the test setup so the same path that connectors will use to enable the feature is exercised end-to-end. Test assertions are unchanged.
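
For reference, the new setup condenses to the builder call below (taken from the migrated test in the range-diff later in this thread; `test_table_setup_mt`, `create_table`, and `DataLayout` are the kernel test helpers used there, not a public-API sketch):

```rust
// Condensed from kernel/tests/write/partitioned.rs (full diff below).
let (_tmp_dir, table_path, engine) = test_table_setup_mt()?;
let _ = create_table(&table_path, table_schema.clone(), "test/1.0")
    .with_data_layout(DataLayout::partitioned([partition_col]))
    // Enable the writer feature via a table property, the same way a connector
    // would, instead of hand-rolling writerFeatures into the protocol action.
    .with_table_properties([("delta.feature.materializePartitionColumns", "supported")])
    .build(engine.as_ref(), Box::new(FileSystemCommitter::new()))?
    .commit(engine.as_ref())?;
```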

How was this change tested?

1/ Migrated test partitioned::test_materialized_partition_columns_excluded_from_stats passes on the new builder-based setup; assertions are unchanged.

2/ Local verification:

  • cargo +nightly fmt --check
  • cargo clippy --workspace --benches --tests --all-features -- -D warnings
  • cargo doc --workspace --all-features --no-deps
  • cargo test -p delta_kernel --all-features for affected paths
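
For example, the migrated test alone should be selectable with cargo's test-name filter (invocation assumed from the test name above):

```sh
cargo test -p delta_kernel --all-features test_materialized_partition_columns_excluded_from_stats
```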

@codecov bot commented Apr 28, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 88.43%. Comparing base (a13ef67) to head (a97b11b).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2482      +/-   ##
==========================================
- Coverage   88.43%   88.43%   -0.01%     
==========================================
  Files         178      178              
  Lines       58417    58566     +149     
==========================================
+ Hits        51660    51791     +131     
- Misses       4787     4788       +1     
- Partials     1970     1987      +17     

☔ View full report in Codecov by Sentry.

@sanujbasu force-pushed the stack/materialize-partition-columns-cleanup branch from 8136d72 to eef3a35 on April 28, 2026 05:56
@sanujbasu changed the title from "test: migrate test_materialized_partition_columns_excluded_from_stats to kernel builder" to "test: migrate a test to use the create table builder" on Apr 28, 2026
@sanujbasu force-pushed the stack/materialize-partition-columns-cleanup branch from eef3a35 to 348e8bb on April 28, 2026 06:20
@scottsand-db (Collaborator) left a comment


LGTM but left minor feedback on two code comments

Comment thread on kernel/tests/write/partitioned.rs (outdated)
Comment thread on kernel/tests/write/partitioned.rs (outdated)
@sanujbasu force-pushed the stack/materialize-partition-columns-cleanup branch 2 times, most recently from 1d0621a to 32eead5 on April 28, 2026 18:25
test: migrate test_materialized_partition_columns_excluded_from_stats to kernel builder

@sanujbasu (Collaborator, Author) commented

Range-diff: stack/materialize-partition-columns-create-table (32eead5 -> a97b11b)
kernel/tests/integration/write/append.rs
@@ -0,0 +1,111 @@
+diff --git a/kernel/tests/integration/write/append.rs b/kernel/tests/integration/write/append.rs
+--- a/kernel/tests/integration/write/append.rs
++++ b/kernel/tests/integration/write/append.rs
+ use delta_kernel::{DeltaResult, Error as KernelError, Snapshot};
+ use itertools::Itertools;
+ use serde_json::{json, Deserializer};
+-use test_utils::{create_table, engine_store_setup, set_json_value, setup_test_tables, test_read};
++use test_utils::{set_json_value, setup_test_tables, test_read};
+ 
+ use crate::common::write_utils::{
+     check_action_timestamps, get_and_check_all_parquet_sizes, get_simple_int_schema,
+     Ok(())
+ }
+ 
+-// Verify that materialized partition columns do not get stats collected. The partition column
+-// is physically present in the parquet file, but its stats should be omitted because the
+-// value is already in the Add action's `partitionValues`.
+-#[tokio::test]
+-async fn test_materialized_partition_columns_excluded_from_stats(
+-) -> Result<(), Box<dyn std::error::Error>> {
+-    let _ = tracing_subscriber::fmt::try_init();
+-
+-    let partition_col = "partition";
+-    let table_schema = Arc::new(StructType::try_new(vec![
+-        StructField::nullable("number", DataType::INTEGER),
+-        StructField::nullable("partition", DataType::STRING),
+-    ])?);
+-
+-    // Create a table with materializePartitionColumns writer feature
+-    let (store, engine, table_location) = engine_store_setup("test_mat_part", None);
+-    let table_url = create_table(
+-        store.clone(),
+-        table_location,
+-        table_schema.clone(),
+-        &[partition_col],
+-        true, // use_37_protocol for writer features
+-        vec![],
+-        vec!["materializePartitionColumns"],
+-    )
+-    .await?;
+-
+-    let engine = Arc::new(engine);
+-    let snapshot = Snapshot::builder_for(table_url.clone()).build(engine.as_ref())?;
+-    let mut txn = snapshot
+-        .transaction(Box::new(FileSystemCommitter::new()), engine.as_ref())?
+-        .with_engine_info("default engine");
+-
+-    // With materializePartitionColumns, the data batch includes the partition column
+-    let arrow_schema = Arc::new(table_schema.as_ref().try_into_arrow()?);
+-    let batch = RecordBatch::try_new(
+-        arrow_schema,
+-        vec![
+-            Arc::new(Int32Array::from(vec![1, 2, 3])),
+-            Arc::new(StringArray::from(vec!["a", "a", "a"])),
+-        ],
+-    )?;
+-    let data = Box::new(ArrowEngineData::new(batch));
+-
+-    let write_context = txn.partitioned_write_context(HashMap::from([(
+-        partition_col.to_string(),
+-        Scalar::String("a".into()),
+-    )]))?;
+-    let result = engine.write_parquet(&data, &write_context).await?;
+-    txn.add_files(result);
+-    assert!(txn.commit(engine.as_ref())?.is_committed());
+-
+-    // Read the commit log and verify stats
+-    let commit = store
+-        .get(&Path::from(
+-            "/test_mat_part/_delta_log/00000000000000000001.json",
+-        ))
+-        .await?;
+-    let parsed: Vec<serde_json::Value> = Deserializer::from_slice(&commit.bytes().await?)
+-        .into_iter::<serde_json::Value>()
+-        .try_collect()?;
+-
+-    let add = parsed
+-        .iter()
+-        .find(|v| v.get("add").is_some())
+-        .expect("should have an add action");
+-    let stats: serde_json::Value =
+-        serde_json::from_str(add["add"]["stats"].as_str().unwrap()).unwrap();
+-
+-    // Stats should contain the data column but NOT the partition column
+-    assert!(
+-        stats["minValues"].get("number").is_some(),
+-        "Data column 'number' should have minValues"
+-    );
+-    assert!(
+-        stats["maxValues"].get("number").is_some(),
+-        "Data column 'number' should have maxValues"
+-    );
+-    assert!(
+-        stats["minValues"].get("partition").is_none(),
+-        "Partition column should not have minValues even when materialized"
+-    );
+-    assert!(
+-        stats["maxValues"].get("partition").is_none(),
+-        "Partition column should not have maxValues even when materialized"
+-    );
+-    assert!(
+-        stats["nullCount"].get("partition").is_none(),
+-        "Partition column should not have nullCount even when materialized"
+-    );
+-
+-    Ok(())
+-}
+-
+ #[tokio::test]
+ async fn test_append_invalid_schema() -> Result<(), Box<dyn std::error::Error>> {
+     // setup tracing
\ No newline at end of file
kernel/tests/integration/write/partitioned.rs
@@ -0,0 +1,94 @@
+diff --git a/kernel/tests/integration/write/partitioned.rs b/kernel/tests/integration/write/partitioned.rs
+--- a/kernel/tests/integration/write/partitioned.rs
++++ b/kernel/tests/integration/write/partitioned.rs
+ use delta_kernel::arrow::datatypes::Schema as ArrowSchema;
+ use delta_kernel::committer::FileSystemCommitter;
+ use delta_kernel::engine::arrow_conversion::TryIntoArrow as _;
++use delta_kernel::engine::arrow_data::ArrowEngineData;
+ use delta_kernel::expressions::Scalar;
+ use delta_kernel::schema::{DataType, StructField, StructType};
+ use delta_kernel::table_features::ColumnMappingMode;
+     );
+     Ok((add, rel_path))
+ }
++
++// ==============================================================================
++// materializePartitionColumns tests
++// ==============================================================================
++
++/// Materialized partition columns are physically present in the parquet file, but their
++/// stats must be omitted from the Add action because the value is already in
++/// `partitionValues`. Verifies that contract end-to-end through the kernel `create_table`
++/// builder.
++#[tokio::test(flavor = "multi_thread")]
++async fn test_materialized_partition_columns_excluded_from_stats(
++) -> Result<(), Box<dyn std::error::Error>> {
++    let _ = tracing_subscriber::fmt::try_init();
++
++    let partition_col = "partition";
++    let table_schema = Arc::new(StructType::try_new(vec![
++        StructField::nullable("number", DataType::INTEGER),
++        StructField::nullable(partition_col, DataType::STRING),
++    ])?);
++
++    let (_tmp_dir, table_path, engine) = test_table_setup_mt()?;
++    let _ = create_table(&table_path, table_schema.clone(), "test/1.0")
++        .with_data_layout(DataLayout::partitioned([partition_col]))
++        .with_table_properties([("delta.feature.materializePartitionColumns", "supported")])
++        .build(engine.as_ref(), Box::new(FileSystemCommitter::new()))?
++        .commit(engine.as_ref())?;
++
++    let snapshot = Snapshot::builder_for(&table_path).build(engine.as_ref())?;
++    let mut txn = snapshot
++        .transaction(Box::new(FileSystemCommitter::new()), engine.as_ref())?
++        .with_engine_info("default engine");
++
++    // Build the input logical batch with all schema columns, including the partition column.
++    // What ends up in the parquet file is determined by `materializePartitionColumns` later
++    // in the pipeline (`Transaction::generate_logical_to_physical` skips the partition-column
++    // drop when the feature is on); the input batch shape itself is unaffected.
++    let arrow_schema = Arc::new(table_schema.as_ref().try_into_arrow()?);
++    let batch = RecordBatch::try_new(
++        arrow_schema,
++        vec![
++            Arc::new(Int32Array::from(vec![1, 2, 3])),
++            Arc::new(StringArray::from(vec!["a", "a", "a"])),
++        ],
++    )?;
++    let data = Box::new(ArrowEngineData::new(batch));
++
++    let write_context = txn.partitioned_write_context(HashMap::from([(
++        partition_col.to_string(),
++        Scalar::String("a".into()),
++    )]))?;
++    let result = engine.write_parquet(&data, &write_context).await?;
++    txn.add_files(result);
++    assert!(txn.commit(engine.as_ref())?.is_committed());
++
++    let (add, _) = read_single_add(&table_path, 1)?;
++    let stats: serde_json::Value = serde_json::from_str(add["stats"].as_str().unwrap()).unwrap();
++
++    // Stats should contain the data column but NOT the partition column.
++    assert!(
++        stats["minValues"].get("number").is_some(),
++        "data column 'number' should have minValues"
++    );
++    assert!(
++        stats["maxValues"].get("number").is_some(),
++        "data column 'number' should have maxValues"
++    );
++    assert!(
++        stats["minValues"].get(partition_col).is_none(),
++        "partition column should not have minValues even when materialized"
++    );
++    assert!(
++        stats["maxValues"].get(partition_col).is_none(),
++        "partition column should not have maxValues even when materialized"
++    );
++    assert!(
++        stats["nullCount"].get(partition_col).is_none(),
++        "partition column should not have nullCount even when materialized"
++    );
++
++    Ok(())
++}
\ No newline at end of file
kernel/tests/write/append.rs
@@ -1,111 +0,0 @@
-diff --git a/kernel/tests/write/append.rs b/kernel/tests/write/append.rs
---- a/kernel/tests/write/append.rs
-+++ b/kernel/tests/write/append.rs
- use delta_kernel::{DeltaResult, Error as KernelError, Snapshot};
- use itertools::Itertools;
- use serde_json::{json, Deserializer};
--use test_utils::{create_table, engine_store_setup, set_json_value, setup_test_tables, test_read};
-+use test_utils::{set_json_value, setup_test_tables, test_read};
- 
- use crate::common::write_utils::{
-     check_action_timestamps, get_and_check_all_parquet_sizes, get_simple_int_schema,
-     Ok(())
- }
- 
--// Verify that materialized partition columns do not get stats collected. The partition column
--// is physically present in the parquet file, but its stats should be omitted because the
--// value is already in the Add action's `partitionValues`.
--#[tokio::test]
--async fn test_materialized_partition_columns_excluded_from_stats(
--) -> Result<(), Box<dyn std::error::Error>> {
--    let _ = tracing_subscriber::fmt::try_init();
--
--    let partition_col = "partition";
--    let table_schema = Arc::new(StructType::try_new(vec![
--        StructField::nullable("number", DataType::INTEGER),
--        StructField::nullable("partition", DataType::STRING),
--    ])?);
--
--    // Create a table with materializePartitionColumns writer feature
--    let (store, engine, table_location) = engine_store_setup("test_mat_part", None);
--    let table_url = create_table(
--        store.clone(),
--        table_location,
--        table_schema.clone(),
--        &[partition_col],
--        true, // use_37_protocol for writer features
--        vec![],
--        vec!["materializePartitionColumns"],
--    )
--    .await?;
--
--    let engine = Arc::new(engine);
--    let snapshot = Snapshot::builder_for(table_url.clone()).build(engine.as_ref())?;
--    let mut txn = snapshot
--        .transaction(Box::new(FileSystemCommitter::new()), engine.as_ref())?
--        .with_engine_info("default engine");
--
--    // With materializePartitionColumns, the data batch includes the partition column
--    let arrow_schema = Arc::new(table_schema.as_ref().try_into_arrow()?);
--    let batch = RecordBatch::try_new(
--        arrow_schema,
--        vec![
--            Arc::new(Int32Array::from(vec![1, 2, 3])),
--            Arc::new(StringArray::from(vec!["a", "a", "a"])),
--        ],
--    )?;
--    let data = Box::new(ArrowEngineData::new(batch));
--
--    let write_context = txn.partitioned_write_context(HashMap::from([(
--        partition_col.to_string(),
--        Scalar::String("a".into()),
--    )]))?;
--    let result = engine.write_parquet(&data, &write_context).await?;
--    txn.add_files(result);
--    assert!(txn.commit(engine.as_ref())?.is_committed());
--
--    // Read the commit log and verify stats
--    let commit = store
--        .get(&Path::from(
--            "/test_mat_part/_delta_log/00000000000000000001.json",
--        ))
--        .await?;
--    let parsed: Vec<serde_json::Value> = Deserializer::from_slice(&commit.bytes().await?)
--        .into_iter::<serde_json::Value>()
--        .try_collect()?;
--
--    let add = parsed
--        .iter()
--        .find(|v| v.get("add").is_some())
--        .expect("should have an add action");
--    let stats: serde_json::Value =
--        serde_json::from_str(add["add"]["stats"].as_str().unwrap()).unwrap();
--
--    // Stats should contain the data column but NOT the partition column
--    assert!(
--        stats["minValues"].get("number").is_some(),
--        "Data column 'number' should have minValues"
--    );
--    assert!(
--        stats["maxValues"].get("number").is_some(),
--        "Data column 'number' should have maxValues"
--    );
--    assert!(
--        stats["minValues"].get("partition").is_none(),
--        "Partition column should not have minValues even when materialized"
--    );
--    assert!(
--        stats["maxValues"].get("partition").is_none(),
--        "Partition column should not have maxValues even when materialized"
--    );
--    assert!(
--        stats["nullCount"].get("partition").is_none(),
--        "Partition column should not have nullCount even when materialized"
--    );
--
--    Ok(())
--}
--
- #[tokio::test]
- async fn test_append_invalid_schema() -> Result<(), Box<dyn std::error::Error>> {
-     // setup tracing
\ No newline at end of file
kernel/tests/write/partitioned.rs
@@ -1,91 +0,0 @@
-diff --git a/kernel/tests/write/partitioned.rs b/kernel/tests/write/partitioned.rs
---- a/kernel/tests/write/partitioned.rs
-+++ b/kernel/tests/write/partitioned.rs
- use delta_kernel::arrow::datatypes::Schema as ArrowSchema;
- use delta_kernel::committer::FileSystemCommitter;
- use delta_kernel::engine::arrow_conversion::TryIntoArrow as _;
-+use delta_kernel::engine::arrow_data::ArrowEngineData;
- use delta_kernel::expressions::Scalar;
- use delta_kernel::schema::{DataType, StructField, StructType};
- use delta_kernel::table_features::ColumnMappingMode;
-     );
-     Ok((add, rel_path))
- }
-+
-+// ==============================================================================
-+// materializePartitionColumns tests
-+// ==============================================================================
-+
-+/// Materialized partition columns are physically present in the parquet file, but their
-+/// stats must be omitted from the Add action because the value is already in
-+/// `partitionValues`. Pins the stat-collection contract for materialized partition tables
-+/// end-to-end through the kernel `create_table` builder.
-+#[tokio::test(flavor = "multi_thread")]
-+async fn test_materialized_partition_columns_excluded_from_stats(
-+) -> Result<(), Box<dyn std::error::Error>> {
-+    let _ = tracing_subscriber::fmt::try_init();
-+
-+    let partition_col = "partition";
-+    let table_schema = Arc::new(StructType::try_new(vec![
-+        StructField::nullable("number", DataType::INTEGER),
-+        StructField::nullable(partition_col, DataType::STRING),
-+    ])?);
-+
-+    let (_tmp_dir, table_path, engine) = test_table_setup_mt()?;
-+    let _ = create_table(&table_path, table_schema.clone(), "test/1.0")
-+        .with_data_layout(DataLayout::partitioned([partition_col]))
-+        .with_table_properties([("delta.feature.materializePartitionColumns", "supported")])
-+        .build(engine.as_ref(), Box::new(FileSystemCommitter::new()))?
-+        .commit(engine.as_ref())?;
-+
-+    let snapshot = Snapshot::builder_for(&table_path).build(engine.as_ref())?;
-+    let mut txn = snapshot
-+        .transaction(Box::new(FileSystemCommitter::new()), engine.as_ref())?
-+        .with_engine_info("default engine");
-+
-+    // With materializePartitionColumns, the data batch retains the partition column.
-+    let arrow_schema = Arc::new(table_schema.as_ref().try_into_arrow()?);
-+    let batch = RecordBatch::try_new(
-+        arrow_schema,
-+        vec![
-+            Arc::new(Int32Array::from(vec![1, 2, 3])),
-+            Arc::new(StringArray::from(vec!["a", "a", "a"])),
-+        ],
-+    )?;
-+    let data = Box::new(ArrowEngineData::new(batch));
-+
-+    let write_context = txn.partitioned_write_context(HashMap::from([(
-+        partition_col.to_string(),
-+        Scalar::String("a".into()),
-+    )]))?;
-+    let result = engine.write_parquet(&data, &write_context).await?;
-+    txn.add_files(result);
-+    assert!(txn.commit(engine.as_ref())?.is_committed());
-+
-+    let (add, _) = read_single_add(&table_path, 1)?;
-+    let stats: serde_json::Value = serde_json::from_str(add["stats"].as_str().unwrap()).unwrap();
-+
-+    // Stats should contain the data column but NOT the partition column.
-+    assert!(
-+        stats["minValues"].get("number").is_some(),
-+        "data column 'number' should have minValues"
-+    );
-+    assert!(
-+        stats["maxValues"].get("number").is_some(),
-+        "data column 'number' should have maxValues"
-+    );
-+    assert!(
-+        stats["minValues"].get(partition_col).is_none(),
-+        "partition column should not have minValues even when materialized"
-+    );
-+    assert!(
-+        stats["maxValues"].get(partition_col).is_none(),
-+        "partition column should not have maxValues even when materialized"
-+    );
-+    assert!(
-+        stats["nullCount"].get(partition_col).is_none(),
-+        "partition column should not have nullCount even when materialized"
-+    );
-+
-+    Ok(())
-+}
\ No newline at end of file

Reproduce locally: git range-diff 66b5a42..32eead5 a13ef67..a97b11b | Disable: git config gitstack.push-range-diff false
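
For context on the stats assertions at the end of the migrated test: the committed Add action carries the partition value in `partitionValues`, while the JSON-encoded `stats` string covers only the data column. An illustrative sketch (the path is made up; the numbers follow from the 3-row batch in the test, and the test itself only asserts which stats fields are present or absent):

```json
{"add":{
  "path":"part-00000-<uuid>.parquet",
  "partitionValues":{"partition":"a"},
  "dataChange":true,
  "stats":"{\"numRecords\":3,\"minValues\":{\"number\":1},\"maxValues\":{\"number\":3},\"nullCount\":{\"number\":0}}"
}}
```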

@sanujbasu force-pushed the stack/materialize-partition-columns-cleanup branch from 32eead5 to a97b11b on April 29, 2026 00:54
@sanujbasu enabled auto-merge April 29, 2026 01:02
@sanujbasu added this pull request to the merge queue Apr 29, 2026
Merged via the queue into delta-io:main with commit bc78961 Apr 29, 2026
24 of 25 checks passed
@sanujbasu deleted the stack/materialize-partition-columns-cleanup branch April 29, 2026 01:18