
test: migrate a test to use the create table builder #2482

Merged
sanujbasu merged 1 commit into delta-io:main from sanujbasu:stack/materialize-partition-columns-cleanup
Apr 29, 2026

Conversation

@sanujbasu (Collaborator) commented Apr 28, 2026

What changes are proposed in this pull request?

Moves test_materialized_partition_columns_excluded_from_stats from kernel/tests/write/append.rs to its proper topic module kernel/tests/write/partitioned.rs, and switches its setup from the test_utils::create_table JSON helper (which hard-codes writerFeatures = ["materializePartitionColumns"]) to the kernel create_table builder using delta.feature.materializePartitionColumns=supported.
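
Concretely, the hand-rolled setup wrote a protocol action along these lines straight into the log (a sketch reconstructed from the old helper's arguments visible in the range-diff below — reader 3 / writer 7 protocol, empty readerFeatures; exact field contents are not guaranteed):

```json
{"protocol":{"minReaderVersion":3,"minWriterVersion":7,"readerFeatures":[],"writerFeatures":["materializePartitionColumns"]}}
```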

The migration removes the hand-rolled protocol from the test setup so the same path that connectors will use to enable the feature is exercised end-to-end. Test assertions are unchanged.
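
For reference, the new setup condenses to the builder call below (taken from the migrated test in the range-diff later in this thread; `test_table_setup_mt`, `create_table`, and `DataLayout` are the kernel test helpers used there, not a public-API sketch):

```rust
// Condensed from kernel/tests/write/partitioned.rs (full diff below).
let (_tmp_dir, table_path, engine) = test_table_setup_mt()?;
let _ = create_table(&table_path, table_schema.clone(), "test/1.0")
    .with_data_layout(DataLayout::partitioned([partition_col]))
    // Enable the writer feature via a table property, the same way a connector
    // would, instead of hand-rolling writerFeatures into the protocol action.
    .with_table_properties([("delta.feature.materializePartitionColumns", "supported")])
    .build(engine.as_ref(), Box::new(FileSystemCommitter::new()))?
    .commit(engine.as_ref())?;
```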

How was this change tested?

1/ Migrated test partitioned::test_materialized_partition_columns_excluded_from_stats passes on the new builder-based setup; assertions are unchanged.

2/ Local verification:

  • cargo +nightly fmt --check
  • cargo clippy --workspace --benches --tests --all-features -- -D warnings
  • cargo doc --workspace --all-features --no-deps
  • cargo test -p delta_kernel --all-features for affected paths
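
For example, the migrated test alone should be selectable with cargo's test-name filter (invocation assumed from the test name above):

```sh
cargo test -p delta_kernel --all-features test_materialized_partition_columns_excluded_from_stats
```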

@codecov bot commented Apr 28, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 88.43%. Comparing base (a13ef67) to head (a97b11b).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2482      +/-   ##
==========================================
- Coverage   88.43%   88.43%   -0.01%     
==========================================
  Files         178      178              
  Lines       58417    58566     +149     
==========================================
+ Hits        51660    51791     +131     
- Misses       4787     4788       +1     
- Partials     1970     1987      +17     

☔ View full report in Codecov by Sentry.

@sanujbasu force-pushed the stack/materialize-partition-columns-cleanup branch from 8136d72 to eef3a35 on April 28, 2026 05:56
@sanujbasu changed the title from "test: migrate test_materialized_partition_columns_excluded_from_stats to kernel builder" to "test: migrate a test to use the create table builder" on Apr 28, 2026
@sanujbasu force-pushed the stack/materialize-partition-columns-cleanup branch from eef3a35 to 348e8bb on April 28, 2026 06:20
@scottsand-db (Collaborator) left a comment


LGTM but left minor feedback on two code comments

Comment thread on kernel/tests/write/partitioned.rs (outdated)
Comment thread on kernel/tests/write/partitioned.rs (outdated)
@sanujbasu force-pushed the stack/materialize-partition-columns-cleanup branch 2 times, most recently from 1d0621a to 32eead5 on April 28, 2026 18:25
test: migrate test_materialized_partition_columns_excluded_from_stats to kernel builder

@sanujbasu (Collaborator, Author) commented

Range-diff: stack/materialize-partition-columns-create-table (32eead5 -> a97b11b)
kernel/tests/integration/write/append.rs
@@ -0,0 +1,111 @@
+diff --git a/kernel/tests/integration/write/append.rs b/kernel/tests/integration/write/append.rs
+--- a/kernel/tests/integration/write/append.rs
++++ b/kernel/tests/integration/write/append.rs
+ use delta_kernel::{DeltaResult, Error as KernelError, Snapshot};
+ use itertools::Itertools;
+ use serde_json::{json, Deserializer};
+-use test_utils::{create_table, engine_store_setup, set_json_value, setup_test_tables, test_read};
++use test_utils::{set_json_value, setup_test_tables, test_read};
+ 
+ use crate::common::write_utils::{
+     check_action_timestamps, get_and_check_all_parquet_sizes, get_simple_int_schema,
+     Ok(())
+ }
+ 
+-// Verify that materialized partition columns do not get stats collected. The partition column
+-// is physically present in the parquet file, but its stats should be omitted because the
+-// value is already in the Add action's `partitionValues`.
+-#[tokio::test]
+-async fn test_materialized_partition_columns_excluded_from_stats(
+-) -> Result<(), Box<dyn std::error::Error>> {
+-    let _ = tracing_subscriber::fmt::try_init();
+-
+-    let partition_col = "partition";
+-    let table_schema = Arc::new(StructType::try_new(vec![
+-        StructField::nullable("number", DataType::INTEGER),
+-        StructField::nullable("partition", DataType::STRING),
+-    ])?);
+-
+-    // Create a table with materializePartitionColumns writer feature
+-    let (store, engine, table_location) = engine_store_setup("test_mat_part", None);
+-    let table_url = create_table(
+-        store.clone(),
+-        table_location,
+-        table_schema.clone(),
+-        &[partition_col],
+-        true, // use_37_protocol for writer features
+-        vec![],
+-        vec!["materializePartitionColumns"],
+-    )
+-    .await?;
+-
+-    let engine = Arc::new(engine);
+-    let snapshot = Snapshot::builder_for(table_url.clone()).build(engine.as_ref())?;
+-    let mut txn = snapshot
+-        .transaction(Box::new(FileSystemCommitter::new()), engine.as_ref())?
+-        .with_engine_info("default engine");
+-
+-    // With materializePartitionColumns, the data batch includes the partition column
+-    let arrow_schema = Arc::new(table_schema.as_ref().try_into_arrow()?);
+-    let batch = RecordBatch::try_new(
+-        arrow_schema,
+-        vec![
+-            Arc::new(Int32Array::from(vec![1, 2, 3])),
+-            Arc::new(StringArray::from(vec!["a", "a", "a"])),
+-        ],
+-    )?;
+-    let data = Box::new(ArrowEngineData::new(batch));
+-
+-    let write_context = txn.partitioned_write_context(HashMap::from([(
+-        partition_col.to_string(),
+-        Scalar::String("a".into()),
+-    )]))?;
+-    let result = engine.write_parquet(&data, &write_context).await?;
+-    txn.add_files(result);
+-    assert!(txn.commit(engine.as_ref())?.is_committed());
+-
+-    // Read the commit log and verify stats
+-    let commit = store
+-        .get(&Path::from(
+-            "/test_mat_part/_delta_log/00000000000000000001.json",
+-        ))
+-        .await?;
+-    let parsed: Vec<serde_json::Value> = Deserializer::from_slice(&commit.bytes().await?)
+-        .into_iter::<serde_json::Value>()
+-        .try_collect()?;
+-
+-    let add = parsed
+-        .iter()
+-        .find(|v| v.get("add").is_some())
+-        .expect("should have an add action");
+-    let stats: serde_json::Value =
+-        serde_json::from_str(add["add"]["stats"].as_str().unwrap()).unwrap();
+-
+-    // Stats should contain the data column but NOT the partition column
+-    assert!(
+-        stats["minValues"].get("number").is_some(),
+-        "Data column 'number' should have minValues"
+-    );
+-    assert!(
+-        stats["maxValues"].get("number").is_some(),
+-        "Data column 'number' should have maxValues"
+-    );
+-    assert!(
+-        stats["minValues"].get("partition").is_none(),
+-        "Partition column should not have minValues even when materialized"
+-    );
+-    assert!(
+-        stats["maxValues"].get("partition").is_none(),
+-        "Partition column should not have maxValues even when materialized"
+-    );
+-    assert!(
+-        stats["nullCount"].get("partition").is_none(),
+-        "Partition column should not have nullCount even when materialized"
+-    );
+-
+-    Ok(())
+-}
+-
+ #[tokio::test]
+ async fn test_append_invalid_schema() -> Result<(), Box<dyn std::error::Error>> {
+     // setup tracing
\ No newline at end of file
kernel/tests/integration/write/partitioned.rs
@@ -0,0 +1,94 @@
+diff --git a/kernel/tests/integration/write/partitioned.rs b/kernel/tests/integration/write/partitioned.rs
+--- a/kernel/tests/integration/write/partitioned.rs
++++ b/kernel/tests/integration/write/partitioned.rs
+ use delta_kernel::arrow::datatypes::Schema as ArrowSchema;
+ use delta_kernel::committer::FileSystemCommitter;
+ use delta_kernel::engine::arrow_conversion::TryIntoArrow as _;
++use delta_kernel::engine::arrow_data::ArrowEngineData;
+ use delta_kernel::expressions::Scalar;
+ use delta_kernel::schema::{DataType, StructField, StructType};
+ use delta_kernel::table_features::ColumnMappingMode;
+     );
+     Ok((add, rel_path))
+ }
++
++// ==============================================================================
++// materializePartitionColumns tests
++// ==============================================================================
++
++/// Materialized partition columns are physically present in the parquet file, but their
++/// stats must be omitted from the Add action because the value is already in
++/// `partitionValues`. Verifies that contract end-to-end through the kernel `create_table`
++/// builder.
++#[tokio::test(flavor = "multi_thread")]
++async fn test_materialized_partition_columns_excluded_from_stats(
++) -> Result<(), Box<dyn std::error::Error>> {
++    let _ = tracing_subscriber::fmt::try_init();
++
++    let partition_col = "partition";
++    let table_schema = Arc::new(StructType::try_new(vec![
++        StructField::nullable("number", DataType::INTEGER),
++        StructField::nullable(partition_col, DataType::STRING),
++    ])?);
++
++    let (_tmp_dir, table_path, engine) = test_table_setup_mt()?;
++    let _ = create_table(&table_path, table_schema.clone(), "test/1.0")
++        .with_data_layout(DataLayout::partitioned([partition_col]))
++        .with_table_properties([("delta.feature.materializePartitionColumns", "supported")])
++        .build(engine.as_ref(), Box::new(FileSystemCommitter::new()))?
++        .commit(engine.as_ref())?;
++
++    let snapshot = Snapshot::builder_for(&table_path).build(engine.as_ref())?;
++    let mut txn = snapshot
++        .transaction(Box::new(FileSystemCommitter::new()), engine.as_ref())?
++        .with_engine_info("default engine");
++
++    // Build the input logical batch with all schema columns, including the partition column.
++    // What ends up in the parquet file is determined by `materializePartitionColumns` later
++    // in the pipeline (`Transaction::generate_logical_to_physical` skips the partition-column
++    // drop when the feature is on); the input batch shape itself is unaffected.
++    let arrow_schema = Arc::new(table_schema.as_ref().try_into_arrow()?);
++    let batch = RecordBatch::try_new(
++        arrow_schema,
++        vec![
++            Arc::new(Int32Array::from(vec![1, 2, 3])),
++            Arc::new(StringArray::from(vec!["a", "a", "a"])),
++        ],
++    )?;
++    let data = Box::new(ArrowEngineData::new(batch));
++
++    let write_context = txn.partitioned_write_context(HashMap::from([(
++        partition_col.to_string(),
++        Scalar::String("a".into()),
++    )]))?;
++    let result = engine.write_parquet(&data, &write_context).await?;
++    txn.add_files(result);
++    assert!(txn.commit(engine.as_ref())?.is_committed());
++
++    let (add, _) = read_single_add(&table_path, 1)?;
++    let stats: serde_json::Value = serde_json::from_str(add["stats"].as_str().unwrap()).unwrap();
++
++    // Stats should contain the data column but NOT the partition column.
++    assert!(
++        stats["minValues"].get("number").is_some(),
++        "data column 'number' should have minValues"
++    );
++    assert!(
++        stats["maxValues"].get("number").is_some(),
++        "data column 'number' should have maxValues"
++    );
++    assert!(
++        stats["minValues"].get(partition_col).is_none(),
++        "partition column should not have minValues even when materialized"
++    );
++    assert!(
++        stats["maxValues"].get(partition_col).is_none(),
++        "partition column should not have maxValues even when materialized"
++    );
++    assert!(
++        stats["nullCount"].get(partition_col).is_none(),
++        "partition column should not have nullCount even when materialized"
++    );
++
++    Ok(())
++}
\ No newline at end of file
kernel/tests/write/append.rs
@@ -1,111 +0,0 @@
-diff --git a/kernel/tests/write/append.rs b/kernel/tests/write/append.rs
---- a/kernel/tests/write/append.rs
-+++ b/kernel/tests/write/append.rs
- use delta_kernel::{DeltaResult, Error as KernelError, Snapshot};
- use itertools::Itertools;
- use serde_json::{json, Deserializer};
--use test_utils::{create_table, engine_store_setup, set_json_value, setup_test_tables, test_read};
-+use test_utils::{set_json_value, setup_test_tables, test_read};
- 
- use crate::common::write_utils::{
-     check_action_timestamps, get_and_check_all_parquet_sizes, get_simple_int_schema,
-     Ok(())
- }
- 
--// Verify that materialized partition columns do not get stats collected. The partition column
--// is physically present in the parquet file, but its stats should be omitted because the
--// value is already in the Add action's `partitionValues`.
--#[tokio::test]
--async fn test_materialized_partition_columns_excluded_from_stats(
--) -> Result<(), Box<dyn std::error::Error>> {
--    let _ = tracing_subscriber::fmt::try_init();
--
--    let partition_col = "partition";
--    let table_schema = Arc::new(StructType::try_new(vec![
--        StructField::nullable("number", DataType::INTEGER),
--        StructField::nullable("partition", DataType::STRING),
--    ])?);
--
--    // Create a table with materializePartitionColumns writer feature
--    let (store, engine, table_location) = engine_store_setup("test_mat_part", None);
--    let table_url = create_table(
--        store.clone(),
--        table_location,
--        table_schema.clone(),
--        &[partition_col],
--        true, // use_37_protocol for writer features
--        vec![],
--        vec!["materializePartitionColumns"],
--    )
--    .await?;
--
--    let engine = Arc::new(engine);
--    let snapshot = Snapshot::builder_for(table_url.clone()).build(engine.as_ref())?;
--    let mut txn = snapshot
--        .transaction(Box::new(FileSystemCommitter::new()), engine.as_ref())?
--        .with_engine_info("default engine");
--
--    // With materializePartitionColumns, the data batch includes the partition column
--    let arrow_schema = Arc::new(table_schema.as_ref().try_into_arrow()?);
--    let batch = RecordBatch::try_new(
--        arrow_schema,
--        vec![
--            Arc::new(Int32Array::from(vec![1, 2, 3])),
--            Arc::new(StringArray::from(vec!["a", "a", "a"])),
--        ],
--    )?;
--    let data = Box::new(ArrowEngineData::new(batch));
--
--    let write_context = txn.partitioned_write_context(HashMap::from([(
--        partition_col.to_string(),
--        Scalar::String("a".into()),
--    )]))?;
--    let result = engine.write_parquet(&data, &write_context).await?;
--    txn.add_files(result);
--    assert!(txn.commit(engine.as_ref())?.is_committed());
--
--    // Read the commit log and verify stats
--    let commit = store
--        .get(&Path::from(
--            "/test_mat_part/_delta_log/00000000000000000001.json",
--        ))
--        .await?;
--    let parsed: Vec<serde_json::Value> = Deserializer::from_slice(&commit.bytes().await?)
--        .into_iter::<serde_json::Value>()
--        .try_collect()?;
--
--    let add = parsed
--        .iter()
--        .find(|v| v.get("add").is_some())
--        .expect("should have an add action");
--    let stats: serde_json::Value =
--        serde_json::from_str(add["add"]["stats"].as_str().unwrap()).unwrap();
--
--    // Stats should contain the data column but NOT the partition column
--    assert!(
--        stats["minValues"].get("number").is_some(),
--        "Data column 'number' should have minValues"
--    );
--    assert!(
--        stats["maxValues"].get("number").is_some(),
--        "Data column 'number' should have maxValues"
--    );
--    assert!(
--        stats["minValues"].get("partition").is_none(),
--        "Partition column should not have minValues even when materialized"
--    );
--    assert!(
--        stats["maxValues"].get("partition").is_none(),
--        "Partition column should not have maxValues even when materialized"
--    );
--    assert!(
--        stats["nullCount"].get("partition").is_none(),
--        "Partition column should not have nullCount even when materialized"
--    );
--
--    Ok(())
--}
--
- #[tokio::test]
- async fn test_append_invalid_schema() -> Result<(), Box<dyn std::error::Error>> {
-     // setup tracing
\ No newline at end of file
kernel/tests/write/partitioned.rs
@@ -1,91 +0,0 @@
-diff --git a/kernel/tests/write/partitioned.rs b/kernel/tests/write/partitioned.rs
---- a/kernel/tests/write/partitioned.rs
-+++ b/kernel/tests/write/partitioned.rs
- use delta_kernel::arrow::datatypes::Schema as ArrowSchema;
- use delta_kernel::committer::FileSystemCommitter;
- use delta_kernel::engine::arrow_conversion::TryIntoArrow as _;
-+use delta_kernel::engine::arrow_data::ArrowEngineData;
- use delta_kernel::expressions::Scalar;
- use delta_kernel::schema::{DataType, StructField, StructType};
- use delta_kernel::table_features::ColumnMappingMode;
-     );
-     Ok((add, rel_path))
- }
-+
-+// ==============================================================================
-+// materializePartitionColumns tests
-+// ==============================================================================
-+
-+/// Materialized partition columns are physically present in the parquet file, but their
-+/// stats must be omitted from the Add action because the value is already in
-+/// `partitionValues`. Pins the stat-collection contract for materialized partition tables
-+/// end-to-end through the kernel `create_table` builder.
-+#[tokio::test(flavor = "multi_thread")]
-+async fn test_materialized_partition_columns_excluded_from_stats(
-+) -> Result<(), Box<dyn std::error::Error>> {
-+    let _ = tracing_subscriber::fmt::try_init();
-+
-+    let partition_col = "partition";
-+    let table_schema = Arc::new(StructType::try_new(vec![
-+        StructField::nullable("number", DataType::INTEGER),
-+        StructField::nullable(partition_col, DataType::STRING),
-+    ])?);
-+
-+    let (_tmp_dir, table_path, engine) = test_table_setup_mt()?;
-+    let _ = create_table(&table_path, table_schema.clone(), "test/1.0")
-+        .with_data_layout(DataLayout::partitioned([partition_col]))
-+        .with_table_properties([("delta.feature.materializePartitionColumns", "supported")])
-+        .build(engine.as_ref(), Box::new(FileSystemCommitter::new()))?
-+        .commit(engine.as_ref())?;
-+
-+    let snapshot = Snapshot::builder_for(&table_path).build(engine.as_ref())?;
-+    let mut txn = snapshot
-+        .transaction(Box::new(FileSystemCommitter::new()), engine.as_ref())?
-+        .with_engine_info("default engine");
-+
-+    // With materializePartitionColumns, the data batch retains the partition column.
-+    let arrow_schema = Arc::new(table_schema.as_ref().try_into_arrow()?);
-+    let batch = RecordBatch::try_new(
-+        arrow_schema,
-+        vec![
-+            Arc::new(Int32Array::from(vec![1, 2, 3])),
-+            Arc::new(StringArray::from(vec!["a", "a", "a"])),
-+        ],
-+    )?;
-+    let data = Box::new(ArrowEngineData::new(batch));
-+
-+    let write_context = txn.partitioned_write_context(HashMap::from([(
-+        partition_col.to_string(),
-+        Scalar::String("a".into()),
-+    )]))?;
-+    let result = engine.write_parquet(&data, &write_context).await?;
-+    txn.add_files(result);
-+    assert!(txn.commit(engine.as_ref())?.is_committed());
-+
-+    let (add, _) = read_single_add(&table_path, 1)?;
-+    let stats: serde_json::Value = serde_json::from_str(add["stats"].as_str().unwrap()).unwrap();
-+
-+    // Stats should contain the data column but NOT the partition column.
-+    assert!(
-+        stats["minValues"].get("number").is_some(),
-+        "data column 'number' should have minValues"
-+    );
-+    assert!(
-+        stats["maxValues"].get("number").is_some(),
-+        "data column 'number' should have maxValues"
-+    );
-+    assert!(
-+        stats["minValues"].get(partition_col).is_none(),
-+        "partition column should not have minValues even when materialized"
-+    );
-+    assert!(
-+        stats["maxValues"].get(partition_col).is_none(),
-+        "partition column should not have maxValues even when materialized"
-+    );
-+    assert!(
-+        stats["nullCount"].get(partition_col).is_none(),
-+        "partition column should not have nullCount even when materialized"
-+    );
-+
-+    Ok(())
-+}
\ No newline at end of file

Reproduce locally: git range-diff 66b5a42..32eead5 a13ef67..a97b11b | Disable: git config gitstack.push-range-diff false
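
For context on the stats assertions at the end of the migrated test: the committed Add action carries the partition value in `partitionValues`, while the JSON-encoded `stats` string covers only the data column. An illustrative sketch (the path is made up; the numbers follow from the 3-row batch in the test, and the test itself only asserts which stats fields are present or absent):

```json
{"add":{
  "path":"part-00000-<uuid>.parquet",
  "partitionValues":{"partition":"a"},
  "dataChange":true,
  "stats":"{\"numRecords\":3,\"minValues\":{\"number\":1},\"maxValues\":{\"number\":3},\"nullCount\":{\"number\":0}}"
}}
```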

@sanujbasu force-pushed the stack/materialize-partition-columns-cleanup branch from 32eead5 to a97b11b on April 29, 2026 00:54
@sanujbasu enabled auto-merge April 29, 2026 01:02
@sanujbasu added this pull request to the merge queue Apr 29, 2026
Merged via the queue into delta-io:main with commit bc78961 Apr 29, 2026
24 of 25 checks passed
@sanujbasu deleted the stack/materialize-partition-columns-cleanup branch April 29, 2026 01:18