feat: Write parquet id for nested items in map/array according to metadata #2508
dengsh12 wants to merge 13 commits into delta-io:main
Conversation
Force-pushed from d9cde6e to f01f689, then from f01f689 to 3078f79.
PR title does not match the required pattern. Please ensure you follow the conventional commits spec.
Force-pushed from 64b3fcf to d058829.
// 3. Rebuild the element field. `apply_schema` applies kernel's schema, so the input arrow
//    field's metadata comes entirely from the kernel side. Parquet field ID is the only
//    metadata that should be set on the element field here. Since kernel's
//    `ArrayType::element_type` is a bare `DataType` (not a `StructField`), there is no
//    per-element metadata on the kernel side. The element's Parquet field ID comes from the
//    ancestor's metadata.
let transformed_arrow_element_field = ArrowField::new(
https://github.com/delta-io/delta-kernel-rs/pull/331/changes#r1807071289 has context here (we stripped the metadata in the past as well).
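To make the "element's Parquet field ID comes from the ancestor's metadata" point concrete, here is a minimal stdlib-only Rust sketch. The function name and the plain `HashMap` representation of the nested-ids metadata are illustrative assumptions, not the kernel's actual API:

```rust
use std::collections::HashMap;

/// Illustrative sketch: when rebuilding a list's element field, the element
/// carries no kernel-side metadata of its own (kernel's `ArrayType::element_type`
/// is a bare `DataType`), so its parquet field id is looked up in the *ancestor*
/// field's nested-ids map, keyed by a relative path such as "element".
fn element_field_id(ancestor_nested_ids: &HashMap<String, i64>) -> Option<i64> {
    ancestor_nested_ids.get("element").copied()
}

fn main() {
    let mut nested_ids = HashMap::new();
    nested_ids.insert("element".to_string(), 5);
    nested_ids.insert("element.element".to_string(), 6); // a list-of-list child
    assert_eq!(element_field_id(&nested_ids), Some(5));
    assert_eq!(element_field_id(&HashMap::new()), None);
}
```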
Codecov Report
❌ Patch coverage is … Additional details and impacted files:

@@            Coverage Diff             @@
##             main    #2508      +/-   ##
==========================================
+ Coverage   88.50%   88.57%   +0.07%
==========================================
  Files         179      179
  Lines       59031    59572     +541
  Branches    59031    59572     +541
==========================================
+ Hits        52245    52766     +521
- Misses       4785     4791       +6
- Partials     2001     2015      +14

☔ View full report in Codecov by Sentry.
/// Recursive Arrow-type builder that propagates nested field IDs from `ancestor`'s

- // deconstruct the array, then rebuild the mapped version
+ // Apply a kernel [`ArrayType`] to an Arrow [`ListArray`].
Can you please give an example, and clarify what "apply" means?
It's not intuitive to me what "apply a kernel ArrayType to an Arrow ListArray" means; to me that is just as confusing as "apply a kernel integer to an arrow integer". I don't know what "apply" means.
Added a detailed example and explanation (centralized in apply_schema_to_inner to avoid repeated comments).
- // deconstruct a map, and rebuild it with the specified target kernel type
+ // Apply a kernel [`MapType`] to an Arrow [`MapArray`].
  fn apply_schema_to_map(array: &dyn Array, kernel_map_type: &MapType) -> DeltaResult<MapArray> {
Added a detailed example and explanation (centralized in apply_schema_to_inner to avoid repeated comments).
Force-pushed from 34ce46d to 766ace4, then from 766ace4 to c648031.
…lta-io#2504)

## Stacked PR
Use this [link](https://github.com/delta-io/delta-kernel-rs/pull/2504/files) to review incremental changes.
- [stack/add-v3-feature](delta-io#2475) [[Files changed](https://github.com/delta-io/delta-kernel-rs/pull/2475/files)] [MERGED]
- [**stack/write-part-when-ice3**](delta-io#2504) [[Files changed](https://github.com/delta-io/delta-kernel-rs/pull/2504/files)]
- [stack/block-alter-table](delta-io#2505) [[Files changed](https://github.com/delta-io/delta-kernel-rs/pull/2505/files/15d151f8b271a72cb421ccfa81cda7d5887d1b37..562605f3911acd99854efae7940deb9344e7ee55)]
- [stack/write-nested-id](delta-io#2508) [[Files changed](https://github.com/delta-io/delta-kernel-rs/pull/2508/files/562605f3911acd99854efae7940deb9344e7ee55..766ace41c2176c546f533be4ff0d7a9aa24ae001)]

## What changes are proposed in this pull request?
Materialize partition values when icebergCompatV3 is enabled.

## How was this change tested?
1. When `icebergCompatV3` is enabled, partition columns are included in `physical_write_schema`.
2. When `icebergCompatV3` is supported but not enabled, partition columns are not included in `physical_write_schema`.

Note: other related cases (e.g. `MaterializePartitionColumns`) are already covered by existing tests, so we are not adding tests here.
let id: i64 = v.parse().map_err(|_| {
    ArrowError::SchemaError(format!(
        "'{k}' on field '{}' must be an integer, got '{v}'",
        arrow_field.name()
    ))
})?;
return Ok((
    ColumnMetadataKey::ParquetFieldId.as_ref().to_string(),
    MetadataValue::Number(id),
));
}
Ok((k.clone(), MetadataValue::from(v)))
Note: this is a fix. Previously we filled ColumnMetadataKey::ParquetFieldId with a string, but per delta-kernel-rs/kernel/src/engine/arrow_utils.rs, Lines 955 to 958 in 78670eb, it should be MetadataValue::Number.
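A stdlib-only sketch of the parse-to-integer fix described above. `MetadataValue` and `convert_entry` here are simplified stand-ins for the kernel's actual types, and the key string `"parquet.field.id"` is written out literally instead of going through `ColumnMetadataKey`:

```rust
use std::collections::HashMap;

/// Illustrative stand-in for the kernel's metadata value enum.
#[derive(Debug, PartialEq)]
enum MetadataValue {
    Number(i64),
    String(String),
}

/// Convert one arrow metadata entry into a kernel metadata entry, parsing
/// `parquet.field.id` values into integers instead of keeping them as strings.
fn convert_entry(k: &str, v: &str, field_name: &str) -> Result<(String, MetadataValue), String> {
    if k == "parquet.field.id" {
        let id: i64 = v
            .parse()
            .map_err(|_| format!("'{k}' on field '{field_name}' must be an integer, got '{v}'"))?;
        return Ok((k.to_string(), MetadataValue::Number(id)));
    }
    // All other metadata entries pass through unchanged.
    Ok((k.to_string(), MetadataValue::String(v.to_string())))
}

fn main() {
    let mut meta = HashMap::new();
    meta.insert("parquet.field.id".to_string(), "7".to_string());
    for (k, v) in &meta {
        let (key, val) = convert_entry(k, v, "col_a").unwrap();
        println!("{key} -> {val:?}");
    }
    // A non-integer id must be rejected with a schema error.
    assert!(convert_entry("parquet.field.id", "abc", "col_a").is_err());
}
```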
if !nested_ids.is_empty() {
    kernel_metadata.insert(
        ColumnMetadataKey::ParquetFieldNestedIds
            .as_ref()
            .to_string(),
        MetadataValue::Other(serde_json::Value::Object(nested_ids)),
    );
}
Note: for simplicity I'm not validating whether arrow_metadata already contains a ColumnMetadataKey::ParquetFieldNestedIds that differs from what collect_nested_field_ids_from_arrow produces (that feels like defense in depth). If arrow_metadata contained a different value, the data would be invalid anyway. But let me know if you think it's worth defending here!
On second thought, we probably should not error: in the past we may have written to a column-mapping (but not icebergCompat) table that had delta.columnMapping.nested.ids set without writing parquet ids. If we errored here, those tables would become unreadable.
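For reference, a hedged sketch of the shape the nested-ids metadata takes: a JSON object mapping a relative path inside the column (e.g. "key", "value", "element", or deeper paths like "value.element") to a parquet field id. The kernel builds this with serde_json; the stdlib-only helper below only illustrates the shape, and the example paths and ids are made up:

```rust
use std::collections::BTreeMap;

/// Render a nested-ids map as a JSON object string. This is an illustrative,
/// assumption-laden stand-in for serde_json serialization, using only the
/// stdlib (a BTreeMap keeps the key order deterministic).
fn nested_ids_json(ids: &BTreeMap<String, i64>) -> String {
    let body: Vec<String> = ids.iter().map(|(k, v)| format!("\"{k}\":{v}")).collect();
    format!("{{{}}}", body.join(","))
}

fn main() {
    // For a hypothetical column `m: map<string, array<int>>` the nested ids
    // might look like this:
    let mut ids = BTreeMap::new();
    ids.insert("key".to_string(), 2);
    ids.insert("value".to_string(), 3);
    ids.insert("value.element".to_string(), 4);
    println!("{}", nested_ids_json(&ids));
    // → {"key":2,"value":3,"value.element":4}
}
```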
🥞 Stacked PR
Use this link to review incremental changes.
What changes are proposed in this pull request?
Added the functionality to attach a parquet field id to the key/value/element of a kernel map/array when we transform a kernel schema to an arrow schema in DefaultEngine. Also added comments on the engine trait stating that engines need to be aware of the nested parquet id metadata as well.

Recommended review start: ArrowField::try_from_kernel, apply_schema, and StructField::try_from_arrow. These three functions are the entry points for transforming a kernel schema to an arrow schema, or for applying a kernel schema onto arrow data.

How was this change tested?
Integration test:
- Build a RecordBatch and a kernel schema with nested parquet ids, write to a parquet file, then read the raw just-written parquet to validate the field ids are correctly written.

Unit tests:
For both ArrowField::try_from_kernel and apply_schema:
- A deeply nested schema struct<map<array, struct<map<array,int>>>> to validate parquet ids are attached properly.
- Both parquet.field.nested.ids and delta.columnMapping.nested.ids.
- Error cases (parquet.field.nested.ids is not valid JSON; parquet.field.nested.ids contains a non-integer parquet id).

For apply_schema

For StructField::try_from_arrow:
- A deeply nested schema struct<map<array, struct<map<array,int>>>> to validate the nested parquet id metadata is filled correctly.