Skip to content

feat: Write parquet id for nested items in map/array according to metadata#2508

Open
dengsh12 wants to merge 13 commits intodelta-io:mainfrom
dengsh12:stack/write-nested-id
Open

feat: Write parquet id for nested items in map/array according to metadata#2508
dengsh12 wants to merge 13 commits intodelta-io:mainfrom
dengsh12:stack/write-nested-id

Conversation

@dengsh12
Copy link
Copy Markdown
Collaborator

@dengsh12 dengsh12 commented May 1, 2026

🥞 Stacked PR

Use this link to review incremental changes.


What changes are proposed in this pull request?

Added the functionality to attach parquet field id for key/value/element in kernel map/array, when we transform a kernel schema to arrow schema in DefaultEngine. Added comments on engine trait stating engine need to be aware of the nested parquet id metadata as well.

Recommended review start: ArrowField::try_from_kernel, apply_schema, and StructField::try_from_arrow. These three func are entrance of transforming kernel schema to arrow schema, or applying kernel schema onto arrow data

How was this change tested?

Integration test:

  1. Create a RecordBatch and kernel schema with nested parquet id. Write to a parquet file and read the raw just-written parquet to validate the field ids are correctly written.

Unit tests:
For both ArrowField::try_from_kernel and apply_schema

  1. Deep nested schema struct<map<array, struct<map<array,int>>>> to validate parquet id attaching properly.
  2. Test both metadata keys parquet.field.nested.ids and delta.columnMapping.nested.ids
  3. Test error cases(parquet.field.nested.ids is not a json, parquet.field.nested.ids contains non-integer parquet id)

For apply_schema

  1. Arrow data with existing metadata and existing parquet id, validates the metadata is reserved and the existing parquet id is override by kernel schema

For StructField::try_from_arrow:

  1. Deep nested schema struct<map<array, struct<map<array,int>>>> to validate parquet nested id metadata filled correctly.
  2. When arrow schema doesn't contain nested parquet id, the parquet id metadata in kernel is not filled
  3. Error case(parquet id not an integer)
  4. Test against various arrow list type(e.g. largelist)

@dengsh12 dengsh12 force-pushed the stack/write-nested-id branch from d9cde6e to f01f689 Compare May 1, 2026 00:06
@dengsh12 dengsh12 changed the title draft impl feat: Write parquet id for nested items in map/array according to metadata May 1, 2026
@github-actions github-actions Bot added the breaking-change Public API change that could cause downstream compilation failures. Requires a major version bump. label May 1, 2026
@dengsh12 dengsh12 force-pushed the stack/write-nested-id branch from f01f689 to 3078f79 Compare May 1, 2026 02:59
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 1, 2026

PR title does not match the required pattern. Please ensure you follow the conventional commits spec.

Your title should start with feat:, fix:, chore:, docs:, perf:, refactor:, test:, or ci:, and if it's a breaking change that should be suffixed with a ! (like feat!:), and then a 1-72 character brief description of your change.

Title: feat: Write parquet id for nested items in map/array according to metadata
PR title does not match the required pattern. Please ensure you follow the conventional commits spec.

Your title should start with feat:, fix:, chore:, docs:, perf:, refactor:, test:, or ci:, and if it's a breaking change that should be suffixed with a ! (like feat!:), and then a 1-72 character brief description of your change.

Title: feat: Write parquet id for nested items in map/array according to metadata
PR title does not match the required pattern. Please ensure you follow the conventional commits spec.

Your title should start with feat:, fix:, chore:, docs:, perf:, refactor:, test:, or ci:, and if it's a breaking change that should be suffixed with a ! (like feat!:), and then a 1-72 character brief description of your change.

Title: feat: Write parquet id for nested items in map/array according to metadata
PR title does not match the required pattern. Please ensure you follow the conventional commits spec.

Your title should start with feat:, fix:, chore:, docs:, perf:, refactor:, test:, or ci:, and if it's a breaking change that should be suffixed with a ! (like feat!:), and then a 1-72 character brief description of your change.

Title: feat: Write parquet id for nested items in map/array according to metadata
PR title does not match the required pattern. Please ensure you follow the conventional commits spec.

Your title should start with feat:, fix:, chore:, docs:, perf:, refactor:, test:, or ci:, and if it's a breaking change that should be suffixed with a ! (like feat!:), and then a 1-72 character brief description of your change.

Title: feat: Write parquet id for nested items in map/array according to metadata
PR title does not match the required pattern. Please ensure you follow the conventional commits spec.

Your title should start with feat:, fix:, chore:, docs:, perf:, refactor:, test:, or ci:, and if it's a breaking change that should be suffixed with a ! (like feat!:), and then a 1-72 character brief description of your change.

Title: feat: Write parquet id for nested items in map/array according to metadata
PR title does not match the required pattern. Please ensure you follow the conventional commits spec.

Your title should start with feat:, fix:, chore:, docs:, perf:, refactor:, test:, or ci:, and if it's a breaking change that should be suffixed with a ! (like feat!:), and then a 1-72 character brief description of your change.

Title: feat: Write parquet id for nested items in map/array according to metadata
PR title does not match the required pattern. Please ensure you follow the conventional commits spec.

Your title should start with feat:, fix:, chore:, docs:, perf:, refactor:, test:, or ci:, and if it's a breaking change that should be suffixed with a ! (like feat!:), and then a 1-72 character brief description of your change.

Title: feat: Write parquet id for nested items in map/array according to metadata
PR title does not match the required pattern. Please ensure you follow the conventional commits spec.

Your title should start with feat:, fix:, chore:, docs:, perf:, refactor:, test:, or ci:, and if it's a breaking change that should be suffixed with a ! (like feat!:), and then a 1-72 character brief description of your change.

Title: feat: Write parquet id for nested items in map/array according to metadata
PR title does not match the required pattern. Please ensure you follow the conventional commits spec.

Your title should start with feat:, fix:, chore:, docs:, perf:, refactor:, test:, or ci:, and if it's a breaking change that should be suffixed with a ! (like feat!:), and then a 1-72 character brief description of your change.

Title: feat: Write parquet id for nested items in map/array according to metadata
PR title does not match the required pattern. Please ensure you follow the conventional commits spec.

Your title should start with feat:, fix:, chore:, docs:, perf:, refactor:, test:, or ci:, and if it's a breaking change that should be suffixed with a ! (like feat!:), and then a 1-72 character brief description of your change.

Title: feat: Write parquet id for nested items in map/array according to metadata
PR title does not match the required pattern. Please ensure you follow the conventional commits spec.

Your title should start with feat:, fix:, chore:, docs:, perf:, refactor:, test:, or ci:, and if it's a breaking change that should be suffixed with a ! (like feat!:), and then a 1-72 character brief description of your change.

Title: feat: Write parquet id for nested items in map/array according to metadata
PR title does not match the required pattern. Please ensure you follow the conventional commits spec.

Your title should start with feat:, fix:, chore:, docs:, perf:, refactor:, test:, or ci:, and if it's a breaking change that should be suffixed with a ! (like feat!:), and then a 1-72 character brief description of your change.

Title: feat: Write parquet id for nested items in map/array according to metadata
PR title does not match the required pattern. Please ensure you follow the conventional commits spec.

Your title should start with feat:, fix:, chore:, docs:, perf:, refactor:, test:, or ci:, and if it's a breaking change that should be suffixed with a ! (like feat!:), and then a 1-72 character brief description of your change.

Title: feat: Write parquet id for nested items in map/array according to metadata
PR title does not match the required pattern. Please ensure you follow the conventional commits spec.

Your title should start with feat:, fix:, chore:, docs:, perf:, refactor:, test:, or ci:, and if it's a breaking change that should be suffixed with a ! (like feat!:), and then a 1-72 character brief description of your change.

Title: feat: Write parquet id for nested items in map/array according to metadata
PR title does not match the required pattern. Please ensure you follow the conventional commits spec.

Your title should start with feat:, fix:, chore:, docs:, perf:, refactor:, test:, or ci:, and if it's a breaking change that should be suffixed with a ! (like feat!:), and then a 1-72 character brief description of your change.

Title: feat: Write parquet id for nested items in map/array according to metadata
PR title does not match the required pattern. Please ensure you follow the conventional commits spec.

Your title should start with feat:, fix:, chore:, docs:, perf:, refactor:, test:, or ci:, and if it's a breaking change that should be suffixed with a ! (like feat!:), and then a 1-72 character brief description of your change.

Title: feat: Write parquet id for nested items in map/array according to metadata
PR title does not match the required pattern. Please ensure you follow the conventional commits spec.

Your title should start with feat:, fix:, chore:, docs:, perf:, refactor:, test:, or ci:, and if it's a breaking change that should be suffixed with a ! (like feat!:), and then a 1-72 character brief description of your change.

Title: feat: Write parquet id for nested items in map/array according to metadata
PR title does not match the required pattern. Please ensure you follow the conventional commits spec.

Your title should start with feat:, fix:, chore:, docs:, perf:, refactor:, test:, or ci:, and if it's a breaking change that should be suffixed with a ! (like feat!:), and then a 1-72 character brief description of your change.

Title: feat: Write parquet id for nested items in map/array according to metadata
PR title does not match the required pattern. Please ensure you follow the conventional commits spec.

Your title should start with feat:, fix:, chore:, docs:, perf:, refactor:, test:, or ci:, and if it's a breaking change that should be suffixed with a ! (like feat!:), and then a 1-72 character brief description of your change.

Title: feat: Write parquet id for nested items in map/array according to metadata
PR title does not match the required pattern. Please ensure you follow the conventional commits spec.

Your title should start with feat:, fix:, chore:, docs:, perf:, refactor:, test:, or ci:, and if it's a breaking change that should be suffixed with a ! (like feat!:), and then a 1-72 character brief description of your change.

Title: feat: Write parquet id for nested items in map/array according to metadata
PR title does not match the required pattern. Please ensure you follow the conventional commits spec.

Your title should start with feat:, fix:, chore:, docs:, perf:, refactor:, test:, or ci:, and if it's a breaking change that should be suffixed with a ! (like feat!:), and then a 1-72 character brief description of your change.

Title: feat: Write parquet id for nested items in map/array according to metadata
PR title does not match the required pattern. Please ensure you follow the conventional commits spec.

Your title should start with feat:, fix:, chore:, docs:, perf:, refactor:, test:, or ci:, and if it's a breaking change that should be suffixed with a ! (like feat!:), and then a 1-72 character brief description of your change.

Title: feat: Write parquet id for nested items in map/array according to metadata
PR title does not match the required pattern. Please ensure you follow the conventional commits spec.

Your title should start with feat:, fix:, chore:, docs:, perf:, refactor:, test:, or ci:, and if it's a breaking change that should be suffixed with a ! (like feat!:), and then a 1-72 character brief description of your change.

Title: feat: Write parquet id for nested items in map/array according to metadata
PR title does not match the required pattern. Please ensure you follow the conventional commits spec.

Your title should start with feat:, fix:, chore:, docs:, perf:, refactor:, test:, or ci:, and if it's a breaking change that should be suffixed with a ! (like feat!:), and then a 1-72 character brief description of your change.

Title: feat: Write parquet id for nested items in map/array according to metadata
PR title does not match the required pattern. Please ensure you follow the conventional commits spec.

Your title should start with feat:, fix:, chore:, docs:, perf:, refactor:, test:, or ci:, and if it's a breaking change that should be suffixed with a ! (like feat!:), and then a 1-72 character brief description of your change.

Title: feat: Write parquet id for nested items in map/array according to metadata
PR title does not match the required pattern. Please ensure you follow the conventional commits spec.

Your title should start with feat:, fix:, chore:, docs:, perf:, refactor:, test:, or ci:, and if it's a breaking change that should be suffixed with a ! (like feat!:), and then a 1-72 character brief description of your change.

Title: feat: Write parquet id for nested items in map/array according to metadata
PR title does not match the required pattern. Please ensure you follow the conventional commits spec.

Your title should start with feat:, fix:, chore:, docs:, perf:, refactor:, test:, or ci:, and if it's a breaking change that should be suffixed with a ! (like feat!:), and then a 1-72 character brief description of your change.

Title: feat: Write parquet id for nested items in map/array according to metadata
PR title does not match the required pattern. Please ensure you follow the conventional commits spec.

Your title should start with feat:, fix:, chore:, docs:, perf:, refactor:, test:, or ci:, and if it's a breaking change that should be suffixed with a ! (like feat!:), and then a 1-72 character brief description of your change.

Title: feat: Write parquet id for nested items in map/array according to metadata
PR title does not match the required pattern. Please ensure you follow the conventional commits spec.

Your title should start with feat:, fix:, chore:, docs:, perf:, refactor:, test:, or ci:, and if it's a breaking change that should be suffixed with a ! (like feat!:), and then a 1-72 character brief description of your change.

Title: feat: Write parquet id for nested items in map/array according to metadata
PR title does not match the required pattern. Please ensure you follow the conventional commits spec.

Your title should start with feat:, fix:, chore:, docs:, perf:, refactor:, test:, or ci:, and if it's a breaking change that should be suffixed with a ! (like feat!:), and then a 1-72 character brief description of your change.

Title: feat: Write parquet id for nested items in map/array according to metadata
PR title does not match the required pattern. Please ensure you follow the conventional commits spec.

Your title should start with feat:, fix:, chore:, docs:, perf:, refactor:, test:, or ci:, and if it's a breaking change that should be suffixed with a ! (like feat!:), and then a 1-72 character brief description of your change.

Title: feat: Write parquet id for nested items in map/array according to metadata
PR title does not match the required pattern. Please ensure you follow the conventional commits spec.

Your title should start with feat:, fix:, chore:, docs:, perf:, refactor:, test:, or ci:, and if it's a breaking change that should be suffixed with a ! (like feat!:), and then a 1-72 character brief description of your change.

Title: feat: Write parquet id for nested items in map/array according to metadata

@dengsh12 dengsh12 force-pushed the stack/write-nested-id branch 2 times, most recently from 64b3fcf to d058829 Compare May 1, 2026 21:17
Comment on lines +163 to +169
// 3. Rebuild the element field. `apply_schema` applies kernel's schema, so the input arrow
// field's metadata comes entirely from the kernel side. Parquet field ID is the only
// metadata that should be set on the element field here. Since kernel's
// `ArrayType::element_type` is a bare `DataType` (not a `StructField`), there is no
// per-element metadata on the kernel side. The element's Parquet field ID comes from the
// ancestor's metadata.
let transformed_arrow_element_field = ArrowField::new(
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://github.com/delta-io/delta-kernel-rs/pull/331/changes#r1807071289 context here(we stripped the metadata in the past as well)

@codecov
Copy link
Copy Markdown

codecov Bot commented May 1, 2026

Codecov Report

❌ Patch coverage is 93.52751% with 40 lines in your changes missing coverage. Please review.
✅ Project coverage is 88.57%. Comparing base (78670eb) to head (1ba9eff).

Files with missing lines Patch % Lines
kernel/src/engine/arrow_conversion/mod.rs 92.19% 9 Missing and 12 partials ⚠️
kernel/src/engine/arrow_expression/apply_schema.rs 89.47% 13 Missing and 3 partials ⚠️
kernel/src/utils.rs 98.36% 3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2508      +/-   ##
==========================================
+ Coverage   88.50%   88.57%   +0.07%     
==========================================
  Files         179      179              
  Lines       59031    59572     +541     
  Branches    59031    59572     +541     
==========================================
+ Hits        52245    52766     +521     
- Misses       4785     4791       +6     
- Partials     2001     2015      +14     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@dengsh12 dengsh12 marked this pull request as ready for review May 1, 2026 21:26
}
}

/// Recursive Arrow-type builder that propagates nested field IDs from `ancestor`'s
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Give an example?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added

}

// deconstruct the array, then rebuild the mapped version
// Apply a kernel [`ArrayType`] to an Arrow [`ListArray`].
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please give an example? and clarify what "apply" means.

It's not intuitive to me what "apply a kernel ArrayType to an Arrow ListArray" means.

i.e. to me that is equivalently as confusing as "apply an Kernel integer to an arrow integer". I don't know what "apply" means.

Copy link
Copy Markdown
Collaborator Author

@dengsh12 dengsh12 May 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added detail example and explanation(centralized to apply_schema_to_inner to avoid repeated comments)


// deconstruct a map, and rebuild it with the specified target kernel type
fn apply_schema_to_map(array: &dyn Array, kernel_map_type: &MapType) -> DeltaResult<MapArray> {
// Apply a kernel [`MapType`] to an Arrow [`MapArray`].
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto

Copy link
Copy Markdown
Collaborator Author

@dengsh12 dengsh12 May 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added detail example and explaination(centralized to apply_schema_to_inner to avoid repeated comments)

@dengsh12 dengsh12 force-pushed the stack/write-nested-id branch from 766ace4 to c648031 Compare May 2, 2026 01:00
CynicDog pushed a commit to CynicDog/delta-kernel-rs that referenced this pull request May 2, 2026
…lta-io#2504)

## Stacked PR
Use this
[link](https://github.com/delta-io/delta-kernel-rs/pull/2504/files) to
review incremental changes.
-
[stack/add-v3-feature](delta-io#2475)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/2475/files)]
[MERGED]
-
[**stack/write-part-when-ice3**](delta-io#2504)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/2504/files)]
-
[stack/block-alter-table](delta-io#2505)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/2505/files/15d151f8b271a72cb421ccfa81cda7d5887d1b37..562605f3911acd99854efae7940deb9344e7ee55)]
-
[stack/write-nested-id](delta-io#2508)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/2508/files/562605f3911acd99854efae7940deb9344e7ee55..766ace41c2176c546f533be4ff0d7a9aa24ae001)]

---------
## What changes are proposed in this pull request?
Materialize partition values when icebergCompatV3 enabled.

## How was this change tested?
1. When `icebergCompatV3` enabled, partition columns are contained into
`physical_write_schema`
2. When `icebergCompatV3` supported but not enabled, partition columns
are not contained into `physical_write_schema`

Note: other related cases(e.g. `MaterializePartitionColumns`) are
already covered by existing tests so we are not adding tests here
Comment on lines +466 to +477
let id: i64 = v.parse().map_err(|_| {
ArrowError::SchemaError(format!(
"'{k}' on field '{}' must be an integer, got '{v}'",
arrow_field.name()
))
})?;
return Ok((
ColumnMetadataKey::ParquetFieldId.as_ref().to_string(),
MetadataValue::Number(id),
));
}
Ok((k.clone(), MetadataValue::from(v)))
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: This is a fix, previously we filled ColumnMetadataKey::ParquetFieldId with string, but from here

|field| match field.get_config_value(&ColumnMetadataKey::ParquetFieldId) {
Some(MetadataValue::Number(fid)) => Some((*fid, field.name())),
_ => None,
},
we can see it should be MetadataValue::Number

Comment on lines +486 to +493
if !nested_ids.is_empty() {
kernel_metadata.insert(
ColumnMetadataKey::ParquetFieldNestedIds
.as_ref()
.to_string(),
MetadataValue::Other(serde_json::Value::Object(nested_ids)),
);
}
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: I'm not validating if arrow_metadata contains different ColumnMetadataKey::ParquetFieldNestedIds for simplicity(feel like defense in depth). If arrow_metadata contains ColumnMetadataKey::ParquetFieldNestedIds which is different from collect_nested_field_ids_from_arrow, it would be invalid data. But lmk if you think it's worth defending here!

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Think again we probably should not error, in the past we may write to column mapping but not icebergCompat table with delta.columnmapping.nested.ids, but didn't write parquet id. If we error here, those tables are not readable

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

breaking-change Public API change that could cause downstream compilation failures. Requires a major version bump.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants