chore: support RNTuple v1.0.1.0 format (attribute set record frame)#1580
chore: support RNTuple v1.0.1.0 format (attribute set record frame)#1580wdconinc wants to merge 3 commits intoscikit-hep:mainfrom
Conversation
|
FYI @ariostas |
| # https://github.com/root-project/root/blob/8cd9eed6f3a32e55ef1f0f1df8e5462e753c735d/tree/ntuple/v7/doc/BinaryFormatSpecification.md#footer-envelope | ||
| class FooterReader: | ||
| def __init__(self): | ||
| def __init__(self, version=(1, 0, 0, 0)): |
There was a problem hiding this comment.
The only reason for a default version here is to provide backwards compatibility: I don't know if there are users of this class who may assume it has no arguments.
There was a problem hiding this comment.
Pull request overview
This PR adds support for reading RNTuple v1.0.1.0 files, which introduced a new "linked attribute sets" list frame in the footer structure. The implementation uses version-dependent parsing to handle both the new v1.0.1.0 format (used by ROOT v6.38) and maintain backward compatibility with v1.0.0.0 files.
Changes:
- Added
LinkedAttributeSetRecordReaderclass to parse linked attribute set records - Modified
FooterReaderto conditionally read linked attribute sets based on RNTuple version - Updated footer parsing to pass version information for version-dependent parsing
- Added comprehensive tests for both v1.0.1.0 and v1.0.0.0 file formats
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| tests/test_1510_rntuple_v1010.py | Adds tests to verify v1.0.1.0 footer parsing and backward compatibility with v1.0.0.0 files |
| src/uproot/models/RNTuple.py | Implements LinkedAttributeSetRecordReader and version-dependent footer parsing logic |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| out.schema_version_major = cursor.field(chunk, struct.Struct("<H"), context) | ||
| out.schema_version_minor = cursor.field(chunk, struct.Struct("<H"), context) | ||
| out.anchor_uncompressed_size = cursor.field(chunk, struct.Struct("<I"), context) | ||
| out.locator = LocatorReader().read(chunk, cursor, context) | ||
| out.name = cursor.rntuple_string(chunk, context) | ||
| return out | ||
|
|
||
|
|
||
| # https://github.com/root-project/root/blob/8cd9eed6f3a32e55ef1f0f1df8e5462e753c735d/tree/ntuple/v7/doc/BinaryFormatSpecification.md#schema-extension-record-frame |
There was a problem hiding this comment.
The struct format definitions should be defined as module-level constants at the top of the file (around lines 25-60), following the established convention in this codebase. Instead of creating struct.Struct instances inline, define a constant like:
_rntuple_linked_attribute_set_format = struct.Struct("<HHI")
at the top of the file with the other format definitions, then use cursor.fields() to read all three values at once:
out.schema_version_major, out.schema_version_minor, out.anchor_uncompressed_size = cursor.fields(chunk, _rntuple_linked_attribute_set_format, context)
This follows the pattern used by other readers like ClusterGroupRecordReader and FieldRecordReader.
| # Verify checksum validation passes | ||
| import xxhash | ||
|
|
||
| chunk = obj._footer_chunk | ||
| computed = xxhash.xxh3_64_intdigest(chunk.raw_data[:-8]) | ||
| stored = footer.checksum | ||
| assert ( | ||
| computed == stored | ||
| ), f"Footer checksum mismatch: {computed:016x} != {stored:016x}" |
There was a problem hiding this comment.
The checksum validation in these test assertions is redundant. The footer property already validates the checksum automatically (see src/uproot/models/RNTuple.py lines 318-320). If the checksum were invalid, accessing obj.footer would raise an AssertionError before reaching this point. Consider removing this redundant validation or adding a comment explaining why it's being tested again.
| # Verify checksum validation still passes | ||
| import xxhash | ||
|
|
||
| chunk = obj._footer_chunk | ||
| computed = xxhash.xxh3_64_intdigest(chunk.raw_data[:-8]) | ||
| stored = footer.checksum | ||
| assert ( | ||
| computed == stored | ||
| ), f"Footer checksum mismatch: {computed:016x} != {stored:016x}" |
There was a problem hiding this comment.
The checksum validation in these test assertions is redundant. The footer property already validates the checksum automatically (see src/uproot/models/RNTuple.py lines 318-320). If the checksum were invalid, accessing obj.footer would raise an AssertionError before reaching this point. Consider removing this redundant validation or adding a comment explaining why it's being tested again.
This PR adds support for RNTuple v1.0.1.0 files.
In the new version a new attribute set record frame was added; root-project/root@b6e7318). This is the format in used by ROOT v6.38. The new format requires version-dependent parsing since it is not just an empty frame, but there is no frame at all.
The tests (written with help of AI, so somewhat wordy and with more asserts than maybe necessary) will require the test data added in scikit-hep/scikit-hep-testdata#208. Both tests succeed in local tests:
This only partly addresses #1510. No change to uproot support for the RNTuple attributes.