- Type: Bug / Data corruption
- Severity: High
- Component: HDF5 particle checkpoint post-processing
- Location: Src/Extern/HDF5/AMReX_ParticleHDF5.H:512-513
Problem
CheckpointPostHDF5 opens HdrFileNamePrePost with std::ofstream in append mode and writes per-grid metadata as text:
std::ofstream HdrFile;
HdrFile.open(HdrFileNamePrePost.c_str(), std::ios::out | std::ios::app);
// ...
HdrFile << whichPrePost[lev][j] << ' ' << countPrePost[lev][j] << ' '
<< wherePrePost[lev][j] << '\n';
However, HdrFileNamePrePost is set in WriteHDF5ParticleDataSync (line 228 of AMReX_WriteBinaryParticleDataHDF5.H) to an .h5 file:
HDF5FileName += ".h5";
pc.HdrFileNamePrePost = HDF5FileName;
Appending raw text to an HDF5 binary file corrupts the HDF5 structure. HDF5 files have a specific binary format with a superblock, B-tree indices, and heap structures. Appending arbitrary bytes beyond the HDF5 end-of-file marker makes the file unreadable by most HDF5 tools and prevents subsequent H5Fopen from working correctly.
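The append behaviour itself can be reproduced without AMReX or HDF5. The sketch below is a stand-alone illustration, not code from the repository: `fileSizeBytes` and `appendTextRecord` are hypothetical helper names, and any existing binary file stands in for the `.h5` output. It shows only that opening with `std::ios::out | std::ios::app` leaves the existing bytes in place and adds the text record after them, so the file grows past the size it had when the binary writer closed it.

```cpp
#include <fstream>
#include <string>

// Size of a file in bytes, or -1 if it cannot be opened.
inline long long fileSizeBytes(const std::string& path) {
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    if (!in) { return -1; }
    return static_cast<long long>(in.tellg());
}

// Mimics the write pattern reported in CheckpointPostHDF5: open in append
// mode and emit one "which count where" text record at the end of the file.
inline void appendTextRecord(const std::string& path,
                             int which, int count, long long where) {
    std::ofstream hdr;
    hdr.open(path.c_str(), std::ios::out | std::ios::app);
    hdr << which << ' ' << count << ' ' << where << '\n';
}
```

Comparing `fileSizeBytes(path)` before and after a call to `appendTextRecord(path, ...)` on a copy of a checkpoint file would confirm whether the text lands after the binary content.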
Impact
- Any use of the pre/post checkpoint path with HDF5 (usePrePost = true) produces corrupt HDF5 files.
- The non-HDF5 CheckpointPost correctly writes to a separate text header file, but CheckpointPostHDF5 reuses the .h5 filename without adaptation.
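If keeping the text format were preferred, one way to mirror the non-HDF5 path would be to derive a separate header filename from the `.h5` name instead of reusing it. The following is a minimal sketch under that assumption. `sidecarHeaderName` and the `_prepost.hdr` suffix are hypothetical and do not exist in AMReX, and the restart path would need a matching change to read the file.

```cpp
#include <string>

// Hypothetical helper: map "<base>.h5" to "<base>_prepost.hdr" so the text
// metadata is written to its own file rather than into the HDF5 file.
// The "_prepost.hdr" suffix is an illustrative assumption.
inline std::string sidecarHeaderName(const std::string& h5Name) {
    const std::string ext = ".h5";
    std::string base = h5Name;
    // Strip a trailing ".h5" if present; leave other names untouched.
    if (base.size() >= ext.size() &&
        base.compare(base.size() - ext.size(), ext.size(), ext) == 0) {
        base.erase(base.size() - ext.size());
    }
    return base + "_prepost.hdr";
}
```

With this helper, `sidecarHeaderName("particles.h5")` yields `particles_prepost.hdr`, and only a trailing `.h5` is stripped, so directory components that happen to contain `.h5` are left alone.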
Suggested patch
The per-grid metadata (which, count, where) should be written as HDF5 attributes or datasets, not appended as text. One approach:
--- a/Src/Extern/HDF5/AMReX_ParticleHDF5.H
+++ b/Src/Extern/HDF5/AMReX_ParticleHDF5.H
@@ -509,8 +509,14 @@
const int IOProcNumber = ParallelDescriptor::IOProcessorNumber();
- std::ofstream HdrFile;
- HdrFile.open(HdrFileNamePrePost.c_str(), std::ios::out | std::ios::app);
+ hid_t fid = -1;
+ if (ParallelDescriptor::IOProcessor()) {
+ fid = H5Fopen(HdrFileNamePrePost.c_str(), H5F_ACC_RDWR, H5P_DEFAULT);
+ if (fid < 0) {
+ amrex::Abort("ParticleContainer::CheckpointPostHDF5(): "
+ "unable to open HDF5 file for post metadata");
+ }
+ }
for(int lev(0); lev <= finestLevel(); ++lev) {
ParallelDescriptor::ReduceIntSum (...);
@@ // write which/count/where as HDF5 datasets per level instead of text
The exact HDF5 schema (attributes vs. datasets, naming) should match whatever the corresponding RestartHDF5 expects.
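Whatever schema is chosen, the three per-grid arrays for a level first have to be gathered into one contiguous buffer before they can be handed to an HDF5 write call. The sketch below shows one possible packing, a row-major N x 3 array of `long long` with one row per grid. `packLevelMetadata` is a hypothetical helper, and the element types (`int` for which and count, `long long` for where) are assumptions about `whichPrePost`, `countPrePost`, and `wherePrePost`, not taken from the source.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Packs one level's per-grid metadata into a row-major N x 3 buffer:
// row j = { which[j], count[j], where[j] }.
// Element types are assumptions; adjust to the container's actual types.
inline std::vector<long long> packLevelMetadata(
        const std::vector<int>& which,
        const std::vector<int>& count,
        const std::vector<long long>& where) {
    // All three arrays describe the same grids, so they must match in length.
    assert(which.size() == count.size() && count.size() == where.size());
    std::vector<long long> rows;
    rows.reserve(which.size() * 3);
    for (std::size_t j = 0; j < which.size(); ++j) {
        rows.push_back(which[j]);
        rows.push_back(count[j]);
        rows.push_back(where[j]);
    }
    return rows;
}
```

A buffer in this layout could then be written as a single two-dimensional dataset per level by the I/O processor, with the dataset name and layout chosen to agree with the reader on the restart side.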
Prepared by Claude