Conversation
|
@Gfrisella Please check carefully, I had to resolve over a year of merge conflicts. This assumes that the format form cuda muons is the same as the muons and matter one. |
c30d0d4 to
abd0c32
Compare
|
I was not able to review the conversion code for the ROOT file carefully, as there is a lot of technical python code :), but the rest looks good to me. |
|
@Gfrisella Could you please review and test with a current cudo muons file? |
|
I’ll check it later today 👍 |
📝 WalkthroughWalkthroughThe PR adds support for reading particle event data from ROOT TTree files through a new Changes
Sequence DiagramsequenceDiagram
participant User
participant SimScript as run_simScript.py
participant TreeGen as TTreeGenerator
participant RootFile as ROOT TFile/TTree
participant FairGen as FairPrimaryGenerator
User->>SimScript: --ttree input.root
SimScript->>TreeGen: Create & SetTreeName("converted_ntuple")
SimScript->>TreeGen: Init(inputFile)
TreeGen->>RootFile: Open & Validate TTree
RootFile-->>TreeGen: Bind branches (px, py, pz, x, y, z, id, w)
SimScript->>FairGen: Add TTreeGenerator
SimScript->>FairGen: Generate Events
loop Per Event
FairGen->>TreeGen: ReadEvent(primGen)
TreeGen->>RootFile: Load entry at current index
RootFile-->>TreeGen: Return branch values
TreeGen->>FairGen: AddTrack(px, py, pz, x, y, z, pdgId, weight)
TreeGen->>TreeGen: Increment event counter
end
TreeGen->>RootFile: Close file
Estimated Code Review Effort🎯 3 (Moderate) | ⏱️ ~25 minutes 🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (3)
shipgen/TTreeGenerator.cxx (1)
76-87: Consider validating branch existence.
SetBranchAddresssilently fails if branches don't exist or have incompatible types, leaving member variables uninitialized with their default values. Consider checking the return value or usingGetBranchfirst to validate the tree structure matches expectations.This would help catch configuration errors (e.g., wrong tree name or different column schema) early.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@shipgen/TTreeGenerator.cxx` around lines 76 - 87, InitBranches currently calls SetBranchAddress for several branches (px, py, pz, x, y, z, id, w) without verifying they exist, which can leave members like fPx/fPdgId uninitialized; update TTreeGenerator::InitBranches to call fInputTree->GetBranch("px") etc. (or a helper that iterates the expected names) before SetBranchAddress, and if any branch is missing or incompatible log an error via your logger (or return/throw) so initialization fails fast instead of silently continuing with default values. Ensure you reference the branch names and member pointers (fPx, fPy, fPz, fX, fY, fZ, fPdgId, fWeight) when validating to make the check robust.python/experimental/eminem_importer.py (2)
105-114: Acknowledge pickle security risk in documentation.Pickle deserialization can execute arbitrary code if given malicious input. While this script is intended for internal use with known muon data files, consider adding a warning in the docstring or using a safer format (e.g., NumPy's
.npy/.npz) for future data.For the current use case with trusted internal data files, this is acceptable but worth documenting.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@python/experimental/eminem_importer.py` around lines 105 - 114, Add a clear security note to the module docstring in python/experimental/eminem_importer.py stating that pickle deserialization (used in the block handling gzip.BadGzipFile and calling pickle.loads/pickle.load on input_file) can execute arbitrary code and should only be used with trusted internal files; also mention recommended safer alternatives (e.g., NumPy .npy/.npz or JSON) for future use so maintainers know to avoid silent risks when changing this loader.
137-139: Specify UTF-8 encoding for JSON file reading.For portability across different systems, explicitly specify the encoding when reading JSON files.
🔧 Suggested fix
- with open(config_path) as f: + with open(config_path, encoding="utf-8") as f: custom_config = json.load(f)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@python/experimental/eminem_importer.py` around lines 137 - 139, The JSON file is opened without an explicit encoding in the try block around the open(config_path) call inside python/experimental/eminem_importer.py; update the open call used to read the custom_config (the with open(config_path) context where json.load(f) is called) to explicitly specify encoding='utf-8' (and optionally mode='r') so the JSON is read portably across platforms.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@macro/run_simScript.py`:
- Around line 595-603: The TTree handling assumes inputFile is a string but it
can be a list; update the TTree branch that uses options.ttree to accept
multiple input files by checking if inputFile is a list and, if so, iterating
over each file: for each file create a new ROOT.SHiP.TTreeGenerator(), call
generator.Init(file), primGen.AddGenerator(generator), accumulate total events
using generator.GetNEvents(); after adding all files set options.nEvents =
min(options.nEvents, total_events). If inputFile is a single string keep the
existing single-generator flow; reference symbols: options.ttree, inputFile,
ROOT.SHiP.TTreeGenerator, generator.Init, primGen.AddGenerator,
generator.GetNEvents(), options.nEvents.
In `@shipgen/TTreeGenerator.cxx`:
- Around line 36-58: When TTreeGenerator::Init opens fInputFile but fails to
find the tree (the !fInputTree branch), close and free the opened file before
returning to avoid leaking the TFile; e.g. call fInputFile->Close(), delete
fInputFile, and set fInputFile = nullptr in the error path where you log "Cannot
find tree ..." so subsequent Init calls won't leak the previous file pointer.
---
Nitpick comments:
In `@python/experimental/eminem_importer.py`:
- Around line 105-114: Add a clear security note to the module docstring in
python/experimental/eminem_importer.py stating that pickle deserialization (used
in the block handling gzip.BadGzipFile and calling pickle.loads/pickle.load on
input_file) can execute arbitrary code and should only be used with trusted
internal files; also mention recommended safer alternatives (e.g., NumPy
.npy/.npz or JSON) for future use so maintainers know to avoid silent risks when
changing this loader.
- Around line 137-139: The JSON file is opened without an explicit encoding in
the try block around the open(config_path) call inside
python/experimental/eminem_importer.py; update the open call used to read the
custom_config (the with open(config_path) context where json.load(f) is called)
to explicitly specify encoding='utf-8' (and optionally mode='r') so the JSON is
read portably across platforms.
In `@shipgen/TTreeGenerator.cxx`:
- Around line 76-87: InitBranches currently calls SetBranchAddress for several
branches (px, py, pz, x, y, z, id, w) without verifying they exist, which can
leave members like fPx/fPdgId uninitialized; update TTreeGenerator::InitBranches
to call fInputTree->GetBranch("px") etc. (or a helper that iterates the expected
names) before SetBranchAddress, and if any branch is missing or incompatible log
an error via your logger (or return/throw) so initialization fails fast instead
of silently continuing with default values. Ensure you reference the branch
names and member pointers (fPx, fPy, fPz, fX, fY, fZ, fPdgId, fWeight) when
validating to make the check robust.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 17837bab-00cf-4141-a322-b5ae142deb85
📒 Files selected for processing (7)
CHANGELOG.mdmacro/run_simScript.pypython/experimental/eminem_importer.pyshipgen/CMakeLists.txtshipgen/LinkDef.hshipgen/TTreeGenerator.cxxshipgen/TTreeGenerator.h
| if options.ttree: | ||
| ut.checkFileExists(inputFile) | ||
| primGen.SetTarget(0.0, 0.0) | ||
| generator = ROOT.SHiP.TTreeGenerator() | ||
| generator.SetTreeName("converted_ntuple") | ||
| generator.Init(inputFile) | ||
| primGen.AddGenerator(generator) | ||
| options.nEvents = min(options.nEvents, generator.GetNEvents()) | ||
| print("Process ", options.nEvents, " from input file") |
There was a problem hiding this comment.
Potential issue: inputFile may be a list.
When multiple input files are specified with -f, inputFile becomes a list (see lines 294-297). The Init(inputFile) call on line 600 will fail if inputFile is a list since TTreeGenerator::Init expects const char*.
Consider handling this similarly to other generators or documenting that --ttree only supports single input files.
🔧 Suggested fix to handle list input
if options.ttree:
ut.checkFileExists(inputFile)
primGen.SetTarget(0.0, 0.0)
generator = ROOT.SHiP.TTreeGenerator()
generator.SetTreeName("converted_ntuple")
- generator.Init(inputFile)
+ # TTreeGenerator only supports single file input
+ input_path = inputFile[0] if isinstance(inputFile, list) else inputFile
+ generator.Init(input_path)
primGen.AddGenerator(generator)
options.nEvents = min(options.nEvents, generator.GetNEvents())
print("Process ", options.nEvents, " from input file")📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if options.ttree: | |
| ut.checkFileExists(inputFile) | |
| primGen.SetTarget(0.0, 0.0) | |
| generator = ROOT.SHiP.TTreeGenerator() | |
| generator.SetTreeName("converted_ntuple") | |
| generator.Init(inputFile) | |
| primGen.AddGenerator(generator) | |
| options.nEvents = min(options.nEvents, generator.GetNEvents()) | |
| print("Process ", options.nEvents, " from input file") | |
| if options.ttree: | |
| ut.checkFileExists(inputFile) | |
| primGen.SetTarget(0.0, 0.0) | |
| generator = ROOT.SHiP.TTreeGenerator() | |
| generator.SetTreeName("converted_ntuple") | |
| # TTreeGenerator only supports single file input | |
| input_path = inputFile[0] if isinstance(inputFile, list) else inputFile | |
| generator.Init(input_path) | |
| primGen.AddGenerator(generator) | |
| options.nEvents = min(options.nEvents, generator.GetNEvents()) | |
| print("Process ", options.nEvents, " from input file") |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@macro/run_simScript.py` around lines 595 - 603, The TTree handling assumes
inputFile is a string but it can be a list; update the TTree branch that uses
options.ttree to accept multiple input files by checking if inputFile is a list
and, if so, iterating over each file: for each file create a new
ROOT.SHiP.TTreeGenerator(), call generator.Init(file),
primGen.AddGenerator(generator), accumulate total events using
generator.GetNEvents(); after adding all files set options.nEvents =
min(options.nEvents, total_events). If inputFile is a single string keep the
existing single-generator flow; reference symbols: options.ttree, inputFile,
ROOT.SHiP.TTreeGenerator, generator.Init, primGen.AddGenerator,
generator.GetNEvents(), options.nEvents.
| Bool_t TTreeGenerator::Init(const char* fileName, int startEvent) { | ||
| fInputFile = TFile::Open(fileName, "READ"); | ||
| if (!fInputFile || fInputFile->IsZombie()) { | ||
| LOG(ERROR) << "TTreeGenerator: Cannot open file " << fileName; | ||
| return kFALSE; | ||
| } | ||
|
|
||
| fInputTree = dynamic_cast<TTree*>(fInputFile->Get(fTreeName.Data())); | ||
| if (!fInputTree) { | ||
| LOG(ERROR) << "TTreeGenerator: Cannot find tree " << fTreeName.Data() | ||
| << " in file " << fileName; | ||
| return kFALSE; | ||
| } | ||
|
|
||
| fNEvents = fInputTree->GetEntries(); | ||
| fCurrentEvent = startEvent; | ||
| LOG(INFO) << "TTreeGenerator: Found " << fNEvents | ||
| << " events in tree, starting at " << startEvent; | ||
|
|
||
| InitBranches(); | ||
|
|
||
| return kTRUE; | ||
| } |
There was a problem hiding this comment.
Resource leak when tree is not found.
If the file opens successfully but the tree is not found (lines 44-48), the function returns kFALSE without closing fInputFile. The destructor will eventually clean up, but if Init is called again, the old file pointer will be leaked.
🔧 Suggested fix
fInputTree = dynamic_cast<TTree*>(fInputFile->Get(fTreeName.Data()));
if (!fInputTree) {
LOG(ERROR) << "TTreeGenerator: Cannot find tree " << fTreeName.Data()
<< " in file " << fileName;
+ fInputFile->Close();
+ delete fInputFile;
+ fInputFile = nullptr;
return kFALSE;
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| Bool_t TTreeGenerator::Init(const char* fileName, int startEvent) { | |
| fInputFile = TFile::Open(fileName, "READ"); | |
| if (!fInputFile || fInputFile->IsZombie()) { | |
| LOG(ERROR) << "TTreeGenerator: Cannot open file " << fileName; | |
| return kFALSE; | |
| } | |
| fInputTree = dynamic_cast<TTree*>(fInputFile->Get(fTreeName.Data())); | |
| if (!fInputTree) { | |
| LOG(ERROR) << "TTreeGenerator: Cannot find tree " << fTreeName.Data() | |
| << " in file " << fileName; | |
| return kFALSE; | |
| } | |
| fNEvents = fInputTree->GetEntries(); | |
| fCurrentEvent = startEvent; | |
| LOG(INFO) << "TTreeGenerator: Found " << fNEvents | |
| << " events in tree, starting at " << startEvent; | |
| InitBranches(); | |
| return kTRUE; | |
| } | |
| Bool_t TTreeGenerator::Init(const char* fileName, int startEvent) { | |
| fInputFile = TFile::Open(fileName, "READ"); | |
| if (!fInputFile || fInputFile->IsZombie()) { | |
| LOG(ERROR) << "TTreeGenerator: Cannot open file " << fileName; | |
| return kFALSE; | |
| } | |
| fInputTree = dynamic_cast<TTree*>(fInputFile->Get(fTreeName.Data())); | |
| if (!fInputTree) { | |
| LOG(ERROR) << "TTreeGenerator: Cannot find tree " << fTreeName.Data() | |
| << " in file " << fileName; | |
| fInputFile->Close(); | |
| delete fInputFile; | |
| fInputFile = nullptr; | |
| return kFALSE; | |
| } | |
| fNEvents = fInputTree->GetEntries(); | |
| fCurrentEvent = startEvent; | |
| LOG(INFO) << "TTreeGenerator: Found " << fNEvents | |
| << " events in tree, starting at " << startEvent; | |
| InitBranches(); | |
| return kTRUE; | |
| } |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@shipgen/TTreeGenerator.cxx` around lines 36 - 58, When TTreeGenerator::Init
opens fInputFile but fails to find the tree (the !fInputTree branch), close and
free the opened file before returning to avoid leaking the TFile; e.g. call
fInputFile->Close(), delete fInputFile, and set fInputFile = nullptr in the
error path where you log "Cannot find tree ..." so subsequent Init calls won't
leak the previous file pointer.
Checklist
Summary by CodeRabbit
Release Notes
--ttreecommand-line option to enable TTree-based input mode for reading particle event data from ROOT files