Skip to content

Support for muons and matter/cuda muons files#1138

Open
olantwin wants to merge 3 commits intomasterfrom
eminem
Open

Support for muons and matter/cuda muons files#1138
olantwin wants to merge 3 commits intomasterfrom
eminem

Conversation

@olantwin
Copy link
Copy Markdown
Contributor

@olantwin olantwin commented Apr 2, 2026

Checklist

Summary by CodeRabbit

Release Notes

  • New Features
    • Added experimental support for importing and converting Muons and Matter pickle files to ROOT format
    • Introduced new --ttree command-line option to enable TTree-based input mode for reading particle event data from ROOT files
    • Added new fieldmap file (2026_04_01_SHiP_MainSpectrometerField_V13.root)

@olantwin olantwin requested a review from Gfrisella April 2, 2026 10:03
@olantwin olantwin requested a review from a team as a code owner April 2, 2026 10:03
@olantwin
Copy link
Copy Markdown
Contributor Author

olantwin commented Apr 2, 2026

@Gfrisella Please check carefully, I had to resolve over a year of merge conflicts.

This assumes that the format form cuda muons is the same as the muons and matter one.

@olantwin olantwin force-pushed the eminem branch 2 times, most recently from c30d0d4 to abd0c32 Compare April 2, 2026 11:52
@olantwin olantwin enabled auto-merge April 2, 2026 12:24
@kholoimov
Copy link
Copy Markdown

I was not able to review the conversion code for the ROOT file carefully, as there is a lot of technical python code :), but the rest looks good to me.

@olantwin
Copy link
Copy Markdown
Contributor Author

olantwin commented Apr 8, 2026

@Gfrisella Could you please review and test with a current cudo muons file?

@Gfrisella
Copy link
Copy Markdown
Contributor

I’ll check it later today 👍

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Apr 9, 2026

📝 Walkthrough

Walkthrough

The PR adds support for reading particle event data from ROOT TTree files through a new TTreeGenerator class. A Python script (eminem_importer.py) converts pickled muon/matter datasets into ROOT files. The simulation script is extended with a --ttree flag to use this new generator, and build configuration is updated accordingly.

Changes

Cohort / File(s) Summary
Documentation & Changelog
CHANGELOG.md
Added three new entries: experimental support for Muons/Matter .pkl file import, TTreeGenerator capability for reading M&M ntuples, and a new fieldmap variant (2026_04_01).
Build Configuration
shipgen/CMakeLists.txt, shipgen/LinkDef.h
Added TTreeGenerator.cxx to build sources and linked SHiP::TTreeGenerator class for ROOT dictionary generation.
TTreeGenerator Implementation
shipgen/TTreeGenerator.h, shipgen/TTreeGenerator.cxx
New SHiP::TTreeGenerator class reads particle events from ROOT TTrees. Manages file/tree lifecycle, branch binding, and forwards events as primary particles via ReadEvent(). Supports configurable tree name, event range, and tracks momentum, position, PDG id, and weight fields.
Simulation Script Integration
macro/run_simScript.py
Added --ttree CLI flag to select TTree input mode. New conditional branch initializes TTreeGenerator with input file, tree name "converted_ntuple", and caps event count to available entries.
Data Import Tool
python/experimental/eminem_importer.py
New 455-line Python script converts pickled muon/matter datasets (8-column NumPy arrays: px, py, pz, x, y, z, id, w) to ROOT TTrees or RNTuples. Includes column configuration with unit/offset/scaling, JSON-based custom config override, ROOT.RDataFrame integration, and validation.

Sequence Diagram

sequenceDiagram
    participant User
    participant SimScript as run_simScript.py
    participant TreeGen as TTreeGenerator
    participant RootFile as ROOT TFile/TTree
    participant FairGen as FairPrimaryGenerator
    
    User->>SimScript: --ttree input.root
    SimScript->>TreeGen: Create & SetTreeName("converted_ntuple")
    SimScript->>TreeGen: Init(inputFile)
    TreeGen->>RootFile: Open & Validate TTree
    RootFile-->>TreeGen: Bind branches (px, py, pz, x, y, z, id, w)
    SimScript->>FairGen: Add TTreeGenerator
    SimScript->>FairGen: Generate Events
    
    loop Per Event
        FairGen->>TreeGen: ReadEvent(primGen)
        TreeGen->>RootFile: Load entry at current index
        RootFile-->>TreeGen: Return branch values
        TreeGen->>FairGen: AddTrack(px, py, pz, x, y, z, pdgId, weight)
        TreeGen->>TreeGen: Increment event counter
    end
    
    TreeGen->>RootFile: Close file
Loading

Estimated Code Review Effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 57.89% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The PR title clearly describes the main addition: support for muons and matter/CUDA muons files, which matches the three commit features (M&M importer, TTree generator, and --ttree option).
Description check ✅ Passed The PR description includes all required checklist items completed: changelog updated, commits follow conventional commits, and CI checks pass.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch eminem

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (3)
shipgen/TTreeGenerator.cxx (1)

76-87: Consider validating branch existence.

SetBranchAddress silently fails if branches don't exist or have incompatible types, leaving member variables uninitialized with their default values. Consider checking the return value or using GetBranch first to validate the tree structure matches expectations.

This would help catch configuration errors (e.g., wrong tree name or different column schema) early.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@shipgen/TTreeGenerator.cxx` around lines 76 - 87, InitBranches currently
calls SetBranchAddress for several branches (px, py, pz, x, y, z, id, w) without
verifying they exist, which can leave members like fPx/fPdgId uninitialized;
update TTreeGenerator::InitBranches to call fInputTree->GetBranch("px") etc. (or
a helper that iterates the expected names) before SetBranchAddress, and if any
branch is missing or incompatible log an error via your logger (or return/throw)
so initialization fails fast instead of silently continuing with default values.
Ensure you reference the branch names and member pointers (fPx, fPy, fPz, fX,
fY, fZ, fPdgId, fWeight) when validating to make the check robust.
python/experimental/eminem_importer.py (2)

105-114: Acknowledge pickle security risk in documentation.

Pickle deserialization can execute arbitrary code if given malicious input. While this script is intended for internal use with known muon data files, consider adding a warning in the docstring or using a safer format (e.g., NumPy's .npy/.npz) for future data.

For the current use case with trusted internal data files, this is acceptable but worth documenting.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@python/experimental/eminem_importer.py` around lines 105 - 114, Add a clear
security note to the module docstring in python/experimental/eminem_importer.py
stating that pickle deserialization (used in the block handling gzip.BadGzipFile
and calling pickle.loads/pickle.load on input_file) can execute arbitrary code
and should only be used with trusted internal files; also mention recommended
safer alternatives (e.g., NumPy .npy/.npz or JSON) for future use so maintainers
know to avoid silent risks when changing this loader.

137-139: Specify UTF-8 encoding for JSON file reading.

For portability across different systems, explicitly specify the encoding when reading JSON files.

🔧 Suggested fix
-        with open(config_path) as f:
+        with open(config_path, encoding="utf-8") as f:
             custom_config = json.load(f)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@python/experimental/eminem_importer.py` around lines 137 - 139, The JSON file
is opened without an explicit encoding in the try block around the
open(config_path) call inside python/experimental/eminem_importer.py; update the
open call used to read the custom_config (the with open(config_path) context
where json.load(f) is called) to explicitly specify encoding='utf-8' (and
optionally mode='r') so the JSON is read portably across platforms.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@macro/run_simScript.py`:
- Around line 595-603: The TTree handling assumes inputFile is a string but it
can be a list; update the TTree branch that uses options.ttree to accept
multiple input files by checking if inputFile is a list and, if so, iterating
over each file: for each file create a new ROOT.SHiP.TTreeGenerator(), call
generator.Init(file), primGen.AddGenerator(generator), accumulate total events
using generator.GetNEvents(); after adding all files set options.nEvents =
min(options.nEvents, total_events). If inputFile is a single string keep the
existing single-generator flow; reference symbols: options.ttree, inputFile,
ROOT.SHiP.TTreeGenerator, generator.Init, primGen.AddGenerator,
generator.GetNEvents(), options.nEvents.

In `@shipgen/TTreeGenerator.cxx`:
- Around line 36-58: When TTreeGenerator::Init opens fInputFile but fails to
find the tree (the !fInputTree branch), close and free the opened file before
returning to avoid leaking the TFile; e.g. call fInputFile->Close(), delete
fInputFile, and set fInputFile = nullptr in the error path where you log "Cannot
find tree ..." so subsequent Init calls won't leak the previous file pointer.

---

Nitpick comments:
In `@python/experimental/eminem_importer.py`:
- Around line 105-114: Add a clear security note to the module docstring in
python/experimental/eminem_importer.py stating that pickle deserialization (used
in the block handling gzip.BadGzipFile and calling pickle.loads/pickle.load on
input_file) can execute arbitrary code and should only be used with trusted
internal files; also mention recommended safer alternatives (e.g., NumPy
.npy/.npz or JSON) for future use so maintainers know to avoid silent risks when
changing this loader.
- Around line 137-139: The JSON file is opened without an explicit encoding in
the try block around the open(config_path) call inside
python/experimental/eminem_importer.py; update the open call used to read the
custom_config (the with open(config_path) context where json.load(f) is called)
to explicitly specify encoding='utf-8' (and optionally mode='r') so the JSON is
read portably across platforms.

In `@shipgen/TTreeGenerator.cxx`:
- Around line 76-87: InitBranches currently calls SetBranchAddress for several
branches (px, py, pz, x, y, z, id, w) without verifying they exist, which can
leave members like fPx/fPdgId uninitialized; update TTreeGenerator::InitBranches
to call fInputTree->GetBranch("px") etc. (or a helper that iterates the expected
names) before SetBranchAddress, and if any branch is missing or incompatible log
an error via your logger (or return/throw) so initialization fails fast instead
of silently continuing with default values. Ensure you reference the branch
names and member pointers (fPx, fPy, fPz, fX, fY, fZ, fPdgId, fWeight) when
validating to make the check robust.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 17837bab-00cf-4141-a322-b5ae142deb85

📥 Commits

Reviewing files that changed from the base of the PR and between 50cf541 and 9ecb269.

📒 Files selected for processing (7)
  • CHANGELOG.md
  • macro/run_simScript.py
  • python/experimental/eminem_importer.py
  • shipgen/CMakeLists.txt
  • shipgen/LinkDef.h
  • shipgen/TTreeGenerator.cxx
  • shipgen/TTreeGenerator.h

Comment on lines +595 to +603
if options.ttree:
ut.checkFileExists(inputFile)
primGen.SetTarget(0.0, 0.0)
generator = ROOT.SHiP.TTreeGenerator()
generator.SetTreeName("converted_ntuple")
generator.Init(inputFile)
primGen.AddGenerator(generator)
options.nEvents = min(options.nEvents, generator.GetNEvents())
print("Process ", options.nEvents, " from input file")
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Potential issue: inputFile may be a list.

When multiple input files are specified with -f, inputFile becomes a list (see lines 294-297). The Init(inputFile) call on line 600 will fail if inputFile is a list since TTreeGenerator::Init expects const char*.

Consider handling this similarly to other generators or documenting that --ttree only supports single input files.

🔧 Suggested fix to handle list input
 if options.ttree:
     ut.checkFileExists(inputFile)
     primGen.SetTarget(0.0, 0.0)
     generator = ROOT.SHiP.TTreeGenerator()
     generator.SetTreeName("converted_ntuple")
-    generator.Init(inputFile)
+    # TTreeGenerator only supports single file input
+    input_path = inputFile[0] if isinstance(inputFile, list) else inputFile
+    generator.Init(input_path)
     primGen.AddGenerator(generator)
     options.nEvents = min(options.nEvents, generator.GetNEvents())
     print("Process ", options.nEvents, " from input file")
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if options.ttree:
ut.checkFileExists(inputFile)
primGen.SetTarget(0.0, 0.0)
generator = ROOT.SHiP.TTreeGenerator()
generator.SetTreeName("converted_ntuple")
generator.Init(inputFile)
primGen.AddGenerator(generator)
options.nEvents = min(options.nEvents, generator.GetNEvents())
print("Process ", options.nEvents, " from input file")
if options.ttree:
ut.checkFileExists(inputFile)
primGen.SetTarget(0.0, 0.0)
generator = ROOT.SHiP.TTreeGenerator()
generator.SetTreeName("converted_ntuple")
# TTreeGenerator only supports single file input
input_path = inputFile[0] if isinstance(inputFile, list) else inputFile
generator.Init(input_path)
primGen.AddGenerator(generator)
options.nEvents = min(options.nEvents, generator.GetNEvents())
print("Process ", options.nEvents, " from input file")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@macro/run_simScript.py` around lines 595 - 603, The TTree handling assumes
inputFile is a string but it can be a list; update the TTree branch that uses
options.ttree to accept multiple input files by checking if inputFile is a list
and, if so, iterating over each file: for each file create a new
ROOT.SHiP.TTreeGenerator(), call generator.Init(file),
primGen.AddGenerator(generator), accumulate total events using
generator.GetNEvents(); after adding all files set options.nEvents =
min(options.nEvents, total_events). If inputFile is a single string keep the
existing single-generator flow; reference symbols: options.ttree, inputFile,
ROOT.SHiP.TTreeGenerator, generator.Init, primGen.AddGenerator,
generator.GetNEvents(), options.nEvents.

Comment on lines +36 to +58
Bool_t TTreeGenerator::Init(const char* fileName, int startEvent) {
fInputFile = TFile::Open(fileName, "READ");
if (!fInputFile || fInputFile->IsZombie()) {
LOG(ERROR) << "TTreeGenerator: Cannot open file " << fileName;
return kFALSE;
}

fInputTree = dynamic_cast<TTree*>(fInputFile->Get(fTreeName.Data()));
if (!fInputTree) {
LOG(ERROR) << "TTreeGenerator: Cannot find tree " << fTreeName.Data()
<< " in file " << fileName;
return kFALSE;
}

fNEvents = fInputTree->GetEntries();
fCurrentEvent = startEvent;
LOG(INFO) << "TTreeGenerator: Found " << fNEvents
<< " events in tree, starting at " << startEvent;

InitBranches();

return kTRUE;
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Resource leak when tree is not found.

If the file opens successfully but the tree is not found (lines 44-48), the function returns kFALSE without closing fInputFile. The destructor will eventually clean up, but if Init is called again, the old file pointer will be leaked.

🔧 Suggested fix
   fInputTree = dynamic_cast<TTree*>(fInputFile->Get(fTreeName.Data()));
   if (!fInputTree) {
     LOG(ERROR) << "TTreeGenerator: Cannot find tree " << fTreeName.Data()
                << " in file " << fileName;
+    fInputFile->Close();
+    delete fInputFile;
+    fInputFile = nullptr;
     return kFALSE;
   }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
Bool_t TTreeGenerator::Init(const char* fileName, int startEvent) {
fInputFile = TFile::Open(fileName, "READ");
if (!fInputFile || fInputFile->IsZombie()) {
LOG(ERROR) << "TTreeGenerator: Cannot open file " << fileName;
return kFALSE;
}
fInputTree = dynamic_cast<TTree*>(fInputFile->Get(fTreeName.Data()));
if (!fInputTree) {
LOG(ERROR) << "TTreeGenerator: Cannot find tree " << fTreeName.Data()
<< " in file " << fileName;
return kFALSE;
}
fNEvents = fInputTree->GetEntries();
fCurrentEvent = startEvent;
LOG(INFO) << "TTreeGenerator: Found " << fNEvents
<< " events in tree, starting at " << startEvent;
InitBranches();
return kTRUE;
}
Bool_t TTreeGenerator::Init(const char* fileName, int startEvent) {
fInputFile = TFile::Open(fileName, "READ");
if (!fInputFile || fInputFile->IsZombie()) {
LOG(ERROR) << "TTreeGenerator: Cannot open file " << fileName;
return kFALSE;
}
fInputTree = dynamic_cast<TTree*>(fInputFile->Get(fTreeName.Data()));
if (!fInputTree) {
LOG(ERROR) << "TTreeGenerator: Cannot find tree " << fTreeName.Data()
<< " in file " << fileName;
fInputFile->Close();
delete fInputFile;
fInputFile = nullptr;
return kFALSE;
}
fNEvents = fInputTree->GetEntries();
fCurrentEvent = startEvent;
LOG(INFO) << "TTreeGenerator: Found " << fNEvents
<< " events in tree, starting at " << startEvent;
InitBranches();
return kTRUE;
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@shipgen/TTreeGenerator.cxx` around lines 36 - 58, When TTreeGenerator::Init
opens fInputFile but fails to find the tree (the !fInputTree branch), close and
free the opened file before returning to avoid leaking the TFile; e.g. call
fInputFile->Close(), delete fInputFile, and set fInputFile = nullptr in the
error path where you log "Cannot find tree ..." so subsequent Init calls won't
leak the previous file pointer.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants