RK4 tracking with analytical magnetic fields #231

Open
ndwang wants to merge 88 commits into bmad-sim:main from ndwang:FieldTrack

Conversation

@ndwang (Contributor) commented Jan 23, 2026

  • Adds a new RungeKutta tracking method for fixed-step, s-based 4th-order Runge-Kutta integration (a minimal sketch of one RK4 step is shown after this list).
  • Currently only supports analytical magnetic fields using BMultipoleParams.
  • Test cases for rk4_kernel!
  • Benchmarking script for rk4_kernel!
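
For readers unfamiliar with the scheme, here is a minimal sketch of one fixed-step RK4 step with s as the independent variable, where `derivs(u, s)` returns du/ds. All names here are hypothetical; this is not the PR's actual kernel API.

```julia
# One classical 4th-order Runge-Kutta step of size h in s.
# `derivs(u, s)` must return du/ds for the state vector u.
function rk4_step(derivs, u, s, h)
    k1 = derivs(u, s)
    k2 = derivs(u .+ (h / 2) .* k1, s + h / 2)
    k3 = derivs(u .+ (h / 2) .* k2, s + h / 2)
    k4 = derivs(u .+ h .* k3, s + h)
    return u .+ (h / 6) .* (k1 .+ 2 .* k2 .+ 2 .* k3 .+ k4)
end
```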

@codecov bot commented Jan 23, 2026

Codecov Report

❌ Patch coverage is 82.17822% with 36 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---:|---:|
| ext/BeamTrackingBeamlinesExt/rungekutta.jl | 55.69% | 35 Missing ⚠️ |
| src/modules/RungeKuttaTracking.jl | 99.09% | 1 Missing ⚠️ |


```julia
is_solenoid = (mm[1] == 0)
bz = vifelse(is_solenoid, kn[1], zero(x))

# Convert from normalized (field/Bρ) to physical units (Tesla)
```
Collaborator

Is this the best way to do it? Aren't the normalized fields what appear in the equations of motion?

Contributor (author)

The equation of motion uses physical units to calculate the Lorentz force.

Collaborator

In accelerator coordinates, you only need the normalized fields. On lines 95-97, you multiply by the charge and on lines 122-123, you divide by p0.
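
For concreteness, a sketch of the standard s-based equations in straight coordinates (assuming the usual normalizations $p_{x,y} = P_{x,y}/P_0$, $x' = dx/ds$, and a purely magnetic field), where only the normalized field $q\mathbf{B}/P_0 = \mathbf{B}/(B\rho)$ ever appears:

$$
\frac{dp_x}{ds} = \frac{q}{P_0}\left(B_z\,y' - B_y\right) = \frac{B_z\,y' - B_y}{B\rho},
\qquad
\frac{dp_y}{ds} = \frac{q}{P_0}\left(B_x - B_z\,x'\right) = \frac{B_x - B_z\,x'}{B\rho}.
$$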

Contributor (author)

E needs to be in physical units. If the electric fields are non-zero, the return value will be in mixed units.

Collaborator

Why? E is also multiplied by the charge and divided by p0

Contributor (author)

@JosephPeterDevlin @mattsignorelli
Normalized field strength is for magnetic fields. It's not a thing for electric fields. E/Brho is not a physical quantity and will only confuse people in the future. If we decide to use E/Brho for this kernel, it means every other field solver (e.g. space charge, wakefield, CSR) needs to return fields in V/(T*m^2) or 1/s.
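
For reference, the unit bookkeeping behind the "V/(T*m^2) or 1/s" remark, using $\mathrm{T} = \mathrm{V\,s/m^2}$:

$$\left[\frac{E}{B\rho}\right] = \frac{\mathrm{V/m}}{\mathrm{T}\cdot\mathrm{m}} = \frac{\mathrm{V}}{\mathrm{T}\cdot\mathrm{m}^2} = \frac{\mathrm{V}\,\mathrm{m}^2}{\mathrm{V}\,\mathrm{s}\,\mathrm{m}^2} = \mathrm{s}^{-1}.$$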

The performance improvement is an illusion. In most cases, it's just hiding the multiplication/division somewhere else. The only case where it's better is analytical magnets and it only saves 6 multiplications.

To put this into numbers, I benchmarked normalized units and physical units on CPU and GPU for three scenarios (the general shape of the harness is sketched after the list):

  • Analytical magnetic field
  • Field map
  • Superposition of both (representing an analytical magnet plus any other field source that returns physical units, such as space charge or wakefields)
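
For context, this is roughly the shape of such a comparison with BenchmarkTools. The stand-in kernels below are hypothetical and only illustrate the harness; the real script is linked further down.

```julia
using BenchmarkTools

# Trivial stand-ins just to show the harness shape; the real kernels
# differ only in where p_over_q_ref is applied.
rk4_physical!(c)   = (c ./= 10.0; c)  # scaling done inside the kernel
rk4_normalized!(c) = (c .*= 1.0; c)   # fields pre-scaled outside

for N in (1_000, 10_000, 100_000)
    coords = rand(6, N)  # stand-in for the particle phase-space arrays
    t_phys = @belapsed rk4_physical!($coords)
    t_norm = @belapsed rk4_normalized!($coords)
    println("N = $N: physical = $t_phys s, normalized = $t_norm s")
end
```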

tl;dr

Normalized units are marginally better for analytical magnets, but drastically worse on CPU when you have anything that lives in physical units land.

Here are the results:

On my PC (CPU i7 13700K, GPU 4070)

CPU — Analytical Multipole

| N | Physical min (μs) | Normalized min (μs) | Physical med ± std (μs) | Normalized med ± std (μs) | Δ med |
|---:|---:|---:|---:|---:|---:|
| 1,000 | 3,012 | 3,032 | 3,039 ± 369 | 3,055 ± 98 | -0.5% |
| 10,000 | 30,291 | 30,509 | 30,853 ± 4,412 | 30,953 ± 3,050 | -0.3% |
| 100,000 | 307,193 | 308,359 | 309,472 ± 8,690 | 310,878 ± 11,590 | -0.5% |

CPU — Field Map (3D Rectangular Grid)

| N | Physical min (μs) | Normalized min (μs) | Physical med ± std (μs) | Normalized med ± std (μs) | Δ med |
|---:|---:|---:|---:|---:|---:|
| 1,000 | 15,317 | 15,908 | 15,630 ± 563 | 16,220 ± 666 | -3.8% |
| 10,000 | 155,262 | 161,201 | 156,225 ± 4,181 | 162,160 ± 1,176 | -3.8% |
| 100,000 | 1,570,321 | 1,621,898 | 1,576,434 ± 17,162 | 1,634,477 ± 17,525 | -3.7% |

CPU — Superposition (Multipole + Field Map)

| N | Physical min (μs) | Normalized min (μs) | Physical med ± std (μs) | Normalized med ± std (μs) | Δ med |
|---:|---:|---:|---:|---:|---:|
| 1,000 | 14,880 | 16,495 | 15,200 ± 676 | 16,851 ± 697 | -10.9% |
| 10,000 | 150,374 | 166,601 | 151,653 ± 3,100 | 168,686 ± 3,492 | -11.2% |
| 100,000 | 1,521,921 | 1,689,968 | 1,530,000 ± 6,404 | 1,696,674 ± 6,507 | -10.9% |

GPU (CUDA) — Analytical Multipole

| N | Physical min (μs) | Normalized min (μs) | Physical med ± std (μs) | Normalized med ± std (μs) | Δ med |
|---:|---:|---:|---:|---:|---:|
| 1,000 | 4,135 | 4,026 | 4,207 ± 178 | 4,090 ± 179 | +2.8% |
| 10,000 | 4,147 | 4,038 | 4,213 ± 190 | 4,094 ± 154 | +2.8% |
| 100,000 | 37,009 | 36,043 | 37,785 ± 466 | 36,778 ± 434 | +2.7% |

GPU (CUDA) — Field Map (3D Rectangular Grid)

| N | Physical min (μs) | Normalized min (μs) | Physical med ± std (μs) | Normalized med ± std (μs) | Δ med |
|---:|---:|---:|---:|---:|---:|
| 1,000 | 6,243 | 6,259 | 6,356 ± 217 | 6,357 ± 223 | -0.0% |
| 10,000 | 6,258 | 6,269 | 6,341 ± 193 | 6,359 ± 188 | -0.3% |
| 100,000 | 56,123 | 56,303 | 57,429 ± 543 | 57,621 ± 629 | -0.3% |

GPU (CUDA) — Superposition (Multipole + Field Map)

| N | Physical min (μs) | Normalized min (μs) | Physical med ± std (μs) | Normalized med ± std (μs) | Δ med |
|---:|---:|---:|---:|---:|---:|
| 1,000 | 6,761 | 6,709 | 6,854 ± 176 | 6,804 ± 197 | +0.7% |
| 10,000 | 6,781 | 6,750 | 6,959 ± 284 | 6,972 ± 243 | -0.2% |
| 100,000 | 61,253 | 60,773 | 62,293 ± 544 | 61,829 ± 633 | +0.7% |

On NERSC

CPU — Analytical Multipole

| N | Physical min (μs) | Normalized min (μs) | Physical med ± std (μs) | Normalized med ± std (μs) | Δ med |
|---:|---:|---:|---:|---:|---:|
| 1,000 | 4,592 | 4,412 | 4,688 ± 147 | 4,469 ± 113 | +4.7% |
| 10,000 | 46,488 | 44,556 | 46,895 ± 233 | 45,868 ± 8,410 | +2.2% |
| 100,000 | 468,548 | 447,922 | 471,795 ± 1,951 | 458,366 ± 325,462 | +2.9% |

CPU — Field Map (3D Rectangular Grid)

| N | Physical min (μs) | Normalized min (μs) | Physical med ± std (μs) | Normalized med ± std (μs) | Δ med |
|---:|---:|---:|---:|---:|---:|
| 1,000 | 25,752 | 28,822 | 26,100 ± 220 | 29,211 ± 256 | -11.9% |
| 10,000 | 258,509 | 289,861 | 259,329 ± 1,016 | 291,629 ± 1,288 | -12.5% |
| 100,000 | 2,599,207 | 2,909,775 | 2,603,184 ± 5,625 | 2,912,697 ± 4,132 | -11.9% |

CPU — Superposition (Multipole + Field Map)

| N | Physical min (μs) | Normalized min (μs) | Physical med ± std (μs) | Normalized med ± std (μs) | Δ med |
|---:|---:|---:|---:|---:|---:|
| 1,000 | 28,777 | 31,652 | 29,264 ± 351 | 32,007 ± 202 | -9.4% |
| 10,000 | 288,959 | 320,165 | 291,804 ± 2,853 | 321,247 ± 836 | -10.1% |
| 100,000 | 2,914,413 | 3,200,890 | 2,918,006 ± 5,082 | 3,216,311 ± 21,808 | -10.2% |

GPU (CUDA) — Analytical Multipole

| N | Physical min (μs) | Normalized min (μs) | Physical med ± std (μs) | Normalized med ± std (μs) | Δ med |
|---:|---:|---:|---:|---:|---:|
| 1,000 | 964 | 949 | 977 ± 13 | 957 ± 11 | +2.1% |
| 10,000 | 972 | 952 | 980 ± 18 | 959 ± 13 | +2.1% |
| 100,000 | 3,760 | 3,697 | 3,788 ± 17 | 3,709 ± 15 | +2.1% |

GPU (CUDA) — Field Map (3D Rectangular Grid)

| N | Physical min (μs) | Normalized min (μs) | Physical med ± std (μs) | Normalized med ± std (μs) | Δ med |
|---:|---:|---:|---:|---:|---:|
| 1,000 | 1,278 | 1,275 | 1,305 ± 4,477 | 1,297 ± 14 | +0.6% |
| 10,000 | 1,295 | 1,295 | 1,313 ± 14 | 1,309 ± 15 | +0.3% |
| 100,000 | 5,077 | 5,094 | 5,117 ± 4,403 | 5,113 ± 20 | +0.1% |

GPU (CUDA) — Superposition (Multipole + Field Map)

| N | Physical min (μs) | Normalized min (μs) | Physical med ± std (μs) | Normalized med ± std (μs) | Δ med |
|---:|---:|---:|---:|---:|---:|
| 1,000 | 1,565 | 1,586 | 1,599 ± 42 | 1,606 ± 16 | -0.4% |
| 10,000 | 1,614 | 1,629 | 1,644 ± 18 | 1,643 ± 11 | +0.0% |
| 100,000 | 6,258 | 6,241 | 6,314 ± 31 | 6,282 ± 26 | +0.5% |

Summary

My PC

| Scenario | CPU (avg) | GPU (avg) |
|---|---:|---:|
| Analytical multipole | -0.4% | +2.8% |
| Field map | -3.8% | -0.2% |
| Superposition | -11.0% | +0.4% |

NERSC

| Scenario | CPU | GPU |
|---|---:|---:|
| Analytical multipole | +3.2% | +2.1% |
| Field map | -12.1% | +0.3% |
| Superposition | -9.9% | +0.0% |

@JosephPeterDevlin (Collaborator) commented Mar 30, 2026

It's not a matter of physicality. The Hamiltonian only contains the electric field (actually the scalar potential) divided by p_ref/q. No matter what, you have to divide by p_ref/q. I am therefore very suspicious of this benchmark.

@mattsignorelli (Member) commented Mar 30, 2026

> If we decide to use E/Brho for this kernel, it means every other field solver (e.g. space charge, wakefield, CSR) needs to return fields in V/(T*m^2) or 1/s.

Sounds fine to me... the phase space coordinates are normalized and not kinetic, so that actually makes sense

> The performance improvement is an illusion. In most cases, it's just hiding the multiplication/division somewhere else. The only case where it's better is analytical magnets and it only saves 6 multiplications.

This doesn't make sense to me: in one case the multiplication/division is done outside the kernel, so it happens only once rather than once per particle, while in the other case every particle has to do it.

I also am extremely suspicious of the benchmark, because logically with normalized fields, the multiplication/division is done outside of the kernel, not in. So any performance "losses" would have to be explained thoroughly.
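
To make the hoisting argument concrete, here is a minimal sketch (hypothetical functions, not the actual kernels): with normalized fields, the division by p_over_q_ref is paid once per element; with physical fields, it sits inside the per-particle loop.

```julia
# Physical-units variant: the divide runs once per particle.
function kick_physical!(px, x, kn, p_over_q_ref)
    @inbounds for i in eachindex(px)
        B = kn[1] + kn[2] * x[i]    # physical field (Tesla)
        px[i] -= B / p_over_q_ref   # per-particle divide
    end
end

# Normalized variant: the divide is hoisted out of the loop.
function kick_normalized!(px, x, kn, p_over_q_ref)
    bn = kn ./ p_over_q_ref         # once per element
    @inbounds for i in eachindex(px)
        b = bn[1] + bn[2] * x[i]    # already normalized
        px[i] -= b
    end
end
```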

@ndwang (Contributor, author) commented Mar 30, 2026

Field maps come from simulations (CST) or measurements. They are properties of the hardware, not the beam. The RF engineer will give you a file in V/m and have no idea what p_over_q_ref is.

OK, what if we preprocess the field map? Well, you can't store them normalized. Every time p_over_q_ref changes, you invalidate the field map. A cavity field map used for 12 cavities in a linac would need 12 separate copies because they have different p_over_q_ref values after acceleration. You want to optimize a cavity phase upstream? You have to re-normalize all field maps downstream. How about ramping? Magnets ramp B with p, so the normalized field strength stays constant. But you don't ramp RF cavity voltages with p. E/Brho is not a constant, and you can't make a new field map for every ramping step. If EIC wants to reuse a magnet from RHIC, you cannot use a normalized field map unless you also have a lattice model and calculate its p_over_q_ref.

The benchmark script is https://github.com/ndwang/BeamTracking.jl/blob/fieldmap/benchmark/rk4_units_benchmark.jl
Compare it with https://github.com/ndwang/BeamTracking.jl/blob/fieldmap/src/modules/RungeKuttaTracking.jl
The only difference is where p_over_q_ref is applied.
You should point to specific issues or benchmark it yourselves instead of handwaving about performance. "I'm suspicious" is not a counterargument.

Member

You gave us a (probably AI generated) benchmark without the code, and made a claim that doing more FLOPS inside a kernel/per-particle loop is somehow faster than doing it outside of a kernel. This, which is a specific issue that we are pointing to, doesn't make sense to either me or @JosephPeterDevlin, hence our "I'm suspicious" response to the benchmark. We kindly requested you to clarify exactly what is causing the slowdown, and that question, at least to me, remains unanswered.

All I see are things like this:
https://github.com/ndwang/BeamTracking.jl/blob/cb875ef6018602f8478cf898a3d6932ca5273f47/src/modules/RungeKuttaTracking.jl#L92

which are extra operations inside a kernel. Until you point out exactly the cause of the slowdown, then a bunch of numbers on a table from a benchmark that neither of us understand don't mean anything to me or @JosephPeterDevlin .

This PR is for analytical magnetic fields. The entire codebase uses normalized coordinates because the canonical coordinates are normalized. In my view, this is such a minor request from @JosephPeterDevlin that I don't understand the strong pushback.

Furthermore, ramping is actually exactly the case where normalized magnetic fields make sense, because then I don't have to update every single B field at every time. When you ramp a ring, the normalized magnetic field strengths remain constant, even though the B fields change.
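
Spelled out, with the usual rigidity relation $B\rho = p/q$: if the magnet fields ramp proportionally to the momentum, the normalized strengths never change,

$$b \equiv \frac{B}{B\rho} = \frac{qB}{p}, \qquad B \propto p \;\Longrightarrow\; b = \text{const}.$$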

@JosephPeterDevlin (Collaborator)

@mattsignorelli Can you do a quick review?

@ndwang (Contributor, author) commented Feb 11, 2026

@JosephPeterDevlin Fixed the 1s

```julia
E_force_z = charge * Ez
B_force_x = charge * (vy*Bz - vz*By)
B_force_y = charge * (vz*Bx - vx*Bz)
B_force_z = charge * (vx*By - vy*Bx)
```
@JosephPeterDevlin (Collaborator) commented Feb 11, 2026

Is B_force_z used?
