RK4 tracking with analytical magnetic fields #231
ndwang wants to merge 88 commits into bmad-sim:main
Conversation
…eld calculations.
…d add alive state check; changed while loop to for loop in rk4_kernel
is_solenoid = (mm[1] == 0)
bz = vifelse(is_solenoid, kn[1], zero(x))

# Convert from normalized (field/Bρ) to physical units (Tesla)
Is this the best way to do it? Aren't the normalized fields what appear in the equations of motion?
The equation of motion uses physical units to calculate the Lorentz force.
In accelerator coordinates, you only need the normalized fields. On lines 95-97, you multiply by the charge and on lines 122-123, you divide by p0.
E needs to be in physical units. If the electric fields are non-zero, the return value will be in mixed units.
Why? E is also multiplied by the charge and divided by p0
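For reference, dividing the Lorentz force equation by the reference momentum $p_0$ gives (a sketch in standard notation, with $B\rho = p_0/q$):

$$
\frac{d}{dt}\!\left(\frac{\mathbf{p}}{p_0}\right) = \frac{q}{p_0}\left(\mathbf{E} + \mathbf{v}\times\mathbf{B}\right) = \frac{\mathbf{E}}{B\rho} + \mathbf{v}\times\frac{\mathbf{B}}{B\rho}
$$

so both fields end up divided by $B\rho$ in the normalized equation of motion; the disagreement below is about where that division should live and what units the field routines should return.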
@JosephPeterDevlin @mattsignorelli
Normalized field strength is for magnetic fields. It's not a thing for electric fields. E/Brho is not a physical quantity and will only confuse people in the future. If we decide to use E/Brho for this kernel, it means every other field solver (e.g. space charge, wakefield, CSR) needs to return fields in V/(T*m^2) or 1/s.
The performance improvement is an illusion. In most cases, it's just hiding the multiplication/division somewhere else. The only case where it's better is analytical magnets and it only saves 6 multiplications.
To put this into numbers, I benchmarked normalized units against physical units on CPU and GPU for three scenarios (a minimal sketch of the harness follows this list; the actual script is linked further down):
- Analytical magnetic field
- Field map
- Superposition of both (an analytical magnet plus any other field source that returns physical units, such as space charge or a wakefield)
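A minimal sketch of how such a comparison can be set up (illustrative only; `track_physical!`, `track_normalized!`, `make_bunch`, and `make_beamline` are placeholder names, and the actual script is linked further down):

```julia
using Statistics

# Time one tracking variant `samples` times and report min / median / std in μs.
function bench(track!, bunch, bl; samples = 20)
    ts = map(1:samples) do _
        b = deepcopy(bunch)               # fresh bunch each run
        1e6 * @elapsed(track!(b, bl))     # μs
    end
    (min = minimum(ts), med = median(ts), sd = std(ts))
end

for N in (1_000, 10_000, 100_000)
    bunch = make_bunch(N)        # placeholder: build an N-particle bunch
    bl    = make_beamline()      # placeholder: multipole, field map, or both
    phys  = bench(track_physical!,   bunch, bl)
    norm  = bench(track_normalized!, bunch, bl)
    @show N phys norm
end
```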
tl;dr
Normalized units are marginally better for analytical magnets, but drastically worse on CPU when you have anything that lives in physical units land.
Here are the results:
On my PC (CPU i7 13700K, GPU 4070)
CPU — Analytical Multipole
| N | Physical min (μs) | Normalized min (μs) | Physical med ± std (μs) | Normalized med ± std (μs) | Δ med |
|---|---|---|---|---|---|
| 1,000 | 3,012 | 3,032 | 3,039 ± 369 | 3,055 ± 98 | -0.5% |
| 10,000 | 30,291 | 30,509 | 30,853 ± 4,412 | 30,953 ± 3,050 | -0.3% |
| 100,000 | 307,193 | 308,359 | 309,472 ± 8,690 | 310,878 ± 11,590 | -0.5% |
CPU — Field Map (3D Rectangular Grid)
| N | Physical min (μs) | Normalized min (μs) | Physical med ± std (μs) | Normalized med ± std (μs) | Δ med |
|---|---|---|---|---|---|
| 1,000 | 15,317 | 15,908 | 15,630 ± 563 | 16,220 ± 666 | -3.8% |
| 10,000 | 155,262 | 161,201 | 156,225 ± 4,181 | 162,160 ± 1,176 | -3.8% |
| 100,000 | 1,570,321 | 1,621,898 | 1,576,434 ± 17,162 | 1,634,477 ± 17,525 | -3.7% |
CPU — Superposition (Multipole + Field Map)
| N | Physical min (μs) | Normalized min (μs) | Physical med ± std (μs) | Normalized med ± std (μs) | Δ med |
|---|---|---|---|---|---|
| 1,000 | 14,880 | 16,495 | 15,200 ± 676 | 16,851 ± 697 | -10.9% |
| 10,000 | 150,374 | 166,601 | 151,653 ± 3,100 | 168,686 ± 3,492 | -11.2% |
| 100,000 | 1,521,921 | 1,689,968 | 1,530,000 ± 6,404 | 1,696,674 ± 6,507 | -10.9% |
GPU (CUDA) — Analytical Multipole
| N | Physical min (μs) | Normalized min (μs) | Physical med ± std (μs) | Normalized med ± std (μs) | Δ med |
|---|---|---|---|---|---|
| 1,000 | 4,135 | 4,026 | 4,207 ± 178 | 4,090 ± 179 | +2.8% |
| 10,000 | 4,147 | 4,038 | 4,213 ± 190 | 4,094 ± 154 | +2.8% |
| 100,000 | 37,009 | 36,043 | 37,785 ± 466 | 36,778 ± 434 | +2.7% |
GPU (CUDA) — Field Map (3D Rectangular Grid)
| N | Physical min (μs) | Normalized min (μs) | Physical med ± std (μs) | Normalized med ± std (μs) | Δ med |
|---|---|---|---|---|---|
| 1,000 | 6,243 | 6,259 | 6,356 ± 217 | 6,357 ± 223 | -0.0% |
| 10,000 | 6,258 | 6,269 | 6,341 ± 193 | 6,359 ± 188 | -0.3% |
| 100,000 | 56,123 | 56,303 | 57,429 ± 543 | 57,621 ± 629 | -0.3% |
GPU (CUDA) — Superposition (Multipole + Field Map)
| N | Physical min (μs) | Normalized min (μs) | Physical med ± std (μs) | Normalized med ± std (μs) | Δ med |
|---|---|---|---|---|---|
| 1,000 | 6,761 | 6,709 | 6,854 ± 176 | 6,804 ± 197 | +0.7% |
| 10,000 | 6,781 | 6,750 | 6,959 ± 284 | 6,972 ± 243 | -0.2% |
| 100,000 | 61,253 | 60,773 | 62,293 ± 544 | 61,829 ± 633 | +0.7% |
On NERSC
CPU — Analytical Multipole
| N | Physical min (μs) | Normalized min (μs) | Physical med ± std (μs) | Normalized med ± std (μs) | Δ med |
|---|---|---|---|---|---|
| 1,000 | 4,592 | 4,412 | 4,688 ± 147 | 4,469 ± 113 | +4.7% |
| 10,000 | 46,488 | 44,556 | 46,895 ± 233 | 45,868 ± 8,410 | +2.2% |
| 100,000 | 468,548 | 447,922 | 471,795 ± 1,951 | 458,366 ± 325,462 | +2.9% |
CPU — Field Map (3D Rectangular Grid)
| N | Physical min (μs) | Normalized min (μs) | Physical med ± std (μs) | Normalized med ± std (μs) | Δ med |
|---|---|---|---|---|---|
| 1,000 | 25,752 | 28,822 | 26,100 ± 220 | 29,211 ± 256 | -11.9% |
| 10,000 | 258,509 | 289,861 | 259,329 ± 1,016 | 291,629 ± 1,288 | -12.5% |
| 100,000 | 2,599,207 | 2,909,775 | 2,603,184 ± 5,625 | 2,912,697 ± 4,132 | -11.9% |
CPU — Superposition (Multipole + Field Map)
| N | Physical min (μs) | Normalized min (μs) | Physical med ± std (μs) | Normalized med ± std (μs) | Δ med |
|---|---|---|---|---|---|
| 1,000 | 28,777 | 31,652 | 29,264 ± 351 | 32,007 ± 202 | -9.4% |
| 10,000 | 288,959 | 320,165 | 291,804 ± 2,853 | 321,247 ± 836 | -10.1% |
| 100,000 | 2,914,413 | 3,200,890 | 2,918,006 ± 5,082 | 3,216,311 ± 21,808 | -10.2% |
GPU (CUDA) — Analytical Multipole
| N | Physical min (μs) | Normalized min (μs) | Physical med ± std (μs) | Normalized med ± std (μs) | Δ med |
|---|---|---|---|---|---|
| 1,000 | 964 | 949 | 977 ± 13 | 957 ± 11 | +2.1% |
| 10,000 | 972 | 952 | 980 ± 18 | 959 ± 13 | +2.1% |
| 100,000 | 3,760 | 3,697 | 3,788 ± 17 | 3,709 ± 15 | +2.1% |
GPU (CUDA) — Field Map (3D Rectangular Grid)
| N | Physical min (μs) | Normalized min (μs) | Physical med ± std (μs) | Normalized med ± std (μs) | Δ med |
|---|---|---|---|---|---|
| 1,000 | 1,278 | 1,275 | 1,305 ± 4,477 | 1,297 ± 14 | +0.6% |
| 10,000 | 1,295 | 1,295 | 1,313 ± 14 | 1,309 ± 15 | +0.3% |
| 100,000 | 5,077 | 5,094 | 5,117 ± 4,403 | 5,113 ± 20 | +0.1% |
GPU (CUDA) — Superposition (Multipole + Field Map)
| N | Physical min (μs) | Normalized min (μs) | Physical med ± std (μs) | Normalized med ± std (μs) | Δ med |
|---|---|---|---|---|---|
| 1,000 | 1,565 | 1,586 | 1,599 ± 42 | 1,606 ± 16 | -0.4% |
| 10,000 | 1,614 | 1,629 | 1,644 ± 18 | 1,643 ± 11 | +0.0% |
| 100,000 | 6,258 | 6,241 | 6,314 ± 31 | 6,282 ± 26 | +0.5% |
Summary
My PC
| Scenario | CPU (avg) | GPU (avg) |
|---|---|---|
| Analytical multipole | -0.4% | +2.8% |
| Field map | -3.8% | -0.2% |
| Superposition | -11.0% | +0.4% |
NERSC
| Scenario | CPU | GPU |
|---|---|---|
| Analytical multipole | +3.2% | +2.1% |
| Field map | -12.1% | +0.3% |
| Superposition | -9.9% | +0.0% |
It's not a matter of physicality. The Hamiltonian only contains the electric field (actually the scalar potential) divided by p_ref/q. No matter what, you have to divide by p_ref/q. I therefore am very suspicious of this benchmark.
> If we decide to use E/Brho for this kernel, it means every other field solver (e.g. space charge, wakefield, CSR) needs to return fields in V/(T*m^2) or 1/s.
Sounds fine to me... the phase space coordinates are normalized and not kinetic, so that actually makes sense
> The performance improvement is an illusion. In most cases, it's just hiding the multiplication/division somewhere else. The only case where it's better is analytical magnets and it only saves 6 multiplications.
This doesn't make sense to me: with normalized fields the multiplication/division is done outside the kernel, so it happens once rather than for every particle inside the kernel, whereas in the other case every particle has to do it.
I also am extremely suspicious of the benchmark, because logically with normalized fields, the multiplication/division is done outside of the kernel, not in. So any performance "losses" would have to be explained thoroughly.
Field maps come from simulations (CST) or measurements. They are properties of the hardware, not the beam. The RF engineer will give you a file in V/m and have no idea what p_over_q_ref is.
OK, what if we preprocess the field map? Well, you can't store them normalized. Every time p_over_q_ref changes, you invalidate the field map. A cavity field map used for 12 cavities in a linac would need 12 separate copies because they have a different p_over_q_ref after acceleration. You want to optimize a cavity phase upstream? You have to re-normalize all field maps downstream. How about ramping? Magnets ramp B with p, so the normalized field strengths stay constant. But you don't ramp RF cavity voltages with p: E/Brho is not a constant, and you can't make a new field map for every ramping step. If EIC wants to reuse a magnet from RHIC, you cannot use a normalized field map unless you also have a lattice model and calculate its p_over_q_ref.
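To make the sharing argument concrete, here is a rough sketch (hypothetical struct and field names, not the BeamTracking.jl API):

```julia
# Hypothetical types for illustration only.
struct FieldMap3D{T}
    E::Array{T,4}   # electric field in V/m, indexed (component, ix, iy, iz)
    B::Array{T,4}   # magnetic field in T
end

struct Cavity{T}
    fmap::FieldMap3D{T}   # shared reference: a property of the hardware
    p_over_q_ref::T       # a property of the beam at this element
end

# One field map in physical units is shared by every cavity; each element
# applies its own p_over_q_ref at tracking time.
fmap  = FieldMap3D(rand(3, 11, 11, 101), zeros(3, 11, 11, 101))
linac = [Cavity(fmap, pq) for pq in (0.5, 1.0, 1.5)]   # Bρ grows along the linac

# A map stored as E/Bρ would instead need a separate copy (or a re-normalization
# pass) for every distinct p_over_q_ref, and again whenever that value changes.
```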
The benchmark script is https://github.com/ndwang/BeamTracking.jl/blob/fieldmap/benchmark/rk4_units_benchmark.jl
Compare it with https://github.com/ndwang/BeamTracking.jl/blob/fieldmap/src/modules/RungeKuttaTracking.jl
The only difference is where p_over_q_ref is applied.
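Schematically, the difference is (a sketch with made-up function names, not code from either file):

```julia
using LinearAlgebra  # for cross

# Physical-units variant: fields arrive in V/m and T, and the kernel divides
# by p_over_q_ref (= Bρ) once per field evaluation, per particle.
force_physical(E, B, v, p_over_q_ref) = (E .+ cross(v, B)) ./ p_over_q_ref

# Normalized variant: the field routine is expected to return E/Bρ and B/Bρ,
# so the kernel only assembles the force.
force_normalized(E_norm, B_norm, v) = E_norm .+ cross(v, B_norm)
```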
You should point to specific issues or benchmark it yourselves instead of handwaving about performance. "I'm suspicious" is not a counterargument.
You gave us a (probably AI-generated) benchmark without the code, and made a claim that doing more FLOPS inside a kernel/per-particle loop is somehow faster than doing it outside the kernel. That claim, which is the specific issue we are pointing to, doesn't make sense to either me or @JosephPeterDevlin, hence our "I'm suspicious" response to the benchmark. We kindly asked you to clarify exactly what is causing the slowdown, and that question, at least to me, remains unanswered.
All I see are things like this:
https://github.com/ndwang/BeamTracking.jl/blob/cb875ef6018602f8478cf898a3d6932ca5273f47/src/modules/RungeKuttaTracking.jl#L92
which are extra operations inside a kernel. Until you point out exactly what is causing the slowdown, a table of numbers from a benchmark that neither of us understands doesn't mean anything to me or @JosephPeterDevlin.
This PR is for analytical magnetic fields. The entire codebase uses normalized coordinates because the canonical coordinates are normalized. In my view, this is such a minor request from @JosephPeterDevlin that I don't understand the strong pushback.
Furthermore, ramping is actually exactly the case where normalized magnetic fields make sense, because then I don't have to update every single B field at every time step. When you ramp a ring, the normalized magnetic field strengths remain constant, even though the B fields change.
@mattsignorelli Can you do a quick review?

@JosephPeterDevlin Fixed the 1s
E_force_z = charge * Ez
B_force_x = charge * (vy*Bz - vz*By)
B_force_y = charge * (vz*Bx - vx*Bz)
B_force_z = charge * (vx*By - vy*Bx)
Is B_force_z used?
RungeKuttaTracking method for fixed-step, s-based 4th-order Runge-Kutta integration.