Open
Description
Global reductions such as nodal_min must compute and evaluate the local reduction before passing the result to MPI. The code for these local reductions is not compiled, so it seems like they would incur the cost of generating the kernel each time the reduction is called. Should they perhaps have a memoized compile on the inside, similar to compiled_lsrk45_step?
(Not sure if this has any real performance impact, I just noticed it while discussing with @MTCam and thought it was worth mentioning.)