Should global reductions `compile` internally? #274

Global reductions such as `nodal_min` must compute and evaluate the local reduction before passing the result to MPI. The code for these local reductions is not compiled, so it seems they would incur the cost of generating the kernel each time the reduction is called. Should they instead have a memoized `compile` on the inside, similar to `compiled_lsrk45_step`?
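To make the suggestion concrete, here is a rough sketch of the memoization pattern I have in mind. It is not the library's actual code: `ToyArrayContext`, `_local_min`, `nodal_min_sketch`, and the `allreduce_min` stub are all hypothetical names for illustration, and the only assumption about the real array context is that it offers a `compile(f)` method returning a callable.

```python
import weakref


class ToyArrayContext:
    """Hypothetical stand-in for an array context; it only counts compilations."""

    def __init__(self):
        self.num_compiles = 0

    def compile(self, f):
        # Stand-in for (expensive) kernel generation: count the call, return f.
        self.num_compiles += 1
        return f


# One compiled local reduction per array context. Weak keys let the cache
# entry disappear once the array context itself is garbage-collected.
_compiled_local_min = weakref.WeakKeyDictionary()


def _local_min(values):
    # Hypothetical rank-local reduction.
    return min(values)


def nodal_min_sketch(actx, values, allreduce_min=lambda x: x):
    """Local reduction via a memoized compile, then a stubbed global reduction."""
    try:
        compiled = _compiled_local_min[actx]
    except KeyError:
        # First call for this array context: compile once and remember it.
        compiled = _compiled_local_min[actx] = actx.compile(_local_min)
    # allreduce_min is a placeholder for the MPI step; the default is a no-op.
    return allreduce_min(compiled(values))


actx = ToyArrayContext()
nodal_min_sketch(actx, [3.0, 1.0, 2.0])
nodal_min_sketch(actx, [5.0, 4.0])
print(actx.num_compiles)  # prints 1: the second call reuses the cached callable
```

Keying the cache on the array context means each context compiles the local reduction at most once, no matter how many times the global reduction is called.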

(Not sure if this has any real performance impact; I just noticed it while discussing with @MTCam and thought it was worth mentioning.)
