[CUDA] Fallback Event impl when there is no hardware cpu/gpu coherency #3070

zcbenz · 2026-01-27T03:39:52Z

On the 4090 there is no hardware CPU/GPU coherency, so we cannot synchronize between the CPU and GPU by sharing atomics. This PR provides a fallback implementation of AtomicEvent for this case: we store the atomic in device-only memory, and the CPU waits in a busy-wait loop that reads the value with cudaMemcpy.

This fallback is very inefficient, but it mostly serves to make tests pass. AtomicEvent has always been a slow fallback from CudaEvent for infrequent use cases.
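The CPU-side busy-wait described above can be sketched roughly as follows. This is not the code from the PR: `ReadCounter` and `poll_until` are hypothetical names, the device read is abstracted behind a callable so the sketch compiles without the CUDA toolkit, and the "signaled once the counter reaches the target" rule is an assumption about the event semantics.

```cpp
#include <cstdint>
#include <functional>
#include <thread>

// Hypothetical stand-in for the device-to-host read. In a CUDA build this
// callable would wrap a cudaMemcpy of the 32-bit counter from device memory
// into a host variable; here it is any callable that returns the counter.
using ReadCounter = std::function<std::uint32_t()>;

// Block the calling CPU thread until the counter reaches `target`.
// Assumption: the event counts as signaled once counter >= target.
// Returns the number of reads performed, which makes the cost of the
// loop visible (each read would be a full copy in the real fallback).
inline std::uint64_t poll_until(const ReadCounter& read_counter,
                                std::uint32_t target) {
  std::uint64_t reads = 1;
  while (read_counter() < target) {
    std::this_thread::yield();  // let other host threads run between polls
    ++reads;
  }
  return reads;
}
```

Every iteration pays for one full read of the counter, which is why a loop of this shape only makes sense for infrequent waits.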

Also includes some small changes:

Use uint32 instead of uint64 as the latter has more hardware requirements for atomic ops.
Use atomic_ref instead of atomic to make initialization easier for device-only memory.

awni · 2026-01-27T15:06:24Z

mlx/backend/cuda/event.cu

+  // 2. hostNativeAtomicSupported == true
+  //    => use cuda::atomic_ref on pinned host memory
+  // 3. no hardware cpu/gpu coherency
+  //    => use cuda::atomic_ref on device memory


Which category (2 or 3) do typical consumer GPUs fall into (4090, 5090, etc.)?

The 4090 is cat 3; I don't know about the 5090. Cat 2 is probably only theoretically possible, but it is very simple to support.

awni

Looks great, thanks!

zcbenz force-pushed the fallback-event branch from f713650 to 57f4623 on January 27, 2026 05:11

Fallback Event impl when there is no hardware cpu/gpu coherency

38ed731

zcbenz force-pushed the fallback-event branch from 57f4623 to 38ed731 on January 27, 2026 11:38

awni reviewed Jan 27, 2026

awni approved these changes Jan 27, 2026

zcbenz merged commit 2ac18ed into ml-explore:main Jan 28, 2026
16 checks passed

zcbenz deleted the fallback-event branch January 28, 2026 01:43
