JuliaGPU is switching to task-local state, so synchronize only syncs the default stream of the current task. This may lead to conflicts between array programming operations executed on the default stream and kernel programming executed on a custom stream. Adding support for a (heavy) device-wide sync may be needed in some cases:
CUDA: CUDA.device_synchronize()
AMDGPU: AMDGPU.HIP.device_synchronize()
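
A minimal sketch of the situation with CUDA.jl, assuming a CUDA-capable device is available. The kernel `scale!`, the array `a`, and the stream name `custom` are hypothetical and only serve as illustration. The array operation runs on the task-local default stream, the kernel runs on a custom stream, and a device-wide sync is placed between them so they cannot overlap:

```julia
using CUDA

# Hypothetical kernel, for illustration only: multiply each element by s.
function scale!(a, s)
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if i <= length(a)
        @inbounds a[i] *= s
    end
    return nothing
end

a = CUDA.ones(Float32, 1024)
custom = CuStream()            # custom (non-default) stream

a .+= 1f0                      # array operation on the task-local default stream
CUDA.device_synchronize()      # heavy: waits for all streams on the device

@cuda threads=256 blocks=4 stream=custom scale!(a, 2f0)  # kernel on the custom stream
CUDA.device_synchronize()      # make the kernel's writes visible before reading

result = Array(a)              # copy back to the host
```

A lighter alternative, where the stream object is at hand, would be to synchronize only that stream with `synchronize(custom)`; the device-wide call is the fallback when the streams involved are not known.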