Add `device_sync`

With JuliaGPU switching to task-local state, `synchronize` now only synchronizes the default stream of the current task. This may lead to conflicts between array programming operations executed on the default stream and kernel programming executed on a custom stream. Support for a (heavyweight) device-wide synchronization may therefore be needed in some cases:
- CUDA: `CUDA.device_synchronize()`
- AMDGPU: `AMDGPU.HIP.device_synchronize()`
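A minimal sketch of the situation on the CUDA side follows. It assumes a CUDA.jl version with task-local streams and a functional NVIDIA GPU. `demo_device_sync` is a hypothetical helper name used only for illustration and is not part of this PR. The array is modified on a custom `CuStream` via `CUDA.stream!`, so a plain `CUDA.synchronize()` (current task's stream only) would not wait for that work, while `CUDA.device_synchronize()` waits for all streams on the device.

```julia
using CUDA

# Hypothetical helper illustrating stream-local vs. device-wide synchronization.
function demo_device_sync(n::Integer = 1024)
    A = CUDA.ones(Float32, n)   # enqueued on the task's default stream
    CUDA.synchronize()          # make sure the fill is done before switching streams

    s = CuStream()              # custom stream, e.g. one used for kernel programming
    CUDA.stream!(s) do
        A .*= 2f0               # this broadcast runs on `s`, not on the default stream
    end

    # `CUDA.synchronize()` here would only wait on the task's default stream,
    # not on `s`; the device-wide sync waits for all outstanding work.
    CUDA.device_synchronize()
    return Array(A)             # copy back to the host
end
```

On AMDGPU, the analogous device-wide call would be `AMDGPU.HIP.device_synchronize()`.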