SYCL: Make sure to use a large enough workgroup size for tree traversal#931
SYCL: Make sure to use a large enough workgroup size for tree traversal#931masterleinad wants to merge 1 commit intoarborx:masterfrom
Conversation
aprokop
left a comment
There was a problem hiding this comment.
Should nearest be changed too? Should the kernel in MST be changed as well?
Does it affect all kernels?
| // FIXME_SYCL | ||
| #ifdef KOKKOS_ENABLE_SYCL | ||
| if constexpr (std::is_same_v<ExecutionSpace, Kokkos::Experimental::SYCL>) | ||
| policy.set_chunk_size(1024); |
There was a problem hiding this comment.
Can you please add a comment on the considerations that went into choosing 1024?
There was a problem hiding this comment.
Not quite sure if there is much behind it rather than that choosing the maximum gave much better performance (than 32 which is what the compiler chose) and I couldn't find a different workgroup size that gave better results (but results don't differ much choosing between 256 and 1024 threads per workgroup).
There was a problem hiding this comment.
Can you please post the results in this issue for different workgroup sizes?
There was a problem hiding this comment.
I'll share it privately.
|
Looks like the MST algorithm ( |
Forcing the workgroup size for the tree traversal to 1024 for
SYCL, improves the tree traversal by 25% in the DBSCAN benchmark.