Open
Labels
dep: cuda-tileir (Depend on a feature or bug fix in cuda-tileir and tileiras compiler), feature request
Description
Is this a new feature, an improvement, or a change to existing functionality?
New Feature
How would you describe the priority of this feature request?
Critical (currently preventing usage)
Please provide a clear description of the problem this feature solves
Is NVSHMEM integration planned for cuTile? Without NVSHMEM support, kernels cannot perform fine-grained, in-kernel communication (for example, writing a computed tile directly into another GPU's memory), which limits compute-communication overlap.
Feature Description
Support for NVSHMEM device APIs (e.g., `nvshmemx_putmem_block`) inside cuTile kernels.
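For readers less familiar with NVSHMEM, the semantics of the requested primitive can be modeled on the host. In CUDA, `nvshmemx_putmem_block(void *dest, const void *source, size_t nelems, int pe)` is called collectively by all threads of a block and copies `nelems` bytes from the local `source` to the symmetric address `dest` on PE `pe`. The sketch below is a NumPy model under simplifying assumptions: each PE's symmetric heap is a byte array, a symmetric address is a byte offset that is identical on every PE, and the helper name `putmem_block_model` is hypothetical. It does not model threads, ordering, or completion.

```python
import numpy as np


def putmem_block_model(heaps, dest_offset, source, pe):
    """Hypothetical host-side model of nvshmemx_putmem_block semantics.

    heaps: list of uint8 arrays; heaps[p] stands in for PE p's symmetric heap.
    dest_offset: byte offset of the symmetric destination (the same on every PE).
    source: local NumPy array whose raw bytes are copied.
    pe: ID of the destination PE.
    """
    src_bytes = np.ascontiguousarray(source).reshape(-1).view(np.uint8)
    nelems = src_bytes.size  # putmem sizes are in bytes, not elements
    if dest_offset + nelems > heaps[pe].size:
        raise ValueError("put would overflow the destination symmetric heap")
    heaps[pe][dest_offset:dest_offset + nelems] = src_bytes


# Two PEs, each with a 64-byte symmetric heap.
heaps = [np.zeros(64, dtype=np.uint8) for _ in range(2)]
tile = np.arange(4, dtype=np.float32)  # 4 x float32 = 16 bytes
putmem_block_model(heaps, dest_offset=16, source=tile, pe=1)

print(heaps[1][16:32].view(np.float32))  # [0. 1. 2. 3.]
print(heaps[0].any())                    # False: PE 0's heap is untouched
```

Passing the PE number explicitly, rather than a raw pointer into remote memory, is what symmetric addressing buys: `dest` is a local symmetric address that NVSHMEM translates to the corresponding location on PE `pe`.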
Describe your ideal solution
```python
import cuda.tile as ct

@ct.kernel
def vector_add(a, b, remote_c, tile_size: ct.Constant[int], pe: ct.Constant[int]):
    # Get the 1D block ID
    pid = ct.bid(0)
    # Load input tiles
    a_tile = ct.load(a, index=(pid,), shape=(tile_size,))
    b_tile = ct.load(b, index=(pid,), shape=(tile_size,))
    # Perform elementwise addition
    result = a_tile + b_tile
    # Proposed API (does not exist yet): write the result tile directly into the
    # symmetric buffer remote_c on PE `pe`, analogous to ct.store(c, index=(pid,), tile=result)
    ct.nvshmemx_putmem_block(remote_c, result, index=(pid,), pe=pe)
```
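One way to pin down the intended semantics of the proposed `ct.nvshmemx_putmem_block(remote_c, result, index=(pid,), pe=pe)` is a sequential NumPy reference. Assuming `index` follows the same tile-space convention as `ct.store` (tile index `pid` covers elements `pid * tile_size` up to `(pid + 1) * tile_size`), each block's call would lower to a device-side `nvshmemx_putmem_block` with a destination offset of `pid * tile_size` elements and a size of `tile_size * itemsize` bytes. The function `vector_add_reference` and the list-of-arrays stand-in for the symmetric buffer `remote_c` are hypothetical; this is a candidate oracle for testing an eventual implementation, not the cuTile lowering.

```python
import numpy as np


def vector_add_reference(a, b, remote_c, tile_size, pe):
    """Hypothetical sequential oracle for the proposed vector_add kernel.

    remote_c: list of float arrays; remote_c[p] stands in for PE p's copy of
    the symmetric buffer. Assumes a.size is a multiple of tile_size.
    """
    num_blocks = a.size // tile_size
    for pid in range(num_blocks):  # one iteration per block, i.e. ct.bid(0) == pid
        lo, hi = pid * tile_size, (pid + 1) * tile_size
        result = a[lo:hi] + b[lo:hi]  # a_tile + b_tile
        # Assumed lowering of index=(pid,) to putmem arguments (byte units).
        byte_offset = lo * result.itemsize
        nbytes = tile_size * result.itemsize
        dest = remote_c[pe].view(np.uint8)  # byte view sharing PE pe's buffer
        dest[byte_offset:byte_offset + nbytes] = result.view(np.uint8)


npes, n, tile_size = 2, 8, 4
a = np.arange(n, dtype=np.float32)
b = np.ones(n, dtype=np.float32)
remote_c = [np.zeros(n, dtype=np.float32) for _ in range(npes)]
vector_add_reference(a, b, remote_c, tile_size, pe=1)
print(remote_c[1])  # [1. 2. 3. 4. 5. 6. 7. 8.]
print(remote_c[0])  # [0. 0. 0. 0. 0. 0. 0. 0.]
```

On real hardware, a put is only guaranteed to be delivered after a completion step such as `nvshmem_quiet` or a barrier, so PE `pe` would need to synchronize before reading `remote_c`; the sequential model hides that ordering.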
Describe any alternatives you have considered
No response
Additional context
No response
Contributing Guidelines
- I agree to follow cuTile Python's contributing guidelines
- I have searched the open feature requests and have found no duplicates for this feature request
kwen2501