
[FEA]: NVSHMEM Support #10

@mksit

Description


Is this a new feature, an improvement, or a change to existing functionality?

New Feature

How would you describe the priority of this feature request?

Critical (currently preventing usage)

Please provide a clear description of the problem this feature solves

Is NVSHMEM integration planned for cuTile? Without NVSHMEM support, kernels cannot perform fine-grained, in-kernel communication, which limits compute–communication overlap.

Feature Description

Support for NVSHMEM device APIs (e.g., nvshmemx_putmem_block)

Describe your ideal solution

@ct.kernel
def vector_add(a, b, remote_c, tile_size: ct.Constant[int], pe: ct.Constant[int]):
    # Get the 1D pid
    pid = ct.bid(0)

    # Load input tiles
    a_tile = ct.load(a, index=(pid,), shape=(tile_size,))
    b_tile = ct.load(b, index=(pid,), shape=(tile_size,))

    # Perform elementwise addition
    result = a_tile + b_tile

    # Store the result into the remote PE's buffer using NVSHMEM
    ct.nvshmemx_putmem_block(remote_c, result, index=(pid,), pe=pe)
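For reference, the block-scoped put that this sketch relies on already exists as an NVSHMEM device API in CUDA C. Below is a minimal sketch of the same vector-add pattern using today's nvshmemx_putmem_block; the buffer names (c_local, remote_c), the local staging buffer, and the launch configuration are illustrative assumptions, and remote_c is assumed to be a symmetric allocation (nvshmem_malloc) on all PEs.

#include <nvshmem.h>
#include <nvshmemx.h>

__global__ void vector_add_put(const float *a, const float *b,
                               float *c_local, float *remote_c,
                               int tile_size, int pe)
{
    // Each thread block owns one contiguous tile of tile_size elements.
    int base = blockIdx.x * tile_size;

    // Compute the block's tile into a local (non-symmetric) staging buffer.
    for (int i = threadIdx.x; i < tile_size; i += blockDim.x) {
        c_local[base + i] = a[base + i] + b[base + i];
    }
    __syncthreads();

    // Block-scoped put: every thread in the block calls this collectively.
    // The tile is copied into the symmetric buffer remote_c on PE `pe`.
    nvshmemx_putmem_block(remote_c + base, c_local + base,
                          tile_size * sizeof(float), pe);
}

The ideal-solution sketch above asks cuTile to expose the same block-cooperative semantics directly on tiles, so that the staging buffer and explicit byte counts are handled by the runtime.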

Describe any alternatives you have considered

No response

Additional context

No response

Contributing Guidelines

  • I agree to follow cuTile Python's contributing guidelines
  • I have searched the open feature requests and have found no duplicates for this feature request

Metadata

Labels

  • dep: cuda-tileir (depends on a feature or bug fix in cuda-tileir and the tileiras compiler)
  • feature request