Align the data element counts for Allreduce nvls kernel #697
base: main
Conversation
Pull request overview
This PR addresses alignment issues in the NVLS (NVLink SHARP) allreduce kernel by ensuring data element counts meet the 16-byte alignment requirement per rank. It also updates the PyTorch distributed initialization in the test file to pass a `torch.device` object instead of an integer for the `device_id` parameter, for compatibility with older PyTorch versions.
Key changes:
- Added alignment logic to `AllreduceNvlsWithCopy::allreduceKernelFunc` to ensure `(size / nRanksPerNode)` is 16-byte aligned (see the sketch after this list)
- Changed PyTorch test initialization from `device_id=local_rank` to `device_id=torch.device(f"cuda:{local_rank}")`
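The core of the alignment fix is round-up arithmetic. Below is a minimal Python sketch of the idea, assuming the kernel pads the element count so that each rank's share is a multiple of 16 bytes; the names `aligned_count`, `n_ranks_per_node`, and `alignment` are illustrative, not the kernel's actual identifiers:

```python
def aligned_count(count: int, elem_size: int, n_ranks_per_node: int, alignment: int = 16) -> int:
    """Round count up so each rank's share of the buffer is alignment-byte aligned."""
    unit = (n_ranks_per_node * alignment) // elem_size  # elements per padding unit
    return ((count + unit - 1) // unit) * unit

# Example: 4097 float32 elements across 8 ranks are padded to 4128 elements,
# so each rank's 516-element share is 2064 bytes, a multiple of 16.
print(aligned_count(4097, elem_size=4, n_ranks_per_node=8))  # 4128
```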
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `apps/nccl/src/allreduce.cu` | Adds a 16-byte alignment calculation for NVLS kernels by computing `alignedCount` and passing it to the kernel instead of the original count |
| `test/torch/correctness_test.py` | Changes the `device_id` parameter in `dist.init_process_group` from an integer to a `torch.device` object |
mahdiehghazim left a comment:
I recommend adding a few benchmark tests that use non-aligned input buffer sizes. This will help ensure we’re covering that case as we add more code in the future.
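A hedged sketch of what such a check could look like, assuming a CUDA-capable `torch.distributed` process group is already initialized; the specific non-aligned `nelem` values are illustrative, not taken from the PR:

```python
import torch
import torch.distributed as dist

# Element counts deliberately NOT multiples of the per-rank 16-byte unit,
# so they exercise the new padding path in the NVLS kernel.
non_aligned_nelems = [1, 3, 1023, 4097, (1 << 20) + 5]

for nelem in non_aligned_nelems:
    x = torch.ones(nelem, dtype=torch.float16, device="cuda")
    dist.all_reduce(x)  # every element should now equal the world size
    expected = float(dist.get_world_size())
    assert torch.all(x == expected), f"allreduce mismatch at nelem={nelem}"
```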
Add padding to make the total size divisible by (nRanksPerNode * alignment) in allreduce11
- Pass the test with the following `nelem`:
- Add padding to make the total size divisible by `(nRanksPerNode * alignment)`.
- Fix the issue "ALLREDUCE assert failed" #682.
- In `test/torch/correctness_test.py`, updated the device initialization to use `device_id=torch.device(f"cuda:{local_rank}")` in `dist.init_process_group` for compatibility with older versions of PyTorch, such as v2.7.0 (sketched below).
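For reference, a minimal sketch of that setup, assuming the usual `env://` rendezvous variables (`MASTER_ADDR`, `MASTER_PORT`, `RANK`, `WORLD_SIZE`, `LOCAL_RANK`) are set by the launcher; everything beyond the `device_id` argument is illustrative:

```python
import os
import torch
import torch.distributed as dist

local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

# Pass a torch.device object rather than a bare integer; older PyTorch
# releases (e.g., v2.7.0) expect a device object for the device_id parameter.
dist.init_process_group(backend="nccl", device_id=torch.device(f"cuda:{local_rank}"))
```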