Phase 2: MultiPeerTransportStates implementation and MPI tests #639
Closed
dmwu wants to merge 3 commits into meta-pytorch:main
dmwu added a commit to dmwu/torchcomms that referenced this pull request on Feb 11, 2026

…pytorch#639)

Summary:
Implement the host-side MultiPeerTransportStates class that unifies NVLink, IBGDA, and Self transports with automatic topology discovery.

- MultiPeerTransportStates.h/.cc: Core implementation with topology discovery via cudaDeviceCanAccessPeer, sub-transport creation (NVL with IntraNodeBootstrapAdapter, IBGDA with global ranks), exchange(), and getDeviceHandle()
- MultiPeerTransportStatesTest.cc: MPI-based host-side tests (nnodes=1, ppn=2) covering TopologyDiscovery, SelfTransportType, ExchangeSucceeds, HostNvlAccessor, SelfAccessor, DeviceHandleMetadata, DeviceHandleBeforeExchange
- Updated BUCK targets for library and test

NVL local rank assignment is sorted by global rank to match MPI local rank ordering, which is required by MpiBootstrap::allGatherIntraNode() validation.

Differential Revision: D92882528
9557f52 to 08a53c7
46892cc to 13e918a
dmwu added a commit to dmwu/torchcomms that referenced this pull request on Feb 13, 2026

…pytorch#639)

Summary:
Pull Request resolved: meta-pytorch#639

Implement the host-side MultiPeerTransportStates class that unifies NVLink, IBGDA, and Self transports with automatic topology discovery.

- MultiPeerTransportStates.h/.cc: Core implementation with topology discovery via cudaDeviceCanAccessPeer, sub-transport creation (NVL with IntraNodeBootstrapAdapter, IBGDA with global ranks), exchange(), and getDeviceHandle()
- MultiPeerTransportStatesTest.cc: MPI-based host-side tests (nnodes=1, ppn=2) covering TopologyDiscovery, SelfTransportType, ExchangeSucceeds, HostNvlAccessor, SelfAccessor, DeviceHandleMetadata, DeviceHandleBeforeExchange
- Updated BUCK targets for library and test

NVL local rank assignment is sorted by global rank to match MPI local rank ordering, which is required by MpiBootstrap::allGatherIntraNode() validation.

Differential Revision: D92882528
dmwu added a commit to dmwu/torchcomms that referenced this pull request on Feb 20, 2026

…h#639)

Summary:
Implement the host-side MultiPeerTransport class that unifies NVLink, IBGDA, and Self transports with automatic topology discovery.

- MultiPeerTransport.h/.cc: Core implementation with topology discovery via cudaDeviceCanAccessPeer, sub-transport creation (NVL with IntraNodeBootstrapAdapter, IBGDA with global ranks), exchange(), and getDeviceHandle()
- MultiPeerTransportTest.cc: MPI-based host-side tests (nnodes=1, ppn=2) covering TopologyDiscovery, SelfTransportType, ExchangeSucceeds, HostNvlAccessor, SelfAccessor, DeviceHandleMetadata, DeviceHandleBeforeExchange
- Updated BUCK targets for library and test

NVL local rank assignment is sorted by global rank to match MPI local rank ordering, which is required by MpiBootstrap::allGatherIntraNode() validation.

Reviewed By: siyengar

Differential Revision: D92882528
13e918a to 3cce8ca
…ed-transport] Phase 1: Add foundation headers for MultiPeerTransport" (meta-pytorch#747)

Summary:
Separate bootstrap-related code into a dedicated bootstrap/ subfolder under pipes. This is a prerequisite for the unified-transport diff stack, moving NvlBootstrapAdapter and its tests to a cleaner location.

- Create pipes/bootstrap/ with NvlBootstrapAdapter.h and BUCK
- Add unit test NvlBootstrapAdapterTest.cc in bootstrap/tests/

Reviewed By: siyengar

Differential Revision: D93367250
)

Summary:
Add the building blocks for MultiPeerTransport, a unified wrapper that combines NVLink and IBGDA transports with automatic topology discovery.

- IntraNodeBootstrapAdapter.h: Bootstrap wrapper that redirects allGather() to allGatherIntraNode() with rank mapping, allowing MultiPeerNvlTransport to operate on intra-node rank subsets transparently.
- MultiPeerDeviceHandle.cuh: Unified device-side handle struct passed to CUDA kernels, containing DeviceSpans for NVL/IBGDA transport arrays and rank mapping arrays with getType/getNvl/getIbgda accessors.
- Transport.cuh: Extended the TransportType enum with a P2P_IBGDA variant and updated the tagged union with non-owning pointer, constructor, move semantics, and destructor support.
- MnnvlFabric: Detect NVLink connectivity across nodes for MNNVL cases (e.g., GB200/GB300).

The unit test in this diff covers only topology discovery. Transport-layer unit tests are in the next diff.

Reviewed By: siyengar

Differential Revision: D92882527
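To illustrate the IntraNodeBootstrapAdapter bullet above, here is a minimal C++ sketch of an adapter that forwards allGather() to allGatherIntraNode() together with a local-to-global rank mapping. The interface `BootstrapSketch`, the class names, and every method signature here are hypothetical and chosen only for illustration; the actual headers in this stack may look quite different. `RecordingBootstrap` is a test double that only records which call it received.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical bootstrap interface (an assumption, not the real API).
class BootstrapSketch {
 public:
  virtual ~BootstrapSketch() = default;
  virtual void allGather(void* buf, std::size_t len, int rank, int nranks) = 0;
  virtual void allGatherIntraNode(void* buf, std::size_t len, int localRank,
                                  int localNranks,
                                  const std::vector<int>& localToGlobal) = 0;
};

// Presents an intra-node rank subset as if it were the whole job:
// callers use allGather() with local ranks, the adapter redirects the
// call to allGatherIntraNode() and supplies the rank mapping.
class IntraNodeAdapterSketch : public BootstrapSketch {
 public:
  IntraNodeAdapterSketch(BootstrapSketch& inner, std::vector<int> localToGlobal)
      : inner_(inner), localToGlobal_(std::move(localToGlobal)) {}

  void allGather(void* buf, std::size_t len, int rank, int nranks) override {
    // The caller's view of the "world" must match the intra-node subset.
    assert(nranks == static_cast<int>(localToGlobal_.size()));
    assert(rank >= 0 && rank < nranks);
    inner_.allGatherIntraNode(buf, len, rank, nranks, localToGlobal_);
  }

  void allGatherIntraNode(void* buf, std::size_t len, int localRank,
                          int localNranks,
                          const std::vector<int>& localToGlobal) override {
    inner_.allGatherIntraNode(buf, len, localRank, localNranks, localToGlobal);
  }

 private:
  BootstrapSketch& inner_;
  std::vector<int> localToGlobal_;  // index: local rank, value: global rank
};

// Test double: records calls instead of communicating.
class RecordingBootstrap : public BootstrapSketch {
 public:
  int globalCalls = 0;
  int intraCalls = 0;
  int lastLocalRank = -1;
  std::vector<int> lastMapping;

  void allGather(void*, std::size_t, int, int) override { ++globalCalls; }

  void allGatherIntraNode(void*, std::size_t, int localRank, int,
                          const std::vector<int>& localToGlobal) override {
    ++intraCalls;
    lastLocalRank = localRank;
    lastMapping = localToGlobal;
  }
};
```

With this shape, a transport written against allGather() needs no changes to run on a subset of ranks; only the object it is handed differs.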
…h#639)

Summary:
Implement the host-side MultiPeerTransport class that unifies NVLink, IBGDA, and Self transports with automatic topology discovery.

- MultiPeerTransport.h/.cc: Core implementation with topology discovery via cudaDeviceCanAccessPeer, sub-transport creation (NVL with IntraNodeBootstrapAdapter, IBGDA with global ranks), exchange(), and getDeviceHandle()
- MultiPeerTransportTest.cc: MPI-based host-side tests (nnodes=1, ppn=2) covering TopologyDiscovery, SelfTransportType, ExchangeSucceeds, HostNvlAccessor, SelfAccessor, DeviceHandleMetadata, DeviceHandleBeforeExchange
- Updated BUCK targets for library and test

NVL local rank assignment is sorted by global rank to match MPI local rank ordering, which is required by MpiBootstrap::allGatherIntraNode() validation.

Reviewed By: siyengar

Differential Revision: D92882528
3cce8ca to bf10bd5
This pull request has been merged in ce7829c.
Summary:
Implement the host-side MultiPeerTransportStates class that unifies NVLink, IBGDA, and Self transports with automatic topology discovery.

- MultiPeerTransportStates.h/.cc: Core implementation with topology discovery via cudaDeviceCanAccessPeer, sub-transport creation (NVL with IntraNodeBootstrapAdapter, IBGDA with global ranks), exchange(), and getDeviceHandle()
- MultiPeerTransportStatesTest.cc: MPI-based host-side tests (nnodes=1, ppn=2) covering TopologyDiscovery, SelfTransportType, ExchangeSucceeds, HostNvlAccessor, SelfAccessor, DeviceHandleMetadata, DeviceHandleBeforeExchange
- Updated BUCK targets for library and test

NVL local rank assignment is sorted by global rank to match MPI local rank ordering, which is required by MpiBootstrap::allGatherIntraNode() validation.

Differential Revision: D92882528
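
The topology discovery and NVL local rank ordering described in the summary can be sketched roughly as follows. This is a host-only C++ illustration under stated assumptions, not the code from this PR: `PeerType`, `TopologySketch`, and `discoverTopology` are hypothetical names, the `canAccessPeer` callback stands in for a query such as cudaDeviceCanAccessPeer (which takes device ordinals, so the real code also needs a rank-to-device mapping that is omitted here), and the sketch assumes that the NVL group includes the calling rank and that every peer not reachable over NVLink falls back to IBGDA.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical transport classification for each peer.
enum class PeerType { Self, Nvl, Ibgda };

struct TopologySketch {
  std::vector<PeerType> typePerRank;  // indexed by global rank
  std::vector<int> nvlGlobalRanks;    // NVL group members, ascending global rank
  int myNvlLocalRank = -1;            // this rank's index within nvlGlobalRanks
};

// canAccessPeer(peerRank) is an injected predicate so the sketch runs
// without a GPU; in real code it would wrap a CUDA peer-access query.
TopologySketch discoverTopology(int myRank, int nRanks,
                                const std::function<bool(int)>& canAccessPeer) {
  assert(myRank >= 0 && myRank < nRanks);
  TopologySketch t;
  t.typePerRank.assign(nRanks, PeerType::Ibgda);  // assumed default: IBGDA

  for (int r = 0; r < nRanks; ++r) {
    if (r == myRank) {
      t.typePerRank[r] = PeerType::Self;
      t.nvlGlobalRanks.push_back(r);  // assumption: self is part of the NVL group
    } else if (canAccessPeer(r)) {
      t.typePerRank[r] = PeerType::Nvl;
      t.nvlGlobalRanks.push_back(r);
    }
  }

  // Local ranks follow ascending global rank so that they line up with
  // MPI local rank ordering. The loop above already visits ranks in
  // order; the explicit sort documents and enforces the invariant.
  std::sort(t.nvlGlobalRanks.begin(), t.nvlGlobalRanks.end());
  auto it = std::find(t.nvlGlobalRanks.begin(), t.nvlGlobalRanks.end(), myRank);
  t.myNvlLocalRank = static_cast<int>(it - t.nvlGlobalRanks.begin());
  return t;
}
```

For example, rank 2 of 4 with NVLink access to ranks 0 and 3 would get the NVL group {0, 2, 3} and NVL local rank 1, while rank 1 would be classified as an IBGDA peer.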