Phase 2: MultiPeerTransportStates implementation and MPI tests #639

Closed
dmwu wants to merge 3 commits into meta-pytorch:main from dmwu:export-D92882528

@dmwu dmwu commented Feb 11, 2026

Summary:
Implement the host-side MultiPeerTransportStates class that unifies NVLink,
IBGDA, and Self transports with automatic topology discovery.

  • MultiPeerTransportStates.h/.cc: Core implementation with topology discovery
    via cudaDeviceCanAccessPeer, sub-transport creation (NVL with
    IntraNodeBootstrapAdapter, IBGDA with global ranks), exchange(), and
    getDeviceHandle()
  • MultiPeerTransportStatesTest.cc: MPI-based host-side tests (nnodes=1, ppn=2)
    covering TopologyDiscovery, SelfTransportType, ExchangeSucceeds,
    HostNvlAccessor, SelfAccessor, DeviceHandleMetadata, DeviceHandleBeforeExchange
  • Updated BUCK targets for library and test

NVL local rank assignment is sorted by global rank to match MPI local rank
ordering, which is required by MpiBootstrap::allGatherIntraNode() validation.
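
The discovery and rank-ordering rule described above can be sketched roughly as follows. This is a minimal host-only illustration, not the PR's code: `PeerKind`, `Topology`, and `discoverTopology` are hypothetical names, the `canAccessPeer` callback stands in for a `cudaDeviceCanAccessPeer` query, and it is an assumption that the NVL group includes the calling rank itself.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical types for illustration; the PR's real class layout may differ.
enum class PeerKind { Self, Nvl, Ibgda };

struct Topology {
  std::vector<PeerKind> kindByGlobalRank;  // one entry per global rank
  std::vector<int> nvlGlobalRanks;         // NVL group members, ascending
  int nvlLocalRank = -1;                   // this rank's index in nvlGlobalRanks
};

// canAccessPeer(r) is a stand-in for asking CUDA whether the local device can
// reach the device owned by global rank r (e.g. via cudaDeviceCanAccessPeer).
Topology discoverTopology(int myRank, int worldSize,
                          const std::function<bool(int)>& canAccessPeer) {
  assert(myRank >= 0 && myRank < worldSize);
  Topology t;
  t.kindByGlobalRank.resize(worldSize);
  for (int r = 0; r < worldSize; ++r) {
    if (r == myRank) {
      t.kindByGlobalRank[r] = PeerKind::Self;
      t.nvlGlobalRanks.push_back(r);  // assumption: self counts as an NVL group member
    } else if (canAccessPeer(r)) {
      t.kindByGlobalRank[r] = PeerKind::Nvl;
      t.nvlGlobalRanks.push_back(r);
    } else {
      t.kindByGlobalRank[r] = PeerKind::Ibgda;
    }
  }
  // Sort by global rank so the NVL local rank lines up with the ordering an
  // MPI-style intra-node gather would expect.
  std::sort(t.nvlGlobalRanks.begin(), t.nvlGlobalRanks.end());
  auto it = std::find(t.nvlGlobalRanks.begin(), t.nvlGlobalRanks.end(), myRank);
  t.nvlLocalRank = static_cast<int>(it - t.nvlGlobalRanks.begin());
  return t;
}
```

With four ranks where rank 2 can reach ranks 0 and 3 over NVLink, the sketch yields an NVL group of {0, 2, 3} and gives rank 2 the NVL local rank 1; rank 1 falls back to IBGDA.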

Differential Revision: D92882528

@meta-cla meta-cla bot added the CLA Signed label Feb 11, 2026

meta-codesync bot commented Feb 11, 2026

@dmwu has exported this pull request. If you are a Meta employee, you can view the originating Diff in D92882528.

dmwu added a commit to dmwu/torchcomms that referenced this pull request Feb 11, 2026
dmwu added a commit to dmwu/torchcomms that referenced this pull request Feb 11, 2026
@dmwu dmwu force-pushed the export-D92882528 branch 2 times, most recently from 46892cc to 13e918a February 13, 2026 23:11
dmwu added a commit to dmwu/torchcomms that referenced this pull request Feb 13, 2026
dmwu added a commit to dmwu/torchcomms that referenced this pull request Feb 13, 2026
dmwu added a commit to dmwu/torchcomms that referenced this pull request Feb 20, 2026
dmwu added 3 commits February 19, 2026 21:46
…ed-transport] Phase 1: Add foundation headers for MultiPeerTransport" (meta-pytorch#747)

Summary:

Separate bootstrap-related code into a dedicated bootstrap/ subfolder under pipes.
This is a prerequisite for the unified-transport diff stack, moving NvlBootstrapAdapter
and its tests to a cleaner location.

- Create pipes/bootstrap/ with NvlBootstrapAdapter.h and BUCK
- Add unit test NvlBootstrapAdapterTest.cc in bootstrap/tests/

Reviewed By: siyengar

Differential Revision: D93367250
)

Summary:

Add the building blocks for MultiPeerTransport - a unified wrapper
that combines NVLink and IBGDA transports with automatic topology discovery.

- IntraNodeBootstrapAdapter.h: Bootstrap wrapper that redirects allGather()
  to allGatherIntraNode() with rank mapping, allowing MultiPeerNvlTransport
  to operate on intra-node rank subsets transparently.

- MultiPeerDeviceHandle.cuh: Unified device-side handle struct passed to
  CUDA kernels, containing DeviceSpans for NVL/IBGDA transport arrays and
  rank mapping arrays with getType/getNvl/getIbgda accessors.

- Transport.cuh: Extended TransportType enum with P2P_IBGDA variant and
  updated the tagged union with non-owning pointer, constructor, move
  semantics, and destructor support.

- MnnvlFabric: Detects NVLink connectivity across nodes for MNNVL cases (e.g., GB200/GB300).
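
As a rough illustration of the IntraNodeBootstrapAdapter idea from the list above, the sketch below forwards `allGather()` to `allGatherIntraNode()` together with a local-to-global rank map. The `IBootstrap` interface, its signatures, and the `IntraNodeAdapterSketch` name are assumptions made for this example; the real bootstrap interface in the repository may look different.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical minimal bootstrap interface (assumed, not taken from the PR).
struct IBootstrap {
  virtual ~IBootstrap() = default;
  // buf holds nranks slots of len bytes each; the caller pre-fills slot `rank`.
  virtual void allGather(void* buf, std::size_t len, int rank, int nranks) = 0;
  // Same contract, restricted to the ranks listed in localToGlobal.
  virtual void allGatherIntraNode(void* buf, std::size_t len, int localRank,
                                  const std::vector<int>& localToGlobal) = 0;
};

// Adapter sketch: a transport that only knows allGather() ends up gathering
// over the intra-node subset, without being aware of the redirection.
class IntraNodeAdapterSketch : public IBootstrap {
 public:
  IntraNodeAdapterSketch(IBootstrap& inner, std::vector<int> localToGlobal)
      : inner_(inner), localToGlobal_(std::move(localToGlobal)) {}

  void allGather(void* buf, std::size_t len, int rank, int nranks) override {
    // The wrapped transport sees local ranks 0..nranks-1 only.
    assert(nranks == static_cast<int>(localToGlobal_.size()));
    assert(rank >= 0 && rank < nranks);
    inner_.allGatherIntraNode(buf, len, rank, localToGlobal_);
  }

  void allGatherIntraNode(void* buf, std::size_t len, int localRank,
                          const std::vector<int>& localToGlobal) override {
    inner_.allGatherIntraNode(buf, len, localRank, localToGlobal);
  }

 private:
  IBootstrap& inner_;               // underlying bootstrap, not owned
  std::vector<int> localToGlobal_;  // index = local rank, value = global rank
};
```

The point of the wrapper design is that the NVL transport can keep calling a plain `allGather()` while the rank translation lives in one place.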

Unit tests in this diff cover only topology discovery; transport-layer unit tests are in the next diff.
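
For readers unfamiliar with the handle layout that MultiPeerDeviceHandle.cuh is described as providing, here is a host-only sketch of a handle with per-rank type lookup and `getType`/`getNvl`/`getIbgda` accessors. Everything except those three accessor names and the `P2P_IBGDA` variant is hypothetical (`SpanSketch`, `NvlStub`, `IbgdaStub`, the `SELF` and `P2P_NVL` enumerators, and the index array); a CUDA version would presumably mark the members `__host__ __device__`.

```cpp
#include <cassert>
#include <cstdint>

// Assumed enumerators; only P2P_IBGDA is named in the commit message.
enum class TransportTypeSketch : std::uint8_t { SELF, P2P_NVL, P2P_IBGDA };

// Placeholder per-peer transport states.
struct NvlStub { int peerLocalRank; };
struct IbgdaStub { int peerGlobalRank; };

// Tiny non-owning view, standing in for a device span type.
template <typename T>
struct SpanSketch {
  const T* data;
  int size;
  const T& operator[](int i) const {
    assert(i >= 0 && i < size);
    return data[i];
  }
};

struct DeviceHandleSketch {
  SpanSketch<TransportTypeSketch> typeByRank;  // indexed by global rank
  SpanSketch<int> indexByRank;  // slot in nvl/ibgda for that rank, -1 for self
  SpanSketch<NvlStub> nvl;
  SpanSketch<IbgdaStub> ibgda;

  TransportTypeSketch getType(int rank) const { return typeByRank[rank]; }

  const NvlStub& getNvl(int rank) const {
    assert(getType(rank) == TransportTypeSketch::P2P_NVL);
    return nvl[indexByRank[rank]];
  }

  const IbgdaStub& getIbgda(int rank) const {
    assert(getType(rank) == TransportTypeSketch::P2P_IBGDA);
    return ibgda[indexByRank[rank]];
  }
};
```

A kernel-side caller would branch on `getType(peer)` and then fetch the matching transport, so the rank-to-slot mapping is resolved once on the host.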

Reviewed By: siyengar

Differential Revision: D92882527
…h#639)

Summary:

Implement the host-side MultiPeerTransport class that unifies NVLink,
IBGDA, and Self transports with automatic topology discovery.

- MultiPeerTransport.h/.cc: Core implementation with topology discovery
  via cudaDeviceCanAccessPeer, sub-transport creation (NVL with
  IntraNodeBootstrapAdapter, IBGDA with global ranks), exchange(), and
  getDeviceHandle()
- MultiPeerTransportTest.cc: MPI-based host-side tests (nnodes=1, ppn=2)
  covering TopologyDiscovery, SelfTransportType, ExchangeSucceeds,
  HostNvlAccessor, SelfAccessor, DeviceHandleMetadata, DeviceHandleBeforeExchange
- Updated BUCK targets for library and test

NVL local rank assignment is sorted by global rank to match MPI local rank
ordering, which is required by MpiBootstrap::allGatherIntraNode() validation.

Reviewed By: siyengar

Differential Revision: D92882528
@meta-codesync meta-codesync bot closed this in ce7829c Feb 20, 2026

meta-codesync bot commented Feb 20, 2026

This pull request has been merged in ce7829c.

Labels

CLA Signed, fb-exported, Merged, meta-exported