
[BugFix] Fix MoRIIOConnector for disaggregated P/D inference#37716

Draft
raviguptaamd wants to merge 1 commit into vllm-project:main from raviguptaamd:fix/moriio-connector-pd-disagg

raviguptaamd commented Mar 20, 2026

Summary

Three targeted fixes to MoRIIOConnector that resolve crashes and improve compatibility in Prefill/Decode disaggregated inference. All changes are scoped to the MORI-IO code path only, with zero impact on the NIXL, shared-storage, and SGLang connectors.

Fix 1: Cache kv_transfer_params across scheduler steps

The Request object's kv_transfer_params can be mutated or cleared between the update_state_after_alloc and build_connector_meta scheduler steps. By the time build_connector_meta runs, fields like remote_block_ids and remote_engine_id may be gone, causing KeyError on every decode request.

Fix: Snapshot kv_transfer_params into a _req_kv_params dict at allocation time and use the cached copy when building connector metadata. Cleanup is performed after both recv and save loops to avoid memory leaks.
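The snapshot-and-cleanup pattern can be sketched as below. This is a minimal illustration, not the connector code: `Request` is a stand-in dataclass, and `KVParamsCache` with its simplified `build_connector_meta` signature (two lists of request IDs returning a plain dict) is hypothetical. It copies the params at allocation time, reads them with `get()` in both loops, and deletes entries only after both loops have run.

```python
from __future__ import annotations

import copy
from dataclasses import dataclass
from typing import Any


@dataclass
class Request:
    # Hypothetical stand-in for vLLM's Request; only the fields used here.
    request_id: str
    kv_transfer_params: dict[str, Any] | None = None


class KVParamsCache:
    """Hypothetical sketch of the _req_kv_params snapshot described above."""

    def __init__(self) -> None:
        self._req_kv_params: dict[str, dict[str, Any]] = {}

    def update_state_after_alloc(self, request: Request) -> None:
        # Snapshot at allocation time so later mutation of the Request
        # cannot remove remote_block_ids / remote_engine_id.
        if request.kv_transfer_params:
            self._req_kv_params[request.request_id] = copy.deepcopy(
                request.kv_transfer_params
            )

    def build_connector_meta(
        self, recv_ids: list[str], save_ids: list[str]
    ) -> dict[str, dict[str, Any]]:
        meta: dict[str, dict[str, Any]] = {}
        for req_id in recv_ids:
            params = self._req_kv_params.get(req_id)  # read, do not consume
            if params is not None:
                meta[req_id] = params
        for req_id in save_ids:
            params = self._req_kv_params.get(req_id)
            if params is not None:
                meta.setdefault(req_id, params)
        # Clean up only after both loops have read the cache (no leak).
        for req_id in set(recv_ids) | set(save_ids):
            self._req_kv_params.pop(req_id, None)
        return meta


cache = KVParamsCache()
req = Request("r1", {"remote_block_ids": [3, 7], "remote_engine_id": "prefill-0"})
cache.update_state_after_alloc(req)
req.kv_transfer_params.clear()  # simulate mutation between scheduler steps
meta = cache.build_connector_meta(recv_ids=["r1"], save_ids=[])
print(meta["r1"]["remote_block_ids"])  # [3, 7]
```

The deep copy is a deliberate choice in this sketch: a shallow reference to the same dict would be emptied by the `clear()` call and reproduce the original `KeyError`.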

Fix 2: Handle attention backend block_size overrides

For MLA models (e.g., DeepSeek-V3), the FlashMLA attention backend silently overrides block_size to 64 regardless of the --block-size CLI flag. The existing assert block_size == self.block_size compared the actual KV cache tensor shape against the CLI config value, causing an AssertionError during register_kv_caches.

Fix: Trust the actual tensor shape as ground truth. If it differs from the config, log an informational message and update self.block_size to match.
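A minimal sketch of the "trust the tensor shape" check, assuming the KV cache shape is available as a plain tuple. `reconcile_block_size` is a hypothetical helper, and `block_dim=1` is an assumed position for the block-size axis; the real layout depends on the attention backend. The example shape `(1024, 64, 576)` is illustrative only.

```python
import logging

logger = logging.getLogger("moriio_sketch")


def reconcile_block_size(
    kv_cache_shape: tuple[int, ...], configured_block_size: int, block_dim: int = 1
) -> int:
    """Return the block size actually used by the KV cache tensor.

    Hypothetical helper: block_dim is an assumed index of the block-size
    axis, not a layout guaranteed by any particular backend.
    """
    actual = kv_cache_shape[block_dim]
    if actual != configured_block_size:
        # The old code asserted equality here and crashed under FlashMLA.
        logger.info(
            "KV cache block_size %d differs from configured %d; using tensor shape.",
            actual,
            configured_block_size,
        )
    return actual


block_size = 16  # value passed via --block-size
block_size = reconcile_block_size((1024, 64, 576), block_size)
print(block_size)  # 64
```

In the connector itself the returned value would be assigned back to `self.block_size`, so later block-offset arithmetic uses the size the backend actually allocated.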

Fix 3: Default values for MORI-IO-specific kv_transfer_params fields

remote_handshake_port and remote_notify_port were read with direct dict access (kv_transfer_params["remote_handshake_port"]) and no fallback. When an external coordinator (e.g., the NIXL sidecar in LLM-D) does not inject these MORI-IO-specific fields, the lookup raises KeyError.

Fix: Use .get() with MoRIIOConstants.DEFAULT_HANDSHAKE_PORT / DEFAULT_NOTIFY_PORT defaults, matching the pattern already used for tp_size and remote_dp_size.
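The fallback lookup can be sketched as follows. `MoRIIOConstantsStub` and its port numbers are hypothetical placeholders chosen for illustration; they are not the values defined in moriio_common.py, and `resolve_ports` is likewise a hypothetical helper.

```python
from typing import Any


class MoRIIOConstantsStub:
    # Placeholder values for illustration only; the real defaults live in
    # MoRIIOConstants in moriio_common.py.
    DEFAULT_HANDSHAKE_PORT = 6301
    DEFAULT_NOTIFY_PORT = 6302


def resolve_ports(kv_transfer_params: dict[str, Any]) -> tuple[int, int]:
    # .get() with a default instead of [] so coordinators that omit the
    # MORI-IO-specific fields no longer trigger KeyError.
    handshake = kv_transfer_params.get(
        "remote_handshake_port", MoRIIOConstantsStub.DEFAULT_HANDSHAKE_PORT
    )
    notify = kv_transfer_params.get(
        "remote_notify_port", MoRIIOConstantsStub.DEFAULT_NOTIFY_PORT
    )
    return handshake, notify


# Params as a NIXL-oriented sidecar might send them: no MORI-IO port fields.
params = {"remote_engine_id": "prefill-0", "remote_block_ids": [1, 2]}
print(resolve_ports(params))  # (6301, 6302)
print(resolve_ports({**params, "remote_notify_port": 7000}))  # (6301, 7000)
```

Explicit values still win when present, so coordinators that do send the MORI-IO fields behave exactly as before.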

Files Changed

| File | Lines | Change |
|---|---|---|
| moriio_connector.py | +32/-3 | Fix 1 (kv_params cache) + Fix 2 (block_size) |
| moriio_common.py | +10/-4 | Fix 3 (default ports) |

Hardware / Configuration

  • AMD MI300X, 8 GPUs per node, TP=8
  • 1 Prefill / 1 Decode disaggregation
  • RDMA KV transfer via MORI-IO in READ mode
  • Kubernetes LLM-D with pd-sidecar

Signed-off-by: Rav Gupta [email protected]

mergify bot added the bug and kv-connector labels Mar 20, 2026

gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request introduces three important fixes for the MoRIIOConnector to improve its stability and compatibility in disaggregated inference scenarios. The changes are well-targeted and address clear issues: caching kv_transfer_params to prevent race conditions, handling block_size overrides from attention backends gracefully, and providing default values for port configurations to avoid KeyError exceptions.

The implementation of these fixes is mostly solid. However, I've identified a potential issue with the use of pop() when retrieving cached kv_transfer_params. If a request happens to be in both the receive and save queues, this could lead to using stale data, reintroducing the bug this PR aims to fix. I've left specific comments with a suggestion to use get() instead for a more robust solution.
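The reviewer's concern can be illustrated with a small sketch. `collect` is a hypothetical helper that looks up cached params once per loop for a request that appears in both the receive and save lists. With `pop()`, the save loop misses the cache and would have to fall back to the Request's live, possibly mutated params; with `get()`, both loops see the snapshot, and removal is deferred until after both loops.

```python
from typing import Any, Optional


def collect(
    cache: dict[str, dict[str, Any]],
    recv_ids: list[str],
    save_ids: list[str],
    use_pop: bool,
) -> dict[str, list[Optional[dict[str, Any]]]]:
    """Hypothetical sketch: record what each loop sees for every request."""
    take = cache.pop if use_pop else cache.get
    seen: dict[str, list[Optional[dict[str, Any]]]] = {}
    for req_id in recv_ids + save_ids:  # recv loop first, then save loop
        seen.setdefault(req_id, []).append(take(req_id, None))
    return seen


params = {"remote_block_ids": [5], "remote_engine_id": "prefill-0"}
with_pop = collect({"r1": dict(params)}, ["r1"], ["r1"], use_pop=True)
with_get = collect({"r1": dict(params)}, ["r1"], ["r1"], use_pop=False)
print(with_pop["r1"][1])  # None: the save loop finds no snapshot
print(with_get["r1"][1] is not None)  # True
```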

Overall, these are valuable fixes, and with the suggested adjustment, the connector will be much more resilient.

Three fixes to make MoRIIOConnector work reliably in Prefill/Decode
disaggregation (both sidecar-based K8s/LLM-D and Slurm deployments):

1. Cache kv_transfer_params at allocation time: The Request object's
   kv_transfer_params can be mutated between scheduler steps, causing
   KeyError on remote_block_ids/remote_engine_id when
   build_connector_meta runs. Snapshot params in _req_kv_params dict
   during update_state_after_alloc and use the cached copy in
   build_connector_meta.

2. Handle attention backend block_size overrides: FlashMLA (used by
   DeepSeek-V3 and other MLA models) silently overrides block_size
   to 64 regardless of the --block-size CLI flag. The previous
   assertion compared the actual KV cache tensor shape against the
   CLI config value, causing a crash. Now trusts the tensor shape
   as ground truth and logs a warning.

3. Use defaults for MORI-IO-specific kv_transfer_params fields:
   remote_handshake_port and remote_notify_port now fall back to
   MoRIIOConstants defaults when not present in kv_transfer_params.
   This allows the existing NIXL sidecar to work with MoRIIOConnector
   without requiring a dedicated MORI-IO sidecar connector.

Signed-off-by: Rav Gupta <[email protected]>
Made-with: Cursor
raviguptaamd force-pushed the fix/moriio-connector-pd-disagg branch from 8dd11a3 to af364cd March 20, 2026 22:02
raviguptaamd marked this pull request as draft March 21, 2026 00:43