[BugFix] Fix MoRIIOConnector for disaggregated P/D inference by raviguptaamd · Pull Request #37716 · vllm-project/vllm

raviguptaamd · 2026-03-20T21:41:38Z

Summary

Three targeted fixes to `MoRIIOConnector` that resolve crashes and improve compatibility in Prefill/Decode disaggregated inference. All changes are scoped to the MORI-IO code path and do not affect the NIXL, shared-storage, or SGLang connectors.

Fix 1: Cache `kv_transfer_params` across scheduler steps

The `Request` object's `kv_transfer_params` can be mutated or cleared between the `update_state_after_alloc` and `build_connector_meta` scheduler steps. By the time `build_connector_meta` runs, fields like `remote_block_ids` and `remote_engine_id` may be gone, causing a `KeyError` on every decode request.

Fix: Snapshot `kv_transfer_params` into a `_req_kv_params` dict at allocation time and use the cached copy when building connector metadata. Cleanup is performed after both the recv and save loops to avoid memory leaks.
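The snapshot-then-cleanup pattern described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code from this PR: `FakeRequest` and `ParamsCachingScheduler` are hypothetical stand-ins, and only the names `_req_kv_params`, `update_state_after_alloc`, `build_connector_meta`, `remote_block_ids`, and `remote_engine_id` come from the description.

```python
import copy
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class FakeRequest:
    """Hypothetical stand-in for the scheduler's Request object."""
    request_id: str
    kv_transfer_params: Optional[dict[str, Any]] = None


class ParamsCachingScheduler:
    """Illustrative sketch only; not the connector's actual class."""

    def __init__(self) -> None:
        self._req_kv_params: dict[str, dict[str, Any]] = {}
        self._reqs_need_recv: list[str] = []

    def update_state_after_alloc(self, request: FakeRequest) -> None:
        if not request.kv_transfer_params:
            return
        # Deep-copy so later mutation of the request cannot reach the snapshot.
        self._req_kv_params[request.request_id] = copy.deepcopy(
            request.kv_transfer_params
        )
        self._reqs_need_recv.append(request.request_id)

    def build_connector_meta(self) -> dict[str, dict[str, Any]]:
        meta: dict[str, dict[str, Any]] = {}
        for req_id in self._reqs_need_recv:
            # Read from the cached copy, not from the live request.
            params = self._req_kv_params[req_id]
            meta[req_id] = {
                "remote_block_ids": params["remote_block_ids"],
                "remote_engine_id": params["remote_engine_id"],
            }
        # Drop snapshots only after the loop so the cache cannot grow unbounded.
        for req_id in self._reqs_need_recv:
            self._req_kv_params.pop(req_id, None)
        self._reqs_need_recv.clear()
        return meta
```

In this sketch, clearing `request.kv_transfer_params` after allocation no longer affects the metadata that `build_connector_meta` produces, because the lookup goes through the snapshot.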

Fix 2: Handle attention backend `block_size` overrides

For MLA models (e.g., DeepSeek-V3), the FlashMLA attention backend silently overrides `block_size` to 64 regardless of the `--block-size` CLI flag. The existing `assert block_size == self.block_size` compared the actual KV cache tensor shape against the CLI config value, causing an `AssertionError` during `register_kv_caches`.

Fix: Trust the actual tensor shape as ground truth. If it differs from the config, log an informational message and update `self.block_size` to match.
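A rough sketch of replacing the assertion with a reconcile step is shown below. The helper name `reconcile_block_size`, the `block_dim` parameter, and the example cache shape are assumptions made for illustration; the real connector reads the shape from the registered KV cache tensors, whose layout depends on the attention backend.

```python
import logging

logger = logging.getLogger(__name__)


def reconcile_block_size(
    configured: int, cache_shape: tuple[int, ...], block_dim: int
) -> int:
    """Return the block size implied by the KV cache shape.

    Hypothetical helper: `block_dim` is the index of the block-size axis,
    which is an assumption about the cache layout.
    """
    actual = cache_shape[block_dim]
    if actual != configured:
        # Previously this case was an assertion failure; here it is logged
        # and the tensor shape wins.
        logger.info(
            "KV cache block_size %d differs from configured %d; using %d",
            actual, configured, actual,
        )
    return actual


# Illustrative shape only: (num_blocks, block_size, head_dim).
block_size = reconcile_block_size(16, (1024, 64, 576), block_dim=1)
```

The caller would then assign the returned value to `self.block_size`, so that later block arithmetic agrees with the tensors that are actually transferred.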

Fix 3: Default values for MORI-IO-specific `kv_transfer_params` fields

`remote_handshake_port` and `remote_notify_port` used direct dict access (`kv_transfer_params["remote_handshake_port"]`) with no fallback. External coordinators (e.g., the NIXL sidecar in LLM-D) that don't inject these MORI-IO-specific fields would cause a `KeyError`.

Fix: Use `.get()` with `MoRIIOConstants.DEFAULT_HANDSHAKE_PORT` / `DEFAULT_NOTIFY_PORT` defaults, matching the pattern already used for `tp_size` and `remote_dp_size`.
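The fallback pattern can be illustrated with a small standalone sketch. `ExampleConstants` and `resolve_ports` are hypothetical names, and the port numbers are placeholders chosen for this example, not the values defined in `MoRIIOConstants`.

```python
from typing import Any


class ExampleConstants:
    """Hypothetical stand-in for MoRIIOConstants; values are placeholders."""
    DEFAULT_HANDSHAKE_PORT = 10001
    DEFAULT_NOTIFY_PORT = 10002


def resolve_ports(kv_transfer_params: dict[str, Any]) -> tuple[int, int]:
    """Read the MORI-IO ports, falling back to defaults when absent."""
    handshake = kv_transfer_params.get(
        "remote_handshake_port", ExampleConstants.DEFAULT_HANDSHAKE_PORT
    )
    notify = kv_transfer_params.get(
        "remote_notify_port", ExampleConstants.DEFAULT_NOTIFY_PORT
    )
    return int(handshake), int(notify)


# A coordinator that injects neither field no longer triggers a KeyError.
ports_from_sidecar = resolve_ports({"remote_engine_id": "e0"})
```

One trade-off of this design is that a misconfigured coordinator now fails later (at connection time) rather than immediately with a `KeyError`, so the defaults must match what the remote side actually listens on.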

Files Changed

| File | Lines | Change |
| --- | --- | --- |
| `moriio_connector.py` | +32/-3 | Fix 1 (kv_params cache) + Fix 2 (block_size) |
| `moriio_common.py` | +10/-4 | Fix 3 (default ports) |

Hardware / Configuration

AMD MI300X, 8 GPUs per node, TP=8
1 Prefill / 1 Decode disaggregation
RDMA KV transfer via MORI-IO in READ mode
Kubernetes LLM-D with pd-sidecar

Signed-off-by: Rav Gupta [email protected]

gemini-code-assist

Code Review

This pull request introduces three important fixes for the MoRIIOConnector to improve its stability and compatibility in disaggregated inference scenarios. The changes are well-targeted and address clear issues: caching kv_transfer_params to prevent race conditions, handling block_size overrides from attention backends gracefully, and providing default values for port configurations to avoid KeyError exceptions.

The implementation of these fixes is mostly solid. However, I've identified a potential issue with the use of pop() when retrieving cached kv_transfer_params. If a request happens to be in both the receive and save queues, this could lead to using stale data, reintroducing the bug this PR aims to fix. I've left specific comments with a suggestion to use get() instead for a more robust solution.

Overall, these are valuable fixes, and with the suggested adjustment, the connector will be much more resilient.
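One reading of the `pop()` concern, sketched under assumptions: if the same request ID appears in both the receive and save queues, the first `pop()` removes the snapshot and the second lookup misses, forcing a fallback to whatever the live request holds. The function names and queue arguments below are hypothetical and exist only to contrast the two lookup styles.

```python
from typing import Any, Optional

Params = dict[str, Any]
Lookup = tuple[list[Optional[Params]], list[Optional[Params]]]


def lookup_with_pop(
    cache: dict[str, Params], recv_ids: list[str], save_ids: list[str]
) -> Lookup:
    """Each lookup removes the entry, so a repeated ID misses the second time."""
    recv = [cache.pop(r, None) for r in recv_ids]
    save = [cache.pop(r, None) for r in save_ids]
    return recv, save


def lookup_with_get(
    cache: dict[str, Params], recv_ids: list[str], save_ids: list[str]
) -> Lookup:
    """Read with get() in both loops, then clean up once at the end."""
    recv = [cache.get(r) for r in recv_ids]
    save = [cache.get(r) for r in save_ids]
    for r in set(recv_ids) | set(save_ids):
        cache.pop(r, None)
    return recv, save
```

With `get()` plus a single deferred cleanup, both loops see the same snapshot and the cache is still emptied afterwards.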

vllm/distributed/kv_transfer/kv_connector/v1/moriio/moriio_connector.py

Three fixes to make MoRIIOConnector work reliably in Prefill/Decode disaggregation (both sidecar-based K8s/LLM-D and Slurm deployments):

1. Cache kv_transfer_params at allocation time: The Request object's kv_transfer_params can be mutated between scheduler steps, causing KeyError on remote_block_ids/remote_engine_id when build_connector_meta runs. Snapshot params in _req_kv_params dict during update_state_after_alloc and use the cached copy in build_connector_meta.
2. Handle attention backend block_size overrides: FlashMLA (used by DeepSeek-V3 and other MLA models) silently overrides block_size to 64 regardless of the --block-size CLI flag. The previous assertion compared the actual KV cache tensor shape against the CLI config value, causing a crash. Now trusts the tensor shape as ground truth and logs a warning.
3. Use defaults for MORI-IO-specific kv_transfer_params fields: remote_handshake_port and remote_notify_port now fall back to MoRIIOConstants defaults when not present in kv_transfer_params. This allows the existing NIXL sidecar to work with MoRIIOConnector without requiring a dedicated MORI-IO sidecar connector.

Signed-off-by: Rav Gupta <[email protected]>
Made-with: Cursor

raviguptaamd requested review from ApostaC, NickLucche and orozery as code owners March 20, 2026 21:41

mergify bot added the bug and kv-connector labels Mar 20, 2026

gemini-code-assist bot reviewed Mar 20, 2026

raviguptaamd force-pushed the fix/moriio-connector-pd-disagg branch from 8dd11a3 to af364cd Compare March 20, 2026 22:02

raviguptaamd marked this pull request as draft March 21, 2026 00:43

raviguptaamd wants to merge 1 commit into vllm-project:main from raviguptaamd:fix/moriio-connector-pd-disagg
