[BugFix] Fix MoRIIOConnector for disaggregated P/D inference#37716
[BugFix] Fix MoRIIOConnector for disaggregated P/D inference#37716raviguptaamd wants to merge 1 commit intovllm-project:mainfrom
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces three important fixes for the MoRIIOConnector to improve its stability and compatibility in disaggregated inference scenarios. The changes are well-targeted and address clear issues: caching kv_transfer_params to prevent race conditions, handling block_size overrides from attention backends gracefully, and providing default values for port configurations to avoid KeyError exceptions.
The implementation of these fixes is mostly solid. However, I've identified a potential issue with the use of pop() when retrieving cached kv_transfer_params. If a request happens to be in both the receive and save queues, this could lead to using stale data, reintroducing the bug this PR aims to fix. I've left specific comments with a suggestion to use get() instead for a more robust solution.
Overall, these are valuable fixes, and with the suggested adjustment, the connector will be much more resilient.
vllm/distributed/kv_transfer/kv_connector/v1/moriio/moriio_connector.py
Outdated
Show resolved
Hide resolved
vllm/distributed/kv_transfer/kv_connector/v1/moriio/moriio_connector.py
Outdated
Show resolved
Hide resolved
Three fixes to make MoRIIOConnector work reliably in Prefill/Decode disaggregation (both sidecar-based K8s/LLM-D and Slurm deployments): 1. Cache kv_transfer_params at allocation time: The Request object's kv_transfer_params can be mutated between scheduler steps, causing KeyError on remote_block_ids/remote_engine_id when build_connector_meta runs. Snapshot params in _req_kv_params dict during update_state_after_alloc and use the cached copy in build_connector_meta. 2. Handle attention backend block_size overrides: FlashMLA (used by DeepSeek-V3 and other MLA models) silently overrides block_size to 64 regardless of the --block-size CLI flag. The previous assertion compared the actual KV cache tensor shape against the CLI config value, causing a crash. Now trusts the tensor shape as ground truth and logs a warning. 3. Use defaults for MORI-IO-specific kv_transfer_params fields: remote_handshake_port and remote_notify_port now fall back to MoRIIOConstants defaults when not present in kv_transfer_params. This allows the existing NIXL sidecar to work with MoRIIOConnector without requiring a dedicated MORI-IO sidecar connector. Signed-off-by: Rav Gupta <[email protected]> Made-with: Cursor
8dd11a3 to
af364cd
Compare
Summary
Three targeted fixes to
MoRIIOConnectorthat resolve crashes and improve compatibility in Prefill/Decode disaggregated inference. All changes are scoped to the MORI-IO code path only — zero impact on NIXL, shared-storage, or SGLang connectors.Fix 1: Cache
kv_transfer_paramsacross scheduler stepsThe
Requestobject'skv_transfer_paramscan be mutated or cleared between theupdate_state_after_allocandbuild_connector_metascheduler steps. By the timebuild_connector_metaruns, fields likeremote_block_idsandremote_engine_idmay be gone, causingKeyErroron every decode request.Fix: Snapshot
kv_transfer_paramsinto a_req_kv_paramsdict at allocation time and use the cached copy when building connector metadata. Cleanup is performed after both recv and save loops to avoid memory leaks.Fix 2: Handle attention backend
block_sizeoverridesFor MLA models (e.g., DeepSeek-V3), the FlashMLA attention backend silently overrides
block_sizeto 64 regardless of the--block-sizeCLI flag. The existingassert block_size == self.block_sizecompared the actual KV cache tensor shape against the CLI config value, causing anAssertionErrorduringregister_kv_caches.Fix: Trust the actual tensor shape as ground truth. If it differs from the config, log an informational message and update
self.block_sizeto match.Fix 3: Default values for MORI-IO-specific
kv_transfer_paramsfieldsremote_handshake_portandremote_notify_portused direct dict access (kv_transfer_params["remote_handshake_port"]) with no fallback. External coordinators (e.g., the NIXL sidecar in LLM-D) that don't inject these MORI-IO-specific fields would causeKeyError.Fix: Use
.get()withMoRIIOConstants.DEFAULT_HANDSHAKE_PORT/DEFAULT_NOTIFY_PORTdefaults, matching the pattern already used fortp_sizeandremote_dp_size.Files Changed
moriio_connector.pymoriio_common.pyHardware / Configuration
Signed-off-by: Rav Gupta [email protected]