Area
planner
Summary
Extend Dynamo Planner to support autoscaling of Encoder workers in multimodal E/P/D (Encoder/Prefill/Decode) disaggregated deployments. Currently, Planner only supports Prefill and Decode autoscaling, requiring manual management of Encoder capacity for vision encoding workloads.
Motivation
In multimodal LLM deployments with E/P/D disaggregation (e.g., vLLM EPD for Qwen-VL), the Encoder component handles vision encoding and can become a bottleneck during high image load. Currently:
- Planner supports 4 modes: disagg, prefill, decode, agg
- Encoder workers run with static replica counts
- Image-heavy request bursts cause Encoder queue buildup without automatic mitigation
- Operators must manually scale Encoder alongside automated P/D scaling
Related work:
This DEP proposes adding Encoder autoscaling to complete the EPD autoscaling story.
Proposal
Add Encoder autoscaling capability to Planner through a phased approach:
Phase 1: Add encoder mode to Planner
- New SubComponentType.ENCODER alongside PREFILL/DECODE
- New EncoderPlanner class for throughput-based autoscaling
- Support for Multi-Pool deployment with GlobalPlanner budget coordination
Phase 2: Support heterogeneous encoders (XPU/CPU)
- Independent autoscaling for XPU and CPU encoder pools
- Device-aware profiling and capacity planning
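As a rough illustration of what throughput-based Encoder autoscaling in Phase 1 could look like, the sketch below sizes the encoder pool from a predicted image rate. Only the names SubComponentType, PREFILL, DECODE, ENCODER and EncoderPlanner come from this proposal. The enum string values, the EncoderPlannerSketch class, its fields, and the replica bounds are assumptions made for illustration and are not the existing Planner API.

```python
import math
from dataclasses import dataclass
from enum import Enum


class SubComponentType(Enum):
    # PREFILL and DECODE are named in the proposal; the string values are assumed.
    PREFILL = "prefill"
    DECODE = "decode"
    ENCODER = "encoder"  # proposed addition


@dataclass
class EncoderPlannerSketch:
    """Hypothetical stand-in for the proposed EncoderPlanner."""

    # Assumed to come from profiling a single encoder replica.
    images_per_sec_per_replica: float
    # Assumed scaling bounds; real limits would come from deployment config.
    min_replicas: int = 1
    max_replicas: int = 8

    def desired_replicas(self, predicted_images_per_sec: float) -> int:
        """Return the replica count needed to serve the predicted image rate."""
        if self.images_per_sec_per_replica <= 0:
            raise ValueError("images_per_sec_per_replica must be positive")
        needed = math.ceil(
            predicted_images_per_sec / self.images_per_sec_per_replica
        )
        # Clamp to the allowed range so the pool never scales to zero
        # and never exceeds the configured maximum.
        return max(self.min_replicas, min(self.max_replicas, needed))


planner = EncoderPlannerSketch(images_per_sec_per_replica=50.0)
replicas = planner.desired_replicas(120.0)
```

With the assumed per-replica capacity of 50 images/s, a predicted load of 120 images/s yields 3 replicas. How the load prediction is produced, and how the result is coordinated with GlobalPlanner, is left open here.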
Alternate Solutions
Instead of a phased approach with a separate encoder mode, we could add a unified epd mode that extends DisaggPlanner to coordinate Encoder, Prefill, and Decode within a single Planner instance.
Why not chosen for Phase 1:
- Requires modifying core disagg.py logic, increasing risk to existing P/D autoscaling
- Higher implementation complexity (3-way GPU budget coordination vs single component)
- Tight coupling with the Single DGD deployment pattern, which makes it less flexible for heterogeneous encoder scenarios (XPU/CPU pools)
Requirements
- MUST support throughput-based Encoder autoscaling based on image/request load
- MUST respect total GPU budget across E/P/D components
- MUST maintain backward compatibility with existing Planner modes
- SHOULD support heterogeneous encoder autoscaling (XPU/CPU)
- SHOULD integrate with existing GlobalPlanner for budget coordination
- MAY support unified epd mode for Single DGD deployments
- MAY support load-based (FPM-driven) rapid scaling
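The GPU budget requirement can be illustrated with a minimal sketch that trims per-component replica counts until the total fits the budget. The function name fit_to_gpu_budget, the component keys, and the "shrink the largest consumer first" policy are all assumptions for illustration; this DEP does not specify how GlobalPlanner arbitrates between components.

```python
from typing import Dict


def fit_to_gpu_budget(
    desired: Dict[str, int],
    gpus_per_replica: Dict[str, int],
    gpu_budget: int,
) -> Dict[str, int]:
    """Reduce replica counts until total GPU usage fits within gpu_budget.

    Hypothetical policy: repeatedly remove one replica from the component
    that currently uses the most GPUs, keeping at least one replica each.
    """
    replicas = dict(desired)

    def used() -> int:
        return sum(replicas[c] * gpus_per_replica[c] for c in replicas)

    while used() > gpu_budget:
        # Only components with more than one replica can be shrunk.
        candidates = [c for c in replicas if replicas[c] > 1]
        if not candidates:
            raise ValueError("GPU budget too small for one replica per component")
        victim = max(candidates, key=lambda c: replicas[c] * gpus_per_replica[c])
        replicas[victim] -= 1
    return replicas


plan = fit_to_gpu_budget(
    desired={"encoder": 3, "prefill": 2, "decode": 4},
    gpus_per_replica={"encoder": 1, "prefill": 2, "decode": 2},
    gpu_budget=12,
)
total_gpus = plan["encoder"] * 1 + plan["prefill"] * 2 + plan["decode"] * 2
```

In this example the desired plan needs 15 GPUs, so the sketch removes two decode replicas and ends at 11 GPUs. A real policy would more likely weigh SLA impact per component rather than raw GPU usage.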
References
examples/backends/vllm/launch/disagg_multimodal_e_pd.sh