DEP: Add Encoder autoscaling support to Dynamo Planner for EPD deployments (#8261)

### Area

planner

### Summary

Extend Dynamo Planner to support autoscaling of Encoder workers in multimodal E/P/D (Encoder/Prefill/Decode) disaggregated deployments. Currently, Planner autoscales only Prefill and Decode workers, so operators must manage Encoder capacity for vision-encoding workloads by hand.

### Motivation

In multimodal LLM deployments with E/P/D disaggregation (e.g., vLLM EPD for Qwen-VL), the Encoder component handles vision encoding and can become a bottleneck under heavy image load. Currently:

- Planner supports 4 modes: `disagg`, `prefill`, `decode`, `agg`
- Encoder workers run with static replica counts
- Image-heavy request bursts cause Encoder queue buildup without automatic mitigation
- Operators must manually scale Encoder alongside automated P/D scaling

Related work:
- PR #8161 added device-aware routing for EPD (CPU/XPU encoder selection)
- PR #7668 added per-request scheduling for multiple encoders
- DEP #7787 defined Worker Roles (ENCODE, PREFILL, DECODE) for topology readiness

This DEP proposes adding Encoder autoscaling to complete the EPD autoscaling story.

### Proposal

Add Encoder autoscaling capability to Planner through a phased approach:

### Phase 1: Add `encoder` mode to Planner

- New `SubComponentType.ENCODER` alongside PREFILL/DECODE
- New `EncoderPlanner` class for throughput-based autoscaling
- Support for Multi-Pool deployment with GlobalPlanner budget coordination
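The Phase 1 pieces above can be pictured with a minimal Python sketch. It assumes a simple throughput model in which each encoder replica sustains a profiled number of images per second; the enum values, `EncoderPlannerConfig`, its fields, and the `compute_replicas` method are hypothetical illustrations, not Planner's existing API.

```python
import math
from dataclasses import dataclass
from enum import Enum


class SubComponentType(str, Enum):
    # Hypothetical values: existing sub-components plus the proposed ENCODER member.
    PREFILL = "prefill"
    DECODE = "decode"
    ENCODER = "encoder"


@dataclass
class EncoderPlannerConfig:
    # Hypothetical knobs; names are illustrative only.
    images_per_sec_per_replica: float  # capacity of one replica, from offline profiling
    min_replicas: int = 1
    max_replicas: int = 8


class EncoderPlanner:
    """Hypothetical throughput-based planner for the encoder sub-component."""

    component = SubComponentType.ENCODER

    def __init__(self, config: EncoderPlannerConfig) -> None:
        self.config = config

    def compute_replicas(self, predicted_images_per_sec: float) -> int:
        # Replicas needed = ceil(predicted load / per-replica capacity),
        # then clamped to the configured [min_replicas, max_replicas] range.
        needed = math.ceil(predicted_images_per_sec / self.config.images_per_sec_per_replica)
        return max(self.config.min_replicas, min(self.config.max_replicas, needed))
```

For example, with a profiled capacity of 40 images/s per replica and a predicted load of 130 images/s, the sketch requests 4 replicas. Clamping keeps at least one encoder available and bounds the request; in a Multi-Pool deployment the result would still be subject to GlobalPlanner budget coordination.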

### Phase 2: Support heterogeneous encoders (XPU/CPU)

- Independent autoscaling for XPU and CPU encoder pools
- Device-aware profiling and capacity planning
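One way to read "independent autoscaling" for Phase 2 is that each device pool is sized from its own profiled capacity and from the share of load the router sends to it. The sketch below assumes that split is already known (for example, from device-aware routing of the kind added in PR #8161); `EncoderPool`, `plan_pools`, and the `load_share` input are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class EncoderPool:
    # Hypothetical per-device pool description; capacity comes from
    # device-specific profiling.
    device: str  # e.g. "xpu" or "cpu"
    images_per_sec_per_replica: float
    min_replicas: int
    max_replicas: int


def plan_pools(pools: list[EncoderPool], load_share: dict[str, float]) -> dict[str, int]:
    """Size each encoder pool independently from the load routed to its device.

    `load_share` maps device name -> predicted images/sec routed to that pool.
    """
    plan: dict[str, int] = {}
    for pool in pools:
        load = load_share.get(pool.device, 0.0)
        # Same ceil-and-clamp rule per pool, but with device-specific capacity and bounds.
        needed = math.ceil(load / pool.images_per_sec_per_replica)
        plan[pool.device] = max(pool.min_replicas, min(pool.max_replicas, needed))
    return plan
```

With an XPU pool profiled at 50 images/s per replica and a CPU pool at 5 images/s per replica, routing 180 and 22 images/s respectively yields `{"xpu": 4, "cpu": 5}`. Keeping the pools separate lets a slow CPU pool scale out without distorting the XPU pool's capacity estimate.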

### Alternate Solutions

Instead of a phased approach with a separate `encoder` mode, we could add a unified `epd` mode that extends `DisaggPlanner` to coordinate Encoder, Prefill, and Decode within a single Planner instance.

**Why not chosen for Phase 1:**
- Requires modifying core `disagg.py` logic, increasing the risk to existing P/D autoscaling
- Higher implementation complexity (three-way GPU budget coordination instead of scaling a single component)
- Couples tightly to the Single DGD deployment pattern, making it less flexible for heterogeneous encoder scenarios (XPU/CPU pools)


### Requirements

- **MUST** support throughput-based Encoder autoscaling driven by image/request load
- **MUST** respect total GPU budget across E/P/D components
- **MUST** maintain backward compatibility with existing Planner modes
- **SHOULD** support heterogeneous encoder autoscaling (XPU/CPU)
- **SHOULD** integrate with existing GlobalPlanner for budget coordination
- **MAY** support unified `epd` mode for Single DGD deployments
- **MAY** support load-based (FPM-driven) rapid scaling
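The GPU-budget requirement can be illustrated with a small helper that trims per-component replica requests until their combined GPU use fits a total budget. This is an assumed greedy strategy (repeatedly remove one replica from whichever component still holds the largest fraction of its request), not GlobalPlanner's actual algorithm; `fit_to_gpu_budget` and its parameters are hypothetical. Components with zero GPUs per replica, such as a CPU encoder pool in Phase 2, are never trimmed because that frees no GPU capacity.

```python
def fit_to_gpu_budget(
    requested: dict[str, int],
    gpus_per_replica: dict[str, int],
    total_gpus: int,
    min_replicas: int = 1,
) -> dict[str, int]:
    """Hypothetical helper: trim E/P/D replica requests to fit a total GPU budget.

    Only scales down; never adds replicas beyond what was requested.
    """
    fitted = dict(requested)

    def gpus_used() -> int:
        return sum(fitted[c] * gpus_per_replica[c] for c in fitted)

    while gpus_used() > total_gpus:
        # Only components that consume GPUs and are above their floor can shrink.
        candidates = [
            c for c in fitted
            if fitted[c] > min_replicas and gpus_per_replica[c] > 0
        ]
        if not candidates:
            raise ValueError("GPU budget cannot be met without going below min_replicas")
        # Shrink the component that currently keeps the largest share of its request,
        # so reductions are spread roughly proportionally across E, P, and D.
        victim = max(candidates, key=lambda c: fitted[c] / requested[c])
        fitted[victim] -= 1
    return fitted
```

With requests of 4 encoder replicas (1 GPU each), 4 prefill replicas (2 GPUs each), and 8 decode replicas (1 GPU each) against a 16-GPU budget, the sketch returns `{"encoder": 3, "prefill": 3, "decode": 7}`. Failing loudly when the floors cannot be met keeps the budget a hard constraint rather than silently overcommitting.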

### References

- DEP #7787: Disaggregated Topology Readiness
- PR #8161: Device-aware routing for vLLM EPD
- PR #7668: Per-request scheduler for dual/multiple encoders
- PR #7215: Route requests by device type and load for SGLang EPD
- EPD examples: `examples/backends/vllm/launch/disagg_multimodal_e_pd.sh`
