DEP: Add Encoder autoscaling support to Dynamo Planner for EPD deployments #8261

@yao531441

Area

planner

Summary

Extend Dynamo Planner to support autoscaling of Encoder workers in multimodal E/P/D (Encoder/Prefill/Decode) disaggregated deployments. Currently, Planner autoscales only Prefill and Decode workers, so Encoder capacity for vision-encoding workloads must be managed manually.

Motivation

In multimodal LLM deployments with E/P/D disaggregation (e.g., vLLM EPD for Qwen-VL), the Encoder component handles vision encoding and can become a bottleneck under heavy image load. Currently:

  • Planner supports 4 modes: disagg, prefill, decode, agg
  • Encoder workers run with static replica counts
  • Image-heavy request bursts cause Encoder queue buildup without automatic mitigation
  • Operators must manually scale Encoder alongside automated P/D scaling

Related work:

This DEP proposes adding Encoder autoscaling to complete the EPD autoscaling story.

Proposal

Add Encoder autoscaling capability to Planner through a phased approach:

Phase 1: Add encoder mode to Planner

  • New SubComponentType.ENCODER alongside PREFILL/DECODE
  • New EncoderPlanner class for throughput-based autoscaling
  • Support for Multi-Pool deployment with GlobalPlanner budget coordination
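As a rough illustration of the Phase 1 items above, the sketch below shows what a throughput-based sizing rule for Encoder workers might look like. Only the names SubComponentType, ENCODER, PREFILL, DECODE, and EncoderPlanner come from this DEP; the enum string values, the fields, the `desired_replicas` method, and the assumption that per-replica throughput comes from profiling are all hypothetical and not taken from the existing Planner code.

```python
import math
from dataclasses import dataclass
from enum import Enum


class SubComponentType(Enum):
    # PREFILL and DECODE are named in this DEP; the string values are assumptions.
    PREFILL = "prefill"
    DECODE = "decode"
    ENCODER = "encoder"  # proposed addition


@dataclass
class EncoderPlanner:
    """Hypothetical throughput-based planner for Encoder workers."""

    # Assumed to come from offline profiling of a single Encoder replica.
    images_per_sec_per_replica: float
    min_replicas: int = 1
    max_replicas: int = 8

    def __post_init__(self) -> None:
        if self.images_per_sec_per_replica <= 0:
            raise ValueError("images_per_sec_per_replica must be positive")

    def desired_replicas(self, predicted_images_per_sec: float) -> int:
        # Replicas needed to serve the predicted load, rounded up,
        # then clamped to the configured bounds.
        needed = math.ceil(predicted_images_per_sec / self.images_per_sec_per_replica)
        return max(self.min_replicas, min(self.max_replicas, needed))


planner = EncoderPlanner(images_per_sec_per_replica=50.0)
replicas = planner.desired_replicas(predicted_images_per_sec=120.0)
```

In a Multi-Pool deployment, the value returned here would presumably be a request submitted to GlobalPlanner rather than a final replica count, since the GPU budget is shared with Prefill and Decode.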

Phase 2: Support heterogeneous encoders (XPU/CPU)

  • Independent autoscaling for XPU and CPU encoder pools
  • Device-aware profiling and capacity planning
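One way to read "independent autoscaling" with device-aware profiling is that each encoder pool is sized from its own observed load and its own profiled throughput. The sketch below illustrates that reading only; the throughput figures, the `plan_encoder_pools` function, and the idea that load is already attributed per device are assumptions made for illustration, not part of this DEP.

```python
import math

# Hypothetical per-device profiling results (images/sec per replica).
PROFILED_IMAGES_PER_SEC = {"xpu": 40.0, "cpu": 5.0}


def plan_encoder_pools(load_by_device: dict[str, float]) -> dict[str, int]:
    """Size each encoder pool independently from its own load and profile."""
    plan = {}
    for device, images_per_sec in load_by_device.items():
        per_replica = PROFILED_IMAGES_PER_SEC[device]
        # Keep at least one replica per pool so the pool stays reachable.
        plan[device] = max(1, math.ceil(images_per_sec / per_replica))
    return plan


pool_plan = plan_encoder_pools({"xpu": 100.0, "cpu": 12.0})
```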

Alternate Solutions

Instead of a phased approach with a separate encoder mode, we could add a unified epd mode that extends DisaggPlanner to coordinate Encoder, Prefill, and Decode within a single Planner instance.

Why not chosen for Phase 1:

  • Requires modifying core disagg.py logic, increasing risk to existing P/D autoscaling
  • Higher implementation complexity (3-way GPU budget coordination vs single component)
  • Tight coupling with the Single DGD deployment pattern, which makes it less flexible for heterogeneous encoder scenarios (XPU/CPU pools)

Requirements

  • MUST support throughput-based Encoder autoscaling based on image/request load
  • MUST respect total GPU budget across E/P/D components
  • MUST maintain backward compatibility with existing Planner modes
  • SHOULD support heterogeneous encoder autoscaling (XPU/CPU)
  • SHOULD integrate with existing GlobalPlanner for budget coordination
  • MAY support unified epd mode for Single DGD deployments
  • MAY support load-based (FPM-driven) rapid scaling
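The GPU budget requirement can be illustrated with a minimal sketch that scales desired replica counts down proportionally when they exceed the total budget. Proportional scaling is only one possible policy and is not stated in this DEP; the `apply_gpu_budget` function and its parameters are hypothetical.

```python
import math


def apply_gpu_budget(
    desired: dict[str, int],
    gpus_per_replica: dict[str, int],
    total_gpu_budget: int,
) -> dict[str, int]:
    """Scale replica counts down proportionally if they exceed the budget."""
    required = sum(desired[c] * gpus_per_replica[c] for c in desired)
    if required <= total_gpu_budget:
        return dict(desired)
    scale = total_gpu_budget / required
    # Floor keeps the result within budget; the minimum of one replica per
    # component can still exceed a very small budget, which a real
    # implementation would have to handle explicitly.
    return {c: max(1, math.floor(n * scale)) for c, n in desired.items()}


adjusted = apply_gpu_budget(
    desired={"encoder": 4, "prefill": 4, "decode": 8},
    gpus_per_replica={"encoder": 1, "prefill": 2, "decode": 2},
    total_gpu_budget=14,
)
```

Here the desired counts need 28 GPUs against a budget of 14, so every component is halved.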

Labels: dep:draft (DEP in draft status)
