Development Roadmap (v0.6.0) #112

@Syspretor

Here is the development roadmap for v0.6.0. Contributions and feedback are welcome.

P1: Critical Path Items

  • In-place Update Enhancement

    • Warm-up stage: Introduce pre-pulled images and pre-warmed models to reduce service downtime (Owner: @Syspretor)
    • In-place recreate: Fallback mechanism when standard in-place updates fail
    • Resource reservation: Ensure deterministic scheduling during non-in-place updates
    • Redundant capacity: Warm up spare capacity to speed up MaxSurge readiness and improve GPU utilization
  • Coordinated Update Improvements

    • Clarify trigger conditions
    • State machine tracking
    • Dependency configuration interactions
  • Workload Management

    • InstanceSet stateful mode: Enable as the default workload, with StatefulSet/LWS compatibility
    • Template optimization: Reduce duplication via templateRef
  • Documentation

    • Mooncake Deployment Guide
    • End-to-End Upgrade Practices
    • InstanceSet Deployment Procedures (single-node/multi-node)
    • Coordinated Update Specifications

Coordination

  • Coordinated Scaling: Scale specific roles by defined ratios during scaling events
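
To make the ratio idea concrete, here is a minimal sketch of how replica counts could be derived when one role is scaled and the others follow by ratio. The function name `scale_by_ratio`, the role names, and the round-up policy are all assumptions for illustration; none of them come from the RBG API or this roadmap.

```python
import math
from fractions import Fraction


def scale_by_ratio(ratios: dict[str, int], anchor_role: str, anchor_replicas: int) -> dict[str, int]:
    """Compute replicas per role so every role keeps its ratio to the anchor role.

    Hypothetical helper: `ratios` maps role name -> relative weight,
    e.g. {"prefill": 1, "decode": 2}. Fractional results are rounded up
    (an assumed policy) so no role ends up under-provisioned.
    """
    if anchor_role not in ratios:
        raise KeyError(f"unknown role: {anchor_role}")
    if any(weight <= 0 for weight in ratios.values()):
        raise ValueError("ratios must be positive integers")
    if anchor_replicas < 0:
        raise ValueError("anchor_replicas must be non-negative")

    base = ratios[anchor_role]
    # Fraction avoids floating-point error before rounding up.
    return {
        role: math.ceil(Fraction(anchor_replicas * weight, base))
        for role, weight in ratios.items()
    }


if __name__ == "__main__":
    # Scaling "prefill" to 3 with a 1:2 ratio gives 6 "decode" replicas.
    print(scale_by_ratio({"prefill": 1, "decode": 2}, "prefill", 3))
```

Rounding up is only one possible choice; a real implementation might round down or reject scale requests that do not divide evenly.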

Schedule

  • Flexible Topology Scheduling: Multi-level scheduling with hard/soft constraints and weighted preferences
  • Multi-level Gang Scheduling: Co-scheduling for dependent pod groups
  • Coordinated Scheduling: Enforce affinity/anti-affinity policies between coordinated roles
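
As an illustration of hard/soft constraints with weighted preferences, the sketch below filters candidate nodes by hard constraints and then ranks the survivors by the summed weights of the soft preferences they satisfy. The `Preference` type, the `pick_node` function, and the label keys are hypothetical and are not taken from RBG or from any scheduler's API.

```python
from dataclasses import dataclass
from typing import Callable, Optional

Labels = dict[str, str]                  # node labels, e.g. {"zone": "z1"}
Constraint = Callable[[Labels], bool]    # hard constraint: must hold


@dataclass
class Preference:
    """Soft constraint (hypothetical type): adds `weight` to the score when it matches."""
    weight: int
    matches: Constraint


def pick_node(nodes: dict[str, Labels],
              hard: list[Constraint],
              soft: list[Preference]) -> Optional[str]:
    """Return the best node name, or None if no node passes every hard constraint."""
    feasible = {name: labels for name, labels in nodes.items()
                if all(check(labels) for check in hard)}
    if not feasible:
        return None

    def score(name: str) -> int:
        return sum(p.weight for p in soft if p.matches(feasible[name]))

    # Iterating in sorted order makes ties resolve to the first name alphabetically.
    return max(sorted(feasible), key=score)


if __name__ == "__main__":
    nodes = {
        "node-a": {"zone": "z1", "rack": "r1", "gpu": "true"},
        "node-b": {"zone": "z1", "rack": "r2", "gpu": "true"},
        "node-c": {"zone": "z2", "rack": "r2", "gpu": "false"},
    }
    hard = [lambda labels: labels.get("gpu") == "true"]
    soft = [
        Preference(weight=10, matches=lambda labels: labels.get("rack") == "r2"),
        Preference(weight=1, matches=lambda labels: labels.get("zone") == "z1"),
    ]
    print(pick_node(nodes, hard, soft))
```

Separating the filter step from the scoring step keeps hard constraints non-negotiable while letting soft preferences trade off against each other through their weights.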

RoleBasedGroupSet

  • RBGS-level RollingUpdate implementation
  • State machine refinement and status reporting

CLI (rbgctl) Enhancement

  • SLA-driven configuration: Integrate Dynamo AIConfigurator for initial RBG recommendations
  • Lifecycle management improvements
