Here is the development roadmap for v0.6.0. Contributions and feedback are welcome.
P1: Critical Path Items
In-place Update Enhancement
- Warm-up stage: Introduce pre-pulled images and pre-warmed models to reduce service downtime (Owner: @Syspretor)
- In-place recreate: Fallback mechanism when standard in-place updates fail
- Resource reservation: Ensure deterministic scheduling during non-in-place updates
- Redundant capacity: Warm up spare capacity to accelerate MaxSurge readiness and improve GPU utilization
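
The in-place recreate item above describes a fallback path. As a rough illustration only (this is not the project's controller code, and `update_with_fallback`, `UpdateResult`, and the two callables are hypothetical names), the control flow could be sketched as:

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class UpdateResult:
    strategy: str    # which path was finally taken
    succeeded: bool  # whether that path reported success


def update_with_fallback(
    in_place_update: Callable[[], bool],
    in_place_recreate: Callable[[], bool],
) -> UpdateResult:
    """Try a standard in-place update; fall back to in-place recreate on failure.

    Both callables are hypothetical stand-ins that return True on success.
    """
    try:
        if in_place_update():
            return UpdateResult("in-place-update", True)
    except Exception:
        # An exception from the first attempt is treated like a failed update.
        pass
    # Fallback path: errors raised here are not caught and reach the caller.
    return UpdateResult("in-place-recreate", in_place_recreate())
```

The sketch assumes failure is reported as a `False` return value or an exception; a real controller would more likely react to pod status conditions.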
Coordinated Update Improvements
- Clarify trigger conditions
- State machine tracking
- Dependency configuration interactions
Workload Management
- InstanceSet stateful mode: Enable as default workload with Stateful/LWS compatibility
- Template optimization: Reduce duplication via templateRef
Documentation
- Mooncake Deployment Guide
- End-to-End Upgrade Practices
- InstanceSet Deployment Procedures (single-node/multi-node)
- Coordinated Update Specifications
Coordination
- Coordinated Scaling: Scale specific roles by defined ratios during scaling events
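
To make ratio-based scaling concrete, here is a minimal sketch of the arithmetic, assuming ratios are given as positive integers per role and that one "anchor" role drives the scaling event. The function name, the role names, and the round-up policy are illustrative assumptions, not the planned API.

```python
def scale_by_ratio(ratios: dict, anchor_role: str, anchor_replicas: int) -> dict:
    """Derive replica counts for every role from one anchor role's target.

    ratios maps role name -> integer weight, e.g. {"prefill": 1, "decode": 2}.
    Counts are rounded up so no role falls below its share of the ratio.
    """
    base = ratios[anchor_role]
    if base <= 0:
        raise ValueError("anchor role must have a positive ratio")
    # Integer ceiling division avoids floating-point rounding surprises.
    return {
        role: (anchor_replicas * weight + base - 1) // base
        for role, weight in ratios.items()
    }


if __name__ == "__main__":
    print(scale_by_ratio({"prefill": 1, "decode": 2}, "prefill", 3))
```

Rounding up is one possible policy; rounding down or rejecting targets that do not divide evenly would be equally valid choices.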
Schedule
- Flexible Topology Scheduling: Multi-level scheduling with hard/soft constraints and weighted preferences
- Multi-level Gang Scheduling: Co-scheduling for dependent pod groups
- Coordinated Scheduling: Enforce affinity/anti-affinity policies between coordinated roles
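
The core property of gang scheduling is all-or-nothing placement: either every pod in a dependent group gets a node, or none does. The sketch below illustrates that property with a simple greedy first-fit over free GPU counts. It is a hypothetical toy model (the function name and inputs are assumptions), not the scheduler design proposed here, and greedy first-fit can miss a feasible placement that a smarter search would find.

```python
from typing import Dict, Optional


def gang_schedule(
    free_gpus: Dict[str, int], pod_requests: Dict[str, int]
) -> Optional[Dict[str, str]]:
    """Place every pod in the group, or return None if any pod cannot fit."""
    remaining = dict(free_gpus)  # work on a copy; the caller's view is untouched
    placement: Dict[str, str] = {}
    # Place the largest requests first to reduce fragmentation.
    for pod, need in sorted(pod_requests.items(), key=lambda kv: -kv[1]):
        node = next((n for n, free in remaining.items() if free >= need), None)
        if node is None:
            return None  # all-or-nothing: discard the partial placement
        remaining[node] -= need
        placement[pod] = node
    return placement
```

Because capacity is only deducted from a local copy, a failed attempt leaves no partial reservation behind.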
RoleBasedGroupSet
- RBGS-level RollingUpdate implementation
- State machine refinement and status reporting
CLI (rbgctl) Enhancement
- SLA-driven configuration: Integrate Dynamo AIConfigurator for initial RBG recommendations
- Lifecycle management improvements