Open
Labels
needs-triage: Indicates an issue or PR lacks a `triage/foo` label and requires one.
Description
Hello maintainers,
I'd like to propose implementing a new intra-flow dispatch policy that prioritizes requests based on their deadline urgency (i.e., SLO-driven scheduling).
Motivation
In production LLM serving, different requests often have different latency requirements:
- Interactive user queries may require sub-second responses.
The current FCFS policy treats all requests equally, which can cause high-priority requests to be delayed by long-running or low-priority ones in the same flow (e.g., same model). A deadline-aware policy would improve SLO compliance and user experience.
Proposed Design
- Policy Name: `DeadlinePriority`
- Mechanism:
  - Compute absolute deadline as `EnqueueTime() + EffectiveTTL()`
  - Prioritize requests with earlier absolute deadlines
  - Use FCFS as tie-breaker for requests with identical deadlines
- Queue Requirement: `CapabilityPriorityConfigurable` (e.g., heap-based priority queue)
- Backward Compatibility: Requests without TTL are treated as lowest priority but still scheduled fairly via FCFS.
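To make the ordering concrete, here is a rough, self-contained sketch of the comparator I have in mind, backed by Go's standard `container/heap`. It does not use the project's actual queue or policy interfaces: the `request` type, the `deadline` and `less` helpers, `deadlineQueue`, and `exampleOrder` are all hypothetical names for illustration, and treating a TTL of zero as "no TTL" is an assumption on my part. Only the `EnqueueTime()` and `EffectiveTTL()` accessor names come from the proposal above.

```go
package main

import (
	"container/heap"
	"fmt"
	"time"
)

// request is a hypothetical stand-in for a queued item. Only the two
// accessors named in the proposal are modeled.
type request struct {
	id         string
	enqueuedAt time.Time
	ttl        time.Duration // assumption: 0 means "no TTL"
}

func (r *request) EnqueueTime() time.Time      { return r.enqueuedAt }
func (r *request) EffectiveTTL() time.Duration { return r.ttl }

// deadline returns EnqueueTime() + EffectiveTTL() and whether a deadline exists.
func deadline(r *request) (time.Time, bool) {
	if r.EffectiveTTL() <= 0 {
		return time.Time{}, false
	}
	return r.EnqueueTime().Add(r.EffectiveTTL()), true
}

// less reports whether a should be dispatched before b.
func less(a, b *request) bool {
	da, okA := deadline(a)
	db, okB := deadline(b)
	switch {
	case okA && !okB:
		return true // requests with a deadline go ahead of those without
	case !okA && okB:
		return false
	case okA && okB && !da.Equal(db):
		return da.Before(db) // earlier absolute deadline first
	}
	// Identical deadlines, or neither has a TTL: fall back to FCFS.
	return a.EnqueueTime().Before(b.EnqueueTime())
}

// deadlineQueue implements heap.Interface using less as the ordering.
type deadlineQueue []*request

func (q deadlineQueue) Len() int           { return len(q) }
func (q deadlineQueue) Less(i, j int) bool { return less(q[i], q[j]) }
func (q deadlineQueue) Swap(i, j int)      { q[i], q[j] = q[j], q[i] }
func (q *deadlineQueue) Push(x any)        { *q = append(*q, x.(*request)) }
func (q *deadlineQueue) Pop() any {
	old := *q
	n := len(old)
	r := old[n-1]
	old[n-1] = nil // drop the reference so it can be garbage collected
	*q = old[:n-1]
	return r
}

// exampleOrder enqueues four requests and returns their dispatch order.
func exampleOrder() []string {
	t0 := time.Unix(0, 0)
	q := &deadlineQueue{}
	heap.Push(q, &request{id: "no-ttl", enqueuedAt: t0})
	heap.Push(q, &request{id: "batch", enqueuedAt: t0, ttl: 30 * time.Second})
	// chat-1 and chat-2 share the same absolute deadline (t0 + 3s).
	heap.Push(q, &request{id: "chat-2", enqueuedAt: t0.Add(2 * time.Second), ttl: 1 * time.Second})
	heap.Push(q, &request{id: "chat-1", enqueuedAt: t0.Add(1 * time.Second), ttl: 2 * time.Second})

	var order []string
	for q.Len() > 0 {
		order = append(order, heap.Pop(q).(*request).id)
	}
	return order
}

func main() {
	fmt.Println(exampleOrder()) // prints [chat-1 chat-2 batch no-ttl]
}
```

In this sketch the two chat requests tie on deadline and are ordered by enqueue time, the batch request follows, and the request without a TTL is dispatched last. A real implementation would plug the same comparison into whatever comparator hook the priority-configurable queue exposes.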
Benefits
- Enables per-request SLO enforcement
- Improves tail latency for time-sensitive workloads
- Fully leverages existing `EffectiveTTL` and `EnqueueTime` metadata
I’m happy to contribute an initial implementation if this aligns with the project’s direction. Please let me know your thoughts!
Thank you!