RFC: Adding Routable Taints, Tiering and Tainting Workers #5462
michaelfeil
started this conversation in
Ideas
A model like DeepSeek can be deployed in two ways, for example tuned for high throughput or tuned for low latency.
You might have two customers or use cases, one for each profile.
It would be great if both use cases could be served from the same endpoint instead of from multiple separate deployments.
In a complex world, it would be helpful to dynamically switch between deployments.
It would be great to have metadata, `taints`, associated with a deployment. A client could then send an HTTP request to Dynamo that states which taints it allows or prefers.

For this to happen, the user could, e.g., send an `X-Allowed-Taints: "High-Throughput,Low-Latency"` or `X-Preferred-Taints: "Low-Latency"` header.

As a result, the taint / preference is forwarded to the router. The request could be routed to a LOW_LATENCY tainted worker, using shared router/frontend components. In case the LOW_LATENCY workers show higher load (i.e., lower performance) than the HIGH_THROUGHPUT workers, we could additionally route requests to HIGH_THROUGHPUT workers.
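
To make the routing idea concrete, here is a minimal sketch of how a router might interpret the two headers. It is not Dynamo's API: `Worker`, `parse_taints`, `pick_worker`, and the `load` field (0.0 idle to 1.0 saturated) are hypothetical names made up for illustration, and the spill-over rule (leave the preferred pool as soon as a non-preferred allowed worker is less loaded) is only one possible policy.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Mapping, Optional, Sequence


@dataclass(frozen=True)
class Worker:
    """Hypothetical worker record; a real router would track richer metrics."""
    name: str
    taint: str   # e.g. "LOW_LATENCY" or "HIGH_THROUGHPUT"
    load: float  # assumed load metric: 0.0 (idle) .. 1.0 (saturated)


def parse_taints(header_value: Optional[str]) -> list[str]:
    """Turn 'High-Throughput,Low-Latency' into ['HIGH_THROUGHPUT', 'LOW_LATENCY']."""
    if not header_value:
        return []
    cleaned = header_value.strip().strip('"')
    return [
        part.strip().upper().replace("-", "_")
        for part in cleaned.split(",")
        if part.strip()
    ]


def pick_worker(workers: Sequence[Worker], headers: Mapping[str, str]) -> Optional[Worker]:
    """Pick a worker from the allowed taints, favouring preferred taints."""
    # HTTP header names are case-insensitive, so normalise the keys first.
    lowered = {key.lower(): value for key, value in headers.items()}
    allowed = parse_taints(lowered.get("x-allowed-taints"))
    preferred = parse_taints(lowered.get("x-preferred-taints"))

    # Assumption: no X-Allowed-Taints header means every taint is acceptable.
    candidates = [w for w in workers if not allowed or w.taint in allowed]
    if not candidates:
        return None

    preferred_pool = [w for w in candidates if w.taint in preferred]
    other_pool = [w for w in candidates if w.taint not in preferred]
    best_preferred = min(preferred_pool, key=lambda w: w.load, default=None)
    best_other = min(other_pool, key=lambda w: w.load, default=None)

    if best_preferred is None:
        return best_other
    if best_other is None:
        return best_preferred
    # Spill over when the preferred pool is more loaded than the alternative.
    return best_preferred if best_preferred.load <= best_other.load else best_other


if __name__ == "__main__":
    pool = [
        Worker("ll-0", "LOW_LATENCY", load=0.9),
        Worker("ht-0", "HIGH_THROUGHPUT", load=0.2),
    ]
    # The preferred LOW_LATENCY worker is busier, so the request spills over to ht-0.
    print(pick_worker(pool, {"X-Preferred-Taints": "Low-Latency"}))
    # A hard restriction keeps the request on the LOW_LATENCY worker ll-0.
    print(pick_worker(pool, {"X-Allowed-Taints": "Low-Latency"}))
```

In this sketch `X-Allowed-Taints` acts as a hard filter and `X-Preferred-Taints` as a soft hint, which matches the fallback behaviour described above. A production router would likely want a load margin or hysteresis before spilling over, so that requests do not flap between pools.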