feat: add submodule of worker-vllm, updated fastapi endpoints #6
velaraptor-runpod wants to merge 11 commits into main
TimPietruskyRunPod left a comment:
A few items to address before merge — mostly small fixes. The core rewrite using vLLM's native serving classes is solid.
handler_lb.py (outdated)
```python
):
    from vllm.entrypoints.openai.protocol import ResponsesResponse
    from vllm.entrypoints.openai.engine.protocol import ErrorResponse
```
`retrieve_responses` and `cancel_responses` import `ResponsesResponse` from `vllm.entrypoints.openai.protocol`, but `create_responses` imports it from `vllm.entrypoints.openai.responses.protocol`. These should be consistent. Which is the correct path in vLLM 0.16.0?
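One way to keep the three handlers consistent while the correct path is being confirmed is to resolve the import in a single place. The sketch below uses a hypothetical helper (`import_first` is not part of vLLM) that tries candidate module paths in order and returns the first matching attribute. It does not answer which path vLLM 0.16.0 actually uses.

```python
import importlib
from typing import Any, Iterable


def import_first(module_paths: Iterable[str], name: str) -> Any:
    """Return `name` from the first module in `module_paths` that provides it.

    Hypothetical helper: lets every handler share one import decision
    instead of each hard-coding its own module path.
    """
    errors = []
    for path in module_paths:
        try:
            module = importlib.import_module(path)
        except ImportError as exc:  # includes ModuleNotFoundError
            errors.append(f"{path}: {exc}")
            continue
        if hasattr(module, name):
            return getattr(module, name)
        errors.append(f"{path}: no attribute {name!r}")
    raise ImportError(f"could not import {name!r}; tried: " + "; ".join(errors))
```

In `handler_lb.py` this would be called once at module level, e.g. `ResponsesResponse = import_first(["vllm.entrypoints.openai.responses.protocol", "vllm.entrypoints.openai.protocol"], "ResponsesResponse")`, with the order reflecting whichever path is confirmed. Once the path is known, a single plain import in all three handlers is simpler still.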
```python
if not body.get("stream"):
    return JSONResponse(response.model_dump())
```
The chat and completion handlers check body.get("stream") on the raw dict to decide streaming vs non-streaming. The responses and messages handlers instead check the response type (isinstance). Consider making these consistent — checking the response type is more robust since it follows what vLLM actually returned.
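A minimal sketch of type-based dispatch, assuming (as the responses and messages handlers already do) that a vLLM serving call returns an error model, an async iterator of stream chunks, or a complete response model. `response_kind` is a hypothetical helper; in the real handler `error_type` would be vLLM's `ErrorResponse`. The FastAPI wiring is described after the block so the sketch stays dependency-free.

```python
from collections.abc import AsyncIterator
from typing import Any


def response_kind(response: Any, error_type: type) -> str:
    """Classify what a serving call returned, ignoring the request's "stream" flag.

    Returns "error", "stream", or "json".
    """
    # Check for errors first: an error model is never streamed.
    if isinstance(response, error_type):
        return "error"
    # Async generators satisfy the AsyncIterator ABC (they define __aiter__/__anext__).
    if isinstance(response, AsyncIterator):
        return "stream"
    return "json"
```

Each handler would then map `"stream"` to a `StreamingResponse` and `"json"` or `"error"` to `JSONResponse(response.model_dump())`. A request that sets `"stream": true` but gets back an error model is then still returned as JSON rather than being treated as a stream.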
.runpod/tests_json (outdated)
```
@@ -0,0 +1,64 @@
{
```
This file is named `tests_json`; should it be `tests.json`?
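A small pre-merge check could catch this kind of misnaming. The sketch below is hypothetical: the required file names are an assumption based on the files mentioned in this review (`hub.json`, `tests.json`), and it only verifies that each file exists under `.runpod/` and parses as JSON.

```python
import json
from pathlib import Path


def check_runpod_files(repo_root: str, required=("hub.json", "tests.json")) -> list:
    """Return a list of problems with the .runpod config files; empty means OK."""
    problems = []
    runpod_dir = Path(repo_root) / ".runpod"
    for name in required:
        path = runpod_dir / name
        if not path.is_file():
            problems.append(f"missing {path}")
            continue
        try:
            json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError as exc:
            problems.append(f"invalid JSON in {path}: {exc}")
    return problems
```

With the current branch, this would report `tests.json` as missing because the file on disk is `tests_json`.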
```bash
curl -X POST "https://your-endpoint-id.api.runpod.ai/v1/completions" \
  -H "Authorization: Bearer YOUR_RUNPOD_API_KEY" \

curl -X POST "https://<endpoint-id>.api.runpod.ai/v1/chat/completions" \
  -H "Authorization: Bearer $RUNPOD_API_KEY" \
```
Is this a real endpoint ID? If so it should probably be replaced with a placeholder like <endpoint-id> to match the examples above. If there's a reason to keep it (e.g. a public demo endpoint), happy to leave it.
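Scripts and docs can avoid hard-coding an endpoint ID by reading it from the environment, the same way the examples already read `$RUNPOD_API_KEY`. A minimal Python sketch: `RUNPOD_ENDPOINT_ID` is an assumed variable name, and the URL pattern is the one used in the curl examples.

```python
import os
from typing import Dict, Optional


def chat_completions_url(endpoint_id: Optional[str] = None) -> str:
    """Build the OpenAI-compatible chat URL for a RunPod endpoint.

    RUNPOD_ENDPOINT_ID is a hypothetical variable name used for illustration.
    """
    endpoint_id = endpoint_id or os.environ.get("RUNPOD_ENDPOINT_ID")
    if not endpoint_id:
        raise ValueError("set RUNPOD_ENDPOINT_ID or pass endpoint_id")
    return f"https://{endpoint_id}.api.runpod.ai/v1/chat/completions"


def auth_headers() -> Dict[str, str]:
    """Bearer-token header built from $RUNPOD_API_KEY, as in the curl examples."""
    return {"Authorization": f"Bearer {os.environ['RUNPOD_API_KEY']}"}
```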
````markdown
### Core (from worker-vllm)

| Variable | Required | Description | Default |
|----------|----------|-------------|---------|
```

| Path | Method | Description |
|------|--------|-------------|
````
The `MAX_CONCURRENCY` default here is 300, but `hub.json` sets it to 10, with a note about keeping it lower for load balancing. These should be consistent.
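One way to keep the value from drifting is to define the default once and have the worker, README, and `hub.json` all refer to it. The sketch below is hypothetical: it assumes the worker reads `MAX_CONCURRENCY` from the environment, and it uses 10 only because that is what `hub.json` currently sets. It does not decide which default is correct.

```python
import os

# Hypothetical single source of truth; 10 mirrors hub.json as described in this review.
DEFAULT_MAX_CONCURRENCY = 10


def max_concurrency() -> int:
    """Read MAX_CONCURRENCY from the environment, falling back to the shared default."""
    raw = os.environ.get("MAX_CONCURRENCY", "").strip()
    if not raw:
        return DEFAULT_MAX_CONCURRENCY
    value = int(raw)  # raises ValueError on non-numeric input
    if value < 1:
        raise ValueError(f"MAX_CONCURRENCY must be >= 1, got {value}")
    return value
```

A docs or `hub.json` generation step could then import `DEFAULT_MAX_CONCURRENCY` instead of repeating the literal in two places.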
No description provided.