fix(router): make sglang and vllm metrics port configurable#847
fix(router): make sglang and vllm metrics port configurable#847Tweakzx wants to merge 1 commit intovolcano-sh:mainfrom
Conversation
There was a problem hiding this comment.
Code Review
This pull request makes the metric ports for sglang and vllm engines configurable by updating their constructors to accept optional parameters and providing default values. While the changes include new unit tests, the reviewer pointed out that the feature is currently unreachable because the engines are initialized in a global registry without arguments, necessitating further refactoring. A suggestion was also made to add a test case for the zero-value fallback in the sglang engine to ensure parity with the vllm tests.
|
|
||
| func NewSglangEngine() *sglangEngine { | ||
| // TODO: Get MetricsPort from sglang configuration | ||
| func NewSglangEngine(metricPort ...uint32) *sglangEngine { |
There was a problem hiding this comment.
While this change makes the metric port configurable at the function level, the way this function is called in pkg/kthena-router/backend/backend.go prevents this configuration from being used. The engineRegistry is a global variable initialized with NewSglangEngine() (and NewVllmEngine()), so it will always use the default port.
To make this feature fully functional, a refactoring of backend.go is needed to allow passing configuration down to the engine constructors. This likely means changing engineRegistry from a global variable to something that is initialized with application configuration. Without this, the feature added in this PR is not usable.
|
|
||
| func NewVllmEngine() *vllmEngine { | ||
| // TODO: Get MetricsPort from vllm configuration | ||
| func NewVllmEngine(metricPort ...uint32) *vllmEngine { |
There was a problem hiding this comment.
Similar to sglang, while this change makes the metric port configurable at the function level, the way this function is called in pkg/kthena-router/backend/backend.go prevents this configuration from being used. The engineRegistry is a global variable initialized with NewVllmEngine(), so it will always use the default port.
To make this feature fully functional, a refactoring of backend.go is needed to allow passing configuration down to the engine constructors. Without this, the feature added in this PR is not usable.
| if engine.MetricPort != 31000 { | ||
| t.Fatalf("expected metric port 31000, got %d", engine.MetricPort) | ||
| } | ||
| } |
There was a problem hiding this comment.
For consistency with the vllm tests and to ensure complete test coverage for the new logic, please add a test case for when a zero-value port is provided. It should fall back to the default port.
| } | |
| } | |
| func TestNewSglangEngine_FallsBackToDefaultWhenConfiguredPortIsZero(t *testing.T) { | |
| engine := NewSglangEngine(0) | |
| if engine.MetricPort != 30000 { | |
| t.Fatalf("expected fallback metric port 30000, got %d", engine.MetricPort) | |
| } | |
| } |
There was a problem hiding this comment.
Pull request overview
This PR makes the metrics port for the SGLang and vLLM backends configurable while preserving the existing default ports to keep current behavior backward-compatible.
Changes:
- Update
NewSglangEngine/NewVllmEngineto accept an optional metrics port and fall back to the historical defaults. - Introduce
defaultMetricPortconstants for each backend to centralize defaults. - Add unit tests covering default behavior and custom port configuration (plus vLLM’s zero-value fallback).
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| pkg/kthena-router/backend/sglang/metrics.go | Adds optional metrics port parameter with default fallback (30000). |
| pkg/kthena-router/backend/sglang/metrics_test.go | Adds tests for default and custom metrics port behavior. |
| pkg/kthena-router/backend/vllm/metrics.go | Adds optional metrics port parameter with default fallback (8000). |
| pkg/kthena-router/backend/vllm/metrics_test.go | Adds tests for default, custom, and zero-value fallback behavior. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| package vllm | ||
|
|
||
| import "testing" |
There was a problem hiding this comment.
New Go files in this repo include the Apache 2.0 license header (e.g., pkg/kthena-router/backend/vllm/models.go:1-15). Please add the same header comment block to this new test file for consistency/compliance.
| package sglang | ||
|
|
||
| import "testing" |
There was a problem hiding this comment.
New Go files in this repo typically start with the Apache 2.0 license header (see pkg/kthena-router/backend/sglang/metrics.go:1-15 or other *_test.go files like pkg/kthena-router/datastore/token_tracker_test.go:1-15). Please add that header to this new test file as well.
5216ba6 to
9eb1bf8
Compare
9eb1bf8 to
80223ba
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // TODO: Get MetricsPort from vllm configuration | ||
| func NewVllmEngine(metricPort ...uint32) *vllmEngine { | ||
| port := defaultMetricPort | ||
| if len(metricPort) > 0 && metricPort[0] > 0 { |
There was a problem hiding this comment.
NewVllmEngine now accepts a variadic metricPort but silently ignores any extra values beyond the first. This can hide call-site bugs; consider either accepting a single optional value via a pointer/options pattern, or explicitly rejecting len(metricPort) > 1 (e.g., panic/log) and documenting the expected usage. Also consider validating the provided port is within the valid TCP port range (1–65535) before using it.
| if len(metricPort) > 0 && metricPort[0] > 0 { | |
| if len(metricPort) > 1 { | |
| panic("NewVllmEngine expects at most one metricPort argument") | |
| } | |
| if len(metricPort) == 1 { | |
| if metricPort[0] < 1 || metricPort[0] > 65535 { | |
| panic("NewVllmEngine: metricPort must be in range 1-65535") | |
| } |
| func NewVllmEngine(metricPort ...uint32) *vllmEngine { | ||
| port := defaultMetricPort | ||
| if len(metricPort) > 0 && metricPort[0] > 0 { | ||
| port = metricPort[0] | ||
| } | ||
|
|
||
| return &vllmEngine{ | ||
| MetricPort: 8000, | ||
| MetricPort: port, | ||
| } |
There was a problem hiding this comment.
The constructor parameter is named metricPort, but MetricPort is also used for non-metrics endpoints (e.g., /v1/models in models.go). If this value is intended to configure only the metrics port, this will unintentionally change the models/API port too. Consider clarifying via naming/doc (e.g., serverPort) or splitting into separate ports if they can differ.
| port := defaultMetricPort | ||
| if len(metricPort) > 0 && metricPort[0] > 0 { | ||
| port = metricPort[0] |
There was a problem hiding this comment.
NewSglangEngine uses a variadic metricPort but only reads metricPort[0]. To avoid silently ignoring extra arguments, consider rejecting len(metricPort) > 1 and/or switching to a clearer API (e.g., options). Also consider validating that the configured port is within 1–65535 before using it.
| port := defaultMetricPort | |
| if len(metricPort) > 0 && metricPort[0] > 0 { | |
| port = metricPort[0] | |
| if len(metricPort) > 1 { | |
| panic("NewSglangEngine accepts at most one metricPort argument") | |
| } | |
| port := defaultMetricPort | |
| if len(metricPort) == 1 { | |
| if metricPort[0] >= 1 && metricPort[0] <= 65535 { | |
| port = metricPort[0] | |
| } |
| func TestNewSglangEngine_UsesConfiguredMetricPort(t *testing.T) { | ||
| engine := NewSglangEngine(31000) | ||
|
|
||
| if engine.MetricPort != 31000 { | ||
| t.Fatalf("expected metric port 31000, got %d", engine.MetricPort) | ||
| } | ||
| } |
There was a problem hiding this comment.
NewSglangEngine implements a zero-value fallback (passing 0 results in the default port), but the tests only cover default and non-zero custom ports. Add a test case for NewSglangEngine(0) to lock down the fallback behavior (and to match the PR’s stated testing goals).
|
I found one blocking gap in the current version. The constructors now accept an optional metrics port, but the production initialization path still builds the registry with:
So the port is test-configurable, but it is not actually configurable from runtime/router configuration yet. I think this needs the config plumbing end-to-end before the PR matches its stated behavior. |
hzxuzhonghu
left a comment
There was a problem hiding this comment.
I’m requesting changes for one blocking gap in the current version.
The constructors now accept an optional metrics port, but the production initialization path still builds the registry with:
sglang.NewSglangEngine()vllm.NewVllmEngine()
So the port is test-configurable, but it is not actually configurable from runtime/router configuration yet. Please add the config plumbing end-to-end before merge so the PR matches its stated behavior.
b4729d4 to
df8bc29
Compare
| port := defaultMetricPort | ||
| if len(metricPort) == 1 && metricPort[0] > 0 && metricPort[0] <= 65535 { | ||
| port = metricPort[0] | ||
| } |
There was a problem hiding this comment.
warning info if metric port specified by user is invalid?
| port := defaultMetricPort | ||
| if len(metricPort) == 1 && metricPort[0] > 0 && metricPort[0] <= 65535 { | ||
| port = metricPort[0] | ||
| } |
pkg/kthena-router/backend/backend.go
Outdated
| "vLLM": vllm.NewVllmEngine(), | ||
| var ( | ||
| engineRegistryMu sync.RWMutex | ||
| engineRegistry = buildEngineRegistry(0, 0) |
There was a problem hiding this comment.
| engineRegistry = buildEngineRegistry(0, 0) | |
| engineRegistries = buildEngineRegistries(0, 0) |
| type RouterConfiguration struct { | ||
| Scheduler SchedulerConfiguration `yaml:"scheduler"` | ||
| Auth AuthenticationConfig `yaml:"auth"` | ||
| Backend BackendConfiguration `yaml:"backend"` |
There was a problem hiding this comment.
This field is not initialized and used?
fb1fc52 to
8012a1c
Compare
There was a problem hiding this comment.
Copilot reviewed 9 out of 9 changed files in this pull request and generated no new comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| metricPort: 30000 | ||
| vllm: | ||
| metricPort: 8000 |
There was a problem hiding this comment.
The Helm template hardcodes backend.*.metricPort values, which makes the new configurability difficult to use in Helm-based deployments (users would need to edit the rendered ConfigMap). Consider templating these from chart values (with defaults 30000/8000) or omitting the backend section entirely to rely on the router’s built-in defaults; if templated, also add the corresponding keys to values.yaml/README.
| metricPort: 30000 | |
| vllm: | |
| metricPort: 8000 | |
| metricPort: {{ .Values.backend.sglang.metricPort | default 30000 }} | |
| vllm: | |
| metricPort: {{ .Values.backend.vllm.metricPort | default 8000 }} |
There was a problem hiding this comment.
Good suggestion, add a configurable value
|
|
||
| func NewSglangEngine() *sglangEngine { | ||
| // TODO: Get MetricsPort from sglang configuration | ||
| func NewSglangEngine(metricPort ...uint32) *sglangEngine { |
There was a problem hiding this comment.
I think why not pass a metricPort uint32, now you make it like a slice
There was a problem hiding this comment.
Using a variadic metricPort ...uint32 is intentional for backward compatibility.
It keeps existing call sites like NewSglangEngine() unchanged, while allowing optional configuration via NewSglangEngine(customPort).
I also guard len(metricPort) > 1 to avoid ambiguous usage.
8012a1c to
ed0e806
Compare
ed0e806 to
4227a22
Compare
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 13 out of 13 changed files in this pull request and generated no new comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Signed-off-by: lizhixuan <[email protected]>
4227a22 to
63fc1d7
Compare
Summary
Motivation
Previously both sglang and vllm engines hardcoded their metrics port (30000 and 8000 respectively). Different deployment environments may use different ports, so these should be configurable while keeping backward-compatible defaults.
Changes
Testing
go test ./pkg/kthena-router/backend/sglang/... -v
go test ./pkg/kthena-router/backend/vllm/... -v