Skip to content

fix(router): make sglang and vllm metrics port configurable#847

Open
Tweakzx wants to merge 1 commit intovolcano-sh:mainfrom
Tweakzx:fix/router-metrics-port-config
Open

fix(router): make sglang and vllm metrics port configurable#847
Tweakzx wants to merge 1 commit intovolcano-sh:mainfrom
Tweakzx:fix/router-metrics-port-config

Conversation

@Tweakzx
Copy link
Copy Markdown
Contributor

@Tweakzx Tweakzx commented Mar 28, 2026

Summary

  • Make sglang metrics port configurable via variadic parameter, default 30000
  • Make vllm metrics port configurable via variadic parameter, default 8000
  • Add tests for default port, custom port, and zero-value fallback

Motivation

Previously both sglang and vllm engines hardcoded their metrics port (30000 and 8000 respectively). Different deployment environments may use different ports, so these should be configurable while keeping backward-compatible defaults.

Changes

  • sglang/metrics.go: NewSglangEngine now accepts optional metricPort parameter, falls back to 30000
  • sglang/metrics_test.go: New tests for default and custom port
  • vllm/metrics.go: NewVllmEngine now accepts optional metricPort parameter, falls back to 8000
  • vllm/metrics_test.go: New tests for default, custom, and zero-value fallback

Testing

go test ./pkg/kthena-router/backend/sglang/... -v
go test ./pkg/kthena-router/backend/vllm/... -v

Copilot AI review requested due to automatic review settings March 28, 2026 09:59
Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request makes the metric ports for sglang and vllm engines configurable by updating their constructors to accept optional parameters and providing default values. While the changes include new unit tests, the reviewer pointed out that the feature is currently unreachable because the engines are initialized in a global registry without arguments, necessitating further refactoring. A suggestion was also made to add a test case for the zero-value fallback in the sglang engine to ensure parity with the vllm tests.


func NewSglangEngine() *sglangEngine {
// TODO: Get MetricsPort from sglang configuration
func NewSglangEngine(metricPort ...uint32) *sglangEngine {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

While this change makes the metric port configurable at the function level, the way this function is called in pkg/kthena-router/backend/backend.go prevents this configuration from being used. The engineRegistry is a global variable initialized with NewSglangEngine() (and NewVllmEngine()), so it will always use the default port.

To make this feature fully functional, a refactoring of backend.go is needed to allow passing configuration down to the engine constructors. This likely means changing engineRegistry from a global variable to something that is initialized with application configuration. Without this, the feature added in this PR is not usable.


func NewVllmEngine() *vllmEngine {
// TODO: Get MetricsPort from vllm configuration
func NewVllmEngine(metricPort ...uint32) *vllmEngine {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Similar to sglang, while this change makes the metric port configurable at the function level, the way this function is called in pkg/kthena-router/backend/backend.go prevents this configuration from being used. The engineRegistry is a global variable initialized with NewVllmEngine(), so it will always use the default port.

To make this feature fully functional, a refactoring of backend.go is needed to allow passing configuration down to the engine constructors. Without this, the feature added in this PR is not usable.

if engine.MetricPort != 31000 {
t.Fatalf("expected metric port 31000, got %d", engine.MetricPort)
}
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

For consistency with the vllm tests and to ensure complete test coverage for the new logic, please add a test case for when a zero-value port is provided. It should fall back to the default port.

Suggested change
}
}
func TestNewSglangEngine_FallsBackToDefaultWhenConfiguredPortIsZero(t *testing.T) {
engine := NewSglangEngine(0)
if engine.MetricPort != 30000 {
t.Fatalf("expected fallback metric port 30000, got %d", engine.MetricPort)
}
}

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR makes the metrics port for the SGLang and vLLM backends configurable while preserving the existing default ports to keep current behavior backward-compatible.

Changes:

  • Update NewSglangEngine / NewVllmEngine to accept an optional metrics port and fall back to the historical defaults.
  • Introduce defaultMetricPort constants for each backend to centralize defaults.
  • Add unit tests covering default behavior and custom port configuration (plus vLLM’s zero-value fallback).

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File Description
pkg/kthena-router/backend/sglang/metrics.go Adds optional metrics port parameter with default fallback (30000).
pkg/kthena-router/backend/sglang/metrics_test.go Adds tests for default and custom metrics port behavior.
pkg/kthena-router/backend/vllm/metrics.go Adds optional metrics port parameter with default fallback (8000).
pkg/kthena-router/backend/vllm/metrics_test.go Adds tests for default, custom, and zero-value fallback behavior.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +1 to +3
package vllm

import "testing"
Copy link

Copilot AI Mar 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New Go files in this repo include the Apache 2.0 license header (e.g., pkg/kthena-router/backend/vllm/models.go:1-15). Please add the same header comment block to this new test file for consistency/compliance.

Copilot uses AI. Check for mistakes.
Comment on lines +1 to +3
package sglang

import "testing"
Copy link

Copilot AI Mar 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New Go files in this repo typically start with the Apache 2.0 license header (see pkg/kthena-router/backend/sglang/metrics.go:1-15 or other *_test.go files like pkg/kthena-router/datastore/token_tracker_test.go:1-15). Please add that header to this new test file as well.

Copilot uses AI. Check for mistakes.
@Tweakzx Tweakzx force-pushed the fix/router-metrics-port-config branch from 5216ba6 to 9eb1bf8 Compare March 28, 2026 10:04
Copilot AI review requested due to automatic review settings March 28, 2026 10:13
@Tweakzx Tweakzx force-pushed the fix/router-metrics-port-config branch from 9eb1bf8 to 80223ba Compare March 28, 2026 10:13
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

// TODO: Get MetricsPort from vllm configuration
func NewVllmEngine(metricPort ...uint32) *vllmEngine {
port := defaultMetricPort
if len(metricPort) > 0 && metricPort[0] > 0 {
Copy link

Copilot AI Mar 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NewVllmEngine now accepts a variadic metricPort but silently ignores any extra values beyond the first. This can hide call-site bugs; consider either accepting a single optional value via a pointer/options pattern, or explicitly rejecting len(metricPort) > 1 (e.g., panic/log) and documenting the expected usage. Also consider validating the provided port is within the valid TCP port range (1–65535) before using it.

Suggested change
if len(metricPort) > 0 && metricPort[0] > 0 {
if len(metricPort) > 1 {
panic("NewVllmEngine expects at most one metricPort argument")
}
if len(metricPort) == 1 {
if metricPort[0] < 1 || metricPort[0] > 65535 {
panic("NewVllmEngine: metricPort must be in range 1-65535")
}

Copilot uses AI. Check for mistakes.
Comment on lines +66 to 74
func NewVllmEngine(metricPort ...uint32) *vllmEngine {
port := defaultMetricPort
if len(metricPort) > 0 && metricPort[0] > 0 {
port = metricPort[0]
}

return &vllmEngine{
MetricPort: 8000,
MetricPort: port,
}
Copy link

Copilot AI Mar 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The constructor parameter is named metricPort, but MetricPort is also used for non-metrics endpoints (e.g., /v1/models in models.go). If this value is intended to configure only the metrics port, this will unintentionally change the models/API port too. Consider clarifying via naming/doc (e.g., serverPort) or splitting into separate ports if they can differ.

Copilot uses AI. Check for mistakes.
Comment on lines +64 to +66
port := defaultMetricPort
if len(metricPort) > 0 && metricPort[0] > 0 {
port = metricPort[0]
Copy link

Copilot AI Mar 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NewSglangEngine uses a variadic metricPort but only reads metricPort[0]. To avoid silently ignoring extra arguments, consider rejecting len(metricPort) > 1 and/or switching to a clearer API (e.g., options). Also consider validating that the configured port is within 1–65535 before using it.

Suggested change
port := defaultMetricPort
if len(metricPort) > 0 && metricPort[0] > 0 {
port = metricPort[0]
if len(metricPort) > 1 {
panic("NewSglangEngine accepts at most one metricPort argument")
}
port := defaultMetricPort
if len(metricPort) == 1 {
if metricPort[0] >= 1 && metricPort[0] <= 65535 {
port = metricPort[0]
}

Copilot uses AI. Check for mistakes.
Comment on lines +29 to +35
func TestNewSglangEngine_UsesConfiguredMetricPort(t *testing.T) {
engine := NewSglangEngine(31000)

if engine.MetricPort != 31000 {
t.Fatalf("expected metric port 31000, got %d", engine.MetricPort)
}
}
Copy link

Copilot AI Mar 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NewSglangEngine implements a zero-value fallback (passing 0 results in the default port), but the tests only cover default and non-zero custom ports. Add a test case for NewSglangEngine(0) to lock down the fallback behavior (and to match the PR’s stated testing goals).

Copilot uses AI. Check for mistakes.
@hzxuzhonghu
Copy link
Copy Markdown
Member

I found one blocking gap in the current version.

The constructors now accept an optional metrics port, but the production initialization path still builds the registry with:

  • sglang.NewSglangEngine()
  • vllm.NewVllmEngine()

So the port is test-configurable, but it is not actually configurable from runtime/router configuration yet. I think this needs the config plumbing end-to-end before the PR matches its stated behavior.

Copy link
Copy Markdown
Member

@hzxuzhonghu hzxuzhonghu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m requesting changes for one blocking gap in the current version.

The constructors now accept an optional metrics port, but the production initialization path still builds the registry with:

  • sglang.NewSglangEngine()
  • vllm.NewVllmEngine()

So the port is test-configurable, but it is not actually configurable from runtime/router configuration yet. Please add the config plumbing end-to-end before merge so the PR matches its stated behavior.

@Tweakzx Tweakzx force-pushed the fix/router-metrics-port-config branch from b4729d4 to df8bc29 Compare March 30, 2026 07:37
port := defaultMetricPort
if len(metricPort) == 1 && metricPort[0] > 0 && metricPort[0] <= 65535 {
port = metricPort[0]
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning info if metric port specified by user is invalid?

port := defaultMetricPort
if len(metricPort) == 1 && metricPort[0] > 0 && metricPort[0] <= 65535 {
port = metricPort[0]
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

"vLLM": vllm.NewVllmEngine(),
var (
engineRegistryMu sync.RWMutex
engineRegistry = buildEngineRegistry(0, 0)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
engineRegistry = buildEngineRegistry(0, 0)
engineRegistries = buildEngineRegistries(0, 0)

type RouterConfiguration struct {
Scheduler SchedulerConfiguration `yaml:"scheduler"`
Auth AuthenticationConfig `yaml:"auth"`
Backend BackendConfiguration `yaml:"backend"`
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This field is not initialized and used?

Copilot AI review requested due to automatic review settings April 2, 2026 04:10
@Tweakzx Tweakzx force-pushed the fix/router-metrics-port-config branch from fb1fc52 to 8012a1c Compare April 2, 2026 04:11
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot reviewed 9 out of 9 changed files in this pull request and generated no new comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +45 to +47
metricPort: 30000
vllm:
metricPort: 8000
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Helm template hardcodes backend.*.metricPort values, which makes the new configurability difficult to use in Helm-based deployments (users would need to edit the rendered ConfigMap). Consider templating these from chart values (with defaults 30000/8000) or omitting the backend section entirely to rely on the router’s built-in defaults; if templated, also add the corresponding keys to values.yaml/README.

Suggested change
metricPort: 30000
vllm:
metricPort: 8000
metricPort: {{ .Values.backend.sglang.metricPort | default 30000 }}
vllm:
metricPort: {{ .Values.backend.vllm.metricPort | default 8000 }}

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good suggestion, add a configurable value


func NewSglangEngine() *sglangEngine {
// TODO: Get MetricsPort from sglang configuration
func NewSglangEngine(metricPort ...uint32) *sglangEngine {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think why not pass a metricPort uint32, now you make it like a slice

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using a variadic metricPort ...uint32 is intentional for backward compatibility.
It keeps existing call sites like NewSglangEngine() unchanged, while allowing optional configuration via NewSglangEngine(customPort).
I also guard len(metricPort) > 1 to avoid ambiguous usage.

@Tweakzx Tweakzx force-pushed the fix/router-metrics-port-config branch from 8012a1c to ed0e806 Compare April 2, 2026 06:48
Copilot AI review requested due to automatic review settings April 2, 2026 06:55
@Tweakzx Tweakzx force-pushed the fix/router-metrics-port-config branch from ed0e806 to 4227a22 Compare April 2, 2026 06:55
@volcano-sh-bot
Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign lizhencheng9527 for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated no new comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@Tweakzx Tweakzx force-pushed the fix/router-metrics-port-config branch from 4227a22 to 63fc1d7 Compare April 2, 2026 07:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants