[WIP] feat(hypernode): extract standalone module and controller image#5137
[WIP] feat(hypernode): extract standalone module and controller image#5137wangyang0616 wants to merge 1 commit intovolcano-sh:masterfrom
Conversation
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
519cd65 to
45433d2
Compare
There was a problem hiding this comment.
Pull request overview
This PR extracts the HyperNode controller into a standalone Go module (volcano.sh/hypernode) and adds a standalone controller binary/image (vc-hypernode-controller), while keeping the existing vc-controller-manager behavior via an adapter and aligning selector-to-node matching between controller and scheduler.
Changes:
- Introduce
staging/src/volcano.sh/hypernodemodule with standalone controller entrypoint, shared utilities (signals, nodematch, testutil), and documentation. - Wire the root repo to the new module and add a controller-manager adapter registration.
- Add build targets and an OCI image Dockerfile for
vc-hypernode-controller, plus docs updates describing deployment modes.
Reviewed changes
Copilot reviewed 39 out of 44 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| staging/src/volcano.sh/hypernode/pkg/testutil/build_hypernode.go | Adds module-local HyperNode test builder helper. |
| staging/src/volcano.sh/hypernode/pkg/signals/signal_windows.go | Windows shutdown signal list for standalone controller. |
| staging/src/volcano.sh/hypernode/pkg/signals/signal_posix.go | POSIX shutdown signal list for standalone controller. |
| staging/src/volcano.sh/hypernode/pkg/signals/signal.go | Adds signal-driven context cancellation helper. |
| staging/src/volcano.sh/hypernode/pkg/nodematch/nodematch.go | Extracts shared selector→node matching logic for controller/scheduler. |
| staging/src/volcano.sh/hypernode/pkg/hypernode/options.go | Defines controller initialization options for module consumers. |
| staging/src/volcano.sh/hypernode/go.sum | Adds module dependency lockfile. |
| staging/src/volcano.sh/hypernode/go.mod | Declares new nested module and dependencies/replaces. |
| staging/src/volcano.sh/hypernode/cmd/hypernode-controller/main.go | Adds standalone controller binary entrypoint. |
| staging/src/volcano.sh/hypernode/cmd/hypernode-controller/app/server.go | Implements standalone run loop + leader election + informers. |
| staging/src/volcano.sh/hypernode/cmd/hypernode-controller/app/options/options.go | Adds flags and option parsing for standalone controller. |
| staging/src/volcano.sh/hypernode/README.md | Documents module layout, build/test, and references. |
| pkg/scheduler/api/hyper_node_info.go | Switches scheduler GetMembers to shared nodematch implementation. |
| pkg/controllers/hypernode/register.go | Adds adapter to register module controller into controller-manager. |
| pkg/controllers/hypernode/hypernode_handler_test.go | Updates tests to use module-local testutil and new Controller type. |
| pkg/controllers/hypernode/hypernode_handler.go | Aligns controller member resolution with shared nodematch. |
| pkg/controllers/hypernode/hypernode_controller_test.go | Updates controller tests for extracted module packages and Options type. |
| pkg/controllers/hypernode/hypernode_controller.go | Refactors controller into exported Controller with NewController, Name, Initialize. |
| pkg/controllers/hypernode/discovery/ufm/ufm_test.go | Updates imports to module-local api. |
| pkg/controllers/hypernode/discovery/ufm/ufm.go | Updates imports to module-local api/utils. |
| pkg/controllers/hypernode/discovery/manager_test.go | Updates imports to module-local discovery/config/api. |
| pkg/controllers/hypernode/discovery/manager.go | Updates imports and blank imports for module discoverers. |
| pkg/controllers/hypernode/discovery/label/label_test.go | Updates imports to module-local api/utils. |
| pkg/controllers/hypernode/discovery/label/label.go | Updates imports to module-local api/utils. |
| pkg/controllers/hypernode/discovery/fake/fake.go | Updates imports to module-local api. |
| pkg/controllers/hypernode/configmap_handler_test.go | Updates imports and controller type rename. |
| pkg/controllers/hypernode/configmap_handler.go | Updates imports and controller receiver/type rename. |
| pkg/controllers/hypernode/config/loader_test.go | Updates imports to module-local api. |
| pkg/controllers/hypernode/config/loader.go | Updates imports to module-local api. |
| pkg/controllers/hypernode/config/fake.go | Updates imports to module-local api. |
| installer/dockerfile/hypernode-controller/Dockerfile | Adds Dockerfile to build/publish standalone controller image. |
| installer/README.md | Notes optional standalone HyperNode controller image. |
| go.mod | Adds volcano.sh/hypernode dependency and replace to staging module. |
| docs/user-guide/how_to_use_hypernode_auto_discovery.md | Documents bundled vs standalone controller modes and verification logs. |
| docs/design/hypernode-standalone-controller.zh.md | Adds Chinese summary doc for standalone controller/module. |
| docs/design/hypernode-standalone-controller.md | Adds design doc describing module layout, deploy modes, build/image. |
| docs/design/hyperNode-auto-discovery.md | Documents new implementation location and shared node matching. |
| docs/design/Network Topology Aware Scheduling.md | Updates overview text and links shared node matching + module location. |
| README.md | Mentions optional standalone HyperNode controller image/binary and conflict warning. |
| Makefile | Adds build/image targets for vc-hypernode-controller and includes it in images/unit-test/lint. |
Comments suppressed due to low confidence (3)
staging/src/volcano.sh/hypernode/pkg/signals/signal.go:1
SetupSignalContext()will panic if called more than once because it unconditionally closesonlyOneSignalHandlerwithout any guard, and it also storesshutdownHandlerin a package-level variable. If this package is used from tests or multiple binaries/libraries, that can cause flaky panics/data races. Use async.Once(or similar) to enforce one-time initialization, and keep the signal channel local to the function (or protect shared state) to avoid races.
staging/src/volcano.sh/hypernode/pkg/testutil/build_hypernode.go:1- Returning
nilon an invalidMemberConfig.Selectormakes test failures harder to diagnose (it can turn into a nil dereference far away from the cause). Prefer returning(*HyperNode, error)so tests can assert on the error, or (since this is testutil) consider panicking with a clear message to fail fast and localize the failure.
staging/src/volcano.sh/hypernode/pkg/nodematch/nodematch.go:1 - This introduces shared matching logic used by both controller and scheduler, but there are no unit tests for the new
NodeNamesForSelector. Add focused tests covering: exact-match (including empty name), regex-match (including invalid regex), label-match (including empty selector), and combinations/overlap semantics to prevent future drift between components.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if s.CaCertFile != "" && s.CertFile != "" && s.KeyFile != "" { | ||
| if err := s.ReadCAFiles(); err != nil { | ||
| fmt.Fprintf(os.Stderr, "Failed to read TLS files: %v\n", err) | ||
| os.Exit(1) | ||
| } | ||
| } |
There was a problem hiding this comment.
TLS flag validation is currently skipped when only some of --ca-cert-file/--tls-cert-file/--tls-private-key-file are provided, because ReadCAFiles() is only called when all three are non-empty. This can lead to silently ignoring user TLS intent. Call ReadCAFiles() when any of the three file flags is set (let it enforce the 'all three required' rule), or move that validation into CheckOptionOrDie().
| func buildRESTConfig(opt *options.ServerOption) (*rest.Config, error) { | ||
| loadingRules := clientcmd.NewDefaultClientConfigLoadingRules() | ||
| if opt.KubeClientOptions.KubeConfig != "" { | ||
| loadingRules.ExplicitPath = opt.KubeClientOptions.KubeConfig | ||
| } | ||
| overrides := &clientcmd.ConfigOverrides{} | ||
| if opt.KubeClientOptions.Master != "" { | ||
| overrides.ClusterInfo.Server = opt.KubeClientOptions.Master | ||
| } | ||
| cc := clientcmd.NewNonInteractiveDeferredLoadingClientConfig(loadingRules, overrides) | ||
| cfg, err := cc.ClientConfig() | ||
| if err != nil { | ||
| return nil, err | ||
| } | ||
| cfg.QPS = opt.KubeClientOptions.QPS | ||
| cfg.Burst = opt.KubeClientOptions.Burst | ||
| return cfg, nil | ||
| } |
There was a problem hiding this comment.
buildRESTConfig uses only kubeconfig-based loading and does not fall back to in-cluster configuration when --kubeconfig/--master are unset. This will break the common 'run in cluster with ServiceAccount' deployment path described in docs. Consider using clientcmd.BuildConfigFromFlags(master, kubeconfig) (which includes in-cluster fallback) or explicitly try rest.InClusterConfig() when no explicit kubeconfig/master is provided.
| if !opt.LeaderElection.LeaderElect { | ||
| run(ctx) | ||
| return fmt.Errorf("finished without leader elect") | ||
| } |
There was a problem hiding this comment.
Run() returns an error on normal shutdown in both leader-election-disabled mode (finished without leader elect) and leader-election-enabled mode (lost lease) because the return happens unconditionally after the run loop completes. This will surface as an 'exited with err' log even on SIGTERM/SIGINT. Return nil when the context is canceled (normal shutdown), and only return an error for actual exceptional cases (e.g., inability to start, or explicit lease-loss handling if you choose not to Fatalf).
| leaderelection.RunOrDie(ctx, leaderelection.LeaderElectionConfig{ | ||
| Lock: rl, | ||
| LeaseDuration: opt.LeaderElection.LeaseDuration.Duration, | ||
| RenewDeadline: opt.LeaderElection.RenewDeadline.Duration, | ||
| RetryPeriod: opt.LeaderElection.RetryPeriod.Duration, | ||
| Callbacks: leaderelection.LeaderCallbacks{ | ||
| OnStartedLeading: run, | ||
| OnStoppedLeading: func() { | ||
| klog.Fatalf("leaderelection lost") | ||
| }, | ||
| }, | ||
| }) | ||
| return fmt.Errorf("lost lease") | ||
| } |
There was a problem hiding this comment.
Run() returns an error on normal shutdown in both leader-election-disabled mode (finished without leader elect) and leader-election-enabled mode (lost lease) because the return happens unconditionally after the run loop completes. This will surface as an 'exited with err' log even on SIGTERM/SIGINT. Return nil when the context is canceled (normal shutdown), and only return an error for actual exceptional cases (e.g., inability to start, or explicit lease-loss handling if you choose not to Fatalf).
| ADD . volcano | ||
| RUN cd volcano && make vc-hypernode-controller | ||
|
|
||
| FROM alpine:latest |
There was a problem hiding this comment.
Using alpine:latest makes builds non-reproducible and can introduce breaking changes or unexpected CVEs as the base image updates. Pin to a specific Alpine version (or use the repo’s standard runtime base), and consider whether the runtime image should include ca-certificates if the controller needs to make TLS connections relying on system roots.
| FROM alpine:latest | |
| FROM alpine:3.20.1 | |
| RUN apk add --no-cache ca-certificates |
There was a problem hiding this comment.
Code Review
This pull request refactors the HyperNode controller into a standalone Go module (volcano.sh/hypernode) and introduces the vc-hypernode-controller binary for independent deployment. Key changes include updating the build system, adding a Dockerfile, and centralizing node matching logic in a shared package used by both the controller and the scheduler. I have no feedback to provide.
Extract HyperNode controller to staging module volcano.sh/hypernode with optional vc-hypernode-controller binary and OCI image. Controller-manager keeps behavior via pkg/controllers/hypernode/register.go. - Add nodematch package; scheduler GetMembers delegates for consistent semantics - Makefile: vc-hypernode-controller, image targets, unit-test and lint for submodule - Dockerfile installer/dockerfile/hypernode-controller for release/CI - Design docs, user guide and README updates Signed-off-by: wangyang0616 <[email protected]>
45433d2 to
5f9998b
Compare
|
Direction looks reasonable, but I see a few merge blockers before this is ready:
Positives:
If the operational issues above are fixed, I think the architectural direction is reviewable. |
What type of PR is this?
Feature
What this PR does / why we need it:
volcano.sh/hypernodeunderstaging/src/volcano.sh/hypernode/.vc-hypernode-controllerto compile/deploy HyperNode without the rest ofvc-controller-manager.pkg/controllers/hypernode/register.go.vc-hypernode-controllerand include it in CI builds and release publishing.For details of the proposal, please see: #5136
Which issue(s) this PR fixes:
Fixes #5133
Special notes for your reviewer:
The current pull request only addresses the HyperNode controller; podgroup-related content will be updated in a separate pull request.
Does this PR introduce a user-facing change?