fix(cluster): complete Issue #905 step 2 correctness for Rand and Rou… by mochengqian · Pull Request #915 · apache/dubbo-go-pixiu

mochengqian · 2026-04-18T08:17:47Z

What this PR does:

This PR delivers Step 1 (tests + benchmarks) and Step 2 (correctness) for Issue #905, and is intended to be the first mergeable slice with a clean boundary.

It:

fixes Rand to choose only from the healthy endpoint slice and return nil when no healthy endpoint exists
fixes RoundRobin to return nil when no healthy endpoint exists
makes the round-robin cursor safe for concurrent picks
adds deterministic regression coverage for the current Rand / RoundRobin correctness issues
extends manager-path regression coverage for missing-cluster / unhealthy-endpoint nil behavior
adds and refines the benchmark suite, including:
- PickEndpoint serial baseline
- cleaner PickEndpoint parallel baseline
- lookup-only benchmark
- simple LB hot-path-only benchmark
- healthy endpoint filtering cost
- mixed CompareAndSetStore workload
- consistent-hash resolve baseline

This PR does not start the Step 3+ runtime-consistency / snapshot refactor yet.

Which issue(s) this PR fixes:

Fixes #905

Special notes for your reviewer:

Issue #905 step progress:

Add tests and benchmarks
Fix current correctness issues
Tighten runtime consistency
Switch cluster lookup to O(1)
Introduce healthy endpoint snapshots
Optimize simple LB hot path
Optimize consistent-hash LB last
This PR intentionally covers Step 1 (tests + benchmarks) and Step 2 (correctness) only.
Runtime consistency for UpdateCluster / CompareAndSetStore, O(1) cluster lookup, healthy endpoint snapshots, simple LB hot-path optimization, and consistent-hash optimization will follow in later PRs.
Detailed benchmark baseline numbers are posted in a separate PR comment so later steps can compare against the same baseline.
Local validation completed:
- go test ./pkg/model ./pkg/server ./pkg/cluster/loadbalancer/rand ./pkg/cluster/loadbalancer/roundrobin -count=1
- go test ./pkg/server ./pkg/cluster/loadbalancer/rand ./pkg/cluster/loadbalancer/roundrobin -race -gcflags=-l -count=1
- go vet ./pkg/server ./pkg/cluster/loadbalancer/... ./pkg/model
- go test ./pkg/server -run '^$' -bench '^BenchmarkCluster' -benchmem -count=5

Does this PR introduce a user-facing change?:

NONE

…nd RoundRobin

mochengqian · 2026-04-18T08:26:18Z

Issue #905 benchmark baseline (Step 1 + Step 2)

Environment:

goos=darwin
goarch=arm64
cpu=Apple M5

Command:

go test ./pkg/server -run '^$' -bench '^BenchmarkCluster' -benchmem -count=5

Each Mean value below is the arithmetic mean across 5 runs. Range shows the min-max span across the same 5 runs.

Raw output was captured locally for reference.

PickEndpoint / Lookup

Benchmark	Mean ns/op	Range ns/op
PickEndpoint serial / Rand / `clusters=1`	16.12	15.94-16.29
PickEndpoint serial / Rand / `clusters=32`	33.09	32.98-33.32
PickEndpoint serial / Rand / `clusters=256`	113.58	112.80-115.30
PickEndpoint serial / Rand / `clusters=1024`	685.30	669.10-692.30
PickEndpoint serial / RoundRobin / `clusters=1`	13.51	13.29-13.96
PickEndpoint serial / RoundRobin / `clusters=32`	29.59	29.48-29.82
PickEndpoint serial / RoundRobin / `clusters=256`	111.00	110.20-112.30
PickEndpoint serial / RoundRobin / `clusters=1024`	767.10	675.90-1054.00
Lookup serial / `clusters=1`	3.33	3.32-3.35
Lookup serial / `clusters=32`	17.82	17.77-17.85
Lookup serial / `clusters=256`	111.68	111.50-112.20
Lookup serial / `clusters=1024`	744.18	738.00-749.60
PickEndpoint parallel / Rand	100.42	98.42-101.30
PickEndpoint parallel / RoundRobin	107.66	103.10-110.10

LB Hot Path / Healthy Filter

Benchmark	Mean ns/op	Range ns/op	B/op	allocs/op
LB hot path / Rand / `endpoints=4`	13.72	13.51-13.93	0	0
LB hot path / Rand / `endpoints=64`	134.96	131.40-136.70	512	1
LB hot path / Rand / `endpoints=512`	860.96	852.00-881.30	4864	1
LB hot path / RoundRobin / `endpoints=4`	9.29	8.96-9.57	0	0
LB hot path / RoundRobin / `endpoints=64`	125.40	124.60-128.10	512	1
LB hot path / RoundRobin / `endpoints=512`	858.72	851.60-868.40	4864	1
Healthy filter / `endpoints=8, healthy=100%`	20.02	19.92-20.11	64	1
Healthy filter / `endpoints=8, healthy=50%`	19.11	18.98-19.45	64	1
Healthy filter / `endpoints=8, healthy=0%`	16.78	16.70-16.91	64	1
Healthy filter / `endpoints=64, healthy=100%`	120.60	120.30-121.70	512	1
Healthy filter / `endpoints=64, healthy=50%`	102.90	102.00-104.50	512	1
Healthy filter / `endpoints=64, healthy=0%`	87.86	87.38-88.61	512	1
Healthy filter / `endpoints=512, healthy=100%`	900.28	894.20-908.50	4864	1
Healthy filter / `endpoints=512, healthy=50%`	744.82	730.60-758.70	4864	1
Healthy filter / `endpoints=512, healthy=0%`	498.62	495.40-501.60	4864	1

CAS / Consistent Hash

Benchmark	Mean ns/op	Range ns/op	B/op	allocs/op
CompareAndSetStore mixed	586839.40	573165.00-615044.00	1850084	6417
Consistent hash / RingHash	2644.00	2617.00-2690.00	2212	128
Consistent hash / Maglev	1944.20	1921.00-1964.00	1828	99

Notes

The split benchmarks are intended to make later step-by-step attribution clearer:
- Step 4 should primarily move the lookup-only numbers.
- Step 5/6 should primarily move the healthy-filter and simple LB hot-path numbers.
- Step 7 should primarily move the consistent-hash numbers.
PickEndpoint serial / RoundRobin / clusters=1024 showed one isolated high value in the count=5 suite run. A dedicated rerun with:
```
go test ./pkg/server -run '^$' -bench '^BenchmarkClusterPickEndpointSerial/RoundRobin/clusters=1024$' -benchmem -count=10
```
was stable at:
- mean: 687.15 ns/op
- median: 685.75 ns/op
- range: 673.60-712.40 ns/op
I am treating the earlier 1054.00 ns/op sample as an outlier rather than a representative signal.

Copilot

Pull request overview

Implements Issue #905 Step 1 (tests/benchmarks) and Step 2 (correctness) for cluster endpoint picking, focusing on fixing Rand / RoundRobin behavior with unhealthy endpoints and improving concurrency safety for round-robin picks.

Changes:

Fix Rand and RoundRobin to pick only from healthy endpoints and return nil when none are healthy.
Make round-robin cursor updates concurrency-safe via atomic operations.
Add regression tests and a benchmark suite covering pick-path behavior and related hot paths.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.

Show a summary per file

File	Description
pkg/server/cluster_manager_test.go	Adds/extends manager-path regression tests (missing cluster, unhealthy endpoints) and a concurrent pick race-style test.
pkg/server/cluster_manager_bench_test.go	Introduces a benchmark suite for `PickEndpoint`, lookup cost, LB hot path, health filtering cost, CAS workload, and consistent-hash resolve.
pkg/model/cluster_test.go	Adds unit tests for `GetEndpoint` health filtering, consistent-hash initialization, and `Endpoint.GetHost`.
pkg/model/cluster.go	Changes `PrePickEndpointIndex` type to `uint32` to support atomic increments.
pkg/cluster/loadbalancer/roundrobin/round_robin_test.go	Adds deterministic tests for all-unhealthy behavior and ordering/cursor handling.
pkg/cluster/loadbalancer/roundrobin/round_robin.go	Updates round-robin handler to return `nil` when no healthy endpoints and uses atomic cursor increments.
pkg/cluster/loadbalancer/rand/load_balancer_rand_test.go	Adds deterministic tests for `Rand` correctness and all-unhealthy behavior without panics.
pkg/cluster/loadbalancer/rand/load_balancer_rand.go	Fixes `Rand` to use healthy slice length, return `nil` on empty, and adds a test hook for deterministic randomness.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

sonarqubecloud · 2026-04-19T02:46:14Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

codecov-commenter · 2026-04-19T03:00:35Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 22.23%. Comparing base (e6be678) to head (3b89c87).

Additional details and impacted files

@@             Coverage Diff             @@
##           develop     #915      +/-   ##
===========================================
+ Coverage    22.05%   22.23%   +0.17%     
===========================================
  Files          270      270              
  Lines        20069    20070       +1     
===========================================
+ Hits          4426     4462      +36     
+ Misses       15193    15160      -33     
+ Partials       450      448       -2

Flag	Coverage Δ
unittests	`22.23% <100.00%> (+0.17%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Alanxtl

cool work

Alanxtl · 2026-04-24T03:22:02Z

 		HealthChecks         []HealthCheckConfig `yaml:"health_checks" json:"health_checks"`
 		Endpoints            []*Endpoint         `yaml:"endpoints" json:"endpoints"`
-		PrePickEndpointIndex int
+		PrePickEndpointIndex uint32              `yaml:"-" json:"-"` // runtime-only round-robin cursor state


Marking PrePickEndpointIndex as yaml:"-" json:"-" makes the cursor disappear when CloneStore() serializes the current store and fetchCompareAndSet() rebuilds a fresh store before CompareAndSetStore().
Spring Cloud 的 adapter 会通过 CloneStore -> CompareAndSetStore 这条链路动态刷新 cluster
On the Spring Cloud refresh path, that means every registry sync resets round-robin back to endpoint 0, biasing traffic toward the first healthy endpoint whenever the store is refreshed. Before this change the exported field survived the clone, so this is a behavior regression outside the new unit coverage.

fix(cluster): complete Issue apache#905 step 2 correctness for Rand a…

ec8f94a

…nd RoundRobin

AlexStocks requested a review from Copilot April 19, 2026 02:07

Copilot started reviewing on behalf of AlexStocks April 19, 2026 02:07 View session

Copilot AI reviewed Apr 19, 2026

View reviewed changes

Comment thread pkg/model/cluster.go Outdated

Comment thread pkg/server/cluster_manager_bench_test.go

mochengqian added 2 commits April 19, 2026 10:31

fix(cluster): address review feedback for issue 905

4b4e048

fix(test): address sonar warnings for blank imports

3b89c87

Alanxtl reviewed Apr 24, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(cluster): complete Issue #905 step 2 correctness for Rand and Rou…#915

fix(cluster): complete Issue #905 step 2 correctness for Rand and Rou…#915
mochengqian wants to merge 3 commits intoapache:developfrom
mochengqian:feat/issue-905

mochengqian commented Apr 18, 2026 •

edited

Loading

Uh oh!

mochengqian commented Apr 18, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

sonarqubecloud Bot commented Apr 19, 2026

Uh oh!

codecov-commenter commented Apr 19, 2026

Uh oh!

Alanxtl left a comment

Uh oh!

Alanxtl Apr 24, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

mochengqian commented Apr 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mochengqian commented Apr 18, 2026

PickEndpoint / Lookup

LB Hot Path / Healthy Filter

CAS / Consistent Hash

Notes

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

sonarqubecloud Bot commented Apr 19, 2026

Quality Gate passed

Uh oh!

codecov-commenter commented Apr 19, 2026

Codecov Report

Uh oh!

Alanxtl left a comment

Choose a reason for hiding this comment

Uh oh!

Alanxtl Apr 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

mochengqian commented Apr 18, 2026 •

edited

Loading

Alanxtl Apr 24, 2026 •

edited

Loading