Skip to content

feat: enable cert rotation for allocation API service (fixes #422)#507

Merged
dgkanatsios merged 5 commits intomainfrom
copilot/cert-rotation-allocation-api
Mar 24, 2026
Merged

feat: enable cert rotation for allocation API service (fixes #422)#507
dgkanatsios merged 5 commits intomainfrom
copilot/cert-rotation-allocation-api

Conversation

Copy link
Copy Markdown
Contributor

Copilot AI commented Mar 24, 2026

  • Investigate CI unit test failure
  • Identify root cause: race condition in port_registry_test.go parallel allocation blocks
  • Fix both parallel allocation blocks to use atomic LoadOrStore for name deduplication
  • Verify fix by running all 85 operator tests (66 controller + 10 webhook + 9 cert watcher)

⚡ Quickly spin up Copilot coding agent tasks from anywhere on your macOS or Windows machine with Raycast.

Copilot AI and others added 3 commits March 24, 2026 04:12
Add CertificateWatcher that polls TLS cert files for changes and
reloads them dynamically. Refactor AllocationApiServer to use
GetCertificate/GetConfigForClient callbacks instead of static cert
bytes. Mount tls-secret as volume for file-based cert access.

Closes #422

Co-authored-by: dgkanatsios <8256138+dgkanatsios@users.noreply.github.com>
Agent-Logs-Url: https://github.com/PlayFab/thundernetes/sessions/3528b7ff-34c9-4621-a98f-b9b8427da515
Co-authored-by: dgkanatsios <8256138+dgkanatsios@users.noreply.github.com>
Agent-Logs-Url: https://github.com/PlayFab/thundernetes/sessions/3528b7ff-34c9-4621-a98f-b9b8427da515
@dgkanatsios dgkanatsios marked this pull request as ready for review March 24, 2026 04:58
@dgkanatsios
Copy link
Copy Markdown
Collaborator

@copilot can you check why the unit test failed? it's not because of this PR, wondering if there's a race condition bug somewhere

…r name deduplication

The port_registry_test.go had two race conditions in the parallel
allocation test:

1. First block (line 434-438): Used Load() to check if a name existed,
   then Store() to claim it - a TOCTOU race where two goroutines could
   both see the name as unclaimed and both proceed.

2. Second block (line 497): Had no name deduplication at all, so random
   name collisions with existing entries caused "ports already assigned
   for this GameServer" errors.

Fix: Use sync.Map.LoadOrStore() in both blocks to atomically claim
the generated name, preventing any possibility of duplicate
registration.

Co-authored-by: dgkanatsios <8256138+dgkanatsios@users.noreply.github.com>
Agent-Logs-Url: https://github.com/PlayFab/thundernetes/sessions/677c2bb3-f657-4a37-a811-3b14c993038a
Copy link
Copy Markdown
Contributor Author

Copilot AI commented Mar 24, 2026

@copilot can you check why the unit test failed? it's not because of this PR, wondering if there's a race condition bug somewhere

Yes, it's a pre-existing race condition in port_registry_test.go, not caused by this PR. The test "Random port registration on port registry with two thousand ports, five hundred on four nodes" had two bugs:

  1. Second parallel allocation block (line 489-503) had no name deduplication at allgenerateName(prefix) could produce a name that already existed in gameServerNamesAndPorts (from the first allocation round or from another goroutine), causing GetNewPorts to return "ports already assigned for this GameServer".

  2. First parallel allocation block (line 434-438) had a TOCTOU race — two goroutines could both call Load(gameServerName), both see !ok, and both proceed with the same name. The first to reach GetNewPorts succeeds, the second gets "ports already assigned".

Fix: replaced both blocks with sync.Map.LoadOrStore() to atomically claim generated names, preventing duplicate registration. Commit 3c477a9.

@dgkanatsios dgkanatsios merged commit 93f347d into main Mar 24, 2026
7 checks passed
@dgkanatsios dgkanatsios deleted the copilot/cert-rotation-allocation-api branch March 24, 2026 06:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants