Skip to content

fix: notification config not broadcast to swarm nodes#4563

Merged
amir20 merged 6 commits intomasterfrom
fix/swarm-notification-broadcast
Mar 25, 2026
Merged

fix: notification config not broadcast to swarm nodes#4563
amir20 merged 6 commits intomasterfrom
fix/swarm-notification-broadcast

Conversation

@amir20
Copy link
Copy Markdown
Owner

@amir20 amir20 commented Mar 25, 2026

Summary

  • Swarm agent server was created with nil notification handler (main.go:98), so gRPC config broadcasts were silently dropped on all swarm nodes
  • EventExpression was missing from subscription broadcast conversion, breaking event-based alerts on agent nodes
  • Headers was missing from dispatcher broadcast conversion, breaking webhook dispatchers with custom headers on agent nodes

Fixes #4562

Test plan

  • Deploy in swarm mode with multiple nodes
  • Add a notification rule and verify it triggers on containers running on remote swarm nodes
  • Verify event-based alerts (container start/stop) work on agent nodes
  • Verify webhook dispatchers with custom headers work on agent nodes

🤖 Generated with Claude Code

Three bugs fixed:

1. Swarm agent server created with nil notification handler, so config
   broadcasts were silently dropped. Pass multiHostService.NotificationHandler()
   instead.

2. EventExpression missing from subscription broadcast conversion,
   preventing event-based alerts on agent nodes.

3. Headers missing from dispatcher broadcast conversion, breaking
   webhook dispatchers with custom headers on agent nodes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@claude
Copy link
Copy Markdown

claude bot commented Mar 25, 2026

Clean, minimal fix. All three issues are legitimate bugs.

  • EventExpression and Headers were both present in HandleNotificationConfig / LoadConfig but missing from broadcastNotificationConfig — copy-paste omissions confirmed.
  • Manager correctly implements the agent.NotificationConfigHandler interface (HandleNotificationConfig + GetNotificationStats).
  • NotificationHandler() accessor is safe — notificationManager is set at construction, no race risk.

No issues found.

@claude
Copy link
Copy Markdown

claude bot commented Mar 25, 2026

  • Fix is correct — nil handler was silently dropping all notification config broadcasts to swarm nodes
  • EventExpression and Headers omissions are straightforward copy bugs; worth double-checking no other fields are missing in broadcastNotificationConfig (e.g. compare all fields in notification.Subscription and Dispatcher structs vs what's copied)
  • NotificationHandler() returning concrete *notification.Manager is fine if agent.NewServer accepts it; minor nit — naming it NotificationManager() would match what it returns
  • protoc version bump in generated files is noise but harmless

@claude
Copy link
Copy Markdown

claude bot commented Mar 25, 2026

Three targeted fixes, all correct:

  • main.go:97: Passing NotificationHandler() instead of nil is the right fix. The TODO was a known gap.
  • multi_host_service.go:256: EventExpression omission was a bug — event-based alerts would silently not work on remote nodes.
  • multi_host_service.go:271: Headers omission broke webhook dispatchers with auth headers on agents.

One concern: NotificationHandler() exposes the internal *notification.Manager directly. If the agent server only needs agent.NotificationConfigHandler, returning the concrete type leaks internals. Minor, but worth considering if the interface is already defined.

Otherwise the fix is minimal, correct, and well-scoped.

Both call sites always pass a non-nil handler, so these guards indicate
a programming error if ever triggered.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@claude
Copy link
Copy Markdown

claude bot commented Mar 25, 2026

Code review:

  • server.go:394 / 447: log.Fatal() inside a gRPC handler is too aggressive — it kills the entire process if a client calls these RPCs on a server where the handler isn't registered. Should return a gRPC error instead (e.g. status.Error(codes.Internal, "...")), or at minimum keep the original warn+return pattern. Fatal is appropriate at startup, not at request time.

  • multi_host_service.go:296-301: NotificationHandler() returns the concrete *notification.Manager type — consider whether the agent.NotificationConfigHandler interface would be more appropriate here to avoid coupling.

  • The EventExpression and Headers fixes look correct and the main.go nil→handler fix addresses the root cause cleanly.

  • Protoc version bump (v7.34.0→v7.34.1) in generated files is fine noise.

Fail fast at server creation time instead of at RPC call time.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@claude
Copy link
Copy Markdown

claude bot commented Mar 25, 2026

Clarifying on the log.Fatal() concern raised in prior reviews: the nil check + fatal was moved from the request handlers to the constructor newServer(). Fatal at construction time (startup) is correct behavior — it surfaces the misconfiguration immediately rather than silently ignoring config updates at runtime. No issue here.

The three substantive fixes (nil handler, EventExpression, Headers) are all correct. LGTM.

The agent server now requires a non-nil NotificationConfigHandler,
so the test needs a mock implementation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@claude
Copy link
Copy Markdown

claude bot commented Mar 25, 2026

Review

  • server.go: Moving nil check to newServer with log.Fatal() is correct — programmer error at startup, not a runtime condition. Good call.
  • main.go:97: Fix looks correct. TODO resolved.
  • multi_host_service.go: Missing EventExpression and Headers fields — straightforward fix.
  • NotificationHandler() returns *notification.Manager — confirm it implements agent.NotificationConfigHandler (not visible in diff, but assumed from compilation).

Test plan gap: Two items unchecked — event-based alerts and webhook custom headers on agent nodes. These are the exact scenarios being fixed; should be verified before merge.

Minor: Whitespace-only alignment change in GetNotificationStats is noise in the diff.

@amir20 amir20 merged commit f60fbc2 into master Mar 25, 2026
12 checks passed
@amir20 amir20 deleted the fix/swarm-notification-broadcast branch March 25, 2026 22:49
amir20 added a commit that referenced this pull request Mar 26, 2026
APIKey, Prefix, and ExpiresAt were missing from the NotificationDispatcher
protobuf message, causing these fields to be silently dropped when
broadcasting notification config to agent nodes. Same class of bug as #4563.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Alerts and subscriptions not syncing in Swarm mode

1 participant