Send HostMetadata to BPF KubeProxy #11817

Draft

aaaaaaaalex wants to merge 10 commits into projectcalico:master from aaaaaaaalex:bpf-send-node-updates-to-kp

Conversation

@aaaaaaaalex
Contributor

Description

Exposes host metadata (e.g. labels) to the BPF KubeProxy (KP) by registering a cache in the dataplane, which invokes a KP callback.
The cache pools individual HostMetadataV4V6Update messages into an aggregated update, and is thread-safe with respect to the KP goroutine, since it uses a channel to queue new updates/restarts in the main KP run method.
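
For illustration only, here is a minimal, self-contained sketch of the kind of cache described above. All of the names (metadataCache, onUpdate, snapshot, flushCh) are hypothetical stand-ins rather than the actual code in this PR:

```go
package main

import (
	"fmt"
	"sync"
)

// hostMetadata stands in for the fields carried by a
// proto.HostMetadataV4V6Update (hostname plus, with this PR, labels).
type hostMetadata struct {
	Hostname string
	Labels   map[string]string
}

// metadataCache pools per-host updates and signals the proxy's run loop
// over a buffered channel, so the proxy takes an aggregated snapshot on
// its own goroutine instead of sharing the cache's map directly.
type metadataCache struct {
	mu      sync.Mutex
	byHost  map[string]hostMetadata
	flushCh chan struct{} // coalescing "data changed" signal for the proxy
}

func newMetadataCache() *metadataCache {
	return &metadataCache{
		byHost:  map[string]hostMetadata{},
		flushCh: make(chan struct{}, 1),
	}
}

// onUpdate records one host's metadata and requests a flush.
func (c *metadataCache) onUpdate(u hostMetadata) {
	c.mu.Lock()
	c.byHost[u.Hostname] = u
	c.mu.Unlock()

	// Coalescing send: if a flush is already pending, do nothing; the
	// proxy will see the latest state when it takes its snapshot.
	select {
	case c.flushCh <- struct{}{}:
	default:
	}
}

// snapshot copies the aggregated state; called from the proxy goroutine.
func (c *metadataCache) snapshot() map[string]hostMetadata {
	c.mu.Lock()
	defer c.mu.Unlock()
	out := make(map[string]hostMetadata, len(c.byHost))
	for k, v := range c.byHost {
		out[k] = v
	}
	return out
}

func main() {
	c := newMetadataCache()
	c.onUpdate(hostMetadata{Hostname: "node-a", Labels: map[string]string{"zone": "a"}})
	<-c.flushCh
	fmt.Println(c.snapshot())
}
```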

Doubts

By getting the data into the proxy this way, we have no debouncing/rate-limiting to protect KP from restarting repeatedly if the updates turn out to be very chatty (say, a user has a script that writes arbitrary data to node labels).

If we agree there's a need for it, I can make the cache's sendAllUpdates rate-limited, filter the updates, gate label updates behind a FelixConfig option, etc.
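
If we do go for rate limiting, a stdlib-only sketch of what a throttled sendAllUpdates path could look like is below. throttle, in, and out are hypothetical names; the idea is simply to coalesce rapid-fire flush requests and forward at most one per interval:

```go
package main

import (
	"fmt"
	"time"
)

// throttle forwards "flush requested" signals from in to out at most
// once per interval, coalescing any extra requests that arrive between
// ticks.
func throttle(in <-chan struct{}, out chan<- struct{}, interval time.Duration) {
	pending := false
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case _, ok := <-in:
			if !ok {
				return
			}
			pending = true
		case <-ticker.C:
			if pending {
				pending = false
				out <- struct{}{}
			}
		}
	}
}

func main() {
	in := make(chan struct{}, 1)
	out := make(chan struct{}, 1)
	go throttle(in, out, 100*time.Millisecond)

	// Ten rapid-fire requests collapse into a single downstream flush.
	for i := 0; i < 10; i++ {
		select {
		case in <- struct{}{}:
		default:
		}
	}
	<-out
	fmt.Println("flushed once despite 10 requests")
}
```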

Related issues/PRs

#11202

Todos

  • Tests
  • Documentation
  • Release note

Release Note

TBD

Reminder for the reviewer

Make sure that this PR has the correct labels and milestone set.

Every PR needs one docs-* label.

  • docs-pr-required: This change requires a change to the documentation that has not been completed yet.
  • docs-completed: This change has all necessary documentation completed.
  • docs-not-required: This change has no user-facing impact and requires no docs.

Every PR needs one release-note-* label.

  • release-note-required: This PR has user-facing changes. Most PRs should have this label.
  • release-note-not-required: This PR has no user-facing changes.

Other optional labels:

  • cherry-pick-candidate: This PR should be cherry-picked to an earlier release. For bug fixes only.
  • needs-operator-pr: This PR is related to install and requires a corresponding change to the operator.

@aaaaaaaalex aaaaaaaalex self-assigned this Feb 10, 2026
@marvin-tigera marvin-tigera added this to the Calico v3.32.0 milestone Feb 10, 2026
@marvin-tigera marvin-tigera added the release-note-required and docs-pr-required labels Feb 10, 2026
p.hostMetadataByHostname = updates
if requestResync {
	// Invoke a sync via the runner, so that we can release any locks in this goroutine.
	p.syncDP()
Contributor

this seems reasonable 👍

"github.com/projectcalico/calico/felix/proto"
)

type HostMetadataCache struct {
Contributor

I think the cache in this setup is a bit of a one-off tool. I think this could be a generic manager-style plugin that would feed events of interest to KP. Perhaps preprocessed and sanitized. Perhaps batch them until CompleteDeferredWork and send them as a batch. That itself gives some throttling.

Since KP can decide on its own when to kick off the DP resync, I think it would be a more appropriate place to do any throttling, if need be.
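
For illustration, a minimal sketch of the manager-style batching being suggested here. The names (hostMetadataManager, kpSink) are hypothetical; only the OnUpdate/CompleteDeferredWork shape mirrors Felix's dataplane-manager pattern:

```go
package main

import "fmt"

// hostMetadataUpdate stands in for proto.HostMetadataV4V6Update.
type hostMetadataUpdate struct {
	Hostname string
	Labels   map[string]string
}

// kpSink is whatever callback kube-proxy registers to receive batches.
type kpSink func(map[string]hostMetadataUpdate)

// hostMetadataManager buffers updates as they arrive and hands KP one
// batch per dataplane apply cycle, which gives some natural throttling.
type hostMetadataManager struct {
	pending map[string]hostMetadataUpdate
	dirty   bool
	sink    kpSink
}

func newHostMetadataManager(sink kpSink) *hostMetadataManager {
	return &hostMetadataManager{pending: map[string]hostMetadataUpdate{}, sink: sink}
}

// OnUpdate is called from the dataplane loop for each message of interest.
func (m *hostMetadataManager) OnUpdate(msg interface{}) {
	if u, ok := msg.(hostMetadataUpdate); ok {
		m.pending[u.Hostname] = u
		m.dirty = true
	}
}

// CompleteDeferredWork flushes at most one batch per apply cycle.
func (m *hostMetadataManager) CompleteDeferredWork() error {
	if !m.dirty {
		return nil
	}
	batch := make(map[string]hostMetadataUpdate, len(m.pending))
	for k, v := range m.pending {
		batch[k] = v
	}
	m.dirty = false
	m.sink(batch)
	return nil
}

func main() {
	m := newHostMetadataManager(func(b map[string]hostMetadataUpdate) {
		fmt.Println("kube-proxy received a batch of", len(b), "hosts")
	})
	m.OnUpdate(hostMetadataUpdate{Hostname: "node-a"})
	m.OnUpdate(hostMetadataUpdate{Hostname: "node-b"})
	_ = m.CompleteDeferredWork() // one callback despite two updates
}
```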

Contributor Author

@aaaaaaaalex aaaaaaaalex Feb 13, 2026

Right now it's batching them until CompleteDeferredWork - but every update after CompleteDeferredWork is "immediate". Well, "immediately" requests a throttled flush, at least.

I found that batching with CompleteDeferredWork alone wasn't enough in the FVs to kill the chattiness of HostMetadata events.

We can throttle anywhere in theory, but I'm pretty hesitant to try doing the throttling in KP. I think it would make that file much more spaghetti, and I don't see a clean way to throttle the KP channel that sends the update struct, since that channel is also being used as a coalescing buffer.

My expectation is that, to maintain the current pattern in KP where the loop pulls in coalesced updates in parallel, we'd need some fancy tool on either the sending or the receiving side of the channel anyway. So a throttled manager that does the throttling cleanly, and also does the batching, seemed like the cleanest way.
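
To illustrate what I mean by the channel doubling as a coalescing buffer (hypothetical names, not the actual KP code): a capacity-1 channel where the sender replaces any stale, unread value with the newest one, so the receiver only ever sees the latest state. Any throttling would have to be layered on top of this on one side or the other.

```go
package main

import "fmt"

// coalescingSend tries to send v on a capacity-1 channel; if a stale
// value is still queued, it is dropped and replaced with the new one.
// With a single sender this converges on "receiver sees latest state".
func coalescingSend[T any](ch chan T, v T) {
	for {
		select {
		case ch <- v:
			return
		default:
			// Channel full: discard the stale value and retry.
			select {
			case <-ch:
			default:
			}
		}
	}
}

func main() {
	updates := make(chan string, 1)
	coalescingSend(updates, "snapshot-1")
	coalescingSend(updates, "snapshot-2") // replaces snapshot-1
	fmt.Println(<-updates)                // prints "snapshot-2"
}
```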

Contributor

> Right now it's batching them until CompleteDeferredWork - but every update after CompleteDeferredWork is "immediate". Well, "immediately" requests a throttled flush, at least.

I don't follow. CompleteDeferredWork is called "periodically", specifically so that we don't act on every update immediately.

Contributor

> We can throttle anywhere in theory, but I'm pretty hesitant to try doing the throttling in KP. I think it would make that file much more spaghetti, and I don't see a clean way to throttle the KP channel that sends the update struct, since that channel is also being used as a coalescing buffer.

Spaghetti? Whatever you make it ;-) I think the logic should be in KP. It may throttle some updates and not others; IMO it should make those decisions on its own. It may kick off the update together with service/endpoints updates, etc.

+	var ok bool
 	select {
-	case hostIPs, ok := <-kp.hostIPUpdates:
+	case hostIPs, ok = <-kp.hostIPUpdates:
Contributor

Is this now redundant, because it is also covered by the hostMetadata updates? (idk 100%)

Contributor Author

I'll need to check on that - these host IP updates were ultimately coming from interface updates IIRC, and I'm not sure if the same word is being used differently between that and HostMetadata updates 🤔
