Send HostMetadata to BPF KubeProxy #11817
aaaaaaaalex wants to merge 10 commits into projectcalico:master from
Conversation
```go
p.hostMetadataByHostname = updates
if requestResync {
	// Invoke a sync via the runner, so that we can release any locks in this goroutine.
	p.syncDP()
```
this seems reasonable 👍
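For readers skimming the thread, here is a minimal sketch of the pattern being approved, with invented names (`proxy`, `syncRequests`, `onUpdate`), not the actual Felix types: the update handler stores the new state under its lock and only signals the runner, so the sync itself happens after this goroutine has released its locks.

```go
package main

import "sync"

// Sketch only: these names are illustrative, not the actual Felix types.
type proxy struct {
	mu                     sync.Mutex
	hostMetadataByHostname map[string]string
	syncRequests           chan struct{}
}

// onUpdate stores the new state under the lock, then asks the runner to
// resync; the sync runs on the runner's goroutine, after the lock is released.
func (p *proxy) onUpdate(updates map[string]string) {
	p.mu.Lock()
	p.hostMetadataByHostname = updates
	p.mu.Unlock()
	p.syncDP()
}

// syncDP is a non-blocking signal: if a sync is already pending, the new
// request coalesces into it.
func (p *proxy) syncDP() {
	select {
	case p.syncRequests <- struct{}{}:
	default:
	}
}

func main() {
	p := &proxy{syncRequests: make(chan struct{}, 1)}
	p.onUpdate(map[string]string{"node-a": "zone=a"})
	<-p.syncRequests // the runner goroutine would pick this up and resync
}
```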
| "github.com/projectcalico/calico/felix/proto" | ||
| ) | ||
|
|
||
| type HostMetadataCache struct { |
I think the cache in this setup is quite a one-off tool. I think this could be a generic manager-style plugin that feeds events of interest to KP. Perhaps preprocessed and sanitized. Perhaps batch them until CompleteDeferredWork and send them as a batch. That itself gives some throttling.
Since KP can decide on its own when to kick the DP resync, I think it would be the more appropriate place to do any throttling, if need be.
Right now it is batching them until CompleteDeferredWork - but every update after CompleteDeferredWork is "immediate". Well, it "immediately" requests a throttled flush, at least.
I found that batching with CompleteDeferredWork alone was not enough in the FVs to kill the chattiness of HostMetadata events.
We can throttle anywhere in theory, but I'm pretty hesitant to try doing the throttling in KP. I think it'd make that file much more spaghetti - and I don't see a clean way to throttle the KP channel that sends the update struct, since that channel is also being used as a coalescing buffer.
My expectation is that, to maintain the current pattern in KP where the loop pulls in coalesced updates in parallel, we'd need some fancy tooling on either the sending or receiving side of that channel anyway. So having a throttled manager that does the throttling cleanly, and also does the batching, seemed like the cleanest way.
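To make the "throttled flush request" idea concrete, here is a rough sketch with invented names (`throttledFlusher`, `RequestFlush`), not the PR's actual code: callers may request a flush at any rate, but the flush runs at most once per `minInterval`, on the trailing edge.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// throttledFlusher: callers may call RequestFlush at any rate, but flush
// runs at most once per minInterval, on the trailing edge.
type throttledFlusher struct {
	mu          sync.Mutex
	timer       *time.Timer
	lastFlush   time.Time
	minInterval time.Duration
	flush       func()
}

func (t *throttledFlusher) RequestFlush() {
	t.mu.Lock()
	defer t.mu.Unlock()
	if t.timer != nil {
		return // a flush is already scheduled; coalesce into it
	}
	delay := t.minInterval - time.Since(t.lastFlush)
	if delay < 0 {
		delay = 0
	}
	t.timer = time.AfterFunc(delay, func() {
		t.mu.Lock()
		t.timer = nil
		t.lastFlush = time.Now()
		t.mu.Unlock()
		t.flush()
	})
}

func main() {
	tf := &throttledFlusher{
		minInterval: 100 * time.Millisecond,
		flush:       func() { fmt.Println("flush") },
	}
	for i := 0; i < 10; i++ {
		tf.RequestFlush() // ten chatty requests...
	}
	time.Sleep(200 * time.Millisecond) // ...coalesce into far fewer flushes
}
```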
> Right now it is batching them until CompleteDeferredWork - but every update after CompleteDeferredWork is "immediate". Well, it "immediately" requests a throttled flush, at least.

I don't follow. DeferredWork is called "periodically", specifically so as not to act on every update immediately.
> We can throttle anywhere in theory, but I'm pretty hesitant to try doing the throttling in KP. I think it'd make that file much more spaghetti - and I don't see a clean way to throttle the KP channel that sends the update struct, since that channel is also being used as a coalescing buffer.

Spaghetti? Whatever you make it ;-) I think the logic should be in KP. It may throttle some updates and not throttle others; imo it should make those decisions on its own. It may kick off the update together with service/endpoints updates, etc.
```diff
+	var ok bool
 	select {
-	case hostIPs, ok := <-kp.hostIPUpdates:
+	case hostIPs, ok = <-kp.hostIPUpdates:
```
Is this now redundant, because it's also covered by the hostMetadata updates? (idk 100%)
I'll need to check on that - these host IP updates were ultimately coming from interface updates IIRC, and I'm not sure if the same word is being used differently between that and HostMetadata updates 🤔
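For context on the diff above, here is a minimal, self-contained illustration of the Go scoping rule that motivates the `var ok bool` / `=` change (this is not the actual kube-proxy code): `:=` inside a `case` clause declares fresh variables scoped to that clause, shadowing any `ok` declared outside the `select`.

```go
package main

import "fmt"

func main() {
	ch := make(chan int, 1)

	ch <- 42
	var ok bool
	select {
	case v, ok := <-ch: // `:=` declares a NEW `ok`, scoped to this case
		fmt.Println(v, ok) // 42 true
	}
	fmt.Println(ok) // false: the outer `ok` was shadowed and never assigned

	ch <- 7
	var v int
	select {
	case v, ok = <-ch: // `=` assigns to the outer variables instead
		fmt.Println(v, ok) // 7 true
	}
	fmt.Println(v, ok) // 7 true: visible after the select, as the diff needs
}
```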
Description
Exposes host metadata (e.g. labels) to BPF KP by registering a cache in the dataplane, which calls a KP callback.
The cache pools individual HostMetadataV4V6Updates into an aggregated update, and is thread-safe with respect to the KP goroutine, since it uses a channel to queue new updates/restarts in the main KP run method.
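As a rough illustration of the coalescing-channel idea (names are illustrative, not the actual proxy code): a capacity-1 channel lets the sender replace a stale, unconsumed aggregate with the newest one, so the KP run loop only ever acts on the latest snapshot.

```go
package main

import "fmt"

// coalescingSend is a sketch of using a capacity-1 channel as a coalescing
// buffer: it delivers u, replacing any stale update the receiver has not
// yet consumed.
func coalescingSend(updates chan map[string]string, u map[string]string) {
	for {
		select {
		case updates <- u:
			return // delivered
		default:
			// Buffer full: discard the stale update, then retry.
			select {
			case <-updates:
			default: // the receiver grabbed it first; just retry
			}
		}
	}
}

func main() {
	updates := make(chan map[string]string, 1)
	coalescingSend(updates, map[string]string{"node-a": "zone=a"})
	coalescingSend(updates, map[string]string{"node-a": "zone=b"}) // replaces the first
	fmt.Println(<-updates) // the run loop only sees the newest snapshot
}
```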
Doubts

By getting the data into the proxy this way, we have no debouncing/rate-limiting to protect KP from repeatedly restarting in the event that the updates are very chatty (say, a user has a script that writes arbitrary data to node labels).
I can augment the cache's sendAllUpdates to be rate-limited if we agree there's a need for it, or I can also filter updates, gate label updates behind a FelixConfig, etc.
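One way the rate limiting could look, sketched with a token bucket from golang.org/x/time/rate; the `rateLimitedSender` wiring and the one-per-second rate below are hypothetical, not part of this PR:

```go
package main

import (
	"context"
	"fmt"
	"time"

	"golang.org/x/time/rate"
)

// rateLimitedSender gates the underlying sendAllUpdates behind a
// token-bucket limiter; names and rates are illustrative only.
type rateLimitedSender struct {
	limiter *rate.Limiter
	send    func() // the underlying sendAllUpdates
}

func newRateLimitedSender(send func()) *rateLimitedSender {
	return &rateLimitedSender{
		// At most one full flush per second, with a burst of one.
		limiter: rate.NewLimiter(rate.Every(time.Second), 1),
		send:    send,
	}
}

// sendAllUpdates blocks until the limiter grants a token, so a storm of
// chatty label updates degrades to roughly one flush per second.
func (s *rateLimitedSender) sendAllUpdates(ctx context.Context) error {
	if err := s.limiter.Wait(ctx); err != nil {
		return err
	}
	s.send()
	return nil
}

func main() {
	s := newRateLimitedSender(func() { fmt.Println("sendAllUpdates") })
	ctx := context.Background()
	for i := 0; i < 3; i++ {
		_ = s.sendAllUpdates(ctx) // paced to ~1/s regardless of call rate
	}
}
```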
Related issues/PRs

#11202
Todos
Release Note
Reminder for the reviewer
Make sure that this PR has the correct labels and milestone set.
Every PR needs one docs-* label.
- docs-pr-required: This change requires a change to the documentation that has not been completed yet.
- docs-completed: This change has all necessary documentation completed.
- docs-not-required: This change has no user-facing impact and requires no docs.

Every PR needs one release-note-* label.
- release-note-required: This PR has user-facing changes. Most PRs should have this label.
- release-note-not-required: This PR has no user-facing changes.

Other optional labels:
- cherry-pick-candidate: This PR should be cherry-picked to an earlier release. For bug fixes only.
- needs-operator-pr: This PR is related to install and requires a corresponding change to the operator.