-
Notifications
You must be signed in to change notification settings - Fork 125
OCPBUGS-62269 - Fix race condition causing missing audit log entries for rapid commands #3052
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected, please follow our release note process to remove it. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: BhargaviGudi The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
Hi @BhargaviGudi. Thanks for your PR. I'm waiting for a github.com member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #3052 +/- ##
===========================================
- Coverage 45.50% 24.18% -21.32%
===========================================
Files 79 125 +46
Lines 7782 17818 +10036
===========================================
+ Hits 3541 4309 +768
- Misses 4099 13225 +9126
- Partials 142 284 +142 🚀 New features to boost your workflow:
|
|
/ok-to-test |
96a2136 to
69ba88b
Compare
|
/retest-required |
|
@BhargaviGudi do you mind a rebase? |
| e.retryBpfRequestUID(logBucket) | ||
|
|
||
| // Retry container info lookup if it's still missing | ||
| e.retryContainerInfo(ctx, logBucket, processID, nodeName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@BhargaviGudi Thanks for the PR. Can you try to test if e.retryContainerInfo(ctx, logBucket, processID, nodeName) alone can fix the issue without the retryBpfCmdLine and retryBpfRequestUID? I feel missing container info is the cause of the issue.
…for rapid commands OCPBUGS-62269 - Fix GitHub Actions linting issues OCPBUGS-62269 - Fix GitHub Actions linting issues
69ba88b to
613d628
Compare
What type of PR is this?
/kind bug
What this PR does / why we need it:
This PR fixes a race condition in the JSON enricher that causes audit log entries to be missing or incomplete when commands are executed in rapid succession (e.g.,
touch test1 && touch test2 && touch test3).Problem:
When short-lived processes execute quickly, audit events arrive at the JSON enricher before the corresponding BPF events have been processed from the ring buffer. This results in:
cmdLinefieldsrequestUIDresourceinformation (pod/namespace/container)Solution:
Implements a retry mechanism in
dispatchSeccompLine()that attempts to fetch missing information multiple times with progressive delays (0ms, 10ms, 50ms):Since LogBuckets remain in cache for 60 seconds before being written to the audit log, there's ample time for retries to succeed while BPF events are processed asynchronously.
Testing Results:
Which issue(s) this PR fixes:
Fixes https://issues.redhat.com/browse/OCPBUGS-62269
Does this PR have test?
Yes - The fix was validated through:
oc execandoc rsh)No new automated tests added as this is a timing-dependent race condition that's difficult to reliably reproduce in unit tests.
Special notes for your reviewer:
jsonenricher.go- no modifications to BPF components or ring buffer configurationDoes this PR introduce a user-facing change?