fix: support --inject-cgroup and sidecar stress modes on containerd #304
Merged
alexei-led merged 10 commits into master on Mar 1, 2026
…untime
The containerd StressContainer() method was silently ignoring the --inject-cgroup, --stress-image, and --pull-image flags and falling back to direct exec mode.
Implement three stress modes matching the Docker runtime:
- Direct exec (no --stress-image): existing behaviour, runs stress-ng inside target
- Default sidecar (--stress-image): /stress-ng in a child cgroup of the target
- Inject-cgroup (--inject-cgroup --stress-image): /cg-inject with host cgroupns and /sys/fs/cgroup mount to inject into target's cgroup
Resolves #303
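The three-mode selection described in this commit can be sketched as a small pure function. This is only an illustration of the dispatch rules, not the code from the PR: `stressMode`, its constants, and `selectStressMode` are hypothetical names, and the image string in `main` is a placeholder.

```go
package main

import "fmt"

// stressMode is a hypothetical enum used only for this illustration.
type stressMode int

const (
	modeDirectExec   stressMode = iota // run stress-ng inside the target container
	modeSidecar                        // /stress-ng sidecar in a child cgroup of the target
	modeInjectCgroup                   // /cg-inject sidecar with host cgroupns and /sys/fs/cgroup mount
)

// String returns a readable name for the mode.
func (m stressMode) String() string {
	switch m {
	case modeDirectExec:
		return "direct-exec"
	case modeSidecar:
		return "sidecar"
	case modeInjectCgroup:
		return "inject-cgroup"
	default:
		return "unknown"
	}
}

// selectStressMode mirrors the rules from the commit message: an empty image
// means direct exec; otherwise injectCgroup picks the sidecar flavour.
func selectStressMode(image string, injectCgroup bool) stressMode {
	switch {
	case image == "":
		return modeDirectExec
	case injectCgroup:
		return modeInjectCgroup
	default:
		return modeSidecar
	}
}

func main() {
	// "example/stress:latest" is a placeholder image name.
	fmt.Println(selectStressMode("", false))
	fmt.Println(selectStressMode("example/stress:latest", false))
	fmt.Println(selectStressMode("example/stress:latest", true))
}
```

Note that under these rules --inject-cgroup without --stress-image still resolves to direct exec, since the image check comes first.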
Test Results
546 tests, 544 ✅, 2s ⏱️
Results for commit f79f571.
Codecov Report
❌ Patch coverage is …
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #304      +/-   ##
==========================================
- Coverage   70.15%   69.28%   -0.87%
==========================================
  Files          46       46
  Lines        1933     2051     +118
==========================================
+ Hits         1356     1421      +65
- Misses        577      630      +53
- Use strings.SplitSeq (modernize linter)
- Use strings.HasPrefix in stressPrefix matcher
- Wrap resolveCgroupPath error with context
- Convert resolveCgroupPath tests to table-driven
- Combine buildStressSpecOpts tests into table-driven
- Extract newSidecarWithExitCode helper, deduplicate NonZeroExit test
- Standardize require.NoError in SidecarDryrun test
- Replace sleep 5 with wait_for polling in bats setup
- Add snapshot cleanup in bats teardown
- Use default-and-override for cgroupParent assignment
- Add com.gaiaadm.pumba.skip label to stress sidecar containers to prevent pumba from targeting its own sidecars (parity with Docker)
- Use two-value receive on waitCh to detect unexpected channel close
- Fix off-by-one capacity in inject-cgroup args builder (+4 -> +5)
- Remove double-wrapped pullImage error message
- Rename cleanupContainer to deleteContainer (clarify vs cleanupSidecar)
- Add startSidecarTask error path tests (NewTask, Wait, Start failures)
- Fold ReadError into table-driven TestResolveCgroupPath
- Convert stressPrefix from package-level var to function
- Add log assertion to child-cgroup bats integration test
- Update mock expectations for WithContainerLabels option
- Add select with ctx.Done() in waitStressSidecar to prevent goroutine
leak when containerd becomes unresponsive; force-kills task on cancel
- Swap pull/cgroup-resolution order for fail-fast parity with Docker
- Truncate raw cgroup data in error message for log cleanliness
- Replace READER_ERROR sentinel with readerErr struct field in table test
- Remove duplicate SidecarDryrun test (already covered by Dryrun test)
- Consolidate SidecarSuccess, InjectCgroupSuccess, SidecarNonZeroExit
into single table-driven TestStressContainer_SidecarExitCodes
- Extract setupStressTarget helper to reduce preamble duplication
- Add setCgroupReader to PullImageError test for ordering robustness
- Change interface{} to any in stressPrefix() return type
- Add kill timeout on post-kill waitCh receive to prevent goroutine hang
- Add nil-check after GetImage to prevent panic on nil image
- Reuse deleteContainer in waitStressSidecar to eliminate duplicated cleanup
- Add test for context-cancellation path in waitStressSidecar
- Add test for closed wait-channel (ok=false) path
- Expand TestBuildStressSpecOpts to verify spec content, not just length
- Remove redundant wantSuccess field from SidecarExitCodes table
- Add AssertExpectations to hand-rolled mocks in all new tests
- Document which parameter each branch uses in buildStressSpecOpts
- Simplify deleteContainer to trust caller's context instead of double-wrapping
- Remove dead nil-image guard after GetImage
- Change rbind to bind for /sys/fs/cgroup mount (Docker parity)
- Wrap pullImage error for better diagnostics
- Replace fragile injectFixedArgs constant with prefix slice
- Log non-NotFound kill errors instead of silently discarding
- Thread *testing.T through newSidecarWithExitCode for AssertExpectations
- Add TestStressContainer_SidecarCgroupResolveError test
- Remove test comments per project convention
- Add killTimedOut sentinel so kill-timeout path reports accurate error instead of misleading "wait channel closed unexpectedly"
- Replace defer cancel() with direct cancel() in createStressSidecar error path for clarity and immediate timer release
- Move defer cancel() after cleanup calls in waitStressSidecar to release context timer immediately after use
- Flatten cleanupSidecar: guard clause for task error, collapse double-if on container delete to single && condition
- Rewrite createStressSidecar godoc to be specific
- Remove obvious paraphrasing comments from bats integration tests
- Use context.WithoutCancel in startSidecarTask error paths so task cleanup Delete calls succeed even if caller context is canceled
- Add nosuid,nodev,noexec to /sys/fs/cgroup bind mount options to reduce attack surface on the inject-cgroup sidecar
- Flatten cleanupSidecar: replace else-if with separate if statements
- Trim deleteContainer godoc to remove name-restating first sentence
- Remove obvious bats comment before sidecar dry-run assertion
containerd does not auto-create child cgroups like Docker's --cgroup-parent. Setting CgroupsPath to the parent (e.g. /system.slice) fails with "container's cgroup is not empty". Add cgroupChildPath() that mirrors containerd CRI's heuristic: systemd drivers use "slice:pumba:id" format, cgroupfs uses "parent/id".
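The two path formats described above can be sketched as follows. The function name comes from the commit message, but the body is an assumption: in particular, detecting the systemd driver by a ".slice" suffix on the parent is a guess at the heuristic, not the confirmed implementation.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// cgroupChildPath returns a cgroups path for a sidecar nested under parent.
// Assumption for this sketch: a parent whose last element ends in ".slice"
// indicates the systemd cgroup driver; anything else is treated as cgroupfs.
func cgroupChildPath(parent, id string) string {
	base := path.Base(parent)
	if strings.HasSuffix(base, ".slice") {
		// systemd driver: "slice:prefix:name"
		return strings.Join([]string{base, "pumba", id}, ":")
	}
	// cgroupfs driver: plain nested directory
	return path.Join(parent, id)
}

func main() {
	// "sidecar-1" is a placeholder container ID.
	fmt.Println(cgroupChildPath("/system.slice", "sidecar-1"))
	fmt.Println(cgroupChildPath("/kubepods/burstable", "sidecar-1"))
}
```

Either form gives the sidecar its own leaf cgroup under the target's parent, which avoids the "container's cgroup is not empty" failure from reusing the parent path directly.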
Images are pre-pulled in setup() via ctr_pull_image, but pumba still contacted the registry at runtime. When ghcr.io is slow, this causes context canceled errors (flaky CI failures on tests 53-55). Match the pattern used by all Docker integration tests.
Summary
- --inject-cgroup, --stress-image, and --pull-image flags were silently ignored on containerd; they now work correctly
- Resolves #303
Changes
pkg/runtime/containerd/sidecar.go
- resolveCgroupPath(): reads /proc/<pid>/cgroup to resolve target's cgroup path (v2 + v1 fallback)
- buildStressSpecOpts(): builds OCI spec for default vs inject-cgroup sidecar modes
- stressSidecar(): creates a long-lived sidecar running stress-ng/cg-inject as its main process
- createStressSidecar() / startSidecarTask() / waitStressSidecar() helpers (extracted for cyclomatic complexity)
pkg/runtime/containerd/client.go
- StressContainer() with three-mode dispatch:
  - image == "" → direct exec (existing behavior)
  - image != "" && !injectCgroup → sidecar with /stress-ng, CgroupParent = target's parent
  - image != "" && injectCgroup → sidecar with /cg-inject, host cgroupns, /sys/fs/cgroup mount
pkg/runtime/containerd/client_test.go
- resolveCgroupPath, buildStressSpecOpts, and all sidecar stress code paths
tests/containerd_stress.bats
Test plan
- make fmt: no changes
- make lint: 0 issues
- CGO_ENABLED=0 go test ./...: all tests pass
- colima ssh -- sudo bats tests/containerd_stress.bats: containerd integration tests
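For reference, the /proc/<pid>/cgroup parsing that resolveCgroupPath() is described as doing (v2 first, v1 fallback) could be sketched like this. `parseCgroupPath` is a hypothetical helper that works on the file contents rather than reading the file, and falling back to the first v1 entry is an assumption; the PR may select a specific controller.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// parseCgroupPath extracts a cgroup path from the contents of /proc/<pid>/cgroup.
// It prefers the cgroup v2 unified entry ("0::<path>") and otherwise falls back
// to the first cgroup v1 entry ("<id>:<controllers>:<path>").
// Picking the first v1 entry is an assumption of this sketch.
func parseCgroupPath(data string) (string, error) {
	var v1Path string
	for _, line := range strings.Split(data, "\n") {
		line = strings.TrimSpace(line)
		if line == "" {
			continue
		}
		parts := strings.SplitN(line, ":", 3)
		if len(parts) != 3 {
			continue // skip malformed lines
		}
		if parts[0] == "0" && parts[1] == "" {
			return parts[2], nil // cgroup v2 unified hierarchy
		}
		if v1Path == "" {
			v1Path = parts[2]
		}
	}
	if v1Path == "" {
		return "", errors.New("no cgroup path found")
	}
	return v1Path, nil
}

func main() {
	// Sample contents; the scope name is a placeholder.
	p, err := parseCgroupPath("0::/system.slice/example.scope\n")
	fmt.Println(p, err)
}
```

Keeping the parser separate from the file read makes it easy to cover with table-driven tests, in the spirit of the TestResolveCgroupPath changes listed in the commits.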