feat(logger): add rate limiter by kalyazin · Pull Request #5799 · firecracker-microvm/firecracker

kalyazin · 2026-03-27T11:52:09Z

Changes

Add per-callsite rate limiting for guest-triggered logging paths, following the Linux kernel printk_ratelimited pattern. The error_rate_limited! macro gives each callsite its own independent, preconfigured rate limiter set to 10 messages per 5-second window. When messages are suppressed, a summary is emitted once the callsite resumes logging. A new rate_limited_log_count metric tracks total suppressions.

I was not able to build an integration test that demonstrates that the rate limiting is effective against a real end-to-end scenario because it would've required a custom guest kernel, but I ran an ad hoc experiment by inserting an extra error_rate_limited! line into the balloon
inflate descriptor processing loop (hot path) and saw that it was rate-limited from 128 lines to 10 as expected.

Reason

Guest VMs can trigger repeated error!() calls through various virtio device paths (balloon, net, block, PCI, MMIO). Under sustained error conditions, this leads to excessive disk I/O and CPU consumption on the host from synchronous log writes.

License Acceptance

By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache 2.0 license. For more information on following Developer
Certificate of Origin and signing off your commits, please check
CONTRIBUTING.md.

PR Checklist

This functionality cannot be added in rust-vmm.

codecov · 2026-03-27T11:57:44Z

Codecov Report

❌ Patch coverage is 41.86047% with 150 lines in your changes missing coverage. Please review.
✅ Project coverage is 82.99%. Comparing base (15f2ad8) to head (d5835aa).

Files with missing lines	Patch %	Lines
...rc/vmm/src/devices/virtio/balloon/event_handler.rs	0.00%	12 Missing ⚠️
src/vmm/src/devices/virtio/vsock/event_handler.rs	20.00%	12 Missing ⚠️
src/vmm/src/devices/virtio/vsock/unix/muxer.rs	7.69%	12 Missing ⚠️
src/vmm/src/devices/virtio/net/event_handler.rs	0.00%	11 Missing ⚠️
src/vmm/src/devices/virtio/vsock/device.rs	28.57%	10 Missing ⚠️
src/vmm/src/devices/virtio/block/virtio/device.rs	30.76%	9 Missing ⚠️
...m/src/devices/virtio/block/virtio/event_handler.rs	0.00%	9 Missing ⚠️
src/vmm/src/devices/virtio/pmem/device.rs	0.00%	8 Missing ⚠️
src/vmm/src/devices/virtio/rng/event_handler.rs	0.00%	8 Missing ⚠️
...c/devices/virtio/block/vhost_user/event_handler.rs	0.00%	7 Missing ⚠️
... and 14 more

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #5799      +/-   ##
==========================================
- Coverage   83.00%   82.99%   -0.02%     
==========================================
  Files         275      276       +1     
  Lines       29396    29421      +25     
==========================================
+ Hits        24401    24417      +16     
- Misses       4995     5004       +9

Flag	Coverage Δ
5.10-m5n.metal	`83.26% <41.63%> (-0.02%)`	⬇️
5.10-m6a.metal	`82.58% <41.63%> (-0.02%)`	⬇️
5.10-m6g.metal	`79.89% <42.16%> (-0.01%)`	⬇️
5.10-m6i.metal	`83.26% <41.63%> (-0.02%)`	⬇️
5.10-m7a.metal-48xl	`82.57% <41.63%> (-0.02%)`	⬇️
5.10-m7g.metal	`79.89% <42.16%> (-0.01%)`	⬇️
5.10-m7i.metal-24xl	`83.23% <41.63%> (-0.02%)`	⬇️
5.10-m7i.metal-48xl	`83.22% <41.63%> (-0.03%)`	⬇️
5.10-m8g.metal-24xl	`79.89% <42.16%> (-0.01%)`	⬇️
5.10-m8g.metal-48xl	`79.89% <42.16%> (-0.01%)`	⬇️
5.10-m8i.metal-48xl	`83.22% <41.63%> (-0.02%)`	⬇️
5.10-m8i.metal-96xl	`83.22% <41.63%> (-0.03%)`	⬇️
6.1-m5n.metal	`83.28% <41.63%> (-0.02%)`	⬇️
6.1-m6a.metal	`82.61% <41.63%> (-0.02%)`	⬇️
6.1-m6g.metal	`79.88% <42.16%> (-0.02%)`	⬇️
6.1-m6i.metal	`83.28% <41.63%> (-0.02%)`	⬇️
6.1-m7a.metal-48xl	`82.59% <41.63%> (-0.02%)`	⬇️
6.1-m7g.metal	`79.89% <42.16%> (-0.01%)`	⬇️
6.1-m7i.metal-24xl	`83.29% <41.63%> (-0.03%)`	⬇️
6.1-m7i.metal-48xl	`83.29% <41.63%> (-0.02%)`	⬇️
6.1-m8g.metal-24xl	`79.89% <42.16%> (-0.01%)`	⬇️
6.1-m8g.metal-48xl	`79.89% <42.16%> (-0.01%)`	⬇️
6.1-m8i.metal-48xl	`83.30% <41.63%> (-0.02%)`	⬇️
6.1-m8i.metal-96xl	`83.30% <41.63%> (-0.02%)`	⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

src/vmm/src/devices/virtio/balloon/device.rs

src/vmm/src/logger/rate_limited.rs

src/vmm/src/logger/mod.rs

CHANGELOG.md

src/vmm/src/logger/mod.rs

src/vmm/src/logger/rate_limited.rs

ilstam · 2026-04-01T16:26:18Z

src/vmm/src/logger/rate_limited.rs

+use crate::rate_limiter::TokenBucket;
+
+/// Maximum number of messages allowed per refill period.
+pub const DEFAULT_BURST: u64 = 10;


Is 10 message per 5 seconds overly conservative?

Ah, it's per callsite

Add a per-callsite rate limiter for logging that reuses the existing TokenBucket implementation. LogRateLimiter wraps TokenBucket in OnceLock<Mutex<...>> for lazy initialization and thread safety. The check() method calls TokenBucket::reduce(1) to consume a token and returns whether the message should be logged. Default configuration: 10 messages per 5-second refill period. Includes unit tests for burst enforcement, callsite independence, and token refill after the refill period. Signed-off-by: Nikita Kalyazin <kalyazin@amazon.com>

Define error_rate_limited!, warn_rate_limited!, and info_rate_limited! macros via a shared __log_rate_limited_impl! helper. Each macro first checks log_enabled! at the appropriate level to avoid rate limiter overhead for filtered-out messages. If the level is enabled, each invocation site creates an independent static LogRateLimiter and AtomicU64 suppression counter. Allowed messages are emitted normally; suppressed messages increment the counter and the global rate_limited_log_count metric. When logging resumes, a warn-level summary reports the suppression count. Add rate_limited_log_count field to LoggerSystemMetrics and to the fcmetrics.py validation schema. Re-export all rate-limited macros via crate::logger for consistency with other log macros. Signed-off-by: Nikita Kalyazin <kalyazin@amazon.com>

Migrate error!, warn!, and info! log callsites in device and VMM paths to their rate-limited equivalents. debug! calls are left unchanged as they are disabled in release Firecracker builds and the log_enabled! check in the rate-limited macros already ensures zero overhead for disabled levels. Covers all virtio devices (balloon, net, block, rng, vsock, pmem, mem), transport layers (MMIO, PCI), vCPU exit handling, memory management, and the I/O rate limiter. Each callsite independently rate-limits to 10 messages per 5-second refill period. Signed-off-by: Nikita Kalyazin <kalyazin@amazon.com>

Add an entry under the Unreleased section documenting per-callsite rate limiting for error, warn, and info level log messages and the new rate_limited_log_count metric. Signed-off-by: Nikita Kalyazin <kalyazin@amazon.com>

Manciukic · 2026-04-02T15:14:57Z

src/vmm/src/arch/aarch64/vcpu.rs

 use crate::cpu_config::aarch64::custom_cpu_template::VcpuFeatures;
 use crate::cpu_config::templates::CpuConfiguration;
-use crate::logger::{IncMetric, METRICS, error};
+use crate::logger::{IncMetric, METRICS, error_rate_limited};


can we make crate::logger::error be the rate limited one and ensure we're not using log::error directly?

Clippy can be configured to check it with clippy::disallowed_macros and

disallowed-macros = [ { path = "log::error", reason = "use crate::logger::error! instead" }, { path = "log::warn", reason = "use crate::logger::warn! instead" }, { path = "log::info", reason = "use crate::logger::info! instead" }, ]

I think it is worth keeping the _rate_limited suffix for all rate limited logs for consistency
disallowed_macros looks interesting though

Manciukic · 2026-04-02T15:23:16Z

src/vmm/src/logger/mod.rs

+            static LIMITER: $crate::logger::rate_limited::LogRateLimiter =
+                $crate::logger::rate_limited::LogRateLimiter::new();
+            static SUPPRESSED: std::sync::atomic::AtomicU64 =
+                std::sync::atomic::AtomicU64::new(0);


this seems to be 88 bytes per callsite, so roughly 14KiB (with roughly 150 callsites) in .bss. I think it's acceptable, just noting it.

In the future we can shrink TokenBucket to 48 bytes if we replace all u64 to u32 and then here we will get nice 64 bytes per call.
But here maybe switching to AtomicU32/U16 maybe worth since suppressing 64K logs wold be pretty uncommon + we only track this value as a metadata, so even overflow is not an issue.

ShadowCurse · 2026-04-02T15:31:30Z

src/vmm/src/logger/rate_limited.rs

+    pub const fn new() -> Self {
+        Self {
+            inner: OnceLock::new(),
+        }
+    }


can this take burst and refil_time as args? And then the Default can be implement with new(DEFAULT_BURST, ...)
This way it can be configured if needed, like in the unit test that waits for 5 seconds for no reason.

ShadowCurse · 2026-04-02T15:38:19Z

src/vmm/src/logger/mod.rs

+            static LIMITER: $crate::logger::rate_limited::LogRateLimiter =
+                $crate::logger::rate_limited::LogRateLimiter::new();
+            static SUPPRESSED: std::sync::atomic::AtomicU64 =
+                std::sync::atomic::AtomicU64::new(0);


In the future we can shrink TokenBucket to 48 bytes if we replace all u64 to u32 and then here we will get nice 64 bytes per call.
But here maybe switching to AtomicU32/U16 maybe worth since suppressing 64K logs wold be pretty uncommon + we only track this value as a metadata, so even overflow is not an issue.

ShadowCurse · 2026-04-02T15:49:31Z

src/vmm/src/arch/aarch64/vcpu.rs

 use crate::cpu_config::aarch64::custom_cpu_template::VcpuFeatures;
 use crate::cpu_config::templates::CpuConfiguration;
-use crate::logger::{IncMetric, METRICS, error};
+use crate::logger::{IncMetric, METRICS, error_rate_limited};


I think it is worth keeping the _rate_limited suffix for all rate limited logs for consistency
disallowed_macros looks interesting though

kalyazin force-pushed the log_rate_limiter branch from 2325c61 to 3ddd1f5 Compare March 27, 2026 11:52

kalyazin force-pushed the log_rate_limiter branch 2 times, most recently from 0240225 to eb60521 Compare March 27, 2026 12:33

kalyazin marked this pull request as ready for review March 27, 2026 12:33

kalyazin requested review from Manciukic and pb8o as code owners March 27, 2026 12:33

kalyazin self-assigned this Mar 27, 2026

kalyazin added the Status: Awaiting review Indicates that a pull request is ready to be reviewed label Mar 27, 2026

ShadowCurse reviewed Mar 27, 2026

View reviewed changes

src/vmm/src/devices/virtio/balloon/device.rs Outdated Show resolved Hide resolved

src/vmm/src/logger/rate_limited.rs Outdated Show resolved Hide resolved

ShadowCurse reviewed Mar 27, 2026

View reviewed changes

src/vmm/src/logger/rate_limited.rs Outdated Show resolved Hide resolved

Manciukic reviewed Mar 27, 2026

View reviewed changes

src/vmm/src/logger/rate_limited.rs Show resolved Hide resolved

Manciukic reviewed Mar 27, 2026

View reviewed changes

src/vmm/src/logger/rate_limited.rs Show resolved Hide resolved

kalyazin force-pushed the log_rate_limiter branch 3 times, most recently from 531998b to 80580f3 Compare March 30, 2026 14:40

ShadowCurse reviewed Mar 30, 2026

View reviewed changes

src/vmm/src/logger/mod.rs Outdated Show resolved Hide resolved

ShadowCurse reviewed Mar 30, 2026

View reviewed changes

CHANGELOG.md Outdated Show resolved Hide resolved

kalyazin force-pushed the log_rate_limiter branch from 80580f3 to b795a7b Compare March 30, 2026 15:47

Manciukic reviewed Mar 30, 2026

View reviewed changes

src/vmm/src/logger/mod.rs Show resolved Hide resolved

src/vmm/src/logger/rate_limited.rs Show resolved Hide resolved

ilstam reviewed Apr 1, 2026

View reviewed changes

kalyazin added 3 commits April 2, 2026 11:11

kalyazin force-pushed the log_rate_limiter branch from b795a7b to 0514643 Compare April 2, 2026 11:17

changelog: add entry for per-callsite rate-limited logging

d5835aa

Add an entry under the Unreleased section documenting per-callsite rate limiting for error, warn, and info level log messages and the new rate_limited_log_count metric. Signed-off-by: Nikita Kalyazin <kalyazin@amazon.com>

kalyazin force-pushed the log_rate_limiter branch from 0514643 to d5835aa Compare April 2, 2026 11:49

Manciukic reviewed Apr 2, 2026

View reviewed changes

ShadowCurse reviewed Apr 2, 2026

View reviewed changes

Conversation

kalyazin commented Mar 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Changes

Reason

License Acceptance

PR Checklist

Uh oh!

codecov bot commented Mar 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

ilstam Apr 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

kalyazin commented Mar 27, 2026 •

edited

Loading

codecov bot commented Mar 27, 2026 •

edited

Loading

ilstam Apr 1, 2026 •

edited

Loading