feat(client): reload in-cluster CA bundle on rotation (rustls-tls) #1962

Open
chrnorm wants to merge 1 commit into kube-rs:main from chrnorm:incluster-ca-reload

@chrnorm commented Mar 17, 2026

Motivation

Config::incluster() reads /var/run/secrets/kubernetes.io/serviceaccount/ca.crt once at startup and bakes the bytes into a RootCertStore. After the cluster CA rotates, new TLS handshakes fail with certificate verification errors until the process restarts. The projected service account volume already swaps the file in place; kube just never re-reads it.

TokenFile already solves the symmetric problem for the sibling token file in the same projected volume (re-reads every 60s). This PR adds the same treatment for ca.crt.

Closes #1953. Related client-go issue: kubernetes/kubernetes#119483.

Solution

  • Config.root_cert_file: Option<PathBuf> — new field, set automatically by Config::incluster(). Takes precedence over root_cert for server cert verification when set.
  • ReloadingVerifier — a ServerCertVerifier that rebuilds an inner WebPkiServerVerifier on a ~60s timer. On reload failure it keeps serving with the stale roots rather than failing closed.
  • rustls-tls only — openssl-tls path is unchanged.
  • Config is now #[non_exhaustive] — per review feedback on the issue, so future field additions don't break downstream struct literals again. Users who were constructing Config { ... } directly need to switch to Config::new() + field mutation (already recommended by the existing docs).
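
For readers who want the shape of the reload logic without the rustls types, here is a minimal std-only sketch. It is not the code in this PR: `ReloadingFile`, its fields, and the choice to reset the timer after a failed read are all assumptions made for illustration, and raw bytes stand in for the rebuilt WebPkiServerVerifier. It shows the read-lock fast path, the re-check under the write lock, and the keep-stale-on-failure behaviour described above.

```rust
use std::fs;
use std::io;
use std::path::PathBuf;
use std::sync::{Arc, RwLock};
use std::time::{Duration, Instant};

/// Hypothetical stand-in for a reloading verifier: caches a file's bytes and
/// re-reads the file once the cached copy is older than `interval`.
pub struct ReloadingFile {
    path: PathBuf,
    interval: Duration,
    // (cached contents, time of the last reload attempt)
    inner: RwLock<(Arc<Vec<u8>>, Instant)>,
}

impl ReloadingFile {
    /// The initial read must succeed; later failures fall back to the cache.
    pub fn new(path: impl Into<PathBuf>, interval: Duration) -> io::Result<Self> {
        let path = path.into();
        let bytes = fs::read(&path)?;
        Ok(Self {
            path,
            interval,
            inner: RwLock::new((Arc::new(bytes), Instant::now())),
        })
    }

    pub fn current(&self) -> Arc<Vec<u8>> {
        // Fast path: a shared read lock is enough while the cache is fresh.
        {
            let guard = self.inner.read().unwrap_or_else(|e| e.into_inner());
            if guard.1.elapsed() < self.interval {
                return guard.0.clone();
            }
        } // read guard dropped here, before the write lock is taken

        let mut guard = self.inner.write().unwrap_or_else(|e| e.into_inner());
        // Second check: another thread may have reloaded while we waited.
        if guard.1.elapsed() >= self.interval {
            if let Ok(bytes) = fs::read(&self.path) {
                guard.0 = Arc::new(bytes);
            }
            // Assumption: reset the timer even on failure, so a missing file
            // is retried once per interval rather than on every call.
            guard.1 = Instant::now();
        }
        guard.0.clone()
    }
}
```

With an interval of `Duration::ZERO` every call takes the slow path, which is convenient for tests; a real verifier would use something like the ~60s interval mentioned above.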

Config::incluster() reads /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
once and bakes the bytes into a RootCertStore. After CA rotation, new TLS
handshakes fail until the process restarts.

TokenFile already re-reads the sibling token file in that same projected
volume every 60s. This adds the symmetric piece for ca.crt:

- Config.root_cert_file: Option<PathBuf>, set by Config::incluster()
- ReloadingVerifier: ServerCertVerifier that rebuilds an inner
  WebPkiServerVerifier on a 60s timer, keeps stale roots on reload failure
- rustls-tls only; openssl-tls unchanged

Config is now #[non_exhaustive] so this field addition (and future ones)
doesn't break downstream struct literals again.

Closes kube-rs#1953

Signed-off-by: Chris Norman <[email protected]>

codecov Bot commented Mar 17, 2026

Codecov Report

❌ Patch coverage is 78.12500% with 14 lines in your changes missing coverage. Please review.
✅ Project coverage is 76.4%. Comparing base (288053e) to head (bc0e7db).
⚠️ Report is 9 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| kube-client/src/client/tls.rs | 81.5% | 10 Missing ⚠️ |
| kube-client/src/client/config_ext.rs | 57.2% | 3 Missing ⚠️ |
| kube-client/src/config/mod.rs | 66.7% | 1 Missing ⚠️ |
Additional details and impacted files
@@           Coverage Diff           @@
##            main   #1962     +/-   ##
=======================================
+ Coverage   76.4%   76.4%   +0.1%     
=======================================
  Files         89      89             
  Lines       8540    8602     +62     
=======================================
+ Hits        6520    6568     +48     
- Misses      2020    2034     +14     
| Files with missing lines | Coverage Δ | |
|---|---|---|
| kube-client/src/config/incluster_config.rs | 67.5% <ø> (ø) | |
| kube-client/src/config/mod.rs | 54.6% <66.7%> (-0.1%) | ⬇️ |
| kube-client/src/client/config_ext.rs | 52.5% <57.2%> (-0.1%) | ⬇️ |
| kube-client/src/client/tls.rs | 86.1% <81.5%> (-3.2%) | ⬇️ |

... and 2 files with indirect coverage changes

Comment on lines +153 to +158
    let guard = self.inner.read().unwrap();
    if guard.1.elapsed() < Self::RELOAD_INTERVAL {
        return guard.0.clone();
    }
}
let mut guard = self.inner.write().unwrap();
Member

nit: Consider using .unwrap_or_else(|e| e.into_inner()) instead of .unwrap() on both the read lock (L153) and write lock (L158).

Realistically there is no panic path inside the write-lock critical section, so poisoning is extremely unlikely. But this verifier sits on the critical path of every TLS handshake — if the lock were ever poisoned:

  • .unwrap() → panic → process crash
  • .unwrap_or_else(|e| e.into_inner()) → falls back to stale roots → still serves during CA overlap period, and even after overlap it fails with a TLS error (retryable) rather than a panic (process death)

Two-line change, zero cost on the happy path, and consistent with the "keep stale on failure" policy already applied to file-reload errors in L162-166.
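
To make the difference concrete, here is a small std-only sketch of the suggested pattern. The helper names `poison` and `read_tolerant` are hypothetical and exist only for this illustration; the string value stands in for the cached roots. It poisons an RwLock by panicking in another thread while holding the write guard, then shows that `unwrap_or_else(|e| e.into_inner())` still yields the last stored value where a plain `.unwrap()` would panic.

```rust
use std::sync::{Arc, RwLock};
use std::thread;

/// Poisons `lock` by panicking in another thread while it holds the write guard.
pub fn poison(lock: &Arc<RwLock<String>>) {
    let lock = Arc::clone(lock);
    // join() returns Err because the thread panicked; that is expected here.
    let _ = thread::spawn(move || {
        let _guard = lock.write().unwrap();
        panic!("simulated panic while holding the write lock");
    })
    .join();
}

/// Reads the value even if the lock is poisoned, recovering the guard
/// from the PoisonError instead of panicking.
pub fn read_tolerant(lock: &RwLock<String>) -> String {
    lock.read().unwrap_or_else(|e| e.into_inner()).clone()
}
```

After `poison(&lock)`, `lock.read()` returns `Err`, so `.unwrap()` would panic on every subsequent handshake, while `read_tolerant(&lock)` keeps returning whatever was stored last.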

Member

@doxxx93 left a comment

Clean approach — rustls's ServerCertVerifier trait makes this much simpler than the equivalent client-go fix (kubernetes/kubernetes#119483, which took ~2.5 years to land). The double-check pattern in current() is correct, the fail-open policy on reload errors mirrors TokenFile, and the test coverage hits the key scenarios.

Left one minor nit on lock poison handling (L153/L158), but not blocking — the realistic chance of triggering it is near zero.
