feat(client): reload in-cluster CA bundle on rotation (rustls-tls)#1962
feat(client): reload in-cluster CA bundle on rotation (rustls-tls)#1962chrnorm wants to merge 1 commit intokube-rs:mainfrom
Conversation
Config::incluster() reads /var/run/secrets/kubernetes.io/serviceaccount/ca.crt once and bakes the bytes into a RootCertStore. After CA rotation, new TLS handshakes fail until the process restarts. TokenFile already re-reads the sibling token file in that same projected volume every 60s. This adds the symmetric piece for ca.crt: - Config.root_cert_file: Option<PathBuf>, set by Config::incluster() - ReloadingVerifier: ServerCertVerifier that rebuilds an inner WebPkiServerVerifier on a 60s timer, keeps stale roots on reload failure - rustls-tls only; openssl-tls unchanged Config is now #[non_exhaustive] so this field addition (and future ones) doesn't break downstream struct literals again. Closes kube-rs#1953 Signed-off-by: Chris Norman <[email protected]>
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1962 +/- ##
=======================================
+ Coverage 76.4% 76.4% +0.1%
=======================================
Files 89 89
Lines 8540 8602 +62
=======================================
+ Hits 6520 6568 +48
- Misses 2020 2034 +14
🚀 New features to boost your workflow:
|
| let guard = self.inner.read().unwrap(); | ||
| if guard.1.elapsed() < Self::RELOAD_INTERVAL { | ||
| return guard.0.clone(); | ||
| } | ||
| } | ||
| let mut guard = self.inner.write().unwrap(); |
There was a problem hiding this comment.
nit: Consider using .unwrap_or_else(|e| e.into_inner()) instead of .unwrap() on both the read lock (L153) and write lock (L158).
Realistically there is no panic path inside the write-lock critical section, so poisoning is extremely unlikely. But this verifier sits on the critical path of every TLS handshake — if the lock were ever poisoned:
.unwrap()→ panic → process crash.unwrap_or_else(|e| e.into_inner())→ falls back to stale roots → still serves during CA overlap period, and even after overlap it fails with a TLS error (retryable) rather than a panic (process death)
Two-line change, zero cost on the happy path, and consistent with the "keep stale on failure" policy already applied to file-reload errors in L162-166.
doxxx93
left a comment
There was a problem hiding this comment.
Clean approach — rustls's ServerCertVerifier trait makes this much simpler than the equivalent client-go fix (kubernetes/kubernetes#119483, which took ~2.5 years to land). The double-check pattern in current() is correct, the fail-open policy on reload errors mirrors TokenFile, and the test coverage hits the key scenarios.
Left one minor nit on lock poison handling (L153/L158), but not blocking — the realistic chance of triggering it is near zero.
Motivation
Config::incluster()reads/var/run/secrets/kubernetes.io/serviceaccount/ca.crtonce at startup and bakes the bytes into aRootCertStore. After the cluster CA rotates, new TLS handshakes fail with cert errors until the process restarts. The projected service account volume already swaps the file in place — kube just never re-reads it.TokenFilealready solves the symmetric problem for the siblingtokenfile in the same projected volume (re-reads every 60s). This PR adds the same treatment forca.crt.Closes #1953. Related client-go issue: kubernetes/kubernetes#119483.
Solution
Config.root_cert_file: Option<PathBuf>— new field, set automatically byConfig::incluster(). Takes precedence overroot_certfor server cert verification when set.ReloadingVerifier— aServerCertVerifierthat rebuilds an innerWebPkiServerVerifieron a ~60s timer. On reload failure it keeps serving with the stale roots rather than failing closed.Configis now#[non_exhaustive]— per review feedback on the issue, so future field additions don't break downstream struct literals again. Users who were constructingConfig { ... }directly need to switch toConfig::new()+ field mutation (already recommended by the existing docs).