Which component are you using?:
/area vertical-pod-autoscaler
What version of the component are you using?:
VPA v1.4.1
Component version: v1.4.1 (recommender deployed with --oom-bump-up-ratio=2.0, --target-memory-percentile=0.99)
What k8s version are you using (kubectl version)?:
$ kubectl version
Client Version: v1.30.3
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.28.15+k3s1
What environment is this in?:
Production Kubernetes cluster on a managed cloud provider. Reproducible on any cluster running VPA recommender with --oom-bump-up-ratio > 1.0 and a VPA whose maxAllowed.memory is below the workload's real peak memory need.
What did you expect to happen?:
VPA recommendations should converge to a stable value near actual observed usage once the workload stabilizes. When maxAllowed caps requests below the workload's real peak, occasional OOMs are expected — but they should not feed back into the recommender in a way that permanently inflates uncappedTarget far beyond any value the workload has ever actually used.
What happened instead?:
VPA enters a self-reinforcing feedback loop that pins uncappedTarget at roughly maxAllowed × oom-bump-up-ratio, regardless of real usage:
- Pod is created with memory request = maxAllowed (e.g., 30 GiB), because VPA's recommendation is capped there.
- Rare heavy job exceeds 30 GiB → pod is OOMKilled.
- VPA's RecordOOM inserts a synthetic memory sample at max(requestedMemory, memoryPeak) × oom-bump-up-ratio = 30 GiB × 2.0 = 60 GiB.
- Histogram P99 now lands in the bucket containing 60 GiB → uncappedTarget ≈ 61 GiB.
- VPA wants 61 GiB → capped back to 30 GiB by maxAllowed.
- Next heavy job → another OOM → another 60 GiB synthetic sample → loop (a minimal sketch of this arithmetic follows below).
With the 14-day histogram decay half-life, a single OOM keeps significant weight in the histogram for 30–60 days, so recommendations never settle back toward actual usage.
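For concreteness, a minimal self-contained sketch of the loop's arithmetic (plain Go, not VPA code; it assumes, per the observation above, that the P99 recommendation tracks the bucket holding the synthetic sample):

package main

import "fmt"

func main() {
	const (
		maxAllowed = 30.0 // maxAllowed.memory, GiB
		ratio      = 2.0  // --oom-bump-up-ratio
		realPeak   = 19.6 // highest memory the workload ever actually used, GiB
	)
	uncapped := maxAllowed // capped recommendation on day one
	for oom := 1; oom <= 3; oom++ {
		request := min(uncapped, maxAllowed)     // updater caps the request at maxAllowed
		sample := max(request, realPeak) * ratio // RecordOOM's synthetic sample
		uncapped = sample                        // P99 lands in the synthetic sample's bucket
		fmt.Printf("OOM %d: request=%.0f GiB, synthetic sample=%.0f GiB, uncappedTarget≈%.0f GiB\n",
			oom, request, sample, uncapped)
	}
	// Every iteration prints 60 GiB: a fixed point at maxAllowed × ratio that
	// never converges toward the 19.6 GiB real peak.
}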
Observed on one workload:
| Metric | Value |
| --- | --- |
| uncappedTarget.memory | 61.3 GiB |
| target.memory (capped) | 30 GiB |
| Highest memory any pod ever actually used (90d, container_memory_working_set_bytes) | 19.57 GiB |
| P99 memory (14d) | 18.2 GiB |
| Avg memory (14d) | 4.0 GiB |
| OOMKills in 90d | 3 |
uncappedTarget is ~3× the highest memory the workload has ever used.
How to reproduce it (as minimally and precisely as possible):
- Deploy VPA recommender with --oom-bump-up-ratio=2.0 (non-default; upstream default is 1.2). Any value > 1.0 reproduces the loop; higher ratios amplify it.
- Create a VPA with maxAllowed.memory set below the workload's actual peak memory need. Example:
resourcePolicy:
  containerPolicies:
  - containerName: publish
    maxAllowed:
      memory: 30Gi
- Run a workload whose occasional heavy jobs exceed maxAllowed, causing OOMKills.
- After at least one OOM, inspect VPA status:
kubectl get vpa <name> -n <ns> -o yaml | sed -n '/status:/,$p'
Observe that status.recommendation.containerRecommendations[].uncappedTarget.memory is approximately maxAllowed × oom-bump-up-ratio, while target.memory stays pinned at maxAllowed (an illustrative status snippet follows this list).
- Confirm uncappedTarget exceeds any real observed usage, e.g. via Prometheus:
max by(pod) (max_over_time(
container_memory_working_set_bytes{namespace="<ns>", container="<c>"}[90d]
)) / 1024 / 1024 / 1024
- Confirm the OOMKill history that's driving the synthetic samples:
sum(max_over_time(kube_pod_container_status_last_terminated_reason{
namespace="<ns>", reason="OOMKilled"
}[90d]))
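For illustration, the status stanza then looks roughly like this (shape only; the container name and byte value come from the workload above and are illustrative, not authoritative output):

status:
  recommendation:
    containerRecommendations:
    - containerName: publish
      target:
        memory: 30Gi            # pinned at maxAllowed
      uncappedTarget:
        memory: "65820373811"   # ~61.3 GiB, i.e. maxAllowed × oom-bump-up-ratio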
Anything else we need to know?:
Root cause — in pkg/recommender/model/container.go RecordOOM:
// Get max of the request and the recent usage-based memory peak.
// Omitting oomPeak here to protect against recommendation running too high on subsequent OOMs.
memoryUsed := ResourceAmountMax(requestedMemory, container.memoryPeak)
memoryNeeded := ResourceAmountMax(
    memoryUsed + MemoryAmountFromBytes(container.GetOOMMinBumpUp()),
    ScaleResource(memoryUsed, container.GetOOMBumpUpRatio()),
)
When maxAllowed caps the request below real need, requestedMemory equals maxAllowed at OOM time, so the synthetic sample becomes maxAllowed × oom-bump-up-ratio. This value is by construction greater than maxAllowed, so the next OOM produces the same synthetic sample — a stable loop rather than a converging one. The existing comment ("Omitting oomPeak here to protect against recommendation running too high on subsequent OOMs") shows the maintainers have already patched a similar amplification path; the maxAllowed interaction appears to be an unhandled case.
Proposed fix — clamp the synthetic sample itself at maxAllowed (or at the current uncapped recommendation) so it never exceeds what VPA is actually allowed to recommend. Note that clamping only the bump-up base would be a no-op here: in the capped case requestedMemory already equals maxAllowed, so scaling the base still yields maxAllowed × ratio. A sketch (this assumes maxAllowed gets plumbed into RecordOOM, which today sees only the timestamp and requested memory; capping is currently applied after the recommendation is computed):
memoryNeeded := ResourceAmountMax(
    memoryUsed + MemoryAmountFromBytes(container.GetOOMMinBumpUp()),
    ScaleResource(memoryUsed, container.GetOOMBumpUpRatio()),
)
if maxAllowed > 0 && memoryNeeded > maxAllowed {
    // Never seed the histogram above what VPA may actually recommend;
    // otherwise every capped OOM re-inserts a sample above maxAllowed
    // and the loop persists.
    memoryNeeded = maxAllowed
}
This preserves OOM bump-up behavior for the common case where maxAllowed is not the binding constraint, while preventing the feedback loop when it is.
Notes:
- The upstream default oom-bump-up-ratio=1.2 reduces but does not eliminate the loop — 1.2 × maxAllowed is still > maxAllowed, so the synthetic sample still sits permanently above the cap.
- The interaction with --target-memory-percentile=0.99 makes this worse, since P99 is more sensitive to a small number of outlier synthetic samples than P90 would be.
- Decay half-life of 336h (14d) means a single OOM's synthetic sample retains meaningful weight for 30–60 days (see the back-of-envelope sketch after these notes).
- The issue is cluster-specific: only VPAs with OOM history and maxAllowed below real peak exhibit it. Workloads that never OOM see normal behavior.
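A quick back-of-envelope on that decay claim (plain Go; assumes simple exponential decay of sample weight with the 336h half-life described above):

package main

import (
	"fmt"
	"math"
)

func main() {
	const halfLifeDays = 14.0 // 336h decay half-life
	for _, days := range []float64{14, 30, 60} {
		weight := math.Pow(0.5, days/halfLifeDays)
		fmt.Printf("after %2.0f days: %4.1f%% of a fresh sample's weight\n", days, 100*weight)
	}
	// Prints ~50% at 14d, ~23% at 30d, ~5% at 60d. Because the synthetic
	// sample sits ~3x above all real usage, its bucket only needs >1% of
	// total decayed weight to keep capturing the P99.
}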
I'd be happy to pick this up and open a PR with the fix plus a unit test covering the capped-OOM case.
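For reference, roughly the shape I have in mind for that test (a standalone sketch of the clamped arithmetic; oomSampleWithCap is a hypothetical stand-in, not the real ContainerState.RecordOOM signature):

package model

import "testing"

// oomSampleWithCap mirrors the proposed post-fix arithmetic: bump up from
// max(request, peak), then clamp the synthetic sample at maxAllowed.
func oomSampleWithCap(requested, peak, maxAllowed ResourceAmount, ratio float64, minBump ResourceAmount) ResourceAmount {
	base := ResourceAmountMax(requested, peak)
	needed := ResourceAmountMax(base+minBump, ScaleResource(base, ratio))
	if maxAllowed > 0 && needed > maxAllowed {
		needed = maxAllowed
	}
	return needed
}

func TestOOMSampleClampedAtMaxAllowed(t *testing.T) {
	gib := func(n int64) ResourceAmount { return ResourceAmount(n * 1024 * 1024 * 1024) }
	// Capped case: the request already sits at maxAllowed, so the synthetic
	// sample must not exceed it (pre-fix it became 60 GiB).
	if got, want := oomSampleWithCap(gib(30), gib(20), gib(30), 2.0, 0), gib(30); got != want {
		t.Errorf("capped OOM sample = %v, want %v", got, want)
	}
	// Uncapped case: plenty of headroom, bump-up behaves exactly as before.
	if got, want := oomSampleWithCap(gib(10), gib(10), gib(30), 2.0, 0), gib(20); got != want {
		t.Errorf("uncapped OOM sample = %v, want %v", got, want)
	}
}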