
VPA: OOM bump-up creates self-reinforcing recommendation loop when maxAllowed caps memory below real peak #9521

@Sanil2108

Description


Which component are you using?:

/area vertical-pod-autoscaler

What version of the component are you using?:

VPA v1.4.1

Component version: v1.4.1 (recommender deployed with --oom-bump-up-ratio=2.0, --target-memory-percentile=0.99)

What k8s version are you using (kubectl version)?:

kubectl version Output
$ kubectl version
Client Version: v1.30.3
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.28.15+k3s1

What environment is this in?:

Production Kubernetes cluster on a managed cloud provider. Reproducible on any cluster running VPA recommender with --oom-bump-up-ratio > 1.0 and a VPA whose maxAllowed.memory is below the workload's real peak memory need.

What did you expect to happen?:

VPA recommendations should converge to a stable value near actual observed usage once the workload stabilizes. When maxAllowed caps requests below the workload's real peak, occasional OOMs are expected — but they should not feed back into the recommender in a way that permanently inflates uncappedTarget far beyond any value the workload has ever actually used.

What happened instead?:

VPA enters a self-reinforcing feedback loop that pins uncappedTarget at roughly maxAllowed × oom-bump-up-ratio, regardless of real usage:

  1. Pod is created with memory request = maxAllowed (e.g., 30 GiB), because VPA's recommendation is capped there.
  2. Rare heavy job exceeds 30 GiB → pod is OOMKilled.
  3. VPA's RecordOOM inserts a synthetic memory sample at max(requestedMemory, memoryPeak) × oom-bump-up-ratio = 30 GiB × 2.0 = 60 GiB.
  4. Histogram P99 now lands in the bucket containing 60 GiB → uncappedTarget ≈ 61 GiB.
  5. VPA wants 61 GiB → capped back to 30 GiB by maxAllowed.
  6. Next heavy job → another OOM → another 60 GiB synthetic sample → loop.

With the 14-day histogram decay half-life, a single OOM's synthetic sample keeps significant weight in the histogram for 30–60 days, so recommendations never settle back toward actual usage (see the sketch below).
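For a rough sense of how long that weight persists, here is a standalone Go sketch. It uses a simplified decaying-weight model (one memory-peak sample per day at the real ~19.6 GiB peak, sample weight halving every 14 days), not the recommender's actual bucketed histogram, and estimates how long a single 60 GiB synthetic sample keeps the weighted P99 above real usage:

package main

import (
    "fmt"
    "math"
    "sort"
)

// weightedP99 returns the smallest value v such that at least 99% of the
// total weight lies at or below v.
func weightedP99(values, weights []float64) float64 {
    type vw struct{ v, w float64 }
    pairs := make([]vw, len(values))
    total := 0.0
    for i := range values {
        pairs[i] = vw{values[i], weights[i]}
        total += weights[i]
    }
    sort.Slice(pairs, func(i, j int) bool { return pairs[i].v < pairs[j].v })
    cum := 0.0
    for _, p := range pairs {
        cum += p.w
        if cum >= 0.99*total {
            return p.v
        }
    }
    return pairs[len(pairs)-1].v
}

func main() {
    const halfLifeDays = 14.0 // decay half-life described above (336h)
    const realPeakGiB = 19.6  // highest working set actually observed
    const oomSampleGiB = 60.0 // maxAllowed (30 GiB) x oom-bump-up-ratio (2.0)

    // 90 days of one normal peak sample per day, plus one OOM sample injected
    // ageDays ago; a sample aged d days carries weight 0.5^(d/14).
    for _, ageDays := range []int{7, 14, 30, 45, 60} {
        var values, weights []float64
        for d := 0; d < 90; d++ {
            values = append(values, realPeakGiB)
            weights = append(weights, math.Pow(0.5, float64(d)/halfLifeDays))
        }
        values = append(values, oomSampleGiB)
        weights = append(weights, math.Pow(0.5, float64(ageDays)/halfLifeDays))
        fmt.Printf("OOM sample %2d days old -> weighted P99 ~ %.1f GiB\n",
            ageDays, weightedP99(values, weights))
    }
}

Under this simplified model the weighted P99 only falls back to ~19.6 GiB once the synthetic sample is roughly 32+ days old, which is consistent with the 30–60 day window described above.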

Observed on one workload:

uncappedTarget.memory: 61.3 GiB
target.memory (capped): 30 GiB
Highest memory any pod ever actually used (90d, container_memory_working_set_bytes): 19.57 GiB
P99 memory (14d): 18.2 GiB
Avg memory (14d): 4.0 GiB
OOMKills in 90d: 3

uncappedTarget is ~3× the highest memory the workload has ever used.

How to reproduce it (as minimally and precisely as possible):

  1. Deploy VPA recommender with --oom-bump-up-ratio=2.0 (non-default; upstream default is 1.2). Any value > 1.0 reproduces the loop; higher ratios amplify it.
  2. Create a VPA with maxAllowed.memory set below the workload's actual peak memory need. Example:
    resourcePolicy:
      containerPolicies:
        - containerName: publish
          maxAllowed:
            memory: 30Gi
  3. Run a workload whose occasional heavy jobs exceed maxAllowed, causing OOMKills.
  4. After at least one OOM, inspect VPA status:
    kubectl get vpa <name> -n <ns> -o yaml | sed -n '/status:/,$p'
    Observe that status.recommendation.containerRecommendations[].uncappedTarget.memory is approximately maxAllowed × oom-bump-up-ratio, while target.memory stays pinned at maxAllowed.
  5. Confirm uncappedTarget exceeds any real observed usage, e.g. via Prometheus:
    max by(pod) (max_over_time(
      container_memory_working_set_bytes{namespace="<ns>", container="<c>"}[90d]
    )) / 1024 / 1024 / 1024
    
  6. Confirm the OOMKill history that's driving the synthetic samples:
    sum(max_over_time(kube_pod_container_status_last_terminated_reason{
      namespace="<ns>", reason="OOMKilled"
    }[90d]))
    

Anything else we need to know?:

Root cause — in pkg/recommender/model/container.go RecordOOM:

// Get max of the request and the recent usage-based memory peak.
// Omitting oomPeak here to protect against recommendation running too high on subsequent OOMs.
memoryUsed := ResourceAmountMax(requestedMemory, container.memoryPeak)
memoryNeeded := ResourceAmountMax(
    memoryUsed + MemoryAmountFromBytes(container.GetOOMMinBumpUp()),
    ScaleResource(memoryUsed, container.GetOOMBumpUpRatio()),
)

When maxAllowed caps the request below real need, requestedMemory equals maxAllowed at OOM time, so the synthetic sample becomes maxAllowed × oom-bump-up-ratio. This value is by construction greater than maxAllowed, so the next OOM produces the same synthetic sample — a stable loop rather than a converging one. The existing comment ("Omitting oomPeak here to protect against recommendation running too high on subsequent OOMs") shows the maintainers have already patched a similar amplification path; the maxAllowed interaction appears to be an unhandled case.
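The arithmetic can be checked in isolation. The sketch below mirrors the quoted logic with plain float64s (not the recommender's ResourceAmount types) and the values from this report, assuming a ~100 MiB minimum bump; it shows the synthetic sample landing at 60 GiB and, for any ratio > 1.0, strictly above maxAllowed:

package main

import (
    "fmt"
    "math"
)

const (
    oomBumpUpRatio = 2.0                 // --oom-bump-up-ratio as deployed
    oomMinBumpUp   = 100.0 * 1024 * 1024 // assumed ~100 MiB minimum bump
    gib            = 1024.0 * 1024 * 1024
)

// syntheticSample mirrors the RecordOOM arithmetic quoted above.
func syntheticSample(requestedMemory, memoryPeak float64) float64 {
    memoryUsed := math.Max(requestedMemory, memoryPeak)
    return math.Max(memoryUsed+oomMinBumpUp, memoryUsed*oomBumpUpRatio)
}

func main() {
    maxAllowed := 30 * gib  // request is pinned here by the cap at OOM time
    realPeak := 19.57 * gib // highest usage ever observed

    sample := syntheticSample(maxAllowed, realPeak)
    fmt.Printf("synthetic sample: %.1f GiB (maxAllowed %.0f GiB, real peak %.2f GiB)\n",
        sample/gib, maxAllowed/gib, realPeak/gib)
    fmt.Println("sample > maxAllowed:", sample > maxAllowed) // true, so the loop is stable
}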

Proposed fix: clamp the synthetic OOM sample at maxAllowed (or at the current capped recommendation) so it never exceeds what VPA is actually allowed to recommend. Capping only the base passed to the bump-up would not help here, because requestedMemory already equals maxAllowed at OOM time; the clamp has to apply to the resulting sample:

memoryUsed := ResourceAmountMax(requestedMemory, container.memoryPeak)
memoryNeeded := ResourceAmountMax(
    memoryUsed + MemoryAmountFromBytes(container.GetOOMMinBumpUp()),
    ScaleResource(memoryUsed, container.GetOOMBumpUpRatio()),
)
// maxAllowed is not currently available inside RecordOOM; it would have to be
// plumbed in from the VPA's resourcePolicy. ResourceAmountMin would be a small
// new helper alongside the existing ResourceAmountMax.
if maxAllowed > 0 {
    memoryNeeded = ResourceAmountMin(memoryNeeded, maxAllowed)
}

This preserves OOM bump-up behavior for the common case where maxAllowed is not the binding constraint, while preventing the feedback loop when it is.
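For the unit test, one option is to assert that the synthetic sample never exceeds maxAllowed when the request is already pinned at the cap, and that behavior is unchanged when no cap applies. The sketch below tests a standalone reimplementation of the patched arithmetic (the real test would exercise RecordOOM in container_test.go once maxAllowed is plumbed through; names here are illustrative only):

package model

import "testing"

// cappedSyntheticSample is a stand-in for the patched RecordOOM arithmetic:
// compute the bump-up as today, then clamp the result to maxAllowed (0 = no cap).
func cappedSyntheticSample(requestedMemory, memoryPeak, minBumpUp, bumpUpRatio, maxAllowed float64) float64 {
    used := requestedMemory
    if memoryPeak > used {
        used = memoryPeak
    }
    needed := used + minBumpUp
    if scaled := used * bumpUpRatio; scaled > needed {
        needed = scaled
    }
    if maxAllowed > 0 && needed > maxAllowed {
        needed = maxAllowed
    }
    return needed
}

func TestOOMBumpUpCappedAtMaxAllowed(t *testing.T) {
    const gib = 1024.0 * 1024 * 1024
    const minBump = 100 * 1024 * 1024

    // Request already pinned at maxAllowed: the synthetic sample must not exceed the cap.
    if got := cappedSyntheticSample(30*gib, 19.57*gib, minBump, 2.0, 30*gib); got > 30*gib {
        t.Errorf("synthetic sample %.1f GiB exceeds maxAllowed 30 GiB", got/gib)
    }
    // No cap in play: existing bump-up behavior is preserved.
    if got := cappedSyntheticSample(10*gib, 8*gib, minBump, 2.0, 0); got != 20*gib {
        t.Errorf("expected 20 GiB bump-up, got %.1f GiB", got/gib)
    }
}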

Notes:

  • The upstream default oom-bump-up-ratio=1.2 reduces but does not eliminate the loop — 1.2 × maxAllowed is still > maxAllowed, so the synthetic sample still sits permanently above the cap.
  • The interaction with --target-memory-percentile=0.99 makes this worse, since P99 is more sensitive to a small number of outlier synthetic samples than P90 would be.
  • Decay half-life of 336h (14d) means a single OOM's synthetic sample retains meaningful weight for 30–60 days.
  • The issue is cluster-specific: only VPAs with OOM history and maxAllowed below real peak exhibit it. Workloads that never OOM see normal behavior.

I'd be happy to pick this up and open a PR with the fix plus a unit test covering the capped-OOM case.

    Labels

    area/vertical-pod-autoscaler, kind/bug, needs-triage
