Handle unreliable Vulkan memory budgets by surma · Pull Request #613 · tobi/qmd

surma · 2026-04-28T15:35:57Z

Summary

Some Vulkan drivers do not expose VK_EXT_memory_budget. In that case, node-llama-cpp can still return heap-size values via getVramState(), but live allocation accounting is not useful: used remains 0 and free === total even after loading a GPU-offloaded model.

QMD currently treats that state as reliable VRAM availability and derives embedding/rerank context parallelism from it. On PanVK/Mali-G610 this causes QMD to create too many embedding contexts, which reproduces as:

vk::CommandBuffer::end: ErrorOutOfDeviceMemory

It can also crash the process when multiple embedding contexts evaluate concurrently.

This PR adds a conservative fallback for unreliable Vulkan memory-budget reporting:

detect the unreliable Vulkan budget shape (used === 0, free === total, no unified budget)
cap context parallelism to 1 for that case
use a fresh embedding context per input on that path while keeping the model loaded
run embedBatch() sequentially with fresh contexts on that path
add unit coverage for the fallback behavior
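
A minimal sketch of the detection and parallelism cap listed above, not the actual diff. The `VramState` interface only mirrors the fields named in this description (`total`, `used`, `free`, plus a `unifiedSize` field standing in for "unified budget"). The function names, the `perContextBytes` estimate, and the `maxContexts` limit are hypothetical. The check assumes it runs after a GPU-offloaded model has been loaded, since `used === 0` before any allocation is not suspicious on its own.

```typescript
// Hypothetical shape: mirrors the fields mentioned in this PR, not the library's exact type.
interface VramState {
  total: number;
  used: number;
  free: number;
  unifiedSize?: number; // assumption: 0 or undefined means "no unified budget"
}

// True when the reported numbers look like raw heap sizes rather than live accounting.
// Assumes it is called after a model has been offloaded to the GPU.
function isUnreliableVulkanBudget(gpu: string | false, state: VramState): boolean {
  return (
    gpu === "vulkan" &&
    state.total > 0 &&
    state.used === 0 &&
    state.free === state.total &&
    !state.unifiedSize
  );
}

// Derive how many contexts to run in parallel; fall back to 1 when the budget is untrustworthy.
function pickContextParallelism(
  gpu: string | false,
  state: VramState,
  perContextBytes: number, // hypothetical per-context memory estimate
  maxContexts: number,     // hypothetical upper bound on the context pool
): number {
  if (isUnreliableVulkanBudget(gpu, state)) return 1;
  const fit = Math.floor(state.free / perContextBytes);
  return Math.max(1, Math.min(maxContexts, fit));
}
```

With a state such as `{ total: T, used: 0, free: T }` on Vulkan, the sketch returns 1 regardless of how large `T` is. Any state with non-zero `used` goes through the normal free-memory division.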

Motivation / repro notes

Observed on aarch64-linux with Mali-G610/PanVK:

Vulkan backend loads successfully
model offloading works
getVramState() reports roughly the full device heap as free before and after model loading
QMD creates its maximum embedding context pool from that bogus free-memory estimate
8 embedding contexts fail with VK_ERROR_OUT_OF_DEVICE_MEMORY
reusing one embedding context for a second medium-sized input can also fail
creating a fresh embedding context per input succeeds reliably

So the issue is not “no GPU”; it is unsafe concurrency/context reuse decisions caused by unreliable Vulkan memory accounting.
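
The fresh-context-per-input pattern that succeeded in the repro can be sketched as follows. The `EmbeddingModel` and `EmbeddingContext` interfaces are hypothetical stand-ins, not the real QMD or node-llama-cpp types. The point is only that each input gets its own context, the context is disposed before the next one is created, and the model itself stays loaded.

```typescript
// Hypothetical minimal interfaces; the real types are richer.
interface EmbeddingContext {
  embed(text: string): Promise<number[]>;
  dispose(): Promise<void>;
}

interface EmbeddingModel {
  createContext(): Promise<EmbeddingContext>;
}

// Create a context, embed one input, and always dispose the context afterwards.
async function embedWithFreshContext(model: EmbeddingModel, text: string): Promise<number[]> {
  const ctx = await model.createContext();
  try {
    return await ctx.embed(text);
  } finally {
    await ctx.dispose();
  }
}

// Sequential batch: at most one embedding context is alive at any time.
async function embedBatchSequential(model: EmbeddingModel, texts: string[]): Promise<number[][]> {
  const results: number[][] = [];
  for (const text of texts) {
    results.push(await embedWithFreshContext(model, text));
  }
  return results;
}
```

This trades throughput for predictability: context creation cost is paid per input, but peak device memory stays at one context plus the loaded model.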

Tests

CI=true bun test test/llm.test.ts

Result: passes.

I also attempted broader local validation, but there are unrelated local environment failures in this checkout:

bun run build fails on current main with:

src/store.ts(2142,22): error TS2339: Property 'transaction' does not exist on type 'Database'.

full CI=true bun test could not complete locally because CLI tests need better-sqlite3 native bindings, which were not built in this checkout on my aarch64-linux machine.

Handle unreliable Vulkan memory budgets

f954f1e

surma wants to merge 1 commit into tobi:main from surma-dump:surma/vulkan-unreliable-memory-budget
