
Handle unreliable Vulkan memory budgets #613

Open
surma wants to merge 1 commit into tobi:main from surma-dump:surma/vulkan-unreliable-memory-budget

@surma (Contributor) commented Apr 28, 2026

Summary

Some Vulkan drivers do not expose VK_EXT_memory_budget. In that case, node-llama-cpp can still return heap-size values via getVramState(), but live allocation accounting is not useful: used remains 0 and free === total even after loading a GPU-offloaded model.
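A minimal sketch of how that shape could be recognized. It assumes a `VramState` object modeled on the `{ total, used, free, unifiedSize }` result of `getVramState()` (the field names should be checked against the installed node-llama-cpp version) and a GPU type string such as `"vulkan"`. The helper name `isUnreliableVulkanBudget` is hypothetical, and reading "no unified budget" as `unifiedSize === 0` is an assumption:

```typescript
// Assumed shape, modeled on node-llama-cpp's getVramState() result (all values in bytes).
interface VramState {
  total: number;
  used: number;
  free: number;
  unifiedSize: number; // assumption: 0 means no unified memory budget is reported
}

// Hypothetical helper: returns true when a Vulkan backend reports static heap sizes
// instead of live allocation accounting (used === 0, free === total, no unified budget).
function isUnreliableVulkanBudget(gpu: string | false, state: VramState): boolean {
  return (
    gpu === "vulkan" &&
    state.used === 0 &&
    state.free === state.total &&
    state.unifiedSize === 0
  );
}
```

A backend that tracks live allocations will show `used > 0` once a GPU-offloaded model is loaded, so it falls outside this check.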

QMD currently treats those numbers as an accurate measure of available VRAM and derives embedding/rerank context parallelism from them. On PanVK/Mali-G610 this causes QMD to create too many embedding contexts, which fails with:

vk::CommandBuffer::end: ErrorOutOfDeviceMemory

It can also crash the process when multiple embedding contexts evaluate concurrently.
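To illustrate why a bogus `free` value inflates the pool, here is a hypothetical free-memory-based sizing function. QMD's actual formula and per-context cost are not shown in this PR, so `computeContextParallelism`, `bytesPerContext`, and `maxContexts` are illustrative names only. When `free` equals the whole heap, the estimate saturates at the maximum pool size; the fallback cap keeps it at 1:

```typescript
// Hypothetical sizing logic for illustration; not QMD's actual formula.
function computeContextParallelism(
  freeBytes: number,         // free VRAM as reported by the backend
  bytesPerContext: number,   // assumed memory cost of one embedding context
  maxContexts: number,       // upper bound on the context pool
  unreliableBudget: boolean, // whether the budget was detected as unreliable
): number {
  // Fallback: with unreliable accounting, never run more than one context.
  if (unreliableBudget) return 1;
  // Otherwise fit as many contexts as the reported free memory allows,
  // clamped to the range [1, maxContexts].
  const fit = Math.floor(freeBytes / bytesPerContext);
  return Math.max(1, Math.min(maxContexts, fit));
}
```

For example, with an 8 GiB heap reported as entirely free, 512 MiB per context, and a maximum of 8, the uncapped estimate is 8 contexts, while the capped one is 1.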

This PR adds a conservative fallback for unreliable Vulkan memory-budget reporting:

  • detect the unreliable Vulkan budget shape (used === 0, free === total, no unified budget)
  • cap context parallelism to 1 for that case
  • use a fresh embedding context per embedding on that path while keeping the model loaded
  • run embedBatch() sequentially with fresh contexts on that path
  • add unit coverage for the fallback behavior
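The sequential, fresh-context path can be sketched as follows. `EmbeddingModel` and `EmbeddingContext` are minimal stand-in interfaces, assumed to mirror the `createEmbeddingContext()`, `getEmbeddingFor()`, and `dispose()` calls in node-llama-cpp v3. `embedBatchSequentialFresh` is a hypothetical name, not the function in this PR:

```typescript
// Minimal stand-ins (assumed shapes) for the node-llama-cpp calls used here.
interface EmbeddingContext {
  getEmbeddingFor(text: string): Promise<{ vector: readonly number[] }>;
  dispose(): Promise<void>;
}
interface EmbeddingModel {
  createEmbeddingContext(): Promise<EmbeddingContext>;
}

// Hypothetical fallback path: one fresh context per input, processed strictly in
// order, so at most one context holds device memory at a time. The model stays loaded.
async function embedBatchSequentialFresh(
  model: EmbeddingModel,
  texts: readonly string[],
): Promise<number[][]> {
  const results: number[][] = [];
  for (const text of texts) {
    const ctx = await model.createEmbeddingContext();
    try {
      const embedding = await ctx.getEmbeddingFor(text);
      results.push([...embedding.vector]);
    } finally {
      // Release this context before the next input, even if embedding failed.
      await ctx.dispose();
    }
  }
  return results;
}
```

Disposing in `finally` means a failed embedding still releases its context before the error propagates. Awaiting each input in turn guarantees that no two contexts evaluate concurrently.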

Motivation / repro notes

Observed on aarch64-linux with Mali-G610/PanVK:

  • Vulkan backend loads successfully
  • model offloading works
  • getVramState() reports roughly the full device heap as free before and after model loading
  • QMD creates its maximum embedding context pool from that bogus free-memory estimate
  • 8 embedding contexts fail with VK_ERROR_OUT_OF_DEVICE_MEMORY
  • reusing one embedding context for a second medium-sized input can also fail
  • creating a fresh embedding context per input succeeds reliably

So the issue is not “no GPU”; it is unsafe concurrency and context-reuse decisions caused by unreliable Vulkan memory accounting.

Tests

CI=true bun test test/llm.test.ts

Result: passes.

I also attempted broader local validation, but ran into failures in this checkout that are unrelated to this change:

  • bun run build fails on current main with:

    src/store.ts(2142,22): error TS2339: Property 'transaction' does not exist on type 'Database'.
    
  • the full CI=true bun test run could not complete locally because the CLI tests need better-sqlite3 native bindings, which were not built in this checkout on my aarch64-linux machine.
