feat: Settings → Hardware compute device selector (CPU / GPU) #139
CrisAlejo26 wants to merge 5 commits into amicalhq:main from
Conversation
Lets users pick which CPU or GPU runs local Whisper models from a new
Settings → Hardware tab. Auto (the current behaviour) is the default; CPU
forces the CPU binary; picking a specific GPU forwards its index to
whisper.cpp's `whisper_context_params.gpu_device`.
Also folds in the fixes that were required to build the CUDA whisper
binary on Windows with CUDA 13.x; they are prerequisites for the GPU
path to be exercised at all.
Whisper wrapper
- addon.cpp: read `gpu_device` option and propagate it.
- index.ts: expose `gpu`, `gpuDevice`, `flashAttn`, `threads` and
`preferredBackend`; actually forward them to `binding.init`.
- loader.ts: accept a `preferredBackend` / `WHISPER_NATIVE_BACKEND` env
override; cache the binding per backend.
- CMakeLists.txt: pass `/Zc:preprocessor` to MSVC (required by CCCL in
CUDA 13) and iterate CMAKE_JS_INC natively so paths containing spaces
resolve correctly.
- bin/build-addon.js: detect VS 2022 Preview alongside the other
editions when probing for lib.exe.
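The backend-resolution behaviour described for loader.ts can be sketched roughly like this (illustrative names and a stubbed loader; the real `loader.ts` resolves and `require()`s platform-specific `.node` files):

```typescript
type Backend = "auto" | "cpu" | "metal" | "openblas" | "cuda" | "vulkan";

const ALLOWED = new Set<string>(["auto", "cpu", "metal", "openblas", "cuda", "vulkan"]);

// Explicit option wins, then the WHISPER_NATIVE_BACKEND env override, else "auto".
// Unknown values fall back to "auto" with a warning (per the review nitpick below).
function resolvePreferredBackend(
  preferred?: string,
  env: Record<string, string | undefined> = {}, // real code would read process.env
): Backend {
  const raw = preferred ?? env.WHISPER_NATIVE_BACKEND;
  if (raw === undefined) return "auto";
  if (!ALLOWED.has(raw)) {
    console.warn(`unknown whisper backend "${raw}", falling back to "auto"`);
    return "auto";
  }
  return raw as Backend;
}

// Cache one binding per backend so switching backends loads a fresh addon
// while repeated calls for the same backend reuse the existing one.
const bindingCache = new Map<Backend, { backend: Backend }>();

function loadBinding(preferred?: string): { backend: Backend } {
  const backend = resolvePreferredBackend(preferred);
  let binding = bindingCache.get(backend);
  if (binding === undefined) {
    binding = { backend }; // stand-in for require()-ing the native module
    bindingCache.set(backend, binding);
  }
  return binding;
}
```

The per-backend cache is what lets a settings change from CPU to CUDA load a different native addon without restarting the worker process.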
Desktop app
- db/schema.ts: new `compute` section on AppSettingsData.
- services/settings-service.ts: get/set compute settings with a
`compute-changed` event.
- services/hardware-detection-service.ts (new): enumerate GPUs via the
existing `systeminformation` dep with a one-shot cache.
- pipeline/providers/transcription/whisper-provider.ts: accept
SettingsService, read compute settings, forward them to the worker
fork on each initializeModel call and dispose the worker when they
change.
- pipeline/providers/transcription/whisper-worker-fork.ts: accept
WhisperInitOptions and pass them through to the wrapper constructor.
- services/transcription-service.ts: hand SettingsService to the
WhisperProvider.
- trpc: new `hardware.getSnapshot` query and
`settings.{get,set}ComputeSettings` procedures.
- renderer: new `/settings/hardware` route and page with the selector,
detected hardware list and refresh button; sidebar entry and title
mapping added.
- i18n: `en` and `es` locales for the new strings (ja / zh-TW deferred
to native translators).
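For reference, the selector value and the persisted `compute` shape map onto each other with a couple of tiny helpers. This is a simplified sketch: `toStored` mirrors the renderer helper of the same name (without the threads-preserving fix discussed in review), and `toSelector` is a hypothetical inverse added here for illustration:

```typescript
type DeviceValue = "auto" | "cpu" | `gpu:${number}`;

interface ComputeSettings {
  device: "auto" | "cpu" | "gpu";
  gpuDevice?: number;
}

// Selector string -> stored settings ("gpu:1" -> { device: "gpu", gpuDevice: 1 }).
function toStored(value: DeviceValue): ComputeSettings {
  if (value === "auto" || value === "cpu") return { device: value };
  const index = Number(value.slice("gpu:".length));
  return { device: "gpu", gpuDevice: Number.isFinite(index) ? index : 0 };
}

// Stored settings -> selector string, defaulting a missing gpuDevice to 0.
function toSelector(settings: ComputeSettings): DeviceValue {
  if (settings.device !== "gpu") return settings.device;
  return `gpu:${settings.gpuDevice ?? 0}`;
}
```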
Docs
- docs/hardware-selection.md – user-facing guide for the new tab.
- docs/building-whisper-cuda-windows.md – developer guide covering the
  build-script fixes above.
- docs/CHANGELOG-compute-device-selector.md – release notes with the
  full file list and migration / testing notes.
- README link added.
Tested on Windows 11 + RTX 4080 Laptop + CUDA 13.2: rebuilt the wrapper
with the CMake fixes, copied the two cuBLAS DLLs next to whisper.node,
launched the app, dictated a sentence. Log confirmed
`whisper_native_binding: cuda`; GPU peaked at 100% while CPU stayed
low. macOS (Metal) and Linux (Vulkan) paths were not re-tested on this
branch; they rely on the existing loader fallback.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
No actionable comments were generated in the recent review.
Recent review info
Run configuration: defaults. Review profile: CHILL. Plan: Pro. Run ID:
Files selected for processing (5)
Files skipped from review due to trivial changes (2)
Files skipped from review as they are similar to previous changes (3)
Walkthrough
Adds hardware detection and a compute-device selector for local Whisper: new settings persistence and tRPC endpoints, UI for Auto/CPU/GPU selection, a detection service, wiring into the transcription provider/worker to pass GPU/backend/threads, native loader/backend selection, build fixes, and documentation.

Sequence Diagram(s)

```mermaid
sequenceDiagram
    actor User
    participant UI as Renderer (HardwareSection)
    participant API as tRPC
    participant SettingsSvc as SettingsService
    participant HWSvc as HardwareDetectionService
    participant TransSvc as TranscriptionService
    participant Provider as WhisperProvider
    participant Worker as WhisperWorker
    User->>UI: select compute device (e.g. "gpu:0")
    UI->>API: settings.setComputeSettings({device:"gpu", gpuDevice:0})
    API->>SettingsSvc: setComputeSettings(...)
    SettingsSvc->>SettingsSvc: persist & emit "compute-changed"
    SettingsSvc-->>API: success
    API-->>UI: success
    SettingsSvc->>TransSvc: compute-changed
    TransSvc->>TransSvc: acquire mutexes
    TransSvc->>Provider: reset() / dispose()
    Provider->>Worker: dispose() (free instance)
    Worker-->>Provider: disposed
    TransSvc->>TransSvc: release mutexes
    User->>UI: request transcription
    UI->>API: transcribe(audio)
    API->>TransSvc: transcribe(...)
    TransSvc->>Provider: initializeModel()
    Provider->>SettingsSvc: getComputeSettings()
    SettingsSvc-->>Provider: {device:"gpu", gpuDevice:0}
    Provider->>Worker: initializeModel([modelPath, initOptions])
    Worker->>Worker: load binding (preferred backend), init Whisper with gpu_device/threads
    Worker-->>Provider: ready
    Provider-->>TransSvc: ready for transcription
```
Estimated code review effort: 4 (Complex), ~45 minutes.
Pre-merge checks: 2 passed, 1 failed (warning).
@CrisAlejo26 Any reason for this change? Vulkan is already working for Windows users with NVIDIA GPUs.
Actionable comments posted: 11
Nitpick comments (2)
packages/whisper-wrapper/src/loader.ts (1)
56-72: Silent fallback when `WHISPER_NATIVE_BACKEND` holds an unknown value.
If a user sets `WHISPER_NATIVE_BACKEND=cudaa` (typo) or any other string outside the allowed set, the env var is silently ignored and the loader falls through to `"auto"`. A one-line `console.warn` when the env var is set but unrecognized would save a lot of debugging time when someone reports "I set the env var and it still picked CPU".
Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/whisper-wrapper/src/loader.ts` around lines 56 - 72, resolvePreferredBackend currently silently ignores unknown WHISPER_NATIVE_BACKEND values; update the function (resolvePreferredBackend) to detect when process.env.WHISPER_NATIVE_BACKEND is set but not one of the allowed values ("auto","cpu","metal","openblas","cuda","vulkan") and emit a single-line warning (e.g., console.warn) that includes the raw env value and that it will fall back to "auto"; keep the existing return behavior so unrecognized values still return "auto".

packages/whisper-wrapper/addon/addon.cpp (1)
269-281: LGTM – optional propagation is the right call.
Guarding the assignment behind `gpu_device_set` preserves the whisper.cpp default when callers don't explicitly pick a device, which is important for older/non-CUDA builds that treat the field differently. The type-check pattern (`raw.As<Napi::Number>()` without `IsNumber()`) matches the other options in this file, so no new inconsistency is introduced.
One small follow-up worth considering: add a sanity log (or clamp) if `gpu_device` is negative or larger than the number of available devices, so misconfiguration surfaces clearly instead of silently falling back to device 0 inside whisper.cpp.
Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/whisper-wrapper/addon/addon.cpp` around lines 269 - 281, Add a sanity check before assigning cparams.gpu_device: when gpu_device_set is true, query the available GPU count (e.g., via cudaGetDeviceCount or platform API), and if gpu_device is negative or >= device_count either log a clear warning and skip setting cparams.gpu_device (to preserve whisper.cpp default) or clamp gpu_device into the valid [0, device_count-1] range before assignment; reference the variables gpu_device, gpu_device_set and the struct field cparams.gpu_device (created from whisper_context_default_params/whisper_context_params) when implementing the check and warning.
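The suggested range check is simple to express in isolation. A sketch of the validation logic in TypeScript terms (hypothetical helper; the actual fix would be C++ inside addon.cpp, with the device count queried from the active backend):

```typescript
// Returns the device index to pass through, or undefined to keep the
// whisper.cpp default when the request is out of range.
function sanitizeGpuDevice(requested: number, deviceCount: number): number | undefined {
  if (!Number.isInteger(requested) || requested < 0 || requested >= deviceCount) {
    console.warn(
      `gpu_device ${requested} outside [0, ${deviceCount - 1}]; keeping backend default`,
    );
    return undefined;
  }
  return requested;
}
```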
Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@apps/desktop/src/pipeline/providers/transcription/whisper-worker-fork.ts`:
- Around line 9-14: The WhisperInitOptions interface needs a flashAttn?: boolean
field and that property must be included in the equality check in
sameInitOptions() and forwarded into the new Whisper(...) constructor call so
the worker doesn't drop the option; stop coalescing opts.gpu to true for "auto"
behaviorβpass opts through unchanged (do not derive gpu: true when undefined);
and change the initialization flow around whisperInstance and cache globals so
you only assign whisperInstance and set any cache/global state after
whisper.load() completes successfully (and ensure you clear/cleanup any
partially created instance if load() throws) to avoid leaving an invalid
instance in the cache.
In `@apps/desktop/src/renderer/main/pages/settings/hardware/index.tsx`:
- Around line 33-40: The toStored function currently returns only
device/gpuDevice which causes setComputeSettings to overwrite and drop any
existing threads value; update toStored to merge the existing compute settings'
threads into the returned object by reading the current compute state (or
accepting the previous compute object) and including threads when present, so
setComputeSettings preserves threads; apply the same fix to the other occurrence
of toStored used in the component (the second call around the GPU-selector
handling) so both branches retain threads when saving.
- Around line 54-57: The current useQuery is called with undefined so calling
refetchSnapshot() reuses undefined and hardwareService.detect(input?.refresh ??
false) keeps getting false; update the refetch calls (refetchSnapshot and the
other refetch at lines 155-159) to pass an explicit input object that sets
refresh: true (i.e., call refetch({ refresh: true })) so the
backend/hardwareService.detect receives true and forces re-detection; ensure the
query input type (api.hardware.getSnapshot.useQuery) supports a { refresh:
boolean } argument when you change the refetch invocation.
- Around line 69-72: The GPU-detection logic in hasAnyGpuBackend incorrectly
treats "openblas" as a GPU; update the check that inspects
snapshot?.availableBackends in hasAnyGpuBackend to either (a) explicitly check
for known GPU backend names (e.g., "cuda", "metal", "vulkan", "rocm") or (b)
exclude both "cpu" and "openblas" instead of only "cpu", so openblas is treated
as CPU-acceleration; apply the same change to the duplicate backend-check logic
used elsewhere in this file (the similar block around the other
availableBackends usage).
In `@apps/desktop/src/services/hardware-detection-service.ts`:
- Around line 49-51: The current dedicated flag uses only vramMB (dedicated:
typeof c.vram === "number" ? c.vram >= 2048 : false) which misclassifies
integrated GPUs; update the logic that sets dedicated (near vramMB,
vendorCategory, and categorizeVendor) to combine vendorCategory with vramMB so
that GPUs from "intel" and "apple" (as returned by categorizeVendor) are treated
as non-dedicated regardless of modest VRAM, or require a much larger VRAM
threshold for those vendors (e.g., only mark dedicated when vendorCategory not
in ["intel","apple"] and vramMB exceeds a higher cutoff). Ensure you modify the
dedicated assignment only, referencing vramMB, vendorCategory, and
categorizeVendor.
- Around line 60-66: The catch block in detect() is caching a permanent failure
by assigning this.snapshot = { gpus: [], cpuThreads: 0, cpuModel: "Unknown" } so
subsequent calls short-circuit; instead, log the error with
logger.main.error(error) but do NOT overwrite this.snapshot on transient
si.graphics()/si.cpu() failures β either return the previous snapshot or
null/undefined to force future detect() calls to retry, and only update
this.snapshot inside the successful path (the code that builds the snapshot from
si.graphics()/si.cpu()). Ensure inFlight is still cleared in finally so retries
can proceed.
In `@apps/desktop/src/services/transcription-service.ts`:
- Line 69: The WhisperProvider is currently constructed with settingsService
which registers its own compute-changed listener that can dispose the worker
outside the transcriptionMutex and race with active calls (affecting
processStreamingChunk, finalizeSession, retryTranscription). Fix by removing
direct settingsService subscription from WhisperProvider and route
compute-device change events through TranscriptionService (or add a
serialization hook on WhisperProvider) so that any worker reset/dispose runs
under transcriptionMutex or is deferred until no session is active; update the
constructor call in TranscriptionService where WhisperProvider is instantiated
and ensure compute-change handling is executed inside the transcriptionMutex (or
call a provided provider.resetSafely() that acquires the mutex) instead of
letting WhisperProvider dispose the worker asynchronously.
In `@apps/desktop/src/trpc/routers/hardware.ts`:
- Around line 13-37: In detectAvailableBackends(), remove the unused
destructuring "const { platform, arch } = process;" and delete the unreachable
statements "void arch;" and "void platform;" that appear after the return; keep
the rest of the logic (the BACKENDS loop, available.push("cpu"), and the de-dup
return Array.from(new Set(available))) unchanged so behavior is preserved.
In `@docs/building-whisper-cuda-windows.md`:
- Around line 48-62: The Markdown fenced code blocks showing build outputs (the
block containing "packages/whisper-wrapper/native/win32-x64-cuda/whisper.node"
and the block showing the error from "cccl/std/__cccl/preprocessor.h(23): fatal
error C1189") need explicit language tags to satisfy markdownlint; update both
fences to use ```text instead of bare ``` so the blocks around the package paths
and the MSVC/cl.exe preprocessor error are annotated as text.
In `@packages/whisper-wrapper/src/index.ts`:
- Around line 39-42: If opts.gpuDevice is provided but opts.gpu is undefined,
the constructed initOpts may set gpu_device without enabling GPU; update the
initOpts construction so that when opts.gpuDevice !== undefined you also set
initOpts.gpu = true (in addition to initOpts.gpu_device = opts.gpuDevice).
Modify the logic around initOpts (the block that sets model, gpu, gpu_device,
flash_attn) so gpu is set to true whenever gpu_device is set, ensuring the addon
(cparams.use_gpu) is enabled when a specific GPU is selected.
- Around line 55-68: The code currently calls loadBinding() inside full() and
free(), which can return a different native addon than the one that created
this.ctx; instead, capture and store the binding used when the Whisper instance
is constructed (e.g. set this.binding = loadBinding(...) in the constructor
where ctx and preferredBackend are chosen) and replace internal calls to
loadBinding() in full() and free() with this.binding so full(this.ctx, ...) and
binding.free(this.ctx) always use the same native addon that created this.ctx.
Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
Review info
Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 528096ad-3ddd-4145-b6b7-88915f691927
Files selected for processing (25)
- README.md
- apps/desktop/src/db/schema.ts
- apps/desktop/src/i18n/locales/en.json
- apps/desktop/src/i18n/locales/es.json
- apps/desktop/src/pipeline/providers/transcription/whisper-provider.ts
- apps/desktop/src/pipeline/providers/transcription/whisper-worker-fork.ts
- apps/desktop/src/renderer/main/lib/settings-navigation.ts
- apps/desktop/src/renderer/main/pages/settings/hardware/index.tsx
- apps/desktop/src/renderer/main/routeTree.gen.ts
- apps/desktop/src/renderer/main/routes/settings/hardware.tsx
- apps/desktop/src/renderer/main/routes/settings/route.tsx
- apps/desktop/src/services/hardware-detection-service.ts
- apps/desktop/src/services/settings-service.ts
- apps/desktop/src/services/transcription-service.ts
- apps/desktop/src/trpc/router.ts
- apps/desktop/src/trpc/routers/hardware.ts
- apps/desktop/src/trpc/routers/settings.ts
- docs/CHANGELOG-compute-device-selector.md
- docs/building-whisper-cuda-windows.md
- docs/hardware-selection.md
- packages/whisper-wrapper/addon/CMakeLists.txt
- packages/whisper-wrapper/addon/addon.cpp
- packages/whisper-wrapper/bin/build-addon.js
- packages/whisper-wrapper/src/index.ts
- packages/whisper-wrapper/src/loader.ts
```ts
function toStored(value: DeviceValue): {
  device: "auto" | "cpu" | "gpu";
  gpuDevice?: number;
} {
  if (value === "auto" || value === "cpu") return { device: value };
  const index = Number(value.slice("gpu:".length));
  return { device: "gpu", gpuDevice: Number.isFinite(index) ? index : 0 };
}
```
Preserve existing thread settings when changing devices.
`setComputeSettings` replaces the compute section, but `toStored()` only returns `device`/`gpuDevice`. Any persisted `threads` value is dropped as soon as the user changes the selector.
Proposed fix
```diff
 function toStored(value: DeviceValue): {
   device: "auto" | "cpu" | "gpu";
   gpuDevice?: number;
+  threads?: number;
-} {
-  if (value === "auto" || value === "cpu") return { device: value };
+}, compute?: { threads?: number }) {
+  const withThreads = compute?.threads ? { threads: compute.threads } : {};
+  if (value === "auto" || value === "cpu") {
+    return { device: value, ...withThreads };
+  }
   const index = Number(value.slice("gpu:".length));
-  return { device: "gpu", gpuDevice: Number.isFinite(index) ? index : 0 };
+  return {
+    device: "gpu",
+    gpuDevice: Number.isFinite(index) ? index : 0,
+    ...withThreads,
+  };
 }
@@
   setSelection(value);
   try {
-    await mutation.mutateAsync(toStored(value));
+    await mutation.mutateAsync(toStored(value, compute));
   } catch (error) {
```
Also applies to: 76-80
```ts
const { data: snapshot, refetch: refetchSnapshot, isFetching: snapshotLoading } =
  api.hardware.getSnapshot.useQuery(undefined, {
    staleTime: 5 * 60 * 1000,
  });
```
Make Refresh actually force hardware re-detection.
The query is created with `undefined` input, and `refetchSnapshot()` reuses that input, so `hardwareService.detect(input?.refresh ?? false)` still receives `false` and returns the cached snapshot.
Proposed fix
```diff
 const { data: snapshot, refetch: refetchSnapshot, isFetching: snapshotLoading } =
   api.hardware.getSnapshot.useQuery(undefined, {
     staleTime: 5 * 60 * 1000,
   });
+
+const refreshSnapshot = async () => {
+  const freshSnapshot = await utils.hardware.getSnapshot.fetch({
+    refresh: true,
+  });
+  utils.hardware.getSnapshot.setData(undefined, freshSnapshot);
+};
@@
 <Button
   variant="ghost"
   size="sm"
-  onClick={() => refetchSnapshot()}
+  onClick={() => void refreshSnapshot()}
   disabled={snapshotLoading}
 >
```
Also applies to: 155-159
```ts
const hasAnyGpuBackend = useMemo(() => {
  const backends = snapshot?.availableBackends ?? [];
  return backends.some((b) => b !== "cpu");
}, [snapshot]);
```
Don't treat OpenBLAS as a GPU backend.
`availableBackends` can include `openblas`, which is CPU acceleration. With the current `b !== "cpu"` check, selecting a GPU on an OpenBLAS-only build suppresses the warning even though no GPU backend is available.
Proposed fix
```diff
+const GPU_BACKENDS = new Set(["cuda", "vulkan", "metal"]);
+
 export default function HardwareSettingsPage() {
@@
   const hasAnyGpuBackend = useMemo(() => {
     const backends = snapshot?.availableBackends ?? [];
-    return backends.some((b) => b !== "cpu");
+    return backends.some((b) => GPU_BACKENDS.has(b));
   }, [snapshot]);
```
Also applies to: 142-146
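Stripped of the React wrapper, the corrected predicate is just set membership (a sketch, with backend names taken from the proposed fix):

```typescript
// openblas accelerates CPU inference, so it must not count as a GPU backend.
const GPU_BACKENDS = new Set(["cuda", "vulkan", "metal"]);

function hasAnyGpuBackend(availableBackends: string[]): boolean {
  return availableBackends.some((b) => GPU_BACKENDS.has(b));
}
```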
- Add the missing `flashAttn` field to WhisperInitOptions so it stops being silently dropped on its way to the wrapper.
- Include `flashAttn` in sameInitOptions' equality so a change there forces a rebuild of the whisper context.
- Stop coalescing `opts.gpu` to true when undefined. The caller (WhisperProvider.resolveInitOptions) already sets it explicitly for cpu / gpu modes and leaves it undefined for auto, which lets whisper_context_default_params() supply its own default instead of us overriding it.
- Commit `whisperInstance` / `currentModelPath` / `currentInitOptions` only after a successful load, and free the half-initialised context if load() throws. Also clear the globals before freeing the previous instance so a free() that throws does not leave the worker pointing at a disposed native ctx.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Drops the dedicated /settings/hardware route and sidebar entry and renders the compute-device selector, GPU list and warnings as an extra section inside Settings → Advanced. Keeps the feature available but gated as power-user territory, matching the maintainer's direction on PR amicalhq#139.
Also tightens the extracted section: toStored preserves existing compute.threads, hasAnyGpuBackend filters to known GPU backends only (cuda/metal/vulkan/rocm) instead of treating openblas as a GPU, and Refresh now calls utils.hardware.getSnapshot.fetch({ refresh: true }) so tapping it actually forces re-detection.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- whisper-wrapper/src/index.ts: setting gpuDevice now also forces gpu=true so a selected device can't silently fall back to CPU. Capture the resolved binding in the constructor and reuse it in transcribe()/free() so the ctx always goes back to the native addon that created it.
- whisper-wrapper/src/loader.ts: warn (once) when WHISPER_NATIVE_BACKEND holds a value outside the allowed set instead of silently falling back to "auto".
- whisper-wrapper/addon/addon.cpp: sanity-check the requested gpu_device – negative values are logged and skipped so whisper.cpp falls back to its own default instead of passing garbage to the backend.
- hardware-detection-service.ts: combine vendorCategory with VRAM so Intel/Apple integrated GPUs aren't flagged as dedicated; on systeminformation failure, keep the previous snapshot instead of caching an empty result forever.
- transcription-service.ts + whisper-provider.ts: move the compute-changed listener out of WhisperProvider and run the worker teardown inside transcriptionMutex.runExclusive(), so a setting change can no longer race a mid-call initializeWhisper → transcribeAudio.
- trpc/routers/hardware.ts: drop the unreachable void arch/platform statements after the return.
- docs/building-whisper-cuda-windows.md: tag the two fenced code blocks with `text` for markdownlint.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The specific case
I have a laptop with an Intel UHD (iGPU) + NVIDIA RTX 4080 Laptop (dGPU). With the current build, Vulkan does load, but it picks adapter 0, which on this box is the Intel iGPU. So "it works" in the sense that the GPU binary is used, but whisper-large-v3 runs on the Intel iGPU and keeps the CPU pegged at ~100% while the 12 GB RTX 4080 sits at 14% utilisation. There is no way for a user to redirect whisper to adapter 1 without rebuilding from source or editing env vars by hand. This is common on most modern Windows laptops and it is not user error.

What the PR actually does
The PR does three things, of which only the first strictly needs UI:
Moved the selector under Advanced, as you suggested
I just pushed a commit that drops the dedicated /settings/hardware route and sidebar entry.
Net UI impact: power-user-only, no new top-level nav entry, and users who don't need it never see it.

Also pushed CodeRabbit follow-ups in the same batch
While I was in there I addressed the actionable items from the latest CodeRabbit pass:
Let me know if the placement looks right or you'd like the section collapsed behind an accordion / gated on multi-GPU detection; happy to iterate.
Actionable comments posted: 4
Duplicate comments (1)
packages/whisper-wrapper/src/index.ts (1)
40-49: Potential issue | Major
Keep `gpuDevice` authoritative over `gpu: false`. Line 48 can undo lines 44-46 when callers pass `{ gpuDevice: 1, gpu: false }`, so a specific GPU can still initialize with GPU execution disabled. Apply `opts.gpu` first, then force `gpu=true` when `gpuDevice` is present.
Proposed fix
```diff
 this.binding = loadBinding(loadOpts);
 const initOpts: Record<string, unknown> = { model: modelPath };
+if (opts?.gpu !== undefined) initOpts.gpu = opts.gpu;
 // If the caller picked a specific GPU, make sure GPU execution is enabled.
 // Otherwise the native addon would set gpu_device but leave use_gpu=false,
 // silently falling back to CPU.
 if (opts?.gpuDevice !== undefined) {
   initOpts.gpu = true;
   initOpts.gpu_device = opts.gpuDevice;
 }
-if (opts?.gpu !== undefined) initOpts.gpu = opts.gpu;
 if (opts?.flashAttn !== undefined) initOpts.flash_attn = opts.flashAttn;
```
Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/whisper-wrapper/src/index.ts` around lines 40 - 49, The init logic allows opts.gpu to override a previously-set gpu=true from opts.gpuDevice, so callers like {gpuDevice: 1, gpu: false} will disable GPU despite a specific device being requested; change the order so you apply opts.gpu to initOpts first, then apply the gpuDevice branch (set initOpts.gpu = true and initOpts.gpu_device = opts.gpuDevice) last to ensure gpuDevice is authoritative; update the code paths around initOpts, opts.gpu, opts.gpuDevice, and gpu_device accordingly while leaving flash_attn handling unchanged.
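The ordering rule in this fix can be checked in isolation. A sketch with a hypothetical `buildInitOpts` helper condensing the constructor logic:

```typescript
interface WhisperOpts {
  gpu?: boolean;
  gpuDevice?: number;
  flashAttn?: boolean;
}

function buildInitOpts(modelPath: string, opts?: WhisperOpts): Record<string, unknown> {
  const initOpts: Record<string, unknown> = { model: modelPath };
  // Apply the generic gpu flag first...
  if (opts?.gpu !== undefined) initOpts.gpu = opts.gpu;
  // ...then let a specific device win, so { gpuDevice: 1, gpu: false } still
  // enables GPU execution instead of setting gpu_device with use_gpu=false.
  if (opts?.gpuDevice !== undefined) {
    initOpts.gpu = true;
    initOpts.gpu_device = opts.gpuDevice;
  }
  if (opts?.flashAttn !== undefined) initOpts.flash_attn = opts.flashAttn;
  return initOpts;
}
```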
Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@apps/desktop/src/renderer/main/pages/settings/advanced/hardware-section.tsx`:
- Around line 116-144: The Select can end up with a value like "gpu:N" that has
no matching SelectItem when snapshot.gpus doesn't include that index, causing a
blank display; update the component handling around selection/gpus (the
selection state and handleChange) to detect when selection starts with "gpu:"
but gpus doesn't contain that index and either (a) render a disabled/unavailable
SelectItem for that exact `gpu:N` value (using the same SelectItem component)
with a label like "GPU N (not detected)" so the current persisted choice remains
visible, or (b) prompt/coerce the selection back to "auto" by calling
handleChange("auto") after user confirmation; use formatVram and gpu.index to
build the fallback label so it matches existing items.
In `@apps/desktop/src/services/transcription-service.ts`:
- Around line 79-87: The compute-change listener serialises reset/dispose with
transcriptionMutex but preload/model-load paths (preloadWhisperModel,
handleModelChange) call whisperProvider.preloadModel outside that lock, allowing
concurrent dispose during init; fix by guarding all WhisperProvider lifecycle
ops with the same lock or by always acquiring both modelLoadMutex and
transcriptionMutex in a consistent order before calling
whisperProvider.preloadModel, whisperProvider.reset, or whisperProvider.dispose
(e.g. acquire modelLoadMutex then transcriptionMutex everywhere), and update
preloadWhisperModel() and handleModelChange() to acquire those mutexes before
interacting with whisperProvider so reset/dispose cannot interleave with model
preload/load.
In `@docs/CHANGELOG-compute-device-selector.md`:
- Around line 41-44: The changelog entry incorrectly states that WhisperProvider
disposes its worker when settings change; update the lifecycle note to reflect
that TranscriptionServiceβs compute-changed listener performs teardown. Edit the
sentence referencing `TranscriptionService`, `WhisperProvider`, and
`initializeModel` so it says TranscriptionService listens for `compute-changed`
and disposes/tears down the worker when settings change, while WhisperProvider
only receives forwarded compute settings during `initializeModel`.
- Line 61: The fenced code block needs an explicit language to satisfy
markdownlint MD040; update the triple-backtick fence that currently has no
language to use "text" and ensure the listed path
"packages/whisper-wrapper/addon/addon.cpp" remains inside the fence so the block
becomes ```text ... ```; locate the empty fence in the CHANGELOG entry and
replace it with a fenced block starting with ```text and ending with ``` to fix
the lint.
---
Duplicate comments:
In `@packages/whisper-wrapper/src/index.ts`:
- Around lines 40-49: The init logic allows opts.gpu to override a
previously-set gpu=true from opts.gpuDevice, so a caller passing
{gpuDevice: 1, gpu: false} disables the GPU despite requesting a specific
device. Reorder so opts.gpu is applied to initOpts first and the gpuDevice
branch (set initOpts.gpu = true and initOpts.gpu_device = opts.gpuDevice) runs
last, making gpuDevice authoritative; leave flash_attn handling unchanged.
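A sketch of the suggested ordering. Field names (`gpu`, `gpuDevice`, `flashAttn`, `gpu_device`, `flash_attn`) come from the review comment; the function shape and type names are assumptions standing in for the real init plumbing in index.ts:

```typescript
// Assumed JS-side options shape (camelCase, per the wrapper API).
interface WhisperInitOptions {
  gpu?: boolean;
  gpuDevice?: number;
  flashAttn?: boolean;
}

// Assumed native-side options shape (snake_case, per whisper.cpp).
interface NativeInitOpts {
  gpu?: boolean;
  gpu_device?: number;
  flash_attn?: boolean;
}

function buildInitOpts(opts: WhisperInitOptions): NativeInitOpts {
  const initOpts: NativeInitOpts = {};
  if (typeof opts.gpu === "boolean") {
    initOpts.gpu = opts.gpu; // generic toggle applied first
  }
  if (typeof opts.gpuDevice === "number") {
    // Applied last so a specific device is authoritative:
    // {gpuDevice: 1, gpu: false} still ends up with gpu enabled.
    initOpts.gpu = true;
    initOpts.gpu_device = opts.gpuDevice;
  }
  if (typeof opts.flashAttn === "boolean") {
    initOpts.flash_attn = opts.flashAttn; // untouched by the reorder
  }
  return initOpts;
}
```

With the branches in this order, the contradictory `{gpuDevice: 1, gpu: false}` input resolves to `{gpu: true, gpu_device: 1}` instead of silently dropping the device request.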
Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 46b545b5-1963-4e4f-9688-7f8d939a8cde
Files selected for processing (15):
- README.md
- apps/desktop/src/i18n/locales/en.json
- apps/desktop/src/i18n/locales/es.json
- apps/desktop/src/pipeline/providers/transcription/whisper-provider.ts
- apps/desktop/src/renderer/main/pages/settings/advanced/hardware-section.tsx
- apps/desktop/src/renderer/main/pages/settings/advanced/index.tsx
- apps/desktop/src/services/hardware-detection-service.ts
- apps/desktop/src/services/transcription-service.ts
- apps/desktop/src/trpc/routers/hardware.ts
- docs/CHANGELOG-compute-device-selector.md
- docs/building-whisper-cuda-windows.md
- docs/hardware-selection.md
- packages/whisper-wrapper/addon/addon.cpp
- packages/whisper-wrapper/src/index.ts
- packages/whisper-wrapper/src/loader.ts
Files skipped from review due to trivial changes (3):
- README.md
- docs/hardware-selection.md
- docs/building-whisper-cuda-windows.md
Files skipped from review as they are similar to previous changes (5):
- packages/whisper-wrapper/addon/addon.cpp
- apps/desktop/src/pipeline/providers/transcription/whisper-provider.ts
- apps/desktop/src/trpc/routers/hardware.ts
- apps/desktop/src/i18n/locales/es.json
- apps/desktop/src/i18n/locales/en.json
- transcription-service.ts: compute-changed reset now runs under modelLoadMutex + transcriptionMutex (in that order), and preloadWhisperModel() takes modelLoadMutex, so a setting change can no longer race a startup preload or handleModelChange() preload.
- hardware-section.tsx: render a "GPU #N (not detected)" fallback item when the persisted compute.gpuDevice is no longer in the snapshot, so the Select doesn't render blank after a GPU is unplugged / detection temporarily fails. New i18n key settings.hardware.computeDevice.options.unavailable in en/es.
- CHANGELOG-compute-device-selector.md: correct the lifecycle note (worker disposal lives in TranscriptionService, not WhisperProvider) and add a "text" language tag to the "Files touched" fence for markdownlint.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
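The hardware-section.tsx fallback described above can be sketched as follows; the `GpuInfo` type, label text, and helper names are illustrative assumptions, not the component's real code:

```typescript
// Assumed shape of one GPU entry in the hardware snapshot.
interface GpuInfo {
  index: number;
  name: string;
  vramMb?: number;
}

// Build a human-readable label, appending VRAM when known.
function gpuOptionLabel(gpu: GpuInfo): string {
  const vram = gpu.vramMb ? ` (${(gpu.vramMb / 1024).toFixed(0)} GB)` : "";
  return `GPU #${gpu.index}: ${gpu.name}${vram}`;
}

// Build Select items; if the persisted device index is missing from the
// snapshot (GPU unplugged, detection failed), synthesize a fallback item
// so the Select never renders blank.
function selectItems(snapshot: GpuInfo[], persistedDevice?: number) {
  const items = snapshot.map((gpu) => ({
    value: gpu.index,
    label: gpuOptionLabel(gpu),
  }));
  if (
    persistedDevice !== undefined &&
    !snapshot.some((gpu) => gpu.index === persistedDevice)
  ) {
    items.push({
      value: persistedDevice,
      label: `GPU #${persistedDevice} (not detected)`,
    });
  }
  return items;
}
```

In the real component the "(not detected)" text would come from the new `settings.hardware.computeDevice.options.unavailable` i18n key rather than a literal.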
All review feedback addressed — ready for another look.

What changed after your question: Placement — moved the selector under Settings → Advanced as you suggested, dropping the dedicated Hardware page.

CodeRabbit passes — three review passes, all clean now:
Highlights of what was fixed:
Verification
Happy to iterate on anything else you spot — and if you'd rather gate the section behind "multi-GPU detected" or an accordion I can do that too.
Lets users pick which CPU or GPU runs local Whisper models from a new Settings → Hardware tab. Auto (current behaviour) is the default; CPU forces the CPU binary; picking a specific GPU forwards its index to whisper.cpp's `whisper_context_params.gpu_device`.

Also folds in the fixes that were required to build the CUDA whisper binary on Windows with CUDA 13.x — they are prerequisites for the GPU path to be exercised at all.
Whisper wrapper

- addon.cpp: read `gpu_device` option and propagate it.
- index.ts: expose `gpu`, `gpuDevice`, `flashAttn`, `threads` and `preferredBackend`; actually forward them to `binding.init`.
- loader.ts: accept a `preferredBackend` / `WHISPER_NATIVE_BACKEND` env override; cache the binding per backend.
- CMakeLists.txt: pass `/Zc:preprocessor` to MSVC (required by CCCL in CUDA 13) and iterate CMAKE_JS_INC natively so paths containing spaces resolve correctly.

Desktop app

- db/schema.ts: new `compute` section on AppSettingsData.
- services/settings-service.ts: get/set compute settings with a `compute-changed` event.
- services/hardware-detection-service.ts (new): enumerate GPUs via the existing `systeminformation` dep with a one-shot cache.
- trpc: new `hardware.getSnapshot` query and `settings.{get,set}ComputeSettings` procedures.
- New `/settings/hardware` route and page with the selector, detected hardware list and refresh button; sidebar entry and title mapping added.
- `en` and `es` locales for the new strings (ja / zh-TW deferred to native translators).

Docs
Tested on Windows 11 + RTX 4080 Laptop + CUDA 13.2: rebuilt the wrapper with the CMake fixes, copied the two cuBLAS DLLs next to whisper.node, launched the app, dictated a sentence. Log confirmed `whisper_native_binding: cuda`; GPU peaked at 100% while CPU stayed low. macOS (Metal) and Linux (Vulkan) paths were not re-tested on this branch; they rely on the existing loader fallback.