Releases: tetherto/qvac
BCI Whispercpp Test Assets v0.1.0
Model files and test fixtures for @qvac/bci-whispercpp integration tests and examples.
QVAC SDK v0.9.0
📦 NPM: https://www.npmjs.com/package/@qvac/sdk/v/0.9.0
This release significantly expands the SDK's capabilities with finetuning support, image generation via Stable Diffusion, duplex streaming transcription, and a suspend/resume lifecycle for mobile apps. Delegation gets healthier with heartbeat probes and remote cancellation. Tool-calling completions are now more robust with KV cache fixes, and a new profiler gives deep visibility into operation performance. React Native compatibility improves with Buffer-free diffusion and better progress event handling.
💥 Breaking Changes
ping() Replaced by heartbeat()
The ping() API has been replaced by heartbeat(), which supports both local and delegated (P2P) health checks. This enables proactive provider status monitoring before and during delegated inference.
Before:

```js
import { ping } from "@qvac/sdk";
const pong = await ping();
```

After:

```js
import { heartbeat } from "@qvac/sdk";

// Local heartbeat (replaces ping)
await heartbeat();

// Delegated heartbeat — check if a remote provider is alive
await heartbeat({
  delegate: { topic: "topicHex", providerPublicKey: "peerHex", timeout: 3000 },
});
```

🔌 New APIs
Finetuning
The SDK now supports LoRA finetuning of loaded LLM models. Training runs can be started, paused, resumed, cancelled, and inspected — all through a single finetune() function. Progress streams provide real-time loss and step metrics.
```js
import { finetune } from "@qvac/sdk";

const handle = finetune({
  modelId,
  options: {
    trainDatasetDir: "./dataset/train",
    validation: { type: "dataset", path: "./dataset/eval" },
    outputParametersDir: "./artifacts/lora",
    numberOfEpochs: 2,
  },
});

for await (const progress of handle.progressStream) {
  console.log(progress.global_steps, progress.loss);
}

const result = await handle.result;
```

Operations: `start`, `resume`, `pause`, `cancel`, `getState`. Omit `operation` to let the addon auto-detect whether to start fresh or resume.
Image Generation (Diffusion)
Stable Diffusion models are now integrated as a first-class SDK capability. Load a diffusion model and generate images with step-by-step progress tracking.
```js
import { loadModel, diffusion, SD_V2_1_1B_Q8_0 } from "@qvac/sdk";

const modelId = await loadModel({
  modelSrc: SD_V2_1_1B_Q8_0,
  modelType: "diffusion",
  modelConfig: { prediction: "v" },
});

const { progressStream, outputs, stats } = diffusion({
  modelId,
  prompt: "a cat sitting on a windowsill",
  width: 512,
  height: 512,
  steps: 20,
});

for await (const { step, totalSteps } of progressStream) {
  console.log(`${step}/${totalSteps}`);
}

const buffers = await outputs;
```

Duplex Streaming Transcription (transcribeStream)
A new bidirectional streaming API lets you feed audio incrementally and receive transcription segments as speech is detected, enabling real-time voice interfaces.
```js
import { transcribeStream } from "@qvac/sdk";

const session = await transcribeStream({ modelId });
session.write(audioChunk);
session.end();

for await (const text of session) {
  console.log(text);
}

session.destroy();
```

The previous single-shot `transcribeStream({ modelId, audioChunk })` pattern still works but logs a deprecation warning — use `transcribe()` for batch transcription.
Suspend/Resume Lifecycle
Mobile and desktop apps can now cleanly suspend and resume SDK operations when the app enters the background or foreground, preventing resource leaks and stale state.
```js
import { suspend, resume } from "@qvac/sdk";

await suspend(); // app going to background
await resume(); // app returning to foreground
```

Delegated Cancellation
Remote inference and downloads running on a delegation provider can now be cancelled from the consumer side.
```js
import { cancel } from "@qvac/sdk";

await cancel({ operation: "inference", modelId: "delegated-model-id" });

await cancel({
  operation: "downloadAsset",
  downloadKey: "download-key",
  delegate: { topic: "topicHex", providerPublicKey: "peerHex" },
});
```

Delegation Health Check Timeout
A new `healthCheckTimeout` option on the `delegate` config lets you control how long the RPC health probe waits before marking a cached connection as stale and reconnecting.
```js
await loadModel({
  modelSrc: LLAMA_3_2_1B_INST_Q4_0,
  modelType: "llm",
  delegate: {
    topic: topicHex,
    providerPublicKey,
    timeout: 30_000,
    healthCheckTimeout: 2000,
  },
});
```

Addon Stats Across All Operations
All inference operations now return detailed performance stats from the underlying addons. Completion, transcription, translation, TTS, and embedding responses all include stats like tokensPerSecond, timeToFirstToken, audioDuration, and the new backendDevice field ("cpu" or "gpu").
```js
const { embedding, stats } = await embed({ modelId, text: "hello" });
console.log(stats?.backendDevice); // "cpu" | "gpu"
```

✨ Features
- CLD2 language detection is now integrated into the SDK for automatic language identification.
- OCR plugin updated to work with `@qvac/ocr-onnx@0.4.0`.
- TTS interface refactored — the TTS package uses a new files-based constructor with absolute paths, replacing the legacy loader pattern.
🐞 Bug Fixes
- KV cache preserved across tool-call round-trips — multi-turn tool-calling completions no longer lose context between rounds.
- KV cache save race condition fixed in tool-calling completions — concurrent saves no longer corrupt the cache.
- `<think>` blocks stripped before parsing tool calls — reasoning traces from models like DeepSeek no longer break tool call extraction.
- Progress event buffering — throttled progress events are now buffered instead of dropped, ensuring no updates are lost during fast download sequences.
- RPC progress throttling — progress frames are throttled to prevent `Maximum call stack size exceeded` errors during high-frequency updates.
- Clean process exit — the Bare runtime process global is now handled correctly, and RPC close triggers a clean exit.
- Connection teardown race in `closeConnections` resolved — concurrent teardowns no longer deadlock.
- React Native diffusion compatibility — `Buffer` replaced with `Uint8Array` in the diffusion client, fixing React Native builds.
- Download progress accuracy — registry downloads now use network-layer progress instead of disk I/O measurements.
- VLM addon classification — the model registry was regenerated to fix incorrect VLM addon type assignments.
- ONNX companion files — `.onnx.data` companion files are now correctly resolved during registry model resolution.
- Security hardening — multiple code scanning alerts resolved across SDK pod packages.
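The progress-event buffering fix can be illustrated with a minimal sketch. This is an illustrative pattern only, not the SDK's actual throttle code; `createBufferedThrottle` is a hypothetical helper showing why buffering (rather than dropping) throttled events guarantees the latest update is always delivered:

```js
// Illustrative sketch (not the SDK's implementation): a throttle that
// buffers the most recent event during the cooldown window and flushes it
// afterwards, so the final progress value is never lost.
function createBufferedThrottle(emit, intervalMs) {
  let lastEmit = 0;
  let pending = null;
  let timer = null;

  function flush() {
    timer = null;
    if (pending !== null) {
      lastEmit = Date.now();
      const event = pending;
      pending = null;
      emit(event);
    }
  }

  return function push(event) {
    const now = Date.now();
    if (now - lastEmit >= intervalMs) {
      lastEmit = now;
      emit(event); // fast path: emit immediately
    } else {
      pending = event; // buffer instead of dropping
      if (!timer) timer = setTimeout(flush, intervalMs - (now - lastEmit));
    }
  };
}
```

With a drop-based throttle, a burst ending at 100% could discard the terminal event; here the last buffered event is flushed once the window elapses.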
📦 Model Changes
Model registry updated: 312 → 653 (+341). See model changes for the full list.
- 295 Bergamot translation models — offline NMT covering 42 language pairs bidirectional (az, be, bg, bn, bs, ca, da, de, el, et, fa, fi, gu, he, hi, hr, hu, id, is, kn, ko, lt, lv, ml, ms, mt, nb, nl, nn, pl, ro, sk, sl, sq, sr, sv, ta, te, tr, uk, vi). Each pair includes model weights, lexical shortlists, vocabularies, and metadata.
- 5 FLUX models — FLUX.2 Klein 4B in Q4_0, Q4_K_M, Q6_K, Q8_0 quantizations plus VAE.
- 4 Stable Diffusion models — SD v2.1 1B (Q4_0, Q8_0) and SDXL Base 1.0 3B (Q4_0, Q8_0).
- 17 TTS Supertonic models — Official Supertone FP32 variants including duration predictor, text encoder, vocoder, config, unicode indexer, and 10 voice styles.
- 1 LLM model — Qwen3 4B (Q4_K_M).
🧹 Other Changes
- Updated addon dependencies: `@qvac/tts-onnx` to v0.6.7, `@qvac/transcription-whispercpp` to latest, Parakeet to v0.2.7, `@qvac/diffusion-cpp` to ^0.1.3.
- Replaced FeatureBase support links with Discord channel.
- Bumped `bare-crypto` and `@qvac/rag` for runtime stability.
- Renamed `@tetherto` npm references to `@qvac` namespace across READMEs.
- Improved test infrastructure with SDK test bootstrap and CI model caching.
QVAC LLM Addon v0.16.0
This release migrates the LLM addon off BaseInference inheritance and the WeightsProvider download layer onto the composable createJobHandler + exclusiveRunQueue utilities from @qvac/infer-base@^0.4.0. The constructor signature is replaced with a single object whose files.model field is an ordered array of absolute paths and files.projectionModel is an optional absolute path for multimodal models. This is a breaking change — every caller must update.
Breaking Changes
Constructor signature: single object with files, no Loader
LlmLlamacpp now takes a single { files, config, logger?, opts? } object. The old Loader + diskPath + modelName + two-arg (args, config) shape is gone — callers pre-resolve absolute paths and supply them as files.model.
```js
// BEFORE (≤ 0.15.x)
const FilesystemDL = require('@qvac/dl-filesystem')
const loader = new FilesystemDL({ dirPath: '/models' })
const model = new LlmLlamacpp({
  loader,
  modelName: 'Qwen3-1.7B-Q4_0.gguf',
  diskPath: '/models',
  logger: console,
  opts: { stats: true }
}, { ctx_size: '4096', gpu_layers: '99' })

// AFTER (0.16.0)
const model = new LlmLlamacpp({
  files: {
    model: ['/models/Qwen3-1.7B-Q4_0.gguf']
  },
  config: { ctx_size: '4096', gpu_layers: '99' },
  logger: console,
  opts: { stats: true }
})
```

For sharded models the caller passes the full ordered list — the `<basename>.tensors.txt` companion first, followed by every `<basename>-NNNNN-of-MMMMM.gguf` shard in ascending order. For multimodal models, `files.projectionModel` carries the absolute path to the mmproj file:
```js
const model = new LlmLlamacpp({
  files: {
    model: [
      '/models/medgemma-4b-it-Q4_1.tensors.txt',
      '/models/medgemma-4b-it-Q4_1-00001-of-00005.gguf',
      '/models/medgemma-4b-it-Q4_1-00002-of-00005.gguf',
      '/models/medgemma-4b-it-Q4_1-00003-of-00005.gguf',
      '/models/medgemma-4b-it-Q4_1-00004-of-00005.gguf',
      '/models/medgemma-4b-it-Q4_1-00005-of-00005.gguf'
    ],
    projectionModel: '/models/mmproj-model-f16.gguf'
  },
  config: { gpu_layers: '99' }
})
```

BaseInference inheritance and WeightsProvider removed
LlmLlamacpp no longer extends BaseInference and no longer touches the WeightsProvider download layer. The class composes createJobHandler and exclusiveRunQueue from @qvac/infer-base@^0.4.0 directly. Public lifecycle methods (load / run / finetune / pause / cancel / unload / getState) are unchanged in shape, but downloadWeights and the loader-based progress callbacks are gone — the caller is responsible for placing files on disk before constructing the model.
In-memory streaming from network sources (URLs, Hyperdrive) is no longer supported; previously it was possible through the Loader abstraction. The SDK does not currently use it (models are stored to disk first), and support can be re-added if the SDK needs that feature in the future.
Dependency changes
- `@qvac/infer-base` bumped from `^0.3.0` to `^0.4.0`.
- `bare-fs` is now a runtime dependency (used to stream shards from disk).
- `@qvac/dl-base` and `@qvac/dl-filesystem` are no longer used by this package and have been removed from `devDependencies`.
getState() returns a narrower shape
getState() previously returned { configLoaded, weightsLoaded, destroyed } (the three-field shape inherited from BaseInference). It now returns { configLoaded } only. The weightsLoaded and destroyed fields are gone — weightsLoaded collapsed into configLoaded because the refactored load() does both in one step, and destroyed is no longer tracked since unload() resets configLoaded and nulls the addon handle instead. Callers reading state.weightsLoaded or state.destroyed must switch to state.configLoaded.
Public methods removed from LlmLlamacpp
LlmLlamacpp previously exposed these methods via BaseInference inheritance, all of which are now gone:
- `downloadWeights(onDownloadProgress, opts)` — the download layer is removed; the caller places files on disk and passes absolute paths in `files.model` / `files.projectionModel`.
- `unpause()` / `stop()` — BaseInference job-lifecycle helpers. The refactor still exposes `pause()` and `cancel()`; `unpause` is superseded by issuing a new `run()` after `cancel()`.
- `status()` — replaced by `getState()` for the static readiness flag; per-job state is observed via the `QvacResponse` returned by `run()`.
- `destroy()` — folded into `unload()`, which now both releases native resources and nulls `this.addon`.
- `getApiDefinition()` — no longer exposed; consumers should import types from `index.d.ts`.
load() takes no arguments
load() previously forwarded ...args through BaseInference.load into LLM's _load(closeLoader, onDownloadProgress). Both arguments are gone — closeLoader is meaningless without a Loader, and onDownloadProgress is superseded by the caller owning download-and-placement before construction. Call await model.load() with no arguments.
Type exports removed from index.d.ts
The following exports are no longer part of the package's public type surface because the loader/download layer they described is gone: ReportProgressCallback, Loader, DownloadWeightsOptions, DownloadResult. TypeScript consumers importing any of these must update to the new LlmLlamacppArgs / files shape.
Features
Constructor input validation
The constructor now throws TypeError('files.model must be a non-empty array of absolute paths') when files or files.model is missing or empty. This produces a clear error for callers porting old code instead of a confusing Cannot read properties of undefined.
run()-before-load() guard
Calling run() before load() now throws Error('Addon not initialized. Call load() first.') instead of dereferencing null and crashing. finetune() already had this guard since the previous release.
load() is now idempotent when already loaded
A second load() call on an already-loaded instance is now a silent no-op instead of unloading and reloading. This aligns with the ReadyResource pattern used elsewhere in QVAC and prevents accidental double-loads from triggering expensive work. Callers that intentionally want to swap weights must call unload() first (which clears configLoaded) and then load() again.
Crash-safe shard streaming
If _streamShards() or addon.activate() throws mid-load (for example a corrupted shard file or a native init failure), the partially-initialized addon is now best-effort-unloaded and this.addon is reset to null. A subsequent load() call starts cleanly instead of leaking a zombie native instance.
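The recovery path can be sketched as follows. This is illustrative only; `crashSafeLoad` and its `activate` / `unload` calls are stand-ins for the pattern described, not the addon's actual internals:

```js
// Illustrative sketch of crash-safe activation (not the addon's real code).
// If activation throws mid-load, the partially-initialized handle is
// best-effort released and the reference cleared, so a retry starts clean.
async function crashSafeLoad(self, createAddon) {
  self.addon = createAddon();
  try {
    await self.addon.activate(); // may throw on a corrupt shard or native init failure
  } catch (err) {
    try { await self.addon.unload(); } catch {} // best-effort cleanup
    self.addon = null; // no zombie native instance left behind
    throw err;
  }
}
```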
Restored JSDoc on FinetuneOptions
Every FinetuneOptions field carries a /** … */ doc comment again, including the default values (numberOfEpochs = 1, learningRate = 1e-4, batchSize = 128, …) so IDE tooltips show them without needing to read docs/finetuning.md.
Bug Fixes
unload() clears the addon reference
unload() now sets this.addon = null after await this.addon.unload(), so post-unload cancel() / pause() / run() calls hit the explicit guards rather than dereferencing a disposed native handle. pause(), cancel(), and the job-handler cancel closure all use optional chaining for the same reason.
Removed dead _isSuppressedNoResponseLog filter
The _createFilteredLogger infrastructure that wrapped the user-supplied logger to swallow 'No response found for job' warnings was tied to the old BaseInference _jobToResponse Map. The new architecture cannot emit that message at all, so the filter, the wrapped logger, and the _originalLogger indirection are all removed. The user-supplied logger is now used directly.
load() is serialized through the exclusive run queue
load() is now routed through the same exclusiveRunQueue used by run(), finetune(), and unload(). Previously two overlapping load() calls on the same instance could both pass the configLoaded guard before it flipped to true, both stream shards into and activate the native addon, and clobber this.addon — leaking one native handle. Concurrent load() on a single instance is now safe.
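The serialization guarantee can be sketched with a minimal promise-chain queue. This is an illustrative pattern in the spirit of `exclusiveRunQueue`, not the actual `@qvac/infer-base` implementation:

```js
// Minimal sketch of an exclusive run queue (illustrative, not the real
// @qvac/infer-base code). Each submitted task waits for the previous one
// to settle, so overlapping load() calls cannot interleave.
function createExclusiveRunQueue() {
  let tail = Promise.resolve();
  return function enqueue(task) {
    const result = tail.then(task, task); // run after the previous task settles
    tail = result.catch(() => {}); // keep the chain alive on failure
    return result;
  };
}
```

With every `load()`, `run()`, and `unload()` routed through one such queue, a second `load()` only starts after the first has finished, so it observes `configLoaded === true` and returns as a no-op instead of activating a second native handle.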
Constructor rejects non-absolute path entries
Each entry in files.model is now validated with path.isAbsolute() (matching the existing error-message contract), and the same check now applies to the optional files.projectionModel — previously it had no validation at all. Relative paths are rejected at construction time instead of bubbling up from bare-fs or the native load.
Pull Requests
- #1494 - chore[bc]: LLM addon interface refactor — remove BaseInference and WeightsProvider
QVAC Embed Addon v0.14.0
This release migrates the embed addon off BaseInference inheritance and the WeightsProvider download layer onto the composable createJobHandler + exclusiveRunQueue utilities from @qvac/infer-base@^0.4.0. The constructor signature is replaced with a single object whose files.model field is an ordered array of absolute paths, mirroring the parallel LLM and diffusion addon refactors. This is a breaking change — every caller must update.
Breaking Changes
Constructor signature: single object with files, no Loader
GGMLBert now takes a single { files, config?, logger?, opts? } object. The old Loader + diskPath + modelName + two-arg (args, config) shape is gone — callers pre-resolve absolute paths and supply them as files.model.
```js
// BEFORE (≤ 0.13.x)
const FilesystemDL = require('@qvac/dl-filesystem')
const loader = new FilesystemDL({ dirPath: '/models' })
const model = new GGMLBert({
  loader,
  modelName: 'bge-small-en-v1.5-q4_0.gguf',
  diskPath: '/models',
  logger: console,
  opts: { stats: true }
}, { device: 'gpu', batch_size: '512' })

// AFTER (0.14.0)
const model = new GGMLBert({
  files: {
    model: ['/models/bge-small-en-v1.5-q4_0.gguf']
  },
  config: { device: 'gpu', batch_size: '512' },
  logger: console,
  opts: { stats: true }
})
```

For sharded models the caller passes the full ordered list — the `<basename>.tensors.txt` companion first, followed by every `<basename>-NNNNN-of-MMMMM.gguf` shard in ascending order:
```js
const model = new GGMLBert({
  files: {
    model: [
      '/models/big-embed-model.tensors.txt',
      '/models/big-embed-model-00001-of-00003.gguf',
      '/models/big-embed-model-00002-of-00003.gguf',
      '/models/big-embed-model-00003-of-00003.gguf'
    ]
  },
  config: { device: 'gpu' }
})
```

BaseInference inheritance and WeightsProvider removed
GGMLBert no longer extends BaseInference and no longer touches the WeightsProvider download layer. The class composes createJobHandler and exclusiveRunQueue from @qvac/infer-base@^0.4.0 directly. Public lifecycle methods (load / run / cancel / unload / getState) are unchanged in shape, but downloadWeights and the loader-based progress callbacks are gone — the caller is responsible for placing files on disk before constructing the model.
In-memory streaming from network sources (URLs, Hyperdrive) is no longer supported; previously it was possible through the Loader abstraction. The SDK does not currently use it (models are stored to disk first), and support can be re-added if the SDK needs that feature in the future.
Dependency changes
- `@qvac/infer-base` bumped from `^0.2.2` to `^0.4.0`.
- `bare-fs` is now a runtime dependency (used to stream shards from disk).
- `@qvac/dl-filesystem` and `@qvac/dl-hyperdrive` are no longer used by this package and have been removed from `devDependencies` / `peerDependencies`.
getState() returns a narrower shape
getState() previously returned { configLoaded, weightsLoaded, destroyed } (the three-field shape inherited from BaseInference). It now returns { configLoaded } only. The weightsLoaded and destroyed fields are gone — weightsLoaded collapsed into configLoaded because the refactored load() does both in one step, and destroyed is no longer tracked since unload() resets configLoaded and nulls the addon handle instead. Callers reading state.weightsLoaded or state.destroyed must switch to state.configLoaded.
Public methods removed from GGMLBert
GGMLBert previously exposed these methods via BaseInference inheritance, all of which are now gone:
- `downloadWeights(onDownloadProgress, opts)` — the download layer is removed; the caller places files on disk and passes absolute paths in `files.model`.
- `pause()` / `unpause()` / `stop()` — BaseInference job-lifecycle helpers. The refactor uses `createJobHandler` directly; use `cancel()` to terminate an in-flight run.
- `status()` — replaced by `getState()` for the static readiness flag; per-job state is observed via the `QvacResponse` returned by `run()`.
- `destroy()` — folded into `unload()`, which now both releases native resources and nulls `this.addon`.
- `getApiDefinition()` — no longer exposed; consumers should import types from `index.d.ts`.
load() takes no arguments
load() previously forwarded ...args through BaseInference.load into embed's _load(closeLoader, reportProgressCallback). Both arguments are gone — closeLoader is meaningless without a Loader, and reportProgressCallback is superseded by the caller owning download-and-placement before construction. Call await model.load() with no arguments.
Type exports removed from index.d.ts
The following exports are no longer part of the package's public type surface because the loader/download layer they described is gone: ReportProgressCallback, Loader, GGMLArgs, DownloadWeightsOptions, DownloadResult. TypeScript consumers importing any of these must update to the new GGMLBertArgs / files shape.
BertInterface outputCb signature: jobId dropped
The exported BertInterface class's constructor still takes (binding, configurationParams, outputCb), but the outputCb signature changed:
```ts
// BEFORE
(addon: unknown, event: string, jobId: number, data: unknown, error?: Error) => void

// AFTER
(addon: unknown, event: string, data: unknown, error?: Error) => void
```

The `jobId: number` argument is gone because `createJobHandler` owns the single active job directly; the wrapper no longer needs a per-job identifier in the callback chain. External callers constructing `BertInterface` with a custom `outputCb` must drop the third argument.
BertInterface.runJob return type
BertInterface.runJob(input) previously returned Promise<void>. It now returns Promise<boolean> — true if the job was accepted, false if the addon was already busy. GGMLBert uses this return to surface a busy error to the caller instead of silently dropping the job.
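The accepted/busy contract can be illustrated with a stand-in runner. This is a hypothetical class, not the real `BertInterface` (which wraps a native binding); it only demonstrates the boolean return described above:

```js
// Hypothetical stand-in showing the Promise<boolean> contract:
// true when the job was accepted, false when the runner is already busy.
class SingleJobRunner {
  #busy = false;

  async runJob(input) {
    if (this.#busy) return false; // already busy — job rejected, not dropped silently
    this.#busy = true;
    try {
      await new Promise((resolve) => setTimeout(resolve, 10)); // stand-in for real work
      return true; // job accepted
    } finally {
      this.#busy = false;
    }
  }
}
```

A caller can then surface a busy error instead of losing the job: if `await runner.runJob(input)` returns `false`, throw or retry.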
Features
Constructor input validation
The constructor now throws TypeError('files.model must be a non-empty array of absolute paths') when files or files.model is missing or empty. This produces a clear error for callers porting old code instead of a confusing Cannot read properties of undefined.
run()-before-load() guard
Calling run() before load() now throws Error('Addon not initialized. Call load() first.') instead of dereferencing null and crashing.
load() is now idempotent when already loaded
A second load() call on an already-loaded instance is now a silent no-op instead of unloading and reloading. This aligns with the ReadyResource pattern used elsewhere in QVAC and prevents accidental double-loads from triggering expensive work. Callers that intentionally want to swap weights must call unload() first (which clears configLoaded) and then load() again.
Crash-safe shard streaming
If _streamShards() or addon.activate() throws mid-load (for example a corrupted shard file or a native init failure), the partially-initialized addon is now best-effort-unloaded and this.addon is reset to null. A subsequent load() call starts cleanly instead of leaking a zombie native instance.
Bug Fixes
unload() clears the addon reference
unload() now sets this.addon = null after await this.addon.unload(), so post-unload cancel() / run() calls hit the explicit guards rather than dereferencing a disposed native handle. cancel() and the job-handler cancel closure both use optional chaining for the same reason.
Unknown addon events no longer pollute the output stream
_addonOutputCallback previously fed any non-stats / non-error event payload into response.output, including unknown events. It now logs unknown events at warn level (these indicate a native-layer change worth surfacing) and only forwards Embeddings payloads to the active response.
load() is serialized through the exclusive run queue
load() is now routed through the same exclusiveRunQueue used by run() and unload(). Previously two overlapping load() calls on the same instance could both pass the configLoaded guard before it flipped to true, both stream shards into and activate the native addon, and clobber this.addon — leaking one native handle. Concurrent load() on a single instance is now safe.
Constructor rejects non-absolute path entries
Each entry in files.model is now validated with path.isAbsolute() (matching the existing error-message contract). Relative paths are rejected at construction time instead of bubbling up from bare-fs or the native load.
Pull Requests
- #1493 - chore[bc]: embed addon interface refactor — remove BaseInference and WeightsProvider
QVAC Stable Diffusion Addon v0.3.0
This release migrates the diffusion addon off BaseInference inheritance and onto the composable createJobHandler + exclusiveRunQueue utilities from @qvac/infer-base@^0.4.0. The constructor signature is replaced with a single object whose files field carries absolute paths for every model component, mirroring the parallel embed and LLM addon refactors. This is a breaking change — every caller must update.
Breaking Changes
Constructor signature: single object with files instead of (args, config)
ImgStableDiffusion now takes a single { files, config, logger?, opts? } object. The old diskPath + modelName + per-component filename pattern is gone — callers pass absolute paths directly via files. Companion model fields are renamed (clipLModel → clipL, clipGModel → clipG, t5XxlModel → t5Xxl, llmModel → llm, vaeModel → vae).
```js
// BEFORE (≤ 0.2.x)
const model = new ImgStableDiffusion({
  diskPath: '/models',
  modelName: 'flux-2-klein-4b-Q8_0.gguf',
  llmModel: 'Qwen3-4B-Q4_K_M.gguf',
  vaeModel: 'flux2-vae.safetensors',
  logger: console
}, { threads: 8 })

// AFTER (0.3.0)
const model = new ImgStableDiffusion({
  files: {
    model: '/models/flux-2-klein-4b-Q8_0.gguf',
    llm: '/models/Qwen3-4B-Q4_K_M.gguf',
    vae: '/models/flux2-vae.safetensors'
  },
  config: { threads: 8 },
  logger: console,
  opts: { stats: true }
})
```

BaseInference inheritance removed
ImgStableDiffusion no longer extends BaseInference. The class composes createJobHandler and exclusiveRunQueue from @qvac/infer-base@^0.4.0 directly. The public lifecycle (load / run / cancel / unload / getState) is unchanged in shape; only construction differs. Internal helpers like _withExclusiveRun and _outputCallback are removed.
Caller owns absolute paths — addon no longer joins diskPath + filename
Callers that previously relied on the addon to resolve path.join(diskPath, filename) must now do that resolution themselves before constructing the model.
getState() returns a narrower shape
getState() previously returned { configLoaded, weightsLoaded, destroyed } (the three-field shape from BaseInference). It now returns { configLoaded } only. The weightsLoaded and destroyed fields are gone — weightsLoaded collapsed into configLoaded because the refactored load() does both in one step, and destroyed is no longer tracked since unload() resets configLoaded and nulls the addon handle instead. Callers reading state.weightsLoaded or state.destroyed must switch to state.configLoaded.
Public methods removed from ImgStableDiffusion
ImgStableDiffusion previously exposed these methods via BaseInference inheritance, all of which are now gone:
- `downloadWeights(onDownloadProgress, opts)` — the diffusion addon never used the loader in practice, but the inherited method was still present on the public surface. It is removed along with the base class.
- `pause()` / `unpause()` / `stop()` — BaseInference job-lifecycle helpers. The refactor uses `createJobHandler` directly; use `cancel()` to terminate an in-flight generation.
- `status()` — replaced by `getState()` for the static readiness flag; per-job state is observed via the `QvacResponse` returned by `run()`.
- `destroy()` — folded into `unload()`, which now both releases native resources and nulls `this.addon`.
- `getApiDefinition()` — no longer exposed; consumers should import types from `index.d.ts`.
cancel() no longer accepts a jobId
BaseInference.cancel(jobId) took an optional jobId argument. The refactor's cancel() is parameterless — there is always at most one active generation per instance, owned by createJobHandler. Any caller passing a jobId will have it ignored; update call sites to await model.cancel().
Features
Constructor input validation
The constructor now throws TypeError('files.model must be an absolute path string') when files.model is missing or not a string, or TypeError('files.model must be an absolute path (got: <value>)') when supplied as a relative path. This produces a clear error for callers porting old code instead of a confusing Cannot read properties of undefined. The same validation applies to optional companion fields (clipL, clipG, t5Xxl, llm, vae) when supplied.
run()-before-load() guard
Calling run() before load() now throws Error('Addon not initialized. Call load() first.') instead of crashing in native code. Covered by a new regression test in test/integration/api-behavior.test.js.
load() is now idempotent when already loaded
A second load() call on an already-loaded instance is now a silent no-op instead of unloading and reloading. This aligns with the ReadyResource pattern used elsewhere in QVAC and prevents accidental double-loads from triggering expensive work. Callers that intentionally want to swap weights must call unload() first (which clears configLoaded) and then load() again.
Broader split-layout detection
isSplitLayout now also triggers when only clipL or clipG is supplied. This closes a footgun where a FLUX.1 caller passing { model, clipL, clipG, vae } (without t5Xxl) would silently mis-route the diffusion model into the all-in-one path parameter and fail to load.
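The broadened check can be sketched as a simple predicate. Illustrative only, under the assumption that the text-encoder components (`clipL`, `clipG`, `t5Xxl`) drive the decision; this is not the addon's actual source:

```js
// Illustrative sketch: any supplied text-encoder component selects the
// split layout, so a FLUX.1-style { model, clipL, clipG, vae } call no
// longer falls through to the all-in-one path.
function isSplitLayout(files) {
  return Boolean(files.clipL || files.clipG || files.t5Xxl);
}
```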
Bug Fixes
unload() clears the addon reference
unload() now sets this.addon = null after await this.addon.unload(), so post-unload cancel() / run() calls hit the explicit if (!this.addon) guard rather than dereferencing a disposed native handle.
Unknown addon events no longer pollute the output stream
_addonOutputCallback previously had a fallthrough that pushed any non-error / non-image / non-stats event into response.output (including null and undefined). It now logs unknown events at debug level and does not feed them into the active response.
Crash-safe activation
If addon.activate() throws during _load() (for example a native init failure or a missing model file discovered late), the partially-initialized addon is now best-effort-unloaded, the native logger is released, and this.addon is reset to null. A subsequent load() call starts cleanly instead of leaking a zombie native instance.
load() is serialized through the exclusive run queue
load() is now routed through the same exclusiveRunQueue used by run() and unload(). Previously two overlapping load() calls on the same instance could both pass the configLoaded guard before it flipped to true, both allocate a native addon, and clobber this.addon — leaking one native handle. Concurrent load() on a single instance is now safe.
Pull Requests
- #1496 - chore[bc]: diffusion addon interface refactor — remove BaseInference
QVAC Stable Diffusion Addon v0.2.0
Added
- FLUX.2 img2img support with in-context conditioning (`ref_images`) via `init_image` parameter
- JS-side input validation for `readImageDimensions()` with buffer-length guards for truncated PNG/JPEG
- Regression tests for FLUX img2img prediction guard and truncated image handling
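The kind of buffer-length guard described above can be sketched for the PNG case. The `readPngDimensions` helper below is hypothetical, not the addon's actual `readImageDimensions()` implementation:

```js
// Hypothetical sketch of a truncation-safe PNG dimension reader. A valid
// PNG stores width and height as big-endian u32s at byte offsets 16-23 of
// the IHDR chunk, so any buffer shorter than 24 bytes must be rejected
// before indexing into it.
function readPngDimensions(buf) {
  const PNG_MAGIC = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);
  if (buf.length < 24) return null; // guard: truncated header
  if (!buf.subarray(0, 8).equals(PNG_MAGIC)) return null; // not a PNG
  return {
    width: buf.readUInt32BE(16),
    height: buf.readUInt32BE(20),
  };
}
```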
Changed
- FLUX img2img now requires explicit `prediction: 'flux2_flow'` in config to prevent silent fallback to SDEdit
- Updated `prediction` docstring to clarify auto-detection is insufficient for FLUX img2img
- Exported `readImageDimensions()` for testing and external use
Fixed
- `readImageDimensions()` now safely handles truncated/corrupt PNG and JPEG buffers
QVAC Stable Diffusion Addon v0.1.3
Changed
- README, `index.d.ts`, and `index.js` JSDoc no longer claim FLUX.1 support for `clipLModel` and `t5XxlModel`. The addon exposes SDXL, SD3, and FLUX.2-klein only — FLUX.1 was never wired through the JS layer. The example model name in the constructor JSDoc is also corrected to `flux-2-klein-4b-Q8_0.gguf`.
QVAC OCR Addon v0.4.2
Fixed
- Updated README to use current package name (`@qvac/ocr-onnx`) and monorepo paths
- Removed redundant `ensure-npm-public` job from on-merge workflow
QVAC OCR Addon v0.4.1
Fixed
- SIGABRT crash on process exit in OCR addon
- Use HTTPS instead of SSH for vcpkg registry URLs
Changed
- Updated OCR integration tests for `createJobHandler` migration
- Removed hyperdrive references and dependencies
- Renamed `dl-hyperdrive` and `dl-filesystem` package references
- Migrated qvac-devops to oss-action
QVAC LLM Addon v0.14.4
Changed
- Updated qvac-fabric dependency from 7248.2.1 to 7248.2.3, which fixes OpenCL kernel cache support on Android.
Added
- `openclCacheDir` option in `LlamaConfig` (`index.d.ts`): writable directory for the OpenCL kernel binary cache, required on Android for fast GPU startup.
- `cache-type-k` and `cache-type-v` options in `LlamaConfig` (`index.d.ts`): configure KV cache quantization types.