
Releases: tetherto/qvac

BCI Whispercpp Test Assets v0.1.0

21 Apr 12:29
1f9897f


Pre-release

Model files and test fixtures for @qvac/bci-whispercpp integration tests and examples.

QVAC SDK v0.9.0

17 Apr 10:51
5eed356


📦 NPM: https://www.npmjs.com/package/@qvac/sdk/v/0.9.0

This release significantly expands the SDK's capabilities with finetuning support, image generation via Stable Diffusion, duplex streaming transcription, and a suspend/resume lifecycle for mobile apps. Delegation gets healthier with heartbeat probes and remote cancellation. Tool-calling completions are now more robust with KV cache fixes, and a new profiler gives deep visibility into operation performance. React Native compatibility improves with Buffer-free diffusion and better progress event handling.


💥 Breaking Changes

ping() Replaced by heartbeat()

The ping() API has been replaced by heartbeat(), which supports both local and delegated (P2P) health checks. This enables proactive provider status monitoring before and during delegated inference.

Before:

import { ping } from "@qvac/sdk";
const pong = await ping();

After:

import { heartbeat } from "@qvac/sdk";

// Local heartbeat (replaces ping)
await heartbeat();

// Delegated heartbeat — check if a remote provider is alive
await heartbeat({
  delegate: { topic: "topicHex", providerPublicKey: "peerHex", timeout: 3000 },
});

🔌 New APIs

Finetuning

The SDK now supports LoRA finetuning of loaded LLM models. Training runs can be started, paused, resumed, cancelled, and inspected — all through a single finetune() function. Progress streams provide real-time loss and step metrics.

import { finetune } from "@qvac/sdk";

const handle = finetune({
  modelId,
  options: {
    trainDatasetDir: "./dataset/train",
    validation: { type: "dataset", path: "./dataset/eval" },
    outputParametersDir: "./artifacts/lora",
    numberOfEpochs: 2,
  },
});

for await (const progress of handle.progressStream) {
  console.log(progress.global_steps, progress.loss);
}
const result = await handle.result;

Operations: start, resume, pause, cancel, getState. Omit the operation field to let the addon auto-detect whether to start fresh or resume.

Image Generation (Diffusion)

Stable Diffusion models are now integrated as a first-class SDK capability. Load a diffusion model and generate images with step-by-step progress tracking.

import { loadModel, diffusion, SD_V2_1_1B_Q8_0 } from "@qvac/sdk";

const modelId = await loadModel({
  modelSrc: SD_V2_1_1B_Q8_0,
  modelType: "diffusion",
  modelConfig: { prediction: "v" },
});

const { progressStream, outputs, stats } = diffusion({
  modelId,
  prompt: "a cat sitting on a windowsill",
  width: 512,
  height: 512,
  steps: 20,
});

for await (const { step, totalSteps } of progressStream) {
  console.log(`${step}/${totalSteps}`);
}
const buffers = await outputs;

Duplex Streaming Transcription (transcribeStream)

A new bidirectional streaming API lets you feed audio incrementally and receive transcription segments as speech is detected, enabling real-time voice interfaces.

import { transcribeStream } from "@qvac/sdk";

const session = await transcribeStream({ modelId });
session.write(audioChunk);
session.end();

for await (const text of session) {
  console.log(text);
}
session.destroy();

The previous single-shot transcribeStream({ modelId, audioChunk }) pattern still works but logs a deprecation warning — use transcribe() for batch transcription.

Suspend/Resume Lifecycle

Mobile and desktop apps can now cleanly suspend and resume SDK operations when the app enters the background or foreground, preventing resource leaks and stale state.

import { suspend, resume } from "@qvac/sdk";

await suspend(); // app going to background
await resume();  // app returning to foreground

Delegated Cancellation

Remote inference and downloads running on a delegation provider can now be cancelled from the consumer side.

import { cancel } from "@qvac/sdk";

await cancel({ operation: "inference", modelId: "delegated-model-id" });

await cancel({
  operation: "downloadAsset",
  downloadKey: "download-key",
  delegate: { topic: "topicHex", providerPublicKey: "peerHex" },
});

Delegation Health Check Timeout

A new healthCheckTimeout option on the delegate config lets you control how long the RPC health probe waits before marking a cached connection as stale and reconnecting.

await loadModel({
  modelSrc: LLAMA_3_2_1B_INST_Q4_0,
  modelType: "llm",
  delegate: {
    topic: topicHex,
    providerPublicKey,
    timeout: 30_000,
    healthCheckTimeout: 2000,
  },
});

Addon Stats Across All Operations

All inference operations now return detailed performance stats from the underlying addons. Completion, transcription, translation, TTS, and embedding responses all include stats like tokensPerSecond, timeToFirstToken, audioDuration, and the new backendDevice field ("cpu" or "gpu").

const { embedding, stats } = await embed({ modelId, text: "hello" });
console.log(stats?.backendDevice); // "cpu" | "gpu"

✨ Features

  • CLD2 language detection is now integrated into the SDK for automatic language identification.
  • OCR plugin updated to work with @qvac/ocr-onnx@0.4.0.
  • TTS interface refactored — the TTS package uses a new files-based constructor with absolute paths, replacing the legacy loader pattern.

🐞 Bug Fixes

  • KV cache preserved across tool-call round-trips — multi-turn tool-calling completions no longer lose context between rounds.
  • KV cache save race condition fixed in tool-calling completions — concurrent saves no longer corrupt the cache.
  • <think> blocks stripped before parsing tool calls — reasoning traces from models like DeepSeek no longer break tool call extraction.
  • Progress event buffering — throttled progress events are now buffered instead of dropped, ensuring no updates are lost during fast download sequences.
  • RPC progress throttling — progress frames are throttled to prevent Maximum call stack size exceeded errors during high-frequency updates.
  • Clean process exit — the Bare runtime process global is now handled correctly, and RPC close triggers a clean exit.
  • Connection teardown race in closeConnections resolved — concurrent teardowns no longer deadlock.
  • React Native diffusion compatibility — Buffer replaced with Uint8Array in the diffusion client, fixing React Native builds.
  • Download progress accuracy — registry downloads now use network-layer progress instead of disk I/O measurements.
  • VLM addon classification — the model registry was regenerated to fix incorrect VLM addon type assignments.
  • ONNX companion files — .onnx.data companion files are now correctly resolved during registry model resolution.
  • Security hardening — multiple code scanning alerts resolved across SDK pod packages.
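The progress-event buffering fix above can be sketched as follows. This is an illustrative throttle with a buffer, not the SDK's actual implementation; the clock is injected so the behavior is deterministic.

```javascript
// Illustrative sketch (not SDK source): instead of dropping progress events
// that arrive inside the throttle window, the latest one is buffered and
// emitted when the window closes or the stream flushes.
function createBufferedThrottle (emit, intervalMs, now = Date.now) {
  let lastEmit = -Infinity
  let pending = null
  return {
    push (event) {
      if (now() - lastEmit >= intervalMs) {
        lastEmit = now()
        emit(event)
      } else {
        pending = event // buffer the newest event instead of dropping it
      }
    },
    flush () {
      if (pending !== null) { emit(pending); pending = null }
    }
  }
}
```

With this shape, a burst of download progress frames collapses to the first frame plus the most recent one, so the final percentage is never lost.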

📦 Model Changes

Model registry updated: 312 → 653 (+341). See model changes for the full list.

  • 295 Bergamot translation models — offline NMT covering 42 bidirectional language pairs (az, be, bg, bn, bs, ca, da, de, el, et, fa, fi, gu, he, hi, hr, hu, id, is, kn, ko, lt, lv, ml, ms, mt, nb, nl, nn, pl, ro, sk, sl, sq, sr, sv, ta, te, tr, uk, vi). Each pair includes model weights, lexical shortlists, vocabularies, and metadata.
  • 5 FLUX models — FLUX.2 Klein 4B in Q4_0, Q4_K_M, Q6_K, Q8_0 quantizations plus VAE.
  • 4 Stable Diffusion models — SD v2.1 1B (Q4_0, Q8_0) and SDXL Base 1.0 3B (Q4_0, Q8_0).
  • 17 TTS Supertonic models — Official Supertone FP32 variants including duration predictor, text encoder, vocoder, config, unicode indexer, and 10 voice styles.
  • 1 LLM model — Qwen3 4B (Q4_K_M).

🧹 Other Changes

  • Updated addon dependencies: @qvac/tts-onnx to v0.6.7, @qvac/transcription-whispercpp to latest, Parakeet to v0.2.7, @qvac/diffusion-cpp to ^0.1.3.
  • Replaced FeatureBase support links with Discord channel.
  • Bumped bare-crypto and @qvac/rag for runtime stability.
  • Renamed @tetherto npm references to @qvac namespace across READMEs.
  • Improved test infrastructure with SDK test bootstrap and CI model caching.

QVAC LLM Addon v0.16.0

19 Apr 09:06
df05614


This release migrates the LLM addon off BaseInference inheritance and the WeightsProvider download layer onto the composable createJobHandler + exclusiveRunQueue utilities from @qvac/infer-base@^0.4.0. The constructor signature is replaced with a single object whose files.model field is an ordered array of absolute paths and files.projectionModel is an optional absolute path for multimodal models. This is a breaking change — every caller must update.

Breaking Changes

Constructor signature: single object with files, no Loader

LlmLlamacpp now takes a single { files, config, logger?, opts? } object. The old Loader + diskPath + modelName + two-arg (args, config) shape is gone — callers pre-resolve absolute paths and supply them as files.model.

// BEFORE (≤ 0.15.x)
const FilesystemDL = require('@qvac/dl-filesystem')
const loader = new FilesystemDL({ dirPath: '/models' })
const model = new LlmLlamacpp({
  loader,
  modelName: 'Qwen3-1.7B-Q4_0.gguf',
  diskPath: '/models',
  logger: console,
  opts: { stats: true }
}, { ctx_size: '4096', gpu_layers: '99' })

// AFTER (0.16.0)
const model = new LlmLlamacpp({
  files: {
    model: ['/models/Qwen3-1.7B-Q4_0.gguf']
  },
  config: { ctx_size: '4096', gpu_layers: '99' },
  logger: console,
  opts: { stats: true }
})

For sharded models the caller passes the full ordered list — the <basename>.tensors.txt companion first, followed by every <basename>-NNNNN-of-MMMMM.gguf shard in ascending order. For multimodal models, files.projectionModel carries the absolute path to the mmproj file:

const model = new LlmLlamacpp({
  files: {
    model: [
      '/models/medgemma-4b-it-Q4_1.tensors.txt',
      '/models/medgemma-4b-it-Q4_1-00001-of-00005.gguf',
      '/models/medgemma-4b-it-Q4_1-00002-of-00005.gguf',
      '/models/medgemma-4b-it-Q4_1-00003-of-00005.gguf',
      '/models/medgemma-4b-it-Q4_1-00004-of-00005.gguf',
      '/models/medgemma-4b-it-Q4_1-00005-of-00005.gguf'
    ],
    projectionModel: '/models/mmproj-model-f16.gguf'
  },
  config: { gpu_layers: '99' }
})

BaseInference inheritance and WeightsProvider removed

LlmLlamacpp no longer extends BaseInference and no longer touches the WeightsProvider download layer. The class composes createJobHandler and exclusiveRunQueue from @qvac/infer-base@^0.4.0 directly. Public lifecycle methods (load / run / finetune / pause / cancel / unload / getState) are unchanged in shape, but downloadWeights and the loader-based progress callbacks are gone — the caller is responsible for placing files on disk before constructing the model.

In-memory streaming from network sources (URLs, Hyperdrive), previously possible through the Loader abstraction, is no longer supported in the current API. The SDK does not currently use it (models are stored to disk first); support can be re-added if the SDK needs that capability in the future.

Dependency changes

  • @qvac/infer-base bumped from ^0.3.0 to ^0.4.0.
  • bare-fs is now a runtime dependency (used to stream shards from disk).
  • @qvac/dl-base and @qvac/dl-filesystem are no longer used by this package and have been removed from devDependencies.

getState() returns a narrower shape

getState() previously returned { configLoaded, weightsLoaded, destroyed } (the three-field shape inherited from BaseInference). It now returns { configLoaded } only. The weightsLoaded and destroyed fields are gone — weightsLoaded collapsed into configLoaded because the refactored load() does both in one step, and destroyed is no longer tracked since unload() resets configLoaded and nulls the addon handle instead. Callers reading state.weightsLoaded or state.destroyed must switch to state.configLoaded.

Public methods removed from LlmLlamacpp

LlmLlamacpp previously exposed these methods via BaseInference inheritance, all of which are now gone:

  • downloadWeights(onDownloadProgress, opts) — the download layer is removed; the caller places files on disk and passes absolute paths in files.model / files.projectionModel.
  • unpause() / stop() — BaseInference job-lifecycle helpers. The refactor still exposes pause() and cancel(); unpause is superseded by issuing a new run() after cancel().
  • status() — replaced by getState() for the static readiness flag; per-job state is observed via the QvacResponse returned by run().
  • destroy() — folded into unload(), which now both releases native resources and nulls this.addon.
  • getApiDefinition() — no longer exposed; consumers should import types from index.d.ts.

load() takes no arguments

load() previously forwarded ...args through BaseInference.load into LLM's _load(closeLoader, onDownloadProgress). Both arguments are gone — closeLoader is meaningless without a Loader, and onDownloadProgress is superseded by the caller owning download-and-placement before construction. Call await model.load() with no arguments.

Type exports removed from index.d.ts

The following exports are no longer part of the package's public type surface because the loader/download layer they described is gone: ReportProgressCallback, Loader, DownloadWeightsOptions, DownloadResult. TypeScript consumers importing any of these must update to the new LlmLlamacppArgs / files shape.

Features

Constructor input validation

The constructor now throws TypeError('files.model must be a non-empty array of absolute paths') when files or files.model is missing or empty. This produces a clear error for callers porting old code instead of a confusing Cannot read properties of undefined.

run()-before-load() guard

Calling run() before load() now throws Error('Addon not initialized. Call load() first.') instead of dereferencing null and crashing. finetune() already had this guard since the previous release.

load() is now idempotent when already loaded

A second load() call on an already-loaded instance is now a silent no-op instead of unloading and reloading. This aligns with the ReadyResource pattern used elsewhere in QVAC and prevents accidental double-loads from triggering expensive work. Callers that intentionally want to swap weights must call unload() first (which clears configLoaded) and then load() again.

Crash-safe shard streaming

If _streamShards() or addon.activate() throws mid-load (for example a corrupted shard file or a native init failure), the partially-initialized addon is now best-effort-unloaded and this.addon is reset to null. A subsequent load() call starts cleanly instead of leaking a zombie native instance.

Restored JSDoc on FinetuneOptions

Every FinetuneOptions field carries a /** … */ doc comment again, including the default values (numberOfEpochs = 1, learningRate = 1e-4, batchSize = 128, …) so IDE tooltips show them without needing to read docs/finetuning.md.

Bug Fixes

unload() clears the addon reference

unload() now sets this.addon = null after await this.addon.unload(), so post-unload cancel() / pause() / run() calls hit the explicit guards rather than dereferencing a disposed native handle. pause(), cancel(), and the job-handler cancel closure all use optional chaining for the same reason.

Removed dead _isSuppressedNoResponseLog filter

The _createFilteredLogger infrastructure that wrapped the user-supplied logger to swallow 'No response found for job' warnings was tied to the old BaseInference _jobToResponse Map. The new architecture cannot emit that message at all, so the filter, the wrapped logger, and the _originalLogger indirection are all removed. The user-supplied logger is now used directly.

load() is serialized through the exclusive run queue

load() is now routed through the same exclusiveRunQueue used by run(), finetune(), and unload(). Previously two overlapping load() calls on the same instance could both pass the configLoaded guard before it flipped to true, both stream shards into and activate the native addon, and clobber this.addon — leaking one native handle. Concurrent load() on a single instance is now safe.
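The exclusive-queue mechanism can be sketched as a promise chain. This is an illustrative stand-in, not the @qvac/infer-base implementation: each task starts only after the previous one settles, so two overlapping load() calls cannot interleave.

```javascript
// Illustrative sketch (not @qvac/infer-base source): serialize async tasks by
// chaining them onto a single tail promise.
function createExclusiveRunQueue () {
  let tail = Promise.resolve()
  return function enqueue (task) {
    const result = tail.then(task, task) // run after the previous task settles, even on failure
    tail = result.catch(() => {})        // keep the chain alive if a task rejects
    return result
  }
}
```

Routing load(), run(), finetune(), and unload() through one such queue means a second load() cannot start until the first has either flipped configLoaded or failed.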

Constructor rejects non-absolute path entries

Each entry in files.model is now validated with path.isAbsolute() (matching the existing error-message contract), and the same check now applies to the optional files.projectionModel — previously it had no validation at all. Relative paths are rejected at construction time instead of bubbling up from bare-fs or the native load.

Pull Requests

  • #1494 - chore[bc]: LLM addon interface refactor — remove BaseInference and WeightsProvider

QVAC Embed Addon v0.14.0

19 Apr 09:20
df05614


This release migrates the embed addon off BaseInference inheritance and the WeightsProvider download layer onto the composable createJobHandler + exclusiveRunQueue utilities from @qvac/infer-base@^0.4.0. The constructor signature is replaced with a single object whose files.model field is an ordered array of absolute paths, mirroring the parallel LLM and diffusion addon refactors. This is a breaking change — every caller must update.

Breaking Changes

Constructor signature: single object with files, no Loader

GGMLBert now takes a single { files, config?, logger?, opts? } object. The old Loader + diskPath + modelName + two-arg (args, config) shape is gone — callers pre-resolve absolute paths and supply them as files.model.

// BEFORE (≤ 0.13.x)
const FilesystemDL = require('@qvac/dl-filesystem')
const loader = new FilesystemDL({ dirPath: '/models' })
const model = new GGMLBert({
  loader,
  modelName: 'bge-small-en-v1.5-q4_0.gguf',
  diskPath: '/models',
  logger: console,
  opts: { stats: true }
}, { device: 'gpu', batch_size: '512' })

// AFTER (0.14.0)
const model = new GGMLBert({
  files: {
    model: ['/models/bge-small-en-v1.5-q4_0.gguf']
  },
  config: { device: 'gpu', batch_size: '512' },
  logger: console,
  opts: { stats: true }
})

For sharded models the caller passes the full ordered list — the <basename>.tensors.txt companion first, followed by every <basename>-NNNNN-of-MMMMM.gguf shard in ascending order:

const model = new GGMLBert({
  files: {
    model: [
      '/models/big-embed-model.tensors.txt',
      '/models/big-embed-model-00001-of-00003.gguf',
      '/models/big-embed-model-00002-of-00003.gguf',
      '/models/big-embed-model-00003-of-00003.gguf'
    ]
  },
  config: { device: 'gpu' }
})

BaseInference inheritance and WeightsProvider removed

GGMLBert no longer extends BaseInference and no longer touches the WeightsProvider download layer. The class composes createJobHandler and exclusiveRunQueue from @qvac/infer-base@^0.4.0 directly. Public lifecycle methods (load / run / cancel / unload / getState) are unchanged in shape, but downloadWeights and the loader-based progress callbacks are gone — the caller is responsible for placing files on disk before constructing the model.

In-memory streaming from network sources (URLs, Hyperdrive), previously possible through the Loader abstraction, is no longer supported in the current API. The SDK does not currently use it (models are stored to disk first); support can be re-added if the SDK needs that capability in the future.

Dependency changes

  • @qvac/infer-base bumped from ^0.2.2 to ^0.4.0.
  • bare-fs is now a runtime dependency (used to stream shards from disk).
  • @qvac/dl-filesystem and @qvac/dl-hyperdrive are no longer used by this package and have been removed from devDependencies / peerDependencies.

getState() returns a narrower shape

getState() previously returned { configLoaded, weightsLoaded, destroyed } (the three-field shape inherited from BaseInference). It now returns { configLoaded } only. The weightsLoaded and destroyed fields are gone — weightsLoaded collapsed into configLoaded because the refactored load() does both in one step, and destroyed is no longer tracked since unload() resets configLoaded and nulls the addon handle instead. Callers reading state.weightsLoaded or state.destroyed must switch to state.configLoaded.

Public methods removed from GGMLBert

GGMLBert previously exposed these methods via BaseInference inheritance, all of which are now gone:

  • downloadWeights(onDownloadProgress, opts) — the download layer is removed; the caller places files on disk and passes absolute paths in files.model.
  • pause() / unpause() / stop() — BaseInference job-lifecycle helpers. The refactor uses createJobHandler directly; use cancel() to terminate an in-flight run.
  • status() — replaced by getState() for the static readiness flag; per-job state is observed via the QvacResponse returned by run().
  • destroy() — folded into unload(), which now both releases native resources and nulls this.addon.
  • getApiDefinition() — no longer exposed; consumers should import types from index.d.ts.

load() takes no arguments

load() previously forwarded ...args through BaseInference.load into embed's _load(closeLoader, reportProgressCallback). Both arguments are gone — closeLoader is meaningless without a Loader, and reportProgressCallback is superseded by the caller owning download-and-placement before construction. Call await model.load() with no arguments.

Type exports removed from index.d.ts

The following exports are no longer part of the package's public type surface because the loader/download layer they described is gone: ReportProgressCallback, Loader, GGMLArgs, DownloadWeightsOptions, DownloadResult. TypeScript consumers importing any of these must update to the new GGMLBertArgs / files shape.

BertInterface outputCb signature: jobId dropped

The exported BertInterface class's constructor still takes (binding, configurationParams, outputCb), but the outputCb signature changed:

// BEFORE
(addon: unknown, event: string, jobId: number, data: unknown, error?: Error) => void
// AFTER
(addon: unknown, event: string, data: unknown, error?: Error) => void

The jobId: number argument is gone because createJobHandler owns the single active job directly; the wrapper no longer needs a per-job identifier in the callback chain. External callers constructing BertInterface with a custom outputCb must drop the third argument.

BertInterface.runJob return type

BertInterface.runJob(input) previously returned Promise<void>. It now returns Promise<boolean> — true if the job was accepted, false if the addon was already busy. GGMLBert uses this return value to surface a busy error to the caller instead of silently dropping the job.

Features

Constructor input validation

The constructor now throws TypeError('files.model must be a non-empty array of absolute paths') when files or files.model is missing or empty. This produces a clear error for callers porting old code instead of a confusing Cannot read properties of undefined.

run()-before-load() guard

Calling run() before load() now throws Error('Addon not initialized. Call load() first.') instead of dereferencing null and crashing.

load() is now idempotent when already loaded

A second load() call on an already-loaded instance is now a silent no-op instead of unloading and reloading. This aligns with the ReadyResource pattern used elsewhere in QVAC and prevents accidental double-loads from triggering expensive work. Callers that intentionally want to swap weights must call unload() first (which clears configLoaded) and then load() again.
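The idempotency guard can be sketched with a minimal stand-in class. This is illustrative only, not GGMLBert itself: a configLoaded flag short-circuits repeated load() calls, and unload() clears it so an intentional reload still works.

```javascript
// Illustrative sketch (not GGMLBert): second load() is a silent no-op until
// unload() clears the configLoaded flag.
class AddonSketch {
  constructor () {
    this.configLoaded = false
    this.loadCount = 0 // counts how often the expensive work actually ran
  }
  async load () {
    if (this.configLoaded) return // already loaded: do nothing
    this.loadCount++              // expensive shard streaming would happen here
    this.configLoaded = true
  }
  async unload () {
    this.configLoaded = false
  }
}
```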

Crash-safe shard streaming

If _streamShards() or addon.activate() throws mid-load (for example a corrupted shard file or a native init failure), the partially-initialized addon is now best-effort-unloaded and this.addon is reset to null. A subsequent load() call starts cleanly instead of leaking a zombie native instance.

Bug Fixes

unload() clears the addon reference

unload() now sets this.addon = null after await this.addon.unload(), so post-unload cancel() / run() calls hit the explicit guards rather than dereferencing a disposed native handle. cancel() and the job-handler cancel closure both use optional chaining for the same reason.

Unknown addon events no longer pollute the output stream

_addonOutputCallback previously fed any non-stats / non-error event payload into response.output, including unknown events. It now logs unknown events at warn level (these indicate a native-layer change worth surfacing) and only forwards Embeddings payloads to the active response.

load() is serialized through the exclusive run queue

load() is now routed through the same exclusiveRunQueue used by run() and unload(). Previously two overlapping load() calls on the same instance could both pass the configLoaded guard before it flipped to true, both stream shards into and activate the native addon, and clobber this.addon — leaking one native handle. Concurrent load() on a single instance is now safe.

Constructor rejects non-absolute path entries

Each entry in files.model is now validated with path.isAbsolute() (matching the existing error-message contract). Relative paths are rejected at construction time instead of bubbling up from bare-fs or the native load.

Pull Requests

  • #1493 - chore[bc]: embed addon interface refactor — remove BaseInference and WeightsProvider

QVAC Stable Diffusion Addon v0.3.0

19 Apr 08:59
df05614


This release migrates the diffusion addon off BaseInference inheritance and onto the composable createJobHandler + exclusiveRunQueue utilities from @qvac/infer-base@^0.4.0. The constructor signature is replaced with a single object whose files field carries absolute paths for every model component, mirroring the parallel embed and LLM addon refactors. This is a breaking change — every caller must update.

Breaking Changes

Constructor signature: single object with files instead of (args, config)

ImgStableDiffusion now takes a single { files, config, logger?, opts? } object. The old diskPath + modelName + per-component filename pattern is gone — callers pass absolute paths directly via files. Companion model fields are renamed (clipLModel → clipL, clipGModel → clipG, t5XxlModel → t5Xxl, llmModel → llm, vaeModel → vae).

// BEFORE (≤ 0.2.x)
const model = new ImgStableDiffusion({
  diskPath: '/models',
  modelName: 'flux-2-klein-4b-Q8_0.gguf',
  llmModel: 'Qwen3-4B-Q4_K_M.gguf',
  vaeModel: 'flux2-vae.safetensors',
  logger: console
}, { threads: 8 })

// AFTER (0.3.0)
const model = new ImgStableDiffusion({
  files: {
    model: '/models/flux-2-klein-4b-Q8_0.gguf',
    llm:   '/models/Qwen3-4B-Q4_K_M.gguf',
    vae:   '/models/flux2-vae.safetensors'
  },
  config: { threads: 8 },
  logger: console,
  opts: { stats: true }
})

BaseInference inheritance removed

ImgStableDiffusion no longer extends BaseInference. The class composes createJobHandler and exclusiveRunQueue from @qvac/infer-base@^0.4.0 directly. The public lifecycle (load / run / cancel / unload / getState) is unchanged in shape; only construction differs. Internal helpers like _withExclusiveRun and _outputCallback are removed.

Caller owns absolute paths — addon no longer joins diskPath + filename

Callers that previously relied on the addon to resolve path.join(diskPath, filename) must now do that resolution themselves before constructing the model.

getState() returns a narrower shape

getState() previously returned { configLoaded, weightsLoaded, destroyed } (the three-field shape from BaseInference). It now returns { configLoaded } only. The weightsLoaded and destroyed fields are gone — weightsLoaded collapsed into configLoaded because the refactored load() does both in one step, and destroyed is no longer tracked since unload() resets configLoaded and nulls the addon handle instead. Callers reading state.weightsLoaded or state.destroyed must switch to state.configLoaded.

Public methods removed from ImgStableDiffusion

ImgStableDiffusion previously exposed these methods via BaseInference inheritance, all of which are now gone:

  • downloadWeights(onDownloadProgress, opts) — the diffusion addon never used the loader in practice, but the inherited method was still present on the public surface. It is removed along with the base class.
  • pause() / unpause() / stop() — BaseInference job-lifecycle helpers. The refactor uses createJobHandler directly; use cancel() to terminate an in-flight generation.
  • status() — replaced by getState() for the static readiness flag; per-job state is observed via the QvacResponse returned by run().
  • destroy() — folded into unload(), which now both releases native resources and nulls this.addon.
  • getApiDefinition() — no longer exposed; consumers should import types from index.d.ts.

cancel() no longer accepts a jobId

BaseInference.cancel(jobId) took an optional jobId argument. The refactor's cancel() is parameterless — there is always at most one active generation per instance, owned by createJobHandler. Any caller passing a jobId will have it ignored; update call sites to await model.cancel().

Features

Constructor input validation

The constructor now throws TypeError('files.model must be an absolute path string') when files.model is missing or not a string, or TypeError('files.model must be an absolute path (got: <value>)') when supplied as a relative path. This produces a clear error for callers porting old code instead of a confusing Cannot read properties of undefined. The same validation applies to optional companion fields (clipL, clipG, t5Xxl, llm, vae) when supplied.

run()-before-load() guard

Calling run() before load() now throws Error('Addon not initialized. Call load() first.') instead of crashing in native code. Covered by a new regression test in test/integration/api-behavior.test.js.

load() is now idempotent when already loaded

A second load() call on an already-loaded instance is now a silent no-op instead of unloading and reloading. This aligns with the ReadyResource pattern used elsewhere in QVAC and prevents accidental double-loads from triggering expensive work. Callers that intentionally want to swap weights must call unload() first (which clears configLoaded) and then load() again.

Broader split-layout detection

isSplitLayout now also triggers when only clipL or clipG is supplied. This closes a footgun where a FLUX.1 caller passing { model, clipL, clipG, vae } (without t5Xxl) would silently mis-route the diffusion model into the all-in-one path parameter and fail to load.

Bug Fixes

unload() clears the addon reference

unload() now sets this.addon = null after await this.addon.unload(), so post-unload cancel() / run() calls hit the explicit if (!this.addon) guard rather than dereferencing a disposed native handle.

Unknown addon events no longer pollute the output stream

_addonOutputCallback previously had a fallthrough that pushed any non-error / non-image / non-stats event into response.output (including null and undefined). It now logs unknown events at debug level and does not feed them into the active response.
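The filtering behavior can be sketched with a small hypothetical callback factory. This is illustrative, not the addon's actual _addonOutputCallback: only recognized event types are forwarded to the response; everything else is logged.

```javascript
// Illustrative sketch (hypothetical names, not addon source): forward only
// known event payloads; log unknown events instead of pushing them downstream.
function makeOutputCallback (response, logger) {
  return function onEvent (event, data, error) {
    if (error) {
      response.errors.push(error)
      return
    }
    if (event === 'image' || event === 'stats') {
      response.output.push({ event, data })
      return
    }
    // unknown event: surface it in logs, keep it out of the output stream
    logger.debug(`unknown addon event: ${event}`)
  }
}
```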

Crash-safe activation

If addon.activate() throws during _load() (for example a native init failure or a missing model file discovered late), the partially-initialized addon is now best-effort-unloaded, the native logger is released, and this.addon is reset to null. A subsequent load() call starts cleanly instead of leaking a zombie native instance.

load() is serialized through the exclusive run queue

load() is now routed through the same exclusiveRunQueue used by run() and unload(). Previously two overlapping load() calls on the same instance could both pass the configLoaded guard before it flipped to true, both allocate a native addon, and clobber this.addon — leaking one native handle. Concurrent load() on a single instance is now safe.

Pull Requests

  • #1496 - chore[bc]: diffusion addon interface refactor — remove BaseInference

QVAC Stable Diffusion Addon v0.2.0

15 Apr 18:02
8e4b6c4


Added

  • FLUX.2 img2img support with in-context conditioning (ref_images) via init_image parameter
  • JS-side input validation for readImageDimensions() with buffer-length guards for truncated PNG/JPEG
  • Regression tests for FLUX img2img prediction guard and truncated image handling

Changed

  • FLUX img2img now requires explicit prediction: 'flux2_flow' in config to prevent silent fallback to SDEdit
  • Updated prediction docstring to clarify auto-detection is insufficient for FLUX img2img
  • Exported readImageDimensions() for testing and external use

Fixed

  • readImageDimensions() now safely handles truncated/corrupt PNG and JPEG buffers

QVAC Stable Diffusion Addon v0.1.3

15 Apr 11:08
0b06ef0


Changed

  • README, index.d.ts, and index.js JSDoc no longer claim FLUX.1 support for clipLModel and t5XxlModel. The addon exposes SDXL, SD3, and FLUX.2-klein only — FLUX.1 was never wired through the JS layer. The example model name in the constructor JSDoc is also corrected to flux-2-klein-4b-Q8_0.gguf.

QVAC OCR Addon v0.4.2

14 Apr 08:37
8cb2102


Fixed

  • Updated README to use current package name (@qvac/ocr-onnx) and monorepo paths
  • Removed redundant ensure-npm-public job from on-merge workflow

QVAC OCR Addon v0.4.1

14 Apr 05:52
d641edc


Fixed

  • SIGABRT crash on process exit in OCR addon
  • Use HTTPS instead of SSH for vcpkg registry URLs

Changed

  • Updated OCR integration tests for createJobHandler migration
  • Removed hyperdrive references and dependencies
  • Renamed dl-hyperdrive and dl-filesystem package references
  • Migrated qvac-devops to oss-action

QVAC LLM Addon v0.14.4

14 Apr 10:11
02a57be


Changed

  • Updated qvac-fabric dependency from 7248.2.1 to 7248.2.3, which fixes OpenCL kernel cache support on Android.

Added

  • openclCacheDir option in LlamaConfig (index.d.ts): writable directory for OpenCL kernel binary cache, required on Android for fast GPU startup.
  • cache-type-k and cache-type-v options in LlamaConfig (index.d.ts): configure KV cache quantization types.