
react-native-sherpa-onnx

React Native SDK for sherpa-onnx – offline and streaming speech processing


⚠️ SDK 0.3.0 – Breaking changes from 0.2.0
Since the last release the SDK has been significantly restructured and improved: full iOS support, smoother behaviour, fewer failure points, and a much smaller footprint (~95% size reduction). As a result, the internal logic and the public API have changed. If you are upgrading from 0.2.x, please follow the Breaking changes (upgrading to 0.3.0) section and the updated API documentation.

A React Native TurboModule that provides offline and streaming speech processing capabilities using sherpa-onnx. The SDK aims to support all functionalities that sherpa-onnx offers, including offline and online (streaming) speech-to-text, text-to-speech (batch and streaming), speaker diarization, speech enhancement, source separation, and VAD (Voice Activity Detection).

Installation

npm install react-native-sherpa-onnx

If your project uses Yarn (v3+) or Plug'n'Play, configure Yarn to use the Node Modules linker to avoid postinstall issues:

# .yarnrc.yml
nodeLinker: node-modules

Alternatively, set the environment variable during install:

YARN_NODE_LINKER=node-modules yarn install

Android

No additional setup required. The library automatically handles native dependencies via Gradle. For execution provider support (CPU, NNAPI, XNNPACK, QNN) and optional QNN setup, see Execution provider support. For building Android native libs yourself, see sherpa-onnx-prebuilt.

iOS

The sherpa-onnx XCFramework is not shipped in the repo or npm (size ~80MB). It is downloaded automatically when you run pod install; no manual steps are required. The version used is pinned in third_party/sherpa-onnx-prebuilt/IOS_RELEASE_TAG (format: sherpa-onnx-ios-vX.Y.Z or sherpa-onnx-ios-vX.Y.Z-N with optional build number) and the archive is fetched from GitHub Releases.
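The pinned tag format described above can be checked with a simple pattern. This is an illustrative sketch only (the setup script's actual validation, if any, may differ):

```typescript
// Sketch: parse the documented iOS release tag format
// "sherpa-onnx-ios-vX.Y.Z" with an optional "-N" build number.
const IOS_TAG_PATTERN = /^sherpa-onnx-ios-v(\d+)\.(\d+)\.(\d+)(?:-(\d+))?$/;

function parseIosReleaseTag(tag: string): { version: string; build?: number } | null {
  const m = IOS_TAG_PATTERN.exec(tag.trim());
  if (!m) return null;
  return {
    version: `${m[1]}.${m[2]}.${m[3]}`,
    build: m[4] !== undefined ? Number(m[4]) : undefined,
  };
}

console.log(parseIosReleaseTag('sherpa-onnx-ios-v1.12.35'));   // version '1.12.35', no build number
console.log(parseIosReleaseTag('sherpa-onnx-ios-v1.12.35-2')); // version '1.12.35', build 2
```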

Setup

cd your-app/ios
bundle install
bundle exec pod install

The podspec runs scripts/setup-ios-framework.sh, which downloads the XCFramework (and, if needed, libarchive sources) so the Pod builds correctly. Libarchive is compiled from source as part of the Pod; its version is pinned in third_party/libarchive_prebuilt/IOS_RELEASE_TAG.

Building the iOS framework

To build the sherpa-onnx iOS XCFramework yourself (e.g. custom version or patches), see third_party/sherpa-onnx-prebuilt/README.md and the Framework - Sherpa-Onnx (iOS) Release workflow.

Model download (optional)

If you use the download manager to fetch models at runtime, add the following to your AppDelegate so background downloads can finish when the app is in the background or after it was terminated. Without it, downloads only work reliably while the app is in the foreground.

  • Swift (RN 0.77+): In your bridging header add #import <RNBackgroundDownloader.h>. In AppDelegate.swift, implement:
    func application(_ application: UIApplication, handleEventsForBackgroundURLSession identifier: String, completionHandler: @escaping () -> Void) {
      RNBackgroundDownloader.setCompletionHandlerWithIdentifier(identifier, completionHandler: completionHandler)
    }
  • Objective-C: In AppDelegate.m add #import <RNBackgroundDownloader.h> and the application:handleEventsForBackgroundURLSession:completionHandler: implementation that calls [RNBackgroundDownloader setCompletionHandlerWithIdentifier:identifier completionHandler:completionHandler].

Full step-by-step: Download manager – Setup (iOS & Android). Expo users can use the library’s config plugin to apply this automatically.

Android: Foreground service permissions (Play Console), visible download notifications, and POST_NOTIFICATIONS (API 33+) are covered in Download manager – Android: foreground service & notifications.

Bundled sherpa-onnx version

| Platform | Version |
| --- | --- |
| Android | 1.12.35 |
| iOS | 1.12.35 |

Feature Support

| Feature | Status | Docs | Notes |
| --- | --- | --- | --- |
| Offline Speech-to-Text | Supported | STT | No internet required; multiple model types (Zipformer, Paraformer, Whisper, Qwen3 ASR, Cohere Transcribe, etc.). See Supported Model Types. |
| Online (streaming) Speech-to-Text | Supported | Streaming STT | Real-time recognition from microphone or stream; partial results, endpoint detection. Use streaming-capable models (e.g. transducer, paraformer). |
| Live capture API | Supported | PCM live stream | Native microphone capture with resampling for live transcription (use with streaming STT). |
| Text-to-Speech | Supported | TTS | Multiple model types (VITS, Matcha, Kokoro, etc.). See Supported Model Types. |
| Streaming Text-to-Speech | Supported | Streaming TTS | Incremental speech generation for low time-to-first-byte and playback while generating. |
| TTS Alignment / Timestamps | Supported | TTS Alignment | Full implementation: fast (native chunk-based, estimated timing) and accurate (wav2vec2 CTC forced alignment, timingMode: 'aligned'). Optional alignment ONNX via react-native-sherpa-onnx/alignment (see TTS Alignment). Standalone API: generateSubtitlesFromAudio(). |
| Execution providers (CPU, NNAPI, XNNPACK, Core ML, QNN) | Supported | Execution providers | CPU default; optional accelerators per platform. |
| Play Asset Delivery (PAD) | Supported | Model setup | Android only. Archives: Extraction API. |
| Automatic Model type detection | Supported | Model detection | detectSttModel() and detectTtsModel() for a path. |
| Model quantization | Supported | Model setup | Automatic detection and preference for quantized (int8) models. |
| Flexible model loading | Supported | Model setup | Asset models, file system models, or auto-detection. |
| TypeScript | Supported | | Full type definitions included. |
| Speech Enhancement | Supported | Speech Enhancement | API and initialization covered in docs. |
| Speaker Diarization | ❌ Not yet supported | Diarization | Scheduled for release 0.5.0 |
| Source Separation | ❌ Not yet supported | Separation | Scheduled for release 0.6.0 |
| VAD (Voice Activity Detection) | ❌ Not yet supported | VAD | Scheduled for release 0.7.0 |

Platform Support Status

| Platform | Status | Notes |
| --- | --- | --- |
| Android | Production Ready | CI/CD automated, multiple models supported |
| iOS | Production Ready | CI/CD automated, multiple models supported |

Known issues

Supported Model Types

Speech-to-Text (STT) Models

| Model Type | modelType Value | Description | Download Links |
| --- | --- | --- | --- |
| Auto Detect | 'auto' | Automatically detects model layout/type from files in the model folder and picks the best supported STT type. | n/a |
| Zipformer/Transducer | 'transducer' | Encoder–decoder–joiner (e.g. icefall). Good balance of speed and accuracy. Folder name should contain zipformer or transducer for auto-detection. | Download |
| LSTM Transducer | 'transducer' | Same layout as Zipformer (encoder–decoder–joiner). LSTM-based streaming ASR; detected as transducer. Folder name may contain lstm. | Download |
| Paraformer | 'paraformer' | Single-model non-autoregressive ASR; fast and accurate. Detected by model.onnx; no folder token required. | Download |
| NeMo CTC | 'nemo_ctc' | NeMo CTC; good for English and streaming. Folder name should contain nemo or parakeet. | Download |
| Whisper | 'whisper' | Multilingual, encoder–decoder; strong zero-shot. Detected by encoder+decoder (no joiner); folder token optional. | Download |
| WeNet CTC | 'wenet_ctc' | CTC from WeNet; compact. Folder name should contain wenet. | Download |
| SenseVoice | 'sense_voice' | Multilingual with emotion/punctuation. Folder name should contain sense or sensevoice. | Download |
| FunASR Nano | 'funasr_nano' | Lightweight LLM-based ASR. Folder name should contain funasr or funasr-nano. | Download |
| Qwen3 ASR | 'qwen3_asr' | Encoder–decoder ASR (Qwen3-ASR ONNX: conv frontend, encoder, decoder, tokenizer). Folder name should contain qwen3. Optional modelOptions.qwen3Asr (e.g. comma-separated hotwords). | Download |
| Cohere Transcribe | 'cohere_transcribe' | Cohere Transcribe ONNX (encoder, decoder, tokens.txt). Folder name should contain cohere. Optional modelOptions.cohereTranscribe (language, punctuation, ITN). | Download |
| Moonshine (v1) | 'moonshine' | Four-part streaming-capable ASR (preprocess, encode, uncached/cached decode). Folder name should contain moonshine. | Download |
| Moonshine (v2) | 'moonshine_v2' | Two-part Moonshine (encoder + merged decoder); .onnx or .ort. Folder name should contain moonshine (v2 preferred if both layouts present). | Download |
| Fire Red ASR | 'fire_red_asr' | Fire Red encoder–decoder ASR. Folder name should contain fire_red or fire-red. | Download |
| Dolphin | 'dolphin' | Single-model CTC. Folder name should contain dolphin. | Download |
| Canary | 'canary' | NeMo Canary multilingual. Folder name should contain canary. | Download |
| Omnilingual | 'omnilingual' | Omnilingual CTC. Folder name should contain omnilingual. | Download |
| MedASR | 'medasr' | Medical ASR CTC. Folder name should contain medasr. | Download |
| Telespeech CTC | 'telespeech_ctc' | Telespeech CTC. Folder name should contain telespeech. | Download |
| Tone CTC (t-one) | 'tone_ctc' | Lightweight streaming CTC (e.g. t-one). Folder name should contain t-one, t_one, or tone (as word). | Download |
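The folder-name tokens above drive auto-detection. As a simplified, self-contained sketch of that idea (this is not the SDK's detectSttModel() implementation, which also inspects the files inside the folder; the token list here is an illustrative subset):

```typescript
// Map folder-name tokens to STT model types, mirroring a subset of the
// detection rules in the table above. Illustration only.
const FOLDER_TOKENS: Array<[RegExp, string]> = [
  [/zipformer|transducer|lstm/, 'transducer'],
  [/nemo|parakeet/, 'nemo_ctc'],
  [/wenet/, 'wenet_ctc'],
  [/sense/, 'sense_voice'],
  [/qwen3/, 'qwen3_asr'],
  [/cohere/, 'cohere_transcribe'],
  [/moonshine/, 'moonshine'],
  [/fire[-_]red/, 'fire_red_asr'],
  [/dolphin/, 'dolphin'],
];

function guessSttTypeFromFolderName(folder: string): string | null {
  const name = folder.toLowerCase();
  for (const [token, type] of FOLDER_TOKENS) {
    if (token.test(name)) return type;
  }
  // No token matched: a real detector would fall back to file-layout
  // detection (e.g. a single model.onnx suggests paraformer).
  return null;
}

console.log(guessSttTypeFromFolderName('sherpa-onnx-zipformer-en')); // transducer
```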

For real-time (streaming) recognition from a microphone or audio stream, use streaming-capable model types: transducer, paraformer, zipformer2_ctc, nemo_ctc, or tone_ctc. See Streaming (Online) Speech-to-Text.
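A trivial guard based on the list above can catch an unsupported model type before initializing a streaming session. This is a sketch; the SDK may expose its own capability check:

```typescript
// Model types the README lists as streaming-capable.
const STREAMING_CAPABLE = new Set([
  'transducer', 'paraformer', 'zipformer2_ctc', 'nemo_ctc', 'tone_ctc',
]);

function assertStreamingCapable(modelType: string): void {
  if (!STREAMING_CAPABLE.has(modelType)) {
    throw new Error(`Model type '${modelType}' does not support streaming STT`);
  }
}

assertStreamingCapable('transducer'); // ok
// assertStreamingCapable('whisper'); // would throw: offline-only model type
```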

Text-to-Speech (TTS) Models

| Model Type | modelType Value | Description | Download Links |
| --- | --- | --- | --- |
| Auto Detect | 'auto' | Automatically detects the TTS model layout from files in the model folder and selects the matching supported type. | n/a |
| VITS | 'vits' | Fast, high-quality TTS (Piper, Coqui, MeloTTS, MMS). Folder name should contain vits if used with other voice models. | Download |
| Matcha | 'matcha' | High-quality acoustic model + vocoder. Detected by acoustic_model + vocoder; no folder token required. | Download |
| Kokoro | 'kokoro' | Multi-speaker, multi-language. Folder name should contain kokoro (not kitten) for auto-detection. | Download |
| KittenTTS | 'kitten' | Lightweight, multi-speaker. Folder name should contain kitten (not kokoro) for auto-detection. | Download |
| Zipvoice | 'zipvoice' | Standard TTS with sid. Voice cloning (reference audio + referenceText): batch via generateSpeech only; streaming TTS does not support reference audio for Zipvoice. Default numSteps when omitted is 5 on Android and iOS (matches sherpa-onnx GenerationConfig / Kotlin helper). Cloning is supported on Android & iOS. Encoder + decoder + vocoder. | Download |
| Pocket | 'pocket' | Flow-matching TTS. Voice cloning on Android: batch and streaming TTS. iOS: cloning is experimental. Detected by lm_flow, lm_main, text_conditioner, vocab/token_scores. | Download |
| Supertonic | 'supertonic' | Lightning-fast, on-device text-to-speech designed for extreme performance with minimal computational overhead. | Download |

For streaming TTS (incremental generation, low latency), use createStreamingTTS() with supported model types. See Streaming Text-to-Speech.
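The typical consumption pattern for streaming TTS is a callback that receives audio chunks as they are generated. The interface below is an assumption for illustration (see the Streaming TTS docs for the real createStreamingTTS() API); a stub stands in for the native module so the sketch is runnable:

```typescript
// Assumed shapes for illustration only; not the SDK's actual types.
interface AudioChunk { samples: Float32Array; isLast: boolean }
interface StreamingTts {
  generate(text: string, onChunk: (c: AudioChunk) => void): Promise<void>;
}

// Accumulate chunks as they arrive; a real app would feed each
// chunk.samples buffer to an audio player for immediate playback.
async function speak(tts: StreamingTts, text: string): Promise<number> {
  let totalSamples = 0;
  await tts.generate(text, (chunk) => {
    totalSamples += chunk.samples.length;
  });
  return totalSamples;
}

// Stub implementation so the sketch runs without the native module.
const stub: StreamingTts = {
  async generate(_text, onChunk) {
    onChunk({ samples: new Float32Array(160), isLast: false });
    onChunk({ samples: new Float32Array(80), isLast: true });
  },
};

speak(stub, 'hello').then((n) => console.log(n, 'samples generated')); // 240 samples
```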

Speech Enhancement Models

Speech enhancement improves noisy or degraded speech using ONNX models from the sherpa-onnx speech-enhancement-models release. Detection looks for .onnx filenames containing gtcrn or dpdfnet (case-insensitive). With 'auto', GTCRN is preferred when both are present in the same folder.

| Model Type | modelType Value | Description | Download Links |
| --- | --- | --- | --- |
| Auto Detect | 'auto' | Picks GTCRN if a matching .onnx exists, otherwise DPDFNet if found. | n/a |
| GTCRN | 'gtcrn' | Lightweight speech enhancement (e.g. gtcrn_simple.onnx). | Download |
| DPDFNet | 'dpdfnet' | Deep speech enhancement variants (e.g. dpdfnet2.onnx, dpdfnet4.onnx, dpdfnet8.onnx, dpdfnet_baseline.onnx, dpdfnet2_48khz_hr.onnx). | Download |

APIs, batch vs online processing, and initialization are covered in Speech Enhancement.
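The auto-detection rule described above (match .onnx filenames case-insensitively, prefer GTCRN when both are present) can be sketched as a small pure function. Illustration only, not the SDK's code:

```typescript
// Mirror of the documented detection rule for speech enhancement models.
function detectEnhancementType(files: string[]): 'gtcrn' | 'dpdfnet' | null {
  const onnx = files.map((f) => f.toLowerCase()).filter((f) => f.endsWith('.onnx'));
  if (onnx.some((f) => f.includes('gtcrn'))) return 'gtcrn';   // preferred when both exist
  if (onnx.some((f) => f.includes('dpdfnet'))) return 'dpdfnet';
  return null;
}

console.log(detectEnhancementType(['gtcrn_simple.onnx', 'dpdfnet2.onnx'])); // gtcrn
```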

Documentation

Note: For when to use listAssetModels() vs listModelsAtPath() and how to combine bundled and PAD/file-based models, see Model Setup.
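One common pattern is to merge bundled asset models with PAD/file-based models into a single picker list, letting file-based entries override bundled ones of the same name. listAssetModels() and listModelsAtPath() are named in the docs, but the return shape used below is an assumption for illustration:

```typescript
// Assumed shape for illustration; see Model Setup for the real types.
interface ModelInfo { name: string; source: 'asset' | 'file' }

// Deduplicate by model name; file-based entries win on a name clash.
function mergeModelLists(assets: ModelInfo[], files: ModelInfo[]): ModelInfo[] {
  const seen = new Set<string>();
  const merged: ModelInfo[] = [];
  for (const m of [...files, ...assets]) {
    if (!seen.has(m.name)) {
      seen.add(m.name);
      merged.push(m);
    }
  }
  return merged;
}
```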

Requirements

  • React Native >= 0.70
  • Android API 24+ (Android 7.0+)
  • iOS 13.0+

Example Apps

We provide example applications to help you get started with react-native-sherpa-onnx:

Example App (Audio to Text)

The example app included in this repository demonstrates audio-to-text transcription, text-to-speech, and streaming features. It includes:

  • Multiple model type support (Zipformer, Paraformer, NeMo CTC, Whisper, WeNet CTC, SenseVoice, FunASR Nano, Qwen3 ASR, Cohere Transcribe, Moonshine, and more)
  • Model selection and configuration
  • Offline audio file transcription
  • Online (streaming) STT – live transcription from the microphone with partial results
  • Streaming TTS – incremental speech generation and playback
  • Generate timestamp – subtitle/timestamp generation from audio (fast / accurate with optional alignment model download)
  • Test audio files for different languages

Getting started:

cd example
yarn install
yarn android  # or yarn ios
Screenshots: model selection home screen, English and Cantonese audio transcription, text-to-speech generation.

Video to Text Comparison App

A comprehensive comparison app that demonstrates video-to-text transcription using react-native-sherpa-onnx alongside other speech-to-text solutions:

Repository: mobile-videototext-comparison

Features:

  • Video to audio conversion (using native APIs)
  • Audio to text transcription
  • Video to text (video --> WAV --> text)
  • Comparison between different STT providers
  • Performance benchmarking

This app showcases how to integrate react-native-sherpa-onnx into a real-world application that processes video files and converts them to text.

Screenshots: model overview, file picker, test audio.

Contributing

License

MIT

Third-Party Libraries

This SDK includes several open source components. Full license texts are available in the THIRD_PARTY_LICENSES directory.

LGPL Notice

This SDK includes LGPL-licensed components such as FFmpeg and Shine.
Applications using this SDK must ensure compliance with LGPL requirements when distributing binaries.

FFmpeg source code can be obtained at: https://ffmpeg.org

Qualcomm QNN Support

This SDK supports optional integration with Qualcomm AI Runtime (QNN).

QNN is proprietary software provided by Qualcomm and is not included in this SDK.
To use QNN acceleration, users must obtain and include the required QNN libraries separately and comply with Qualcomm's license terms:

https://softwarecenter.qualcomm.com/

Responsibility

By using this SDK, you are responsible for complying with all third-party licenses included in this project.


Made with create-react-native-library
