
react-native-sherpa-onnx

React Native SDK for sherpa-onnx – offline and streaming speech processing


⚠️ SDK 0.3.0 – Breaking changes from 0.2.0
Since the last release the SDK has been significantly restructured and improved: full iOS support, smoother behaviour, fewer failure points, and a much smaller footprint (~95% size reduction). As a result, the internal logic and the public API have changed. If you are upgrading from 0.2.x, please follow the Breaking changes (upgrading to 0.3.0) section and the updated API documentation.

A React Native TurboModule that provides offline and streaming speech processing capabilities using sherpa-onnx. The SDK aims to support all functionalities that sherpa-onnx offers, including offline and online (streaming) speech-to-text, text-to-speech (batch and streaming), speaker diarization, speech enhancement, source separation, and VAD (Voice Activity Detection).

Installation

npm install react-native-sherpa-onnx

If your project uses Yarn (v3+) or Plug'n'Play, configure Yarn to use the Node Modules linker to avoid postinstall issues:

# .yarnrc.yml
nodeLinker: node-modules

Alternatively, set the environment variable during install:

YARN_NODE_LINKER=node-modules yarn install

Android

No additional setup required. The library automatically handles native dependencies via Gradle. For execution provider support (CPU, NNAPI, XNNPACK, QNN) and optional QNN setup, see Execution provider support. For building Android native libs yourself, see sherpa-onnx-prebuilt.

iOS

The sherpa-onnx XCFramework is not shipped in the repo or npm (size ~80MB). It is downloaded automatically when you run pod install; no manual steps are required. The version used is pinned in third_party/sherpa-onnx-prebuilt/IOS_RELEASE_TAG (format: sherpa-onnx-ios-vX.Y.Z or sherpa-onnx-ios-vX.Y.Z-N with optional build number) and the archive is fetched from GitHub Releases.
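The pinned tag format described above can be checked with a simple pattern. This is an illustrative sketch only (the setup script's actual validation, if any, may differ):

```typescript
// Sketch: parse the documented iOS release tag format
// "sherpa-onnx-ios-vX.Y.Z" with an optional "-N" build number.
const IOS_TAG_PATTERN = /^sherpa-onnx-ios-v(\d+)\.(\d+)\.(\d+)(?:-(\d+))?$/;

function parseIosReleaseTag(tag: string): { version: string; build?: number } | null {
  const m = IOS_TAG_PATTERN.exec(tag.trim());
  if (!m) return null;
  return {
    version: `${m[1]}.${m[2]}.${m[3]}`,
    build: m[4] !== undefined ? Number(m[4]) : undefined,
  };
}

console.log(parseIosReleaseTag('sherpa-onnx-ios-v1.12.35'));   // version '1.12.35', no build number
console.log(parseIosReleaseTag('sherpa-onnx-ios-v1.12.35-2')); // version '1.12.35', build 2
```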

Setup

cd your-app/ios
bundle install
bundle exec pod install

The podspec runs scripts/setup-ios-framework.sh, which downloads the XCFramework (and, if needed, libarchive sources) so the Pod builds correctly. Libarchive is compiled from source as part of the Pod; its version is pinned in third_party/libarchive_prebuilt/IOS_RELEASE_TAG.

Building the iOS framework

To build the sherpa-onnx iOS XCFramework yourself (e.g. custom version or patches), see third_party/sherpa-onnx-prebuilt/README.md and the Framework - Sherpa-Onnx (iOS) Release workflow.

Model download (optional)

If you use the download manager to fetch models at runtime, add the following to your AppDelegate so background downloads can finish when the app is in the background or after it was terminated. Without it, downloads only work reliably while the app is in the foreground.

  • Swift (RN 0.77+): In your bridging header add #import <RNBackgroundDownloader.h>. In AppDelegate.swift, implement:
    func application(_ application: UIApplication, handleEventsForBackgroundURLSession identifier: String, completionHandler: @escaping () -> Void) {
      RNBackgroundDownloader.setCompletionHandlerWithIdentifier(identifier, completionHandler: completionHandler)
    }
  • Objective-C: In AppDelegate.m add #import <RNBackgroundDownloader.h> and the application:handleEventsForBackgroundURLSession:completionHandler: implementation that calls [RNBackgroundDownloader setCompletionHandlerWithIdentifier:identifier completionHandler:completionHandler].

Full step-by-step: Download manager – Setup (iOS & Android). Expo users can use the library’s config plugin to apply this automatically.

Android: Foreground service permissions (Play Console), visible download notifications, and POST_NOTIFICATIONS (API 33+) are covered in Download manager – Android: foreground service & notifications.

Bundled sherpa-onnx version

| Platform | Version |
| --- | --- |
| Android | 1.12.35 |
| iOS | 1.12.35 |

Feature Support

| Feature | Status | Docs | Notes |
| --- | --- | --- | --- |
| Offline Speech-to-Text | Supported | STT | No internet required; multiple model types (Zipformer, Paraformer, Whisper, Qwen3 ASR, Cohere Transcribe, etc.). See Supported Model Types. |
| Online (streaming) Speech-to-Text | Supported | Streaming STT | Real-time recognition from microphone or stream; partial results, endpoint detection. Use streaming-capable models (e.g. transducer, paraformer). |
| Live capture API | Supported | PCM live stream | Native microphone capture with resampling for live transcription (use with streaming STT). |
| Text-to-Speech | Supported | TTS | Multiple model types (VITS, Matcha, Kokoro, etc.). See Supported Model Types. |
| Streaming Text-to-Speech | Supported | Streaming TTS | Incremental speech generation for low time-to-first-byte and playback while generating. |
| TTS Alignment / Timestamps | Supported | TTS Alignment | Full implementation: fast (native chunk-based, estimated timing) and accurate (wav2vec2 CTC forced alignment, timingMode: 'aligned'). Optional alignment ONNX via react-native-sherpa-onnx/alignment (see TTS Alignment). Standalone API: generateSubtitlesFromAudio(). |
| Execution providers (CPU, NNAPI, XNNPACK, Core ML, QNN) | Supported | Execution providers | CPU default; optional accelerators per platform. |
| Play Asset Delivery (PAD) | Supported | Model setup | Android only. Archives: Extraction API. |
| Automatic Model type detection | Supported | Model detection | detectSttModel() and detectTtsModel() for a path. |
| Model quantization | Supported | Model setup | Automatic detection and preference for quantized (int8) models. |
| Flexible model loading | Supported | Model setup | Asset models, file system models, or auto-detection. |
| TypeScript | Supported | | Full type definitions included. |
| Speech Enhancement | Supported | Speech Enhancement | API and initialization covered in docs. |
| Speaker Diarization | ❌ Not yet supported | Diarization | Scheduled for release 0.5.0 |
| Source Separation | ❌ Not yet supported | Separation | Scheduled for release 0.6.0 |
| VAD (Voice Activity Detection) | ❌ Not yet supported | VAD | Scheduled for release 0.7.0 |

Platform Support Status

| Platform | Status | Notes |
| --- | --- | --- |
| Android | Production Ready | CI/CD automated, multiple models supported |
| iOS | Production Ready | CI/CD automated, multiple models supported |

Known issues

Supported Model Types

Speech-to-Text (STT) Models

| Model Type | modelType Value | Description | Download Links |
| --- | --- | --- | --- |
| Auto Detect | 'auto' | Automatically detects model layout/type from files in the model folder and picks the best supported STT type. | n/a |
| Zipformer/Transducer | 'transducer' | Encoder–decoder–joiner (e.g. icefall). Good balance of speed and accuracy. Folder name should contain zipformer or transducer for auto-detection. | Download |
| LSTM Transducer | 'transducer' | Same layout as Zipformer (encoder–decoder–joiner). LSTM-based streaming ASR; detected as transducer. Folder name may contain lstm. | Download |
| Paraformer | 'paraformer' | Single-model non-autoregressive ASR; fast and accurate. Detected by model.onnx; no folder token required. | Download |
| NeMo CTC | 'nemo_ctc' | NeMo CTC; good for English and streaming. Folder name should contain nemo or parakeet. | Download |
| Whisper | 'whisper' | Multilingual, encoder–decoder; strong zero-shot. Detected by encoder+decoder (no joiner); folder token optional. | Download |
| WeNet CTC | 'wenet_ctc' | CTC from WeNet; compact. Folder name should contain wenet. | Download |
| SenseVoice | 'sense_voice' | Multilingual with emotion/punctuation. Folder name should contain sense or sensevoice. | Download |
| FunASR Nano | 'funasr_nano' | Lightweight LLM-based ASR. Folder name should contain funasr or funasr-nano. | Download |
| Qwen3 ASR | 'qwen3_asr' | Encoder–decoder ASR (Qwen3-ASR ONNX: conv frontend, encoder, decoder, tokenizer). Folder name should contain qwen3. Optional modelOptions.qwen3Asr (e.g. comma-separated hotwords). | Download |
| Cohere Transcribe | 'cohere_transcribe' | Cohere Transcribe ONNX (encoder, decoder, tokens.txt). Folder name should contain cohere. Optional modelOptions.cohereTranscribe (language, punctuation, ITN). | Download |
| Moonshine (v1) | 'moonshine' | Four-part streaming-capable ASR (preprocess, encode, uncached/cached decode). Folder name should contain moonshine. | Download |
| Moonshine (v2) | 'moonshine_v2' | Two-part Moonshine (encoder + merged decoder); .onnx or .ort. Folder name should contain moonshine (v2 preferred if both layouts present). | Download |
| Fire Red ASR | 'fire_red_asr' | Fire Red encoder–decoder ASR. Folder name should contain fire_red or fire-red. | Download |
| Dolphin | 'dolphin' | Single-model CTC. Folder name should contain dolphin. | Download |
| Canary | 'canary' | NeMo Canary multilingual. Folder name should contain canary. | Download |
| Omnilingual | 'omnilingual' | Omnilingual CTC. Folder name should contain omnilingual. | Download |
| MedASR | 'medasr' | Medical ASR CTC. Folder name should contain medasr. | Download |
| Telespeech CTC | 'telespeech_ctc' | Telespeech CTC. Folder name should contain telespeech. | Download |
| Tone CTC (t-one) | 'tone_ctc' | Lightweight streaming CTC (e.g. t-one). Folder name should contain t-one, t_one, or tone (as word). | Download |
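The folder-name tokens above drive auto-detection. As a simplified, self-contained sketch of that idea (this is not the SDK's detectSttModel() implementation, which also inspects the files inside the folder; the token list here is an illustrative subset):

```typescript
// Map folder-name tokens to STT model types, mirroring a subset of the
// detection rules in the table above. Illustration only.
const FOLDER_TOKENS: Array<[RegExp, string]> = [
  [/zipformer|transducer|lstm/, 'transducer'],
  [/nemo|parakeet/, 'nemo_ctc'],
  [/wenet/, 'wenet_ctc'],
  [/sense/, 'sense_voice'],
  [/qwen3/, 'qwen3_asr'],
  [/cohere/, 'cohere_transcribe'],
  [/moonshine/, 'moonshine'],
  [/fire[-_]red/, 'fire_red_asr'],
  [/dolphin/, 'dolphin'],
];

function guessSttTypeFromFolderName(folder: string): string | null {
  const name = folder.toLowerCase();
  for (const [token, type] of FOLDER_TOKENS) {
    if (token.test(name)) return type;
  }
  // No token matched: a real detector would fall back to file-layout
  // detection (e.g. a single model.onnx suggests paraformer).
  return null;
}

console.log(guessSttTypeFromFolderName('sherpa-onnx-zipformer-en')); // transducer
```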

For real-time (streaming) recognition from a microphone or audio stream, use streaming-capable model types: transducer, paraformer, zipformer2_ctc, nemo_ctc, or tone_ctc. See Streaming (Online) Speech-to-Text.
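A trivial guard based on the list above can catch an unsupported model type before initializing a streaming session. This is a sketch; the SDK may expose its own capability check:

```typescript
// Model types the README lists as streaming-capable.
const STREAMING_CAPABLE = new Set([
  'transducer', 'paraformer', 'zipformer2_ctc', 'nemo_ctc', 'tone_ctc',
]);

function assertStreamingCapable(modelType: string): void {
  if (!STREAMING_CAPABLE.has(modelType)) {
    throw new Error(`Model type '${modelType}' does not support streaming STT`);
  }
}

assertStreamingCapable('transducer'); // ok
// assertStreamingCapable('whisper'); // would throw: offline-only model type
```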

Text-to-Speech (TTS) Models

| Model Type | modelType Value | Description | Download Links |
| --- | --- | --- | --- |
| Auto Detect | 'auto' | Automatically detects the TTS model layout from files in the model folder and selects the matching supported type. | n/a |
| VITS | 'vits' | Fast, high-quality TTS (Piper, Coqui, MeloTTS, MMS). Folder name should contain vits if used with other voice models. | Download |
| Matcha | 'matcha' | High-quality acoustic model + vocoder. Detected by acoustic_model + vocoder; no folder token required. | Download |
| Kokoro | 'kokoro' | Multi-speaker, multi-language. Folder name should contain kokoro (not kitten) for auto-detection. | Download |
| KittenTTS | 'kitten' | Lightweight, multi-speaker. Folder name should contain kitten (not kokoro) for auto-detection. | Download |
| Zipvoice | 'zipvoice' | Standard TTS with sid. Voice cloning (reference audio + referenceText): batch via generateSpeech only; streaming TTS does not support reference audio for Zipvoice. Default numSteps when omitted is 5 on Android and iOS (matches sherpa-onnx GenerationConfig / Kotlin helper). Cloning is supported on Android & iOS. Encoder + decoder + vocoder. | Download |
| Pocket | 'pocket' | Flow-matching TTS. Voice cloning on Android: batch and streaming TTS. iOS: cloning is experimental. Detected by lm_flow, lm_main, text_conditioner, vocab/token_scores. | Download |
| Supertonic | 'supertonic' | Lightning-fast, on-device text-to-speech designed for extreme performance with minimal computational overhead. | Download |

For streaming TTS (incremental generation, low latency), use createStreamingTTS() with supported model types. See Streaming Text-to-Speech.
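The typical consumption pattern for streaming TTS is a callback that receives audio chunks as they are generated. The interface below is an assumption for illustration (see the Streaming TTS docs for the real createStreamingTTS() API); a stub stands in for the native module so the sketch is runnable:

```typescript
// Assumed shapes for illustration only; not the SDK's actual types.
interface AudioChunk { samples: Float32Array; isLast: boolean }
interface StreamingTts {
  generate(text: string, onChunk: (c: AudioChunk) => void): Promise<void>;
}

// Accumulate chunks as they arrive; a real app would feed each
// chunk.samples buffer to an audio player for immediate playback.
async function speak(tts: StreamingTts, text: string): Promise<number> {
  let totalSamples = 0;
  await tts.generate(text, (chunk) => {
    totalSamples += chunk.samples.length;
  });
  return totalSamples;
}

// Stub implementation so the sketch runs without the native module.
const stub: StreamingTts = {
  async generate(_text, onChunk) {
    onChunk({ samples: new Float32Array(160), isLast: false });
    onChunk({ samples: new Float32Array(80), isLast: true });
  },
};

speak(stub, 'hello').then((n) => console.log(n, 'samples generated')); // 240 samples
```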

Speech Enhancement Models

Speech enhancement improves noisy or degraded speech using ONNX models from the sherpa-onnx speech-enhancement-models release. Detection looks for .onnx filenames containing gtcrn or dpdfnet (case-insensitive). With 'auto', GTCRN is preferred when both are present in the same folder.

| Model Type | modelType Value | Description | Download Links |
| --- | --- | --- | --- |
| Auto Detect | 'auto' | Picks GTCRN if a matching .onnx exists, otherwise DPDFNet if found. | n/a |
| GTCRN | 'gtcrn' | Lightweight speech enhancement (e.g. gtcrn_simple.onnx). | Download |
| DPDFNet | 'dpdfnet' | Deep speech enhancement variants (e.g. dpdfnet2.onnx, dpdfnet4.onnx, dpdfnet8.onnx, dpdfnet_baseline.onnx, dpdfnet2_48khz_hr.onnx). | Download |

APIs, batch vs online processing, and initialization are covered in Speech Enhancement.
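The auto-detection rule described above (match .onnx filenames case-insensitively, prefer GTCRN when both are present) can be sketched as a small pure function. Illustration only, not the SDK's code:

```typescript
// Mirror of the documented detection rule for speech enhancement models.
function detectEnhancementType(files: string[]): 'gtcrn' | 'dpdfnet' | null {
  const onnx = files.map((f) => f.toLowerCase()).filter((f) => f.endsWith('.onnx'));
  if (onnx.some((f) => f.includes('gtcrn'))) return 'gtcrn';   // preferred when both exist
  if (onnx.some((f) => f.includes('dpdfnet'))) return 'dpdfnet';
  return null;
}

console.log(detectEnhancementType(['gtcrn_simple.onnx', 'dpdfnet2.onnx'])); // gtcrn
```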

Documentation

Note: For when to use listAssetModels() vs listModelsAtPath() and how to combine bundled and PAD/file-based models, see Model Setup.
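One common pattern is to merge bundled asset models with PAD/file-based models into a single picker list, letting file-based entries override bundled ones of the same name. listAssetModels() and listModelsAtPath() are named in the docs, but the return shape used below is an assumption for illustration:

```typescript
// Assumed shape for illustration; see Model Setup for the real types.
interface ModelInfo { name: string; source: 'asset' | 'file' }

// Deduplicate by model name; file-based entries win on a name clash.
function mergeModelLists(assets: ModelInfo[], files: ModelInfo[]): ModelInfo[] {
  const seen = new Set<string>();
  const merged: ModelInfo[] = [];
  for (const m of [...files, ...assets]) {
    if (!seen.has(m.name)) {
      seen.add(m.name);
      merged.push(m);
    }
  }
  return merged;
}
```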

Requirements

  • React Native >= 0.70
  • Android API 24+ (Android 7.0+)
  • iOS 13.0+

Example Apps

We provide example applications to help you get started with react-native-sherpa-onnx:

Example App (Audio to Text)

The example app included in this repository demonstrates audio-to-text transcription, text-to-speech, and streaming features. It includes:

  • Multiple model type support (Zipformer, Paraformer, NeMo CTC, Whisper, WeNet CTC, SenseVoice, FunASR Nano, Qwen3 ASR, Cohere Transcribe, Moonshine, and more)
  • Model selection and configuration
  • Offline audio file transcription
  • Online (streaming) STT – live transcription from the microphone with partial results
  • Streaming TTS – incremental speech generation and playback
  • Generate timestamp – subtitle/timestamp generation from audio (fast / accurate with optional alignment model download)
  • Test audio files for different languages

Getting started:

cd example
yarn install
yarn android  # or yarn ios
Screenshots: model selection home screen, English and Cantonese audio transcription, text-to-speech generation.

Video to Text Comparison App

A comprehensive comparison app that demonstrates video-to-text transcription using react-native-sherpa-onnx alongside other speech-to-text solutions:

Repository: mobile-videototext-comparison

Features:

  • Video to audio conversion (using native APIs)
  • Audio to text transcription
  • Video to text (video --> WAV --> text)
  • Comparison between different STT providers
  • Performance benchmarking

This app showcases how to integrate react-native-sherpa-onnx into a real-world application that processes video files and converts them to text.

Screenshots: model overview, file picker, test audio.

Contributing

License

MIT

Third-Party Libraries

This SDK includes several open source components. Full license texts are available in the THIRD_PARTY_LICENSES directory.

LGPL Notice

This SDK includes LGPL-licensed components such as FFmpeg and Shine.
Applications using this SDK must ensure compliance with LGPL requirements when distributing binaries.

FFmpeg source code can be obtained at: https://ffmpeg.org

Qualcomm QNN Support

This SDK supports optional integration with Qualcomm AI Runtime (QNN).

QNN is proprietary software provided by Qualcomm and is not included in this SDK.
To use QNN acceleration, users must obtain and include the required QNN libraries separately and comply with Qualcomm's license terms:

https://softwarecenter.qualcomm.com/

Responsibility

By using this SDK, you are responsible for complying with all third-party licenses included in this project.


Made with create-react-native-library
