Open
Labels
enhancement: New feature or request
Description
Support live/streaming transcription that outputs text in real-time as the user speaks, rather than waiting until recording stops to process the entire audio at once.
I'm happy to take a stab at this if this is something that you'd want to see in the project.
Current behavior
- User starts recording
- User speaks
- User stops recording
- Audio is sent to whisper.cpp / OpenAI API for processing
- Transcribed text is pasted
Proposed behavior
- User starts recording
- As the user speaks, transcribed text appears in real-time
- User stops recording
- Final text is pasted (or incrementally inserted as it's produced)
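To make the proposed flow concrete, here is a minimal sketch of an incremental loop. It assumes audio arrives as a sequence of chunks and that some backend can transcribe one chunk at a time. The names `stream_transcribe`, `record_session`, `on_partial`, and `fake_transcribe` are hypothetical and not part of this project, whisper.cpp, or sherpa-onnx.

```python
from typing import Callable, Iterable, Iterator, List


def stream_transcribe(
    chunks: Iterable[bytes],
    transcribe_chunk: Callable[[bytes], str],
) -> Iterator[str]:
    """Yield text for each audio chunk as soon as it has been transcribed."""
    for chunk in chunks:
        text = transcribe_chunk(chunk).strip()
        if text:  # skip chunks that produced no text (e.g. silence)
            yield text


def record_session(
    chunks: Iterable[bytes],
    transcribe_chunk: Callable[[bytes], str],
    on_partial: Callable[[str], None],
) -> str:
    """Show partial text while recording, then return the final text to paste."""
    parts: List[str] = []
    for text in stream_transcribe(chunks, transcribe_chunk):
        parts.append(text)
        on_partial(text)  # e.g. update an overlay or insert text incrementally
    return " ".join(parts)


# Hypothetical stand-in for a real backend: treats each chunk as UTF-8 text.
def fake_transcribe(chunk: bytes) -> str:
    return chunk.decode("utf-8")


shown: List[str] = []
final_text = record_session([b"hello", b"", b"world"], fake_transcribe, shown.append)
```

Keeping the partial-text callback separate from the returned final text would let the project support both options in the last step: pasting once at the end, or inserting incrementally.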
Notes
- whisper.cpp supports streaming mode via the `--stream` flag
- OpenAI Whisper API does not currently support streaming
- sherpa-onnx supports streaming ASR models natively
- This would pair well with Voice Activity Detection (VAD, #185) for detecting speech segments
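As a rough illustration of how speech segments could be cut out before being handed to a transcriber, here is a simple energy-threshold sketch. This is a stand-in technique, not the VAD proposed in #185, and the function names, frame length, and threshold are all assumptions chosen for the example.

```python
from typing import List, Sequence, Tuple


def frame_energy(frame: Sequence[float]) -> float:
    """Mean squared amplitude of one frame (0.0 for an empty frame)."""
    return sum(s * s for s in frame) / len(frame) if frame else 0.0


def speech_segments(
    samples: Sequence[float],
    frame_len: int = 160,      # assumed: 10 ms at 16 kHz
    threshold: float = 0.01,   # assumed energy threshold
) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices of runs of frames above the threshold."""
    segments: List[Tuple[int, int]] = []
    start = None
    for i in range(0, len(samples), frame_len):
        is_speech = frame_energy(samples[i:i + frame_len]) >= threshold
        if is_speech and start is None:
            start = i                      # speech begins
        elif not is_speech and start is not None:
            segments.append((start, i))    # speech ends
            start = None
    if start is not None:                  # speech ran to the end of the buffer
        segments.append((start, len(samples)))
    return segments


# Two silent frames, three loud frames, one silent frame.
demo = [0.0] * 320 + [0.5] * 480 + [0.0] * 160
demo_segments = speech_segments(demo)
```

Each detected segment could then be passed to the transcription backend as soon as it closes, which is one way to get text out before recording stops.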