Live transcription - stream text output while recording #296

@slonka

Description

Support live/streaming transcription that outputs text in real-time as the user speaks, rather than waiting until recording stops to process the entire audio at once.

I'm happy to take a stab at this if this is something that you'd want to see in the project.

Current behavior

  1. User starts recording
  2. User speaks
  3. User stops recording
  4. Audio is sent to whisper.cpp / OpenAI API for processing
  5. Transcribed text is pasted

Proposed behavior

  1. User starts recording
  2. As the user speaks, transcribed text appears in real-time
  3. User stops recording
  4. Final text is pasted (or incrementally inserted as it's produced)
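The proposed flow can be sketched roughly as below. This is a minimal illustration, not the project's code: `stream_transcripts`, the `Chunk` type, and the `transcribe` callback are hypothetical names, and the callback stands in for whichever backend ends up doing the recognition. It uses fixed-size, non-overlapping windows, which is the simplest option; a real implementation would likely want overlap or speech-boundary detection so that words are not cut in half.

```python
from typing import Callable, Iterable, Iterator, List

# Hypothetical representation: one chunk is a list of mono PCM samples.
Chunk = List[float]


def stream_transcripts(
    chunks: Iterable[Chunk],
    transcribe: Callable[[Chunk], str],
    window_chunks: int = 3,
) -> Iterator[str]:
    """Yield a piece of text each time `window_chunks` chunks have arrived.

    `transcribe` is an assumed callback that turns samples into text.
    """
    buffered: List[Chunk] = []
    for chunk in chunks:
        buffered.append(chunk)
        if len(buffered) == window_chunks:
            # Flatten the buffered chunks into one run of samples.
            samples = [s for c in buffered for s in c]
            text = transcribe(samples)
            if text:
                yield text
            buffered = []
    # Recording stopped: flush whatever audio is still buffered.
    if buffered:
        samples = [s for c in buffered for s in c]
        text = transcribe(samples)
        if text:
            yield text
```

Each yielded string could be shown in a preview as it arrives, with the joined result pasted once recording stops, or each piece could be inserted as it is produced.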

Notes

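  • whisper.cpp supports near-real-time transcription through its stream example program (built as whisper-stream in recent versions), which is a separate binary rather than a --stream flag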
  • OpenAI Whisper API does not currently support streaming
  • sherpa-onnx supports streaming ASR models natively
  • This would pair well with Voice Activity Detection (VAD, #185) for detecting speech segments
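To illustrate how speech-segment detection could feed a streaming transcriber, here is a naive energy-gate sketch. It is not the VAD discussed in #185 and not taken from any of the libraries above; `rms`, `speech_segments`, `threshold`, and `max_silent_frames` are hypothetical names and values chosen for illustration only. A production VAD would be considerably more robust to noise.

```python
import math
from typing import Iterable, Iterator, List, Sequence


def rms(frame: Sequence[float]) -> float:
    """Root-mean-square level of one frame of samples."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def speech_segments(
    frames: Iterable[Sequence[float]],
    threshold: float = 0.02,      # assumed level separating speech from silence
    max_silent_frames: int = 2,   # assumed pause length that ends a segment
) -> Iterator[List[float]]:
    """Group loud frames into segments, closing one after a long enough pause."""
    segment: List[float] = []
    silent = 0
    for frame in frames:
        if rms(frame) >= threshold:
            segment.extend(frame)
            silent = 0
        elif segment:
            # Quiet frame inside a segment: count it but do not keep its samples.
            silent += 1
            if silent >= max_silent_frames:
                yield segment
                segment = []
                silent = 0
    # Audio ended while a segment was still open.
    if segment:
        yield segment
```

Each yielded segment could then be handed to the transcription backend, so text is produced at natural pauses instead of at arbitrary chunk boundaries.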

Labels

enhancement
