Recently, I have started developing new ASR AI model tools to replace Whisper. I am curious about the implementation of the --sentence flag in whisper-standalone-win: does it rely on NLP-based sentence segmentation?
Currently, I am working on a transcription tool called VibeVoice-ASR to improve transcription accuracy. I have attempted to replicate the --sentence functionality of whisper-standalone-win using NLP-based sentence splitting. However, the results do not match: in segments with high speech density, the timestamps are often inaccurate, typically off by one to two seconds.
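For context, here is the approach I am currently using. I don't know how whisper-standalone-win actually implements --sentence, but my replication attempt regroups word-level timestamps into sentences: split on terminal punctuation, then take the first word's start and the last word's end as the sentence span. This is a minimal sketch of that idea (the function names and the regex splitter are my own, not from whisper-standalone-win):

```python
import re

def _flush(group):
    """Collapse a list of (word, start, end) tuples into one sentence segment."""
    text = " ".join(w for w, _, _ in group)
    return {"text": text, "start": group[0][1], "end": group[-1][2]}

def sentences_with_timestamps(words):
    """Group word-level timestamps into sentence-level segments.

    `words` is a list of (text, start, end) tuples, e.g. from an ASR model
    that emits per-word timestamps. Sentence boundaries are detected with a
    naive regex on terminal punctuation; a production splitter would also
    need to handle abbreviations, numbers, quotes, etc.
    """
    sentences = []
    current = []
    for text, start, end in words:
        current.append((text, start, end))
        # End the sentence when the word carries terminal punctuation,
        # optionally followed by closing quotes or brackets.
        if re.search(r"[.!?]['\")\]]*$", text):
            sentences.append(_flush(current))
            current = []
    if current:  # trailing words with no terminal punctuation
        sentences.append(_flush(current))
    return sentences
```

With this scheme, sentence timestamps are only as accurate as the underlying word timestamps, which may explain the one-to-two-second drift I see in dense speech.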