Create a Python-based voice input system for Ubuntu, which has no built-in voice input system, using OpenAI's Whisper for offline speech-to-text transcription. The system should run continuously in the background and be controlled via keyboard shortcuts.
- Toggle Recording: Use Ctrl+Shift+Space as the hotkey to start/stop audio recording
- Speech Recognition: Use OpenAI Whisper for offline transcription (default model: "base")
- Keyboard Simulation: Type transcribed text at the current cursor position using a reliable library.
- Background Operation: Run as a continuous script (no GUI, terminate when script stops)
- Cross-Application: Work in any text input field across Ubuntu applications, by simulating keyboard events.
- Sample Rate: 16kHz
- Channels: Mono
- Format: 32-bit float
- Recording Method: Toggle-based (press once to start, again to stop)
- Max Duration: 30 seconds (auto-stop if hotkey not pressed)
- Library: Use pyaudio for audio capture.
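The audio settings above can be gathered in one place so the recorder and the timeout logic share the same numbers. The following is a minimal sketch, not a prescribed design: the class name, field names, and the 1024-frame chunk size are assumptions that do not appear in the requirements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AudioConfig:
    """Capture settings from the requirements; all names are illustrative."""

    sample_rate: int = 16_000     # 16 kHz
    channels: int = 1             # mono
    bytes_per_sample: int = 4     # 32-bit float (pyaudio.paFloat32 in PyAudio)
    max_seconds: int = 30         # auto-stop limit
    frames_per_chunk: int = 1024  # assumed read size, not specified above

    @property
    def max_chunks(self) -> int:
        """Number of chunk reads needed to cover max_seconds, rounded up."""
        total_frames = self.sample_rate * self.max_seconds
        return -(-total_frames // self.frames_per_chunk)  # ceiling division

    @property
    def max_bytes(self) -> int:
        """Upper bound on the in-memory buffer for one recording."""
        return (self.sample_rate * self.max_seconds
                * self.channels * self.bytes_per_sample)
```

With the defaults, a full 30-second recording needs 469 chunk reads and stays under 2 MB of memory.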
- Engine: OpenAI Whisper
- Model: "base" (configurable)
- Processing: Offline, local transcription
- Library: openai-whisper
- Library: pynput for cross-platform keyboard control
- Method: Simulate typing at current cursor position
- Behavior: Type transcribed text followed by a space
- Combination: Ctrl+Shift+Space
- Detection: Global hotkey listener (works when app not in focus)
- Library: pynput for global hotkey detection
Use Python.
mic2key.py
├── VoiceInputSystem class
├── Audio recording management
├── Whisper transcription
├── Keyboard simulation
├── Global hotkey listener
└── Main execution loop
Structure the code so that each problem is solved in a different file. This will make it easier to test different solutions.
- VoiceInputSystem Class: the main class that uses the other classes
  - Initialize Whisper model
  - Set up audio recording parameters
  - Create keyboard controller instance
  - Manage recording state (idle/recording/processing)
  - On startup, delete any previous recording to preserve the user's privacy.
  - On exit, gracefully delete recordings to preserve the user's privacy.
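The startup and exit cleanup could be sketched as below. This assumes that any recordings written to disk live in one dedicated directory as .wav files; the directory, the file pattern, and both function names are hypothetical. Audio kept purely in memory needs no such step.

```python
import atexit
from pathlib import Path


def purge_recordings(directory: Path, pattern: str = "*.wav") -> int:
    """Delete leftover recording files and return how many were removed."""
    removed = 0
    if not directory.is_dir():
        return removed
    for path in directory.glob(pattern):
        try:
            path.unlink()
            removed += 1
        except OSError:
            # A file that cannot be deleted should not block startup or exit.
            pass
    return removed


def install_privacy_cleanup(directory: Path) -> None:
    """Purge once now (startup) and again when the interpreter exits."""
    purge_recordings(directory)
    atexit.register(purge_recordings, directory)
```

Handlers registered with atexit run on normal interpreter exit, including after a Ctrl+C that surfaces as KeyboardInterrupt, but not when the process is killed by an unhandled signal.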
- Audio Recording Manager
  - Toggle recording on hotkey press
  - Capture audio data in memory
  - Handle recording timeout (30 seconds max)
  - Convert audio format for Whisper compatibility
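In-memory capture with a hard cap might look like the sketch below. The audio source is passed in as a callable so the loop can be tested without a microphone; with PyAudio, `read_chunk` would typically wrap `stream.read(frames_per_chunk)`. The function and parameter names are assumptions for illustration.

```python
from typing import Callable, List


def record_chunks(
    read_chunk: Callable[[], bytes],
    should_stop: Callable[[], bool],
    max_chunks: int,
) -> bytes:
    """Collect chunks in memory until stopped or until max_chunks is reached."""
    chunks: List[bytes] = []
    # The max_chunks bound enforces the recording timeout even if the
    # hotkey is never pressed a second time.
    while len(chunks) < max_chunks and not should_stop():
        chunks.append(read_chunk())
    return b"".join(chunks)
```

`should_stop` could be the `is_set` method of a `threading.Event` that the hotkey handler sets. For 32-bit float capture, the returned bytes can then be viewed as samples with `np.frombuffer(data, dtype=np.float32)`, which should match the 16 kHz mono float array that Whisper accepts.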
- Transcription Handler
  - Process recorded audio with Whisper
  - Handle transcription errors gracefully
  - Return cleaned text (strip whitespace, handle empty results)
  - Delete audio to preserve the user's privacy.
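The text-cleaning step can be sketched independently of the model. Whisper's `transcribe` returns a dictionary with a `"text"` key; the helper below (its name is hypothetical) takes such a dictionary and returns either usable text or `None`, so the caller can skip typing when nothing was recognized. Collapsing internal whitespace goes slightly beyond plain stripping and is an assumption.

```python
from typing import Mapping, Optional


def clean_transcript(result: Mapping[str, object]) -> Optional[str]:
    """Return cleaned text from a Whisper-style result, or None if empty."""
    text = result.get("text", "")
    if not isinstance(text, str):
        return None
    # Collapse runs of whitespace and trim both ends.
    cleaned = " ".join(text.split())
    return cleaned or None
```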
- Keyboard Controller
  - Type transcribed text at cursor position
  - Add space after each transcription
  - Handle special characters and unicode properly
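A minimal sketch of the typing step follows. The controller is passed in rather than created here, so the function can be exercised with a stand-in object; in the real script it would be a `pynput.keyboard.Controller`, whose `type` method sends a string as key events. The `Typer` protocol and the function name are illustrative.

```python
from typing import Protocol


class Typer(Protocol):
    """Anything with a type(text) method, e.g. pynput.keyboard.Controller."""

    def type(self, text: str) -> None: ...


def type_transcript(controller: Typer, text: str) -> bool:
    """Type text plus a trailing space; return False if nothing was typed."""
    if not text:
        return False
    controller.type(text + " ")
    return True
```

Whether every Unicode character can be typed depends on the keyboard layout and display server, so a caller would still wrap this in the error handling the specification asks for.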
- Global Hotkey Listener
  - Listen for Ctrl+Shift+Space globally
  - Toggle recording state
  - Provide visual/audio feedback (optional beep or console message)
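pynput's `GlobalHotKeys` expects combinations written as `<ctrl>+<shift>+<space>`, with named keys in angle brackets and single characters bare. A small converter from the human-readable form used in this specification might look like this; the function name is an assumption, and the token rule is a simplification that does not validate key names.

```python
def to_pynput_hotkey(combo: str) -> str:
    """Convert 'Ctrl+Shift+Space' to pynput's '<ctrl>+<shift>+<space>' form."""
    parts = []
    for token in combo.split("+"):
        token = token.strip().lower()
        if not token:
            raise ValueError(f"empty key in hotkey: {combo!r}")
        # Named keys go in angle brackets; single characters stay bare.
        parts.append(token if len(token) == 1 else f"<{token}>")
    return "+".join(parts)


# With pynput installed, the result could be used like this
# (on_toggle is a hypothetical callback):
#   from pynput import keyboard
#   hotkey = to_pynput_hotkey("Ctrl+Shift+Space")
#   with keyboard.GlobalHotKeys({hotkey: on_toggle}) as listener:
#       listener.join()
```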
- Graceful handling of audio device errors
- Whisper model loading failures
- Empty or failed transcriptions
- Keyboard permission issues
- Microphone access problems
- Whisper model size selection
- Max recording duration limit (to avoid forgetting to stop recording)
- Audio device selection
- Hotkey customization
- Debug mode toggle
All parameters are set through CLI arguments, with sensible defaults.
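One way to expose these options is `argparse` with defaults taken from the specification. The flag names below are assumptions chosen for illustration; only the default values (model "base", 30 seconds, Ctrl+Shift+Space) come from the text.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build the CLI; every flag name here is illustrative."""
    parser = argparse.ArgumentParser(description="Voice input via Whisper")
    parser.add_argument("--model", default="base",
                        help="Whisper model size (e.g. tiny, base, small)")
    parser.add_argument("--max-duration", type=float, default=30.0,
                        help="auto-stop recording after this many seconds")
    parser.add_argument("--device-index", type=int, default=None,
                        help="audio input device index (default: system default)")
    parser.add_argument("--hotkey", default="Ctrl+Shift+Space",
                        help="key combination that toggles recording")
    parser.add_argument("--debug", action="store_true",
                        help="enable verbose logging")
    return parser
```

Running the script with no arguments then yields the documented defaults, while something like `--model small --debug` overrides only those two settings.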
import whisper
import pyaudio
import numpy as np
import pynput.keyboard as keyboard
import threading
import time
import sys
import logging
- Script starts and loads Whisper model
- Prints "Voice input system ready. Press Ctrl+Shift+Space to start/stop recording."
- Waits for hotkey press
- On first hotkey: Start recording, print "Recording started..."
- On second hotkey: Stop recording, print "Processing...", transcribe, type result
- Repeat cycle until script termination
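The cycle above amounts to a small state machine driven by hotkey presses. The sketch below shows one possible shape; the class name and the two injected callables are hypothetical, and the processing step runs synchronously here for simplicity.

```python
from enum import Enum
from typing import Callable


class State(Enum):
    IDLE = "idle"
    RECORDING = "recording"
    PROCESSING = "processing"


class ToggleWorkflow:
    """Maps hotkey presses onto the idle -> recording -> processing cycle."""

    def __init__(self, start: Callable[[], None],
                 stop_and_handle: Callable[[], None],
                 log: Callable[[str], None] = print) -> None:
        self.state = State.IDLE
        self._start = start
        self._stop_and_handle = stop_and_handle
        self._log = log

    def on_hotkey(self) -> None:
        if self.state is State.IDLE:
            self.state = State.RECORDING
            self._log("Recording started...")
            self._start()
        elif self.state is State.RECORDING:
            self.state = State.PROCESSING
            self._log("Processing...")
            try:
                self._stop_and_handle()  # stop, transcribe, type the result
            finally:
                # Return to idle even if transcription or typing failed.
                self.state = State.IDLE
        # Presses that arrive while PROCESSING are ignored.
```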
- Load Whisper model once at startup
- Use threading for non-blocking audio recording
- Minimal memory usage for audio buffering
- Fast transcription processing (< 3 seconds for short clips)
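To keep the hotkey callback responsive, recording or transcription can be handed to a worker thread. The helper below is a minimal sketch under that assumption; the class name is hypothetical, and it deliberately allows only one job at a time so that overlapping hotkey presses do not start parallel work.

```python
import threading
from typing import Callable, Optional


class BackgroundJob:
    """Runs one callable at a time on a daemon thread."""

    def __init__(self) -> None:
        self._thread: Optional[threading.Thread] = None
        self._lock = threading.Lock()

    def submit(self, work: Callable[[], None]) -> bool:
        """Start work in the background; return False if a job is still running."""
        with self._lock:
            if self._thread is not None and self._thread.is_alive():
                return False
            self._thread = threading.Thread(target=work, daemon=True)
            self._thread.start()
            return True

    def wait(self, timeout: Optional[float] = None) -> None:
        """Block until the current job finishes (useful on shutdown)."""
        thread = self._thread
        if thread is not None:
            thread.join(timeout)
```

Daemon threads do not keep the process alive, which fits the requirement that everything terminates when the script stops.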
- Target: Ubuntu 20.04+ (primary)
- Python: 3.8+
- Audio: PulseAudio/ALSA compatible
- Permissions: Standard user permissions (no root required)
mic2key.py # Main script
requirements.txt # Python dependencies
config.py # Configuration options (optional)
# Other code files as needed for a complete system
- Include comprehensive error handling and logging
- Add command-line arguments for debug mode and model selection
- Ensure clean shutdown on Ctrl+C
- Include helpful console output for user feedback
- Test with various microphone qualities and background noise levels
- Handle edge cases like very short recordings or silence
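Very short or silent recordings can be filtered out before they reach Whisper. The following sketch uses a duration check and a root-mean-square level check; the function names and both thresholds (0.3 seconds and an RMS of 0.01) are assumptions that would need tuning against real microphones.

```python
import math
from typing import Sequence


def is_too_short(num_samples: int, sample_rate: int = 16_000,
                 min_seconds: float = 0.3) -> bool:
    """True if the clip is shorter than min_seconds (assumed threshold)."""
    return num_samples < sample_rate * min_seconds


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square level of float samples in the range [-1.0, 1.0]."""
    if len(samples) == 0:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def is_silent(samples: Sequence[float], threshold: float = 0.01) -> bool:
    """True if the RMS level is below threshold (assumed, tunable value)."""
    return rms(samples) < threshold
```

A caller could skip transcription and print a short console message when either check returns True.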
The generated code should:
- Run without errors on a fresh Ubuntu installation
- Respond to the specified hotkey combination
- Record audio when hotkey is pressed
- Stop recording and transcribe when hotkey is pressed again
- Type the transcribed text in the currently focused application
- Handle errors gracefully without crashing
- Provide clear console feedback about system state
Generate clean, well-documented Python code that implements this voice input system according to these specifications.