Transform your text into engaging podcast conversations with multiple realistic AI voices! PodcastAgent uses advanced text-to-speech technology to create natural-sounding multi-speaker podcasts from any written content.
- Multi-Speaker Conversations: Generate podcasts with 1-4 different AI voices
- Realistic Voices: Uses Microsoft Edge TTS for natural-sounding speech
- AI-Powered Script Generation: Leverage Google Gemini to create engaging conversations
- Multiple TTS Engines: Choose from Edge TTS, gTTS, or pyttsx3
- Easy Download: Download your generated podcasts as audio files
- Web Interface: User-friendly Gradio interface
- Real-time Processing: Watch your podcast being generated step by step
| Speakers | Configuration | Voices |
|---|---|---|
| 1 Speaker | Solo narration | Single voice |
| 2 Speakers | Host conversation | Alex (female) & Brian (male) |
| 3 Speakers | Panel discussion | Sarah (female), Mike (male) & Emma (female) |
| 4 Speakers | Full roundtable | Sarah, Mike, Emma & David (male) |
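
The table above can be captured as a small lookup. This is a minimal sketch rather than the project's actual code: the `ROSTERS` name, the `speakers_for` helper, and the "Narrator" label for the solo voice are assumptions made for illustration.

```python
# Hypothetical roster lookup mirroring the speaker table above.
# Names and genders come from the table; "Narrator" is an assumed label,
# because the table does not name the solo voice.
ROSTERS = {
    1: [("Narrator", None)],
    2: [("Alex", "female"), ("Brian", "male")],
    3: [("Sarah", "female"), ("Mike", "male"), ("Emma", "female")],
    4: [("Sarah", "female"), ("Mike", "male"), ("Emma", "female"), ("David", "male")],
}


def speakers_for(count):
    """Return the (name, gender) pairs for a given speaker count."""
    if count not in ROSTERS:
        raise ValueError("speaker count must be between 1 and 4")
    return list(ROSTERS[count])
```

Calling `speakers_for(3)` would give the Sarah, Mike, and Emma panel; any count outside 1-4 raises a `ValueError`.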
- Python 3.8+
- FFmpeg (for audio processing)
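
Before installing, it can help to confirm both prerequisites are in place. The following is a small sketch using only the standard library; the `check_prerequisites` function is a hypothetical helper, not part of PodcastAgent.

```python
import shutil
import sys


def check_prerequisites(min_python=(3, 8)):
    """Return a list of human-readable problems; an empty list means all good."""
    problems = []
    if sys.version_info < min_python:
        problems.append(
            "Python %d.%d+ is required, found %d.%d"
            % (min_python + tuple(sys.version_info[:2]))
        )
    # shutil.which returns None when the executable is not on PATH.
    if shutil.which("ffmpeg") is None:
        problems.append("FFmpeg was not found on PATH")
    return problems
```

Running `print(check_prerequisites())` prints an empty list when both Python 3.8+ and FFmpeg are available.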
1. Clone the repository
   git clone https://github.com/hari7261/PodcastAgent.git
   cd PodcastAgent
2. Install dependencies
   pip install -r requirements.txt
3. Run the application
   python app.py
4. Open your browser and navigate to
   http://localhost:7860
- Get a Gemini API Key
  - Visit Google AI Studio
  - Create a free API key
  - Paste it in the "Gemini API Key" field in the app
- Benefits of an API Key
  - Natural conversation generation
  - Intelligent speaker distribution
  - Context-aware dialogue
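
For readers curious how a key might be used outside the web interface, here is a hedged sketch with the `google-generativeai` package. The prompt wording, the `GEMINI_API_KEY` environment variable, the `gemini-1.5-flash` model name, and both function names are assumptions for illustration; the app's real prompt and model may differ.

```python
import os


def build_prompt(text, speakers):
    """Build a hypothetical instruction asking for a 'Name: line' script."""
    names = ", ".join(speakers)
    return (
        "Rewrite the following text as a podcast conversation between "
        + names
        + ". Start every line with the speaker's name and a colon.\n\n"
        + text
    )


def generate_script(text, speakers, api_key=None, model_name="gemini-1.5-flash"):
    """Ask Gemini for a script; requires the google-generativeai package."""
    # Imported lazily so this module loads even without the package installed.
    import google.generativeai as genai

    # Assumed convention: fall back to a GEMINI_API_KEY environment variable.
    genai.configure(api_key=api_key or os.environ["GEMINI_API_KEY"])
    model = genai.GenerativeModel(model_name)
    return model.generate_content(build_prompt(text, speakers)).text
```

Keeping the prompt builder separate from the network call makes the prompt easy to inspect and adjust without spending API quota.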
- Enter your content - Paste any text (articles, blogs, stories)
- Configure speakers - Choose 1-4 speakers for your podcast
- Select voice engine - Edge TTS recommended for multi-speaker
- Generate - Click the button and wait for processing
- Download - Save your podcast as an audio file
Renewable energy is transforming our world. Solar panels are becoming more efficient,
wind farms are expanding offshore, and battery storage is solving intermittency issues.
The convergence of these technologies is creating unprecedented opportunities for clean energy.
Alex: The renewable energy boom is incredible, right? Solar's plummeting costs are game-changing.
Brian: Absolutely! And offshore wind farms are revolutionizing energy generation. Those massive turbines can power entire cities.
Alex: It's not just generation; storage is key. Lithium-ion batteries are getting cheaper and more efficient every year.
Brian: Exactly. Plus, smart grids are optimizing energy distribution in real-time. The whole ecosystem is evolving rapidly.
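
A script in the `Name: line` shape shown above can be split into speaker turns with a short parser. This is a minimal sketch of the idea, not the project's own parser; `parse_script` and `LINE_PATTERN` are hypothetical names, and the rule for unlabeled lines is an assumption.

```python
import re

# A speaker label is a short name at the start of a line, followed by a colon.
LINE_PATTERN = re.compile(r"^\s*([A-Za-z][\w ]{0,30}):\s*(.+)$")


def parse_script(script):
    """Split 'Name: text' lines into (speaker, text) tuples.

    Unlabeled lines are appended to the previous turn; unlabeled lines
    before the first turn are dropped.
    """
    turns = []
    for raw in script.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = LINE_PATTERN.match(line)
        if match:
            turns.append((match.group(1).strip(), match.group(2).strip()))
        elif turns:
            speaker, text = turns[-1]
            turns[-1] = (speaker, text + " " + line)
    return turns
```

One limitation of this sketch: any short phrase ending in a colon at the start of a line is treated as a speaker label, so a stricter version would check names against the configured speakers.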
Edge TTS
- Best Quality: Most natural and realistic voices
- Multi-Speaker: Different voices for each speaker
- Internet Required: Needs online connection
- Format: MP3/WAV output
gTTS
- Good Quality: Clear and understandable
- Single Voice: One voice for entire podcast
- Internet Required: Needs online connection
- Format: MP3 output
pyttsx3
- Offline: Works without internet
- Basic Quality: System-dependent quality
- Single Voice: One voice for entire podcast
- Format: WAV output
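
The trade-offs between the engines can be expressed as a simple selection rule. The sketch below is an assumption about how one might choose among them, not the app's actual logic; the engine keys and the function names are hypothetical.

```python
import importlib.util

# Module names as imported in Python; the pip package for edge_tts is "edge-tts".
ENGINE_MODULES = {"edge": "edge_tts", "gtts": "gtts", "pyttsx3": "pyttsx3"}


def installed_engines():
    """Return the set of engine keys whose Python module can be found."""
    return {
        key
        for key, module in ENGINE_MODULES.items()
        if importlib.util.find_spec(module) is not None
    }


def supports_multiple_voices(engine):
    """Only Edge TTS is described as giving each speaker a different voice."""
    return engine == "edge"


def choose_engine(available, online=True):
    """Pick the best-quality engine from `available` that fits the connection."""
    # Edge TTS and gTTS need internet; pyttsx3 is the only offline option.
    preference = ["edge", "gtts", "pyttsx3"] if online else ["pyttsx3"]
    for key in preference:
        if key in available:
            return key
    raise RuntimeError("no suitable TTS engine is installed")
```

For example, `choose_engine(installed_engines(), online=False)` returns `"pyttsx3"` when it is installed and raises otherwise.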
Input Text → AI Script Generation → Multi-Speaker Parsing → TTS Generation → Audio Combining → Output
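
The flow above can be read as a chain of functions, each feeding the next. The following sketch only illustrates that shape: `run_pipeline` and its stage parameters are hypothetical names, and the stand-in stages return strings instead of audio so the example runs without a TTS engine or API key.

```python
def run_pipeline(text, generate_script, parse_script, synthesize, combine):
    """Run the stages in order; every stage is a caller-supplied function."""
    script = generate_script(text)  # input text -> conversation script
    turns = parse_script(script)  # script -> [(speaker, line), ...]
    clips = [synthesize(speaker, line) for speaker, line in turns]  # one clip per turn
    return combine(clips)  # clips -> final output


# Stand-in stages for demonstration only; real stages would call Gemini,
# a TTS engine, and an audio library.
demo = run_pipeline(
    "Hello",
    generate_script=lambda text: "Alex: " + text,
    parse_script=lambda script: [tuple(script.split(": ", 1))],
    synthesize=lambda speaker, line: "<%s says %s>" % (speaker, line),
    combine=lambda clips: "".join(clips),
)
```

Passing the stages in as arguments keeps each step replaceable, for instance swapping the TTS engine without touching script generation.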
- Gradio: Web interface
- Google Generative AI: Script generation
- Edge-TTS: Advanced text-to-speech
- PyDub: Audio processing
- gTTS: Google text-to-speech
- pyttsx3: Offline TTS
Try PodcastAgent online at Hugging Face Spaces
PodcastAgent/
├── app.py # Main application
├── app_simple.py # Simplified version
├── requirements.txt # Dependencies
├── README.md # This file
├── LICENSE # MIT License
├── demo.png # Demo screenshot
└── .gitignore # Git ignore rules
We welcome contributions! Here's how you can help:
- Fork the repository
- Create a feature branch
  git checkout -b feature/amazing-feature
- Commit your changes
  git commit -m 'Add amazing feature'
- Push to the branch
  git push origin feature/amazing-feature
- Open a Pull Request
- Additional voice options
- Multi-language support
- UI/UX improvements
- Performance optimizations
- Documentation enhancements
1. "Edge TTS not available"
   pip install edge-tts
2. "FFmpeg not found"
   - Windows: Download from FFmpeg.org
   - macOS: brew install ffmpeg
   - Linux: sudo apt install ffmpeg
3. "Import Error: numpy"
   pip install "numpy<2"
4. Audio combining fails
   - Ensure FFmpeg is properly installed
   - Check file permissions
   - Try using a different TTS engine
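
The two package-related issues above can be checked without importing the packages themselves. This is a rough diagnostic sketch using the standard library's `importlib.metadata`; the helper names are hypothetical, and the assumption that NumPy 2.x is what triggers the import error is inferred from the suggested `numpy<2` pin.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a pip package, or None."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def numpy_needs_downgrade(version):
    """True when the given NumPy version string has a major version of 2 or more."""
    return version is not None and int(version.split(".")[0]) >= 2


def diagnose():
    """Return pip commands that may resolve the package issues listed above."""
    hints = []
    if installed_version("edge-tts") is None:
        hints.append("pip install edge-tts")
    if numpy_needs_downgrade(installed_version("numpy")):
        hints.append('pip install "numpy<2"')
    return hints
```

Reading versions from package metadata avoids triggering the very import error being diagnosed.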
- Use shorter text inputs for faster processing
- Edge TTS provides best quality but requires internet
- For offline use, choose pyttsx3 engine
- Enable AI script generation for better conversations
| Text Length | Speakers | Processing Time | Quality |
|---|---|---|---|
| 500 chars | 2 | ~15 seconds | ⭐⭐⭐⭐⭐ |
| 1000 chars | 3 | ~25 seconds | ⭐⭐⭐⭐⭐ |
| 2000 chars | 4 | ~45 seconds | ⭐⭐⭐⭐⭐ |
Note: Times may vary based on internet connection and system performance
- Real-time streaming - Live podcast generation
- Voice cloning - Custom voice integration
- Background music - Automatic music addition
- Emotion control - Emotional speech synthesis
- Multi-language - Support for multiple languages
- API endpoints - RESTful API for integration
- Batch processing - Process multiple texts at once
This project is licensed under the MIT License - see the LICENSE file for details.
Hariom Kumar
- Microsoft Edge TTS for amazing voice synthesis
- Google for Generative AI capabilities
- Gradio team for the excellent web framework
- Open source community for continuous support
If you find this project useful, please consider giving it a star! ⭐
Made with ❤️ for the AI community
Transform your text into engaging podcasts with PodcastAgent!
