🎙️ PodcastAgent - Multi-Speaker Podcast Generator

Transform your text into engaging podcast conversations with multiple realistic AI voices! PodcastAgent uses advanced text-to-speech technology to create natural-sounding multi-speaker podcasts from any written content.

🌟 Features

Multi-Speaker Conversations: Generate podcasts with 1-4 different AI voices
Realistic Voices: Uses Microsoft Edge TTS for natural-sounding speech
AI-Powered Script Generation: Leverage Google Gemini to create engaging conversations
Multiple TTS Engines: Choose from Edge TTS, gTTS, or pyttsx3
Easy Download: Download your generated podcasts as audio files
Web Interface: User-friendly Gradio interface
Real-time Processing: Watch your podcast being generated step by step

🎭 Speaker Configurations

Speakers	Configuration	Voices
1 Speaker	Solo narration	Single voice
2 Speakers	Host conversation	Alex (female) & Brian (male)
3 Speakers	Panel discussion	Sarah (female), Mike (male) & Emma (female)
4 Speakers	Full roundtable	Sarah, Mike, Emma & David (male)

🚀 Quick Start

Prerequisites

Python 3.8+
FFmpeg (for audio processing)

Installation

Clone the repository

git clone https://github.com/hari7261/PodcastAgent.git
cd PodcastAgent

Install dependencies
```
pip install -r requirements.txt
```
Run the application
```
python app.py
```
Open your browser Navigate to http://localhost:7860

🔧 Configuration

API Setup (Optional but Recommended)

Get a Gemini API Key
- Visit Google AI Studio
- Create a free API key
- Paste it in the "Gemini API Key" field in the app
Benefits of API Key
- Natural conversation generation
- Intelligent speaker distribution
- Context-aware dialogue

📝 Usage

Enter your content - Paste any text (articles, blogs, stories)
Configure speakers - Choose 1-4 speakers for your podcast
Select voice engine - Edge TTS recommended for multi-speaker
Generate - Click the button and wait for processing
Download - Save your podcast as an audio file

Example Input

Renewable energy is transforming our world. Solar panels are becoming more efficient, 
wind farms are expanding offshore, and battery storage is solving intermittency issues. 
The convergence of these technologies is creating unprecedented opportunities for clean energy.

Example Output (2 Speakers)

Alex: The renewable energy boom is incredible, right? Solar's plummeting costs are game-changing.

Brian: Absolutely! And offshore wind farms are revolutionizing energy generation. Those massive turbines can power entire cities.

Alex: It's not just generation; storage is key. Lithium-ion batteries are getting cheaper and more efficient every year.

Brian: Exactly. Plus, smart grids are optimizing energy distribution in real-time. The whole ecosystem is evolving rapidly.

🎯 Voice Engines

Edge TTS (Recommended)

Best Quality: Most natural and realistic voices
Multi-Speaker: Different voices for each speaker
Internet Required: Needs online connection
Format: MP3/WAV output

gTTS (Google Text-to-Speech)

Good Quality: Clear and understandable
Single Voice: One voice for entire podcast
Internet Required: Needs online connection
Format: MP3 output

pyttsx3 (System TTS)

Offline: Works without internet
Basic Quality: System-dependent quality
Single Voice: One voice for entire podcast
Format: WAV output

🛠️ Technical Details

Architecture

Input Text → AI Script Generation → Multi-Speaker Parsing → TTS Generation → Audio Combining → Output

Dependencies

Gradio: Web interface
Google Generative AI: Script generation
Edge-TTS: Advanced text-to-speech
PyDub: Audio processing
gTTS: Google text-to-speech
pyttsx3: Offline TTS

🌐 Online Demo

Try PodcastAgent online at Hugging Face Spaces

📁 Project Structure

PodcastAgent/
├── app.py                 # Main application
├── app_simple.py         # Simplified version
├── requirements.txt      # Dependencies
├── README.md            # This file
├── LICENSE              # MIT License
├── demo.png             # Demo screenshot
└── .gitignore           # Git ignore rules

🤝 Contributing

We welcome contributions! Here's how you can help:

Fork the repository
Create a feature branch
```
git checkout -b feature/amazing-feature
```
Commit your changes
```
git commit -m 'Add amazing feature'
```
Push to the branch
```
git push origin feature/amazing-feature
```
Open a Pull Request

Areas for Contribution

🎵 Additional voice options
🌍 Multi-language support
🎨 UI/UX improvements
🔧 Performance optimizations
📚 Documentation enhancements

🐛 Troubleshooting

Common Issues

1. "Edge TTS not available"

pip install edge-tts

2. "FFmpeg not found"

Windows: Download from FFmpeg.org
macOS: brew install ffmpeg
Linux: sudo apt install ffmpeg

3. "Import Error: numpy"

pip install "numpy<2"

4. Audio combining fails

Ensure FFmpeg is properly installed
Check file permissions
Try using a different TTS engine

Performance Tips

Use shorter text inputs for faster processing
Edge TTS provides best quality but requires internet
For offline use, choose pyttsx3 engine
Enable AI script generation for better conversations

📊 Performance Benchmarks

Text Length	Speakers	Processing Time	Quality
500 chars	2	~15 seconds	⭐⭐⭐⭐⭐
1000 chars	3	~25 seconds	⭐⭐⭐⭐⭐
2000 chars	4	~45 seconds	⭐⭐⭐⭐⭐

Note: Times may vary based on internet connection and system performance

🔮 Roadmap

Real-time streaming - Live podcast generation
Voice cloning - Custom voice integration
Background music - Automatic music addition
Emotion control - Emotional speech synthesis
Multi-language - Support for multiple languages
API endpoints - RESTful API for integration
Batch processing - Process multiple texts at once

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

👨‍💻 Author

Hariom Kumar

GitHub: @hari7261
Hugging Face: hari7261

🙏 Acknowledgments

Microsoft Edge TTS for amazing voice synthesis
Google for Generative AI capabilities
Gradio team for the excellent web framework
Open source community for continuous support

⭐ Star History

If you find this project useful, please consider giving it a star! ⭐

Made with ❤️ for the AI community

Transform your text into engaging podcasts with PodcastAgent!

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

🎙️ PodcastAgent - Multi-Speaker Podcast Generator

🌟 Features

🎭 Speaker Configurations

🚀 Quick Start

Prerequisites

Installation

🔧 Configuration

API Setup (Optional but Recommended)

📝 Usage

Example Input

Example Output (2 Speakers)

🎯 Voice Engines

Edge TTS (Recommended)

gTTS (Google Text-to-Speech)

pyttsx3 (System TTS)

🛠️ Technical Details

Architecture

Dependencies

🌐 Online Demo

📁 Project Structure

🤝 Contributing

Areas for Contribution

🐛 Troubleshooting

Common Issues

Performance Tips

📊 Performance Benchmarks

🔮 Roadmap

📄 License

👨‍💻 Author

🙏 Acknowledgments

⭐ Star History

FilesExpand file tree

README.md

Latest commit

History

README.md

File metadata and controls

🎙️ PodcastAgent - Multi-Speaker Podcast Generator

🌟 Features

🎭 Speaker Configurations

🚀 Quick Start

Prerequisites

Installation

🔧 Configuration

API Setup (Optional but Recommended)

📝 Usage

Example Input

Example Output (2 Speakers)

🎯 Voice Engines

Edge TTS (Recommended)

gTTS (Google Text-to-Speech)

pyttsx3 (System TTS)

🛠️ Technical Details

Architecture

Dependencies

🌐 Online Demo

📁 Project Structure

🤝 Contributing

Areas for Contribution

🐛 Troubleshooting

Common Issues

Performance Tips

📊 Performance Benchmarks

🔮 Roadmap

📄 License

👨‍💻 Author

🙏 Acknowledgments

⭐ Star History