spences10/audiomind

AudioMind

A podcast transcription and semantic search tool with an AI-powered chat interface. Transcribe audio files, generate embeddings, and ask questions about your podcast content using RAG (Retrieval-Augmented Generation).

Features

  • Audio Transcription - Transcribe podcasts using Deepgram's Nova-3 model with smart formatting and paragraph detection
  • Semantic Search - Vector similarity search powered by Voyage AI embeddings and sqlite-vec
  • AI Chat Interface - Ask questions about your podcasts with context-aware responses using Claude
  • ID3 Tag Support - Automatically extract podcast and episode metadata from audio files
  • Local Storage - All data stored locally in SQLite
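
To illustrate what the semantic search feature does conceptually, here is a minimal in-memory sketch of top-k cosine similarity over embedding vectors. It is not the project's implementation (the feature list says search is powered by sqlite-vec); the `Chunk` type and the `cosine_similarity` and `top_k` functions are hypothetical names used only for illustration.

```typescript
// Hypothetical shape of an embedded transcript chunk (illustrative only).
interface Chunk {
  text: string;
  embedding: number[];
}

// Cosine similarity between two equal-length vectors.
// Returns 0 when either vector has zero magnitude.
function cosine_similarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("vector length mismatch");
  }
  let dot = 0;
  let norm_a = 0;
  let norm_b = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    norm_a += a[i] * a[i];
    norm_b += b[i] * b[i];
  }
  if (norm_a === 0 || norm_b === 0) return 0;
  return dot / (Math.sqrt(norm_a) * Math.sqrt(norm_b));
}

// Rank chunks by similarity to the query vector and keep the best k.
function top_k(query: number[], chunks: Chunk[], k: number): Chunk[] {
  return chunks
    .map((chunk) => ({ chunk, score: cosine_similarity(query, chunk.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((entry) => entry.chunk);
}
```

A brute-force scan like this is only reasonable for small libraries; a vector index in the database avoids loading every embedding into memory.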

Requirements

Setup

  1. Clone the repository and install dependencies:
     pnpm install
  2. Create a .env file with your API keys:
     DEEPGRAM_API_KEY=your_key
     VOYAGE_API_KEY=your_key
     ANTHROPIC_API_KEY=your_key
  3. Initialize the database:
     pnpm cli init
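
If a command fails right after setup, a missing or empty API key is a likely cause. The sketch below shows one way to check for the three keys listed above before doing any work. The `missing_keys` function is a hypothetical helper, not part of the AudioMind CLI, and it takes the environment as a plain object so it can be tried in isolation.

```typescript
// The three keys named in the setup steps above.
const REQUIRED_KEYS = [
  "DEEPGRAM_API_KEY",
  "VOYAGE_API_KEY",
  "ANTHROPIC_API_KEY",
] as const;

// Hypothetical helper: returns the names of required keys that are
// absent or blank in the given environment object.
function missing_keys(env: Record<string, string | undefined>): string[] {
  return REQUIRED_KEYS.filter((key) => {
    const value = env[key];
    return value === undefined || value.trim() === "";
  });
}
```

In a Node.js script this could be called as `missing_keys(process.env)` once the .env file has been loaded, exiting with a clear message if the returned list is not empty.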

CLI Usage

The CLI provides tools for processing audio files and managing your podcast library.

Process an Audio File

Run the full pipeline (transcribe, embed, and ingest) in one command:

pnpm cli process path/to/episode.mp3

Podcast name and episode title are auto-detected from ID3 tags. Override with flags:

pnpm cli process episode.mp3 --podcast "My Podcast" --episode "Episode 1"
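
The precedence described above (flags win over ID3 tags) can be sketched as a small merge function. This is an illustration of the rule only, not the CLI's code; `EpisodeMetadata` and `resolve_metadata` are hypothetical names, and the sketch deliberately leaves a field undefined when neither source provides it, since the README does not say what the CLI does in that case.

```typescript
// Hypothetical metadata shape; both fields may be absent.
interface EpisodeMetadata {
  podcast?: string;
  episode?: string;
}

// Values passed as flags take precedence; ID3 tag values fill the gaps.
function resolve_metadata(
  tags: EpisodeMetadata,
  flags: EpisodeMetadata,
): EpisodeMetadata {
  return {
    podcast: flags.podcast ?? tags.podcast,
    episode: flags.episode ?? tags.episode,
  };
}
```

For example, passing only `--podcast` would replace the podcast name while the episode title still comes from the file's tags.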

Search Your Library

pnpm cli search "topic you're looking for"
pnpm cli search "machine learning" --limit 20
pnpm cli search "interviews" --podcast "Tech Talk"

Other Commands

pnpm cli list                    # List all podcasts and episodes
pnpm cli inspect audio.mp3       # View audio file metadata
pnpm cli transcribe audio.mp3    # Transcribe only (no embedding)
pnpm cli update --podcast 1 --name "New Name"  # Update metadata

Web Interface

Start the development server:

pnpm dev

The web interface provides a chat UI where you can ask questions about your ingested podcasts. Responses include source citations with timestamps.
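
As a rough idea of how a timestamped citation might be rendered, the sketch below converts a start offset in seconds into a readable label. The `Citation` shape, the field names, and the output format are assumptions made for illustration; the README does not describe how the web interface actually stores or displays citations.

```typescript
// Hypothetical citation shape: an episode title plus a start offset.
interface Citation {
  episode: string;
  start_seconds: number;
}

// Format seconds as m:ss, or h:mm:ss once the offset reaches an hour.
// Fractional seconds are truncated and negative values clamp to zero.
function format_timestamp(total_seconds: number): string {
  const whole = Math.max(0, Math.floor(total_seconds));
  const hours = Math.floor(whole / 3600);
  const minutes = Math.floor((whole % 3600) / 60);
  const seconds = whole % 60;
  const pad = (n: number): string => String(n).padStart(2, "0");
  return hours > 0
    ? `${hours}:${pad(minutes)}:${pad(seconds)}`
    : `${minutes}:${pad(seconds)}`;
}

// Combine the episode title and formatted offset into one label.
function format_citation(citation: Citation): string {
  return `${citation.episode} @ ${format_timestamp(citation.start_seconds)}`;
}
```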

Tech Stack

  • Frontend: SvelteKit, Tailwind CSS, shadcn-svelte
  • Backend: SvelteKit API routes, better-sqlite3, sqlite-vec
  • AI: Anthropic Claude (chat), Voyage AI (embeddings), Deepgram (transcription)
  • CLI: citty, music-metadata

Project Structure

src/
├── cli/           # CLI tool for processing audio
├── lib/
│   ├── components/  # Svelte components
│   └── server/      # Server-side utilities (database, AI clients)
└── routes/
    ├── api/         # API endpoints for chat
    └── chat/        # Chat interface pages

Development

pnpm dev          # Start dev server
pnpm check        # Type check
pnpm lint         # Lint code
pnpm test         # Run tests
pnpm build        # Production build

License

MIT

About

An MP3 to AI Chat Assistant - A configurable AI chat assistant that can be customized for your content and use case. Transform audio content into interactive, searchable conversations.