Express + TypeScript backend for Speech Coach G2. Provides real speech-to-text via OpenAI Whisper or Deepgram, plus speech quality analysis (WPM, filler words, pauses, word count).
npm install
cp .env.example .env
# Edit .env and set OPENAI_API_KEY or DEEPGRAM_API_KEY
npm run dev
Server listens on PORT (default 8787).
Without an API key, transcription runs in mock mode and returns a placeholder string so the glasses client can still exercise the full flow.
| Method | Path | Purpose |
|---|---|---|
| GET | /health | Liveness + provider info |
| POST | /transcribe | One-shot PCM -> text + metrics |
| POST | /session | Create session, returns { id } |
| POST | /session/:id/audio | Append raw PCM to session |
| GET | /session/:id/stream | SSE stream of live metrics |
| POST | /session/:id/finalize | Finalize session, returns summary |
| GET | /session/:id | Fetch current metrics |
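
The session endpoints above suggest a create, append, finalize flow. The sketch below shows one way a client might drive it. `runSession` and `FetchLike` are hypothetical names, not part of this backend, and the shape of the finalize summary is not documented here, so it is returned as `unknown`.

```typescript
// Minimal fetch-like signature so the sketch can be exercised with a stub.
type FetchLike = (
  url: string,
  init?: { method?: string; headers?: Record<string, string>; body?: Uint8Array },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Hypothetical helper: create a session, upload PCM chunks in order, finalize.
async function runSession(
  baseUrl: string,
  chunks: Uint8Array[],
  fetchImpl: FetchLike,
): Promise<unknown> {
  const post = async (path: string, body?: Uint8Array) => {
    const res = await fetchImpl(`${baseUrl}${path}`, {
      method: "POST",
      headers: body ? { "Content-Type": "application/octet-stream" } : {},
      body,
    });
    if (!res.ok) throw new Error(`POST ${path} failed with status ${res.status}`);
    return res;
  };

  // POST /session returns { id } per the table above.
  const created = (await (await post("/session")).json()) as { id: string };
  const id = encodeURIComponent(created.id);

  // Append each raw PCM chunk; the response body is not inspected here.
  for (const chunk of chunks) {
    await post(`/session/${id}/audio`, chunk);
  }

  // The summary shape is an unknown in this sketch.
  return (await post(`/session/${id}/finalize`)).json();
}
```

On Node 18+ the global `fetch` should fit `FetchLike`, possibly with a type cast, for example `runSession("http://localhost:8787", chunks, fetch as FetchLike)`.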
All audio endpoints accept raw PCM (16 kHz, signed 16-bit LE, mono) with
Content-Type: application/octet-stream.
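
As an illustration of that format, here is a minimal sketch for producing and sizing compatible buffers. The helper names `pcmDurationMs` and `floatToPcm16le` are made up for this example, and it assumes the capture side delivers float samples in [-1, 1].

```typescript
const SAMPLE_RATE_HZ = 16_000; // from the format stated above
const BYTES_PER_SAMPLE = 2; // signed 16-bit, mono

// Duration in milliseconds of a raw PCM buffer in this format.
function pcmDurationMs(byteLength: number): number {
  return (byteLength / BYTES_PER_SAMPLE / SAMPLE_RATE_HZ) * 1000;
}

// Convert float samples in [-1, 1] to signed 16-bit little-endian PCM bytes.
function floatToPcm16le(samples: Float32Array): Uint8Array {
  const out = new Uint8Array(samples.length * BYTES_PER_SAMPLE);
  const view = new DataView(out.buffer);
  samples.forEach((sample, i) => {
    // Clamp out-of-range input, then scale to the int16 range.
    const clamped = Math.max(-1, Math.min(1, sample));
    const scaled = clamped < 0 ? clamped * 0x8000 : clamped * 0x7fff;
    view.setInt16(i * BYTES_PER_SAMPLE, Math.round(scaled), true); // true = little-endian
  });
  return out;
}
```

One second of audio in this format is 32,000 bytes, so `pcmDurationMs(32000)` gives 1000.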
{
"type": "partial" | "final" | "metrics" | "error",
"transcript": "...",
"metrics": {
"wpm": 128,
"fillerWords": 5,
"pauseCount": 12,
"avgPauseMs": 320,
"wordCount": 212,
"fillerBreakdown": { "um": 3, "like": 2 }
},
"elapsedMs": 15230
}
Priority order on startup:
- OPENAI_API_KEY -> OpenAI Whisper (whisper-1)
- DEEPGRAM_API_KEY -> Deepgram nova-2 with filler-word detection
- neither -> mock mode (placeholder transcript)
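
The priority order above can be sketched as a small pure function. This is an illustration, not the backend's actual code: `selectProvider` and the `Provider` type are hypothetical names, and treating a blank value as unset is an assumption.

```typescript
type Provider = "openai" | "deepgram" | "mock";

// Pick a provider from environment variables, in the documented priority order.
function selectProvider(env: Record<string, string | undefined>): Provider {
  // Assumption: an empty or whitespace-only key counts as "not set".
  if (env.OPENAI_API_KEY?.trim()) return "openai";
  if (env.DEEPGRAM_API_KEY?.trim()) return "deepgram";
  return "mock";
}
```

In a Node process this would be called as `selectProvider(process.env)`. When both keys are set, OpenAI wins.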
- npm run dev - nodemon + tsx hot reload
- npm start - run once via tsx
- npm run build - tsc emit to dist/
- npm run typecheck - type-check only
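
For client code, the JSON message shape shown earlier (type, transcript, metrics, elapsedMs) could be typed and parsed roughly as follows. This sketch assumes that message arrives as the single-line `data:` field of each event on /session/:id/stream. The names `CoachMessage`, `SpeechMetrics`, and `parseSseDataLine` are invented for this example, and only the `type` field is validated.

```typescript
interface SpeechMetrics {
  wpm: number;
  fillerWords: number;
  pauseCount: number;
  avgPauseMs: number;
  wordCount: number;
  fillerBreakdown: Record<string, number>;
}

// Assumption: every message carries all four fields, as in the sample.
interface CoachMessage {
  type: "partial" | "final" | "metrics" | "error";
  transcript: string;
  metrics: SpeechMetrics;
  elapsedMs: number;
}

const MESSAGE_TYPES: readonly string[] = ["partial", "final", "metrics", "error"];

// Parse one SSE line. Returns null for non-data lines (comments, event names),
// invalid JSON, or payloads whose `type` is not one of the four known values.
function parseSseDataLine(line: string): CoachMessage | null {
  if (!line.startsWith("data:")) return null;
  let payload: unknown;
  try {
    payload = JSON.parse(line.slice("data:".length).trim());
  } catch {
    return null;
  }
  if (typeof payload !== "object" || payload === null) return null;
  const msg = payload as Partial<CoachMessage>;
  if (typeof msg.type !== "string" || !MESSAGE_TYPES.includes(msg.type)) return null;
  // Other fields are trusted as-is in this sketch.
  return msg as CoachMessage;
}
```

A stricter client would also check `metrics` and `elapsedMs` before use, since an error message may not include them.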