|
1 | | -# ASR stream goes stale after mute/unmute or long silence |
| 1 | +# ASR stream goes stale / dies prematurely |
2 | 2 |
|
3 | | -After muting then unmuting the mic (or after a prolonged period where no speech reaches Riva), the ASR stream silently stops producing results even though PCM audio is still flowing. |
| 3 | +The Riva ASR gRPC stream can die mid-session, either silently ceasing to produce results or terminating entirely. This has been observed in several scenarios: after mute/unmute, after long idle periods, and during normal operation, sometimes before any result is produced.
4 | 4 |
|
5 | 5 | ## Observed behavior |
6 | 6 |
|
| 7 | +### Scenario 1: Stale after long LLM block + mute/unmute |
7 | 8 | - Session `c87be1b2` (2026-03-14): 9 turns completed successfully. |
8 | 9 | - Turn 8 triggered a degenerate LLM reasoning loop (10,101 chars, **91.89 s** wall-clock). |
9 | 10 | - During that wait, the user muted and later unmuted the mic. |
10 | | -- After turn 9 completed, no further `asr_final` events appeared for ~1.5 min despite the green **user_amplitude** waveform being visible on the timeline (PCM capture was healthy). |
11 | | -- Terminal showed no ASR errors; the stream ended normally at session close with `Stream task timeout, cancelling`. |
| 11 | +- After turn 9 completed, no further `asr_final` events appeared for ~1.5 min despite the green **user_amplitude** waveform being visible on the timeline. |
| 12 | +- Terminal showed no ASR errors; the stream ended normally at session close. |
| 13 | + |
| 14 | +### Scenario 2: Stream dies after ~2 min (normal operation, no mute) |
| 15 | +- Session on jat-4cbb47141bb7 (2026-03-14): 3 turns completed in ~2 min. |
| 16 | +- After turn 3, Riva ASR stream ended with 16 results total. |
| 17 | +- `_feed_pcm_to_pipeline` continued sending PCM but `send_audio()` raised `RuntimeError: Stream not started` — **crashing the entire pipeline**. |
| 18 | + |
| 19 | +### Scenario 3: Stream dies with 0 results (~23s) |
| 20 | +- Session `f9748641` on the same device: ASR stream started, 0 results received, stream ended after ~23 s.
| 21 | +- Same `RuntimeError` crash. |
| 22 | + |
| 23 | +### Scenario 4: USB contention with Brio 4K camera |
| 24 | +- On Jetsons with Brio 4K (USB 3.0) + USB audio, severe bus contention causes: |
| 25 | + - Camera `VIDIOC_REQBUFS: errno=19 (No such device)` — camera disappears from bus. |
| 26 | + - `arecord: audio open error: Device or resource busy` — audio device locked by previous pipeline. |
| 27 | + - ASR stream dies with 0 results; pipeline crashes before user even speaks. |
12 | 28 |
|
13 | 29 | ## Why amplitude shows but ASR does not |
14 | 30 |
|
15 | | -In `_feed_pcm_to_pipeline`, amplitude is always computed and sent to the client (lines 1005-1024) regardless of `mic_muted`. The ASR send is gated: |
| 31 | +In `_feed_pcm_to_pipeline`, amplitude is always computed and sent to the client regardless of `mic_muted`. The ASR send is gated: |
16 | 32 |
|
17 | 33 | ```python |
18 | 34 | if not mic_muted: |
19 | | - await asr.send_audio(pcm_bytes) |
| 35 | + accepted = await asr.send_audio(pcm_bytes) |
20 | 36 | ``` |
21 | 37 |
|
22 | | -So the timeline waveform looks alive, but if the Riva gRPC stream has internally timed out (or VAD state has gone stale after 90+ seconds of silence/mute), newly sent audio produces no results. |
| 38 | +So the timeline waveform looks alive, but if the Riva gRPC stream has internally timed out or died, newly sent audio is silently dropped (previously, `send_audio()` raised `RuntimeError` and crashed the pipeline).
23 | 39 |
|
24 | | -## Probable root cause (needs confirmation) |
| 40 | +## Root causes |
25 | 41 |
|
26 | 42 | Riva Streaming ASR has internal session limits: |
27 | 43 | - **gRPC keepalive / idle timeout**: if no audio is sent for an extended period the server may silently close the stream. |
28 | | -- **VAD state**: after a long silence gap, the VAD model may reset or require a fresh trigger to start detecting speech again. |
29 | | -- **Maximum session duration**: Riva may cap single-stream duration; after that, the stream yields no more results even though it stays open. |
30 | | - |
31 | | -The exact Riva behavior here is unconfirmed — the stream appeared open (no error logged) but stopped producing finals. |
| 44 | +- **VAD state**: after a long silence gap, the VAD model may reset or require a fresh trigger. |
| 45 | +- **Maximum session duration**: Riva may cap single-stream duration (~2 min observed); after that, the stream yields no more results. |
| 46 | +- **USB bus contention**: on Jetson devices with multiple USB peripherals (especially high-bandwidth cameras like Brio 4K), the audio device can become temporarily unavailable, preventing `arecord` from opening. |
32 | 47 |
|
33 | | -## What is already in place |
| 48 | +## Implemented fixes (2026-03-14) |
34 | 49 |
|
35 | | -- `mic_muted` gates `asr.send_audio()` in the classic pipeline (line 1003). |
36 | | -- On mute, 0.5 s of silence is injected (`b"\x00" * int(16000 * 2 * 0.5)`) to flush any pending VAD partial (line 1041-1044). |
37 | | -- On unmute, `mic_muted = False` resumes sending PCM to ASR. |
38 | | -- No stream-health monitoring or automatic restart exists today. |
39 | | - |
40 | | -## Proposed solutions (pick one or combine) |
41 | | - |
42 | | -### Option A: Keep-alive noise during mute |
| 50 | +### Fix 1: Graceful `send_audio` (riva.py) |
| 51 | +`send_audio()` now returns `bool` instead of raising `RuntimeError` when the stream is dead. Returns `False` if `_sync_audio_queue` is `None`, allowing the PCM feeder to continue without crashing. |
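
A minimal sketch of the graceful-send behavior described in Fix 1. The class name `SketchASRBackend` and the two `*_for_demo` helpers are hypothetical and exist only to make the example runnable; only the `_sync_audio_queue is None` check and the `bool` return value come from the description above. The real method in `riva.py` may differ.

```python
import asyncio
import queue
from typing import Optional


class SketchASRBackend:
    """Hypothetical stand-in for the Riva backend; only queue handling is shown."""

    def __init__(self) -> None:
        # None means "no active stream", matching the check described in Fix 1.
        self._sync_audio_queue: Optional[queue.Queue] = None

    def open_queue_for_demo(self) -> None:
        # Assumption: the real backend creates the queue when a stream starts.
        self._sync_audio_queue = queue.Queue()

    def close_queue_for_demo(self) -> None:
        # Assumption: the real backend clears the queue when a stream ends.
        self._sync_audio_queue = None

    async def send_audio(self, pcm_bytes: bytes) -> bool:
        """Queue PCM for the stream; return False instead of raising if it is dead."""
        audio_queue = self._sync_audio_queue
        if audio_queue is None:
            return False  # caller keeps running and may log the drop once
        audio_queue.put_nowait(pcm_bytes)
        return True
```

Returning a flag rather than raising lets the PCM feeder keep computing amplitude and serving the client while the stream is down.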
43 | 52 |
|
44 | | -While `mic_muted` is True, instead of sending nothing, send **very low amplitude white noise** (e.g., ±10 out of ±32768) at normal cadence. This keeps the gRPC stream active and the VAD model warm without triggering false speech detection. |
| 53 | +### Fix 2: Log-once warning (_feed_pcm_to_pipeline) |
| 54 | +When `send_audio` returns `False`, a warning is logged once per dead-stream episode: `[asr] send_audio dropped — ASR stream not active (waiting for auto-restart)`. The flag resets on stream restart. |
45 | 55 |
|
46 | | -Pros: Simplest change; no stream lifecycle management. \ |
47 | | -Cons: Assumes the Riva stream itself is still healthy; does not help if the stream has a hard session-duration cap. |
| 56 | +### Fix 3: Auto-restart in asr_consumer (voice_pipeline.py) |
| 57 | +`asr_consumer` now wraps the `async for result in asr.receive_results()` loop in a `while not stopped.is_set()` loop. When the inner iterator ends (stream died) and the pipeline is still running: |
| 58 | +1. Increments restart counter (max 10). |
| 59 | +2. Logs a WARNING with result count and restart number. |
| 60 | +3. Emits `asr_stream_restart` timeline event. |
| 61 | +4. Calls `asr.stop_stream()` → sleep with exponential backoff (2s, 4s, ..., max 10s) → `asr.start_stream()`. |
| 62 | +5. Resets the `send_audio` log-once flag and result counter. |
| 63 | +6. Re-enters the `async for` loop on the fresh stream. |
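
The restart loop in the steps above can be sketched as follows. This is an illustration under stated assumptions, not the code in `voice_pipeline.py`: the `asr` object is assumed to expose async `stop_stream()` and `start_stream()` methods plus an async-iterator `receive_results()`. The doubling formula in `backoff_delay` is one reading of "2s, 4s, ..., max 10s". The `on_result` and `sleep` parameters are hypothetical hooks added for testability. Logging, the `asr_stream_restart` timeline event, and the log-once flag reset are omitted.

```python
import asyncio

MAX_RESTARTS = 10  # restart cap from step 1


def backoff_delay(restart_number: int, base: float = 2.0, cap: float = 10.0) -> float:
    """Delay before restart N: 2 s, 4 s, 8 s, then capped at 10 s (assumed doubling)."""
    return min(base * 2 ** (restart_number - 1), cap)


async def asr_consumer(asr, stopped: asyncio.Event, on_result, sleep=asyncio.sleep) -> int:
    """Consume ASR results, restarting the stream whenever the iterator ends early.

    Returns the number of restart attempts counted.
    """
    restarts = 0
    while not stopped.is_set():
        # Inner loop ends when the gRPC response iterator is exhausted.
        async for result in asr.receive_results():
            on_result(result)
        if stopped.is_set():
            break  # normal shutdown, not a dead stream
        restarts += 1
        if restarts > MAX_RESTARTS:
            break  # give up rather than restart forever
        await asr.stop_stream()
        await sleep(backoff_delay(restarts))
        await asr.start_stream()
    return restarts
```

Checking `stopped` after the inner loop is what separates a deliberate shutdown from a stream that died on its own; only the latter triggers a restart.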
48 | 64 |
|
49 | | -### Option B: Restart ASR stream after stale timeout |
| 65 | +## Remaining work |
50 | 66 |
|
51 | | -Monitor elapsed time since the last `asr_final`. If no final arrives within a configurable window (e.g., 60 s while unmuted), tear down the current `RivaASRBackend` stream and create a fresh one. |
52 | | - |
53 | | -1. Track `_last_asr_final_time` in the turn executor; update it on every `asr_final`. |
54 | | -2. In `server_capture_consumer` (or a watchdog task), check `time.time() - _last_asr_final_time > ASR_STALE_TIMEOUT`. |
55 | | -3. If stale and `not mic_muted`: call `asr.stop()`, then `asr.start()` to open a fresh streaming session. |
56 | | -4. Log `[asr] Stream restarted after stale timeout` at WARNING level. |
57 | | - |
58 | | -Pros: Covers all root causes (idle timeout, VAD reset, session-duration cap). \ |
59 | | -Cons: Slightly more complex; brief gap in ASR coverage during restart (~200 ms). |
60 | | - |
61 | | -### Option C: Proactive stream rotation |
62 | | - |
63 | | -After every turn (or every N turns), close and re-open the ASR stream. This preempts any session-duration limit and keeps the stream fresh. |
64 | | - |
65 | | -Pros: Eliminates stale state entirely. \ |
66 | | -Cons: Adds latency at turn boundaries; may lose a partial if speech is ongoing during rotation. |
| 67 | +### Option A: Keep-alive noise during mute |
| 68 | +While `mic_muted` is True, send very low-amplitude white noise (e.g., ±10 out of ±32768) at normal cadence. This keeps the gRPC stream active and the VAD model warm.
67 | 69 |
|
68 | | -## Recommendation |
| 70 | +Pros: Prevents idle timeout during mute. \ |
| 71 | +Cons: Does not help if the stream has a hard session-duration cap (auto-restart now covers that case).
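
A minimal sketch of the noise generator Option A would need, using only the standard library. The function name `keepalive_noise` is hypothetical, and the format (16 kHz, 16-bit signed little-endian mono) is an assumption inferred from the silence-injection expression `int(16000 * 2 * 0.5)` in the earlier revision. Whether Riva's VAD treats noise at this level as non-speech has not been verified.

```python
import random
import struct
from typing import Optional

SAMPLE_RATE_HZ = 16000  # assumed capture rate (16-bit mono PCM)


def keepalive_noise(duration_s: float, amplitude: int = 10,
                    rng: Optional[random.Random] = None) -> bytes:
    """Return `duration_s` of uniform white noise within ±amplitude as int16 PCM."""
    rng = rng or random.Random()
    n_samples = int(SAMPLE_RATE_HZ * duration_s)
    samples = [rng.randint(-amplitude, amplitude) for _ in range(n_samples)]
    # "<h" = little-endian signed 16-bit, one value per sample
    return struct.pack(f"<{n_samples}h", *samples)
```

In the feeder, the muted branch would send `keepalive_noise(chunk_duration)` through `asr.send_audio()` in place of the real chunk, sized to match the normal chunk duration so the cadence stays unchanged.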
69 | 72 |
|
70 | | -**Option A + B combined**: send keep-alive noise during mute (A) to prevent idle timeout, and add a stale-timeout watchdog (B) as a safety net for unexpected stream failures. Option C is heavier and only needed if Riva has a hard session cap that A+B cannot address. |
| 73 | +### Device contention mitigations |
| 74 | +- Investigate separating camera and audio onto different USB host controllers. |
| 75 | +- Consider CSI camera instead of USB to free USB bandwidth entirely. |
| 76 | +- Current `arecord` retry logic (8 attempts with backoff) helps, but persistent `Device or resource busy` across all retries indicates the previous pipeline's `arecord` process was not killed before the new one started. |
71 | 77 |
|
72 | | -## Diagnosis checklist (before implementing) |
| 78 | +## Diagnosis checklist |
73 | 79 |
|
74 | | -- [ ] Confirm Riva Streaming ASR session limits: check `riva_asr` service config for `max_duration_seconds`, keepalive settings, or gRPC deadline. |
75 | | -- [ ] Add a log line in `RivaASRBackend` when the gRPC response iterator ends (to distinguish "server closed stream" from "no results but stream open"). |
76 | | -- [ ] Reproduce by muting for 60+ s mid-session and verifying ASR stops producing results on unmute. |
| 80 | +- [x] Add auto-restart when gRPC stream ends unexpectedly. |
| 81 | +- [x] Make `send_audio` graceful (no crash on dead stream). |
| 82 | +- [x] Emit timeline events for stream restarts. |
| 83 | +- [ ] Confirm Riva session limits: check `riva_asr` config for `max_duration_seconds`, keepalive settings, or gRPC deadline. |
| 84 | +- [ ] Implement keep-alive noise during mute (Option A). |
| 85 | +- [ ] Investigate cleanup of old `arecord` processes on WebSocket reconnect. |
77 | 86 |
|
78 | 87 | ## Effort |
79 | 88 |
|
80 | | -**Small–Medium**: Option A is ~30 min (noise generator in `_feed_pcm_to_pipeline`). Option B is ~1–2 hours (watchdog task + stream restart plumbing + tests). |
| 89 | +**Done**: Auto-restart (Fix 3) + graceful send_audio (Fix 1). \ |
| 90 | +**Remaining**: Keep-alive noise ~30 min. Device cleanup investigation ~1–2 hours. |