Feat: Meeting Recording Mode with Speaker Diarization #203

## **Is your feature request related to a problem? Please describe.**

I'm currently running both Hex and [SamScribe](https://github.com/Steven-Weng/SamScribe) simultaneously to handle different transcription needs. Hex is perfect for quick voice-to-text dictation with its hotkey-activated workflow, but it lacks persistent meeting recording capabilities. When I need to transcribe Zoom/Teams meetings with speaker identification and save transcripts for future reference, I have to switch to SamScribe. This creates friction in my workflow and requires downloading the same FluidAudio/Parakeet transcription models twice (once per app). Additionally, Hex's current "press-and-hold" or "double-tap" modes aren't designed for continuous, multi-speaker meeting capture where I need the recording to persist after the session ends.



## **Describe the solution you'd like**

I'd like to propose adding an optional **"Meeting Mode"** to Hex that integrates continuous recording capabilities inspired by SamScribe (MIT licensed, also uses FluidAudio). This would add:

1. **Continuous Recording Toggle**: A mode switch in the UI to enable persistent recording sessions (not just hotkey-activated bursts)
2. **Per-Process Audio Capture**: Integration with `ScreenCaptureKit` to isolate audio from specific applications (Zoom, Teams, Chrome tabs) rather than just microphone input
3. **Speaker Diarization**: Leverage FluidAudio's existing `OfflineDiarizerManager`/`LSEENDDiarizer` capabilities to automatically identify and separate different speakers
4. **Persistent Storage**: Save full meeting transcripts with speaker labels using SwiftData, accessible via a sidebar history view (extending the existing `HistoryFeature`)
5. **Cross-Session Speaker Recognition**: Voice embeddings to remember speaker identities across multiple meetings (name "Bob" once, recognize automatically next time)
6. **Editable Transcripts**: Inline editing for both speaker names and transcription text, similar to Hex's current text editing workflow

The key is maintaining Hex's elegant simplicity while adding this as an **optional mode**: users who only want hotkey dictation shouldn't see any added UI complexity.
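To make item 5 (cross-session speaker recognition) more concrete, here is a minimal sketch of how a new voice embedding could be matched against stored speakers using cosine similarity. Everything in it is an assumption for illustration: `SpeakerProfile`, `matchSpeaker`, and the `0.75` threshold are hypothetical names and values, not part of Hex, SamScribe, or FluidAudio, and the embeddings are plain `[Float]` arrays rather than whatever type the diarizer actually returns.

```swift
import Foundation

/// Hypothetical stored profile: a user-assigned name plus a reference voice embedding.
struct SpeakerProfile {
    let name: String
    let embedding: [Float]
}

/// Cosine similarity between two vectors.
/// Returns 0 when the lengths differ, the input is empty, or either vector has zero norm.
func cosineSimilarity(_ a: [Float], _ b: [Float]) -> Float {
    guard a.count == b.count, !a.isEmpty else { return 0 }
    var dot: Float = 0
    var normA: Float = 0
    var normB: Float = 0
    for i in a.indices {
        dot += a[i] * b[i]
        normA += a[i] * a[i]
        normB += b[i] * b[i]
    }
    let denominator = normA.squareRoot() * normB.squareRoot()
    return denominator > 0 ? dot / denominator : 0
}

/// Returns the most similar known speaker, or nil when no profile reaches the threshold.
/// The default threshold is an illustrative guess and would need tuning on real data.
func matchSpeaker(
    _ embedding: [Float],
    in profiles: [SpeakerProfile],
    threshold: Float = 0.75
) -> SpeakerProfile? {
    var best: (profile: SpeakerProfile, score: Float)?
    for profile in profiles {
        let score = cosineSimilarity(embedding, profile.embedding)
        if score >= threshold, score > (best?.score ?? -1) {
            best = (profile: profile, score: score)
        }
    }
    return best?.profile
}
```

With something like this, naming a speaker "Bob" once would store a profile, and later meetings would call `matchSpeaker` on each new speaker embedding, falling back to a generic label such as "Speaker 1" when it returns `nil`.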


## **Describe alternatives you've considered**

1. **Continue using both apps separately**: This works but wastes storage (duplicate models), requires context switching, and creates fragmented transcript history across two different databases.

2. **Fork Hex entirely**: Possible since both are MIT licensed, but this fragments the community and loses upstream improvements. I'd prefer to contribute back to the main project if the architecture aligns.

3. **Request SamScribe add hotkey dictation**: SamScribe is designed around meeting workflows; adding Hex's "paste at cursor" magic would require significant architectural changes to their recording-centric model.

4. **Build a new hybrid app**: Overkill when both existing apps share the same core technologies (Swift, SwiftUI, FluidAudio, TCA) and MIT licenses.

5. **Use Hex's current HistoryFeature with manual workarounds**: Tried saving individual transcription snippets, but this loses speaker context, meeting continuity, and per-process audio isolation.



## **Additional context**

- **Technical feasibility**: Both Hex and SamScribe use [FluidAudio](https://github.com/hex-lab/FluidAudio) for transcription, meaning they share the same underlying ASR engine (Parakeet TDT v3) and speaker diarization capabilities. A unified app would eliminate duplicate model downloads (~500MB-1GB savings).

- **Architecture alignment**: Hex already uses Swift Composable Architecture and has a `HistoryFeature` managing transcription history. SamScribe's SwiftData-based persistence and speaker management could integrate cleanly as a new feature slice.

- **License compatibility**: Both projects are MIT licensed, so code sharing/reference is legally unencumbered.

- **Use case priority**: This targets users who need both quick dictation (Hex's strength) AND meeting documentation (SamScribe's strength) without maintaining two separate transcription pipelines.

- **Reference implementation**: [SamScribe](https://github.com/Steven-Weng/SamScribe) demonstrates how this works with ScreenCaptureKit, speaker embeddings, and persistent storage, all using the same FluidAudio stack Hex already depends on.
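
As a rough illustration of what the speaker-labeled transcript slice might need to do, the sketch below assigns each transcribed chunk to the diarization turn it overlaps most in time. The types `SpeakerTurn`, `TranscriptChunk`, and `LabeledLine` and the function `labelTranscript` are hypothetical names invented for this example; the real FluidAudio output types and SamScribe's SwiftData models will differ, and this is only one plausible way to merge the two streams.

```swift
import Foundation

/// Hypothetical diarization output: who spoke, and when (seconds from session start).
struct SpeakerTurn {
    let speakerID: String
    let start: Double
    let end: Double
}

/// Hypothetical ASR output: a piece of text with its time span.
struct TranscriptChunk {
    let text: String
    let start: Double
    let end: Double
}

/// One line of the saved transcript. Codable so it can be serialized;
/// a SwiftData model could carry the same fields.
struct LabeledLine: Codable, Equatable {
    let speaker: String
    let text: String
    let start: Double
}

/// Labels each chunk with the speaker whose turn overlaps it the longest.
/// Chunks that overlap no turn get the `unknown` label.
func labelTranscript(
    chunks: [TranscriptChunk],
    turns: [SpeakerTurn],
    unknown: String = "Unknown"
) -> [LabeledLine] {
    return chunks.map { chunk -> LabeledLine in
        var bestSpeaker = unknown
        var bestOverlap = 0.0
        for turn in turns {
            // Length of the intersection of the two time ranges (negative if disjoint).
            let overlap = min(chunk.end, turn.end) - max(chunk.start, turn.start)
            if overlap > bestOverlap {
                bestOverlap = overlap
                bestSpeaker = turn.speakerID
            }
        }
        return LabeledLine(speaker: bestSpeaker, text: chunk.text, start: chunk.start)
    }
}
```

The resulting `[LabeledLine]` is the kind of record a meeting history view would display and let the user edit, with `speaker` replaced by a remembered name once a speaker has been identified.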


