
Add FLM multi-modal support (ASR + embeddings) with test refactor#1270

Merged
superm1 merged 9 commits into main from jfowers/flm-mm on Mar 3, 2026

Conversation


@jeremyfowers jeremyfowers commented Mar 2, 2026


Summary

  • Add FLM backend support for audio transcription (ASR) and embeddings in the C++ server
  • Refactor the test capability system from a flat backend-keyed layout to a modality-first structure (CAPABILITIES["llm"]["flm"] vs. CAPABILITIES["whisper"]["flm"]), disambiguating backends that serve multiple modalities
  • Add flm-whisper CI matrix entry and make existing whisper entries explicit with --wrapped-server whispercpp
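The modality-first layout described above can be sketched as follows. The nested keys `CAPABILITIES["llm"]["flm"]` and `CAPABILITIES["whisper"]["flm"]` come from this PR; the flag names and the exact shape of the backward-compat alias are illustrative assumptions, not the repo's actual code.

```python
# Modality-first capability table: the outer key is the modality, the
# inner key is the backend. A backend like "flm" can appear under more
# than one modality without ambiguity. Flag names are hypothetical.
CAPABILITIES = {
    "llm": {
        "flm": {"streaming": True},
        "llamacpp": {"streaming": True},
    },
    "whisper": {
        "flm": {"realtime_websocket": True},
        "whispercpp": {"realtime_websocket": True},
    },
}

# Backward-compat flat alias: only backends that serve a single modality
# can be looked up without naming the modality first.
FLAT_CAPABILITIES = {
    backend: flags
    for modality, backends in CAPABILITIES.items()
    for backend, flags in backends.items()
    if sum(backend in b for b in CAPABILITIES.values()) == 1
}
```

Multi-modality backends such as `flm` are deliberately excluded from the flat alias, since a bare `"flm"` lookup would be ambiguous.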

Changes

C++ server:

  • fastflowlm_server.cpp/h: FLM ASR transcription and embedding inference
  • router.cpp/h: Route audio/embedding requests to FLM backend
  • server_models.json: Add whisper-v3-turbo-FLM and embed-gemma-300m-FLM model entries

Test refactor:

  • capabilities.py: Modality-first CAPABILITIES dict with backward-compat flat alias
  • server_base.py: Thread modality through parse_args/run_server_tests; add default_wrapped_server for backward compat
  • server_whisper.py: Dynamic model via get_test_model("audio"), @skip_if_unsupported decorators for rai_cache/realtime_websocket
  • server_llm.py, server_sd.py: Pass modality= to run_server_tests

CI:

  • Add flm-whisper matrix entry on [rai300_400, Windows]
  • Existing whisper entries now pass --wrapped-server whispercpp explicitly
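A CI matrix along these lines would express the entries described above. This is a hypothetical GitHub Actions fragment; the key names and file layout are assumptions, only the `flm-whisper` entry, the `[rai300_400, Windows]` runner labels, and the `--wrapped-server whispercpp` flag come from the PR.

```yaml
# Hypothetical matrix fragment (illustrative key names)
strategy:
  matrix:
    include:
      - test: flm-whisper
        runs-on: [rai300_400, Windows]
      - test: whisper
        runs-on: [rai300_400, Windows]
        args: --wrapped-server whispercpp
```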

🤖 Generated with Claude Code

…isper CI

Add FLM multi-modal support (ASR + embeddings) in the C++ server and
restructure the test capability system to support it.

C++ implementation:
- fastflowlm_server: add ASR transcription and embedding inference support
- router: route audio transcription and embedding requests to FLM backend
- server_models.json: add whisper-v3-turbo-FLM and embed-gemma-300m-FLM models

Test refactor:
- capabilities.py: modality-first CAPABILITIES dict with backward-compat flat alias
- server_base.py: thread modality through parse_args/run_server_tests
- server_whisper.py: use capability system for model lookup and skip decorators
- server_llm.py, server_sd.py: pass modality to run_server_tests
- CI: add flm-whisper matrix entry, explicit --wrapped-server whispercpp

Co-Authored-By: Claude Opus 4.6 <[email protected]>
@jeremyfowers jeremyfowers self-assigned this Mar 2, 2026
jeremyfowers and others added 2 commits March 2, 2026 14:48
Use label-based checks (transcription, image, speech) consistently for
all model types, matching the pattern already used for embeddings and
reranking. This decouples the UI view routing from specific recipe names,
so any recipe with the right labels gets the right view.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
The WebSocket realtime layer is backend-agnostic — it buffers audio and
calls the same audio_transcriptions() method used by HTTP. FLM already
implements IAudioServer so no code changes needed beyond flipping the
capability flag. Verified with test_006 and test_007 passing.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
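The backend-agnostic pattern this commit relies on can be sketched minimally: the realtime layer only buffers audio and delegates to the same `audio_transcriptions()` entry point the HTTP path uses, so any backend implementing that method works unmodified. Class and method names other than `audio_transcriptions()` are hypothetical.

```python
class RealtimeSession:
    """Buffers WebSocket audio chunks and delegates transcription to a
    backend. Knows nothing about which backend (FLM, whispercpp, ...)
    it is talking to."""

    def __init__(self, backend):
        self.backend = backend          # any object with audio_transcriptions()
        self.buffer = bytearray()

    def on_audio_chunk(self, chunk: bytes):
        self.buffer.extend(chunk)       # accumulate until the client commits

    def commit(self) -> str:
        audio, self.buffer = bytes(self.buffer), bytearray()
        return self.backend.audio_transcriptions(audio)
```

This is why flipping the capability flag was sufficient: FLM already implements the transcription interface, and the session class above never branches on backend type.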
jeremyfowers and others added 5 commits March 2, 2026 19:16
WhisperCpp with NPU backend takes exclusive hold of the NPU, like
RyzenAI. This ensures whispercpp evicts all NPU servers on load, FLM
evicts whispercpp NPU servers before starting, and unknown NPU recipes
default to evicting all rather than just one server.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
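The eviction policy described in this commit can be sketched as a small decision function. Backend names come from the commit message; the function shape and the set name are hypothetical, and all entries in `running` are assumed to be NPU servers.

```python
# Backends that take exclusive hold of the NPU, per the commit above.
EXCLUSIVE_NPU = {"whispercpp", "ryzenai"}

def servers_to_evict(incoming: str, running: list[str]) -> list[str]:
    """Decide which running NPU servers must stop before `incoming` loads."""
    if incoming in EXCLUSIVE_NPU:
        return list(running)            # exclusive holder: evict everything
    if incoming == "flm":
        # FLM coexists with non-exclusive servers but must evict
        # exclusive NPU holders (e.g. whispercpp) first.
        return [s for s in running if s in EXCLUSIVE_NPU]
    return list(running)                # unknown NPU recipe: default to evict all
```

The conservative default for unknown recipes matches the commit's intent: when in doubt, clear the NPU rather than risk two servers contending for it.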

@ramkrishna2910 ramkrishna2910 left a comment


Works for me!

@superm1 superm1 enabled auto-merge March 3, 2026 12:31
@superm1 superm1 added this pull request to the merge queue Mar 3, 2026
Merged via the queue into main with commit a811000 Mar 3, 2026
40 checks passed
@superm1 superm1 deleted the jfowers/flm-mm branch March 3, 2026 14:14