Add FLM multi-modal support (ASR + embeddings) with test refactor by jeremyfowers · Pull Request #1270 · lemonade-sdk/lemonade

jeremyfowers · 2026-03-02T19:21:13Z

Summary

Add FLM backend support for audio transcription (ASR) and embeddings in the C++ server
Refactor test capability system from flat backend-keyed to modality-first structure (CAPABILITIES["llm"]["flm"] vs CAPABILITIES["whisper"]["flm"]), disambiguating backends that serve multiple modalities
Add flm-whisper CI matrix entry and make existing whisper entries explicit with --wrapped-server whispercpp
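To make the modality-first layout concrete, here is a minimal Python sketch. The flag names, their values, and the `get_capabilities` helper are assumptions for illustration and are not copied from capabilities.py. The flat alias at the end is only one plausible way to keep older backend-keyed lookups working.

```python
# Illustrative sketch only: flag names and values are hypothetical,
# not the actual contents of capabilities.py.
CAPABILITIES = {
    "llm": {
        "flm": {"embeddings": True, "batch_embeddings": False},
    },
    "whisper": {
        "flm": {"realtime_websocket": True},
        "whispercpp": {"realtime_websocket": True},
    },
}


def get_capabilities(modality: str, backend: str) -> dict:
    """Look up flags by modality first, then by backend (hypothetical helper)."""
    try:
        return CAPABILITIES[modality][backend]
    except KeyError:
        raise KeyError(f"no capabilities for {modality!r}/{backend!r}") from None


# One possible backward-compat alias: a flat, backend-keyed view of the
# "llm" modality, so lookups such as FLAT_CAPABILITIES["flm"] still resolve.
FLAT_CAPABILITIES = CAPABILITIES["llm"]
```

With this shape, "flm" can carry different flags under "llm" and under "whisper", which a single flat backend-keyed dict cannot express.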

Changes

C++ server:

fastflowlm_server.cpp/h: FLM ASR transcription and embedding inference
router.cpp/h: Route audio/embedding requests to FLM backend
server_models.json: Add whisper-v3-turbo-FLM and embed-gemma-300m-FLM model entries

Test refactor:

capabilities.py: Modality-first CAPABILITIES dict with backward-compat flat alias
server_base.py: Thread modality through parse_args/run_server_tests; add default_wrapped_server for backward compat
server_whisper.py: Dynamic model via get_test_model("audio"), @skip_if_unsupported decorators for rai_cache/realtime_websocket
server_llm.py, server_sd.py: Pass modality= to run_server_tests
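A capability-gated skip decorator of the kind named above could look roughly like the following. This is a sketch under stated assumptions: the `_selected` dict, the flag values, and the decorator body are hypothetical stand-ins, and the real `skip_if_unsupported` in the test suite may be implemented differently.

```python
import functools
import unittest

# Hypothetical flags and selection state, for illustration only.
CAPABILITIES = {
    "whisper": {"flm": {"rai_cache": False, "realtime_websocket": True}},
}
_selected = {"modality": "whisper", "backend": "flm"}


def skip_if_unsupported(feature):
    """Skip the wrapped test unless the selected backend declares `feature`."""
    def decorator(test_fn):
        @functools.wraps(test_fn)
        def wrapper(self, *args, **kwargs):
            caps = CAPABILITIES[_selected["modality"]][_selected["backend"]]
            if not caps.get(feature, False):
                raise unittest.SkipTest(
                    f"{feature} not supported by {_selected['backend']}"
                )
            return test_fn(self, *args, **kwargs)
        return wrapper
    return decorator


class WhisperTests(unittest.TestCase):
    @skip_if_unsupported("rai_cache")
    def test_cache(self):
        self.fail("should have been skipped")

    @skip_if_unsupported("realtime_websocket")
    def test_realtime(self):
        self.assertTrue(True)


def run_suite():
    """Run the sketch suite and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(WhisperTests)
    result = unittest.TestResult()
    suite.run(result)
    return result
```

Checking the flag at call time rather than at import time means the decorator still works when the backend is only known after command-line arguments have been parsed.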

CI:

Add flm-whisper matrix entry on [rai300_400, Windows]
Existing whisper entries now pass --wrapped-server whispercpp explicitly

🤖 Generated with Claude Code

…isper CI

Add FLM multi-modal support (ASR + embeddings) in the C++ server and restructure the test capability system to support it.

C++ implementation:
- fastflowlm_server: add ASR transcription and embedding inference support
- router: route audio transcription and embedding requests to FLM backend
- server_models.json: add whisper-v3-turbo-FLM and embed-gemma-300m-FLM models

Test refactor:
- capabilities.py: modality-first CAPABILITIES dict with backward-compat flat alias
- server_base.py: thread modality through parse_args/run_server_tests
- server_whisper.py: use capability system for model lookup and skip decorators
- server_llm.py, server_sd.py: pass modality to run_server_tests
- CI: add flm-whisper matrix entry, explicit --wrapped-server whispercpp

Co-Authored-By: Claude Opus 4.6 <[email protected]>

Use label-based checks (transcription, image, speech) consistently for all model types, matching the pattern already used for embeddings and reranking. This decouples the UI view routing from specific recipe names, so any recipe with the right labels gets the right view.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
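The label-based routing described in this commit can be sketched as below. The UI code itself is not shown in this PR page, so this Python version is only an illustration of the idea: the view names, the precedence order, and the `view_for` helper are all hypothetical.

```python
# Hypothetical label -> view table; view names and ordering are invented
# for illustration and do not come from the app's source.
LABEL_VIEWS = (
    ("transcription", "transcription_view"),
    ("image", "image_view"),
    ("speech", "speech_view"),
    ("embeddings", "embeddings_view"),
    ("reranking", "reranking_view"),
)


def view_for(labels, default="chat_view"):
    """Pick a view from a model's labels, ignoring its recipe name entirely."""
    label_set = set(labels)
    for label, view in LABEL_VIEWS:
        if label in label_set:
            return view
    return default
```

Because the lookup never inspects the recipe, a new backend that serves a "transcription" model gets the transcription view without any UI change.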

The WebSocket realtime layer is backend-agnostic: it buffers audio and calls the same audio_transcriptions() method used by HTTP. FLM already implements IAudioServer, so no code changes are needed beyond flipping the capability flag. Verified with test_006 and test_007 passing.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
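The reasoning in this commit message can be illustrated with a small sketch. The real server is C++; the Python below only mirrors the shape of the argument. `IAudioServer` and `audio_transcriptions` are names taken from the message, while `RealtimeSession`, `FakeBackend`, and the method signatures are assumptions.

```python
from abc import ABC, abstractmethod


class IAudioServer(ABC):
    """Stand-in for the audio interface named in the commit message."""

    @abstractmethod
    def audio_transcriptions(self, audio: bytes) -> str:
        ...


class FakeBackend(IAudioServer):
    """Hypothetical backend; any implementation of the interface would do."""

    def audio_transcriptions(self, audio: bytes) -> str:
        return f"{len(audio)} bytes"


class RealtimeSession:
    """Buffers streamed audio chunks, then delegates to the backend."""

    def __init__(self, backend: IAudioServer):
        self._backend = backend
        self._buffer = bytearray()

    def append(self, chunk: bytes) -> None:
        self._buffer.extend(chunk)

    def commit(self) -> str:
        # Same call the HTTP path would make; the session does not know
        # or care which backend sits behind the interface.
        text = self._backend.audio_transcriptions(bytes(self._buffer))
        self._buffer.clear()
        return text
```

Since the session depends only on the interface, enabling a new backend for realtime use reduces to declaring that the backend supports it.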

.github/workflows/cpp_server_build_test_release.yml

src/cpp/server/router.cpp

Co-Authored-By: Claude Opus 4.6 <[email protected]>

WhisperCpp with the NPU backend takes exclusive hold of the NPU, like RyzenAI. This change ensures that whispercpp evicts all NPU servers on load, that FLM evicts whispercpp NPU servers before starting, and that unknown NPU recipes default to evicting all NPU servers rather than just one.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
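The eviction rules in this commit message can be summarized as a small decision function. This is a Python sketch of the stated policy only, not the router's C++ code: the `Server` record, the `servers_to_evict` name, and the recipe strings other than "flm" and "whispercpp" are hypothetical, and anything else FLM may evict is not stated here and is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Server:
    name: str
    recipe: str
    device: str  # e.g. "npu" or "gpu" (illustrative values)


def servers_to_evict(new_recipe: str, loaded: list) -> list:
    """Return loaded servers to evict before starting an NPU server.

    Assumes the new server targets the NPU.
    """
    npu_servers = [s for s in loaded if s.device == "npu"]
    if new_recipe == "flm":
        # Per the commit message: FLM evicts whispercpp NPU servers.
        return [s for s in npu_servers if s.recipe == "whispercpp"]
    # whispercpp, and any unknown NPU recipe, evicts every NPU server.
    return npu_servers
```

Defaulting unknown recipes to "evict all" is the conservative choice: a backend that turns out to need the NPU exclusively will not fail because a neighbor was left loaded.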

…nade into jfowers/flm-mm

ramkrishna2910

Works for me!

src/cpp/resources/server_models.json

superm1 added target::npu labels Mar 2, 2026

jeremyfowers self-assigned this Mar 2, 2026

jeremyfowers and others added 2 commits March 2, 2026 14:48

superm1 reviewed Mar 2, 2026

.github/workflows/cpp_server_build_test_release.yml (outdated)

.github/workflows/cpp_server_build_test_release.yml

src/cpp/server/router.cpp

jeremyfowers and others added 5 commits March 2, 2026 19:16

Skip batch embeddings test for FLM (not yet supported)

2bccfbb

Co-Authored-By: Claude Opus 4.6 <[email protected]>

Add npu backend to flm whisper tests

5f3bf79

Merge branch 'jfowers/flm-mm' of https://github.com/lemonade-sdk/lemo…

7012a03

…nade into jfowers/flm-mm

Disable test

5bf65c4

jeremyfowers requested review from ramkrishna2910 and superm1 March 3, 2026 01:07

ramkrishna2910 approved these changes Mar 3, 2026

src/cpp/resources/server_models.json

Restore multi_model capability to False

aef35ba

superm1 enabled auto-merge March 3, 2026 12:31

superm1 added this pull request to the merge queue Mar 3, 2026

Merged via the queue into main with commit a811000 Mar 3, 2026
40 checks passed

superm1 deleted the jfowers/flm-mm branch March 3, 2026 14:14

superm1 merged 9 commits into main from jfowers/flm-mm

3 participants
