feat: add multimodal embedding support for gemini-embedding-2-preview #7

Merged
ajroetker merged 2 commits into main from feat/multimodal-capability-override
Mar 18, 2026

@ajroetker ajroetker commented Mar 18, 2026

Summary

  • Add gemini-embedding-2-preview to the known model registry with multimodal capabilities (images, audio, video, PDFs, 3072 dims, fusion support)
  • Introduce multimodal: true config escape hatch so users can declare any future model as multimodal without waiting for a registry update
  • Update the Gemini provider (google_gemini.go) to route multimodal content through the genai SDK's NewPartFromBytes/NewContentFromParts path
  • Wire GetConfigCapabilities() through all providers (OpenAI, Bedrock, Ollama, Cohere, OpenRouter, Termite)
  • Guard the legacy Vertex provider's multimodal path to only allow multimodalembedding models, with a clear error directing users to use provider: "gemini" with GOOGLE_GENAI_USE_VERTEXAI=1 for other models
  • Fix copy-paste "ollama" error message in GenaiGoogleImpl.GetModels()
  • Simplify: merge config_capabilities.go into capabilities.go, reuse allText() in vertex.go, clean up NewTermiteClient signature
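
The interaction between the model registry and the `multimodal: true` escape hatch can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `Capabilities` and `ModelConfig` types, the `knownModels` map, and `ResolveCapabilities` are hypothetical names, and it assumes the config flag can only promote a model to multimodal, never demote a registered one. Only the `gemini-embedding-2-preview` entry (multimodal, 3072 dims) is taken from the summary above.

```go
package main

import "fmt"

// Capabilities is a hypothetical description of what an embedding model accepts.
type Capabilities struct {
	Multimodal bool
	Dimensions int
}

// knownModels is an illustrative registry. Only the gemini-embedding-2-preview
// entry reflects the PR description; the map itself is an assumption.
var knownModels = map[string]Capabilities{
	"gemini-embedding-2-preview": {Multimodal: true, Dimensions: 3072},
}

// ModelConfig stands in for the user-facing config; the Multimodal field
// represents the `multimodal: true` key.
type ModelConfig struct {
	Model      string
	Multimodal bool
}

// ResolveCapabilities starts from the registry entry (the zero value when the
// model is unknown) and lets the config flag mark the model as multimodal.
func ResolveCapabilities(cfg ModelConfig) Capabilities {
	caps := knownModels[cfg.Model]
	if cfg.Multimodal {
		caps.Multimodal = true
	}
	return caps
}

func main() {
	// A registered model needs no override.
	fmt.Println(ResolveCapabilities(ModelConfig{Model: "gemini-embedding-2-preview"}))
	// An unregistered model is declared multimodal through config alone.
	fmt.Println(ResolveCapabilities(ModelConfig{Model: "some-future-model", Multimodal: true}))
}
```

In this sketch an unregistered model without the flag resolves to text-only, which is the behavior the escape hatch is meant to let users opt out of.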

Test plan

  • All lib/embeddings/... tests pass locally
  • Live integration tests pass with GEMINI_API_KEY set (text, image, fused text+image)
  • Live tests skip cleanly without credentials (CI-safe)
  • Pure logic tests always pass (capabilities resolution, config override, multimodal config)
  • CI passes

Add gemini-embedding-2-preview to the known model registry with multimodal
capabilities (images, audio, video, PDFs, 3072 dims, fusion support).

Introduce `multimodal: true` config escape hatch so users can declare any
future model as multimodal without waiting for a registry update. The
GenaiGoogleImpl (gemini provider) now routes multimodal content through the
genai SDK's NewPartFromBytes/NewContentFromParts path.

Also:
- Wire GetConfigCapabilities() through all providers (OpenAI, Bedrock,
  Ollama, Cohere, OpenRouter, Termite)
- Use capabilities-based routing in vertex provider instead of hardcoded
  model name check
- Guard vertex provider's multimodal path to only allow
  multimodalembedding models (legacy Prediction API), with clear error
  pointing to gemini provider for other models
- Fix copy-paste "ollama" error message in GenaiGoogleImpl.GetModels()
- Simplify: merge config_capabilities.go into capabilities.go, reuse
  allText() in vertex.go, clean up NewTermiteClient signature
- Add integration tests (skip without credentials) and capability tests
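
For illustration, the Vertex guard described above might look something like the sketch below. The function name `checkVertexMultimodal`, its parameters, the prefix match on `multimodalembedding`, and the exact error wording are all assumptions, not the code in vertex.go; only the rule itself (non-text input is allowed solely for multimodalembedding models, with an error pointing to the gemini provider) comes from the commit message.

```go
package main

import (
	"fmt"
	"strings"
)

// checkVertexMultimodal is a hypothetical guard for the legacy Vertex provider.
// hasNonText reports whether the request carries images, audio, or other
// non-text content.
func checkVertexMultimodal(model string, hasNonText bool) error {
	if !hasNonText {
		return nil // text-only requests are unaffected by the guard
	}
	// Assumption: legacy Prediction API models are identified by name prefix,
	// e.g. "multimodalembedding@001".
	if strings.HasPrefix(model, "multimodalembedding") {
		return nil
	}
	return fmt.Errorf(
		"model %q does not support multimodal input on the vertex provider; "+
			"use provider: \"gemini\" with GOOGLE_GENAI_USE_VERTEXAI=1 instead",
		model,
	)
}

func main() {
	fmt.Println(checkVertexMultimodal("multimodalembedding@001", true))
	fmt.Println(checkVertexMultimodal("gemini-embedding-2-preview", true))
}
```

Failing early with a message that names the alternative provider keeps the error actionable, rather than surfacing an opaque failure from the Prediction API.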

Picks up fix for nil map panic in zapx invertedIndexCache.Clear() and
synonymIndexCache.Clear() during concurrent shard split operations.
@ajroetker ajroetker merged commit 300abf0 into main Mar 18, 2026
5 of 6 checks passed
@ajroetker ajroetker deleted the feat/multimodal-capability-override branch March 18, 2026 03:47