Description
Following the resolution of HuggingFace token access (HF_TOKEN now configured by maintainers), this issue tracks the work to re-enable the google/embeddinggemma-300m gated model support that was previously disabled across the codebase.
Background
The EmbeddingGemma-300m model (google/embeddinggemma-300m) is a gated model on HuggingFace that requires authentication via HF_TOKEN. Due to CI/CD authentication limitations, Gemma support was disabled in work related to Issue #573 to allow tests to pass without the gated model.
Now that the maintainer has configured HF_TOKEN in the CI environment, we can restore full Gemma embedding model support.
Scope of Changes
1. Model Download Configuration (tools/make/models.mk)
Current State:
- download-models-minimal excludes Gemma (lines 29, 63-64)
- Comment states: "Gemma is gated and requires HF_TOKEN, so it's excluded from CI"
Required Changes:
- Add embeddinggemma-300m to the download-models-minimal target
- Update comments to reflect that HF_TOKEN is now available
2. Go Test Constants (candle-binding/semantic-router_test.go)
Current State:
const (
	GemmaEmbeddingModelPath = "" // Gemma is gated, not used in CI tests (line 1641)
)
Test Skip (lines 1704-1707):
t.Run("InitGemmaOnly", func(t *testing.T) {
	t.Skip("Skipping Gemma-only test: Gemma is a gated model requiring HF_TOKEN")
})
Required Changes:
- Set GemmaEmbeddingModelPath = "../models/embeddinggemma-300m"
- Remove t.Skip() from the InitGemmaOnly test
- Enable any other Gemma-related tests currently skipped
3. E2E Profile Configurations
Files to Update:
| File | Current gemma_model_path | Required Change |
|---|---|---|
| e2e/profiles/ai-gateway/values.yaml | Not present (using bert only) | Add Gemma model path |
| e2e/profiles/dynamic-config/values.yaml | "" (empty, line 130) | "models/embeddinggemma-300m" |
| e2e/profiles/routing-strategies/values.yaml | Already configured | Verify works with HF_TOKEN |
Environment Variable Override (dynamic-config):
env:
  - name: EMBEDDING_MODEL_OVERRIDE
    value: "qwen3" # Force qwen3 for tests (Gemma requires HF_TOKEN)
- Remove EMBEDDING_MODEL_OVERRIDE or set it to "auto" to use intelligent model selection
4. initContainer Model Downloads (e2e/profiles/*/values.yaml)
Required Changes:
Add Gemma model to initContainer models list in relevant profiles:
initContainer:
  models:
    # ... existing models ...
    - name: embeddinggemma-300m
      repo: google/embeddinggemma-300m
5. Rust Test Fixtures (candle-binding/src/test_fixtures.rs)
Current State:
- GEMMA_EMBEDDING_300M constant is defined (line 50)
- gemma_embedding_model() fixture exists and attempts to load the model
- Tests will panic if the model is not available
Required Changes:
- Verify that the gemma_embedding_model() fixture works with the downloaded model
- Enable all Gemma-related Rust tests in:
  - candle-binding/src/model_architectures/embedding/gemma_embedding_test.rs
  - candle-binding/src/model_architectures/embedding/gemma3_model_test.rs
6. GitHub Actions Workflow (.github/workflows/integration-test-k8s.yml)
Required Changes:
- Ensure the HF_TOKEN secret is passed to the workflow environment
- Add HF_TOKEN to model download steps if not already present:
env:
  HF_TOKEN: ${{ secrets.HF_TOKEN }}
7. Quickstart Script (scripts/quickstart.sh)
Current State (lines 182-188):
# Check if failure was due to gated model (embeddinggemma-300m)
if grep -q "embeddinggemma.*401|embeddinggemma.*Unauthorized|embeddinggemma.*GatedRepoError" ...
Required Changes:
- Update fallback message to indicate HF_TOKEN may not be set
- Or remove fallback if Gemma download is now expected to succeed
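To illustrate the first option, here is a minimal sketch of how a gated-model failure could be classified and turned into a message that mentions HF_TOKEN. It is written in Go for illustration only; the quickstart script itself is shell, and the function names and message wording below are hypothetical. The pattern mirrors the three alternatives in the grep expression above.

```go
package main

import (
	"fmt"
	"os"
	"regexp"
)

// gatedFailure matches a log line that mentions embeddinggemma followed by
// one of the three markers checked by the quickstart script.
var gatedFailure = regexp.MustCompile(`embeddinggemma.*(401|Unauthorized|GatedRepoError)`)

// isGatedModelFailure reports whether the download log looks like a
// gated-model authentication failure.
func isGatedModelFailure(log string) bool {
	return gatedFailure.MatchString(log)
}

// gatedModelHint is a hypothetical helper that returns a fallback message
// for a gated-model failure, or "" if the log shows no such failure.
func gatedModelHint(log string, tokenSet bool) string {
	if !isGatedModelFailure(log) {
		return ""
	}
	if !tokenSet {
		return "embeddinggemma-300m is gated: HF_TOKEN is not set"
	}
	return "embeddinggemma-300m is gated: HF_TOKEN is set but may lack access to the model"
}

func main() {
	log := "error downloading google/embeddinggemma-300m: 401 Client Error"
	_, tokenSet := os.LookupEnv("HF_TOKEN")
	if hint := gatedModelHint(log, tokenSet); hint != "" {
		fmt.Println(hint)
	}
}
```

Distinguishing an unset token from a token without access keeps the message actionable in both cases.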
Acceptance Criteria
- make download-models-minimal successfully downloads embeddinggemma-300m
- Go tests for Gemma embedding pass without skips
- Rust tests for GemmaEmbeddingModel pass (cosine similarity ≥ 0.99 vs Python reference)
- E2E tests with embedding_model: "auto" correctly route to Gemma for short texts
- E2E tests with embedding_model: "gemma" work correctly
- CI/CD pipeline (integration-test-k8s.yml) passes with Gemma enabled
- Documentation updated to reflect Gemma availability
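The cosine similarity criterion above can be made concrete with a short sketch. This is not the repository's test code: the function name, the example vectors, and the float32 element type are assumptions for illustration, and only the 0.99 threshold comes from the criterion itself.

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

// cosineSimilarity returns dot(a, b) / (|a| * |b|), accumulating in float64.
// It returns an error for mismatched lengths, empty input, or zero-norm vectors.
func cosineSimilarity(a, b []float32) (float64, error) {
	if len(a) != len(b) {
		return 0, errors.New("vectors have different lengths")
	}
	if len(a) == 0 {
		return 0, errors.New("vectors are empty")
	}
	var dot, normA, normB float64
	for i := range a {
		x, y := float64(a[i]), float64(b[i])
		dot += x * y
		normA += x * x
		normB += y * y
	}
	if normA == 0 || normB == 0 {
		return 0, errors.New("zero-norm vector")
	}
	return dot / (math.Sqrt(normA) * math.Sqrt(normB)), nil
}

func main() {
	// Illustrative values only; real tests would compare a model embedding
	// against the stored Python reference embedding.
	reference := []float32{0.10, 0.20, 0.30}
	candidate := []float32{0.10, 0.20, 0.31}

	sim, err := cosineSimilarity(reference, candidate)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	const threshold = 0.99 // threshold from the acceptance criteria
	fmt.Printf("similarity=%.6f passes=%v\n", sim, sim >= threshold)
}
```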
Related Issues
- [Testing] improve and enable the skipped testing cases #573 - Original issue that disabled Gemma support