[Feature] Re-enable EmbeddingGemma-300m Support with HF_TOKEN Configuration #790

@yehudit1987

Description

Following the resolution of HuggingFace token access (HF_TOKEN now configured by maintainers), this issue tracks the work to re-enable the google/embeddinggemma-300m gated model support that was previously disabled across the codebase.

Background

The EmbeddingGemma-300m model (google/embeddinggemma-300m) is a gated model on HuggingFace that requires authentication via HF_TOKEN. Due to CI/CD authentication limitations, Gemma support was disabled in work related to Issue #573 to allow tests to pass without the gated model.

Now that the maintainers have configured HF_TOKEN in the CI environment, full Gemma embedding model support can be restored.


Scope of Changes

1. Model Download Configuration (tools/make/models.mk)

Current State:

  • download-models-minimal excludes Gemma (lines 29 and 63-64)
  • Comment states: "Gemma is gated and requires HF_TOKEN, so it's excluded from CI"

Required Changes:

  • Add embeddinggemma-300m to download-models-minimal target
  • Update comments to reflect that HF_TOKEN is now available

2. Go Test Constants (candle-binding/semantic-router_test.go)

Current State:

const (
    GemmaEmbeddingModelPath = "" // Gemma is gated, not used in CI tests (line 1641)
)

Test Skip (lines 1704-1707):

t.Run("InitGemmaOnly", func(t *testing.T) {
    t.Skip("Skipping Gemma-only test: Gemma is a gated model requiring HF_TOKEN")
})

Required Changes:

  • Set GemmaEmbeddingModelPath = "../models/embeddinggemma-300m"
  • Remove t.Skip() from InitGemmaOnly test
  • Enable any other Gemma-related tests currently skipped

3. E2E Profile Configurations

Files to Update:

File | Current gemma_model_path | Required Change
e2e/profiles/ai-gateway/values.yaml | Not present (using bert only) | Add Gemma model path
e2e/profiles/dynamic-config/values.yaml | "" (empty, line 130) | "models/embeddinggemma-300m"
e2e/profiles/routing-strategies/values.yaml | Already configured | Verify works with HF_TOKEN

Environment Variable Override (dynamic-config):

env:
  - name: EMBEDDING_MODEL_OVERRIDE
    value: "qwen3" # Force qwen3 for tests (Gemma requires HF_TOKEN)

  • Remove EMBEDDING_MODEL_OVERRIDE or set it to "auto" to use intelligent model selection

4. initContainer Model Downloads (e2e/profiles/*/values.yaml)

Required Changes:
Add Gemma model to initContainer models list in relevant profiles:

initContainer:
  models:
    # ... existing models ...
    - name: embeddinggemma-300m
      repo: google/embeddinggemma-300m

5. Rust Test Fixtures (candle-binding/src/test_fixtures.rs)

Current State:

  • GEMMA_EMBEDDING_300M constant is defined (line 50)
  • gemma_embedding_model() fixture exists and attempts to load the model
  • Tests will panic if model is not available

Required Changes:

  • Verify gemma_embedding_model() fixture works with downloaded model
  • Enable all Gemma-related Rust tests in:
    • candle-binding/src/model_architectures/embedding/gemma_embedding_test.rs
    • candle-binding/src/model_architectures/embedding/gemma3_model_test.rs

6. GitHub Actions Workflow (.github/workflows/integration-test-k8s.yml)

Required Changes:

  • Ensure HF_TOKEN secret is passed to the workflow environment
  • Add HF_TOKEN to model download steps if not already present:
env:
  HF_TOKEN: ${{ secrets.HF_TOKEN }}

7. Quickstart Script (scripts/quickstart.sh)

Current State (lines 182-188):

# Check if failure was due to gated model (embeddinggemma-300m)
if grep -q "embeddinggemma.*401|embeddinggemma.*Unauthorized|embeddinggemma.*GatedRepoError" ...

Required Changes:

  • Update fallback message to indicate HF_TOKEN may not be set
  • Or remove fallback if Gemma download is now expected to succeed
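
The detection logic quoted from the script can be sketched in Go as follows, mainly to make the intended alternation explicit. Note that with plain grep -q (basic regular expressions) an unescaped | is matched literally, so the shell pattern only behaves as an alternation under grep -E. The function name isGatedModelFailure and the sample log line are hypothetical. The three error markers are the ones quoted in the issue.

```go
package main

import (
	"fmt"
	"regexp"
)

// gatedFailure matches a single log line that mentions embeddinggemma followed
// by one of the authentication markers quoted in the issue. As with grep, "."
// does not cross line boundaries, so the marker must be on the same line.
var gatedFailure = regexp.MustCompile(`embeddinggemma.*(401|Unauthorized|GatedRepoError)`)

// isGatedModelFailure is a hypothetical helper: it reports whether the
// download log suggests the Gemma download failed for authentication reasons.
func isGatedModelFailure(log string) bool {
	return gatedFailure.MatchString(log)
}

func main() {
	// Made-up log line, for illustration only.
	log := "google/embeddinggemma-300m: 401 Client Error: Unauthorized"
	if isGatedModelFailure(log) {
		fmt.Println("Gemma download failed: HF_TOKEN may be unset or may lack access to the gated repo")
	}
}
```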

Acceptance Criteria

  • make download-models-minimal successfully downloads embeddinggemma-300m
  • Go tests for Gemma embedding pass without skips
  • Rust tests for GemmaEmbeddingModel pass (cosine similarity ≥ 0.99 vs Python reference)
  • E2E tests with embedding_model: "auto" correctly route to Gemma for short texts
  • E2E tests with embedding_model: "gemma" work correctly
  • CI/CD pipeline (integration-test-k8s.yml) passes with Gemma enabled
  • Documentation updated to reflect Gemma availability

Related Issues
