[Feature] Re-enable EmbeddingGemma-300m Support with HF_TOKEN Configuration

## Description

Now that maintainers have configured `HF_TOKEN` for Hugging Face access, this issue tracks the work to re-enable support for the gated `google/embeddinggemma-300m` model, which was previously disabled across the codebase.

### Background

The EmbeddingGemma-300m model (`google/embeddinggemma-300m`) is a gated model on Hugging Face; downloading it requires authentication via `HF_TOKEN`. Because CI/CD could not authenticate, Gemma support was disabled as part of the work on Issue #573 so that tests could pass without the gated model.

Now that the maintainer has configured `HF_TOKEN` in the CI environment, we can restore full Gemma embedding model support.

---

## Scope of Changes

### 1. Model Download Configuration (`tools/make/models.mk`)

**Current State:**
- `download-models-minimal` excludes Gemma (lines 29, 63-64)
- Comment states: "Gemma is gated and requires HF_TOKEN, so it's excluded from CI"

**Required Changes:**
- Add `embeddinggemma-300m` to `download-models-minimal` target
- Update comments to reflect that HF_TOKEN is now available

---

### 2. Go Test Constants (`candle-binding/semantic-router_test.go`)

**Current State (line 1641):**
```go
const (
    GemmaEmbeddingModelPath = "" // Gemma is gated, not used in CI tests
)
```

**Test Skip (lines 1704-1707):**
```go
t.Run("InitGemmaOnly", func(t *testing.T) {
    t.Skip("Skipping Gemma-only test: Gemma is a gated model requiring HF_TOKEN")
})
```
**Required Changes:**
- Set `GemmaEmbeddingModelPath = "../models/embeddinggemma-300m"`
- Remove `t.Skip()` from `InitGemmaOnly` test
- Enable any other Gemma-related tests currently skipped

---

### 3. E2E Profile Configurations

**Files to Update:**

| File | Current `gemma_model_path` | Required Change |
|------|---------------------------|-----------------|
| `e2e/profiles/ai-gateway/values.yaml` | Not present (using bert only) | Add Gemma model path |
| `e2e/profiles/dynamic-config/values.yaml` | `""` (empty, line 130) | `"models/embeddinggemma-300m"` |
| `e2e/profiles/routing-strategies/values.yaml` | Already configured | Verify works with HF_TOKEN |

**Environment Variable Override (dynamic-config):**
```yaml
env:
  - name: EMBEDDING_MODEL_OVERRIDE
    value: "qwen3"  # Force qwen3 for tests (Gemma requires HF_TOKEN)
```

**Required Changes:**
- Remove `EMBEDDING_MODEL_OVERRIDE` or set it to `"auto"` to use intelligent model selection

---

### 4. initContainer Model Downloads (`e2e/profiles/*/values.yaml`)

**Required Changes:**
Add Gemma model to initContainer models list in relevant profiles:
```yaml
initContainer:
  models:
    # ... existing models ...
    - name: embeddinggemma-300m
      repo: google/embeddinggemma-300m
```
---
### 5. Rust Test Fixtures (`candle-binding/src/test_fixtures.rs`)

**Current State:**
- `GEMMA_EMBEDDING_300M` constant is defined (line 50)
- `gemma_embedding_model()` fixture exists and attempts to load the model
- Tests will panic if model is not available

**Required Changes:**
- Verify `gemma_embedding_model()` fixture works with downloaded model
- Enable all Gemma-related Rust tests in:
  - `candle-binding/src/model_architectures/embedding/gemma_embedding_test.rs`
  - `candle-binding/src/model_architectures/embedding/gemma3_model_test.rs`

---

### 6. GitHub Actions Workflow (`.github/workflows/integration-test-k8s.yml`)

**Required Changes:**
- Ensure `HF_TOKEN` secret is passed to the workflow environment
- Add HF_TOKEN to model download steps if not already present:
```yaml
env:
  HF_TOKEN: ${{ secrets.HF_TOKEN }}
```
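GitHub Actions does not pass repository secrets (other than `GITHUB_TOKEN`) to workflows triggered from forked repositories, so `HF_TOKEN` can be empty even when the secret exists. A Go sketch that fails fast in that case and otherwise builds an authenticated request using the Hugging Face `resolve/<revision>/<file>` URL pattern with a Bearer token; `newHFFileRequest` is a hypothetical name and no request is actually sent:

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"os"
)

// newHFFileRequest builds (but does not send) a GET request for one file of
// a Hugging Face repo. Hypothetical helper.
func newHFFileRequest(repo, revision, file, token string) (*http.Request, error) {
	if token == "" {
		// Fail fast: clearer in CI logs than a 401 partway through a download.
		return nil, errors.New("HF_TOKEN is empty; is secrets.HF_TOKEN passed to this step?")
	}
	url := fmt.Sprintf("https://huggingface.co/%s/resolve/%s/%s", repo, revision, file)
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+token)
	return req, nil
}

func main() {
	req, err := newHFFileRequest("google/embeddinggemma-300m", "main", "config.json", os.Getenv("HF_TOKEN"))
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("would fetch", req.URL)
}
```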

---

### 7. Quickstart Script (`scripts/quickstart.sh`)

**Current State (lines 182-188):**
```bash
# Check if failure was due to gated model (embeddinggemma-300m)
if grep -q "embeddinggemma.*401\|embeddinggemma.*Unauthorized\|embeddinggemma.*GatedRepoError" ...
```

**Required Changes:**
- Update fallback message to indicate HF_TOKEN may not be set
- Or remove fallback if Gemma download is now expected to succeed
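If the fallback is kept, it could distinguish "token missing" from "token present but refused" (for example, when the token's account has not been granted access on the model page). A Go sketch, where `fallbackMessage` is hypothetical and the regular expression mirrors the `grep` pattern above:

```go
package main

import (
	"fmt"
	"regexp"
)

// gatedFailure mirrors the quickstart.sh grep pattern. In Go's RE2 syntax
// "." does not match "\n" by default, so (as with grep) "embeddinggemma"
// and the error marker must appear on the same line.
var gatedFailure = regexp.MustCompile(`embeddinggemma.*(401|Unauthorized|GatedRepoError)`)

// fallbackMessage picks a hint for a failed model download. Hypothetical helper.
func fallbackMessage(downloadLog string, hfTokenSet bool) string {
	switch {
	case !gatedFailure.MatchString(downloadLog):
		return "model download failed; see the log above"
	case hfTokenSet:
		return "HF_TOKEN is set but embeddinggemma-300m was refused; check that the token's account has access on the model page"
	default:
		return "embeddinggemma-300m is gated; export HF_TOKEN and re-run"
	}
}

func main() {
	sample := "Downloading google/embeddinggemma-300m ... GatedRepoError"
	fmt.Println(fallbackMessage(sample, false))
}
```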

---

## Acceptance Criteria

- [ ] `make download-models-minimal` successfully downloads `embeddinggemma-300m`
- [ ] Go tests for Gemma embedding pass without skips
- [ ] Rust tests for `GemmaEmbeddingModel` pass (cosine similarity ≥ 0.99 vs Python reference)
- [ ] E2E tests with `embedding_model: "auto"` correctly route to Gemma for short texts
- [ ] E2E tests with `embedding_model: "gemma"` work correctly
- [ ] CI/CD pipeline (integration-test-k8s.yml) passes with Gemma enabled
- [ ] Documentation updated to reflect Gemma availability

---


## Related Issues

- #573 - Original issue that disabled Gemma support
