
fix(llmrails): backfill embedding model params into search provider config (fixes stale KB cache)#1753

Open
Pouyanpi wants to merge 3 commits intodevelopfrom
fix/kb-cache-invalidation-on-embedding-model-change

Conversation

@Pouyanpi
Collaborator

@Pouyanpi Pouyanpi commented Mar 31, 2026

## Description

Resolves #1751

  • Fixes the KB Annoy index cache not being invalidated when the embedding model is changed via the `models: [{type: embeddings, ...}]` config section
  • When a `models[type=embeddings]` entry is found, backfill `embedding_model` and `embedding_engine` into `embedding_search_provider.parameters` for both the `core` and `knowledge_base` configs, so that the cache hash includes the actual model name
  • Previously, switching embedding models (e.g., from FastEmbed to OpenAI) caused an `IndexError: Vector has wrong length` at runtime with no clear error message
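A minimal sketch of the backfill logic described above (the class and field names below are simplified stand-ins for the real objects in `nemoguardrails.rails.llm.config`, not the actual implementation):

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Simplified stand-ins for the real config objects; parameter key names
# follow the PR description.
@dataclass
class EmbeddingSearchProvider:
    name: str = "default"
    parameters: dict = field(default_factory=dict)

@dataclass
class EmbeddingsModel:
    engine: str
    model: Optional[str] = None

def backfill_embedding_params(
    model: EmbeddingsModel, providers: List[EmbeddingSearchProvider]
) -> None:
    """Copy the embeddings model name/engine into each default provider's
    parameters so the KB cache hash varies with the configured model."""
    for esp in providers:
        if esp.name != "default":
            continue  # custom providers are left untouched
        if "embedding_model" not in esp.parameters and model.model is not None:
            esp.parameters["embedding_model"] = model.model
        if "embedding_engine" not in esp.parameters:
            esp.parameters["embedding_engine"] = model.engine

core = EmbeddingSearchProvider()
kb = EmbeddingSearchProvider()
backfill_embedding_params(
    EmbeddingsModel(engine="openai", model="text-embedding-3-small"), [core, kb]
)
print(core.parameters)
# → {'embedding_model': 'text-embedding-3-small', 'embedding_engine': 'openai'}
```

Explicitly provided parameters are never overwritten, since each key is only written when it is absent.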

## Test plan

  • Run with the default FastEmbed model to create the KB cache (e.g., `poetry run nemoguardrails chat --config=./examples/bots/abc`)
  • Switch to OpenAI embeddings by adding an entry to the `models` section:

```yaml
  - type: embeddings
    engine: openai
    model: text-embedding-3-small
```

  • Verify that a new cache file is created (different hash) and no vector-length error occurs
  • Run the existing embedding tests: `pytest tests/ -k embedding`

Alternatively, just run the following script (ensure the `.cache` dir is deleted prior to running it):

```python
from nemoguardrails import LLMRails, RailsConfig
from nemoguardrails.rails.llm.config import Model

# Run 1: the default FastEmbed model builds the KB cache.
config1 = RailsConfig.from_path("./examples/bots/abc")
rails1 = LLMRails(config1)
r1 = rails1.generate(messages=[{"role": "user", "content": "Hello"}])
print(f"Run 1 (default FastEmbed): {r1['content']}")

# Run 2: switch to OpenAI embeddings; a new cache file should be built.
config2 = RailsConfig.from_path("./examples/bots/abc")
config2.models.append(Model(type="embeddings", engine="openai", model="text-embedding-3-small"))
rails2 = LLMRails(config2)
r2 = rails2.generate(messages=[{"role": "user", "content": "Hello"}])
print(f"Run 2 (OpenAI embeddings): {r2['content']}")
```

Expected Output:

After the Fix:

```
Run 1 (default FastEmbed): Hello! How can I assist you today?
Run 2 (OpenAI embeddings): Hello! How can I assist you today?
```

Prior to the Fix:

```
Run 1 (default FastEmbed): Hello! How can I help you today?
WARNING:nemoguardrails.actions.action_dispatcher:Error while execution 'retrieve_relevant_chunks' with parameters '{'context': {'last_user_message': 'Hello', 'last_bot_message': None, 'user_message': 'Hello', 'input_flows': ['self check input'], 'i': 1, 'triggered_input_rail': None, 'allowed': True, 'event': {'type': 'StartInternalSystemAction', 'uid': '341a946f-75e5-49b0-a44a-a811f47851f0', 'event_created_at': '2026-03-31T13:16:09.424440+00:00', 'source_uid': 'NeMoGuardrails', 'action_name': 'retrieve_relevant_chunks', 'action_params': {}, 'action_result_key': None, 'action_uid': 'ed2433e0-7c6c-43a4-88bd-ede638ee8e0c', 'is_system_action': True}}, 'kb': <nemoguardrails.kb.kb.KnowledgeBase object at 0x10f30ad50>}': Vector has wrong length (expected 384, got 1536)
ERROR:nemoguardrails.actions.action_dispatcher:Vector has wrong length (expected 384, got 1536)
Traceback (most recent call last):
  File "develop/nemoguardrails/actions/action_dispatcher.py", line 217, in execute_action
    result = await result
             ^^^^^^^^^^^^
  File "develop/nemoguardrails/actions/retrieve_relevant_chunks.py", line 65, in retrieve_relevant_chunks
    chunks = [chunk["body"] for chunk in await kb.search_relevant_chunks(user_message)]
                                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "develop/nemoguardrails/kb/kb.py", line 179, in search_relevant_chunks
    results = await self.index.search(text, max_results=max_results)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "nemoguardrails/embeddings/basic.py", line 282, in search
    results = self._index.get_nns_by_vector(
        _embedding,
        max_results,
        include_distances=True,
    )
IndexError: Vector has wrong length (expected 384, got 1536)
Run 2 (OpenAI embeddings): I'm sorry, an internal error has occurred.
```
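The 384-vs-1536 mismatch arises because the stale index was built for one vector length and queried with another; a toy illustration of the failing length check (not the real Annoy API, just the idea):

```python
# A stale index built for 384-dim vectors (the FastEmbed run) cannot be
# queried with 1536-dim vectors (text-embedding-3-small). ToyIndex is a
# hypothetical stand-in for annoy.AnnoyIndex.
class ToyIndex:
    def __init__(self, dim: int):
        self.dim = dim

    def get_nns_by_vector(self, vector, n):
        # Reject query vectors whose length differs from the index dimension.
        if len(vector) != self.dim:
            raise IndexError(
                f"Vector has wrong length (expected {self.dim}, got {len(vector)})"
            )
        return list(range(n))

index = ToyIndex(dim=384)   # cached index from the FastEmbed run
query = [0.0] * 1536        # embedding produced by the OpenAI model
try:
    index.get_nns_by_vector(query, 3)
except IndexError as e:
    print(e)  # → Vector has wrong length (expected 384, got 1536)
```

Before the fix, nothing forced a rebuild of the cached index, so this error surfaced only at query time.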

Summary by CodeRabbit

Release Notes

  • New Features

    • Embedding search provider configuration now automatically injects embedding model and engine parameters during initialization for default providers when values are not explicitly provided, while preserving any user-specified settings.
  • Tests

    • Added comprehensive test coverage validating embedding search provider parameter configuration, including backfilling behavior, explicit value preservation, and custom provider handling.

@greptile-apps
Contributor

greptile-apps bot commented Mar 31, 2026

Greptile Summary

This PR fixes a cache-invalidation bug in the KB annoy index where switching the embedding model via the models[type=embeddings] config section had no effect on the computed cache hash, causing a silent IndexError: Vector has wrong length at runtime.

The fix backfills embedding_model and embedding_engine into embedding_search_provider.parameters for both core and knowledge_base configs immediately after the embedding model loop in LLMRails.__init__. This ensures the hash computed in kb.py (which reads from embedding_search_provider.parameters) correctly varies with the configured model.
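The hash dependence can be shown with a toy sketch (illustrative only; the real computation lives in `kb.py`, and only the parameter key names are taken from this summary):

```python
import hashlib

def kb_cache_key(esp_parameters: dict, content: str) -> str:
    # Toy version of the idea: the cache key is prefixed with the embedding
    # engine/model read from embedding_search_provider.parameters, so a
    # different model yields a different key.
    hash_prefix = esp_parameters.get("embedding_engine", "") + esp_parameters.get(
        "embedding_model", ""
    )
    return hashlib.md5((hash_prefix + content).encode("utf-8")).hexdigest()

# Before the fix the parameters stayed empty, so every model hashed identically:
before = kb_cache_key({}, "kb docs")
# After the backfill, the model name changes the key and the index is rebuilt:
after = kb_cache_key(
    {"embedding_engine": "openai", "embedding_model": "text-embedding-3-small"},
    "kb docs",
)
print(before != after)  # → True
```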

Key implementation details:

  • Only backfills when esp.name == "default", correctly leaving custom providers untouched
  • Guards on model.model is not None (since model is Optional[str]) before writing — addressing the concern raised in the prior thread
  • The model.engine is not None guard is technically redundant because engine: str is a required, non-optional field and can never be None, but it is harmless defensive code
  • Five new unit tests cover all relevant scenarios: fresh backfill, no-overwrite, partial fill, custom provider skip, and no-embeddings-model baseline

Confidence Score: 5/5

Safe to merge — the fix is minimal, well-guarded, and fully covered by new tests; no P0/P1 issues found.

The prior thread concern about None backfill has been addressed with explicit guards. The engine guard is redundant but harmless. All five test scenarios cover expected behavior. No logic or data-integrity regressions identified.

No files require special attention.

Important Files Changed

Filename Overview
nemoguardrails/rails/llm/llmrails.py Adds backfill logic for embedding_model/embedding_engine into embedding_search_provider.parameters for both core and knowledge_base configs when a models[type=embeddings] entry is found; correctly guards against None model.model and only targets the default provider.
tests/test_llmrails.py Adds five targeted test cases covering: backfill of both providers, no overwrite of explicit params, partial backfill, no backfill for custom providers, and no backfill when embeddings model is absent; coverage is comprehensive.

Sequence Diagram

sequenceDiagram
    participant User as User Config (YAML)
    participant LLMRails as LLMRails.__init__
    participant ESP_Core as core.embedding_search_provider
    participant ESP_KB as knowledge_base.embedding_search_provider
    participant KB as kb.py build()

    User->>LLMRails: models[type=embeddings] entry
    LLMRails->>LLMRails: set default_embedding_model/engine
    LLMRails->>ESP_Core: if name=="default" and key missing → backfill embedding_model/engine
    LLMRails->>ESP_KB: if name=="default" and key missing → backfill embedding_model/engine
    KB->>ESP_KB: parameters.get("embedding_engine","") + parameters.get("embedding_model","")
    KB->>KB: hash_prefix + content → cache hash
    KB->>KB: load or build annoy index with correct model

Reviews (3): Last reviewed commit: "assert KB provider gets backfill when co..."

@coderabbitai

coderabbitai bot commented Mar 31, 2026

📝 Walkthrough

Walkthrough

The code now populates embedding search provider parameters with embedding model and engine values during LLMRails initialization. When a default embedding search provider is configured, the system backfills missing embedding_model and embedding_engine parameters from the configured embeddings model, ensuring the cache hash reflects the actual embedding configuration.

Changes

Cohort / File(s) Summary
Embedding provider initialization
nemoguardrails/rails/llm/llmrails.py
Updated LLMRails.__init__ to backfill embedding_model and embedding_engine into config.core.embedding_search_provider.parameters and config.knowledge_base.embedding_search_provider.parameters when the provider name is "default" and these keys are absent, using values from the configured embeddings model.
Embedding provider parameter tests
tests/test_llmrails.py
Added five comprehensive test cases validating parameter backfill behavior: (1) backfilling missing parameters for both core and knowledge_base; (2) preserving explicitly provided parameters without overwriting; (3) partial backfill for only missing keys; (4) skipping backfill for non-default custom providers; (5) no backfill when no embeddings model is configured.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~15 minutes

🚥 Pre-merge checks | ✅ 6
✅ Passed checks (6 passed)
Check name Status Explanation
Linked Issues check ✅ Passed The code changes implement the core fix for #1751 by backfilling embedding model parameters into embedding_search_provider.parameters, which makes the cache hash include the actual model name.
Out of Scope Changes check ✅ Passed All changes are directly scoped to addressing #1751: embedding parameter backfilling logic and comprehensive test coverage for that functionality.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Test Results For Major Changes ✅ Passed Bug fix PR includes comprehensive unit tests (+119 lines) and documented test plan validating KB cache invalidation behavior across multiple scenarios.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly summarizes the main change: backfilling embedding model parameters into search provider configuration to fix stale KB cache issues.



@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (1)
tests/test_llmrails.py (1)

1439-1456: Consider also asserting knowledge_base backfill in this test.

This test verifies that custom providers are skipped, but only checks core.embedding_search_provider. The knowledge_base.embedding_search_provider defaults to name: "default" and should receive the backfill. Adding assertions for the knowledge_base would strengthen this test by verifying the selectivity of the backfill logic—ensuring it correctly applies to default providers while skipping custom ones in the same initialization.

💡 Optional enhancement to verify selective backfill
 def test_embedding_model_no_backfill_for_custom_provider():
     config = RailsConfig.parse_object(
         {
             **EMBEDDING_MODEL_CONFIG_BASE,
             "core": {
                 "embedding_search_provider": {
                     "name": "custom",
                     "parameters": {"some_param": "value"},
                 }
             },
         }
     )

     rails = LLMRails(config=config, llm=FakeLLM(responses=["  express greeting"]))

     assert "embedding_model" not in rails.config.core.embedding_search_provider.parameters
     assert "embedding_engine" not in rails.config.core.embedding_search_provider.parameters
     assert rails.config.core.embedding_search_provider.parameters["some_param"] == "value"
+    # knowledge_base remains default and should still be backfilled
+    assert rails.config.knowledge_base.embedding_search_provider.parameters["embedding_model"] == "intfloat/e5-large-v2"
+    assert rails.config.knowledge_base.embedding_search_provider.parameters["embedding_engine"] == "SentenceTransformers"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_llmrails.py` around lines 1439 - 1456, Extend
test_embedding_model_no_backfill_for_custom_provider to also assert that the
knowledge_base.embedding_search_provider received the backfill: after creating
rails = LLMRails(...), add assertions that
rails.config.knowledge_base.embedding_search_provider.parameters contains the
backfilled keys ("embedding_model" and/or "embedding_engine") and that their
values match the expected values from EMBEDDING_MODEL_CONFIG_BASE (so the custom
provider remains untouched while the default knowledge_base provider is
populated).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: dfaf86f1-bea6-45cf-a0d1-0b1c2dcef416

📥 Commits

Reviewing files that changed from the base of the PR and between 94fc57e and db7ff10.

📒 Files selected for processing (2)
  • nemoguardrails/rails/llm/llmrails.py
  • tests/test_llmrails.py

@codecov

codecov bot commented Mar 31, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.


@Pouyanpi Pouyanpi self-assigned this Mar 31, 2026
@Pouyanpi Pouyanpi added the bug Something isn't working label Mar 31, 2026
@Pouyanpi Pouyanpi added this to the v0.22.0 milestone Mar 31, 2026
@Pouyanpi Pouyanpi changed the title fix(llmrails): invalidate KB cache when embedding model changes via models config fix(llmrails): backfill embedding model params into search provider config to fix stale KB cache Mar 31, 2026
@Pouyanpi Pouyanpi changed the title fix(llmrails): backfill embedding model params into search provider config to fix stale KB cache fix(llmrails): backfill embedding model params into search provider config (fixes stale KB cache) Mar 31, 2026
Collaborator

@tgasser-nv tgasser-nv left a comment


Looks good, approving! Not blocking for this PR, but items to consider as follow-ups:

  1. It feels a bit wrong for an object to mutate the config object passed in during initialization. I'd prefer to handle this at the Pydantic validation stage.
  2. Some embedding models (e.g., Gemini) have a dimensionality value in embedding_parameters which we'd need to add to the item hash. Without this, if a user changes the dimensionality we won't invalidate the cache, causing errors.
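The dimensionality follow-up could be sketched as below (a hedged illustration, not the actual kb.py code; the `embedding_parameters` / `dimensionality` key names are assumptions):

```python
import hashlib
import json

def cache_key_with_dims(params: dict, content: str) -> str:
    # Sketch of the follow-up idea: fold any embedding_parameters (e.g. a
    # Gemini dimensionality setting) into the hash prefix so that changing
    # them also invalidates the cache. Key names here are illustrative.
    prefix = (
        params.get("embedding_engine", "")
        + params.get("embedding_model", "")
        + json.dumps(params.get("embedding_parameters", {}), sort_keys=True)
    )
    return hashlib.md5((prefix + content).encode("utf-8")).hexdigest()

k768 = cache_key_with_dims(
    {"embedding_engine": "gemini", "embedding_parameters": {"dimensionality": 768}}, "kb"
)
k1536 = cache_key_with_dims(
    {"embedding_engine": "gemini", "embedding_parameters": {"dimensionality": 1536}}, "kb"
)
print(k768 != k1536)  # → True
```

Serializing the parameters with `sort_keys=True` keeps the key stable regardless of dict ordering.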

```python
]:
    if esp.name != "default":
        continue
    if "embedding_model" not in esp.parameters and model.model is not None:
```
Collaborator

nit: The Model's model and engine fields can't be None. The model field has its own validator, and engine isn't optional, so it can't be None. I think the `and model.{model,engine} is not None` checks can be removed


Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

bug: KB cache not invalidated when embedding model changes

2 participants