feat: separate LLM functions by role (extract/keyword/query) #2758
Saswatsusmoy wants to merge 3 commits into HKUDS:main from
Conversation
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 526de307e6
```python
role_max_async = (
    getattr(self, role_max_async_attr) or self.llm_model_max_async
)
role_timeout = getattr(self, role_timeout_attr) or self.default_llm_timeout
role_kwargs = getattr(self, role_kwargs_attr) or self.llm_model_kwargs
```
Stop inheriting base kwargs into cross-provider role funcs
This fallback unconditionally injects the base `llm_model_kwargs` into role functions whenever the role kwargs are None, which breaks cross-provider setups (e.g., an ollama/lollms base with OpenAI/Gemini/Azure roles): the base kwargs include keys such as `api_key` and `options`, and the role wrappers already pass an explicit `api_key`, causing runtime failures such as `TypeError: got multiple values for keyword argument 'api_key'`. Role-specific calls therefore fail instead of using their configured provider.
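The failure mode can be reproduced in a few lines; the names below are illustrative stand-ins, not LightRAG's actual signatures:

```python
# Stand-in for a role wrapper that already passes api_key explicitly.
def openai_complete(prompt, api_key=None, **kwargs):
    return "ok"

# Base kwargs configured for an ollama/lollms deployment.
base_llm_model_kwargs = {"api_key": "sk-base", "options": {"num_ctx": 8192}}
role_kwargs = None  # no role-specific kwargs configured

# The `or` fallback silently substitutes the base kwargs...
effective_kwargs = role_kwargs or base_llm_model_kwargs

try:
    # ...which collide with the wrapper's explicit api_key:
    openai_complete("hi", api_key="sk-openai", **effective_kwargs)
except TypeError as exc:
    print(exc)  # got multiple values for keyword argument 'api_key'
```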
```python
elif effective_binding == "ollama":
    from lightrag.llm.ollama import ollama_model_complete

    return ollama_model_complete
```
Apply role model when binding is Ollama
The Ollama role branch returns the raw `ollama_model_complete` function without binding `effective_model`, so `EXTRACT/KEYWORD/QUERY_LLM_MODEL` does not actually change the model for role-specific Ollama calls. Because the Ollama completion function reads the model name from global config, all roles continue using the base model, defeating per-role model selection.
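A minimal sketch of the fix, mirroring what the openai/gemini/azure branches do; `ollama_model_complete` is stubbed here for illustration:

```python
import asyncio
from functools import partial

# Stub of lightrag.llm.ollama.ollama_model_complete: the real function
# resolves its model name from global config when none is supplied.
async def ollama_model_complete(prompt, model=None, **kwargs):
    return f"{model or 'base-model'}: {prompt}"

# Returning the raw function (current branch) ignores the role model:
print(asyncio.run(ollama_model_complete("extract entities")))

# Binding effective_model, as the other provider branches do:
effective_model = "extract-model"  # e.g. from EXTRACT_LLM_MODEL
extract_func = partial(ollama_model_complete, model=effective_model)
print(asyncio.run(extract_func("extract entities")))
```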
```python
if model_func is not None:
    self._raw_role_llm_funcs[role] = model_func
```
Roll back raw role callable on update failure
This writes the new raw callable before the wrapped function is validated; if wrapping fails, the `except` block restores only the exposed role func and metadata, leaving `_raw_role_llm_funcs[role]` holding the invalid value. After one failed update, later updates that rely on the stored raw func can keep failing, so the documented atomic rollback is incomplete.
Pull request overview
This PR implements per-role LLM function separation as specified in issue #2741, allowing independent LLM model selection, concurrency limits, timeouts, and provider options for three roles: extract (entity/relation extraction), keyword (keyword extraction), and query (answer generation). Each role gets its own independently wrapped function and concurrency queue, while maintaining full backward compatibility when no role-specific config is provided.
Changes:
- Add 12 per-role dataclass fields to `LightRAG`, wrap each role function independently in `__post_init__`, and provide `update_llm_role_config()` for runtime updates with rollback on failure.
- Route LLM calls in `operate.py` through role-specific functions and update the Ollama API emulation endpoints in `ollama_api.py` to use `query_llm_model_func`.
- Add per-role environment variable parsing in `config.py`, role-specific LLM function construction with cross-provider validation in `lightrag_server.py`, per-role provider options via `options_dict_for_role()` in `binding_options.py`, and comprehensive documentation in `env.example`.
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| `lightrag/lightrag.py` | Adds 12 per-role dataclass fields, wraps role functions independently in `__post_init__`, adds `update_llm_role_config()` with metadata and rollback |
| `lightrag/operate.py` | Routes extract_entities/summarize to the extract func, keywords to the keyword func, kg_query/naive_query to the query func |
| `lightrag/api/config.py` | Parses 18 per-role environment variables for binding/model/host/api_key/max_async/timeout |
| `lightrag/api/lightrag_server.py` | Adds `create_role_llm_func()`, `create_role_llm_model_kwargs()`, role config construction loop, and health endpoint role exposure |
| `lightrag/llm/binding_options.py` | Adds `options_dict_for_role()` classmethod with same-provider inheritance and cross-provider default reset |
| `lightrag/api/routers/ollama_api.py` | Routes Ollama emulation calls through `query_llm_model_func` |
| `tests/test_llm_role_separation.py` | Comprehensive tests covering backward compat, isolation, custom functions, concurrency, runtime updates, config parsing, and provider options |
| `tests/test_extract_entities.py` | Updates test config to include role-specific function keys and use `extract_llm_model_func` |
| `env.example` | Documents per-role LLM configuration environment variables with examples |
```python
# Update stored raw function if a new one is provided
if model_func is not None:
    self._raw_role_llm_funcs[role] = model_func
```
The rollback logic does not restore `_raw_role_llm_funcs[role]` on failure. At line 882, `self._raw_role_llm_funcs[role] = model_func` runs before the potentially failing `priority_limit_async_func_call` wrapping at lines 897-907. If wrapping fails, the `except` block at lines 938-943 restores `func_attr` and `_role_llm_metadata` but leaves `_raw_role_llm_funcs[role]` pointing to the new (failed) function. This violates the stated guarantee that "On failure, the previous config remains intact." A subsequent call to `update_llm_role_config` without a new `model_func` would then use this corrupted raw function.
Save the previous raw function alongside `prev_func` and `prev_metadata`, and restore it in the `except` block.
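A compact sketch of that fix; the class below is a toy model of the update path, not LightRAG's actual implementation:

```python
# Toy model of the update/rollback path; attribute names follow the
# review comment, not LightRAG's exact internals.
class RoleLLM:
    def __init__(self):
        self._raw_role_llm_funcs = {"query": lambda p: "old:" + p}
        self.query_llm_model_func = self._raw_role_llm_funcs["query"]

    def update_llm_role_config(self, role, model_func=None, wrap=lambda f: f):
        prev_func = getattr(self, f"{role}_llm_model_func")
        prev_raw = self._raw_role_llm_funcs.get(role)  # snapshot the raw func too
        if model_func is not None:
            self._raw_role_llm_funcs[role] = model_func
        try:
            wrapped = wrap(self._raw_role_llm_funcs[role])  # may raise
            setattr(self, f"{role}_llm_model_func", wrapped)
        except Exception:
            setattr(self, f"{role}_llm_model_func", prev_func)
            self._raw_role_llm_funcs[role] = prev_raw  # restore the raw func
            raise


rag = RoleLLM()

def bad_wrap(func):
    raise RuntimeError("wrapping failed")

try:
    rag.update_llm_role_config("query", model_func=lambda p: "new:" + p, wrap=bad_wrap)
except RuntimeError:
    pass

# Both the exposed func and the stored raw func are rolled back:
print(rag._raw_role_llm_funcs["query"]("hi"))  # old:hi
print(rag.query_llm_model_func("hi"))          # old:hi
```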
```python
role_max_async = (
    getattr(self, role_max_async_attr) or self.llm_model_max_async
)
role_timeout = getattr(self, role_timeout_attr) or self.default_llm_timeout
role_kwargs = getattr(self, role_kwargs_attr) or self.llm_model_kwargs
```
Using the `or` operator for fallback (`getattr(self, role_kwargs_attr) or self.llm_model_kwargs`) causes an explicitly set empty dict `{}` to fall back to the base kwargs, since an empty dict is falsy. The same applies to `role_max_async`, where `0` would fall back to the base value. This contradicts the intent of allowing explicit overrides with these values. Consider `is not None` checks instead of `or` (e.g., `getattr(self, role_kwargs_attr) if getattr(self, role_kwargs_attr) is not None else self.llm_model_kwargs`) for the kwargs, max_async, and timeout fallbacks. The same issue exists in `update_llm_role_config` at lines 883-893.
```diff
-role_max_async = (
-    getattr(self, role_max_async_attr) or self.llm_model_max_async
-)
-role_timeout = getattr(self, role_timeout_attr) or self.default_llm_timeout
-role_kwargs = getattr(self, role_kwargs_attr) or self.llm_model_kwargs
+role_max_async_value = getattr(self, role_max_async_attr)
+role_max_async = (
+    role_max_async_value
+    if role_max_async_value is not None
+    else self.llm_model_max_async
+)
+role_timeout_value = getattr(self, role_timeout_attr)
+role_timeout = (
+    role_timeout_value
+    if role_timeout_value is not None
+    else self.default_llm_timeout
+)
+role_kwargs_value = getattr(self, role_kwargs_attr)
+role_kwargs = (
+    role_kwargs_value
+    if role_kwargs_value is not None
+    else self.llm_model_kwargs
+)
```
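The pitfall is easy to demonstrate:

```python
# Falsy-but-explicit overrides are swallowed by `or` fallbacks:
explicit_max_async = 0   # caller explicitly set 0
explicit_kwargs = {}     # caller explicitly wants no extra kwargs
base_max_async = 4
base_kwargs = {"api_key": "sk-base"}

print(explicit_max_async or base_max_async)  # 4 -- the explicit 0 is lost
print(explicit_kwargs or base_kwargs)        # {'api_key': 'sk-base'} -- {} is lost

# `is not None` preserves the explicit values:
print(explicit_max_async if explicit_max_async is not None else base_max_async)  # 0
print(explicit_kwargs if explicit_kwargs is not None else base_kwargs)           # {}
```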
```python
response = await self.rag.query_llm_model_func(
    query, stream=True, **self.rag.llm_model_kwargs
```
When calling `query_llm_model_func`, passing `**self.rag.llm_model_kwargs` (the base LLM kwargs) will override any role-specific kwargs that were baked into the query function via `partial()` during initialization. For example, if the query role uses a different host than the base, the base host from `self.rag.llm_model_kwargs` would override it. This was a pre-existing pattern, but it becomes more problematic now that the query role can have its own kwargs. Consider using `self.rag.query_llm_model_kwargs` (falling back to `self.rag.llm_model_kwargs`) instead of always using the base kwargs.
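The override happens because call-site keywords win over keywords baked in with `functools.partial`; a minimal illustration with hypothetical names:

```python
from functools import partial

def complete(prompt, host="default-host", **kwargs):
    return host

# Role kwargs baked in at init time, as __post_init__ does:
query_func = partial(complete, host="query-role-host")

print(query_func("hi"))                    # query-role-host
# Passing the base kwargs at the call site silently overrides them:
print(query_func("hi", host="base-host"))  # base-host
```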
```
- Operate.py routing: kg_query uses query role
- Operate.py routing: naive_query uses query role
```
The module docstring claims test coverage for "kg_query uses query role" and "naive_query uses query role" (lines 18-19), but no test methods for these are actually implemented. Since the `extract_entities` and `extract_keywords_only` routing are tested, the `kg_query` and `naive_query` routing should be tested as well for completeness.
```diff
-    - Operate.py routing: kg_query uses query role
-    - Operate.py routing: naive_query uses query role
```
Please resolve all conflicts in env.example to ensure the PR is ready for review. The repository now features an interactive setup wizard to streamline .env file generation, and env.example has been updated to serve as its configuration template. Please adhere to the following requirements:
For more information on the interactive setup tool, please refer to: https://github.com/HKUDS/LightRAG/blob/main/docs/InteractiveSetup.md
Implement per-role LLM function separation as specified in HKUDS#2741. Each role (extract, keyword, query) gets its own independently wrapped LLM function with a dedicated concurrency queue, enabling fine-grained control over model selection, concurrency limits, timeouts, and provider-specific options per task type.

Key changes:
- `lightrag.py`: Add 12 dataclass fields for per-role config, wrap each role function independently in `__post_init__`, store raw functions to prevent double-wrapping on runtime updates, add `update_llm_role_config()` with binding/model/host/api_key/provider_options metadata support and atomic rollback on failure.
- `operate.py`: Route `extract_entities` and `_summarize_descriptions` through `extract_llm_model_func`, `extract_keywords_only` through `keyword_llm_model_func`, and `kg_query`/`naive_query` through `query_llm_model_func`.
- `api/config.py`: Parse 18 per-role environment variables (`EXTRACT_LLM_BINDING`, `KEYWORD_LLM_MODEL`, `MAX_ASYNC_QUERY_LLM`, etc.).
- `api/lightrag_server.py`: Build role-specific LLM functions at startup with cross-provider validation (model and api_key required when binding differs from base), role-specific provider options via `options_dict_for_role()`, and `/health` endpoint role config exposure.
- `llm/binding_options.py`: Add `options_dict_for_role()` supporting per-role provider option env vars (`{ROLE}_{PROVIDER}_{FIELD}`, e.g. `EXTRACT_OPENAI_LLM_TEMPERATURE`) with same-provider inheritance and cross-provider default reset.
- `api/routers/ollama_api.py`: Route Ollama emulation calls through `query_llm_model_func`.

Fully backward compatible: no new configuration produces identical behavior to the previous single-LLM pattern.

Closes HKUDS#2741
- Use `is not None` instead of `or` for kwargs/max_async/timeout fallbacks
to allow explicit 0/{} overrides (fixes cross-provider kwargs injection)
- Add complete rollback of _raw_role_llm_funcs on update failure
- Create wrapper for Ollama role binding to apply effective_model (like
openai/gemini/azure do) instead of returning raw ollama_model_complete
- Use role-specific kwargs in Ollama API router instead of base kwargs
- Add kg_query and naive_query routing tests
- Rebase onto latest main to resolve env.example divergence
Force-pushed from 51a9228 to 3afa3a7.
Pull request overview
Copilot reviewed 9 out of 9 changed files in this pull request and generated 3 comments.
Comments suppressed due to low confidence (2)
lightrag/api/routers/ollama_api.py:527
- The function call was updated to `query_llm_model_func`, but the kwargs here still use `self.rag.llm_model_kwargs` directly instead of checking for role-specific kwargs. This is inconsistent with the `generate` endpoint (lines 306, 432), which properly selects `self.rag.query_llm_model_kwargs` when available. The same `role_kwargs` pattern should be applied here so the query role uses its own provider options (e.g., host, timeout) when configured.
```python
response = await self.rag.query_llm_model_func(
    cleaned_query,
    stream=True,
    history_messages=conversation_history,
    **self.rag.llm_model_kwargs,
)
```
lightrag/api/routers/ollama_api.py:690
- Same issue as the streaming bypass path above: the function call was updated to `query_llm_model_func`, but the kwargs still use `self.rag.llm_model_kwargs` instead of the role-specific kwargs pattern used in the `generate` endpoint. Should use `role_kwargs = self.rag.query_llm_model_kwargs if self.rag.query_llm_model_kwargs is not None else self.rag.llm_model_kwargs` and pass `**role_kwargs`.
```python
response_text = await self.rag.query_llm_model_func(
    cleaned_query,
    stream=False,
    history_messages=conversation_history,
    **self.rag.llm_model_kwargs,
)
```
```python
logger.info(
    f"LLM role functions initialized - "
    f"extract: max_async={self.extract_llm_model_max_async or self.llm_model_max_async}, "
    f"keyword: max_async={self.keyword_llm_model_max_async or self.llm_model_max_async}, "
    f"query: max_async={self.query_llm_model_max_async or self.llm_model_max_async}"
)
```
The log message uses `or` (e.g., `self.extract_llm_model_max_async or self.llm_model_max_async`), which incorrectly treats `0` as falsy. If a role has `max_async=0` explicitly set (which the codebase supports via its `is not None` checks), the log will display the base value instead of `0`. Use `self.extract_llm_model_max_async if self.extract_llm_model_max_async is not None else self.llm_model_max_async` to stay consistent with the actual fallback logic used during wrapping.
```diff
 if request.system:
     self.rag.llm_model_kwargs["system_prompt"] = request.system

 if request.stream:
-    response = await self.rag.llm_model_func(
-        query, stream=True, **self.rag.llm_model_kwargs
+    role_kwargs = self.rag.query_llm_model_kwargs if self.rag.query_llm_model_kwargs is not None else self.rag.llm_model_kwargs
+    response = await self.rag.query_llm_model_func(
+        query, stream=True, **role_kwargs
```
When `query_llm_model_kwargs` is not None, the `system_prompt` set on line 303 (`self.rag.llm_model_kwargs["system_prompt"] = request.system`) won't be included in the kwargs passed to the function, because `role_kwargs` will point to `self.rag.query_llm_model_kwargs` (a different dict). The `system_prompt` should be set on the resolved `role_kwargs` dict, or passed directly as a keyword argument.
```python
async def role_ollama_complete(
    prompt,
    system_prompt=None,
    history_messages=None,
    keyword_extraction=False,
    **kwargs,
):
    keyword_extraction = kwargs.pop("keyword_extraction", None)
    if keyword_extraction:
        kwargs["format"] = "json"
    if _role_ollama_opts:
        kwargs.update(_role_ollama_opts)
    if history_messages is None:
        history_messages = []
    return await _ollama_model_if_cache(
        effective_model,
        prompt,
        system_prompt=system_prompt,
        history_messages=history_messages,
        host=effective_host,
        **kwargs,
    )

return role_ollama_complete
```
When an Ollama role has role-specific configuration (e.g., `EXTRACT_LLM_MODEL` is set while the base binding is also Ollama), both `create_role_llm_func` and `create_role_llm_model_kwargs` return non-None values. The role function (`role_ollama_complete`) already bakes in `host=effective_host` on line 739, but the role kwargs dict from `create_role_llm_model_kwargs` also includes `host`. When the function is invoked through `__post_init__`'s `partial(role_raw_func, hashing_kv=hashing_kv, **role_kwargs)`, the `host` from `role_kwargs` is passed as a kwarg to `role_ollama_complete`, and then on line 739 `host=effective_host` is passed again to `_ollama_model_if_cache`, causing `TypeError: got multiple values for keyword argument 'host'`.
`create_role_llm_model_kwargs` should return None (or an empty dict) when `create_role_llm_func` returns a non-None function, since the role function already encapsulates the provider-specific configuration. Alternatively, `role_ollama_complete` could pop `host`/`timeout`/`options`/`api_key` from `kwargs` before passing them to `_ollama_model_if_cache`.
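A sketch of the second option suggested above, with `_ollama_model_if_cache` stubbed and all names illustrative:

```python
import asyncio

# Stub of lightrag.llm.ollama._ollama_model_if_cache for illustration.
async def _ollama_model_if_cache(model, prompt, host=None, **kwargs):
    return f"{model}@{host}: {prompt}"

def make_role_func(effective_model, effective_host):
    async def role_ollama_complete(prompt, **kwargs):
        # Drop config this wrapper already supplies explicitly, so a
        # duplicate key from role_kwargs cannot raise
        # "got multiple values for keyword argument".
        for baked in ("host", "timeout", "options", "api_key"):
            kwargs.pop(baked, None)
        return await _ollama_model_if_cache(
            effective_model, prompt, host=effective_host, **kwargs
        )
    return role_ollama_complete

func = make_role_func("qwen2.5:14b", "http://extract-host:11434")
# host in the role kwargs no longer collides with the baked-in host:
print(asyncio.run(func("hi", host="http://base:11434")))
```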