feat: separate LLM functions by role (extract/keyword/query) #2758
Saswatsusmoy wants to merge 3 commits into HKUDS:main from
Conversation
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 526de307e6
```python
role_max_async = (
    getattr(self, role_max_async_attr) or self.llm_model_max_async
)
role_timeout = getattr(self, role_timeout_attr) or self.default_llm_timeout
role_kwargs = getattr(self, role_kwargs_attr) or self.llm_model_kwargs
```
Stop inheriting base kwargs into cross-provider role funcs
This fallback unconditionally injects the base `llm_model_kwargs` into role functions whenever the role kwargs are None, which breaks cross-provider setups (e.g., an ollama/lollms base with OpenAI/Gemini/Azure roles): the base kwargs include keys such as `api_key` and `options`, and the role wrappers already pass an explicit `api_key`, causing runtime failures such as `TypeError: got multiple values for keyword argument 'api_key'`. Role-specific calls therefore fail instead of using their configured provider.
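The failure mode can be reproduced in a few lines; the names below are illustrative stand-ins, not LightRAG's actual signatures:

```python
# Stand-in for a role wrapper that already passes api_key explicitly.
def openai_complete(prompt, api_key=None, **kwargs):
    return "ok"

# Base kwargs configured for an ollama/lollms deployment.
base_llm_model_kwargs = {"api_key": "sk-base", "options": {"num_ctx": 8192}}
role_kwargs = None  # no role-specific kwargs configured

# The `or` fallback silently substitutes the base kwargs...
effective_kwargs = role_kwargs or base_llm_model_kwargs

try:
    # ...which collide with the wrapper's explicit api_key:
    openai_complete("hi", api_key="sk-openai", **effective_kwargs)
except TypeError as exc:
    print(exc)  # got multiple values for keyword argument 'api_key'
```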
```python
elif effective_binding == "ollama":
    from lightrag.llm.ollama import ollama_model_complete

    return ollama_model_complete
```
Apply role model when binding is Ollama
The Ollama role branch returns the raw `ollama_model_complete` function without binding `effective_model`, so `EXTRACT/KEYWORD/QUERY_LLM_MODEL` does not actually change the model for role-specific Ollama calls. Because the Ollama completion function reads the model name from global config, all roles continue using the base model, defeating per-role model selection.
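A minimal sketch of the fix, mirroring what the openai/gemini/azure branches do; `ollama_model_complete` is stubbed here for illustration:

```python
import asyncio
from functools import partial

# Stub of lightrag.llm.ollama.ollama_model_complete: the real function
# resolves its model name from global config when none is supplied.
async def ollama_model_complete(prompt, model=None, **kwargs):
    return f"{model or 'base-model'}: {prompt}"

# Returning the raw function (current branch) ignores the role model:
print(asyncio.run(ollama_model_complete("extract entities")))

# Binding effective_model, as the other provider branches do:
effective_model = "extract-model"  # e.g. from EXTRACT_LLM_MODEL
extract_func = partial(ollama_model_complete, model=effective_model)
print(asyncio.run(extract_func("extract entities")))
```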
```python
if model_func is not None:
    self._raw_role_llm_funcs[role] = model_func
```
Roll back raw role callable on update failure
This writes the new raw callable before the wrapped function is validated; if wrapping fails, the `except` block restores only the exposed role func and metadata, leaving `_raw_role_llm_funcs[role]` holding the invalid value. After one failed update, later updates that rely on the stored raw func can keep failing, so the documented atomic rollback is incomplete.
Pull request overview
This PR implements per-role LLM function separation as specified in issue #2741, allowing independent LLM model selection, concurrency limits, timeouts, and provider options for three roles: extract (entity/relation extraction), keyword (keyword extraction), and query (answer generation). Each role gets its own independently wrapped function and concurrency queue, while maintaining full backward compatibility when no role-specific config is provided.
Changes:
- Add 12 per-role dataclass fields to `LightRAG`, wrap each role function independently in `__post_init__`, and provide `update_llm_role_config()` for runtime updates with rollback on failure.
- Route LLM calls in `operate.py` through role-specific functions and update the Ollama API emulation endpoints in `ollama_api.py` to use `query_llm_model_func`.
- Add per-role environment variable parsing in `config.py`, role-specific LLM function construction with cross-provider validation in `lightrag_server.py`, per-role provider options via `options_dict_for_role()` in `binding_options.py`, and comprehensive documentation in `env.example`.
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| `lightrag/lightrag.py` | Adds 12 per-role dataclass fields, wraps role functions independently in `__post_init__`, adds `update_llm_role_config()` with metadata and rollback |
| `lightrag/operate.py` | Routes extract_entities/summarize to the extract func, keywords to the keyword func, kg_query/naive_query to the query func |
| `lightrag/api/config.py` | Parses 18 per-role environment variables for binding/model/host/api_key/max_async/timeout |
| `lightrag/api/lightrag_server.py` | Adds `create_role_llm_func()`, `create_role_llm_model_kwargs()`, role config construction loop, and health endpoint role exposure |
| `lightrag/llm/binding_options.py` | Adds `options_dict_for_role()` classmethod with same-provider inheritance and cross-provider default reset |
| `lightrag/api/routers/ollama_api.py` | Routes Ollama emulation calls through `query_llm_model_func` |
| `tests/test_llm_role_separation.py` | Comprehensive tests covering backward compat, isolation, custom functions, concurrency, runtime updates, config parsing, and provider options |
| `tests/test_extract_entities.py` | Updates test config to include role-specific function keys and use `extract_llm_model_func` |
| `env.example` | Documents per-role LLM configuration environment variables with examples |
```python
# Update stored raw function if a new one is provided
if model_func is not None:
    self._raw_role_llm_funcs[role] = model_func
```
The rollback logic does not restore `_raw_role_llm_funcs[role]` on failure. At line 882, `self._raw_role_llm_funcs[role] = model_func` runs before the potentially failing `priority_limit_async_func_call` wrapping at lines 897-907. If wrapping fails, the `except` block at lines 938-943 restores `func_attr` and `_role_llm_metadata` but leaves `_raw_role_llm_funcs[role]` pointing to the new (failed) function. This violates the stated guarantee that "On failure, the previous config remains intact." A subsequent call to `update_llm_role_config` without a new `model_func` would then use this corrupted raw function.
Save the previous raw function alongside `prev_func` and `prev_metadata`, and restore it in the `except` block.
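A compact sketch of that fix; the class below is a toy model of the update path, not LightRAG's actual implementation:

```python
# Toy model of the update/rollback path; attribute names follow the
# review comment, not LightRAG's exact internals.
class RoleLLM:
    def __init__(self):
        self._raw_role_llm_funcs = {"query": lambda p: "old:" + p}
        self.query_llm_model_func = self._raw_role_llm_funcs["query"]

    def update_llm_role_config(self, role, model_func=None, wrap=lambda f: f):
        prev_func = getattr(self, f"{role}_llm_model_func")
        prev_raw = self._raw_role_llm_funcs.get(role)  # snapshot the raw func too
        if model_func is not None:
            self._raw_role_llm_funcs[role] = model_func
        try:
            wrapped = wrap(self._raw_role_llm_funcs[role])  # may raise
            setattr(self, f"{role}_llm_model_func", wrapped)
        except Exception:
            setattr(self, f"{role}_llm_model_func", prev_func)
            self._raw_role_llm_funcs[role] = prev_raw  # restore the raw func
            raise


rag = RoleLLM()

def bad_wrap(func):
    raise RuntimeError("wrapping failed")

try:
    rag.update_llm_role_config("query", model_func=lambda p: "new:" + p, wrap=bad_wrap)
except RuntimeError:
    pass

# Both the exposed func and the stored raw func are rolled back:
print(rag._raw_role_llm_funcs["query"]("hi"))  # old:hi
print(rag.query_llm_model_func("hi"))          # old:hi
```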
```python
role_max_async = (
    getattr(self, role_max_async_attr) or self.llm_model_max_async
)
role_timeout = getattr(self, role_timeout_attr) or self.default_llm_timeout
role_kwargs = getattr(self, role_kwargs_attr) or self.llm_model_kwargs
```
Using the `or` operator for fallback (`getattr(self, role_kwargs_attr) or self.llm_model_kwargs`) causes an explicitly set empty dict `{}` to fall back to the base kwargs, since an empty dict is falsy. The same applies to `role_max_async`, where `0` would fall back to the base value. This contradicts the intent of allowing explicit overrides with these values. Consider `is not None` checks instead of `or` (e.g., `getattr(self, role_kwargs_attr) if getattr(self, role_kwargs_attr) is not None else self.llm_model_kwargs`) for the kwargs, max_async, and timeout fallbacks. The same issue exists in `update_llm_role_config` at lines 883-893.
```diff
-role_max_async = (
-    getattr(self, role_max_async_attr) or self.llm_model_max_async
-)
-role_timeout = getattr(self, role_timeout_attr) or self.default_llm_timeout
-role_kwargs = getattr(self, role_kwargs_attr) or self.llm_model_kwargs
+role_max_async_value = getattr(self, role_max_async_attr)
+role_max_async = (
+    role_max_async_value
+    if role_max_async_value is not None
+    else self.llm_model_max_async
+)
+role_timeout_value = getattr(self, role_timeout_attr)
+role_timeout = (
+    role_timeout_value
+    if role_timeout_value is not None
+    else self.default_llm_timeout
+)
+role_kwargs_value = getattr(self, role_kwargs_attr)
+role_kwargs = (
+    role_kwargs_value
+    if role_kwargs_value is not None
+    else self.llm_model_kwargs
+)
```
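The pitfall is easy to demonstrate:

```python
# Falsy-but-explicit overrides are swallowed by `or` fallbacks:
explicit_max_async = 0   # caller explicitly set 0
explicit_kwargs = {}     # caller explicitly wants no extra kwargs
base_max_async = 4
base_kwargs = {"api_key": "sk-base"}

print(explicit_max_async or base_max_async)  # 4 -- the explicit 0 is lost
print(explicit_kwargs or base_kwargs)        # {'api_key': 'sk-base'} -- {} is lost

# `is not None` preserves the explicit values:
print(explicit_max_async if explicit_max_async is not None else base_max_async)  # 0
print(explicit_kwargs if explicit_kwargs is not None else base_kwargs)           # {}
```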
```python
response = await self.rag.query_llm_model_func(
    query, stream=True, **self.rag.llm_model_kwargs
```
When calling `query_llm_model_func`, passing `**self.rag.llm_model_kwargs` (the base LLM kwargs) will override any role-specific kwargs that were baked into the query function via `partial()` during initialization. For example, if the query role uses a different host than the base, the base host from `self.rag.llm_model_kwargs` would override it. This was a pre-existing pattern, but it becomes more problematic now that the query role can have its own kwargs. Consider using `self.rag.query_llm_model_kwargs` (falling back to `self.rag.llm_model_kwargs`) instead of always using the base kwargs.
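The override happens because call-site keywords win over keywords baked in with `functools.partial`; a minimal illustration with hypothetical names:

```python
from functools import partial

def complete(prompt, host="default-host", **kwargs):
    return host

# Role kwargs baked in at init time, as __post_init__ does:
query_func = partial(complete, host="query-role-host")

print(query_func("hi"))                    # query-role-host
# Passing the base kwargs at the call site silently overrides them:
print(query_func("hi", host="base-host"))  # base-host
```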
```
- Operate.py routing: kg_query uses query role
- Operate.py routing: naive_query uses query role
```
The module docstring claims test coverage for "kg_query uses query role" and "naive_query uses query role" (lines 18-19), but no test methods for these are actually implemented. Since the `extract_entities` and `extract_keywords_only` routing are tested, the `kg_query` and `naive_query` routing should be tested as well for completeness.
```diff
-    - Operate.py routing: kg_query uses query role
-    - Operate.py routing: naive_query uses query role
```
Please resolve all conflicts in env.example to ensure the PR is ready for review. The repository now features an interactive setup wizard to streamline .env file generation, and env.example has been updated to serve as its configuration template. Please adhere to the following requirements:
For more information on the interactive setup tool, please refer to: https://github.com/HKUDS/LightRAG/blob/main/docs/InteractiveSetup.md
Implement per-role LLM function separation as specified in HKUDS#2741. Each role (extract, keyword, query) gets its own independently wrapped LLM function with a dedicated concurrency queue, enabling fine-grained control over model selection, concurrency limits, timeouts, and provider-specific options per task type.

Key changes:
- `lightrag.py`: Add 12 dataclass fields for per-role config, wrap each role function independently in `__post_init__`, store raw functions to prevent double-wrapping on runtime updates, add `update_llm_role_config()` with binding/model/host/api_key/provider_options metadata support and atomic rollback on failure.
- `operate.py`: Route `extract_entities` and `_summarize_descriptions` through `extract_llm_model_func`, `extract_keywords_only` through `keyword_llm_model_func`, and `kg_query`/`naive_query` through `query_llm_model_func`.
- `api/config.py`: Parse 18 per-role environment variables (`EXTRACT_LLM_BINDING`, `KEYWORD_LLM_MODEL`, `MAX_ASYNC_QUERY_LLM`, etc.).
- `api/lightrag_server.py`: Build role-specific LLM functions at startup with cross-provider validation (model and api_key required when binding differs from base), role-specific provider options via `options_dict_for_role()`, and `/health` endpoint role config exposure.
- `llm/binding_options.py`: Add `options_dict_for_role()` supporting per-role provider option env vars (`{ROLE}_{PROVIDER}_{FIELD}`, e.g. `EXTRACT_OPENAI_LLM_TEMPERATURE`) with same-provider inheritance and cross-provider default reset.
- `api/routers/ollama_api.py`: Route Ollama emulation calls through `query_llm_model_func`.

Fully backward compatible: no new configuration produces identical behavior to the previous single-LLM pattern.

Closes HKUDS#2741
- Use `is not None` instead of `or` for kwargs/max_async/timeout fallbacks
to allow explicit 0/{} overrides (fixes cross-provider kwargs injection)
- Add complete rollback of _raw_role_llm_funcs on update failure
- Create wrapper for Ollama role binding to apply effective_model (like
openai/gemini/azure do) instead of returning raw ollama_model_complete
- Use role-specific kwargs in Ollama API router instead of base kwargs
- Add kg_query and naive_query routing tests
- Rebase onto latest main to resolve env.example divergence
Force-pushed from 51a9228 to 3afa3a7.
Pull request overview
Copilot reviewed 9 out of 9 changed files in this pull request and generated 3 comments.
Comments suppressed due to low confidence (2)
lightrag/api/routers/ollama_api.py:527
- The function call was updated to `query_llm_model_func`, but the kwargs here still use `self.rag.llm_model_kwargs` directly instead of checking for role-specific kwargs. This is inconsistent with the `generate` endpoint (lines 306, 432), which properly selects `self.rag.query_llm_model_kwargs` when available. The same `role_kwargs` pattern should be applied here so the query role uses its own provider options (e.g., host, timeout) when configured.
```python
response = await self.rag.query_llm_model_func(
    cleaned_query,
    stream=True,
    history_messages=conversation_history,
    **self.rag.llm_model_kwargs,
)
```
lightrag/api/routers/ollama_api.py:690
- Same issue as the streaming bypass path above: the function call was updated to `query_llm_model_func`, but the kwargs still use `self.rag.llm_model_kwargs` instead of the role-specific kwargs pattern used in the `generate` endpoint. Should use `role_kwargs = self.rag.query_llm_model_kwargs if self.rag.query_llm_model_kwargs is not None else self.rag.llm_model_kwargs` and pass `**role_kwargs`.
```python
response_text = await self.rag.query_llm_model_func(
    cleaned_query,
    stream=False,
    history_messages=conversation_history,
    **self.rag.llm_model_kwargs,
)
```
```python
logger.info(
    f"LLM role functions initialized - "
    f"extract: max_async={self.extract_llm_model_max_async or self.llm_model_max_async}, "
    f"keyword: max_async={self.keyword_llm_model_max_async or self.llm_model_max_async}, "
    f"query: max_async={self.query_llm_model_max_async or self.llm_model_max_async}"
)
```
The log message uses `or` (e.g., `self.extract_llm_model_max_async or self.llm_model_max_async`), which incorrectly treats `0` as falsy. If a role has `max_async=0` explicitly set (which the codebase supports via its `is not None` checks), the log will display the base value instead of `0`. Use `self.extract_llm_model_max_async if self.extract_llm_model_max_async is not None else self.llm_model_max_async` to stay consistent with the actual fallback logic used during wrapping.
```diff
 if request.system:
     self.rag.llm_model_kwargs["system_prompt"] = request.system

 if request.stream:
-    response = await self.rag.llm_model_func(
-        query, stream=True, **self.rag.llm_model_kwargs
+    role_kwargs = self.rag.query_llm_model_kwargs if self.rag.query_llm_model_kwargs is not None else self.rag.llm_model_kwargs
+    response = await self.rag.query_llm_model_func(
+        query, stream=True, **role_kwargs
```
When `query_llm_model_kwargs` is not None, the `system_prompt` set on line 303 (`self.rag.llm_model_kwargs["system_prompt"] = request.system`) won't be included in the kwargs passed to the function, because `role_kwargs` will point to `self.rag.query_llm_model_kwargs` (a different dict). The `system_prompt` should be set on the resolved `role_kwargs` dict, or passed directly as a keyword argument.
```python
async def role_ollama_complete(
    prompt,
    system_prompt=None,
    history_messages=None,
    keyword_extraction=False,
    **kwargs,
):
    keyword_extraction = kwargs.pop("keyword_extraction", None)
    if keyword_extraction:
        kwargs["format"] = "json"
    if _role_ollama_opts:
        kwargs.update(_role_ollama_opts)
    if history_messages is None:
        history_messages = []
    return await _ollama_model_if_cache(
        effective_model,
        prompt,
        system_prompt=system_prompt,
        history_messages=history_messages,
        host=effective_host,
        **kwargs,
    )

return role_ollama_complete
```
When an Ollama role has role-specific configuration (e.g., `EXTRACT_LLM_MODEL` is set while the base binding is also Ollama), both `create_role_llm_func` and `create_role_llm_model_kwargs` return non-None values. The role function (`role_ollama_complete`) already bakes in `host=effective_host` on line 739, but the role kwargs dict from `create_role_llm_model_kwargs` also includes `host`. When the function is invoked through `__post_init__`'s `partial(role_raw_func, hashing_kv=hashing_kv, **role_kwargs)`, the `host` from `role_kwargs` is passed as a kwarg to `role_ollama_complete`, and then on line 739 `host=effective_host` is passed again to `_ollama_model_if_cache`, causing `TypeError: got multiple values for keyword argument 'host'`.
`create_role_llm_model_kwargs` should return None (or an empty dict) when `create_role_llm_func` returns a non-None function, since the role function already encapsulates the provider-specific configuration. Alternatively, `role_ollama_complete` could pop `host`/`timeout`/`options`/`api_key` from `kwargs` before passing them to `_ollama_model_if_cache`.
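A sketch of the second option suggested above, with `_ollama_model_if_cache` stubbed and all names illustrative:

```python
import asyncio

# Stub of lightrag.llm.ollama._ollama_model_if_cache for illustration.
async def _ollama_model_if_cache(model, prompt, host=None, **kwargs):
    return f"{model}@{host}: {prompt}"

def make_role_func(effective_model, effective_host):
    async def role_ollama_complete(prompt, **kwargs):
        # Drop config this wrapper already supplies explicitly, so a
        # duplicate key from role_kwargs cannot raise
        # "got multiple values for keyword argument".
        for baked in ("host", "timeout", "options", "api_key"):
            kwargs.pop(baked, None)
        return await _ollama_model_if_cache(
            effective_model, prompt, host=effective_host, **kwargs
        )
    return role_ollama_complete

func = make_role_func("qwen2.5:14b", "http://extract-host:11434")
# host in the role kwargs no longer collides with the baked-in host:
print(asyncio.run(func("hi", host="http://base:11434")))
```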