feat: separate LLM functions by role (extract/keyword/query)#2758

Open
Saswatsusmoy wants to merge 3 commits into HKUDS:main from Saswatsusmoy:feat/separate-llm-by-role

Conversation

@Saswatsusmoy

Implement per-role LLM function separation as specified in #2741.
Each role (extract, keyword, query) gets its own independently wrapped
LLM function with a dedicated concurrency queue, enabling fine-grained
control over model selection, concurrency limits, timeouts, and
provider-specific options per task type.

Closes #2741

Key changes:

  • lightrag.py: Add 12 dataclass fields for per-role config, wrap each
    role function independently in __post_init__, store raw functions to
    prevent double-wrapping on runtime updates, add update_llm_role_config()
    with binding/model/host/api_key/provider_options metadata support and
    atomic rollback on failure.

  • operate.py: Route extract_entities and _summarize_descriptions through
    extract_llm_model_func, extract_keywords_only through
    keyword_llm_model_func, and kg_query/naive_query through
    query_llm_model_func.

  • api/config.py: Parse 18 per-role environment variables
    (EXTRACT_LLM_BINDING, KEYWORD_LLM_MODEL, MAX_ASYNC_QUERY_LLM, etc.).

  • api/lightrag_server.py: Build role-specific LLM functions at startup
    with cross-provider validation (model and api_key required when binding
    differs from base), role-specific provider options via
    options_dict_for_role(), and /health endpoint role config exposure.

  • llm/binding_options.py: Add options_dict_for_role() supporting
    per-role provider option env vars ({ROLE}_{PROVIDER}_{FIELD}, e.g.
    EXTRACT_OPENAI_LLM_TEMPERATURE) with same-provider inheritance and
    cross-provider default reset.

  • api/routers/ollama_api.py: Route Ollama emulation calls through
    query_llm_model_func.
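The wrapping scheme the description lays out can be sketched roughly as follows. This is a minimal sketch for one role, not the PR's code: `limit_async_func_call`, `RagSketch`, and the reduced field set are simplifications; only the field names `extract_llm_model_func`, `extract_llm_model_max_async`, and `_raw_role_llm_funcs` come from the PR.

```python
import asyncio
from dataclasses import dataclass
from typing import Callable, Optional

def limit_async_func_call(max_size: int):
    """Wrap an async function so at most max_size calls run concurrently."""
    sem = asyncio.Semaphore(max_size)
    def decorator(func):
        async def wrapper(*args, **kwargs):
            async with sem:
                return await func(*args, **kwargs)
        return wrapper
    return decorator

@dataclass
class RagSketch:
    llm_model_func: Callable = None
    llm_model_max_async: int = 4
    # Per-role overrides; None means "fall back to the base value".
    extract_llm_model_func: Optional[Callable] = None
    extract_llm_model_max_async: Optional[int] = None

    def __post_init__(self):
        # Keep the raw callable so a runtime update can re-wrap it
        # without stacking limiters (no double-wrapping).
        raw = self.extract_llm_model_func or self.llm_model_func
        self._raw_role_llm_funcs = {"extract": raw}
        max_async = (
            self.extract_llm_model_max_async
            if self.extract_llm_model_max_async is not None
            else self.llm_model_max_async
        )
        # A per-role semaphore gives each role its own concurrency queue.
        self.extract_llm_model_func = limit_async_func_call(max_async)(raw)
```

With no role-specific config, `raw` is the base function and the role queue simply mirrors the base limit, which is the backward-compatibility property the PR claims.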

Copilot AI review requested due to automatic review settings March 8, 2026 14:41

@chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 526de307e6


Comment thread lightrag/lightrag.py Outdated
getattr(self, role_max_async_attr) or self.llm_model_max_async
)
role_timeout = getattr(self, role_timeout_attr) or self.default_llm_timeout
role_kwargs = getattr(self, role_kwargs_attr) or self.llm_model_kwargs


P1: Stop inheriting base kwargs into cross-provider role funcs

This fallback unconditionally injects base llm_model_kwargs into role functions whenever the role kwargs are None, which breaks cross-provider setups from ollama/lollms to OpenAI/Gemini/Azure: those base kwargs include keys like api_key and options, and the role wrappers already pass explicit api_key, causing runtime failures such as TypeError: got multiple values for keyword argument 'api_key'. This makes role-specific calls fail instead of using their configured provider.
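A minimal reproduction of the failure mode described here. The function names are hypothetical stand-ins; only the kwargs-collision mechanics match the report:

```python
import asyncio

async def openai_complete(prompt, api_key=None, **kwargs):
    # Stand-in for a provider completion function.
    return f"answered with key={api_key!r}"

async def role_wrapper(prompt, **kwargs):
    # The role wrapper already passes its configured api_key explicitly.
    return await openai_complete(prompt, api_key="role-key", **kwargs)

base_kwargs = {"api_key": "base-key"}  # base-provider kwargs (e.g. ollama)

# Buggy fallback: `{} or base_kwargs` injects the base kwargs, so api_key
# is passed twice and the call fails.
merged = {} or base_kwargs
try:
    asyncio.run(role_wrapper("hi", **merged))
except TypeError as exc:
    print(exc)  # ... got multiple values for keyword argument 'api_key'

# An `is not None` fallback keeps an explicit empty dict and succeeds.
role_kwargs = {}
merged = role_kwargs if role_kwargs is not None else base_kwargs
print(asyncio.run(role_wrapper("hi", **merged)))
```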


Comment thread lightrag/api/lightrag_server.py Outdated
Comment on lines +715 to +718
elif effective_binding == "ollama":
from lightrag.llm.ollama import ollama_model_complete

return ollama_model_complete


P2: Apply role model when binding is Ollama

The Ollama role branch returns the raw ollama_model_complete function without binding effective_model, so EXTRACT/KEYWORD/QUERY_LLM_MODEL does not actually change the model for role-specific Ollama calls. Because the Ollama completion function reads model name from global config, all roles continue using the base model, which defeats per-role model selection.
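The fix this comment points toward amounts to binding the role's model and host into a wrapper instead of returning the raw function. A rough sketch with stand-in names (`ollama_model_complete` here is a dummy, not the real import):

```python
import asyncio
from functools import partial

async def ollama_model_complete(prompt, model=None, host=None, **kwargs):
    # Stand-in for lightrag.llm.ollama.ollama_model_complete; the real
    # function falls back to the globally configured model when unset.
    return f"{model}@{host}: {prompt}"

def build_role_func(effective_model: str, effective_host: str):
    # Bind the role's model/host at construction time so role-specific
    # Ollama calls actually honor EXTRACT/KEYWORD/QUERY_LLM_MODEL.
    return partial(ollama_model_complete,
                   model=effective_model, host=effective_host)
```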


Comment thread lightrag/lightrag.py
Comment on lines +881 to +882
if model_func is not None:
self._raw_role_llm_funcs[role] = model_func


P2: Roll back raw role callable on update failure

This writes the new raw callable before the wrapped function is validated; if wrapping fails, the except block restores only the exposed role func and metadata, leaving _raw_role_llm_funcs[role] with the invalid value. After one failed update, later updates that rely on the stored raw func can keep failing, so the documented atomic rollback is incomplete.


Contributor

Copilot AI left a comment


Pull request overview

This PR implements per-role LLM function separation as specified in issue #2741, allowing independent LLM model selection, concurrency limits, timeouts, and provider options for three roles: extract (entity/relation extraction), keyword (keyword extraction), and query (answer generation). Each role gets its own independently wrapped function and concurrency queue, while maintaining full backward compatibility when no role-specific config is provided.

Changes:

  • Add 12 per-role dataclass fields to LightRAG, wrap each role function independently in __post_init__, and provide update_llm_role_config() for runtime updates with rollback on failure.
  • Route LLM calls in operate.py through role-specific functions and update Ollama API emulation endpoints in ollama_api.py to use query_llm_model_func.
  • Add per-role environment variable parsing in config.py, role-specific LLM function construction with cross-provider validation in lightrag_server.py, per-role provider options via options_dict_for_role() in binding_options.py, and comprehensive documentation in env.example.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
lightrag/lightrag.py Adds 12 per-role dataclass fields, wraps role functions independently in __post_init__, adds update_llm_role_config() with metadata and rollback
lightrag/operate.py Routes extract_entities/summarize to extract func, keywords to keyword func, kg_query/naive_query to query func
lightrag/api/config.py Parses 18 per-role environment variables for binding/model/host/api_key/max_async/timeout
lightrag/api/lightrag_server.py Adds create_role_llm_func(), create_role_llm_model_kwargs(), role config construction loop, and health endpoint role exposure
lightrag/llm/binding_options.py Adds options_dict_for_role() classmethod with same-provider inheritance and cross-provider default reset
lightrag/api/routers/ollama_api.py Routes Ollama emulation calls through query_llm_model_func
tests/test_llm_role_separation.py Comprehensive tests covering backward compat, isolation, custom functions, concurrency, runtime updates, config parsing, and provider options
tests/test_extract_entities.py Updates test config to include role-specific function keys and use extract_llm_model_func
env.example Documents per-role LLM configuration environment variables with examples


Comment thread lightrag/lightrag.py
Comment on lines +880 to +882
# Update stored raw function if a new one is provided
if model_func is not None:
self._raw_role_llm_funcs[role] = model_func

Copilot AI Mar 8, 2026


The rollback logic does not restore _raw_role_llm_funcs[role] on failure. At line 882, self._raw_role_llm_funcs[role] = model_func is updated before the potentially failing priority_limit_async_func_call wrapping at lines 897-907. If wrapping fails, the except block at lines 938-943 restores func_attr and _role_llm_metadata, but leaves _raw_role_llm_funcs[role] pointing to the new (failed) function. This violates the stated guarantee that "On failure, the previous config remains intact." A subsequent call to update_llm_role_config without a new model_func would then use this corrupted raw function.

Save the previous raw function alongside prev_func and prev_metadata, and restore it in the except block.
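A self-contained sketch of that suggested fix. Everything here is simplified: `_wrap` stands in for the real `priority_limit_async_func_call` wrapping, and the class covers one role only; the point is snapshotting `_raw_role_llm_funcs[role]` alongside the other state.

```python
class RoleConfigSketch:
    def __init__(self, base_func):
        self._raw_role_llm_funcs = {"query": base_func}
        self._role_llm_metadata = {"query": {}}
        self.query_llm_model_func = base_func

    def _wrap(self, func):
        # Stand-in for the real wrapping step; may raise on bad input.
        if not callable(func):
            raise TypeError("model_func must be callable")
        def wrapped(*a, **kw):
            return func(*a, **kw)
        return wrapped

    def update_llm_role_config(self, role, model_func=None, **metadata):
        prev_func = getattr(self, f"{role}_llm_model_func")
        prev_metadata = dict(self._role_llm_metadata[role])
        prev_raw = self._raw_role_llm_funcs[role]  # snapshot the raw func too
        try:
            if model_func is not None:
                self._raw_role_llm_funcs[role] = model_func
            wrapped = self._wrap(self._raw_role_llm_funcs[role])
            setattr(self, f"{role}_llm_model_func", wrapped)
            self._role_llm_metadata[role] = {**prev_metadata, **metadata}
        except Exception:
            # Restore ALL touched state, including the raw callable,
            # so a failed update really leaves the previous config intact.
            setattr(self, f"{role}_llm_model_func", prev_func)
            self._role_llm_metadata[role] = prev_metadata
            self._raw_role_llm_funcs[role] = prev_raw
            raise
```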

Comment thread lightrag/lightrag.py Outdated
Comment on lines +756 to +760
role_max_async = (
getattr(self, role_max_async_attr) or self.llm_model_max_async
)
role_timeout = getattr(self, role_timeout_attr) or self.default_llm_timeout
role_kwargs = getattr(self, role_kwargs_attr) or self.llm_model_kwargs

Copilot AI Mar 8, 2026


Using the or operator for fallback (getattr(self, role_kwargs_attr) or self.llm_model_kwargs) causes an explicit empty dict {} to fall back to base kwargs, since empty dict is falsy. The same applies to role_max_async where 0 would fall back to the base value. This contradicts the intent of allowing explicit override with these values. Consider using if ... is not None checks instead of or (e.g., getattr(self, role_kwargs_attr) if getattr(self, role_kwargs_attr) is not None else self.llm_model_kwargs) for the kwargs and max_async/timeout fallbacks. The same issue exists in update_llm_role_config at lines 883-893.

Suggested change
role_max_async = (
getattr(self, role_max_async_attr) or self.llm_model_max_async
)
role_timeout = getattr(self, role_timeout_attr) or self.default_llm_timeout
role_kwargs = getattr(self, role_kwargs_attr) or self.llm_model_kwargs
role_max_async_value = getattr(self, role_max_async_attr)
role_max_async = (
role_max_async_value
if role_max_async_value is not None
else self.llm_model_max_async
)
role_timeout_value = getattr(self, role_timeout_attr)
role_timeout = (
role_timeout_value
if role_timeout_value is not None
else self.default_llm_timeout
)
role_kwargs_value = getattr(self, role_kwargs_attr)
role_kwargs = (
role_kwargs_value
if role_kwargs_value is not None
else self.llm_model_kwargs
)

Comment thread lightrag/api/routers/ollama_api.py Outdated
Comment on lines 306 to 307
response = await self.rag.query_llm_model_func(
query, stream=True, **self.rag.llm_model_kwargs

Copilot AI Mar 8, 2026


When calling query_llm_model_func, passing **self.rag.llm_model_kwargs (the base LLM kwargs) will override any role-specific kwargs that were baked into the query function via partial() during initialization. For example, if the query role uses a different host than base, the base host from self.rag.llm_model_kwargs would override it. This was a pre-existing pattern, but it becomes more problematic now that the query role can have its own kwargs. Consider using self.rag.query_llm_model_kwargs or self.rag.llm_model_kwargs instead of always using the base kwargs.

Comment on lines +18 to +19
- Operate.py routing: kg_query uses query role
- Operate.py routing: naive_query uses query role

Copilot AI Mar 8, 2026


The module docstring claims test coverage for "kg_query uses query role" and "naive_query uses query role" (lines 18-19), but no test methods for these are actually implemented. Since the extract_entities and extract_keywords_only routing are tested, the kg_query and naive_query routing should be tested as well for completeness.

Suggested change
- Operate.py routing: kg_query uses query role
- Operate.py routing: naive_query uses query role

@danielaskdd
Collaborator

Please resolve all conflicts in env.example to ensure the PR is ready for review. The repository now features an interactive setup wizard to streamline .env file generation, and env.example has been updated to serve as its configuration template. Please adhere to the following requirements:

  • All configurable environment variables must be included in this file (either active or commented out).
  • Lines starting with # # denote repeated placeholders; these must remain as-is and will not be substituted with actual values by the setup wizard.
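Following those conventions, a role-specific block in env.example might look like the fragment below. The values are hypothetical; `MAX_ASYNC_QUERY_LLM` is one of the variables named in this PR, the others follow its `{ROLE}_LLM_*` pattern, and only the `# #` placeholder convention is prescribed by the comment above.

```
### Per-role query LLM (example values are hypothetical)
# QUERY_LLM_BINDING=openai
# QUERY_LLM_MODEL=gpt-4o-mini
# MAX_ASYNC_QUERY_LLM=4
# # QUERY_LLM_BINDING_API_KEY=
```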

For more information on the interactive setup tool, please refer to: https://github.com/HKUDS/LightRAG/blob/main/docs/InteractiveSetup.md

Implement per-role LLM function separation as specified in HKUDS#2741.
Each role (extract, keyword, query) gets its own independently wrapped
LLM function with a dedicated concurrency queue, enabling fine-grained
control over model selection, concurrency limits, timeouts, and
provider-specific options per task type.

Key changes:

- lightrag.py: Add 12 dataclass fields for per-role config, wrap each
  role function independently in __post_init__, store raw functions to
  prevent double-wrapping on runtime updates, add update_llm_role_config()
  with binding/model/host/api_key/provider_options metadata support and
  atomic rollback on failure.

- operate.py: Route extract_entities and _summarize_descriptions through
  extract_llm_model_func, extract_keywords_only through
  keyword_llm_model_func, and kg_query/naive_query through
  query_llm_model_func.

- api/config.py: Parse 18 per-role environment variables
  (EXTRACT_LLM_BINDING, KEYWORD_LLM_MODEL, MAX_ASYNC_QUERY_LLM, etc.).

- api/lightrag_server.py: Build role-specific LLM functions at startup
  with cross-provider validation (model and api_key required when binding
  differs from base), role-specific provider options via
  options_dict_for_role(), and /health endpoint role config exposure.

- llm/binding_options.py: Add options_dict_for_role() supporting
  per-role provider option env vars ({ROLE}_{PROVIDER}_{FIELD}, e.g.
  EXTRACT_OPENAI_LLM_TEMPERATURE) with same-provider inheritance and
  cross-provider default reset.

- api/routers/ollama_api.py: Route Ollama emulation calls through
  query_llm_model_func.

Fully backward compatible: no new configuration produces identical
behavior to the previous single-LLM pattern.

Closes HKUDS#2741
- Use `is not None` instead of `or` for kwargs/max_async/timeout fallbacks
  to allow explicit 0/{} overrides (fixes cross-provider kwargs injection)
- Add complete rollback of _raw_role_llm_funcs on update failure
- Create wrapper for Ollama role binding to apply effective_model (like
  openai/gemini/azure do) instead of returning raw ollama_model_complete
- Use role-specific kwargs in Ollama API router instead of base kwargs
- Add kg_query and naive_query routing tests
- Rebase onto latest main to resolve env.example divergence
@Saswatsusmoy force-pushed the feat/separate-llm-by-role branch from 51a9228 to 3afa3a7 on March 10, 2026 09:39
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 3 comments.

Comments suppressed due to low confidence (2)

lightrag/api/routers/ollama_api.py:527

  • The function call was updated to query_llm_model_func, but the kwargs here still use self.rag.llm_model_kwargs directly instead of checking for role-specific kwargs. This is inconsistent with the generate endpoint (lines 306, 432) which properly selects self.rag.query_llm_model_kwargs when available. The same role_kwargs pattern should be applied here to ensure the query role uses its own provider options (e.g., host, timeout) when configured.
                        response = await self.rag.query_llm_model_func(
                            cleaned_query,
                            stream=True,
                            history_messages=conversation_history,
                            **self.rag.llm_model_kwargs,
                        )

lightrag/api/routers/ollama_api.py:690

  • Same issue as the streaming bypass path above: the function call was updated to query_llm_model_func, but the kwargs still use self.rag.llm_model_kwargs instead of the role-specific kwargs pattern used in the generate endpoint. Should use role_kwargs = self.rag.query_llm_model_kwargs if self.rag.query_llm_model_kwargs is not None else self.rag.llm_model_kwargs and pass **role_kwargs.
                        response_text = await self.rag.query_llm_model_func(
                            cleaned_query,
                            stream=False,
                            history_messages=conversation_history,
                            **self.rag.llm_model_kwargs,
                        )


Comment thread lightrag/lightrag.py
Comment on lines +790 to +795
logger.info(
f"LLM role functions initialized - "
f"extract: max_async={self.extract_llm_model_max_async or self.llm_model_max_async}, "
f"keyword: max_async={self.keyword_llm_model_max_async or self.llm_model_max_async}, "
f"query: max_async={self.query_llm_model_max_async or self.llm_model_max_async}"
)

Copilot AI Mar 10, 2026


The log message uses or (e.g., self.extract_llm_model_max_async or self.llm_model_max_async) which incorrectly treats 0 as falsy. If a role has max_async=0 explicitly set (which the codebase supports per the is not None checks), the log will incorrectly display the base value instead of 0. Use self.extract_llm_model_max_async if self.extract_llm_model_max_async is not None else self.llm_model_max_async to be consistent with the actual fallback logic used during wrapping.

Comment on lines 302 to +308
if request.system:
self.rag.llm_model_kwargs["system_prompt"] = request.system

if request.stream:
response = await self.rag.llm_model_func(
query, stream=True, **self.rag.llm_model_kwargs
role_kwargs = self.rag.query_llm_model_kwargs if self.rag.query_llm_model_kwargs is not None else self.rag.llm_model_kwargs
response = await self.rag.query_llm_model_func(
query, stream=True, **role_kwargs

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When query_llm_model_kwargs is not None, the system_prompt set on line 303 (self.rag.llm_model_kwargs["system_prompt"] = request.system) won't be included in the kwargs passed to the function, because role_kwargs will point to self.rag.query_llm_model_kwargs (a different dict). The system_prompt should be set on the role_kwargs dict after it's resolved, or passed directly as a keyword argument.
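One way to express that fix is to resolve the kwargs dict first and then apply per-request keys to a copy. `resolve_role_kwargs` is a hypothetical helper sketched for illustration, not part of the PR:

```python
def resolve_role_kwargs(role_kwargs, base_kwargs, system_prompt=None):
    """Pick the role-specific kwargs when set, else the base kwargs, and
    apply the per-request system prompt to a copy so neither stored dict
    is mutated and the prompt reaches whichever dict is actually used."""
    resolved = dict(role_kwargs if role_kwargs is not None else base_kwargs)
    if system_prompt:
        resolved["system_prompt"] = system_prompt
    return resolved
```

Copying also avoids the pre-existing mutation of `self.rag.llm_model_kwargs` on every request.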

Comment on lines +720 to +743
async def role_ollama_complete(
prompt,
system_prompt=None,
history_messages=None,
keyword_extraction=False,
**kwargs,
):
keyword_extraction = kwargs.pop("keyword_extraction", None)
if keyword_extraction:
kwargs["format"] = "json"
if _role_ollama_opts:
kwargs.update(_role_ollama_opts)
if history_messages is None:
history_messages = []
return await _ollama_model_if_cache(
effective_model,
prompt,
system_prompt=system_prompt,
history_messages=history_messages,
host=effective_host,
**kwargs,
)

return role_ollama_complete

Copilot AI Mar 10, 2026


When an Ollama role has a role-specific configuration (e.g., EXTRACT_LLM_MODEL is set while base binding is also Ollama), both create_role_llm_func and create_role_llm_model_kwargs return non-None values. The role function (role_ollama_complete) already bakes in host=effective_host on line 739, but the role kwargs dict from create_role_llm_model_kwargs also includes host. When the function is invoked through __post_init__'s partial(role_raw_func, hashing_kv=hashing_kv, **role_kwargs), the host from role_kwargs would be passed as a kwarg to role_ollama_complete, and then on line 739, host=effective_host would be passed again to _ollama_model_if_cache, causing a TypeError: got multiple values for keyword argument 'host'.

The create_role_llm_model_kwargs function should return None (or an empty dict) when create_role_llm_func returns a non-None function, since the role function already encapsulates the provider-specific configuration. Alternatively, role_ollama_complete could pop host/timeout/options/api_key from kwargs before passing them to _ollama_model_if_cache.
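The second workaround can be sketched as follows. `_ollama_model_if_cache` here is a dummy with an assumed signature, not the real implementation; the point is stripping the provider keys the wrapper already binds before forwarding `**kwargs`:

```python
import asyncio

async def _ollama_model_if_cache(model, prompt, system_prompt=None,
                                 history_messages=None, host=None, **kwargs):
    # Stand-in for the real Ollama call.
    return f"{model}@{host}: {prompt}"

def make_role_ollama_complete(effective_model, effective_host):
    async def role_ollama_complete(prompt, system_prompt=None,
                                   history_messages=None, **kwargs):
        # The wrapper binds model/host itself; drop any duplicates that
        # arrive via the role kwargs dict before forwarding **kwargs,
        # avoiding "got multiple values for keyword argument 'host'".
        for key in ("host", "timeout", "api_key", "options"):
            kwargs.pop(key, None)
        return await _ollama_model_if_cache(
            effective_model, prompt,
            system_prompt=system_prompt,
            history_messages=history_messages or [],
            host=effective_host,
            **kwargs,
        )
    return role_ollama_complete
```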


Successfully merging this pull request may close these issues.

[Feature Request]: Separate LLM func by role