♻️ refactor(llm): unify structured output control via response_format#2956
Merged
danielaskdd merged 8 commits into HKUDS:dev on Apr 19, 2026
Conversation
Promote `response_format` to the canonical structured-output parameter
across all LLM providers. Demote `entity_extraction` / `keyword_extraction`
booleans to deprecated shims that map to `{"type": "json_object"}` and
emit a single `DeprecationWarning` at the driver layer.
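A minimal sketch of the driver-layer shim described above (the helper name and its placement are illustrative, not the actual LightRAG code):

```python
import warnings
from typing import Any


def apply_legacy_format_shim(kwargs: dict[str, Any]) -> dict[str, Any]:
    """Map the deprecated extraction booleans onto response_format."""
    # Pop both legacy keys unconditionally so they never reach a provider.
    legacy = [
        k for k in ("entity_extraction", "keyword_extraction") if kwargs.pop(k, False)
    ]
    if legacy:
        warnings.warn(
            f"{', '.join(legacy)} is deprecated; pass "
            'response_format={"type": "json_object"} instead',
            DeprecationWarning,
            stacklevel=3,
        )
        # Respect an explicit response_format supplied by the caller.
        kwargs.setdefault("response_format", {"type": "json_object"})
    return kwargs
```

Because the shim lives in one place, a legacy caller sees exactly one warning no matter how many wrappers the call passes through.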
- Providers translate `response_format` to their native API: OpenAI
passes through, Ollama -> `format`, Gemini ->
`response_mime_type` / `response_schema`, Zhipu forwards to client.
- Zhipu: drop the JSON-prompt injection path; task prompt ownership
returns to the caller.
- Providers without a JSON mode (lollms, lmdeploy, anthropic, hf,
bedrock, llama_index) safely strip `response_format`.
- Server wrappers become pure forwarders; deprecation shims live only
at the driver layer to avoid duplicate warnings.
- Simplify OpenAI dispatch to `create()` only — the project never
passes Pydantic / JSON Schema, so `parse()` buys no extra value and
risks compatibility on OpenAI-alike providers. Truncated responses
are returned raw for upstream tolerant JSON parsing.
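The per-provider translation above can be sketched as a single dispatch function (for illustration only; the real adapters live in separate provider modules and the function name is an assumption):

```python
from typing import Any


def translate_response_format(provider: str, kwargs: dict[str, Any]) -> dict[str, Any]:
    """Rewrite the canonical response_format into a provider-native kwarg."""
    fmt = kwargs.pop("response_format", None)
    if fmt is None:
        return kwargs
    if provider in ("openai", "zhipu"):
        kwargs["response_format"] = fmt  # passed through to the client as-is
    elif provider == "ollama":
        kwargs["format"] = "json"  # Ollama's native JSON switch
    elif provider == "gemini":
        kwargs["response_mime_type"] = "application/json"
        if fmt.get("type") == "json_schema":
            kwargs["response_schema"] = fmt["json_schema"]["schema"]
    # lollms, lmdeploy, anthropic, hf, bedrock, llama_index: safely stripped
    return kwargs
```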
Co-Authored-By: Claude Opus 4.7 <[email protected]>
- add docstrings explaining response_format handling to anthropic, bedrock, gemini, hf, llama_index, lmdeploy, lollms, ollama, openai, and zhipu modules
- document which adapters support OpenAI-style JSON mode vs compatibility shims
- clarify deprecated keyword_extraction and entity_extraction behavior
- update type hints in llama_index_impl from LlamaIndexSettings to Any for flexibility
- fix minor formatting inconsistencies in operate.py and test files
- replace deprecated keyword_extraction parameter with response_format in bedrock test
- add pytest.warns context manager to zhipu test for deprecated parameter usage
…cache partitioning
- update Gemini LLM to use response_json_schema instead of deprecated response_schema
- enhance LLM cache to include response_format in hash computation for proper partitioning
- update OpenAI LLM documentation to clarify json_schema passthrough behavior
- add unit tests for Gemini schema mapping and cache partitioning logic
- add `_normalize_gemini_response_schema` helper to unwrap OpenAI-style json_schema wrappers
- support Pydantic model classes via `model_json_schema()` method
- simplify `_build_generation_config` by delegating schema normalization
- add test coverage for OpenAI json_schema wrapper and Pydantic model inputs
…ders
- add explicit validation to reject typed/Pydantic response_format in gemini, openai, and cache wrapper
- update ollama to support dict-form json_schema response_format unwrapping
- remove deprecated model_json_schema() handling from gemini
- update docstrings to clarify supported response_format types
- add comprehensive tests for rejection behavior and json_schema support
…eters
- map deprecated `keyword_extraction` and `entity_extraction` booleans to `response_format={"type": "json_object"}` when no explicit format is supplied
- force disable COT when structured output is used to prevent reasoning_content from corrupting JSON payload
- update kwargs filtering to include `entity_extraction` in removal list
- add deprecation warnings with migration guidance for legacy parameter usage
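The COT guard amounts to a few lines (a sketch; the flag name `enable_cot` follows the commit message above, the helper name is hypothetical):

```python
from typing import Any


def guard_structured_output(kwargs: dict[str, Any]) -> dict[str, Any]:
    """Force chain-of-thought off when structured output is requested,
    so reasoning_content cannot leak into the JSON payload."""
    if kwargs.get("response_format") is not None:
        kwargs["enable_cot"] = False
    return kwargs
```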
✅ test(zhipu): add coverage for entity extraction and COT interaction
- verify `entity_extraction=True` maps to json_object response format and triggers deprecation warning
- verify explicit `response_format` disables COT automatically to protect JSON output
- disable enable_cot when response_format is specified in gemini and openai modules
- ensure structured JSON output is not polluted with reasoning content
- add tests to verify COT is disabled for streaming structured output in both gemini and openai
Summary
- Promote `response_format` to the canonical structured-output control across every LLM provider; demote `entity_extraction`/`keyword_extraction` to deprecated shims that map to `{"type": "json_object"}` and emit a single `DeprecationWarning` at the driver layer.
- Each provider translates `response_format` to its native surface: OpenAI passes through, Ollama → `format`, Gemini → `response_mime_type`/`response_schema`, Zhipu forwards to its client. Providers without a JSON mode (lollms, lmdeploy, anthropic, hf, bedrock, llama_index) safely strip `response_format`.
- Zhipu: drop the JSON-prompt injection path for `keyword_extraction`; task prompt ownership returns to the caller.
- Server wrappers in `lightrag_server.py` become pure forwarders; deprecation shims live only at the driver layer so legacy callers get exactly one warning.
- Simplify OpenAI dispatch to `create()` only: the project never passes Pydantic / JSON Schema, so `parse()` brings no benefit and risks compatibility on OpenAI-alike providers. Truncated responses are returned raw for upstream tolerant JSON parsing.

Test plan

- `python -m pytest tests` (865 passed, 1 skipped)
- `ruff check lightrag tests` (clean)
- Verified `entity_extraction=True`/`keyword_extraction=True` still produce JSON output and emit `DeprecationWarning`
- Cache key (`arg_hash`) unchanged; existing `extract`/`keywords`/`summary` entries remain readable after upgrade

🤖 Generated with Claude Code