
♻️ refactor(llm): unify structured output control via response_format #2956

Merged
danielaskdd merged 8 commits into HKUDS:dev from danielaskdd:refac/unify-response-format
Apr 19, 2026
Conversation

@danielaskdd
Collaborator

Summary

  • Promote response_format to the canonical structured-output control across every LLM provider; demote entity_extraction / keyword_extraction to deprecated shims that map to {"type": "json_object"} and emit a single DeprecationWarning at the driver layer.
  • Each provider now translates response_format to its native surface: OpenAI passes through, Ollama → format, Gemini → response_mime_type / response_schema, Zhipu forwards to its client. Providers without a JSON mode (lollms, lmdeploy, anthropic, hf, bedrock, llama_index) safely strip response_format.
  • Zhipu: drop the JSON-prompt injection driven by keyword_extraction — task prompt ownership returns to the caller.
  • Server wrappers in lightrag_server.py become pure forwarders; deprecation shims live only at the driver layer so legacy callers get exactly one warning.
  • Simplify OpenAI dispatch to create() only: the project never passes Pydantic / JSON Schema, so parse() brings no benefit and risks compatibility issues with OpenAI-compatible providers. Truncated responses are returned raw for upstream tolerant JSON parsing.

Test plan

  • python -m pytest tests (865 passed, 1 skipped)
  • ruff check lightrag tests (clean)
  • Legacy entity_extraction=True / keyword_extraction=True still produce JSON output and emit DeprecationWarning
  • LLM cache keys (arg_hash) unchanged — existing extract / keywords / summary entries remain readable after upgrade

🤖 Generated with Claude Code

danielaskdd and others added 8 commits April 19, 2026 02:24
Promote `response_format` to the canonical structured-output parameter
across all LLM providers. Demote `entity_extraction` / `keyword_extraction`
booleans to deprecated shims that map to `{"type": "json_object"}` and
emit a single `DeprecationWarning` at the driver layer.

- Providers translate `response_format` to their native API: OpenAI
  passes through, Ollama -> `format`, Gemini ->
  `response_mime_type` / `response_schema`, Zhipu forwards to client.
- Zhipu: drop the JSON-prompt injection path; task prompt ownership
  returns to the caller.
- Providers without a JSON mode (lollms, lmdeploy, anthropic, hf,
  bedrock, llama_index) safely strip `response_format`.
- Server wrappers become pure forwarders; deprecation shims live only
  at the driver layer to avoid duplicate warnings.
- Simplify OpenAI dispatch to `create()` only: the project never
  passes Pydantic / JSON Schema, so `parse()` brings no benefit and
  risks compatibility issues with OpenAI-compatible providers.
  Truncated responses are returned raw for upstream tolerant JSON
  parsing.

Co-Authored-By: Claude Opus 4.7 <[email protected]>
- add docstrings explaining response_format handling to anthropic, bedrock, gemini, hf, llama_index, lmdeploy, lollms, ollama, openai, and zhipu modules
- document which adapters support OpenAI-style JSON mode vs compatibility shims
- clarify deprecated keyword_extraction and entity_extraction behavior
- update type hints in llama_index_impl from LlamaIndexSettings to Any for flexibility
- fix minor formatting inconsistencies in operate.py and test files
- replace deprecated keyword_extraction parameter with response_format in bedrock test
- add pytest.warns context manager to zhipu test for deprecated parameter usage
…cache partitioning

- update Gemini LLM to use response_json_schema instead of deprecated response_schema
- enhance LLM cache to include response_format in hash computation for proper partitioning
- update OpenAI LLM documentation to clarify json_schema passthrough behavior
- add unit tests for Gemini schema mapping and cache partitioning logic
- add `_normalize_gemini_response_schema` helper to unwrap OpenAI-style json_schema wrappers
- support Pydantic model classes via `model_json_schema()` method
- simplify `_build_generation_config` by delegating schema normalization
- add test coverage for OpenAI json_schema wrapper and Pydantic model inputs
…ders

- add explicit validation to reject typed/Pydantic response_format in gemini, openai, and cache wrapper
- update ollama to support dict-form json_schema response_format unwrapping
- remove deprecated model_json_schema() handling from gemini
- update docstrings to clarify supported response_format types
- add comprehensive tests for rejection behavior and json_schema support
…eters

- map deprecated `keyword_extraction` and `entity_extraction` booleans to `response_format={"type": "json_object"}` when no explicit format is supplied
- force disable COT when structured output is used to prevent reasoning_content from corrupting JSON payload
- update kwargs filtering to include `entity_extraction` in removal list
- add deprecation warnings with migration guidance for legacy parameter usage

✅ test(zhipu): add coverage for entity extraction and COT interaction

- verify `entity_extraction=True` maps to json_object response format and triggers deprecation warning
- verify explicit `response_format` disables COT automatically to protect JSON output
- disable enable_cot when response_format is specified in gemini and openai modules
- ensures structured JSON output is not polluted with reasoning content
- add tests to verify COT is disabled for streaming structured output in both gemini and openai
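The assertions these tests make can be checked even without pytest; `fake_driver` below is a stand-in that reproduces only the shim and COT-disable behavior under test, not a real driver:

```python
import warnings

def fake_driver(**kwargs):
    # Stand-in for the zhipu driver: only the deprecation shim and the
    # COT guard are reproduced here.
    if kwargs.pop("entity_extraction", False):
        warnings.warn("entity_extraction is deprecated", DeprecationWarning)
        kwargs.setdefault("response_format", {"type": "json_object"})
    if kwargs.get("response_format") is not None:
        # Protect the JSON payload from interleaved reasoning_content.
        kwargs["enable_cot"] = False
    return kwargs

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    out = fake_driver(entity_extraction=True)

assert any(issubclass(w.category, DeprecationWarning) for w in caught)
assert out == {"response_format": {"type": "json_object"}, "enable_cot": False}
```

In the real test suite the same check is written with `pytest.warns(DeprecationWarning)` around the driver call.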
@danielaskdd danielaskdd reopened this Apr 19, 2026
@danielaskdd danielaskdd merged commit e592b8b into HKUDS:dev Apr 19, 2026
0 of 6 checks passed
@danielaskdd danielaskdd deleted the refac/unify-response-format branch April 19, 2026 06:11
