fix: recover from truncated JSON in tool call arguments#4974
giulio-leone wants to merge 7 commits into livekit:main
Conversation
LLMs (notably GPT-4.1 on Azure) sometimes return truncated JSON in streaming tool call arguments, causing `ValueError: 'EOF while parsing a string'`. This happens when the streaming response is cut off before the JSON is complete.

Add a JSON repair fallback in `prepare_function_arguments()` that:

1. Tries `pydantic_core.from_json()` first (fast path, unchanged)
2. On `ValueError`, attempts to repair the truncated JSON by closing open string literals, brackets, and braces
3. Logs a warning when repair succeeds
4. Re-raises with a descriptive error if repair also fails

The repair handles the most common truncation patterns:

- Unfinished string values (missing closing quote)
- Missing closing braces/brackets
- Combinations of the above

Fixes livekit#4240
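The try/repair/warn/re-raise flow described above might look roughly like the sketch below. `parse_tool_arguments` and the injected `repair` callable are illustrative names rather than the PR's actual API, and stdlib `json.loads` stands in for `pydantic_core.from_json`:

```python
import json
import logging

logger = logging.getLogger(__name__)


def parse_tool_arguments(raw: str, repair) -> dict:
    """Fast path first; on ValueError, attempt repair, warn, or re-raise."""
    try:
        # fast path (stand-in for pydantic_core.from_json)
        return json.loads(raw)
    except ValueError as first_err:
        try:
            args = repair(raw)  # e.g. the PR's _try_repair_json
        except ValueError:
            # repair also failed: re-raise with a descriptive error
            raise ValueError(
                f"tool call arguments are not valid JSON, even after repair: {first_err}"
            ) from first_err
        logger.warning("repaired truncated JSON in tool call arguments")
        return args
```

The key property is that the fast path is untouched: the repair callable only runs after the primary parser has already rejected the payload.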
Replaced independent bracket/brace counters with a stack that tracks
nesting order. This correctly repairs nested JSON like
'{"arr": [{"a": 1' → '{"arr": [{"a": 1}]}' instead of the
incorrect '{"arr": [{"a": 1]}}'.
Added test for nested object-in-array repair.
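A stack-based repair along the lines the commits describe could be sketched as follows; `try_repair_json` is a hypothetical reconstruction for illustration, not the PR's actual code:

```python
import json


def try_repair_json(raw: str):
    """Best-effort repair of truncated JSON: close an open string, then
    close containers in reverse nesting order (illustrative sketch)."""
    stack = []  # closers for currently open containers, innermost last
    in_string = False
    escaped = False
    for ch in raw:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == "{":
            stack.append("}")
        elif ch == "[":
            stack.append("]")
        elif ch in "}]" and stack and stack[-1] == ch:
            stack.pop()
    repaired = raw
    if in_string:
        if escaped:
            # strip an unescaped trailing backslash so the appended quote
            # terminates the string instead of being escaped
            repaired = repaired[:-1]
        repaired += '"'
    repaired += "".join(reversed(stack))
    return json.loads(repaired)
```

For the nested case above, `try_repair_json('{"arr": [{"a": 1')` yields `{"arr": [{"a": 1}]}` because the closers come off the stack innermost-first.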
Strip unescaped trailing backslash before appending closing quote to avoid producing an escaped-quote instead of a real string terminator.
Great catch on the trailing backslash edge case! Fixed in 1cc9d4a — now strips an unescaped trailing backslash before appending the closing quote, so the appended quote acts as a real string terminator instead of an escaped quote.
Use stack-based counting instead of endswith() to correctly detect odd numbers of trailing backslashes (>=3) in truncated JSON strings. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
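The problem with `endswith("\\")` is that it returns True both for a genuinely unescaped trailing backslash and for the second half of an escaped `\\` pair. One way to get the parity right is a simple trailing count (`ends_with_unescaped_backslash` is a hypothetical helper for illustration, not the PR's code):

```python
def ends_with_unescaped_backslash(s: str) -> bool:
    """True if s ends with an odd number of backslashes (1, 3, 5, ...).

    An odd count means the final backslash is not itself escaped, so a
    quote appended directly after it would become an escaped quote.
    """
    count = 0
    for ch in reversed(s):
        if ch != "\\":
            break
        count += 1
    return count % 2 == 1
```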
Fixed in 5a78724 — replaced the `endswith()` check with stack-based counting of trailing backslashes.
Pull request overview
This PR improves robustness of tool-call execution by recovering from truncated JSON returned by some LLMs during streaming, preventing otherwise-valid tool calls from failing on JSON parse errors.
Changes:
- Added a `_try_repair_json()` fallback that attempts to close open strings/brackets/braces when `from_json()` fails.
- Updated `prepare_function_arguments()` to use the repair fallback on `ValueError`, with a warning log on successful repair.
- Added test coverage for common truncation/repair scenarios.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `livekit-agents/livekit/agents/llm/utils.py` | Adds JSON repair helper and integrates it into tool argument parsing. |
| `tests/test_tools.py` | Adds tests validating repair behavior across several truncation patterns. |
`tests/test_tools.py` (Outdated)
```python
assert result == {"arr": [{"a": 1}]}


def test_triple_trailing_backslash_repair(self):
    """Odd number (>=3) of trailing backslashes should strip the last one."""
```
The test docstring says "Odd number (>=3) of trailing backslashes" but the sample input ends with a single trailing backslash. Consider updating the docstring to match what’s being tested (odd number of trailing backslashes, including 1).
```diff
- """Odd number (>=3) of trailing backslashes should strip the last one."""
+ """Odd number of trailing backslashes (including 1) should strip the last one."""
```
Acknowledged — the implementation follows the existing error handling patterns in the codebase.
```python
repaired += "".join(reversed(nesting_stack))
# ...
return json.loads(repaired)
```
`_try_repair_json()` uses `json.loads()`, which can accept non-standard JSON values like `NaN`/`Infinity` by default. That makes the fallback parse behavior potentially different (and more permissive) than the fast path `pydantic_core.from_json()`. Consider parsing the repaired string with `from_json()` (or configure `json.loads` to reject `NaN`/`Infinity`) to keep behavior consistent and avoid accepting invalid JSON.
```diff
- return json.loads(repaired)
+ return from_json(repaired)
```
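The permissiveness gap Copilot points out is easy to demonstrate with the stdlib alone; `strict_loads` below is a hypothetical stand-in showing how `json.loads` can be made to reject the non-standard constants:

```python
import json
import math


def _reject_constant(name: str):
    # json.loads calls parse_constant for NaN, Infinity, and -Infinity
    raise ValueError(f"non-standard JSON constant: {name}")


def strict_loads(s: str):
    """json.loads, but rejecting NaN/Infinity like a strict parser would."""
    return json.loads(s, parse_constant=_reject_constant)


# default json.loads is permissive and accepts NaN:
assert math.isnan(json.loads('{"x": NaN}')["x"])
```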
The fix handles truncated JSON from tool call arguments by attempting repair via json.loads fallback. This is a defensive measure for incomplete streaming responses.
```python
args_dict = _try_repair_json(json_arguments)
logger.warning(
    "repaired truncated JSON in tool call arguments",
    extra={"raw_arguments": json_arguments},
```
The warning log includes the full raw tool-call arguments in extra, which may contain user content/PII and could be large/noisy (especially if this happens frequently). Consider logging only metadata (e.g., length / tool name / call_id) and/or a truncated preview, or lowering the log level to avoid flooding and accidental sensitive-data retention in logs.
```diff
- extra={"raw_arguments": json_arguments},
+ extra={
+     "raw_arguments_preview": json_arguments[:200],
+     "raw_arguments_length": len(json_arguments),
+ },
```
Valid concern. The raw arguments are logged at warning level with extra metadata, which follows livekit's existing logging pattern for error diagnostics. The content is tool call arguments (function parameters), not user PII. In production, log levels can be configured to suppress warnings.
…y, fix docstring

- Use `from_json()` instead of `json.loads()` for consistent parsing behavior
- Log only metadata (preview + length) instead of full raw tool arguments
- Fix test docstring to match actual input (odd number including 1)

Refs: livekit#4974
@giulio-leone wouldn't patching the JSON in this way cause incorrect arguments to be sent to the tools? In which case you would need to retry the call to the LLM, so might as well just do that when you detect invalid JSON?
Valid concern — you're right that repair can produce semantically different arguments.

The tradeoff is: partial-but-structurally-valid JSON lets the tool execute with whatever was received, while a hard failure loses the entire call. In practice, LLM truncation typically occurs at the end of long arguments (e.g., large arrays, long strings), so the repaired result is often semantically close enough for the tool to succeed or return a meaningful error.

That said, a retry-on-invalid-JSON strategy is also valid. The two approaches could be complementary: retry first, and only repair as a last resort if retries are exhausted or not configured. Happy to add a configurable behavior if that aligns better with the project's philosophy.
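A retry-first composition along the lines discussed could be sketched as follows; `parse_with_retry` and its parameters are hypothetical names, not a proposed livekit API:

```python
import json


def parse_with_retry(generate, parse=json.loads, max_retries: int = 2):
    """Re-invoke the LLM when its tool-call arguments fail to parse.

    generate: callable that (re-)requests the raw argument string from the LLM
    parse: the strict JSON parser (json.loads stands in for from_json here)
    """
    last_err = None
    for _ in range(max_retries + 1):
        raw = generate()
        try:
            return parse(raw)
        except ValueError as e:
            last_err = e
    raise ValueError("tool call arguments remained invalid after retries") from last_err
```

A last-resort repair step could then be slotted in after the loop, before the final raise, to combine both strategies.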
Great point, @rajveerappan — you're right that silent repair can produce semantically incorrect arguments (e.g. a truncated array element missing fields). The motivation was to avoid a hard crash for the agent when the LLM returns truncated JSON (which happens in streaming scenarios), since the alternative is failing the entire tool call. A few options:

I'm happy to pivot to option 3 (retry) if that better fits the project's philosophy, or to option 1 (repair + warning) as a middle ground. What's your preference?
@rajveerappan You raise a valid point — the repaired JSON may semantically differ from the LLM's intent. However, I think repair-then-proceed is the right default for this scenario.

That said, if the livekit team prefers a retry-first approach, I'm happy to restructure this as: (1) try repair → (2) validate → (3) if validation fails, raise so the caller can retry. The current design already does this — if repair + validation fails, the original `ValueError` is re-raised.
Problem
LLMs (notably GPT-4.1 on Azure) sometimes return truncated JSON in streaming tool call arguments. When the streaming response is cut off before the JSON is complete, `from_json()` raises `ValueError: EOF while parsing a string`, causing tool calls to fail entirely.

From issue #4240, a real-world example of truncated output:

```json
{"success":true,"reason":"The message explicitly asks the user to confirm if their name is John Doe"}
```

Was received as:

```json
{"success":true,"reason":"The message explicitly asks the user
```

This affects ~10% of tool calls for some users, particularly with Azure GPT-4.1.
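The failure is reproducible with the stdlib parser alone (`pydantic_core.from_json` raises a similar `ValueError` on the same input):

```python
import json

# the truncated payload from issue #4240
truncated = '{"success":true,"reason":"The message explicitly asks the user'

try:
    json.loads(truncated)
except ValueError as exc:
    print(exc)  # e.g. "Unterminated string starting at: ..."
```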
Solution
Add a JSON repair fallback in `prepare_function_arguments()`: `pydantic_core.from_json()` is tried first. The repair handles the most common truncation patterns:

- Unfinished string values (missing closing quote)
- Missing closing braces/brackets
- Combinations of the above
Tests
Added `TestTruncatedJsonRepair` with 4 test cases.

Fixes #4240