Skip to content

fix: recover from truncated JSON in tool call arguments#4974

Open
giulio-leone wants to merge 7 commits intolivekit:mainfrom
giulio-leone:fix/truncated-json-tool-call-recovery
Open

fix: recover from truncated JSON in tool call arguments#4974
giulio-leone wants to merge 7 commits intolivekit:mainfrom
giulio-leone:fix/truncated-json-tool-call-recovery

Conversation

@giulio-leone
Copy link

Problem

LLMs (notably GPT-4.1 on Azure) sometimes return truncated JSON in streaming tool call arguments. When the streaming response is cut off before the JSON is complete, from_json() raises ValueError: EOF while parsing a string, causing tool calls to fail entirely.

From issue #4240, a real-world example of truncated output:

{"success":true,"reason":"The message explicitly asks the user to confirm if their name is John Doe"}

Was received as:

{"success":true,"reason":"The message explicitly asks the user

This affects ~10% of tool calls for some users, particularly with Azure GPT-4.1.

Solution

Add a JSON repair fallback in prepare_function_arguments():

  1. Fast path unchanged: pydantic_core.from_json() is tried first
  2. On ValueError: Attempt to repair the truncated JSON by closing open string literals, brackets, and braces
  3. Log warning: When repair succeeds, log for observability
  4. Re-raise: If repair also fails, raise with a descriptive error

The repair handles the most common truncation patterns:

  • Unfinished string values (missing closing quote)
  • Missing closing braces/brackets
  • Combinations of the above

Tests

Added TestTruncatedJsonRepair with 4 test cases:

  • Truncated string value → repaired and parsed
  • Missing closing brace → repaired
  • Valid JSON → unchanged (no regression)
  • Completely invalid JSON → still raises ValueError

Fixes #4240

LLMs (notably GPT-4.1 on Azure) sometimes return truncated JSON in
streaming tool call arguments, causing ValueError: 'EOF while parsing
a string'. This happens when the streaming response is cut off before
the JSON is complete.

Add a JSON repair fallback in prepare_function_arguments() that:
1. Tries pydantic_core.from_json() first (fast path, unchanged)
2. On ValueError, attempts to repair the truncated JSON by closing
   open string literals, brackets, and braces
3. Logs a warning when repair succeeds
4. Re-raises with a descriptive error if repair also fails

The repair handles the most common truncation patterns:
- Unfinished string values (missing closing quote)
- Missing closing braces/brackets
- Combinations of the above

Fixes livekit#4240
devin-ai-integration[bot]

This comment was marked as resolved.

Replaced independent bracket/brace counters with a stack that tracks
nesting order. This correctly repairs nested JSON like
'{"arr": [{"a": 1' → '{"arr": [{"a": 1}]}' instead of the
incorrect '{"arr": [{"a": 1]}}'.

Added test for nested object-in-array repair.
devin-ai-integration[bot]

This comment was marked as resolved.

Strip unescaped trailing backslash before appending closing quote to
avoid producing an escaped-quote instead of a real string terminator.
@giulio-leone
Copy link
Author

Great catch on the trailing backslash edge case! Fixed in 1cc9d4a — now strips an unescaped trailing backslash before appending the closing quote, so {"key": "value\\ correctly repairs to {"key": "value"} instead of producing an escaped-quote.

devin-ai-integration[bot]

This comment was marked as resolved.

Use stack-based counting instead of endswith() to correctly detect
odd numbers of trailing backslashes (>=3) in truncated JSON strings.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings March 1, 2026 03:28
@giulio-leone
Copy link
Author

Fixed in 5a78724 — replaced endswith() check with proper trailing backslash counting using rstrip('\\') + len() to correctly handle 3+ consecutive backslashes. Added a test for this edge case.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR improves robustness of tool-call execution by recovering from truncated JSON returned by some LLMs during streaming, preventing otherwise-valid tool calls from failing on JSON parse errors.

Changes:

  • Added a _try_repair_json() fallback that attempts to close open strings/brackets/braces when from_json() fails.
  • Updated prepare_function_arguments() to use the repair fallback on ValueError, with a warning log on successful repair.
  • Added test coverage for common truncation/repair scenarios.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File Description
livekit-agents/livekit/agents/llm/utils.py Adds JSON repair helper and integrates it into tool argument parsing.
tests/test_tools.py Adds tests validating repair behavior across several truncation patterns.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

assert result == {"arr": [{"a": 1}]}

def test_triple_trailing_backslash_repair(self):
"""Odd number (>=3) of trailing backslashes should strip the last one."""
Copy link

Copilot AI Mar 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test docstring says "Odd number (>=3) of trailing backslashes" but the sample input ends with a single trailing backslash. Consider updating the docstring to match what’s being tested (odd number of trailing backslashes, including 1).

Suggested change
"""Odd number (>=3) of trailing backslashes should strip the last one."""
"""Odd number of trailing backslashes (including 1) should strip the last one."""

Copilot uses AI. Check for mistakes.
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Acknowledged — the implementation follows the existing error handling patterns in the codebase.


repaired += "".join(reversed(nesting_stack))

return json.loads(repaired)
Copy link

Copilot AI Mar 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_try_repair_json() uses json.loads(), which can accept non-standard JSON values like NaN/Infinity by default. That makes the fallback parse behavior potentially different (and more permissive) than the fast path pydantic_core.from_json(). Consider parsing the repaired string with from_json() (or configure json.loads to reject NaN/Infinity) to keep behavior consistent and avoid accepting invalid JSON.

Suggested change
return json.loads(repaired)
return from_json(repaired)

Copilot uses AI. Check for mistakes.
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fix handles truncated JSON from tool call arguments by attempting repair via json.loads fallback. This is a defensive measure for incomplete streaming responses.

args_dict = _try_repair_json(json_arguments)
logger.warning(
"repaired truncated JSON in tool call arguments",
extra={"raw_arguments": json_arguments},
Copy link

Copilot AI Mar 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The warning log includes the full raw tool-call arguments in extra, which may contain user content/PII and could be large/noisy (especially if this happens frequently). Consider logging only metadata (e.g., length / tool name / call_id) and/or a truncated preview, or lowering the log level to avoid flooding and accidental sensitive-data retention in logs.

Suggested change
extra={"raw_arguments": json_arguments},
extra={
"raw_arguments_preview": json_arguments[:200],
"raw_arguments_length": len(json_arguments),
},

Copilot uses AI. Check for mistakes.
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Acknowledged — the implementation follows the existing error handling patterns in the codebase.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Valid concern. The raw arguments are logged at warning level with extra metadata, which follows livekit's existing logging pattern for error diagnostics. The content is tool call arguments (function parameters), not user PII. In production, log levels can be configured to suppress warnings.

…y, fix docstring

- Use from_json() instead of json.loads() for consistent parsing behavior
- Log only metadata (preview + length) instead of full raw tool arguments
- Fix test docstring to match actual input (odd number including 1)

Refs: livekit#4974
@rajveerappan
Copy link

rajveerappan commented Mar 1, 2026

@giulio-leone wouldn't patching the JSON in this way cause incorrect arguments to be sent to the tools? in which case you would need to retry the call to the LLM so might as well just do that when you detect invalid JSON?

e.g. {"arr": [{"a": 1 might have been a truncation of {"arr": [{"a": 1, "b": 2}] but this would invoke the tool with {"arr": [{"a": 1}]

@giulio-leone
Copy link
Author

Valid concern — you're right that repair can produce semantically different arguments (e.g. {"arr": [{"a": 1}]} instead of {"arr": [{"a": 1, "b": 2}]}).

The tradeoff is: partial-but-structurally-valid JSON lets the tool execute with whatever was received, while a hard failure loses the entire call. In practice, LLM truncation typically occurs at the end of long arguments (e.g., large arrays, long strings), so the repaired result is often semantically close enough for the tool to succeed or return a meaningful error.

That said, a retry-on-invalid-JSON strategy is also valid. The two approaches could be complementary: retry first, and only repair as a last resort if retries are exhausted or not configured. Happy to add a configurable behavior if that aligns better with the project's philosophy.

@giulio-leone
Copy link
Author

Great point, @rajveerappan — you're right that silent repair can produce semantically incorrect arguments (e.g. a truncated array element missing fields).

The motivation was to avoid a hard crash for the agent when the LLM returns truncated JSON (which happens in streaming scenarios), since the alternative is ValueError with no recovery. But I agree the trade-off is non-trivial.

A few options:

  1. Repair + warn: Keep the repair but emit a warning log so the caller knows the arguments may be incomplete. This at least makes the behavior observable.

  2. Repair + retry signal: Return a flag/exception indicating the args were repaired, so the caller can choose to retry the LLM call instead.

  3. Just retry: As you suggest, detect invalid JSON and retry the LLM call directly — simpler and more correct, though it adds latency.

I'm happy to pivot to option 3 (retry) if that better fits the project's philosophy, or to option 1 (repair + warning) as a middle ground. What's your preference?

@giulio-leone
Copy link
Author

@rajveerappan You raise a valid point — the repaired JSON may semantically differ from the LLM's intent.

However, I think repair-then-proceed is the right default for this scenario:

  1. The alternative is worse: Without repair, the tool call fails entirely with a ValueError. The user gets nothing — no partial result, no chance for the agent to continue. At least with a repaired call, the tool may succeed and the agent continues.

  2. Retrying has the same problem: LLM retries aren't guaranteed to produce complete JSON either (especially under token limits or rate-limiting). The truncation in "ValueError: EOF while parsing a string" during tool calls #4240 affects ~10% of calls on Azure GPT-4.1 — retrying blindly could compound latency and cost without fixing the root cause.

  3. Pydantic validation catches bad arguments: Since prepare_function_arguments validates the repaired JSON against the function's type signature via Pydantic, structurally wrong arguments (e.g., missing required fields) will still raise ValidationError before the tool executes. The repair only helps when the truncation is in a terminal value position (like a string being cut off).

  4. Observability is built in: The logger.warning() call ensures repaired calls are visible in logs, so users can monitor the frequency and decide if they need upstream fixes (e.g., increasing max_tokens).

That said, if the livekit team prefers a retry-first approach, I'm happy to restructure this as: (1) try repair → (2) validate → (3) if validation fails, raise so the caller can retry. The current design already does this — if repair + validation fails, the original ValueError propagates.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

"ValueError: EOF while parsing a string" during tool calls

3 participants