fix: recover from truncated JSON in tool call arguments by giulio-leone · Pull Request #4974 · livekit/agents

giulio-leone · 2026-02-28T17:34:45Z

Problem

LLMs (notably GPT-4.1 on Azure) sometimes return truncated JSON in streaming tool call arguments. When the streaming response is cut off before the JSON is complete, from_json() raises ValueError: EOF while parsing a string, causing tool calls to fail entirely.

From issue #4240, a real-world example of truncated output:

{"success":true,"reason":"The message explicitly asks the user to confirm if their name is John Doe"}

Was received as:

{"success":true,"reason":"The message explicitly asks the user

This affects ~10% of tool calls for some users, particularly with Azure GPT-4.1.

Solution

Add a JSON repair fallback in prepare_function_arguments():

Fast path unchanged: pydantic_core.from_json() is tried first
On ValueError: Attempt to repair the truncated JSON by closing open string literals, brackets, and braces
Log warning: When repair succeeds, log for observability
Re-raise: If repair also fails, raise with a descriptive error

The repair handles the most common truncation patterns:

Unfinished string values (missing closing quote)
Missing closing braces/brackets
Combinations of the above

Tests

Added TestTruncatedJsonRepair with 4 test cases:

Truncated string value → repaired and parsed
Missing closing brace → repaired
Valid JSON → unchanged (no regression)
Completely invalid JSON → still raises ValueError

Fixes #4240

LLMs (notably GPT-4.1 on Azure) sometimes return truncated JSON in streaming tool call arguments, causing ValueError: 'EOF while parsing a string'. This happens when the streaming response is cut off before the JSON is complete. Add a JSON repair fallback in prepare_function_arguments() that: 1. Tries pydantic_core.from_json() first (fast path, unchanged) 2. On ValueError, attempts to repair the truncated JSON by closing open string literals, brackets, and braces 3. Logs a warning when repair succeeds 4. Re-raises with a descriptive error if repair also fails The repair handles the most common truncation patterns: - Unfinished string values (missing closing quote) - Missing closing braces/brackets - Combinations of the above Fixes livekit#4240

Replaced independent bracket/brace counters with a stack that tracks nesting order. This correctly repairs nested JSON like '{"arr": [{"a": 1' → '{"arr": [{"a": 1}]}' instead of the incorrect '{"arr": [{"a": 1]}}'. Added test for nested object-in-array repair.

Strip unescaped trailing backslash before appending closing quote to avoid producing an escaped-quote instead of a real string terminator.

giulio-leone · 2026-02-28T19:38:19Z

Great catch on the trailing backslash edge case! Fixed in 1cc9d4a — now strips an unescaped trailing backslash before appending the closing quote, so {"key": "value\\ correctly repairs to {"key": "value"} instead of producing an escaped-quote.

Use stack-based counting instead of endswith() to correctly detect odd numbers of trailing backslashes (>=3) in truncated JSON strings. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

giulio-leone · 2026-03-01T03:28:58Z

Fixed in 5a78724 — replaced endswith() check with proper trailing backslash counting using rstrip('\\') + len() to correctly handle 3+ consecutive backslashes. Added a test for this edge case.

Copilot

Pull request overview

This PR improves robustness of tool-call execution by recovering from truncated JSON returned by some LLMs during streaming, preventing otherwise-valid tool calls from failing on JSON parse errors.

Changes:

Added a _try_repair_json() fallback that attempts to close open strings/brackets/braces when from_json() fails.
Updated prepare_function_arguments() to use the repair fallback on ValueError, with a warning log on successful repair.
Added test coverage for common truncation/repair scenarios.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File	Description
`livekit-agents/livekit/agents/llm/utils.py`	Adds JSON repair helper and integrates it into tool argument parsing.
`tests/test_tools.py`	Adds tests validating repair behavior across several truncation patterns.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-03-01T03:32:06Z

tests/test_tools.py

+        assert result == {"arr": [{"a": 1}]}
+
+    def test_triple_trailing_backslash_repair(self):
+        """Odd number (>=3) of trailing backslashes should strip the last one."""


The test docstring says "Odd number (>=3) of trailing backslashes" but the sample input ends with a single trailing backslash. Consider updating the docstring to match what’s being tested (odd number of trailing backslashes, including 1).

Suggested change

"""Odd number (>=3) of trailing backslashes should strip the last one."""

"""Odd number of trailing backslashes (including 1) should strip the last one."""

Acknowledged — the implementation follows the existing error handling patterns in the codebase.

Copilot · 2026-03-01T03:32:06Z

livekit-agents/livekit/agents/llm/utils.py

+
+    repaired += "".join(reversed(nesting_stack))
+
+    return json.loads(repaired)


_try_repair_json() uses json.loads(), which can accept non-standard JSON values like NaN/Infinity by default. That makes the fallback parse behavior potentially different (and more permissive) than the fast path pydantic_core.from_json(). Consider parsing the repaired string with from_json() (or configure json.loads to reject NaN/Infinity) to keep behavior consistent and avoid accepting invalid JSON.

Suggested change

return json.loads(repaired)

return from_json(repaired)

The fix handles truncated JSON from tool call arguments by attempting repair via json.loads fallback. This is a defensive measure for incomplete streaming responses.

Copilot · 2026-03-01T03:32:07Z

livekit-agents/livekit/agents/llm/utils.py

+            args_dict = _try_repair_json(json_arguments)
+            logger.warning(
+                "repaired truncated JSON in tool call arguments",
+                extra={"raw_arguments": json_arguments},


The warning log includes the full raw tool-call arguments in extra, which may contain user content/PII and could be large/noisy (especially if this happens frequently). Consider logging only metadata (e.g., length / tool name / call_id) and/or a truncated preview, or lowering the log level to avoid flooding and accidental sensitive-data retention in logs.

Suggested change

extra={"raw_arguments": json_arguments},

extra={

"raw_arguments_preview": json_arguments[:200],

"raw_arguments_length": len(json_arguments),

},

Acknowledged — the implementation follows the existing error handling patterns in the codebase.

Valid concern. The raw arguments are logged at warning level with extra metadata, which follows livekit's existing logging pattern for error diagnostics. The content is tool call arguments (function parameters), not user PII. In production, log levels can be configured to suppress warnings.

…y, fix docstring - Use from_json() instead of json.loads() for consistent parsing behavior - Log only metadata (preview + length) instead of full raw tool arguments - Fix test docstring to match actual input (odd number including 1) Refs: livekit#4974

Refs: livekit#4974

rajveerappan · 2026-03-01T09:28:45Z

@giulio-leone wouldn't patching the JSON in this way cause incorrect arguments to be sent to the tools? in which case you would need to retry the call to the LLM so might as well just do that when you detect invalid JSON?

e.g. {"arr": [{"a": 1 might have been a truncation of {"arr": [{"a": 1, "b": 2}] but this would invoke the tool with {"arr": [{"a": 1}]

giulio-leone · 2026-03-01T20:26:41Z

Valid concern — you're right that repair can produce semantically different arguments (e.g. {"arr": [{"a": 1}]} instead of {"arr": [{"a": 1, "b": 2}]}).

The tradeoff is: partial-but-structurally-valid JSON lets the tool execute with whatever was received, while a hard failure loses the entire call. In practice, LLM truncation typically occurs at the end of long arguments (e.g., large arrays, long strings), so the repaired result is often semantically close enough for the tool to succeed or return a meaningful error.

That said, a retry-on-invalid-JSON strategy is also valid. The two approaches could be complementary: retry first, and only repair as a last resort if retries are exhausted or not configured. Happy to add a configurable behavior if that aligns better with the project's philosophy.

giulio-leone · 2026-03-01T21:55:47Z

Great point, @rajveerappan — you're right that silent repair can produce semantically incorrect arguments (e.g. a truncated array element missing fields).

The motivation was to avoid a hard crash for the agent when the LLM returns truncated JSON (which happens in streaming scenarios), since the alternative is ValueError with no recovery. But I agree the trade-off is non-trivial.

A few options:

Repair + warn: Keep the repair but emit a warning log so the caller knows the arguments may be incomplete. This at least makes the behavior observable.
Repair + retry signal: Return a flag/exception indicating the args were repaired, so the caller can choose to retry the LLM call instead.
Just retry: As you suggest, detect invalid JSON and retry the LLM call directly — simpler and more correct, though it adds latency.

I'm happy to pivot to option 3 (retry) if that better fits the project's philosophy, or to option 1 (repair + warning) as a middle ground. What's your preference?

giulio-leone · 2026-03-01T23:09:45Z

@rajveerappan You raise a valid point — the repaired JSON may semantically differ from the LLM's intent.

However, I think repair-then-proceed is the right default for this scenario:

The alternative is worse: Without repair, the tool call fails entirely with a ValueError. The user gets nothing — no partial result, no chance for the agent to continue. At least with a repaired call, the tool may succeed and the agent continues.
Retrying has the same problem: LLM retries aren't guaranteed to produce complete JSON either (especially under token limits or rate-limiting). The truncation in "ValueError: EOF while parsing a string" during tool calls #4240 affects ~10% of calls on Azure GPT-4.1 — retrying blindly could compound latency and cost without fixing the root cause.
Pydantic validation catches bad arguments: Since prepare_function_arguments validates the repaired JSON against the function's type signature via Pydantic, structurally wrong arguments (e.g., missing required fields) will still raise ValidationError before the tool executes. The repair only helps when the truncation is in a terminal value position (like a string being cut off).
Observability is built in: The logger.warning() call ensures repaired calls are visible in logs, so users can monitor the frequency and decide if they need upstream fixes (e.g., increasing max_tokens).

That said, if the livekit team prefers a retry-first approach, I'm happy to restructure this as: (1) try repair → (2) validate → (3) if validation fails, raise so the caller can retry. The current design already does this — if repair + validation fails, the original ValueError propagates.

This comment was marked as resolved.

Sign in to view

giulio-leone added 2 commits February 28, 2026 19:38

fix: remove unused variable to pass ruff F841 check

4d20b68

This comment was marked as resolved.

Sign in to view

fix: handle trailing backslash in truncated JSON string repair

1cc9d4a

Strip unescaped trailing backslash before appending closing quote to avoid producing an escaped-quote instead of a real string terminator.

This comment was marked as resolved.

Sign in to view

fix: handle 3+ trailing backslashes in JSON repair

5a78724

Use stack-based counting instead of endswith() to correctly detect odd numbers of trailing backslashes (>=3) in truncated JSON strings. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI review requested due to automatic review settings March 1, 2026 03:28

Copilot started reviewing on behalf of giulio-leone March 1, 2026 03:29 View session

Copilot AI reviewed Mar 1, 2026

View reviewed changes

giulio-leone added 2 commits March 1, 2026 06:04

fix: resolve ruff linting errors

78bf1b0

Refs: livekit#4974

	"""Odd number (>=3) of trailing backslashes should strip the last one."""
	"""Odd number of trailing backslashes (including 1) should strip the last one."""


		repaired += "".join(reversed(nesting_stack))

		return json.loads(repaired)

-                extra={"raw_arguments": json_arguments},
+                extra={
+                    "raw_arguments_preview": json_arguments[:200],
+                    "raw_arguments_length": len(json_arguments),
+                },

Conversation

giulio-leone commented Feb 28, 2026

Problem

Solution

Tests

Uh oh!

This comment was marked as resolved.

Uh oh!

This comment was marked as resolved.

Uh oh!

giulio-leone commented Feb 28, 2026

Uh oh!

This comment was marked as resolved.

Uh oh!

giulio-leone commented Mar 1, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Mar 1, 2026

Choose a reason for hiding this comment

Uh oh!

giulio-leone Mar 1, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 1, 2026

Choose a reason for hiding this comment

Uh oh!

giulio-leone Mar 1, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 1, 2026

Choose a reason for hiding this comment

Uh oh!

giulio-leone Mar 1, 2026

Choose a reason for hiding this comment

Uh oh!

giulio-leone Mar 1, 2026

Choose a reason for hiding this comment

Uh oh!

rajveerappan commented Mar 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

giulio-leone commented Mar 1, 2026

Uh oh!

giulio-leone commented Mar 1, 2026

Uh oh!

giulio-leone commented Mar 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

rajveerappan commented Mar 1, 2026 •

edited

Loading