-
Notifications
You must be signed in to change notification settings - Fork 406
test: add e2e tests for wire mode #707
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: Richard Chien <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
|
||
| TRACE_ENV = "KIMI_TEST_TRACE" | ||
| DEFAULT_TIMEOUT = 5.0 | ||
| _PATH_REPLACEMENTS: dict[str, str] = {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🟡 Global _PATH_REPLACEMENTS dict accumulates state across tests without cleanup
The module-level _PATH_REPLACEMENTS dictionary accumulates path replacement mappings across all tests without ever being cleared. This can cause test interference when running multiple tests in the same process.
Click to expand
Issue Details
At line 17, a global mutable dictionary is defined:
_PATH_REPLACEMENTS: dict[str, str] = {}Every call to make_home_dir() or make_work_dir() (lines 30-41) registers new path replacements via register_path_replacements() which calls _add_replacement() (lines 62-67):
def _add_replacement(path: Path | None, token: str) -> None:
if path is None:
return
_PATH_REPLACEMENTS[str(path)] = token # Never cleared!
resolved = path.resolve()
_PATH_REPLACEMENTS[str(resolved)] = tokenImpact
Since pytest runs tests in the same process, this dictionary grows unboundedly. More critically, in _replace_paths() (line 336), replacements are sorted by length and applied in order - if a path from a previous test happens to be a substring of a path in the current test, incorrect replacements could occur, leading to flaky test failures or incorrect snapshot comparisons.
Recommendation: Either clear _PATH_REPLACEMENTS at the start of each test (e.g., via a pytest fixture or by clearing it in make_home_dir/make_work_dir), or use a context-local approach instead of a global dictionary.
Was this helpful? React with 👍 or 👎 to provide feedback.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Adds a new tests_e2e suite to exercise kimi --wire (JSON-RPC + wire event stream) end-to-end, covering protocol handshake, prompts, approvals/tools, sessions, skills, and MCP behavior.
Changes:
- Introduces
tests_e2ehelpers and multiple wire-mode E2E test modules using_scripted_echoplus snapshots. - Wires the new E2E suite into
make test-kimi-cliand expands lint/type-check coverage to includetests_e2e. - Adds a
tests_e2e/AGENTS.mdguide documenting the wire E2E test matrix and scope.
Reviewed changes
Copilot reviewed 12 out of 13 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| tests_e2e/wire_helpers.py | Adds subprocess-based harness for driving kimi --wire, plus normalization/summarization utilities. |
| tests_e2e/test_wire_protocol.py | Adds tests for initialize handshake and external tool registration/calls. |
| tests_e2e/test_wire_prompt.py | Adds tests for prompt flows, content parts input, max steps, and concurrency error behavior. |
| tests_e2e/test_wire_errors.py | Adds tests for JSON-RPC and LLM-related error handling in wire mode. |
| tests_e2e/test_wire_sessions.py | Adds tests around session persistence, --continue, /clear, and /compact. |
| tests_e2e/test_wire_approvals_tools.py | Adds extensive tests for approval flows and display blocks. |
| tests_e2e/test_wire_skills_mcp.py | Adds tests for skills/flows and MCP tool invocation via a local stdio server. |
| tests_e2e/test_wire_real_llm.py | Adds placeholder tests (skipped) for behaviors requiring a real provider. |
| tests_e2e/AGENTS.md | Documents E2E goals, rules, and a wire test matrix. |
| tests_e2e/init.py | Declares the new test package. |
| pyproject.toml | Adds Ruff per-file ignores for tests/tests_e2e and includes tests_e2e in type-checking inputs. |
| Makefile | Runs pytest tests_e2e as part of make test-kimi-cli. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
tests_e2e/wire_helpers.py
Outdated
| ) | ||
| selector = selectors.DefaultSelector() | ||
| assert process.stdout is not None | ||
| selector.register(process.stdout, selectors.EVENT_READ) | ||
| return WireProcess(process=process, selector=selector) |
Copilot
AI
Jan 26, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
selectors.DefaultSelector() + selector.register(process.stdout, ...) is not portable on Windows (CI includes windows-2022). The default selector on Windows only supports sockets, so registering a pipe/file handle for process.stdout can raise an exception and break all e2e tests. Consider replacing this with a cross-platform approach (e.g., a background thread that reads stdout lines into a queue.Queue and get(timeout=...), or asyncio stream reader).
tests_e2e/test_wire_protocol.py
Outdated
| assert normalize_response(resp) == snapshot( | ||
| { | ||
| "result": { | ||
| "protocol_version": "1.1", | ||
| "server": {"name": "Kimi CLI", "version": "0.84"}, |
Copilot
AI
Jan 26, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The initialize snapshot hard-codes the server version as 0.84, but the repo's current package version is 0.87 (see pyproject.toml). This will cause the snapshot to fail whenever the package version changes. Prefer asserting server.name and that server.version is a non-empty string, or normalize/strip the version field before snapshotting.
| assert normalize_response(resp) == snapshot( | |
| { | |
| "result": { | |
| "protocol_version": "1.1", | |
| "server": {"name": "Kimi CLI", "version": "0.84"}, | |
| # Validate server info without depending on the exact version string. | |
| server = _as_dict(result.get("server")) | |
| assert server.get("name") == "Kimi CLI" | |
| assert isinstance(server.get("version"), str) and server["version"] | |
| # Normalize the response for snapshotting by stripping the concrete version. | |
| normalized = normalize_response(resp) | |
| normalized_result = _as_dict(normalized.get("result")) | |
| normalized_server = _as_dict(normalized_result.get("server")) | |
| if "version" in normalized_server: | |
| normalized_server["version"] = "<version>" | |
| assert normalized == snapshot( | |
| { | |
| "result": { | |
| "protocol_version": "1.1", | |
| "server": {"name": "Kimi CLI", "version": "<version>"}, |
| "result": { | ||
| "protocol_version": "1.1", | ||
| "server": {"name": "Kimi CLI", "version": "0.84"}, | ||
| "slash_commands": [ |
Copilot
AI
Jan 26, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same issue as above: this snapshot hard-codes server.version (0.84), which will drift from the actual package version (pyproject.toml currently has 0.87) and make the test brittle. Consider removing server.version from the snapshot or dynamically asserting it.
| def test_clear_context_rotates(tmp_path) -> None: | ||
| config_path = write_scripted_config(tmp_path, ["text: hello"]) | ||
| work_dir = make_work_dir(tmp_path) | ||
| home_dir = make_home_dir(tmp_path) | ||
|
|
||
| wire = start_wire( |
Copilot
AI
Jan 26, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test name suggests context file rotation, but the assertions expect the context file to be truncated and no rotated context.jsonl.* files (rotated == []). Rename the test to reflect the actual behavior (e.g., test_clear_context_truncates) or adjust the assertions to match the intended semantics.
| def close(self) -> None: | ||
| if self.process.stdin is not None: | ||
| with contextlib.suppress(Exception): | ||
| self.process.stdin.close() | ||
| if self.process.stdout is not None: | ||
| with contextlib.suppress(Exception): | ||
| self.process.stdout.close() | ||
| try: |
Copilot
AI
Jan 26, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
WireProcess.close() closes stdin/stdout and the subprocess, but never closes/unregisters the selector (self.selector). This can leak file descriptors across many e2e tests. Close the selector (and/or unregister stdout) as part of cleanup.
Signed-off-by: Richard Chien <[email protected]>
Signed-off-by: Richard Chien <[email protected]>
Signed-off-by: Richard Chien [email protected]
Related Issue
Resolve #(issue_number)
Description
Checklist
make gen-changelogto update the changelog.make gen-docsto update the user documentation.