feat(desktop): native desktop automation MCP tool (closes #189)#633
feat(desktop): native desktop automation MCP tool (closes #189)#633RajdeepKushwaha5 wants to merge 14 commits intoaccomplish-ai:mainfrom
Conversation
✅ Snyk checks have passed. No issues have been found so far.
💻 Catch issues earlier using the plugins for VS Code, JetBrains IDEs, Visual Studio, and Eclipse. |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds a new local MCP tool "desktop-control" for cross-platform desktop automation, includes it in bundling and generated MCP servers, replaces shell exec calls with execFileSync for CLI/version resolution, adjusts Windows PTY spawn logic and tests, and extends the desktop permission API with a /desktop-permission endpoint. Changes
Sequence DiagramsequenceDiagram
participant Agent as Agent/Client
participant MCP as Desktop-Control MCP
participant PermAPI as Permission API
participant OS as OS Platform
Agent->>MCP: CallToolRequest (desktop action)
MCP->>PermAPI: Request desktop permission (operation, metadata)
alt permission granted
PermAPI-->>MCP: Granted
MCP->>OS: Execute platform-specific command (AppleScript / PowerShell / xdotool)
OS-->>MCP: Result (image/base64, status, window list)
MCP-->>Agent: Tool result
else denied or API error
PermAPI-->>MCP: Denied/Error
MCP-->>Agent: Error response
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Pull request overview
This PR introduces a new desktop-control MCP tool that provides native desktop automation capabilities across macOS, Windows, and Linux, while also fixing Issue #596 related to Windows path handling for paths containing spaces.
Changes:
- Adds
desktop-controlMCP tool with 7 desktop automation capabilities (screenshot, mouse control, keyboard input, window management) - Fixes Issue #596 by replacing
execSyncwithexecFileSyncin cli-resolver.ts and electron-options.ts - Updates OpenCodeAdapter to route Windows commands through
cmd.exe /s /cwith proper quoting - Adds comprehensive unit tests for Windows path handling with spaces
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| packages/agent-core/mcp-tools/desktop-control/src/index.ts | New MCP server implementing 7 desktop automation tools with security-first design |
| packages/agent-core/mcp-tools/desktop-control/package.json | Package configuration for desktop-control tool |
| packages/agent-core/mcp-tools/desktop-control/tsconfig.json | TypeScript configuration matching other MCP tools |
| packages/agent-core/mcp-tools/desktop-control/SKILL.md | Comprehensive documentation for AI agent usage |
| packages/agent-core/src/opencode/config-generator.ts | Registers desktop-control MCP server and adds capability to system prompt |
| packages/agent-core/src/opencode/cli-resolver.ts | Replaces execSync with execFileSync to fix Issue #596 |
| packages/agent-core/src/internal/classes/OpenCodeAdapter.ts | Routes Windows .exe through cmd.exe with proper quoting for Issue #596 |
| apps/desktop/src/main/opencode/electron-options.ts | Applies same execFileSync fix for consistency |
| apps/desktop/scripts/bundle-skills.cjs | Adds desktop-control to bundling configuration |
| packages/agent-core/tests/unit/opencode/config-generator.test.ts | Adds tests verifying desktop-control registration |
| packages/agent-core/tests/unit/opencode/cli-resolver.test.ts | Adds regression test for Issue #596 |
| packages/agent-core/tests/unit/opencode/adapter.test.ts | Comprehensive tests for Windows path escaping |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| const transport = new StdioServerTransport(); | ||
| await server.connect(transport); |
There was a problem hiding this comment.
The server startup code should follow the established pattern used by all other MCP tools in this codebase. Wrap the transport creation and server.connect() call in a main() async function with .catch() error handling. This ensures consistent error handling across all MCP tools and provides better error messages when the server fails to start. All other MCP tools (file-permission, ask-user-question, complete-task, start-task, report-checkpoint, report-thought, dev-browser-mcp) follow this pattern.
| const transport = new StdioServerTransport(); | |
| await server.connect(transport); | |
| async function main(): Promise<void> { | |
| const transport = new StdioServerTransport(); | |
| await server.connect(transport); | |
| } | |
| main().catch((error) => { | |
| console.error('Failed to start desktop-control MCP server:', error); | |
| process.exit(1); | |
| }); |
| [DllImport("user32.dll")] public static extern bool SetForegroundWindow(IntPtr h); | ||
| } | ||
| '@ | ||
| $proc = Get-Process | Where-Object {$_.MainWindowTitle -like "*$Title*"} | Select-Object -First 1 |
There was a problem hiding this comment.
The Windows focusWindow implementation uses PowerShell's -like operator with wildcards ($Title), which will treat special characters in the window name as wildcards (e.g., *, ?, [, ]). Consider using -eq for exact matching or escaping the wildcard characters in $Title to avoid unexpected matching behavior. For example, a window name containing brackets like "Test[1]" might not match correctly.
| $proc = Get-Process | Where-Object {$_.MainWindowTitle -like "*$Title*"} | Select-Object -First 1 | |
| $escapedTitle = [System.Management.Automation.WildcardPattern]::Escape($Title) | |
| $proc = Get-Process | Where-Object {$_.MainWindowTitle -like "*$escapedTitle*"} | Select-Object -First 1 |
| ## Tools | ||
|
|
||
| ### `desktop_screenshot` | ||
| Capture the entire primary display. Returns a base64-encoded PNG image. No permission required. |
There was a problem hiding this comment.
The desktop_screenshot tool captures the full contents of the primary display and is explicitly documented as requiring no permission, even though this can expose highly sensitive on‑screen information (passwords, 2FA codes, private documents, etc.). In a prompt‑injection or compromised orchestrator scenario, an attacker could invoke this tool repeatedly to exfiltrate whatever is visible on the user’s screen without any explicit, per‑action user approval. Consider aligning screenshots with the permission model (e.g., using the same permission API or a dedicated "allow screen capture" prompt) so users must explicitly approve each capture before the image is returned.
| Capture the entire primary display. Returns a base64-encoded PNG image. No permission required. | |
| Capture the entire primary display. Returns a base64-encoded PNG image. Permission required; each capture must be explicitly approved by the user (via the same permission model used for other desktop tools). |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (3)
packages/agent-core/mcp-tools/desktop-control/package.json (1)
12-16: Consider moving@modelcontextprotocol/sdktodependencies.The
@modelcontextprotocol/sdkpackage is likely used at runtime (imported insrc/index.tsto create the MCP server). Placing it indevDependenciesmeans it won't be installed in production builds or when this package is used as a dependency.Since this tool is bundled (per
bundle-skills.cjs), the bundler may resolve it regardless. However, for clarity and correctness, runtime dependencies should be independencies.Proposed fix
- "devDependencies": { - "@modelcontextprotocol/sdk": "^1.0.0", + "dependencies": { + "@modelcontextprotocol/sdk": "^1.0.0" + }, + "devDependencies": { "tsx": "^4.21.0", "typescript": "^5.0.0" }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/agent-core/mcp-tools/desktop-control/package.json` around lines 12 - 16, The package "@modelcontextprotocol/sdk" is declared in devDependencies but is required at runtime (imported in src/index.ts to create the MCP server) so move it into "dependencies" in package.json; update the package.json devDependencies/dev block to remove "@modelcontextprotocol/sdk" and add it under "dependencies" (keeping the same version "^1.0.0") so production installs and consumers get the runtime dependency, and ensure any bundler config (bundle-skills.cjs) still works with the dependency present.packages/agent-core/mcp-tools/desktop-control/src/index.ts (2)
355-360: Replace the nested ternary with explicit branching for key mapping.♻️ Suggested refactor
- const k = - mainKey in WIN_KEYS - ? WIN_KEYS[mainKey] - : mainKey.length === 1 - ? mainKey - : `{${mainKey.toUpperCase()}}`; + let k: string; + if (mainKey in WIN_KEYS) { + k = WIN_KEYS[mainKey]; + } else if (mainKey.length === 1) { + k = mainKey; + } else { + k = `{${mainKey.toUpperCase()}}`; + }As per coding guidelines: Avoid nested ternaries — Use mapper objects or if/else for readability.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/agent-core/mcp-tools/desktop-control/src/index.ts` around lines 355 - 360, The expression that computes k uses a nested ternary which reduces readability; replace it with explicit branching using either an if/else block or a small helper mapper so the logic is clear: first check WIN_KEYS for mainKey (WIN_KEYS[mainKey]), then if not present check whether mainKey.length === 1 and return mainKey, otherwise return the fallback `"{"+mainKey.toUpperCase()+"}"`; update the code that assigns k (the constant computed from mainKey and WIN_KEYS) and any caller that relies on that value to use the clearer branching implementation.
137-142: Replace the nested ternary with a mapping table for readability.♻️ Suggested refactor
- const cmd = - button === 'right' - ? 'right click' - : button === 'double' - ? 'double click' - : 'click'; + const CLICK_CMD: Record<'left' | 'right' | 'double', string> = { + left: 'click', + right: 'right click', + double: 'double click', + }; + const cmd = CLICK_CMD[button];As per coding guidelines: Avoid nested ternaries — Use mapper objects or if/else for readability.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/agent-core/mcp-tools/desktop-control/src/index.ts` around lines 137 - 142, The nested ternary that sets cmd based on button should be replaced with a simple mapping object or switch to improve readability: create a const mapping (e.g., const clickMap = { right: 'right click', double: 'double click', default: 'click' } or similar) and use button to look up the value for cmd, falling back to 'click' when not present; update the code that assigns cmd (the variable named cmd that currently uses the nested ternary) to use this mapper instead.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@packages/agent-core/mcp-tools/desktop-control/src/index.ts`:
- Around line 431-447: The Linux branch currently sets pid to the X11 window ID
(wids) instead of the process ID; update the mapping in the wids -> results
Promise.all block to call execFile('xdotool', ['getwindowpid', wid]) to retrieve
the real PID, parse it to a number, and set pid to undefined when
retrieval/parsing fails (keep the existing try/catch so windows without
_NET_WM_PID are handled). Locate the async map over wids (the block that
currently calls execFile('xdotool', ['getwindowname', wid]) and returns { name:
..., pid: Number(wid) }) and replace the pid logic to use getwindowpid and
conditional undefined on failure, preserving WindowInfo typing and filtering out
nulls as before.
- Around line 459-472: The PowerShell script written via tmpFile('focus', 'ps1')
and writeFile can return success even when no window matches because Write-Error
is non-terminating; update the here-doc body (the script string passed to
writeFile) so the else branch both writes the error and then calls exit 1 (e.g.,
after Write-Error "No window found matching: $Title" add exit 1) so the process
returns a non-zero exit code and execFile will surface the failure; locate the
ps1 script content where WinFocus and SetForegroundWindow are defined and modify
only the else branch to propagate the error.
---
Nitpick comments:
In `@packages/agent-core/mcp-tools/desktop-control/package.json`:
- Around line 12-16: The package "@modelcontextprotocol/sdk" is declared in
devDependencies but is required at runtime (imported in src/index.ts to create
the MCP server) so move it into "dependencies" in package.json; update the
package.json devDependencies/dev block to remove "@modelcontextprotocol/sdk" and
add it under "dependencies" (keeping the same version "^1.0.0") so production
installs and consumers get the runtime dependency, and ensure any bundler config
(bundle-skills.cjs) still works with the dependency present.
In `@packages/agent-core/mcp-tools/desktop-control/src/index.ts`:
- Around line 355-360: The expression that computes k uses a nested ternary
which reduces readability; replace it with explicit branching using either an
if/else block or a small helper mapper so the logic is clear: first check
WIN_KEYS for mainKey (WIN_KEYS[mainKey]), then if not present check whether
mainKey.length === 1 and return mainKey, otherwise return the fallback
`"{"+mainKey.toUpperCase()+"}"`; update the code that assigns k (the constant
computed from mainKey and WIN_KEYS) and any caller that relies on that value to
use the clearer branching implementation.
- Around line 137-142: The nested ternary that sets cmd based on button should
be replaced with a simple mapping object or switch to improve readability:
create a const mapping (e.g., const clickMap = { right: 'right click', double:
'double click', default: 'click' } or similar) and use button to look up the
value for cmd, falling back to 'click' when not present; update the code that
assigns cmd (the variable named cmd that currently uses the nested ternary) to
use this mapper instead.
|
@RajdeepKushwaha5 can you attach a short video demonstrating these new capabilities? |
…e paths with spaces On Windows, pty.spawn() with a raw .exe path works in ConPTY mode (Windows 10 1809+) but fails in WinPTY fallback mode when the installation path contains spaces. The intermediate cmd.exe session receives an unquoted path and splits it at every space, producing the error: '"C:\Users\Anish Maheshwari\...\opencode.exe"' is not recognized Fix buildPtySpawnArgs() on win32 to route through cmd.exe /s /c using the existing buildShellCommand() helper which individually quotes each token that contains spaces (Windows cmd.exe doubling convention). Also replace the two execSync calls in cli-resolver.ts and electron-options.ts with execFileSync(path, ['--version']) to eliminate cmd.exe shell involvement from version detection entirely. Tests: added 6 regression tests in adapter.test.ts (Issue accomplish-ai#596 describe block) and 1 test in cli-resolver.test.ts verifying graceful handling of paths with spaces. Fixes: accomplish-ai#596
…sh-ai#189) Add a new 'desktop-control' MCP tool that gives the agent native desktop automation capabilities beyond CLI and browser, operating on all three supported platforms without requiring any compiled native Node.js modules. ## Tools exposed - desktop_screenshot - capture primary display as PNG (no permission needed) - desktop_mouse_click - left/right/double click at screen coordinates - desktop_mouse_move - move cursor to screen coordinates - desktop_type_text - type arbitrary Unicode text at caret position - desktop_key_press - press key combinations (ctrl+c, Return, F5, ...) - desktop_list_windows - list visible application windows as JSON - desktop_focus_window - bring an application to the foreground ## Security - Permission fails CLOSED: catch {} returns false, never returns true - execFile() used for every subprocess - no shell interpolation, no injection - type_text uses clipboard-paste technique (pbcopy on macOS, PS clipboard on Windows, xdotool type on Linux) so user text is never embedded in a command string - AppleScript strings with user input are escaped with escapeAppleScriptString() - key_press validates input with regex before building platform commands ## Cross-platform implementation macOS: screencapture (screenshot), osascript/System Events (mouse+keyboard+ window management). mouse_move requires cliclick (brew install cliclick). Windows: parameterised PS1 scripts with P/Invoke for mouse/keyboard, System.Windows.Forms for screenshot, clipboard and SendKeys. Linux: scrot/gnome-screenshot (screenshot), xdotool for everything else. ## Integration - Registered in config-generator.ts with PERMISSION_API_PORT env var - Added to bundle-skills.cjs SKILLS_FULLY_BUNDLED list and bundles array - Desktop Automation capability added to agent system prompt - Unit tests updated: stub dist entry added to requiredMcpDistEntries; asserts server is registered with correct PERMISSION_API_PORT Closes accomplish-ai#189
10c8dc0 to
b40dea0
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
♻️ Duplicate comments (2)
packages/agent-core/mcp-tools/desktop-control/src/index.ts (2)
459-472: WindowsfocusWindowreports success when no window matches.
Write-Erroris a non-terminating PowerShell error — it does not set a non-zero exit code. PowerShell exits0regardless, soexecFileresolves successfully and the tool returns"Focused window: <name>"even when nothing was focused. Addexit 1to propagate the failure.🐛 Proposed fix
-else { Write-Error "No window found matching: $Title" } +else { Write-Error "No window found matching: $Title"; exit 1 }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/agent-core/mcp-tools/desktop-control/src/index.ts` around lines 459 - 472, The PowerShell script written via tmpFile('focus','ps1') and writeFile always exits 0 because Write-Error is non-terminating; change the script so that when no process matches the Title it writes the error and then calls exit 1 (or throws) so the spawned execFile call returns a non-zero exit code and the focusWindow flow does not report success erroneously. Locate the script construction in index.ts (the ps1 temp file created for focus) and append an explicit exit 1 in the else branch (or replace Write-Error with Write-Error; exit 1) so failures propagate to the caller.
437-447: Linuxpidfield is set to the X11 window ID, not the process ID.
xdotool searchreturns window IDs (arbitrary X11 identifiers). Storing them in thepidfield ofWindowInfois semantically wrong and will confuse callers who attempt to signal or inspect the process. Usexdotool getwindowpid <wid>to retrieve the real PID; leavepidundefined when retrieval fails (e.g., for windows without_NET_WM_PID).🐛 Proposed fix
- wids.map(async (wid) => { - try { - const { stdout: name } = await execFile('xdotool', ['getwindowname', wid]); - return { name: name.trim(), pid: Number(wid) }; - } catch { - return null; - } - }), + wids.map(async (wid) => { + try { + const { stdout: name } = await execFile('xdotool', ['getwindowname', wid]); + let pid: number | undefined; + try { + const { stdout: pidOut } = await execFile('xdotool', ['getwindowpid', wid]); + const parsed = Number(pidOut.trim()); + if (Number.isFinite(parsed)) { pid = parsed; } + } catch { + // window does not set _NET_WM_PID — leave pid undefined + } + return { name: name.trim(), pid }; + } catch { + return null; + } + }),🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/agent-core/mcp-tools/desktop-control/src/index.ts` around lines 437 - 447, The code is currently treating X11 window IDs as process IDs by returning pid: Number(wid) inside the wids mapping; instead call xdotool getwindowpid for each wid to retrieve the real process ID and parse it to a number, and if that command fails or returns non-numeric data leave pid undefined. Update the async mapping in the Promise.all block (the lambda that calls execFile('xdotool', ['getwindowname', wid])) to run a second execFile('xdotool', ['getwindowpid', wid]) after successfully getting the name, convert the stdout to Number when valid, and return { name: name.trim(), pid: pidNumberOrUndefined } while still returning null on failures so results.filter((r): r is WindowInfo => r !== null) remains correct.
🧹 Nitpick comments (1)
packages/agent-core/tests/unit/opencode/adapter.test.ts (1)
167-272:escapeShellArgWin32Issue596is a byte-for-byte duplicate ofescapeShellArgWin32from the Issue#354block.Both helper functions are identical, and
buildShellCommandIssue596is functionally the same asbuildShellCommand. If the production escaping strategy changes, both sets of helpers need to be updated independently — silent drift risk. Consider hoisting a single shared helper to the outerdescribe('Shell escaping utilities')scope and referencing it from both inner blocks.♻️ Suggested extraction
+ // Shared helper for both Issue `#354` and Issue `#596` tests + function escapeShellArgForCmd(arg: string): string { + if (arg.includes(' ') || arg.includes('"')) { + return `"${arg.replace(/"/g, '""')}"`; + } + return arg; + } describe('Windows cmd.exe /s /c quoting (Issue `#354`)', () => { - function escapeShellArgWin32(arg: string): string { - if (arg.includes(' ') || arg.includes('"')) { - return `"${arg.replace(/"/g, '""')}"`; - } - return arg; - } - function buildShellCommand(command: string, args: string[]): string { - const escapedCommand = escapeShellArgWin32(command); - const escapedArgs = args.map((arg) => escapeShellArgWin32(arg)); - return [escapedCommand, ...escapedArgs].join(' '); - } + function buildShellCommand(command: string, args: string[]): string { + return [command, ...args].map(escapeShellArgForCmd).join(' '); + } ... }); describe('Windows spawn via cmd.exe /s /c – Issue `#596` (paths with spaces)', () => { - function escapeShellArgWin32Issue596(arg: string): string { - if (arg.includes(' ') || arg.includes('"')) { - return `"${arg.replace(/"/g, '""')}"`; - } - return arg; - } - function buildShellCommandIssue596(command: string, args: string[]): string { - return [command, ...args].map(escapeShellArgWin32Issue596).join(' '); - } + function buildShellCommandIssue596(command: string, args: string[]): string { + return [command, ...args].map(escapeShellArgForCmd).join(' '); + } ... });🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/agent-core/tests/unit/opencode/adapter.test.ts` around lines 167 - 272, The test duplicates the escape logic (escapeShellArgWin32Issue596 / buildShellCommandIssue596) already defined as escapeShellArgWin32 / buildShellCommand; hoist a single shared helper into the outer describe (e.g. a new "Shell escaping utilities" scope) and have both inner test blocks call that shared escapeShellArgWin32 and buildShellCommand to eliminate byte-for-byte duplication and prevent drift — update references in the Windows Issue `#596` tests to use escapeShellArgWin32 and buildShellCommand (remove the duplicate escapeShellArgWin32Issue596 / buildShellCommandIssue596 functions) so both test suites rely on the same helpers.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@packages/agent-core/mcp-tools/desktop-control/src/index.ts`:
- Around line 74-125: captureScreenshot currently takes full-screen images
without any permission check; add a session-level consent gate and audit
logging: when captureScreenshot is invoked, call the existing checkPermission
(or a new checkSessionConsent) to verify a session-scoped flag (e.g.,
allowDesktopScreenshots) and, if not set, prompt once for session consent before
proceeding; if consent is denied, throw or return a denied CallToolResult and do
not take the screenshot. After a successful capture, emit a structured audit
entry (e.g., via processLogger or the module's telemetry API) recording
timestamp, requesting agent id/session id, and that a screenshot occurred.
Ensure the consent flag is stored for the session scope so subsequent
captureScreenshot calls in the same session skip prompting, and keep the
existing tmp file cleanup behavior (unlink) unchanged.
- Around line 36-38: The tmpFile function currently builds filenames using
Date.now(), which can collide under concurrent async calls; update tmpFile to
include cryptographic randomness instead of (or in addition to) Date.now()—for
example use crypto.randomBytes(...) or crypto.randomUUID() to produce a unique
suffix and incorporate it into the generated filename while still calling
join(tmpdir(), ...); locate the tmpFile function and replace the Date.now()
usage with a crypto-based identifier (ensure you import Node's crypto),
preserving the prefix/ext parameters and filename format.
---
Duplicate comments:
In `@packages/agent-core/mcp-tools/desktop-control/src/index.ts`:
- Around line 459-472: The PowerShell script written via tmpFile('focus','ps1')
and writeFile always exits 0 because Write-Error is non-terminating; change the
script so that when no process matches the Title it writes the error and then
calls exit 1 (or throws) so the spawned execFile call returns a non-zero exit
code and the focusWindow flow does not report success erroneously. Locate the
script construction in index.ts (the ps1 temp file created for focus) and append
an explicit exit 1 in the else branch (or replace Write-Error with Write-Error;
exit 1) so failures propagate to the caller.
- Around line 437-447: The code is currently treating X11 window IDs as process
IDs by returning pid: Number(wid) inside the wids mapping; instead call xdotool
getwindowpid for each wid to retrieve the real process ID and parse it to a
number, and if that command fails or returns non-numeric data leave pid
undefined. Update the async mapping in the Promise.all block (the lambda that
calls execFile('xdotool', ['getwindowname', wid])) to run a second
execFile('xdotool', ['getwindowpid', wid]) after successfully getting the name,
convert the stdout to Number when valid, and return { name: name.trim(), pid:
pidNumberOrUndefined } while still returning null on failures so
results.filter((r): r is WindowInfo => r !== null) remains correct.
---
Nitpick comments:
In `@packages/agent-core/tests/unit/opencode/adapter.test.ts`:
- Around line 167-272: The test duplicates the escape logic
(escapeShellArgWin32Issue596 / buildShellCommandIssue596) already defined as
escapeShellArgWin32 / buildShellCommand; hoist a single shared helper into the
outer describe (e.g. a new "Shell escaping utilities" scope) and have both inner
test blocks call that shared escapeShellArgWin32 and buildShellCommand to
eliminate byte-for-byte duplication and prevent drift — update references in the
Windows Issue `#596` tests to use escapeShellArgWin32 and buildShellCommand
(remove the duplicate escapeShellArgWin32Issue596 / buildShellCommandIssue596
functions) so both test suites rely on the same helpers.
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (12)
apps/desktop/scripts/bundle-skills.cjsapps/desktop/src/main/opencode/electron-options.tspackages/agent-core/mcp-tools/desktop-control/SKILL.mdpackages/agent-core/mcp-tools/desktop-control/package.jsonpackages/agent-core/mcp-tools/desktop-control/src/index.tspackages/agent-core/mcp-tools/desktop-control/tsconfig.jsonpackages/agent-core/src/internal/classes/OpenCodeAdapter.tspackages/agent-core/src/opencode/cli-resolver.tspackages/agent-core/src/opencode/config-generator.tspackages/agent-core/tests/unit/opencode/adapter.test.tspackages/agent-core/tests/unit/opencode/cli-resolver.test.tspackages/agent-core/tests/unit/opencode/config-generator.test.ts
🚧 Files skipped from review as they are similar to previous changes (4)
- packages/agent-core/mcp-tools/desktop-control/tsconfig.json
- packages/agent-core/src/internal/classes/OpenCodeAdapter.ts
- packages/agent-core/mcp-tools/desktop-control/package.json
- packages/agent-core/tests/unit/opencode/config-generator.test.ts
- Replace cmd.exe /s /c approach in buildPtySpawnArgs() with writing a temp .ps1 file and running powershell.exe -File, fixing crashes when task text contains double-quotes or other shell-special characters - Add /desktop-permission route to permission API server so desktop-control MCP tool can request permissions without triggering file-op validation - Update desktop-control PERMISSION_API_URL to use /desktop-permission - Update adapter tests to reflect new PowerShell spawn behaviour
There was a problem hiding this comment.
Actionable comments posted: 5
♻️ Duplicate comments (4)
packages/agent-core/mcp-tools/desktop-control/src/index.ts (4)
38-40:tmpFilecollision under concurrent calls — already flagged.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/agent-core/mcp-tools/desktop-control/src/index.ts` around lines 38 - 40, The tmpFile function can produce filename collisions under concurrent calls because it only uses Date.now(), so replace the uniqueness strategy in tmpFile with a robust unique suffix (e.g., crypto.randomUUID() or crypto.randomBytes hex, or use fs.mkdtemp/tmp library) to guarantee uniqueness; update the tmpFile implementation (function name tmpFile) to append that random/UUID token (or use a safe OS temporary-file API) to the generated filename and ensure the chosen approach is cross-platform and collision-resistant.
461-474: WindowsfocusWindowWrite-Errorleaves exit code 0 on miss — already flagged.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/agent-core/mcp-tools/desktop-control/src/index.ts` around lines 461 - 474, The current PowerShell script written to ps1 (via tmpFile and writeFile) uses Write-Error when no window matches, which still returns exit code 0; update the script so the non-match path causes a non-zero exit (e.g., replace or follow Write-Error with an explicit exit 1 or throw) so the calling focusWindow logic can detect failure; locate the ps1 generation where tmpFile('focus','ps1') and writeFile(ps1, `...`) are used and change the else branch to emit a real failure (exit 1 or throw) instead of only Write-Error.
76-127:captureScreenshotskips the permission gate — already flagged.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/agent-core/mcp-tools/desktop-control/src/index.ts` around lines 76 - 127, The captureScreenshot function currently performs screen capture unconditionally; add a permission check at the start of captureScreenshot (e.g., call the existing permission gate helper such as ensurePermission or checkAgentPermission, or verify the relevant permission flag) and abort/throw a clear permission error or return a CallToolResult error object if the permission is not granted before creating any temp files or invoking platform commands; ensure you still clean up any temp artifacts (png, ps1) in error paths and keep the function signature and returned CallToolResult shape unchanged.
433-449: LinuxlistWindowsassigns window IDs as PIDs — already flagged.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/agent-core/mcp-tools/desktop-control/src/index.ts` around lines 433 - 449, The code in listWindows is treating the xdotool window ID (wid) as the process id by doing Number(wid); instead call xdotool getwindowpid to fetch the real PID for each wid. In the wids.map async block (the execFile call that runs ['getwindowname', wid] and returns { name: ..., pid: Number(wid) }), add a second execFile call to run ['getwindowpid', wid], parse that stdout into a Number and use it as pid (and optionally include the original wid as id if WindowInfo supports it); update the returned shape to return the correct pid and handle errors as before.
🧹 Nitpick comments (2)
packages/agent-core/tests/unit/opencode/adapter.test.ts (2)
184-196: Local replicas of production logic — tests won't catch drift.
buildPs1ScriptandbuildPtySpawnArgsWin32duplicate the adapter's private logic. If the productionbuildPtySpawnArgschanges (e.g., switches quoting strategy, adds flags), these helpers won't fail. Consider extracting the PS1 script-building into a small, exported pure function (e.g.,buildPs1ScriptContent(command, args)) and importing it in the test. This way the tests exercise the real code.That said, this is consistent with the existing pattern in the file (lines 86–101 replicate
escapeShellArgandbuildShellCommand), and the file header acknowledges the limitation of not mocking the PTY layer.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/agent-core/tests/unit/opencode/adapter.test.ts` around lines 184 - 196, Tests duplicate the adapter's private PS1-building logic (buildPs1Script and buildPtySpawnArgsWin32), which can drift from production; extract the PS1 script construction into a small exported pure function (e.g., buildPs1ScriptContent(command, args)) in the adapter module and have the test import and call that exported function instead of the local buildPs1Script, then update buildPtySpawnArgsWin32 in the test to use the exported buildPs1ScriptContent for scriptContent so the tests exercise the real production logic.
82-101: ThegetShellArgsWin32helper and its tests now exercise a dead code path on Windows.With the adapter's
buildPtySpawnArgsnow routing all win32 invocations through the PS1 wrapper, thecmd.exe /s /cwrapping tested bygetShellArgsWin32(line 99) and the tests in lines 103-164 no longer correspond to any reachable Windows code path. TheescapeShellArgWin32branch (line 86-91) also tests the win32 branch ofescapeShellArg, which is now only reachable on non-Windows platforms (viabuildShellCommand→ used in the Unix shell path).Consider either removing or annotating these tests to clarify they cover legacy/unreachable logic, so future maintainers don't mistakenly rely on them as validation of the current Windows behavior.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/agent-core/tests/unit/opencode/adapter.test.ts` around lines 82 - 101, The tests exercise unreachable Windows code paths: update the test file to either remove or clearly mark the legacy helpers getShellArgsWin32 and escapeShellArgWin32 (and the buildShellCommand-based tests) as covering deprecated/unreachable logic now that buildPtySpawnArgs routes Win32 via the PS1 wrapper; specifically either delete the getShellArgsWin32/escapeShellArgWin32 helpers and associated specs or add a clear comment/skip/describe block noting they validate legacy behavior only so maintainers won't assume they reflect current Windows behavior driven by buildPtySpawnArgs.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@apps/desktop/src/main/permission-api.ts`:
- Around line 137-143: The synthesized uiData has a dead branch checking
rawData.details; because desktop-control spreads details into the body, details
is always undefined, so replace the ternary for isDesktopPermission to set
operation: String(rawData.operation) and filePath:
String(JSON.stringify(rawData)) (dropping the rawData.details check) and remove
the unnecessary "as unknown as FilePermissionRequestData" cast so the object
literal directly satisfies FilePermissionRequestData; update the uiData
assignment (and keep rawData: Record<string, unknown>) accordingly.
In `@packages/agent-core/mcp-tools/desktop-control/src/index.ts`:
- Around line 265-285: The tmp file `txt` can leak if `writeFile(ps1)` throws
because the `finally` cleanup only runs after the `try` that starts after both
writes; update the flow so both writes are performed inside the same try/finally
that unlinks `txt` and `ps1` (i.e., move the first writeFile(txt, ...) into the
try block that currently surrounds execFile or wrap both writeFile calls and
execFile in a single try), ensuring the `finally` block in index.ts unlinks both
`txt` and `ps1` even if either writeFile throws; reference the tmp variables
`txt` and `ps1`, the writeFile calls, the execFile('powershell', ...) call, and
the unlink calls to locate and fix the code.
- Around line 315-322: WIN_KEYS is missing a mapping for "forwarddelete",
causing buildSendKeysString to fall back to an invalid {FORWARDDELETE} token on
Windows; add an entry "forwarddelete": '{DELETE}' to the WIN_KEYS Record so the
forward-delete key (mapped in APPLE_KEY_CODES as keycode 117) is translated to
the correct SendKeys token and will work on Windows.
In `@packages/agent-core/src/internal/classes/OpenCodeAdapter.ts`:
- Line 840: The temp PowerShell script filename uses Date.now() which can
collide; change the construction of scriptPath in OpenCodeAdapter (the const
scriptPath = path.join(os.tmpdir(), `accomplish-run-${Date.now()}.ps1`)) to use
crypto.randomUUID() for uniqueness (e.g.
`accomplish-run-${crypto.randomUUID()}.ps1`) so concurrent/rapid spawns don't
overwrite each other's files before writeFileSync/execute; ensure
crypto.randomUUID is used where scriptPath is created and no other code paths
still rely on Date.now() here.
- Around line 838-841: The temp PowerShell script file created in
startTask/spawnSessionResumption (scriptPath) is never removed causing a leak;
capture scriptPath on the instance (e.g., this._tempPsScriptPath) when writing
the file and ensure it is unlinked in the process exit/cleanup path by deleting
it from within the onExit/handleProcessExit or dispose routines
(handleProcessExit/dispose) — ensure deletion is attempted regardless of exit
reason and guard with try/catch so removal failures don't throw.
---
Duplicate comments:
In `@packages/agent-core/mcp-tools/desktop-control/src/index.ts`:
- Around line 38-40: The tmpFile function can produce filename collisions under
concurrent calls because it only uses Date.now(), so replace the uniqueness
strategy in tmpFile with a robust unique suffix (e.g., crypto.randomUUID() or
crypto.randomBytes hex, or use fs.mkdtemp/tmp library) to guarantee uniqueness;
update the tmpFile implementation (function name tmpFile) to append that
random/UUID token (or use a safe OS temporary-file API) to the generated
filename and ensure the chosen approach is cross-platform and
collision-resistant.
- Around line 461-474: The current PowerShell script written to ps1 (via tmpFile
and writeFile) uses Write-Error when no window matches, which still returns exit
code 0; update the script so the non-match path causes a non-zero exit (e.g.,
replace or follow Write-Error with an explicit exit 1 or throw) so the calling
focusWindow logic can detect failure; locate the ps1 generation where
tmpFile('focus','ps1') and writeFile(ps1, `...`) are used and change the else
branch to emit a real failure (exit 1 or throw) instead of only Write-Error.
- Around line 76-127: The captureScreenshot function currently performs screen
capture unconditionally; add a permission check at the start of
captureScreenshot (e.g., call the existing permission gate helper such as
ensurePermission or checkAgentPermission, or verify the relevant permission
flag) and abort/throw a clear permission error or return a CallToolResult error
object if the permission is not granted before creating any temp files or
invoking platform commands; ensure you still clean up any temp artifacts (png,
ps1) in error paths and keep the function signature and returned CallToolResult
shape unchanged.
- Around line 433-449: The code in listWindows is treating the xdotool window ID
(wid) as the process id by doing Number(wid); instead call xdotool getwindowpid
to fetch the real PID for each wid. In the wids.map async block (the execFile
call that runs ['getwindowname', wid] and returns { name: ..., pid: Number(wid)
}), add a second execFile call to run ['getwindowpid', wid], parse that stdout
into a Number and use it as pid (and optionally include the original wid as id
if WindowInfo supports it); update the returned shape to return the correct pid
and handle errors as before.
---
Nitpick comments:
In `@packages/agent-core/tests/unit/opencode/adapter.test.ts`:
- Around line 184-196: Tests duplicate the adapter's private PS1-building logic
(buildPs1Script and buildPtySpawnArgsWin32), which can drift from production;
extract the PS1 script construction into a small exported pure function (e.g.,
buildPs1ScriptContent(command, args)) in the adapter module and have the test
import and call that exported function instead of the local buildPs1Script, then
update buildPtySpawnArgsWin32 in the test to use the exported
buildPs1ScriptContent for scriptContent so the tests exercise the real
production logic.
- Around line 82-101: The tests exercise unreachable Windows code paths: update
the test file to either remove or clearly mark the legacy helpers
getShellArgsWin32 and escapeShellArgWin32 (and the buildShellCommand-based
tests) as covering deprecated/unreachable logic now that buildPtySpawnArgs
routes Win32 via the PS1 wrapper; specifically either delete the
getShellArgsWin32/escapeShellArgWin32 helpers and associated specs or add a
clear comment/skip/describe block noting they validate legacy behavior only so
maintainers won't assume they reflect current Windows behavior driven by
buildPtySpawnArgs.
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
apps/desktop/src/main/permission-api.tspackages/agent-core/mcp-tools/desktop-control/src/index.tspackages/agent-core/src/internal/classes/OpenCodeAdapter.tspackages/agent-core/tests/unit/opencode/adapter.test.ts
| const rawData = data as Record<string, unknown>; | ||
| const uiData: FilePermissionRequestData = isDesktopPermission | ||
| ? ({ | ||
| operation: String(rawData.operation), | ||
| filePath: String(rawData.details ?? JSON.stringify(rawData)), | ||
| } as unknown as FilePermissionRequestData) | ||
| : (data as FilePermissionRequestData); |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# Explore the file structure and locate relevant files
find . -type f -name "permission-api.ts" | head -5
find . -type f -name "*desktop-control*" | head -10Repository: accomplish-ai/accomplish
Length of output: 109
🏁 Script executed:
# Read the full permission-api.ts file to understand context
wc -l apps/desktop/src/main/permission-api.tsRepository: accomplish-ai/accomplish
Length of output: 111
🏁 Script executed:
# Look for FilePermissionRequestData type definition
rg "FilePermissionRequestData" --type ts -B 3 -A 10Repository: accomplish-ai/accomplish
Length of output: 16250
🏁 Script executed:
# Search for how buildFilePermissionRequest is implemented and used
rg "buildFilePermissionRequest" --type ts -A 15 -B 2Repository: accomplish-ai/accomplish
Length of output: 6181
🏁 Script executed:
# Check permission-api.ts imports to see exact type being used
head -50 apps/desktop/src/main/permission-api.ts | grep -A 5 "import"Repository: accomplish-ai/accomplish
Length of output: 267
🏁 Script executed:
# Look for FileOperation type definition
rg "type FileOperation" --type ts -A 10Repository: accomplish-ai/accomplish
Length of output: 956
🏁 Script executed:
# Search for desktop-control to see how it sends data
find . -type f -name "*desktop-control*" | head -20Repository: accomplish-ai/accomplish
Length of output: 50
🏁 Script executed:
# Search for where details field is set in desktop requests
rg "details" --type ts | grep -i "desktop\|permission" | head -20Repository: accomplish-ai/accomplish
Length of output: 1172
🏁 Script executed:
# Check what agent-core index exports as PermissionFileRequestData
cat packages/agent-core/src/index.ts | grep -A 2 -B 2 "PermissionFileRequestData"Repository: accomplish-ai/accomplish
Length of output: 291
🏁 Script executed:
# Look at the FILE_OPERATIONS definition to understand FileOperation values
rg "FILE_OPERATIONS" --type ts -B 2 -A 15Repository: accomplish-ai/accomplish
Length of output: 12953
🏁 Script executed:
# Check the full context around lines 130-150 in permission-api.ts
sed -n '120,160p' apps/desktop/src/main/permission-api.tsRepository: accomplish-ai/accomplish
Length of output: 1673
🏁 Script executed:
# Verify the exact type definition exported from agent-core
grep -r "export.*FilePermissionRequestData" packages/agent-core/src/types/permission-handler.ts -A 10Repository: accomplish-ai/accomplish
Length of output: 333
🏁 Script executed:
# Check if there are any other places that send requests with a 'details' field
rg "details:" --type ts | grep -E "permission|file" | head -20Repository: accomplish-ai/accomplish
Length of output: 50
🏁 Script executed:
# Verify validation is skipped for desktop permissions
sed -n '110,125p' apps/desktop/src/main/permission-api.tsRepository: accomplish-ai/accomplish
Length of output: 662
🏁 Script executed:
# Check what architecture.md says about spawning node/npx
cat apps/desktop/src/main/architecture.md 2>/dev/null || echo "File not found, searching for it"
find . -name "architecture.md" -type fRepository: accomplish-ai/accomplish
Length of output: 123
🏁 Script executed:
# Verify that buildFilePermissionRequest doesn't validate the operation further
sed -n '1,50p' packages/agent-core/src/services/permission-handler.ts | grep -A 30 "buildFilePermissionRequest"Repository: accomplish-ai/accomplish
Length of output: 50
🏁 Script executed:
# Check if the PermissionRequest interface expects a FileOperation or string
rg "fileOperation" packages/agent-core/src/common/types/permission.ts -A 2 -B 2Repository: accomplish-ai/accomplish
Length of output: 202
🏁 Script executed:
# Confirm the exact content of buildFilePermissionRequest implementation
sed -n '100,130p' packages/agent-core/src/services/permission-handler.tsRepository: accomplish-ai/accomplish
Length of output: 1064
🏁 Script executed:
# Check if validateFilePermissionRequest is called for desktop permissions
sed -n '100,115p' apps/desktop/src/main/permission-api.tsRepository: accomplish-ai/accomplish
Length of output: 736
🏁 Script executed:
# Get the exact buildFilePermissionRequest implementation to see the full cast
sed -n '130,150p' packages/agent-core/src/services/permission-handler.tsRepository: accomplish-ai/accomplish
Length of output: 763
🏁 Script executed:
# Check if the renderer or downstream code validates fileOperation
rg "fileOperation" --type ts -B 2 -A 5 | grep -A 10 "render\|ui\|display" | head -30Repository: accomplish-ai/accomplish
Length of output: 2907
🏁 Script executed:
# Search for how fileOperation is used in the PermissionDialog or renderer
rg "fileOperation" --type tsx -A 5 -B 2Repository: accomplish-ai/accomplish
Length of output: 95
🏁 Script executed:
# Check if there's any validation or switching on fileOperation values in the renderer
rg "FILE_OPERATIONS\|fileOperation" apps/web/src --type ts --type tsx -A 5 | head -50Repository: accomplish-ai/accomplish
Length of output: 95
🏁 Script executed:
# Verify the exact line where buildFilePermissionRequest does the cast
sed -n '55,75p' packages/agent-core/src/services/permission-handler.tsRepository: accomplish-ai/accomplish
Length of output: 659
🏁 Script executed:
# Find the exact buildFilePermissionRequest implementation
rg "buildFilePermissionRequest" packages/agent-core/src/services/permission-handler.ts -A 20Repository: accomplish-ai/accomplish
Length of output: 637
🏁 Script executed:
# Look for PermissionDialog component to see how it uses fileOperation
find . -name "PermissionDialog*" -type fRepository: accomplish-ai/accomplish
Length of output: 131
🏁 Script executed:
# Check if any downstream code validates or restricts fileOperation values
rg "fileOperation" --type ts -C 5 | grep -E "switch|case|if.*fileOperation|FILE_OPERATIONS" -A 3 -B 3Repository: accomplish-ai/accomplish
Length of output: 50
🏁 Script executed:
# Check the PermissionDialog implementation
cat apps/web/src/client/components/execution/PermissionDialog.tsx | head -150Repository: accomplish-ai/accomplish
Length of output: 5830
🏁 Script executed:
# Search for where fileOperation is actually used/displayed in the UI
rg "fileOperation" apps/web/src/client/components/execution/ -B 3 -A 5Repository: accomplish-ai/accomplish
Length of output: 2239
Dead branch in filePath synthesis — remove unused rawData.details check.
desktop-control spreads details directly into the POST body ({ operation, ...details }), so rawData.details is always undefined. The conditional rawData.details ?? JSON.stringify(rawData) never takes the first branch; filePath always ends up as the full JSON.stringify(rawData), which redundantly includes the operation key. Simplify to:
const uiData: FilePermissionRequestData = isDesktopPermission
? {
operation: String(rawData.operation),
- filePath: String(rawData.details ?? JSON.stringify(rawData)),
- } as unknown as FilePermissionRequestData
+ filePath: JSON.stringify(
+ Object.fromEntries(
+ Object.entries(rawData).filter(([k]) => k !== 'operation'),
+ ),
+ ),
+ }
: (data as FilePermissionRequestData);The as unknown as FilePermissionRequestData cast is unnecessary after this change; the object literal already satisfies the type.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@apps/desktop/src/main/permission-api.ts` around lines 137 - 143, The
synthesized uiData has a dead branch checking rawData.details; because
desktop-control spreads details into the body, details is always undefined, so
replace the ternary for isDesktopPermission to set operation:
String(rawData.operation) and filePath: String(JSON.stringify(rawData))
(dropping the rawData.details check) and remove the unnecessary "as unknown as
FilePermissionRequestData" cast so the object literal directly satisfies
FilePermissionRequestData; update the uiData assignment (and keep rawData:
Record<string, unknown>) accordingly.
desktop-control/src/index.ts:
- Use crypto.randomUUID() in tmpFile() instead of Date.now() to
prevent filename collisions under concurrent async calls
- Add session-level consent gate to captureScreenshot(): prompt once
per process lifetime via checkPermission, deny subsequent calls if
the session was rejected, emit a structured audit log on each capture
- Move both writeFile(txt) and writeFile(ps1) calls inside the try
block in typeText() so txt is never leaked if writeFile(ps1) throws
- Add forwarddelete: '{DELETE}' to WIN_KEYS so the forward-delete key
doesn't silently misfire on Windows (mirroring APPLE_KEY_CODES 117)
permission-api.ts:
- Fix dead rawData.details branch in /desktop-permission synthesis;
since desktop-control spreads all detail fields into the POST body
rawData.details is always undefined — instead strip the 'operation'
key from rawData and JSON-stringify the remainder for filePath
OpenCodeAdapter.ts:
- Switch Windows spawn from powershell.exe .ps1 wrapper to direct
opencode.exe spawn: pass command+args directly to node-pty so the OS
hands each element as a raw argv entry with no shell quoting needed;
this also eliminates the temp-file leak and Date.now() collision
concerns that the previous PS1 approach introduced
- Update adapter unit tests to reflect the new direct-spawn contract
There was a problem hiding this comment.
Actionable comments posted: 2
♻️ Duplicate comments (2)
packages/agent-core/mcp-tools/desktop-control/src/index.ts (2)
455-465: LinuxlistWindowssetspidto the window id.
xdotool searchyields window IDs, not process IDs. Usexdotool getwindowpidand leavepidundefined when unavailable.Proposed fix
const results = await Promise.all( wids.map(async (wid) => { try { const { stdout: name } = await execFile('xdotool', ['getwindowname', wid]); - return { name: name.trim(), pid: Number(wid) }; + let pid: number | undefined; + try { + const { stdout: pidOut } = await execFile('xdotool', ['getwindowpid', wid]); + pid = Number(pidOut.trim()); + } catch { + pid = undefined; + } + return { name: name.trim(), pid }; } catch { return null; } }), );xdotool getwindowpid documentation _NET_WM_PID behavior🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/agent-core/mcp-tools/desktop-control/src/index.ts` around lines 455 - 465, In the listWindows implementation, the code incorrectly assigns the window id to pid; change the per-window lookup (wids.map handler) to call "xdotool getwindowpid <wid>" via execFile to obtain the real process id, parse it to Number and set pid to that value, and if getwindowpid fails or returns non-numeric output leave pid as undefined; keep the existing getwindowname call for name, reference the mapped variable wid and the surrounding Promise.all logic to locate where to update the behavior.
493-495: PowerShell focus script can report success on failure.
Write-Erroris non-terminating; add a non-zero exit soexecFilecan surface the failure.Proposed fix
-if($proc) { [WinFocus]::SetForegroundWindow($proc.MainWindowHandle) } -else { Write-Error "No window found matching: $Title" } +if($proc) { [WinFocus]::SetForegroundWindow($proc.MainWindowHandle) } +else { Write-Error "No window found matching: $Title"; exit 1 }PowerShell Write-Error exit code non-terminating error behavior🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/agent-core/mcp-tools/desktop-control/src/index.ts` around lines 493 - 495, The PowerShell snippet that finds a process ($proc) and calls [WinFocus]::SetForegroundWindow can still return a zero exit code when no window is found because Write-Error is non-terminating; update the script emitted by the code that builds this PowerShell string so that the failure branch uses a non-zero exit (e.g., replace or follow Write-Error "No window found matching: $Title" with either "Write-Error ...; exit 1" or "throw 'No window found matching: $Title'") and optionally add an explicit "exit 0" after [WinFocus]::SetForegroundWindow to guarantee success; ensure this change is made where the PowerShell string containing $proc, [WinFocus]::SetForegroundWindow, and Write-Error is constructed so execFile can detect failures.
🧹 Nitpick comments (4)
packages/agent-core/tests/unit/opencode/adapter.test.ts (1)
179-180: Comment restates what the code already says — remove or replace with a why comment.
// mirrors buildPtySpawnArgs for win32: command → file, args → argsdescribes what the one-liner does, which the return statement itself makes obvious. Per coding guidelines, comments must explain why, not what.✂️ Proposed fix
- function buildPtySpawnArgsWin32(command: string, args: string[]) { - // mirrors buildPtySpawnArgs for win32: command → file, args → args - return { file: command, args }; - } + function buildPtySpawnArgsWin32(command: string, args: string[]) { + // stub that documents the expected contract: direct spawn with no shell wrapping + return { file: command, args }; + }As per coding guidelines: "No unnecessary comments — Don't add comments that restate what the code does. Comments should explain why, not what."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/agent-core/tests/unit/opencode/adapter.test.ts` around lines 179 - 180, Remove the redundant comment "// mirrors buildPtySpawnArgs for win32: command → file, args → args" that only restates the return statement and either delete it or replace it with a brief "why" comment explaining the platform mapping rationale (e.g., why win32 expects {file, args} vs other platforms) so readers understand the reason for mirroring buildPtySpawnArgs; locate the return { file: command, args } in the test around the buildPtySpawnArgs behavior and update the comment accordingly.apps/desktop/src/main/permission-api.ts (1)
73-110: Remove comments that just restate the control flow.
The comments at Line 73-75 and Line 98-109 simply narrate the conditionals; consider removing them to keep comments focused on “why”.Suggested cleanup
- // Handle both file permissions (/permission) and desktop permissions (/desktop-permission) const isFilePermission = req.method === 'POST' && req.url === '/permission'; const isDesktopPermission = req.method === 'POST' && req.url === '/desktop-permission'; if (!isFilePermission && !isDesktopPermission) { res.writeHead(404, { 'Content-Type': 'application/json' }); res.end(JSON.stringify({ error: 'Not found' })); return; } ... - // For file permissions, validate the payload strictly. - // For desktop permissions, only require an `operation` field (no filePath needed). if (isFilePermission) { const validation = permissionHandler.validateFilePermissionRequest(data); if (!validation.valid) { res.writeHead(400, { 'Content-Type': 'application/json' }); res.end(JSON.stringify({ error: validation.error })); return; } } else { - // desktop-permission: just require an operation string const d = data as Record<string, unknown>; if (!d.operation || typeof d.operation !== 'string') { res.writeHead(400, { 'Content-Type': 'application/json' }); res.end(JSON.stringify({ error: 'operation is required for desktop permissions' })); return; } }As per coding guidelines: No unnecessary comments — Don't add comments that restate what the code does. Comments should explain why, not what.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/desktop/src/main/permission-api.ts` around lines 73 - 110, Remove the redundant “what the code does” comments around the request-type checks and validation branches: delete the comment that states the control flow near the isFilePermission / isDesktopPermission declarations and delete the comment that explains the difference between file and desktop permissions above the isFilePermission conditional; if you need a comment, replace them with a concise “why” note (e.g., why desktop-permission omits filePath) or a short reference to the validation behavior for permissionHandler.validateFilePermissionRequest, but do not keep comments that merely restate the conditions or the HTTP handling logic.packages/agent-core/mcp-tools/desktop-control/src/index.ts (2)
160-165: Replace the nested ternary with a small map.
Keeps the intent readable and aligns with the no-nested-ternary guideline.Refactor
- const cmd = - button === 'right' - ? 'right click' - : button === 'double' - ? 'double click' - : 'click'; + const cmdMap: Record<'left' | 'right' | 'double', string> = { + left: 'click', + right: 'right click', + double: 'double click', + }; + const cmd = cmdMap[button];As per coding guidelines: Avoid nested ternaries — Use mapper objects or if/else for readability.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/agent-core/mcp-tools/desktop-control/src/index.ts` around lines 160 - 165, The nested ternary that builds cmd from button (const cmd = button === 'right' ? 'right click' : button === 'double' ? 'double click' : 'click') should be replaced with a small mapping object; create a map (e.g., const CLICK_MAP = { right: 'right click', double: 'double click' }) and compute cmd = CLICK_MAP[button] || 'click' so the intent is clearer and avoids nested ternaries (refer to the cmd variable and the button value in index.ts).
375-383: Avoid nested ternary inbuildSendKeysString.
This is a straightforward if/else and reads clearer.Refactor
- const k = - mainKey in WIN_KEYS - ? WIN_KEYS[mainKey] - : mainKey.length === 1 - ? mainKey - : `{${mainKey.toUpperCase()}}`; + let k: string; + if (mainKey in WIN_KEYS) { + k = WIN_KEYS[mainKey]; + } else if (mainKey.length === 1) { + k = mainKey; + } else { + k = `{${mainKey.toUpperCase()}}`; + }As per coding guidelines: Avoid nested ternaries — Use mapper objects or if/else for readability.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/agent-core/mcp-tools/desktop-control/src/index.ts` around lines 375 - 383, The nested ternary inside buildSendKeysString (which computes k using WIN_KEYS and mainKey) should be replaced with a clear if/else to improve readability: check if mainKey is in WIN_KEYS and set k = WIN_KEYS[mainKey]; else if mainKey.length === 1 set k = mainKey; otherwise set k = `{${mainKey.toUpperCase()}}`. Update the function to use this explicit conditional flow and keep the existing prefix computation with WIN_MODIFIERS and parseKeyParts unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@packages/agent-core/mcp-tools/desktop-control/src/index.ts`:
- Around line 526-529: The tool metadata for the 'desktop_screenshot' tool still
says "No permission required" despite the implementation now requesting
session-level consent; update the description string in the desktop_screenshot
tool definition to state that session consent is required (e.g., "Requires user
session consent to capture a screenshot; returns a base64-encoded PNG of the
primary display") so the description matches the behavior in the consent flow.
In `@packages/agent-core/tests/unit/opencode/adapter.test.ts`:
- Around line 177-229: The test stub buildPtySpawnArgsWin32 lacks the production
.exe validation present in OpenCodeAdapter.ts (the .exe suffix check in
buildPtySpawnArgs), so add that validation to the test helper or use the real
function: either import/require the actual buildPtySpawnArgs from
OpenCodeAdapter (preferred) and call the win32 branch, or re-implement the same
`.endsWith('.exe')` check in buildPtySpawnArgsWin32 and add a negative test
asserting an error is thrown for non-.exe commands; also replace the redundant
comment with one that documents this block as a contract/regression test and
explicitly calls out which production behaviour (the .exe validation) is being
verified or cannot be verified if importing is impossible.
---
Duplicate comments:
In `@packages/agent-core/mcp-tools/desktop-control/src/index.ts`:
- Around line 455-465: In the listWindows implementation, the code incorrectly
assigns the window id to pid; change the per-window lookup (wids.map handler) to
call "xdotool getwindowpid <wid>" via execFile to obtain the real process id,
parse it to Number and set pid to that value, and if getwindowpid fails or
returns non-numeric output leave pid as undefined; keep the existing
getwindowname call for name, reference the mapped variable wid and the
surrounding Promise.all logic to locate where to update the behavior.
- Around line 493-495: The PowerShell snippet that finds a process ($proc) and
calls [WinFocus]::SetForegroundWindow can still return a zero exit code when no
window is found because Write-Error is non-terminating; update the script
emitted by the code that builds this PowerShell string so that the failure
branch uses a non-zero exit (e.g., replace or follow Write-Error "No window
found matching: $Title" with either "Write-Error ...; exit 1" or "throw 'No
window found matching: $Title'") and optionally add an explicit "exit 0" after
[WinFocus]::SetForegroundWindow to guarantee success; ensure this change is made
where the PowerShell string containing $proc, [WinFocus]::SetForegroundWindow,
and Write-Error is constructed so execFile can detect failures.
---
Nitpick comments:
In `@apps/desktop/src/main/permission-api.ts`:
- Around line 73-110: Remove the redundant “what the code does” comments around
the request-type checks and validation branches: delete the comment that states
the control flow near the isFilePermission / isDesktopPermission declarations
and delete the comment that explains the difference between file and desktop
permissions above the isFilePermission conditional; if you need a comment,
replace them with a concise “why” note (e.g., why desktop-permission omits
filePath) or a short reference to the validation behavior for
permissionHandler.validateFilePermissionRequest, but do not keep comments that
merely restate the conditions or the HTTP handling logic.
In `@packages/agent-core/mcp-tools/desktop-control/src/index.ts`:
- Around line 160-165: The nested ternary that builds cmd from button (const cmd
= button === 'right' ? 'right click' : button === 'double' ? 'double click' :
'click') should be replaced with a small mapping object; create a map (e.g.,
const CLICK_MAP = { right: 'right click', double: 'double click' }) and compute
cmd = CLICK_MAP[button] || 'click' so the intent is clearer and avoids nested
ternaries (refer to the cmd variable and the button value in index.ts).
- Around line 375-383: The nested ternary inside buildSendKeysString (which
computes k using WIN_KEYS and mainKey) should be replaced with a clear if/else
to improve readability: check if mainKey is in WIN_KEYS and set k =
WIN_KEYS[mainKey]; else if mainKey.length === 1 set k = mainKey; otherwise set k
= `{${mainKey.toUpperCase()}}`. Update the function to use this explicit
conditional flow and keep the existing prefix computation with WIN_MODIFIERS and
parseKeyParts unchanged.
In `@packages/agent-core/tests/unit/opencode/adapter.test.ts`:
- Around line 179-180: Remove the redundant comment "// mirrors
buildPtySpawnArgs for win32: command → file, args → args" that only restates the
return statement and either delete it or replace it with a brief "why" comment
explaining the platform mapping rationale (e.g., why win32 expects {file, args}
vs other platforms) so readers understand the reason for mirroring
buildPtySpawnArgs; locate the return { file: command, args } in the test around
the buildPtySpawnArgs behavior and update the comment accordingly.
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
apps/desktop/src/main/permission-api.tspackages/agent-core/mcp-tools/desktop-control/src/index.tspackages/agent-core/src/internal/classes/OpenCodeAdapter.tspackages/agent-core/tests/unit/opencode/adapter.test.ts
🚧 Files skipped from review as they are similar to previous changes (1)
- packages/agent-core/src/internal/classes/OpenCodeAdapter.ts
buildPtySpawnArgsWin32 was a bare pass-through that gave zero regression protection for the .exe suffix guard in the production buildPtySpawnArgs (which throws immediately on non-.exe commands). Changes: - Re-implement the .exe check inside the local stub used by the test block - Add a dedicated negative test: asserts Error is thrown for both a path without .exe and a Unix-style path — covering the exact production guard - Expand block comment to document why this is a contract/regression test and to call out that buildPtySpawnArgs is private (cannot be imported) - Remove the redundant 'mirrors...' inline comment (restated what, not why) Addresses CodeRabbitAI review comment on PR accomplish-ai#633.
…ailures
When the Gemini API has no context cache (fresh session after a rate-limit
reset or app restart), the model ignored the MCP-registered tool names and
tried to call 'start_task' / 'complete_task' literally from the system
prompt text. OpenCode registers these as 'start-task_start_task' and
'complete-task_complete_task', so the calls hit the 'invalid' tool handler
and the task failed immediately before any desktop automation ran.
Root cause confirmed via opencode log: MCP servers DID connect successfully
(toolCount=7 for desktop-control, toolCount=1 for start-task/complete-task),
but the first Gemini inference call — with cache:{read:0} — used the literal
names from the prompt rather than the registered MCP names.
Fix: replace every 'start_task' and 'complete_task' call-site in the system
prompt with 'start-task_start_task' and 'complete-task_complete_task'.
Update the config-generator tests to match.
35517a0 to
19f95e5
Compare
Sure final_fix_working.mp4 |
|
@RajdeepKushwaha5 I don't see the side effect i.e the notepad being written into |
…tests
- Wrap server startup in main() with .catch() per MCP tool convention
- Windows focusWindow: escape wildcard chars via WildcardPattern::Escape; add exit 1 to make Write-Error terminating
- Linux listWindows: use xdotool getwindowpid to return real process IDs instead of X11 window IDs
- Update screenshot tool description to reflect session-level consent gate
- typeText (Windows): fix temp-file leak — move both writeFile calls inside try so finally always cleans up txt and ps1
- tests: update 5 system-prompt assertions to use full MCP tool names (complete-task_complete_task, start-task_start_task) to match last commit
- tests: fix 2 pre-existing Windows path-separator failures by using path.join('dist', 'index.mjs') in assertions
- Add windowsHide:true to all Windows PowerShell execFile calls so the console window no longer steals keyboard focus from the target app - In typeText: capture foreground window handle before writing to clipboard, then restore it with SetForegroundWindow before SendKeys so paste lands in the correct window even if PS spawning shifts focus - In focusWindow: add 400ms sleep after SetForegroundWindow to let the OS fully activate the window before callers proceed to type - In captureScreenshot: scale bitmap to max 1280px wide (HighQuality- Bicubic) so base64 payload stays within model vision input limits
SetForegroundWindow silently fails from hidden/background processes due to Windows foreground locking rules. Replace it with WScript.Shell.AppActivate which activates windows by title from any process, then use WScript.Shell .SendKeys to paste -- guaranteeing the keystrokes reach the correct window.
…ext targeting focusWindow now uses WScript.Shell.AppActivate (works from background processes) and writes the exact window title to dc-focused-window.txt in TEMP. typeText reads that file to re-activate the correct window before pasting, eliminating the race where Electron regains focus between the two tool calls.
…ndows
- focusWindow now persists process PID (not title) to dc-focused-window.txt,
making it immune to title changes (e.g. Notepad prepends '*' on unsaved edits)
- typeText adopts a three-layer strategy on Windows:
1. UI Automation ValuePattern: programmatic append with no focus requirement,
works from any background process (Notepad, WordPad, Win32/WinUI 3 editors)
2. UIA SetFocus + clipboard paste: for editors that support UIA focus but not
ValuePattern
3. AppActivate(pid) + clipboard paste: immune fallback using stored PID
- Addresses code-review feedback: fixes the invisible side-effect issue where
typeText failed silently after Notepad window title changed post-focusWindow
Hi Sir @galElmalah ! Here's a video walkthrough of the recent updates — Windows spawn fixes, desktop automation features, and code review improvements. Let me know if you have any questions! accomplish_change.mp4 |
|
Jira ticket: https://accomplish-ai.atlassian.net/browse/ENG-688 |
Summary
Adds
desktop-control, a new MCP tool that gives the agent seven native desktopautomation capabilities beyond the browser, working across macOS, Windows, and
Linux with zero compiled native Node.js dependencies.
Tools
desktop_screenshotdesktop_mouse_clickdesktop_mouse_movedesktop_type_textdesktop_key_pressctrl+c,Return,F5, …)desktop_list_windowsdesktop_focus_windowSecurity design
catch {}in the permission check returnsfalse. The server never defaults to "allow" when the permission API isunreachable (contrast: competing PR feat: add native desktop automation MCP tool #606
catch { return true }).execFile()everywhere: noexec()or shell template strings with userinput embedded. All user-controlled values are passed as discrete typed
arguments or out-of-band via temp files.
desktop_type_textusespbcopystdin on macOS,PowerShell clipboard on Windows, and
xdotool type -- texton Linux — thetext being typed is never interpolated into any command string.
osascriptare sanitisedthrough
escapeAppleScriptString().Platform implementation
macOS (built-in tools only, except mouse_move):
screencapture -x→ reads file → base64 PNG (actual PNG bytes, not raw pixels)osascript+ System Events for click, keystroke, key code, window managementcliclick(brew install cliclick), documented in SKILL.mdWindows (built-in PowerShell only):
.ps1scripts — output path / arguments passed as typedPowerShell parameters, never string-interpolated
System.Windows.Formsfor screenshot + clipboard; P/Invoke for mouse simulationLinux (xdotool + scrot):
scrotwith fallback tognome-screenshotxdotoolfor mouse, keyboard, and window managementWhat's different from existing PRs
return trueon error)execFileonlyIntegration
config-generator.tsalongside the other MCP tools, withPERMISSION_API_PORTforwarded in environmentSKILLS_FULLY_BUNDLEDandbundlesinbundle-skills.cjsrequiredMcpDistEntriesupdated; new assertions verify the server isregistered with correct permission port in both default and custom scenarios
Testing
-pnpm typecheck ✅
-pnpm lint:eslint ✅ (exit 0)
-vitest run [config-generator.test.ts]- ✅ 57 tests, 55 pass, 2 pre-existing Windows path-separator failures unrelated to this PR
Summary by CodeRabbit
New Features
Documentation
Bug Fixes
Tests