Skip to content

feat(desktop): native desktop automation MCP tool (closes #189)#633

Open
RajdeepKushwaha5 wants to merge 14 commits intoaccomplish-ai:mainfrom
RajdeepKushwaha5:feat/issue-189-desktop-control
Open

feat(desktop): native desktop automation MCP tool (closes #189)#633
RajdeepKushwaha5 wants to merge 14 commits intoaccomplish-ai:mainfrom
RajdeepKushwaha5:feat/issue-189-desktop-control

Conversation

@RajdeepKushwaha5
Copy link
Contributor

@RajdeepKushwaha5 RajdeepKushwaha5 commented Feb 22, 2026

Summary

Adds desktop-control, a new MCP tool that gives the agent seven native desktop
automation capabilities beyond the browser, working across macOS, Windows, and
Linux with zero compiled native Node.js dependencies.

Tools

Tool Description Permission
desktop_screenshot Capture primary display as PNG None (read-only)
desktop_mouse_click Click (left/right/double) at screen coordinates
desktop_mouse_move Move cursor to screen coordinates
desktop_type_text Type Unicode text at caret
desktop_key_press Press key combos (ctrl+c, Return, F5, …)
desktop_list_windows List visible windows as JSON None (read-only)
desktop_focus_window Bring application to foreground

Security design

  • Permission fails closed: every catch {} in the permission check returns
    false. The server never defaults to "allow" when the permission API is
    unreachable (contrast: competing PR feat: add native desktop automation MCP tool #606 catch { return true }).
  • execFile() everywhere: no exec() or shell template strings with user
    input embedded. All user-controlled values are passed as discrete typed
    arguments or out-of-band via temp files.
  • Injection-proof typing: desktop_type_text uses pbcopy stdin on macOS,
    PowerShell clipboard on Windows, and xdotool type -- text on Linux — the
    text being typed is never interpolated into any command string.
  • AppleScript escaping: window names passed to osascript are sanitised
    through escapeAppleScriptString().

Platform implementation

macOS (built-in tools only, except mouse_move):

  • screencapture -x → reads file → base64 PNG (actual PNG bytes, not raw pixels)
  • osascript + System Events for click, keystroke, key code, window management
  • mouse_move: requires cliclick (brew install cliclick), documented in SKILL.md

Windows (built-in PowerShell only):

  • Parameterised .ps1 scripts — output path / arguments passed as typed
    PowerShell parameters, never string-interpolated
  • System.Windows.Forms for screenshot + clipboard; P/Invoke for mouse simulation

Linux (xdotool + scrot):

  • scrot with fallback to gnome-screenshot
  • xdotool for mouse, keyboard, and window management

What's different from existing PRs

Issue PR #431 PR #572 PR #606 This PR
Permission fails closed ❌ (return true on error)
No shell injection execFile only
Correct screenshot format ❌ raw pixels
macOS listWindows ❌ newline bug ✅ JXA JSON
Windows bounds/coords ❌ PID as x
express in right deps ❌ devDeps n/a (no express)

Integration

  • Registered in config-generator.ts alongside the other MCP tools, with
    PERMISSION_API_PORT forwarded in environment
  • Added to SKILLS_FULLY_BUNDLED and bundles in bundle-skills.cjs
  • "Desktop Automation" capability added to agent system prompt
  • Tests: requiredMcpDistEntries updated; new assertions verify the server is
    registered with correct permission port in both default and custom scenarios

Testing

-pnpm typecheck ✅
-pnpm lint:eslint ✅ (exit 0)
-vitest run [config-generator.test.ts]- ✅ 57 tests, 55 pass, 2 pre-existing Windows path-separator failures unrelated to this PR

Summary by CodeRabbit

  • New Features

    • Desktop automation tools (screenshots, mouse/keyboard, window list/focus), a local desktop-control MCP server now included in the desktop bundle, and integrated desktop permission endpoints with UI flow.
  • Documentation

    • Added user-facing desktop-control documentation covering platforms, permissions, command formats, and workflows.
  • Bug Fixes

    • Improved CLI invocation and executable path handling on Windows; suppresses console windows and avoids quoting issues.
  • Tests

    • Added tests for Windows CLI/path handling and for the new server/config behavior.

Copilot AI review requested due to automatic review settings February 22, 2026 18:51
@orcaman
Copy link
Contributor

orcaman commented Feb 22, 2026

Snyk checks have passed. No issues have been found so far.

Status Scanner Critical High Medium Low Total (0)
Open Source Security 0 0 0 0 0 issues
Licenses 0 0 0 0 0 issues

💻 Catch issues earlier using the plugins for VS Code, JetBrains IDEs, Visual Studio, and Eclipse.

@coderabbitai
Copy link

coderabbitai bot commented Feb 22, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds a new local MCP tool "desktop-control" for cross-platform desktop automation, includes it in bundling and generated MCP servers, replaces shell exec calls with execFileSync for CLI/version resolution, adjusts Windows PTY spawn logic and tests, and extends the desktop permission API with a /desktop-permission endpoint.

Changes

Cohort / File(s) Summary
Desktop-Control MCP
packages/agent-core/mcp-tools/desktop-control/SKILL.md, packages/agent-core/mcp-tools/desktop-control/package.json, packages/agent-core/mcp-tools/desktop-control/tsconfig.json, packages/agent-core/mcp-tools/desktop-control/src/index.ts
New MCP tool "desktop-control": cross-platform handlers (screenshot, mouse, keyboard, window list/focus), per-tool schemas, permission checks, and Stdio MCP server implementation.
Bundling & Config
apps/desktop/scripts/bundle-skills.cjs, packages/agent-core/src/opencode/config-generator.ts, packages/agent-core/tests/unit/opencode/config-generator.test.ts
Include desktop-control in fully bundled skills and generated MCP servers; add PERMISSION_API_PORT env and 60s timeout; update tests to expect new server and port behavior.
Permission API (Electron)
apps/desktop/src/main/permission-api.ts, apps/desktop/src/main/opencode/electron-options.ts
Add /desktop-permission POST endpoint, synthesize UI-compatible permission requests for desktop ops, validate desktop-specific payloads; change CLI invocation to use execFileSync with windowsHide.
CLI/version resolution
packages/agent-core/src/opencode/cli-resolver.ts, apps/desktop/src/main/opencode/electron-options.ts, packages/agent-core/tests/unit/opencode/cli-resolver.test.ts
Replace shell-based execSync with execFileSync and explicit ["--version"] arg, add windowsHide/timeout; add tests ensuring non-existent paths with spaces return null safely.
Windows spawn / OpenCode adapter & tests
packages/agent-core/src/internal/classes/OpenCodeAdapter.ts, packages/agent-core/tests/unit/opencode/adapter.test.ts
Clarify Windows branch to directly spawn .exe with {file, args}, add Windows-focused regression tests covering spaces, Unicode, and shell-sensitive characters (duplicate test blocks present).

Sequence Diagram

sequenceDiagram
    participant Agent as Agent/Client
    participant MCP as Desktop-Control MCP
    participant PermAPI as Permission API
    participant OS as OS Platform

    Agent->>MCP: CallToolRequest (desktop action)
    MCP->>PermAPI: Request desktop permission (operation, metadata)
    alt permission granted
        PermAPI-->>MCP: Granted
        MCP->>OS: Execute platform-specific command (AppleScript / PowerShell / xdotool)
        OS-->>MCP: Result (image/base64, status, window list)
        MCP-->>Agent: Tool result
    else denied or API error
        PermAPI-->>MCP: Denied/Error
        MCP-->>Agent: Error response
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

  • mcmatan
  • galElmalah
  • DanielScharfsteinGit

Poem

🐇 I tiptoe across the glowing screen,
I ask for leave to click and glean,
I paste and move and capture light,
I wait for nods, then do it right,
A cheeky rabbit, automating bright.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 35.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main change: adding a native desktop automation MCP tool with cross-platform support, which aligns with the comprehensive changes introducing desktop-control capabilities.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces a new desktop-control MCP tool that provides native desktop automation capabilities across macOS, Windows, and Linux, while also fixing Issue #596 related to Windows path handling for paths containing spaces.

Changes:

  • Adds desktop-control MCP tool with 7 desktop automation capabilities (screenshot, mouse control, keyboard input, window management)
  • Fixes Issue #596 by replacing execSync with execFileSync in cli-resolver.ts and electron-options.ts
  • Updates OpenCodeAdapter to route Windows commands through cmd.exe /s /c with proper quoting
  • Adds comprehensive unit tests for Windows path handling with spaces

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
packages/agent-core/mcp-tools/desktop-control/src/index.ts New MCP server implementing 7 desktop automation tools with security-first design
packages/agent-core/mcp-tools/desktop-control/package.json Package configuration for desktop-control tool
packages/agent-core/mcp-tools/desktop-control/tsconfig.json TypeScript configuration matching other MCP tools
packages/agent-core/mcp-tools/desktop-control/SKILL.md Comprehensive documentation for AI agent usage
packages/agent-core/src/opencode/config-generator.ts Registers desktop-control MCP server and adds capability to system prompt
packages/agent-core/src/opencode/cli-resolver.ts Replaces execSync with execFileSync to fix Issue #596
packages/agent-core/src/internal/classes/OpenCodeAdapter.ts Routes Windows .exe through cmd.exe with proper quoting for Issue #596
apps/desktop/src/main/opencode/electron-options.ts Applies same execFileSync fix for consistency
apps/desktop/scripts/bundle-skills.cjs Adds desktop-control to bundling configuration
packages/agent-core/tests/unit/opencode/config-generator.test.ts Adds tests verifying desktop-control registration
packages/agent-core/tests/unit/opencode/cli-resolver.test.ts Adds regression test for Issue #596
packages/agent-core/tests/unit/opencode/adapter.test.ts Comprehensive tests for Windows path escaping

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +726 to +727
const transport = new StdioServerTransport();
await server.connect(transport);
Copy link

Copilot AI Feb 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The server startup code should follow the established pattern used by all other MCP tools in this codebase. Wrap the transport creation and server.connect() call in a main() async function with .catch() error handling. This ensures consistent error handling across all MCP tools and provides better error messages when the server fails to start. All other MCP tools (file-permission, ask-user-question, complete-task, start-task, report-checkpoint, report-thought, dev-browser-mcp) follow this pattern.

Suggested change
const transport = new StdioServerTransport();
await server.connect(transport);
async function main(): Promise<void> {
const transport = new StdioServerTransport();
await server.connect(transport);
}
main().catch((error) => {
console.error('Failed to start desktop-control MCP server:', error);
process.exit(1);
});

Copilot uses AI. Check for mistakes.
[DllImport("user32.dll")] public static extern bool SetForegroundWindow(IntPtr h);
}
'@
$proc = Get-Process | Where-Object {$_.MainWindowTitle -like "*$Title*"} | Select-Object -First 1
Copy link

Copilot AI Feb 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Windows focusWindow implementation uses PowerShell's -like operator with wildcards ($Title), which will treat special characters in the window name as wildcards (e.g., *, ?, [, ]). Consider using -eq for exact matching or escaping the wildcard characters in $Title to avoid unexpected matching behavior. For example, a window name containing brackets like "Test[1]" might not match correctly.

Suggested change
$proc = Get-Process | Where-Object {$_.MainWindowTitle -like "*$Title*"} | Select-Object -First 1
$escapedTitle = [System.Management.Automation.WildcardPattern]::Escape($Title)
$proc = Get-Process | Where-Object {$_.MainWindowTitle -like "*$escapedTitle*"} | Select-Object -First 1

Copilot uses AI. Check for mistakes.
## Tools

### `desktop_screenshot`
Capture the entire primary display. Returns a base64-encoded PNG image. No permission required.
Copy link

Copilot AI Feb 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The desktop_screenshot tool captures the full contents of the primary display and is explicitly documented as requiring no permission, even though this can expose highly sensitive on‑screen information (passwords, 2FA codes, private documents, etc.). In a prompt‑injection or compromised orchestrator scenario, an attacker could invoke this tool repeatedly to exfiltrate whatever is visible on the user’s screen without any explicit, per‑action user approval. Consider aligning screenshots with the permission model (e.g., using the same permission API or a dedicated "allow screen capture" prompt) so users must explicitly approve each capture before the image is returned.

Suggested change
Capture the entire primary display. Returns a base64-encoded PNG image. No permission required.
Capture the entire primary display. Returns a base64-encoded PNG image. Permission required; each capture must be explicitly approved by the user (via the same permission model used for other desktop tools).

Copilot uses AI. Check for mistakes.
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (3)
packages/agent-core/mcp-tools/desktop-control/package.json (1)

12-16: Consider moving @modelcontextprotocol/sdk to dependencies.

The @modelcontextprotocol/sdk package is likely used at runtime (imported in src/index.ts to create the MCP server). Placing it in devDependencies means it won't be installed in production builds or when this package is used as a dependency.

Since this tool is bundled (per bundle-skills.cjs), the bundler may resolve it regardless. However, for clarity and correctness, runtime dependencies should be in dependencies.

Proposed fix
-  "devDependencies": {
-    "@modelcontextprotocol/sdk": "^1.0.0",
+  "dependencies": {
+    "@modelcontextprotocol/sdk": "^1.0.0"
+  },
+  "devDependencies": {
     "tsx": "^4.21.0",
     "typescript": "^5.0.0"
   }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/agent-core/mcp-tools/desktop-control/package.json` around lines 12 -
16, The package "@modelcontextprotocol/sdk" is declared in devDependencies but
is required at runtime (imported in src/index.ts to create the MCP server) so
move it into "dependencies" in package.json; update the package.json
devDependencies/dev block to remove "@modelcontextprotocol/sdk" and add it under
"dependencies" (keeping the same version "^1.0.0") so production installs and
consumers get the runtime dependency, and ensure any bundler config
(bundle-skills.cjs) still works with the dependency present.
packages/agent-core/mcp-tools/desktop-control/src/index.ts (2)

355-360: Replace the nested ternary with explicit branching for key mapping.

♻️ Suggested refactor
-  const k =
-    mainKey in WIN_KEYS
-      ? WIN_KEYS[mainKey]
-      : mainKey.length === 1
-        ? mainKey
-        : `{${mainKey.toUpperCase()}}`;
+  let k: string;
+  if (mainKey in WIN_KEYS) {
+    k = WIN_KEYS[mainKey];
+  } else if (mainKey.length === 1) {
+    k = mainKey;
+  } else {
+    k = `{${mainKey.toUpperCase()}}`;
+  }

As per coding guidelines: Avoid nested ternaries — Use mapper objects or if/else for readability.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/agent-core/mcp-tools/desktop-control/src/index.ts` around lines 355
- 360, The expression that computes k uses a nested ternary which reduces
readability; replace it with explicit branching using either an if/else block or
a small helper mapper so the logic is clear: first check WIN_KEYS for mainKey
(WIN_KEYS[mainKey]), then if not present check whether mainKey.length === 1 and
return mainKey, otherwise return the fallback `"{"+mainKey.toUpperCase()+"}"`;
update the code that assigns k (the constant computed from mainKey and WIN_KEYS)
and any caller that relies on that value to use the clearer branching
implementation.

137-142: Replace the nested ternary with a mapping table for readability.

♻️ Suggested refactor
-    const cmd =
-      button === 'right'
-        ? 'right click'
-        : button === 'double'
-          ? 'double click'
-          : 'click';
+    const CLICK_CMD: Record<'left' | 'right' | 'double', string> = {
+      left: 'click',
+      right: 'right click',
+      double: 'double click',
+    };
+    const cmd = CLICK_CMD[button];

As per coding guidelines: Avoid nested ternaries — Use mapper objects or if/else for readability.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/agent-core/mcp-tools/desktop-control/src/index.ts` around lines 137
- 142, The nested ternary that sets cmd based on button should be replaced with
a simple mapping object or switch to improve readability: create a const mapping
(e.g., const clickMap = { right: 'right click', double: 'double click', default:
'click' } or similar) and use button to look up the value for cmd, falling back
to 'click' when not present; update the code that assigns cmd (the variable
named cmd that currently uses the nested ternary) to use this mapper instead.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/agent-core/mcp-tools/desktop-control/src/index.ts`:
- Around line 431-447: The Linux branch currently sets pid to the X11 window ID
(wids) instead of the process ID; update the mapping in the wids -> results
Promise.all block to call execFile('xdotool', ['getwindowpid', wid]) to retrieve
the real PID, parse it to a number, and set pid to undefined when
retrieval/parsing fails (keep the existing try/catch so windows without
_NET_WM_PID are handled). Locate the async map over wids (the block that
currently calls execFile('xdotool', ['getwindowname', wid]) and returns { name:
..., pid: Number(wid) }) and replace the pid logic to use getwindowpid and
conditional undefined on failure, preserving WindowInfo typing and filtering out
nulls as before.
- Around line 459-472: The PowerShell script written via tmpFile('focus', 'ps1')
and writeFile can return success even when no window matches because Write-Error
is non-terminating; update the here-doc body (the script string passed to
writeFile) so the else branch both writes the error and then calls exit 1 (e.g.,
after Write-Error "No window found matching: $Title" add exit 1) so the process
returns a non-zero exit code and execFile will surface the failure; locate the
ps1 script content where WinFocus and SetForegroundWindow are defined and modify
only the else branch to propagate the error.

---

Nitpick comments:
In `@packages/agent-core/mcp-tools/desktop-control/package.json`:
- Around line 12-16: The package "@modelcontextprotocol/sdk" is declared in
devDependencies but is required at runtime (imported in src/index.ts to create
the MCP server) so move it into "dependencies" in package.json; update the
package.json devDependencies/dev block to remove "@modelcontextprotocol/sdk" and
add it under "dependencies" (keeping the same version "^1.0.0") so production
installs and consumers get the runtime dependency, and ensure any bundler config
(bundle-skills.cjs) still works with the dependency present.

In `@packages/agent-core/mcp-tools/desktop-control/src/index.ts`:
- Around line 355-360: The expression that computes k uses a nested ternary
which reduces readability; replace it with explicit branching using either an
if/else block or a small helper mapper so the logic is clear: first check
WIN_KEYS for mainKey (WIN_KEYS[mainKey]), then if not present check whether
mainKey.length === 1 and return mainKey, otherwise return the fallback
`"{"+mainKey.toUpperCase()+"}"`; update the code that assigns k (the constant
computed from mainKey and WIN_KEYS) and any caller that relies on that value to
use the clearer branching implementation.
- Around line 137-142: The nested ternary that sets cmd based on button should
be replaced with a simple mapping object or switch to improve readability:
create a const mapping (e.g., const clickMap = { right: 'right click', double:
'double click', default: 'click' } or similar) and use button to look up the
value for cmd, falling back to 'click' when not present; update the code that
assigns cmd (the variable named cmd that currently uses the nested ternary) to
use this mapper instead.

@galElmalah
Copy link
Contributor

@RajdeepKushwaha5 can you attach a short video demonstrating these new capabilities?

…e paths with spaces

On Windows, pty.spawn() with a raw .exe path works in ConPTY mode (Windows 10
1809+) but fails in WinPTY fallback mode when the installation path contains
spaces. The intermediate cmd.exe session receives an unquoted path and splits
it at every space, producing the error:

  '"C:\Users\Anish Maheshwari\...\opencode.exe"' is not recognized

Fix buildPtySpawnArgs() on win32 to route through cmd.exe /s /c using the
existing buildShellCommand() helper which individually quotes each token that
contains spaces (Windows cmd.exe doubling convention).

Also replace the two execSync calls in cli-resolver.ts and electron-options.ts
with execFileSync(path, ['--version']) to eliminate cmd.exe shell involvement
from version detection entirely.

Tests: added 6 regression tests in adapter.test.ts (Issue accomplish-ai#596 describe block)
and 1 test in cli-resolver.test.ts verifying graceful handling of paths with
spaces.

Fixes: accomplish-ai#596
…sh-ai#189)

Add a new 'desktop-control' MCP tool that gives the agent native desktop
automation capabilities beyond CLI and browser, operating on all three
supported platforms without requiring any compiled native Node.js modules.

## Tools exposed

- desktop_screenshot  - capture primary display as PNG (no permission needed)
- desktop_mouse_click - left/right/double click at screen coordinates
- desktop_mouse_move  - move cursor to screen coordinates
- desktop_type_text   - type arbitrary Unicode text at caret position
- desktop_key_press   - press key combinations (ctrl+c, Return, F5, ...)
- desktop_list_windows - list visible application windows as JSON
- desktop_focus_window - bring an application to the foreground

## Security

- Permission fails CLOSED: catch {} returns false, never returns true
- execFile() used for every subprocess - no shell interpolation, no injection
- type_text uses clipboard-paste technique (pbcopy on macOS, PS clipboard
  on Windows, xdotool type on Linux) so user text is never embedded in
  a command string
- AppleScript strings with user input are escaped with escapeAppleScriptString()
- key_press validates input with regex before building platform commands

## Cross-platform implementation

macOS: screencapture (screenshot), osascript/System Events (mouse+keyboard+
  window management). mouse_move requires cliclick (brew install cliclick).
Windows: parameterised PS1 scripts with P/Invoke for mouse/keyboard,
  System.Windows.Forms for screenshot, clipboard and SendKeys.
Linux: scrot/gnome-screenshot (screenshot), xdotool for everything else.

## Integration

- Registered in config-generator.ts with PERMISSION_API_PORT env var
- Added to bundle-skills.cjs SKILLS_FULLY_BUNDLED list and bundles array
- Desktop Automation capability added to agent system prompt
- Unit tests updated: stub dist entry added to requiredMcpDistEntries;
  asserts server is registered with correct PERMISSION_API_PORT

Closes accomplish-ai#189
@RajdeepKushwaha5 RajdeepKushwaha5 force-pushed the feat/issue-189-desktop-control branch from 10c8dc0 to b40dea0 Compare February 24, 2026 12:32
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (2)
packages/agent-core/mcp-tools/desktop-control/src/index.ts (2)

459-472: Windows focusWindow reports success when no window matches.

Write-Error is a non-terminating PowerShell error — it does not set a non-zero exit code. PowerShell exits 0 regardless, so execFile resolves successfully and the tool returns "Focused window: <name>" even when nothing was focused. Add exit 1 to propagate the failure.

🐛 Proposed fix
-else { Write-Error "No window found matching: $Title" }
+else { Write-Error "No window found matching: $Title"; exit 1 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/agent-core/mcp-tools/desktop-control/src/index.ts` around lines 459
- 472, The PowerShell script written via tmpFile('focus','ps1') and writeFile
always exits 0 because Write-Error is non-terminating; change the script so that
when no process matches the Title it writes the error and then calls exit 1 (or
throws) so the spawned execFile call returns a non-zero exit code and the
focusWindow flow does not report success erroneously. Locate the script
construction in index.ts (the ps1 temp file created for focus) and append an
explicit exit 1 in the else branch (or replace Write-Error with Write-Error;
exit 1) so failures propagate to the caller.

437-447: Linux pid field is set to the X11 window ID, not the process ID.

xdotool search returns window IDs (arbitrary X11 identifiers). Storing them in the pid field of WindowInfo is semantically wrong and will confuse callers who attempt to signal or inspect the process. Use xdotool getwindowpid <wid> to retrieve the real PID; leave pid undefined when retrieval fails (e.g., for windows without _NET_WM_PID).

🐛 Proposed fix
-      wids.map(async (wid) => {
-        try {
-          const { stdout: name } = await execFile('xdotool', ['getwindowname', wid]);
-          return { name: name.trim(), pid: Number(wid) };
-        } catch {
-          return null;
-        }
-      }),
+      wids.map(async (wid) => {
+        try {
+          const { stdout: name } = await execFile('xdotool', ['getwindowname', wid]);
+          let pid: number | undefined;
+          try {
+            const { stdout: pidOut } = await execFile('xdotool', ['getwindowpid', wid]);
+            const parsed = Number(pidOut.trim());
+            if (Number.isFinite(parsed)) { pid = parsed; }
+          } catch {
+            // window does not set _NET_WM_PID — leave pid undefined
+          }
+          return { name: name.trim(), pid };
+        } catch {
+          return null;
+        }
+      }),
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/agent-core/mcp-tools/desktop-control/src/index.ts` around lines 437
- 447, The code is currently treating X11 window IDs as process IDs by returning
pid: Number(wid) inside the wids mapping; instead call xdotool getwindowpid for
each wid to retrieve the real process ID and parse it to a number, and if that
command fails or returns non-numeric data leave pid undefined. Update the async
mapping in the Promise.all block (the lambda that calls execFile('xdotool',
['getwindowname', wid])) to run a second execFile('xdotool', ['getwindowpid',
wid]) after successfully getting the name, convert the stdout to Number when
valid, and return { name: name.trim(), pid: pidNumberOrUndefined } while still
returning null on failures so results.filter((r): r is WindowInfo => r !== null)
remains correct.
🧹 Nitpick comments (1)
packages/agent-core/tests/unit/opencode/adapter.test.ts (1)

167-272: escapeShellArgWin32Issue596 is a byte-for-byte duplicate of escapeShellArgWin32 from the Issue #354 block.

Both helper functions are identical, and buildShellCommandIssue596 is functionally the same as buildShellCommand. If the production escaping strategy changes, both sets of helpers need to be updated independently — silent drift risk. Consider hoisting a single shared helper to the outer describe('Shell escaping utilities') scope and referencing it from both inner blocks.

♻️ Suggested extraction
+  // Shared helper for both Issue `#354` and Issue `#596` tests
+  function escapeShellArgForCmd(arg: string): string {
+    if (arg.includes(' ') || arg.includes('"')) {
+      return `"${arg.replace(/"/g, '""')}"`;
+    }
+    return arg;
+  }

   describe('Windows cmd.exe /s /c quoting (Issue `#354`)', () => {
-    function escapeShellArgWin32(arg: string): string {
-      if (arg.includes(' ') || arg.includes('"')) {
-        return `"${arg.replace(/"/g, '""')}"`;
-      }
-      return arg;
-    }
-    function buildShellCommand(command: string, args: string[]): string {
-      const escapedCommand = escapeShellArgWin32(command);
-      const escapedArgs = args.map((arg) => escapeShellArgWin32(arg));
-      return [escapedCommand, ...escapedArgs].join(' ');
-    }
+    function buildShellCommand(command: string, args: string[]): string {
+      return [command, ...args].map(escapeShellArgForCmd).join(' ');
+    }
     ...
   });

   describe('Windows spawn via cmd.exe /s /c – Issue `#596` (paths with spaces)', () => {
-    function escapeShellArgWin32Issue596(arg: string): string {
-      if (arg.includes(' ') || arg.includes('"')) {
-        return `"${arg.replace(/"/g, '""')}"`;
-      }
-      return arg;
-    }
-    function buildShellCommandIssue596(command: string, args: string[]): string {
-      return [command, ...args].map(escapeShellArgWin32Issue596).join(' ');
-    }
+    function buildShellCommandIssue596(command: string, args: string[]): string {
+      return [command, ...args].map(escapeShellArgForCmd).join(' ');
+    }
     ...
   });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/agent-core/tests/unit/opencode/adapter.test.ts` around lines 167 -
272, The test duplicates the escape logic (escapeShellArgWin32Issue596 /
buildShellCommandIssue596) already defined as escapeShellArgWin32 /
buildShellCommand; hoist a single shared helper into the outer describe (e.g. a
new "Shell escaping utilities" scope) and have both inner test blocks call that
shared escapeShellArgWin32 and buildShellCommand to eliminate byte-for-byte
duplication and prevent drift — update references in the Windows Issue `#596`
tests to use escapeShellArgWin32 and buildShellCommand (remove the duplicate
escapeShellArgWin32Issue596 / buildShellCommandIssue596 functions) so both test
suites rely on the same helpers.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/agent-core/mcp-tools/desktop-control/src/index.ts`:
- Around line 74-125: captureScreenshot currently takes full-screen images
without any permission check; add a session-level consent gate and audit
logging: when captureScreenshot is invoked, call the existing checkPermission
(or a new checkSessionConsent) to verify a session-scoped flag (e.g.,
allowDesktopScreenshots) and, if not set, prompt once for session consent before
proceeding; if consent is denied, throw or return a denied CallToolResult and do
not take the screenshot. After a successful capture, emit a structured audit
entry (e.g., via processLogger or the module's telemetry API) recording
timestamp, requesting agent id/session id, and that a screenshot occurred.
Ensure the consent flag is stored for the session scope so subsequent
captureScreenshot calls in the same session skip prompting, and keep the
existing tmp file cleanup behavior (unlink) unchanged.
- Around line 36-38: The tmpFile function currently builds filenames using
Date.now(), which can collide under concurrent async calls; update tmpFile to
include cryptographic randomness instead of (or in addition to) Date.now()—for
example use crypto.randomBytes(...) or crypto.randomUUID() to produce a unique
suffix and incorporate it into the generated filename while still calling
join(tmpdir(), ...); locate the tmpFile function and replace the Date.now()
usage with a crypto-based identifier (ensure you import Node's crypto),
preserving the prefix/ext parameters and filename format.

---

Duplicate comments:
In `@packages/agent-core/mcp-tools/desktop-control/src/index.ts`:
- Around line 459-472: The PowerShell script written via tmpFile('focus','ps1')
and writeFile always exits 0 because Write-Error is non-terminating; change the
script so that when no process matches the Title it writes the error and then
calls exit 1 (or throws) so the spawned execFile call returns a non-zero exit
code and the focusWindow flow does not report success erroneously. Locate the
script construction in index.ts (the ps1 temp file created for focus) and append
an explicit exit 1 in the else branch (or replace Write-Error with Write-Error;
exit 1) so failures propagate to the caller.
- Around line 437-447: The code is currently treating X11 window IDs as process
IDs by returning pid: Number(wid) inside the wids mapping; instead call xdotool
getwindowpid for each wid to retrieve the real process ID and parse it to a
number, and if that command fails or returns non-numeric data leave pid
undefined. Update the async mapping in the Promise.all block (the lambda that
calls execFile('xdotool', ['getwindowname', wid])) to run a second
execFile('xdotool', ['getwindowpid', wid]) after successfully getting the name,
convert the stdout to Number when valid, and return { name: name.trim(), pid:
pidNumberOrUndefined } while still returning null on failures so
results.filter((r): r is WindowInfo => r !== null) remains correct.

---

Nitpick comments:
In `@packages/agent-core/tests/unit/opencode/adapter.test.ts`:
- Around line 167-272: The test duplicates the escape logic
(escapeShellArgWin32Issue596 / buildShellCommandIssue596) already defined as
escapeShellArgWin32 / buildShellCommand; hoist a single shared helper into the
outer describe (e.g. a new "Shell escaping utilities" scope) and have both inner
test blocks call that shared escapeShellArgWin32 and buildShellCommand to
eliminate byte-for-byte duplication and prevent drift — update references in the
Windows Issue `#596` tests to use escapeShellArgWin32 and buildShellCommand
(remove the duplicate escapeShellArgWin32Issue596 / buildShellCommandIssue596
functions) so both test suites rely on the same helpers.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 10c8dc0 and b40dea0.

📒 Files selected for processing (12)
  • apps/desktop/scripts/bundle-skills.cjs
  • apps/desktop/src/main/opencode/electron-options.ts
  • packages/agent-core/mcp-tools/desktop-control/SKILL.md
  • packages/agent-core/mcp-tools/desktop-control/package.json
  • packages/agent-core/mcp-tools/desktop-control/src/index.ts
  • packages/agent-core/mcp-tools/desktop-control/tsconfig.json
  • packages/agent-core/src/internal/classes/OpenCodeAdapter.ts
  • packages/agent-core/src/opencode/cli-resolver.ts
  • packages/agent-core/src/opencode/config-generator.ts
  • packages/agent-core/tests/unit/opencode/adapter.test.ts
  • packages/agent-core/tests/unit/opencode/cli-resolver.test.ts
  • packages/agent-core/tests/unit/opencode/config-generator.test.ts
🚧 Files skipped from review as they are similar to previous changes (4)
  • packages/agent-core/mcp-tools/desktop-control/tsconfig.json
  • packages/agent-core/src/internal/classes/OpenCodeAdapter.ts
  • packages/agent-core/mcp-tools/desktop-control/package.json
  • packages/agent-core/tests/unit/opencode/config-generator.test.ts

- Replace cmd.exe /s /c approach in buildPtySpawnArgs() with writing a
  temp .ps1 file and running powershell.exe -File, fixing crashes when
  task text contains double-quotes or other shell-special characters
- Add /desktop-permission route to permission API server so desktop-control
  MCP tool can request permissions without triggering file-op validation
- Update desktop-control PERMISSION_API_URL to use /desktop-permission
- Update adapter tests to reflect new PowerShell spawn behaviour
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

♻️ Duplicate comments (4)
packages/agent-core/mcp-tools/desktop-control/src/index.ts (4)

38-40: tmpFile collision under concurrent calls — already flagged.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/agent-core/mcp-tools/desktop-control/src/index.ts` around lines 38 -
40, The tmpFile function can produce filename collisions under concurrent calls
because it only uses Date.now(), so replace the uniqueness strategy in tmpFile
with a robust unique suffix (e.g., crypto.randomUUID() or crypto.randomBytes
hex, or use fs.mkdtemp/tmp library) to guarantee uniqueness; update the tmpFile
implementation (function name tmpFile) to append that random/UUID token (or use
a safe OS temporary-file API) to the generated filename and ensure the chosen
approach is cross-platform and collision-resistant.

461-474: Windows focusWindow Write-Error leaves exit code 0 on miss — already flagged.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/agent-core/mcp-tools/desktop-control/src/index.ts` around lines 461
- 474, The current PowerShell script written to ps1 (via tmpFile and writeFile)
uses Write-Error when no window matches, which still returns exit code 0; update
the script so the non-match path causes a non-zero exit (e.g., replace or follow
Write-Error with an explicit exit 1 or throw) so the calling focusWindow logic
can detect failure; locate the ps1 generation where tmpFile('focus','ps1') and
writeFile(ps1, `...`) are used and change the else branch to emit a real failure
(exit 1 or throw) instead of only Write-Error.

76-127: captureScreenshot skips the permission gate — already flagged.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/agent-core/mcp-tools/desktop-control/src/index.ts` around lines 76 -
127, The captureScreenshot function currently performs screen capture
unconditionally; add a permission check at the start of captureScreenshot (e.g.,
call the existing permission gate helper such as ensurePermission or
checkAgentPermission, or verify the relevant permission flag) and abort/throw a
clear permission error or return a CallToolResult error object if the permission
is not granted before creating any temp files or invoking platform commands;
ensure you still clean up any temp artifacts (png, ps1) in error paths and keep
the function signature and returned CallToolResult shape unchanged.

433-449: Linux listWindows assigns window IDs as PIDs — already flagged.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/agent-core/mcp-tools/desktop-control/src/index.ts` around lines 433
- 449, The code in listWindows is treating the xdotool window ID (wid) as the
process id by doing Number(wid); instead call xdotool getwindowpid to fetch the
real PID for each wid. In the wids.map async block (the execFile call that runs
['getwindowname', wid] and returns { name: ..., pid: Number(wid) }), add a
second execFile call to run ['getwindowpid', wid], parse that stdout into a
Number and use it as pid (and optionally include the original wid as id if
WindowInfo supports it); update the returned shape to return the correct pid and
handle errors as before.
🧹 Nitpick comments (2)
packages/agent-core/tests/unit/opencode/adapter.test.ts (2)

184-196: Local replicas of production logic — tests won't catch drift.

buildPs1Script and buildPtySpawnArgsWin32 duplicate the adapter's private logic. If the production buildPtySpawnArgs changes (e.g., switches quoting strategy, adds flags), these helpers won't fail. Consider extracting the PS1 script-building into a small, exported pure function (e.g., buildPs1ScriptContent(command, args)) and importing it in the test. This way the tests exercise the real code.

That said, this is consistent with the existing pattern in the file (lines 86–101 replicate escapeShellArg and buildShellCommand), and the file header acknowledges the limitation of not mocking the PTY layer.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/agent-core/tests/unit/opencode/adapter.test.ts` around lines 184 -
196, Tests duplicate the adapter's private PS1-building logic (buildPs1Script
and buildPtySpawnArgsWin32), which can drift from production; extract the PS1
script construction into a small exported pure function (e.g.,
buildPs1ScriptContent(command, args)) in the adapter module and have the test
import and call that exported function instead of the local buildPs1Script, then
update buildPtySpawnArgsWin32 in the test to use the exported
buildPs1ScriptContent for scriptContent so the tests exercise the real
production logic.

82-101: The getShellArgsWin32 helper and its tests now exercise a dead code path on Windows.

With the adapter's buildPtySpawnArgs now routing all win32 invocations through the PS1 wrapper, the cmd.exe /s /c wrapping tested by getShellArgsWin32 (line 99) and the tests in lines 103-164 no longer correspond to any reachable Windows code path. The escapeShellArgWin32 branch (line 86-91) also tests the win32 branch of escapeShellArg, which is now only reachable on non-Windows platforms (via buildShellCommand → used in the Unix shell path).

Consider either removing or annotating these tests to clarify they cover legacy/unreachable logic, so future maintainers don't mistakenly rely on them as validation of the current Windows behavior.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/agent-core/tests/unit/opencode/adapter.test.ts` around lines 82 -
101, The tests exercise unreachable Windows code paths: update the test file to
either remove or clearly mark the legacy helpers getShellArgsWin32 and
escapeShellArgWin32 (and the buildShellCommand-based tests) as covering
deprecated/unreachable logic now that buildPtySpawnArgs routes Win32 via the PS1
wrapper; specifically either delete the getShellArgsWin32/escapeShellArgWin32
helpers and associated specs or add a clear comment/skip/describe block noting
they validate legacy behavior only so maintainers won't assume they reflect
current Windows behavior driven by buildPtySpawnArgs.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@apps/desktop/src/main/permission-api.ts`:
- Around line 137-143: The synthesized uiData has a dead branch checking
rawData.details; because desktop-control spreads details into the body, details
is always undefined, so replace the ternary for isDesktopPermission to set
operation: String(rawData.operation) and filePath:
String(JSON.stringify(rawData)) (dropping the rawData.details check) and remove
the unnecessary "as unknown as FilePermissionRequestData" cast so the object
literal directly satisfies FilePermissionRequestData; update the uiData
assignment (and keep rawData: Record<string, unknown>) accordingly.

In `@packages/agent-core/mcp-tools/desktop-control/src/index.ts`:
- Around line 265-285: The tmp file `txt` can leak if `writeFile(ps1)` throws
because the `finally` cleanup only runs after the `try` that starts after both
writes; update the flow so both writes are performed inside the same try/finally
that unlinks `txt` and `ps1` (i.e., move the first writeFile(txt, ...) into the
try block that currently surrounds execFile or wrap both writeFile calls and
execFile in a single try), ensuring the `finally` block in index.ts unlinks both
`txt` and `ps1` even if either writeFile throws; reference the tmp variables
`txt` and `ps1`, the writeFile calls, the execFile('powershell', ...) call, and
the unlink calls to locate and fix the code.
- Around line 315-322: WIN_KEYS is missing a mapping for "forwarddelete",
causing buildSendKeysString to fall back to an invalid {FORWARDDELETE} token on
Windows; add an entry "forwarddelete": '{DELETE}' to the WIN_KEYS Record so the
forward-delete key (mapped in APPLE_KEY_CODES as keycode 117) is translated to
the correct SendKeys token and will work on Windows.

In `@packages/agent-core/src/internal/classes/OpenCodeAdapter.ts`:
- Line 840: The temp PowerShell script filename uses Date.now() which can
collide; change the construction of scriptPath in OpenCodeAdapter (the const
scriptPath = path.join(os.tmpdir(), `accomplish-run-${Date.now()}.ps1`)) to use
crypto.randomUUID() for uniqueness (e.g.
`accomplish-run-${crypto.randomUUID()}.ps1`) so concurrent/rapid spawns don't
overwrite each other's files before writeFileSync/execute; ensure
crypto.randomUUID is used where scriptPath is created and no other code paths
still rely on Date.now() here.
- Around line 838-841: The temp PowerShell script file created in
startTask/spawnSessionResumption (scriptPath) is never removed causing a leak;
capture scriptPath on the instance (e.g., this._tempPsScriptPath) when writing
the file and ensure it is unlinked in the process exit/cleanup path by deleting
it from within the onExit/handleProcessExit or dispose routines
(handleProcessExit/dispose) — ensure deletion is attempted regardless of exit
reason and guard with try/catch so removal failures don't throw.

---

Duplicate comments:
In `@packages/agent-core/mcp-tools/desktop-control/src/index.ts`:
- Around line 38-40: The tmpFile function can produce filename collisions under
concurrent calls because it only uses Date.now(), so replace the uniqueness
strategy in tmpFile with a robust unique suffix (e.g., crypto.randomUUID() or
crypto.randomBytes hex, or use fs.mkdtemp/tmp library) to guarantee uniqueness;
update the tmpFile implementation (function name tmpFile) to append that
random/UUID token (or use a safe OS temporary-file API) to the generated
filename and ensure the chosen approach is cross-platform and
collision-resistant.
- Around line 461-474: The current PowerShell script written to ps1 (via tmpFile
and writeFile) uses Write-Error when no window matches, which still returns exit
code 0; update the script so the non-match path causes a non-zero exit (e.g.,
replace or follow Write-Error with an explicit exit 1 or throw) so the calling
focusWindow logic can detect failure; locate the ps1 generation where
tmpFile('focus','ps1') and writeFile(ps1, `...`) are used and change the else
branch to emit a real failure (exit 1 or throw) instead of only Write-Error.
- Around line 76-127: The captureScreenshot function currently performs screen
capture unconditionally; add a permission check at the start of
captureScreenshot (e.g., call the existing permission gate helper such as
ensurePermission or checkAgentPermission, or verify the relevant permission
flag) and abort/throw a clear permission error or return a CallToolResult error
object if the permission is not granted before creating any temp files or
invoking platform commands; ensure you still clean up any temp artifacts (png,
ps1) in error paths and keep the function signature and returned CallToolResult
shape unchanged.
- Around line 433-449: The code in listWindows is treating the xdotool window ID
(wid) as the process id by doing Number(wid); instead call xdotool getwindowpid
to fetch the real PID for each wid. In the wids.map async block (the execFile
call that runs ['getwindowname', wid] and returns { name: ..., pid: Number(wid)
}), add a second execFile call to run ['getwindowpid', wid], parse that stdout
into a Number and use it as pid (and optionally include the original wid as id
if WindowInfo supports it); update the returned shape to return the correct pid
and handle errors as before.

---

Nitpick comments:
In `@packages/agent-core/tests/unit/opencode/adapter.test.ts`:
- Around line 184-196: Tests duplicate the adapter's private PS1-building logic
(buildPs1Script and buildPtySpawnArgsWin32), which can drift from production;
extract the PS1 script construction into a small exported pure function (e.g.,
buildPs1ScriptContent(command, args)) in the adapter module and have the test
import and call that exported function instead of the local buildPs1Script, then
update buildPtySpawnArgsWin32 in the test to use the exported
buildPs1ScriptContent for scriptContent so the tests exercise the real
production logic.
- Around line 82-101: The tests exercise unreachable Windows code paths: update
the test file to either remove or clearly mark the legacy helpers
getShellArgsWin32 and escapeShellArgWin32 (and the buildShellCommand-based
tests) as covering deprecated/unreachable logic now that buildPtySpawnArgs
routes Win32 via the PS1 wrapper; specifically either delete the
getShellArgsWin32/escapeShellArgWin32 helpers and associated specs or add a
clear comment/skip/describe block noting they validate legacy behavior only so
maintainers won't assume they reflect current Windows behavior driven by
buildPtySpawnArgs.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b40dea0 and b5db4ed.

📒 Files selected for processing (4)
  • apps/desktop/src/main/permission-api.ts
  • packages/agent-core/mcp-tools/desktop-control/src/index.ts
  • packages/agent-core/src/internal/classes/OpenCodeAdapter.ts
  • packages/agent-core/tests/unit/opencode/adapter.test.ts

Comment on lines +137 to +143
const rawData = data as Record<string, unknown>;
const uiData: FilePermissionRequestData = isDesktopPermission
? ({
operation: String(rawData.operation),
filePath: String(rawData.details ?? JSON.stringify(rawData)),
} as unknown as FilePermissionRequestData)
: (data as FilePermissionRequestData);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# Explore the file structure and locate relevant files
find . -type f -name "permission-api.ts" | head -5
find . -type f -name "*desktop-control*" | head -10

Repository: accomplish-ai/accomplish

Length of output: 109


🏁 Script executed:

# Read the full permission-api.ts file to understand context
wc -l apps/desktop/src/main/permission-api.ts

Repository: accomplish-ai/accomplish

Length of output: 111


🏁 Script executed:

# Look for FilePermissionRequestData type definition
rg "FilePermissionRequestData" --type ts -B 3 -A 10

Repository: accomplish-ai/accomplish

Length of output: 16250


🏁 Script executed:

# Search for how buildFilePermissionRequest is implemented and used
rg "buildFilePermissionRequest" --type ts -A 15 -B 2

Repository: accomplish-ai/accomplish

Length of output: 6181


🏁 Script executed:

# Check permission-api.ts imports to see exact type being used
head -50 apps/desktop/src/main/permission-api.ts | grep -A 5 "import"

Repository: accomplish-ai/accomplish

Length of output: 267


🏁 Script executed:

# Look for FileOperation type definition
rg "type FileOperation" --type ts -A 10

Repository: accomplish-ai/accomplish

Length of output: 956


🏁 Script executed:

# Search for desktop-control to see how it sends data
find . -type f -name "*desktop-control*" | head -20

Repository: accomplish-ai/accomplish

Length of output: 50


🏁 Script executed:

# Search for where details field is set in desktop requests
rg "details" --type ts | grep -i "desktop\|permission" | head -20

Repository: accomplish-ai/accomplish

Length of output: 1172


🏁 Script executed:

# Check what agent-core index exports as PermissionFileRequestData
cat packages/agent-core/src/index.ts | grep -A 2 -B 2 "PermissionFileRequestData"

Repository: accomplish-ai/accomplish

Length of output: 291


🏁 Script executed:

# Look at the FILE_OPERATIONS definition to understand FileOperation values
rg "FILE_OPERATIONS" --type ts -B 2 -A 15

Repository: accomplish-ai/accomplish

Length of output: 12953


🏁 Script executed:

# Check the full context around lines 130-150 in permission-api.ts
sed -n '120,160p' apps/desktop/src/main/permission-api.ts

Repository: accomplish-ai/accomplish

Length of output: 1673


🏁 Script executed:

# Verify the exact type definition exported from agent-core
grep -r "export.*FilePermissionRequestData" packages/agent-core/src/types/permission-handler.ts -A 10

Repository: accomplish-ai/accomplish

Length of output: 333


🏁 Script executed:

# Check if there are any other places that send requests with a 'details' field
rg "details:" --type ts | grep -E "permission|file" | head -20

Repository: accomplish-ai/accomplish

Length of output: 50


🏁 Script executed:

# Verify validation is skipped for desktop permissions
sed -n '110,125p' apps/desktop/src/main/permission-api.ts

Repository: accomplish-ai/accomplish

Length of output: 662


🏁 Script executed:

# Check what architecture.md says about spawning node/npx
cat apps/desktop/src/main/architecture.md 2>/dev/null || echo "File not found, searching for it"
find . -name "architecture.md" -type f

Repository: accomplish-ai/accomplish

Length of output: 123


🏁 Script executed:

# Verify that buildFilePermissionRequest doesn't validate the operation further
sed -n '1,50p' packages/agent-core/src/services/permission-handler.ts | grep -A 30 "buildFilePermissionRequest"

Repository: accomplish-ai/accomplish

Length of output: 50


🏁 Script executed:

# Check if the PermissionRequest interface expects a FileOperation or string
rg "fileOperation" packages/agent-core/src/common/types/permission.ts -A 2 -B 2

Repository: accomplish-ai/accomplish

Length of output: 202


🏁 Script executed:

# Confirm the exact content of buildFilePermissionRequest implementation
sed -n '100,130p' packages/agent-core/src/services/permission-handler.ts

Repository: accomplish-ai/accomplish

Length of output: 1064


🏁 Script executed:

# Check if validateFilePermissionRequest is called for desktop permissions
sed -n '100,115p' apps/desktop/src/main/permission-api.ts

Repository: accomplish-ai/accomplish

Length of output: 736


🏁 Script executed:

# Get the exact buildFilePermissionRequest implementation to see the full cast
sed -n '130,150p' packages/agent-core/src/services/permission-handler.ts

Repository: accomplish-ai/accomplish

Length of output: 763


🏁 Script executed:

# Check if the renderer or downstream code validates fileOperation
rg "fileOperation" --type ts -B 2 -A 5 | grep -A 10 "render\|ui\|display" | head -30

Repository: accomplish-ai/accomplish

Length of output: 2907


🏁 Script executed:

# Search for how fileOperation is used in the PermissionDialog or renderer
rg "fileOperation" --type tsx -A 5 -B 2

Repository: accomplish-ai/accomplish

Length of output: 95


🏁 Script executed:

# Check if there's any validation or switching on fileOperation values in the renderer
rg "FILE_OPERATIONS\|fileOperation" apps/web/src --type ts --type tsx -A 5 | head -50

Repository: accomplish-ai/accomplish

Length of output: 95


🏁 Script executed:

# Verify the exact line where buildFilePermissionRequest does the cast
sed -n '55,75p' packages/agent-core/src/services/permission-handler.ts

Repository: accomplish-ai/accomplish

Length of output: 659


🏁 Script executed:

# Find the exact buildFilePermissionRequest implementation
rg "buildFilePermissionRequest" packages/agent-core/src/services/permission-handler.ts -A 20

Repository: accomplish-ai/accomplish

Length of output: 637


🏁 Script executed:

# Look for PermissionDialog component to see how it uses fileOperation
find . -name "PermissionDialog*" -type f

Repository: accomplish-ai/accomplish

Length of output: 131


🏁 Script executed:

# Check if any downstream code validates or restricts fileOperation values
rg "fileOperation" --type ts -C 5 | grep -E "switch|case|if.*fileOperation|FILE_OPERATIONS" -A 3 -B 3

Repository: accomplish-ai/accomplish

Length of output: 50


🏁 Script executed:

# Check the PermissionDialog implementation
cat apps/web/src/client/components/execution/PermissionDialog.tsx | head -150

Repository: accomplish-ai/accomplish

Length of output: 5830


🏁 Script executed:

# Search for where fileOperation is actually used/displayed in the UI
rg "fileOperation" apps/web/src/client/components/execution/ -B 3 -A 5

Repository: accomplish-ai/accomplish

Length of output: 2239


Dead branch in filePath synthesis — remove unused rawData.details check.

desktop-control spreads details directly into the POST body ({ operation, ...details }), so rawData.details is always undefined. The conditional rawData.details ?? JSON.stringify(rawData) never takes the first branch; filePath always ends up as the full JSON.stringify(rawData), which redundantly includes the operation key. Simplify to:

const uiData: FilePermissionRequestData = isDesktopPermission
  ? {
      operation: String(rawData.operation),
-     filePath: String(rawData.details ?? JSON.stringify(rawData)),
-   } as unknown as FilePermissionRequestData
+     filePath: JSON.stringify(
+       Object.fromEntries(
+         Object.entries(rawData).filter(([k]) => k !== 'operation'),
+       ),
+     ),
+   }
  : (data as FilePermissionRequestData);

The as unknown as FilePermissionRequestData cast is unnecessary after this change; the object literal already satisfies the type.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/desktop/src/main/permission-api.ts` around lines 137 - 143, The
synthesized uiData has a dead branch checking rawData.details; because
desktop-control spreads details into the body, details is always undefined, so
replace the ternary for isDesktopPermission to set operation:
String(rawData.operation) and filePath: String(JSON.stringify(rawData))
(dropping the rawData.details check) and remove the unnecessary "as unknown as
FilePermissionRequestData" cast so the object literal directly satisfies
FilePermissionRequestData; update the uiData assignment (and keep rawData:
Record<string, unknown>) accordingly.

desktop-control/src/index.ts:
- Use crypto.randomUUID() in tmpFile() instead of Date.now() to
  prevent filename collisions under concurrent async calls
- Add session-level consent gate to captureScreenshot(): prompt once
  per process lifetime via checkPermission, deny subsequent calls if
  the session was rejected, emit a structured audit log on each capture
- Move both writeFile(txt) and writeFile(ps1) calls inside the try
  block in typeText() so txt is never leaked if writeFile(ps1) throws
- Add forwarddelete: '{DELETE}' to WIN_KEYS so the forward-delete key
  doesn't silently misfire on Windows (mirroring APPLE_KEY_CODES 117)

permission-api.ts:
- Fix dead rawData.details branch in /desktop-permission synthesis;
  since desktop-control spreads all detail fields into the POST body
  rawData.details is always undefined — instead strip the 'operation'
  key from rawData and JSON-stringify the remainder for filePath

OpenCodeAdapter.ts:
- Switch Windows spawn from powershell.exe .ps1 wrapper to direct
  opencode.exe spawn: pass command+args directly to node-pty so the OS
  hands each element as a raw argv entry with no shell quoting needed;
  this also eliminates the temp-file leak and Date.now() collision
  concerns that the previous PS1 approach introduced
- Update adapter unit tests to reflect the new direct-spawn contract
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (2)
packages/agent-core/mcp-tools/desktop-control/src/index.ts (2)

455-465: Linux listWindows sets pid to the window id.
xdotool search yields window IDs, not process IDs. Use xdotool getwindowpid and leave pid undefined when unavailable.

Proposed fix
     const results = await Promise.all(
       wids.map(async (wid) => {
         try {
           const { stdout: name } = await execFile('xdotool', ['getwindowname', wid]);
-          return { name: name.trim(), pid: Number(wid) };
+          let pid: number | undefined;
+          try {
+            const { stdout: pidOut } = await execFile('xdotool', ['getwindowpid', wid]);
+            pid = Number(pidOut.trim());
+          } catch {
+            pid = undefined;
+          }
+          return { name: name.trim(), pid };
         } catch {
           return null;
         }
       }),
     );
xdotool getwindowpid documentation _NET_WM_PID behavior
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/agent-core/mcp-tools/desktop-control/src/index.ts` around lines 455
- 465, In the listWindows implementation, the code incorrectly assigns the
window id to pid; change the per-window lookup (wids.map handler) to call
"xdotool getwindowpid <wid>" via execFile to obtain the real process id, parse
it to Number and set pid to that value, and if getwindowpid fails or returns
non-numeric output leave pid as undefined; keep the existing getwindowname call
for name, reference the mapped variable wid and the surrounding Promise.all
logic to locate where to update the behavior.

493-495: PowerShell focus script can report success on failure.
Write-Error is non-terminating; add a non-zero exit so execFile can surface the failure.

Proposed fix
-if($proc) { [WinFocus]::SetForegroundWindow($proc.MainWindowHandle) }
-else { Write-Error "No window found matching: $Title" }
+if($proc) { [WinFocus]::SetForegroundWindow($proc.MainWindowHandle) }
+else { Write-Error "No window found matching: $Title"; exit 1 }
PowerShell Write-Error exit code non-terminating error behavior
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/agent-core/mcp-tools/desktop-control/src/index.ts` around lines 493
- 495, The PowerShell snippet that finds a process ($proc) and calls
[WinFocus]::SetForegroundWindow can still return a zero exit code when no window
is found because Write-Error is non-terminating; update the script emitted by
the code that builds this PowerShell string so that the failure branch uses a
non-zero exit (e.g., replace or follow Write-Error "No window found matching:
$Title" with either "Write-Error ...; exit 1" or "throw 'No window found
matching: $Title'") and optionally add an explicit "exit 0" after
[WinFocus]::SetForegroundWindow to guarantee success; ensure this change is made
where the PowerShell string containing $proc, [WinFocus]::SetForegroundWindow,
and Write-Error is constructed so execFile can detect failures.
🧹 Nitpick comments (4)
packages/agent-core/tests/unit/opencode/adapter.test.ts (1)

179-180: Comment restates what the code already says — remove or replace with a why comment.

// mirrors buildPtySpawnArgs for win32: command → file, args → args describes what the one-liner does, which the return statement itself makes obvious. Per coding guidelines, comments must explain why, not what.

✂️ Proposed fix
-    function buildPtySpawnArgsWin32(command: string, args: string[]) {
-      // mirrors buildPtySpawnArgs for win32: command → file, args → args
-      return { file: command, args };
-    }
+    function buildPtySpawnArgsWin32(command: string, args: string[]) {
+      // stub that documents the expected contract: direct spawn with no shell wrapping
+      return { file: command, args };
+    }

As per coding guidelines: "No unnecessary comments — Don't add comments that restate what the code does. Comments should explain why, not what."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/agent-core/tests/unit/opencode/adapter.test.ts` around lines 179 -
180, Remove the redundant comment "// mirrors buildPtySpawnArgs for win32:
command → file, args → args" that only restates the return statement and either
delete it or replace it with a brief "why" comment explaining the platform
mapping rationale (e.g., why win32 expects {file, args} vs other platforms) so
readers understand the reason for mirroring buildPtySpawnArgs; locate the return
{ file: command, args } in the test around the buildPtySpawnArgs behavior and
update the comment accordingly.
apps/desktop/src/main/permission-api.ts (1)

73-110: Remove comments that just restate the control flow.
The comments at Line 73-75 and Line 98-109 simply narrate the conditionals; consider removing them to keep comments focused on “why”.

Suggested cleanup
-    // Handle both file permissions (/permission) and desktop permissions (/desktop-permission)
     const isFilePermission = req.method === 'POST' && req.url === '/permission';
     const isDesktopPermission = req.method === 'POST' && req.url === '/desktop-permission';
     if (!isFilePermission && !isDesktopPermission) {
       res.writeHead(404, { 'Content-Type': 'application/json' });
       res.end(JSON.stringify({ error: 'Not found' }));
       return;
     }
...
-    // For file permissions, validate the payload strictly.
-    // For desktop permissions, only require an `operation` field (no filePath needed).
     if (isFilePermission) {
       const validation = permissionHandler.validateFilePermissionRequest(data);
       if (!validation.valid) {
         res.writeHead(400, { 'Content-Type': 'application/json' });
         res.end(JSON.stringify({ error: validation.error }));
         return;
       }
     } else {
-      // desktop-permission: just require an operation string
       const d = data as Record<string, unknown>;
       if (!d.operation || typeof d.operation !== 'string') {
         res.writeHead(400, { 'Content-Type': 'application/json' });
         res.end(JSON.stringify({ error: 'operation is required for desktop permissions' }));
         return;
       }
     }

As per coding guidelines: No unnecessary comments — Don't add comments that restate what the code does. Comments should explain why, not what.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/desktop/src/main/permission-api.ts` around lines 73 - 110, Remove the
redundant “what the code does” comments around the request-type checks and
validation branches: delete the comment that states the control flow near the
isFilePermission / isDesktopPermission declarations and delete the comment that
explains the difference between file and desktop permissions above the
isFilePermission conditional; if you need a comment, replace them with a concise
“why” note (e.g., why desktop-permission omits filePath) or a short reference to
the validation behavior for permissionHandler.validateFilePermissionRequest, but
do not keep comments that merely restate the conditions or the HTTP handling
logic.
packages/agent-core/mcp-tools/desktop-control/src/index.ts (2)

160-165: Replace the nested ternary with a small map.
Keeps the intent readable and aligns with the no-nested-ternary guideline.

Refactor
-    const cmd =
-      button === 'right'
-        ? 'right click'
-        : button === 'double'
-          ? 'double click'
-          : 'click';
+    const cmdMap: Record<'left' | 'right' | 'double', string> = {
+      left: 'click',
+      right: 'right click',
+      double: 'double click',
+    };
+    const cmd = cmdMap[button];

As per coding guidelines: Avoid nested ternaries — Use mapper objects or if/else for readability.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/agent-core/mcp-tools/desktop-control/src/index.ts` around lines 160
- 165, The nested ternary that builds cmd from button (const cmd = button ===
'right' ? 'right click' : button === 'double' ? 'double click' : 'click') should
be replaced with a small mapping object; create a map (e.g., const CLICK_MAP = {
right: 'right click', double: 'double click' }) and compute cmd =
CLICK_MAP[button] || 'click' so the intent is clearer and avoids nested
ternaries (refer to the cmd variable and the button value in index.ts).

375-383: Avoid nested ternary in buildSendKeysString.
This is a straightforward if/else and reads clearer.

Refactor
-  const k =
-    mainKey in WIN_KEYS
-      ? WIN_KEYS[mainKey]
-      : mainKey.length === 1
-        ? mainKey
-        : `{${mainKey.toUpperCase()}}`;
+  let k: string;
+  if (mainKey in WIN_KEYS) {
+    k = WIN_KEYS[mainKey];
+  } else if (mainKey.length === 1) {
+    k = mainKey;
+  } else {
+    k = `{${mainKey.toUpperCase()}}`;
+  }

As per coding guidelines: Avoid nested ternaries — Use mapper objects or if/else for readability.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/agent-core/mcp-tools/desktop-control/src/index.ts` around lines 375
- 383, The nested ternary inside buildSendKeysString (which computes k using
WIN_KEYS and mainKey) should be replaced with a clear if/else to improve
readability: check if mainKey is in WIN_KEYS and set k = WIN_KEYS[mainKey]; else
if mainKey.length === 1 set k = mainKey; otherwise set k =
`{${mainKey.toUpperCase()}}`. Update the function to use this explicit
conditional flow and keep the existing prefix computation with WIN_MODIFIERS and
parseKeyParts unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/agent-core/mcp-tools/desktop-control/src/index.ts`:
- Around line 526-529: The tool metadata for the 'desktop_screenshot' tool still
says "No permission required" despite the implementation now requesting
session-level consent; update the description string in the desktop_screenshot
tool definition to state that session consent is required (e.g., "Requires user
session consent to capture a screenshot; returns a base64-encoded PNG of the
primary display") so the description matches the behavior in the consent flow.

In `@packages/agent-core/tests/unit/opencode/adapter.test.ts`:
- Around line 177-229: The test stub buildPtySpawnArgsWin32 lacks the production
.exe validation present in OpenCodeAdapter.ts (the .exe suffix check in
buildPtySpawnArgs), so add that validation to the test helper or use the real
function: either import/require the actual buildPtySpawnArgs from
OpenCodeAdapter (preferred) and call the win32 branch, or re-implement the same
`.endsWith('.exe')` check in buildPtySpawnArgsWin32 and add a negative test
asserting an error is thrown for non-.exe commands; also replace the redundant
comment with one that documents this block as a contract/regression test and
explicitly calls out which production behaviour (the .exe validation) is being
verified or cannot be verified if importing is impossible.

---

Duplicate comments:
In `@packages/agent-core/mcp-tools/desktop-control/src/index.ts`:
- Around line 455-465: In the listWindows implementation, the code incorrectly
assigns the window id to pid; change the per-window lookup (wids.map handler) to
call "xdotool getwindowpid <wid>" via execFile to obtain the real process id,
parse it to Number and set pid to that value, and if getwindowpid fails or
returns non-numeric output leave pid as undefined; keep the existing
getwindowname call for name, reference the mapped variable wid and the
surrounding Promise.all logic to locate where to update the behavior.
- Around line 493-495: The PowerShell snippet that finds a process ($proc) and
calls [WinFocus]::SetForegroundWindow can still return a zero exit code when no
window is found because Write-Error is non-terminating; update the script
emitted by the code that builds this PowerShell string so that the failure
branch uses a non-zero exit (e.g., replace or follow Write-Error "No window
found matching: $Title" with either "Write-Error ...; exit 1" or "throw 'No
window found matching: $Title'") and optionally add an explicit "exit 0" after
[WinFocus]::SetForegroundWindow to guarantee success; ensure this change is made
where the PowerShell string containing $proc, [WinFocus]::SetForegroundWindow,
and Write-Error is constructed so execFile can detect failures.

---

Nitpick comments:
In `@apps/desktop/src/main/permission-api.ts`:
- Around line 73-110: Remove the redundant “what the code does” comments around
the request-type checks and validation branches: delete the comment that states
the control flow near the isFilePermission / isDesktopPermission declarations
and delete the comment that explains the difference between file and desktop
permissions above the isFilePermission conditional; if you need a comment,
replace them with a concise “why” note (e.g., why desktop-permission omits
filePath) or a short reference to the validation behavior for
permissionHandler.validateFilePermissionRequest, but do not keep comments that
merely restate the conditions or the HTTP handling logic.

In `@packages/agent-core/mcp-tools/desktop-control/src/index.ts`:
- Around line 160-165: The nested ternary that builds cmd from button (const cmd
= button === 'right' ? 'right click' : button === 'double' ? 'double click' :
'click') should be replaced with a small mapping object; create a map (e.g.,
const CLICK_MAP = { right: 'right click', double: 'double click' }) and compute
cmd = CLICK_MAP[button] || 'click' so the intent is clearer and avoids nested
ternaries (refer to the cmd variable and the button value in index.ts).
- Around line 375-383: The nested ternary inside buildSendKeysString (which
computes k using WIN_KEYS and mainKey) should be replaced with a clear if/else
to improve readability: check if mainKey is in WIN_KEYS and set k =
WIN_KEYS[mainKey]; else if mainKey.length === 1 set k = mainKey; otherwise set k
= `{${mainKey.toUpperCase()}}`. Update the function to use this explicit
conditional flow and keep the existing prefix computation with WIN_MODIFIERS and
parseKeyParts unchanged.

In `@packages/agent-core/tests/unit/opencode/adapter.test.ts`:
- Around line 179-180: Remove the redundant comment "// mirrors
buildPtySpawnArgs for win32: command → file, args → args" that only restates the
return statement and either delete it or replace it with a brief "why" comment
explaining the platform mapping rationale (e.g., why win32 expects {file, args}
vs other platforms) so readers understand the reason for mirroring
buildPtySpawnArgs; locate the return { file: command, args } in the test around
the buildPtySpawnArgs behavior and update the comment accordingly.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b5db4ed and 2e130eb.

📒 Files selected for processing (4)
  • apps/desktop/src/main/permission-api.ts
  • packages/agent-core/mcp-tools/desktop-control/src/index.ts
  • packages/agent-core/src/internal/classes/OpenCodeAdapter.ts
  • packages/agent-core/tests/unit/opencode/adapter.test.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • packages/agent-core/src/internal/classes/OpenCodeAdapter.ts

buildPtySpawnArgsWin32 was a bare pass-through that gave zero regression
protection for the .exe suffix guard in the production buildPtySpawnArgs
(which throws immediately on non-.exe commands).

Changes:
- Re-implement the .exe check inside the local stub used by the test block
- Add a dedicated negative test: asserts Error is thrown for both a path
  without .exe and a Unix-style path — covering the exact production guard
- Expand block comment to document why this is a contract/regression test
  and to call out that buildPtySpawnArgs is private (cannot be imported)
- Remove the redundant 'mirrors...' inline comment (restated what, not why)

Addresses CodeRabbitAI review comment on PR accomplish-ai#633.
…ailures

When the Gemini API has no context cache (fresh session after a rate-limit
reset or app restart), the model ignored the MCP-registered tool names and
tried to call 'start_task' / 'complete_task' literally from the system
prompt text.  OpenCode registers these as 'start-task_start_task' and
'complete-task_complete_task', so the calls hit the 'invalid' tool handler
and the task failed immediately before any desktop automation ran.

Root cause confirmed via opencode log: MCP servers DID connect successfully
(toolCount=7 for desktop-control, toolCount=1 for start-task/complete-task),
but the first Gemini inference call — with cache:{read:0} — used the literal
names from the prompt rather than the registered MCP names.

Fix: replace every 'start_task' and 'complete_task' call-site in the system
prompt with 'start-task_start_task' and 'complete-task_complete_task'.
Update the config-generator tests to match.
@RajdeepKushwaha5 RajdeepKushwaha5 force-pushed the feat/issue-189-desktop-control branch from 35517a0 to 19f95e5 Compare February 24, 2026 17:33
@RajdeepKushwaha5
Copy link
Contributor Author

@RajdeepKushwaha5 can you attach a short video demonstrating these new capabilities?

Sure

final_fix_working.mp4

@galElmalah
Copy link
Contributor

@RajdeepKushwaha5 I don't see the side effect i.e the notepad being written into

…tests

- Wrap server startup in main() with .catch() per MCP tool convention
- Windows focusWindow: escape wildcard chars via WildcardPattern::Escape; add exit 1 to make Write-Error terminating
- Linux listWindows: use xdotool getwindowpid to return real process IDs instead of X11 window IDs
- Update screenshot tool description to reflect session-level consent gate
- typeText (Windows): fix temp-file leak — move both writeFile calls inside try so finally always cleans up txt and ps1
- tests: update 5 system-prompt assertions to use full MCP tool names (complete-task_complete_task, start-task_start_task) to match last commit
- tests: fix 2 pre-existing Windows path-separator failures by using path.join('dist', 'index.mjs') in assertions
- Add windowsHide:true to all Windows PowerShell execFile calls so the
  console window no longer steals keyboard focus from the target app
- In typeText: capture foreground window handle before writing to
  clipboard, then restore it with SetForegroundWindow before SendKeys
  so paste lands in the correct window even if PS spawning shifts focus
- In focusWindow: add 400ms sleep after SetForegroundWindow to let the
  OS fully activate the window before callers proceed to type
- In captureScreenshot: scale bitmap to max 1280px wide (HighQuality-
  Bicubic) so base64 payload stays within model vision input limits
SetForegroundWindow silently fails from hidden/background processes due to
Windows foreground locking rules. Replace it with WScript.Shell.AppActivate
which activates windows by title from any process, then use WScript.Shell
.SendKeys to paste -- guaranteeing the keystrokes reach the correct window.
…ext targeting

focusWindow now uses WScript.Shell.AppActivate (works from background processes)
and writes the exact window title to dc-focused-window.txt in TEMP. typeText
reads that file to re-activate the correct window before pasting, eliminating
the race where Electron regains focus between the two tool calls.
…ndows

- focusWindow now persists process PID (not title) to dc-focused-window.txt,
  making it immune to title changes (e.g. Notepad prepends '*' on unsaved edits)
- typeText adopts a three-layer strategy on Windows:
  1. UI Automation ValuePattern: programmatic append with no focus requirement,
     works from any background process (Notepad, WordPad, Win32/WinUI 3 editors)
  2. UIA SetFocus + clipboard paste: for editors that support UIA focus but not
     ValuePattern
  3. AppActivate(pid) + clipboard paste: immune fallback using stored PID
- Addresses code-review feedback: fixes the invisible side-effect issue where
  typeText failed silently after Notepad window title changed post-focusWindow
@RajdeepKushwaha5
Copy link
Contributor Author

RajdeepKushwaha5 commented Mar 1, 2026

@RajdeepKushwaha5 I don't see the side effect i.e the notepad being written into

Hi Sir @galElmalah ! Here's a video walkthrough of the recent updates — Windows spawn fixes, desktop automation features, and code review improvements. Let me know if you have any questions!

accomplish_change.mp4

@mavishay
Copy link
Collaborator

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants