Feat/windows command by SiswoHandoko · Pull Request #108 · google-gemini/computer-use-preview

SiswoHandoko · 2026-02-10T06:23:49Z

No description provided.

gemini-code-assist · 2026-02-10T06:24:09Z

Summary of Changes

Hello @SiswoHandoko, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly expands the agent's capabilities by enabling it to interact with local desktop environments on both macOS and Windows. This allows the agent to perform tasks directly on the user's operating system, moving beyond browser-only interactions. The changes involve integrating new computer control classes, adapting the agent's core logic for desktop-specific instructions and tool usage, and updating the project's documentation and dependencies to support this new functionality.

Highlights

New Desktop Environments: Introduced two new experimental environments, desktop for macOS and desktop_win for Windows, allowing the agent to control local desktop GUIs using OS-level mouse/keyboard automation.
Agent Adaptation for Desktop Control: The agent now dynamically adjusts its behavior based on the chosen environment, excluding web-specific functions and providing tailored system instructions for macOS and Windows desktop interactions.
Documentation and CLI Updates: The README.md has been updated to reflect the new desktop environments, including usage examples and modifications to the command-line argument table. The main.py script now supports these new --env choices.
New Dependencies: Added pyautogui, pillow, and pyscreeze to requirements.txt to enable the desktop automation capabilities.

Changelog

README.md
- Added desktop and desktop_win to the list of available environments.
- Updated the command-line arguments table to include the new desktop environments for the --env flag.
- Included new sections with usage examples for 'Local Desktop (experimental)' and 'Local Desktop Windows (experimental)'.
agent.py
- Modified the BrowserAgent constructor to accept an environment parameter.
- Implemented logic to exclude web-related functions (open_web_browser, search, navigate) when operating in desktop environments.
- Added environment-specific system instructions for macOS and Windows desktop control to guide the agent's reasoning.
- Passed the generated system_instruction to the genai.GenerativeModel initialization.
computers/__init__.py
- Imported DesktopComputer and DesktopWindowsComputer from their respective modules.
- Added DesktopComputer and DesktopWindowsComputer to the __all__ list for module export.
computers/desktop/desktop.py
- Added a new DesktopComputer class that provides methods for controlling a macOS desktop using pyautogui.
- Implemented core desktop interaction functions such as click_at, hover_at, type_text_at, scroll_document, scroll_at, key_combination, and drag_and_drop.
- Included platform-specific key normalization for macOS commands.
computers/desktop/desktop_win.py
- Added a new DesktopComputer class (aliased as DesktopWindowsComputer) for controlling a Windows desktop using pyautogui.
- Implemented the same desktop interaction functions as the macOS version, with Windows-specific key normalizations.
- Included a runtime check to ensure this environment is only used on Windows (win32).
main.py
- Imported the newly added DesktopComputer and DesktopWindowsComputer classes.
- Updated the --env command-line argument choices to include desktop and desktop_win.
- Added conditional logic to instantiate the appropriate Computer subclass based on the --env argument.
- Passed the selected environment argument to the BrowserAgent constructor.
requirements.txt
- Added pyautogui for programmatic control of the mouse and keyboard.
- Added pillow for image processing, specifically for screenshots.
- Added pyscreeze for screenshot capabilities.
test_main.py
- Added test_main_desktop to verify the correct instantiation of DesktopComputer when --env desktop is used.
- Added test_main_desktop_win to verify the correct instantiation of DesktopWindowsComputer when --env desktop_win is used.

Activity

The pull request was authored by SiswoHandoko.
New desktop automation capabilities for macOS and Windows have been introduced.
The agent's core logic has been updated to support these new environments.
Documentation and dependencies have been adjusted to reflect the new features.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double-check it and use code with caution.

google-cla · 2026-02-10T06:24:18Z

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

gemini-code-assist

Code Review

This pull request introduces support for desktop automation on both macOS and Windows by adding desktop and desktop_win environments, involving new DesktopComputer classes and agent logic updates. However, this introduces a critical security vulnerability: Remote Code Execution (RCE) via prompt injection, as the agent is given excessive permissions to control the host system's GUI based on untrusted user input without proper safeguards. Beyond this critical security concern, the changes also present maintainability issues due to significant code duplication between platform-specific computer classes, areas for code cleanup in the agent logic, and a minor documentation inconsistency in the README.

gemini-code-assist · 2026-02-10T06:26:32Z

agent.py

+        system_instruction = None
+        if self._environment == "desktop":
+            system_instruction = (
+                "You are controlling a local desktop GUI (macOS). "
+                "Do not type into the current active app unless you have explicitly focused the correct input field. "
+                "To open apps, first open Spotlight with Command+Space using key_combination, "
+                "then type the app name and press Enter without clicking in other windows. "
+                "Prefer keyboard shortcuts to switch apps instead of typing into arbitrary windows."
+            )
+            search_intent = any(
+                token in self._query.lower()
+                for token in ("find", "search", "look for", "locate")
+            )
+            if search_intent:
+                system_instruction += (
+                    " If the user asks to find or search for something, open Spotlight first."
+                )
+        elif self._environment == "desktop_win":
+            system_instruction = (
+                "You are controlling a local desktop GUI (Windows). "
+                "Do not type into the current active app unless you have explicitly focused the correct input field. "
+                "To open apps, first open Start/Search with Win or Win+S using key_combination, "
+                "then type the app name and press Enter without clicking in other windows. "
+                "Prefer keyboard shortcuts to switch apps instead of typing into arbitrary windows."
+            )
+            search_intent = any(
+                token in self._query.lower()
+                for token in ("find", "search", "look for", "locate")
+            )
+            if search_intent:
+                system_instruction += (
+                    " If the user asks to find or search for something, open Start/Search first."
+                )


The introduction of desktop automation capabilities (macOS and Windows) via pyautogui creates a critical security risk. The agent's actions, driven by the query parameter (untrusted user input), allow for Remote Code Execution (RCE) via prompt injection, potentially enabling an attacker to take control of the host system. This is exacerbated by duplicated logic for handling desktop and desktop_win environments, specifically the search_intent check and system_instruction appending, which could be refactored for better maintainability and to simplify the application of security mitigations.

To mitigate the RCE risk, consider:

- Implementing mandatory human-in-the-loop (HITL) confirmation for sensitive actions.
- Running the agent in a highly restricted, sandboxed environment.
- Implementing a strict allow-list of permitted actions.
- Sanitizing and validating all LLM-generated output before execution.

Additionally, refactoring the duplicated logic for desktop and desktop_win environments would improve maintainability and reduce redundancy.
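
One possible shape for that refactor is sketched below. It moves the two near-identical branches into a single table-driven helper. The names `build_system_instruction`, `_DESKTOP_PROFILES`, and `_SEARCH_TOKENS` are hypothetical and do not appear in the PR; the instruction text is copied from the hunk above.

```python
from typing import Optional

# Hypothetical per-environment data: (OS label, launcher hint, launcher name).
_DESKTOP_PROFILES = {
    "desktop": ("macOS", "Spotlight with Command+Space", "Spotlight"),
    "desktop_win": ("Windows", "Start/Search with Win or Win+S", "Start/Search"),
}
_SEARCH_TOKENS = ("find", "search", "look for", "locate")


def build_system_instruction(environment: str, query: str) -> Optional[str]:
    """Return the desktop system instruction, or None for non-desktop environments."""
    profile = _DESKTOP_PROFILES.get(environment)
    if profile is None:
        return None
    os_name, launcher_hint, launcher_name = profile
    instruction = (
        f"You are controlling a local desktop GUI ({os_name}). "
        "Do not type into the current active app unless you have explicitly "
        "focused the correct input field. "
        f"To open apps, first open {launcher_hint} using key_combination, "
        "then type the app name and press Enter without clicking in other windows. "
        "Prefer keyboard shortcuts to switch apps instead of typing into arbitrary windows."
    )
    # Same substring heuristic as the PR; it is applied once for both platforms.
    if any(token in query.lower() for token in _SEARCH_TOKENS):
        instruction += (
            f" If the user asks to find or search for something, open {launcher_name} first."
        )
    return instruction
```

With a helper like this, a future mitigation (for example, an extra safety sentence in the instruction) would only need to be added in one place.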

gemini-code-assist · 2026-02-10T06:26:32Z

computers/desktop/desktop.py

+    def type_text_at(
+        self,
+        x: int,
+        y: int,
+        text: str,
+        press_enter: bool = False,
+        clear_before_typing: bool = True,
+    ) -> EnvState:
+        if self._spotlight_pending:
+            self._spotlight_pending = False
+        else:
+            pyautogui.click(x, y)
+        if clear_before_typing:
+            if sys.platform == "darwin":
+                pyautogui.hotkey("command", "a")
+            else:
+                pyautogui.hotkey("ctrl", "a")
+            pyautogui.press("backspace")
+        pyautogui.write(text)
+        if press_enter:
+            pyautogui.press("enter")
+        return self.current_state()
+
+    def scroll_document(
+        self, direction: Literal["up", "down", "left", "right"]
+    ) -> EnvState:
+        scroll_amount = self._screen_size[1] // 2
+        if direction == "up":
+            pyautogui.scroll(scroll_amount)
+        elif direction == "down":
+            pyautogui.scroll(-scroll_amount)
+        elif direction == "left":
+            pyautogui.hscroll(-scroll_amount)
+        elif direction == "right":
+            pyautogui.hscroll(scroll_amount)
+        else:
+            raise ValueError("Unsupported direction: ", direction)
+        return self.current_state()
+
+    def scroll_at(
+        self,
+        x: int,
+        y: int,
+        direction: Literal["up", "down", "left", "right"],
+        magnitude: int = 800,
+    ) -> EnvState:
+        pyautogui.moveTo(x, y)
+        if direction == "up":
+            pyautogui.scroll(magnitude)
+        elif direction == "down":
+            pyautogui.scroll(-magnitude)
+        elif direction == "left":
+            pyautogui.hscroll(-magnitude)
+        elif direction == "right":
+            pyautogui.hscroll(magnitude)
+        else:
+            raise ValueError("Unsupported direction: ", direction)
+        return self.current_state()
+
+    def wait_5_seconds(self) -> EnvState:
+        time.sleep(5)
+        return self.current_state()
+
+    def go_back(self) -> EnvState:
+        if sys.platform == "darwin":
+            pyautogui.hotkey("command", "[")
+        else:
+            pyautogui.hotkey("alt", "left")
+        return self.current_state()
+
+    def go_forward(self) -> EnvState:
+        if sys.platform == "darwin":
+            pyautogui.hotkey("command", "]")
+        else:
+            pyautogui.hotkey("alt", "right")
+        return self.current_state()
+
+    def search(self) -> EnvState:
+        return self.navigate(self._search_engine_url)
+
+    def navigate(self, url: str) -> EnvState:
+        normalized_url = url
+        if not normalized_url.startswith(("http://", "https://")):
+            normalized_url = "https://" + normalized_url
+        if sys.platform == "darwin":
+            pyautogui.hotkey("command", "l")
+        else:
+            pyautogui.hotkey("ctrl", "l")
+        pyautogui.write(normalized_url)
+        pyautogui.press("enter")
+        self._current_url = normalized_url
+        time.sleep(1)
+        return self.current_state()
+
+    def key_combination(self, keys: list[str]) -> EnvState:
+        normalized_keys = [self._normalize_key(key) for key in keys]
+        if len(normalized_keys) == 1:
+            pyautogui.press(normalized_keys[0])
+        else:
+            pyautogui.hotkey(*normalized_keys)
+        if sys.platform == "darwin" and normalized_keys == ["command", "space"]:
+            self._spotlight_pending = True
+            time.sleep(0.2)
+        return self.current_state()
+
+    def drag_and_drop(
+        self, x: int, y: int, destination_x: int, destination_y: int
+    ) -> EnvState:
+        pyautogui.moveTo(x, y)
+        pyautogui.dragTo(destination_x, destination_y, button="left")
+        return self.current_state()


The DesktopComputer class provides the LLM with excessive permissions by allowing it to perform OS-level input automation via pyautogui. This includes typing arbitrary text (type_text_at), navigating to arbitrary URLs (navigate), and pressing any key combination (key_combination). When combined with an LLM that processes untrusted user input, this tool can be abused to compromise the host system.

Consider restricting the tool's capabilities to the minimum necessary for the intended task and ensuring that all actions are performed in a secure, isolated environment.
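
As a rough illustration of restricting `key_combination`, the sketch below validates requested keys against an allow-list before anything is sent to the OS. The function `check_key_combination`, the exception `ActionNotPermitted`, and the listed combinations are all assumptions made for this example; they are not part of the PR, and the list is not a vetted security policy.

```python
from typing import Iterable

# Hypothetical allow-list; the entries are illustrative only.
ALLOWED_KEY_COMBINATIONS = frozenset({
    ("command", "space"),
    ("win",),
    ("win", "s"),
    ("enter",),
    ("tab",),
    ("escape",),
})


class ActionNotPermitted(Exception):
    """Raised when a requested key combination is outside the allow-list."""


def check_key_combination(keys: Iterable[str]) -> tuple[str, ...]:
    """Return the lowercased key tuple if it is allowed, otherwise raise."""
    normalized = tuple(key.strip().lower() for key in keys)
    if normalized not in ALLOWED_KEY_COMBINATIONS:
        raise ActionNotPermitted(f"Key combination not permitted: {normalized}")
    return normalized
```

A caller would run this check at the top of `key_combination` and only invoke `pyautogui` when it returns. The check is order-sensitive, so `["space", "command"]` would be rejected unless it is listed separately.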

gemini-code-assist · 2026-02-10T06:26:32Z

computers/desktop/desktop_win.py

+    def type_text_at(
+        self,
+        x: int,
+        y: int,
+        text: str,
+        press_enter: bool = False,
+        clear_before_typing: bool = True,
+    ) -> EnvState:
+        if self._launcher_pending:
+            self._launcher_pending = False
+        else:
+            pyautogui.click(x, y)
+        if clear_before_typing:
+            pyautogui.hotkey("ctrl", "a")
+            pyautogui.press("backspace")
+        pyautogui.write(text)
+        if press_enter:
+            pyautogui.press("enter")
+        return self.current_state()
+
+    def scroll_document(
+        self, direction: Literal["up", "down", "left", "right"]
+    ) -> EnvState:
+        scroll_amount = self._screen_size[1] // 2
+        if direction == "up":
+            pyautogui.scroll(scroll_amount)
+        elif direction == "down":
+            pyautogui.scroll(-scroll_amount)
+        elif direction == "left":
+            pyautogui.hscroll(-scroll_amount)
+        elif direction == "right":
+            pyautogui.hscroll(scroll_amount)
+        else:
+            raise ValueError("Unsupported direction: ", direction)
+        return self.current_state()
+
+    def scroll_at(
+        self,
+        x: int,
+        y: int,
+        direction: Literal["up", "down", "left", "right"],
+        magnitude: int = 800,
+    ) -> EnvState:
+        pyautogui.moveTo(x, y)
+        if direction == "up":
+            pyautogui.scroll(magnitude)
+        elif direction == "down":
+            pyautogui.scroll(-magnitude)
+        elif direction == "left":
+            pyautogui.hscroll(-magnitude)
+        elif direction == "right":
+            pyautogui.hscroll(magnitude)
+        else:
+            raise ValueError("Unsupported direction: ", direction)
+        return self.current_state()
+
+    def wait_5_seconds(self) -> EnvState:
+        time.sleep(5)
+        return self.current_state()
+
+    def go_back(self) -> EnvState:
+        pyautogui.hotkey("alt", "left")
+        return self.current_state()
+
+    def go_forward(self) -> EnvState:
+        pyautogui.hotkey("alt", "right")
+        return self.current_state()
+
+    def search(self) -> EnvState:
+        return self.navigate(self._search_engine_url)
+
+    def navigate(self, url: str) -> EnvState:
+        normalized_url = url
+        if not normalized_url.startswith(("http://", "https://")):
+            normalized_url = "https://" + normalized_url
+        pyautogui.hotkey("ctrl", "l")
+        pyautogui.write(normalized_url)
+        pyautogui.press("enter")
+        self._current_url = normalized_url
+        time.sleep(1)
+        return self.current_state()
+
+    def key_combination(self, keys: list[str]) -> EnvState:
+        normalized_keys = [self._normalize_key(key) for key in keys]
+        if len(normalized_keys) == 1:
+            pyautogui.press(normalized_keys[0])
+        else:
+            pyautogui.hotkey(*normalized_keys)
+        if normalized_keys in (["win"], ["win", "s"]):
+            self._launcher_pending = True
+            time.sleep(0.2)
+        return self.current_state()
+
+    def drag_and_drop(
+        self, x: int, y: int, destination_x: int, destination_y: int
+    ) -> EnvState:
+        pyautogui.moveTo(x, y)
+        pyautogui.dragTo(destination_x, destination_y, button="left")
+        return self.current_state()


The DesktopComputer class for Windows provides the LLM with excessive permissions via pyautogui, similar to the macOS version. This tool can be abused via prompt injection to perform malicious actions on the host Windows system.

Consider implementing strict safeguards and running the agent in a restricted environment.
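
One such safeguard could be a confirmation gate in front of sensitive actions. The sketch below is a minimal, hypothetical version: `run_action`, `SENSITIVE_ACTIONS`, and the `confirm` callback are illustrative names, and which actions count as sensitive is an assumption rather than something the PR defines.

```python
from typing import Callable

# Assumed set of actions that should require a human decision.
SENSITIVE_ACTIONS = frozenset(
    {"type_text_at", "key_combination", "navigate", "drag_and_drop"}
)


def run_action(
    name: str,
    action: Callable[[], object],
    confirm: Callable[[str], bool],
) -> object:
    """Run `action`, asking `confirm(name)` first when the action is sensitive."""
    if name in SENSITIVE_ACTIONS and not confirm(name):
        raise PermissionError(f"User declined action: {name}")
    return action()
```

In an interactive session, `confirm` might be something like `lambda name: input(f"Allow {name}? [y/N] ").strip().lower() == "y"`, while tests can pass a stub.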

gemini-code-assist · 2026-02-10T06:26:32Z

computers/desktop/desktop_win.py

+from ..computer import Computer, EnvState
+
+
+class DesktopComputer(Computer):


This file is almost an exact copy of computers/desktop/desktop.py. This significant code duplication will make future maintenance difficult, as changes will need to be applied in two places.

A better approach would be to create a common base class, say BaseDesktopComputer, that contains all the shared logic. Then, DesktopComputer (for macOS) and DesktopWindowsComputer can inherit from this base class and override only the methods that have platform-specific implementations (like _normalize_key, go_back, navigate, etc.). This will greatly improve code reuse and maintainability.
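
A minimal sketch of that layout follows, under simplifying assumptions: the real classes extend `Computer` and return `EnvState`, whereas here the hotkey function is injected (for example `pyautogui.hotkey`) so the example runs without a display. The class attributes `modifier_key`, `back_hotkey`, and `forward_hotkey`, and the `select_all` helper, are hypothetical names.

```python
from typing import Callable


class BaseDesktopComputer:
    """Shared desktop logic; subclasses override only platform-specific keys."""

    modifier_key = "ctrl"
    back_hotkey = ("alt", "left")
    forward_hotkey = ("alt", "right")

    def __init__(self, hotkey: Callable[..., None]):
        # Injected so the sketch does not import pyautogui directly.
        self._hotkey = hotkey

    def go_back(self) -> None:
        self._hotkey(*self.back_hotkey)

    def go_forward(self) -> None:
        self._hotkey(*self.forward_hotkey)

    def select_all(self) -> None:
        self._hotkey(self.modifier_key, "a")


class DesktopComputer(BaseDesktopComputer):
    """macOS variant: Command-based shortcuts, as in the PR's darwin branches."""

    modifier_key = "command"
    back_hotkey = ("command", "[")
    forward_hotkey = ("command", "]")


class DesktopWindowsComputer(BaseDesktopComputer):
    """Windows variant: the base-class defaults already match the PR's Windows keys."""
```

Keeping the differences as class attributes means methods such as `scroll_at` or `drag_and_drop`, which are identical in both files, would live only in the base class.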

gemini-code-assist · 2026-02-10T06:26:32Z

README.md

+| Argument            | Description                                                                                                                            | Required | Default                                   | Supported Environment(s) |
+| ------------------- | -------------------------------------------------------------------------------------------------------------------------------------- | -------- | ----------------------------------------- | ------------------------ |
+| `--query`           | The natural language query for the browser agent to execute.                                                                           | Yes      | N/A                                       | All                      |
+| `--env`             | The computer use environment to use. Must be one of the following: `playwright`, `browserbase`, `desktop`, or `desktop_win`             | No       | N/A                                       | All                      |


The documentation for the --env argument is missing its default value. The table shows "N/A", but the code in main.py specifies default="playwright". To improve clarity for users, the default value should be documented here.

Suggested change

-| `--env` | The computer use environment to use. Must be one of the following: `playwright`, `browserbase`, `desktop`, or `desktop_win` | No | N/A | All |
+| `--env` | The computer use environment to use. Must be one of the following: `playwright`, `browserbase`, `desktop`, or `desktop_win` | No | `playwright` | All |

References

In documentation, command-line arguments that have a default value should be marked as not required and their default value should be specified.
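
For context, the sketch below shows how such a default is typically declared with `argparse`. It is an approximation based on the review comment (`default="playwright"`) and the README table; the helper name `build_parser` is hypothetical, and the real `main.py` may define more arguments.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with the two arguments discussed in the review."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--query", required=True)
    parser.add_argument(
        "--env",
        choices=("playwright", "browserbase", "desktop", "desktop_win"),
        default="playwright",  # the value the README table should document
    )
    return parser
```

Parsing `["--query", "open notes"]` with this parser leaves `args.env` set to `playwright`, which is why "N/A" in the table is misleading.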

gemini-code-assist · 2026-02-10T06:26:33Z

computers/desktop/desktop_win.py

+from ..computer import Computer, EnvState
+
+
+class DesktopComputer(Computer):


The class is named DesktopComputer, which is the same as the class in computers/desktop/desktop.py. This is confusing and can lead to errors. Since it's imported as DesktopWindowsComputer in computers/__init__.py, it would be clearer and more consistent to rename the class itself to DesktopWindowsComputer.

Suggested change

-class DesktopComputer(Computer):
+class DesktopWindowsComputer(Computer):

eryzerz and others added 2 commits February 10, 2026 11:05

add macos computer use

1355490

chore : add windows command

da3838b

gemini-code-assist bot reviewed Feb 10, 2026

SiswoHandoko wants to merge 2 commits into google-gemini:main from eryzerz:feat/windows-command

2 participants
