This is a guide to setting up WindowsAgentArena (WAA) with Agent S2.5 (and beyond). Why do we need a setup guide? Despite the thorough README.md, we have to integrate our code into their repository and fix a number of setup issues in the WAA environment. Sadly, this isn’t the most straightforward process.
The initial WAA setup is straightforward: follow the README.md in their repository. Once you’ve finished, try running run-local.sh. This starts an experiment with their default Navi agent. At this point, the environment is good enough to run evaluations. However, it is incomplete, so environment issues will make the results not fully correct.
Figure 1: Bash script chain of execution.
While you’re at it, make sure you understand the following:
- the entire README.md (especially the Bring Your Own Agent guide)
- the long chain of bash scripts that start the run (Figure 1)
- run.py, to see how the agent and environment are instantiated and used together
- the folder structure of the repository and the purpose of each folder
By now, your WAA environment should be set up to run locally. There are two major problems:
- setup issues
- the VM persists across examples (it isn’t reset after each example completes, so state left over from one task can make evaluation unfair)
Let’s tackle the first one: setup issues.
The first issue I ran into was that the office apps weren’t installed. Why? It turns out that every app in the VM is installed during the initial setup stage from the links in this file (tools_config.json). At the time of writing, only the office links are broken. Try all the links to make sure they work. If a link does not lead to a download (and an error occurs instead), that app was not installed in the VM. What do we do? Two options:
- redo the entire initial setup stage (time consuming: ~4 hours for me, and even then it often just didn’t work; ideally, set up WAA on Linux, where I’ve had no issues so far)
- enter the VM and install the apps manually (easier and faster)
We’ll do the second approach.
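You can check which links in tools_config.json are broken before picking an option. A short script can walk the file and request each URL. This is a minimal sketch. It assumes nothing about the file’s layout beyond it being JSON: it collects every string that starts with http:// or https://, wherever it appears, and sends a HEAD request to each one. The helper names are hypothetical, and the script is meant to run on the host, not inside the VM.

```python
import json
import sys
import urllib.error
import urllib.request
from typing import Any, Dict, Iterator


def iter_urls(node: Any) -> Iterator[str]:
    """Recursively yield every http(s) string found anywhere in a JSON value."""
    if isinstance(node, dict):
        for value in node.values():
            yield from iter_urls(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_urls(item)
    elif isinstance(node, str) and node.startswith(("http://", "https://")):
        yield node


def check_url(url: str, timeout: float = 15.0) -> str:
    """Send a HEAD request and return a short status string for one link."""
    request = urllib.request.Request(
        url, method="HEAD", headers={"User-Agent": "Mozilla/5.0"}
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return f"ok ({response.status})"
    except urllib.error.HTTPError as err:  # the server answered with 4xx/5xx
        return f"HTTP {err.code}"
    except OSError as err:  # DNS failure, refused connection, timeout, ...
        return f"failed ({err})"


def check_config(path: str) -> Dict[str, str]:
    """Load the JSON config and map each URL it contains to its status."""
    with open(path, encoding="utf-8") as f:
        config = json.load(f)
    return {url: check_url(url) for url in iter_urls(config)}


if __name__ == "__main__":
    for url, status in check_config(sys.argv[1]).items():
        print(f"{status:<24} {url}")
```

Run it with the path to tools_config.json as the only argument. Some download servers reject HEAD requests with a 403 or 405 even when the link itself works. Open any flagged URL in a browser before concluding that the app is missing.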
You can access the VM at https://localhost:8006. To turn the VM on, run run-local.sh. There’s probably a better or faster way to do this, but it doesn’t take long anyway (~1-2 mins). Once the VM has started, enter it. The agent may be trying to take actions. You can either override the action in run.py with import time; time.sleep(10000) here, or fight the agent for control of the VM!
Inside the VM, go to the LibreOffice download page and download the latest version. Once it’s downloaded, complete the setup wizard and make sure to delete the downloaded *.msi file in the VM. Finally, test the installation by opening LibreOffice Writer and Calc.
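Stray installers end up in the golden image, so it can help to script their deletion. The sketch below lists installer-looking files in a folder and deletes them only when asked. The file patterns are a hypothetical starting point. The script sticks to os and glob, so it should run on the VM’s default GIMP Python (2.x) as well as on Python 3. Note that os.remove deletes files permanently, so nothing lands in the Recycle Bin.

```python
from __future__ import print_function

import glob
import os
import sys

# Hypothetical patterns; adjust them to whatever you actually downloaded.
INSTALLER_PATTERNS = ("*.msi", "*.exe", "get-pip.py")


def find_installers(folder, patterns=INSTALLER_PATTERNS):
    """Return the sorted paths in `folder` that match any installer pattern."""
    matches = set()
    for pattern in patterns:
        matches.update(glob.glob(os.path.join(folder, pattern)))
    return sorted(matches)


def remove_installers(folder, dry_run=True):
    """Print (and, unless dry_run is set, delete) every installer in `folder`."""
    found = find_installers(folder)
    for path in found:
        print(("would delete: " if dry_run else "deleting: ") + path)
        if not dry_run:
            os.remove(path)  # permanent delete, bypasses the Recycle Bin
    return found


if __name__ == "__main__":
    downloads = os.path.join(os.path.expanduser("~"), "Downloads")
    remove_installers(downloads, dry_run="--delete" not in sys.argv)
```

Run it once without arguments to preview the list, then run it with --delete to remove the files. The same sweep is handy again during the final clean-up of the golden image.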
In Google Chrome, there are a couple of unexpected pop-ups.
Figure 2: Pop-ups on Chrome.
Close all these pop-ups and make Google Chrome your default web browser.
This isn’t as important, but VSCode also shows a couple of initial pop-ups that you can close.
Important if you’re using set_cell_values
Agent S2.5 uses a special grounding function called set_cell_values that takes advantage of the soffice CLI and the unotools Python library. TL;DR: this function lets the agent set cell values in a given spreadsheet and sheet.
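In practice, set_cell_values needs the VM’s python to be 3.10 or newer, to be able to import uno, and to come first on the Path, with soffice also reachable. A small script can check all of this at once. It is a sketch under assumptions. The "libreoffice" substring test on the first python found on the Path is a heuristic that assumes a default install location. The script itself needs Python 3, so the GIMP 2.x interpreter will not run it at all.

```python
import importlib
import shutil
import sys


def can_import(name):
    """Return True if `import name` succeeds in this interpreter."""
    try:
        importlib.import_module(name)
        return True
    except ImportError:
        return False


def check_libreoffice_python(min_version=(3, 10)):
    """Collect the facts set_cell_values depends on for the running interpreter."""
    first_python = shutil.which("python") or ""
    return {
        "executable": sys.executable,
        "version_ok": sys.version_info[:2] >= min_version,
        "uno_importable": can_import("uno"),
        "soffice_on_path": shutil.which("soffice") is not None,
        # Heuristic: assumes LibreOffice is installed under a "LibreOffice" folder.
        "path_python_is_libreoffice": "libreoffice" in first_python.lower(),
    }


if __name__ == "__main__":
    report = check_libreoffice_python()
    for key, value in report.items():
        print(f"{key:>28}: {value}")
    # Exit non-zero if any of the boolean checks failed.
    sys.exit(0 if all(v for v in report.values() if isinstance(v, bool)) else 1)
```

Run it with the interpreter you want to test. All four checks should report True once the Path points at LibreOffice’s Python.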
For this function to work on WAA, the setup is a bit messy…
1. Connect to the VM.
2. Open a terminal and run python --version. You should see that you’re using GIMP’s Python, which is 2.x. This won’t let you use the soffice CLI or import uno in Python code.
3. In a terminal, from the Desktop directory, run pip freeze > requirements.txt. This saves the list of PyPI libraries installed in GIMP’s Python to requirements.txt.
4. Configure the Path to use LibreOffice’s Python:
   a. In File Explorer, locate LibreOffice’s python.exe. You can do this with where python. Copy this path.
   b. In the search bar of the taskbar inside the VM, search for “environment variables”.
   c. Click “Environment Variables” and select “Path” under “System variables”. Paste the path copied in step (a). Path entries are folders, so use the folder that contains python.exe. Make sure it sits above the GIMP Python path so it takes precedence.
   d. Reopen a terminal and run soffice to make sure it now works. Create a temporary Python file and make sure import uno works.
5. LibreOffice’s Python should be 3.10 or above. However, it does not come with pip. To install pip, download this file and run python get-pip.py, making sure the python here is LibreOffice’s Python. Next, run pip install -r requirements.txt using the requirements.txt from step 3. This ensures LibreOffice’s Python has all the dependencies needed for evaluation (pyautogui, etc.).
6. Clean up all installer files. Then, inside the WAA repository code, change this line
command_list = ["python", "-c", self.pkgs_prefix.format(command=command)]
to:
command_list = ["absolute/path/to/libreoffice/python", "-c", self.pkgs_prefix.format(command=command)]
This ensures that the subprocess started by the Flask server inside the VM uses that specific Python (LibreOffice’s).
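If you would rather not hard-code the path, the same line can read the interpreter from a setting. The sketch below is hypothetical. build_command_list, the WAA_PYTHON environment variable and DEFAULT_LO_PYTHON are illustrative names, not part of WAA. The pkgs_prefix argument stands in for whatever template self.pkgs_prefix holds, and is only assumed to contain a {command} placeholder. The default path is where a standard LibreOffice install usually puts python.exe; confirm it against the path you located earlier.

```python
import os
import subprocess
import sys

# Hypothetical default; verify it against the python.exe you found in the VM.
# The raw string keeps the Windows backslashes literal.
DEFAULT_LO_PYTHON = r"C:\Program Files\LibreOffice\program\python.exe"


def build_command_list(command, pkgs_prefix, python_exe=None):
    """Return the argv for `<python> -c <prefix with command filled in>`."""
    if python_exe is None:
        # Hypothetical override hook: WAA_PYTHON is not an existing WAA setting.
        python_exe = os.environ.get("WAA_PYTHON", DEFAULT_LO_PYTHON)
    return [python_exe, "-c", pkgs_prefix.format(command=command)]


if __name__ == "__main__":
    # Demo with the current interpreter so it runs on any machine.
    argv = build_command_list(
        "print(sys.version_info[:2])",
        "import sys; {command}",
        python_exe=sys.executable,
    )
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    print(result.stdout.strip())
```

The argv is a list, so subprocess quotes each element itself. The space in "Program Files" therefore needs no extra escaping.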
Double-check that all apps work and that no unexpected pop-ups or issues get in the way. Close any apps you opened once you finish your clean-up. Make sure any installation files in Downloads are deleted (and removed from the Recycle Bin) to keep the environment clean. The end result is our golden image. You may want to save a copy of this VM somewhere safe, so that you can always copy it back into the WAA repository and reuse it (refer to this).
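Once the VM is shut down, saving the golden image is just a directory copy. The sketch below assumes the VM’s disk lives in a single storage folder. Both folder paths are placeholders, so check the page referenced above for where WAA actually keeps the image. The function names are hypothetical.

```python
import shutil
import time
from pathlib import Path


def backup_golden_image(storage_dir, backup_root):
    """Copy the VM storage folder to backup_root/golden-<timestamp>/ and return it."""
    src = Path(storage_dir)
    if not src.is_dir():
        raise FileNotFoundError(f"no VM storage folder at {src}")
    dest = Path(backup_root) / f"golden-{time.strftime('%Y%m%d-%H%M%S')}"
    shutil.copytree(src, dest)  # creates backup_root if it does not exist yet
    return dest


def restore_golden_image(backup_dir, storage_dir):
    """Replace storage_dir with a saved copy (shut the VM down first)."""
    dst = Path(storage_dir)
    if dst.exists():
        shutil.rmtree(dst)
    shutil.copytree(backup_dir, dst)
    return dst
```

For example, call backup_golden_image("path/to/vm/storage", "path/to/backups") right after the clean-up. Call restore_golden_image with the returned folder whenever a run has dirtied the VM. The disk image is large, so expect each copy to take a while.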
Take the time to understand the Agent-S repository.
- Instead of following the README.md for Agent S2.5, clone the repository and then run pip install -r requirements.txt.
- Move the S2.5 folder into the mm_agents folder in WAA, following the Bring Your Own Agent guide.
  - You will need to move the agent_s.py file out into the S2.5 folder and update all the relevant import statements.
- Make the necessary changes in run.py and lib_run_single.py to accommodate Agent S2.5 (replace the Navi agent with Agent S2.5).
- Test it by running the experiments! When you run run-local.sh, remember that you now need to specify Agent S2.5 instead of the Navi agent: agent="agent_s".
  - You may hit some import errors. These libraries need to be installed inside the winarena container (I think). You can just add the pip install commands to the bash script the error comes from (hacky).
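Before patching the bash scripts, you can list every missing module in one go instead of fixing one ImportError per run. The sketch assumes you run it with the container’s Python, for example by exec-ing into the winarena container. The module list is a hypothetical example. The mapping exists because import names and pip package names sometimes differ (PIL, for instance, is installed as pillow). It only handles top-level module names.

```python
import importlib.util
import sys

# Hypothetical example: import names (as in the traceback) -> pip package names.
REQUIRED = {
    "pyautogui": "pyautogui",
    "PIL": "pillow",
    "openai": "openai",
}


def missing_packages(required):
    """Return the pip names for every top-level module this interpreter can't find."""
    return [pkg for module, pkg in required.items()
            if importlib.util.find_spec(module) is None]


def pip_install_line(required):
    """Build one pip install command for the bash script, or '' if nothing is missing."""
    missing = missing_packages(required)
    return "pip install " + " ".join(missing) if missing else ""


if __name__ == "__main__":
    line = pip_install_line(REQUIRED)
    print(line or "all required modules are importable")
    sys.exit(1 if line else 0)
```

The printed line can be pasted as-is into the bash script that raised the error.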
- Ensure you have:
  - a clean copy of the golden image
  - the correct Azure subscription (so you’re not using your own payment method)
- Follow the Azure deployment in the README.md.
- Test it! If this works, we have a resettable golden image and WAA can be run in parallel, making evaluation much faster! Good luck!

