@philipph-askui philipph-askui commented Jan 21, 2026

[Edited] PR Description

To get a quick idea of the new API, I added an example at examples/model_providers.py that you can run.
Note: You will need an Anthropic API key for that.

This PR:

  • Removes ModelRouter, ModelRegistry, and model_store
  • Introduces a provider-based configuration system (AgentSettings)
  • Renames VisionAgent → ComputerAgent and AndroidVisionAgent → AndroidAgent
  • Updates docs and examples

Summary

  • Replaced the ModelRouter/model_store abstraction with three typed provider slots (vlm_provider, image_qa_provider, detection_provider) configured via AgentSettings
  • Providers own their endpoint, credentials, and model ID — validated lazily on first API call
  • get() and locate() are now backed by GetTool/LocateTool, which are also available to the LLM during act() — no separate model injection path
  • Significantly reduced codebase complexity (~4600 lines removed)
  • Updated all docs and examples to reflect the new API
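
To make the "provider slots" and "validated lazily on first API call" points concrete, here is a minimal, self-contained sketch of the pattern. It does not use the askui package: `SketchVlmProvider`, `SketchAgentSettings`, `complete()`, and the endpoint URL are hypothetical stand-ins invented for illustration, and the real `AgentSettings` and provider classes may look different.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SketchVlmProvider:
    """Hypothetical stand-in for a provider; not the askui class."""

    model_id: str
    api_key: Optional[str] = None
    endpoint: str = "https://example.invalid/v1"  # placeholder endpoint
    _validated: bool = field(default=False, init=False)

    def _validate(self) -> None:
        # Runs on the first call, not at construction time.
        if not self.api_key:
            raise ValueError("api_key is required for " + self.model_id)
        self._validated = True

    def complete(self, prompt: str) -> str:
        if not self._validated:
            self._validate()
        # A real provider would send a request here; the sketch just echoes.
        return f"[{self.model_id}] {prompt}"


@dataclass
class SketchAgentSettings:
    """Hypothetical settings object with a single typed provider slot."""

    vlm_provider: SketchVlmProvider


# Construction succeeds without credentials; the error surfaces on first use.
settings = SketchAgentSettings(vlm_provider=SketchVlmProvider(model_id="demo-model"))
try:
    settings.vlm_provider.complete("click the button")
    error = None
except ValueError as exc:
    error = str(exc)
```

The trade-off of lazy validation is that a misconfigured provider is only detected when it is first used, so an early smoke test of each configured slot can be worthwhile.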

Key Changes

  • AgentSettings: single configuration object with provider slots; defaults to AskUI-hosted providers reading credentials from env vars
  • Built-in providers: AskUIVlmProvider, AskUIImageQAProvider, AskUIDetectionProvider, AnthropicVlmProvider, AnthropicImageQAProvider, GoogleImageQAProvider, OpenAICompatibleProvider
  • GetTool / LocateTool: wired into the act loop as ToolWithAgentOS — LLM can call them directly during act()
  • Deleted: entire src/askui/model_store/ directory
  • Renamed: VisionAgent → ComputerAgent, AndroidVisionAgent → AndroidAgent
  • Docs: 03_Using-Models-and-BYOM.md fully rewritten; VisionAgent replaced with ComputerAgent across all docs

Breaking Changes

  • VisionAgent is removed — use ComputerAgent
  • AndroidVisionAgent is removed — use AndroidAgent
  • act_model, get_model, locate_model constructor parameters are removed — use AgentSettings(vlm_provider=..., image_qa_provider=..., detection_provider=...)
  • model_store factory functions are removed
  • String-based model selection is removed

…del store

BREAKING CHANGE: Removed ModelRouter and ModelRegistry classes. Users must now use direct model injection.
@philipph-askui philipph-askui changed the title Chore/modelrouter Remove Modelrouter Jan 21, 2026
@philipph-askui philipph-askui changed the title Remove Modelrouter Remove modelRouter and add mode_store Jan 22, 2026
@philipph-askui philipph-askui marked this pull request as ready for review January 26, 2026 06:53
@philipph-askui philipph-askui changed the title Remove modelRouter and add mode_store Remove modelRouter and add model_store Jan 26, 2026
docs/01_Setup.md Outdated
**Problem**: Error connecting to Agent OS

**Solutions**:
1. Check if Agent OS is running (look for the system tray icon)
Contributor

The agent OS doesn’t have a tray icon.

docs/01_Setup.md Outdated

**Solutions**:
1. Check if Agent OS is running (look for the system tray icon)
2. Restart Agent OS from your applications menu
Contributor

The agent OS is not listed in the application menu.

Comment on lines 240 to 247
custom_settings = ActSettings(
    messages=MessageSettings(
        max_tokens=8192,
        temperature=0.5,
        betas=["computer-use-2025-01-24"],
    )
)

Contributor

Which system prompt is used in this example?
Could you please remove the betas?

docs/01_Setup.md Outdated

## Python Package Installation

AskUI Vision Agent requires Python 3.10 or higher.
Contributor

requires-python = ">=3.10,<3.14"
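
For illustration only, the constraint quoted above can be read as an inclusive lower bound and an exclusive upper bound. The sketch below expresses that check in plain Python; the names `MIN_VERSION`, `MAX_VERSION_EXCLUSIVE`, and `is_supported_python` are made up for this example and are not part of the SDK.

```python
import sys
from typing import Tuple

MIN_VERSION = (3, 10)            # inclusive lower bound (">=3.10")
MAX_VERSION_EXCLUSIVE = (3, 14)  # exclusive upper bound ("<3.14")


def is_supported_python(version: Tuple[int, int]) -> bool:
    """Return True if (major, minor) falls inside the declared range."""
    return MIN_VERSION <= version < MAX_VERSION_EXCLUSIVE


current = (sys.version_info.major, sys.version_info.minor)
print(f"Python {current[0]}.{current[1]} supported: {is_supported_python(current)}")
```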

docs/01_Setup.md Outdated
Comment on lines 40 to 44
```bash
pip install askui[anthropic] # Anthropic Claude support
pip install askui[openrouter] # OpenRouter support
pip install askui[documents] # PDF, Excel, Word support
```
Contributor

The SDK doesn't support these targets.

super().__init__(self.message)


class AnthropicModelSettings(BaseSettings):
Contributor

Currently unused.

self,
locator: str | Locator,
image: ImageSource,
locate_settings: LocateSettings, # noqa: ARG002
Contributor

Are the Locate settings only needed for LLM-based locators?

Contributor Author

The idea was to have a general settings object for all locate commands here.

Comment on lines +46 to +61
    max_tokens: int = 4096
    temperature: float = Field(default=0.5, ge=0.0, le=1.0)
    system_prompt: GetSystemPrompt | None = None
    timeout: float | None = None


class LocateSettings(BaseModel):
    """Settings for LocateModel operations (UI element location)."""

    model_config = ConfigDict(arbitrary_types_allowed=True)

    query_type: str | None = None
    confidence_threshold: float = Field(default=0.8, ge=0.0, le=1.0)
    max_detections: int = 10
    timeout: float | None = None
    system_prompt: LocateSystemPrompt | None = None
Contributor

Currently, only the system prompt is being used.

confidence_threshold: float = Field(default=0.8, ge=0.0, le=1.0)
max_detections: int = 10
timeout: float | None = None
system_prompt: LocateSystemPrompt | None = None
Contributor

The Locate system prompt should not be configurable because the expected return is currently hard-coded. Changing the system prompt would cause the Locate code to fail.

timeout: float | None = None


class LocateSettings(BaseModel):
Contributor

What do you think about removing the Locate and Get settings?

Contributor Author

I think we should generally discuss how "configurable" get and locate should be. This also includes whether we want to support BYOM for these commands or whether that should only be possible for act.

@philipph-askui philipph-askui changed the title Remove modelRouter and add model_store Remove modelRouter and add model_providers concept Feb 12, 2026
…ttings

  Introduces VlmProvider, ImageQAProvider, and DetectionProvider slots on
  AgentSettings. GetTool/LocateTool are now ToolWithAgentOS and available
  in the act() loop. Renames VisionAgent→ComputerAgent and
  AndroidVisionAgent→AndroidAgent. Removes model_store entirely.
Collaborator

@programminx-askui programminx-askui left a comment

Hello @philipph-askui,

I started to review the code. General remarks:

  1. The changes are too big to review. -> We need to test this heavily
  2. New files were created instead of renaming/moving existing files -> we don't know which code was already reviewed and which code is new.

Overall it is going in the right direction.

I've reviewed only a few files so far, so you can start working on the comments. A deeper review is still outstanding.

- Brittle selectors that break when UI changes
- Separate tools for different platforms (web, desktop, mobile)
- Manual scripting of every action step
- Constant maintenance as applications evolve
Collaborator

Suggested change
- Constant maintenance as applications evolve
- Constant maintenance as applications evolve
- Random application behavior
- External issues like network, rights or installation issues

Comment on lines 19 to 38
**1. Programmatic Control**
```python
from askui import VisionAgent

with VisionAgent() as agent:
    agent.click("Submit button")
    agent.type("[email protected]")
    result = agent.get("What's the current page title?")
```

Direct, single-step commands for precise UI control. Like traditional automation, but powered by vision models that understand what elements look like, not just their DOM structure.

**2. Agentic Control (Goal-based)**
```python
with VisionAgent() as agent:
    agent.act(
        "Search for flights from New York to London, "
        "filter by direct flights, and show me the cheapest option"
    )
```
Collaborator

I would go fully agentic and try to avoid the "programmatic" code; if we keep it, we should show the agentic example first.

Suggested change
**1. Programmatic Control**
```python
from askui import VisionAgent
with VisionAgent() as agent:
    agent.click("Submit button")
    agent.type("[email protected]")
    result = agent.get("What's the current page title?")
```
Direct, single-step commands for precise UI control. Like traditional automation, but powered by vision models that understand what elements look like, not just their DOM structure.
**2. Agentic Control (Goal-based)**
```python
with VisionAgent() as agent:
    agent.act(
        "Search for flights from New York to London, "
        "filter by direct flights, and show me the cheapest option"
    )
```
**1. Agentic Control (Goal-based)**
```python
with VisionAgent() as agent:
    agent.act(
        "Search for flights from New York to London, "
        "filter by direct flights, and show me the cheapest option"
    )
```


### Key Capabilities

- **Multi-Platform**: Windows, MacOS, Linux, Android
Collaborator

Suggested change
- **Multi-Platform**: Windows, MacOS, Linux, Android
- **Multi-Platform**: Windows, MacOS, Linux, Android, Citrix & KVM


Understand the model system, how to choose models for different tasks, and how to integrate custom models or third-party providers.

### 04 - Caching
Collaborator

Do we want to rename "caching"? In reality it's a token-efficient rerun.

This documentation is organized to take you from setup to advanced usage:

### 01 - Setup
**Topics**: Installation, Agent OS setup, environment configuration, authentication
Collaborator

Suggested change
**Topics**: Installation, Agent OS setup, environment configuration, authentication
**Topics**: Installation, AgentOS setup, environment configuration, authentication

Raises:
ValueError: If the source data exceeds the size limit.
"""
import google.genai.types as genai_types
Collaborator

Use a try/except to check whether the module is installed; if it is not, raise a PackageNotInstalledException.
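
A minimal sketch of what this could look like, under stated assumptions: `PackageNotInstalledException` is defined locally here as a stand-in (the project's own class and its constructor may differ), and `import_optional` is a hypothetical helper name.

```python
import importlib
from types import ModuleType


class PackageNotInstalledException(ImportError):
    """Stand-in for the exception named in the review; the real class may differ."""

    def __init__(self, package: str) -> None:
        super().__init__(
            f"Optional dependency '{package}' is not installed. "
            "Install it to use this feature."
        )
        self.package = package


def import_optional(module_name: str, package: str) -> ModuleType:
    """Import `module_name`, translating ImportError into a clearer error."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Chain the original error so the underlying cause stays visible.
        raise PackageNotInstalledException(package) from exc


# Demonstration with a standard-library module that is always present.
json_module = import_optional("json", "json")
```

For the import under review, the call would be along the lines of `genai_types = import_optional("google.genai.types", "google-genai")`, assuming `google-genai` is the distribution name users are expected to install.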

@@ -0,0 +1,55 @@
import logging
Collaborator

Rename the file to ai_element_locate_model.

"""

# Provider-specific configuration
DEFAULT_RESOLUTION: tuple[int, int] = (1280, 800)
Collaborator

If we have a DEFAULT_RESOLUTION, then the resolution should be changeable. Otherwise it's the CLAUDE_IMAGE_RESOLUTION.

"""

# Provider-specific configuration
DEFAULT_RESOLUTION: tuple[int, int] = (1280, 800)
Collaborator

Can we not use a namedtuple?

Resolution = namedtuple('Resolution', ['width', 'height'])

Comment on lines +82 to +83
screen_width = self.DEFAULT_RESOLUTION[0]
screen_height = self.DEFAULT_RESOLUTION[1]
Collaborator

With the namedtuple, we can set it here in a more developer-friendly way.
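
As a sketch of the suggestion, the snippet below uses the `Resolution` namedtuple proposed above and reads the fields by name instead of by index. The surrounding class is omitted, so `DEFAULT_RESOLUTION` is shown as a module-level constant purely for illustration.

```python
from collections import namedtuple

# Named fields replace the positional [0] / [1] lookups.
Resolution = namedtuple("Resolution", ["width", "height"])

DEFAULT_RESOLUTION = Resolution(width=1280, height=800)

screen_width = DEFAULT_RESOLUTION.width
screen_height = DEFAULT_RESOLUTION.height

# A namedtuple is still a tuple, so existing index-based code keeps working.
legacy_width = DEFAULT_RESOLUTION[0]
```

Because a namedtuple compares equal to a plain tuple with the same values, code that expects `tuple[int, int]` should be unaffected by the change.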

