Remove modelRouter and add model_providers concept #224
base: main
…del store BREAKING CHANGE: Removed ModelRouter and ModelRegistry classes. Users must now use direct model injection.
…raises an Error when executing from cache)
…nd AskUiInferenceLocateApi
docs/01_Setup.md
Outdated
| **Problem**: Error connecting to Agent OS | ||
|
|
||
| **Solutions**: | ||
| 1. Check if Agent OS is running (look for the system tray icon) |
The agent OS doesn’t have a tray icon.
docs/01_Setup.md
Outdated
|
|
||
| **Solutions**: | ||
| 1. Check if Agent OS is running (look for the system tray icon) | ||
| 2. Restart Agent OS from your applications menu |
The agent OS is not listed in the application menu.
docs/03_Using-Models-and-BYOM.md
Outdated
| custom_settings = ActSettings( | ||
| messages=MessageSettings( | ||
| max_tokens=8192, | ||
| temperature=0.5, | ||
| betas=["computer-use-2025-01-24"], | ||
| ) | ||
| ) | ||
|
|
Which system prompt is used in this example?
Could you please remove the betas?
docs/01_Setup.md
Outdated
|
|
||
| ## Python Package Installation | ||
|
|
||
| AskUI Vision Agent requires Python 3.10 or higher. |
requires-python = ">=3.10,<3.14"
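A minimal sketch of what this version constraint means in practice, assuming only the bounds quoted in the comment above. The helper name `is_supported_python` is hypothetical and not part of the SDK.

```python
import sys

# Bounds taken from the review comment: requires-python = ">=3.10,<3.14"
MIN_VERSION = (3, 10)  # inclusive lower bound
MAX_VERSION = (3, 14)  # exclusive upper bound


def is_supported_python(version):
    """Return True if a (major, minor) tuple satisfies >=3.10,<3.14."""
    return MIN_VERSION <= tuple(version) < MAX_VERSION


# Check the interpreter that is running this snippet.
current = sys.version_info[:2]
print(f"Python {current[0]}.{current[1]} supported: {is_supported_python(current)}")
```

So "Python 3.10 or higher" in the docs would be more accurate as "Python 3.10 to 3.13".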
docs/01_Setup.md
Outdated
| ```bash | ||
| pip install askui[anthropic] # Anthropic Claude support | ||
| pip install askui[openrouter] # OpenRouter support | ||
| pip install askui[documents] # PDF, Excel, Word support | ||
| ``` |
The SDK doesn't support these targets.
| super().__init__(self.message) | ||
|
|
||
|
|
||
| class AnthropicModelSettings(BaseSettings): |
Currently unused.
| self, | ||
| locator: str | Locator, | ||
| image: ImageSource, | ||
| locate_settings: LocateSettings, # noqa: ARG002 |
Are the Locate settings only needed for LLM-based locators?
The idea was to have a general settings object for all locate commands here.
| max_tokens: int = 4096 | ||
| temperature: float = Field(default=0.5, ge=0.0, le=1.0) | ||
| system_prompt: GetSystemPrompt | None = None | ||
| timeout: float | None = None | ||
|
|
||
|
|
||
| class LocateSettings(BaseModel): | ||
| """Settings for LocateModel operations (UI element location).""" | ||
|
|
||
| model_config = ConfigDict(arbitrary_types_allowed=True) | ||
|
|
||
| query_type: str | None = None | ||
| confidence_threshold: float = Field(default=0.8, ge=0.0, le=1.0) | ||
| max_detections: int = 10 | ||
| timeout: float | None = None | ||
| system_prompt: LocateSystemPrompt | None = None |
Currently, only the system prompt is being used.
| confidence_threshold: float = Field(default=0.8, ge=0.0, le=1.0) | ||
| max_detections: int = 10 | ||
| timeout: float | None = None | ||
| system_prompt: LocateSystemPrompt | None = None |
The Locate system prompt should not be configurable because the expected return is currently hard-coded. Changing the system prompt would cause the Locate code to fail.
| timeout: float | None = None | ||
|
|
||
|
|
||
| class LocateSettings(BaseModel): |
What do you think about removing the Locate and Get settings?
I think we should generally discuss how "configurable" get and locate should be. This also includes whether we want to support BYOM for these commands or whether that should only be possible for act.
…ttings Introduces VlmProvider, ImageQAProvider, and DetectionProvider slots on AgentSettings. GetTool/LocateTool are now ToolWithAgentOS and available in the act() loop. Renames VisionAgent→ComputerAgent and AndroidVisionAgent→AndroidAgent. Removes model_store entirely.
programminx-askui
left a comment
Hello @philipph-askui,
I started to review the code. General remarks:
- The changes are too big to review, so we need to test this heavily.
- New files were created instead of renaming or moving existing ones, so we can't tell which code was already reviewed and which code is new.
Overall it is going in the right direction.
I've reviewed only a few files so far, so you can start working on it. A deeper review is still outstanding.
| - Brittle selectors that break when UI changes | ||
| - Separate tools for different platforms (web, desktop, mobile) | ||
| - Manual scripting of every action step | ||
| - Constant maintenance as applications evolve |
| - Constant maintenance as applications evolve | |
| - Constant maintenance as applications evolve | |
| - Random application behavior | |
| - External issues like network, rights or installation issues |
| **1. Programmatic Control** | ||
| ```python | ||
| from askui import VisionAgent | ||
|
|
||
| with VisionAgent() as agent: | ||
| agent.click("Submit button") | ||
| agent.type("[email protected]") | ||
| result = agent.get("What's the current page title?") | ||
| ``` | ||
|
|
||
| Direct, single-step commands for precise UI control. Like traditional automation, but powered by vision models that understand what elements look like, not just their DOM structure. | ||
|
|
||
| **2. Agentic Control (Goal-based)** | ||
| ```python | ||
| with VisionAgent() as agent: | ||
| agent.act( | ||
| "Search for flights from New York to London, " | ||
| "filter by direct flights, and show me the cheapest option" | ||
| ) | ||
| ``` |
I would go fully agentic and try to avoid the "Programmatic Control" example, or at least show the agentic example first.
| **1. Programmatic Control** | |
| ```python | |
| from askui import VisionAgent | |
| with VisionAgent() as agent: | |
| agent.click("Submit button") | |
| agent.type("[email protected]") | |
| result = agent.get("What's the current page title?") | |
| ``` | |
| Direct, single-step commands for precise UI control. Like traditional automation, but powered by vision models that understand what elements look like, not just their DOM structure. | |
| **2. Agentic Control (Goal-based)** | |
| ```python | |
| with VisionAgent() as agent: | |
| agent.act( | |
| "Search for flights from New York to London, " | |
| "filter by direct flights, and show me the cheapest option" | |
| ) | |
| ``` | |
| **1. Agentic Control (Goal-based)** | |
| ```python | |
| with VisionAgent() as agent: | |
| agent.act( | |
| "Search for flights from New York to London, " | |
| "filter by direct flights, and show me the cheapest option" | |
| ) |
|
|
||
| ### Key Capabilities | ||
|
|
||
| - **Multi-Platform**: Windows, MacOS, Linux, Android |
| - **Multi-Platform**: Windows, MacOS, Linux, Android | |
| - **Multi-Platform**: Windows, MacOS, Linux, Android, Citrix & KVM |
|
|
||
| Understand the model system, how to choose models for different tasks, and how to integrate custom models or third-party providers. | ||
|
|
||
| ### 04 - Caching |
Do we want to rename "caching"? In reality it's a Token Efficient Rerun.
| This documentation is organized to take you from setup to advanced usage: | ||
|
|
||
| ### 01 - Setup | ||
| **Topics**: Installation, Agent OS setup, environment configuration, authentication |
| **Topics**: Installation, Agent OS setup, environment configuration, authentication | |
| **Topics**: Installation, AgentOS setup, environment configuration, authentication |
| Raises: | ||
| ValueError: If the source data exceeds the size limit. | ||
| """ | ||
| import google.genai.types as genai_types |
Use a try/except to check whether the module is installed; if it is not, raise a PackageNotInstalledException.
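A rough sketch of the guarded import suggested here. The exception class below is a local stand-in that only borrows the name from the comment; its constructor and the `import_optional` helper are assumptions for illustration, not the SDK's actual API.

```python
import importlib


class PackageNotInstalledException(ImportError):
    """Hypothetical stand-in for the exception named in the review comment."""

    def __init__(self, package, hint=""):
        self.package = package
        super().__init__(f"Package '{package}' is not installed. {hint}".strip())


def import_optional(module_name, hint=""):
    """Import an optional dependency or raise a descriptive error."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Chain the original error so the root cause stays visible.
        raise PackageNotInstalledException(module_name, hint) from exc


# Possible use at the call site from the diff (hypothetical):
# genai_types = import_optional("google.genai.types")
```

Keeping the import inside the function body, as the diff already does, means the check only runs when the optional code path is actually used.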
| @@ -0,0 +1,55 @@ | |||
| import logging | |||
Rename the file to ai_element_locate_model.
| """ | ||
|
|
||
| # Provider-specific configuration | ||
| DEFAULT_RESOLUTION: tuple[int, int] = (1280, 800) |
If we have a DEFAULT_RESOLUTION, then the resolution should be changeable; otherwise it's the CLAUDE_IMAGE_RESOLUTION.
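One way to read this comment is that a "default" implies an override. A minimal sketch under that assumption follows; the class name `ScreenScaler` and its constructor are hypothetical, and only `DEFAULT_RESOLUTION` and its value come from the diff.

```python
from __future__ import annotations


class ScreenScaler:
    """Hypothetical holder class; only DEFAULT_RESOLUTION is from the diff."""

    # Provider-specific default, used when the caller does not override it.
    DEFAULT_RESOLUTION: tuple[int, int] = (1280, 800)

    def __init__(self, resolution: tuple[int, int] | None = None) -> None:
        # Fall back to the class default only when no override is given.
        if resolution is None:
            resolution = self.DEFAULT_RESOLUTION
        self.resolution = resolution


default_scaler = ScreenScaler()
custom_scaler = ScreenScaler(resolution=(1920, 1080))
```

If no override is intended, a fixed constant name such as the CLAUDE_IMAGE_RESOLUTION mentioned in the comment would describe the value more honestly.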
| """ | ||
|
|
||
| # Provider-specific configuration | ||
| DEFAULT_RESOLUTION: tuple[int, int] = (1280, 800) |
Can't we use a namedtuple here?
Resolution = namedtuple('Resolution', ['width', 'height'])
| screen_width = self.DEFAULT_RESOLUTION[0] | ||
| screen_height = self.DEFAULT_RESOLUTION[1] |
With the namedtuple, we can set it here in a more developer-friendly way.
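The namedtuple suggestion could look roughly like this. It is a sketch built from the `Resolution` definition proposed in the review and the values in the diff; where the definition would live in the codebase is not specified in the source.

```python
from collections import namedtuple

# Definition as proposed in the review comment.
Resolution = namedtuple("Resolution", ["width", "height"])

DEFAULT_RESOLUTION = Resolution(width=1280, height=800)

# Named access replaces the index lookups [0] and [1] from the diff.
screen_width = DEFAULT_RESOLUTION.width
screen_height = DEFAULT_RESOLUTION.height

# A namedtuple is still a tuple, so index-based code and unpacking keep working.
legacy_width, legacy_height = DEFAULT_RESOLUTION
```

Because the value remains a regular tuple, existing callers that index or unpack it would not need to change.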
[Edited] PR Description
To get a quick idea of the new API, I added a runnable example under examples/model_providers.py.
Note: You will need an Anthropic API key to run it.
This PR:
Summary
Key Changes
Breaking Changes