This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
Coasty is a full-stack AI collaboration platform with computer automation capabilities. It features a Next.js frontend with a FastAPI Python backend that orchestrates multi-agent AI systems capable of browser automation, terminal operations, and desktop control through containerized virtual machines. A cross-platform Electron desktop app provides a lightweight overlay that executes AI agent commands directly on the user's local machine.
- Framework: Next.js 15 with App Router, TypeScript, Tailwind CSS
- State Management: Zustand stores for chat, models, user, and sessions
- Key Libraries:
  - Vercel AI SDK (`ai`) for streaming LLM responses
  - Radix UI for accessible components
  - Supabase for authentication and database
  - Stripe for billing/subscriptions
- Provider System: Multi-provider AI support (OpenAI, Anthropic, Azure, Google, Mistral, xAI, OpenRouter, Perplexity)
- Framework: FastAPI with async/await patterns
- Key Services:
  - `multi_agent_executor.py`: Orchestrates multi-agent task execution with browser, terminal, and desktop agents
  - `vm_control.py`: WebSocket-based VM control with persistent connections and auto-reconnection
  - `database.py`: Supabase integration for user data, chats, and billing
  - `agent_billing.py`: Tracks usage and credits for agent sessions
  - `search.py`: Google Custom Search API integration
- API Routes:
  - `/api/chat`, `/api/models`, `/api/search`, `/api/vm`, `/api/billing`, `/api/files`
- Architecture: Docker containers running Ubuntu 22.04 with XFCE desktop
- Agent Types:
- Browser Agent: Web automation using Chrome with remote debugging (search-first strategy)
- Terminal Agent: Command execution and file operations
- Desktop Agent: UI automation with screenshot analysis
- Communication: WebSocket protocol on port 8080 (8081 for localhost)
- Tools: Each agent has specialized tools (browser navigation, terminal commands, desktop controls)
A cross-platform desktop app built on Electron (v40.6.0) that runs as a floating overlay on the user's desktop, executing AI agent commands locally instead of in a remote VM.
- Build System: electron-vite + electron-builder, React 19 + Tailwind CSS renderer
- Version: 1.5.0 (`com.coasty.desktop`)
- Key Dependencies: `puppeteer-core` (browser automation), `ws` (WebSocket client)
- `index.ts`: App entry; creates frameless transparent window, system tray, registers IPC handlers, auto-updater
- `auth.ts`: Google OAuth via Supabase implicit flow; spins up a local HTTP server on a random port for the callback, extracts tokens from the URL fragment via an HTML redirect trick, auto-refreshes tokens 5min before expiry
- `ws-bridge.ts`: Persistent WebSocket to backend `/api/electron/ws`; sends system info as URL params, auth credentials in first message body (not URL), auto-reconnect with exponential backoff (max 15s), 30s heartbeat
- `window-manager.ts`: Three modes: `auth` (400x500 centered), `compact` (360x56 top-center pill), `expanded` (400x520 chat panel). Smooth animation (320ms quintic ease-out), always-on-top management, opacity control (0.15–1.0), hides before screenshots
- `local-executor.ts`: Command handler registry (50+ commands); maps backend command names to local handlers, normalizes params (filepath→path, find→old_text), auto-hides overlay during UI interactions
- `desktop-automation.ts`: Platform-specific mouse/keyboard/scroll/drag operations
- `browser-automation.ts`: Puppeteer-core controlling installed Chrome/Edge/Brave with isolated temp user-data-dir
- `terminal.ts`: Session-based shell execution (PowerShell on Windows, bash on Unix), 30s timeout
- `file-ops.ts`: File system CRUD (read, write, edit, append, delete, directory listing)
- `screenshot.ts`: Electron `desktopCapturer` API, resized to max 1280px, JPEG 70% quality
- `permissions.ts`: macOS-only; checks Screen Recording and Accessibility permissions
- `auto-updater.ts`: Generic update provider at `https://updates.coasty.ai`, checks every 4 hours
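The reconnect timing noted for `ws-bridge.ts` and the window animation noted for `window-manager.ts` can be illustrated with a few small numeric helpers. This is a minimal sketch, not the app's actual code: the function names, the 1000 ms base delay, and the doubling factor are assumptions; only the 15s cap, the 320ms duration, and the quintic ease-out curve come from the notes above.

```typescript
// Hypothetical helpers for illustration; names and the 1000 ms base are assumptions.

/** Exponential backoff: baseMs * 2^attempt, capped at maxMs (15s per the notes above). */
function reconnectDelayMs(attempt: number, baseMs = 1000, maxMs = 15000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

/** Quintic ease-out: moves quickly at first, then settles. t is progress in [0, 1]. */
function easeOutQuint(t: number): number {
  const clamped = Math.min(1, Math.max(0, t));
  return 1 - (1 - clamped) ** 5;
}

/** Interpolates one window dimension at elapsedMs into a 320 ms animation. */
function animatedValue(from: number, to: number, elapsedMs: number, durationMs = 320): number {
  return from + (to - from) * easeOutQuint(elapsedMs / durationMs);
}
```

For example, animating the window height from the compact pill (56) to the expanded panel (520) would call `animatedValue(56, 520, elapsedMs)` on each frame until `elapsedMs` reaches 320.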
Windows:
- Desktop Automation: PowerShell + user32.dll P/Invoke (`mouse_event`, `keybd_event`, `SendKeys`)
  - Click/double-click via `System.Windows.Forms.Cursor` + `mouse_event` DLL calls
  - Typing via `SendKeys::SendWait()` with special character escaping
  - Key combos via `keybd_event` with virtual key codes (supports Win key, modifiers)
  - Scroll via `MOUSEEVENTF_WHEEL` (120 units per notch)
  - Drag via cursor position + mousedown/mouseup sequence
- Browser Discovery: Checks Program Files, Program Files (x86), LocalAppData for Chrome/Edge/Brave; falls back to `where.exe` PATH search
- Terminal: Uses `powershell.exe -Command` for all shell execution
- Window Management: PowerShell + `user32.dll ShowWindow` for minimize/maximize/restore/close; `Microsoft.VisualBasic.Interaction.AppActivate` for window switching
- Window Z-Order Workaround: Transparent frameless windows lose always-on-top on Windows; the fix is a hide→apply bounds→show sequence on the auth→overlay transition, with retries at 600ms/1200ms/2000ms/3000ms
- Build: NSIS installer, allows custom install directory, desktop + start menu shortcuts
macOS:
- Desktop Automation: Swift scripts via CoreGraphics + osascript
  - Click/double-click via `CGEvent` with proper `mouseEventClickState` for double-clicks
  - Typing via `osascript 'tell application "System Events" to keystroke'`
  - Key combos via osascript with modifier mapping (ctrl→`control down`, cmd→`command down`) + CGKeyCode for special keys
  - Scroll via `CGEvent(scrollWheelEvent2Source:)` with line units
  - Drag via CGEvent sequence (leftMouseDown → leftMouseDragged → leftMouseUp)
- Browser Discovery: Checks `/Applications/` for Google Chrome, Microsoft Edge, Brave Browser `.app` bundles; falls back to `which`
- Terminal: Uses `/bin/bash -c` for shell execution
- Permissions: Checks Screen Recording (`getMediaAccessStatus` + actual capture fallback) and Accessibility (`isTrustedAccessibilityClient`); provides System Preferences deep links
- Window Behavior: `setVisibleOnAllWorkspaces` for macOS Spaces, dock icon set separately via `app.dock.setIcon`
- App Lifecycle: Does not quit on window close (`window-all-closed` ignored on darwin), `activate` event re-creates window
- Build: DMG + ZIP targets, hardened runtime, code signing, notarization, entitlements for accessibility/screen recording
Linux:
- Desktop Automation: `xdotool` for mouse/keyboard, `wmctrl` for window management
- Browser Discovery: Checks `/usr/bin/` for google-chrome, chromium-browser, microsoft-edge, brave-browser
- Build: AppImage target
- Zustand Stores: `auth-store` (user session), `connection-store` (WebSocket state), `chat-store` (messages, tool invocations, chat CRUD), `window-store` (mode sync)
- SSE Parser (`lib/sse-parser.ts`): Parses backend events: text (`0`), error (`3`), tool call (`9`), tool result (`a`), reasoning (`g`), finish (`d`)
- Key Components: `AuthScreen` (Google OAuth), `Overlay` (pill bar + expanded chat panel), `PermissionsGuard` (macOS permission prompts), `MessageList`, `ChatHistory`
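The event codes listed for the SSE parser suggest a line-oriented stream where each line carries a type code and a payload. The sketch below assumes a `<code>:<JSON payload>` line format; that format, the `StreamEvent` type, and the function name are assumptions for illustration and may not match what `lib/sse-parser.ts` actually does.

```typescript
// Event kinds keyed by the type codes listed above.
type StreamEventKind = "text" | "error" | "tool_call" | "tool_result" | "reasoning" | "finish";

interface StreamEvent {
  kind: StreamEventKind;
  payload: unknown;
}

const KIND_BY_CODE: Record<string, StreamEventKind | undefined> = {
  "0": "text",
  "3": "error",
  "9": "tool_call",
  a: "tool_result",
  g: "reasoning",
  d: "finish",
};

/**
 * Parses one stream line of the assumed form `<code>:<JSON>`.
 * Returns null for blank lines, unknown codes, or malformed JSON.
 */
function parseStreamLine(line: string): StreamEvent | null {
  const sep = line.indexOf(":");
  if (sep < 1) return null;
  const kind = KIND_BY_CODE[line.slice(0, sep)];
  if (!kind) return null;
  try {
    return { kind, payload: JSON.parse(line.slice(sep + 1)) };
  } catch {
    return null;
  }
}
```

Returning `null` instead of throwing lets a caller skip unrecognized lines without interrupting the stream.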
Exposes window.coasty TypeScript API: auth methods, bridge control, chat CRUD, credits, window control (mode, opacity), update control, macOS permissions, event listeners
- Auth: `auth:sign-in`, `auth:sign-out`, `auth:get-session`, `auth:get-token`
- WebSocket Bridge: `bridge:connect`, `bridge:disconnect`, `bridge:get-state`
- Chat CRUD: Create, list, get messages, update, delete; all call the FastAPI backend with a Bearer token
- Window: `window:set-mode`, `window:set-opacity`, `window:get-opacity`
- Permissions: `permissions:check`, `permissions:request-accessibility`, `permissions:open-screen-recording`
- Updates: `update:get-status`, `update:get-version`, `update:install`
- Machine ID: Deterministic UUID v5 hash of `electron-{user_id}-{hostname}-{username}-{platform}`
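A deterministic UUID v5 is a SHA-1 hash of a namespace UUID plus a name, with the version and variant bits overwritten. The sketch below shows that derivation for the machine ID string using Node's built-in `crypto` module. The namespace is an assumption (the RFC 4122 DNS namespace is used as a placeholder, since the app's real namespace is not documented here), and the function names are hypothetical.

```typescript
import { createHash } from "node:crypto";

// Assumption: the RFC 4122 DNS namespace stands in for the app's real namespace.
const ASSUMED_NAMESPACE = "6ba7b810-9dad-11d1-80b4-00c04fd430c8";

/** Computes a UUID v5 (SHA-1, name-based) per RFC 4122. */
function uuidV5(name: string, namespace: string = ASSUMED_NAMESPACE): string {
  const namespaceBytes = Buffer.from(namespace.replace(/-/g, ""), "hex");
  const digest = createHash("sha1").update(namespaceBytes).update(name, "utf8").digest();
  const bytes = Buffer.from(digest.subarray(0, 16)); // keep the first 128 bits
  bytes[6] = (bytes[6] & 0x0f) | 0x50; // set version 5
  bytes[8] = (bytes[8] & 0x3f) | 0x80; // set RFC 4122 variant
  const hex = bytes.toString("hex");
  return [
    hex.slice(0, 8),
    hex.slice(8, 12),
    hex.slice(12, 16),
    hex.slice(16, 20),
    hex.slice(20, 32),
  ].join("-");
}

/** Builds the machine ID from the fields named in the list above. */
function machineId(userId: string, hostname: string, username: string, platform: string): string {
  return uuidV5(`electron-${userId}-${hostname}-${username}-${platform}`);
}
```

Because the hash input is fixed for a given user, host, and platform, the same machine produces the same ID across restarts without storing anything on disk.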
- `/api/electron/ws`: WebSocket for receiving and executing commands
- `/api/chats/create`, `/api/chats/list`, `/api/chats/{id}/messages`: Chat persistence
- `/api/chat/`: SSE streaming chat responses
- `/api/billing/credits/balance`: Credit checks
- Task Planning: LLM decomposes user request into sequential subtasks
- Agent Assignment: Each subtask assigned to specialized agent (browser/terminal/desktop)
- Sequential Execution: Tasks execute in order (there is no dependency system)
- Context Passing: Previous task summaries passed to next task for context
- Streaming: All execution streams via Server-Sent Events to frontend
- Located in `lib/providers/` and `backend/app/providers/`
- Each provider implements streaming chat with tool calling
- Frontend providers handle model selection and API routing
- Backend providers execute tools and manage agent workflows
- Chat Store (`lib/chat-store/`): Manages conversations, messages, attachments
- Model Store (`lib/model-store/`): Available models and provider configurations
- User Store (`lib/user-store/`): User profile and authentication state
- VM Store (`lib/vm-store/`): Virtual machine sessions and connections
# Install dependencies
npm install
# Development server (with Turbopack)
npm run dev
# Production build
npm run build
# Start production server
npm start
# Type checking
npm run type-check
# Linting
npm run lint

# Navigate to backend directory
cd backend
# Create virtual environment (first time)
python -m venv venv
# Activate virtual environment
# Windows:
venv\Scripts\activate
# Linux/Mac:
source venv/bin/activate
# Install dependencies
pip install -r requirements.txt
# Run development server (from backend directory)
python main.py
# Or use the helper script
# Windows:
.\run_backend.bat
# Linux/Mac:
./run_backend.sh

# Navigate to electron directory
cd electron
# Install dependencies
npm install
# Development mode (with hot reload)
npm run dev
# Build (compile TypeScript)
npm run build
# Package for current platform
npm run package
# Package for specific platform
npm run package:win # Windows NSIS installer
npm run package:mac # macOS DMG + ZIP
npm run package:linux # Linux AppImage

# Build and start all services
docker-compose up --build
# Start services in detached mode
docker-compose up -d
# Stop services
docker-compose down
# View logs
docker-compose logs -f
# AI Desktop container (separate compose file)
docker-compose -f docker-compose.ai-desktop.yml up --build

# Backend tests
cd backend
pytest
# Run specific test file
pytest tests/test_specific.py
# Run with coverage
pytest --cov=app tests/

- `NEXT_PUBLIC_SUPABASE_URL`: Supabase project URL
- `NEXT_PUBLIC_SUPABASE_ANON_KEY`: Supabase anonymous key
- `SUPABASE_SERVICE_ROLE`: Supabase service role key (server-side)
- `CSRF_SECRET`: CSRF protection secret (required)
- `ENCRYPTION_KEY`: For encrypting user API keys (required for BYOK)
- `PYTHON_BACKEND_URL`: Backend API URL (default: http://0.0.0.0:8001)
- `NEXT_PUBLIC_BACKEND_URL`: Public backend URL (default: http://localhost:8001)
- Azure credentials for VM provisioning (AZURE_*)
- Stripe keys for billing (STRIPE_*)
- Google Search API keys (GOOGLE_SEARCH_*)
- `DEBUG`: Enable debug mode (true/false)
- `CORS_ORIGINS`: Allowed CORS origins (comma-separated)
- `SUPABASE_URL`, `SUPABASE_ANON_KEY`, `SUPABASE_SERVICE_ROLE`: Supabase config
- `CSRF_SECRET`, `ENCRYPTION_KEY`: Security keys (must match frontend)
- `GOOGLE_SEARCH_KEY`, `GOOGLE_SEARCH_CX`: Google Custom Search API
- `COASTY_BACKEND_URL`: Backend API endpoint (default: `http://localhost:8001`)
- `NEXT_PUBLIC_SUPABASE_URL`: Supabase auth/DB URL (shared with frontend)
- `NEXT_PUBLIC_SUPABASE_ANON_KEY`: Supabase anon key (shared with frontend)
- These are injected at build time via the electron-vite `define` config
See .env.example and backend/.env.example for complete configuration templates.
- `app/`: Next.js app directory with routes and layouts
  - `c/[chatId]/`: Individual chat pages
  - `api/`: API route handlers (Next.js API routes)
  - `auth/`, `billing/`, `account/`: Feature-specific pages
- `components/`: Reusable React components
  - `ui/`: shadcn/ui components (Radix UI based)
  - `common/`: Shared components (chat interface, message display)
  - `prompt-kit/`: Prompt-related components
- `lib/`: Business logic and utilities
  - `providers/`: AI provider implementations
  - `chat-store/`, `model-store/`, `user-store/`: Zustand state stores
  - `supabase/`: Database client and queries
  - `services/`: Service layer (API calls, utilities)

- `backend/app/`
  - `api/routes/`: FastAPI route handlers
  - `services/`: Core business logic
    - `multi_agent_executor.py`: Multi-agent orchestration
    - `vm_control.py`: VM WebSocket management
    - `database.py`: Supabase operations
    - `agent_billing.py`: Usage tracking
  - `core/`: Configuration, middleware, logging
  - `models/`: Pydantic data models
  - `providers/`: AI provider integrations
  - `utils/`: Utility functions

- `electron/`
  - `src/main/`: Main process (app lifecycle, IPC, WebSocket bridge, automation modules)
    - `index.ts`: App entry, window creation, tray, IPC registration
    - `auth.ts`: Supabase Google OAuth with local HTTP callback server
    - `ws-bridge.ts`: WebSocket client to backend with auto-reconnect
    - `window-manager.ts`: Window modes, animation, opacity, screenshot hiding
    - `local-executor.ts`: Command dispatch registry (50+ commands)
    - `desktop-automation.ts`: Platform-specific mouse/keyboard (Win32/macOS/Linux)
    - `browser-automation.ts`: Puppeteer-core browser control
    - `terminal.ts`: Shell execution (PowerShell/bash)
    - `file-ops.ts`: File system operations
    - `screenshot.ts`: Desktop capture via Electron API
    - `permissions.ts`: macOS permission checks
    - `auto-updater.ts`: Auto-update lifecycle
  - `src/preload/`: Context bridge exposing `window.coasty` API
  - `src/renderer/`: React 19 UI
    - `stores/`: Zustand stores (auth, connection, chat, window)
    - `components/`: AuthScreen, Overlay, MessageList, PermissionsGuard
    - `hooks/`: `useChatSubmit` for chat message flow
    - `lib/`: API client, SSE parser, utilities
  - `build/`: Icons (ico/icns/png), macOS entitlements plist
  - `electron-builder.yml`: Build config (NSIS, DMG, AppImage)
  - `electron.vite.config.ts`: Vite config with env injection

- `docker/ai-desktop/`: Ubuntu desktop container with AI agents
  - Includes Chrome, Node.js, Python, automation tools
  - WebSocket server for agent communication
  - VNC server for remote desktop access
- Frontend: Create provider in `lib/providers/your-provider.ts`
  - Implement `streamChat()` method with tool calling support
  - Add to `lib/providers/index.ts`
- Backend: Add provider support in `backend/app/providers/`
  - Configure API keys in environment
  - Update model lists in `models.py`
- Add agent type to `AgentType` enum in `multi_agent_executor.py`
- Create agent prompt in `_get_*_agent_prompt()` method
- Define agent tools in `_get_*_tools()` method
- Update task planner to recognize new agent type
- Create tool function in `backend/app/api/routes/chat_vm_tools.py`
- Define tool schema (name, description, parameters)
- Add tool to appropriate agent's tool list in `multi_agent_executor.py`
- Implement tool execution in VM agent server (if needed)
- Create handler function in the appropriate module (`desktop-automation.ts`, `browser-automation.ts`, `terminal.ts`, `file-ops.ts`, or a new module)
- Register the command name → handler mapping in `local-executor.ts` `registerHandlers()`
- If the command involves UI interaction (clicks, typing), wrap with `this.withOverlayHidden()`
- Add parameter normalization in `normalizeParams()` if backend sends different param names
- Backend sends commands via the WebSocket bridge as `{ type: 'command', data: { command, parameters } }`
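The steps above describe a registry that maps command names to handlers, normalizes parameter names, and hides the overlay around UI interactions. The following is a minimal sketch of that pattern under stated assumptions: the class name `ExecutorSketch`, the `register()` signature, and the injected `overlay` object are hypothetical, and the real `local-executor.ts` may be structured differently. Only the message shape and the two aliases (filepath→path, find→old_text) come from this document.

```typescript
type Params = Record<string, unknown>;
type Handler = (params: Params) => Promise<unknown>;

interface CommandMessage {
  type: "command";
  data: { command: string; parameters: Params };
}

interface OverlayControl {
  hide(): void;
  show(): void;
}

// Aliases taken from this document: filepath→path, find→old_text.
const PARAM_ALIASES: Record<string, string | undefined> = { filepath: "path", find: "old_text" };

/** Renames aliased parameter keys and leaves all other keys untouched. */
function normalizeParams(params: Params): Params {
  const out: Params = {};
  for (const [key, value] of Object.entries(params)) {
    out[PARAM_ALIASES[key] ?? key] = value;
  }
  return out;
}

class ExecutorSketch {
  private handlers = new Map<string, Handler>();
  private overlay: OverlayControl;

  constructor(overlay: OverlayControl) {
    this.overlay = overlay;
  }

  /** Registers a handler; UI-interacting commands run with the overlay hidden. */
  register(command: string, handler: Handler, hidesOverlay = false): void {
    const wrapped: Handler = hidesOverlay
      ? (params) => this.withOverlayHidden(() => handler(params))
      : handler;
    this.handlers.set(command, wrapped);
  }

  /** Hides the overlay, runs the action, and always restores the overlay. */
  private async withOverlayHidden<T>(action: () => Promise<T>): Promise<T> {
    this.overlay.hide();
    try {
      return await action();
    } finally {
      this.overlay.show();
    }
  }

  /** Dispatches one bridge message to its registered handler. */
  async execute(message: CommandMessage): Promise<unknown> {
    const handler = this.handlers.get(message.data.command);
    if (!handler) throw new Error(`Unknown command: ${message.data.command}`);
    return handler(normalizeParams(message.data.parameters));
  }
}
```

The `finally` block matters here: if a handler throws, the overlay is still restored rather than staying hidden.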
- Add the function in `desktop-automation.ts` with `process.platform` branching
- Windows: Use `runPowershell()` with user32.dll P/Invoke via `Add-Type @"..."@`
- macOS: Use `runSwift()` for CoreGraphics or `runBash()` with `osascript` for System Events
- Linux: Use `runBash()` with `xdotool` or `wmctrl`
- Register in `local-executor.ts` and wrap with `withOverlayHidden()` if it interacts with the desktop
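One way to keep the platform branching in a single place is a small lookup keyed by platform name. The sketch below is illustrative only: `byPlatform` and `describeScrollBackend` are hypothetical names, and the returned strings merely describe which runner each platform would use rather than invoking `runPowershell()`, `runSwift()`, or `runBash()`.

```typescript
type SupportedPlatform = "win32" | "darwin" | "linux";

/** Picks and runs the implementation for the given platform, or throws if unsupported. */
function byPlatform<T>(impls: Record<SupportedPlatform, () => T>, platform: string): T {
  if (platform === "win32" || platform === "darwin" || platform === "linux") {
    return impls[platform]();
  }
  throw new Error(`Unsupported platform: ${platform}`);
}

/** Describes (does not execute) the scroll backend per platform, per the notes in this document. */
function describeScrollBackend(platform: string): string {
  return byPlatform(
    {
      win32: () => "runPowershell: user32.dll mouse_event with MOUSEEVENTF_WHEEL",
      darwin: () => "runSwift: CGEvent scroll wheel event",
      linux: () => "runBash: xdotool",
    },
    platform
  );
}
```

In the main process the `platform` argument would be `process.platform`; passing it in explicitly keeps the function testable on any host.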
- VM connections are persistent with auto-reconnection
- Heartbeat mechanism prevents stale connections
- Connection reuse minimizes latency
- Password authentication for VNC access
- Tool responses are truncated to prevent context overflow (5000 chars)
- `frontendScreenshot` field is preserved and not sent to model
- Screenshots are compressed (JPEG, 1280x720 max) before transmission
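The truncation and screenshot handling above can be sketched as a single preparation step applied to each tool result. This is a hedged illustration, not the backend's implementation (which is Python): the function name, the return shape, and the choice to serialize with `JSON.stringify` are assumptions. Only the 5000-character limit and the `frontendScreenshot` field name come from this document.

```typescript
const MAX_TOOL_RESPONSE_CHARS = 5000;

interface PreparedToolResult {
  /** Serialized result for the model, without the screenshot, at most 5000 chars. */
  forModel: string;
  /** Kept for the frontend only; never included in forModel. */
  frontendScreenshot: unknown;
  truncated: boolean;
}

/** Splits off the screenshot and truncates the remainder for the model context. */
function prepareToolResult(result: Record<string, unknown>): PreparedToolResult {
  const { frontendScreenshot, ...rest } = result;
  const serialized = JSON.stringify(rest);
  const truncated = serialized.length > MAX_TOOL_RESPONSE_CHARS;
  return {
    forModel: truncated ? serialized.slice(0, MAX_TOOL_RESPONSE_CHARS) : serialized,
    frontendScreenshot,
    truncated,
  };
}
```

Removing the screenshot before measuring length means a large image payload never counts against, or leaks into, the model's context.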
- All AI responses stream via Server-Sent Events (SSE)
- Tool calls and results stream separately from text
- Frontend accumulates chunks and updates UI reactively
- `finish` event signals completion with full content
- Tasks execute sequentially (no parallel execution)
- Each task receives context from all previous completed tasks
- Tasks can request user input via `[NEED_USER_INPUT]` markers
- Execution stops if agent encounters critical blocker
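The execution rules above (strictly sequential tasks, summaries of earlier tasks passed forward, and a pause on `[NEED_USER_INPUT]`) can be sketched as a simple loop. The backend implements this in Python in `multi_agent_executor.py`; the TypeScript below is only an illustration, and every type and function name in it is hypothetical. A thrown error from a runner stands in for a critical blocker.

```typescript
type AgentType = "browser" | "terminal" | "desktop";

interface Subtask {
  description: string;
  agent: AgentType;
}

interface TaskResult {
  description: string;
  summary: string;
}

interface PlanOutcome {
  results: TaskResult[];
  awaitingUserInput: boolean;
}

/** A runner executes one subtask given the summaries of all earlier tasks. */
type AgentRunner = (task: Subtask, priorSummaries: string[]) => Promise<string>;

const NEED_USER_INPUT = "[NEED_USER_INPUT]";

/** Runs subtasks one at a time, in order, passing prior summaries as context. */
async function runPlan(plan: Subtask[], runners: Record<AgentType, AgentRunner>): Promise<PlanOutcome> {
  const results: TaskResult[] = [];
  for (const task of plan) {
    const priorSummaries = results.map((r) => r.summary);
    // A rejected promise (critical blocker) propagates and stops the plan.
    const summary = await runners[task.agent](task, priorSummaries);
    results.push({ description: task.description, summary });
    if (summary.includes(NEED_USER_INPUT)) {
      return { results, awaitingUserInput: true };
    }
  }
  return { results, awaitingUserInput: false };
}
```

Because each iteration awaits the previous one, no two agents run concurrently, which matches the "no parallel execution" rule.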
- Search-First: Always use Google Search before opening browser
- Minimal Browsing: Only open browser when action is required (forms, clicks, purchases)
- State Validation: Use `browser_state()` to verify actions
- Tab Management: Reuse tabs instead of excessive navigation
- CSRF protection on all state-changing operations
- API keys encrypted with `ENCRYPTION_KEY` (BYOK feature)
- Rate limiting on backend endpoints
- Supabase Row Level Security (RLS) for data access
- No credentials stored in VM environments
- Electron: Context isolation enabled, node integration disabled, sandbox=false (required for native modules)
- Electron: Auth tokens sent in WebSocket message body, not URL params (avoids proxy/CDN logging)
- Electron: Window title passed via env var (`_COASTY_WIN_TITLE`) to avoid shell injection in window switching
- Design API endpoints in `backend/app/api/routes/`
- Implement business logic in `backend/app/services/`
- Create frontend components in `components/`
- Add state management in appropriate store (`lib/*-store/`)
- Wire up API calls in `lib/services/` or route handlers
- Check WebSocket bridge connection in console; the `[WS Bridge]` log prefix shows connect/disconnect/auth events
- Verify backend `/api/electron/ws` endpoint is running and accepting connections
- On Windows: if overlay loses always-on-top, check `window-manager.ts` z-order workaround logic
- On macOS: if automation fails, check Screen Recording + Accessibility permissions via `permissions.ts`
- Browser automation: ensure Chrome/Edge/Brave is installed; Puppeteer uses isolated temp profile to avoid locks
- If clicks/typing don't work: verify overlay is hiding before desktop actions (`withOverlayHidden` wrapper)
- Auth issues: check local HTTP callback server port binding, Supabase OAuth redirect URL config
- Check WebSocket connection status in `vm_control.py` logs
- Verify agent tools are registered in `multi_agent_executor.py`
- Test tool execution with reduced context
- Check container logs: `docker logs <container-id>`
- Verify VNC connection: `ws://localhost:8081` (localhost) or `ws://<ip>:8080`
- Frontend: Use React.memo for expensive components, lazy load routes
- Backend: Enable caching in `cache.py`, optimize database queries
- Streaming: Batch small chunks, compress screenshots
- VM: Reuse connections, minimize tool calls, truncate responses
- Supabase migrations handled via Supabase Dashboard or CLI
- Schema changes require updating Supabase types in `types/supabase.ts`
- Run `supabase gen types typescript` to regenerate types
- Frontend: 3000 (Next.js dev server)
- Backend: 8001 (FastAPI server)
- VM Agent WebSocket: 8080 (remote), 8081 (localhost)
- VNC: 5900 (desktop access)
- Supabase: Hosted service (URLs in .env)
- Frontend uses React Server Components where applicable for better performance
- Backend runs on uvicorn with auto-reload in development
- VM containers are ephemeral and should be treated as stateless
- Billing system tracks agent usage by session duration
- Multi-model support allows users to switch providers mid-conversation
- Screenshot compression is critical for performance (JPEG, 70% quality)
- Electron app launches at login (packaged builds), runs as always-on-top overlay on all virtual desktops
- Electron auto-updates via generic provider at `https://updates.coasty.ai` (checks every 4 hours)
- Electron browser automation uses `puppeteer-core` with temp user-data-dir to avoid Chrome profile locks
- Electron overlay auto-hides during desktop automation to prevent interfering with clicks/screenshots