CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

Coasty is a full-stack AI collaboration platform with computer automation capabilities. It features a Next.js frontend with a FastAPI Python backend that orchestrates multi-agent AI systems capable of browser automation, terminal operations, and desktop control through containerized virtual machines. A cross-platform Electron desktop app provides a lightweight overlay that executes AI agent commands directly on the user's local machine.

Architecture

Frontend (Next.js 15 + React 19)

Framework: Next.js 15 with App Router, TypeScript, Tailwind CSS
State Management: Zustand stores for chat, models, user, and sessions
Key Libraries:
- Vercel AI SDK (ai) for streaming LLM responses
- Radix UI for accessible components
- Supabase for authentication and database
- Stripe for billing/subscriptions
Provider System: Multi-provider AI support (OpenAI, Anthropic, Azure, Google, Mistral, xAI, OpenRouter, Perplexity)

Backend (Python FastAPI)

Framework: FastAPI with async/await patterns
Key Services:
- multi_agent_executor.py: Orchestrates multi-agent task execution with browser, terminal, and desktop agents
- vm_control.py: WebSocket-based VM control with persistent connections and auto-reconnection
- database.py: Supabase integration for user data, chats, and billing
- agent_billing.py: Tracks usage and credits for agent sessions
- search.py: Google Custom Search API integration
API Routes: /api/chat, /api/models, /api/search, /api/vm, /api/billing, /api/files

VM Agent System

Architecture: Docker containers running Ubuntu 22.04 with XFCE desktop
Agent Types:
- Browser Agent: Web automation using Chrome with remote debugging (search-first strategy)
- Terminal Agent: Command execution and file operations
- Desktop Agent: UI automation with screenshot analysis
Communication: WebSocket protocol on port 8080 (8081 for localhost)
Tools: Each agent has specialized tools (browser navigation, terminal commands, desktop controls)

Electron Desktop App (`electron/`)

A cross-platform app built on Electron v40.6.0 that runs as a floating overlay on the user's desktop, executing AI agent commands locally instead of in a remote VM.

Build System: electron-vite + electron-builder, React 19 + Tailwind CSS renderer
App Version: 1.5.0 (app ID: com.coasty.desktop)
Key Dependencies: puppeteer-core (browser automation), ws (WebSocket client)

Main Process (`electron/src/main/`)

index.ts: App entry — creates frameless transparent window, system tray, registers IPC handlers, auto-updater
auth.ts: Google OAuth via Supabase implicit flow — spins up local HTTP server on random port for callback, extracts tokens from URL fragment via HTML redirect trick, auto-refreshes tokens 5min before expiry
ws-bridge.ts: Persistent WebSocket to backend /api/electron/ws — sends system info as URL params, auth credentials in first message body (not URL), auto-reconnect with exponential backoff (max 15s), 30s heartbeat
window-manager.ts: Three modes — auth (400x500 centered), compact (360x56 top-center pill), expanded (400x520 chat panel). Smooth animation (320ms quintic ease-out), always-on-top management, opacity control (0.15–1.0), hides before screenshots
local-executor.ts: Command handler registry (50+ commands) — maps backend command names to local handlers, normalizes params (filepath→path, find→old_text), auto-hides overlay during UI interactions
desktop-automation.ts: Platform-specific mouse/keyboard/scroll/drag operations
browser-automation.ts: Puppeteer-core controlling installed Chrome/Edge/Brave with isolated temp user-data-dir
terminal.ts: Session-based shell execution (PowerShell on Windows, bash on Unix), 30s timeout
file-ops.ts: File system CRUD (read, write, edit, append, delete, directory listing)
screenshot.ts: Electron desktopCapturer API, resized to max 1280px, JPEG 70% quality
permissions.ts: macOS-only — checks Screen Recording and Accessibility permissions
auto-updater.ts: Generic update provider at https://updates.coasty.ai, checks every 4 hours
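
The reconnect policy described for ws-bridge.ts (exponential backoff capped at 15s) can be illustrated with a short sketch. It is written in Python for consistency with the other examples here, while the real module is TypeScript. The 1-second base delay and the doubling factor are assumptions; only the 15-second cap comes from this document.

```python
def reconnect_delay(attempt: int, base: float = 1.0, cap: float = 15.0) -> float:
    """Seconds to wait before reconnect attempt `attempt` (0-based).

    Assumed policy: double the delay on every failed attempt, never exceeding `cap`.
    """
    # Clamp the exponent so very large attempt counts cannot overflow a float.
    exponent = min(max(attempt, 0), 16)
    return min(cap, base * (2 ** exponent))


if __name__ == "__main__":
    for n in range(6):
        print(f"attempt {n}: wait {reconnect_delay(n)}s")
```

A real client would typically reset the attempt counter to zero after a successful connection.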

Platform-Specific Implementations

Windows:

Desktop Automation: PowerShell + user32.dll P/Invoke (mouse_event, keybd_event, SendKeys)
- Click/double-click via System.Windows.Forms.Cursor + mouse_event DLL calls
- Typing via SendKeys::SendWait() with special character escaping
- Key combos via keybd_event with virtual key codes (supports Win key, modifiers)
- Scroll via MOUSEEVENTF_WHEEL (120 units per notch)
- Drag via cursor position + mousedown/mouseup sequence
Browser Discovery: Checks Program Files, Program Files (x86), LocalAppData for Chrome/Edge/Brave; falls back to where.exe PATH search
Terminal: Uses powershell.exe -Command for all shell execution
Window Management: PowerShell + user32.dll ShowWindow for minimize/maximize/restore/close; Microsoft.VisualBasic.Interaction.AppActivate for window switching
Window Z-Order Workaround: Transparent frameless windows lose always-on-top on Windows; fix is hide→apply bounds→show sequence on auth→overlay transition, with retries at 600ms/1200ms/2000ms/3000ms
Build: NSIS installer, allows custom install directory, desktop + start menu shortcuts

macOS:

Desktop Automation: Swift scripts via CoreGraphics + osascript
- Click/double-click via CGEvent with proper mouseEventClickState for double-clicks
- Typing via osascript 'tell application "System Events" to keystroke'
- Key combos via osascript with modifier mapping (ctrl→control down, cmd→command down) + CGKeyCode for special keys
- Scroll via CGEvent(scrollWheelEvent2Source:) with line units
- Drag via CGEvent sequence (leftMouseDown → leftMouseDragged → leftMouseUp)
Browser Discovery: Checks /Applications/ for Google Chrome, Microsoft Edge, Brave Browser .app bundles; falls back to which
Terminal: Uses /bin/bash -c for shell execution
Permissions: Checks Screen Recording (getMediaAccessStatus + actual capture fallback) and Accessibility (isTrustedAccessibilityClient); provides System Preferences deep links
Window Behavior: setVisibleOnAllWorkspaces for macOS Spaces, dock icon set separately via app.dock.setIcon
App Lifecycle: Does not quit on window close (window-all-closed ignored on darwin), activate event re-creates window
Build: DMG + ZIP targets, hardened runtime, code signing, notarization, entitlements for accessibility/screen recording

Linux:

Desktop Automation: xdotool for mouse/keyboard, wmctrl for window management
Browser Discovery: Checks /usr/bin/ for google-chrome, chromium-browser, microsoft-edge, brave-browser
Build: AppImage target

Renderer Process (`electron/src/renderer/`)

Zustand Stores: auth-store (user session), connection-store (WebSocket state), chat-store (messages, tool invocations, chat CRUD), window-store (mode sync)
SSE Parser (lib/sse-parser.ts): Parses backend events — text (0), error (3), tool call (9), tool result (a), reasoning (g), finish (d)
Key Components: AuthScreen (Google OAuth), Overlay (pill bar + expanded chat panel), PermissionsGuard (macOS permission prompts), MessageList, ChatHistory
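
The event codes handled by the SSE parser can be illustrated with a minimal sketch. It assumes each stream line has the form `<code>:<JSON payload>`, which resembles the Vercel AI SDK data stream format but is not confirmed by this document. The sketch is in Python for consistency with the other examples, while lib/sse-parser.ts is TypeScript, and the names `StreamEvent` and `parse_stream_line` are hypothetical.

```python
import json
from typing import Any, NamedTuple, Optional

# Code-to-event mapping taken from the list above.
EVENT_TYPES = {
    "0": "text",
    "3": "error",
    "9": "tool_call",
    "a": "tool_result",
    "g": "reasoning",
    "d": "finish",
}


class StreamEvent(NamedTuple):
    kind: str
    payload: Any


def parse_stream_line(line: str) -> Optional[StreamEvent]:
    """Parse one '<code>:<json>' line; return None for blank or unknown lines."""
    line = line.strip()
    if not line:
        return None
    code, sep, body = line.partition(":")
    if not sep or code not in EVENT_TYPES:
        return None
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return None
    return StreamEvent(EVENT_TYPES[code], payload)
```

Returning None for unrecognized codes keeps the parser tolerant of event types added later on the backend.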

Preload Bridge (`electron/src/preload/`)

Exposes window.coasty TypeScript API: auth methods, bridge control, chat CRUD, credits, window control (mode, opacity), update control, macOS permissions, event listeners

IPC Communication

Auth: auth:sign-in, auth:sign-out, auth:get-session, auth:get-token
WebSocket Bridge: bridge:connect, bridge:disconnect, bridge:get-state
Chat CRUD: Create, list, get messages, update, delete — all call FastAPI backend with Bearer token
Window: window:set-mode, window:set-opacity, window:get-opacity
Permissions: permissions:check, permissions:request-accessibility, permissions:open-screen-recording
Updates: update:get-status, update:get-version, update:install
Machine ID: Deterministic UUID v5 hash of electron-{user_id}-{hostname}-{username}-{platform}
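
As an illustration of the Machine ID scheme, the sketch below derives a UUID v5 from the documented name string. The namespace UUID is not stated in this document, so `uuid.NAMESPACE_DNS` is used purely as a placeholder assumption, and the resulting IDs will not match the ones the app produces unless the namespaces happen to agree.

```python
import uuid

# Assumption: the real namespace is not given in this document.
MACHINE_ID_NAMESPACE = uuid.NAMESPACE_DNS


def machine_id(user_id: str, hostname: str, username: str, platform: str) -> str:
    """Deterministic UUID v5 of electron-{user_id}-{hostname}-{username}-{platform}."""
    name = f"electron-{user_id}-{hostname}-{username}-{platform}"
    return str(uuid.uuid5(MACHINE_ID_NAMESPACE, name))


if __name__ == "__main__":
    # Same inputs always yield the same ID; changing any field changes it.
    print(machine_id("user-123", "my-laptop", "alice", "win32"))
```

Because UUID v5 is a hash of namespace plus name, the same user on the same machine keeps a stable ID across restarts without any stored state.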

Backend Integration

/api/electron/ws — WebSocket for receiving and executing commands
/api/chats/create, /api/chats/list, /api/chats/{id}/messages — Chat persistence
/api/chat/ — SSE streaming chat responses
/api/billing/credits/balance — Credit checks

Key Design Patterns

Multi-Agent Execution Flow

Task Planning: LLM decomposes user request into sequential subtasks
Agent Assignment: Each subtask assigned to specialized agent (browser/terminal/desktop)
Sequential Execution: Tasks execute in order (there is no dependency system)
Context Passing: Previous task summaries passed to next task for context
Streaming: All execution streams via Server-Sent Events to frontend
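
The flow above can be sketched as a generator that runs subtasks in order, passes earlier summaries forward, and yields events that a caller could forward over SSE. This is a simplified illustration, not the code in multi_agent_executor.py. The `Subtask` class, the agent callable signature, and the event names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterator, List


@dataclass
class Subtask:
    agent: str        # "browser", "terminal", or "desktop"
    description: str


# Hypothetical agent signature: (description, previous summaries) -> summary
AgentFn = Callable[[str, List[str]], str]


def run_plan(plan: List[Subtask], agents: Dict[str, AgentFn]) -> Iterator[dict]:
    """Execute subtasks strictly in order, streaming one event per step."""
    summaries: List[str] = []
    for index, task in enumerate(plan):
        yield {"event": "task_start", "index": index, "agent": task.agent}
        # Each agent sees the summaries of all previously completed tasks.
        summary = agents[task.agent](task.description, list(summaries))
        summaries.append(summary)
        yield {"event": "task_done", "index": index, "summary": summary}
    yield {"event": "finish", "summaries": summaries}


def fake_agent(description: str, previous: List[str]) -> str:
    """Stand-in for a real agent; reports how much context it received."""
    return f"{description} (context: {len(previous)} prior task(s))"


if __name__ == "__main__":
    plan = [Subtask("browser", "find docs"), Subtask("terminal", "install package")]
    for event in run_plan(plan, {"browser": fake_agent, "terminal": fake_agent}):
        print(event)
```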

Provider Architecture

Located in lib/providers/ and backend/app/providers/
Each provider implements streaming chat with tool calling
Frontend providers handle model selection and API routing
Backend providers execute tools and manage agent workflows

State Management

Chat Store (lib/chat-store/): Manages conversations, messages, attachments
Model Store (lib/model-store/): Available models and provider configurations
User Store (lib/user-store/): User profile and authentication state
VM Store (lib/vm-store/): Virtual machine sessions and connections

Development Commands

Frontend Development

```bash
# Install dependencies
npm install

# Development server (with Turbopack)
npm run dev

# Production build
npm run build

# Start production server
npm start

# Type checking
npm run type-check

# Linting
npm run lint
```

Backend Development

```bash
# Navigate to backend directory
cd backend

# Create virtual environment (first time)
python -m venv venv

# Activate virtual environment
# Windows:
venv\Scripts\activate
# Linux/Mac:
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Run development server (from backend directory)
python main.py

# Or use the helper script
# Windows:
.\run_backend.bat
# Linux/Mac:
./run_backend.sh
```

Electron Desktop App Development

```bash
# Navigate to electron directory
cd electron

# Install dependencies
npm install

# Development mode (with hot reload)
npm run dev

# Build (compile TypeScript)
npm run build

# Package for current platform
npm run package

# Package for specific platform
npm run package:win    # Windows NSIS installer
npm run package:mac    # macOS DMG + ZIP
npm run package:linux  # Linux AppImage
```

Docker Deployment

```bash
# Build and start all services
docker-compose up --build

# Start services in detached mode
docker-compose up -d

# Stop services
docker-compose down

# View logs
docker-compose logs -f

# AI Desktop container (separate compose file)
docker-compose -f docker-compose.ai-desktop.yml up --build
```

Testing

```bash
# Backend tests
cd backend
pytest

# Run specific test file
pytest tests/test_specific.py

# Run with coverage
pytest --cov=app tests/
```

Environment Configuration

Frontend Environment Variables (.env)

NEXT_PUBLIC_SUPABASE_URL: Supabase project URL
NEXT_PUBLIC_SUPABASE_ANON_KEY: Supabase anonymous key
SUPABASE_SERVICE_ROLE: Supabase service role key (server-side)
CSRF_SECRET: CSRF protection secret (required)
ENCRYPTION_KEY: For encrypting user API keys (required for BYOK)
PYTHON_BACKEND_URL: Backend API URL (default: http://0.0.0.0:8001)
NEXT_PUBLIC_BACKEND_URL: Public backend URL (default: http://localhost:8001)
Azure credentials for VM provisioning (AZURE_*)
Stripe keys for billing (STRIPE_*)
Google Search API keys (GOOGLE_SEARCH_*)

Backend Environment Variables (backend/.env)

DEBUG: Enable debug mode (true/false)
CORS_ORIGINS: Allowed CORS origins (comma-separated)
SUPABASE_URL, SUPABASE_ANON_KEY, SUPABASE_SERVICE_ROLE: Supabase config
CSRF_SECRET, ENCRYPTION_KEY: Security keys (must match frontend)
GOOGLE_SEARCH_KEY, GOOGLE_SEARCH_CX: Google Custom Search API
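
Since CORS_ORIGINS is a comma-separated string, it has to be split into a list before it can be handed to a CORS middleware. A minimal sketch of that parsing step follows. The helper name `parse_cors_origins` is hypothetical, and the backend's actual config loader may differ.

```python
import os
from typing import List


def parse_cors_origins(raw: str) -> List[str]:
    """Split a comma-separated origin list, dropping blanks and stray spaces."""
    return [origin.strip() for origin in raw.split(",") if origin.strip()]


if __name__ == "__main__":
    # Falls back to the documented frontend dev port when the variable is unset.
    raw = os.environ.get("CORS_ORIGINS", "http://localhost:3000")
    print(parse_cors_origins(raw))
```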

Electron Environment Variables (electron/.env)

COASTY_BACKEND_URL: Backend API endpoint (default: http://localhost:8001)
NEXT_PUBLIC_SUPABASE_URL: Supabase auth/DB URL (shared with frontend)
NEXT_PUBLIC_SUPABASE_ANON_KEY: Supabase anon key (shared with frontend)
These are injected at build time via electron-vite define config

See .env.example and backend/.env.example for complete configuration templates.

Code Organization

Frontend Structure

app/: Next.js app directory with routes and layouts
- c/[chatId]/: Individual chat pages
- api/: API route handlers (Next.js API routes)
- auth/, billing/, account/: Feature-specific pages
components/: Reusable React components
- ui/: shadcn/ui components (Radix UI based)
- common/: Shared components (chat interface, message display)
- prompt-kit/: Prompt-related components
lib/: Business logic and utilities
- providers/: AI provider implementations
- chat-store/, model-store/, user-store/: Zustand state stores
- supabase/: Database client and queries
- services/: Service layer (API calls, utilities)

Backend Structure

backend/app/
- api/routes/: FastAPI route handlers
- services/: Core business logic
  - multi_agent_executor.py: Multi-agent orchestration
  - vm_control.py: VM WebSocket management
  - database.py: Supabase operations
  - agent_billing.py: Usage tracking
- core/: Configuration, middleware, logging
- models/: Pydantic data models
- providers/: AI provider integrations
- utils/: Utility functions

Electron Structure

electron/
- src/main/: Main process — app lifecycle, IPC, WebSocket bridge, automation modules
  - index.ts: App entry, window creation, tray, IPC registration
  - auth.ts: Supabase Google OAuth with local HTTP callback server
  - ws-bridge.ts: WebSocket client to backend with auto-reconnect
  - window-manager.ts: Window modes, animation, opacity, screenshot hiding
  - local-executor.ts: Command dispatch registry (50+ commands)
  - desktop-automation.ts: Platform-specific mouse/keyboard (Win32/macOS/Linux)
  - browser-automation.ts: Puppeteer-core browser control
  - terminal.ts: Shell execution (PowerShell/bash)
  - file-ops.ts: File system operations
  - screenshot.ts: Desktop capture via Electron API
  - permissions.ts: macOS permission checks
  - auto-updater.ts: Auto-update lifecycle
- src/preload/: Context bridge exposing window.coasty API
- src/renderer/: React 19 UI
  - stores/: Zustand stores (auth, connection, chat, window)
  - components/: AuthScreen, Overlay, MessageList, PermissionsGuard
  - hooks/: useChatSubmit for chat message flow
  - lib/: API client, SSE parser, utilities
- build/: Icons (ico/icns/png), macOS entitlements plist
- electron-builder.yml: Build config (NSIS, DMG, AppImage)
- electron.vite.config.ts: Vite config with env injection

Docker Structure

docker/ai-desktop/: Ubuntu desktop container with AI agents
- Includes Chrome, Node.js, Python, automation tools
- WebSocket server for agent communication
- VNC server for remote desktop access

Key Workflows

Adding a New AI Provider

Frontend: Create provider in lib/providers/your-provider.ts
- Implement streamChat() method with tool calling support
- Add to lib/providers/index.ts
Backend: Add provider support in backend/app/providers/
- Configure API keys in environment
- Update model lists in models.py

Creating a New Agent Type

Add agent type to AgentType enum in multi_agent_executor.py
Create agent prompt in _get_*_agent_prompt() method
Define agent tools in _get_*_tools() method
Update task planner to recognize new agent type

Adding New VM Tools

Create tool function in backend/app/api/routes/chat_vm_tools.py
Define tool schema (name, description, parameters)
Add tool to appropriate agent's tool list in multi_agent_executor.py
Implement tool execution in VM agent server (if needed)

Adding a New Electron Local Command

Create handler function in the appropriate module (desktop-automation.ts, browser-automation.ts, terminal.ts, file-ops.ts, or a new module)
Register the command name → handler mapping in local-executor.ts registerHandlers()
If the command involves UI interaction (clicks, typing), wrap with this.withOverlayHidden()
Add parameter normalization in normalizeParams() if backend sends different param names
Backend sends commands via the WebSocket bridge as { type: 'command', data: { command, parameters } }
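
The registration, normalization, and dispatch steps above can be sketched as follows. The example is in Python for consistency with the other examples, while local-executor.ts itself is TypeScript. The `read_file` command name, the `register` decorator, and the result shape are hypothetical. Only the message envelope and the two parameter aliases (filepath→path, find→old_text) come from this document.

```python
from typing import Any, Callable, Dict

# Aliases the backend may send, mapped to the names handlers expect.
PARAM_ALIASES = {"filepath": "path", "find": "old_text"}

HANDLERS: Dict[str, Callable[..., Any]] = {}


def register(name: str) -> Callable:
    """Decorator that adds a handler to the command registry."""
    def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
        HANDLERS[name] = fn
        return fn
    return decorator


def normalize_params(params: Dict[str, Any]) -> Dict[str, Any]:
    """Rename aliased keys; unknown keys pass through unchanged."""
    return {PARAM_ALIASES.get(key, key): value for key, value in params.items()}


@register("read_file")  # hypothetical command name
def read_file(path: str) -> dict:
    return {"ok": True, "path": path}


def dispatch(message: Dict[str, Any]) -> dict:
    """Route a { type: 'command', data: { command, parameters } } message."""
    if message.get("type") != "command":
        return {"ok": False, "error": "not a command message"}
    data = message.get("data", {})
    handler = HANDLERS.get(data.get("command"))
    if handler is None:
        return {"ok": False, "error": f"unknown command: {data.get('command')}"}
    return handler(**normalize_params(data.get("parameters", {})))
```

Normalizing before dispatch means each handler only needs to know one canonical parameter name, regardless of which alias the backend used.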

Adding Platform-Specific Desktop Automation

Add the function in desktop-automation.ts with process.platform branching
Windows: Use runPowershell() with user32.dll P/Invoke via Add-Type @"..."@
macOS: Use runSwift() for CoreGraphics or runBash() with osascript for System Events
Linux: Use runBash() with xdotool or wmctrl
Register in local-executor.ts and wrap with withOverlayHidden() if it interacts with the desktop

Important Technical Details

WebSocket Connection Management

VM connections are persistent with auto-reconnection
Heartbeat mechanism prevents stale connections
Connection reuse minimizes latency
Password authentication for VNC access

Tool Response Handling

Tool responses are truncated to 5000 characters to prevent context overflow
The frontendScreenshot field is preserved in the response but is not sent to the model
Screenshots are compressed (JPEG, 1280x720 max) before transmission
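
A minimal sketch of the truncation and screenshot handling described above. It assumes the 5000-character limit is applied per string field. The document does not say whether the limit applies per field or to the whole serialized response, and the truncation marker text is also an assumption. The function name `prepare_tool_result` is hypothetical.

```python
from typing import Any, Dict, Tuple

MAX_TOOL_RESPONSE_CHARS = 5000
TRUNCATION_MARKER = "... [truncated]"  # assumed marker text


def prepare_tool_result(
    result: Dict[str, Any], limit: int = MAX_TOOL_RESPONSE_CHARS
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Return (for_model, for_frontend) views of a tool result.

    The model view drops frontendScreenshot and truncates long strings;
    the frontend view is the untouched original.
    """
    for_model = {k: v for k, v in result.items() if k != "frontendScreenshot"}
    for key, value in list(for_model.items()):
        if isinstance(value, str) and len(value) > limit:
            for_model[key] = value[:limit] + TRUNCATION_MARKER
    return for_model, result
```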

Streaming Architecture

All AI responses stream via Server-Sent Events (SSE)
Tool calls and results stream separately from text
Frontend accumulates chunks and updates UI reactively
finish event signals completion with full content

Task Execution Rules

Tasks execute sequentially (no parallel execution)
Each task receives context from all previous completed tasks
Tasks can request user input via [NEED_USER_INPUT] markers
Execution stops if an agent encounters a critical blocker
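
Detecting the [NEED_USER_INPUT] marker can be as simple as a substring search. The sketch below assumes the text following the marker is the question to show the user, which this document does not specify. The function name `extract_user_prompt` is hypothetical.

```python
from typing import Optional

NEED_USER_INPUT = "[NEED_USER_INPUT]"


def extract_user_prompt(agent_output: str) -> Optional[str]:
    """Return the text after the marker, or None if the agent did not ask."""
    marker_at = agent_output.find(NEED_USER_INPUT)
    if marker_at == -1:
        return None
    return agent_output[marker_at + len(NEED_USER_INPUT):].strip()
```

An executor could pause the task loop whenever this returns a non-None value and resume once the user replies.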

Browser Agent Strategy

Search-First: Always use Google Search before opening the browser
Minimal Browsing: Only open browser when action is required (forms, clicks, purchases)
State Validation: Use browser_state() to verify actions
Tab Management: Reuse tabs instead of excessive navigation

Security Considerations

CSRF protection on all state-changing operations
API keys encrypted with ENCRYPTION_KEY (BYOK feature)
Rate limiting on backend endpoints
Supabase Row Level Security (RLS) for data access
No credentials stored in VM environments
Electron: Context isolation enabled, node integration disabled, sandbox=false (required for native modules)
Electron: Auth tokens sent in WebSocket message body, not URL params (avoids proxy/CDN logging)
Electron: Window title passed via env var (_COASTY_WIN_TITLE) to avoid shell injection in window switching

Common Development Tasks

Adding a New Feature

Design API endpoints in backend/app/api/routes/
Implement business logic in backend/app/services/
Create frontend components in components/
Add state management in appropriate store (lib/*-store/)
Wire up API calls in lib/services/ or route handlers

Debugging Electron Desktop App Issues

Check WebSocket bridge connection in console — [WS Bridge] log prefix shows connect/disconnect/auth events
Verify backend /api/electron/ws endpoint is running and accepting connections
On Windows: if overlay loses always-on-top, check window-manager.ts z-order workaround logic
On macOS: if automation fails, check Screen Recording + Accessibility permissions via permissions.ts
Browser automation: ensure Chrome/Edge/Brave is installed; Puppeteer uses isolated temp profile to avoid locks
If clicks/typing don't work: verify overlay is hiding before desktop actions (withOverlayHidden wrapper)
Auth issues: check local HTTP callback server port binding, Supabase OAuth redirect URL config

Debugging VM Agent Issues

Check WebSocket connection status in vm_control.py logs
Verify agent tools are registered in multi_agent_executor.py
Test tool execution with reduced context
Check container logs: docker logs <container-id>
Verify the VM agent WebSocket connection: ws://localhost:8081 (localhost) or ws://<ip>:8080

Optimizing Performance

Frontend: Use React.memo for expensive components, lazy load routes
Backend: Enable caching in cache.py, optimize database queries
Streaming: Batch small chunks, compress screenshots
VM: Reuse connections, minimize tool calls, truncate responses

Database Migrations

Supabase migrations handled via Supabase Dashboard or CLI
Schema changes require updating Supabase types in types/supabase.ts
Run supabase gen types typescript to regenerate types

Ports and Services

Frontend: 3000 (Next.js dev server)
Backend: 8001 (FastAPI server)
VM Agent WebSocket: 8080 (remote), 8081 (localhost)
VNC: 5900 (desktop access)
Supabase: Hosted service (URLs in .env)

Additional Notes

Frontend uses React Server Components where applicable for better performance
Backend runs on uvicorn with auto-reload in development
VM containers are ephemeral and should be treated as stateless
Billing system tracks agent usage by session duration
Multi-model support allows users to switch providers mid-conversation
Screenshot compression is critical for performance (JPEG, 70% quality)
Electron app launches at login (packaged builds), runs as always-on-top overlay on all virtual desktops
Electron auto-updates via generic provider at https://updates.coasty.ai (checks every 4 hours)
Electron browser automation uses puppeteer-core with temp user-data-dir to avoid Chrome profile locks
Electron overlay auto-hides during desktop automation to prevent interfering with clicks/screenshots
