Run AI models locally on your desktop — private, fast, and fully under your control.
Backend.AI GO is a cross-platform desktop application that lets you run Large Language Models (LLMs) directly on your machine. Download models from Hugging Face, chat with AI privately, connect to cloud providers, or scale to other Backend.AI GO instances and enterprise clusters when you need more power.
Run popular models like Gemma 3, Qwen3, Llama, and Mistral entirely on your hardware. Your conversations stay on your machine — no data leaves your computer.
- Apple Silicon (MLX): Native acceleration for M1/M2/M3/M4/M5 chips
- NVIDIA GPU (CUDA): Full GPU acceleration on Windows and Linux
- AMD GPU (HIP/ROCm): Support for AMD graphics cards
- Intel GPU (SYCL): Acceleration for Intel Arc and Iris graphics
- CPU: Optimized inference for systems without dedicated GPUs
Seamlessly combine local models with cloud APIs. Use local models for sensitive data, switch to GPT, Claude, or Gemini for complex tasks — all from the same interface.
Supported providers:
- OpenAI (GPT-5.2, GPT Image 1.5, etc.)
- Anthropic (Claude 4.5 Sonnet, Claude 4.5 Opus)
- Google (Gemini 3 Pro, Gemini 3 Flash)
- Any OpenAI-compatible endpoint (Ollama, LocalAI, vLLM, etc.)
Transform your AI from a simple chatbot into an autonomous assistant:
- Web Search: Get real-time information from the internet
- File Operations: Read, analyze, and manage local files
- Code Execution: Run Python scripts and shell commands
- Calculator: Perform precise mathematical calculations
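Tools like these are typically exposed to the model as function definitions with a JSON Schema for their parameters, as in the OpenAI-compatible tool-calling format. The sketch below shows what such a definition looks like; the `calculator` name and schema are illustrative, not the app's actual internal tool definitions.

```python
import json

# A tool description in the OpenAI-compatible function-calling format.
# The name, description, and parameter schema here are hypothetical
# examples, not Backend.AI GO's built-in definitions.
calculator_tool = {
    "type": "function",
    "function": {
        "name": "calculator",
        "description": "Evaluate a basic arithmetic expression.",
        "parameters": {
            "type": "object",
            "properties": {
                "expression": {
                    "type": "string",
                    "description": "An expression such as '2 * (3 + 4)'.",
                },
            },
            "required": ["expression"],
        },
    },
}

# The model sees this schema and can respond with a tool call
# containing matching arguments.
print(json.dumps(calculator_tool, indent=2))
```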
Connect to any MCP-compatible server to extend your AI's capabilities. Access databases, APIs, and custom tools through a standardized protocol.
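MCP is built on JSON-RPC 2.0, and every session starts with an `initialize` handshake. As a rough sketch of what travels over the wire, the snippet below constructs that first client message; the protocol version string and client info are illustrative values, not what Backend.AI GO actually sends.

```python
import json

# MCP messages are JSON-RPC 2.0. This builds the "initialize" request a
# client sends when connecting to a server. The protocolVersion and
# clientInfo values below are examples only.
def build_initialize_request(request_id: int = 1) -> str:
    msg = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "initialize",
        "params": {
            "protocolVersion": "2024-11-05",  # example spec revision
            "capabilities": {},
            "clientInfo": {"name": "example-client", "version": "0.1.0"},
        },
    }
    return json.dumps(msg)

print(build_initialize_request())
```

After the server replies with its own capabilities, the client can list and call the tools the server advertises.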
Use Backend.AI GO as a local API backend for your favorite AI tools. Any application that supports the OpenAI API can connect to your locally running models.
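Because the endpoint speaks the OpenAI chat-completions format, any standard HTTP client works. The sketch below builds such a request with only the standard library; the base URL and port are assumptions for illustration (check the app's settings for the actual address), and the model name is a placeholder.

```python
import json
from urllib.request import Request, urlopen

# Hypothetical local endpoint; substitute the host/port shown in
# Backend.AI GO's settings.
BASE_URL = "http://localhost:8000/v1"

def build_chat_request(model: str, prompt: str) -> Request:
    """Build an OpenAI-compatible chat completion request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def chat(model: str, prompt: str) -> str:
    """Send the request to the locally running model and return its reply."""
    with urlopen(build_chat_request(model, prompt)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

The same shape works with the official `openai` client by pointing its `base_url` at the local server.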
Scale beyond your local hardware by connecting to other Backend.AI GO instances or Backend.AI clusters. Visualize your network topology in real-time with the interactive Mesh view.
| Component | Minimum | Recommended |
|---|---|---|
| RAM | 8 GB | 16 GB or more |
| Storage | 10 GB free | 50 GB+ (for multiple models) |
| GPU (Optional) | 4 GB VRAM | 8 GB+ VRAM |
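As a back-of-envelope check against the RAM and VRAM figures above, model weight size is roughly parameter count times bits per weight. The effective bits-per-weight numbers below are rough community approximations for common GGUF quantizations, not exact file sizes, and the estimate excludes KV-cache and runtime overhead.

```python
# Approximate effective bits per weight for common quantizations.
# These are rough estimates, not exact on-disk sizes.
BITS_PER_WEIGHT = {
    "Q4_K_M": 4.8,
    "Q8_0": 8.5,
    "F16": 16.0,
}

def approx_weights_gb(params_billion: float, quant: str) -> float:
    """Rough size of model weights in GB (excludes KV cache and overhead)."""
    bits = BITS_PER_WEIGHT[quant]
    return params_billion * bits / 8

# A 7B model at Q4_K_M comes out to roughly 4 GB of weights,
# which is why 8 GB RAM is a workable minimum.
print(round(approx_weights_gb(7, "Q4_K_M"), 1))
```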
| Platform | Architecture | Notes |
|---|---|---|
| macOS | Apple Silicon (arm64) | M1/M2/M3/M4/M5 chips. Intel not supported. |
| Windows | x64 | Windows 10/11. NVIDIA GPU recommended. |
| Linux | x64 | Ubuntu/Debian (.deb) or Flatpak |
Download the latest version for your platform from the Releases page.
| Platform | Package |
|---|---|
| macOS (Apple Silicon) | backend.ai-go-x.x.x-macos-arm64.dmg |
| Windows | backend.ai-go-x.x.x-windows-x64-setup.exe |
| Linux (Debian/Ubuntu) | backend-ai-go-x.x.x-linux-x64.deb |
| Linux (Other distros) | backend-ai-go-x.x.x-linux-x64.flatpak |
```bash
brew tap lablup/tap
brew install --cask backend-ai-go
```

- Download the `.dmg` file
- Open and drag Backend.AI GO to your Applications folder
- On first launch, you may need to allow the app in System Settings > Privacy & Security
- Download and run the `.exe` installer
- Follow the installation wizard
- For best performance, ensure NVIDIA drivers are up to date
```bash
# Debian/Ubuntu
sudo dpkg -i backend-ai-go-x.x.x-linux-x64.deb
sudo apt-get install -f

# Flatpak
flatpak install backend-ai-go-x.x.x-linux-x64.flatpak
```

- Click the Search icon in the sidebar
- Search for a model (e.g., `Gemma3-4B`, `Qwen3-4B`)
- Look for GGUF format (cross-platform) or MLX (macOS only)
- Click Download on your chosen variant (Q4_K_M recommended for balance of speed and quality)
- Go to the Models tab
- Find your downloaded model
- Click Load and wait for the status to show "Ready"
- Click the Chat icon in the sidebar
- Type your message and press Enter
- Your AI responds entirely locally — no internet required
Backend.AI GO integrates multiple inference engines to provide optimal performance across different hardware and use cases.
| Engine | Format | Platform | Best For |
|---|---|---|---|
| llama.cpp | GGUF | All platforms | Cross-platform LLM inference with CPU/GPU support |
| mlx-lm | MLX | macOS only | Maximum LLM performance on Apple Silicon |
| stable-diffusion.cpp | GGUF | All platforms | Local image generation |
| mlxcel | MLX | macOS only | Experimental MLX-based inference engine by Lablup (not yet public) |
The application automatically checks for updates on startup. You can also manually check via Settings > Check for Updates.
For detailed guides and advanced features, visit the Backend.AI GO Documentation.
Backend.AI GO is developed and maintained by Lablup Inc. as part of the Backend.AI project.
We are preparing a discussion channel for bug reports and feature requests. Stay tuned for updates.