Benchmarking intelligence efficiency for LLM inference systems.
Intelligence Per Watt measures accuracy alongside energy for any LLM inference system. It profiles single-turn and multi-turn agentic workloads, captures per-query energy telemetry, and computes two efficiency metrics: Intelligence Per Joule (IPJ) and Intelligence Per Watt (IPW).
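As a rough sketch of what the two metrics capture (assuming IPJ divides benchmark accuracy by total energy consumed and IPW divides it by average power draw; see the paper for the exact definitions):

```python
def intelligence_per_joule(accuracy: float, total_energy_j: float) -> float:
    """Accuracy achieved per joule of energy consumed (assumed definition)."""
    return accuracy / total_energy_j


def intelligence_per_watt(accuracy: float, total_energy_j: float, wall_time_s: float) -> float:
    """Accuracy per watt of average power draw (assumed definition)."""
    avg_power_w = total_energy_j / wall_time_s  # watts = joules / seconds
    return accuracy / avg_power_w


# Example: 80% accuracy over a run that drew 12,000 J in 60 s (200 W average)
print(intelligence_per_joule(0.80, 12_000))      # accuracy per joule
print(intelligence_per_watt(0.80, 12_000, 60))   # accuracy per watt
```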
- Python >= 3.13 -- managed with uv
- Rust compiler -- for building the energy monitor
- protoc -- the Protocol Buffer compiler
- An inference runtime -- Ollama, vLLM, or an OpenAI-compatible API
See Prerequisites for platform-specific setup (NVIDIA NVML, AMD ROCm, Apple Silicon, Linux RAPL).
```bash
pip install intelligence-per-watt
```

Or from source:
```bash
git clone https://github.com/HazyResearch/intelligence-per-watt.git
cd intelligence-per-watt
uv venv && source .venv/bin/activate
uv run scripts/build_energy_monitor.py  # Build the Rust energy monitor
uv pip install -e intelligence-per-watt
```

There is also an automated setup script that handles virtual environment creation, package installation, and the energy monitor build:
```bash
bash intelligence-per-watt/scripts/setup.sh
```

Optional extras: `ollama`, `vllm`, `react`, `openhands`, `terminus`, `agents`, `tavily`, `flops`, `all`.
```bash
# Run the test suite
pytest intelligence-per-watt

# Check the CLI
ipw --help

# Test energy monitoring on your hardware
uv run scripts/test_energy_monitor.py
```

Profile an inference server:
```bash
ipw profile --client ollama --model llama3.2:1b --client-base-url http://localhost:11434
```

Run an agentic benchmark:
```bash
ipw run --agent react --model gpt-4o --dataset gaia --max-queries 10
```

Analyze and plot the results:
```bash
ipw analyze ./runs/profile_*
ipw plot ./runs/profile_*
```

Each query captures: energy (Joules), power (Watts), GPU/CPU memory, temperature, TTFT, throughput, token counts, API cost, and FLOPs.
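Those per-query measurements could be modeled as a simple record; the dataclass below is a hypothetical illustration of that shape (field names and types are assumptions, not the package's actual schema):

```python
from dataclasses import dataclass


@dataclass
class QueryTelemetry:
    """Illustrative per-query record; not the actual ipw schema."""
    energy_j: float            # total energy for the query, joules
    avg_power_w: float         # mean power draw, watts
    gpu_mem_mb: float          # peak GPU memory
    cpu_mem_mb: float          # peak CPU memory
    temperature_c: float       # device temperature
    ttft_s: float              # time to first token, seconds
    throughput_tok_s: float    # decode throughput, tokens/second
    prompt_tokens: int
    completion_tokens: int
    api_cost_usd: float
    flops: float
```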
- Inference clients -- Ollama, vLLM (offline), OpenAI-compatible servers
- Agent harnesses -- ReAct (Agno), OpenHands, Terminus
- Benchmarks -- MMLU-Pro, SuperGPQA, GAIA, FRAMES, HLE, SimpleQA, SWE-bench, SWEfficiency, TerminalBench, and a built-in 1K mixed set
- Energy telemetry -- Rust gRPC service (50 ms sampling) with NVIDIA NVML, AMD ROCm, Apple Silicon powermetrics, and Linux RAPL collectors
- Evaluation -- LLM-as-judge, MCQ exact match, and task-specific scorers
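Converting the telemetry service's discrete power samples into per-query energy amounts to numerical integration. A minimal sketch under stated assumptions (trapezoidal rule over evenly spaced 50 ms samples; the monitor's actual accumulation logic lives in the Rust service):

```python
def energy_joules(power_w: list[float], dt_s: float = 0.05) -> float:
    """Integrate evenly spaced power samples (watts) into energy (joules)
    using the trapezoidal rule. dt_s=0.05 matches a 50 ms sampling period."""
    if len(power_w) < 2:
        return 0.0  # need at least one interval to integrate
    # Trapezoid: full weight on interior samples, half weight on endpoints
    total = sum(power_w) - 0.5 * (power_w[0] + power_w[-1])
    return total * dt_s


# Three samples at a constant 100 W span 0.1 s -> 10 J
print(energy_joules([100.0, 100.0, 100.0]))  # 10.0
```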
```
ipw/
├── cli/         CLI commands (profile, run, analyze, plot, list)
├── clients/     Inference adapters (Ollama, vLLM, OpenAI)
├── agents/      Agent harnesses with per-turn telemetry
├── datasets/    Dataset providers (10+ benchmarks)
├── evaluation/  Scoring handlers
├── analysis/    IPJ/IPW computation, regression fitting
├── execution/   ProfilerRunner, AgenticRunner, TelemetrySession
└── telemetry/   Energy monitor launcher + gRPC collector

energy-monitor/  Rust gRPC service with platform-specific collectors
```
All components self-register via the registry pattern (`@ClientRegistry.register("id")`, etc.) and are resolved by string key through the CLI.
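A minimal sketch of that self-registration pattern (illustrative only; the real registry classes live in the `ipw` package and the component below is hypothetical):

```python
class ClientRegistry:
    """Maps string keys to component classes via a decorator."""
    _registry: dict = {}

    @classmethod
    def register(cls, key: str):
        """Decorator: registers the decorated class under `key`."""
        def decorator(target: type) -> type:
            cls._registry[key] = target
            return target  # class is returned unchanged
        return decorator

    @classmethod
    def resolve(cls, key: str) -> type:
        """Look up a component class by its string key (as the CLI would)."""
        return cls._registry[key]


@ClientRegistry.register("ollama")
class OllamaClient:  # hypothetical component for illustration
    pass


print(ClientRegistry.resolve("ollama").__name__)  # OllamaClient
```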
Intelligence Per Watt is a research initiative studying the efficiency of on-device AI systems. The project is developed at Hazy Research and the Scaling Intelligence Lab at Stanford SAIL.
Laude Institute • Stanford Marlowe • Google Cloud Platform • Lambda Labs
If you use Intelligence Per Watt in your research, please cite:
```bibtex
@misc{saadfalcon2025intelligencewattmeasuringintelligence,
  title={Intelligence per Watt: Measuring Intelligence Efficiency of Local AI},
  author={Jon Saad-Falcon and Avanika Narayan and Hakki Orhun Akengin and J. Wes Griffin and Herumb Shandilya and Adrian Gamarra Lafuente and Medhya Goel and Rebecca Joseph and Shlok Natarajan and Etash Kumar Guha and Shang Zhu and Ben Athiwaratkun and John Hennessy and Azalia Mirhoseini and Christopher Ré},
  year={2025},
  eprint={2511.07885},
  archivePrefix={arXiv},
  primaryClass={cs.DC},
  url={https://arxiv.org/abs/2511.07885},
}
```