A lightweight benchmarking tool for measuring LLM inference performance through Ollama. Get detailed tokens-per-second metrics, load times, and generation speed for any model running on your local hardware.
- Prompt processing speed — measures tokens/sec for input evaluation
- Generation speed — measures tokens/sec for output generation
- Model load time — tracks cold-start overhead
- Multi-model comparison — benchmark several models in a single run
- Table output — side-by-side comparison across models
- Custom prompts — supply your own prompts or use the built-in test suite
- Zero dependencies on cloud — everything runs locally through Ollama
Dual 3090 Ti GPU, Epyc 7763 CPU — Ubuntu 22.04:
```
----------------------------------------------------
Model: deepseek-r1:70b

Performance Metrics:
  Prompt Processing: 336.73 tokens/sec
  Generation Speed:  17.65 tokens/sec
  Combined Speed:    18.01 tokens/sec

Workload Stats:
  Input Tokens:     165
  Generated Tokens: 7673
  Model Load Time:  6.11s
  Processing Time:  0.49s
  Generation Time:  434.70s
  Total Time:       441.31s
----------------------------------------------------
```
Single 3090 GPU, 13900KS CPU — WSL2 (Ubuntu 22.04) on Windows 11:
```
----------------------------------------------------
Model: deepseek-r1:32b

Performance Metrics:
  Prompt Processing: 399.05 tokens/sec
  Generation Speed:  27.18 tokens/sec
  Combined Speed:    27.58 tokens/sec

Workload Stats:
  Input Tokens:     168
  Generated Tokens: 10601
  Model Load Time:  15.44s
  Processing Time:  0.42s
  Generation Time:  390.00s
  Total Time:       405.87s
----------------------------------------------------
```
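The reported speeds follow directly from the raw counts and timings. As a quick arithmetic check against the deepseek-r1:70b run:

```python
# Reproduce the reported speeds from the raw counts and timings
# in the deepseek-r1:70b sample run.
prompt_tps = 165 / 0.49                        # prompt processing, ~336.73 tokens/sec
gen_tps = 7673 / 434.70                        # generation, ~17.65 tokens/sec
combined_tps = (165 + 7673) / (0.49 + 434.70)  # combined, ~18.01 tokens/sec
```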
- Python 3.11 or higher
- Ollama installed and running
Using pip (recommended):

```
pip install git+https://github.com/LarHope/ollama-benchmark.git
```

From source:

```
git clone https://github.com/LarHope/ollama-benchmark.git
cd ollama-benchmark
python3 -m venv venv && source venv/bin/activate  # or .\venv\Scripts\activate on Windows
pip install -e .
```

Make sure the Ollama server is running:

```
ollama serve
```

Run benchmarks:
```
# Benchmark all available models with default prompts
ollama-benchmark

# Benchmark specific models
ollama-benchmark --models deepseek-r1:70b llama3:8b

# Custom prompts
ollama-benchmark --models mistral --prompts "Write a hello world in Rust" "Explain quantum computing"

# Table comparison output
ollama-benchmark --table_output --models deepseek-r1:70b deepseek-r1:32b llama3:8b

# Verbose mode (shows streaming responses)
ollama-benchmark --verbose --models deepseek-r1:70b
```

| Flag | Description |
|---|---|
| `-v, --verbose` | Show streaming responses and per-prompt stats |
| `-m, --models` | Space-separated list of models to benchmark (default: all available) |
| `-p, --prompts` | Space-separated list of custom prompts |
| `-t, --table_output` | Display results as a comparison table |
When no custom prompts are provided, the tool runs a suite covering:
- Analytical reasoning
- Creative writing
- Complex analysis
- Technical knowledge
- Structured output generation
- Connects to your local Ollama instance
- Sends each prompt to each model
- Captures timing metrics from the Ollama API response (total duration, prompt eval, generation)
- Calculates tokens/sec for prompt processing, generation, and combined throughput
- Outputs per-model averages or a comparison table
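The steps above can be sketched in a few lines of Python. Ollama's non-streaming `/api/generate` response includes `prompt_eval_count`, `prompt_eval_duration`, `eval_count`, and `eval_duration` (durations in nanoseconds); the helpers below turn those into the three tokens/sec figures. The function names (`throughput`, `bench_once`) are illustrative, not the tool's internal API:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def throughput(resp: dict) -> dict:
    """Derive tokens/sec from the timing fields of an Ollama generate
    response. All *_duration fields are reported in nanoseconds."""
    ns = 1e9
    prompt_s = resp["prompt_eval_duration"] / ns
    gen_s = resp["eval_duration"] / ns
    return {
        "prompt_tps": resp["prompt_eval_count"] / prompt_s,
        "gen_tps": resp["eval_count"] / gen_s,
        "combined_tps": (resp["prompt_eval_count"] + resp["eval_count"])
                        / (prompt_s + gen_s),
    }

def bench_once(model: str, prompt: str) -> dict:
    """Send one non-streaming generate request and return its metrics.
    Requires a running Ollama server."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as r:
        return throughput(json.load(r))
```

Feeding the timing fields from the deepseek-r1:70b sample run through `throughput` reproduces the speeds shown in the example output.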
Contributions are welcome! Feel free to submit a pull request; for major changes, open an issue first to discuss what you would like to change.
This project is licensed under the MIT License — see the LICENSE file for details.