Ollama Benchmark CLI
What it is
Ollama Benchmark CLI is a tool for benchmarking local models running on Ollama. It measures tokens-per-second and response latency for different models on your specific hardware.
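At its core the measurement is simple: send a generation request and derive throughput from the timings Ollama reports back. Below is a minimal sketch of that idea in Python, assuming a default Ollama install listening on `localhost:11434`; the function name, model tag, and prompt are illustrative placeholders, not the tool's actual implementation.

```python
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default endpoint

def measure_tokens_per_second(model: str, prompt: str) -> float:
    """Run one generation and compute decode throughput from Ollama's timings."""
    resp = requests.post(
        OLLAMA_URL,
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=600,
    )
    resp.raise_for_status()
    data = resp.json()
    # Ollama reports eval_count (generated tokens) and eval_duration (nanoseconds).
    return data["eval_count"] / (data["eval_duration"] / 1e9)

if __name__ == "__main__":
    tps = measure_tokens_per_second("llama3.2", "Explain TCP slow start briefly.")
    print(f"{tps:.1f} tokens/s")
```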
What problem it solves
Provides a quick way to measure and compare the inference performance of different models running locally on Ollama, helping you identify which model best fits your hardware and performance needs.
Where it fits in the stack
Benchmarking. Used to measure local LLM inference performance on Ollama-hosted models.
Typical use cases
- Measuring tokens-per-second for different models on local hardware
- Comparing inference latency across model sizes and quantization levels (see the comparison sketch after this list)
- Establishing performance baselines after hardware changes
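As a sketch of the comparison workflow, the loop below benchmarks several models back to back, with a warm-up call so the first-run model load does not skew the numbers. The model tags are illustrative placeholders; the endpoint and timing fields come from Ollama's standard HTTP API.

```python
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"
MODELS = ["llama3.2:3b", "llama3.1:8b", "llama3.1:8b-instruct-q4_0"]  # placeholder tags
PROMPT = "Summarize the TCP three-way handshake in two sentences."

def bench(model: str, prompt: str) -> dict:
    r = requests.post(
        OLLAMA_URL,
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=1200,
    )
    r.raise_for_status()
    d = r.json()
    return {
        "model": model,
        # eval_* cover the decode phase; Ollama durations are in nanoseconds.
        "tokens_per_s": d["eval_count"] / (d["eval_duration"] / 1e9),
        "total_s": d["total_duration"] / 1e9,
    }

for model in MODELS:
    bench(model, "warm-up")  # first call pays the model-load cost; discard it
    result = bench(model, PROMPT)
    print(f"{result['model']:<32} {result['tokens_per_s']:>7.1f} tok/s  {result['total_s']:>6.2f} s")
```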
Strengths
- Directly measures performance on your actual hardware
- Simple to use with existing Ollama installations
- Provides practical metrics relevant to daily use, such as tokens/second and latency (see the time-to-first-token sketch after this list)
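Latency is usually reported as time to first token, which requires streaming. The sketch below, again assuming a default local Ollama endpoint, times how long the first streamed chunk takes to arrive; it illustrates the metric rather than reproducing the tool's own code.

```python
import json
import time

import requests

def time_to_first_token(model: str, prompt: str) -> float:
    """Return seconds until the first generated token arrives."""
    start = time.perf_counter()
    with requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": prompt},  # streaming is Ollama's default
        stream=True,
        timeout=600,
    ) as resp:
        resp.raise_for_status()
        # Ollama streams newline-delimited JSON; each chunk carries a "response" fragment.
        for line in resp.iter_lines():
            if line and json.loads(line).get("response"):
                return time.perf_counter() - start
    raise RuntimeError("stream ended before any token arrived")

print(f"TTFT: {time_to_first_token('llama3.2', 'Hello!') * 1000:.0f} ms")
```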
Limitations
- Specific to Ollama; cannot benchmark other inference backends directly
- Results are hardware-dependent and not comparable across different machines
- Limited to inference performance; does not measure model quality
When to use it
- When selecting which model to run locally based on performance constraints
- When evaluating the impact of hardware upgrades on inference speed (the baseline-snapshot sketch below shows one way to record before/after numbers)
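To make before/after comparisons meaningful, persist each run together with basic machine metadata. This is a generic pattern rather than a feature of the CLI itself; the file name and the shape of `results` are assumptions.

```python
import datetime
import json
import platform

def save_baseline(results: list[dict], path: str = "ollama_baseline.json") -> None:
    """Write benchmark results plus machine metadata for later comparison."""
    snapshot = {
        "timestamp": datetime.datetime.now().isoformat(timespec="seconds"),
        "machine": platform.platform(),
        "processor": platform.processor(),
        "results": results,  # e.g. the dicts produced by the comparison loop above
    }
    with open(path, "w") as f:
        json.dump(snapshot, f, indent=2)
```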
When not to use it
- When benchmarking cloud API providers (use LLMPerf instead)
- When evaluating model accuracy or quality
Related tools / concepts
- Ollama (the local inference runtime being benchmarked)
- LLMPerf (benchmarking for cloud-hosted LLM APIs)
Sources / references
- GitHub Repository (Example implementation)
Contribution Metadata
- Last reviewed: 2026-02-26
- Confidence: medium