ClickHouse

What it is

ClickHouse is an open-source, high-performance, column-oriented SQL database management system (DBMS) for online analytical processing (OLAP). It is designed to handle massive datasets and return query results in real time.

What problem it solves

It addresses the performance limitations of traditional row-oriented databases when running complex analytical queries over billions of rows. ClickHouse enables extremely fast aggregations and filtered scans, making it a strong fit for real-time observability, log, and telemetry analysis.
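
A query of the shape below is representative of that workload: a dashboard-style aggregation over recent data. This is a sketch; the logs table and its columns (event_time, latency_ms) are hypothetical.

SELECT
    toStartOfHour(event_time) AS hour,
    count() AS requests,
    quantile(0.99)(latency_ms) AS p99_latency_ms
FROM logs
WHERE event_time >= now() - INTERVAL 1 DAY
GROUP BY hour
ORDER BY hour;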

Where it fits in the stack

Category: Process & Understanding / Analytical Database (OLAP)

Typical use cases

  • Log Management: Storing and querying massive volumes of application and system logs (e.g., streaming OpenRouter logs).
  • Real-Time Analytics: Powering dashboards that require sub-second response times on large datasets.
  • Observability: Storing and analyzing traces, metrics, and events at scale.
  • Clickstream Analysis: Analyzing user behavior on websites and mobile apps in real time.

Strengths

  • Superior Query Performance: Leverages columnar storage and parallel execution for lightning-fast SQL queries.
  • High Data Compression: Significantly reduces storage costs by using efficient compression algorithms tailored for columnar data.
  • Scalability: Can be deployed on a single server or scaled out to clusters handling petabytes of data.
  • Rich SQL Support: Supports a large subset of ANSI SQL, including JOINs, window functions, and complex aggregations (see the example after this list).
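
As an illustration of that SQL surface, the query below uses a window function to compute a per-model rolling average of latency over the traces table defined under Getting started; a sketch, not tuned for any particular workload.

SELECT
    model,
    event_time,
    latency,
    avg(latency) OVER (
        PARTITION BY model
        ORDER BY event_time
        ROWS BETWEEN 9 PRECEDING AND CURRENT ROW
    ) AS rolling_avg_latency
FROM traces;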

Limitations

  • Not for OLTP: Not designed for high-frequency single-row updates or deletes, which are implemented as heavyweight asynchronous mutations (see the sketch after this list).
  • Learning Curve: Optimizing table schemas and partitioning keys for maximum performance requires specific expertise.
  • Hardware Intensive: Performance scales well with CPU cores and fast storage (NVMe), which can increase infrastructure costs.
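
To make the OLTP limitation concrete: row-level changes go through ALTER TABLE mutations, which asynchronously rewrite whole data parts rather than touching individual rows. They suit occasional bulk corrections, not transactional workloads. A sketch against the traces table from Getting started:

-- Bulk delete: rewrites every data part containing matching rows
ALTER TABLE traces DELETE WHERE event_time < '2026-01-01 00:00:00';

-- Bulk update: uses the same mutation machinery
ALTER TABLE traces UPDATE latency = 0 WHERE latency < 0;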

Getting started

Installation (via Docker)

docker run -d --name clickhouse-server -p 8123:8123 -p 9000:9000 clickhouse/clickhouse-server

Port 8123 exposes the HTTP interface; port 9000 exposes the native TCP protocol used by clickhouse-client.

Initial Schema Setup

CREATE TABLE traces (
    event_time DateTime,
    model String,
    prompt_tokens UInt32,
    completion_tokens UInt32,
    latency Float32
) ENGINE = MergeTree()
ORDER BY event_time; -- in MergeTree, ORDER BY also defines the primary (sorting) key
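
Tuning the sorting key and partitioning is where the learning curve noted under Limitations shows up. As a sketch of one optional, workload-dependent variant, the same table partitioned by month and keyed by model first; the table name traces_by_model is hypothetical:

CREATE TABLE traces_by_model (
    event_time DateTime,
    model String,
    prompt_tokens UInt32,
    completion_tokens UInt32,
    latency Float32
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(event_time)
ORDER BY (model, event_time);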

CLI examples

Start ClickHouse Client

clickhouse-client

With the Docker setup above, the equivalent is: docker exec -it clickhouse-server clickhouse-client

Run a Remote Query

clickhouse-client --host <HOST> --query "SELECT count() FROM system.parts"

Import Data from CSV

cat logs.csv | clickhouse-client --query "INSERT INTO logs FORMAT CSV"

API examples

Python (clickhouse-connect)

import clickhouse_connect
from datetime import datetime

client = clickhouse_connect.get_client(host='localhost', port=8123)

# Execute a query and fetch results
result = client.query('SELECT model, AVG(latency) FROM traces GROUP BY model')
for row in result.result_rows:
    print(row)

# Insert multiple rows (DateTime columns take Python datetime objects, not strings)
data = [
    [datetime(2026, 5, 24, 10, 0, 0), 'gpt-4o', 100, 50, 0.8],
    [datetime(2026, 5, 24, 10, 1, 0), 'claude-3-5-sonnet', 200, 150, 1.2]
]
client.insert('traces', data, column_names=['event_time', 'model', 'prompt_tokens', 'completion_tokens', 'latency'])

Contribution Metadata

  • Last reviewed: 2026-05-24
  • Confidence: high