ClickHouse¶

What it is¶

ClickHouse is an open-source, high-performance, column-oriented SQL database management system (DBMS) for online analytical processing (OLAP). It is designed to handle massive datasets and return query results in real-time.

What problem it solves¶

It addresses the performance limitations of traditional row-oriented databases when performing complex analytical queries over billions of rows. ClickHouse allows for extremely fast aggregations and filters, making it ideal for real-time observability, logs, and telemetry analysis.

Where it fits in the stack¶

Category: Process & Understanding / Analytical Database (OLAP)

Typical use cases¶

Log Management: Storing and querying massive volumes of application and system logs (e.g., streaming OpenRouter logs).
Real-Time Analytics: Powering dashboards that require sub-second response times on large datasets.
Observability: Storing and analyzing traces, metrics, and events at scale.
Clickstream Analysis: Analyzing user behavior on websites and mobile apps in real-time.

Strengths¶

Superior Query Performance: Leverages columnar storage and parallel execution for lightning-fast SQL queries.
High Data Compression: Significantly reduces storage costs by using efficient compression algorithms tailored for columnar data.
Scalability: Can be deployed on a single server or scaled out to clusters handling petabytes of data.
Rich SQL Support: Supports ANSI SQL, including JOINs, window functions, and complex aggregations.

Limitations¶

Not for OLTP: Not designed for high-frequency individual row updates or deletes (mutations).
Learning Curve: Optimizing table schemas and partitioning keys for maximum performance requires specific expertise.
Hardware Intensive: Performance scales well with CPU cores and fast storage (NVMe), which can increase infrastructure costs.

Getting started¶

Installation (via Docker)¶

docker run -d --name clickhouse-server -p 8123:8123 -p 9000:9000 clickhouse/clickhouse-server

Initial Schema Setup¶

CREATE TABLE traces (
    event_time DateTime,
    model String,
    prompt_tokens UInt32,
    completion_tokens UInt32,
    latency Float32
) ENGINE = MergeTree()
ORDER BY event_time;

CLI examples¶

Start ClickHouse Client¶

clickhouse-client

Run a Remote Query¶

clickhouse-client --host <HOST> --query "SELECT count() FROM system.parts"

Import Data from CSV¶

cat logs.csv | clickhouse-client --query "INSERT INTO logs FORMAT CSV"

API examples¶

Python (clickhouse-connect)¶

import clickhouse_connect

client = clickhouse_connect.get_client(host='localhost', port=8123)

# Execute a query and fetch results
result = client.query('SELECT model, AVG(latency) FROM traces GROUP BY model')
for row in result.result_rows:
    print(row)

# Insert multiple rows
data = [
    ['2026-05-24 10:00:00', 'gpt-4o', 100, 50, 0.8],
    ['2026-05-24 10:01:00', 'claude-3-5-sonnet', 200, 150, 1.2]
]
client.insert('traces', data, column_names=['event_time', 'model', 'prompt_tokens', 'completion_tokens', 'latency'])

Sources / references¶

Contribution Metadata¶

Last reviewed: 2026-05-24
Confidence: high