ClickHouse¶
What it is¶
ClickHouse is an open-source, high-performance, column-oriented SQL database management system (DBMS) for online analytical processing (OLAP). It is designed to handle massive datasets and return query results in real time.
What problem it solves¶
It addresses the performance limitations of traditional row-oriented databases when running complex analytical queries over billions of rows. ClickHouse enables extremely fast aggregation and filtering, making it well suited to real-time observability, log, and telemetry analysis.
Where it fits in the stack¶
Category: Process & Understanding / Analytical Database (OLAP)
Typical use cases¶
- Log Management: Storing and querying massive volumes of application and system logs (e.g., streaming OpenRouter logs).
- Real-Time Analytics: Powering dashboards that require sub-second response times on large datasets (a sketch of this pattern follows this list).
- Observability: Storing and analyzing traces, metrics, and events at scale.
- Clickstream Analysis: Analyzing user behavior on websites and mobile apps in real time.
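As a sketch of the real-time analytics pattern, the query below counts errors per minute over the last hour. The logs table and its event_time, level, and message columns are hypothetical stand-ins for whatever schema the logs actually land in.

-- Errors per minute over the last hour (hypothetical logs table)
SELECT
    toStartOfMinute(event_time) AS minute,
    count() AS errors
FROM logs
WHERE level = 'ERROR'
  AND event_time >= now() - INTERVAL 1 HOUR
GROUP BY minute
ORDER BY minute;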
Strengths¶
- Superior Query Performance: Leverages columnar storage, vectorized execution, and parallelism across cores for very fast analytical SQL queries.
- High Data Compression: Significantly reduces storage costs by using efficient compression algorithms tailored for columnar data.
- Scalability: Can be deployed on a single server or scaled out to clusters handling petabytes of data.
- Rich SQL Support: Offers a largely ANSI-compatible SQL dialect, including JOINs, window functions, and complex aggregations (a sketch follows this list).
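As an illustration of the window-function support, the query below computes a per-model rolling average latency over the traces table defined under Getting started; the 100-row frame is an arbitrary choice for the sketch.

-- Rolling average latency per model over the last 100 events
SELECT
    event_time,
    model,
    avg(latency) OVER (
        PARTITION BY model
        ORDER BY event_time
        ROWS BETWEEN 99 PRECEDING AND CURRENT ROW
    ) AS rolling_avg_latency
FROM traces
ORDER BY model, event_time;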
Limitations¶
- Not for OLTP: Not designed for frequent single-row updates or deletes, which ClickHouse implements as asynchronous mutations (see the sketch after this list).
- Learning Curve: Optimizing table schemas and partitioning keys for maximum performance requires specific expertise.
- Hardware Intensive: Performance scales well with CPU cores and fast storage (NVMe), which can increase infrastructure costs.
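For context on the mutation model: updates and deletes are expressed as ALTER statements that rewrite affected data parts asynchronously in the background, which is why they are unsuited to OLTP-style workloads. A minimal sketch against the traces table from Getting started:

-- Mutations rewrite affected data parts in the background
ALTER TABLE traces DELETE WHERE event_time < '2025-01-01';
ALTER TABLE traces UPDATE latency = 0 WHERE latency < 0;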
Getting started¶
Installation (via Docker)¶
docker run -d --name clickhouse-server -p 8123:8123 -p 9000:9000 clickhouse/clickhouse-server
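Port 8123 is the HTTP interface and 9000 is the native TCP protocol. A quick way to confirm the container is up is the HTTP ping endpoint, which returns Ok.:

curl http://localhost:8123/ping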
Initial Schema Setup¶
CREATE TABLE traces (
    event_time DateTime,
    model String,
    prompt_tokens UInt32,
    completion_tokens UInt32,
    latency Float32
) ENGINE = MergeTree()
ORDER BY event_time;
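As a quick smoke test of the new table, the statements below insert one synthetic row and aggregate it back; the values are placeholders.

INSERT INTO traces VALUES (now(), 'gpt-4o', 100, 50, 0.8);

SELECT model, count() AS requests, avg(latency) AS avg_latency
FROM traces
GROUP BY model;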
CLI examples¶
Start ClickHouse Client¶
clickhouse-client
Run a Remote Query¶
clickhouse-client --host <HOST> --query "SELECT count() FROM system.parts"
Import Data from CSV¶
cat logs.csv | clickhouse-client --query "INSERT INTO logs FORMAT CSV"
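Export works the same way in reverse; FORMAT CSVWithNames adds a header row (the traces table here is the one created above):

clickhouse-client --query "SELECT * FROM traces FORMAT CSVWithNames" > traces.csv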
API examples¶
Python (clickhouse-connect)¶
from datetime import datetime

import clickhouse_connect

client = clickhouse_connect.get_client(host='localhost', port=8123)

# Execute a query and fetch results
result = client.query('SELECT model, avg(latency) FROM traces GROUP BY model')
for row in result.result_rows:
    print(row)

# Insert multiple rows; DateTime columns expect Python datetime objects
data = [
    [datetime(2026, 5, 24, 10, 0, 0), 'gpt-4o', 100, 50, 0.8],
    [datetime(2026, 5, 24, 10, 1, 0), 'claude-3-5-sonnet', 200, 150, 1.2],
]
client.insert(
    'traces',
    data,
    column_names=['event_time', 'model', 'prompt_tokens', 'completion_tokens', 'latency'],
)
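For queries that take user input, clickhouse-connect supports server-side parameter binding with the {name:Type} placeholder syntax; a minimal sketch against the same traces table:

# Parameterized query: the value is bound server-side, not string-interpolated
result = client.query(
    'SELECT count() FROM traces WHERE model = {model:String}',
    parameters={'model': 'gpt-4o'},
)
print(result.result_rows[0][0])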
Related tools / concepts¶
- OpenRouter (Log streaming destination)
- Snowflake
- Datadog
- S3 / S3-Compatible Storage
- PostHog
- OpenTelemetry Collector
Sources / references¶
Contribution Metadata¶
- Last reviewed: 2026-05-24
- Confidence: high