ClickHouse¶
What it is¶
ClickHouse is an open-source, high-performance, column-oriented SQL database management system (DBMS) designed for Online Analytical Processing (OLAP). It is engineered to handle massive datasets and return query results in real-time, making it a premier choice for high-volume telemetry and log analysis in AI ecosystems.
What problem it solves¶
Traditional row-oriented databases (like PostgreSQL) struggle with performance and storage efficiency when performing complex analytical queries over billions of rows. ClickHouse solves this by: - Query Speed: Leveraging columnar storage and parallel execution for lightning-fast aggregations and filters. - Storage Efficiency: Providing high data compression ratios (often 10x or more) to significantly reduce infrastructure costs. - Real-Time Analytics: Enabling sub-second response times on massive datasets, which is critical for monitoring live AI agent sessions. - Handling High-Throughput Ingestion: Efficiently processing millions of events per second from streaming sources like OpenRouter or OpenTelemetry.
Where it fits in the stack¶
ClickHouse sits in the Data Storage and Analytics layer of the AI Observability stack. It serves as the high-performance back-end for storing and querying logs, traces, and metrics generated by LLM applications and agentic workflows.
Typical use cases¶
- LLM Log Management: Storing and querying massive volumes of request/response traces (e.g., streaming OpenRouter logs).
- AI Observability Back-end: Powering platforms like Langfuse or Helicone (self-hosted) that require fast retrieval of nested traces.
- Cost & Token Analytics: Running complex aggregations across millions of LLM calls to track spend by model, user, or application.
- RAG Performance Monitoring: Analyzing retrieval latency and embedding efficiency across large-scale vector search operations.
- Clickstream and Event Analysis: Tracking user interactions with AI agents to identify patterns and failure modes.
Strengths¶
- Superior Analytical Performance: Optimized for "scanning" large volumes of data to perform sums, counts, and averages.
- High Availability: Supports multi-master replication and can be scaled to multi-petabyte clusters.
- Vibrant Ecosystem: Native integrations with visualization tools like Grafana and data collectors like OpenTelemetry Collector.
- Flexible Data Types: Specialized support for arrays, nested structures, and JSON, which are common in LLM traces.
Limitations¶
- Not for Transactional Use (OLTP): ClickHouse is not designed for frequent individual row updates or deletes (mutations).
- Operational Complexity: Managing a large ClickHouse cluster (partitioning, replication, sharding) requires specialized knowledge.
- Schema Sensitivity: Performance is heavily dependent on choosing the correct Primary Key and Sorting Key for your specific query patterns.
When to use it¶
- When your LLM application generates millions of traces per day and you need to perform real-time analysis on that data.
- When you are building a custom AI observability dashboard and require sub-second query performance.
- When you need to store logs for long periods and want to minimize storage costs via aggressive compression.
- When you are using OpenRouter and want to archive every request/response for audit or fine-tuning purposes.
When not to use it¶
- For small-scale applications where a standard relational database (like PostgreSQL) can handle the analytical load.
- If your primary requirement is high-frequency individual row updates (use an OLTP database instead).
- If you do not have the operational capacity to manage a specialized analytical database and a SaaS solution like Langfuse or Datadog suffices.
Getting started¶
Installation (via Docker)¶
For a quick local setup:
docker run -d \
--name clickhouse-server \
-p 8123:8123 \
-p 9000:9000 \
-v clickhouse_data:/var/lib/clickhouse \
clickhouse/clickhouse-server
OpenRouter Trace Schema¶
A standard schema for receiving broadcasted logs from OpenRouter:
CREATE TABLE IF NOT EXISTS OPENROUTER_TRACES (
timestamp DateTime64(3, 'UTC'),
id String,
model String,
app_id Nullable(String),
user_id Nullable(String),
prompt_tokens UInt32,
completion_tokens UInt32,
total_tokens UInt32,
total_cost Float64,
latency_ms Float64,
status String,
request String,
response String,
-- Sorting key optimized for time-based filtering
INDEX idx_model model TYPE minmax GRANULARITY 3
) ENGINE = MergeTree()
ORDER BY (timestamp, model);
Python Query Example¶
Using the clickhouse-connect library to analyze model costs:
import clickhouse_connect
# Connect to the local server
client = clickhouse_connect.get_client(host='localhost', port=8123)
# Find models with the highest average latency
query = """
SELECT
model,
avg(latency_ms) as avg_latency,
count() as request_count
FROM OPENROUTER_TRACES
GROUP BY model
HAVING request_count > 100
ORDER BY avg_latency DESC
LIMIT 5
"""
result = client.query(query)
for row in result.result_rows:
print(f"Model: {row[0]} | Avg Latency: {row[1]:.2f}ms | Requests: {row[2]}")
Related tools / concepts¶
- OpenRouter - Primary source for streaming LLM logs.
- Langfuse - Observability platform that can use ClickHouse for analytics.
- Snowflake - Enterprise cloud data warehouse alternative.
- Datadog - SaaS observability platform.
- OpenTelemetry Collector - Recommended tool for ingesting OTel data into ClickHouse.
- PostHog - Product analytics suite built on top of ClickHouse.
- S3 / S3-Compatible Storage - Often used for ClickHouse data tiering and backups.
- Helicone - AI Gateway that uses ClickHouse for its analytics dashboard.
Sources / references¶
Contribution Metadata¶
- Last reviewed: 2026-05-11
- Confidence: high