ClickHouse is an OLAP database designed for high-performance analytics over petabyte-scale data sets with high ingestion rates.
1 Key Design
ClickHouse is designed to address five key challenges of modern analytical data management:
Huge data sets with high ingestion rates
Many simultaneous queries with an expectation of low latency: both ad-hoc and recurring queries; pruning techniques optimize frequent queries, and shared system resources must be managed.
Diverse landscapes of data stores, storage locations and formats
A convenient query language with support for performance introspection, aka SQL.
Industry-grade robustness (data replication) and versatile deployment (native binaries).
2 Architecture
The ClickHouse database engine can be operated in on-premise, cloud, standalone, or in-process (chDB) mode.
2.1 Query Processing
ClickHouse parallelizes queries at the level of:
- data elements: multiple data elements are processed within an operator at once using SIMD
  - vectorized execution model similar to MonetDB/X100
- data chunks: each node executes operators simultaneously in multiple threads; operators work on data chunks and cycle through three states:
  - need-chunk → ready: a chunk is placed in the operator's input port
  - ready → done: the operator processes the input chunk and generates an output chunk
  - done → need-chunk: the output chunk is removed from the operator's output port
- table shards: multiple nodes can scan the shards of a table simultaneously
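The chunk-level state machine above can be sketched as a minimal pipeline in Python (a simplified illustration with hypothetical operator names, not ClickHouse's actual C++ processor implementation):

```python
from collections import deque

class Operator:
    """Toy operator cycling through need-chunk -> ready -> done, as in the notes."""
    def __init__(self, fn):
        self.fn = fn
        self.input_port = deque()   # chunks waiting to be processed
        self.output_port = deque()  # produced chunks not yet consumed
        self.state = "need-chunk"

    def push(self, chunk):
        # need-chunk -> ready: a chunk is placed in the operator's input port
        self.input_port.append(chunk)
        self.state = "ready"

    def work(self):
        # ready -> done: process the input chunk, generate an output chunk
        self.output_port.append(self.fn(self.input_port.popleft()))
        self.state = "done"

    def pull(self):
        # done -> need-chunk: the output chunk is removed from the output port
        out = self.output_port.popleft()
        self.state = "need-chunk"
        return out

# A two-operator pipeline over chunks of rows (hypothetical example data):
scan = Operator(lambda chunk: [x * 2 for x in chunk])       # e.g. a projection
filt = Operator(lambda chunk: [x for x in chunk if x > 4])  # e.g. a filter

result = []
for chunk in ([1, 2, 3], [4, 5]):
    scan.push(chunk); scan.work()
    filt.push(scan.pull()); filt.work()
    result.extend(filt.pull())
print(result)  # [6, 8, 10]
```

In the real engine many such operators run concurrently in multiple threads, each exchanging chunks through its ports; the state machine lets a scheduler know which operators have work to do.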
2.1.1 Holistic Performance Optimization
Query optimization: constant folding, extracting scalars from certain aggregation functions, common subexpression elimination, transforming disjunctions of equality filters into IN-lists
- logical plan: filter pushdown, reordering of function evaluation and sorting steps
- physical plan: exploiting the particularities of the involved table engines
Query compilation: fuses adjacent plan operators into compiled code at runtime
Primary key index evaluation: the sparse primary key index over sorted data narrows scans to the granules that can match the filter
Data skipping: skip indexes (e.g., min-max statistics per granule) avoid reading irrelevant data
Hash tables: specialized hash table variants are selected depending on key types
Joins: multiple join algorithms (e.g., hash-based and sort-merge variants)
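The effect of data skipping can be illustrated with a small sketch (a hypothetical granule layout; ClickHouse's real skip indexes operate on its on-disk granules):

```python
# Each granule stores (min, max) statistics over a column; a filter like
# `x = v` only needs to read granules whose range can contain v.
granules = [
    {"rows": [1, 3, 5],  "min": 1,  "max": 5},
    {"rows": [8, 9, 12], "min": 8,  "max": 12},
    {"rows": [20, 21],   "min": 20, "max": 21},
]

def scan_eq(granules, v):
    """Return matching rows and the number of granules actually read."""
    hits, scanned = [], 0
    for g in granules:
        if g["min"] <= v <= g["max"]:  # otherwise the granule is pruned
            scanned += 1
            hits.extend(r for r in g["rows"] if r == v)
    return hits, scanned

hits, scanned = scan_eq(granules, 9)
print(hits, scanned)  # [9] 1 -- two of the three granules were skipped
```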
2.1.2 Workload Isolation
Concurrency control prevents thread oversubscription: the number of worker threads per query is adjusted dynamically based on a specified ratio to the number of available CPU cores.
Memory usage limits: byte sizes of memory allocations are tracked at the server, user, and query levels, thereby allowing flexible memory usage limits.
I/O scheduling restricts local and remote disk accesses.
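The hierarchical memory accounting described in the notes (server → user → query levels) can be sketched as follows (a simplified model with hypothetical limits, not ClickHouse's actual MemoryTracker code):

```python
class MemoryTracker:
    """Tracks allocated bytes against a limit and charges the parent level too."""
    def __init__(self, name, limit, parent=None):
        self.name, self.limit, self.parent = name, limit, parent
        self.used = 0

    def alloc(self, n):
        # Check this level's limit before touching any enclosing level,
        # so a failed allocation leaves no partial charges behind.
        if self.used + n > self.limit:
            raise MemoryError(f"{self.name}: limit {self.limit} exceeded")
        if self.parent:
            self.parent.alloc(n)
        self.used += n

    def free(self, n):
        self.used -= n
        if self.parent:
            self.parent.free(n)

server = MemoryTracker("server", limit=1000)
user = MemoryTracker("user", limit=400, parent=server)
query = MemoryTracker("query", limit=300, parent=user)

query.alloc(250)      # fits at every level: query, user, and server
try:
    query.alloc(100)  # would exceed the per-query limit of 300
except MemoryError as e:
    print(e)          # query: limit 300 exceeded
```

An allocation is accounted at every enclosing level, so a single runaway query hits its own limit before it can exhaust the per-user or server-wide budget.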
2.2 Storage
MergeTree* table engines: data parts are continuously merged by a background process
Special-purpose table engines
- dictionaries: reduce access latencies
- pure in-memory engine used for temporary tables
- distributed table engine for transparent data sharding
Virtual table engines: bidirectional data exchange with external systems
Sharding: partitions a table into a set of table shards, allowing processing of data sets that exceed the capacity of individual nodes and distributing the read-write load for a table over multiple nodes; shards can additionally be replicated for tolerance against node failures.
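Sharding by a key can be sketched as a hash-based router (hypothetical shard count and keys; in ClickHouse the distributed table engine handles this routing transparently):

```python
import hashlib

def shard_for(key, num_shards):
    # Stable hash so the same key always routes to the same shard
    h = int.from_bytes(hashlib.sha1(str(key).encode()).digest()[:8], "big")
    return h % num_shards

# Distribute six hypothetical row keys across three shards
shards = {i: [] for i in range(3)}
for row_key in ["user_1", "user_2", "user_3", "user_4", "user_5", "user_6"]:
    shards[shard_for(row_key, 3)].append(row_key)

# Every shard holds a disjoint subset of the rows; a distributed read
# scans all shards in parallel and merges their partial results.
print({i: len(rows) for i, rows in shards.items()})
```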
2.2.1 On-Disk Format
2.2.2 Data Pruning
2.2.3 Merge-time Data Transformation
2.2.4 Updates and Deletes
2.2.5 Idempotent Inserts
2.2.6 Data Replication
2.2.7 ACID Compliance
2.3 Integration
In the pull-based model, the database itself connects to remote data sources and pulls data for querying into local tables, or exports data to remote systems. This enables capabilities such as joins between local and remote data.
External Connectivity
- Temporary access with Integration Table Functions
- Persisted access: integration table engines, integration database engines, dictionaries
Data Formats
Compatibility interfaces: native binary wire protocol, HTTP, MySQL- and PostgreSQL-wire-protocol-compatible interfaces
3 Performance as a Feature
3.1 Built-in Performance Analysis Tools
Users interact with all tools through a uniform interface based on system tables.
Server and query metrics
Sampling profiler
OpenTelemetry integration
Explain query
3.2 Benchmarks
Denormalized Tables
Normalized Tables
4 Related Work
Druid and Pinot
Snowflake
Photon and Velox
DuckDB