ClickHouse is an OLAP database designed for high-performance analytics over petabyte-scale data sets with high ingestion rates.
1 Key Design
ClickHouse is designed to address five key challenges of modern analytical data management:
Huge data sets with high ingestion rates
Many simultaneous queries with an expectation of low latency: both ad-hoc and recurring queries; pruning techniques optimize frequent queries, and shared system resources must be managed.
Diverse landscapes of data stores, storage locations and formats
A convenient query language with support for performance introspection, aka SQL.
Industry-grade robustness (data replication) and versatile deployment (native binaries).
2 Architecture
The ClickHouse database engine can be operated in on-premise, cloud, standalone, or in-process (chDB) mode.
2.1 Query Processing
ClickHouse parallelizes queries at the level of:
- data elements: multiple data elements are processed within an operator at once using SIMD
  - vectorized execution model similar to MonetDB/X100
- data chunks: each node executes operators simultaneously in multiple threads; operators work on data chunks and cycle through three states:
  - need-chunk → ready: a chunk is placed in the operator's input port
  - ready → done: the operator processes the input chunk and generates an output chunk
  - done → need-chunk: the output chunk is removed from the operator's output port
- table shards: multiple nodes can scan the shards of a table simultaneously
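The chunk-level state machine above can be sketched as a minimal pipeline in Python (a simplified illustration with hypothetical operator names, not ClickHouse's actual C++ processor implementation):

```python
from collections import deque

class Operator:
    """Toy operator cycling through need-chunk -> ready -> done, as in the notes."""
    def __init__(self, fn):
        self.fn = fn
        self.input_port = deque()   # chunks waiting to be processed
        self.output_port = deque()  # produced chunks not yet consumed
        self.state = "need-chunk"

    def push(self, chunk):
        # need-chunk -> ready: a chunk is placed in the operator's input port
        self.input_port.append(chunk)
        self.state = "ready"

    def work(self):
        # ready -> done: process the input chunk, generate an output chunk
        self.output_port.append(self.fn(self.input_port.popleft()))
        self.state = "done"

    def pull(self):
        # done -> need-chunk: the output chunk is removed from the output port
        out = self.output_port.popleft()
        self.state = "need-chunk"
        return out

# A two-operator pipeline over chunks of rows (hypothetical example data):
scan = Operator(lambda chunk: [x * 2 for x in chunk])       # e.g. a projection
filt = Operator(lambda chunk: [x for x in chunk if x > 4])  # e.g. a filter

result = []
for chunk in ([1, 2, 3], [4, 5]):
    scan.push(chunk); scan.work()
    filt.push(scan.pull()); filt.work()
    result.extend(filt.pull())
print(result)  # [6, 8, 10]
```

In the real engine many such operators run concurrently in multiple threads, each exchanging chunks through its ports; the state machine lets a scheduler know which operators have work to do.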
2.1.1 Holistic Performance Optimization
Query optimization: constant folding, extracting scalars from certain aggregation functions, common subexpression elimination, transforming disjunctions of equality filters into IN-lists
- logical plan: filter pushdown, reordering of function evaluation and sorting steps
- physical plan: exploiting the particularities of the involved table engines
Query compilation: fuses adjacent plan operators into compiled code at runtime
Primary key index evaluation: the sparse primary key index over sorted data narrows scans to the granules that can match the filter
Data skipping: skip indexes (e.g., min-max statistics per granule) avoid reading irrelevant data
Hash tables: specialized hash table variants are selected depending on key types
Joins: multiple join algorithms (e.g., hash-based and sort-merge variants)
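The effect of data skipping can be illustrated with a small sketch (a hypothetical granule layout; ClickHouse's real skip indexes operate on its on-disk granules):

```python
# Each granule stores (min, max) statistics over a column; a filter like
# `x = v` only needs to read granules whose range can contain v.
granules = [
    {"rows": [1, 3, 5],  "min": 1,  "max": 5},
    {"rows": [8, 9, 12], "min": 8,  "max": 12},
    {"rows": [20, 21],   "min": 20, "max": 21},
]

def scan_eq(granules, v):
    """Return matching rows and the number of granules actually read."""
    hits, scanned = [], 0
    for g in granules:
        if g["min"] <= v <= g["max"]:  # otherwise the granule is pruned
            scanned += 1
            hits.extend(r for r in g["rows"] if r == v)
    return hits, scanned

hits, scanned = scan_eq(granules, 9)
print(hits, scanned)  # [9] 1 -- two of the three granules were skipped
```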
2.1.2 Workload Isolation
Concurrency control prevents thread oversubscription: the number of worker threads per query is adjusted dynamically based on a specified ratio to the number of available CPU cores.
Memory usage limits: byte sizes of memory allocations are tracked at the server, user, and query levels, thereby allowing flexible memory usage limits.
I/O scheduling restricts local and remote disk accesses.
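The hierarchical memory accounting described in the notes (server → user → query levels) can be sketched as follows (a simplified model with hypothetical limits, not ClickHouse's actual MemoryTracker code):

```python
class MemoryTracker:
    """Tracks allocated bytes against a limit and charges the parent level too."""
    def __init__(self, name, limit, parent=None):
        self.name, self.limit, self.parent = name, limit, parent
        self.used = 0

    def alloc(self, n):
        # Check this level's limit before touching any enclosing level,
        # so a failed allocation leaves no partial charges behind.
        if self.used + n > self.limit:
            raise MemoryError(f"{self.name}: limit {self.limit} exceeded")
        if self.parent:
            self.parent.alloc(n)
        self.used += n

    def free(self, n):
        self.used -= n
        if self.parent:
            self.parent.free(n)

server = MemoryTracker("server", limit=1000)
user = MemoryTracker("user", limit=400, parent=server)
query = MemoryTracker("query", limit=300, parent=user)

query.alloc(250)      # fits at every level: query, user, and server
try:
    query.alloc(100)  # would exceed the per-query limit of 300
except MemoryError as e:
    print(e)          # query: limit 300 exceeded
```

An allocation is accounted at every enclosing level, so a single runaway query hits its own limit before it can exhaust the per-user or server-wide budget.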
2.2 Storage
MergeTree* table engines: data parts are continuously merged by a background process
Special-purpose table engines
- dictionaries: reduce access latencies
- pure in-memory engine used for temporary tables
- distributed table engine for transparent data sharding
Virtual table engines: bidirectional data exchange with external systems
Sharding: partitions a table into a set of table shards, allowing processing of data sets that exceed the capacity of individual nodes and distributing the read-write load for a table over multiple nodes; shards can additionally be replicated for tolerance against node failures.
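Sharding by a key can be sketched as a hash-based router (hypothetical shard count and keys; in ClickHouse the distributed table engine handles this routing transparently):

```python
import hashlib

def shard_for(key, num_shards):
    # Stable hash so the same key always routes to the same shard
    h = int.from_bytes(hashlib.sha1(str(key).encode()).digest()[:8], "big")
    return h % num_shards

# Distribute six hypothetical row keys across three shards
shards = {i: [] for i in range(3)}
for row_key in ["user_1", "user_2", "user_3", "user_4", "user_5", "user_6"]:
    shards[shard_for(row_key, 3)].append(row_key)

# Every shard holds a disjoint subset of the rows; a distributed read
# scans all shards in parallel and merges their partial results.
print({i: len(rows) for i, rows in shards.items()})
```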
2.2.1 On-Disk Format
2.2.2 Data Pruning
2.2.3 Merge-time Data Transformation
2.2.4 Updates and Deletes
2.2.5 Idempotent Inserts
2.2.6 Data Replication
2.2.7 ACID Compliance
2.3 Integration
In the pull-based model, the database itself connects to remote data sources and pulls data for querying into local tables, or exports data to remote systems. This enables capabilities such as joins between local and remote data.
External Connectivity
- Temporary access with Integration Table Functions
- Persisted access: integration table engines, integration database engines, dictionaries
Data Formats
Compatibility interfaces: native binary wire protocol, HTTP, MySQL- and PostgreSQL-wire-protocol-compatible interfaces
3 Performance as a Feature
3.1 Built-in Performance Analysis Tools
Users interact with all tools through a uniform interface based on system tables.
Server and query metrics
Sampling profiler
OpenTelemetry integration
Explain query
3.2 Benchmarks
Denormalized Tables
Normalized Tables
4 Related Work
Druid and Pinot
Snowflake
Photon and Velox
DuckDB