Paper Notes: Amazon Redshift and the Case for Simpler Data Warehouses

Amazon Redshift and the Case for Simpler Data Warehouses, SIGMOD 2015. Key Design: Redshift is a fast, fully managed, petabyte-scale data warehouse solution that makes it simple and cost-effective to analyze large volumes of data. It uses familiar data warehousing techniques, including columnar layout, per-column compression, co-locating compute and data, co-locating joins, compilation to machine code, and scale-out MPP processing. It also had a number of additional design goals: ...

April 27, 2025 · 4 min · Jin Cong Ho

Paper Notes: ClickHouse - Lightning Fast Analytics for Everyone

ClickHouse - Lightning Fast Analytics for Everyone, PVLDB 2024. ClickHouse is an OLAP database designed for high-performance analytics over petabyte-scale data sets with high ingestion rates. Key Design: ClickHouse is designed to address five key challenges of modern analytical data management: (1) huge data sets with high ingestion rates; (2) many simultaneous queries with an expectation of low latency, where queries are either ad hoc or recurring, and pruning techniques allow frequent queries to be optimized; (3) managing shared system resources; ...

April 27, 2025 · 3 min · Jin Cong Ho

Paper Notes: The Snowflake Elastic Data Warehouse

The Snowflake Elastic Data Warehouse, ACM 2016. Key Design: Snowflake is an enterprise-ready data warehousing solution for the cloud. The cloud promises increased economies of scale, extreme scalability and availability, and a pay-as-you-go cost model, but these benefits can only be captured if the software itself is able to scale elastically over the pool of commodity resources in the cloud. Meanwhile, SaaS brings enterprise-class systems to users who previously could not afford them. Snowflake's key features include: a relational model, support for semi-structured data, elastic compute and storage, high availability, durability, cost efficiency, and security. ...

April 24, 2025 · 5 min · Jin Cong Ho