In short

For thirty years, every serious analytical database — Teradata in 1984, Vertica in 2005, Hadoop in 2006, the original Redshift in 2012 — assumed that storage and compute belonged on the same machine. Each worker node owned a shard of disk, and to scale you bought more nodes, which meant buying more of both whether you needed it or not. This produced two production headaches that everyone in data engineering knew but nobody could fix: workloads contended (the BI team's dashboard query stalled the ETL pipeline reading the same disks) and capacity was wasted (a cluster sized for the nightly 4-hour batch ran 24/7 because there was nowhere to put the data when the compute was off). In 2014 Snowflake bet that S3 — durable, cheap, infinitely scalable, network-accessible — had eliminated the technical reason for the coupling. They put the data in S3, kept the metadata in a custom transactional service (built on FoundationDB), and made compute a fleet of ephemeral virtual warehouses — clusters you spin up for a query and pay for by the second. Independent scaling, workload isolation, pause-when-idle, time-travel via cheap object-store snapshots, and zero-copy clones all fell out of the architecture for free. The pattern is now industry-standard: BigQuery had it from day one (Colossus + Dremel), Databricks built the Lakehouse on the same shape (Delta Lake on S3 + Photon), AWS retrofitted Redshift with RA3 nodes and Redshift Serverless, Athena is pure compute against your S3 bucket, and DuckDB-WASM in chapter 127 is the extreme case where your laptop is the warehouse. This chapter derives the architecture, walks through the new problems it created (object-store latency, transactional metadata, scan-cost surprises), and sizes the cost saving for a representative Indian e-commerce BI workload.

You have spent the last two chapters in Build 16 designing the warehouse — star schemas, dimensions, slowly changing dimensions, the grain decision. The implicit assumption was that the warehouse runs on a cluster, and the cluster has disks, and the disks have your data on them. That assumption was true for every serious analytical database from 1984 to roughly 2015, and it shaped everything: the size of your cluster, the cost of your bills, the latency of your queries, and — most painfully — the political negotiations between the BI team and the data engineering team about whose query got to run first.

Then a startup that nobody had heard of, founded in 2012 by veterans of Oracle and Vectorwise and named Snowflake after the schema variant that nobody used, made a bet. The bet was that S3 had quietly eliminated the technical reason for tying compute to storage, and that whoever built the first warehouse around that observation would reset the economics of analytics for the next decade. They published the architecture in a SIGMOD 2016 paper called The Snowflake Elastic Data Warehouse, IPO'd at a $70 billion valuation in 2020, and forced every other vendor to copy the shape. This chapter is about the bet, the shape, and the new problems it created.

The classical model: storage and compute glued together

Walk into a data centre running Vertica or Greenplum or pre-2018 Redshift in 2014. You see a rack of machines, maybe 16 of them, each one a 2U server with 24 cores, 256 GB RAM, and 16 spinning disks of 4 TB each. Total cluster capacity: 384 cores, 4 TB RAM, 1 PB of disk. The data is sharded across the disks — fact_sales partitioned by date_id, with the current year's data on nodes 1-4, the previous year's on nodes 5-8, and so on. When a query runs, the planner sends a fragment to every node; each node reads from its local disks (fast — sub-millisecond seeks on the SSDs that arrived later, single-digit milliseconds on spinners), runs its share of the work, and ships partial results to a coordinator.

This shape — shared-nothing, in Stonebraker's 1986 taxonomy — was the right answer for the hardware available. Network bandwidth between machines (1 Gbit/s, later 10 Gbit/s) was an order of magnitude slower than the aggregate bandwidth of a node's local disks, so you wanted compute next to its data. Hadoop took the same idea and made it the headline feature: move the computation to the data. MapReduce schedulers tried hard to assign each map task to the node that already held the input block.

[Figure — coupled architecture (Hadoop, Vertica, classical MPP): a coordinator dispatches query fragments to four workers, each a CPU + RAM box welded to its own disk shard holding one slice of the date range. To add 1 TB of capacity you buy a whole node; to add CPU for a burst query you buy disks too; you cannot pause, because the data lives on those disks.]
Coupled architecture (Hadoop, Vertica, classical Redshift). Each worker is a CPU + disk unit; scaling means adding both together.

The coupling produced three operational pathologies that everyone running a warehouse in 2014 knew by heart:

Forced bundled scaling. You bought the cluster sized for one of compute or storage, and overprovisioned the other. A telco running fraud detection had 50 TB of call records but needed every CPU it could get to run the ML jobs — so it bought a 32-node cluster with 1.5 PB of disk it would never use. A media company had 800 TB of historical viewer events but ran maybe two queries an hour against them — so it bought a 60-node cluster with 1500 cores it could not keep busy.

Workload contention. Every cluster had exactly one fleet of CPUs and exactly one set of disks, so every team's queries fought for the same resources. The classic 9 AM scenario: BI analyst opens a Tableau dashboard and kicks off a 10-minute scan of fact_sales; data engineer's hourly ETL job runs at 9:05 and queues behind it; the dashboard renders, the ETL drains the queue, the next dashboard query collides with the ETL — and so on, all day. Workload management features (Vertica's resource pools, Redshift's WLM queues) tried to mitigate this with priorities and CPU shares, but the contention was structural: there was one set of disks and they were shared.

No pause. When the cluster was idle (nights, weekends, between batch windows), the disks still held the data, so the cluster still had to be running to make the data accessible. You paid for 168 hours a week of compute even if you used 40 of them. Cloud-hosted variants — early Redshift, EMR — let you tear the cluster down, but then you lost the data and had to reload from S3 on restart, which made teardown unattractive.

Why coupling made sense in 1986 and not in 2014: shared-nothing architecture was invented when network bandwidth was a tenth of disk bandwidth and S3 did not exist. By 2014, 10 Gbit/s networks were standard in datacentres (40 Gbit/s in cloud datacentres), S3 offered 11 nines of durability and effectively unbounded throughput, and the bandwidth gap had inverted in some cases — pulling Parquet from S3 to a worker over 10 Gbit/s was faster than reading the same data from a single spinning disk. The technical justification for storage-on-the-worker had quietly evaporated; only inertia kept the architecture in place.

The bet: put the data in S3, make compute disposable

Snowflake's founders — Benoit Dageville, Thierry Cruanes, and Marcin Zukowski (who had built Vectorwise earlier) — looked at this picture in 2012 and made a different design choice. The data, they decided, would live in S3, full stop. Not as a backup, not as a cold tier, but as the primary, sole, authoritative copy. Compute would be a fleet of ephemeral clusters — what Snowflake calls virtual warehouses — that you spin up to run a query and tear down when you are done. Each virtual warehouse pulls the data it needs from S3, caches what it can locally for the duration of its life, and bills by the second.

The architecture is a three-layer cake. The Snowflake architecture documentation names the layers: database storage (the data itself, held as compressed columnar micro-partitions in cloud object storage), query processing (the virtual warehouses), and cloud services (parsing, optimisation, transactions, metadata, and access control). The diagram below shows the shape:

[Figure — separated architecture (Snowflake, BigQuery): one shared S3/GCS object store of immutable columnar micro-partitions (11 nines of durability, pay per GB-month, single source of truth) read concurrently by independent compute clusters of different sizes — Warehouse A (Small, 2 nodes, BI dashboards 9am-9pm), Warehouse B (XLarge, 16 nodes, nightly ETL 2-6am), Warehouse C (Large, 8 nodes, ad-hoc data science) — each with its own local cache. Scale storage and compute independently; pause any warehouse to $0; add a new warehouse in seconds without re-sharding data.]
Separated architecture (Snowflake, BigQuery, Databricks Lakehouse). The object store is the single source of truth; compute clusters are ephemeral and independent.

What falls out of this layout is striking, and explains the marketing pitch that won Snowflake $70 billion of market cap.

Independent scaling. You add storage by writing more bytes to S3 — there is no node to provision. You add compute by spinning up a bigger warehouse — there is no data to re-shard. The two axes that were welded together for thirty years come apart. A team with 500 TB of historical data and one analyst running one query a day pays for 500 TB of S3 ($12 per TB-month, so ~$6,000/month) and a few warehouse-hours, not for a 50-node cluster sized to hold 500 TB.

Workload isolation. This is the feature that ends the political war between BI and data engineering. The BI team owns Warehouse A (Medium, auto-suspend 5 min). The data science team owns Warehouse B (Large, manual control). The ETL pipeline owns Warehouse C (XLarge, runs 2-6 AM only). All three teams query the same fact_sales table because it lives in S3, and none of them contend with each other for CPU, RAM, or local-disk IO. A heavy ad-hoc scan in Warehouse B does not slow down a dashboard refresh in Warehouse A. They are physically different fleets of EC2 instances reading the same S3 bucket.

Why isolation works without coordination: the data in S3 is immutable. Once a micro-partition is written, no warehouse can modify it; updates produce new micro-partitions and update the metadata layer atomically. So all the warehouses can read the same files concurrently without locks, without consensus, without coordination — exactly the property S3 was designed to give. The transactional metadata layer (FoundationDB) handles the rare case where two warehouses commit conflicting writes; it serialises them through a single source of truth, while readers consult it only for the current micro-partition list and never take a lock.

Pause when idle. When a warehouse auto-suspends, you pay $0 for compute. The data is still in S3, accessible immediately when the next query comes in. A cluster that is busy 8 hours a day costs you 8 warehouse-hours, not 24. For typical BI workloads — bursty during work hours, near-zero overnight and weekends — this is a 3-5x reduction in compute spend.
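
Both the isolation and the pause-when-idle behaviour are plain warehouse DDL in Snowflake. A minimal sketch — the warehouse names are invented for this example, the parameters are standard:

```sql
-- BI team's warehouse: modest size, suspends to $0 after 5 minutes idle,
-- wakes automatically on the next query
CREATE WAREHOUSE bi_wh
  WAREHOUSE_SIZE      = 'MEDIUM'
  AUTO_SUSPEND        = 300     -- seconds of idle before suspending
  AUTO_RESUME         = TRUE
  INITIALLY_SUSPENDED = TRUE;

-- ETL team's warehouse: four sizes larger, reads the same tables,
-- shares zero CPU, RAM, or local disk with bi_wh
CREATE WAREHOUSE etl_wh
  WAREHOUSE_SIZE      = 'XLARGE'
  AUTO_SUSPEND        = 60
  AUTO_RESUME         = TRUE
  INITIALLY_SUSPENDED = TRUE;
```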

Time travel and zero-copy clones. Because data lives in immutable S3 objects and the metadata layer tracks which objects were live at which point in time, Snowflake can offer two features that would have been impossible on a coupled cluster. Time travel lets you query a table as of any timestamp in the last N days (SELECT * FROM orders AT (TIMESTAMP => '2026-04-20 09:00')) — the metadata layer just hands the warehouse the micro-partition list as of that timestamp. Zero-copy clones let you create a full logical copy of a table or database (CREATE TABLE orders_dev CLONE orders) by copying only the metadata pointers, not the underlying micro-partitions. A clone of a 100 TB table takes a few seconds and adds zero bytes of storage; only when you modify the clone do new micro-partitions get written.
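
Both features are one line of SQL each. A sketch using the orders table from above — the timestamp and table names are illustrative:

```sql
-- Time travel: the table as it looked at a specific moment (within the retention window)
SELECT *
FROM orders AT (TIMESTAMP => '2026-04-20 09:00:00'::TIMESTAMP_LTZ);

-- Or relative to now: the state exactly one hour ago
SELECT COUNT(*) FROM orders AT (OFFSET => -3600);

-- Zero-copy clone: metadata pointers only, seconds to run, zero bytes added
CREATE TABLE orders_dev CLONE orders;
```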

Many warehouses, one source of truth

The third diagram is the one that closes the deal for most enterprises. Once compute is decoupled from storage and a warehouse can spin up in seconds, you stop having one warehouse and start having one warehouse per workload. Each team picks a size that matches its query latency target; each team picks an auto-suspend window that matches its bursty pattern; each team's bill is its own, billed to its own cost centre, visible in its own dashboard.

[Figure — multi-cluster architecture: five team warehouses — Marketing (XSmall, campaign queries, ~$0.50/h), Finance BI (Medium, close-of-month, ~$4/h), Data Science (Large, ML training, ~$8/h), Engineering ETL (XLarge, 2-6am batch, ~$32/h × 4h), Executive (Small, CEO dashboard, ~$2/h) — all reading the same shared S3 bucket: fact_sales (100 TB) and dim_* (200 GB). No contention, no locks, no re-sharding; each team's bill is its own, and ETL's nightly batch runs on a fleet that did not exist five minutes ago.]
Multi-cluster architecture. Five teams, five independently sized warehouses, one shared S3 dataset. Each team's spend, latency, and reliability are independent of every other team's.

Why this changes organisation design, not just architecture: the political question "whose query gets to run during business hours" was the dominant friction in 2014-era data teams, and it had no good technical answer because there was only one cluster. Decoupled compute reduces it to a billing question, which finance teams already know how to solve with cost centres and chargebacks. The platform team stops being a queue manager and starts being a librarian — guarantee the data is correct, fresh, and discoverable; let each team buy whatever compute it needs.
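
Chargeback then becomes a query rather than a meeting. A sketch against Snowflake's account-usage metering view (the view is real; the $2/credit rate is the illustrative figure used later in this chapter):

```sql
-- Credits burned per warehouse over the last 30 days, for per-team billing
SELECT warehouse_name,
       SUM(credits_used)        AS credits,
       SUM(credits_used) * 2.00 AS approx_usd   -- assumes $2/credit
FROM snowflake.account_usage.warehouse_metering_history
WHERE start_time >= DATEADD('day', -30, CURRENT_TIMESTAMP())
GROUP BY warehouse_name
ORDER BY credits DESC;
```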

The new problems the architecture created

Decoupling solved the three coupled-architecture pathologies, but it created three new ones that the Cloud Data Warehouse Maturity Model and a decade of practitioner experience now name explicitly.

Object-store latency. S3 GET latency is 10-100 ms per request, versus sub-millisecond for a local SSD. A query that scans a thousand micro-partitions cold from S3 spends a measurable fraction of its time waiting on TCP. Snowflake fights this with two layers of caching: each warehouse node keeps a local NVMe SSD cache of recently-read micro-partitions (so a hot table is essentially memory-speed after the first query), and the cloud-services layer keeps a result cache (so identical repeated queries return in milliseconds). The first run of a new query against a cold dataset is slow; the second is fast; this is the bargain.
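
You can watch both caches from a worksheet. A sketch (table name illustrative): run the same aggregate twice — the second run comes back from the result cache in milliseconds — then disable the result cache for the session to time the warehouse's local SSD cache on its own:

```sql
-- First run: cold — micro-partitions are fetched from S3 and cached on local SSD
SELECT product_id, SUM(amount) FROM fact_sales GROUP BY product_id;

-- Second run: identical text, answered by the cloud-services result cache
SELECT product_id, SUM(amount) FROM fact_sales GROUP BY product_id;

-- Bypass the result cache to measure the warehouse-local cache by itself
ALTER SESSION SET USE_CACHED_RESULT = FALSE;
SELECT product_id, SUM(amount) FROM fact_sales GROUP BY product_id;
```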

Transactional metadata. Hadoop's HDFS had real POSIX-ish semantics — rename was atomic, directory listing was consistent. S3 does not: for years it was only eventually consistent (strong read-after-write consistency arrived in late 2020), and concurrent writes still need explicit coordination if you want serialisability. So the warehouse needs a separate transactional metadata service to record "what is the current version of table T", "which micro-partitions belong to it", "what was its state at timestamp X". Snowflake built one on FoundationDB; Databricks open-sourced Delta Lake (a transaction log on S3); Apache Iceberg and Apache Hudi are alternative open-source approaches. None of them are free; all of them are simpler than rebuilding HDFS.

Cost-model surprises. When compute is metered by the second and storage by the GB-month, your bill becomes a function of your queries, not your cluster. A new analyst who writes SELECT * FROM events ORDER BY timestamp on a 50 TB table can rack up a four-figure bill in an afternoon. BigQuery's per-byte-scanned pricing is even more direct: a poorly-written query that scans a partition it didn't need to scan costs real money. Modern Snowflake and BigQuery deployments need cost guardrails — query timeouts, credit quotas per warehouse, per-user spending alerts, and SQL linters that warn on SELECT * over large tables. The cost is real; it is just visible at a different granularity than before.
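
In Snowflake the guardrails are first-class objects. A sketch — the quota and timeout values are arbitrary, the statements are standard:

```sql
-- Kill any single statement that runs longer than 10 minutes on the BI warehouse
ALTER WAREHOUSE bi_wh SET STATEMENT_TIMEOUT_IN_SECONDS = 600;

-- Monthly credit budget: notify at 75% consumed, suspend the warehouse at 100%
CREATE RESOURCE MONITOR bi_monthly_quota
  WITH CREDIT_QUOTA    = 200
       FREQUENCY       = MONTHLY
       START_TIMESTAMP = IMMEDIATELY
  TRIGGERS ON 75  PERCENT DO NOTIFY
           ON 100 PERCENT DO SUSPEND;

ALTER WAREHOUSE bi_wh SET RESOURCE_MONITOR = bi_monthly_quota;
```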

The pattern is now industry-standard

Snowflake pioneered the shape commercially, but they were not alone in the bet. By 2026 the architecture is the default for any new analytical system.

Google BigQuery had separated storage and compute from its 2010 launch, predating Snowflake. Storage is Colossus (Google's successor to GFS), compute is Dremel (a serverless query engine that allocates "slots" on demand). The user does not provision clusters at all; they either reserve slots (units of CPU + RAM) at a flat rate or pay per byte scanned, and Dremel dispatches their query across thousands of cores transparently. The two systems converged on the same architecture from opposite directions: Snowflake started with the cluster model and made it elastic; BigQuery started serverless and later added reserved capacity for predictable spend.

Databricks built the Lakehouse on the same shape: data in S3 as Parquet files governed by a Delta Lake transaction log, compute as Spark clusters (or the newer C++ Photon engine), and a metadata catalog (Unity Catalog) tying it together. The pitch is that the same data layer can serve both ML/data-science workloads (which want raw files) and BI workloads (which want SQL warehouse semantics).

AWS retrofitted Redshift twice. The 2019 RA3 node type introduced "managed storage" — data lives in S3 and is cached on local NVMe in the cluster, decoupling storage scaling from compute. The 2021 Redshift Serverless launch went further: no nodes at all, just RPUs (Redshift Processing Units) you pay for by the second, with auto-scaling. Both were defensive responses to Snowflake taking large enterprise accounts.

Athena is the extreme case in the AWS lineup: pure compute (a managed Presto/Trino fleet) that runs SQL directly against your S3 bucket, with no warehouse to manage and no data to load. You write a CREATE EXTERNAL TABLE over your existing files and pay $5 per TB scanned. There is no "cluster" at all from the user's point of view; storage and compute are not just separated, the compute is invisible.
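
The whole "load" step is one DDL statement over files that already exist. A sketch in Athena's Hive-flavoured DDL — the bucket path and columns are invented for illustration:

```sql
-- Point a table at existing Parquet files in S3; nothing is copied or loaded
CREATE EXTERNAL TABLE events (
  user_id    string,
  event_type string,
  amount     double,
  ts         timestamp
)
PARTITIONED BY (dt string)
STORED AS PARQUET
LOCATION 's3://analytics-raw/events/';

-- Register the partitions already present under the prefix
MSCK REPAIR TABLE events;

-- Billed per TB scanned: the dt predicate prunes partitions, and the bill with them
SELECT event_type, COUNT(*) FROM events WHERE dt = '2026-04-20' GROUP BY event_type;
```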

And DuckDB-WASM (which you met in chapter 127) is the architectural endpoint: the compute is your laptop, in the browser tab, and the storage is whatever Parquet file is on the open internet. The same separation pattern, miniaturised, with the caching layer being your browser cache and the metadata layer being an HTTP range-read into the file's footer.
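
The same pattern fits in a single process. A DuckDB sketch (the URL is hypothetical): the httpfs extension range-reads the Parquet footer to learn the schema, then fetches only the row groups and columns the query touches:

```sql
INSTALL httpfs;   -- one-time setup for reading over HTTP(S)
LOAD httpfs;

-- Storage: a Parquet file somewhere on the internet. Compute: this process.
SELECT event_type, COUNT(*) AS n
FROM read_parquet('https://example.com/datasets/events.parquet')
GROUP BY event_type
ORDER BY n DESC;
```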

The bet that storage and compute belong on different machines won completely. The interesting questions in 2026 are about the consequences — open table formats (Iceberg vs Delta vs Hudi), governance across warehouses, multi-cloud portability, and the rise of the data mesh organisational pattern that separation makes possible.

Worked example: sizing an Indian e-commerce BI workload

Costing the BI workload at a mid-sized Indian e-commerce company

A direct-to-consumer fashion brand based in Bangalore runs its analytical workload on Snowflake. The data team profiles the workload and finds two distinct shapes:

Day shift (9 AM - 5 PM IST, weekdays). The BI team — three analysts and a head of analytics — runs Tableau dashboards against fact_orders, dim_product, dim_customer. The dashboards aggregate over the last 90 days of data, roughly 100 GB of compressed Parquet. Each query scans a few hundred MB after pruning, returns in 1-3 seconds, and a Medium warehouse (4 nodes, 64 cores total) handles the concurrency comfortably. The warehouse is set to auto-suspend after 5 minutes of idle.

Night batch (2 AM - 6 AM IST, daily). The data engineering team's dbt project rebuilds aggregate tables from the previous day's raw events. Total scan: roughly 10 TB across events, sessions, clickstream, joined to dimensions. An XLarge warehouse (16 nodes, 256 cores) finishes the run in 4 hours. Outside this window, the warehouse is fully suspended.
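
One way to pin the batch to its window is a scheduled task that wakes the suspended warehouse at 2 AM and lets auto-suspend put it back to sleep when the run ends. A sketch — the warehouse, task, and procedure names are hypothetical stand-ins for the dbt job:

```sql
CREATE TASK nightly_rebuild
  WAREHOUSE = etl_xl
  SCHEDULE  = 'USING CRON 0 2 * * * Asia/Kolkata'   -- 02:00 IST, daily
AS
  CALL rebuild_daily_aggregates();   -- stand-in for the dbt build

ALTER TASK nightly_rebuild RESUME;   -- tasks are created suspended
```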

Snowflake costs (illustrative US-East rates, 2026):

  • Medium warehouse: 4 credits/hour × $2/credit = $8/hour
  • XLarge warehouse: 16 credits/hour × $2/credit = $32/hour
  • Day-shift utilisation: ~6 productive hours/day after auto-suspend savings → 6 × $8 = $48/day
  • Night-batch: 4 hours × $32 = $128/day
  • Total compute: $176/day, ~$5,300/month
  • Storage: 50 TB compressed historical data at $23/TB-month = $1,150/month
  • Grand total: ~$6,450/month

The coupled-cluster alternative. To run the same workloads on a shared-nothing Vertica or pre-RA3 Redshift cluster, the cluster has to be sized for the peak — the XLarge-equivalent that handles the 4-hour batch — and it has to run 24/7 because the data lives on its disks. An equivalent always-on cluster (16 nodes, 256 cores, 50 TB usable) at AWS reserved-instance pricing comes to roughly $32/hour × 24 hours × 30 days ≈ $23,000/month, and the BI workload still contends with the ETL during the day.

Net saving: roughly 3.5x ($23,000 against $6,450 a month), plus workload isolation (BI never stalls on ETL), plus instant elasticity (the team can spin up a Large warehouse for an ad-hoc data science project without disturbing anyone), plus pay-zero-on-weekends. This is the economics that took Snowflake from a 2014 startup to a $70 billion IPO in six years; the maths is genuinely better, not just marketed better.

The catch: a junior analyst who writes SELECT * FROM fact_orders ORDER BY order_date (no WHERE clause, no LIMIT) can keep the Medium warehouse churning all afternoon and quietly add an extra $50 to the day's bill if nobody is watching. The cost-model surprises are real, and every Snowflake deployment of any size needs query budgets, statement timeouts, and a SQL review process for newly-onboarded users. The architecture moves the failure mode from "the cluster crashes" to "the bill arrives"; both need engineering attention, just of different kinds.

What you should take away

The decoupling of storage and compute is the single most important architectural shift in analytical data infrastructure since the invention of MapReduce. Every chapter you read going forward in this Build — data lakes (chapter 132), open table formats like Iceberg (chapter 133), and the lakehouse synthesis — assumes the separation as the starting point. The chapter that follows (132) will look at what happens when you push the architecture even further: drop the warehouse layer entirely and let your raw, unprocessed data sit in S3 as the primary analytical surface, with multiple compute engines reading the same files. That is the data lake, and it is the natural endpoint of the bet that Snowflake placed in 2014.

References

  1. Dageville, B., Cruanes, T., Zukowski, M., et al. (2016). The Snowflake Elastic Data Warehouse. SIGMOD 2016. [link]
  2. Snowflake. Key Concepts and Architecture. docs.snowflake.com
  3. Melnik, S., et al. (2010). Dremel: Interactive Analysis of Web-Scale Datasets. VLDB 2010. [link]
  4. Snowflake. The Cloud Data Warehouse Maturity Model. [link]
  5. Armbrust, M., et al. (2021). Lakehouse: A New Generation of Open Platforms that Unify Data Warehousing and Advanced Analytics. CIDR 2021. [link]
  6. AWS. Amazon Redshift RA3 Nodes with Managed Storage. [link]