Compute-storage separation for cost control
In April, the platform team at a Bengaluru fintech opened a Snowflake invoice for ₹47 lakh — up from ₹31 lakh in March. The growth team had spun up a dedicated WH_GROWTH_M warehouse to "make dashboards snappier" and set its auto-suspend to 24 hours, which the Snowflake UI happily allowed. For 22 of those 24 hours nobody ran a query against it. The warehouse sat warm, billing every credit-second, while the 12 TB of dim_user data it was meant to serve never moved a byte. Same data, same queries, ₹16 lakh extra — purely because the team conflated "I want my data ready" with "I want a compute cluster running". The lesson the team learned the expensive way is the lesson this chapter is about: the unit you pay for compute on must be different from the unit you pay for storage on, and that separation is the single biggest cost lever in modern data platforms.
Compute-storage separation is the architectural decision to keep table bytes in cheap object storage (S3/GCS/ADLS) and spin compute clusters up only when a query needs them. Done right, you pay ₹X for storage that exists 24/7 and ₹Y for compute that exists for the 90 minutes a day someone is querying it — instead of paying for a cluster that holds both. The hard parts are not the architecture, they are the metadata layer, the cold-start tax, and the cache-warming pattern that decides whether the separation actually saves money.
What the warehouse used to look like, and why it was expensive
Until roughly 2014, the analytical databases most teams ran packaged compute and storage in the same node. A Teradata appliance, a Netezza machine, a Vertica cluster, even early Redshift — all of them stored your data on disks that were physically attached to the servers running your queries. That coupling was the source of the bill: when you wanted more storage, you bought a bigger node; when you wanted more compute, you also bought a bigger node, because storage was bolted to it. A team with 100 TB of data and one query a day still paid for 100 TB worth of nodes humming continuously, because the bytes lived on those nodes' disks and turning the node off meant the bytes were unreachable.
The shape of the bill that came out of this architecture had three sharp consequences. The first was that idle was unaffordable — you could not turn the cluster off, even at 3 a.m. on a Sunday, because doing so would take your data offline. The second was that scale-up was lumpy — going from 8 nodes to 16 nodes for a one-hour Monday-morning report meant you were paying for 16 nodes the rest of the week. The third was that redundancy was expensive — the only way to make data fault-tolerant was to replicate it across the same expensive coupled nodes, often three times.
The unlock came from a quiet observation: object storage (S3 launched in 2006, GCS in 2010, ADLS a few years later) was finally cheap and durable enough that you could keep your warehouse data on it rather than on the warehouse's local disks. Why object storage changed the math: S3 offers eleven nines of durability at ~₹1.8/GB/month and is reachable from any compute layer over the network. The price per byte is roughly 1/8th that of a local SSD on a database node, and you don't pay for the CPU and RAM that come bolted to that SSD — you pay only for the bytes themselves. The moment data fits on S3, the question stops being "how big a node should I buy" and becomes "how often do I need a node at all". Snowflake (2014) made the separation its commercial pitch — separate billing for storage (S3 underneath, billed per TB) and compute (virtual warehouses, billed per credit-second) — and BigQuery (2010, on Colossus) had been doing it from the start. By 2024 the architecture is the default for analytical workloads; the Teradata-shaped coupled appliance is a legacy footprint, not a new install.
How the separation actually works under the hood
Three layers stack up in a separated architecture, and each layer has its own price meter and its own scaling rule. Understanding the layers is what lets you reason about cost.
The bottom layer is object storage. S3, GCS, ADLS, MinIO. It stores bytes — Parquet files, ORC files, whatever — at ₹1.4–1.9 per GB per month for hot tier, cheaper for infrequent-access tiers. It charges separately for requests (per 1000 GET/PUT) and for egress (per GB read out of the cloud region). It doesn't know what tables are; it just knows files. This layer is always on. It does not have a "suspend" mode because there's nothing to suspend — the bytes are at rest. Why this layer can be always-on cheaply: S3 is internally a distributed system that the cloud vendor amortises across millions of customers. Your 12 TB sits among petabytes of other customers' data on shared spinning disks (S3 Standard) or shared SSDs (S3 Express). You pay for the bytes you store, not for the slot on the disk; the vendor packs the slots tight and bills the marginal cost.
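To make the three meters on this layer concrete, here is a back-of-envelope sketch of a monthly storage bill; the per-GB, per-request, and egress rates below are illustrative assumptions in the rough range of published S3 Standard pricing, not quoted figures.
# storage_bill_sketch.py — rough monthly bill for the always-on storage layer (illustrative rates).
def storage_monthly_bill(stored_gb, get_requests, put_requests, egress_gb,
                         rate_per_gb=1.85,         # ₹/GB/month, hot tier (assumed)
                         rate_per_1k_get=0.035,    # ₹ per 1,000 GETs (assumed)
                         rate_per_1k_put=0.45,     # ₹ per 1,000 PUTs (assumed)
                         rate_egress_per_gb=7.5):  # ₹/GB read out of the region (assumed)
    bytes_at_rest = stored_gb * rate_per_gb
    requests = (get_requests / 1000) * rate_per_1k_get + (put_requests / 1000) * rate_per_1k_put
    egress = egress_gb * rate_egress_per_gb
    return {"bytes_at_rest": bytes_at_rest, "requests": requests,
            "egress": egress, "total": bytes_at_rest + requests + egress}
# 12 TB at rest, 40M GETs from scans, 2M PUTs from ingestion, no cross-region reads.
print(storage_monthly_bill(stored_gb=12_000, get_requests=40_000_000,
                           put_requests=2_000_000, egress_gb=0))
The bytes-at-rest line dominates as long as compute stays in the same region; the egress term is the one that blows up when it doesn't, which the going-deeper section returns to.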
The middle layer is the metadata / table format. Iceberg, Delta Lake, Hudi, or — inside a closed warehouse — Snowflake's FoundationDB-backed metadata service or BigQuery's Spanner-backed catalog. This layer turns "files in a bucket" into "tables with schemas, partitions, and snapshots". It is the layer that lets a query planner know "this table has 47 partitions, here is the manifest, here are the files for dt = '2026-04-24'" without scanning the bucket. The metadata layer is the part most engineers under-appreciate, because it's invisible when it works — but it is the entire reason the separation pays off. Without it, every query would have to do a LIST on the bucket (slow, expensive) and read every file's footer to figure out the schema (very slow, very expensive).
The top layer is compute — virtual warehouses, BigQuery slots, Spark clusters, Trino coordinators. These are the things that read bytes from the object store, run SQL operators on them, and return results. They are ephemeral: in Snowflake they spin up in 1–2 seconds from a hot pool and suspend after a configurable idle window (10 minutes by default, 60 seconds at the tightest). In BigQuery on-demand they are invisible (Google manages a global pool of slots and you just submit queries). In Spark on EMR they take 3–5 minutes to provision and are usually kept running for the duration of a job. The bill for this layer is what you turn on and off; it is the lever that makes the architecture actually save money.
The interaction between these three layers is what determines whether the architecture saves you money or not. A Snowflake warehouse running a query goes: receive SQL → consult metadata to find which files to read → S3 GET those files (paying egress and request fees) → run operators in compute → write results. If the warehouse runs the same query an hour later, modern warehouses cache the most recent result (Snowflake's result cache lives 24 hours) and don't even hit compute. If the result cache is stale but the data files haven't changed, the warehouse can still hit a local SSD cache of the recently-read files (Snowflake's "warehouse cache" — the only reason a warm warehouse is faster than a cold one for the same query). Cold queries pay full S3 egress and request cost; warm queries pay almost nothing because they hit the local cache.
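A minimal sketch of that cache hierarchy, assuming a result cache keyed by query text with a 24-hour TTL and a warehouse-local file cache that is discarded on suspend; the class and method names here are invented for illustration and are not any vendor's API.
# cache_hierarchy_sketch.py — result cache vs warm local cache vs cold object-store read.
import time
RESULT_TTL_S = 24 * 3600  # 24-hour result cache, simplified
class CachingWarehouse:
    def __init__(self):
        self.result_cache = {}  # query text -> (result, cached_at); survives suspend
        self.local_cache = {}   # file path -> bytes; lost when the warehouse suspends
    def suspend(self):
        self.local_cache.clear()  # the local SSD cache is tied to the running cluster
    def run(self, sql, files, read_from_object_store):
        hit = self.result_cache.get(sql)
        if hit and time.time() - hit[1] < RESULT_TTL_S:
            return hit[0], "result cache (no compute, no S3)"
        cold_reads, blocks = 0, []
        for path in files:
            if path not in self.local_cache:  # cache miss: pay an S3 GET per file
                self.local_cache[path] = read_from_object_store(path)
                cold_reads += 1
            blocks.append(self.local_cache[path])
        result = sum(len(b) for b in blocks)  # stand-in for real query operators
        self.result_cache[sql] = (result, time.time())
        return result, ("warm local cache" if cold_reads == 0 else f"cold: {cold_reads} S3 reads")
wh = CachingWarehouse()
fetch = lambda path: b"x" * 1024  # stand-in for an S3 GET
print(wh.run("select 1", ["a.parquet", "b.parquet"], fetch))  # cold: 2 S3 reads
print(wh.run("select 1", ["a.parquet", "b.parquet"], fetch))  # result cache, no compute
wh.suspend()
print(wh.run("select 2", ["a.parquet", "b.parquet"], fetch))  # cold again after suspend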
Building a tiny separated warehouse
To make the architecture concrete, here is a 60-line Python prototype of the separation. Storage is a directory on disk standing in for S3; metadata is a JSON file standing in for an Iceberg catalog; compute is a function that you invoke (or don't) on demand.
# tiny_separated_warehouse.py — see how compute and storage become billable separately.
import os, json, time, glob, datetime, csv
STORAGE_DIR = "./tiny_s3"          # stand-in for S3 — always on
CATALOG = "./catalog.json"         # stand-in for Iceberg manifest — always on
COMPUTE_LOG = "./compute_log.csv"  # how we bill compute — when it ran, for whom
# ---------- STORAGE LAYER (always on, charged by byte) ----------
def storage_cost_per_month(rate_per_gb=1.85):
    total_bytes = sum(os.path.getsize(p) for p in glob.glob(f"{STORAGE_DIR}/*.csv"))
    gb = total_bytes / 1024**3
    return gb, gb * rate_per_gb
# ---------- METADATA LAYER (always on, near-free) ----------
def register_file(path, table, partition_dt, row_count):
    cat = json.load(open(CATALOG)) if os.path.exists(CATALOG) else {}
    cat.setdefault(table, []).append(
        {"path": path, "dt": partition_dt, "rows": row_count})
    json.dump(cat, open(CATALOG, "w"), indent=2)
def files_for(table, dt):
    cat = json.load(open(CATALOG))
    return [e["path"] for e in cat.get(table, []) if e["dt"] == dt]
# ---------- COMPUTE LAYER (billed only when run) ----------
class Warehouse:
    def __init__(self, name, credit_rate=260.0):
        self.name = name
        self.credit_rate = credit_rate  # ₹/credit-second
        self.started_at = None
    def __enter__(self):
        self.started_at = time.time()
        print(f"[{self.name}] resume (cold start, ~1.4s)")
        time.sleep(1.4)  # simulated cold-start tax
        return self
    def __exit__(self, *a):
        runtime = time.time() - self.started_at
        cost = runtime * self.credit_rate
        with open(COMPUTE_LOG, "a", newline="") as f:
            csv.writer(f).writerow([self.name, runtime, cost,
                                    datetime.datetime.utcnow().isoformat()])
        print(f"[{self.name}] suspend after {runtime:.1f}s, billed ₹{cost:.2f}")
    def query(self, table, dt):
        files = files_for(table, dt)
        rows = 0
        for fp in files:
            with open(fp) as f: rows += sum(1 for _ in f) - 1
        return rows
# ---------- DEMO ----------
os.makedirs(STORAGE_DIR, exist_ok=True)
sample = f"{STORAGE_DIR}/orders_2026-04-24.csv"
with open(sample, "w") as f:
    f.write("order_id,amount\n")
    for i in range(50_000): f.write(f"{i},{i*7}\n")
register_file(sample, "orders", "2026-04-24", 50_000)
gb, monthly = storage_cost_per_month()
print(f"Storage: {gb*1024:.1f} MB, ₹{monthly:.2f}/month (always on)")
with Warehouse("WH_GROWTH_M") as wh:  # billed for ~1.5s of runtime
    rows = wh.query("orders", "2026-04-24")
    print(f"Query returned {rows:,} rows")
# storage layer keeps charging; compute is off; metadata is at rest.
# Sample run:
Storage: 0.6 MB, ₹0.00/month (always on)
[WH_GROWTH_M] resume (cold start, ~1.4s)
Query returned 50,000 rows
[WH_GROWTH_M] suspend after 1.5s, billed ₹389.93
Walk through the four pieces that carry the architecture. storage_cost_per_month charges by total bytes in the bucket regardless of whether anything is querying — this is the layer that survives even when the warehouse is suspended. register_file / files_for is the metadata layer that lets compute find files by partition without listing the whole bucket; it's a JSON file here, but in production it's the Iceberg manifest tree or Snowflake's micro-partition catalog. Why metadata is the unsung hero: a real warehouse has 100,000+ files; doing an S3 LIST on every query costs ₹0.005 per 1,000 keys plus latency, and reading every file's Parquet footer to discover the schema costs another GET each. Metadata collapses this to one read, which is what makes the architecture survive at scale.
Warehouse.__enter__ / __exit__ is the resume-and-suspend cycle that turns the bill from "always-on per-second" to "per-second-while-running". The 1.4-second cold-start tax in __enter__ is the price you pay for the architecture; if your queries are 200 ms each, that tax dominates and the separation hurts. Why cold start matters for the cost calculus: if a query takes 0.2 s of compute and 1.4 s of resume, the warehouse runs for 1.6 s but does only 0.2 s of useful work — you are paying 8× the cost per useful second. The separation only saves money when query bursts are long enough, or rare enough, that the saved idle time exceeds the resume tax.
The with block is what makes the bill a function of usage: everything outside the with is free, everything inside it costs ₹260 per credit-second. This is the entire architectural shift in five lines of Python.
The toy is single-user and single-warehouse; production warehouses add multi-statement transactions, query result caching, warehouse-local SSD caches, multi-cluster scale-out (Snowflake), reservation pools (BigQuery), and per-statement billing for serverless variants — but the bone structure is what the script shows: storage that never turns off, compute that does, metadata that brokers between them.
Where the separation pays for itself, and where it doesn't
The architecture sounds like a free lunch: pay only for compute when you use it, pay only for storage by the byte. In practice the savings depend sharply on your workload's duty cycle — the fraction of time compute is doing useful work — and three patterns dominate.
Bursty analytical workloads pay off enormously. A team that runs 90 minutes of dashboard queries per day and then nothing else gets compute billed for 1.5 hours × 30 days = 45 hours/month, instead of 720 hours/month. Storage is unchanged. The bill collapses by ~95% on the compute side, which usually means a 70–80% bill reduction overall (because compute dominates). This is the canonical Snowflake / BigQuery on-demand sweet spot, and the reason cloud warehouses ate the analytical workload market between 2016 and 2022.
Steady-state OLTP workloads do not pay off and are a category mistake. A Postgres database serving 500 QPS at p99 5ms cannot use the separated architecture, because every query needs sub-millisecond access to data and the S3 round-trip alone is 20–60 ms. Cold-start tax is unaffordable, file-format reads are too slow for OLTP. This is why operational databases (Postgres, MySQL, DynamoDB) still couple compute and storage on local disk; the separation is for analytics, not transactions. Why OLTP needs coupled storage: a transactional query reading a single row by primary key needs the row's page to be in RAM, ideally already there from a recent access. If the page lives on S3, you pay 30 ms minimum per cache miss; at 500 QPS that's 15 seconds of latency per second of wallclock — the system simply cannot keep up. Coupled storage exists in OLTP so the page cache and the disk are the same fabric.
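The arithmetic behind that claim, as a sketch; the 30 ms round-trip and 5 ms p99 target are the figures from the paragraph above, and the model assumes every lookup misses the page cache.
# oltp_on_s3_sketch.py — the latency budget that rules out object storage for row lookups.
qps = 500                # transactional lookups per second
p99_target_s = 0.005     # the 5 ms p99 the service promises
s3_round_trip_s = 0.030  # ~30 ms minimum per cache miss against object storage
io_wait_per_second = qps * s3_round_trip_s  # seconds of I/O wait demanded per wallclock second
print(f"{io_wait_per_second:.0f} s of I/O wait demanded for every 1 s of wallclock")
print(f"a single miss spends {s3_round_trip_s / p99_target_s:.0f}x the entire p99 budget")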
Steady-state analytical workloads are the trap. A team that runs continuous queries — ML feature pipelines that read every 30 seconds, dashboards on a 1-minute refresh, real-time analytics on hot data — keeps the warehouse warm 24×7. The compute bill is the same as a coupled architecture (you're paying for compute all the time anyway), and you pay an extra tax for S3 egress and metadata calls that a coupled architecture wouldn't pay. This is the Bengaluru fintech failure mode: a warehouse with 24-hour auto-suspend is in steady state by design, so the architecture's main savings vector — being off — is disabled. The fix isn't a different warehouse; it's recognising that "I want my data ready" doesn't mean "I want a warehouse on" — it means "I want my dashboards cached" or "I want a real-time analytics engine like ClickHouse / Pinot / Druid". Those are different architectures, optimised for steady-state hot reads, and the separation is not the right tool for them.
The decision rule that survives in practice is: if your peak-hour duty cycle is under 30%, separation saves you 5×–10× on compute. If your duty cycle is over 80%, separation costs you 10–20% extra over a coupled cluster of the same compute size. The grey band in between (30–80%) is where the savings are real but smaller and where the auto-suspend tuning, multi-team sharing, and reservation strategy decide the actual outcome.
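A sketch of that decision rule as a calculator; the hourly cluster rate and the ~15% overhead the separated path pays for egress and metadata calls while running are assumed round numbers, not vendor pricing.
# duty_cycle_breakeven.py — coupled (always-on) vs separated (pay-while-running) compute spend.
def monthly_costs(duty_cycle, cluster_rate_per_hour=4_000, hours_in_month=720,
                  separated_overhead=1.15):  # ~15% extra for egress + metadata while running (assumed)
    coupled = cluster_rate_per_hour * hours_in_month  # always on, whether or not anyone queries
    separated = cluster_rate_per_hour * hours_in_month * duty_cycle * separated_overhead
    return coupled, separated
for duty_cycle in (0.05, 0.30, 0.80, 1.00):
    coupled, separated = monthly_costs(duty_cycle)
    if separated < coupled:
        print(f"duty cycle {duty_cycle:.0%}: separated is {coupled / separated:.1f}x cheaper")
    else:
        print(f"duty cycle {duty_cycle:.0%}: separated costs {separated / coupled - 1:.0%} more")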
How real systems implement the separation
Snowflake runs Virtual Warehouses on EC2 with local SSD caches in front of an S3-backed (or GCS-backed, on GCP deployments) compressed columnar store. The metadata layer lives in the Cloud Services tier, backed by FoundationDB. Cold queries hit S3; warm queries hit the local SSD cache; very-warm queries hit the result cache and don't even spin up compute. The credits charged for compute are decoupled from credits charged for Cloud Services (the metadata layer), and the storage bill is a separate flat per-TB-per-month line item. The auto-suspend default is 600s (10 minutes) for new warehouses, which is generous; tightening to 60s is the single highest-impact cost lever for under-utilised warehouses.
BigQuery went separated from day one. Compute is "slots" (a global pool managed by Google); storage is Colossus, Google's internal blob store; metadata is in Spanner. On-demand pricing charges you per byte scanned per query (₹417/TB scanned, on-demand, in 2026); reservation pricing charges you a flat monthly rate for a guaranteed slot count plus storage. The on-demand model is the purest possible expression of "compute only when used" — there is no warehouse object you turn on; queries just run and you pay for the bytes they touched. The trade-off is the inability to predict the bill until queries actually run.
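A sketch of that predictability trade-off, using the ₹417/TB on-demand figure quoted above against a hypothetical flat reservation price — the reservation number is an invented round figure for illustration, not Google's pricing.
# ondemand_vs_reservation_sketch.py — where scanned volume makes a flat reservation cheaper.
ON_DEMAND_PER_TB = 417           # ₹ per TB scanned (figure from the text above)
RESERVATION_PER_MONTH = 830_000  # ₹ per month for a fixed slot pool (hypothetical)
print(f"reservation wins past ~{RESERVATION_PER_MONTH / ON_DEMAND_PER_TB:,.0f} TB scanned/month")
for tb_scanned in (200, 1_000, 2_000, 5_000):
    on_demand = tb_scanned * ON_DEMAND_PER_TB
    cheaper = "on-demand" if on_demand < RESERVATION_PER_MONTH else "reservation"
    print(f"{tb_scanned:>5} TB/month: on-demand ₹{on_demand:,.0f} -> {cheaper} is cheaper")
The catch the paragraph names is visible in the loop variable: tb_scanned is only known after the month's queries have run, which is exactly why on-demand bills are hard to forecast.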
Lakehouse architectures (Iceberg / Delta / Hudi on S3 + Trino/Spark/DuckDB compute) push the separation further: the storage and metadata layer is fully open (parquet files + iceberg manifests in your bucket) and you bring your own compute engine. This is the architecture Razorpay's data lake runs on: Iceberg tables in S3, Trino for interactive queries, Spark for heavy ETL, dbt for transformations — three different compute engines hitting the same byte layer, scaled and billed independently. Storage is managed by AWS (₹1.85/GB/month), compute by Razorpay's EKS clusters (₹X/hour for Trino, ₹Y/hour for Spark). The architectural prize is that no single compute vendor owns the lock-in — the data lives in your bucket, in an open format, and you can swap engines.
Databricks SQL warehouses look like Snowflake from the outside (resume/suspend, t-shirt sizes) but run Photon on top of Delta Lake on top of S3. The same separation, packaged differently. The cost lever set is the same: tighten auto-suspend, right-size the cluster, attribute by tag.
The takeaway across vendors is that the separation is now table stakes — the question for a team in 2026 is not "should I separate compute and storage" (the answer is yes, unless you're running OLTP) but "how do I tune auto-suspend, caching, and per-team warehouses so the savings actually land".
Common confusions
- "Compute-storage separation is the same as cloud-native." Cloud-native is a marketing term for "deployed on managed infra"; many cloud-native warehouses are still coupled (early Redshift was). The separation is a specific architectural choice — bytes on S3, compute over the network — that you can have on-prem too (MinIO + Trino is a separated stack on bare metal).
- "Separated means cheaper, always." Only when duty cycle is low. A team running a warehouse 24/7 in steady state pays more on a separated architecture than on a coupled one, because of S3 egress + metadata fees layered on top of equivalent compute spend.
- "I can just turn off the warehouse to save money." Turning off a warehouse without tightening auto-suspend, attributing untagged queries, and routing one-off queries to a serverless compute layer just shifts the spike — users will resume the warehouse all day and never let it suspend. The separation needs operational discipline to deliver savings.
- "The lakehouse and the warehouse are different architectures." The bone structure is identical: bytes on object storage, compute on demand, metadata in between. The differences are who owns the metadata (Snowflake vs Iceberg) and how open the format is (proprietary micro-partitions vs Parquet+manifest). The economics are the same.
- "Reading from S3 is too slow for analytics." Single-file random access to S3 is slow (~30 ms latency). Sequential scans of Parquet files at 100 MB/s+ per file, in parallel across hundreds of files, with predicate pushdown reading only the bytes the query needs — that is fast enough to compete with local SSD for analytical workloads. The benchmark Snowflake and BigQuery publish (millions of rows/sec/core) is on top of this exact pattern.
- "Storage is free, only compute matters." Storage is 5–15% of a typical analytical bill, but it grows linearly forever. A team that does not implement retention will see the storage line cross the compute line in 3–5 years. Iceberg snapshot expiry and S3 lifecycle policies are not optional — they are part of the architecture.
Going deeper
Cold-start tax: the part the marketing skips
A separated warehouse pays a "resume" cost every time it goes from suspended to running — Snowflake's docs say 1–2 seconds, in practice often 3–5 seconds for very large warehouses. For batch ETL that runs a 30-minute job, this is noise. For interactive queries on a sub-second-latency dashboard, this is fatal — the user clicks a filter, waits 3 seconds for the warehouse to resume, then 100 ms for the actual query, and walks away thinking your dashboard is broken. The fixes are: keep at least one warehouse warm for interactive workloads (with a tight 60-second auto-suspend so the cost is bounded), use a multi-cluster warehouse so the second cluster is already warm when the first is at capacity, or move interactive workloads to a real-time analytics engine where compute is permanently warm by design. This is the under-appreciated reason teams running interactive analytics on Snowflake end up keeping a warehouse warm 24/7 and paying coupled-architecture prices for a separated architecture.
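A sketch of that keep-warm calculus for an interactive dashboard, assuming evenly spaced clicks, a 3-second resume, a 60-second auto-suspend, and the same illustrative ₹260/second rate as the toy warehouse earlier.
# keep_warm_sketch.py — interactive dashboards: pay the resume tax per click, or pay to stay warm?
CREDIT_RATE = 260.0    # ₹ per running second (toy figure from earlier in the chapter)
RESUME_S = 3.0         # cold-start tax per resume (worst case from the paragraph above)
AUTO_SUSPEND_S = 60.0  # idle window before the warehouse suspends
QUERY_S = 0.3          # actual compute per dashboard query
def hourly_cost(clicks_per_hour):
    gap = 3600 / clicks_per_hour
    if gap <= AUTO_SUSPEND_S:                        # clicks arrive before suspend ever fires
        return 3600 * CREDIT_RATE, "never suspends (warm all hour)"
    per_click = RESUME_S + QUERY_S + AUTO_SUSPEND_S  # resume, do the work, idle out, suspend
    return clicks_per_hour * per_click * CREDIT_RATE, "suspend/resume per click"
for clicks in (10, 40, 70, 200):
    cost, mode = hourly_cost(clicks)
    print(f"{clicks:>3} clicks/h: ₹{cost:>9,.0f}/h  ({mode})")
print(f"always-warm baseline: ₹{3600 * CREDIT_RATE:,.0f}/h")
Two things fall out of the sketch: past roughly one click a minute the warehouse never suspends and you pay warm prices regardless, and even in the regime where suspend/resume is cheaper, every click still waits the 3-second resume — the cost argument and the latency argument pull in opposite directions, which is why the warm-but-tightly-suspended warehouse is the usual compromise.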
The S3 egress trap and the "data plane on the same cloud" rule
S3 charges nothing to transfer data within the same region but charges ₹7–8/GB to transfer data out of the region. A team that puts their Iceberg tables in ap-south-1 and runs Spark in us-east-1 because the engineers like the AWS US console will pay for every byte the Spark job reads — and on a 50 TB scan, that's ₹4 lakh in egress alone, before any compute spend. The architectural rule is that compute and storage must be in the same region, and the team's monitoring must include cross-region transfer dollars on the dashboard. This is a regression that hits every other migration.
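The arithmetic in that example, written out; the ₹7.5/GB rate is taken from the range in the paragraph and should be read as illustrative, since the exact cross-region and internet-egress rates vary by destination.
# egress_sketch.py — what a cross-region scan costs before any compute runs.
EGRESS_PER_GB = 7.5  # ₹/GB leaving the region (illustrative, from the range above)
scan_tb = 50
egress_cost = scan_tb * 1024 * EGRESS_PER_GB
print(f"{scan_tb} TB scanned cross-region -> ₹{egress_cost:,.0f} in transfer alone")
print("same scan with compute in the bucket's region -> ₹0 in transfer")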
Snowflake's micro-partition vs Iceberg's manifest tree
Both systems separate storage and compute, but the metadata layer is different. Snowflake's micro-partitions are 16 MB compressed columnar blobs with embedded statistics, indexed by a closed FoundationDB-backed catalog — the metadata is Snowflake's, you can't see it. Iceberg's manifest tree is a hierarchy of JSON/Avro files in your bucket: a snapshot points to manifest lists, which point to manifests, which point to data files. The Iceberg metadata is open — anyone can read it with the right library — and that openness is what makes "swap your compute engine" possible. The trade-off is that Iceberg's manifest tree can grow unbounded if you don't run periodic expire_snapshots and rewrite_manifests; Snowflake handles both internally. The "amount of plumbing per team" is the cost of openness.
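To make the manifest-tree shape concrete, here is a sketch of the traversal with plain dictionaries standing in for the snapshot, manifest list, and manifest files; real Iceberg stores these as JSON and Avro objects under the table's metadata prefix and carries much richer column statistics.
# manifest_tree_sketch.py — snapshot -> manifest list -> manifest -> data files, in miniature.
catalog = {
    "current_snapshot": {
        "manifest_list": [
            {   # each manifest covers a group of data files and carries partition-range stats
                "partition_range": ("2026-04-01", "2026-04-15"),
                "data_files": ["s3://lake/orders/dt=2026-04-03/a.parquet"],
            },
            {
                "partition_range": ("2026-04-16", "2026-04-30"),
                "data_files": ["s3://lake/orders/dt=2026-04-24/b.parquet",
                               "s3://lake/orders/dt=2026-04-24/c.parquet"],
            },
        ]
    }
}
def plan_scan(catalog, dt):
    """Return only the data files a query for one partition needs, pruning whole manifests."""
    files = []
    for manifest in catalog["current_snapshot"]["manifest_list"]:
        lo, hi = manifest["partition_range"]
        if lo <= dt <= hi:  # skip manifests whose partition range cannot contain dt
            files += [f for f in manifest["data_files"] if f"dt={dt}" in f]
    return files
print(plan_scan(catalog, "2026-04-24"))  # two files found without a single bucket LIST
The pruning step is the point: the planner walks one small metadata chain instead of listing 100,000 objects — and because every write appends another snapshot to this chain, expire_snapshots and rewrite_manifests are the housekeeping that keeps the walk small.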
Real-world benchmark: Razorpay's lakehouse migration
In a 2024 talk, Razorpay's data platform team described migrating off Redshift (coupled, ₹X/month) to an Iceberg-on-S3 + Trino lakehouse. The published headline was that storage costs dropped ~70% (Redshift's ds2.xlarge nodes priced storage at roughly 4× S3) and compute became elastic enough to spin Trino clusters up only during business hours. The unpublished detail in the Q&A was that the migration took 8 months and the first three months saw higher total spend because the team had not yet tuned auto-suspend, had Trino clusters on for ETL during off-hours, and was paying egress on cross-region tooling. The architectural win required matching operational tuning — the separation does not deliver savings on its own.
Where this leads next
- /wiki/query-cost-attribution — the previous chapter. Once compute and storage are separated, attribution becomes the lever for "who decides to keep this warm".
- /wiki/multi-tenant-warehouses-isolation-and-noisy-neighbours — the team-sharing pattern that decides whether per-team warehouses or shared warehouses are cheaper.
- /wiki/copy-on-write-vs-merge-on-read-iceberg-vs-hudi — the table-format choice that lives on top of the separated storage layer.
- /wiki/eventual-consistency-on-s3-and-what-it-breaks — what S3's actual consistency guarantees mean for the metadata layer.
References
- Dageville et al., "The Snowflake Elastic Data Warehouse" (SIGMOD 2016) — the original paper on the architecture.
- Melnik et al., "Dremel: Interactive Analysis of Web-Scale Datasets" (VLDB 2010) — BigQuery's foundation; the separated architecture's earliest production deployment at scale.
- Apache Iceberg, "Table Format Specification" — the open metadata layer that powers most lakehouses.
- AWS, "S3 pricing" — the storage layer's actual cost numbers, including egress and request fees.
- Snowflake, "Auto-suspension and auto-resumption" — the operational lever that decides whether the architecture saves money.
- Razorpay Engineering, "Building a lakehouse on Iceberg" — public talks and posts on the migration journey.
- /wiki/query-cost-attribution — internal previous chapter; the per-team billing problem this architecture enables.
- /wiki/copy-on-write-vs-merge-on-read-iceberg-vs-hudi — internal cross-link; the writer-side trade-off on the separated storage layer.