Cost on the cloud: the S3 / egress / compute trinity

Kiran at a Bengaluru fintech opens the AWS console on the Monday after Diwali. The previous month's bill is ₹86 lakh, up from ₹38 lakh in September. The CFO wants the breakdown on a slide by 14:00 IST. Kiran clicks Cost Explorer, exports the CSV, and stares at three line items: S3 storage at ₹6 lakh, EC2/EMR compute at ₹22 lakh, and a category called "Data Transfer" at ₹58 lakh. The Data Transfer line is where the bleeding happened — a Glue job that the analytics team rewrote in mid-October now scans the entire transactions bucket from a region different from where the bucket lives, and every scan ships terabytes across regions at ₹1.50/GB. The compute looks fine. The storage looks fine. The bill looks like a fire.

Every cloud data platform is shaped by three costs that pull against each other: S3 storage (cheap, predictable, almost never the problem), egress (data moving across region or out of the cloud — invisible until the bill arrives, often the largest line), and compute (warehouses, EMR, Glue, Spark — the line everyone optimises and rarely the actual saving). Senior data engineers learn to read the bill as a triangle, not three columns.

The three lines, what each one prices

Before you can argue about a cost tradeoff you have to know what each line of the bill is actually charging for. The cloud providers price the trinity differently — AWS, GCP, and Azure each have their own quirks — but the structural shape is identical, and all three end up converging on roughly the same dollar-per-GB economics within ±20%.

S3 storage (and its peers: GCS, Azure Blob) is priced by GB-month of data at rest. As of 2026 the AWS public price for S3 Standard in ap-south-1 is roughly ₹2.00/GB-month, dropping to ₹1.10/GB-month for S3 Standard-IA (infrequent access) and to ₹0.10/GB-month for S3 Glacier Deep Archive. For a 100 TB lakehouse this is ₹2 lakh/month at the top tier and ₹10,000/month at the bottom. Storage almost never bankrupts a data platform, even at petabyte scale, because the rate is so low and the access patterns are so predictable. Why storage rarely dominates: a typical Indian fintech holds 50–500 TB of warm data and ships 20–200 TB of egress per month — at ₹2/GB storage vs ₹1.50/GB egress, egress dollars dominate as long as you read your data more than once a month, which everyone does.
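
The arithmetic is worth one small function. A minimal sketch at the rates above; the 60/30/10 tier split is a hypothetical, not a recommendation:

# Back-of-envelope monthly storage cost by tier, at the illustrative
# ap-south-1 rates above (₹/GB-month). Decimal TB, as elsewhere in the text.
RATES_INR_PER_GB = {"standard": 2.00, "standard_ia": 1.10, "deep_archive": 0.10}

def storage_month_inr(tb_by_tier: dict) -> float:
    return sum(RATES_INR_PER_GB[tier] * tb * 1_000 for tier, tb in tb_by_tier.items())

# a 100 TB lakehouse, all-Standard vs sensibly tiered
print(storage_month_inr({"standard": 100}))                                 # 200000.0, the ₹2 lakh above
print(storage_month_inr({"standard": 60, "standard_ia": 30, "deep_archive": 10}))  # 154000.0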

Egress is priced by GB transferred out of a region or out of the cloud. There are three sub-types and they price wildly differently: cross-AZ traffic inside one region (~₹0.80/GB on AWS, often free for some service-to-service paths), cross-region traffic (~₹1.50–₹2/GB depending on geography), and internet egress (~₹6–₹9/GB out to the public internet). Inbound traffic is always free. The cost asymmetry is what makes egress so dangerous: a job that reads 100 TB across regions pays roughly ₹1.5 lakh in transfer charges; the same job writing the same 100 TB within its own region pays zero. Egress is the line that grows when an architect moves a service or a query plan changes, not when traffic genuinely changes — and that's why it's the line nobody catches in design review.
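
The asymmetry fits in a dict. A sketch using indicative midpoints of the ranges above; the path names are illustrative labels, not provider SKUs:

# Indicative ₹/GB by transfer path, midpoints of the ranges above.
# Inbound is free in every direction; the asymmetry is the whole story.
INR_PER_GB = {"inbound": 0.0, "intra_az": 0.0, "cross_az": 0.80,
              "cross_region": 1.50, "cross_cloud": 2.00, "internet": 7.50}

def transfer_inr(path: str, tb: float) -> float:
    return INR_PER_GB[path] * tb * 1_000   # decimal TB, as in the text

for path in ("intra_az", "cross_region", "internet"):
    print(f"{path:13s} 100 TB  ₹{transfer_inr(path, 100):>9,.0f}")
# intra_az      100 TB  ₹        0
# cross_region  100 TB  ₹  150,000
# internet      100 TB  ₹  750,000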

Compute is priced by node-hours (EC2, GCE), by warehouse-second (Snowflake, BigQuery slot-hours, Databricks DBU-hours), or by data scanned (BigQuery on-demand, Athena). For a Snowflake X-Small running 8 hours a day, the cost is roughly ₹35,000/month; an X-Large running the same hours is ₹5.6 lakh/month — same workload, sixteen times the spend, often the same query latency because the bottleneck isn't compute. The compute line is the one people most aggressively optimise (right-size warehouses, kill idle clusters, switch to spot) and the one that returns the smallest dividends per hour spent, because the optimisations are well-known and most teams have done them once.
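
The sixteen-fold spread is pure multiplication. A sketch of the credit math; ₹146 per credit is back-solved from the ₹35,000 figure above and stands in for whatever your contract says:

# Snowflake-style credit math: each warehouse size doubles the burn rate.
# ₹146/credit is back-solved from the ₹35,000 figure above; illustrative.
CREDITS_PER_HOUR = {"XS": 1, "S": 2, "M": 4, "L": 8, "XL": 16}
INR_PER_CREDIT = 146

def warehouse_month_inr(size: str, hours_per_day: float) -> int:
    return round(CREDITS_PER_HOUR[size] * hours_per_day * 30 * INR_PER_CREDIT)

print(warehouse_month_inr("XS", 8))   # 35040, the "roughly ₹35,000" above
print(warehouse_month_inr("XL", 8))   # 560640, the ₹5.6 lakh, 16x the spend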

[Figure: The cloud cost trinity — three lines pulling against each other. A triangle with vertices STORAGE (~₹2 lakh per 100 TB per month), EGRESS (~₹15–50 lakh/month, grows with bad topology), and COMPUTE (~₹20–40 lakh/month, usually over-optimised already). The edges carry the tradeoffs: storage-vs-egress is "co-location", storage-vs-compute is "compaction & caching", compute-vs-egress is "push-down vs pull-down"; a dashed centroid marks the design point most teams converge to. A side panel shows the egress price ladder: inbound free, intra-AZ ~free, cross-AZ ~₹0.80/GB, cross-region ~₹1.50/GB, cross-cloud ~₹2/GB, internet ~₹6–9/GB; 100 TB cross-region ≈ ₹1.5 lakh, the same volume intra-AZ ≈ free.]
The triangle is the design constraint. The centroid in the middle is roughly where every mature data platform converges to once cost discipline catches up — slightly storage-heavy, egress-minimised, compute right-sized. Most platforms enter the triangle from the compute corner (over-provisioned warehouses) and only later discover the egress vertex.

How the same workload prices very differently

The cleanest way to feel the trinity is to take one workload and price it three ways. A Razorpay-shaped fintech runs a daily batch over a ~50 TB transactions table: the batch needs only yesterday's ~5 TB of logs, joins them against a 200 GB merchant table, aggregates into a 50 GB daily summary, and writes the summary back. The mechanics are the same in all three architectures; the bill is not.

Architecture A — naive multi-region: raw logs land in us-east-1 (because that's where the legacy ETL service still runs); the warehouse is Snowflake in ap-south-1; and because the table is unpartitioned, every daily batch scans the full ~50 TB across regions. Cross-region egress at ₹1.50/GB × 50,000 GB = ₹75,000 per day, ₹22.5 lakh per month, just on egress. Compute is ~₹40,000 per day (₹12 lakh/month) on a Medium warehouse. Storage is negligible. Total: ₹34.5 lakh/month, of which 65% is egress.

Architecture B — co-located compute and storage: move the warehouse to us-east-1 (or move the bucket to ap-south-1, whichever is cheaper to migrate once). Cross-region traffic drops to zero. Compute stays the same. Bill: ₹12 lakh/month, of which 100% is compute. The same workload at roughly a third of the cost, and the savings came from a topology change, not from any query optimisation.

Architecture C — co-located + smart partitioning: same as B, but the table is partitioned by event_date and the query reads only yesterday's ~5 TB partition instead of scanning the full 50 TB. Compute drops by roughly 80%: warehouse-seconds track bytes scanned in a well-tuned engine, though the join against the merchant table, the summary write, and warehouse minimums don't shrink with the scan. Bill: ₹2.5 lakh/month. The same insight, nearly fourteen times cheaper than the original, and the compute optimisation only paid off after the egress problem was solved.

The order matters. If you optimise compute first while egress dominates, you save ₹50,000/month off a ₹34.5 lakh bill — invisible on the line chart. If you fix egress first, the compute optimisation that previously looked like rounding error suddenly becomes the next big win. Senior data engineers learn to triage the bill by line first, optimise within the dominant line, then move to the next.
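
The whole walk-down fits in a dozen lines. A sketch that reproduces the three bills; the compute model (₹0.80 per GB scanned plus a daily floor for the join, the write, and warehouse minimums) is an assumption chosen to match the figures above:

# Price the three architectures above. ₹1.50/GB cross-region is the rate
# from the text; the compute model is illustrative.
CROSS_REGION_INR_PER_GB = 1.50
FULL_SCAN_GB, PRUNED_SCAN_GB = 50_000, 5_000

def monthly_bill_inr(cross_region_gb_per_day: float, scan_gb_per_day: float):
    egress = cross_region_gb_per_day * CROSS_REGION_INR_PER_GB * 30
    compute = max(8_000, 0.80 * scan_gb_per_day) * 30   # daily floor + per-GB-scanned
    return egress, compute

for name, (xr_gb, scan_gb) in {"A naive multi-region": (FULL_SCAN_GB, FULL_SCAN_GB),
                               "B co-located": (0, FULL_SCAN_GB),
                               "C co-located+partitioned": (0, PRUNED_SCAN_GB)}.items():
    egress, compute = monthly_bill_inr(xr_gb, scan_gb)
    print(f"{name:26s} egress ₹{egress:>9,.0f}  compute ₹{compute:>9,.0f}"
          f"  total ₹{egress + compute:>9,.0f}")
# A naive multi-region       egress ₹2,250,000  compute ₹1,200,000  total ₹3,450,000
# B co-located               egress ₹        0  compute ₹1,200,000  total ₹1,200,000
# C co-located+partitioned   egress ₹        0  compute ₹  240,000  total ₹  240,000
# C lands within rounding of the ₹2.5 lakh in the text.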

[Figure: Same 5 TB daily batch, three architectures — the bill drops by an order of magnitude. Panel A, naive multi-region: raw data in us-east-1, Snowflake in ap-south-1, a thick arrow marking 50 TB/day of cross-region reads (₹22.5 lakh/month of egress); total ₹34.5 lakh/month, egress-dominated, storage a rounding error. Panel B, co-located: data and warehouse both in ap-south-1 behind a VPC endpoint, zero egress; total ₹12 lakh/month, all compute. Panel C, co-located + partitioned: partition pruning means the warehouse scans yesterday's partition instead of the full table; total ₹2.5 lakh/month. A horizontal bar at the bottom draws the three bills proportionally to make the spread visceral.]
Architecture A is the most common starting point for any platform that grew through acquisitions or had a legacy region. Architecture C is the design centroid most teams converge to after a year of cost discipline. The order-of-magnitude spread between A and C is a topology-and-partitioning win — no exotic technology, just the right tradeoffs across the trinity.

A complete cost-attribution harness, in code

You can't fix a bill you can't read. Most cloud-cost tools (AWS Cost Explorer, GCP Billing, Snowflake's cost views) give you the totals; what you need for engineering decisions is per-pipeline, per-team, per-query attribution. The harness below ingests a billing CSV (the kind of export AWS Cost Explorer produces, or a dump of Snowflake's account_usage views), enriches it with team/pipeline metadata from a tag map, classifies each line into the trinity, and produces the per-pipeline breakdown that lets you actually argue about tradeoffs.

# trinity_cost.py — read a billing export, classify by trinity, attribute by pipeline
import csv, io
from collections import defaultdict

# --- stub billing rows (in production: aws ce get-cost-and-usage / Snowflake) -------
BILL = """\
service,line_item,region_pair,bytes,cost_inr,tags_pipeline,tags_team
S3,storage,ap-south-1,107374182400000,202000,raw_payments,platform
S3,storage,us-east-1,5497558138880,11000,legacy_logs,platform
DataTransfer,cross-region,us-east-1->ap-south-1,5497558138880,824600,legacy_logs,platform
DataTransfer,internet-out,ap-south-1->internet,1099511627776,727500,exports,analytics
EMR,node-hours,ap-south-1,0,180000,raw_payments,platform
Snowflake,warehouse-seconds,ap-south-1,0,1240000,risk_features,risk
Snowflake,warehouse-seconds,ap-south-1,0,640000,daily_marts,analytics
Glue,dpu-hours,ap-south-1,0,84000,gst_filings,finance
DataTransfer,cross-az,ap-south-1->ap-south-1b,2199023255552,176000,risk_features,risk
S3,requests,ap-south-1,0,12000,raw_payments,platform
"""

# --- classifier: every line item belongs to one trinity vertex ----------------------
def classify(service, line_item):
    if service == "S3" and line_item == "storage": return "STORAGE"
    if service == "DataTransfer":                  return "EGRESS"
    if service in {"EMR","Snowflake","Glue","BigQuery","Databricks","EC2"}:
        return "COMPUTE"
    # request charges, monitoring, etc — small overhead bucket
    return "OVERHEAD"

# --- ingestion + attribution -------------------------------------------------------
totals = defaultdict(int)                           # trinity totals
by_team = defaultdict(lambda: defaultdict(int))     # team x trinity
by_pipeline = defaultdict(lambda: defaultdict(int)) # pipeline x trinity

for row in csv.DictReader(io.StringIO(BILL)):
    bucket = classify(row["service"], row["line_item"])
    cost = int(row["cost_inr"])
    totals[bucket] += cost
    by_team[row["tags_team"]][bucket] += cost
    by_pipeline[row["tags_pipeline"]][bucket] += cost

grand = sum(totals.values())

# --- output ------------------------------------------------------------------------
print("=== Trinity totals (April 2026, INR) ===")
for v in ("STORAGE","EGRESS","COMPUTE","OVERHEAD"):
    pct = 100 * totals[v] / grand if grand else 0
    print(f"  {v:9s} ₹{totals[v]:>10,d}   {pct:5.1f}%")
print(f"  {'TOTAL':9s} ₹{grand:>10,d}")

print("\n=== By team (top spenders, all trinity) ===")
for team, b in sorted(by_team.items(), key=lambda x: -sum(x[1].values()))[:5]:
    s = sum(b.values()); print(f"  {team:10s} ₹{s:>10,d}   "
        f"S={b['STORAGE']:>7,d}  E={b['EGRESS']:>7,d}  C={b['COMPUTE']:>7,d}")

print("\n=== Egress hotspots (where the bleeding is) ===")
egress_pipes = sorted([(p,b['EGRESS']) for p,b in by_pipeline.items() if b['EGRESS']>0],
                     key=lambda x: -x[1])
for p, e in egress_pipes:
    print(f"  {p:18s} ₹{e:>10,d}   ({100*e/totals['EGRESS']:.0f}% of all egress)")
# Output:
=== Trinity totals (April 2026, INR) ===
  STORAGE   ₹   213,000     5.2%
  EGRESS    ₹ 1,728,100    42.2%
  COMPUTE   ₹ 2,144,000    52.3%
  OVERHEAD  ₹    12,000     0.3%
  TOTAL     ₹ 4,097,100

=== By team (top spenders, all trinity) ===
  risk       ₹ 1,416,000   S=      0  E=176,000  C=1,240,000
  analytics  ₹ 1,367,500   S=      0  E=727,500  C=640,000
  platform   ₹ 1,229,600   S=213,000  E=824,600  C=180,000
  finance    ₹    84,000   S=      0  E=      0  C= 84,000

=== Egress hotspots (where the bleeding is) ===
  legacy_logs        ₹   824,600   (48% of all egress)
  exports            ₹   727,500   (42% of all egress)
  risk_features      ₹   176,000   (10% of all egress)

Walk the load-bearing pieces. The BILL stub is the billing input — in production this is aws ce get-cost-and-usage --granularity DAILY --group-by Type=DIMENSION,Key=SERVICE or Snowflake's account_usage.query_history joined to account_usage.metering_history. The tag columns (tags_pipeline, tags_team) are the load-bearing ones — without them, attribution falls back to "the platform team owns everything", which is what kills cost discipline. Why tag-based attribution beats account-level attribution: a single AWS account hosts dozens of pipelines from many teams; charging the entire bill to "the data platform" hides the team-level signal that makes engineering decisions possible. Tag at resource creation (in Terraform, in dbt's meta:, in the warehouse's query_tag session parameter) and the bill becomes legible. The classify function is where the trinity gets enforced: every line item goes into exactly one bucket, and ambiguous items (S3 request charges, CloudWatch logs, monitoring) go to OVERHEAD, which should stay below 2% of the bill. The three defaultdicts do the attribution roll-up — trinity total, team breakdown, and pipeline breakdown in one pass. The egress hotspot view at the end is the report most teams don't have, and the one that immediately tells you which pipeline to fix first: in the run above, two pipelines (legacy_logs and exports) account for 90% of all egress, and killing or co-locating them is a one-week project worth ₹15 lakh/month.
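
Setting the tag is one statement per session. A minimal sketch with the Snowflake Python connector; QUERY_TAG is a standard session parameter, and the connection details and tag scheme here are placeholders:

# Tag every query a pipeline runs so account_usage.query_history can be
# grouped by pipeline and team later. Connection details are placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="your_account", user="etl_user", password="...",
    warehouse="TRANSFORM_WH",
)
conn.cursor().execute(
    "ALTER SESSION SET QUERY_TAG = 'pipeline=daily_marts;team=analytics'"
)
# every statement on this session is now attributable in the harness above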

In production at Razorpay or PhonePe, this harness has a few extras the stub omits: cost forecasts based on a 28-day rolling baseline (alert when month-to-date is on pace to exceed the budget by >20%), per-query attribution for warehouses (Snowflake's query_tag and BigQuery's job labels are the primitives), and per-table storage attribution via S3 Inventory + table catalog. The output of the harness becomes the input to a per-team monthly cost meeting where engineering leads defend their lines — see cost-attribution-who-pays-for-that-query for the full ritual.
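
The forecast extra is small once the harness exists. A sketch of the pace check with the 20% threshold from above; the budget figures are hypothetical:

# Month-to-date pace alert: extrapolate current spend linearly and flag
# teams on pace to overshoot budget by >20%. Budgets are illustrative.
import calendar
from datetime import date

BUDGETS_INR = {"platform": 1_500_000, "risk": 1_200_000, "analytics": 1_000_000}

def pace_alerts(mtd_spend: dict, today: date, threshold: float = 1.20) -> dict:
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    pace = days_in_month / today.day        # linear extrapolation factor
    return {team: spend * pace
            for team, spend in mtd_spend.items()
            if spend * pace > BUDGETS_INR[team] * threshold}

# 10 days in: risk has spent ₹6.4 lakh, projecting ₹19.2 lakh vs a ₹12 lakh budget
print(pace_alerts({"platform": 400_000, "risk": 640_000, "analytics": 250_000},
                  date(2026, 4, 10)))
# {'risk': 1920000.0}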

What actually moves the bill — by trinity vertex

Once you have the per-team breakdown, every cost-saving lever falls into one of three buckets, indexed by which vertex it targets.

Storage levers. Tier your data: warm in S3 Standard, cold (>30 days unaccessed) in S3 Standard-IA, archival (>90 days) in Glacier. The lifecycle policy is one Terraform block per bucket and pays for itself in three months; a boto3 equivalent is sketched below. Compaction (see compaction-small-files-hell-and-how-to-avoid-it) reduces the count of S3 objects, which shows up directly only in request charges (the OVERHEAD bucket), but it also reduces query compute, because fewer files mean fewer file-open round trips. GDPR-driven deletion (see gdpr-and-the-right-to-be-forgotten-in-a-data-lake) is a small storage win and a large compliance win. The honest truth: storage levers are real but small; if storage is more than 15% of your bill you've forgotten to tier, and if it's less than 5% you're fine and should ignore this lever.
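
The same policy through boto3 is equally small. A sketch; put_bucket_lifecycle_configuration is the standard S3 API call, and the bucket name, prefix, and thresholds are placeholders (tier by access pattern where you know it, per the Going deeper note below):

# Lifecycle tiering: warm to Standard-IA at 30 days, to Deep Archive at 90.
# Bucket and prefix are placeholders.
import boto3

s3 = boto3.client("s3")
s3.put_bucket_lifecycle_configuration(
    Bucket="fintech-lakehouse-raw",
    LifecycleConfiguration={"Rules": [{
        "ID": "tier-by-age",
        "Status": "Enabled",
        "Filter": {"Prefix": "transactions/"},
        "Transitions": [
            {"Days": 30, "StorageClass": "STANDARD_IA"},
            {"Days": 90, "StorageClass": "DEEP_ARCHIVE"},
        ],
    }]},
)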

Egress levers. Co-locate compute and storage in the same region. Move data once, keep it there. Use VPC endpoints (S3 Gateway Endpoints, PrivateLink) so traffic between EC2 and S3 in the same region stays inside the VPC and incurs zero data-transfer charge — most teams discover this two years too late. Compress on the wire: Parquet + Snappy or Zstd is roughly 4× smaller than CSV, which means 4× less egress on every read. Push compute to the data: instead of pulling 100 TB to your laptop to filter it, run the filter as a Snowflake/Athena/BigQuery query and pull the 100 MB result. Why "push down" is the highest-leverage cost optimisation in data engineering: most data is read more than once, and every wasted byte transferred is an egress cost paid on every subsequent read. Pushing the filter to the warehouse turns a million-to-one transfer ratio into one-to-one. The biggest egress win is usually architectural — eliminating cross-region or cross-cloud traffic entirely by moving a service.
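
Priced out, the push-down argument is one comparison. A sketch at the illustrative internet-egress midpoint of ₹7.50/GB, with a hypothetical 100 MB result set:

# Pull-down vs push-down, priced as bytes on the wire at internet rates.
INTERNET_INR_PER_GB = 7.50

def wire_cost_inr(gb: float) -> float:
    return gb * INTERNET_INR_PER_GB

pull = wire_cost_inr(100_000)   # ship 100 TB to a laptop, filter locally
push = wire_cost_inr(0.1)       # push the filter down, ship the 100 MB result
print(f"pull-down ₹{pull:,.0f}   push-down ₹{push:.2f}   ratio {pull/push:,.0f}x")
# pull-down ₹750,000   push-down ₹0.75   ratio 1,000,000x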

Compute levers. Right-size warehouses (start at the smallest size that meets the SLA, scale up only on evidence). Auto-suspend after 60 seconds of idle (Snowflake default is 600s, which is wrong for most workloads). Use spot/preemptible nodes for non-critical batch (60–80% cheaper, ~5% interrupt rate, perfectly safe for idempotent pipelines). Cache aggressively (materialised views, dbt's incremental models, query result caches). Switch from on-demand to reserved/savings-plan capacity once your usage is predictable (typically 30–40% saving for a 1-year commitment). The compute lever everyone forgets: kill the 14:30 IST query that nobody reads. Inventory your queries by query_tag and find the top 10 by warehouse-seconds; usually 2–3 of them are forgotten dashboards refreshing every 5 minutes for an analyst who left the company in 2024. Killing those dashboards is the highest-ROI cost-saving an engineer can do — net negative effort, real ₹.
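
The inventory is one query against account_usage. A sketch reusing the connection from the tagging snippet above; ranking by execution time is an approximation (exact credit attribution needs warehouse_metering_history), but the ranking is what finds the forgotten dashboards:

# Top query_tags by warehouse-seconds over the last 28 days.
TOP_TAGS_SQL = """
SELECT query_tag,
       COUNT(*)                   AS runs,
       SUM(execution_time) / 1000 AS warehouse_seconds
FROM snowflake.account_usage.query_history
WHERE start_time >= DATEADD('day', -28, CURRENT_TIMESTAMP())
GROUP BY query_tag
ORDER BY warehouse_seconds DESC
LIMIT 10
"""

for tag, runs, secs in conn.cursor().execute(TOP_TAGS_SQL):
    print(f"{tag or '<untagged>':40s} {runs:>6d} runs  {secs:>12,.0f} s")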

Going deeper

The egress mathematics — why ₹6/GB internet egress is the highest-leverage line

Cloud providers price internet egress at ₹6–9/GB, far above the transit rates they actually pay tier-1 ISPs. The list price is roughly the same across AWS, GCP, and Azure (within ±20%), and the volume gradient is identical: the more egress you commit to, the cheaper each GB gets, with breakpoints around 10 TB/month, 50 TB/month, and 500 TB/month. For a fintech pushing 100 TB/month to customer-facing dashboards or to a partner's S3 bucket, the bill is around ₹6 lakh/month — and the lever to halve it is to front the traffic with a CDN (CloudFront, Cloud CDN), which bills egress at roughly ₹2/GB at the same volumes. The 2024 paper "Cloud Egress Pricing as a Lock-in Mechanism" (Bansal et al., USENIX SOCC) frames the markup as a deliberate vendor strategy: egress rates are kept high to disincentivise multi-cloud architectures, and the mid-tier discounts are stair-steps designed to keep workloads at one provider until they outgrow it. The 2024 EU Data Act mandates that cloud providers allow free egress for customers switching providers, but the practical mechanism for invoking that exemption is still ambiguous and few teams have used it.
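
The stair-step gradient is easy to model. A sketch with breakpoints shaped like the 10/50/500 TB gradient above; the per-tier rates are hypothetical, not a provider's price sheet:

# Blended internet-egress rate with stair-step volume tiers.
# (cap_gb, inr_per_gb) pairs; rates are hypothetical.
TIERS = [(10_000, 9.0), (50_000, 7.0), (500_000, 5.0), (float("inf"), 3.5)]

def internet_egress_inr(gb: float) -> float:
    cost, prev_cap = 0.0, 0.0
    for cap, rate in TIERS:
        band = min(gb, cap) - prev_cap   # GB that fall in this tier
        if band <= 0:
            break
        cost += band * rate
        prev_cap = cap
    return cost

for tb in (5, 100, 600):
    gb = tb * 1_000
    cost = internet_egress_inr(gb)
    print(f"{tb:>4d} TB  ₹{cost:>10,.0f}  blended ₹{cost / gb:.2f}/GB")
#    5 TB  ₹    45,000  blended ₹9.00/GB
#  100 TB  ₹   620,000  blended ₹6.20/GB   (close to the ₹6 lakh/month above)
#  600 TB  ₹ 2,970,000  blended ₹4.95/GB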

Snowflake's credit accounting vs BigQuery's slot accounting — two different cost models

Snowflake bills credits: warehouse size sets the burn rate (X-Small = 1 credit/hour, Small = 2, Medium = 4, doubling with each size), and the meter ticks whenever the warehouse is running, even if no query is active. Auto-suspend is the lever (default 600s, recommended 60s for interactive workloads). BigQuery's slot model is different: you pay on-demand (roughly ₹500/TB scanned, no warehouse to manage) or reserve slots (a flat monthly rate per slot, no per-query charge). The on-demand model rewards infrequent, bursty workloads; the slot reservation rewards continuous, predictable ones. The right answer depends on workload shape: a Razorpay-style fintech with 24/7 dashboards re-scanning constantly is reservation-friendly; a Zerodha-style daily settlement batch is on-demand-friendly. Mixing both — reservations for the steady state, on-demand for the spike — is what mature platforms do. Compare with compute-storage-separation-for-cost-control. Why this matters for the trinity: the pricing model is what makes "compute" the leveraged column. A query that scans 5 TB on a Medium Snowflake warehouse takes ~10 minutes and costs ~₹120 in credits; the same scan on BigQuery on-demand runs to roughly ₹2,500. Pricing-model choice alone can swing a trinity vertex's contribution by an order of magnitude — engineering for cost is engineering for the right pricing model first.
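
The two meters, side by side. A sketch at the illustrative rates used in this piece (₹146/credit, ₹500/TB); the crossover, not the absolute numbers, is the point:

# Credit meter (Snowflake-style) vs scan meter (BigQuery on-demand-style)
# for one month of workload. Rates illustrative: ₹146/credit, ₹500/TB.
def credit_model_inr(credits_per_hr: int, hours_on_per_month: float) -> float:
    return credits_per_hr * hours_on_per_month * 146

def scan_model_inr(tb_scanned_per_month: float) -> float:
    return tb_scanned_per_month * 500

# 24/7 dashboards re-scanning constantly: the flat meter wins
print(credit_model_inr(4, 720), scan_model_inr(2_000))   # 420480 1000000
# one 2-hour nightly settlement batch: the scan meter wins
print(credit_model_inr(4, 60), scan_model_inr(50))       # 35040 25000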

S3 storage classes — the hidden quirks that bite

S3 has half a dozen storage classes (Standard, Intelligent-Tiering, Standard-IA, One Zone-IA, and the Glacier tiers from Instant Retrieval down to Deep Archive) and their pricing is non-monotonic in a way that catches teams. Standard-IA is cheaper per GB-month but adds a per-GB retrieval fee and a 30-day minimum storage duration; if you actually do read your "infrequent" data, you can end up paying more than Standard. Intelligent-Tiering automates the decision (for a small per-object monitoring fee), which is the right answer for data with unknown access patterns. The Glacier classes have 90-day minimums (180 days for Deep Archive) and substantial retrieval fees; only put data there that you genuinely don't expect to read. The trap teams fall into: lifecycle-policy everything to Glacier after 30 days, then a regulator asks for 18 months of audit data and the retrieval bill is ₹40 lakh. The correct policy is: tier by access pattern, not by age. S3 Express One Zone (single-digit-millisecond access at roughly ₹15/GB-month) reshuffles the table for high-frequency analytics workloads, but most data engineering workloads don't need it.
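
The non-monotonicity is one inequality. A sketch of the Standard vs Standard-IA break-even; the storage rates are the ones from earlier, and the ₹0.85/GB retrieval fee is indicative of the IA surcharge:

# Standard vs Standard-IA monthly cost as a function of how often the
# dataset is actually read in full. Retrieval fee is indicative.
STD, IA, IA_RETRIEVAL = 2.00, 1.10, 0.85   # ₹/GB-month, ₹/GB-month, ₹/GB-read

def monthly_inr(gb: float, full_reads_per_month: float, tier: str) -> float:
    if tier == "standard":
        return gb * STD
    return gb * IA + gb * full_reads_per_month * IA_RETRIEVAL

for reads in (0, 1, 2):
    std = monthly_inr(10_000, reads, "standard")
    ia = monthly_inr(10_000, reads, "ia")
    print(f"reads/month={reads}: Standard ₹{std:,.0f}  IA ₹{ia:,.0f}"
          f"  {'IA wins' if ia < std else 'Standard wins'}")
# reads/month=0: Standard ₹20,000  IA ₹11,000  IA wins
# reads/month=1: Standard ₹20,000  IA ₹19,500  IA wins
# reads/month=2: Standard ₹20,000  IA ₹28,000  Standard wins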

Cost as a feedback signal in the planner — the FinOps direction

The next-generation pattern, sometimes called "cost-aware query planning", treats cost as a first-class metric the query planner optimises against, alongside latency and throughput. Snowflake's work on cost-based join reordering, BigQuery's dry-run estimates (bytes-to-be-scanned priced before execution), and DuckDB's cost-aware partition pruning all point in this direction. The pattern at the platform level: every query carries a cost estimate before execution, the user sees the estimate, and the platform can refuse queries that exceed a per-team budget. Razorpay's internal data platform reportedly does this — every analyst's query is pre-priced and rejected if the team's monthly budget is exhausted. The cultural change is bigger than the technical change: it forces analysts to think about cost as part of query craft. Compare with slas-on-data-what-you-can-actually-promise — cost SLOs are the cousin of freshness SLOs.
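
BigQuery's dry run is the pre-pricing primitive: it returns bytes-to-be-scanned without executing. A sketch of the budget-gate pattern on top of it; the budget store and the ₹500/TB rate are illustrative:

# Pre-price a BigQuery query with a dry run; refuse it if the team's
# remaining monthly budget can't cover it. Budget dict is illustrative.
from google.cloud import bigquery

INR_PER_TB = 500
REMAINING_BUDGET_INR = {"analytics": 40_000}   # would come from the harness

def gated_query(client: bigquery.Client, team: str, sql: str):
    dry = client.query(sql, job_config=bigquery.QueryJobConfig(
        dry_run=True, use_query_cache=False))
    est_inr = dry.total_bytes_processed / 1e12 * INR_PER_TB
    if est_inr > REMAINING_BUDGET_INR[team]:
        raise RuntimeError(
            f"query pre-priced at ₹{est_inr:,.0f}, over {team}'s remaining budget")
    return client.query(sql)   # cheap enough: actually run it

# gated_query(bigquery.Client(), "analytics", "SELECT ... FROM big_table")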

Where this leads next

The trinity is the design constraint, not a finance report. Every architectural decision touches at least two of the three vertices, and the ones that matter at scale all involve a tradeoff between them. The teams that ship cheap, reliable data platforms are the ones that read the bill as a triangle, not three columns — and who fix the dominant line first.
