Wall: the metrics layer keeps getting re-invented

It is a Monday status meeting at Swiggy. The CFO's deck shows 4.2 crore weekly active users; the growth team's dashboard shows 4.6 crore; the marketing team, pulling from the same warehouse with their own SQL, reports 3.9 crore. Three teams, one warehouse, one term — "weekly active user" — and three answers spread across an 18% range. Nobody has lied. Each team's SQL is correct against its own definition. The fight is not about data quality; it is about whose definition of "active" is the right one. By Wednesday, an engineering manager has written a Confluence doc titled "Single Source of Truth for WAU". By Friday, three teams have politely ignored it because their dashboards still need to ship. Six months later, someone proposes adopting Cube, or LookML, or MetricFlow, and the cycle starts over.

The metrics layer is the seam between SQL and business meaning. Warehouses store rows; dashboards show numbers; the layer in between, where "weekly active user" gets defined exactly once, is where every data org spends a decade of bad blood. Every generation of tooling — Looker, dbt metrics, Cube, MetricFlow — is a fresh attempt at the same problem: make the definition the contract, not the SQL.

Why this is a wall, not a chapter

The previous twelve builds have been mechanical. Crash recovery is mechanical. Idempotency is mechanical. Compaction, watermarks, eq-deletes, query engines — every one of those is a problem you can write a unit test for. You either get the right rows or you don't. The mechanism is verifiable.

The metrics layer is the first build in this curriculum where the mechanism is not the hard part. Computing COUNT(DISTINCT user_id) over a date range is a one-line SQL query that every engine on Earth executes correctly. The problem is that "user", "active", and "the date range" each carry a definition, and the definitions are owned by different humans who disagree. The data engineer's job stops being "make the bytes flow" and starts being "make the meaning hold". That is a different shape of problem, and the standard SWE training does not prepare you for it.

This chapter is the wall. It does not teach you a tool. It teaches you why the tools keep being invented and why none of them have ever fully won.

Twenty years of metrics-layer reinvention: a horizontal timeline from 2005 to 2026 showing successive attempts at the metrics layer. BI-tool semantic layers (Cognos, MicroStrategy, 2005): XML metric definitions, vendor lock-in, teams bypass via raw SQL exports. OLAP cubes (SSAS, Essbase, 2010): pre-aggregated, fixed dimensions, drill-through breaks when the schema drifts. LookML (Looker, 2014): metrics defined in a YAML-like DSL that only the Looker UI can read. dbt metrics (2021): YAML in the repo, version-controlled, but the compile target stayed the warehouse. MetricFlow / Cube (2023): a headless layer with a REST/GraphQL API that any BI tool can call; adoption partial. LLM plus semantic layer (2026): ask in English, the model writes SQL, the layer constrains the answer space. Underneath, the same problem with the same shape: three teams, one term ("active user"), three SQL definitions, three answers. Each new tool moves the definition file; none of them stops teams from writing their own SQL. The wall is organisational, not technical; the technology is just where the scar tissue lives.
Two decades of attempts at the metrics layer. Each generation moves the definition further upstream — from the BI tool, to the cube, to YAML, to a headless API. None has fully won, because the underlying problem is humans disagreeing on what "active user" means, and no DSL can override an org chart.

What "the metrics layer" actually is

Strip the marketing away and the metrics layer is one thing: a registry of named, parameterised SQL fragments that produce numbers. A row in that registry says, "the metric wau is COUNT(DISTINCT user_id) from fct_user_events where event_ts is within the trailing 7 days, grouped by whatever dimensions the caller passes". That registry is the contract. Every consumer — BI tool, ML pipeline, executive dashboard, internal app — is supposed to ask the registry for the metric, not write its own SQL.

That sounds boring. The reason it isn't boring is that the registry has to compose. The CFO wants WAU by country. The growth team wants WAU by acquisition channel. The product team wants WAU by app version. The same metric, three slicings. If the registry forces you to define wau_by_country, wau_by_channel, wau_by_app_version as separate metrics, you have lost — within six months you'll have 200 metrics that are subtle variants of the same five definitions, and the "single source of truth" goal is dead.

The hard problem is: how do you store a metric definition such that a caller can pass any set of dimensions, time grains, and filters, and the layer compiles the right SQL on demand without you having to enumerate every shape in advance? Each generation of tooling has answered that question differently.

Every generation moves the definition closer to a single source. None of them removes the social problem: someone in marketing still writes raw SQL because the registry's definition didn't match what they needed and the metric-layer team's queue is two weeks long.
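To make "answered differently" concrete, here is roughly the shape the current generation uses. The fragment below is MetricFlow-flavoured dbt YAML; treat the field names as illustrative rather than exact syntax. The point is the separation: the aggregate, the dimensions, and the metric are declared independently, so the caller can compose any combination at query time.

```yaml
# Illustrative, MetricFlow-flavoured definition -- not exact syntax.
semantic_models:
  - name: user_events
    model: ref('fct_user_events')
    entities:
      - name: user
        type: primary
        expr: user_id
    dimensions:
      - name: country
        type: categorical
      - name: event_ts
        type: time
        type_params:
          time_granularity: day
    measures:
      - name: active_users
        agg: count_distinct
        expr: user_id

metrics:
  - name: wau
    description: "Distinct users with a session_start event"
    type: simple
    type_params:
      measure: active_users
```

The design choice repeated across generations is visible here: the aggregate (active_users) is declared once, the dimensions are declared once, and any (metric, dimension, grain) combination is compiled on demand rather than enumerated in advance.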

A working metric registry

The cleanest way to feel why this is hard is to build the smallest possible metric layer and then ask it questions.

# tiny_metrics_layer.py — a small metric registry that compiles SQL on demand
# and shows why "compose any dimension" is the hard part.

from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Metric:
    name: str
    expression: str           # the aggregate, e.g. COUNT(DISTINCT user_id)
    table: str                # source table
    time_column: str          # column to apply time grain on
    dimensions: list[str] = field(default_factory=list)   # joinable dims
    filters: list[str] = field(default_factory=list)      # baseline filters

REGISTRY = {
    "wau": Metric(
        name="wau",
        expression="COUNT(DISTINCT user_id)",
        table="fct_user_events",
        time_column="event_ts",
        dimensions=["country", "app_version", "acquisition_channel"],
        filters=["event_type = 'session_start'"],
    ),
    "orders_count": Metric(
        name="orders_count",
        expression="COUNT(*)",
        table="fct_orders",
        time_column="created_at",
        dimensions=["country", "category", "merchant_id"],
        filters=["status = 'completed'"],
    ),
}

def compile_query(metric_name: str, dims: list[str], time_grain: str,
                  start: str, end: str, extra_filter: Optional[str] = None) -> str:
    m = REGISTRY[metric_name]
    for d in dims:
        if d not in m.dimensions:
            raise ValueError(f"{d} not declared as a dimension of {metric_name}")
    grain_expr = {
        "day":   f"DATE_TRUNC('day',   {m.time_column})",
        "week":  f"DATE_TRUNC('week',  {m.time_column})",
        "month": f"DATE_TRUNC('month', {m.time_column})",
    }[time_grain]
    select_dims = ", ".join([grain_expr + " AS bucket"] + dims)
    group_dims = ", ".join(["1"] + [str(i+2) for i in range(len(dims))])
    where = [f"{m.time_column} BETWEEN '{start}' AND '{end}'"] + m.filters
    if extra_filter: where.append(extra_filter)
    sql = (f"SELECT {select_dims}, {m.expression} AS {m.name}\n"
           f"FROM {m.table}\n"
           f"WHERE {' AND '.join(where)}\n"
           f"GROUP BY {group_dims}\n"
           f"ORDER BY 1")
    return sql

# Three teams, three asks, one definition:
print("--- CFO: WAU by country, weekly ---")
print(compile_query("wau", ["country"], "week", "2026-04-01", "2026-04-30"))
print("\n--- Growth: WAU by acquisition channel, India only, daily ---")
print(compile_query("wau", ["acquisition_channel"], "day",
                    "2026-04-01", "2026-04-30", extra_filter="country = 'IN'"))
print("\n--- Product: WAU by app version, weekly ---")
print(compile_query("wau", ["app_version"], "week", "2026-04-01", "2026-04-30"))
# Sample output:
--- CFO: WAU by country, weekly ---
SELECT DATE_TRUNC('week', event_ts) AS bucket, country, COUNT(DISTINCT user_id) AS wau
FROM fct_user_events
WHERE event_ts BETWEEN '2026-04-01' AND '2026-04-30' AND event_type = 'session_start'
GROUP BY 1, 2
ORDER BY 1

--- Growth: WAU by acquisition channel, India only, daily ---
SELECT DATE_TRUNC('day', event_ts) AS bucket, acquisition_channel, COUNT(DISTINCT user_id) AS wau
FROM fct_user_events
WHERE event_ts BETWEEN '2026-04-01' AND '2026-04-30' AND event_type = 'session_start' AND country = 'IN'
GROUP BY 1, 2
ORDER BY 1

--- Product: WAU by app version, weekly ---
SELECT DATE_TRUNC('week', event_ts) AS bucket, app_version, COUNT(DISTINCT user_id) AS wau
FROM fct_user_events
WHERE event_ts BETWEEN '2026-04-01' AND '2026-04-30' AND event_type = 'session_start'
GROUP BY 1, 2
ORDER BY 1

Now look at what the three calls have in common:

Three teams, three SQL queries, one definition of "what counts as an active user". That is the irreducible value the metrics layer delivers — and also the irreducible reason it keeps getting re-invented: if any team can write SELECT COUNT(DISTINCT user_id) FROM fct_user_events WHERE event_type = 'login' and bypass the layer, the contract dies. The job of every metrics-layer tool from Looker to MetricFlow is to make the layer easier to use than bypassing it. That is harder than it sounds.
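The bypass failure mode is easy to reproduce. Below is a runnable sketch against synthetic data (the table and event names are assumed, matching the registry above): the registry's definition counts session_start, an analyst's hand-rolled query counts login, both run cleanly, and nothing in the warehouse flags that the two numbers answer different questions.

```python
import sqlite3

# Synthetic warehouse: some users open the app without logging in,
# some auto-login without firing session_start.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fct_user_events (user_id INT, event_type TEXT)")
conn.executemany(
    "INSERT INTO fct_user_events VALUES (?, ?)",
    [(1, "session_start"), (1, "login"),
     (2, "session_start"),                 # opened app, never logged in
     (3, "session_start"),                 # opened app, never logged in
     (4, "login")],                        # auto-login, no session_start fired
)

# The registry's definition of "active".
registry_wau = conn.execute(
    "SELECT COUNT(DISTINCT user_id) FROM fct_user_events "
    "WHERE event_type = 'session_start'").fetchone()[0]

# The analyst's hand-rolled definition, written under deadline pressure.
handrolled_wau = conn.execute(
    "SELECT COUNT(DISTINCT user_id) FROM fct_user_events "
    "WHERE event_type = 'login'").fetchone()[0]

print(registry_wau)    # 3
print(handrolled_wau)  # 2
```

Both queries are syntactically valid and internally consistent; the warehouse executes both without complaint. The only difference is one predicate, and that predicate is exactly what the registry exists to pin down.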

The five forces that keep breaking it

Every metrics-layer initiative I've seen — at Razorpay, Swiggy, Flipkart, Cred — runs into the same five forces. Knowing them is half the battle.

Force 1: the analyst's deadline. The CMO needs the WAU number for a board meeting tomorrow. The metrics layer's definition takes 4 days to update. The analyst writes raw SQL in the warehouse, ships the number, moves on. A year later that raw SQL has become the de-facto definition because it lives in the slide deck the board now expects. The layer has been routed around.

Force 2: dimension explosion. "WAU by country" sounds like one metric. In practice the product team needs WAU by (country, app_version, acquisition_channel, signup_cohort, plan_tier) cross-product — 5 dimensions, ~6 values each, that is 7,776 cells in one query. The layer has to compose, but composition exposes joins the metric author didn't think about (does WAU by plan_tier need a join to dim_subscription? what about historical plan_tier vs current?). Every join is a new opportunity for the answer to differ from the analyst's hand-rolled SQL.
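The arithmetic of Force 2 is worth seeing. The cardinalities below are the rough figures from the text (five dimensions at about six values each), not real data:

```python
from math import prod

# Assumed cardinalities, matching the back-of-envelope in the text.
cardinalities = {
    "country": 6,
    "app_version": 6,
    "acquisition_channel": 6,
    "signup_cohort": 6,
    "plan_tier": 6,
}

# The potential result grid of one "WAU by everything" query.
cells = prod(cardinalities.values())
print(cells)  # 7776
```

Each extra dimension multiplies the grid, and each one that lives in a different table adds a join the metric author must have anticipated.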

Force 3: time-travel and "as-of" semantics. "WAU for the week of 2025-12-01" — does that mean users active that week using the user attributes (country, app version) as they were that week, or as they are today? Both answers are legitimate. Most metrics layers default to "today", which silently changes historical numbers as users move countries or upgrade their app. SCD-aware time-travel is rare, hard, and absent from most v1 implementations.
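Both readings of "as of" can be reproduced in a few lines. The sketch below uses synthetic data and an assumed SCD2-style dim_users table (valid_from/valid_to rows): the same event is attributed to India with an as-of-event join and to Singapore with a current-row join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fct_user_events (user_id INT, event_ts TEXT)")
conn.execute("""CREATE TABLE dim_users (
    user_id INT, country TEXT, valid_from TEXT, valid_to TEXT)""")
conn.execute("INSERT INTO fct_user_events VALUES (1, '2025-12-03')")
# User 1 was in India when the event happened, then moved to Singapore.
conn.executemany("INSERT INTO dim_users VALUES (?, ?, ?, ?)", [
    (1, "IN", "2025-01-01", "2026-01-15"),
    (1, "SG", "2026-01-15", "9999-12-31"),
])

# Reading 1: attributes as they were at event time (as-of join).
as_of_event = conn.execute("""
    SELECT d.country, COUNT(DISTINCT e.user_id)
    FROM fct_user_events e
    JOIN dim_users d ON d.user_id = e.user_id
     AND e.event_ts >= d.valid_from AND e.event_ts < d.valid_to
    GROUP BY 1""").fetchall()

# Reading 2: attributes as they are today (current-row join).
as_of_today = conn.execute("""
    SELECT d.country, COUNT(DISTINCT e.user_id)
    FROM fct_user_events e
    JOIN dim_users d ON d.user_id = e.user_id
     AND d.valid_to = '9999-12-31'
    GROUP BY 1""").fetchall()

print(as_of_event)  # [('IN', 1)] -- historical attribution holds
print(as_of_today)  # [('SG', 1)] -- history silently rewritten
```

A layer that defaults to the current-row join will quietly restate last December's numbers every time a user moves; the as-of join keeps history stable but requires the SCD2 table to exist and be trusted.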

Force 4: governance vs speed. A locked-down metrics layer with a 4-step review process for each new metric will be bypassed within three months. A wide-open layer where any analyst can register any metric becomes a metric graveyard with 800 entries, of which 120 are duplicates. Every shop oscillates between the two extremes for years.

Force 5: BI-tool fragmentation. Tableau, Looker, Power BI, internal Streamlit apps, Slack bots, ML feature pipelines — each is a metric consumer, each has its own preferred way to consume metrics. The "headless BI" generation (MetricFlow, Cube) tries to give all of them a uniform API. Whether the BI tools actually use the API rather than going around it is, again, a social problem.

Five forces that bypass the metrics layer: a central box labelled "Metrics layer (single source of truth)" sits between the warehouse and its consumers, with five arrows routing around it, one per force: deadline pressure ("board meeting tomorrow"), dimension explosion ("slice by 5 dims at once"), as-of semantics ("WAU as of last quarter"), governance lag ("4-day review for a new metric"), and BI-tool fragmentation ("Tableau bypasses the API"). Each force is a route by which a consumer bypasses the layer; the layer's job is to make using it cheaper than bypassing it.
The metrics layer is theoretically the single source of truth, but five forces continuously route consumers around it. Every successful implementation is a specific answer to "how do I make my layer easier than bypassing it" — not a tool, but an operational practice.

What changes in 2026: the LLM front-door

The interesting thing happening this year is that LLMs are giving the metrics layer a second chance. The pitch: an analyst types "show me WAU for India this quarter by app version" into a chat box; the model translates that into a compile_query("wau", ["app_version"], "week", "2026-01-01", "2026-03-31", extra_filter="country='IN'") call; the layer compiles SQL; the warehouse runs it; the answer comes back. The English question never touches raw SQL.

This is not a new architecture — it is a new adoption surface. Force 1 (deadline pressure) collapses if asking the layer is faster than writing raw SQL, because the model removes the activation energy. The 2026 question, which /wiki/semantic-layer-llms-the-new-interface digs into, is whether LLMs can stay inside the layer's contract — and what guardrails are needed when the model wants to "improvise" a metric definition that doesn't exist yet.

The companies betting hard on this in India — Razorpay's analytics team, Cred's growth team, Swiggy's central data org — all converged on the same 2026 setup: dbt + a metrics-layer tool (MetricFlow or Cube) + a thin LLM front-door that is restricted to the registered metric set. Anything outside the registry returns "I don't know that metric, would you like to register it?" and routes to the metrics-layer team. This is the cleanest answer the industry has yet found to the wall, but the jury is still out — early reports say 70% of analyst queries map to existing metrics, 30% still need a new one, and the 30% is where the social problem still bites.
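A guard of that shape is small. The sketch below is hypothetical (names and messages invented), but it shows the one property that matters: the model's proposed call is validated against the registry before any SQL is compiled, and anything outside the registry routes to a human instead of being improvised.

```python
# Hypothetical front-door guard: validate an LLM-proposed metric call
# against the registry before compiling any SQL.
REGISTRY = {
    "wau": {"dimensions": {"country", "app_version", "acquisition_channel"}},
    "orders_count": {"dimensions": {"country", "category", "merchant_id"}},
}

def guard(metric: str, dims: list[str]) -> str:
    """Accept the call only if the metric and every dimension are registered."""
    if metric not in REGISTRY:
        return (f"I don't know the metric '{metric}'. "
                "Would you like to register it with the metrics-layer team?")
    unknown = set(dims) - REGISTRY[metric]["dimensions"]
    if unknown:
        return f"'{metric}' cannot be sliced by {sorted(unknown)}."
    return "OK: compile and run"

print(guard("wau", ["country"]))             # OK: compile and run
print(guard("wau", ["plan_tier"]))           # rejected: undeclared dimension
print(guard("weekly_revenue", ["country"]))  # unknown metric, route to humans
```

The guard is the entire architectural difference between "the model writes SQL" and "the model calls the layer": the first re-opens every definition fight, the second keeps the contract intact at the cost of saying "I don't know" to the 30%.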

Common confusions

Going deeper

Why "active" is the canonical example

"Active user" is the metric every textbook uses because it has the maximum number of valid definitions. Did the user open the app? Did they take a transactional action? Did they spend more than 30 seconds? Within a week, a month, a 30-day rolling window? Counted by user_id, device_id, cookie_id, all three? Each definition produces a different number; each is internally consistent; none is "wrong". The metrics-layer's job is not to pick the right definition (that is a product/strategy decision) — it is to make the chosen definition explicit, versioned, and inescapable. When a new VP joins and wants to redefine "active", the layer makes that a deliberate amendment with a known-rollout-day, not a silent re-interpretation that breaks year-over-year comparisons.
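Three of those definitions, run against the same synthetic events (assumed schema), make the point concrete: each is internally consistent, and each returns a different number.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE fct_user_events
    (user_id INT, event_type TEXT, session_seconds INT)""")
conn.executemany("INSERT INTO fct_user_events VALUES (?, ?, ?)", [
    (1, "session_start", 120), (1, "order_placed", 0),
    (2, "session_start", 10),      # opened the app, bounced in 10 seconds
    (3, "session_start", 45),      # browsed for a while, never ordered
])

# Three valid, mutually incompatible definitions of "active".
definitions = {
    "opened_app":  "event_type = 'session_start'",
    "transacted":  "event_type = 'order_placed'",
    "engaged_30s": "event_type = 'session_start' AND session_seconds >= 30",
}

counts = {
    name: conn.execute(
        f"SELECT COUNT(DISTINCT user_id) FROM fct_user_events WHERE {pred}"
    ).fetchone()[0]
    for name, pred in definitions.items()
}
print(counts)  # {'opened_app': 3, 'transacted': 1, 'engaged_30s': 2}
```

None of the three numbers is wrong; the layer's job is to record which predicate the org has chosen, version it, and make switching predicates a visible event.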

Razorpay's metric-version-pinning story

In 2024, Razorpay's payments team had a board-level metric called "successful payment rate". The original SQL was SUM(status='success') / COUNT(*) over fct_payments. In Q3 2024, the fraud team added a new status 'risk_blocked' that previously would have been 'failed'. The SQL still ran, but the denominator grew, and the success rate visibly dropped 0.4% overnight — without any actual change in payment success. Three weeks of investigation later, the fix was to add metric-version pinning: every published metric carries a version, and the v1 definition is frozen against the schema as it existed when v1 shipped. Schema additions don't silently change historical numbers. This is the kind of governance the layer is supposed to provide; without it, every schema change is a potential silent metric regression.
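A minimal sketch of version pinning, with hypothetical names (the text does not describe Razorpay's actual implementation): freezing the known status set into the metric's WHERE clause means a later 'risk_blocked' status cannot enter the v1 denominator without a deliberate v2.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SuccessRate:
    version: int
    known_statuses: frozenset  # statuses that existed when this version shipped

    def sql(self) -> str:
        in_list = ", ".join(f"'{s}'" for s in sorted(self.known_statuses))
        return (
            "SELECT SUM(CASE WHEN status = 'success' THEN 1 ELSE 0 END) * 1.0\n"
            f"     / COUNT(*) AS success_rate_v{self.version}\n"
            "FROM fct_payments\n"
            f"WHERE status IN ({in_list})  -- frozen at v{self.version} ship time"
        )

# v1 froze the statuses that existed when it shipped.
v1 = SuccessRate(version=1,
                 known_statuses=frozenset({"success", "failed", "pending"}))
print(v1.sql())

# Counting the new status requires a deliberate v2, not a silent drift.
v2 = SuccessRate(version=2,
                 known_statuses=frozenset({"success", "failed", "pending",
                                           "risk_blocked"}))
```

The WHERE clause is the pin: new statuses fall outside v1's denominator by construction, so the historical series stays flat until someone ships v2 on a known rollout day.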

The "single source of truth" lie and why it persists

Every metrics-layer initiative is sold with the phrase "single source of truth". After ten years of implementations, the more accurate description is "single registered source of truth" — meaning the layer is one of several sources, and the layer's job is to be the one the org has agreed to cite when there's a fight. Other sources (analyst's hand-rolled SQL, last quarter's slide deck, a vendor's dashboard) continue to exist. The wins compound slowly: in year one, 30% of dashboards use the layer; year three, 70%; year five, the analyst's hand-rolled SQL is the exception. The lie is calling it a switch-flip; the truth is calling it a multi-year migration where the metrics-layer team's KPI is "% of dashboards on the layer".

Federated metric layers (the Swiggy 2025 case)

Swiggy went through a phase in 2024-25 where each business unit (food, Instamart, Genie, dineout) had its own metrics layer, and the central data team had a fifth one for the CFO. Five layers, five definitions of "active user", one CFO board meeting. The 2025 fix was federated: each unit kept its own layer for unit-internal metrics, but cross-unit metrics (revenue, gross order value, contribution margin) had to be defined in the central layer and the unit layers had to consume from there. Federation worked — but only because the central team had the political mandate to break unit-level dashboards if they violated the federation rule. The technology was the easy part; the mandate was the moat.
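The federation rule reduces to a resolution order plus a hard error. The sketch below uses invented metric names and placeholder SQL; the point is that a unit layer redefining a cross-unit metric fails loudly instead of silently overriding the central definition.

```python
# Hypothetical federated lookup: cross-unit metrics live centrally,
# unit layers own only their local metrics.
CENTRAL = {
    "revenue": "SUM(net_amount)",
    "gross_order_value": "SUM(order_value)",
    "contribution_margin": "SUM(net_amount - variable_cost)",
}
UNIT_LAYERS = {
    "instamart": {
        "basket_fill_rate": "AVG(items_delivered * 1.0 / items_ordered)",
        "revenue": "SUM(item_price)",   # violates the federation rule
    },
}

def resolve(unit: str, metric: str) -> str:
    """Cross-unit metrics always resolve centrally; shadowing them is an error."""
    if metric in CENTRAL:
        if metric in UNIT_LAYERS.get(unit, {}):
            raise ValueError(
                f"{unit} redefines cross-unit metric '{metric}'; "
                "federation rule: consume it from the central layer")
        return CENTRAL[metric]
    return UNIT_LAYERS[unit][metric]

print(resolve("instamart", "basket_fill_rate"))   # unit-local: fine
try:
    resolve("instamart", "revenue")               # cross-unit shadowing: blocked
except ValueError as e:
    print(e)
```

The code is trivial; what made it stick at Swiggy, per the account above, was the mandate to actually raise that error in production dashboards rather than log it and move on.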

Where this leads next

The next chapter, /wiki/the-what-is-an-active-user-problem, walks through the canonical "active user" definition fight in detail — the 12 valid SQL variants, the trade-offs, and how to write a metric definition that holds up to a six-month audit.

/wiki/dbt-as-the-de-facto-transformation-layer and /wiki/metric-definitions-once-queried-many-ways drill into the most-adopted toolset (dbt + MetricFlow). /wiki/lookml-cube-metricflow-the-landscape compares the contenders side by side. /wiki/the-headless-bi-movement zooms out to the architectural shift, and /wiki/semantic-layer-llms-the-new-interface takes the 2026 LLM angle.

Build 13 as a whole is the answer to: "given that the warehouse is correct and the query engines all return the same rows, why do the dashboards still disagree?" The metrics layer is the seam where bytes become meaning, and meaning is where humans fight.
