Note: Company names, engineers, incidents, numbers, and scaling scenarios in this article are hypothetical — even when they resemble real ones. See the full disclaimer.

Wall: the metrics layer keeps getting re-invented

It is a Monday status meeting at BhojanBox. The CFO's deck shows 4.2 crore weekly active users; the growth team's dashboard shows 4.6 crore; the marketing team, who pulls from the same warehouse with their own SQL, reports 3.9 crore. Three teams, one warehouse, one term — "weekly active user" — three answers off by 18% from each other. Nobody has lied. Each team's SQL is correct against its own definition. The fight is not about data quality; it is about whose definition of "active" is the right one. By Wednesday, an engineering manager has written a Confluence doc titled "Single Source of Truth for WAU". By Friday, three teams have politely ignored it because their dashboards still need to ship. Six months later, someone proposes adopting Cube, or LookML, or MetricFlow, and the cycle starts over.

The metrics layer is the seam between SQL and business meaning. Warehouses store rows; dashboards show numbers; the layer in between, where "weekly active user" gets defined exactly once, is where every data org spends a decade of bad blood. Every generation of tooling — Looker, dbt metrics, Cube, MetricFlow — is a fresh attempt at the same problem: make the definition the contract, not the SQL.

Why this is a wall, not a chapter

The previous twelve builds have been mechanical. Crash recovery is mechanical. Idempotency is mechanical. Compaction, watermarks, eq-deletes, query engines — every one of those is a problem you can write a unit test for. You either get the right rows or you don't. The mechanism is verifiable.

The metrics layer is the first build in this curriculum where the mechanism is not the hard part. Computing COUNT(DISTINCT user_id) over a date range is a one-line SQL query that every engine on Earth executes correctly. The problem is that "user", "active", and "the date range" each carry a definition, and the definitions are owned by different humans who disagree. The data engineer's job stops being "make the bytes flow" and starts being "make the meaning hold". That is a different shape of problem, and the standard SWE training does not prepare you for it.

This chapter is the wall. It does not teach you a tool. It teaches you why the tools keep being invented and why none of them have ever fully won.

Two decades of attempts at the metrics layer. Each generation moves the definition further upstream — from the BI tool, to the cube, to YAML, to a headless API. None has fully won, because the underlying problem is humans disagreeing on what "active user" means, and no DSL can override an org chart.

What "the metrics layer" actually is

Strip the marketing away and the metrics layer is one thing: a registry of named, parameterised SQL fragments that produce numbers. A row in that registry says, "the metric wau is COUNT(DISTINCT user_id) from fct_user_events where event_ts is within the trailing 7 days, grouped by whatever dimensions the caller passes". That registry is the contract. Every consumer — BI tool, ML pipeline, executive dashboard, internal app — is supposed to ask the registry for the metric, not write its own SQL.

That sounds boring. The reason it isn't boring is that the registry has to compose. The CFO wants WAU by country. The growth team wants WAU by acquisition channel. The product team wants WAU by app version. The same metric, three slicings. If the registry forces you to define wau_by_country, wau_by_channel, wau_by_app_version as separate metrics, you have lost — within six months you'll have 200 metrics that are subtle variants of the same five definitions, and the "single source of truth" goal is dead.

The hard problem is: how do you store a metric definition such that a caller can pass any set of dimensions, time grains, and filters, and the layer compiles the right SQL on demand without you having to enumerate every shape in advance? Every generation of tools has solved this differently:

OLAP cubes pre-compute every dimension combination — fast at query time, brittle when dimensions change.
LookML parameterises at the BI tool's compile time — flexible, but only Looker can read LookML.
dbt metrics (v1) stored YAML and compiled to a SQL view per metric — version-controlled, but every dimension combination still meant a new view.
MetricFlow / Cube treat the metric as a function: wau(dims=['country', 'app_version'], filter='country = "IN"') returns a SQL query the warehouse runs, and the layer guarantees the join paths and time-grain semantics are uniform.
LLM-fronted semantic layers (2026) add a natural-language interface on top, where the model translates "show me WAU for India this quarter by app version" into a function call against the same registry.

Every generation moves the definition closer to a single source. None of them removes the social problem: someone in marketing still writes raw SQL because the registry's definition didn't match what they needed and the metric-layer team's queue is two weeks long.

A working metric registry

The cleanest way to feel why this is hard is to build the smallest possible metric layer and then ask it questions.

# tiny_metrics_layer.py — a 100-line metric registry that compiles SQL on demand
# and shows why "compose any dimension" is the hard part.

from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Metric:
    name: str
    expression: str           # the aggregate, e.g. COUNT(DISTINCT user_id)
    table: str                # source table
    time_column: str          # column to apply time grain on
    dimensions: list[str] = field(default_factory=list)   # joinable dims
    filters: list[str] = field(default_factory=list)      # baseline filters

REGISTRY = {
    "wau": Metric(
        name="wau",
        expression="COUNT(DISTINCT user_id)",
        table="fct_user_events",
        time_column="event_ts",
        dimensions=["country", "app_version", "acquisition_channel"],
        filters=["event_type = 'session_start'"],
    ),
    "orders_count": Metric(
        name="orders_count",
        expression="COUNT(*)",
        table="fct_orders",
        time_column="created_at",
        dimensions=["country", "category", "merchant_id"],
        filters=["status = 'completed'"],
    ),
}

def compile_query(metric_name: str, dims: list[str], time_grain: str,
                  start: str, end: str, extra_filter: Optional[str] = None) -> str:
    m = REGISTRY[metric_name]
    for d in dims:
        if d not in m.dimensions:
            raise ValueError(f"{d} not declared as a dimension of {metric_name}")
    grain_expr = {
        "day":   f"DATE_TRUNC('day',   {m.time_column})",
        "week":  f"DATE_TRUNC('week',  {m.time_column})",
        "month": f"DATE_TRUNC('month', {m.time_column})",
    }[time_grain]
    select_dims = ", ".join([grain_expr + " AS bucket"] + dims)
    group_dims = ", ".join(["1"] + [str(i+2) for i in range(len(dims))])
    where = [f"{m.time_column} BETWEEN '{start}' AND '{end}'"] + m.filters
    if extra_filter: where.append(extra_filter)
    sql = (f"SELECT {select_dims}, {m.expression} AS {m.name}\n"
           f"FROM {m.table}\n"
           f"WHERE {' AND '.join(where)}\n"
           f"GROUP BY {group_dims}\n"
           f"ORDER BY 1")
    return sql

# Three teams, three asks, one definition:
print("--- CFO: WAU by country, weekly ---")
print(compile_query("wau", ["country"], "week", "2026-04-01", "2026-04-30"))
print("\n--- Growth: WAU by acquisition channel, India only, daily ---")
print(compile_query("wau", ["acquisition_channel"], "day",
                    "2026-04-01", "2026-04-30", extra_filter="country = 'IN'"))
print("\n--- Product: WAU by app version, weekly ---")
print(compile_query("wau", ["app_version"], "week", "2026-04-01", "2026-04-30"))

# Sample output:
--- CFO: WAU by country, weekly ---
SELECT DATE_TRUNC('week', event_ts) AS bucket, country, COUNT(DISTINCT user_id) AS wau
FROM fct_user_events
WHERE event_ts BETWEEN '2026-04-01' AND '2026-04-30' AND event_type = 'session_start'
GROUP BY 1, 2
ORDER BY 1

--- Growth: WAU by acquisition channel, India only, daily ---
SELECT DATE_TRUNC('day', event_ts) AS bucket, acquisition_channel, COUNT(DISTINCT user_id) AS wau
FROM fct_user_events
WHERE event_ts BETWEEN '2026-04-01' AND '2026-04-30' AND event_type = 'session_start' AND country = 'IN'
GROUP BY 1, 2
ORDER BY 1

--- Product: WAU by app version, weekly ---
SELECT DATE_TRUNC('week', event_ts) AS bucket, app_version, COUNT(DISTINCT user_id) AS wau
FROM fct_user_events
WHERE event_ts BETWEEN '2026-04-01' AND '2026-04-30' AND event_type = 'session_start'
GROUP BY 1, 2
ORDER BY 1

Walk the load-bearing pieces:

Metric.dimensions = ["country", "app_version", "acquisition_channel"] — this is the contract. The metric author declares which dimensions the metric is allowed to be sliced by. Why declare them: if the layer accepts arbitrary column names, a caller can pass dim="user_email" and the SQL compiles, runs, and returns garbage because email isn't a meaningful dimension for "active users". The declaration turns silent garbage into an explicit error at compile time.
m.expression="COUNT(DISTINCT user_id)" — the aggregate is a string fragment, not a function. The caller cannot override it. A team that wants "WAU but counting only paying users" cannot just pass a different expression; they have to register a new metric (wau_paying) or extend the existing one with a parameter. Why this matters: if the expression were caller-supplied, every dashboard would invent its own variant and the whole point of the layer would collapse. The expression must be a registry-owned value.
time_grain mapped to DATE_TRUNC — the layer enforces consistent time-bucket semantics. A naive SQL user would write WHERE event_ts >= NOW() - INTERVAL 7 DAYS, which gives a different answer if the query runs at 11:30 AM versus 11:30 PM. The layer pins it to DATE_TRUNC and explicit start/end dates so the answer is reproducible.
extra_filter="country = 'IN'" — the caller can add filters on declared dimensions, but cannot add filters on undeclared columns. In a real layer this is enforced via a SQL parser; in our toy it is by convention.
raise ValueError for undeclared dim — the only safety net. Without it, the layer reduces to "string concatenation of caller input into SQL" and you have lost the contract entirely.

Three teams, three SQL queries, one definition of "what counts as an active user". That is the irreducible value the metrics layer delivers — and also the irreducible reason it keeps getting re-invented: if any team can write SELECT COUNT(DISTINCT user_id) FROM fct_user_events WHERE event_type = 'login' and bypass the layer, the contract dies. The job of every metrics-layer tool from Looker to MetricFlow is to make the layer easier to use than bypassing it. That is harder than it sounds.

The five forces that keep breaking it

Every metrics-layer initiative I've seen — at PaisaBridge, BhojanBox, BharatBazaar, KreditClub — runs into the same five forces. Knowing them is half the battle.

Force 1: the analyst's deadline. The CMO needs the WAU number for a board meeting tomorrow. The metrics layer's definition takes 4 days to update. The analyst writes raw SQL in the warehouse, ships the number, moves on. A year later that raw SQL has become the de-facto definition because it lives in the slide deck the board now expects. The layer has been routed around.

Force 2: dimension explosion. "WAU by country" sounds like one metric. In practice the product team needs WAU by (country, app_version, acquisition_channel, signup_cohort, plan_tier) cross-product — 5 dimensions, ~6 values each, that is 7,776 cells in one query. The layer has to compose, but composition exposes joins the metric author didn't think about (does WAU by plan_tier need a join to dim_subscription? what about historical plan_tier vs current?). Every join is a new opportunity for the answer to differ from the analyst's hand-rolled SQL.

Force 3: time-travel and "as-of" semantics. "WAU for the week of 2025-12-01" — does that mean users active that week using the user attributes (country, app version) as they were that week, or as they are today? Both answers are legitimate. Most metrics layers default to "today", which silently changes historical numbers as users move countries or upgrade their app. SCD-aware time-travel is rare, hard, and absent from most v1 implementations.

Force 4: governance vs speed. A locked-down metrics layer with a 4-step review process for each new metric will be bypassed within three months. A wide-open layer where any analyst can register any metric becomes a metric graveyard with 800 entries, of which 120 are duplicates. Every shop oscillates between the two extremes for years.

Force 5: the BI-tool fragmentation. DashView, Looker, Power BI, internal Streamlit apps, Slack bots, ML feature pipelines — each is a metric consumer, each has its own preferred way to consume metrics. The "headless BI" generation (MetricFlow, Cube) tries to give all of them a uniform API. Whether the BI tools actually use the API rather than going around it is, again, a social problem.

The metrics layer is theoretically the single source of truth, but five forces continuously route consumers around it. Every successful implementation is a specific answer to "how do I make my layer easier than bypassing it" — not a tool, but an operational practice.

What changes in 2026: the LLM front-door

The interesting thing happening this year is that LLMs are giving the metrics layer a second chance. The pitch: an analyst types "show me WAU for India this quarter by app version" into a chat box; the model translates that into a compile_query("wau", ["app_version"], "week", "2026-01-01", "2026-03-31", extra_filter="country='IN'") call; the layer compiles SQL; the warehouse runs it; the answer comes back. The English question never touches raw SQL.

This is not a new architecture — it is a new adoption surface. Force 1 (deadline pressure) collapses if asking the layer is faster than writing raw SQL, because the model removes the activation energy. The 2026 question, which /wiki/semantic-layer-llms-the-new-interface digs into, is whether LLMs can stay inside the layer's contract — and what guardrails are needed when the model wants to "improvise" a metric definition that doesn't exist yet.

The companies betting hard on this in India — PaisaBridge's analytics team, KreditClub's growth team, BhojanBox's central data org — all converged on the same 2026 setup: dbt + a metrics-layer tool (MetricFlow or Cube) + a thin LLM front-door that is restricted to the registered metric set. Anything outside the registry returns "I don't know that metric, would you like to register it?" and routes to the metrics-layer team. This is the cleanest answer the industry has yet found to the wall, but the jury is still out — early reports say 70% of analyst queries map to existing metrics, 30% still need a new one, and the 30% is where the social problem still bites.

Common confusions

"The metrics layer is just dbt." dbt is a transformation framework that can host a metrics layer (via dbt Semantic Layer / MetricFlow), but the two are distinct. dbt's core job is "compile and run SQL transformations". The metrics layer's job is "answer parameterised metric questions on demand". Many teams use dbt for transformations and a separate tool (Cube, Looker) for metrics, and both are right.
"OLAP cubes are obsolete." SSAS and Essbase are still running in many enterprises in 2026 — banks, insurance, telecom — because the workloads they were built for (fixed dimensional analysis with sub-second response) haven't gone away. What changed is that for cloud-native, fast-moving teams the rigidity of pre-computed cubes outweighs the speed advantage. "Obsolete" is too strong; "displaced for one segment" is more accurate.
"A semantic layer eliminates the need for a metrics layer." Semantic layer is the broader concept (entities, relationships, dimensions, metrics); metrics layer is one piece of it (just the metrics). Tools like LookML and Cube cover both; tools like dbt metrics historically covered only the metrics half and depended on dbt models for the entity layer. The terms are used loosely; check what the tool actually does, not what it calls itself.
"You can solve this in code review." "Just have a code review for every new dashboard SQL" is the proposal every data lead has tried. It works for six weeks. Then dashboards multiply, the reviewer becomes the bottleneck, the reviewer leaves, and you're back to fragmentation. The layer has to encode the contract, not depend on a human enforcing it.
"LLMs make the metrics layer unnecessary because they can write SQL directly." The opposite is true. LLMs writing free-form SQL against a warehouse is a fragmentation accelerator — the model "improvises" a slightly different definition each time, and now you have 50 SQL variants of WAU instead of 5. The metrics layer is more important with LLMs, not less, because it constrains the model's output space to the registered metric set.

Going deeper

Why "active" is the canonical example

"Active user" is the metric every textbook uses because it has the maximum number of valid definitions. Did the user open the app? Did they take a transactional action? Did they spend more than 30 seconds? Within a week, a month, a 30-day rolling window? Counted by user_id, device_id, cookie_id, all three? Each definition produces a different number; each is internally consistent; none is "wrong". The metrics-layer's job is not to pick the right definition (that is a product/strategy decision) — it is to make the chosen definition explicit, versioned, and inescapable. When a new VP joins and wants to redefine "active", the layer makes that a deliberate amendment with a known-rollout-day, not a silent re-interpretation that breaks year-over-year comparisons.

PaisaBridge's metric-version-pinning story

In 2024, PaisaBridge's payments team had a board-level metric called "successful payment rate". The original SQL was SUM(status='success') / COUNT(*) over fct_payments. In Q3 2024, the fraud team added a new status 'risk_blocked' that previously would have been 'failed'. The SQL still ran, but the denominator grew, and the success rate visibly dropped 0.4% overnight — without any actual change in payment success. Three weeks of investigation later, the fix was to add metric-version pinning: every published metric carries a version, and the v1 definition is frozen against the schema as it existed when v1 shipped. Schema additions don't silently change historical numbers. This is the kind of governance the layer is supposed to provide; without it, every schema change is a potential silent metric regression.

The "single source of truth" lie and why it persists

Every metrics-layer initiative is sold with the phrase "single source of truth". After ten years of implementations, the more accurate description is "single registered source of truth" — meaning the layer is one of several sources, and the layer's job is to be the one the org has agreed to cite when there's a fight. Other sources (analyst's hand-rolled SQL, last quarter's slide deck, a vendor's dashboard) continue to exist. The wins compound slowly: in year one, 30% of dashboards use the layer; year three, 70%; year five, the analyst's hand-rolled SQL is the exception. The lie is calling it a switch-flip; the truth is calling it a multi-year migration where the metrics-layer team's KPI is "% of dashboards on the layer".

Federated metric layers (the BhojanBox 2025 case)

BhojanBox went through a phase in 2024-25 where each business unit (food, Instamart, Genie, dineout) had its own metrics layer, and the central data team had a fifth one for the CFO. Five layers, five definitions of "active user", one CFO board meeting. The 2025 fix was federated: each unit kept its own layer for unit-internal metrics, but cross-unit metrics (revenue, gross order value, contribution margin) had to be defined in the central layer and the unit layers had to consume from there. Federation worked — but only because the central team had the political mandate to break unit-level dashboards if they violated the federation rule. The technology was the easy part; the mandate was the moat.

Where this leads next

The next chapter, /wiki/the-what-is-an-active-user-problem, walks through the canonical "active user" definition fight in detail — the 12 valid SQL variants, the trade-offs, and how to write a metric definition that holds up to a six-month audit.

/wiki/dbt-as-the-de-facto-transformation-layer and /wiki/metric-definitions-once-queried-many-ways drill into the most-adopted toolset (dbt + MetricFlow). /wiki/lookml-cube-metricflow-the-landscape compares the contenders side by side. /wiki/the-headless-bi-movement zooms out to the architectural shift, and /wiki/semantic-layer-llms-the-new-interface takes the 2026 LLM angle.

Build 13 as a whole is the answer to: "given that the warehouse is correct and the query engines all return the same rows, why do the dashboards still disagree?" The metrics layer is the seam where bytes become meaning, and meaning is where humans fight.

References

Drew Banin: The Modern Metrics Stack — the dbt founder's framing of why dbt invested in metrics and the lineage from LookML.
Looker LookML reference — the canonical example of a BI-tool-embedded semantic layer.
MetricFlow Documentation — the dbt-acquired headless metrics engine, with the most up-to-date theory of metric composition.
Cube.dev — The Headless BI Architecture — Cube's framing of why the metrics layer should be a separate API tier, not embedded in the BI tool.
Benn Stancil: The Metrics Layer Will Eat the Modern Data Stack — the 2021 essay that crystallised "metrics layer" as a category.
Vistron SSAS Tabular vs Multidimensional — the 1990s-era OLAP cube documentation, useful as a historical reference for what cubes did and didn't solve.
/wiki/query-engines-on-top-trino-spark-dremio-duckdb — the previous chapter; the warehouse layer the metrics layer sits on top of.
/wiki/the-what-is-an-active-user-problem — the next chapter; the canonical fight that motivates every metrics-layer initiative.