Metric definitions: once, queried many ways

In March 2026, a finance analyst at Cred opened the rewards dashboard and saw GMV for last Tuesday at ₹14.2 crore. The CEO was looking at a deck the same morning that said ₹15.6 crore. The growth team's notebook said ₹13.9 crore. Three numbers, same Tuesday, same company, same source warehouse. The investigation took two days. The root cause was that "GMV" had been hand-written as SQL in three places: the dashboard included reversed transactions until they were settled, the deck excluded them on the day of reversal, and the notebook excluded them on the day of the original transaction. None of the three was wrong. None of them agreed. This is the problem a semantic layer exists to make impossible.

A metric definition is a versioned recipe — a measure expression, the table it lives on, and the dimensions you are allowed to slice it by. The semantic layer compiles this recipe plus a query intent ("GMV by gst_state for last 7 days") into the SQL needed for that exact slice, on demand. The point is not less SQL; it is one source of truth for what the number means.

What "defined once" actually means

A hand-written SQL query encodes three things into one string: the metric (sum of amount_paise), the grain (rows from fct_payments, filtered to status='captured'), and the slice (group by gst_state, filter to created_at >= today - 7). Change any one of those and you write a new query. Copy-paste the query to a new dashboard, and now changing the metric definition means editing the same logic in N places — and you will miss one. That missed dashboard is the ₹14.2 crore vs ₹15.6 crore divergence.
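Seen concretely, the fused form is one string in which all three concerns are interleaved (an illustrative query, using the table and column names from the examples later in this chapter):

```sql
-- One hand-written query: metric, grain, and slice fused into a single string.
SELECT
  gst_state,
  SUM(amount_paise / 100.0) AS gmv                      -- the metric
FROM analytics.fct_payments                             -- the grain
WHERE status = 'captured'                               -- grain filter: part of what "GMV" means
  AND created_at >= CURRENT_DATE - INTERVAL '7 days'    -- the slice
GROUP BY gst_state;
```

Copy this to a second dashboard and the metric travels with the slice; the next edit to one copy is the start of the drift.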

A metric definition splits the three apart. You write the metric once:

metrics:
  - name: gmv
    description: "Gross merchandise value, in INR. Captured payments only; reversals excluded on the date of reversal."
    type: simple
    type_params:
      measure: amount_inr
    label: "GMV"

And the grain once, on the underlying semantic model:

semantic_models:
  - name: payments
    model: ref('fct_payments')
    entities:
      - name: payment_id
        type: primary
      - name: merchant_id
        type: foreign
    measures:
      - name: amount_inr
        agg: sum
        expr: "CASE WHEN status = 'captured' THEN amount_paise / 100.0 ELSE 0 END"
    dimensions:
      - name: created_at
        type: time
        type_params: {time_granularity: day}
      - name: gst_state
        type: categorical

The slice — "by gst_state, last 7 days" — is supplied at query time by the dashboard, the notebook, the LLM agent, the Slack bot, anybody. The semantic layer compiles the metric + slice into SQL on the fly. Every consumer asks the same definition; only the slice differs. This is the architectural shift.

[Figure: From SQL-everywhere to definition-once. Before (left): dashboard, deck, and notebook each hold their own copy of the GMV SQL and query the warehouse directly; three queries hit raw fct_payments and three different answers leave it (₹14.2 cr, ₹15.6 cr, ₹13.9 cr) for the same Tuesday. After (right): the same three consumers send intents like { metric: gmv, by: state, last 7d } to a semantic layer that holds one gmv definition, enforces grain, joins, and dimensions, and compiles SQL per intent against the warehouse; N queries, all derived from one definition, one number per Tuesday.]
The shift is not "less SQL" — the warehouse still gets SQL. The shift is whether the SQL is authored by N humans or generated from one definition. Authored SQL drifts; generated SQL inherits whatever the central recipe says today.

Why the split is load-bearing: the definition (what GMV means) changes rarely. The slice (which dashboard, which date range, which dimension) changes constantly. If you author SQL, every slice change forces you to re-encode the definition; if you split, the definition is referenced, not retyped.

The seven fields a metric definition has to carry

Every serious semantic layer — MetricFlow, Cube, LookML, Malloy — encodes the same seven fields. The names and YAML shape differ; the fields don't. Knowing the fields is how you read any of them.

The seven fields of a metric definition:

1. Name: gmv, wau, p99_capture_latency. A stable identifier; never reused.
2. Semantic model: fct_payments, fct_sessions. The grain: one row = one of what?
3. Measure expression: SUM(amount_paise / 100). How to aggregate the rows.
4. Allowed dimensions: gst_state, merchant_tier, day. The columns a consumer can slice by.
5. Filters: status = 'captured'. An always-applied predicate.
6. Time granularity: day, week, or month. The finest legal slice; coarser is free.
7. Version: v1 → v2 when the meaning changes, so old consumers keep seeing the old number.
Field 7 — the version — is what most teams forget. Without versioning, "we updated the GMV definition" silently rewrites every dashboard's history. With versioning, gmv_v1 keeps producing the old number and consumers migrate to gmv_v2 deliberately.

The version field deserves its own paragraph. When the finance team at Cred decided that reversed transactions should now be excluded on the date of the original transaction (not the date of reversal), they did not edit gmv — they shipped gmv_v2 and left gmv_v1 running. The board deck used v1 for the year-over-year chart so the prior years didn't suddenly shift; the daily ops dashboard switched to v2 the next morning. Why versioning beats updating: a metric definition change is a cliff for every chart that depends on it. If you don't version, you silently rewrite history. If you do, you let consumers cross the cliff one at a time.
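One way to make the "add a version, never edit" rule mechanical is to resolve bare metric names through an explicit default table, so moving every consumer to v2 is a single reviewable line rather than a silent rewrite. A minimal sketch; the registry shape and the resolve helper are illustrative, not any real semantic layer's API:

```python
# Hypothetical versioned registry: bare lookups resolve through an explicit
# default, so "upgrade everyone" is a one-line, reviewable change.
METRICS = {
    "gmv_v1": {"measure": "amount_inr", "filter": None},
    "gmv_v2": {"measure": "amount_inr", "filter": "reversed_at IS NULL"},
}
DEFAULT_VERSION = {"gmv": "gmv_v1"}  # the board deck stays on v1 until this line changes

def resolve(name: str) -> dict:
    """Pinned names ('gmv_v2') hit the registry directly; bare names ('gmv')
    go through the default table, making the current meaning auditable."""
    key = name if name in METRICS else DEFAULT_VERSION[name]
    return METRICS[key]

assert resolve("gmv") is METRICS["gmv_v1"]            # bare name -> pinned default
assert resolve("gmv_v2")["filter"] == "reversed_at IS NULL"
```

Consumers that must never shift (the year-over-year chart) pin a versioned name; consumers that want "whatever GMV means today" use the bare name and accept the cliff when the default moves.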

Worked example: compiling intent into SQL

Below is a tiny semantic-layer compiler — 70 lines of Python that takes a metric definition plus a query intent and produces the SQL. It is not MetricFlow. It is the smallest thing that demonstrates the pattern, so you can see what the abstraction is actually doing.

# tiny_metric_compiler.py — compile (metric definition + slice) → SQL.
# This is what MetricFlow / Cube / LookML do, modulo joins and caching.

from dataclasses import dataclass, field
from typing import Optional

@dataclass
class SemanticModel:
    name: str
    table: str
    time_dim: str
    dims: list[str]
    measures: dict[str, str]   # measure_name -> SQL expression
    always_filter: Optional[str] = None

@dataclass
class Metric:
    name: str
    version: str
    semantic_model: str
    measure: str               # which measure of the semantic model
    extra_filter: Optional[str] = None
    description: str = ""

@dataclass
class QueryIntent:
    metric: str
    by: list[str] = field(default_factory=list)   # dimensions to group by
    where: Optional[str] = None                   # ad-hoc filter
    time_grain: str = "day"
    last_n_days: Optional[int] = None

REGISTRY_MODELS = {
  "payments": SemanticModel(
    name="payments", table="analytics.fct_payments",
    time_dim="created_at",
    dims=["gst_state", "merchant_tier", "payment_method"],
    measures={"amount_inr": "SUM(amount_paise / 100.0)",
              "txn_count": "COUNT(*)"},
    always_filter="status = 'captured'",
  ),
}
REGISTRY_METRICS = {
  "gmv_v1": Metric("gmv", "v1", "payments", "amount_inr",
                   description="Captured GMV, INR. Reversals excluded on date of reversal."),
  "gmv_v2": Metric("gmv", "v2", "payments", "amount_inr",
                   extra_filter="reversed_at IS NULL",
                   description="Captured GMV, INR. Reversals excluded on date of original txn."),
}

GRAIN_TRUNC = {"week": "DATE_TRUNC('week', {col})", "month": "DATE_TRUNC('month', {col})"}

def compile_intent(intent: QueryIntent) -> str:
    m = REGISTRY_METRICS[intent.metric]
    sm = REGISTRY_MODELS[m.semantic_model]
    illegal = [d for d in intent.by if d not in sm.dims]
    if illegal:
        raise ValueError(f"dimensions not allowed: {illegal}")
    # Time column first: DATE() at day grain, DATE_TRUNC at coarser grains.
    if intent.time_grain == "day":
        select_dims = [f"DATE({sm.time_dim}) AS day"]
    else:
        expr = GRAIN_TRUNC[intent.time_grain].format(col=sm.time_dim)
        select_dims = [f"{expr} AS {intent.time_grain}"]
    select_dims.extend(intent.by)
    select_clause = ",\n  ".join(select_dims + [f"{sm.measures[m.measure]} AS {m.name}"])
    where = []
    if sm.always_filter: where.append(sm.always_filter)
    if m.extra_filter:   where.append(m.extra_filter)
    if intent.last_n_days is not None:
        where.append(f"{sm.time_dim} >= CURRENT_DATE - INTERVAL '{intent.last_n_days} days'")
    if intent.where: where.append(intent.where)
    where_clause = " AND ".join(where) if where else "TRUE"
    group_cols = list(range(1, 1 + len(select_dims)))
    return (f"SELECT\n  {select_clause}\nFROM {sm.table}\n"
            f"WHERE {where_clause}\nGROUP BY {', '.join(map(str, group_cols))}\n"
            f"ORDER BY {', '.join(map(str, group_cols))}")

if __name__ == "__main__":
    intent = QueryIntent(metric="gmv_v2", by=["gst_state"], time_grain="day", last_n_days=7)
    print(compile_intent(intent))
    print("\n---\n")
    intent2 = QueryIntent(metric="gmv_v1", by=["merchant_tier"], time_grain="month", last_n_days=90)
    print(compile_intent(intent2))
Sample output:
SELECT
  DATE(created_at) AS day,
  gst_state,
  SUM(amount_paise / 100.0) AS gmv
FROM analytics.fct_payments
WHERE status = 'captured' AND reversed_at IS NULL AND created_at >= CURRENT_DATE - INTERVAL '7 days'
GROUP BY 1, 2
ORDER BY 1, 2

---

SELECT
  DATE_TRUNC('month', created_at) AS month,
  merchant_tier,
  SUM(amount_paise / 100.0) AS gmv
FROM analytics.fct_payments
WHERE status = 'captured' AND created_at >= CURRENT_DATE - INTERVAL '90 days'
GROUP BY 1, 2
ORDER BY 1, 2

Walk through the load-bearing parts:

- Versioning is just a second registry key: gmv_v1 and gmv_v2 differ only in extra_filter="reversed_at IS NULL", and both stay queryable forever.
- The dimension allow-list is enforced at compile time. compile_intent raises on any by dimension the semantic model doesn't declare, so a consumer cannot invent a slice.
- Filters stack predictably: the model's always_filter, the metric's extra_filter, the intent's date window, and any ad-hoc where are ANDed, in that order.
- The consumer never writes SQL. It writes a QueryIntent, and the SQL it gets back is whatever the central definition says today.

This is 70 lines; MetricFlow is a few hundred thousand. The extra code handles join planning across multiple semantic models, ratio metrics, cumulative metrics, time-spine joins (so that weeks with no transactions still get a 0 row), and warehouse-specific dialect emission. The kernel — definition + intent → SQL — is what those 70 lines show.

How the seven fields appear in three real systems

Field            | dbt MetricFlow                                  | Cube                                     | LookML
Name             | metrics: - name: gmv                            | cube('Payments').measures.gmv            | measure: gmv
Semantic model   | semantic_models: block                          | cube('Payments')                         | view: payments
Measure          | type_params.measure → measure on semantic model | sql: ${amount_paise} / 100, type: sum    | sql: ${TABLE}.amount_paise/100, type: sum
Dimensions       | dimensions: on semantic model                   | dimensions: on cube                      | dimension: on view
Filters          | filter: on metric, where: at query              | sql_filter on cube                       | always_filter on Explore
Time granularity | time_granularity on the time dim                | granularity on time dim                  | timeframes: on dimension_group
Version          | by convention (gmv_v2 as a separate metric)     | by convention (PaymentsV2 cube)          | LookML refinement (+view)

The shapes differ, the fields are the same. When you read a Cube file or a LookML view for the first time, mentally locate each of the seven fields and the file becomes legible immediately.

Common confusions

Going deeper

The grain mismatch that breaks every junior's first metric

A common first-week bug: define a "revenue per active user" metric naively as SUM(amount_inr) / COUNT(DISTINCT user_id) over fct_payments. The numerator is correct. The denominator counts only users who paid — not users who were active. The actual active-user population lives on fct_sessions, not fct_payments, and the two tables have different grains. A semantic layer with two semantic models (payments and sessions) and a ratio metric defined as gmv / wau forces the compiler to do the cross-grain join (or symmetric aggregate) properly. Without the layer, the analyst writes the wrong join and ships the wrong number for two months. Razorpay's analytics team saw this pattern repeatedly in 2023 reviews, which is part of why they moved to MetricFlow in 2024.
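The mistake is easy to reproduce with toy data. In the sketch below (hypothetical mini-tables, one row per payment vs one row per session), the same-table denominator counts only payers while the cross-grain denominator counts everyone who was active:

```python
# Toy data: payments has one grain (one row = one payment),
# sessions has another (one row = one user-session).
payments = [  # (user_id, amount_inr)
    ("u1", 500.0), ("u1", 300.0), ("u2", 200.0),
]
sessions = [  # (user_id,) — u3 and u4 were active but never paid
    ("u1",), ("u2",), ("u3",), ("u4",),
]

gmv = sum(amt for _, amt in payments)          # 1000.0

# Wrong: denominator taken from the SAME table as the numerator.
payers = len({uid for uid, _ in payments})     # 2 — only users who paid
wrong = gmv / payers                           # 500.0 per "active" user

# Right: denominator comes from the table whose grain defines activity.
active = len({uid for (uid,) in sessions})     # 4 — everyone who showed up
right = gmv / active                           # 250.0 per active user

assert wrong == 500.0 and right == 250.0       # off by 2x even on four users
```

The semantic layer prevents this by refusing to let wau be computed from fct_payments at all: the measure belongs to the sessions model, so the compiler must join across grains.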

Ratio metrics, cumulative metrics, and why the SQL gets weird

A ratio metric (gmv_per_active_user) is not just "metric A over metric B" — the two metrics may have different filters, different time grains, different semantic models. The compiler has to emit two CTEs, one per side, then join them on the dimensions the user asked for. A cumulative metric (mtd_gmv, month-to-date GMV) requires a time-spine join: the warehouse must produce a 0 row for any day with no payments so the running sum doesn't have gaps. MetricFlow ships a time-spine model precisely to make this possible. These are the cases where hand-written SQL gets long and where the semantic layer's SQL emission is less obvious — but also where the value is highest, because these are the metrics juniors get wrong.
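The time-spine requirement is easy to see in miniature. The sketch below, with hypothetical day-level GMV data containing a gap day, left-joins onto a dense calendar spine before running the sum — which is exactly what keeps the cumulative series gap-free:

```python
from datetime import date, timedelta

# Day-level GMV with a gap: no payments on March 3rd.
daily = {date(2026, 3, 1): 100.0, date(2026, 3, 2): 50.0, date(2026, 3, 4): 25.0}

# Time spine: every calendar day in the window, whether or not rows exist.
spine = [date(2026, 3, 1) + timedelta(days=i) for i in range(4)]

# Left-join onto the spine (missing days become 0), then run the sum.
running, mtd = 0.0, {}
for d in spine:
    running += daily.get(d, 0.0)   # the spine supplies the 0 row for the gap day
    mtd[d] = running

assert mtd[date(2026, 3, 3)] == 150.0   # gap day is present and carries the prior total
assert mtd[date(2026, 3, 4)] == 175.0
```

A running sum over only the rows that exist would have no March 3rd entry at all; any chart or day-over-day delta built on it inherits the hole.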

What versioning actually looks like in production

A real versioning workflow at a fintech: PR adds gmv_v2, leaves gmv_v1 untouched. CI compares the compiled SQL of v1 and v2 on the last 30 days of data and posts the diff to the PR. Stakeholders eyeball the diff (Finance, Growth, Ops). Once approved, a deprecation timeline is set: dashboards switch to v2 over six weeks, with an email a week before each switch. After the cutover, v1 stays in the registry — frozen, queryable, never removed — so historical decks continue to render the old number. The registry grows over time; storage is free, and the audit trail of "what did GMV mean in Q3 2024" is invaluable when a finance audit or a regulator asks. Cred and Razorpay both run roughly this workflow as of 2026.
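The CI comparison step can be as small as a unified diff of the two compiled queries. A sketch using difflib, with the v1/v2 SQL inlined by hand rather than produced by a real compiler:

```python
import difflib

# Compiled SQL for each version (inlined here; CI would call the compiler).
v1_sql = """SELECT DATE(created_at) AS day, SUM(amount_paise / 100.0) AS gmv
FROM analytics.fct_payments
WHERE status = 'captured'
GROUP BY 1"""
v2_sql = """SELECT DATE(created_at) AS day, SUM(amount_paise / 100.0) AS gmv
FROM analytics.fct_payments
WHERE status = 'captured' AND reversed_at IS NULL
GROUP BY 1"""

diff = "\n".join(difflib.unified_diff(
    v1_sql.splitlines(), v2_sql.splitlines(),
    fromfile="gmv_v1", tofile="gmv_v2", lineterm=""))
print(diff)  # post this to the PR so stakeholders see exactly what moved

assert "reversed_at IS NULL" in diff
```

Diffing the compiled SQL (rather than the YAML) is the point: the YAML change may be one line, but the SQL diff shows every slice the change will touch.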

Pre-aggregation: where the semantic layer earns its keep on cost

Compiling intent → SQL on every dashboard load is fine for a 100-row mart. For Flipkart's Big Billion Days dashboard hitting fct_orders (2.1 billion rows for 2025), every dashboard view recomputing the same SUM(amount_inr) would melt the warehouse. The semantic layer's pre-aggregation feature (Cube calls it "rollups", MetricFlow uses dbt incremental materializations, LookML uses aggregate_table) lets the compiler answer most queries from a small daily-rollup table and only fall back to the raw fact table for unusual slices. The pre-agg is invisible to the consumer — they ask for gmv by state last 7 days and get an answer in 80ms instead of 8 seconds. This is the half of the semantic-layer story that pays for itself in warehouse credits within a quarter.
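The routing decision underneath pre-aggregation is plain set logic: a query is answerable from a rollup when its dimensions are a subset of the rollup's and its grain is no finer. A sketch with hypothetical table names:

```python
GRAIN_ORDER = {"day": 0, "week": 1, "month": 2}   # coarser grain = larger value

ROLLUP = {  # a daily rollup materialized by gst_state
    "table": "analytics.agg_payments_daily",
    "dims": {"gst_state"},
    "grain": "day",
}
RAW_TABLE = "analytics.fct_payments"

def route(by: set[str], grain: str) -> str:
    """Serve from the rollup iff it can answer exactly; else fall back to raw."""
    fits = by <= ROLLUP["dims"] and GRAIN_ORDER[grain] >= GRAIN_ORDER[ROLLUP["grain"]]
    return ROLLUP["table"] if fits else RAW_TABLE

assert route({"gst_state"}, "day") == "analytics.agg_payments_daily"
assert route({"gst_state"}, "month") == "analytics.agg_payments_daily"  # coarser is free
assert route({"payment_method"}, "day") == "analytics.fct_payments"     # dim not in rollup
```

Because the consumer only ever states an intent, the compiler can swap the FROM table freely; the rollup is a cache keyed by (dims, grain), invisible at the API surface.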

Where this leads next

The substrate this metric layer sits on is dbt — see /wiki/dbt-as-the-de-facto-transformation-layer. The "what is an active user" definition problem that motivates this layer is unpacked in /wiki/the-what-is-an-active-user-problem. The cross-tool comparison — MetricFlow vs Cube vs LookML on the seven fields — is the next chapter, /wiki/lookml-cube-metricflow-the-landscape.

After that, Build 13 turns to consumption: how the metrics layer powers headless BI, LLM agents that ask metric questions, and the "metric API" pattern where Slack bots and notebooks pull the same number through the same compiler. The thread of this build is that the metric is a contract and the consumer surface is interchangeable.
