Metric definitions: once, queried many ways
In March 2026, a finance analyst at Cred opened the rewards dashboard and saw GMV for last Tuesday at ₹14.2 crore. The CEO was looking at a deck the same morning that said ₹15.6 crore. The growth team's notebook said ₹13.9 crore. Three numbers, same Tuesday, same company, same source warehouse. The investigation took two days. The root cause was that "GMV" had been hand-written as SQL in three places: the dashboard included reversed transactions until they were settled, the deck excluded them on the day of reversal, and the notebook excluded them on the day of the original transaction. None of the three was wrong. None of them agreed. This is the problem a semantic layer exists to make impossible.
A metric definition is a versioned recipe — a measure expression, the table it lives on, and the dimensions you are allowed to slice it by. The semantic layer compiles this recipe plus a query intent ("GMV by gst_state for last 7 days") into the SQL needed for that exact slice, on demand. The point is not less SQL; it is one source of truth for what the number means.
What "defined once" actually means
A hand-written SQL query encodes three things into one string: the metric (sum of amount_paise), the grain (rows from fct_payments, filtered to status='captured'), and the slice (group by gst_state, filter to created_at >= today - 7). Change any one of those and you write a new query. Copy-paste the query to a new dashboard, and now changing the metric definition means editing the same logic in N places — and you will miss one. That missed dashboard is the ₹14.2 crore vs ₹15.6 crore divergence.
A metric definition splits the three apart. You write the metric once:
```yaml
metrics:
  - name: gmv
    description: "Gross merchandise value, in INR. Captured payments only; reversals excluded on the date of reversal."
    type: simple
    type_params:
      measure: amount_inr
    label: "GMV"
```
And the grain once, on the underlying semantic model:
```yaml
semantic_models:
  - name: payments
    model: ref('fct_payments')
    entities:
      - name: payment_id
        type: primary
      - name: merchant_id
        type: foreign
    measures:
      - name: amount_inr
        agg: sum
        expr: "CASE WHEN status = 'captured' THEN amount_paise / 100.0 ELSE 0 END"
    dimensions:
      - name: created_at
        type: time
        type_params: {time_granularity: day}
      - name: gst_state
        type: categorical
```
The slice — "by gst_state, last 7 days" — is supplied at query time by the dashboard, the notebook, the LLM agent, the Slack bot, anybody. The semantic layer compiles the metric + slice into SQL on the fly. Every consumer asks the same definition; only the slice differs. This is the architectural shift.
Why the split is load-bearing: the definition (what GMV means) changes rarely. The slice (which dashboard, which date range, which dimension) changes constantly. If you author SQL, every slice change forces you to re-encode the definition; if you split, the definition is referenced, not retyped.
The seven fields a metric definition has to carry
Every serious semantic layer — MetricFlow, Cube, LookML, Malloy — encodes the same seven fields: a name, the semantic model it lives on, a measure expression, the allowed dimensions, filters, a time granularity, and a version. The names and YAML shape differ; the fields don't. Knowing the fields is how you read any of them.
The version field deserves its own paragraph. When the finance team at Cred decided that reversed transactions should now be excluded on the date of the original transaction (not the date of reversal), they did not edit gmv — they shipped gmv_v2 and left gmv_v1 running. The board deck used v1 for the year-over-year chart so the prior years didn't suddenly shift; the daily ops dashboard switched to v2 the next morning. Why versioning beats updating: a metric definition change is a cliff for every chart that depends on it. If you don't version, you silently rewrite history. If you do, you let consumers cross the cliff one at a time.
Worked example: compiling intent into SQL
Below is a tiny semantic-layer compiler — 70 lines of Python that takes a metric definition plus a query intent and produces the SQL. It is not MetricFlow. It is the smallest thing that demonstrates the pattern, so you can see what the abstraction is actually doing.
```python
# tiny_metric_compiler.py — compile (metric definition + slice) → SQL.
# This is what MetricFlow / Cube / LookML do, modulo joins and caching.
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SemanticModel:
    name: str
    table: str
    time_dim: str
    dims: list[str]
    measures: dict[str, str]            # measure_name -> SQL expression
    always_filter: Optional[str] = None


@dataclass
class Metric:
    name: str
    version: str
    semantic_model: str
    measure: str                        # which measure of the semantic model
    extra_filter: Optional[str] = None
    description: str = ""


@dataclass
class QueryIntent:
    metric: str
    by: list[str] = field(default_factory=list)  # dimensions to group by
    where: Optional[str] = None                  # ad-hoc filter
    time_grain: str = "day"
    last_n_days: Optional[int] = None


REGISTRY_MODELS = {
    "payments": SemanticModel(
        name="payments", table="analytics.fct_payments",
        time_dim="created_at",
        dims=["gst_state", "merchant_tier", "payment_method"],
        measures={"amount_inr": "SUM(amount_paise / 100.0)",
                  "txn_count": "COUNT(*)"},
        always_filter="status = 'captured'",
    ),
}

REGISTRY_METRICS = {
    "gmv_v1": Metric("gmv", "v1", "payments", "amount_inr",
                     description="Captured GMV, INR. Reversals excluded on date of reversal."),
    "gmv_v2": Metric("gmv", "v2", "payments", "amount_inr",
                     extra_filter="reversed_at IS NULL",
                     description="Captured GMV, INR. Reversals excluded on date of original txn."),
}

VALID_GRAINS = {"day", "week", "month"}


def compile_intent(intent: QueryIntent) -> str:
    m = REGISTRY_METRICS[intent.metric]
    sm = REGISTRY_MODELS[m.semantic_model]

    # Governance hook: refuse any dimension not on the model's allow-list.
    illegal = [d for d in intent.by if d not in sm.dims]
    if illegal:
        raise ValueError(f"dimensions not allowed: {illegal}")
    if intent.time_grain not in VALID_GRAINS:
        raise ValueError(f"unknown grain: {intent.time_grain}")

    select_dims = []
    if intent.time_grain == "day":
        select_dims.append(f"DATE({sm.time_dim}) AS day")
    else:
        select_dims.append(
            f"DATE_TRUNC('{intent.time_grain}', {sm.time_dim}) AS {intent.time_grain}")
    select_dims.extend(intent.by)
    select_clause = ",\n  ".join(select_dims + [f"{sm.measures[m.measure]} AS {m.name}"])

    where = []
    if sm.always_filter:
        where.append(sm.always_filter)   # model-level filter: always emitted
    if m.extra_filter:
        where.append(m.extra_filter)     # metric-level filter (e.g. v2's)
    if intent.last_n_days is not None:
        where.append(f"{sm.time_dim} >= CURRENT_DATE - INTERVAL '{intent.last_n_days} days'")
    if intent.where:
        where.append(intent.where)
    where_clause = " AND ".join(where) if where else "TRUE"

    group_cols = list(range(1, 1 + len(select_dims)))
    return (f"SELECT\n  {select_clause}\nFROM {sm.table}\n"
            f"WHERE {where_clause}\nGROUP BY {', '.join(map(str, group_cols))}\n"
            f"ORDER BY {', '.join(map(str, group_cols))}")


if __name__ == "__main__":
    intent = QueryIntent(metric="gmv_v2", by=["gst_state"], time_grain="day", last_n_days=7)
    print(compile_intent(intent))
    print("\n---\n")
    intent2 = QueryIntent(metric="gmv_v1", by=["merchant_tier"], time_grain="month", last_n_days=90)
    print(compile_intent(intent2))
```
Sample run:

```sql
SELECT
  DATE(created_at) AS day,
  gst_state,
  SUM(amount_paise / 100.0) AS gmv
FROM analytics.fct_payments
WHERE status = 'captured' AND reversed_at IS NULL AND created_at >= CURRENT_DATE - INTERVAL '7 days'
GROUP BY 1, 2
ORDER BY 1, 2

---

SELECT
  DATE_TRUNC('month', created_at) AS month,
  merchant_tier,
  SUM(amount_paise / 100.0) AS gmv
FROM analytics.fct_payments
WHERE status = 'captured' AND created_at >= CURRENT_DATE - INTERVAL '90 days'
GROUP BY 1, 2
ORDER BY 1, 2
```
Walk through the load-bearing parts:
- `REGISTRY_MODELS["payments"].always_filter = "status = 'captured'"` — this filter lives on the semantic model, not the metric. It is applied to every metric that uses this model. The reader cannot forget to add it; the compiler always emits it. This is how the semantic layer stops the "I forgot the WHERE clause" class of bug.
- `illegal = [d for d in intent.by if d not in sm.dims]` — if a consumer asks for `by=["random_column"]`, the compiler refuses. Compare to hand-written SQL, where you can group by anything in the table, including PII columns nobody is meant to see. The dimension allow-list is the governance hook. Why this is more than a feature: it makes the dimension list a contract, not a convention. PII columns can be omitted from `dims`; consumers cannot accidentally surface them.
- `gmv_v1` vs `gmv_v2` differ only in `extra_filter` — same name field (`gmv`), same model, same measure expression, different `extra_filter`. The compiler treats them as separate metrics. v1 keeps producing the original number; v2 produces the new one. Consumers migrate by swapping the metric reference, not by editing SQL. Why each version is a separate metric, not a column on `Metric`: a version bump can change anything — the measure, the model, the filter. Treating each version as its own metric makes "what changed" trivially diffable.
- `time_grain` becomes `DATE()` or `DATE_TRUNC(...)` — the same metric definition serves day/week/month dashboards. The grain is supplied per query, but the underlying metric does not need to be redefined for each grain. Real semantic layers go further (they pre-compute `_daily`, `_weekly` rollups for performance), but the API stays the same.
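The allow-list check is worth seeing in isolation. A minimal standalone sketch — `ALLOWED_DIMS` and `validate_dims` are illustrative names, not from any library; the article's compiler embeds the same check inline:

```python
# Minimal dimension allow-list: the governance hook by itself.
# ALLOWED_DIMS / validate_dims are hypothetical names for illustration.
ALLOWED_DIMS = {"gst_state", "merchant_tier", "payment_method"}

def validate_dims(requested: list[str]) -> list[str]:
    """Reject any dimension not on the semantic model's allow-list."""
    illegal = [d for d in requested if d not in ALLOWED_DIMS]
    if illegal:
        raise ValueError(f"dimensions not allowed: {illegal}")
    return requested

validate_dims(["gst_state"])           # passes: on the allow-list
try:
    validate_dims(["customer_phone"])  # a PII column deliberately left off dims
except ValueError as e:
    print(e)                           # dimensions not allowed: ['customer_phone']
```

Because the check runs in the compiler, it applies to every consumer — dashboards, notebooks, bots — not just the ones whose authors remembered the rule.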
This is 70 lines. MetricFlow is a few hundred thousand. The extra code handles join planning across multiple semantic models, ratio metrics, cumulative metrics, time-spine joins for "weeks with no transactions still get a 0 row", and warehouse-specific dialect emission. The kernel — definition + intent → SQL — is what those 70 lines show.
How the seven fields appear in three real systems
| Field | dbt MetricFlow | Cube | LookML |
|---|---|---|---|
| Name | `metrics: - name: gmv` | `cube('Payments').measures.gmv` | `measure: gmv` |
| Semantic model | `semantic_models:` block | `cube('Payments')` | `view: payments` |
| Measure | `type_params.measure` → measure on semantic model | `sql: ${amount_paise} / 100`, `type: sum` | `sql: ${TABLE}.amount_paise/100`, `type: sum` |
| Dimensions | `dimensions:` on semantic model | `dimensions:` on cube | `dimension:` on view |
| Filters | `filter:` on metric, `where:` at query | `sql_filter` on cube | `always_filter` on Explore |
| Time granularity | `time_granularity` on the time dim | `granularity` on time dim | `timeframes:` on dimension_group |
| Version | by convention (`gmv_v2` as a separate metric) | by convention (`PaymentsV2` cube) | LookML refinement (`+view`) |
The shapes differ, the fields are the same. When you read a Cube file or a LookML view for the first time, mentally locate each of the seven fields and the file becomes legible immediately.
Common confusions
- "A metric is the same as a column." A column is a measure expression at best — `amount_paise / 100`. A metric is the column plus the grain plus the dimension list plus the filter plus the version. Treating the column as the metric is what makes "GMV" mean four different things in four dashboards. The semantic layer exists exactly to bundle all of these together.
- "The semantic layer is the same as a BI tool." Looker is a BI tool with LookML inside it; Cube is a semantic layer that BI tools sit on top of; MetricFlow is a semantic layer with no BI of its own. The semantic layer is the metric registry plus a query compiler. The BI tool is the chart-building UI on top. They overlap historically (Looker built both) but they are different jobs.
- "Metrics live in the warehouse." Metrics live in version-controlled YAML alongside dbt models. The warehouse holds rows; the semantic layer holds the recipe for how to aggregate those rows. If your "metric layer" is a Snowflake VIEW, you are halfway there — but views can't enforce dimension allow-lists, can't version cleanly, and can't compile different SQL per grain. They are a rough draft of a semantic layer, not a substitute.
- "Versioning a metric means versioning the SQL." Versioning means treating each definition change as a new metric (`gmv_v1`, `gmv_v2`). The SQL the compiler emits will differ — but you don't ship "edit gmv to mean something new", you ship "add gmv_v2, deprecate gmv_v1 over six weeks". Editing in place is what causes the silent-history-rewrite bug.
- "Ad-hoc analysts will rebel against this." The semantic layer does not block ad-hoc SQL. It produces the canonical answer for the metrics that matter — GMV, WAU, p99 latency, churn. Analysts still write raw SQL for one-off investigations; they just don't redefine GMV every time. The teams that succeed at adopting MetricFlow keep the analyst free to explore and only require the semantic-layer path for metrics that show up on multiple dashboards.
Going deeper
The grain mismatch that breaks every junior's first metric
A common first-week bug: define a "revenue per active user" metric naively as `SUM(amount_inr) / COUNT(DISTINCT user_id)` over `fct_payments`. The numerator is correct. The denominator counts only users who paid — not users who were active. The actual active-user population lives in `fct_sessions`, not `fct_payments`. The two tables have different grains. A semantic layer with two semantic models (payments and sessions) and a ratio metric defined as `gmv / wau` forces the compiler to do the cross-grain join (or symmetric aggregate) properly. Without the layer, the analyst writes the wrong join and ships the wrong number for two months. Razorpay's analytics team saw this pattern repeatedly in 2023 reviews, which is part of why they moved to MetricFlow in 2024.
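The bug is easy to reproduce on toy data. A sketch in plain Python rather than SQL — all rows and numbers below are made up for illustration:

```python
# Toy reproduction of the grain-mismatch bug. Data is invented.
payments = [  # grain: one row per payment (fct_payments)
    {"user_id": "u1", "amount_inr": 500.0},
    {"user_id": "u1", "amount_inr": 300.0},
    {"user_id": "u2", "amount_inr": 200.0},
]
sessions = [  # grain: one row per active user's session (fct_sessions)
    {"user_id": "u1"}, {"user_id": "u2"},
    {"user_id": "u3"}, {"user_id": "u4"},  # u3, u4 were active but never paid
]

gmv = sum(p["amount_inr"] for p in payments)          # 1000.0

# Naive: denominator drawn from the payments table — paying users only.
paying_users = len({p["user_id"] for p in payments})  # 2
naive = gmv / paying_users                            # 500.0

# Correct: denominator drawn from the sessions table — active users.
active_users = len({s["user_id"] for s in sessions})  # 4
correct = gmv / active_users                          # 250.0

print(naive, correct)  # 500.0 250.0 — the naive number is 2x too high
```

The metric overstates revenue per active user exactly when many active users don't pay — which is the normal case, so the error is systematic, not noise.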
Ratio metrics, cumulative metrics, and why the SQL gets weird
A ratio metric (gmv_per_active_user) is not just "metric A over metric B" — the two metrics may have different filters, different time grains, different semantic models. The compiler has to emit two CTEs, one per side, then join them on the dimensions the user asked for. A cumulative metric (mtd_gmv, month-to-date GMV) requires a time-spine join: the warehouse must produce a 0 row for any day with no payments so the running sum doesn't have gaps. MetricFlow ships a time-spine model precisely to make this possible. These are the cases where hand-written SQL gets long and where the semantic layer's SQL emission is less obvious — but also where the value is highest, because these are the metrics juniors get wrong.
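The time-spine idea can be sketched in a few lines of plain Python (MetricFlow does this in SQL against a spine model; the dates and amounts here are invented):

```python
from datetime import date, timedelta

# Daily GMV with a gap: no payments on Mar 3. Toy numbers.
daily = {date(2026, 3, 1): 100.0,
         date(2026, 3, 2): 50.0,
         date(2026, 3, 4): 25.0}

# Time spine: every day in the window, whether or not payments exist.
spine = [date(2026, 3, 1) + timedelta(days=i) for i in range(4)]

# Left-join the metric onto the spine so gap days contribute an explicit 0,
# then take the running sum — the month-to-date cumulative metric.
running, mtd = 0.0, []
for d in spine:
    running += daily.get(d, 0.0)  # the "0 row" the spine join guarantees
    mtd.append((d, running))

for d, v in mtd:
    print(d, v)
# Mar 3 appears with an unchanged running total of 150.0 instead of vanishing,
# so a chart of mtd has no hole and the running sum never skips a day.
```

Without the spine, `GROUP BY day` simply emits no row for Mar 3, and any downstream window function or chart silently skips it.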
What versioning actually looks like in production
A real versioning workflow at a fintech: PR adds gmv_v2, leaves gmv_v1 untouched. CI compares the compiled SQL of v1 and v2 on the last 30 days of data and posts the diff to the PR. Stakeholders eyeball the diff (Finance, Growth, Ops). Once approved, a deprecation timeline is set: dashboards switch to v2 over six weeks, with an email a week before each switch. After the cutover, v1 stays in the registry — frozen, queryable, never removed — so historical decks continue to render the old number. The registry grows over time; storage is free, and the audit trail of "what did GMV mean in Q3 2024" is invaluable when a finance audit or a regulator asks. Cred and Razorpay both run roughly this workflow as of 2026.
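One form of that CI check — a textual diff of the compiled SQL — is a few lines with Python's `difflib`. A sketch with the two compiled strings inlined (in a real pipeline they would come from the compiler, and the result diff on 30 days of data would be posted alongside):

```python
import difflib

# Compiled SQL for the two versions, inlined here for the sketch.
v1_sql = """SELECT DATE(created_at) AS day, SUM(amount_paise / 100.0) AS gmv
FROM analytics.fct_payments
WHERE status = 'captured'
GROUP BY 1"""

v2_sql = """SELECT DATE(created_at) AS day, SUM(amount_paise / 100.0) AS gmv
FROM analytics.fct_payments
WHERE status = 'captured' AND reversed_at IS NULL
GROUP BY 1"""

diff = "\n".join(difflib.unified_diff(
    v1_sql.splitlines(), v2_sql.splitlines(),
    fromfile="gmv_v1", tofile="gmv_v2", lineterm=""))
print(diff)  # the one-line filter change is the whole diff stakeholders review
```

Because each version is a separate metric, "what changed between v1 and v2" reduces to exactly this diff — no archaeology through dashboard edit histories.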
Pre-aggregation: where the semantic layer earns its keep on cost
Compiling intent → SQL on every dashboard load is fine for a 100-row mart. For Flipkart's Big Billion Days dashboard hitting fct_orders (2.1 billion rows for 2025), every dashboard view recomputing the same SUM(amount_inr) would melt the warehouse. The semantic layer's pre-aggregation feature (Cube calls it "rollups", MetricFlow uses dbt incremental materializations, LookML uses aggregate_table) lets the compiler answer most queries from a small daily-rollup table and only fall back to the raw fact table for unusual slices. The pre-agg is invisible to the consumer — they ask for gmv by state last 7 days and get an answer in 80ms instead of 8 seconds. This is the half of the semantic-layer story that pays for itself in warehouse credits within a quarter.
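The routing decision itself is small: if the requested dimensions and grain are covered by the rollup, answer from it; otherwise fall back to the fact table. A sketch of that logic — the table names and the `route` function are invented for illustration, not any engine's API:

```python
# Sketch of pre-aggregation routing. Names are hypothetical.
ROLLUP = {
    "table": "analytics.gmv_daily_rollup",    # small, pre-aggregated daily
    "dims": {"gst_state", "merchant_tier"},   # dimensions the rollup carries
    "grains": {"day", "week", "month"},       # coarser grains roll up from day
}
RAW = "analytics.fct_payments"                # billions of rows; last resort

def route(by: set[str], grain: str) -> str:
    """Answer from the rollup when it covers the query; else hit raw facts."""
    if by <= ROLLUP["dims"] and grain in ROLLUP["grains"]:
        return ROLLUP["table"]
    return RAW

print(route({"gst_state"}, "day"))        # analytics.gmv_daily_rollup
print(route({"payment_method"}, "day"))   # analytics.fct_payments (slice not in rollup)
```

The consumer's request never changes — same metric, same slice vocabulary — which is why the pre-agg can be added, resized, or dropped without touching a single dashboard.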
Where this leads next
The substrate this metric layer sits on is dbt — see /wiki/dbt-as-the-de-facto-transformation-layer. The "what is an active user" definition problem that motivates this layer is unpacked in /wiki/the-what-is-an-active-user-problem. The cross-tool comparison — MetricFlow vs Cube vs LookML on the seven fields — is the next chapter, /wiki/lookml-cube-metricflow-the-landscape.
After that, Build 13 turns to consumption: how the metrics layer powers headless BI, LLM agents that ask metric questions, and the "metric API" pattern where Slack bots and notebooks pull the same number through the same compiler. The thread of this build is that the metric is a contract, the consumer surface is interchangeable.
References
- dbt MetricFlow — Build a metric — canonical docs for the dbt-native semantic layer; the seven-fields framing maps almost 1:1 to the YAML.
- Cube.dev — Data modeling — the cube/measure/dimension shape, with `pre_aggregations` for the rollup story.
- Looker — LookML overview — the original semantic layer; reading LookML is still the fastest way to grasp the abstraction.
- Benn Stancil — The metrics layer — the essay that named the category in 2021 and seeded the round of investment that produced MetricFlow, Cube Cloud, Lightdash.
- MetricFlow — Theory of constraints in metrics — Nick Handel's writeup on why the compiler needs to know the join graph, not just the measures.
- Malloy — Lloyd Tabb's experimental language; if LookML is a configuration of joins, Malloy is the language for them. Useful as a calibration of "what does a clean semantic model look like".
- /wiki/dbt-as-the-de-facto-transformation-layer — the previous chapter; dbt is the substrate the metrics layer sits on.
- /wiki/the-what-is-an-active-user-problem — the definitional problem this layer makes tractable.