The headless-BI movement
In late 2023 a senior data engineer at Cred named Aditi pulled up the same number in four places: Looker said weekly active users were 9.42 lakh, the Mixpanel cohort said 8.11 lakh, the React dashboard the growth team had built for the founder said 9.78 lakh, and the Slack bot that posted morning numbers said 9.21 lakh. Four sources, four definitions of "active", four implementations of the same SQL — each one written by a different team, each one drifting at its own pace. The fix was not a fifth dashboard. The fix was to put the metric definition behind a single API and let every consumer — Looker, Mixpanel, the React app, the Slack bot — call that API instead of writing its own SQL. That API has a name. It is called headless BI, and the rest of this chapter is about what it does, what it does not do, and why it became the dominant pattern for data stacks in 2024–2026.
Headless BI separates the metric definition from the chart. The metric lives behind an HTTP/JDBC API; the chart, app, notebook, or LLM agent is just a consumer. The shift matters because BI tools are no longer the only thing reading numbers — apps, bots, and agents are — and a metric that lives inside one BI tool cannot reach the others without being re-implemented.
What "headless" actually decouples
A traditional BI tool is two things stapled together: a metric registry (definitions of gmv, active_users, gross_margin) and a renderer (chart components, dashboard layout, drill-down UI). For twenty years the bet was that this coupling was a feature — buy Looker, get the registry and the charts as one thing. Headless BI breaks the staple. The registry stays; the renderer is whatever you want.
Why the decoupling is more than rearranging boxes: in the coupled stack, the BI tool is on the critical path of every metric query. If you rip out Looker, you rip out the metric definitions with it. In the headless stack, the BI tool is just a renderer — replace it tomorrow and the metric API stays the same. That swappability is what people mean when they say "BI is becoming a commodity".
The metric API — what's actually on the wire
A metric API is a small, opinionated query language. It does not let you SELECT * FROM payments. It lets you ask three things: which metrics, grouped by which dimensions, filtered how. The compiler does the SQL. The dbt Semantic Layer's GraphQL endpoint is the cleanest example of this contract — here is a real query that Aditi at Cred runs every morning to populate the founder dashboard.
```graphql
# Query the metric API for "weekly active users by city, last 8 weeks"
query WeeklyActives {
  query(
    metrics: [{name: "weekly_active_users"}]
    groupBy: [
      {name: "metric_time", grain: WEEK}
      {name: "user__city"}
    ]
    where: [
      {sql: "{{ Dimension('user__city') }} IN ('Bengaluru', 'Mumbai', 'Delhi', 'Pune')"},
      {sql: "{{ TimeDimension('metric_time', 'WEEK') }} >= CURRENT_DATE - INTERVAL '56 days'"}
    ]
    orderBy: [{descending: false, groupBy: {name: "metric_time", grain: WEEK}}]
  ) {
    queryId
    status
    sql
    arrowResult  # base64-encoded Arrow IPC stream
  }
}
```
Sample response (truncated):

```json
{
  "data": {
    "query": {
      "queryId": "01HX7K9QAB8M",
      "status": "SUCCESSFUL",
      "sql": "SELECT DATE_TRUNC('week', subq.metric_time) AS metric_time__week, subq.user__city, COUNT(DISTINCT subq.user_id) AS weekly_active_users FROM (SELECT u.user_id, e.event_time AS metric_time, u.city AS user__city FROM analytics.events e JOIN analytics.users u USING (user_id) WHERE u.city IN ('Bengaluru','Mumbai','Delhi','Pune') AND e.event_time >= CURRENT_DATE - INTERVAL '56 days') subq GROUP BY 1, 2 ORDER BY 1",
      "arrowResult": "<base64 bytes — 32 rows × 3 cols>"
    }
  }
}
```
Walk the call carefully — five things are happening that a raw SQL endpoint does not give you.
- The consumer never wrote a `JOIN`. The query asks for `weekly_active_users` by city. The metric API knows `weekly_active_users` is defined on the `events` semantic model, knows `city` is a dimension on the `users` entity, and knows the join key is `user_id`. Why this matters: the join graph is in the metric registry, not in the consumer. A new consumer (the React app) joins users to events identically to the existing consumer (Looker), because neither writes the join — the compiler does.
- Time granularity is a parameter, not a column. `metric_time, grain: WEEK` is grammar; the compiler picks the right `DATE_TRUNC` flavour for the warehouse (Snowflake's `DATE_TRUNC('week', ...)` differs from BigQuery's `TIMESTAMP_TRUNC(..., WEEK(MONDAY))`). The consumer is portable across warehouses for free.
- The `where` clause uses the same dimension names as `groupBy`. There is no second namespace where the consumer has to know that `user__city` is actually `users.city` in the underlying SQL. The compiler keeps the model and the predicate in the same vocabulary.
- Arrow IPC over the wire. The result is not JSON; it is Apache Arrow's columnar IPC format, base64-encoded for GraphQL transport. Why Arrow and not JSON: a 100k-row result set is 5–10× smaller in Arrow than in JSON, decodes 50× faster in pandas/Polars, and preserves typed columns (a `bigint` does not become a JS number that loses precision past 2⁵³). For a metric API that BI tools, Spark, and Python notebooks all consume, Arrow is the lowest-friction format.
- `queryId` is returned synchronously, even if the query is long-running. The contract is async-by-default — the caller polls `queryId` for status. This matters for the warehouses (BigQuery, Snowflake) that often take several seconds for a fresh metric query.
Wire protocols — JDBC, GraphQL, REST
The metric API is not one protocol. It is whichever protocol the consumer already speaks. Headless BI took off precisely because the metric tier learned to speak the protocols BI tools have spent two decades trusting.
```python
# Flight SQL consumer — the same endpoint Tableau, DBeaver, or any JDBC client reaches
import os

import pyarrow.flight as flight

# Connect to the dbt Semantic Layer over Arrow Flight SQL,
# passing the service token as a gRPC header on each call
client = flight.connect("grpc+tls://semantic-layer.cloud.getdbt.com:443")
auth = flight.FlightCallOptions(
    headers=[(b"authorization", f"bearer {os.environ['DBT_SL_TOKEN']}".encode())]
)

# The query feels like SQL, but the table is a virtual semantic model
sql = """
SELECT
    metric_time__week,
    user__city,
    weekly_active_users
FROM {{ semantic_layer.query(
    metrics=['weekly_active_users'],
    group_by=['metric_time__week', 'user__city'],
    where="{{ Dimension('user__city') }} IN ('Bengaluru','Mumbai')"
) }}
ORDER BY metric_time__week
"""
info = client.get_flight_info(flight.FlightDescriptor.for_command(sql.encode()), auth)
table = client.do_get(info.endpoints[0].ticket, auth).read_all()  # pyarrow.Table
print(table.to_pandas().head())
#   metric_time__week user__city  weekly_active_users
# 0        2026-03-02  Bengaluru               284917
# 1        2026-03-02     Mumbai               197844
# 2        2026-03-09  Bengaluru               291102
# 3        2026-03-09     Mumbai               202336
# 4        2026-03-16  Bengaluru               298815
```
The same metric is reachable three ways from the same registry: Tableau hits the JDBC endpoint and sees a virtual table; the React dashboard hits GraphQL and gets typed rows; the LLM agent hits REST and gets JSON it can drop into a function-calling response. Three wire formats, one definition. Why supporting all three matters operationally: a BI tool that already trusts a Postgres-shaped JDBC connection takes zero engineering work to point at the metric API. If the API only spoke GraphQL, every BI vendor would need a custom integration — and most never would.
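The REST shape is the one an LLM agent emits, so it is worth seeing as JSON. A sketch following the general shape of Cube's REST query format; the member names (`events.weekly_active_users`, `users.city`) and the endpoint in the comment are placeholders, not a real deployment:

```python
import json

# Cube-style REST query: the same three verbs, expressed as JSON.
# measures/dimensions/filters mirror metrics/group_by/where.
query = {
    "measures": ["events.weekly_active_users"],
    "timeDimensions": [{
        "dimension": "events.metric_time",
        "granularity": "week",
        "dateRange": "last 8 weeks",
    }],
    "dimensions": ["users.city"],
    "filters": [{
        "member": "users.city",
        "operator": "equals",
        "values": ["Bengaluru", "Mumbai"],
    }],
}

# A consumer would POST this to the metric tier, e.g. (placeholder URL):
#   requests.post("https://metrics.example.com/cubejs-api/v1/load",
#                 headers={"Authorization": token}, json={"query": query})
payload = json.dumps({"query": query})
print(payload)
```

Nothing in this payload names a table or a join; the compiler behind the endpoint resolves both.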
Why this became the dominant pattern
Three forces converged in 2022–2024.
The consumer surface multiplied. In 2018 the only thing reading metrics was a BI tool. By 2024 a typical Indian D2C company had: a Looker for analysts, a Hex for product managers, a Streamlit for ML engineers, a React dashboard for the founder, a Slack bot for daily standup, a customer-facing dashboard, and an LLM agent answering metric questions in English. "Seven consumers, one definition" is the only viable architecture. Why the surface multiplied: every team that hired its own engineer wanted its own way to look at the data. The metric registry has to be tool-independent or it will be re-implemented per team. Aditi's "WAU is 9.42 / 8.11 / 9.78 / 9.21 lakh" story is what happens when it isn't.
dbt won the transformation layer. Once dbt was the de-facto place where the warehouse models lived, putting the metric definition next to those models — same git repo, same PR review, same CI — was the obvious move. MetricFlow's acquisition by dbt Labs in early 2023 made this official. The metric registry inherits dbt's governance properties (PR-reviewed, version-controlled, tested) for free.
LLMs forced an API. A Slack bot that takes "how many transactions in Bengaluru last Tuesday?" and answers "₹47.2 crore across 8.4 lakh transactions" cannot be built by giving the LLM raw warehouse access — it would hallucinate joins, miss filters, get the metric wrong. It can be built if the LLM is restricted to calling the metric API with a small, typed schema. The headless-BI tier is the ideal substrate for an LLM agent because it constrains the LLM to known metrics and known dimensions. By 2025 every major Indian fintech (Razorpay, Cred, Jupiter) had at least one internal LLM tool sitting on top of a semantic layer.
Common confusions
- "Headless BI is a new BI tool." It is the absence of a BI tool — or more precisely, the part of a BI tool that survives when you take away the chart UI. The metric registry, the SQL compiler, the cache, the wire protocol. You still need a BI tool (Looker, Hex, Lightdash) for analysts to drag dimensions onto charts. Headless BI just means that BI tool is interchangeable.
- "Headless BI replaces dbt." It sits on top of dbt. dbt builds the tables in the warehouse; the metric definitions reference those tables. MetricFlow ships inside dbt, so the line between "transformation layer" and "semantic layer" can blur — but they do different jobs. dbt is `INSERT INTO ... SELECT ...`; the semantic layer is "given a metric + dimensions, emit the SQL".
- "Headless BI is the same as a metrics layer." "Metrics layer" is the older, vaguer term Benn Stancil used in 2021. "Headless BI" is the architecture pattern that emerged once the metric layer started shipping wire protocols. A metrics layer that you can only call from inside one BI tool (e.g., LookML before the JDBC adapter) is not headless. A metrics layer that any consumer can hit (MetricFlow's Arrow Flight SQL endpoint, Cube's Postgres wire) is headless.
- "Headless BI removes the need for a BI tool." It removes the lock-in to a particular BI tool. Analysts still want to drag-and-drop dimensions onto a chart; that experience is what BI tools sell. The change is that you can now buy that experience as a thin client over your metric API — Lightdash and Hex were built explicitly for this consumption model — and switch tools without losing the metric definitions.
- "Headless BI is just caching." The cache is a feature; the value is the registry and the wire protocol. Even with caching disabled, headless BI still solves the divergence problem — every consumer gets the same SQL because every consumer asks the same compiler.
Going deeper
The "metric API" call shape — why these three verbs
Every headless BI implementation converges on roughly the same call shape: metrics=[...], group_by=[...], where=[...], order_by=[...]. There is no select, no from, no join. This is not a coincidence — it falls out of the constraint that the consumer must not write SQL, otherwise the metric definition can be bypassed. The Cube REST API, the dbt Semantic Layer GraphQL API, and AtScale's MDX-derived API all share these verbs because they all share the constraint. Why this constraint is non-negotiable: if the consumer can write SELECT SUM(amount) FROM payments, they have just defined GMV in their consumer code, in violation of the whole architecture. The API has to be expressive enough to ask any reasonable question and restrictive enough that the question must reference a registered metric.
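A toy compiler makes the constraint concrete. Everything here is hypothetical — the registry shape, the function name, the SQL it emits — and not any vendor's implementation; the point is that a consumer cannot reference anything outside the registry:

```python
# Hypothetical metric registry: the only vocabulary a consumer can use.
REGISTRY = {
    "weekly_active_users": {
        "agg": "COUNT(DISTINCT user_id)",
        "model": "analytics.events",
        "dimensions": {"user__city": "city", "metric_time": "event_time"},
    },
}

def compile_metric_query(metric, group_by, where=None):
    """Compile a metrics/group_by/where call to SQL; unregistered names are rejected."""
    spec = REGISTRY.get(metric)
    if spec is None:
        raise ValueError(f"unknown metric: {metric}")
    for dim in list(group_by) + list(where or {}):
        if dim not in spec["dimensions"]:
            raise ValueError(f"unknown dimension: {dim}")
    select = ", ".join(
        [f"{spec['dimensions'][d]} AS {d}" for d in group_by] + [f"{spec['agg']} AS {metric}"]
    )
    sql = f"SELECT {select} FROM {spec['model']}"
    if where:  # where speaks the same vocabulary as group_by, never raw columns
        sql += " WHERE " + " AND ".join(
            f"{spec['dimensions'][d]} = '{v}'" for d, v in where.items()
        )
    return sql + " GROUP BY " + ", ".join(str(i + 1) for i in range(len(group_by)))

print(compile_metric_query("weekly_active_users", ["user__city"], {"user__city": "Pune"}))
# SELECT city AS user__city, COUNT(DISTINCT user_id) AS weekly_active_users FROM analytics.events WHERE city = 'Pune' GROUP BY 1
```

There is no code path that produces `SELECT SUM(amount) FROM payments`: ad-hoc expressions are unrepresentable, which is exactly the restriction the prose describes.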
How LLM agents fit — schema as prompt
The headless-BI tier is the cleanest interface an LLM agent has ever had to a data warehouse. Razorpay's internal "ask Riya" bot, deployed to ~600 internal users in mid-2025, works like this: when a user asks "what was UPI volume in Bengaluru last week?", the bot prompts the LLM with the metric registry's schema (a list of metrics with descriptions, a list of dimensions per metric, allowed filters) and asks the LLM to emit a JSON call to the metric API. The LLM never sees SQL, never sees table names, cannot hallucinate columns that don't exist — the registry's schema is the LLM's universe. The answer is then a deterministic SQL query against Snowflake, not a generated SQL string. The error rate (measured against analyst-validated answers) settled at ~3%, mostly from ambiguous user questions, not from generated-SQL bugs. The same architecture without a semantic layer (LLM writes SQL directly) measured at ~22% — seven times worse, mostly from hallucinated joins.
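A sketch of that restriction. The registry schema and function names here are hypothetical, not Razorpay's implementation; the point is that the LLM's JSON is validated against the registry before any SQL exists:

```python
import json

# Hypothetical registry schema — also what gets serialized into the LLM's
# tool definition, so the model and the validator share one vocabulary.
SCHEMA = {
    "upi_volume": {"dimensions": {"user__city", "metric_time"}},
    "weekly_active_users": {"dimensions": {"user__city", "metric_time"}},
}

def validate_call(raw_json):
    """Reject any LLM-emitted call that references unregistered names."""
    call = json.loads(raw_json)
    metric = call["metric"]
    if metric not in SCHEMA:
        raise ValueError(f"unknown metric: {metric}")
    bad = set(call.get("group_by", [])) - SCHEMA[metric]["dimensions"]
    if bad:
        raise ValueError(f"unknown dimensions: {sorted(bad)}")
    return call  # safe to hand to the metric API's compiler

# What the LLM emits for "what was UPI volume in Bengaluru last week?"
ok = validate_call('{"metric": "upi_volume", "group_by": ["user__city"]}')
print(ok["metric"])  # upi_volume
```

A hallucinated join or column fails here, loudly, instead of producing a plausible-looking wrong number downstream.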
Where the cache lives matters
A fresh metric query against fct_payments (3 billion rows) takes 4–8 seconds on Snowflake even with clustering. A cached one takes 50ms. Where the cache lives determines who pays the cost. dbt Semantic Layer caches per-query in dbt Cloud's tier — fast, but tied to that tenant. Cube caches in Cube Store (a stripped-down columnar engine that runs alongside the API tier), which means the same cache serves Looker, Hex, and the React app. LookML pre-aggregates into the warehouse, which the warehouse caches its own way. Why this matters at scale: a customer-facing dashboard that hits the metric API at 1000 QPS will melt the warehouse if the cache is per-tenant; a Cube-style shared cache or a warehouse-side aggregate table is the only viable answer.
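The shared-cache behaviour hinges on one detail the prose glosses over: two consumers phrasing the same question must map to the same cache entry. A hypothetical sketch of query canonicalization (not Cube Store's actual keying scheme):

```python
import hashlib
import json

def cache_key(metrics, group_by, where):
    """Canonicalize a metric query so ordering differences don't split the cache."""
    canonical = json.dumps(
        {"metrics": sorted(metrics), "group_by": sorted(group_by), "where": sorted(where)},
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode()).hexdigest()

# Looker and the React app ask the same question with dimensions in different order:
k1 = cache_key(["weekly_active_users"], ["user__city", "metric_time"], ["city IN ('Pune')"])
k2 = cache_key(["weekly_active_users"], ["metric_time", "user__city"], ["city IN ('Pune')"])
assert k1 == k2  # one cache entry serves both consumers
```

This is only possible because the query is structured (three verbs) rather than free-form SQL, where semantically identical queries rarely hash to the same string.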
Why "headless" is sticky in 2026 but might not be the final word
The "headless" framing came from the JAMstack / headless CMS movement (Contentful, Strapi). The vocabulary was useful in 2022 because BI vendors had not yet split their products. By 2026 every BI vendor is shipping some form of metric API; calling the architecture "headless" is becoming redundant the way "AJAX" became redundant once every web framework spoke fetch. The thing that will stay is the metric API as a tier in the data stack — same way the warehouse is a tier, the transformation layer is a tier, and now the metric layer is a tier. The name will mature into something like "the metric layer" or "the semantic tier"; the architecture will not change.
The Indian-stack picture, by company size
| Stage | Stack | Why |
|---|---|---|
| Early-stage (Series A, < 50 engineers) | dbt + MetricFlow + Lightdash (single semantic source, OSS BI) | Cheap; one source of truth from day one |
| Growth-stage (Series B–C, 50–200 engineers) | dbt + MetricFlow + Looker (analyst BI) + Hex (PMs) + custom React (founder dashboard) | Multiple consumer surfaces; the metric API is what keeps them aligned |
| Late-stage (Series D+, 200+ engineers) | dbt + Cube (customer-facing) + LookML (internal analyst) + LLM agent (Slack) | Multiple semantic layers, one designated as truth, others as pass-throughs — pragmatic, not pure |
| Public-listed | Custom semantic layer on top of dbt or Atlan-style metadata catalog | At Razorpay/Zerodha scale, building rather than buying becomes viable |
The interesting ones are the late-stage companies that have two semantic layers: a Cube for the customer-facing dashboard (because Cube's pre-aggregations are operationally proven for sub-100ms reads) and MetricFlow for the internal stack. Aligning the two is the new chapter-of-pain — most companies handle it by declaring MetricFlow the source of truth and letting Cube be a thin pass-through that re-emits the metric, with a CI check that fails if the two diverge.
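That CI check can be sketched in a few lines. The two result sets here are stubs standing in for responses from MetricFlow and Cube; the comparison logic is the whole idea:

```python
def check_divergence(truth_rows, passthrough_rows, tolerance=0.0):
    """Return rows where the pass-through layer diverges from the source of truth."""
    mismatches = []
    for key in truth_rows.keys() | passthrough_rows.keys():
        a, b = truth_rows.get(key), passthrough_rows.get(key)
        if a is None or b is None or abs(a - b) > tolerance * max(abs(a), 1):
            mismatches.append((key, a, b))
    return mismatches

# Stub results keyed by (week, city); in CI these would come from the two APIs.
metricflow = {("2026-03-02", "Bengaluru"): 284917, ("2026-03-02", "Mumbai"): 197844}
cube       = {("2026-03-02", "Bengaluru"): 284917, ("2026-03-02", "Mumbai"): 197901}

diffs = check_divergence(metricflow, cube)
assert diffs, "CI should fail: Mumbai diverges"
print(diffs)
```

In a real pipeline this runs on a small fixed slice of dimensions per metric (full-table comparison would be as expensive as the dashboards themselves) and fails the PR that changed either definition.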
Where this leads next
The next chapter (/wiki/semantic-layer-llms-the-new-interface) is the natural sequel — once the metric API exists, an LLM agent on top of it is straightforward, and what looks like "ChatGPT for data" is actually "function calling against a typed metric schema". The build does not end there. Build 14 (/wiki/wall-batch-metrics-arent-fresh-enough) confronts the failure mode of every batch-warehouse-backed semantic layer: the analyst wants the GMV from five minutes ago, and the warehouse-backed metric is from last night.
The thread running through Build 13 stays: the metric is the contract, the renderer is interchangeable. Headless BI is what happens when the industry takes that thread seriously and moves the contract out of the BI tool's database.
References
- dbt Semantic Layer — overview — the GraphQL and JDBC contracts; the canonical "metric API" implementation.
- Benn Stancil — The metrics layer — the 2021 essay that named the category and seeded the round of investment.
- Tristan Handy — How is dbt building toward the future — the dbt founder's argument for why the metric layer belongs in the transform layer, not the BI layer.
- Cube — Headless BI architecture — Cube's case for the architecture, with the wire-protocol focus.
- Apache Arrow Flight SQL — the protocol that makes a metric tier reachable from any JDBC client; the "headless BI talks to Tableau for free" story.
- Lightdash — open-source BI for dbt — calibration of what a thin client over a metric API actually looks like in production.
- /wiki/lookml-cube-metricflow-the-landscape — the previous chapter's three-vendor landscape that this chapter abstracts into a wire-protocol pattern.
- /wiki/metric-definitions-once-queried-many-ways — the seven-fields framing of metric definitions that headless BI exposes over the wire.