Data mesh: decentralization as a governance pattern

A 600-engineer payments company in Bengaluru has 80 product engineers shipping features and four data engineers fielding every analytics question. Tickets pile up. The data team becomes the bottleneck — not because they are slow, but because they are the only people with context on every system, and there are 80 systems. By the time the data engineer understands what "merchant chargeback rate" means in the context of the disputes service, the product team has already shipped two more features that change the definition. Data mesh is the architectural response that says: the people who own the service should own the data the service produces, the central team owns the platform that makes that ownership safe, and governance is what the platform enforces — not what a committee approves.

Data mesh decentralises data ownership to the teams that produce the data, while a central platform team owns the substrate (storage, compute, catalog, contracts) that makes decentralisation safe. Four principles hold it together: domain ownership, data-as-a-product, self-serve platform, and federated computational governance. The trade-off is real — you trade central-team velocity for product-team accountability — and it only pays off above roughly 50 producing teams, where central bottlenecks dominate.

Why centralisation fails at a certain size

Picture the Razorpay-shaped company at three sizes. At 20 engineers, one data engineer can hold every table's meaning in their head — they wrote the schema, they built the pipeline, they answer the questions. At 100 engineers, that one person is now five, and the five hold daily standups to coordinate who owns which dataset. At 600 engineers, the five are now thirty, and they spend most of their time playing schema-archaeologist on tables they did not design — running git blame on dbt models nobody on the team wrote.

The pattern is structural. Central data teams scale sub-linearly with the producing surface area. Every new product team adds N new tables, M new business definitions, and 1 new "what does active_user mean for our product?" debate. The central team's capacity grows by adding people, but each new person inherits a smaller fraction of the system's context. After a threshold — Zhamak Dehghani's original Thoughtworks article placed it around the point where you have more than ~10 source-aligned domains feeding a central warehouse — the cost of cross-team coordination dominates the cost of analytics itself.

Figure: Centralised data team vs data mesh — how the bottleneck moves. Two side-by-side diagrams: on the left, a central data team sits between many product teams (payments, disputes, payouts) and the warehouse, with every arrow funnelling through them; on the right, each product team owns its own data product and publishes through a self-serve platform layer (catalog · contracts · compute), with the central team owning only the platform — the platform is the substrate, not the gateway.
The diagrams look similar; the political topology is opposite. On the left, the central team is the pipe through which everything must flow. On the right, the platform is a substrate every team uses, but no team has to wait for.

Why "data team becomes a bottleneck" is structural and not solvable by hiring: the central team's coordination cost grows with the number of producer-consumer pairs they sit between. With N producers and M consumers, that is N × M pairs of business definitions to reconcile. Even doubling the central team's size only multiplies their throughput by ~1.7× (Brooks' law overhead) while the coordination surface continues growing quadratically with the company.
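The quadratic-versus-sub-linear argument can be made tangible with a toy model. Every number below — the team sizes, the producer/consumer counts, the ~1.7× per-doubling gain — is an illustrative assumption, not a measurement:

```python
import math

def coordination_pairs(producers: int, consumers: int) -> int:
    """Producer-consumer pairs of business definitions the central team sits between."""
    return producers * consumers

def effective_throughput(team_size: int, base_size: int = 2,
                         doubling_gain: float = 1.7) -> float:
    """Toy Brooks'-law model: each doubling of headcount yields ~1.7x output, not 2x."""
    doublings = math.log2(team_size / base_size)
    return base_size * doubling_gain ** doublings

# ~100 engineers: 10 producing teams, 5 consuming teams; central data team of 4
# ~600 engineers: 80 producing teams, 40 consuming teams; central data team of 32
print(coordination_pairs(80, 40) / coordination_pairs(10, 5))   # surface grew 64x
print(effective_throughput(32) / effective_throughput(4))       # throughput grew ~4.9x
```

Under these assumptions the coordination surface grows 64× while an 8×-larger central team delivers only about 5× the throughput — the gap is the ticket queue.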

The Thoughtworks definition of data mesh, published in 2019 by Zhamak Dehghani, names this directly: the central data team is "trying to satisfy the needs of consumers it does not understand from data it did not produce". The failure is not the team; it is the position.

The four principles, made concrete

Data mesh is often described abstractly. Here is each principle made concrete for a Razorpay-shaped engineering org.

1. Domain ownership. The team that runs the disputes service owns the disputes data. They are the ones who knew that on March 14th the meaning of "chargeback initiated" changed because the payment-network protocol updated. They write the dbt models, they own the dashboards, they answer "why is the chargeback rate spiking on the Mastercard segment?". Critically — they also own the on-call rotation for those data products. If a downstream Looker dashboard breaks at 2 a.m. because the disputes team renamed a column, the disputes team gets paged.

2. Data as a product. A data product is not "a table". It is a self-describing, versioned, contracted, observable, queryable artefact with an owner, an SLA, and a change log. A disputes.chargebacks_daily table that nobody has documented and nobody is on-call for is not a data product — it is a leak. The product treatment forces the producer team to think about their data the way they think about their API: backwards-compatible evolution, deprecation windows, versioned schemas, freshness commitments, and user-facing documentation.

3. Self-serve data platform. This is what the central data team becomes. They no longer write product-team pipelines. They write the infrastructure that lets product teams write their own pipelines safely: the catalog, the contract registry, the dbt-on-rails template, the Iceberg-table-creation API, the lineage tracker, the cost-attribution dashboard. The central team's success metric flips from "tickets closed" to "platform adoption" — how many domains are using the platform without needing a central engineer to hand-hold them.

4. Federated computational governance. Governance becomes code. A central governance committee defines policy ("PII columns must be tagged at creation", "every data product must have an SLA", "tables touching payments data must live in ap-south-1"); the platform enforces the policy automatically at table-creation, schema-change, and query time. No human approval gate. The committee writes the rule once; the platform applies it a million times.

Figure: Four principles of data mesh. A 2×2 grid, two layers. Producer side (the domain teams): (1) domain ownership — the disputes team owns disputes data, including on-call for downstream, keeping context local; (2) data as a product — versioned, contracted, observable, with SLAs, deprecation windows, and docs: API thinking for tables. Platform substrate (the central team): (3) self-serve platform — catalog, contracts, compute, lineage, usable without escalation; (4) federated computational governance — policy as code, not committee approval, enforced at create/change/query time.
The producer-side principles set the team boundary; the platform-side principles make the boundary safe.

The four principles are not independent options — pick-three is fragile. Ownership without product thinking creates a thousand undocumented tables. Product thinking without a platform forces every team to reinvent dbt, lineage, and contracts. A platform without federated governance becomes a free-for-all where every team makes its own PII rule. All four, or you are building something else.

A tiny self-serve table-creation API

The platform's value is best felt through code. Build a minimal data-product registry that domain teams call to create a new data product — and which enforces governance automatically.

# data_product_registry.py — tiny mesh-style self-serve API
from dataclasses import dataclass, asdict
from typing import List, Dict
from datetime import datetime
import re, json, uuid

# Federated governance policy (the central committee defines this once)
POLICIES = {
    "pii_columns_must_be_tagged": True,
    "sla_hours_required": True,
    "owner_team_required": True,
    "payments_data_region": "ap-south-1",
    "deprecation_notice_days": 30,
}

PII_NAME_HINTS = re.compile(r"(pan|aadhaar|mobile|email|phone|name|address|dob)", re.I)

@dataclass
class DataProduct:
    name: str
    domain: str
    owner_team: str
    on_call: str
    sla_hours: int
    columns: List[Dict]
    region: str = "ap-south-1"
    version: str = "v1"
    created_at: str = ""
    product_id: str = ""

class GovernanceError(Exception): pass

REGISTRY: Dict[str, DataProduct] = {}

def register(dp: DataProduct) -> DataProduct:
    # 1. Required-fields gate
    if POLICIES["owner_team_required"] and not dp.owner_team:
        raise GovernanceError("owner_team is required by policy")
    if POLICIES["sla_hours_required"] and not dp.sla_hours:
        raise GovernanceError("sla_hours is required by policy")
    # 2. PII tag enforcement
    if POLICIES["pii_columns_must_be_tagged"]:
        for col in dp.columns:
            if PII_NAME_HINTS.search(col["name"]) and not col.get("pii_tag"):
                raise GovernanceError(
                    f"column {col['name']} looks like PII but has no pii_tag")
    # 3. Region pinning for payments domain
    if dp.domain == "payments" and dp.region != POLICIES["payments_data_region"]:
        raise GovernanceError(
            f"payments domain must live in {POLICIES['payments_data_region']}")
    dp.product_id = str(uuid.uuid4())[:8]
    dp.created_at = datetime.utcnow().isoformat() + "Z"
    REGISTRY[dp.product_id] = dp
    return dp

# Disputes team registers a new data product
chargebacks = DataProduct(
    name="chargebacks_daily",
    domain="disputes",
    owner_team="team-disputes",
    on_call="riya.kulkarni@razorpay.com",
    sla_hours=4,
    columns=[
        {"name": "txn_id",         "type": "string"},
        {"name": "merchant_pan",   "type": "string", "pii_tag": "PAN"},
        {"name": "amount_paise",   "type": "int64"},
        {"name": "chargeback_ts",  "type": "timestamp"},
    ],
)

result = register(chargebacks)
print(json.dumps(asdict(result), indent=2)[:600])

# Try to register a non-compliant product
try:
    bad = DataProduct(
        name="user_dump", domain="auth", owner_team="team-auth",
        on_call="ops@razorpay.com", sla_hours=24,
        columns=[{"name": "user_aadhaar", "type": "string"}])  # no pii_tag
    register(bad)
except GovernanceError as e:
    print(f"REJECTED: {e}")
# Output:
{
  "name": "chargebacks_daily",
  "domain": "disputes",
  "owner_team": "team-disputes",
  "on_call": "riya.kulkarni@razorpay.com",
  "sla_hours": 4,
  "columns": [
    {"name": "txn_id", "type": "string"},
    {"name": "merchant_pan", "type": "string", "pii_tag": "PAN"},
    {"name": "amount_paise", "type": "int64"},
    {"name": "chargeback_ts", "type": "timestamp"}
  ],
  "region": "ap-south-1",
  "version": "v1",
  "created_at": "2026-04-25T08:42:19Z",
  "product_id": "7a1b9c
REJECTED: column user_aadhaar looks like PII but has no pii_tag

Walk through the load-bearing pieces.

The POLICIES dict at the top is the central governance committee's output — written once, after a quarterly review, encoded as a Python literal. Why a dict and not a database table: at this stage of platform maturity the policy set is small (5–20 rules) and changes rarely, and encoding it as code makes the rules version-controlled and reviewable through normal pull requests. Once policies hit ~50 and need different values per domain, this graduates to a dedicated policy engine like Open Policy Agent.

The register function is the only entry point — domain teams cannot create a data product by writing a CREATE TABLE directly; they go through this registry call. Why a single registration choke-point rather than trusting teams to follow the rules: voluntary policy compliance runs at a 60–80% adherence rate even with strong culture; gated registration runs at 100%. The platform is the floor below which non-compliance is impossible. This is what "computational governance" means.

The PII detection by column name uses a regex over common Indian-context PII column names (pan, aadhaar, mobile, email). It is a safety net — if a column looks like PII and carries no pii_tag, registration fails. The disputes team had to mark merchant_pan with pii_tag: PAN for registration to succeed; the auth team's attempt to register user_aadhaar without a tag is rejected. The GovernanceError subclass is the platform talking back to the producer team in their own language — a 400-style error with the rule cited, not a 30-day approval queue.

The payments domain → ap-south-1 rule is RBI's payment data localisation requirement encoded as policy — had a payments-domain product set region: us-east-1, registration would fail. The platform makes the regulation a compile-time error rather than a quarterly audit finding.

A real platform extends this with: contract validation (the producer's schema matches the consumer's expected schema), automatic catalog ingestion (the registered product appears in DataHub/Atlan within minutes), default observability (freshness checks scheduled on the SLA, anomaly detection on row counts), and federated cost attribution (the product's storage and compute costs land on the owning team's chargeback dashboard).
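The first of those extensions, contract validation, can be sketched minimally. The function below checks that every column a registered consumer relies on survives a producer's schema change — the names and the flat name→type schema shape are assumptions for illustration; a real platform would also handle type widening, renames-with-aliases, and nested schemas:

```python
from typing import Dict, List, Tuple

def check_compatibility(
    new_schema: Dict[str, str],                    # column name -> type
    consumer_schemas: Dict[str, Dict[str, str]],   # consumer -> columns it relies on
) -> List[Tuple[str, str]]:
    """Return (consumer, problem) pairs; an empty list means the change is safe to ship."""
    problems = []
    for consumer, required in consumer_schemas.items():
        for col, typ in required.items():
            if col not in new_schema:
                problems.append((consumer, f"missing column {col}"))
            elif new_schema[col] != typ:
                problems.append((consumer, f"{col} changed type {typ} -> {new_schema[col]}"))
    return problems

# Producer drops chargeback_ts; a registered consumer still depends on it
new = {"txn_id": "string", "amount_paise": "int64"}
consumers = {"fraud-dashboard": {"txn_id": "string", "chargeback_ts": "timestamp"}}
print(check_compatibility(new, consumers))
# -> [('fraud-dashboard', 'missing column chargeback_ts')]
```

Run in CI on every schema change, this turns "renamed a column, broke a dashboard at 2 a.m." into a failed build that names the affected consumer.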

How the bottleneck moves — and what gets harder

Data mesh does not delete the central data team's work; it relocates it. Three things get easier, three things get harder.

Easier: new data products ship in days because the producer team owns the change end-to-end without coordinating with central. Domain knowledge is fresh — the people building the dbt model know what the columns mean because they wrote the source service. Cross-team analytics improves because every team treats their data as a public API and documents it accordingly.

Harder: cross-domain consistency. When the payments team says txn_id is a UUID and the disputes team says transaction_id is a 16-character alphanumeric string, joining them at consumption time requires a translation layer that nobody owns. The platform's contracts mechanism partially solves this — it forces producers to declare the schema — but the semantic alignment ("are these the same concept?") is a coordination problem that does not vanish.
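One partial mitigation is to make the cross-domain mapping itself a declared, owned artefact rather than tribal knowledge. A hypothetical sketch — every name below is invented for illustration:

```python
# A cross-domain concept map: "same concept, different column name and format"
# recorded once, so join-time translation has a single owned source of truth.
CONCEPT_MAP = {
    "transaction": {
        "payments": {"column": "txn_id",         "format": "uuid"},
        "disputes": {"column": "transaction_id", "format": "alnum16"},
    },
}

def join_keys(concept: str, left_domain: str, right_domain: str):
    """Return the two columns to join on, plus a flag if formats need translation."""
    left = CONCEPT_MAP[concept][left_domain]
    right = CONCEPT_MAP[concept][right_domain]
    return left["column"], right["column"], left["format"] != right["format"]

print(join_keys("transaction", "payments", "disputes"))
# -> ('txn_id', 'transaction_id', True)
```

The map does not solve the semantic question ("are these the same concept?") — a human still has to decide that once — but it gives the decision a place to live and a team to page.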

Harder: discovery. With 200 data products owned by 80 teams, finding the right one for a use case is a search problem. A strong catalog with usage metrics, lineage, and free-text search becomes essential — DataHub, Atlan, Amundsen, and OpenMetadata exist for exactly this reason.

Harder: cost. Each domain team running its own pipelines without coordination tends to over-provision compute (no economies of scale on warehouse credits) and duplicate transformations (every team derives "active user" their own way). Cost attribution dashboards mitigate this by making the over-spending visible per team, but the underlying redundancy is the price of decentralisation.

Easier: governance, surprisingly. Counter-intuitive — you would expect decentralisation to weaken governance. In practice, governance enforced as code at the platform layer is more reliable than governance enforced as approvals at a central committee. The committee can be overworked or absent; the platform's contract-check runs on every registration.

When data mesh is and is not the right answer

Data mesh has a wrong-tool-for-wrong-job failure mode. Adopting it at 50 engineers is over-engineering — the central team is not a bottleneck yet, and you are paying the platform-build cost for an ailment you do not have. Adopting it without product thinking ("just decentralise the tables") creates 200 owner-less datasets that nobody documents.

The right adoption path is staged. Start by carving out one domain that is hurting most under centralisation — typically payments or user-identity in a fintech, catalogue in e-commerce, geo-events in a delivery company. Hand that domain ownership of its data products. Build the platform thinly to support that one domain. Demonstrate the unblock. Onboard the next domain. The central team gradually transitions from pipeline-writers to platform-builders. PhonePe's published data-platform talks describe roughly this five-year arc — they did not flip a switch in 2020 and emerge with a mesh in 2021.

Common confusions

Going deeper

The data-product manifest, in detail

A production data-product manifest has more fields than the toy registry shows. A representative shape: name, domain, owner_team, on_call, sla (freshness, completeness, availability), schema_version, deprecation_policy (how long old versions remain queryable after a new version ships), consumers (registered downstream tables and dashboards), cost_owner (which cost-center is billed for storage and compute), pii_classification (per-column tags), data_classification (public, internal, confidential, restricted), geographic_residency (which regions the data is allowed to live in), change_log (every breaking and non-breaking change with timestamp). The manifest is checked into the producer team's repo, validated by CI, and the platform reads it on every push. Atlan, DataHub, and OpenMetadata all converge on roughly this shape; Open Data Mesh specification (open-data-mesh.org) is an emerging standard.
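Rendered as code, a manifest of that shape plus a minimal CI required-fields check might look like the following — the exact keys and values are illustrative, not any platform's actual schema:

```python
# Representative data-product manifest, as checked into the producer's repo.
MANIFEST = {
    "name": "chargebacks_daily",
    "domain": "disputes",
    "owner_team": "team-disputes",
    "on_call": "riya.kulkarni@razorpay.com",
    "sla": {"freshness_hours": 4, "completeness_pct": 99.5, "availability_pct": 99.9},
    "schema_version": "v2",
    "deprecation_policy": {"old_versions_queryable_days": 90},
    "consumers": ["fraud-dashboard", "ml-chargeback-model"],
    "cost_owner": "cc-disputes-001",
    "pii_classification": {"merchant_pan": "PAN"},
    "data_classification": "restricted",
    "geographic_residency": ["ap-south-1"],
    "change_log": [{"version": "v2", "ts": "2026-03-14", "breaking": False,
                    "note": "chargeback_initiated semantics updated"}],
}

REQUIRED = {"name", "domain", "owner_team", "on_call", "sla", "schema_version",
            "deprecation_policy", "cost_owner", "data_classification",
            "geographic_residency", "change_log"}

def validate(manifest: dict) -> list:
    """Return sorted names of missing required fields; empty list means CI passes."""
    return sorted(REQUIRED - manifest.keys())

print(validate(MANIFEST))          # -> []
print(validate({"name": "x"}))     # -> every other required field, sorted
```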

Federated computational governance versus centralised approvals

The federated model relies on a small number of cross-cutting rules — typically 10 to 30 — that the central governance group agrees on and the platform enforces uniformly. Examples: PII tagging at creation (mandatory), encryption-at-rest (default-on), region pinning by data classification, deprecation-window minimum (30 days), SLA minimums (4-hour freshness for product-tier data products). The hard design choice is which rules become policy code and which remain human review. Anything mechanical — schema validation, region check, PII tag presence — becomes code. Anything requiring judgement — "is this dataset safe to share with a vendor?", "is this aggregation low-enough resolution to count as anonymous?" — stays as a review process, but the platform routes it to the right reviewer automatically based on the data classification.
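The code/review split described above can be sketched as a dispatch function — the rule names, classifications, and reviewer table are illustrative assumptions:

```python
# Mechanical rules run as platform code; judgement calls route to a reviewer
# chosen by data classification.
MECHANICAL = {"pii_tag_present", "region_pinned", "schema_valid",
              "deprecation_window_min", "sla_min"}

REVIEW_ROUTING = {               # classification -> reviewer for judgement calls
    "restricted":   "privacy-office",
    "confidential": "domain-data-steward",
    "internal":     "owning-team-lead",
    "public":       None,        # no review needed
}

def dispatch(check: str, classification: str) -> str:
    """Decide whether a check is enforced in code or routed to a human reviewer."""
    if check in MECHANICAL:
        return "enforce-in-platform"
    reviewer = REVIEW_ROUTING[classification]
    return f"route-to:{reviewer}" if reviewer else "auto-approve"

print(dispatch("region_pinned", "restricted"))     # -> enforce-in-platform
print(dispatch("vendor_share_ok", "restricted"))   # -> route-to:privacy-office
```

The design point: even the human-review path is computational — the platform picks the reviewer from the manifest's classification, so "ask privacy" never depends on someone knowing whom to ask.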

The "data product" abstraction borrowed from microservices

The pattern data mesh borrows is the API-as-a-product framing that microservices pioneered: an API has SLAs, versioning, deprecation timelines, documentation, on-call, and a public contract. Apply that to a table and you get the data product. The deepest analogy: just as a Razorpay payments API is consumed by hundreds of merchants without any of them needing to read the payments-team source code, a payments data product is consumed by analytics, ML, and reporting without any of them needing to read the payments dbt models. The contract is the interface. Breaking the contract is a bug. This abstraction is what makes federated ownership scale — without it, "decentralised data" is just "everyone has their own pile of tables".

What the Indian regulatory environment adds

Indian companies adopting data mesh have to bake regulatory constraints into the platform substrate. RBI's payment data localisation requires that data classified as payments lives in ap-south-1 — encoded as a region rule in the registry, as shown above. DPDP 2023 requires that data subjects' deletion requests propagate within reasonable time across all data products that hold their data — encoded as a deletion_propagation_sla field per product, and a downstream-walk engine like the one in /wiki/pii-detection-masking-right-to-be-forgotten. SEBI requires investment-account records be retained for 8 years — encoded as a retention_minimum field per product that overrides any default deletion. The platform's job is to make these regulatory constraints invisible to the domain team — the team writes a normal manifest, and the platform applies the right rules based on the classification fields.
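A sketch of how the platform might derive those regulatory fields from the classification fields, so the domain team writes a normal manifest and the rules apply themselves. Function and field names are invented for illustration, and the 30-day DPDP propagation default is an assumption — the Act itself says "reasonable time":

```python
def apply_regulatory_defaults(manifest: dict) -> dict:
    """Derive regulatory fields from classification; never weaken explicit values."""
    m = dict(manifest)
    if m.get("data_classification") == "payments":
        m["geographic_residency"] = ["ap-south-1"]           # RBI payment data localisation
    if m.get("holds_personal_data"):
        m.setdefault("deletion_propagation_sla_days", 30)    # DPDP 2023 (window assumed)
    if m.get("record_type") == "investment_account":
        # SEBI 8-year retention floor overrides any shorter default
        m["retention_minimum_years"] = max(m.get("retention_minimum_years", 0), 8)
    return m

out = apply_regulatory_defaults({
    "name": "settlements_daily",
    "data_classification": "payments",
    "holds_personal_data": True,
})
print(out["geographic_residency"], out["deletion_propagation_sla_days"])
# -> ['ap-south-1'] 30
```

Note the retention rule uses max(), not assignment: regulatory floors must tighten, never loosen, whatever the team declared.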

Where data mesh fails in practice

Three failure modes recur. First, the platform is built and the producers do not adopt it — usually because the platform requires too much manifest authoring before yielding value, or because the central team enforces it before building convenience tooling. Second, the governance committee never converges on rules, so each domain invents its own conventions and the federation collapses into balkanisation. Third, the org structure does not change — domain teams have data ownership in name but no headcount or budget for it, so data work falls back to a junior engineer who treats it as toil. The first failure is solved by platform UX; the second by leadership commitment to a governance forum; the third only by re-allocating headcount. The technology cannot fix an org-design problem.

Where this leads next

References