Note: Company names, engineers, incidents, numbers, and scaling scenarios in this article are hypothetical — even when they resemble real ones. See the full disclaimer.

Dashboard-as-code (Grafana JSON, Terraform)

It is 22:14 IST on a Saturday and Aditi, the platform-team SRE on-call at a hypothetical Pune-based fintech we will call DhanFlow, gets paged for payments-api p99 latency. She opens the runbook deep-link. The Grafana dashboard loads — and the burn-rate panel, the worst-tenant panel, and the partition-error panel are all gone. There is one panel left, titled Untitled Panel, showing CPU usage. She pages the previous on-call, Karan, who left the company six weeks ago. There is no answer. Somebody clicked "Save" on the production dashboard between 19:30 and 20:15 IST while doing capacity-planning experiments, and the change went straight to the live dashboard with no review, no diff, and no rollback. Aditi has 38 minutes of error budget left and no instrument panel. She spends the next 21 minutes running raw PromQL in Grafana Explore, traces the breach to a Postgres query-plan flip on replica-2, and rolls back the change with three minutes of budget left. The post-incident review has one action item: the dashboard must live in git, not in the Grafana database. This is the chapter on what that sentence actually means.

Dashboards-as-code means the dashboard JSON model lives in your version-controlled source tree, is generated by code (Jsonnet, Python, Terraform), is reviewed by humans before merge, and is applied to Grafana by a deployment pipeline — never edited in the UI. The discipline buys you reviewability, reproducibility, multi-environment parity, and rollback. It costs you the convenience of "drag a panel and save" and forces a hard choice of generator (raw JSON, Jsonnet/Grafonnet, Terraform Grafana provider, Python templating, the Grafana Operator) — each with a different blast radius.

What dashboards-as-code actually means — the JSON model is the source of truth

A Grafana dashboard is not a UI artefact; it is a JSON document. Open any dashboard in Grafana, click the gear icon, choose JSON Model, and you will see the entire dashboard — every panel, every query, every variable, every annotation — as a single ~2-5 KB JSON object. The Grafana UI is a renderer for that JSON. When you click Save, the UI serialises the in-memory dashboard back to JSON and writes it to Grafana's backing store (sqlite by default, often Postgres or MySQL in production). The "save" is a database UPDATE with no diff, no review, and no rollback unless the History feature is configured.

The dashboard-as-code claim is simple: the JSON model belongs in git, not in the database. The git copy is the source of truth; the database is a derived cache that is rebuilt by a deployment pipeline whenever git changes. This inverts the default. The default is: edit in UI → save to database → maybe export later. Dashboard-as-code is: edit code in IDE → commit → CI generates JSON → CI applies to Grafana via API → database is overwritten. The UI becomes read-only for production dashboards; edits happen in a sandbox or staging environment and are promoted via the same pipeline that promotes application code.

[Figure: Dashboard-as-code flow versus the default UI-edit flow. Two parallel flow diagrams. Top, the default flow: the engineer clicks Save in the Grafana UI, which serialises the dashboard and writes it straight to the Grafana database — no diff, no review, no rollback; the change lands in ~200 ms and nobody knows it happened until the next incident. Bottom, the dashboard-as-code flow: IDE edit (.libsonnet / .py) → git commit → PR → CI lint (grafonnet, jsonnet) → human review with CODEOWNERS → merge to main → deploy via the Grafana HTTP API; git is the source of truth and the Grafana DB is a derived cache rebuilt by the pipeline, with the change landing in 5–15 minutes — the time cost is the review. Illustrative — exact CI steps vary; the four checkpoints (lint, review, merge, deploy) are the minimum.]
Illustrative — the dashboard-as-code pipeline. The 5–15 minute review tax is the price for never losing a panel to a careless Save click again.

The trade-off is intentional friction. The default UI flow lands a change in 200 milliseconds; the dashboard-as-code flow lands a change in 5-15 minutes. The 15-minute tax is the review. For tier-1 dashboards — the ones the on-call lands on at 02:46 IST — the tax is non-negotiable. For experimental dashboards in a developer's personal folder (or a laptop-local Grafana), the UI is fine. The discipline is to mark which dashboards are tier-1 (production, runbook-linked, page-target) and put those in code. The rest can stay in the UI's "Sandbox" folder, edited freely, and get promoted to code when they prove their worth.

Why the database-first model is the failure mode and not just a design choice: in a database-first model, the dashboard's lifecycle is invisible. You cannot answer "who deleted the worst-tenant panel?" because the database does not record edit history beyond Grafana's History feature (which most teams disable to save storage), nor "what was on this dashboard 60 days ago when the last IPL final happened?" because the database has no versioning, nor "is the staging dashboard the same as production?" because there is no diff tooling between two Grafana databases. Each of these is answerable in 30 seconds when the JSON lives in git: git log -- dashboards/payments-api.json, git show "$(git rev-list -1 --before='60 days ago' HEAD)":dashboards/payments-api.json, diff <(grafonnet-render staging) <(grafonnet-render prod). The dashboard-as-code payoff is not pretty code; it is the operability of treating dashboards like the operational artefacts they are.

The four generator approaches — Jsonnet, Terraform, Python, raw JSON

There are four dominant ways to generate Grafana JSON in 2026, each with different ergonomics, blast radius, and team fit. Pick deliberately; mixing them in the same repo creates a lookup nightmare for the next on-call.

Approach 1: Raw JSON in git. Export the dashboard from the UI as JSON, commit it, and reapply via the Grafana HTTP API on every deploy. The simplest possible setup — no DSL to learn, no template engine, just files. The cost: zero abstraction means duplicate code across services. If you have 47 services that all want a "RED" dashboard (rate, errors, duration), you have 47 nearly-identical JSON files, and a panel-naming-convention change (renaming Errors per second to Error Rate (errs/s)) requires 47 edits. The diff-on-PR is also brutal — a one-line panel change can show up as a 200-line JSON diff because the export reorders keys. Use raw JSON when the dashboard count is under 5, the team is junior, and the abstraction tax of any DSL would slow them down.
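
A partial mitigation for the diff problem, sketched below under the assumption that exported dashboards are committed as plain files (the script name and the list of volatile fields are illustrative): re-serialise every export with sorted keys and fixed indentation before committing, so a one-line panel change shows up as a one-line diff rather than a 200-line key reordering. Wired into a pre-commit hook, the normalisation runs in milliseconds.

# normalize_export.py — illustrative pre-commit helper: re-serialise UI-exported
# dashboard JSON with sorted keys and stable indentation so git diffs stay small.
import json
import sys

def normalize(path: str) -> None:
    with open(path) as f:
        dashboard = json.load(f)
    # Drop fields the UI exporter writes but the server manages (illustrative list).
    for volatile in ("id", "iteration"):
        dashboard.pop(volatile, None)
    with open(path, "w") as f:
        json.dump(dashboard, f, indent=2, sort_keys=True)
        f.write("\n")

if __name__ == "__main__":
    for path in sys.argv[1:]:
        normalize(path)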

Approach 2: Jsonnet + Grafonnet. Jsonnet is a configuration-as-code DSL that compiles to JSON, and Grafonnet is a Jsonnet library of Grafana primitives (row.new, panel.timeseries.new, query.prometheus.new, etc.). You write local g = import 'g.libsonnet'; g.dashboard.new('Payments API') + g.dashboard.withPanels([g.panel.timeseries.new('p99') + g.panel.timeseries.queryOptions.withTargets([g.query.prometheus.new(...)])]), and jsonnet -J vendor dashboard.libsonnet > dashboard.json produces the JSON. The 47-services problem becomes a for service in services loop. Grafana Labs and most platform-engineering shops in India (PhonePe, Razorpay, Hotstar, per their public engineering blogs) standardise on Grafonnet. The cost: Jsonnet is an unfamiliar language for most backend engineers; the learning curve is real (1-2 weeks to fluency); and the error messages are notoriously cryptic. Use Jsonnet/Grafonnet when the dashboard count exceeds 20, you have at least one engineer willing to own the libsonnet library, and you need cross-dashboard consistency more than you need familiar syntax.

Approach 3: Terraform Grafana provider. The Terraform Grafana provider treats each dashboard, alert rule, and folder as a Terraform resource. You write resource "grafana_dashboard" "payments" { config_json = file("dashboards/payments.json") } and terraform apply reconciles the Grafana state to match. The win: dashboards live in the same Terraform that manages your AWS / GCP / Kubernetes infrastructure, with a single terraform plan showing all infrastructure-and-dashboard changes side by side. The cost: Terraform is heavy machinery for what is essentially "PUT this JSON to this URL"; state-file conflicts in CI are common (especially if multiple engineers run apply from their laptops); the provider has historically lagged Grafana feature releases by 2-6 months. Use Terraform when your team already runs Terraform for everything else, you want a single PR-and-apply workflow, and you can tolerate a slower upgrade cadence.

Approach 4: Python templating. Write a Python function that returns the dashboard JSON dictionary, customise per service, and POST to the Grafana API via requests. The win: Python is universally familiar to backend engineers in India, the logic for "if this service has a database, add a database row" is trivial Python, and the test story is normal pytest. The cost: you build your own grafonnet-equivalent (the panel/row/query helper functions) — typically 200-500 lines of Python before you have a useful dashboard library. Spotify and Cloudflare have published variants of this approach; in India, Cleartrip's platform team has talked about a dashboard_factory.py pattern. Use Python when your team is Python-heavy, you are uncomfortable with Jsonnet, and you are willing to invest 1-2 engineer-weeks in building the helper library before reaping the benefits.

The mistake to avoid is partial adoption — a repo where some dashboards are raw JSON, some are Grafonnet, some are Terraform, and one is a Python script. The next on-call sees a broken panel, opens git, and has to figure out which abstraction layer to edit. Pick one approach for each class of dashboard (tier-1 service dashboards in Grafonnet, infrastructure dashboards in Terraform), document the boundary, and resist the urge to mix.

[Figure: The four generator approaches mapped against learning curve and abstraction power. A 2D scatter chart — X axis learning curve (low → high), Y axis abstraction power (low → high). Raw JSON sits low-left (~2-5 KB per dashboard; deps: jq, curl; fits teams with ≤ 5 dashboards). Python templating sits middle-left (~50-200 LOC factory; deps: requests; 5-50 dashboards). The Terraform provider sits middle-right (one resource per dashboard; deps: terraform plus a state backend; teams already running Terraform). Grafonnet/Jsonnet sits upper-right (~10-50 LOC per dashboard; deps: jsonnet plus a vendored library; 50+ dashboards). Illustrative — the right choice is a function of team toolchain and dashboard count, not absolute superiority.]
Illustrative — the four generator approaches mapped against learning curve and abstraction power. Most teams should start at "Python templating" and migrate to Grafonnet only when dashboard count exceeds ~50.

A working Python dashboard generator

The example below is a runnable Python harness that builds a Grafana JSON model for a service, posts it to a local Grafana via the HTTP API, and verifies the upload by re-fetching the dashboard. It is the minimum-viable Python dashboard-as-code factory. The logic is small; the value is that the dashboard is now a function of (service_name, slo_target, prometheus_datasource_uid) rather than a hand-edited blob.

# dashboard_factory.py — generate and apply a Grafana dashboard from Python
# pip install requests
import json, requests, os, sys
from typing import Any

GRAFANA = os.environ.get("GRAFANA_URL", "http://localhost:3000")
TOKEN = os.environ["GRAFANA_TOKEN"]  # service account token, dashboards:write
HEADERS = {"Authorization": f"Bearer {TOKEN}",
           "Content-Type": "application/json"}

def panel(title: str, expr: str, unit: str, x: int, y: int,
          datasource_uid: str, panel_id: int) -> dict[str, Any]:
    """One timeseries panel — minimal viable Grafana panel JSON."""
    return {
        "id": panel_id, "type": "timeseries", "title": title,
        "gridPos": {"x": x, "y": y, "w": 12, "h": 8},
        "datasource": {"type": "prometheus", "uid": datasource_uid},
        "targets": [{"expr": expr, "refId": "A",
                     "datasource": {"type": "prometheus",
                                    "uid": datasource_uid}}],
        "fieldConfig": {"defaults": {"unit": unit}, "overrides": []},
    }

def red_dashboard(service: str, slo_p99_ms: int, ds_uid: str) -> dict[str, Any]:
    """RED dashboard for a service — rate, errors, duration."""
    panels = [
        panel(f"{service} — req/s", f'sum(rate(http_requests_total'
              f'{{service="{service}"}}[5m]))', "reqps", 0, 0, ds_uid, 1),
        panel(f"{service} — error rate %",
              f'100 * sum(rate(http_requests_total{{service="{service}",'
              f'status=~"5.."}}[5m])) / sum(rate(http_requests_total'
              f'{{service="{service}"}}[5m]))', "percent", 12, 0, ds_uid, 2),
        panel(f"{service} — p99 ms (SLO {slo_p99_ms}ms)",
              f'1000 * histogram_quantile(0.99, sum by (le) (rate('
              f'http_request_duration_seconds_bucket{{service="{service}"}}'
              f'[5m])))', "ms", 0, 8, ds_uid, 3),
        panel(f"{service} — burn rate (1h)",
              f'(sum(rate(http_requests_total{{service="{service}",'
              f'status=~"5.."}}[1h])) / sum(rate(http_requests_total'
              f'{{service="{service}"}}[1h]))) / 0.005',
              "short", 12, 8, ds_uid, 4),
    ]
    return {
        "title": f"{service} — RED + SLO",
        "uid": f"red-{service}", "tags": ["red", "service", "auto-generated"],
        "timezone": "Asia/Kolkata", "schemaVersion": 39, "version": 0,
        "refresh": "30s", "time": {"from": "now-1h", "to": "now"},
        "panels": panels,
        "templating": {"list": [{
            "name": "tenant", "type": "query", "datasource": {"uid": ds_uid},
            "query": f'label_values(http_requests_total'
                     f'{{service="{service}"}}, tenant_id)',
            "includeAll": True, "multi": True}]},
    }

def apply(dashboard: dict[str, Any]) -> dict[str, Any]:
    payload = {"dashboard": dashboard, "overwrite": True,
               "message": "automated apply from dashboard_factory.py"}
    r = requests.post(f"{GRAFANA}/api/dashboards/db",
                      headers=HEADERS, data=json.dumps(payload))
    r.raise_for_status()
    return r.json()

if __name__ == "__main__":
    services = [("payments-api", 200), ("orders-api", 300),
                ("ledger-api", 150), ("notifications-api", 500)]
    ds_uid = sys.argv[1] if len(sys.argv) > 1 else "PBFA97CFB590B2093"
    for svc, slo in services:
        result = apply(red_dashboard(svc, slo, ds_uid))
        print(f"applied {svc}: uid={result['uid']} version={result['version']} "
              f"url={GRAFANA}{result['url']}")

Sample run output:

$ GRAFANA_TOKEN=glsa_xxx python3 dashboard_factory.py PBFA97CFB590B2093
applied payments-api: uid=red-payments-api version=3 url=http://localhost:3000/d/red-payments-api/payments-api-red-slo
applied orders-api: uid=red-orders-api version=2 url=http://localhost:3000/d/red-orders-api/orders-api-red-slo
applied ledger-api: uid=red-ledger-api version=2 url=http://localhost:3000/d/red-ledger-api/ledger-api-red-slo
applied notifications-api: uid=red-notifications-api version=1 url=http://localhost:3000/d/red-notifications-api/notifications-api-red-slo

The mechanism per load-bearing line: def panel(...) is the smallest stable abstraction — a function that returns one panel dict. Every panel in the dashboard goes through it, so any panel-shape change (adding a description field, renaming gridPos keys when Grafana 11.x ships a new layout system) is a one-place edit. def red_dashboard(service, slo_p99_ms, ds_uid) is the dashboard-shape function — parameterised by the three things that vary across services. The 47-service problem from the previous section becomes 47 calls to this function, no JSON duplication. Why the parameterisation has to be deliberate, not "make everything a parameter": every parameter you expose is a parameter the next engineer has to understand. A dashboard factory with 23 parameters is unreadable; a factory with 3 (service, slo_target, datasource_uid) is obviously reusable. The discipline is to push variation into the data (a list of (service, slo) tuples) and out of the API — the function signature should fit on one line. When tempted to add a fourth or fifth parameter, ask whether the variation belongs in the dashboard schema (which evolves slowly) or in the data passed to it (which evolves daily). Most "I need to parameterise this too" requests belong in data, not in the function signature.

"uid": f"red-{service}" is the deterministic UID — Grafana indexes dashboards by UID, and re-using the same UID on apply makes the operation idempotent (POST to /api/dashboards/db with overwrite: true updates rather than creating a new dashboard). Without a deterministic UID, every CI run creates a new dashboard with a random UID, polluting the dashboard list and breaking runbook deep-links. "templating": {"list": [{"name": "tenant", ...}]} is the Grafana template-variable definition — it adds a tenant dropdown to the top of the dashboard, populated dynamically from label_values(http_requests_total{service="payments-api"}, tenant_id). The drill-down architecture from the previous chapter requires this — the worst-tenant panel's data-link sets var-tenant=$__field.labels.tenant_id, and the destination dashboard reads that variable to pre-filter. Without the templating block, the link goes to a dashboard that ignores the variable, breaking the click-through.

Why overwrite: true on the API call is non-negotiable for a CI pipeline: without it, the second apply of the same dashboard returns HTTP 412 Precondition Failed because the dashboard already exists. CI then either crashes (and somebody has to manually clear the dashboard) or you handle the error and skip — at which point your CI pipeline silently no-ops on every run after the first, and the dashboard never updates. overwrite: true says "replace whatever is there, version-bump the dashboard, accept the risk that two parallel CI runs racing on the same dashboard will produce a non-deterministic final state". For tier-1 dashboards in a serial CI pipeline (which is what every team should run for dashboard deploys), overwrite: true is correct and safe.

A practical operational note: the Grafana API token (GRAFANA_TOKEN) needs the dashboards:write scope, and in practice should be a service account token (not a personal API key), with the Editor role at the folder level rather than Admin org-wide. Service account tokens are what survive engineer turnover; personal API keys vanish when the engineer leaves, breaking your CI silently the first time you deploy after their offboarding. Razorpay's platform team had a documented incident in 2024 where their dashboards stopped updating for three weeks after a senior engineer left, because the CI pipeline was using their personal token. The diagnostic is to grep your CI logs for GRAFANA_TOKEN and audit who owns the corresponding token; the fix is one service account per CI pipeline, with rotation policy.

The version field returned by the API is the other operational lever. Grafana increments version on every save, and the API's overwrite: false mode rejects a save if the on-disk version is older than the live version — protecting against the "two engineers race to save" case in interactive editing. In a CI pipeline the protection is unwanted (you want CI's apply to be authoritative) but the version number is still useful for audit: you can grep your deploy logs for the version sequence and detect "the version jumped from 47 to 52 in a single deploy", which means somebody manually edited the dashboard in the UI four times between CI deploys. Razorpay's drift-detection script alerts on any version delta greater than 1 between consecutive CI deploys, treating it as a signal of policy violation. The signal-to-noise is high — most legitimate edits flow through CI and produce a +1 increment — so any +N alert is worth a Slack ping to the offending engineer.
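
A minimal sketch of that version-delta check, assuming the deploy script records the version returned by each apply into a versions.json file (the file name, its format, and the exit-code convention are illustrative; the lookup uses Grafana's standard GET /api/dashboards/uid/{uid} endpoint):

# check_version_drift.py — illustrative sketch: before a deploy, compare each
# dashboard's live version against the version recorded by the previous CI apply.
# Anything higher means the dashboard was edited in the UI between deploys.
import json
import os
import sys
import requests

GRAFANA = os.environ.get("GRAFANA_URL", "http://localhost:3000")
HEADERS = {"Authorization": f"Bearer {os.environ['GRAFANA_TOKEN']}"}

def live_version(uid: str) -> int:
    r = requests.get(f"{GRAFANA}/api/dashboards/uid/{uid}", headers=HEADERS)
    r.raise_for_status()
    return r.json()["dashboard"]["version"]

if __name__ == "__main__":
    # versions.json is written by the deploy script after each apply, e.g.
    # {"red-payments-api": 47, "red-orders-api": 12}
    recorded = json.load(open(sys.argv[1]))
    drifted = False
    for uid, expected in recorded.items():
        now = live_version(uid)
        if now > expected:
            drifted = True
            print(f"DRIFT: {uid} is at version {now}, CI last applied {expected} "
                  f"— {now - expected} manual UI edit(s) will be clobbered")
    sys.exit(1 if drifted else 0)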

The CI pipeline — what "review" actually checks

A dashboard-as-code repo without a meaningful CI pipeline is just JSON in git — better than database-only, but still vulnerable to "merge a PR that breaks the dashboard, find out at 02:46 IST that the panel doesn't render". The CI pipeline has to validate the dashboard structure, the queries, and the click-paths before merge. Here is the full ladder:

Stage 1: JSON syntax. python3 -c 'import json; json.load(open("dashboard.json"))'. Catches malformed JSON. Trivial. Run on every PR.

Stage 2: Grafana schema validation. Grafana ships a JSON schema for the dashboard model (grafana/grafana/blob/main/public/app/features/dashboard/api/dashboard_schema.cue for v2; the older v1 dashboard schema is implicit and validated at apply-time). Use cue vet (or kubeconform-style JSON-schema validation) against the schema to catch missing required fields, wrong types, and deprecated fields. The schema check catches "the panel has gridPos.h as a string instead of an integer" — a class of bug that the Grafana UI silently coerces but that the API rejects.
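
Where running the full CUE schema is not practical, a minimal structural check along the lines below covers the examples above; the script name matches the validate_schema.py step in the CI workflow later in this chapter, and the list of required fields is an assumption, not the official schema:

# validate_schema.py — illustrative minimal stand-in for full schema validation:
# checks required top-level fields and the gridPos types that the Grafana UI
# silently coerces but the API rejects.
import json
import sys

REQUIRED = ("title", "uid", "panels", "schemaVersion")

def check(path: str) -> list[str]:
    dashboard = json.load(open(path))
    errors = [f"missing field {f!r}" for f in REQUIRED if f not in dashboard]
    for panel in dashboard.get("panels", []):
        grid = panel.get("gridPos", {})
        for key in ("x", "y", "w", "h"):
            if not isinstance(grid.get(key), int):
                errors.append(f"panel {panel.get('title')!r}: gridPos.{key} must be an int")
    return errors

if __name__ == "__main__":
    failed = False
    for path in sys.argv[1:]:
        for error in check(path):
            failed = True
            print(f"{path}: {error}")
    sys.exit(1 if failed else 0)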

Stage 3: PromQL / LogQL query validation. This is the high-value step most teams skip. Run every query in every panel through promtool query instant (and LogQL through logcli query) against a real Prometheus instance (typically a staging Prometheus that has the same metrics shape as production but lower retention). The check catches "this panel references http_request_duration_seconds_bucket but the metric is actually called http_server_duration_bucket" — a typo that the dashboard renders as "No data" and that the on-call engineer interprets as "the service is down". Run on every PR.
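
A sketch of the same check driven through the Prometheus HTTP query API rather than promtool — the script name matches the CI workflow below, it assumes panel expressions contain no template variables (a real pipeline substitutes defaults first), and it treats an empty result on staging as a failure:

# validate_promql.py — illustrative Stage 3 check: run every panel expression
# against a staging Prometheus and fail on query errors or empty results.
import json
import sys
import requests

def exprs(dashboard: dict):
    for panel in dashboard.get("panels", []):
        for target in panel.get("targets", []):
            if "expr" in target:
                yield panel.get("title", "?"), target["expr"]

if __name__ == "__main__":
    prom = sys.argv[1].removeprefix("--prom=")  # e.g. https://staging-prom.dhanflow.in
    failed = False
    for path in sys.argv[2:]:
        dashboard = json.load(open(path))
        for title, expr in exprs(dashboard):
            body = requests.get(f"{prom}/api/v1/query", params={"query": expr}).json()
            if body.get("status") != "success":
                failed = True
                print(f"{path} [{title}]: {body.get('error', 'query failed')}")
            elif not body["data"]["result"]:
                failed = True
                print(f"{path} [{title}]: query parses but returns no series on staging")
    sys.exit(1 if failed else 0)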

Stage 4: Click-path validation. Walk the links arrays on every panel. For each link, verify that the destination dashboard exists (resolves a UID) and that the variable substitutions (var-tenant=$__field.labels.tenant_id) reference variables that the destination dashboard declares. The check catches "this panel links to a dashboard that was renamed three sprints ago" — a class of silent breakage that kills the drill-down architecture from the previous chapter. Run on every PR.
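
A sketch of that walk — the script name matches the CI workflow below, the destination lookup runs against a live (staging) Grafana, and the regexes for extracting the UID and the var- parameters are assumptions about link URL shape:

# validate_links.py — illustrative Stage 4 check: every dashboard link must point
# at a dashboard that exists, and every var-<name> parameter in the link URL must
# be a template variable the destination dashboard declares.
import json
import os
import re
import sys
import requests

GRAFANA = os.environ.get("GRAFANA_URL", "http://localhost:3000")
HEADERS = {"Authorization": f"Bearer {os.environ['GRAFANA_TOKEN']}"}

def destination_vars(uid: str):
    r = requests.get(f"{GRAFANA}/api/dashboards/uid/{uid}", headers=HEADERS)
    if r.status_code == 404:
        return None  # dead link — destination dashboard does not exist
    r.raise_for_status()
    tmpl = r.json()["dashboard"].get("templating", {}).get("list", [])
    return {v["name"] for v in tmpl}

if __name__ == "__main__":
    failed = False
    for path in sys.argv[1:]:
        dashboard = json.load(open(path))
        for panel in dashboard.get("panels", []):
            links = (panel.get("links", []) +
                     panel.get("fieldConfig", {}).get("defaults", {}).get("links", []))
            for link in links:
                url = link.get("url", "")
                m = re.search(r"/d/([^/?]+)", url)
                if not m:
                    continue  # external links are out of scope for this check
                declared = destination_vars(m.group(1))
                if declared is None:
                    failed = True
                    print(f"{path} [{panel['title']}]: dead link {url}")
                    continue
                for var in re.findall(r"var-(\w+)=", url):
                    if var not in declared:
                        failed = True
                        print(f"{path} [{panel['title']}]: var-{var} is not declared "
                              f"by destination {m.group(1)}")
    sys.exit(1 if failed else 0)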

Stage 5: Visual regression. Render the dashboard headlessly (Grafana's image-renderer plugin) and diff against the previous PR's render. The diff catches "this PR added a panel that overlaps an existing panel" or "this PR removed the worst-tenant panel". Run on PR for tier-1 dashboards; the rendering is slow (10-30s per dashboard) so skip for tier-2/3.

Stage 6: Apply to a staging Grafana, run a smoke test. Apply the dashboard to a staging Grafana, then have the smoke test script open each panel's query, execute it against staging Prometheus, and assert that something returned. The smoke test catches "the metric exists but has zero series for this service" — a bug pattern where a service was renamed and the dashboard still references the old name. Run on PR for tier-1 only; the staging environment cost makes this expensive for every PR.

The combined CI pipeline takes 90 seconds to 4 minutes per PR. The 4-minute upper bound is what teams that fully implement Stage 6 see. Most teams stop at Stage 4 and accept that Stage 5/6 catches are caught at deploy time instead of merge time. The right cutoff depends on tier-1 dashboard count: if you have fewer than 10 tier-1 dashboards, Stages 1-4 are enough; above 10, the visual-regression catch frequency justifies Stage 5; above 50, Stage 6's smoke test starts saving incident time at a rate that justifies the staging cost.

Why Stage 3 (PromQL validation) is the highest-value step despite being the easiest to skip: a dashboard panel that references a non-existent metric renders as a blank "No data" panel — visually indistinguishable from "the service is currently emitting zero values for this metric". At incident time, the on-call engineer sees a blank panel and has to decide whether the panel is broken or whether the service is genuinely silent. That decision typically takes 2-5 minutes of cross-checking with curl /metrics directly against the service. Stage 3 catches the bad reference at PR time, when the cost of fixing it is 30 seconds; deferred to incident time, the cost is 2-5 minutes of wasted diagnostic time per occurrence, multiplied by the number of times the bad panel surfaces during the panel's lifetime. The expected-value calculation is heavily in favour of Stage 3, even when the team is small. Skipping it because "we don't have time to set up promtool against staging Prometheus" is exactly the kind of false economy that produces the failure modes the chapter opened with.

A sample CI workflow file (GitHub Actions style) for a Python-templating-based repo looks like this:

# .github/workflows/dashboards.yml
name: dashboards
on: [pull_request, push]
jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with: { python-version: '3.11' }
      - run: pip install requests jsonschema promtool-py
      - run: python3 scripts/render_dashboards.py --out=out/
      - run: python3 scripts/validate_schema.py out/*.json
      - run: python3 scripts/validate_promql.py
                       --prom=https://staging-prom.dhanflow.in out/*.json
      - run: python3 scripts/validate_links.py out/*.json
  deploy:
    needs: validate
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with: { python-version: '3.11' }
      - run: pip install requests
      - run: python3 scripts/render_dashboards.py --out=out/
      - run: python3 scripts/apply.py --grafana=$GRAFANA --token=$TOKEN out/*.json
        env:
          GRAFANA: ${{ secrets.GRAFANA_URL }}
          TOKEN: ${{ secrets.GRAFANA_TOKEN }}

The three validate_* steps map to Stages 2-4 of the ladder (Stage 1's JSON-syntax check happens implicitly when the validators load the rendered files); Stages 5 and 6 are not shown because they require a running staging Grafana, which most teams provision via Terraform in a separate workflow. The deploy job runs only on main after merge — feature branches go through validation but not deploy, so merging a broken PR into main is the only way a broken dashboard reaches production. The secrets.GRAFANA_TOKEN is the service account token discussed earlier, scoped to write to the Production/ folder only.

When dashboards-as-code itself becomes the bottleneck

Dashboard-as-code is a discipline, and any discipline taken to the extreme becomes counterproductive. The four common failure modes:

Failure 1: every panel change requires a PR. A backend engineer wants to add one panel to a tier-3 exploration dashboard for a half-day debugging session. With strict dashboard-as-code, they open a PR, wait for CI, wait for review, merge, wait for deploy — total 30-90 minutes — for a panel they will delete tomorrow. The fix is folder-level granularity: the Production/ folder in Grafana is read-only via the UI and writable only by CI; the Sandbox/ folder is editable by anyone. New dashboards start in Sandbox, prove their worth over a sprint or two, and get promoted to Production via a PR. Most teams find that ~80% of dashboards live in Sandbox forever and never need code-review discipline.

Failure 2: the abstraction outgrows the team's understanding. The platform team builds a 2000-line Grafonnet library with helper functions for every conceivable pattern. New backend engineers cannot add a panel without learning Jsonnet first. PRs sit for days because nobody understands the abstraction. The fix is to treat the dashboard library as a product with users — measure the time-to-first-panel for new engineers, and if it exceeds a day, simplify. Grafana Labs themselves keep Grafonnet small (~3000 lines for the entire library) because the abstraction tax compounds across teams.

Failure 3: drift from manual edits. Despite the policy, engineers (especially senior ones during incidents) edit dashboards in the UI to "quickly fix something". The next CI deploy clobbers their fix without warning. The fix is automated drift detection — a nightly job that diffs the live Grafana state against the rendered code, posts the diff to Slack, and either auto-reapplies (overwriting the manual edit) or opens an alert (informing the manual editor that their change is about to be lost). Hotstar's platform team published a grafana-drift-checker Python script that runs every 6 hours and posts to #observability-drift.
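
A minimal sketch of such a nightly job, assuming the rendered JSON files are available from the repo checkout and that the fields Grafana manages itself (id, version) should be ignored; the Slack-posting step is left out and a non-zero exit code stands in for it:

# drift_check.py — illustrative nightly drift job: diff the live dashboard JSON
# against the rendered source-of-truth and report any divergence.
import difflib
import json
import os
import sys
import requests

GRAFANA = os.environ.get("GRAFANA_URL", "http://localhost:3000")
HEADERS = {"Authorization": f"Bearer {os.environ['GRAFANA_TOKEN']}"}

def canon(dashboard: dict) -> list[str]:
    d = dict(dashboard)
    for volatile in ("id", "version"):  # fields Grafana manages, not the repo
        d.pop(volatile, None)
    return json.dumps(d, indent=2, sort_keys=True).splitlines()

if __name__ == "__main__":
    drifted = False
    for path in sys.argv[1:]:  # rendered JSON files from the repo checkout
        want = json.load(open(path))
        r = requests.get(f"{GRAFANA}/api/dashboards/uid/{want['uid']}", headers=HEADERS)
        r.raise_for_status()
        live = r.json()["dashboard"]
        diff = list(difflib.unified_diff(canon(want), canon(live),
                                         "rendered", "live", lineterm=""))
        if diff:
            drifted = True
            print(f"DRIFT in {want['uid']}:")
            print("\n".join(diff[:40]))  # truncate; a Slack post would carry the rest
    sys.exit(1 if drifted else 0)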

Failure 4: the deploy pipeline becomes the SPOF. If the CI pipeline that deploys dashboards is broken, dashboard updates queue up, and at some point the on-call engineer needs a panel that has been merged but not deployed. The fix is manual override — a documented procedure for an on-call engineer to apply a dashboard from their laptop using the CI's deploy script (./scripts/deploy-dashboard.sh dashboards/payments.json) when the CI itself is down. The override is logged to Slack so the platform team knows it happened. Without an override path, the dashboard-as-code pipeline becomes brittle in exactly the moment when you need it most.

There is a fifth failure mode that is rarer but worth naming: template-variable rot. A dashboard variable like var-tenant is populated by a query against Prometheus (label_values(http_requests_total{service="payments-api"}, tenant_id)). If the metric is renamed, the variable silently returns an empty list, the dropdown shows "no values", and every panel that filters by $tenant shows "No data". The visual appearance is identical to "the service has no traffic" — which is the worst possible failure mode at 02:46 IST when the on-call needs to know whether the service is dead or the dashboard is broken. The fix is to add a panel-level "metadata" check to the CI pipeline: render the dashboard against staging Prometheus, assert that every template variable returns at least one value, fail the build if any variable comes back empty. The check takes 5-10 seconds per dashboard and catches a class of bug that is otherwise invisible until incident time. PhonePe's platform team published an internal post-mortem in 2025 attributing 11 minutes of an SRE's diagnostic time during a UPI incident to "the tenant dropdown is empty, so I assumed the service was deregistered" — which it was not; the metric had been renamed in a refactor that was merged at 17:00 IST that evening.
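
A sketch of that variable check against a staging Prometheus — the label_values(...) parsing, the handling of the two query shapes Grafana has used for variables, and the decision to skip non-query variables are assumptions:

# check_variables.py — illustrative variable-rot check: every label_values(...)
# template variable must return at least one value from staging Prometheus.
import json
import re
import sys
import requests

LABEL_VALUES = re.compile(r"label_values\((?P<selector>.+),\s*(?P<label>\w+)\)")

if __name__ == "__main__":
    prom = sys.argv[1]  # staging Prometheus base URL
    failed = False
    for path in sys.argv[2:]:
        dashboard = json.load(open(path))
        for var in dashboard.get("templating", {}).get("list", []):
            query = var.get("query", "")
            if isinstance(query, dict):  # newer Grafana stores {"query": "..."}
                query = query.get("query", "")
            m = LABEL_VALUES.match(query)
            if not m:
                continue  # constant / custom / interval variables are not checked
            r = requests.get(f"{prom}/api/v1/label/{m['label']}/values",
                             params={"match[]": m["selector"]})
            if not r.json().get("data"):
                failed = True
                print(f"{path}: variable ${var['name']} returns no values for "
                      f"{query!r} on staging")
    sys.exit(1 if failed else 0)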

A sixth failure mode worth flagging because it confuses new adopters: the "preview render disagrees with production" trap. The CI pipeline renders the dashboard against a staging Prometheus that has different metric cardinality, different retention, and different label values than production. A panel that renders cleanly in CI ("here are the 5 active tenants in staging") can render badly in production ("here are 12,000 active tenants, the legend overflows the panel, the page hangs in the browser"). The fix is to either (a) seed staging with production-shaped synthetic data so the render reflects production reality, or (b) add a tier-1-only post-deploy smoke test that opens the dashboard in production via Grafana's image-renderer and validates that the rendered HTML page is under, say, 5 MB (a proxy for "the dashboard does not blow up the browser"). Most teams pick (b) because seeding staging realistically is a months-long effort. Cleartrip's platform team published their dashboard-render-budget script in 2025; it caught two incidents in its first quarter, both involving a legend.calcs configuration that summed across millions of series at render time.

Beyond the named failure modes, the meta-failure to watch for is complacency: once dashboards-as-code is in place and the pipeline has been working for a year, teams stop reviewing dashboard PRs as carefully as they review code PRs. The CI passes, the visual regression is green, the reviewer LGTMs in 30 seconds. Six months later somebody realises that a PR added a panel that always shows "No data" because the metric was misspelled, and nobody noticed because nobody was looking. The discipline of treating dashboard PRs as first-class engineering changes — with a written description of what the dashboard is for, what tier it belongs to, and what runbook references it — is what keeps the system honest. Razorpay's platform team enforces this with a pull-request template that has three required sections: "what does this dashboard answer", "what tier is it", "what runbook links to it". Empty answers fail CI.

Common confusions

  • "Dashboards-as-code is the same as exporting JSON to git." Exporting JSON is a one-time copy that drifts from the live Grafana the moment somebody edits in the UI. Dashboards-as-code is the full loop: code → CI → API → live state, with drift detection closing the loop. A repo of exported JSON files without the deploy pipeline is just a backup, not a source of truth.
  • "Terraform Grafana provider is dashboards-as-code." It is one implementation of dashboards-as-code, suited to teams that already run Terraform. It is not the only one, and the choice is not "Terraform vs not". The choice is "which generator fits your team's existing toolchain". Grafonnet and Python templating are equally valid; the discipline is generator-agnostic.
  • "The dashboard JSON is too verbose to live in git." The exported JSON is verbose because the UI exporter dumps every key, including UI-state defaults. Generated JSON (via Grafonnet, Python, etc.) is concise because you only emit the fields that matter. A 2000-line UI-exported JSON is typically ~300-500 lines as Grafonnet input.
  • "Code review can substitute for testing the dashboard." Code review catches "this panel uses a deprecated panel type"; it cannot catch "this PromQL query returns no data because the metric does not exist on this service". Both are needed: review for taste, CI for correctness. Stages 3 (query validation) and 4 (click-path validation) are non-negotiable.
  • "Dashboard-as-code requires a giant platform-engineering investment." The minimum viable dashboard-as-code setup is ~50 lines of Python (the factory above) and a CI pipeline of ~10 lines of YAML that runs python3 dashboard_factory.py and posts to Grafana. Teams that wait for "a proper platform-engineering effort" to start dashboard-as-code never start. Begin with one tier-1 dashboard, prove the pattern, then expand.
  • "Once we have dashboards-as-code, we never need to touch the UI." The UI is still where you prototype a dashboard — drag panels around, tune queries, iterate quickly. The discipline is that prototyping happens in the Sandbox folder (UI-edited), and once the dashboard is good, you export the JSON, generate equivalent code, delete the UI version, and apply via CI. The UI is a tool; production dashboards are code.

Going deeper

Grafonnet — the Jsonnet library that most large platform teams pick

Grafonnet (grafana/grafonnet) is a Jsonnet library that provides typed primitives for every Grafana panel type, query datasource, and dashboard structural element. The library is auto-generated from Grafana's CUE schema, which means new Grafana features show up in Grafonnet within weeks of release. The typical Grafonnet setup is a vendor/ directory pinned to a Grafonnet version, a per-team dashboards/ directory of .libsonnet files, and a Makefile that runs jsonnet -J vendor dashboards/payments.libsonnet > out/payments.json for each dashboard. The output JSON is then applied via the Grafana API. Grafonnet's strength is type-safety — g.panel.timeseries.queryOptions.withTargets rejects a target that is missing a refId, where the equivalent Python or raw JSON would silently produce an invalid dashboard. The weakness is the Jsonnet learning curve; new engineers need 1-2 weeks to be productive, and the error messages ("field 'gridPos' not found" with no line number) are notoriously bad. PhonePe, Razorpay, Hotstar, and Flipkart have all standardised on Grafonnet for their tier-1 dashboards based on their public engineering posts.

The Grafana Operator and the Kubernetes-native approach

For teams running Grafana inside Kubernetes (Grafana Helm chart, Grafana Operator), there is a further path: a GrafanaDashboard custom resource (CRD) that wraps the JSON model. You write apiVersion: grafana.integreatly.org/v1beta1; kind: GrafanaDashboard; spec: { json: |- ... } and apply via kubectl. The Grafana operator reconciles the CR to the live Grafana state. The win: dashboards become first-class Kubernetes objects, lifecycle-bound to namespaces, GC'd when the namespace is deleted. The cost: the JSON still has to come from somewhere (typically generated by Jsonnet or Python, then embedded in the CR), and the operator adds latency (typically 10-60 seconds between kubectl apply and the dashboard being live). The operator is the right pattern when your team already runs everything as Kubernetes resources via ArgoCD or Flux; otherwise it is an extra abstraction layer.

Versioning and rollback — the git checkout that saved the on-call

A dashboard-as-code setup makes rollback trivial: git revert <bad-commit> followed by a CI deploy restores the previous dashboard. This is the operational win that justifies the entire pipeline. Compare with the database-only model where rollback requires either Grafana's History feature (often disabled), a Postgres point-in-time restore (operationally expensive), or rebuilding the dashboard from a screenshot. A documented case from Cleartrip in 2025: a senior SRE accidentally merged a PR that removed the booking-funnel panel from the tier-1 dashboard; the rollback was a single git revert followed by a CI rerun, total time-to-restore 8 minutes. The same incident in their pre-as-code era (2022) took 4 hours, because the dashboard had to be rebuilt from screenshots a junior engineer had luckily saved.

Multi-environment parity — staging dashboard equals production dashboard

A subtle but critical win of dashboards-as-code: staging and production dashboards can be guaranteed identical (modulo datasource UIDs) because both are generated from the same code with different (env, datasource_uid) parameters. In a database-only model, staging dashboards drift from production over time — engineers hand-tweak staging during debugging and forget to mirror to production, or vice versa. The drift produces "the staging panel showed it but the production panel didn't" debugging stories during incidents, where the on-call cannot trust their dashboard because they have not seen this particular panel-shape in production before. Dashboard-as-code with a single source eliminates this class of bug at the architectural level.

Tiered folders, drift detection, and Scenes — the operational stack around the code

The cleanest dashboard-as-code adoption pattern is tiered. Production tier-1 dashboards (the ones PagerDuty deep-links to) live in a Production/ Grafana folder with UI-edit permissions removed; only the CI service account can write. Tier-2 dashboards (per-team operational, not in any runbook) live in a Team/ folder where the team's engineers can edit via UI but a nightly git diff runs to detect drift and Slack the team. Tier-3 dashboards (personal exploration, ad-hoc capacity planning, the SRE's "is this query right?" checks) live in Sandbox/, fully UI-editable, no drift checks, with a TTL — dashboards untouched for 60 days are auto-archived. The tiering matches the operational criticality of the dashboard, not the engineering effort to make it. Hotstar's IPL platform team documented this tiering as part of their 2025 SREcon Mumbai talk and reported that ~85% of their dashboard count lives in Sandbox and never enters the CI pipeline.

Looking forward, Grafana 10+ ships with Scenes, a runtime framework where dashboards are constructed from React-like components rather than static JSON; the dashboard JSON model is gradually being supplemented (eventually, replaced) by a more dynamic representation. As of 2026, dashboard-as-code via Scenes is still nascent — the API is unstable, and Grafonnet/Terraform have not yet caught up — so most teams continue to generate the static JSON model and design their abstractions to be Scenes-compatible (parameterised, function-driven, testable) so the eventual migration is a re-implementation of the renderer rather than a rewrite of the dashboard library.

Concurrent-CI races are the third operational concern: two CI runs that race to apply the same dashboard produce a non-deterministic final state. The mitigation is per-dashboard locking via a redis key or a Grafana annotation, which adds 100-300ms of latency per dashboard and eliminates the race; most teams under 50 dashboards skip this and rely on serial CI, while teams with 200+ dashboards find the race surfaces 1-2 times a quarter and the lock pays for itself.
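
A sketch of the Sandbox TTL sweep described above — the folder name, archive directory, and 60-day cutoff follow the tiering, the endpoints are Grafana's standard search, get-by-uid, and delete-by-uid APIs, and the export-before-delete step is a precaution, not a requirement:

# archive_sandbox.py — illustrative Sandbox TTL sweep: export, then delete,
# Sandbox dashboards whose last update is older than 60 days.
import datetime as dt
import json
import os
import requests

GRAFANA = os.environ.get("GRAFANA_URL", "http://localhost:3000")
HEADERS = {"Authorization": f"Bearer {os.environ['GRAFANA_TOKEN']}"}
TTL = dt.timedelta(days=60)

if __name__ == "__main__":
    os.makedirs("archive", exist_ok=True)
    now = dt.datetime.now(dt.timezone.utc)
    hits = requests.get(f"{GRAFANA}/api/search", params={"type": "dash-db"},
                        headers=HEADERS).json()
    for hit in hits:
        if hit.get("folderTitle") != "Sandbox":
            continue
        full = requests.get(f"{GRAFANA}/api/dashboards/uid/{hit['uid']}",
                            headers=HEADERS).json()
        updated = dt.datetime.fromisoformat(full["meta"]["updated"])  # Python 3.11+ accepts the trailing Z
        if now - updated > TTL:
            with open(f"archive/{hit['uid']}.json", "w") as f:
                json.dump(full["dashboard"], f, indent=2, sort_keys=True)
            requests.delete(f"{GRAFANA}/api/dashboards/uid/{hit['uid']}",
                            headers=HEADERS).raise_for_status()
            print(f"archived and deleted {hit['uid']} ({hit['title']})")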

Where this leads next

The CI pipeline this chapter outlines feeds directly into /wiki/wall-dashboards-are-where-observability-touches-leadership — the wall chapter that closes Part 12 by treating the dashboard as a leadership artefact, where the cost of broken dashboards is measured not in MTTR but in trust between engineering and the executive team that depends on the dashboard for situational awareness.

Within Part 13 (OpenTelemetry internals), the chapter on resource attributes and the OTLP semantic conventions covers the canonical metric and span attribute names that dashboards-as-code generators should reference — service.name, http.method, db.statement — so that a single Python factory function can generate a dashboard for any OTel-instrumented service without per-service customisation. In Part 11 (alerting), the chapter on alerts-as-code is the natural sibling: the same git → CI → API → live-state pipeline applies to PrometheusRule manifests and Grafana alert rules, and most teams ship dashboard-as-code and alert-as-code together as a single platform initiative.

Cross-curriculum, this chapter cross-links to the data-engineering material on /wiki/lineage-as-the-foundation-of-data-trust — dashboards-as-code is, structurally, the same lineage problem as data-pipeline-as-code: a versioned graph of derived artefacts whose source-of-truth lives in git, applied to a runtime by an idempotent deploy pipeline.

# Reproduce this on your laptop
docker run -d -p 3000:3000 grafana/grafana
docker run -d -p 9090:9090 prom/prometheus
# In Grafana (http://localhost:3000), add Prometheus as a data source and note its UID — that is the argument passed below
# Create a service-account token in Grafana UI: Administration → Service accounts → New token
export GRAFANA_TOKEN=glsa_xxx
python3 -m venv .venv && source .venv/bin/activate
pip install requests
python3 dashboard_factory.py PBFA97CFB590B2093  # your prom datasource uid
# Then visit http://localhost:3000/d/red-payments-api to see the dashboard

References