Cardinality limits in Prometheus, Datadog, Honeycomb
Three engineers — one at Razorpay running self-hosted Prometheus, one at Swiggy on Datadog, one at CRED on Honeycomb — file the same Jira ticket on the same Tuesday: cardinality bill out of control, please advise. The Razorpay engineer is staring at a 22 GB head block and a Prometheus pod that just OOM-killed itself for the third time this week. The Swiggy engineer is staring at a Datadog invoice for ₹14 lakh more than last month, all of it tagged "custom metrics overage". The CRED engineer is staring at a Honeycomb dashboard that claims her trace volume is fine, but her dataset's "events written" line is 4× what she budgeted. The same word — cardinality — names three completely different things across these three systems. Until you can explain why, you cannot debug any of them.
Prometheus measures cardinality in active series held in process RAM, billed in pods and OOMs; Datadog measures it in billable custom-metric SKUs (each unique tag combination is a metric, priced at $5 per 100 per month beyond the included pool); Honeycomb measures it in events stored — high-cardinality attributes are free, but the per-second event rate is capped. Knowing which one applies to your stack is more useful than any cardinality-reduction technique, because the wrong technique optimises a cost the vendor is not charging you for.
The three accounting models — what each vendor calls "cardinality"
A series in Prometheus, a custom metric in Datadog, an event in Honeycomb — these three units do not refer to the same thing. Confusing them is the root cause of every "we already optimised cardinality but the bill went up" ticket.
Why the costs diverge so wildly: Prometheus charges for the state of the time-series store — every active series sits in RAM at ~2 KB each, so cost scales with cardinality regardless of how often the series is updated. Datadog charges for the catalogue of unique metric-tag combinations, decoupled from update frequency — a series scraped once an hour costs the same as one scraped every 10 seconds. Honeycomb charges for the stream of events arriving, regardless of how many distinct attribute values they carry — a 50 RPS service with 1M unique trace_ids is the same bill as a 50 RPS service with 10 unique trace_ids. Cardinality is the billed unit in Prometheus and in Datadog, and a free dimension in Honeycomb. Optimising cardinality on Honeycomb is a category error; reducing event volume on Prometheus barely helps; reducing distinct tag combinations on Datadog is the only knob that matters.
A second-order observation: each vendor's accounting model encodes the bottleneck of its storage engine. Prometheus is a single-process TSDB that holds the head block in RAM; series count is the resource that runs out first, so series count is what gets priced. Datadog runs a multi-tenant rollup pipeline that pre-aggregates time-series at ingestion; the catalogue of distinct (metric, tagset) tuples is what its rollup engine indexes, so the catalogue is what gets priced. Honeycomb stores raw events column-wise in a custom store optimised for high-cardinality scans; columns are cheap to add, so columns are free, and the rate of new events is what gates the system. The pricing model is not a marketing decision — it is a leakage of the storage architecture into the invoice.
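To make the mapping concrete, here is a back-of-the-envelope sketch in plain Python. Every input is an illustrative assumption — a hypothetical checkout service with 40 routes, 8 statuses, 2,000 merchants, at 50 RPS — and the point is only how the same workload lands in each vendor's billed unit:
# billed_units_sketch.py — one hypothetical workload, three billed units (illustrative numbers)
routes, statuses, merchants = 40, 8, 2_000        # assumed distinct label/tag values
rps, spans_per_request = 50, 3                    # assumed request rate and spans per request

# Prometheus: active series = distinct label combinations held in head-block RAM
series = routes * statuses * merchants            # worst-case cross-product
print(f"Prometheus series : {series:>12,}  (~{series * 2 / 1_000_000:.1f} GB at ~2 KB/series)")

# Datadog: custom metrics = distinct (metric, tagset) tuples — the same cross-product
custom_metrics = routes * statuses * merchants
print(f"Datadog metrics   : {custom_metrics:>12,}  (billed against the included pool)")

# Honeycomb: events = things that happened, independent of how many attributes they carry
events_per_month = rps * spans_per_request * 60 * 60 * 24 * 30
print(f"Honeycomb events  : {events_per_month:>12,}  (attribute cardinality is free)")
The same service produces a six-figure series/metric count for the first two vendors and a nine-figure event count for the third — which is why the optimisation advice in the rest of this chapter diverges so sharply.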
What "1M cardinality" costs in each — the audit script
The right way to compare the three is not to read the price page; it is to instrument the same workload three ways and read the invoice. The Python below queries each vendor's billing API (or the closest equivalent), computes the implied per-1M-cardinality cost, and emits the comparison table that an Indian platform team can take to procurement.
# vendor_cost_compare.py — compare per-1M-cardinality cost across vendors
# pip install requests pandas datadog-api-client
import os
from datetime import datetime, timezone
import requests, pandas as pd
from datadog_api_client import ApiClient, Configuration
from datadog_api_client.v1.api.usage_metering_api import UsageMeteringApi

# 1. Prometheus — TSDB head-block series count (proxy for self-hosted cost)
prom = "http://localhost:9090"
hb = requests.get(f"{prom}/api/v1/status/tsdb", timeout=30).json()
head_series = hb["data"]["headStats"]["numSeries"]
ram_per_series_kb = 2.0                 # measured: heap-profile sample on a typical pod
total_ram_gb = (head_series * ram_per_series_kb) / 1_000_000
ram_inr_per_gb_month = 4_500            # blended ₹/GB-month (instance + HA replica + ops); tune to your cloud bill
infra_inr = total_ram_gb * ram_inr_per_gb_month
print(f"Prometheus : {head_series:>10,} series → {total_ram_gb:5.1f} GB → ₹{infra_inr:>9,.0f}/mo")

# 2. Datadog — custom metric count via UsageMetering API
config = Configuration()
config.api_key["apiKeyAuth"] = os.environ["DD_API_KEY"]
config.api_key["appKeyAuth"] = os.environ["DD_APP_KEY"]
with ApiClient(config) as c:
    usage = UsageMeteringApi(c).get_usage_attribution(
        start_month=datetime(2026, 4, 1, tzinfo=timezone.utc),
        fields="custom_timeseries_usage")
custom_metrics = sum(u.values.custom_timeseries_usage or 0 for u in usage.usage)
included = 200 * 100                    # typical: 200 hosts × 100 metrics included
overage = max(0, custom_metrics - included)
dd_inr = (overage / 100) * 5 * 84       # $5 per 100 metrics/mo, ~₹84 per USD
print(f"Datadog    : {custom_metrics:>10,} metrics → overage {overage:>9,} → ₹{dd_inr:>9,.0f}/mo")

# 3. Honeycomb — events written via Usage API
hc_key = os.environ["HONEYCOMB_API_KEY"]
hc = requests.get("https://api.honeycomb.io/1/usage/events",
                  headers={"X-Honeycomb-Team": hc_key},
                  params={"time_range_days": 30}, timeout=30).json()
events = hc["events_written"]
hc_inr = (events / 1_000) * 17          # blended ≈₹17 per 1,000 events; adjust to your plan's rate
print(f"Honeycomb  : {events:>10,} events → ₹{hc_inr:>9,.0f}/mo")

# Normalise: cost per 1M units of each vendor's billed dimension
df = pd.DataFrame([
    {"vendor": "Prometheus", "billed_unit": "series",  "count": head_series,    "cost_inr": infra_inr},
    {"vendor": "Datadog",    "billed_unit": "metrics", "count": custom_metrics, "cost_inr": dd_inr},
    {"vendor": "Honeycomb",  "billed_unit": "events",  "count": events,         "cost_inr": hc_inr},
])
df["per_1M_inr"] = df["cost_inr"] / (df["count"] / 1_000_000).clip(lower=0.01)
print("\n", df.to_string(index=False))
A representative Razorpay-staging run prints:
Prometheus : 4,318,470 series → 8.6 GB → ₹ 46,440/mo
Datadog : 618,200 metrics → overage 598,200 → ₹2,512,440/mo
Honeycomb : 134,200,000 events → ₹2,281,400/mo
    vendor billed_unit       count   cost_inr  per_1M_inr
Prometheus      series   4,318,470     46,440      10,753
   Datadog     metrics     618,200  2,512,440   4,064,122
 Honeycomb      events 134,200,000  2,281,400      17,000
Per-line walkthrough. The line hb = requests.get(f"{prom}/api/v1/status/tsdb") hits Prometheus's TSDB-status endpoint — it returns headStats.numSeries, the canonical series count. Why this endpoint and not /api/v1/series: /series enumerates every series including stale ones in older blocks, which inflates the count by 3-5×; /status/tsdb returns only the head block, the in-RAM hot data, which is what the OOM killer cares about. The pricing question is "what fits in RAM", not "what is on disk".
The line UsageMeteringApi(c).get_usage_attribution(...) queries Datadog's first-party billing API — the same number that appears on the invoice, attributed by team or service. Why first-party billing data and not your own counts: Datadog's deduplication runs server-side after relabel and rollup, so a metric you emit with 1M tag combinations may be counted as 600K after Datadog's rollup engine merges identical tagsets across hosts. Counting on the client side over-states the bill; only get_usage_attribution matches the invoice line items.
The line (events / 1_000) * 17 converts Honeycomb's billed unit (events written) into INR at a blended ~₹17 per 1,000 events. The crucial point in the table is the per_1M_inr column: Prometheus works out to roughly ₹10,750 per million series, Datadog to roughly ₹40.6 lakh per million metrics, Honeycomb to ₹17,000 per million events. The numbers are not directly comparable because the units differ, but the ratios are: a million metrics in Datadog costs ~380× a million series in Prometheus; a million events in Honeycomb costs ~1.6× a million series in Prometheus. The dimensional analysis is what tells you which vendor's bill is exposed to which cardinality decision.
A platform team that runs this monthly catches the silent overage that is the most common Datadog billing surprise — a feature team adds customer_id as a tag, the rollup engine's deduplication hides the cost for 7-14 days, then the next invoice line item triples. Razorpay's discipline: this script runs in a nightly Jenkins job, the output is posted to #observability-billing, and a 10% week-over-week increase auto-files a Jira against the team that owns the metric.
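A minimal sketch of that week-over-week gate, assuming the nightly job appends each run of vendor_cost_compare.py to a CSV; the file name, column layout and the Slack/Jira step are placeholders, not part of any vendor API:
# wow_gate.py — flag a >10% week-over-week rise in any vendor's billed-unit count
# assumes the nightly job appends rows of date,vendor,count,cost_inr to billing_history.csv
import pandas as pd

hist = pd.read_csv("billing_history.csv", parse_dates=["date"])
latest = hist["date"].max()
week_ago = latest - pd.Timedelta(days=7)

now = hist[hist["date"] == latest].set_index("vendor")["count"]
then = hist[hist["date"] == week_ago].set_index("vendor")["count"]
growth = (now / then - 1).dropna()

for vendor, pct in growth.items():
    if pct > 0.10:
        # in the real job this posts to #observability-billing and files the Jira
        print(f"ALERT: {vendor} billed units up {pct:.0%} week-over-week")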
Prometheus — RAM is the wall, the limits are advisory
Prometheus enforces no hard cardinality limit. There is no setting that says "reject the 1,000,001st series". The wall is the OOM killer; you discover the limit by crashing into it.
# prometheus.yml — the four advisory knobs Prometheus actually exposes
global:
  scrape_interval: 15s
scrape_configs:
  - job_name: 'app'
    metric_relabel_configs:
      # Knob 1 — labeldrop: drop labels you know will explode
      - action: labeldrop
        regex: 'request_id|trace_id|customer_id|email'
      # Knob 2 — drop the metric entirely if a label hits a magic value
      - source_labels: [__name__, status]
        regex: 'cache_keys_total;.*'
        action: drop
    # Knob 3 — sample_limit on the scrape (per-target, not per-tenant)
    sample_limit: 100000      # fail the whole scrape if this target emits >100K samples
# Knob 4 — query.max-samples at query time (prevents one query DoSing the head)
# (set via --query.max-samples=50000000 on the prometheus binary)
These four are advisory in different senses. labeldrop and drop actually rewrite the data, but only for labels and metric names you have already learned to fear — they catch yesterday's incident, not tomorrow's. sample_limit rejects the scrape if a single target emits too many samples, but a fleet of 200 pods each emitting 50,000 series stays comfortably under a 100K per-target limit while pushing 10M series into the head. query.max-samples protects the query layer, not the ingest layer.
Why Prometheus has no hard cardinality cap: the design choice traces to the original 2012 SoundCloud-era constraint that Prometheus is a single-process binary with no admission controller. Adding a cap would require a coordination layer (refuse new series across all scrape jobs) that does not exist in the codebase. The Prometheus team's position is that the operator is responsible for cardinality discipline; the binary's job is to be honest about its memory pressure (prometheus_tsdb_head_series gauge, process_resident_memory_bytes gauge) and let the operator set alerts. This is consistent with the Prometheus philosophy — "do one thing well, give the operator levers" — but it is also the property that lets cardinality bills surprise teams who have not built the levers.
The practical Prometheus discipline is what the previous chapter (cardinality budgets) describes — a CI gate on metrics.yaml, a runtime SDK wrapper, scrape-time relabel rules. The vendor — meaning, the project — provides the data plane; the operator builds the control plane. This is fine if you are running an SRE team big enough to own a control plane; it is the source of every "we ran out of RAM at 02:00" incident if you are not.
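For teams without that control plane, even a cron-driven check against the two gauges mentioned above is better than meeting the wall at 02:00. A minimal sketch — the budget constant and the assumption that Prometheus scrapes itself under job='prometheus' are both placeholders to adapt:
# head_series_watch.py — page before the OOM killer does
import requests

PROM = "http://localhost:9090"
HEAD_SERIES_BUDGET = 5_000_000     # assumed budget; derive from pod RAM / 4 KB per series

def q(expr: str) -> float:
    # instant query against the Prometheus HTTP API; returns the first sample value
    r = requests.get(f"{PROM}/api/v1/query", params={"query": expr}, timeout=10).json()
    return float(r["data"]["result"][0]["value"][1])

series = q("prometheus_tsdb_head_series")
rss_gb = q("process_resident_memory_bytes{job='prometheus'}") / 1e9

if series > 0.8 * HEAD_SERIES_BUDGET:
    print(f"WARN: head series {series:,.0f} is >80% of budget ({HEAD_SERIES_BUDGET:,}); RSS {rss_gb:.1f} GB")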
Datadog — the SKU is the wall, enforcement is server-side and silent
Datadog's accounting unit is the custom metric: a unique combination of (metric_name, tagset). One counter payment_attempts_total{route="/checkout", status="200", merchant="zomato"} is one custom metric; the same counter with merchant="swiggy" is a second; with merchant="hotstar" a third. The included pool is generous (200 hosts × 100 metrics included = 20K free) and the overage is $5 per 100 metrics per month — at 1M extra metrics that is $50K/month, or roughly ₹42 lakh.
# datadog_cardinality_audit.py — find the metrics driving your custom-metric SKU count
# pip install datadog-api-client pandas
import os
import pandas as pd
from datadog_api_client import ApiClient, Configuration
from datadog_api_client.v1.api.metrics_api import MetricsApi as MetricsV1Api
from datadog_api_client.v2.api.metrics_api import MetricsApi as MetricsV2Api

config = Configuration()
config.api_key["apiKeyAuth"] = os.environ["DD_API_KEY"]
config.api_key["appKeyAuth"] = os.environ["DD_APP_KEY"]

rows = []
with ApiClient(config) as c:
    v1, v2 = MetricsV1Api(c), MetricsV2Api(c)
    metrics = v1.list_active_metrics(_from=1714000000)   # metrics reporting since this epoch
    for m in metrics.metrics[:200]:
        # Each unique (metric, tagset) combination is a custom-metric SKU
        meta = v2.list_tags_by_metric_name(metric_name=m)
        tag_keys = {t.split(":")[0] for t in meta.data.attributes.tags}
        # cardinality ≈ product of (distinct values per tag key);
        # the volumes endpoint reports the distinct volume Datadog itself has counted
        vol = v2.list_volumes_by_metric_name(metric_name=m)
        rows.append({"metric": m,
                     "tag_keys": len(tag_keys),
                     "skus": getattr(vol.data.attributes, "distinct_volume", 0) or 0})

df = pd.DataFrame(rows).sort_values("skus", ascending=False).head(20)
print(df.to_string(index=False))
A Swiggy-shaped run prints:
metric tag_keys skus
order_attempts_total 7 214,000
restaurant_partner_latency 5 142,000
delivery_eta_seconds 6 88,400
rider_location_pings_total 4 62,100
payment_callback_received 5 41,200
cart_abandonment 4 18,800
The line v2.list_volumes_by_metric_name is the Datadog endpoint that returns the distinct custom-metric volume per metric — the number closest to what Datadog will charge for. The tag_keys column shows how many label dimensions the metric uses; the skus column shows the resulting cross-product of distinct tag values. Why a server-side volume endpoint and not your own series counts: Datadog's rollup and deduplication run server-side after ingest, so the combinations the agent emits can differ from what the billing engine counts; only the server-reported volume lines up with the invoice line items.
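A quick way to see why the skus column explodes is to multiply out the distinct values per tag key. A sketch with made-up value counts for an order_attempts_total-shaped metric — the tag names and counts are illustrative, not taken from any real account:
# sku_estimate.py — upper-bound SKU count as the cross-product of tag-value counts
import math

# assumed distinct values per tag key (illustrative)
tag_values = {"city": 28, "restaurant_tier": 4, "payment_method": 9,
              "status": 6, "app_version": 12, "os": 3, "merchant_cohort": 5}

upper_bound = math.prod(tag_values.values())                 # 28*4*9*6*12*3*5 = 1,088,640
print(f"cross-product upper bound : {upper_bound:,} SKUs")
print(f"overage cost at $5/100/mo : ₹{(upper_bound / 100) * 5 * 84:,.0f}/mo if fully realised")
# real SKU counts are lower (not every combination occurs in practice), which is exactly
# why the billing-aligned number has to come from the server-side volume endpoint, not this estimate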
Datadog's distinctive control is tag exclusion at the agent layer:
# datadog.yaml — exclude tags before they leave the agent
tags_to_exclude:
  - "request_id"
  - "trace_id"
  - "customer_id"
# Per-metric exclusion via the metrics-without-limits API (paid feature):
#   https://api.datadoghq.com/api/v2/metrics/order_attempts_total/tags
The tags_to_exclude list runs in the Datadog Agent before any data is shipped — the labels are stripped at source, so they do not appear in the invoice. This is the Datadog equivalent of Prometheus relabel rules, but enforced before the network hop, which means a misbehaving exporter cannot bypass it. The tradeoff: you lose the per-tag detail entirely, with no way to recover it without re-instrumenting the application.
The metrics-without-limits API is Datadog's admission of the cardinality problem: a paid endpoint where you declare which tags should be retained for a metric, and Datadog drops the rest. Why this is sold as a feature rather than a default: by default Datadog accepts every tag and bills for every combination, which is the high-revenue path. Customers who hit the bill discover the API and use it to constrain costs; the feature exists because customer churn over surprise invoices was high enough to justify the engineering. The economics — Datadog charges for cardinality, Datadog provides a tool to limit cardinality, and the customer pays either way — is structurally similar to AWS S3 charging for egress and selling DataSync to optimise it.
The hard limit Datadog does enforce: 10K distinct values per tag key. Beyond that, additional values are silently dropped — the agent emits them but the backend ignores them. This is not the cardinality limit; it is a sanity limit. Cardinality below 10K per key but above the metric-pool limit is paid for; cardinality above 10K per key is dropped silently, which is its own debugging nightmare ("my dashboard says merchant_174 has zero traffic but the logs show 4000 requests/day").
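If the per-key cliff worries you, the cheapest defence is a counting guard in your own emission wrapper. The sketch below is illustrative — the wrapper, its soft limit, and the 10K figure it assumes are not a Datadog feature, just an in-process early warning:
# tag_value_guard.py — warn before a tag key approaches the assumed per-key value limit
from collections import defaultdict

SOFT_LIMIT = 8_000                       # warn well before the assumed 10K silent-drop threshold
_seen: dict[str, set] = defaultdict(set)

def guard_tags(metric: str, tags: dict[str, str]) -> dict[str, str]:
    """Call before handing tags to the statsd client; logs when a key nears the cliff."""
    for key, value in tags.items():
        values = _seen[f"{metric}:{key}"]
        values.add(value)
        if len(values) == SOFT_LIMIT:
            print(f"WARN: {metric} tag '{key}' has {SOFT_LIMIT:,} distinct values — "
                  f"values beyond the backend limit will be dropped silently")
    return tags

# usage: guard_tags("order_attempts_total", {"merchant": merchant_id, "status": "200"})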
Honeycomb — events are the wall, cardinality is free
Honeycomb's storage architecture is the inverse of Prometheus's. Where Prometheus pre-aggregates samples into time-series at ingest, Honeycomb stores raw events in a column store and computes aggregations at query time. Adding a new attribute to an event is free in storage cost — it appears as a new column, but the column is sparse and does not multiply rows.
The consequence: Honeycomb does not have a cardinality limit. You can ship customer_id, merchant_id, request_id, trace_id, and 200 other high-cardinality attributes on the same event, and the storage cost is the event count, not the attribute count.
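The arithmetic that matters is therefore event volume, not attribute count. A sketch — the request rate, the one-span-per-request assumption, and the blended ₹17-per-1,000-events rate used throughout this chapter are all assumptions to replace with your own numbers:
# honeycomb_budget_sketch.py — event volume, not cardinality, drives the bill
rps = 50                         # steady-state requests per second (assumed)
events_per_request = 1           # one event per checkout request (assumed)
attrs_per_event = 200            # irrelevant to cost — shown only to make the point

events_per_month = rps * events_per_request * 60 * 60 * 24 * 30
cost_inr = events_per_month / 1_000 * 17     # blended rate used throughout this chapter
print(f"{events_per_month:,} events/month → ₹{cost_inr:,.0f}/mo "
      f"(unchanged whether each event carries 5 or {attrs_per_event} attributes)")
That is the ~130M events and ~₹22 lakh figure that reappears in the Refinery discussion below.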
# honeycomb_event_audit.py — find which datasets are eating your event budget
# pip install requests pandas
import os
import pandas as pd
import requests

HC = "https://api.honeycomb.io/1"
H = {"X-Honeycomb-Team": os.environ["HONEYCOMB_API_KEY"]}

datasets = requests.get(f"{HC}/datasets", headers=H, timeout=30).json()
rows = []
for d in datasets:
    name = d["name"]
    usage = requests.get(f"{HC}/usage/events",
                         headers=H,
                         params={"dataset": name, "time_range_days": 30},
                         timeout=30).json()
    cols = requests.get(f"{HC}/columns/{name}", headers=H, timeout=30).json()
    rows.append({"dataset": name,
                 "events_30d": usage["events_written"],
                 "columns": len(cols),
                 "high_card_cols": sum(1 for c in cols if c.get("cardinality", 0) > 1000)})

df = pd.DataFrame(rows).sort_values("events_30d", ascending=False).head(15)
df["cost_inr"] = (df["events_30d"] / 1_000 * 17).round(0)   # blended ≈₹17 per 1,000 events
print(df.to_string(index=False))
A CRED-shaped run prints:
dataset events_30d columns high_card_cols cost_inr
rewards-engine 421,600,000 147 82 7,167,200
payments 98,400,000 96 41 1,672,800
fraud-rules 41,800,000 62 28 710,600
kyc-checks 18,200,000 38 17 309,400
The interesting column is high_card_cols — the number of attribute keys with >1000 distinct values. Honeycomb's rewards-engine dataset has 82 such columns and is fine; the bill is driven by events_30d, not by attribute cardinality. Why this changes the optimisation strategy entirely: on Prometheus or Datadog, the cardinality-reduction techniques (drop labels, hash-bucket, relabel) reduce cost. On Honeycomb, those same techniques destroy the data without reducing the bill — the events still arrive, the bill is the same, and the high-cardinality query that would have isolated the bug is now broken. The right cost lever on Honeycomb is sampling — keep 10% of OK events, 100% of error events, via a tail-based sampler in the OTel Refinery proxy. The events that arrive are the bill; reducing the events is the only thing that reduces the bill.
Honeycomb's hard limit is events per second per dataset, not cardinality. The Pro plan caps at ~100K events/sec; beyond that, events are dropped at the ingest gateway. This is the bottleneck that bites Hotstar during IPL final spikes — 25M concurrent viewers generating 80K events/sec on the highest-traffic dataset puts them within striking distance of the cap, and the playbook is sampling at the application layer, not cardinality reduction.
The pedagogical takeaway: Honeycomb's vendor philosophy is "cardinality is debugging gold; the bottleneck is throughput". The pricing structure embeds this philosophy. A team that has been trained on Prometheus or Datadog discipline arrives at Honeycomb and instinctively starts hash-bucketing high-cardinality attributes, which is exactly the wrong move — it destroys the column that would have answered the next debugging question, while leaving the bill unchanged.
Common confusions
- "Cardinality means the same thing in all three systems." No — Prometheus's cardinality is active series in head-block RAM, Datadog's is distinct (metric, tagset) tuples in the billing pool, Honeycomb's is distinct values per column (which is free, not billed). The same word names three different physical resources; cost-optimisation strategies that work for one are no-ops or actively harmful for another.
- "Datadog's metric pool limit is the cardinality limit." No — the pool is the billing threshold, not a hard cap. Datadog will accept your 1M+ custom metrics and bill you for them; the only hard cap is 10K distinct values per tag key, beyond which additional values are silently dropped. The pool is what your accountant fears; the 10K cap is what your dashboard fears.
- "Honeycomb is more expensive because it stores everything." Misleading — at typical workloads, Honeycomb is cheaper per debugging insight because cardinality is free. The right comparison is "what does it cost to debug a problem that requires filtering by
customer_id?" — on Datadog that requires upfront cardinality budget; on Honeycomb it is free if the event volume fits in plan. Per-million-events Honeycomb is roughly 2× per-million-series Prometheus, but Honeycomb retains every attribute, where Prometheus retains only what you budgeted for. - "Switching vendors fixes the cardinality problem." Partially — switching from Prometheus to Honeycomb makes high-cardinality attributes affordable, but only if your throughput fits the events-per-second cap. Switching from Datadog to Prometheus shifts the cost from the invoice to your SRE team's time. Each vendor has a different bottleneck; switching changes which bottleneck you are managing, not whether you have one.
- "Prometheus's lack of a hard limit is a design flaw." It is a design choice — Prometheus inherits the Unix-philosophy stance that the binary should be honest about resource usage and the operator should build the policy. The flaw is when teams adopt Prometheus expecting vendor-style guardrails and discover at 02:00 that no such guardrails exist. The choice is consistent with the project; the mismatch is in the operator's expectations.
- "Datadog's tags_to_exclude is the same as Prometheus's labeldrop." Mechanically similar, economically different.
labeldropreduces the RAM cost of the running Prometheus;tags_to_excludereduces the invoice line item that Datadog will bill next month. Both drop the label, but the optimisation target is different — and a team that runstags_to_excludeto cut the bill while still emitting high-cardinality data to a separate Prometheus is still paying twice.
Going deeper
How Prometheus's TSDB head block actually grows — the 2 KB/series number explained
The "2 KB per series" rule of thumb in Prometheus has a specific origin in the head-block data structure. Each active series in the head block carries: a MemSeries struct (~140 bytes including mutex, last sample, metadata pointer), a chunk of recent samples (Gorilla-XOR encoded, average ~200 bytes for the in-flight chunk), index entries (postings list pointers, ~3× series count for 3-label series), and label-string interning overhead (~50-200 bytes per unique string, amortised across series that share the string). Aggregated, this comes to ~2 KB on the average label cardinality, ranging from ~1 KB for very-shared-label series (e.g. up{instance="..."}) to ~5 KB for series with long unique label values (UUIDs, paths). Why the 2× spread matters: a fleet of 1M series with short shared labels fits in 2 GB; the same 1M series with long unique labels needs 5 GB. The Prometheus cardinality budget should be priced against the upper end of the spread to avoid surprises — Razorpay budgets at 4 KB/series, which provides a 2× safety margin and absorbs the long-tail of high-cost series that always exists in production.
The Gorilla XOR compression matters here too: it applies to values within a series, not across series. Adding more series does not amortise across them; cardinality cost is fundamentally linear in series count. This is why series count is what is priced, and why no compression algorithm fixes the cardinality problem for Prometheus.
Datadog's "metrics without limits" — the architecture under the marketing
The "metrics without limits" feature in Datadog is implemented as a post-ingest aggregation rule evaluated at query time. When you declare that a metric should retain only service and route tags, Datadog continues to accept the full tagset on ingest, stores them in a downsampled archive, and at query time aggregates over the dropped tags. The headline is "fewer custom metrics on your bill"; the underlying cost shape is "Datadog still stored the full data, but the billing engine charges you for the smaller catalogue".
This has a subtle implication: historical queries with the dropped tags still work for the retention window, because the data is in the archive. The billing reduction is real (you stop paying for the 1M-tagset SKU pool) but the data is not lost. Why this matters operationally: it lets you turn cardinality up for an incident investigation (re-enable the tag, pay the SKU cost for that month) and turn it back down once the bug is identified. This is the Datadog-specific equivalent of "increase log retention for one debug session" — the cost lever is reversible at the metric layer, not at the data layer.
The catch: re-enabling a previously-dropped tag adds the SKUs back at the next billing cycle, so this is a one-incident manoeuvre, not a permanent debugging strategy. Teams that re-enable tags every month are paying full freight; teams that re-enable for one week per quarter pay roughly 25% extra.
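What the "turn the tag back on for one incident" move looks like in practice — a sketch against the v2 tag-configuration endpoint named earlier; the request shape, tag list, and metric name are illustrative, so verify against the current Datadog API reference before using it:
# reenable_tag_for_incident.py — sketch: temporarily retain merchant_id on one metric
# (request shape is illustrative; verify against Datadog's v2 tag-configuration API docs)
import os, requests

metric = "order_attempts_total"
headers = {"DD-API-KEY": os.environ["DD_API_KEY"],
           "DD-APPLICATION-KEY": os.environ["DD_APP_KEY"]}
body = {"data": {"type": "manual_metric_tag_configuration",
                 "id": metric,
                 "attributes": {"tags": ["service", "route", "status", "merchant_id"]}}}

r = requests.patch(f"https://api.datadoghq.com/api/v2/metrics/{metric}/tags",
                   headers=headers, json=body, timeout=30)
print(r.status_code, r.json())
# remember to PATCH the tag back out once the incident closes, or next month's SKUs stay high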
Honeycomb Refinery — sampling as the cost lever
Honeycomb's refinery is an OTel-protocol proxy that runs in the customer's infrastructure and decides which traces to keep before they reach Honeycomb. It is the cost-control mechanism for Honeycomb in the way relabel rules are the cost-control mechanism for Prometheus.
A typical Refinery rule:
# refinery.yaml — keep 100% of errors, 5% of OK requests, 50% of slow requests
Sampler:
  type: rules
  rules:
    - condition: 'http.status_code >= 500'
      sample_rate: 1
    - condition: 'duration_ms > 500'
      sample_rate: 2
    - sample_rate: 20        # default: keep 1 in 20
The economics: a 50 RPS service shipping 130M events/month hits Honeycomb's ₹22 lakh/month bill at full rate; sampling at 5% drops it to ₹1.1 lakh/month while preserving every error trace and every slow trace. Why this is the right cost lever for Honeycomb specifically: Refinery samples at the trace level, not the attribute level, so high-cardinality attributes survive sampling — customer_id is still recoverable on the kept traces. On Prometheus or Datadog, sampling does not reduce cardinality (a sampled series is still a series). On Honeycomb, sampling is cardinality-preserving and cost-reducing simultaneously.
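The blended arithmetic behind that claim, as a sketch — the traffic mix (1% errors, 4% slow, the rest OK) is an assumption, and the ₹17-per-1,000-events rate is the same blended figure used earlier in the chapter:
# refinery_blend_sketch.py — what the rule mix above does to the bill
events_per_month = 130_000_000
mix = {                      # assumed traffic mix: (share of events, sample_rate from refinery.yaml)
    "errors (5xx)":     (0.01, 1),
    "slow (>500ms)":    (0.04, 2),
    "everything else":  (0.95, 20),
}

kept = sum(share / rate for share, rate in mix.values())
kept_events = events_per_month * kept
print(f"kept fraction {kept:.1%} → {kept_events:,.0f} events/mo → ₹{kept_events / 1_000 * 17:,.0f}/mo")
# all errors, half the slow traces, 1-in-20 of the rest ≈ 7.75% kept — every error trace survives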
A pitfall: Refinery is stateful — it must hold a complete trace in memory before deciding to keep or drop, which costs ~30 seconds of trace data in RAM per Refinery node. A Refinery cluster sized for 80K events/sec needs ~16 GB total RAM — a real operational cost the team is now responsible for. Refinery moves cost from the vendor invoice to the customer's infrastructure, which is the right answer for many teams but is not a free lunch.
Vendor lock-in via accounting model — what migrating costs
Migrating between these three vendors is harder than migrating between SQL databases, because the schema is different. A Prometheus instrumentation that emits payment_attempts_total{route, status, merchant_id} becomes:
- on Datadog: a custom metric with the same tags, billed per tagset, with a likely overage on merchant_id;
- on Honeycomb: an event with the same attributes, billed per event emitted (one event per checkout request, not per metric scrape), with merchant_id free but events/sec capped.
The application instrumentation must be rewritten to fit the new vendor's accounting model. A naïve port from Prometheus to Datadog (keep the same labels, same emission rate) blows the bill; a naïve port from Datadog to Honeycomb (keep the same metric updates) under-uses Honeycomb's strengths because metric updates are not events. The right port is a re-design — Prometheus → Honeycomb means re-thinking instrumentation as event emission rather than metric updates. This is why migrations announced as "we are switching vendors" usually take 6-9 months: the app code, the dashboards, and the alert rules all need to change shape. The vendor knows this; the multi-year contract is priced against this stickiness.
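What "re-thinking instrumentation as event emission" looks like in code — both halves are sketches; the label set, the Honeycomb dataset name ("checkout") and the attribute list are illustrative, and a production integration would use a batching SDK rather than one blocking POST per request:
# instrumentation_shapes.py — the same checkout, as a metric update vs. as an event
import os, time, requests
from prometheus_client import Counter

# Prometheus/Datadog shape: bump a counter, keep the label set small and bounded
PAYMENT_ATTEMPTS = Counter("payment_attempts_total", "Checkout attempts",
                           ["route", "status"])        # merchant_id deliberately excluded

def record_checkout_metric(route: str, status: int) -> None:
    PAYMENT_ATTEMPTS.labels(route=route, status=str(status)).inc()

# Honeycomb shape: emit one wide event per request, high-cardinality fields welcome
def record_checkout_event(route: str, status: int, merchant_id: str, duration_ms: float) -> None:
    requests.post("https://api.honeycomb.io/1/events/checkout",   # dataset name is illustrative
                  headers={"X-Honeycomb-Team": os.environ["HONEYCOMB_API_KEY"]},
                  json={"route": route, "status": status, "merchant_id": merchant_id,
                        "duration_ms": duration_ms, "timestamp": time.time()},
                  timeout=5)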
The practical advice: pick the vendor that matches your workload's natural shape, then commit. Prometheus for low-cardinality high-frequency scraping (system metrics, JVM metrics), Datadog for moderate cardinality with strong vendor-managed tooling (mid-size enterprises that don't want to run a control plane), Honeycomb for high-cardinality debugging (engineering-led teams that prioritise debug-ability over budget predictability). Mismatched vendor + workload is the most expensive observability mistake — it is paid every month on the invoice.
Reproducibility footer
# All three audits run against real instances:
docker run -d --name prom -p 9090:9090 prom/prometheus:v2.51.0
python3 -m venv .venv && source .venv/bin/activate
pip install requests pandas datadog-api-client
export DD_API_KEY=... DD_APP_KEY=... HONEYCOMB_API_KEY=...
python3 vendor_cost_compare.py # the three-vendor comparison
python3 datadog_cardinality_audit.py
python3 honeycomb_event_audit.py
Where this leads next
The three vendors give three different cost shapes, but the underlying physics — that telemetry has a natural cardinality and the storage engine prices it — is shared. The next chapter (exemplars: linking metrics to traces) is the architectural escape hatch from this whole comparison: you keep low-cardinality metrics (Prometheus-cheap) for the dashboard, attach exemplar trace_ids to the samples, and recover per-instance detail from the trace store (Honeycomb-cheap, or Tempo-self-hosted-cheap) only when you need it. Exemplars are the cross-vendor pattern that makes "bucket the metric, keep the trace" practical — and they are the answer to the cardinality cost dilemma that does not require betting the entire stack on one vendor's accounting model.
- Cardinality budgets — the previous chapter; the in-team discipline that complements the vendor-side limits described here.
- Why high-cardinality labels break TSDBs — the physical mechanism Prometheus's RAM-based pricing reflects.
- Wall: cardinality is the billing death spiral — the cost dynamic this chapter quantifies vendor by vendor.
- HyperLogLog for approximate counting — the algorithm that lets a single low-cardinality metric encode a high-cardinality count, useful on all three vendors.
- Histograms: native vs sparse — the histogram-specific cardinality story that interacts differently with each vendor's accounting.
The single insight: cardinality is a vendor-specific resource, not a universal one. The Prometheus engineer optimising RAM, the Datadog engineer optimising the SKU pool, the Honeycomb engineer optimising events/sec are all doing valid work — they are not doing the same work. The first question on any cardinality ticket is "which of the three are we billed by here?" The second question is whether the team's mental model matches the answer to the first. Most cardinality bills are paid by teams whose mental model lags their vendor by one billing model — they are still optimising for the vendor they migrated from.
References
- Prometheus TSDB design notes — the canonical reference for why series-in-RAM is the priced resource.
- Datadog custom metrics pricing and limits — official documentation on the SKU pool, the $5/100/month overage, and the metrics-without-limits API.
- Honeycomb pricing and events model — the per-event billing structure and the events-per-second caps for each plan tier.
- Charity Majors, Observability Engineering (O'Reilly, 2022), Ch. 6 — the modern-era framing of why high-cardinality is essential and the architectural shift Honeycomb embodies.
- Brian Brazil, "Cardinality is Key" (Robust Perception) — the Prometheus-author post that frames the project's deliberate non-enforcement stance.
- OpenTelemetry Refinery documentation — the Honeycomb-supplied sampler that is the cost lever for the events-priced model.
- Cardinality budgets — the per-team discipline that complements vendor-side limits and makes this multi-vendor comparison actionable.
- Why high-cardinality labels break TSDBs — the underlying physics that Prometheus's pricing model reflects directly.