Note: Company names, engineers, incidents, numbers, and scaling scenarios in this article are hypothetical — even when they resemble real ones. See the full disclaimer.

Tiered storage for metrics, logs, and traces

At 03:14 the page fires. Aditi opens Grafana, types rate(http_5xx[5m]), and the panel renders in 280 ms because the last 6 hours of http_5xx live on a Mimir ingester's local NVMe. Six weeks later Kiran in finance forwards her a compliance ticket asking for the merchant-fee dispute trail from 14 October, and the same query against the same metric name takes 47 seconds because that data lives in three Parquet files in s3://yatrika-mimir-cold/. Same metric, same query, same person — different tier, different latency budget, different rupee cost. Tiering is not "how long do we keep data"; tiering is "how long the reader will wait for the answer". Get that distinction wrong and you either burn ₹4 crore a year keeping cold-tier-grade data on hot SSDs, or you make the on-call wait 47 seconds while the page is still firing.

Hot, warm, and cold are three storage tiers with three reader profiles — paged on-call (sub-second), feature engineer (sub-minute), auditor (sub-hour). The rule that decides which tier a piece of telemetry sits on is the query-latency budget of its likely reader, not its age. Metrics, logs, and traces tier differently: metrics downsample at boundaries, logs change index strategy, traces drop indexes entirely and lean on the trace-id. Get the boundaries wrong and you pay 10× either in cost (hot too long) or in MTTR (cold too soon).

The reader-budget rule and why age-based tiering misses it

Every observability vendor pitch starts with the same diagram: 7-day hot, 30-day warm, 365-day cold, retention shrinks as data ages, cost shrinks with it. The diagram is correct; the reasoning is wrong. Data does not get cheaper to store as it ages — S3 Standard-IA is the same price whether the byte is one hour old or one year old. What gets cheaper is the acceptable query latency. A metric scraped 30 seconds ago is going into a Grafana panel that someone is staring at right now; a metric scraped 30 days ago is going into a quarterly review someone will skim next Tuesday. The hot tier exists because the on-call cannot wait 47 seconds; the cold tier exists because the auditor can.

Yatrika ran age-based tiering for two quarters. The 24-hour hot tier held 3.4M metric series on local NVMe at ₹18 lakh/month. The 365-day cold tier held the same 3.4M series, downsampled to 5-minute resolution, on S3 Standard-IA at ₹2.1 lakh/month. The platform team was proud of the 8.5× cost reduction at the cold tier. Then a Q4 incident happened: a payments-team alert fired about a 15-minute latency spike on 18 December at 14:22 IST. By the time on-call sat down to investigate on 20 December at 09:00 IST, the 43-hour-old data had aged into the warm tier — the Mimir queriers still answered, but at 8 seconds per panel, because they were hitting S3 instead of the ingester's memory-mapped block. Eight seconds per panel on a 12-panel dashboard meant 96 seconds of dead time per refresh during an active investigation. The bug was not retention; the bug was that "warm" started at 24 hours when the on-call's investigation budget extended to 72 hours. The fix was reader-budget tiering: hot until the data falls outside any active investigation, warm until it falls outside any feature-engineering analysis, cold thereafter.

Why age is the wrong primary axis: age is a proxy variable, not a causal one. The actual variable is "what is the longest investigation window an on-call might walk into?" — for Yatrika that turned out to be 72 hours (the time between an incident and its formal post-mortem). Extending the hot tier from 24 hours to 72 hours cost ₹6 lakh/month more in NVMe but removed the 90-odd seconds of dead time per refresh from the December investigation, which the payments VP valued at ₹40 lakh in deferred merchant churn. Reader-budget tiering reframes the boundary from a finance question into an SRE question, and the SRE question has a known answer.

Reader-budget tiering — three readers, three latency budgets, three storage backends. A horizontal three-band diagram of Yatrika's three-tier metric layout (illustrative); each boundary is set by the slowest plausible reader, not the data's age.
  • HOT · 0–72 h. Reader: on-call paged at 03:14, mid-incident. Latency budget: under 1 s per panel. Backend: Mimir ingester, local NVMe, memory-mapped 2 h block. 3.4M series at 10 s resolution, ₹18 lakh/month (52% of the bill).
  • WARM · 3–30 d. Reader: feature engineer running a Monday post-mortem or A/B retro. Latency budget: under 30 s per panel. Backend: S3 Express One Zone behind the Mimir store-gateway, 2 h-block index. 3.4M series at 10 s resolution, ₹6.4 lakh/month (19% of the bill).
  • COLD · 30–540 d. Reader: compliance auditor, capacity planner, regulatory subpoena. Latency budget: under 5 min per query. Backend: S3 Standard-IA, downsampled 5-minute aggregates, query-on-demand. 3.4M series at 5-minute resolution, ₹0.9 lakh/month (3% of the bill).
  • Boundary rule: each boundary is set by the slowest reader who might walk into the data — never by a calendar.
Illustrative — Yatrika's reader-budget tiering. The 72-hour hot boundary is set by the longest active-investigation window seen in 2026 incidents; the 30-day warm boundary by the longest weekly-review cadence; the 540-day cold boundary by the regulatory retention floor for payment data. Boundaries shift when readers shift, not when the data ages.

The reader-budget rule generalises. For logs the readers are: on-call grepping for the last error (hot, 1–7 days), product engineer reproducing a customer complaint (warm, 7–30 days), security investigator chasing an incident from last quarter (cold, 30–540 days). For traces: on-call drilling from a paging metric to a span (hot, 24–72 hours), platform engineer doing a fan-out audit (warm, 7–14 days), forensic auditor reconstructing a single transaction (cold, 30–365 days). The reader profiles are the same across pillars; the storage shape that serves them is not. Metrics tier by downsampling, logs tier by changing index strategy, traces tier by collapsing to a trace-id-only index.
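
Written down as data, the per-signal boundaries are small enough to live next to the audit script that follows below. A minimal sketch using the illustrative ranges above; the dict shape, field names, and the choice of a 30-day warm boundary for traces are assumptions made for the example:

# reader_budget.py: illustrative per-signal tier boundaries, in hours, keyed by reader budget
TIER_BOUNDARIES = {
    "metric": {"hot_h": 72,     "warm_h": 30 * 24, "retain_h": 540 * 24},
    "log":    {"hot_h": 7 * 24, "warm_h": 30 * 24, "retain_h": 540 * 24},
    "trace":  {"hot_h": 72,     "warm_h": 30 * 24, "retain_h": 365 * 24},
}

def tier_for(signal: str, age_hours: float) -> str:
    # map a block's age to a tier using the signal-specific reader budget
    b = TIER_BOUNDARIES[signal]
    if age_hours < b["hot_h"]:
        return "hot"
    if age_hours < b["warm_h"]:
        return "warm"
    if age_hours < b["retain_h"]:
        return "cold"
    return "delete"

print(tier_for("log", 30), tier_for("metric", 100), tier_for("trace", 24 * 400))
# -> hot warm delete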

Three pillars, three tiering shapes

The pillar shapes diverge because their access patterns diverge. A metric query is "show me the time-series for http_5xx{service=payments} over 24 hours" — a sequential scan over a known label set. A log query is "find the 18 lines containing dispute_id=DSP-441" — a needle-in-haystack search. A trace query is "fetch the span tree for trace_id 0a3f..." — a primary-key lookup with a fan-out tree underneath. Each pattern has a different cost equation across hot / warm / cold, which forces a different tiering shape.

For metrics, the dominant cost in the warm and cold tiers is the cardinality multiplied by the resolution. A 3.4M-series fleet at 10-second resolution is roughly 0.9 trillion samples per month (8,640 samples per series per day, times 30 days, times 3.4M series). Downsampling to 5-minute aggregates at the cold-tier boundary divides the sample count by 30 but preserves cardinality. The downsampled metric is good enough for capacity planning and quarterly review — but not for incident investigation, because the 5-minute aggregate hides the 30-second spike that paged the on-call. Hence: hot keeps full resolution; warm keeps full resolution but on cheaper storage; cold keeps cardinality but downsamples. The boundary between warm and cold is exactly where the reader stops needing 30-second-spike resolution. See /wiki/downsampling-for-long-retention for the aggregate-choice question — which Mimir reads as compactor.downsampling-enabled and which Prometheus federation handles as a recording rule.
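
The cardinality-times-resolution arithmetic is worth making explicit, because it is the entire case for the cold-tier downsample. A quick back-of-the-envelope with the illustrative fleet numbers above:

# Sample-volume arithmetic for the illustrative 3.4M-series fleet
series     = 3_400_000
full_res_s = 10      # hot/warm scrape interval, seconds
cold_res_s = 300     # cold-tier aggregate resolution, seconds
days       = 30

full_samples = series * (86_400 // full_res_s) * days   # ~8.8e11 samples/month
cold_samples = series * (86_400 // cold_res_s) * days   # ~2.9e10 samples/month

print(f"full resolution : {full_samples / 1e12:.2f} trillion samples/month")
print(f"5-min aggregates: {cold_samples / 1e12:.3f} trillion samples/month")
print(f"reduction       : {full_samples // cold_samples}x, cardinality unchanged at {series:,} series")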

For logs, the dominant cost in the hot tier is the index size. Loki's full-text-or-label-index design (see /wiki/full-text-search-for-logs-the-cost-model) keeps a per-stream label index but does not index the log content — content searches scan the chunks. Hot logs use the full label index for sub-second {service="payments"} queries; cold logs drop the per-stream index entirely and rely on object-store list operations and chunk-level brotli compression. The cold-tier query pattern shifts from {service="payments"} |= "dispute" (interactive) to "scan all chunks from 14 October between 14:00 and 15:00, brotli-decompress, regex-search" (batch). The reader pays 10× the latency at the cold tier, but the platform team pays 40× less per gigabyte stored.
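
What the cold-tier batch pattern looks like is easier to see as code than prose. A minimal sketch of the scan-decompress-regex loop; the bucket name, key layout, and the assumption that each cold chunk is a brotli-compressed newline-delimited text object are illustrative, not Loki's actual chunk format:

# cold_log_scan.py: batch-scan cold-tier log chunks for one dispute id (illustrative layout)
# pip install boto3 brotli
import re
import boto3
import brotli

BUCKET  = "yatrika-loki-cold"            # hypothetical bucket
PREFIX  = "payments/2025/10/14/14"       # hypothetical tenant/yyyy/mm/dd/hh key layout
PATTERN = re.compile(rb"dispute_id=DSP-441")

s3, hits = boto3.client("s3"), []
for page in s3.get_paginator("list_objects_v2").paginate(Bucket=BUCKET, Prefix=PREFIX):
    for obj in page.get("Contents", []):
        body = s3.get_object(Bucket=BUCKET, Key=obj["Key"])["Body"].read()
        for line in brotli.decompress(body).splitlines():   # assumes brotli-compressed text chunks
            if PATTERN.search(line):
                hits.append((obj["Key"], line.decode(errors="replace")))

print(f"{len(hits)} matching lines across the scanned hour")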

For traces, the design space is different again. Tempo's columnar approach (see /wiki/trace-storage-at-scale-tempos-columnar-approach) already drops most indexes — only service.name and name are indexed at the warm tier; everything else is column-scanned via TraceQL. The cold-tier pattern collapses further: at Yatrika, traces older than 30 days retain only the trace-id and the root-span attributes (service, operation, status, duration). The full span tree is dropped or compacted into a per-trace Parquet file accessible only by trace-id lookup. You can no longer search "show me all error traces in October from the payments service" at the cold tier; you can only fetch "the trace tree for trace_id 0a3f... from October" if you already know the trace_id. This is acceptable because the cold-tier reader profile is forensic — they have the trace_id from a transaction record, a customer complaint, or a regulatory request, and they need the tree, not a search.
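
The corresponding cold-tier trace access is a primary-key fetch rather than a search. A sketch using DuckDB over the per-trace Parquet layout described above; the bucket path, partitioning, and column names are assumptions, and S3 credentials are taken from the environment:

# cold_trace_fetch.py: fetch one span tree from cold-tier Parquet by trace_id (illustrative)
# pip install duckdb pandas
import duckdb

trace_id = "0a3f5c9e17b24d86a1f0c3e9b7d45a21"   # known from a transaction record or complaint
con = duckdb.connect()
con.execute("INSTALL httpfs")
con.execute("LOAD httpfs")                      # S3 access for read_parquet

spans = con.execute(
    """
    SELECT span_id, parent_span_id, name, service_name, status_code, duration_ns
    FROM read_parquet('s3://yatrika-tempo-cold/2025/10/*.parquet')
    WHERE trace_id = ?
    ORDER BY start_time_ns
    """,
    [trace_id],
).fetch_df()

print(f"{len(spans)} spans returned for trace {trace_id[:8]}")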

Why traces tier most aggressively: a trace tree is a high-fanout structure (the IPL final at JioCinema produced 80-microservice traces with 600+ spans each), so storing the full tree at indexable granularity for 12 months is the most expensive-per-useful-query data type in observability. Most cold-tier trace lookups are by trace-id (regulatory, customer complaint), not by attribute search — so dropping the attribute index at the cold-tier boundary recovers 95% of storage cost while losing 5% of access patterns. The 5% you lose are the "find me all error traces from last quarter" queries, which can be answered from the metrics tier (rate(spans_total{status="error"}[1d])) instead. Tier-aware design moves access patterns across pillars when the tier collapses an index.

The audit script — measuring tier residency, cost, and reader-budget violations

The audit primitive in /wiki/cardinality-budgets-revisited measured per-team cardinality cost. The tier-residency audit measures something orthogonal: for each metric, log stream, or trace service, which tier the data lives on, and whether that tier serves the actual reader profile. The most common finding is tier misalignment — data the on-call queries weekly that has aged into the cold tier and now takes 47 seconds to render, or data nobody has queried in 90 days still sitting on the hot NVMe.

# tier_audit.py — find tier-misaligned telemetry across metrics, logs, and traces
# pip install requests pandas python-dateutil
import requests, pandas as pd, datetime as dt
from collections import defaultdict
from dateutil.parser import isoparse

MIMIR = "http://mimir.yatrika.internal:8080"
LOKI  = "http://loki.yatrika.internal:3100"
TEMPO = "http://tempo.yatrika.internal:3200"
HOT_END_H = 72                 # 72-hour hot boundary (reader: on-call)
WARM_END_D = 30                # 30-day warm boundary (reader: feature engineer)
RATE_HOT_PER_GB_MONTH  = 4200  # local NVMe + replicas, ₹/GB-month
RATE_WARM_PER_GB_MONTH = 980   # S3 Express One Zone
RATE_COLD_PER_GB_MONTH = 110   # S3 Standard-IA

NOW_UTC = dt.datetime.now(dt.timezone.utc)

def age_hours(iso_ts: str) -> float:
    # hours since an ISO-8601 timestamp; naive timestamps are treated as UTC
    ts = isoparse(iso_ts)
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=dt.timezone.utc)
    return (NOW_UTC - ts).total_seconds() / 3600

def tier_for_age(hours: float) -> str:
    if hours < HOT_END_H: return "hot"
    if hours < WARM_END_D * 24: return "warm"
    return "cold"

# 1. Metrics tier residency — Mimir compactor block stats (size, time range)
r = requests.get(f"{MIMIR}/api/v1/blocks/yatrika-prod", timeout=60).json()
metric_rows = []
for b in r["blocks"]:
    age_h = age_hours(b["max_time"])
    metric_rows.append({"signal": "metric", "tenant": b["tenant"],
                        "size_gb": b["size_bytes"] / 1e9, "age_h": age_h,
                        "tier_actual": b["storage_class"],
                        "tier_expected": tier_for_age(age_h)})

# 2. Log tier residency — Loki ingester chunk metadata
r = requests.get(f"{LOKI}/loki/api/v1/series?match[]={'{job=~\".+\"}'}", timeout=60).json()
# (in real use, walk loki object-store manifest; abbreviated here)
log_rows = [{"signal": "log", "tenant": s["tenant"], "size_gb": s["size_gb"],
             "age_h": s["age_h"], "tier_actual": s["tier"],
             "tier_expected": tier_for_age(s["age_h"])} for s in r["streams"]]

# 3. Trace tier residency — Tempo block list
r = requests.get(f"{TEMPO}/api/v2/blocks?tenant=yatrika", timeout=60).json()
trace_rows = []
for b in r["blocks"]:
    age_h = age_hours(b["end_time"])
    trace_rows.append({"signal": "trace", "tenant": b["tenant"],
                       "size_gb": b["size_bytes"] / 1e9, "age_h": age_h,
                       "tier_actual": b["storage_class"],
                       "tier_expected": tier_for_age(age_h)})

# 4. Combine, attribute cost, find misalignments
df = pd.DataFrame(metric_rows + log_rows + trace_rows)
def rupees(row):
    rate = {"hot": RATE_HOT_PER_GB_MONTH, "warm": RATE_WARM_PER_GB_MONTH,
            "cold": RATE_COLD_PER_GB_MONTH}[row["tier_actual"]]
    return row["size_gb"] * rate
df["inr_per_month"] = df.apply(rupees, axis=1)
df["misaligned"] = df["tier_actual"] != df["tier_expected"]

# 5. Per-pillar tier breakdown
print("\n=== Tier residency × signal type ===")
print(df.groupby(["signal", "tier_actual"]).agg(
    gb=("size_gb","sum"), inr=("inr_per_month","sum")).round(0))

# 6. Misalignment offenders — what is on the wrong tier
mis = df[df["misaligned"]].sort_values("inr_per_month", ascending=False).head(8)
print(f"\n=== {len(df[df['misaligned']]):,} misaligned blocks — top 8 by ₹/month ===")
print(mis[["signal", "tenant", "age_h", "tier_actual",
           "tier_expected", "size_gb", "inr_per_month"]].to_string(index=False))
Sample run on Yatrika 2026-04-25:
=== Tier residency × signal type ===
                       gb       inr
signal  tier_actual
log     cold        4180   459800
        hot          820  3444000
        warm        1640  1607200
metric  cold        2840   312400
        hot          480  2016000
        warm         960   940800
trace   cold        1240   136400
        hot          290  1218000
        warm         620   607600

=== 14 misaligned blocks — top 8 by ₹/month ===
signal    tenant   age_h tier_actual tier_expected  size_gb  inr_per_month
   log  payments    18.2        cold           hot     89.6         9856.0
 trace      risk  2160.0        warm          cold      8.1         7938.0
metric  platform   360.0         hot          warm      1.8         7560.0
   log  platform   840.0        warm          cold      6.4         6272.0
   log      risk    96.0         hot          warm      1.4         5880.0
metric      risk    72.5         hot          warm      0.4         1680.0
metric  payments   220.0        cold          warm     14.2         1562.0
 trace  payments    18.0        warm           hot      0.6          588.0

Read the per-pillar table first. Logs are the dominant cost — ₹55.1 lakh/month vs ₹32.7 lakh for metrics and ₹19.6 lakh for traces — driven entirely by the hot-tier line at ₹34 lakh. That tells you log retention is the next lever, not metric resolution. The misalignment table tells the second story: 14 blocks are on the wrong tier, with the top offender being a payments-team log block aged 18 hours that has somehow ended up on the cold tier — almost certainly a tiering policy bug where Loki's chunk-flush lifecycle hook misfired and shipped freshly flushed chunks straight to cold. The on-call who needs that log during the next incident will hit a 47-second cold read; the next time the bug fires, MTTR rises.

The next-row offender is a risk-team trace block aged 2160 hours (90 days) still on warm at ₹7.9k/month. The tiering policy is supposed to demote 30+ day traces to cold, and this block missed the demotion — probably a Tempo compactor crash that left orphan blocks behind. ₹7.9k/month is small in isolation but compounds: across the fleet there are ~140 such orphans by the audit's end-of-quarter run, totalling ₹11 lakh/quarter the platform team is paying for data nobody queries. The audit's job is to find these before the FinOps quarterly review, not after.

Why we tier separately by signal, not by tenant: a payments tenant might need a long hot tier for logs, a long warm tier for metrics, and an aggressive cold demotion for traces, all at once, because the on-call uses logs aggressively, the feature engineer uses metrics weekly, and the auditor uses traces only on request. Tiering by tenant collapses these three reader profiles into one boundary, which means either the on-call's logs age out too fast or the auditor's traces never demote. Per-signal tiering preserves the reader-budget rule per pillar — and the audit-script grouping makes the cost picture visible per pillar, so the platform team knows which boundary to move.
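
Continuing from the audit script's DataFrame, the per-tenant, per-signal cost split is one pivot away (this assumes the df built above is still in scope):

# Follow-on to tier_audit.py: where each tenant's money sits, split by signal and tier
pivot = df.pivot_table(index=["tenant", "signal"], columns="tier_actual",
                       values="inr_per_month", aggfunc="sum", fill_value=0).round(0)
pivot["total"] = pivot.sum(axis=1)
print(pivot.sort_values("total", ascending=False).to_string())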

Animated tier-flow — what happens at a boundary crossing

When data crosses a tier boundary, the storage backend changes, the index strategy changes, and the reader's effective query latency changes. Most of the bugs in tiered observability live in this transition — block-flush races, downsample-vs-original divergence, index-rebuild lag, the "I queried for last week and got partial data" failure mode where the warm-tier block is still being uploaded.

A 2-hour Mimir block crossing the hot→warm boundary at t=72h. Block lifecycle at the hot→warm boundary (illustrative): from 0–72 h the block sits on local NVMe at full 10-second resolution, memory-mapped, and the on-call's Grafana panels see a p99 query time of about 200 ms. At t=72 h the block flushes to S3 Express One Zone and the warm index rebuilds; the flush-and-reindex takes roughly 30 seconds, during which queries against this block return partial data and latency hovers around 800 ms. After the transition the block is served by the store-gateway at a p99 of about 8 s. The 30-second risk window is the failure mode: the hot replica is gone and the warm index is mid-build, so Mimir's query_blocks_storage_querier_blocks_load_failures_total counter spikes here; alert on a rate above 0.5/min.
Illustrative — a 2-hour Mimir block traversing the hot→warm boundary. The 30-second flush-and-reindex window is the failure-mode window observability platform teams alert on. The block in motion represents data ageing; the colour-flip at the boundary represents the moment a query against this block is most likely to return partial results.

The boundary-crossing failure mode shows up most often in partial-data alerts — the on-call queries rate(http_5xx[5m]) over the boundary and gets a graph with a 30-second gap because the block was flushing during the query window. Mimir surfaces this as query_blocks_storage_querier_blocks_load_failures_total; Loki surfaces it as loki_chunk_fetcher_errors_total; Tempo as tempo_blocks_open_failures_total. Alerting on any of these above 0.5 events/min for 5 minutes catches block-flush bugs within one alert window. Why a 5-minute alert window and not 1 minute: the boundary crossings are inherently episodic — at the 72-hour mark, all blocks created exactly 72 hours ago flush within the same minute, producing a brief, expected spike in load-failure counters. A 1-minute alert would page on every boundary cohort. A 5-minute window with 0.5/min threshold filters out the episodic noise but catches the sustained-error pattern of a real bug (compactor crash, S3 throttle, index-rebuild deadlock).
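
The same thresholds can be sanity-checked from a notebook or a cron job before they are wired into the alerting stack. A sketch against a Prometheus-compatible query endpoint; the /prometheus/api/v1/query path is Mimir's usual prefix, and the sketch assumes all three components' own /metrics are scraped into that tenant:

# boundary_alert_check.py: is any tier-transition failure counter above 0.5 events/min over 5m?
import requests

PROM_QUERY = "http://mimir.yatrika.internal:8080/prometheus/api/v1/query"
COUNTERS = [
    "query_blocks_storage_querier_blocks_load_failures_total",   # Mimir, per the text above
    "loki_chunk_fetcher_errors_total",                           # Loki
    "tempo_blocks_open_failures_total",                          # Tempo
]
THRESHOLD_PER_MIN = 0.5

for counter in COUNTERS:
    expr = f"sum(rate({counter}[5m])) * 60"      # events per minute, averaged over the 5-minute window
    resp = requests.get(PROM_QUERY, params={"query": expr}, timeout=30).json()
    result = resp.get("data", {}).get("result", [])
    per_min = float(result[0]["value"][1]) if result else 0.0
    flag = "ALERT" if per_min > THRESHOLD_PER_MIN else "ok"
    print(f"{counter:58s} {per_min:6.2f}/min  {flag}")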

Common confusions

  • "Tiering is a retention setting." It is not. Retention is "how long do we keep data before deletion"; tiering is "what storage backend does the data live on at each age". The two interact — you cannot tier data older than your retention window, because it is gone — but they are independent decisions. Yatrika has 540-day metric retention with three tiers; their old vendor had 30-day retention with one tier. Same retention question, completely different storage architecture.
  • "Hot is always SSD, cold is always object storage." Misleading. Hot-tier Mimir on AWS is NVMe-backed EC2; hot-tier Mimir on a self-hosted k8s with a Ceph backend is RBD-attached XFS; hot-tier Mimir at a small startup might be a single Prometheus on a single host. The defining property of "hot" is sub-second query latency and a synchronous-replicated write path, not the underlying medium. Cold tier almost always ends up on S3-compatible object storage because the latency budget is loose enough to forgive the network round-trip.
  • "You should tier all three pillars on the same boundaries." Wrong, and an expensive mistake. Logs are dominated by index cost in hot, traces by span-tree cost in warm, metrics by sample-resolution cost in cold. The boundary that minimises log cost (drop the index at 7 days) is wrong for metrics (drop full resolution at 30 days). Each pillar needs its own boundary, which is why the audit script tiers by (signal, tenant), not by tenant alone.
  • "Cold-tier data is read-only." Half-true. The data itself is read-only after compaction, but the index over cold data is typically rebuildable. Tempo's TraceQL re-indexer can rebuild a per-block index over cold-tier traces if a forensic investigation needs a fast attribute search; Mimir's compactor can re-downsample a cold block at higher resolution if the original is still on disk. Cold is "read-only by default, rewritable on demand at human cost".
  • "Cheaper tier always means cheaper query." No — cold queries are more expensive per query than hot queries. S3 GET-list-fetch costs ₹0.04 per 1000 requests; a cold-tier dashboard with 12 panels each scanning 50 blocks generates 600 GETs per refresh. A heavily-used cold-tier dashboard can cost more in S3 API charges than a hot-tier dashboard costs in NVMe. The right framing: cold tier has cheap storage and expensive query, hot has expensive storage and cheap query — pick by query frequency, not by data age.
  • "Tier transitions are atomic." Almost never. Mimir blocks flush over ~30 seconds; Loki chunks over ~5 minutes; Tempo blocks over ~2 minutes. During the transition window queries return partial data and the per-block load-failure counters spike. Treating tier transitions as atomic produces a class of alerts ("intermittent missing data on Tuesday at 14:00") that the platform team chases for weeks before realising the alert is the boundary itself.

Going deeper

Setting tier boundaries from incident-investigation telemetry

The 72-hour hot boundary at Yatrika was not chosen abstractly — it came from analysing 18 months of incident retrospectives and finding that 95% of follow-up investigations completed within 68 hours of the original page. The data lives in an incidents.csv exported from PagerDuty and the company's post-mortem template; the analysis is a 20-line pandas script that buckets time-from-page-to-final-postmortem-comment by quantile. The 72-hour figure is that 68-hour p95 rounded up to a whole day; the p99 is 144 hours, which is what triggered Yatrika's escalation policy: investigations that run past the 72-hour boundary generate a "promote-to-hot" PR that re-loads specific blocks back onto the ingester for the duration of the investigation. The promote-to-hot pattern is rare (12 PRs in 2026 H1) but cheap (a few hundred rupees per promote, vs the ₹6 lakh/month it would cost to extend the hot boundary fleet-wide to 144 hours). Boundary choices are not symmetric — the hot boundary is sized for the p95 reader; the cold boundary is sized for the regulatory floor; the warm boundary fills the rest.
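
The boundary-setting analysis itself is small enough to show. A sketch of the quantile calculation, assuming an incidents.csv with one row per incident and columns paged_at and last_postmortem_comment_at; the column names are assumptions standing in for the PagerDuty-plus-postmortem export described above:

# hot_boundary_from_incidents.py: size the hot tier from investigation-window quantiles
import math
import pandas as pd

inc = pd.read_csv("incidents.csv", parse_dates=["paged_at", "last_postmortem_comment_at"])
inc["investigation_h"] = (
    (inc["last_postmortem_comment_at"] - inc["paged_at"]).dt.total_seconds() / 3600
)

q = inc["investigation_h"].quantile([0.50, 0.95, 0.99])
print(q.round(1).to_string())

# Rule of thumb from the text: hot boundary = p95 rounded up to a whole day,
# with a promote-to-hot escalation path covering the p95-to-p99 tail.
print(f"suggested hot boundary: {math.ceil(q.loc[0.95] / 24) * 24} hours")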

The cold-tier query-frequency cliff

Most cold-tier vendor pricing assumes a query-frequency cliff: you query a cold block once or twice a year, the per-query cost is high, the per-storage cost is low. The cliff breaks when a recurring query falls into the cold tier. At Yatrika the Q4 financial-close report runs on the last business day of every month and queries transaction_volume{merchant_tier="enterprise"} over the prior 12 months. By the second quarter, 9 of the 12 months are cold-tier. Each monthly run costs ₹2.4 lakh in S3 request and retrieval charges because the query scans 1100 cold blocks. The fix is recurring-query promotion: an audit job runs every Friday, identifies queries that ran more than 4 times in the prior 30 days against cold-tier data, and tags those blocks for warm-tier promotion. The financial-close report is now ~₹14k/month instead of ₹2.4 lakh — and the only cost was a 30-line Python audit and a Mimir API call. The pattern composes with /wiki/cardinality-budgets-revisited: once query-frequency is on the dashboard, the team that owns the report sees its own cold-tier cost line and starts pre-aggregating against it.
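
A sketch of the Friday promotion job, assuming a query-log export with one row per executed query and columns query, executed_at, range_start, and range_end; the final promotion step is left as a placeholder because the exact admin API depends on the deployment:

# recurring_query_promotion.py: find repeating cold-tier queries and tag their blocks for warm
import pandas as pd

WARM_END_D = 30        # cold tier starts here, as in the audit script
REPEAT_THRESHOLD = 4   # "more than 4 runs in the prior 30 days"

qlog = pd.read_csv("query_log.csv", parse_dates=["executed_at", "range_start", "range_end"])
recent = qlog[qlog["executed_at"] > qlog["executed_at"].max() - pd.Timedelta(days=30)]

# A query hits the cold tier if part of its range is older than the warm boundary at run time.
hits_cold = recent[(recent["executed_at"] - recent["range_start"]) > pd.Timedelta(days=WARM_END_D)]

counts = hits_cold.groupby("query").size().sort_values(ascending=False)
to_promote = counts[counts > REPEAT_THRESHOLD]

print(f"{len(to_promote)} recurring cold-tier queries to promote:")
print(to_promote.to_string())
# For each such query, resolve the blocks its range touches and tag them for warm-tier
# promotion via the storage admin API (placeholder; not a specific Mimir endpoint).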

The deletion-tier — what happens after cold

Cold is not the last tier. After cold comes deletion, and the boundary between cold and deletion is a regulatory question, not an engineering one. Indian payment-data regulation (RBI master direction 2024) mandates 7-year retention for certain transaction-level fields; Yatrika's compliance team chose 540 days for telemetry as the SRE-relevant retention floor, with a separate "compliance archive" pipeline that extracts the regulated fields into a Parquet lake on S3 Glacier Deep Archive. The compliance archive is outside the observability tier system — it is a feature-engineering data product, not telemetry, and it costs roughly ₹0.04/GB-month. The mistake is to conflate "RBI says keep 7 years" with "Mimir must keep metrics for 7 years". The metric-side cost of 7-year retention at full-fleet scale would be ₹54 lakh/month at the cold tier; the compliance-archive Parquet equivalent is ₹4.2k/month. Tiering also means knowing when a piece of data leaves observability altogether.

Reproduce this on your laptop

# 1. Spin up a single-tenant Mimir + Loki + Tempo stack
git clone https://github.com/grafana/mimir && cd mimir/development/mimir-monolithic-mode
docker compose up -d
python3 -m venv .venv && source .venv/bin/activate
pip install requests pandas python-dateutil prometheus-client

# 2. Emit ~20K synthetic series across 3 simulated tenants
python3 -c "
from prometheus_client import Counter, start_http_server
import time
start_http_server(8000)
counters = [Counter(f'svc_{t}_{i}_total','reqs',['route'])
            for t in ['payments','risk','platform'] for i in range(7)]
for c in counters:
    for r in [f'/r{i}' for i in range(1000)]:
        c.labels(r).inc()
print('emitted ~21K series')
time.sleep(120)" &

# 3. Force a compaction so blocks tier from hot to warm to cold
curl -X POST http://localhost:9009/compactor/ring?forget=true
sleep 30
python3 tier_audit.py
# Expect: per-pillar tier breakdown, misalignment offenders listed,
# rupee cost attributed per-tier and per-tenant.

Where this leads next

/wiki/index-free-log-storage-clickhouse-parquet is the storage-shape question for logs in particular: when the cold-tier boundary drops the per-stream index, what data layout does the cold tier actually use? ClickHouse columnar tables and Parquet on S3 are the two answers Indian observability teams are converging on, and they have different trade-offs (ClickHouse: faster ad-hoc queries, requires running a cluster; Parquet on S3: cheaper, requires a query engine like DuckDB or Presto). The tiering decisions in this article compose with the cold-tier data layout in that one.

/wiki/vendor-vs-self-hosted-economics is the parallel cost question — at what fleet size does the FinOps math flip from "buy Datadog/Honeycomb/New Relic" to "run Mimir/Loki/Tempo"? The tiering shape in this article is most of the answer: vendors charge per-GB-ingested, which prices all three tiers identically, so a fleet with 80% of its data in cold-tier-shaped queries pays 8× too much under vendor pricing. Self-hosting is what unlocks the per-tier rupee differential — but only if the platform team has the bandwidth to operate Mimir's compactor + S3 + ingester ring without losing 30% of an SRE's time per quarter.

/wiki/long-term-storage-thanos-cortex-mimir is the implementation-choice question for the metric-side tier system — Thanos, Cortex, and Mimir are the three production answers, and the tiering boundaries in this article map onto each system's compactor + store-gateway + querier topology slightly differently. The reader-budget rule from this article is the conceptual frame; that article is the deployment frame.

References