Structured vs unstructured logging

Aditi has been on call at Razorpay for nine hours and the alert that woke her says payments-api error rate above 0.5%. She opens Grafana, sees a flat error-rate line above the threshold, and reaches for the logs. She types {service="payments-api", level="error"} |~ "GATEWAY_TIMEOUT" into the LogQL bar and waits. Twelve seconds later she gets 4,138 lines, each one a sentence: Payment for merchant M01023 amount 4280 failed at upstream razorpay-acquirer-3 reason GATEWAY_TIMEOUT after 3 retries. She reads the first thirty, sees they all look similar, and now needs to know: are these spread across many merchants or concentrated on a few? Which acquirer is worst hit? What is the median amount? She has the data — every answer is right there in those 4,138 sentences — but she cannot ask any of those questions, because the data is shaped like prose and the backend can only count strings.

Six months later, after a quiet refactor, the same incident fires. Now the log line is {"event":"payment_failed","merchant":"M01023","amount":4280,"upstream":"razorpay-acquirer-3","reason":"GATEWAY_TIMEOUT","retries":3} and her query becomes sum by (upstream) (rate({service="payments-api"} | json | event="payment_failed" [5m])). Three seconds later she has a chart of failures by acquirer. Same fleet, same volume, same Loki cluster. The only thing that changed is whether the variable parts of each event live inside the message string or alongside it as named fields. That single structural decision is what this chapter is about.

A log line is structured when its variable data lives in named fields, and unstructured when the variability is baked into the message text. Structured logs let the backend group, filter, and aggregate without parsing prose; unstructured logs leave you with grep. The migration from one to the other is mostly mechanical — keep the message constant, push every variable into an attribute — but it pays for itself within weeks because every dashboard, alert, and incident query becomes cheaper, faster, and provable.

What "structured" actually means at the wire level

The word "structured" gets used loosely. A log line written as logger.info(json.dumps({"event": "payment_failed", "merchant": "M01023"})) looks structured because it ships JSON, but if the next call writes logger.info(json.dumps({"msg": f"Failed for {merchant}"})) you have JSON-formatted prose, not structured logging. The wire format and the discipline are two different things, and confusing them is the most common reason teams ship "structured logs" that behave exactly like unstructured ones at query time.

Structured logging has three properties that have to hold together:

  1. The message text is a constant event name. payment_failed, not Payment failed for merchant M01023. The message is what the backend uses as a low-cardinality grouping key — if it changes per event, every event becomes its own group and grouping stops working. Loki, Elasticsearch, and Splunk all treat the message field as a primary discriminator.
  2. Every variable lives in a named, typed attribute. merchant_id is a string, amount_paise is an integer, retries is an integer, success is a boolean. The types matter because the query language uses them — amount_paise > 50000 only works if amount_paise is a number, and "is a number" has to be true at write time, not at query time.
  3. The schema is consistent across emitters. If service A writes trace_id at the top level and service B writes it nested under meta.trace, the join across services has to be done by the agent or the query, and either way it is a fragility that will eventually break. The discipline of a shared schema is what separates a working pipeline from a brittle one.
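
To make the first two properties concrete, here is a minimal sketch of the same event emitted both ways, using loguru's serialize option (any JSON-emitting logger behaves the same way); the field names are illustrative, not a prescribed schema.

# pip install loguru — serialize=True emits one JSON object per line
import sys
from loguru import logger

logger.remove()
logger.add(sys.stdout, serialize=True)

merchant_id, amount_paise, retries = "M01023", 4280, 3

# Unstructured: variable data baked into the message text.
# Every event becomes its own group, so grouping stops working.
logger.error(f"Payment failed for merchant {merchant_id} after {retries} retries")

# Structured: constant event name, every variable in a named, typed attribute.
logger.bind(
    merchant_id=merchant_id,    # string
    amount_paise=amount_paise,  # integer, comparable at query time
    retries=retries,            # integer
).error("payment_failed")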

The wire format that has settled across the industry is JSON-per-line, sometimes called NDJSON or JSON Lines. One JSON object per log record, separated by \n, no leading/trailing whitespace, no nested arrays of records, no envelope. The OpenTelemetry Logs Data Model formalises this with a richer schema (resource attributes, scope, severity number, body), and logfmt — Heroku's key=value format — is a lighter alternative that some Go shops still use, but JSON-per-line is the dominant choice because every backend speaks it and every language can produce it.

Same event, four wire formats — what the backend actually sees:

PROSE — unstructured
2026-04-25 11:23:14 ERROR Payment for merchant M01023 amount 4280 failed
queries: substring grep only — no group_by, no rate, no > / <

LOGFMT — semi-structured
ts=2026-04-25T11:23:14Z level=error msg=payment_failed merchant=M01023 amount=4280
queries: field=value works, types are all string, no nesting

JSON-per-line — structured
{"ts":"2026-04-25T11:23:14Z","level":"error","msg":"payment_failed","merchant":"M01023","amount":4280}
queries: field=value, range, group_by, aggregations, type-aware

OTLP LogRecord — structured + observability-aware
{"timeUnixNano":..., "severityNumber":17, "severityText":"ERROR", "body":{"stringValue":"payment_failed"}, "attributes":[{"key":"merchant","value":{"stringValue":"M01023"}}, ...], "traceId":"...", "spanId":"..."}
queries: all of the above + native trace-join + resource attributes for cross-service correlation
Illustrative — four wire formats for the same payment_failure event. The prose form supports only substring search; structured forms unlock field-level queries, type-aware comparisons, and trace correlation. Most modern stacks land on JSON-per-line; OTLP adds the trace-context fields the SDK already knows.

The format choice matters less than the discipline. A team that writes JSON with a free-text msg field and ad-hoc attribute names has chosen the wire format without choosing the structure, and the result behaves like prose dressed up as JSON. The reason JSON wins as wire format is not that it is intrinsically better than logfmt or protobuf-LogRecords; it is that every language has a JSON encoder in its standard library, every log-shipping agent (Fluent Bit, Vector, OTel Collector) parses it natively, and every backend (Loki, Elasticsearch, Splunk, ClickHouse) can ingest it without a custom parser. The portability is the point.

Why type preservation at the producer matters: JSON has six types (string, number, boolean, null, object, array) and most backends preserve the producer's type into their internal storage. If amount_paise is written as the JSON number 4280, Loki, Elasticsearch, and ClickHouse all keep it as a numeric column or label and you can ask amount_paise > 50000. If it is written as the JSON string "4280", you get a string comparison, which is lexicographic — "999" sorts after "10000" because '9' beats '1' character by character — and your query returns wrong results without any error. The bug is invisible in dev where amounts are similar magnitudes; it surfaces in production when a ₹999 and a ₹10,000 event compare backwards. The only fix is to enforce typed emission at the application layer, because once the wire bytes are stringified the backend cannot recover the original type.
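
The comparison bug fits in two lines of Python:

# string comparison is lexicographic; numeric comparison is not
print("999" > "10000")   # True  — '9' sorts after '1', character by character
print(999 > 10000)       # False — the numeric answer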

Watching the migration happen — a measurable comparison

The right way to convince a team to migrate is not to argue about it; it is to run both shapes against the same workload and let the numbers do the work. The script below emits 5,000 synthetic payment events twice — once as unstructured prose, once as structured JSON — then runs a series of realistic incident queries against both forms and prints the per-query latency, the result-set quality, and the storage cost. The shape of the numbers matches what you see when migrating a production payments service from f-string to JSON wide events.

# struct_vs_unstruct.py — emit both shapes, run incident queries against each
# pip install orjson
import orjson, gzip, time, random, re
from collections import Counter
from statistics import median

random.seed(7)
N = 5_000
PAYMENTS  = ["UPI", "CARD", "NETBANKING", "WALLET"]
ACQUIRERS = ["razorpay-acq-1", "razorpay-acq-2", "razorpay-acq-3", "razorpay-acq-4"]
REASONS   = ["OK", "GATEWAY_TIMEOUT", "INSUFFICIENT_FUNDS", "RISK_BLOCK", "OK", "OK", "OK"]

# Generate identical event stream, render in two shapes
events = []
for i in range(N):
    events.append({
        "ts": "2026-04-25T11:%02d:%02dZ" % (i // 60 % 60, i % 60),
        "merchant": f"M{random.randint(1, 2000):05d}",
        "method":   random.choice(PAYMENTS),
        "amount":   random.randint(100, 50_000),
        "acquirer": random.choice(ACQUIRERS),
        "reason":   random.choice(REASONS),
        "retries":  random.randint(0, 4),
    })

unstruct = [
    f"{e['ts']} INFO Payment via {e['method']} for merchant {e['merchant']} "
    f"amount {e['amount']} via {e['acquirer']} result {e['reason']} retries {e['retries']}"
    for e in events
]
struct = [orjson.dumps({"event": "payment", **e}).decode() for e in events]

def size_kb(lines): return sum(len(l) for l in lines) / 1024
def gz_kb(lines):   return len(gzip.compress("\n".join(lines).encode())) / 1024

# Three incident queries — answered against each shape
def q_unstruct_count_timeout(lines):
    return sum(1 for l in lines if "GATEWAY_TIMEOUT" in l)
def q_struct_count_timeout(lines):
    return sum(1 for l in lines if (j := orjson.loads(l)).get("reason") == "GATEWAY_TIMEOUT")
def q_unstruct_top_acq(lines):
    pat = re.compile(r"via (razorpay-acq-\d) result GATEWAY_TIMEOUT")
    return Counter(m.group(1) for l in lines if (m := pat.search(l))).most_common(3)
def q_struct_top_acq(lines):
    return Counter(j["acquirer"] for l in lines
                   if (j := orjson.loads(l))["reason"] == "GATEWAY_TIMEOUT").most_common(3)
def q_unstruct_p50_amt(lines):
    pat = re.compile(r"amount (\d+) via \S+ result GATEWAY_TIMEOUT")
    vals = [int(m.group(1)) for l in lines if (m := pat.search(l))]
    return median(vals) if vals else None
def q_struct_p50_amt(lines):
    vals = [j["amount"] for l in lines
            if (j := orjson.loads(l))["reason"] == "GATEWAY_TIMEOUT"]
    return median(vals) if vals else None

def time_it(fn, *args):
    t = time.perf_counter()
    out = fn(*args)
    return out, (time.perf_counter() - t) * 1000

print(f"events generated  : {N:,}")
print(f"unstruct raw size : {size_kb(unstruct):,.1f} KB / gzip {gz_kb(unstruct):,.1f} KB")
print(f"struct   raw size : {size_kb(struct):,.1f} KB / gzip {gz_kb(struct):,.1f} KB")
print()
print(f"{'query':32}  {'unstruct':>20}  {'struct':>20}")
for name, u_fn, s_fn in [
    ("count(reason=GATEWAY_TIMEOUT)", q_unstruct_count_timeout, q_struct_count_timeout),
    ("top-3 acquirers by timeout",    q_unstruct_top_acq,       q_struct_top_acq),
    ("p50 amount on timeout",         q_unstruct_p50_amt,       q_struct_p50_amt),
]:
    u_out, u_ms = time_it(u_fn, unstruct)
    s_out, s_ms = time_it(s_fn, struct)
    print(f"{name:32}  {str(u_out)[:14]:>14} ({u_ms:4.1f}ms)  "
          f"{str(s_out)[:14]:>14} ({s_ms:4.1f}ms)")

Sample run on a 2024 MacBook Air:

events generated  : 5,000
unstruct raw size : 591.8 KB / gzip 78.4 KB
struct   raw size : 738.2 KB / gzip 86.1 KB

query                              unstruct                struct
count(reason=GATEWAY_TIMEOUT)             714 ( 1.2ms)          714 ( 7.8ms)
top-3 acquirers by timeout      [('razorpay-a ( 4.1ms)  [('razorpay-a ( 8.4ms)
p50 amount on timeout                   25238 ( 4.6ms)         25238 ( 8.0ms)

Three observations — and only the first one is the obvious one. First, both shapes return identical results, so structuring is not about correctness on the easy queries; it is about which queries are possible at all without writing a new regex. Second, the structured form is mildly slower in this micro-benchmark (roughly 8ms vs 4ms) because orjson.loads parses the whole record while the regex short-circuits on the substring — but real backends invert this: Loki and Elasticsearch parse JSON once at ingest and store typed columns, so the per-query parse is amortised to zero. The micro-benchmark is misleading because it doesn't model the storage layer. Third, the structured form is 25% larger on raw bytes but only 10% larger after gzip, because JSON's structural overhead ({, }, ", :, key names) compresses aggressively against itself. The 10% storage premium is the tax you pay for queryability, and it is the cheapest tax in observability.

The third query — p50 amount on timeout — is the one to stare at. The unstructured version requires a hand-crafted regex (amount (\d+) via \S+ result GATEWAY_TIMEOUT) that has to encode the entire surrounding sentence shape, including the order of fields. If next month a developer adds a new field between amount and via, the regex silently returns no rows and the panel quietly reads zero. The structured version is j["amount"] on the events that match j["reason"] == "GATEWAY_TIMEOUT" — there is no positional encoding, no shape assumption, no fragility. The "structured logs are easier to query" argument usually focuses on syntax sugar; the real argument is that structured queries do not depend on the textual layout of the message and therefore do not silently break when the message changes.

Why the size penalty disappears in production: in the synthetic above, gzip compresses the JSON keys ("merchant", "amount", "acquirer") along with the values, getting a 7-9× ratio. In a real Loki or ClickHouse store, the schema-aware columnar layout (Loki's structured-metadata, ClickHouse's JSONEachRow with MergeTree, Elasticsearch's _source + per-field codec) stores the keys exactly once per stream — they are not repeated per record at all. The on-disk size of structured logs is therefore typically smaller than the equivalent unstructured prose for the same information content, because the field names get factored out and the values pack into typed columns with type-specific compression. The 25% wire-size penalty is real on the wire; the 10% gzip penalty is real on cold storage; the on-disk penalty after the backend's columnar encoding is typically negative.
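
A back-of-envelope way to see the key-factoring effect (not how Loki or ClickHouse actually encode chunks, just the shape of the saving) is to compare row-oriented JSON against the same records pivoted so each key is stored once:

# pip install orjson — rough sketch of row-oriented vs key-factored layout
import gzip
import orjson

records = [{"merchant": f"M{i % 2000:05d}", "amount": 100 + 7 * i,
            "reason": "GATEWAY_TIMEOUT" if i % 7 == 0 else "OK"}
           for i in range(5_000)]

# Row form: keys repeated in every record (the wire shape)
rows = b"\n".join(orjson.dumps(r) for r in records)
# Column form: each key stored once, values packed together
cols = orjson.dumps({k: [r[k] for r in records] for k in records[0]})

print(f"rows: {len(rows):,} B raw / {len(gzip.compress(rows)):,} B gzip")
print(f"cols: {len(cols):,} B raw / {len(gzip.compress(cols)):,} B gzip")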

Migrating a real codebase — patterns and pitfalls

The hardest part of a migration is not the new log calls; it is the existing thirteen thousand logger.info(f"...") calls that pre-date the discipline. A typical Indian-fintech monolith that has accumulated log calls over five or six years has somewhere between 8,000 and 30,000 of them, distributed across hundreds of files, written by dozens of engineers in three different style eras. The naive approach — open every file and rewrite — fails because nobody has the budget to do that, and even if they did, no-one knows which calls are still live in production traffic. The migrations that have actually shipped at scale (Razorpay 2022, Swiggy 2023, Flipkart 2024) all follow roughly the same pattern, in roughly the same order:

The first move is lint-level enforcement on new calls. A pre-commit hook, a Bandit-style AST checker, or a lightweight grep rule in CI fails any PR that introduces logger.<level>(f"...") or string concatenation inside a log call. New code stops adding to the unstructured pile. This catches roughly 80% of the future drift with one config change and zero refactor work, and it is the only step you should take before doing anything else — without it, the migration is a sieve.
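
A minimal sketch of such a checker, using only the standard-library ast module; the rule is illustrative, and a production version would also catch %-formatting and .format() inside log calls:

# lint_log_calls.py — fail CI when a log call embeds an f-string
import ast
import sys

LEVELS = {"debug", "info", "warning", "error", "critical", "exception"}

def check_file(path: str) -> list[str]:
    with open(path) as f:
        tree = ast.parse(f.read(), filename=path)
    problems = []
    for node in ast.walk(tree):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and node.func.attr in LEVELS
                and node.args
                and isinstance(node.args[0], ast.JoinedStr)):  # f-string literal
            problems.append(f"{path}:{node.lineno}: f-string inside log call")
    return problems

if __name__ == "__main__":
    hits = [p for f in sys.argv[1:] for p in check_file(f)]
    print("\n".join(hits))
    sys.exit(1 if hits else 0)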

The second move is a shared structured-logging helper that the codebase calls instead of the raw logger. Something like audit_log("payment_failed", merchant=m, amount=a, reason=r) that internally calls logger.bind(**kwargs).info(event) and enforces the schema (merchant is always a string, amount is always paise-integer, reason is always one of a known enum). Building this helper is a one-day task; getting it adopted is a six-month task because every team has to rewrite their hot-path log calls. The helper is also where you put the trace-context binding (auto-attach trace_id/span_id from the active OTel span), the PII redaction (auto-redact any pan or aadhaar field), and the schema versioning (schema_version=2 on every line so the agent can branch on parsing).
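
A sketch of what the helper's core could look like. The schema table, field names, and loguru backend are illustrative assumptions; the trace-context binding and PII redaction described above are omitted for brevity.

# audit.py — schema-enforcing wrapper over the raw logger
import sys
from loguru import logger

logger.remove()
logger.add(sys.stdout, serialize=True)

SCHEMA = {  # event name -> required fields and their types (illustrative)
    "payment_failed": {"merchant": str, "amount": int, "reason": str},
    "risk_decision":  {"user": str, "score": int, "verdict": str},
}

def audit_log(event: str, **fields) -> None:
    spec = SCHEMA[event]  # unknown event names fail loudly, by design
    for name, typ in spec.items():
        if not isinstance(fields.get(name), typ):
            raise TypeError(f"{event}.{name} must be {typ.__name__}")
    logger.bind(schema_version=2, **fields).info(event)

audit_log("payment_failed", merchant="M01023", amount=4280, reason="GATEWAY_TIMEOUT")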

The third move is agent-side parsing for the legacy lines. The Vector or Fluent Bit pipeline gets a transform that runs every unstructured line through a regex set keyed by the source file — payments-api lines match one regex, risk-engine lines match another, legacy-php-monolith lines match a third. The regex extracts the variable parts and synthesises a structured envelope around them, so by the time the line reaches Loki it is JSON-shaped even if the application is still emitting prose. This is the bridge that lets the dashboards switch to structured queries before the application migration completes. The regex set is fragile — every logger.info change can break a regex — but it is acceptable as a transition because the regex set decays naturally as the application code migrates. Razorpay's 2022 transition shipped about 140 such regexes covering 92% of their log volume, and by the end of 2023 only 18 remained as the application-side migration caught up.

# vector_parse_legacy.py — what the agent-side regex extraction looks like
# pip install regex orjson
import regex as re, orjson

LEGACY_PATTERNS = [
    # payments-api f-string from 2019
    (re.compile(
        r"^(?P<ts>\S+ \S+) (?P<level>\w+) Payment via (?P<method>\w+) "
        r"for merchant (?P<merchant>M\d+) amount (?P<amount>\d+) via "
        r"(?P<acquirer>\S+) result (?P<reason>\w+) retries (?P<retries>\d+)$"),
     {"event": "payment", "service": "payments-api"}),
    # risk-engine pre-2021 style
    (re.compile(
        r"^(?P<ts>\S+ \S+) (?P<level>\w+) Risk decision for "
        r"(?P<user>U\d+) score=(?P<score>\d+) verdict=(?P<verdict>\w+)$"),
     {"event": "risk_decision", "service": "risk-engine"}),
]

def parse_legacy(line: str) -> bytes | None:
    for pattern, defaults in LEGACY_PATTERNS:
        m = pattern.match(line.strip())
        if not m:
            continue
        d = m.groupdict()
        for k in ("amount", "score", "retries"):
            if k in d: d[k] = int(d[k])
        return orjson.dumps({**defaults, **d})
    return None  # let it through as raw if no pattern matches

samples = [
    "2026-04-25 11:23:14 INFO Payment via UPI for merchant M01023 amount 4280 via razorpay-acq-3 result GATEWAY_TIMEOUT retries 3",
    "2026-04-25 11:23:15 WARN Risk decision for U7821 score=82 verdict=REVIEW",
    "2026-04-25 11:23:16 INFO Cache hit ratio 0.94 over last minute",  # no pattern
]
for s in samples:
    out = parse_legacy(s)
    print((out or s.encode()).decode()[:120])

Sample output:
{"event":"payment","service":"payments-api","ts":"2026-04-25 11:23:14","level":"INFO","method":"UPI","merchant":"M01023","amoun
{"event":"risk_decision","service":"risk-engine","ts":"2026-04-25 11:23:15","level":"WARN","user":"U7821","score":82,"verdict":
2026-04-25 11:23:16 INFO Cache hit ratio 0.94 over last minute

Two of three legacy lines are now structured at the agent; the third was not in the pattern set and falls through unchanged, where it stays grep-able but not group-able. Vector configs do exactly this with their parser transform — the Python above is the algorithm, the production version is YAML. The fall-through behaviour matters: a line that fails to parse should never be dropped, because a missing pattern is a parsing bug, not a data-quality decision, and the cost of losing forensic data is much higher than the cost of keeping a few unparsed lines in the index.

The fourth move is measurable adoption tracking. A weekly metric — sum(rate(log_lines_total{shape="structured"}[5m])) / sum(rate(log_lines_total[5m])) — that tells the platform team whether the migration is progressing or stalled. Most teams do not measure this and therefore do not know that their migration plateaued at 60% three months ago. The teams that do measure it — Razorpay, Cred, Swiggy — uniformly report that adoption follows an S-curve: slow for the first three months while teams write the helper and learn the patterns, fast for the next three to four months as the high-frequency call sites convert, then a long flat tail (sometimes years) for the rare-event log calls in low-touch services. The "we are 95% structured" milestone is the one that matters; chasing the last 5% is usually not worth the engineering time and is better handled by the agent-side regex pipeline.
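
The shape label itself can come from a cheap agent-side classification. A sketch of the decision, with the "event" convention and the label names treated as illustrative assumptions:

# shape_of.py — classify each line so the agent can increment
# log_lines_total{shape=...}; the "event" key convention is an assumption
import orjson

def shape_of(line: str) -> str:
    try:
        obj = orjson.loads(line)
    except orjson.JSONDecodeError:
        return "unstructured"      # prose, or broken JSON
    if isinstance(obj, dict) and "event" in obj:
        return "structured"        # constant event name present
    return "semi"                  # JSON-shaped, but no schema discipline

for line in ['{"event":"payment_failed","merchant":"M01023"}',
             '{"msg":"Failed for M01023"}',
             "2026-04-25 11:23:14 INFO Payment failed"]:
    print(shape_of(line), "<-", line[:45])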

Why the agent-side regex bridge is structurally important during migration: the production logging pipeline is read by dashboards, alerts, and on-call queries that cannot wait 18 months for the application code to migrate. If you wait until the application is fully structured before switching the dashboards, you keep paying the cost of unstructured queries in the meantime, and the migration pays back nothing until the very end. The agent-side regex set lets the dashboards switch to structured queries from week one — the data shape they consume is already JSON, even though the data shape the application emits is still prose. The regex layer is throwaway code by design (it dies as the application converts), but its lifetime value is roughly the area between the migration's start and end times multiplied by the per-query cost difference, which for a payments-scale fleet is a number with seven digits in rupees. The regex layer is what makes the migration pay back continuously rather than at the end.

A common mistake during migration is over-eager flattening. Faced with logger.info(f"User {u.id} did {action}"), the temptation is to write logger.bind(user_id=u.id, action=action).info("user_action") and call it done. But u may have ten more attributes (email, segment, signup_date, last_login) that were never in the original log line because the f-string only mentioned id — and now is the moment to decide which of those become attributes. Bind too few and the structured form is a strictly worse log line (lower information content for higher cost); bind too many and the log size triples and your bill follows. The right answer is bind what an incident query would ask — for a payment_failed event, that is merchant_id, amount, method, acquirer, reason, retries; not the user's signup date or email. The discipline is to imagine the dashboard panel or alert query that would consume this log line, and bind exactly the fields that panel needs to filter or group by.
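
A sketch of that discipline for the payment_failed case; the payment object and its extra fields are illustrative:

# bind exactly what an incident query would filter or group by
import sys
from types import SimpleNamespace
from loguru import logger

logger.remove()
logger.add(sys.stdout, serialize=True)

p = SimpleNamespace(merchant_id="M01023", amount_paise=4280, method="UPI",
                    acquirer="razorpay-acq-3", reason="GATEWAY_TIMEOUT",
                    retries=3, email="a@example.com", signup_date="2021-03-14")

logger.bind(
    merchant_id=p.merchant_id, amount_paise=p.amount_paise, method=p.method,
    acquirer=p.acquirer, reason=p.reason, retries=p.retries,
    # deliberately NOT bound: p.email, p.signup_date — no dashboard
    # panel or alert query filters or groups by them
).error("payment_failed")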

What the agent and backend do with structure

The reason structured logs are queryable is not magic; it is that the agent and backend can build typed indexes and columnar storage from the JSON shape. Walking through what each layer does makes the cost-benefit obvious.

The agent (Vector, Fluent Bit, OTel Collector) reads the JSON line, validates it as a parseable object, applies any per-field transforms (PII redaction, attribute renaming for schema migration, dropping fields that exceed size limits), and ships it to the backend with the parsed structure preserved. A non-structured line goes through the same agent but as a raw string — the agent has nothing to act on, so it cannot redact PII reliably (regex on prose is best-effort), cannot apply field-level rate limits (it can only rate-limit by source, not by field-value), cannot route by attribute. The structural difference at the agent is roughly 5–10× more configuration possibilities for structured input, and that headroom is what makes the bill controllable as the fleet grows.

The backend does the heavy lifting. Loki's structured_metadata feature (introduced in Loki 3.0) stores parsed JSON attributes as queryable metadata alongside the chunk, separate from the indexed labels — {service="payments-api"} | merchant="M01023" uses the chunk-side metadata index, which is fast on recent data and scans-with-skip on older chunks. Elasticsearch's per-field _source with optional index: true per attribute gives you the same property with different mechanics: the JSON document is parsed at ingest, the indexed fields go into Lucene's inverted index, the rest stays in _source. ClickHouse's JSONEachRow ingest with JSON column type produces a fully typed columnar layout where every distinct attribute path becomes its own dynamically-typed column with its own per-column compression codec — the densest layout of the three, at the cost of a less-mature observability ecosystem.

What the agent and backend do with a structured vs unstructured log line:

STRUCTURED: application (JSON-per-line, typed fields) → agent (PII redact, rename, type-validate) → backend (labels: service, level; structured-metadata: merchant, reason, acquirer; chunked body) → queryable as field=value, range, group_by, rate.

UNSTRUCTURED: application (f-string prose, no schema) → agent (best-effort regex parse, may fail) → backend (labels: service, level only; body: opaque chunk) → queryable as substring grep, no aggregations.

The structural choice at emission time decides what every layer downstream can do: structured input keeps every option open at every layer; unstructured locks the pipeline into grep-only at the very first step.
Illustrative — what each layer of the log pipeline can do with a structured vs unstructured line. The choice the application makes at emission time determines the option-set at every downstream layer. Once a line is unstructured at the application, no amount of agent or backend cleverness recovers the lost queryability.

The cardinality conversation lives at the boundary between structured-metadata and labels. Labels in Loki, indexed fields in Elasticsearch — the fields the backend builds an inverted index on — must be low-cardinality (typically under 50 distinct values per label). Service name, level, region, environment, cluster: these are good labels. High-cardinality fields — merchant_id with 2 million values, user_id with 14 million, trace_id with billions — must live in structured-metadata or _source, where they are queryable but not indexed. Putting merchant_id as a Loki label produces 2 million streams and breaks the ingester; putting it as structured-metadata produces zero new streams and lets you filter by it inside a label-scoped query ({service="payments-api"} | merchant_id="M01023"). The label-vs-payload split is the same calculus as for metric labels, and the same misconfiguration kills both kinds of backend in the same way.

A subtle property of structured logs is that they make log-to-metric extraction cheap and consistent. If the application emits {"event":"payment_failed","reason":"GATEWAY_TIMEOUT","duration_ms":830}, the backend can extract a counter such as sum by (reason) (rate({service="payments-api"} | json | event="payment_failed" [5m])) and a latency quantile such as quantile_over_time(0.99, {service="payments-api"} | json | event="payment_succeeded" | unwrap duration_ms [5m]) directly from the log stream — no parallel metric emission needed. This pattern lets a service that ships only structured logs derive every metric it needs from the log pipeline, and is the technical foundation of the "wide events" school of observability that Honeycomb has been advocating since 2018. The trade is freshness (log-derived metrics are typically 5–30 seconds behind native Prometheus counters) and write cost (logs are 11,000× more expensive per event than metrics, so the freshness premium is paid in volume), so the right answer is usually a hybrid — native metrics for the high-frequency, latency-critical alerts; log-derived metrics for everything else.

Going deeper

What the OpenTelemetry Logs Data Model adds and why

The OpenTelemetry Logs Data Model is the most recent and most complete attempt to standardise the structured-log schema across languages, frameworks, and backends. It is worth reading the spec end-to-end (it is short — about 30 pages) because every field in it earns its place. timeUnixNano — nanosecond-precision timestamp, no timezone string to parse. observedTimeUnixNano — when the agent saw the log, separate from when the application produced it, so clock skew and shipping latency can be measured. severityNumber — integer 1-24 mapping to TRACE/DEBUG/INFO/WARN/ERROR/FATAL with sub-levels, replacing the inconsistent info/INFO/Info/20 strings that every codebase has at least three of. severityText — the original level string, kept for human readability. body — a structured value (string, number, object, array) holding the event payload, which can be a free-form message or a structured dictionary. attributes — a list of typed key-value pairs attached to the record. traceId and spanId — first-class binary identifiers that connect this log record to the active span at emission time. resource — attributes about the emitting process (service.name, host.name, k8s.pod.name) that apply to every log record from this resource and are therefore stored once per resource, not once per record. scope — instrumentation library name and version, so a pip install of an instrumentation package can be tracked across the fleet. The schema is richer than logfmt, richer than plain JSON, and supported natively by every modern collector — and adopting it gives every log record the same fields, which is what makes cross-service queries possible without ad-hoc field translation.
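
For concreteness, a sketch of a single LogRecord in OTLP/JSON shape. The values are illustrative, and in the real protocol the resource and scope live on the enclosing ResourceLogs/ScopeLogs envelopes rather than on the record itself (64-bit integers are serialised as strings per the proto3 JSON mapping):

# an OTLP-shaped LogRecord as a plain Python dict (illustrative values)
record = {
    "timeUnixNano":         "1774437794000000000",  # produced by the app
    "observedTimeUnixNano": "1774437794120000000",  # seen by the agent
    "severityNumber": 17,                           # ERROR band starts at 17
    "severityText": "ERROR",                        # original level string
    "body": {"stringValue": "payment_failed"},      # constant event name
    "attributes": [
        {"key": "merchant",     "value": {"stringValue": "M01023"}},
        {"key": "amount_paise", "value": {"intValue": "4280"}},
        {"key": "retries",      "value": {"intValue": "3"}},
    ],
    "traceId": "4bf92f3577b34da6a3ce929d0e0e4736",  # joins to the active span
    "spanId":  "00f067aa0ba902b7",
}
print(record["body"]["stringValue"], record["severityNumber"])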

How Loki's structured_metadata changes the cost story

Loki's structured_metadata feature (released in Loki 3.0, 2024) is a quiet but substantial change to the cost model of structured logging. Before structured_metadata, the only way to make a JSON field queryable was to either promote it to a label (which paid cardinality cost, often unbearably) or to scan the entire chunk body at query time with | json | field=value (which paid scan cost, often slowly). Structured metadata is a third option: parsed JSON attributes are stored alongside the chunk in a per-chunk index that is faster to scan than the chunk body but doesn't pay the cardinality cost of a label. The empirical result, from Grafana's own benchmarks and Cred's 2024 platform-team report, is that a query like {service="payments-api"} | merchant_id="M01023" against structured-metadata runs about 3-5× faster than the equivalent against the chunk body, and the storage overhead is typically 5-10% of the chunk size — i.e., much cheaper than the same data as labels. The lesson is that as the backends evolve, the right place for high-cardinality structured fields keeps getting closer to first-class queryability without paying the full cardinality cost. Choosing the right backend version matters as much as choosing the right schema.

The Honeycomb wide-events school

Charity Majors and Liz Fong-Jones have been advocating a position called wide events since 2018 — the idea that every log line should be a high-cardinality, high-attribute structured record with everything you might ever need to query, all in one event, rather than the traditional split of "logs for events, metrics for counts, traces for spans". The argument is that disk is cheap, a wide event is a superset of all three primitives (you can derive metrics by aggregation, derive traces by span_id correlation, query as logs), and the operational cost of maintaining three parallel pipelines is much higher than the storage cost of one fat one. Honeycomb's Refinery is the tool that makes this practical at scale — it does tail-based sampling of wide events with full attribute retention on errors and slow requests, and lets you query the resulting dataset with field-level filters and aggregations. Whether you adopt the wide-events position or stay with the three-pillars model, the structural primitive is the same: a log line is a structured record with every attribute the producer cared about, and the choice is downstream — how much of it you keep, how you query it, how you aggregate it. Wide events is what structured logging looks like when you take it all the way.

Schema versioning — when fields move and what happens

A schema choice you make in 2026 will be wrong in 2028. Field names get renamed (user_id becomes customer_id after the company segments its users), types change (amount becomes amount_paise to disambiguate units), nesting shifts (merchant: "M01023" becomes merchant: {id: "M01023", region: "south"}). Every codebase that does structured logging for more than a year accumulates schema drift, and the question is how to handle it gracefully. The pattern that has held up is schema_version as a first-class field — every record carries schema_version: 2 (or 3, or 7) and the agent or query layer branches on it. The agent's transform rules can normalise old-version records to the current schema (rename user_id → customer_id, move flat fields into nested objects), so dashboards don't have to. The cost is one extra field per record (~6 bytes) and the discipline of incrementing the version every time the schema changes; the benefit is that schema migration becomes an agent config change rather than a coordinated cross-team rewrite of every dashboard. Razorpay's payments service is on schema_version 7 as of early 2026; their dashboards query the canonicalised current schema and the agent does the version-to-version coercion.
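
A sketch of the agent-side coercion; the version steps are invented for illustration, not Razorpay's actual history:

# upgrade.py — normalise old-version records to the current schema
def upgrade(record: dict) -> dict:
    v = record.get("schema_version", 1)
    if v < 2 and "user_id" in record:                    # v1 -> v2: rename
        record["customer_id"] = record.pop("user_id")
    if v < 3 and isinstance(record.get("merchant"), str):
        record["merchant"] = {"id": record["merchant"]}  # v2 -> v3: nest
    record["schema_version"] = 3
    return record

print(upgrade({"schema_version": 1, "user_id": "U7821", "merchant": "M01023"}))
# {'schema_version': 3, 'merchant': {'id': 'M01023'}, 'customer_id': 'U7821'}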

When unstructured is the right choice

There are exactly two situations where unstructured logs are the right choice and structured is overkill: kernel and very-low-level system logs (where the producer is printk or syslog(3) and there is no application-level discretion to add structure) and single-developer scripts that will never run again. Outside those, the default should be structured. The argument "structured is overkill for this small service" is almost always wrong because services rarely stay small; a service that handles 50 RPS today handles 5,000 RPS in two years if the company is doing anything right, and the cost of retrofitting structure into a five-year-old service is vastly higher than the cost of writing it structured from the start. The other argument — "we will add structure when we need it" — is also almost always wrong because the moment you need it is the moment of an incident, and rewriting the log calls during an incident is not a viable plan. Default to structured; use unstructured only when the producer is genuinely outside your control.

# Reproduce this on your laptop
python3 -m venv .venv && source .venv/bin/activate
pip install orjson regex
python3 struct_vs_unstruct.py
python3 vector_parse_legacy.py
# Expected: identical query results across both shapes; structured queries
# are simpler and don't depend on positional encoding of the message text.
# To compare in a real backend, ship structured JSON to a local Loki via
# python-logging-loki and run the same queries via logcli.

Where this leads next

The next chapter in this section moves from the schema-and-shape question to the query language that operates on structured logs at scale — LogQL's grammar, what it can and cannot express, and the structural reason that field-level queries on structured logs are 10-100× faster than substring searches on unstructured ones. The discipline of structured emission this chapter describes is the foundation; the queries the next chapter writes are what make the discipline pay off in practice.
