Note: Company names, engineers, incidents, numbers, and scaling scenarios in this article are hypothetical — even when they resemble real ones. See the full disclaimer.

Semantic conventions

It is 04:30 IST at a hypothetical Hotstar streaming-platform NOC and Karan, the on-call SRE for the playback service, is staring at a TraceQL query that should return the slowest playback initiations from the last hour and is returning nothing. The query is {http.method = "GET" && http.route = "/play/{contentId}"}. He has watched playback latency p99 climb from 180 ms to 950 ms on the dashboard above his desk for the last twelve minutes; he knows the traces are there because the QPS panel shows 240k requests/sec hitting the endpoint. The TraceQL just refuses to find them. Twenty-three minutes later, after pulling a single span out of Tempo by trace ID, he sees the attributes: http.request.method = "GET", url.template = "/play/{contentId}". Neither of his query keys exists on this span. The Java service was upgraded to opentelemetry-javaagent-2.5.0 last weekend, which moved the agent from semantic-conventions v1.20.0 (http.method, http.route) to v1.27.0 (http.request.method, url.template), and the dashboards, alerts, and runbooks that the team had built over two years all still query the old keys. The data is being captured. Nothing in Grafana can find it.

Semantic conventions are the OpenTelemetry specification that defines the names and meanings of the attributes that auto-instrumentation and manual instrumentation are supposed to put on spans, metrics, and logs. They answer questions like "what key holds the HTTP status code?" (http.response.status_code, since v1.21), "what holds the database name?" (db.namespace, since v1.24, replacing the older db.name), "what holds the messaging system?" (messaging.system), and hundreds more. The conventions themselves are not code — they are a YAML-driven spec at github.com/open-telemetry/semantic-conventions — but every SDK, every auto-instrumentation package, every backend, and every dashboard query that participates in an OTel pipeline is making bets on them. When the conventions evolve (and they do, on a quarterly cadence), every consumer of those bets has to migrate or break.

Semantic conventions are OpenTelemetry's spec for what attribute keys mean — the agreement that lets a span attribute named http.route from a Python Flask service join a metric labelled http.route from a Java Spring service in the same dashboard query. They are versioned (currently v1.30.x), evolve via a deprecation cycle, and are split into "stable" attributes you can rely on and "experimental" attributes that will rename. A fleet that pins semantic-convention versions per service and ships a Schema URL with every span survives SDK upgrades; a fleet that does not has dashboards that silently go blank when an agent jumps a minor version.

What semantic conventions are — a spec, not code

The OpenTelemetry semantic-conventions repo is a YAML database of attribute definitions. Each attribute has a key (http.request.method), a stability level (stable, experimental, deprecated), a type (string, int, double, boolean, string array), allowed values (for enums — http.request.method is one of GET, POST, PUT, ...), a brief description, and a list of requirement levels per signal type (required on HTTP server spans, recommended on HTTP client spans, opt-in on metrics). The repo currently holds about 1,300 such attributes spread across thirty-odd domains: http, db, messaging, rpc, network, process, host, cloud, k8s, aws, azure, gcp, faas, feature_flags, genai, and a long tail of others.

The YAML is then code-generated into language-specific constants. In Python, the Flask instrumentation imports HTTP_REQUEST_METHOD from opentelemetry.semconv.attributes.http_attributes, a constant whose string value is "http.request.method". In Java, it imports HttpAttributes.HTTP_REQUEST_METHOD. In Go, semconv.HTTPRequestMethodKey. Every SDK and instrumentation library compiles in a snapshot of the conventions at build time, which is why upgrading the agent jar is what shifts the attribute names — the compiled-in constants are different.
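
You can see the codegen output directly by printing the constants. A minimal sketch, assuming the opentelemetry-semantic-conventions package that ships alongside the SDK; the module paths have moved between releases, so treat the imports as illustrative of the current layout rather than a frozen API.

# semconv_constants.py -- the generated constants are just strings; printing them
# shows which attribute names your installed package was generated against.
# pip install opentelemetry-semantic-conventions
from opentelemetry.semconv.attributes import http_attributes   # stable, new-spec keys
from opentelemetry.semconv.trace import SpanAttributes         # older generated class, keeps deprecated keys

print(http_attributes.HTTP_REQUEST_METHOD)         # -> "http.request.method"   (new spec)
print(http_attributes.HTTP_RESPONSE_STATUS_CODE)   # -> "http.response.status_code"
print(SpanAttributes.HTTP_METHOD)                  # -> "http.method"           (old spec, deprecated)
print(SpanAttributes.HTTP_STATUS_CODE)             # -> "http.status_code"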

From spec to query — what a single attribute name has to survive. The same key flows through five layers; an upgrade at any one layer can rename it under you.
  • Layer 1 — the YAML spec: id: http.request.method, type: string, stability: stable, requirement_level: required.
  • Layer 2 — codegen into language constants: Python HTTP_REQUEST_METHOD = "http.request.method", Java HTTP_REQUEST_METHOD, Go HTTPRequestMethodKey.
  • Layer 3 — instrumentation libraries use the constant: span.set_attribute(HTTP_REQUEST_METHOD, request.method) in Flask, FastAPI, requests, and the rest.
  • Layer 4 — OTLP on the wire: KeyValue { key: "http.request.method", value: { string_value: "GET" } } as protobuf bytes.
  • Layer 5 — backend index and query: { http.request.method = "GET" } in TraceQL / PromQL / SQL — backends like Tempo, Honeycomb, and Datadog index by attribute key, so your dashboard query keys this name.
Illustrative — semantic conventions live as YAML in the spec, get codegen'd into language constants, get baked into instrumentation libraries at build time, travel as protobuf KeyValue pairs over OTLP, and end up as query keys in the backend. A version bump at layer 1 cascades down to layer 5, but only at the speed of upstream redeploys — which is uneven across a fleet.

The conventions themselves split into two categories that the spec is careful to distinguish but most tutorials are not. Resource attributes describe the thing emitting telemetry — the service, the host, the container, the cloud region. They go on the OTLP Resource once per process and apply to every span, metric, and log it emits. The standard set is service.name (required), service.version, service.instance.id, service.namespace, plus the host.*, os.*, process.*, container.*, k8s.*, cloud.* families when running in those environments. Signal attributes describe the event being recorded — the HTTP request, the DB query, the Kafka publish. They go on each span individually. http.request.method is a signal attribute; service.name is a resource attribute. Why this split matters for your TSDB bill: resource attributes are emitted once per process and are typically low-cardinality (a service has one name, one version, runs on one host at a time). Signal attributes are emitted once per event and can blow cardinality if the wrong attribute is promoted to a metric label — http.request.method has a handful of possible values (low cardinality, safe), but http.target (the concrete URL path) has thousands per service (high cardinality, dangerous). The spec encodes this in requirement levels: costly or high-cardinality signal attributes are marked requirement_level: opt_in on the metrics signal, meaning they should stay off metric labels unless you deliberately opt in — most teams ignore this and pay for it later.
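
The cardinality argument is easy to make concrete. The sketch below is illustrative, not a real API: given a handful of hypothetical span attribute maps, it counts distinct values per key and flags anything over a per-label budget as unsafe to promote to a metric label. On real traffic, url.full blows past the budget within minutes while http.request.method and http.route stay flat.

# label_cardinality_check.py -- count distinct values per attribute key over a
# sample of spans; the SAMPLE_SPANS dicts are hypothetical stand-ins for data
# you would pull from a debug exporter or a trace backend.
from collections import defaultdict

SAMPLE_SPANS = [
    {"http.request.method": "GET",  "http.route": "/play/{contentId}", "url.full": "/play/101?token=a1"},
    {"http.request.method": "GET",  "http.route": "/play/{contentId}", "url.full": "/play/202?token=b2"},
    {"http.request.method": "POST", "http.route": "/watchlist",        "url.full": "/watchlist?user=9"},
]
MAX_LABEL_CARDINALITY = 50  # per-key budget; tune to what your TSDB can absorb

def distinct_counts(spans):
    values = defaultdict(set)
    for attrs in spans:
        for key, value in attrs.items():
            values[key].add(value)
    return {key: len(vals) for key, vals in values.items()}

for key, n in sorted(distinct_counts(SAMPLE_SPANS).items()):
    verdict = "safe as metric label" if n <= MAX_LABEL_CARDINALITY else "dangerous as metric label"
    print(f"{key:22} distinct={n:3}  {verdict}")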

The third category, often elided, is schema-bound attribute conventions for events — the structure of event.name, span events with names like exception or feature_flag, and the strict per-event attribute lists those names imply. An exception event must carry exception.type, exception.message, exception.stacktrace; a feature-flag event must carry feature_flag.key, feature_flag.provider_name. These are conventions on the contents of an event, not on the event's container, and they are the part of semantic conventions that backends use to drive specialised UIs (Sentry-style exception views, feature-flag attribution panels). Treat them as conventions just as binding as the HTTP/DB ones — the feature_flag.key panel in Honeycomb breaks the same way Karan's dashboard broke if your code emits featureFlag.key (camelCase, common in JavaScript) instead of the snake-case spec form.
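
The exception convention is usually the first of these a team meets, because the SDK emits it on their behalf. A minimal sketch using the Python SDK: record_exception() adds a span event named exception carrying exception.type, exception.message, and exception.stacktrace, and the manual add_event call writes the feature-flag shape out by hand (the flag and provider names are hypothetical).

# event_conventions_demo.py -- span-event conventions in practice.
# pip install opentelemetry-sdk
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SimpleSpanProcessor, ConsoleSpanExporter

tp = TracerProvider()
tp.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
tracer = tp.get_tracer("event-demo")

with tracer.start_as_current_span("GET /play/{contentId}") as span:
    try:
        raise TimeoutError("origin did not answer within 2s")
    except TimeoutError as exc:
        # SDK helper: emits an event named "exception" carrying the spec's
        # exception.type / exception.message / exception.stacktrace attributes.
        span.record_exception(exc)

    # The same kind of contract written by hand -- note the snake_case spec keys;
    # a camelCase featureFlag.key would break any backend panel keyed on the spec form.
    span.add_event("feature_flag", attributes={
        "feature_flag.key": "playback.low_latency_mode",   # hypothetical flag name
        "feature_flag.provider_name": "internal-flags",    # hypothetical provider
    })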

See semantic conventions on the wire

The cleanest way to internalise semantic conventions is to look at the bytes they put on the OTLP wire. The script below starts an in-process gRPC collector that records every attribute key it receives, creates two tracers pinned to different semconv versions — one writing the older keys (http.method, http.route) and one writing the newer keys (http.request.method, url.template) — emits one span through each, and prints the keys that arrive. The point is to make the version skew Karan hit at 04:30 visible in about sixty lines of Python.

# semconv_skew_demo.py — show how a single concept (HTTP method, HTTP route) lands
# on the wire as different attribute keys depending on the semconv version
# baked into the instrumentation library.
# pip install opentelemetry-sdk opentelemetry-exporter-otlp-proto-grpc grpcio
import time
from concurrent import futures
import grpc
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.resources import Resource
from opentelemetry.proto.collector.trace.v1 import (
    trace_service_pb2, trace_service_pb2_grpc)

# 1) Capture collector that just records every received KeyValue.
CAPTURED_ATTRS = []
class Capture(trace_service_pb2_grpc.TraceServiceServicer):
    def Export(self, req, ctx):
        for rs in req.resource_spans:
            for ss in rs.scope_spans:
                for sp in ss.spans:
                    keys = sorted(a.key for a in sp.attributes)
                    CAPTURED_ATTRS.append({
                        "span_name": sp.name,
                        "scope": ss.scope.name,
                        "scope_version": ss.scope.version,
                        "schema_url": rs.schema_url,
                        "keys": keys,
                    })
        return trace_service_pb2.ExportTraceServiceResponse()
srv = grpc.server(futures.ThreadPoolExecutor(max_workers=2))
trace_service_pb2_grpc.add_TraceServiceServicer_to_server(Capture(), srv)
srv.add_insecure_port("127.0.0.1:24317"); srv.start()

# 2) Two SDK setups: one writing OLD-spec keys, one writing NEW-spec keys.
def make_provider(scope_name: str, scope_version: str, schema_url: str):
    tp = TracerProvider(resource=Resource.create({
        "service.name": "hotstar-playback-api",
        "service.version": "2.5.0",
    }))
    tp.add_span_processor(BatchSpanProcessor(
        OTLPSpanExporter(endpoint="http://127.0.0.1:24317", insecure=True),
        schedule_delay_millis=500))  # default 5 s delay would outlast the sleep below
    return tp.get_tracer(scope_name, scope_version, schema_url=schema_url)

old = make_provider("instr.http.old", "1.20.0",
                    "https://opentelemetry.io/schemas/1.20.0")
new = make_provider("instr.http.new", "1.27.0",
                    "https://opentelemetry.io/schemas/1.27.0")

# 3) Same span, two attribute namings.
with old.start_as_current_span("GET /play/{contentId}") as s:
    s.set_attribute("http.method", "GET")
    s.set_attribute("http.route", "/play/{contentId}")
    s.set_attribute("http.status_code", 200)
with new.start_as_current_span("GET /play/{contentId}") as s:
    s.set_attribute("http.request.method", "GET")
    s.set_attribute("url.template", "/play/{contentId}")
    s.set_attribute("http.response.status_code", 200)
time.sleep(2.0)

for c in CAPTURED_ATTRS:
    print(f"scope={c['scope']:18}  semconv={c['scope_version']:7}  schema={c['schema_url'][-7:]}")
    print(f"   keys: {c['keys']}")
Sample run:
scope=instr.http.old      semconv=1.20.0   schema=1.20.0
   keys: ['http.method', 'http.route', 'http.status_code']
scope=instr.http.new      semconv=1.27.0   schema=1.27.0
   keys: ['http.request.method', 'http.response.status_code', 'url.template']

Four lines of output, four lessons.
  • The two scopes emit semantically identical information using disjoint key sets. A query for http.method matches the first span and not the second; a query for http.request.method matches the second and not the first. This is the failure mode at the top of this chapter: a fleet that runs Java services on opentelemetry-javaagent-1.20 and Python services on an opentelemetry-instrumentation-flask shipped with semconv 1.27 produces traces where the same conceptual attribute exists under two names depending on which service emitted the span. A TraceQL query against the trace tree matches at most half the spans in any given trace, and dashboards built before the upgrade silently lose half their data.
  • The schema_url is the contract. Each Resource and each ScopeSpans block in OTLP carries a schema_url — a URL pointing to the semantic-convention version that the telemetry conforms to. The backend can use this URL to know "this span uses 1.20.0 keys, that span uses 1.27.0 keys" and apply a translation table. Backends that ignore schema_url (most of them, in 2026) cannot translate.
  • The scope version is not the SDK version. opentelemetry-sdk may be 1.30.0 (the SDK version) while the instrumentation scope reports 1.27.0 (the semconv version it was built against). The two version numbers are independent, which is exactly the source of confusion every team hits in their first migration.
  • The span name is identical across both. Span names are not part of semantic conventions — they are convention-by-convention (HTTP server spans should be named after the route template) — but the spec does not enforce naming, and inconsistent naming is its own debugging trap.

The schema_url field is the part of OTLP that exists to solve this problem and that almost nobody uses. The OTel spec defines OpenTelemetry Schema Files — a YAML format that describes how to translate from version N to version N+1 (rename http.method to http.request.method, drop http.flavor in favour of network.protocol.version, etc.). A backend that reads schema_url, fetches the schema file, and applies the translation can present a unified view across versions. Tempo, Honeycomb, Datadog, New Relic — all of them have partial implementations as of 2026; none of them have complete coverage. The pragmatic answer is to read schema_url yourself in your Collector and emit a schema.url attribute on every span so your queries can filter by version explicitly when you are debugging a skew.

The migration cycle — how a name actually changes

The HTTP semantic conventions spent two years migrating from http.method to http.request.method, and the migration ladder is the canonical example for every other convention change. Understanding the ladder is what lets a fleet survive without breaking dashboards.

The cycle has four phases, and each phase is a specific signal that downstream consumers can detect.

Phase 1 — proposal and experimental. A new attribute (http.request.method) is added to the spec marked stability: experimental. The instrumentation libraries do not emit it yet. Backends and dashboards should not query it. This phase typically lasts 3–6 months while the SIG (special-interest group) gathers feedback. The signal: the attribute exists in semantic-conventions/docs/http/ but is marked experimental.

Phase 2 — dual emission opt-in. The instrumentation libraries gain a feature flag (OTEL_SEMCONV_STABILITY_OPT_IN=http/dup) that makes the library emit both the old and new attribute names on every span. This is the safe migration path: you ship the env var, dashboards continue to work against the old keys while you migrate them to the new keys, and once every dashboard is migrated you switch the flag to new-only (or drop it once the default has flipped). HTTP spans carry roughly double the attribute count during this phase, which has a real cost — roughly +200 bytes per span on the OTLP wire, +5–10% TSDB cardinality if the new keys are also promoted to metric labels.

Phase 3 — new-default opt-out. The default flips: the unset default now emits only the new keys, while OTEL_SEMCONV_STABILITY_OPT_IN=http/dup keeps emitting both for fleets still mid-migration. This is the phase Karan's incident lived in — the agent upgrade silently moved the team from the old-keys default of the previous javaagent to the new default, and dashboards that still queried old keys went blank. The signal: the agent's release notes say "default semconv changed to 1.27.0".

Phase 4 — old removed. The old attributes are removed from the spec entirely. Instrumentation libraries no longer emit them under any flag. Backends that still index them keep old data queryable but new data does not populate. This is typically 18–24 months after Phase 1 begins. The signal: the spec marks the old attributes deprecated: removed.

The migration ladder for a renamed attribute (e.g. http.method → http.request.method). Illustrative durations for the http migration: 4 months experimental, 9 months dual emission, 8 months new-default, removal ongoing.
  • Phase 1 — experimental: the new key exists in the spec only; no library emits it. Detect: the spec marks it experimental. Action: do nothing; monitor SIG discussions.
  • Phase 2 — dual-emission opt-in: an env var opts into emitting both old and new keys. Detect: the env var is documented. Action: enable dup; build parallel dashboards on the new keys.
  • Phase 3 — new-default: the default flips to new-only, with an opt-out to keep the old keys. Detect: the agent's release notes. Action: pin the opt-out or finish migrating queries.
  • Phase 4 — old removed: only the new key exists. Detect: the spec marks the old key removed. Action: dashboards must already be on the new keys.
The fleet that pins agent versions per service and tracks the spec changelog migrates through Phase 2 deliberately; the fleet that auto-upgrades agents and ignores the spec discovers Phase 3 the morning after the upgrade — at 04:30 IST.
Illustrative — the four-phase ladder semantic-convention changes follow. Phase 2 (dual emission) is the safe migration window; missing it forces an emergency migration during Phase 3 when dashboards have already broken. The discipline: subscribe to the spec changelog, treat each rename as a planned migration, never let an agent upgrade ship without a compatibility audit.
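
One way to see where a fleet actually sits on this ladder is to classify sampled spans by which key set they carry. The helper below is an illustrative sketch whose key lists cover only the HTTP rename discussed in this chapter; feed it the key lists captured by the skew demo above (or by the audit job in the next section) and it reports, per scope, whether a service is emitting old-only, dual, or new-only keys.

# phase_probe.py -- classify a span's HTTP attribute keys against the migration ladder.
# Key sets cover only the http.method -> http.request.method rename from this chapter.
OLD_KEYS = {"http.method", "http.status_code", "http.route"}
NEW_KEYS = {"http.request.method", "http.response.status_code", "url.template"}

def migration_phase(span_keys):
    has_old, has_new = bool(span_keys & OLD_KEYS), bool(span_keys & NEW_KEYS)
    if has_old and has_new:
        return "dual emission (dup flag set)"
    if has_old:
        return "old keys only (pre-migration or opted out)"
    if has_new:
        return "new keys only (post-flip)"
    return "no HTTP keys (not an HTTP span)"

print(migration_phase({"http.method", "http.route", "http.status_code"}))
print(migration_phase({"http.request.method", "url.template"}))
print(migration_phase({"http.method", "http.request.method", "http.route", "url.template"}))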

The discipline this imposes on a fleet is that agent version, semconv version, and dashboard query keys are three coupled variables that must move together. A typical Razorpay-shape platform team holds this coupling in a services.yaml somewhere: each service declares the agent version it runs, the semconv version that agent ships, and the dashboard JSON files that query its spans. A pre-merge check refuses to bump an agent version unless the dashboard files have been audited against the new semconv. It is bureaucratic; it is also what keeps the on-call out of the 04:30 IST blast zone. Teams that try to skip the bookkeeping discover that "we'll just upgrade and fix anything that breaks" is a 3-day incident, not a 30-minute one — because the breakage manifests as silently empty dashboards, not as a noisy error, and "silently empty" can take hours to notice.
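
A minimal sketch of that pre-merge check, assuming a hypothetical services.yaml of the shape described above and Grafana-style dashboard JSON files; the retired-key table is an illustrative subset, and the check is a crude substring scan rather than a query parser.

# check_agent_bump.py -- fail the build if a service's dashboards still query keys
# retired by the semconv version its agent now ships.
# services.yaml (hypothetical shape):
#   playback-api:
#     agent_version: "2.5.0"
#     semconv_version: "1.27.0"
#     dashboards: ["dashboards/playback.json"]
import json, sys, yaml

RETIRED_KEYS = {  # illustrative subset: keys no longer emitted at this semconv version
    "1.27.0": ["http.method", "http.status_code", "http.target"],
}

def audit(services_file):
    failures = 0
    for name, svc in yaml.safe_load(open(services_file)).items():
        retired = RETIRED_KEYS.get(svc["semconv_version"], [])
        for dash_path in svc.get("dashboards", []):
            flat = json.dumps(json.load(open(dash_path)))  # flatten queries into one searchable string
            for key in retired:
                if key in flat:
                    print(f"FAIL {name}: {dash_path} still queries '{key}' "
                          f"(retired at semconv {svc['semconv_version']})")
                    failures += 1
    return failures

if __name__ == "__main__":
    sys.exit(1 if audit("services.yaml") else 0)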

The same migration ladder applies to every other convention rename. db.name is becoming db.namespace. messaging.kafka.destination.partition became messaging.destination.partition.id. The old net.* attributes were split across the server.*, client.*, and network.* families. Each rename runs its own four-phase cycle, possibly overlapping with others. A fleet that is mid-migration on three conventions at once is normal, not exceptional, and the way to survive is to track each convention's phase in the same services.yaml.

Semantic conventions also define requirement levels per attribute per signal. The four levels are required (the span/metric/log MUST set this attribute), conditionally required (MUST set under specific conditions, e.g. http.response.status_code is required only if the response was sent), recommended (SHOULD set, useful but not load-bearing), and opt-in (MAY set, expected to be off by default because of cost or privacy). A library that fails to emit a required attribute is non-conformant; a library that emits an opt-in attribute by default is also non-conformant in the other direction.

The required-vs-opt-in distinction is exactly the cardinality safety mechanism the spec uses. http.request.method is required (low cardinality, six values, safe). http.route is required for server spans (templated, ~50 values per service, safe). url.full is opt-in (high cardinality, can include query params and path params, dangerous). The spec is telling you, in the requirement level itself, which attributes are safe to use as metric labels and which are not. The spanmetrics connector (ch.91) and other span-to-metric converters should respect this and refuse to label on opt-in attributes by default. Most do not — most accept whatever attribute list you configure — and this is how Hotstar, Razorpay, and Flipkart all separately discovered cardinality bombs in their first OTel rollouts.
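
A configuration check can enforce that refusal even when the converter itself does not. The sketch below works against an assumed requirement-level table (hard-coded here; in practice it would be generated from the same spec YAML that produces the language constants) and rejects any configured metric label whose requirement level on the metrics signal is opt_in or unknown.

# label_guard.py -- refuse to promote opt-in (typically high-cardinality) attributes
# to metric labels. REQUIREMENT_LEVELS is a hand-written illustrative subset.
REQUIREMENT_LEVELS = {
    "http.request.method": "required",
    "http.route": "required",
    "http.response.status_code": "conditionally_required",
    "url.full": "opt_in",
    "url.query": "opt_in",
}

def rejected_labels(configured_labels):
    bad = []
    for label in configured_labels:
        level = REQUIREMENT_LEVELS.get(label, "unknown")
        if level in ("opt_in", "unknown"):
            bad.append(f"{label} (requirement_level={level})")
    return bad

bad = rejected_labels(["http.request.method", "http.route", "url.full"])
if bad:
    raise SystemExit("refusing to label span metrics on: " + ", ".join(bad))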

The other axis the spec uses to constrain attribute promotion is stability. Stable attributes are guaranteed not to rename within a major version of the spec; experimental attributes can rename in any minor version. The discipline: never base a long-lived dashboard query on an experimental attribute. The spec documentation at opentelemetry.io/docs/specs/semconv/ marks which attributes are stable (HTTP, gRPC, FaaS basics, the resource attributes). Everything else — gen_ai.*, feature_flag.*, large parts of messaging.*, cicd.* — is experimental as of 2026 and will rename. A team that builds a feature-flag attribution dashboard on the experimental feature_flag.* attributes must accept that the dashboard will break on the next semconv release that touches that namespace; the alternative is to wait until those attributes go stable, or to ship a Collector-level attribute alias that survives the rename.

The conditionally-required clause hides a third trap. http.response.status_code is conditionally required: it must be set if the response was sent. If the request was aborted before a response (client disconnected, timeout fired in the middleware), the attribute is correctly absent. A query that filters on http.response.status_code >= 500 will miss every aborted request, because the attribute is not "0" or "unknown" — it is not present at all. Teams that have not internalised this end up missing 5xx-shaped failures in their error-rate metrics. The fix is either to query on the absence ({ -http.response.status_code }) when investigating timeouts, or to use a recording rule that emits a 5xx_or_aborted synthetic metric counting both. The semantic convention is correct; the dashboard query needs to know about the conditional.
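
The recording-rule idea can be sketched over span data directly. The dicts below are hypothetical sampled spans: a request counts as failed if the status attribute is present and at least 500, or if it is absent entirely because the request was aborted before a response was sent.

# aborted_counter.py -- count 5xx and aborted-before-response together, because the
# conditionally-required status attribute is simply absent on aborted requests.
SAMPLED = [  # hypothetical span attribute maps
    {"http.request.method": "GET", "http.response.status_code": 200},
    {"http.request.method": "GET", "http.response.status_code": 503},
    {"http.request.method": "GET"},  # client disconnected; no response was ever sent
]

def failed_or_aborted(attrs):
    status = attrs.get("http.response.status_code")
    return status is None or status >= 500

failures = sum(failed_or_aborted(s) for s in SAMPLED)
print(f"5xx_or_aborted: {failures} of {len(SAMPLED)} sampled spans")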

Policing conventions in your fleet — the audit pipeline

A spec that nobody enforces is documentation. The fleets that actually keep semantic-convention discipline run a conformance audit in their Collector pipeline — a small loop that samples spans, checks them against an expected attribute set, and emits a semconv_violations_total metric labelled by service, scope, and missing/extra attribute. The audit catches three failure shapes that release-time tests miss: third-party libraries silently emitting custom keys (a vendor SDK pinning to its own private semconv), drift where a team patches an instrumentation library locally and forgets to upstream the rename, and the slow accumulation of "we'll clean it up later" custom attributes that gradually replace the spec keys.

The audit shape that works is a Python sampling job that pulls 1000 spans per minute from the Collector's debug exporter (or a Tempo HTTP query), checks each span's resource and span attributes against an expected.yaml per service, and emits Prometheus metrics. The expected.yaml is generated from the same source-of-truth that drives instrumentation imports — one file, two consumers, single point of update. When semconv_violations_total{service="razorpay-payments-api", reason="missing_required"} starts climbing, the platform team has a pageable signal before a downstream dashboard breaks.

# semconv_audit.py — sample spans, check conformance, emit Prometheus metrics
# pip install requests prometheus-client pyyaml
import time, yaml, requests
from prometheus_client import Counter, start_http_server

VIOLATIONS = Counter(
    "semconv_violations_total",
    "spans failing semconv conformance",
    ["service", "scope", "reason", "attribute"],
)

# expected.yaml — one block per (service, scope), listing required attribute keys
# and the set of allowed attribute keys (everything else is flagged as drift).
EXPECTED = yaml.safe_load(open("expected.yaml"))

def check_span(resource_attrs: dict, scope: str, span_attrs: dict, service: str):
    rules = EXPECTED.get(service, {}).get(scope)
    if not rules:
        VIOLATIONS.labels(service, scope, "unknown_scope", "-").inc()
        return
    for k in rules.get("required", []):
        if k not in span_attrs and k not in resource_attrs:
            VIOLATIONS.labels(service, scope, "missing_required", k).inc()
    allowed = set(rules.get("required", [])) | set(rules.get("recommended", []))
    for k in span_attrs:
        if k not in allowed and not k.startswith(("razorpay.", "internal.")):
            VIOLATIONS.labels(service, scope, "drift", k).inc()

def sample_loop(tempo_url: str):
    while True:
        # Pull a batch of recent traces; in practice, Tempo's /api/search +
        # /api/traces/{id}; here we just hit a mock endpoint that returns
        # a list of {resource_attrs, scope, span_attrs, service.name}.
        r = requests.get(f"{tempo_url}/api/search?limit=1000")
        for span in r.json()["spans"]:
            res = span["resource"]; scope = span["scope"]
            attrs = span["attributes"]; service = res.get("service.name", "?")
            check_span(res, scope, attrs, service)
        time.sleep(60.0)

if __name__ == "__main__":
    start_http_server(9464)
    sample_loop("http://tempo:3200")
Sample run after 10 minutes against a fleet mid-migration:
# HELP semconv_violations_total spans failing semconv conformance
# TYPE semconv_violations_total counter
semconv_violations_total{service="razorpay-payments-api",scope="instr.flask",reason="missing_required",attribute="http.response.status_code"} 1843
semconv_violations_total{service="razorpay-payments-api",scope="instr.flask",reason="drift",attribute="http_method"} 947
semconv_violations_total{service="hotstar-playback-api",scope="instr.servlet",reason="missing_required",attribute="url.template"} 12091
semconv_violations_total{service="zerodha-orders-api",scope="vendor.nse",reason="unknown_scope",attribute="-"} 3048

Four metric lines, three concrete signals.
  • missing_required = 1843 for http.response.status_code on razorpay-payments-api is exactly the conditionally-required trap from the previous section — the service is aborting requests before sending a response, and the audit is correctly flagging that the attribute is missing on those spans. The fix is not to make the instrumentation lie about a status code; it is to add a recording rule that counts aborted-without-response separately. This signal beats waiting for a dashboard to break: a missing required attribute manifests as "rows-not-counted" in any dashboard that filters on it. The dashboard goes blank in proportion to the violation rate, but the on-call only notices when something they care about goes blank. The audit catches the violation independently of any specific dashboard, weeks before the failure ladders up to user-visible incident metrics.
  • drift = 947 for http_method (note the underscore) is a custom team-internal attribute — someone wrote a manual span and used http_method instead of the spec's http.request.method. This is the slow drift that erodes a fleet over months; catching it via an audit lets the platform team open a PR against the offending service before the attribute proliferates.
  • unknown_scope = 3048 for vendor.nse is a third-party vendor SDK (a hypothetical NSE feed library at Zerodha) that emits its own scope; the audit cannot conformance-check it because there are no rules, but the metric tells the platform team to ask the vendor for their semconv documentation or to add explicit allow-list rules.

The audit, like any other piece of observability tooling, must itself be observable. The prometheus-client server on port 9464 is what makes the audit's output queryable — a Grafana panel showing rate(semconv_violations_total[1h]) per service, with a 200-violations-per-minute alert threshold, gives the platform team the same cadence of feedback they have for any other infrastructure metric. The discipline that closes the loop is to wire semconv_violations_total into the platform team's quarterly OKRs: "drive fleet-wide violation rate below X per million spans". This is what makes the audit a system rather than a script — measurement plus a target plus a budget for closing the gap.

The Python audit above is a demonstration shape; the production version a Hotstar-shape platform team runs is more elaborate (samples 5%–10% of spans rather than 1000-per-minute, tracks per-attribute deprecation timelines so it can alert on attributes that will be removed in N spec releases, integrates with the company's deprecation-warning Slack bot). The shape is the same in both: spans in, rules-checked, violations metricized, alert when violations climb. Building this once and running it forever is what graduates a fleet from "we use semantic conventions" to "we are conformant to semantic conventions" — the difference between a spec mentioned in a runbook and a spec the system actively defends.

Common confusions

  • "Semantic conventions are the same as OpenTelemetry." They are part of the OTel project but not the same thing. The OTel spec has three large pieces — the API (how your code creates spans), the SDK (how spans get processed and exported), and the semantic conventions (what attributes spans carry). A library can be API-conformant and convention-non-conformant; this is the most common shape of "OTel" libraries built before semconv stability landed.
  • "service.name and service.namespace are interchangeable." They are not. service.name is the service's identifier within its namespace; service.namespace is the team or domain that owns it. service.name=payments-api, service.namespace=razorpay-core is correct; collapsing both into service.name=razorpay-core-payments-api loses the structure that lets backends group services by team.
  • "All HTTP spans use the same attribute names." They do not, across versions. A span emitted by a Java agent at semconv 1.20 uses http.method; a span from a Python agent at semconv 1.27 uses http.request.method. Both are valid, both are conformant to their version, neither is queryable by a single key without backend-side translation.
  • "Schema URL is metadata I can ignore." It is not metadata; it is the only field in OTLP that tells a downstream consumer which version of the conventions a span follows. Ignoring it means treating all spans as if they belong to one unknown version, which is exactly how dashboards silently go blank during a fleet upgrade.
  • "If a key isn't in semantic conventions, I shouldn't use it." Sometimes — many backends will accept any key. But custom keys live outside the convention namespace (razorpay.merchant.kyc_status, not merchant.kyc_status) so they don't collide with future spec additions. Use a vendor-prefix or company-prefix for custom keys; never invent a top-level key.
  • "Resource attributes and span attributes can hold the same data, so it doesn't matter where I put it." It matters. Resource attributes are emitted once per process and apply to every signal, span attributes per event. Putting service.name on every span individually wastes 30+ bytes per span and breaks backend grouping that expects service.name on the resource. Putting http.request.method on the resource is meaningless because a service handles many methods.

Going deeper

How the YAML spec is structured and how to read it

The semantic-conventions repo organises attributes into model/<domain>/ directories — model/http/, model/db/, model/messaging/, etc. Each YAML file defines either an attribute group (a set of attributes that travel together, e.g. http.client defines all client-span HTTP attributes) or a signal definition (which attribute groups apply to which signal type). The build process emits both human-readable Markdown documentation (docs/http/http-spans.md) and code-generated language bindings (opentelemetry-semantic-conventions packages on PyPI, Maven Central, etc.). Reading the source YAML directly is faster than reading the docs once you know the structure: model/http/registry.yaml is the registry of every HTTP attribute, and model/http/http-spans.yaml says which of them are required-vs-recommended on server vs client spans. When debugging "is this attribute required?", grep the registry.
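
A quick way to answer that question in bulk is to read the YAML straight from a local checkout. The sketch below assumes a clone of the semantic-conventions repo and the groups/attributes shape its registry files use; field names are read defensively because the layout has shifted between releases, so treat the output as a starting point rather than an authoritative dump.

# list_http_attributes.py -- read attribute definitions from a local spec checkout.
# Assumes: git clone https://github.com/open-telemetry/semantic-conventions
import pathlib, yaml

SPEC_DIR = pathlib.Path("semantic-conventions/model/http")

for path in sorted(SPEC_DIR.glob("*.yaml")):
    doc = yaml.safe_load(path.read_text()) or {}
    for group in doc.get("groups", []):
        for attr in group.get("attributes", []):
            if not isinstance(attr, dict) or "id" not in attr:
                continue  # a reference to an attribute defined in another file
            print(f"{path.name:20} {attr['id']:30} "
                  f"stability={attr.get('stability', '?')} "
                  f"requirement={attr.get('requirement_level', 'recommended')}")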

The same YAML drives semantic-convention coverage tests in instrumentation libraries. The Java javaagent's instrumentation/http-tests/ directory has a test suite that asserts every supported HTTP framework emits the required HTTP attributes for every supported semconv version. The Python contrib repo has a similar test layer. When an instrumentation package upgrades its semconv version, it is the test suite that catches the missing required attributes — and the same test suite is what a downstream team can run against their custom instrumentation package to verify conformance.

Schema URLs and attribute translation in practice

The OpenTelemetry Schema Files spec defines a YAML format for declaring transformations between semconv versions. A schema-1.27.0.yaml file might contain rename_attributes: { http.method: http.request.method } in the version-1.27.0 section. A backend that fetches this file can translate any 1.20.0 span into 1.27.0 keys at query time. Honeycomb's "schema-aware queries" feature does this; Tempo's metrics-generator processes schema URLs partially (resource attributes only, last we checked); Datadog ignores them and relies on its own internal mapping. The reason most backends do not implement schema files fully is that the translation rules grow combinatorially as new versions land — a backend supporting 30 versions has 870 pairs of translations to maintain — and most teams find it cheaper to force fleet-wide semconv pinning than to absorb the translation cost backend-side.
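
The translation a schema file declares is mechanically simple once you have the rename table. A minimal sketch, assuming a hand-written subset of the 1.20-era-to-stable HTTP renames; a real implementation would build the table from the schema file fetched at the span's schema_url.

# schema_translate.py -- up-version a span's attributes with a rename table of the
# kind a schema file declares for one version step (illustrative subset).
RENAMES_1_20_TO_STABLE = {
    "http.method": "http.request.method",
    "http.status_code": "http.response.status_code",
    "http.url": "url.full",
    "http.flavor": "network.protocol.version",
}

def up_version(attrs, renames):
    # Rename where a rule exists; pass every other key through untouched.
    return {renames.get(key, key): value for key, value in attrs.items()}

old_span = {"http.method": "GET", "http.route": "/play/{contentId}", "http.status_code": 200}
print(up_version(old_span, RENAMES_1_20_TO_STABLE))
# {'http.request.method': 'GET', 'http.route': '/play/{contentId}', 'http.response.status_code': 200}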

The team that ships a Collector-side translator wins both ways. A transform processor with OTTL rules like set(attributes["http.request.method"], attributes["http.method"]) where attributes["http.method"] != nil can up-version an old span before it reaches the backend, hiding the version skew from downstream queries. The Collector-contrib schema processor aims to apply schema-file translations like this automatically, but its coverage is still partial, so most fleets hand-write the handful of OTTL rules for the domains they are actively migrating. A fleet that runs these rules in the Collector (ch.91) can absorb mid-migration agent versions transparently.

Custom attributes and the namespace discipline

When the spec does not cover something — a Razorpay-internal merchant.kyc_status, a Hotstar-internal playback.cdn.pop, a Zerodha-internal order.exchange.segment — you write a custom attribute. The spec's guidance is to namespace it under your company or product (razorpay.merchant.kyc_status, hotstar.playback.cdn_pop, zerodha.order.exchange_segment) so it does not collide with future spec additions. The OTel project explicitly reserves the top-level namespaces it currently uses (http., db., messaging., etc.) and the rest are fair game.

The deeper discipline is to treat custom attributes as a contract within your fleet. If razorpay.merchant.kyc_status exists, it should be defined in a single YAML file in your platform-engineering repo with the same fields semantic conventions use — type, allowed values, requirement level, description. The instrumentation libraries should import a constant from a generated internal_semconv Python module rather than hardcoding the string. The dashboards should query the same constant. The Collector should know which custom attributes to allow on metric labels (low cardinality) and which to strip (high cardinality). The hypothetical Hotstar platform team maintains exactly such an internal-semconv repo with ~120 custom attributes; the discipline pays back the first time a team mass-renames an internal attribute and discovers that two dashboards, one Tempo query, and a Slack alerting rule need to migrate together — the constants give them a grep target.
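
What the generated module and its consumers look like, sketched with the hypothetical attribute names from the examples above; the generation step itself is out of scope here, and the point is simply that instrumentation, dashboards-as-code, and the Collector allow-list all import the same constant instead of retyping the string.

# internal_semconv.py -- generated from the platform team's internal attribute YAML.
# Hypothetical attribute names; regenerate this file, never edit it by hand.
HOTSTAR_PLAYBACK_CDN_POP = "hotstar.playback.cdn_pop"            # string, low cardinality
HOTSTAR_PLAYBACK_ABR_PROFILE = "hotstar.playback.abr_profile"    # enum: auto | hd | sd

# Attributes the Collector may promote to metric labels; everything else is stripped.
METRIC_LABEL_ALLOW_LIST = frozenset({HOTSTAR_PLAYBACK_CDN_POP})

# Usage in a service:
#   from internal_semconv import HOTSTAR_PLAYBACK_CDN_POP
#   span.set_attribute(HOTSTAR_PLAYBACK_CDN_POP, edge_pop)
# A mass rename then becomes a one-file regeneration plus a grep for the constant.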

What the spec does not cover

Semantic conventions do not standardise: span names (only HTTP server spans have a recommended naming pattern, and even that is non-binding), metric names beyond a few standardised RED/USE patterns, log severity scales (the OTel Severity enum is standardised but most logging libraries ignore it), or error categorisation beyond error: bool on spans and the exception event shape. The pragmatic outcome is that span names, metric names, and error categorisation are fleet-internal conventions — your team must agree on them, your platform team must police them, but no upstream spec will save you from a Java team naming spans dispatchRequest while a Python team names them Flask.handle_request. Treat these gaps as platform-engineering responsibilities, not spec gaps to be solved by upgrading.

The other notable gap is payload data. Semantic conventions have no opinion on whether an HTTP request body, a Kafka message payload, or a SQL query parameter should appear on a span. The spec leaves this entirely to the operator, with a strong implied "probably do not put PII or large blobs on spans". Teams that put SQL parameters on spans are doing so against the spec's silent-but-firm guidance, and they typically discover the cost when a regulator (RBI, SEBI) audits their telemetry storage and finds card numbers in trace bodies. The conservative discipline: put the SQL template on db.statement (parameterised, low-cardinality, useful for grouping), never the rendered SQL with parameter values. This is the kind of guidance the spec hints at but does not enforce, and where platform teams must be the enforcers.
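
The difference in practice, as a small Python sketch; the query and parameter values are hypothetical, and the commented-out line is the anti-pattern, not a recommendation.

# db_statement_demo.py -- record the parameterised template, never the rendered SQL.
# pip install opentelemetry-sdk
from opentelemetry.sdk.trace import TracerProvider

tracer = TracerProvider().get_tracer("db-demo")
query = "SELECT status FROM payments WHERE merchant_id = ? AND card_last4 = ?"
params = ("m_3491", "4242")  # hypothetical values; card_last4 stands in for data a regulator cares about

with tracer.start_as_current_span("SELECT payments") as span:
    span.set_attribute("db.statement", query)  # the template: low cardinality, no parameter values
    # Never do this -- it renders parameter values (potential PII) into trace storage:
    # span.set_attribute("db.statement", query.replace("?", "{}").format(*params))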

Reproduce this on your laptop

# Reproduce this on your laptop
python3 -m venv .venv && source .venv/bin/activate
pip install opentelemetry-sdk opentelemetry-exporter-otlp-proto-grpc grpcio
python3 semconv_skew_demo.py
# Watch: two scopes emit semantically identical info under disjoint key sets;
# schema_url is the only OTLP field that distinguishes them.
# Then make the second tracer emit the old keys alongside the new ones (dual
# emission) and note that a query on either key set now matches both spans.

Where this leads next

The next chapter /wiki/the-otlp-protocol covers the wire format that carries these attribute key-value pairs across processes — the protobuf shape of Resource, ScopeSpans, Span.attributes, and the schema_url field this article kept pointing at. Once you can read OTLP bytes in your head, semantic conventions become "what fills in the keys"; the protocol is "what fills in the value field, types, and framing".

The orthogonal direction is what semantic conventions do for metrics — there is a parallel spec for standardised metric names (http.server.request.duration, db.client.operation.duration) and the same dual-emission migration cycle applies to metrics, often six months behind the corresponding span conventions. /wiki/why-three-pillars-is-a-flawed-framing-profiles-events-slos frames the broader question of how attribute conventions cross signal boundaries; semantic conventions are the spec that attempts to make a span attribute and a metric label refer to the same concept.

Karan's incident at 04:30 IST ended at 05:12 with a Collector-level translator processor merged to main, dashboards still on the old keys, and a runbook entry that listed every semconv version pinned per service. Two weeks later the platform team shipped the dual-emission flag fleet-wide and started migrating dashboards in earnest. By the next agent upgrade — six months later — the fleet was on the new keys, the translator processor was decommissioned, and the upgrade was a no-event. The discipline that lets a fleet survive a semconv migration is exactly the discipline that distinguishes an observability culture from an observability deployment: treat the spec as a contract, version-pin the contracts, and never let an agent upgrade ship without an audit.
