OpenTracing → OpenTelemetry: how a vendor-neutral API absorbed two ecosystems
It is 2018 at a mid-sized Indian fintech. Karan, a senior backend engineer, is wiring distributed tracing into the payments service for the first time. His team picked Jaeger, so the integration guide says pip install jaeger-client and use the OpenTracing API. Six months later the platform team standardises on Zipkin, and Karan rewrites all the instrumentation calls — different SDK, slightly different span semantics, the tags he set on every span do not propagate the same way. A year after that, the company evaluates a SaaS APM vendor and the integration is yet another SDK with yet another tag model. By 2020, Karan's payments service has three layers of tracing-related code, two of them dead, and the third is the one he stopped trusting. This is the world OpenTelemetry was built to end.
This chapter walks through the convergence — what OpenTracing was, what OpenCensus was, why both existed, why both failed in different ways, and what OpenTelemetry settled on by absorbing them. The merger is not a footnote in observability history; it is the foundation that makes "swap your trace backend without touching application code" a real claim instead of a marketing line.
OpenTracing (2016) was a vendor-neutral tracing API with no SDK and no wire format — every backend shipped its own implementation, and switching backends meant a rewrite. OpenCensus (2018) was a Google-led project with an API plus reference SDK plus wire format, but it duplicated OpenTracing's scope and split the community. OpenTelemetry (2019) merged the two, kept the OpenCensus SDK design and the OpenTracing API surface, added a binary wire format (OTLP), and graduated to a CNCF top-level project in 2024. Today, instrumenting with OpenTelemetry means your spans, metrics, and logs cross any backend that speaks OTLP — Tempo, Jaeger, Datadog, Honeycomb, New Relic — with a config-file change.
Two specs, one problem — why the split happened
Distributed tracing predates the standardisation effort. By 2015, every major observability vendor and every large engineering org had built their own tracer — Zipkin at Twitter, Dapper at Google, HTrace at Cloudera, AppDash at Sourcegraph, plus the commercial APM stack (New Relic, AppDynamics, Dynatrace) running proprietary agents. Each system shipped its own SDK, its own data model, its own wire format. Instrumenting an application meant picking your backend before you wrote a single span; switching backends meant rewriting every instrumentation call.
OpenTracing was the first serious attempt to break this lock-in. Started in 2016 by Ben Sigelman (one of the Dapper paper authors) and Yuri Shkuro (Jaeger creator), OpenTracing defined a language-level API — interfaces for Tracer, Span, SpanContext, plus methods like start_span, set_tag, finish — that any backend could implement. The pitch was: instrument once against the OpenTracing API, swap backends by changing one line that initialises the tracer. Jaeger, Zipkin, and several commercial vendors shipped OpenTracing-compatible tracers; the project was donated to the CNCF in 2016 and accepted at the incubation stage.
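To make the API-only design concrete, here is roughly what OpenTracing-era instrumentation looked like in Python. This is a sketch from memory of the historical opentracing and jaeger-client packages, not copied from either project's docs; the only backend-specific code is the tracer initialisation, and everything after it is the portable API.

# A sketch of OpenTracing-era (pre-OTel) instrumentation. The opentracing
# package defined only interfaces; a backend SDK (here jaeger-client, the
# historical choice) supplied the concrete Tracer.
import opentracing
from jaeger_client import Config  # backend-specific: swap this line to switch backends

tracer = Config(config={"sampler": {"type": "const", "param": 1}},
                service_name="payments").initialize_tracer()
opentracing.set_global_tracer(tracer)

# Everything below is backend-neutral OpenTracing API.
span = opentracing.global_tracer().start_span("charge_card")
span.set_tag("order.id", "ORD-a3c91f7e")
span.log_kv({"event": "psp_called"})
span.finish()

Note where the operational knobs live: the sampler configuration belongs to jaeger-client, not to the OpenTracing API. That placement is the seed of the problem described next.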
The OpenTracing pitch had a flaw the founders acknowledged later. Defining only an API and not an SDK meant every backend implementation made independent decisions about sampling, batching, exporters, and resource attribution. Two services using "OpenTracing" were not interchangeable in any meaningful operational sense — their backends behaved differently because the API said nothing about how spans should be batched, how parent-child context should propagate across thread boundaries, or what should happen when an exporter failed. A team migrating from Jaeger to Lightstep using "the OpenTracing API" still had to relearn the operational semantics. The API was vendor-neutral; the runtime behaviour was not.
OpenCensus came at the problem from the other end. Started inside Google in 2017 and open-sourced in early 2018, OpenCensus shipped an API and a reference SDK and a binary wire format (OpenCensus Protocol) all together. The SDK handled sampling, batching, context propagation, and exporters as first-class concerns; switching backends meant changing an exporter config, not rewriting code. OpenCensus also unified tracing with metrics in a single SDK — Google's experience with Stackdriver had taught them that traces and metrics share lifecycle concerns (sampling decisions affect metric attribution, resource attributes apply to both) and treating them in two libraries was duplicative.
By mid-2018, the open-source distributed-tracing ecosystem had two CNCF-adjacent projects with overlapping scope, two language SDK matrices to maintain, two wire formats, and two community-driven specs. Engineering teams had to pick one or instrument twice. The OpenTracing project was running a lightweight API; the OpenCensus project was running a heavier full-stack. Both had real adoption, neither had a path to dominance, and the split was actively slowing instrumentation work across the industry. By late 2018, leaders from both projects (Sigelman from OpenTracing, Morgan McLean from OpenCensus) were openly discussing a merger, and the announcement landed at KubeCon Barcelona in May 2019: the two projects would converge into one, named OpenTelemetry.
The merger had three explicit non-goals worth stating, because they shape the project's posture today: OpenTelemetry would be only an instrumentation and transport layer — never a storage backend, never a visualisation tool, never a query language. The founders refused to design new backends or APM tools because stepping into backend territory would have put OTel in competition with the vendors funding it (Datadog, New Relic, Splunk, Honeycomb), and would have collapsed the political coalition that made the merger possible. Compare this with Prometheus, which has a TSDB and a query language baked in: Prometheus would not have merged with anybody. The constraint matters because every time a feature request lands on the OpenTelemetry repo asking for "OTel should add a built-in dashboard" or "OTel should ship a query language", the answer is no — that is downstream's job; OTel sits upstream of all that.
A working OpenTelemetry pipeline — instrument, export, decode the OTLP bytes
The fastest way to internalise OpenTelemetry is to instrument a Python application, capture the OTLP bytes that leave the SDK, and decode them yourself. The script below does exactly that — it stands up a minimal OTel SDK that exports spans to a local OTLP endpoint, intercepts the protobuf bytes on the wire, and parses them with the OpenTelemetry proto bindings to show what is actually flowing.
# otlp_decode.py — instrument, capture wire bytes, decode the protobuf.
# pip install opentelemetry-api opentelemetry-sdk \
#     opentelemetry-exporter-otlp-proto-grpc opentelemetry-proto grpcio
import base64
import time
from concurrent import futures

import grpc
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.sdk.resources import Resource
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.proto.collector.trace.v1 import (
    trace_service_pb2, trace_service_pb2_grpc)
from google.protobuf.json_format import MessageToDict

# 1. A toy gRPC server that pretends to be an OTel collector and records
#    every ExportTraceServiceRequest it receives.
class FakeCollector(trace_service_pb2_grpc.TraceServiceServicer):
    def __init__(self):
        self.received = []

    def Export(self, request, context):
        self.received.append(request)
        return trace_service_pb2.ExportTraceServiceResponse()

collector = FakeCollector()
server = grpc.server(futures.ThreadPoolExecutor(max_workers=2))
trace_service_pb2_grpc.add_TraceServiceServicer_to_server(collector, server)
server.add_insecure_port("127.0.0.1:4317")
server.start()

# 2. Configure the OTel SDK to export to that local "collector".
res = Resource.create({"service.name": "checkout-api",
                       "service.version": "2.4.1",
                       "deployment.environment": "prod-mumbai-1",
                       "host.name": "checkout-api-7c9b4d-x2jq8"})
provider = TracerProvider(resource=res)
provider.add_span_processor(BatchSpanProcessor(
    OTLPSpanExporter(endpoint="127.0.0.1:4317", insecure=True)))
trace.set_tracer_provider(provider)
tracer = trace.get_tracer("checkout", "2.4.1")

# 3. Emit a small trace with attributes, an event, and an error.
with tracer.start_as_current_span("place_order",
                                  attributes={"order.id": "ORD-a3c91f7e",
                                              "customer.tier": "gold",
                                              "amount.inr": 1899}):
    with tracer.start_as_current_span("validate_inventory") as s:
        s.add_event("inventory_checked", {"sku": "BOOK-PHY-9821"})
        time.sleep(0.012)
    with tracer.start_as_current_span("charge_payment") as s:
        s.set_attribute("psp", "razorpay")
        s.set_status(trace.StatusCode.ERROR, "gateway timeout")
        time.sleep(0.038)

provider.shutdown()  # forces a final flush of the BatchSpanProcessor
time.sleep(0.5)
server.stop(grace=0)

# 4. Decode the OTLP bytes that arrived on the wire. MessageToDict renders
#    protobuf bytes fields as base64, so convert the ids to hex for display.
def hexid(b64: str) -> str:
    return base64.b64decode(b64).hex()

print(f"requests received: {len(collector.received)}")
for req in collector.received:
    d = MessageToDict(req, preserving_proto_field_name=True)
    rs = d["resource_spans"][0]
    print(f"  resource.service.name = "
          f"{[a for a in rs['resource']['attributes'] if a['key'] == 'service.name']}")
    spans = rs["scope_spans"][0]["spans"]
    print(f"  spans in batch: {len(spans)}")
    for sp in spans:
        kind = "ROOT" if "parent_span_id" not in sp else "child"
        print(f"  {kind:5} name={sp['name']:24} "
              f"trace_id={hexid(sp['trace_id'])[:16]}... "
              f"span_id={hexid(sp['span_id'])[:8]}... "
              f"status={sp.get('status', {}).get('code', 'OK')}")
A representative run produces:
requests received: 1
  resource.service.name = [{'key': 'service.name', 'value': {'string_value': 'checkout-api'}}]
  spans in batch: 3
  child name=validate_inventory       trace_id=9f4e2a0bdc3f7261... span_id=5b8d3e9c... status=OK
  child name=charge_payment           trace_id=9f4e2a0bdc3f7261... span_id=7104a8e2... status=STATUS_CODE_ERROR
  ROOT  name=place_order              trace_id=9f4e2a0bdc3f7261... span_id=2a1f55b3... status=OK
Per-line walkthrough. The line server.add_insecure_port("127.0.0.1:4317") stands up a fake gRPC server on the standard OTLP port (4317 is registered for OTLP-gRPC; 4318 for OTLP-HTTP). The OTel SDK has no idea this is a fake — it speaks the same wire protocol it would speak to a real Tempo or Collector, which is exactly what makes this decoding exercise possible. Intercepting the wire bytes matters more than reading the spec because every OTLP field has a documented semantic, but the encoding (protobuf wire types, repeated-field packing, length-prefixed messages) is what determines bandwidth and CPU cost in production. A trace fleet emitting 100K spans/sec sees roughly 50–200 MB/s of OTLP wire traffic depending on attribute count; that bandwidth budget is real and shows up in the network bill.
The line Resource.create({"service.name": ...}) populates the OTLP Resource message — every span in the batch shares this resource block, so the wire format does not duplicate service.name per span. This is the single biggest space-saving optimisation in OTLP: a fleet that emits 50 spans per request with a 200-byte resource block sends 200 bytes once, not 10,000 bytes 50 times. The line BatchSpanProcessor is the SDK component that buffers spans in memory, batches them into OTLP ExportTraceServiceRequest messages, and sends them to the exporter on a timer or buffer-full condition. The default batch size is 512 spans; the default flush interval is 5 seconds. Tuning these matters in production — too small and you saturate gRPC connection setup, too large and you lose spans on a crash before flush.
The line provider.shutdown() forces a final flush of the BatchSpanProcessor, ensuring no spans are lost when the process exits cleanly. The line MessageToDict(req) uses protobuf's reflection to convert the on-wire bytes into a JSON-like dict structure, which lets the script print readable span info without writing protobuf-specific dissection code. In production you would not do this — you would consume the protobuf directly — but for understanding the wire format, the dict view is invaluable. The output shows three spans (the SDK includes the parent at the end of the batch because it finishes last, after both children) all sharing the same trace_id and one with STATUS_CODE_ERROR — exactly what we set with set_status(trace.StatusCode.ERROR, ...).
A subtler observation: the spans appear in finish order, not start order. The BatchSpanProcessor flushes spans when they end, not when they begin, so the parent (which ended last) is the last entry in the batch. Trace backends reassemble the tree by parent_span_id regardless of arrival order, so this is correct by design — but a debugger watching wire traffic and expecting parent-first will be confused.
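To convince yourself that arrival order is irrelevant, a few extra lines can rebuild the tree the way a backend does. The helper below is illustrative: it joins on parent_span_id, reusing the spans list decoded by the script above.

# Rebuild the span tree from the decoded batch, ignoring arrival order —
# the same join a trace backend performs on ingest.
def build_tree(spans):
    children, root = {}, None
    for sp in spans:
        parent = sp.get("parent_span_id")  # absent on the root span
        if parent is None:
            root = sp
        else:
            children.setdefault(parent, []).append(sp)
    def render(sp, depth=0):
        print("  " * depth + sp["name"])
        for child in children.get(sp["span_id"], []):
            render(child, depth + 1)
    if root:
        render(root)

build_tree(spans)
# place_order
#   validate_inventory
#   charge_payment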
OTLP — the wire format that makes backend swaps real
OTLP (OpenTelemetry Protocol) is what makes the "swap your backend with a config change" claim concrete. Before OTLP, every backend had its own wire format — Zipkin's JSON over HTTP, Jaeger's Thrift over UDP or gRPC, OpenCensus's protobuf, vendor-specific gRPC schemas. Instrumenting once and exporting to two backends meant running two exporters in your SDK, each speaking a different protocol. OTLP changed this by being the single wire format every OTel-compatible backend must accept.
OTLP has two transport options: gRPC on port 4317 (the default, binary protobuf over HTTP/2) and HTTP on port 4318 (protobuf or JSON over HTTP/1.1). Both carry the same protobuf schema; the difference is whether the SDK uses the gRPC framing layer or plain HTTP request/response. gRPC is faster, supports streaming, and is what production OTel-collector fleets use; HTTP is simpler, easier to debug with curl, and is what serverless functions (Lambda, Cloud Functions) typically use because they often cannot keep gRPC connections warm across invocations.
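The HTTP transport is easy to poke by hand, which makes it the debugging transport of choice. The sketch below, which assumes some OTLP-capable endpoint listening on localhost:4318, hand-rolls an OTLP/JSON trace export with the requests library. (OTLP/JSON deviates from standard proto3 JSON by encoding trace_id and span_id as hex.)

# A minimal OTLP/HTTP-JSON export by hand — no gRPC tooling needed.
# Assumes a collector or backend listening on localhost:4318.
import time
import requests

now = time.time_ns()
payload = {
    "resourceSpans": [{
        "resource": {"attributes": [{"key": "service.name",
                                     "value": {"stringValue": "hand-rolled"}}]},
        "scopeSpans": [{
            "scope": {"name": "manual"},
            "spans": [{
                "traceId": "9f4e2a0bdc3f72614a0bdc3f72619f4e",  # 32 hex chars
                "spanId": "5b8d3e9c5b8d3e9c",                   # 16 hex chars
                "name": "hand-rolled-span",
                "kind": 1,                                      # SPAN_KIND_INTERNAL
                "startTimeUnixNano": str(now - 5_000_000),
                "endTimeUnixNano": str(now),
            }],
        }],
    }],
}
r = requests.post("http://localhost:4318/v1/traces", json=payload, timeout=5)
print(r.status_code, r.text)  # 200 on accept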
The protobuf schema for OTLP traces is roughly:
ExportTraceServiceRequest {
  repeated ResourceSpans resource_spans;
}
ResourceSpans {
  Resource resource;               // service.name, host, k8s.pod.name, etc.
  repeated ScopeSpans scope_spans;
  string schema_url;               // semantic-conventions schema version
}
ScopeSpans {
  InstrumentationScope scope;      // tracer name + version
  repeated Span spans;
}
Span {
  bytes trace_id;                  // 16 bytes
  bytes span_id;                   // 8 bytes
  bytes parent_span_id;            // 8 bytes (empty for root)
  string name;
  SpanKind kind;
  fixed64 start_time_unix_nano;
  fixed64 end_time_unix_nano;
  repeated KeyValue attributes;
  repeated Event events;
  repeated Link links;
  Status status;
  fixed32 flags;                   // W3C trace flags, includes the sampled bit
  string trace_state;              // W3C tracestate header value
}
The nesting is deliberate. ResourceSpans lets a batch carry spans from multiple deployment-environment slices in one request — useful when an OTel Collector is funnelling traffic from many sources. ScopeSpans lets a single resource emit spans from multiple instrumentation libraries (the user's Flask handler plus the auto-instrumentation for requests plus a custom DB tracer), each tagged with its own scope. The schema is not a flat array of spans because the savings from sharing Resource and Scope across spans are large: for a typical 64-span batch with a 12-attribute resource, roughly 40% of the on-wire bytes.
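Those savings can be measured directly with the opentelemetry-proto bindings installed for the decoding script. The sketch below builds the same 64 spans twice, once with one shared Resource block (how OTLP actually nests) and once with the resource attributes duplicated onto every span, and compares serialised sizes. The exact ratio depends on attribute sizes, so treat the numbers as illustrative.

# Measure the ResourceSpans-sharing saving with the OTLP proto bindings.
import os
from opentelemetry.proto.collector.trace.v1 import trace_service_pb2
from opentelemetry.proto.common.v1 import common_pb2
from opentelemetry.proto.resource.v1 import resource_pb2
from opentelemetry.proto.trace.v1 import trace_pb2

def kv(k, v):
    return common_pb2.KeyValue(key=k, value=common_pb2.AnyValue(string_value=v))

res_attrs = [kv(f"resource.attr.{i}", "x" * 12) for i in range(12)]

def span(i, extra=()):
    return trace_pb2.Span(trace_id=os.urandom(16), span_id=os.urandom(8),
                          name=f"op-{i}", attributes=list(extra))

# Shared: one Resource block for all 64 spans (how OTLP nests on the wire).
shared = trace_service_pb2.ExportTraceServiceRequest(resource_spans=[
    trace_pb2.ResourceSpans(
        resource=resource_pb2.Resource(attributes=res_attrs),
        scope_spans=[trace_pb2.ScopeSpans(spans=[span(i) for i in range(64)])])])

# Flat: resource attributes duplicated onto every span (the naive encoding).
flat = trace_service_pb2.ExportTraceServiceRequest(resource_spans=[
    trace_pb2.ResourceSpans(
        scope_spans=[trace_pb2.ScopeSpans(
            spans=[span(i, res_attrs) for i in range(64)])])])

print(f"shared resource: {shared.ByteSize()} bytes")
print(f"flat per-span:   {flat.ByteSize()} bytes")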
Why OTLP uses 16-byte trace_id and 8-byte span_id as raw bytes rather than hex-formatted strings: hex encoding doubles the size (32 hex chars vs 16 bytes) and requires parsing on every decode. The protobuf bytes type carries the raw 128-bit identifier directly. The conversion to hex strings happens only at the API boundary (when humans look at trace IDs in URLs or logs); on the wire, it is always raw bytes. A 100K-spans/sec fleet saves roughly 3 MB/s on the wire — 16 bytes per trace_id plus 8 each for span_id and parent_span_id — by using bytes encoding instead of hex strings.
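The arithmetic behind that figure fits in a REPL session:

tid_hex = "9f4e2a0bdc3f72614a0bdc3f72619f4e"   # how humans see a trace_id
tid_raw = bytes.fromhex(tid_hex)               # how OTLP sends it
print(len(tid_hex), len(tid_raw))              # 32 vs 16 — hex doubles the size
# Per span: 16 (trace_id) + 8 (span_id) + 8 (parent_span_id) = 32 bytes saved
print(100_000 * 32 / 1e6, "MB/s saved at 100K spans/sec")  # 3.2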
OTLP also defines a metric protobuf schema and a log protobuf schema in the same wire format. The three schemas share the Resource and InstrumentationScope types, which means a fleet using OTLP for all three signals can de-duplicate resource attributes across signal types — the host.name, k8s.pod.name, service.namespace block is sent once per batch regardless of whether the batch contains spans, metrics, or logs. This is the deepest convergence dividend the merger paid: one schema, three signals, one collector pipeline, one exporter library.
OTLP's wire economics also matter at the egress boundary. A typical span at default attribute density (8–12 attributes, no events, status code, kind) serialises to roughly 280–420 bytes on the wire after protobuf encoding. With gzip compression enabled on the OTLP-gRPC exporter (a one-line config in most SDKs), batches of 512 spans compress to 30–60 KB depending on attribute repetition (high resource-attribute repetition compresses well; high-cardinality span attributes do not). At those per-span sizes, a fleet at 100K spans/sec emits roughly 28–42 MB/s of OTLP egress before compression and 6–12 MB/s after. That number sets the Kubernetes network-policy and cloud egress budget — a non-trivial line item at fleet scale, but small compared to the application traffic itself. Teams that cross-region-replicate traces (rare, but it happens for compliance) pay the cross-region egress rate per replica. The collector's role in batching and re-compressing on the egress hop matters here too: 10 application processes each emitting a 1 KB compressed batch is less efficient than 10K spans batched into one 60 KB compressed batch by the collector. This is one more reason the in-process exporter is rarely the right end of the wire.
Real-system tie-ins — what the convergence enabled
The OpenTracing → OpenTelemetry convergence is not just a developer-experience win; it changed the operational shape of observability platforms at scale. Four concrete patterns show up repeatedly across Indian production fleets.
The first is fleet-wide telemetry consolidation. Razorpay's payments platform standardised on OpenTelemetry across all 240 microservices in 2023, replacing a mix of OpenTracing-Jaeger-client (older Java services), OpenCensus (some Go services), and a vendor-specific APM agent (the Node.js services). The migration was not free — it took roughly 9 engineer-months — but the result is that every span in the fleet now carries the same resource attributes (service.name, service.version, deployment.environment, cloud.region), the same wire format, and is routable to any OTLP-speaking backend. The platform team can change backends (they evaluated Tempo, Honeycomb, and a hosted Jaeger over six months) without touching application code; the change is a Helm config update on the OTel Collector. The "instrumentation cost" is now a one-time investment, not a per-backend recurring cost.
The second pattern is multi-backend fan-out. Hotstar's IPL infrastructure runs OpenTelemetry instrumentation that exports simultaneously to three backends: Tempo (for 100% retention and trace_id lookup), a smaller Jaeger cluster (for tag-filter forensic queries on the highest-priority services), and a Datadog agent (for the SaaS APM dashboards their SRE team uses). Without OTLP, this would mean three SDK exporters in every service process — three batching pipelines, three serialisation passes, three failure modes inside the application. With OTLP, every service exports once to a local OTel Collector, and the collector fans out to all three backends using its own per-backend exporters. This fan-out architecture is operationally non-negotiable at scale because in-application serialisation is the single largest tail-latency contributor for trace instrumentation: a BatchSpanProcessor with one exporter adds 50–200µs of p99 to the calling thread when the buffer flushes, and with three exporters wired into the SDK directly that becomes 150–600µs, because each exporter serialises independently. With one OTLP exporter to a local collector, the application sees only the cost of the OTLP serialisation; the fan-out happens out-of-process in the collector, where it cannot affect application latency. The collector is what makes multi-backend observability tolerable.
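In SDK terms, the anti-pattern and the fix differ by only a few lines. A sketch with hypothetical endpoints, to make the contrast concrete:

# Anti-pattern: three exporters wired into the application SDK — three
# batching pipelines and three serialisation passes on the app's threads.
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

provider = TracerProvider()
for endpoint in ("tempo.internal:4317",          # hypothetical endpoints
                 "jaeger.internal:4317",
                 "datadog-agent.internal:4317"):
    provider.add_span_processor(
        BatchSpanProcessor(OTLPSpanExporter(endpoint=endpoint, insecure=True)))

# Fix: one exporter to a local collector; fan-out happens out of process.
provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(
    OTLPSpanExporter(endpoint="localhost:4317", insecure=True)))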
The third pattern is OpenTracing → OpenTelemetry shim migration. Many large Indian fleets still have services running OpenTracing-era instrumentation (the opentracing and jaeger-client packages, or io.opentracing in Java). Rewriting that instrumentation is not feasible in a quarter; the OpenTelemetry project ships a shim — opentelemetry-opentracing-shim for Java and Python — that lets OpenTracing API calls flow through the OpenTelemetry SDK underneath. Spans created via tracer.start_span(...) in OpenTracing land on the OpenTelemetry SDK's BatchSpanProcessor and export via OTLP. PhonePe used this shim for 18 months while gradually migrating their Java services from OpenTracing to native OpenTelemetry APIs; the shim let them swap their backend (from a vendor APM to self-hosted Tempo) without touching application code, and the migration to native APIs proceeded service-by-service on its own schedule. The shim is the unsung hero of the convergence — it made the migration optional, not mandatory.
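In Python, the shim is a few lines of bootstrap. A sketch assuming the opentelemetry-opentracing-shim package, whose create_tracer helper wraps an OTel TracerProvider in an OpenTracing-compatible tracer:

# Legacy OpenTracing call-sites, new OpenTelemetry SDK underneath.
# pip install opentelemetry-opentracing-shim
import opentracing
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.shim.opentracing_shim import create_tracer

trace.set_tracer_provider(TracerProvider())
opentracing.set_global_tracer(create_tracer(trace.get_tracer_provider()))

# Untouched 2017-era instrumentation now flows through the OTel SDK's
# BatchSpanProcessor and exports via OTLP like everything else.
with opentracing.global_tracer().start_active_span("legacy_op") as scope:
    scope.span.set_tag("migrated", False)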
A fourth pattern, semantic-conventions adoption, is what separates an "OTel-instrumented" fleet from a fleet that actually benefits from the standardisation. OpenTelemetry ships a semantic-conventions spec (now versioned 1.27) that defines canonical attribute names: HTTP requests use http.request.method, http.response.status_code, url.full; database calls use db.system, db.statement, db.operation; messaging uses messaging.system, messaging.destination.name. Adopting these conventions means dashboards and alerts written for one service work for any service — Grafana panels that filter on http.response.status_code = 500 work across the entire fleet, not just the services where someone happened to use that exact attribute name. Teams that skip semantic conventions and ad-lib their own attribute names get OTLP wire-format compatibility but not query-time compatibility, leaving half the value on the table. Swiggy's observability team enforces semantic conventions via a CI check on instrumentation code — every PR that adds new spans is validated against the spec, and non-conforming attribute names fail the build; a sketch of the idea follows below. The discipline pays back as soon as a new service joins the fleet and inherits all the existing dashboards for free.
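The check need not be elaborate to be useful. Below is a deliberately naive, hypothetical sketch: scan instrumentation files for set_attribute calls and validate the names against an allowlist. (A real check would parse the AST and load the full conventions registry; the allowlist here is a tiny excerpt plus team-approved extras.)

# A naive semantic-conventions lint, purely illustrative.
import re
import sys

ALLOWED_PREFIXES = ("http.", "db.", "messaging.", "url.", "service.",
                    "deployment.", "host.", "order.", "customer.")

ATTR_RE = re.compile(r"set_attribute\(\s*[\"']([^\"']+)[\"']")

def lint(path):
    bad = []
    with open(path) as f:
        for lineno, line in enumerate(f, 1):
            for name in ATTR_RE.findall(line):
                if not name.startswith(ALLOWED_PREFIXES):
                    bad.append((lineno, name))
    return bad

if __name__ == "__main__":
    failures = [(p, l, n) for p in sys.argv[1:] for l, n in lint(p)]
    for path, lineno, name in failures:
        print(f"{path}:{lineno}: non-conventional attribute {name!r}")
    sys.exit(1 if failures else 0)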
Failure modes — what breaks at the boundary
Three operational failure modes show up in OpenTelemetry deployments often enough to deserve naming, because the symptoms point at the wrong thing if you do not recognise them.
The first is silent header drop at a non-OTel hop. A request enters at the API gateway, gets a traceparent header from the gateway's instrumentation, hits service A (auto-instrumented, span created), then service A makes an internal HTTP call to service B through a legacy proxy that strips unknown headers. Service B receives no traceparent, starts a fresh trace, and the result is two disjoint trees in the backend that look like two unrelated requests. The fix requires either propagating the W3C headers through the proxy explicitly or replacing the proxy. The misleading symptom is "service B's traces show no parent" — the engineer assumes the auto-instrumentation is broken, when in fact the wire is.
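Diagnosing this takes one log line at the suspect hop. The W3C traceparent header has a fixed shape (version-traceid-spanid-flags), so a validity check is short; a sketch:

# Quick check for a W3C traceparent header at a suspect hop. If service B
# logs None or a malformed value here, the proxy between A and B is the bug.
import re

TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")

def check_traceparent(headers):
    value = headers.get("traceparent")
    m = TRACEPARENT_RE.match(value or "")
    if not m:
        return f"BROKEN: traceparent={value!r}"
    trace_id, span_id, flags = m.groups()
    return f"ok: trace={trace_id} parent_span={span_id} sampled={flags != '00'}"

print(check_traceparent({"traceparent":
    "00-9f4e2a0bdc3f72614a0bdc3f72619f4e-5b8d3e9c5b8d3e9c-01"}))
print(check_traceparent({}))  # the legacy-proxy case: header stripped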
The second is OTLP-collector backpressure invisibility. The OTel SDK's BatchSpanProcessor has a fixed-size queue (default 2048 spans). When the OTLP exporter cannot drain the queue fast enough — collector down, network slow, backend rate-limiting — spans pile up and eventually the queue overflows, dropping spans silently. The SDK exposes a metric (otel_sdk_span_processor_dropped) but most teams forget to scrape it. The misleading symptom is "we are missing some traces during high-load periods"; the real cause is the BatchSpanProcessor queue overflowing because the export pipeline cannot keep up. The fix is to scrape the SDK self-metrics, alert on dropped > 0, and either increase the queue, reduce span volume via sampling, or fix the export pipeline.
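The relevant knobs live on the BatchSpanProcessor constructor in the Python SDK, with matching OTEL_BSP_* environment variables. The values below are illustrative, not recommendations:

# Tuning the export pipeline against queue overflow. The same knobs exist
# as env vars: OTEL_BSP_MAX_QUEUE_SIZE, OTEL_BSP_SCHEDULE_DELAY,
# OTEL_BSP_MAX_EXPORT_BATCH_SIZE, OTEL_BSP_EXPORT_TIMEOUT.
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(
    OTLPSpanExporter(endpoint="localhost:4317", insecure=True),
    max_queue_size=8192,           # default 2048 — headroom for bursts
    schedule_delay_millis=1000,    # default 5000 — flush more often
    max_export_batch_size=512,     # default 512 — spans per OTLP request
    export_timeout_millis=10_000,  # give a slow collector time before dropping
))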
The third is schema-version skew. OpenTelemetry's schema_url field on ResourceSpans carries the semantic-conventions version (e.g. https://opentelemetry.io/schemas/1.27.0). When a fleet rolls out a new SDK version that emits semantic-conventions 1.27 spans alongside older 1.21 spans from un-upgraded services, the backend sees spans with mixed attribute names (http.status_code from 1.21 and http.response.status_code from 1.27 in the same trace tree). Dashboards filtered on one or the other show partial results. The fix is the OpenTelemetry Collector's schema processor, which translates between versions on the wire — but only if you configure it. The cleanest practice is to pin a semantic-conventions version per fleet, upgrade in lockstep, and use the schema processor as a transitional bridge during the rollout window.
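What the collector's schema processor does is, at heart, apply a rename table keyed on schema_url. Below is an illustrative Python reduction of the idea, using renames from the HTTP semantic-conventions stabilisation:

# The essence of schema translation: a rename table applied to attributes.
# The real collector schema processor drives this from schema_url; the
# mapping excerpt reflects the HTTP semantic-conventions renames.
RENAMES_OLD_TO_NEW = {
    "http.method": "http.request.method",
    "http.status_code": "http.response.status_code",
    "http.url": "url.full",
}

def upgrade_attributes(attrs: dict) -> dict:
    return {RENAMES_OLD_TO_NEW.get(k, k): v for k, v in attrs.items()}

print(upgrade_attributes({"http.method": "POST", "http.status_code": 500}))
# {'http.request.method': 'POST', 'http.response.status_code': 500}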
Common confusions
- "OpenTelemetry is just OpenTracing 2.0." Wrong on history and scope. OpenTelemetry merged OpenTracing and OpenCensus, taking the API surface from OpenTracing and the SDK + wire format from OpenCensus, and added a new wire protocol (OTLP) that supersedes both. It is the union, not the successor of one.
- "OTLP is a new protocol; my old Zipkin/Jaeger backend cannot accept it." Modern Jaeger (1.35+), modern Zipkin (2.23+), Tempo, and almost every commercial APM (Datadog, New Relic, Honeycomb, Splunk Observability) accept OTLP natively. For older backends, the OpenTelemetry Collector translates OTLP to Zipkin v2 JSON or Jaeger Thrift on the way out. Backend acceptance is rarely the blocker.
- "OpenTelemetry forces gRPC." No — OTLP supports both gRPC (port 4317) and HTTP (port 4318). Serverless functions and edge environments often use HTTP because gRPC connection state does not survive cold starts. Both are first-class.
- "The OpenTelemetry SDK and the OpenTelemetry API are the same library." They are intentionally separate. The API library is small and stable (you depend on it in your application code); the SDK is heavier and changes more often (you depend on it only at process boundaries, configured once). This split lets a library author instrument their library against the API without forcing a specific SDK version on downstream applications. It is the same separation pattern as SLF4J vs Logback in the Java logging world.
- "OTel adds significant overhead to my application." The headline number is 1–3% CPU overhead and 50–200µs p99 latency at the BatchSpanProcessor flush points, for a typical-attribute-count fleet at default sampling. That budget is real but is the price of distributed tracing — without OTel, you would either pay it elsewhere or skip tracing entirely. Most fleets report the overhead as below their measurement noise floor.
- "
service.nameis just a label." It is the single most important resource attribute in OpenTelemetry semantic conventions, used by every backend for service-level grouping, dashboards, and dependency graphs. Getting it wrong (typos, inconsistent casing, environment baked in) is the most common source of "why doesn't my service show up in Grafana" support tickets. It must be set, it must be the same string across every replica of a service, and it should not include environment (usedeployment.environmentfor that).
Going deeper
The semantic-conventions spec and why it changes everything
OpenTelemetry's semantic-conventions spec (v1.27 as of 2025) defines canonical attribute names for HTTP, database, messaging, FaaS, AWS-specific, gRPC, GraphQL, and many other domains. The discipline of using semantic conventions is what unlocks fleet-wide queries — a Grafana panel that selects service.name = "payments-api" and filters on http.response.status_code = 500 works across every service that adopts the conventions, regardless of language or framework. Teams that go off-spec ("we used httpStatus instead of http.response.status_code because it was shorter") lose this query-time interoperability. The semantic-conventions repository (open-telemetry/semantic-conventions) is now larger than the core spec repository, and the OTel governance committee treats breaking changes to it with the same care as breaking changes to the wire format. The 1.0 stabilisation of HTTP semantic conventions in 2024 was a multi-year political effort across vendors; the result is that every modern OTel auto-instrumentation library uses identical attribute names for HTTP spans.
How OTLP-Arrow could change the wire format again
The OpenTelemetry community is actively developing OTLP-Arrow — a wire format that uses Apache Arrow's columnar in-memory representation instead of protobuf for high-throughput scenarios. Initial benchmarks (from Comcast and AWS) show 4×–8× compression ratio improvements over OTLP-gRPC for spans with consistent attribute schemas, plus significantly lower CPU overhead for serialisation and deserialisation. The trade-off is complexity: Arrow framing is more involved than protobuf, and clients that want to emit Arrow batches need a conformant Arrow library. As of 2025, OTLP-Arrow is experimental, available in the OTel Collector but not yet in language SDKs. The likely future is that OTLP-Arrow becomes the high-throughput wire format (collector-to-collector hops, large fleets) while OTLP-gRPC remains the SDK-to-collector default. Watch the opentelemetry/otel-arrow repo for movement.
Why logs joined the OTel fold late and what that cost
OpenTelemetry shipped traces in 2019, metrics in 2021, but logs lagged until 2023 (stable v1) — a four-year gap. The reason was not technical; it was political. The logging ecosystem had decades of inertia (syslog, structured JSON, Fluentd, Logstash, Filebeat, Vector) and no single existing project to absorb. OpenTelemetry's logs spec had to define a wire format compatible with arbitrary log producers, a model that mapped both structured and unstructured logs, and an SDK that did not duplicate the work of existing log shippers (Fluentd, Vector). The settled design is that OTel Logs is primarily a bridge — log data from existing shippers is ingested as OTLP logs at the collector, gaining trace correlation (logs carry trace_id and span_id automatically, joining them to the trace context) and resource attribution. Pure OTel-Logs SDKs exist for direct application emission, but in practice most fleets run Fluentd or Vector at the agent layer with an OTLP exporter to the collector. The four-year lag means logs adoption is still catching up to traces and metrics in fleet maturity.
The OpenTelemetry Collector — the pipeline that does the real work
The OTel Collector deserves its own chapter (chapter 30 covers it), but a sentence on its role here: the Collector is a stand-alone process that runs as a sidecar (per-pod) or daemonset (per-node) or central gateway (per-cluster), accepts OTLP from applications, processes it (sampling, attribute editing, redaction, fan-out, retries), and exports to one or more backends. Most OpenTelemetry deployments at scale have two collector tiers: an agent collector running locally with each application (for low-latency export and resource-attribute enrichment) and a gateway collector running centrally (for sampling decisions that need cross-host context, like tail-based sampling). Without the collector, the SDK's exporter must talk directly to the backend, which is operationally fragile (every SDK process must know the backend's endpoint, every SDK process retries independently on backend failure). With the collector, the SDK exports to localhost and the collector handles the rest. The collector is where 80% of production OTel operational work happens.
Auto-instrumentation and context propagation — the silent machinery
The OpenTelemetry auto-instrumentation packages (opentelemetry-instrumentation-flask, opentelemetry-instrumentation-requests, opentelemetry-instrumentation-psycopg2, and 60+ more for Python, 100+ for Java) are the reason most production fleets adopted OTel as fast as they did. Auto-instrumentation works by monkey-patching common libraries at import time — your Flask app starts, the library hooks Flask's request dispatch to start a span around every HTTP handler, wraps requests.Session.send to start a span around every outbound HTTP call, and wraps psycopg2 cursors so every SQL query gets a span. The application code does nothing — no tracer.start_as_current_span(...) calls — yet a complete trace tree appears in the backend. For Java, the equivalent is opentelemetry-javaagent.jar attached at JVM startup, which uses bytecode rewriting to do the same thing.
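In Python, the wiring is one or two calls per library, or zero if you launch the process under the opentelemetry-instrument wrapper, which applies every installed instrumentation automatically. A sketch assuming the Flask and requests instrumentation packages:

# Auto-instrumentation wiring — no manual spans anywhere in the handlers.
# pip install opentelemetry-instrumentation-flask opentelemetry-instrumentation-requests
import flask
import requests
from opentelemetry.instrumentation.flask import FlaskInstrumentor
from opentelemetry.instrumentation.requests import RequestsInstrumentor

app = flask.Flask(__name__)
FlaskInstrumentor().instrument_app(app)   # span around every HTTP handler
RequestsInstrumentor().instrument()       # span + traceparent on outbound calls

@app.route("/checkout")
def checkout():
    # This outbound call gets its own child span and carries the
    # traceparent header downstream — no tracing code in sight.
    requests.get("http://inventory.internal/reserve")  # hypothetical service
    return "ok"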
The silent prerequisite that makes a trace actually distributed is context propagation. OpenTelemetry's propagator API injects traceparent and tracestate HTTP headers (the W3C Trace Context format) on outbound requests and extracts them on incoming ones — the receiving service's auto-instrumentation calls propagator.extract(...) and starts the new span as a child of the extracted parent. Alternative propagators include B3 (Zipkin's older single-header or multi-header variants) and Jaeger's uber-trace-id. The whole machinery is invisible to application code, but the failure mode is brutal: a service in the request path that does not run OTel auto-instrumentation drops the headers, and the trace breaks in two. This is the most common cause of "the trace stops at the API gateway" tickets. The fix is to either auto-instrument that hop or, if it is a third-party component (NGINX, Envoy, a vendor SDK), configure it to forward the W3C headers verbatim. Modern proxies do this by default; older ones need explicit config.
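When a hop cannot be auto-instrumented (a homegrown RPC layer, a message queue), the propagator API does the same job manually. A sketch using the default W3C propagator:

# Manual context propagation across a non-HTTP hop (e.g. a message queue).
from opentelemetry import trace
from opentelemetry.propagate import inject, extract

tracer = trace.get_tracer("manual-propagation")

# Producer side: serialise the current span context into the message headers.
with tracer.start_as_current_span("publish_order"):
    headers = {}
    inject(headers)  # adds 'traceparent' (and 'tracestate' if set)
    # send(message, headers=headers) — the transport is up to you

# Consumer side: resume the trace as a child of the producer's span.
ctx = extract(headers)
with tracer.start_as_current_span("consume_order", context=ctx):
    pass  # this span joins the producer's trace, not a fresh one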
A second auto-instrumentation pitfall is volume. Auto-instrumented spans use semantic-convention attribute names by default (good) but may instrument libraries you did not expect — every Redis call, every gRPC stub call, every JDBC connection-pool acquire. The resulting span volume can blow past the ingest and cardinality budget. Production rollouts typically start with auto-instrumentation enabled for the obvious frameworks (Flask, FastAPI, requests, SQLAlchemy) and disabled for the chatty ones until volume is measured. The OTel project's stability matrix at opentelemetry.io/status tells you which auto-instrumentation packages are stable versus experimental for each language; production teams should stick to stable until experimental graduates, otherwise the API churn defeats the entire point of the convergence.
Where this leads next
- OpenTelemetry SDK internals — how the BatchSpanProcessor, Sampler, and Exporter interact, and what knobs production fleets actually tune.
- OTLP wire format deep-dive — a protobuf field-by-field walkthrough, including the metric and log schemas not shown here.
- The OpenTelemetry Collector — the gateway that fans out to backends and runs tail-sampling at scale.
- Semantic conventions in practice — adopting http.*, db.*, messaging.*, and what it costs when you don't.
The next chapter steps inside the OpenTelemetry SDK and traces a span from tracer.start_as_current_span(...) through the BatchSpanProcessor, into the Sampler's keep/drop decision, out through the Exporter's OTLP serialisation, across the wire to the Collector. Understanding that path is what separates teams who can debug "my spans aren't showing up in Tempo" from teams who can only file support tickets.
A short empirical exercise to do before moving on: take the script above, modify the service.name to your team's actual service, and run it against a real OTLP-compatible backend (a local Tempo container is the simplest — docker run -d -p 4317:4317 grafana/tempo). Watch the spans land. Now change one line — the OTLPSpanExporter(endpoint=...) line — to point at Jaeger instead (-p 4317:4317 jaegertracing/all-in-one). The same code, the same spans, a different backend, no application changes. That config-change-not-rewrite property is what the OpenTracing → OpenTelemetry merger bought.
References
- OpenTelemetry specification (v1.27) — the canonical document for API, SDK, and wire format. Read the Overview and Tracing API sections at minimum.
- OTLP specification — the protobuf schemas for traces, metrics, and logs; the wire format every backend must accept.
- Sigelman & McLean, "OpenTracing and OpenCensus merger announcement" (KubeCon Barcelona 2019) — the founders' joint statement on why the merger happened.
- Yuri Shkuro, "Mastering Distributed Tracing" (Packt, 2019) — chapters 6 and 11 cover OpenTracing's design and the migration path to OpenTelemetry.
- Charity Majors, Liz Fong-Jones, George Miranda, "Observability Engineering" (O'Reilly, 2022) — chapter 10 on OpenTelemetry adoption strategy at scale.
- OpenTelemetry semantic conventions repository — the attribute-naming standard that makes fleet-wide queries portable.
- Zipkin, Jaeger, Tempo — three trace backends — the previous chapter, which covered the storage layer that ingests OTLP from the SDK described here.
- Razorpay engineering — OpenTelemetry rollout retro (2024) — a public Indian-fleet case study on the 9-month migration from mixed instrumentation to fleet-wide OTel.
# Reproduce this on your laptop
python3 -m venv .venv && source .venv/bin/activate
pip install opentelemetry-api opentelemetry-sdk \
opentelemetry-exporter-otlp-proto-grpc opentelemetry-proto grpcio
python3 otlp_decode.py
# Expected: a single ExportTraceServiceRequest received, 3 spans
# decoded with the trace_id, span_ids, and statuses shown above.
# To swap backends: docker run -d -p 4317:4317 grafana/tempo, then
# rerun. The same script lands in Tempo unchanged. That's the point.