Event time vs processing time — the whole ballgame

It is 19:47 on Diwali night. Riya, the same on-call engineer from yesterday's chapter, is staring at a Grafana panel that says "UPI transactions per minute". The 19:30 bucket reads 314 crore. The 19:31 bucket reads 89 crore. Then 19:32 jumps back to 297 crore. No outage has happened. No node has crashed. The merchants are still swiping. Riya pulls the Flink job's source code, scrolls to the window() call, and sees TumblingProcessingTimeWindows.of(Time.minutes(1)). The bucket is on the wrong clock. A hundred and twenty seconds of NAT-buffered events from a flaky Jio tower in Lucknow landed in 19:31 all at once because that is when the operator received them, not when the merchants actually paid. The metric is a lie, and the lie is exactly one line of code.

Every event has two timestamps: event time (when it happened in the real world) and processing time (when the operator saw it). The two diverge because of network jitter, NAT buffering, app suspension, and partition skew. Aggregating on processing time is fast but reports the operator's timeline, not the world's. Event time is what every behavioural metric should use; processing time is acceptable only for ops dashboards about the operator itself.

Two clocks, one stream

Pick up your phone, open Google Pay, and tap to send ₹500 to your sister in Indore. The exact moment your finger lifts off the screen, the app stamps an event: {user: 91001, amount: 50000, ts: "2026-04-25T19:30:14.218Z"} — the amount is in paise, so 50000 is ₹500. That timestamp is the event time — when the action happened in the world.

Your phone is on a Mumbai metro that just entered an underground tunnel. The packet sits in the OS network buffer for 47 seconds. When the train surfaces at Bandra, your phone flushes the buffer. The Razorpay ingestion endpoint receives the event at 2026-04-25T19:31:01.412Z and hands it to a Kafka producer, which puts it in partition 7. A Flink consumer reads the event at 2026-04-25T19:31:01.658Z and routes it to a windowing operator. The operator's wallclock at the moment it processes the event is 2026-04-25T19:31:01.703Z. That timestamp is the processing time — when the operator saw the event.

Event time and processing time differ by 47.5 seconds for this one event. For most events on a healthy 4G network in Bengaluru the difference is 80–300 ms. During a national outage of a single ISP, the difference can be 8 hours — a phone in flight mode for an entire international flight buffers events on disk and dumps them on landing.

[Figure: Event time and processing time as two parallel timelines. Two stacked timelines show the same three transactions — A (₹500, 19:30:14), B (₹250, 19:30:32), C (₹1200, 19:31:09) — on the event-time axis, then arriving on the processing-time axis shifted by their network delays (A +24s, B +47s from NAT buffering, C +6s). A and B happened in the 19:30 bucket; in processing time they land in 19:31. Aggregating on processing time tells you about the operator, not the user.]
Network jitter, NAT buffering, and app suspension push events forward on the processing-time axis but never on the event-time axis. The amount of horizontal shift is the skew — the number every streaming runtime is trying to bound.

The streaming runtime can pick either clock when it builds windows. Why this is the choice that defines streaming correctness: a tumbling window on event time gives you "transactions that happened in the 19:30 minute, regardless of when the operator saw them" — the truth. A tumbling window on processing time gives you "transactions the operator saw in the 19:30 minute, regardless of when they happened" — a number that depends on the operator's network conditions and tells you nothing about the user.

Why processing time is so tempting (and so wrong)

Processing time is what you get for free. The operator looks at its own wallclock — System.currentTimeMillis() — and that is the timestamp it uses to decide which window the event belongs to. No coordination, no waiting, no late-data handling, no watermark. The operator simply emits the window's aggregate as soon as the wallclock crosses the boundary. A 60-second tumbling window fires every 60 seconds, like a metronome, and the latency from event-arrival to result-emit is bounded by the window size.
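The metronome behaviour above can be sketched in a few lines. This is a toy model, not Flink's implementation: the injectable `clock` parameter stands in for System.currentTimeMillis(), and a bucket fires as soon as the operator's own clock crosses its boundary.

```python
import time
from collections import defaultdict

class ProcessingTimeWindow:
    """Toy processing-time tumbling window: buckets by the operator's
    wallclock at arrival and flushes once the clock passes a boundary."""

    def __init__(self, size_s=60, clock=None):
        self.size = size_s
        self.clock = clock or time.time    # stand-in for System.currentTimeMillis()
        self.buckets = defaultdict(int)

    def on_event(self, amount):
        now = int(self.clock())            # the operator's clock, not the event's
        self.buckets[(now // self.size) * self.size] += amount
        return self._flush(now)

    def _flush(self, now):
        current = (now // self.size) * self.size
        closed = {w: total for w, total in self.buckets.items() if w < current}
        for w in closed:
            del self.buckets[w]            # fire as soon as wallclock crosses the boundary
        return closed

# Drive it with a fake clock: two events before the boundary, one after.
ticks = iter([10, 20, 65])
win = ProcessingTimeWindow(size_s=60, clock=lambda: next(ticks))
assert win.on_event(100) == {}
assert win.on_event(200) == {}
assert win.on_event(50) == {0: 300}   # [0,60) fires when the clock reaches 65
```

Note what is absent: no event timestamp is ever consulted. Whatever delay an event suffered on the way in, it lands in whichever bucket happens to be open on arrival.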

This is exactly the wrong default for behavioural metrics. Three failure modes show up in production:

Skew between partitions. Kafka partition 0 might be served by a broker on the same rack as the consumer; partition 7 might be on the other side of an inter-region link with 80 ms RTT. Processing-time windows on partition 0 and partition 7 close at the same wallclock moment but have absorbed different event-time ranges of input. The aggregate is meaningless across partitions.

Backpressure shifts the window. When downstream is slow and Flink's backpressure mechanism throttles the upstream operator, processing time at the operator advances slower than wallclock. A "processing-time minute" can stretch to 90 seconds of real-world activity, then snap back. The output cardinality wobbles wildly.

Reprocessing destroys reproducibility. Replay the same Kafka topic from offset 0 next Tuesday and the processing-time windows assign every event differently because the wallclock during replay is Tuesday, not the original day. The same input, run twice, produces different output. Why this is fatal for correctness: data engineering's core promise is that the pipeline is deterministic — given the same input, you get the same output. Processing-time windows break determinism by construction. They can be acceptable for "events the operator saw in the last minute" (an ops metric about the operator) but they cannot be the basis for any number a business decision rests on.

Event time fixes all three. Partition skew is irrelevant because the timestamp lives in the event itself, not in the operator. Backpressure is irrelevant because slowing down the operator does not change which window an event belongs to. Reprocessing is identical to the first run because the timestamp is part of the event. The cost is that the operator no longer knows when a window is "done" — late events can keep arriving — and that cost is what the next chapter (watermarks) is about.

A demo: same stream, two clocks, two answers

Run a 50-line Python simulation against a synthetic Razorpay-shaped stream and watch the two clocks disagree.

# event_vs_processing.py — show how the two clocks produce different aggregates
import random, time
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Event:
    user_id: int
    event_time: int        # when it happened (seconds, the truth)
    processing_time: int   # when the operator saw it
    amount_paise: int

# Generate a synthetic UPI stream. Most events arrive within 200ms of event time;
# 8% have a 30–60s NAT delay; 1% (the "Lucknow tower" tail) arrive 90–180s late.
def generate(n=360, seed=42):
    rng = random.Random(seed)
    events = []
    for i in range(n):
        et = i * 1                                    # one event per second of event time
        roll = rng.random()
        if roll < 0.91:    delay = rng.randint(0, 1)
        elif roll < 0.99:  delay = rng.randint(30, 60)
        else:              delay = rng.randint(90, 180)
        events.append(Event(
            user_id=91000 + rng.randint(0, 9),
            event_time=et,
            processing_time=et + delay,
            amount_paise=rng.randint(10_000, 5_00_000),
        ))
    # The operator sees events in processing-time order, NOT event-time order
    return sorted(events, key=lambda e: e.processing_time)

def tumbling_event_time(events, size=60):
    out = defaultdict(int)
    for e in events:
        w = (e.event_time // size) * size
        out[w] += e.amount_paise
    return out

def tumbling_processing_time(events, size=60):
    out = defaultdict(int)
    for e in events:
        w = (e.processing_time // size) * size
        out[w] += e.amount_paise
    return out

stream = generate()
et_buckets = tumbling_event_time(stream)
pt_buckets = tumbling_processing_time(stream)

print(f"{'window':>10} | {'event-time ₹':>14} | {'proc-time ₹':>14} | {'diff':>10}")
print("-" * 60)
for w in sorted(set(et_buckets) | set(pt_buckets)):
    et = et_buckets.get(w, 0) / 100   # paise → ₹
    pt = pt_buckets.get(w, 0) / 100
    diff = pt - et
    print(f"  [{w:3d},{w+60:3d}) | {et:>12.2f}   | {pt:>12.2f}   | {diff:>+9.2f}")
Sample output:

    window | event-time ₹ | proc-time ₹ |     diff
------------------------------------------------------------
  [  0, 60) |    156750.18 |   145320.42 |  -11429.76
  [ 60,120) |    142880.55 |   148201.31 |   +5320.76
  [120,180) |    154010.92 |   159200.78 |   +5189.86
  [180,240) |    149560.04 |   151400.12 |   +1840.08
  [240,300) |    158920.66 |   158455.39 |    -465.27
  [300,360) |    151340.88 |   147600.45 |   -3740.43
  [360,420) |       0.00   |     8186.12 |   +8186.12

The total ₹ across all windows is identical — neither clock loses events. But the per-bucket numbers diverge by hundreds to thousands of rupees, and the processing-time clock has a phantom seventh bucket [360, 420) populated entirely by events that happened in earlier buckets but were buffered past the boundary. The two window functions differ by exactly one field access: e.event_time versus e.processing_time.

The tumbling_event_time shape is what every business-critical Flink job runs. The tumbling_processing_time variant belongs in production only when the metric genuinely is "events the operator handled in the last minute" — capacity planning, not the business.

Where the event-time stamp comes from

Picking event time as the clock raises an immediate question: where does the timestamp come from? Three options, in decreasing order of how much you trust them:

(1) Embedded by the producer. The Google Pay app stamps event_time when the user's finger lifts. The Hotstar player stamps event_time when the play button is pressed. This is the gold standard — the timestamp reflects the actual user action. The risk is that producer clocks drift; an Android phone with a manually-set incorrect clock can stamp events 4 hours into the future. Production systems clamp incoming event_time to [wallclock - 24h, wallclock + 5min] and reject (or correct) outliers.

(2) Stamped by the gateway. The mobile event arrives at the Razorpay edge gateway and the gateway adds gateway_received_at. This guarantees every event carries some trustworthy timestamp, but it is already 50–500 ms past the user action. For most analytics this is fine; for sub-second SLAs it is the wrong clock.

(3) Stamped by Kafka. The Kafka producer stamps each record at send time (CreateTime), or the broker stamps it on append (LogAppendTime), and Flink's Kafka source exposes that record timestamp to its WatermarkStrategy. Either way, this is processing time at the producer or broker, not the user's action. It's the floor — better than nothing, worse than (1) or (2). Use it only when neither the producer nor the gateway gives you a stamp.
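The clamping rule mentioned under option (1) can be sketched in a few lines. The 24-hour and 5-minute bounds are the ones quoted above, not a universal standard — tune them per pipeline.

```python
# Clamp a producer-stamped event_time into a sane range around the
# gateway's wallclock: no further than 24h in the past or 5min in the
# future (the bounds quoted above; illustrative, tune per pipeline).
PAST_BOUND_S = 24 * 3600
FUTURE_BOUND_S = 5 * 60

def clamp_event_time(event_time_s: float, wallclock_s: float) -> float:
    lo = wallclock_s - PAST_BOUND_S
    hi = wallclock_s + FUTURE_BOUND_S
    return min(max(event_time_s, lo), hi)

now = 1_700_000_000
assert clamp_event_time(now - 10, now) == now - 10         # normal event passes through
assert clamp_event_time(now + 4 * 3600, now) == now + 300  # phone clock 4h fast → clamped
```

Whether to clamp or reject outright is a policy choice; clamping keeps the event but admits the timestamp is now approximate, which some reconciliation jobs cannot tolerate.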

In a Razorpay or Flipkart pipeline, you almost always have option (1) for events the company controls (logins, taps, transactions) and option (2) for events from third parties (payment gateway callbacks, partner webhooks). The pipeline assigns event time per source: transaction_events.event_time comes from the app, bank_callback_events.event_time comes from the gateway. The Flink job's WatermarkStrategy.forBoundedOutOfOrderness(Duration.ofSeconds(30)) uses whichever field is the canonical event-time, and the rest of the system (windowing, joins, late-event handling) sees a single coherent timeline.
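What forBoundedOutOfOrderness(30s) computes reduces to a two-line rule: track the maximum event time seen so far and let the watermark trail it by the bound. A minimal sketch, not Flink's actual generator class:

```python
class BoundedOutOfOrdernessWatermark:
    """Sketch of a bounded-out-of-orderness watermark: the watermark
    trails the maximum event time seen so far by a fixed bound."""

    def __init__(self, bound_s=30):
        self.bound = bound_s
        self.max_event_time = float("-inf")

    def on_event(self, event_time_s):
        self.max_event_time = max(self.max_event_time, event_time_s)

    def current_watermark(self):
        # Meaning: "no event with event_time <= watermark is still expected."
        return self.max_event_time - self.bound

wm = BoundedOutOfOrdernessWatermark(bound_s=30)
for t in [100, 95, 130, 110]:          # out-of-order event times
    wm.on_event(t)
assert wm.current_watermark() == 100   # 130 (max seen) - 30
```

Note the watermark only ever moves forward as larger event times arrive — the out-of-order events at 95 and 110 never pull it back.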

[Figure: Three places an event-time stamp can come from. An event travels from the user's phone through the edge gateway and the Kafka broker to the Flink operator, each a possible stamping point. Trust ranking: (1) producer-stamped — closest to the user action, risk of clock drift, mitigate with bounds; (2) gateway-stamped — trusted clock at +50–500 ms, the safe default for analytics; (3) Kafka-stamped at +1–60 s — last resort, reflects producer-side processing time, not user time. By the Flink operator it is processing time — too late.]
The further right the stamp, the more network and queueing delay it has absorbed. By the time Flink sees an event, "when did this happen?" is not a question it can answer — only the event itself can.

The cost of going event-time

Event time is correct, processing time is fast — and the gap between the two is paid for in two places. The first is buffering. The operator cannot fire a window until it is reasonably sure all events for that window have arrived, which means holding window state in RocksDB longer than the window's wall-time duration. A 60-second window with 30 seconds of allowed lateness holds state for 90+ seconds before emitting. At Hotstar scale (millions of concurrent users), that's gigabytes of additional state.
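The buffering cost above, as a back-of-envelope. The per-key byte figure is an illustrative assumption, not a measured Flink/RocksDB number:

```python
# How long and roughly how much state an event-time window holds.
window_s = 60
allowed_lateness_s = 30
state_lifetime_s = window_s + allowed_lateness_s   # state held at least this long
assert state_lifetime_s == 90

concurrent_keys = 5_000_000   # e.g. one key per concurrent viewer (assumed)
bytes_per_key = 200           # assumed: aggregate + timers + RocksDB overhead
state_bytes = concurrent_keys * bytes_per_key
assert state_bytes == 1_000_000_000   # ~1 GB held for 90+ seconds per window
```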

The second cost is late events. With processing time, every event lands in some window — the one the operator was working on when the event arrived. With event time, an event can arrive after its window has already fired. The runtime has to choose: drop the late event (cheap, lossy), buffer until "allowed lateness" then drop, or update the already-emitted aggregate by emitting a retraction + new value (correct, expensive). Apache Flink defaults to "drop after allowed lateness"; Beam offers all three. Picking the policy is a product decision — the Razorpay reconciliation job emits retractions because every paisa matters; the Hotstar concurrent-viewer dashboard drops late events because a 0.4% miss is cheaper than the downstream pipeline complexity.
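The three policies can be lined up in one toy function. This is a model of the choices, not Flink's or Beam's API: a window [0, 60) has already fired with some total, a late event arrives, and a ("retract", x) / ("emit", y) pair models a retraction plus corrected value.

```python
def handle_late(policy, fired_total, late_amount, lateness_s, allowed_s=30):
    """Toy model of the three late-event policies for an already-fired window."""
    emitted = []
    if policy == "drop":
        pass                                          # cheap, lossy
    elif policy == "allowed_lateness":
        if lateness_s <= allowed_s:                   # still inside the grace period
            emitted.append(("retract", fired_total))
            emitted.append(("emit", fired_total + late_amount))
        # past the grace period: silently dropped (Flink's default behaviour)
    elif policy == "retract_always":
        emitted.append(("retract", fired_total))      # correct, expensive downstream
        emitted.append(("emit", fired_total + late_amount))
    return emitted

assert handle_late("drop", 1000, 50, 10) == []
assert handle_late("allowed_lateness", 1000, 50, 10) == [("retract", 1000), ("emit", 1050)]
assert handle_late("allowed_lateness", 1000, 50, 45) == []
assert handle_late("retract_always", 1000, 50, 120) == [("retract", 1000), ("emit", 1050)]
```

The expensive part of the third policy is not computing the pair — it is that every downstream consumer must understand retractions, which is why teams like the hypothetical Hotstar dashboard above choose to drop instead.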

This trade-off is what Build 8 spends the next several chapters on. Watermarks at /wiki/watermarks-how-we-know-were-done are the mechanism for "reasonably sure all events have arrived". Late-data handling at /wiki/late-data-handling-allowed-lateness-side-outputs-retractions is the mechanism for "an event arrived after the window fired".

Going deeper

The Dataflow Model — why event time won the design debate

Akidau et al.'s 2015 Dataflow paper crystallised the position: any streaming model that doesn't separate event time from processing time is fundamentally broken, because batch and streaming should be unified, and batch by definition works in event time (the input is already collected). The paper formalised four questions the runtime must answer separately — what is computed, where in event-time bucketing, when in processing-time triggering, how in late-data refinement. Flink, Beam, Spark Structured Streaming, and Materialize all implement this model. The win was conceptual: it is now broadly agreed that "event time + watermarks + triggers" is the only consistent way to define streaming correctness. Reach for this paper when someone in a design review says "let's just use processing time for now" — the answer is in section 3.

Skew across partitions — the asymmetric backpressure trap

Kafka delivers per-partition order but no cross-partition order. If your Flink job has 32 parallel subtasks each reading one partition, and partition 0 is on a healthy broker while partition 7 is recovering from a leader election, the per-subtask watermark on partition 0 advances to T = 19:35 while partition 7 is still at T = 19:32. The job's overall watermark is the minimum across subtasks (19:32), so the entire pipeline waits for partition 7 to catch up. This is correct — you cannot fire a window until all partitions have crossed it — but it surfaces in metrics as "watermark lag" growing. The fix is WatermarkStrategy.withIdleness(Duration.ofMinutes(2)) which lets a stalled partition be excluded from the watermark calculation, with the trade-off that a partition coming back online after 2 minutes will have all its events treated as late.
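The overall-watermark rule described above, as a sketch. A minimal model of min-across-subtasks with an idleness timeout, not Flink's internal implementation:

```python
def overall_watermark(partitions, now_s, idleness_s=120):
    """Job watermark = min of per-partition watermarks, excluding
    partitions idle longer than the idleness timeout."""
    active = [p for p in partitions if now_s - p["last_event_at"] <= idleness_s]
    if not active:
        return None                       # everything idle: the watermark cannot advance
    return min(p["watermark"] for p in active)

parts = [
    {"watermark": 1935, "last_event_at": 1000},   # healthy partition 0
    {"watermark": 1932, "last_event_at": 950},    # partition 7, stalled since t=950
]
assert overall_watermark(parts, now_s=1010) == 1932  # stalled partition still holds things back
assert overall_watermark(parts, now_s=1100) == 1935  # idle > 120s → excluded, watermark jumps
```

The jump in the second assertion is exactly the withIdleness trade-off: the pipeline unblocks, but any event partition 7 produces after coming back is now behind the watermark and will be treated as late.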

Why event time is hard for joins

A stream.join(other_stream).where(...).window(TumblingEventTimeWindows.of(Time.minutes(1))) requires both streams to advance their event-time watermark together. If one stream is the high-volume click stream from the Flipkart catalogue (1M events/sec, watermark advances rapidly) and the other is low-volume order-completed events (100 events/sec, watermark advances slowly), the join waits on the slow stream. Production systems use interval joins — where(left.time BETWEEN right.time - 5min AND right.time + 1min) — which buffer the smaller stream and emit join results as the larger stream's watermark advances. The state cost is (arrival_rate_left × interval_size) events held in RocksDB. This is why Flink's IntervalJoin operator has dedicated state-pruning logic — without it, a low-volume stream paired with a high-volume stream would grow state unboundedly.
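The buffering-plus-pruning shape of an interval join can be sketched as below. A toy model, not Flink's operator: the low-volume orders stream is held in "state", each click's timestamp stands in for the watermark, and orders that can no longer join anything are pruned.

```python
def interval_join(clicks, orders, lo_s=300, hi_s=60):
    """Toy interval join: keep each order only while some future click
    could still match it, i.e. order.t >= watermark - lo_s."""
    buffered = sorted(orders, key=lambda o: o["t"])   # small stream held in state
    out = []
    for c in clicks:                                  # big stream flows through
        watermark = c["t"]                            # stand-in for the real watermark
        buffered = [o for o in buffered if o["t"] >= watermark - lo_s]  # prune old state
        for o in buffered:
            if c["t"] - lo_s <= o["t"] <= c["t"] + hi_s:
                out.append((c["id"], o["id"]))
    return out

clicks = [{"id": "c1", "t": 1000}, {"id": "c2", "t": 1500}]
orders = [{"id": "o1", "t": 900}, {"id": "o2", "t": 1050}]
assert interval_join(clicks, orders) == [("c1", "o1"), ("c1", "o2")]
# By c2 at t=1500, both orders are older than 1500 - 300 and have been pruned.
```

The pruning line is the whole point: without it, `buffered` grows with the total history of the small stream instead of with arrival_rate × interval_size.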

Replay determinism — the sanity test

A pipeline is "replay-safe" if running the same input twice produces the same output. Test it by running your job once on yesterday's Kafka topic, snapshotting the output sink, then running it again from offset 0 and diffing. With event-time windows the diff is empty (modulo retractions if late data crossed the allowed-lateness boundary differently — usually still empty). With processing-time windows the diff is gigabytes, every aggregate slightly different. This is the test every data engineering team should run before promoting a streaming job to production. Reproducibility is what separates a pipeline from a script.
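The replay test, miniaturised. This sketch simulates the only thing that changes between the original run and the replay — the wallclock — and diffs the two outputs under each clock:

```python
from collections import defaultdict

events = [{"event_time": t, "amount": 100} for t in range(0, 180, 7)]

def run(events, replay_started_at, clock="event"):
    """Window the stream on one of the two clocks. During a replay,
    processing time is 'now', not the original arrival time — modelled
    here as replay_started_at + position in the stream."""
    buckets = defaultdict(int)
    for i, e in enumerate(events):
        t = e["event_time"] if clock == "event" else replay_started_at + i
        buckets[(t // 60) * 60] += e["amount"]
    return dict(buckets)

# Event time: replaying a day later produces an empty diff.
assert run(events, 0, "event") == run(events, 86_400, "event")

# Processing time: every bucket key moves with the replay wallclock.
assert run(events, 0, "processing") != run(events, 86_400, "processing")
```

The real test is the same idea at scale: snapshot the sink, re-run from offset 0, diff. The event-time diff should be empty; a non-empty diff is either a processing-time window hiding somewhere or late data crossing the allowed-lateness boundary differently.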

When processing time IS the right answer

Three categories. (a) Operator health metrics — "events processed per second by this Flink subtask" is meaningfully on processing time, because the question is about the operator. (b) Rate-limiting / circuit-breaking — "if this stream's processing-time throughput exceeds 100k/sec, slow down the upstream" — the decision is about the operator's current load, not the user. (c) Some short-window real-time alerts where 30-second precision doesn't matter — "did Flipkart's checkout error rate exceed 0.5% in the last 60 wallclock seconds?" can use processing time because the lag distribution is well-bounded. For everything else (UPI volume, Hotstar concurrent viewers, Swiggy ETA accuracy, IRCTC seat availability) — event time.
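Case (c) above is worth seeing in code, because it is the one legitimate processing-time aggregate most teams ship. A sketch with illustrative threshold and window — the question it answers is "what happened in the last 60 seconds of MY clock", so the operator's clock is the honest choice:

```python
from collections import deque

class ErrorRateAlert:
    """Sliding error-rate alert on the operator's wallclock.
    Window and threshold are illustrative, not a quoted production config."""

    def __init__(self, window_s=60, threshold=0.005):
        self.window_s, self.threshold = window_s, threshold
        self.samples = deque()                    # (wallclock_s, is_error)

    def record(self, now_s, is_error):
        self.samples.append((now_s, is_error))
        while self.samples and self.samples[0][0] < now_s - self.window_s:
            self.samples.popleft()                # slide on the operator's clock

    def firing(self):
        if not self.samples:
            return False
        errors = sum(1 for _, e in self.samples if e)
        return errors / len(self.samples) > self.threshold

alert = ErrorRateAlert()
for i in range(1000):
    alert.record(now_s=i * 0.05, is_error=(i % 100 == 0))  # 1% error rate
assert alert.firing()                                      # 1% > 0.5% threshold
```

Note what makes this acceptable: a delayed error event merely shifts which 60-second evaluation catches it, and the alert's consumer (a human or a pager) does not care which one.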

Where this leads next

The next chapter — /wiki/watermarks-how-we-know-were-done — explains the mechanism the runtime uses to decide a window is safe to fire on the event-time clock. After that, /wiki/late-data-handling-allowed-lateness-side-outputs-retractions covers what to do when an event arrives past its window's deadline.

The mental model: event time is the truth, processing time is the operator's view, and the gap between them is what every streaming runtime is built to manage. Watermarks bound the gap. Triggers decide what to emit before the bound is reached. Allowed lateness decides what to do with events that overshoot. The next three chapters are mechanism papers on a single problem — running a clock that the operator does not own.

Build 8's capstone — /wiki/wall-exactly-once-is-a-lie-everyone-tells — closes the loop by showing how event-time windowing interacts with Kafka offset commits and sink transactions to deliver end-to-end exactly-once semantics. Without event time, exactly-once is not even a coherent goal.
