Note: Company names, engineers, incidents, numbers, and scaling scenarios in this article are hypothetical — even when they resemble real ones. See the full disclaimer.
Temporal and durable execution
It is a Friday afternoon at PaySetu and Aarav, a backend engineer two years out of college, has just deployed a new merchant-payout flow. The flow is tame: pull pending settlements, debit the platform account, credit the merchant's bank, write an audit row, send a confirmation email. Five steps. The first deploy goes out at 16:42. At 16:51 the worker pod gets evicted by Kubernetes — a node was being drained for a kernel patch, nothing dramatic, just the world doing its job. Aarav opens the logs and sees that 312 payouts were "in progress" when the pod died. Of those, 47 were debited but not credited — the bank API had returned 200, the platform account had been debited, but the function had crashed before writing the audit row or calling the merchant credit endpoint. The money is somewhere. Not lost, exactly — the bank reconciliation report on Monday will show it — but it is not in the merchant's account, and the merchant is going to call support tomorrow. Aarav spends the weekend writing reconciliation scripts and learns the lesson every backend engineer eventually learns: a function that touches more than one external system cannot survive a process restart by itself. The function needs help from a layer below it that remembers what already happened. That layer is durable execution, and Temporal is its most popular implementation.
Durable execution is a small idea with large consequences. The idea: turn the workflow function into something that can be replayed deterministically from a recorded history of side effects, so a crashed worker can hand off mid-execution to a fresh worker and the fresh worker resumes exactly where the crashed one stopped. The consequence: your business logic stops worrying about restarts entirely. A 30-day workflow with await asyncio.sleep(30 * 86400) in the middle just works — the worker process can be killed and re-deployed a hundred times during those thirty days; the workflow does not notice.
Durable execution makes a workflow function as crash-survivable as a database row. Temporal records every activity invocation and result into an event history; on worker restart, it replays the function from the start, short-circuiting recorded activities to their cached results so the function ends up at the exact line where it stopped. The trade-off: workflow code must be deterministic — no time.now(), no random(), no direct I/O. All non-determinism funnels through activities, the only things allowed to talk to the world.
The wall this solves — why a normal function dies on restart
Consider Aarav's payout function written naively. It is six lines:
def payout(merchant_id, amount):
    settlement_id = pull_settlement(merchant_id)      # step 1
    debit_platform(amount)                            # step 2
    credit_merchant(merchant_id, amount)              # step 3
    write_audit(settlement_id, "credited")            # step 4
    send_email(merchant_id, "your payout is in")      # step 5
If this function crashes between step 2 and step 3, it has moved money out of the platform account but not into the merchant. On restart, the function starts again from line 1. Line 1 fetches a fresh settlement, line 2 debits again, line 3 credits, and so on. The platform account is now double-debited; the merchant is correctly credited; the audit table thinks two payouts happened. None of those is recoverable from inside the function — it does not know that "this is a re-run after a crash".
You can patch this with idempotency keys (the dedup approach): pass the same settlement_id to debit_platform so the second call no-ops, and so on. That works for step-level idempotency, but it does not solve the harder problem: the function still has to re-execute every step on restart, including ones whose results are expensive to recompute (an LLM call, a fraud-score lookup, a 4-second cross-region API). On a workflow with a 24-hour sleep in the middle ("send the receipt email tomorrow morning"), the function cannot just be "restarted" — there is no way to teleport the call stack to "right after the sleep". The function ends, the event is forgotten, the email never sends.
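To see the dedup approach in miniature: a sketch of step-level idempotency with an in-memory SQLite table keyed by settlement id (schema and names are hypothetical, not PaySetu's):
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE debits (settlement_id TEXT PRIMARY KEY, amount INTEGER)")

def debit_platform(settlement_id, amount):
    try:
        # The PRIMARY KEY constraint turns a re-run into a no-op: a second
        # INSERT with the same settlement_id raises instead of double-debiting.
        conn.execute("INSERT INTO debits VALUES (?, ?)", (settlement_id, amount))
        conn.commit()
    except sqlite3.IntegrityError:
        pass  # already debited on a previous run

debit_platform("S-1001", 4500)
debit_platform("S-1001", 4500)  # crash-and-retry: harmless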
Why this is a wall: a normal function lives in process memory. On crash, the memory is gone, the call stack is gone, the local variables are gone. The function has no way to know "I have already done steps 1, 2, 3 — start at step 4". To survive crashes, the function's progress has to live somewhere persistent — outside the process, in something that survives pod evictions and machine failures. That somewhere is the durable-execution engine's event history.
The trick — replay from a recorded history
Temporal's central design is unusual enough to be worth deriving from scratch. The workflow function is never run to completion in one go. Instead, every time a worker picks up the workflow, the engine hands it the complete event history so far and asks the worker to replay the function from line 1.
The replay is short-circuited at every activity call. When the function reaches debit_platform(amount), the worker does not actually debit. It looks at the history: has there already been a debit_platform event recorded for this workflow run? If yes, the worker returns the cached result from the history without making the API call, and the function continues to the next line. If no, the worker dispatches the activity for real, the activity runs, the result is appended to the history as a new event, and the worker stops the workflow function (because the activity is now in flight). Later, when the activity completes, the engine wakes a worker (maybe the same one, maybe a different one), hands it the updated history, and the worker replays from the start again. This time the debit_platform call short-circuits to the cached result, and the function reaches credit_merchant, where the same dance happens.
The shape of the function's execution, drawn out, is: replay → first new activity → suspend → activity completes externally → replay → second new activity → suspend → ... until the function returns. Each replay starts from line 1; each replay reaches one activity further than the last. Why this works: the function's local variables are reconstructed on every replay by re-executing the deterministic prefix. There is no state to checkpoint — the state is the history of activity events, and re-running the function with that history reproduces the local variables exactly. This is the same trick as state-machine replication in Raft: replay a deterministic machine from a persisted log, applied here to a workflow function instead of a key-value store.
The first time you see this you ask: isn't replaying the function on every step wildly expensive? For a 5-step workflow you replay 5 times, doing 1+2+3+4+5 = 15 line-executions instead of 5. The answer is: yes, but the function body is pure code-without-I/O, so it runs in microseconds. The expensive things are the activities, and those are short-circuited from the history. The replay tax is a few microseconds per step in exchange for crash-survivability. It is a phenomenally good trade.
The determinism contract — what your code can and cannot do
For replay to reproduce the same local variables, the workflow function must be deterministic on the history: given the same input and the same sequence of activity results, it must take the same code path every time. This is the determinism contract, and it is the single source of pain for new Temporal users. The forbidden things in workflow code:
- time.now() / time.time() — wall-clock reads. Two replays at different wall-clock times would diverge. Use workflow.now(), which Temporal records into the history.
- random.random(), UUIDs from uuid.uuid4() — non-deterministic. Use workflow.random() or workflow.uuid4(), which are seeded from the history.
- Direct I/O — file reads, HTTP calls, database queries. All of these must be wrapped in activities. The activity is the only thing allowed to interact with the outside world.
- Threading, multiprocessing, ad-hoc asyncio loops — non-deterministic scheduling. Use Temporal's own coroutine primitives (workflow.sleep, workflow.wait_condition).
- Iteration over unordered collections (set, dict.keys() in older Pythons) — order can vary. Sort first.
- Reading mutable globals — they can change between replays. If you need configuration, pass it in as workflow input or fetch it via an activity.
These rules sound restrictive until you internalise the model: workflow code is the policy, activities are the actions. Anything that talks to a database, sends an email, computes something expensive, or reads a clock is an activity. Anything that decides which action to take is workflow code. With this split, the workflow function ends up looking like a plain Python function with await activity.execute(...) instead of direct calls, and the determinism rules are easy to follow.
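To make the split concrete, here is roughly how Aarav's payout looks under Temporal's Python SDK (activity bodies stubbed, timeouts picked arbitrarily; a sketch, not PaySetu's production code):
from datetime import timedelta
from temporalio import activity, workflow

@activity.defn
async def debit_platform(amount: int) -> dict:
    # Real I/O lives here: the bank API call, the ledger write.
    return {"ok": True}

@activity.defn
async def credit_merchant(merchant_id: str, amount: int) -> dict:
    return {"ok": True}

@workflow.defn
class PayoutWorkflow:
    @workflow.run
    async def run(self, merchant_id: str, amount: int) -> str:
        # Policy only: every side effect goes through execute_activity so the
        # engine records it into history and short-circuits it on replay.
        await workflow.execute_activity(
            debit_platform,
            amount,
            start_to_close_timeout=timedelta(seconds=30),
        )
        await workflow.execute_activity(
            credit_merchant,
            args=[merchant_id, amount],
            start_to_close_timeout=timedelta(seconds=30),
        )
        return "DONE"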
A toy durable-execution engine — runnable
The cleanest way to feel the replay model is to build a 60-line version of it. The version below is missing persistence, networking, and timers, but it captures the core trick: a function is replayed against a history, activities short-circuit on a hit, and the function suspends when it reaches an unseen activity.
import random
from dataclasses import dataclass, field

@dataclass
class History:
    events: list = field(default_factory=list)  # [(name, args, result), ...]

    def lookup(self, name, args):
        # Toy matching by (name, args); real Temporal matches events by
        # their position in the history.
        for ev_name, ev_args, ev_result in self.events:
            if ev_name == name and ev_args == args:
                return ev_result
        return None

    def append(self, name, args, result):
        self.events.append((name, args, result))

class WorkflowSuspended(Exception):
    """Raised when a workflow reaches an activity not yet in history."""

def activity(history, name, fn, *args):
    cached = history.lookup(name, args)
    if cached is not None:
        return cached                      # short-circuit on replay
    result = fn(*args)                     # actually run it
    history.append(name, args, result)
    raise WorkflowSuspended(name)          # stop here; resume on the next replay

# Real activities — talk to the world
def debit_platform(amount):    return {"ok": True, "txn": f"D{random.randint(1000, 9999)}"}
def credit_merchant(mid, amt): return {"ok": True, "txn": f"C{random.randint(1000, 9999)}"}
def write_audit(sid, status):  return {"audit_id": random.randint(10000, 99999)}
def send_email(mid, body):     return {"sent": True}

# Workflow code — deterministic on the history
def payout_workflow(history, mid, amount):
    activity(history, "debit", debit_platform, amount)
    activity(history, "credit", credit_merchant, mid, amount)
    activity(history, "audit", write_audit, mid, "credited")
    activity(history, "email", send_email, mid, "your payout is in")
    return "DONE"

# The "engine" — replay until completion
def run_until_complete(mid, amount):
    history = History()
    for replay_n in range(1, 100):
        try:
            result = payout_workflow(history, mid, amount)
            print(f"replay #{replay_n}: COMPLETED -> {result}")
            return history
        except WorkflowSuspended as e:
            print(f"replay #{replay_n}: suspended after activity '{e}'")

run_until_complete("M-7731", 4500)
Sample run:
replay #1: suspended after activity 'debit'
replay #2: suspended after activity 'credit'
replay #3: suspended after activity 'audit'
replay #4: suspended after activity 'email'
replay #5: COMPLETED -> DONE
Five replays, four activities, one final completion. Each replay reaches exactly one activity further than the last; each replay reconstructs the function's locals by re-running the prefix. Now imagine the engine crashes between replay #2 and replay #3, and the worker is restarted on a different machine. The new worker reads the persisted history (in real Temporal: from a Cassandra-backed event store), and replay #3 picks up exactly where replay #2 left off. The workflow does not notice the crash. That is durable execution.
Why this is the same algorithm as Temporal's, just simpler: real Temporal persists history.events to a fault-tolerant store (Cassandra in Cadence, configurable in Temporal), the worker is a separate process polling for tasks, the suspension is implemented via cooperative coroutines instead of exceptions, and the activities run on different worker pools so they can scale independently. But the core — replay against a history, short-circuit on a hit, suspend on a miss — is exactly the loop above.
Where this shows up in production
CricStream's video-encoding pipeline is a Temporal workflow. A creator uploads a 4-hour match recording at 23:14. The workflow validates the upload, kicks off five parallel encode activities (1080p, 720p, 480p, 240p, audio-only), waits for all to complete, generates a thumbnail, runs a content-moderation pass, publishes to the CDN, sends a "your video is live" notification. Total wall-clock: 90 minutes. The workflow-level execution timeout is 4 hours — anything past that is a workflow-level failure. During those 90 minutes, Kubernetes evicts the worker pod twice. Both times, a different worker picks up the workflow's pending tasks and resumes from the history. The creator never knows. Why this is hard to do without Temporal: the "wait for five parallel encode activities" step is the killer. With raw queues, you would write a counter-based aggregator that ticks down as each encode finishes — and then handle the case where the counter row gets corrupted, the aggregator process dies between updates, the message redelivers and the counter moves twice. Temporal's await asyncio.gather(*encode_activities) does this in one line because the gather is part of the deterministic workflow code; the engine handles all the partial-failure cases.
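A sketch of that fan-out in Temporal's Python SDK (rendition list, activity names, and timeouts are made up; the SDK runs workflow code on a deterministic event loop, which is what makes asyncio.gather safe here):
import asyncio
from datetime import timedelta
from temporalio import workflow

RENDITIONS = ["1080p", "720p", "480p", "240p", "audio-only"]

@workflow.defn
class EncodeWorkflow:
    @workflow.run
    async def run(self, upload_id: str) -> None:
        # Fan out five encode activities; the gather is deterministic
        # workflow code, so the engine replays the join safely after a crash.
        encodes = [
            workflow.execute_activity(
                "encode_rendition",          # hypothetical, registered by name
                args=[upload_id, rendition],
                start_to_close_timeout=timedelta(hours=2),
            )
            for rendition in RENDITIONS
        ]
        await asyncio.gather(*encodes)
        await workflow.execute_activity(
            "publish_to_cdn",
            upload_id,
            start_to_close_timeout=timedelta(minutes=10),
        )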
PaySetu's onboarding-with-KYC-followup is a 14-day workflow. Day 0: KYC document submitted. Day 0+5min: email sent. Day 7: if KYC still pending, send reminder. Day 14: if still pending, expire the application. Implementing this with cron jobs and database flags is doable but ugly. Implementing it with Temporal is await workflow.sleep(timedelta(days=7)); if state.kyc_status == 'pending': await activities.send_reminder(). The 7-day sleep is durable — it persists across worker restarts, deploys, even region failovers.
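Sketched in the same style (the signal and activity names are hypothetical; note that in Temporal's Python SDK the durable timer is written as asyncio.sleep inside workflow code, which the workflow event loop records into history):
import asyncio
from datetime import timedelta
from temporalio import workflow

@workflow.defn
class KycFollowupWorkflow:
    def __init__(self) -> None:
        self.kyc_status = "pending"

    @workflow.signal
    def kyc_approved(self) -> None:
        # Delivered by the KYC service when documents clear (hypothetical).
        self.kyc_status = "approved"

    @workflow.run
    async def run(self, application_id: str) -> str:
        # Durable 7-day timer: survives restarts, deploys, failovers.
        await asyncio.sleep(timedelta(days=7).total_seconds())
        if self.kyc_status == "pending":
            await workflow.execute_activity(
                "send_reminder", application_id,
                start_to_close_timeout=timedelta(minutes=5),
            )
        await asyncio.sleep(timedelta(days=7).total_seconds())
        if self.kyc_status == "pending":
            await workflow.execute_activity(
                "expire_application", application_id,
                start_to_close_timeout=timedelta(minutes=5),
            )
            return "expired"
        return "approved"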
Common confusions
- "Temporal is a queue" — It is not. Queues store messages and let consumers pull them; Temporal stores workflow histories and dispatches replay tasks to workers. The unit is the workflow execution, not the message. A queue cannot replay your function; Temporal's whole point is that it can.
- "I can just call
time.sleep()in workflow code" — You cannot.time.sleep()blocks a real thread; Temporal needs to suspend the workflow so the worker can do other work. Useawait workflow.sleep(...)which durably records the sleep into history and returns control to the worker. - "Activities are just functions, I can call them like normal" — Inside workflow code, calling an activity directly bypasses the engine. You must use
await workflow.execute_activity(...)so the engine records the call into history and can short-circuit on replay. - "Determinism only matters during initial development" — It matters forever. Every code change to a workflow function must preserve replay-determinism for in-flight workflows that started on the old code. This is workflow versioning, and it is one of the harder operational concerns. See the going-deeper section.
- "Temporal replaces my message broker" — Mostly no. Temporal handles workflow orchestration; high-throughput pipelines (analytics events, log shipping) still want Kafka. The two coexist: Kafka for the firehose, Temporal for the multi-step transactional flows.
- "The history grows forever and that's fine" — It is not. A workflow with a 30-day loop that runs every minute has 43,200 events, which is past Temporal's 10K-event-per-history soft limit. You use
continue-as-newto seal the current history and start a fresh one with the same workflow id. New users discover this around the time their first workflow hits 10,000 events and the engine starts complaining.
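A sketch of that escape hatch in Temporal's Python SDK (the polling activity and batch size are hypothetical; continue_as_new ends the current run and starts a successor with the same workflow id and a fresh history):
import asyncio
from datetime import timedelta
from temporalio import workflow

@workflow.defn
class MinutePollWorkflow:
    @workflow.run
    async def run(self, polls_done: int = 0) -> None:
        # Seal the history long before the event-count limit: run a bounded
        # batch of iterations, then hand off to a fresh run.
        for _ in range(500):
            await workflow.execute_activity(
                "poll_once",  # hypothetical activity, registered by name
                start_to_close_timeout=timedelta(seconds=30),
            )
            await asyncio.sleep(60)  # durable one-minute timer
        workflow.continue_as_new(polls_done + 500)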
Going deeper
The Cadence paper and the lineage
Temporal is the open-source descendant of Uber's Cadence, which is itself the descendant of Amazon's Simple Workflow Service (SWF) and Microsoft's Durable Task Framework. The lineage matters because the determinism-replay model is not the most natural design — it is a hard-won engineering insight about what is operationally tractable at scale. Cadence's authors explicitly contrast it with the "explicit checkpointing" model (where the engine periodically snapshots the workflow's state). Explicit checkpointing has a smaller replay cost but a much larger storage cost (a full state dump per checkpoint vs. a list of event tuples), and the snapshot is opaque to introspection. Replay-on-history is more code-intensive at run time, but the history is itself a complete, queryable audit log of the workflow — which is what every SRE wants when the workflow misbehaves.
Versioning — the hardest operational concern
The workflow function is replayed deterministically. What happens when you change the function? Imagine a workflow that has been running for 3 days and is paused on await workflow.sleep(...). You deploy v2 of the workflow function with an extra activity inserted before the sleep. When the sleep ends and a worker resumes the workflow, replay starts from line 1 of the new function. The new function tries to execute the new activity at the new position; the history has no record of it (because the workflow was started on v1); the engine sees a non-deterministic divergence and raises a NonDeterminismError, wedging the workflow. Why this is the worst class of bug: it does not appear in tests, because tests start fresh; it appears only for in-flight workflows on the day of the deploy. To handle it, Temporal exposes workflow.patched("v2-extra-step") — a version marker recorded into the history. New executions see the marker as set and take the new branch; replays of histories that predate the patch do not, and take the old branch. Both branches must be retained until all in-flight v1 workflows have completed.
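Sketched with the Python SDK's marker (the fraud_check activity is a hypothetical v2 addition):
from datetime import timedelta
from temporalio import workflow

@workflow.defn
class PayoutWorkflow:
    @workflow.run
    async def run(self, merchant_id: str, amount: int) -> str:
        # patched() returns True for new executions (and records the marker
        # into history); replays of pre-patch histories get False, so v1
        # workflows keep taking the old path deterministically.
        if workflow.patched("v2-extra-step"):
            await workflow.execute_activity(
                "fraud_check",               # the hypothetical new v2 activity
                merchant_id,
                start_to_close_timeout=timedelta(seconds=30),
            )
        await workflow.execute_activity(
            "debit_platform", amount,
            start_to_close_timeout=timedelta(seconds=30),
        )
        return "DONE"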
Sticky workers and the cache
Replaying from line 1 every time would be wasteful for long workflows. Temporal optimises this with sticky workers: after a worker replays a workflow, it caches the in-memory state of the workflow function. The next task for that workflow is preferentially routed to the same worker, which can resume from the cached state without replaying. If the worker dies, the next task falls through to any worker, which replays from history. This is a pure performance optimisation — correctness still depends on the replay model.
Restate, DBOS, and the next generation
Temporal's design is from 2014–2019. The model has been refined since. Restate (founded 2023) embeds the engine as a sidecar and uses an event-sourced log directly accessible to application code. DBOS (also 2023) goes further — it stores the workflow state in the application's own Postgres via a transactional log. Both relax some of Temporal's heavyweight infra requirements while keeping the determinism-replay core. The unifying idea — function replay against a recorded history — is now considered a primitive of the field, not a Temporal-specific trick.
Reproduce this on your laptop
python3 -m venv .venv && source .venv/bin/activate
# Save the toy engine snippet as durable_demo.py and run:
python3 durable_demo.py
# Expected: 4 'suspended' lines, then 'COMPLETED -> DONE' on replay #5.
# Now: kill the process between replays (force-quit), persist the History list to a JSON file, restart, and watch the workflow resume.
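One way to do that persistence step, sketched against the toy engine above (file name hypothetical; the tuple conversion matters because JSON turns the recorded args tuples into lists and lookup compares by equality):
import json, os

HISTORY_FILE = "payout_history.json"  # hypothetical location

def load_history() -> History:
    h = History()
    if os.path.exists(HISTORY_FILE):
        # Convert args back to tuples so lookup's equality check still hits.
        h.events = [(name, tuple(args), result)
                    for name, args, result in json.load(open(HISTORY_FILE))]
    return h

def save_history(h: History) -> None:
    with open(HISTORY_FILE, "w") as f:
        json.dump(h.events, f)

# In run_until_complete: start from load_history() instead of History(),
# and call save_history(history) after each WorkflowSuspended. Kill the
# process at any point; on restart, replay resumes from the saved events.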
Where this leads next
Durable execution is the engine; orchestration is the architectural pattern that uses it. The next chapters in this part build on durable execution:
- /wiki/wall-orchestration-is-its-own-layer — the wall this is the answer to.
- /wiki/saga-pattern-compensating-actions — sagas are a natural fit for durable execution; the compensation logic is just more workflow code.
- /wiki/at-least-once-idempotency-in-practice — activities still need idempotency keys because the engine can re-dispatch on activity-worker crash.
The lesson the rest of this part inherits: a workflow function that survives crashes is not magic, it is a determinism contract paired with a recorded history. Once you accept the contract, the rest of the orchestration story (timers, signals, queries, child workflows, sagas) is application of the same primitive in different shapes.
References
- Temporal — concepts and architecture — the canonical introduction to event-sourced workflow execution and replay.
- Cadence — design overview — the open-source predecessor to Temporal; same ideas, the original paper.
- Maxim Fateev and Samar Abbas, "Cadence: a fault-tolerant stateful code platform" (2018) — the canonical paper on the determinism-replay model.
- Restate — the durable execution sidecar — the next-generation engine with a different deployment shape.
- DBOS — durable execution in Postgres — the "your database is the workflow log" approach.
- Microsoft Durable Functions — patterns and concepts — the Azure variant of the same model.
- Chris Richardson, "Pattern: Saga" (microservices.io) — the practitioner reference for compensation patterns durable engines naturally support.
- /wiki/wall-orchestration-is-its-own-layer — the orchestration wall that durable execution is the answer to.