Note: Company names, engineers, incidents, numbers, and scaling scenarios in this article are hypothetical — even when they resemble real ones. See the full disclaimer.
Workflow-as-code vs DAG-as-yaml
It is 2am at MealRush and Karan, the on-call data engineer, is staring at an Airflow DAG that refuses to run for one specific restaurant. The DAG has 14 tasks; the third task — a branch operator — is supposed to skip the price-recomputation step on weekdays after 11pm and run a lighter version instead. The branch is part of the static structure: a BranchPythonOperator whose callable returns the name of the next task to run. For 99.97% of restaurants the branch fires correctly. For one restaurant in Indore, the branch returns the wrong task name, the lighter path never runs, and a daily ₹4-lakh rollup is missing. There is no stack trace. There is a yaml file, an XCom table, and a 90-second wait between scheduler ticks.
This is the central tension in workflow engines: the workflow has to be durable (survive crashes, retries, restarts) and inspectable (the engine has to know which step is next), and there are two fundamentally different ways to satisfy both constraints. The DAG-as-yaml camp (Airflow, Argo Workflows, AWS Step Functions' Amazon States Language) declares the workflow as a static structure the engine interprets — every node, every edge, every condition lives in configuration. The workflow-as-code camp (Temporal, Cadence, DBOS, Dapr Workflows) lets you write the workflow in an ordinary programming language — if, for, try/except, function calls — and the engine durably records every effect so that re-running the same code produces the same decisions. The choice shapes everything downstream.
DAG-as-yaml engines store the workflow as static configuration the engine walks; workflow-as-code engines store an event history of activity results and replay it through your code to reconstruct state. The yaml camp wins on visualisation and SQL-queryable structure; the code camp wins on dynamic branching, expressivity, and local debugging. The architectural primitive that makes workflow-as-code possible is deterministic replay — and once you understand replay, the rest of the trade-offs follow mechanically.
The two execution models — what the engine actually stores
A DAG-as-yaml engine stores the workflow definition itself: nodes, edges, dependencies, retry policies, schedules — all parsed from yaml or json at deploy time and held in the metastore. When a workflow run begins, the engine creates a run record whose state is "task A: pending, task B: pending, …" and walks the graph node-by-node, marking each task running, then success or failed, then promoting downstream tasks to pending. The state machine is the engine's; your code only runs inside individual tasks.
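The walk described above can be sketched in a few lines of Python. This is a hypothetical minimal interpreter, not any real engine's code; the task names, the `dag` dict shape, and the `run()` helper are all invented for illustration:

```python
# Hypothetical minimal DAG-as-yaml interpreter (not any real engine's code).
# The workflow is static data; the engine walks it and owns every state transition.
dag = {
    "extract":   {"deps": []},
    "transform": {"deps": ["extract"]},
    "load":      {"deps": ["transform"]},
    "report":    {"deps": ["transform"]},
}

def run(dag, task_fns):
    state = {t: "pending" for t in dag}          # the engine's run record
    while any(s == "pending" for s in state.values()):
        # a task is runnable when every dependency has succeeded
        ready = [t for t, s in state.items() if s == "pending"
                 and all(state[d] == "success" for d in dag[t]["deps"])]
        if not ready:
            break                                # nothing runnable: upstream failed
        for t in ready:
            state[t] = "running"
            try:
                task_fns[t]()                    # your code runs only inside tasks
                state[t] = "success"
            except Exception:
                state[t] = "failed"
    return state

fns = {t: (lambda: None) for t in dag}
print(run(dag, fns))   # every task reaches "success"
```

Note that the engine never executes *your* control flow: the graph shape was fixed before the run started, and a failed task simply strands its downstream tasks in "pending".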
A workflow-as-code engine stores a different shape entirely: an event history of activity invocations and their results. Your workflow function is ordinary code, but every external effect (calling an activity, sleeping, awaiting a signal, reading the clock) is intercepted by the engine SDK and routed through the history. When the engine needs to know "what happens next?" it does not consult a yaml file — it runs your workflow function from the top, and replays each external effect from the history. If the history says approve_kyc returned "K-7731", the SDK returns "K-7731" from the activity call without re-executing it; only when the workflow function reaches an effect that has no history entry does the engine actually schedule the activity and record the new event.
Why this matters: the storage shape determines what the engine can do without your code present. A yaml engine can render the DAG, run "show me all DAGs whose third task is approve_kyc" as a SQL query against the metastore, and let an operator retry a single node from the UI — because the structure is data the engine owns. A code engine cannot do any of that without running your workflow function (because the structure only exists when your code is executing); but it can let you write if, for, while, recursion, dynamic activity dispatch, and ordinary control flow — because there is no static graph to constrain you.
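The two storage shapes can be made concrete as data. Both records below are hypothetical sketches, not any real engine's schema:

```python
# Hypothetical storage shapes (invented for illustration, not real schemas).

# DAG-as-yaml: the engine stores the structure plus per-task state.
yaml_run = {
    "dag": {"approve_kyc": [], "create_acct": ["approve_kyc"]},  # nodes + edges
    "task_state": {"approve_kyc": "success", "create_acct": "pending"},
}
# The structure is data the engine owns: queryable without user code present.
has_kyc_node = "approve_kyc" in yaml_run["dag"]

# Workflow-as-code: the engine stores only an event history of effects.
code_run = [
    {"event": "activity_completed", "name": "approve_kyc", "result": "K-7731"},
    {"event": "activity_scheduled", "name": "create_acct"},
]
# There is no graph here; the "structure" exists only while the workflow
# function executes and the SDK replays these events through it.
```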
Deterministic replay — the load-bearing requirement
Workflow-as-code only works because of one strong constraint: the workflow function must be deterministic with respect to the event history. Given the same history, running the function from the top must take the same path through if, for, and activity calls every time. This is what lets the engine "fast-forward" through everything that already happened — the SDK replays each activity call, returns the recorded result, and your code reaches the same execution state it was in before the worker crashed. Once it reaches an effect with no history, the SDK schedules a new activity and yields control.
Determinism imposes rules:
- No `datetime.utcnow()` or `random.random()` directly — they return different values across replays. Use `workflow.now()` and `workflow.uuid4()` instead, which the engine intercepts and records.
- No reading from process memory shared with other workflows — that memory is not in the history.
- No "external" I/O directly — every external call must be an activity, so the engine can record its result.
- No threads, no `asyncio.gather` outside the SDK's helpers — task ordering must be reproducible.
The reward for accepting these rules is enormous: your workflow function reads as ordinary linear code, but it is durably executable across crashes, restarts, and weeks-long sleeps. An `await workflow.sleep(timedelta(days=7))` blocks for a week. An `if user.is_premium:` branches based on data fetched in an earlier activity. A `for line_item in order.lines:` iterates over a list whose length is only known at runtime. None of this is expressible in a static yaml DAG — the yaml file would have to know the loop bound at deploy time.
Why "deterministic replay" is non-obvious: the natural intuition is that the engine re-runs the work, which would be expensive and incorrect (the second approve_kyc would create a duplicate KYC record). It does not re-run anything; it re-traces the function's path through the recorded history, returning the same activity results to the same call sites. Re-running is replay's opposite — it is what would happen without the history.
A toy workflow-as-code engine — runnable
The 70-line version below shows the replay primitive. The "engine" stores an event history; the workflow function is ordinary Python; the engine intercepts each `act()` call, consults the history, and either replays or schedules.
```python
import time

class History:
    def __init__(self): self.events, self.cursor = [], 0
    def next_event(self):
        if self.cursor < len(self.events):
            ev = self.events[self.cursor]; self.cursor += 1; return ev
        return None
    def append(self, ev):
        # advance the cursor past the new event so the first execution
        # never re-reads what it just recorded
        self.events.append(ev); self.cursor += 1
    def reset_cursor(self): self.cursor = 0

class Engine:
    def __init__(self, hist): self.hist = hist
    def act(self, name, fn, *args):
        ev = self.hist.next_event()
        if ev is not None:  # replay path
            assert ev["name"] == name, \
                f"non-determinism: history has {ev['name']}, code asked for {name}"
            return ev["result"]
        result = fn(*args)  # first-execution path
        self.hist.append({"name": name, "result": result})
        return result
    def now_seed(self):
        ev = self.hist.next_event()
        if ev is not None: return ev["result"]
        seed = int(time.time())
        self.hist.append({"name": "__now__", "result": seed})
        return seed

# Activities — ordinary functions, no engine awareness
def approve_kyc(m): return f"K-{m}"
def create_acct(m): return f"A-{abs(hash(m)) % 9999:04d}"
def insert_row(m, a): return {"merchant": m, "acct": a, "ok": True}
def light_path(a): return {"path": "light", "acct": a}
def full_path(a): return {"path": "full", "acct": a}

# Workflow function — ordinary control flow
def onboard(eng, merchant):
    k = eng.act("approve_kyc", approve_kyc, merchant)
    a = eng.act("create_acct", create_acct, merchant)
    seed = eng.now_seed()
    if seed % 2 == 0:  # deterministic on replay
        return eng.act("light", light_path, a)
    return eng.act("full", full_path, a)

# First execution — no history
hist = History()
eng = Engine(hist)
result1 = onboard(eng, "M-7731")
print("first run result:", result1)
print("history after first run:")
for ev in hist.events: print(" ", ev)

# Simulate worker crash + restart — replay from the same history
hist.reset_cursor()
eng2 = Engine(hist)
result2 = onboard(eng2, "M-7731")
print("replay result: ", result2)
print("results equal: ", result1 == result2)
```
Sample run:

```
first run result: {'path': 'light', 'acct': 'A-9091'}
history after first run:
  {'name': 'approve_kyc', 'result': 'K-M-7731'}
  {'name': 'create_acct', 'result': 'A-9091'}
  {'name': '__now__', 'result': 1745931622}
  {'name': 'light', 'result': {'path': 'light', 'acct': 'A-9091'}}
replay result:  {'path': 'light', 'acct': 'A-9091'}
results equal:  True
```

(The account id and the seed depend on the process's hash seed and wall clock, so your run will show different values and may take the `full` branch — but the replay will still match the first run exactly.)
The first execution scheduled four effects and recorded them in order. The replay reset the cursor, ran the same function from the top, and got the same result without executing a single activity body — every act() call was satisfied from history. That equivalence is the entire trick.
The walkthrough, line by line:

- `Engine.act(...)` is the interception point. It checks whether history has a recorded result for this position: if yes, return the recorded result; if no, execute the function and record the result. This is the same primitive Temporal's `workflow.execute_activity` and DBOS's `@DBOS.step` use, just stripped down.
- `now_seed()` demonstrates how non-deterministic functions are handled — wrap them in the same intercept-and-record protocol. Real engines do this for `now()`, `uuid4()`, `random()`, `sleep()`.
- The `onboard` workflow function uses ordinary Python control flow (`if`, function calls, `return`). There is no DAG description; the execution trace of this function is the DAG, and that trace is a different shape per input.
- The replay path's correctness depends on `seed % 2` evaluating to the same boolean both times — which it does because the seed is recorded in history. If the workflow had used `time.time()` directly, the boolean could diverge on replay and the next `eng.act("light", ...)` would assert with `non-determinism`.
A real engine adds: persistent history (Postgres / sharded storage), worker pools, activity timeouts, retries, signals, child workflows, versioning. None of those change the core trick — they harden it.
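As a sketch of the persistence hardening, the toy history can be written to a JSON-lines file so a "restarted worker" replays from disk. The `DurableHistory` class and file layout below are invented for illustration; a real engine uses a transactional store and the fsync discipline this sketch skips:

```python
import json, os, tempfile

# Hypothetical durable history: JSON lines stand in for a real engine's
# Postgres / sharded history store. The replay protocol is unchanged.
class DurableHistory:
    def __init__(self, path):
        self.path, self.cursor, self.events = path, 0, []
        if os.path.exists(path):                 # recover after a "crash"
            with open(path) as f:
                self.events = [json.loads(line) for line in f]
    def next_event(self):
        if self.cursor < len(self.events):
            ev = self.events[self.cursor]; self.cursor += 1; return ev
        return None
    def append(self, ev):
        self.events.append(ev); self.cursor += 1
        with open(self.path, "a") as f:          # fsync omitted for brevity
            f.write(json.dumps(ev) + "\n")

def act(hist, name, fn):
    ev = hist.next_event()
    if ev is not None:
        return ev["result"]                      # replay from disk
    result = fn()
    hist.append({"name": name, "result": result})
    return result

def workflow(hist):
    a = act(hist, "step_a", lambda: "A")
    return act(hist, "step_b", lambda: a + "B")

path = os.path.join(tempfile.mkdtemp(), "history.jsonl")
r1 = workflow(DurableHistory(path))   # first execution: records two events
r2 = workflow(DurableHistory(path))   # "restarted worker": pure replay
print(r1, r2, r1 == r2)               # → AB AB True
```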
Why the toy engine is enough to reason about real ones: every Temporal / Cadence / DBOS quirk that catches engineers in production — "why did my workflow re-run an activity?", "why did this `if` evaluate differently on replay?", "why does my unit test fail only after a worker restart?" — reduces to the cursor-and-history mechanism above. If you can predict what the toy engine will do given a history and a code change, you can predict what the production engine will do.
What you actually trade — the production view
KapitalKite's options-trading workflow has 4 fixed steps and runs ~80,000 times a day. The DAG-as-yaml version (Step Functions' Amazon States Language) gave the trading-ops team a UI they loved: every run was a clickable graph, every failure was a node-level retry button, every state was queryable in CloudWatch. KapitalKite stayed on Step Functions for two years before hitting the wall: the new "smart-routing" requirement needed a for loop over up to 12 candidate exchanges, with early termination, and Step Functions' iteration constructs (Map state) could not express early termination cleanly. They migrated to Temporal, lost the visual-DAG UI for a quarter while building their own dashboard, gained `for venue in route_plan: ...` with `break`, and never looked back.
CricStream's "publish IPL highlight" workflow went the other way. It was a Temporal workflow with 3 fixed steps (transcode → CDN push → feed index update) that ran ~14M times a day during a series. The control flow was not dynamic; the steps were the same every time. The team's pain was not expressivity — it was that no one outside engineering could see what a workflow was doing without a Temporal CLI. They migrated to Argo Workflows (yaml DAGs on Kubernetes) and exposed the visual DAG in their internal CMS so editorial staff could see exactly which highlights were stuck at the CDN-push step. The migration cost them dynamic branching they were not using anyway, and bought them a debugging UI that cut MTTR from 12 minutes to 90 seconds for the editorial team.
PaySetu uses both. Their merchant-onboarding workflow (multi-step KYC, dynamic compensation, partner-API retries with bespoke logic) is on Temporal — workflow-as-code, because the control flow is genuinely dynamic. Their hourly settlement DAG (read CSV, compute aggregates, write to warehouse, send report — a fixed 11-step pipeline that runs at the top of every hour) is on Airflow — DAG-as-yaml, because the structure never changes and the data team wanted SQL-queryable run history. The split is conscious; PaySetu's platform team published an internal RFC titled "When to reach for which engine" that codifies it.
The honest comparison, point by point:
- Static DAG visualisation. Yaml wins outright. The engine renders the graph from the spec; you get a UI for free; non-engineers can read it.
- Dynamic control flow. Code wins outright. Loops with runtime-determined bounds, recursion, conditional fan-out, dynamic activity dispatch — all native; in yaml they require either wrapper-task gymnastics (Airflow's `expand()` for dynamic task mapping) or pushing the logic into a single mega-task (which loses per-step retry semantics).
- Local debugging. Code wins. You can run the workflow function under `pytest` with mocked activities, set breakpoints, and inspect variables. Yaml workflows can only be debugged by running them on the engine and reading logs.
- SQL-queryable run history. Yaml wins. The run state is rows in the metastore database (`task_instance`, `xcom`, `dag_run`); ad-hoc queries are trivial. Code-engine histories are typically opaque blobs with engine-specific query APIs.
- Versioning. Code wins, with caveats. Yaml workflows pin a DAG ID per version and start fresh runs; code workflows must use the engine's versioning primitives (Temporal's `workflow.GetVersion`, DBOS's migration API) to evolve in place. Both have failure modes.
- Scheduler latency. Yaml engines tick (Airflow's default scheduler interval is 5 seconds; many production deployments run 1 second). Code engines are typically push-based and respond to events immediately. For workflows with many short steps, this matters.
- Multi-language support. Yaml engines accept arbitrary task containers (any language goes inside Docker). Code engines require an SDK in your language (Temporal ships Python / Go / Java / TypeScript / .NET), while a yaml engine like Argo will happily run a Bash script in a container. Polyglot teams sometimes flip this trade-off.
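The SQL-queryability point can be made concrete with a toy metastore: a hypothetical schema loosely shaped like Airflow's `task_instance` table (the real table has many more columns), queried with plain SQL and no engine code present:

```python
import sqlite3

# Toy metastore with a hypothetical schema loosely shaped like Airflow's
# task_instance table (the real schema is much wider).
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE task_instance
              (dag_id TEXT, run_id TEXT, task_id TEXT, state TEXT)""")
db.executemany("INSERT INTO task_instance VALUES (?,?,?,?)", [
    ("settlement", "r1", "read_csv",  "success"),
    ("settlement", "r1", "aggregate", "failed"),
    ("settlement", "r2", "read_csv",  "success"),
    ("settlement", "r2", "aggregate", "success"),
])

# An ad-hoc operational question, answered without running any workflow code.
rows = db.execute("""SELECT run_id FROM task_instance
                     WHERE task_id = 'aggregate' AND state = 'failed'""").fetchall()
print(rows)   # → [('r1',)]
```

A code-engine history store cannot answer "which runs failed at the aggregate step?" this cheaply, because "the aggregate step" is not a row — it is a position in an event history that only your workflow function can interpret.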
Common confusions
- "Workflow-as-code workflows are 'just Python', so I can use any library" — They are Python under the constraint of determinism. Calling `requests.get()` directly inside a workflow function is wrong; it has to be wrapped in an activity. Calling `time.sleep()` is wrong; use `workflow.sleep()`. Workflow code is a restricted dialect — most libraries are fine inside activities, not inside the workflow function.
- "DAG-as-yaml is the same as configuration-as-code" — They are not. Configuration-as-code means writing the configuration in a Turing-complete language (Pulumi, CDK) that emits the static config. DAG-as-yaml is purely declarative; you cannot do `if customer_tier == 'gold': add_step(...)` at runtime, only at parse time.
- "Temporal is just a fancier Airflow" — They are different categories. Airflow is for batch data pipelines that run on a schedule; Temporal is for transactional workflows (often event-triggered, long-running, with signals and queries). Both call themselves "workflow engines" and the overlap on simple cases is real, but the operating model diverges sharply at scale.
- "Workflow-as-code workflows are non-deterministic because they call activities over the network" — The workflow is deterministic with respect to the recorded activity results. The activities themselves can be wildly non-deterministic (call APIs, write to databases); the engine just records what they returned and replays that.
- "DAG-as-yaml is more durable because the DAG is in the database" — Both are equally durable. The yaml engine stores task states in its metastore; the code engine stores activity results in its history store. The data is durable in both cases; only the shape differs.
- "You can convert any code workflow to a yaml DAG mechanically" — You can convert workflows whose control flow is static. Workflows with `for line_item in order.items()` (loop bound known only at runtime), `while not approved:` (unbounded loops), or recursion cannot be unrolled into a static graph without losing semantics.
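The unrolling argument can be shown directly. A hypothetical sketch: the "DAG" of a code workflow is just its execution trace, and with a runtime loop bound that trace has a different shape per input:

```python
# Hypothetical sketch: the execution trace of a code workflow *is* its DAG,
# and with a runtime loop bound the DAG's shape depends on the input.
def trace_of(order_lines):
    steps = ["approve"]
    for i, _ in enumerate(order_lines):   # bound known only at runtime
        steps.append(f"price_line_{i}")
    steps.append("invoice")
    return steps

print(trace_of(["a"]))            # → ['approve', 'price_line_0', 'invoice']
print(trace_of(["a", "b", "c"]))  # a different shape for a different input
# A static yaml DAG must commit to one shape at deploy time; unrolling this
# loop into a graph requires knowing len(order_lines) before any order arrives.
```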
Going deeper
Versioning workflows in flight — the hardest problem in either model
A workflow that has been running for 6 days when you deploy a new version of the workflow code is the production nightmare both camps share. In Temporal, you call `workflow.GetVersion("step-3-changed", DEFAULT_VERSION, 1)` and gate the new behaviour behind the version gate; existing runs return `DEFAULT_VERSION`, new runs return 1, and the engine records the version at first call so replay is deterministic. In Airflow, you typically pin a `dag_id` per version (`payment_pipeline_v2`) and let the old version drain — old runs continue against v1's definition until they complete. Both approaches are tedious; both are necessary; and both are usually botched the first time a team hits them. (See /wiki/temporal-and-durable-execution for the deep dive on Temporal's version gates.)
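The version gate can be approximated with the same record-and-replay protocol as the rest of the history. The sketch below is a hypothetical simplification of Temporal's `GetVersion`, not the SDK's implementation (the real one also handles histories recorded before the gate existed, and markers interleaved with other events):

```python
# Hypothetical sketch of a GetVersion-style gate on top of record-and-replay.
# The version is just another recorded event: old histories replay the old
# pick; new executions record, and are thereafter pinned to, the new one.
DEFAULT_VERSION = -1

def get_version(history, cursor, change_id, max_version):
    if cursor[0] < len(history):                 # replay: reuse the recorded pick
        ev = history[cursor[0]]; cursor[0] += 1
        assert ev["change_id"] == change_id
        return ev["version"]
    history.append({"change_id": change_id, "version": max_version})
    cursor[0] += 1
    return max_version

# A run whose history pinned the old behaviour keeps it on replay.
old_history = [{"change_id": "step-3-changed", "version": DEFAULT_VERSION}]
assert get_version(old_history, [0], "step-3-changed", 1) == DEFAULT_VERSION

# A run started after the deploy records and takes the new path.
new_history = []
assert get_version(new_history, [0], "step-3-changed", 1) == 1
```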
Hybrid engines — Prefect, Dagster, Mage, Flyte
A third generation of engines (Prefect 2/3, Dagster, Mage, Flyte) tries to combine the two models: write the workflow in code, but the engine extracts a static DAG at parse time for visualisation, with explicit "dynamic task" primitives for the parts that genuinely need runtime branching. The result is a hybrid: you get a UI for the static structure plus code-level expressivity for the parts that need it. The trade-off is that the "static DAG" the engine extracts is sometimes a lie — it shows the possible shapes, not the actual shape this run will take — which causes its own debugging confusion. Dagster's "asset graph" is a clean version of this idea (the graph is the data assets the workflow produces, not the tasks that produce them).
The metastore database is a load-bearing single point of failure
Both camps lean heavily on a relational database for state. Airflow's metastore (Postgres/MySQL) holds DAG runs, task instances, xcoms, connections — and Airflow becomes unusable when it is unavailable. Temporal stores history in Cassandra / MySQL / Postgres (configurable) — the engine cannot make progress without it. The choice of metastore quietly determines the engine's scaling envelope: Airflow Postgres clusters routinely become the bottleneck at ~100k task instances/day, while Temporal's sharded history store is what lets it sustain millions of workflow state transitions per second at the companies that publish numbers. (See /wiki/distributed-state-stores for how the metastore choice cascades through the rest of the system.)
What dbt is and isn't — DAG-as-yaml's purest form
dbt is the most extreme DAG-as-yaml engine: every node is a SQL SELECT, every edge is a ref('upstream') reference, the entire DAG is generated at parse time from your project. dbt explicitly cannot do anything dynamic — it is a pure declarative compile-and-run model. The reason it works at scale (every analytics team in the country uses it) is that for the analytical-pipeline use case, you genuinely don't want dynamic control flow. Static is a feature, not a limitation, when the workload is "transform table X into table Y on a schedule". (See /wiki/dbt-as-the-purest-dag-engine.)
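The parse-time model can be sketched in miniature: extract `ref()` edges from the SQL strings with a regex and topologically sort the result. This is a hypothetical toy, not dbt's parser (dbt uses a Jinja renderer, and this sketch skips cycle detection), and the model names are invented:

```python
import re

# Hypothetical miniature of dbt's parse-time graph extraction: every model is
# a SELECT, every edge is a ref('...') call, and the whole DAG exists before
# any SQL runs. Model names are invented; no cycle detection for brevity.
models = {
    "stg_orders":   "SELECT * FROM raw.orders",
    "stg_payments": "SELECT * FROM raw.payments",
    "daily_rollup": ("SELECT * FROM {{ ref('stg_orders') }} "
                     "JOIN {{ ref('stg_payments') }} USING (order_id)"),
}

deps = {name: re.findall(r"ref\('([^']+)'\)", sql) for name, sql in models.items()}

def topo_order(deps):
    done, order = set(), []
    def visit(n):
        for d in deps[n]:
            if d not in done: visit(d)
        if n not in done:
            done.add(n); order.append(n)
    for n in deps: visit(n)
    return order

print(deps["daily_rollup"])   # → ['stg_orders', 'stg_payments']
print(topo_order(deps))       # staging models first, rollup last
```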
Where this leads next
The split between yaml-DAG and workflow-as-code is not a fashion cycle; it reflects two genuinely different shapes of work. Batch analytics pipelines are static DAGs by nature, and the yaml form is honest about that. Transactional and operational workflows almost always have dynamic shape, and the code form is honest about that. Picking the wrong shape produces engineering pain in proportion to the mismatch.
- /wiki/temporal-and-durable-execution — the durable-execution mechanism that makes workflow-as-code possible.
- /wiki/airflow-and-the-yaml-dag-tradition — the canonical yaml-DAG engine, the production patterns, and the failure modes.
- /wiki/the-saga-pattern-revisited-in-workflows — sagas as a code-engine native primitive.
- /wiki/retries-as-a-first-class-concept — retries handled identically by both camps, but exposed differently.
- /wiki/orchestration-vs-choreography — the third axis: who drives the workflow, the engine or the participants.
The lesson: the engine you pick is a bet on which kind of mistake you would rather debug at 2am — a yaml typo that misroutes a branch, or a non-determinism error that causes a replay to diverge. Both are real; neither is going away; pick the one you find more debuggable.
References
- Temporal documentation — Workflow as code — the canonical articulation of the model.
- Apache Airflow documentation — DAGs — the canonical yaml-DAG engine; ironically Airflow DAGs are written in Python, but as static structure, not control flow.
- Argo Workflows — pure yaml-DAG, Kubernetes-native.
- AWS Step Functions — Amazon States Language specification — Amazon's declarative-DAG dialect (json rather than yaml, but the same model).
- DBOS — Durable execution in Postgres — workflow-as-code without a separate engine; the database is the engine.
- Sergey Bykov, "Why workflows need to be code" (Temporal blog, 2022) — the founder's case for the code model.
- Maxime Beauchemin, "Functional data engineering" (2018) — the case for static DAGs in batch analytics, from Airflow's creator.
- /wiki/the-saga-pattern-revisited-in-workflows — the previous chapter; sagas are easier to express in code than in yaml.