Airflow vs Dagster vs Prefect: the real design differences

Three months into a new job at a Bengaluru fintech, Karan opened the runbook for a failing daily pipeline. The Airflow DAG was green for every task — extract_orders, transform_orders, load_orders had all returned exit code 0 — but the downstream daily_revenue_summary table was empty. The bug: transform_orders had silently produced 0 rows because an upstream schema change broke a JOIN, and Airflow has no concept of "the table this task produces". It only knows about tasks. A Dagster job in the same shape would have been red, because Dagster orchestrates assets, not tasks, and an asset that materialises empty when a check expects rows fails the run. The choice of orchestrator is not a feature comparison — it is a choice about what the orchestrator believes a pipeline is.

Airflow models a pipeline as a directed graph of tasks; Dagster models it as a directed graph of assets (the tables the tasks produce); Prefect models it as a Python function with decorators. The three frameworks make the same trade-offs at different layers — and the decision of which to use is really a decision about what your team wants the orchestrator to know.

Three different bets about what a pipeline is

A scheduler has to answer one question every minute: "what should run next, and on which inputs?" Each of the three frameworks answers it from a different starting point, and the entire downstream design — task definition, retry semantics, lineage, deployment, on-call experience — flows from that initial bet.

[Figure: Three frameworks, three abstractions for "pipeline" — a three-column diagram modelling the same extract → transform → load pipeline three ways. Airflow: a task graph (extract_orders → transform_orders → load_orders; unit: a callable, edge: a temporal dependency). Dagster: an asset graph (orders_raw → orders_fact → revenue_daily, drawn as tables; unit: a materialisation, edge: a data dependency). Prefect: a Python-first @flow function calling @task subroutines, with ordinary branching (if late: ...) and fan-out (for shop in shops: ...); unit: a function call.]
The headline difference: Airflow models tasks (verbs), Dagster models assets (nouns), Prefect models a Python function (control flow). Every other design choice flows downstream from this.

Airflow's bet is that pipelines are sequences of operations against external systems. The DAG is the user-facing object; tasks are the unit of retry, scheduling, and observability. A task says "I will SSH into a box and run psql" or "I will execute a SQL statement on Snowflake" — the orchestrator does not know what data flowed through, only that the operation finished. This is the right abstraction when the data is genuinely opaque to the orchestrator (a SparkSubmit task that runs a 200-node job, an SFTP transfer, an external API call) and the orchestrator's job is to compose those operations safely.

Dagster's bet is that pipelines exist to produce data assets — tables, files, ML models — and that the orchestrator should know about those assets directly. An asset is declared with @asset and the framework infers the dependency graph from how assets reference each other. The orchestrator knows that orders_fact is downstream of orders_raw, and that re-running orders_fact requires the latest version of orders_raw. This makes "rebuild this table from yesterday's data" a one-line CLI command — dagster asset materialize --select orders_fact --partition 2026-04-24 — instead of a hand-orchestrated re-run of three tasks.

Prefect's bet is that the orchestrator should disappear into the user's Python code. A pipeline is a @flow-decorated function that calls @task-decorated subroutines. The control flow is whatever Python you write — for shop in shops: process(shop), if today.weekday() == 6: skip(), try: load() except: rollback(). The DAG is dynamic, computed at runtime from how the function executes. This makes Python-native conditional logic and dynamic fan-out trivial — at the cost of static introspectability, which the other two frameworks lean on heavily.

Each bet is internally consistent. Each is wrong for some workloads and right for others. The hard part is figuring out which one matches the shape of the pipelines your team actually writes.

How each one runs the same job

The cleanest way to see the differences is to write the same pipeline in all three frameworks. The pipeline: pull yesterday's UPI transaction file from S3, transform it, load it into a Postgres reporting table, and run a freshness check. This is the bread-and-butter of an Indian fintech's daily reporting pipeline — Razorpay's settlement summary, PhonePe's reconciliation, Cred's reward attribution all look like this.

# === AIRFLOW (2.7+) ===
# A DAG of tasks. Dependencies declared with bitshift operators or set_upstream.
from airflow.decorators import dag, task
from datetime import datetime, timedelta
import io
import pandas as pd, boto3, psycopg2

@dag(
    dag_id="upi_daily_report",
    start_date=datetime(2026, 4, 1),
    schedule="0 2 * * *",                  # 02:00 IST daily
    catchup=False,
    default_args={"retries": 3, "retry_delay": timedelta(minutes=5)},
    tags=["razorpay", "build-4"],
)
def upi_daily_report():
    @task
    def extract(data_interval_end=None):   # injected from the run context
        s3 = boto3.client("s3", region_name="ap-south-1")
        key = f"upi/{data_interval_end:%Y-%m-%d}.csv"
        obj = s3.get_object(Bucket="razorpay-raw", Key=key)
        return obj["Body"].read().decode("utf-8")    # task return value -> XCom

    @task
    def transform(raw_csv: str) -> str:
        df = pd.read_csv(io.StringIO(raw_csv))
        df = df[df["status"] == "SUCCESS"]
        df["amount_inr"] = df["amount_paise"] / 100
        return df.to_csv(index=False)

    @task
    def load(transformed_csv: str, data_interval_end=None):
        with psycopg2.connect("postgres://reports") as cx:
            cur = cx.cursor()
            cur.execute(
                "DELETE FROM upi_daily WHERE date = %s",
                (data_interval_end.date(),),
            )
            cur.copy_expert(
                "COPY upi_daily FROM STDIN WITH CSV HEADER",
                io.StringIO(transformed_csv),
            )
            cx.commit()

    raw = extract()
    transformed = transform(raw)
    load(transformed)

upi_daily_report()
# === DAGSTER (1.6+) ===
# A graph of assets. Dagster infers dependencies from function arguments.
from dagster import (asset, asset_check, AssetCheckResult, Definitions,
                     DailyPartitionsDefinition, RetryPolicy)
import io
import pandas as pd, boto3, psycopg2

daily = DailyPartitionsDefinition(start_date="2026-04-01", timezone="Asia/Kolkata")

@asset(partitions_def=daily, group_name="razorpay",
       retry_policy=RetryPolicy(max_retries=3))
def upi_raw(context) -> pd.DataFrame:
    key = f"upi/{context.partition_key}.csv"
    s3 = boto3.client("s3", region_name="ap-south-1")
    obj = s3.get_object(Bucket="razorpay-raw", Key=key)
    return pd.read_csv(obj["Body"])        # the S3 StreamingBody is file-like

@asset(partitions_def=daily, group_name="razorpay")
def upi_daily(context, upi_raw: pd.DataFrame) -> pd.DataFrame:
    df = upi_raw[upi_raw["status"] == "SUCCESS"].copy()
    df["amount_inr"] = df["amount_paise"] / 100
    with psycopg2.connect("postgres://reports") as cx:
        cur = cx.cursor()
        cur.execute("DELETE FROM upi_daily WHERE date = %s",
                    (context.partition_key,))
        cur.copy_expert("COPY upi_daily FROM STDIN WITH CSV HEADER",
                        io.StringIO(df.to_csv(index=False)))
        cx.commit()
    return df
    return df

@asset_check(asset=upi_daily)
def upi_daily_has_rows(upi_daily: pd.DataFrame):
    return AssetCheckResult(passed=len(upi_daily) > 0,
                            metadata={"rows": len(upi_daily)})

defs = Definitions(assets=[upi_raw, upi_daily], asset_checks=[upi_daily_has_rows])
# === PREFECT (2.x) ===
# A Python function. Control flow is whatever you write.
from prefect import flow, task
from prefect.tasks import task_input_hash
from datetime import datetime, timedelta
import io
import pandas as pd, boto3, psycopg2

@task(retries=3, retry_delay_seconds=300, cache_key_fn=task_input_hash)
def extract(date_str: str) -> pd.DataFrame:
    s3 = boto3.client("s3", region_name="ap-south-1")
    obj = s3.get_object(Bucket="razorpay-raw", Key=f"upi/{date_str}.csv")
    return pd.read_csv(obj["Body"])        # the S3 StreamingBody is file-like

@task
def transform(df: pd.DataFrame) -> pd.DataFrame:
    df = df[df["status"] == "SUCCESS"].copy()
    df["amount_inr"] = df["amount_paise"] / 100
    return df

@task
def load(df: pd.DataFrame, date_str: str) -> int:
    with psycopg2.connect("postgres://reports") as cx:
        cur = cx.cursor()
        cur.execute("DELETE FROM upi_daily WHERE date = %s", (date_str,))
        cur.copy_expert("COPY upi_daily FROM STDIN WITH CSV HEADER",
                        io.StringIO(df.to_csv(index=False)))
        cx.commit()
    return len(df)

@flow(name="upi_daily_report")
def upi_daily_report(date_str: str | None = None):
    date_str = date_str or (datetime.now() - timedelta(days=1)).strftime("%Y-%m-%d")
    raw = extract(date_str)
    if len(raw) == 0:                           # ordinary Python branching
        return {"status": "no_data", "rows": 0}
    transformed = transform(raw)
    rows = load(transformed, date_str)
    return {"status": "ok", "rows": rows}
# Same logical pipeline, three operational shapes
$ airflow dags trigger upi_daily_report
[2026-04-25 02:00:01] DAG run upi_daily_report__2026-04-24 queued
[2026-04-25 02:03:14] task extract  succeeded
[2026-04-25 02:04:02] task transform succeeded
[2026-04-25 02:04:51] task load     succeeded

$ dagster asset materialize --select upi_daily --partition 2026-04-24
[2026-04-25 02:00:01] Materializing upi_raw[2026-04-24]
[2026-04-25 02:03:14] upi_raw materialised (rows=2,431,072)
[2026-04-25 02:04:51] upi_daily materialised (rows=2,398,114)
[2026-04-25 02:04:52] check upi_daily_has_rows passed (rows=2,398,114)

$ prefect deployment run upi-daily-report/prod
[2026-04-25 02:00:01] flow run started, parameters={date_str: '2026-04-24'}
[2026-04-25 02:04:51] flow run completed, return={status: ok, rows: 2398114}

The headline differences in three lines. Airflow's data_interval_end is the framework's name for "the close of the window this run is processing", passed as a context kwarg into every task; the run's identity is its data interval, not its execution time. Why anchoring on data_interval_end matters: it is the same anchor the SLA mechanism in chapter 25 uses. A task that processes "yesterday's data" knows what "yesterday" means without consulting datetime.now(), which makes backfills correct without code changes — the same task body produces the right output for any historical date because the window is a parameter, not implicit wall-clock state. Dagster's partition_key plays the same role but exposes it as a first-class concept: every materialisation is tagged with its partition, and dagster asset materialize --partition 2026-04-24 is the API for "rebuild this slice". Prefect's date_str is just a flow argument — Prefect itself does not know about data partitions; the user passes the date as a string.
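The window-as-parameter property can be shown in a few lines of plain Python — a hypothetical sketch, not any framework's API. Because the window boundary is an argument rather than datetime.now(), a live run and a backfill of the same date are byte-for-byte identical:

```python
# Hypothetical sketch: why "the window is a parameter" makes backfills
# deterministic. The row shape and function name are illustrative.
from datetime import date, timedelta

def transform_for_window(rows, window_end: date):
    """A pure function of its inputs: no datetime.now() anywhere."""
    window_start = window_end - timedelta(days=1)
    return [r for r in rows if window_start <= r["ts"] < window_end]

rows = [
    {"ts": date(2026, 4, 23), "amount": 100},
    {"ts": date(2026, 4, 24), "amount": 250},
]

# A scheduled run and a backfill of the same window produce identical
# output, because "yesterday" is explicit, not ambient wall-clock time.
live = transform_for_window(rows, window_end=date(2026, 4, 25))
backfill = transform_for_window(rows, window_end=date(2026, 4, 25))
assert live == backfill == [{"ts": date(2026, 4, 24), "amount": 250}]
```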

The deeper mechanism: in Airflow, dependencies are declared by chaining function calls (raw = extract(); transformed = transform(raw) — Airflow detects that transform reads raw and wires the dependency). In Dagster, dependencies are inferred from function signatures (def upi_daily(upi_raw: pd.DataFrame) declares "this asset reads upi_raw" — the framework binds the parameter name to the asset). In Prefect, dependencies are not inferred at all — they are whatever the Python code does. Why signature-based dependency inference is a big deal: in Dagster, the dependency graph is statically discoverable. The CLI can answer "what does upi_daily depend on?" without running anything. In Prefect, that answer requires actually executing the flow, because the flow's structure may depend on its inputs. This is the cost of dynamic graphs — but it is also why Prefect handles for shop in shops: process(shop) over a thousand shops natively while Airflow needs the dynamic-task-mapping API and Dagster needs partition multiplexing.
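Signature-based inference is simple enough to sketch in plain Python. This is a toy registry mimicking the mechanism (not Dagster's actual implementation): parameter names are looked up as upstream assets, so the graph is readable without executing anything.

```python
# Hypothetical sketch of signature-based dependency inference, the mechanism
# behind Dagster's @asset. The registry and decorator are illustrative.
import inspect

ASSETS = {}  # asset name -> function

def asset(fn):
    ASSETS[fn.__name__] = fn
    return fn

def upstream_of(name):
    """Statically discoverable: answers "what does X depend on?" by
    reading the function signature, without running any asset."""
    params = inspect.signature(ASSETS[name]).parameters
    return [p for p in params if p in ASSETS]

@asset
def upi_raw():
    return [{"status": "SUCCESS", "amount_paise": 15000}]

@asset
def upi_daily(upi_raw):          # parameter name == upstream asset name
    return [r for r in upi_raw if r["status"] == "SUCCESS"]

assert upstream_of("upi_daily") == ["upi_raw"]
assert upstream_of("upi_raw") == []
```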

The retry semantics also diverge. Airflow retries a task at the task level — the entire transform task re-runs, producing fresh XCom output, and load reads the new value. Dagster's retry_policy retries the asset materialisation — the entire compute re-runs and the new result replaces the old. Prefect's retries=3 retries the task call, with optional cache_key_fn=task_input_hash to short-circuit when the same input was seen before (so a retry of a deterministic transform on the same input returns the cached output without re-computing).
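The input-hash caching idea behind Prefect's cache_key_fn=task_input_hash can be sketched without Prefect — hash the inputs, store the result, and short-circuit on a repeat. The cache dict and wrapper here are illustrative, not Prefect's implementation:

```python
# Hypothetical sketch of input-hash caching: a retry with the same inputs
# returns the stored result without re-running the task body.
import hashlib, json

CACHE = {}
CALLS = {"transform": 0}          # instrumentation, to prove the short-circuit

def cached(fn):
    def wrapper(*args):
        key = hashlib.sha256(
            json.dumps([fn.__name__, args], sort_keys=True, default=str).encode()
        ).hexdigest()
        if key not in CACHE:      # miss: compute and store
            CACHE[key] = fn(*args)
        return CACHE[key]         # hit: skip recomputation
    return wrapper

@cached
def transform(amount_paise):
    CALLS["transform"] += 1
    return amount_paise / 100

assert transform(15000) == 150.0
assert transform(15000) == 150.0  # a "retry" with the same input: cache hit
assert CALLS["transform"] == 1    # the body ran exactly once
```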

The execution model: scheduler, executor, worker

Past the DAG-vs-asset-vs-function differences, all three frameworks share a roughly common execution architecture — a scheduler that decides what to run, an executor that dispatches it, and workers that actually run the code. The differences are in where the boundaries sit and which component owns which responsibility.

[Figure: Scheduler / executor / worker — three slightly different cuts. A three-row diagram of execution architectures. Airflow: the scheduler parses DAG files and writes to the metadata DB (Postgres); the executor (Celery or Kubernetes) pulls from the queue; workers run tasks. Dagster: the daemon evaluates sensors and schedules against a structured event log; a run worker process is spawned per run, with a step worker per asset. Prefect: the API server (or Cloud) fronts work pools, one queue per environment; a worker inside your VPC polls the API and executes each flow run in a subprocess.]
Airflow centralises around the metadata DB; Dagster centralises around the event log; Prefect centralises around the API server. In Prefect's case the worker→server communication is pull-based, not push — workers in your VPC poll a managed control plane.

Airflow's metadata DB is the source of truth. Every task instance, every DAG run, every variable, every connection lives in Postgres. The scheduler reads DAG files from a filesystem (or a Git sync sidecar in cloud deployments) and writes scheduled task instances into the DB. Workers read task instances from the DB (via Celery or directly), execute them, and write status back. The DB is also the bottleneck: at one Bengaluru e-commerce shop running 4,000 DAGs, scheduler latency went from 2 seconds to 4 minutes when the metadata DB hit 200 GB and task_instance queries started doing full scans. Operating Airflow at scale is mostly operating its Postgres.

Dagster's event log is the source of truth. Every materialisation, every check, every step start/end is an event in an append-only log. The asset graph is rebuilt from the event log on demand — "what is the latest version of orders_fact?" is answered by scanning events for the most recent successful materialisation. The event log is event-sourced, which makes lineage queries cheap (every read of every asset is in the log) but also makes the log itself the operational hot spot. Dagster Cloud uses Postgres + a managed event store; self-hosted Dagster typically runs on Postgres for both.
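The event-sourced query pattern is worth seeing concretely. A minimal sketch, with illustrative event shapes (not Dagster's actual schema): current state is rebuilt by scanning an append-only log for the most recent successful materialisation.

```python
# Hypothetical sketch of answering "what is the latest version of this
# asset?" from an append-only event log. Event dicts are illustrative.
EVENT_LOG = [
    {"asset": "orders_fact", "type": "MATERIALIZED", "version": 1, "ok": True},
    {"asset": "orders_raw",  "type": "MATERIALIZED", "version": 7, "ok": True},
    {"asset": "orders_fact", "type": "MATERIALIZED", "version": 2, "ok": False},
    {"asset": "orders_fact", "type": "MATERIALIZED", "version": 3, "ok": True},
]

def latest_materialization(asset_name):
    """Scan newest-first; the last successful materialisation wins,
    failed attempts are skipped."""
    for event in reversed(EVENT_LOG):
        if event["asset"] == asset_name and event["ok"]:
            return event["version"]
    return None          # never successfully materialised

assert latest_materialization("orders_fact") == 3
assert latest_materialization("orders_raw") == 7
assert latest_materialization("revenue_daily") is None
```

Lineage queries fall out of the same structure — "every read of every asset is in the log" means a join over events, at the cost of the log being the operational hot spot.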

Prefect's API server is the source of truth. Workers in your VPC poll the API for work, execute flows, and stream state back. The API stores flow runs, task runs, and state transitions. The architectural distinction is that the control plane (the API) and the data plane (your workers) are explicitly separate — your code runs on your infra and only talks to Prefect over HTTPS, which makes the security and compliance story easier for regulated industries. Razorpay-tier fintechs running on Prefect cite this hybrid model as the reason they didn't pick Airflow's tighter coupling between scheduler and worker.

The trade-offs that fall out of the architectural choice are surprisingly large. Airflow's "the DB has everything" makes operational tooling trivial — you can write any custom dashboard or audit query as a SQL query against airflow.dag_run. Dagster's "the event log has everything" makes lineage and asset-history queries trivial but makes ad-hoc operational queries harder. Prefect's "the API has everything" makes integration trivial — every operation is an HTTP call, easy to script, easy to wrap in custom UIs — but introduces a network dependency between every worker and the control plane.

How each one handles the things that bite you at 3 a.m.

The honest comparison is not in the happy-path tutorial; it is in the failure modes. Here is what happens when the same things go wrong in each framework.

A task crashes mid-run. Airflow: the task instance is marked up_for_retry, the scheduler re-queues it after retry_delay, the next worker picks it up. Idempotency is your problem; the framework retries blindly. Dagster: the materialisation is marked failed, the run continues for unaffected assets, the user can re-materialise just the failed asset with dagster asset materialize --select failed_asset --partition <key>. Asset-level retry semantics. Prefect: the task is retried within the same flow run process, in-memory cache from earlier tasks is preserved, the flow continues. Process-level retry semantics — the entire flow does not need to restart.
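"Idempotency is your problem" deserves a concrete shape. A minimal sketch of the delete-then-insert pattern that makes blind retries safe, using an in-memory SQLite table as a stand-in for the Postgres reporting table:

```python
# Hypothetical sketch: an idempotent load keyed by date. A crashed-then-
# retried run converges to the same final state instead of doubling rows.
import sqlite3

cx = sqlite3.connect(":memory:")
cx.execute("CREATE TABLE upi_daily (date TEXT, amount_inr REAL)")

def load(rows, date_str):
    # Delete the slice first, so a retry never duplicates it.
    cx.execute("DELETE FROM upi_daily WHERE date = ?", (date_str,))
    cx.executemany("INSERT INTO upi_daily VALUES (?, ?)",
                   [(date_str, r) for r in rows])
    cx.commit()

load([150.0, 99.5], "2026-04-24")
load([150.0, 99.5], "2026-04-24")   # a blind framework retry: same state
count = cx.execute("SELECT COUNT(*) FROM upi_daily").fetchone()[0]
assert count == 2                   # not 4
```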

Schema drift in the source. Airflow: the task succeeds (it ran SQL that ran cleanly), the downstream JOIN produces wrong data, the on-call discovers it tomorrow morning. Dagster: an @asset_check on the asset's columns can fail the materialisation; the data is not promoted. Prefect: the task succeeds; the user has to wire validation manually with pydantic or great_expectations and raise to fail the flow.
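Whichever framework you are on, the check itself is the same few lines. A hypothetical sketch of a minimal contract check — the function name and contract shape are illustrative, not any framework's API:

```python
# Hypothetical sketch: validate shape before promoting data. In Dagster this
# lives in an @asset_check; in Airflow or Prefect you raise to fail the run.
def check_contract(rows, required_columns):
    if not rows:
        raise ValueError("contract violated: 0 rows")
    missing = required_columns - set(rows[0])
    if missing:
        raise ValueError(f"contract violated: missing columns {missing}")
    return True

good = [{"status": "SUCCESS", "amount_inr": 150.0}]
assert check_contract(good, {"status", "amount_inr"})

drifted = [{"status": "SUCCESS"}]        # upstream dropped a column
try:
    check_contract(drifted, {"status", "amount_inr"})
    raised = False
except ValueError:
    raised = True
assert raised                            # the bad data never promotes
```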

Backfill of a historical date. Airflow: airflow dags backfill -s 2026-04-01 -e 2026-04-15 upi_daily_report — the scheduler queues 15 runs in execution-date order. Dagster: dagster asset backfill --select upi_daily --partitions 2026-04-01..2026-04-15 — the framework knows the partition definition and runs them in dependency order, in parallel where possible. Prefect: loop in Python — for date in pd.date_range('2026-04-01', '2026-04-15'): upi_daily_report(date.strftime('%Y-%m-%d')) — there is no native backfill primitive because there is no native concept of "this flow's runs are partitioned by date".
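"Runs them in dependency order" is a topological sort over the asset graph. A sketch using the standard library's graphlib, with an illustrative graph shape — a backfill plan is then just dates crossed with the topological order:

```python
# Hypothetical sketch of a dependency-ordered backfill plan. The asset
# graph here is illustrative; a real one comes from the framework.
from graphlib import TopologicalSorter

# node -> set of upstream assets it depends on
deps = {"upi_daily": {"upi_raw"}, "revenue_summary": {"upi_daily"}}
order = list(TopologicalSorter(deps).static_order())
assert order.index("upi_raw") < order.index("upi_daily") < order.index("revenue_summary")

# The backfill is a nested loop: dates outer, topological order inner.
dates = ["2026-04-01", "2026-04-02"]
plan = [(d, a) for d in dates for a in order]
assert plan[0] == ("2026-04-01", "upi_raw")
assert plan[-1] == ("2026-04-02", "revenue_summary")
```

Partitions of independent assets on the same date can run in parallel; the topological order only constrains same-date upstream/downstream pairs.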

Late-arriving upstream data. Airflow: sensors poll for the upstream file; the task is up_for_reschedule until the sensor sees the file. The DAG's data_interval_end does not slip — if the file arrives 4 hours late, the scheduled run waited 4 hours and finished 4 hours later. Dagster: FreshnessPolicy on the upstream asset triggers a re-materialisation when the upstream becomes stale; downstream assets pick up the new version. Prefect: the user writes a sensor pattern manually — typically a polling loop inside the flow.
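The hand-rolled sensor pattern Prefect users write is a poll-with-deadline loop. A minimal sketch — the predicate, timeout, and intervals are illustrative (a real sensor would check S3 and sleep much longer between polls):

```python
# Hypothetical sketch of a polling sensor: wait for a condition to become
# true or give up at a deadline and fail the run.
import time

def wait_for(predicate, timeout_s=2.0, poll_s=0.05):
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(poll_s)        # back off between polls
    return False                  # timed out: fail the run, page the on-call

arrived_at = time.monotonic() + 0.2      # the "file" lands 200 ms from now
assert wait_for(lambda: time.monotonic() >= arrived_at)
assert not wait_for(lambda: False, timeout_s=0.2)
```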

A dependency between two pipelines. Airflow: ExternalTaskSensor reads the metadata DB to check whether the upstream DAG ran successfully on the matching execution date — fragile, prone to deadlock, the canonical "Airflow gotcha" that has burned every team that uses it. Dagster: the asset graph is the dependency graph; cross-pipeline dependencies are just asset references, no separate sensor needed. Prefect: flow A subscribes to flow B's completion event via Prefect's event-driven scheduling — works, but requires both flows to be in the same Prefect deployment.

The pattern: Airflow gives you primitives and expects you to compose them; Dagster gives you opinionated abstractions that handle common cases automatically; Prefect gives you Python and expects you to write whatever logic you need. Each is right for some teams and wrong for others. Why this matters more than feature checklists: a feature checklist always favours the framework with the most features, but the framework with the most features is also the framework with the most ways to use it wrong. Teams that pick Airflow for "it has every feature" often end up with a sprawl of incompatible operator versions and three different ways to express dependencies, while teams that pick Dagster for "it has fewer features but they all compose" end up with a smaller, cleaner codebase that handles 80% of cases automatically. The right framework is the one whose opinions match what your team actually wants to think about.

What goes wrong (real-world, the kind that makes the on-call cry)

The "everything is XCom" Airflow trap. New Airflow users return data from tasks (return df), which serialises through the metadata DB as XCom. This works for small data; it explodes when someone returns a 500MB DataFrame and the metadata DB starts to OOM. The Razorpay data platform team in 2023 had a 2-hour outage when a single DAG's XComs hit 12GB and Postgres ran out of WAL. The fix is to never return data through XCom — return file paths or table names, and have the next task read from there. Dagster's I/O managers solve the same problem at the framework level; Prefect users hit it less because flows pass values in-process by default.
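The fix looks like this in framework-free Python — a hypothetical sketch where a temp file stands in for S3 or a staging table. Only the path (a few bytes) crosses the orchestrator; the payload never touches the metadata DB:

```python
# Hypothetical sketch of the XCom fix: tasks exchange a path, not the data.
import csv, os, tempfile

def transform(raw_rows):
    path = os.path.join(tempfile.mkdtemp(), "upi_transformed.csv")
    with open(path, "w", newline="") as f:
        w = csv.DictWriter(f, fieldnames=["status", "amount_inr"])
        w.writeheader()
        w.writerows(raw_rows)
    return path                      # a short string through XCom, not 500MB

def load(path):
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

path = transform([{"status": "SUCCESS", "amount_inr": "150.0"}])
assert load(path) == [{"status": "SUCCESS", "amount_inr": "150.0"}]
```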

The Dagster "every asset is materialised every run" misuse. A team migrating from Airflow declares 200 assets, schedules them all daily, and is surprised when the daily run takes 8 hours instead of 90 minutes. Dagster's strength — "the framework will rebuild whatever you ask for" — is its weakness when you ask for everything. The fix is partitioned assets and selective re-materialisation: auto_materialize_policy reactively rebuilds assets only when their upstreams change.

The Prefect dynamic-flow drift. A flow whose structure depends on its inputs (for shop in get_shops_from_db(): process(shop)) is impossible to introspect ahead of time. The Prefect UI shows you what ran, but cannot show you what would run. This is fine until a critical pipeline silently changes its structure (someone adds a row to the shops table) and the on-call cannot tell at a glance whether yesterday's run is comparable to today's. The fix is to materialise the structure into the flow definition (e.g., generate a separate sub-deployment per shop), trading dynamism for introspectability.

Mixing UTC and IST in schedules. All three frameworks default to UTC for schedule cron expressions. A team writing schedule="0 2 * * *" thinking they are scheduling 02:00 IST is actually scheduling 07:30 IST (UTC+5:30). The first time the DST-affected upstream system shifts one hour, the pipeline breaks. The fix in Airflow is timezone="Asia/Kolkata" on the DAG; in Dagster, a DailyPartitionsDefinition(timezone="Asia/Kolkata"); in Prefect, schedule=CronSchedule(cron="0 2 * * *", timezone="Asia/Kolkata"). Always set the timezone explicitly.
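The arithmetic is easy to verify with the standard library alone — a cron firing at 02:00 UTC lands at 07:30 in Asia/Kolkata:

```python
# Verifying the UTC-vs-IST trap: "0 2 * * *" interpreted as UTC fires at
# 07:30 IST, not the 02:00 IST the author intended.
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

fire_utc = datetime(2026, 4, 25, 2, 0, tzinfo=timezone.utc)
fire_ist = fire_utc.astimezone(ZoneInfo("Asia/Kolkata"))
assert (fire_ist.hour, fire_ist.minute) == (7, 30)
```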

Coupling task code to framework imports. Airflow tasks that import from airflow.models import Variable cannot be unit-tested without spinning up Airflow. Dagster assets that take context as a positional argument cannot be unit-tested without a Dagster context. Prefect tasks decorated with @task carry framework state. The fix is to keep the business logic in plain Python functions (def transform(raw_df) -> pd.DataFrame) and use the framework decorator as a thin adapter (@task def transform_task(raw_df): return transform(raw_df)). The transform function gets unit-tested; the adapter gets integration-tested. Teams that ignore this rule end up with pipelines that work in production but cannot be tested in CI.
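The thin-adapter rule in miniature — a hypothetical sketch where `fake_task` stands in for @task or @asset. The plain function needs no framework to import, so it runs in any CI job; only the one-line adapter needs an integration environment:

```python
# Hypothetical sketch of the thin-adapter pattern: business logic in plain
# Python, the framework decorator as a one-line wrapper.
def transform(rows):                  # plain Python: unit-testable anywhere
    return [
        {**r, "amount_inr": r["amount_paise"] / 100}
        for r in rows if r["status"] == "SUCCESS"
    ]

def fake_task(fn):                    # stand-in for a framework decorator
    fn.is_task = True
    return fn

@fake_task
def transform_task(rows):             # thin adapter: delegates immediately
    return transform(rows)

out = transform([{"status": "SUCCESS", "amount_paise": 15000},
                 {"status": "FAILED",  "amount_paise": 100}])
assert out == [{"status": "SUCCESS", "amount_paise": 15000, "amount_inr": 150.0}]
assert transform_task.is_task         # framework state lives on the adapter
```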

The "all three at once" anti-pattern. A team that adopts Airflow for legacy DAGs, then Dagster for the modern data platform, then Prefect for ad-hoc ML jobs ends up running three orchestrators with three on-call rotations and three sets of credentials. Each tool individually is fine; the combination has 3× the operational surface and zero cross-tool lineage. The answer in 2026 is usually to pick one and migrate ruthlessly; if you must run two (e.g., legacy Airflow + new Dagster), draw an explicit boundary and never have a Dagster asset depend on an Airflow task — down that path lie dragons.

Going deeper

Airflow's history and why it is structurally what it is

Airflow was open-sourced by Airbnb in 2015 to replace their cron-based pipeline mess. The early design choices — DAGs as Python files, Postgres as the metadata DB, Celery as the executor — were pragmatic for 2015 and have been load-bearing for the next decade. The "task as the primitive" model fit a world where the orchestrator's job was to safely compose external operators (Hive, Spark, Presto, S3) — the orchestrator did not need to know about data, because the data lived elsewhere. The cost of those choices shows up at scale: the metadata DB is the bottleneck, the DAG-parsing-by-importing-Python is the security hole, and the lack of asset-awareness is the lineage gap. Airflow 2.0+ has tried to retrofit dataset-aware scheduling (Dataset events, data-driven scheduling), but the deep model is still task-centric. A pipeline written natively in Airflow 2.7 looks remarkably similar to one written in Airflow 1.10.

Dagster's "software-defined assets" and the dbt influence

Dagster's central abstraction — software-defined assets — was directly influenced by the dbt model: a project is a graph of materialised tables, and the build tool's job is to determine which subset to rebuild based on what changed. Dagster generalises that beyond SQL: an asset can be a Parquet file in S3, a feature in a feature store, an ML model in a model registry, a Looker dashboard. The framework's cross-asset lineage works across all of them, and the @asset_check mechanism lets each asset declare its own contract. The result: Dagster is the orchestrator that thinks like a build tool. The trade-off is verbosity — you declare more things up front, which feels slow until you realise how much downstream tooling derives from those declarations.

Prefect 2.x and the architectural rewrite

Prefect 1.x was an Airflow-style scheduler with a different syntax. Prefect 2.x (released 2022) was a near-total rewrite around the "flow as a Python function" idea — the orchestrator stepped further back and let the user's Python be the source of truth. The trade-offs were debated heatedly: Prefect 2.x dropped some Prefect 1.x features (DaskExecutor, certain deployment modes) and introduced a steeper learning curve for teams used to declarative DAGs. The 2.x architecture has settled by 2026: workers, work pools, deployments, and the API-first control plane are now the standard, and the dynamic-flow model is the right shape for ML and conditional pipelines. Teams running heavy SQL/dbt workloads still tend to land at Dagster; teams running Python-heavy ML pipelines often pick Prefect.

What changes in 2026: AI-generated pipelines and the framework race

By 2026, all three frameworks have AI-pipeline-generation tools (Airflow's airflow ai, Dagster's dagster ai, Prefect's prefect ai) that take a natural-language description and emit a pipeline. The quality of the generated code is roughly equal across frameworks; the differentiator is which framework's generated code is maintainable by a human afterwards. Dagster's typed assets and signature-based dependencies generate cleanly; Prefect's flow code reads like a normal Python function; Airflow's generated DAGs tend to overuse operators where a PythonOperator would do. The longer-term trend is that the orchestrator is becoming a deployment target for AI-generated code — and the frameworks that produce the most reviewable code, not the most code, win this race.

How a Bengaluru fintech team picks one in 2026

Aditi runs the data platform at a Bengaluru lending startup. She inherited Airflow when she joined in 2023; in early 2026 the team migrated to Dagster. The forcing function was lineage: the credit-risk team needed column-level lineage from the source CDC stream all the way to the model features, and Airflow's task-level lineage could not deliver it. Dagster's asset graph + dbt integration gave them the answer in two weeks. The migration cost was 4 engineers × 5 months for ~80 pipelines. The decision document Aditi wrote for the CTO listed five questions the team answered before committing: (i) What does our team think a pipeline is — a sequence of operations or a tree of tables? (ii) What fraction of our pipelines need column-level lineage? (iii) How do we currently handle backfills, and is the manual effort acceptable? (iv) Do our pipelines have dynamic structure (varying with input)? (v) What is our 5-year migration tolerance? The answers steered toward Dagster. A different team — same scale, but ML-heavy — would likely have ended up at Prefect. A team with 12 years of Airflow muscle memory and 3,000 DAGs would likely have stayed.

Where this leads next

The choice of orchestrator is the most consequential platform decision a data team makes after picking the warehouse. It determines what 3 a.m. pages look like for the next five years, what new engineers learn first, and how cleanly the data team's work composes with the rest of the engineering org. Build 5 picks up from here — once the orchestrator is in place, the next layer is lineage, observability, and data contracts, and each framework has a different story for how those layers attach.

The deeper lesson is that orchestrators are not interchangeable Lego bricks. Each one has a thesis about what a pipeline is, and the thesis shapes everything downstream. A team that picks the framework whose thesis matches their actual workload writes less code, ships more reliably, and pages less. A team that picks the wrong one fights the framework on every other change. The honest cost of getting this wrong is not measured in CPU or licences — it is measured in the velocity of every data engineer on the team, every day, for years.

References

  1. Airflow architecture overview — official documentation for the scheduler, executor, and metadata DB model.
  2. Dagster: Software-defined assets — the asset model, freshness policies, and asset checks.
  3. Prefect 2 architecture — work pools, workers, and the API-first control plane.
  4. Maxime Beauchemin, "The Rise of the Data Engineer" (2017) — the Airbnb blog post that contextualises Airflow's design choices.
  5. Nick Schrock, "Introducing software-defined assets" — the Dagster founder's case for asset orchestration.
  6. Prefect: from 1.0 to 2.0 — design rationale — the rewrite explainer covering why dynamic graphs replaced static DAGs.
  7. Writing a DAG executor in 200 lines — chapter 23, the from-scratch build that exposes what the three frameworks share underneath.
  8. Backfills: re-running history correctly — chapter 24, the operation that exposes each framework's approach to historical re-processing.