In short

After parsing, binding, and optimisation you are holding a physical plan — a tree of concrete algorithms (SeqScan, Filter, Project, NestedLoopJoin). The tree does not run itself. Something has to walk it and produce rows. Graefe's Volcano (1994) answered with the iterator model: every operator implements three methods — open() prepares state, next() returns the next tuple or a sentinel meaning "done", close() releases resources. Operators store their children as fields. When a parent's next() is called, it calls next() on its children, does per-tuple work (filter, transform, join), and returns a tuple up to its caller. Execution is pull-based: the consumer drives the producer by demanding the next row. Every operator holds at most O(1) tuples, which makes execution streaming — a million-row query uses the same memory as a ten-row query. The model has one famous weakness: calling next() billions of times through virtual dispatch is slow, which vectorised engines (DuckDB) and compiled engines (HyPer) fix in different ways. This chapter builds the classic iterator model in Python — Scan, Filter, Project, NestedLoopJoin — and traces a real query through it tuple by tuple.

The optimiser has finished. Predicates have been pushed down, joins reordered, and each logical operator replaced with a physical algorithm. You are holding a tree like this:

Project(exprs=[u.name, o.total],
  child=NestedLoopJoin(predicate=(u.id == o.user_id),
    left=SeqScan('users'),
    right=Filter(predicate=(o.total > 1000), child=SeqScan('orders'))))

Nothing has executed. No pages read, no rows compared, no results produced. Something has to walk the tree and actually do it. That something is the executor, and the design of the executor is the subject of this chapter.

The question is: what interface should every operator expose so the executor can walk the tree generically, without special cases per operator type? Graefe's 1994 answer — in "Volcano — An Extensible and Parallel Query Evaluation System" — was three methods: open, next, close. Thirty years later Postgres, SQLite, MySQL, SQL Server, Oracle, and virtually every row-based database use some version of it. Volcano is the grandparent of every query executor you will read.

The three-method protocol

Every operator, no matter what it does internally, exposes the same three methods:

[Figure: The three-method protocol — three boxes in sequence. open(), called once: allocate state, open children, position cursors, build hash tables. next(), called many times: return the next tuple, or the EOF sentinel when the stream is exhausted. close(), called once: free state, close children, unpin pages, release memory. Arrows show the call sequence: one open(), many next() calls each returning a tuple until EOF, one close().]
The Volcano three-method protocol. Every physical operator — Scan, Filter, Join, Aggregate — implements these three methods. Execution of a query is: open() the root, next() repeatedly until EOF, close() the root.

In Python, the protocol is an abstract base class with three methods:

# execute/operator.py
from abc import ABC, abstractmethod

# A Tuple is just a dict from column name to value, for this teaching engine.
# Real systems use a packed byte layout for cache efficiency.
Tuple = dict[str, object]

# The EOF sentinel. When next() has nothing left, it returns this.
EOF = None

class Operator(ABC):
    @abstractmethod
    def open(self) -> None: ...

    @abstractmethod
    def next(self) -> Tuple | None: ...

    @abstractmethod
    def close(self) -> None: ...

Why three methods and not two (say, just next() with lazy setup)? Because some operators need a bulk prepare step that cannot happen inside next(). A hash join builds its hash table from the entire right input before returning the first result; an aggregate reads its entire input before emitting the first group. open() is where that bulk work lives. Conversely, close() is where you release the buffer-pool page pins and file handles; without a dedicated close, operators would leak those on partial consumption (LIMIT 10 stops calling next() after ten tuples — the tree still needs to be torn down).
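To make the partial-consumption point concrete, here is a sketch of a Limit operator in the same protocol. Limit is not one of this chapter's four operators — the class below is hypothetical, following the same open/next/close shape:

```python
# A sketch of a Limit operator (hypothetical; in the engine it would
# subclass Operator and be placed at the root of the plan by the planner).

class Limit:
    """Return at most n rows from the child, then report EOF."""

    def __init__(self, child, n: int):
        self.child = child
        self.n = n
        self._emitted = 0

    def open(self) -> None:
        self._emitted = 0
        self.child.open()

    def next(self):
        if self._emitted >= self.n:
            return None                  # stop pulling; child may have rows left
        row = self.child.next()
        if row is None:
            return None                  # child ran out before the limit
        self._emitted += 1
        return row

    def close(self) -> None:
        self.child.close()               # tear the child down even if unexhausted
```

After n rows, Limit never calls child.next() again — but close() still cascades, and that cascade is what releases the child's pins and handles.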

Why return None rather than raise StopIteration, as Python iterators do? Exceptions for control flow are slow, and the executor will call next() hundreds of millions of times in a real query. A sentinel return value is one pointer comparison; an exception is a full stack unwind. Every production engine uses a sentinel (Postgres returns a null TupleTableSlot, SQLite returns SQLITE_DONE), not an exception.
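If you do want for-loop ergonomics on top of the sentinel protocol, a thin adapter (an illustration, not part of the engine) bridges the two, paying the StopIteration cost once at the end rather than per tuple:

```python
# A thin adapter (an illustration, not part of the engine): wrap any
# sentinel-style operator so a Python for-loop can consume it.

class OperatorIterator:
    """Bridge open/next/close to Python's iterator protocol."""

    def __init__(self, op):
        self.op = op

    def __iter__(self):
        self.op.open()
        try:
            while (row := self.op.next()) is not None:
                yield row
        finally:
            self.op.close()              # runs when the stream ends or the
                                         # generator is closed
```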

Each algebra operator as an iterator

Each logical operator from the previous chapter has at least one physical iterator implementation. Some have many — a logical Join might become a NestedLoopJoin, a HashJoin, or a MergeJoin depending on the optimiser's cost model. For this chapter, build the simplest form of each and use them to run real queries.

Scan

The leaf of every tree. A Scan reads rows from a base table, one at a time. In a real system it goes through the buffer pool (built in Build 5) and reads page-at-a-time; for this teaching iterator, assume rows are already in a Python list and return them one by one.

# execute/scan.py
from .operator import Operator, Tuple

class SeqScan(Operator):
    """Sequential scan: read every row of a table in storage order."""

    def __init__(self, table_name: str, rows: list[Tuple]):
        self.table_name = table_name
        self._rows = rows                # in a real system, iterate pages
        self._cursor = 0                 # current position

    def open(self) -> None:
        self._cursor = 0                 # rewind (supports re-open for NLJoin)

    def next(self) -> Tuple | None:
        if self._cursor >= len(self._rows):
            return None                  # EOF
        row = self._rows[self._cursor]
        self._cursor += 1
        return row

    def close(self) -> None:
        self._cursor = 0                 # nothing else to release in this toy

Fifteen lines. open() sets the cursor. next() returns the next row or None. close() is trivial here but in a real system would unpin the current buffer-pool page. The _rows list is a stand-in for the real scan cursor; in chapter 47 of Build 5, this is where your HeapFile iterator plugs in.

Filter

Keep rows where a predicate holds, drop the rest. The filter holds a child operator and a predicate function.

# execute/filter.py
from .operator import Operator, Tuple

class Filter(Operator):
    """σ — keep rows where predicate(row) is true."""

    def __init__(self, child: Operator, predicate):
        self.child = child
        self.predicate = predicate       # a callable: Tuple -> bool

    def open(self) -> None:
        self.child.open()

    def next(self) -> Tuple | None:
        while True:
            row = self.child.next()
            if row is None:
                return None              # child exhausted → we are too
            if self.predicate(row):
                return row               # pass this one up
            # otherwise loop: fetch the next candidate from child

    def close(self) -> None:
        self.child.close()

Why the while loop inside next(): a filter has to keep asking its child for rows until it finds one that passes the predicate. If the child returns ten rows of which only the third matches, the filter calls child.next() three times inside a single next() call of its own. Only when the child returns EOF does the filter return EOF too. This is the "pull" part of "pull-based execution" — the filter pulls rows from its child until it has something to return.

Project

Keep only the named columns (or compute expressions). Like Filter, it holds a child and forwards open/close.

# execute/project.py
from .operator import Operator, Tuple

class Project(Operator):
    """π — compute each output expression from the input row."""

    def __init__(self, child: Operator, exprs: list, output_names: list[str]):
        self.child = child
        self.exprs = exprs               # each callable: Tuple -> value
        self.output_names = output_names

    def open(self) -> None:
        self.child.open()

    def next(self) -> Tuple | None:
        row = self.child.next()
        if row is None:
            return None
        return {name: e(row) for name, e in zip(self.output_names, self.exprs)}

    def close(self) -> None:
        self.child.close()

Unlike Filter, Project never loops: every input row produces exactly one output row. The loop would be wasted work.

NestedLoopJoin

The simplest join algorithm and the one that works no matter what. For every row of the outer (left) input, rewind the inner (right) input and scan it from the start, emitting pairs that satisfy the join predicate. The chapter on join algorithms will build better ones; this is the baseline every database has as a fallback.

# execute/nljoin.py
from .operator import Operator, Tuple

class NestedLoopJoin(Operator):
    """⋈ — for each outer row, scan inner and emit matching pairs."""

    def __init__(self, left: Operator, right: Operator, predicate):
        self.left = left
        self.right = right
        self.predicate = predicate       # Tuple x Tuple -> bool
        self._outer: Tuple | None = None # current outer row

    def open(self) -> None:
        self.left.open()
        self.right.open()
        self._outer = self.left.next()   # load the first outer row

    def next(self) -> Tuple | None:
        while self._outer is not None:
            inner = self.right.next()
            if inner is None:            # right exhausted for this outer row
                self._outer = self.left.next()
                if self._outer is None:
                    return None          # both exhausted
                self.right.open()        # rewind inner for the new outer
                continue
            if self.predicate(self._outer, inner):
                return {**self._outer, **inner}   # merged row
        return None

    def close(self) -> None:
        self.left.close()
        self.right.close()

Why the outer is loaded in open() rather than on the first next(): symmetry with inner cursor handling, and next()'s loop assumes self._outer already holds a candidate. An alternative design holds a flag _first_call and does the outer fetch in next(); both work. The Volcano paper uses the open() approach for clarity.

Why self.right.open() is called again inside the loop: each time the inner stream is exhausted for the current outer row, it must be rewound so the next outer row can pair with every inner row from the start. This is why open() on a SeqScan resets the cursor — so a NestedLoopJoin can legitimately re-open its inner child. Operators that cannot be cheaply re-opened (a hash join's build phase) need a different join algorithm, which is chapter 46's subject.
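One standard escape hatch for an inner that is expensive to re-run is to buffer it. The sketch below — a plain class following the same protocol; Postgres plans the equivalent as a Materialize node over nested-loop inners — drains its child once on the first open() and afterwards just rewinds an in-memory list, trading the model's O(1) memory for an O(n) buffer:

```python
# A sketch of a Materialize operator (illustrative): buffer the child's
# output on the first open() so later re-opens are a cheap rewind.

class Materialize:
    """Buffer the child's rows; every open() (re-)positions a cursor."""

    def __init__(self, child):
        self.child = child
        self._buffer = None              # filled on the first open()
        self._cursor = 0

    def open(self) -> None:
        if self._buffer is None:         # first open: drain the child once
            self.child.open()
            self._buffer = []
            while (row := self.child.next()) is not None:
                self._buffer.append(row)
            self.child.close()
        self._cursor = 0                 # re-opens only rewind the buffer

    def next(self):
        if self._cursor >= len(self._buffer):
            return None
        row = self._buffer[self._cursor]
        self._cursor += 1
        return row

    def close(self) -> None:
        self._buffer = None              # release the buffered rows
```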

That is it. Four operators — SeqScan, Filter, Project, NestedLoopJoin — about fifteen lines each, and you can already run any SELECT with equi- or theta-joins. The brevity is not an accident; it is the whole point of the iterator model.

Pull-based execution — next() cascades upward

The execution model is called pull-based because the consumer drives the producer. The user's code (or the wire-protocol layer sending rows to the client) calls next() on the root of the tree. The root calls next() on its children, which call next() on their children, all the way down to the leaves. When the leaves return a row, it propagates up, transformed at each level by whatever the operator does.

[Figure: Tuples flowing up through an operator tree — Project (u.name, o.total) at the root over NestedLoopJoin (u.id == o.user_id), whose children are SeqScan(users) and Filter (o.total > 1000) over SeqScan(orders). Downward arrows labelled "next() call" run from parent to child along every edge; upward arrows labelled "tuple returned" run from child to parent. A note beside Filter: it may call next() on its child several times before returning one tuple.]
Tuples flow up, next() calls flow down. The consumer at the top drives the producer at the bottom. Filter is the only operator that may call next() on its child several times before returning one tuple of its own — everyone else is one-in, one-out.

A query executes like this:

# execute/run.py

def run(root: Operator) -> list[Tuple]:
    root.open()
    results = []
    while True:
        row = root.next()
        if row is None:
            break
        results.append(row)
    root.close()
    return results

That is the entire executor. Eight lines of top-level code, and every operator's three methods. The whole complexity of running SQL lives inside the operator implementations — the driver is trivial.
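Python generators make the pull cascade especially visible. The sketch below rewrites the same pipeline shape as nested generators — intuition only, since generators give you next()-style pulling for free but no separate open()/close() for bulk setup and teardown:

```python
# The pull cascade written with Python generators (illustration only:
# each `for` pulls one row at a time, exactly like next()).

def seq_scan(rows):
    yield from rows                      # the leaf: emit stored rows in order

def filter_(child, predicate):
    for row in child:
        if predicate(row):
            yield row                    # keep pulling until something passes

def project(child, exprs, names):
    for row in child:
        yield {n: e(row) for n, e in zip(names, exprs)}

rows = [{'total': 500}, {'total': 2500}, {'total': 1200}]
plan = project(filter_(seq_scan(rows), lambda r: r['total'] > 1000),
               [lambda r: r['total'] * 2], ['double'])
print(list(plan))                        # → [{'double': 5000}, {'double': 2400}]
```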

Worked query, traced through

Set up two tiny tables and run a real query end-to-end. Watch what each operator does.

Running SELECT u.name, o.total FROM users u, orders o WHERE u.id = o.user_id AND o.total > 1000

Two tiny tables and the plan the optimiser produced (with the filter already pushed below the join):

users = [{'id':1,'name':'Dipti'}, {'id':2,'name':'Raghav'}, {'id':3,'name':'Priya'}]
orders = [{'order_id':101,'user_id':1,'total': 500},
          {'order_id':102,'user_id':1,'total':2500},
          {'order_id':103,'user_id':2,'total':1200},
          {'order_id':104,'user_id':3,'total': 900}]

plan = Project(
  child=NestedLoopJoin(
    left=SeqScan('users', users),
    right=Filter(SeqScan('orders', orders), lambda r: r['total'] > 1000),
    predicate=lambda l, r: l['id'] == r['user_id']),
  exprs=[lambda r: r['name'], lambda r: r['total']],
  output_names=['name','total'])

print(run(plan))

Trace the first next() at the root:

  1. Project.next() → NestedLoopJoin.next(). The NLJ already has _outer = {id:1, name:'Dipti'} from open(). Call Filter.next().
  2. Filter.next() → SeqScan(orders).next() → {order_id:101, user_id:1, total:500}. Predicate false. Loop. SeqScan.next() → {order_id:102, user_id:1, total:2500}. Predicate true. Return.
  3. Back in NLJ: outer id=1, inner user_id=1. Join predicate true. Return merged {id:1, name:'Dipti', order_id:102, user_id:1, total:2500}.
  4. Project keeps name, total and returns {name:'Dipti', total:2500}.

The driver appends the row and calls next() again. Order 103 (total 1200) passes the filter, but the join predicate 1 == 2 fails, so the NLJ loops. Order 104 (total 900) fails the filter; the filter returns None, and the NLJ advances the outer to Raghav, rewinding the inner via right.open(). Order 101 fails the filter; order 102 passes it but fails the join; order 103 passes both, and {name:'Raghav', total:1200} is returned. Raghav has no further qualifying orders, so the outer advances to Priya: her only order (total 900) is filtered out, and orders 102 and 103 pass the filter but fail the join. The inner exhausts, the outer exhausts, and the root returns None.

Final result:

[{'name': 'Dipti',  'total': 2500},
 {'name': 'Raghav', 'total': 1200}]

At no point did the engine hold more than one outer row, one inner candidate, and one output tuple. Even if the tables held a million rows each, that footprint does not grow. Iterator-model execution is streaming, and this is its single biggest practical virtue.
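The only place memory grows is the results list inside run(), which exists for convenience. A streaming driver (names illustrative) keeps the whole query at O(1) by yielding rows instead:

```python
# A sketch of a streaming driver (illustrative names): yield result rows
# one at a time instead of accumulating them, so memory stays O(1).

def run_streaming(root):
    """Yield result rows; never hold more than the current one."""
    root.open()
    try:
        while (row := root.next()) is not None:
            yield row                    # hand the row to the consumer now
    finally:
        root.close()                     # runs even if the consumer stops early
```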

Pros and cons

The iterator model has stuck around for thirty years because its pros run deep. Its cons are real too, and they are what the last decade of query-engine research has been fixing.

Pros:

  1. Streaming, O(1) memory per operator: a million-row query needs no more working memory than a ten-row one.
  2. A uniform interface: the driver and every parent operator are fully generic, so new operators plug in without special cases.
  3. Early termination for free: LIMIT simply stops calling next(), and close() tears the tree down.
  4. Simplicity: each operator is a few dozen lines, and the driver is eight.

Cons:

  1. One virtual call per tuple per operator — a query touching a hundred million rows through five operators makes half a billion next() calls.
  2. Tuples are spilled to the stack at every operator boundary instead of staying in CPU registers.
  3. Tuple-at-a-time inner loops defeat SIMD and most compiler optimisation; each next() does too little work to amortise its own overhead.

For short OLTP queries on small data, the pros dominate — the query is I/O bound, not next()-bound. This is why every OLTP database on earth uses the iterator model. The cons only bite on analytical queries over hundreds of millions of rows, and that is exactly where vectorised and compiled engines have taken over.

Common confusions

open() is not the constructor. The constructor builds the plan tree at plan time; open() acquires execution-time state, and a SeqScan constructed once may be re-opened many times by a NestedLoopJoin. The EOF sentinel overloads a legal value: because next() returns None for "done", an operator in this toy engine can never emit None as a row — real engines use a dedicated sentinel or status code instead. And pull-based does not mean every operator is lazy: a hash join's open() drains its entire build side before the first next() returns anything.

Going deeper

You have an executor that runs any query the optimiser produces. What follows is the architectural fork in the road: the iterator model's two modern successors — push-based with code generation, and vectorised — both of which arose because the iterator model does not keep up with modern CPUs on analytical workloads.

Push-based execution with code generation (HyPer, Umbra)

In the pull model, every operator implements next() and is called by its parent. In the push model, every operator implements consume(row) and calls its parent. The producer drives; the consumer reacts.

The trick is that push interacts beautifully with code generation: when a scan produces a row, the calls to its parent's consume, its grandparent's consume, and so on all the way up to the root, can be inlined into a single compiled function. No virtual dispatch because the whole tree is one straight-line function. This is the idea behind HyPer (Neumann 2011): emit LLVM IR at query time, compile to native code, run. On analytical workloads that gives 5–10× over an interpreted iterator model, because data stays in CPU registers across operator boundaries instead of being spilled to the stack on every next().
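A minimal Python sketch of the push shape (illustrative names — HyPer generates native code, not Python objects): each operator holds its parent and calls consume() upward, and the scan at the bottom drives the pipeline:

```python
# A sketch of the push model (illustrative names): operators call their
# parent's consume(); the scan at the bottom is the driver.

class Output:
    """Collect rows at the root of the pipeline."""
    def __init__(self):
        self.rows = []
    def consume(self, row):
        self.rows.append(row)

class PushFilter:
    def __init__(self, parent, predicate):
        self.parent, self.predicate = parent, predicate
    def consume(self, row):
        if self.predicate(row):
            self.parent.consume(row)     # push upward only when it passes

class PushScan:
    def __init__(self, parent, rows):
        self.parent, self.rows = parent, rows
    def run(self):
        for row in self.rows:            # the producer drives the pipeline
            self.parent.consume(row)

out = Output()
scan = PushScan(PushFilter(out, lambda r: r['total'] > 1000),
                [{'total': 500}, {'total': 2500}])
scan.run()
print(out.rows)                          # → [{'total': 2500}]
```

In a compiled engine, the chain of consume() calls is inlined into one straight-line function at query time — there is no dispatch left to pay for.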

Vectorised execution — batch next() (MonetDB X100, DuckDB)

The vectorised approach, first in MonetDB X100 (Boncz et al. 2005) and now the bedrock of DuckDB, keeps the iterator model but changes the unit: next() returns a vector — a fixed-size batch (usually 1024 or 2048 values) laid out columnwise. Each next() does N rows of work, amortising the virtual call. The inner loop is a tight scan over N columnar values, which the compiler can auto-vectorise to SIMD — on an AVX-512 CPU, a predicate over 1024 floats takes ~128 cycles instead of ~5,000 scalar. Build 15 of this curriculum implements this on top of the same algebra.
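A sketch of the vectorised interface in plain Python, with lists standing in for real columnar vectors (batch size and class names are illustrative):

```python
# A sketch of vectorised next(): each call moves a whole batch, laid out
# columnwise as one list per column. Batch size is illustrative.

BATCH = 4                                # real engines use 1024 or 2048

class VecScan:
    """next() returns a dict of column -> list of values, or None at EOF."""
    def __init__(self, totals):
        self.totals, self.pos = totals, 0
    def next(self):
        if self.pos >= len(self.totals):
            return None
        chunk = self.totals[self.pos:self.pos + BATCH]
        self.pos += BATCH
        return {'total': chunk}

class VecFilter:
    """One next() call filters a whole batch in a tight inner loop."""
    def __init__(self, child, threshold):
        self.child, self.threshold = child, threshold
    def next(self):
        while (batch := self.child.next()) is not None:
            kept = [t for t in batch['total'] if t > self.threshold]
            if kept:                     # skip batches that filtered to nothing
                return {'total': kept}
        return None

f = VecFilter(VecScan([500, 2500, 1200, 900, 3000]), 1000)
out = []
while (b := f.next()) is not None:
    out.extend(b['total'])
print(out)                               # → [2500, 1200, 3000]
```

The virtual call is paid once per batch instead of once per row, and the list comprehension is the kind of tight loop a native-code engine turns into SIMD.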

Morsel-driven parallelism

A third evolution — morsel-driven execution (Leis et al. 2014) — replaces Volcano's exchange operators with a thread pool that grabs small morsels (~10K rows) and runs them through the plan. Excellent load balancing, no partitioning overhead. Umbra and DuckDB both use variants.
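A toy sketch of the scheduling idea (illustrative; real morsels are ~10K rows and workers share NUMA-aware operator state): split the input into morsels, let a pool of workers each run a morsel through the pipeline, and combine:

```python
# A toy sketch of morsel-driven scheduling: workers grab small chunks of
# input and run each through the (here: filter-only) pipeline.
from concurrent.futures import ThreadPoolExecutor

MORSEL = 3                               # toy size; real morsels are ~10K rows

def process_morsel(rows):
    """Run one morsel through the pipeline."""
    return [r for r in rows if r['total'] > 1000]

rows = [{'total': t} for t in (500, 2500, 1200, 900, 3000, 100, 4000)]
morsels = [rows[i:i + MORSEL] for i in range(0, len(rows), MORSEL)]

with ThreadPoolExecutor(max_workers=4) as pool:
    # pool.map preserves morsel order, so results come back deterministic
    results = [r for chunk in pool.map(process_morsel, morsels) for r in chunk]

print([r['total'] for r in results])     # → [2500, 1200, 3000, 4000]
```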

Production iterator models

The iterator model is not dead. It is the default that every new engine has to argue against, and even engines that have argued against it keep a fallback iterator path for operators that don't fit the vectorised or compiled mould.

Where this leads next

You now have a working executor. Four operators, three methods each, and the whole of SQL's physical execution laid out as a recursive next() cascade. The next chapters extend the model with serious join algorithms.

Every operator you add plugs into this three-method protocol. The protocol is the stable interface; the operators behind it are where the ideas change. Volcano's elegance is exactly this: the surface is the same for everyone, from a one-day-old SeqScan to a fully tuned ParallelGraceHashJoin. The tree composes. The executor stays eight lines.

References

  1. Graefe, Volcano — An Extensible and Parallel Query Evaluation System, IEEE TKDE 6(1), 1994 — the founding paper. Introduces the open/next/close protocol and the exchange operator for parallel execution. Still worth reading cover-to-cover.
  2. Neumann, Efficiently Compiling Efficient Query Plans for Modern Hardware, VLDB 2011 — the HyPer paper, which introduced the push-model-plus-code-generation approach. If you ever wondered why modern engines emit LLVM IR, start here.
  3. Boncz, Zukowski, Nes, MonetDB/X100: Hyper-Pipelining Query Execution, CIDR 2005 — the original vectorised-execution paper. The ideas here became DuckDB's entire execution layer twenty years later.
  4. Leis, Boncz, Kemper, Neumann, Morsel-Driven Parallelism: A NUMA-Aware Query Evaluation Framework for the Many-Core Age, SIGMOD 2014 — how modern parallel executors schedule work. The successor to Volcano's exchange operator.
  5. PostgreSQL source tree, execProcnode.c — the dispatch routine ExecProcNode that calls each operator's next() equivalent. Reading this plus one or two operator files (e.g. nodeSeqscan.c, nodeNestloop.c) is the quickest tour of a real iterator model in production.
  6. Kemper, Neumann, HyPer: A Hybrid OLTP&OLAP Main Memory Database System Based on Virtual Memory Snapshots, ICDE 2011 — the broader HyPer architecture paper, which sets the code-generation executor in its engine context. Pairs with reference 2.