In short

The SQL parser hands the optimiser a logical algebra tree built from the classical operators — σ (selection), π (projection), ⋈ (join), γ (group-by), ∪ (union), Scan (leaf). That tree is correct but almost always slow: it reflects whatever shape the programmer typed, not what the engine can execute cheaply. The optimiser's first job is to rewrite this tree into an equivalent one — same rows, same columns, same answer — that is cheaper to run. Rule-based rewrites do that using pure syntactic pattern matches on the tree, driven by the equivalence laws of relational algebra. No statistics. No cost model. No data. Just "this shape of subtree is provably equivalent to that shape, and that shape is cheaper."

The two rules you must know are predicate pushdown — push σ below ⋈, below π, below ∪ — and projection pushdown, which does the same for π. A third family, join commutativity and associativity, exposes alternative join orders for a cost-based layer to choose among. These rewrites routinely cut I/O by 10×–100× because they shrink the rows and columns flowing between operators before expensive work like a hash join or an external sort runs on them. A 10 GB join that the parser placed above a filter becomes a 500 MB join once the filter moves below it.

Every production optimiser ships a rule pack. Apache Calcite has roughly 120 rewrite rules in its catalogue; PostgreSQL has them hardcoded across planner.c and friends; Spark's Catalyst models each rule as an immutable Rule[LogicalPlan] tree transformation. They all drive a fixed-point loop — apply every rule to the tree, keep applying until no rule changes anything, stop. That is the safe, deterministic half of the optimiser. Everything statistical — join reordering for real, physical operator choice, index selection — comes after, in the next chapter.

You write this query against an OLTP-scale warehouse replica:

SELECT u.name
FROM users u
JOIN orders o ON u.id = o.user_id
WHERE u.country = 'IN' AND o.year = 2026;

users has ten million rows, orders has a hundred million, and the join is on an indexed equality. A parser that simply translates SQL into algebra, one clause at a time, produces this logical tree:

π_{u.name}(
  σ_{u.country='IN' ∧ o.year=2026}(
    users ⋈_{u.id = o.user_id} orders
  )
)

Read it bottom-up the way the executor would: scan every row of users, scan every row of orders, join them on the key (10^8 output rows), then filter the 10^8 rows down to the ones where country = 'IN' and year = 2026, then project u.name. The join runs on the unfiltered inputs. Assume a hash join: the join alone reads 10 GB of order pages and does a ten-million-row build on users. Wall-clock: minutes.

Now look at the same tree with predicates pushed down:

π_{u.name}(
  σ_{u.country='IN'}(users) ⋈_{u.id = o.user_id} σ_{o.year=2026}(orders)
)

The filters run first, on their respective base tables, independently. If country = 'IN' keeps 10% of users and year = 2026 keeps 5% of orders, the join now sees a one-million-row build side and a five-million-row probe side — a twenty-fold shrinkage. With an index scan on (country) and (year) the filters don't even read the full tables. Wall-clock: seconds.

Both trees produce the same rows. Both satisfy the SQL semantics. The second tree is the first tree after the optimiser applied two rules from the relational algebra equivalence laws. No statistics were consulted to produce it. That is the whole game of rule-based rewriting: use algebra to make the tree cheaper, before the cost-based layer gets involved.

Relational algebra, briefly

Five operators cover everything the parser emits. Every dialect of SQL compiles into some variant of this set, and every planner works by rewriting trees over it.

symbol                name                  reads like
σ_c(R)                selection             "rows of R where predicate c holds"
π_A(R)                projection            "rows of R with only the columns in A"
R ⋈_c S               (theta-)join          "pairs (r, s) where c(r, s) holds"
γ_{G, F}(R)           group-by-aggregate    "rows of R grouped by G, aggregated by F"
R ∪ S, R ∩ S, R − S   set operations        usual meaning

Scan(T) is the leaf — read relation T off disk. Every SQL query ends up as some tree of these over Scan leaves.

The operators have equivalence laws. These are theorems of set theory (or multiset theory for SQL bag semantics); the optimiser doesn't invent them, it just applies them. The five laws that carry most of the weight:

\sigma_c(\sigma_d(R)) \equiv \sigma_{c \wedge d}(R) \qquad \text{(cascade of selections)}
\sigma_c(R \bowtie S) \equiv \sigma_c(R) \bowtie S \qquad \text{when } c \text{ references only columns of } R
\sigma_c(R \cup S) \equiv \sigma_c(R) \cup \sigma_c(S) \qquad \text{(distribution over union)}
\pi_A(R \bowtie S) \equiv \pi_A\bigl(\pi_{(A \cap R_{cols}) \cup J}(R) \bowtie \pi_{(A \cap S_{cols}) \cup J}(S)\bigr) \quad J = \text{join columns}
(R \bowtie S) \bowtie T \equiv R \bowtie (S \bowtie T) \qquad \text{(associativity, inner joins)}

Why these are called equivalences and not transformations: each ≡ says "these two trees produce the same multiset of rows on every possible input." The optimiser is free to substitute either side for the other without consulting data, without running statistics, without cost analysis. That is what makes rule-based rewrites safe. The next chapter's cost-based rewrites are different — they choose among equivalent trees using estimated cardinalities, and a bad estimate produces a legal plan that runs very slowly. Rule-based rewrites cannot do that. If a rewrite changes the answer, the rewrite is a bug, not a plan choice.

Why join commutativity R ⋈ S ≡ S ⋈ R is not in the list above: the list contains only rules that can be applied unconditionally in the fixed-point loop. Commutativity is a rule that exposes alternative plans to the cost-based layer rather than committing to one shape. Rule-based rewriting leaves commutativity to the physical planner (which chooses build vs probe based on size) and to the enumeration-based join-order search (the subject of the next chapter).

There are more — selection pushdown past aggregation, set-operation simplification, outer-join reduction — and together they form the rule pack. Calcite's public catalogue is ~120 rules; a small, self-contained one like CockroachDB's opt/norm package has ~60. PostgreSQL's are spread across four or five files and are harder to count, but the number is in the same ballpark. You do not need to memorise them. You need to understand the three or four archetypes and recognise the rest as variations on them.

The rule engine — a fixed-point loop

A rule is a function from a subtree to a rewritten subtree. The engine walks the tree, tries every rule at every node, and keeps iterating until nothing changes. Here is the whole thing:

# query/opt/rules.py
from typing import Callable, Any

Tree = Any                                  # a Node of the plan tree
Rule = Callable[[Tree], Tree]               # identity if the rule doesn't fire

def optimise(tree: Tree, rules: list[Rule]) -> Tree:
    """Apply rules until the tree reaches a fixed point."""
    changed = True
    while changed:
        changed = False
        for rule in rules:
            new_tree = rule(tree)
            if new_tree is not tree:        # identity check: did the rule fire?
                tree = new_tree
                changed = True
    return tree

Fourteen lines. The is not identity check is the engine's contract: a rule returns the input tree unchanged (same object reference) when the pattern doesn't match, and returns a new tree when it does. Fixed-point termination is guaranteed as long as every rule either leaves the tree alone or replaces it with a strictly "smaller" tree under some termination ordering — true for pushdown rules (each application moves a predicate strictly deeper into the tree) and for simplification rules. Rules that create infinite loops (commutativity applied mindlessly — swap R ⋈ S for S ⋈ R, swap back, swap again) must be excluded from the fixed-point loop and driven by a different kind of planner, a memo-based one like Calcite's VolcanoPlanner, discussed below.

Rules come in two traversal flavours. A top-down rule applies at the root and recurses into children only if the root didn't fire. A bottom-up rule recurses first, applies to children, then to the parent. Predicate pushdown is naturally top-down — you find a σ and push it into its child, then recurse into the new child to keep pushing. Projection pushdown is naturally bottom-up — you compute the set of columns each subtree produces, which depends on children, then decide what to project at each node.

def push_down_selection(node: Tree) -> Tree:
    """Top-down: try to push σ past its child."""
    if isinstance(node, Select):
        pushed = _try_push(node)            # might return node unchanged
        if pushed is not node:
            return pushed
    # no match here — recurse; after a rewrite the fixed-point loop re-enters
    return node.map_children(push_down_selection)

Apache Calcite's HepPlanner is this fixed-point engine: it applies rules one at a time in a user-chosen order until the tree is a fixed point. It is the "rule-based" half of Calcite; the "cost-based" half is VolcanoPlanner, which instead enumerates all rule applications and picks the cheapest. PostgreSQL does not have an explicit rule engine — the rewrites are scattered as imperative passes across src/backend/optimizer/prep/ (prepjointree.c, preptlist.c) and friends. Spark Catalyst does have one: each rule is an object extends Rule[LogicalPlan] and a RuleExecutor runs them in batches to fixed point. The abstraction is the same across engines; only the packaging differs.

Predicate pushdown — the flagship rule

This is the rewrite that single-handedly saves most queries. Start with the tree from the opening example:

π_{u.name}(
  σ_{u.country='IN' ∧ o.year=2026}(
    users ⋈_{u.id = o.user_id} orders
  )
)

Predicate pushdown proceeds in two sub-steps.

Step 1: split conjunctions. The law σ_{a ∧ b}(X) ≡ σ_a(σ_b(X)) says a selection over a conjunction can be unfolded into a tower of selections. After this step:

π_{u.name}(
  σ_{u.country='IN'}(
    σ_{o.year=2026}(
      users ⋈_{u.id = o.user_id} orders
    )
  )
)

Two independent selections, each referencing columns from just one side of the join.

Step 2: push each selection past the join. The law σ_c(R ⋈ S) ≡ σ_c(R) ⋈ S applies when c only references R's columns. u.country lives in users, so σ_{u.country='IN'} pushes into the left child. o.year lives in orders, so σ_{o.year=2026} pushes into the right child. After:

π_{u.name}(
  σ_{u.country='IN'}(users) ⋈_{u.id = o.user_id} σ_{o.year=2026}(orders)
)

Each filter now sits directly above its own base table, which means the executor can push it even further — into the table scan itself (an index scan with country = 'IN', or a scan with filter that evaluates the predicate on each page during read). The join receives the already-filtered subsets as its inputs.

[Figure: predicate pushdown, before and after — two plan trees side by side. Before: π_{u.name} over σ_{u.country='IN' ∧ o.year=2026} over Scan(users) ⋈_{u.id=o.user_id} Scan(orders); the join sees all rows and its 10^8-row output is filtered afterwards. After: π_{u.name} over a join whose children are σ_{u.country='IN'}(Scan(users)) and σ_{o.year=2026}(Scan(orders)); the join input shrinks roughly 20×, from 10 GB to 500 MB.]
Before: the join runs on unfiltered ten-million × hundred-million-row inputs. After: two predicates have been split from the conjunction and pushed past the join into their respective base-table scans. The join sees the already-filtered inputs. The trees are provably equivalent by the selection-over-join law.

Here is the rewrite as a Python rule, pattern-matching against an AST-like Node:

# query/opt/push_selection.py
from dataclasses import replace

def push_selection_past_join(node):
    """σ_c(R ⋈ S) -> σ_{c|R}(R) ⋈ σ_{c|S}(S) ⋈ σ_{c|both}(...)."""
    if not (isinstance(node, Select) and isinstance(node.child, Join)):
        return node                                          # identity
    join = node.child
    conjuncts = split_conjunction(node.predicate)            # [u.country='IN', o.year=2026]
    left_preds, right_preds, join_preds = [], [], []
    for c in conjuncts:
        cols = referenced_columns(c)
        if cols.issubset(join.left.schema):   left_preds.append(c)
        elif cols.issubset(join.right.schema): right_preds.append(c)
        else:                                  join_preds.append(c)    # keep above join
    new_left  = Select(and_(left_preds),  join.left)  if left_preds  else join.left
    new_right = Select(and_(right_preds), join.right) if right_preds else join.right
    new_join  = replace(join, left=new_left, right=new_right)
    return Select(and_(join_preds), new_join) if join_preds else new_join

The structure is worth staring at. You split the predicate into three buckets according to which side's columns each conjunct references. Predicates referencing only the left push left; only the right push right; predicates that touch both (say u.region = o.ship_region) stay above the join because they need columns from both inputs to evaluate. The reconstructed tree has (up to) three selections — one on each child, one above — and each is simply omitted when its bucket is empty.

Run this rule at every Select-over-Join node in the tree and the fixed-point loop will keep applying it until all predicates have settled into the subtree closest to the scan they belong to. With the iterator model from the previous chapters, this means each scan's next() evaluates the filter row by row — or better, the scan fuses the filter into itself and uses an index.

Why pushing σ past π is almost always a win: a projection throws away columns, not rows. When a selection sits above a projection and references only columns the projection keeps, σ_c(π_A(R)) ≡ π_A(σ_c(R)) — the predicate sees the same column values either way. Pushing the selection below means the projection and everything above it process fewer rows, which is free for row stores and a significant win for column stores (rows discarded early never have their remaining columns materialised). Moving the selection downward is never worse.

Why you cannot always push past an aggregation: σ_c(γ_{G, F}(R)) is not in general equal to γ_{G, F}(σ_c(R)). If the predicate c is on a grouped column (say region = 'south'), you can push it — pre-filtering is safe because rows with region ≠ 'south' would have been grouped into groups you're about to throw away. If the predicate is on an aggregated column (say SUM(total) > 1000, a HAVING clause), you cannot push it — you need to finish aggregating before you can check the aggregate. This is why SQL distinguishes WHERE (pushable) from HAVING (not). The optimiser codifies this as a side condition on the σ_c(γ(R)) rewrite.

Projection pushdown

The same idea, for π. Columns you are not going to use should not travel up the tree. The law:

\pi_A(R \bowtie_J S) \equiv \pi_A\bigl(\pi_{(A \cap R_{cols}) \cup J}(R) \bowtie_J \pi_{(A \cap S_{cols}) \cup J}(S)\bigr)

Read: if the top wants columns A, then each side only needs to carry the columns from A that it owns, plus the join columns J (which are needed to perform the join itself). Anything else can be projected away before the join.

In a row store this saves CPU and memory per tuple — smaller rows, fewer cache misses, smaller hash buckets. In a column store it saves entire columns worth of disk I/O: if orders has sixty columns and only three are needed downstream, the other 57 never leave the storage engine. For Parquet or ORC files, projection pushdown is what makes analytical queries fast — the scan only reads column stripes for projected columns.

The rule:

def push_projection_into_join(node):
    """π_A(R ⋈_J S) -> π_A(π_{A∩R ∪ J}(R) ⋈_J π_{A∩S ∪ J}(S))."""
    if not (isinstance(node, Project) and isinstance(node.child, Join)):
        return node
    j = node.child
    A = set(node.columns)
    J = referenced_columns(j.predicate)      # columns the join itself needs
    left_needed  = (A & j.left.schema)  | (J & j.left.schema)
    right_needed = (A & j.right.schema) | (J & j.right.schema)
    new_left  = Project(left_needed,  j.left)  if left_needed  < j.left.schema  else j.left
    new_right = Project(right_needed, j.right) if right_needed < j.right.schema else j.right
    return Project(node.columns, replace(j, left=new_left, right=new_right))

The < check (proper subset) avoids inserting a no-op projection when the child already produces exactly the needed columns — otherwise the fixed-point loop would keep re-inserting identical projections forever.

Column pruning generalises this to all operators: every node computes the set of columns it needs from each child, and the scan at the bottom only reads those. Spark's ColumnPruning rule and DuckDB's projection pushdown implement exactly this analysis.

Join reordering — commutativity and associativity

Inner joins satisfy two more laws:

R \bowtie S \equiv S \bowtie R \qquad \text{(commutativity)}
(R \bowtie S) \bowtie T \equiv R \bowtie (S \bowtie T) \qquad \text{(associativity)}

Commutativity is why the optimiser gets to pick the build side of a hash join — the join's logical output doesn't care which side you call R and which S, so you can build the hash table on whichever side is smaller. Associativity is richer: it says the same three-way join can be bracketed three ways (and in general the n-way join has C_{n-1} Catalan-many bracketings), and the optimiser can pick the cheapest.

Left-deep vs bushy plans. A three-way join A ⋈ B ⋈ C can be parenthesised as (A ⋈ B) ⋈ C (left-deep) or A ⋈ (B ⋈ C) (right-deep) or, for four-way and above, bushy: (A ⋈ B) ⋈ (C ⋈ D). Most optimisers restrict the search space to left-deep plans. Two reasons. First, left-deep plans pipeline naturally — the output of the first join streams directly into the second join's probe side without materialisation. A bushy plan has two intermediate results alive simultaneously, doubling memory pressure. Second, the search space is much smaller: for n tables, left-deep has n! plans while bushy has Catalan C_{n-1} · n! — the difference starts to matter at n = 6 and dominates by n = 10.

Here is the critical point about rule-based reordering: the rewrites do not pick an order. Commutativity and associativity are equivalences; applying them blindly in a fixed-point loop would never terminate (swap, swap back, swap again). The rule-based layer only enumerates the equivalent orderings and hands them to the cost-based layer. Calcite's VolcanoPlanner tracks multiple equivalent subtrees in a memo structure (a DAG where each node is a logical expression and its equivalence class is a set of physical alternatives). PostgreSQL's optimiser uses dynamic programming over the System R join-enumeration algorithm (Selinger 1979). Both require statistics — join selectivity estimates, row counts — to pick a specific order, which is why this is where rule-based ends and cost-based begins.

What rule-based reordering does not do is pick an order: choosing between (A ⋈ B) ⋈ C and A ⋈ (B ⋈ C) requires knowing how many rows come out of A ⋈ B versus B ⋈ C, which needs statistics. That is the next chapter.

Other rules worth knowing

A rule pack doesn't end at pushdowns and join laws. Some of the other rewrites every production engine ships:

Constant folding. SELECT x + 2 + 3 FROM t becomes SELECT x + 5 FROM t at plan time. A single arithmetic operation saved per row, multiplied by billions of rows per day in a warehouse, is worth real money. The optimiser walks the expression tree looking for operator nodes whose children are all literals and collapses them.

Column pruning. The top-level SELECT a, b FROM t keeps only columns a and b; every other column can be dropped from every intermediate result. In a columnar store this turns into "don't even open the column file for c"; in a row store it shrinks tuples. The rule propagates a "required columns" set top-down through the tree and the scan at the bottom reads only the reachable ones.

Subquery unnesting. A correlated EXISTS (SELECT 1 FROM orders o WHERE o.user_id = u.id) would naively run the inner query once per outer row — an O(|users| · |orders|) nested loop. The optimiser detects the correlation predicate o.user_id = u.id and rewrites the subquery into a semi-join: users ⋉_{u.id = o.user_id} orders, which the hash-join engine can evaluate once with a single pass. Unnesting gets complicated — NOT EXISTS, correlated aggregates, lateral joins — but the pattern is the same: turn nested iteration into a single join with modified output semantics.

Union rewrites. R UNION S deduplicates, which costs a sort or a hash-distinct. R UNION ALL S does not. If the optimiser can prove that R and S produce disjoint row sets (say, R = σ_{x='a'}(T) and S = σ_{x='b'}(T) — disjoint by construction), it can rewrite UNION to UNION ALL and skip the deduplication. This is a big win for partition-union patterns common in time-series tables.

Outer-join to inner-join. R ⟕ S WHERE S.col IS NOT NULL is equivalent to R ⋈ S — the left outer join would have produced NULL rows for unmatched left tuples, and the filter kills exactly those rows, so you can drop the outer semantics and join inner-only. This matters because inner joins reorder freely under associativity while outer joins do not.

A full worked example

Predicate pushdown on a 10 GB join

You have two tables in a warehouse:

  • users: 10^7 rows × 100 bytes = 1 GB. Stored in 8 KB pages, B_U = 1.25 · 10^5 pages.
  • orders: 10^8 rows × 100 bytes = 10 GB. B_O = 1.25 · 10^6 pages.

Buffer pool: M = 10^4 pages (80 MB).

The query:

SELECT u.name
FROM users u JOIN orders o ON u.id = o.user_id
WHERE u.country = 'IN' AND o.year = 2026;

Filter selectivities (from the query description, not statistics): 10% of users are Indian, 5% of orders are from 2026. These numbers only enter the cost after pushdown exposes separate filters — before pushdown the whole join runs on the full tables.

Cost without pushdown. The naive tree is π(σ(U ⋈ O)): full scans of both tables into a grace hash join, which costs 3(B_U + B_O) page operations plus the filter pass on the output.

\text{I/O}_{\text{no-pushdown}} = 3(B_U + B_O) = 3 \cdot (1.25 \cdot 10^5 + 1.25 \cdot 10^6) = 4.125 \cdot 10^6 \text{ pages}

At 0.5 ms per page on SSD, that is about 34 minutes of I/O time. Add the filter pass, which reads the ~10^8-row join output from its (already in-memory) buffer — call it CPU-bound and negligible compared to the I/O.

T_{\text{no-pushdown}} \approx 4.125 \cdot 10^6 \cdot 0.5 \text{ ms} \approx 2060 \text{ s} \approx 34 \text{ min}

Cost with pushdown. After the rule, each filter runs on its own base table before the join. Assume a plain scan-with-filter (no index) for symmetry, so the filters do not reduce scan cost — they reduce the join input size.

  • σ_{country='IN'}(users): reads B_U = 1.25 · 10^5 pages, emits 10% = 10^6 rows = 1.25 · 10^4 pages worth of data.
  • σ_{year=2026}(orders): reads B_O = 1.25 · 10^6 pages, emits 5% = 5 · 10^6 rows = 6.25 · 10^4 pages.

Now the hash join sees the much smaller inputs: B'_U = 1.25 · 10^4, B'_O = 6.25 · 10^4. With M = 10^4 buffer pages, B'_U is just over memory, so a single partitioning pass of grace hash suffices. Cost:

\text{I/O}_{\text{join after filter}} = 3(B'_U + B'_O) = 3(1.25 \cdot 10^4 + 6.25 \cdot 10^4) = 2.25 \cdot 10^5

Plus the original full scans for the filter pass:

\text{I/O}_{\text{scans}} = B_U + B_O = 1.375 \cdot 10^6

Total:

\text{I/O}_{\text{pushdown}} = 1.375 \cdot 10^6 + 2.25 \cdot 10^5 = 1.6 \cdot 10^6 \text{ pages}
T_{\text{pushdown}} \approx 1.6 \cdot 10^6 \cdot 0.5 \text{ ms} \approx 800 \text{ s} \approx 13 \text{ min}

Why the ratio is "only" 2.5× here and not 20×: the base-table scans themselves dominate, and pushdown doesn't save that cost unless there is an index. With an index on users(country) and orders(year), the filter scans become O(|output|) instead of O(|input|) — roughly 1.25·10^4 + 6.25·10^4 = 7.5·10^4 pages for both scans, bringing total I/O down to 3·10^5 pages and wall-clock to ~150 seconds. The full ~14× speedup only shows up when pushdown is combined with index selection, which is a separate optimisation on top. Pushdown's pure contribution is "shrink the join input"; indexes then amplify it by shrinking the scan input too.

With indexes: Index-Scan(users, country='IN') reads 1.25 · 10^4 pages; Index-Scan(orders, year=2026) reads 6.25 · 10^4 pages; hash join on the shrunk inputs reads another 2.25 · 10^5 pages. Total ~3 · 10^5 pages = 150 s. Compared to the 2060 s no-pushdown baseline, that is a 14× speedup — well into the "10–100×" range claimed at the top.

Output size. The join produces rows where u.country='IN' AND o.year=2026 AND u.id = o.user_id. If join selectivity on user_id is the reciprocal of |users| (~10^{-7}) and the inputs are 10^6 × 5·10^6, the output is 5 · 10^5 rows — half a million matched user-order pairs, a sensible size for a report.

Going deeper

These are the implementation details that turn "apply some equivalence laws" into a production optimiser. Two are worth the read.

Spark Catalyst — rules as tree transformations

Catalyst models a logical plan as an immutable case-class tree. A rule is an object extends Rule[LogicalPlan] with an apply(plan: LogicalPlan): LogicalPlan method. A rule executor runs batches of rules to fixed point:

object PushPredicateThroughJoin extends Rule[LogicalPlan] {
  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    case Filter(cond, j @ Join(left, right, Inner, Some(joinCond), hint)) =>
      val (leftC, rightC, otherC) = split(cond, left.outputSet, right.outputSet)
      val newLeft  = leftC.map(Filter(_, left)).getOrElse(left)
      val newRight = rightC.map(Filter(_, right)).getOrElse(right)
      val newJoin  = Join(newLeft, newRight, Inner, Some(joinCond), hint)
      otherC.map(Filter(_, newJoin)).getOrElse(newJoin)
  }
}

The transform combinator is a visitor over the immutable tree: it applies the partial function to every node top-down (transform is an alias for transformDown) and returns a new tree with the rewrites threaded through. If the partial function doesn't match, the node is returned unchanged. This is what makes rule-based rewriting composable — adding a rule is writing one case and registering it in a batch. Catalyst ships roughly 100 such rules.

The RuleExecutor is exactly the fixed-point loop from this chapter, with one extra feature: rules are grouped into batches, each batch with its own termination strategy (fixed point, or "once" for rules that should apply exactly once). This avoids interleaving rules that interact badly.

Reference: Catalyst paper (Armbrust et al., SIGMOD 2015).

Bloom filters as "runtime filter pushdown"

A technique that looks like predicate pushdown but runs at execution time. Consider A ⋈ B where |A| is small (a filtered dimension, 10^5 rows) and |B| is large (a fact table, 10^9 rows). Classic predicate pushdown cannot help: there's no explicit predicate on B, only the implicit equijoin.

A Bloom filter pushdown builds a Bloom filter on A's join keys during A's scan, ships it down to B's scan, and B uses it to skip rows whose join key definitely isn't in A. This is not a correctness-preserving logical rewrite — the tree remains A ⋈ B — but it is a runtime optimisation that reduces B's effective row count by orders of magnitude. Impala pioneered it as "runtime filters"; Presto, Spark, and Snowflake now all ship variants.

It sits between rule-based and cost-based optimisation: the decision whether to generate a Bloom filter is cost-based (only useful when A is small and selective), but the rewrite shape is rule-based (insert a BloomProbe node into B's scan). Modern optimisers blur the line between the two halves.

Where this leads next

Rule-based rewrites are the deterministic half of the optimiser. They turn whatever the parser hands them into a canonical, pushed-down, pruned tree that commits to no particular join order. What rule-based leaves open is exactly the interesting question — given several equivalent trees, which one runs fastest on your actual data?

That needs three things the next chapters build: statistics about the data (row counts, selectivity estimates), a cost model that turns a candidate plan plus those statistics into an estimated runtime, and a search algorithm that enumerates the equivalent plans the rewrites exposed and picks the cheapest.

The physical plan choice — nested-loop vs hash vs sort-merge, index scan vs sequential, hash agg vs sort agg — is also cost-based. The rule-based layer produces a logical plan (operators, order, predicates); the cost-based layer picks the physical implementation of each operator.

Everything you saw in the previous chapters — nested-loop join, hash join, sort-merge join, the iterator model — fits into that picture as "physical operators the cost-based layer chooses among." Rule-based rewrites pick which subtree goes into them; cost-based optimisation picks which operator runs each subtree. Together they produce the plan you see in EXPLAIN ANALYZE.

References

  1. Selinger, Astrahan, Chamberlin, Lorie, Price, Access Path Selection in a Relational Database Management System, SIGMOD 1979 — the System R paper that defined the rule-based plus cost-based architecture every modern optimiser still uses. Section 3 describes the predicate-pushdown rule; sections 4–5 are cost-based join enumeration.
  2. Graefe, Query Evaluation Techniques for Large Databases, ACM Computing Surveys 25(2), 1993 — the canonical survey. Chapter 3 on rule-based transformations is still the best reference for the taxonomy of rewrites.
  3. Armbrust, Xin, Lian, Huai, Liu, Bradley, Meng, Kaftan, Franklin, Ghodsi, Zaharia, Spark SQL: Relational Data Processing in Spark, SIGMOD 2015 — describes Catalyst's rule framework and shows the Rule[LogicalPlan] abstraction used in the Going Deeper section.
  4. Begoli, Camacho-Rodríguez, Hyde, Mior, Lemire, Apache Calcite: A Foundational Framework for Optimized Query Processing Over Heterogeneous Data Sources, SIGMOD 2018 — the Calcite paper, covering HepPlanner (rule-based) and VolcanoPlanner (cost-based) side by side.
  5. PostgreSQL source tree, src/backend/optimizer/prep/prepjointree.c — the production predicate-pushdown implementation, including outer-join reduction and subquery unnesting. Long but clean; the comments are half the reading.
  6. Neumann and Kemper, Unnesting Arbitrary Queries, BTW 2015 — the modern reference for subquery unnesting, generalising the classical EXISTS→semi-join rewrite to arbitrary correlated subqueries.