In short

A prepared statement is a server-side object that turns the SQL pipeline from parse-plan-execute into parse-plan-once, execute-many. You send PREPARE foo(int) AS SELECT * FROM users WHERE id = $1 once; the server parses the SQL text into a syntax tree, resolves names against the catalog, rewrites the tree algebraically, and runs the cost-based optimiser to produce a physical plan. The plan is stored under the handle foo. Subsequent EXECUTE foo(42) calls skip parsing and planning entirely — they bind the parameter into placeholders already embedded in the plan and run it.

For short OLTP queries, parse-plus-plan can be 50-80% of total server CPU cost. A SELECT ... WHERE id = $1 that takes 1 ms end-to-end typically spends 0.6-0.8 ms in the parser, binder, rewriter, and planner, and only 0.2-0.4 ms actually reading the row. Prepared statements remove that overhead on every call after the first, so per-call server CPU drops by a factor of two to five.

Plan caching is the generalisation. Even without an explicit PREPARE, modern engines store compiled plans keyed by normalised SQL text and reuse them when the same shape arrives again. SQL Server and Oracle cache aggressively by default; Postgres does it per-connection through the extended wire protocol; MySQL 5 shipped a heavyweight query cache (a result cache, not a plan cache) that version 8 removed as more trouble than it was worth.

The gotcha: a cached plan is compiled against one set of parameter values (or against average statistics in the absence of values), and it is not re-examined on every call. For a column with wildly skewed selectivity — say country where 90% of rows are 'IN' and 0.001% are 'XX' — the optimal plan for the common value is a sequential scan, and the optimal plan for the rare value is an index scan. A single cached plan cannot be right for both, and when it is wrong it is wrong by three or four orders of magnitude. Postgres's plan_cache_mode (auto / force_generic_plan / force_custom_plan) is how you tell the engine which side of this tradeoff your workload sits on. This chapter walks the mechanics, builds a toy prepared-statement cache in roughly forty lines of Python, shows the generic-plan pathology with numbers, and explains why transaction-mode PgBouncer quietly breaks server-side prepared statements unless you know what to ask your driver for.

An OLTP web request that begs for amortisation

Picture a backend serving a common request: "fetch the last ten orders for this user". Ten thousand times per second, the database receives the same SQL text with a different user id spliced in.

SELECT id, placed_at, total_cents
FROM orders
WHERE user_id = $1
ORDER BY placed_at DESC
LIMIT 10;

Without prepared statements, every one of those 10,000 calls drives the full pipeline. The server parses the SQL into an AST. It resolves orders, user_id, placed_at, and total_cents against the system catalog, pulling column types and statistics. It applies rule-based rewrites (nothing to do here). It runs the cost-based optimiser, which considers a sequential scan, an index scan on (user_id), and an index-only scan on (user_id, placed_at) — and picks the last, because it can serve both the filter and the ORDER BY from one B-tree traversal. Finally it executes the plan, probes the B-tree, fetches ten tuples, and returns them.

Total server CPU: approximately 1 ms per call, dominated by parse-plus-plan, not execution. At 10,000 calls per second, the planner becomes the bottleneck.

Prepare the statement once per connection, execute it 10,000 times:

PREPARE q AS SELECT id, placed_at, total_cents FROM orders
             WHERE user_id = $1 ORDER BY placed_at DESC LIMIT 10;

EXECUTE q(42);
EXECUTE q(117);
EXECUTE q(9999);
...

The server parses and plans exactly once. Every EXECUTE binds the parameter and runs the index scan. Server CPU per call drops from ~1 ms to ~0.2 ms — a 5× improvement with zero semantic change. Prepared statements are one of the highest-leverage changes you can make to an OLTP stack, and every mature driver knows it.

What a prepared statement actually stores

The word "plan" is doing real work here. A prepared statement is not the SQL string with holes in it. It is a fully compiled, cost-evaluated physical operator tree, ready to execute, with placeholder nodes wherever a parameter goes.

Walk the pipeline backwards from the thing that runs. For the query above the server produces something like:

Limit (10)
 └─ IndexScan(orders, using=idx_orders_user_placed,
              condition: user_id = $1, order: placed_at DESC)

That tree is the plan. The $1 is a real object in the plan — a Param node that the executor reads from an "execution context" table filled in at EXECUTE time. Walking the tree and reading $1 is a pointer dereference. Parsing the SQL text and arriving at this same tree afresh is a thousand times more expensive.
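A minimal sketch of that mechanism, with names invented for illustration: the compiled plan holds Param nodes, and EXECUTE fills an execution context that each Param reads at runtime.

```python
from dataclasses import dataclass

@dataclass
class Param:
    """Placeholder baked into the compiled plan; $1 becomes Param(1)."""
    index: int

    def eval(self, ctx: dict[int, object]) -> object:
        return ctx[self.index]        # a dict lookup, not a re-parse

@dataclass
class IndexScan:
    index_name: str
    condition: Param                  # the parameterised filter key

    def run(self, ctx: dict[int, object]) -> str:
        key = self.condition.eval(ctx)
        return f"probe {self.index_name} for {key}"

# EXECUTE q(42): fill the execution context, then walk the cached tree
plan = IndexScan("idx_orders_user_placed", Param(1))
print(plan.run({1: 42}))   # probe idx_orders_user_placed for 42
```

Binding a new parameter is the only per-call work: the tree itself is never rebuilt.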

The plan is reached by a three-stage pipeline:

  1. Parse. SQL text → abstract syntax tree. Pure string work.
  2. Analyse (bind). AST → algebraic tree, with every column resolved against the catalog and every type inferred. Checks that user_id exists and has type integer, that $1 is declared as int, that the types are comparable.
  3. Rewrite and plan. Algebraic tree → physical operator tree. This is where rule-based rewrites and cost-based optimisation run. The output is the plan above.

A prepared statement caches the output of stage 3. Everything upstream runs once per PREPARE; nothing upstream runs per EXECUTE.

One subtlety: at plan time the server does not know what value $1 will take, so the optimiser must either guess the value's selectivity (using average statistics) or assume a specific value (the first execution's, or one the client hints). Postgres does the former unless you force custom plans; SQL Server does the latter and calls it "parameter sniffing". Both approaches have failure modes we will see below.
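The two options fit in a few lines of toy code (function names are illustrative, not engine internals):

```python
def generic_selectivity(ndv: int) -> float:
    """Option 1: assume an average value, 1/NDV for an equality filter."""
    return 1.0 / ndv

def sniffed_selectivity(histogram: dict[str, float], first_value: str,
                        default: float = 0.001) -> float:
    """Option 2: peek at the first value and read its histogram fraction."""
    return histogram.get(first_value, default)

hist = {"IN": 0.90, "US": 0.07, "GB": 0.02}
print(generic_selectivity(4))           # 0.25, whatever the parameter is
print(sniffed_selectivity(hist, "IN"))  # 0.9, then locked in for 'XX' too
```

Neither estimate can be right for every parameter of a skewed column; the failure modes below follow directly from that.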

The parse/plan/execute pipeline, with and without caching

The shape of what runs per call changes dramatically.

[Figure: what runs per call, with and without prepared statements. Three pipelines stacked vertically. No prepare: every call runs Parse, Analyse, Rewrite, Plan, Execute. Prepare, first call: the same five stages run once and the plan is cached. Prepare, re-execute: only Bind params and Execute run; the earlier stages are skipped. The four left-hand stages are a shared fixed cost; a prepared statement pays them once, then amortises it across every execute.]

The fraction of total time each stage consumes depends heavily on query complexity. For a one-table SELECT ... WHERE id = $1 the planner is trivial, but so is execution, so parse-and-plan is easily 70% of total. For a ten-table join with subqueries, planning can take 50-200 ms and dominate execution by a factor of ten; here a prepared statement is the difference between a usable endpoint and a timeout.

The rule of thumb: the more complex the plan, the more valuable the prepare, up to the point where the query's own execution time swamps any per-call planning savings.

A toy prepared-statement cache in Python

The essence of the mechanism fits in forty lines. The cache is a dict keyed by statement name; the value is a compiled plan. prepare does the expensive work; execute binds parameters and runs.

# query/prepared.py
# parse() and analyse() are the stage-1/2 helpers of the toy engine,
# assumed to be defined alongside this module.
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Plan:
    """A physical operator tree with parameter placeholders."""
    op: str                    # e.g. "index_scan"
    table: str
    run: Callable[..., list]   # bound at plan time, takes parameters

class PreparedStatementCache:
    def __init__(self, catalog, planner):
        self.plans: dict[str, Plan] = {}
        self.catalog = catalog
        self.planner = planner

    def prepare(self, name: str, sql: str) -> None:
        ast = parse(sql)                          # stage 1
        algebra = analyse(ast, self.catalog)      # stage 2
        plan = self.planner.plan(algebra)         # stage 3
        self.plans[name] = plan

    def execute(self, name: str, params: tuple[Any, ...]) -> list:
        plan = self.plans.get(name)
        if plan is None:
            raise LookupError(f"no prepared statement {name!r}")
        return plan.run(*params)                  # bind + execute

    def deallocate(self, name: str) -> None:
        self.plans.pop(name, None)

parse, analyse, and planner.plan are the real work the production database runs for every ad-hoc query; in this toy the interesting move is that prepare calls all three and stashes the result, while execute skips straight to plan.run. The runtime shape matches what Postgres, MySQL, SQL Server, and Oracle all do — the names differ, the mechanism does not.
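A quick driver makes the amortisation visible. The stubs below stand in for the pipeline stages and count how often each runs, and the cache is restated inline so the snippet runs on its own; prepare pays all three stages once, while every execute touches none of them:

```python
from dataclasses import dataclass
from typing import Any, Callable

calls = {"parse": 0, "analyse": 0, "plan": 0}   # how often each stage ran

def parse(sql: str) -> dict:
    calls["parse"] += 1
    return {"ast": sql}

def analyse(ast: dict, catalog: dict) -> dict:
    calls["analyse"] += 1
    return ast

@dataclass
class Plan:
    run: Callable[..., list]

def plan_stage(algebra: dict) -> Plan:
    calls["plan"] += 1
    return Plan(run=lambda *params: [("row-for", params)])

# the cache, restated inline so this snippet runs standalone
plans: dict[str, Plan] = {}

def prepare(name: str, sql: str) -> None:
    plans[name] = plan_stage(analyse(parse(sql), catalog={}))

def execute(name: str, *params: Any) -> list:
    return plans[name].run(*params)             # bind + execute only

prepare("q", "SELECT ... FROM orders WHERE user_id = $1")
for uid in (42, 117, 9999):
    execute("q", uid)

print(calls)   # {'parse': 1, 'analyse': 1, 'plan': 1}
```

Three executions, one trip through the expensive stages; at 10,000 calls per connection the ratio is what buys the throughput.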

Extended wire protocol — unnamed prepared statements

Explicit PREPARE/EXECUTE is one of two ways a Postgres client uses prepared statements. The other is invisible and more common: the extended query protocol, used by every modern driver.

The Postgres wire protocol offers two query paths. Simple query — one message type Q carrying raw SQL — parses, plans, and executes; no caching. Extended query splits the same work across three messages: Parse (SQL text plus a statement name, server produces and caches the plan), Bind (parameters go in, producing a "portal"), and Execute (run the portal).

Drivers exploit this to get prepared-statement behaviour without an explicit PREPARE from the application. A query through libpq's PQexecParams uses the unnamed prepared statement: cached under the empty name, and replaced the next time a Parse message arrives. A driver that deduplicates client-side (psycopg3, pgjdbc, pgx) keeps a map from SQL text to statement name, sends a named Parse once per new SQL, and uses Bind/Execute thereafter. Your application sees no PREPARE; the server still skips parse and plan on every repeat.

The driver-level default matters. psycopg3 prepares automatically after the fifth execution of the same SQL. pgjdbc does the same with prepareThreshold=5. asyncpg prepares every query on first use. pgx prepares on demand. If your throughput ceiling is the planner, check your driver's prepared-statement config before reaching for bigger machines.
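The threshold behaviour is easy to model. Here is a sketch of the client-side bookkeeping (a hypothetical class, not any driver's real API), with a counter for Parse messages sent to the server:

```python
class AutoPreparingClient:
    """Prepare a statement server-side after `threshold` executions of
    the same SQL text, in the style of psycopg3's and pgjdbc's defaults."""

    def __init__(self, threshold: int = 5):
        self.threshold = threshold
        self.execution_counts: dict[str, int] = {}
        self.named: dict[str, str] = {}    # sql text -> server statement name
        self.parse_messages_sent = 0

    def execute(self, sql: str, params: tuple) -> str:
        if sql in self.named:
            # named statement already exists server-side: Bind/Execute only
            return f"Bind/Execute {self.named[sql]}"
        self.parse_messages_sent += 1      # unnamed statement: Parse each time
        n = self.execution_counts.get(sql, 0) + 1
        self.execution_counts[sql] = n
        if n >= self.threshold:
            self.named[sql] = f"_stmt_{len(self.named)}"
        return "Parse/Bind/Execute (unnamed)"

client = AutoPreparingClient()
for uid in range(10):
    client.execute("SELECT * FROM orders WHERE user_id = $1", (uid,))
print(client.parse_messages_sent)   # 5: the planner ran five times, not ten
```

Ten executions, five trips through the planner; every call after the threshold is Bind/Execute against the cached plan.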

Plan caching beyond prepared statements

The idea generalises. Every major engine has mechanisms for "this query looks like one I have seen — can I reuse the plan?".

SQL Server's plan cache. Keyed by normalised SQL text (whitespace stripped, literals optionally substituted with parameters under "simple parameterisation"). The plan stays resident in the procedure cache — a region of the buffer pool — until memory pressure evicts it. SQL Server ships with aggressive text-match caching by default; it is a major source of the "parameter sniffing" war stories that shape DBA folklore.

Oracle's cursor cache. The library cache stores parsed SQL and cursors (plans). Oracle distinguishes soft parses (plan found, reuse) from hard parses (plan built from scratch). The classic Oracle performance advice — "always use bind variables" — exists because literal-laden SQL produces a new cache entry per literal, evicting useful plans; bind variables keep the cache dense.

MySQL's old query cache. Stored full result sets keyed by exact query text, invalidated on any write to any table touched by the query. Because even a single write invalidated every cache entry for the affected table, and because the cache was protected by a global mutex that serialised all lookups, the feature hurt more than it helped under real concurrency. MySQL 8 removed it entirely. The lesson: caching the plan is very different from caching the result; the former amortises parsing, the latter is a correctness minefield.

Postgres's path. Per-connection caching through the extended protocol; no cross-connection plan cache, because plans depend on session settings (search_path, work_mem, GUCs) that vary between connections. Sometimes cited as a weakness, but it sidesteps the parameter-sniffing problem other engines wrestle with.

The strategies vary along one axis: how aggressively does the engine decide two queries are the same shape? Exact text match is safe but rarely hits. Normalised text (whitespace, case, literal substitution) hits more often but needs careful invalidation. The production-grade answer is usually "cache what you can per connection, and pay the planner cost on the rest".
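Normalisation is mostly regular-expression work. A toy fingerprint in the spirit of pg_stat_statements or SQL Server's simple parameterisation (crude on purpose; real engines normalise the parse tree, not the text):

```python
import re

def fingerprint(sql: str) -> str:
    s = re.sub(r"\s+", " ", sql.strip())        # collapse whitespace
    s = re.sub(r"'(?:[^']|'')*'", "?", s)       # string literals -> ?
    s = re.sub(r"\b\d+(?:\.\d+)?\b", "?", s)    # numeric literals -> ?
    return s.lower()

a = fingerprint("SELECT * FROM users WHERE country = 'IN' AND age > 30")
b = fingerprint("select *  from users where country = 'XX' and age > 18")
print(a == b)   # True: same shape, different literals, one cache key
```

Two syntactically different queries coalesce into one cache key, which is exactly what makes the cache dense enough to hit.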

The generic plan gotcha

Here is the hard part.

When Postgres prepares a statement with a parameter, it does not know what value will be passed. To plan, it has to estimate the filter's selectivity — what fraction of rows pass WHERE country = $1? It has two options:

  1. Assume an average value. Use 1/NDV for equality (one over the number of distinct values in country). Gives a selectivity number that is correct "on average" across all values.
  2. Peek at the first value passed and plan for that specific value. Read the histogram bucket for 'IN' and plan assuming 90% selectivity. Correct for this parameter, wrong for every parameter with different distribution.

Postgres calls option (1) a generic plan and option (2) a custom plan. SQL Server and Oracle default to option (2) and call the approach parameter sniffing — the plan sniffs the first execution's parameter and commits to it. Either choice is a bet on uniformity.

Consider a users table with 100M rows and a country column with this distribution:

country        row count     selectivity
'IN'          90,000,000     90%
'US'           7,000,000     7%
'GB'           2,000,000     2%
'XX' (rare)        1,000     0.001%

For WHERE country = 'IN' the optimal plan is a sequential scan — the filter matches 90% of rows, and an index scan would still have to fetch almost every heap page through random I/O. For WHERE country = 'XX' the optimal plan is an index scan — 1,000 rows out of 100 million is tiny. The 1/NDV generic plan with NDV ≈ 4 sees 25% selectivity, estimates 25 million matches, and picks a sequential scan — right for 'IN', wrong for 'XX' by a factor of 25,000.

# query/selectivity_demo.py
from dataclasses import dataclass

@dataclass
class Stats:
    total_rows: int
    ndv_country: int
    mcv: dict[str, int]  # most-common-value → row count

def cost_seq_scan(stats: Stats) -> float:
    return stats.total_rows * 1.0  # unit cost per row scanned (toy)

def cost_index_scan(matched: int) -> float:
    # index descent + random I/O per matched row (toy constants)
    return 3.0 + matched * 4.0

def generic_cost(stats: Stats) -> tuple[str, float]:
    sel = 1.0 / stats.ndv_country          # average-value selectivity
    matched = int(stats.total_rows * sel)
    seq, idx = cost_seq_scan(stats), cost_index_scan(matched)
    return ("seq", seq) if seq < idx else ("idx", idx)

def custom_cost(stats: Stats, value: str) -> tuple[str, float]:
    matched = stats.mcv.get(value, 100)    # assume 100 rows for unseen values
    seq, idx = cost_seq_scan(stats), cost_index_scan(matched)
    return ("seq", seq) if seq < idx else ("idx", idx)

Run both against the skewed distribution: the generic plan picks seq with cost 100 million; the custom plan picks seq for 'IN' (correct) and idx for 'XX' (cost 3 + 4000 = 4003 — 25,000× cheaper than the generic plan's sequential scan for a thousand matching rows).
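The same comparison as runnable numbers, with the two toy cost formulas inlined so the snippet stands alone:

```python
TOTAL = 100_000_000
NDV = 4
MCV = {"IN": 90_000_000, "US": 7_000_000, "GB": 2_000_000, "XX": 1_000}

def pick_plan(matched: int) -> tuple[str, float]:
    seq = TOTAL * 1.0                  # sequential scan touches every row
    idx = 3.0 + matched * 4.0          # index descent + random I/O per match
    return ("seq", seq) if seq < idx else ("idx", idx)

print(pick_plan(TOTAL // NDV))   # ('seq', 100000000.0)  generic: 25M estimated
print(pick_plan(MCV["IN"]))      # ('seq', 100000000.0)  custom for 'IN'
print(pick_plan(MCV["XX"]))      # ('idx', 4003.0)       custom for 'XX'
```

The generic plan is indistinguishable from the right answer for 'IN' and catastrophically wrong for 'XX'; nothing in the cached plan can tell the two apart.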

A generic plan trades correctness for planning time. When the trade is good — uniform distributions, PK lookups, equality filters on high-cardinality columns — prepared statements are pure upside. When the trade is bad — skewed columns, variable IN-lists, range predicates on tail-heavy data — you get subtle performance cliffs that look like mystery regressions.

Postgres's custom-vs-generic heuristic

Postgres does not commit up-front. It runs a small adaptive algorithm, one of the more pragmatic bits of engineering in the codebase.

For the first five executions of a prepared statement, Postgres plans a custom plan each time, using the specific parameter values. It records the cost of each. On the sixth execution, it also builds the generic plan once — with no parameter knowledge, using average statistics — and records its cost. If the generic plan's estimated cost is within some threshold of the average custom-plan cost (the code uses roughly a 10% margin, with adjustments for planning time saved), the planner switches to generic and uses it thereafter. Otherwise it keeps paying for custom plans indefinitely.

The switch decision is revisited; if statistics change enough (via ANALYZE), the generic plan is rebuilt.
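The decision rule can be sketched in a few lines (a simplification; the real logic lives in Postgres's plancache.c and folds in a more careful planning-cost estimate):

```python
def should_use_generic(custom_costs: list[float],
                       generic_cost: float,
                       planning_cost: float) -> bool:
    """After five custom plans, switch to generic if its estimated cost
    does not exceed the average custom cost plus the planning cost we
    would keep paying on every future execution."""
    if len(custom_costs) < 5:
        return False                   # still collecting custom plans
    avg_custom = sum(custom_costs) / len(custom_costs)
    return generic_cost <= avg_custom + planning_cost

print(should_use_generic([100.0] * 5, generic_cost=105.0,
                         planning_cost=10.0))   # True: close enough, switch
print(should_use_generic([100.0] * 5, generic_cost=500.0,
                         planning_cost=10.0))   # False: keep custom plans
```

The whole comparison runs on estimated costs, which is why a generic plan whose estimate looks fine can still be disastrous for a parameter the estimate never saw.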

The controlling setting is plan_cache_mode:

  auto (the default). Run the adaptive heuristic above: five custom plans, then compare against the generic plan.
  force_custom_plan. Re-plan with the actual parameter values on every execution; never switch to generic.
  force_generic_plan. Build and use the generic plan from the first execution; never pay for custom planning again.

In practice the advice is: leave auto for uniform distributions and primary-key lookups, set force_custom_plan (per session or per role) for statements that filter on skewed columns, and treat force_generic_plan as a plan-stability tool you enable only after verifying the generic plan is acceptable for every parameter.

SQL Server's equivalent is the OPTION (RECOMPILE) query hint, which forces a per-execution plan and is the standard escape hatch when parameter sniffing bites. Oracle has CURSOR_SHARING = EXACT | FORCE | SIMILAR — the same tradeoff surface, under different names.

Prepared statements and connection poolers — the pain point

Prepared statements are per-connection state on the server. The PREPARE q AS ... and the subsequent EXECUTE q(...) must hit the same backend process, because that is where the plan lives.

This collides head-on with transaction-mode PgBouncer. In transaction pooling, a client's server connection is handed back to the pool at every COMMIT; the next statement from the same client may land on a different backend. PREPARE q AS ... runs on backend A; the subsequent EXECUTE q(42) runs on backend B, which has never seen q, and you get prepared statement "q" does not exist.

The failure mode is dev-vs-prod-flavoured: developers run against a local Postgres with no pooler (session mode effectively), prepared statements work perfectly, the feature lands. Production traffic hits PgBouncer in transaction mode, prepared statements fail sporadically, dashboards spike. The stack trace points at the ORM; the cause is the pooling mode.
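A toy simulation of the failure (hypothetical classes): each backend holds its own statement map, and transaction pooling routes each statement to whichever backend is free.

```python
import itertools

class Backend:
    def __init__(self, name: str):
        self.name = name
        self.statements: dict[str, str] = {}   # per-backend plan cache

    def run(self, command: str, arg: str) -> str:
        if command == "PREPARE":
            self.statements[arg] = "plan"
            return f"{self.name}: prepared {arg}"
        if arg not in self.statements:         # EXECUTE on an unknown name
            raise LookupError(f'prepared statement "{arg}" does not exist')
        return f"{self.name}: executed {arg}"

class TransactionPool:
    """Round-robin: each transaction may land on a different backend."""
    def __init__(self, backends: list[Backend]):
        self.rotation = itertools.cycle(backends)

    def run(self, command: str, arg: str) -> str:
        return next(self.rotation).run(command, arg)

pool = TransactionPool([Backend("A"), Backend("B")])
pool.run("PREPARE", "q")        # lands on backend A
try:
    pool.run("EXECUTE", "q")    # lands on backend B, which never saw "q"
except LookupError as e:
    print(e)                    # prepared statement "q" does not exist
```

The plan lives in one backend's memory; the pool's routing is blind to it.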

Three workarounds, in ascending order of operational cost:

  1. Driver-level protocol-only prepare. The driver sends Parse/Bind/Execute inside a single transaction, so the whole round-trip lands on one backend. Nothing persists across transactions; you lose plan reuse between calls but keep parameterisation and injection protection.
  2. PgBouncer 1.21+ with max_prepared_statements. Recent PgBouncer versions track which statements each client has prepared and replay the Parse on whatever backend the client lands on. Requires max_prepared_statements > 0 in pgbouncer.ini.
  3. Session-mode pooling. Each client gets a dedicated server connection for its whole session. Prepared statements work exactly like without a pooler; you lose most of transaction pooling's multiplexing.

Most production shops that rely on prepared statements now run PgBouncer 1.21+ with prepared-statement tracking enabled, or accept the loss of server-side plan reuse in exchange for protocol-level prepare.

The skewed-country query, prepared vs unprepared, on a synthetic 100-million-row users table.

Schema: users(id bigint, country text, ...) with a B-tree index on country. Distribution matches the earlier table: 90M 'IN', 7M 'US', 2M 'GB', 1K 'XX', others filling the tail. Run four scenarios on a warmed cache.

Scenario 1: ad-hoc query, no prepare. EXPLAIN ANALYZE SELECT count(*) FROM users WHERE country = 'XX'; Postgres peeks at the literal 'XX', looks up the most-common-value list, sees 1,000 rows, picks index scan. Runtime: 0.8 ms. The optimiser uses the actual literal; no cached plan is involved; works perfectly.

Scenario 2: prepared, plan_cache_mode = auto, queried with 'IN' first. First five calls use custom plans; Postgres picks sequential scan for 'IN' (correct) with estimated cost 1,000,000. Sixth call triggers generic-plan evaluation. Generic plan uses 1/NDV ≈ 1/50 → estimated 2M rows → sequential scan, cost 1,000,000. Generic is cheaper than custom by the avoided planning overhead; Postgres switches to generic. Now call EXECUTE q('XX'). The cached generic plan runs a sequential scan of 100 million rows to find 1,000 matches. Runtime: 4,200 ms, roughly 5,000× slower than the ad-hoc version. Dashboards light up.

Scenario 3: prepared, plan_cache_mode = force_custom_plan. Every EXECUTE re-plans. EXECUTE q('IN') → sequential scan, 900 ms. EXECUTE q('XX') → index scan, 0.8 ms. Correct in both cases. The planning overhead is ~0.3 ms per call — negligible against the execution time of either plan.

Scenario 4: prepared, plan_cache_mode = force_custom_plan, on a primary-key lookup. PREPARE p AS SELECT * FROM users WHERE id = $1; EXECUTE p(...). Every call re-plans; the plan is always "index scan on the PK" because selectivity is always 1 row; you are paying planner cost to get the same plan every time. On 10,000 QPS workloads this matters — leave this query at auto. force_custom_plan is a targeted tool for skewed columns, not a global default.

The ratio between scenarios 2 and 3 is the core lesson: the same query, the same parameter, a different plan_cache_mode setting, and a 5,000× performance difference. Prepared statements are not a free speedup; they are a contract, and the contract has a skewed-parameter clause.

Common confusions

"Prepared statements protect against SQL injection." True, and worth stating. Because parameters are sent as a separate protocol field, not concatenated into the SQL text, there is no parsing path that can interpret a parameter as SQL syntax. A literal '; DROP TABLE users; -- passed as $1 is just a string to the executor. This security benefit is independent of the performance benefit — even queries that execute only once should use parameters.

"Caching is free." Plans take memory — typically 10-100 KB each for non-trivial queries, sometimes much more. On a connection that prepares hundreds of statements (common with ORMs that generate SQL for every combination of optional filters), the plan cache can grow to tens of megabytes per backend. Multiply by hundreds of backends and it is a real number. Postgres's max_prepared_statements has no hard default cap; drivers that auto-prepare everything can exhaust backend memory.

"My driver uses prepared statements automatically." Depends on the driver, the version, and the config. pgjdbc defaults to non-prepared until the fifth execution, then prepares. psycopg3 defaults to the same threshold. asyncpg prepares everything. pg (node-postgres) does not prepare by default and requires { name: 'stmt1' } on each call. Check the actual behaviour before assuming.

"Plan cache invalidates when the schema changes." Yes, automatically. ALTER TABLE on any table a plan touches marks every cached plan on that table invalid; the next EXECUTE rebuilds the plan from scratch. ANALYZE does the same for plans that depend on the re-gathered statistics. This is the one thing the cache gets right for free — without it, a column type change would silently return wrong data until every connection reconnected.

"Prepared statements always help." Only for queries that repeat. A one-shot SELECT * FROM logs WHERE timestamp > now() - interval '1 hour' executed once from an admin panel pays the prepare overhead for no benefit. Prepare is an amortisation trick; if your workload has no repetition, you have nothing to amortise.

"Server-side prepared statements survive across the connection pool." They do not. Each server backend has its own plan cache. If your app-side pool closes and re-opens connections (because of max_lifetime, or because a PgBouncer restart dropped them), prepared statements disappear with the connection. Drivers typically handle this transparently by re-preparing on demand; be aware it happens.

Going deeper

Oracle's cursor sharing — literal-aware plan reuse

Oracle treats the cursor (parsed SQL plus plan plus runtime state) as the cache key, and offers three CURSOR_SHARING modes. EXACT is text-identity match — queries differing only in a literal like 42 vs 43 produce two cache entries. FORCE automatically substitutes every literal with a bind variable before caching — turns WHERE id = 42 into WHERE id = :sys_b0 — sharing plans across syntactically distinct queries. SIMILAR (deprecated) inspected statistics to decide per-literal whether to share. Oracle's long advice to "always use bind variables" exists because EXACT mode punishes literal-laden applications with plan-cache thrashing.

SQL Server's auto-parameterisation and the PSP problem

SQL Server does aggressive literal substitution through simple parameterisation and forced parameterisation. The result is high plan-cache density — almost every shape of query coalesces into one plan. The cost is the parameter sniffing problem (PSP): when the first parameter value is atypical, the sniffed plan is catastrophically wrong for typical values, and every future execution reuses the bad plan until the cache is flushed. The SQL Server DBA community has produced a literature of workarounds — OPTION (RECOMPILE), OPTIMIZE FOR hints, plan guides, DBCC FREEPROCCACHE — to cope.

The Oracle bind-variable-peeking story

Oracle added bind-variable peeking in 9i to drive planning from the first execution's parameter. The flaw: if that first execution was an outlier (rare value, small result set), the cached plan was locked in for every subsequent call with typical values. The canonical war story is a nightly report that warmed the cache with a point-lookup plan, leaving the daytime transactional workload running full-table scans until the DBA noticed. Oracle 11g added adaptive cursor sharing — cache multiple plans per statement, indexed by parameter range — which is closer to Postgres's per-execution custom plans than the old peeking behaviour.
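Adaptive cursor sharing amounts to caching one plan per selectivity range instead of one per statement. A toy sketch (bucket boundaries invented for illustration):

```python
import bisect
from typing import Callable

BUCKET_EDGES = [0.001, 0.01, 0.1]       # invented selectivity boundaries

def bucket(selectivity: float) -> int:
    return bisect.bisect_right(BUCKET_EDGES, selectivity)

class AdaptiveCursor:
    """One cached plan per selectivity bucket instead of one per statement."""

    def __init__(self, plan_for: Callable[[float], str]):
        self.plans: dict[int, str] = {}
        self.plan_for = plan_for

    def get(self, selectivity: float) -> str:
        b = bucket(selectivity)
        if b not in self.plans:
            self.plans[b] = self.plan_for(selectivity)   # plan once per bucket
        return self.plans[b]

cursor = AdaptiveCursor(lambda sel: "index_scan" if sel < 0.01 else "seq_scan")
cursor.get(0.90)       # seq_scan, cached for the high-selectivity bucket
cursor.get(0.00001)    # index_scan, a separate bucket and a separate plan
print(sorted(cursor.plans.values()))   # ['index_scan', 'seq_scan']
```

A rare value and a common value no longer fight over one cached plan; each selectivity range keeps its own.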

Result caching vs plan caching, and query fingerprinting

MySQL's old query cache stored results, not plans — and had to invalidate on any write to any cached table, a synchronisation cost that dwarfed the speedup on write-heavy workloads. Plan caches only invalidate on schema or statistics changes, orders of magnitude rarer. Tools like pg_stat_statements (Postgres) aggregate query statistics by fingerprint — the SQL text with literals normalised — which is the same idea the plan cache uses for lookup, exposed for monitoring.

Where this leads next

Build 8 continues with the operational boundary of a running database. Chapter 65 covers authentication and authorisation — SCRAM, certificate auth, external providers, and the role/grant system. Chapter 66 widens the lens to client-scale concerns — the interaction of pooling, prepared statements, and retry logic under real load.

Then Build 9 opens the replication story: the WAL as a replication source, streaming replication, synchronous versus asynchronous replicas, and the read-after-write problems every prepared-statement app runs into when reads land on a replica. The per-connection plan caches and custom-vs-generic decisions in this chapter survive across that boundary — each replica is another Postgres server running the same planner.
