In short
Opening a Postgres connection costs roughly 20 ms of round-trip time and 10 MB of server memory — because Postgres, unlike most modern databases, dedicates one OS process per connection (forked from the postmaster, loaded with system catalogs, authenticated, possibly TLS-wrapped). A web request that connects, runs a single SELECT, and disconnects spends about 95% of its time on the connection lifecycle and 5% on the actual query.
Connection pooling is the standard fix: keep a long-lived pool of already-open server connections, and hand them out to short-lived client requests. There are two layers in production.
Layer 1 — the application-side pool. HikariCP for the JVM, pgx's built-in pool in Go, psycopg_pool.ConnectionPool in Python. One pool per OS process, with min_size and max_size tunables. A request checks a connection out, uses it, and returns it.
Layer 2 — the external pooler. A separate process — PgBouncer being the canonical one, Odyssey and pgcat the newer multi-threaded alternatives — that accepts thousands of client connections on one side and multiplexes them onto a much smaller number of server connections on the other. The client sees a normal Postgres wire protocol and does not know the pooler is there.
Three pooling modes trade off compatibility against multiplexing. SESSION mode gives each client a dedicated server connection for its whole session — safe but no real multiplexing. TRANSACTION mode hands over a server connection only for the duration of a transaction (BEGIN→COMMIT) — great multiplexing, but breaks server-side prepared statements, LISTEN/NOTIFY, temporary tables, and session SET variables. STATEMENT mode hands over a server connection per statement — the highest multiplexing possible, and almost nothing stateful survives.
This chapter walks the connection lifecycle, builds a toy transaction-mode pooler in roughly sixty lines of Python, lists exactly what TRANSACTION mode sacrifices, and shows the production pgbouncer.ini variables that matter (max_client_conn, default_pool_size, server_idle_timeout, query_wait_timeout).
A Django app that drowns in handshakes
Picture a Django service taking 1000 HTTP requests per second, each running 3 SQL queries against Postgres. If the app opens a fresh connection per request and closes it at the end — the pattern you get from naive psycopg2.connect() calls with no pool — each connection costs about 20 ms end-to-end: TCP handshake, TLS handshake, Postgres startup message, auth round trip, fork() on the server, backend initialisation.
At 1000 rps, that is 1000 × 20 ms = 20 seconds of connection setup latency per wall-clock second. Why that is a concurrency number: 20 seconds of work must complete every second, which means 20 handshakes permanently in flight before a single query runs. If the app opens a fresh connection per query instead of per request, multiply by three: 60 seconds of handshake time per second of wall clock — more fork-and-initialise work than any single Postgres server can sustain.
Add pooling and the math collapses. Keep 50-100 long-lived server connections open. Each request checks one out (microseconds), runs its 3 queries, returns it. The handshake cost is paid once per pool connection at startup, then amortised over millions of requests. The p99 latency is dominated by query planning, not TCP setup. The gap between naive and pooled is 30× to 100× throughput for realistic web workloads. This chapter is about closing that gap without breaking the application.
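The arithmetic above fits in a back-of-envelope calculation. All figures are the chapter's illustrative numbers, not measurements:

```python
RPS = 1000                # HTTP requests per second
HANDSHAKE_MS = 20         # cost of one full connection open (illustrative)
QUERIES_PER_REQUEST = 3

# Seconds of handshake work generated per wall-clock second:
per_request_s = RPS * HANDSHAKE_MS / 1000          # connect once per request
per_query_s = per_request_s * QUERIES_PER_REQUEST  # connect once per query

# With a pool, the handshake is paid once per pool slot at startup,
# then amortised over every request the pool ever serves.
POOL_SIZE = 100
startup_cost_s = POOL_SIZE * HANDSHAKE_MS / 1000

print(per_request_s, per_query_s, startup_cost_s)  # 20.0 60.0 2.0
```

Twenty seconds of setup work per second for the naive pattern, versus a one-time two-second cost for the pooled one.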
The cost of a Postgres connection
A Postgres connection is not a file handle or a socket. It is a live OS process, forked by the postmaster at connection time and dedicated to one client for the duration of the session. Forking and initialising a Unix process is expensive in ways a file handle is not.
Client side. DNS lookup (0-5 ms, cached). TCP three-way handshake (1 RTT — 0.5-1 ms local, 50-100 ms cross-region). TLS handshake if required (2 RTTs for TLS 1.2, 1 RTT for TLS 1.3). Postgres StartupMessage exchanging protocol version and database name (1 RTT). Auth — SCRAM-SHA-256 needs 2 round trips. Parameter negotiation and the first ReadyForQuery (1 RTT).
Server side. The postmaster fork()s a new backend process. The backend loads system catalogs into its per-process relcache — pg_class, pg_attribute, pg_type, pg_namespace, plus entries for your search path. Why: every planner call needs fast schema access, so each backend pre-warms its relcache rather than paying the cost on the first query. It allocates work_mem, temp_buffers, and per-query state. Resident memory per idle backend: 5-15 MB, sometimes higher after the backend has run queries.
Put together: connection open time 20-50 ms, steady-state memory 10 MB per connection. Close the connection and the process exits, releasing everything.
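Counting the round trips from the client-side list above (TLS 1.3 and SCRAM-SHA-256, DNS cached) shows why the network half alone is expensive before the server even forks; the per-RTT figures are illustrative:

```python
# Round trips to open one connection, per the client-side breakdown above.
rtts = {
    "tcp_handshake": 1,
    "tls_1_3": 1,
    "startup_message": 1,
    "scram_auth": 2,
    "ready_for_query": 1,
}
total_rtts = sum(rtts.values())   # 6 round trips

for rtt_ms, label in [(0.5, "same rack"), (1.0, "same region"), (50.0, "cross-region")]:
    print(f"{label}: ~{total_rtts * rtt_ms:.1f} ms, before fork/catalog costs")
```

Six round trips at cross-region latencies already costs hundreds of milliseconds; the server-side fork and catalog loading add the rest of the 20-50 ms local figure.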
Postgres's default max_connections is 100; production tuning is 200-500. Above that, process count dominates — context switching, memory pressure, and Postgres's internal locks (ProcArrayLock in particular) turn into contention bottlenecks. Why: each backend registers itself in shared memory structures that the snapshot-taking code scans under a lock, so snapshot acquisition cost grows linearly with backend count — see MVCC snapshots for the mechanism. The process-per-connection model is a deliberate choice — it gives per-backend crash isolation — but it is why Postgres specifically needs pooling more than most databases.
Application-side connection pools
The simplest pooling lives inside your application process. A pool object holds N already-open connections; requests check one out, use it, return it. If the pool is empty, the checkout blocks (or errors after a timeout). The minimum viable pool has three operations: acquire(), release(conn), close(). Production pools add health checks, max-lifetime eviction, and instrumentation.
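Those three operations are enough to sketch a working pool. Below is a minimal thread-safe version over an arbitrary connection factory; the lambda factory is a stand-in, since a real pool would call something like psycopg.connect:

```python
import queue

class MiniPool:
    """Minimum viable pool: acquire(), release(), close()."""

    def __init__(self, factory, size, timeout=5.0):
        self.timeout = timeout
        self._idle = queue.Queue()
        for _ in range(size):              # pay the open cost once, up front
            self._idle.put(factory())

    def acquire(self):
        # Hand out an idle connection; block up to `timeout` seconds,
        # then raise queue.Empty (the analogue of psycopg_pool's PoolTimeout).
        return self._idle.get(timeout=self.timeout)

    def release(self, conn):
        self._idle.put(conn)

    def close(self):
        while not self._idle.empty():
            self._idle.get_nowait()        # a real pool would close each conn

# Dummy factory standing in for a real connection opener:
pool = MiniPool(factory=lambda: object(), size=2, timeout=0.1)
c1 = pool.acquire()
pool.release(c1)
```

Everything a production pool adds, health checks, max-lifetime eviction, instrumentation, layers on top of these three methods.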
In Python with psycopg_pool.ConnectionPool:
```python
from psycopg_pool import ConnectionPool

pool = ConnectionPool(
    conninfo="host=db.example.com dbname=app user=alice",
    min_size=10,
    max_size=50,
    max_idle=300,        # close idle connections after 5 min
    max_lifetime=3600,   # recycle connections every hour
    timeout=5.0,         # raise PoolTimeout after 5s of waiting
)

def handle_request(user_id):
    with pool.connection() as conn:
        row = conn.execute(
            "SELECT email FROM users WHERE id = %s",
            (user_id,),
        ).fetchone()
        return row[0] if row else None
```
What the with block does: pool.connection() calls acquire() — returns an idle connection in microseconds, or waits up to timeout seconds, or raises PoolTimeout. The pool can optionally run a cheap health check (SELECT 1) before handing over. On exit, the pool rolls back any in-flight transaction, resets session state, and puts the connection back in the idle queue.
max_lifetime matters because Postgres backends accumulate cruft over time (plan caches, shared-memory state) and middleboxes (NAT routers, load balancers) drop idle TCP connections after 5-30 minutes. Why: a pool whose idle connections silently die because a NAT table expired will hand out unusable connections on the first hot request after a quiet period — recycling before the NAT timeout guarantees any connection you hand out was alive recently.
This pool is one per OS process. In Gunicorn with N workers, you get N independent pools — the first scale limit.
Why in-process pools aren't enough at scale
One Gunicorn worker with max_size=50 is fine — 50 connections, well below max_connections = 200. But production does not run one worker. Suppose 10 pods, 10 Gunicorn workers each — that is 100 Django processes, each with its own pool. At steady-state utilisation of ~20 connections per pool, the total is 2000 open Postgres connections — mostly idle but nevertheless holding backend processes. At 10 MB each that is 20 GB of Postgres memory burned on idle connections, plus 2000 entries in the ProcArray.
Scale to 100 pods (a modest fleet) and you reach 10,000 connections against max_connections = 500 — immediate FATAL: too many connections for most of them. The root problem: each in-process pool makes locally optimal decisions — "keep enough connections hot for my peak" — but globally wasteful ones. Pools do not coordinate. The solution is to pull the pool out of the application process and share it across all of them.
PgBouncer — the external pooler
PgBouncer is a small C program that speaks the Postgres wire protocol on both sides. On the client side, it accepts connections exactly like Postgres does — your application points host= at PgBouncer and makes no code changes. On the server side, it opens its own smaller pool of connections to the real Postgres and multiplexes client traffic onto them.
Topology: thousands of application clients connect to PgBouncer (conventionally port 6432); PgBouncer holds a small pool of connections to the real Postgres (port 5432) and shuttles traffic between the two sides.
PgBouncer is single-threaded and event-loop based, architecturally similar to nginx. A single process handles tens of thousands of client connections per core with negligible CPU — its per-connection state is just a socket, a buffer, and a few bytes of bookkeeping. The tradeoff is that all the work happens on one CPU. If you saturate it, you scale by running multiple PgBouncer processes (SO_REUSEPORT) or switching to a multi-threaded pooler like Odyssey.
The three pooling modes — the fundamental tradeoff
The core knob of an external pooler is how long a client "owns" a server connection. PgBouncer offers three levels of granularity.
| Mode | Client owns server connection from... | ...until | Multiplexing ratio | Breaks |
|---|---|---|---|---|
| SESSION | login (StartupMessage) | logout (client disconnect) | ~1:1 | nothing — fully transparent |
| TRANSACTION | BEGIN (or first statement in autocommit) | COMMIT / ROLLBACK | 10:1 to 50:1 | prepared statements, LISTEN/NOTIFY, temp tables, session SET, cursors across transactions |
| STATEMENT | start of a statement | end of that statement | 50:1 to 200:1 | everything above, plus any multi-statement transaction |
SESSION mode: a client maps one-to-one onto a server for its whole life. PgBouncer still helps — it avoids the per-connection fork cost by keeping pre-opened servers warm — but if 2000 clients stay connected, you need 2000 servers. Nothing in the application breaks.
TRANSACTION mode is where multiplexing actually happens. When a client starts a transaction, PgBouncer picks an idle server and dedicates it until the transaction ends; between transactions, the server returns to the pool. Why: most web requests finish transactions in tens of milliseconds, but the client connection lingers for seconds or minutes (keepalive, user idle, pool idle) — transaction mode reclaims that idle time for other clients.
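The multiplexing ratio follows from a duty-cycle argument. A rough calculation, with illustrative numbers rather than measurements:

```python
# A client holds its *client* connection continuously, but only occupies a
# *server* connection while a transaction is open.
txn_ms = 10       # time a transaction actually holds a server (illustrative)
cycle_ms = 1000   # full client cycle: keepalive, network, application logic

theoretical_ratio = cycle_ms / txn_ms
print(theoretical_ratio)   # 100.0
```

The theoretical 100:1 degrades to PgBouncer's typical 10:1-50:1 once traffic burstiness and headroom for queueing are accounted for.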
STATEMENT mode: each individual statement grabs a server, runs, releases. Multi-statement transactions are rejected. Useful only for purely autocommit analytics traffic.
One rule: TRANSACTION mode is the production default, and every deviation needs a specific reason.
What breaks in TRANSACTION mode
The price of transaction-mode multiplexing is that anything depending on persistent server-session state across transactions silently stops working.
Server-side prepared statements. PREPARE foo(int) AS SELECT ... followed by EXECUTE foo(42) in a later transaction may land on a different server and fail with prepared statement "foo" does not exist. Workaround: client-side prepared statements (psycopg's prepare_threshold, pgx's flags). PgBouncer 1.21+ added limited server-side prepared statement tracking — check your version.
LISTEN / NOTIFY. LISTEN channel_x installs a subscription on the specific backend. When the transaction ends and the server is returned, the subscription is gone. Workaround: a separate direct connection (bypassing PgBouncer) for pub/sub, or switch to a real broker (Redis, Kafka, NATS).
Temporary tables. CREATE TEMP TABLE staging ... is session-scoped. Under transaction mode the temp table exists only inside the creating transaction. Workaround: keep the entire use inside one transaction, or use CTEs (WITH staging AS (...)) which are transaction-local by design.
SET (not SET LOCAL). SET timezone = 'Asia/Kolkata' outside a transaction persists for the session — meaningless under transaction pooling. Use SET LOCAL inside a transaction (auto-resets on commit), or configure server_reset_query = DISCARD ALL in pgbouncer.ini.
Advisory locks. pg_advisory_lock(42) is session-scoped. Use pg_advisory_xact_lock(42) — the transaction-scoped variant.
Cursors WITH HOLD. Persist across transactions; they break under transaction pooling. WITHOUT HOLD is fine (transaction-local anyway).
Two-phase commit (PREPARE TRANSACTION). The PREPARE and the eventual COMMIT PREPARED will almost certainly land on different server connections. Do not use 2PC through PgBouncer.
Unifying principle: if its lifetime is the session rather than a transaction, it breaks. Ask "if the backend were replaced at every COMMIT, would this still work?" If no, move it to SET LOCAL, a CTE, or a direct connection.
A minimal pooler in Python
To understand what PgBouncer is actually doing, here is a transaction-mode pooler in two small pieces, about sixty lines together. It fakes protocol parsing by treating frames opaquely and matching a sentinel for transaction boundaries. Real PgBouncer parses the wire protocol properly.
The server pool:
```python
import asyncio
from collections import deque

class ServerPool:
    def __init__(self, dsn, size):
        self.dsn = dsn
        self.size = size
        self.idle = deque()
        self.cond = asyncio.Condition()

    async def start(self):
        # Pre-open `size` server connections to Postgres.
        host, port = self.dsn.split(":")
        for _ in range(self.size):
            reader, writer = await asyncio.open_connection(host, int(port))
            # send StartupMessage, do auth, wait for ReadyForQuery...
            # elided for brevity — real code uses asyncpg or similar
            self.idle.append((reader, writer))

    async def acquire(self):
        async with self.cond:
            while not self.idle:
                await self.cond.wait()
            return self.idle.popleft()

    async def release(self, server):
        async with self.cond:
            self.idle.append(server)
            self.cond.notify()
```
The client handler that multiplexes:
```python
async def handle_client(client_reader, client_writer, server_pool):
    in_txn = False
    server = None
    try:
        while True:
            msg = await client_reader.read(4096)
            if not msg:
                break
            # If not in a transaction, we don't own a server — grab one.
            if not in_txn:
                server = await server_pool.acquire()
                in_txn = True
            s_reader, s_writer = server
            s_writer.write(msg)
            await s_writer.drain()
            # Relay the server's response back to the client.
            resp = await s_reader.read(4096)
            client_writer.write(resp)
            await client_writer.drain()
            # Detect end-of-transaction: Postgres sends ReadyForQuery ('Z')
            # with a status byte 'I' meaning "idle, not in a transaction".
            if resp.endswith(b"Z\x00\x00\x00\x05I"):
                await server_pool.release(server)
                server = None
                in_txn = False
    finally:
        if server is not None:
            await server_pool.release(server)

async def main():
    pool = ServerPool("db.example.com:5432", size=10)
    await pool.start()
    srv = await asyncio.start_server(
        lambda r, w: handle_client(r, w, pool),
        host="0.0.0.0", port=6432,
    )
    async with srv:
        await srv.serve_forever()
```
Two invariants drive the design.
One server per in-flight transaction, zero servers per idle client. A client that has not issued BEGIN owns no server. This is the multiplexing trick: idle clients are cheap (a socket plus a buffer); only clients mid-transaction consume a server slot. Why: web request threads spend 90% of their time on network or application logic, not SQL — detaching the server during those idle windows frees it for another client.
Transaction boundaries come from the server, not the client. The pooler detects ReadyForQuery with status 'I' (idle, not in transaction) as the signal. It does not parse BEGIN / COMMIT from the client stream, because that would miss autocommit, implicit transactions, and savepoint nesting. The server is the source of truth.
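The toy's end-of-buffer check can misfire when a 4 KB read boundary splits a message or when ReadyForQuery is not the last thing in the buffer. A frame-aware scanner, sketched below as an illustration of what real protocol parsing involves (not PgBouncer's actual code), walks complete backend messages and inspects only the last one:

```python
def ends_idle(buf: bytes) -> bool:
    """Return True if the last *complete* backend message in `buf` is
    ReadyForQuery ('Z') with transaction status 'I' (idle).

    Wire format of every backend message: 1 type byte, then an int32
    length that counts itself and the body, then the body."""
    i, last = 0, None
    while i + 5 <= len(buf):
        mtype = buf[i:i + 1]
        length = int.from_bytes(buf[i + 1:i + 5], "big")
        end = i + 1 + length
        if end > len(buf):
            break                      # partial message: wait for more bytes
        last = (mtype, buf[i + 5:end])
        i = end
    return last == (b"Z", b"I")

# Synthetic buffer: CommandComplete("COMMIT") then ReadyForQuery(idle).
frame = (b"C" + (11).to_bytes(4, "big") + b"COMMIT\x00"
         + b"Z" + (5).to_bytes(4, "big") + b"I")
assert ends_idle(frame)           # complete, idle
assert not ends_idle(frame[:-1])  # ReadyForQuery truncated mid-read
```

A production pooler additionally buffers partial messages across reads and distinguishes the other ReadyForQuery statuses, 'T' (in transaction) and 'E' (failed transaction).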
PgBouncer does essentially this, with real protocol parsing, TLS, SCRAM auth pass-through, cancellation-request routing, reset queries, and per-pool accounting. The core idea — acquire-on-BEGIN, release-on-ReadyForQuery('I') — is the same thirty lines.
PgBouncer configuration for production
Here is a production-shaped pgbouncer.ini with the variables that actually matter.
```ini
[databases]
app = host=db-primary.internal port=5432 dbname=app

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432

; Total client-side connections this PgBouncer accepts.
max_client_conn = 10000

; Per-(database,user) server pool size.
default_pool_size = 20

; Cap on server connections to a single database, across all pools.
max_db_connections = 100

; Transaction-level pooling — the production default.
pool_mode = transaction

; Close idle server connections after 10 minutes — keeps
; Postgres memory bounded on quiet hours.
server_idle_timeout = 600

; If a client requests a server and none is free, wait at most 30s
; before erroring. Protects against thundering herds.
query_wait_timeout = 30

; Clean up any session-local state before handing a server back to the pool.
server_reset_query = DISCARD ALL

; Authentication — use SCRAM and delegate to Postgres.
auth_type = scram-sha-256
auth_file = /etc/pgbouncer/userlist.txt

; Log slower-than-normal events; silence the rest in production.
log_connections = 0
log_disconnections = 0
log_pooler_errors = 1
stats_period = 60
```
The four variables to tune first:
- max_client_conn — the maximum client connections you ever expect. PgBouncer holds idle client sockets cheaply; keep this generously above peak.
- default_pool_size — the server pool for each (database, user) pair. This is the number of real Postgres backends PgBouncer opens. Keep it small (10-30 per instance). Why: total backends = default_pool_size × databases × pgbouncer_instances, so the per-pool number must stay low to respect max_connections across a fleet.
- max_db_connections — hard cap on server backends against a single database. Safety belt against misconfiguration.
- server_idle_timeout — closes idle server connections after this long. Ten minutes is common; prevents quiet periods from leaving stale backends attached.
Two more that bite in production: query_wait_timeout — how long a client waits for a free server before erroring (without this, you queue indefinitely under load). server_reset_query = DISCARD ALL — runs DISCARD ALL between transactions, dropping any leaked session state.
PgBouncer exposes an admin pseudo-database on the same port. SHOW POOLS; and SHOW STATS; expose per-pool clients/servers and per-database rates. Point monitoring at these.
A 1000 rps Django app — three deployment options
Traffic: 1000 rps, 3 queries per request, avg transaction 10 ms.
A: no pooling. 1000 rps × 3 × 20 ms handshake = 60 seconds of handshake per second. Impossible.
B: in-process pools only. 10 pods × 10 Gunicorn workers × pool of 20 = 2000 Postgres connections. Actual active concurrency is ~10. Postgres burns 20 GB of backend memory on mostly-idle connections and hits ProcArrayLock contention. max_connections needs to be 2500+. Works, but wasteful.
C: in-process pool + PgBouncer transaction mode. 10 pods × 10 workers × pool of 5 = 500 client connections into PgBouncer (the in-process pool now covers only the cheap PgBouncer hop, not the Postgres handshake); PgBouncer's default_pool_size = 50 collapses those onto 50 real server connections. 1000 rps × 10 ms = ~10 concurrent transactions; 50 server slots give 5× headroom. Postgres memory: 50 × 10 MB = 500 MB. max_connections = 100 is plenty. Extra latency per query: 0.1-0.5 ms through PgBouncer, negligible against 10 ms queries.
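The arithmetic behind options B and C can be sanity-checked in a few lines, using the scenario's figures:

```python
MB_PER_BACKEND = 10

# Option B: in-process pools only; every worker keeps its own pool.
pods, workers, pool_b = 10, 10, 20
conns_b = pods * workers * pool_b              # total Postgres backends
mem_b_gb = conns_b * MB_PER_BACKEND / 1000

# Option C: PgBouncer in transaction mode, sized to actual concurrency.
rps, txn_ms = 1000, 10
concurrent_txns = rps * txn_ms / 1000          # Little's law: L = lambda * W
pool_c = 50                                    # default_pool_size: 5x headroom
mem_c_mb = pool_c * MB_PER_BACKEND

print(conns_b, mem_b_gb, concurrent_txns, mem_c_mb)  # 2000 20.0 10.0 500
```

The 2000-backend, 20 GB footprint of option B collapses to 50 backends and 500 MB in option C at identical throughput.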
Forty-fold reduction in Postgres backend memory at the same throughput. Transaction-mode caveats apply: the app uses SET LOCAL for timezone overrides and routes LISTEN/NOTIFY to Redis. This is the pattern every production Postgres deployment converges on.
Common confusions
"PgBouncer speeds up queries." No — PgBouncer does not touch query execution. Slow queries stay slow. What it fixes is connection setup overhead and per-backend memory. A 50 ms query stays 50 ms (plus a sub-millisecond PgBouncer hop).
"Transaction pooling is always fine." Only if you do not use any session-state feature. The server-side prepared-statement trap bites silently — works in dev (session mode), breaks in prod (transaction mode) with prepared statement "S_1" does not exist. Always test under the production pool mode.
"More pool connections is always better." No. Past a point, Postgres's internal contention — ProcArrayLock, buffer pool latches, WAL insert locks — dominates any parallelism gain. 50-200 server connections is typically the sweet spot per primary. More often means less total throughput.
"Pooling makes database connections transparent." PgBouncer hides the cost of connection setup, but the semantic difference between session and transaction mode leaks into application code. You must know which mode you are running — the features that break under transaction pooling are exactly those that rely on server-session identity.
"Odyssey is a drop-in replacement for PgBouncer." Similar mission, not identical. Odyssey is multi-threaded and supports more auth modes, at the cost of higher memory footprint and a different config file. If PgBouncer's single-threaded ceiling is hurting you, upgrade — otherwise PgBouncer's simplicity is a feature.
Going deeper
Multi-threaded poolers: Odyssey and pgcat. PgBouncer's single-threaded event loop leaves 15 of 16 cores idle on a modern server — you scale by running multiple processes behind a TCP load balancer, which complicates admin introspection. Odyssey (Yandex) and pgcat (Rust) run one event loop per core and scale near-linearly. Pgcat additionally does sharded read/write routing — sending SELECTs to replicas and writes to the primary based on statement inspection — something PgBouncer refuses to do because it does not parse SQL.
In-tree Postgres pooling. Postgres has discussed a built-in pooler for years. The blocker is that the backend process model is so tied to one-client-per-process that decoupling requires restructuring how transactions, snapshots, and locks are attributed. The 2018 pg_pool proposal and more recent threaded execution work have not landed. External pooling remains necessary.
Managed-service equivalents. AWS RDS Proxy is an in-VPC transaction-mode pooler for RDS/Aurora. Google Cloud SQL ships no built-in pooler — the guidance is PgBouncer in a sidecar. Azure Database for Postgres Flexible Server ships PgBouncer built in, tunable through the portal.
Connection pooling and replication. Once you have read replicas (next chapter), transaction-mode pooling interacts with read-after-write semantics — a user's follow-up read must either go back to the primary or to a replica caught up past their write. Pgcat and ProxySQL handle this routing; PgBouncer does not, and you partition by role at the application level.
Where this leads next
Build 8 continues into the story of what happens once you have multiple Postgres instances. The next chapters cover the Write-Ahead Log (WAL) as the source of truth for replication, streaming replication and how a standby stays in lockstep with its primary, read-replica lag and the tradeoffs between synchronous and asynchronous replication, and logical replication for selective data routing. Every one of those topics inherits the connection-pooling story — the same default_pool_size decisions apply per replica, and transaction-mode's session-state caveats apply to every backend you talk to.