In short

Point-in-Time Recovery (PITR) is the operational answer to the one question every replication strategy fails to answer: what if the write was intentional, committed, and wrong? No amount of synchronous replication saves you from DELETE FROM users; with no WHERE clause — every replica applied it faithfully. PITR reaches back in time and produces the database as it was one second before the mistake.

The trick is two artifacts kept together. A base backup is a full physical copy of the data files at some past moment — usually taken nightly, stored in object storage. A WAL archive is every WAL segment the primary has generated since, streamed continuously to the same storage. Together they give you an unbroken chain: restore the base backup, replay WAL forward to any timestamp within the archive's retention window, stop there. Postgres exposes this through recovery_target_time, recovery_target_lsn, and recovery_target_xid.

The operational picture around that mechanism decides whether it saves you or not. Cadence: base backups daily (or more often for large databases), WAL archived every segment as it fills (~1-minute granularity under load). Retention: 30 days of base + WAL is a typical floor; regulated industries hold years. Tooling: pgBackRest, Barman, and WAL-G do this properly — parallel, compressed, deduplicated, verified. The 3-2-1 rule: three copies, two media, one offsite. And the discipline: an untested backup is not a backup. Monthly restore drills to a staging environment are the difference between a documented recovery and a 3 AM discovery that the WAL archive has a six-hour gap nobody noticed.

This chapter builds the mental model of the restore sequence, walks real retention numbers for a 1 TB database, covers the 3-2-1 rule applied to Postgres, and distinguishes PITR from what it is not (pg_dump, replicas, RAID). It closes with the tooling ecosystem and a concrete 500 GB accidental-UPDATE scenario worked end to end.

It is 3:47 PM on a Tuesday. An on-call engineer is cleaning up a staging dataset and, out of muscle memory, runs the query against production:

DELETE FROM users;

No WHERE. The transaction commits. Forty-eight million rows, gone in six seconds. The application begins returning 500s for every request that joins against users — which is almost every request. At 3:52 PM somebody notices. At 3:55 PM the engineer realises what happened and opens Slack.

Replication does not help. The replica in the neighbouring AZ applied the DELETE eight milliseconds after the primary; the one in Dublin applied it half a second later. All three databases agree: users is empty. There is no transaction to roll back — the COMMIT landed cleanly. The primary's WAL contains the deletion, durable on the primary's disk and on every replica's.

The only thing that saves the company is a backup archive sitting in S3 — a base backup from 02:00 that morning plus every WAL segment generated between 02:00 and 15:47. The recovery procedure restores the base, replays WAL forward to 2026-04-21 15:46:59 IST, and promotes. The database resumes exactly as it was one second before the fatal query. The company loses one minute of writes — the last fifty or so INSERTs, UPDATEs, and harmless DELETEs that arrived after 15:46:59 — in exchange for not losing forty-eight million users.

This chapter is about the machinery that makes that paragraph possible, and the operational practices that make it reliable rather than aspirational.

The backup components

PITR rests on three artifacts, and understanding them separately is the whole lesson.

Base backup. A full physical snapshot of the data files at some past LSN. In Postgres this is the contents of $PGDATA at the moment of the snapshot, produced by pg_basebackup or a tool that speaks the same protocol. The snapshot is consistent: the pg_backup_start() / pg_backup_stop() protocol brackets the copy so that replaying WAL from the start LSN to the stop LSN produces a recoverable state even though individual files were copied while the database was running. You do not need to stop the primary to take a base backup; that is the point.

Cadence: nightly for most systems, every few hours for very high-change-rate databases. Time to take one is roughly (size / read bandwidth) — a 1 TB database at 500 MB/s takes ~35 minutes; parallelised and compressed by modern tools, 10-15 minutes.
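
A minimal pg_basebackup invocation looks like this; the host, the replication role, and the destination path are illustrative:

# Take a consistent base backup over the replication protocol.
# -X stream ships the WAL generated during the copy alongside it;
# --checkpoint=fast starts immediately instead of waiting for the next checkpoint.
pg_basebackup \
  -h primary.internal -U replicator \
  -D /backups/base_2026-04-21 \
  -X stream --checkpoint=fast --progress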

WAL archive. Every WAL segment file the primary generates, shipped off-cluster as soon as it is full. Postgres segments are 16 MB by default; under moderate write load a new segment fills every 30-120 seconds. The mechanism is the archive_command setting — a shell command Postgres invokes once per filled segment, whose contract is "this segment is now durably stored elsewhere; return 0". In practice the command is a wrapper from your backup tool: pgbackrest archive-push, barman-wal-archive, wal-g wal-push. The archive lives in S3, GCS, Azure Blob, or any object store with atomic-put semantics.
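
On the primary, the archiving side is a handful of settings; a sketch, reusing the prod stanza name that appears in the pgBackRest examples later in the chapter:

# postgresql.conf on the primary
wal_level = replica        # 'minimal' does not generate enough WAL for archiving or PITR
archive_mode = on
archive_command = 'pgbackrest --stanza=prod archive-push %p'
archive_timeout = 60       # seconds; force a segment switch on quiet systems so the archive never lags far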

Why the WAL archive is the critical artifact: the base backup freezes the database at one moment; the WAL archive is the tape recording of everything that happened after. You can always take another base backup. You cannot regenerate lost WAL — if a segment is missing from the archive, the timeline from that point onward is unreachable. A base backup without its downstream WAL is useless for PITR; a WAL archive without a base to start from is useless by itself. They are a set.

Timeline tracking. When a database is restored to a past point and promoted, it begins a new timeline — Postgres increments the timeline ID and stamps all subsequent WAL with the new ID. The WAL from the old timeline is preserved in the archive: if the recovery target was wrong (15:46:59 restored, 15:40 was actually the bad moment), you restart from the base backup and replay forward on timeline 1 again. Timelines make the archive a branching history rather than a linear one. Every production backup tool tracks them automatically.
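
A sketch of that retry, reusing the same archive; relative to the first attempt only the timestamp changes, plus an explicit timeline pin:

# Second attempt: restore the base backup again, then replay to the earlier target.
restore_command          = 'pgbackrest --stanza=prod archive-get %f "%p"'
recovery_target_time     = '2026-04-21 15:40:00 IST'
recovery_target_timeline = '1'     # pin replay to the original history; 'latest' (the default) follows timeline switches recorded in the archive
recovery_target_action   = 'pause'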

The restore sequence

The recovery recipe, once the components are in place, is mechanical. Postgres-specific but the shape holds for every system:

  1. Provision a fresh host. Do not overwrite the damaged primary; you may need to inspect it later.
  2. Restore the base backup. Copy the snapshot files from object storage to the new $PGDATA.
  3. Configure the recovery target. In postgresql.conf:
    restore_command = 'pgbackrest --stanza=prod archive-get %f "%p"'
    recovery_target_time = '2026-04-21 15:46:59 IST'
    recovery_target_action = 'pause'
    
    restore_command tells Postgres how to fetch each WAL segment from the archive on demand. recovery_target_time is the timestamp to stop at; recovery_target_lsn or recovery_target_xid exist for finer control.
  4. Start Postgres in recovery mode. Touch recovery.signal in $PGDATA and start the server. Postgres enters recovery, pulls WAL from the archive, and replays forward from the base backup's start LSN (the checkpoint redo point recorded by pg_backup_start()).
  5. Wait for the target. When replay crosses recovery_target_time it pauses. The database is now live read-only at exactly the requested moment. Run sanity queries — count rows, check recent timestamps, verify the bad change is absent.
  6. Promote. Run pg_ctl promote. Postgres exits recovery, opens a new timeline, becomes read-write. If the state is wrong, shut down, adjust recovery_target_time, and start again — moving the target later only needs more WAL replay; moving it earlier means restoring the base backup again, because replay cannot run backwards.

In production this is scripted. You do not type these commands at 4 PM on the day of the incident; you run pgbackrest --stanza=prod --delta --type=time --target="2026-04-21 15:46:59" restore from a runbook.
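
The pause-verify-promote moment of steps 5 and 6 can be driven entirely from psql; a minimal sketch, in which users comes from the opening scenario and created_at is a hypothetical column:

-- While recovery is paused at the target:
SELECT pg_is_in_recovery();          -- expect: t
SELECT count(*) FROM users;          -- expect ~48 million, not 0
SELECT max(created_at) FROM users;   -- expect a value just before 15:46:59

-- Satisfied? Promote (equivalent to pg_ctl promote):
SELECT pg_promote();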

[Figure: PITR restore timeline. Base backup at 02:00; WAL segments archived continuously; the accidental DELETE at 15:47:00; the restore target at 15:46:59; a replay arrow from the base backup forward to the target; promotion branching onto timeline 2.]
The PITR restore timeline. A base backup is taken at 02:00 and the WAL archive captures every segment as it fills. At 15:47 a destructive query commits. Restore pulls the 02:00 base backup to a fresh host, then replays WAL forward from the archive until 15:46:59 and pauses. Promoting opens a new timeline — the old timeline's WAL is preserved in case the operator needs to retry with a different target.

Why this works — WAL contains the full deterministic history

The reason PITR is exact rather than approximate is the same reason async replication is byte-identical rather than divergent (see async log shipping): WAL is the complete, deterministic history of every committed change.

Every INSERT, UPDATE, DELETE, COMMIT, index update, and sequence tick lands in WAL before the corresponding data pages are written. WAL records are physical — "page 17, offset 2048, these 73 bytes" — so replaying them on any copy of the data that starts in the right state produces a byte-identical result. No interpretation, no re-execution of SQL, no dependency on wall clocks or sequences.

Why the timestamp-based target works even though WAL records are addressed by LSN: every commit record embeds the wall-clock timestamp at which the COMMIT landed on the primary. Recovery reads those timestamps as it replays forward and stops when it sees one at or after the target. Your recovery granularity is one transaction, not one second.
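
Those embedded timestamps are visible with pg_waldump; a sketch, with the segment name and the output line abbreviated and illustrative:

# Inspect commit records in an archived (or restored) WAL segment
pg_waldump /restore/pg_wal/000000010000000A00000042 | grep -i 'desc: COMMIT'
# rmgr: Transaction ... desc: COMMIT 2026-04-21 15:46:58.912345+05:30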

The only limit on how far back you can restore is the archive's retention. A base backup from 20 days ago plus 20 days of WAL reaches every moment in between.

The backup cadence — a real-world example

Concrete numbers for a 1 TB Postgres primary handling ~5,000 TPS: a full base backup every night at 02:00, finishing in roughly 10-15 minutes with a parallel, compressed tool; WAL archived continuously as 16 MB segments, one every 30-120 seconds depending on load; 30 days of base backups plus WAL held in object storage, enough to reach any moment in the past month.

These scale roughly linearly until you hit network bandwidth (cross-region archives) or apply throughput. A 10 TB database shifts the base-backup window into hours and pushes you towards filesystem snapshots (ZFS, EBS) as the base mechanism with WAL archive on top.

The 3-2-1 rule

The 3-2-1 rule predates databases; it is the standard hygiene for any irreplaceable data. Phrased minimally: keep three copies of the data, on two different kinds of media, with one copy offsite.

Applied to a production Postgres setup this maps cleanly onto components you already have:

Three copies: the primary's data directory, at least one streaming replica, and the base-backup-plus-WAL archive in object storage. Two media: block storage on primary and replica, object storage for the archive. One offsite: the cross-region S3 bucket. The rule is satisfied incidentally by a well-configured HA setup with a working PITR archive.

The common failure mode is conflating copies that are not independent. Two replicas in the same AZ are one copy from a fire-in-the-building standpoint. An RDS instance with automatic snapshots is one system with internal redundancy, not three copies — if the AWS account is compromised or the region has a bad day, you can lose the primary and every snapshot in the same incident. Serious setups cross-replicate the S3 archive to a bucket in a separate AWS account with write-only IAM permissions, so a compromised primary cannot also delete its backups.

Backup testing — the forgotten practice

The most common and most demoralising backup failure is not "the backup did not exist" — it is "the backup existed but did not restore". Causes are depressingly varied: an archive_command that has been silently failing, leaving a multi-hour gap in the WAL archive; a base backup corrupted in transit and never verified; a retention policy that expired WAL still needed by the oldest base backup; object-store credentials rotated without updating the restore path; a runbook whose restore_command points at a stanza or bucket renamed months ago.

Every one of these is invisible until you try to restore. The only defence is exercising the restore path regularly.

The operational standard is a monthly restore drill at minimum, weekly for regulated environments: spin up a sandbox, restore the most recent base backup, replay WAL to a timestamp roughly 1 hour before the drill, promote, run row-count and checksum queries against canonical tables, compare against the primary or a replica, tear down, alert if any step fails.

pgbackrest restore --type=time --target=... plus pg_isready and a checksum script reduces this to a nightly cron job. Barman and WAL-G have equivalents. The pipeline is not complicated — what is hard is the discipline of running it and caring when it fails. A restore-drill failure three Tuesdays in a row that nobody pages on is the same as having no backups at all.
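
A minimal sketch of such a drill, assuming a dedicated sandbox host whose pgBackRest configuration points at the production repository, and a hypothetical check_counts.sql holding the canonical-table queries:

#!/usr/bin/env bash
# Restore drill, sketched -- stanza, paths, and check_counts.sql are illustrative.
set -euo pipefail

PGDATA=/var/lib/postgresql/16/drill              # sandbox data directory
TARGET="$(date -d '1 hour ago' '+%Y-%m-%d %H:%M:%S')"

pgbackrest --stanza=prod --delta \
  --type=time --target="$TARGET" --target-action=promote \
  restore

pg_ctl start -D "$PGDATA" -w

# Wait until replay has reached the target and the server has promoted.
until [ "$(psql -XtAc 'SELECT pg_is_in_recovery()')" = "f" ]; do sleep 5; done

pg_isready -q                                     # accepting connections?
psql -X -v ON_ERROR_STOP=1 -f check_counts.sql    # row counts / checksums vs. known baselines

pg_ctl stop -D "$PGDATA" -m fast                  # tear the sandbox down
# set -e aborts on any failed step; wire the cron job's exit status into paging.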

Common backup tools

Four tools dominate Postgres PITR: pgBackRest (parallel, compressed, incremental backups with built-in retention; the default choice for most production fleets), Barman (the 2ndQuadrant/EDB tool, with a backup catalogue and rsync- or streaming-based backups), WAL-G (cloud-native and object-store-first, the successor to WAL-E), and the built-in pg_basebackup plus a hand-rolled archive_command for teams that want no extra dependency.

All four speak the same underlying protocol; differences are operational ergonomics, not capability. Pick one, commit to it, learn its failure modes, test its restore path.

# pgBackRest: take a full backup, then run PITR to 15:46:59 IST
pgbackrest --stanza=prod --type=full backup

pgbackrest --stanza=prod \
  --type=time \
  --target="2026-04-21 15:46:59+05:30" \
  --target-action=promote \
  restore

# Verify archive completeness
pgbackrest --stanza=prod check

# Archive log inspection
pgbackrest --stanza=prod info

Logical backups — pg_dump is a complement, not a substitute

pg_dump produces a logical backup: a stream of SQL statements that, replayed, reconstruct the database. Useful but not PITR.

Where logical backups win: cross-version moves (pg_dump from PG15 restores cleanly into PG17; physical base backups do not — WAL formats change), dev seeding (schema plus sampled rows into a dev environment), cross-flavour migration (portable SQL imports into MySQL, Redshift with massaging), selective restore (pg_restore -t one_table; physical is all-or-nothing).

Where they fail: speed — restoring a 1 TB pg_dump takes 12-24 hours (single-threaded SQL, index rebuilds, FK revalidation); physical base + WAL for the same database is 10-15 minutes. Fidelity — a dump captures logical state at one moment; no in-flight transactions, no LSNs, no sub-dump granularity. PITR to "three seconds before the incident" is impossible with pg_dump.

Standard posture: physical PITR as primary strategy; nightly pg_dump to a separate bucket as a version-portable second copy. The dump is your fallback if something corrupted the physical archive; the physical archive is your go-to for everything else.
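
A sketch of that posture, with the database name, dump path, and bucket illustrative:

# Nightly logical dump in custom format (compressed, restorable with pg_restore)
pg_dump -Fc -Z 6 -f /backups/prod_2026-04-21.dump prod
aws s3 cp /backups/prod_2026-04-21.dump s3://example-logical-dumps/

# Selective restore: pull one table back into a scratch database for inspection
createdb scratch
pg_restore -d scratch --table=orders /backups/prod_2026-04-21.dump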

The restore-to-standby pattern

PITR's second-most-common use, after disaster recovery, is building a fresh standby. Restore base + WAL to a new host with no recovery target, then instead of promoting, configure the host as a streaming replica and restart; it catches up from the archive and then connects to the live primary, streaming from the primary's current position.

This bootstraps a new replica without running pg_basebackup against the primary — useful for large databases where a fresh base-backup run would spike primary I/O. It is also how you spin up branch databases (restore to yesterday, promote for a migration dry run), major-version upgrades (restore to a host running the new version, pg_upgrade, test, switch over), and forensics (restore to before an incident to inspect state).

pgbackrest restore --type=standby and wal-g backup-fetch --restore-replica automate the "don't promote, become a replica" variant. Cheap, low-risk, does not touch the live primary.
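
A sketch of the pgBackRest variant, with the primary's host name and the replication user illustrative:

# Restore onto the new host; --type=standby writes standby.signal instead of
# recovery.signal, so the server stays in recovery instead of promoting.
pgbackrest --stanza=prod --type=standby restore

# Point it at the live primary, then start it as a streaming replica.
echo "primary_conninfo = 'host=primary.internal user=replicator'" >> "$PGDATA/postgresql.conf"
pg_ctl start -D "$PGDATA"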

A 500 GB PITR drill, end to end

You run a Postgres 16 primary on an r6i.2xlarge, 500 GB, 2,000 TPS writes. At 15:47 a contractor with momentarily-excessive privileges runs:

UPDATE orders SET status = 'cancelled';

Six hundred thousand orders marked cancelled. Shipping, billing, and email pipelines all consume status — cancellation emails start going out within minutes.

At 16:30 the on-call detects the problem; by 16:45 the decision to PITR is made. Recovery target: 2026-04-21 15:46:45 IST (15 seconds of margin before the bad UPDATE).

Components in place: pgBackRest stanza prod, daily 02:00 base backup, continuous WAL archive, 30-day retention. Yesterday's base sits in S3. WAL segments from 02:00 through 16:47 are in the same bucket.

The restore:

pgbackrest --stanza=prod \
  --type=time \
  --target="2026-04-21 15:46:45+05:30" \
  --target-action=promote \
  --process-max=8 \
  restore

# Streams 500 GB from S3 with 8 parallel workers.
# Saturates ~1.25 GB/s -> ~7 min for base.
# Replays ~14 h of WAL (~11 GB) -> ~40 s.
# Total: ~8 min.

sudo -u postgres psql -c "SELECT count(*) FROM orders WHERE status = 'cancelled';"
# Expect ~12,000 (historical baseline), not 600,000.

Damage summary: restore completed at 17:00, incident-to-recovery 73 minutes. Transactions lost: everything between 15:46:45 and the UPDATE (~15 seconds × 2,000 TPS = ~30,000), plus everything committed between 15:47 and 16:45 that cannot coexist with the restored past (~7 million). These must be replayed from application-side event logs or accepted as business loss.

Follow-up: cut application traffic over to the restored database and take the old primary offline, re-establish the replica topology from the same archive (restore-to-standby), examine audit trails for writes between 15:46:45 and the UPDATE that must be preserved, update the runbook and the permission model, schedule a blameless post-incident review.

The PITR mechanism did exactly what it was designed to do, in eight minutes. Everything around it — decision time, permissions, application replay, topology reconstruction — is where hours go. That operational overhead is part of the strategy, not an afterthought.

Common confusions

A replica is not a backup: every replica applied the destructive write within milliseconds, so all copies agree the data is gone. RAID is not a backup: it protects against disk failure, not against a committed bad write. A nightly pg_dump is not PITR: it can only return you to the moment the dump was taken, not to three seconds before the incident. And a base backup without its WAL archive is not PITR either: without the log, the snapshot can only reproduce the single moment it captured.

Going deeper

WAL compression and delta encoding

WAL is repetitive — same record types, same page numbers, similar tuples — so general-purpose compression reliably yields 3-5× on the archive. Postgres 9.5 added wal_compression = on, which compresses full-page images inside WAL records on the primary before the record is written to disk. A primary generating 100 GB/day shrinks to ~25 GB/day at a few-percent CPU cost. pgBackRest and WAL-G apply a second zstd pass at archive-push; together they hit ~6× on OLTP.
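
A sketch of both knobs; wal_compression = zstd assumes PostgreSQL 15 or later, and compress-type = zst assumes a pgBackRest build with zstd support:

# postgresql.conf -- compress full-page images inside WAL records
wal_compression = zstd       # use 'on' (pglz) on releases before 15

# pgbackrest.conf -- second compression pass applied at archive-push and backup time
[global]
compress-type = zst
compress-level = 3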

Incremental base backups

Classic nightly full backups ship the entire database each time — wasteful when 99% of the data did not change overnight. pgBackRest has shipped differential and incremental backups for years. Postgres 17 added incremental backup as a core feature (pg_basebackup --incremental, pointed at the previous backup's backup_manifest, with pg_combinebackup to reassemble a full data directory). For a 10 TB database with 1% daily change, the archive stores 10 TB once per week plus 6 × 100 GB incrementals — 10.6 TB/week instead of 70 TB.
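
A sketch of the core-feature flow on Postgres 17, with paths illustrative; the server must run with summarize_wal = on for the incremental step to work:

# Sunday: full backup
pg_basebackup -D /backups/full_sun --checkpoint=fast

# Monday: incremental relative to Sunday's manifest
pg_basebackup -D /backups/incr_mon \
  --incremental=/backups/full_sun/backup_manifest

# Restore: reassemble a complete data directory from the chain
pg_combinebackup /backups/full_sun /backups/incr_mon -o /restore/pgdata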

Cross-region DR with geographically distributed WAL

For tier-1 systems that must survive a whole-region outage, the archive itself must live in a second region. Two patterns: S3 Cross-Region Replication asynchronously copies every object from the primary bucket to a secondary-region bucket (10-60 s lag); dual-push archive writes the WAL segment to both buckets synchronously (higher archive-push latency, lower RPO). pgBackRest supports multiple repositories natively.
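
A sketch of the dual-repository form in pgBackRest, with bucket names and regions illustrative and credentials, endpoints, and retention settings omitted:

# pgbackrest.conf -- archive-push writes each segment to every configured repository
[global]
repo1-type = s3
repo1-s3-bucket = example-wal-archive-use1
repo1-s3-region = us-east-1
repo1-path = /prod

repo2-type = s3
repo2-s3-bucket = example-wal-archive-euw1
repo2-s3-region = eu-west-1
repo2-path = /prod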

The CRR gotcha: a malicious or buggy delete on the primary bucket replicates to the DR bucket. Defence is object-lock or versioning on the DR bucket so delete events are recorded but not destructive.
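
A sketch of the versioning defence, with the bucket name illustrative:

# Enable versioning on the DR bucket: a replicated delete becomes a delete marker
# that hides the object, while prior versions stay recoverable.
aws s3api put-bucket-versioning \
  --bucket example-wal-archive-dr \
  --versioning-configuration Status=Enabled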

Where this leads next

You now have the full backup picture for a single database: base backup, WAL archive, restore procedure, 3-2-1 rule, tools, testing discipline, restore-to-standby. Patterns generalise — MySQL's equivalents are mysqldump, binlog archive, and Percona XtraBackup; Oracle has RMAN; SQL Server has differential backups and log shipping to URL. Shape: physical snapshot plus log archive equals PITR.

PITR has bad historical press because most people only think about it after it has already failed them. Done correctly — cadence, monthly drills, 3-2-1 coverage, engineers who have actually executed the runbook — it is the single highest-leverage piece of operational infrastructure a production database has.

References

  1. PostgreSQL documentation, Chapter 26: Backup and Restore — the canonical reference for base backups, continuous archiving, archive_command, restore_command, recovery_target_time, and timelines.
  2. pgBackRest User Guide — configuration, stanza layout, parallel restore, retention policies, and the full command reference for the most widely used Postgres backup tool in production.
  3. 2ndQuadrant, Barman documentation — the equivalent reference for Barman, with detailed coverage of incremental backup via rsync, the backup catalog, and the barman recover command.
  4. Citus Data / Microsoft, WAL-G GitHub repository — README, configuration, and architecture notes for the cloud-native alternative, with first-class support for S3, GCS, and Azure Blob.
  5. NIST Special Publication 800-34, Contingency Planning Guide for Federal Information Systems — the formal backing for the 3-2-1 rule and for restore testing as a required control in regulated environments.
  6. Smith, PostgreSQL 9.0 High Performance, Packt 2010, chapter 14 — Gregory Smith's original book-length treatment of Postgres backup strategy, including the observation that an untested backup is indistinguishable from no backup at all, which the industry has spent fifteen years re-learning.