In short

A 127-qubit chip is not a 127-qubit computer. Every real quantum processor has a coupling map — a graph that says which pairs of physical qubits can interact directly, because the physical coupler between them exists on the chip. IBM's Heron and Condor superconducting chips use the heavy-hex lattice, in which each qubit has at most 3 neighbours (most have 2). Google's Sycamore and Willow use the square grid, 4 neighbours per qubit. Linear chains and trapped-ion chains give 2 neighbours each, but ion traps also support effectively all-to-all coupling within a chain via collective motional modes. Photonic chips allow crossings, so they relax the planarity constraint but have their own cost model. When an algorithm asks for a two-qubit gate between physical qubits that are not wired together, the compiler must insert SWAP gates to physically move the quantum state along a path until the two qubits you want to gate are adjacent. Each SWAP is three CNOTs, so it is expensive — both in depth and in accumulated error. The problem of choosing which SWAPs to insert, and when, is called routing; the standard heuristic in Qiskit is SABRE (Li, Ding, Xie 2019). On sparsely connected devices, routing overhead inflates the physical gate count by a factor of \sim\sqrt{n} to \sim n compared to an all-to-all device. The coupling map is therefore one of the two most important numbers on a chip's spec sheet — the other being fidelity.

Open a textbook on quantum algorithms and you will read about circuits that assume any two qubits can interact. The Quantum Fourier Transform on n qubits involves O(n^2) controlled-phase gates between pairs of qubits — all pairs, including ones that are far apart in the register. Shor's algorithm is built on QFTs. Grover's diffusion operator is symmetric over all qubits. Every algorithm in Nielsen-Chuang is written as if the hardware were fully-connected.

Real chips are not fully connected. The physical couplers that allow two transmons to interact — the bus resonators, the tunable couplers — have to be fabricated on a 2D silicon chip without crossings (in the superconducting case). You cannot wire every qubit to every other qubit; the engineering does not allow it. So real hardware has a coupling map, a graph where nodes are qubits and edges are couplers, and gates can only act on pairs of qubits joined by an edge.

Four hardware topologiesFour side-by-side graphs. The first is IBM heavy-hex with interleaved hexagons. The second is Google square grid of qubits in a 4x4 layout. The third is a linear chain of six circles. The fourth shows a trapped-ion chain with all qubits connected to each other via dashed lines representing all-to-all motional-bus coupling.Coupling maps — four common hardware topologiesHeavy-hex (IBM)degree ≤ 3Square grid (Google)degree = 4 (interior)Linear chaindegree ≤ 2simplestIon trap (all-to-all)via motional busdegree = n − 1slower but universal
The four dominant coupling topologies. **Heavy-hex** (IBM): hexagons of qubits with extra qubits on each edge, chosen because the low degree (≤3) suppresses crosstalk between neighbouring two-qubit gates. **Square grid** (Google Sycamore and Willow): each interior qubit has 4 neighbours — higher connectivity, but the surface code's structure maps naturally onto it. **Linear chain**: the simplest — trivial to fabricate but forces linear SWAP overhead for any non-adjacent gate. **Ion trap** (Quantinuum H-series, IonQ): the ion chain shares a collective motional mode that acts like a shared bus, so any two ions can interact. Each topology is a trade-off between fabrication cost, crosstalk, and gate overhead.

This chapter explains what each topology looks like, what the compiler does when your algorithm wants an unavailable gate, how much overhead routing introduces, and why the coupling map of a chip is nearly as important as its gate fidelity.

Why you cannot just wire everything together

The obvious engineering question: why not make the couplers go everywhere? Build a fully-connected chip, done.

On superconducting hardware, the obstacle is physics. Superconducting qubits couple via microwave capacitance or inductance — metal-on-silicon interactions on the chip surface. You cannot route a microwave line from qubit 0 to qubit 50 without crossing other lines, and on a planar chip, crossings are problematic — every crossing risks spurious coupling, mode interference, or fabrication failure. You can do limited airbridge crossings (Google and IBM both do) but not arbitrary ones. So the coupling graph is constrained to be planar (or nearly so), which means each qubit can touch at most \sim 6 neighbours before the geometry gets ugly.

On top of the planarity constraint, there is the crosstalk constraint. Every coupler is an always-on interaction — even when idle, two transmons sharing a bus resonator have a small ZZ interaction that causes the state of one to shift the other's frequency. More couplers means more always-on interactions means more parasitic coupling when you are trying to do a clean gate on a single pair. IBM's heavy-hex was explicitly chosen to minimise the number of edges per qubit, trading gate-routing flexibility for cleaner gate fidelity. Google's square grid doubles the edges but uses tunable couplers that can be turned off — paying with more wires and more calibration for more flexibility.

On trapped ions, the constraint is different. The ion chain shares collective motional modes (the ions vibrate together, phonons bouncing along the chain), and a two-qubit gate uses one of these shared modes as a temporary bus — any two ions can be entangled via the shared phonon. That gives effectively all-to-all coupling. But the chain cannot be made arbitrarily long: ~30 ions before the mode structure becomes unmanageable. To scale, ion-trap architectures use QCCD — Quantum Charge-Coupled Device — where ions are physically shuttled between trap zones, so the "coupling map" is actually a set of zones with movement between them.

Photonic quantum computing sidesteps the planarity entirely — photons can fly past each other without interacting — but has its own cost model involving probabilistic gates and heralded success.

The four topologies in detail

Heavy-hex — IBM's choice

The heavy-hex lattice is a variant of the hexagonal grid where every edge of the hexagon has an extra qubit sitting on it. This gives two kinds of qubits — "data" qubits at the hex vertices (degree 3) and "bus" qubits on the edges (degree 2). The result: every qubit has degree 2 or 3, never 4 or more. The design was introduced in IBM's 2021 roadmap papers on Eagle and scaled up in Osprey, Condor, and Heron.

Why IBM picked it: on fixed-frequency transmons with cross-resonance gates, the dominant error is spectator-induced crosstalk — when you drive qubit A at qubit B's frequency (a CR gate), any nearby third qubit C that happens to also have a frequency near B's picks up a spurious rotation. Fewer neighbours = fewer spectators = cleaner gate. Heavy-hex deliberately sacrifices connectivity to buy fidelity.

Cost: long-range gates require traversing a sparse graph. Two qubits on opposite sides of a 127-qubit Heron chip can be \sim 10 edges apart, so a CNOT between them requires \sim 9 SWAPs in the naive chain — 27 CNOTs of SWAP overhead for one logical CNOT.

Square grid — Google's choice

Each qubit has up to 4 neighbours. Used on Sycamore (2019, 53 active qubits), Willow (2024, 105 qubits), and many other Google chips. The grid has three big advantages: it is natural for implementing the surface code (whose stabilisers live on 4-neighbour plaquettes), it allows faster compilation of diverse circuits (shorter SWAP chains on average), and the tunable-coupler architecture lets Google turn off the unwanted edges when idle.

Cost: more wires, more fabrication complexity, more tunable-coupler calibration per chip. Crosstalk can still be worse than heavy-hex for specific gate patterns, managed by scheduling.

Linear chain

Just a 1D line of qubits, each connected to its two neighbours (endpoints have only one). The simplest possible topology. Used historically on early demonstrations and still in some teaching platforms. Quantum dot spin qubits and some early photonic arrays use this. The SWAP overhead is the worst-case — a CNOT between the two endpoints of an n-qubit chain requires n - 1 SWAPs.

All-to-all — trapped ions

All-to-all within a chain. Quantinuum's H2 (2024) reports all-to-all connectivity on 56 qubits with 99.9% two-qubit fidelity. IonQ uses similar architectures. The "shared motional bus" of the ion chain means any two ions can be entangled in a single gate operation, no SWAPs needed.

Cost: gates are slow (100–500 μs versus 30–100 ns for superconducting), and chains beyond ~30 ions require QCCD shuttling, which itself takes time and has its own fidelity cost.

The routing problem — converting textbook circuits to hardware

Suppose your algorithm wants a CNOT between physical qubits q_0 and q_5 on a linear chain: q_0 - q_1 - q_2 - q_3 - q_4 - q_5. The hardware only supports CNOT between adjacent pairs. What do you do?

The answer: SWAP. A SWAP exchanges the states of two neighbouring qubits. If you SWAP(q_0, q_1), then the state that was on q_0 is now on q_1. SWAP along the chain to move your original q_0 state to where q_5 sits, then apply the CNOT locally, then SWAP back.

Routing a CNOT across a chainA schematic showing a 6-qubit linear chain. The top row shows the initial state with a CNOT requested between q0 and q5. The middle row shows q0's state being SWAPped along the chain to sit next to q5. The bottom row shows the local CNOT and the optional swap-back.Routing: CNOT across a 6-qubit linear chain needs 4 SWAPsq₀q₁q₂q₃q₄q₅|ψ⟩|φ⟩start|ψ⟩|φ⟩4 SWAPsafter SWAPslocal CNOTthen CNOT
Routing a CNOT between physical qubits $q_0$ and $q_5$ on a linear chain. The $|\psi\rangle$ on $q_0$ is swapped down the chain (4 SWAPs) until it sits next to $|\phi\rangle$ on $q_5$, a local CNOT is applied, then (optionally) swap back. Each SWAP is three CNOTs, so this operation cost 4 SWAPs $= 12$ CNOTs of overhead, plus the 1 target CNOT. If we skip swap-back, we have relabelled the logical-to-physical mapping in software — the next gate uses the new mapping.

The cost of a SWAP

A SWAP between adjacent qubits is three CNOTs (see the swap-and-iswap chapter for the derivation). Each CNOT has two-qubit gate fidelity \sim 99.7\%. So one SWAP accumulates error \sim 3 \times (1 - 0.997) = 0.009 = 0.9\%, roughly triple that of a single gate.

Routing a CNOT across a chain of n qubits requires n - 1 SWAPs = 3(n-1) CNOTs just for the routing, plus one target CNOT. An algorithm circuit with k long-range gates, each needing \sim n/2 SWAPs on average on a limited-connectivity device, blows up from k gates to \sim 3kn/2 gates. The depth grows correspondingly.

Routing is the single biggest cost introduced by limited connectivity, and reducing it — by better topology, better compilers, or architectural tricks like lattice surgery — is a major research direction.

Example 1: Routing a CNOT on a 5-qubit linear chain

You have 5 qubits in a line: q_0 - q_1 - q_2 - q_3 - q_4. Your algorithm wants CNOT(q_0, q_4) — a CNOT with control on q_0 and target on q_4. The hardware only supports CNOTs between adjacent qubits.

Step 1. Find the path. The two qubits are connected by the path q_0 \to q_1 \to q_2 \to q_3 \to q_4. The distance is 4 edges. Why: on a graph, the minimum number of SWAPs to bring two qubits adjacent is the graph distance minus 1. Here distance = 4, so we need 3 SWAPs to bring q_0's state adjacent to q_4.

Step 2. Plan the routing. Swap q_0 \leftrightarrow q_1: now q_0's original state is on q_1. Swap q_1 \leftrightarrow q_2: state is on q_2. Swap q_2 \leftrightarrow q_3: state is on q_3. Now the original q_0 state lives on q_3, which is adjacent to q_4. Why: we "walk" the state along the chain using local SWAPs, each moving it one step closer to q_4.

Step 3. Apply the target gate. Local CNOT(q_3, q_4) — using the current positions of the states. This is the physical CNOT between two hardware-adjacent qubits. Why: after the SWAPs, the logical control (originally on q_0) is physically on q_3, and the logical target (originally on q_4) is still on q_4. Applying CNOT(q_3, q_4) accomplishes the logical CNOT(q_0, q_4).

Step 4. Decide whether to swap back. If the next logical gate uses q_0 again, swap back (3 more SWAPs). If the next gate happens to use q_3, keep the new mapping — the virtual mapping from logical qubit 0 to physical qubit 3 is now part of the compiler's bookkeeping. Why: the compiler tracks a virtual-to-physical qubit map. Every SWAP updates this map. The compiler's job is to minimise the total number of SWAPs over the whole circuit, not just one gate — sometimes leaving the mapping in a shifted state is cheaper.

Result. 3 SWAPs = 9 CNOTs of routing overhead, plus 1 target CNOT = 10 CNOTs total for one logical CNOT(q_0, q_4). If we skip swap-back, it is still 9 + 1 = 10. Gate count inflated by 10\times.

Routing plan for 5-qubit linear chainA circuit diagram showing 5 horizontal wires labelled q0 to q4. Three SWAP gates are drawn between adjacent pairs — q0-q1, then q1-q2, then q2-q3 — followed by a single CNOT between q3 and q4 on the right side.q₀q₁q₂q₃q₄SWAP 1SWAP 2SWAP 3target CNOT
The compiled circuit for logical CNOT$(q_0, q_4)$ on a linear chain. Three SWAPs walk the $q_0$ state down to $q_3$, then a local CNOT$(q_3, q_4)$ completes the operation. Total physical gates: 3 × 3 + 1 = 10 CNOTs, compared to 1 on an all-to-all device.

What this shows. A single logical CNOT between non-adjacent qubits on a linear chain becomes a sequence of 10 physical CNOTs. Circuit depth grows from 1 to 4. On a noisier chip, this means the routed CNOT can fail more than 10\times as often as one on adjacent qubits — routing is a first-order design concern, not a side issue.

Compilers and routing algorithms

The compiler's job is, given a circuit written for an abstract all-to-all device and a specific hardware coupling map, to produce a physical circuit with as few SWAPs as possible. This is a hard combinatorial problem — finding the optimal routing is NP-hard in general — so compilers use heuristics.

SABRE — Qiskit's default

SABRE (SWAP-based Bidirectional heuristic search Algorithm, Li, Ding, Xie 2019) is the default routing pass in IBM's Qiskit compiler and widely ported elsewhere. The idea:

  1. Walk through the circuit in order. Maintain a "current front" — the set of gates whose predecessors have been scheduled.
  2. At each step, if the next gate's two qubits are already adjacent, emit it directly.
  3. If not, pick a SWAP that reduces the total distance for all pending long-range gates — a look-ahead heuristic weighted by a decay factor (gates soon matter more than gates far out).
  4. Apply the SWAP, update the virtual-to-physical mapping, continue.

SABRE runs in polynomial time (roughly O(n^2 \log n) per layer) and finds routings that are typically within \sim 20-50\% of optimal on random benchmark circuits. It is not optimal, but it is fast enough to be practical.

Alternative algorithms

Lattice surgery — routing for error correction

For fault-tolerant quantum computing, "routing" is a completely different problem. Logical qubits are encoded in patches of the surface code; two-logical-qubit gates are performed by lattice surgery — physically merging and splitting patches of code on the 2D chip. The routing problem for surface-code patches involves scheduling the merges and splits, respecting code-distance constraints, and minimising the spacetime volume of the computation. This is the 2D-grid topology put to work — the surface code exists precisely because it maps cleanly onto a square grid.

Hardware roadmaps

Each vendor is pursuing a specific connectivity strategy.

Indian context — compilation research at NQM

India's National Quantum Mission includes a compilation and algorithms track. Groups at IIT Bombay and IISc Bengaluru have published on routing heuristics tuned for sparse topologies, and TIFR's hardware effort uses a small-scale heavy-hex prototype — meaning the same routing algorithms that IBM uses are directly relevant to the domestic platform. Industry contributions from Infosys, TCS, and Tata Elxsi have focussed on compiler-optimisation heuristics for application-specific circuits (chemistry, optimisation). For a student in India interested in quantum computing, classical CS skills in graph algorithms and combinatorial optimisation are a direct on-ramp to routing research — which remains one of the most underweighted-versus-impactful parts of the NISQ software stack.

Common confusions

Going deeper

If you understand that a coupling map is the graph of physical qubit pairs that can directly interact, that IBM uses heavy-hex (degree ≤ 3) to cut crosstalk, Google uses square grid (degree 4) for surface-code compatibility, ion traps get all-to-all via the shared motional bus, that long-range gates require SWAP chains and each SWAP is three CNOTs, that SABRE is the standard polynomial-time heuristic for routing in Qiskit, and that compilation overhead is often the dominant cost on NISQ devices — you have chapter 174. The rest is for readers who want the SABRE pseudocode, QCCD architectures for trapped ions, lattice surgery for surface-code routing, and the detailed Qiskit transpiler passes.

SABRE in detail

The SABRE heuristic defines a cost function H over the pending gate list:

H = \sum_{g \in F} D(\pi(c_g), \pi(t_g)) + W \sum_{g \in E} D(\pi(c_g), \pi(t_g)) \cdot \lambda^{\text{depth}(g)}

where F is the front layer of gates ready to execute, E is the extended (lookahead) set, D(\cdot, \cdot) is shortest-path distance in the coupling graph, \pi is the current virtual-to-physical mapping, c_g, t_g are control and target, W is a weight, and \lambda \in (0, 1) is a decay factor. At each step, SABRE evaluates all SWAP candidates adjacent to a front-layer gate and picks the SWAP that most reduces H. A bidirectional version runs the same algorithm forward and backward through the circuit and averages the mappings — usually giving better results than a single pass.

QCCD — ion-trap routing by motion

Trapped-ion processors beyond \sim 30 ions cannot use a single chain. Quantinuum's H2 uses a QCCD architecture: the chip has multiple trap zones connected by "junctions" where ions can be physically shuttled between zones. The compiler sees the device not as a coupling graph but as a spatial layout, and "routing" becomes "motion scheduling" — deciding which ions to shuttle where, and when. Shuttling takes ~100 μs per hop and has its own fidelity cost (mostly heating of the motional mode, corrected by re-cooling). Despite the overhead, QCCD has enabled Quantinuum's 99.9% two-qubit fidelity on 56 qubits.

Surface-code lattice surgery

For a fault-tolerant architecture using the surface code, each logical qubit is a square patch of the 2D grid of physical qubits — say, a 7 \times 7 patch for code distance 7. Two-logical-qubit gates are performed by lattice surgery: temporarily merge two adjacent patches along an edge (making a single long patch), perform a joint stabiliser measurement, then split back into two patches. The result is equivalent to a CNOT or CZ, depending on which edge is merged. Routing a logical CNOT between non-adjacent patches therefore requires moving a patch across the chip — but the square grid topology is what makes this clean. Lattice surgery is the dominant model for long-range logical gates in planned fault-tolerant machines.

Qiskit transpiler passes

Qiskit's transpile pipeline for a specific backend runs multiple passes in sequence:

  1. Optimisation level 0: no optimisation; gates passed through as-is.
  2. Level 1: SABRE routing + trivial gate cancellation.
  3. Level 2: SABRE + 1-qubit gate collation + CZ/CNOT commutation.
  4. Level 3: full Clifford synthesis, block-by-block resynthesis with optimal 2-qubit decompositions, extensive commutation analysis.

Level 3 can reduce gate counts by 20–40% over level 1 for structured circuits, at the cost of longer compile time.

Reconfigurable connectivity — neutral atoms

QuEra's Aquila and Atom Computing's machines use arrays of neutral atoms trapped by optical tweezers. The coupling between any two atoms depends on their distance — Rydberg interactions have a power-law falloff. The tweezers can move the atoms between shots, so the effective coupling graph is reprogrammable at the shot level. This blurs the distinction between "architecture" and "algorithm" — you choose your coupling graph to suit the circuit. Research on "programmable topology" is active, with implications for routing as well as for fault-tolerance architectures.

Why routing improvements directly buy algorithm capacity

If your device has average two-qubit-gate fidelity F and your algorithm has N logical two-qubit gates, the overall circuit fidelity is approximately F^{N \cdot \text{overhead}}, where the overhead factor captures SWAP inflation. For a circuit with N = 1000 gates, fidelity F = 0.995, and overhead factor 3 (typical on heavy-hex), the circuit fidelity is 0.995^{3000} \approx 3 \times 10^{-7} — too small to be useful. Cutting the overhead to 2 via better routing raises this to 0.995^{2000} \approx 5 \times 10^{-5} — a 170× improvement, without touching the hardware. This is why compiler research is a 10%-gain-per-year engineering lever that multiplies directly with hardware improvements.

Where this leads next

References

  1. Gushu Li, Yufei Ding, Yuan Xie, Tackling the Qubit Mapping Problem for NISQ-Era Quantum Devices (SABRE, 2019) — arXiv:1809.02573.
  2. Wikipedia, Qubit routing problem and quantum circuit compilation.
  3. IBM Quantum, Heavy-hex lattice and Heron processor documentation.
  4. Qiskit Development Team, Qiskit Transpiler Documentation.
  5. Frank Arute et al. (Google AI Quantum), Quantum supremacy using a programmable superconducting processor (2019), NaturearXiv:1910.11333.
  6. John Preskill, Lecture Notes on Quantum Computation, Chapter 7 — theory.caltech.edu/~preskill/ph229.