In short
SWAP is the two-qubit gate that swaps the contents of two wires: whatever state qubit A was in, qubit B is in now, and vice versa. It is drawn as two small × symbols on the two wires, connected by a vertical line. Every real chip has to implement SWAP out of smaller pieces; the standard recipe is three CNOTs in a staircase, one forwards, one reversed, one forwards again. iSWAP is its close cousin: it also exchanges the two qubits, but multiplies the amplitudes of |01⟩ and |10⟩ by i. That extra phase makes iSWAP entangling, while plain SWAP is not: SWAP is just a relabelling. iSWAP (or a close variant) is native on several superconducting-qubit platforms; SWAP has to be compiled down to it, or to three CNOTs.
Imagine a quantum chip laid out on a table in front of you. The qubits are little squares arranged in a grid, and between some pairs there is a coloured bar drawn on the chip — that bar is the physical coupler that lets those two qubits talk. Qubits without a coupler between them cannot interact directly. You cannot apply a CNOT from qubit 0 to qubit 5 if the chip does not wire them together. The hardware simply does not have the connection.
This is not a rare edge case. Every real superconducting chip in the world has this problem: every IBM Heron, every Google Willow, every Rigetti Ankaa. On a 127-qubit IBM heavy-hex processor, each qubit touches at most three neighbours. On a Google Sycamore-style grid, each qubit touches at most four. A trapped-ion device gets closer to all-to-all connectivity, but even there the effective range is limited by how long an ion chain can be kept stable.
So what do you do when your algorithm wants a CNOT between two qubits that have no direct coupler? You SWAP the quantum information along the chain until the two qubits you care about end up as neighbours, apply the CNOT there, and (usually) swap back. The SWAP gate is the tool that makes this work — and the reason every QC compiler spends so much of its time deciding where and when to insert them.
This chapter is the story of that gate: what it does to basis states, why its matrix is so simple, how three CNOTs add up to one SWAP, and what iSWAP — a close relative native to some hardware — does differently.
What SWAP does
Start with the job description in one sentence: SWAP exchanges the two input qubits. If qubit 1 was holding the state |\psi\rangle and qubit 2 was holding |\phi\rangle, then after SWAP the first wire carries |\phi\rangle and the second wire carries |\psi\rangle. No measurement happens, no information is destroyed, nothing is entangled — the two states have simply switched places.
Why it is only a permutation: SWAP acts on basis states by relabelling which qubit is which. A permutation has no room to create a superposition that was not already there, so if the input is a product state it comes out a product state — with the roles of qubit 1 and qubit 2 exchanged.
On the four computational-basis states the action is especially clean. The state |ab\rangle is the one where qubit 1 reads a and qubit 2 reads b, so "swap the qubits" is the same as "swap the bits":
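| Input | Output |
|---|---|
| $\lvert 00\rangle$ | $\lvert 00\rangle$ |
| $\lvert 01\rangle$ | $\lvert 10\rangle$ |
| $\lvert 10\rangle$ | $\lvert 01\rangle$ |
| $\lvert 11\rangle$ | $\lvert 11\rangle$ |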
Two of the four basis states, |00⟩ and |11⟩, are unchanged, because they already look the same on both wires. The other two, |01⟩ and |10⟩, trade places. That is the entire gate. Any state is a complex linear combination of these four basis states and the gate is linear, so knowing what it does to the four basis states is the same as knowing what it does to every state.
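Because SWAP is linear, applying it to any two-qubit state amounts to reordering that state's four amplitudes. The short Python sketch below assumes the amplitudes are stored in a plain list in the order |00⟩, |01⟩, |10⟩, |11⟩. The helper name `apply_swap` is hypothetical and chosen only for illustration.

```python
def apply_swap(amps):
    """Apply SWAP to a two-qubit state (hypothetical helper).

    `amps` holds the amplitudes of |00>, |01>, |10>, |11>, in that order.
    Swapping the qubits swaps the bits of each basis label, so only the
    |01> and |10> amplitudes trade places.
    """
    a00, a01, a10, a11 = amps
    return [a00, a10, a01, a11]


labels = ["00", "01", "10", "11"]
for k, label in enumerate(labels):
    basis_vec = [1 if j == k else 0 for j in range(4)]
    out = apply_swap(basis_vec)
    print(f"|{label}> -> |{labels[out.index(1)]}>")

# Linearity: in a superposition, each amplitude stays attached to its basis state.
state = [0.5, 0.5j, -0.5, 0.5]
print(apply_swap(state))  # the 0.5j and -0.5 amplitudes trade places
```

The printed mapping reproduces the table: |00⟩ and |11⟩ map to themselves, and |01⟩ and |10⟩ exchange.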
Circuit symbol
In a circuit diagram SWAP is drawn as a small × on each of the two wires, connected by a short vertical line. The × does not mean "cross multiply" or "delete" — it is a visual mnemonic for the letter X in "exchange." Wherever this symbol appears, read it as "these two qubits trade places right here."
The matrix
Now that you know what the gate does, you can read off its matrix. The rule for any two-qubit gate is: the columns of the matrix are labelled by the input basis states |00⟩, |01⟩, |10⟩, |11⟩ (in that order), and each column is the output state written as a column of four amplitudes.
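SWAP sends |00⟩ → |00⟩ (column 1 is (1, 0, 0, 0)), |01⟩ → |10⟩ (column 2 is (0, 0, 1, 0)), |10⟩ → |01⟩ (column 3 is (0, 1, 0, 0)), and |11⟩ → |11⟩ (column 4 is (0, 0, 0, 1)). Writing the four columns side by side:
$$\text{SWAP} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$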
Reading the matrix. The top-left entry and the bottom-right entry are both 1 — that is |00⟩ and |11⟩ going to themselves. The two off-diagonal entries in the middle block are the swap: row 3 column 2 and row 2 column 3 are both 1, which says |01⟩ lands on the |10⟩ row and |10⟩ lands on the |01⟩ row. Every other entry is zero. The matrix is pure permutation — every column has a single 1, every row has a single 1, and that 1 is always real and positive.
Why "pure permutation" matters: SWAP cannot create interference or entanglement, because it never mixes amplitudes. It just moves them around.
SWAP is its own inverse. Apply it twice and you swap back: \text{SWAP} \cdot \text{SWAP} = I. You can see this from the matrix directly, because multiplying the permutation by itself returns each basis state to where it started. The physical picture gives the same answer: swapping twice leaves the qubits in the original order. The matrix is also real and symmetric, so SWAP is Hermitian (equal to its own conjugate transpose). Together with \text{SWAP}^2 = I this gives U^\dagger U = U \cdot U = I, so SWAP is unitary.
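The matrix facts above can be checked mechanically. The plain-Python sketch below uses no external libraries; `matmul` and `dagger` are small helpers written here, not library functions. It builds SWAP column by column from its action on basis states, then confirms SWAP·SWAP = I, Hermiticity, and unitarity.

```python
def matmul(a, b):
    """Multiply two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def dagger(m):
    """Conjugate transpose of a square matrix."""
    n = len(m)
    return [[complex(m[j][i]).conjugate() for j in range(n)] for i in range(n)]


# Basis order |00>, |01>, |10>, |11>. image[j] is the index of the basis
# state that SWAP sends basis state j to: |01> (1) <-> |10> (2).
image = [0, 2, 1, 3]
# Column j has a single 1, in row image[j].
SWAP = [[1 if image[j] == i else 0 for j in range(4)] for i in range(4)]
IDENTITY = [[1 if i == j else 0 for j in range(4)] for i in range(4)]

for row in SWAP:
    print(row)

assert matmul(SWAP, SWAP) == IDENTITY          # its own inverse
assert dagger(SWAP) == SWAP                    # Hermitian (real and symmetric)
assert matmul(dagger(SWAP), SWAP) == IDENTITY  # therefore unitary
```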
The 3-CNOT decomposition — why SWAP is not a hardware primitive
Here is a puzzle. If SWAP just relabels the two qubits, why can't the hardware do it "for free" — just rename the wires in software, or physically rewire them? The answer is that in a physical quantum computer, a qubit is a specific lump of matter: a superconducting transmon, a trapped ion, a silicon spin. The information is encoded in that specific lump, and there is no "software relabel" that moves the actual quantum state to a different lump. To move the state, you have to apply physical operations — gates — that push the state from one qubit to the other.
And what operations does the hardware provide? On most real machines, the native two-qubit gate is some kind of controlled gate: CNOT, or CZ, or a related cousin like the cross-resonance gate. SWAP is not on the native list — every chip has to build it out of the primitives it does have.
The standard recipe, the one every student learns and every compiler uses, is the three-CNOT decomposition:
$$\text{SWAP} = \text{CNOT}_{12}\,\text{CNOT}_{21}\,\text{CNOT}_{12},$$
where \text{CNOT}_{12} means "CNOT with qubit 1 as the control and qubit 2 as the target" and \text{CNOT}_{21} is the same gate with the control and target flipped. In a matrix product the rightmost factor acts first. This product reads the same in both directions, so the time order is unambiguous: in a circuit diagram you apply \text{CNOT}_{12} first, then \text{CNOT}_{21}, then \text{CNOT}_{12} again.
Why it works — tracing a basis state through the staircase
The fastest way to be convinced that three CNOTs equal a SWAP is to pick one basis state and walk it through. Take the input |10\rangle — qubit 1 is in state 1, qubit 2 is in state 0 — and apply the three gates one at a time.
Step 1: \text{CNOT}_{12} on |10\rangle.
CNOT with control on qubit 1 and target on qubit 2 flips qubit 2 whenever qubit 1 is 1. Qubit 1 is 1, so qubit 2 flips from 0 to 1:
$$\text{CNOT}_{12}\,|10\rangle = |11\rangle$$
Why: the control is on, so the target flips. Classically this is "XOR qubit 1 into qubit 2".
Step 2: \text{CNOT}_{21} on |11\rangle.
Now the control and target have swapped roles. Qubit 2 is the control, qubit 1 is the target. Qubit 2 is 1, so qubit 1 flips from 1 to 0:
$$\text{CNOT}_{21}\,|11\rangle = |01\rangle$$
Why: the middle CNOT is reversed on purpose. It reaches back and XORs qubit 2's value into qubit 1. Both are 1, so qubit 1 becomes 1 \oplus 1 = 0, erasing the 1 that was there.
Step 3: \text{CNOT}_{12} on |01\rangle.
Back to control on qubit 1, target on qubit 2. Qubit 1 is now 0, so the control is off, and qubit 2 is untouched:
$$\text{CNOT}_{12}\,|01\rangle = |01\rangle$$
Why: the last CNOT does nothing because the control has been set to 0 by the previous step. This is the whole trick — the middle CNOT cleans up the control so the final CNOT is a no-op for this basis state.
Start: |10\rangle. End: |01\rangle. The bits have swapped. The same three steps swap the bits of |00\rangle, |01\rangle, and |11\rangle too: |00⟩ and |11⟩ come back unchanged, and |01⟩ goes to |10⟩. Each CNOT is linear, so a circuit that acts as SWAP on all four basis states acts as SWAP on every superposition of them.
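The trace above can be repeated on all four basis states automatically. The sketch below treats each basis state as a pair of classical bits. That is enough here because CNOT maps basis states to basis states. Qubit 1 and qubit 2 of the text are indices 0 and 1 in the code, and the function names are illustrative.

```python
def cnot(bits, control, target):
    """Classical action of CNOT on a basis state given as a tuple of bits:
    the target bit is XOR-ed with the control bit."""
    b = list(bits)
    b[target] ^= b[control]
    return tuple(b)


def staircase(bits, verbose=False):
    """CNOT_12, then CNOT_21, then CNOT_12 (index 0 = qubit 1, index 1 = qubit 2)."""
    for control, target in [(0, 1), (1, 0), (0, 1)]:
        bits = cnot(bits, control, target)
        if verbose:
            print(f"  CNOT_{control + 1}{target + 1} -> |{bits[0]}{bits[1]}>")
    return bits


for a in (0, 1):
    for b in (0, 1):
        print(f"start |{a}{b}>")
        end = staircase((a, b), verbose=True)
        assert end == (b, a), "the staircase should swap the two bits"
```

For the input |10⟩ the intermediate states printed are |11⟩, |01⟩, |01⟩, matching the three steps worked by hand.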
The same argument, on the matrix
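For the reader who prefers to see it as matrices: \text{CNOT}_{12} written in the \{|00\rangle, |01\rangle, |10\rangle, |11\rangle\} basis is
$$\text{CNOT}_{12} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix}$$
and \text{CNOT}_{21} (control on qubit 2, target on qubit 1) is
$$\text{CNOT}_{21} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \end{pmatrix}$$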
Why CNOT₂₁ has 1s where it does: the control is qubit 2 (the right bit). So it flips qubit 1 whenever the right bit is 1 — that moves |01⟩ to |11⟩ (row 4, column 2) and |11⟩ to |01⟩ (row 2, column 4), while leaving |00⟩ and |10⟩ alone.
Multiplying the three matrices in the order \text{CNOT}_{12} \cdot \text{CNOT}_{21} \cdot \text{CNOT}_{12} gives exactly the SWAP matrix. You can verify this entry by entry, or rely on the basis-state trace above. There is no shortcut; the three CNOTs are genuinely needed. SWAP cannot be built from two CNOTs, even with arbitrary single-qubit gates allowed between them. The standard proof of that lower bound uses the canonical (KAK) decomposition of two-qubit unitaries rather than a simple count of basis states.
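If you would rather let a computer multiply the matrices, here is a plain-Python sketch. It uses the two CNOT matrices from this section and the SWAP matrix, with basis order |00⟩, |01⟩, |10⟩, |11⟩ and qubit 1 as the left bit. It only checks this one product; it does not prove that two CNOTs are insufficient.

```python
def matmul(a, b):
    """Multiply two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


CNOT_12 = [[1, 0, 0, 0],
           [0, 1, 0, 0],
           [0, 0, 0, 1],
           [0, 0, 1, 0]]  # flips qubit 2 when qubit 1 is 1: |10> <-> |11>
CNOT_21 = [[1, 0, 0, 0],
           [0, 0, 0, 1],
           [0, 0, 1, 0],
           [0, 1, 0, 0]]  # flips qubit 1 when qubit 2 is 1: |01> <-> |11>
SWAP = [[1, 0, 0, 0],
        [0, 0, 1, 0],
        [0, 1, 0, 0],
        [0, 0, 0, 1]]

product = matmul(CNOT_12, matmul(CNOT_21, CNOT_12))
print(product == SWAP)  # the three-CNOT staircase reproduces SWAP
```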
Why three and not one
A natural question: if SWAP is "just relabelling," why does it cost three CNOTs? The answer is that a single CNOT is not a swap: it XORs one bit into another rather than exchanging them. To actually exchange two values classically using only XOR, take a for qubit 1 and b for qubit 2 and run b \leftarrow a \oplus b, then a \leftarrow a \oplus b, then b \leftarrow a \oplus b. Three XORs. Each XOR is a CNOT in the quantum picture, with the updated variable as the target. That is literally what the staircase above is: the reversible-computing version of the classic "swap without a temporary variable" trick.
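The classical trick is easy to try on ordinary integers. XOR acts bitwise on integers, so each line below amounts to one CNOT per bit position. This is a minimal sketch: `xor_swap` is a hypothetical name, and in everyday Python you would simply write `a, b = b, a`. The assignments follow the staircase order, with `a` playing qubit 1 and `b` playing qubit 2.

```python
def xor_swap(a, b):
    """Exchange two integers with three XOR assignments and no temporary."""
    b = a ^ b  # like CNOT_12: target b, control a
    a = a ^ b  # like CNOT_21: target a, control b (a now holds the original b)
    b = a ^ b  # like CNOT_12 again (b now holds the original a)
    return a, b


print(xor_swap(0b1011, 0b0110))
```

For example, `xor_swap(11, 6)` returns `(6, 11)`.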
iSWAP — the hardware-native cousin
On some platforms — most famously superconducting transmon chips — the two-qubit interaction that is actually cheap is not the CNOT but a gate called iSWAP. Its action on the computational basis is almost identical to SWAP, but with one crucial difference: the two states that actually move pick up a phase of i.
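| Input | SWAP output | iSWAP output |
|---|---|---|
| $\lvert 00\rangle$ | $\lvert 00\rangle$ | $\lvert 00\rangle$ |
| $\lvert 01\rangle$ | $\lvert 10\rangle$ | $i\lvert 10\rangle$ |
| $\lvert 10\rangle$ | $\lvert 01\rangle$ | $i\lvert 01\rangle$ |
| $\lvert 11\rangle$ | $\lvert 11\rangle$ | $\lvert 11\rangle$ |
In matrix form:
$$\text{iSWAP} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & i & 0 \\ 0 & i & 0 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$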
Reading the matrix. The diagonal corners are still 1 (the states |00⟩ and |11⟩ are untouched), but the swapped block in the middle now has i instead of 1. The gate still permutes |01⟩ and |10⟩, but it multiplies each by i in the process.
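One way to see the relationship concretely: iSWAP equals SWAP followed by a diagonal phase gate that multiplies |01⟩ and |10⟩ by i. The plain-Python sketch below checks that identity; the helper and matrix names are ours. It also shows that, unlike SWAP, iSWAP is not its own inverse: applying it twice leaves a factor −1 on |01⟩ and |10⟩.

```python
def matmul(a, b):
    """Multiply two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


SWAP = [[1, 0, 0, 0],
        [0, 0, 1, 0],
        [0, 1, 0, 0],
        [0, 0, 0, 1]]
ISWAP = [[1, 0, 0, 0],
         [0, 0, 1j, 0],
         [0, 1j, 0, 0],
         [0, 0, 0, 1]]
# Diagonal phase gate: multiply |01> and |10> by i, leave |00> and |11> alone.
PHASE = [[1, 0, 0, 0],
         [0, 1j, 0, 0],
         [0, 0, 1j, 0],
         [0, 0, 0, 1]]

assert matmul(PHASE, SWAP) == ISWAP  # swap first, then apply the phases
assert matmul(SWAP, PHASE) == ISWAP  # same result in the other order
# Applying iSWAP twice gives diag(1, -1, -1, 1), not the identity.
assert matmul(ISWAP, ISWAP) == [[1, 0, 0, 0],
                                [0, -1, 0, 0],
                                [0, 0, -1, 0],
                                [0, 0, 0, 1]]
print("iSWAP = phase gate x SWAP; iSWAP squared is diag(1, -1, -1, 1)")
```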
Why iSWAP is entangling and SWAP is not
SWAP is a permutation of basis states. If you feed it a product state like |\psi\rangle \otimes |\phi\rangle, you get back the product state |\phi\rangle \otimes |\psi\rangle — still a product, just with the factors exchanged. It cannot create entanglement.
iSWAP is different. The factor of i on the off-diagonal entries is not a global phase — it is a relative phase between the basis states |01⟩ and |10⟩ and the unswapped states |00⟩ and |11⟩. Relative phases are observable, and in particular they can turn an unentangled input into an entangled output.
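A concrete illustration. Take the product state
$$|\psi\rangle = |+\rangle|0\rangle = \tfrac{1}{\sqrt{2}}\left(|00\rangle + |10\rangle\right).$$
Apply iSWAP. The |00⟩ component is unchanged; the |10⟩ component becomes i|01\rangle. So the output is
$$\text{iSWAP}\,|\psi\rangle = \tfrac{1}{\sqrt{2}}\left(|00\rangle + i|01\rangle\right) = |0\rangle \otimes \tfrac{1}{\sqrt{2}}\left(|0\rangle + i|1\rangle\right).$$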
Why the result is still a product: after pulling the common factor of |0⟩ out of the first slot, the remaining state on qubit 2 is the equal superposition with phase i, which is the state |+i⟩. No entanglement here — the factor of i was absorbed into qubit 2's phase.
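That particular input happened to stay unentangled. But a different product input, such as |+\rangle|+\rangle = \tfrac{1}{2}(|00\rangle + |01\rangle + |10\rangle + |11\rangle), gives
$$\text{iSWAP}\,|{+}{+}\rangle = \tfrac{1}{2}\left(|00\rangle + i|01\rangle + i|10\rangle + |11\rangle\right),$$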
and this state is genuinely entangled: you cannot factor it as any product |\alpha\rangle|\beta\rangle. Check by trying. Let |\alpha\rangle = a|0\rangle + b|1\rangle and |\beta\rangle = c|0\rangle + d|1\rangle. The product expansion gives coefficients ac, ad, bc, bd for the four basis states. Dropping the common factor \tfrac{1}{2}, you would need ac = 1, ad = i, bc = i, bd = 1. From the first and fourth, ac = bd, so a/b = d/c. From the second and third, ad = bc, so a/b = c/d. Combining: d/c = c/d, which forces c^2 = d^2, i.e. c = \pm d. If c = d, then ac = ad, i.e. 1 = i. If c = -d, then ac = -ad, i.e. 1 = -i. Both are impossible, so no factorisation exists.
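The factorisation check is an instance of a standard one-line criterion. A two-qubit pure state with amplitudes c_{00}, c_{01}, c_{10}, c_{11} is a product state exactly when c_{00}c_{11} - c_{01}c_{10} = 0. For iSWAP|++⟩ this quantity is \tfrac{1}{4}(1 - i^2) = \tfrac{1}{2} \neq 0. Below is a plain-Python sketch that applies the criterion to the two examples above; the function names are illustrative.

```python
import math


def is_product(amps, tol=1e-12):
    """True if a two-qubit pure state is a product state.

    With amplitudes c00, c01, c10, c11 (order |00>, |01>, |10>, |11>), the
    state factorises exactly when c00*c11 - c01*c10 == 0.
    """
    c00, c01, c10, c11 = amps
    return abs(c00 * c11 - c01 * c10) < tol


def apply_iswap(amps):
    """iSWAP: |01> -> i|10> and |10> -> i|01>; |00> and |11> unchanged."""
    c00, c01, c10, c11 = amps
    return [c00, 1j * c10, 1j * c01, c11]


def apply_swap(amps):
    """SWAP: exchange the |01> and |10> amplitudes."""
    c00, c01, c10, c11 = amps
    return [c00, c10, c01, c11]


s = 1 / math.sqrt(2)
plus_zero = [s, 0, s, 0]          # |+>|0> = (|00> + |10>)/sqrt(2)
plus_plus = [0.5, 0.5, 0.5, 0.5]  # |+>|+>

print(is_product(apply_iswap(plus_zero)))  # True: |0> (x) (|0> + i|1>)/sqrt(2)
print(is_product(apply_iswap(plus_plus)))  # False: the determinant is 1/2
print(is_product(apply_swap(plus_plus)))   # True: SWAP keeps products as products
```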
Why this matters: an operation that can take a product state to an entangled state is called entangling. SWAP cannot; iSWAP can. That single difference, a phase of i on two matrix entries, is what makes iSWAP a genuine two-qubit computational resource and SWAP mere plumbing.
Where iSWAP lives in the hardware
On a superconducting transmon chip the physical interaction that couples two qubits is typically described by an exchange Hamiltonian of roughly the form H_{\text{couple}} = g(\sigma_+^{(1)}\sigma_-^{(2)} + \sigma_-^{(1)}\sigma_+^{(2)}), a term that swaps excitations between the two qubits. Let that Hamiltonian run for the right length of time and the resulting unitary is iSWAP (or its inverse, depending on the sign of g and the duration chosen). The CNOT gate does not come out "for free." The hardware has to synthesise it out of iSWAPs and single-qubit rotations, at a cost of two iSWAPs plus single-qubit gates.
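Here is a sketch of the claim about the coupler, under simplifying assumptions: \hbar = 1, a constant real coupling g, and no single-qubit frequency terms. Under those assumptions H vanishes on |00⟩ and |11⟩ and acts as g\,\sigma_x on the pair |01⟩, |10⟩. That means \exp(-iHt) has a closed form and no matrix-exponential library is needed. The helper names are hypothetical.

```python
import math


def coupler_unitary(g, t):
    """exp(-i H t) for the exchange term H = g(|10><01| + |01><10|).

    Basis order |00>, |01>, |10>, |11>; hbar = 1. H is zero on |00> and |11>
    and equals g * sigma_x on span{|01>, |10>}, where the exponential is
    cos(g t) * I - i sin(g t) * sigma_x.
    """
    c, s = math.cos(g * t), math.sin(g * t)
    return [[1, 0, 0, 0],
            [0, c, -1j * s, 0],
            [0, -1j * s, c, 0],
            [0, 0, 0, 1]]


def close(a, b, tol=1e-12):
    """Entrywise comparison of two 4x4 matrices up to floating-point error."""
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(4) for j in range(4))


ISWAP = [[1, 0, 0, 0], [0, 0, 1j, 0], [0, 1j, 0, 0], [0, 0, 0, 1]]
ISWAP_DAG = [[1, 0, 0, 0], [0, 0, -1j, 0], [0, -1j, 0, 0], [0, 0, 0, 1]]

g = 1.0
print(close(coupler_unitary(g, math.pi / 2), ISWAP_DAG))  # g t = pi/2: inverse iSWAP
print(close(coupler_unitary(g, 3 * math.pi / 2), ISWAP))  # g t = 3 pi/2: iSWAP
```

With these conventions, gt = \pi/2 gives the inverse of iSWAP and gt = 3\pi/2 gives iSWAP itself. Flipping the sign of g exchanges the two.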
Google's superconducting processors expose iSWAP (actually a continuous family called fSim that includes iSWAP as a special point) as a hardware primitive. IBM's devices expose the related cross-resonance gate, from which CNOT is a short compile away. The ecosystem has settled on a few near-equivalent two-qubit primitives, and the compiler's job is to translate your circuit — whatever primitive you wrote it in — into the one the target hardware likes.
When you need SWAPs — routing on limited-connectivity chips
The physical reason SWAP matters is the topology of the chip. Two qubits that are not coupled cannot have a gate applied between them directly. If your algorithm calls for \text{CNOT}(q_0, q_3) on a device where q_0 and q_3 are not neighbours, the compiler must route the quantum information: insert SWAPs along the path that connects them, perform the CNOT at the meeting point, and (often) SWAP back to leave the logical layout intact.
The real topologies. IBM's current devices use a heavy-hex layout: a hexagonal lattice in which every qubit has degree 2 or 3, chosen to suppress frequency-crowding errors. Google's processors use a square grid with degree 4. Trapped-ion devices (IonQ, Quantinuum) can often do all-to-all gates within a chain of ~30 ions, because laser pulses can address any pair. Neutral-atom platforms (QuEra, Atom Computing) allow reconfigurable connectivity by physically moving the atoms. TIFR and IIT Madras have experimental superconducting and trapped-ion groups respectively; both face the same connectivity questions as the commercial players.
The cost. Every inserted SWAP is three CNOTs on a CNOT-native chip, or three iSWAPs plus single-qubit rotations on an iSWAP-native chip (two iSWAPs are not enough). Every extra gate is an extra chance for decoherence to corrupt the state. A compiler that inserts one too many SWAPs can push a short noisy circuit over the reliability cliff. So a lot of modern QC software research focuses on SWAP minimisation: given a logical circuit and a hardware topology, find the schedule with the fewest SWAPs. That includes work on tools like Qiskit's transpiler and Cirq's router, and academic projects from IIT Bombay's quantum-algorithms group.
This is genuinely hard. The general problem is related to graph embedding and is known to be NP-hard. In practice, compilers use heuristics — minimum-spanning-tree routing, look-ahead scheduling, subgraph isomorphism — and the difference between a naive router and a good one can be a 2-3× reduction in gate count on the same circuit.
Example 1: SWAP fixes a symmetric state — and therefore changes nothing
Consider the two-qubit state
$$|\psi\rangle = \tfrac{1}{\sqrt{2}}\left(|01\rangle + |10\rangle\right).$$
Setup. This is one of the four Bell states — specifically |\Psi^+\rangle, the symmetric entangled pair. The name symmetric means: if you exchange the two qubits, the state stays the same. You might guess that SWAP leaves it alone. Let's verify.
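Step 1 — apply SWAP term by term. SWAP is linear, so you can apply it to each term in the superposition:
$$\text{SWAP}\,|\psi\rangle = \tfrac{1}{\sqrt{2}}\left(\text{SWAP}\,|01\rangle + \text{SWAP}\,|10\rangle\right).$$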
Why term by term: a two-qubit gate is a linear operator, so it distributes over the sum just like multiplication distributes over addition.
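Step 2 — use the basis-state table. SWAP sends |01⟩ to |10⟩ and |10⟩ to |01⟩:
$$\text{SWAP}\,|\psi\rangle = \tfrac{1}{\sqrt{2}}\left(|10\rangle + |01\rangle\right) = |\psi\rangle.$$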
Result. |\psi\rangle is an eigenstate of SWAP with eigenvalue +1 — it is fixed. More generally, any symmetric two-qubit state (one that equals itself after swapping the qubits) is fixed by SWAP; any antisymmetric state (one that picks up a minus sign under exchange) is sent to -1 times itself. These two properties pick out the symmetric and antisymmetric subspaces of the two-qubit Hilbert space.
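The same term-by-term check can be run on all four Bell states. The plain-Python sketch below uses amplitude order |00⟩, |01⟩, |10⟩, |11⟩, and its names are illustrative. It finds that |Φ⁺⟩, |Φ⁻⟩, and |Ψ⁺⟩ are fixed by SWAP, while |Ψ⁻⟩, the one antisymmetric Bell state, picks up a minus sign.

```python
import math


def apply_swap(amps):
    """SWAP on amplitudes ordered |00>, |01>, |10>, |11>."""
    c00, c01, c10, c11 = amps
    return [c00, c10, c01, c11]


s = 1 / math.sqrt(2)
bell = {
    "Phi+": [s, 0, 0, s],
    "Phi-": [s, 0, 0, -s],
    "Psi+": [0, s, s, 0],
    "Psi-": [0, s, -s, 0],
}

for name, state in bell.items():
    out = apply_swap(state)
    if out == state:
        print(name, "eigenvalue +1 (symmetric)")
    elif out == [-x for x in state]:
        print(name, "eigenvalue -1 (antisymmetric)")
```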
What this shows. When a compiler sees a SWAP acting on a state it knows to be symmetric, it can drop the SWAP entirely. This kind of structural simplification is one of the easier wins in circuit optimisation — spotting it across longer circuits is one of the hard ones.
Example 2: routing a long-range CNOT on a 4-qubit line
Suppose your algorithm needs a \text{CNOT}(Q_0 \to Q_3) on a device where only nearest-neighbour couplers exist:
$$Q_0 \leftrightarrow Q_1 \leftrightarrow Q_2 \leftrightarrow Q_3$$
Setup. The CNOT you want is across a path of length 3. You cannot apply it directly because there is no coupler between Q_0 and Q_3. You have to route.
Step 1 — move Q_0's state next to Q_3. Apply \text{SWAP}(Q_0, Q_1). Now whatever state was on Q_0 lives on Q_1, and vice versa. Then apply \text{SWAP}(Q_1, Q_2). After these two SWAPs, the original state of Q_0 now sits on Q_2, which is a neighbour of Q_3.
Why two SWAPs: each SWAP hops the state across one coupler, and moving it from Q_0 to Q_2, the neighbour of Q_3, takes two hops.
Step 2 — do the CNOT where it's allowed. Apply \text{CNOT}(Q_2 \to Q_3). This is a legal gate on this chip.
Step 3 — swap back to restore the logical layout. Apply \text{SWAP}(Q_1, Q_2), then \text{SWAP}(Q_0, Q_1). Now Q_0 holds its original state (as modified by the CNOT), Q_3 holds its post-CNOT state, and Q_1, Q_2 are back where they started. (Sometimes the compiler skips the swap-back if the next few gates in the circuit happen to benefit from the moved layout — another optimisation target.)
Gate count.
- 2 SWAPs out + 1 CNOT + 2 SWAPs back = 5 two-qubit gates logically.
- Each SWAP is 3 CNOTs on a CNOT-native chip: 4 \times 3 + 1 = 13 CNOTs in total.
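Result. One logical CNOT between distant qubits has cost you 13 physical CNOTs. Take NISQ hardware with two-qubit fidelities around 99%, and assume errors are independent. The routed gate then succeeds with probability roughly 0.99^{13} \approx 0.88, compared with 0.99 for a native CNOT. That is a non-trivial error budget spent just on routing.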
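The routing recipe and gate count of this example can be sketched as a naive router. It finds a shortest path in the coupling graph with breadth-first search, SWAPs the control's state along it until it sits next to the target, applies the CNOT, and SWAPs back. This is a hypothetical illustration, not how Qiskit's transpiler or Cirq's router actually work. It assumes an undirected coupling map given as a list of qubit pairs, and it ignores CNOT direction constraints and error rates.

```python
from collections import deque


def shortest_path(coupling, start, goal):
    """Breadth-first search for a shortest path in an undirected coupling map."""
    neighbours = {}
    for a, b in coupling:
        neighbours.setdefault(a, []).append(b)
        neighbours.setdefault(b, []).append(a)
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            break
        for nxt in neighbours.get(node, []):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    if goal not in previous:
        raise ValueError("no path between these qubits")
    path, node = [], goal
    while node is not None:
        path.append(node)
        node = previous[node]
    return path[::-1]


def route_cnot(coupling, control, target):
    """Naive routing of CNOT(control -> target); returns a list of gate tuples.

    SWAPs carry the control's state along the path until it is adjacent to
    the target, the CNOT is applied there, and the SWAPs are undone.
    """
    path = shortest_path(coupling, control, target)
    hops = list(zip(path[:-2], path[1:-1]))  # couplers crossed by the control state
    gates = [("SWAP", a, b) for a, b in hops]
    gates.append(("CNOT", path[-2], target))
    gates += [("SWAP", a, b) for a, b in reversed(hops)]
    return gates


line = [(0, 1), (1, 2), (2, 3)]  # Q0 - Q1 - Q2 - Q3, nearest-neighbour couplers only
schedule = route_cnot(line, 0, 3)
for gate in schedule:
    print(gate)

n_swaps = sum(1 for gate in schedule if gate[0] == "SWAP")
print("CNOTs on a CNOT-native chip:", 3 * n_swaps + 1)  # each SWAP = 3 CNOTs
```

On the 4-qubit line it reproduces the five-step schedule of this example and the 13-CNOT total.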
Common confusions
- "iSWAP is just SWAP plus a harmless phase." The phase is not harmless. A global phase is unobservable. The i in iSWAP is a relative phase between the |01⟩/|10⟩ block and the |00⟩/|11⟩ block, and relative phases change measurement statistics in any basis other than the computational one. Concretely: SWAP applied to a product state leaves a product state, while iSWAP applied to a generic product state gives an entangled state. They are different gates.
- "SWAP is its own inverse, so it's a trivial gate." It is its own inverse (\text{SWAP}^2 = I), and it is both Hermitian and unitary. But "trivial" is a stretch: SWAP performs real physical work on the hardware. On a CNOT-native device it costs three CNOTs. On an iSWAP-native device it costs three iSWAPs plus single-qubit rotations. Every SWAP you can remove is gate time you have saved against decoherence.
- "Why not just rename the qubits in software?" Because "the qubit" is a specific physical object. Suppose you want the state currently living on transmon 7 to be available at transmon 12, which might be where the next gate in the circuit is wired. You cannot relabel your way there: transmon 12 is a different chunk of superconductor, and its electromagnetic environment is different. You have to move the quantum state across the chip with real gate operations. A compiler is free to keep a logical-to-physical mapping that tracks which logical qubit is currently stored on which physical qubit, and that mapping can be updated with a SWAP. But the SWAP is a real gate, not a software comment.
- "SWAP and CNOT can both be called 'entangling gates.'" No. CNOT is entangling: applied to appropriate inputs it creates Bell states. SWAP is not; it is a permutation and cannot create entanglement. The fact that the SWAP circuit is built out of CNOTs does not change this. Entangling gates can compose to a non-entangling one, just as two CNOTs in a row compose to the identity.
- "If I apply SWAP, the qubits are physically swapped." The qubits as physical hardware stay where they are; the transmons don't move. What changes is which quantum state is stored on which transmon. Think of the qubits as two mailboxes: SWAP exchanges the letters in them, not the mailboxes.
Going deeper
If you are here to understand what SWAP and iSWAP do and why they matter, you have it. The rest of this article takes you into more advanced uses. These are using a controlled SWAP to compare two quantum states without reading either one out directly, fractional variants of iSWAP that give finer-grained hardware control, and where SWAP networks show up in surface-code error correction.
The SWAP test — comparing two quantum states
SWAP has a wonderful application beyond plumbing: you can use a controlled SWAP (a SWAP that fires only when a third "control" qubit is in |1\rangle) to estimate the overlap |\langle \psi | \phi \rangle|^2 between two unknown quantum states. The circuit is compact: put an ancilla in |+\rangle, controlled-SWAP the two states on it, Hadamard the ancilla, measure.
The probability of measuring the ancilla in |0\rangle is
$$P(0) = \frac{1 + |\langle \psi | \phi \rangle|^2}{2}.$$
Run the circuit many times and count the zeros. The count tells you the overlap. This is remarkable because you never have to read |\psi\rangle or |\phi\rangle directly — both are consumed by the test, but you learn a specific functional of them that would otherwise require quantum-state tomography (exponentially expensive in qubit count). The SWAP test shows up in quantum-machine-learning subroutines, fingerprinting protocols, and as a building block in the Fredkin-based reversible-computing tradition.
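The probability formula can be checked with a small state-vector simulation. The sketch below is an illustration under narrow assumptions. It handles only single-qubit |\psi\rangle and |\phi\rangle, and it stores the three-qubit state as an 8-entry list indexed by 4·ancilla + 2·a + b. It prepares the ancilla's |+⟩ by starting in |0⟩ and applying a Hadamard. Function names are ours.

```python
import math


def kron2(u, v):
    """Tensor product of two single-qubit states [u0, u1] and [v0, v1]."""
    return [u[0] * v[0], u[0] * v[1], u[1] * v[0], u[1] * v[1]]


def swap_test_p0(psi, phi):
    """Probability that the ancilla reads 0 in the SWAP test.

    The 8-entry state vector is indexed by 4*ancilla + 2*a + b, where
    register a starts in psi and register b starts in phi.
    """
    r = 1 / math.sqrt(2)

    def hadamard_on_ancilla(v):
        # Mix each ancilla-0 amplitude with its ancilla-1 partner.
        return ([r * (v[i] + v[i + 4]) for i in range(4)]
                + [r * (v[i] - v[i + 4]) for i in range(4)])

    state = kron2(psi, phi) + [0] * 4        # ancilla starts in |0>
    state = hadamard_on_ancilla(state)       # ancilla now in |+>
    state[5], state[6] = state[6], state[5]  # controlled-SWAP: ancilla=1 half only
    state = hadamard_on_ancilla(state)
    return sum(abs(x) ** 2 for x in state[:4])


def overlap_sq(psi, phi):
    """|<psi|phi>|^2 for single-qubit states."""
    inner = sum(complex(p).conjugate() * q for p, q in zip(psi, phi))
    return abs(inner) ** 2


h = 1 / math.sqrt(2)
zero, one, plus = [1, 0], [0, 1], [h, h]
for psi, phi in [(zero, zero), (zero, one), (zero, plus)]:
    # The simulated value and the formula (1 + |<psi|phi>|^2) / 2 should agree.
    print(swap_test_p0(psi, phi), (1 + overlap_sq(psi, phi)) / 2)
```

Identical states give P(0) = 1, orthogonal states give 1/2, and |0⟩ against |+⟩ gives 3/4, up to floating-point rounding.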
Fractional iSWAP and √iSWAP
iSWAP is the special case \theta = \pi/2 of a continuous family of gates generated by the hardware's coupling Hamiltonian. Let the Hamiltonian run for a shorter time and you get a partial swap — denoted iSWAP^{1/n} or \sqrt{\text{iSWAP}} for the half-rotation case. \sqrt{\text{iSWAP}} is a two-qubit gate that, combined with arbitrary single-qubit rotations, is a universal gate set. Google's Sycamore-class processors used \sqrt{\text{iSWAP}} as a native two-qubit primitive because the pulse is shorter (less decoherence) than the full iSWAP and still leaves the hardware in the entangling regime [4].
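More generally, the family (written here with +i\sin\theta on the swapped entries to match this article's iSWAP convention; Cirq's fSim gate uses -i\sin\theta instead)
$$U(\theta, \phi) = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & \cos\theta & i\sin\theta & 0 \\ 0 & i\sin\theta & \cos\theta & 0 \\ 0 & 0 & 0 & e^{-i\phi} \end{pmatrix}$$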
contains iSWAP (at \theta=\pi/2, \phi=0) and \sqrt{\text{iSWAP}} (at \theta=\pi/4, \phi=0) and several other native points used by superconducting hardware. Calibrating the exact (\theta, \phi) delivered by a given chip is one of the headache-inducing tasks a hardware team actually spends time on.
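Composing two half-steps should give a full iSWAP. The plain-Python sketch below assumes a family with +i\sin\theta on the swapped entries and a phase e^{-i\phi} on |11⟩, the convention in which \theta = \pi/2, \phi = 0 is iSWAP. `partial_iswap` is a hypothetical name, not a library gate.

```python
import math


def partial_iswap(theta, phi=0.0):
    """Hypothetical partial-iSWAP family in the basis |00>, |01>, |10>, |11>.

    Convention assumed here: +i*sin(theta) on the swapped entries and a
    phase exp(-i*phi) on |11>, so theta = pi/2, phi = 0 gives iSWAP.
    """
    c, s = math.cos(theta), math.sin(theta)
    return [[1, 0, 0, 0],
            [0, c, 1j * s, 0],
            [0, 1j * s, c, 0],
            [0, 0, 0, complex(math.cos(phi), -math.sin(phi))]]


def matmul(a, b):
    """Multiply two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def close(a, b, tol=1e-12):
    """Entrywise comparison of two 4x4 matrices up to floating-point error."""
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(4) for j in range(4))


ISWAP = [[1, 0, 0, 0], [0, 0, 1j, 0], [0, 1j, 0, 0], [0, 0, 0, 1]]

sqrt_iswap = partial_iswap(math.pi / 4)
print(close(partial_iswap(math.pi / 2), ISWAP))      # theta = pi/2 is iSWAP
print(close(matmul(sqrt_iswap, sqrt_iswap), ISWAP))  # two half-steps compose to iSWAP
```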
SWAP networks in quantum error correction
Surface-code error correction, the most studied approach to fault-tolerant quantum computing, requires repeated measurement of stabiliser operators. Each stabiliser couples an ancilla qubit to its (up to) four nearest data-qubit neighbours on a grid. On a square grid these checks need only nearest-neighbour gates. But sometimes the chip's connectivity does not match the code's layout, or the qubits needed for a logical operation are not physically adjacent. Then the compiler can schedule a SWAP network: a coordinated sequence of SWAPs that moves qubits through the lattice to line up for the next round of operations, with every such SWAP adding to the error budget. The overall physical-qubit overhead of surface codes is roughly 1000-to-1 physical-to-logical in many current estimates. It comes mainly from the code distance needed to reach a useful logical error rate, and it is a big part of why practical fault tolerance still needs orders of magnitude more hardware than we have today.
Hardware comparison — trapped ions vs superconducting
The SWAP cost is not the same on every platform. On trapped-ion devices (Quantinuum H2, IonQ Forte), any two ions can be made to interact without intermediate SWAP gates. Within a single chain the ions share collective motional modes, so a pair of laser beams can entangle any pair directly. Quantinuum's QCCD architecture instead shuttles ions into a shared gate zone. The overhead cost there is calibration and slower gates, not routing. On a superconducting grid, SWAPs are the dominant compile-time cost for long-range interactions, but the gates themselves are fast (tens to hundreds of nanoseconds).
This is one of the fundamental architecture trade-offs in quantum hardware: all-to-all connectivity buys you small circuits at the cost of slow gates; grid connectivity buys you fast gates at the cost of SWAP overhead. Different algorithms are sensitive to different costs, which is part of why there is not yet a clear "winning" hardware platform.
Where this leads next
- CNOT gate — the primitive that SWAP is built out of; the canonical two-qubit entangling gate.
- Bell states — the four maximally-entangled two-qubit states; symmetric and antisymmetric eigenspaces of SWAP.
- Quantum circuit identities — more simplification rules: H\,X\,H = Z, \text{CNOT}^2 = I, and of course SWAP = three CNOTs.
- Hardware-native gate sets — iSWAP, cross-resonance, Mølmer-Sørensen: what the compiler translates to on each platform.
- Fredkin gate — the controlled-SWAP, the ingredient that powers the SWAP test.
- Surface code — SWAP networks in context of the leading approach to fault-tolerant quantum computing.
References
- Nielsen and Chuang, Quantum Computation and Quantum Information (2010), §4.3 (two-qubit gates and the SWAP decomposition) — Cambridge University Press.
- John Preskill, Lecture Notes on Quantum Computation, Ch. 6 — theory.caltech.edu/~preskill/ph229.
- Wikipedia, Swap gate and iSWAP — matrix forms and hardware notes.
- Frank Arute et al., Quantum supremacy using a programmable superconducting processor (2019) — Sycamore used \sqrt{\text{iSWAP}}-style fSim gates as its native two-qubit primitive. arXiv:1910.11333.
- Harper, Flammia, and Wallman, Efficient learning of quantum noise (2020) — and references within on iSWAP calibration and fSim. arXiv:1907.13022.
- Qiskit Textbook, More Circuit Identities — SWAP = 3 CNOTs and related decompositions with runnable code.