A Qubit is a Unit Vector in C²

Note: Company names, engineers, incidents, numbers, and scaling scenarios in this article are hypothetical — even when they resemble real ones. See the full disclaimer.

In short

A qubit is a unit vector in a two-dimensional complex vector space. You write it as |\psi\rangle = \alpha|0\rangle + \beta|1\rangle, where \alpha and \beta are complex numbers called amplitudes, satisfying the normalisation condition |\alpha|^2 + |\beta|^2 = 1. If you measure in the computational basis, you get 0 with probability |\alpha|^2 and 1 with probability |\beta|^2 — the Born rule. The amplitudes are allowed to be negative or complex, which lets them cancel. That cancellation, called interference, is the reason the amplitudes cannot be real probabilities and why quantum computation can do things a coin-flip cannot.

You are shown two small boxes. Each contains one physical object, placed there by the experimenter. You are told the first box contains a fair coin: already flipped, lying heads or tails with equal probability, lid closed. The second box contains a qubit in a specific quantum state called |+\rangle. You are told nothing else about what is inside the boxes, only that if you measure either one right now, you will get "0" or "1" with exactly fifty-fifty odds.

So far they look identical. Same statistics, same ignorance, same probabilities. If quantum computing is supposed to be fundamentally different from classical computing, where does the difference live?

Here is the difference. Take each box and feed it through a specific transformation, labelled "H": a reversible quantum gate called the Hadamard. You do not need to know its matrix yet; just treat it as a physical procedure. On the quantum box it is well-defined. On the classical coin box it has no useful counterpart: there is no classical operation that takes a flipped coin you have not looked at and deterministically un-flips it, so whatever you do to the closed box, the coin stays random. Measure after the transformation.

The classical coin, after "H", still flips 50-50 on measurement. Nothing can turn a random coin into a predictable one; randomness is forgotten information.

The quantum box — which was also giving 50-50 statistics — now gives 0 with probability 1. Every time. Deterministically. The randomness was not ignorance; it was a specific amplitude pattern that the Hadamard gate reshaped into a definite outcome.

This is the part of quantum mechanics that the hand-wavy "a qubit is 0 and 1 at the same time" line refuses to explain. A qubit is not a coin. It is a structured object that carries more information than its measurement statistics alone reveal. That extra information lives in the amplitudes, and amplitudes can do something probabilities cannot: they can cancel.

This chapter builds that object from first principles. By the end you will know exactly what the symbol |\psi\rangle means, why the coefficients are complex numbers, why their squared magnitudes are probabilities, and why the word "superposition" describes something strictly richer than classical randomness.

Why a vector — the shape of a qubit

The formal definition is short:

Qubit (single-qubit state)

A qubit is a unit vector in the two-dimensional complex vector space \mathbb{C}^2. In Dirac notation,

|\psi\rangle = \alpha|0\rangle + \beta|1\rangle,

where \alpha, \beta \in \mathbb{C} are called amplitudes and must satisfy the normalisation condition

|\alpha|^2 + |\beta|^2 = 1.

The two basis kets

|0\rangle = \begin{pmatrix}1 \\ 0\end{pmatrix}, \qquad |1\rangle = \begin{pmatrix}0 \\ 1\end{pmatrix}

form the computational basis, the quantum counterpart of the classical bits 0 and 1.

Pick that definition apart slowly.

"Two-dimensional complex vector space" means each qubit state is built from exactly two "directions" — the two basis kets |0\rangle and |1\rangle — with coefficients that are complex numbers, not just real ones. You have met this already in chapter 5: an inner-product space over \mathbb{C}, with the bra acting as the conjugate-transpose of the ket.

"Unit vector" means the squared magnitudes of the coefficients add up to 1. Drop that condition and you get an arbitrary vector in \mathbb{C}^2; keep it, and you get exactly the set of physical states.

"Amplitude" is the word quantum mechanics uses for the coefficients \alpha and \beta. Amplitudes are not probabilities. They are the intermediate objects whose squared magnitudes are probabilities:

P(\text{measure } 0) = |\alpha|^2, \qquad P(\text{measure } 1) = |\beta|^2.

And because these two probabilities must sum to 1 (you always get some outcome when you measure), the normalisation condition |\alpha|^2 + |\beta|^2 = 1 is not optional — it is the requirement that your two probabilities add up correctly.

Why amplitudes and not just probabilities: probabilities are real numbers between 0 and 1. They add. They never cancel. If quantum states were described by probabilities alone, the theory could not express interference — the phenomenon where one path contributes a positive amount and another contributes a negative amount and the two wipe each other out. Amplitudes, being signed (and in general complex), can cancel. The entire algorithmic advantage of quantum computing rides on this difference.

A qubit visualised as a vector sum. Each basis ket contributes one coordinate; the state is the resulting arrow. The normalisation condition forces the tip of the arrow onto a unit surface. (The real plot above is a schematic — the actual space is $\mathbb{C}^2$, which you cannot literally embed in a flat page.)

The arrow picture is a schematic, not a literal geometry. The space is complex — each component is allowed to have an imaginary part — so a true diagram would need four real dimensions. The Bloch sphere (chapter 14) is the honest way to visualise all single-qubit states in 3D. For now, treat the arrow diagram as a reminder that the qubit is a linear combination of two basis directions, with a length constrained to 1.

Worked basics — probabilities from amplitudes

Before you go further, drill the amplitude-to-probability move on two concrete states.

State A. Take |\psi_A\rangle = \tfrac{1}{\sqrt{2}}|0\rangle + \tfrac{1}{\sqrt{2}}|1\rangle. Both amplitudes are \tfrac{1}{\sqrt{2}}. Squared magnitudes: |\tfrac{1}{\sqrt{2}}|^2 = \tfrac{1}{2} each. Total: \tfrac{1}{2} + \tfrac{1}{2} = 1 ✓. Probability of 0: \tfrac{1}{2}. Probability of 1: \tfrac{1}{2}. Equal-superposition state. This state is important enough to have its own name — it is called |+\rangle.

State B. Take |\psi_B\rangle = \tfrac{1}{2}|0\rangle + \tfrac{\sqrt{3}}{2}|1\rangle. Squared magnitudes: \tfrac{1}{4} and \tfrac{3}{4}. Total: 1 ✓. Probability of 0: \tfrac{1}{4} (25%). Probability of 1: \tfrac{3}{4} (75%). The qubit is "mostly |1\rangle" in the sense that a measurement is three times as likely to return 1 as 0.

Squaring the amplitudes gives the probabilities. $\tfrac{1}{2}$ squares to $\tfrac{1}{4}$; $\tfrac{\sqrt{3}}{2}$ squares to $\tfrac{3}{4}$. The two probabilities sum to $1$ — the normalisation condition in disguise.

The pattern to internalise: amplitudes live on one side of a squaring operation, probabilities on the other. Amplitudes are complex and structured; probabilities are real and \leq 1. Every time you predict what a measurement will do, you cross from one side of that square to the other.
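The amplitude-to-probability move for States A and B can be sketched in a few lines of Python. This is a minimal illustration, not a library API: the function name `born_probabilities` and the tolerance `tol` are assumptions introduced here.

```python
import math

def born_probabilities(alpha, beta, tol=1e-9):
    """Return (P(0), P(1)) for the state alpha|0> + beta|1>."""
    p0 = abs(alpha) ** 2   # squared magnitude works for real or complex amplitudes
    p1 = abs(beta) ** 2
    if not math.isclose(p0 + p1, 1.0, abs_tol=tol):
        raise ValueError("state is not normalised")
    return p0, p1

state_a = (1 / math.sqrt(2), 1 / math.sqrt(2))   # State A, i.e. |+>
state_b = (1 / 2, math.sqrt(3) / 2)              # State B

print(born_probabilities(*state_a))   # close to (0.5, 0.5)
print(born_probabilities(*state_b))   # close to (0.25, 0.75)
```

Floating-point rounding means the printed values may differ from the exact fractions in the last digit, which is why the normalisation check uses a tolerance rather than strict equality.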

The Born rule, previewed

You have just seen the measurement rule three times in three different dresses. It deserves a name.

Born rule (computational-basis version)

Let |\psi\rangle = \alpha|0\rangle + \beta|1\rangle be a qubit state. If you measure in the computational basis \{|0\rangle, |1\rangle\}, the outcomes and their probabilities are:

P(\text{outcome } 0) = |\alpha|^2, \qquad P(\text{outcome } 1) = |\beta|^2.

After the measurement, the state collapses onto whichever basis ket the outcome named: if the result is 0, the state becomes exactly |0\rangle; if the result is 1, the state becomes exactly |1\rangle. The original amplitudes \alpha, \beta are gone — in general, irrecoverable.

Three things are worth noting right away.

Measurement destroys the amplitude. A single measurement only ever returns one classical bit per qubit, and the state collapses to the corresponding basis ket. If you wanted to estimate |\alpha|^2 numerically, you would have to prepare the qubit afresh many times and count how often you got 0 — the statistics converge to |\alpha|^2, but no single measurement reveals it.
The phase of the amplitude is invisible to a computational-basis measurement. Only the magnitude |\alpha| enters the probability. If \alpha = \tfrac{1}{\sqrt{2}} or \alpha = -\tfrac{1}{\sqrt{2}} or \alpha = \tfrac{i}{\sqrt{2}}, the probability of 0 is \tfrac{1}{2} in all three cases. The phase hides, waiting for the next gate to either cancel it or reinforce it.
The Born rule is a rule of the theory. You cannot derive it from more primitive assumptions within standard quantum mechanics — it is one of the axioms (you will see it stated formally as postulate 3 in the next chapter). It was proposed by Max Born in 1926 as a way to reconcile the wave-like mathematics of Schrödinger's theory with the particle-like outcomes of experiments; the rule survived every test for the next century.
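The first two points can be made concrete with a small classical simulation. The sketch below assumes that "preparing the qubit afresh and measuring" can be modelled by drawing a pseudo-random number against |\alpha|^2; the helper names `measure_once` and `estimate_p0` and the fixed seed are hypothetical choices made for this illustration.

```python
import math
import random

def measure_once(alpha, beta, rng):
    """Simulate one computational-basis measurement; returns 0 or 1.

    Only |alpha|^2 is needed, because beta is fixed by normalisation.
    """
    return 0 if rng.random() < abs(alpha) ** 2 else 1

def estimate_p0(alpha, beta, shots, seed=0):
    """Prepare the state afresh `shots` times and count how often 0 appears."""
    rng = random.Random(seed)
    zeros = sum(1 for _ in range(shots) if measure_once(alpha, beta, rng) == 0)
    return zeros / shots

alpha, beta = 1 / 2, math.sqrt(3) / 2      # true P(0) is 0.25
for shots in (10, 1000, 100000):
    print(shots, estimate_p0(alpha, beta, shots))

# The phase of alpha does not affect these statistics: same magnitude, same estimate.
print(estimate_p0(1j / 2, beta, 1000) == estimate_p0(1 / 2, beta, 1000))
```

The estimate drifts toward 0.25 as the number of shots grows, while any single call to `measure_once` only ever yields one bit. Replacing \alpha = 1/2 with \alpha = i/2 changes nothing in the counts, which is the sense in which the phase is hidden from this kind of measurement.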

Chapter 13 (projective measurement) explores the general form — measurement in any orthonormal basis, not just the computational one. For now the computational-basis version is all you need.

Superposition is not a classical random mixture

This is the single most misunderstood point about qubits. You now have enough tools to see why.

Here are two scenarios for a single qubit:

Scenario A (quantum superposition). The qubit is prepared in the state |+\rangle = \tfrac{1}{\sqrt{2}}(|0\rangle + |1\rangle). The amplitudes are fixed: both are \tfrac{1}{\sqrt{2}}. Nothing about the state is random yet; the state has a specific, deterministic description.
Scenario B (classical mixture). A classical coin is flipped in secret. With probability \tfrac{1}{2} the experimenter prepares |0\rangle; with probability \tfrac{1}{2} they prepare |1\rangle. You do not know which. The "state" from your perspective is "|0\rangle with probability \tfrac{1}{2}, else |1\rangle" — the state itself is a classical random variable.

If you measure either scenario in the computational basis, the statistics are the same: 0 with probability \tfrac{1}{2}, 1 with probability \tfrac{1}{2}. On that measurement alone, you cannot tell Scenario A from Scenario B.

But now apply a Hadamard gate H before measuring. The Hadamard is a 2 \times 2 unitary matrix whose effect on the two basis states is:

H|0\rangle = |+\rangle, \qquad H|1\rangle = |-\rangle,

where |{-}\rangle = \tfrac{1}{\sqrt{2}}(|0\rangle - |1\rangle). (You will derive this in chapter 25, when the Hadamard is the central topic. Right now trust the action and watch the consequences.)

Scenario A after Hadamard. Apply H to |+\rangle. The linearity of quantum gates lets you distribute:

H|+\rangle = H\!\left(\tfrac{1}{\sqrt{2}}(|0\rangle + |1\rangle)\right) = \tfrac{1}{\sqrt{2}}(H|0\rangle + H|1\rangle) = \tfrac{1}{\sqrt{2}}(|+\rangle + |{-}\rangle).

Now expand |+\rangle and |{-}\rangle:

\tfrac{1}{\sqrt{2}}\bigl(|+\rangle + |{-}\rangle\bigr) = \tfrac{1}{\sqrt{2}} \cdot \tfrac{1}{\sqrt{2}}\bigl((|0\rangle + |1\rangle) + (|0\rangle - |1\rangle)\bigr) = \tfrac{1}{2}(2|0\rangle + 0 \cdot |1\rangle) = |0\rangle.

Why the |1\rangle coefficient vanishes: the +|1\rangle from |+\rangle and the -|1\rangle from |{-}\rangle have equal magnitudes and opposite signs. They cancel. The +|0\rangle and +|0\rangle reinforce and add. This is the first concrete demonstration of interference: amplitudes with opposite phases destroy each other.

So H|+\rangle = |0\rangle — deterministically. Measuring this gives 0 with probability 1.

Scenario B after Hadamard. The Hadamard is a quantum gate — it acts on one quantum state at a time. In Scenario B you genuinely do not know which state you have; you have a classical probability distribution over \{|0\rangle, |1\rangle\}. Apply H to each case:

With probability \tfrac{1}{2} you had |0\rangle; after H you have |+\rangle. Measuring gives 0 or 1 with probability \tfrac{1}{2} each.
With probability \tfrac{1}{2} you had |1\rangle; after H you have |-\rangle. Measuring gives 0 or 1 with probability \tfrac{1}{2} each.

Combine the two branches: the final probability of measuring 0 is \tfrac{1}{2} \cdot \tfrac{1}{2} + \tfrac{1}{2} \cdot \tfrac{1}{2} = \tfrac{1}{2}. Still fifty-fifty.

That is the difference, dressed in precise numbers.

Before $H$, the two scenarios are indistinguishable by measurement. After $H$, they are not. Only the quantum superposition has interference available to it — the classical mixture has no amplitudes to cancel.

The punchline: superposition is not ignorance, and a qubit is not a coin you have not yet looked at. A superposition is a specific structural object — two amplitudes with specific phases — that behaves differently from a classical probability distribution under further quantum operations. The classical mixture ("the experimenter flipped a coin and prepared |0\rangle or |1\rangle") is a different beast; chapter 13 and the density-matrix chapters will make that precise.
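The Scenario A versus Scenario B calculation can be checked numerically. The following is a minimal sketch that represents states as pairs of amplitudes and the Hadamard as the 2 \times 2 matrix whose action on the basis kets was quoted above; the helper names `apply` and `p0` are assumptions made for this example.

```python
import math

S = 1 / math.sqrt(2)
H = ((S, S), (S, -S))          # Hadamard matrix in the computational basis

def apply(gate, state):
    """Multiply a 2x2 matrix by a 2-component state vector."""
    (a, b), (c, d) = gate
    x, y = state
    return (a * x + b * y, c * x + d * y)

def p0(state):
    """Born-rule probability of outcome 0."""
    return abs(state[0]) ** 2

ket0, ket1 = (1, 0), (0, 1)
plus = (S, S)

# Scenario A: one definite state |+>; the amplitudes interfere inside H.
p0_superposition = p0(apply(H, plus))

# Scenario B: average the probabilities over the two hidden preparations.
p0_mixture = 0.5 * p0(apply(H, ket0)) + 0.5 * p0(apply(H, ket1))

print(round(p0_superposition, 12), round(p0_mixture, 12))
```

After rounding, this should print 1.0 for the superposition and 0.5 for the mixture. The structural difference is visible in the code: Scenario A adds amplitudes before squaring, while Scenario B squares first and then averages, so nothing is left to cancel.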

Why the amplitudes have to be complex

If amplitudes were restricted to non-negative real numbers, there would be no way to express cancellation. Two positive contributions always add to something larger; they never wipe each other out. Signed real amplitudes (positive and negative reals) fix that: +\tfrac{1}{\sqrt{2}} and -\tfrac{1}{\sqrt{2}} cancel. The Hadamard example you just worked through used exactly this.

So why do you need the full complex plane, not just positive and negative reals? Because the Hadamard-on-|+\rangle cancellation is only the simplest special case. Many quantum gates produce amplitudes with phases that are not just +1 and -1 but e^{i\pi/4}, e^{i\pi/3}, or any complex phase. The T-gate (chapter 27) is the clean example: it multiplies |1\rangle by e^{i\pi/4}, a phase that is neither purely real nor purely imaginary. Whole families of quantum algorithms rely on phases that are not \pm 1; eliminating them would cripple the theory.

Here is a concrete set of states that shows why phase matters, first with real signs and then with a genuinely complex amplitude.

State C. |\psi_C\rangle = \tfrac{1}{\sqrt{2}}(|0\rangle + |1\rangle) = |+\rangle.

State D. |\psi_D\rangle = \tfrac{1}{\sqrt{2}}(|0\rangle - |1\rangle) = |-\rangle.

Both are unit vectors. Both have measurement probabilities \tfrac{1}{2}, \tfrac{1}{2} in the computational basis (same magnitude of amplitudes). They are distinct states — you verified \langle + | {-} \rangle = 0 in chapter 5 — yet a single computational-basis measurement cannot tell them apart.

The Hadamard reveals the difference. H|+\rangle = |0\rangle (always 0). H|{-}\rangle = |1\rangle (always 1). Same inputs by measurement statistics, different outputs after further quantum evolution. The relative phase between the |0\rangle and |1\rangle amplitudes — + in one, - in the other — is what makes them distinguishable.

Now add a genuinely complex case. Consider |\psi_E\rangle = \tfrac{1}{\sqrt{2}}(|0\rangle + i|1\rangle). The amplitude on |1\rangle is i/\sqrt{2}: the imaginary unit scaled by the normalisation factor. Since |i| = 1, its magnitude is 1/\sqrt{2}, so P(1) = |i/\sqrt{2}|^2 = 1/2. Still 50-50 by direct measurement. This state is also called |{+}i\rangle, and it is a distinct physical state from |+\rangle and |{-}\rangle. You cannot tell it apart from them by Z-basis measurement alone, but you can by a different choice of basis (the Y-basis, which you will meet in chapter 14).

Why three states with identical computational-basis statistics can still be physically distinct: each state has a different phase structure. Phase is not a probability; phase is what controls the outcome of further quantum evolution. Three states with the same |\alpha|^2, |\beta|^2 but different \alpha, \beta (up to an overall global phase) are genuinely different states, with genuinely different behaviours under gates.

Two paths with opposite signs cancel; two with the same sign reinforce. This is what amplitudes can do that probabilities cannot — and it is why the coefficients of a quantum state must be signed (in fact, complex) numbers, not real probabilities.

The physics punchline. Interference — constructive (reinforcement) and destructive (cancellation) — is the signature quantum phenomenon, and it is the technical engine behind every quantum algorithm that beats its classical counterpart. Shor's factoring algorithm, Grover's search, the quantum Fourier transform, phase estimation — all of them engineer destructive interference on wrong answers and constructive interference on the right answer, before measurement. Remove complex amplitudes, and you remove the engine.

The computational basis and other bases

The states |0\rangle and |1\rangle are the computational basis — the default coordinate system for a single qubit. Any single-qubit state can be written as a linear combination of these two. And, as chapter 5 established, they are orthonormal:

\langle 0|0\rangle = 1, \qquad \langle 1|1\rangle = 1, \qquad \langle 0|1\rangle = 0.

Inner products handle every consistency check. The completeness relation |0\rangle\langle 0| + |1\rangle\langle 1| = I expresses the fact that \{|0\rangle, |1\rangle\} is a complete orthonormal basis for \mathbb{C}^2.

But the computational basis is not the only orthonormal basis. Other bases are available and physically meaningful:

The X-basis, \{|+\rangle, |{-}\rangle\} with |+\rangle = (|0\rangle + |1\rangle)/\sqrt{2} and |{-}\rangle = (|0\rangle - |1\rangle)/\sqrt{2}. These are the eigenstates of the Pauli-X operator.
The Y-basis, \{|{+}i\rangle, |{-}i\rangle\} with |{+}i\rangle = (|0\rangle + i|1\rangle)/\sqrt{2} and |{-}i\rangle = (|0\rangle - i|1\rangle)/\sqrt{2}. These are the eigenstates of the Pauli-Y operator.

A measurement in a different basis asks a different question about the state. Measuring |+\rangle in the X-basis returns |+\rangle with probability 1 (the state is a pure "+" in that basis). Measuring |+\rangle in the computational basis returns 0 or 1 with probability \tfrac{1}{2} each. Same state, different basis, different statistics — this is one of the distinguishing features of quantum mechanics and is treated carefully in chapter 15.
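A short numerical sketch can make "same state, different basis, different statistics" tangible. It assumes the general form of the Born rule, P(k) = |\langle b_k|\psi\rangle|^2 for an orthonormal basis \{|b_k\rangle\}, which later chapters treat properly; the helper names `inner` and `probs_in_basis` are hypothetical.

```python
import math

S = 1 / math.sqrt(2)

def inner(bra_of, ket):
    """Inner product <bra_of|ket>: conjugate the first vector's components."""
    return sum(b.conjugate() * k for b, k in zip(bra_of, ket))

def probs_in_basis(state, basis):
    """Born rule in an orthonormal basis: P(k) = |<b_k|state>|^2."""
    return [abs(inner(b, state)) ** 2 for b in basis]

Z = [(1, 0), (0, 1)]                  # computational basis
X = [(S, S), (S, -S)]                 # |+>, |->
Y = [(S, S * 1j), (S, -S * 1j)]       # |+i>, |-i>

plus = (S, S)
plus_i = (S, S * 1j)

for name, basis in (("Z", Z), ("X", X), ("Y", Y)):
    print(name,
          [round(p, 6) for p in probs_in_basis(plus, basis)],
          [round(p, 6) for p in probs_in_basis(plus_i, basis)])
```

For |+\rangle the X-basis row is deterministic and the other two are even splits; for |{+}i\rangle it is the Y-basis row that is deterministic. The Z-basis row is identical for both states, which is why a computational-basis measurement alone cannot separate them.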

Why the computational basis gets special status: not because it is mathematically privileged, but because it is the basis that aligns with how most quantum hardware reads out qubits. The physical measurement apparatus on a superconducting qubit or a trapped ion is built to distinguish |0\rangle from |1\rangle; other bases require an extra gate before measurement. Conceptually every orthonormal basis is equal; experimentally, \{|0\rangle, |1\rangle\} is the built-in readout.

Worked examples

Example 1 — normalisation and measurement probabilities for a given state

Given

|\psi\rangle = \tfrac{1}{2}|0\rangle + \tfrac{\sqrt{3}}{2}|1\rangle,

verify that |\psi\rangle is a valid qubit state, and compute the probabilities of each measurement outcome in the computational basis.

Step 1. Identify \alpha and \beta.

\alpha = \tfrac{1}{2}, \qquad \beta = \tfrac{\sqrt{3}}{2}.

Why: the coefficient of |0\rangle is \alpha; the coefficient of |1\rangle is \beta. Both are real here, so the complex conjugate of each is just itself.

Step 2. Compute |\alpha|^2 and |\beta|^2.

|\alpha|^2 = \left|\tfrac{1}{2}\right|^2 = \tfrac{1}{4}, \qquad |\beta|^2 = \left|\tfrac{\sqrt{3}}{2}\right|^2 = \tfrac{3}{4}.

Why: for a real number r, |r|^2 = r^2. Both amplitudes here are real and positive, so squaring directly gives the squared magnitude.

Step 3. Check normalisation.

|\alpha|^2 + |\beta|^2 = \tfrac{1}{4} + \tfrac{3}{4} = 1. \ \checkmark

Why: the state is normalised iff the squared magnitudes sum to 1. If they did not, you would have to divide the whole state by \sqrt{|\alpha|^2 + |\beta|^2} before extracting probabilities.

Step 4. Apply the Born rule.

P(0) = |\alpha|^2 = \tfrac{1}{4} = 25\%, \qquad P(1) = |\beta|^2 = \tfrac{3}{4} = 75\%.

Why: the probability of observing outcome k in the computational basis is the squared magnitude of the amplitude on |k\rangle.

Result. |\psi\rangle is a valid qubit state with measurement probabilities P(0) = 25\% and P(1) = 75\%. If you prepared this state on real hardware and measured 1000 times, you would expect roughly 250 zeros and 750 ones, with some statistical fluctuation.

The state is "mostly $|1\rangle$" — three-quarters of the measurements will return $1$.

What this shows. Moving from amplitudes to probabilities is a two-step drill: square the magnitudes, then check they sum to 1. If the sum is anything other than 1, the state was not normalised and needs rescaling before you read probabilities.
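The rescaling mentioned in Step 3 can be sketched as follows. The function name `normalise` and the unnormalised starting vector are illustrative assumptions; the vector is chosen so that rescaling recovers the state from this example.

```python
import math

def normalise(alpha, beta):
    """Rescale (alpha, beta) to unit length; raises on the zero vector."""
    norm = math.sqrt(abs(alpha) ** 2 + abs(beta) ** 2)
    if norm == 0:
        raise ValueError("the zero vector cannot be normalised")
    return alpha / norm, beta / norm

# A hypothetical unnormalised ket: 1|0> + sqrt(3)|1> (squared magnitudes sum to 4).
alpha, beta = normalise(1, math.sqrt(3))
print(alpha, beta)                       # close to 1/2 and sqrt(3)/2
print(abs(alpha) ** 2, abs(beta) ** 2)   # close to 0.25 and 0.75
```

Dividing by the norm leaves the ratio of the amplitudes, and hence the relative phase, untouched; only the overall length changes.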

Example 2 — a state with a non-trivial relative phase

Consider the state

|\psi\rangle = \tfrac{1}{\sqrt{2}}|0\rangle + \tfrac{i}{\sqrt{2}}|1\rangle.

Verify normalisation. Compute the measurement probabilities. Then contrast this state with |+\rangle = \tfrac{1}{\sqrt{2}}(|0\rangle + |1\rangle) — same magnitudes, different phase.

Step 1. Identify \alpha and \beta and their conjugates.

\alpha = \tfrac{1}{\sqrt{2}}, \qquad \beta = \tfrac{i}{\sqrt{2}}, \qquad \alpha^* = \tfrac{1}{\sqrt{2}}, \qquad \beta^* = \tfrac{-i}{\sqrt{2}}.

Why: the complex conjugate flips the sign of the imaginary part. \alpha is purely real, so \alpha^* = \alpha. \beta is purely imaginary, so \beta^* = -\beta.

Step 2. Compute |\alpha|^2 = \alpha^*\alpha and |\beta|^2 = \beta^*\beta.

|\alpha|^2 = \tfrac{1}{\sqrt{2}} \cdot \tfrac{1}{\sqrt{2}} = \tfrac{1}{2}.

|\beta|^2 = \tfrac{-i}{\sqrt{2}} \cdot \tfrac{i}{\sqrt{2}} = \tfrac{-i^2}{2} = \tfrac{-(-1)}{2} = \tfrac{1}{2}.

Why -i \cdot i = 1: from the definition i^2 = -1, the product (-i)(i) = -i^2 = -(-1) = 1. The squared magnitude of any complex number is non-negative real; here it came out to \tfrac{1}{2} as expected.

Step 3. Check normalisation.

|\alpha|^2 + |\beta|^2 = \tfrac{1}{2} + \tfrac{1}{2} = 1. \ \checkmark

Step 4. Apply the Born rule.

P(0) = \tfrac{1}{2}, \qquad P(1) = \tfrac{1}{2}.

Step 5. Compare with |+\rangle.

|+\rangle = \tfrac{1}{\sqrt{2}}|0\rangle + \tfrac{1}{\sqrt{2}}|1\rangle, \qquad \text{probabilities:}\ P(0) = \tfrac{1}{2},\ P(1) = \tfrac{1}{2}.

Both |\psi\rangle and |+\rangle give identical probabilities under a computational-basis measurement. But they are different states: the relative phase between the |0\rangle and |1\rangle amplitudes is +1 in one and i in the other. A measurement in a different basis — for instance, the Y-basis \{|{+}i\rangle, |{-}i\rangle\} — distinguishes them:

|+\rangle in the Y-basis: 50\% / 50\% (even split).
|\psi\rangle = |{+}i\rangle in the Y-basis: 100\% / 0\% (deterministic).

Why identical Z-statistics but different Y-statistics: the Z-probability formula only sees the magnitudes |\alpha|^2, |\beta|^2, which are the same for the two states. The Y-probability formula (which you will derive in chapter 15) depends on the relative phase, and it is exactly there that +1 differs from i.

Result. |\psi\rangle = (|0\rangle + i|1\rangle)/\sqrt{2} is a normalised qubit state with computational-basis probabilities \tfrac{1}{2}, \tfrac{1}{2} — identical to |+\rangle by that measurement alone — yet physically distinct because of its imaginary relative phase.

The two states give identical probabilities under a computational-basis ($Z$) measurement, so a single such measurement cannot tell them apart. But the relative phase — $+1$ versus $i$ — is a genuine physical property, detectable by measurements in other bases.

What this shows. The squared magnitude of an amplitude is just one piece of information about it; the phase is another. Two states can share the former and differ in the latter, and such states are physically distinct. This is why the amplitudes of a qubit live in \mathbb{C}, not in the reals — and why the Bloch sphere (chapter 14) is a 2-dimensional surface, not a 1-dimensional line: it needs one angle for the magnitude split and another for the phase.

Common confusions

"A qubit is 0 and 1 at the same time." This is the pop-science line, and it is misleading. A qubit is a unit vector in \mathbb{C}^2. The amplitudes \alpha and \beta are not "both values at once" — they are complex numbers whose squared magnitudes are the probabilities of the respective outcomes. When you measure, you get exactly one classical bit, 0 or 1. What is really happening is that before measurement, the qubit carries an entire structured object (an amplitude assignment) that can interfere with itself and with other qubits — but a single measurement only extracts one classical bit from it. The "both at once" framing discards the phase structure and hides the measurement rule. You already saw, in the Scenario A / Scenario B comparison, why it is wrong.
"Amplitudes are probabilities." No — probabilities are squared magnitudes of amplitudes. Amplitudes can be negative or complex; probabilities cannot. Amplitudes can cancel; probabilities cannot. An amplitude of -\tfrac{1}{\sqrt{2}} is a perfectly valid coefficient; a probability of -\tfrac{1}{2} is nonsense. Whenever you see the symbol \alpha attached to a ket, you are looking at an amplitude; only after squaring its magnitude do you have a probability.
"Any 2D complex vector is a qubit." No — it must be a unit vector, i.e., |\alpha|^2 + |\beta|^2 = 1. A non-normalised vector is not a physical state. Unnormalised kets are fine as intermediate objects in algebra (example 2 of chapter 5 walked through the normalisation step), but before you extract probabilities you must divide by the norm.
"Superposition and a classical probability mixture are the same thing." They are not. A superposition |\psi\rangle = \alpha|0\rangle + \beta|1\rangle is a specific, fully-determined quantum state — the experimenter knows exactly what it is. A classical mixture "state |0\rangle with probability p, state |1\rangle with probability 1-p" is ignorance: the experimenter prepared one of two states, and nobody knows which. The Scenario A / Scenario B comparison above is the sharp demonstration. Chapter 13+ formalises this distinction in the language of density matrices: superpositions are pure states, mixtures are mixed states.
"Global phase is a physical property." No. Multiplying the whole state by e^{i\gamma} gives e^{i\gamma}|\psi\rangle, a ket that produces identical measurement statistics in any basis. The global phase is unobservable. What is observable is the relative phase between amplitudes: the phase of \alpha compared to the phase of \beta. In |+\rangle the relative phase is 0 (both positive real); in |{-}\rangle it is \pi (opposite signs); in |{+}i\rangle it is \pi/2 (quarter-turn). Different relative phases, different physical states.
"Superposition implies randomness." The randomness comes from the measurement, not from the state. The state |+\rangle is a definite state; the amplitudes are fixed. The randomness enters only when you measure. Quantum evolution (unitary gates) is entirely deterministic on the state vector. The Born rule is the bridge where determinism on |\psi\rangle becomes probabilism on experimental outcomes.

Going deeper

You now have the minimal kit to work with single-qubit states: you know the formula, the normalisation, the Born rule, and why complex amplitudes are necessary. The rest of this section is optional context: global vs relative phase made precise, the extension to d-level systems (qudits), a concept called fidelity that measures how distinguishable two states are, and a brief note on the Indian origin of identical-particle statistics.

Reread this section after the Bloch sphere (chapter 14) and the projective-measurement chapter (chapter 13) — the formal content below cross-references both, and the geometric picture the Bloch sphere gives will make the global-vs-relative phase distinction obvious.

Global phase versus relative phase

For any real number \gamma, so that |e^{i\gamma}| = 1, the states |\psi\rangle and e^{i\gamma}|\psi\rangle produce identical measurement probabilities in every basis. The proof is short: for any basis ket |k\rangle,

P_k\bigl(e^{i\gamma}|\psi\rangle\bigr) = \bigl|\langle k|\,e^{i\gamma}\,|\psi\rangle\bigr|^2 = \bigl|e^{i\gamma}\bigr|^2 \cdot \bigl|\langle k|\psi\rangle\bigr|^2 = 1 \cdot \bigl|\langle k|\psi\rangle\bigr|^2 = P_k\bigl(|\psi\rangle\bigr).

Why: the scalar e^{i\gamma} pulls out of the bra-ket sandwich (it is just a number), and its squared magnitude is 1. The probabilities are untouched.

So two kets that differ only by a global phase are the same physical state. This is why physicists often say pure states are "rays" in Hilbert space — equivalence classes under global phase — rather than vectors.

Contrast this with relative phase: the phase of \beta relative to \alpha in |\psi\rangle = \alpha|0\rangle + \beta|1\rangle. Write \alpha = |\alpha|e^{i\gamma_\alpha} and \beta = |\beta|e^{i\gamma_\beta}. Pull out the overall phase e^{i\gamma_\alpha} to get

|\psi\rangle = e^{i\gamma_\alpha}\bigl(|\alpha||0\rangle + |\beta|e^{i(\gamma_\beta - \gamma_\alpha)}|1\rangle\bigr).

The e^{i\gamma_\alpha} in front is the unobservable global phase. The factor e^{i(\gamma_\beta - \gamma_\alpha)} on |1\rangle is the relative phase — fully physical. It changes how the state behaves under gates and measurement in bases other than Z. The whole content of example 2 above — |+\rangle versus |{+}i\rangle — was a demonstration that relative phase is observable.

On the Bloch sphere (chapter 14), the global phase literally does not appear — two rays that differ by e^{i\gamma} correspond to the same point on the sphere. Relative phase shows up as the azimuthal angle \varphi around the equator. That geometric fact is the clean encoding of "global irrelevant, relative physical" in a single picture.
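The distinction can also be checked numerically. The sketch below extracts the relative phase \gamma_\beta - \gamma_\alpha as the argument of \beta\alpha^* and confirms that a global phase leaves it unchanged; the helper names `relative_phase` and `with_global_phase` and the value 0.7 rad are assumptions made for illustration.

```python
import cmath
import math

def relative_phase(alpha, beta):
    """Phase of beta relative to alpha, in radians, within [-pi, pi]."""
    return cmath.phase(beta * alpha.conjugate())

def with_global_phase(state, gamma):
    """Multiply both amplitudes by e^{i*gamma}, with gamma real."""
    factor = cmath.exp(1j * gamma)
    return tuple(factor * amp for amp in state)

S = 1 / math.sqrt(2)
plus_i = (S + 0j, S * 1j)

shifted = with_global_phase(plus_i, 0.7)     # 0.7 rad is an arbitrary choice
print(relative_phase(*plus_i), relative_phase(*shifted))   # both close to pi/2
print([abs(a) ** 2 for a in plus_i], [abs(a) ** 2 for a in shifted])
```

Both the relative phase and the computational-basis probabilities come out the same before and after the shift, up to floating-point rounding, while the individual amplitudes themselves have changed.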

Qudits — higher-dimensional quantum systems

A qudit is a unit vector in \mathbb{C}^d for d \geq 2. For d = 2 you have the qubit. For d = 3 you have a qutrit — three basis states |0\rangle, |1\rangle, |2\rangle and amplitudes \alpha_0, \alpha_1, \alpha_2 satisfying \sum_k |\alpha_k|^2 = 1. For general d,

|\psi\rangle = \sum_{k=0}^{d-1} \alpha_k |k\rangle, \qquad \sum_k |\alpha_k|^2 = 1.

The Born rule generalises trivially: P(k) = |\alpha_k|^2. Everything you learned about qubits still applies — only the dimension changes.
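As a small illustration of that generalisation, the sketch below applies P(k) = |\alpha_k|^2 to a list of d amplitudes. The function name `qudit_probabilities` and the example qutrit are hypothetical choices, not taken from any particular library or experiment.

```python
import math

def qudit_probabilities(amplitudes, tol=1e-9):
    """Born rule for a d-level state: P(k) = |alpha_k|^2."""
    probs = [abs(a) ** 2 for a in amplitudes]
    if not math.isclose(sum(probs), 1.0, abs_tol=tol):
        raise ValueError("amplitudes are not normalised")
    return probs

# A hypothetical qutrit (d = 3) with one negative and one imaginary amplitude.
qutrit = [1 / 2, -1 / 2, 1j / math.sqrt(2)]
print(qudit_probabilities(qutrit))   # close to [0.25, 0.25, 0.5]
```

The code is the two-amplitude case with the pair replaced by a list, which mirrors the claim that only the dimension changes.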

Why does quantum computing focus on qubits rather than qudits? Partly because \{|0\rangle, |1\rangle\} maps naturally to classical binary logic, which makes algorithm design and error correction easier to reason about. Partly because many physical qubit implementations (a two-level atomic transition, the two states of a spin-1/2) are naturally two-dimensional. But higher-d systems are an active research area — some error-correction codes are more efficient on qudits, and several hardware platforms (trapped ions, photonic modes, superconducting qudits) can expose more than two levels if you want them.

Fidelity — how distinguishable two states are

Given two pure states |\phi\rangle and |\psi\rangle, their fidelity is

F(|\phi\rangle, |\psi\rangle) = \bigl|\langle\phi|\psi\rangle\bigr|^2.

Fidelity is a number between 0 and 1. F = 1 means the two states are identical (as physical states, possibly differing by a global phase). F = 0 means they are orthogonal: perfectly distinguishable by a measurement in a basis containing both. Intermediate values quantify how confusable two states are.

One of the many uses of fidelity: when you run a quantum gate U on hardware that is not perfectly calibrated, the actual operation is some slightly-different unitary \tilde U. The fidelity F(U|\psi\rangle, \tilde U|\psi\rangle) quantifies how close the real output is to the ideal one. Hardware specs are often quoted as "gate fidelities" — a gate fidelity of 99.9\% means the real gate produces a state whose inner-product-squared with the ideal state is 0.999 on average.

Fidelity is the quantum-information theoretic generalisation of the "overlap" concept from chapter 5. It will reappear throughout the error-correction chapters (Part 11), where 1 - F is a natural measure of noise.
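A minimal numerical sketch of pure-state fidelity follows, taking it to be the squared magnitude of the inner product, in line with the "inner-product-squared" description above. The function name `fidelity` and the tuple representation of states are assumptions for this example.

```python
import math

def fidelity(phi, psi):
    """|<phi|psi>|^2 for two normalised pure states given as amplitude tuples."""
    overlap = sum(p.conjugate() * q for p, q in zip(phi, psi))
    return abs(overlap) ** 2

S = 1 / math.sqrt(2)
ket0, plus, minus = (1, 0), (S, S), (S, -S)

print(round(fidelity(plus, plus), 6))        # 1.0: identical states
print(round(fidelity(plus, minus), 6))       # 0.0: orthogonal states
print(round(fidelity(ket0, plus), 6))        # 0.5: partially confusable
print(round(fidelity(plus, (-S, -S)), 6))    # 1.0: a global phase of -1 is ignored
```

The last line shows the global-phase point in action: multiplying |+\rangle by -1 changes every amplitude but leaves the fidelity at 1, because the absolute value discards the overall phase of the overlap.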

Bose statistics and indistinguishable qubits

One implicit assumption runs through everything in this chapter: all qubits are identical. Two |0\rangle states, in two different pieces of hardware, are the same physical state. There is no "this |0\rangle" versus "that |0\rangle". Particles (or more generally, quantum systems) that share this indistinguishability have specific statistical rules, and those rules were first made precise by Satyendra Nath Bose.

In 1924, Bose wrote a short paper re-deriving Planck's blackbody radiation law by counting photon states in a new way: instead of treating the photons as distinguishable, he treated every assignment of N photons to M energy levels as a single state, regardless of which-photon-in-which-level. The paper was initially rejected; Bose mailed it to Einstein, who translated it into German, published it, and extended the argument to massive particles. The resulting statistics — Bose-Einstein statistics — are why photons can pile up into the same quantum state (the principle behind lasers), why helium becomes a superfluid near absolute zero, and why the whole theory of identical bosons works. The particles we now call bosons are literally named after Bose.

For quantum computing the connection is this. A qubit's computational basis state |0\rangle or |1\rangle is defined by the quantum state of a physical system, not by which particular atom or electron or photon is in that state. If you have two qubits and both are in |0\rangle, the joint state is |00\rangle = |0\rangle \otimes |0\rangle (chapter 8). One caveat keeps the picture honest: the qubits in a register are themselves distinguishable, because each sits at a labelled location (this ion, that circuit), which is why the plain tensor product needs no symmetrisation. The link to Bose sits one level down. The carriers are identical particles, so a |0\rangle prepared on one atom is the same state as a |0\rangle prepared on another atom of the same species, and photonic hardware relies on bosonic statistics directly. The Indian connection here is not decoration; it is part of the physics underneath the notation.

Where this leads next

The Four Postulates of Quantum Mechanics — chapter 11. The qubit is the one-qubit case of postulate 1 (state space is a Hilbert space, states are unit vectors). The next chapter states all four postulates and links them together.
Projective Measurement — chapter 13. The Born rule in this chapter is the computational-basis case. Projective measurement is the Born rule for any orthonormal basis.
The Bloch Sphere — chapter 14. The geometric visualisation of every single-qubit state as a point on a sphere — global-vs-relative phase becomes obvious, gates become rotations, measurements become projections.
Global vs Relative Phase — the detailed treatment of the phase distinction previewed above.
