Tensor Products the Quantum Way

Note: Company names, engineers, incidents, numbers, and scaling scenarios in this article are hypothetical — even when they resemble real ones. See the full disclaimer.

In short

The tensor product is the rule for combining two quantum systems into one bigger system. If qubit A lives in a 2-dimensional space and qubit B lives in a 2-dimensional space, the joint system lives in the tensor product of those two spaces, which has dimension 2 \times 2 = 4. Three qubits give 2 \times 2 \times 2 = 8. And n qubits give 2^n, the number that Feynman pointed at when he argued that classical computers cannot efficiently simulate quantum systems. The three notations |0\rangle \otimes |1\rangle, |0\rangle|1\rangle, and |01\rangle all mean the same state, and you pick among them based on what you are trying to show.

You know what one qubit is. It is a unit vector in a 2-dimensional complex space: one amplitude for |0\rangle, one amplitude for |1\rangle, with squared magnitudes that sum to 1. A small thing.

Now put two qubits on the same chip. The chip is still just sitting there on your desk, but now you have a joint quantum system, and a natural question: what does its state look like? The answer is not "a pair of single-qubit states, one per qubit." The answer is a single vector in a larger space — a 4-dimensional complex space — and the only way to build that space out of the two single-qubit spaces is a construction called the tensor product.

This chapter is about that construction. The tensor product is what lets us write two-qubit states. It is what makes three-qubit and four-qubit and thousand-qubit states possible. It is the source of the infamous 2^n: the reason Querion's 105-qubit Willow chip lives in a space whose description requires 2^{105} \approx 4 \times 10^{31} complex numbers, thousands of times more than the number of atoms in a human body (roughly 7 \times 10^{27}). And it is the setting in which entanglement will eventually live.

By the end you will know how to compute |0\rangle \otimes |1\rangle in components, how to read the three notations |0\rangle \otimes |1\rangle, |0\rangle|1\rangle, |01\rangle as the same state, and how to spot the one kind of two-qubit state that cannot be written as a tensor product — the preview of entanglement that closes the chapter.

Why a bigger space is needed

Here is the natural mistake. You have qubit A, sitting in its own 2D space with basis \{|0\rangle_A, |1\rangle_A\}. You have qubit B, sitting in its own 2D space with basis \{|0\rangle_B, |1\rangle_B\}. You might guess that the joint system has 4 basis states — that part is correct — but you might also guess that a joint state is just "some state on A" and "some state on B" side by side. That part is wrong, and the reason it is wrong is the whole point of the tensor product.

To see it, count the possible joint configurations. If qubit A is in |0\rangle and qubit B is in |0\rangle, the joint system is in one particular state. If A is |0\rangle and B is |1\rangle, that is a different joint state. Likewise |1\rangle_A, |0\rangle_B and |1\rangle_A, |1\rangle_B. That is four distinct joint configurations, each of which must be a basis vector in the joint space. So the joint space has dimension at least 4.

And that is exactly the dimension. The joint space of two qubits is a 4-dimensional complex vector space whose basis is the four combinations:

\{\,|00\rangle,\; |01\rangle,\; |10\rangle,\; |11\rangle\,\}.

The name for this space is \mathbb{C}^2 \otimes \mathbb{C}^2, read "C-two tensor C-two." It has dimension 2 \times 2 = 4, but the \otimes symbol matters: this is not the direct sum \mathbb{C}^2 \oplus \mathbb{C}^2, whose dimension 2 + 2 = 4 matches only by coincidence and whose basis is different. Tensor products multiply dimensions; direct sums add them. The two counts part ways at three qubits: 2 \times 2 \times 2 = 8 against 2 + 2 + 2 = 6.

Two single-qubit spaces combine via the tensor product into a 4-dimensional joint space. The basis multiplies: 2 × 2 = 4.

Why multiply and not add: the joint system has to keep track of both qubits' states simultaneously. A direct sum would say "the system is either in A's space or in B's space, not both." A tensor product says "the system has a state on A and a state on B at the same time." That is what a joint description must do.

The tensor product of kets — the Kronecker recipe

Now the mechanical part: given two single-qubit kets, compute their tensor product as a 4-component column vector.

Write the single-qubit states in components:

|a\rangle = \begin{pmatrix} a_0 \\ a_1 \end{pmatrix}, \qquad |b\rangle = \begin{pmatrix} b_0 \\ b_1 \end{pmatrix}.

The tensor product |a\rangle \otimes |b\rangle is computed by the Kronecker product rule: multiply each component of the first ket by the whole second ket, and stack the results into one tall column.

|a\rangle \otimes |b\rangle \;=\; \begin{pmatrix} a_0 \cdot |b\rangle \\ a_1 \cdot |b\rangle \end{pmatrix} \;=\; \begin{pmatrix} a_0\,b_0 \\ a_0\,b_1 \\ a_1\,b_0 \\ a_1\,b_1 \end{pmatrix}.

Why this stacking rule: the joint-space basis is ordered |00\rangle, |01\rangle, |10\rangle, |11\rangle, and the four components are the amplitudes the joint state assigns to those four basis states. Reading each row: the amplitude for |00\rangle is whatever A has on |0\rangle times whatever B has on |0\rangle, which is a_0 b_0. The amplitude for |01\rangle is a_0 b_1. And so on. Multiplication in, stacking out.

Try it on a concrete case. Let |a\rangle = |0\rangle = \begin{pmatrix} 1 \\ 0 \end{pmatrix} and |b\rangle = |1\rangle = \begin{pmatrix} 0 \\ 1 \end{pmatrix}. Then

|0\rangle \otimes |1\rangle \;=\; \begin{pmatrix} 1 \cdot 0 \\ 1 \cdot 1 \\ 0 \cdot 0 \\ 0 \cdot 1 \end{pmatrix} \;=\; \begin{pmatrix} 0 \\ 1 \\ 0 \\ 0 \end{pmatrix}.

Run through the other three combinations the same way and you get the four basis columns of the joint space:

|00\rangle = \begin{pmatrix} 1 \\ 0 \\ 0 \\ 0 \end{pmatrix},\quad |01\rangle = \begin{pmatrix} 0 \\ 1 \\ 0 \\ 0 \end{pmatrix},\quad |10\rangle = \begin{pmatrix} 0 \\ 0 \\ 1 \\ 0 \end{pmatrix},\quad |11\rangle = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 1 \end{pmatrix}.

They are exactly the standard basis of \mathbb{C}^4. The ordering is "binary counting on the two qubits": qubit A is the higher-order bit, qubit B is the lower-order bit, so reading the index in binary gives the joint configuration. This is the convention used throughout this article and in most textbooks. Some software frameworks order the bits the other way, so check the convention before comparing vectors.
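The stacking rule is short enough to check by hand, but a few lines of code make the binary-counting pattern easy to see. The following is a minimal sketch in plain Python; the helper name `kron` and the list-of-amplitudes representation are illustrative choices, not something the article prescribes.

```python
def kron(a, b):
    """Kronecker product of two kets given as flat lists of amplitudes."""
    # Each component of `a` multiplies the whole of `b`; the results are stacked in order.
    return [x * y for x in a for y in b]

ket0 = [1, 0]
ket1 = [0, 1]

# |0> ⊗ |1>: amplitudes on |00>, |01>, |10>, |11>.
ket01 = kron(ket0, ket1)

# Build all four joint basis states, keyed by their bit-string label.
single = {"0": ket0, "1": ket1}
basis = {i + j: kron(single[i], single[j]) for i in "01" for j in "01"}

for label, vec in basis.items():
    # The position of the single 1 equals the label read as a binary number.
    print(label, vec, vec.index(1) == int(label, 2))
```

The same `kron` works for superpositions, since nothing in it assumes the inputs are basis states.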

The three notations

You will see three notations in textbooks and papers, all for the same state:

Full tensor-product form: |0\rangle \otimes |1\rangle.
Concatenated kets: |0\rangle|1\rangle.
Compact multi-qubit ket: |01\rangle.

All three mean "qubit A in state |0\rangle, qubit B in state |1\rangle." The only difference is typographic — the \otimes symbol emphasises that you are performing a tensor product, the two juxtaposed kets emphasise the two subsystems as separate factors, and the single combined ket emphasises that the result is one state of the whole system.

The house convention on padho-wiki, and in most modern QC literature:

Use |01\rangle in running text when the state is just a label you need to name.
Use |0\rangle \otimes |1\rangle when you are explicitly teaching or computing the tensor product — for example, when expanding a product of superpositions.
Use |0\rangle|1\rangle — intermediate between the other two — when you want to remind the reader that there are two subsystems without breaking the line with a \otimes symbol.

The three notations for the same two-qubit basis state. Switch between them freely; they mean the same thing.

The tensor product of superpositions

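The interesting case is when both qubits are in superposition. Write

|a\rangle = a_0|0\rangle + a_1|1\rangle, \qquad |b\rangle = b_0|0\rangle + b_1|1\rangle.

Use the fact that \otimes is linear in each slot, just like ordinary multiplication distributes over addition:

|a\rangle \otimes |b\rangle = (a_0|0\rangle + a_1|1\rangle) \otimes (b_0|0\rangle + b_1|1\rangle).

Expand term by term exactly as you would an algebraic product of two binomials. You get four terms, one for each way to pick one factor from each parenthesis:

|a\rangle \otimes |b\rangle = a_0 b_0\,|0\rangle \otimes |0\rangle + a_0 b_1\,|0\rangle \otimes |1\rangle + a_1 b_0\,|1\rangle \otimes |0\rangle + a_1 b_1\,|1\rangle \otimes |1\rangle.

Switch to the compact notation for readability:

|a\rangle \otimes |b\rangle = a_0 b_0\,|00\rangle + a_0 b_1\,|01\rangle + a_1 b_0\,|10\rangle + a_1 b_1\,|11\rangle.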

Why the four coefficients have exactly this form: each basis state |ij\rangle in the joint system is |i\rangle \otimes |j\rangle, which pulls the amplitude a_i from the first qubit and b_j from the second, and the tensor product multiplies them. The coefficient on |ij\rangle in a product state is always the product a_i \cdot b_j. That structural fact is the fingerprint of a product state.

This "multiply each pair" rule is so central that every first QC course draws it as a 2 \times 2 grid — the central visual of this chapter.

The four amplitudes of a product state are the four products of the single-qubit amplitudes — one for each cell of the 2×2 grid.
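The "multiply each pair" grid can be sketched directly. The amplitudes below are hypothetical values chosen only so that each qubit is normalised, and `product_amplitudes` is an illustrative helper name.

```python
def product_amplitudes(a, b):
    """Return the 2x2 grid g[i][j] = a[i] * b[j] for single-qubit amplitudes a and b."""
    return [[a_i * b_j for b_j in b] for a_i in a]

# Hypothetical example amplitudes (each pair is normalised).
a = [0.6, 0.8]                      # a_0, a_1
b = [1j / 2**0.5, 1 / 2**0.5]       # b_0, b_1

grid = product_amplitudes(a, b)

# Reading the grid row by row gives the amplitudes on |00>, |01>, |10>, |11>.
joint = [grid[i][j] for i in range(2) for j in range(2)]

# A product of normalised states is itself normalised (up to floating-point rounding).
norm_sq = sum(abs(c) ** 2 for c in joint)
print(joint)
print(norm_sq)
```

Because every cell is a product a_i b_j, the two diagonals of the grid always multiply to the same number: c_{00} c_{11} = c_{01} c_{10}. That is one concrete way to see the fingerprint of a product state.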

Dimension counting — where 2^n comes from

The tensor product is associative: three qubits live in \mathbb{C}^2 \otimes \mathbb{C}^2 \otimes \mathbb{C}^2, with dimension 2 \times 2 \times 2 = 8 and basis

\{\,|000\rangle, |001\rangle, |010\rangle, |011\rangle, |100\rangle, |101\rangle, |110\rangle, |111\rangle\,\}.

In general, n qubits live in a 2^n-dimensional complex Hilbert space, whose basis consists of the 2^n bit strings of length n.

This single number — 2^n — is the whole headline of quantum information. A classical n-bit register can store one of 2^n configurations, but at any moment it is in one configuration. A quantum n-qubit register's state is a vector with 2^n complex amplitudes, one per classical configuration, all present simultaneously. To write down an arbitrary state, you need 2^n complex numbers. To simulate one step of its evolution on a classical computer, you need to multiply a 2^n \times 2^n matrix into that 2^n-vector. And 2^n grows fast.

The Hilbert-space dimension $2^n$ plotted on a logarithmic scale. At $n = 50$ it exceeds $10^{15}$; at $n = 100$ it exceeds $10^{30}$. This is the cliff that motivates quantum computing.

Concrete checkpoints:

n = 10 qubits: dimension 2^{10} = 1024. A laptop shrugs at this.
n = 20: dimension \sim 10^6. A laptop handles this with some memory.
n = 30: dimension \sim 10^9. Doable on a workstation; the state vector takes about 16 GB of RAM at double precision.
n = 40: dimension \sim 10^{12}. A full state-vector simulation now needs a supercomputer.
n = 50: dimension \sim 10^{15}. Beyond classical state-vector simulation except by approximation.
n = 100: dimension > 10^{30}. Storing the amplitudes at 16 bytes each would take about 2 \times 10^{31} bytes, far beyond the memory of any computer that exists.
n = 300: dimension > 10^{90}, more than the number of atoms in the observable universe.
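To turn the checkpoints above into memory figures, here is a rough sketch. It assumes each complex amplitude is stored as two 64-bit floats (16 bytes); that storage format is an assumption for illustration, and real simulators may use other precisions or compressed representations.

```python
BYTES_PER_AMPLITUDE = 16  # assumption: one complex number = two 64-bit floats

def state_vector_bytes(n_qubits):
    """Bytes needed to store all 2**n amplitudes of an n-qubit state vector."""
    return (2 ** n_qubits) * BYTES_PER_AMPLITUDE

for n in (10, 20, 30, 40, 50):
    size = state_vector_bytes(n)
    print(f"n = {n:2d}: dimension 2^{n} = {2**n:.3e}, about {size / 2**30:.3g} GiB")
```

Each extra qubit doubles the figure, which is why the jump from "laptop" to "supercomputer" takes only a couple of dozen qubits.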

This is Feynman's 1982 argument made precise: classical simulation of an arbitrary n-qubit state costs 2^n in memory and worse in time, which means that around fifty qubits the task already strains the largest classical machines, and at a few hundred it is outside what any machine we can ever build will do by brute force. A real quantum computer, if you can build it, holds that 2^n-dimensional state naturally, because the tensor-product structure is physics, not bookkeeping.

Product states vs entangled states — a first look

Every state in the joint 4-dimensional space is a linear combination of the four basis states |00\rangle, |01\rangle, |10\rangle, |11\rangle — in general

|\psi\rangle_{AB} \;=\; c_{00}\,|00\rangle + c_{01}\,|01\rangle + c_{10}\,|10\rangle + c_{11}\,|11\rangle

with four complex amplitudes satisfying |c_{00}|^2 + |c_{01}|^2 + |c_{10}|^2 + |c_{11}|^2 = 1.

Some of these joint states are tensor products of two single-qubit states — product states, where you can write |\psi\rangle_{AB} = |a\rangle_A \otimes |b\rangle_B for some single-qubit |a\rangle, |b\rangle. And some are not. The ones that are not are called entangled.

To see that entangled states exist, try to factor the state

|\Phi^+\rangle \;=\; \tfrac{1}{\sqrt{2}}\bigl(|00\rangle + |11\rangle\bigr)

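as a tensor product. Suppose you could write it as (\alpha|0\rangle + \beta|1\rangle) \otimes (\gamma|0\rangle + \delta|1\rangle). Expanding the right-hand side (using the expansion rule from earlier):

\alpha\gamma\,|00\rangle + \alpha\delta\,|01\rangle + \beta\gamma\,|10\rangle + \beta\delta\,|11\rangle.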

Match coefficients with \tfrac{1}{\sqrt{2}}(|00\rangle + |11\rangle):

coefficient of |00\rangle: \alpha\gamma = 1/\sqrt{2}
coefficient of |01\rangle: \alpha\delta = 0
coefficient of |10\rangle: \beta\gamma = 0
coefficient of |11\rangle: \beta\delta = 1/\sqrt{2}

From the first equation, \alpha \neq 0 and \gamma \neq 0. From \alpha\delta = 0 with \alpha \neq 0, we need \delta = 0. But then \beta\delta = 0, which contradicts \beta\delta = 1/\sqrt{2} \neq 0.

Why the contradiction is the whole story: it says there is no way to pick the four numbers \alpha, \beta, \gamma, \delta that makes the product-state expansion match |\Phi^+\rangle. The state |\Phi^+\rangle exists — it is a perfectly good unit vector in \mathbb{C}^4 — but it is not in the image of the tensor-product map. It is a state of the joint system that cannot be decomposed into separate states of qubit A and qubit B.

|\Phi^+\rangle is your first entangled state. It is one of the four Bell states, and it is the most famous object in quantum information. The full theory of entanglement — Schmidt decomposition, purity, concurrence, monogamy — is the subject of later chapters, but the definition you need right now is purely algebraic:

A joint state |\psi\rangle_{AB} is entangled if it cannot be written as a tensor product |a\rangle_A \otimes |b\rangle_B of single-qubit states. Otherwise it is a product state.
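For two qubits the definition can be turned into a one-line numerical check. The sketch below does not repeat the coefficient-matching argument; it swaps in a standard linear-algebra fact instead: the amplitudes factor as c_{ij} = a_i b_j exactly when the 2×2 array of amplitudes has zero determinant, c_{00}c_{11} - c_{01}c_{10} = 0. The function name and tolerance are illustrative assumptions.

```python
def is_product_state(c00, c01, c10, c11, tol=1e-12):
    """True if c00|00> + c01|01> + c10|10> + c11|11> factors as |a> ⊗ |b>."""
    # For a product state c_ij = a_i * b_j, so c00*c11 and c01*c10 are the same number.
    return abs(c00 * c11 - c01 * c10) < tol

s = 2 ** -0.5
bell_phi_plus = (s, 0, 0, s)        # (|00> + |11>) / sqrt(2)
zero_plus = (s, s, 0, 0)            # |0> ⊗ |+>
plus_plus = (0.5, 0.5, 0.5, 0.5)    # |+> ⊗ |+>

for name, state in [("Phi+", bell_phi_plus), ("0,+", zero_plus), ("+,+", plus_plus)]:
    print(name, "product" if is_product_state(*state) else "entangled")
```

This shortcut is specific to two qubits. For larger systems the question "is it a product across this cut?" needs the Schmidt rank rather than a single determinant.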

Entanglement is not an exotic add-on feature of quantum mechanics. It is the default: the set of product states is a 3-complex-dimensional surface sitting inside the 4-complex-dimensional joint space, and every state off that surface is entangled. A state picked at random from the joint space is almost never a product state. When people say "quantum is exponentially richer than classical," this is the precise shape of the claim.

The tensor product of operators

Single-qubit gates act on one qubit at a time. Two-qubit gates can be more general — but a natural and important family consists of gates that act on the two qubits independently, one operation on each.

If U is a single-qubit operator acting on qubit A and V is a single-qubit operator acting on qubit B, the combined operation on the joint system is written U \otimes V. Its action on a product state is exactly what you would guess:

(U \otimes V)(|a\rangle \otimes |b\rangle) = U|a\rangle \otimes V|b\rangle.

Why this is the only sensible rule: each operator has its own subsystem to act on, and the tensor-product structure says they act independently. The left factor sees only qubit A, the right factor sees only qubit B, and the two results reassemble into a joint state by the tensor product.

In matrix form, U \otimes V is the Kronecker product of the two matrices — the same recipe as for kets, applied to matrices instead: multiply each entry of U by the whole matrix V, stacked into a 4 \times 4 grid.

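A useful concrete example. Take U = X (the bit-flip) and V = I (identity). Then X \otimes I is the two-qubit gate that flips qubit A and leaves qubit B untouched:

(X \otimes I)|00\rangle = X|0\rangle \otimes I|0\rangle = |10\rangle, \qquad (X \otimes I)|01\rangle = X|0\rangle \otimes I|1\rangle = |11\rangle,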

and so on. In circuit notation, X \otimes I looks like an X box on qubit A's wire with nothing on qubit B's — a single-qubit gate drawn on a two-qubit circuit.
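The matrix version of the Kronecker recipe can be checked on this example. The sketch below builds X \otimes I as a 4×4 list of rows and applies it to each basis state; `kron_matrix` and `apply` are illustrative helper names written in plain Python.

```python
def kron_matrix(U, V):
    """Kronecker product of two matrices given as lists of rows."""
    # Each entry of U multiplies the whole of V; the blocks are laid out in a grid.
    return [
        [u * v for u in u_row for v in v_row]
        for u_row in U
        for v_row in V
    ]

def apply(M, vec):
    """Matrix-vector product."""
    return [sum(m * x for m, x in zip(row, vec)) for row in M]

X = [[0, 1], [1, 0]]   # bit-flip
I = [[1, 0], [0, 1]]   # identity

X_on_A = kron_matrix(X, I)   # flips qubit A, leaves qubit B alone

labels = ["00", "01", "10", "11"]
for k, label in enumerate(labels):
    basis_vec = [1 if m == k else 0 for m in range(4)]
    out = apply(X_on_A, basis_vec)
    print(f"|{label}> -> |{labels[out.index(1)]}>")
```

In every line of output the first bit flips and the second bit is unchanged, which is what "X on qubit A, nothing on qubit B" should mean.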

The Hadamard tensor — where Grover's search begins

The most important application of tensor-product operators in early QC is the Hadamard tensor. Recall the single-qubit Hadamard gate:

H|0\rangle = \tfrac{1}{\sqrt{2}}(|0\rangle + |1\rangle) = |+\rangle.

Apply H \otimes H to the initial state |00\rangle. Using the product rule:

(H \otimes H)|00\rangle = H|0\rangle \otimes H|0\rangle = |+\rangle \otimes |+\rangle.

Why this factors cleanly: because the starting state |00\rangle is itself a product (|0\rangle \otimes |0\rangle), and the operator is a tensor product of single-qubit pieces. Product-state input + tensor-product operator = product-state output.

Expand |+\rangle \otimes |+\rangle using the grid:

|+\rangle|+\rangle = \tfrac{1}{\sqrt{2}}(|0\rangle + |1\rangle) \otimes \tfrac{1}{\sqrt{2}}(|0\rangle + |1\rangle) = \tfrac{1}{2}(|00\rangle + |01\rangle + |10\rangle + |11\rangle).

Every basis state gets the same amplitude \tfrac{1}{2}. This is the uniform superposition over all 2-qubit basis states — every outcome equally likely on measurement, with probability (1/2)^2 = 1/4 each.

Applying $H \otimes H$ to $|00\rangle$ yields the uniform superposition over all four 2-qubit basis states, each with measurement probability $1/4$.

The same trick scales. Applying H^{\otimes n} = H \otimes H \otimes \cdots \otimes H (n Hadamards in parallel) to the all-zeros state |0\rangle^{\otimes n} = |00\ldots 0\rangle produces the uniform superposition over all 2^n bit strings:

H^{\otimes n}\,|0\rangle^{\otimes n} = \frac{1}{\sqrt{2^n}} \sum_{x \in \{0,1\}^n} |x\rangle.

In one layer of gates, you have put the quantum register into an equal superposition of every possible classical input. That is the first step of Grover's search algorithm, of Deutsch-Jozsa, of Simon's algorithm, of most quantum algorithms that exhibit a speedup. Tensor products of Hadamards are the starting gun of quantum computing.
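The n-qubit formula can be checked numerically for small n. This sketch uses the product rule from above: since the input is a product state and the operator is a tensor product, the output is just n copies of |+\rangle tensored together. The choice n = 5 and the function names are arbitrary illustrations.

```python
from functools import reduce
from math import isclose, sqrt

def kron(a, b):
    """Kronecker product of two kets given as flat lists of amplitudes."""
    return [x * y for x in a for y in b]

def uniform_superposition(n):
    """Amplitudes of H^{⊗n}|0...0>, built as |+> ⊗ ... ⊗ |+> with n >= 1 factors."""
    plus = [1 / sqrt(2), 1 / sqrt(2)]   # H|0> = |+>
    return reduce(kron, [plus] * n)

n = 5
state = uniform_superposition(n)

print(len(state))                                             # number of amplitudes
print(all(isclose(amp, 1 / sqrt(2 ** n)) for amp in state))   # all equal to 1/sqrt(2^n)?
print(isclose(sum(amp ** 2 for amp in state), 1.0))           # normalised?
```

Note that the list doubles in length with every extra factor. The quantum register does this in one layer of gates; the classical sketch pays the full 2^n cost.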

Example 1 — computing $|+\rangle \otimes |+\rangle$ from scratch

Setup. You are told |+\rangle = \tfrac{1}{\sqrt{2}}(|0\rangle + |1\rangle) on each of two qubits. Compute |+\rangle \otimes |+\rangle explicitly.

Step 1 — substitute the definition. Each factor is \tfrac{1}{\sqrt{2}}(|0\rangle + |1\rangle). So

|+\rangle \otimes |+\rangle = \tfrac{1}{\sqrt{2}}(|0\rangle + |1\rangle) \otimes \tfrac{1}{\sqrt{2}}(|0\rangle + |1\rangle).

Why pull the constants up front: they multiply through the tensor product because \otimes is linear — a scalar times a ket, tensored with a scalar times a ket, is the product of the scalars times the tensor product of the kets.

Step 2 — pull scalars out front.

= \tfrac{1}{\sqrt{2}} \cdot \tfrac{1}{\sqrt{2}}\,(|0\rangle + |1\rangle) \otimes (|0\rangle + |1\rangle) = \tfrac{1}{2}\,(|0\rangle + |1\rangle) \otimes (|0\rangle + |1\rangle).

Step 3 — expand the tensor product using bilinearity. Treat the expansion like distributing a product of binomials:

(|0\rangle + |1\rangle) \otimes (|0\rangle + |1\rangle) = |0\rangle \otimes |0\rangle + |0\rangle \otimes |1\rangle + |1\rangle \otimes |0\rangle + |1\rangle \otimes |1\rangle

Why four terms: the first parenthesis contributes two kets, the second contributes two, and every pair appears exactly once — same rule as (a+b)(c+d) = ac + ad + bc + bd.

Step 4 — switch to compact notation.

= |00\rangle + |01\rangle + |10\rangle + |11\rangle.

Step 5 — restore the prefactor.

|+\rangle \otimes |+\rangle = \tfrac{1}{2}\bigl(|00\rangle + |01\rangle + |10\rangle + |11\rangle\bigr).

Result. The uniform superposition. Four basis states, four equal amplitudes \tfrac{1}{2}, four equal probabilities (\tfrac{1}{2})^2 = \tfrac{1}{4}, which sum to 1 — the state is normalised.

What this shows. A product of two plus-states is the state that H \otimes H produces from |00\rangle. Every quantum algorithm that begins "apply Hadamard to every qubit" is computing exactly this, and you now know what the output looks like in the computational basis.

The four amplitudes of $|+\rangle \otimes |+\rangle$. Each is $\tfrac{1}{2}$; squaring gives the probability $\tfrac{1}{4}$ per basis state.

Example 2 — a product of a basis state and a superposition

Setup. Suppose qubit A is in the definite state |0\rangle and qubit B is in the superposition |0\rangle + |1\rangle (not yet normalised). Compute the joint state, then normalise.

Step 1 — tensor the states together.

|0\rangle \otimes (|0\rangle + |1\rangle) = |0\rangle \otimes |0\rangle + |0\rangle \otimes |1\rangle = |00\rangle + |01\rangle.

Why the first qubit's |0\rangle threads through both terms: the tensor product is linear in the second slot, so distributing |0\rangle over (|0\rangle + |1\rangle) just copies the first factor onto each term.

Step 2 — check the norm. The amplitudes are (1, 1, 0, 0) on the basis (|00\rangle, |01\rangle, |10\rangle, |11\rangle). Sum of squared magnitudes: 1^2 + 1^2 + 0 + 0 = 2. So the norm is \sqrt{2}, not 1.

Step 3 — normalise. Divide through by \sqrt{2}:

\tfrac{1}{\sqrt{2}}(|00\rangle + |01\rangle).

Result. |0\rangle \otimes \tfrac{1}{\sqrt{2}}(|0\rangle + |1\rangle) = \tfrac{1}{\sqrt{2}}(|00\rangle + |01\rangle) = |0\rangle \otimes |+\rangle.

What this shows. The resulting joint state is a product state — qubit A is definitely |0\rangle, qubit B is definitely |+\rangle. Measuring qubit A always gives 0; measuring qubit B gives 0 or 1 with equal probability. The two measurements are independent; there is no correlation to exploit. Compare this with the entangled |\Phi^+\rangle from earlier — there, measuring A forces B to agree. That difference is exactly what "entangled" picks up that "product" misses.
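The contrast between independent and correlated outcomes can be made concrete by comparing joint probabilities with the product of the marginals. This is a sketch only: it looks at measurements in the computational basis, so it illustrates the difference between these two particular states and is not a general entanglement test. The function names are illustrative.

```python
def joint_probabilities(state):
    """Map outcome 'ab' to its probability for a two-qubit state (c00, c01, c10, c11)."""
    return {label: abs(c) ** 2 for label, c in zip(("00", "01", "10", "11"), state)}

def is_independent(state, tol=1e-12):
    """True if P(A=a, B=b) = P(A=a) * P(B=b) for every pair of outcomes."""
    p = joint_probabilities(state)
    p_a = {a: p[a + "0"] + p[a + "1"] for a in "01"}   # marginal for qubit A
    p_b = {b: p["0" + b] + p["1" + b] for b in "01"}   # marginal for qubit B
    return all(abs(p[a + b] - p_a[a] * p_b[b]) < tol for a in "01" for b in "01")

s = 2 ** -0.5
zero_plus = (s, s, 0, 0)   # |0> ⊗ |+>
phi_plus = (s, 0, 0, s)    # (|00> + |11>) / sqrt(2)

print(joint_probabilities(zero_plus), is_independent(zero_plus))
print(joint_probabilities(phi_plus), is_independent(phi_plus))
```

For |0\rangle \otimes |+\rangle the joint table is the product of the marginals. For |\Phi^+\rangle each qubit alone looks like a fair coin, yet the outcomes 01 and 10 never occur.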

Common confusions

"|0\rangle \otimes |1\rangle, |0\rangle|1\rangle, and |01\rangle are different states." No — all three are identical. They are three typographies for the state "qubit A in |0\rangle, qubit B in |1\rangle." Pick the notation that reads most cleanly in context; do not change the physics.
"The dimension of n qubits is n." No. The dimension is 2^n. A qubit is a 2-dimensional vector; n qubits live in the tensor product of n copies of \mathbb{C}^2, which has dimension 2 \cdot 2 \cdots 2 = 2^n. A 1000-qubit register has dimension 2^{1000}, not 1000.
"The tensor product is commutative." No. |0\rangle \otimes |1\rangle = |01\rangle and |1\rangle \otimes |0\rangle = |10\rangle, and these are distinct basis states — a measurement yielding 01 is not the same event as a measurement yielding 10. The ordering of qubits matters. (What is true: there is a SWAP gate that converts one into the other, but SWAP is a physical operation on the hardware, not a free relabelling. See the article on SWAP and iSWAP.)
"Every 2-qubit state can be written as |a\rangle \otimes |b\rangle for some |a\rangle and |b\rangle." Absolutely not — this is the false claim that entanglement contradicts. The Bell state \tfrac{1}{\sqrt{2}}(|00\rangle + |11\rangle) has no such factorisation, as the algebra in "Product states vs entangled states" showed.
"A \otimes B is just A \cdot B." The Kronecker product A \otimes B is a larger matrix (if A is 2 \times 2 and B is 2 \times 2, then A \otimes B is 4 \times 4). The ordinary matrix product A \cdot B is the same size as A and B. They are different operations on different spaces.
"|0\rangle^{\otimes n} means |0\rangle raised to a power." It is notation for |0\rangle \otimes |0\rangle \otimes \cdots \otimes |0\rangle with n factors. The exponent is a tensor-product count, not an arithmetic power.

Going deeper

If you came here to understand what a tensor product is and why it matters, you have it. The rest of this chapter goes into the formal definition (the universal-property characterisation used in linear algebra), a preview of the Schmidt decomposition (which makes entanglement quantitative), tensor products in continuous-variable quantum mechanics (wavefunctions of two particles), and Bose's 1924 paper, the original Indian contribution to quantum statistics, which the symmetrised tensor product describes.

Formal definition: the Hilbert-space tensor product and its bilinearity

For Hilbert spaces \mathcal{H}_A and \mathcal{H}_B, the tensor product \mathcal{H}_A \otimes \mathcal{H}_B is the unique (up to isomorphism) Hilbert space, together with a bilinear map \otimes: \mathcal{H}_A \times \mathcal{H}_B \to \mathcal{H}_A \otimes \mathcal{H}_B, such that:

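Bilinearity: \otimes is linear in each slot:

(\alpha|a\rangle + \beta|a'\rangle) \otimes |b\rangle = \alpha\,|a\rangle \otimes |b\rangle + \beta\,|a'\rangle \otimes |b\rangle, \qquad |a\rangle \otimes (\alpha|b\rangle + \beta|b'\rangle) = \alpha\,|a\rangle \otimes |b\rangle + \beta\,|a\rangle \otimes |b'\rangle.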

Basis spanning: if \{|i\rangle_A\} is a basis of \mathcal{H}_A and \{|j\rangle_B\} is a basis of \mathcal{H}_B, then \{|i\rangle_A \otimes |j\rangle_B\} is a basis of \mathcal{H}_A \otimes \mathcal{H}_B. In particular, \dim(\mathcal{H}_A \otimes \mathcal{H}_B) = \dim(\mathcal{H}_A) \cdot \dim(\mathcal{H}_B).
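Inner product: the inner product on \mathcal{H}_A \otimes \mathcal{H}_B is fixed by

\bigl(\langle a| \otimes \langle b|\bigr)\bigl(|a'\rangle \otimes |b'\rangle\bigr) = \langle a|a'\rangle\,\langle b|b'\rangle.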

Why bilinearity matters: it is exactly what you used to expand (\alpha|0\rangle + \beta|1\rangle) \otimes (\gamma|0\rangle + \delta|1\rangle) into four terms. The rule is not a convenience; it is the defining property, and every other computation with tensor products is an application of it.
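Both defining properties can be spot-checked numerically in the qubit case. The sketch below tests linearity in the first slot and the standard inner-product rule \langle a \otimes b | a' \otimes b' \rangle = \langle a|a'\rangle \langle b|b'\rangle on hypothetical test vectors. The vectors are arbitrary and not normalised, since neither identity needs normalisation.

```python
def kron(a, b):
    """Kronecker product of two kets given as flat lists of amplitudes."""
    return [x * y for x in a for y in b]

def inner(u, v):
    """Inner product <u|v>, conjugate-linear in the first argument."""
    return sum(x.conjugate() * y for x, y in zip(u, v))

# Hypothetical test vectors.
a, a2 = [1 + 2j, 3j], [0.5, -1j]
b, b2 = [2, 1 - 1j], [1j, 4]

# Linearity in the first slot: (2a + 3a2) ⊗ b = 2 (a ⊗ b) + 3 (a2 ⊗ b).
combo = [2 * x + 3 * y for x, y in zip(a, a2)]
lhs_lin = kron(combo, b)
rhs_lin = [2 * x + 3 * y for x, y in zip(kron(a, b), kron(a2, b))]

# Inner product of tensor products = product of inner products.
lhs_ip = inner(kron(a, b), kron(a2, b2))
rhs_ip = inner(a, a2) * inner(b, b2)

print(all(abs(x - y) < 1e-12 for x, y in zip(lhs_lin, rhs_lin)))
print(abs(lhs_ip - rhs_ip) < 1e-12)
```

A numerical check on a few vectors is not a proof, but it is a quick way to catch a wrong convention, such as conjugating the wrong argument of the inner product.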

The universal property is that any bilinear map \mathcal{H}_A \times \mathcal{H}_B \to \mathcal{K} factors uniquely through \mathcal{H}_A \otimes \mathcal{H}_B. This is abstract-algebra language for: "the tensor product is the freest space that encodes all bilinear information about the two factors, and nothing more." For QC, this perspective unifies states and operators and density matrices under one formalism.

Non-product states and the Schmidt decomposition (preview, ch.40)

Every bipartite pure state |\psi\rangle_{AB} \in \mathcal{H}_A \otimes \mathcal{H}_B — even an entangled one — admits a Schmidt decomposition: there exist orthonormal bases \{|u_i\rangle_A\} of \mathcal{H}_A and \{|v_i\rangle_B\} of \mathcal{H}_B and non-negative real numbers \lambda_i (the Schmidt coefficients) such that

|\psi\rangle_{AB} = \sum_i \lambda_i\,|u_i\rangle_A \otimes |v_i\rangle_B, \qquad \sum_i \lambda_i^2 = 1.

The number of non-zero \lambda_i is the Schmidt rank. A state has Schmidt rank 1 if and only if it is a product state; Schmidt rank greater than 1 means entangled.

The Bell state |\Phi^+\rangle has Schmidt decomposition

|\Phi^+\rangle = \tfrac{1}{\sqrt{2}}|0\rangle_A|0\rangle_B + \tfrac{1}{\sqrt{2}}|1\rangle_A|1\rangle_B,

Schmidt rank 2, Schmidt coefficients (1/\sqrt{2}, 1/\sqrt{2}). Equal Schmidt coefficients are what make a state maximally entangled in this sense. You will see the full theorem in its own chapter; for now, know that every bipartite entangled state has a "diagonal" form that makes its entanglement properties readable at a glance.
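For two qubits the Schmidt coefficients can be computed without any library. The sketch below relies on a standard fact not derived in this chapter: the Schmidt coefficients are the singular values of the 2×2 array of amplitudes C = [[c_{00}, c_{01}], [c_{10}, c_{11}]], and for a 2×2 matrix their squares follow from the trace and determinant of C^\dagger C. Function names and the rank tolerance are illustrative assumptions; a general bipartite state would need a full singular value decomposition.

```python
from math import sqrt

def schmidt_coefficients(c00, c01, c10, c11):
    """Schmidt coefficients (largest first) of a two-qubit pure state."""
    # Squares of the singular values are the eigenvalues of C†C,
    # which are fixed by its trace and its determinant.
    total = abs(c00) ** 2 + abs(c01) ** 2 + abs(c10) ** 2 + abs(c11) ** 2  # trace (1 if normalised)
    det_sq = abs(c00 * c11 - c01 * c10) ** 2                              # determinant of C†C
    disc = sqrt(max(total ** 2 - 4 * det_sq, 0.0))                        # clamp rounding noise
    return sqrt((total + disc) / 2), sqrt(max((total - disc) / 2, 0.0))

def schmidt_rank(coeffs, tol=1e-6):
    """Number of Schmidt coefficients that are non-zero up to a tolerance."""
    return sum(1 for c in coeffs if c > tol)

s = 2 ** -0.5
phi_plus = schmidt_coefficients(s, 0, 0, s)    # Bell state
zero_plus = schmidt_coefficients(s, s, 0, 0)   # |0> ⊗ |+>

print(phi_plus, schmidt_rank(phi_plus))
print(zero_plus, schmidt_rank(zero_plus))
```

Floating-point rounding means the coefficients come out only approximately equal to 1/\sqrt{2}, 1 or 0, which is why the rank is counted with a tolerance.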

Tensor products in continuous-variable QM — two-particle wavefunctions

The tensor product is not specific to qubits. When you studied the hydrogen atom in physics class, the wavefunction of the electron was \psi(x, y, z) — one function of three spatial coordinates. Two particles have a joint wavefunction \Psi(x_1, x_2) of six coordinates (three for each particle), and this joint wavefunction lives in the tensor product of the two single-particle Hilbert spaces:

L^2(\mathbb{R}^3) \otimes L^2(\mathbb{R}^3) \cong L^2(\mathbb{R}^6).

Product states are those where \Psi(x_1, x_2) = \psi_A(x_1)\,\psi_B(x_2) — factored. Entangled states — which are the norm for any pair of interacting particles — cannot be so factored, and the particles' coordinates become correlated in a way classical statistics cannot mimic. The entanglement that qubit tensor products carry is the same phenomenon that makes two-electron atoms quantitatively different from two non-interacting electrons, and it is why quantum chemistry is hard.

Bose's 1924 paper and symmetrised tensor products

When you have two identical quantum particles — two electrons, two photons, two helium-4 atoms — the tensor product alone is not the whole story. Identical particles must satisfy an exchange symmetry: swapping the two particles in the joint state must return either the same state (symmetric, for bosons) or the state multiplied by -1 (antisymmetric, for fermions).

The symmetrised subspace of \mathcal{H} \otimes \mathcal{H} — spanned by states of the form |a\rangle|b\rangle + |b\rangle|a\rangle — is the state space of two identical bosons. The antisymmetrised subspace — spanned by |a\rangle|b\rangle - |b\rangle|a\rangle — is the state space of two identical fermions. Which subspace a particle lives in is a fundamental property called its statistics.
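The exchange symmetry is easy to see in the smallest case. The sketch below uses a two-level single-particle space as a stand-in for \mathcal{H} (an illustrative simplification), builds the unnormalised combinations |a\rangle|b\rangle \pm |b\rangle|a\rangle, and swaps the two factors.

```python
def kron(a, b):
    """Kronecker product of two kets given as flat lists of amplitudes."""
    return [x * y for x in a for y in b]

def swap(state):
    """Exchange the two factors: the amplitudes on |01> and |10> trade places."""
    c00, c01, c10, c11 = state
    return [c00, c10, c01, c11]

a, b = [1, 0], [0, 1]
ab, ba = kron(a, b), kron(b, a)

symmetric = [x + y for x, y in zip(ab, ba)]        # |a>|b> + |b>|a>
antisymmetric = [x - y for x, y in zip(ab, ba)]    # |a>|b> - |b>|a>

print(swap(symmetric) == symmetric)                            # unchanged under exchange
print(swap(antisymmetric) == [-c for c in antisymmetric])      # picks up a factor of -1
```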

The bosonic case was introduced by Satyendra Nath Bose in a 1924 paper, "Planck's law and the light quantum hypothesis," in which he derived Planck's blackbody spectrum by assuming that photons obey what we now call Bose-Einstein statistics. Einstein translated the paper and championed it; the statistics are named after Bose, as are the particles that obey them. Bosons include the photon, the Higgs, and the gluon; Bose-Einstein condensation (predicted by the statistics, observed experimentally in 1995) is an entire research programme that came from the symmetrised tensor product.

For qubits, we usually treat the qubits as distinguishable — we can point at qubit 1 and qubit 2 as physical objects on different parts of the chip — and so we use the unsymmetrised tensor product |a\rangle \otimes |b\rangle without worrying about statistics. When the underlying physical qubits are implemented by identical bosons or fermions (photons in a waveguide; electrons in a quantum dot), the statistics sneak back in through the details of how operations are implemented. But at the level of the abstract qubit, the tensor product in this chapter is the complete story.

Where this leads next

The partial trace — the operation that goes the other way: from a joint state on \mathcal{H}_A \otimes \mathcal{H}_B to the best description of just one subsystem. Where the algebra of entanglement starts getting concrete.
Bell states and entanglement — the four canonical entangled two-qubit states, their properties, and why they power teleportation, superdense coding, and quantum cryptography.
Two-qubit computational basis — a careful look at the ordering conventions, binary-counting correspondence, and how measurement outcomes label basis states.
Schmidt decomposition — the theorem that every bipartite pure state has a canonical diagonal form; the technical heart of entanglement theory.
Tensor networks — when qubit counts get large, the full 2^n amplitude table becomes unusable; tensor-network representations keep the information that matters and compress away the rest.
