In short

The Discrete Fourier Transform (DFT) is the rule that takes a length-N list of complex numbers (x_0, x_1, \ldots, x_{N-1}) — a signal — and produces a new length-N list (y_0, y_1, \ldots, y_{N-1}) — its frequency coefficients — via the single formula

y_k \;=\; \frac{1}{\sqrt N}\sum_{j=0}^{N-1} x_j\, e^{2\pi i\, jk/N}.

Geometrically it is a change of basis, rotating a signal from the time/position basis into the frequency basis. As a matrix it is an N\times N unitary whose inverse is its conjugate transpose — and since the matrix is symmetric, that is just its entrywise complex conjugate. Computed naively it takes O(N^2) time; the Fast Fourier Transform (Cooley–Tukey, 1965) drops this to O(N \log N) — a speedup that enabled JPEG, MP3, 4G, MRI, and radio astronomy. The quantum Fourier transform is the same linear transformation, but run on the amplitude vector of an n-qubit state. For that reason, every pixel of the next chapter sits on this chapter's foundation.

Open your phone's photo app and tap a picture. Somewhere in the chain between the sensor that captured the light and the pixels now glowing on your screen, a Discrete Fourier Transform was computed. The same transform decoded the YouTube video you watched earlier. The same transform found the faint radio echo of a pulsar inside the noise at the Giant Metrewave Radio Telescope near Pune. The same transform is inside every MRI scan, every 5G tower, every Diwali song compressed to fit on an SD card.

It is the most important linear operation in signal processing — almost certainly the most frequently executed numerical algorithm on Earth. And in the next chapter you will meet its quantum cousin, the Quantum Fourier Transform, which is the engine at the heart of Shor's factoring algorithm and phase estimation. Before you can see what the quantum version adds, you need the classical version crisp in your head: the formula, the geometry, the matrix, and the fast algorithm that makes it practical.

This chapter is a careful recap. If you have met the DFT before in a signals course, some of what follows will be familiar — stay for the matrix form and the orthogonality argument, which are what the quantum version builds on. If you have not met it before, do not worry: everything you need is in here, derived from scratch, no handwaving.

The idea — a signal, and its frequency recipe

Start with the picture.

You have a string of N numbers — think of them as samples of a sound wave, measured once every millisecond; or pixel intensities along one row of an image; or the closing price of a stock on each of N consecutive days. Call them x_0, x_1, \ldots, x_{N-1}. These N numbers live in "sample space" or "time domain" — the axis you are indexing along is whatever the input is indexed along (time, position, day number).

The DFT takes this signal and gives you back a second list of N numbers: y_0, y_1, \ldots, y_{N-1}. These new numbers live in "frequency space" — y_k tells you how much of frequency k is present in the original signal.

[Figure: DFT maps a signal to its frequency recipe. A box labelled "time domain" lists x_0, x_1, x_2, …, x_{N-1} ("what you measured"); an arrow labelled DFT, annotated y_k = (1/√N) Σ x_j e^(2πi jk/N), points to a box labelled "frequency domain" listing y_0, y_1, y_2, …, y_{N-1} ("how much of each frequency").]
The DFT is a one-to-one linear map between two length-$N$ lists. The inputs are samples; the outputs are frequency coefficients. Both lists contain exactly the same information — they are two coordinate systems for the same signal.

What does "frequency k" mean here? It means a specific pure oscillation — a sinusoid whose value at sample j is e^{2\pi i\, jk/N}. That complex exponential is a pure rotation in the complex plane: as j runs from 0 to N-1, the exponent 2\pi i\, jk/N sweeps around the unit circle k full times. So frequency k=1 is the slowest non-trivial wave (one full turn in N samples); frequency k=2 goes round twice; and for even N, frequency k = N/2 is the fastest meaningful wave (it flips sign every sample).

The DFT says: any signal — any list of N complex numbers — can be written as a sum of these N pure waves, and y_k is exactly the amount of wave-k you need to use.

Why this is interesting: some signals look messy in time but tidy in frequency. A cricket commentator's voice in a raw microphone recording looks like a jagged line with no pattern. The same signal in the frequency domain looks like a few narrow bumps — one at the vocal-cord fundamental, a few at harmonics. You can now delete the bumps you do not care about (say, a 50 Hz hum from the electrical mains in the stadium), inverse-transform, and get back a cleaner voice. That is digital signal processing in one paragraph — and the DFT is the step that makes it possible.
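That filter-out-the-hum pipeline can be sketched in a few lines of NumPy. This is a toy version with synthetic data and made-up frequency bins (7 standing in for the voice, 50 for the mains hum); note that np.fft uses a different sign and normalisation convention than this chapter, which does not matter here because the same library does both the forward and the inverse transform.

```python
import numpy as np

# Toy sketch of the hum-removal idea (synthetic data, hypothetical bin numbers).
N = 1024
t = np.arange(N)
voice = np.sin(2 * np.pi * 7 * t / N)        # stand-in for the "voice": frequency bin 7
hum = 0.5 * np.sin(2 * np.pi * 50 * t / N)   # stand-in for the mains hum: bin 50
signal = voice + hum

spectrum = np.fft.fft(signal)                # NumPy's DFT convention (sign and scaling
                                             # differ from this chapter's; the idea is identical)
spectrum[50] = 0                             # delete the hum bin...
spectrum[N - 50] = 0                         # ...and its mirror (real signals are symmetric)
cleaned = np.fft.ifft(spectrum).real

assert np.allclose(cleaned, voice, atol=1e-10)  # the hum is gone, the voice survives
```

In a real recording the hum would smear across a few neighbouring bins and you would zero a small band rather than a single coefficient, but the transform–edit–inverse-transform structure is exactly this.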

The formula — and decoding every symbol

Here is the DFT, written precisely.

The Discrete Fourier Transform

For a length-N list of complex numbers x_0, x_1, \ldots, x_{N-1}, the Discrete Fourier Transform is the length-N list y_0, y_1, \ldots, y_{N-1} defined by

y_k \;=\; \frac{1}{\sqrt N}\sum_{j=0}^{N-1} x_j\, e^{2\pi i\, jk/N} \qquad \text{for } k = 0, 1, \ldots, N-1.

The inverse DFT is

x_j \;=\; \frac{1}{\sqrt N}\sum_{k=0}^{N-1} y_k\, e^{-2\pi i\, jk/N}.

The forward and inverse formulas differ only in the sign of the exponent. With the 1/\sqrt N normalisation shown here, the DFT is a unitary transformation: the conjugate transpose of its matrix is its inverse.
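A direct transcription of the two formulas into NumPy — the naive O(N^2) method, not the FFT — makes the round-trip and norm-preservation claims checkable. One caveat worth flagging: np.fft.fft uses the opposite sign in the exponent and no 1/\sqrt N factor, so this chapter's DFT corresponds to \sqrt N times np.fft.ifft.

```python
import numpy as np

def dft(x):
    """y_k = (1/sqrt(N)) * sum_j x_j e^{+2 pi i jk/N}, exactly as defined above."""
    x = np.asarray(x, dtype=complex)
    N = len(x)
    j = np.arange(N)
    k = j.reshape(-1, 1)                       # column of output indices
    return np.exp(2j * np.pi * j * k / N) @ x / np.sqrt(N)

def idft(y):
    """The inverse: same formula with the sign of the exponent flipped."""
    y = np.asarray(y, dtype=complex)
    N = len(y)
    j = np.arange(N)
    k = j.reshape(-1, 1)
    return np.exp(-2j * np.pi * j * k / N) @ y / np.sqrt(N)

rng = np.random.default_rng(0)
x = rng.normal(size=8) + 1j * rng.normal(size=8)

assert np.allclose(idft(dft(x)), x)                            # round trip recovers the signal
assert np.allclose(np.linalg.norm(dft(x)), np.linalg.norm(x))  # unitary: lengths preserved
assert np.allclose(dft(x), np.sqrt(len(x)) * np.fft.ifft(x))   # relation to NumPy's convention
```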

Reading the formula. Walk through it piece by piece: x_j is the j-th input sample; k labels the output frequency; e^{2\pi i\, jk/N} is the pure wave of frequency k evaluated at sample j; the sum projects the signal onto that wave; and the 1/\sqrt N factor normalises the transform so that it preserves lengths.

Why the formula has this exact shape: any complex number e^{2\pi i\, jk/N} is a point on the unit circle, and the N values \omega^0, \omega^1, \ldots, \omega^{N-1} for \omega = e^{2\pi i/N} are the N "roots of unity" — evenly spaced points on the circle. The DFT is the change of basis from the standard basis (N spikes, one per sample) into the basis made of these N complex waves. The formula is just "compute coefficients of the signal in the new basis" — the same way you project a 3D vector onto the x-axis by dotting it with \hat x.

Roots of unity — the circle you need to know

Define \omega = e^{2\pi i/N}. This single complex number is a primitive N-th root of unity: its N-th power is 1 (\omega^N = e^{2\pi i} = 1), and no smaller positive power of it equals 1.

Its powers — \omega^0, \omega^1, \omega^2, \ldots, \omega^{N-1} — are N different points evenly spaced around the unit circle. Together, they are called the N-th roots of unity, because each one satisfies z^N = 1.
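Both facts — z^N = 1 for every root, and the roots summing to zero — are quick to confirm numerically:

```python
import numpy as np

N = 8
omega = np.exp(2j * np.pi / N)         # a primitive N-th root of unity
roots = omega ** np.arange(N)          # omega^0, omega^1, ..., omega^{N-1}

assert np.allclose(roots ** N, 1)      # each root satisfies z^N = 1
assert np.allclose(np.abs(roots), 1)   # all sit on the unit circle
assert abs(roots.sum()) < 1e-12        # and (for N > 1) they sum to zero
```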

[Figure: the eighth roots of unity on the unit circle: eight evenly spaced dots labelled ω^0 = 1, ω^1, ω^2 = i, ω^3, ω^4 = −1, ω^5, ω^6 = −i, ω^7, starting on the positive real axis and stepping counter-clockwise.]
The $N=8$ roots of unity: eight dots evenly spaced around the unit circle, starting at $\omega^0 = 1$ and stepping counter-clockwise by $2\pi/8 = 45°$ each time. These are the "frequencies" the DFT uses as its basis.

With this shorthand, the DFT formula collapses to

y_k \;=\; \frac{1}{\sqrt N}\sum_{j=0}^{N-1} x_j\, \omega^{jk}.

Much tidier. The whole transform is built from a single complex number \omega and its powers. Everything that follows — the matrix form, the orthogonality proof, the inverse formula — is just arithmetic on \omega^{jk} for various j and k.

The matrix form — a unitary F

The DFT is linear: scaling the input by c scales the output by c; adding two inputs gives the sum of their outputs. Every linear map from \mathbb{C}^N to \mathbb{C}^N is a matrix, so the DFT is a matrix. Define

F_{kj} \;=\; \frac{1}{\sqrt N}\, \omega^{jk} \qquad \text{for } 0 \le j, k \le N-1.

Then y = Fx as column vectors. The row index is k (the output frequency); the column index is j (the input sample).

Here is the N=4 case written out. With \omega = e^{2\pi i/4} = i, the powers \omega^{jk} cycle through 1, i, -1, -i:

F_4 \;=\; \frac{1}{2}\begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & i & -1 & -i \\ 1 & -1 & 1 & -1 \\ 1 & -i & -1 & i \end{pmatrix}.

Why the rows look like that: row k has entries \omega^{0\cdot k}, \omega^{1\cdot k}, \omega^{2\cdot k}, \omega^{3\cdot k} — that is, the powers of \omega^k. Row 0 is all 1's. Row 1 is 1, i, -1, -i (one full turn). Row 2 is 1, -1, 1, -1 (two full turns — Nyquist). Row 3 is 1, -i, -1, i (three full turns, same as -1 full turn going the other way).

F is unitary. Compute F^\dagger F. The (k, k') entry is

(F^\dagger F)_{k k'} \;=\; \sum_{j=0}^{N-1} \overline{F_{jk}}\, F_{jk'} \;=\; \frac{1}{N}\sum_{j=0}^{N-1} \omega^{-jk}\, \omega^{jk'} \;=\; \frac{1}{N}\sum_{j=0}^{N-1} \omega^{j(k'-k)}.

Why that sum is easy: if k = k', each term is \omega^0 = 1 and the sum is N, giving (F^\dagger F)_{kk} = 1. If k \neq k', the exponent k'-k is a non-zero integer between -(N-1) and N-1, and the sum \sum_{j=0}^{N-1} \omega^{jm} is a geometric series with ratio \omega^m \neq 1, which sums to (1 - \omega^{Nm})/(1 - \omega^m) = (1 - 1)/(1-\omega^m) = 0. So the off-diagonal entries vanish.

Put those together: (F^\dagger F)_{kk'} = \delta_{k,k'} — the identity matrix. So F^\dagger F = I, meaning F is unitary, and F^{-1} = F^\dagger. The inverse of the DFT is the complex conjugate of its transpose — which, looking at entries, is exactly the formula you would get by flipping the sign of the exponent in the DFT itself. That is why the inverse DFT formula has e^{-2\pi i\, jk/N}: it is the complex conjugate of e^{2\pi i\, jk/N}.

The orthogonality-of-roots-of-unity calculation you just saw — \sum_{j=0}^{N-1} \omega^{jm} = N \delta_{m,0} for -(N-1) \le m \le N-1 — is the single algebraic fact that makes Fourier analysis work. It is worth memorising: the N roots of unity sum to zero unless they are all 1.
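Both claims — F^\dagger F = I and the root-of-unity sum identity — can be verified numerically for small N. The sketch below builds F for N = 4 and also compares it against the matrix printed above:

```python
import numpy as np

N = 4
omega = np.exp(2j * np.pi / N)                  # equals i for N = 4
j, k = np.meshgrid(np.arange(N), np.arange(N))
F = omega ** (j * k) / np.sqrt(N)               # F[k, j] = omega^{jk} / sqrt(N)

# Matches the F_4 written out earlier
F4_printed = 0.5 * np.array([[1,  1,  1,  1],
                             [1, 1j, -1, -1j],
                             [1, -1,  1, -1],
                             [1, -1j, -1, 1j]])
assert np.allclose(F, F4_printed)

# Unitarity: F† F = I
assert np.allclose(F.conj().T @ F, np.eye(N))

# The key identity: sum_j omega^{jm} is N when m ≡ 0 (mod N), else 0
for m in range(-N + 1, N):
    s = sum(omega ** (jj * m) for jj in range(N))
    expected = N if m % N == 0 else 0
    assert abs(s - expected) < 1e-12
```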

Worked examples

Example 1: DFT of an impulse for N=4

Compute the DFT of the signal x = (1, 0, 0, 0) with N = 4.

Step 1. Identify \omega. With N = 4, \omega = e^{2\pi i/4} = e^{i\pi/2} = i. Why: \omega is always the first non-trivial N-th root of unity. For N = 4, the four roots of unity are 1, i, -1, -i, and \omega = i.

Step 2. Write the DFT formula.

y_k \;=\; \frac{1}{\sqrt 4}\sum_{j=0}^{3} x_j\, \omega^{jk} \;=\; \frac{1}{2}\sum_{j=0}^{3} x_j\, i^{jk}.

Why: \sqrt N = \sqrt 4 = 2, and \omega^{jk} = i^{jk} by substitution.

Step 3. Evaluate each y_k. Because x = (1, 0, 0, 0), only the j = 0 term survives the sum:

y_k \;=\; \frac{1}{2}\cdot x_0 \cdot i^{0 \cdot k} \;=\; \frac{1}{2}\cdot 1 \cdot 1 \;=\; \frac{1}{2} \qquad \text{for all } k.

Why: x_1 = x_2 = x_3 = 0, so those terms drop out. The surviving term has j = 0, which means i^{jk} = i^0 = 1 no matter what k is. The output is the same for every frequency index.

Step 4. Write the result.

y \;=\; \left(\tfrac{1}{2},\; \tfrac{1}{2},\; \tfrac{1}{2},\; \tfrac{1}{2}\right).

Result. The DFT of a single-sample impulse at j = 0 is a uniform distribution across every frequency. Every frequency has the same coefficient 1/2.

[Figure: two bar charts. Left, the input x_j: one bar of height 1 at j = 0, zeros at j = 1, 2, 3. Right, the output y_k: four equal bars of height 1/2 at k = 0, 1, 2, 3.]
An impulse at $j=0$ in the time domain produces a flat, uniform spectrum in the frequency domain — every frequency appears with the same coefficient $1/2$.

What this shows. A spike in time spreads out evenly across all frequencies. This is the "time-frequency uncertainty" you may have heard of: a signal localised in time (one spike) has no localisation in frequency (flat spectrum). The tightest-possible spike in time produces the widest-possible spread in frequency — and vice versa.
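The worked example takes three lines to verify numerically, using the chapter's +i, 1/\sqrt N convention:

```python
import numpy as np

N = 4
x = np.array([1, 0, 0, 0], dtype=complex)       # impulse at j = 0
j = np.arange(N)
k = j.reshape(-1, 1)
y = np.exp(2j * np.pi * j * k / N) @ x / np.sqrt(N)

assert np.allclose(y, [0.5, 0.5, 0.5, 0.5])     # flat spectrum, every coefficient 1/2
```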

Example 2: DFT of a cosine wave for N=4

Compute the DFT of x_j = \cos(2\pi j/4) with N = 4, again using \omega = i.

Step 1. Write out the input samples.

x_0 = \cos(0) = 1,\; x_1 = \cos(\pi/2) = 0,\; x_2 = \cos(\pi) = -1,\; x_3 = \cos(3\pi/2) = 0.

So x = (1, 0, -1, 0). Why: \cos(2\pi j/4) = \cos(j\pi/2). Plugging j = 0, 1, 2, 3 gives 1, 0, -1, 0 — the values of cosine at the four angles 0, \pi/2, \pi, 3\pi/2.

Step 2. Expand y_k = \frac{1}{2}\sum_{j=0}^{3} x_j\, i^{jk}.

Only x_0 = 1 and x_2 = -1 contribute, so

y_k \;=\; \tfrac{1}{2}\bigl(1\cdot i^{0} + (-1)\cdot i^{2k}\bigr) \;=\; \tfrac{1}{2}\bigl(1 - i^{2k}\bigr).

Why: drop the zero terms. The remaining two samples give the j=0 contribution (trivially 1) and the j=2 contribution (-i^{2k}, since x_2 = -1).

Step 3. Evaluate for each k. Since i^{2} = -1, we have i^{2k} = (-1)^k, so 1 - i^{2k} = 1 - (-1)^k.

  k    (-1)^k    1 - (-1)^k    y_k = ½(1 - (-1)^k)
  0      +1          0                  0
  1      -1          2                  1
  2      +1          0                  0
  3      -1          2                  1

Why the table: for even k, (-1)^k = 1 and the difference is 0. For odd k, (-1)^k = -1 and the difference is 2, halved to give 1. The coefficient lands entirely on the odd-k entries.

Step 4. Write the result.

y \;=\; (0,\, 1,\, 0,\, 1).

Result. The DFT of the cosine concentrates at frequencies k = 1 and k = 3, with y_1 = y_3 = 1.

[Figure: two bar charts. Left, the input x_j = cos(2πj/4): values 1, 0, −1, 0 at j = 0 through 3. Right, the output y_k: equal bars of height 1 at k = 1 and k = 3, zero at k = 0 and k = 2.]
A pure cosine at the fundamental frequency produces a two-peaked spectrum: non-zero at $k = 1$ and $k = 3$, zero elsewhere.

What this shows. A pure cosine in the time domain shows up as exactly two non-zero coefficients in the frequency domain — at k = 1 (the frequency itself) and k = 3 (its "mirror image" at N - 1 = 3). The mirror happens because cosine is \tfrac{1}{2}(e^{i\theta} + e^{-i\theta}) — a sum of a positive-frequency and a negative-frequency wave — and in the DFT, k = N - 1 acts like the negative-frequency partner of k = 1. Every real signal has this mirror symmetry in its DFT.
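As with the impulse, the cosine example is easy to check numerically with the chapter's convention:

```python
import numpy as np

N = 4
jj = np.arange(N)
x = np.cos(2 * np.pi * jj / N)                  # samples (1, 0, -1, 0), up to rounding
k = jj.reshape(-1, 1)
y = np.exp(2j * np.pi * jj * k / N) @ x / np.sqrt(N)

assert np.allclose(x, [1, 0, -1, 0], atol=1e-12)
assert np.allclose(y, [0, 1, 0, 1], atol=1e-12)  # spectrum peaks at k = 1 and k = 3
```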

Complexity — naive vs Fast

Computing the DFT from the formula directly means, for each of the N output values y_k, summing N terms. That is N \times N = N^2 multiplications and additions — quadratic in the signal length. For N = 1024 (a small chunk of audio), that is a million operations. For N = 2^{20} \approx 10^6 (a one-megapixel image, flattened to a single list), it is 10^{12} — a trillion. Naive DFT does not scale.

The Fast Fourier Transform (Cooley and Tukey, 1965 — though Gauss seems to have known it as early as 1805) is an O(N \log N) algorithm that computes exactly the same DFT using a divide-and-conquer trick: split the sum into even-j and odd-j pieces, each of which is itself a DFT of half the length. The idea recurses: each length-N DFT is two length-N/2 DFTs plus N combining operations, so T(N) = 2T(N/2) + O(N), which solves to T(N) = O(N \log N).

[Figure: bar chart comparing the naive DFT, O(N²), with the FFT, O(N log N). At N = 10⁶ the bars represent roughly 10¹² versus 2 × 10⁷ operations.]
For $N = 10^6$, naive DFT needs $\sim 10^{12}$ operations; the FFT needs $\sim 2 \times 10^7$. A factor-of-50,000 speedup that is the difference between "computable in milliseconds" and "wait 17 minutes."

Why this speedup was transformative: before Cooley–Tukey, spectrum analysis on a 10^6-sample signal was simply impossible at interactive speeds. After, it was interactive. Digital audio, JPEG compression, real-time radar, MRI reconstruction, OFDM in 4G and 5G, GMRT radio data pipelines — all of these are practical because O(N \log N) is fast and O(N^2) is not.

Why this chapter sets up the quantum version

Here is where the classical DFT points toward the Quantum Fourier Transform, the topic of the next chapter.

An n-qubit quantum state has N = 2^n complex amplitudes — one for each basis string |00\ldots 0\rangle, |00\ldots 1\rangle, \ldots, |11\ldots 1\rangle. Those amplitudes form a length-N complex vector, exactly like a signal.

The Quantum Fourier Transform is the unitary operation on an n-qubit state whose action on the amplitude vector is the DFT. Mathematically it is the same linear map F you just met. Algorithmically it is wildly different: a quantum circuit applies F to all N = 2^n amplitudes using only O(n^2) elementary gates, while any classical algorithm that writes out the result needs at least N = 2^n steps just to touch every entry.

That is the speedup. It does not mean the QFT solves the DFT problem faster in the sense of "read out all N frequency coefficients." Quantum measurement only reveals one basis outcome per run. But it does mean that certain algorithmic patterns — where you only need a sample from the Fourier-transformed distribution, or where you are running Fourier as an intermediate step and then using interference — become exponentially cheaper.

Shor's factoring algorithm and quantum phase estimation both exploit this. Their engines run a QFT on a state whose amplitude vector has N = 2^n entries — a transform that a classical FFT would need O(N \log N) = O(n \cdot 2^n) operations to compute, exponential in n. The quantum version does it in O(n^2) gates. The quantum advantage in Shor's algorithm is, at its heart, the exponential gap between O(n \cdot 2^n) and O(n^2) on this very specific linear transformation.

The next chapter — QFT defined — makes this precise.


Going deeper

If you are just here to recap the DFT before meeting its quantum cousin, you have it: y_k = \tfrac{1}{\sqrt N}\sum_j x_j\, \omega^{jk}, F is a unitary N \times N matrix, and the FFT computes it in O(N\log N). The rest of this section goes deeper: Cooley–Tukey in detail, the DFT as a Fourier transform on the abelian group \mathbb{Z}_N, the convolution theorem, and fast polynomial multiplication.

Cooley–Tukey in one derivation

Split the DFT sum by even and odd j (assume N is even, write N = 2M):

y_k \;=\; \frac{1}{\sqrt N}\sum_{j=0}^{N-1} x_j\, \omega_N^{jk} \;=\; \frac{1}{\sqrt N}\left(\sum_{\text{even } j} x_j\, \omega_N^{jk} \;+\; \sum_{\text{odd } j} x_j\, \omega_N^{jk}\right).

Relabel: write even j as 2m and odd j as 2m+1, with m running from 0 to M-1:

y_k \;=\; \frac{1}{\sqrt N}\left(\sum_{m=0}^{M-1} x_{2m}\, \omega_N^{2mk} \;+\; \omega_N^k \sum_{m=0}^{M-1} x_{2m+1}\, \omega_N^{2mk}\right).

Notice that \omega_N^{2} = e^{2\pi i \cdot 2/N} = e^{2\pi i/M} = \omega_M — the primitive M-th root of unity. So both inner sums are themselves length-M DFTs of the even-indexed and odd-indexed sub-signals. Let E_k and O_k denote those half-length DFTs; then

y_k \;=\; \tfrac{1}{\sqrt 2}\bigl(E_k + \omega_N^k\, O_k\bigr), \qquad y_{k+M} \;=\; \tfrac{1}{\sqrt 2}\bigl(E_k - \omega_N^k\, O_k\bigr)

for k = 0, 1, \ldots, M-1. (The second line uses \omega_N^{k+M} = -\omega_N^k, which follows from \omega_N^M = e^{i\pi} = -1.)

That is the Cooley–Tukey step: one length-N DFT becomes two length-M = N/2 DFTs plus N "butterfly" combinations. Recursing gives T(N) = 2T(N/2) + O(N) = O(N \log N).
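The derivation translates directly into a short recursive implementation. This sketch keeps the chapter's 1/\sqrt N normalisation and +i sign convention (so it will not match np.fft.fft, whose conventions differ) and assumes the input length is a power of 2:

```python
import numpy as np

def fft_unitary(x):
    """Radix-2 Cooley–Tukey with this chapter's 1/sqrt(N) normalisation
    and +i sign convention. Assumes len(x) is a power of 2."""
    x = np.asarray(x, dtype=complex)
    N = len(x)
    if N == 1:
        return x.copy()                          # length-1 DFT is the identity
    E = fft_unitary(x[0::2])                     # half-length DFT of even-indexed samples
    O = fft_unitary(x[1::2])                     # half-length DFT of odd-indexed samples
    twiddle = np.exp(2j * np.pi * np.arange(N // 2) / N)   # omega_N^k, k = 0..N/2-1
    top = (E + twiddle * O) / np.sqrt(2)         # y_k
    bottom = (E - twiddle * O) / np.sqrt(2)      # y_{k+N/2}, using omega_N^{k+M} = -omega_N^k
    return np.concatenate([top, bottom])

# Check against the naive O(N^2) definition
rng = np.random.default_rng(1)
x = rng.normal(size=16) + 1j * rng.normal(size=16)
N = len(x)
jv = np.arange(N)
naive = np.exp(2j * np.pi * jv * jv.reshape(-1, 1) / N) @ x / np.sqrt(N)
assert np.allclose(fft_unitary(x), naive)
```

The two lines computing `top` and `bottom` are the N "butterfly" combinations; everything else is the recursion.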

The construction assumes N is a power of 2 for clean recursion. More general splittings (mixed-radix, prime-factor, Rader, Bluestein) handle other sizes; all land at O(N \log N) with different constants.

DFT as a change of basis and as a group Fourier transform

Two views of the DFT that will pay off in the QFT chapter:

Change of basis. Think of \mathbb{C}^N as having two natural bases. The standard basis is \{\delta_0, \delta_1, \ldots, \delta_{N-1}\}, where \delta_j is the vector with a 1 in slot j and zeros elsewhere — the "signal is a spike at j" basis. The Fourier basis is \{f_0, f_1, \ldots, f_{N-1}\}, where f_k is the vector whose j-th entry is \omega^{jk}/\sqrt N — the "signal is a pure wave of frequency k" basis. The DFT is exactly the unitary that takes standard-basis coordinates to Fourier-basis coordinates. In quantum language: the DFT rotates you from the "position" basis to the "momentum" basis.

Fourier on \mathbb{Z}_N. The set \{0, 1, 2, \ldots, N-1\} with addition mod N forms an abelian group called \mathbb{Z}_N. Harmonic analysis on a finite abelian group G gives a canonical "group Fourier transform" — for \mathbb{Z}_N, that transform is exactly the DFT. The Fourier basis vectors are the characters of the group: the homomorphisms \chi_k : \mathbb{Z}_N \to U(1), \chi_k(j) = \omega^{jk}. Every DFT formula and identity you know has a group-theoretic interpretation.
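The homomorphism property \chi_k(a + b) = \chi_k(a)\,\chi_k(b) (addition mod N) is the precise sense in which the Fourier basis vectors respect the group structure, and it is easy to confirm numerically:

```python
import numpy as np

N = 12
omega = np.exp(2j * np.pi / N)

def chi(k, j):
    """The character chi_k of Z_N: a map from Z_N to the unit circle."""
    return omega ** (j * k)

# chi_k(a + b mod N) = chi_k(a) * chi_k(b) for every k, a, b
for k in range(N):
    for a in range(N):
        for b in range(N):
            assert abs(chi(k, (a + b) % N) - chi(k, a) * chi(k, b)) < 1e-9
```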

This second view generalises. The quantum Fourier transform over an arbitrary abelian group G is the analog of the DFT for G, and underlies the hidden subgroup problem — a unifying abstraction that contains Shor's factoring, discrete log, and Simon's algorithm as special cases.

The convolution theorem and fast polynomial multiplication

Convolution of two length-N signals x and h is the signal x * h defined by (x*h)_k = \sum_j x_j\, h_{k-j} (indices mod N). Convolution is expensive to compute directly — O(N^2) operations.

The convolution theorem. If \mathcal{F} denotes the DFT, then

\mathcal{F}(x * h) \;=\; \sqrt N\cdot \mathcal{F}(x) \cdot \mathcal{F}(h)

where the product on the right is entry-wise (not matrix multiplication). Because entry-wise multiplication is O(N) and the FFT is O(N\log N), the whole pipeline — FFT both inputs, multiply entry-wise, inverse-FFT — computes convolution in O(N\log N) time. A factor-N/\log N speedup for any problem that boils down to convolution.
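The \sqrt N factor depends on the normalisation convention, so it is worth verifying the theorem numerically with this chapter's unitary, +i convention:

```python
import numpy as np

def dft_u(x):
    """Unitary DFT with this chapter's +i convention."""
    N = len(x)
    j = np.arange(N)
    return np.exp(2j * np.pi * j * j.reshape(-1, 1) / N) @ x / np.sqrt(N)

rng = np.random.default_rng(2)
N = 8
x = rng.normal(size=N)
h = rng.normal(size=N)

# Direct circular convolution, O(N^2)
conv = np.array([sum(x[j] * h[(k - j) % N] for j in range(N)) for k in range(N)])

# Convolution theorem with the unitary normalisation: F(x*h) = sqrt(N) F(x) F(h) entry-wise
assert np.allclose(dft_u(conv), np.sqrt(N) * dft_u(x) * dft_u(h))
```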

And a lot of problems do. Polynomial multiplication is the clearest example: multiplying two degree-N polynomials p and q amounts to convolving their coefficient lists (zero-padded so the mod-N wrap-around does not mix coefficients). The Schönhage–Strassen algorithm uses exactly this to multiply N-bit integers in time O(N \log N \log\log N), dramatically faster than long multiplication's O(N^2). The Harvey–van der Hoeven 2019 algorithm pushes this down to O(N \log N), at last matching the conjectured lower bound.

High-performance integer-multiplication libraries — GMP most prominently — switch to FFT-based multiplication for very large operands. (CPython's built-in integers use Karatsuba multiplication; Python code typically gains FFT-based multiplication through a GMP wrapper such as gmpy2.)

The DFT in India's applied work

At the Giant Metrewave Radio Telescope (GMRT, Narayangaon, 80 km from Pune), radio data from the array's thirty dishes is cross-correlated and Fourier-transformed in real time to produce sky images. At IIT Madras and IIT Bombay, signal-processing research groups work on FFT-based algorithms for millimetre-wave 5G receivers and for medical imaging. Aadhaar biometric matching at UIDAI relies on FFT-accelerated image-processing filters for fingerprint and iris data. The FFT is not abstract; it is the workhorse numerical algorithm behind India's largest scientific and infrastructure projects.


References

  1. Wikipedia, Discrete Fourier transform — definitions, conventions, and the full table of symmetries.
  2. Wikipedia, Fast Fourier transform — the algorithmic side, Cooley–Tukey and beyond.
  3. Nielsen and Chuang, Quantum Computation and Quantum Information (2010), §5.1 — the classical–quantum bridge, setting up QFT — Cambridge University Press.
  4. John Preskill, Lecture Notes on Quantum Computation, Chapter 6 — the DFT-to-QFT correspondence with full derivations — theory.caltech.edu/~preskill/ph229.
  5. J.W. Cooley and J.W. Tukey, An Algorithm for the Machine Calculation of Complex Fourier Series (1965) — DOI:10.1090/S0025-5718-1965-0178586-1. The original O(N\log N) paper.
  6. Qiskit Textbook, Quantum Fourier Transform — hands-on QFT with Qiskit code, including a classical DFT recap.