In short
When two sound waves of nearly equal frequencies f_1 and f_2 superpose, the result at a fixed point is a tone at the average frequency \bar{f} = (f_1 + f_2)/2 whose amplitude varies slowly as a cosine envelope at the half-difference frequency (f_1 - f_2)/2. Because loudness responds to the square of the amplitude, the ear perceives a throb, a waxing and waning of intensity, every time the envelope reaches a peak, whether the cosine is +1 or -1. This happens twice per envelope cycle, so the audible beat frequency is
f_{\text{beat}} = |f_1 - f_2|.
Beats are how you can hear whether two instruments are in tune without knowing their absolute pitches: when the throbbing slows to zero, the two notes match. Every tabla player tuning their bayan against a tanpura drone, every piano tuner locking a string against a fork, and every sitar student pulling two chikari strings into unison is using this one formula.
A tanpura drones steadily in a corner of the rehearsal room — the tonic sa, a clean sinusoidal hum filling the space. A sitar student turns the peg of the sitar's main string, trying to bring it into tune with the drone. At first the two sounds clash — you hear a rough, throbbing sound, loud-quiet-loud-quiet, three or four times a second. The student tightens the peg a fraction. The throbbing slows — now twice a second. Tightens another hair's-width. Once a second. Tightens one more tiny amount and the throb disappears entirely. The two notes have locked into perfect unison; the rehearsal can begin.
You did not need to know what frequency either instrument was playing. You did not need a tuning machine or a reference fork or any measuring device at all. The sound itself told you, with precision of better than half a hertz, how far out of tune the sitar was. That is the phenomenon of beats: a ready-made tuning device that the superposition of two waves hands to you for free.
This article derives where that throbbing comes from, proves that its rate equals |f_1 - f_2|, and shows how to read the phenomenon in the algebra and in the ear. By the end you will understand why every stringed instrument in Hindustani and Carnatic music is tuned by listening for beats, and why the same trick underlies laser interferometry, radio heterodyning, and the way the human ear itself distinguishes nearly-identical pitches.
Two SHMs at a fixed point
Start with two sound waves of equal amplitude A but slightly different frequencies, both reaching your eardrum at the same location. At your ear, each wave is an SHM in time (see Principle of Superposition for why the wave at a single point is always an SHM). Take the two to be
y_1(t) = A\cos(2\pi f_1 t), \qquad y_2(t) = A\cos(2\pi f_2 t),
with f_1 and f_2 close to each other, say both near 440 Hz, differing by a few hertz. By the principle of superposition, the net displacement of your eardrum is simply the sum:
y(t) = y_1(t) + y_2(t) = A\left[\cos(2\pi f_1 t) + \cos(2\pi f_2 t)\right].
The question is: what does this sum sound like?
Deriving the beat formula
Step 1. Apply the sum-to-product identity for cosines:
\cos\alpha + \cos\beta = 2\cos\left(\frac{\alpha + \beta}{2}\right)\cos\left(\frac{\alpha - \beta}{2}\right).
Set \alpha = 2\pi f_1 t and \beta = 2\pi f_2 t:
y(t) = 2A\cos\left(2\pi\,\frac{f_1 + f_2}{2}\,t\right)\cos\left(2\pi\,\frac{f_1 - f_2}{2}\,t\right).
Why: the identity is a standard trigonometric result, proved by expanding \cos\frac{\alpha + \beta}{2} \cdot \cos\frac{\alpha - \beta}{2} using the angle-sum formulas: the cross terms cancel and the direct terms combine. What matters here is that it converts the sum of two cosines at different frequencies into a product of two cosines: one at the average frequency, one at the half-difference frequency.
Step 2. Name the two timescales.
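Define
\bar{f} = \frac{f_1 + f_2}{2}, \qquad \Delta f = f_1 - f_2.
Then
y(t) = 2A\cos(\pi\,\Delta f\,t)\,\cos(2\pi\bar{f}\,t).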
Why: if f_1 and f_2 are both around 440 Hz and differ by \Delta f = 4 Hz, then the "carrier" \bar{f} \approx 440 Hz is the rapid oscillation you hear as a tone, while the "envelope" \cos(\pi \Delta f\, t) varies at \pi \Delta f = 4\pi rad/s — a period of 0.5 s. The two timescales are cleanly separated by more than two orders of magnitude.
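As a quick numerical sanity check of the product form, the sketch below compares the direct sum of two equal-amplitude cosines with the envelope-times-carrier expression on a grid of sample times. The function names and the 442 Hz and 438 Hz tones are illustrative assumptions, not values from the text.

```python
import math

def two_tone_sum(f1, f2, t, amplitude=1.0):
    """Direct sum of two equal-amplitude cosines at time t (seconds)."""
    return amplitude * (math.cos(2 * math.pi * f1 * t) + math.cos(2 * math.pi * f2 * t))

def carrier_times_envelope(f1, f2, t, amplitude=1.0):
    """Product form: slow envelope at half the difference times carrier at the average."""
    f_avg = (f1 + f2) / 2
    delta_f = f1 - f2
    envelope = 2 * amplitude * math.cos(math.pi * delta_f * t)
    carrier = math.cos(2 * math.pi * f_avg * t)
    return envelope * carrier

# Largest disagreement between the two forms over one second, sampled at 8 kHz.
max_gap = max(
    abs(two_tone_sum(442.0, 438.0, n / 8000) - carrier_times_envelope(442.0, 438.0, n / 8000))
    for n in range(8000)
)
print(max_gap)  # expected to sit at floating-point rounding level
```

Any residual difference comes from floating-point rounding; the identity itself is exact.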
Step 3. Interpret the structure.
The sound you hear is a sinusoid at the average frequency \bar{f}, but whose amplitude is not constant — it is multiplied by the slowly varying factor 2A\cos(\pi \Delta f\,t). At moments when the cosine is +1, the amplitude is 2A (full constructive interference). At moments when the cosine is 0, the amplitude is 0 (full destructive interference — momentary silence). At moments when the cosine is -1, the amplitude is -2A — but the sinusoid is still at full magnitude, just with a flipped phase, which the ear does not distinguish from the +2A case.
Step 4. The key point — the ear tracks loudness, not amplitude.
The loudness of a sound at any instant is proportional to the square of the amplitude (intensity — see Intensity and Loudness of Sound). The envelope factor 2A\cos(\pi\Delta f\,t) is squared, giving 4A^2\cos^2(\pi\Delta f\,t) = 2A^2[1 + \cos(2\pi\Delta f\,t)].
That is the essential observation: the perceived loudness oscillates at \Delta f, not at \Delta f/2. The envelope swings positive, negative, positive, negative — but the magnitude of the envelope, which is what the ear registers, cycles at twice that rate.
Why: \cos^2\theta = \tfrac{1}{2}(1 + \cos 2\theta). Squaring a cosine doubles its frequency. The envelope of y(t) cycles at \Delta f /2, but the squared envelope — which is what the ear detects — cycles at \Delta f. So the number of loud-silent-loud transitions you hear per second is \Delta f, exactly the frequency difference.
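One way to see the frequency doubling numerically is to count the momentary silences, the instants at which the envelope \cos(\pi\Delta f\,t) changes sign. Each silence separates two throbs, so silences per second equal beats per second. This is a minimal sketch with a hypothetical helper name and an assumed sampling step.

```python
import math

def count_silences(delta_f, duration, steps_per_second=1000):
    """Count sign changes of the envelope cos(pi * delta_f * t) within the duration."""
    n = int(duration * steps_per_second)
    # Sample at the midpoint of each step so that no sample lands exactly on a zero.
    envelope = [
        math.cos(math.pi * delta_f * (i + 0.5) / steps_per_second)
        for i in range(n)
    ]
    return sum(1 for a, b in zip(envelope, envelope[1:]) if a * b < 0)

# A 4 Hz difference observed for 2 s: the envelope completes 2 cycles per second,
# but it passes through zero twice per cycle.
print(count_silences(4.0, 2.0) / 2.0)  # 4.0 silences per second
```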
Watch the beats
Count the throbs in the animation. Between t = 0 and t = 4 s you see exactly four amplitude peaks — one per second, matching f_{\text{beat}} = 1 Hz. If you made f_1 = 4 Hz and f_2 = 6 Hz (difference 2 Hz), you would see two throbs per second. The formula is exact.
Why the tuning trick works
The strategy for tuning one instrument against another is now transparent. The moment the two instruments are in exact tune, f_1 = f_2, so \Delta f = 0 and f_{\text{beat}} = 0 — the throbbing stops entirely. Conversely, if you hear beats at all, the instruments are out of tune, and the number of beats per second tells you by how much.
- 4 beats per second → you are 4 Hz off. If the reference is at 220 Hz, you are at either 216 Hz or 224 Hz (the beat tells you the magnitude, not the sign).
- 1 beat per second → you are 1 Hz off. Very close.
- 1 beat per three seconds (\approx 0.33 Hz) → within half a hertz.
- No audible beats → locked to better than the ear can detect, typically \pm 0.1 Hz.
The musician listens to the rate, adjusts the tuning peg, listens again, iterates. The rate slows as the tuning converges, and snaps to zero when it is exact. The sense that a particular tuning is "right" is not ear-magic; it is the disappearance of f_{\text{beat}}.
How to tell which way is sharp
The beat tells you |f_1 - f_2| but not the sign. Two tricks:
- Tighten slightly and listen. Tighten the string a tiny amount. If the beats slow, the string was below the reference: keep tightening until they vanish. If the beats speed up, the string was above: slacken past the starting point until they vanish. A few iterations pin the sign.
- Use the tanpura as reference. A tanpura's drone is a thick chord of harmonics. Beats against a single pure tone produce a characteristic throb; beats against the rich harmonics are subtly coloured, and an experienced ear reads the colour as "flat" or "sharp."
In Indian classical tuning, the usual practice is to tune upward: start with the string slightly slack, bring it up slowly while listening to the beats against the tanpura, and stop the moment the beats disappear. The practical logic is that the peg is easier to control when tightening than when loosening.
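The tune-upward procedure can be sketched as a simple loop: raise the pitch in small steps and stop once the beat rate falls below what the ear can follow. The step size, the 0.1 Hz audibility limit, and the function name are assumptions made for illustration; a real peg does not move in uniform steps.

```python
def tune_upward(start_hz, reference_hz, step_hz=0.05, audible_limit_hz=0.1):
    """Raise the pitch from below until the beat rate is no longer audible.

    Returns the final pitch and the list of beat rates heard along the way.
    """
    pitch = start_hz
    beat_history = []
    while pitch < reference_hz and abs(pitch - reference_hz) >= audible_limit_hz:
        beat_history.append(abs(pitch - reference_hz))  # beat rate heard at this step
        pitch += step_hz                                # tighten the peg a little
    return pitch, beat_history

final_pitch, beat_history = tune_upward(217.0, 220.0)
print(beat_history[0], abs(final_pitch - 220.0) < 0.1)
```

The beat rate in `beat_history` falls steadily, which is the slowing throb the musician listens for.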
Worked examples
Example 1: Tabla against tanpura
A tanpura is droning a steady sa (tonic) at exactly 220.0 Hz. A tabla player tightens the skin of the bayan (left drum) and strikes it while the drone sounds. The player hears three beats per second. What are the two possible frequencies of the bayan? If a further half-turn of the tightening ring produces a beat rate of one per second, and a full turn eliminates the beats, what was the direction of the initial mistuning?
Step 1. Apply the beat formula.
f_{\text{beat}} = |f_{\text{bayan}} - f_{\text{drone}}| = 3 Hz.
Why: the perceived throb frequency equals the magnitude of the frequency difference. It does not tell you which of the two tones is higher — only how far apart they are.
Step 2. Identify the two candidate bayan frequencies.
f_{\text{bayan}} = 220 \pm 3 Hz, i.e. f_{\text{bayan}} = 217 Hz or f_{\text{bayan}} = 223 Hz.
Step 3. Use the tightening experiment to pick between them.
Tightening the bayan's skin raises its pitch (higher tension raises the wave speed in the membrane while the wavelength of each mode stays fixed by the drum's size, hence higher frequency). After a half-turn of further tightening, the beat rate drops from 3 Hz to 1 Hz; after a full turn it drops to 0. The pitch is moving toward 220 Hz.
If the bayan had started at 217 Hz (below the drone), tightening would raise it toward 220, the beats would slow and finally disappear — consistent with what was observed.
If the bayan had started at 223 Hz (above the drone), tightening would raise it further, the beats would speed up — inconsistent with what was observed.
Why: the throb rate moves in the same direction as |f_{\text{bayan}} - f_{\text{drone}}|. When tightening decreases the rate, the bayan was below the drone.
Result: The bayan was initially at f_{\text{bayan}} = 217 Hz, i.e. 3 Hz below the tanpura drone. After one full turn of the tightening ring, it reached 220 Hz and was in tune.
What this shows: The beat rate is a magnitude. Whether you are above or below the reference is disambiguated by the direction of change when you deliberately shift your pitch. This is the universal procedure for tuning by ear — by watching how the beats respond to a known adjustment.
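The disambiguation logic of this example can be written as a short function. It is a sketch under two stated assumptions: tightening always raises the pitch, and the adjustment is small enough that the pitch does not cross the reference. The function name is hypothetical.

```python
def initial_frequency(reference_hz, beats_before, beats_after_tightening):
    """Pick the starting frequency from how the beat rate responds to tightening."""
    lower = reference_hz - beats_before
    upper = reference_hz + beats_before
    if beats_after_tightening < beats_before:
        return lower  # gap shrank while the pitch rose: started below the reference
    if beats_after_tightening > beats_before:
        return upper  # gap grew while the pitch rose: started above the reference
    raise ValueError("beat rate unchanged; direction cannot be determined")

print(initial_frequency(220.0, 3.0, 1.0))  # 217.0
```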
Example 2: Two sitar strings and a counted beat rate
A sitar has two unison drones called chikari, nominally tuned to the same high pitch (440 Hz). The student plucks both and counts 20 beats in 8 seconds, then tightens the peg of one string. Now they count 20 beats in 16 seconds. Find: (a) the frequency difference in each case, (b) which string was adjusted and by how many hertz.
Step 1. Compute the beat frequencies.
Before tightening: f_{\text{beat},1} = 20/8 = 2.5 Hz.
After tightening: f_{\text{beat},2} = 20/16 = 1.25 Hz.
Why: beat frequency is the number of throbs per second — just count and divide by elapsed time.
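Step 2. Translate into frequency differences.
|f_A - f_B| = 2.5 Hz before tightening, and |f_A - f_B| = 1.25 Hz after.
The pitch gap halved, which means the tightened string moved 2.5 - 1.25 = 1.25 Hz closer to the other (assuming the adjustment was small enough that the string did not pass through unison).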
Step 3. Which string was tightened, and what is the new frequency?
If string B was originally at f_B = 440 Hz (reference), and string A was 2.5 Hz off, then f_A = 437.5 Hz or 442.5 Hz. Tightening raises the pitch by 1.25 Hz.
- If f_A = 437.5 Hz (below B), tightening gives f_A = 438.75 Hz. Gap is |438.75 - 440| = 1.25 Hz. ✓
- If f_A = 442.5 Hz (above B), tightening gives f_A = 443.75 Hz. Gap is |443.75 - 440| = 3.75 Hz. ✗
Only the first case is consistent with the measurement. So f_A was initially below 440 Hz, and tightening has moved it upward toward f_B.
Why: to pick between the two algebraically possible initial frequencies, use the observation that tightening made the beats slow down. If the adjusted string was below the other, tightening brings them closer (slower beats); if it was above, tightening moves it further (faster beats).
Result: String A was adjusted. Its frequency went from 437.5 Hz to 438.75 Hz, i.e. was raised by 1.25 Hz. Another similar adjustment would bring the two strings into unison.
What this shows: The beat method measures frequency differences with remarkable precision: one beat per ten seconds corresponds to an error of 0.1 Hz out of 440 Hz, or about 2 parts in 10^4. Beyond that, the ear can sometimes detect absence of beats more sensitively than a frequency counter could, because the tuning converges exactly in the limit of no throb.
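The arithmetic of this example can be checked by testing each candidate frequency against the second measurement. This is a minimal sketch: the variable names are illustrative, string B is taken as the 440 Hz reference as in Step 3, and the string is assumed not to pass through unison.

```python
import math

def beat_rate(beat_count, seconds):
    """Beats per second from a counted number of throbs."""
    return beat_count / seconds

reference = 440.0
before = beat_rate(20, 8)    # 2.5 Hz
after = beat_rate(20, 16)    # 1.25 Hz
shift = before - after       # Hz gained by tightening (assumes no crossing of unison)

# Keep only the starting frequencies whose raised value reproduces the new beat rate.
candidates = [reference - before, reference + before]
consistent = [
    f for f in candidates
    if math.isclose(abs((f + shift) - reference), after)
]
print(consistent)  # [437.5]
```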
Example 3: Stroboscopic beats — the spinning fan
A ceiling fan in a Mumbai apartment spins at an unknown speed close to 900 RPM. Under the flicker of a 50 Hz tubelight (which pulses at 100 Hz because each AC cycle produces two brightness peaks), the blades appear to creep backward at 1 revolution per minute. Find the true rotational frequency of the fan. (This is the same physics as acoustic beats: the superposition of the light's pulse rate with the fan's rotational rate produces an apparent slow motion at the difference frequency.)
Step 1. Convert strobe rate to RPM.
100 flashes per second \times 60 s/min = 6000 flashes per minute.
Step 2. Find the nearest synchronous fan rate near 900 RPM.
If the fan rotated at exactly 6000/k RPM for some positive integer k, each strobe would illuminate the blades at the same angular position, and the blades would appear frozen. Candidates:
- k = 6: 6000/6 = 1000 RPM
- k = 7: 6000/7 \approx 857 RPM
- k = 8: 6000/8 = 750 RPM
Taken at face value, the closest of these to 900 RPM is 857 RPM (k = 7), followed by 1000 RPM (k = 6). The blade count, considered next, changes which rates matter.
Why: when the fan completes exactly one full revolution per k flashes, every k-th flash catches a given blade at the same spot, so the pattern looks frozen. Near-sync shows as slow drift rather than full rotation.
Wait: there is also a three-blade consideration. With three identical blades, the blade pattern looks the same after one-third of a revolution. So the pattern appears frozen whenever the fan turns through 120° in a whole number k of inter-flash intervals. Divide 6000 RPM by 3: the effective sync rates are 2000/k RPM.
- k = 2: 1000 RPM
- k = 3: 667 RPM
So 1000 RPM, where the three blades appear as six sharp images, is the relevant sync near 900 RPM; at 857 RPM the blades would smear into 21 faint images.
Step 3. Apply the beat formula.
The blades appear to drift backward at 1 RPM. In the stroboscopic analogy, this is the difference between the fan's true rate and the nearest sync rate:
|f_{\text{fan}} - f_{\text{sync}}| = 1 RPM, with f_{\text{sync}} = 1000 RPM.
Therefore f_{\text{fan}} = 999 RPM or f_{\text{fan}} = 1001 RPM.
Why: the "beat" between a rotating object and a strobing light works exactly like an acoustic beat. The apparent motion reveals the difference f_{\text{fan}} - f_{\text{sync}} in sign and magnitude, because you can see which way the blades appear to move.
Step 4. The direction of apparent motion resolves the sign.
If the fan is slightly slower than 1000 RPM (say 999), each strobe catches each blade slightly behind where it was one flash ago — the blades appear to drift backward.
If the fan is slightly faster than 1000 RPM (say 1001), the blades appear to drift forward.
The problem states that the blades drift backward — so f_{\text{fan}} = 999 RPM.
Result: The fan's true rotational rate is 999 RPM.
What this shows: The same superposition mathematics that produces audible beats in acoustics produces visible stroboscopic beats in optics. This is the principle of the automobile timing light, the ultrasonic diagnostic velocimeter, and the heterodyne receiver in every radio: mix two close frequencies and read the slow beat to measure their difference with far greater precision than you could measure either one directly.
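The steps of this example can be sketched in a few lines. The helper names are hypothetical, and the sync model is the simplified one used above: the fan turns through one blade spacing in a whole number k of flash intervals. A negative drift stands for blades that appear to creep backward.

```python
def strobe_flashes_per_minute(mains_hz):
    """Two brightness peaks per AC cycle, converted to flashes per minute."""
    return 2 * mains_hz * 60

def sync_rates(flashes_per_minute, blades, max_k):
    """Fan speeds (RPM) that advance one blade spacing in k flash intervals."""
    return [flashes_per_minute / (blades * k) for k in range(1, max_k + 1)]

def true_rate(sync_rpm, apparent_drift_rpm):
    """Add the signed apparent drift (negative = backward) to the sync rate."""
    return sync_rpm + apparent_drift_rpm

flashes = strobe_flashes_per_minute(50)             # 6000 flashes per minute
rates = sync_rates(flashes, blades=3, max_k=4)      # 2000, 1000, 666.7, 500 RPM
nearest = min(rates, key=lambda r: abs(r - 900))    # sync rate closest to 900 RPM
print(nearest, true_rate(nearest, -1))              # 1000.0 999.0
```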
Common confusions
- "The beat frequency is the average of f_1 and f_2." No. The perceived pitch is the average (the fast carrier), but the beat frequency (the throb rate) is the difference |f_1 - f_2|. Two close-frequency tones sound like one tone that throbs; the pitch of the tone is about the average, the rate of the throb is the difference.
- "The beat frequency is half the difference, because that's what the envelope frequency is." This is the most common mistake. The envelope \cos(\pi\Delta f\,t) does oscillate at \Delta f/2, but the ear tracks loudness, not signed amplitude. Loudness depends on the square of the amplitude, which doubles the effective frequency. So the perceived beat rate is \Delta f, exactly the difference, not half of it.
- "Beats only happen with sound waves." Beats arise in any superposition of two close-frequency waves. Light waves beat: this is the basis of Doppler interferometry, laser heterodyne detection, and LIGO's data analysis. Radio waves beat: superheterodyne receivers use the principle to down-convert an incoming signal to a lower intermediate frequency. Even water waves beat, though the slow time scales make it hard to perceive. The phenomenon is universal to waves, not peculiar to sound.
- "Perfectly pure tones are required for beats." Not strictly. Two tones with rich harmonic content also produce beats, but the pattern is more complex: you get beats at the fundamental difference, at the harmonic difference, at the difference of every pair of harmonics. In a tanpura plus sitar setup, the "beats" you hear at any moment are a superposition of many simultaneous beats from all the matched harmonic pairs. The dominant beat, used for tuning, is usually the one between the strongest matched harmonics.
- "The beat formula f_{\text{beat}} = |f_1 - f_2| is approximate." It is exact when the two amplitudes are equal and the wave equation is linear. For unequal amplitudes, the envelope does not reach zero: it oscillates between |A_1 + A_2| and |A_1 - A_2|, and the beats are less pronounced but still perfectly periodic at |f_1 - f_2|. For strongly nonlinear media (very loud sounds), extra frequencies can appear, but the fundamental beat rate is still |f_1 - f_2|.
- "If you can hear the beat rate, you can always tune to exact zero." The human ear can distinguish beat rates reliably down to about 0.1 Hz, i.e. one beat per ten seconds. Below that, the beat disappears into the noise of variable room reverberation and non-ideal instrument tones. For tuning to 0.01 Hz accuracy (needed in, say, atomic clock stabilisation), you use instrumentation rather than the ear.
If you came here to understand why two close-frequency tones throb, what sets the beat rate, and how to tune by ear, you have it. What follows is for readers who want the energy accounting, the generalisation to unequal amplitudes, the connection to group velocity, and a look at the physiology of hearing that makes beats possible.
Unequal amplitudes — beats with partial cancellation
Repeat the derivation with y_1 = A_1\cos(2\pi f_1 t) and y_2 = A_2\cos(2\pi f_2 t), unequal amplitudes. The sum is
y(t) = A_1\cos(2\pi f_1 t) + A_2\cos(2\pi f_2 t).
The sum-to-product identity does not apply directly to unequal amplitudes. Instead, write the sum using phasors: at each instant, add two vectors of lengths A_1 and A_2 with a phase difference \varphi(t) = 2\pi\Delta f\, t that drifts slowly. By the cosine rule, the resultant amplitude is
A_{\text{res}}(t) = \sqrt{A_1^2 + A_2^2 + 2A_1A_2\cos(2\pi\Delta f\,t)}.
Step 1. Maximum: \cos = +1, amplitude A_1 + A_2.
Step 2. Minimum: \cos = -1, amplitude |A_1 - A_2|.
The envelope oscillates between A_1 + A_2 and |A_1 - A_2|, never reaching zero unless A_1 = A_2. The beats are still periodic at |f_1 - f_2|, but they are partial — the sound never completely dies away in the troughs.
This is why real-world beats (two instruments, never perfectly matched in volume) sound more like a gentle throb than a full on-off pulsing. In a perfect tuning demonstration with two electronic oscillators of equal amplitude, the silence at the trough is crisp; in the real world it is usually just a noticeable dip.
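A small numerical sketch makes the partial cancellation concrete. It evaluates the cosine-rule amplitude over one second for two tones of unequal strength; the amplitudes 1.0 and 0.6 and the 2 Hz difference are illustrative assumptions.

```python
import math

def envelope_amplitude(a1, a2, delta_f, t):
    """Resultant amplitude of two phasors whose phase difference is 2*pi*delta_f*t."""
    return math.sqrt(a1**2 + a2**2 + 2 * a1 * a2 * math.cos(2 * math.pi * delta_f * t))

# Sample the envelope every millisecond for one second.
samples = [envelope_amplitude(1.0, 0.6, 2.0, n / 1000) for n in range(1001)]
loudest, quietest = max(samples), min(samples)
print(loudest, quietest)  # close to A_1 + A_2 = 1.6 and |A_1 - A_2| = 0.4
```

The trough stays well above zero, which is the gentle throb rather than on-off pulsing described above.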
Energy accounting — where does the energy go at the null?
At the instant of maximum destructive interference, the eardrum is motionless — the sound has vanished. But the two sources are still pumping out energy. Where does it go?
The answer: nowhere. There is no contradiction, because conservation of energy applies to the entire spatial field, not to one point.
Consider two speakers pumping out pure tones at close frequencies. At any moment, the pattern of constructive and destructive interference is spatially distributed in the room. When the envelope is at zero at your ear, it is at a maximum at some other place. As time passes, the zones of constructive and destructive interference swirl through space, always rearranging. The total energy delivered to the room integrated over time, and over all positions, equals the sum of the two sources' outputs. Beats are a local phenomenon in time and space; the global energy balance is untouched.
More precisely: if you sum the intensity |y_1 + y_2|^2 over one full beat period, the cross-term 2 y_1 y_2 averages to zero (because it is periodic at the beat frequency). What is left is |y_1|^2 + |y_2|^2 — the sum of the two intensities, exactly what each source would contribute if the other weren't present.
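The averaging claim can be spelled out with the product-to-sum identity. This is a sketch for two cosines y_1 = A_1\cos(2\pi f_1 t) and y_2 = A_2\cos(2\pi f_2 t), averaged over one beat period T = 1/|f_1 - f_2|:

```latex
\begin{aligned}
2\,y_1 y_2 &= 2A_1A_2\cos(2\pi f_1 t)\cos(2\pi f_2 t) \\
           &= A_1A_2\left[\cos\bigl(2\pi(f_1 - f_2)t\bigr) + \cos\bigl(2\pi(f_1 + f_2)t\bigr)\right], \\
\frac{1}{T}\int_0^{T}\cos\bigl(2\pi(f_1 - f_2)t\bigr)\,dt &= 0
  \qquad\text{for } T = \frac{1}{|f_1 - f_2|}, \\
\left|\frac{1}{T}\int_0^{T}\cos\bigl(2\pi(f_1 + f_2)t\bigr)\,dt\right|
  &\le \frac{|f_1 - f_2|}{2\pi(f_1 + f_2)} \ll 1 .
\end{aligned}
```

The difference-frequency part of the cross-term averages to exactly zero over a beat period, and the sum-frequency part is negligible when the two tones are close, so the time-averaged intensity is the sum of the two separate intensities to very good accuracy.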
Beats and group velocity
A wave packet — a pulse that is a superposition of waves at slightly different frequencies — travels through space at the group velocity v_g = d\omega/dk. The beat formula is the one-location special case of the general envelope-dynamics problem. At a single point, the envelope oscillates in time at the difference frequency \Delta\omega. As a function of position and time, the envelope moves at v_g.
For a packet whose frequencies are centred near \omega_0 with width \Delta\omega, the envelope has spatial extent \sim 1/\Delta k and moves at v_g. If \Delta\omega and \Delta k satisfy the wave's dispersion relation \omega(k), then v_g = d\omega/dk is generally different from the phase velocity v_\phi = \omega/k. For non-dispersive waves (like sound in air), v_g = v_\phi — the packet moves at the same speed as the individual wave crests. For dispersive waves (like deep-water waves or quantum-mechanical wave packets), they differ, and a packet can move faster or slower than its carrier wavelets.
The two-tone beat pattern is the simplest visualisable example of this group-velocity behaviour: the slow envelope is the "packet", and its oscillation in time at a fixed point (the beat) is what you hear as the throb. This is the link between the prosaic phenomenon of tabla tuning and the abstract formal machinery of wave packets and group velocity used throughout modern physics and communications engineering.
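For two travelling waves of equal amplitude, the same sum-to-product step gives the moving envelope explicitly. The symbols k_1, k_2, \omega_1, \omega_2 are introduced here for illustration:

```latex
\begin{aligned}
y(x,t) &= A\cos(k_1 x - \omega_1 t) + A\cos(k_2 x - \omega_2 t) \\
       &= 2A\cos\!\left(\frac{\Delta k}{2}\,x - \frac{\Delta\omega}{2}\,t\right)
            \cos\!\left(\bar{k}\,x - \bar{\omega}\,t\right), \\
\Delta k &= k_1 - k_2,\quad \Delta\omega = \omega_1 - \omega_2,\quad
\bar{k} = \tfrac{1}{2}(k_1 + k_2),\quad \bar{\omega} = \tfrac{1}{2}(\omega_1 + \omega_2), \\
v_{\text{envelope}} &= \frac{\Delta\omega}{\Delta k} \;\longrightarrow\; \frac{d\omega}{dk} = v_g
  \quad\text{as } \Delta k \to 0 .
\end{aligned}
```

Setting x = 0 recovers the fixed-point beat formula, with \Delta\omega = 2\pi\,\Delta f.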
Multi-tone beats and the cochlea's place principle
Three or more close-frequency tones produce more complex beat patterns. For three tones at f_0 - \Delta, f_0, f_0 + \Delta with equal amplitude, the sum has a carrier at f_0 and an envelope (1 + 2\cos(2\pi\Delta t)) — the amplitude varies between -1 and 3. The pattern still has period 1/\Delta, but within one period it has a subtle substructure: a sharp main peak, a small secondary peak at the negative excursion. The ear hears a throb with a "skip" character rather than a simple pulse.
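The three-tone envelope can be verified numerically in the same spirit as the two-tone case. The sketch below assumes unit amplitudes and illustrative values f_0 = 440 Hz and \Delta = 1 Hz; the function names are hypothetical.

```python
import math

def three_tone_sum(f0, delta, t):
    """Direct sum of unit-amplitude cosines at f0 - delta, f0 and f0 + delta."""
    return sum(math.cos(2 * math.pi * f * t) for f in (f0 - delta, f0, f0 + delta))

def three_tone_envelope(delta, t):
    """Slow factor 1 + 2 cos(2 pi delta t) multiplying the carrier at f0."""
    return 1 + 2 * math.cos(2 * math.pi * delta * t)

times = [n / 1000 for n in range(1001)]  # one full envelope period for delta = 1 Hz
gap = max(
    abs(three_tone_sum(440.0, 1.0, t)
        - three_tone_envelope(1.0, t) * math.cos(2 * math.pi * 440.0 * t))
    for t in times
)
envelope_values = [three_tone_envelope(1.0, t) for t in times]
print(gap, min(envelope_values), max(envelope_values))  # gap near zero; range about -1 to 3
```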
The human ear discriminates nearby pitches through the cochlea, a spiral-shaped fluid-filled organ in the inner ear whose basilar membrane has a position-dependent resonant frequency: high frequencies excite the narrow, stiff end near the entrance, low frequencies excite the wide, flexible far end. Two nearby frequencies excite nearby regions of the basilar membrane, and their excitation patterns overlap. The local hair cells respond to the sum of the two oscillations, and what they send to the brain is, essentially, the beat envelope. The perception of "out-of-tuneness" and "roughness" in music, far from being mystical, is the local beat patterns in the cochlea being read out by the auditory nerve.
The just-noticeable frequency difference at 440 Hz is about 0.5 Hz for most listeners — which corresponds to a beat rate of 0.5 per second, i.e. one beat per two seconds, perfectly consistent with the ear's own beat-sensing machinery. Musicians trained in Indian classical music, where microtonal intonation is essential, can often discriminate 0.2 Hz differences at 440 Hz — the limit of practical biological precision.
Beats in two dimensions: the moiré pattern
If you draw two sets of parallel lines at slightly different spacings on two transparent sheets and overlay them, the interference of the two patterns produces a moiré pattern: visible dark and light bands at a much larger spacing than either of the line patterns. This is the two-dimensional beat: the spatial frequencies of the two line sets beat against each other to produce a coarse spatial modulation. Moiré is used in printing (to detect fakes) and in strain analysis (to visualise microscopic displacement of a surface), and it has to be suppressed in displays, where it shows up as colour artefacts. It is the exact spatial analogue of the temporal beat you hear when tuning a tabla.
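The spatial analogue of f_{\text{beat}} = |f_1 - f_2| is a difference of spatial frequencies. For two parallel line sets with spacings d_1 and d_2, the band spacing is 1/|1/d_1 - 1/d_2|. The sketch below assumes perfectly parallel lines; the function name and the sample spacings are illustrative.

```python
def moire_spacing(d1, d2):
    """Spacing of moiré bands for two parallel line sets with spacings d1 and d2."""
    spatial_beat = abs(1.0 / d1 - 1.0 / d2)  # difference of spatial frequencies
    return 1.0 / spatial_beat

# Lines 1.0 mm and 1.1 mm apart give bands roughly 11 mm apart.
print(moire_spacing(1.0, 1.1))
```

A 10% mismatch in spacing magnifies into bands about eleven times coarser than the lines, just as a small frequency mismatch produces a slow audible beat.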
Where this leads next
- Standing Waves and Normal Modes — the related superposition phenomenon where two opposite-direction waves of the same frequency produce a spatial pattern of nodes and antinodes, as opposed to two same-direction waves of different frequencies producing a temporal pattern of beats.
- Principle of Superposition — the general rule that beats are a consequence of: two waves add, point by point, moment by moment.
- Superposition of SHMs — at any fixed point, a wave is an SHM in time; two nearby-frequency waves at a point are two close-frequency SHMs, which is where beats start mathematically.
- Doppler Effect — why a moving source produces a shifted frequency, which when heard alongside a stationary reference frequency produces beats whose rate reveals the source velocity. The principle of all Doppler-based velocity measurements.
- Sound Waves — Nature and Propagation — the physical context of sound in air, needed for numerical intuition about the frequencies that actually occur in musical acoustics.