FFT & Spectral Methods — Math Reference
1. At a glance
The Fourier transform decomposes a signal into a sum (or integral) of sinusoidal components, exposing the frequency content hidden in a time- or space-domain waveform. It is the mathematical hinge between two dual views of every physical process: time vs frequency, space vs wavenumber, configuration vs momentum.
The Fast Fourier Transform (FFT) is the family of algorithms that computes the discrete Fourier transform (DFT) of length N in O(N log N) operations rather than the naive O(N²). The modern formulation is the Cooley–Tukey algorithm (Cooley + Tukey 1965, Math Comp), though Heideman, Johnson + Burrus 1984 (IEEE ASSP Magazine) showed that Gauss had derived essentially the same decomposition in unpublished notes circa 1805, predating Fourier’s own 1807 work on the continuous transform.
The FFT is routinely cited (Dongarra + Sullivan 2000, IEEE CISE “Top 10 Algorithms of the 20th Century”) as one of the most consequential algorithms ever discovered. Without it, modern signal processing, image compression, wireless communications, MRI reconstruction, large-integer arithmetic, and direct numerical simulation of turbulence would be either impossible or thousands of times slower.
This note covers the continuous and discrete Fourier transforms, the principal FFT algorithm families, windowing and short-time analysis, multi-dimensional and non-uniform variants, the DCT/MDCT cousins that power audio and image codecs, the major numerical libraries, and the canonical pitfalls.
2. Fourier series and transforms
Continuous Fourier series (periodic, continuous-time)
A periodic signal x(t) with period T can be expanded as
x(t) = Σ_{k=−∞}^{∞} c_k · exp(2πi · k · t / T)
where the Fourier coefficients are
c_k = (1/T) · ∫_0^T x(t) · exp(−2πi · k · t / T) dt.
This is the original construction of Fourier 1822 (Théorie analytique de la chaleur), applied to heat conduction in a rod.
Continuous Fourier transform (aperiodic, continuous-time)
For an aperiodic, integrable x(t), the limit T → ∞ produces the continuous-time Fourier transform (CTFT):
X(f) = ∫_{−∞}^{∞} x(t) · exp(−2πi · f · t) dt,
x(t) = ∫_{−∞}^{∞} X(f) · exp(2πi · f · t) df.
There are several normalization conventions; the symmetric “Hz-frequency” form above is the engineering default. Physicists often write it in terms of angular frequency ω = 2πf, with a 1/(2π) factor split symmetrically (1/√(2π) each) or fully on the inverse.
Discrete-time Fourier transform (DTFT) — aperiodic, discrete-time
For a discrete sequence x[n], the DTFT is continuous and 2π-periodic in normalized frequency ω:
X(e^{iω}) = Σ_{n=−∞}^{∞} x[n] · exp(−iω · n).
Discrete Fourier transform (DFT) — periodic, discrete-time, finite-N
Sampling the DTFT at N equally spaced frequencies ω_k = 2πk/N gives the DFT (see §3). The DFT is what computers actually compute; the CTFT and DTFT are analytic tools.
Standard transform pairs (CTFT, symmetric Hz convention)
- δ(t) ↔ 1 (a Dirac impulse has flat spectrum).
- 1 ↔ δ(f) (DC has all energy at zero frequency).
- cos(2π f_0 t) ↔ [1/2](δ(f − f_0) + δ(f + f_0)).
- exp(−π t²) ↔ exp(−π f²) (the Gaussian is its own Fourier transform).
- rect(t) ↔ sinc(f) (rectangular pulse ↔ sinc; appears in every windowing analysis).
- sinc(t) ↔ rect(f) (ideal low-pass filter is non-causal in time).
- exp(−a|t|) ↔ 2a / (a² + (2πf)²) (Lorentzian; spectral line shape).
- comb_T(t) = Σ δ(t − nT) ↔ (1/T) · comb_{1/T}(f) (Dirac comb is its own transform up to scale; basis of Poisson summation and the sampling theorem).
3. DFT — definition and properties
Definition
For a length-N complex sequence x_0, …, x_{N−1}, the forward DFT is
X_k = Σ_{n=0}^{N−1} x_n · exp(−2πi · n · k / N), k = 0, …, N−1.
The inverse DFT (IDFT) is
x_n = (1/N) · Σ_{k=0}^{N−1} X_k · exp(2πi · n · k / N).
The factor 1/N can be placed on the forward, the inverse, or split as 1/√N on both (the “unitary” convention). NumPy, SciPy, MATLAB, and FFTW all place it on the inverse by default; this is the most common convention but the source of many bugs (see §20).
The “twiddle factor” W_N = exp(−2πi / N) is the primitive N-th root of unity. The DFT matrix has entry F_{k,n} = W_N^{kn} and is (up to the 1/N factor) unitary.
Key properties
- Linearity: DFT{a·x + b·y} = a · DFT{x} + b · DFT{y}.
- Convolution theorem (circular): x ⊛ y ↔ X · Y (element-wise product). Linear convolution requires zero-padding to length ≥ len(x) + len(y) − 1 to avoid wraparound.
- Parseval / Plancherel: Σ |x_n|² = (1/N) · Σ |X_k|². Energy is conserved up to the normalization factor.
- Time shift: x_{n−m} (cyclic) ↔ X_k · exp(−2πi · k · m / N). A time shift becomes a frequency-dependent phase ramp.
- Frequency shift / modulation: x_n · exp(2πi · n · m / N) ↔ X_{k−m} (cyclic). Multiplication by a complex exponential in time shifts the spectrum.
- Conjugate symmetry for real input: if x_n is real, then X_{N−k} = conj(X_k). Half of the output is redundant — exploited by the real-FFT (§5).
- Periodicity: X_k is periodic with period N (the spectrum lives on the circle, not the line).
- Duality: DFT of DFT = N · time-reversed sequence (modulo signs and the convention).
4. FFT algorithms
The FFT is not a single algorithm but a family. All exploit the factorization of N to recursively decompose a length-N DFT into smaller DFTs.
Cooley–Tukey radix-2 (1965)
The canonical algorithm. For N a power of 2, split the input into even-indexed and odd-indexed subsequences:
X_k = E_k + W_N^k · O_k, X_{k+N/2} = E_k − W_N^k · O_k
where E_k and O_k are length-(N/2) DFTs of the even and odd halves. Recurse, with base case N=1. The recurrence T(N) = 2·T(N/2) + O(N) gives T(N) = O(N log N).
Decimation-in-time (DIT) splits the input; decimation-in-frequency (DIF) splits the output. They are duals and give the same operation count: roughly 5 N log_2 N real flops for complex-to-complex.
In-place implementations require a bit-reversal permutation of either input or output. The classic “butterfly” diagram is the canonical pedagogical picture.
Radix-4, radix-8, split-radix
Higher radices reduce the number of complex multiplications. Split-radix FFT (Duhamel + Hollmann 1984) combines radix-2 and radix-4 in alternating stages and achieves the lowest known operation count for power-of-2 sizes among the classical algorithms (~4 N log_2 N flops). FFTW and Intel MKL use split-radix or further-refined variants internally.
Mixed-radix
For N = N_1 · N_2 · … · N_r with the factors small primes (2, 3, 5, 7), apply Cooley–Tukey recursively with the appropriate radix at each stage. SciPy’s scipy.fft is most efficient for N whose factorization contains only small primes.
Bluestein’s chirp-z algorithm (Bluestein 1968)
Re-expresses the DFT as a convolution by completing the square in the exponent: n·k = (n² + k² − (k−n)²) / 2. The convolution is then computed via a power-of-2 FFT of size ≥ 2N − 1. This handles arbitrary N (including prime N) in O(N log N) at the cost of a ~3× constant factor. Used in NumPy/SciPy and FFTW when N is awkward.
Prime-factor algorithm (PFA, Good–Thomas)
For N = N_1 · N_2 with gcd(N_1, N_2) = 1, the Chinese remainder theorem permutes the DFT into a 2D DFT of size N_1 × N_2 with no twiddle factors between stages — a strict win over Cooley–Tukey when applicable. Generalizes to more coprime factors. Used in some custom DSP implementations; less common in general-purpose libraries.
Rader’s algorithm (Rader 1968)
For prime N, re-expresses the length-N DFT as a length-(N−1) cyclic convolution by exploiting the multiplicative group of integers mod N. Combined with Bluestein, gives efficient handling of any size.
Winograd FFT (Winograd 1978)
Minimizes the number of multiplications (at the cost of more additions). Optimal for small fixed sizes (N = 2, 3, 4, 5, 7, 8, 9, 11, 13, 16). Used as building blocks inside larger mixed-radix transforms in older DSP libraries; less relevant on modern hardware where multiplications and additions cost the same.
5. Real FFT (rfft)
When x is real, conjugate symmetry X_{N−k} = conj(X_k) means only N/2 + 1 output bins are independent: X_0 (DC, real), X_1, …, X_{N/2−1} (complex), and X_{N/2} (Nyquist, real if N even). The remaining N/2 − 1 bins are redundant.
Specialized “real-to-complex” (R2C) routines exploit this to halve both compute and memory: NumPy’s numpy.fft.rfft, SciPy’s scipy.fft.rfft, FFTW’s fftw_plan_dft_r2c_1d, cuFFT’s cufftPlan1d with CUFFT_R2C. The inverse is irfft / C2R.
In practice almost all signal-processing input is real (audio, EEG, accelerometer, single-channel images), so rfft should be the default unless you have a specific reason to do complex.
6. Multi-dimensional FFT
The d-dimensional DFT is separable: it equals d successive 1D DFTs along each axis. A 2D FFT on an M × N grid is computed by FFTing each of M rows (length N), then each of N columns (length M). Cost is O(MN log(MN)).
Applications:
- Image processing: filtering, convolution, frequency-domain analysis. OpenCV’s
cv::dftandcv::idft, FFTW + GIMP. - 2D PDE solvers: Poisson equation, Helmholtz, Stokes flow with periodic boundaries.
- 3D molecular dynamics: Particle-Mesh Ewald (PME) (Darden, York + Pedersen 1993) computes long-range Coulomb interactions in O(N log N) via 3D FFT on a charge-density grid. GROMACS, AMBER, NAMD, LAMMPS, OpenMM all use PME for biomolecular simulations.
- 3D turbulence DNS (direct numerical simulation): pseudospectral codes on triply periodic boxes (§16).
- MRI: k-space data is acquired in the Fourier domain; reconstruction is an inverse FFT (§17).
- Crystallography: structure factors are 3D Fourier coefficients of the electron density.
7. Windowing
A finite-length DFT implicitly assumes the N-sample input is one period of an infinitely periodic signal. If the true signal is not periodic in N (the usual case), the discontinuity at the wraparound point smears energy across all frequency bins — spectral leakage.
Mitigation: multiply the input by a window function w_n that tapers smoothly to zero at both ends. The DFT then sees X * W (convolution in frequency), where W is the DTFT of the window. The trade-off is between main-lobe width (frequency resolution) and side-lobe level (dynamic range to detect weak components near a strong one).
Common windows (Harris 1978, “On the use of windows for harmonic analysis with the DFT”, Proc IEEE):
- Rectangular (no window): narrowest main lobe (≈ 2 bins), worst side-lobe level (−13 dB). Use only when the signal is genuinely periodic in N.
- Hann (Hanning): cosine-squared. Main-lobe width ≈ 4 bins, first side-lobe −32 dB. The default choice for general spectrum analysis.
- Hamming: tuned cosine. Slightly narrower main lobe, first side-lobe −43 dB, but higher far-out side-lobes than Hann.
- Blackman, Blackman–Harris (3-term, 4-term, 7-term): progressively wider main lobes, progressively lower side-lobes (down to −92 dB for 4-term Blackman–Harris). Use for high dynamic-range spectrum analysis.
- Flat-top: very wide main lobe (~9 bins), extremely flat in amplitude across the bin. Used in instrumentation when accurate amplitude estimation of a tone matters more than frequency resolution (e.g., calibration of test equipment).
- Kaiser (Kaiser + Schafer 1980): parameterized by β; smoothly interpolates between rectangular (β=0) and high-suppression (β=14 ~ Blackman–Harris). Near-optimal in the sense of concentrating energy in the main lobe.
- Gaussian (truncated): minimum time-frequency product (Heisenberg uncertainty), useful in STFT (§8).
8. STFT and spectrogram
The DFT gives the global frequency content of a signal but throws away all time information. The Short-Time Fourier Transform (STFT) restores it by computing a DFT on each of a series of short, overlapping windows of the input:
STFT{x}[m, k] = Σ_n x[n] · w[n − mH] · exp(−2πi · n · k / N)
where w is a window, N is the window length, H is the hop size (in samples), and m indexes successive frames. The spectrogram is |STFT|² visualized as a 2D image with time on the x-axis and frequency on the y-axis.
There is an irreducible trade-off (a form of the Heisenberg–Gabor uncertainty principle): a short window gives good time resolution but coarse frequency resolution; a long window gives the reverse. The hop size H controls overlap; H = N/2 (50%) or H = N/4 (75%) are typical.
Applications:
- Audio: mel-spectrograms (STFT magnitude warped to a perceptual mel frequency scale) are the standard front-end for speech recognition (Whisper (Radford et al. 2022, OpenAI), Wav2Vec2 (Baevski et al. 2020), older Kaldi pipelines), music information retrieval (genre classification, beat tracking, source separation), and audio synthesis (HiFi-GAN, neural vocoders).
- Seismic processing: time-frequency analysis of earthquake records to identify P, S, surface-wave arrivals.
- NDT vibration analysis: order tracking, bearing-fault frequencies in rotating machinery.
- Sonar / radar: range-Doppler maps.
- Bioacoustics: bat-call classification, whale-song analysis, birdsong (BirdNET).
9. Wavelets (brief)
Wavelets provide an alternative time-frequency decomposition with multi-resolution: the analysis window is short at high frequencies (good time localization) and long at low frequencies (good frequency localization), automatically. This matches how transients live in signals better than the fixed window of the STFT.
The continuous wavelet transform (CWT) uses a continuous family of scaled and translated copies of a mother wavelet. The discrete wavelet transform (DWT) is computed via a fast filter-bank algorithm (Mallat 1989, IEEE PAMI), with O(N) cost — even faster than the FFT.
Key wavelet families:
- Haar (Haar 1910): simplest, discontinuous, only second-order moments vanish.
- Daubechies (Daubechies 1988 Comm Pure Appl Math): compact-support orthogonal wavelets with maximum vanishing moments for given support length.
db4,db8, etc. - Symlets, Coiflets, Biorthogonal: refinements with various symmetry / vanishing-moment properties.
- Morlet (complex Gaussian-modulated sinusoid): standard for CWT in time-frequency analysis.
Applications: JPEG2000 image compression (CDF 9/7 biorthogonal wavelet), denoising via wavelet shrinkage (Donoho + Johnstone 1994), seismic reflection processing, ECG analysis. Libraries: PyWavelets, MATLAB Wavelet Toolbox.
10. Applications
Fast convolution
By the convolution theorem, x ⊛ h = IFFT(FFT(x) · FFT(h)). Cost is O((N + M) log(N + M)) versus O(N · M) for direct convolution. Crossover where FFT-conv beats direct depends on hardware but is roughly kernel length > 30–50 samples for 1D and kernel size > ~7×7 for 2D images. Below that, the FFT overhead (planning, twiddle setup, zero-padding) dominates.
OpenCV’s cv::filter2D uses direct convolution for small kernels and cv::dft-based convolution for large ones. SciPy provides both scipy.signal.convolve (auto-chooses) and explicit scipy.signal.fftconvolve. GIMP, ImageMagick, Photoshop use FFT-based convolution for large blur/sharpen kernels.
Spectrum analysis
Lab spectrum analyzers, software-defined radios (SDR), audio analyzers (Smaart, FuzzMeasure), vibration analyzers (B&K Pulse) — all compute windowed FFTs of digitized input and display magnitude or power spectra.
Audio codecs
- MP3 (ISO/IEC 11172-3, 1993): hybrid filter-bank + MDCT (modified discrete cosine transform).
- AAC (ISO/IEC 13818-7, 1997): pure MDCT, larger window sizes, used in iTunes, YouTube, broadcast.
- Vorbis (Xiph, 2000): MDCT-based, royalty-free.
- Opus (Valin, Vos + Terriberry 2012, RFC 6716): hybrid CELT (MDCT) + SILK (LPC); the default codec for WebRTC, Discord, Zoom.
All of these use the MDCT (§14) rather than the DFT because MDCT gives perfect reconstruction with overlapping windows at critical sampling (no redundancy).
Image and video codecs
- JPEG (ISO/IEC 10918, 1992): 8×8 block DCT-II + quantization + entropy coding.
- H.264/AVC, H.265/HEVC, AV1, VVC: integer approximations to 4×4 and 8×8 DCT.
- JPEG2000 (ISO/IEC 15444, 2000): DWT (CDF 9/7 lossy, CDF 5/3 lossless).
- JPEG XL (ISO/IEC 18181, 2022): DCT of variable block sizes (2×2 up to 256×256), modular mode that subsumes lossless wavelet-style.
Spectral PDE solvers
For periodic boundary conditions, spatial derivatives become multiplications in Fourier space: ∂/∂x ↔ ik. Pseudospectral methods compute nonlinear terms in physical space and linear / derivative terms in Fourier space, FFTing back and forth each step. Error converges exponentially in N (spectral accuracy) for smooth solutions. See §16.
Large-integer multiplication
Multiplying two N-digit integers naively costs O(N²). Karatsuba (1962) is O(N^{log_2 3}) ≈ O(N^{1.585}). The Schönhage–Strassen algorithm (1971) uses FFT over a finite ring to achieve O(N log N log log N). Used in GMP (mpn_mul_fft), OpenSSL’s BN_mul, factoring software (CADO-NFS), pi computations (y-cruncher).
The 2019 Harvey–van der Hoeven algorithm achieves O(N log N), matching the long-conjectured lower bound; not yet used in production libraries due to enormous constants.
Cross-correlation and template matching
The cross-correlation of two signals is the convolution of one with the time-reversed conjugate of the other; computable via FFT. Used in:
- GPS code acquisition: correlating the received signal against the known C/A code of each satellite.
- Image template matching:
cv::matchTemplatewithTM_CCORR_NORMEDuses FFT for large templates. - Audio fingerprinting: Shazam-style hash matching.
- Astronomy: matched filtering for gravitational-wave detection (LIGO PyCBC, GstLAL).
Polynomial multiplication
Polynomials of degree d are vectors of d+1 coefficients; their product is the convolution of those vectors. FFT-based polynomial multiplication is O(N log N). Used in computer-algebra systems (Magma, SymPy), number-theoretic transforms (NTT) for homomorphic encryption (Microsoft SEAL, OpenFHE, lattice-based post-quantum cryptography: CRYSTALS-Kyber, CRYSTALS-Dilithium, Falcon).
Beamforming
Phased-array radar (AESA), medical ultrasound (Philips EPIQ, GE LOGIQ), 5G massive-MIMO base stations compute spatial FFTs across antenna elements to steer beams in a chosen direction without mechanical motion.
OFDM communications
Orthogonal Frequency-Division Multiplexing modulates data onto N orthogonal subcarriers; the transmitter is an IFFT, the receiver is an FFT. Cyclic prefix turns the multi-path channel into a circular convolution that becomes diagonal in the FFT domain — single-tap equalization.
Deployed in: Wi-Fi (802.11a/g/n/ac/ax/be, FFT sizes 64 to 4096), LTE / 5G NR (FFT sizes 128 to 4096), DVB-T/T2 (digital terrestrial TV), DAB (digital audio broadcasting), ADSL/VDSL (DMT, discrete multi-tone — OFDM on copper), PLC (power-line communications), DOCSIS 3.1 cable modems.
Quantum FFT
The quantum Fourier transform acts on n qubits in O(n²) gates, exponentially faster in the number of basis states than the classical FFT. It is the heart of Shor’s algorithm (Shor 1994) for integer factorization and the hidden-subgroup problem.
11. FFT libraries
- FFTW (Frigo + Johnson 1998, “The Fastest Fourier Transform in the West”, ICASSP) —
fftw3. The reference implementation; auto-tunes plans at runtime via a “planner” that benchmarks candidate algorithms (Cooley–Tukey radices, split-radix, Rader, Bluestein) on the actual machine. GPL or commercial license from MIT. - Intel MKL FFT / oneMKL — high-performance, hand-tuned for Intel CPUs, fully optimized AVX-512. Free-as-in-beer since 2017 (still proprietary but no license fee). FFTW-compatible API wrappers shipped.
- cuFFT (NVIDIA) — GPU FFT for CUDA. Used by PyTorch, JAX, TensorFlow, CuPy, RAPIDS.
- rocFFT (AMD) — equivalent for ROCm/HIP.
- clFFT — OpenCL FFT (legacy; superseded by vendor SDKs on most platforms).
- VkFFT (Tolmachev 2019) — Vulkan / CUDA / HIP / OpenCL FFT; often outperforms cuFFT on small-to-medium sizes.
- Eigen unsupported FFT module — C++ template FFT, light dependency for Eigen users.
- kissfft (Borgerding) — “keep it simple, stupid” minimalist mixed-radix FFT in pure C, ~1k LOC. Used where dependency footprint matters (embedded).
- PocketFFT (Reinecke) — small, header-only C++ used as the default backend in NumPy >=1.17 and SciPy (replacing the older Fortran FFTPACK). Mixed-radix + Bluestein.
- MUFFT (RISC-V vector-extension FFT) — emerging optimized library for the RISC-V V extension.
- NumPy / SciPy —
numpy.fft(PocketFFT backend),scipy.fft(also PocketFFT; replaces olderscipy.fftpackfrom 2020). Both exposefft,ifft,rfft,irfft,fft2,fftn,fftshift,fftfreq. - PyTorch —
torch.fft.fft,torch.fft.rfft, autograd-aware and GPU-accelerated via cuFFT/rocFFT. - JAX —
jax.numpy.fft.fft, XLA-compiled and TPU-aware. - TensorFlow —
tf.signal.fft,tf.signal.stft. - MATLAB — built-in
fft,ifft,fft2, calls FFTW. - finufft (Barnett, Magland + Klinteberg 2019) — non-uniform FFT in C++/Python (§17).
12. Sampling theorem
The Nyquist–Shannon sampling theorem (Nyquist 1928 Trans AIEE; Shannon 1949 Proc IRE; also Kotelnikov 1933, Whittaker 1915 — the result has multiple independent discoverers): a band-limited signal with no frequency content above B Hz can be perfectly reconstructed from samples taken at any rate f_s > 2B.
Aliasing: if f_s < 2B, frequency components above f_s/2 (the Nyquist frequency) are folded back into the [0, f_s/2] range and become indistinguishable from lower frequencies. The damage cannot be undone after sampling. Prevention requires an anti-alias filter (analog low-pass) before the ADC.
Common rates:
- Audio: 44.1 kHz (CD, chosen for compatibility with NTSC video recorders used to master early CDs), 48 kHz (professional / film), 96 / 192 kHz (high-resolution audio). Human hearing tops out near 20 kHz, so 44.1 kHz is more than enough; higher rates exist for headroom during processing.
- Telephony: 8 kHz (μ-law / A-law PCM, narrowband), 16 kHz (wideband G.722), 24/48 kHz (super-wideband, fullband Opus).
- Images: when downscaling an image, you are resampling at a lower rate; you must apply a low-pass filter (e.g., Lanczos, area, Gaussian) first to avoid aliasing visible as moiré patterns and jagged edges. Naive nearest-neighbor downscaling always aliases.
cv::resizewithINTER_AREA, Pillow withLANCZOS.
13. DCT — Discrete Cosine Transform
The DCT is a real-input, real-output orthogonal transform closely related to the DFT but with the boundary handled by even-symmetric extension rather than periodic extension. The most common variant, DCT-II, is
X_k = Σ_{n=0}^{N−1} x_n · cos[(π/N) · (n + 1/2) · k], k = 0, …, N−1.
For real signals with smooth structure, the DCT-II concentrates more energy into the low-frequency coefficients than the DFT does, because it avoids the artificial discontinuity at the wraparound point (the periodic extension assumed by the DFT). This energy compaction is what makes the DCT the workhorse of lossy compression: quantize away the small high-frequency coefficients, encode the few large low-frequency ones.
- DCT-I, DCT-II, DCT-III, DCT-IV: four variants depending on whether the symmetry point is at a sample or half a sample, and at the start vs end of the sequence. DCT-II is the “forward” and DCT-III the “inverse” used in JPEG.
- Foundation of: JPEG (8×8 DCT-II), MP3 / AAC (via MDCT, §14), H.264 / HEVC / AV1 (integer DCT approximations, e.g., the 4×4 integer transform in H.264 designed to avoid drift).
A length-N DCT-II can be computed in O(N log N) via a length-2N or length-4N FFT plus pre/post-processing; FFTW provides direct DCT routines (FFTW_REDFT10, FFTW_REDFT01, etc.) that are roughly half the cost of computing it via a complex FFT.
14. MDCT — Modified Discrete Cosine Transform
The MDCT (Princen, Johnson + Bradley 1987 IEEE ICASSP) is a lapped, critically sampled, perfect-reconstruction transform. From 2N time samples it produces N frequency coefficients; successive blocks overlap by N samples, so the data rate is preserved (one frequency coefficient per time sample on average — “critical sampling”). When the analysis and synthesis windows satisfy the Princen–Bradley condition (w[n]² + w[n + N]² = 1), the original signal is reconstructed exactly despite the aliasing within each block.
This is the magic that makes lossless windowed analysis possible at no rate cost, and is why every modern perceptual audio codec uses MDCT rather than DFT:
- MP3 (1993): hybrid polyphase filter bank + MDCT, block lengths 12 or 36.
- AAC (1997): pure MDCT, block lengths 1024 or 128 (long/short window switching for transients).
- Vorbis (2000): MDCT, variable block sizes.
- AC-3 / E-AC-3 (Dolby Digital): MDCT.
- Opus (2012): MDCT in the CELT layer for music; LPC in the SILK layer for speech.
15. Z-transform and Laplace transform
Both are generalizations of the DFT/DTFT and CTFT to complex frequency, enabling analysis of unstable, non-stationary, and transient signals where the Fourier transform may not converge.
The Z-transform of a discrete sequence x[n] is
X(z) = Σ_{n=−∞}^{∞} x[n] · z^{−n}
for complex z. The DTFT is the Z-transform evaluated on the unit circle z = e^{iω}. The region of convergence (ROC) determines causality and stability. Used pervasively in digital filter design: poles and zeros in the z-plane, stability iff all poles lie inside the unit circle.
The Laplace transform of a continuous signal x(t) is
X(s) = ∫_0^{∞} x(t) · exp(−s · t) dt
for complex s = σ + iω. The CTFT is the Laplace transform on the imaginary axis s = iω. Used for continuous-time control theory, circuit analysis, mechanical and thermal systems. Poles in the left half-plane (Re(s) < 0) ↔ stable; transfer functions H(s) = Y(s)/U(s) capture LTI system dynamics in one compact algebraic object.
Both are computed analytically (closed-form tables) rather than numerically; the DFT/FFT is the numerical workhorse, the Z- and Laplace transforms are the analytic ones.
16. Spectral PDE methods
For PDEs on periodic spatial domains, the FFT enables spectral methods (Boyd 2001, “Chebyshev and Fourier Spectral Methods”; Trefethen 2000 “Spectral Methods in MATLAB”). The solution is represented as a truncated Fourier series; spatial derivatives become multiplications in Fourier space (∂/∂x ↔ ik), with error that decreases exponentially in the number of modes for smooth solutions (vs algebraically for finite differences or finite elements).
Typical workflow per time step (pseudospectral):
- FFT the current state to Fourier space.
- Multiply by ik (or (ik)² for Laplacian) for derivative terms.
- IFFT back to physical space.
- Compute nonlinear products there (avoiding O(N²) convolutions in Fourier space).
- FFT back, advance in time, repeat.
The 2/3-rule dealiasing (Orszag 1971) zeros the upper third of Fourier modes to prevent aliasing from quadratic nonlinearities folding back into the resolved range.
Applications:
- 3D turbulence DNS of incompressible Navier–Stokes on triply periodic boxes — the gold-standard reference for turbulence theory. Codes: NEK5000 (spectral element, ANL), CFR / CFM (Cornell), Channelflow (Gibson), pencil/decomp libraries (P3DFFT, AccFFT, 2DECOMP&FFT, dtFFT) for distributed-memory 3D FFT on tens of thousands of MPI ranks.
- PME molecular dynamics: GROMACS, AMBER, NAMD, OpenMM, LAMMPS (§6).
- Climate / atmosphere: spectral transforms on the sphere (spherical-harmonic equivalents of the FFT) in IFS (ECMWF), GFS, CAM-SE.
- Plasma simulation: gyrokinetic codes (GENE, GS2, GKW) use Fourier modes in the perpendicular plane.
- Optics: split-step Fourier method for the nonlinear Schrödinger equation (fiber-optic propagation, Bose–Einstein condensates).
17. NUFFT — non-uniform FFT
When sample points are not on a uniform grid, the standard FFT does not apply directly. The non-uniform FFT (Dutt + Rokhlin 1993; Greengard + Lee 2004 SIAM Review) computes either:
- Type 1: non-uniform samples in → uniform Fourier coefficients out.
- Type 2: uniform Fourier coefficients in → non-uniform samples out.
- Type 3: non-uniform in both.
It does this in O(N log N + M · |log ε|^d) by interpolating onto a fine uniform grid with a spreading kernel (Gaussian, Kaiser–Bessel, or exponential-of-semicircle), applying a standard FFT, and deconvolving the spreading kernel in Fourier space. Accuracy ε is user-tunable.
Applications:
- MRI reconstruction: radial, spiral, and other non-Cartesian k-space trajectories require NUFFT (gridding reconstruction). BART toolbox, SigPy, MIRT.
- Radio astronomy: interferometric imaging from sparse, irregular UV-plane sampling. CASA (Common Astronomy Software Applications), WSClean.
- Computational geometry, cryo-EM, X-ray crystallography refinement.
Libraries: FINUFFT (Barnett et al. 2019) — C++/Python/Julia/MATLAB bindings, threaded, fast and accurate; cufiNUFFT GPU port; NFFT (Keiner, Kunis + Potts) — German library, the older reference.
18. Convolution theorems
- Circular convolution theorem: (x ⊛ y)n = Σ_k x_k · y{(n−k) mod N} ↔ X · Y elementwise. This is what the FFT actually computes when you naively multiply transforms.
- Linear convolution theorem: (x ∗ y)n = Σ_k x_k · y{n−k} (no modular wraparound, output length N + M − 1) is obtained by zero-padding both inputs to length ≥ N + M − 1 before FFT, then multiplying and IFFT-ing.
Overlap-add and overlap-save are block-wise schemes for FFT-based filtering of long streams with a short filter: process in blocks of L samples, FFT-convolve each block with the zero-padded filter, then either add the overlapping tails (overlap-add) or discard the wrapped-around portion (overlap-save). Both run in O(L log L) per L samples, asymptotically O(log L) per sample.
19. Cepstrum
The cepstrum (Bogert, Healy + Tukey 1963; the name is “spectrum” with the first syllable reversed) is the inverse Fourier transform of the logarithm of the magnitude (or power) spectrum:
C[n] = IFFT( log |FFT(x)|² ).
Logarithm converts a convolution in the time domain (e.g., excitation ∗ vocal-tract impulse response in speech) into a sum in the cepstral domain, making the two components separable. The independent variable n is called “quefrency” (a permutation of “frequency”), measured in samples (or, if time-normalized, in seconds — the quefrency of a periodicity in the spectrum).
Applications:
- Pitch detection: the pitch period of a voiced speech segment appears as a strong peak at the corresponding quefrency in the cepstrum.
- Speaker recognition / speech recognition: MFCC (mel-frequency cepstral coefficients) — a perceptually weighted cepstrum — was the dominant speech-recognition feature from the 1980s until the deep-learning era replaced it with raw filter-bank features or even raw waveform input.
- Echo detection: an echo with delay τ produces a peak at quefrency τ in the cepstrum.
- Mechanical fault diagnosis: gear-tooth, bearing fault signatures.
20. Common pitfalls
-
Normalization convention. Forward-only 1/N, inverse-only 1/N, symmetric 1/√N — three popular conventions, and codes silently disagree. NumPy, SciPy, MATLAB, FFTW all put 1/N on the inverse. Always check
norm=(NumPy/SciPy) or document explicitly. Round-tripirfft(rfft(x))should equal x — test it. -
Aliasing from undersampling. No anti-alias filter before ADC ↔ permanent corruption (§12). Most painful in image downscaling (the default
cv::resizemode used to be bilinear, which aliases;INTER_AREAis the correct choice for shrinking). -
Spectral leakage from rectangular window. Default-windowing (no window) of a non-periodic signal smears each tone across all bins. Always Hann or better unless you know the signal is N-periodic (§7).
-
rfft output indexing.
numpy.fft.rfft(x)returns N/2+1 complex values, not N. Frequencies arenp.fft.rfftfreq(N, d=dt). Mixing this with full-FFT indexing produces off-by-half-spectrum bugs. -
Power vs amplitude spectrum. |X_k|² is power, |X_k| is amplitude. Doubling the input doubles the amplitude spectrum but quadruples the power spectrum. dB conversions are 20·log10 for amplitude, 10·log10 for power. Mixing them gives a 2× factor in dB.
-
FFT convolution for short kernels. FFT-based convolution has fixed overhead (plan creation, twiddle, zero-padding, two FFTs + multiply + IFFT). For kernels shorter than ~30–50 samples (1D) or ~7×7 (2D), direct convolution is faster. SciPy’s
scipy.signal.choose_conv_methodpicks automatically; OpenCV does the same internally infilter2D(but cv::dft-basedcv::filter2Dis only used for large kernels in practice — check the docs). -
Wraparound from circular convolution. Multiplying transforms gives circular convolution. For linear convolution, zero-pad both inputs to length ≥ N + M − 1 before FFT (§18). Forgetting this produces signal energy at the wrong time positions; in image processing it appears as content from one edge bleeding to the opposite edge.
-
DC and Nyquist bins for real input. X_0 (DC) and, for even N, X_{N/2} (Nyquist) are real-valued; their imaginary parts must be zero (and are zero up to floating-point rounding). Some packing schemes (FFTW’s “halfcomplex” format) exploit this to fit the N/2+1 complex outputs into N reals.
-
Frequency-axis labeling. Bin k corresponds to frequency f_k = k · f_s / N. Negative frequencies live in bins N/2+1 … N−1 (or equivalently in the “second half” of an
fftshift-ed array). Forgettingfftshiftwhen plotting gives a spectrum with positive frequencies on the left and negative on the right. -
Numerical precision. Single-precision (float32) FFTs accumulate roughly √(N log N) · ε rounding error; for N = 10⁶ this is ~10⁻⁴, sometimes inadequate. Use double precision for inversions / iterative reconstruction; single precision is fine for visualization.
-
In-place vs out-of-place. FFTW and cuFFT support in-place transforms but with specific stride/padding constraints; misusing them silently corrupts memory beyond the array.
-
Window scaling. Windowing reduces the effective signal energy (the coherent gain of a Hann window is 0.5, of a Hamming is 0.54). Quantitative amplitude / power measurements must divide by the window’s coherent gain (for tones) or its noise-bandwidth (for noise-like signals). Harris 1978 tabulates these for every common window.
21. Cross-references
- _index — the math reference index.
- linear-algebra-essentials — the DFT matrix is unitary; FFT is a structured matrix factorization.
- svd-pca-spectral — eigendecomposition of circulant matrices is diagonalized by the DFT.
- numerical-linear-algebra — FFT-based fast Toeplitz / circulant solvers.
- pde-methods — spectral PDE solvers, pseudospectral schemes.
- signal-processing-dsp — filter design, sampling, decimation/interpolation.
- rf-design — IQ demodulation, spectrum analyzers, OFDM front-ends.
- antenna-theory — phased-array beamforming via spatial FFT.
- control-theory — Laplace and Z-transform foundations.
- transformer-architecture — sinusoidal positional embeddings (Vaswani et al. 2017) and rotary positional embeddings (RoPE, Su et al. 2021) are Fourier-flavored.
- audio-ml — STFT/mel-spectrogram front-ends for Whisper, Wav2Vec2.
22. Citations
- Cooley, J. W. + Tukey, J. W. 1965. “An algorithm for the machine calculation of complex Fourier series.” Math Comp 19(90): 297–301. — The modern FFT paper.
- Heideman, M. T., Johnson, D. H. + Burrus, C. S. 1984. “Gauss and the history of the fast Fourier transform.” IEEE ASSP Magazine 1(4): 14–21. — Gauss derived essentially the same algorithm circa 1805.
- Frigo, M. + Johnson, S. G. 2005. “The design and implementation of FFTW3.” Proc IEEE 93(2): 216–231.
- Oppenheim, A. V. + Schafer, R. W. 2009. Discrete-Time Signal Processing, 3rd ed. Pearson. — The canonical DSP textbook.
- Press, W. H., Teukolsky, S. A., Vetterling, W. T. + Flannery, B. P. 2007. Numerical Recipes: The Art of Scientific Computing, 3rd ed. Cambridge University Press.
- Daubechies, I. 1992. Ten Lectures on Wavelets. SIAM CBMS-NSF Series.
- Harris, F. J. 1978. “On the use of windows for harmonic analysis with the discrete Fourier transform.” Proc IEEE 66(1): 51–83.
- Princen, J. P., Johnson, A. W. + Bradley, A. B. 1987. “Subband / transform coding using filter bank designs based on time domain aliasing cancellation.” IEEE ICASSP. — Origin of the MDCT.
- Mallat, S. G. 1989. “A theory for multiresolution signal decomposition: the wavelet representation.” IEEE PAMI 11(7): 674–693.
- Daubechies, I. 1988. “Orthonormal bases of compactly supported wavelets.” Comm Pure Appl Math 41(7): 909–996.
- Nyquist, H. 1928. “Certain topics in telegraph transmission theory.” Trans AIEE 47: 617–644.
- Shannon, C. E. 1949. “Communication in the presence of noise.” Proc IRE 37(1): 10–21.
- Bluestein, L. I. 1968. “A linear filtering approach to the computation of the discrete Fourier transform.” Northeast Electronics Research and Engineering Meeting Record 10: 218–219.
- Rader, C. M. 1968. “Discrete Fourier transforms when the number of data samples is prime.” Proc IEEE 56(6): 1107–1108.
- Duhamel, P. + Hollmann, H. 1984. “Split-radix FFT algorithm.” Electronics Letters 20(1): 14–16.
- Winograd, S. 1978. “On computing the discrete Fourier transform.” Math Comp 32(141): 175–199.
- Schönhage, A. + Strassen, V. 1971. “Schnelle Multiplikation großer Zahlen.” Computing 7: 281–292.
- Darden, T., York, D. + Pedersen, L. 1993. “Particle mesh Ewald: an N·log(N) method for Ewald sums in large systems.” J Chem Phys 98(12): 10089–10092.
- Orszag, S. A. 1971. “On the elimination of aliasing in finite-difference schemes by filtering high-wavenumber components.” J Atmos Sci 28(6): 1074.
- Dutt, A. + Rokhlin, V. 1993. “Fast Fourier transforms for non-equispaced data.” SIAM J Sci Comput 14(6): 1368–1393.
- Greengard, L. + Lee, J.-Y. 2004. “Accelerating the non-uniform fast Fourier transform.” SIAM Review 46(3): 443–454.
- Barnett, A. H., Magland, J. + af Klinteberg, L. 2019. “A parallel non-uniform fast Fourier transform library based on an exponential of semicircle kernel.” SIAM J Sci Comput 41(5): C479–C504.
- Shor, P. W. 1994. “Algorithms for quantum computation: discrete logarithms and factoring.” Proc 35th FOCS.
- Bogert, B. P., Healy, M. J. R. + Tukey, J. W. 1963. “The quefrency analysis of time series for echoes: cepstrum, pseudo-autocovariance, cross-cepstrum and saphe cracking.” Proc Symp Time Series Analysis, M. Rosenblatt ed., Wiley, ch. 15.
- Boyd, J. P. 2001. Chebyshev and Fourier Spectral Methods, 2nd ed. Dover.
- Trefethen, L. N. 2000. Spectral Methods in MATLAB. SIAM.
- Dongarra, J. + Sullivan, F. 2000. “Top 10 algorithms of the 20th century.” IEEE Computing in Science & Engineering 2(1): 22–23.