Functional Analysis — Banach + Hilbert Spaces, Operators, Spectral Theory, PDEs
Functional analysis is the study of infinite-dimensional vector spaces equipped with topology — normed spaces, Banach spaces, Hilbert spaces — and the continuous linear maps between them. It is the language of partial differential equations, quantum mechanics, signal processing, kernel methods, and modern probability. This note maps the territory: spaces, operators, spectral theory, distributions, Sobolev spaces, RKHS, and connections to applied domains. SI units where physical quantities appear.
1. Topological vector spaces and norms
1.1 Vector spaces
Take a real or complex vector space X. We need a topology compatible with addition and scalar multiplication. The cleanest case: a norm.
A norm ||·||: X → [0,∞) satisfies:
- ||x|| ≥ 0, and ||x|| = 0 ⇔ x = 0 (positive definite).
- ||αx|| = |α| · ||x|| (homogeneous).
- ||x + y|| ≤ ||x|| + ||y|| (triangle).
A norm induces a metric d(x,y) = ||x - y|| and hence a topology.
1.2 Banach and Hilbert spaces
- Banach space: a normed vector space that is complete under its norm, i.e. every Cauchy sequence converges in the space. Stefan Banach’s 1932 Théorie des opérations linéaires is the foundational text.
- Inner product space: ⟨·,·⟩: X × X → C linear in the first argument, conjugate-symmetric, positive-definite. Induces the norm ||x|| = sqrt(⟨x,x⟩).
- Hilbert space: complete inner-product space. Named after David Hilbert; the modern abstract definition is due to John von Neumann (1929, “Allgemeine Eigenwerttheorie Hermitescher Funktionaloperatoren”, Math. Ann. 102).
Every Hilbert space is Banach. Not every Banach space is Hilbert: the norm must satisfy the parallelogram law
||x + y||² + ||x - y||² = 2(||x||² + ||y||²)
for it to come from an inner product (Jordan-von Neumann 1935).
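A quick numerical check of the parallelogram law, as a minimal sketch. The helper names `p_norm` and `parallelogram_defect` are illustrative, not from any library. On the pair x = (1, 0), y = (0, 1) the defect vanishes for p = 2 and is nonzero for p = 1 and p = ∞, so those two norms cannot come from an inner product.

```python
import math

def p_norm(v, p):
    """l^p norm of a finite sequence; p = math.inf gives the sup norm."""
    if math.isinf(p):
        return max(abs(t) for t in v)
    return sum(abs(t) ** p for t in v) ** (1.0 / p)

def parallelogram_defect(x, y, p):
    """||x+y||^2 + ||x-y||^2 - 2(||x||^2 + ||y||^2) for the l^p norm."""
    s = [a + b for a, b in zip(x, y)]
    d = [a - b for a, b in zip(x, y)]
    lhs = p_norm(s, p) ** 2 + p_norm(d, p) ** 2
    rhs = 2 * (p_norm(x, p) ** 2 + p_norm(y, p) ** 2)
    return lhs - rhs

x, y = [1.0, 0.0], [0.0, 1.0]
defect_l2 = parallelogram_defect(x, y, 2)          # zero up to rounding
defect_l1 = parallelogram_defect(x, y, 1)          # 8 - 4 = 4
defect_sup = parallelogram_defect(x, y, math.inf)  # 2 - 4 = -2
```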
2. Canonical Banach spaces
2.1 L^p spaces
For a measure space (Ω, Σ, μ) and 1 ≤ p < ∞:
L^p(Ω) = { f measurable : ||f||_p = (∫ |f|^p dμ)^{1/p} < ∞ } / a.e. equality
For p = ∞: ||f||_∞ = ess sup |f|. All L^p are Banach; only L^2 is Hilbert (with ⟨f,g⟩ = ∫ f ḡ dμ).
- L^1: integrable functions; on a finite measure space it contains every L^p with p ≥ 1.
- L^2: square-integrable; the centrepiece of analysis (Fourier theory, quantum mechanics, signal processing).
- L^∞: essentially bounded.
Hölder’s inequality: ||fg||_1 ≤ ||f||_p · ||g||_q with 1/p + 1/q = 1. Minkowski’s inequality: triangle inequality for ||·||_p.
2.2 Sequence spaces
ℓ^p = { (x_n) : Σ |x_n|^p < ∞ } (Banach; Hilbert only for p=2)
c_0 = { (x_n) : x_n → 0 } (sup norm; Banach, not reflexive)
c = convergent sequences (Banach)
2.3 Continuous function spaces
C[a,b] with sup norm ||f||_∞ = max |f|: Banach but not Hilbert.
2.4 Hölder spaces
C^{0,α}(Ω) for 0 < α ≤ 1:
||f||_{C^{0,α}} = ||f||_∞ + sup_{x≠y} |f(x) - f(y)| / |x - y|^α
α = 1 is Lipschitz. Used in elliptic regularity theory (Schauder estimates).
2.5 BV (bounded variation)
Functions with finite total variation. Banach; central to image processing (Rudin-Osher-Fatemi 1992 ROF total-variation denoising) and conservation-law theory.
2.6 Sobolev spaces
Treated in detail in §6. Briefly: W^{k,p}(Ω) = functions whose weak derivatives up to order k are in L^p(Ω). H^k = W^{k,2}, Hilbert.
3. Hilbert space geometry
3.1 Orthogonality and projection
x ⊥ y if ⟨x,y⟩ = 0. For closed subspace M ⊆ H, every x ∈ H decomposes uniquely as x = m + m^⊥ with m ∈ M, m^⊥ ∈ M^⊥. The map x ↦ m is the orthogonal projection P_M, self-adjoint and idempotent.
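A finite-dimensional sketch of the projection theorem, assuming H = R^6 and a subspace M spanned by two random vectors (both choices are illustrative). The projection is built from an orthonormal basis of M obtained by QR.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 2))      # columns span a 2-dim subspace M of R^6
Q, _ = np.linalg.qr(A)               # orthonormal basis of M (reduced QR)
P = Q @ Q.T                          # orthogonal projection P_M

x = rng.standard_normal(6)
m = P @ x                            # component in M
m_perp = x - m                       # component in the orthogonal complement

idempotent = np.allclose(P @ P, P)
self_adjoint = np.allclose(P, P.T)
```

The decomposition also gives Pythagoras: ||x||² = ||m||² + ||m^⊥||².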
3.2 Orthonormal bases
A set {e_α} is orthonormal if ⟨e_α, e_β⟩ = δ_{αβ}. Complete if span{e_α} is dense. Every separable Hilbert space has a countable orthonormal basis; every separable infinite-dim Hilbert space is isometrically isomorphic to ℓ^2.
Classical orthonormal systems on intervals (after normalisation):
- Fourier: e_n(x) = e^{inx}/sqrt(2π) on [-π, π]. See fft-spectral.
- Hermite: H_n(x) e^{-x²/2} on R (weight e^{-x²} natural).
- Legendre: P_n on [-1,1].
- Chebyshev: T_n(cos θ) = cos(nθ).
- Laguerre: L_n(x) e^{-x/2} on [0,∞).
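As an illustration of the normalisation step, the sketch below checks that sqrt((2n+1)/2) · P_n is orthonormal in L^2[-1,1]. It uses Gauss-Legendre quadrature from `numpy.polynomial.legendre`; the function name `normalised_legendre` and the cut-off N = 6 are assumptions made for the example.

```python
import numpy as np
from numpy.polynomial import legendre

nodes, weights = legendre.leggauss(20)   # exact for polynomials of degree <= 39

def normalised_legendre(n, x):
    """sqrt((2n+1)/2) * P_n(x), which has unit norm in L^2[-1, 1]."""
    coeffs = np.zeros(n + 1)
    coeffs[n] = 1.0                      # select P_n in the Legendre basis
    return np.sqrt((2 * n + 1) / 2.0) * legendre.legval(x, coeffs)

N = 6
basis = np.array([normalised_legendre(n, nodes) for n in range(N)])
gram = (basis * weights) @ basis.T       # gram[i, j] approximates <e_i, e_j>
```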
3.3 Riesz representation theorem
For Hilbert H, every bounded linear functional f: H → C has the form f(x) = ⟨x, y_f⟩ for a unique y_f ∈ H, with ||f|| = ||y_f||. The map y ↦ ⟨·,y⟩ is a conjugate-linear isometry H → H^*.
3.4 Weak convergence
x_n ⇀ x if ⟨x_n, y⟩ → ⟨x, y⟩ for all y ∈ H. Every bounded sequence has a weakly convergent subsequence (consequence of Banach-Alaoglu, §5). Crucial for PDE existence proofs.
4. Bounded linear operators
4.1 Definition and norm
T: X → Y linear is bounded if
||T|| = sup{ ||Tx||_Y : ||x||_X ≤ 1 } < ∞
For linear maps, bounded ⇔ continuous.
B(X,Y) is the space of bounded operators with this norm; it is Banach whenever Y is. B(X) = B(X,X). For Hilbert H, B(H) is a C*-algebra.
4.2 Adjoint
For T ∈ B(H), the adjoint T* is the unique operator with ⟨Tx, y⟩ = ⟨x, T*y⟩ for all x,y. Properties: (αT + βS)* = ᾱT* + β̄S*, (TS)* = S*T*, T** = T, ||T*|| = ||T||, ||T*T|| = ||T||² (the C*-identity).
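In finite dimensions the adjoint is the conjugate transpose, so the listed properties can be checked directly. A minimal sketch on C^5 with random matrices; the helpers `inner`, `adjoint` and `op_norm` are illustrative names, and `inner` follows the convention above (linear in the first argument).

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
T = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
S = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
y = rng.standard_normal(n) + 1j * rng.standard_normal(n)

def inner(u, v):
    """<u, v>, linear in u and conjugate-linear in v."""
    return np.sum(u * np.conj(v))

def adjoint(M):
    """Conjugate transpose, the matrix of the Hilbert-space adjoint."""
    return M.conj().T

def op_norm(M):
    """Operator norm on C^n, i.e. the largest singular value."""
    return np.linalg.norm(M, 2)

lhs = inner(T @ x, y)                 # <Tx, y>
rhs = inner(x, adjoint(T) @ y)        # <x, T*y>
c_star_gap = op_norm(adjoint(T) @ T) - op_norm(T) ** 2
```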
4.3 Special classes
- Self-adjoint (Hermitian): T = T*. Real spectrum.
- Normal: TT* = T*T. Spectral theorem applies.
- Unitary: T*T = TT* = I. Preserves inner product.
- Positive: ⟨Tx, x⟩ ≥ 0. Has a unique positive square root.
- Projection (orthogonal): T = T* = T².
- Isometry: ||Tx|| = ||x||. Unitary if also surjective.
4.4 Compact operators
K ∈ B(X,Y) is compact if it maps bounded sets to relatively compact (precompact) sets. Equivalently: every bounded sequence has an image with a convergent subsequence.
Properties:
- Compact operators form a closed two-sided ideal K(X) in B(X).
- Limit (in operator norm) of finite-rank operators ⇒ compact. (The converse holds in Hilbert space, but not in every Banach space; counter-example: Per Enflo 1973.)
- For Hilbert H and self-adjoint compact K: countable real eigenvalues λ_n → 0 and an orthonormal eigenbasis (Hilbert-Schmidt theorem).
Sub-classes:
- Hilbert-Schmidt T: ||T||_{HS}² = Σ ||T e_n||² < ∞ (independent of basis). Forms a Hilbert space.
- Trace class T: ||T||_1 = tr(|T|) < ∞. Strictly stronger; ||T||_{HS}² = tr(T*T). Dual of compact operators is trace class.
Hilbert-Schmidt and trace-class operators show up everywhere in kernel methods and quantum statistical mechanics. See kernel-methods.
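For matrices these norms reduce to familiar quantities: the Hilbert-Schmidt norm is the Frobenius norm and the trace norm is the sum of singular values. A small sketch on a random 4×4 matrix (an illustrative choice) that also checks basis independence and the ordering ||T|| ≤ ||T||_{HS} ≤ ||T||_1:

```python
import numpy as np

rng = np.random.default_rng(2)
T = rng.standard_normal((4, 4))
s = np.linalg.svd(T, compute_uv=False)   # singular values = eigenvalues of |T|

# Sum of ||T e_n||^2 over the standard basis ...
hs_from_basis = np.sqrt(sum(np.linalg.norm(T @ e) ** 2 for e in np.eye(4)))
# ... and over a rotated orthonormal basis (columns of a random orthogonal Q).
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))
hs_rotated = np.sqrt(sum(np.linalg.norm(T @ q) ** 2 for q in Q.T))

hs_from_trace = np.sqrt(np.trace(T.T @ T))   # ||T||_HS^2 = tr(T*T)
hs_from_svd = np.sqrt(np.sum(s ** 2))
trace_norm = np.sum(s)                       # ||T||_1 = tr|T|
op_norm = s[0]                               # operator norm
```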
5. The four cornerstones, plus Banach-Alaoglu
5.1 Hahn-Banach theorem
Hans Hahn 1927, Stefan Banach 1929: any bounded linear functional defined on a subspace of a normed space extends to the whole space with the same norm. Consequence: the dual X* “sees” enough, in that X* separates the points of X.
Geometric / separation forms: any two disjoint convex sets (one open) can be separated by a hyperplane. Foundation of duality, optimisation, weak topology.
5.2 Open Mapping Theorem
A bounded surjective linear operator between Banach spaces is open. Corollary (Banach Isomorphism Theorem): bijective bounded linear operator between Banach spaces has bounded inverse.
5.3 Closed Graph Theorem
For Banach spaces X and Y, a linear operator defined on all of X with closed graph is bounded. Hugely useful: lets you prove boundedness by checking graph closure (easier than checking continuity directly).
5.4 Uniform Boundedness Principle (Banach-Steinhaus)
A pointwise-bounded family of bounded operators on a Banach space is uniformly norm-bounded. Hugo Steinhaus 1927. Underlies many “if it works on a dense set, it works everywhere” arguments.
5.5 Banach-Alaoglu
The closed unit ball of X* is compact in the weak* topology. Foundational for variational PDE theory: extract weakly convergent subsequences from energy-bounded sequences.
6. Sobolev spaces and embeddings
6.1 Weak derivatives
For f ∈ L^1_{loc}(Ω), g is the weak α-th partial derivative if
∫ f · ∂^α φ dx = (-1)^|α| ∫ g · φ dx for all φ ∈ C_c^∞(Ω).
6.2 Definition
W^{k,p}(Ω) = { f ∈ L^p(Ω) : ∂^α f ∈ L^p(Ω) for all |α| ≤ k }
||f||_{W^{k,p}} = (Σ_{|α| ≤ k} ||∂^α f||_p^p)^{1/p}
H^k(Ω) := W^{k,2}(Ω) is Hilbert.
W_0^{k,p}(Ω) = closure of C_c^∞(Ω) in the W^{k,p} norm: functions vanishing on ∂Ω in a weak sense.
6.3 Sobolev embedding theorem (Sergei Sobolev 1938)
For Ω ⊂ R^n Lipschitz, with kp < n:
W^{k,p}(Ω) ↪ L^{q}(Ω) for 1/q = 1/p - k/n
For kp = n: embedding into all L^q, q < ∞. For kp > n: embedding into C^{m, α} with m + α = k - n/p (m an integer and 0 < α < 1, so k - n/p is not an integer).
Rellich-Kondrachov compactness: for bounded Ω and kp < n, the embedding into L^r(Ω) is compact for every r strictly below the critical exponent q (bounded sets are precompact in the target space). Essential for nonlinear PDE.
6.4 Trace theorem
For f ∈ H^1(Ω), the boundary trace f|_{∂Ω} exists in H^{1/2}(∂Ω). Defines boundary conditions weakly.
6.5 Poincaré inequality
For f ∈ W_0^{1,p}(Ω), Ω bounded:
||f||_p ≤ C(Ω,p) · ||∇f||_p
Underlies coercivity of variational problems.
7. Lax-Milgram and weak solutions
7.1 Lax-Milgram theorem (Peter Lax & Arthur Milgram 1954)
Let H be Hilbert, a: H × H → R bilinear with:
- Continuity: |a(u,v)| ≤ M ||u|| ||v||.
- Coercivity: a(u,u) ≥ α ||u||² for some α > 0.
Then for every f ∈ H* there is a unique u ∈ H with a(u,v) = f(v) for all v ∈ H, and ||u|| ≤ ||f||/α.
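A finite-dimensional sketch of the statement, assuming H = R^8 and a(u, v) = vᵀAu with A non-symmetric. The construction A = S + N (symmetric positive definite plus skew-symmetric) is an illustrative choice that makes the coercivity constant easy to read off; Lax-Milgram does not require symmetry.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 8
B = rng.standard_normal((n, n))
S = B @ B.T + np.eye(n)          # symmetric positive definite part
C = rng.standard_normal((n, n))
N = C - C.T                      # skew-symmetric part: u^T N u = 0
A = S + N                        # a(u, v) = v^T A u, not symmetric

alpha = np.linalg.eigvalsh(S).min()   # a(u, u) = u^T S u >= alpha |u|^2
M = np.linalg.norm(A, 2)              # |a(u, v)| <= M |u| |v|

b = rng.standard_normal(n)            # the functional f(v) = v^T b, with ||f|| = |b|
u = np.linalg.solve(A, b)             # the unique u with a(u, v) = f(v) for all v

bound = np.linalg.norm(b) / alpha     # a priori bound ||u|| <= ||f|| / alpha
```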
7.2 Application
For the Dirichlet problem -Δu = f on Ω with u = 0 on ∂Ω:
- H = H_0^1(Ω).
- a(u,v) = ∫_Ω ∇u · ∇v dx.
- Coercivity from Poincaré.
Lax-Milgram gives existence and uniqueness of the weak solution. This is the entire backbone of variational PDE — FEM (see ode-pde-solver-catalog) is the discretisation of exactly this weak form.
7.3 Galerkin approximation
Replace H by finite-dim subspace H_n and solve a(u_n, v) = f(v) for v ∈ H_n. Convergence follows from Céa’s lemma: ||u - u_n|| ≤ (M/α) inf_{w ∈ H_n} ||u - w||.
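A minimal Galerkin sketch for -u'' = f on (0,1) with u(0) = u(1) = 0, taking H_n to be piecewise-linear hat functions on a uniform mesh. The test problem f = π² sin(πx), the one-point quadrature for the load vector, and the function name `galerkin_p1` are all assumptions for the example. With this setup the nodal error should fall by roughly a factor of four when the mesh is halved.

```python
import numpy as np

def galerkin_p1(f, n):
    """Solve -u'' = f on (0, 1), u(0) = u(1) = 0, with n piecewise-linear elements."""
    h = 1.0 / n
    x = np.linspace(0.0, 1.0, n + 1)[1:-1]          # interior nodes
    # Stiffness matrix a(phi_j, phi_i) = integral of phi_j' phi_i' for hat functions.
    K = (2.0 * np.eye(n - 1) - np.eye(n - 1, k=1) - np.eye(n - 1, k=-1)) / h
    # Load vector f(phi_i), approximated by the one-point rule h * f(x_i).
    F = h * f(x)
    return x, np.linalg.solve(K, F)

def f(x):
    return np.pi ** 2 * np.sin(np.pi * x)           # exact solution: sin(pi x)

errors = []
for n in (32, 64):
    x, u_n = galerkin_p1(f, n)
    errors.append(np.max(np.abs(u_n - np.sin(np.pi * x))))
```

A dense solve keeps the sketch short; a production code would exploit the tridiagonal structure.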
8. Distributions
8.1 Test functions and distributions
Laurent Schwartz (Fields Medal 1950 for distribution theory) developed the modern framework.
D(Ω) = C_c^∞(Ω) with the inductive limit topology (sequence convergent if supports in fixed compact set + uniform convergence of all derivatives).
Distribution: continuous linear functional on D(Ω). Space D'(Ω).
Every L^1_{loc} function defines a distribution by ⟨f, φ⟩ = ∫ f φ. Examples beyond functions:
- Dirac delta δ_a: ⟨δ_a, φ⟩ = φ(a).
- Principal value pv(1/x): ⟨pv(1/x), φ⟩ = lim_{ε→0} ∫_{|x|>ε} φ(x)/x dx.
- Derivatives of Dirac δ^{(k)}: ⟨δ^{(k)}, φ⟩ = (-1)^k φ^{(k)}(0).
8.2 Distributional derivative
Every distribution is infinitely differentiable: ⟨∂T, φ⟩ = -⟨T, ∂φ⟩. Heaviside step H has H' = δ.
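The identity H' = δ can be checked numerically against a concrete test function. The sketch below uses the standard bump φ(x) = exp(-1/(1 - x²)) supported in [-1, 1] and a midpoint rule; the function names and the number of quadrature points are illustrative. It evaluates -⟨H, φ'⟩ = -∫_0^1 φ'(x) dx and compares it with φ(0).

```python
import math

def bump(x):
    """Smooth test function supported in [-1, 1]."""
    return math.exp(-1.0 / (1.0 - x * x)) if abs(x) < 1.0 else 0.0

def bump_prime(x):
    """Derivative of bump, by the chain rule."""
    if abs(x) >= 1.0:
        return 0.0
    return bump(x) * (-2.0 * x) / (1.0 - x * x) ** 2

def pair_with_H_prime(n=20000):
    """<H', phi> = -<H, phi'> = -integral_0^1 phi'(x) dx, via the midpoint rule."""
    h = 1.0 / n
    return -h * sum(bump_prime((k + 0.5) * h) for k in range(n))

value = pair_with_H_prime()
delta_value = bump(0.0)        # <delta, phi> = phi(0) = exp(-1)
```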
8.3 Tempered distributions
Test space S(R^n) = Schwartz functions (rapidly decreasing smooth). Dual S' = tempered distributions. The Fourier transform extends from S to S':
⟨F(T), φ⟩ = ⟨T, F(φ)⟩
Useful Fourier transforms in S':
- F(δ) = 1.
- F(1) = (2π)^n δ.
- F(x^α) = i^|α| (2π)^n ∂^α δ.
- F(e^{ia·x}) ∝ δ(ξ - a).
9. Spectral theory
9.1 Spectrum
For T ∈ B(X), the spectrum is
σ(T) = { λ ∈ C : T - λI is not invertible in B(X) }
Decomposes:
- Point spectrum σ_p(T): eigenvalues.
- Continuous spectrum σ_c(T): T - λI injective with dense range but not surjective.
- Residual spectrum σ_r(T): T - λI injective but range not dense.
The spectrum is non-empty, compact, contained in {|λ| ≤ ||T||}. Spectral radius: r(T) = lim ||T^n||^{1/n} ≤ ||T||.
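The gap between r(T) and ||T|| is easiest to see on a non-normal matrix. In the sketch below the 2×2 matrix is an illustrative choice with eigenvalues 0.5 and 0.4 but operator norm above 10; the helper `gelfand` evaluates ||T^n||^{1/n} for a few n.

```python
import numpy as np

T = np.array([[0.5, 10.0],
              [0.0, 0.4]])            # non-normal: large norm, small spectrum

op_norm = np.linalg.norm(T, 2)
r_exact = max(abs(np.linalg.eigvals(T)))   # spectral radius from the eigenvalues

def gelfand(T, n):
    """||T^n||^(1/n), which tends to the spectral radius as n grows."""
    return np.linalg.norm(np.linalg.matrix_power(T, n), 2) ** (1.0 / n)

estimates = [gelfand(T, n) for n in (1, 10, 100, 400)]
```

Each estimate stays at or above r(T), since ||T^n|| ≥ r(T)^n, and the last one is within about one percent of it.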
9.2 Compact self-adjoint operators
Hilbert-Schmidt theorem: for K = K* compact on Hilbert H, there is an orthonormal basis {e_n} of eigenvectors with real eigenvalues λ_n → 0, and K = Σ λ_n ⟨·, e_n⟩ e_n.
This is the abstract version of “every Hermitian matrix diagonalises by an orthonormal basis”, lifted to infinite dimensions but only for compact operators.
9.3 Spectral theorem for bounded self-adjoint operators
Multiplication operator form: for T = T* bounded on Hilbert H, there is a measure space (Ω, Σ, μ), a measurable function f: Ω → R, and a unitary U: H → L^2(Ω, μ) with
U T U^* = M_f (multiplication by f).
Projection-valued measure form: there exists a unique spectral measure E: B(σ(T)) → projections with
T = ∫_{σ(T)} λ dE(λ)
These let you define g(T) = ∫ g(λ) dE(λ) for bounded Borel g — the functional calculus.
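For a symmetric matrix the spectral integral is a finite sum over eigenprojections, which makes the functional calculus concrete. A minimal sketch with a random positive semi-definite 5×5 matrix (an illustrative choice); `functional_calculus` is a hypothetical helper name.

```python
import numpy as np

rng = np.random.default_rng(4)
B = rng.standard_normal((5, 5))
T = B @ B.T                                  # symmetric, spectrum in [0, inf)

lam, U = np.linalg.eigh(T)                   # T = U diag(lam) U^T

def functional_calculus(g):
    """g(T) = sum_i g(lam_i) <., e_i> e_i, the finite-dim spectral integral."""
    return (U * g(lam)) @ U.T

# Clip guards against eigenvalues that are negative only through rounding.
sqrt_T = functional_calculus(lambda t: np.sqrt(np.clip(t, 0.0, None)))
exp_T = functional_calculus(np.exp)
identity_T = functional_calculus(lambda t: t)

spectrum_of_exp = np.linalg.eigvalsh(exp_T)  # should equal exp(spectrum of T)
```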
9.4 Unbounded self-adjoint operators
Most physical operators (position, momentum, Hamiltonians) are unbounded. Defined on a dense domain D(T) ⊂ H. Symmetric: ⟨Tx, y⟩ = ⟨x, Ty⟩ for x,y ∈ D(T). Self-adjoint: also D(T*) = D(T). Spectral theorem extends.
9.5 Stone’s theorem
Marshall Stone 1932: one-parameter strongly continuous unitary groups {U_t} on H are in bijection with self-adjoint operators via U_t = e^{itA}. The infinitesimal generator A = (1/i) dU_t/dt |_{t=0} is self-adjoint.
This is the abstract justification for unitary time evolution in quantum mechanics: iħ ∂ψ/∂t = Hψ ⇔ ψ(t) = e^{-iHt/ħ} ψ(0).
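A finite-dimensional sketch of this correspondence, assuming a random 4×4 Hermitian matrix as Hamiltonian and units with ħ = 1 (both assumptions for the example). The group is built from the eigendecomposition, and the generator is recovered by a finite difference at t = 0.

```python
import numpy as np

rng = np.random.default_rng(5)
B = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
H = (B + B.conj().T) / 2                     # Hermitian "Hamiltonian", hbar = 1
E, V = np.linalg.eigh(H)

def U(t):
    """U_t = exp(-i H t), built from the spectral decomposition of H."""
    return (V * np.exp(-1j * E * t)) @ V.conj().T

psi0 = np.array([1.0, 0.0, 0.0, 0.0], dtype=complex)
t, s, dt = 0.7, 1.3, 1e-6

psi_t = U(t) @ psi0                          # evolved state, norm preserved
generator = (U(dt) - np.eye(4)) / (-1j * dt) # finite-difference estimate of H
```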
9.6 Hille-Yosida theorem
Einar Hille 1948 / Kosaku Yosida 1948: characterises generators of strongly continuous (C_0) semigroups T_t = e^{tA}. The Laplacian generates the heat semigroup; the Stokes operator generates the semigroup for the linear part of Navier-Stokes, which underlies local-in-time existence results. Foundation of evolution equations.
9.7 Spectral mapping theorem
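For T self-adjoint (or normal) and g continuous on σ(T), σ(g(T)) = g(σ(T)). For a general bounded T the same identity holds when g is holomorphic near σ(T).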
10. Fredholm theory
10.1 Fredholm operators
T ∈ B(X,Y) is Fredholm if ker T and coker T are finite-dimensional and range is closed. Index:
ind(T) = dim ker T - dim coker T
The index is invariant under compact perturbations: ind(T + K) = ind(T) for compact K.
10.2 Fredholm alternative
For compact K on Hilbert H and the equation (I - K) x = y:
- Either (I - K) is invertible (unique solution for every y), or
- (I - K) has nontrivial kernel, and solutions exist iff y ⊥ ker(I - K*).
This is the operator-theoretic generalisation of finite-dim “exists & unique vs orthogonality condition”.
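The second branch of the alternative can be illustrated with a singular matrix. In the sketch below, A = I - K is constructed (as an assumption for the example) to have a one-dimensional kernel, and the solvability condition y ⊥ ker(I - K*) is tested through the least-squares residual. The names `G`, `z` and `residual` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 5
u = rng.standard_normal(n)
u /= np.linalg.norm(u)
G = rng.standard_normal((n, n)) + n * np.eye(n)   # invertible for this seed (assumed)
A = G @ (np.eye(n) - np.outer(u, u))              # A = I - K with ker A = span{u}
K = np.eye(n) - A

z = np.linalg.solve(G.T, u)                       # spans ker(A^T) = ker(I - K^*)

def residual(y):
    """Smallest achievable |(I - K) x - y| over all x."""
    x, *_ = np.linalg.lstsq(A, y, rcond=None)
    return np.linalg.norm(A @ x - y)

y = rng.standard_normal(n)
y_good = y - (y @ z) / (z @ z) * z                # orthogonal to ker(I - K^*)
y_bad = y_good + z                                # has a component along ker(I - K^*)
```

`residual(y_good)` is zero up to rounding, so the equation is solvable; `residual(y_bad)` equals |z| and no solution exists.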
10.3 Atiyah-Singer
Far-reaching: the index of an elliptic operator on a compact manifold is computed topologically (Atiyah-Singer 1963; Fields-class theorem). Outside the scope here.
11. Semigroups and evolution equations
11.1 C_0 semigroups
A family {T_t : t ≥ 0} on Banach X is a strongly continuous (C_0) semigroup if:
- T_0 = I.
- T_{t+s} = T_t T_s.
- T_t x → x as t → 0^+ for every x ∈ X.
Generator: Ax = lim_{t→0^+} (T_t x - x)/t, with domain D(A) the set of x for which the limit exists.
11.2 Examples
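- Heat semigroup on L^2(R^n): T_t f(x) = ∫ f(y) (4πt)^{-n/2} e^{-|x-y|²/(4t)} dy. Generator: Δ (this kernel solves u_t = Δu; the probabilistic normalisation (1/2)Δ corresponds to the kernel (2πt)^{-n/2} e^{-|x-y|²/(2t)}). Smooths instantly.
- Schrödinger group: e^{itΔ}, unitary, not smoothing.
- Wave: a related cosine family cos(t sqrt(-Δ)), whose generator in the cosine-family sense is Δ (so that u_tt = Δu).
- OU semigroup: Ornstein-Uhlenbeck, generator (1/2)Δ - x·∇; invariant Gaussian. See stochastic-calculus.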
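A sketch of the heat semigroup on a periodic interval instead of R^n, which is an assumption made so that the FFT applies. The Fourier multiplier e^{-t k²} used here is the one matching the kernel (4πt)^{-n/2} e^{-|x-y|²/(4t)}, i.e. the equation u_t = Δu. The grid size and test functions are illustrative.

```python
import numpy as np

n = 256
x = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
k = np.fft.fftfreq(n, d=1.0 / n)                 # integer wavenumbers on the circle

def heat(t, f):
    """T_t f on the circle: multiply the k-th Fourier coefficient by exp(-t k^2)."""
    return np.real(np.fft.ifft(np.exp(-t * k ** 2) * np.fft.fft(f)))

f = np.sign(np.sin(x))                           # discontinuous square wave
t, s = 0.05, 0.02
smoothed = heat(t, f)

g = np.sin(3.0 * x)                              # eigenfunction: Laplacian(g) = -9 g
dt = 1e-6
generator_g = (heat(dt, g) - g) / dt             # finite-difference estimate of A g
```

The same function exhibits the semigroup law T_t T_s = T_{t+s}, the contraction property in the discrete L^2 norm, and the generator acting as the Laplacian on sin(3x).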
11.3 Contraction semigroups
||T_t|| ≤ 1. Hille-Yosida: A generates a contraction C_0 semigroup iff A is densely defined, closed, and (0,∞) ⊂ ρ(A) with ||(λ - A)^{-1}|| ≤ 1/λ.
12. Reproducing Kernel Hilbert Spaces
12.1 Definition
Nachman Aronszajn “Theory of reproducing kernels” (Trans. Amer. Math. Soc. 68, 1950): a Hilbert space H of functions on X is RKHS if point evaluation f ↦ f(x) is bounded for each x. By Riesz, there exists K_x ∈ H with f(x) = ⟨f, K_x⟩. The reproducing kernel is K(x,y) = ⟨K_x, K_y⟩.
12.2 Mercer’s theorem
James Mercer 1909: for K continuous, symmetric, positive-definite on compact X,
K(x,y) = Σ_n λ_n φ_n(x) φ_n(y)
with λ_n ≥ 0, {φ_n} orthonormal in L^2, convergence absolute and uniform.
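A discretised illustration of the Mercer expansion, as a sketch under stated assumptions: a Gaussian kernel with bandwidth 0.2 on [0,1], a midpoint grid of 200 points, and the eigendecomposition of the resulting quadrature matrix standing in for the integral operator. The sup-norm error of the truncated sum drops quickly with the number of terms.

```python
import numpy as np

m = 200
x = (np.arange(m) + 0.5) / m                      # midpoint grid on [0, 1]
w = 1.0 / m                                       # quadrature weight
sigma = 0.2                                       # kernel bandwidth (assumed)
K = np.exp(-(x[:, None] - x[None, :]) ** 2 / (2.0 * sigma ** 2))

# Discretised integral operator (T f)(x) = integral of K(x, y) f(y) dy.
lam, V = np.linalg.eigh(w * K)
lam, V = lam[::-1], V[:, ::-1]                    # eigenvalues in decreasing order
phi = V / np.sqrt(w)                              # columns orthonormal in discrete L^2[0, 1]

def truncated(r):
    """Partial Mercer sum of the first r terms, evaluated on the grid."""
    return (phi[:, :r] * lam[:r]) @ phi[:, :r].T

sup_errors = [np.max(np.abs(K - truncated(r))) for r in (2, 5, 10)]
```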
12.3 Kernel machines
The “kernel trick” of SVMs replaces inner products with K(x,y) — effectively working in the RKHS. Common kernels:
| Kernel | Formula |
|---|---|
| Linear | x · y |
| Polynomial | (x · y + c)^d |
| Gaussian (RBF) | `exp(-‖x - y‖² / (2σ²))` |
| Matérn | parameter ν; family of GP kernels |
| Laplace | `exp(-‖x - y‖ / σ)` |
| ANOVA | Σ exp(-(x_i - y_i)²/σ²) |
See gaussian-processes for the GP/Bayesian-regression view and kernel-methods for kernel-machine algorithms.
12.4 Representer theorem
Kimeldorf-Wahba 1971: the solution to regularised loss minimisation in an RKHS lies in the span of kernels at the training points. Reduces an infinite-dim optimisation to an n-dim one — the entire reason RKHS works computationally.
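Kernel ridge regression is the simplest instance: with squared loss, the n coefficients solve a single linear system. The sketch below assumes a Gaussian kernel with bandwidth 0.3, synthetic noisy samples of sin(2πx), and regularisation weight 1e-3; all of these, and the names `gaussian_kernel` and `predict`, are illustrative choices.

```python
import numpy as np

def gaussian_kernel(a, b, sigma=0.3):
    """Gram matrix K[i, j] = exp(-|a_i - b_j|^2 / (2 sigma^2)) for 1-D inputs."""
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2.0 * sigma ** 2))

rng = np.random.default_rng(7)
n = 40
x_train = np.sort(rng.uniform(0.0, 1.0, n))
y_train = np.sin(2.0 * np.pi * x_train) + 0.05 * rng.standard_normal(n)

lam = 1e-3                                   # regularisation weight (assumed)
K = gaussian_kernel(x_train, x_train)
# Minimiser of (1/n) sum (f(x_i) - y_i)^2 + lam ||f||_H^2 over the RKHS:
c = np.linalg.solve(K + n * lam * np.eye(n), y_train)

def predict(x_new):
    """f(x) = sum_i c_i K(x, x_i): the minimiser lies in span{K(., x_i)}."""
    return gaussian_kernel(x_new, x_train) @ c

x_test = np.linspace(0.1, 0.9, 50)
max_err = np.max(np.abs(predict(x_test) - np.sin(2.0 * np.pi * x_test)))
```

The optimisation over an infinite-dimensional space has become an n × n linear solve, which is the computational content of the theorem.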
13. Operator algebras
13.1 Banach algebras and C*-algebras
A Banach algebra is a Banach space with a compatible associative product. With an involution * satisfying ||a*a|| = ||a||² it is a C*-algebra.
Gelfand-Naimark theorem (1943): every commutative C*-algebra with unit is isometrically isomorphic to C(X) for some compact Hausdorff X (its spectrum). Non-commutative: every C*-algebra embeds as a closed *-subalgebra of B(H) for some Hilbert H.
13.2 von Neumann algebras
Francis Murray & John von Neumann 1936-1943: a *-subalgebra of B(H) closed in the weak-operator topology. Equivalent to its double commutant (M = M''). Factor types I, II, III; classification major theme of mid-20th-century math. Modern: free probability (Voiculescu), Connes’ work (Fields 1982).
14. Quantum mechanics formalism
This is the canonical applied home of functional analysis.
- States: unit vectors ψ in Hilbert space H (up to phase); or density operators ρ (positive trace-class, tr ρ = 1) for mixed states.
- Observables: self-adjoint operators on H. Spectrum = possible measurement outcomes.
- Expectation: ⟨A⟩_ψ = ⟨ψ, Aψ⟩.
- Born rule: probability of measuring λ ∈ B is ⟨ψ, E(B) ψ⟩ where E is the spectral measure of A.
- Time evolution: ψ(t) = U_t ψ(0) with U_t = e^{-iHt/ħ} unitary (Stone). Hamiltonian H self-adjoint.
- Canonical commutation: [X, P] = iħI, realisable only by unbounded operators; the Stone-von Neumann theorem (1931) gives uniqueness of the representation.
- Compact symmetries / coherent states / second quantisation: built on Fock space F(H) = ⊕_n H^{⊗_s n}.
15. Pseudo-differential operators and microlocal analysis
A pseudo-differential operator of order m has symbol a(x, ξ) with |∂_x^α ∂_ξ^β a| ≤ C_{αβ} (1 + |ξ|)^{m-|β|}:
(P f)(x) = (2π)^{-n} ∫ e^{ix·ξ} a(x,ξ) F(f)(ξ) dξ
Lars Hörmander (Fields 1962) systematised the theory in his four-volume The Analysis of Linear Partial Differential Operators (1983-1985). Used in PDE regularity, index theory, and propagation of singularities.
16. Famous classical theorems
- Stone-Weierstrass (1885 / 1937): a unital subalgebra of C(K) that separates points (and, in the complex case, is closed under conjugation) is dense in the sup norm.
- Arzelà-Ascoli (1883/1895): a bounded equicontinuous family in C(K) is relatively compact (uniform convergence).
- Mazur’s theorem: a convex set has the same closure in the norm and weak topologies.
- Krein-Milman (1940): a compact convex set in a locally convex space is the closed convex hull of its extreme points.
- Stampacchia variational inequality (1964): for closed convex K ⊂ H and coercive bilinear a, there exists a unique u ∈ K with a(u, v - u) ≥ f(v - u) for all v ∈ K. Foundation of variational inequalities / obstacle problems.
- Banach fixed point theorem (1922): a contraction on a complete metric space has a unique fixed point. Picard iteration, ODE existence.
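The Banach fixed point theorem is constructive: iterating the contraction converges geometrically from any starting point. A minimal sketch using cos on [0, 1], where |sin x| ≤ sin(1) < 1 gives the contraction constant; the function name `fixed_point` and the tolerance are illustrative.

```python
import math

def fixed_point(g, x0, tol=1e-12, max_iter=1000):
    """Picard iteration x_{k+1} = g(x_k); returns (approximate fixed point, iterations)."""
    x = x0
    for k in range(1, max_iter + 1):
        x_next = g(x)
        if abs(x_next - x) < tol:
            return x_next, k
        x = x_next
    raise RuntimeError("no convergence within max_iter")

# cos maps [0, 1] into itself and |cos'(x)| = |sin x| <= sin(1) < 1 there.
x_star, iterations = fixed_point(math.cos, 1.0)
```

The same iteration, applied to the integral form of an ODE, is Picard's existence argument.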
17. Connections to applied mathematics
- PDE theory: variational formulations, semigroup theory, energy methods — all functional-analytic. See pde-methods, ode-pde-solver-catalog.
- Signal processing: L^2 Fourier theory, wavelets, frames. See fft-spectral.
- Probability theory: L^2 random variables, conditional expectation as projection, Karhunen-Loève (compact-operator decomposition of covariance). See probability-fundamentals.
- Optimisation in Hilbert spaces: convex analysis, subdifferentials, proximal operators. See convex-optimization.
- Inverse problems: regularisation in Hilbert spaces, Tikhonov, SVD-based truncation. See svd-pca-spectral.
- Numerical PDEs: FEM is Galerkin in H^1. See numerical-methods-reference.
18. Adjacent
- linear-algebra-essentials: finite-dim case; spectral theorem precursors.
- fft-spectral: Fourier analysis on L^2.
- pde-methods: variational and semigroup PDE methods.
- gaussian-processes: RKHS view of Gaussian processes.
- ode-pde-solver-catalog: numerical discretisations of variational forms.
- kernel-methods: kernel-machine algorithms built on RKHS.