Named Inequalities Catalog

A reference catalog of the named inequalities that underlie probability theory, analysis, information theory, statistical learning, convex optimization, and matrix analysis. Each entry states the inequality, its hypotheses, the sharp constants when known, and the original attribution with year. Primary references: Hardy-Littlewood-Pólya (1934, 2nd ed. 1952) Inequalities; Bullen (2003) Handbook of Means and Their Inequalities; Boucheron-Lugosi-Massart (2013) Concentration Inequalities: A Nonasymptotic Theory of Independence; Cover-Thomas (2006) Elements of Information Theory; Bhatia (1997) Matrix Analysis.

1. Foundational inequalities of analysis

1.1 AM-GM (Cauchy 1821)

For non-negative reals a_1, …, a_n: (a_1 + … + a_n)/n ≥ (a_1 … a_n)^{1/n}

Equality iff a_1 = … = a_n. Power-mean generalization M_r(a) = ((1/n) Σ a_i^r)^{1/r} is non-decreasing in r ∈ R ∪ {±∞}; M_{-∞} = min, M_0 = GM, M_1 = AM, M_2 = QM (quadratic), M_∞ = max.
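The monotonicity of the power means can be checked numerically. The following is a minimal sketch; the helper name power_mean and the sample values are illustrative assumptions, not part of the catalog.

```python
import math

def power_mean(values, r):
    """Power mean M_r of positive numbers; r may be 0 or +/- infinity."""
    n = len(values)
    if r == 0:
        # M_0 is the geometric mean (the limit of M_r as r -> 0).
        return math.exp(sum(math.log(v) for v in values) / n)
    if r == math.inf:
        return max(values)
    if r == -math.inf:
        return min(values)
    return (sum(v ** r for v in values) / n) ** (1.0 / r)

a = [1.0, 2.0, 4.0, 8.0]                      # illustrative data
orders = [-math.inf, -1, 0, 1, 2, math.inf]   # min, HM, GM, AM, QM, max
means = [power_mean(a, r) for r in orders]
print(means)
```

The printed list should be non-decreasing, with AM-GM appearing as the comparison between the entries for r = 0 and r = 1.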

1.2 Cauchy-Schwarz (Cauchy 1821 in R^n; Bunyakovsky 1859 integral; Schwarz 1888)

For inner-product space (V, ⟨·, ·⟩): |⟨x, y⟩| ≤ ‖x‖ · ‖y‖

Equality iff x and y are linearly dependent. In R^n: (Σ a_i b_i)² ≤ (Σ a_i²)(Σ b_i²). Integral form: |∫ f g| ≤ sqrt(∫ f²) sqrt(∫ g²). For random variables: |Cov(X, Y)| ≤ sqrt(Var X · Var Y); ρ(X, Y) ∈ [-1, 1].

1.3 Hölder (Rogers 1888 / Hölder 1889)

For p, q ∈ [1, ∞] with 1/p + 1/q = 1 (Hölder-conjugate / dual exponents): ‖f g‖_1 ≤ ‖f‖_p ‖g‖_q

Generalization (k functions): for 1/p_1 + … + 1/p_k = 1, ∫ |Π_i f_i| ≤ Π_i ‖f_i‖_{p_i}. For two functions with 1 < p < ∞, equality iff |f|^p ∝ |g|^q a.e. Reduces to Cauchy-Schwarz at p = q = 2.

Reference: Hölder, O. (1889). “Über einen Mittelwertsatz,” Göttinger Nachrichten, 38-47.

1.4 Minkowski (Minkowski 1896)

Triangle inequality in L^p, 1 ≤ p ≤ ∞: ‖f + g‖_p ≤ ‖f‖_p + ‖g‖_p

The inequality reverses for 0 < p < 1 (for non-negative f, g). By induction it extends to sums of N functions with constant 1: ‖Σ f_i‖_p ≤ Σ ‖f_i‖_p.

1.5 Jensen (Jensen 1906)

For convex φ: R → R and random X with E|X|, E|φ(X)| < ∞: φ(E[X]) ≤ E[φ(X)]

Reversed for concave φ. Conditional: φ(E[X|F]) ≤ E[φ(X)|F] a.s. Sharp; equality iff X is constant or φ is affine on the support of X. Generalizes via integral: φ((1/m(A)) ∫_A f dm) ≤ (1/m(A)) ∫_A φ ∘ f dm.

Reference: Jensen, J.L.W.V. (1906). “Sur les fonctions convexes et les inégalités entre les valeurs moyennes,” Acta Mathematica, 30(1), 175-193.

1.6 Young’s inequality (Young 1912)

For a, b ≥ 0 and p, q > 1 with 1/p + 1/q = 1: ab ≤ a^p/p + b^q/q

Equality iff a^p = b^q. Generalized Young (convolution form): if 1/p + 1/q = 1 + 1/r and 1 ≤ p, q, r ≤ ∞, ‖f ∗ g‖_r ≤ ‖f‖_p ‖g‖_q

Sharp constants (Beckner 1975, Brascamp-Lieb 1976).

Reverse Young: 0 < p, q, r ≤ 1 with 1/p + 1/q = 1 + 1/r, then ‖f ∗ g‖_r ≥ A_{p,q,r} ‖f‖_p ‖g‖_q on the cone of non-negative log-concave functions.

1.7 Hardy’s inequality (Hardy 1920)

For p > 1 and a_n ≥ 0: Σ_{n=1}^∞ (1/n Σ_{k=1}^n a_k)^p ≤ (p/(p-1))^p Σ_{n=1}^∞ a_n^p

Sharp constant (p/(p-1))^p; integral form ∫_0^∞ (1/x ∫_0^x f)^p dx ≤ (p/(p-1))^p ∫_0^∞ f^p.
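A quick numerical sanity check of the discrete form, with Σ a_n^p on the right-hand side, can be sketched as follows. The function name hardy_sides and the test sequence a_n = 1/n are illustrative assumptions; the check uses a finite truncation, for which the inequality still holds.

```python
def hardy_sides(a, p):
    """Return (lhs, rhs) of Hardy's inequality for a finite non-negative sequence."""
    running = 0.0
    lhs = 0.0
    for n, x in enumerate(a, start=1):
        running += x                      # a_1 + ... + a_n
        lhs += (running / n) ** p         # p-th power of the n-th Cesaro average
    rhs = (p / (p - 1)) ** p * sum(x ** p for x in a)
    return lhs, rhs

a = [1.0 / n for n in range(1, 2001)]     # illustrative sequence
results = {p: hardy_sides(a, p) for p in (1.5, 2.0, 3.0)}
for p, (lhs, rhs) in results.items():
    print(p, lhs, rhs, lhs <= rhs)
```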

1.8 Hilbert’s double-series inequality (Hilbert 1894/1909)

Σ_{m,n} a_m b_n /(m+n) ≤ π/sin(π/p) · ‖a‖_p ‖b‖_q (1/p + 1/q = 1).

1.9 Carleman (Carleman 1923)

For a_n > 0: Σ_{n=1}^∞ (a_1 a_2 … a_n)^{1/n} ≤ e Σ_{n=1}^∞ a_n

Sharp constant e; limiting case of Hardy as p → ∞.

1.10 Bernoulli (1689)

(1 + x)^r ≥ 1 + rx for x ≥ -1 and r ≥ 1, and for x > -1 and r ≤ 0; the inequality reverses (≤) for r ∈ (0, 1).

1.11 Triangle inequalities for sequences

‖x + y‖_p ≤ ‖x‖_p + ‖y‖_p (Minkowski); reverse for 0 < p < 1.

2. Inequalities of probability and concentration

2.1 Markov inequality (Markov 1884)

For non-negative random X and t > 0: P(X ≥ t) ≤ E[X] / t

Foundation for tail-bound chains.

2.2 Chebyshev inequality (Chebyshev 1867)

For finite variance: P(|X - E[X]| ≥ t) ≤ Var(X) / t²

Power-moment generalization: P(|X - E[X]| ≥ t) ≤ E|X - E[X]|^k / t^k.

2.3 Chernoff bound (Chernoff 1952)

P(X ≥ t) ≤ inf_{λ > 0} e^{-λ t} E[e^{λ X}]

Applied to the sample mean X̄_n = (1/n) Σ X_i of iid Rademacher (±1), Bernoulli, or bounded X_i, the optimized bound gives sub-Gaussian or sub-exponential tails. Chernoff-Hoeffding for Bernoulli(p): P(X̄_n ≥ p + ε) ≤ e^{-n D(p+ε || p)} where D is the KL divergence between Bernoulli laws.
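The Chernoff-Hoeffding bound for Bernoulli means can be compared with the exact binomial tail. This is a sketch under assumed parameters (n = 100, p = 0.3, threshold 0.4); the function names are hypothetical.

```python
import math

def kl_bernoulli(q, p):
    """KL divergence D(q || p) between Bernoulli(q) and Bernoulli(p), in nats."""
    total = 0.0
    for a, b in ((q, p), (1.0 - q, 1.0 - p)):
        if a > 0.0:
            total += a * math.log(a / b)
    return total

def binomial_upper_tail(n, p, k):
    """Exact P(Bin(n, p) >= k)."""
    return sum(math.comb(n, j) * p**j * (1.0 - p)**(n - j) for j in range(k, n + 1))

n, p, k = 100, 0.3, 40            # event {sample mean >= k/n = 0.4}
q = k / n
exact = binomial_upper_tail(n, p, k)
chernoff = math.exp(-n * kl_bernoulli(q, p))
hoeffding = math.exp(-2.0 * n * (q - p) ** 2)   # weaker, distribution-free bound
print(exact, chernoff, hoeffding)
```

The ordering exact ≤ chernoff ≤ hoeffding reflects D(q || p) ≥ 2 (q - p)², which is Pinsker's inequality for Bernoulli laws.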

2.4 Hoeffding (Hoeffding 1963)

For independent X_i ∈ [a_i, b_i] and S_n = Σ X_i: P(S_n - E[S_n] ≥ t) ≤ exp(-2t² / Σ (b_i - a_i)²)

Symmetric two-sided bound 2 exp(-2t² / Σ (b_i - a_i)²). Sharp constant 2 in the exponent for bounded variables. Sub-gaussian (X_i - E X_i) with parameter (b_i - a_i)/2.

Reference: Hoeffding, W. (1963). “Probability inequalities for sums of bounded random variables,” JASA, 58(301), 13-30.
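A common use of the two-sided bound is to solve for a sample size. The sketch below assumes iid variables in an interval of length `width` and rewrites the bound in terms of the sample mean (t = n ε); the function names are illustrative.

```python
import math

def hoeffding_two_sided(n, eps, width=1.0):
    """Bound on P(|mean - E[mean]| >= eps) for n iid variables in an interval of length `width`."""
    # With t = n * eps and sum (b_i - a_i)^2 = n * width^2 the exponent is -2 n eps^2 / width^2.
    return min(1.0, 2.0 * math.exp(-2.0 * n * eps**2 / width**2))

def hoeffding_sample_size(eps, delta, width=1.0):
    """Smallest n for which the two-sided bound is at most delta."""
    return math.ceil(width**2 * math.log(2.0 / delta) / (2.0 * eps**2))

n = hoeffding_sample_size(eps=0.05, delta=0.01)
print(n, hoeffding_two_sided(n, 0.05))
```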

2.5 Bernstein inequality (Bernstein 1924, 1946)

For independent zero-mean X_i with |X_i| ≤ b and Σ Var(X_i) = σ²: P(Σ X_i ≥ t) ≤ exp(-t² / (2(σ² + b t / 3)))

Sharper than Hoeffding for small variance. Sub-exponential tail; transitions from subgaussian at t < σ²/b to subexponential beyond.
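The gain over Hoeffding in the small-variance regime can be seen by evaluating both bounds side by side. A minimal sketch with assumed values (n = 1000 summands in [-1, 1], each of variance 0.01); the function names are hypothetical.

```python
import math

def hoeffding_bound(t, n, b):
    """One-sided Hoeffding bound for n independent variables in [-b, b]."""
    return math.exp(-2.0 * t**2 / (n * (2.0 * b) ** 2))

def bernstein_bound(t, sigma2, b):
    """One-sided Bernstein bound; sigma2 is the sum of the variances."""
    return math.exp(-t**2 / (2.0 * (sigma2 + b * t / 3.0)))

n, b = 1000, 1.0
sigma2 = n * 0.01                 # total variance, far below n * b^2
thresholds = (5.0, 10.0, 20.0, 40.0)
table = [(t, hoeffding_bound(t, n, b), bernstein_bound(t, sigma2, b)) for t in thresholds]
for row in table:
    print(row)
```

Hoeffding only sees the range of the summands, so it stays near 1 for these thresholds, while Bernstein exploits the small variance.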

2.6 Bennett (Bennett 1962)

For |X_i - E X_i| ≤ b: P(Σ (X_i - E X_i) ≥ t) ≤ exp(-σ²/b² · h(b t / σ²)), h(u) = (1+u) log(1+u) - u

Refines Bernstein; sharp asymptotic form.

2.7 Azuma-Hoeffding (Azuma 1967; Hoeffding 1963)

For martingale (S_n) with bounded increments |S_n - S_{n-1}| ≤ c_n a.s.: P(|S_n - S_0| ≥ t) ≤ 2 exp(-t² / (2 Σ c_k²))

Foundation of analysis of stochastic processes; widely used in online learning, random graphs (Doob martingale).

2.8 McDiarmid / Bounded-Differences (McDiarmid 1989)

For independent X_1, …, X_n and f satisfying bounded differences |f(x) - f(x’)| ≤ c_i whenever x, x’ differ only in coordinate i: P(|f(X) - E[f(X)]| ≥ t) ≤ 2 exp(-2 t² / Σ c_i²)

Reduces to Hoeffding for f(x) = Σ x_i / n. Master tool for empirical processes, U-statistics, random matrices.

Reference: McDiarmid, C. (1989). “On the method of bounded differences,” Surveys in Combinatorics, LMS Lecture Notes 141, 148-188.

2.9 Efron-Stein inequality (Efron-Stein 1981, Steele 1986)

For Z = f(X_1, …, X_n) with X_i independent: Var(Z) ≤ (1/2) Σ_i E[(Z - Z_i’)²]

with Z_i’ = f(X_1, …, X_i’, …, X_n) and X_i’ an iid copy of X_i. Generalizes to L^p, exponential moments.
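For small n the two sides of Efron-Stein can be computed exactly by enumeration. The sketch below assumes iid Bernoulli(p) coordinates; efron_stein_check is a hypothetical helper, and max and sum serve as example functions f.

```python
from itertools import product

def efron_stein_check(f, n, p):
    """Exact (Var f(X), Efron-Stein bound) for X_1..X_n iid Bernoulli(p)."""
    def prob(bit):
        return p if bit == 1 else 1.0 - p

    mean = second = bound = 0.0
    for x in product((0, 1), repeat=n):
        w = 1.0
        for bit in x:
            w *= prob(bit)
        z = f(x)
        mean += w * z
        second += w * z * z
        for i in range(n):
            for new_bit in (0, 1):            # independent copy X_i'
                x_prime = x[:i] + (new_bit,) + x[i + 1:]
                bound += 0.5 * w * prob(new_bit) * (z - f(x_prime)) ** 2
    return second - mean**2, bound

var_max, bound_max = efron_stein_check(max, n=5, p=0.2)
var_sum, bound_sum = efron_stein_check(sum, n=5, p=0.2)
print(var_max, bound_max, var_sum, bound_sum)
```

For f = sum the bound is attained with equality (both sides equal n p (1 - p)); for f = max it is strict.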

2.10 Talagrand’s concentration (Talagrand 1995, 1996)

For product probability space and 1-Lipschitz convex f: P(|f(X) - M f| ≥ t) ≤ 4 exp(-t² / 16)

Convex-Lipschitz concentration on the unit cube; sharper constants and many variants in Talagrand (1995) Concentration of measure and isoperimetric inequalities in product spaces, Pub. IHES 81, 73-205. Refined “generic chaining” (Talagrand 2014, Upper and Lower Bounds for Stochastic Processes, Springer).

2.11 Gaussian concentration (Borell 1975; Sudakov-Tsirelson 1974)

For X ~ N(0, I_n) and 1-Lipschitz f: P(|f(X) - E f(X)| ≥ t) ≤ 2 exp(-t² / 2)

Variance bound: Var f(X) ≤ E ‖∇f(X)‖² (Poincaré, see 2.14). Equivalent to isoperimetric inequality on Gauss space (Borell, Sudakov-Tsirelson independently).

2.12 Spherical concentration (Lévy 1922; Milman 1971)

For Haar X on S^{n-1} and 1-Lipschitz f: P(|f(X) - M f| ≥ t) ≤ 2 exp(-c n t²)

Foundation of geometry of Banach spaces (Milman’s QS theorem), Dvoretzky theorem on almost-spherical sections.

2.13 Log-Sobolev (Gross 1975)

For Gaussian measure γ on R^n: Ent_γ(f²) ≤ 2 E_γ[‖∇f‖²]

with Ent_μ(g) = E_μ[g log g] - E_μ[g] log E_μ[g]. Implies hypercontractivity (Nelson 1973) of Ornstein-Uhlenbeck semigroup. Sharp constant 2 for Gauss measure. Bakry-Émery (1985) criterion: log-Sobolev holds under curvature-dimension CD(K, ∞), K > 0.

2.14 Poincaré-Wirtinger inequality

Var_μ(f) ≤ C_P E_μ[‖∇f‖²]

with C_P = 1 for Gauss measure (sharp). For domain Ω ⊂ R^n: ∫_Ω (f - f̄)² ≤ C_Ω ∫_Ω |∇f|², with C_Ω the Poincaré constant. Cheeger-related: 1/C_P ≥ (1/4) h(μ)² where h is Cheeger isoperimetric constant.

2.15 Brunn-Minkowski (Brunn 1887, Minkowski 1896)

For compact convex A, B ⊂ R^n: m(A + B)^{1/n} ≥ m(A)^{1/n} + m(B)^{1/n}

with A + B = {a + b}. Equivalent multiplicative: m((1-λ) A + λ B) ≥ m(A)^{1-λ} m(B)^λ. Implies isoperimetric inequality in R^n.

2.16 Isoperimetric

Among sets of given measure, the surface area (perimeter) is minimized by:

Euclidean R^n: ball — m_{n-1}(∂A) ≥ n ω_n^{1/n} m(A)^{(n-1)/n}
Sphere S^n: spherical cap (Lévy)
Gauss R^n: half-space (Borell 1975)

2.17 Loomis-Whitney (Loomis-Whitney 1949)

For compact A ⊂ R^n with axis projections A_i ⊂ R^{n-1}: m(A)^{n-1} ≤ Π_{i=1}^n m_{n-1}(A_i)

Equality holds for axis-parallel boxes. Generalizes to Brascamp-Lieb (1976): sharp multilinear inequalities ∫ Π f_i ∘ B_i ≤ C Π ‖f_i‖.

2.18 Brascamp-Lieb (Brascamp-Lieb 1976; Lieb 1990)

For B_i: R^n → R^{n_i} linear surjective and c_i > 0 with rank/dimension condition Σ c_i n_i = n: ∫ Π_i f_i(B_i x)^{c_i} dx ≤ C Π_i (∫ f_i)^{c_i}

with sharp constant from extremal Gaussians. Includes Hölder, Loomis-Whitney, and sharp Young as special cases.

2.19 Sobolev inequality

For f ∈ W^{1, p}(R^n), 1 ≤ p < n, p*= n p/(n - p): ‖f‖_{p*} ≤ C(n, p) ‖∇f‖_p

Sharp constant by Aubin (1976) and Talenti (1976). On a bounded Lipschitz domain the embedding W^{1, p} ↪ L^q is compact for every q < p* (Rellich-Kondrachov); at the critical exponent q = p* it is continuous but not compact.

2.20 Friedrichs / Poincaré on bounded domain

For f ∈ W^{1, 2}_0(Ω): ‖f‖_2 ≤ C_Ω ‖∇f‖_2.

2.21 Hardy-Littlewood maximal inequality (Hardy-Littlewood 1930)

Strong type (p, p) for 1 < p ≤ ∞: ‖M f‖_p ≤ C_p ‖f‖_p; weak (1, 1): m{Mf > t} ≤ C ‖f‖_1 / t.

2.22 Hardy-Littlewood-Sobolev (Hardy-Littlewood 1928, Sobolev 1938)

For 0 < λ < n, p, q ∈ (1, ∞) with 1/p + 1/q + λ/n = 2: ∫∫ f(x) g(y) /|x - y|^λ dx dy ≤ C(n, λ, p) ‖f‖_p ‖g‖_q

Sharp constant by Lieb (1983).

2.23 Pinsker inequality (Csiszár 1967, Kullback 1967, Kemperman 1969 — collectively called “Pinsker”)

For probability measures P, Q: TV(P, Q) ≤ sqrt(KL(P || Q) / 2)

Bretagnolle-Huber: 1 - TV(P, Q) ≥ (1/2) exp(-KL(P || Q)) stays non-vacuous for large KL (see 2.24). Reverse Pinsker bounds require extra conditions.

2.24 Bretagnolle-Huber (Bretagnolle-Huber 1979)

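TV(P, Q) ≤ sqrt(1 - exp(-KL(P || Q))) ≤ 1 - (1/2) exp(-KL(P || Q))

Proved through the Hellinger affinity: TV(P, Q)² ≤ 1 - (1 - H²(P, Q))² with H² = 1 - ∫ sqrt(dP dQ) the squared Hellinger distance, together with (1 - H²)² ≥ exp(-KL(P || Q)). Pair with Pinsker; sharper at large divergence, where Pinsker’s bound exceeds 1.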
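The two upper bounds on total variation, Pinsker's sqrt(KL/2) and the Bretagnolle-Huber form sqrt(1 - exp(-KL)), can be compared on small discrete distributions. A sketch with made-up distributions; all function names are illustrative.

```python
import math

def total_variation(p, q):
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def kl_divergence(p, q):
    """KL(p || q) in nats; assumes q > 0 wherever p > 0."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0.0)

def pinsker(kl):
    return math.sqrt(kl / 2.0)

def bretagnolle_huber(kl):
    return math.sqrt(1.0 - math.exp(-kl))

p = [0.98, 0.01, 0.01]
near = [0.90, 0.05, 0.05]          # small divergence: Pinsker is tighter
far = [0.01, 0.01, 0.98]           # large divergence: Pinsker exceeds 1
rows = []
for q in (near, far):
    tv, kl = total_variation(p, q), kl_divergence(p, q)
    rows.append((tv, pinsker(kl), bretagnolle_huber(kl)))
    print(rows[-1])
```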

2.25 Le Cam’s inequality (Le Cam 1986)

For testing P vs Q: P_e ≥ (1/2)(1 - TV(P, Q)) ≥ (1/2)(1 - sqrt(KL(P||Q)/2)). Foundation of minimax lower bounds.

2.26 Fano (Fano 1952)

For decoding a uniform random variable θ ∈ {1, …, M} from observation Y with Markov θ → Y → θ̂: P_e ≥ (H(θ) - I(θ; Y) - log 2) / log(M - 1)

Weakened form: P_e ≥ 1 - (I(θ; Y) + log 2) / log M. In particular, if I(θ; Y) ≤ (log M)/2 - log 2, then P_e ≥ 1/2. Master tool for minimax lower bounds; converse of channel coding theorem.
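In minimax arguments Fano is usually applied in the weakened form P_e ≥ 1 - (I(θ; Y) + log 2)/log M, which follows from the bound above for uniform θ because log(M - 1) ≤ log M. A minimal sketch; the function name and the example numbers are assumptions.

```python
import math

def fano_lower_bound(mutual_info, m):
    """Lower bound on the error probability for a uniform prior over m >= 2 hypotheses.

    mutual_info is I(theta; Y) in nats. The bound is clipped at 0 when vacuous.
    """
    bound = 1.0 - (mutual_info + math.log(2.0)) / math.log(m)
    return max(0.0, bound)

m = 1024
info = 0.5 * math.log(m) - math.log(2.0)   # information budget giving P_e >= 1/2
print(fano_lower_bound(info, m))
print(fano_lower_bound(10.0 * math.log(m), m))   # vacuous regime, clipped to 0
```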

2.27 Cramér-Rao lower bound (Cramér 1946, Rao 1945)

For unbiased estimator T(X) of θ and Fisher information I(θ): Var T(X) ≥ 1/I(θ)

Multivariate: Cov(T) ≥ I(θ)^{-1} in the Loewner order for any unbiased estimator T of a vector parameter. Equality is attained by efficient estimators; the MLE attains it asymptotically.

2.28 Hammersley-Chapman-Robbins (1950, 1951)

Var T(X) ≥ sup_{θ’ ≠ θ} (E_θ T - E_{θ’} T)² / (χ²(P_{θ’} || P_θ))

Sharper than Cramér-Rao; works without smoothness in θ.

2.29 Lehmann-Scheffé / Rao-Blackwell-Lehmann-Scheffé (Rao 1945, Blackwell 1947, Lehmann-Scheffé 1950, 1955)

Rao-Blackwellization: T_RB = E[T | S] for sufficient S has Var T_RB ≤ Var T. Lehmann-Scheffé: if S complete sufficient, T_RB is UMVU.

2.30 Data-processing inequality (Csiszár-Körner 1981; Cover-Thomas 2006)

For Markov chain X → Y → Z: I(X; Z) ≤ I(X; Y); KL(P_Z || Q_Z) ≤ KL(P_Y || Q_Y) for any deterministic / stochastic Y → Z.

f-divergences (Csiszár 1963) satisfy DPI for any convex f with f(1) = 0.

2.31 Han’s inequality (Han 1978)

For discrete X = (X_1, …, X_n): H(X) ≤ (1/(n-1)) Σ_i H(X_{-i})

where X_{-i} omits coordinate i. Implies Loomis-Whitney as a corollary via entropy.
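Han's inequality can be verified directly on a small joint distribution. The sketch below uses an assumed example (X_3 = X_1 XOR X_2 with independent fair bits); entropy and marginal are hypothetical helpers.

```python
import math
from itertools import combinations

def entropy(dist):
    """Shannon entropy in nats of a dict mapping outcomes to probabilities."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0.0)

def marginal(joint, coords):
    """Marginal law of the coordinates listed in `coords`."""
    out = {}
    for x, p in joint.items():
        key = tuple(x[i] for i in coords)
        out[key] = out.get(key, 0.0) + p
    return out

# Assumed example: X1, X2 independent fair bits, X3 = X1 XOR X2.
joint = {(a, b, a ^ b): 0.25 for a in (0, 1) for b in (0, 1)}
n = 3
h_full = entropy(joint)
han_rhs = sum(entropy(marginal(joint, c))
              for c in combinations(range(n), n - 1)) / (n - 1)
print(h_full, han_rhs)
```

Here every pair of coordinates is uniform on four values, so the right-hand side is (3/2) log 4 while H(X) = log 4.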

2.32 Shearer’s inequality (Chung-Graham-Frankl-Shearer 1986)

For F = {S_1, …, S_m} ⊂ 2^{[n]} a cover (each i in ≥ k sets): k H(X) ≤ Σ_j H(X_{S_j})

Generalizes Han; entropy version of Loomis-Whitney and Bollobás-Thomason box inequality.

3. Concentration for matrices

3.1 Matrix Bernstein (Tropp 2012)

For independent random d × d Hermitian X_i with E X_i = 0, ‖X_i‖ ≤ R a.s., ‖Σ E X_i²‖ ≤ σ²: P(λ_max(Σ X_i) ≥ t) ≤ d exp(-t² / (2(σ² + R t / 3)))

Companion results in the same framework: matrix Chernoff, matrix Hoeffding, matrix Azuma, matrix McDiarmid (Tropp 2012, FoCM).

3.2 Lieb concavity (Lieb 1973)

For positive A, B and 0 ≤ t ≤ 1: (A, B) ↦ tr(K* A^t K B^{1-t}) is jointly concave. Key tool for matrix Bernstein.

3.3 Golden-Thompson (Golden 1965, Thompson 1965)

For Hermitian A, B: tr(exp(A + B)) ≤ tr(exp(A) exp(B))

The naive three-matrix analog tr(exp(A + B + C)) ≤ tr(exp(A) exp(B) exp(C)) fails; Lieb’s (1973) triple-matrix inequality serves instead. Multivariate refinement: Sutter-Berta-Tomamichel (2017).

3.4 Klein inequality

For convex f: tr(f(A) - f(B)) ≥ tr((A - B) f’(B)).

3.5 Weyl’s inequality (Weyl 1912)

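For n × n Hermitian A, B with eigenvalues λ_1(A) ≥ … ≥ λ_n(A): λ_{i+j-1}(A + B) ≤ λ_i(A) + λ_j(B) whenever i + j - 1 ≤ n

In particular |λ_i(A + B) - λ_i(A)| ≤ ‖B‖ (operator norm), which also follows from the Courant-Fischer min-max characterization. Foundation of matrix perturbation theory.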

3.6 Lidskii (Lidskii 1950)

For Hermitian A, B: vector λ(A + B) - λ(A) is majorized by λ(B). Generalization of Weyl.

3.7 Horn inequalities (Horn 1962, Klyachko 1998, Knutson-Tao 1999)

Necessary and sufficient inequalities for which triples (λ(A), λ(B), λ(A+B)) can occur for Hermitian A, B. Solved via the honeycomb model and the proof of the saturation conjecture (Knutson-Tao 1999).

3.8 von Neumann trace inequality (von Neumann 1937)

For matrices A, B with singular values σ_1(A) ≥ … and σ_1(B) ≥ …: |tr(A B)| ≤ Σ σ_i(A) σ_i(B)

Foundation of nuclear-norm duality, optimal transport.

3.9 Hoffman-Wielandt (Hoffman-Wielandt 1953)

For Hermitian A, B: min_π Σ (λ_i(A) - λ_{π(i)}(B))² ≤ ‖A - B‖_F². Stability of eigenvalues.

3.10 Davis-Kahan sin Θ theorem (Davis-Kahan 1970)

For Hermitian A, perturbation E, eigenspaces U_S, Û_S corresponding to interval S separated by δ from spectrum complement: ‖sin Θ(U_S, Û_S)‖ ≤ ‖E‖ / δ

Standard tool for spectral perturbation in statistics, PCA.

3.11 Löwner-Heinz inequality (Löwner 1934; Heinz 1951)

For positive matrices A, B with A ≤ B (Loewner): A^p ≤ B^p for 0 ≤ p ≤ 1 (operator monotonicity of x ↦ x^p, Löwner 1934); fails for p > 1.

3.12 Furuta inequality (Furuta 1987)

Extends Loewner monotonicity to powers p > 1, at the cost of a sandwiching factor and under the correct exponent relations: for 0 ≤ A ≤ B, r ≥ 0, p ≥ 0, q ≥ 1 with (1+r) q ≥ p+r: (B^{r/2} A^p B^{r/2})^{1/q} ≤ B^{(p+r)/q}.

4. Convexity, optimization, and learning theory

4.1 Convex / Jensen-like

φ convex: φ((1-t) x + t y) ≤ (1-t) φ(x) + t φ(y). α-strong convexity: φ(y) ≥ φ(x) + ⟨∇φ(x), y - x⟩ + (α/2) ‖y - x‖². L-smoothness: ‖∇φ(y) - ∇φ(x)‖ ≤ L ‖y - x‖.

4.2 Polyak-Łojasiewicz (Polyak 1963, Łojasiewicz 1963)

‖∇φ(x)‖² ≥ 2 α (φ(x) - φ*) for some α > 0, where φ* = inf φ. For L-smooth φ this implies linear convergence of gradient descent without strong convexity.
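The linear rate can be observed on a function that satisfies the Polyak-Łojasiewicz condition without being strongly convex. The sketch below assumes an L-smooth objective and step size 1/L, for which each step contracts the optimality gap by a factor of at most 1 - α/L; the quadratic and starting point are made-up examples.

```python
def f(x):
    # Convex but not strongly convex: constant in the third coordinate. Minimum value 0.
    return 0.5 * (1.0 * x[0] ** 2 + 4.0 * x[1] ** 2)

def grad(x):
    return [1.0 * x[0], 4.0 * x[1], 0.0]

alpha, L = 1.0, 4.0            # PL constant and smoothness constant of this f
x = [3.0, -2.0, 5.0]           # illustrative starting point
values = [f(x)]
for _ in range(20):
    g = grad(x)
    x = [xi - gi / L for xi, gi in zip(x, g)]   # gradient step with step size 1/L
    values.append(f(x))

rate = 1.0 - alpha / L
print(values[:5], rate)
```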

4.3 Cauchy-Bunyakovsky-Schwarz for inner product (see 1.2)

4.4 Rearrangement (Hardy-Littlewood-Pólya 1929; Riesz 1930)

For non-negative f, g, h with symmetric decreasing rearrangements f*, g*, h*: ∫ f g ≤ ∫ f* g*; ∫∫ f(x) g(x - y) h(y) dx dy ≤ ∫∫ f*(x) g*(x - y) h*(y) dx dy

(Riesz; sharp by symmetrization). Used in Brascamp-Lieb-Luttinger and isoperimetric proofs.

4.5 Glivenko-Cantelli (Glivenko 1933, Cantelli 1933)

For empirical CDF F_n and true F (no assumptions on F): sup_x |F_n(x) - F(x)| → 0 a.s.

Uniform Glivenko-Cantelli: for a VC class F, sup_{f ∈ F} |E_n f - E f| → 0 a.s. Foundation of empirical process theory.

4.6 Dvoretzky-Kiefer-Wolfowitz (Dvoretzky-Kiefer-Wolfowitz 1956; Massart 1990)

P(sup_x |F_n(x) - F(x)| > t) ≤ 2 e^{-2 n t²}

Sharp constant 2 from Massart 1990 (originally larger from DKW 1956). Tight uniform-confidence band on CDF.
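Inverting the bound gives a uniform confidence band of half-width ε = sqrt(log(2/δ) / (2n)). The following sketch computes ε and estimates the miss rate by simulation for Uniform(0, 1) samples; the function names, seed, and trial count are assumptions chosen for illustration.

```python
import math
import random

def dkw_band_halfwidth(n, delta):
    """Half-width eps such that P(sup_x |F_n(x) - F(x)| > eps) <= delta."""
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n))

def sup_deviation_uniform(sample):
    """sup_x |F_n(x) - x| for a sample from Uniform(0, 1)."""
    xs = sorted(sample)
    n = len(xs)
    # The supremum is attained just before or at an order statistic.
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))

rng = random.Random(0)
n, delta, trials = 500, 0.05, 1000
eps = dkw_band_halfwidth(n, delta)
misses = sum(
    sup_deviation_uniform([rng.random() for _ in range(n)]) > eps
    for _ in range(trials)
)
print(eps, misses / trials)
```

The empirical miss rate should come out close to δ, which is consistent with the constant 2 in front of the exponential being sharp.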

4.7 Vapnik-Chervonenkis (Vapnik-Chervonenkis 1971)

For VC class F with VC dimension d: P(sup_{f ∈ F} |E_n f - E f| > t) ≤ 8 (n+1)^d e^{-n t²/32}

Foundation of PAC learning. Generalizations: Pollard’s pseudodimension, Rademacher complexity (Bartlett-Mendelson 2002), fat-shattering, covering numbers (Pollard 1984).

4.8 Sauer-Shelah (Sauer 1972, Shelah 1972, Vapnik-Chervonenkis 1971)

|{F ∩ S : F ∈ F}| ≤ Σ_{i=0}^d C(|S|, i) ≤ (e |S|/d)^d for a class F of VC dimension d ≥ 1 and finite S with |S| ≥ d.
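Closed intervals on the real line form a class of VC dimension 2 for which the Sauer-Shelah bound is attained exactly. A small sketch that counts traces by brute force; the helper names are hypothetical.

```python
import math

def sauer_bound(n, d):
    """Sum_{i=0}^{d} C(n, i)."""
    return sum(math.comb(n, i) for i in range(d + 1))

def interval_traces(points):
    """Distinct subsets of `points` cut out by closed intervals [a, b]."""
    pts = sorted(points)
    traces = {frozenset()}                  # an interval missing every point
    for i in range(len(pts)):
        for j in range(i, len(pts)):
            traces.add(frozenset(pts[i:j + 1]))
    return traces

n, d = 10, 2
count = len(interval_traces(range(n)))
print(count, sauer_bound(n, d), (math.e * n / d) ** d)
```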

4.9 Massart’s lemma

For a finite class F with |F| ≤ N, points x_1, …, x_n, and σ² = max_{f ∈ F} (1/n) Σ_i f(x_i)²: E_ε sup_{f ∈ F} (1/n) Σ_i ε_i f(x_i) ≤ sqrt(2 σ² log N / n), with ε_i iid Rademacher signs.

4.10 Rademacher complexity bound (Bartlett-Mendelson 2002)

E sup_{f ∈ F} (E f - E_n f) ≤ 2 R_n(F), with R_n(F) = E sup_{f ∈ F} (1/n) Σ_i ε_i f(X_i) the Rademacher complexity (the expectation of its empirical version). Talagrand contraction: R_n(φ ∘ F) ≤ L · R_n(F) for L-Lipschitz φ.

4.11 Symmetrization (Giné-Zinn 1984)

E sup_F |E_n f - E f| ≤ 2 R_n(F).

5. Spectral inequalities

5.1 Cheeger inequality (Cheeger 1970)

For compact Riemannian (M, g): λ_1 ≥ h(M)² / 4

h(M) is the Cheeger isoperimetric constant. Buser (1982) reverse: λ_1 ≤ C h(M) + C’ h(M)² when the Ricci curvature is bounded below, with C depending on the dimension and the curvature bound. Discrete analog for graphs: Alon-Milman (1985).

5.2 Maz’ya inequality

Capacitary characterization of Sobolev embedding (Maz’ya 1985).

5.3 Faber-Krahn (Faber 1923, Krahn 1925)

Among domains of fixed Lebesgue measure, the ball minimizes the first Dirichlet eigenvalue λ_1.

5.4 Szegő-Weinberger (Szegő 1954, Weinberger 1956)

Among domains of fixed measure, the ball maximizes the first non-trivial Neumann eigenvalue μ_2.

5.5 Lieb-Thirring (Lieb-Thirring 1975, 1976)

Sum of negative eigenvalues λ_j of the Schrödinger operator -Δ + V: Σ_j |λ_j|^γ ≤ L_{γ, n} ∫_{R^n} V_-(x)^{γ + n/2} dx

with V_- = max(-V, 0) the negative part of V and L_{γ, n} a constant independent of V. Used in semi-classical limits, stability of matter (Lieb-Loss 2001).

5.6 Lieb-Loss for Laplacian

Sharp constants: e.g. classical Lieb-Loss eigenvalue bounds, Polya-Szegő symmetrization.

6. Inequalities of information theory

6.1 Entropy non-negativity

H(X) ≥ 0 for discrete; differential h(X) can be negative.

6.2 Subadditivity / conditioning reduces entropy

H(X, Y) ≤ H(X) + H(Y); H(X | Y) ≤ H(X). Equality in either iff X ⊥ Y.

6.3 Mutual information non-negativity

I(X; Y) = KL(P_{XY} || P_X ⊗ P_Y) ≥ 0; = 0 iff X ⊥ Y.

6.4 Gibbs inequality

For probability distributions p, q on same set: -Σ p log p ≤ -Σ p log q (= KL(p || q) + H(p) ≥ H(p)).

6.5 Maximum-entropy under constraints

Among densities on R^n with covariance Σ, Gaussian N(0, Σ) maximizes h. Among densities on [0, ∞) with mean μ, the exponential maximizes h; among densities on [0, b] with no further constraint, the uniform does.

6.6 Entropy power inequality (Shannon 1948; Stam 1959)

For independent X, Y: N(X + Y) ≥ N(X) + N(Y), N(X) = (1/(2π e)) exp(2 h(X)/n)

Sharp; equality iff X, Y Gaussian with proportional covariances. Strengthened by Costa (1985), reverse EPI under log-concavity.
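Two scalar cases with closed-form entropies illustrate the inequality: Gaussians (equality) and uniforms (strict). The sketch relies on the standard facts h(N(0, σ²)) = (1/2) log(2πe σ²), h(U[0, a]) = log a, and h = 1/2 + log a for the triangular law of the sum of two independent U[0, a]; the function names are illustrative.

```python
import math

def entropy_power(h, n=1):
    """Entropy power N = exp(2h/n) / (2*pi*e) for differential entropy h in nats."""
    return math.exp(2.0 * h / n) / (2.0 * math.pi * math.e)

def gaussian_entropy(var):
    return 0.5 * math.log(2.0 * math.pi * math.e * var)

# Gaussian case: N(X) equals the variance, so the EPI holds with equality.
n_gauss_sum = entropy_power(gaussian_entropy(1.0 + 2.0))
n_gauss_parts = entropy_power(gaussian_entropy(1.0)) + entropy_power(gaussian_entropy(2.0))

# Uniform case: X, Y iid U[0, a]; X + Y is triangular on [0, 2a].
a = 2.0
n_unif_sum = entropy_power(0.5 + math.log(a))
n_unif_parts = 2.0 * entropy_power(math.log(a))
print(n_gauss_sum, n_gauss_parts, n_unif_sum, n_unif_parts)
```

In the uniform case the ratio of the two sides is e/2, independent of a.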

6.7 Fisher information inequalities

J(X+Y)^{-1} ≥ J(X)^{-1} + J(Y)^{-1} for independent X, Y (Stam’s inequality: superadditivity of inverse Fisher information); de Bruijn identity h(X + sqrt t Z) - h(X) = (1/2) ∫_0^t J(X + sqrt s Z) ds (Z ~ N(0, I)). Foundations of entropy power proofs.

7. Optimal transport and probability metrics

7.1 Kantorovich-Rubinstein duality (Kantorovich-Rubinstein 1958)

W_1(μ, ν) = sup_{‖f‖_Lip ≤ 1} (∫ f dμ - ∫ f dν).

7.2 W_2 quadratic OT and Brenier (1991)

W_2²(μ, ν) = inf_{T_# μ = ν} ∫ ‖x - T(x)‖² dμ; optimal T = ∇ψ with ψ convex (Brenier 1987, 1991).

7.3 Talagrand T_2 (Talagrand 1996)

For γ standard Gaussian: W_2²(μ, γ) ≤ 2 KL(μ || γ). Generalizes to log-Sobolev measures (Otto-Villani 2000).

7.4 HWI inequality (Otto-Villani 2000)

H(μ | ν) ≤ W_2(μ, ν) sqrt(I(μ | ν)) - (κ/2) W_2²; ties together H entropy, W_2 Wasserstein, I Fisher information.

7.5 Triangle inequalities for W_p, TV, Hellinger

All canonical probability metrics satisfy a triangle inequality (W_p is a true metric on P_p; KL is not symmetric and fails triangle).

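7.6 Comparison: KL ≥ 2 TV², KL ≥ 2 H² ≥ TV², and W_1 ≤ W_2

Here H² = 1 - ∫ sqrt(dP dQ) is the squared Hellinger distance normalized to [0, 1]. On a space of diameter D, W_1 ≤ D · TV, so W_1 ≤ TV when D ≤ 1. Further conversion bounds are collected in Gibbs-Su (2002).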

8. Combinatorial and geometric

8.1 Bollobás set-pair / Bollobás 1965

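For set pairs (A_i, B_i), i = 1, …, m, with A_i ∩ B_i = ∅ and A_i ∩ B_j ≠ ∅ for i ≠ j: Σ_i 1/C(|A_i| + |B_i|, |A_i|) ≤ 1. Generalizes the LYM inequality (8.3); basis of Bollobás-Frankl-type matching inequalities.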

8.2 Kruskal-Katona (Kruskal 1963, Katona 1968)

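Lower bound on the shadow ∂F (the (k-1)-sets contained in some member of F) of a family F of k-sets. Lovász form: if |F| = C(x, k) with real x ≥ k, then |∂F| ≥ C(x, k-1). Proved by shifting (compression).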

8.3 LYM inequality (Lubell 1966, Yamamoto 1954, Meshalkin 1963)

For antichain F in 2^{[n]}: Σ_{S ∈ F} 1/C(n, |S|) ≤ 1; gives Sperner |F| ≤ C(n, ⌊n/2⌋).

8.4 FKG (Fortuin-Kasteleyn-Ginibre 1971)

Positive correlation of increasing functions on distributive lattices under log-supermodular measures. Spin systems, percolation.

8.5 Holley (1974)

Stochastic domination criterion via Markov chain coupling.

8.6 Lehmann-Scheffé bounds (see 2.29)

8.7 Plünnecke-Ruzsa (Plünnecke 1969, Ruzsa 1989)

For finite A, B in an abelian group with |A + B| ≤ K |A|: |m B - n B| ≤ K^{m+n} |A| for all integers m, n ≥ 0. Used throughout sum-product theory; foundations of additive combinatorics (Tao-Vu 2006).

8.8 Cauchy-Davenport (Cauchy 1813, Davenport 1935)

|A + B| ≥ min(p, |A| + |B| - 1) for non-empty A, B ⊂ Z_p, p prime.
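For a small prime the inequality can be confirmed exhaustively over all pairs of non-empty subsets. The sketch below also records why primality matters, using a subgroup of Z_6 as a counterexample; the helper names are assumptions.

```python
from itertools import combinations

def sumset(a, b, modulus):
    return {(x + y) % modulus for x in a for y in b}

def nonempty_subsets(modulus):
    for size in range(1, modulus + 1):
        yield from combinations(range(modulus), size)

p = 7
subsets = list(nonempty_subsets(p))
violations = [
    (a, b) for a in subsets for b in subsets
    if len(sumset(a, b, p)) < min(p, len(a) + len(b) - 1)
]
print(len(subsets), len(violations))

# In Z_6 (composite modulus) the subgroup {0, 2, 4} violates the bound.
subgroup = (0, 2, 4)
print(len(sumset(subgroup, subgroup, 6)), min(6, 3 + 3 - 1))
```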

8.9 Freiman-Ruzsa (Freiman 1959, Ruzsa 1994)

Sets with small doubling, |A + A| ≤ K |A|, are contained in a generalized arithmetic progression whose dimension and relative size are bounded in terms of K; deep structural result.

9. Functional-analytic

9.1 Banach-Steinhaus / uniform boundedness

If sup_α ‖T_α x‖ < ∞ for all x in Banach space, then sup_α ‖T_α‖ < ∞.

9.2 Open mapping / closed graph (Banach 1929)

Surjective continuous linear map between Banach spaces is open.

9.3 Hahn-Banach (Hahn 1927, Banach 1929)

Extension of bounded linear functionals; equivalent to existence of separating hyperplanes for disjoint convex sets.

9.4 Bessel’s inequality

For orthonormal {e_n}: Σ |⟨x, e_n⟩|² ≤ ‖x‖². Equality for every x iff {e_n} is complete, i.e. an orthonormal basis (Parseval).

9.5 Plancherel / Parseval

‖f‖_2 = ‖f̂‖_2 (Fourier isometry on L²).

9.6 Hausdorff-Young (Hausdorff 1923, Young 1913)

For 1 ≤ p ≤ 2 with 1/p + 1/p’ = 1: ‖f̂‖_{p’} ≤ ‖f‖_p. Sharp constant (Beckner 1975, Babenko 1961): A_p^n in R^n.

9.7 Stein-Tomas (Tomas 1975, Stein 1986)

Restriction of the Fourier transform to the sphere: ‖f̂|_{S^{n-1}}‖_{L²(S^{n-1})} ≤ C ‖f‖_{L^p(R^n)} for 1 ≤ p ≤ 2(n+1)/(n+3).

9.8 Strichartz (Strichartz 1977)

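Space-time estimates for Schrödinger / wave / KdV operators; for the Schrödinger flow, ‖e^{i t Δ} f‖_{L^q_t L^r_x} ≤ C ‖f‖_{L²} for admissible pairs 2/q + n/r = n/2 with q, r ≥ 2 and (q, r, n) ≠ (2, ∞, 2).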

10. Other named

10.1 Mills’ ratio bounds for normal tail

For standard normal Φ and φ = Φ’: (1 - Φ(x))/φ(x) ≤ 1/x; ≥ x/(1+x²). Sharper: Komatsu (1955).
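Both Mills-ratio bounds are easy to check numerically for x > 0. The sketch below uses math.erfc for the upper tail, which stays accurate far from the origin; the function names and the grid of x values are illustrative.

```python
import math

def normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def normal_upper_tail(x):
    """1 - Phi(x), computed through erfc to avoid cancellation in the tail."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

rows = []
for x in (0.5, 1.0, 2.0, 4.0, 8.0):
    ratio = normal_upper_tail(x) / normal_pdf(x)      # Mills ratio
    rows.append((x, x / (1.0 + x * x), ratio, 1.0 / x))
    print(rows[-1])
```

Each printed row is (x, lower bound, Mills ratio, upper bound); the two bounds pinch together as x grows.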

10.2 Slepian’s inequality (Slepian 1962)

For centered Gaussian vectors X, Y with E X_i² = E Y_i² and E X_i X_j ≥ E Y_i Y_j ∀i, j: P(sup_i X_i ≤ a) ≥ P(sup_i Y_i ≤ a). More strongly correlated coordinates give a stochastically smaller supremum.

10.3 Sudakov-Fernique (Sudakov 1971, Fernique 1975)

E sup_i X_i ≤ E sup_i Y_i given E(X_i - X_j)² ≤ E(Y_i - Y_j)². Tool for Gaussian process suprema.

10.4 Anderson’s inequality (Anderson 1955)

For symmetric convex set K and centered Gaussian shifted by m: P(K + m) ≤ P(K).

10.5 Borell-TIS / Cirel’son-Ibragimov-Sudakov

Concentration of sup of centered Gaussian processes around its expectation; the underpinning of generic chaining.

10.6 Khintchine inequality (Khintchine 1923)

For Rademacher ε_i and a_i ∈ R: A_p (Σ a_i²)^{1/2} ≤ ‖Σ ε_i a_i‖_p ≤ B_p (Σ a_i²)^{1/2}. Sharp constants (Haagerup 1981).
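For small n the Rademacher moments can be computed exactly by enumerating all sign patterns. The sketch checks the identity E(Σ ε_i a_i)^4 = 3 (Σ a_i²)² - 2 Σ a_i^4, which gives B_4 = 3^{1/4}, and the sharp lower constant A_1 = 1/sqrt(2); the coefficients and function name are made up for illustration.

```python
import math
from itertools import product

def rademacher_moment(a, p):
    """Exact E|sum_i eps_i a_i|^p, averaging over all 2^n sign patterns."""
    total = sum(abs(sum(s * x for s, x in zip(signs, a))) ** p
                for signs in product((-1, 1), repeat=len(a)))
    return total / 2 ** len(a)

a = [3.0, 1.0, 2.0, 0.5]                  # illustrative coefficients
l2_sq = sum(x * x for x in a)
m1 = rademacher_moment(a, 1)
m2 = rademacher_moment(a, 2)              # equals sum a_i^2
m4 = rademacher_moment(a, 4)              # equals 3 (sum a_i^2)^2 - 2 sum a_i^4
print(m1, m2, m4, l2_sq)
```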

10.7 Marcinkiewicz-Zygmund (Marcinkiewicz-Zygmund 1937)

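For independent mean-zero X_i and p ≥ 1: A_p E (Σ X_i²)^{p/2} ≤ E |Σ X_i|^p ≤ B_p E (Σ X_i²)^{p/2}. Khintchine-style bound with general independent summands in place of Rademacher signs.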

10.8 Burkholder-Davis-Gundy (Burkholder 1966, Davis 1970, Gundy 1968)

For martingale M with quadratic variation [M]: ‖sup_t |M_t|‖_p ≍ ‖[M]_∞^{1/2}‖_p, p ≥ 1.

10.9 Doob’s L^p maximal (Doob 1953)

For p > 1 and a non-negative submartingale M (e.g. |M| for a martingale): ‖sup_t M_t‖_p ≤ (p/(p-1)) ‖M_∞‖_p.

10.10 Kolmogorov-Doob maximal

For a submartingale (M_k) and λ > 0: λ P(max_{k ≤ n} M_k ≥ λ) ≤ E M_n^+.

11. Inequalities for matrices, traces, and operators (extended)

11.1 Schatten p-norm Hölder

For Schatten norms ‖A‖_p = (Σ σ_i(A)^p)^{1/p}: ‖AB‖_1 ≤ ‖A‖_p ‖B‖_q, 1/p + 1/q = 1.

11.2 Russo-Dye theorem

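The closed unit ball of a unital C*-algebra is the closed convex hull of its unitaries; consequently a positive linear map Φ satisfies ‖Φ‖ = ‖Φ(I)‖. The related variational formula sup{|tr(A U)| : U unitary} gives the trace norm ‖A‖_1, not ‖A‖_op.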

11.3 Bhatia-Davis variance bound (Bhatia-Davis 2000)

For random X with values in [m, M]: Var(X) ≤ (M - E X)(E X - m).

11.4 Powers-Størmer (Powers-Størmer 1970)

‖√A - √B‖_F² ≤ ‖A - B‖_1 for positive A, B; quantum hypothesis-testing bound.

11.5 Fannes-Audenaert continuity (Fannes 1973, Audenaert 2007)

|S(ρ) - S(σ)| ≤ ε log(d - 1) + h(ε) when (1/2) ‖ρ - σ‖_1 ≤ ε ≤ 1 - 1/d, with h the binary entropy; sharp.

12. Useful identities tied to inequalities

Cauchy-Schwarz ⇔ correlation in [-1, 1]
Hölder p = q = 2 ⇒ Cauchy-Schwarz
Jensen on -log gives KL ≥ 0
Jensen on x log x gives information inequality
Hoeffding via Chernoff + (b-a)²/4 sub-Gaussian
McDiarmid via Azuma applied to Doob martingale
Talagrand convex from Herbst argument + log-Sobolev
Brunn-Minkowski ⇒ isoperimetric ⇒ Sobolev (Maz’ya)
LSI ⇒ Talagrand T_2 ⇒ Gaussian concentration (Otto-Villani)
DPI holds for every f-divergence (convex f with f(1) = 0)
