Measure Theory and Integration

Measure theory provides the rigorous foundation for length, area, volume, probability, and integration. Before Henri Lebesgue’s 1902 thesis Intégrale, longueur, aire there was no satisfactory way to integrate badly behaved functions, no uniform notion of “size” for subsets of the real line, and no rigorous framework for probability. The modern formulation (measurable spaces, $\sigma$-algebras, measures, the Lebesgue integral, and the convergence theorems) solved these problems and reshaped twentieth-century analysis. Andrey Kolmogorov’s 1933 Grundbegriffe der Wahrscheinlichkeitsrechnung used measure theory to give probability its now-standard axiomatic treatment, and modern probability, stochastic calculus, ergodic theory, functional analysis, harmonic analysis, and partial differential equations are all built on it. This note covers $\sigma$-algebras and measurable functions, construction of the Lebesgue measure, the Lebesgue integral, the three convergence theorems, $L^p$ spaces, product measures and Fubini’s theorem, Radon-Nikodym differentiation, Carathéodory’s extension theorem, Lebesgue’s differentiation theorem, absolute continuity and singularity, Hausdorff measure and fractal dimension, and applications to probability and functional analysis.

1. The need for measure theory

1.1 Why Riemann is not enough

The Riemann integral, taught in undergraduate calculus, partitions the domain into intervals and approximates by step functions. It works beautifully for continuous functions on compact intervals but breaks down in three crucial ways:

  1. The class of Riemann-integrable functions is too small. The Dirichlet function $\mathbf{1}_{\mathbb{Q}}$, equal to $1$ on rationals and $0$ on irrationals, is not Riemann-integrable. Yet it is “almost everywhere $0$” and ought to integrate to $0$.

  2. Limit interchange is fragile. If $f_n \to f$ pointwise with $|f_n| \le g$, we want $\int f_n \to \int f$. The classical Riemann-theory result needs uniform convergence; pointwise convergence under domination is handled cleanly only in the Lebesgue framework.

  3. Function spaces are not complete. The space of continuous functions on $[0,1]$ with the $L^2$ norm is incomplete (sequences of continuous functions can converge in $L^2$ to a discontinuous limit). For Hilbert-space methods we need a complete space; Lebesgue theory provides it.

1.2 The “size” problem

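Even before integration, the question “what subsets of $\mathbb{R}$ have a well-defined length?” has no naive answer. We want a set function $\mu$ on subsets of $\mathbb{R}$ satisfying:

  • $\mu([a,b]) = b - a$ (consistency with length).
  • $\mu(A \cup B) = \mu(A) + \mu(B)$ for disjoint $A, B$ (additivity).
  • $\mu(A + x) = \mu(A)$ (translation invariance).

If we demand all of these for every subset, the requirements become inconsistent. With $\sigma$-additivity (countable additivity), no such $\mu$ exists on $\mathbb{R}^n$ for any $n \ge 1$ (Vitali sets). Even with only finite additivity, no such $\mu$ invariant under all isometries (rotations as well as translations) exists on $\mathbb{R}^n$ for $n \ge 3$ (Banach-Tarski paradox, 1924). Both constructions require the axiom of choice. The fix: restrict $\mu$ to a sufficiently rich but not universal class of subsets, the $\sigma$-algebra of measurable sets.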

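2. $\sigma$-algebras and measurable spaces

2.1 Definitions

A $\sigma$-algebra on a set $X$ is a collection $\mathcal{F}$ of subsets of $X$ such that:

  1. $X \in \mathcal{F}$.
  2. If $A \in \mathcal{F}$, then $A^c \in \mathcal{F}$.
  3. If $A_1, A_2, \ldots \in \mathcal{F}$, then $\bigcup_{n=1}^{\infty} A_n \in \mathcal{F}$.

Properties (2) and (3) imply closure under countable intersections via De Morgan. The pair $(X, \mathcal{F})$ is a measurable space; elements of $\mathcal{F}$ are measurable sets.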

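2.2 The Borel $\sigma$-algebra

The smallest $\sigma$-algebra on $\mathbb{R}$ containing all open intervals is the Borel $\sigma$-algebra $\mathcal{B}(\mathbb{R})$. By closure under complements and countable unions and intersections, it equally contains all closed intervals, half-open intervals, open sets, closed sets, $G_\delta$ sets (countable intersections of open sets), $F_\sigma$ sets (countable unions of closed sets), and so on. The construction generalises to any topological space.

The cardinality of $\mathcal{B}(\mathbb{R})$ is $2^{\aleph_0}$ (continuum), the same as $\mathbb{R}$, and far smaller than the cardinality $2^{2^{\aleph_0}}$ of the power set $\mathcal{P}(\mathbb{R})$.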

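2.3 Generated $\sigma$-algebras and $\sigma(\mathcal{C})$

For any collection $\mathcal{C}$ of subsets of $X$, the intersection of all $\sigma$-algebras containing $\mathcal{C}$ is itself a $\sigma$-algebra: the $\sigma$-algebra generated by $\mathcal{C}$, written $\sigma(\mathcal{C})$. The Borel $\sigma$-algebra is $\sigma(\text{open sets})$.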
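On a finite set the generated $\sigma$-algebra can be computed directly, because countable unions reduce to finite ones. The following is a minimal sketch under that finiteness assumption; the function name `generate_sigma_algebra` is illustrative and not taken from any library.

```python
from itertools import combinations

def generate_sigma_algebra(universe, generators):
    """Smallest sigma-algebra on a FINITE universe containing the generators.

    On a finite set it suffices to close under complement and pairwise
    union until no new sets appear.
    """
    universe = frozenset(universe)
    sets = {frozenset(), universe} | {frozenset(g) for g in generators}
    changed = True
    while changed:
        changed = False
        current = list(sets)
        # Close under complements.
        for a in current:
            comp = universe - a
            if comp not in sets:
                sets.add(comp)
                changed = True
        # Close under pairwise unions.
        for a, b in combinations(current, 2):
            u = a | b
            if u not in sets:
                sets.add(u)
                changed = True
    return sets

sigma = generate_sigma_algebra({1, 2, 3, 4}, [{1}, {1, 2}])
print(len(sigma))  # the atoms are {1}, {2}, {3, 4}, so there are 2**3 = 8 sets
```

The points 3 and 4 are never separated by the generators, so `{3}` is not in the result while `{3, 4}` is.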

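2.4 $\pi$-systems and Dynkin’s $\pi$-$\lambda$ theorem

Two measures that agree on a $\pi$-system (a family closed under finite intersections) agree on the entire $\sigma$-algebra it generates, provided they are finite with equal total mass or, more generally, $\sigma$-finite along a sequence of sets from the $\pi$-system. This uniqueness statement follows from Dynkin’s $\pi$-$\lambda$ theorem (a close relative of the monotone class theorem): any $\lambda$-system containing a $\pi$-system contains the $\sigma$-algebra generated by it. It is an indispensable tool for proving measure-theoretic statements by reducing to a generating class.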

2.5 Measurable functions

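A function $f : (X, \mathcal{F}) \to (Y, \mathcal{G})$ is measurable if $f^{-1}(B) \in \mathcal{F}$ for every $B \in \mathcal{G}$. For real-valued $f$, measurability means $\{f > a\} \in \mathcal{F}$ for every $a \in \mathbb{R}$.

Composition of measurable maps (with matching $\sigma$-algebras on the intermediate space) is measurable. Sums, products, $\sup$, $\inf$, $\limsup$, $\liminf$, and pointwise limits of sequences of measurable functions are measurable. This last fact is what distinguishes measurable from continuous functions: continuity is not preserved under pointwise limits, but measurability is.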

3. Measures

3.1 Definition

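A measure on $(X, \mathcal{F})$ is a function $\mu : \mathcal{F} \to [0, \infty]$ with:

  1. $\mu(\emptyset) = 0$.
  2. Countable additivity ($\sigma$-additivity): for disjoint $A_1, A_2, \ldots \in \mathcal{F}$, $\mu\left(\bigcup_{n} A_n\right) = \sum_{n} \mu(A_n)$.

The triple $(X, \mathcal{F}, \mu)$ is a measure space.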

Special cases:

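  • Probability measure: $\mu(X) = 1$.
  • Finite measure: $\mu(X) < \infty$.
  • $\sigma$-finite measure: $X = \bigcup_{n} X_n$ with $\mu(X_n) < \infty$.
  • Signed measure: takes values in $[-\infty, \infty]$ (not both $+\infty$ and $-\infty$).
  • Complex measure: takes values in $\mathbb{C}$, with countable additivity and $|\mu|(X)$ finite.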

3.2 Properties

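For any measure $\mu$:

  • Monotonicity: $A \subseteq B \Rightarrow \mu(A) \le \mu(B)$.
  • Subadditivity: $\mu\left(\bigcup_n A_n\right) \le \sum_n \mu(A_n)$.
  • Continuity from below: $A_n \uparrow A \Rightarrow \mu(A_n) \uparrow \mu(A)$.
  • Continuity from above: $A_n \downarrow A$ and $\mu(A_1) < \infty$ $\Rightarrow \mu(A_n) \downarrow \mu(A)$.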

3.3 Examples

  • Counting measure on $X$: $\mu(A) = |A|$ if $A$ is finite, $\infty$ otherwise.
  • Dirac measure at point $x$: $\delta_x(A) = 1$ if $x \in A$, $0$ otherwise.
  • Lebesgue measure on $\mathbb{R}^n$: the unique translation-invariant Borel measure with $\lambda([0,1]^n) = 1$. Constructed in §4.
  • Radon measure: a Borel measure that is locally finite (finite on compact sets) and inner regular (every Borel set is approximated from inside by compact sets).
  • Gaussian measure on $\mathbb{R}^n$: density $(2\pi)^{-n/2} e^{-|x|^2/2}$ with respect to Lebesgue.
  • Haar measure on a locally compact topological group: the unique (up to scalar) left-translation-invariant Radon measure.

4. Construction of Lebesgue measure

4.1 Outer measure

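Define the Lebesgue outer measure on $\mathbb{R}$:

$$\lambda^*(A) = \inf\left\{ \sum_{k=1}^{\infty} (b_k - a_k) : A \subseteq \bigcup_{k=1}^{\infty} (a_k, b_k) \right\}.$$

The outer measure is defined on all subsets but is only countably subadditive in general, not additive.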

4.2 Carathéodory’s criterion

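Constantin Carathéodory 1914: a set $E$ is Lebesgue measurable if for every set $A$,

$$\lambda^*(A) = \lambda^*(A \cap E) + \lambda^*(A \cap E^c).$$

The collection $\mathcal{L}$ of Lebesgue-measurable sets is a $\sigma$-algebra containing the Borel sets, and the restriction $\lambda = \lambda^*|_{\mathcal{L}}$ is a complete measure. Completeness: every subset of a $\lambda$-null set is $\lambda$-measurable (and has measure zero).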

4.3 Carathéodory extension theorem

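This is the workhorse for constructing measures. Let $\mathcal{A}$ be an algebra (closed under finite unions and complements) on $X$ and $\mu_0 : \mathcal{A} \to [0, \infty]$ a pre-measure (countably additive on $\mathcal{A}$ whenever the countable union remains in $\mathcal{A}$). Then $\mu_0$ extends to a measure on $\sigma(\mathcal{A})$, and the extension is unique provided $\mu_0$ is $\sigma$-finite.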

This is the technique behind constructions of Lebesgue measure, product measures, Haar measure, and many others.

4.4 Lebesgue vs Borel

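The Lebesgue $\sigma$-algebra $\mathcal{L}$ is strictly larger than $\mathcal{B}(\mathbb{R})$. In fact $|\mathcal{L}| = 2^{2^{\aleph_0}}$: a Lebesgue-null set of cardinality continuum, such as the Cantor set, has all of its subsets Lebesgue-measurable. But $|\mathcal{B}(\mathbb{R})| = 2^{\aleph_0}$. So there are Lebesgue-measurable non-Borel sets, indeed $2^{2^{\aleph_0}}$ of them.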

4.5 Non-measurable sets

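Without the axiom of choice (AC), the existence of non-measurable sets cannot be proved: Solovay (1970) showed that ZF + dependent choice + “every set of reals is Lebesgue-measurable” is consistent, provided the existence of an inaccessible cardinal is consistent. With AC, Vitali sets are non-measurable: pick one representative in $[0,1]$ from each equivalence class of $\mathbb{R}/\mathbb{Q}$. If the resulting set $V$ were measurable, the countably many disjoint translates $V + q$, $q \in \mathbb{Q} \cap [-1,1]$, would cover $[0,1]$ and lie inside $[-1,2]$, so translation invariance and countable additivity would force $1 \le \sum_q \lambda(V) \le 3$. This is impossible whether $\lambda(V) = 0$ or $\lambda(V) > 0$, a contradiction.

Banach-Tarski (1924) shows that in $\mathbb{R}^3$ a ball can be partitioned into finitely many pieces and reassembled by rigid motions (the construction uses AC) into two balls of the same size as the original; some of those pieces are necessarily non-measurable.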

5. The Lebesgue integral

5.1 Simple functions

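A simple function is a measurable function taking finitely many values: $\phi = \sum_{i=1}^{n} a_i \mathbf{1}_{A_i}$ with $A_i$ measurable. For $\phi \ge 0$, define $\int \phi \, d\mu = \sum_{i=1}^{n} a_i \, \mu(A_i)$ with the convention $0 \cdot \infty = 0$.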

5.2 Approximation theorem

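Every non-negative measurable function $f$ is the pointwise limit of an increasing sequence of non-negative simple functions:

$$\phi_n = \sum_{k=0}^{n 2^n - 1} \frac{k}{2^n} \, \mathbf{1}_{\{k/2^n \le f < (k+1)/2^n\}} + n \, \mathbf{1}_{\{f \ge n\}}.$$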
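One standard choice of approximating sequence is the dyadic truncation $\phi_n = \min(\lfloor 2^n f \rfloor / 2^n, n)$. The sketch below evaluates it on sampled values of $f$; the function name and the sample function $f(x) = x^2$ are illustrative assumptions, not part of the text above.

```python
import numpy as np

def dyadic_simple_approx(f_values, n):
    """n-th dyadic simple approximation of a non-negative function.

    Rounds f down to the grid of multiples of 2**-n and caps the result at n,
    so the output takes at most n * 2**n + 1 distinct values.
    """
    f_values = np.asarray(f_values, dtype=float)
    return np.minimum(np.floor(f_values * 2**n) / 2**n, n)

x = np.linspace(0.0, 3.0, 7)
f = x**2  # sample non-negative function, values in [0, 9]
for n in (1, 2, 4, 10):
    phi = dyadic_simple_approx(f, n)
    # The maximal gap shrinks to at most 2**-n once n exceeds max f.
    print(n, float(np.max(f - phi)))
```

Each $\phi_n$ lies below $f$, the sequence increases with $n$, and the error is at most $2^{-n}$ wherever $f < n$.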

5.3 Lebesgue integral of non-negative functions

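For $f \ge 0$ measurable,

$$\int f \, d\mu = \sup\left\{ \int \phi \, d\mu : 0 \le \phi \le f,\ \phi \text{ simple} \right\}.$$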

5.4 General Lebesgue integral

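For measurable $f$, write $f = f^+ - f^-$ where $f^+ = \max(f, 0)$ and $f^- = \max(-f, 0)$. Define $\int f \, d\mu = \int f^+ \, d\mu - \int f^- \, d\mu$ provided not both terms are $\infty$. $f$ is integrable if both terms are finite, equivalently $\int |f| \, d\mu < \infty$. The space of integrable functions is $L^1(\mu)$.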

5.5 Properties

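  • Linearity: $\int (af + bg) \, d\mu = a \int f \, d\mu + b \int g \, d\mu$.
  • Monotonicity: $f \le g \Rightarrow \int f \, d\mu \le \int g \, d\mu$.
  • Triangle: $\left| \int f \, d\mu \right| \le \int |f| \, d\mu$.
  • a.e. equality: $f = g$ almost everywhere $\Rightarrow \int f \, d\mu = \int g \, d\mu$.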

5.6 Lebesgue vs Riemann

For bounded $f$ on a compact interval $[a,b]$: $f$ is Riemann-integrable iff $f$ is continuous almost everywhere (Lebesgue’s criterion, 1902). When both integrals exist, they agree. The Lebesgue integral extends to all measurable functions whose absolute value integrates finitely.

Improper Riemann integrals (e.g., $\int_0^\infty \frac{\sin x}{x} \, dx$) can exist when the Lebesgue integral does not: Lebesgue requires absolute integrability.

6. The three convergence theorems

These are the central theorems of integration theory and the reason Lebesgue’s framework dominates.

6.1 Monotone Convergence Theorem (Beppo Levi 1906)

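Let $f_n \ge 0$ be measurable with $f_n \uparrow f$ pointwise (a.e.). Then

$$\int f_n \, d\mu \uparrow \int f \, d\mu.$$

Proof sketch: $\int f_n \, d\mu$ is increasing and bounded above by $\int f \, d\mu$. For the reverse, fix a simple $0 \le \phi \le f$ and $c \in (0,1)$; the sets $E_n = \{f_n \ge c\phi\}$ increase to $X$, so $\lim_n \int f_n \, d\mu \ge \lim_n c \int_{E_n} \phi \, d\mu = c \int \phi \, d\mu$, and letting $c \uparrow 1$ and then taking the supremum over $\phi$ gives $\lim_n \int f_n \, d\mu \ge \int f \, d\mu$.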

6.2 Fatou’s Lemma (Pierre Fatou 1906)

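For non-negative measurable $f_n$:

$$\int \liminf_{n \to \infty} f_n \, d\mu \le \liminf_{n \to \infty} \int f_n \, d\mu.$$

Strict inequality is possible: take $f_n = n \, \mathbf{1}_{(0, 1/n)}$ on $[0,1]$. Then $f_n \to 0$ pointwise but $\int f_n = 1$ for all $n$.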
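The escaping-mass example $f_n = n \, \mathbf{1}_{(0,1/n)}$ can be checked with exact rational arithmetic. This is a small sketch; the helper names `f` and `integral_f` are illustrative, and the integral is computed from the closed form (height times length) rather than numerically.

```python
from fractions import Fraction

def f(n, x):
    """f_n = n times the indicator of the open interval (0, 1/n)."""
    return n if 0 < x < Fraction(1, n) else 0

def integral_f(n):
    """Exact integral of f_n over [0, 1]: height n times length 1/n."""
    return n * Fraction(1, n)

x = Fraction(1, 7)
# At a fixed point the sequence is eventually zero (here from n = 7 on).
print([f(n, x) for n in range(1, 12)])
# The integrals stay equal to 1, so the limit of integrals is 1, not 0.
print([integral_f(n) for n in (1, 10, 1000)])
```

The pointwise limit integrates to $0$ while every $\int f_n$ equals $1$, so the inequality in Fatou's lemma is strict here, and no integrable dominating function can exist.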

6.3 Dominated Convergence Theorem (Henri Lebesgue 1908)

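Let $f_n$ be measurable with $f_n \to f$ a.e. and $|f_n| \le g$ for some integrable $g$. Then $f \in L^1$ and

$$\int f_n \, d\mu \to \int f \, d\mu.$$

Proof sketch: Apply Fatou to $g + f_n \ge 0$ and to $g - f_n \ge 0$. Get $\int f \, d\mu \le \liminf_n \int f_n \, d\mu$ and $\limsup_n \int f_n \, d\mu \le \int f \, d\mu$. Combine.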

This is the workhorse of analysis. Almost every limit-interchange argument — differentiating under the integral, Fourier inversion, weak convergence in PDE — uses DCT or a refinement.

6.4 Vitali’s convergence theorem

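A refinement of DCT, on a finite measure space, replacing the domination hypothesis with uniform integrability:

$$\lim_{M \to \infty} \sup_n \int_{\{|f_n| > M\}} |f_n| \, d\mu = 0.$$

Together with $f_n \to f$ in measure, this gives $f_n \to f$ in $L^1$. Indispensable in probability: sequences of random variables are often uniformly integrable without a uniform dominant.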

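7. $L^p$ spaces

7.1 Definitions

For $1 \le p < \infty$ and measurable $f$:

$$\|f\|_p = \left( \int |f|^p \, d\mu \right)^{1/p}.$$

For $p = \infty$:

$$\|f\|_\infty = \operatorname{ess\,sup} |f| = \inf\{ M : |f| \le M \text{ a.e.} \}.$$

$L^p(\mu)$ is the space of measurable $f$ with $\|f\|_p < \infty$, with functions equal a.e. identified.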

7.2 Hölder’s inequality

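For $1 \le p, q \le \infty$ with $\frac{1}{p} + \frac{1}{q} = 1$ (conjugate exponents):

$$\|fg\|_1 \le \|f\|_p \, \|g\|_q.$$

Special case $p = q = 2$: Cauchy-Schwarz. The endpoint pairs $(p, q) = (1, \infty)$ and $(\infty, 1)$ are included; for $0 < p < 1$ (with $q < 0$) the inequality reverses.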

7.3 Minkowski’s inequality

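For $1 \le p \le \infty$, $\|\cdot\|_p$ satisfies the triangle inequality:

$$\|f + g\|_p \le \|f\|_p + \|g\|_p.$$

For $0 < p < 1$ the inequality fails: the spaces $L^p$ for $0 < p < 1$ are not normed spaces (they remain metric spaces with $d(f,g) = \int |f - g|^p \, d\mu$ but are not locally convex in general).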
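A two-point space with counting measure already shows the failure of the triangle inequality below exponent one. The sketch below uses an illustrative helper `lp_quasinorm`; the choice of $f = (1, 0)$ and $g = (0, 1)$ is an assumption made for the example.

```python
def lp_quasinorm(values, p):
    """(sum |v|^p)^(1/p) for a function on a finite set with counting measure."""
    return sum(abs(v) ** p for v in values) ** (1.0 / p)

f = [1.0, 0.0]
g = [0.0, 1.0]
h = [a + b for a, b in zip(f, g)]  # f + g = (1, 1)

for p in (0.5, 1.0, 2.0):
    lhs = lp_quasinorm(h, p)
    rhs = lp_quasinorm(f, p) + lp_quasinorm(g, p)
    # Triangle inequality holds for p >= 1 and fails for p = 0.5.
    print(p, lhs, rhs, lhs <= rhs)
```

For $p = 1/2$ the left side is $(1 + 1)^2 = 4$ while the right side is $2$, so the triangle inequality fails; for $p = 1$ and $p = 2$ it holds.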

7.4 Riesz-Fischer completeness theorem

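Frigyes Riesz 1907 and Ernst Fischer 1907 (independently): $L^p(\mu)$ is complete (Banach) for $1 \le p \le \infty$. $L^2(\mu)$ is Hilbert with inner product $\langle f, g \rangle = \int f \bar{g} \, d\mu$.

Proof sketch for $1 \le p < \infty$: a Cauchy sequence $(f_n)$ has a subsequence $(f_{n_k})$ with $\|f_{n_{k+1}} - f_{n_k}\|_p \le 2^{-k}$. The series $\sum_k |f_{n_{k+1}} - f_{n_k}|$ converges a.e. by monotone convergence, so $f_{n_k}$ converges a.e. to some $f$. DCT, with dominating function $|f_{n_1}| + \sum_k |f_{n_{k+1}} - f_{n_k}| \in L^p$, shows $f \in L^p$ and $\|f_{n_k} - f\|_p \to 0$; the standard Cauchy-sequence argument then gives convergence of the whole sequence.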

7.5 Duality

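For $1 \le p < \infty$ and $\mu$ $\sigma$-finite: $(L^p)^* \cong L^q$ where $\frac{1}{p} + \frac{1}{q} = 1$, via the pairing

$$\langle f, g \rangle = \int f g \, d\mu.$$

For $p = \infty$ the dual is strictly larger than $L^1$ (it contains finitely additive measures). See functional-analysis.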

7.6 Density

Continuous functions of compact support are dense in $L^p(\mathbb{R}^n)$ for $1 \le p < \infty$ (under Lebesgue measure). Smooth functions are dense via mollification. Step functions are dense. Simple functions are dense.

8. Product measures and Fubini’s theorem

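8.1 Product $\sigma$-algebra and measure

Given measure spaces $(X, \mathcal{F}, \mu)$ and $(Y, \mathcal{G}, \nu)$, the product $\sigma$-algebra $\mathcal{F} \otimes \mathcal{G}$ is generated by rectangles $A \times B$ with $A \in \mathcal{F}$, $B \in \mathcal{G}$. The product measure $\mu \otimes \nu$ is the unique ($\sigma$-finite case) measure on $\mathcal{F} \otimes \mathcal{G}$ with $(\mu \otimes \nu)(A \times B) = \mu(A) \, \nu(B)$. Existence by Carathéodory extension.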

8.2 Fubini’s theorem (Guido Fubini 1907)

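For $f \in L^1(\mu \otimes \nu)$, with $\mu, \nu$ $\sigma$-finite:

$$\int_{X \times Y} f \, d(\mu \otimes \nu) = \int_X \left( \int_Y f(x,y) \, d\nu(y) \right) d\mu(x) = \int_Y \left( \int_X f(x,y) \, d\mu(x) \right) d\nu(y).$$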

8.3 Tonelli’s theorem (Leonida Tonelli 1909)

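For $\sigma$-finite $\mu, \nu$ and measurable $f \ge 0$ on the product:

$$\int_{X \times Y} f \, d(\mu \otimes \nu) = \int_X \left( \int_Y f \, d\nu \right) d\mu = \int_Y \left( \int_X f \, d\mu \right) d\nu,$$

with all three (possibly $\infty$) equal.

Tonelli applies without integrability: it lets you exchange the order of integration for non-negative integrands regardless of whether the answer is finite. The combined Fubini-Tonelli strategy is: prove the integral of $|f|$ is finite via Tonelli, then apply Fubini to $f$ itself.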

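8.4 Counterexamples without $\sigma$-finiteness or integrability

  • $f(x,y) = \frac{x^2 - y^2}{(x^2 + y^2)^2}$ on $(0,1)^2$: not integrable; the two iterated integrals exist but differ ($\pi/4$ when integrating in $y$ first and $-\pi/4$ when integrating in $x$ first).
  • On $\mathbb{N} \times \mathbb{N}$ with counting measure: $a_{m,n} = 1$ if $m = n$, $-1$ if $m = n + 1$, $0$ otherwise. The two iterated sums are $\sum_m \sum_n a_{m,n} = 1$ and $\sum_n \sum_m a_{m,n} = 0$.
  • On $[0,1] \times [0,1]$ with Lebesgue measure in $x$ and counting measure in $y$ (not $\sigma$-finite): for the indicator of the diagonal, integrating in $y$ first gives $1$ and integrating in $x$ first gives $0$.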
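The counting-measure counterexample can be verified directly, because each row and column of the array has at most two non-zero entries, so the inner infinite sums are finite sums. The sketch below assumes indices start at $0$; the function names are illustrative.

```python
def a(m, n):
    """Entries of the double array: +1 on the diagonal, -1 just below it."""
    if m == n:
        return 1
    if m == n + 1:
        return -1
    return 0

def sum_over_n_first(M):
    # For fixed m the non-zero terms sit at n = m and n = m - 1, so letting n
    # run over range(M + 1) captures the full inner sum for every m < M.
    return sum(sum(a(m, n) for n in range(M + 1)) for m in range(M))

def sum_over_m_first(M):
    # For fixed n the non-zero terms sit at m = n and m = n + 1 <= M.
    return sum(sum(a(m, n) for m in range(M + 1)) for n in range(M))

for M in (1, 5, 50):
    print(M, sum_over_n_first(M), sum_over_m_first(M))
```

Summing over $n$ first leaves only the row $m = 0$ with a non-zero total, giving $1$; summing over $m$ first makes every column cancel, giving $0$. Truncating both indices to the same square $\{0, \ldots, M-1\}^2$ would hide the discrepancy, which is why the inner ranges are one step longer.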

9. Radon-Nikodym theorem

9.1 Absolute continuity and singularity

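Two measures $\mu, \nu$ on $(X, \mathcal{F})$:

  • $\nu \ll \mu$ ($\nu$ is absolutely continuous w.r.t. $\mu$) if $\mu(A) = 0 \Rightarrow \nu(A) = 0$.
  • $\mu \perp \nu$ (mutually singular) if there exists a measurable partition $X = A \sqcup B$ with $\mu(A) = 0$ and $\nu(B) = 0$.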

9.2 Radon-Nikodym theorem (Johann Radon 1913, Otto Nikodym 1930)

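Let $\mu, \nu$ be $\sigma$-finite measures on $(X, \mathcal{F})$ with $\nu \ll \mu$. Then there is a non-negative measurable function $f = \frac{d\nu}{d\mu}$ (unique $\mu$-a.e.), the Radon-Nikodym derivative, such that

$$\nu(A) = \int_A f \, d\mu \quad \text{for all } A \in \mathcal{F}.$$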

This is the abstract version of the fundamental theorem of calculus and the cornerstone of statistical theory: probability densities are Radon-Nikodym derivatives with respect to Lebesgue or counting measure.

9.3 Lebesgue decomposition

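For $\sigma$-finite $\mu, \nu$, there is a unique decomposition $\nu = \nu_{ac} + \nu_s$ with $\nu_{ac} \ll \mu$ and $\nu_s \perp \mu$. Combined with Radon-Nikodym: $\nu(A) = \int_A f \, d\mu + \nu_s(A)$. The singular part further decomposes into a discrete (atomic) part and a continuous singular part (e.g., the Cantor measure).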

9.4 Chain rule and change of variables

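If $\nu \ll \mu \ll \lambda$, all $\sigma$-finite:

$$\frac{d\nu}{d\lambda} = \frac{d\nu}{d\mu} \cdot \frac{d\mu}{d\lambda} \quad \lambda\text{-a.e.}$$

For probability: if $X$ has density $p_X$ w.r.t. Lebesgue measure and $Y = g(X)$ for a diffeomorphism $g$, then $p_Y(y) = p_X(g^{-1}(y)) \, |\det Dg^{-1}(y)|$ (change-of-variables / Jacobian formula).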

9.5 Information-theoretic application

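The Kullback-Leibler divergence between $P \ll Q$: $D_{\mathrm{KL}}(P \,\|\, Q) = \int \log \frac{dP}{dQ} \, dP$. If $P \not\ll Q$, define $D_{\mathrm{KL}}(P \,\|\, Q) = +\infty$. See information-theory.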
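For distributions on a finite set, the Radon-Nikodym derivative $dP/dQ$ is the ratio of probability mass functions, and the divergence becomes a finite sum. A minimal sketch, with the illustrative function name `kl_divergence` and natural logarithms assumed:

```python
import math

def kl_divergence(p, q):
    """D(P || Q) = sum_x p(x) * log(p(x) / q(x)) on a finite set.

    p(x) / q(x) plays the role of the Radon-Nikodym derivative dP/dQ.
    Returns math.inf when P is not absolutely continuous w.r.t. Q.
    """
    total = 0.0
    for px, qx in zip(p, q):
        if px == 0.0:
            continue         # convention: 0 * log 0 = 0
        if qx == 0.0:
            return math.inf  # P charges a Q-null point, so P is not << Q
        total += px * math.log(px / qx)
    return total

print(kl_divergence([0.5, 0.5], [0.5, 0.5]))  # identical distributions
print(kl_divergence([1.0, 0.0], [0.5, 0.5]))  # equals log 2
print(kl_divergence([0.5, 0.5], [1.0, 0.0]))  # absolute continuity fails
```

The last call illustrates the asymmetry: $P \ll Q$ can hold in one direction and fail in the other.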

10. Lebesgue differentiation theorem

10.1 Statement

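Henri Lebesgue 1910: for $f \in L^1_{\mathrm{loc}}(\mathbb{R}^n)$ (locally integrable),

$$\lim_{r \to 0} \frac{1}{|B(x,r)|} \int_{B(x,r)} f(y) \, dy = f(x) \quad \text{for a.e. } x.$$

Points $x$ where the stronger statement $\frac{1}{|B(x,r)|} \int_{B(x,r)} |f(y) - f(x)| \, dy \to 0$ holds are the Lebesgue points of $f$. The Lebesgue differentiation theorem says almost every point is a Lebesgue point.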

10.2 Hardy-Littlewood maximal function

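Define

$$Mf(x) = \sup_{r > 0} \frac{1}{|B(x,r)|} \int_{B(x,r)} |f(y)| \, dy.$$

Hardy-Littlewood maximal inequality (1930): for $f \in L^1(\mathbb{R}^n)$ and $\alpha > 0$,

$$|\{x : Mf(x) > \alpha\}| \le \frac{C_n}{\alpha} \|f\|_1.$$

For $1 < p \le \infty$, $\|Mf\|_p \le C_{n,p} \|f\|_p$ (Marcinkiewicz interpolation).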

The maximal inequality and Vitali covering lemma together yield Lebesgue’s differentiation theorem and its generalisation. This is the entry point to real-variable harmonic analysis (Calderón-Zygmund theory).

10.3 Absolute continuity of functions

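A function $F : [a,b] \to \mathbb{R}$ is absolutely continuous if for every $\varepsilon > 0$ there is $\delta > 0$ such that for any finite disjoint family of intervals $(a_k, b_k)$ with $\sum_k (b_k - a_k) < \delta$, we have $\sum_k |F(b_k) - F(a_k)| < \varepsilon$.

Theorem (FTC for Lebesgue): $F$ is absolutely continuous on $[a,b]$ iff $F$ is differentiable a.e., $F' \in L^1([a,b])$, and

$$F(x) - F(a) = \int_a^x F'(t) \, dt \quad \text{for all } x \in [a,b].$$

The Cantor function (devil’s staircase) is continuous, monotone, has $F' = 0$ a.e., yet $F(1) - F(0) = 1$. It is not absolutely continuous, so the absolute-continuity assumption is essential.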

10.4 Functions of bounded variation

$F : [a,b] \to \mathbb{R}$ has bounded variation if $\sup \sum_k |F(x_k) - F(x_{k-1})| < \infty$, the supremum taken over all partitions $a = x_0 < x_1 < \cdots < x_m = b$. BV functions are differences of two monotone functions (Jordan decomposition). Every BV function is differentiable a.e. (Lebesgue). The BV norm relates to the total variation of the associated signed Borel measure.

11. Signed and complex measures

11.1 Signed measures

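A signed measure $\nu$ on $(X, \mathcal{F})$ takes values in $[-\infty, \infty]$ (never both signs of infinity) and is $\sigma$-additive.

Hahn decomposition theorem: $X = P \sqcup N$ with $\nu \ge 0$ on measurable subsets of $P$ and $\nu \le 0$ on measurable subsets of $N$ (unique up to $\nu$-null sets).

Jordan decomposition: $\nu = \nu^+ - \nu^-$ with $\nu^+, \nu^-$ mutually singular. Both are uniquely determined. The total variation measure is $|\nu| = \nu^+ + \nu^-$, and $\|\nu\| = |\nu|(X)$.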
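On a finite set a signed measure is just a table of point weights, and the Hahn and Jordan decompositions can be read off from the signs. The following sketch assumes that representation (a dict mapping points to weights); the names `jordan_decomposition` and `mass` are illustrative.

```python
def jordan_decomposition(weights):
    """Hahn sets and Jordan parts of a signed measure on a finite set.

    `weights` maps each point to its (possibly negative) mass.
    Returns (P, N, nu_plus, nu_minus) with nu = nu_plus - nu_minus.
    """
    positive_set = {x for x, w in weights.items() if w >= 0}
    negative_set = set(weights) - positive_set
    nu_plus = {x: max(w, 0.0) for x, w in weights.items()}
    nu_minus = {x: max(-w, 0.0) for x, w in weights.items()}
    return positive_set, negative_set, nu_plus, nu_minus

def mass(density, subset):
    """Measure of `subset` under the measure with the given point weights."""
    return sum(density[x] for x in subset)

nu = {"a": 2.0, "b": -3.0, "c": 0.5}
P, N, nu_plus, nu_minus = jordan_decomposition(nu)
total_variation = mass(nu_plus, nu) + mass(nu_minus, nu)
print(sorted(P), sorted(N), total_variation)
```

Points of zero weight could be placed in either Hahn set, which is the finite-set version of uniqueness only up to null sets.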

11.2 Complex measures

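Complex measures take values in $\mathbb{C}$ with $|\nu|(X)$ finite. The total variation is defined by $|\nu|(A) = \sup \sum_i |\nu(A_i)|$, the supremum over countable measurable partitions $(A_i)$ of $A$. The space of complex measures with the total variation norm is a Banach space.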

11.3 Riesz-Markov-Kakutani representation

Frigyes Riesz 1909, Andrey Markov 1938, Shizuo Kakutani 1941: the dual of $C_0(X)$ for locally compact Hausdorff $X$ is isometrically isomorphic to the space of finite signed (resp. complex) regular Borel measures on $X$. Every bounded linear functional $\Lambda$ on $C_0(X)$ has the form $\Lambda f = \int_X f \, d\mu$ for a unique such $\mu$, with $\|\Lambda\| = |\mu|(X)$.

This is the bridge from functional analysis to measure theory and the foundation of the modern theory of distributions and weak* convergence of measures.

12. Hausdorff measure and dimension

12.1 Hausdorff outer measure

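Felix Hausdorff 1918: for $s \ge 0$ and $\delta > 0$,

$$\mathcal{H}^s_\delta(A) = \inf\left\{ \sum_{i} (\operatorname{diam} U_i)^s : A \subseteq \bigcup_i U_i,\ \operatorname{diam} U_i < \delta \right\}.$$

Then $\mathcal{H}^s(A) = \lim_{\delta \to 0} \mathcal{H}^s_\delta(A)$; the limit exists because $\mathcal{H}^s_\delta(A)$ increases as $\delta$ decreases. This is an outer measure on any metric space for which all Borel sets are measurable.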

12.2 Hausdorff dimension

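For each set $A$ there is a critical value $\dim_H A$ such that $\mathcal{H}^s(A) = \infty$ for $s < \dim_H A$ and $\mathcal{H}^s(A) = 0$ for $s > \dim_H A$. At $s = \dim_H A$ the value can be anything in $[0, \infty]$.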

12.3 Examples

  • .
  • .
  • .
  • .
  • $\dim_H B[0,1] = 2$ a.s. for the path of Brownian motion $B$ (in $\mathbb{R}^d$ for $d \ge 2$).
  • a.s.

For self-similar sets satisfying the open-set condition (Hutchinson 1981), $\dim_H$ equals the similarity dimension $\frac{\log N}{\log(1/r)}$ for $N$ copies scaled by ratio $r$.
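The similarity dimension for $N$ copies at a common ratio $r$ is $\log N / \log(1/r)$, which is simple to compute. A small sketch follows; the function name is illustrative, and the three sets used as inputs are standard textbook examples chosen here as assumptions, not taken from the list above.

```python
import math

def similarity_dimension(num_copies, ratio):
    """log N / log(1/r) for N copies, each scaled by the same ratio 0 < r < 1."""
    if num_copies < 1 or not (0.0 < ratio < 1.0):
        raise ValueError("need N >= 1 and 0 < r < 1")
    return math.log(num_copies) / math.log(1.0 / ratio)

# Middle-thirds Cantor set: 2 copies at ratio 1/3.
print(similarity_dimension(2, 1 / 3))
# Koch curve: 4 copies at ratio 1/3.
print(similarity_dimension(4, 1 / 3))
# Sierpinski triangle: 3 copies at ratio 1/2.
print(similarity_dimension(3, 1 / 2))
```

All three examples satisfy the open-set condition, so these values coincide with their Hausdorff dimensions.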

12.4 Hausdorff vs Lebesgue

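$\mathcal{H}^n$ on $\mathbb{R}^n$ equals Lebesgue measure up to a normalising constant: with the unnormalised definition via $\sum_i (\operatorname{diam} U_i)^s$, $\mathcal{H}^n = \frac{2^n}{\omega_n} \lambda^n$, where $\omega_n$ is the volume of the unit ball (so the constant is the reciprocal of the volume of a ball of diameter $1$). On lower-dimensional surfaces $\mathcal{H}^k$ gives intrinsic $k$-dimensional area.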

13. Application to probability

13.1 Kolmogorov axiomatization

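Andrey Kolmogorov 1933: a probability space is $(\Omega, \mathcal{F}, P)$ with $P(\Omega) = 1$. A random variable is a measurable function $X : \Omega \to \mathbb{R}$. The distribution of $X$ is the pushforward $P_X = P \circ X^{-1}$ on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$. The expectation is $E[X] = \int_\Omega X \, dP$ when this is defined.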

13.2 Independence

Events $A, B$ are independent if $P(A \cap B) = P(A) P(B)$. Random variables $X, Y$ are independent if $\sigma(X)$ and $\sigma(Y)$ are independent $\sigma$-algebras, equivalently the joint distribution on $\mathbb{R}^2$ is the product of marginals.

13.3 Conditional expectation

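For $X \in L^1(\Omega, \mathcal{F}, P)$ and sub-$\sigma$-algebra $\mathcal{G} \subseteq \mathcal{F}$, the conditional expectation $E[X \mid \mathcal{G}]$ is the unique (a.s.) $\mathcal{G}$-measurable random variable with $\int_G E[X \mid \mathcal{G}] \, dP = \int_G X \, dP$ for all $G \in \mathcal{G}$. Existence by Radon-Nikodym (applied to the signed measure $\nu(G) = \int_G X \, dP$ on $\mathcal{G}$, which is absolutely continuous w.r.t. $P|_{\mathcal{G}}$).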

13.4 Modes of convergence

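For random variables $X_n, X$ on $(\Omega, \mathcal{F}, P)$:

  • Almost sure: $X_n \to X$ a.s. if $P(X_n \to X) = 1$.
  • In probability: $X_n \to X$ if $P(|X_n - X| > \varepsilon) \to 0$ for all $\varepsilon > 0$.
  • In $L^p$: $E|X_n - X|^p \to 0$.
  • In distribution: $E f(X_n) \to E f(X)$ for all bounded continuous $f$.

Implications: a.s. $\Rightarrow$ in probability $\Rightarrow$ in distribution; $L^p$ $\Rightarrow$ in probability. Converses fail in general.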

13.5 Strong law of large numbers

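For iid $X_1, X_2, \ldots$ with $E|X_1| < \infty$:

$$\frac{1}{n} \sum_{i=1}^{n} X_i \to E[X_1] \quad \text{a.s.}$$

Andrey Kolmogorov 1933.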
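A quick simulation illustrates the almost-sure convergence of sample means. This is only a sketch: the exponential distribution with mean $1$, the seed, and the sample size are arbitrary choices for illustration, and a simulation can suggest but not prove the theorem.

```python
import random

def running_means(samples):
    """Return the sequence of partial averages S_n / n."""
    means, total = [], 0.0
    for n, x in enumerate(samples, start=1):
        total += x
        means.append(total / n)
    return means

rng = random.Random(0)  # fixed seed, chosen arbitrarily for reproducibility
samples = [rng.expovariate(1.0) for _ in range(100_000)]  # iid Exp(1), mean 1
means = running_means(samples)

# Averages after 100, 10_000 and 100_000 draws; they should settle near 1.
print(means[99], means[9_999], means[-1])
```

The printed averages fluctuate early and settle near the true mean as $n$ grows, with typical deviations of order $1/\sqrt{n}$.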

13.6 Central limit theorem

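For iid $X_1, X_2, \ldots$ with mean $\mu$ and finite variance $\sigma^2 > 0$:

$$\frac{\sum_{i=1}^{n} X_i - n\mu}{\sigma \sqrt{n}} \to N(0,1) \quad \text{in distribution.}$$

Pierre-Simon Laplace 1812 (special cases), Aleksandr Lyapunov 1901, Jarl Lindeberg 1922, Paul Lévy 1925.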

14. Application to functional analysis

14.1 Sobolev and BMO

Sobolev spaces $W^{k,p}$ require Lebesgue integration. The space BMO (John-Nirenberg 1961) of locally integrable functions with bounded mean oscillation is the natural endpoint for many singular-integral arguments. See functional-analysis.

14.2 Distributions and weak derivatives

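The space of distributions $\mathcal{D}'$ includes $L^1_{\mathrm{loc}}$ via $f \mapsto \left( \phi \mapsto \int f \phi \, dx \right)$. The weak derivative $\partial_i f$ is defined via the measure-theoretic integration by parts: $\int (\partial_i f) \, \phi \, dx = -\int f \, \partial_i \phi \, dx$ for test functions $\phi \in C_c^\infty$.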

14.3 Spectral measures

The spectral theorem for bounded self-adjoint operators on a Hilbert space realises every such operator as $A = \int_{\sigma(A)} \lambda \, dE(\lambda)$ for a projection-valued measure $E$. Functional calculus follows by integrating against $E$: $f(A) = \int f(\lambda) \, dE(\lambda)$.

15. Open questions and current themes

15.1 Existence of “good” measures

Banach measure problem: does there exist a finitely additive measure on $\mathcal{P}(\mathbb{R}^n)$ extending Lebesgue measure and invariant under isometries? Solution: yes for $n \le 2$ (Banach 1923; later explained by amenability of the isometry group), no for $n \ge 3$ (Banach-Tarski).

15.2 Measurable cardinals

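An uncountable cardinal $\kappa$ is measurable if there is a non-trivial $\kappa$-additive 2-valued measure on $\mathcal{P}(\kappa)$. The existence of such a cardinal cannot be proved in ZFC, and the assumption has strictly greater consistency strength than ZFC (Tarski-Hanf). Connected to large cardinal axioms and inner model theory.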

15.3 Geometric measure theory

Federer 1969 Geometric Measure Theory — currents, varifolds, rectifiable sets. Modern uses: minimal surface regularity (Allard 1972), optimal transport (Brenier 1991, McCann 1995, Villani’s books), Plateau’s problem, mean curvature flow weak solutions.

15.4 Optimal transport

Kantorovich-Rubinstein-Wasserstein metrics on measures. Monge problem (1781), Kantorovich relaxation (1942), modern Brenier-McCann-Villani theory. The Wasserstein-2 distance turns the space of probability measures with finite second moment into a geodesic metric space (Otto 2001, Sturm-Lott-Villani 2005-2009 for Ricci-curvature lower bounds on metric measure spaces).

15.5 Non-commutative measure theory

Von Neumann algebras and the theory of traces extend measure theory to operator algebras. A trace on a von Neumann algebra is the non-commutative analogue of integration; type $\mathrm{II}_1$ factors carry a unique normalised trace, type $\mathrm{III}$ factors carry none. Connes’ classification of factors (Fields Medal 1982) is the apex.

16. Worked examples and counterexamples

16.1 The Cantor function

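Construct: $F_0(x) = x$ on $[0,1]$. At step $n$, replace $F_{n-1}$ on each of the $2^{n-1}$ middle thirds removed at that step with a constant equal to the average of the values of $F_{n-1}$ at the endpoints of the parent interval, and interpolate linearly on the remaining intervals. The uniform limit $F$ is continuous, monotone increasing, $F(0) = 0$, $F(1) = 1$, $F' = 0$ a.e. (it is constant on each removed interval, and the removed intervals have total Lebesgue measure $1$). The associated Borel measure defined by $\mu((a,b]) = F(b) - F(a)$ is singular with respect to Lebesgue measure and concentrated on the Cantor set.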
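The Cantor function can also be evaluated through ternary digits: read the digits of $x$ in base 3, stop at the first digit $1$, replace each $2$ by $1$, and interpret the result in base 2. The sketch below implements that equivalent description in floating point; the function name and the `depth` cutoff are illustrative assumptions, and results are accurate only up to rounding.

```python
def cantor_function(x, depth=40):
    """Approximate the Cantor function F(x) for 0 <= x <= 1 via ternary digits.

    A digit 1 means x lies in a removed middle third, where F is constant,
    so the binary expansion of F(x) terminates there with a final 1.
    """
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must lie in [0, 1]")
    if x == 1.0:
        return 1.0
    value, scale = 0.0, 0.5
    for _ in range(depth):
        x *= 3.0
        digit = int(x)      # next ternary digit: 0, 1 or 2
        x -= digit
        if digit == 1:
            return value + scale
        value += scale * (digit // 2)  # ternary 2 becomes binary 1
        scale *= 0.5
    return value

print(cantor_function(0.5))   # inside the first removed third, where F = 1/2
print(cantor_function(0.25))  # 1/4 = 0.0202..._3 maps to 0.0101..._2 = 1/3
```

The value $F(1/4) = 1/3$ shows that $F$ is not locally constant at every point: $1/4$ belongs to the Cantor set, where all of the increase of $F$ takes place.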

16.2 Vitali-Hahn-Saks theorem

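If $\nu_n$ are finite signed measures on $(X, \mathcal{F})$, all absolutely continuous w.r.t. $\mu$, with $\lim_n \nu_n(A)$ existing for every $A \in \mathcal{F}$, then the limit defines a signed measure $\nu \ll \mu$. This is the measure-theoretic analogue of the Banach-Steinhaus theorem.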

16.3 Egorov’s theorem (Dmitri Egorov 1911)

If $f_n \to f$ a.e. on a finite-measure set $E$, then for every $\varepsilon > 0$ there is $A \subseteq E$ with $\mu(E \setminus A) < \varepsilon$ and $f_n \to f$ uniformly on $A$. “Almost every pointwise convergence is almost-uniform.”

16.4 Lusin’s theorem (Nikolai Luzin 1912)

For $f$ measurable on $[a,b]$ and $\varepsilon > 0$, there is a closed set $K \subseteq [a,b]$ with $\lambda([a,b] \setminus K) < \varepsilon$ and $f|_K$ continuous. “Almost every measurable function is almost continuous.”

16.5 Riesz-Thorin interpolation

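Marcel Riesz 1927, Olof Thorin 1948: if a linear operator $T$ is bounded from $L^{p_0} \to L^{q_0}$ with norm $M_0$ and from $L^{p_1} \to L^{q_1}$ with norm $M_1$, then it is bounded from $L^{p_\theta} \to L^{q_\theta}$ for all $\theta \in (0,1)$, where $\frac{1}{p_\theta} = \frac{1-\theta}{p_0} + \frac{\theta}{p_1}$ and similarly for $q_\theta$. The bound interpolates logarithmically: $\|T\|_{L^{p_\theta} \to L^{q_\theta}} \le M_0^{1-\theta} M_1^{\theta}$. Indispensable for Fourier multipliers, singular integrals, and many PDE estimates.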

17. Connections to other libraries

  • Probability and statistics: every modern probability text (Williams Probability with Martingales, Billingsley Probability and Measure, Durrett, Klenke, Kallenberg) is measure theory plus probability-specific constructions.
  • Stochastic processes: filtered probability spaces, martingales, stochastic integrals; see stochastic-calculus.
  • Functional analysis: $L^p$ spaces, duality, weak/weak* topology, Sobolev spaces; see functional-analysis.
  • Harmonic analysis: Fourier transforms on $L^p$ for $1 \le p \le 2$, singular integrals, Hardy-Littlewood-Sobolev inequality; see fft-spectral.
  • PDE theory: weak solutions, Sobolev spaces, distributional derivatives; see pde-methods.
  • Information theory: differential entropy, KL divergence, mutual information; see information-theory.
  • Optimal transport and machine learning: Wasserstein-GAN, sliced Wasserstein, gradient flows in the Wasserstein space; see _index.
  • Ergodic theory: measure-preserving dynamical systems, Birkhoff’s ergodic theorem (1931), entropy of dynamical systems.

Further reading

  • Royden, H. L. and P. M. Fitzpatrick. 2010. Real Analysis (4th ed.). Pearson.
  • Folland, G. B. 1999. Real Analysis: Modern Techniques and Their Applications (2nd ed.). Wiley.
  • Rudin, W. 1987. Real and Complex Analysis (3rd ed.). McGraw-Hill.
  • Bogachev, V. I. 2007. Measure Theory (two volumes). Springer.
  • Halmos, P. R. 1950. Measure Theory. Springer.
  • Billingsley, P. 1995. Probability and Measure (3rd ed.). Wiley.
  • Durrett, R. 2019. Probability: Theory and Examples (5th ed.). Cambridge.
  • Williams, D. 1991. Probability with Martingales. Cambridge.
  • Stein, E. M. and R. Shakarchi. 2005. Real Analysis: Measure Theory, Integration, and Hilbert Spaces. Princeton Lectures III.
  • Federer, H. 1969. Geometric Measure Theory. Springer.
  • Mattila, P. 1995. Geometry of Sets and Measures in Euclidean Spaces. Cambridge.
  • Tao, T. 2011. An Introduction to Measure Theory. AMS GSM 126.
  • Villani, C. 2008. Optimal Transport: Old and New. Springer.
  • Kallenberg, O. 2021. Foundations of Modern Probability (3rd ed.). Springer.
