Probability Frameworks — Cross-Cutting Comparison

This note compares the philosophical and operational frameworks for reasoning under uncertainty — frequentist, Bayesian (subjective + objective), likelihoodist, information-theoretic, causal, measure-theoretic, max-entropy — across every probability/statistics note in the Math library. Read the dimension tables first; the closing decision tree picks a framework by data size, prior availability, decision need, and reporting requirement.

1. The eight frameworks

foundational          inferential                    decision / pragmatic
  |                       |                                |
measure-theoretic       frequentist                     Bayesian decision theory
(Kolmogorov 1933)       (Neyman-Pearson)                (Lindley, Berger)
                        (Fisher likelihood + MLE)
Cox theorem             Bayesian (subjective)           causal (Pearl do-calc, Rubin PO)
(Cox 1946 →             (Jeffreys, de Finetti, Savage)
Jaynes 1957 max-ent)
                        likelihoodist
                        (Edwards 1972, Royall 1997)
                        information-theoretic
                        (AIC, BIC, WAIC, LOO, DIC)

2. The frameworks defined

Frequentist

Probability = long-run frequency of an event under repeated trials.
Parameters = fixed, unknown constants.
Inference = construct estimators (MLE, MoM) and pivot statistics; report confidence intervals and p-values.
Founders: Fisher (likelihood, MLE, sufficiency, ANOVA), Neyman + Pearson (hypothesis testing 1933), Neyman (confidence intervals 1937), Wald (decision theory 1939).
Strengths: well-developed asymptotic theory, doesn’t require priors, dominant in regulatory science (FDA, EMA), reproducible — anyone with the data + model gets the same p-value.
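Weaknesses: p-values are widely misinterpreted (ASA 2016 statement on p-values), a 95% confidence interval is not a 95% probability statement about the parameter given the realized interval, repeated looks at accumulating data need explicit correction (alpha-spending), and guarantees hold on average over hypothetical repetitions rather than conditionally on the observed data.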
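The frequentist outputs listed above (pivot statistic, p-value, confidence interval) can be sketched for the simplest case, a normal mean with known variance. This is a minimal illustration, not a general-purpose test: the function name `z_test` and the sample numbers are assumptions made for the example.

```python
from math import erfc, sqrt
from statistics import NormalDist


def z_test(sample_mean, sigma, n, mu0=0.0, level=0.95):
    """Two-sided z-test and confidence interval for a normal mean, sigma known."""
    se = sigma / sqrt(n)                          # standard error of the mean
    z = (sample_mean - mu0) / se                  # pivot statistic
    p_value = erfc(abs(z) / sqrt(2.0))            # equals 2 * (1 - Phi(|z|))
    z_crit = NormalDist().inv_cdf(0.5 + level / 2.0)
    ci = (sample_mean - z_crit * se, sample_mean + z_crit * se)
    return z, p_value, ci


# Hypothetical numbers: the same tiny effect (0.01 sigma) at two sample sizes.
small = z_test(sample_mean=0.01, sigma=1.0, n=100)
large = z_test(sample_mean=0.01, sigma=1.0, n=1_000_000)
print("n = 100:      ", small)
print("n = 1,000,000:", large)
```

The effect is identical in both calls; only n changes. The second p-value is vanishingly small even though the effect is negligible, which is why the effect size and interval should be reported alongside any p-value.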

Bayesian (subjective)

Probability = degree of belief; subjective.
Parameters = random variables with prior distributions.
Inference = posterior ∝ likelihood × prior; report credible intervals, posterior summaries.
Founders: Bayes 1763 (posthumous), Laplace 1812, Jeffreys 1939, de Finetti 1937 (exchangeability), Savage 1954 (foundations of statistics, axioms of rational choice).
Strengths: handles small data + prior knowledge naturally, gives full posterior distributions (not point estimates), nests sequential analysis trivially (today’s posterior = tomorrow’s prior), coherent under Dutch-book / de Finetti.
Weaknesses: prior choice can be controversial, computation often expensive (MCMC), sensitivity to misspecified priors, harder to communicate to non-statisticians.
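The claim that sequential analysis comes for free (today's posterior is tomorrow's prior) is easiest to see in a conjugate model. The sketch below assumes a Beta prior on a binomial proportion; the prior parameters and the data split are hypothetical.

```python
def beta_update(a, b, successes, failures):
    """Conjugate update: Beta(a, b) prior plus binomial data gives a Beta posterior."""
    return a + successes, b + failures


def beta_mean(a, b):
    return a / (a + b)


prior = (2.0, 2.0)                      # hypothetical weakly informative prior

# Batch: all 10 trials at once (7 successes, 3 failures).
batch = beta_update(*prior, successes=7, failures=3)

# Sequential: day 1 sees 4 successes / 1 failure, day 2 sees 3 / 2.
day1 = beta_update(*prior, successes=4, failures=1)
day2 = beta_update(*day1, successes=3, failures=2)   # day-1 posterior is the day-2 prior

print(batch, day2, beta_mean(*batch))
```

Both routes end at the same Beta(9, 5) posterior, so no correction for the intermediate look is needed within the Bayesian calculation itself.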

Bayesian (objective / reference)

Prior = chosen by rule, not subjective belief. Jeffreys prior (1946, ∝ √det I(θ)), reference prior (Bernardo 1979 / Berger-Bernardo 1992), maximum-entropy prior (Jaynes 1957).
Aims to be “non-informative” or “least-informative” subject to constraints.
Often improper (∫ prior = ∞); only valid if posterior is proper.
Used when subjective prior is unavailable or controversial (regulatory science).
Empirical Bayes (Robbins 1956; Efron 2010): estimate the prior's hyperparameters from the data; pragmatic, but it uses the data twice and so breaks strict Bayesian coherence.
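As a worked instance of the Jeffreys rule quoted above (prior ∝ √det I(θ)), take a single Bernoulli(θ) observation. The model is chosen here purely for illustration.

```latex
\begin{aligned}
\log p(x \mid \theta) &= x \log\theta + (1 - x)\log(1 - \theta), \qquad x \in \{0, 1\} \\
I(\theta) &= -\,\mathbb{E}\!\left[\frac{\partial^2}{\partial\theta^2}\log p(x \mid \theta)\right]
           = \frac{1}{\theta} + \frac{1}{1 - \theta}
           = \frac{1}{\theta(1 - \theta)} \\
\pi_J(\theta) &\propto \sqrt{I(\theta)} = \theta^{-1/2}(1 - \theta)^{-1/2}
\end{aligned}
```

This is the kernel of a Beta(1/2, 1/2) density, which is proper. The same rule applied to a normal location parameter gives a flat prior, which is improper and therefore needs the posterior-propriety check mentioned above.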

Likelihoodist

Probability = sampling model only; evidence = the relative support the data provide for one parameter value over another, measured by the likelihood.
Inference = report likelihood functions / likelihood ratios; no priors, no error rates.
Founders: Edwards (Likelihood 1972), Royall (Statistical Evidence 1997), Hacking (Logic of Statistical Inference 1965).
Strengths: free of prior choice, free of Type-I/Type-II error framework, clean philosophical position.
Weaknesses: doesn’t give a probability of hypothesis, niche adoption — mostly philosophy of statistics.
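A likelihoodist report is just a likelihood ratio between two parameter values. The sketch below assumes binomial data; the counts and the two candidate values of θ are hypothetical.

```python
from math import comb, log


def binom_likelihood(theta, k, n):
    """Likelihood of theta given k successes in n Bernoulli trials."""
    return comb(n, k) * theta**k * (1.0 - theta) ** (n - k)


k, n = 7, 10                                    # hypothetical data
lr = binom_likelihood(0.7, k, n) / binom_likelihood(0.5, k, n)
print(f"LR(0.7 vs 0.5) = {lr:.3f}, log LR = {log(lr):.3f}")
```

The ratio says how much better θ = 0.7 predicts the data than θ = 0.5, with no prior and no error rate attached. Royall's conventional benchmarks treat ratios of 8 and 32 as fairly strong and strong evidence, so a ratio near 2.3 counts as weak.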

Information-theoretic / predictive

Probability = framework-neutral; the question is model selection by predictive accuracy.
Inference = compute information criteria: AIC (Akaike 1974), BIC (Schwarz 1978), DIC (Spiegelhalter et al 2002), WAIC (Watanabe 2010), LOO-CV (leave-one-out).
Founders: Akaike, Schwarz, Burnham + Anderson (Model Selection and Multimodel Inference 2002), Watanabe.
Strengths: model comparison without nested-hypothesis machinery, handles non-nested models, predictive focus (rather than truth-of-hypothesis), AIC ≈ leave-one-out cross-validation (asymptotically; Stone 1977), BIC ≈ log marginal likelihood (asymptotically).
Weaknesses: BIC’s assumption of “true model in candidate set” rarely holds; AIC’s 2k penalty is an asymptotic approximation and under-penalizes when n is small relative to k (AICc corrects for this).

Bayesian decision theory

Probability + utility function → choose action that maximizes expected utility.
Inference + action combined.
Founders: Wald 1950 (Statistical Decision Functions), Lindley 1972, Berger 1985 (Statistical Decision Theory).
Strengths: directly addresses “what should I do”, coherent under Savage axioms, integrates uncertainty + losses, nests all of frequentist’s “tests + decisions”.
Weaknesses: need to specify utility function (often more controversial than prior).
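The rule "choose the action that maximizes expected utility" can be written out in a few lines. The posterior probabilities, action names, and utility values below are all hypothetical.

```python
# Hypothetical posterior over two states and a hypothetical utility table.
posterior = {"effective": 0.7, "ineffective": 0.3}
utility = {
    "launch": {"effective": 100.0, "ineffective": -200.0},
    "hold": {"effective": 0.0, "ineffective": 0.0},
}


def expected_utility(action):
    """Average the utility of an action over the posterior on states."""
    return sum(posterior[s] * utility[action][s] for s in posterior)


best = max(utility, key=expected_utility)
print({a: expected_utility(a) for a in utility}, "->", best)
```

With these numbers launching has expected utility 10 against 0 for holding. Lowering the posterior probability of "effective" to 0.6 flips the decision, which shows how the utility table and the posterior jointly drive the action.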

Causal inference (Pearl / Rubin)

Probability of an intervention ≠ probability of an observation. Layer separation: observation, intervention, counterfactual.
Two main schools:
- Pearl’s structural causal models (SCM) — DAGs + do-calculus + counterfactuals (Pearl 1995, Causality 2000, ACM Turing 2011).
- Rubin’s potential outcomes (Neyman-Rubin) — Y(0), Y(1) potential outcomes; SUTVA; propensity-score methods (Rubin 1974, 1978).
The two schools are mostly equivalent (Pearl 2009 has the mapping).
Founders: Sewall Wright (path analysis 1921), Neyman (1923 thesis), Rubin 1974, Pearl 1988+, Spirtes-Glymour-Scheines 1993 (PC algorithm).
Strengths: only framework that handles “what if?” without RCT data, integrates with ML (double-ML, causal forests, X-learner), supports identifiability theorems.
Weaknesses: requires assumptions about confounding (ignorability, exchangeability), often not testable from data alone, requires graphical model expertise.

Measure-theoretic (Kolmogorov axioms)

Probability = measure on a σ-algebra. Foundational, not inferential.
Founders: Borel, Lebesgue, Kolmogorov (Grundbegriffe 1933).
Used as the substrate for every framework above. Required for stochastic-process work (Brownian motion, Markov chains).
Does not by itself prescribe inference — that comes from the framework layered on top.

Cox/Jaynes (subjective derivation)

Cox 1946 — derives Bayesian probability from desiderata of rational belief (consistency, completeness).
Jaynes — extends with maximum entropy principle: among priors consistent with constraints, pick the one with maximum entropy (“least committed”).
This is the philosophical argument that Bayesian probability is the unique consistent calculus of plausible reasoning, rather than one modelling choice among several.

3. Frequentist vs Bayesian — the deciding axes

Axis	Frequentist	Bayesian
What’s random?	data (under fixed θ)	θ (given fixed data)
Prior	none (or improper as null hypothesis)	required
Output	point estimate + confidence interval + p-value	full posterior distribution
Sequential analysis	requires alpha-spending (O’Brien-Fleming, Pocock)	trivial — posterior updates
Small-data behavior	breaks (n=1 → no SE)	works (prior provides regularization)
Big-data behavior	works	works (posterior concentrates)
Computation	closed-form often available	MCMC / VI / Laplace often needed
Communication	“p < 0.05”, familiar to regulators	“P(θ > 0 | data)”
Hypothesis testing	Neyman-Pearson lemma + UMP tests	Bayes factor (Kass-Raftery 1995), posterior odds
Multi-comparisons	Bonferroni, FDR (Benjamini-Hochberg 1995), Holm	hierarchical shrinkage, posterior pooling
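The frequentist multiplicity corrections in the last row of the table can be compared directly. Below is a minimal sketch of the Benjamini-Hochberg step-up procedure next to Bonferroni; the p-values are hypothetical.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return booleans marking which hypotheses are rejected at FDR level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])   # indices by ascending p
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:                 # step-up threshold
            k_max = rank                                  # keep the largest passing rank
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]   # hypothetical p-values
bh = benjamini_hochberg(p)
bonferroni = [pv <= 0.05 / len(p) for pv in p]
print("BH rejections:", sum(bh), "Bonferroni rejections:", sum(bonferroni))
```

Bonferroni controls the family-wise error rate and is the more conservative of the two; Benjamini-Hochberg controls the expected share of false discoveries and so rejects at least as many hypotheses.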

4. Map every Math note to a framework

Math note	Primary framework	Notes
probability-fundamentals	measure-theoretic substrate	both views presented
probability-distributions	measure-theoretic	distribution-by-distribution
hypothesis-testing-mle	frequentist	Neyman-Pearson, MLE, Wald/LRT/score tests
bayesian-inference	Bayesian (subjective + objective)	priors, posteriors, posterior predictive, hierarchical models
causal-inference	causal (Pearl + Rubin)	do-calculus, PSM, IV, double-ML
mcmc-sampling	Bayesian	Metropolis-Hastings, Gibbs, HMC, NUTS, parallel tempering
variational-inference	Bayesian	ELBO + amortized inference (VAE, BBVI)
measure-theory-and-integration	measure-theoretic	foundational
information-theory	info-theoretic	KL divergence, mutual information, AIC/BIC underpinning
markov-chains-and-hmm	both — frequentist EM-on-HMM, Bayesian HMM with priors	discrete-time, ergodicity, Baum-Welch
time-series-and-hmm	both — Box-Jenkins frequentist, Bayesian state-space	ARIMA, Kalman, particle filter
copulas-and-dependence	both — Sklar’s theorem is measure-theoretic, fitting is frequentist or Bayesian	Gaussian/Student-t/Archimedean
gaussian-processes	Bayesian (non-parametric)	infinite-dim prior over functions
stochastic-calculus	measure-theoretic + frequentist	Brownian, Itô, martingales, change of measure
probability-distribution-zoo	measure-theoretic catalog	for reference
sampling-algorithms-catalog	both — frequentist MC, Bayesian MCMC	sampling primitives
statistical-distributions-catalog-extended	catalog	distribution properties

5. p-values, the ASA 2016 statement, and the reproducibility crisis

The American Statistical Association in 2016 (Wasserstein-Lazar; expanded 2019 Wasserstein-Schirm-Lazar) issued formal warnings about p-value misuse. Key points:

A p-value is not the probability the null is true.
A non-significant p does not mean no effect — power matters.
A “statistically significant” finding is not necessarily large or important.
p < 0.05 should not be a bright line. (Separately, Benjamin et al 2018 proposed lowering the default threshold to p < 0.005, which keeps a bright line but moves it.)
The 2019 follow-up urged abandoning “statistical significance” as a dichotomy entirely.

The reproducibility crisis (Ioannidis 2005 “Why Most Published Research Findings Are False”; Open Science Collaboration 2015 Reproducibility Project: Psychology, in which only 36% of 100 replications reached statistical significance) catalyzed:

Preregistration — commit to analysis plan before data collection (OSF, AsPredicted).
Registered reports — peer review of design before data collection.
Multiverse analysis (Steegen et al 2016) — present results across all reasonable analytic choices.
Specification curve / sensitivity analyses (Simonsohn-Simmons-Nelson 2020).
Open data + open code — Open Science Framework, code repositories.
TOP guidelines — Transparency and Openness Promotion (Nosek et al 2015).

6. Bayesian computation — the practical stack

Method	When	Library
Conjugate analysis	small prior-likelihood pairs	textbook
Grid approximation	low-dim (≤ 3 params)	by hand
Laplace approximation	unimodal posterior	INLA (Rue-Martino-Chopin 2009)
Variational inference (VI / ELBO)	large dataset, tractable family	Stan (ADVI), PyMC, NumPyro, Pyro, TensorFlow Probability
Black-box VI (BBVI)	flexible posterior	NumPyro, Pyro, BlackJAX
Normalizing flows for VI	non-Gaussian posterior	Pyro, BlackJAX
MCMC: Metropolis-Hastings	low-dim, any posterior	rare in practice now
MCMC: Gibbs sampling	conditionally conjugate	JAGS, BUGS
MCMC: HMC (Hamiltonian Monte Carlo, Duane et al 1987; Neal 2010)	continuous, smooth	Stan, PyMC, NumPyro
MCMC: NUTS (Hoffman-Gelman 2014)	continuous, smooth, no manual tuning	Stan, PyMC, NumPyro (default)
MCMC: parallel tempering	multimodal	BlackJAX
Sequential Monte Carlo (particle filter / SMC sampler)	sequential / dynamic / multimodal	particles, BlackJAX, ParticleSMC.jl
Approximate Bayesian Computation (ABC)	likelihood intractable	ABCpy, ELFI
Simulation-based inference (SBI, Cranmer-Brehmer-Louppe 2020)	likelihood intractable but simulable	sbi (Mackelab Tübingen)

In 2025 the canonical Bayesian stack is Stan (Carpenter et al 2017, ~400 citations/month) for “I want a probabilistic programming language”, NumPyro / Pyro for “I want JAX-/PyTorch-integrated VI”, PyMC for “I want pythonic Bayesian modeling”, and INLA for spatio-temporal GLMs. BlackJAX for JAX-native MCMC. brms / rstanarm for R users wanting lme4-style syntax with Bayesian backend.
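The first two rows of the table (conjugate analysis and grid approximation) can be checked against each other without any library. The sketch below assumes a binomial likelihood with a flat prior; the data and grid size are hypothetical.

```python
from math import comb


def grid_posterior(k, n, grid_size=1001, prior=lambda t: 1.0):
    """Grid approximation of the posterior for a binomial proportion."""
    thetas = [i / (grid_size - 1) for i in range(grid_size)]
    unnorm = [prior(t) * comb(n, k) * t**k * (1 - t) ** (n - k) for t in thetas]
    total = sum(unnorm)
    weights = [u / total for u in unnorm]       # normalized posterior mass per grid point
    return thetas, weights


thetas, weights = grid_posterior(k=7, n=10)
grid_mean = sum(t * w for t, w in zip(thetas, weights))
exact_mean = (7 + 1) / (10 + 2)                 # mean of Beta(8, 4), the conjugate answer
print(grid_mean, exact_mean)
```

The grid cost grows exponentially with the number of parameters, which is why the table limits it to about three dimensions and hands larger problems to MCMC or variational methods.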

7. The information criteria

Criterion	Formula	When	Penalty for complexity
AIC	-2 log L̂ + 2k	model selection, predictive focus	2 per parameter
BIC	-2 log L̂ + k log n	asymptotic marginal likelihood	log n per parameter (heavier than AIC for n ≥ 8, since log n > 2 once n > e² ≈ 7.4)
DIC	-2 log L̂_θ̄ + 2 p_D	Bayesian, deviance + complexity	effective number of parameters p_D
WAIC (Watanabe-Akaike)	-2 (lppd - p_WAIC)	Bayesian, fully Bayesian	p_WAIC measured from posterior
PSIS-LOO	-2 lppd_LOO	Bayesian, leave-one-out approximation	no explicit penalty; each point is scored out of sample
BPIC (Ando 2007)	variant of DIC	n/a	n/a
Hannan-Quinn	-2 log L̂ + 2k log log n	between AIC + BIC	2 log log n per parameter

Vehtari-Gelman-Gabry 2017 recommend PSIS-LOO as the default for Bayesian model comparison, with WAIC (reviewed in Gelman-Hwang-Vehtari 2014) second; AIC remains the usual choice in ML workflows where maximum-likelihood fitting is the norm.
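The AIC and BIC formulas from the table can be applied by hand to two Gaussian models. This is a sketch under stated assumptions: the data are hypothetical, errors are taken to be i.i.d. normal, and the variance counts as one fitted parameter.

```python
from math import log, pi


def gaussian_loglik(residuals):
    """Maximized Gaussian log-likelihood, with sigma^2 set to its MLE."""
    n = len(residuals)
    sigma2 = sum(r * r for r in residuals) / n
    return -0.5 * n * (log(2 * pi * sigma2) + 1)


def aic(loglik, k):
    return -2 * loglik + 2 * k


def bic(loglik, k, n):
    return -2 * loglik + k * log(n)


# Hypothetical data with a clear linear trend.
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [0.1, 1.2, 1.9, 3.2, 3.8, 5.1, 6.2, 6.8, 8.1, 9.0]
n = len(y)
x_bar, y_bar = sum(x) / n, sum(y) / n

# Model A: constant mean (k = 2: mean, sigma^2).
res_a = [yi - y_bar for yi in y]

# Model B: least-squares line (k = 3: intercept, slope, sigma^2).
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
slope = sxy / sxx
intercept = y_bar - slope * x_bar
res_b = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

ll_a, ll_b = gaussian_loglik(res_a), gaussian_loglik(res_b)
print("AIC (A, B):", aic(ll_a, 2), aic(ll_b, 3))
print("BIC (A, B):", bic(ll_a, 2, n), bic(ll_b, 3, n))
```

Lower is better for both criteria. Only differences between models fitted to the same data are meaningful, which is why reports quote ΔAIC or ΔBIC rather than raw values.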

8. Causal frameworks — Pearl vs Rubin in practice

Pearl SCM	Rubin Potential Outcomes
DAG over variables	Y(0), Y(1) for each unit
do(X = x) operator	“treatment received w”
Causal effect P(Y | do(X))	E[Y(1) - Y(0)]
Identifiability via do-calculus rules	Identifiability via ignorability (Y(0), Y(1) ⊥ T | X)
Counterfactuals as third tier	Implicit in Y(t) notation
Front-door, back-door criteria	Propensity score, IV, IPTW
dagitty (Textor et al) — software	MatchIt, ipw, twang — software

Both schools handle the same problems; Pearl’s DAG-language is better for causal discovery and explanatory mechanism work; Rubin’s potential-outcomes language is better for experimental design and estimation.
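The gap between P(Y | X) and P(Y | do(X)) can be computed exactly for a small discrete model. The sketch below assumes a single binary confounder Z with edges Z → X, Z → Y, X → Y; every probability in it is hypothetical.

```python
# Hypothetical structural model: Z -> X, Z -> Y, X -> Y (Z is the only confounder).
p_z = {0: 0.5, 1: 0.5}                              # P(Z = z)
p_x1_given_z = {0: 0.2, 1: 0.8}                     # P(X = 1 | Z = z)
p_y1_given_xz = {(0, 0): 0.1, (1, 0): 0.2,          # P(Y = 1 | X = x, Z = z)
                 (0, 1): 0.5, (1, 1): 0.6}


def p_x_given_z(x, z):
    return p_x1_given_z[z] if x == 1 else 1.0 - p_x1_given_z[z]


def observational(x):
    """P(Y = 1 | X = x): plain conditioning, which mixes in the effect of Z."""
    num = sum(p_y1_given_xz[(x, z)] * p_x_given_z(x, z) * p_z[z] for z in p_z)
    den = sum(p_x_given_z(x, z) * p_z[z] for z in p_z)
    return num / den


def interventional(x):
    """P(Y = 1 | do(X = x)) by back-door adjustment over Z."""
    return sum(p_y1_given_xz[(x, z)] * p_z[z] for z in p_z)


naive = observational(1) - observational(0)
ate = interventional(1) - interventional(0)
print(f"naive difference = {naive:.3f}, adjusted ATE = {ate:.3f}")
```

The naive contrast overstates the effect because Z raises both treatment uptake and the outcome. Under ignorability given Z, the adjusted quantity is the same E[Y(1) - Y(0)] that the potential-outcomes column targets.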

Modern ML-causal: Double Machine Learning (Chernozhukov et al 2018), Causal Forests (Wager-Athey 2018), X-Learner (Künzel et al 2019), CausalML library (Uber), EconML library (Microsoft Research), DoWhy (Microsoft, Sharma-Kiciman 2020). These integrate causal identification with off-the-shelf ML.

9. The Wasserstein-Schirm-Lazar 2019 framing

The ASA’s 2019 follow-up identified the “p < 0.05” problem as a bright-line fallacy. They recommended:

Stop dichotomizing — present effect sizes + uncertainty intervals.
“Statistical significance” is not the same as “scientific significance”.
Embrace uncertainty — multiple analyses, sensitivity analyses, robustness checks.
Use Bayesian, info-theoretic, or other frameworks as appropriate to the question.
Move toward “Accept Uncertainty, be Thoughtful, Open, and Modest” (ATOM principles).

10. Reporting frameworks

Framework	Standard report
Frequentist	Point estimate ± SE; 95% CI; p-value; sample size; effect size (Cohen’s d, OR, RR); power
Bayesian	Posterior mean / median / mode; 95% credible interval (CrI); posterior probability of direction; Bayes factor
Information-theoretic	ΔAIC or ΔBIC across models; Akaike weights; LOO-CV
Causal (Rubin)	ATE / ATT estimate; SE under sample-size assumptions; sensitivity to unmeasured confounding (Rosenbaum bounds, E-values, VanderWeele-Ding 2017)
Causal (Pearl)	DAG; do-calculus derivation; identification result; estimator; standard error

Modern practice in epidemiology (Hernán-Robins 2020 Causal Inference: What If) blends Rubin + Pearl; preregistered DAG + IPTW + double-ML.
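The E-value named in the Rubin row of the table has a closed form for a risk ratio: RR + sqrt(RR × (RR − 1)), applied to the reciprocal when RR < 1. A minimal sketch follows; the function name and the example RR are hypothetical.

```python
from math import sqrt


def e_value(rr):
    """E-value for an observed risk ratio (point estimate), VanderWeele-Ding 2017."""
    if rr <= 0:
        raise ValueError("risk ratio must be positive")
    if rr < 1:
        rr = 1.0 / rr              # protective effects: work with the reciprocal
    return rr + sqrt(rr * (rr - 1.0))


print(e_value(2.0))                # hypothetical observed RR of 2
```

The result is the minimum strength of association, on the risk-ratio scale, that an unmeasured confounder would need with both treatment and outcome to fully explain away the observed estimate.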

11. Modern (2020–2026) developments

Probabilistic programming — Stan, PyMC, NumPyro, Turing.jl, Pyro, Edward2 ubiquitous in research labs.
Differentiable / GPU-accelerated MCMC — BlackJAX (JAX), NumPyro (JAX) deliver 10–100× speedups.
Simulation-based inference (SBI) — sbi library (Tübingen), Bayesian inference when likelihood is intractable. Cosmology, neuroscience, particle physics.
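Conformal prediction (Vovk-Gammerman-Shafer 2005; resurgence 2020+; Angelopoulos-Bates 2021 tutorial): distribution-free prediction sets with finite-sample marginal coverage under exchangeability. Major shift in uncertainty quantification.
Conformal + Bayesian: conformal calibration can wrap a Bayesian predictive model, keeping the model's sharpness while guaranteeing marginal coverage.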
Posterior predictive checks standard in any Bayesian analysis (Gelman et al Bayesian Data Analysis 3rd ed).
Causal ML at scale — EconML, CausalML, DoWhy in production at Uber, Microsoft, Meta.
Bayesian deep learning: MC dropout (Gal-Ghahramani 2016), variational dropout (Kingma-Salimans-Welling 2015), deep ensembles, SWAG (Maddox et al 2019), Laplace approximation for NNs (Daxberger et al 2021).
Diffusion models as SDE-based generative inference — Song-Sohl-Dickstein-Kingma-Kumar-Ermon-Poole 2020+; bridges Bayesian inference + generative modeling.
Foundation models for inference — TabPFN (Hollmann et al 2023), prior-fitted networks; transformer-based Bayesian inference.
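Split conformal prediction, mentioned in the list above, needs only a fitted point predictor and a held-out calibration set. The sketch below is a minimal version with absolute-residual scores; the predictor `predict` and the calibration data are hypothetical stand-ins.

```python
from math import ceil


def conformal_quantile(scores, alpha):
    """Finite-sample-corrected quantile of calibration scores for split conformal."""
    n = len(scores)
    rank = ceil((n + 1) * (1 - alpha))          # 1-based rank in the sorted scores
    if rank > n:
        return float("inf")                     # too few calibration points for this alpha
    return sorted(scores)[rank - 1]


def predict(x):
    """Hypothetical pre-fitted point predictor; any model could stand in here."""
    return 2.0 * x


# Hypothetical held-out calibration pairs (x, y).
calibration = [(1, 2.3), (2, 3.6), (3, 6.5), (4, 7.9), (5, 10.4),
               (6, 11.2), (7, 14.6), (8, 16.1), (9, 17.5)]
scores = [abs(y - predict(x)) for x, y in calibration]   # absolute residuals
q = conformal_quantile(scores, alpha=0.2)

x_new = 10
interval = (predict(x_new) - q, predict(x_new) + q)
print(q, interval)
```

If the calibration points and the new point are exchangeable, the interval covers the new response with probability at least 1 − alpha on average. No distributional assumption on the residuals is needed, but the guarantee is marginal rather than conditional on x.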

12. Decision tree — pick a framework

What's your question?
├─ "Is the effect zero?" (NHST, regulatory)
│    → Frequentist; report p-value + 95% CI + effect size.
│    → If interim looks are planned → preregister an alpha-spending rule.
│    → If multiple comparisons → FDR (BH) or Bonferroni.
├─ "What's my best estimate + uncertainty?"
│    ├─ Have prior info? → Bayesian; report posterior mean + 95% credible interval.
│    ├─ No prior info, want frequentist? → MLE + SE + CI.
│    └─ Want distribution-free finite-sample coverage? → Conformal prediction.
├─ "Which of several models is best?"
│    ├─ Predictive focus → AIC or LOO-CV.
│    ├─ Truth focus / penalize complexity → BIC.
│    ├─ Bayesian → posterior model probability / Bayes factor / WAIC.
│    └─ Non-nested → cross-validation.
├─ "What's the causal effect of X on Y?"
│    ├─ Have RCT? → Frequentist or Bayesian on Y ~ T.
│    ├─ Have observational data? → Causal inference (PSM, IPTW, IV, RDD, DiD, double-ML).
│    └─ Need to identify? → DAG + do-calculus + sensitivity analysis (E-value).
├─ "What's the probability of an event given a model?"
│    → Direct probability calculation; measure-theoretic.
├─ "What's the maximum-likelihood estimate?"
│    → MLE (Fisher, frequentist) or MAP (Bayesian point estimate).
├─ "I have a sequential / streaming experiment"
│    → Bayesian (natural) or frequentist w/ alpha-spending (O'Brien-Fleming) / sequential probability ratio test.
├─ "I want to forecast"
│    ├─ Time series → ARIMA / state-space / Bayesian structural / Prophet / NeuralForecast.
│    ├─ Probabilistic forecast → Bayesian or conformal.
│    └─ ML forecast → quantile regression / conformal calibration.
├─ "My data is messy / noisy / has outliers"
│    → Robust methods (M-estimators, MCD, S-estimators); Bayesian w/ heavy-tailed likelihood (Student-t).
├─ "I can simulate but not compute likelihood"
│    → Simulation-based inference (sbi library), ABC, or amortized inference.
└─ "I want to discover causal structure from data"
     → Causal discovery (PC, FCI, GES, NOTEARS); see causal-inference.

13. Anti-patterns

“P < 0.05 = effect exists” — see ASA 2016 / 2019 statements. Effect size matters; reproducibility matters.
Using Bayes factors without prior sensitivity analysis — BFs are heavily prior-dependent.
Using flat priors as “non-informative” — flat is not always non-informative; depends on parameterization.
MAP as Bayesian point estimate without considering posterior shape — MAP can be far from posterior mean for skewed posteriors.
Sequential frequentist tests without alpha-spending — alpha inflates beyond control.
Causal claims from observational data without DAG / identification argument — unidentified.
“Significant” with n=1,000,000 — every effect is significant; report effect size.
Confusing Bayesian credible interval with frequentist CI — different probability statements.
Hierarchical Bayes without convergence diagnostics: R̂ < 1.01 and bulk ESS > 400 in total (about 100 per chain with four chains) are minimum bars (Vehtari et al 2021).
Reporting only mean without uncertainty — always include CI / SE / posterior summary.
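The anti-pattern of sequential testing without alpha-spending can be demonstrated by simulation. The sketch below assumes standard normal data with a true effect of zero and a known variance; the look schedule, seed, and number of simulations are arbitrary choices.

```python
import random
from math import sqrt


def false_positive_rate(n_sims, looks, seed=0, z_crit=1.96):
    """Share of null experiments declared 'significant' at any of the given looks."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        total, n = 0.0, 0
        for look in looks:
            while n < look:                     # collect data up to this look
                total += rng.gauss(0.0, 1.0)    # true effect is zero
                n += 1
            if abs(total / sqrt(n)) > z_crit:   # z-statistic with known sigma = 1
                hits += 1
                break                           # stop at the first "significant" look
    return hits / n_sims


single_look = false_positive_rate(2000, looks=[100])
ten_looks = false_positive_rate(2000, looks=range(10, 101, 10))
print("one look:", single_look, "ten looks:", ten_looks)
```

With one look the rejection rate stays near the nominal 5%; testing at every tenth observation and stopping at the first rejection pushes it well above that. Alpha-spending boundaries such as O'Brien-Fleming or Pocock exist to bring the overall rate back to the nominal level.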

14. The reproducibility crisis frame

The crisis is not a single discipline’s failure but a foundational issue with how inference is taught and used:

Publication bias — significant results published, null results filed away.
HARKing (Hypothesizing After Results are Known) — post-hoc hypotheses dressed up as a priori.
p-hacking — analytic flexibility until p < 0.05.
Low power: Cohen 1962 found mean power of about 0.18 for small effects and 0.48 for medium effects; later surveys found little change.
Multiple testing without correction.
Forking paths (Gelman-Loken 2014) — the analyst’s degrees of freedom.

The institutional responses (preregistration, registered reports, multiverse analysis, open data, registered direct replications, p < 0.005 advocacy, abandonment of the p-value dichotomy) are all reactions to this. By 2026 trial registration is a condition of publication in most major medical journals (ICMJE policy), and many psychology and economics journals require or encourage preregistration.

Adjacent

Bayesian inference depth — bayesian-inference for priors, hierarchical models, posterior predictive checks.
Frequentist hypothesis testing — hypothesis-testing-mle for NP lemma, UMP tests, multiple comparisons.
Causal inference depth — causal-inference for DAGs, do-calculus, PSM, IV, RDD, DiD, double-ML.
MCMC algorithms — mcmc-sampling for HMC, NUTS, Gibbs, MH, parallel tempering.
Variational inference — variational-inference for ELBO, VAEs, BBVI, amortized inference.
Measure theory — measure-theory-and-integration for the foundational substrate.
Information theory — information-theory for KL, mutual information, AIC/BIC.
Markov chains — markov-chains-and-hmm for chain ergodicity (underlying MCMC).
Time series — time-series-and-hmm for state-space + Kalman + particle filters.
Copulas — copulas-and-dependence for dependence beyond linear correlation.
Gaussian processes — gaussian-processes for Bayesian non-parametrics.
Stochastic calculus — stochastic-calculus for change-of-measure (Girsanov) and martingale-based inference.
Optimization — _compare_optimization-methods for the MLE / MAP / VI / MCMC optimization-side.
Probability distribution catalog — probability-distribution-zoo.
Statistical distribution catalog — statistical-distributions-catalog-extended.
Sampling algorithms — sampling-algorithms-catalog.
Finance application — risk measures and probability frameworks are central in _compare_risk-measures.

When to pick what

The fastest narrowing: regulatory / NHST → frequentist with multiplicity correction; small data + prior → Bayesian; prediction focus → information-theoretic / cross-validation; causal question → Pearl/Rubin causal; decision under uncertainty → Bayesian decision theory; distribution-free finite-sample coverage → conformal prediction; likelihood intractable but simulable → SBI; sequential / streaming → Bayesian or sequential frequentist (alpha-spending). The single biggest practical lesson of the 2010s reproducibility crisis is preregister your analysis — commit to your framework, model, and inference procedure before seeing data. Without that, every framework above can be gamed.
