Econometrics Foundations

Econometrics translates economic theory into estimable models using data. Its core preoccupations are identification (does the data permit recovery of a causal parameter?), estimation (which estimator is consistent and efficient?), and inference (are standard errors valid under realistic deviations from the textbook assumptions?). The field has shifted twice in the past forty years. The first shift was toward the “credibility revolution” of quasi-experimental identification, led by Angrist, Imbens, Card, and Krueger from the late 1980s. The second is toward modern doubly-robust and machine-learning-augmented estimators since roughly 2014, associated with Chernozhukov, Athey, Wager, and the EconML / DoubleML software ecosystem.

Linear regression and the Gauss-Markov theorem

The workhorse model is the conditional expectation function E[y | x] = x’beta, estimated by ordinary least squares (OLS): beta_hat = (X’X)^{-1} X’y. Under the classical Gauss-Markov assumptions (linearity in parameters, strict exogeneity E[u | X] = 0, no perfect multicollinearity, homoskedasticity Var(u | X) = sigma^2 I, and zero autocorrelation) OLS is the Best Linear Unbiased Estimator (BLUE). Adding Gaussian errors makes it the maximum-likelihood estimator and the minimum-variance estimator among all unbiased estimators (not just linear ones). Least squares traces to Gauss (1809 Theoria Motus) and the optimality result was systematized by Markov (1912); the modern textbook statement is in Hayashi (Econometrics 2000) or Wooldridge (Introductory Econometrics 7e 2020).
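As a minimal sketch of beta_hat = (X’X)^{-1} X’y, the snippet below solves the 2x2 normal equations for a model with a constant and one regressor. The function name ols_simple and the toy data are illustrative assumptions, not part of any library.

```python
def ols_simple(x, y):
    """OLS for y = b0 + b1*x by solving the normal equations (X'X) b = X'y.

    With X = [1, x], X'X = [[n, sum x], [sum x, sum x^2]] and
    X'y = [sum y, sum x*y], so the 2x2 inverse can be written out by hand.
    """
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    det = n * sxx - sx * sx  # determinant of X'X
    if det == 0:
        raise ValueError("perfect multicollinearity: x has no variation")
    b0 = (sxx * sy - sx * sxy) / det
    b1 = (n * sxy - sx * sy) / det
    return b0, b1


# Hypothetical noise-free data generated from y = 1 + 2x.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.0 + 2.0 * xi for xi in x]
b0, b1 = ols_simple(x, y)
print(b0, b1)
```

Because the toy data lie exactly on a line, the fit recovers the intercept 1 and slope 2; with more regressors one would solve the same system with a linear-algebra routine rather than by hand.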

The variance estimator under classical assumptions is Var(beta_hat) = sigma^2 (X’X)^{-1}, where sigma^2 is estimated by s^2 = SSR / (n - k), the sum of squared residuals divided by n - k, with k the number of columns of X (including the constant). R^2 = 1 - SSR/SST measures the share of explained variance but never falls when a regressor is added; adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - k) penalizes complexity. F-tests for J joint restrictions R beta = q use F = (R beta_hat - q)’ [s^2 R (X’X)^{-1} R’]^{-1} (R beta_hat - q) / J, distributed F(J, n - k) under H_0 with Gaussian errors. Wald, likelihood-ratio, and Lagrange-multiplier tests are the three workhorses, equivalent asymptotically under correct specification.

Specification matters: when a relevant X_2 is excluded, the short regression suffers omitted-variable bias, E[beta_hat_1 | X] = beta_1 + (X_1’X_1)^{-1} X_1’X_2 beta_2. Frisch-Waugh-Lovell (Ragnar Frisch and Frederick Waugh 1933, Econometrica 1:387; Michael Lovell 1963) shows partialling out: regressing y on X_2 after residualizing both on X_1 gives the same beta_2 estimate as the full regression. This identity underlies the modern “high-dimensional FE” implementations such as Sergio Correia’s reghdfe (Stata, 2014) and Laurent Bergé’s fixest (R, 2018), which solve billion-observation, multi-way fixed-effects models by iteratively absorbing groups. Multicollinearity inflates standard errors via the variance inflation factor VIF_j = 1/(1 - R_j^2), where R_j^2 comes from regressing x_j on the other regressors; rules of thumb flag VIF > 10, though this threshold is arbitrary.
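The omitted-variable formula can be checked numerically in the scalar case, where the short-regression slope equals beta_1 + beta_2 * delta and delta is the slope from regressing the omitted x2 on the included x1. The sketch below uses hypothetical noise-free data and regressions without a constant, so the identity holds exactly in the sample.

```python
def slope_no_intercept(x, y):
    """Slope from regressing y on x without a constant: (x'x)^{-1} x'y."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)


# Hypothetical data: the true model is y = beta1*x1 + beta2*x2 (no noise).
beta1, beta2 = 1.0, 0.5
x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [2.0, 1.0, 4.0, 3.0]
y = [beta1 * a + beta2 * b for a, b in zip(x1, x2)]

short_slope = slope_no_intercept(x1, y)   # regression that omits x2
delta = slope_no_intercept(x1, x2)        # auxiliary regression of x2 on x1
predicted = beta1 + beta2 * delta         # omitted-variable bias formula
print(short_slope, predicted)
```

The two printed numbers coincide: the short regression attributes to x1 the part of x2's effect that x1 linearly predicts.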

Heteroskedasticity and autocorrelation-consistent inference

Real economic data routinely violate homoskedasticity. White (Halbert White 1980, Econometrica 48:817) introduced the heteroskedasticity-consistent (HC) variance estimator: Var_HC(beta_hat) = (X’X)^{-1} (sum_i x_i x_i’ u_hat_i^2) (X’X)^{-1}, now standard as HC0. MacKinnon-White (1985) proposed small-sample corrections: HC1 multiplies by n/(n-k), HC2 uses leverage-corrected residuals u_hat_i / sqrt(1 - h_ii), HC3 uses u_hat_i / (1 - h_ii), where h_ii is the i-th diagonal element of the hat matrix X (X’X)^{-1} X’. HC3 (which approximates the jackknife) is preferred in finite samples per Long-Ervin (2000, American Statistician 54:217). Cluster-robust SEs (Liang-Zeger 1986, Biometrika 73:13) treat observations within a cluster (firm, school, county) as potentially correlated; Cameron-Miller (2015, Journal of Human Resources 50:317) review the practice. The Bertrand-Duflo-Mullainathan (2004 QJE 119:249) audit of DiD applications showed that ignoring serial correlation in state-year panels badly understates standard errors: placebo laws came out significant at the 5% level in up to 45% of simulations.
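For a regression with a constant and one regressor, the sandwich formula collapses to sums over observations, which makes the HC variants easy to compare. The following is a sketch under that simplifying assumption; the function name slope_variances and the data are made up for illustration.

```python
def slope_variances(x, y):
    """Classical, HC0, HC1 and HC3 variance estimates for the slope in y = b0 + b1*x."""
    n, k = len(x), 2
    xbar, ybar = sum(x) / n, sum(y) / n
    dx = [xi - xbar for xi in x]
    sxx = sum(d * d for d in dx)
    b1 = sum(d * (yi - ybar) for d, yi in zip(dx, y)) / sxx
    b0 = ybar - b1 * xbar
    u = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]  # OLS residuals
    h = [1 / n + d * d / sxx for d in dx]            # leverage h_ii
    classical = sum(ui * ui for ui in u) / (n - k) / sxx
    # Slope element of (X'X)^{-1} (sum x_i x_i' u_i^2) (X'X)^{-1}.
    hc0 = sum(d * d * ui * ui for d, ui in zip(dx, u)) / sxx ** 2
    hc1 = hc0 * n / (n - k)
    hc3 = sum(d * d * (ui / (1 - hi)) ** 2 for d, ui, hi in zip(dx, u, h)) / sxx ** 2
    return {"classical": classical, "HC0": hc0, "HC1": hc1, "HC3": hc3}


# Hypothetical data whose scatter widens as x grows.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.1, 1.9, 3.3, 3.6, 5.8, 5.1]
v = slope_variances(x, y)
print(v)
```

HC3 is never smaller than HC0 because each squared residual is divided by (1 - h_ii)^2 <= 1, which inflates the contribution of high-leverage points.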

Autocorrelated errors plague time series. Newey-West (Whitney Newey, Kenneth West 1987, Econometrica 55:703) constructed heteroskedasticity-and-autocorrelation-consistent (HAC) SEs using Bartlett-kernel weights w_j = 1 - j/(L + 1) on autocovariances up to lag L. Optimal bandwidth selection follows Andrews (1991, Econometrica 59:817) and the data-dependent Newey-West (1994, Review of Economic Studies 61:631) procedure, commonly simplified to the rule of thumb L = floor(4 (T/100)^{2/9}). Driscoll-Kraay (1998, Review of Economics and Statistics 80:549) extends HAC inference to panels, robust to cross-sectional dependence.
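The bandwidth rule and Bartlett weighting can be sketched for the simplest case, the long-run variance of a scalar series (the quantity needed for the standard error of a sample mean). The helper names nw_bandwidth and newey_west_lrv and the toy series are illustrative assumptions.

```python
import math


def nw_bandwidth(T):
    """Rule-of-thumb lag length L = floor(4 * (T/100)^(2/9))."""
    return math.floor(4 * (T / 100) ** (2 / 9))


def newey_west_lrv(u, L):
    """Bartlett-kernel long-run variance: gamma_0 + 2 * sum_j w_j * gamma_j."""
    T = len(u)
    ubar = sum(u) / T
    d = [ui - ubar for ui in u]

    def gamma(j):
        # Sample autocovariance at lag j (divided by T, as in Newey-West).
        return sum(d[t] * d[t - j] for t in range(j, T)) / T

    lrv = gamma(0)
    for j in range(1, L + 1):
        w = 1 - j / (L + 1)  # Bartlett weight, declining linearly in j
        lrv += 2 * w * gamma(j)
    return lrv


# Hypothetical positively autocorrelated series.
u = [1.0, 1.0, 1.0, -1.0, -1.0, -1.0] * 2
print(nw_bandwidth(100), nw_bandwidth(500))
print(newey_west_lrv(u, 0), newey_west_lrv(u, 1))
```

With L = 0 the estimate is just the sample variance; adding lags raises it here because the series is positively autocorrelated, which is exactly the case in which naive standard errors are too small.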

Wild bootstrap (Cameron-Gelbach-Miller 2008, Review of Economics and Statistics 90:414) handles few clusters (G < 30); the 6-1 rule and small-sample wild-cluster-T inference are now standard. Roodman et al’s boottest (Stata, 2019) and the R fwildclusterboot package implement the wild cluster bootstrap efficiently at scale.

Endogeneity and instrumental variables

When E[u | x] != 0, OLS is inconsistent. Sources: omitted variables, measurement error in X (attenuation bias toward zero), simultaneity (supply meets demand), self-selection. Instrumental variables (IV) require an instrument z satisfying relevance Cov(z, x) != 0 and exogeneity Cov(z, u) = 0. The two-stage least squares (2SLS) estimator: regress x on z to get x_hat, then regress y on x_hat. The exogeneity condition is fundamentally untestable when the model is just-identified; overidentification tests (Sargan 1958, Hansen J) provide a partial check when there are more instruments than endogenous regressors.
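With one endogenous regressor and one instrument, 2SLS reduces to the ratio Cov(z, y) / Cov(z, x). The sketch below builds a small hypothetical data set in which the error u is correlated with x but exactly uncorrelated with z in the sample, so OLS is biased and IV recovers the true slope of 2.

```python
def cov(a, b):
    """Sample covariance (divided by n)."""
    n = len(a)
    abar, bbar = sum(a) / n, sum(b) / n
    return sum((ai - abar) * (bi - bbar) for ai, bi in zip(a, b)) / n


# Hypothetical data: z is the instrument, v drives both x and the error u.
z = [1.0, -1.0, 1.0, -1.0]
v = [1.0, 1.0, -1.0, -1.0]           # uncorrelated with z in this sample
x = [zi + vi for zi, vi in zip(z, v)]
u = list(v)                           # error correlated with x: endogeneity
y = [2.0 * xi + ui for xi, ui in zip(x, u)]

beta_ols = cov(x, y) / cov(x, x)      # inconsistent when Cov(x, u) != 0
beta_iv = cov(z, y) / cov(z, x)       # just-identified 2SLS (Wald ratio)
print(beta_ols, beta_iv)
```

OLS gives 2.5 because it loads the correlation between x and u onto the slope, while the IV ratio gives 2.0. The ratio is undefined when Cov(z, x) = 0 and unstable when it is close to zero, which is the weak-instrument problem.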

Weak instruments inflate bias and distort inference. Stock-Yogo (2005) tabulated critical values for the Cragg-Donald statistic under homoskedastic errors; with heteroskedastic or clustered errors the Kleibergen-Paap rk statistic is commonly reported in its place, although the Stock-Yogo critical values are not formally valid for it. The rule of thumb F > 10 (Staiger-Stock 1997, Econometrica 65:557) is now considered too lenient. Andrews-Stock-Sun (2019, Annual Review of Economics 11:727) recommend the Montiel Olea-Pflueger (2013, Journal of Business and Economic Statistics 31:358) effective F, whose critical value is roughly 23 for a 10% worst-case bias threshold at the 5% level with a single instrument. For weak-instrument-robust inference, use Anderson-Rubin (1949 Annals of Mathematical Statistics 20:46) confidence sets.

Imbens-Angrist (1994, Econometrica 62:467) showed that with heterogeneous treatment effects, IV identifies the Local Average Treatment Effect (LATE): the average effect among compliers (those whose treatment status responds to the instrument). Angrist-Imbens-Rubin (1996, JASA 91:444) provides the canonical exposition with the four subgroups (always-takers, never-takers, compliers, defiers). Joshua Angrist and Guido Imbens shared the 2021 Nobel Prize (with David Card) for the credibility revolution.

Famous instruments include Angrist (1990 American Economic Review 80:313) Vietnam draft lottery for veteran status; Angrist-Krueger (1991 QJE 106:979) quarter-of-birth for schooling (and the Bound-Jaeger-Baker 1995 JASA 90:443 critique of its weakness); Card (1995) college proximity for schooling; Acemoglu-Johnson-Robinson (2001 American Economic Review 91:1369) settler mortality for institutions; Nunn (2008 QJE 123:139) distance to slave-trade destinations for slave exports, in a study of African development. Mendelian randomization in epidemiology uses genetic variants as instruments for modifiable health exposures (Davey Smith and Ebrahim 2003 International Journal of Epidemiology 32:1).

Difference-in-differences

DiD compares the change in outcome for a treated group to the change in a control group: tau_DiD = (Y_T,post - Y_T,pre) - (Y_C,post - Y_C,pre). Identification rests on parallel trends — absent treatment, the treatment group would have evolved like the control group. The two-way fixed effects (TWFE) regression y_it = alpha_i + gamma_t + tau D_it + eps_it is the canonical implementation. Card-Krueger (1994, American Economic Review 84:772) famously used DiD across New Jersey and Pennsylvania fast-food restaurants to find the 1992 NJ minimum wage hike did not reduce employment — a landmark application that catalyzed quasi-experimental methods.
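The 2x2 estimator tau_DiD is just a difference of four group means. A minimal sketch with invented outcome data (the function name did_2x2 is hypothetical):

```python
def did_2x2(y_t_pre, y_t_post, y_c_pre, y_c_post):
    """(treated post - treated pre) - (control post - control pre), using group means."""
    def mean(values):
        return sum(values) / len(values)

    return (mean(y_t_post) - mean(y_t_pre)) - (mean(y_c_post) - mean(y_c_pre))


# Hypothetical outcomes for two treated and two control units.
tau = did_2x2(
    y_t_pre=[10.0, 12.0], y_t_post=[15.0, 17.0],
    y_c_pre=[8.0, 10.0], y_c_post=[10.0, 12.0],
)
print(tau)
```

The treated group rises by 5 and the control group by 2, so the estimate is 3. Under parallel trends the control change of 2 is what the treated group would have experienced without treatment; in this two-group, two-period case the TWFE coefficient on D_it equals the same number.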

The post-2018 staggered DiD revolution exposed serious problems with TWFE when treatment timing varies. Goodman-Bacon (2021, Journal of Econometrics 225:254) decomposed the TWFE coefficient into a weighted average of all possible 2x2 comparisons including “forbidden comparisons” using already-treated units as controls, which can flip the sign when treatment effects are heterogeneous over time. Callaway-Sant’Anna (2021, Journal of Econometrics 225:200) introduced group-time average treatment effects ATT(g, t), estimated using never-treated or not-yet-treated controls, with multiplier bootstrap inference. Sun-Abraham (2021, Journal of Econometrics 225:175) provided an interaction-weighted estimator using saturated event-time-by-cohort interactions. De Chaisemartin-D’Haultfœuille (2020, American Economic Review 110:2964) introduced DID_M for fuzzy designs and switching treatments. Borusyak-Jaravel-Spiess (2024, Review of Economic Studies 91:3253) proposed the imputation estimator: fit y_it = alpha_i + gamma_t on untreated observations, then average residuals among treated cells, achieving efficiency under homoskedasticity. Roth-Sant’Anna-Bilinski-Poe (2023, Journal of Econometrics 235:2218) is a synthesizing review. Honest DiD (Rambachan-Roth 2023, Review of Economic Studies 90:2555) provides sensitivity analysis to parallel-trends violations by bounding the magnitude of differential pretrends and tracing identified-set intervals.

Regression discontinuity

RDD exploits a sharp cutoff c in a running variable X: units with X >= c receive treatment, others do not. Identification requires that the conditional expectations E[Y(0) | X] and E[Y(1) | X] are continuous at c. The estimator is the gap at the cutoff, recovered by local linear regression on each side with optimal bandwidth. Imbens-Kalyanaraman (2012, Review of Economic Studies 79:933) introduced an MSE-optimal bandwidth selector; Calonico-Cattaneo-Titiunik (2014, Econometrica 82:2295) introduced robust bias-corrected inference, now standard via the rdrobust Stata/R/Python package. Imbens-Lemieux (2008, Journal of Econometrics 142:615) is the canonical methodological review.
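A sharp-RD estimate can be sketched as two separate linear fits within a window around the cutoff, differenced at the cutoff. The version below is a simplification: it uses a uniform kernel and a hand-picked bandwidth h rather than an MSE-optimal one, applies no bias correction, and all names and data are hypothetical.

```python
def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line through (x, y)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    return ybar - slope * xbar, slope


def rd_local_linear(x, y, cutoff, h):
    """Jump at the cutoff from separate linear fits on [c-h, c) and [c, c+h]."""
    left = [(xi - cutoff, yi) for xi, yi in zip(x, y) if cutoff - h <= xi < cutoff]
    right = [(xi - cutoff, yi) for xi, yi in zip(x, y) if cutoff <= xi <= cutoff + h]
    # After centering x at the cutoff, each intercept is the fitted value at c.
    a_left, _ = fit_line(*zip(*left))
    a_right, _ = fit_line(*zip(*right))
    return a_right - a_left


# Hypothetical data: y = 1 + 0.5*x plus a jump of 2 at x >= 0.
x = [i / 10 for i in range(-10, 11)]
y = [1.0 + 0.5 * xi + (2.0 if xi >= 0 else 0.0) for xi in x]
tau = rd_local_linear(x, y, cutoff=0.0, h=0.5)
print(tau)
```

On these noise-free data the estimate equals the built-in jump of 2. In applied work the bandwidth choice and bias correction matter a great deal, which is what the selectors and robust inference procedures cited above address.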

McCrary (2008, Journal of Econometrics 142:698) test for manipulation of the running variable: estimate the density of X separately on each side of c and test for a jump. Cattaneo-Jansson-Ma (2020 JASA 115:1449) is a modern local-polynomial-density-test alternative implemented in rddensity. Covariate continuity tests provide additional placebo checks — predetermined covariates should not jump at the cutoff.

Fuzzy RDD: treatment probability jumps at c but not from 0 to 1. Then IV: instrument treatment D with indicator 1{X >= c}, recover LATE for compliers at the cutoff. Classic applications include geographic RDD (Melissa Dell 2010 Econometrica 78:1863 mita boundary), close-elections (David Lee 2008 Journal of Econometrics 142:675), school assignment by test score (Angrist-Lavy 1999 QJE 114:533 Maimonides rule), age-at-school-entry (Bedard-Dhuey 2006 QJE 121:1437), and program eligibility cutoffs across PROGRESA / Bolsa Familia conditional cash transfers.

Synthetic control

When one unit is treated and many are untreated, synthetic control constructs a weighted average of donors matching pre-treatment outcomes. Abadie-Diamond-Hainmueller (2010, JASA 105:493) is the methodological foundation, applied there to California’s Proposition 99 tobacco-control program; Abadie-Gardeazabal (2003, American Economic Review 93:113) is the originating application, to Basque terrorism. Inference uses placebo permutation tests across donor units, comparing the treated-unit gap to the distribution of donor-unit placebo gaps. Abadie (2021, Journal of Economic Literature 59:391) is the definitive review.

Generalized synthetic control (Yiqing Xu 2017, Political Analysis 25:57) handles multiple treated units via interactive fixed effects with cross-validation for factor-count selection. Synthetic difference-in-differences (Arkhangelsky-Athey-Hirshberg-Imbens-Wager 2021, American Economic Review 111:4088) combines unit and time weighting and reduces to DiD when weights are uniform. Augmented synthetic control (Ben-Michael-Feller-Rothstein 2021, JASA 116:1789) corrects bias when pre-period fit is imperfect via outcome-model debiasing. Matrix completion methods (Athey-Bayati-Doudchenko-Imbens-Khosravi 2021, JASA 116:1716) generalize further to panel imputation under low-rank structure.

Panel data

Panels follow N units over T periods. Fixed effects (FE) absorb time-invariant unit heterogeneity alpha_i via the within transformation: y_it - y_bar_i = (x_it - x_bar_i)’beta + (u_it - u_bar_i). Random effects (RE) assumes alpha_i is uncorrelated with the regressors and is more efficient when this holds. The Hausman (1978, Econometrica 46:1251) test compares the two: chi^2 = (beta_FE - beta_RE)’ [Var(beta_FE) - Var(beta_RE)]^{-1} (beta_FE - beta_RE). A large statistic signals correlation between alpha_i and x_it and rejects random effects. Mundlak (1978, Econometrica 46:69) re-cast RE with unit-level means of the time-varying covariates included to allow correlated effects, the Mundlak-Chamberlain device.
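The within transformation can be sketched for a single regressor: demean x and y unit by unit, then run OLS on the demeaned data. The function name within_estimator and the two-unit panel are illustrative assumptions.

```python
from collections import defaultdict


def within_estimator(unit, x, y):
    """FE slope from regressing (y_it - ybar_i) on (x_it - xbar_i)."""
    groups = defaultdict(list)
    for i, xi, yi in zip(unit, x, y):
        groups[i].append((xi, yi))
    num = den = 0.0
    for obs in groups.values():
        xbar = sum(o[0] for o in obs) / len(obs)
        ybar = sum(o[1] for o in obs) / len(obs)
        for xi, yi in obs:
            num += (xi - xbar) * (yi - ybar)
            den += (xi - xbar) ** 2
    return num / den


# Hypothetical panel: y_it = alpha_i + 3*x_it with alpha_A = 10, alpha_B = -5.
unit = ["A", "A", "A", "B", "B", "B"]
x = [1.0, 2.0, 3.0, 4.0, 6.0, 8.0]
alpha = {"A": 10.0, "B": -5.0}
y = [alpha[i] + 3.0 * xi for i, xi in zip(unit, x)]

beta_fe = within_estimator(unit, x, y)
# Treating all rows as one unit removes only the grand mean, i.e. pooled OLS.
beta_pooled = within_estimator(["all"] * len(x), x, y)
print(beta_fe, beta_pooled)
```

The within estimator recovers the slope of 3, while pooled OLS is pulled far below it because the unit with the larger x values has the lower alpha_i, which is the correlation between alpha_i and x_it that random effects rules out.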

Two-way fixed effects (alpha_i + gamma_t) absorb both unit and time heterogeneity. Cluster-robust SEs at the unit level (Bertrand-Duflo-Mullainathan 2004, QJE 119:249) corrected an earlier widespread underestimation of standard errors in DiD studies. Multi-way cluster-robust SEs (Cameron-Gelbach-Miller 2011 Journal of Business and Economic Statistics 29:238) handle clustering on multiple dimensions simultaneously.

Dynamic panels with lagged dependent variables suffer Nickell bias (Stephen Nickell 1981, Econometrica 49:1417) of order 1/T. Arellano-Bond (Manuel Arellano, Stephen Bond 1991, Review of Economic Studies 58:277) GMM uses lagged levels y_{i,t-2}, y_{i,t-3}, … as instruments for the first-differenced equation. Blundell-Bond (1998, Journal of Econometrics 87:115) “system GMM” augments with the level equation using lagged differences as instruments, more efficient when series are persistent. Roodman (2009, Oxford Bulletin of Economics and Statistics 71:135) cautions about instrument proliferation; xtabond2 in Stata is the standard implementation, with collapse and lag-limit options as essential tuning.

Time series

Covariance stationarity requires a constant mean and variance, and autocovariances Cov(y_t, y_{t+k}) that depend only on k, not t. Many economic series have unit roots (random walks): y_t = y_{t-1} + eps_t. Tests: Dickey-Fuller (David Dickey, Wayne Fuller 1979, JASA 74:427) and augmented Dickey-Fuller (1981, Econometrica 49:1057) test H_0: rho = 1 in y_t = rho y_{t-1} + sum gamma_j Delta y_{t-j} + eps_t; the test statistic has a non-standard distribution, tabulated by MacKinnon (1996 Journal of Applied Econometrics 11:601). Phillips-Perron (1988, Biometrika 75:335) uses a non-parametric correction for serial correlation rather than augmenting with lags. KPSS (Kwiatkowski-Phillips-Schmidt-Shin 1992, Journal of Econometrics 54:159) reverses the null to test stationarity directly, complementing ADF/PP.

Spurious regression (Granger-Newbold 1974, Journal of Econometrics 2:111): regressing two independent unit-root processes yields high R^2 and significant t-statistics with probability close to one. Cure: difference both, or test for cointegration. Engle-Granger (1987, Econometrica 55:251, Clive Granger 2003 Nobel) two-step: estimate the long-run relationship by OLS on levels, test residuals for stationarity; if cointegrated, use the error-correction model (ECM) Delta y_t = alpha (y_{t-1} - beta x_{t-1}) + gamma Delta x_t + eps_t. Johansen (Søren Johansen 1988 Journal of Economic Dynamics and Control 12:231, 1991 Econometrica 59:1551) provides MLE-based multivariate cointegration with trace and max-eigenvalue tests for the cointegration rank.

Vector autoregression (VAR) y_t = A_1 y_{t-1} + … + A_p y_{t-p} + eps_t was introduced by Sims (Christopher Sims 1980, Econometrica 48:1; Nobel 2011 with Thomas Sargent) as a critique of the large-scale Keynesian macroeconometric models. Identification of structural VARs requires restrictions: short-run zero (Cholesky), long-run zero (Blanchard-Quah 1989, American Economic Review 79:655), sign restrictions (Uhlig 2005, Journal of Monetary Economics 52:381), or external instruments / proxy SVARs (Stock-Watson 2018 Economic Journal 128:917; Mertens-Ravn 2013 American Economic Review 103:1212 on tax shocks). Impulse response functions (IRFs) trace the response of variables to a structural shock. Jordà (2005, American Economic Review 95:161) local projections estimate IRFs by direct multi-horizon regression — robust to misspecification but less efficient than VAR-implied IRFs.

GARCH (generalized autoregressive conditional heteroskedasticity, Tim Bollerslev 1986, Journal of Econometrics 31:307) extends Engle’s ARCH (Robert Engle 1982, Econometrica 50:987; Engle 2003 Nobel with Granger): sigma_t^2 = omega + alpha eps_{t-1}^2 + beta sigma_{t-1}^2. Extensions: EGARCH (Nelson 1991 Econometrica 59:347), TGARCH/GJR-GARCH (Glosten-Jagannathan-Runkle 1993 Journal of Finance 48:1779) for asymmetric leverage where negative shocks raise volatility more than positive shocks of equal size, multivariate DCC (Engle 2002 Journal of Business and Economic Statistics 20:339) for time-varying correlation matrices.
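The GARCH(1,1) recursion is easy to trace by hand. The sketch below filters a conditional-variance path from a given shock series, starting from the unconditional variance omega / (1 - alpha - beta); the parameter values and shocks are made up, and no estimation is attempted.

```python
def garch11_variance(eps, omega, alpha, beta):
    """Return sigma_t^2 for t = 0..T-1 given shocks eps_0..eps_{T-1}.

    sigma_t^2 = omega + alpha * eps_{t-1}^2 + beta * sigma_{t-1}^2,
    initialised at the unconditional variance.
    """
    if alpha + beta >= 1:
        raise ValueError("alpha + beta must be < 1 for a finite unconditional variance")
    sigma2 = [omega / (1 - alpha - beta)]
    for e in eps[:-1]:
        sigma2.append(omega + alpha * e ** 2 + beta * sigma2[-1])
    return sigma2


# Hypothetical shocks: a large negative return at t = 1.
eps = [1.0, -3.0, 0.5, 0.0]
sigma2 = garch11_variance(eps, omega=0.1, alpha=0.1, beta=0.8)
print(sigma2)
```

The large shock at t = 1 raises the next period's variance from 1.0 to 1.8, and the beta term makes the increase decay only gradually. Because the shock enters squared, the sign does not matter here; that symmetry is what the EGARCH and GJR extensions relax.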

Maximum likelihood, GMM, and structural estimation

MLE maximizes L(theta) = prod_i f(y_i | x_i, theta) under correct specification, achieving the Cramér-Rao bound asymptotically. Wald, likelihood-ratio, and Lagrange-multiplier tests are asymptotically equivalent. Quasi-MLE retains consistency under conditional mean specification for some exponential family models (Gourieroux-Monfort-Trognon 1984 Econometrica 52:701).

Generalized method of moments (GMM, Lars Hansen 1982, Econometrica 50:1029; Hansen 2013 Nobel with Eugene Fama and Robert Shiller) generalizes OLS, IV, and MLE. Choose theta to minimize g_N(theta)’ W g_N(theta), where g_N is the vector of sample moment conditions and W is a weight matrix. The efficient weight matrix is the inverse of the long-run variance of the moments. Two-step GMM uses first-step identity weighting, then plugs in the efficient weight. Continuous-updating GMM (Hansen-Heaton-Yaron 1996, Journal of Business and Economic Statistics 14:262) updates W jointly with theta. The Hansen J-test for overidentifying restrictions uses N times the minimized objective under efficient weighting: N J ~ chi^2(L - K) under correct specification, with L moment conditions and K parameters.

Structural estimation imposes economic theory tightly enough to recover policy-invariant parameters (à la Lucas critique). Berry-Levinsohn-Pakes (BLP, Steven Berry, James Levinsohn, Ariel Pakes 1995, Econometrica 63:841) estimate random-coefficients logit demand from market-level data. A nested fixed-point inversion of market shares delivers the unobserved product characteristic xi_jt; endogenous prices are instrumented with cost shifters and BLP instruments (characteristics of rival products). The pyBLP package (Conlon-Gortmaker 2020, RAND Journal of Economics 51:1108) is now standard. Daniel McFadden’s conditional logit (1973) and his 2000 Nobel (with James Heckman) recognized discrete-choice econometrics for transportation and consumer demand.

Sample selection: the Heckman (James Heckman 1979, Econometrica 47:153; Nobel 2000) two-step estimator handles the Tobit-type model in which y is observed only when the selection index satisfies z’gamma + v > 0. A first-stage probit yields the inverse Mills ratio lambda_hat = phi(z’gamma_hat) / Phi(z’gamma_hat), which is included in the second-stage outcome regression; a test on the coefficient on lambda_hat is a test for selection bias. Full MLE versions follow Amemiya’s Type 2 Tobit (Takeshi Amemiya 1985 Advanced Econometrics). Modern selection methods include matching (propensity score, Rosenbaum-Rubin 1983 Biometrika 70:41), inverse probability weighting, and the Imbens-Rubin (2015 Causal Inference for Statistics, Social, and Biomedical Sciences) potential-outcomes framework.
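The inverse Mills ratio that enters the Heckman second stage is the standard normal density divided by the standard normal CDF, evaluated at the first-stage probit index. A small sketch using only the standard library (function names are illustrative):

```python
import math


def norm_pdf(c):
    """Standard normal density phi(c)."""
    return math.exp(-0.5 * c * c) / math.sqrt(2 * math.pi)


def norm_cdf(c):
    """Standard normal CDF Phi(c), via the error function."""
    return 0.5 * (1 + math.erf(c / math.sqrt(2)))


def inverse_mills(index):
    """lambda(index) = phi(index) / Phi(index), with index = z'gamma_hat."""
    return norm_pdf(index) / norm_cdf(index)


# The correction is large for units unlikely to be selected and fades
# as the selection probability approaches one.
for index in (-1.0, 0.0, 1.0, 2.0):
    print(index, inverse_mills(index))
```

In the two-step procedure these values would be computed at each selected observation's fitted probit index and added as a regressor to the outcome equation.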

Machine learning meets causal inference

Wager-Athey (2018, JASA 113:1228) introduced causal forests, later generalized as generalized random forests (Athey-Tibshirani-Wager 2019 Annals of Statistics 47:1148), for heterogeneous treatment effect estimation with pointwise confidence intervals via honest splitting and subsampling. The grf R package is now widely used in policy and applied work for conditional-average-treatment-effect (CATE) estimation.

Chernozhukov-Chetverikov-Demirer-Duflo-Hansen-Newey-Robins (Victor Chernozhukov et al. 2018, Econometrics Journal 21:C1) introduced double/debiased machine learning (DML). Estimate nuisance functions (E[Y | X], E[D | X]) with cross-fitted ML, then plug residuals into the Neyman-orthogonal moment for the target parameter. This achieves root-N consistency for the causal parameter even when the nuisance estimators converge more slowly, provided their rates are faster than N^{-1/4}. Implementations include doubleml in Python and R (Bach-Chernozhukov-Kurz-Spindler 2022), EconML (Microsoft Research, with Athey and Vasilis Syrgkanis as scientific leads), and CausalML (Uber).

Targeted maximum likelihood (TMLE, van der Laan-Rubin 2006 International Journal of Biostatistics 2:11) and the augmented inverse-probability-weighted (AIPW) estimator (Robins-Rotnitzky-Zhao 1994 JASA 89:846) provide doubly-robust estimation: they are consistent if either the outcome model or the propensity model is correct, and efficient when both are. Influence-function based inference (Edward Kennedy 2023 Annual Review of Statistics 10:189 review) ties together the modern semiparametric efficient-influence-function machinery.
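Double robustness can be seen directly in the AIPW score. The sketch below takes the nuisance predictions as given (propensity e, outcome predictions m1 and m0); in practice they would come from cross-fitted models. The function name aipw_ate and all numbers are hypothetical.

```python
def aipw_ate(y, d, e, m1, m0):
    """AIPW estimate of the average treatment effect and its standard error.

    Score: m1 - m0 + d*(y - m1)/e - (1 - d)*(y - m0)/(1 - e).
    """
    n = len(y)
    psi = [
        m1i - m0i + di * (yi - m1i) / ei - (1 - di) * (yi - m0i) / (1 - ei)
        for yi, di, ei, m1i, m0i in zip(y, d, e, m1, m0)
    ]
    ate = sum(psi) / n
    var = sum((p - ate) ** 2 for p in psi) / (n - 1)
    return ate, (var / n) ** 0.5


# Hypothetical data with a true effect of 2 and a true propensity of 0.5.
y = [3.0, 1.0, 4.0, 2.0]
d = [1, 0, 1, 0]
e = [0.5, 0.5, 0.5, 0.5]

# Case 1: correct outcome model.
ate_good_outcome, _ = aipw_ate(y, d, e, m1=[3.0, 3.0, 4.0, 4.0], m0=[1.0, 1.0, 2.0, 2.0])
# Case 2: useless outcome model (all zeros) but correct propensity.
ate_good_propensity, _ = aipw_ate(y, d, e, m1=[0.0] * 4, m0=[0.0] * 4)
print(ate_good_outcome, ate_good_propensity)
```

Both calls return 2: when the outcome model is right the weighting terms vanish, and when it is wrong the correct propensity weights repair the estimate.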

Synthetic difference-in-differences (cited above) and matrix completion sit at the interface of ML and traditional panel econometrics. Athey-Imbens-Wager (2018) Journal of the Royal Statistical Society B 80:597 introduced approximate residual balancing for high-dimensional treatment effects, related to the De Vito-Wager (2024) “balance trick” for high-dimensional propensity scores.

Software

Stata (StataCorp) is the lingua franca for applied microeconomics. Alongside built-in commands such as xtreg, essential user-contributed commands include ivreg2 (Baum-Schaffer-Stillman), xtabond2 (Roodman), reghdfe (Sergio Correia), did_imputation (Borusyak), csdid (Callaway-Sant’Anna), eventstudyinteract (Sun-Abraham), rdrobust (Calonico-Cattaneo-Titiunik), synth, and synth_runner.

R (R Core Team) dominates academic econometrics with fixest (Laurent Bergé 2018, package downloaded >300k/month per CRAN) for high-dimensional FE, did, DRDID, gsynth, grf, plm, vars, urca, sandwich, AER, and lmtest. Python’s statsmodels, linearmodels (Kevin Sheppard), econml, doubleml, and PyFixest (Jose Cuesta and Alexander Fischer, replicating fixest) have grown rapidly post-2020. Julia’s FixedEffectModels.jl (Matthieu Gomez) handles billion-row panels with reweighting in seconds. Modern compiled-language solvers leveraging multi-threaded BLAS and CSR sparse matrix algebra have closed the speed gap that historically favored proprietary Stata/SAS over open-source tools.

Replication crisis in economics

Camerer et al. (2016, Science 351:1433) re-ran 18 lab experiments from top economics journals; 11 (61%) replicated with a significant effect in the same direction. Chang-Li (2015 Federal Reserve Finance and Economics Discussion Series) re-examined 67 macro/finance papers, replicating only one third with author-supplied data and code. The American Economic Association established a Data and Code Availability Policy in 2019. The AEA Data Editor (Lars Vilhuber, Cornell) since 2020 verifies replication packages before publication, becoming a de facto editorial bottleneck. The Social Science Reproduction Platform (SSRP) and I4Replication (Institute for Replication, founded 2021, Abel Brodeur Ottawa) coordinate systematic replications across leading journals. Vilhuber’s 2020 audit found 21% of submitted packages had non-trivial replication failures. Pre-analysis plans (AEA RCT Registry since 2013, OSF Registries since 2012) are now expected for field experiments and increasingly for observational studies. The credibility revolution has thus arrived at its accountability moment: methods to identify causal parameters now coexist with infrastructure to verify that reported numbers can in fact be reproduced from public code.

Discrete-choice and limited-dependent variables

Binary choice models — probit (normal CDF link) and logit (logistic link) — estimate Pr(y = 1 | x) by MLE. Multinomial logit (McFadden 1973) assumes Independence of Irrelevant Alternatives (IIA) — the red-bus/blue-bus problem prompted nested logit and the BLP random-coefficients model to relax IIA. Ordered probit/logit (Aitchison-Silvey 1957) handles ordered categorical outcomes.
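The IIA property of multinomial logit can be demonstrated in a few lines: adding an alternative leaves the odds between the existing ones unchanged. The utilities below are invented for the red-bus/blue-bus story, and mnl_probs is an illustrative helper, not a library function.

```python
import math


def mnl_probs(utilities):
    """Multinomial logit choice probabilities exp(v_j) / sum_k exp(v_k)."""
    vmax = max(utilities.values())  # subtract the max for numerical stability
    expv = {alt: math.exp(v - vmax) for alt, v in utilities.items()}
    total = sum(expv.values())
    return {alt: e / total for alt, e in expv.items()}


# Hypothetical systematic utilities.
p2 = mnl_probs({"car": 1.0, "red_bus": 0.0})
p3 = mnl_probs({"car": 1.0, "red_bus": 0.0, "blue_bus": 0.0})

print(p2["car"] / p2["red_bus"], p3["car"] / p3["red_bus"])
print(p2["car"], p3["car"])
```

The car-to-red-bus odds are identical in both choice sets, so the blue bus draws share proportionally from the car as well as from the red bus. Intuition says an identical bus should split only the bus riders, which is the implausible substitution pattern that nested logit and random-coefficients models relax.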

Censored regression: Tobit (James Tobin 1958 Econometrica 26:24, Nobel 1981) for an outcome observed only above a threshold (latent y* with observed y = max(y*, 0)). Truncated regression discards observations outside a range; censored regression keeps them at the boundary. Type II Tobit (Heckman selection) handles dual processes for participation and amount.

Count data — Poisson regression with E[y | x] = exp(x’beta); negative binomial for over-dispersion (Cameron-Trivedi 1998 Regression Analysis of Count Data); zero-inflated Poisson and hurdle models for excess zeros (Lambert 1992 Technometrics 34:1).

Duration models — hazard regression with proportional-hazards Cox (1972 JRSS B 34:187) or accelerated failure time. Discrete-time duration via complementary log-log link.

Bayesian econometrics

Bayesian econometrics estimates the posterior p(theta | data) propto p(data | theta) p(theta). Conjugate priors (normal-normal, beta-binomial) yield closed-form posteriors; general models require MCMC.
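A conjugate update needs no sampling at all. As a sketch of the beta-binomial case: a Beta(a, b) prior on a success probability combined with s successes in n trials gives a Beta(a + s, b + n - s) posterior. The prior and data below are arbitrary illustrations.

```python
def beta_binomial_update(a, b, successes, trials):
    """Posterior Beta parameters after observing binomial data."""
    return a + successes, b + (trials - successes)


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


# Hypothetical example: Beta(2, 2) prior, 7 successes in 10 trials.
a_post, b_post = beta_binomial_update(2, 2, successes=7, trials=10)
print(a_post, b_post, beta_mean(a_post, b_post))
```

The posterior mean 9/14 lies between the prior mean 0.5 and the sample proportion 0.7, with the prior acting like four pseudo-observations.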

Gibbs sampling (Geman-Geman 1984 IEEE PAMI 6:721) and Metropolis-Hastings (1953/1970) underlie most macro applications. Modern packages: Stan (Andrew Gelman et al, Carpenter et al 2017 Journal of Statistical Software 76:1) using NUTS Hamiltonian Monte Carlo; PyMC (Salvatier et al 2016); R-INLA for spatial models; Dynare for Bayesian DSGE.

Bayesian model averaging (BMA) handles model uncertainty by integrating over models with posterior model weights. Sala-i-Martin-Doppelhofer-Miller (2004 American Economic Review 94:813) applied BMA to growth regressions; Fernandez-Ley-Steel (2001 Journal of Econometrics 100:381), on benchmark priors for BMA, is the canonical treatment.

DSGE Bayesian estimation (Smets-Wouters 2003, 2007 cited in macro entry) uses prior beliefs on structural parameters and likelihood from state-space Kalman filtering on observable series.

Quantile regression

Koenker-Bassett (1978 Econometrica 46:33) introduced quantile regression: estimate Q_tau(y | x) for an arbitrary quantile tau instead of the mean. Quantile coefficients minimize the sum of the asymmetric absolute-deviation (check) loss rho_tau(u) = u (tau - 1{u < 0}) over residuals u = y - x’beta. The estimator is robust to outliers in y and captures heterogeneous effects across the distribution (e.g., minimum-wage effects on low-quantile wages may differ from those on high-quantile wages).
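In the intercept-only case the check-loss minimizer is a sample quantile, which makes the robustness claim easy to see. The sketch below searches over the observed values (a minimizer always exists among them); the helper names and data are illustrative.

```python
def check_loss(u, tau):
    """rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def quantile_by_check_loss(y, tau):
    """Constant q minimizing sum_i rho_tau(y_i - q), searched over data points."""
    return min(sorted(y), key=lambda q: sum(check_loss(yi - q, tau) for yi in y))


# Hypothetical sample with one large outlier.
y = [1.0, 2.0, 3.0, 4.0, 100.0]
median = quantile_by_check_loss(y, tau=0.5)
mean = sum(y) / len(y)
print(median, mean)
```

The tau = 0.5 minimizer is the median, 3, which is unaffected by the outlier, whereas the mean is dragged to 22. Quantile regression replaces the constant q with x’beta and solves the same problem by linear programming.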

Roger Koenker’s quantreg R package and the Frisch-Newton interior-point algorithm enable quantile estimation at scale. Quantile IV (Chernozhukov-Hansen 2008 Journal of Econometrics 142:379) and quantile regression for panel data (Canay 2011 Econometrics Journal 14:368) extend the framework.

Spatial econometrics

Spatial autoregressive (SAR) and spatial error (SEM) models account for cross-sectional dependence via a spatial weight matrix W. Luc Anselin’s Spatial Econometrics (1988) and GeoDa software are canonical. Spatial Durbin model (SDM) includes spatial lags of both dependent variable and covariates.

The James LeSage / Kelley Pace Bayesian spatial econometrics methodology (Introduction to Spatial Econometrics 2009) is widely used in regional and urban econometrics.

Big-data and high-dimensional methods

Lasso (Robert Tibshirani 1996 JRSS B 58:267) selects variables by L1 penalty. Group lasso (Yuan-Lin 2006), elastic net (Zou-Hastie 2005), and sparse-group lasso extend the framework. Naive standard errors after lasso selection are invalid, so inference requires dedicated methods: post-selection inference (Berk-Brown-Buja-Zhang-Zhao 2013 Annals of Statistics 41:802) and the post-double-selection method (Belloni-Chernozhukov-Hansen 2014 Review of Economic Studies 81:608) are essential.

Random forests, gradient boosting (XGBoost, LightGBM), and neural networks supplement classical econometrics for prediction tasks. The Athey-Imbens “Machine Learning Methods That Economists Should Know About” (2019 Annual Review of Economics 11:685) review surveys the integration.

Nonparametric and semiparametric methods

Kernel density estimation (Rosenblatt 1956 Annals of Mathematical Statistics 27:832; Parzen 1962 Annals of Mathematical Statistics 33:1065) with bandwidth h and kernel K. Optimal bandwidth via Silverman’s rule or cross-validation.

Nonparametric regression — Nadaraya-Watson (1964) kernel regression; local-linear regression (Fan-Gijbels 1996 Local Polynomial Modelling) reduces boundary bias. Series estimators using B-splines, wavelets, or polynomial bases.

Semiparametric models combine parametric and nonparametric components. Partially linear model y = x’beta + g(z) + u where g is unknown (Robinson 1988 Econometrica 56:931). Single-index models y = G(x’beta) + u where G is unknown. Generalized additive models y = sum g_j(x_j) + u (Hastie-Tibshirani 1990).

Sieve estimators (Chen 2007 Handbook of Econometrics Vol 6B) approximate infinite-dimensional parameters by sequences of finite-dimensional sieves with dimension growing in sample size.

Survey design and complex samples

Probability sampling (Cochran 1977 Sampling Techniques) underlies most survey-based microeconometrics. Sampling weights from stratified, clustered, or oversampled designs require Horvitz-Thompson estimation or weighted regression. Survey-design variance estimators (Taylor linearization, BRR, jackknife) implemented in Stata svy: prefix or R survey package.

Item nonresponse handled by multiple imputation (Rubin 1987 Multiple Imputation for Nonresponse in Surveys) — m imputed datasets analyzed and combined via Rubin’s rules.
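Rubin’s rules combine the m completed-data analyses into one estimate and one variance: the pooled estimate is the average of the m estimates, and the total variance is the average within-imputation variance plus (1 + 1/m) times the between-imputation variance. A sketch with made-up numbers (rubin_combine is a hypothetical helper name):

```python
def rubin_combine(estimates, variances):
    """Pool m point estimates and their variances using Rubin's rules."""
    m = len(estimates)
    q_bar = sum(estimates) / m                               # pooled estimate
    within = sum(variances) / m                              # average sampling variance
    between = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)
    total = within + (1 + 1 / m) * between                   # total variance
    return q_bar, total


# Hypothetical results from m = 3 imputed datasets.
q_bar, total_var = rubin_combine(
    estimates=[1.0, 1.2, 0.8],
    variances=[0.04, 0.05, 0.03],
)
print(q_bar, total_var, total_var ** 0.5)
```

The between-imputation term is what a single-imputation analysis ignores; it reflects the extra uncertainty from not knowing the missing values.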

Network econometrics

Peer effects: the reflection problem (Manski 1993 Review of Economic Studies 60:531) shows that identifying endogenous social interactions from individual data is fundamentally hard. Bramoullé-Djebbari-Fortin (2009 Journal of Econometrics 150:41) showed that instrumental variables based on network structure can identify peer effects when intransitive triads exist.

Network formation models (Jackson-Wolinsky 1996 Journal of Economic Theory 71:44; Mele 2017 Econometrica 85:825 ERGM-based estimation). SAR with network adjacency matrix W. Graphical lasso for sparse precision matrices.
