Numerical Methods Reference — Quadrature, Interpolation, Root-Finding, Linear Solvers
A practitioner’s catalogue of classical numerical analysis: floating-point reality, interpolation, integration (quadrature), root-finding, and linear systems. Companion to ode-pde-solver-catalog. SI units throughout.
1. Floating-point arithmetic
1.1 IEEE 754
The IEEE 754 standard (1985; revised 2008, 2019) defines binary and decimal floating-point formats. Binary formats:
| Format | Bits | Sign | Exp | Mantissa | Machine ε |
|---|---|---|---|---|---|
| half (binary16) | 16 | 1 | 5 | 10 | 9.77e-4 |
| bfloat16 | 16 | 1 | 8 | 7 | 3.91e-3 |
| single (binary32) | 32 | 1 | 8 | 23 | 1.19e-7 |
| double (binary64) | 64 | 1 | 11 | 52 | 2.22e-16 |
| quad (binary128) | 128 | 1 | 15 | 112 | 1.93e-34 |
Machine epsilon ε: smallest x with 1 + x ≠ 1 in the chosen format.
A normal number has the form (-1)^s · 1.m · 2^{e - bias}. Subnormals (also called denormals) fill the gap near zero: leading bit is 0, exponent at min — graceful underflow at the cost of precision.
bfloat16 (Google Brain) sacrifices mantissa precision for single-precision exponent range — designed for deep-learning training where dynamic range matters more than per-multiply precision.
1.2 ULP and rounding
ULP (Unit in Last Place) = spacing between consecutive representable floats. Rounding modes (IEEE 754):
- Round to nearest, ties to even (default).
- Round toward 0 (truncation).
- Round toward +∞, -∞.
Correctly rounded arithmetic ensures fl(a op b) is the nearest representable result.
1.3 Catastrophic cancellation
Subtracting nearly equal numbers loses significant digits:
(1 + ε)² - 1 - 2ε - ε² # algebraically 0
evaluated naively in double, gives roundoff O(ε²) ~ 5e-32 but if computed as (a-1) with a=1+ε, cancellation eats half the bits.
Classic example: quadratic formula. For b² >> 4ac:
x = (-b + sqrt(b² - 4ac)) / (2a) # cancellation if b > 0
Reformulate:
x1 = (-b - sign(b) sqrt(b² - 4ac)) / (2a) # no cancellation
x2 = c / (a x1) # Vieta
1.4 Kahan summation
William Kahan’s 1965 compensated summation reduces error from O(nε) to O(ε) regardless of n:
sum = 0; c = 0
for x_i:
y = x_i - c
t = sum + y
c = (t - sum) - y # error compensation
sum = t
Variants: Neumaier (handles x_i larger than sum), pairwise summation, Klein.
1.5 Backward error analysis
James H. Wilkinson’s 1963 Rounding Errors in Algebraic Processes and 1965 The Algebraic Eigenvalue Problem: instead of bounding forward error |computed - exact|, show that the computed result is the exact solution to a nearby problem. Reframes algorithm assessment around problem conditioning vs algorithmic stability.
2. Interpolation
Given (x_i, y_i), i = 0, ..., n, find f with f(x_i) = y_i.
2.1 Polynomial interpolation
Lagrange form:
p(x) = Σ_i y_i L_i(x) L_i(x) = ∏_{j≠i} (x - x_j) / (x_i - x_j)
Numerically poor in monomial basis; the barycentric form (Berrut-Trefethen 2004 SIAM Rev. 46, “Barycentric Lagrange interpolation”) is stable:
p(x) = Σ_i (w_i / (x - x_i)) y_i / Σ_i (w_i / (x - x_i)) w_i = 1 / ∏_{j≠i}(x_i - x_j)
Newton divided differences: incremental — add a node, update one coefficient. Useful for online interpolation.
2.2 Runge phenomenon
Equispaced high-degree polynomial interpolation diverges near endpoints. Carl Runge 1901: f(x) = 1/(1 + 25x²) on [-1,1] has polynomial interpolant whose error grows without bound at the endpoints as n → ∞.
Cure: Chebyshev nodes x_k = cos((2k+1)π/(2n+2)). Convergence for analytic f is geometric. Backbone of the Chebfun ecosystem.
2.3 Piecewise interpolation
| Type | Continuity | Notes |
|---|---|---|
| Piecewise linear | C^0 | Fastest; large second-derivative error |
| Piecewise cubic | C^1 / C^2 | Standard for smooth data |
| Cubic spline | C^2 | Min-curvature among all C^2 interpolants |
| Hermite cubic | C^1 with prescribed derivatives | Local; no global solve |
| PCHIP (Fritsch-Carlson 1980) | C^1, monotone-preserving | scipy.interpolate.PchipInterpolator |
| Akima (1970) | C^1, robust to outliers | Less oscillation than spline |
Spline boundary conditions:
- Natural:
p''(x_0) = p''(x_n) = 0. - Clamped:
p'(x_0),p'(x_n)prescribed. - Not-a-knot: continuity of
p'''at second and second-to-last knots (MATLAB default). - Periodic:
p^{(k)}(x_0) = p^{(k)}(x_n).
2.4 Multivariate
- Bilinear / trilinear on rectilinear grids.
- Bicubic spline (tensor product).
- Scattered data:
- Inverse distance weighting (Shepard 1968).
- Radial basis functions (Buhmann 2003) — RBF kernels (Gaussian, multiquadric, thin-plate spline).
- Kriging / Gaussian-process regression: probabilistic — see gaussian-processes.
- Delaunay-based linear / cubic.
- NURBS (Piegl-Tiller The NURBS Book, 1997) for CAD geometry.
2.5 Approximation (best fit, not exact match)
- Least squares: minimise
Σ (p(x_i) - y_i)². Vandermonde-or-QR vianumpy.polyfit. - Padé approximants: rational
R(x) = P_m(x) / Q_k(x)— Padé table; better than polynomials with poles. - Chebyshev approximation: expand on Chebyshev basis up to degree N, truncate. Near-best uniform approximation.
- Remez exchange algorithm (Yevgeny Remez 1934): produces minimax polynomial — best
L^∞approximation. Used in libm:expf,sinfimplementations via Remez.
3. Quadrature (numerical integration)
3.1 Newton-Cotes (closed)
Evaluate at equispaced nodes; integrate the interpolating polynomial.
| Rule | Order | Formula on [a,b] with h = b-a |
|---|---|---|
| Midpoint | 2 | h · f((a+b)/2) |
| Trapezoidal | 2 | (h/2)(f(a) + f(b)) |
| Simpson 1/3 | 4 | (h/6)(f(a) + 4f((a+b)/2) + f(b)) |
| Simpson 3/8 | 4 | uses 4 points |
| Boole’s | 6 | uses 5 points |
Composite versions on subintervals. Newton-Cotes of degree ≥ 8 have negative weights — avoid.
3.2 Romberg integration
Werner Romberg 1955: Richardson extrapolation applied to the trapezoidal rule. Halving the step and combining cancels successive even-power error terms — error O(h^{2k}) after k extrapolations on smooth integrands.
3.3 Gaussian quadrature
n-point Gauss quadrature is exact for polynomials of degree 2n - 1. Nodes = roots of orthogonal polynomial; weights = integrals of Lagrange basis.
| Family | Interval | Weight |
|---|---|---|
| Gauss-Legendre | [-1, 1] | 1 |
| Gauss-Chebyshev | [-1, 1] | 1/sqrt(1 - x²) |
| Gauss-Hermite | (-∞, ∞) | e^{-x²} |
| Gauss-Laguerre | [0, ∞) | e^{-x} |
| Gauss-Jacobi | [-1, 1] | (1-x)^α (1+x)^β |
Gauss-Kronrod (Aleksandr Kronrod 1965) adds n + 1 points to an n-point Gauss rule to create a (2n+1)-point rule of order 3n + 1 — gives a free error estimate from the difference. Foundation of QUADPACK.
3.4 Adaptive quadrature
Subdivide intervals where the error estimate is large.
- Adaptive Simpson (Lyness 1969; Gander-Gautschi “Adaptive quadrature — revisited” BIT 40, 2000 — modern fix to long-standing bugs).
- Adaptive Gauss-Kronrod (QUADPACK, Piessens-de Doncker-Kapenga-Überhuber 1983):
QAG,QAGS(with end singularities),QAGI(infinite interval),QAWO(oscillatory).scipy.integrate.quadwraps QUADPACK.
3.5 Tanh-sinh (double exponential)
Hidetoshi Takahasi & Masatake Mori 1974: change of variable x = tanh((π/2) sinh(t)) makes the integrand decay double-exponentially at the endpoints. Then trapezoidal rule on the t-line is near-optimal, and endpoint singularities are absorbed. Often the right choice for ∫_0^1 f(x) dx with f exhibiting boundary singularity. Used in arbitrary-precision packages (mpmath).
3.6 Multi-dimensional
- Product Gauss:
n^dpoints — curse of dimensionality. - Sparse grids (Smolyak 1963):
O(n (log n)^{d-1})points; works for moderate d (~10). - Quasi-Monte Carlo:
- Halton (1960) sequences.
- Sobol (Soviet, 1967) sequences — most common.
- Niederreiter (t,m,s)-nets (1992 book Random Number Generation and Quasi-Monte Carlo Methods).
- Convergence
O((log N)^d / N)— beats Monte Carlo for smooth integrands.
- Monte Carlo:
O(N^{-1/2})regardless of d — best when d very large or integrand rough. See mcmc-sampling. - Importance sampling, stratified, control variates — variance reduction.
4. Root-finding
4.1 Univariate f(x) = 0
| Method | Convergence | Notes |
|---|---|---|
| Bisection | linear (O(log 1/ε)) | Bracket required; never fails to converge |
| False position (regula falsi) | superlinear with care | Can stall (Anderson-Björck modification) |
| Secant | order (1+√5)/2 ≈ 1.618 | No derivative |
| Newton-Raphson | quadratic | Need f'; can diverge or cycle |
| Halley | cubic | Need f'' |
| Brent (1971 R.P. Brent) | superlinear, fail-safe | Combines bisection + IQI + secant; default in scipy.optimize.brentq |
| Ridders | quadratic, fail-safe | Bracketed; uses exponential transform |
| TOMS 748 (Alefeld-Potra-Shi 1995) | superlinear | Brent successor |
| ITP (Oliveira-Takahashi 2020) | optimal | Bracket + ITP; converges in fewer iterations than Brent |
Newton: x_{n+1} = x_n - f(x_n)/f'(x_n). Quadratic when starting near simple root; can completely fail (cycles, divergence to ∞, multiple roots).
4.2 Multivariate nonlinear
F: R^n → R^n, F(x) = 0.
- Newton:
x_{n+1} = x_n - J(x_n)^{-1} F(x_n). Quadratic locally; needs Jacobian. - Quasi-Newton:
- Broyden 1965 (rank-1 update of J^{-1}).
- BFGS for optimisation (gradient of objective ≡ zero).
- Anderson acceleration (Donald Anderson 1965 JACM): mixes m previous fixed-point iterates
g(x) = xvia least squares; widely used for SCF in electronic structure. - Trust region with Levenberg-Marquardt (Levenberg 1944, Marquardt 1963) for least squares — interpolates Gauss-Newton and gradient descent.
- Continuation / homotopy (Allgower-Georg 1990): solve
F(x; λ) = 0along a path from easyλ=0to hardλ=1.
4.3 Polynomial roots
- Companion matrix eigenvalues: stable, what
numpy.rootsdoes. CostsO(n³)for degree-n. - Jenkins-Traub algorithm (1970 ACM TOMS) — high-quality, still in many libraries.
- Durand-Kerner (Weierstrass-Kerner) — simultaneous iteration on all roots.
- Aberth-Ehrlich — better than Durand-Kerner.
5. Linear systems
A x = b, A ∈ R^{n×n} (or C^{n×n}).
5.1 Conditioning
Condition number: κ(A) = ||A|| · ||A^{-1}||. For ||·||_2, κ_2(A) = σ_max(A) / σ_min(A) (ratio of extreme singular values).
Forward error: ||δx||/||x|| ≲ κ(A) · ||δb||/||b|| — small perturbations amplified by κ.
Hilbert matrix H_{ij} = 1/(i+j-1): κ_2(H_n) ≈ 10^n. Ill-conditioned by construction.
5.2 Direct methods
Gaussian elimination + LU factorisation: A = P L U (P permutation from pivoting). Cost: (2/3)n³ flops. Stable with partial pivoting (row swap by max in column). Full pivoting (row + col swap) more stable but rarely needed.
Cholesky (symmetric positive definite): A = L L^T, cost (1/3)n³. No pivoting needed.
LDL^T: symmetric indefinite (A = LDL^T, D block-diagonal). Bunch-Kaufman pivoting (1977).
QR factorisation: A = QR with Q orthogonal, R upper triangular. Methods:
- Householder reflections (Householder 1958): most stable; standard.
- Givens rotations: useful for sparse / structured.
- Modified Gram-Schmidt: numerically stabler than classical GS; not as stable as Householder.
SVD: A = UΣV^T. Cost O(n³) constants larger than LU. Most robust for rank-deficient / ill-conditioned least-squares. Computed via bidiagonalisation + QR or divide-and-conquer.
5.3 Banded and structured
- Thomas algorithm (tridiagonal):
O(n)direct solve. Standard for 1D BVPs. - Block tridiagonal generalisation.
- Banded LU (LAPACK
dgbsv): O(n · b²) where b = bandwidth.
5.4 Sparse direct
For sparse SPD or general sparse, fill-in is the enemy. Reorder to minimise fill, then factor:
| Reordering | Notes |
|---|---|
| RCM (Reverse Cuthill-McKee 1969) | Bandwidth reduction |
| AMD (Approximate Minimum Degree, Amestoy-Davis-Duff 1996) | Fill-in heuristic |
| METIS (Karypis-Kumar 1998) | Nested dissection |
| Scotch (Pellegrini, Bordeaux) | Alternative partitioning |
Sparse direct solvers:
- UMFPACK (Tim Davis, Texas A&M;
scipy.sparse.linalg.spsolvebackend on Unix). - SuperLU (Demmel-Li-Eisenstat-Liu, Berkeley/LBL).
- PARDISO (Schenk-Gärtner, Basel; included in Intel MKL).
- MUMPS (Multifrontal Massively Parallel Sparse, Amestoy-Buttari-Guermouche-L’Excellent-Mary).
- CHOLMOD (Davis) — sparse Cholesky.
- WSMP (IBM Watson).
5.5 Iterative methods (stationary)
x^{k+1} = M^{-1}(N x^k + b) from splitting A = M - N.
| Method | Update |
|---|---|
| Jacobi | x_i^{k+1} = (b_i - Σ_{j≠i} a_{ij} x_j^k) / a_{ii} |
| Gauss-Seidel | Use latest values within sweep |
| SOR (Successive Over-Relaxation) | Weight ω between G-S and prior |
| SSOR | Symmetric SOR |
David Young’s 1954 PhD: optimal ω for SOR on Poisson grid is 2/(1 + sin(πh)) — twice the iterations of CG. Mostly historical now, except as smoothers in multigrid.
5.6 Krylov subspace methods
Build K_m(A, r_0) = span{r_0, A r_0, ..., A^{m-1} r_0} and project.
| Method | A type | Inventors |
|---|---|---|
| Conjugate Gradient (CG) | SPD | Hestenes-Stiefel 1952 |
| MINRES | symmetric indefinite | Paige-Saunders 1975 |
| SYMMLQ | symmetric indefinite | Paige-Saunders 1975 |
| GMRES | general | Saad-Schultz 1986 |
| BiCG | general | Fletcher 1976 |
| BiCGStab | general | van der Vorst 1992 |
| CGS | general | Sonneveld 1989 |
| IDR(s) | general | Sonneveld-van Gijzen 2008 |
| QMR | general | Freund-Nachtigal 1991 |
| TFQMR | general | Freund 1993 |
| LSQR | least squares | Paige-Saunders 1982 |
| LSMR | least squares | Fong-Saunders 2011 |
Yousef Saad’s Iterative Methods for Sparse Linear Systems (SIAM 2003) is the standard reference. Software: PETSc KSP, Trilinos Belos, hypre KSP, SciPy scipy.sparse.linalg.
Restart: GMRES storage and orthogonalisation cost grow with iteration; use GMRES(m) — restart every m iterations. Loses optimality but bounds storage.
5.7 Preconditioning
Solve M^{-1} A x = M^{-1} b with M an approximation to A that is cheap to apply.
| Preconditioner | Notes |
|---|---|
| Diagonal (Jacobi) | Cheapest; useful for diagonally dominant |
| ILU(k) / ILUT (Saad) | Incomplete LU; threshold or level-of-fill |
| IC(k) (incomplete Cholesky) | SPD analogue |
| Block-diagonal | Saddle-point systems |
| Multigrid (geometric or algebraic) | O(N) for elliptic; see ode-pde-solver-catalog |
| Domain decomposition (Schwarz, FETI, BDD) | Parallel scalability |
| AINV (approximate inverse) | Useful when factorisation hard |
| Sparse approximate inverse (SPAI) | Grote-Huckle 1997 |
| Polynomial (Chebyshev, Neumann) | Matrix-free; combines well with MG |
5.8 Eigenvalue problems
Computing eigenvalues / eigenvectors of A.
| Method | Use |
|---|---|
| Power iteration | Single dominant eigenvalue |
| Inverse iteration | Eigenvalue near a shift μ |
| Rayleigh quotient iteration | Cubic convergence near simple eigenvalue (SPD) |
| QR algorithm (Francis-Kublanovskaya 1961) | Dense small/medium |
| Hessenberg reduction → QR | Cost O(n³) |
| Symmetric tridiagonal QR | LAPACK dsyev |
| Divide-and-conquer | LAPACK dsyevd |
| MRRR (Multiple Relatively Robust Representations) | Dhillon-Parlett 2004 dsyevr |
| Lanczos (1950) | Large sparse symmetric |
| Arnoldi (1951) | Large sparse general |
| Implicitly Restarted Arnoldi | ARPACK |
| Krylov-Schur (Stewart 2001) | Improvement of IRA |
| Jacobi-Davidson (Sleijpen-van der Vorst 1996) | Selected interior eigenvalues |
| LOBPCG (Knyazev 2001) | Largest few SPD eigenpairs |
| FEAST (Polizzi 2009) | Contour integral; eigenvalues in interval |
See eigenvalue-problems for theory.
6. Least squares
Minimise ||Ax - b||_2² for A ∈ R^{m×n}, m > n.
- Normal equations:
A^T A x = A^T b. Cheap (A^T Aisn×n), but condition number squared — never use for ill-conditioned A. - QR:
A = QR_1, solveR_1 x = Q^T b. Standard. - SVD:
A = UΣV^T,x = V Σ^+ U^T b. Most robust; handles rank-deficient. - Iterative: LSQR, LSMR for sparse / large.
- Regularised:
- Tikhonov:
min ||Ax - b||² + λ ||x||²(= ridge regression). Closed formx = (A^T A + λI)^{-1} A^T b. - Truncated SVD: drop singular values below threshold.
- L1 / LASSO (Tibshirani 1996):
min ||Ax - b||² + λ ||x||_1. Sparse solutions.
- Tikhonov:
6.1 Nonlinear least squares
Minimise (1/2) Σ r_i(θ)².
- Gauss-Newton:
θ_{n+1} = θ_n - (J^T J)^{-1} J^T r. Linearise residuals. - Levenberg-Marquardt:
(J^T J + λ I) δθ = -J^T r. λ adaptive; trust-region interpretation. Standard for nonlinear fitting (MINPACK,scipy.optimize.least_squares). - Powell’s dogleg trust-region.
- Trust-region reflective (Coleman-Li 1996) — handles bounds.
7. Approximation theory
7.1 Chebyshev approximation
For continuous f: [-1, 1] → R, the truncated Chebyshev series
f(x) ≈ Σ_{k=0}^N c_k T_k(x) T_k(cos θ) = cos(kθ)
with c_k = (2/π) ∫_{-1}^1 f(x) T_k(x) / sqrt(1 - x²) dx is near-optimal in L^∞. Truncation error decays geometrically for analytic f. Foundation of Chebfun.
7.2 Best approximation
The n-th best polynomial approximation in L^∞ to a continuous function on [a,b] exists, is unique, and is characterised by the equioscillation theorem (Chebyshev): it equioscillates at ≥ n+2 points with alternating sign.
Remez exchange algorithm: iterative algorithm converging to the equioscillating polynomial. Used in libm and DSP filter design.
7.3 Padé approximants
Rational approximation R_{m,k}(x) = P_m(x) / Q_k(x) matching the Taylor series of f to order m+k. Often dramatically better than polynomials, especially for functions with poles. Henri Padé’s thesis 1892.
7.4 Norms
| Norm | Best approximation algorithm | Notes |
|---|---|---|
L^∞ (minimax) | Remez | Equioscillation |
L^2 | Orthogonal projection | Closed form |
L^1 | Linear programming | Robust to outliers; non-unique |
8. Software ecosystem
8.1 Reference libraries
| Library | Domain | Notes |
|---|---|---|
| BLAS | Dense linear algebra primitives | Levels 1/2/3; Reference (Lawson-Hanson 1979 / Dongarra et al. 1988/1990) |
| LAPACK | Dense linear algebra | Anderson-Bai-Bischof-… 1990 → current. LU, QR, SVD, eigenvalues. Calls BLAS |
| OpenBLAS | BLAS impl | Xianyi Zhang; descendant of GotoBLAS (Kazushige Goto) |
| Intel MKL / oneMKL | BLAS+LAPACK+more | Intel; tuned for x86 |
| AMD AOCL / BLIS | BLAS+LAPACK | AMD; BLIS framework (Field van Zee + FLAME, UT Austin) |
| ATLAS | BLAS auto-tuning | Whaley; older but still used |
| ARM Performance Libraries | BLAS+LAPACK | ARM |
| ScaLAPACK | Distributed dense | LAPACK + MPI |
| PLASMA / MAGMA | Multicore / GPU dense | Innovative Computing Lab, Tennessee |
| SLATE | Software for Linear Algebra Targeting Exascale | Successor to ScaLAPACK + MAGMA |
8.2 Sparse and iterative
- SuiteSparse (Tim Davis) — UMFPACK, CHOLMOD, KLU, SPQR.
- SuperLU (Demmel-Li, LBL).
- PARDISO (Schenk-Gärtner; in Intel MKL).
- MUMPS (Amestoy et al.).
- WSMP (IBM).
- hypre (LLNL) — BoomerAMG, KSP.
- PETSc (Argonne) — KSP, SNES, TS, DM stack.
- Trilinos (Sandia) — Tpetra, Belos, Ifpack2, MueLu, ROL.
8.3 Eigenvalue
- ARPACK (Lehoucq-Sorensen-Yang 1998) — implicitly restarted Arnoldi.
- SLEPc (Roman-Hernandez, Valencia) — built on PETSc.
- Anasazi (Trilinos) — block Krylov-Schur, LOBPCG.
- FEAST (Polizzi).
- PRIMME (Stathopoulos-McCombs 2010).
8.4 High-level numerical
| Tool | Language | Notes |
|---|---|---|
| NumPy | Python | Array stdlib; wraps BLAS/LAPACK |
| SciPy | Python | scipy.linalg, scipy.sparse.linalg, scipy.optimize, scipy.integrate, scipy.interpolate |
Julia stdlib LinearAlgebra | Julia | Wraps BLAS/LAPACK; \ operator |
| MATLAB | proprietary | Cleve Moler 1984+; numerical workhorse |
| Eigen | C++ header-only | Guennebaud-Jacob; very ergonomic |
| Armadillo | C++ | Conrad Sanderson; wraps BLAS/LAPACK |
| GSL (GNU Scientific Library) | C | General-purpose |
| NAG Numerical Library | proprietary | Numerical Algorithms Group, Oxford |
| Boost.Math | C++ | Special functions, distributions |
| JAX | Python | Google; AD + GPU + JIT |
| TensorFlow Probability | Python | Optimisers, distributions |
| Numba | Python | LLVM JIT for NumPy-style code |
8.5 Symbolic + arbitrary precision
- mpmath (Fredrik Johansson) — Python arbitrary-precision.
- SymPy — Python symbolic.
- MPFR / MPC / GMP — C arbitrary-precision (Granlund et al.).
- Arb (Fredrik Johansson) — interval arithmetic.
- Mathematica, Maple — proprietary CAS.
9. Stability and conditioning checklist
- Compute κ(A) before solving (or estimate via LU-based condition estimator —
dgeconin LAPACK). - Regularise if ill-conditioned (Tikhonov, truncated SVD).
- Iterative refinement:
x ← x + δxwithA δx = b - Axcomputed at higher precision — recovers digits. - Pivot — never run LU without partial pivoting on a real problem.
- Use the right factorisation: QR for least squares, Cholesky for SPD, LDL^T for symmetric indefinite, SVD when you need null space.
- Watch for cancellation in subtractions of near-equal quantities; reformulate.
- Kahan-sum when summing many small numbers, especially in single precision.
- Backward error: a useful algorithm produces
x̂that is the exact solution to(A + ΔA)x̂ = b + Δbwith||ΔA||/||A||,||Δb||/||b||of order ε. Forward error then bounded byκ(A) · ε.
10. Reference texts
- Trefethen & Bau, Numerical Linear Algebra, SIAM 1997.
- Golub & Van Loan, Matrix Computations, JHU Press, 4th ed. 2013.
- Higham, Accuracy and Stability of Numerical Algorithms, SIAM 2nd ed. 2002.
- Saad, Iterative Methods for Sparse Linear Systems, SIAM 2nd ed. 2003.
- Demmel, Applied Numerical Linear Algebra, SIAM 1997.
- Press, Teukolsky, Vetterling, Flannery, Numerical Recipes — pragmatic; some criticised algorithms; use as overview, cross-reference others.
- Quarteroni, Sacco, Saleri, Numerical Mathematics, Springer 2007.
- Stoer & Bulirsch, Introduction to Numerical Analysis, Springer.
- Davis, Direct Methods for Sparse Linear Systems, SIAM 2006.
- Wilkinson, The Algebraic Eigenvalue Problem, Clarendon 1965 — historical foundation.
11. Adjacent
- ode-pde-solver-catalog — companion catalogue: ODE/PDE time-stepping and PDE discretisations using these solvers.
- numerical-linear-algebra — dense linear algebra at greater theoretical depth.
- eigenvalue-problems — Krylov, Jacobi-Davidson, FEAST.
- svd-pca-spectral — SVD and its applications.
- optimization-algorithm-taxonomy — nonlinear least squares and broader optimisation.
- blas-lapack — implementation hierarchy of dense kernels.