System Identification — Engineering Reference

1. At a glance

System identification (sysid) is the empirical determination of a mathematical model of a dynamic system from measured input and output data. Where physics-based modelling derives $G (s)$ from first principles (Newton’s laws, KVL, energy balances), sysid fits a parametric or non-parametric model to time-series records of $u (t)$ and $y (t)$ collected from the real plant. The two activities are complementary — first-principles models supply structure, sysid supplies the numbers.

Two phases organise every project:

Experimental design — pick the input excitation, the sample rate, the operating point(s), the open-loop or closed-loop configuration, and the data length. A bad experiment kills the identification before the math begins.
Estimation — fit a model class (ARX, ARMAX, OE, BJ, state-space, grey-box) to the data, then validate residuals, cross-correlate against held-out data, and check structural identifiability.

Outputs span the modelling vocabulary: transfer functions $G (s)$ for classical-control loop tuning, state-space realisations $(A, B, C, D)$ for state-space-methods / LQR / Kalman work, frequency response functions (FRFs) for vibration-dynamics modal analysis, and structured grey-box models for digital twins and diagnostics. The same machinery underlies vehicle handling characterisation, biological (insulin-glucose, respiratory) modelling, structural modal testing on bridges and aircraft, MPC commissioning on refineries, and motor parameter estimation on factory floors.

Where it sits in the design stack: first-principles modelling → sysid → control synthesis (classical-control, state-space-methods, MPC) → deployment → online refinement (adaptive-control). Sysid drives commissioning time on process plants from weeks to days — a Honeywell or Aspen APC team will routinely fit thirty-plus FOPDT and state-space models in a fortnight where pure trial-and-error tuning would take a quarter.

2. Why it matters

Most plants do not arrive with a model. The first-principles description either does not exist (a paper machine’s wet end, a fermentation reactor, a vehicle suspension on uncertain road), is too complex to be useful for control design (full Navier-Stokes for an HVAC zone), or contains parameters that change with operating point (motor inductance with magnetic saturation, distillation gain with feed composition).

Without sysid, the engineer falls back on:

Trial-and-error PID tuning — workable for simple FOPDT-like loops, dangerous for safety-critical or coupled systems.
Worst-case overdesign — robust controllers that sacrifice performance for ignorance.
Blind acceptance of vendor numbers — motor nameplate parameters drift 10–30 % from datasheet across temperature, load, and aging.

With sysid, the engineer gets:

Predictable design — controller designed against an explicit model has bounded behaviour.
Performance traceability — when a loop misbehaves, the model + identification data say whether the plant changed, the actuator changed, or the controller tuning is stale.
Diagnostics — drift in identified parameters is an early warning of equipment degradation (bearing wear, fouling, valve stiction).
Model-based control — MPC, robust H∞, adaptive control, and digital twins all require an explicit model. No model, no MPC.

3. First principles

3.1 Model class taxonomy

Class	Form	Solver	Best for
ARX	$A (q) y = B (q) u + e$	Linear least-squares (closed form)	Quick first-cut, online RLS
ARMAX	$A (q) y = B (q) u + C (q) e$	Iterative PEM	Coloured output noise
OE (output-error)	$y = [B (q) / F (q)] u + e$	Iterative PEM	Noise on output sensor only
BJ (Box-Jenkins)	$y = [B / F] u + [C / D] e$	Iterative PEM	Independent process / measurement noise paths
State-space	$\overset{x}{˙} = A x + B u + w, y = C x + D u + v$	Subspace (N4SID, MOESP) or PEM	MIMO, modal, observer-based control
FRF (non-parametric)	$\hat{G} (jω) = Φ_{y u} / Φ_{uu}$	Welch / cross-spectral density	Vibration, structural, no model order chosen
Impulse / step response	$g (t)$ tabulated	Correlation or direct measurement	Quick gain / dead-time check
Grey-box	$\overset{x}{˙} = f (x, u; θ)$ structure known	Nonlinear PEM / MLE	Parameter ID under known physics
Volterra / Wiener-Hammerstein	static nonlinearity + LTI dynamic	Iterative; series expansion	Mild nonlinearity, saturating actuators
Neural / Gaussian process	$y = f_{θ} (regressors)$	Backprop / GP posterior	Strongly nonlinear, data-rich

$q$ is the forward shift operator ( $q y_{k} = y_{k + 1}$ ); $q^{- 1}$ is the backward shift. The polynomial $A (q) = 1 + a_{1} q^{- 1} + \dots + a_{n_{a}} q^{- n_{a}}$ encodes the autoregressive (denominator) dynamics; $B (q) = b_{1} q^{- 1} + \dots + b_{n_{b}} q^{- n_{b}}$ encodes the exogenous (numerator) dynamics; $e_{k}$ is the innovation (one-step prediction error).

3.2 Prediction-error method (PEM)

Ljung’s unifying framework: every parametric model class is fit by minimising the squared one-step prediction error.

$\hat{θ}_{N} = ar g min_{θ} \frac{1}{N} \sum_{k = 1}^{N} ε_{k} (θ)^{2}, ε_{k} (θ) = y_{k} - \overset{y}{^}_{k} (θ)$

For ARX, $\overset{y}{^}_{k} (θ)$ is linear in $θ$ and the global minimum is found by ordinary least squares in one matrix solve. For ARMAX, OE, BJ, and state-space the predictor is nonlinear in $θ$ and the optimisation runs as Gauss-Newton or Levenberg-Marquardt with multiple restarts to escape local minima. PEM has the desirable asymptotic property of consistency: as $N \to \infty$ , $\hat{θ}_{N} \to θ^{*}$ in probability provided the model class contains the true system and the input is persistently exciting (§4.2).

3.3 Subspace methods (4SID family)

The subspace-identification family (4SID — Subspace State Space System IDentification) — N4SID, MOESP, CVA — bypasses the nonlinear PEM iteration by exploiting geometric structure in stacked Hankel matrices of past and future inputs and outputs. The algorithms recover $(A, B, C, D)$ directly via SVD-based projections (Van Overschee & De Moor 1996). Three advantages drive industrial adoption:

No iteration, no local minima — one pass through the data returns the state-space estimate.
Model order chosen by inspection of singular values (the “elbow” in the SV plot).
MIMO native — no special handling for multiple inputs and outputs.

Trade-offs: subspace estimates are consistent but typically less efficient (higher variance) than PEM. Standard industrial workflow is to seed PEM with the subspace solution — n4sid then refine with pem in MATLAB.

3.4 Maximum likelihood estimation (MLE)

Under Gaussian innovation assumption, PEM with squared error is identical to MLE. For non-Gaussian noise (heavy tails, multi-modal), MLE uses the negative log-likelihood directly. MLE is the right framework when noise statistics are well-characterised and the cost of computing the likelihood is acceptable.

3.5 Recursive estimation (RLS)

For online / adaptive use, the Recursive Least Squares algorithm updates $\hat{θ}_{k}$ from $\hat{θ}_{k - 1}$ on each new sample with $O (n^{2})$ flops:

$K_{k} = \frac{P _{k - 1} φ _{k}}{λ + φ _{k}^{T} P _{k - 1} φ _{k}}, \hat{θ}_{k} = \hat{θ}_{k - 1} + K_{k} (y_{k} - φ_{k}^{T} \hat{θ}_{k - 1}), P_{k} = \frac{1}{λ} (P_{k - 1} - K_{k} φ_{k}^{T} P_{k - 1})$

where $φ_{k}$ is the regression vector (past inputs and outputs for ARX) and $λ \in (0, 1]$ is the forgetting factor. $λ = 1$ retains all history (LS estimate); $λ = 0.98$ has effective memory $\sim 1/ (1 - λ) = 50$ samples — track plants drifting on a 50-sample timescale. RLS is the backbone of adaptive-control and self-tuning regulators (Åström & Wittenmark 1973). It is also the Kalman filter applied to the parameter vector — see bayesian-estimation for the unifying formulation.

3.6 Non-parametric methods

Frequency response function (FRF) estimation via cross-spectral density (Welch 1967):

$\hat{G} (jω) = \frac{Φ _{y u} ( jω )}{Φ _{uu} ( jω )}$

with $Φ_{y u}$ the cross-power spectral density of $y$ and $u$ , computed by Welch’s averaged-periodogram method (segment the record, window, FFT, average $∣ U ∣^{2}$ , $∣ U Y^{*} ∣$ ). Coherence $γ^{2} (ω) = ∣ Φ_{y u} ∣^{2} / (Φ_{uu} Φ_{yy})$ ranges 0–1 and indicates how much of $y$ is linearly explained by $u$ at each frequency. $γ^{2} < 0.8$ at a frequency means either noisy data, low excitation, or nonlinearity at that frequency. See signal-processing-dsp for spectral-estimation depth.

Impulse / step response estimation via direct measurement or via correlation analysis when the input is broadband. For a PRBS input with autocorrelation $R_{uu} (τ) \approx σ_{u}^{2} δ (τ)$ , the cross-correlation $R_{y u} (τ) \approx σ_{u}^{2} g (τ)$ — i.e. the impulse response falls out of the cross-correlation directly.

4. Experimental design

4.1 Input signal selection

Signal	Spectral character	Crest factor	When to use
Step	$1/ ω$ envelope; rich at low ω	High	FOPDT first cut; not great for high-frequency content
Impulse	flat in theory	Very high	Structural impact testing (modal hammer)
PRBS (max-length sequence)	Nearly flat to $f_{s} /3$	Very low (≈1.4)	Workhorse: rich, low-amplitude, repeatable
Multi-sine (Schroeder phase)	User-designed spectrum	Low (≈1.6 with Schroeder phasing)	Custom frequency emphasis; lab work
Chirp / linear sweep	Time-localised frequencies	Moderate	Modal sweep; vibration shaker
Logarithmic chirp	Equal energy per decade	Moderate	Acoustic / audio FRF
Filtered Gaussian noise	Tailorable	Moderate (≈3)	Servos, easy to generate
White noise	Flat (ideal)	Unbounded	Theoretical; PRBS is the practical proxy
Operational data (no test signal)	Whatever the plant sees	n/a	Operational Modal Analysis (OMA); ergodicity required

Crest factor = peak amplitude / RMS amplitude. Low crest factor packs more energy into the system without saturating actuators. PRBS at $\pm A$ has crest factor $1 = 1$ (peak = RMS = $A$ ) — the best you can do for a binary signal. Schroeder-phased multi-sine achieves crest factor $\approx 1.6$ for arbitrary spectra (Schroeder 1970).

4.2 Persistence of excitation

An input is persistently exciting (PE) of order $n$ if its autocorrelation matrix $R_{u} (n)$ is positive definite. Equivalently, $u_{k}$ contains at least $n$ distinct frequencies. Consistency of PEM requires PE order $\geq 2 n_{a}$ for ARX, $\geq n_{a} + n_{b}$ for ARMAX, and similarly stiffer requirements for higher-order models. A step is PE of order 1 — sufficient to identify a gain, insufficient for any dynamic model. A PRBS of register length $m$ is PE of order $2^{m} - 1$ — overkill for most plants.

4.3 Sample rate

Rule	Value	Reasoning
Minimum	$f_{s} \geq 2 f_{b w}$	Nyquist
Typical	$f_{s} = 10 f_{b w}$ to $30 f_{b w}$	Pole placement and zero estimation accuracy
Vibration / modal	$f_{s} = 5 f_{m a x, interest}$	Captures shape at highest mode
Process plant	$f_{s} = 5 τ^{- 1}$ at most	Higher sampling buys nothing on slow chemistry

Oversampling beyond $\sim 30 f_{b w}$ degrades identification: the discrete-time poles cluster near $z = 1$ where small estimation errors blow up in continuous-time parameters. Anti-alias filter is mandatory even for sysid — typically a Bessel or Butterworth low-pass at $0.3 f_{s}$ to $0.4 f_{s}$ , matched on all channels to avoid phase mismatch.

4.4 Data length

Plant	Typical record duration
Process (slow chemistry)	10–20× the slowest time constant; hours to days
Servo / motor	100–1000× the closed-loop time constant; seconds
Structural modal	200+ averages of $T = 1/Δ f$ where $Δ f$ = frequency resolution; minutes to hours
Vehicle handling	One full manoeuvre repeated 5–10 times; minutes

Insufficient data → high variance in $\hat{θ}$ . Excessive data → diminishing returns and operational risk (operating-point drift, equipment fatigue from the test signal).

4.5 Open-loop vs closed-loop identification

Open-loop (preferred when feasible): inject the test signal directly into the plant input with no controller in the loop. Clean separation between cause $u$ and effect $y$ .

Closed-loop (mandatory when plant is unstable or operationally cannot run open-loop — reactors, aircraft, nuclear): controller correlates $u$ with $e$ (the innovation), biasing ordinary least squares. Three approaches:

Direct method — use PEM on the closed-loop data, ignoring the controller. Works for ARMAX, BJ, state-space with appropriate noise modelling; biased for ARX and OE.
Indirect method — identify the closed-loop transfer function, then “unwrap” the controller analytically to get the plant. Requires exact knowledge of the controller.
Joint input-output method — treat $u$ and $y$ jointly as outputs driven by the reference and the noise; use a 2-step or projection method.

For closed-loop, inject a dither signal at the controller reference or as an exogenous addition to the controller output — adds excitation that is not a function of $y$ , restoring identifiability.

4.6 Operating-point planning

Nonlinear plants require multiple identification experiments at different operating points. A motor’s $K_{t}$ varies with current (magnetic saturation); a distillation column’s gains flip sign across the bubble point; a tank’s level dynamics change with valve position. Design of Experiments (Pukelsheim 1993) frames the operating-point grid as an optimisation problem: minimise expected variance of $\hat{θ}$ subject to operational constraints. For practical work, three operating points (low, nominal, high) cover most situations; gain-schedule the resulting models.

5p. Theory — estimation procedures

5.1 Linear least squares for ARX

ARX model: $A (q^{- 1}) y_{k} = B (q^{- 1}) u_{k} + e_{k}$ rearranges as

$y_{k} = - a_{1} y_{k - 1} - \dots - a_{n_{a}} y_{k - n_{a}} + b_{1} u_{k - d - 1} + \dots + b_{n_{b}} u_{k - d - n_{b}} + e_{k} = φ_{k}^{T} θ + e_{k}$

where $φ_{k} = [- y_{k - 1}, \dots, - y_{k - n_{a}}, u_{k - d - 1}, \dots, u_{k - d - n_{b}}]^{T}$ and $θ = [a_{1}, \dots, a_{n_{a}}, b_{1}, \dots, b_{n_{b}}]^{T}$ . Stacking $N$ samples:

$Y = Φ θ + E, \hat{θ}_{L S} = (Φ^{T} Φ)^{- 1} Φ^{T} Y$

Variance of the estimate (for white $e$ ): $Var (\hat{θ}) = σ_{e}^{2} (Φ^{T} Φ)^{- 1}$ . The matrix $(Φ^{T} Φ)^{- 1}$ is the inverse Fisher information; its diagonal is the parameter variance vector.

Numerical practice: never form $Φ^{T} Φ$ directly. Use QR factorisation $Φ = QR$ and solve $Rθ = Q^{T} Y$ — preserves precision when $cond (Φ)$ is large. MATLAB’s arx() and \ operator do this automatically.

5.2 Iterative PEM for non-ARX

Output-error: $\overset{y}{^}_{k} (θ) = [B (q) / F (q)] u_{k}$ — nonlinear in $F$ . The predictor gradient with respect to $θ$ involves a filtered regression vector, and the cost surface has local minima. Gauss-Newton with backtracking line search is the standard solver; MATLAB pem() uses a variant of Levenberg-Marquardt. Multiple random initial conditions are mandatory for orders above $\sim 5$ ; ten restarts and take the best is typical.

5.3 Subspace algorithm sketch (N4SID)

Build Hankel matrices of past inputs and outputs $U_{p}, Y_{p}$ and future inputs and outputs $U_{f}, Y_{f}$ from the data record.
Project $Y_{f}$ onto the row space of $[U_{p}; Y_{p}; U_{f}]$ — gives the “oblique projection” $O_{i} = Γ_{i} X$ where $Γ_{i}$ is the extended observability matrix and $X$ is the state sequence.
SVD of $O_{i}$ : keep the dominant singular values (the “elbow” picks the order $n$ ), truncate.
Recover $A, C$ from $Γ_{i}$ (shift trick), $B, D$ from least squares on the residuals.

The whole procedure is non-iterative and returns a consistent estimate of $(A, B, C, D)$ in a specific (numerically conditioned) state basis. The singular-value plot is one of the cleanest model-order selection tools available.

5.4 Kalman filter as recursive state-space identifier

For known $A, C$ and unknown state, the Kalman filter (see state-space-methods §9p and bayesian-estimation) is the optimal estimator. For partly unknown $A, B$ — the joint parameter-and-state estimation problem — the dual Kalman filter or augmented-state EKF treats parameters as slow states and estimates both. This is the technical foundation of online adaptive control.

5.5 Bayesian and Gaussian-process methods

Bayesian sysid (Peterka 1981; Pillonetto et al. 2014, “Kernel methods in system identification”) places a prior on $θ$ — a Gaussian prior on impulse-response coefficients with a “stable-spline” kernel is the modern standard (Chen, Ohlsson & Ljung 2012). Trades the discrete model-order choice for a continuous hyperparameter (kernel scale, decay rate) tuned by marginal-likelihood maximisation. Excellent at regularising ill-conditioned identifications with short data records.

Gaussian process regression for non-parametric ID treats $g (t)$ or the FRF directly as a GP, with kernel encoding stability and smoothness. Covered in bayesian-estimation.

6p. Model selection and validation

6.1 Order selection

Choosing $(n_{a}, n_{b}, n_{k})$ for ARX (or $n$ for state-space): four standard criteria.

Criterion	Formula	Behaviour
AIC (Akaike 1974)	$- 2 lo g L + 2 k$	Slight overfit; consistent for large $N$ at low SNR
BIC (Schwarz 1978)	$- 2 lo g L + k lo g N$	Stronger penalty; prefers smaller models
FPE (Akaike 1969)	$σ_{e}^{2} \cdot (N + k) / (N - k)$	Pre-AIC criterion; equivalent up to constants
Cross-validation	RMSE on held-out data	The gold standard; computationally heavier

$L$ = likelihood, $k$ = parameter count, $N$ = data length. Plot the criterion vs $n$ and pick the elbow; AIC’s elbow is often noisy, BIC is steeper and cleaner. Cross-validation always overrides AIC/BIC when records are long enough — split 70/30, fit on the first part, score on the second.

For subspace methods, the dominant-singular-value plot replaces AIC/BIC. The order is where the SVs drop by an order of magnitude.

6.2 Residual analysis

A correctly identified model leaves residuals $ε_{k}$ that are (i) zero-mean, (ii) white (uncorrelated in time), and (iii) uncorrelated with past inputs.

Test	What it checks	Pass criterion
Anderson whiteness test	$R_{ε} (τ)$ for $τ \neq = 0$	$
Ljung-Box Q test	Joint whiteness	$Q = N (N + 2) \sum_{k = 1}^{h} R_{ε} (k)^{2} / (N - k) < χ_{h - k}^{2}$
Cross-correlation $R_{ε u} (τ)$	Residuals vs past inputs	Within $1.96/ N$ confidence band
Normality (Jarque-Bera)	Gaussian innovations	$p > 0.05$
Frequency-domain check	Identified $\hat{G} (jω)$ vs Welch FRF	Visual overlay with coherence-weighted band

The cross-correlation test $R_{ε u}$ is the most discriminating in practice — significant correlation at lag $τ$ means the model is missing dynamics at that lag (often a wrong dead-time estimate or under-modelled order).

6.3 Cross-validation and held-out data

Split the data into estimation and validation sets. Fit on estimation; simulate the model open-loop on the validation input and compare to the validation output. Metrics:

NRMSE (normalised root-mean-square error) = $1 - ∥ y - \overset{y}{^} ∥/∥ y - \overset{y}{ˉ} ∥$ , ranges $(- \infty, 1]$ , with 1 = perfect, 0 = no better than mean. MATLAB compare() reports this as “fit %“.
VAF (variance accounted for) = $1 - var (y - \overset{y}{^}) / var (y)$ , ranges $(- \infty, 1]$ .

NRMSE > 0.85 (85 %) is a good fit; NRMSE > 0.95 (95 %) is excellent. Be wary of NRMSE on smooth data — it inflates artificially.

6.4 Model reduction

Identified models often have spurious states (numerical artefacts, fitting noise). Hankel-norm balanced truncation (Moore 1981, Glover 1984) reduces an $n$ -state realisation to an $r < n$ state realisation with bounded error:

$∥ G - G_{r} ∥_{\infty} \leq 2 \sum_{i = r + 1}^{n} σ_{i}^{H}$

where $σ_{i}^{H}$ are the Hankel singular values. MATLAB balred() and hankelsv() automate this. Cross-references: state-space-methods §3.5 for balanced realisations.

6.5 Identifiability

A model is structurally identifiable if distinct parameter vectors $θ_{1} \neq = θ_{2}$ produce distinguishable input-output behaviour. Test: rank of the Fisher information matrix $I (θ) = E [\nabla_{θ} lo g L \cdot \nabla_{θ} lo g L^{T}]$ must equal $dim (θ)$ . A rank deficiency reveals unidentifiable parameters — they can only be estimated as a combination. Common in grey-box: motor inertia $J$ and viscous friction $b$ are together identifiable from a step response, but separately only if a free-decay test (no input) is added.

7p. Worked examples

Example A — ARX identification of a DC motor

Plant. A brushed DC motor driving a fan: armature voltage to shaft speed, nominal transfer function $G (s) = 100/ (0.05 s + 1)$ rpm/V with 5 ms dead time. Sensor noise σ ≈ 5 rpm (a few percent of nominal).

Experiment. PRBS input, $\pm 2$ V amplitude, 100 Hz clock rate (matching the bandwidth). Sample at 1 kHz. Record 10 s = 10 000 samples. The PRBS register length 7 gives sequence period $2^{7} - 1 = 127$ chips — repeats 7.9× in the record, sufficient for averaging.

Model class. ARX(2, 2, 1): $n_{a} = 2$ , $n_{b} = 2$ , $n_{k} = 1$ delay. Six parameters.

MATLAB:

data = iddata(y, u, 0.001);          % 1 ms sample period
data_est = data(1:7000);             % first 70 % for estimation
data_val = data(7001:end);           % last 30 % for validation
sys = arx(data_est, [2 2 1]);
present(sys)
%   y(t) = -1.823 y(t-1) + 0.842 y(t-2)
%        + 0.0021 u(t-1) + 0.0019 u(t-2) + e(t)
%   FPE: 25.4   MSE: 25.1
resid(sys, data_val)                  % Anderson / cross-corr plots
compare(data_val, sys)                % NRMSE = 0.94 (94 % fit)

Validation. Residual whiteness: $∣ R_{ε} (τ) ∣ < 1.96/ 3000 = 0.036$ for all non-zero lags up to 50 — passes. Cross-correlation $R_{ε u}$ within band — passes. NRMSE = 94 % on held-out 30 % — passes.

Continuous-time recovery. d2c(sys) returns $G_{c} (s) = 95.3 e^{- 0.005 s} / (0.052 s + 1)$ — recovers the nominal gain ( $\hat{K} = 95$ vs true $100$ , 5 % error from finite data) and time constant ( $\overset{τ}{^} = 52$ ms vs $50$ ms, 4 % error). Both within the noise-limited Cramér-Rao bound predicted by the inverse Fisher information.

Example B — Subspace ID of a flexible-mode plant

Plant. Cart driving a flexible beam: rigid-body integrator plus a lightly-damped 80 Hz first mode. Continuous-time:

$G (s) = \frac{K _{m}}{s ^{2}} \cdot \frac{ω _{n}^{2}}{s ^{2} + 2 ζ ω _{n} s + ω _{n}^{2}}, K_{m} = 250 rad/ s^{2} V^{- 1}, ω_{n} = 2 π \cdot 80 = 503 rad/s, ζ = 0.02$

A 4-state system with one unstable double integrator and a lightly-damped complex pair. Encoder reads tip angle; sensor noise σ = 0.001 rad.

Experiment. Closed-loop with a tuned PID (the plant is open-loop unstable). Inject a 30 s chirp from 0.1 Hz to 200 Hz at the controller reference. Sample at 2 kHz. Record 60 000 samples.

Subspace fit — N4SID.

data = iddata(y, u, 5e-4);
% pick order from SV plot
[A, B, C, D, K, x0] = n4sid(data, 1:8);     % returns best order automatically
% n4sid selects n = 4 from elbow at sigma_4 / sigma_5 = 12
sys_ss = idss(A, B, C, D);
sys_c = d2c(sys_ss);                         % back to continuous time
eig(sys_c.A)
% -0.000+0.00j, -0.001+0.00j,            (the integrators)
% -10.06 + 502.4j, -10.06 - 502.4j        (the 80 Hz mode)

Reading the result. Eigenvalues match: two near-zero (rigid-body integrators), and a complex pair at $- 10.06 \pm 502.4 j$ corresponding to $ω_{n} = 502.5$ rad/s ( $\hat{f}_{n} = 79.97$ Hz, error 0.04 %) and $ζ = 10.06/502.5 = 0.0200$ — recovers the damping ratio to four significant figures. Comparison with a Welch FRF (overlay of $∣ G (jω) ∣$ from N4SID against $∣ \hat{Φ}_{y u} / Φ_{uu} ∣$ ): match within 0.5 dB across the 1–150 Hz band.

Example C — Grey-box brushed DC motor with five parameters

Physics. Standard motor model:

$V = R i + L \frac{d i}{d t} + K_{e} ω, τ = K_{t} i = J \frac{d ω}{d t} + b ω$

Unknown parameters: armature resistance $R$ (Ω), inductance $L$ (H), torque constant $K_{t} = K_{e}$ (N·m/A = V·s/rad), inertia $J$ (kg·m²), viscous friction $b$ (N·m·s).

Identifiability check. $K_{t}$ and $K_{e}$ are coupled by the back-EMF principle; assume $K_{t} = K_{e}$ . The two states $(i, ω)$ with input $V$ give a 2nd-order system — five parameters, four dynamic relations (the eigenvalues, gain, zero, and DC torque). Free-decay test (V = 0, set initial speed, watch coast-down) separates $J$ and $b$ . With the free-decay test plus a step response, all five parameters are identifiable.

Experiment. Two records:

Step test. Apply 12 V step, measure $ω (t)$ and $i (t)$ at 5 kHz for 1 s.
Coast-down. Drive to 100 rad/s, open the contactor, measure $ω (t)$ for 5 s at 1 kHz.

Grey-box fit (MATLAB idnlgrey or greyest).

% Define ODE structure in a function file motor_ode.m
function [dx, y] = motor_ode(t, x, u, R, L, Kt, J, b, varargin)
    % x = [i; omega]; u = V; y = [omega; i]
    di_dt = (u(1) - R*x(1) - Kt*x(2)) / L;
    domega_dt = (Kt*x(1) - b*x(2)) / J;
    dx = [di_dt; domega_dt];
    y = [x(2); x(1)];
end
 
init = struct('R', 2.0, 'L', 0.010, 'Kt', 0.07, 'J', 1e-4, 'b', 1e-3);
sys0 = idnlgrey('motor_ode', [2 1 2], init, [0; 0]);
sys_fit = nlgreyest(merge(step_data, coast_data), sys0);
present(sys_fit)
%   R  = 2.10  +/- 0.04 Ohm
%   L  = 12.0  +/- 0.3 mH
%   Kt = 0.080 +/- 0.001 N.m/A
%   J  = 2.0e-4 +/- 3e-6 kg.m^2
%   b  = 4.0e-4 +/- 1e-5 N.m.s

Validation. Simulate the fitted model on a third held-out record (a triangle-wave voltage input). NRMSE on speed = 98.2 %, on current = 96.5 %. Standard errors (from the inverse Fisher information matrix) are 1–5 % on every parameter — well-identified. Engineering judgement: quote $R = 2.10 \pm 0.04 Ω$ , not $R = 2.10416 Ω$ . The uncertainty is the engineering deliverable.

8p. Design heuristics

General

Start simple. ARX(2, 2, 1) before ARMAX(3, 3, 3, 1) before state-space(8). Add complexity only when residuals fail.
PRBS over step. Better information per joule of disturbance to the plant.
Detrend / mean-remove. Subtract steady-state means from $u$ and $y$ before fitting — sysid is about deviations, not absolute levels. Document the operating point.
Don’t oversample. $f_{s} \approx 10 f_{b w}$ is a sweet spot. Higher fights numerical conditioning.
Anti-alias filter every time. Even on “obviously digital” signals — sensor electronics have analog bandwidth.
Validate on held-out data, always. A model that fits estimation data perfectly and fails validation is overfit. NRMSE on estimation > NRMSE on validation by more than 5 % is a red flag.
Pure delay first. Run cross-correlation $R_{y u} (τ)$ before fitting; the peak lag tells you $n_{k}$ .

Closed-loop and adaptive

Dither at reference. Adds excitation that is not a function of $y$ — restores identifiability for any closed-loop method.
Forgetting factor by drift rate. $λ = 1 - T_{s} / τ_{d r i f t}$ where $τ_{d r i f t}$ is the parameter drift time constant. For a refinery this is hours; for a battery 10× shorter.
Covariance reset in RLS prevents the covariance matrix from collapsing during long quiescent periods (no excitation → $P$ shrinks → algorithm is blind to new dynamics). Reset $P \leftarrow P_{0}$ on a periodic schedule or after a detected change.

Frequency-weight the cost. For control synthesis, fit accurately in the control bandwidth; tolerate misfit at extreme frequencies. MATLAB’s tfest accepts a frequency-weighting filter.
OMA (operational modal analysis) for civil structures: no input measurement; assume ambient excitation is white. PolyMAX, SSI-COV, EFDD algorithms. Standard for bridges, wind turbines, buildings under operational load.

Nonlinear

Hammerstein (static nonlinearity + linear dynamics) for actuator nonlinearity: deadband, saturation. nlhw() in MATLAB.
Wiener (linear dynamics + static output nonlinearity) for sensor nonlinearity.
Volterra series (kernels $h_{1}, h_{2}, h_{3}, \dots$ ) for mild bilinear/cubic systems. Truncate at second order if data permits.
Neural-network sysid for strongly nonlinear high-dimensional plants (combustion, plasma). NARX networks, LSTM, transformer-based. Massive data appetite. See scientific Deep Learning Toolbox.
Gaussian process sysid for low-data nonlinear ID — gives a calibrated uncertainty along with the estimate.

9p. Edge cases & gotchas

Overfitting. Doubling the parameter count almost always lowers the estimation cost; the test is validation cost. AIC/BIC are partial defences, cross-validation is the strong one. A wildly oscillating high-order ARX fit on data that obviously came from a 2-pole plant has overfit.
Local minima in PEM. OE, BJ, and nonlinear grey-box have non-convex cost surfaces. The MATLAB pem default uses the subspace estimate as initial condition; for grey-box, try 5–10 random initialisations and keep the best. If results vary wildly across restarts, the problem is unidentifiable in practice.
Numerical conditioning. $Φ^{T} Φ$ is the inverse of the covariance; ill-conditioning $cond (Φ^{T} Φ) > 1 0^{10}$ means small data perturbations swing $\hat{θ}$ by huge amounts. Cures: (i) scale states / inputs to similar ranges, (ii) use orthogonal regressors (Laguerre filters), (iii) regularise — add $λ_{R} ∥ θ ∥^{2}$ penalty (ridge regression).
Identifiability failures. Two parameters always appear as a ratio — $J / b$ in §7p Example C without the coast-down — and you only ever recover the ratio. Diagnostic: Fisher information matrix rank check. Engineering fix: add an experiment that breaks the symmetry (free decay, different operating point).
Closed-loop bias. Direct OLS on closed-loop data is biased even asymptotically. ARX with the right noise structure may avoid this; OE almost certainly does not. When in doubt, use PEM with a noise model (ARMAX, BJ) or use the indirect method with a known controller.
Discretisation effects. Continuous-time parameters fit through a discretised model carry the discretisation method’s bias (ZOH, Tustin, impulse-invariant). If physical parameters matter, fit in continuous time (tfest(data, np, nz), ssest) or back-compute via d2c() with the same discretisation used in the experiment.
Mean / trend removal. Forgetting trend removal puts a slow drift into $y$ that the ARX integrator term spuriously absorbs, biasing $a_{1}$ toward 1 (an apparent integrator). Always pre-process — detrend() for piecewise-linear trends, high-pass for periodic drift.
High-frequency noise in derivatives. Grey-box models with $\overset{x}{˙}$ on the LHS require derivative estimates from noisy data — pure differentiation amplifies noise. Mitigations: (i) integrate the model, fit to $x$ not $\overset{x}{˙}$ ; (ii) low-pass with matched zero phase (Savitzky-Golay).
Disturbance correlated with input. If a measurable disturbance $d$ correlates with the test input (e.g. ambient temperature drift correlated with the diurnal cycle that the experiment runs over), $\hat{θ}$ is biased. Cure: include $d$ as a measured input in a multivariable model.
Nonlinear plant fit to a linear model. Returns a linear model that is “best” only at the operating amplitude of the experiment — useless at other amplitudes. Symptom: residuals depend on $∣ u ∣$ . Cure: characterise the nonlinearity (Hammerstein/Wiener/Volterra) or accept the gain-scheduled-linear model with documented validity range.
Sample-rate mismatch in MIMO. Channels sampled at slightly different times produce phase mismatches that distort the FRF. Use synchronous DAQ hardware (NI PXI, Beckhoff EL3xxx with distributed clocks); avoid software-multiplexed sampling for sysid.
Operating point drift during the experiment. A long PRBS at slow process timescales runs while the plant slowly drifts (catalyst aging, ambient temperature). The fit then represents an average over the drift trajectory. Cure: shorter experiments at well-controlled conditions, repeated; or model the drift explicitly with a slow time-varying parameter.

10p. Tools / software

Commercial flagship

MATLAB System Identification Toolbox (MathWorks) — the industry reference. Functions: iddata, arx, armax, oe, bj, tfest, ssest, n4sid, pem, idnlgrey, idnlarx, idnlhw, compare, resid, aic, fpe, spa (spectral analysis), etfe (empirical TF estimate). GUI: systemIdentification (interactive). Integrates with Simulink Control Design.
MATLAB Predictive Maintenance Toolbox — sits on top of SIT for diagnostic / prognostic identification.
AspenTech Aspen DMC3 — APC commissioning workflow, includes step-test design, FOPDT identification, MPC model build. De facto standard in refining.
Honeywell Profit Design Studio (Honeywell UOP) — same niche as DMC3, Honeywell-aligned DCS integration.
Yokogawa Exapilot / PRM — process-management modules with embedded sysid for Yokogawa CENTUM VP.
Emerson DeltaV InSight — relay-feedback identification and IMC tuning bundled with DeltaV DCS.
ABB OPTIMAX MPC — commissioning toolkit for the OPTIMAX MPC product line.
MATLAB Modal Toolbox / Siemens Simcenter Testlab (PolyMAX) — operational and experimental modal analysis. Industry standard for aircraft / automotive vibration.

Open source

SIPPY (System Identification Package for PYthon) — Python translation of much of MATLAB SIT. ARX, ARMAX, OE, BJ, N4SID, MOESP. https://github.com/CPCLAB-UNIPI/SIPPY.
python-control (https://python-control.readthedocs.io) — provides tfest, ssest-like wrappers via slycot.
scikit-learn — for static regression-based identification (Hammerstein static nonlinearity, kernel regression).
GPy / GPflow / scikit-learn GP — Gaussian-process system identification.
ControlSystemIdentification.jl (Julia) — modern Julia toolkit; subspace, PEM, frequency-domain. Tight integration with ControlSystems.jl.
SLICOT subroutines IB01xx family — Fortran kernels underneath most commercial subspace implementations.
PySID / SysIdentPy — additional Python toolkits with NARX, NARMAX support.
R: dse, MARSS, KFAS — statistical-modelling oriented; state-space + Kalman filtering.

PolyMAX (Siemens, in LMS Test.Lab / Simcenter Testlab) — operational modal analysis on full-scale structures.
ARTeMIS Modal (Structural Vibration Solutions) — SSI-COV and SSI-DATA implementations.
Brüel & Kjær PULSE — frequency-domain modal acquisition + FRF curve-fitting.
MEScope (Vibrant Technology) — animated mode shapes; rotating-machinery diagnostics.
ME’scopeVES — affordable MAC, MIF, FRF processing.

Hardware

National Instruments PXIe + LabVIEW Real-Time — high-channel-count synchronous DAQ; standard in aerospace and automotive sysid labs.
Brüel & Kjær LAN-XI — distributed DAQ for modal / acoustic.
HBM SoMat eDAQ — rugged in-vehicle DAQ for vehicle-handling identification.
dSPACE MicroAutoBox / MicroLabBox — rapid-prototype HIL for sysid + controller-in-the-loop iteration.

Industrial sysid services

Aspen Technology professional services — APC step tests and commissioning for major refiners.
PAS / Hexagon PPM — DCS-integrated sysid + loop-tuning audits.
Matrikon Asset Sentinel — loop performance monitoring with embedded relay-feedback identification.
ABB Loop Optimizer — DCS-agnostic loop tuning via embedded sysid.

11. Cross-references

classical-control — the consumer of FOPDT and frequency-response sysid output for PID tuning.
state-space-methods — the consumer of subspace / state-space sysid output for LQR / Kalman design.
signal-processing-dsp — Welch’s method, windowing, anti-aliasing, FFT; the spectral foundations of frequency-domain sysid.
adaptive-control — direct consumer of recursive identification; self-tuning regulators, MRAC, gain scheduling.
digital-control — discretisation (ZOH, Tustin) and sampled-data effects that determine what continuous-time parameters can be recovered.
mpc-control — MPC needs an explicit model; sysid is upstream of every industrial MPC deployment.
vibration-dynamics — modal analysis, EMA, OMA; the structural-dynamics specialisation of sysid.
electric-motors — grey-box parameter ID of motor electromagnetics (R, L, $K_{t}$ , $J$ , $b$ ).
circuit-analysis — RLC parameter identification by impedance measurement.
power-electronics — converter small-signal model ID from network-analyser sweeps.
dynamics-rigid-body — link inertia / friction parameter identification for manipulators.
bayesian-estimation — Kalman filter, RLS as Kalman, GP regression as Bayesian sysid; the probabilistic backbone.
bayesian-estimation — EKF / UKF / particle filter applications, dual estimation.
scientific — canonical environment for sysid workflows and code generation.

12. Citations

Ljung, L. (1999). System Identification: Theory for the User, 2nd ed. Prentice Hall PTR. The canonical PEM textbook; the source of the unifying framework used in this note.
Söderström, T. & Stoica, P. (1989). System Identification. Prentice Hall. The statistical-control-theoretic complement to Ljung; deeper on consistency and asymptotic distribution theory.
Verhaegen, M. & Verdult, V. (2007). Filtering and System Identification: A Least Squares Approach. Cambridge University Press. Modern matrix-algebra-first treatment; strong on subspace and total least squares.
Van Overschee, P. & De Moor, B. (1996). Subspace Identification for Linear Systems: Theory, Implementation, Applications. Kluwer. The canonical reference for N4SID and the 4SID family.
Pintelon, R. & Schoukens, J. (2012). System Identification: A Frequency Domain Approach, 2nd ed. IEEE Press / Wiley. The frequency-domain identification reference; multi-sine and errors-in-variables.
Ewins, D.J. (2000). Modal Testing: Theory, Practice and Application, 2nd ed. Research Studies Press. The canonical experimental modal analysis text.
Heylen, W., Lammens, S. & Sas, P. (1998). Modal Analysis Theory and Testing. KU Leuven. Practitioner reference for EMA and OMA.
Akaike, H. (1969). “Fitting autoregressive models for prediction.” Annals of the Institute of Statistical Mathematics, 21, 243–247. The FPE criterion.
Akaike, H. (1974). “A new look at the statistical model identification.” IEEE Trans. Automatic Control, 19(6), 716–723. The AIC criterion.
Schwarz, G. (1978). “Estimating the dimension of a model.” Annals of Statistics, 6(2), 461–464. The BIC criterion.
Ljung, G.M. & Box, G.E.P. (1978). “On a measure of lack of fit in time series models.” Biometrika, 65(2), 297–303. The Ljung-Box residual whiteness test.
Anderson, T.W. (1971). The Statistical Analysis of Time Series. Wiley. Anderson’s whiteness-of-residuals test.
Welch, P.D. (1967). “The use of fast Fourier transform for the estimation of power spectra.” IEEE Trans. Audio and Electroacoustics, 15(2), 70–73. Welch’s averaged-periodogram FRF estimation.
Schroeder, M.R. (1970). “Synthesis of low-peak-factor signals and binary sequences with low autocorrelation.” IEEE Trans. Information Theory, 16(1), 85–89. Schroeder-phased multi-sine.
Åström, K.J. & Wittenmark, B. (1973, 4th ed. 2008). Adaptive Control. Addison-Wesley / Dover. RLS, self-tuning regulators.
Goodwin, G.C. & Sin, K.S. (1984, reprinted 2009 Dover). Adaptive Filtering, Prediction and Control. Prentice Hall. The adaptive-control / RLS reference.
Doyle, J.C., Francis, B.A. & Tannenbaum, A.R. (1992, reprinted 2009 Dover). Feedback Control Theory. Macmillan. Robust control with sysid uncertainty bounds.
Pillonetto, G., Dinuzzo, F., Chen, T., De Nicolao, G. & Ljung, L. (2014). “Kernel methods in system identification, machine learning and function estimation: A survey.” Automatica, 50(3), 657–682. The modern Bayesian / kernel sysid survey.
Chen, T., Ohlsson, H. & Ljung, L. (2012). “On the estimation of transfer functions, regularizations and Gaussian processes — revisited.” Automatica, 48(8), 1525–1535. Stable-spline kernel for regularised impulse-response estimation.
Peterka, V. (1981). “Bayesian system identification.” Automatica, 17(1), 41–53. Foundational Bayesian-sysid paper.
Bittanti, S. & Picci, G. (1996). Identification, Adaptation, Learning: The Science of Learning Models from Data. Springer. NATO ASI proceedings; survey of state of the art.
Pukelsheim, F. (1993, reprinted SIAM 2006). Optimal Design of Experiments. Wiley. The design-of-experiments reference for sysid input optimisation.
Brincker, R. & Ventura, C. (2015). Introduction to Operational Modal Analysis. Wiley. The OMA standard reference for civil structures and wind turbines.
Peeters, B., Van der Auweraer, H., Guillaume, P. & Leuridan, J. (2004). “The PolyMAX frequency-domain method: a new standard for modal parameter estimation?” Shock and Vibration, 11(3–4), 395–409. The PolyMAX algorithm.
MATLAB System Identification Toolbox documentation (MathWorks, current release). Reference for arx, armax, oe, bj, tfest, ssest, n4sid, pem, idnlgrey, compare, resid.
IFAC SYSID Symposium proceedings (triennial: 2003, 2006, 2009, 2012, 2015, 2018, 2021, 2024). The community’s primary venue for sysid research.
IEEE Control Systems Society / IEEE Trans. Automatic Control — primary journal venue for sysid theory.
ASTM E2552-08 Standard Guide for Modeling the Aerodynamic Heating of Hypersonic Vehicles — example of a domain standard that explicitly invokes sysid for parameter recovery.
AspenTech Aspen DMC3 user guide (Aspen Technology, current revision). MPC commissioning workflow including step-test design and identification.
Honeywell Profit Design Studio reference manual (Honeywell UOP, current revision). Companion industrial workflow.

Compendium

Explorer

System Identification — Engineering Reference

System Identification — Engineering Reference

1. At a glance

2. Why it matters

3. First principles

3.1 Model class taxonomy

3.2 Prediction-error method (PEM)

3.3 Subspace methods (4SID family)

3.4 Maximum likelihood estimation (MLE)

3.5 Recursive estimation (RLS)

3.6 Non-parametric methods

4. Experimental design

4.1 Input signal selection

4.2 Persistence of excitation

4.3 Sample rate

4.4 Data length

4.5 Open-loop vs closed-loop identification

4.6 Operating-point planning

5p. Theory — estimation procedures

5.1 Linear least squares for ARX

5.2 Iterative PEM for non-ARX

5.3 Subspace algorithm sketch (N4SID)

5.4 Kalman filter as recursive state-space identifier

5.5 Bayesian and Gaussian-process methods

6p. Model selection and validation

6.1 Order selection

6.2 Residual analysis

6.3 Cross-validation and held-out data

6.4 Model reduction

6.5 Identifiability

7p. Worked examples

Example A — ARX identification of a DC motor

Example B — Subspace ID of a flexible-mode plant

Example C — Grey-box brushed DC motor with five parameters

8p. Design heuristics

General

Closed-loop and adaptive

Frequency-domain and modal

Nonlinear

9p. Edge cases & gotchas

10p. Tools / software

Commercial flagship

Open source

Modal / vibration

Hardware

Industrial sysid services

11. Cross-references

12. Citations

Graph View

Table of Contents

Backlinks