Proteomics and Mass Spectrometry — Deep Reference

A Tier 2 deep-dive into the instrumentation, acquisition modes, quantification chemistries, and software pipelines that define modern proteomic mass spectrometry. Complements proteomics-metabolomics-and-computational-neuroscience (the broader omics-and-comp-neuro bundle) — this note assumes you have the overview and goes hands-on with instrument hardware, fragmentation chemistry, isobaric labels, DIA acquisition, post-translational modifications (PTMs), and crosslinking-MS workflows. Where the parent note maps the landscape, this one drills into the engineering of each method.


Bottom-up, top-down, middle-down

Bottom-up

The dominant proteomics workflow since ~2000. Pipeline:

  1. Lysis in chaotropic buffer (urea 8 M, GdnHCl 6 M, SDS 2-4% for tough samples) + protease inhibitors.
  2. Reduction of disulfides — DTT (5-10 mM, 56 °C, 30-60 min) or TCEP (5 mM, room T, 15 min).
  3. Alkylation of cysteines — iodoacetamide (15-50 mM, dark, 30 min, room T) → carbamidomethyl Cys (+57.02 Da). Avoid IAM > 60 min — overalkylation of Lys, His, Met. Alternatives: chloroacetamide (lower over-alkylation), N-ethylmaleimide, MMTS.
  4. Digestion by sequence-specific protease:
    • Trypsin (porcine, bovine; Promega Gold/MS Grade, Pierce, Roche) — cleaves C-terminal to K, R (not before P). Most common; produces ~7-25 aa peptides with C-terminal basic residue → favorable +2/+3 ionization. Lys-C + trypsin double-digest (Lys-C in 8 M urea first, then dilute to 2 M urea for trypsin) — improves yield of missed-cleavages-free peptides.
    • Lys-C — C-terminal to K only; complements trypsin.
    • Glu-C (V8) — C-terminal to E, D (in phosphate buffer) or E only (bicarbonate).
    • Asp-N — N-terminal to D (and sometimes E).
    • Chymotrypsin — C-terminal to aromatics F, W, Y, large hydrophobics L, M.
    • Arg-C — C-terminal to R only.
    • Elastase, Proteinase K — broad-specificity for top-down complement.
    • Trypsin/P: not a separate enzyme but a search-engine cleavage rule that also allows cleavage before P; it recovers K/R-P cleavages that strict trypsin rules would treat as non-tryptic.
  5. Sample cleanup. SP3 (Hughes-Krijgsveld 2014; carboxylate-coated magnetic beads), S-Trap (Protifi; SDS-tolerant), in-StageTip (iST, PreOmics), FASP (filter-aided sample prep; Wisniewski 2009; formerly standard but largely replaced by SP3/iST).
  6. Peptide-level fractionation (optional, for deep proteomes). High-pH reversed phase, SCX (strong cation exchange), HILIC.
  7. LC-MS/MS. Nano-flow (300 nL/min) or micro-flow (1-5 µL/min) UHPLC; reversed-phase C18 column (50-250 mm; 1.7-3 µm particles).
  8. Database search + statistical post-processing.

Pros: high throughput, broad proteome coverage (~10k proteins / human cell line), robust quantification. Cons: peptide-level information loses isoform/PTM context; protein inference is statistical (razor/unique peptides).
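
The digestion rule in step 4 (cleave C-terminal to K/R, but not before P) is easy to express in code. The following is a minimal sketch of an in-silico digest, not the implementation of any particular search engine; the function name, the length limits, and the toy sequence are illustrative assumptions.

```python
import re

def tryptic_digest(sequence, missed_cleavages=0, min_len=7, max_len=25):
    """Return peptides from cleavage C-terminal to K/R, except before P."""
    # Zero-width split after K or R when the next residue is not P
    # (splitting on empty matches needs Python 3.7+).
    pieces = [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]
    peptides = []
    for start in range(len(pieces)):
        last = min(start + missed_cleavages, len(pieces) - 1)
        for end in range(start, last + 1):
            peptide = "".join(pieces[start:end + 1])
            if min_len <= len(peptide) <= max_len:
                peptides.append(peptide)
    return peptides

# Toy sequence (hypothetical); min_len lowered so short pieces are visible.
print(tryptic_digest("AAKRPCCKDD", missed_cleavages=1, min_len=1))
```

The R-P bond is left intact, so "RPCCK" comes out as a single peptide; dropping the `(?!P)` lookahead would emulate the more permissive Trypsin/P rule.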

Top-down

Direct analysis of intact proteins, no proteolysis. Requires:

  • Ionization. ESI (under denaturing or native conditions) or MALDI.
  • High-resolution MS. Orbitrap (Eclipse, Exploris, Astral), FT-ICR (Bruker solariX 7-15 T; 21 T instruments at NHMFL and PNNL). FTMS gives isotope-resolved peaks even for 50-kDa intact proteins; <2 ppm mass accuracy.
  • Fragmentation. ETD (electron transfer dissociation) and EThcD (ETD + HCD supplemental activation) preserve labile PTMs and give c/z ions for backbone coverage. UVPD (193 nm ultraviolet photodissociation; Brodbelt UT-Austin) — broad sequence coverage and PTM retention.

Leaders: Kelleher (Northwestern; Consortium for Top-Down Proteomics), Heck (Utrecht), Smith (PNNL, retired). Proteoform-level information: charge variants, truncation isoforms, PTM combinatorial states. Lower throughput than bottom-up; coverage limited to <50 kDa for routine work, though >100 kDa demonstrated with optimization.

Middle-down

Limited proteolysis to 5-20 kDa fragments. Bridges top-down (proteoform context) and bottom-up (sensitivity, depth). Used for antibody analysis (IdeS cleaves IgG below the hinge → ~100 kDa F(ab’)2 + two ~25 kDa Fc/2 fragments; reduction then gives ~25 kDa LC, Fd’, and Fc/2 subunits) and histone PTM combinatorial analysis (Garcia, U Penn).


Mass spectrometer instrumentation

Ionization

  • ESI: electrospray ionization (Fenn 1989, Nobel 2002). Liquid sprayed at +2-5 kV through a tapered emitter → Taylor cone → solvent evaporates → multiply-charged protein/peptide ions. Nano-ESI (~10-30 nL/min through a pulled-glass emitter such as the New Objective PicoTip) provides ~10-100× sensitivity gain vs conventional ESI.
  • MALDI: matrix-assisted laser desorption/ionization (Karas-Hillenkamp 1988, Tanaka 1987; Tanaka shared the 2002 Nobel). Sample co-crystallized with UV-absorbing matrix (CHCA, DHB, SA); a pulsed N₂ (337 nm) or solid-state UV laser desorbs and ionizes the analyte. Primarily singly-charged. Faster than ESI for screening; less compatible with online LC; widely used for tissue imaging (MALDI imaging: Bruker rapifleX, timsTOF flex).
  • APPI, APCI. Photoionization, chemical ionization — for nonpolar small molecules.
  • DESI, LAESI, SIMS, MASSIA. Desorption/ionization methods for tissue imaging; DESI and LAESI are ambient, whereas SIMS operates under vacuum.

Mass analyzers

  • Linear quadrupole. Mass filter; unit resolution; cheap and robust. In a triple quadrupole, two mass-filtering quadrupoles (Q1, Q3) flank the q2 collision cell; this is the SRM/MRM workhorse. Sciex 6500+/7500, Thermo TSQ Altis/Quantis.
  • Time-of-flight (TOF). Ions accelerated to fixed kinetic energy; flight time ∝ √(m/z). Reflectrons fold flight path, focus energy spread → resolution 30k-60k. Bruker timsTOF, Sciex ZenoTOF, Waters Synapt, Agilent 6500/6600. High duty cycle for full-spectrum acquisition.
  • Orbitrap. Makarov 2000. Ions trapped in an electrostatic field around a central spindle electrode; axial oscillation frequency ω = √(k·q/m), i.e. ∝ 1/√(m/z). Fourier transform of the image current → ultra-high resolution (240k-1M at m/z 200 on high-end Orbitraps); 1-5 ppm mass accuracy routine. Thermo Q Exactive, Orbitrap Exploris (240, 480, MX), Eclipse Tribrid, Ascend Tribrid, Astral (2023; pairs an Orbitrap with the asymmetric track lossless (Astral) analyzer → ~200 Hz MS2).
  • FT-ICR. Ion cyclotron in strong magnetic field (7-21 T); cyclotron frequency f = qB/(2π m) → ultra-high resolution (>1M at m/z 400). Bruker solariX, magnetic-RT high-Tc superconducting magnet refits emerging. Slow (multi-second scans); reserved for top-down, ultra-high-mass-accuracy applications.
  • Ion trap (3D Paul, 2D linear). Lower resolution than Orbitrap, but MSⁿ capability (n up to 5-10). Now mostly used as ion routers / fragmenters in hybrid systems.
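
The analyzer relations above can be checked numerically. This is a small sketch using CODATA constants; the function names, the 21 T / m/z 400 example, and the TOF voltage and path length are illustrative assumptions rather than any vendor's specification.

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge, C
DALTON = 1.66053906660e-27   # unified atomic mass unit, kg

def cyclotron_frequency_hz(mz, b_tesla):
    """FT-ICR: f = qB / (2*pi*m), written per unit charge via m/z."""
    return E_CHARGE * b_tesla / (2 * math.pi * mz * DALTON)

def tof_flight_time_us(mz, accel_volts, path_m):
    """TOF: ions reach v = sqrt(2 z e V / m), so t = L / v scales as sqrt(m/z)."""
    velocity = math.sqrt(2 * E_CHARGE * accel_volts / (mz * DALTON))
    return path_m / velocity * 1e6

def ppm_error(measured_mz, theoretical_mz):
    """Mass accuracy in parts per million."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6

print(f"{cyclotron_frequency_hz(400, 21) / 1e3:.0f} kHz at m/z 400, 21 T")
print(f"{tof_flight_time_us(1000, 10_000, 2.0):.1f} us over 2 m at 10 kV")
print(f"{ppm_error(1000.002, 1000.000):.1f} ppm")
```

Quadrupling m/z doubles the TOF flight time, while the cyclotron frequency falls in direct proportion to m/z, which is why FT-ICR resolving power is quoted at a stated m/z.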

Ion mobility

Separation by size, shape, and charge (collision cross section relative to charge) rather than by m/z: gas-phase electrophoresis. Adds an orthogonal dimension to LC × MS.

  • DTIMS — drift tube IMS (Agilent). Long drift tube under low-field; collision cross section (CCS) measurable.
  • TWIMS — traveling wave IMS (Waters Synapt). Pulsed traveling wave; calibrated CCS.
  • TIMS: trapped IMS (Bruker timsTOF). Works like an inverted drift tube: ions are held against the gas flow by an electric field gradient and released according to mobility. PASEF acquisition (Meier-Mann 2018) couples TIMS scanning with quadrupole isolation → roughly 10-fold gain in MS2 sequencing speed without loss of sensitivity.
  • FAIMS — high-field asymmetric waveform IMS (Thermo FAIMS Pro Duo). Filtering selectivity; reduces chemical noise; widely used as ion-mobility prefilter on Orbitraps for low-input proteomics.
  • SLIM: structures for lossless ion manipulations (PNNL). Serpentine 10-100 m drift paths (longer with multi-pass routing); ultra-high IMS resolution (1000+) for isomer separation.

Fragmentation methods

  • CID: collision-induced dissociation. Resonant excitation against He bath gas in the linear ion trap (~30% NCE). Yields b- and y-ions for peptides, but fragments below roughly one-third of the precursor m/z are not trapped (low-mass cutoff). Mostly historical for MS2; superseded by HCD on Orbitraps.
  • HCD: higher-energy collisional dissociation (Olsen-Mann 2007). Beam-type CID in a dedicated collision cell outside the trap. Yields b/y ions plus internal/immonium ions; has no low-mass cutoff, so TMT reporter ions (~126-135 Da) are detected. Default for tryptic peptide MS2 on Orbitraps.
  • ETD — electron transfer dissociation (Syka-Hunt 2004). Singly charged radical anion (typically fluoranthene radical) transfers electron to multiply charged peptide cation → backbone N-Cα cleavage → c/z ions. Preserves labile PTMs (phospho, glyco, ADP-ribose, ubiquitin chains). Requires +3 or higher charge for efficient ETD; +2 peptides need supplemental activation (EThcD).
  • EThcD. ETD + low-energy HCD supplemental activation; co-existing c/z + b/y ions; best sequence coverage.
  • UVPD — 193 nm ultraviolet photodissociation (Brodbelt UT-Austin). High-energy photons; broad ion-type production (a/x/b/y/c/z); excellent for top-down and disulfide-mapping.
  • ETciD, AI-ETD. ETD with collisional supplement; activated-ion ETD for top-down.
  • EAD — electron-activated dissociation (Sciex ZenoTOF 7600). Tunable electron energy; pseudo-ETD without protonation requirement; useful for glycoproteomics, lipidomics, PTMs.
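
To make the b/y ion series concrete, here is a minimal sketch that computes singly charged b- and y-ion m/z values from monoisotopic residue masses. It assumes unmodified residues (no carbamidomethyl-Cys, no PTMs); the function name and the test peptide are illustrative.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276
WATER = 18.010565

def fragment_ions(peptide):
    """Return (b_ions, y_ions) as singly charged m/z lists, b1..b(n-1), y1..y(n-1)."""
    masses = [RESIDUE[aa] for aa in peptide]
    n = len(masses)
    b_ions = [sum(masses[:i]) + PROTON for i in range(1, n)]
    y_ions = [sum(masses[n - i:]) + WATER + PROTON for i in range(1, n)]
    return b_ions, y_ions

b_ions, y_ions = fragment_ions("PEPTIDE")
for i, (b, y) in enumerate(zip(b_ions, y_ions), start=1):
    print(f"b{i} = {b:9.4f}    y{i} = {y:9.4f}")
```

Complementary pairs satisfy b(i) + y(n-i) = M + 2 × 1.007276, a quick sanity check when annotating spectra by hand. The c/z ions from ETD are offset from b/y by the mass of ammonia, which this sketch does not cover.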

Acquisition modes

DDA — data-dependent acquisition

The original shotgun workflow. MS1 survey scan (full mass range, e.g., 350-1500 m/z) → pick top N (typically 10-30) most-intense precursors → isolate each with quadrupole (1-2 m/z window) → fragment → MS2 spectrum. Loop. Dynamic exclusion (~20-30 s) prevents re-selecting the same peptide.

Advantages: spectra are clean (one peptide per MS2 ideally); database search is straightforward. Disadvantages: stochastic — only most-intense peptides at each duty cycle are sampled; missing values across runs (40-60% of features in any single sample); peak-picking biased to high-abundance proteins.
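
The top-N loop with dynamic exclusion can be sketched as follows. This is a simplified illustration, not instrument firmware: real schedulers also filter on charge state, intensity thresholds, and isotope clusters, and the names and tolerance here are assumptions.

```python
def select_precursors(survey_scan, excluded_until, now_s,
                      top_n=15, exclusion_s=30.0, tol_mz=0.01):
    """Pick the top-N most intense precursors not under dynamic exclusion.

    survey_scan: list of (mz, intensity) from the MS1 scan.
    excluded_until: dict mapping excluded m/z -> expiry time in seconds
                    (updated in place).
    """
    # Release precursors whose exclusion window has expired.
    for mz in [m for m, expiry in excluded_until.items() if expiry <= now_s]:
        del excluded_until[mz]

    picked = []
    for mz, _intensity in sorted(survey_scan, key=lambda p: p[1], reverse=True):
        if any(abs(mz - ex) <= tol_mz for ex in excluded_until):
            continue  # fragmented recently; skip
        picked.append(mz)
        excluded_until[mz] = now_s + exclusion_s
        if len(picked) == top_n:
            break
    return picked

scan = [(500.0, 1e6), (600.0, 5e5), (700.0, 1e5)]
exclusion = {}
print(select_precursors(scan, exclusion, now_s=0.0, top_n=2))
print(select_precursors(scan, exclusion, now_s=10.0, top_n=2))
```

On the second cycle the two intense precursors are still excluded, so the weaker m/z 700 ion finally gets sampled; this is how dynamic exclusion pushes DDA deeper, and also why selection remains stochastic across runs.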

DIA — data-independent acquisition

Popularized as SWATH-MS by Aebersold (Gillet et al. 2012 MCP; Sciex TripleTOF). MS1 → MS2 isolation windows of fixed width (10-25 m/z) stepped across the full range → fragment all precursors in each window → composite MS2 spectrum. Loop. Every precursor in the m/z range is fragmented every cycle → no stochastic precursor selection and far fewer missing values across runs.

Advantages: complete sampling; low missing-value rate (5-20%); excellent run-to-run reproducibility; suits clinical and population-scale work. Disadvantages: chimeric MS2 spectra (multiple co-eluting peptides in same isolation window); requires spectral library or library-free search; much heavier compute.
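
Window width, window count, and scan rate together set the DIA cycle time, which must stay short enough to sample each chromatographic peak several times. A rough sketch of that arithmetic, with hypothetical helper names and example ranges chosen only for illustration:

```python
def dia_windows(mz_start, mz_end, width, overlap=0.0):
    """Tile [mz_start, mz_end] with fixed-width isolation windows."""
    windows = []
    step = width - overlap
    low = mz_start
    while low < mz_end:
        windows.append((low, min(low + width, mz_end)))
        low += step
    return windows

def cycle_time_s(n_windows, ms2_hz, ms1_time_s=0.0):
    """Time to acquire one MS1 plus one MS2 per window."""
    return ms1_time_s + n_windows / ms2_hz

wide = dia_windows(400, 1000, 20)     # 20 m/z windows
narrow = dia_windows(380, 980, 2)     # 2 m/z windows
print(len(wide), "windows,", cycle_time_s(len(wide), ms2_hz=20), "s per cycle")
print(len(narrow), "windows,", cycle_time_s(len(narrow), ms2_hz=200), "s per cycle")
```

Under these assumed numbers, a 10-fold faster MS2 rate lets the windows shrink 10-fold at the same cycle time, which is the trade-off behind narrow-window DIA on fast analyzers.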

Variants:

  • SWATH (Sciex TripleTOF) — original DIA.
  • DIA on Orbitrap. Q Exactive HF, Exploris 480, Eclipse, Astral. Astral with 0.5-2 m/z windows + 200 Hz MS2 → near-DDA selectivity at DIA coverage.
  • PASEF-DIA / diaPASEF (Bruker timsTOF). DIA in ion-mobility dimension; sensitivity gain.
  • WiSIM-DIA, Scanning-SWATH, BoxCar-DIA. Variants optimizing different aspects.
  • µDIA, sciDIA, fastDIA. Method names for short gradient + narrow window combinations.

PRM / SRM — targeted

PRM (parallel reaction monitoring; Peterson-Coon 2012): Orbitrap-based targeted method; isolate one precursor at a time, full MS2 scan. Quantify by summed fragment ion intensity. SRM/MRM (selected/multiple reaction monitoring): triple quadrupole; Q1 selects precursor m/z, q2 fragments, Q3 selects specific fragment m/z. Several transitions per peptide. Cheapest hardware for absolute targeted quant; gold standard for clinical assays (FDA bioanalytical method validation).

DDA-PASEF

Bruker timsTOF DDA with TIMS separation ahead of quadrupole isolation. >100 Hz MS2; deep proteomes in 30-90 min gradients.


Quantification chemistries

Label-free quantification (LFQ)

  • MS1 intensity-based. Integrate precursor extracted ion chromatogram (XIC); align across runs with retention-time normalization. MaxLFQ (Cox-Mann 2014), directLFQ (Spectronaut 18+), DIA-NN MaxLFQ. Cross-experiment normalization (median, quantile, LIMMA-based).
  • Spectral counting. Number of MS2 spectra assigned to a protein. Legacy; superseded by intensity-based.

Advantages: no labeling cost; scalable to thousands of samples; works on any input. Disadvantages: requires very tight chromatography reproducibility for cross-run comparison; less accurate at low end of dynamic range.

TMT — tandem mass tags

Thompson et al. 2003 (Proteome Sciences), commercialized by Thermo. Isobaric: all label variants have same total mass at MS1, but cleave during MS2 to release reporter ions at different m/z. Sample-multiplex quantification at the MS2 (or MS3) level.

TMT 6-plex → 10-plex → 11-plex → 16-plex (TMTpro 2020) → 18-plex (TMTpro 18plex 2021) → 35-plex announced 2025 (Pierce-Thermo TMTpro+).

Workflow: label each sample with one TMT channel → pool → fractionate → LC-MS/MS → identify peptides by MS2 sequence ions → quantify by reporter ion intensities in low-mass MS2 region.

Ratio compression problem: co-isolated contaminant peptides in the isolation window contribute their own reporter signal, pulling measured ratios toward the 1:1 baseline. Mitigations:

  • SPS-MS3 (synchronous precursor selection MS3; McAlister-Gygi 2014). Top N fragment ions from MS2 selected and re-fragmented in MS3; reporters now isolated from co-isolation contamination. Adds time per duty cycle.
  • TMTc, TMTc+. Complementary reporter ions at higher mass; less affected by low-mass MS2 noise.
  • FAIMS or TIMS pre-filtering. Reduces co-isolation.
  • Narrow isolation windows (0.4-0.7 m/z); requires high-resolution quadrupole.
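
A toy model shows how strongly co-isolation distorts reporter ratios. It assumes the interfering background contributes equally to both channels (the 1:1 baseline) and expresses that background relative to the target's signal in the reference channel; both the model and the function name are simplifying assumptions.

```python
def observed_ratio(true_ratio, background):
    """Reporter ratio A/B when both channels carry the same background.

    true_ratio: real A/B abundance ratio of the target peptide.
    background: co-isolated reporter signal per channel, relative to the
                target's signal in channel B.
    """
    return (true_ratio + background) / (1.0 + background)

for background in (0.0, 0.1, 0.5, 1.0):
    print(f"background {background:.1f}: 10-fold change reads as "
          f"{observed_ratio(10.0, background):.2f}")
```

Even modest interference flattens a 10-fold change noticeably, which is why SPS-MS3, gas-phase filtering, and narrow isolation windows all aim to reduce the background term rather than correct for it afterwards.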

iTRAQ (Sciex; 4-plex, 8-plex) is older, similar concept; mostly historical now.

SILAC

Mann lab 2002. Cells grown in media containing 13C/15N-labeled lysine and arginine (“heavy”) vs natural (“light”) media. Mix samples 1:1 after extraction; MS1 peak doublets (light + heavy) quantify in single LC run. Triple SILAC adds an intermediate (“medium”) label.

Pros: MS1-level quant (no reporter chemistry); high accuracy. Cons: only works on culturable cells/animals (SILAC mouse — Mann 2008 — feeding 13C6-Lys diet); two-state typical, max three-state.
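
The light/heavy doublet spacing follows directly from the isotope labels. The sketch below assumes the common Lys8 (13C6 15N2) and Arg10 (13C6 15N4) labels, which the text above does not specify; the function names and the example peptide are likewise illustrative.

```python
C13_SHIFT = 1.0033548   # mass difference 13C - 12C, Da
N15_SHIFT = 0.9970349   # mass difference 15N - 14N, Da

# Assumed labels: Lys8 = 13C6 15N2, Arg10 = 13C6 15N4.
LABEL_SHIFT = {
    "K": 6 * C13_SHIFT + 2 * N15_SHIFT,
    "R": 6 * C13_SHIFT + 4 * N15_SHIFT,
}

def doublet_spacing_mz(peptide, charge):
    """m/z distance between the light and heavy MS1 peaks of a peptide."""
    mass_shift = sum(LABEL_SHIFT.get(aa, 0.0) for aa in peptide)
    return mass_shift / charge

def heavy_to_light_ratio(heavy_intensity, light_intensity):
    """Relative abundance from the paired MS1 intensities."""
    return heavy_intensity / light_intensity

print(f"{doublet_spacing_mz('PEPTIDEK', 2):.4f} m/z apart at 2+")
print(f"{doublet_spacing_mz('PEPTIDER', 2):.4f} m/z apart at 2+")
```

Because tryptic peptides end in K or R, nearly every peptide carries exactly one label, so the spacing is predictable from the C-terminal residue and the charge state.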

Other labeled methods

  • dimethyl labeling (Boersema-Heck 2009). Cheap (formaldehyde + reductant); 3-plex per LC-MS run.
  • AHA (BONCAT) and pSILAC. BONCAT incorporates the methionine analog azidohomoalanine, followed by click-chemistry pull-down of newly synthesized proteins; pSILAC pulses heavy amino acids instead. Both report synthesis and turnover dynamics.
  • mTRAQ (Sciex; non-isobaric variant of iTRAQ). Used in plexDIA (Slavov-Demichev) for single-cell DIA multiplexing.

Absolute quantification

  • AQUA. Spike heavy-isotope-labeled synthetic peptide standard; quantify endogenous peptide vs known amount. Per-peptide standard cost limits scale.
  • QconCAT, QPrEST, FlexiQuant. Concatenated standard peptide proteins; cheaper per-protein.
  • iBAQ, top3, MaxLFQ-based “absolute.” Approximate; useful for relative protein-rank within sample.

Software and search engines

Bottom-up DDA search engines

  • Sequest (Yates-Eng 1994) — the original; integrated into Proteome Discoverer.
  • Mascot (Matrix Science 1999) — commercial; widely cited; web-based search.
  • Andromeda (Cox-Mann 2011) — integrated into MaxQuant; free.
  • MSFragger (Nesvizhskii-Yu, Michigan 2017) — fragment-ion indexing; ~50× faster than Sequest/Mascot. Powers FragPipe pipeline.
  • Comet (Eng 2013) — open-source Sequest-like.
  • MS-GF+ (Pevzner-Kim) — improved scoring; open-source.
  • Tide / Crux (Noble UW).
  • PEAKS (BSI; de novo + database hybrid).

Validation / FDR

  • Target-decoy. Search against target DB (forward) + decoy DB (reversed or scrambled). Estimate FDR from decoy hit rate at any score threshold; report 1% protein/peptide FDR.
  • Percolator (Käll-Noble 2007). Semi-supervised SVM re-scoring of PSMs using target-decoy labels.
  • Mascot Percolator, Q-ranker, MS Amanda + Percolator, SAGE.
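
Target-decoy FDR estimation reduces to counting decoys above a score threshold. Here is a minimal sketch using the simple decoys/targets estimator with q-values taken as the running minimum FDR; tools differ in the exact estimator, and the input format and function name here are assumptions.

```python
def q_values(psms):
    """Assign q-values to PSMs.

    psms: list of (score, is_decoy) with higher score = better match.
    Returns (score, is_decoy, q_value) tuples sorted by descending score.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)

    # FDR at each threshold: decoys accepted / targets accepted.
    fdrs = []
    targets = decoys = 0
    for _score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))

    # q-value: lowest FDR at which this PSM would still be accepted.
    qs = []
    running_min = float("inf")
    for fdr in reversed(fdrs):
        running_min = min(running_min, fdr)
        qs.append(running_min)
    qs.reverse()
    return [(s, d, q) for (s, d), q in zip(ranked, qs)]

psms = [(10, False), (9, False), (8, True), (7, False), (6, False)]
for score, is_decoy, q in q_values(psms):
    print(score, "decoy" if is_decoy else "target", f"q={q:.2f}")
```

Filtering at 1% FDR then means keeping target PSMs with q ≤ 0.01; re-scoring tools such as Percolator change the scores that go in, not this counting step.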

DIA software

  • Spectronaut (Biognosys; commercial, expensive but excellent UX). Library-based and library-free directDIA.
  • DIA-NN (Demichev-Ralser 2020 Nat Methods). Free for academic use (early releases open-source, recent ones closed-source); neural-network scoring; library-free or library-aided. Has displaced Spectronaut in many academic and public-data analyses 2022-2026.
  • Skyline (MacCoss UW). Open-source; targeted (PRM/SRM) plus DIA. Spectral library and method development.
  • OpenSWATH (Aebersold-Röst 2014). Original community SWATH tool.
  • EncyclopeDIA, MaxDIA, Scaffold-DIA, FragPipe-DIA. Increasingly converging on similar performance.

Spectral libraries

  • In-house DDA-based. Run pooled fractionated sample in DDA → build library → use for DIA quantification.
  • Predicted libraries. Prosit (Gessulat-Wilhelm 2019): deep neural net predicts MS2 spectra and iRT from peptide sequence. DeepMS, AlphaPeptDeep, MSFragger DDA→DIA. Reduces need for empirical libraries; especially valuable for low-input samples.
  • ProteomicsDB, ProteomeTools — synthetic peptide library reference spectra.

Public repositories

  • PRIDE (EBI; tens of thousands of datasets; the GEO of proteomics).
  • MassIVE (UCSD; integrates with PRIDE via ProteomeXchange).
  • jPOST (Japan ProteOme STandard Repository).
  • iProX (China).
  • Panorama Public — Skyline targeted assays.

ProteomeXchange Consortium coordinates submissions across PRIDE, MassIVE, jPOST, iProX. PXD prefix dataset identifiers (e.g., PXD012345).

Protein-level inference

PSMs → peptides → proteins is non-trivial because of shared peptides (razor peptide assignment). Methods: ProteinProphet, Mayu, IDPicker, MSstats. MaxQuant’s protein groups handle shared-peptide ambiguity by grouping indistinguishable proteins.


Post-translational modifications (PTMs)

PTMs are the rich combinatorial layer of regulation. ~250 known PTM types; ~20 broadly studied. Mass-spec mapping is the leading method.

Phosphoproteomics

Phospho-Ser/Thr/Tyr — +79.966 Da delta. Enrichment essential (phosphopeptides are typically <1% of total peptides at biological abundances; pSer:pThr:pTyr ≈ 90:10:0.1).

Enrichment:

  • IMAC: immobilized metal affinity chromatography. Fe³⁺-NTA, Fe³⁺-IDA, Ti⁴⁺-IMAC. Recovers multiply-phosphorylated peptides well.
  • TiO₂. Titanium dioxide metal-oxide affinity (Titansphere beads are the most common commercial workhorse); older, still widely used; less efficient for multiply-phosphorylated peptides.
  • HighSelect Fe-NTA (Thermo) — pre-packaged spin columns; reproducible.
  • EasyPhos (Mann lab) — single-step enrichment from crude lysate, very simple, scales to hundreds of samples per week.

Workflow: digest → desalt → enrich → optionally fractionate → LC-MS/MS with HCD + ETD (or EThcD) → search with PhosphoRS / Andromeda / Spectronaut site localization.

Phosphosite localization: Ascore (Beausoleil-Gygi 2006), ptmRS, PhosphoRS, MD-score, localization probability. Distinguishes pSer/pThr at S/T-rich tryptic peptides.

Databases: PhosphoSitePlus (Hornbeck, CST; curated phosphosites + functional annotation; >300k sites), Phospho.ELM, MIST/HIPHIN integrations.

Glycoproteomics

N-glycosylation at NXS/T (X ≠ P) Asn, O-glycosylation at S/T (no consensus). Glycans add 800-3500 Da typical; immense isomeric diversity.
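
The N-glycosylation sequon N-X-S/T (X ≠ P) maps directly onto a regular expression. A small sketch, with a hypothetical function name and toy sequence; it flags candidate sites only and says nothing about actual occupancy.

```python
import re

# Lookahead so that overlapping sequons (e.g. N-N-T-T) are all reported.
SEQUON = re.compile(r"(?=N[^P][ST])")

def find_sequons(sequence):
    """Return 1-based positions of Asn residues in N-X-S/T motifs (X != P)."""
    return [match.start() + 1 for match in SEQUON.finditer(sequence)]

print(find_sequons("MNGSANPTNNTT"))
```

In the toy sequence the N-P-T motif is skipped because of the proline, while the adjacent N-N-T and N-T-T motifs are both reported. O-glycosylation has no comparable consensus, so it cannot be pre-screened this way.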

Approaches:

  • Bottom-up glycopeptide MS. Enrich by HILIC, lectin (ConA, WGA), or boronate. Fragment with HCD-stepped or EThcD. Software: pGlyco (Liu lab), Byonic (Bern-Protein Metrics), GPSeeker, MSFragger-Glyco.
  • Released-glycan. PNGase F releases N-glycans; analyze with MALDI or LC-MS or HILIC-FLD with 2-AB labels.
  • IgG glycoproteomics. Drives biopharma QC — IgG1 Fc N-glycan composition affects ADCC/CDC; ~60% of mAb characterization is glycan analysis (HILIC-FLD post-PNGase F).
  • Top-down/middle-down glycoproteomics. Intact glycoform analysis on Orbitrap or FT-ICR with ETD/UVPD.

GlyTouCan database; Symbol Nomenclature for Glycans (SNFG); GlyConnect (ExPASy).

Ubiquitin and ubiquitin-like

  • K-ε-GG remnant. Tryptic digestion of ubiquitin-conjugated proteins leaves di-Gly (GG, +114.04 Da) on K-modified peptide. Anti-K-GG antibody enrichment (PTMScan from CST) → LC-MS/MS. Identifies ubiquitination sites globally.
  • Linkage-specific. K48 vs K63 vs K11 ubiquitin chain linkages diagnosed by ratio of K-GG peptides at each Ub Lys.
  • NEDD8, SUMO, ISG15 — analogous Ubl proteins; similar enrichment workflows but with Ubl-specific epitope antibodies.

Acetylation, methylation

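K-ac (+42.011 Da), K-me/me2/me3 (+14.016/+28.031/+42.047 Da; me3 is nominally isobaric with acetylation, differing by 0.036 Da, which high-resolution MS resolves). Enrichment with acetyl-K antibody (PTMScan, Cell Signaling). Histone PTM combinatorial analysis: middle-down, or bottom-up with lysine propionylation so that trypsin cleaves only at Arg (Arg-C-like peptides).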
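
Acetylation and trimethylation share a nominal mass of +42 Da but differ in exact mass, so the question is whether the instrument can tell them apart. A back-of-envelope sketch using monoisotopic deltas (acetyl C2H2O, trimethyl C3H6); the function names and the example peptide mass are illustrative assumptions.

```python
ACETYL = 42.010565      # C2H2O, Da
TRIMETHYL = 42.046950   # C3H6, Da
DELTA = TRIMETHYL - ACETYL

def ppm_separation(peptide_mass):
    """Mass difference between the two forms, in ppm of the peptide mass."""
    return DELTA / peptide_mass * 1e6

def resolving_power_needed(mz, charge):
    """m / delta-m required to separate the two forms as co-detected peaks."""
    return mz / (DELTA / charge)

print(f"delta = {DELTA:.4f} Da")
print(f"{ppm_separation(1500.0):.1f} ppm apart for a 1500 Da peptide")
print(f"R > {resolving_power_needed(750.0, 2):,.0f} needed at m/z 750, 2+")
```

A ~24 ppm gap is far outside routine Orbitrap mass error, so the two modifications are distinguishable by precursor mass on high-resolution instruments but not on unit-resolution ones.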

ADP-ribosylation, palmitoylation, lipidation

Specialized enrichments (Af1521 macrodomain for ADP-ribose; acyl-biotin exchange for palmitoyl-Cys; click chemistry for alkyne- or azide-tagged lipid analogs). Cross-link cell-molecular-biology.


Crosslinking-MS (XL-MS)

Cross-link two residues in a protein/complex with a bifunctional reagent → digest → identify crosslinked peptides → triangulate residue distance constraints for structure modeling.

Cross-linkers

  • DSSO (disuccinimidyl sulfoxide). MS-cleavable; spacer 10.3 Å; targets Lys ε-amine. Cleaves under CID to give characteristic alkene/sulfenic-acid signature ions → simplifies identification.
  • DSBU (disuccinimidyl dibutyric urea). MS-cleavable; spacer 12.5 Å.
  • BS3, DSS, DSG. Non-cleavable Lys-Lys; classical Sinz workhorse.
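  • PhoX, tBuPhoX, Leiker. Enrichable linkers: PhoX and tBuPhoX carry a phosphonic acid handle for IMAC enrichment; Leiker carries a biotin tag with a cleavable release site.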
  • EDC + sulfo-NHS. Lys-Asp/Glu zero-length.
  • SDA, NHS-diazirine. Photo-activatable; broad target (any C-H bond within distance).

Workflow

  1. Crosslink intact protein/complex (typically 1-100× molar excess; quench with Tris/NH4HCO3).
  2. Digest (trypsin or trypsin/Lys-C).
  3. Enrich for crosslinked peptides: size-exclusion (SEC), SCX, or the affinity handle on enrichable linkers.
  4. LC-MS/MS with stepped-HCD or EThcD.
  5. Search with crosslink-specific software:
    • pLink2 (He lab) — high sensitivity, MS-cleavable + non-cleavable.
    • MeroX — MS-cleavable.
    • Kojak (Hoopmann-Moritz) — open-source, integrates with Trans-Proteomic Pipeline.
    • xQuest/xProphet (Aebersold-Leitner).
    • XlinkX in Proteome Discoverer (Heck).
  6. Convert XL hits → distance restraints (~Cα-Cα < 30 Å for DSSO/DSS) → integrate into structure modeling (HADDOCK, Integrative Modeling Platform IMP, Modeller, AlphaFold restraints).
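
Step 6 amounts to checking each crosslinked residue pair against a distance cutoff in a candidate structure. A minimal sketch, assuming Cα coordinates have already been extracted into a dictionary; the coordinates, residue numbers, and function name are made up for illustration, and `math.dist` needs Python 3.8+.

```python
import math

def check_restraints(ca_coords, crosslinks, max_dist=30.0):
    """Evaluate crosslinks against a Calpha-Calpha distance cutoff (in Å).

    ca_coords: dict residue_id -> (x, y, z) in Å.
    crosslinks: list of (residue_i, residue_j) pairs.
    Returns (residue_i, residue_j, distance, satisfied) tuples.
    """
    report = []
    for res_i, res_j in crosslinks:
        distance = math.dist(ca_coords[res_i], ca_coords[res_j])
        report.append((res_i, res_j, round(distance, 1), distance <= max_dist))
    return report

# Hypothetical coordinates for three lysines.
coords = {10: (0.0, 0.0, 0.0), 50: (10.0, 0.0, 0.0), 120: (0.0, 40.0, 0.0)}
for row in check_restraints(coords, [(10, 50), (10, 120)]):
    print(row)
```

Violated restraints are not automatically errors: they can indicate conformational flexibility, an alternative state, or an inter-subunit rather than intra-subunit link, which is why integrative modeling tools treat them as soft restraints.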

XL-MS is now standard complementary tool for cryo-EM-resolved complexes — particularly for flexible regions, transient interactions, and intrinsically disordered proteins. Used to validate ribosome structures, spliceosome, nuclear pore complex (Beck-Aebersold consortia), proteasome, ATP synthase.


HDX-MS

Hydrogen-deuterium exchange MS measures amide hydrogen exchange rates → reveals solvent accessibility, dynamics, binding-induced protection.

Workflow

  1. Equilibrate protein in H₂O buffer.
  2. Dilute 10× into D₂O buffer (95-99% D); incubate 10 s, 1 min, 10 min, 1 h, 10 h time points.
  3. Quench (low pH, low T) at each time point.
  4. Pepsin digest at 0 °C (pepsin works at pH 2.5; minimizes back-exchange).
  5. LC-MS at 0 °C, fast 5-min gradient.
  6. Mass shift of each peptide vs unlabeled → fraction deuterated → fit exchange kinetics.
  7. Compare ± ligand, ± mutation → identify protected (slower exchange) or exposed (faster exchange) regions.
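
Steps 6 and 7 can be sketched as a fractional-uptake calculation followed by a difference between states. This assumes a fully deuterated control is available for back-exchange correction, which the workflow above does not state; all masses and names below are illustrative.

```python
def fractional_uptake(mass_t, mass_undeuterated, mass_fully_deuterated):
    """Fraction of exchangeable amides deuterated at one time point.

    Uses centroid masses; normalizing to a fully deuterated control
    corrects for back-exchange during quench, digest, and LC.
    """
    return (mass_t - mass_undeuterated) / (mass_fully_deuterated - mass_undeuterated)

def protection(uptake_free, uptake_bound):
    """Positive values mean slower exchange (protection) in the bound state."""
    return uptake_free - uptake_bound

# Hypothetical centroid masses (Da) for one peptide at the 1-min time point.
free = fractional_uptake(1002.5, 1000.0, 1005.0)
bound = fractional_uptake(1001.0, 1000.0, 1005.0)
print(f"free {free:.2f}, bound {bound:.2f}, protection {protection(free, bound):.2f}")
```

Repeating this across the time series gives an uptake curve per peptide; fitting exchange kinetics to those curves is the job of the dedicated HDX packages.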

Software: HX-Express, HDExaminer (Sierra Analytics), Mass Spec Studio, DECA, HDXer. ExMS2 (Englander lab) for residue-level deconvolution.

Applications: epitope mapping (mAb-antigen interface), allosteric site mapping, ligand-binding-site identification, intrinsically disordered region characterization, biosimilar comparability for biopharma.

Commercial HDX systems: Waters HDX Manager, Trajan LEAP HDX PAL, Sierra HDXmgr. Exclusive HDX-MS service: NanoTemper, Malvern Panalytical.


Native MS

ESI from ammonium acetate buffer preserves non-covalent interactions → measures intact protein complex masses, stoichiometries, ligand-binding constants.

Instrumentation: Waters Synapt G2-Si or Cyclic IMS with high-mass quadrupole; Bruker timsTOF Pro 2 with sapphire skimmer; Thermo Q Exactive UHMR Orbitrap (ultra-high mass range, 2018).

Charge-reducing additives (TEAA, imidazole) and gentle source conditions preserve complex integrity. Can mass complexes >1 MDa (proteasome, ribosome, viral capsids — Heck consortium).

Applications: stoichiometry of homo/heteromers, ligand-binding (KD via titration), drug-target engagement, antibody-antigen complexes, IgG glycoform distribution at intact-protein level.
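
Intact masses in native MS come from the charge-state envelope: two adjacent peaks are enough to solve for charge and mass. A minimal sketch assuming protonated ions and cleanly resolved neighbors; real native spectra need peak-width and adduct handling that this ignores, and the 150 kDa example is synthetic.

```python
PROTON = 1.007276

def deconvolve_pair(mz_low, mz_high):
    """Solve charge and neutral mass from two adjacent charge-state peaks.

    mz_low carries charge z + 1 and mz_high carries charge z, so
    z = (mz_low - proton) / (mz_high - mz_low) and M = z * (mz_high - proton).
    """
    z = round((mz_low - PROTON) / (mz_high - mz_low))
    mass = z * (mz_high - PROTON)
    return z, mass

# Synthetic peaks for a 150 kDa complex at 26+ and 25+.
true_mass = 150_000.0
peak_26 = true_mass / 26 + PROTON
peak_25 = true_mass / 25 + PROTON
z, mass = deconvolve_pair(peak_26, peak_25)
print(f"z = {z}, M = {mass:.1f} Da")
```

Averaging the mass estimates from every adjacent pair in the envelope improves precision and exposes mis-assigned charge states, since a wrong z gives inconsistent masses across pairs.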


Affinity-MS interactomics

AP-MS — affinity purification mass spectrometry

Bait protein expressed with affinity tag (FLAG, HA, GFP, V5, His, MBP, Strep-tag II) or endogenously knocked-in (CRISPR-engineered tag). Pull down with anti-tag bead; wash; elute; trypsin digest on-bead or in-solution; LC-MS/MS; identify co-purifying proteins (preys).

Statistical scoring:

  • SAINT (Choi-Nesvizhskii 2011 Nat Methods). Probability-based contaminant filtering.
  • MiST / CompPASS (Krogan, Harper labs).
  • ProHits-viz (Gingras lab).

BioPlex (Huttlin-Harper-Gygi) is the canonical large-scale AP-MS interactome: >100,000 interactions across thousands of bait proteins in HEK293T.

BioID / TurboID / APEX2 — proximity labeling

Bait fused to promiscuous biotin ligase (BirA* in BioID, BioID2 smaller; TurboID, miniTurbo — Branon-Ting 2018 Nat Biotechnol — much faster labeling in 10 min vs 18 h BioID); biotin substrate added → labels nearby (≤10 nm) proteins → streptavidin pull-down → MS.

APEX2 (Lam-Ting 2015 Nat Methods) — engineered peroxidase + biotin-phenol + H₂O₂ for 1-min labeling pulses; spatial resolution ~20 nm. Used in organelle proteomics (mitochondrial matrix vs IMS vs OMM), synaptic protein networks (Schiapparelli-Cline).

Advantages over AP-MS: detects transient and weak interactions, preserved native cellular context (no lysis dispersion). Disadvantages: signal-to-noise lower than direct AP-MS.

Co-fractionation MS / Thermal proteome profiling

CF-MS — fractionate native lysate by size (SEC) or charge (IEX), MS-profile each fraction → co-elution patterns predict complexes. Heuristics: COMPASS, EPIC, PrInCE.

TPP (thermal proteome profiling; Mathieson-Savitski-Drewes 2014 Science) — heat lysate at series of temperatures, soluble fraction analyzed by TMT → drug-binding stabilizes target → Tm shift identifies drug targets in whole-proteome. CETSA-MS (Molina-Nordlund 2013).
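
The Tm shift in TPP can be illustrated with a simple interpolation of where the soluble fraction crosses 50%. Published TPP analyses fit sigmoidal melting curves instead; the linear interpolation, function name, and data points here are simplifying assumptions.

```python
def melting_temperature(temps, soluble_fraction):
    """Temperature at which the soluble fraction first drops below 0.5.

    Linear interpolation between the two bracketing temperature points;
    returns None if the curve never crosses 0.5.
    """
    points = list(zip(temps, soluble_fraction))
    for (t0, f0), (t1, f1) in zip(points, points[1:]):
        if f0 >= 0.5 > f1:
            return t0 + (f0 - 0.5) * (t1 - t0) / (f0 - f1)
    return None

# Hypothetical normalized reporter intensities for one protein.
temps = [37, 41, 45, 49, 53]
vehicle = [1.00, 0.90, 0.60, 0.20, 0.05]
drug = [1.00, 0.95, 0.80, 0.40, 0.10]

tm_vehicle = melting_temperature(temps, vehicle)
tm_drug = melting_temperature(temps, drug)
print(f"Tm shift = {tm_drug - tm_vehicle:+.1f} °C")
```

A positive shift means the protein stays soluble to higher temperature in the presence of drug, consistent with ligand-induced stabilization; applying this to every quantified protein is what turns the experiment into a proteome-wide target screen.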


Single-cell proteomics

Frontier: identify and quantify thousands of proteins per single mammalian cell.

Lysate-based MS

  • SCoPE-MS / SCoPE2 (Slavov lab; Budnik-Slavov 2018 Genome Biol, Specht-Slavov 2021 Genome Biol). Single cells lysed in 1 µL each, TMT-labeled, multiplexed 14-16-plex with a “carrier” channel (~100× protein input from bulk lysate) that boosts precursor signal for identification, so reporter ions from the single-cell channels become quantifiable. ~1500-3000 proteins per single cell.
  • nanoPOTS — nanodroplet processing in one pot for trace samples (Zhu-Kelly-Smith PNNL 2018 Nat Comm). Sub-µL volumes (~200 nL); picogram-input sample prep. Coupled with Orbitrap or timsTOF.
  • plexDIA (Demichev-Ralser-Slavov 2022 Nat Biotechnol). mTRAQ 3-plex DIA single-cell.
  • N2 / SCP-DIA. Single-cell DIA without isobaric labels — emerging as cleanest workflow with Astral.
  • DISCO-MS, T-SCP, AccuTrap, OAD. Various startup commercial platforms.

State of the field (2026): ~3-5k proteins per single mammalian cell from Astral or timsTOF SCP routinely. Cell-type rare-population biology, drug-perturbation response heterogeneity.

Single-molecule protein sequencing (non-MS)

  • Quantum-Si Platinum (2022 launch). Fluorescent N-terminal-degradation single-molecule sequencer. ~16 amino-acid recognition; modest dynamic range; ~30 protein identification per cell vs 100s on MS but lower instrumentation cost.
  • Erisyon, Nautilus, Encodia (ProteoCode) — competing fluorescent recognition platforms with different chemistry.
  • Oxford Nanopore protein sequencing — early R&D 2023-2025; promise but unrealized at scale.

Clinical proteomics

Plasma proteomics

The plasma proteome spans ~10¹⁰-fold dynamic range (albumin 30-50 mg/mL → cytokines 1-10 pg/mL). Deep coverage requires either depletion (top-12, top-14 high-abundance protein removal; Sigma SEPPRO, Agilent MARS) or affinity-based platforms that sidestep the MS dynamic-range limit (Olink, SomaScan).

UK Biobank Pharma Proteomics Project (PPP): Olink Explore 3072 (~2,900 plasma proteins) on 54k participants (2023 first release), with the expansion on Olink Explore HT (~5,400 proteins) scaling to 250k+ by 2026. Generates pQTL maps, biomarker discovery, drug-target validation. Sun et al. 2023 Nature; Pietzner-Wheeler-Langenberg 2021 Science. Cross-link genetics-and-genomics.

Tissue proteomics / spatial proteomics

  • LC-MS on microdissected regions (LCM-MS).
  • MALDI-MS imaging. Spatial resolution ~10-50 µm; identifies lipids, drugs, small proteins/peptides direct from tissue. Bruker rapifleX MALDI Tissuetyper, timsTOF flex; Waters MALDI Synapt.
  • DESI-MS imaging. Ambient ionization on tissue; spatial ~50-100 µm. The related REIMS Intelligent Knife (Takats) applies rapid evaporative ionization to electrosurgical smoke for mass-spec margin assessment during surgery.
  • CODEX / IBEX / IMC (imaging mass cytometry, Fluidigm-Standard BioTools) — antibody-based but multiplexed (40+ markers in IMC). Bridges proteomics and pathology.

Biomarker discovery

Discovery → verification → validation pipeline (FDA, CLIA, IVDR). Most proteomic biomarker discoveries fail validation. Successes:

  • Troponin I/T (cardiac MI); now point-of-care.
  • PSA (prostate cancer screening; controversial).
  • PCT (procalcitonin) — bacterial vs viral infection differentiation.
  • NT-proBNP — heart failure.
  • OVA1, ROMA — ovarian cancer (multiprotein algorithms).
  • ColonView, EarlyCDT — multiprotein cancer panels (mixed FDA/CE status).

Olink and SomaScan generate hundreds of candidate plasma biomarkers per year; mAb-based validation lags. Translation to clinic remains the bottleneck.


Practical workflows

Standard bottom-up DDA experiment (HeLa lysate)

  1. Lyse 1-5 × 10⁶ HeLa cells in 5% SDS + 100 mM TEAB + protease inhibitors (PIC); sonicate; clarify by centrifugation 16,000×g 10 min.
  2. BCA assay → take 50-100 µg.
  3. S-Trap mini protocol: reduce (TCEP 5 mM 15 min RT), alkylate (MMTS 10 mM 15 min), acidify to 1.2% phosphoric acid → add methanolic binding buffer and bind to S-Trap → wash → trypsin 1:25 enzyme:substrate, 47 °C 1 h.
  4. Elute peptides; dry in SpeedVac; resuspend in 0.1% FA + 2% MeCN.
  5. C18 cleanup if not already; quantify peptide (BCA or NanoDrop A280).
  6. LC: 1 µg on 25 cm × 75 µm Aurora C18 (or Thermo PepMap C18, Waters HSS T3), 90 min gradient 2-30% B (0.1% FA in MeCN), 300 nL/min.
  7. MS: Orbitrap Exploris 480 or Astral, DDA top-15, 1×10⁶ AGC, 60k MS1 resolution, 15k MS2 resolution, HCD NCE 30, dynamic exclusion 30 s.
  8. Search with MaxQuant (Andromeda) or FragPipe (MSFragger) against the UniProt human proteome (~20k canonical + isoforms); FDR 1% protein + peptide; LFQ enabled.
  9. ~6000-9000 proteins from single-shot 90-min HeLa with Exploris 480; ~10000+ with Astral.

Phosphoproteomics enrichment (5 mg input)

  1. Lyse 50-100 × 10⁶ cells in 8 M urea + 50 mM TEAB + 1× phosphatase inhibitor (PhosSTOP); sonicate.
  2. Reduce/alkylate.
  3. Digest: dilute urea to 2 M with TEAB; Lys-C (1:100) 4 h 37 °C; trypsin (1:50) overnight 37 °C.
  4. C18 desalt (Sep-Pak C18 1 cc 100 mg or in-house StageTip equivalent).
  5. EasyPhos protocol: bind to TiO₂ in 80% ACN / 5% TFA / 1 M glycolic acid; wash; elute with 40% ACN / 15% NH₄OH.
  6. Re-acidify; C18 desalt.
  7. LC-MS/MS: Exploris 480 with TopN 12, HCD NCE 27, 50k MS2 resolution.
  8. MaxQuant phospho (STY) search; site localization probability >0.75 (class I sites); FDR 1%.
  9. ~15,000-25,000 phosphosites quantified from 5 mg cell lysate.

TMT 18-plex workflow

  1. Process 18 samples in parallel through digestion (S-Trap or in-solution).
  2. ~100 µg peptide per sample; quantify by BCA on peptide.
  3. Label each sample with one TMTpro 18-plex channel (Pierce); 1 h RT; quench with hydroxylamine.
  4. Combine 18 samples 1:1; desalt; high-pH RP fractionate into 24-48 fractions.
  5. LC-MS/MS each fraction: Eclipse Tribrid with SPS-MS3 method; HCD NCE 36 MS2, HCD NCE 65 MS3, top-10 SPS notches.
  6. Proteome Discoverer with Sequest HT + Mascot + Percolator; TMT reporter quantification at MS3.
  7. ~9000 proteins quantified across 18 samples; CV 5-15% biological replicates.

Sample preparation deep dive

The single biggest determinant of proteomic data quality is sample prep — not the MS instrument. Modern best practices:

Lysis buffer choice

  • 5% SDS + 50 mM TEAB. Maximum protein extraction; SDS-tolerant downstream methods needed (S-Trap suspension trapping, SP3).
  • 8 M urea + 50 mM TEAB / TFE. Chaotropic; dilute to ~4 M urea for Lys-C and to ≤2 M before adding trypsin.
  • GdnHCl 6 M. Stronger chaotrope; harder to remove than urea; common for membrane proteins.
  • RIPA, NP-40, Triton. Detergent lysis for membrane/organelle prep; detergent removal essential before MS.
  • MeOH-CHCl₃-water (Bligh-Dyer modified). Whole-cell precipitation; preserves PTMs; good for lipid-rich samples.

Detergent removal

  • SP3. Hughes-Krijgsveld 2014, 2019. Carboxylate magnetic beads bind protein under aqueous-organic conditions; wash off detergent; on-bead digest. Streamlined and robust.
  • S-Trap (Protifi). SDS + acid + MeOH → trapped on quartz filter; wash; on-membrane digest. Tolerates 5% SDS input.
  • iST / PreOmics. All-in-one cartridge; spin-column based; minimal pipetting.
  • Acetone precipitation. Classical; 5× volume cold acetone overnight. Time-consuming but cheap.
  • MeOH-CHCl₃ precipitation. Wessel-Flügge 1984; works for small volumes but loses some proteins.

Tip and column conditioning

C18 desalt: Pierce PepClean, Agilent OMIX, Phenomenex Strata-X, Waters Sep-Pak μElution. Wet → equilibrate (0.1% TFA in water) → load → wash → elute (60-80% MeCN/0.1% TFA) → dry. Critical for removing salts that suppress ESI.

Common contaminants

  • Keratins. From skin, dust, lab coats. Mitigate with blank gel lanes, gloves, and a laminar-flow hood; filter residual hits with the cRAP database.
  • Polymers (PEG, Triton, polyglycols). From detergents, plasticware. Wash glassware with MeCN.
  • Trypsin self-digestion peptides. Inevitable; filtered post-search.
  • Streptavidin from biotin pulldowns. Cleavage products.
  • BSA from cell culture media or BSA-bead carryover.

LC method optimization

Gradient design

Steeper gradient = faster but fewer IDs; shallower = deeper but slower. Typical:

  • 30-min gradient — screening, single-shot QC; ~3500-5000 proteins.
  • 60-min — single-shot deep proteome; ~6000-8000 proteins.
  • 90-120 min — DDA single-shot best; ~8000-10000 proteins.
  • 3-h gradient + fractionation (high-pH RP into 6-12 fractions) — phosphoproteomics with deep coverage.

Column: 25-75 cm × 75 µm i.d. fused-silica with C18 1.6-1.9 µm particles (IonOpticks Aurora, Thermo PepMap, Waters HSS T3). Pulled-tip emitter integrated. Column temperature 50-60 °C for sharper peaks.

Flow rate 200-400 nL/min for nano; 1-5 µL/min for micro-flow (higher robustness; ~2× sensitivity loss). EvoSep One: pre-formed gradient on disposable tips → 30, 60, 100, 200 samples-per-day workflows; high reproducibility but lower depth than custom gradient.
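
The samples-per-day (SPD) figures translate directly into a per-sample cycle time and into instrument-days for a cohort. A minimal arithmetic sketch, assuming the instrument runs around the clock with no QC injections, blanks, or downtime (real throughput will be lower):

```python
MINUTES_PER_DAY = 24 * 60


def cycle_minutes(samples_per_day):
    """Total time budget per sample (gradient plus overhead) for a given SPD method."""
    return MINUTES_PER_DAY / samples_per_day


def instrument_days(n_samples, samples_per_day):
    """Days of continuous acquisition needed for a cohort (idealized)."""
    return n_samples / samples_per_day


methods = (30, 60, 100, 200)
cycles = {spd: cycle_minutes(spd) for spd in methods}
for spd in methods:
    print(f"{spd} SPD: {cycles[spd]:.1f} min per sample, "
          f"{instrument_days(600, spd):.1f} days for 600 samples")
```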

Two-column setup

Loading on a short trap (3 cm × 100 µm; 3 µm C18) at high flow (5-10 µL/min) → desalt → backflow elute onto analytical column at nano flow. Reduces analysis time and protects analytical column.

Direct injection (single-column)

Skips trap; injects directly onto analytical column. Faster cycle; better peak shape if loading conditions optimized.

Data analysis pipelines in detail

FragPipe (MSFragger)

Open-source pipeline:

  1. Convert RAW → mzML (msconvert, ProteoWizard) or use directly.
  2. MSFragger search with target-decoy.
  3. PeptideProphet / iProphet for re-scoring.
  4. ProteinProphet for protein-level FDR.
  5. IonQuant for label-free quant or TMT-Integrator for TMT.
  6. Philosopher FDR filtering and report output to TSV/Excel.

Strengths: speed (5-10× MaxQuant), advanced features (open-search PTM discovery, immunopeptidomics, glycoproteomics with FragPipe-Glyco), DIA support.
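
The target-decoy idea behind steps 2-4 can be illustrated with a plain q-value calculation. This is a simplified sketch and not what PeptideProphet or ProteinProphet compute internally (those fit mixture models). It uses the basic decoys/targets estimate on hypothetical PSM scores, and the function name is illustrative.

```python
def qvalues(psms):
    """psms: list of (score, is_decoy). Returns (score, is_decoy, q) sorted by score, best first."""
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    fdrs = []
    targets = decoys = 0
    for _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Estimated FDR among targets accepted at this score threshold.
        fdrs.append(decoys / max(targets, 1))
    # q-value: the lowest FDR at which this PSM would still be accepted.
    q = []
    running_min = float("inf")
    for fdr in reversed(fdrs):
        running_min = min(running_min, fdr)
        q.append(running_min)
    q.reverse()
    return [(score, is_decoy, qv) for (score, is_decoy), qv in zip(ranked, q)]


# Hypothetical search scores; True marks a decoy hit.
psms = [(9.0, False), (8.5, False), (8.0, False), (7.0, True),
        (6.5, False), (6.0, False), (5.0, True), (4.0, False)]

results = qvalues(psms)
accepted = [score for score, is_decoy, q in results if not is_decoy and q <= 0.01]
print(accepted)
```

With only eight PSMs the estimate is coarse; on real data with tens of thousands of PSMs the same logic gives a smooth score cutoff for the 1% FDR threshold.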

MaxQuant

Cox-Mann 2008+. Powerful default but slow (overnight for 20-30 raw files on 12-core workstation). Andromeda search; protein groups; LFQ with delayed normalization; many statistical helpers in Perseus downstream tool. Still standard for SILAC and many TMT labs.

Spectronaut

Biognosys commercial. Excellent UX; directDIA (library-free); deep-learning Pulsar predicted libraries; cross-run alignment. Industrial / clinical standard for DIA.

DIA-NN

Free for academic use (source code published for releases through 1.8.1). ML-trained scoring (RT, ion mobility, MS2 intensity prediction). DIA-NN 1.8+ supports timsTOF PASEF-DIA, Astral, multiple TMT-DIA variants. By 2024 effectively the field default for DIA processing in academic labs.

Skyline

MacCoss UW. Open-source. Originally for SRM/PRM; now also DIA. Used for targeted assay development, biomarker verification, ultra-precise quant of small panels (e.g., 20-100 proteins across hundreds of clinical samples).

Statistical post-processing

  • Perseus (Cox-Mann). t-tests, ANOVA, PCA, hierarchical clustering, GO/KEGG enrichment.
  • MSstats (Choi-Vitek). Linear-mixed-effects for cross-condition comparisons; gold standard for label-free DIA.
  • limma. Borrowed from microarray analysis; empirical Bayes moderated t-tests applied to protein-level intensities.
  • DEqMS (Zhu-Lehtiö). Builds on limma; models protein variance as a function of the number of peptides/PSMs used for quantification.
  • R/Bioconductor (DEP, prolfqua, QFeatures).
  • Python (pyOpenMS, AlphaPeptStats).

Imaging mass spectrometry

MALDI imaging

Spatial mass spec on tissue. Sections (10 µm cryostat) + matrix (CHCA, DHB, SA — sprayed via HTX TM-Sprayer or sublimation) → MALDI laser raster → image of m/z vs (x,y).

Resolution: ~10-50 µm (limited by laser spot, matrix crystal size). Bruker rapifleX TissueTyper, timsTOF flex, Waters MALDI Synapt. Software: SCiLS Lab (Bruker), HD-Imaging (Waters), Cardinal MSI (R).

Applications: drug distribution in tissue, tumor margins (lipidomic distinction), spatial metabolomics, peptide imaging (with on-tissue digestion).
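
An ion image is simply the summed intensity inside one m/z window, plotted at each raster position. The sketch below builds such an image from a toy dataset. The pixel spectra, the target m/z, and the tolerance are hypothetical, and the function is illustrative rather than part of SCiLS Lab or Cardinal.

```python
def ion_image(pixels, target_mz, tol, width, height):
    """Return a height x width grid of summed intensity within target_mz +/- tol."""
    image = [[0.0] * width for _ in range(height)]
    for x, y, spectrum in pixels:
        image[y][x] = sum(i for mz, i in spectrum if abs(mz - target_mz) <= tol)
    return image


# Hypothetical 2 x 2 raster: (x, y, [(m/z, intensity), ...]) per laser position.
pixels = [
    (0, 0, [(760.58, 120.0), (798.54, 30.0)]),
    (1, 0, [(760.59, 80.0)]),
    (0, 1, [(798.54, 200.0)]),
    (1, 1, [(760.57, 10.0), (760.60, 5.0)]),
]

img = ion_image(pixels, target_mz=760.585, tol=0.02, width=2, height=2)
for row in img:
    print(row)
```

Repeating this for every m/z of interest gives the stack of images that imaging software overlays on the histology.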

DESI-MS imaging

Ambient ionization with desorbing solvent spray. Lower spatial resolution (~50-100 µm) but no matrix needed. The related ambient technique REIMS (rapid evaporative ionization MS) underlies the iKnife (Takats, Imperial): surgical mass spec for real-time tumor margin identification during oncology surgery.

MIBI, IMC (imaging mass cytometry)

Antibody-based; metal-tagged antibodies; ablation by ion beam (MIBI) or laser (IMC). ~40-100 markers per tissue section at ~1-µm resolution. Spatial proteomics at the antibody-resolution level (vs MALDI’s untargeted ~10 µm).

Proteogenomics

Integrate genomics (variant calling), transcriptomics (isoform inference), and proteomics. Custom protein database from RNA-seq translated ORFs → search MS data → identify peptides corresponding to single-amino-acid variants (SAAVs), novel splice junctions, fusion proteins, neoantigens.

CPTAC (Clinical Proteomic Tumor Analysis Consortium): multi-cancer proteogenomic atlases. See genetics-and-genomics for genome-level methodology.

Neoantigen discovery

Cancer-specific somatic mutations create neopeptides presented on MHC. Workflow:

  1. Tumor + normal exome → call somatic SNVs.
  2. RNA-seq → confirm expression.
  3. MHC binding prediction (NetMHCpan, MHCflurry).
  4. MS validation — immunopeptidomics: HLA-IP (antibody pulldown of MHC complexes from tumor lysate) → MS identification of presented peptides.
  5. Cross-reference predicted neoantigens with MS-confirmed presented peptides.
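
Step 5 is essentially a set intersection between candidate mutant peptides and the immunopeptidomics identifications. A minimal sketch, assuming a hypothetical protein sequence, a hypothetical point mutation, and a hypothetical MS peptide list. HLA binding prediction (step 3) is skipped here, so every 8-11-mer that spans the mutation counts as a candidate.

```python
def mutant_kmers(protein, pos, alt, lengths=(8, 9, 10, 11)):
    """All k-mers of the mutated protein that include the substituted residue (0-based pos)."""
    mutated = protein[:pos] + alt + protein[pos + 1:]
    kmers = set()
    for k in lengths:
        first = max(0, pos - k + 1)
        last = min(pos, len(mutated) - k)
        for start in range(first, last + 1):
            kmers.add(mutated[start:start + k])
    return kmers


# Hypothetical protein and somatic substitution (Q -> L at 0-based position 10).
protein = "MKTAYIAKQRQISFVKSHFSRQ"
candidates = mutant_kmers(protein, pos=10, alt="L")

# Hypothetical peptides identified by HLA-IP + MS.
ms_identified = {"AKQRLISFV", "YIAKQRQIS", "SFVKSHFSR"}

confirmed = candidates & ms_identified
print(confirmed)
```

Only the peptide carrying the mutant residue survives. The wild-type peptide over the same region and the peptide that does not span the site are both excluded.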

Companies: Gritstone bio, BioNTech (mRNA cancer vaccine pipeline), Moderna (mRNA-4157, partner Merck), NEC/Vaximm, Genocea (closed), Personalis (informatics).

Quality control and reproducibility

Internal standards

  • iRT peptides (Biognosys). 11 synthetic peptides spanning RT range; spike-in for retention-time normalization across runs and labs.
  • PROCAL. 40-peptide retention-time + intensity standard.
  • Pierce HeLa standard. Whole-cell digest reference for column / instrument QC.

Repeated injections

Inject HeLa standard at start of week, every ~24 samples, and at end. Track ID counts, RT drift, MS1 mass accuracy, peak shape. CV across runs <10% for top half of intensity range.
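
The CV criterion can be checked per protein across the repeated QC injections. A minimal sketch with hypothetical intensities from four HeLa QC runs (the 10% threshold is the one quoted above):

```python
from statistics import mean, stdev


def percent_cv(values):
    """Coefficient of variation in percent (sample standard deviation / mean)."""
    return 100.0 * stdev(values) / mean(values)


# Hypothetical intensities of two proteins across four QC injections.
qc_runs = {
    "ProteinA": [1.00e8, 1.05e8, 0.97e8, 1.02e8],
    "ProteinB": [2.0e6, 2.9e6, 1.6e6, 2.4e6],
}

cvs = {name: percent_cv(values) for name, values in qc_runs.items()}
flagged = [name for name, cv in cvs.items() if cv >= 10.0]

for name, cv in cvs.items():
    print(f"{name}: CV = {cv:.1f}%")
print("Above 10% CV:", flagged)
```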

Carryover

Wash blanks between samples. Carryover from very abundant peptides (BSA, mAb candidates) can persist 3-5 blanks; monitor with control mAb. Wash the trap and analytical columns independently.

Cross-lab reproducibility

ABRF sPRG (Association of Biomolecular Resource Facilities Proteomics Standards Research Group) annual studies; ProteomeXchange-coordinated multi-lab benchmarks; HUPO Proteomics Standards Initiative.

Affinity-based proteomics platforms — extended

Olink Proximity Extension Assay (PEA)

Pairs of antibodies conjugated to complementary single-stranded DNA tags; dual binding brings the tags into hybridization range → polymerase extension + amplification, read out by qPCR (Olink Target 96) or NGS (Olink Explore 3072, Explore HT).

Coverage: Target 96 panels (92 markers each); Explore 1536 (1463 markers); Explore HT (5416 markers in 4 panels, NGS readout, single 96-well plate processes 88 samples + 8 controls).

Sensitivity: ~ng/mL to fg/mL range across analytes. Dynamic range ~10⁹.

Reproducibility: CV 5-15% across plate; high inter-lab concordance demonstrated in UK Biobank PPP rollout.

Limitations: requires antibody pair for each analyte → discovery limited by antibody library; many proteins have only one good antibody.

SomaLogic SomaScan

SOMAmers (Slow Off-rate Modified Aptamers): chemically modified ssDNA aptamers carrying hydrophobic side chains at the 5-position of deoxyuridine, which give them an antibody-like contact surface. ~7000 proteins in SomaScan v4.1; ~11,000 in the 11K assay (v5.0). Workflow:

  1. Incubate the SOMAmer mix with plasma so each SOMAmer binds its target protein.
  2. Capture protein-SOMAmer complexes on streptavidin beads.
  3. Photocleave biotin linker; elute SOMAmer.
  4. Hybridize to printed array or NGS readout.

Pros: huge coverage, single-platform. Cons: SOMAmer cross-reactivity sometimes higher than mAb pair; “noise floor” issues for very-low-abundance analytes.

Comparison

UK Biobank rolled out Olink Explore 3072 for ~50k participants by 2023 (Sun-Surendran et al. 2023 Nature). deCODE (Iceland) cohorts use SomaScan. Cross-platform comparisons (Pietzner-Wheeler-Langenberg 2021 Science) show modest concordance (~50-70%) for high-abundance analytes and lower concordance for low-abundance ones, which drives demand for ground-truth MS comparator panels.

Other affinity platforms

  • Alamar NULISA. Nucleic acid linked immuno-sandwich assay; competing with Olink.
  • Quanterix Simoa. Single-molecule array; ultra-high sensitivity (fg/mL); per-analyte assay (low multiplex).
  • MSD (Meso Scale Discovery). Electrochemiluminescence; multiplexed; clinical assays.
  • Luminex xMAP, Bio-Plex. Bead-based multiplex immunoassay; mid-multiplex (<100).
  • NanoString nCounter, GeoMx, CosMx. Hybridization-based; multiplexed; spatial.

Targeted assays and PRM panels

PRM panel development workflow

  1. Define target proteins (~20-200).
  2. In silico tryptic digest; pick 2-5 proteotypic peptides per protein (unique to target, no PTM site, no missed cleavage, no Met/Cys when possible, 8-25 aa).
  3. Spike in heavy-labeled synthetic peptide standards (Sigma AQUA, JPT SpikeTides).
  4. Develop scheduling and dwell time on Orbitrap or QQQ.
  5. Validate on QC pools; verify linearity, LLOQ, ULOQ, intra/inter-run CV.
  6. FDA bioanalytical method validation for clinical use: accuracy ±15% (20% at LLOQ), precision <15%.
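
Step 2 can be prototyped in a few lines before moving to Skyline or a similar tool. The sketch below performs an in silico tryptic digest (cleave after K/R, not before P, no missed cleavages) and applies the length, Met/Cys, and uniqueness filters from the list above. The target sequence and the background proteome are hypothetical. PTM-site filtering is omitted, and uniqueness is a plain substring check that ignores I/L ambiguity.

```python
import re


def tryptic_digest(seq):
    """Cleave C-terminal to K or R unless the next residue is P; no missed cleavages."""
    return [pep for pep in re.split(r"(?<=[KR])(?!P)", seq) if pep]


def proteotypic_candidates(seq, background, min_len=8, max_len=25):
    """Keep peptides of suitable length, without Met/Cys, absent from background proteins."""
    keep = []
    for pep in tryptic_digest(seq):
        if not min_len <= len(pep) <= max_len:
            continue
        if "M" in pep or "C" in pep:
            continue
        if any(pep in other for other in background):
            continue
        keep.append(pep)
    return keep


# Hypothetical target protein and background proteome.
target = "MSTKPEELGAVDLSNYTFRAGHCLLMEVKDPETLNQWAIYGSEVDRK"
background = ["MKKLLPTAAAGRSEVDRNNYTFK"]

peptides = tryptic_digest(target)
candidates = proteotypic_candidates(target, background)
print(peptides)
print(candidates)
```

Note that the K in "MSTKPEE..." is not cleaved because it precedes proline, so that peptide is not a missed cleavage. It is still rejected here because it contains Met.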

Clinical PRM examples

Apolipoproteins (ApoA-I, ApoB, ApoC3, ApoE) — cardiovascular risk; LDL-cholesterol estimation by isoform-resolved ApoB. SISCAPA (stable-isotope standards capture by anti-peptide antibody) — Anderson 2004; enriches target peptides before LC-MS for ultra-low-abundance plasma proteins.

Distinguishing from immunoassay-based clinical chemistry

MS-based targeted assays (PRM, SRM) report peptide-level signal, immune to many immunoassay interferences (HAMA, heterophile antibodies, biotin interference). Increasingly preferred in clinical-chemistry-grade plasma assays — Mayo Clinic, ARUP, Quest, Labcorp run dozens of MS-LDTs.

Software ecosystem

File format zoo

  • Raw vendor format — .raw (Thermo), .wiff (Sciex), .d (Bruker, Agilent), .lcd (Shimadzu).
  • Open standards — mzML, mzXML (deprecated), mzMLb (compressed; HDF5-backed), Mascot Generic Format (MGF), mzTab (results), mzIdentML (identification).
  • Spectral libraries — NIST .msp, BiblioSpec .blib, OpenSWATH .pqp, OpenMS .sqMass.
  • Open community pipelines. Snakemake/Nextflow-wrapped MaxQuant/FragPipe; nf-core/proteomicslfq, nf-core/quantms.

Cloud platforms

  • Genentech/Roche proteomic platforms (internal).
  • Vermont Proteomics, Indiana Proteomics, Broad Proteomics (academic core facilities with cloud-deployable pipelines).
  • AWS S3-backed PRIDE submissions — modern data-deposition rebuild.
  • DIA-NN-PIB, AlphaPept on cloud — emerging compute-as-a-service.

AI in proteomics

  • Prosit, AlphaPeptDeep. Peptide MS2 spectrum and retention-time prediction → in silico DIA library generation without empirical DDA runs.
  • DIA-NN ML-scoring — neural-net classifier on RT, ion-mobility, MS2.
  • PEPNet, DeepNovo, Casanovo (Bittremieux), InstaNovo — de novo peptide sequencing from MS2 without database; growing importance for non-model organisms and immunopeptidomics.
  • AlphaFold for proteomics. Inference of complex-partner identity, PTM site context, drug-binding-pocket priors.

Major proteome atlas projects

Human Proteome Project (HUPO)

Global effort to catalog every human gene product. As of 2024, ~93% of ~20,000 canonical human proteins have at least one strong MS-detected peptide. The last few percent (short ORFs, very-low-abundance proteins, tissue-restricted proteins) remain “missing.” neXtProt, Human Protein Atlas (Uhlén-KTH; tissue immunohistochemistry + MS quant), and ProteomicsDB (Kuster-TUM) coordinate the catalog.

ProteomeXchange Consortium

PRIDE, MassIVE, jPOST, iProX, Panorama Public — federated repository network. ~30,000 datasets total by 2024. Every published proteomics study now expected to submit raw data + metadata.

CPTAC pan-cancer

Coordinated proteogenomics of TCGA cancer cohorts (breast, ovarian, colon, glioma, lung adenocarcinoma, lung squamous, endometrial, kidney, head and neck, pediatric, pancreatic) — public datasets enabling tumor-classification, drug-target identification, neoantigen catalog work.

Pan-tissue and pan-species

GTEx-Pro (tissue proteomics from GTEx donor tissues), Tabula Sapiens proteomics (single-cell-resolved), Mouse Cell Atlas proteomics, model-organism efforts in yeast (Aebersold-Wenger-Mann), C. elegans, Drosophila, zebrafish.

Future directions

Single-cell DIA on Astral and timsTOF SCP

3000-5000 proteins per cell with 5-min gradient injected directly (no carrier). Throughput ~10k cells per week per instrument.

Spatial proteomics integration

Combine MALDI imaging, IMC, CODEX, MIBI with single-cell-resolved spatial proteomics; integration with spatial transcriptomics (Visium, Slide-seq, Stereo-seq).

Plasma cell-free proteomics

Cell-of-origin assignment from plasma protein composition (cancer biomarker discovery; pregnancy maturation; transplant rejection). Combine with cfDNA methylation for full liquid-biopsy panel.

MS + structural biology integration

Cross-linking MS + cryo-EM + AI structure prediction → integrative models of macromolecular machines. ColabFold, AlphaFold-Multimer + crosslink restraints; XL-MS + native MS for stoichiometry confirmation.

Multi-modal omics

Integrate proteomics + transcriptomics + metabolomics + lipidomics + epigenomics on the same sample. Single-cell CITE-seq (RNA + ~100 surface proteins via DNA-tagged antibodies) is the consumer-end of this; mass-spec-based multi-omics for deeper proteome layers is emerging.

Common pitfalls and how to avoid them

Statistical pitfalls

  • Multiple-testing correction. Bonferroni too conservative; Benjamini-Hochberg FDR (typically 5%) standard.
  • Batch effects. Plate position, instrument run order, processing date — all confounders. Use ComBat, Harmony, limma’s removeBatchEffect; randomize sample order; include batch QC.
  • Inadequate replication. N=3 biological replicates minimum for any quantitative claim; N=5+ for clinical comparisons. Power calculations recommended.
  • Dispersion estimation. With small N, variance is poorly estimated; empirical Bayes (limma, MSstats) borrows information across proteins.
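
The difference between Bonferroni and Benjamini-Hochberg is easy to see on a handful of p-values. A minimal sketch of the BH step-up adjustment, with hypothetical p-values for five proteins:

```python
def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def bonferroni(pvalues):
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


# Hypothetical per-protein p-values.
pvals = [0.001, 0.008, 0.012, 0.041, 0.20]

bh = benjamini_hochberg(pvals)
bonf = bonferroni(pvals)
n_bh = sum(p <= 0.05 for p in bh)
n_bonf = sum(p <= 0.05 for p in bonf)
print("BH-adjusted:", [round(p, 4) for p in bh])
print("Significant at 5%: BH =", n_bh, ", Bonferroni =", n_bonf)
```

On a real experiment with thousands of proteins the gap between the two corrections is far larger than in this five-protein toy case.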

Workflow pitfalls

  • Trypsin autolysis. Trypsin self-digest peptides appear at known m/z (T1: 842.51, T2: 906.50, T3: 1006.49) and contaminate every run.
  • Keratin. From skin, hair, dust. Use cRAP database to filter; gloves + clean bench essential.
  • Wrong reference proteome. Mouse samples searched against human, isoform-aware searches against canonical-only, novel-organism samples against poorly-annotated databases.
  • Sloppy peptide search settings. Too-narrow precursor tolerance (miss matches), too-permissive PTM variable mods (combinatorial explosion + FDR violation).
  • Carryover. Between-sample washes essential; HeLa carryover into clinical sample contaminates results.
  • Incorrect TMT channel assignment. Mislabeling a 16-plex destroys the experiment; orthogonal validation of channel-to-sample mapping mandatory.
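
Known contaminant masses such as the trypsin autolysis peaks listed above can be flagged with a simple tolerance match. A minimal sketch, assuming singly charged ions and using the two-decimal reference values from this note. The observed peak list and the 0.02 Da tolerance are hypothetical.

```python
# Reference m/z values as quoted in the text above (assumed singly charged).
AUTOLYSIS_MZ = {"T1": 842.51, "T2": 906.50, "T3": 1006.49}


def flag_autolysis(observed_mz, tol_da=0.02):
    """Map each observed m/z to the autolysis peptide it matches within tol_da."""
    hits = {}
    for mz in observed_mz:
        for name, ref in AUTOLYSIS_MZ.items():
            if abs(mz - ref) <= tol_da:
                hits[mz] = name
    return hits


# Hypothetical observed precursor m/z values.
observed = [842.5094, 655.37, 1006.4874, 906.55]
hits = flag_autolysis(observed)
print(hits)
```

The same pattern shows why tolerance settings matter: 906.55 is not flagged at 0.02 Da but would be at 0.1 Da.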

Reporting pitfalls

  • Reporting “fold change” without uncertainty. A fold change of 2.0 with CI [1.95, 2.05] differs from 2.0 with CI [0.5, 8.0].
  • Failing to deposit raw data. Required by most journals via ProteomeXchange.
  • Incomplete metadata. Specify instrument, gradient, search engine, version, parameter set, database, and FDR threshold.
  • Cherry-picking proteins. Validate selected hits orthogonally (Western, PRM, immunohistochemistry) before drawing biological conclusions.
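
The fold-change point can be made concrete by computing a confidence interval on the log2 scale and converting it back. A minimal sketch, assuming equal variances, three replicates per group (so 4 degrees of freedom), and hypothetical log2 intensities. The critical value is hard-coded because the standard library has no Student's t quantile function.

```python
from math import sqrt
from statistics import mean, variance

T_CRIT_DF4 = 2.776  # two-sided 95% critical value of Student's t, 4 degrees of freedom


def log2fc_ci(control, treated, t_crit):
    """Return (log2 fold change, CI lower, CI upper) using a pooled-variance t interval."""
    na, nb = len(control), len(treated)
    diff = mean(treated) - mean(control)
    pooled = ((na - 1) * variance(control) + (nb - 1) * variance(treated)) / (na + nb - 2)
    se = sqrt(pooled * (1 / na + 1 / nb))
    return diff, diff - t_crit * se, diff + t_crit * se


# Two hypothetical proteins with the same 2-fold point estimate (log2 FC = 1).
tight = log2fc_ci([20.0, 20.1, 19.9], [21.0, 21.1, 20.9], T_CRIT_DF4)
noisy = log2fc_ci([19.0, 20.0, 21.0], [20.0, 21.0, 22.0], T_CRIT_DF4)

for label, (fc, lo, hi) in (("tight", tight), ("noisy", noisy)):
    print(f"{label}: fold change {2 ** fc:.2f}, 95% CI [{2 ** lo:.2f}, {2 ** hi:.2f}]")
```

Both proteins report the same fold change, but only the first interval excludes 1 (no change). In practice a moderated statistic from limma or MSstats would replace the plain t interval.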

Mass spec for drug discovery

Mechanism of action (MoA)

CETSA-MS (Cellular Thermal Shift Assay; Molina-Nordlund 2013 Science) — drug-bound target shifts thermal-denaturation curve; whole-proteome readout via TMT-MS. Drives target ID for phenotypic-screening hits. Pelago Bioscience commercial.

Drug-target deconvolution

ABPP — activity-based protein profiling (Cravatt). Active-site-directed warhead with click-handle (alkyne/azide); pull down labeled proteins after drug treatment; identify by MS. Used for serine hydrolases, kinases, deubiquitinases, metalloproteases.

Selectivity profiling

Kinobeads (Cellzome, acquired by GSK): immobilized broad-spectrum kinase inhibitor cocktail; profile a compound’s kinase selectivity by competition. ~250+ kinase coverage per run. Standard for kinase-drug development. KiNativ (ActivX) is a related approach that uses ATP acyl-phosphate probes instead of immobilized inhibitors.

Off-target identification

Photo-affinity labeling (PAL) — photo-crosslinker drug analog + UV → covalently captures binding partners → MS identification.

Pharmacokinetics and metabolites

LC-MS triple-quad SRM in plasma, urine, tissue extracts — gold standard for PK/PD. Metabolite ID — Q-TOF or Orbitrap high-res for unknown identification; SCIEX, Thermo, Agilent platforms in every pharma DMPK lab.

Future outlook for proteomics 2026-2030

  • Single-cell proteomics at scale. ~10k cells / week per instrument with Astral/timsTOF SCP; biological-replicate-level statistics in clinical proteomics.
  • Plasma deep-proteome routine. Olink + SomaScan + MS-PRM converge on a “plasma reference panel” of ~5000-10,000 routinely-measurable proteins across population cohorts.
  • Spatial proteomics. MALDI imaging + tissue-section MS at ~5-10 µm resolution; integration with H&E histology and IHC.
  • Multi-omics integration. Single-cell RNA + protein + lipid combined readouts.
  • AI in spectral interpretation. De novo peptide sequencing (Casanovo, InstaNovo); ML-augmented database search; PTM-discovery automated.
  • Cross-lab standardization. ProteomeXchange-coordinated benchmarks; FDA recognition of MS-based clinical biomarkers; CAP, CLIA accreditation streamlining.
  • Therapeutic protein QC by intact-mass and native-MS. Routine in biopharma; expanding to gene-therapy AAV and mRNA-LNP characterization.

The field is maturing from a discovery-only methodology into a fully integrated clinical, industrial, and basic-research analytical platform. The 2030s will likely see proteomics complement genomics as a population-scale measurable phenotype.


Further reading

  • Aebersold, R., Mann, M. — “Mass-spectrometric exploration of proteome structure and function” Nature 2016, 537:347 — modern survey of the field.
  • Cox, J., Mann, M. — “MaxQuant enables high peptide identification rates, individualized p.p.b.-range mass accuracies and proteome-wide protein quantification” Nat Biotechnol 2008, 26:1367 — the MaxQuant paper.
  • Olsen, J.V., Mann, M. — “Status of large-scale analysis of post-translational modifications by mass spectrometry” Mol Cell Proteomics 2013, 12:3444.
  • Meier, F., Brunner, A.-D., Koch, S., Koch, H., Lubeck, M., Krause, M., Goedecke, N., Decker, J., Kosinski, T., Park, M.A., Mann, M. — “Online parallel accumulation-serial fragmentation (PASEF) with a novel trapped ion mobility mass spectrometer” Mol Cell Proteomics 2018, 17:2534.
  • Demichev, V., Messner, C.B., Vernardis, S.I., Lilley, K.S., Ralser, M. — “DIA-NN: neural networks and interference correction enable deep proteome coverage in high throughput” Nat Methods 2020, 17:41.
  • Slavov, N. — “Single-cell protein analysis by mass spectrometry” Curr Opin Chem Biol 2021, 60:1.
  • Sinz, A. — “Cross-linking/mass spectrometry: a chemical and biological tool box for protein interaction studies” Anal Bioanal Chem 2018, 410:5995.