Proteomics and Mass Spectrometry — Deep Reference
A Tier 2 deep-dive into the instrumentation, acquisition modes, quantification chemistries, and software pipelines that define modern proteomic mass spectrometry. Complements proteomics-metabolomics-and-computational-neuroscience (the broader omics-and-comp-neuro bundle) — this note assumes you have the overview and goes hands-on with instrument hardware, fragmentation chemistry, isobaric labels, DIA acquisition, post-translational modifications (PTMs), and crosslinking-MS workflows. Where the parent note maps the landscape, this one drills into the engineering of each method.
See also
- proteomics-metabolomics-and-computational-neuroscience
- structural-biology
- cell-molecular-biology
- genetics-and-genomics
- immunology-foundations
- protein-families-and-drug-targets
- pathway-database-and-bioinformatics-resources
- cell-lines-and-antibody-catalog
Bottom-up, top-down, middle-down
Bottom-up
The dominant proteomics workflow since ~2000. Pipeline:
- Lysis in chaotropic buffer (urea 8 M, GdnHCl 6 M, SDS 2-4% for tough samples) + protease inhibitors.
- Reduction of disulfides — DTT (5-10 mM, 56 °C, 30-60 min) or TCEP (5 mM, room T, 15 min).
- Alkylation of cysteines — iodoacetamide (15-50 mM, dark, 30 min, room T) → carbamidomethyl Cys (+57.02 Da). Avoid IAM > 60 min — overalkylation of Lys, His, Met. Alternatives: chloroacetamide (lower over-alkylation), N-ethylmaleimide, MMTS.
- Digestion by sequence-specific protease:
- Trypsin (porcine, bovine; Promega Gold/MS Grade, Pierce, Roche) — cleaves C-terminal to K, R (not before P). Most common; produces ~7-25 aa peptides with C-terminal basic residue → favorable +2/+3 ionization. Lys-C + trypsin double-digest (Lys-C in 8 M urea first, then dilute to 2 M urea for trypsin) reduces missed cleavages and so raises the yield of fully cleaved peptides.
- Lys-C — C-terminal to K only; complements trypsin.
- Glu-C (V8) — C-terminal to E, D (in phosphate buffer) or E only (bicarbonate).
- Asp-N — N-terminal to D (and sometimes E).
- Chymotrypsin — C-terminal to aromatics F, W, Y, large hydrophobics L, M.
- Arg-C — C-terminal to R only.
- Elastase, Proteinase K — broad-specificity for top-down complement.
- Trypsin/P — not a separate enzyme but the search-engine cleavage rule that allows cleavage after K/R even when the next residue is P (the strict trypsin rule forbids it).
- Sample cleanup. SP3 (Hughes-Krijgsveld 2014; carboxylate-coated magnetic beads), S-Trap (Protifi; SDS-tolerant), in-StageTip (iST, PreOmics), FASP (filter-aided sample prep; Wisniewski 2009; once standard but largely replaced by SP3/iST).
- Peptide-level fractionation (optional, for deep proteomes). High-pH reversed phase, SCX (strong cation exchange), HILIC.
- LC-MS/MS. Nano-flow (~300 nL/min) or micro-flow (1-5 µL/min) UHPLC; reversed-phase C18 column (15-75 cm long; 1.7-3 µm particles).
- Database search + statistical post-processing.
Pros: high throughput, broad proteome coverage (~10k proteins / human cell line), robust quantification. Cons: peptide-level information loses isoform/PTM context; protein inference is statistical (razor/unique peptides).
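The trypsin rule described above (cleave C-terminal to K or R, but not before P) can be sketched as an in-silico digest. This is a minimal illustration, not a search-engine implementation; the function name, the length limits, and the toy sequence are assumptions chosen for the example.

```python
import re

def trypsin_digest(sequence, missed_cleavages=0, min_len=7, max_len=25):
    """Return tryptic peptides: cleave after K/R unless the next residue is P."""
    # Zero-width split: position is preceded by K or R and not followed by P.
    pieces = [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]
    peptides = []
    for i in range(len(pieces)):
        # Join up to `missed_cleavages` additional neighbouring pieces.
        last = min(i + missed_cleavages, len(pieces) - 1)
        for j in range(i, last + 1):
            pep = "".join(pieces[i:j + 1])
            if min_len <= len(pep) <= max_len:
                peptides.append(pep)
    return peptides

# Toy sequence (hypothetical): the K-P bond is left intact.
print(trypsin_digest("AAAAAAKPGGGGGGRCCCCCCCKDDDDDDD"))
```

Setting `missed_cleavages=1` also returns the peptides that span one uncleaved site, which is how search engines typically account for incomplete digestion.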
Top-down
Direct analysis of intact proteins, no proteolysis. Requires:
- Ionization. ESI (denaturing, or native i.e. non-denaturing) or MALDI.
- High-resolution MS. Orbitrap (Eclipse, Exploris, Astral), FT-ICR (Bruker solariX up to 15 T; 21 T instruments at NHMFL and PNNL). FTMS gives isotope-resolved peaks even for 50-kDa intact proteins; <2 ppm mass accuracy.
- Fragmentation. ETD (electron transfer dissociation) and EThcD (ETD + HCD supplemental activation) preserve labile PTMs and give c/z ions for backbone coverage. UVPD (193 nm ultraviolet photodissociation; Brodbelt UT-Austin) — broad sequence coverage and PTM retention.
Leaders: Kelleher (Northwestern; Consortium for Top-Down Proteomics), Heck (Utrecht), Smith (PNNL, retired). Proteoform-level information: charge variants, truncation isoforms, PTM combinatorial states. Lower throughput than bottom-up; coverage limited to <50 kDa for routine work, though >100 kDa demonstrated with optimization.
Middle-down
Limited proteolysis to 5-20 kDa fragments. Bridges top-down (proteoform context) and bottom-up (sensitivity, depth). Used for antibody analysis (IdeS cleaves IgG below the hinge → ~100 kDa F(ab’)2 + two ~25 kDa Fc/2 fragments; reduction then gives ~25 kDa LC, Fd’, and Fc/2 subunits), histone PTM combinatorial analysis (Garcia, U Penn).
Mass spectrometer instrumentation
Ionization
- ESI — electrospray ionization (Fenn 1989, Nobel 2002). Liquid sprayed at +2-5 kV through tapered emitter → Taylor cone → solvent evaporates → multiply-charged protein/peptide ions. Nano-ESI (~10-30 nL/min through pulled-glass emitter or New Objective PicoTip) provides ~10-100× sensitivity gain vs conventional ESI.
- MALDI — matrix-assisted laser desorption/ionization (Karas-Hillenkamp 1988; Tanaka 1987, who shared the 2002 Nobel with Fenn). Sample co-crystallized with UV-absorbing matrix (CHCA, DHB, SA); a pulsed N₂ (337 nm) or solid-state UV laser desorbs and ionizes analyte together with the matrix. Primarily singly-charged. Faster than ESI for screening; less compatible with online LC; widely used for tissue imaging (MALDI imaging — Bruker rapifleX, timsTOF flex).
- APPI, APCI. Photoionization, chemical ionization — for nonpolar small molecules.
- DESI, LAESI, SIMS, MASSIA. Ambient ionization for tissue imaging.
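Because ESI produces multiply-charged ions, the same neutral molecule appears at several m/z values. A minimal sketch of that arithmetic, assuming positive-mode protonation only (no sodium or other adducts); the function name is illustrative.

```python
PROTON = 1.007276  # mass of a proton in Da

def mz_series(neutral_mass, max_charge=4):
    """m/z of the [M + zH]z+ ions for z = 1..max_charge."""
    return {z: (neutral_mass + z * PROTON) / z for z in range(1, max_charge + 1)}

# A hypothetical 1000 Da peptide observed at charge states 1+ to 4+.
for z, mz in mz_series(1000.0).items():
    print(f"{z}+  m/z {mz:.4f}")
```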
Mass analyzers
- Linear quadrupole. Mass filter; unit resolution; cheap and robust. Single-quad used in triple-quadrupole (Q1, Q3) flanking q2 collision cell — SRM/MRM workhorse. Sciex 6500+/7500, Thermo TSQ Altis/Quantis.
- Time-of-flight (TOF). Ions accelerated to fixed kinetic energy; flight time ∝ √(m/z). Reflectrons fold flight path, focus energy spread → resolution 30k-60k. Bruker timsTOF, Sciex ZenoTOF, Waters Synapt, Agilent 6500/6600. High duty cycle for full-spectrum acquisition.
- Orbitrap. Makarov 2000. Ions trapped in an electrostatic field around a central spindle electrode; axial oscillation frequency ∝ 1/√(m/z). Fourier transform of the image current → ultra-high resolution (240k-1M at m/z 200 on high-end Orbitraps); 1-5 ppm mass accuracy routine. Thermo Q Exactive, Orbitrap Exploris (240, 480, MX), Eclipse Tribrid, Ascend Tribrid, Astral (2023 — pairs an Orbitrap with the Asymmetric Track Lossless (Astral) analyzer → ~200 Hz MS2).
- FT-ICR. Ion cyclotron in strong magnetic field (7-21 T); cyclotron frequency f = qB/(2π m) → ultra-high resolution (>1M at m/z 400). Bruker solariX, magnetic-RT high-Tc superconducting magnet refits emerging. Slow (multi-second scans); reserved for top-down, ultra-high-mass-accuracy applications.
- Ion trap (3D Paul, 2D linear). Lower resolution than Orbitrap, but MSⁿ capability (n up to 5-10). Now mostly used as ion routers / fragmenters in hybrid systems.
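The two scaling laws quoted in this list, TOF flight time ∝ √(m/z) and the FT-ICR cyclotron frequency f = qB/(2πm), can be checked numerically. The sketch below is an idealized calculation: the path length and acceleration voltage are hypothetical values, and real instruments add reflectron and field corrections.

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge, C
DALTON = 1.66053906660e-27   # unified atomic mass unit, kg

def cyclotron_frequency_hz(mz, b_tesla):
    """f = qB / (2*pi*m), with m/q derived from m/z in Da per charge."""
    return b_tesla * E_CHARGE / (2 * math.pi * mz * DALTON)

def tof_flight_time_s(mz, path_m, accel_volts):
    """Ideal linear TOF: z*e*U = m*v^2/2, so t = L * sqrt(m / (2*z*e*U))."""
    return path_m * math.sqrt(mz * DALTON / (2 * E_CHARGE * accel_volts))

print(f"m/z 400 at 21 T: {cyclotron_frequency_hz(400, 21) / 1e3:.0f} kHz")
print(f"m/z 400, 2 m path, 10 kV: {tof_flight_time_s(400, 2.0, 10_000) * 1e6:.1f} µs")
```

Quadrupling m/z doubles the flight time and quarters the cyclotron frequency, which is why FT-ICR resolution falls off toward high m/z.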
Ion mobility
Gas-phase electrophoresis: ions are separated by size, shape, and charge (collision cross section, CCS) as they move through a buffer gas. Adds a dimension orthogonal to LC and m/z.
- DTIMS — drift tube IMS (Agilent). Long drift tube under low-field; collision cross section (CCS) measurable.
- TWIMS — traveling wave IMS (Waters Synapt). Pulsed traveling wave; calibrated CCS.
- TIMS — trapped IMS (Bruker timsTOF). Inverse drift tube — ions held against gas flow by electric field gradient. PASEF acquisition (Meier-Mann 2018) couples TIMS scanning with quadrupole isolation → 10-fold sensitivity gain.
- FAIMS — high-field asymmetric waveform IMS (Thermo FAIMS Pro Duo). Filtering selectivity; reduces chemical noise; widely used as ion-mobility prefilter on Orbitraps for low-input proteomics.
- SLIM — structures for lossless ion manipulations (PNNL). Cyclic 10-100 m drift paths; ultra-high IMS resolution (1000+) for isomer separation.
Fragmentation methods
- CID — collision-induced dissociation. Resonant excitation against He bath gas in a linear ion trap (~30-35% NCE); beam-type CID in quadrupole collision cells uses N₂ or Ar. Yields b- and y-ions for peptides. On Orbitrap instruments it has largely been superseded by HCD for MS2.
- HCD — higher-energy collisional dissociation (Olsen-Mann 2007). Beam-type CID performed outside the ion trap, in a dedicated HCD cell. Yields b/y ions plus internal/immonium ions; has no low-mass cutoff, so TMT reporter ions (m/z ~126-135) are detected. Default for tryptic peptide MS2 on Orbitraps.
- ETD — electron transfer dissociation (Syka-Hunt 2004). Singly charged radical anion (typically fluoranthene radical) transfers electron to multiply charged peptide cation → backbone N-Cα cleavage → c/z ions. Preserves labile PTMs (phospho, glyco, ADP-ribose, ubiquitin chains). Requires +3 or higher charge for efficient ETD; +2 peptides need supplemental activation (EThcD).
- EThcD. ETD + low-energy HCD supplemental activation; co-existing c/z + b/y ions; best sequence coverage.
- UVPD — 193 nm ultraviolet photodissociation (Brodbelt UT-Austin). High-energy photons; broad ion-type production (a/x/b/y/c/z); excellent for top-down and disulfide-mapping.
- ETciD, AI-ETD. ETD with collisional supplement; activated-ion ETD for top-down.
- EAD — electron-activated dissociation (Sciex ZenoTOF 7600). Tunable electron energy; pseudo-ETD without protonation requirement; useful for glycoproteomics, lipidomics, PTMs.
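The b- and y-ions produced by CID/HCD follow directly from residue masses. A minimal sketch for singly charged, unmodified fragments using monoisotopic masses; the function name is illustrative, and c/z ions or modifications would need extra mass offsets not shown here.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276

def by_ions(peptide):
    """Return (b, y) lists of singly charged fragment m/z values, b1..b(n-1), y1..y(n-1)."""
    masses = [RESIDUE[aa] for aa in peptide]
    n = len(masses)
    b = [sum(masses[:i]) + PROTON for i in range(1, n)]
    y = [sum(masses[n - i:]) + WATER + PROTON for i in range(1, n)]
    return b, y

b, y = by_ions("PEPTIDE")
for i, (bi, yi) in enumerate(zip(b, y), start=1):
    print(f"b{i} {bi:9.4f}   y{i} {yi:9.4f}")
```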
Acquisition modes
DDA — data-dependent acquisition
The original shotgun workflow. MS1 survey scan (full mass range, e.g., 350-1500 m/z) → pick top N (typically 10-30) most-intense precursors → isolate each with quadrupole (1-2 m/z window) → fragment → MS2 spectrum. Loop. Dynamic exclusion (~20-30 s) prevents re-selecting the same peptide.
Advantages: spectra are clean (one peptide per MS2 ideally); database search is straightforward. Disadvantages: stochastic — only most-intense peptides at each duty cycle are sampled; missing values across runs (40-60% of features in any single sample); peak-picking biased to high-abundance proteins.
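The top-N loop with dynamic exclusion can be sketched in a few lines. This is a simplified model, not vendor acquisition logic: it ignores charge-state filters and intensity thresholds, and the data layout (a list of retention times with (m/z, intensity) peaks) and the tolerance value are assumptions.

```python
def select_precursors(survey_scans, top_n=10, exclusion_s=30.0, mz_tol=0.01):
    """Pick up to top_n most intense precursors per MS1 scan, skipping excluded m/z.

    survey_scans: list of (rt_seconds, [(mz, intensity), ...]) in time order.
    """
    excluded = []      # (mz, expiry_time_s)
    selections = []
    for rt, peaks in survey_scans:
        # Drop exclusion entries whose timer has run out.
        excluded = [(m, t) for m, t in excluded if t > rt]
        picked = []
        for mz, _intensity in sorted(peaks, key=lambda p: p[1], reverse=True):
            if len(picked) == top_n:
                break
            if any(abs(mz - m) <= mz_tol for m, _ in excluded):
                continue
            picked.append(mz)
            excluded.append((mz, rt + exclusion_s))
        selections.append((rt, picked))
    return selections

peaks = [(500.0, 100.0), (600.0, 50.0), (700.0, 10.0)]
print(select_precursors([(0.0, peaks), (1.0, peaks), (40.0, peaks)], top_n=2))
```

In the example the low-abundance precursor at m/z 700 is only sampled once the two stronger ones are on the exclusion list, which illustrates both the benefit of exclusion and the intensity bias of DDA.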
DIA — data-independent acquisition
Popularized as SWATH-MS by the Aebersold lab (Gillet et al. 2012 MCP, on the Sciex TripleTOF). MS1 → MS2 isolation windows of fixed width (10-25 m/z) across full range → fragment all precursors in each window → composite MS2 spectrum. Loop. Every precursor in the m/z range is fragmented every cycle → sampling is systematic rather than stochastic, so far fewer values are missing across runs.
Advantages: complete sampling; low missing-value rate (5-20%); excellent run-to-run reproducibility; suits clinical and population-scale work. Disadvantages: chimeric MS2 spectra (multiple co-eluting peptides in same isolation window); requires spectral library or library-free search; much heavier compute.
Variants:
- SWATH (Sciex TripleTOF) — original DIA.
- DIA on Orbitrap. Q Exactive HF, Exploris 480, Eclipse, Astral. Astral with 0.5-2 m/z windows + 200 Hz MS2 → near-DDA selectivity at DIA coverage.
- PASEF-DIA / diaPASEF (Bruker timsTOF). DIA in ion-mobility dimension; sensitivity gain.
- WiSIM-DIA, Scanning-SWATH, BoxCar-DIA. Variants optimizing different aspects.
- µDIA, sciDIA, fastDIA. Method names for short gradient + narrow window combinations.
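The window scheme behind DIA is simple arithmetic: window count sets cycle time, and cycle time sets how many points sample each chromatographic peak. A rough planning sketch, assuming fixed-width non-overlapping windows; the scan times and peak width are hypothetical inputs.

```python
import math

def dia_windows(mz_start, mz_end, width):
    """Fixed-width, non-overlapping isolation windows covering [mz_start, mz_end]."""
    n = math.ceil((mz_end - mz_start) / width)
    return [(mz_start + i * width, min(mz_start + (i + 1) * width, mz_end))
            for i in range(n)]

def cycle_time_s(n_windows, ms2_scan_s, ms1_scan_s):
    """One MS1 survey scan plus one MS2 scan per window."""
    return ms1_scan_s + n_windows * ms2_scan_s

def points_per_peak(peak_width_s, cycle_s):
    """How many times each precursor is sampled across an LC peak."""
    return peak_width_s / cycle_s

windows = dia_windows(400, 1000, 20)
cycle = cycle_time_s(len(windows), ms2_scan_s=0.05, ms1_scan_s=0.1)
print(len(windows), "windows;", round(cycle, 2), "s cycle;",
      round(points_per_peak(15.0, cycle), 1), "points across a 15 s peak")
```

Narrower windows reduce chimeric spectra but lengthen the cycle unless the MS2 scan rate rises, which is the trade-off that faster analyzers relax.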
PRM / SRM — targeted
PRM (parallel reaction monitoring; Peterson-Coon 2012) — Orbitrap-based targeted method; isolate one precursor at a time, full MS2 scan. Quantify fragment ion intensity sum. SRM/MRM (selected/multiple reaction monitoring) — triple quadrupole; Q1 selects precursor m/z, q2 fragments, Q3 selects specific fragment m/z. Several transitions per peptide. Cheapest hardware for absolute targeted quant; gold standard for clinical assays (FDA bioanalytical method validation).
DDA-PASEF
Bruker timsTOF DDA with TIMS separation ahead of the quadrupole. >100 Hz MS2; deep proteomes in 30-90 min gradients.
Quantification chemistries
Label-free quantification (LFQ)
- MS1 intensity-based. Integrate precursor extracted ion chromatogram (XIC); align across runs with retention-time normalization. MaxLFQ (Cox-Mann 2014), directLFQ (Spectronaut 18+), DIA-NN MaxLFQ. Cross-experiment normalization (median, quantile, LIMMA-based).
- Spectral counting. Number of MS2 spectra assigned to a protein. Legacy; superseded by intensity-based.
Advantages: no labeling cost; scalable to thousands of samples; works on any input. Disadvantages: requires very tight chromatography reproducibility for cross-run comparison; less accurate at low end of dynamic range.
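MS1 intensity-based LFQ reduces to two steps: integrate each precursor's XIC, then bring runs onto a common scale. The sketch below uses trapezoidal integration and median normalization as the simplest instances of each step; it is not the MaxLFQ algorithm, and the function names and data layout are assumptions.

```python
from statistics import median

def xic_area(rts, intensities):
    """Trapezoidal area under an extracted ion chromatogram."""
    return sum((rts[i + 1] - rts[i]) * (intensities[i] + intensities[i + 1]) / 2
               for i in range(len(rts) - 1))

def median_normalize(runs):
    """Scale each run so that its median equals the median of all run medians."""
    medians = {name: median(values) for name, values in runs.items()}
    target = median(medians.values())
    return {name: [v * target / medians[name] for v in values]
            for name, values in runs.items()}

print(xic_area([0.0, 1.0, 2.0], [0.0, 10.0, 0.0]))
print(median_normalize({"run_a": [1.0, 2.0, 3.0], "run_b": [2.0, 4.0, 6.0]}))
```

Median normalization assumes most proteins do not change between runs; when that does not hold, a different reference set is needed.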
TMT — tandem mass tags
Thompson et al. 2003 (Proteome Sciences), commercialized by Thermo. Isobaric: all label variants have same total mass at MS1, but cleave during MS2 to release reporter ions at different m/z. Sample-multiplex quantification at the MS2 (or MS3) level.
TMT 6-plex → 10-plex → 11-plex → 16-plex (TMTpro, 2020) → 18-plex (TMTpro 18plex, 2021) → 35-plex (Thermo/Pierce TMTpro, announced 2024).
Workflow: label each sample with one TMT channel → pool → fractionate → LC-MS/MS → identify peptides by MS2 sequence ions → quantify by reporter ion intensities in low-mass MS2 region.
Ratio compression problem: co-isolation of contaminant peptides in the isolation window contaminates reporter ions toward “1:1:1” baseline. Mitigations:
- SPS-MS3 (synchronous precursor selection MS3; McAlister-Gygi 2014). Top N fragment ions from MS2 selected and re-fragmented in MS3; reporters now isolated from co-isolation contamination. Adds time per duty cycle.
- TMTc, TMTc+. Complementary reporter ions at higher mass; less affected by low-mass MS2 noise.
- FAIMS or TIMS pre-filtering. Reduces co-isolation.
- Narrow isolation windows (0.4-0.7 m/z); requires high-resolution quadrupole.
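Ratio compression is easy to see with a toy model: co-isolated peptides add roughly the same reporter signal to every channel, which pulls observed ratios toward 1:1. The model below is a deliberate simplification (a single constant interference term per channel) and the function name is illustrative.

```python
def observed_ratio(true_a, true_b, interference):
    """Reporter ratio when every channel gains the same interfering signal."""
    return (true_a + interference) / (true_b + interference)

# A true 10:1 change measured with increasing co-isolation interference.
for background in (0.0, 0.5, 1.0, 5.0):
    print(f"interference {background:>4}: observed ratio "
          f"{observed_ratio(10.0, 1.0, background):.2f}")
```

The mitigations listed above all work by shrinking the interference term, either by isolating more selectively or by reading reporters from a cleaner ion population.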
iTRAQ (Sciex; 4-plex, 8-plex) is older, similar concept; mostly historical now.
SILAC
Mann lab 2002. Cells grown in media containing 13C/15N-labeled lysine and arginine (“heavy”) vs natural (“light”) media. Mix samples 1:1 after extraction; MS1 peak doublets (light + heavy) quantify in single LC run. Triple SILAC adds an intermediate (“medium”) label.
Pros: MS1-level quant (no reporter chemistry); high accuracy. Cons: only works on culturable cells/animals (SILAC mouse — Mann 2008 — feeding 13C6-Lys diet); two-state typical, max three-state.
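The spacing of a SILAC doublet at MS1 depends on how many labeled residues the peptide carries and on its charge. The sketch assumes the common Lys8 (13C6 15N2) and Arg10 (13C6 15N4) labels; the source does not name specific isotopologues, so these masses are an assumption of the example.

```python
LYS8 = 8.014199    # 13C6 15N2 lysine mass shift, Da (assumed label)
ARG10 = 10.008269  # 13C6 15N4 arginine mass shift, Da (assumed label)

def silac_mz_spacing(peptide, z):
    """m/z distance between the light and heavy peaks of a SILAC pair."""
    shift = peptide.count("K") * LYS8 + peptide.count("R") * ARG10
    return shift / z

def heavy_to_light(light_area, heavy_area):
    """Relative abundance from the integrated MS1 peak areas of the doublet."""
    return heavy_area / light_area

print(round(silac_mz_spacing("PEPTIDEK", 2), 4))
print(round(silac_mz_spacing("PEPTIDER", 2), 4))
```

Because tryptic peptides end in K or R, nearly every peptide carries at least one label, which is why the Lys plus Arg combination pairs so well with trypsin.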
Other labeled methods
- dimethyl labeling (Boersema-Heck 2009). Cheap (formaldehyde + reductant); 3-plex per LC-MS run.
- AHA / pSILAC — incorporate methionine analog azidohomoalanine; click-pull-down newly synthesized proteins; turnover dynamics.
- mTRAQ (Sciex; non-isobaric variant of iTRAQ). Used in plexDIA (Slavov-Demichev) for single-cell DIA multiplexing.
Absolute quantification
- AQUA. Spike heavy-isotope-labeled synthetic peptide standard; quantify endogenous peptide vs known amount. Per-peptide standard cost limits scale.
- QconCAT, QPrEST, FlexiQuant. Concatenated standard peptide proteins; cheaper per-protein.
- iBAQ, top3, MaxLFQ-based “absolute.” Approximate; useful for relative protein-rank within sample.
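The AQUA calculation is a single ratio: the endogenous (light) peptide amount equals the light/heavy peak-area ratio times the spiked heavy amount. A minimal sketch, assuming equal ionization and recovery for the light and heavy forms; the function name and units are illustrative.

```python
def aqua_amount_fmol(light_area, heavy_area, spiked_heavy_fmol):
    """Endogenous peptide amount from the light/heavy area ratio and the spike."""
    return light_area / heavy_area * spiked_heavy_fmol

# Hypothetical areas: endogenous peak is twice the 50 fmol heavy standard.
print(aqua_amount_fmol(2.0e6, 1.0e6, 50.0), "fmol")
```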
Software and search engines
Bottom-up DDA search engines
- Sequest (Yates-Eng 1994) — the original; integrated into Proteome Discoverer.
- Mascot (Matrix Science 1999) — commercial; widely cited; web-based search.
- Andromeda (Cox-Mann 2011) — integrated into MaxQuant; free.
- MSFragger (Kong-Nesvizhskii, Michigan 2017) — fragment-ion indexing; ~50× faster than Sequest/Mascot. Powers FragPipe pipeline.
- Comet (Eng 2013) — open-source Sequest-like.
- MS-GF+ (Pevzner-Kim) — improved scoring; open-source.
- Tide / Crux (Noble UW).
- PEAKS (BSI; de novo + database hybrid).
Validation / FDR
- Target-decoy. Search against target DB (forward) + decoy DB (reversed or scrambled). Estimate FDR from decoy hit rate at any score threshold; report 1% protein/peptide FDR.
- Percolator (Käll-Noble 2007). Semi-supervised SVM re-scoring of PSMs using target-decoy.
- Mascot Percolator, Q-ranker, MS Amanda + Percolator, SAGE.
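Target-decoy FDR estimation can be sketched directly from its definition: at any score threshold, the number of decoy hits estimates the number of false target hits. The code uses the simple decoys/targets estimator and converts FDR to monotone q-values; real tools differ in details such as the +1 correction, and the data layout is an assumption.

```python
def q_values(psms):
    """psms: list of (score, is_decoy), higher score = better.

    Returns (score, is_decoy, q_value) tuples sorted by descending score.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    fdrs, targets, decoys = [], 0, 0
    for _score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))
    # q-value: lowest FDR at which this PSM would still be accepted.
    q, running_min = [], float("inf")
    for fdr in reversed(fdrs):
        running_min = min(running_min, fdr)
        q.append(running_min)
    q.reverse()
    return [(s, d, qv) for (s, d), qv in zip(ranked, q)]

psms = [(10, False), (9, False), (8, True), (7, False), (6, False)]
accepted = [p for p in q_values(psms) if not p[1] and p[2] <= 0.01]
print(accepted)
```

Filtering on q ≤ 0.01 corresponds to the 1% FDR threshold quoted above.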
DIA software
- Spectronaut (Biognosys; commercial, expensive but excellent UX). Library-based and library-free directDIA.
- DIA-NN (Demichev-Ralser 2020 Nat Methods). Free for academic use (recent releases are closed-source); ML-based scoring; library-free or library-aided. Has displaced Spectronaut in many academic and public-data analyses 2022-2026.
- Skyline (MacCoss UW). Open-source; targeted (PRM/SRM) plus DIA. Spectral library and method development.
- OpenSWATH (Aebersold-Röst 2014). Original community SWATH tool.
- EncyclopeDIA, MaxDIA, Scaffold-DIA, FragPipe-DIA. Increasingly converging on similar performance.
Spectral libraries
- In-house DDA-based. Run pooled fractionated sample in DDA → build library → use for DIA quantification.
- Predicted libraries. Prosit (Gessulat-Wilhelm 2019) — deep neural net predicts MS2 spectra and iRT from peptide sequence. DeepMS, AlphaPeptDeep, MSFragger DDA→DIA. Reduces need for empirical libraries; especially valuable for low-input samples.
- ProteomicsDB, ProteomeTools — synthetic peptide library reference spectra.
Public repositories
- PRIDE (EBI; tens of thousands of datasets; the GEO of proteomics).
- MassIVE (UCSD; integrates with PRIDE via ProteomeXchange).
- jPOST (Japan ProteOme STandard Repository).
- iProX (China).
- Panorama Public — Skyline targeted assays.
ProteomeXchange Consortium coordinates submissions across PRIDE, MassIVE, jPOST, iProX. PXD prefix dataset identifiers (e.g., PXD012345).
Protein-level inference
PSMs → peptides → proteins is non-trivial because of shared peptides (razor peptide assignment). Methods: ProteinProphet, Mayu, IDPicker, MSstats. MaxQuant’s protein groups handle shared-peptide ambiguity by grouping indistinguishable proteins.
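The grouping step can be illustrated with sets: proteins whose identified peptides are identical cannot be distinguished and collapse into one group, and a peptide is unique if it maps to a single group. This is a sketch of the idea only, not MaxQuant's or ProteinProphet's algorithm (it omits subset handling and razor assignment), and the names are illustrative.

```python
from collections import defaultdict

def protein_groups(protein_to_peptides):
    """Collapse proteins with identical peptide sets into one group."""
    by_peptide_set = defaultdict(list)
    for protein, peptides in protein_to_peptides.items():
        by_peptide_set[frozenset(peptides)].append(protein)
    return {tuple(sorted(prots)): set(peps) for peps, prots in by_peptide_set.items()}

def unique_peptides(groups):
    """Peptides that occur in exactly one protein group."""
    counts = defaultdict(int)
    for peps in groups.values():
        for pep in peps:
            counts[pep] += 1
    return {group: {p for p in peps if counts[p] == 1} for group, peps in groups.items()}

groups = protein_groups({"P1": {"a", "b"}, "P2": {"a", "b"}, "P3": {"b", "c"}})
print(unique_peptides(groups))
```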
Post-translational modifications (PTMs)
PTMs are the rich combinatorial layer of regulation. ~250 known PTM types; ~20 broadly studied. Mass-spec mapping is the leading method.
Phosphoproteomics
Phospho-Ser/Thr/Tyr — +79.966 Da delta. Enrichment essential (phosphopeptides are typically <1% of total peptides at biological abundances; pSer:pThr:pTyr ≈ 90:10:0.1).
Enrichment:
- IMAC — immobilized metal affinity chromatography. Fe³⁺-NTA, Fe³⁺-IDA.
- TiO₂ (metal oxide affinity chromatography). Titanium dioxide, e.g. Titansphere beads, the most common commercial material; older, still widely used; recovers multiply-phosphorylated peptides less efficiently than IMAC.
- HighSelect Fe-NTA (Thermo) — pre-packaged spin columns; reproducible.
- EasyPhos (Mann lab) — single-step enrichment from crude lysate, very simple, scales to hundreds of samples per week.
Workflow: digest → desalt → enrich → optionally fractionate → LC-MS/MS with HCD + ETD (or EThcD) → search with PhosphoRS / Andromeda / Spectronaut site localization.
Phosphosite localization: Ascore (Beausoleil-Gygi 2006), ptmRS, PhosphoRS, MD-score, localization probability. Distinguishes pSer/pThr at S/T-rich tryptic peptides.
Databases: PhosphoSitePlus (Hornbeck, CST; curated phosphosites + functional annotation; >300k sites), Phospho.ELM, MIST/HIPHIN integrations.
Glycoproteomics
N-glycosylation occurs on Asn within the N-X-S/T sequon (X ≠ P); O-glycosylation occurs on S/T with no consensus motif. Glycans typically add 800-3500 Da, with immense isomeric diversity.
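Candidate N-glycosylation sites can be located with a regular expression for the N-X-S/T sequon (X ≠ P). A sequon marks a potential site only; occupancy has to be shown experimentally. The function name and toy sequence are illustrative.

```python
import re

def n_glyco_sequons(sequence):
    """1-based positions of Asn residues in N-X-S/T motifs where X is not Pro."""
    # Lookahead so that overlapping sequons are all reported.
    return [m.start() + 1 for m in re.finditer(r"(?=N[^P][ST])", sequence)]

# N-G-S and N-A-T match; N-P-T does not.
print(n_glyco_sequons("ANGSNPTNAT"))
```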
Approaches:
- Bottom-up glycopeptide MS. Enrich by HILIC, lectin (ConA, WGA), or boronate. Fragment with HCD-stepped or EThcD. Software: pGlyco (Liu lab), Byonic (Bern-Protein Metrics), GPSeeker, MSFragger-Glyco.
- Released-glycan. PNGase F releases N-glycans; analyze with MALDI or LC-MS or HILIC-FLD with 2-AB labels.
- IgG glycoproteomics. Drives biopharma QC — IgG1 Fc N-glycan composition affects ADCC/CDC; ~60% of mAb characterization is glycan analysis (HILIC-FLD post-PNGase F).
- Top-down/middle-down glycoproteomics. Intact glycoform analysis on Orbitrap or FT-ICR with ETD/UVPD.
GlyTouCan database; Symbol Nomenclature for Glycans (SNFG); GlyConnect (ExPASy).
Ubiquitin and ubiquitin-like
- K-ε-GG remnant. Tryptic digestion of ubiquitin-conjugated proteins leaves di-Gly (GG, +114.04 Da) on K-modified peptide. Anti-K-GG antibody enrichment (PTMScan from CST) → LC-MS/MS. Identifies ubiquitination sites globally.
- Linkage-specific. K48 vs K63 vs K11 ubiquitin chain linkages diagnosed by ratio of K-GG peptides at each Ub Lys.
- NEDD8, SUMO, ISG15 — analogous Ubl proteins; similar enrichment workflows but with Ubl-specific epitope antibodies.
Acetylation, methylation
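K-ac (+42.011 Da), K-me/me2/me3 (+14.016/+28.031/+42.047 Da; trimethylation is near-isobaric with acetylation, Δ = 0.036 Da, so accurate precursor mass or retention time is needed to tell them apart). Enrichment with acetyl-K antibody (PTMScan, Cell Signaling). Histone PTM combinatorial analysis — middle-down, or bottom-up with lysine propionylation (derivatization that gives Arg-C-like tryptic cleavage).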
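Acetylation and trimethylation share a nominal +42 Da shift but differ in exact mass, so whether they can be told apart depends on precursor mass accuracy. A small worked calculation, treating the two as distinguishable when their ± tolerance windows do not overlap; that criterion and the function names are assumptions of the sketch.

```python
ACETYL = 42.010565     # C2H2O, Da
TRIMETHYL = 42.046950  # C3H6, Da

def delta_ppm(peptide_mass):
    """Mass difference between the two modified forms, in ppm of peptide mass."""
    return (TRIMETHYL - ACETYL) / peptide_mass * 1e6

def distinguishable(peptide_mass, mass_tol_ppm):
    """True if the +/- tolerance windows of the two candidates do not overlap."""
    return delta_ppm(peptide_mass) > 2 * mass_tol_ppm

for mass in (1000.0, 1500.0, 3000.0):
    print(f"{mass:.0f} Da: {delta_ppm(mass):.1f} ppm apart, "
          f"separable at 5 ppm: {distinguishable(mass, 5.0)}")
```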
ADP-ribosylation, palmitoylation, lipidation
Specialized enrichments (Af1521 macrodomain for ADP-ribose; acyl-biotin exchange for palmitoyl-Cys; click chemistry for alkyne- or azide-tagged fatty-acid analogs). Cross-reference: cell-molecular-biology.
Crosslinking-MS (XL-MS)
Cross-link two residues in a protein/complex with a bifunctional reagent → digest → identify crosslinked peptides → triangulate residue distance constraints for structure modeling.
Cross-linkers
- DSSO (disuccinimidyl sulfoxide). MS-cleavable; spacer 10.3 Å; targets Lys ε-amine. Cleaves under CID to give characteristic alkene/sulfenic-acid signature ions → simplifies identification.
- DSBU (disuccinimidyl dibutyric urea). MS-cleavable; spacer 12.5 Å.
- BS3, DSS, DSG. Non-cleavable Lys-Lys; classical Sinz workhorse.
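- PhoX, tBu-PhoX, Leiker. Enrichable crosslinkers: PhoX and tBu-PhoX carry a phosphonate handle for IMAC enrichment; Leiker carries a biotin tag with a cleavable linker.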
- EDC + sulfo-NHS. Lys-Asp/Glu zero-length.
- SDA, NHS-diazirine. Photo-activatable; broad target (any C-H bond within distance).
Workflow
- Crosslink intact protein/complex (typically 1-100× molar excess; quench with Tris/NH4HCO3).
- Digest (trypsin or trypsin/Lys-C).
- Enrich for crosslinked peptides — size-exclusion (SEC), SCX, anti-DSS antibody.
- LC-MS/MS with stepped-HCD or EThcD.
- Search with crosslink-specific software:
- pLink2 (He lab) — high sensitivity, MS-cleavable + non-cleavable.
- MeroX — MS-cleavable.
- Kojak (Hoopmann-Moritz) — open-source, integrates with Trans-Proteomic Pipeline.
- xQuest/xProphet (Aebersold-Leitner).
- XlinkX in Proteome Discoverer (Heck).
- Convert XL hits → distance restraints (~Cα-Cα < 30 Å for DSSO/DSS) → integrate into structure modeling (HADDOCK, Integrative Modeling Platform IMP, Modeller, AlphaFold restraints).
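The last step, checking crosslinks against a structure, is a distance test on Cα coordinates. A minimal sketch using the 30 Å Cα-Cα cutoff quoted above; the coordinate dictionary and residue numbers are made-up inputs, and parsing a real PDB/mmCIF file is left out.

```python
import math

def check_restraints(ca_coords, crosslinks, max_dist=30.0):
    """Return (res_a, res_b, distance, satisfied) for each crosslinked residue pair.

    ca_coords: {residue_number: (x, y, z)} in angstroms.
    crosslinks: iterable of (residue_a, residue_b).
    """
    results = []
    for res_a, res_b in crosslinks:
        d = math.dist(ca_coords[res_a], ca_coords[res_b])
        results.append((res_a, res_b, d, d <= max_dist))
    return results

# Hypothetical Cα positions for three lysines.
coords = {10: (0.0, 0.0, 0.0), 55: (3.0, 4.0, 0.0), 120: (0.0, 0.0, 40.0)}
for a, b, d, ok in check_restraints(coords, [(10, 55), (10, 120)]):
    print(f"K{a}-K{b}: {d:.1f} Å  {'satisfied' if ok else 'violated'}")
```

Violated restraints are not necessarily errors; they can point to conformational flexibility or to inter-subunit contacts absent from the model.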
XL-MS is now standard complementary tool for cryo-EM-resolved complexes — particularly for flexible regions, transient interactions, and intrinsically disordered proteins. Used to validate ribosome structures, spliceosome, nuclear pore complex (Beck-Aebersold consortia), proteasome, ATP synthase.
HDX-MS
Hydrogen-deuterium exchange MS measures amide hydrogen exchange rates → reveals solvent accessibility, dynamics, binding-induced protection.
Workflow
- Protein in H₂O equilibrium.
- Dilute 10× into D₂O buffer (95-99% D); incubate 10 s, 1 min, 10 min, 1 h, 10 h time points.
- Quench (low pH, low T) at each time point.
- Pepsin digest at 0 °C (pepsin works at pH 2.5; minimizes back-exchange).
- LC-MS at 0 °C, fast 5-min gradient.
- Mass shift of each peptide vs unlabeled → fraction deuterated → fit exchange kinetics.
- Compare ± ligand, ± mutation → identify protected (slower exchange) or exposed (faster exchange) regions.
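The mass-shift step can be written out as follows: the fraction deuterated is the observed shift normalized between an undeuterated and a fully deuterated control, and the ceiling on uptake is the number of exchangeable backbone amides. The amide count below uses one common convention (every residue after the first, excluding prolines); some workflows also discount the second residue, so treat it as an assumption.

```python
def max_exchangeable_amides(peptide):
    """Backbone amide hydrogens: residues 2..n, minus prolines (no amide H)."""
    return len(peptide) - 1 - peptide[1:].count("P")

def fraction_deuterated(mass_t, mass_undeuterated, mass_fully_deuterated):
    """Back-exchange-corrected uptake, scaled between the two controls."""
    return (mass_t - mass_undeuterated) / (mass_fully_deuterated - mass_undeuterated)

def deuterons_taken_up(peptide, mass_t, mass_undeuterated, mass_fully_deuterated):
    """Absolute uptake in number of deuterons."""
    frac = fraction_deuterated(mass_t, mass_undeuterated, mass_fully_deuterated)
    return frac * max_exchangeable_amides(peptide)

# Hypothetical centroid masses for one peptide at one time point.
print(deuterons_taken_up("APEPTIDE", 1002.0, 1000.0, 1004.0))
```

Repeating this per time point gives the uptake curve that the kinetic fit and the ± ligand comparison operate on.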
Software: HX-Express, HDExaminer (Sierra Analytics), Mass Spec Studio, DECA, HDXer. ExMS2 (Englander lab) for residue-level deconvolution.
Applications: epitope mapping (mAb-antigen interface), allosteric site mapping, ligand-binding-site identification, intrinsically disordered region characterization, biosimilar comparability for biopharma.
Commercial HDX systems: Waters HDX Manager, Trajan LEAP HDX PAL, Sierra HDXmgr. Exclusive HDX-MS service: NanoTemper, Malvern Panalytical.
Native MS
ESI from ammonium acetate buffer preserves non-covalent interactions → measures intact protein complex masses, stoichiometries, ligand-binding constants.
Instrumentation: Waters Synapt G2-Si or Cyclic IMS with high-mass quadrupole; Bruker timsTOF Pro 2 with sapphire skimmer; Thermo Q Exactive UHMR Orbitrap (ultra-high mass range, launched 2018).
Charge-reducing additives (TEAA, imidazole) and gentle source conditions preserve complex integrity. Can mass complexes >1 MDa (proteasome, ribosome, viral capsids — Heck consortium).
Applications: stoichiometry of homo/heteromers, ligand-binding (KD via titration), drug-target engagement, antibody-antigen complexes, IgG glycoform distribution at intact-protein level.
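Intact masses in native MS come from the charge-state envelope: two adjacent peaks of the same species differ by one charge, which fixes both z and M. A minimal sketch assuming proton-only charging and an idealized noise-free envelope; the function names and the 100 kDa example are illustrative.

```python
PROTON = 1.007276  # Da

def mz_of(mass, z):
    """m/z of an [M + zH]z+ ion."""
    return mass / z + PROTON

def charge_from_adjacent(mz_low, mz_high):
    """Charge of the higher-m/z peak, given that the lower-m/z peak carries z + 1.

    From z*(mz_high - H) = (z + 1)*(mz_low - H): z = (mz_low - H) / (mz_high - mz_low).
    """
    return round((mz_low - PROTON) / (mz_high - mz_low))

def neutral_mass(mz, z):
    """Neutral mass from one peak and its assigned charge."""
    return z * (mz - PROTON)

# Simulated 20+ and 21+ peaks of a hypothetical 100 kDa complex.
low, high = mz_of(100_000.0, 21), mz_of(100_000.0, 20)
z = charge_from_adjacent(low, high)
print(z, round(neutral_mass(high, z), 2))
```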
Affinity-MS interactomics
AP-MS — affinity purification mass spectrometry
Bait protein expressed with affinity tag (FLAG, HA, GFP, V5, His, MBP, Strep-tag II) or endogenously knocked-in (CRISPR-engineered tag). Pull down with anti-tag bead; wash; elute; trypsin digest on-bead or in-solution; LC-MS/MS; identify co-purifying proteins (preys).
Statistical scoring:
- SAINT (Choi-Nesvizhskii 2011 Nat Methods). Probability-based contaminant filtering.
- MiST / CompPASS (Krogan, Harper labs).
- ProHits-viz (Gingras lab).
BioPlex (Huttlin-Gygi-Harper) is the canonical large-scale AP-MS interactome — >100,000 interactions across thousands of bait proteins in HEK293T.
BioID / TurboID / APEX2 — proximity labeling
Bait fused to promiscuous biotin ligase (BirA* in BioID, BioID2 smaller; TurboID, miniTurbo — Branon-Ting 2018 Nat Biotechnol — much faster labeling in 10 min vs 18 h BioID); biotin substrate added → labels nearby (≤10 nm) proteins → streptavidin pull-down → MS.
APEX2 (Lam-Ting 2015 Nat Methods) — engineered peroxidase + biotin-phenol + H₂O₂ for 1-min labeling pulses; spatial resolution ~20 nm. Used in organelle proteomics (mitochondrial matrix vs IMS vs OMM), synaptic protein networks (Schiapparelli-Cline).
Advantages over AP-MS: detects transient and weak interactions and preserves the native cellular context (labeling happens before lysis). Disadvantages: signal-to-noise lower than direct AP-MS.
Co-fractionation MS / Thermal proteome profiling
CF-MS — fractionate native lysate by size (SEC) or charge (IEX), MS-profile each fraction → co-elution patterns predict complexes. Heuristics: COMPASS, EPIC, PrInCE.
TPP (thermal proteome profiling; Savitski-Drewes 2014 Science) — heat intact cells or lysate at a series of temperatures; the soluble fraction is quantified by TMT → drug binding stabilizes the target → a Tm shift identifies drug targets proteome-wide. Builds on CETSA (Molina-Nordlund 2013).
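The Tm readout in TPP can be approximated without curve fitting by interpolating where the soluble fraction crosses 0.5. Published TPP analyses fit sigmoidal melting curves instead, so the linear interpolation here is a stated simplification, and the temperatures and fractions are made-up values.

```python
def melting_temp(temps, soluble_fractions):
    """Temperature at which the soluble fraction first drops through 0.5."""
    points = list(zip(temps, soluble_fractions))
    for (t0, f0), (t1, f1) in zip(points, points[1:]):
        if f0 >= 0.5 > f1:
            # Linear interpolation between the two bracketing temperatures.
            return t0 + (f0 - 0.5) * (t1 - t0) / (f0 - f1)
    return None  # curve never crosses 0.5 in the measured range

temps = [40.0, 50.0, 60.0]
tm_vehicle = melting_temp(temps, [1.0, 0.75, 0.25])
tm_drug = melting_temp(temps, [1.0, 0.9, 0.4])
print(f"Tm shift: {tm_drug - tm_vehicle:.1f} °C")
```

A positive shift for one protein against a flat background across the proteome is the signature of a stabilized drug target.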
Single-cell proteomics
Frontier: identify and quantify thousands of proteins per single mammalian cell.
Lysate-based MS
- SCoPE-MS / SCoPE2 (Slavov lab; Budnik-Slavov 2018 Genome Biol; Specht-Slavov 2021 Genome Biol). Single cells lysed in 1 µL each, TMT-labeled, multiplexed 14-16-plex with a “carrier” channel (~100× protein input from bulk lysate) to boost MS1 ions → reporter ions from single-cell channels quantifiable. ~1500-3000 proteins per single cell.
- nanoPOTS — nanodroplet processing in one pot for trace samples (Zhu-Kelly-Smith PNNL 2018 Nat Comm). Sub-µL volumes (~200 nL); picogram-input sample prep. Coupled with Orbitrap or timsTOF.
- plexDIA (Demichev-Ralser-Slavov 2022 Nat Biotechnol). mTRAQ 3-plex DIA single-cell.
- N2 / SCP-DIA. Single-cell DIA without isobaric labels — emerging as cleanest workflow with Astral.
- DISCO-MS, T-SCP, AccuTrap, OAD. Various startup commercial platforms.
State of the field (2026): ~3-5k proteins per single mammalian cell from Astral or timsTOF SCP routinely. Cell-type rare-population biology, drug-perturbation response heterogeneity.
Single-molecule protein sequencing (non-MS)
- Quantum-Si Platinum (2022 launch). Fluorescent N-terminal-degradation single-molecule sequencer. ~16 amino-acid recognition; modest dynamic range; ~30 protein identifications per run vs thousands by MS, but lower instrumentation cost.
- Erisyon, Nautilus, Encodia (ProteoCode) — competing fluorescent recognition platforms with different chemistry.
- Oxford Nanopore protein sequencing — early R&D 2023-2025; promise but unrealized at scale.
Clinical proteomics
Plasma proteomics
The plasma proteome spans ~10¹⁰-fold dynamic range (albumin 30-50 mg/mL → cytokines 1-10 pg/mL). Deep coverage requires either depletion (top-12, top-14 high-abundance protein removal — Sigma SEPPRO, Agilent MARS) or alternative ionization-friendly platforms.
UK Biobank Pharma Proteomics Project (PPP) — first release (2023): Olink Explore 3072 (~2,900 plasma proteins) on ~54k participants; the expansion uses Olink Explore HT (~5,400 proteins), scaling to 250k+ by 2026. Generates pQTL maps, biomarker discovery, drug-target validation. Sun-Surendran-Howson-Butterworth 2023 Nature; Pietzner-Wheeler-Langenberg 2021 Science. Cross-reference: genetics-and-genomics.
Tissue proteomics / spatial proteomics
- LC-MS on microdissected regions (LCM-MS).
- MALDI-MS imaging. Spatial resolution ~10-50 µm; identifies lipids, drugs, small proteins/peptides direct from tissue. Bruker rapifleX MALDI Tissuetyper, timsTOF flex; Waters MALDI Synapt.
- DESI-MS imaging. Ambient ionization on tissue; spatial ~50-100 µm. The related ambient method REIMS (rapid evaporative ionization MS) powers the Intelligent Knife (iKnife; Takats) for mass-spec margin assessment during surgery.
- CODEX / IBEX / IMC (imaging mass cytometry, Fluidigm-Standard BioTools) — antibody-based but multiplexed (40+ markers in IMC). Bridges proteomics and pathology.
Biomarker discovery
Discovery → verification → validation pipeline (FDA, CLIA, IVDR). Most proteomic biomarker discoveries fail validation. Successes:
- Troponin I/T (cardiac MI); now point-of-care.
- PSA (prostate cancer screening; controversial).
- PCT (procalcitonin) — bacterial vs viral infection differentiation.
- NT-proBNP — heart failure.
- OVA1, ROMA — ovarian cancer (multiprotein algorithms).
- ColonView, EarlyCDT — multiprotein cancer panels (mixed FDA/CE status).
Olink and SomaScan generate hundreds of candidate plasma biomarkers per year; mAb-based validation lags. Translation to clinic remains the bottleneck.
Practical workflows
Standard bottom-up DDA experiment (HeLa lysate)
- Lyse 1-5 × 10⁶ HeLa cells in 5% SDS + 100 mM TEAB + protease inhibitors (PIC); sonicate; clarify by centrifugation 16,000×g 10 min.
- BCA assay → take 50-100 µg.
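- S-Trap mini protocol: reduce (TCEP 5 mM, 15 min, RT), alkylate (MMTS 10 mM, 15 min; Cys becomes methylthio, +45.988 Da, so set that as the fixed modification at search), acidify with phosphoric acid to 1.2% → bind to S-Trap → wash → trypsin 1:25 enzyme:substrate, 47 °C, 1 h.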
- Elute peptides; dry in SpeedVac; resuspend in 0.1% FA + 2% MeCN.
- C18 cleanup if not already; quantify peptide (BCA or NanoDrop A280).
- LC: 1 µg on 25 cm × 75 µm Aurora C18 (or Thermo PepMap C18, Waters HSS T3), 90 min gradient 2-30% B (0.1% FA in MeCN), 300 nL/min.
- MS: Orbitrap Exploris 480 or Astral, DDA top-15, 1×10⁶ AGC, 60k MS1 resolution, 15k MS2 resolution, HCD NCE 30, dynamic exclusion 30 s.
- Search with MaxQuant (Andromeda) or FragPipe (MSFragger) against the UniProt human proteome (~20k canonical + isoforms); FDR 1% at protein and peptide level; LFQ enabled.
- ~6000-9000 proteins from single-shot 90-min HeLa with Exploris 480; ~10000+ with Astral.
Phosphoproteomics enrichment (5 mg input)
- Lyse 50-100 × 10⁶ cells in 8 M urea + 50 mM TEAB + 1× phosphatase inhibitor (PhosSTOP); sonicate.
- Reduce/alkylate.
- Digest: dilute urea to 2 M with TEAB; Lys-C (1:100) 4 h 37 °C; trypsin (1:50) overnight 37 °C.
- C18 desalt (Sep-Pak C18 1 cc 100 mg or in-house StageTip equivalent).
- EasyPhos protocol: bind to TiO₂ in 80% ACN / 5% TFA / 1 M glycolic acid; wash; elute with 40% ACN / 15% NH₄OH.
- Re-acidify; C18 desalt.
- LC-MS/MS: Exploris 480 with TopN 12, HCD NCE 27, 50k MS2 resolution.
- MaxQuant phospho (STY) search; site localization probability > 0.75 (class I sites); FDR 1%.
- ~15,000-25,000 phosphosites quantified from 5 mg cell lysate.
TMT 18-plex workflow
- Process 18 samples in parallel through digestion (S-Trap or in-solution).
- ~100 µg peptide per sample; quantify by BCA on peptide.
- Label each sample with one TMTpro 18-plex channel (Pierce); 1 h RT; quench with hydroxylamine.
- Combine 18 samples 1:1; desalt; high-pH RP fractionate into 24-48 fractions.
- LC-MS/MS each fraction: Eclipse Tribrid with SPS-MS3 method; HCD NCE 36 MS2, HCD NCE 65 MS3, top-10 SPS notches.
- Proteome Discoverer with Sequest HT + Mascot + Percolator; TMT reporter quantification at MS3.
- ~9000 proteins quantified across 18 samples; CV 5-15% biological replicates.
Sample preparation deep dive
The single biggest determinant of proteomic data quality is sample prep — not the MS instrument. Modern best practices:
Lysis buffer choice
- 5% SDS + 50 mM TEAB. Maximum protein extraction; SDS-tolerant downstream methods needed (S-Trap, SP3).
- 8 M urea + 50 mM TEAB / TFE. Chaotropic; Lys-C remains active at 4-8 M urea, then dilute to ≤2 M urea for trypsin.
- GdnHCl 6 M. Stronger chaotrope; harder to remove than urea; common for membrane proteins.
- RIPA, NP-40, Triton. Detergent lysis for membrane/organelle prep; detergent removal essential before MS.
- MeOH-CHCl₃-water (Bligh-Dyer modified). Whole-cell precipitation; preserves PTMs; good for lipid-rich samples.
Detergent removal
- SP3. Hughes-Krijgsveld 2014, 2019. Carboxylate magnetic beads bind protein under aqueous-organic conditions; wash off detergent; on-bead digest. Streamlined and robust.
- S-Trap (Protifi). SDS + acid + MeOH → trapped on quartz filter; wash; on-membrane digest. Tolerates 5% SDS input.
- iST / PreOmics. All-in-one cartridge; spin-column based; minimal pipetting.
- Acetone precipitation. Classical; 5× volume cold acetone overnight. Time-consuming but cheap.
- MeOH-CHCl₃ precipitation. Wessel-Flügge 1984; works for small volumes but loses some proteins.
Tip and column conditioning
C18 desalt: Pierce PepClean, Agilent OMIX, Phenomenex Strata-X, Waters Sep-Pak and Oasis µElution. Wet (MeCN or MeOH) → equilibrate (0.1% TFA in water) → load → wash (0.1% TFA) → elute (60-80% MeCN/0.1% TFA) → dry. Critical for removing salts that suppress ESI.
Common contaminants
- Keratins. From skin, dust, lab coats. Reduce with gloves, a laminar-flow hood, and blank gel lanes between samples; filter with the cRAP database.
- Polymers (PEG, Triton, polyglycols). From detergents, plasticware. Wash glassware with MeCN.
- Trypsin self-digestion peptides. Inevitable; filtered post-search.
- Streptavidin from biotin pulldowns. Tryptic cleavage products of the bead-bound streptavidin co-elute with the sample.
- BSA from cell culture media or BSA-bead carryover.
LC method optimization
Gradient design
Steeper gradient = faster but fewer IDs; shallower = deeper but slower. Typical:
- 30-min gradient — screening, single-shot QC; ~3500-5000 proteins.
- 60-min — single-shot deep proteome; ~6000-8000 proteins.
- 90-120 min — DDA single-shot best; ~8000-10000 proteins.
- 3-h gradient + fractionation (high-pH RP into 6-12 fractions) — phosphoproteomics with deep coverage.
Column: 25-75 cm × 75 µm i.d. fused-silica with C18 1.6-1.9 µm particles (IonOpticks Aurora, Thermo PepMap, Waters HSS T3). Pulled-tip emitter integrated. Column temperature 50-60 °C for sharper peaks.
Flow rate 200-400 nL/min for nano; 1-5 µL/min for micro-flow (higher robustness; ~2× sensitivity loss). Evosep One: pre-formed gradient from disposable tips (Evotips) → 30, 60, 100, 200 samples-per-day workflows; high reproducibility but lower depth than custom gradient.
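The samples-per-day figures translate directly into a total cycle time per injection. A small sketch of that arithmetic; the cycle time includes loading and washing overhead, so the usable gradient is somewhat shorter than the number printed.

```python
def minutes_per_sample(samples_per_day):
    """Total instrument time available per injection at a given throughput."""
    return 24 * 60 / samples_per_day

def days_for_cohort(n_samples, samples_per_day):
    """Instrument days needed for a cohort, ignoring QC injections and downtime."""
    return n_samples / samples_per_day

for spd in (30, 60, 100, 200):
    print(spd, "SPD ->", round(minutes_per_sample(spd), 1), "min per sample")
print(days_for_cohort(1200, 60), "days for 1200 samples at 60 SPD")
```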
Two-column setup
Loading on a short trap (3 cm × 100 µm; 3 µm C18) at high flow (5-10 µL/min) → desalt → backflow elute onto analytical column at nano flow. Reduces analysis time and protects analytical column.
Direct injection (single-column)
Skips trap; injects directly onto analytical column. Faster cycle; better peak shape if loading conditions optimized.
Data analysis pipelines in detail
FragPipe (MSFragger)
Open-source pipeline:
- Convert RAW → mzML (msconvert, ProteoWizard) or use directly.
- MSFragger search with target-decoy.
- PeptideProphet / iProphet for re-scoring.
- ProteinProphet for protein-level FDR.
- IonQuant for label-free quant or TMT-Integrator for TMT.
- Philosopher for FDR filtering and report output to TSV/Excel.
Strengths: speed (5-10× MaxQuant), advanced features (open-search PTM discovery, immunopeptidomics, glycoproteomics with FragPipe-Glyco), DIA support.
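The target-decoy idea behind the search step can be illustrated with a few lines of code. This is a simplified sketch that estimates FDR as decoys/targets at each score threshold and converts it to q-values; it is not the mixture-model approach used by PeptideProphet, and the scores are invented.

```python
def q_values(psms):
    """psms: list of (score, is_decoy), higher score = better.
    Returns (score, is_decoy, q) tuples sorted by descending score."""
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    fdrs, targets, decoys = [], 0, 0
    for _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))
    # q-value: the lowest FDR at this threshold or any more permissive one
    qs, running_min = [], float("inf")
    for fdr in reversed(fdrs):
        running_min = min(running_min, fdr)
        qs.append(running_min)
    qs.reverse()
    return [(s, d, q) for (s, d), q in zip(ranked, qs)]

psms = [(9.1, False), (8.7, False), (8.0, False),
        (7.5, True), (7.2, False), (6.0, True)]
scored = q_values(psms)
accepted = [s for s, d, q in scored if not d and q <= 0.01]
print([q for _, _, q in scored], accepted)
```

With so few PSMs the estimate is coarse; real searches contain enough decoys that a 1% threshold is meaningful.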
MaxQuant
Cox-Mann 2008+. Powerful default but slow (overnight for 20-30 raw files on 12-core workstation). Andromeda search; protein groups; LFQ with delayed normalization; many statistical helpers in Perseus downstream tool. Still standard for SILAC and many TMT labs.
Spectronaut
Biognosys commercial. Excellent UX; directDIA (library-free); deep-learning Pulsar predicted libraries; cross-run alignment. Industrial / clinical standard for DIA.
DIA-NN
Free for academic use (source code public through version 1.8.1). ML-trained scoring (RT, ion mobility, MS2 intensity prediction). DIA-NN 1.8+ supports timsTOF dia-PASEF, Astral data, and multiplexed plexDIA workflows. By 2024 effectively the field default for DIA processing in academic labs.
Skyline
MacCoss UW. Open-source. Originally for SRM/PRM; now also DIA. Used for targeted assay development, biomarker verification, ultra-precise quant of small panels (e.g., 20-100 proteins across hundreds of clinical samples).
Statistical post-processing
- Perseus (Cox-Mann). t-tests, ANOVA, PCA, hierarchical clustering, GO/KEGG enrichment.
- MSstats (Choi-Vitek). Linear-mixed-effects for cross-condition comparisons; gold standard for label-free DIA.
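- limma. Borrowed from microarray analysis; empirical Bayes moderated t-tests on protein-level intensities.
- DEqMS (Zhu-Lehtiö). Builds on limma; corrects the variance trend using the number of peptides/PSMs quantified per protein.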
- R/Bioconductor (DEP, prolfqua, QFeatures).
- Python (pyOpenMS, AlphaPeptStats).
Imaging mass spectrometry
MALDI imaging
Spatial mass spec on tissue. Sections (10 µm cryostat) + matrix (CHCA, DHB, SA — sprayed via HTX TM-Sprayer or sublimation) → MALDI laser raster → image of m/z vs (x,y).
Resolution: ~10-50 µm (limited by laser spot, matrix crystal size). Bruker rapifleX TissueTyper, timsTOF flex, Waters MALDI Synapt. Software: SCiLS Lab (Bruker), HD-Imaging (Waters), Cardinal MSI (R).
Applications: drug distribution in tissue, tumor margins (lipidomic distinction), spatial metabolomics, peptide imaging (with on-tissue digestion).
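The "image of m/z vs (x,y)" is just a per-pixel sum of intensity inside a narrow m/z window. A minimal sketch, assuming spectra are already centroided and stored per pixel; the data layout and values are hypothetical, and tools such as SCiLS Lab or Cardinal handle this at scale.

```python
def ion_image(pixels, target_mz, tol_ppm, width, height):
    """pixels: dict mapping (x, y) -> list of (mz, intensity) peaks.
    Returns a height x width grid of summed intensity within the ppm window."""
    tol = target_mz * tol_ppm / 1e6
    image = [[0.0] * width for _ in range(height)]
    for (x, y), peaks in pixels.items():
        image[y][x] = sum(i for mz, i in peaks if abs(mz - target_mz) <= tol)
    return image

pixels = {
    (0, 0): [(500.004, 10), (612.300, 5)],
    (1, 0): [(500.020, 7)],                 # outside the 20 ppm window
    (0, 1): [(499.995, 3), (500.001, 4)],
    (1, 1): [],
}
image = ion_image(pixels, target_mz=500.0, tol_ppm=20, width=2, height=2)
print(image)
```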
DESI-MS imaging
Ambient ionization with desorbing solvent spray. Lower spatial resolution (~50-100 µm) but no matrix needed. A related ambient method, REIMS (rapid evaporative ionization MS), underlies the iKnife (Takats, Imperial): surgical mass spec for real-time tumor margin identification during oncology surgery.
MIBI, IMC (imaging mass cytometry)
Antibody-based; metal-tagged antibodies read out by laser ablation (IMC) or primary ion-beam sputtering (MIBI). ~40-100 markers per tissue section at ~1-µm resolution. Targeted spatial proteomics, limited to the antibody panel (vs MALDI’s untargeted readout at ~10 µm).
Proteogenomics
Integrate genomics (variant calling), transcriptomics (isoform inference), and proteomics. Custom protein database from RNA-seq translated ORFs → search MS data → identify peptides corresponding to single-amino-acid variants (SAAVs), novel splice junctions, fusion proteins, neoantigens.
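The database-construction step can be sketched for the SAAV case: apply the substitution to the reference sequence, digest both versions in silico, and keep the tryptic peptides that exist only in the variant. This is a minimal illustration assuming full tryptic cleavage with no missed cleavages; the function names and the short example sequence are for illustration only.

```python
import re

def tryptic_peptides(seq, min_len=7):
    """Cleave C-terminal to K/R, but not before P; no missed cleavages."""
    pieces = re.split(r"(?<=[KR])(?!P)", seq)
    return [p for p in pieces if len(p) >= min_len]

def variant_peptides(ref_seq, position, alt_aa, min_len=7):
    """Tryptic peptides found only after substituting alt_aa at a 1-based position."""
    i = position - 1
    var_seq = ref_seq[:i] + alt_aa + ref_seq[i + 1:]
    ref = set(tryptic_peptides(ref_seq, min_len))
    return [p for p in tryptic_peptides(var_seq, min_len) if p not in ref]

REF = "MTEYKLVVVGAGGVGKSALTIQLIQNHFVDEYDPTIEDSYR"  # short example sequence
print(variant_peptides(REF, 12, "D"))  # G -> D at position 12
```

Only these variant-specific peptides need to be appended to the search database; a substitution to or from K/R changes the cleavage pattern and is handled by the same digest.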
CPTAC (Clinical Proteomic Tumor Analysis Consortium): multi-cancer proteogenomic atlases. See genetics-and-genomics for genome-level methodology.
Neoantigen discovery
Cancer-specific somatic mutations create neopeptides presented on MHC. Workflow:
- Tumor + normal exome → call somatic SNVs.
- RNA-seq → confirm expression.
- MHC binding prediction (NetMHCpan, MHCflurry).
- MS validation — immunopeptidomics: HLA-IP (antibody pulldown of MHC complexes from tumor lysate) → MS identification of presented peptides.
- Cross-reference predicted neoantigens with MS-confirmed presented peptides.
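The final cross-referencing step amounts to enumerating candidate peptides that span the mutated residue and intersecting them with the immunopeptidomics identifications. A minimal sketch, assuming MHC class I lengths of 8-11 residues; the protein sequence and the observed peptide set are hypothetical, and binding prediction is left to the tools named above.

```python
def mutant_windows(protein, mut_index, lengths=(8, 9, 10, 11)):
    """All peptides of the given lengths that contain the 0-based mutated index."""
    out = set()
    for k in lengths:
        first = max(0, mut_index - k + 1)
        last = min(mut_index, len(protein) - k)
        for start in range(first, last + 1):
            out.add(protein[start:start + k])
    return out

protein = "ACDEFGHIKLMNPQRSTVWY"   # hypothetical mutant protein, mutation at index 10
candidates = mutant_windows(protein, mut_index=10, lengths=(9,))
observed = {"FGHIKLMNP", "AAAAAAAAA"}   # hypothetical MS-identified HLA peptides
confirmed = candidates & observed
print(len(candidates), confirmed)
```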
Companies: Gritstone bio, BioNTech (mRNA cancer vaccine pipeline), Moderna (mRNA-4157, partner Merck), NEC/Vaximm, Genocea (closed), Personalis (informatics).
Quality control and reproducibility
Internal standards
- iRT peptides (Biognosys). 11 synthetic peptides spanning RT range; spike-in for retention-time normalization across runs and labs.
- PROCAL. 40-peptide retention-time + intensity standard.
- Pierce HeLa standard. Whole-cell digest reference for column / instrument QC.
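Retention-time normalization with spiked-in standards is, at its simplest, a linear fit between observed RT and the reference scale. The sketch below assumes a linear gradient and uses invented reference values, not the actual iRT kit values.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

# Hypothetical reference-scale values and the RTs (min) observed in one run.
irt_reference = [-20.0, 0.0, 30.0, 60.0, 100.0]
observed_rt = [12.0, 20.0, 32.0, 44.0, 60.0]

slope, intercept = fit_line(observed_rt, irt_reference)

def to_irt(rt_min):
    """Map an observed retention time onto the run-independent scale."""
    return slope * rt_min + intercept

print(slope, intercept, to_irt(36.0))
```

Refitting per run absorbs gradient and column differences, so peptides from different runs or labs can be compared on one scale.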
Repeated injections
Inject HeLa standard at start of week, every ~24 samples, and at end. Track ID counts, RT drift, MS1 mass accuracy, peak shape. CV across runs <10% for top half of intensity range.
Carryover
Wash blanks between samples. Carryover from very abundant peptides (BSA, mAb candidates) can persist 3-5 blanks; monitor with control mAb. Wash the trap and analytical columns independently.
Cross-lab reproducibility
ABRF sPRG (Association of Biomolecular Resource Facilities Proteomics Standards Research Group) annual studies; ProteomeXchange-coordinated multi-lab benchmarks; HUPO Proteomics Standards Initiative.
Affinity-based proteomics platforms — extended
Olink Proximity Extension Assay (PEA)
Pairs of antibodies conjugated to complementary single-stranded DNA tags; dual binding brings the tags into hybridization range → extension by DNA polymerase creates a reporter amplicon, quantified by qPCR (Olink Target 96) or NGS (Olink Explore 3072, Explore HT).
Coverage: Target 96 panels (96 markers); Explore 1536 (1463 markers); Explore HT (5416 markers in 4 panels, NGS readout, single 96-well plate processes 88 samples + 8 controls).
Sensitivity: ~ng/mL to fg/mL range across analytes. Dynamic range ~10⁹.
Reproducibility: CV 5-15% across plate; high inter-lab concordance demonstrated in UK Biobank PPP rollout.
Limitations: requires antibody pair for each analyte → discovery limited by antibody library; many proteins have only one good antibody.
SomaLogic SomaScan
SOMAmers (Slow Off-rate Modified Aptamers): chemically modified ssDNA aptamers carrying hydrophobic side chains on 5-position-modified deoxyuridines (e.g., benzyl-dU) that mimic an antibody contact surface. ~7000 proteins in SomaScan v4.1; ~11,000 in the subsequent 11K assay. Workflow:
- Incubate the SOMAmer mix with diluted plasma so each SOMAmer binds its target protein.
- Capture protein-SOMAmer complexes on streptavidin beads.
- Photocleave biotin linker; elute SOMAmer.
- Hybridize to printed array or NGS readout.
Pros: huge coverage, single-platform. Cons: SOMAmer cross-reactivity sometimes higher than mAb pair; “noise floor” issues for very-low-abundance analytes.
Comparison
UK Biobank (UKB-PPP) profiled ~50k participants with Olink Explore 3072 by 2023 (Sun-Surendran et al. 2023 Nature). deCODE (Iceland) cohorts use SomaScan. Cross-platform comparisons (Pietzner-Wheeler-Langenberg 2021 Science) show modest concordance (~50-70%) for high-abundance analytes, lower for low-abundance, which drives demand for ground-truth MS comparator panels.
Other affinity platforms
- Alamar NULISA. Nucleic acid linked immuno-sandwich assay; competing with Olink.
- Quanterix Simoa. Single-molecule array; ultra-high sensitivity (fg/mL); per-analyte assay (low multiplex).
- MSD (Meso Scale Discovery). Electrochemiluminescence; multiplexed; clinical assays.
- Luminex xMAP, Bio-Plex. Bead-based multiplex immunoassay; mid-multiplex (<100).
- NanoString nCounter, GeoMx, CosMx. Hybridization-based; multiplexed; spatial.
Targeted assays and PRM panels
PRM panel development workflow
- Define target proteins (~20-200).
- In silico tryptic digest; pick 2-5 proteotypic peptides per protein (unique to target, no PTM site, no missed cleavage, no Met/Cys when possible, 8-25 aa).
- Spike in heavy-labeled synthetic peptide standards (Sigma AQUA, JPT SpikeTides).
- Set retention-time scheduling plus dwell time (QQQ) or injection time and resolution (Orbitrap).
- Validate on QC pools; verify linearity, LLOQ, ULOQ, intra/inter-run CV.
- FDA bioanalytical method validation for clinical use: accuracy ±15% (20% at LLOQ), precision <15%.
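The acceptance criteria in the last step can be checked mechanically for each QC level. A minimal sketch using the limits quoted above (accuracy within ±15%, ±20% at LLOQ; precision below 15% CV); the function name and the replicate values are hypothetical.

```python
from statistics import mean, stdev

def passes_validation(measured, nominal, is_lloq=False):
    """Return (ok, bias_percent, cv_percent) for one QC concentration level."""
    accuracy_limit = 20.0 if is_lloq else 15.0
    bias = 100.0 * (mean(measured) - nominal) / nominal
    cv = 100.0 * stdev(measured) / mean(measured)
    ok = abs(bias) <= accuracy_limit and cv < 15.0
    return ok, bias, cv

mid_qc = [9.6, 10.3, 10.1, 9.9, 10.4]   # replicate results, nominal 10 ng/mL
low_qc = [11.7, 11.9, 11.8]             # biased about +18% against nominal 10

print(passes_validation(mid_qc, 10.0))
print(passes_validation(low_qc, 10.0))
print(passes_validation(low_qc, 10.0, is_lloq=True))
```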
Clinical PRM examples
Apolipoproteins (ApoA-I, ApoB, ApoC3, ApoE) — cardiovascular risk; LDL-cholesterol estimation by isoform-resolved ApoB. SISCAPA (stable-isotope standards capture by anti-peptide antibody) — Anderson 2004; enriches target peptides before LC-MS for ultra-low-abundance plasma proteins.
Distinguishing from immunoassay-based clinical chemistry
MS-based targeted assays (PRM, SRM) report peptide-level signal, immune to many immunoassay interferences (HAMA, heterophile antibodies, biotin interference). Increasingly preferred in clinical-chemistry-grade plasma assays — Mayo Clinic, ARUP, Quest, Labcorp run dozens of MS-LDTs.
Software ecosystem
File format zoo
- Raw vendor format — .raw (Thermo), .wiff (Sciex), .d (Bruker, Agilent), .lcd (Shimadzu).
- Open standards — mzML, mzXML (deprecated), mzMLb (compressed; HDF5-backed), Mascot Generic Format (MGF), mzTab (results), mzIdentML (identification).
- Spectral libraries — NIST .msp, BiblioSpec .blib, OpenSWATH .pqp, OpenMS .sqMass.
- Open community pipelines. Snakemake/Nextflow-wrapped MaxQuant/FragPipe; nf-core/proteomicslfq, nf-core/quantms.
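Of the open formats listed, MGF is simple enough to read by hand, which makes it a useful illustration of what a peak-list file contains. The parser below is a minimal sketch that handles only the common `BEGIN IONS` / `KEY=value` / peak-line layout; for real work a maintained library such as pyOpenMS is the safer choice.

```python
def parse_mgf(text):
    """Parse MGF text into a list of {'params': dict, 'peaks': [(mz, intensity)]}."""
    spectra, current = [], None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        if line == "BEGIN IONS":
            current = {"params": {}, "peaks": []}
        elif line == "END IONS":
            if current is not None:
                spectra.append(current)
            current = None
        elif current is not None:
            if "=" in line and not line[0].isdigit():
                key, value = line.split("=", 1)   # keep any '=' inside the value
                current["params"][key] = value
            else:
                fields = line.split()
                current["peaks"].append((float(fields[0]), float(fields[1])))
    return spectra

example = """BEGIN IONS
TITLE=scan=1
PEPMASS=421.76 12000
CHARGE=2+
175.119 300.0
288.203 150.5
END IONS
"""
spectra = parse_mgf(example)
print(spectra[0]["params"], spectra[0]["peaks"])
```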
Cloud platforms
- Genentech/Roche proteomic platforms (internal).
- Vermont Proteomics, Indiana Proteomics, Broad Proteomics (academic core facilities with cloud-deployable pipelines).
- AWS S3-backed PRIDE submissions — modern data-deposition rebuild.
- DIA-NN-PIB, AlphaPept on cloud — emerging compute-as-a-service.
AI in proteomics
- Prosit, AlphaPeptDeep, MS²PIP. Peptide MS2 spectrum (and RT) prediction → in silico spectral libraries for DIA and rescoring of DDA searches, without empirical library runs.
- DIA-NN ML-scoring — neural-net classifier on RT, ion-mobility, MS2.
- PEPNet, DeepNovo, Casanovo (Bittremieux), InstaNovo — de novo peptide sequencing from MS2 without database; growing importance for non-model organisms and immunopeptidomics.
- AlphaFold for proteomics. Inference of complex-partner identity, PTM site context, drug-binding-pocket priors.
Major proteome atlas projects
Human Proteome Project (HUPO)
Global effort to catalog every human gene product. As of 2024, ~93% of ~20,000 canonical human proteins have strong protein-level evidence, largely from MS-detected peptides. Last few percent (short ORFs, very-low-abundance proteins, tissue-restricted proteins) remain “missing.” neXtProt, Human Protein Atlas (Uhlén-KTH; tissue immunohistochemistry + MS quant), ProteomicsDB (Kuster-TUM) coordinate the catalog.
ProteomeXchange Consortium
PRIDE, MassIVE, jPOST, iProX, Panorama Public — federated repository network. ~30,000 datasets total by 2024. Every published proteomics study now expected to submit raw data + metadata.
CPTAC pan-cancer
Coordinated proteogenomics of TCGA cancer cohorts (breast, ovarian, colon, glioma, lung adenocarcinoma, lung squamous, endometrial, kidney, head and neck, pediatric, pancreatic) — public datasets enabling tumor-classification, drug-target identification, neoantigen catalog work.
Pan-tissue and pan-species
GTEx-Pro (tissue proteomics from GTEx donor tissues), Tabula Sapiens proteomics (single-cell-resolved), Mouse Cell Atlas proteomics, model-organism efforts in yeast (Aebersold-Wenger-Mann), C. elegans, Drosophila, zebrafish.
Future directions
Single-cell DIA on Astral and timsTOF SCP
3000-5000 proteins per cell with 5-min gradient injected directly (no carrier). Throughput ~10k cells per week per instrument.
Spatial proteomics integration
Combine MALDI imaging, IMC, CODEX, MIBI with single-cell-resolved spatial proteomics; integration with spatial transcriptomics (Visium, Slide-seq, Stereo-seq).
Plasma cell-free proteomics
Cell-of-origin assignment from plasma protein composition (cancer biomarker discovery; pregnancy maturation; transplant rejection). Combine with cfDNA methylation for full liquid-biopsy panel.
MS + structural biology integration
Cross-linking MS + cryo-EM + AI structure prediction → integrative models of macromolecular machines. ColabFold, AlphaFold-Multimer + crosslink restraints; XL-MS + native MS for stoichiometry confirmation.
Multi-modal omics
Integrate proteomics + transcriptomics + metabolomics + lipidomics + epigenomics on the same sample. Single-cell CITE-seq (RNA + ~100 surface proteins via DNA-tagged antibodies) is the most accessible version of this; mass-spec-based multi-omics for deeper proteome layers is emerging.
Common pitfalls and how to avoid them
Statistical pitfalls
- Multiple-testing correction. Bonferroni too conservative; Benjamini-Hochberg FDR (typically 5%) standard.
- Batch effects. Plate position, instrument run order, processing date — all confounders. Use ComBat, Harmony, limma’s removeBatchEffect; randomize sample order; include batch QC.
- Inadequate replication. N=3 biological replicates minimum for any quantitative claim; N=5+ for clinical comparisons. Power calculations recommended.
- Dispersion estimation. With small N, variance is poorly estimated; empirical Bayes (limma, MSstats) borrows information across proteins.
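The Benjamini-Hochberg correction named in the first bullet is short enough to write out. A minimal sketch with invented p-values; statistical packages such as limma or MSstats apply the same adjustment internally.

```python
def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

pvals = [0.001, 0.008, 0.039, 0.041, 0.20]
adj = benjamini_hochberg(pvals)
significant = [p for p, q in zip(pvals, adj) if q <= 0.05]
print(adj, significant)
```

Note that the proteins with raw p = 0.039 and 0.041 pass an uncorrected 0.05 cutoff but not the 5% FDR threshold.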
Workflow pitfalls
- Trypsin autolysis. Trypsin self-digest peptides appear at known m/z (842.51, 906.50, 1006.49 as [M+H]⁺) and contaminate every run; flag and filter them.
- Keratin. From skin, hair, dust. Use cRAP database to filter; gloves + clean bench essential.
- Wrong reference proteome. Mouse samples searched against human, isoform-aware searches against canonical-only, novel-organism samples against poorly-annotated databases.
- Sloppy peptide search settings. Too-narrow precursor tolerance (miss matches), too-permissive PTM variable mods (combinatorial explosion + FDR violation).
- Carryover. Between-sample washes essential; HeLa carryover into clinical sample contaminates results.
- Incorrect TMT channel assignment. Mislabeling a 16-plex destroys the experiment; orthogonal validation of channel-to-sample mapping mandatory.
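The trypsin self-digest masses quoted in the first bullet can be flagged with a simple tolerance match. A minimal sketch, assuming singly charged peaks and a fixed Da tolerance (the listed values are given to only two decimals); the observed peak list is hypothetical.

```python
AUTOLYSIS_MZ = (842.51, 906.50, 1006.49)   # values quoted in the bullet above

def is_autolysis_peak(mz, tol_da=0.02):
    """True if mz lies within tol_da of a listed trypsin self-digest peak."""
    return any(abs(mz - ref) <= tol_da for ref in AUTOLYSIS_MZ)

observed = [842.509, 906.70, 1006.487, 1200.60]
flags = [is_autolysis_peak(mz) for mz in observed]
print(list(zip(observed, flags)))
```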
Reporting pitfalls
- Reporting “fold change” without uncertainty. A fold change of 2.0 with CI [1.95, 2.05] differs from 2.0 with CI [0.5, 8.0].
- Failing to deposit raw data. Required by most journals via ProteomeXchange.
- Incomplete metadata. Specify instrument, gradient, search engine, version, parameter set, database, FDR threshold.
- Cherry-picking proteins. Validate selected hits orthogonally (Western, PRM, immunohistochemistry) before drawing biological conclusions.
Mass spec for drug discovery
Mechanism of action (MoA)
CETSA (Cellular Thermal Shift Assay; Molina-Nordlund 2013 Science): a drug-bound target shifts its thermal-denaturation curve. CETSA-MS, also known as thermal proteome profiling (TPP; Savitski 2014 Science), gives a whole-proteome readout via TMT-MS. Drives target ID for phenotypic-screening hits. Pelago Bioscience commercial.
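The thermal shift itself is the difference in melting temperature (Tm) between treated and vehicle curves. The sketch below estimates Tm by linear interpolation at 50% soluble fraction, a simplification of the sigmoid fitting normally used; the temperatures and fractions are invented.

```python
def melting_temperature(temps, fractions, level=0.5):
    """Interpolate the temperature at which the soluble fraction falls to `level`."""
    for k in range(len(temps) - 1):
        f0, f1 = fractions[k], fractions[k + 1]
        if f0 >= level > f1:
            t0, t1 = temps[k], temps[k + 1]
            return t0 + (f0 - level) * (t1 - t0) / (f0 - f1)
    raise ValueError("curve does not cross the requested level")

temps = [37, 41, 45, 49, 53, 57, 61]                    # °C
vehicle = [1.00, 0.98, 0.90, 0.60, 0.20, 0.05, 0.00]    # soluble fraction
drug = [1.00, 1.00, 0.97, 0.85, 0.55, 0.15, 0.02]

tm_vehicle = melting_temperature(temps, vehicle)
tm_drug = melting_temperature(temps, drug)
print(tm_vehicle, tm_drug, tm_drug - tm_vehicle)
```

A positive shift indicates stabilization on binding; in a proteome-wide experiment the same calculation runs per protein and the shifts are ranked.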
Drug-target deconvolution
ABPP — activity-based protein profiling (Cravatt). Active-site-directed warhead with click-handle (alkyne/azide); pull down labeled proteins after drug treatment; identify by MS. Used for serine hydrolases, kinases, deubiquitinases, metalloproteases.
Selectivity profiling
Kinobeads (Cellzome, acquired by GSK): immobilized broad-spectrum kinase inhibitor cocktail; profile compound’s kinase selectivity by competition. ~250+ kinase coverage per run. KiNativ (ActivX) is a related approach based on ATP/ADP acyl-phosphate probes. Standard for kinase-drug development.
Off-target identification
Photo-affinity labeling (PAL) — photo-crosslinker drug analog + UV → covalently captures binding partners → MS identification.
Pharmacokinetics and metabolites
LC-MS triple-quad SRM in plasma, urine, tissue extracts — gold standard for PK/PD. Metabolite ID — Q-TOF or Orbitrap high-res for unknown identification; SCIEX, Thermo, Agilent platforms in every pharma DMPK lab.
Future outlook for proteomics 2026-2030
- Single-cell proteomics at scale. ~10k cells / week per instrument with Astral/timsTOF SCP; biological-replicate-level statistics in clinical proteomics.
- Plasma deep-proteome routine. Olink + SomaScan + MS-PRM converge on a “plasma reference panel” of ~5000-10,000 routinely-measurable proteins across population cohorts.
- Spatial proteomics. MALDI imaging + tissue-section MS at ~5-10 µm resolution; integration with H&E histology and IHC.
- Multi-omics integration. Single-cell RNA + protein + lipid combined readouts.
- AI in spectral interpretation. De novo peptide sequencing (Casanovo, InstaNovo); ML-augmented database search; PTM-discovery automated.
- Cross-lab standardization. ProteomeXchange-coordinated benchmarks; FDA recognition of MS-based clinical biomarkers; CAP, CLIA accreditation streamlining.
- Therapeutic protein QC by intact-mass and native-MS. Routine in biopharma; expanding to gene-therapy AAV and mRNA-LNP characterization.
The field is maturing from a discovery-only methodology to a fully integrated clinical, industrial, and basic-research analytical platform. The 2030s will likely see proteomics complement genomics as a population-scale measurable phenotype.
Further reading
- Aebersold, R., Mann, M. — “Mass-spectrometric exploration of proteome structure and function” Nature 2016, 537:347 — modern survey of the field.
- Cox, J., Mann, M. — “MaxQuant enables high peptide identification rates, individualized p.p.b.-range mass accuracies and proteome-wide protein quantification” Nat Biotechnol 2008, 26:1367 — the MaxQuant paper.
- Olsen, J.V., Mann, M. — “Status of large-scale analysis of post-translational modifications by mass spectrometry” Mol Cell Proteomics 2013, 12:3444.
- Meier, F., Brunner, A.-D., Koch, S., Koch, H., Lubeck, M., Krause, M., Goedecke, N., Decker, J., Kosinski, T., Park, M.A., Mann, M. — “Online parallel accumulation-serial fragmentation (PASEF) with a novel trapped ion mobility mass spectrometer” Mol Cell Proteomics 2018, 17:2534.
- Demichev, V., Messner, C.B., Vernardis, S.I., Lilley, K.S., Ralser, M. — “DIA-NN: neural networks and interference correction enable deep proteome coverage in high throughput” Nat Methods 2020, 17:41.
- Slavov, N. — “Single-cell protein analysis by mass spectrometry” Curr Opin Chem Biol 2021, 60:1.
- Sinz, A. — “Cross-linking/mass spectrometry: a chemical and biological tool box for protein interaction studies” Anal Bioanal Chem 2018, 410:5995.