Structural Biology — Protein Structure, X-ray, Cryo-EM, NMR, AlphaFold
Structural biology determines the three-dimensional atomic coordinates of biological macromolecules — proteins, nucleic acids, lipids, and their assemblies — and uses those coordinates to reason about function, mechanism, dynamics, and intervention. The field rests on four experimental pillars (X-ray crystallography, cryo-electron microscopy, nuclear magnetic resonance, and small-angle scattering) plus a fifth computational pillar that matured rapidly between 2018 and 2024 (deep learning structure prediction, culminating in AlphaFold’s 2024 Nobel Prize). Together these methods now span scales from individual atoms (∼1.2 Å cryo-EM maps of apoferritin; Yip et al. and Nakane et al., 2020) to entire cellular landscapes (cryo-electron tomography of organelles).
Hierarchical organization of protein structure
Linus Pauling and Robert Corey, working at Caltech in 1951, combined X-ray diffraction data from amino acid crystals with rigorous stereochemical reasoning to predict the α-helix and β-sheet; both were confirmed in proteins within a decade. Protein structure is still described at four hierarchical levels, and that vocabulary remains the language of structural biology.
Primary structure — the linear sequence of amino acids joined by peptide bonds.
- 20 standard L-amino acids plus selenocysteine (Sec, the 21st, UGA-recoded with SECIS element) and pyrrolysine (Pyl, the 22nd, in methanogens).
- Peptide bonds are planar (∼180° ω angle, ∼1.32 Å C–N length, partial double-bond character from amide resonance, ∼70 kJ/mol rotational barrier).
- Rotation about the backbone therefore occurs mainly at two torsions per residue: φ (about the N–Cα bond) and ψ (about the Cα–C’ bond).
- The Ramachandran plot of φ vs ψ (G. N. Ramachandran 1963, J. Mol. Biol.) reveals two large allowed regions: right-handed α-helix near (φ = −60°, ψ = −45°) and β-sheet near (φ = −120°, ψ = +120°), plus a small left-handed α region accessible mainly to glycine.
- Modern validation tools — MolProbity (Richardson, Duke), Phenix.ramachandran, wwPDB validation reports — flag outliers as likely modeling errors.
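The φ and ψ angles plotted on a Ramachandran plot are ordinary four-atom torsion angles (φ from C(i−1), N, Cα, C; ψ from N, Cα, C, N(i+1)). A minimal sketch of the standard torsion calculation, assuming plain (x, y, z) tuples in Å; the helper names are illustrative and not taken from any particular package:

```python
import math

Vec = tuple[float, float, float]

def sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Torsion angle p0-p1-p2-p3 in degrees, between -180 and 180 (IUPAC sign)."""
    b0 = sub(p0, p1)                      # bond from p1 back to p0
    b1 = sub(p2, p1)                      # central bond
    b2 = sub(p3, p2)                      # bond from p2 forward to p3
    n = math.sqrt(dot(b1, b1))
    b1 = (b1[0] / n, b1[1] / n, b1[2] / n)
    # Project both outer bonds onto the plane perpendicular to the central bond.
    v = sub(b0, tuple(dot(b0, b1) * c for c in b1))
    w = sub(b2, tuple(dot(b2, b1) * c for c in b1))
    x = dot(v, w)
    y = dot(cross(b1, v), w)
    return math.degrees(math.atan2(y, x))

# Toy geometry: looking down p1 -> p2, the front bond (+x) must turn 90° clockwise
# to eclipse the back bond (+y), so the torsion is +90°.
print(dihedral((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 1.0, 1.0)))
```

Applied to consecutive backbone atoms, values near φ = −60°, ψ = −45° place a residue in the right-handed α region of the plot.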
Secondary structure — local backbone hydrogen-bonded geometry.
- α-helix: 3.6 residues per turn, 1.5 Å rise per residue, i → i+4 H-bond (C=O of i to N–H of i+4); side chains project outward; helix dipole +0.5e at N-terminus.
- β-sheet: extended backbone (φ ≈ −120°, ψ ≈ +120°), interstrand H-bonds; parallel, antiparallel, or mixed; side chains alternate above and below the sheet plane.
- β-turns: four-residue motifs (i to i+3) that reverse chain direction; types I, II, and III were described by Venkatachalam (1968) and extended by Lewis, Momany, and Scheraga (1973); type II strongly favors Gly at position i+2.
- 3-10 helix (i → i+3) and π-helix (i → i+5) are rarer.
- Random coil / loops connect regular elements and often carry functional specificity (antibody CDR loops, enzyme active-site loops).
- DSSP (Kabsch and Sander, Biopolymers 1983) remains the standard assignment algorithm; STRIDE (Frishman-Argos) and PROSS are alternatives.
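DSSP’s hydrogen-bond test is a simple electrostatic model: partial charges of 0.42e on the C=O group and 0.20e on the N–H group give E = 0.084 × 332 × (1/r_ON + 1/r_CH − 1/r_OH − 1/r_CN) kcal/mol, and a bond is assigned when E < −0.5 kcal/mol. A minimal sketch of that energy term, assuming coordinates in Å; the function names are illustrative, and the usage coordinates are an idealized linear geometry rather than a real structure:

```python
import math

# DSSP electrostatic model: q1 = 0.42e (C=O), q2 = 0.20e (N-H), f = 332 kcal·Å/mol.
Q1_Q2_F = 0.42 * 0.20 * 332.0
HBOND_CUTOFF = -0.5  # kcal/mol; energies below this count as a hydrogen bond

def dssp_hbond_energy(n, h, c, o) -> float:
    """Energy (kcal/mol) of an N-H ... O=C pair; atoms are (x, y, z) tuples in Å."""
    r_on = math.dist(o, n)
    r_ch = math.dist(c, h)
    r_oh = math.dist(o, h)
    r_cn = math.dist(c, n)
    return Q1_Q2_F * (1 / r_on + 1 / r_ch - 1 / r_oh - 1 / r_cn)

def is_hbond(n, h, c, o) -> bool:
    return dssp_hbond_energy(n, h, c, o) < HBOND_CUTOFF

# Idealized linear geometry: N-H 1.0 Å, N...O 2.9 Å, C=O 1.23 Å.
n, h, o, c = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.9, 0.0, 0.0), (4.13, 0.0, 0.0)
print(round(dssp_hbond_energy(n, h, c, o), 2))  # about -2.9 kcal/mol
print(is_hbond(n, h, c, o))                      # True
```

Helices and sheets are then assigned from patterns of such bonds (for example, repeated i → i+4 bonds for an α-helix).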
Tertiary structure — the full three-dimensional fold of a single chain.
- Organized into domains: independently folding units of typically 50–250 residues.
- Hierarchical classifications: SCOP (Murzin-Brenner-Hubbard-Chothia, MRC LMB), CATH (Christine Orengo, UCL), ECOD (Grishin lab, UT Southwestern), Pfam / InterPro (sequence-family-based).
- Supersecondary motifs and common folds: Greek key (four antiparallel β-strands), Rossmann fold (alternating βαβ units binding NAD(P), Michael Rossmann 1974), TIM barrel (eight βα units, named after triose phosphate isomerase), β-barrel (porins, GFP, retinol-binding protein), four-helix bundle, EF-hand (Ca2+-binding helix-loop-helix pair), helix-turn-helix (DNA-binding).
- Kinase fold (N-lobe β-sheet, C-lobe α-helices, ATP cleft between them; activation loop, DFG motif, αC-helix) is among the most common eukaryotic catalytic domains: ∼518 human kinases make up the “kinome” (Manning et al. 2002).
- Ig-like fold (β-sandwich of two sheets, typically 3–5 strands each) is one of the most abundant extracellular protein domains in vertebrates.
Quaternary structure — assembly of multiple chains into functional complexes.
- Hemoglobin α2β2 tetramer (Perutz 1959–1970, Nobel 1962) — cooperative O2 binding with Hill coefficient ∼2.8.
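- GPCRs: some class A receptors (e.g., μ-opioid) are reported to form dimers and higher oligomers, although the functional role of class A oligomers remains debated; class C receptors (mGluR, GABAB) function as constitutive dimers.
- Immunoglobulin G — Y-shaped H2L2 heterotetramer of two heavy and two light chains linked by disulfides; two Fab arms plus an Fc stem.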
- Proteasome 26S — 2.5 MDa complex of 33 unique subunits; 20S core (α7β7β7α7) + 19S regulatory cap.
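- Eukaryotic 80S ribosome — ∼80 ribosomal proteins (79 in yeast) plus four rRNAs (18S in the 40S small subunit; 28S (25S in yeast), 5.8S, and 5S in the 60S large subunit); ∼3.3 MDa in yeast and ∼4.3 MDa in human.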
- Spliceosome — five snRNPs + over 100 proteins; multiple states (Pre-B, B, Bact, B*, C, C*, P, ILS) characterized by cryo-EM since 2015.
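The cooperativity of the hemoglobin tetramer is often summarized with the Hill equation, Y = pO2^n / (P50^n + pO2^n). A minimal sketch, assuming the Hill coefficient n = 2.8 quoted above, an illustrative P50 of 26 mmHg (a commonly quoted approximate value for adult hemoglobin), and assumed O2 tensions of 100 mmHg in the lungs and 20 mmHg in working muscle; it compares hemoglobin with a non-cooperative (n = 1) binder of the same P50:

```python
def hill_saturation(p_o2: float, p50: float = 26.0, n: float = 2.8) -> float:
    """Fractional O2 saturation from the Hill equation (pressures in mmHg)."""
    return p_o2 ** n / (p50 ** n + p_o2 ** n)

LUNGS, MUSCLE = 100.0, 20.0  # mmHg; assumed arterial and working-muscle O2 tensions
for label, n in (("cooperative, n = 2.8", 2.8), ("non-cooperative, n = 1", 1.0)):
    unloaded = hill_saturation(LUNGS, n=n) - hill_saturation(MUSCLE, n=n)
    print(f"{label}: fraction of sites unloaded = {unloaded:.2f}")
```

The steeper sigmoidal curve lets the cooperative tetramer release a much larger share of its bound O2 over the same pressure drop.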
Forces stabilizing the native state
Several non-covalent interactions cooperate to stabilize the folded state, each with characteristic energetics and geometric requirements.
- Hydrophobic effect — Walter Kauzmann (Princeton, “Some Factors in the Interpretation of Protein Denaturation”, Adv. Protein Chem. 1959) — the dominant driving force; burial of nonpolar side chains releases ordered (“iceberg”) water from their surface, increasing solvent entropy. Quantified by transfer free energies (Tanford, Wolfenden, Wimley-White scales): each buried methylene group contributes roughly 3 kJ/mol of stabilization (a favorable, negative ΔG).
- Hydrogen bonds — backbone amide (N–H) to carbonyl (C=O) at ∼3 Å donor–acceptor distance; their net contribution in water is only ∼2–8 kJ/mol each, because in the unfolded state the same groups hydrogen-bond to water instead. They form the geometric lattice of secondary structure.
- Van der Waals (London dispersion) — ∼0.4–4 kJ/mol per contact; folded cores reach close-packing densities of ∼0.74, comparable to crystalline solids.
- Salt bridges — Asp/Glu carboxylate paired with Lys ammonium, Arg guanidinium, or His imidazolium. On the surface, high dielectric screening keeps their contribution to a few kJ/mol; buried salt bridges have stronger Coulombic interactions but pay a large desolvation penalty, so in the folded core they are often net destabilizing.
- Disulfide bonds — Cys–Cys S–S, ∼250 kJ/mol covalent bond energy; formed in oxidizing compartments (ER lumen, bacterial periplasm) with the help of protein disulfide isomerase (PDI) and the bacterial Dsb systems. Their contribution to folding stability comes mainly from restricting the conformational entropy of the unfolded chain, not from the bond energy itself.
- Metal coordination — zinc (zinc fingers with CCHH or CCCC ligation, zinc-dependent metalloproteases, alcohol dehydrogenase); iron–sulfur clusters [2Fe-2S], [3Fe-4S], [4Fe-4S] in electron transfer (ferredoxin) and catalysis (aconitase); magnesium in kinases and ribozymes; copper in cytochrome c oxidase and Cu/Zn SOD; molybdenum in nitrogenase and DMSO reductase.
- π-stacking and cation-π — aromatic rings (Phe, Tyr, Trp, His) stack at ∼3.5 Å with ∼4–8 kJ/mol stabilization; cation-π interactions place a cation (a Lys or Arg side chain, or a ligand’s quaternary ammonium) against an aromatic π face, and aromatic cages use them to recognize choline and acetylcholine.
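The environmental dependence of charge–charge interactions follows from Coulomb’s law with an effective dielectric constant: E = 1389 × q1·q2 / (ε·r) kJ/mol for charges in units of e and r in Å. A minimal sketch, assuming an illustrative 4 Å charge separation, ε ≈ 80 for bulk water, and ε ≈ 4 for a protein interior (common textbook choices); it deliberately ignores desolvation, which is the penalty that offsets the stronger interaction of a buried pair:

```python
COULOMB_KJ_ANGSTROM = 1389.35  # kJ·Å/mol for two unit charges in vacuum

def coulomb_energy(q1: float, q2: float, r: float, eps: float) -> float:
    """Pairwise electrostatic energy in kJ/mol (charges in e, distance in Å)."""
    return COULOMB_KJ_ANGSTROM * q1 * q2 / (eps * r)

r = 4.0  # Å, assumed carboxylate-to-cation separation
for env, eps in (("water, eps = 80", 80.0), ("protein interior, eps = 4", 4.0)):
    print(f"{env}: {coulomb_energy(+1, -1, r, eps):.1f} kJ/mol")
```

The twentyfold difference between the two environments is why the same ion pair can be almost negligible on the surface yet strongly felt, for better or worse, in the core.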
The net free energy of folding is typically only −20 to −60 kJ/mol, a marginal stability comparable to the energy of a few hydrogen bonds. This is no accident: marginal stability enables regulated unfolding (proteasomal degradation, ATP-driven unfoldases like Clp/Hsp104) and conformational signaling (kinases, GPCRs), and it also gives the protein engineer leverage. A handful of point mutations can raise or lower stability by 10–30 kJ/mol, enough to make or break a therapeutic protein’s developability.
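For a two-state protein, the unfolding equilibrium constant is K_U = exp(−ΔG_unfold / RT) and the unfolded fraction is K_U / (1 + K_U). A minimal sketch at an assumed 298 K, showing why even 20 kJ/mol of stability keeps a protein almost entirely folded and how much a 10 kJ/mol destabilization changes that (the ΔG values are illustrative):

```python
import math

R = 8.314e-3  # gas constant, kJ/(mol·K)

def fraction_unfolded(dg_unfold_kj: float, temp_k: float = 298.0) -> float:
    """Two-state unfolded fraction; dg_unfold > 0 means the native state is favored."""
    k_u = math.exp(-dg_unfold_kj / (R * temp_k))  # [U]/[N]
    return k_u / (1.0 + k_u)

for dg in (60.0, 30.0, 20.0, 10.0):
    print(f"dG_unfold = {dg:4.0f} kJ/mol -> unfolded fraction {fraction_unfolded(dg):.1e}")
```

Because the dependence is exponential, dropping from 20 to 10 kJ/mol raises the unfolded population roughly fiftyfold, which is the leverage that stabilizing or destabilizing mutations exploit.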
Folding, misfolding, and chaperones
Anfinsen’s thermodynamic hypothesis. Christian Anfinsen’s bovine pancreatic ribonuclease A refolding experiments at the NIH (published 1961, Nobel Prize in Chemistry 1972 “for his work on ribonuclease, especially concerning the connection between the amino acid sequence and the biologically active conformation”) established that the primary sequence is sufficient to specify the native fold: reduced and denatured RNase A spontaneously refolded in vitro with full enzymatic activity. The native state corresponds to the global free-energy minimum under physiological conditions.
The Levinthal paradox. Cyrus Levinthal (1968) pointed out that a 100-residue protein with only three accessible states for each backbone torsion (φ and ψ, so 200 torsions) has 3^200 ≈ 10^95 conformations (estimates as high as 10^300 are often quoted for larger search spaces); random sampling would take far longer than the age of the universe, yet proteins fold in microseconds to seconds. Resolution: folding proceeds along funnel-like energy landscapes (Wolynes, Onuchic, Bryngelson 1989–1995; Karplus, Dill) — many parallel pathways biased toward the native state — with kinetic intermediates such as the molten globule (Ptitsyn, Dolgikh, Kuwajima 1981–1995), a compact intermediate with native-like secondary structure but loose tertiary packing.
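The arithmetic behind the paradox is easy to reproduce. A minimal sketch, assuming three states for each of the two backbone torsions of 100 residues and a deliberately optimistic sampling rate of one conformation every 0.1 ps (10^13 per second); working in log10 avoids handling the huge integer directly:

```python
import math

residues = 100
states_per_torsion = 3
torsions = 2 * residues                      # phi and psi for each residue
log10_conformations = torsions * math.log10(states_per_torsion)

samples_per_second = 1e13                    # assumed: one conformation per 0.1 ps
age_of_universe_s = 4.35e17                  # about 13.8 billion years

log10_search_time_s = log10_conformations - math.log10(samples_per_second)
print(f"conformations ~ 10^{log10_conformations:.1f}")
print(f"exhaustive search ~ 10^{log10_search_time_s:.0f} s, "
      f"universe age ~ 10^{math.log10(age_of_universe_s):.1f} s")
```

Even with this generous sampling rate, an exhaustive search overshoots the age of the universe by more than 60 orders of magnitude, which is why folding must be a biased, not random, search.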
The cellular folding network. Dedicated chaperones manage folding kinetics and prevent aggregation.
- Hsp70 family — DnaK in E. coli; BiP/HSPA5 in ER; HSPA1A/B (cytosol, stress-induced) and HSC70/HSPA8 (cytosol, constitutive); mtHSP70/mortalin in mitochondria. An ATP-driven substrate binding/release cycle is gated by J-domain cochaperones (DnaJ/Hsp40 family, ∼42 in humans) and nucleotide exchange factors: GrpE (bacteria), the BAG family and HspBP1 (eukaryotic cytosol), GRP170 and Sil1 (ER).
- Chaperonins (Hsp60) — bacterial GroEL/GroES (14-subunit double ring + 7-subunit cap), eukaryotic TRiC/CCT (double ring of eight distinct subunits; folds actin and tubulin). They encapsulate substrates up to ∼60 kDa in a folding chamber for ATP-timed cycles.
- Hsp90 — HSP90AA1/HSP90AB1 (cytosol), TRAP1 (mitochondria), GRP94 (ER). Stabilizes near-native clients including kinases (CDK4, RAF, Akt), steroid receptors (GR, ER, AR), and p53; co-chaperones include HOP, CDC37, AHA1, and p23. A major oncology target, with inhibitors including geldanamycin (natural product), 17-AAG/tanespimycin, ganetespib, AT13387, and BIIB021.
- Prefoldin (PFD1–6 hexamer) — passes nascent actin/tubulin to TRiC/CCT.
- Disaggregases — Hsp104/ClpB (yeast/bacteria), Hsp110 + Hsp70 + Hsp40 (human); reverse aggregation by mechanical pulling.
- Small heat shock proteins (sHsps) — HSPB1 (Hsp27), αA/αB-crystallin (lens), HSPB8; ATP-independent holdases that buffer aggregation under stress.
- ER-specific — calnexin and calreticulin (lectin chaperones for N-glycosylated proteins; CNX cycle with glucosidases I/II and UGGT1 quality-control sensor).
Protein homeostasis (proteostasis) failure and disease. The chaperone and quality-control network comes under increasing strain with aging and stress; its failure underlies the amyloid diseases:
- Alzheimer’s disease — Aβ40/42 plaques + tau neurofibrillary tangles
- Parkinson’s disease — α-synuclein Lewy bodies
- Prion diseases — CJD, kuru, BSE (Stanley Prusiner, Nobel 1997)
- Huntington’s disease — polyQ-expanded huntingtin (CAG repeat expansion in HTT exon 1)
- ALS — TDP-43, SOD1, FUS, C9orf72 dipeptide repeats
- Type II diabetes — islet amyloid polypeptide (IAPP/amylin)
- Systemic amyloidoses — transthyretin (tafamidis, patisiran), AL light chain, AA serum amyloid A
Some proteins exploit kinetic control of folding, relying on metastable states or proteolytic processing:
- Serpins (serine protease inhibitors; antithrombin, α1-antitrypsin, neuroserpin) adopt a metastable native state and undergo a stressed-to-relaxed conformational transition that mousetraps the engaged protease.
- Influenza hemagglutinin HA0 is cleaved to HA1+HA2 to prime pH-triggered fusion peptide release in the endosome.
- HIV Env gp160 → gp120+gp41 cleavage exposes the fusion peptide.
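- Pro-domain folding — many proteases are translated as zymogens with cleavable pro-regions; in some (subtilisin, α-lytic protease) the pro-domain also acts as an intramolecular chaperone required for correct folding, whereas in others (chymotrypsinogen, procaspases) cleavage mainly controls activation.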
Membrane proteins
Roughly 25–30% of human protein-coding genes encode membrane proteins; they account for approximately 60% of drug targets and the substantial majority of small-molecule blockbusters.
α-helical bundles — one or more hydrophobic α-helices (typically 20–25 residues, ∼30 Å — matching the bilayer hydrophobic thickness) thread through the membrane.
- 7-transmembrane (7TM) G-protein-coupled receptor superfamily — ∼800 human members; rhodopsin, β1/β2-adrenergic, and μ/δ/κ-opioid receptors (class A); GLP-1R, GIPR, glucagon, and parathyroid hormone receptors (class B); mGluR, GABAB, CaSR (class C); frizzled and smoothened (class F).
- Potassium channels — Roderick MacKinnon (Rockefeller, Nobel 2003) — the pH-gated KcsA from Streptomyces lividans (the first K+ channel structure, 1998) and voltage-gated (Kv) channels such as Shaker, KvAP, and Kv1.2.
- Voltage-gated sodium channels (Nav) — Nav1.1 to Nav1.9, cardiac Nav1.5; targets for local anesthetics (lidocaine), class I antiarrhythmics, anti-epileptics.
- Voltage-gated calcium channels (Cav) — L-type (targets of dihydropyridines such as amlodipine), T-type, P/Q-type, N-type.
- ClC chloride channels and exchangers; aquaporins (Peter Agre, Johns Hopkins, Nobel 2003 with MacKinnon) — AQP1 in red cells, AQP2 in collecting duct (vasopressin-regulated), AQP4 in astrocytes (the autoantigen in neuromyelitis optica, NMO).
- Transporters: the NSS family (neurotransmitter:sodium symporters; the LeuT crystal structure of Yamashita and Gouaux 2005 is the paradigm), SLC superfamily (456 human members), MFS major facilitator superfamily (GLUTs, lactose permease LacY), ABC transporters (CFTR, MDR1/P-gp, ABCA1, TAP).
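The ∼30 Å span of a 20–25-residue transmembrane helix follows directly from the α-helix rise of 1.5 Å per residue (with 3.6 residues per turn, the pitch is 5.4 Å). A minimal sketch, assuming a straight, ideal helix; the tilt function is a simple geometric illustration (span = length × cos(tilt)), not a prediction of how any real helix packs:

```python
import math

RISE_PER_RESIDUE = 1.5   # Å along the helix axis (ideal α-helix)
RESIDUES_PER_TURN = 3.6

def helix_length(n_residues: int) -> float:
    """Axial length (Å) of an ideal α-helix."""
    return n_residues * RISE_PER_RESIDUE

def membrane_span(n_residues: int, tilt_deg: float = 0.0) -> float:
    """Extent along the bilayer normal of a helix tilted by tilt_deg from it."""
    return helix_length(n_residues) * math.cos(math.radians(tilt_deg))

print("pitch (Å):", round(RISE_PER_RESIDUE * RESIDUES_PER_TURN, 2))
for n in (20, 25):
    print(n, "residues:", helix_length(n), "Å long;",
          round(membrane_span(n, tilt_deg=30.0), 1), "Å across if tilted 30°")
```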
β-barrels are found almost exclusively in outer membranes of Gram-negative bacteria, mitochondria, and chloroplasts.
- Porins (OmpF, OmpC, OmpA, LamB) — passive diffusion of small hydrophilic solutes.
- TonB-dependent transporters (FhuA, BtuB, FepA) — siderophore and vitamin B12 uptake.
- Mitochondrial VDAC channel — primary metabolite gateway across outer membrane.
- BamA / Sam50 — assembly machinery for β-barrels themselves.
- LptD — 26-strand β-barrel that translocates LPS to the outer surface (target of murepavadin antibiotic).
Structural study of membrane proteins requires extraction from the bilayer with care to preserve function.
- Mild non-ionic detergents: dodecyl maltoside (DDM), lauryl maltose neopentyl glycol (LMNG, Chae and Gellman 2010), digitonin, octyl glucoside (OG), Fos-choline, glyco-diosgenin (GDN).
- Nanodiscs — Stephen Sligar (Illinois, 2002) — apoA-I-derived membrane scaffold protein (MSP1D1, MSP1E3, MSP2N2) wrapping a lipid bilayer patch; tuned diameter ∼9–17 nm.
- Amphipathic polymers — amphipols (A8-35; Popot, Paris) replace detergent after extraction and keep membrane proteins soluble in detergent-free solution.
- SMALPs (Styrene-Maleic Acid Lipid Particles) — Tim Knowles (Birmingham, 2009) — co-polymer that extracts native lipid environment around the membrane protein, avoiding any detergent step. DIBMA, zSMA variants improve compatibility with divalent cations and pH.
- Saposin lipid nanoparticles (Salipro) — Salipro Biotech; alternative scaffold.
- Reconstitution into liposomes and giant unilamellar vesicles for functional assays (single-channel recording, transport flux).
X-ray crystallography
Max von Laue established X-ray diffraction by crystals (Nobel 1914); William Henry and William Lawrence Bragg derived Bragg’s law (nλ = 2d sin θ) and worked out the first crystal structures (Nobel 1915). Macromolecular crystallography began with Max Perutz (hemoglobin) and John Kendrew (myoglobin) at the Medical Research Council Laboratory of Molecular Biology in Cambridge in 1958–1962 (Nobel 1962); the first protein structures used isomorphous replacement (heavy-atom soaks like mercury or platinum) to solve the crystallographic phase problem. Dorothy Hodgkin determined penicillin (1945), vitamin B12 (1954), and insulin (1969), winning the 1964 Nobel.
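Bragg’s law ties the diffraction angle to the spacing d of the lattice planes being sampled, and the smallest d recorded at the detector edge sets the resolution. A minimal sketch, assuming first-order reflections (n = 1) and the standard conversion λ(Å) ≈ 12.398 / E(keV) between photon energy and wavelength:

```python
import math

def wavelength_from_energy(energy_kev: float) -> float:
    """X-ray wavelength in Å from photon energy in keV (λ = hc/E)."""
    return 12.398 / energy_kev

def d_spacing(wavelength: float, two_theta_deg: float, order: int = 1) -> float:
    """Bragg's law n·λ = 2·d·sin(θ), solved for d (same unit as wavelength)."""
    theta = math.radians(two_theta_deg / 2.0)
    return order * wavelength / (2.0 * math.sin(theta))

lam = wavelength_from_energy(12.398)   # 1.0 Å, a typical synchrotron wavelength
for two_theta in (10.0, 30.0, 60.0):
    print(f"2θ = {two_theta:4.1f}°  ->  d = {d_spacing(lam, two_theta):.2f} Å")
```

Higher-angle reflections correspond to finer spacings, which is why high-resolution data require recording weak spots far from the beam center.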
The modern pipeline begins with cloning, expression (E. coli for soluble bacterial proteins; Sf9/baculovirus, HEK293, CHO for eukaryotic and glycosylated; in vitro translation for difficult cases), purification (affinity tags — His6, Strep, MBP, GST — followed by ion exchange and size exclusion), crystallization (vapor diffusion in hanging or sitting drops, microbatch under oil, or counter-diffusion in capillaries; ∼500–1500 screen conditions sampling pH, salts, precipitants like PEG or ammonium sulfate, and additives), and cryocooling (transfer to cryoprotectant, plunge into liquid nitrogen; data collection at 100 K reduces radiation damage by ∼70-fold).
Diffraction is now usually collected at synchrotron beamlines: the Advanced Photon Source (APS, Argonne — 7 GeV upgrade APS-U commissioned 2024 with ∼500× brighter beams), the Advanced Light Source (ALS, Berkeley — ALS-U), NSLS-II (Brookhaven), SPring-8 (Hyogo, 8 GeV), the European Synchrotron Radiation Facility (ESRF, Grenoble — EBS upgrade 2020 first fourth-generation hard X-ray source), Diamond Light Source (UK), the Swiss Light Source (SLS, Paul Scherrer Institute — SLS 2.0 upgrade), and PETRA III (DESY Hamburg). Microfocus beamlines (∼1–5 µm spots) handle microcrystals; serial crystallography collects many small crystals to circumvent radiation damage.
Phasing — recovering the lost phase information that diffraction intensities alone do not give — uses one of three strategies. Molecular replacement (used in ∼75% of new structures) computes phases from a homologous starting model (now often an AlphaFold prediction) using Phaser (Read, McCoy), Molrep (CCP4), or AMoRe. Multi-wavelength anomalous diffraction (MAD) and single-wavelength anomalous diffraction (SAD), developed by Wayne Hendrickson and others, exploit the anomalous scattering of incorporated heavy atoms (selenium from selenomethionine, bromine, gold, mercury) at tunable wavelengths. Isomorphous replacement is largely historical. Software ecosystems: CCP4 (Collaborative Computational Project No. 4, UK), Phenix (Paul Adams, LBNL, et al), Refmac (Murshudov), Coot (Emsley) for model building.
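Why intensities alone are not enough, and why anomalous scatterers help, can be seen with a toy one-dimensional structure factor F(h) = Σ f_j exp(2πi·h·x_j). A minimal sketch with hypothetical atom positions and scattering factors (not a real crystal): shifting every atom changes the phases of F but not the amplitudes |F|; with purely real scattering factors, Friedel’s law |F(h)| = |F(−h)| holds; giving one atom an imaginary anomalous component f″ breaks it, producing the Bijvoet differences that SAD and MAD measure:

```python
import cmath

def structure_factor(atoms, h: int) -> complex:
    """Toy 1D structure factor F(h) = sum_j f_j exp(2*pi*i*h*x_j).

    atoms: list of (fractional coordinate x, scattering factor f); f may be complex.
    """
    return sum(f * cmath.exp(2j * cmath.pi * h * x) for x, f in atoms)

# Hypothetical light-atom structure with purely real scattering factors.
light = [(0.10, 6.0), (0.35, 7.0), (0.62, 8.0)]
# The same structure with its origin shifted by 0.2 of the cell.
shifted = [((x + 0.2) % 1.0, f) for x, f in light]
# Add a heavy atom with an assumed anomalous component f'' = 4 (Se-like).
anomalous = light + [(0.80, 34.0 + 4.0j)]

for h in (1, 2, 3):
    fa, fb = structure_factor(light, h), structure_factor(shifted, h)
    bijvoet = abs(structure_factor(anomalous, h)) - abs(structure_factor(anomalous, -h))
    print(f"h={h}: |F| {abs(fa):.3f} vs {abs(fb):.3f}, "
          f"phase {cmath.phase(fa):+.2f} vs {cmath.phase(fb):+.2f} rad, "
          f"|F(h)|-|F(-h)| with anomalous atom = {bijvoet:+.3f}")
```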
X-ray free-electron lasers (XFELs) produced their first protein “diffract-before-destroy” femtosecond data at the Linac Coherent Light Source (LCLS, SLAC Stanford) in 2011 (Henry Chapman et al, Nature 2011): the femtosecond pulse outruns radiation damage, allowing room-temperature serial crystallography of nanocrystals and time-resolved studies of enzyme catalysis. SACLA (Japan, 2012), the European XFEL (Hamburg, 2017), SwissFEL (Paul Scherrer Institute, 2018), and PAL-XFEL (Korea, 2017) joined LCLS; LCLS-II (SLAC, first light 2023, superconducting accelerator, 1 MHz repetition) marks the next generation. Pump-probe XFEL studies have caught photoactive proteins (photoactive yellow protein, bacteriorhodopsin, photosystem II) mid-reaction, with time resolution reaching femtoseconds for the fastest light-triggered steps.
Cryo-electron microscopy
The “resolution revolution” in cryo-EM (Werner Kühlbrandt, Science 2014) won Jacques Dubochet (vitrification), Joachim Frank (single-particle analysis), and Richard Henderson (high-resolution EM of bacteriorhodopsin) the 2017 Nobel Prize in Chemistry.
Samples are vitrified by plunge-freezing into liquid ethane (∼90 K): water is trapped in an amorphous (non-crystalline) glass-like state, preserving native hydration and avoiding ice damage. Grids are perforated carbon (Quantifoil, C-flat) or graphene oxide (GraFutura, Cryosol), usually made hydrophilic by glow discharge. Vitrobot (Thermo Fisher) and Leica EM GP are the standard blot-and-plunge instruments; variability from manual filter-paper blotting has been a chief reproducibility headache, addressed by blot-free dispensing and self-wicking devices (Spotiton and Chameleon; Carragher and Potter).
Microscopes: the Titan Krios (Thermo Fisher / FEI, 300 kV; ∼$8M; ∼130 installed globally by 2024) and the Glacios (200 kV) dominate; JEOL CRYO ARM 300/200 instruments are gaining share in Asia. Direct electron detectors transformed the field by recording electrons directly on a CMOS sensor with single-electron sensitivity and high frame rate: the Gatan K2 (2012) and K3 (2018), the Thermo Fisher Falcon 3 / Falcon 4 / Falcon 4i, and the Direct Electron DE-64 and Apollo. Aligning the movie frames to correct beam-induced sample motion before summing them (MotionCor2, Zheng/Cheng) removes the blur this motion would otherwise cause.
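The core idea of motion correction is to estimate each frame’s drift relative to a reference and shift it back before summing. A minimal one-dimensional sketch using whole-pixel shifts found by circular cross-correlation; the signal and drifts are made up, and real programs such as MotionCor2 work on 2D frames with sub-pixel and local (patch-based) shifts:

```python
def best_shift(reference: list[float], frame: list[float]) -> int:
    """Circular shift s maximizing sum_i reference[i] * frame[(i + s) % n]."""
    n = len(reference)
    scores = {s: sum(reference[i] * frame[(i + s) % n] for i in range(n)) for s in range(n)}
    return max(scores, key=scores.get)

def roll(signal: list[float], s: int) -> list[float]:
    """Circularly shift a signal to the right by s samples."""
    n = len(signal)
    return [signal[(i - s) % n] for i in range(n)]

truth = [0, 0, 1, 3, 7, 3, 1, 0, 0, 0, 2, 5, 2, 0, 0, 0]
drifts = [0, 1, 2, 3]                         # hypothetical drift per frame, in pixels
frames = [roll(truth, d) for d in drifts]

naive = [sum(col) for col in zip(*frames)]    # frames summed without alignment: blurred
aligned = [roll(f, -best_shift(frames[0], f)) for f in frames]
corrected = [sum(col) for col in zip(*aligned)]
print(max(naive), max(corrected))             # 14 28: alignment restores the sharp peak
print(corrected == [4 * v for v in truth])    # True
```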
Image processing pipelines: RELION (Sjors Scheres, MRC LMB, first release 2012; v4 in 2022) implemented Bayesian maximum-likelihood reconstruction; cryoSPARC (Punjani et al, Nature Methods 2017, commercial through Structura) uses stochastic gradient descent and GPU acceleration; EMAN2 (Steve Ludtke, Baylor), cisTEM (Niko Grigorieff, UMass), Frealign, and SPHIRE (Pawel Penczek) are alternatives. CTF estimation (CTFFIND4, Rohou-Grigorieff; Gctf, Zhang), particle picking (Topaz CNN-based, crYOLO), 2D classification, ab initio 3D reconstruction, 3D classification (handling heterogeneity), and final refinement with B-factor sharpening produce density maps; ChimeraX (Pettersen, Goddard, Ferrin), Coot, ISOLDE (Croll), and Phenix.real_space_refine produce atomic models.
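B-factor sharpening compensates for the fall-off of high-resolution amplitudes by multiplying each Fourier amplitude by exp(−B·s²/4), where s = 1/d is the spatial frequency and B is negative when sharpening. A minimal sketch, assuming an illustrative B of −100 Å² (in practice the value is estimated from the data or chosen by the user):

```python
import math

def sharpen_factor(d_angstrom: float, b_factor: float) -> float:
    """Amplitude multiplier exp(-B * s^2 / 4) at resolution d, with s = 1/d in 1/Å."""
    s = 1.0 / d_angstrom
    return math.exp(-b_factor * s * s / 4.0)

B = -100.0  # Å^2; negative B boosts high-resolution amplitudes (assumed value)
for d in (10.0, 5.0, 3.0, 2.5):
    print(f"{d:4.1f} Å: amplitudes x {sharpen_factor(d, B):.2f}")
```

The boost grows steeply toward high resolution, which is also why over-sharpening amplifies noise.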
Single-particle resolution records: 1.54 Å β-galactosidase (Bartesaghi 2018), 1.22 Å apoferritin (Yip et al., using a monochromated microscope, and Nakane et al., using Thermo Fisher’s E-CFEG cold field emitter; two 2020 Nature papers), 1.15 Å apoferritin (Nakane 2020). Routine resolution for well-behaved samples is now 2.5–3.5 Å, comfortably enough to build atomic models.
Cryo-electron tomography (cryo-ET) images intact cells or lamellae cut by focused ion beam (FIB) milling, typically thinning frozen cells to ∼150–250 nm using a cryo-FIB-SEM (Aquilos, Helios, Crossbeam). Tilt series (typically ±60° in 2–3° increments) are reconstructed by weighted back-projection (IMOD, Tomo3D) or iterative methods. Sub-tomogram averaging (Briggs, Förster, Bharat) of repeated complexes — ribosomes translating in cells (Pfeffer-Förster 2014, Hoffmann-Beck 2022), nuclear pore complexes (Kosinski-Beck 2016, Schuller-Beck 2021), HIV capsid (Schur 2016), SARS-CoV-2 spike on virions (Ke-Briggs 2020), and chromatin (Cai 2018) — brings cryo-ET toward near-atomic resolution in situ.
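The tilt scheme quoted above fixes both the number of images in a series and the size of the unsampled “missing wedge” of Fourier space. A minimal sketch, assuming a single tilt axis, a symmetric tilt range, and uniform increments:

```python
def tilt_series(max_tilt_deg: float, step_deg: float) -> list[float]:
    """Tilt angles from -max_tilt to +max_tilt in uniform steps."""
    n_steps = round(2 * max_tilt_deg / step_deg)
    return [-max_tilt_deg + i * step_deg for i in range(n_steps + 1)]

def missing_wedge(max_tilt_deg: float) -> float:
    """Angular extent (degrees) never sampled by a single-axis tilt series."""
    return 180.0 - 2.0 * max_tilt_deg

for step in (2.0, 3.0):
    angles = tilt_series(60.0, step)
    print(f"±60° in {step:.0f}° steps: {len(angles)} images, "
          f"missing wedge {missing_wedge(60.0):.0f}°")
```

The missing wedge is one reason sub-tomogram averaging helps: particles lying in different orientations fill in each other’s unsampled regions.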
Nuclear magnetic resonance
Kurt Wüthrich (ETH Zürich) established sequential resonance assignment and the use of nuclear Overhauser effect (NOE) distances to determine protein structures in solution (Nobel 2002). Richard Ernst (ETH) developed Fourier transform and multidimensional NMR (Nobel 1991). The first complete protein NMR structure (BUSI II proteinase inhibitor, 57 residues) appeared in 1985.
Protein NMR exploits multiple nuclei: 1H (abundant, sensitive), 13C and 15N (require isotope labeling, usually uniform 13C/15N from labeled minimal media), 19F (absent from natural proteins and therefore background-free; sensitive; used for ligand binding), 2H (uniform or selective deuteration improves spectra of larger proteins). Standard experiments: HSQC (heteronuclear single-quantum coherence, 1H–15N or 1H–13C correlation — the “fingerprint” of a folded protein), HNCA / HNCACB / CBCA(CO)NH triple-resonance backbone assignment, NOESY for through-space distance restraints (up to ∼5–6 Å), J-coupling for dihedral angles, and residual dipolar couplings (RDCs, measured in aligned media) for orientational restraints.
Solution NMR structures are deposited in the PDB, with chemical shifts and other NMR data archived in the BMRB (Biological Magnetic Resonance Bank). The size limit was ∼20–25 kDa until the late 1990s, when TROSY (transverse relaxation-optimized spectroscopy, Wüthrich 1997) exploited relaxation interference between dipole-dipole and chemical shift anisotropy (CSA) contributions to sharpen lines for proteins up to 50–80 kDa; methyl-TROSY of Ile/Leu/Val methyl groups extends NMR to even larger, MDa-scale complexes such as GroEL.
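The size limit arises because larger molecules tumble more slowly, lengthening the rotational correlation time τc and broadening lines through faster transverse relaxation. A minimal Stokes–Einstein–Debye sketch, τc = 4πηr³ / (3kT), assuming a spherical protein of density 1.35 g/cm³ with an added 3 Å hydration shell in water at 25 °C; all of these are rough assumptions, so the output is an order-of-magnitude guide only:

```python
import math

AVOGADRO = 6.022e23   # 1/mol
K_B = 1.380649e-23    # J/K

def tau_c_ns(mass_kda: float, temp_k: float = 298.0, viscosity_pa_s: float = 0.89e-3,
             density_g_cm3: float = 1.35, hydration_nm: float = 0.3) -> float:
    """Rotational correlation time (ns) of a hydrated sphere (Stokes-Einstein-Debye)."""
    mass_g = mass_kda * 1e3 / AVOGADRO
    volume_m3 = mass_g / density_g_cm3 * 1e-6            # cm^3 -> m^3
    radius_m = (3 * volume_m3 / (4 * math.pi)) ** (1 / 3) + hydration_nm * 1e-9
    tau_s = 4 * math.pi * viscosity_pa_s * radius_m ** 3 / (3 * K_B * temp_k)
    return tau_s * 1e9

for kda in (10, 25, 50, 100):
    print(f"{kda:4d} kDa: tau_c ~ {tau_c_ns(kda):.0f} ns")
```

τc grows roughly in proportion to mass, so doubling the size roughly doubles the relaxation-driven line broadening that TROSY was designed to counter.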
Solid-state NMR (ssNMR) under magic-angle spinning (MAS) tackles membrane proteins and amyloids that resist crystallization. Bob Griffin (MIT), Hartmut Oschkinat (Berlin), Marc Baldus (Utrecht), Beat Meier (ETH), and Chad Rienstra (Wisconsin) have driven the field. Dynamic nuclear polarization (DNP-NMR) boosts sensitivity 20–80×. ssNMR has produced structures of α-synuclein fibrils, prion fragments, KcsA, Aβ amyloids, and bacteriorhodopsin in lipid bilayers.
NMR’s distinctive strengths: solution conditions, dynamics across timescales (ps to seconds via R1, R2, R1ρ, RDCs, ZZ exchange, CPMG, CEST), and exquisite reporting on intrinsically disordered proteins (IDPs) where crystallography and cryo-EM struggle.
Small-angle scattering and other complementary methods
Small-angle X-ray scattering (SAXS) and small-angle neutron scattering (SANS).
- Report on overall shape, oligomeric state, radius of gyration (Rg), maximum dimension (Dmax), and conformational ensembles in solution at ∼10–30 Å resolution.
- Software: CRYSOL (fit atomic models to 1D curves, Svergun), FoXS (Sali lab), EOM ensemble optimization (Bernadó-Svergun), GNOM (pair distance distribution function P(r)), DAMMIF and DAMMIN (ab initio bead models), GASBOR (chain-like).
- BioSAXS beamlines: B21 at Diamond, BM29 at ESRF, SIBYLS at ALS, P12 at PETRA III, LiX at NSLS-II, SWING at SOLEIL, BL4-2 at SSRL.
- Time-resolved SAXS catches large conformational changes on millisecond–second timescales using rapid mixers (Bio-Logic SFM) or laser triggers.
- SANS at ILL Grenoble, ORNL HFIR/SNS, ISIS UK; deuterium contrast variation lets you “highlight” specific subunits in a complex.
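Rg is commonly estimated from the low-q region with the Guinier approximation, ln I(q) ≈ ln I(0) − (Rg²/3)·q², usually restricted to q·Rg ≲ 1.3 for globular particles. A minimal least-squares sketch on synthetic, noise-free data (the Rg of 20 Å and I(0) of 1000 are made up); real analyses use dedicated software and check that the Guinier region is actually linear:

```python
import math

def guinier_fit(q: list[float], intensity: list[float]) -> tuple[float, float]:
    """Fit ln I = ln I0 - (Rg^2/3) q^2 by linear least squares; return (Rg, I0)."""
    x = [qi * qi for qi in q]
    y = [math.log(v) for v in intensity]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
             / sum((xi - mx) ** 2 for xi in x))
    intercept = my - slope * mx
    return math.sqrt(-3.0 * slope), math.exp(intercept)

true_rg, true_i0 = 20.0, 1000.0                  # Å and arbitrary units (synthetic)
q = [0.005 * k for k in range(1, 13)]            # 1/Å; q*Rg runs up to 1.2
i_q = [true_i0 * math.exp(-(qi * true_rg) ** 2 / 3) for qi in q]
rg, i0 = guinier_fit(q, i_q)
print(f"Rg = {rg:.2f} Å, I(0) = {i0:.1f}")
```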
Native mass spectrometry (Albert Heck, Utrecht; Carol Robinson, Oxford).
- Electrosprays intact non-covalent complexes from ammonium acetate buffers, preserving quaternary structure into the gas phase.
- Q-TOF (Synapt G2-Si, timsTOF Pro), Orbitrap UHMR, FT-ICR instruments handle MDa assemblies.
- Ion mobility (IM-MS, Synapt, timsTOF) measures collisional cross-sections (CCS) that report on shape.
- Has resolved membrane protein–lipid stoichiometries (Robinson’s seminal work on ABC transporters and GPCRs), antibody–antigen complexes, ribosomes, virus capsids.
Hydrogen-deuterium exchange MS (HDX-MS).
- Measures the rate at which backbone amide protons exchange with D2O — slow exchange reports on H-bonding and burial.
- Labeling time points span roughly 10 s to 24 h; each reaction is quenched at low pH (∼2.5) and ∼0 °C to slow back-exchange, digested with pepsin, and analyzed by LC-MS/MS.
- Differences between bound and free states map binding interfaces; differences upon allosteric activation map conformational change.
- Automated Waters HDX systems (DynamX software) and Bruker platforms, with analysis software such as Sierra Analytics’ HDExaminer; widely used in biopharma to characterize antibody epitopes (Genentech, Regeneron, Roche pipelines).
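Deuterium uptake is often interpreted by treating each amide as exchanging with a first-order rate k_obs = k_int / P, where P is a protection factor reflecting H-bonding and burial. A minimal sketch of the resulting uptake curves over the time course described above, assuming an illustrative intrinsic rate of 10 s⁻¹, protection factors of 1 (unstructured) and 10⁵ (stably H-bonded), and no back-exchange:

```python
import math

def deuterium_uptake(t_s: float, k_int: float, protection: float) -> float:
    """Fraction of one amide's H replaced by D after t_s seconds (first-order exchange)."""
    k_obs = k_int / protection  # protection factor P = k_int / k_obs
    return 1.0 - math.exp(-k_obs * t_s)

K_INT = 10.0  # 1/s; assumed intrinsic rate for an unprotected amide
for t in (10, 60, 600, 3600, 86400):  # 10 s ... 24 h
    free = deuterium_uptake(t, K_INT, 1.0)
    protected = deuterium_uptake(t, K_INT, 1e5)
    print(f"{t:6d} s: unprotected {free:.2f}, protected (P = 1e5) {protected:.2f}")
```

Comparing such curves between free and bound states is what turns uptake differences into interface or allosteric maps.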
Cross-linking mass spectrometry (XL-MS).
- Chemical cross-linkers covalently link nearby residues; identifies cross-linked peptides by MS/MS to produce distance restraints.
- DSSO and DSBU (∼25–35 Å Cα–Cα), BS3 (longer), SDA and Sulfo-SDA (photoactivatable), PhoX (IMAC-enrichable phosphonic acid handle, Heck), DiSPASO.
- Search engines: XlinkX (Heck-Thermo), pLink (Meng-Qiu Dong and Si-Min He, Beijing), MaxLynx, Xolik, MeroX.
- Juri Rappsilber (Berlin / Edinburgh), Andrea Sinz (Halle), Albert Heck have driven adoption; major use in integrative modeling pipelines.
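A common use of cross-links is to test a structural model: each identified residue pair should have a Cα–Cα distance within the cross-linker’s maximum reach. A minimal sketch, assuming a 30 Å cutoff (inside the DSSO/DSBU range above) and hypothetical Cα coordinates keyed by (chain, residue number):

```python
import math

def check_crosslinks(ca_coords, crosslinks, max_dist=30.0):
    """Return (link, distance, satisfied) for each cross-linked residue pair.

    ca_coords: dict mapping (chain, residue_number) -> (x, y, z) in Å
    crosslinks: list of ((chain, resnum), (chain, resnum)) pairs
    """
    report = []
    for a, b in crosslinks:
        d = math.dist(ca_coords[a], ca_coords[b])
        report.append(((a, b), round(d, 1), d <= max_dist))
    return report

# Hypothetical model coordinates and cross-links (not real data).
ca = {("A", 12): (0.0, 0.0, 0.0), ("A", 87): (14.0, 9.0, 3.0), ("B", 40): (30.0, 25.0, 10.0)}
links = [(("A", 12), ("A", 87)), (("A", 12), ("B", 40))]
for link, dist, ok in check_crosslinks(ca, links):
    print(link, dist, "satisfied" if ok else "VIOLATED")
```

In integrative modeling, the fraction of satisfied cross-links is one of the scores used to rank candidate models.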
EPR and pulsed-EPR (DEER / PELDOR, double electron-electron resonance).
- Site-directed spin labels (typically MTSL on engineered cysteines) measure interspin distances (15–80 Å, longer with Gd3+ or trityl) with full distance distribution (captures conformational heterogeneity).
- Method developers: Daniella Goldfarb (Weizmann), Wayne Hubbell (UCLA), Gunnar Jeschke (ETH, DeerAnalysis software), Stefan Stoll (UW, EasySpin software), Tom Prisner (Frankfurt).
- Particularly powerful for membrane proteins and disordered systems where crystallography is infeasible.
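DEER reads out distance through the electron–electron dipolar coupling, whose frequency falls off as 1/r³: ν_dd ≈ 52.04 MHz·nm³ / r³ for two spins with g ≈ 2 (the perpendicular component that dominates the DEER trace). A minimal sketch showing why the longer distances in the 15–80 Å window need much longer dipolar evolution times; the period here is simply 1/ν_dd:

```python
DIPOLAR_CONST_MHZ_NM3 = 52.04  # for two spins with g ≈ 2.0

def dipolar_frequency_mhz(r_nm: float) -> float:
    """Perpendicular dipolar frequency (MHz) for an interspin distance r (nm)."""
    return DIPOLAR_CONST_MHZ_NM3 / r_nm ** 3

for r_angstrom in (15, 30, 50, 80):
    nu = dipolar_frequency_mhz(r_angstrom / 10.0)
    print(f"{r_angstrom} Å: {nu:.3f} MHz, oscillation period ~{1e3 / nu:.0f} ns")
```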
Single-molecule FRET (smFRET).
- Dye pairs (Cy3/Cy5, Alexa 488/594) on engineered cysteines or non-canonical amino acids report on inter-dye distances (∼20–80 Å) in real time at the single-molecule level.
- Reveals conformational dynamics and rare states invisible to ensemble methods.
- Practitioners: Taekjip Ha (Johns Hopkins), Shimon Weiss (UCLA), Claus Seidel (Düsseldorf), Scott Blanchard (St. Jude).
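smFRET converts a measured transfer efficiency into a distance through the Förster relation E = 1 / (1 + (r/R0)⁶), where R0 is the dye pair’s Förster radius. A minimal sketch, assuming R0 = 54 Å as an illustrative value for a Cy3/Cy5-type pair (the true R0 depends on the dyes, linkers, and local environment):

```python
def fret_efficiency(r: float, r0: float = 54.0) -> float:
    """Förster transfer efficiency for donor-acceptor distance r (same units as r0)."""
    return 1.0 / (1.0 + (r / r0) ** 6)

def fret_distance(efficiency: float, r0: float = 54.0) -> float:
    """Invert the Förster relation to get distance from efficiency (0 < E < 1)."""
    return r0 * (1.0 / efficiency - 1.0) ** (1.0 / 6.0)

for r in (30, 45, 54, 65, 80):
    print(f"r = {r} Å -> E = {fret_efficiency(r):.2f}")
```

The steep sixth-power dependence makes efficiencies most informative within roughly half to one and a half times R0, consistent with the ∼20–80 Å window quoted above.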
Computational structure prediction
Before the deep-learning revolution, three classes of methods dominated.
Homology / comparative modeling.
- Builds a model of a query sequence by aligning to a homolog of known structure (“template”) and copying-then-refining coordinates.
- MODELLER (Andrej Sali, UCSF, 1993 onward) — satisfies spatial restraints derived from the template alignment.
- SWISS-MODEL (Bordoli, Schwede, SIB Swiss Institute of Bioinformatics, web server since 1993) — user-friendly automated pipeline; integrated into UniProt.
- I-TASSER (Yang Zhang, developed at the University of Michigan) — combines threading with iterative fragment assembly; long a CASP top performer.
- Quality depends critically on sequence identity: >50% usually reliable (1–2 Å backbone RMSD), 30–50% useful (∼3 Å), <30% “twilight zone”.
Threading / fold recognition (Jones, Eisenberg, Bryant, Sippl 1990s).
- Scores a sequence against a library of known folds using statistical potentials (e.g., PROSPECT, RaptorX).
- Useful when no clear sequence homolog exists but a remote structural similarity is plausible.
Ab initio / fragment assembly.
- Rosetta (David Baker, University of Washington, 1998 onward) — assembles short (3- and 9-residue) structural fragments from a library and minimizes a knowledge-based energy function.
- Rosetta’s design arm produced Top7 (Kuhlman-Baker, Science 2003) — the first computationally designed protein with a novel topology not seen in nature.
- Subsequent achievements: designed enzymes (Kemp eliminase 2008, retroaldolase, Diels-Alderase 2010), mini-binders (Cao 2020 — picomolar binders for IL-2R, PD-L1, IL-23R, and the SARS-CoV-2 spike), and vaccine nanoparticles (SKYCovione, also known as GBP510, a COVID-19 protein nanoparticle vaccine from SK bioscience, approved in Korea in 2022).
- David Baker shared the 2024 Nobel Prize in Chemistry for computational protein design.
- Distributed computing arm: Rosetta@home (BOINC).
The deep-learning era.
- CASP13 (2018) — DeepMind’s AlphaFold 1 won by a wide margin using deep convolutional networks for inter-residue distance predictions; first signal of ML dominance.
- CASP14 (2020) and Nature 2021 — AlphaFold 2 (Jumper et al, Nature July 2021). John Jumper and Demis Hassabis shared the 2024 Nobel Prize in Chemistry.
- AF2 architecture: attention-based “Evoformer” jointly reasons over multiple sequence alignment (MSA) features and pairwise residue features; SE(3)-equivariant “structure module” produces atomic coordinates directly; recycling iteratively refines.
- Median GDT-TS at CASP14 jumped to ∼92, approaching experimental accuracy (a GDT-TS around 90 is informally regarded as competitive with experiment; AF2’s median backbone error was ∼1 Å Cα RMSD over 95% of residues).
- Confidence metrics: pLDDT (predicted per-residue local accuracy, 0–100 scale; >90 highly confident, <50 likely disordered) and PAE (Predicted Aligned Error, for inter-domain and inter-chain reliability).
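GDT-TS averages the percentage of Cα atoms lying within 1, 2, 4, and 8 Å of the reference. A minimal sketch, assuming the model is already superposed and per-residue Cα deviations are given; the standard LGA program searches over many superpositions to maximize each threshold, so this fixed-superposition version will typically report a lower score. RMSD is included for comparison, and the deviations are hypothetical:

```python
import math

def rmsd(deviations: list[float]) -> float:
    """Root-mean-square of per-residue Cα deviations (Å)."""
    return math.sqrt(sum(d * d for d in deviations) / len(deviations))

def gdt_ts(deviations: list[float]) -> float:
    """Simplified GDT-TS (0-100) for a single, fixed superposition."""
    fractions = [sum(d <= cutoff for d in deviations) / len(deviations)
                 for cutoff in (1.0, 2.0, 4.0, 8.0)]
    return 100.0 * sum(fractions) / 4.0

# Hypothetical deviations: a well-predicted core plus one poorly placed loop.
devs = [0.4, 0.6, 0.5, 0.8, 1.2, 0.7, 0.9, 3.5, 6.0, 9.5]
print(f"RMSD = {rmsd(devs):.2f} Å, GDT-TS = {gdt_ts(devs):.1f}")
```

A single misplaced loop inflates the RMSD far more than the GDT-TS, which is one reason CASP favors GDT-style scores.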
AlphaFold ecosystem.
- AlphaFold Protein Structure Database (EMBL-EBI / DeepMind, public since July 2021) — over 200 million predicted structures covering nearly all of UniProtKB.
- AlphaFold-Multimer (Evans et al, bioRxiv 2022) — handles heteromeric assemblies; recommended over running monomers separately for any complex.
- AlphaFold 3 (Abramson et al, Nature May 2024) — extends to ligands, nucleic acids, ions, and modified residues using a diffusion-based generative module. The AlphaFold Server (academic web access) launched May 2024; source code released November 2024 with significant license restrictions.
Constellation of additional deep-learning predictors.
- RoseTTAFold (Minkyung Baek, David Baker et al, Science 2021) — reproduced AF2-like accuracy with a three-track network (sequence, pairwise, 3D); first widely available open-source competitor.
- ESMFold (Lin et al, Science 2023, Meta AI / FAIR) — single-sequence prediction built on the ESM-2 protein language model (ESMFold uses the 3B-parameter model; the largest ESM-2 has 15B); because it skips MSA generation it is much faster than AlphaFold 2, which makes proteome- and metagenome-scale prediction practical.
- ESM Atlas — catalogs ∼700 million predicted metagenomic structures from ESMFold.
- OmegaFold (HeliXon, 2022) — another single-sequence predictor.
- OpenFold (Ahdritz, AlQuraishi et al 2022, Columbia + collaborators) — open-source PyTorch reimplementation of AlphaFold 2; used for fine-tuning and downstream pipelines.
- Boltz-1 (MIT, late 2024) and Boltz-2 (2025) — approach AF3-class accuracy with an open MIT license; widely adopted.
- Chai-1 (Chai Discovery, September 2024) — AF3-quality multimer + ligand predictions with weights available; commercial-friendly.
- AlphaMissense (Cheng et al, Science 2023, DeepMind) — annotates 71 million missense variants with pathogenicity probabilities using AF2-derived features.
Intrinsically disordered proteins (IDPs).
- IDPs — including p53 transactivation domain, α-synuclein, FUS, TDP-43, hnRNPA1, neurofilament tails, BRCA1 N-terminal region, many transcription factor activation domains — lack a single folded state. Structure prediction in the AF2/AF3 sense is ill-posed by construction.
- Researchers: Keith Dunker (Indiana), Peter Tompa (VIB Brussels), Rohit Pappu (Washington University in St. Louis), Tanja Mittag (St. Jude), Julie Forman-Kay (SickKids Toronto), Madan Babu (St. Jude).
- IDP-specific concepts: ensembles (Bayesian inference of conformer populations from SAXS, NMR, smFRET), conditional folding (folding upon binding — coupled folding-and-binding to a partner), polymer-physics descriptors (Pappu group's CIDER server; the κ parameter quantifies charge patterning).
- Biomolecular condensates (membraneless organelles formed by liquid-liquid phase separation, LLPS) — Cliff Brangwynne and Tony Hyman (Brangwynne et al, Science 2009, liquid-like P granules in C. elegans; work done at MPI-CBG Dresden, Brangwynne now at Princeton).
- Examples: nucleolus, Cajal bodies, stress granules, P bodies, transcription condensates (Young lab MIT), heterochromatin (Karpen, Narlikar 2017), CBX2 / Polycomb condensates.
- Tools: ALBATROSS (Holehouse lab) for ensemble property prediction from sequence, folding-upon-binding modeling, AlphaFold pLDDT (low pLDDT ≈ disorder), IUPred and PONDR disorder predictors.
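AlphaFold DB models store per-residue pLDDT in the B-factor column. That makes the "low pLDDT ≈ disorder" heuristic easy to apply with a little fixed-column parsing. The sketch below assumes PDB-format text, with the residue number in columns 23–26 and the B-factor in columns 61–66. The 50 cutoff and the 3-residue minimum run length are illustrative choices, not a validated disorder predictor. `fake_ca_line` is a hypothetical helper that only builds test input.

```python
def plddt_per_residue(pdb_text):
    """Return [(residue_number, pLDDT)] from CA atoms, reading pLDDT from the B-factor column."""
    values = []
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            values.append((int(line[22:26]), float(line[60:66])))
    return values


def low_confidence_runs(values, cutoff=50.0, min_len=3):
    """Group consecutive residues with pLDDT < cutoff into (first, last) runs of at least min_len."""
    runs, current = [], []
    for resnum, plddt in values:
        contiguous = bool(current) and resnum == current[-1] + 1
        if plddt < cutoff and (not current or contiguous):
            current.append(resnum)
            continue
        # The current run has ended (a high-confidence residue or a numbering gap).
        if len(current) >= min_len:
            runs.append((current[0], current[-1]))
        current = [resnum] if plddt < cutoff else []
    if len(current) >= min_len:
        runs.append((current[0], current[-1]))
    return runs


def fake_ca_line(resnum, plddt):
    """Hypothetical helper: a minimal fixed-column ATOM record for testing (coordinates zeroed)."""
    return (f"ATOM  {resnum:5d}  CA  GLY A{resnum:4d}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{plddt:6.2f}           C")


profile = [90.0] * 10 + [30.0] * 5 + [85.0] * 5 + [40.0] * 2
pdb_text = "\n".join(fake_ca_line(i + 1, p) for i, p in enumerate(profile))
print(low_confidence_runs(plddt_per_residue(pdb_text)))
```

Here residues 11–15 are reported as one low-confidence run. The two-residue dip at 21–22 falls below the minimum length and is ignored.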
The Protein Data Bank and structural databases
The PDB was founded in 1971 at Brookhaven National Laboratory (led by Walter Hamilton; Helen Berman was among its founders) following a community meeting at Cold Spring Harbor. Management moved to the Research Collaboratory for Structural Bioinformatics (RCSB: Rutgers, SDSC, NIST) in 1998–1999. In 2003 the worldwide Protein Data Bank (wwPDB) was formed by RCSB PDB, PDBe at EBI, and PDBj. As of 2025 the PDB holds approximately 220,000 deposited experimental structures, plus supporting experimental data (structure factors for X-ray entries, restraints for NMR entries). Raw cryo-EM images are archived separately in EMPIAR. Sister archives include the Electron Microscopy Data Bank (EMDB, for 3D maps; ∼40,000 entries by 2024), the Biological Magnetic Resonance Bank (BMRB), and PDB-Dev (integrative structures with explicit uncertainty). The PDBe-KB layer integrates structures with sequence, family, ligand, and disease annotations.
Adjacent databases: SCOP / SCOPe (structural classification, hierarchical), CATH (Class-Architecture-Topology-Homology, Christine Orengo and Janet Thornton, UCL), ECOD (Evolutionary Classification Of protein Domains, Grishin lab, UT Southwestern), Pfam (sequence families, now under InterPro), UniProt (sequence + annotation), and ChEMBL / DrugBank / Binding MOAD for ligand information. The AlphaFold DB (over 200 million structures) and the ESM Atlas (over 700 million) dwarf the experimental PDB but complement rather than replace it. Per-model confidence scores (pLDDT per residue, PAE between residue pairs, pTM / ipTM for complexes) are essential for deciding which parts of a prediction to trust.
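PAE is a matrix rather than a per-residue score. Entry (i, j) estimates the error in residue j's position after the model is superposed on residue i. The mean PAE between two blocks of residues therefore indicates whether their relative placement can be trusted. The sketch below rests on several assumptions:

- the matrix is already loaded as a list of lists (AlphaFold DB distributes it as JSON);
- rows are the superposition residue, which is the usual convention but worth checking for a given file;
- domain boundaries are known, and the toy values and 0-based indices are hypothetical.

```python
from statistics import mean


def mean_block_pae(pae, aligned_on, scored):
    """Mean PAE (Å) for residues `scored` when the model is superposed on residues `aligned_on`."""
    return mean(pae[i][j] for i in aligned_on for j in scored)


def interdomain_pae(pae, dom_a, dom_b):
    """Average both off-diagonal blocks, because PAE(i, j) and PAE(j, i) generally differ."""
    return 0.5 * (mean_block_pae(pae, dom_a, dom_b) + mean_block_pae(pae, dom_b, dom_a))


# Toy 6-residue matrix: two confident domains (residues 0-2 and 3-5) placed uncertainly.
dom_a, dom_b = range(0, 3), range(3, 6)
pae = [
    [1.0 if (i in dom_a) == (j in dom_a) else (20.0 if i in dom_a else 24.0) for j in range(6)]
    for i in range(6)
]
print(mean_block_pae(pae, dom_a, dom_a), interdomain_pae(pae, dom_a, dom_b))
```

The result here is a low intra-domain PAE (1 Å) next to a high inter-domain PAE (22 Å). That pattern typically means each domain is modeled confidently but their relative orientation is not.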
Landmark structures
A non-exhaustive sample of structures that re-shaped biology:
- Hemoglobin (Perutz, Cambridge MRC, 1959–1970) — the molecular basis of cooperative O2 binding and the Bohr effect. The first molecular disease (sickle cell: Pauling 1949, Ingram 1956, then the Perutz structure) showed how a single-base substitution (GAG→GTG, giving Glu6→Val in β-globin) propagates to clinical phenotype.
- Myoglobin (Kendrew, Cambridge MRC, 1958) — the first protein structure, a globular 153-residue oxygen-binding muscle protein.
- Lysozyme (David Phillips, Royal Institution London, 1965) — the first enzyme structure; the substrate-binding cleft and catalytic Asp52 / Glu35 launched mechanistic enzymology.
- DNA double helix (Watson, Crick, Wilkins, with Rosalind Franklin’s Photograph 51 and B-form fiber diffraction data, 1953; Nobel 1962 to Watson, Crick, Wilkins) — the founding structure of molecular biology.
- tRNA (Aaron Klug at MRC and, independently, Alexander Rich at MIT, 1974; yeast tRNA-Phe) — L-shaped, anticodon at one end, amino acid at the other.
- Nucleosome core particle (Tim Richmond et al, ETH Zürich, Nature 1997 at 2.8 Å) — DNA wrapped around an H2A/H2B/H3/H4 histone octamer; ∼147 bp, ∼1.65 turns.
- RNA polymerase II (Roger Kornberg, Stanford; 10-subunit core, Science 2001; complete 12-subunit enzyme, 2003; Nobel 2006) — gene transcription at atomic resolution; structures of elongating, paused, and terminated complexes followed.
- Ribosome (Venki Ramakrishnan at MRC, Tom Steitz at Yale, Ada Yonath at Weizmann; 30S and 50S subunit structures from 2000, followed by 70S complexes; Nobel 2009) — the catalytic core of protein synthesis. The peptidyl-transferase center is formed by 23S rRNA, making the ribosome a ribozyme.
- ATP synthase (F1 catalytic domain, John Walker, MRC, 1994; Paul Boyer's binding-change mechanism; Nobel 1997) — rotary catalysis. Proton flow through Fo turns the rotor (c-ring plus central γ-subunit), and the rotating γ-subunit drives the sequential conformational cycle of the three catalytic β subunits in the α3β3 hexamer, producing 3 ATP per full rotation.
- GPCR active states — Robert Lefkowitz and Brian Kobilka (Nobel 2012) — β2-adrenergic receptor / G-protein complex 2011 (Rasmussen et al, Nature 3.2 Å crystal); subsequent cryo-EM structures of GLP-1R, calcitonin receptor, frizzled, smoothened, μ-opioid agonist + arrestin complexes.
- Photosystem II (Athina Zouni, Petra Fromme, Horst Witt, James Barber, So Iwata, Jian-Ren Shen, Nobuo Kamiya; 3.8 Å in 2001, 3.5 Å in 2004, 3.0 Å in 2005, then 1.9 Å by Umena et al in 2011) — Mn4CaO5 oxygen-evolving complex resolved.
- CRISPR-Cas9 (Jinek 2014, Anders 2014, Nishimasu 2014, Sternberg/Doudna; Doudna and Charpentier Nobel 2020) — sgRNA-guided dsDNA cleavage; the structures of Cas9 + crRNA, Cas9 + dsDNA, and various engineered variants underpinned the genome-editing revolution.
- SARS-CoV-2 spike trimer (Jason McLellan at UT Austin, Wrapp et al, Science Feb 2020, 3.5 Å cryo-EM, about six weeks after sequence release; David Veesler at UW, Walls et al, Cell 2020) — the 2P-stabilized prefusion spike became the antigen for the Pfizer-BioNTech and Moderna mRNA vaccines. 2P refers to two proline substitutions developed by McLellan, Barney Graham, and colleagues for MERS-CoV S (Pallesen et al, PNAS 2017).
- Monoclonal antibody Fab + antigen complexes — most approved therapeutic mAbs have their antigen complexes resolved experimentally or modeled.
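The "3 ATP per full rotation" figure for ATP synthase implies that the proton cost of ATP is set by the size of the Fo c-ring. Each c subunit carries one proton across the membrane per turn, so H+/ATP = n_c / 3. The worked sketch below uses commonly reported c-ring sizes: c8 for bovine mitochondria, c10 for yeast mitochondria, and c14 for spinach chloroplasts.

```python
def protons_per_atp(c_ring_subunits, atp_per_turn=3):
    """One proton crosses per c subunit per full turn; each turn yields atp_per_turn ATP."""
    return c_ring_subunits / atp_per_turn


# Commonly reported c-ring stoichiometries (subunits per ring).
C_RING = {"bovine mitochondria": 8, "yeast mitochondria": 10, "spinach chloroplast": 14}

for source, n_c in C_RING.items():
    print(f"{source}: c{n_c} -> {protons_per_atp(n_c):.2f} H+/ATP")
```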
Structure-based drug discovery
Once a target’s structure or homology model is in hand, several computational and experimental funnels narrow chemistry to leads.
Virtual screening.
- Docks libraries (typically 10^5–10^9 commercially available compounds) into a binding site.
- Compound libraries: ZINC and ZINC20 (Shoichet, UCSF, ∼1.4 billion in 2020), Enamine REAL (38 billion virtual molecules in 2024 via on-demand synthesis), eMolecules, Mcule, ChemBridge.
- Docking engines: AutoDock 4 / AutoDock Vina (Trott-Olson, Scripps), Schrödinger Glide (SP and XP modes), DOCK (Tack Kuntz, UCSF — the first docking program 1982), GOLD (CCDC, genetic algorithm), FlexX, ICM (Molsoft, Abagyan), Surflex.
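- Ultra-large library docking has scaled virtual screening from millions to billions of compounds. Examples include Lyu et al, Nature 2019 (AmpC β-lactamase and D4 dopamine receptor) and Alon et al, Nature 2021 (sigma-2 receptor). Synthon-based and active-learning strategies extend this further (Sadybekov et al, Nature 2022, V-SYNTHES).
- Re-scoring top hits with free energy methods, MM-GBSA, or QM/MM is a common post-processing step.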
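Active-learning docking follows a simple loop:

1. Dock a random fraction of the library.
2. Train a cheap surrogate model on those scores.
3. Let the surrogate choose which compounds to dock next, and repeat.

The toy sketch below makes several assumptions. The "compounds" are random 2D feature vectors. `expensive_dock` is a hypothetical stand-in for a real docking run (lower is better). The surrogate is a k-nearest-neighbour average with purely greedy selection. Real pipelines use molecular fingerprints or learned models and usually mix in some exploration.

```python
import heapq
import random


def expensive_dock(x):
    """Hypothetical stand-in for a docking run (lower is better): a smooth funnel near (0.7, 0.2)."""
    return (x[0] - 0.7) ** 2 + (x[1] - 0.2) ** 2


def sq_dist(a, b):
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def knn_predict(x, scored, k=5):
    """Surrogate model: mean score of the k already-docked compounds nearest to x."""
    nearest = heapq.nsmallest(k, scored, key=lambda item: sq_dist(item[0], x))
    return sum(score for _, score in nearest) / len(nearest)


def active_learning_screen(library, n_init=20, batch=20, rounds=5, seed=0):
    """Dock a random seed set, then repeatedly dock the batch the surrogate ranks best."""
    rng = random.Random(seed)
    pool = list(library)
    rng.shuffle(pool)
    scored = [(x, expensive_dock(x)) for x in pool[:n_init]]
    remaining = pool[n_init:]
    for _ in range(rounds):
        remaining.sort(key=lambda x: knn_predict(x, scored))
        picked, remaining = remaining[:batch], remaining[batch:]
        scored.extend((x, expensive_dock(x)) for x in picked)
    return scored


# Toy library: 1,000 "compounds" described by 2D feature vectors.
gen = random.Random(1)
library = [(gen.random(), gen.random()) for _ in range(1000)]
scored = active_learning_screen(library)
best_x, best_score = min(scored, key=lambda item: item[1])
print(f"docked {len(scored)} of {len(library)}; best score {best_score:.4f}")
```

Only 120 of the 1,000 compounds are ever "docked". In a real campaign the same ratio is what makes billion-compound libraries tractable.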
Fragment-based drug discovery (FBDD).
- Practitioners: Astex Pharmaceuticals (Cambridge UK, founded 1999), Vernalis (UK), Plexxikon (Berkeley, acquired by Daiichi Sankyo 2011), Fragments of Life (FBLD), SGC (Structural Genomics Consortium, Oxford / Toronto), Crelux.
- Screens small (∼150–250 Da) fragments at high concentration (∼mM) using X-ray crystallography (XChem at Diamond), NMR (SAR by NMR — Fesik et al at Abbott, Science 1996), SPR (surface plasmon resonance on Biacore), DSF (differential scanning fluorimetry, ThermoFluor), MST (microscale thermophoresis).
- Hits with low affinity (mM–µM) but high ligand efficiency are grown, linked, or merged into drug-like leads.
- Approved FBDD-derived drugs: vemurafenib (Plexxikon → Roche, BRAF V600E inhibitor for melanoma, approved 2011), venetoclax / Venclexta (AbbVie / Genentech, BCL-2 inhibitor for CLL, approved 2016), erdafitinib (J&J, FGFR, 2019), pexidartinib (Plexxikon, CSF1R for tenosynovial giant cell tumor 2019), asciminib (Novartis, allosteric BCR-ABL myristoyl pocket binder, approved 2021).
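Ligand efficiency (LE) is what makes a weak fragment hit worth pursuing. LE is the binding free energy per non-hydrogen atom: ΔG = RT ln Kd (1 M standard state) and LE = −ΔG / N_heavy. In the hypothetical numbers below, a 1 mM fragment with 12 heavy atoms and a 10 nM lead with 30 heavy atoms have similar LE (about 0.34 and 0.36 kcal/mol per heavy atom). A commonly quoted rule of thumb is to look for LE around 0.3 or higher.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol*K)


def binding_free_energy(kd_molar, temp_k=298.15):
    """Standard binding free energy dG = RT ln(Kd), Kd in mol/L against a 1 M standard state."""
    return R_KCAL * temp_k * math.log(kd_molar)


def ligand_efficiency(kd_molar, heavy_atoms, temp_k=298.15):
    """Ligand efficiency: -dG divided by the number of non-hydrogen atoms (kcal/mol per atom)."""
    return -binding_free_energy(kd_molar, temp_k) / heavy_atoms


# Hypothetical examples: a weak fragment versus a potent, larger lead.
fragment_le = ligand_efficiency(1e-3, heavy_atoms=12)   # Kd = 1 mM
lead_le = ligand_efficiency(10e-9, heavy_atoms=30)      # Kd = 10 nM
print(f"fragment LE = {fragment_le:.2f}, lead LE = {lead_le:.2f}")
```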
Free energy perturbation (FEP+).
- Uses molecular dynamics in alchemical thermodynamic cycles to compute relative binding free energies with ∼1 kcal/mol accuracy for many congeneric series.
- Wang et al, JACS 2015 — Schrödinger’s commercial FEP+ benchmark on 8 systems.
- Software: Schrödinger FEP+, OpenFE (open-source consortium, OpenForceField + OpenMM), BFEE2, Yank, NAMD, GROMACS.
- Major impact in lead optimization at most large pharma; routinely guides synthesis decisions.
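Two pieces of the FEP machinery can be shown compactly:

- the Zwanzig exponential-averaging estimator, ΔG = −kT ln⟨exp(−ΔU/kT)⟩₀;
- the thermodynamic cycle that turns two alchemical legs (ligand A→B in the complex and in solvent) into a relative binding free energy.

The ΔU values below are synthetic Gaussian samples, not simulation output. For Gaussian ΔU with mean μ and variance σ², the exact answer is μ − σ²/(2kT), which provides a check. Production codes split the transformation into many λ windows and typically use estimators such as BAR or MBAR instead of a single exponential average.

```python
import math
import random

KT = 0.5925  # kT in kcal/mol at about 298 K


def zwanzig(delta_u, kt=KT):
    """FEP estimate dG = -kT ln <exp(-dU/kT)>_0, using a log-sum-exp shift for numerical stability."""
    exponents = [-du / kt for du in delta_u]
    shift = max(exponents)
    mean_exp = sum(math.exp(e - shift) for e in exponents) / len(exponents)
    return -kt * (shift + math.log(mean_exp))


def relative_binding(dg_complex, dg_solvent):
    """Thermodynamic cycle: ddG_bind(A->B) = dG_alch(A->B, complex) - dG_alch(A->B, solvent)."""
    return dg_complex - dg_solvent


# Synthetic dU samples (kcal/mol) drawn from a Gaussian, standing in for simulation output.
rng = random.Random(42)
mu, sigma = 2.0, 0.5
samples = [rng.gauss(mu, sigma) for _ in range(100_000)]
estimate = zwanzig(samples)
exact = mu - sigma**2 / (2 * KT)  # closed form for Gaussian dU
print(f"Zwanzig {estimate:.3f} vs exact {exact:.3f} kcal/mol")

# Hypothetical alchemical legs for ligand A -> B.
print(f"ddG_bind = {relative_binding(-3.1, -1.9):+.2f} kcal/mol")
```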
Other computational methods.
- Pharmacophore modeling (Catalyst, MOE).
- Shape-based screening (ROCS, Phase).
- Inverse / target docking (find targets for a given molecule).
- Cryptic pocket discovery via MD (Bowman, Lindorff-Larsen).
- QM/MM for covalent inhibitors and metalloenzymes.
- Generative chemistry with reinforcement learning + structure-based scoring (Pasteur, Insilico Medicine, BenevolentAI, Exscientia).
Famous structure-enabled successes:
- HIV protease inhibitors — Manuel Navia (Merck), Alex Wlodawer (NCI Frederick), and others solved HIV-1 protease in 1989. Saquinavir (Roche, designed by Noel Roberts and colleagues; approved 1995, the first HIV PI) was followed by indinavir (Merck Crixivan 1996), nelfinavir, ritonavir, lopinavir, atazanavir, and darunavir (TMC114, Tibotec). Together they converted AIDS from a death sentence to a chronic disease.
- Imatinib / Gleevec (Brian Druker at OHSU, Nicholas Lydon at Ciba-Geigy / Novartis; approved 2001 for chronic myeloid leukemia, BCR-ABL kinase inhibitor) — the structure of Abl with imatinib (Schindler / Kuriyan 2000 Science) explained selectivity and resistance mutations (e.g., T315I gatekeeper) and validated targeted therapy for cancer.
- Zanamivir / Relenza and oseltamivir / Tamiflu (both approved 1999) — both target influenza neuraminidase.
  - Zanamivir was designed by Mark von Itzstein and colleagues at Monash (developed with Biota and Glaxo), using crystal structures of the enzyme (including N2) with sialic acid analogs.
  - Oseltamivir, an orally available follow-on, was designed at Gilead and marketed by Roche.
- Nirmatrelvir / Paxlovid (Pfizer PF-07321332; FDA emergency use authorization December 2021, full approval 2023) — covalent inhibitor of SARS-CoV-2 Mpro (3CLpro / nsp5). Structure-based design building on 2003 SARS-CoV Mpro work and 2020 SARS-CoV-2 Mpro structures led to a deployable oral antiviral within ∼18 months.
- Sotorasib (Amgen AMG 510, approved 2021) and adagrasib (Mirati MRTX849, approved 2022) — covalent inhibitors of KRAS G12C exploiting a cryptic switch II pocket (Shokat 2013 Nature seminal paper); broke the “undruggable” KRAS barrier.
Protein engineering and design
Directed evolution.
- Frances Arnold (Caltech, Nobel 2018 — “for the directed evolution of enzymes”) — iterative cycles of mutagenesis (error-prone PCR, DNA shuffling) plus high-throughput screening or selection.
- Has produced enzymes that operate in organic solvents, accept non-natural substrates, and catalyze non-natural reactions. Examples include P450-derived carbene and nitrene transfer: abiological cyclopropanation, C-H amination, and carbene insertion into Si-H and B-H bonds. Related engineering campaigns produced PET-degrading hydrolases such as the LCC variant of Tournier, Marty et al (Nature 2020) and FAST-PETase (Lu et al, Nature 2022).
- George Smith (Missouri) and Gregory Winter (MRC LMB) shared the 2018 chemistry Nobel for phage display (Smith, Science 1985) and its application to evolve antibodies and peptides.
- Phage-display antibodies in the clinic: Humira / adalimumab (AbbVie, anti-TNF), belimumab (anti-BLyS), raxibacumab (anthrax), and dozens of others; the Cambridge Antibody Technology platform fed into many.
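The mutate-screen-select cycle of directed evolution can be caricatured in a few lines. The sketch below makes several simplifying assumptions:

- The "protein" is a string over the 20 amino-acid letters.
- `fitness` is a hypothetical screen readout: identity to an arbitrary target sequence.
- Mutagenesis is random point substitution at a fixed per-site rate, a rough analogue of error-prone PCR.
- The top variants seed the next round.

Real fitness landscapes are epistatic and real screens are noisy, so this illustrates only the loop structure.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def fitness(seq, target):
    """Hypothetical screen readout: number of positions that match an arbitrary target sequence."""
    return sum(a == b for a, b in zip(seq, target))


def mutate(seq, rate, rng):
    """Random point substitutions at a per-site rate (may reselect the same residue)."""
    return "".join(rng.choice(AMINO_ACIDS) if rng.random() < rate else aa for aa in seq)


def directed_evolution(parent, target, rounds=30, library_size=200, keep=5, rate=0.02, seed=1):
    rng = random.Random(seed)
    parents = [parent]
    history = [fitness(parent, target)]
    for _ in range(rounds):
        # Mutagenesis: build a variant library from the current parents.
        library = [mutate(rng.choice(parents), rate, rng) for _ in range(library_size)]
        # Screening/selection: rank unique variants (old parents included) and keep the top few.
        pool = list(dict.fromkeys(parents + library))
        pool.sort(key=lambda s: fitness(s, target), reverse=True)
        parents = pool[:keep]
        history.append(fitness(parents[0], target))
    return parents[0], history


setup = random.Random(0)
target = "".join(setup.choice(AMINO_ACIDS) for _ in range(40))
start = "".join(setup.choice(AMINO_ACIDS) for _ in range(40))
best, history = directed_evolution(start, target)
print(f"fitness: {history[0]} -> {history[-1]} (max {len(target)})")
```

Keeping the previous parents in the selection pool guarantees that the best fitness never decreases between rounds. This is a design choice of the sketch, loosely analogous to carrying the best clone forward.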
De novo computational design (Baker lab, University of Washington).
- Top7 (Kuhlman-Baker, Science 2003) — first computationally designed protein with a novel topology.
- Designed enzymes — Kemp eliminase (Röthlisberger 2008), retroaldolase (Jiang 2008), Diels-Alderase (Siegel 2010).
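- Mini-binders — picomolar miniprotein inhibitors of the SARS-CoV-2 spike (Cao et al, Science 2020), with the approach generalized to designing binders from target structure alone (Cao et al, Nature 2022). Reported targets also include IL-2R, PD-L1, IL-23R, and IL-6R; oral / inhalable formulations are being explored.
- Vaccine nanoparticle scaffolds — the two-component icosahedral I53-50 particle (Bale et al, Science 2016, building on King et al, Nature 2014). The SKYCovione COVID-19 vaccine (GBP510, SK bioscience, Korea approval 2022) and Icosavax's IVX-411 are built on it, as are RSV-F nanoparticle vaccines.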
- Designed signaling switches — LOCKR (Langan 2019), de novo designed Notch ligands (Quijano-Rubio 2023).
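- The 2024 Nobel in Chemistry recognized this body of work: one half went to David Baker for computational protein design, the other half to Demis Hassabis and John Jumper for protein structure prediction.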
Protein language models and generative design.
- ESM family (Meta AI / FAIR: ESM-1b 2021, ESM-2 2022; ESM3 2024 from the spin-out EvolutionaryScale) — masked-language-model protein representations and generative design.
- ProtTrans (Elnaggar, Rost et al), ProGen (Salesforce, Madani et al, Nature Biotech 2023; language-model-generated lysozymes with catalytic activity comparable to natural ones).
- RFdiffusion (Watson et al, Nature 2023, Baker lab) — diffusion model that generates protein backbones conditioned on functional motifs, symmetry, or partial structures.
- ProteinMPNN (Dauparas et al, Science 2022, Baker lab) — inverse-folds sequences onto a given backbone.
- RoseTTAFold All-Atom (RFAA; Krishna et al, Science 2024), RFdiffusion All-Atom, and AlphaProteo (DeepMind 2024) extend to all-atom + ligand contexts.
- Chroma (Generate Biomedicines 2023) — diffusion-based programmable design.
- Together these tools enable rapid binder, scaffold, and enzyme design at academic and biotech scale.
Frontiers
Integrative structural biology.
- Andrej Sali’s hierarchical approach (UCSF) — combine cryo-EM, X-ray, NMR, XL-MS, HDX-MS, SAXS, smFRET, and prediction to model assemblies that no single method can solve.
- Software: Integrative Modeling Platform (IMP, Sali), HADDOCK (Bonvin), Modeller-Multi.
- Archive: PDB-Dev (now wwPDB-IHM) for integrative-method depositions.
- Landmark integrative models:
  - the nuclear pore complex (∼500 proteins, ∼120 MDa; Beck-Förster 2007, Lin-Hoelz 2016, Kim-Sali 2018, Schuller-Beck 2021, Mosalaganti-Beck 2022 in situ NPC at near-atomic resolution);
  - the spliceosome (Yigong Shi at Tsinghua, Kiyoshi Nagai at MRC LMB, Reinhard Lührmann and Holger Stark in Göttingen; multiple states from 2015 onward);
  - the type III secretion injectisome;
  - the kinetochore (Cheeseman-Desai);
  - the bacterial flagellum.
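One of the simplest integrative-modeling checks is whether a model satisfies XL-MS restraints. Each crosslinked residue pair should have a Cα–Cα distance no larger than the span the crosslinker allows. The sketch assumes Cα coordinates are already available as a dict keyed by (chain, residue number). It uses a 30 Å cutoff, a value often quoted for lysine-reactive DSS/BS3 crosslinks. Both the cutoff and the toy coordinates are illustrative assumptions.

```python
import math


def crosslink_satisfaction(ca_coords, crosslinks, max_ca_ca=30.0):
    """Fraction of crosslinks with C-alpha/C-alpha distance <= max_ca_ca (Å), plus the violations."""
    violated = [
        (a, b) for a, b in crosslinks
        if math.dist(ca_coords[a], ca_coords[b]) > max_ca_ca
    ]
    return 1 - len(violated) / len(crosslinks), violated


# Hypothetical model: C-alpha positions keyed by (chain, residue number), in Å.
ca_coords = {
    ("A", 10): (0.0, 0.0, 0.0),
    ("A", 55): (12.0, 16.0, 0.0),   # 20 Å from A10
    ("B", 7): (12.0, 16.0, 45.0),   # 45 Å from A55
}
crosslinks = [(("A", 10), ("A", 55)), (("A", 55), ("B", 7))]
fraction, violated = crosslink_satisfaction(ca_coords, crosslinks)
print(fraction, violated)
```

In integrative modeling, restraint violations like the A55–B7 pair feed back into the scoring function rather than serving only as a post hoc filter.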
In situ cryo-ET.
- Cryo-electron tomography in native cellular contexts via FIB-milling and sub-tomogram averaging is rapidly approaching atomic resolution.
- Reshaping the relationship between structure and cell biology: ribosomes mid-translation (Tegunov-Cramer Nature Methods 2021), nuclear pore complex inside cells (Schuller 2021), microtubule-associated chromatin during mitosis, signaling complexes at synapses.
Time-resolved studies.
- XFEL serial crystallography (LCLS, European XFEL, SACLA) on timescales from femtoseconds (light-triggered reactions) to milliseconds (mix-and-inject experiments) — bacteriorhodopsin proton pumping, photosystem II water splitting, myoglobin CO photolysis, beta-lactamase mix-and-inject.
- Time-resolved cryo-EM (mixing-spraying setups, Spotiton at NYSBC, photoactivation) — millisecond intermediates of ribosomes, channels, kinases.
- NMR of exchange processes on µs–s timescales (CPMG relaxation dispersion for µs–ms processes, ZZ exchange for slower ms–s processes).
- Integrative all-atom MD simulation — Anton 3 special-purpose supercomputer (D. E. Shaw Research, ∼1 µs/day for ∼1M atoms), GPU MD via OpenMM, AMBER, GROMACS, NAMD; folding@home volunteer compute (Vijay Pande, Greg Bowman).
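Every MD engine listed above advances Newton's equations with a time-stepping integrator. Below is a minimal sketch of velocity Verlet, one widely used symplectic scheme, applied to a single 1D harmonic "bond" in reduced units. Real engines add force fields, thermostats, constraints, and neighbour lists. Some default to other integrators; GROMACS, for example, defaults to leap-frog.

```python
def velocity_verlet(x, v, force, mass, dt, steps):
    """Integrate one particle in 1D with velocity Verlet; returns the (x, v) trajectory."""
    trajectory = [(x, v)]
    f = force(x)
    for _ in range(steps):
        v_half = v + 0.5 * dt * f / mass  # half kick
        x = x + dt * v_half               # drift
        f = force(x)                      # force at the updated position
        v = v_half + 0.5 * dt * f / mass  # second half kick
        trajectory.append((x, v))
    return trajectory


K, M = 1.0, 1.0  # spring constant and mass in reduced units


def harmonic_force(x):
    return -K * x


def energy(x, v):
    return 0.5 * M * v * v + 0.5 * K * x * x


traj = velocity_verlet(1.0, 0.0, harmonic_force, M, dt=0.05, steps=2000)
e0 = energy(*traj[0])
drift = max(abs(energy(x, v) - e0) for x, v in traj) / e0
print(f"max relative energy error over {len(traj) - 1} steps: {drift:.1e}")
```

Because the scheme is symplectic, the energy error stays bounded and oscillates rather than drifting steadily. This property is what makes microsecond-scale production runs feasible.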
Era of prediction-first structural biology.
- AlphaFold gives a baseline model for nearly any well-folded protein.
- Experimental work increasingly focuses on:
  - Conformational ensembles and dynamics (allostery, signaling)
  - Assemblies and quaternary states
  - Ligand-bound complexes for drug discovery
  - The >30% of human proteins that are disordered, conditionally folded, or condensate-resident
  - Disease mutants and ensemble shifts (AlphaMissense annotates 71M missense variants 2023)
  - Non-canonical states (cancer kinase resistance mutations, kinome-wide selectivity panels)
- Exactly the regimes where prediction is least reliable and where biology is most interesting.
Adjacent
- cell-molecular-biology — central dogma, organelles, and where folding happens (cytosol, ER, mitochondria)
- genetics-and-genomics — sequence is structure’s starting input; MSAs power AlphaFold
- immunology-foundations — antibody Fab structures and engineering
- biochemistry-overview — amino acid chemistry, enzyme mechanism
- physical-chemistry — thermodynamics of folding and binding, statistical mechanics of conformational ensembles
- _index — Biology MOC