Cell & Molecular Biology — Biology Reference

1. At a glance

Life’s fundamental unit is the cell. Every known living organism is built of one or more cells; viruses are an edge case (obligate intracellular parasites, not generally classified as cells). Two structural classes of cellular life are distinguished (together spanning the three domains Bacteria, Archaea, and Eukarya):

  • Prokaryote — Bacteria + Archaea. No membrane-bound nucleus or organelles. Single circular chromosome in a nucleoid region. Typically 1–5 µm diameter.
  • Eukaryote — Animals, plants, fungi, protists. Linear chromosomes inside a membrane-bound nucleus. Extensive endomembrane system (ER, Golgi, lysosomes, endosomes) plus mitochondria (and chloroplasts in plants + algae) acquired by endosymbiosis from ancestral bacteria (Lynn Margulis, 1967). Typically 10–100 µm.

All cells share four invariants:

  1. A lipid bilayer plasma membrane separating inside from outside.
  2. A DNA genome as heritable information store (RNA genomes occur only in some viruses, which are not cells).
  3. Ribosomes that translate mRNA into protein.
  4. ATP as the universal energy currency for metabolism.

The central dogma of molecular biology, articulated by Francis Crick in 1958 and refined in 1970, states that sequence information flows DNA → RNA → protein. The dogma admits two well-characterized exceptions: reverse transcription (RNA → DNA, used by retroviruses such as HIV via reverse transcriptase) and RNA replication (RNA → RNA, used by RNA viruses such as influenza). It explicitly excludes protein → nucleic-acid flow; the only known exception that resembles this is the prion mechanism, which propagates conformational state rather than sequence.

Modern cell + molecular biology is deeply quantitative + computational. A reasonable practitioner in 2026 routinely uses sequencing-by-synthesis at ~$200 per genome, single-cell RNA-seq atlases with 10⁶–10⁷ cells, CRISPR-Cas9 + base editing + prime editing for genome perturbation, AlphaFold-3 for structural prediction, and foundation models (ESM-2, Evo, scGPT, AlphaProteo, RFdiffusion, Boltz-1) for protein design + functional annotation.

2. Cell structure

Prokaryote (E. coli as canonical model)

E. coli K-12 is the workhorse: ~2 µm long rod, doubling time ~20 min in rich medium at 37 °C, ~4.6 Mb genome (~4,300 protein-coding genes).
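
The ~20 min doubling time implies very fast exponential growth while nutrients last. A minimal sketch of that arithmetic, assuming unrestricted growth with no lag or stationary phase (the function name `cells_after` is illustrative, not from any library):

```python
def cells_after(minutes: float, doubling_time_min: float = 20.0, n0: float = 1.0) -> float:
    """Cell count after `minutes` of unrestricted exponential growth.

    Assumes a constant doubling time, which only holds while nutrients
    are not limiting (real cultures plateau).
    """
    return n0 * 2 ** (minutes / doubling_time_min)


if __name__ == "__main__":
    for hours in (1, 2, 8):
        print(f"{hours} h: {cells_after(hours * 60):.3g} cells")
```

Under these idealized assumptions one cell becomes 8 after an hour; the same formula shows why overnight cultures reach saturation rather than the astronomically large numbers pure doubling would predict.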

Components:

  • Plasma membrane — phospholipid bilayer with embedded proteins. No internal organelles; oxidative phosphorylation runs at the inner surface of the plasma membrane.
  • Cytoplasm — aqueous + crowded (~200 mg/mL protein), houses metabolic machinery, ribosomes, and the nucleoid.
  • Nucleoid — non-membrane-bound region containing the circular chromosome supercoiled by topoisomerases + nucleoid-associated proteins (NAPs: HU, H-NS, Fis, IHF).
  • Ribosomes (70S) — two subunits 30S (16S rRNA + 21 proteins) + 50S (23S + 5S rRNA + 33 proteins). 70S/50S/30S Svedberg coefficients describe sedimentation, not mass.
  • Plasmids — extra-chromosomal circular DNA; carry antibiotic resistance, virulence factors, conjugation machinery.
  • Cell wall — Gram-positive (thick peptidoglycan + teichoic acid, retains crystal violet) vs Gram-negative (thin peptidoglycan + outer membrane with lipopolysaccharide (LPS) + porins, does not retain stain). Mycobacteria have a third class with mycolic acid + are acid-fast (Ziehl-Neelsen).
  • Flagella + pili (fimbriae) — motility + adhesion + conjugation (F pilus mediates plasmid transfer).
  • Capsule — polysaccharide layer external to the cell wall in some species; virulence factor.

Archaea share the prokaryote architecture but use ether-linked isoprenoid membrane lipids (vs ester-linked fatty-acid bacterial/eukaryote lipids), have pseudopeptidoglycan or S-layer instead of peptidoglycan, and use a eukaryote-like transcription apparatus (TBP, TFIIB homologs).

Eukaryote

A typical mammalian cell is ~10–30 µm with the following organelles:

  • Nucleus — double membrane (nuclear envelope) perforated by nuclear pore complexes (NPCs, ~120 MDa, ~30 nucleoporins). Houses chromatin + nucleolus (rRNA synthesis + ribosome subunit assembly).
  • Endoplasmic reticulum (ER) — continuous with the outer nuclear membrane. Rough ER studded with ribosomes synthesizes secreted + membrane proteins; smooth ER handles lipid synthesis, detoxification (cytochrome P450), Ca²⁺ storage. ER is the entry point of the secretory pathway.
  • Golgi apparatus — cis → medial → trans stack; protein maturation, glycosylation editing, sorting to plasma membrane, lysosomes, or secretory vesicles.
  • Lysosomes + endosomes — acidic (pH ~4.5–5) compartments with hydrolases for degradation. Early endosome → late endosome → lysosome maturation. Lysosomal storage diseases (Tay-Sachs, Gaucher, Pompe) trace to defective hydrolases.
  • Mitochondria — double membrane; outer permeable, inner highly folded into cristae. Houses TCA cycle (matrix) + electron transport chain + ATP synthase (cristae). ~16.5 kb circular maternally-inherited genome encoding 13 protein subunits + 22 tRNAs + 2 rRNAs; ~1,500 mitochondrial proteins encoded in the nuclear genome and imported. Endosymbiont origin from an α-proteobacterium ~1.5–2 Bya.
  • Chloroplasts (plants + algae) — analogous double-membrane organelle for photosynthesis; ~120–160 kb genome, cyanobacterial origin.
  • Peroxisomes — single-membrane, β-oxidation of very-long-chain fatty acids + plasmalogen synthesis + H₂O₂ detoxification (catalase).
  • Ribosomes (80S) — 40S (18S rRNA + 33 proteins) + 60S (28S + 5.8S + 5S rRNA + ~49 proteins). Cytoplasmic translation; mitochondrial + chloroplast ribosomes resemble prokaryote 70S.
  • Cytoskeleton — three filament systems:
    • Actin microfilaments (~7 nm) — cortex, lamellipodia, cytokinesis ring, muscle contraction (with myosin).
    • Microtubules (~25 nm) — α/β-tubulin heterodimer polymers; mitotic spindle, intracellular transport (kinesin + dynein motors), cilia + flagella (9+2 axoneme).
    • Intermediate filaments (~10 nm) — keratin (epithelia), vimentin (mesenchymal), neurofilaments (neurons), lamins (nuclear envelope). Mechanical resilience.

3. Biomolecules

Carbohydrates

Built from monosaccharides (glucose, fructose, galactose, mannose, ribose, deoxyribose). Linked via glycosidic bonds into disaccharides (sucrose, lactose, maltose), oligosaccharides (used in glycoprotein N + O-linked glycans), and polysaccharides: starch (α-1,4 + α-1,6 glucose, plant storage), glycogen (highly branched α-1,4 + α-1,6, animal storage), cellulose (β-1,4 glucose, plant cell wall, indigestible to most animals), chitin (β-1,4 N-acetylglucosamine, fungal cell wall + arthropod exoskeleton).

Roles: energy (glycolysis substrate); structure (cellulose, peptidoglycan); signaling + recognition (cell-surface glycans, blood-group antigens, selectin ligands).

Lipids

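Hydrophobic + amphipathic small molecules. LIPID MAPS defines eight categories (fatty acyls, glycerolipids, glycerophospholipids, sphingolipids, sterol lipids, prenol lipids, saccharolipids, polyketides); the classes most relevant to cell biology: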

  • Fatty acids — long-chain carboxylic acids; saturated vs unsaturated (mono + poly); essential ω-3 + ω-6 cannot be synthesized de novo by humans.
  • Glycerolipids — mono/di/triacylglycerols; energy storage (~9 kcal/g, vs ~4 for carbs + protein).
  • Glycerophospholipids — phosphatidylcholine, -ethanolamine, -serine, -inositol (PI). The bulk of the membrane bilayer. PI is a signaling reservoir (PI 4,5-bisphosphate → IP3 + DAG via PLC).
  • Sphingolipids — ceramide backbone; sphingomyelin (myelin), gangliosides (neuronal membranes, blood-group basis), sulfatides.
  • Sterols — cholesterol (animals, modulates membrane fluidity + raft formation; precursor of steroid hormones + bile acids + vitamin D), ergosterol (fungi, azole-antifungal target), phytosterols (plants).
  • Eicosanoids — prostaglandins, thromboxanes, leukotrienes; signaling lipids from arachidonic acid via COX + LOX.

Proteins

Polymers of 20 canonical L-α-amino acids linked by peptide bonds (amide between α-COOH + α-NH₂; planar + trans by default; cis-prolines rare). Side chains classified as nonpolar (Gly, Ala, Val, Leu, Ile, Met, Phe, Trp, Pro), polar uncharged (Ser, Thr, Asn, Gln, Tyr, Cys), basic (Lys, Arg, His), acidic (Asp, Glu). Plus selenocysteine (Sec, U) + pyrrolysine (Pyl, O) as the 21st + 22nd encoded in some organisms.

Four levels of structure (Anfinsen’s 1972 Nobel work showed that the primary sequence alone specifies the native fold):

  1. Primary — amino-acid sequence (covalent).
  2. Secondary — local backbone hydrogen-bonded structures: α-helix (3.6 residues/turn, 5.4 Å pitch, Pauling 1951), β-sheet (parallel or antiparallel), turns, coils.
  3. Tertiary — full 3D fold of a single chain; stabilized by hydrophobic core, salt bridges, hydrogen bonds, disulfides (Cys–Cys), occasional metal coordination.
  4. Quaternary — assembly of multiple chains into a complex (hemoglobin α₂β₂, ribosome, proteasome, nuclear pore).
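
The α-helix numbers above (3.6 residues/turn, 5.4 Å pitch) fix the rise per residue, which is handy for estimating helix lengths. A small sketch of that arithmetic; the ~30 Å hydrophobic bilayer thickness used in the example is an assumed typical value, not a figure from this reference:

```python
import math

RESIDUES_PER_TURN = 3.6   # from the alpha-helix geometry quoted above
PITCH_ANGSTROM = 5.4      # axial advance per full turn


def rise_per_residue() -> float:
    """Axial rise per residue in angstroms (pitch / residues per turn)."""
    return PITCH_ANGSTROM / RESIDUES_PER_TURN


def helix_length_angstrom(n_residues: int) -> float:
    """Approximate end-to-end length of an ideal alpha-helix."""
    return n_residues * rise_per_residue()


def residues_to_span(length_angstrom: float) -> int:
    """Smallest number of residues whose ideal helix covers the given length."""
    # Round before ceil so floating-point noise does not add a spurious residue.
    return math.ceil(round(length_angstrom / rise_per_residue(), 9))


if __name__ == "__main__":
    print(f"rise per residue: {rise_per_residue():.2f} A")
    # Assumed ~30 A hydrophobic core for a lipid bilayer (illustrative value).
    print("residues to span 30 A:", residues_to_span(30.0))
```

The result, about 20 residues for an assumed 30 Å core, is consistent with the length of typical single-pass transmembrane helices.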

Functions: enzymes (catalysis, ~10⁶–10¹⁵ rate enhancement), structural (collagen, keratin, tubulin), transport (hemoglobin, channels, pumps), signaling (receptors, kinases, hormones), storage (ferritin, casein), defense (antibodies, complement), motors (myosin, kinesin, dynein).

Nucleic acids

  • DNA (deoxyribonucleic acid) — deoxyribose + phosphate backbone, bases adenine (A), guanine (G), cytosine (C), thymine (T). Right-handed B-form double helix (Watson + Crick, Nature 1953, building on Rosalind Franklin + Maurice Wilkins X-ray data + Erwin Chargaff base-ratio rules). Antiparallel strands (5’→3’ opposing 3’→5’); base pairs A=T (2 H-bonds) + G≡C (3 H-bonds); helix diameter ~2 nm, 10.5 bp/turn, major + minor grooves.
  • RNA (ribonucleic acid) — ribose backbone; uracil (U) replaces thymine. Usually single-stranded but extensively folded; mRNA (messenger), tRNA (transfer), rRNA (ribosomal), miRNA (microRNA), siRNA (small interfering), lncRNA (long non-coding), piRNA (Piwi-interacting), snoRNA, snRNA. Catalytic RNAs (ribozymes; Cech + Altman 1989 Nobel) support an RNA-world origin hypothesis.
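
The antiparallel pairing rules and the T → U substitution described above can be illustrated with a few string operations. This is a minimal sketch in plain Python; the helper names are illustrative, and ambiguity codes (N, R, Y, ...) are deliberately not handled:

```python
# Watson-Crick complements and hydrogen bonds per base pair (A=T: 2, G≡C: 3).
COMPLEMENT = str.maketrans("ACGT", "TGCA")
HBONDS = {"A": 2, "T": 2, "G": 3, "C": 3}


def reverse_complement(dna: str) -> str:
    """Sequence of the opposite strand, written 5'->3'."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def transcribe(coding_strand: str) -> str:
    """mRNA with the same sequence as the coding strand, U in place of T."""
    return coding_strand.upper().replace("T", "U")


def duplex_hbonds(dna: str) -> int:
    """Total Watson-Crick hydrogen bonds in the duplex formed by `dna`."""
    return sum(HBONDS[base] for base in dna.upper())


if __name__ == "__main__":
    seq = "ATGC"
    print(reverse_complement(seq), transcribe(seq), duplex_hbonds(seq))
```

Reversing after complementing is what encodes the antiparallel geometry: the complement of a 5'→3' strand reads 3'→5', so it must be flipped to be written in the conventional direction.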

4. DNA + chromatin

The diploid human genome is ~6 Gb (~3 Gb haploid); ~20,000 protein-coding genes occupy only ~1.5%; the rest is regulatory, structural, repetitive, or of uncertain function (“dark genome”). Other landmark genome sizes: E. coli 4.6 Mb, yeast S. cerevisiae 12 Mb, Drosophila 140 Mb, mouse 2.7 Gb, lungfish 130 Gb (largest known animal).

Packaging in eukaryote nuclei is hierarchical:

  • Nucleosome — 147 bp DNA wrapped 1.65 times around a histone octamer (2× each of H2A, H2B, H3, H4); proposed by Roger Kornberg in 1974 (his 2006 Nobel recognized work on eukaryotic transcription). Linker DNA + linker histone H1 between nucleosomes.
  • 10 nm fiber (beads-on-a-string) → 30 nm fiber (disputed in vivo) → topologically associated domains (TADs) → A/B compartments → chromosome territories. TADs are mapped by Hi-C (Lieberman-Aiden 2009) + Micro-C; defined by CTCF + cohesin loop extrusion (Mirny + Dekker models).
  • Telomeres — TTAGGG repeats at chromosome ends; protected by shelterin complex (TRF1, TRF2, POT1, TIN2, TPP1, RAP1). Telomerase (reverse transcriptase, Greider + Blackburn + Szostak 2009 Nobel) extends telomeres in germline + stem cells + most cancers.
  • Centromeres — site of kinetochore assembly + spindle attachment; defined epigenetically by CENP-A histone variant rather than DNA sequence.
  • Heterochromatin vs euchromatin — heterochromatin is condensed, transcriptionally silent (constitutive: centromere + telomere; facultative: inactive X), marked by H3K9me3 + H3K27me3 + HP1 binding. Euchromatin is open, active, marked by H3K4me3 (promoters) + H3K27ac + H3K36me3.

DNA replication is semi-conservative — each parent strand templates a new daughter (Meselson + Stahl 1958, the “most beautiful experiment in biology”). Mechanism in E. coli:

  1. Initiation at a single origin (oriC) — DnaA-ATP melts the AT-rich DUE; DnaB helicase loaded; SSB stabilizes ssDNA.
  2. Elongation — DNA polymerase III holoenzyme replicates the leading strand continuously (5’→3’); the lagging strand discontinuously via Okazaki fragments (~1–2 kb in bacteria, ~150 bp in eukaryotes) initiated by RNA primers from primase. DNA Pol III synthesizes; Pol I removes primers + fills gaps; DNA ligase seals nicks.
  3. Termination at ter sites; chromosome decatenated by topoisomerase IV.
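
The semi-conservative outcome that Meselson + Stahl observed can be reproduced with a toy strand-tracking simulation. This sketch assumes the classic setup (one fully heavy duplex shifted into light medium); the labels "H" and "L" and the function names are illustrative:

```python
from collections import Counter


def replicate(duplexes):
    """Semi-conservative round: each old strand templates one new light strand."""
    daughters = []
    for strand_a, strand_b in duplexes:
        daughters.append((strand_a, "L"))
        daughters.append((strand_b, "L"))
    return daughters


def density_fractions(generations: int) -> dict:
    """Fractions of heavy, hybrid, and light duplexes after n generations."""
    duplexes = [("H", "H")]  # start: one duplex with both strands heavy
    for _ in range(generations):
        duplexes = replicate(duplexes)
    counts = Counter("".join(sorted(d)) for d in duplexes)
    total = len(duplexes)
    return {
        "heavy": counts["HH"] / total,
        "hybrid": counts["HL"] / total,
        "light": counts["LL"] / total,
    }


if __name__ == "__main__":
    for gen in range(4):
        print(gen, density_fractions(gen))
```

The simulation yields all-hybrid DNA after one generation and a 1:1 hybrid:light mix after two, the band pattern that ruled out conservative and dispersive models.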

Eukaryote replication: thousands of origins per chromosome, licensed in G1 by ORC + Cdc6 + Cdt1 loading MCM helicase; activated in S by CDK + DDK firing. Polymerases: Pol α (primase), Pol δ (lagging strand), Pol ε (leading strand). Fidelity: ~10⁻⁹ errors per base per round, achieved by combined polymerase selectivity (~10⁻⁵), 3’→5’ proofreading exonuclease (~10⁻²), and post-replicative mismatch repair (~10⁻²–10⁻³).
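
The fidelity figures above multiply, because each layer only sees the errors that escaped the previous one. A rough sketch of that calculation, treating the three steps as independent (a simplifying assumption) and using the ~6 Gb diploid genome size quoted in this section:

```python
def overall_error_rate(selectivity: float = 1e-5,
                       proofreading: float = 1e-2,
                       mismatch_repair: float = 1e-2) -> float:
    """Per-base error rate after three sequential, independent fidelity layers."""
    return selectivity * proofreading * mismatch_repair


def expected_errors(genome_bp: float, error_rate: float) -> float:
    """Expected number of uncorrected errors in one round of replication."""
    return genome_bp * error_rate


if __name__ == "__main__":
    for mmr in (1e-2, 1e-3):
        rate = overall_error_rate(mismatch_repair=mmr)
        print(f"MMR escape {mmr:g}: {rate:.0e} errors/base, "
              f"~{expected_errors(6e9, rate):.1f} per diploid genome copy")
```

With these order-of-magnitude inputs the estimate comes to roughly one to a handful of new errors per diploid genome per division, which is why loss of any single layer (for example MMR in Lynch syndrome) raises the mutation rate so sharply.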

DNA damage + repair pathways: base-excision repair (BER, glycosylase + AP endonuclease), nucleotide-excision repair (NER, XPA–G + ERCC1; defective in xeroderma pigmentosum), mismatch repair (MMR, MSH2/MSH6 + MLH1/PMS2; defective in Lynch syndrome), homologous recombination (HR, BRCA1 + BRCA2 + RAD51), non-homologous end joining (NHEJ, Ku70/80 + DNA-PKcs + LIG4). Synthetic-lethality drugs: PARP inhibitors (olaparib, talazoparib) kill BRCA-deficient tumors.

5. Transcription

In eukaryotes, three nuclear RNA polymerases divide labor:

  • RNA Pol I — rRNA (28S, 18S, 5.8S) in the nucleolus.
  • RNA Pol II — all mRNA + most snRNA + miRNA.
  • RNA Pol III — tRNA, 5S rRNA, U6 snRNA, 7SK, other short RNAs.

Pol II transcription cycle:

  1. Initiation — general transcription factors (TFIID with TBP, TFIIA, TFIIB, TFIIE, TFIIF, TFIIH) assemble at the core promoter (TATA box at –25 in TATA-containing genes; Inr, DPE, BRE in others). Mediator complex bridges to enhancers + transcription factors. TFIIH phosphorylates Pol II’s CTD (Ser5) and unwinds DNA.
  2. Promoter clearance + elongation — P-TEFb (CDK9 + cyclin T) phosphorylates Pol II CTD Ser2; releases the Pol from promoter-proximal pause (NELF + DSIF imposed); elongation factors (TFIIS, Spt4/5, Paf1C) sustain productive elongation.
  3. Termination — Pol II runs past the polyA signal; CPSF + CstF cleavage; XRN2 5’-exonuclease “torpedo” model.

Pre-mRNA processing is co-transcriptional:

  • 5’ cap — 7-methylguanosine added via inverted 5’-5’ triphosphate linkage; required for nuclear export, ribosome recruitment, stability.
  • Splicing — introns removed + exons joined by the spliceosome (U1, U2, U4, U5, U6 snRNPs + ~170 proteins). Two transesterification reactions; lariat intermediate. Alternative splicing generates multiple isoforms from a single gene; ~95% of multi-exon human genes are alternatively spliced. Disease-causing splice mutations are extensive (e.g., spinal muscular atrophy → nusinersen, an ASO that corrects SMN2 splicing; Spinraza 2016).
  • 3’ polyadenylation — CPSF + CstF cleave at AAUAAA + downstream U/GU element; poly(A) polymerase adds ~200 A residues. Alternative polyadenylation (APA) generates isoforms with different 3’ UTRs.
  • RNA editing — A→I (inosine, read as G) by ADAR enzymes; C→U by APOBEC; extensive in nervous system + Alu elements.

Regulation is combinatorial. Humans encode ~1,600 transcription factors (TFs), of which ~80% are sequence-specific DNA binders (Lambert et al., Cell 2018). Major TF families: zinc fingers (largest, ~700), homeodomain, basic-helix-loop-helix (bHLH), basic-leucine-zipper (bZIP), nuclear receptors (ER, AR, GR, PPAR, RAR — drug targets). Enhancers can be 10–1000 kb from their target promoters and reach via chromatin looping (cohesin + CTCF + Mediator). Super-enhancers + phase-separated transcriptional condensates (Hnisz, Young, Brangwynne) concentrate TFs + Pol II + Mediator at lineage-defining genes.

Epigenetic regulation: DNA methylation (CpG, mostly silencing; DNMT1 maintenance + DNMT3A/B de novo; TET 5mC → 5hmC oxidation; bisulfite + EM-seq for detection). Histone modifications: methylation (Lys 4/9/27/36/79; Arg), acetylation (Lys, by HATs like p300/CBP, removed by HDACs), phosphorylation, ubiquitination, sumoylation, lactylation. Reader–writer–eraser model (Strahl + Allis “histone code” 2000).

6. Translation

The ribosome is a ribozyme — peptide-bond formation is catalyzed by 23S/28S rRNA, not protein (Steitz + Yonath + Ramakrishnan 2009 Nobel for ribosome structure).

Genetic code: 64 codons (4³) → 20 amino acids + 3 stop (UAA, UAG, UGA). The code is degenerate (most amino acids encoded by 2–6 codons), almost universal (mitochondria + ciliates have minor variations), and read 5’→3’ in non-overlapping triplets from the AUG start. Wobble at the third position (Crick 1966) allows a single tRNA to read multiple synonymous codons.
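
The counts in the paragraph above (64 codons, 3 stops, 20 amino acids, uneven degeneracy) can be checked directly by building the standard code as a lookup table. A minimal sketch: the 64-letter string is the standard genetic code in U/C/A/G order, and `translate` is a deliberately naive illustration (first AUG, no Kozak context, no alternative codes):

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Translate from the first AUG to the first in-frame stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":  # stop codon: UAA, UAG, or UGA
            break
        peptide.append(aa)
    return "".join(peptide)


if __name__ == "__main__":
    degeneracy = Counter(CODON_TABLE.values())
    print(len(CODON_TABLE), "codons;", degeneracy["*"], "stops")
    print("Leu codons:", degeneracy["L"], "| Met codons:", degeneracy["M"])
    print(translate("GGAUGGCCUAAUU"))
```

Counting table values makes the degeneracy concrete: Leu, Ser, and Arg have six codons each, while Met and Trp have one.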

Translation cycle:

  1. Initiation — small subunit + initiator tRNA-Met scan mRNA from the 5’ cap (eIF4F: eIF4E cap-binder + eIF4G scaffold + eIF4A helicase) to the first Kozak-context AUG; large subunit joins; eIFs released. Internal ribosome entry sites (IRES) bypass cap dependence in some viruses + stress conditions.
  2. Elongation — eEF1A delivers aminoacyl-tRNA to A site; peptidyl transfer; eEF2 + GTP translocates ribosome one codon.
  3. Termination — stop codon recognized by release factor (eRF1 + eRF3); peptide released; ribosome recycled (ABCE1 splits subunits).

Aminoacyl-tRNA synthetases charge tRNAs with their cognate amino acid; the 20 enzymes split into Class I + II, and several carry editing domains that hydrolyze mischarged products (Fersht + Schimmel proofreading).

Co-translational + post-translational modifications (PTMs):

  • Signal peptide cleavage — N-terminal signal → SRP-mediated ER targeting → signal peptidase.
  • N-linked glycosylation — Asn-X-Ser/Thr; oligosaccharyltransferase in ER.
  • Disulfide bond formation — PDI in ER lumen.
  • Phosphorylation — Ser/Thr/Tyr; ~500 protein kinases in human kinome (Manning et al. 2002); major drug target class (>70 FDA-approved kinase inhibitors, e.g., imatinib for BCR-ABL).
  • Acetylation, methylation, ubiquitination, SUMOylation, NEDDylation, lipidation (myristoylation, palmitoylation, prenylation, GPI anchor), ADP-ribosylation, hydroxylation, sulfation, citrullination, O-GlcNAcylation.
  • Ubiquitin-proteasome system — E1 + E2 + E3 (~600 E3 ligases) attach polyubiquitin (K48-linked targets to proteasome; K63 + others signal differently). Drug class: targeted protein degradation via PROTACs + molecular glues (Arvinas, C4 Therapeutics, Kymera, Foghorn) — bivalent molecules that recruit an E3 to a target protein.
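
The N-linked glycosylation motif listed above (Asn-X-Ser/Thr) is easy to scan for in a protein sequence. A hedged sketch using a regular expression; it adds the usual rule that X is any residue except proline, which the list above does not spell out, and it finds candidate sequons only (actual glycosylation also depends on ER targeting and local structure):

```python
import re

# N, then any residue except P, then S or T. The lookahead keeps matches
# zero-width after the Asn so overlapping sequons are all reported.
SEQUON = re.compile(r"N(?=[^P][ST])")


def find_sequons(protein: str) -> list:
    """1-based positions of Asn residues in candidate N-X-S/T sequons."""
    return [m.start() + 1 for m in SEQUON.finditer(protein.upper())]


if __name__ == "__main__":
    # Toy peptide: Asn2 and Asn10 qualify, Asn6 is followed by Pro.
    print(find_sequons("MNGSANPTQNLT"))
```

Positions are reported 1-based to match how residues are conventionally numbered in protein annotations.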

7. Protein folding + quality control

Proteins fold spontaneously (Anfinsen’s thermodynamic hypothesis) but in the crowded cell rely on molecular chaperones to prevent aggregation:

  • HSP70 family (DnaK in bacteria; HSPA1, HSPA8 in human) — ATP-driven cycles binding hydrophobic stretches; assisted by J-domain proteins (HSP40/DnaJ) + NEFs.
  • HSP90 (HSP90AA1, HSP90AB1; HtpG in bacteria) — late-stage folding of signaling clients (kinases, steroid receptors); inhibitors (geldanamycin, ganetespib) showed limited clinical success.
  • Chaperonins — large barrel-shaped complexes; bacterial GroEL/ES + eukaryote TRiC/CCT; encapsulated folding chamber.
  • Small HSPs (HSPB1/Hsp27) — ATP-independent aggregate sequestration.

ER stress + unfolded protein response (UPR) — sensors PERK (eIF2α phosphorylation → translation attenuation), IRE1 (XBP1 splicing), ATF6 (Golgi cleavage → TF). Chronic ER stress drives apoptosis (CHOP).

Degradation pathways:

  • Ubiquitin-proteasome system — soluble misfolded + regulated proteins.
  • Autophagy — macroautophagy (autophagosome → lysosome), chaperone-mediated autophagy (HSC70 + LAMP2A), microautophagy. Ohsumi 2016 Nobel for autophagy genetics. Selective forms: mitophagy (PINK1 + Parkin), aggrephagy, xenophagy.

Misfolding diseases:

  • Alzheimer’s disease — extracellular amyloid-β (Aβ) plaques + intracellular hyperphosphorylated tau tangles. Anti-amyloid antibodies lecanemab (Leqembi, Eisai + Biogen, FDA 2023) + donanemab (Kisunla, Lilly, FDA 2024) clear Aβ + modestly slow decline.
  • Parkinson’s disease — α-synuclein Lewy bodies in dopaminergic substantia nigra neurons; GBA + LRRK2 mutations as major risk factors; prasinezumab + other α-syn antibodies in trials.
  • Huntington’s — polyQ-expanded huntingtin (CAG triplet repeat).
  • Prion diseases (Creutzfeldt-Jakob, kuru, BSE) — self-propagating PrP^Sc conformer (Prusiner 1997 Nobel).
  • Transthyretin amyloidosis — patisiran (Onpattro, Alnylam, first siRNA drug FDA 2018) + tafamidis (stabilizer) + vutrisiran.

Structural prediction revolution:

  • AlphaFold-2 (Jumper et al., Nature 2021) — transformer-based model achieved near-experimental accuracy at CASP14; predicted structures for ~200M proteins now in AlphaFold DB (EMBL-EBI).
  • AlphaFold-3 (Abramson et al., Nature 2024) — single architecture predicts proteins + nucleic acids + small-molecule ligands + ions + covalent modifications + complexes; available via AlphaFold Server (research use) + weights released for academic use late 2024.
  • ESMFold (Lin et al., Science 2023; Meta AI) — language-model-only, no MSA required; faster but slightly less accurate.
  • RoseTTAFold + RoseTTAFold All-Atom (Baker lab) — competitive open-source; All-Atom handles ligands + nucleic acids.
  • Boltz-1 (MIT 2024) + Boltz-2 (2025) — open-source AlphaFold-3-class models with permissive licensing.

8. Cell membrane + transport

Fluid mosaic model (Singer + Nicolson, Science 1972) — phospholipid bilayer with embedded proteins free to diffuse laterally. Modern refinement: lipid rafts (cholesterol + sphingolipid-enriched microdomains, ~10–200 nm) compartmentalize signaling.

Membrane composition is asymmetric — PC + sphingomyelin enriched on outer leaflet; PE + PS + PI on inner. The asymmetry is maintained by ATP-driven flippases + floppases and collapsed by scramblases; PS exposure on the outer leaflet (via scramblases like Xkr8) is an “eat-me” signal to phagocytes during apoptosis.

Transport across membranes:

  • Passive diffusion — small nonpolar or uncharged polar molecules (O₂, CO₂, NH₃, urea, ethanol, steroid hormones) cross unaided.
  • Facilitated diffusion — via channels (Na⁺, K⁺, Ca²⁺, Cl⁻, aquaporins for water — Agre 2003 Nobel) or carriers (GLUT1–4 glucose transporters).
  • Primary active transport — ATP-driven pumps: Na⁺/K⁺-ATPase (3 Na⁺ out, 2 K⁺ in per ATP; maintains the ion gradients underlying the ~ –70 mV resting membrane potential), Ca²⁺-ATPase (SERCA, PMCA), H⁺-ATPase (V-ATPase, F-ATPase), ABC transporters (P-glycoprotein/MDR1 effluxes chemotherapeutics).
  • Secondary active transport — driven by electrochemical gradient: Na⁺-glucose symporter SGLT1 (SGLT2 inhibitors empagliflozin + dapagliflozin are major type-2 diabetes + heart-failure drugs), Na⁺/H⁺ exchanger, Na⁺/Ca²⁺ exchanger.
  • Endocytosis — clathrin-mediated (receptor-mediated, e.g., LDL uptake; Brown + Goldstein 1985 Nobel), caveolae, macropinocytosis, phagocytosis.
  • Exocytosis — constitutive (continuous) + regulated (Ca²⁺-triggered, e.g., neurotransmitter release via SNAREs: synaptobrevin/VAMP + syntaxin + SNAP-25; Rothman + Schekman + Südhof 2013 Nobel).
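
The link between the ion gradients built by the Na⁺/K⁺-ATPase and a resting potential near –70 mV can be made concrete with the Nernst equation, E = (RT/zF)·ln([out]/[in]). A sketch under stated assumptions: the concentrations below (K⁺ 5 mM out / 140 mM in; Na⁺ 145 mM out / 12 mM in) are typical textbook values for a mammalian cell, not figures from this reference:

```python
import math

R = 8.314462618    # gas constant, J/(mol*K)
F = 96485.33212    # Faraday constant, C/mol


def nernst_mv(z: int, conc_out_mM: float, conc_in_mM: float,
              temp_c: float = 37.0) -> float:
    """Equilibrium (Nernst) potential in mV for an ion of charge z."""
    temp_k = temp_c + 273.15
    return 1000.0 * R * temp_k / (z * F) * math.log(conc_out_mM / conc_in_mM)


if __name__ == "__main__":
    # Assumed typical concentrations (mM); see lead-in.
    print(f"E_K  = {nernst_mv(1, 5.0, 140.0):.1f} mV")
    print(f"E_Na = {nernst_mv(1, 145.0, 12.0):.1f} mV")
```

With these inputs E_K comes out near –89 mV and E_Na near +67 mV; a resting potential around –70 mV sits close to E_K because the resting membrane is far more permeable to K⁺ than to Na⁺.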

9. Cell signaling

Signal transduction converts an extracellular signal (hormone, growth factor, neurotransmitter, pathogen) into intracellular response. Major receptor + pathway families:

GPCRs (G-protein-coupled receptors)

~800 in human genome; 7-transmembrane α-helices. Activated GPCR catalyzes GDP → GTP exchange on heterotrimeric G protein α subunit; Gα + Gβγ dissociate + act on effectors. Gα families: Gs (activates adenylyl cyclase → cAMP), Gi (inhibits AC), Gq (PLCβ → IP3 + DAG → Ca²⁺ + PKC), G12/13 (RhoGEF → cytoskeleton). β-arrestin desensitizes + drives biased signaling. GPCRs are targets of ~35% of FDA-approved drugs. Lefkowitz + Kobilka shared the 2012 Chemistry Nobel for structural + functional dissection.

Notable GPCR drugs: β-blockers (propranolol, metoprolol), antihistamines (loratadine), opioids (morphine — MOR; naloxone reversal), AT1 antagonists (losartan), serotonin agonists (sumatriptan), GLP-1 agonists (semaglutide/Ozempic + Wegovy, tirzepatide/Mounjaro + Zepbound — GLP-1 + GIP dual; Eli Lilly + Novo Nordisk drove the 2023–2025 obesity revolution).

RTKs (Receptor Tyrosine Kinases)

~58 in human. Ligand binding → dimerization → trans-autophosphorylation on Tyr residues → SH2/PTB-domain adapter recruitment → downstream cascades. Examples: EGFR (HER family — gefitinib, erlotinib, osimertinib for NSCLC; trastuzumab for HER2+ breast cancer), insulin receptor (IRS → PI3K), VEGFR (bevacizumab, ramucirumab; sunitinib, pazopanib multi-kinase), FGFR, PDGFR, c-Kit, RET (selpercatinib), TRK (larotrectinib).

Major intracellular cascades

  • MAPK (Ras-Raf-MEK-ERK) — Ras GTPase (KRAS, HRAS, NRAS) activated downstream of RTKs; KRAS is the most frequently mutated oncogene in human cancer. Sotorasib + adagrasib selectively inhibit KRAS-G12C (Amgen + Mirati/BMS, FDA 2021 + 2022). Pan-RAS + KRAS-G12D inhibitors (Revolution Medicines RMC-6236) in late-stage trials 2025–26.
  • PI3K / Akt / mTOR — growth, survival, metabolism. PTEN tumor suppressor antagonizes. mTOR (rapamycin/sirolimus + everolimus) controls translation + autophagy + ribosome biogenesis.
  • Wnt / β-catenin — embryogenesis + stem cell maintenance; APC tumor suppressor; β-catenin mutations in colorectal cancer.
  • Hedgehog (Hh) — patched (PTCH1) + smoothened (SMO) + Gli; vismodegib (SMO inhibitor) for basal cell carcinoma.
  • Notch — ligand-receptor on neighboring cells; γ-secretase cleaves NICD → nuclear translocation.
  • JAK-STAT — cytokine + interferon signaling; tofacitinib + ruxolitinib + upadacitinib (JAK inhibitors for autoimmune + myelofibrosis).
  • NF-κB — inflammation + immune response; activated by TNF, IL-1, TLRs.
  • TGF-β / SMAD — growth inhibition + differentiation + EMT.
  • Hippo / YAP-TAZ — organ size + mechanotransduction.

Second messengers

cAMP (PKA), cGMP (PKG; PDE5 inhibitors sildenafil + tadalafil prolong cGMP), Ca²⁺ (calmodulin + CaMKII + calcineurin + troponin), IP3 (releases ER Ca²⁺ via IP3R), DAG (activates PKC), PIP3, NO (gaseous; activates soluble guanylyl cyclase; Furchgott + Ignarro + Murad 1998 Nobel), H₂S, CO.

10. Cell cycle + apoptosis

Eukaryote cell cycle: G1 → S → G2 → M → cytokinesis → G1 (or G0 quiescence).

Phases: G1 = growth + commitment (restriction point); S = DNA replication; G2 = preparation for mitosis; M = mitosis (prophase → prometaphase → metaphase → anaphase → telophase) + cytokinesis.

Regulators — cyclins (D in G1, E in late G1/S, A in S/G2, B in M) + cyclin-dependent kinases (CDKs):

  • CDK4/6 + cyclin D — G1 progression; Rb phosphorylation releases E2F. CDK4/6 inhibitors palbociclib (Ibrance, Pfizer 2015), ribociclib (Kisqali, Novartis), abemaciclib (Verzenio, Lilly) — major HR+/HER2− breast cancer drugs.
  • CDK2 + cyclin E — S entry.
  • CDK1 + cyclin B — M entry; inhibited by Wee1 kinase (inhibitory phosphorylation) + activated by Cdc25 phosphatases (which remove it).

CKIs (CDK inhibitors): p21 (CDKN1A, p53 target), p27 (CDKN1B), p16 (CDKN2A; INK4 family); tumor suppressors.

Checkpoints — DNA-damage (ATM/ATR → Chk2/Chk1 → p53 + Cdc25), G2/M (DNA integrity + completed replication before mitotic entry), spindle-assembly checkpoint (Mad2 + BubR1 + MCC inhibits APC/C until all kinetochores attached).

Tumor suppressors + oncogenes:

  • p53 (TP53) — “guardian of the genome” (David Lane 1992); mutated in >50% of human cancers; transactivates p21, PUMA, BAX, MDM2 (feedback). MDM2 antagonists (idasanutlin, navtemadlin) revive WT p53. Hainaut + Olivier IARC TP53 database catalogs ~30,000 somatic mutations.
  • Rb (retinoblastoma) — phosphorylated Rb releases E2F → S entry.
  • Oncogenes — MYC (transcription amplifier; difficult drug target; degraders + BET inhibitors targeting MYC-driven enhancers), RAS family, BCL-2 (anti-apoptotic; venetoclax/Venclexta is a BH3 mimetic, FDA-approved 2016 for CLL and later for AML).

Apoptosis (programmed cell death)

Defined morphologically by Kerr, Wyllie, Currie (1972 Br J Cancer). Distinct from necrosis (cell rupture, inflammatory) and from other PCD modes: necroptosis (RIPK1/3 + MLKL), pyroptosis (caspase-1/4/5/11 + gasdermin D), ferroptosis (iron-dependent lipid peroxidation, GPX4 + ACSL4), cuproptosis, NETosis, parthanatos.

Caspases — cysteine-aspartate proteases; initiator (caspase-8, -9, -10) → executioner (caspase-3, -6, -7).

  • Intrinsic (mitochondrial) pathway — cellular stress → BAX/BAK oligomerization → mitochondrial outer-membrane permeabilization (MOMP) → cytochrome c release → apoptosome (Apaf-1 + caspase-9) → caspase-3. Regulated by BCL-2 family (anti: BCL-2, BCL-xL, MCL1; pro: BAX, BAK; BH3-only: BIM, BID, PUMA, NOXA).
  • Extrinsic (death-receptor) pathway — Fas/FasL or TNFR1 + TRADD/FADD → DISC → caspase-8 → caspase-3 (+ tBID amplification loop into the intrinsic pathway).

Inhibitors of apoptosis: IAPs (XIAP, cIAP1/2, survivin) — antagonized by SMAC/DIABLO; SMAC mimetics (birinapant, tolinapant) in trials.

11. Mitosis + meiosis

Mitosis — somatic cell division producing two genetically identical diploid (2n) daughter cells:

  1. Prophase — chromatin condenses, centrosomes separate.
  2. Prometaphase — nuclear envelope breaks down (lamin phosphorylation by CDK1), microtubules capture kinetochores.
  3. Metaphase — chromosomes align at the metaphase plate.
  4. Anaphase — APC/C ubiquitinates securin → separase cleaves cohesin → sister chromatids separate; cyclin B destruction inactivates CDK1.
  5. Telophase — nuclear envelope reforms; chromosomes decondense.
  6. Cytokinesis — actomyosin contractile ring (animals) or cell plate (plants) divides the cytoplasm.

Meiosis — germ-line division producing four haploid (n) gametes, with one round of DNA replication followed by two divisions:

  • Meiosis I (reductional) — homologous chromosomes pair (synaptonemal complex), recombine via crossing-over (Spo11 induces DSBs → resection → DMC1/RAD51-mediated strand invasion → Holliday junctions → resolution by MUS81 + GEN1; ~30–40 crossovers per human meiosis), then separate; sister chromatids remain joined.
  • Meiosis II (equational) — sister chromatids separate (mitosis-like).

Genetic diversity arises from (a) crossing-over, (b) independent assortment of homolog pairs (2²³ ≈ 8.4 million combinations for human 23 pairs), (c) random fertilization.
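
The independent-assortment figure above is simple powers-of-two arithmetic. A short sketch that reproduces it and extends it to random fertilization, ignoring crossing-over (which only increases the diversity further); the function names are illustrative:

```python
def gamete_combinations(homolog_pairs: int = 23) -> int:
    """Distinct maternal/paternal chromosome assortments in one gamete."""
    return 2 ** homolog_pairs


def zygote_combinations(homolog_pairs: int = 23) -> int:
    """Assortment combinations when two such gametes fuse at random."""
    return gamete_combinations(homolog_pairs) ** 2


if __name__ == "__main__":
    print(f"gametes: {gamete_combinations():,}")
    print(f"zygotes: {zygote_combinations():.2e}")
```

For 23 pairs this gives 8,388,608 gamete types and about 7 × 10¹³ zygote combinations from assortment alone.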

12. Mitochondria + ATP

Cellular respiration of one glucose generates ~30–32 ATP via three coupled stages:

Glycolysis (cytoplasm)

Glucose → 2 pyruvate; net 2 ATP + 2 NADH. Rate-limited by phosphofructokinase-1 (PFK-1), allosterically activated by AMP + F-2,6-BP, inhibited by ATP + citrate. Anaerobic: pyruvate → lactate (regenerates NAD⁺); aerobic: pyruvate enters mitochondria.

TCA / citric acid / Krebs cycle (mitochondrial matrix)

Pyruvate → acetyl-CoA (pyruvate dehydrogenase, PDH); acetyl-CoA enters cycle via citrate synthase. Per glucose (2 acetyl-CoA): 6 NADH + 2 FADH₂ + 2 GTP + 4 CO₂. Hans Krebs 1953 Nobel.

Electron transport chain + oxidative phosphorylation (inner membrane)

Four complexes carry electrons to O₂; three of them (I, III, IV) pump H⁺ from matrix to intermembrane space:

  • Complex I (NADH:ubiquinone oxidoreductase) — NADH → ubiquinone (CoQ); ~45 subunits; rotenone inhibitor.
  • Complex II (succinate dehydrogenase) — succinate → CoQ; does not pump H⁺; also a TCA enzyme.
  • Complex III (cytochrome bc1) — CoQH₂ → cytochrome c; Q-cycle; antimycin A inhibitor.
  • Complex IV (cytochrome c oxidase) — cyt c → O₂, reducing it to H₂O; cyanide + CO inhibitors.

The H⁺ gradient drives ATP synthase (Complex V, F1F0) — rotary motor; γ-subunit rotation cycles three β-subunits through L/T/O conformations (binding-change mechanism, Boyer 1997 Nobel; Walker 1997 Nobel for atomic structure). ~2.5 ATP per NADH; ~1.5 per FADH₂.
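
The ~30–32 ATP per glucose figure follows from adding up the carrier yields in this section. A bookkeeping sketch: the 2 NADH from pyruvate dehydrogenase (one per pyruvate) is standard biochemistry not itemized above, and the choice of shuttle for cytosolic NADH (malate-aspartate ≈ 2.5 ATP, glycerol-3-phosphate ≈ 1.5 ATP) is the assumption that produces the 30 vs 32 range:

```python
ATP_PER_NADH = 2.5    # mitochondrial NADH, as quoted above
ATP_PER_FADH2 = 1.5   # as quoted above


def atp_per_glucose(atp_per_cytosolic_nadh: float = 2.5) -> float:
    """Approximate ATP yield from complete oxidation of one glucose."""
    substrate_level = 2 + 2            # glycolysis net ATP + TCA GTP
    cytosolic_nadh = 2                 # glycolysis
    mito_nadh = 2 + 6                  # PDH (assumed, see lead-in) + TCA cycle
    fadh2 = 2                          # TCA cycle
    return (substrate_level
            + cytosolic_nadh * atp_per_cytosolic_nadh
            + mito_nadh * ATP_PER_NADH
            + fadh2 * ATP_PER_FADH2)


if __name__ == "__main__":
    print("malate-aspartate shuttle:    ", atp_per_glucose(2.5))
    print("glycerol-3-phosphate shuttle:", atp_per_glucose(1.5))
```

The two shuttle assumptions give 32 and 30 ATP respectively, bracketing the range stated at the top of this section.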

Uncoupling proteins (UCP1/thermogenin) in brown adipose tissue dissipate the gradient as heat.

Mitochondrial DNA (mtDNA) — 16,569 bp circular in human; 13 protein-coding (all subunits of ETC complexes + ATP synthase), 22 tRNA, 2 rRNA. Maternal inheritance (sperm mitochondria selectively degraded). Heteroplasmy + threshold effect underlie mitochondrial diseases (MELAS, LHON, Leigh syndrome).

13. Modern molecular tools

PCR + amplification

  • PCR (Mullis 1985; Mullis 1993 Nobel) — denature → anneal primers → extend with thermostable polymerase (Taq from Thermus aquaticus; Pfu, KOD, Phusion for higher fidelity). Exponential amplification: ideally 2ⁿ-fold after n cycles (~10⁶× in 20 cycles, ~10⁹× in 30), with real yields lower as efficiency falls + reagents deplete.
  • qPCR (real-time PCR) — quantitative; SYBR Green dye or TaqMan probes; Cq values relate to template abundance.
  • RT-qPCR — reverse transcribe RNA → cDNA → qPCR; mainstay of gene expression + clinical viral load assays (SARS-CoV-2 tests).
  • Digital PCR (dPCR / ddPCR — Bio-Rad QX200, Stilla naica) — partitions sample; absolute quantification; gold standard for rare variant detection + copy number.
  • Isothermal amplification — LAMP, RPA, NASBA, SHERLOCK + DETECTR (CRISPR-Cas12/13 fluorescence assays, Doudna + Zhang labs); point-of-care diagnostics.
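
Two pieces of arithmetic sit behind the list above: exponential amplification in PCR and the Poisson correction that lets digital PCR report absolute copy numbers. The sketch below illustrates both under simple assumptions; the function names, the example counts, and the 0.85 nL partition volume are illustrative choices, not values taken from any vendor software.

```python
import math


def pcr_fold(cycles, efficiency=1.0):
    """Fold amplification after `cycles`; efficiency 1.0 means perfect doubling."""
    return (1.0 + efficiency) ** cycles


def dpcr_copies_per_partition(positive, total):
    """Poisson-corrected mean copies per partition from the positive fraction."""
    if not 0 <= positive < total:
        raise ValueError("need 0 <= positive < total (a saturated run cannot be quantified)")
    return -math.log(1.0 - positive / total)


def dpcr_concentration(positive, total, partition_volume_ul):
    """Template copies per microlitre of partitioned reaction."""
    return dpcr_copies_per_partition(positive, total) / partition_volume_ul


print(f"{pcr_fold(30):.2e}")                           # 1.07e+09
print(round(dpcr_copies_per_partition(4000, 20000), 4))  # 0.2231

# Hypothetical run: 4,000 positive of 20,000 partitions, 0.85 nL each (assumed volume).
print(dpcr_concentration(4000, 20000, 0.00085))
```

The Poisson step matters because a positive partition may hold more than one template molecule; simply counting positives would underestimate the concentration as the positive fraction rises.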

Sanger sequencing

Chain-termination with fluorescent ddNTPs (Sanger 1977; second Nobel 1980). Read length ~800–1000 bp; ~99.99% raw accuracy. Still gold standard for clone validation + clinical CFTR + HBB + targeted gene confirmation. ABI 3500/3730 capillary instruments dominate.
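
Accuracy figures like the one above are often restated on the Phred scale, Q = −10·log₁₀(error probability), which is the convention used in FASTQ quality strings. A minimal conversion sketch (the function names are illustrative):

```python
import math


def accuracy_to_phred(accuracy):
    """Phred quality score for a given per-base accuracy (0 < accuracy < 1)."""
    return -10.0 * math.log10(1.0 - accuracy)


def phred_to_error(q):
    """Per-base error probability for a Phred score."""
    return 10.0 ** (-q / 10.0)


print(round(accuracy_to_phred(0.9999)))  # 40
print(round(accuracy_to_phred(0.999)))   # 30
print(phred_to_error(20))                # 0.01
```

On this scale ~99.99% accuracy corresponds to about Q40, and 99.9% to about Q30.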

Next-generation sequencing (NGS)

Massively parallel platforms; reads per run from 10⁸ to 10¹⁰.

  • Illumina — sequencing-by-synthesis (SBS) with reversible terminators on a flow cell. Dominant since 2010. Platforms (2026): NovaSeq X / X Plus (top end, up to ~16 Tb per dual-flow-cell run on X Plus, ~$200 human genome at scale; 2023 launch), NextSeq 1000/2000, MiSeq (small runs + amplicon).
  • Element Biosciences AVITI — avidite-bound nucleotides, polony amplification; competitive accuracy + cost; second wave 2022–24.
  • Singular Genomics G4 + G4X spatial (2024).
  • Ultima Genomics UG100 — mostly natural nucleotides on a spinning wafer; pushed cost toward $100 / human genome (2022 ASHG announcement).
  • Ion Torrent (Thermo Fisher) — semiconductor-based H⁺ detection; clinical panels (Oncomine) + smaller-scale.
  • PacBio (Pacific Biosciences) — single-molecule real-time (SMRT) zero-mode waveguides; HiFi circular consensus reads ~15–25 kb at >99.9% accuracy; Revio (2023) instrument; Onso short-read sequencing-by-binding (2023).
  • Oxford Nanopore Technologies — proteinaceous nanopore ionic-current sequencing; reads from kb to Mb (ultra-long); MinION (USB, portable), GridION, PromethION 24/48; real-time streaming; direct RNA + base modification detection (5mC, 5hmC, 6mA).

Cost trajectory: Human Genome Project (~$3B, completed 2003) → ~$1M Watson genome 2007 (454) → ~$100–200 genome 2024 (NovaSeq X, Ultima, Element). Faster than Moore’s law for two decades.
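
The comparison with Moore’s law can be made concrete by computing a cost-halving time. The sketch below takes roughly $1M per genome in 2007 and $200 in 2024 as endpoints and compares the result with a nominal two-year halving period for Moore’s law; both the endpoints and the two-year figure are simplifying assumptions.

```python
import math


def halving_time_years(cost_start, cost_end, years):
    """Average time for cost to halve, assuming smooth exponential decline."""
    halvings = math.log2(cost_start / cost_end)
    return years / halvings


MOORE_HALVING_YEARS = 2.0  # nominal figure, an assumption

t = halving_time_years(1_000_000, 200, 2024 - 2007)
print(round(t, 2))              # 1.38
print(t < MOORE_HALVING_YEARS)  # True
```

The real curve was not smooth: most of the drop came in a few steep steps as new platforms arrived, so the average hides long plateaus.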

Long-read + telomere-to-telomere (T2T) — T2T consortium published the first complete (gapless) human reference (T2T-CHM13, Science 2022), adding ~200 Mb of previously unresolved sequence (centromeres, rDNA, segmental duplications). HPRC pangenome (Nature 2023) gives 47 phased diploid assemblies; an emerging reference standard alongside GRCh38.

CRISPR genome editing

CRISPR-Cas9 — bacterial adaptive immune system repurposed for programmable genome editing. sgRNA (single-guide RNA, fusion of crRNA + tracrRNA) directs Cas9 nuclease to a 20-nt target adjacent to a PAM (NGG for S. pyogenes Cas9); creates a blunt DSB ~3 bp upstream of the PAM; cellular repair via NHEJ (indels, knockout) or HDR (precise edits with donor template). Doudna + Charpentier 2020 Nobel for mechanism (Jinek et al., Science 2012); Zhang + Church demonstrated mammalian application (Cong et al. + Mali et al., Science 2013).
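
The targeting rule described above (20-nt protospacer, NGG PAM immediately 3′, blunt cut 3 bp upstream of the PAM) is simple enough to sketch as a sequence scan. This is a toy enumerator on one strand, with illustrative function names and a made-up test sequence; it does no on-target or off-target scoring, which real design tools add.

```python
import re


def reverse_complement(seq):
    """Reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_spcas9_sites(seq, guide_len=20):
    """Return (protospacer, pam, cut_index) for each NGG PAM on the given strand.

    cut_index is the 0-based position such that the blunt cut falls between
    seq[cut_index - 1] and seq[cut_index], i.e. 3 bp upstream of the PAM.
    """
    seq = seq.upper()
    sites = []
    # Zero-width lookahead so overlapping PAMs are all reported.
    for match in re.finditer(r"(?=([ACGT]GG))", seq):
        pam_start = match.start()
        if pam_start < guide_len:
            continue  # not enough upstream sequence for a full protospacer
        protospacer = seq[pam_start - guide_len:pam_start]
        sites.append((protospacer, match.group(1), pam_start - 3))
    return sites


# Made-up example: 2 nt flank + 20-nt protospacer + TGG PAM + 2 nt flank.
demo = "AT" + "ACGTACGTACGTACGTACGT" + "TGG" + "CA"
print(find_spcas9_sites(demo))
# [('ACGTACGTACGTACGTACGT', 'TGG', 19)]
```

To cover the opposite strand, run the same scan on `reverse_complement(demo)` and map the coordinates back.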

Casgevy (exa-cel) — Vertex + CRISPR Therapeutics; ex vivo CRISPR editing of autologous hematopoietic stem cells to disrupt the erythroid enhancer of BCL11A and reactivate fetal hemoglobin; FDA approval December 2023 for sickle cell disease (β-thalassemia indication followed in January 2024) — first approved CRISPR therapy.

Other Cas enzymes:

  • Cas12 (Cas12a / Cpf1) — RNA-guided; staggered cut + T-rich PAM; uses single crRNA; target recognition triggers collateral ssDNA cleavage → DETECTR diagnostics.
  • Cas13 — RNA-targeting; collateral ssRNA cleavage → SHERLOCK diagnostics; viral RNA degradation (PAC-MAN, ZIM).
  • Cas14 (Cas12f) — compact (~400–700 aa) → AAV packaging.
  • Cas3 — Type I systems; long-range processive degradation.

Base editing (David Liu lab) — fuses dCas9 or Cas9 nickase to a deaminase:

  • Cytosine base editor (CBE) — APOBEC1 + UGI; C→T (G→A on opposite strand).
  • Adenine base editor (ABE) — engineered TadA; A→G (T→C).

No DSB → cleaner edits, fewer indels + translocations. Verve Therapeutics VERVE-101 → VERVE-102 — in vivo ABE delivered via LNP targeting PCSK9 in liver for hypercholesterolemia (heterozygous FH); first in vivo base-editing clinical trial; positive Phase 1b 2025.

Prime editing (David Liu lab, 2019) — Cas9 nickase fused to a reverse transcriptase + pegRNA (prime editing guide RNA) that encodes both the target site + the desired edit. Supports all 12 base substitutions + small indels without DSB or donor DNA. In the clinic: Prime Medicine PM359 for chronic granulomatous disease (CGD); first dosing 2024.

CRISPR screens — pooled lentiviral sgRNA libraries (Brunello, Brie, GeCKO, TKO, Vienna, KuKuKK; Sanson + Doench 2018) for genome-wide loss-of-function (KO), CRISPRi (dCas9-KRAB knockdown), CRISPRa (dCas9-VP64/p65/Rta activation), Perturb-seq (combined with scRNA-seq).
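
The core readout of a pooled screen is how each sgRNA’s abundance changes between the starting library and the selected population. The sketch below shows that calculation in its simplest form: reads-per-million normalization, per-guide log₂ fold change with a pseudocount, and a median across guides per gene. The guide names, gene names, and counts are invented for illustration; dedicated tools such as MAGeCK also model count variance and report significance.

```python
import math
from collections import defaultdict
from statistics import median


def log2_fold_changes(before, after, pseudocount=1.0):
    """Per-sgRNA log2 fold change after reads-per-million normalization."""
    total_before = sum(before.values())
    total_after = sum(after.values())
    lfc = {}
    for guide, n_before in before.items():
        rpm_before = n_before / total_before * 1e6
        rpm_after = after.get(guide, 0) / total_after * 1e6
        lfc[guide] = math.log2((rpm_after + pseudocount) / (rpm_before + pseudocount))
    return lfc


def gene_scores(lfc, guide_to_gene):
    """Median sgRNA log2 fold change per gene."""
    by_gene = defaultdict(list)
    for guide, value in lfc.items():
        by_gene[guide_to_gene[guide]].append(value)
    return {gene: median(values) for gene, values in by_gene.items()}


# Invented toy data: GENE_A guides drop out, GENE_B guides do not.
before = {"g1": 100, "g2": 100, "g3": 100, "g4": 100}
after = {"g1": 10, "g2": 20, "g3": 185, "g4": 185}
guide_to_gene = {"g1": "GENE_A", "g2": "GENE_A", "g3": "GENE_B", "g4": "GENE_B"}

scores = gene_scores(log2_fold_changes(before, after), guide_to_gene)
print(scores)
```

A strongly negative gene score in a proliferation screen suggests the knockout is deleterious; in practice the scores are compared against non-targeting and known-essential control guides.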

Single-cell + spatial

  • scRNA-seq — 10x Genomics Chromium (3’ or 5’; ~10⁴ cells/run; dominant platform 2020–26); Parse Biosciences Evercode (combinatorial indexing, >10⁵ cells); BD Rhapsody; Mission Bio Tapestri (genotype + phenotype). Atlases: Human Cell Atlas (HCA), Tabula Sapiens (multi-organ, Science 2022), HuBMAP, Allen Brain Cell Atlas, CellxGene (curation portal).
  • Single-nucleus RNA-seq (snRNA-seq) — for frozen tissue + neurons.
  • Spatial transcriptomics — 10x Genomics Visium (55 µm spots), Xenium (subcellular, in situ probe-based, 100s of genes); NanoString CosMx (RNA + protein, subcellular); Vizgen MERSCOPE (MERFISH); Curio Seeker (Slide-seq); Stereo-seq (BGI, ~500 nm spot pitch).
  • CITE-seq — paired RNA + surface protein (oligo-tagged antibodies).
  • ATAC-seq + scATAC-seq — chromatin accessibility (10x Multiome combines RNA + ATAC).
  • Spatial proteomics — Akoya CODEX/PhenoCycler, Lunaphore COMET, IONpath MIBI.

Mass + flow cytometry

  • Flow cytometry — fluorescence-based, ~30 parameters (BD FACSymphony, Cytek Aurora spectral); cell sorting (FACS).
  • Mass cytometry (CyTOF) — metal-conjugated antibodies + TOF mass spec; ~50 markers without fluorescence overlap (Fluidigm/Standard BioTools Helios).

Microscopy

  • Confocal — Zeiss LSM 900 + 980, Leica STELLARIS, Nikon AX/AX R; pinhole rejects out-of-focus light.
  • Light-sheet — Zeiss Lattice Lightsheet 7, Luxendo MuVi-SPIM, 3i LCS-SPIM — illuminates a thin sheet; gentle on samples; fast volumetric imaging of embryos + organoids.
  • Super-resolution (Hell + Betzig + Moerner 2014 Nobel):
    • STED — Leica TCS SP8 STED, Abberior STEDYCON.
    • PALM / STORM — single-molecule localization, ~20 nm.
    • MINFLUX — Abberior MINFLUX, ~1–3 nm localization precision (2017+).
  • Cryo-electron microscopy (cryo-EM) — Henderson + Frank + Dubochet 2017 Nobel. “Resolution revolution” triggered by direct-electron detectors (Gatan K2/K3, Falcon 4) ~2013 enabling routine sub-3 Å single-particle reconstructions; Thermo Fisher Krios + Glacios microscopes. Cryo-ET (electron tomography) for in situ structures. The current structural-biology golden age is the combination of AlphaFold + cryo-EM at scale.
  • Expansion microscopy (ExM, Boyden 2015) — physically swells the specimen ~4–20× in hydrogel; super-resolution on a confocal.

14. Omics

  • Genomics — DNA sequence + variation (SNVs, indels, CNVs, SVs).
  • Transcriptomics — RNA expression; bulk RNA-seq (most cost-effective) + scRNA-seq (cell-type resolution).
  • Proteomics — quantitative mass spectrometry. Aebersold + Mann (2003 Nature review) framed the field. Modern: DIA / SWATH-MS (data-independent acquisition; Bruker timsTOF Pro 2 + HT, Thermo Orbitrap Astral 2023) + TMT / iTRAQ isobaric labeling + label-free quantification. Single-cell proteomics emerging (SCoPE-MS, Nautilus Biotechnology, IonOpticks Aurora).
  • Metabolomics — small-molecule profiling via NMR + LC-MS / GC-MS; HMDB + METLIN databases.
  • Epigenomics — DNA methylation (bisulfite WGBS, EM-seq enzymatic without bisulfite damage, methylation EPIC array), histone modifications (ChIP-seq, CUT&RUN, CUT&Tag), accessibility (ATAC-seq, DNase-seq), 3D genome (Hi-C, Micro-C, ChIA-PET, HiChIP).
  • Lipidomics + glycomics + interactomics — specialty omics for membrane biology, sugar codes, protein-protein interaction networks (AP-MS, BioID, TurboID, APEX2).
  • Multi-omics integration — Seurat + Signac (R), Scanpy + muon (Python), MOFA / MOFA+, GLUE (graph-linked unified embedding, 2022), totalVI, MultiVI; methods from the deep-learning + variational-inference toolkit.

15. Modern + AI biology 2024–26

The 2020s are biology’s deep-learning decade. Key landmarks:

  • AlphaFold-2 (Jumper et al., Nature 2021) — topped CASP14 (2020) at near-experimental accuracy for many targets; >200 M predicted structures released via AlphaFold DB (EMBL-EBI partnership).
  • AlphaFold-3 (Abramson et al., Nature 2024; Google DeepMind + Isomorphic Labs) — single architecture for proteins + nucleic acids + ligands + ions + covalent modifications + complexes; now widely used in structure-based drug discovery; AlphaFold Server (free research) + commercial Isomorphic Labs licensing.
  • ESMFold + ESM-2 (Lin et al., Science 2023; Meta AI) — protein language model; sequence-only structure prediction; ESM-3 (EvolutionaryScale, 2024) is a generative biology model.
  • RFdiffusion (Watson et al., Nature 2023; Baker lab) + RFdiffusion All-Atom + Chroma (Generate Biomedicines) — diffusion-model-based de novo protein design.
  • AlphaProteo (Google DeepMind 2024) — de novo binder design; reported affinities 3–300 fold better than prior best.
  • Boltz-1 (MIT 2024) + Boltz-2 (2025) — open-weight AlphaFold-3-class with permissive license; closes the open-source gap.
  • AlphaMissense (Google DeepMind, Science 2023) — classifies 71M missense variants across the proteome.
  • Genomic foundation models — Evo (Arc Institute, 2024) trained on 2.7M prokaryote + phage genomes, generates synthetic genomes + CRISPR systems; Evo-2 (2025) extends to eukaryote scale; Nucleotide Transformer (InstaDeep + NVIDIA); HyenaDNA (Stanford 2023, 1M-token context).
  • Single-cell foundation models — scGPT (Cui et al. 2024), Geneformer (Theodoris 2023), scFoundation (Tsinghua + BioMap 2023), UCE (universal cell embedding, CZI 2024).
  • AI-designed enzymes + therapeutics — Profluent (ProGen2 family; OpenCRISPR-1 2024, first AI-designed gene editor), Inceptive (AI-designed mRNA medicines), Generate Biomedicines (Chroma), Cradle.bio (protein engineering platform), Latent Labs (de novo protein platform 2024, founded by former DeepMind AlphaFold researcher Simon Kohl), Diamond Age (AI antibodies).
  • AI drug discovery — Isomorphic Labs (DeepMind spinout, partnerships with Lilly + Novartis 2024), Recursion (acquired Exscientia 2024, combining the two platforms), Insilico Medicine (Phase 2 INS018_055 IPF molecule).
  • AI clinical foundation models — Med-PaLM-2 → Med-Gemini → MedGemma (Google 2023–25), Tx-LLM (Google 2024 for therapeutic development), Aristotle BioMedLM.

16. Bioinformatics tools

Sequence analysis:

  • BLAST (Altschul et al. 1990) — heuristic local alignment; NCBI mainstay.
  • BWA-MEM / BWA-MEM2 — short-read alignment to reference.
  • minimap2 (Heng Li) — long-read alignment; de facto standard.
  • STAR + Salmon + Kallisto — RNA-seq alignment + transcript quantification.
  • GATK (Broad) + DeepVariant (Google) — variant calling; PEPPER-Margin-DeepVariant for long-read.
  • Samtools + bcftools + bedtools — file manipulation (BAM/SAM/VCF/BED).
  • fastp + cutadapt + Trim Galore — read trimming + QC.

Workflow + reproducibility — Snakemake (Köster), Nextflow + nf-core (Ewels et al.), WDL/Cromwell (Broad), Galaxy (web-based).

Languages + ecosystems — Bioconductor + CRAN (R; DESeq2, edgeR, limma, GenomicRanges on Bioconductor; Seurat on CRAN), Biopython, scikit-bio, scverse (Scanpy, AnnData, muon, scvi-tools, squidpy).

Structure prediction + analysis — AlphaFold + ColabFold (community-accessible MMseqs2 MSA + AF2/AF3), OpenFold (PyTorch reimplementation), Boltz, RoseTTAFold, ChimeraX (UCSF visualization), PyMOL (Schrödinger).

Databases — NCBI (GenBank, Entrez, dbSNP, ClinVar, RefSeq, PubMed, SRA), Ensembl + Ensembl Genomes (EBI), UniProt (protein sequence + annotation; SwissProt manually curated, TrEMBL automatic), RCSB PDB (experimental structures), AlphaFold DB (200 M predicted), ChEMBL (bioactivity), DisGeNET + OMIM (disease–gene), Reactome + KEGG + WikiPathways (pathways), STRING + BioGRID + IntAct (PPI), GTEx (tissue expression), TCGA + ICGC (cancer genomics), gnomAD + UK Biobank + All of Us (population genomics).

17. Selection heuristics (for life science engineers)

  • Sequence DNA at scale (clinical / population / large project): Illumina NovaSeq X for short-read at ~$100–200/genome; PacBio Revio or Oxford Nanopore PromethION 24/48 for long-read structural variants + repeats + methylation.
  • Sequence RNA (bulk): Illumina NovaSeq X / NextSeq 2000 with poly-A or rRNA depletion; quantify with Salmon + DESeq2.
  • Sequence single cells: 10x Chromium 3’ or 5’ (most published + supported) → CellRanger → Seurat (R) or Scanpy (Python); integration with scVI + Harmony + scANVI; foundation-model annotation via scGPT or Geneformer.
  • Functional genomics screen: CRISPR-Cas9 KO (Brunello / Brie genome-wide libraries) or CRISPRi (Dolcetto); for single-cell readout, Perturb-seq + analysis with mixscape (Seurat).
  • Predict protein structure: AlphaFold-3 via AlphaFold Server (single + complex) or Boltz-1/2 (local, open weights); cross-validate with experimental data (cryo-EM, X-ray, NMR, SAXS).
  • Design a novel protein / binder: RFdiffusion (backbone) → ProteinMPNN (sequence design) → AlphaFold-3 validation → wet-lab screen.
  • Drug-target validation: combine bulk + scRNA-seq (expression in target tissue) + CRISPR screens (essentiality) + structural prediction + multi-omics + chemical biology (PROTACs, covalent fragments).
  • Diagnose pathogen: metagenomic NGS (mNGS) with Kraken2 + Bracken / Centrifuge / Metagenomics Rapid Annotation; nanopore for field deployment.
  • Edit a patient cell (ex vivo): CRISPR-Cas9 (Casgevy-style autologous HSC) for blood / bone-marrow diseases; LNP-delivered base editing (Verve-style) for in vivo liver targets.
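
Before cross-validating a predicted structure against experiment, it helps to flag which residues the model itself is unsure about. AlphaFold-style PDB files store per-residue pLDDT in the B-factor column, so a few lines of fixed-column parsing are enough for a first pass. The sketch below uses the >70 and >90 thresholds quoted in this document; the band labels, function names, and the three-residue demo records are illustrative.

```python
def ca_plddt(pdb_text):
    """Return {residue_number: pLDDT} read from CA atoms of PDB-format text."""
    scores = {}
    for line in pdb_text.splitlines():
        # PDB fixed columns: atom name 13-16, residue number 23-26, B-factor 61-66.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores[int(line[22:26])] = float(line[60:66])
    return scores


def confidence_band(plddt):
    """Bucket a pLDDT value using the >90 / >70 thresholds."""
    if plddt > 90:
        return "very high"
    if plddt > 70:
        return "confident"
    return "low"


def _ca_line(serial, resname, resseq, plddt):
    # Hypothetical helper: builds a fixed-column ATOM record for the demo only.
    return (
        "ATOM  {:>5d}  CA  {:>3s} A{:>4d}    {:8.3f}{:8.3f}{:8.3f}{:6.2f}{:6.2f}"
        .format(serial, resname, resseq, 0.0, 0.0, 0.0, 1.0, plddt)
    )


demo_pdb = "\n".join([
    _ca_line(1, "MET", 1, 45.2),
    _ca_line(2, "ALA", 2, 78.9),
    _ca_line(3, "GLY", 3, 96.1),
])
bands = {res: confidence_band(score) for res, score in ca_plddt(demo_pdb).items()}
print(bands)  # {1: 'low', 2: 'confident', 3: 'very high'}
```

Per-residue bands say nothing about relative domain or chain placement; for complexes, the pairwise PAE matrix and interface scores need to be checked as well.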

18. Pitfalls

  • CRISPR off-target edits + chromothripsis — use high-fidelity Cas9 variants (eSpCas9, SpCas9-HF1, HiFi Cas9), careful sgRNA design (CRISPick / Synthego / Benchling / CHOPCHOP), and on-/off-target validation (GUIDE-seq, CIRCLE-seq, DISCOVER-seq, rhAmpSeq).
  • Sequencing depth too low for variant detection — WGS needs 30× minimum; tumor-WES often 100–200×; ctDNA + rare somatic / mosaic variants need 1000× + UMIs (e.g., duplex sequencing, Twinstrand).
  • Batch effects in scRNA-seq + bulk RNA-seq — mix conditions across batches/days/operators; use Harmony, scVI, fastMNN, or scANVI for integration; never confound batch with biological variable.
  • Biological vs technical replicates — at least 3 biological replicates for statistical claims; technical replicates do not increase the biological N.
  • Mass-spec cross-talk + contamination — keratin + trypsin autolysis + plasticizer peaks; carryover; use blank runs + iRT spike-in (Biognosys) + isolation-window awareness.
  • Over-interpreting AlphaFold confidence — pLDDT is per-residue confidence (use >70 for trust; >90 for very high), PAE is predicted aligned error between residue pairs (essential for multimer interface confidence); flexible regions + disorder + alternative conformations + ligand-induced changes are not captured. AlphaFold-3 confidence metrics: pLDDT, PAE, ipTM, pTM.
  • Confounding alternative splicing + isoform — gene-level counts hide isoform switches; use Salmon + IsoformSwitchAnalyzeR or Bambu (Nanopore).
  • Mistaking correlation for causality in GWAS / eQTL — use Mendelian randomization, fine-mapping (SuSiE, CAVIAR), and follow-up CRISPR perturbation to nominate causal genes.
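
The depth pitfall above comes down to simple arithmetic: mean coverage is total sequenced bases divided by genome size. The sketch below computes the read count needed for a target depth and, under an idealized Poisson (Lander-Waterman) model, the expected fraction of bases with no reads at all. The 3.1 Gb genome size and the function names are assumptions for illustration; real coverage is less uniform than Poisson because of GC bias, duplicates, and unmappable regions.

```python
import math

HUMAN_GENOME_BP = 3_100_000_000  # approximate haploid size, an assumption


def mean_coverage(n_reads, read_length, genome_size):
    """Mean depth given read count, read length (bp), and genome size (bp)."""
    return n_reads * read_length / genome_size


def reads_for_coverage(target_depth, read_length, genome_size):
    """Minimum number of reads needed to reach a target mean depth."""
    return math.ceil(target_depth * genome_size / read_length)


def fraction_uncovered(depth):
    """Expected fraction of bases with zero reads under a Poisson model."""
    return math.exp(-depth)


reads = reads_for_coverage(30, 150, HUMAN_GENOME_BP)
print(reads)                    # 620000000
print(fraction_uncovered(30))   # vanishingly small in the idealized model
print(fraction_uncovered(2))    # low-pass data leaves a sizeable gap
```

For paired-end 2 × 150 bp runs, 620 million reads correspond to 310 million read pairs; rare-variant applications at 1000× scale the same arithmetic over a much smaller targeted region.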

19. Cross-references

  • [[Biology/_index]] — library index + subdomain plan.
  • [[Chemistry/organic-chemistry-foundations]] — biomolecule building blocks (amino acids, sugars, lipids, nucleotides) + reaction mechanisms.
  • [[Engineering/pharma-process-engineering]] — manufacturing of biologics + small molecules + cell therapies.
  • [[Engineering/bioinstrumentation]] — sequencers, mass spec, microscopes, cytometers, bioreactors.
  • [[Engineering/biomechanics]] — physical biology, mechanotransduction, tissue engineering.
  • [[Engineering/microfluidics]] — chip technology underlying 10x Chromium, organ-on-chip, lab-on-chip diagnostics.
  • [[Compute/transformer-architecture]] — backbone of AlphaFold, ESM-2, scGPT, Evo, ESM-3.
  • [[Math/probability-fundamentals]] — Bayesian inference for phylogenetics, population genetics, GWAS, variant calling, single-cell analysis.
  • [[Compute/database-internals]] — petabyte-scale sequence + structure databases (NCBI SRA, EBI ENA, AlphaFold DB).

20. Citations

Textbooks:

  • Alberts B, Heald R, Johnson A, Morgan D, Raff M, Roberts K, Walter P. Molecular Biology of the Cell. 7th ed. W.W. Norton, 2022. (The canonical reference.)
  • Lodish H, Berk A, Kaiser CA, et al. Molecular Cell Biology. 9th ed. W.H. Freeman, 2021.
  • Watson JD, Baker TA, Bell SP, Gann A, Levine M, Losick R. Molecular Biology of the Gene. 8th ed. Pearson, 2022.

Landmark papers:

  • Watson JD, Crick FHC. Molecular structure of nucleic acids. Nature 1953;171:737–8.
  • Meselson M, Stahl FW. The replication of DNA in Escherichia coli. PNAS 1958;44:671–82.
  • Crick FHC. Central dogma of molecular biology. Nature 1970;227:561–3 (originally 1958 symposium).
  • Kerr JFR, Wyllie AH, Currie AR. Apoptosis: a basic biological phenomenon with wide-ranging implications in tissue kinetics. Br J Cancer 1972;26:239–57.
  • Singer SJ, Nicolson GL. The fluid mosaic model of the structure of cell membranes. Science 1972;175:720–31.

CRISPR:

  • Jinek M, Chylinski K, Fonfara I, Hauer M, Doudna JA, Charpentier E. A programmable dual-RNA–guided DNA endonuclease in adaptive bacterial immunity. Science 2012;337:816–21.
  • Cong L, Ran FA, Cox D, et al. Multiplex genome engineering using CRISPR/Cas systems. Science 2013;339:819–23.
  • Mali P, Yang L, Esvelt KM, et al. RNA-guided human genome engineering via Cas9. Science 2013;339:823–6.
  • Komor AC, Kim YB, Packer MS, Zuris JA, Liu DR. Programmable editing of a target base in genomic DNA without double-stranded DNA cleavage. Nature 2016;533:420–4 (CBE).
  • Gaudelli NM, Komor AC, Rees HA, et al. Programmable base editing of A•T to G•C in genomic DNA without DNA cleavage. Nature 2017;551:464–71 (ABE).
  • Anzalone AV, Randolph PB, Davis JR, et al. Search-and-replace genome editing without double-strand breaks or donor DNA. Nature 2019;576:149–57 (prime editing).
  • Doench JG, Fusi N, Sullender M, et al. Optimized sgRNA design to maximize activity and minimize off-target effects of CRISPR-Cas9. Nat Biotechnol 2016;34:184–91.

Structure prediction:

  • Jumper J, Evans R, Pritzel A, et al. Highly accurate protein structure prediction with AlphaFold. Nature 2021;596:583–9.
  • Abramson J, Adler J, Dunger J, et al. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature 2024;630:493–500.
  • Lin Z, Akin H, Rao R, et al. Evolutionary-scale prediction of atomic-level protein structure with a language model. Science 2023;379:1123–30.
  • Watson JL, Juergens D, Bennett NR, et al. De novo design of protein structure and function with RFdiffusion. Nature 2023;620:1089–100.

Other:

  • Lambert SA, Jolma A, Campitelli LF, et al. The human transcription factors. Cell 2018;172:650–65.
  • Manning G, Whyte DB, Martinez R, Hunter T, Sudarsanam S. The protein kinase complement of the human genome. Science 2002;298:1912–34.
  • Strahl BD, Allis CD. The language of covalent histone modifications. Nature 2000;403:41–5.
  • Nurk S, Koren S, Rhie A, et al. The complete sequence of a human genome. Science 2022;376:44–53 (T2T-CHM13).
  • Liao W-W, Asri M, Ebler J, et al. A draft human pangenome reference. Nature 2023;617:312–24 (HPRC).

Nobel Prizes referenced: Sanger (Chem 1958 + 1980), Watson + Crick + Wilkins (Med 1962), Krebs (Med 1953), Anfinsen (Chem 1972), Mullis (Chem 1993), Boyer + Walker (Chem 1997), Prusiner (Med 1997), Furchgott + Ignarro + Murad (Med 1998), Brown + Goldstein (Med 1985), Cech + Altman (Chem 1989), Greider + Blackburn + Szostak (Med 2009), Steitz + Ramakrishnan + Yonath (Chem 2009), Lefkowitz + Kobilka (Chem 2012), Yamanaka + Gurdon (Med 2012), Rothman + Schekman + Südhof (Med 2013), Hell + Betzig + Moerner (Chem 2014), Ohsumi (Med 2016), Dubochet + Frank + Henderson (Chem 2017), Arnold + Smith + Winter (Chem 2018), Charpentier + Doudna (Chem 2020), Karikó + Weissman (Med 2023, mRNA vaccines).