Genetics & Genomics — Biology Reference

Genetics is the study of inheritance of traits encoded in DNA; genomics is the study of the structure, function, evolution, and editing of complete genomes. This reference covers classical (Mendelian) genetics, population genetics, linkage and recombination, sequencing technologies (short-read, long-read, nanopore, T2T finishing), variant taxonomy, genome-wide association studies (GWAS), polygenic scoring, pharmacogenomics, epigenomics, single-cell and spatial genomics, synthetic biology and genome engineering, gene therapy (approved AAV products and 2024-26 in-vivo editing trials), cancer genomics consortia (TCGA, ICGC, Genomics England, COSMIC, AACR Project GENIE), and the 2024-26 wave of generative-AI biology models (AlphaFold-3, ESM-3, Evo, Boltz-1, AlphaProteo, OpenCRISPR-1, Chroma from Generate Biomedicines, RFdiffusion and its descendants, scGPT). Cross-references at the end link to cell-molecular-biology, pharma-process-engineering, bioinstrumentation, transformer-architecture, and probability-fundamentals.


1. Classical (Mendelian) genetics

1.1 Mendel’s laws (Gregor Mendel, Brno, 1866 Versuche über Pflanzenhybriden)

  • Law of segregation. Each diploid individual carries two alleles per autosomal locus; during meiosis the alleles segregate, so each gamete receives exactly one. Rediscovered 1900 by de Vries, Correns, Tschermak.
  • Law of independent assortment. Alleles at unlinked loci segregate independently (modified by linkage and recombination — section 3).
  • Dominance and recessiveness. A dominant allele masks a recessive allele in heterozygotes; incomplete dominance and codominance (ABO blood group, Landsteiner 1900) violate the binary picture.
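
The law of segregation can be illustrated with a small Punnett-square enumeration. This is a minimal sketch for one autosomal biallelic locus; the function name cross and the two-character genotype strings are illustrative choices, not part of any standard library.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def cross(parent1: str, parent2: str) -> dict[str, Fraction]:
    """Offspring genotype probabilities for a single-locus cross.

    Each parent is a two-character genotype such as "Aa". Every gamete
    carries one of the two alleles with probability 1/2 (segregation),
    so the four gamete pairings are equally likely.
    """
    counts = Counter(
        "".join(sorted(a + b)) for a, b in product(parent1, parent2)
    )
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}


# Heterozygote x heterozygote gives the classical 1:2:1 genotype ratio.
offspring = cross("Aa", "Aa")
for genotype, prob in sorted(offspring.items()):
    print(genotype, prob)
```

With complete dominance, the AA and Aa classes share a phenotype, which gives the familiar 3:1 phenotypic ratio.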

1.2 Pedigree patterns

  • Autosomal dominant. Each affected child has at least one affected parent (modulo de novo mutations); 1/2 transmission risk. Examples: Huntington disease (CAG repeat in HTT, Gusella 1983 / MacDonald 1993), achondroplasia (FGFR3 p.G380R), familial hypercholesterolaemia (LDLR, Goldstein and Brown 1973-85, Nobel 1985).
  • Autosomal recessive. Affected child has two carrier parents; 1/4 risk per pregnancy. Examples: cystic fibrosis (CFTR p.F508del, Riordan and Tsui 1989), sickle-cell anaemia (HBB p.E6V, Ingram 1956), phenylketonuria (PAH, Følling 1934).
  • X-linked recessive. Males (XY) express; female carriers transmit. Examples: Duchenne muscular dystrophy (DMD, Kunkel 1986), haemophilia A (F8), red-green colour blindness, Lesch-Nyhan (HPRT1).
  • X-linked dominant. Rare; fragile X (FMR1 CGG expansion, Verkerk 1991) and Rett syndrome (MECP2, Amir 1999) approximate this.
  • Y-linked / holandric. SRY (testis-determining factor, Sinclair 1990).
  • Mitochondrial. Strictly maternal (heteroplasmy → variable penetrance): LHON, MELAS, Leigh syndrome.

1.3 Beyond Mendel

  • Penetrance and expressivity. BRCA1 185delAG (HGVS c.68_69delAG; a frameshift described at the cDNA level, not a protein-level "p." change) carries ~70% lifetime breast-cancer risk, not 100%; modifier loci and environment matter.
  • Pleiotropy. One gene → many phenotypes (e.g. CFTR → lung, pancreas, reproductive tract).
  • Epistasis. An allele at one locus masks the effect of an allele at another (term introduced by Bateson, 1909; modern CRISPR screens recover such interactions systematically, e.g. Horlbeck 2018).
  • Imprinting. Parent-of-origin–specific expression: IGF2/H19, Prader-Willi vs Angelman (15q11-q13 deletion, Nicholls 1989).
  • Anticipation. Repeat-expansion disorders worsen across generations (Huntington, myotonic dystrophy, fragile X).

2. Population genetics

2.1 Hardy-Weinberg equilibrium (Hardy 1908, Weinberg 1908)

For a biallelic locus with frequencies p and q = 1 − p, expected genotype frequencies under random mating, no selection / mutation / migration / drift in an infinite population:

  • p² (homozygous reference)
  • 2pq (heterozygous)
  • q² (homozygous variant)

Deviations diagnose: assortative mating, inbreeding (F-statistic, Wright 1922-65), population structure (Wahlund effect), selection, or genotyping error. Routine QC filter for GWAS imputation panels: exclude variants with HWE p < 1e-6 in controls.
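
As a worked example of the HWE check, the sketch below compares observed genotype counts with the p², 2pq, q² expectations using a one-degree-of-freedom chi-square test. The function name and the example counts are hypothetical; production QC tools commonly use an exact test instead, which behaves better at low genotype counts.

```python
import math


def hwe_chi_square(n_ref_hom: int, n_het: int, n_alt_hom: int):
    """Return (p, chi2, p_value) for a biallelic locus.

    Assumes the locus is polymorphic (0 < p < 1), so that all three
    expected counts are non-zero.
    """
    n = n_ref_hom + n_het + n_alt_hom
    p = (2 * n_ref_hom + n_het) / (2 * n)   # reference-allele frequency
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_ref_hom, n_het, n_alt_hom)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return p, chi2, p_value


# Hypothetical counts with a strong heterozygote deficit.
p, chi2, p_value = hwe_chi_square(40, 20, 40)
print(f"p={p:.2f} chi2={chi2:.1f} p_value={p_value:.2e}")
```

A variant with p_value below the 1e-6 threshold quoted above would be excluded by that QC rule.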

2.2 Forces

  • Mutation. Per-generation per-bp rate μ ≈ 1.2 × 10⁻⁸ (Kong 2012, trio-sequencing in deCODE Iceland cohort); ~70 de novo SNVs per zygote, paternally biased (∝ paternal age).
  • Genetic drift. Sampling noise; effective population size N_e ≈ 10⁴ for humans out-of-Africa; African N_e several-fold larger.
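  • Selection. Fitness coefficient s; for a deleterious recessive allele at frequency q (p = 1 − q) the per-generation change is Δq = −s·p·q² / (1 − s·q²), which reduces to ≈ −s·p·q² when s·q² is small; a recessive lethal is the special case s = 1. Balancing selection: sickle-cell HbS allele maintained at q ≈ 0.1 in malaria-endemic regions (Allison 1954).
  • Migration / gene flow. Quantified by F_ST (Wright); 1000 Genomes Project (2008-15) and gnomAD v4 (v4.0 released November 2023, v4.1 in 2024; ~807 K individuals in total, i.e. ~731 K exomes + ~76 K genomes, Karczewski) provide reference allele-frequency panels.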
  • Recombination. Genome-wide map ~1.2 cM/Mb mean; hotspots controlled by PRDM9 (Myers 2008-10).
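
The selection bullet can be made concrete with the standard one-locus recursion for a recessive deleterious allele. A minimal sketch, assuming genotype fitnesses 1, 1, 1 − s for AA, Aa, aa, random mating, and no mutation or drift; the function names are illustrative.

```python
def next_q(q: float, s: float) -> float:
    """One generation of selection against a recessive allele.

    With fitnesses 1, 1, 1 - s the new frequency is
    q' = q (1 - s q) / (1 - s q^2), equivalent to
    delta_q = -s p q^2 / (1 - s q^2) with p = 1 - q.
    """
    return q * (1.0 - s * q) / (1.0 - s * q * q)


def trajectory(q0: float, s: float, generations: int) -> list[float]:
    """Allele-frequency trajectory starting from q0."""
    qs = [q0]
    for _ in range(generations):
        qs.append(next_q(qs[-1], s))
    return qs


# Recessive lethal (s = 1): the recursion has the closed form q_t = q0 / (1 + t q0),
# so the decline becomes very slow once the allele is rare.
for t, q in enumerate(trajectory(0.5, 1.0, 5)):
    print(t, round(q, 4))
```

The slow tail illustrates why rare recessive alleles persist: most copies sit in heterozygotes, where selection cannot see them.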

2.3 Coalescent theory (Kingman 1982)

Backward-in-time model: in a diploid population of effective size N_e, the expected time to the most recent common ancestor (MRCA) of two lineages is 2·N_e generations. Underlies inference engines: PSMC (Li and Durbin 2011), SMC++ (Terhorst 2017), Relate (Speidel 2019), and 2024’s tsinfer/tsdate on tree-sequence representations (Kelleher 2018, Wong 2024).
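
The 2·N_e expectation can be checked with a short simulation. This is a sketch under a discrete Wright-Fisher assumption: two lineages in a diploid population coalesce in any given generation with probability 1/(2·N_e), so the waiting time is geometric. The function names, seed, and replicate count are arbitrary illustrative choices.

```python
import math
import random


def pairwise_tmrca(ne: int, rng: random.Random) -> int:
    """Draw one coalescence time (in generations) for two lineages."""
    p = 1.0 / (2 * ne)              # per-generation coalescence probability
    u = rng.random()                # uniform on [0, 1)
    # Inverse-transform sampling of a geometric waiting time (support 1, 2, ...).
    return int(math.log(1.0 - u) / math.log(1.0 - p)) + 1


def mean_tmrca(ne: int, replicates: int, seed: int = 0) -> float:
    """Average coalescence time over independent replicates."""
    rng = random.Random(seed)
    return sum(pairwise_tmrca(ne, rng) for _ in range(replicates)) / replicates


estimate = mean_tmrca(ne=1000, replicates=20000, seed=1)
print(f"simulated mean TMRCA: {estimate:.0f} generations (theory: 2000)")
```

The standard deviation of a single draw is also close to 2·N_e, which is why single-locus TMRCA estimates are so noisy and why the inference tools listed above pool information across the genome.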


3. Linkage, LD, recombination

  • Genetic linkage. Loci on the same chromosome co-segregate; recombination fraction θ ≤ 0.5; LOD score > 3 considered significant (Morton 1955).
  • Linkage disequilibrium (LD). Non-random allelic association; measures: D, D′, r². Under random mating D decays geometrically, D_t = D_0·(1 − θ)^t, so LD falls faster at larger recombination fraction θ and over more generations t. LD blocks ~5-50 kb in Europeans, ~5 kb in Africans (HapMap 2005, 1000 G).
  • Recombination maps. deCODE 2019 high-resolution map (Halldorsson), used by IMPUTE5, Eagle2, Beagle 5.4 for phasing and imputation.
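
The LD measures named above follow directly from phased haplotype counts. A minimal sketch for two biallelic loci (alleles A/a and B/b); the function names and example counts are illustrative, and real pipelines compute these from phased genotypes with dedicated tools.

```python
def ld_stats(n_AB: int, n_Ab: int, n_aB: int, n_ab: int):
    """Return (D, |D'|, r2) from counts of the four haplotypes."""
    total = n_AB + n_Ab + n_aB + n_ab
    p_A = (n_AB + n_Ab) / total
    p_B = (n_AB + n_aB) / total
    d = n_AB / total - p_A * p_B              # D = p_AB - p_A * p_B
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    r2 = d * d / denom
    # D' rescales D by its maximum possible magnitude given the allele frequencies.
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    return d, d_prime, r2


def ld_after(d0: float, theta: float, generations: int) -> float:
    """Expected D after random mating: D_t = D_0 * (1 - theta)^t."""
    return d0 * (1.0 - theta) ** generations


print(ld_stats(50, 0, 0, 50))      # complete LD
print(ld_stats(25, 25, 25, 25))    # linkage equilibrium
print(ld_after(0.25, 0.01, 100))   # decay at theta = 0.01 over 100 generations
```

D′ reaches 1 whenever at least one of the four haplotypes is absent, whereas r² reaches 1 only when the two loci carry the same information, which is why r² is the usual choice for tagging and imputation.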

4. DNA sequencing technologies

4.1 Short-read (Illumina dominance)

  • Illumina NovaSeq X / X Plus (2023, shipping 2024-25). Sequencing by synthesis with reversible-terminator chemistry (Bentley 2008). Up to ~16 Tb per dual-flow-cell run on 25B flow cells (X Plus); per-30× human genome cost ~ USD 200 list. The XLEAP-SBS chemistry (2023) is quoted by Illumina as roughly doubling cycle speed.
  • Illumina NextSeq 1000/2000, MiSeq i100 (2024 refresh). Smaller-scale workhorses for amplicon, exome, panel work.

4.2 Short-read challengers

  • Ultima UG 100 (Ultima Genomics, Almogy 2022). Mostly natural-nucleotide SBS on circular silicon wafer; published USD 100 / genome at 80 × depth in Nature 2024. Adopted by Broad Institute, Regeneron.
  • Element Biosciences AVITI (Arslan 2022, AVITI24 launch 2024). Avidity sequencing chemistry decouples base identification from nucleotide incorporation; high accuracy and lower capex than NovaSeq for mid-throughput labs.
  • MGI DNBSEQ-T7 / T20 (BGI/Complete Genomics rolling circle replication, combinatorial probe-anchor synthesis, Drmanac 2010). Dominant in PRC, growing share in EU/UK 2024-26.
  • Singular Genomics G4X (2024). Spatial multiomics platform.

4.3 Long-read

  • PacBio Revio (2023, expanded 2024 with SPRQ chemistry). HiFi circular-consensus reads, 15-25 kb, Q30+ accuracy. 1300 HiFi human genomes per year per box. Used by All of Us long-read sub-cohort and HPRC (Human Pangenome Reference Consortium, Liao 2023).
  • Oxford Nanopore PromethION 2 Solo / P24 / P48. Single-molecule nanopore current sensing (Clarke 2009; Jain 2018). Read N50 routinely > 50 kb; ultra-long protocols reach > 1 Mb. R10.4.1 + Dorado v0.7 / Remora 2024 calls modified bases (5mC, 5hmC, 6mA) directly without bisulfite. Adaptive sampling (Read Until, Loose 2016; deployed widely 2023-25) enriches target loci in real time.
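
Read N50, quoted above for nanopore runs, is the length L such that reads of length ≥ L contain at least half of all sequenced bases. A minimal sketch with hypothetical read lengths:

```python
def n50(lengths: list[int]) -> int:
    """Length at which the cumulative sum of sorted lengths reaches half the total."""
    if not lengths:
        raise ValueError("no read lengths given")
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise AssertionError("unreachable for non-empty input")


# Hypothetical run: a few long reads dominate the yield.
reads = [120_000, 80_000, 60_000, 15_000, 9_000, 4_000, 2_000]
print(n50(reads))
```

Because N50 is weighted by bases rather than by read count, it sits well above the median read length whenever the length distribution has a long tail.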

4.4 Reference genomes

  • GRCh38 (Genome Reference Consortium, 2013, patches through GRCh38.p14 2022). Still the production reference for most clinical pipelines.
  • T2T-CHM13 (Telomere-to-Telomere Consortium, Nurk 2022). First gapless haploid human assembly using PacBio HiFi + ONT ultralong; adds nearly 200 Mb of previously unresolved sequence, including full centromeric satellite arrays, complete rDNA arrays, and segmental duplications.
  • Human Pangenome Reference (HPRC, Liao 2023, Nature). 47 diverse diploid assemblies producing a reference graph; minigraph-cactus representation (Hickey 2024). Migration to pangenome-aware variant calling is the major 2024-26 inflection (vg, PanGenie, GraphTyper2).

4.5 Alignment / variant calling stack 2024-26

  • Aligners. BWA-MEM2 (Vasimuddin 2019), minimap2 (Li 2018), DRAGEN (Illumina, FPGA-accelerated) and its open-source software mapper DRAGMAP (formerly DRAGEN-OS), Strobealign (Sahlin 2022), vg giraffe (Sirén 2021), Parabricks GPU re-implementations (Nvidia 2024).
  • Short variants. GATK HaplotypeCaller (DePristo 2011), DeepVariant (Poplin 2018), Clair3 (Zheng 2022 for ONT), Octopus, NVIDIA Parabricks DeepVariant 4.4 (2024) on H100/B200.
  • Structural variants. Manta, Delly, GRIDSS2 for short reads; Sniffles2, cuteSV, SVision-pro for long reads. Pangenome SV genotypers: PanGenie (Ebler 2022), VG.

5. Variant taxonomy

Class | Size | Detection
SNV (single-nucleotide variant) | 1 bp | Short-read SBS at ≥ 30×
Indel | 1-50 bp | HaplotypeCaller, DeepVariant
MNV | 2-10 bp | DeepVariant phases; gnomAD reports MNV catalogue
SV (structural) | > 50 bp inv/del/dup/ins/transloc | Long-read or paired-end + split-read
CNV (copy-number) | exon → Mb | Read-depth or array CGH
Repeat expansion | tens to thousands of repeats | ExpansionHunter, STRique, ONT direct
Mobile-element insertion | full L1/Alu/SVA | MELT, xTea
Tandem repeats / VNTR | variable | TRGT (PacBio 2024), Vamos
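
The size classes in the table can be sketched as a simple rule on the REF and ALT alleles of a VCF-style record. This is an illustrative simplification: it assumes explicit sequence alleles (no symbolic ALT such as <DEL> or breakend notation), and the function name and the "complex substitution" label for equal-length changes above 10 bp are assumptions made for this example.

```python
def classify_variant(ref: str, alt: str) -> str:
    """Assign a coarse size class from explicit REF/ALT allele strings."""
    if ref == alt:
        raise ValueError("REF and ALT are identical")
    if len(ref) == len(alt):
        if len(ref) == 1:
            return "SNV"
        # Equal-length multi-base substitution; the table caps MNVs at 10 bp.
        return "MNV" if len(ref) <= 10 else "complex substitution"
    size = abs(len(ref) - len(alt))      # net inserted or deleted bases
    return "indel" if size <= 50 else "SV"


for ref, alt in [("A", "G"), ("AT", "GC"), ("A", "ATG"), ("A", "A" + "T" * 60)]:
    print(ref[:5], alt[:5], classify_variant(ref, alt))
```

Copy-number changes, repeat expansions, and mobile-element insertions need read-depth or sequence-content evidence and cannot be recognised from allele lengths alone.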

ACMG/AMP variant interpretation (Richards 2015, updated by ClinGen Sequence Variant Interpretation working group 2023-25): pathogenic / likely pathogenic / VUS / likely benign / benign with PVS1-PP5 / BA1-BP7 evidence codes. ClinVar (NCBI 2013-) is the public repository; DECIPHER, LOVD, HGMD Pro for specialist content.


6. Functional genomics

6.1 Transcriptomics

  • RNA-seq (Mortazavi 2008). Illumina-based by default; salmon (Patro 2017), kallisto bustools (Melsted 2021), STAR (Dobin 2013). DESeq2 (Love 2014), edgeR (Robinson 2010) for differential expression.
  • Long-read RNA. Iso-Seq (PacBio), direct RNA on ONT (modified-base preserving), used to resolve full-length isoforms.
  • Spike-in normalization. ERCC controls (Jiang 2011).

6.2 Chromatin and epigenomics

  • ChIP-seq. Antibody pulldown for TF or histone mark → sequencing (Johnson 2007). Standardized by ENCODE (2003-25) and Roadmap Epigenomics.
  • CUT&RUN / CUT&Tag (Skene 2017, Kaya-Okur 2019). Tethered MNase / Tn5 cut adjacent to bound antibody; needs 100-1000× fewer cells than ChIP-seq; standard for low-input applications by 2024.
  • ATAC-seq (Buenrostro 2013). Tn5 hyperactive transposase inserts adapters into open chromatin; omni-ATAC and Fast-ATAC reduce mitochondrial contamination.
  • Hi-C / Micro-C. 3D genome contacts (Lieberman-Aiden 2009; Krietenstein 2020 Micro-C); TADs, A/B compartments, loops; Pore-C (ONT, 2020).
  • DNA methylation.
    • Bisulfite sequencing (Frommer 1992) — gold standard, partly retired.
    • EM-seq (NEB 2019) — enzymatic, less degradation.
    • Long-read native methylation via ONT (Remora 2024) or PacBio HiFi (Tse 2021) detects 5mC/6mA/5hmC at single-molecule, single-base resolution.

6.3 Perturbation screens

  • CRISPR pooled screens. GeCKO (Shalem 2014), Brunello (Doench 2016), TKOv3, dCas9-KRAB CRISPRi (Gilbert 2014), CRISPRa SAM (Konermann 2014), Perturb-seq (Dixit 2016), CROP-seq (Datlinger 2017), Tap-seq (Schraivogel 2020). Genome-wide essentiality maps in DepMap (Broad, 2024 release 24Q4 covers > 1100 cell lines).
  • Saturation mutagenesis / DMS. Deep mutational scanning of every variant in a protein (Fowler 2010-14); MAVE-GS standardized format 2024.

7. GWAS and polygenic scoring

7.1 Methodology

  • Cohort + array genotyping + imputation (HRC 2016, TOPMed 2021, 1000G 30× 2022) + linear/logistic regression per variant + genomic-control λ + LD Score Regression (Bulik-Sullivan 2015) for confounding.
  • Mixed-model engines: BOLT-LMM (Loh 2015), SAIGE (Zhou 2018), REGENIE (Mbatchou 2021 — UK Biobank/Regeneron production tool), fastGWA (Jiang 2019).
  • Multi-ancestry meta-analysis: METAL, MR-MEGA; cross-population PRS: PRS-CSx (Ruan 2022), BridgePRS (Hoggart 2024).
  • Conditional analyses (COJO), fine-mapping (SuSiE, FINEMAP, CARMA), TWAS (FUSION, S-PrediXcan), coloc.
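
The genomic-control λ mentioned above is the median of the observed association χ² statistics divided by the median of a 1-df χ² distribution (about 0.455). A minimal sketch using only the standard library; the function name is illustrative and the z-scores are synthetic, generated from normal quantiles rather than from any real GWAS.

```python
from statistics import NormalDist, median

# Median of chi-square with 1 df equals the squared 75th percentile of N(0, 1).
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def genomic_control_lambda(z_scores: list[float]) -> float:
    """Inflation factor: median(z^2) / median of chi-square(1 df)."""
    return median(z * z for z in z_scores) / CHI2_1DF_MEDIAN


# Synthetic null z-scores (evenly spaced normal quantiles), then an inflated copy.
nd = NormalDist()
null_z = [nd.inv_cdf((i + 0.5) / 10001) for i in range(10001)]
inflated_z = [z * 1.2 ** 0.5 for z in null_z]

print(round(genomic_control_lambda(null_z), 3))
print(round(genomic_control_lambda(inflated_z), 3))
```

λ alone cannot distinguish confounding from true polygenic signal, which is the gap LD Score Regression is designed to address.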

7.2 Reference biobanks (2024-26)

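  • UK Biobank (Bycroft 2018). 500 K participants; WES (~450-470 K, released 2021-22, Regeneron-led) + WGS (~490 K released November 2023, sequenced at deCODE and the Wellcome Sanger Institute; access via the DNAnexus-hosted Research Analysis Platform); imaging sub-cohort target 100 K; Olink proteomics (~2.9 K proteins) on ~54 K participants, with full-cohort expansion announced in 2025.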
  • All of Us Research Program (NIH, 2018-). > 750 K enrolled, > 300 K short-read WGS released through Researcher Workbench by 2024; long-read sub-cohort scaled in 2025.
  • FinnGen (Kurki 2023). ~520 K Finns with EHR linkage; powerful for Finnish-enriched rare variants.
  • deCODE Genetics / Iceland. Pop-scale family WGS; gold standard for de novo mutation studies.
  • BioBank Japan, China Kadoorie, Million Veteran Program (USA, > 1.1 M), Estonian Biobank, Generation Scotland.

7.3 Polygenic risk scores (PRS)

  • Sum_i β_i · dosage_i over GWAS-discovered variants, with shrinkage: LDpred2 (Privé 2020), PRS-CS (Ge 2019), SBayesR (Lloyd-Jones 2019), MegaPRS (Zhang 2021).
  • Clinical-grade PGS for coronary-artery disease (Khera 2018; recalibrated Aragam 2022), type-2 diabetes, breast cancer (CanRisk / BOADICEA v6), prostate cancer.
  • 2024-25: NHS England Generation Study + Genomics England PRS pilots; US “Genomic Information Risk Adjustment” debate.
  • Ancestry-portability gap: scores trained in EUR explain 2-5× less in non-EUR ancestries (Martin 2019). Multi-ancestry methods + diverse cohorts narrowing but not closing the gap by 2026.
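
The score definition Sum_i β_i · dosage_i can be written out directly. A minimal sketch: the variant identifiers, weights, and dosages are hypothetical, and the shrinkage methods listed above (LDpred2, PRS-CS, and so on) would supply the β values in practice.

```python
from statistics import mean, pstdev


def polygenic_score(weights: dict[str, float], dosages: dict[str, float]) -> float:
    """Weighted sum of effect-allele dosages (0 to 2) over scored variants.

    Variants missing from `dosages` are skipped here; real pipelines
    usually impute them or report the overlap explicitly.
    """
    return sum(beta * dosages[v] for v, beta in weights.items() if v in dosages)


def standardize(score: float, reference_scores: list[float]) -> float:
    """Express a raw score as a z-score against a reference cohort."""
    return (score - mean(reference_scores)) / pstdev(reference_scores)


# Hypothetical per-variant weights (log odds ratios) and one person's dosages.
weights = {"var1": 0.20, "var2": -0.10, "var3": 0.05}
person = {"var1": 2.0, "var2": 1.0, "var3": 0.0}

raw = polygenic_score(weights, person)
print(round(raw, 3))
print(round(standardize(raw, [0.0, 0.1, 0.2, 0.3, 0.4]), 3))
```

Standardising against an ancestry-matched reference cohort matters because raw score distributions shift between populations, which is one facet of the portability gap described above.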

8. Mendelian and rare-disease diagnostics

  • Whole-exome sequencing (WES). Standard first-tier diagnostic for suspected Mendelian disease; ~40% diagnostic yield in paediatric cases.
  • Whole-genome sequencing (WGS). Higher uniformity, captures non-coding and SV; Genomics England 100 000 Genomes (Caulfield 2022) and NHS Genomic Medicine Service (2018-).
  • Rapid trio WGS. Rady Children’s NSIGHT2 — 13.5 h time-to-result with Illumina NovaSeq + DRAGEN, demonstrated through 2024.
  • Long-read clinical. Resolves repeat expansions (RFC1 CANVAS, FGF14 SCA27B Pellerin 2023), structural rearrangements, and methylation imprinting disorders in a single test.
  • HPO (Human Phenotype Ontology, Köhler 2008-25) for phenotype-driven prioritization. Tools: Exomiser (Smedley), AMELIE, LIRICAL, GenIE-Sys, AI-assisted: Face2Gene (FDNA), GestaltMatcher (Hsieh 2022), and 2024 Bertinetto-style LLM-as-phenotype-encoder.

9. Cancer genomics

9.1 Driver discovery and consortia

  • TCGA (The Cancer Genome Atlas, NCI / NHGRI, 2006-18). ~11 K tumours, 33 cancer types, multi-omic. PanCanAtlas (Ding 2018) capstone.
  • ICGC / ICGC-ARGO (International, 2007-). Whole-genome focus; PCAWG (Pan-Cancer Analysis of Whole Genomes, Campbell 2020) on 2 658 tumours.
  • Genomics England 100 000 Genomes (2013-). Tumour-normal pairs in NHS Cancer Programme; > 25 000 cancer WGS reportable.
  • AACR Project GENIE (2017-). Clinico-genomic registry; release 16 (2024) covers > 250 K tumours from 21 institutions. Public via Sage Bionetworks Synapse.
  • MSK-IMPACT (Memorial Sloan Kettering, Cheng 2015). > 100 K prospectively sequenced tumours by 2024; FDA-cleared.
  • COSMIC (Sanger Institute, Tate 2019-25). Catalogue of Somatic Mutations in Cancer; > 30 M coding variants.
  • Hartwig Medical Foundation (NL). Metastatic-cohort WGS, > 6 000 pan-cancer WGS publicly accessible.

9.2 Mutational signatures

  • COSMIC Signatures (Alexandrov 2013; v3.3, 2022; v3.4, 2024). SBS, DBS, ID, CN, SV signatures. Drivers: APOBEC (SBS2/13), MMR deficiency (SBS6, 15, 26), POL-ε exonuclease (SBS10a/b), UV (SBS7), tobacco (SBS4), aristolochic acid (SBS22), platinum (SBS31, SBS35).
  • HRD signature (BRCA1/2 deficient, SBS3) and the HRDetect (Davies 2017) / CHORD (Nguyen 2020) classifiers predict homologous-recombination deficiency and so inform PARP-inhibitor eligibility.
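
SBS signatures are defined over 96 channels: six pyrimidine-centred substitution types times sixteen flanking-base contexts. The sketch below shows the usual normalisation step, in which a mutation reported at a purine is mapped to the reverse-complement strand. The function name and the bracketed label format are illustrative choices.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def sbs_channel(context: str, alt: str) -> str:
    """Map a substitution to its pyrimidine-centred trinucleotide channel.

    `context` is the reference trinucleotide with the mutated base in the
    middle; `alt` is the substituted base on the same strand.
    """
    context, alt = context.upper(), alt.upper()
    if len(context) != 3 or len(alt) != 1 or alt == context[1]:
        raise ValueError("need a trinucleotide context and a different alt base")
    if context[1] in "AG":
        # Purine reference: report the change on the opposite strand.
        context = context.translate(COMPLEMENT)[::-1]
        alt = alt.translate(COMPLEMENT)
    return f"{context[0]}[{context[1]}>{alt}]{context[2]}"


print(sbs_channel("TCA", "T"))   # already pyrimidine-centred
print(sbs_channel("CGT", "A"))   # G>A is reported as C>T on the other strand

channels = {
    sbs_channel(five + ref + three, alt)
    for five in "ACGT" for ref in "ACGT" for three in "ACGT"
    for alt in "ACGT" if alt != ref
}
print(len(channels))
```

Counting mutations per channel across a tumour yields the 96-element vector that signature-fitting methods decompose.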

9.3 Clinically actionable

  • Targeted therapies. EGFR (osimertinib in EGFRm NSCLC), BRAF p.V600E (dabrafenib + trametinib), KRAS p.G12C (sotorasib, adagrasib), ALK fusions (alectinib, lorlatinib), NTRK fusions (larotrectinib, entrectinib), RET fusions (selpercatinib), KIT (imatinib).
  • Immune checkpoint biomarkers. TMB ≥ 10 mut/Mb (FoundationOne CDx), MSI-H / dMMR (pembrolizumab tumour-agnostic, 2017).
  • HRD + PARP inhibitors. BRCA1/2, olaparib, talazoparib, niraparib.
  • Liquid biopsy / MRD. Signatera (Natera), Guardant Reveal, Foresight CLARITY, NeXT Personal (Personalis). ctDNA-MRD predicts colorectal recurrence months before imaging (DYNAMIC, Tie 2022).
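
The TMB threshold above is a simple rate: eligible somatic mutations divided by the megabases of sequence assessed. A minimal sketch; the mutation counts and the 1.5 Mb footprint are hypothetical, and each assay defines for itself which mutations count.

```python
TMB_HIGH_THRESHOLD = 10.0  # mut/Mb, the cut-off quoted in the text


def tumour_mutational_burden(n_mutations: int, covered_mb: float) -> float:
    """Mutations per megabase over the assayed territory."""
    if covered_mb <= 0:
        raise ValueError("covered_mb must be positive")
    return n_mutations / covered_mb


def is_tmb_high(n_mutations: int, covered_mb: float) -> bool:
    return tumour_mutational_burden(n_mutations, covered_mb) >= TMB_HIGH_THRESHOLD


# Hypothetical panel covering 1.5 Mb of coding sequence.
print(tumour_mutational_burden(15, 1.5), is_tmb_high(15, 1.5))
print(round(tumour_mutational_burden(14, 1.5), 2), is_tmb_high(14, 1.5))
```

Because small panels sample few mutations, panel-derived TMB near the threshold carries substantial sampling error compared with exome- or genome-derived values.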

10. Pharmacogenomics (PGx)

10.1 Drug-gene pairs with actionable guidelines (CPIC, DPWG)

  • CYP2C19 — clopidogrel. Loss-of-function alleles *2, *3 → poor metabolizer → reduced active metabolite → MACE; CYP2C19-guided P2Y12 inhibitor choice in POPular Genetics (2019) and TAILOR-PCI (2020).
  • CYP2C9 + VKORC1 — warfarin. Combined dosing algorithm (IWPC 2009); recent push toward DOACs reduces relevance.
  • CYP2D6 — codeine, tramadol, tamoxifen. Ultra-rapid metabolisers (gene duplications) at risk of morphine overdose; poor metabolisers get inadequate analgesia and reduced endoxifen.
  • TPMT, NUDT15 — thiopurines. Azathioprine, 6-MP myelosuppression.
  • DPYD — fluoropyrimidines. 5-FU, capecitabine; pre-treatment DPD-deficiency testing recommended by the EMA since 2020; CPIC v3 2024.
  • UGT1A1 — irinotecan. *28/*28 → reduced glucuronidation, neutropenia.
  • HLA-B*57:01 — abacavir. Hypersensitivity (Mallal 2008).
  • HLA-B*15:02 — carbamazepine. Stevens-Johnson syndrome in Asians.
  • HLA-A*31:01 — carbamazepine. SJS/TEN in Europeans.
  • SLCO1B1 — simvastatin. Myopathy with *5/*5.
  • G6PD — rasburicase, primaquine. Haemolysis risk.
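
Guideline lookups of this kind reduce to mapping a star-allele diplotype to a metabolizer phenotype. The sketch below does this for CYP2C19 with only the four alleles named here (*1, *2, *3, plus the increased-function allele *17); it is a deliberately simplified illustration of the CPIC-style logic, not a clinical tool, and the table and function names are assumptions made for this example.

```python
# Simplified allele-function table (illustrative subset only).
ALLELE_FUNCTION = {
    "*1": "normal",
    "*2": "none",
    "*3": "none",
    "*17": "increased",
}


def cyp2c19_phenotype(allele1: str, allele2: str) -> str:
    """Translate a CYP2C19 diplotype into a metabolizer phenotype."""
    functions = [ALLELE_FUNCTION[allele1], ALLELE_FUNCTION[allele2]]
    n_none = functions.count("none")
    n_increased = functions.count("increased")
    if n_none == 2:
        return "poor metabolizer"
    if n_none == 1:
        return "intermediate metabolizer"   # includes *2/*17
    if n_increased == 2:
        return "ultrarapid metabolizer"
    if n_increased == 1:
        return "rapid metabolizer"
    return "normal metabolizer"


for diplotype in [("*1", "*1"), ("*1", "*2"), ("*2", "*3"), ("*1", "*17"), ("*17", "*17")]:
    print("/".join(diplotype), "->", cyp2c19_phenotype(*diplotype))
```

Tools such as PharmCAT perform the same translation from called star alleles, using the full curated allele tables rather than a four-entry dictionary.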

10.2 Resources

  • CPIC (Clinical Pharmacogenetics Implementation Consortium). Free guidelines; > 25 drug-gene pairs as of 2025.
  • PharmGKB. Curated annotations (Stanford / NIH).
  • PharmCAT. Open-source clinical decision support tool that consumes VCF and outputs CPIC recommendations.
  • DPWG (Dutch Pharmacogenetics Working Group). Parallel EU guidelines.

11. Single-cell and spatial genomics

11.1 Single-cell

  • scRNA-seq platforms. 10× Genomics Chromium X / Flex / GEM-X (2024), BD Rhapsody, Parse Biosciences Evercode (combinatorial split-pool, no microfluidics — full sci-RNA / SPLiT-seq lineage, Cao 2017 / Rosenberg 2018), Scale Biosciences ScalePlex.
  • Multi-omic single-cell. 10× Multiome (snRNA + snATAC), CITE-seq for surface proteins (Stoeckius 2017), ECCITE-seq for CRISPR + protein, G&T-seq, scNMT for methylation.
  • Cell atlases. Human Cell Atlas (Regev / Teichmann 2017-), Tabula Sapiens v2 (2024), CZ CELLxGENE (Chan Zuckerberg, > 100 M cells by 2025).

11.2 Spatial

  • Sequencing-based. 10× Visium HD (2024, 2 μm bins, near-single-cell); Stereo-seq (BGI, 500 nm DNB array, Chen 2022); Slide-seqV2 (Stickels 2021); Curio Seeker; Open-ST (2024).
  • Imaging-based. 10× Xenium (transcript imaging, 5 K-plex panels by 2025); Vizgen MERSCOPE Ultra; NanoString CosMx SMI WTX (whole-transcriptome 6 K, 2024); Resolve Biosciences Molecular Cartography; Akoya PhenoCycler / CODEX (proteins).
  • Spatial proteomics. IMC (Hyperion, Standard BioTools), MIBI-TOF (Ionpath), CODEX, GeoMx DSP.

11.3 Analysis stack

  • Scanpy / AnnData (Wolf 2018), Seurat v5 (Hao 2024), Bioconductor SCE, scvi-tools (Lopez 2018-25), Squidpy + STAlign, Giotto, CellCharter, scGPT / Geneformer / SCimilarity / UCE / scFoundation (single-cell foundation models, 2023-25).

12. Synthetic biology and genome engineering

12.1 DNA assembly

  • Gibson assembly (Gibson 2009). Exonuclease + polymerase + ligase one-pot; standard for medium-throughput cloning.
  • Golden Gate (Engler 2008-09). Type IIS restriction enzyme (BsaI, BbsI, SapI) for scarless modular assembly; MoClo, GoldenBraid hierarchical toolkits.
  • NEBuilder HiFi, In-Fusion (Takara), uLoop, Twist gene synthesis, IDT gBlocks / eBlocks. Commercial workhorses by 2024.
  • DNA synthesis. Twist Bioscience silicon-array oligo + assembly, GenScript, IDT Affinity Plus, Ansa Biotechnologies (enzymatic DNA synthesis, EDS), DNA Script (EDS desktop, 2024), Molecular Assemblies enzymatic, Camena Bioscience.
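
Golden Gate parts must be free of internal recognition sites for the Type IIS enzyme in use, so a common first step is to scan both strands for the site. A minimal sketch for BsaI (recognition sequence GGTCTC); the function name and the test sequence are hypothetical, and the scan assumes a non-palindromic site.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def find_type_iis_sites(seq: str, site: str = "GGTCTC") -> list[tuple[int, str]]:
    """Return (0-based position, strand) for every occurrence of `site`.

    "+" means the site reads 5'->3' on the given strand; "-" means its
    reverse complement does, i.e. the site lies on the opposite strand.
    """
    seq = seq.upper()
    reverse_site = site.translate(COMPLEMENT)[::-1]
    hits = []
    for motif, strand in ((site, "+"), (reverse_site, "-")):
        start = seq.find(motif)
        while start != -1:
            hits.append((start, strand))
            start = seq.find(motif, start + 1)   # allow overlapping matches
    return sorted(hits)


part = "AAGGTCTCAAGAGACCTT"   # hypothetical insert with one site on each strand
print(find_type_iis_sites(part))
```

Any hit inside a part would normally be removed by a synonymous change before the part enters a MoClo- or GoldenBraid-style hierarchy.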

12.2 Genetic circuits

  • Boolean logic gates. Voigt lab Cello compiler (Nielsen 2016) composes NOR/NOT gates from repressor libraries; Cello 2.0 (2020-23).
  • Toggle switches. Gardner / Collins 2000 — bistable.
  • Repressilator. Elowitz / Leibler 2000 — oscillator.
  • Riboregulators / toehold switches. Green and Yin 2014; deployed for ZIKV / SARS-CoV-2 diagnostics.
  • Recombinase memory. Bonnet / Endy / Smolke; integrase-based state machines.
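
The bistability of the toggle switch can be seen by integrating the dimensionless two-repressor model du/dt = α1/(1 + v^β) − u, dv/dt = α2/(1 + u^γ) − v. A minimal sketch with forward-Euler integration; the parameter values, step size, and function name are illustrative assumptions, not fitted values.

```python
def simulate_toggle(u0: float, v0: float,
                    alpha1: float = 10.0, alpha2: float = 10.0,
                    beta: float = 2.0, gamma: float = 2.0,
                    dt: float = 0.01, steps: int = 5000) -> tuple[float, float]:
    """Integrate the mutual-repression model and return the final (u, v)."""
    u, v = u0, v0
    for _ in range(steps):
        du = alpha1 / (1.0 + v ** beta) - u    # repressor 1: made unless v is high
        dv = alpha2 / (1.0 + u ** gamma) - v   # repressor 2: made unless u is high
        u, v = u + dt * du, v + dt * dv
    return u, v


# Same parameters, different starting points: the circuit settles into
# one of two stable states and stays there.
print(simulate_toggle(2.0, 1.0))
print(simulate_toggle(1.0, 2.0))
```

Which state wins depends only on the initial bias, which is the memory property that makes the circuit usable as a one-bit store.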

12.3 Genome editing

  • ZFNs, TALENs, meganucleases — predecessors (1996-2010).
  • CRISPR-Cas9 (Jinek 2012, Cong 2013, Mali 2013). Wild-type Cas9 from Streptococcus pyogenes; HiFi-Cas9, SpRY (PAM-relaxed, Walton 2020), eSpCas9, evoCas9.
  • Other Cas families. Cas12a/Cpf1 (Zetsche 2015) — staggered cuts, T-rich PAM; Cas13 (Abudayyeh 2016) — RNA targeting; CasX/Cas12e (Liu 2019); IscB/TnpB miniature (Karvelis 2021, Han 2024).
  • Base editing (Komor 2016, Gaudelli 2017). Cytosine BE (CBE), adenine BE (ABE); evolved variants ABE8.20, ABE8e; clinical: Beam Therapeutics BEAM-101 (HbF reactivation for SCD, BEACON trial enrolling 2024-25), BEAM-201 (CD7+ T-cell leukaemia, 2024), Verve VERVE-102 (in-vivo ABE of PCSK9 via LNP delivery; Heart-2 Phase 1b dosing through 2025).
  • Prime editing (Anzalone 2019). Cas9 nickase + reverse transcriptase + pegRNA. PE6 (Doman / Liu 2023) and PE7 (Yan / Adamson 2024) push efficiency. Prime Medicine PM359 (chronic granulomatous disease, IND cleared 2024, Phase 1 ex-vivo dosing 2024-25); PM577 (Wilson disease) and a cystic fibrosis programme (both preclinical 2025).
  • Twin-prime, GRAND, PASTE. Integration of multi-kb payloads via prime-editing-driven recombinase landing pads (Anzalone 2021, Yarnall 2023). PASTE: Cas9-RT + Bxb1 integrase.
  • Epigenome editing. dCas9-VP64 / KRAB / DNMT3A / TET1 (Tycko 2024 — durable transcriptional silencing via combination repressor domains); Tune Therapeutics TUNE-401 (HBV, IND 2024); Chroma Medicine (epigenetic-editing platform, 2024 series C).
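
For wild-type SpCas9, candidate target sites are 20-nt protospacers immediately followed by an NGG PAM on either strand. The sketch below enumerates them; it is a bare site finder with no on-target or off-target scoring, and the function names and the example sequence are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def find_spcas9_targets(seq: str, spacer_len: int = 20) -> list[tuple[str, str, str]]:
    """Return (strand, protospacer, PAM) for every NGG-adjacent site.

    Protospacers are written 5'->3' on the strand where the PAM occurs.
    """
    seq = seq.upper()
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for i in range(len(s) - spacer_len - 2):
            pam = s[i + spacer_len: i + spacer_len + 3]
            if pam[1:] == "GG":                      # N-G-G
                hits.append((strand, s[i: i + spacer_len], pam))
    return hits


example = "ACGTACGTACGTACGTACGT" + "TGG" + "AAAA"   # hypothetical locus
for hit in find_spcas9_targets(example):
    print(hit)
```

PAM-relaxed variants such as SpRY, or Cas12a with its T-rich 5′ PAM, would need a different PAM test and a different spacer position relative to the PAM.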

12.4 Foundry & DBTL companies

  • Ginkgo Bioworks (2008-). Organism-design foundry; 2024 push into AI models with Google Cloud and Codebase Discovery program.
  • Inscripta Onyx (CRISPR-EZ massively-parallel genome engineering). Restructured / acquired 2023-24.
  • Asimov, Asimica, Octant Bio, LanzaTech, Zymergen (acquired by Ginkgo 2022), Strateos, Codon Devices (historical).

13. Approved and late-stage gene therapies

13.1 In-vivo AAV

  • Luxturna (voretigene neparvovec, Spark / Roche 2017). AAV2-RPE65 for Leber congenital amaurosis.
  • Zolgensma (onasemnogene abeparvovec, Novartis 2019). AAV9-SMN1 for spinal muscular atrophy; single-dose IV in infants.
  • Hemgenix (etranacogene dezaparvovec, uniQure / CSL Behring 2022). AAV5-FIX-Padua for haemophilia B; > USD 3.5 M list price.
  • Roctavian (valoctocogene roxaparvovec, BioMarin 2022 EU / 2023 US). AAV5-FVIII for haemophilia A.
  • Elevidys (delandistrogene moxeparvovec, Sarepta 2023, expanded label 2024). AAVrh74-microdystrophin for DMD; 2024 confirmatory data EMBARK; FDA expanded approval to ambulatory and non-ambulatory boys ≥ 4.
  • Beqvez (fidanacogene elaparvovec, Pfizer 2024). AAV-FIX for haem B.

13.2 Ex-vivo cell + gene

  • Casgevy (exagamglogene autotemcel, Vertex / CRISPR Therapeutics; UK MHRA November 2023, FDA December 2023, EMA 2024). BCL11A enhancer disruption in autologous HSPCs to reactivate fetal haemoglobin; sickle cell disease and transfusion-dependent β-thalassaemia. First CRISPR-edited approved medicine. ~50 dosed sites worldwide by 2026 Q1.
  • Lyfgenia (lovotibeglogene autotemcel, bluebird bio 2023). LV-HbA-T87Q lentiviral; SCD.
  • Skysona (elivaldogene autotemcel, bluebird 2022). ABCD1 cerebral ALD.
  • Zynteglo (betibeglogene autotemcel, bluebird 2022). β-thalassaemia.

13.3 In-vivo editing — clinical 2024-26

  • Intellia NTLA-2001 (CRISPR-Cas9 of TTR, LNP). ATTR amyloidosis; Phase 3 MAGNITUDE enrolling 2024; durable knockdown > 90% for 24+ months reported 2024.
  • Intellia NTLA-2002 (KLKB1 knockout). Hereditary angioedema; Phase 3 HAELO 2024-26.
  • Verve VERVE-102 / -201 (ABE of PCSK9, ANGPTL3; GalNAc-LNP). HeFH / homozygous FH / refractory ASCVD; 2024-25 dosing.
  • Beam BEAM-302 (in-vivo ABE of SERPINA1). Alpha-1 antitrypsin deficiency, 2024 IND.
  • Prime PM359 (ex-vivo prime editing CGD). First prime-edited human dosed 2024 H2.
  • Editas EDIT-301 (renamed reni-cel, 2024). Casgevy competitor for SCD.
  • Caribou CB-010 (allogeneic anti-CD19 CAR-T edited with Cas9 chRDNA guides; the Cas12a chRDNA platform is used in the related CB-011).

13.4 RNA / mRNA / siRNA adjacencies

  • siRNA. Patisiran (LNP-formulated) and the GalNAc-conjugated vutrisiran, inclisiran, givosiran, lumasiran (all originating at Alnylam); now > 5 approved drugs across the class.
  • ASOs. Nusinersen (SMN2 splice modulator), tofersen (SOD1 ALS, Biogen / Ionis 2023).
  • mRNA therapeutics. Moderna mRNA-3927 (propionic acidaemia), mRNA-3705 (methylmalonic acidaemia); BioNTech / Genentech personalized neoantigen vaccines (autogene cevumeran, INT) Phase 2 in pancreatic adenocarcinoma 2024 — Rojas 2023 Nature promising signal.

14. AI in genomics and biology (2024-26)

14.1 Protein and complex structure

  • AlphaFold-2 (Jumper, DeepMind, 2021). Evoformer trunk + structure module; reached near-experimental accuracy for many single-chain targets at CASP14 and reset expectations in structural biology.
  • AlphaFold-3 (Abramson et al., DeepMind / Isomorphic Labs, May 2024 in Nature). Diffusion-based all-atom model that predicts proteins, nucleic acids, ligands, ions, and PTMs in a single pass. Server-only initially; weights and inference code released November 2024 for non-commercial use.
  • RoseTTAFold-AllAtom, RoseTTAFold2NA, RFdiffusion (Baker lab, 2023-24). Open-source structure and design family.
  • Boltz-1 (Wohlwend, MIT / Recursion, 2024). Open replication of AlphaFold-3-class quality, fully open weights and license — the reference open AF3-equivalent at end of 2024. Boltz-2 (2025) adds affinity and confidence improvements.
  • Chai-1 (Chai Discovery, 2024). Commercial AF3-class with API.
  • ESMFold (Lin / Meta, 2022). Sequence-only LM-based.
  • OpenFold (Ahdritz 2024). Open AF2 retrain.

14.2 Protein language models

  • ESM-2 (Lin 2022) → ESM-3 (Hayes et al., EvolutionaryScale, Science 2025; preprint 2024). Frontier protein generative model trained on sequence + structure + function tokens; demonstrated design of a novel fluorescent protein 500 M years of evolution distant from any natural GFP.
  • ProGen2 (Salesforce 2022) → ProGen3.
  • xTrimoPGLM (BioMap 2024). 100-B parameter protein LM.
  • ProtTrans family, Ankh, SaProt (structure-aware tokens, Su 2024).

14.3 Protein design

  • RFdiffusion / RFdiffusion-AA (Watson 2023 Nature, Krishna 2024). De novo backbone generation by score-based diffusion of frames.
  • AlphaProteo (DeepMind, September 2024). Designs novel binders to user-specified protein targets with 3-300× higher binding affinity than best published baselines; ~10-90% success rate across 7 evaluated targets.
  • Chroma (Generate Biomedicines, Ingraham 2023 Nature). Programmable diffusion model conditioned on symmetry, shape, and natural-language prompts. ChromaProtein API in 2024.
  • Generate Biomedicines (2024-25). Clinical-stage AI-designed biologics, GB-0669 (RSV) and partnered programmes with Amgen.
  • Profluent ProGen-style + OpenCRISPR-1 (Ruffolo / Madani et al., 2024). First open-weight AI-designed CRISPR-Cas9 family nuclease; functional editing of human cells with novel sequence distant from any natural Cas9.
  • EvolutionaryScale, Cradle, Diamond, Latent Labs, Cyrus Bio, BigHat.

14.4 Genomic foundation models

  • Evo (Nguyen, Arc Institute / TogetherAI, 2024 Science). 7 B-param StripedHyena trained on 2.7 M prokaryotic and phage genomes at single-nucleotide resolution; zero-shot fitness prediction and DNA generation (synthesized full-length CRISPR-Cas and IS200/IS605 systems).
  • Evo-2 (Arc / Nvidia, 2025). 40 B parameters, 1 M-token context, trained on genomes spanning all domains of life (eukaryotic and prokaryotic). Released with open weights.
  • HyenaDNA (Nguyen 2023). Long-context DNA LM.
  • Nucleotide Transformer (InstaDeep / NVIDIA 2023-24). Up to 2.5 B parameters, multi-species.
  • DNABERT-2, GENA-LM, Caduceus (bidirectional Mamba), Mistral-DNA.
  • Borzoi (Calico / Linder 2024). Sequence-to-expression with 524 kb context, supersedes Enformer (Avsec 2021) on RNA-seq tracks.
  • scGPT (Cui / Wang 2024 Nature Methods), Geneformer (Theodoris 2023 Nature), SCimilarity (Heimberg 2024), UCE (Universal Cell Embedding, Rosen 2024), scFoundation (Hao 2024), CellPLM, GeneCompass. Foundation models trained on tens of millions of cells.

14.5 Where the boundary is in 2026

  • AI protein design has crossed the threshold for routine binder generation and small-protein de-novo; structure prediction is solved enough to be the default starting point but ligand-bound dynamics, allostery, membrane embedding, and large multimers remain hard.
  • Sequence-to-function for non-coding genome remains coarse. Single-cell perturbation foundation models (Perturb-seq + scFM) are the active frontier.
  • Generative DNA models (Evo-2, Caduceus) can write functional CRISPR-Cas cassettes but not yet full eukaryotic regulatory programs.
  • Foundation-model–assisted CRISPR design tools (DeepCRISPR, CRISPRon, CRISPRoff, plus Profluent OpenCRISPR-1) are increasingly preferred to classical Doench / Azimuth.

15. Workflow patterns (production 2024-26)

  • Pipelines. nf-core (Ewels 2020-25), Snakemake, WDL on Terra / DNAnexus / Seven Bridges Cancer Genomics Cloud, NVIDIA Parabricks 4.x on H100/B200.
  • Containerisation. Docker, Singularity / Apptainer (HPC).
  • Cloud genomics. Google Cloud Life Sciences (deprecated 2025 in favour of Batch + GKE), AWS HealthOmics (2023 GA), Azure Genomics + Microsoft Open Health Data + Project Health Futures.
  • Variant interchange. VCF 4.5 (GA4GH) and its binary form BCF, plus gVCF, MAF, and MAF lite conventions; libraries: htsjdk, htslib (Bonfield 2021; CRAM 3.1), pysam.
  • Variant databases. ClinVar, gnomAD v4 (2024), Exome Aggregation Consortium (legacy), UK Biobank RAP, All of Us Researcher Workbench.

16. Bioethics, governance, equity

  • GINA (Genetic Information Non-discrimination Act, USA 2008). Covers health insurance and employment; not life/disability/long-term care.
  • GDPR + GA4GH frameworks. Cross-border genomic data sharing.
  • Heritable / germline editing. Currently subject to moratorium across most jurisdictions; the WHO Expert Advisory Committee (2021) and US NAS / NAM / Royal Society reports recommend against clinical use; the 2018 He Jiankui CCR5 case (twins Lulu and Nana) remains the cautionary precedent.
  • Ancestry bias. > 80% of GWAS participants were of European ancestry in 2020; H3Africa, GAPP, PAGE, TOPMed Hispanic, and GenomeAsia 100K efforts are narrowing the gap; PRS portability still inferior in 2026.
  • Return of incidental findings. ACMG SF v3.2 (2023) — 81 actionable genes for adult or paediatric reporting.

17. Cross-references

  • cell-molecular-biology — the molecular substrate (DNA, RNA, protein, chromatin) on which genetics operates.
  • pharma-process-engineering — manufacturing AAV vectors, LNPs, plasmid, mRNA, and CGT cell-therapy products.
  • bioinstrumentation — sequencer optics, flow cells, nanopore arrays, microfluidics, mass spec, and lab-automation instrumentation.
  • transformer-architecture — the deep-learning backbone for AlphaFold-3, ESM-3, Evo, Boltz-1, scGPT, and protein/DNA foundation models.
  • probability-fundamentals — Hardy-Weinberg, drift, mixed-model GWAS, Bayesian fine-mapping, and PRS shrinkage all build on it.

18. Selected citations

  • Mendel G. Versuche über Pflanzenhybriden. Verh. Naturforsch. Ver. Brünn, 1866.
  • Hardy GH. Mendelian proportions in a mixed population. Science 1908.
  • Watson JD, Crick FH. Molecular structure of nucleic acids. Nature 1953.
  • Sanger F, Nicklen S, Coulson AR. DNA sequencing with chain-terminating inhibitors. PNAS 1977.
  • Mullis KB et al. PCR. Cold Spring Harb. Symp. 1986.
  • Lander ES et al. Initial sequencing of the human genome. Nature 2001.
  • Venter JC et al. The sequence of the human genome. Science 2001.
  • Jinek M, Charpentier E, Doudna JA. Science 2012 (CRISPR-Cas9).
  • Cong L, Zhang F. Science 2013. Mali P, Church GM. Science 2013.
  • Komor AC, Liu DR. Nature 2016 (cytosine BE). Gaudelli NM, Liu DR. Nature 2017 (adenine BE).
  • Anzalone AV, Liu DR. Nature 2019 (prime editing).
  • Nurk S et al. The complete sequence of a human genome. Science 2022.
  • Liao W-W et al. Human Pangenome Reference. Nature 2023.
  • Jumper J et al. AlphaFold-2. Nature 2021.
  • Abramson J et al. AlphaFold-3. Nature 2024.
  • Watson JL et al. RFdiffusion. Nature 2023.
  • Hayes T et al. ESM-3 (EvolutionaryScale). Science 2025.
  • Nguyen E et al. Evo. Science 2024.
  • Ingraham JB et al. Chroma. Nature 2023.
  • Frangoul H et al. Casgevy CLIMB SCD/TDT. NEJM 2021; Locatelli F et al. Long-term follow-up NEJM 2024.
  • Gillmore JD et al. Intellia NTLA-2001 ATTR. NEJM 2021; phase 2 long follow-up 2024.
  • Mendell JR et al. Elevidys EMBARK. Nat. Med. 2024.
  • Karczewski KJ et al. gnomAD v4. Nature 2024.
  • Mbatchou J et al. REGENIE. Nat. Genet. 2021.
  • Cui H, Wang B et al. scGPT. Nat. Methods 2024.
  • Theodoris CV et al. Geneformer. Nature 2023.
  • Alexandrov LB et al. Mutational signatures v3.4. Nature 2020 / COSMIC release 2024.