Model Organisms & Sequencing Technology Catalog
The two reference tables every wet-lab and computational biologist returns to weekly: (1) which model organism, strain, and cell line is appropriate for a given question, and (2) which sequencing platform delivers the right read length / accuracy / price for a given assay. Plus the gene-editing and bioinformatics tooling layered on top.
Part 1 — Model organisms
Prokaryotes
| Organism | Strain | Genome size | Use | Note |
|---|---|---|---|---|
| Escherichia coli | K-12 MG1655 | 4.64 Mb | Reference E. coli; gene knockout (Keio collection) | Sequenced 1997 Blattner Science |
| E. coli | K-12 W3110 | 4.65 Mb | Industrial fermentation chassis | Less mutator than MG1655 |
| E. coli | DH5α (recA1 endA1) | 4.6 Mb | Cloning; plasmid maintenance | supE44, lacZΔM15 (blue-white screening) |
| E. coli | BL21(DE3) | 4.6 Mb | T7 polymerase under lac control; protein over-expression | Lon/OmpT protease deficient |
| E. coli | Rosetta, Origami, Tuner | various | rare-codon (Rosetta), disulfide-bond (Origami) variants of BL21 | Novagen / Merck lines |
| E. coli | NEB Stable, Stbl3 | | Low-recombination hosts for unstable repeats (lentiviral LTRs) | NEB / Invitrogen |
| Bacillus subtilis | 168 | 4.2 Mb | Gram+ model; sporulation; secreted protein | First Gram+ sequenced 1997 |
| Mycobacterium tuberculosis | H37Rv | 4.4 Mb | TB pathogen | Sequenced 1998 Cole Nature |
| M. smegmatis | mc²155 | 7.0 Mb | Fast-growing surrogate for Mtb | BSL2 vs Mtb BSL3 |
| Streptomyces coelicolor | A3(2) | 8.7 Mb | Secondary metabolism (antibiotic) reference | Linear chromosome; >20 BGCs |
| Pseudomonas aeruginosa | PAO1 | 6.3 Mb | Opportunistic pathogen; biofilm | Sequenced 2000 |
| Caulobacter crescentus | NA1000 / CB15N | 4.0 Mb | Cell-cycle differentiation (stalk vs swarmer) | Asymmetric division model |
| Vibrio cholerae | N16961 | 4.0 Mb (2 chromosomes) | Quorum sensing; two-chromosome | Sequenced 2000 |
| Synechocystis sp. | PCC 6803 | 3.6 Mb | Photosynthetic cyanobacterium | First cyanobacterium sequenced 1996 |
Fungi / yeasts
| Organism | Strain | Genome | Use | Sequenced |
|---|---|---|---|---|
| Saccharomyces cerevisiae | S288C | 12.1 Mb, 16 chromosomes, ~6000 genes | Genetic screens (synthetic lethal, SGA), eukaryote model | 1996 Goffeau Science — first eukaryote |
| S. cerevisiae | W303 | | Classic genetics background; multiple auxotrophies | |
| S. cerevisiae | BY4741 / BY4742 / BY4743 | | Gene deletion collection (4900 mutants; Giaever 2002 Nature); YKO library, haploid + heterozygous diploid | |
| S. cerevisiae | EUROSCARF, Tn-seq collections | | Systematic mutant collections | |
| Schizosaccharomyces pombe | 972 h− | 12.6 Mb, 3 chromosomes | Fission yeast; G2/M cell cycle; CDC genes | Paul Nurse Nobel 2001 (Hartwell + Nurse + Hunt) |
| Pichia pastoris (now Komagataella phaffii) | GS115, X-33 | 9.4 Mb | Industrial recombinant protein (>1000 mg/L); methanol-induced AOX1 | Invitrogen Pichia kit standard |
| Candida albicans | SC5314 | 14.3 Mb (diploid) | Fungal pathogen | Sequenced 2004 |
| Aspergillus nidulans | FGSC A4 | 30 Mb | Filamentous fungus; sexual cycle | |
| Neurospora crassa | OR74A | 41 Mb | Circadian (frq), heterokaryon | Beadle-Tatum one-gene-one-enzyme 1958 Nobel |
| Cryptococcus neoformans | H99 | 19 Mb | Pathogenic yeast (encapsulated) | |
Invertebrates
| Organism | Strain | Genome | Use | Note |
|---|---|---|---|---|
| Caenorhabditis elegans | N2 Bristol | 100 Mb, 6 chromosomes, ~20,000 genes | 959 somatic cells fully mapped (Sulston lineage); 302 neurons connectome | Brenner-Sulston-Horvitz Nobel 2002; first multicellular sequenced 1998 |
| C. elegans | Hawaiian CB4856 | | Wild isolate; RIL crosses | |
| Drosophila melanogaster | Oregon-R, Canton-S | 144 Mb, 4 chromosomes | Genetics (Morgan Nobel 1933), Hox patterning (Lewis + Nüsslein-Volhard + Wieschaus Nobel 1995), GAL4-UAS, FLP-FRT | Bloomington Stock Center; Vienna VDRC |
| D. melanogaster | w¹¹¹⁸ | | White-eyed background for transgenics | |
| Aplysia californica | | 1.8 Gb | Marine slug; learning + memory (sensitization, classical conditioning) | Kandel Nobel 2000; ~20k neurons, large enough to identify individually |
| Dictyostelium discoideum | AX2, AX4 | 34 Mb | Social amoeba; chemotaxis (cAMP); cell-aggregation | Multicellular life from unicellular origin |
| Tetrahymena thermophila | SB210 | 104 Mb (macronucleus) | Telomerase discovery (Greider-Blackburn 1985, Nobel 2009); ribozyme (Cech 1989 Nobel) | Two-nucleus ciliate |
| Paramecium tetraurelia | | 72 Mb | Cilia, cortical inheritance | Whole-genome duplications (3) |
| Hydra vulgaris | | 1.3 Gb | Regeneration; stem cells | Cnidarian; immortal soma |
| Planaria (Schmidtea mediterranea) | CIW4, S2F1L3F2 | 700 Mb (CIW4 asexual) | Whole-body regeneration; neoblasts | Sánchez Alvarado lab |
| Bombyx mori | p50T | 432 Mb | Silkworm; agricultural | |
| Anopheles gambiae | PEST | 273 Mb | Malaria vector | |
| Aedes aegypti | LVP_AGWG | 1.38 Gb | Dengue + Zika + chikungunya vector | |
Plants
| Organism | Cultivar | Genome | Notes |
|---|---|---|---|
| Arabidopsis thaliana | Col-0 (Columbia-0) | 135 Mb, 5 chromosomes, 27,400 genes | Genome 2000 (first plant); ~6 wk life cycle; T-DNA insertion lines (SALK, GABI-Kat) |
| A. thaliana | Ler (Landsberg erecta) | | Mapping crosses with Col-0 |
| Rice (Oryza sativa) | ssp. japonica Nipponbare | 389 Mb, 12 chromosomes | First crop genome 2005 IRGSP |
| Rice | indica 9311 (Beijing Genomics 2002) | | Heterosis studies |
| Maize (Zea mays) | B73 | 2.3 Gb, 10 chromosomes | High repeat content; reference 2009; pan-genome 2020s |
| Sorghum bicolor | BTx623 | 730 Mb | C4 grass |
| Setaria italica | Yugu1 | 510 Mb | Foxtail millet |
| Tomato (Solanum lycopersicum) | Heinz 1706 | 900 Mb | Genome 2012; ripening (rin, nor mutants) |
| Soybean (Glycine max) | Williams 82 | 1.1 Gb | Legume reference; paleo-tetraploid |
| Brassica spp. | Various (B. napus DH12075, B. rapa Chiifu) | | Polyploidy / WGD studies |
| Tobacco (N. tabacum, N. benthamiana) | TN90, LAB | 3.8 Gb / 3.1 Gb | Transient expression (agroinfiltration) |
| Marchantia polymorpha | Tak-1 (male), Tak-2 (female) | 226 Mb | Basal land plant; simple propagation via gemmae |
| Physcomitrium patens | Gransden 2004 | 480 Mb | Moss; gene targeting via HR |
| Chlamydomonas reinhardtii | CC-503, CC-1690 | 111 Mb | Green alga; flagella, photosynthesis |
| Selaginella moellendorffii | | 213 Mb | Lycophyte; intermediate between mosses and seed plants |
Fish / amphibians / reptiles
| Organism | Strain / line | Genome | Use |
|---|---|---|---|
| Zebrafish (Danio rerio) | AB, Tübingen (TU), TL (top long fin), WIK | 1.4 Gb, 25 chromosomes | Embryo transparent; ENU mutagenesis screens (Nüsslein-Volhard 1990s+); Tg lines; ZFIN database |
| Zebrafish casper line | roy−/− nacre−/− (mitfa−/−) | | Transparent adult (White 2008) |
| Medaka (Oryzias latipes) | HdrR (Hd-rR) | 700 Mb | Diploid fish alternative to zebrafish |
| Xenopus laevis | J strain | 3.1 Gb allotetraploid | Classical embryology; oocyte; large eggs (~1 mm) |
| Xenopus tropicalis | Nigerian, Ivory Coast | 1.7 Gb diploid | Modern Xenopus; faster generation |
| Axolotl (Ambystoma mexicanum) | wild-type, white d/d, GFP+ | 32 Gb | Limb + heart + spinal-cord regeneration |
| Salamander (Notophthalmus, Pleurodeles) | | very large | Regeneration |
| Anolis carolinensis | | 1.8 Gb | Reptile model; eye development |
Birds
| Organism | Strain | Genome | Use |
|---|---|---|---|
| Chicken (Gallus gallus) | Red Junglefowl (UCD001), White Leghorn | 1.05 Gb, ~16,000 genes | Embryology window; immunology (B cell discovery; Bursa of Fabricius) |
| Quail (Coturnix japonica) | | 1.0 Gb | Chick alternative; faster |
| Zebra finch (Taeniopygia guttata) | | 1.2 Gb | Vocal learning; song system |
Mammals
| Organism | Strain | Genome | Use | Note |
|---|---|---|---|---|
| Mouse (Mus musculus) | C57BL/6J | 2.7 Gb, 20 chromosomes | Reference inbred; ES cell-derived gene KO Capecchi-Smithies-Evans Nobel 2007; CRISPR | Jackson Labs JAX (Bar Harbor ME) |
| Mouse | C57BL/6N | | NIH version; subtly different from JAX 6J | |
| Mouse | BALB/c | | Th2-skewed; allergy + tumor immunology | |
| Mouse | 129 (129S1/Sv, 129S6, 129S4) | | ES-cell donor strain | |
| Mouse | FVB/N | | Albino; pronuclear injection (large pronucleus); transgenic standard | |
| Mouse | CD-1 (outbred), Swiss Webster | | Toxicology, breeding | |
| Mouse | NOD/SCID, NSG (NOD-scid-IL2Rγnull) | | Humanized mouse, xenograft | |
| Mouse | Black 6 + Cre/lox driver collection | | Tissue-specific KO (Albumin-Cre liver, Villin-Cre intestine, etc.) | |
| Rat (Rattus norvegicus) | Sprague-Dawley | 2.9 Gb | Pharmacology, toxicology, behavior; bigger than mouse for surgery | |
| Rat | Wistar, Wistar-Kyoto (WKY) | |||
| Rat | Lewis (LEW), SHR (spontaneously hypertensive) | | Hypertension model (SHR) | |
| Rat | F344 (Fischer) | | NIH tox | |
| Rabbit (Oryctolagus cuniculus) | New Zealand White, Dutch | 2.7 Gb | Atherosclerosis (Watanabe — LDLR), polyclonal antibody production | |
| Guinea pig (Cavia porcellus) | Hartley, Strain 13 | 2.7 Gb | Vitamin C requirement; vaccine + TB | |
| Pig (Sus scrofa) | Yucatan, Göttingen Minipig | 2.7 Gb | Cardiovascular, transplant; CRISPR PERV-knockout xenotransplant (eGenesis, Revivicor; 2024 kidney recipient Richard Slayman, MGH) | |
| Sheep (Ovis aries) | Dolly (1996; Roslin/Wilmut) | 2.7 Gb | First mammal cloned from an adult somatic cell; SCNT | |
| Dog (Canis familiaris) | Beagle, Boxer (reference Tasha 2005) | 2.4 Gb | Cardiovascular tox, neuro | |
| Ferret (Mustela putorius furo) | | 2.4 Gb | Influenza, respiratory pathogen | |
| Rhesus macaque (Macaca mulatta) | Indian, Chinese origin | 2.9 Gb | Primate immunology, neurology, HIV/SIV | |
| Cynomolgus macaque (M. fascicularis) | | 2.9 Gb | Tox + biologics last-stage preclinical | |
| Marmoset (Callithrix jacchus) | | 2.9 Gb | Primate CRISPR / transgenics (CLARITY, optogenetics); sexual maturity at ~12-18 months | |
| Squirrel monkey (Saimiri) | | | NHP cheaper alternative | |
Cell lines — human + mouse + rodent
| Line | Tissue / origin | Year | Note |
|---|---|---|---|
| HeLa | Cervical adenocarcinoma | 1951 | Henrietta Lacks; first immortal human cell line; HPV-18 driver; Skloot 2010 book; estimated >50 million metric tons cultured to date; ATCC CCL-2 |
| HEK293 (293T, 293FT) | Human embryonic kidney (likely neuronal origin per Shaw 2002) | 1973 (Graham) | Transfection workhorse; lentivirus packaging |
| CHO (Chinese hamster ovary) | Ovary; CHO-K1, DG44 (DHFR−), CHO-S, ExpiCHO | 1957 Puck | Recombinant biologics: Activase (1987, first), Enbrel, Humira, EPO, mAbs; >70% of biologic mass produced in CHO |
| Vero | African green monkey kidney | 1962 | Vaccine production (polio, Ebola, COVID-19 Sinovac CoronaVac) |
| MDCK | Madin-Darby canine kidney | 1958 | Influenza vaccine production; epithelial polarity model |
| Sf9 / Sf21 / High Five | Spodoptera frugiperda / Trichoplusia ni insect cells | 1980s | Baculovirus-insect expression (BEVS); Cervarix HPV vaccine; FluBlok |
| A549 | Lung adenocarcinoma | 1972 | NSCLC, drug screen, viral infection (SARS-CoV-2 needs ACE2 overexpression) |
| MCF-7 | Breast adenocarcinoma ER+ | 1973 | ER+ tamoxifen response |
| MDA-MB-231 | Breast TNBC | 1973 | Triple-negative breast cancer model |
| HepG2 | Hepatocellular carcinoma | 1979 | Hepatocyte / drug metabolism (limited CYP) |
| HuH-7 | HCC | 1982 | HCV replication (Lohmann-Bartenschlager 1999) |
| Hep3B | HCC | 1979 | HBV+ |
| U2OS | Osteosarcoma | 1964 | DNA-damage imaging (large flat cells) |
| U-87 MG | Glioblastoma | 1968 | GBM (note: reidentified 2016 as different patient than original) |
| HCT116, SW480, HT-29, Caco-2 | Colorectal | various | Caco-2 transepithelial transport |
| K562 | CML (Ph+) | 1975 | First human CML line; differentiation studies |
| Jurkat | T-ALL | 1976 | T cell receptor signaling |
| THP-1 | Acute monocytic leukemia | 1980 | Monocyte / macrophage differentiation (PMA) |
| U937 | Histiocytic lymphoma | 1976 | Macrophage model |
| RAW264.7 | Mouse macrophage (Abelson MLV-induced) | 1978 | NF-κB, TLR signaling |
| NIH/3T3 | Mouse embryonic fibroblast | 1962 | Transformation focus assay (Weinberg ras 1981) |
| 3T3-L1 | Mouse preadipocyte | 1974 | Adipogenesis (Green-Kehinde) |
| COS-7, COS-1 | African green monkey kidney + SV40 LT | 1981 | Transient transfection (SV40 ori) |
| BHK-21 | Baby hamster kidney | 1962 | Virology (FMDV vaccine), recombinant |
| PC-12 | Rat pheochromocytoma | 1976 | NGF-induced neurite differentiation |
| Neuro-2a | Mouse neuroblastoma | 1969 | Cheap neuronal model (caveat) |
| L929 | Mouse fibroblast | 1948 | First continuous mammalian cell line — Sanford |
iPSC lines (induced pluripotent stem cells — Yamanaka Nobel 2012):
- WiCell H1, H9 (hESC — Thomson 1998); used as comparator.
- ATCC, Coriell, EBiSC (European Bank for iPSC) repositories.
- iPSC reprogramming: OSKM (Oct4, Sox2, Klf4, c-Myc) — retro/lenti, episomal, mRNA, Sendai (CytoTune ThermoFisher).
- Organoids: Sato-Clevers 2009 intestinal; brain organoids (Lancaster 2013); kidney (Takasato 2015); liver, lung, pancreas, retina, cardiac.
Part 2 — Sequencing technologies
Sanger sequencing (1977)
- Chain-termination with ddNTPs (Frederick Sanger; Walter Gilbert chemical-degradation; Nobel 1980).
- Read length ~800 bp routine; ~1,000 bp max.
- Accuracy 99.99% (Q40).
- Instruments: slab-gel ABI 377 (1995) → capillary ABI Prism 3100 → ABI 3730 / 3730xl (Applied Biosystems, 96 capillaries, since 2002).
- Throughput: 24-96 reads / 1-3 h.
- Cost 2024: ~$3-5 per reaction outsourced (Genewiz Azenta, Eurofins).
- Used for: Sanger validation of variants (gold standard), plasmid verification, short PCR amplicon, MLST. Human Genome Project (1990-2003 public; Celera 2000 private) finished by Sanger — $2.7B / 13 years.
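Accuracy figures throughout this catalog are quoted both as percentages and as Phred scores (Q30, Q40). A minimal sketch of the standard conversion, Q = -10·log10(error rate); the function names are illustrative, not from any library:

```python
import math


def phred_to_accuracy(q: float) -> float:
    """Per-base accuracy implied by a Phred score: 1 - 10^(-Q/10)."""
    return 1.0 - 10 ** (-q / 10.0)


def accuracy_to_phred(accuracy: float) -> float:
    """Phred score implied by a per-base accuracy (accuracy must be < 1)."""
    return -10.0 * math.log10(1.0 - accuracy)


if __name__ == "__main__":
    for q in (20, 30, 40):
        errors_per = round(10 ** (q / 10))
        print(f"Q{q}: {phred_to_accuracy(q):.4%} accurate, "
              f"about 1 error per {errors_per:,} bases")
```

By this definition Q30 corresponds to 99.9% and Q40 to 99.99%, which is a quick consistency check for any accuracy entry in the platform tables.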
Next-generation sequencing (short-read NGS)
| Platform | Vendor | Year | Read length | Throughput | Cost/Gb (2024) | Note |
|---|---|---|---|---|---|---|
| 454 GS FLX | Roche (acq. 454 Life Sciences 2007) | 2005 first commercial NGS | 400-1000 bp | 0.7 Gb/run | discontinued 2016 | Pyrosequencing; PPi → light via luciferase; homopolymer errors |
| Solexa / Illumina SBS | Illumina (acq. Solexa 2007 $600M) | 2006 | 2x50 → 2x300 bp | varies by instrument | $5-15 | Sequencing-by-synthesis; reversible terminators; cluster amplification on flowcell |
| Illumina MiSeq | Illumina | 2011 | 2x300 bp | 15 Gb/run | $80/Gb | 16S amplicon, small genome |
| Illumina NextSeq 550 / 1000 / 2000 | Illumina | 2014 / 2020 | 2x150 bp | 30-360 Gb/run | $20-40 | Mid-throughput; exome, RNA-seq |
| Illumina HiSeq 2500 / 4000 / X Ten | Illumina | 2014 | 2x150 bp | 1.5 Tb/run | $7 | First $1000 genome (X Ten cluster 2014); discontinued 2021 |
| Illumina NovaSeq 6000 | Illumina | 2017 | 2x150 bp | 3 Tb/run (S4) | $5 | High-throughput workhorse 2017-2024 |
| Illumina NovaSeq X Plus | Illumina | 2023 commercial 2024 | 2x150 bp | 20 Tb/run | $2 | $200 per 30x WGS; XLEAP-SBS chemistry |
| Ion Torrent PGM / Proton / S5 / GeneStudio Genexus | Thermo Fisher (acq. Life Tech 2014) | 2010 | 200-600 bp | 5-15 Gb (S5) | $30-50 | H+ ion detection; semiconductor (no optics); homopolymer error remains |
| SOLiD 5500 | Applied Biosystems (Thermo) | 2007 | 50-75 bp | discontinued 2016 | — | Ligation-based; high accuracy but short reads |
| BGI / MGISEQ-2000 / DNBSEQ-G400 / T7 / T20 / T20+ | BGI / MGI Tech (acq. Complete Genomics 2013) | 2015+ | 100-200 bp | T20+: 72 Tb/run | $1-3 | DNB (DNA nanoball) rolling-circle; ~$100 30x genome on T20+; dominant in Asia |
| Element AVITI | Element Biosciences (2018, ex-Illumina founders) | 2023 | 2x150 bp | 1 Tb/run | $5-10 | Avidity sequencing; lower CapEx than a NovaSeq-class instrument (~$985k) |
| Singular Genomics G4 | Singular Genomics | 2022 | 2x150 bp | 130 Gb/run | $20 | Mid-throughput |
| Ultima UG100 | Ultima Genomics | 2022 | 100-300 bp | 1 Tb/run | $1-2 | Wafer-based; $100 30x genome target |
| Roche SBX (Sequencing by Expansion) | Roche (technology Stratos Genomics acquired 2020) | launching 2024-2025 | aimed long single-molecule | — | target <$100/genome | "Xpandomer" expansion |
| AstraZeneca / Polaris (former Avantium) | various | | | | | Long-read short-read hybrid emerging |
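The Cost/Gb column can be turned into an approximate reagent cost per genome. A rough sketch, assuming a ~3.1 Gb human genome (an assumption, not a figure from the table) and ignoring library prep, labor, and instrument depreciation:

```python
HUMAN_GENOME_GB = 3.1  # assumption: approximate haploid human genome size


def gb_required(coverage: float, genome_gb: float = HUMAN_GENOME_GB) -> float:
    """Raw sequence (Gb) needed to reach a mean coverage depth."""
    return coverage * genome_gb


def reagent_cost_per_genome(cost_per_gb: float, coverage: float = 30.0,
                            genome_gb: float = HUMAN_GENOME_GB) -> float:
    """Sequencing-only cost in dollars for one genome at the given coverage."""
    return gb_required(coverage, genome_gb) * cost_per_gb


if __name__ == "__main__":
    # Cost/Gb values copied from the table above.
    platform_cost_per_gb = {
        "NovaSeq X Plus": 2.0,
        "NovaSeq 6000": 5.0,
        "MiSeq": 80.0,
    }
    for name, cost in platform_cost_per_gb.items():
        print(f"{name}: ~${reagent_cost_per_genome(cost):,.0f} per 30x genome")
```

At $2/Gb a 30x genome needs about 93 Gb, or roughly $186, which lines up with the "$200 per 30x WGS" quoted for the NovaSeq X Plus.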
Long-read sequencing
| Platform | Vendor | Year | Read length typical | Read length max | Accuracy | Throughput | Cost/Gb |
|---|---|---|---|---|---|---|---|
| PacBio RS / RS II | Pacific Biosciences | 2011 | 5-15 kb | 60 kb | 87% raw, >99% CCS | low | high |
| PacBio Sequel | PacBio | 2015 | 10-30 kb | 100 kb | similar | 5-10 Gb/run | mid |
| PacBio Sequel II / IIe | PacBio | 2019 / 2020 | 15-25 kb HiFi | 100+ kb | 99.9% HiFi (Q30+ CCS) | 30 Gb HiFi / SMRT cell | $30-50 |
| PacBio Revio | PacBio | 2023 | 15-25 kb HiFi | 100+ kb | Q30+ HiFi | 90 Gb HiFi / SMRT cell × 4 cells/day = 360 Gb/day | $5-10 |
| PacBio Onso | PacBio | 2023 | 150-200 bp short-read | — | Q40 | 250 Gb/run | $10-15 |
| ONT MinION (Mk1B, Mk1C, Mk1D) | Oxford Nanopore | 2014 first commercial USB | 5-30 kb typical, 100+ kb | >4 Mb max recorded (Loose lab) | 95-99% R10.4.1 | 20-50 Gb/flowcell | flowcell ~$900 |
| ONT GridION | ONT | 2017 | as above; 5 flowcells | as above | as above | 250 Gb total | mid |
| ONT PromethION 24 / 48 / P2 / P2 Solo | ONT | 2018 / 2024 | as above | as above | as above | P48 = 14 Tb/run | $5-15 |
| ONT Flongle | ONT | 2019 | shorter run | | | 2-3 Gb (flowcell adapter) | low CapEx |
| Singular / Element / Ultima long-read variants | various | 2024-2025 | emerging | | | | |
ONT chemistry / flowcell: R9.4.1 (legacy 2018), R10.4.1 (2023; dominant; duplex sequencing pushes to Q30+). Basecalling: Dorado (C++/libtorch, GPU; supersedes Guppy and Albacore). Pore types: ONT uses engineered variants of the E. coli CsgG nanopore. Sample prep: ligation (LSK114), rapid (RAD), PCR-cDNA, direct (native) RNA (RNA-002 / RNA-004 kit 2024).
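The throughput column of the long-read table translates into flowcells (or SMRT cells) per genome. A back-of-the-envelope sketch using mean coverage = yield / genome size; the 3.1 Gb genome size and 30x target are assumptions, and real yields vary by library:

```python
import math


def cells_needed(coverage: float, yield_gb_per_cell: float,
                 genome_gb: float = 3.1) -> int:
    """Whole cells/flowcells required to reach the target mean coverage."""
    return math.ceil(coverage * genome_gb / yield_gb_per_cell)


def genomes_per_day(daily_yield_gb: float, coverage: float,
                    genome_gb: float = 3.1) -> float:
    """Genomes per day an instrument can cover at the target depth."""
    return daily_yield_gb / (coverage * genome_gb)


if __name__ == "__main__":
    # Yields copied from the table above: Revio 90 Gb/cell (360 Gb/day),
    # Sequel II 30 Gb HiFi per SMRT cell.
    print("Revio cells for 30x:", cells_needed(30, 90))
    print("Sequel II cells for 30x:", cells_needed(30, 30))
    print(f"Revio 30x genomes/day: {genomes_per_day(360, 30):.1f}")
```

The result is sensitive to the assumed genome size and target depth; at 20-25x a single 90 Gb cell is enough.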
Cost trajectory (sequencing) — NHGRI cost benchmarks
- 2001: $100M per genome (Sanger; HGP-era).
- 2007: $10M (early NGS).
- 2008: $300k (Solexa).
- 2010: $50k.
- 2014: ~$1,000 (HiSeq X Ten; the "$1000 genome").
- 2020: $600 (NovaSeq 6000).
- 2024: ~$200 (NovaSeq X Plus); ~$100 (MGI T20+, Ultima); <$100 (Roche SBX target).
- Moore's-law comparison: measured against a Moore's-law-style halving every ~2 years, sequencing cost has fallen faster since 2008.
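To make the Moore's-law comparison concrete, the halving time between consecutive benchmarks can be computed directly. A small sketch, assuming exponential decline between adjacent points and taking the per-genome figures from the list above (1000 for 2014, 100 for 2024):

```python
import math

# Approximate cost per genome (USD) by year, from the list above.
cost_by_year = {
    2001: 100e6,
    2007: 10e6,
    2008: 300e3,
    2010: 50e3,
    2014: 1e3,
    2020: 600,
    2024: 100,
}


def halving_time_years(y0: int, c0: float, y1: int, c1: float) -> float:
    """Years for cost to halve, assuming exponential decline from c0 to c1."""
    return (y1 - y0) * math.log(2) / math.log(c0 / c1)


if __name__ == "__main__":
    years = sorted(cost_by_year)
    for y0, y1 in zip(years, years[1:]):
        t = halving_time_years(y0, cost_by_year[y0], y1, cost_by_year[y1])
        print(f"{y0}-{y1}: cost halves every {t:.2f} years")
```

With these inputs the 2001-2007 interval halves about every 1.8 years, close to the conventional two-year Moore's-law pace, while the NGS-era intervals right after 2007 are far steeper.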
Reduced-representation and targeted
- Exome capture: ~30-35 Mb (1-2% of genome) coding sequence + UTR. Hybridization probes: Agilent SureSelect Human All Exon V8 (35 Mb, 30 ng input); IDT xGen Exome (39 Mb); Twist Comprehensive Exome (35 Mb, often best uniformity). Indexed pooling 96-384 plex.
- Targeted gene panels: FoundationOne CDx (Foundation Medicine — Roche; 324 genes; FDA approved 2017 for solid tumor); Tempus xT (648 genes); Caris Molecular Intelligence; Guardant Health Guardant360 (74-gene cfDNA liquid biopsy); Natera Signatera (MRD); MSK-IMPACT (505 genes; CLIA Memorial Sloan Kettering).
- Cancer hereditary: Myriad myRisk; Color Genomics; Ambry; Invitae.
- Methylation:
  - Whole-genome bisulfite (WGBS): Lister 2009.
  - Reduced-representation bisulfite (RRBS): Meissner 2008.
  - EPIC array v2 (Illumina Infinium MethylationEPIC v2.0): 935k CpGs; 2023 release; replaces 450k + EPIC v1.
  - oxBS-seq (5hmC vs 5mC discrimination); TAB-seq (Tet-assisted; selective for 5hmC).
  - Long-read native methylation calling: ONT 5mC + 6mA + 5hmC via current; PacBio kinetic detection.
- RNA-seq:
  - Bulk mRNA polyA+ (TruSeq mRNA, NEBNext Ultra II Directional).
  - Total RNA / rRNA-depleted (TruSeq Stranded Total RNA with Ribo-Zero Gold; RiboCop).
  - 3' tag-seq (QuantSeq Lexogen; PrimeSeq; Drop-seq derivatives).
  - Ribo-seq: translating ribosomes (Ingolia-Weissman 2009).
  - SLAM-seq: metabolic labeling (Muhar-Zuber 2018; s4U thiol alkylation with iodoacetamide).
  - Direct RNA (ONT RNA-004).
- Chromatin / epigenome:
  - ChIP-seq: classical; replicates (ENCODE standard); spike-in (Drosophila chromatin).
  - CUT&RUN (Henikoff 2017; pA-MNase); CUT&Tag (Kaya-Okur 2019; pA-Tn5); both low input.
  - ATAC-seq (Buenrostro-Greenleaf 2013): Tn5 transposase tags accessible chromatin; 50,000-cell input.
  - DNase-seq, MNase-seq (legacy).
  - Hi-C (Lieberman-Aiden 2009): 3D genome contact map; Micro-C (Hsieh-Rando 2020): MNase digest; capture-Hi-C (promoter capture).
  - HiChIP (Mumbach-Chang 2016): Hi-C + ChIP combined.
  - 4C, 5C (Capture-C 2018).
Single-cell sequencing
| Platform | Vendor | Chemistry | Output | Cells/sample | Year |
|---|---|---|---|---|---|
| 10x Genomics Chromium iX / X | 10x Genomics (founded 2012; IPO 2019 NASDAQ TXG; ~$620M revenue 2023) | GEMs (gel beads in emulsion) | Single Cell 3’ v3.1 / 5’ v3 / Multiome / VDJ / Immune Profiling | 500-10k routine, up to 65k/lane | Chromium 2017+ |
| 10x Visium | 10x | spatial barcoded slide | 55 µm spots; ~5k spots/capture area | tissue section 6.5x6.5 mm | 2019 |
| 10x Visium HD | 10x | 2 µm bins (subcellular) | continuous capture | tissue section | 2024 |
| 10x Xenium | 10x | in situ hybridization | 300-5000 gene panels; subcellular | tissue | 2023 |
| BD Rhapsody | BD Biosciences (Cellular Research acq) | microwell | mRNA + AbSeq protein + sample tag | 100-40k cells/lane | 2018 |
| Mission Bio Tapestri | Mission Bio | microfluidic; DNA-focused | single-cell DNA panel + protein | 1000-10k | 2018 |
| Parse Biosciences | Parse Bio (founded 2018 ex-Seelig lab) | split-pool combinatorial barcoding (SPLiT-seq descendant) | Evercode 1M+ kit | up to 1M cells/run; multiplex 96-sample | 2020 |
| Standard BioTools (Fluidigm rebrand) | C1 (legacy) → Polaris → CyTOF (mass cytometry) | mass + scRNA | bridge | ||
| Bio-Rad ddSEQ | Bio-Rad + Illumina (partnership 2017 ended 2020) | droplet | mRNA | discontinued mostly | |
| Honeycomb HIVE | Honeycomb Biotechnologies | picowell | mRNA; sample-stable | shippable | 2022 |
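Split-pool combinatorial barcoding scales because the number of possible barcode combinations grows as wells^rounds. A sketch of the barcode space and the expected fraction of cells that share a combination with another cell; the 96-well, 3-round layout is a hypothetical illustration, not a kit specification, and barcode assignment is assumed uniform:

```python
def barcode_space(wells_per_round: int, rounds: int) -> int:
    """Number of distinct barcode combinations after all split-pool rounds."""
    return wells_per_round ** rounds


def expected_collision_fraction(n_cells: int, n_barcodes: int) -> float:
    """Expected fraction of cells sharing their barcode combination with at
    least one other cell, assuming uniform random assignment."""
    return 1.0 - (1.0 - 1.0 / n_barcodes) ** (n_cells - 1)


if __name__ == "__main__":
    space = barcode_space(96, 3)  # hypothetical layout
    for n_cells in (10_000, 100_000, 1_000_000):
        frac = expected_collision_fraction(n_cells, space)
        print(f"{n_cells:>9,} cells in {space:,} barcodes: "
              f"{frac:.2%} expected barcode collisions")
```

The collision rate climbs quickly as cell number approaches the barcode space, which is why very large runs need an additional barcoding round or sublibrary index.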
Plate-based / low-throughput single-cell
- Smart-seq2 (Picelli-Sandberg 2014) — full-length cDNA; 96-well plate; high gene detection (~10k genes/cell); >$1/cell sequencing-only.
- Smart-seq3 (Hagemann-Jensen 2020) — UMI + 5’ tag + full-length; better isoform.
- SS-Seq, SCRB-seq, mCEL-Seq2 — UMI-based 3’-tag; cheaper.
- MARS-seq (Jaitin-Amit 2014) — robotic FACS-sorted into 384-well; immune scRNA.
Spatial transcriptomics
| Method | Vendor | Type | Resolution | Throughput | Reference |
|---|---|---|---|---|---|
| 10x Visium | 10x | barcoded capture (NGS) | 55 µm (Visium); 2 µm bin (Visium HD) | ~5k spots/section | Ståhl-Frisén 2016 ST → 10x acq |
| MERFISH | Vizgen MERSCOPE | multiplexed FISH | ~100 nm subcellular | 500-1000 gene panels | Chen-Zhuang 2015 |
| seqFISH+ | (academic) | multiplexed FISH | subcellular | 10k genes | Cai-Lab 2019 |
| CosMx SMI | Bruker (acq. NanoString 2024 $392M) | in situ chemistry | subcellular | 1k+ gene + 64-protein | Geiss 2022 |
| Slide-seq / Slide-seq2 | (academic; commercialized by Curio Bioscience as Curio Seeker) | bead-on-puck | 10 µm | tissue section | Rodriques-Macosko-Chen 2019; Stickels 2021 |
| Slide-tags | (academic; Macosko-Chen) | barcoded tags on dissociated cells | tissue context preserved | | 2024 |
| Seq-Scope | (academic Yi-Choi 2021) | Illumina-flowcell barcoded capture | submicron | small areas | |
| Stereo-seq | BGI-StereoSeq | DNB-CID array | 500 nm | full-section | 2022 |
| Xenium | 10x | in situ | subcellular | 300-5000 genes | 2023 |
| CODEX (Akoya PhenoCycler) | Akoya Biosciences (PhenoCycler-Fusion) | multiplex IF cycling | cellular protein | 40+ markers | Goltsev-Nolan 2018 |
| MIBI-TOF | Ionpath (Stanford Angelo origin) | mass-tag IF + TOF-MS | subcellular | 40+ markers | Angelo-Nolan 2014 |
| IMC (Imaging Mass Cytometry) | Standard BioTools Hyperion | mass-tag IF + laser ablation | 1 µm | 40+ markers | 2014 |
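Resolution drives data volume for array-based spatial methods. A quick sketch counting square bins in a capture area; it assumes the 6.5 x 6.5 mm area listed for Visium also applies to the 2 µm bins, and treats bins as a gapless square grid (spot-based arrays leave gaps, so their real spot counts are lower):

```python
def bins_per_capture_area(side_mm: float, bin_um: float) -> int:
    """Number of whole square bins of size bin_um tiling a square area."""
    bins_per_side = int(side_mm * 1000 // bin_um)
    return bins_per_side ** 2


if __name__ == "__main__":
    for bin_um in (2, 8, 16):
        n = bins_per_capture_area(6.5, bin_um)
        print(f"{bin_um} µm bins over 6.5 x 6.5 mm: {n:,}")
```

At 2 µm this is over ten million bins per section versus roughly 5k spots for the original 55 µm design, which is why fine bins are often aggregated into larger ones before analysis.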
Multi-omic single-cell
- 10x Multiome ATAC+GEX — joint chromatin + RNA from same cell; 2020.
- CITE-seq (Stoeckius-Smibert 2017) — RNA + cell-surface protein via oligo-tagged antibodies (TotalSeq from BioLegend, expanded to 200+ markers).
- TEA-seq (Mimitou-Smibert 2021) — ATAC + RNA + protein.
- DOGMA-seq — ATAC + RNA + protein joint.
- SHARE-seq (Ma-Buenrostro 2020) — chromatin accessibility + RNA.
- ASAP-seq — ATAC + protein.
Pangenome and reference assemblies
- GRCh38 — Genome Reference Consortium human 2013; remains the production reference 2024 for clinical interpretation.
- T2T-CHM13: Telomere-to-Telomere consortium 2022 (Nurk-Phillippy Science); first gapless human genome (CHM13 complete hydatidiform mole, effectively haploid); adds ~200 Mb of new sequence including centromeres, rDNA, segmental duplications.
- HPRC v1: Human Pangenome Reference Consortium 2023 (Liao-Asri-Ebler Nature); 47 diploid genomes from globally diverse individuals = 94 haplotypes; Minigraph-Cactus pangenome graph; aims for 350 genomes by 2025.
- GRCm39 — mouse reference (C57BL/6J) current.
Assembly tools:
- Sanger / short read: legacy ABySS, SOAPdenovo, ALLPATHS-LG, Velvet, SPAdes (prokaryote), Trinity (transcriptome).
- Long-read: hifiasm (Cheng-Li 2021; HiFi standard), Canu (Koren-Phillippy 2017), wtdbg2 (Ruan-Li 2020), Flye / metaFlye (Kolmogorov 2019), Verkko (T2T pipeline 2023), NextDenovo (Nextomics).
- Polishing: Pilon (Illumina-polish), Racon, Medaka (ONT), DeepVariant + DeepConsensus.
- Scaffolding: 3D-DNA + Juicer (Hi-C), SALSA2.
Aligners (read → reference):
- Short read: BWA-MEM (Li 2009/2013), Bowtie2 (Langmead-Salzberg 2012), HISAT2 (Kim-Salzberg 2015; splice-aware), STAR (Dobin-Gingeras 2013; splice-aware; default scRNA), Minimap2 (Li 2018; long + short).
- Long read: Minimap2 dominant; NGMLR (Sedlazeck), LRA, Winnowmap (Jain 2022).
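The short-read vs long-read split above maps onto a simple dispatch when scripting alignments. A minimal sketch that only builds the command line (it does not run anything); the function name and platform keys are hypothetical, while `bwa mem -t` and the minimap2 `-a`, `-x map-ont` / `-x map-hifi`, and `-t` options are standard usage:

```python
from typing import List, Sequence


def aligner_command(platform: str, ref: str, reads: Sequence[str],
                    threads: int = 8) -> List[str]:
    """Return an argv list for aligning reads from the given platform.

    platform: 'illumina' (short reads), 'ont', or 'hifi'.
    Both tools write SAM to stdout; bwa also needs a prior `bwa index`.
    """
    if platform == "illumina":
        return ["bwa", "mem", "-t", str(threads), ref, *reads]
    presets = {"ont": "map-ont", "hifi": "map-hifi"}
    if platform in presets:
        return ["minimap2", "-a", "-x", presets[platform],
                "-t", str(threads), ref, *reads]
    raise ValueError(f"unknown platform: {platform!r}")


if __name__ == "__main__":
    print(" ".join(aligner_command("illumina", "ref.fa",
                                   ["r1.fq.gz", "r2.fq.gz"])))
    print(" ".join(aligner_command("ont", "ref.fa", ["reads.fq.gz"])))
```

Returning an argv list rather than a shell string keeps the sketch safe to pass to `subprocess.run` without quoting issues.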
Variant callers:
- Germline: GATK4 HaplotypeCaller + GVCF (Broad); DeepVariant (Poplin-Carroll 2018 Google; CNN); Strelka2 (Illumina, Kim 2018); FreeBayes (Garrison 2012); BCFtools.
- Somatic / cancer: Mutect2 (Broad), Strelka2-Somatic, VarScan2, MuSE, SomaticSniper, DeepSomatic.
- Structural variants: Manta, Delly, Lumpy (short-read); Sniffles2, cuteSV, pbsv (long-read).
- CNV: CNVkit, GATK gCNV, Sequenza, FACETS.
Genome editing technology
| Tech | Year | Type | Effect | Therapy / company |
|---|---|---|---|---|
| Zinc-finger nucleases (ZFN) | 1996-2002 | engineered FokI + ZF protein | DSB → NHEJ or HDR | Sangamo CCR5 SB-728 HIV; pre-CRISPR |
| TALENs | 2009-2011 | FokI + TAL effector | DSB | Cellectis UCART19; Allogene |
| CRISPR-Cas9 (S. pyogenes) | 2012 Doudna-Charpentier Science; Cong-Zhang 2013 Science (mammalian) | RNA-guided DSB | KO via NHEJ, KI via HDR; Nobel 2020 Chemistry | Casgevy (exa-cel; Vertex + CRISPR Therapeutics; 2023, first FDA-approved CRISPR therapy; sickle cell + β-thalassemia; ex vivo HSC edit of BCL11A) |
| Cas9 variants — SaCas9 (S. aureus, smaller), ScCas9, dCas9 (dead — CRISPRi/a) | 2015+ | various PAM | smaller for AAV (SaCas9 4 kb) | Editas (LCA10 EDIT-101 CEP290 — first in vivo CRISPR human 2020) |
| Cas12a / Cpf1 (LbCas12a, AsCas12a, MAD7) | 2015 Zhang | RNA-guided; TTTV PAM; staggered cut | KO; multiplex | Inscripta MAD7; ToolGen |
| Cas13a/b/d | 2017 Zhang | RNA target; collateral cleavage | RNA edit / knockdown; diagnostics (SHERLOCK 2017) | Tessera, Locanabio |
| Base editing | Komor-Liu 2016 CBE (C→T); Gaudelli-Liu 2017 ABE (A→G) | dCas9 + APOBEC1 or TadA deaminase | no DSB; precise | Beam Therapeutics BEAM-101 (sickle cell, HBG1/2 promoter edit); Verve VERVE-101 → 102 PCSK9 ABE (CV indication) |
| Prime editing | Anzalone-Liu 2019 | nCas9 + reverse transcriptase + pegRNA | targeted insertions / deletions / point | Prime Medicine (PM359 CGD 2024 clinical) |
| Twin prime editing (twinPE / PASTE) | Anzalone-Liu 2021 | dual pegRNA → integrase | larger payload | |
| Search-and-replace / SeekRNA | various | |||
| RLR / Retron / DAD | various 2020s | bacterial retron + RNA template | | |
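PAM requirements in the table determine where each nuclease can cut. A simplified sketch that scans only the forward strand for SpCas9 sites (20-nt protospacer followed by an NGG PAM) and Cas12a sites (TTTV PAM followed by the protospacer); the 23-nt Cas12a protospacer length is an assumption, and real guide design also scans the reverse strand and scores off-targets:

```python
import re
from typing import List, Tuple


def find_spcas9_sites(seq: str, protospacer_len: int = 20) -> List[Tuple[int, str, str]]:
    """Forward-strand SpCas9 targets as (start, protospacer, PAM).

    The PAM (NGG) lies immediately 3' of the protospacer.
    """
    seq = seq.upper()
    # Lookahead so that overlapping candidate sites are all reported.
    pattern = re.compile(rf"(?=([ACGT]{{{protospacer_len}}})([ACGT]GG))")
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(seq)]


def find_cas12a_sites(seq: str, protospacer_len: int = 23) -> List[Tuple[int, str, str]]:
    """Forward-strand Cas12a targets as (start, PAM, protospacer).

    The PAM (TTTV, V = A/C/G) lies immediately 5' of the protospacer.
    """
    seq = seq.upper()
    pattern = re.compile(rf"(?=(TTT[ACG])([ACGT]{{{protospacer_len}}}))")
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(seq)]


if __name__ == "__main__":
    demo = "TTTACCGATTGACCTGAAGCTAGCTAGGATCCAGGTTTGCATCGATCGTAGCTAGCTAGCAT"
    for start, proto, pam in find_spcas9_sites(demo):
        print(f"SpCas9  pos {start}: {proto} | PAM {pam}")
    for start, pam, proto in find_cas12a_sites(demo):
        print(f"Cas12a  pos {start}: PAM {pam} | {proto}")
```

The opposite PAM orientation (3' for Cas9, 5' for Cas12a) is the main design difference the two functions illustrate.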
Bioinformatics software ecosystem
- Read QC: FastQC, MultiQC (Ewels 2016), Trimmomatic, fastp (Chen 2018), cutadapt, BBDuk.
- BAM / VCF tools: samtools (Li-Durbin 2009), bcftools, picard (Broad), GATK (Broad), bedtools (Quinlan 2010).
- Visualizers: IGV (Robinson 2011 Broad), JBrowse2, UCSC Genome Browser, Ensembl Browser, UCSC Cell Browser, cellxgene (Chan-Zuckerberg).
- Transcript quant: Salmon (Patro 2017), kallisto (Bray-Pachter 2016), RSEM (Li-Dewey 2011), HTSeq (Anders 2015), featureCounts (Liao-Smyth 2014).
- Differential expression: DESeq2 (Love-Anders-Huber 2014), edgeR (Robinson-McCarthy-Smyth 2010), limma + voom (Smyth 2004 + Law 2014), sleuth (kallisto downstream).
- Single-cell: Seurat v5 (Satija lab 2024), Scanpy (Wolf-Theis 2018; Python; AnnData), Cell Ranger (10x official), STARsolo (Dobin), scVI / scvi-tools (Lopez-Yosef 2018 deep generative), MOFA+ (Argelaguet 2018 multi-omic), CellOracle (Kamimoto-Morris 2023 GRN), CellChat (Jin-Nie 2021), inferCNV (Broad), monocle3 (Trapnell pseudotime).
- Dimensionality reduction / viz: PCA, t-SNE (van der Maaten-Hinton 2008), UMAP (McInnes-Healy-Melville 2018; Python and R), PHATE.
- Pathway / enrichment: GSEA (Subramanian-Mootha-Mesirov 2005 PNAS; ssGSEA; preranked); MSigDB (H, C1-C8); g:Profiler (Reimand 2007+); Enrichr (Kuleshov 2016); IPA (QIAGEN Ingenuity); DAVID (Huang-Sherman 2009); WebGestalt.
- Pathway databases: KEGG (Kanehisa Kyoto), Reactome (EMBL-EBI + OICR), WikiPathways, BioCyc / MetaCyc, BioGRID, STRING (Szklarczyk 2023), CORUM (complexes), SignaLink.
- Ontologies + identifiers: Gene Ontology (GO; biological process / molecular function / cellular component), MeSH, HPO (Human Phenotype Ontology — Köhler 2021), OMIM (McKusick 1966 → online), DisGeNET, MONDO, Cell Ontology, Uberon (anatomy), ChEBI (chemicals).
- Gene symbol authorities: HGNC (human), MGI (mouse), RGD (rat), ZFIN (zebrafish), FlyBase (Drosophila), WormBase (C. elegans), SGD (yeast), TAIR (Arabidopsis).
- Variant + clinical: ClinVar (NCBI), gnomAD v4 (Broad 2023; 807k individuals = 731k exomes + 76k genomes), 1000 Genomes / IGSR, TCGA (cancer genomics 2008-2018), ICGC, COSMIC (Sanger), PharmGKB, CIViC.
- Protein: UniProt (SwissProt curated + TrEMBL automatic), AlphaFold DB (Jumper-Hassabis 2021; AlphaFold3 2024; Nobel Chemistry 2024 Hassabis-Jumper-Baker), Pfam / InterPro, PDB (~220k structures 2024), PROSITE, OpenFold (Bonneau 2022).
- Workflow managers: Snakemake (Köster 2012), Nextflow (Di Tommaso 2017; nf-core community), WDL + Cromwell (Broad), CWL.
Adjacent
- Tier 1 home for these: genetics-and-genomics, cell-molecular-biology, developmental-biology
- Tier 3 siblings: _index, deferred protein-families-catalog, cell-line-catalog, drug-target-families-catalog, antibody-catalog, pathway-database-mapping
- Adjacent reference vaults:
- reagent-and-reaction-catalog — peptide coupling, click chemistry, bioconjugation reactions used to attach fluorophores, oligonucleotides, and drug payloads to antibodies and small molecules
- _index — bioprocess / fermentation / mAb production scale
- _index — bioinformatics software ecosystem in more detail
- Domain notes referencing this: any CRISPR screen design walkthrough, scRNA-seq pipeline note, organoid protocol, sequencing-cost forecast
- History references: HGP, ENCODE, GTEx, TCGA, HCA (Human Cell Atlas), HPA (Human Protein Atlas), HuBMAP, Tabula Muris / Tabula Sapiens