Model Organisms & Sequencing Technology Catalog

The two reference tables every wet-lab and computational biologist returns to weekly: (1) which model organism, strain, and cell line is appropriate for a given question, and (2) which sequencing platform delivers the right read length / accuracy / price for a given assay. Plus the gene-editing and bioinformatics tooling layered on top.


Part 1 — Model organisms

Prokaryotes

Organism | Strain | Genome size | Use | Note
Escherichia coli | K-12 MG1655 | 4.64 Mb | Reference E. coli; gene knockouts (Keio collection) | Sequenced 1997 (Blattner, Science)
E. coli | K-12 W3110 | 4.65 Mb | Industrial fermentation chassis | Less mutator than MG1655
E. coli | DH5α (recA1 endA1) | 4.6 Mb | Cloning; plasmid maintenance | supE44, φ80lacZΔM15 (blue-white screening)
E. coli | BL21(DE3) | 4.6 Mb | T7 RNA polymerase under lacUV5 control; protein over-expression | Lon/OmpT protease deficient
E. coli | Rosetta, Origami, Tuner | various | Rare-codon (Rosetta) and disulfide-bond (Origami) variants of BL21 | Novagen / Merck lines
E. coli | NEB Stable, Stbl3 | | Low-recombination hosts for unstable repeats (lentiviral LTRs) | NEB / Invitrogen
Bacillus subtilis | 168 | 4.2 Mb | Gram+ model; sporulation; protein secretion | First Gram+ sequenced, 1997
Mycobacterium tuberculosis | H37Rv | 4.4 Mb | TB pathogen | Sequenced 1998 (Cole, Nature)
M. smegmatis | mc²155 | 7.0 Mb | Fast-growing surrogate for Mtb | BSL2 vs BSL3 for Mtb
Streptomyces coelicolor | A3(2) | 8.7 Mb | Secondary-metabolism (antibiotic) reference | Linear chromosome; >20 BGCs
Pseudomonas aeruginosa | PAO1 | 6.3 Mb | Opportunistic pathogen; biofilm | Sequenced 2000
Caulobacter crescentus | NA1000 / CB15N | 4.0 Mb | Cell-cycle differentiation (stalked vs swarmer cells) | Asymmetric division model
Vibrio cholerae | N16961 | 4.0 Mb (2 chromosomes) | Quorum sensing; two-chromosome genome | Sequenced 2000
Synechocystis sp. | PCC 6803 | 3.6 Mb | Photosynthetic cyanobacterium | First cyanobacterium sequenced, 1996

Fungi / yeasts

Organism | Strain | Genome | Use | Sequenced / note
Saccharomyces cerevisiae | S288C | 12.1 Mb, 16 chromosomes, ~6,000 genes | Genetic screens (synthetic lethal, SGA); eukaryote model | 1996 (Goffeau, Science); first eukaryote sequenced
S. cerevisiae | W303 | | Classic genetics background | Multiple auxotrophies
S. cerevisiae | BY4741 / BY4742 / BY4743 | | Gene deletion collection (~4,900 mutants; Giaever 2002, Nature) | YKO library: haploid + heterozygous diploid
S. cerevisiae | EUROSCARF, Tn-seq collections | | Systematic mutant resources |
Schizosaccharomyces pombe | 972 h− | 12.6 Mb, 3 chromosomes | Fission yeast; G2/M cell cycle; cdc genes | Paul Nurse, Nobel 2001 (shared with Hartwell and Hunt)
Pichia pastoris (now Komagataella phaffii) | GS115, X-33 | 9.4 Mb | Industrial recombinant protein (>1,000 mg/L); methanol-induced AOX1 promoter | Invitrogen Pichia kit standard
Candida albicans | SC5314 | 14.3 Mb (diploid) | Fungal pathogen | Sequenced 2004
Aspergillus nidulans | FGSC A4 | 30 Mb | Filamentous fungus; sexual cycle |
Neurospora crassa | OR74A | 41 Mb | Circadian clock (frq); heterokaryons | Beadle-Tatum one-gene-one-enzyme hypothesis (Nobel 1958)
Cryptococcus neoformans | H99 | 19 Mb | Pathogenic encapsulated yeast |

Invertebrates

Organism | Strain | Genome | Use | Note
Caenorhabditis elegans | N2 Bristol | 100 Mb, 6 chromosomes, ~20,000 genes | 959 somatic cells with a fully mapped lineage (Sulston); 302-neuron connectome | Brenner-Sulston-Horvitz Nobel 2002; first multicellular organism sequenced, 1998
C. elegans | Hawaiian CB4856 | | Wild isolate; RIL crosses |
Drosophila melanogaster | Oregon-R, Canton-S | 144 Mb, 4 chromosomes | Genetics (Morgan, Nobel 1933); embryonic patterning and Hox genes (Lewis, Nüsslein-Volhard, Wieschaus, Nobel 1995); GAL4-UAS; FLP-FRT | Bloomington Stock Center; Vienna VDRC
D. melanogaster | w¹¹¹⁸ | | White-eyed background for transgenics |
Aplysia californica | | 1.8 Gb | Marine slug; learning and memory (sensitization, classical conditioning) | Kandel Nobel 2000; ~20k neurons, large and individually identifiable
Dictyostelium discoideum | AX2, AX4 | 34 Mb | Social amoeba; chemotaxis (cAMP); cell aggregation | Multicellular life arising from unicellular cells
Tetrahymena thermophila | SB210 | 104 Mb (macronucleus) | Telomerase discovery (Greider-Blackburn 1985, Nobel 2009); ribozymes (Cech, Nobel 1989) | Ciliate with two nuclei
Paramecium tetraurelia | | 72 Mb | Cilia; cortical inheritance | Three whole-genome duplications
Hydra vulgaris | | 1.3 Gb | Regeneration; stem cells | Cnidarian; immortal soma
Planarian (Schmidtea mediterranea) | CIW4, S2F1L3F2 | 700 Mb (CIW4 asexual) | Whole-body regeneration; neoblasts | Sánchez Alvarado lab
Bombyx mori | p50T | 432 Mb | Silkworm; agricultural |
Anopheles gambiae | PEST | 273 Mb | Malaria vector |
Aedes aegypti | LVP_AGWG | 1.38 Gb | Dengue, Zika, and chikungunya vector |

Plants

Organism | Cultivar | Genome | Notes
Arabidopsis thaliana | Col-0 (Columbia-0) | 135 Mb, 5 chromosomes, 27,400 genes | Genome 2000 (first plant); ~6-week life cycle; T-DNA insertion lines (SALK, GABI-Kat)
A. thaliana | Ler (Landsberg erecta) | | Mapping crosses with Col-0
Rice (Oryza sativa) | ssp. japonica Nipponbare | 389 Mb, 12 chromosomes | First crop genome, 2005 (IRGSP)
Rice | ssp. indica 9311 (Beijing Genomics Institute, 2002) | | Heterosis studies
Maize (Zea mays) | B73 | 2.3 Gb, 10 chromosomes | High repeat content; reference genome 2009; pan-genome 2020s
Sorghum bicolor | BTx623 | 730 Mb | C4 grass
Setaria italica | Yugu1 | 510 Mb | Foxtail millet
Tomato (Solanum lycopersicum) | Heinz 1706 | 900 Mb | Genome 2012; ripening (rin, nor mutants)
Soybean (Glycine max) | Williams 82 | 1.1 Gb | Legume reference; paleo-tetraploid
Brassica spp. | Various (B. napus DH12075, B. rapa Chiifu) | | Polyploidy / WGD studies
Tobacco (N. tabacum, N. benthamiana) | TN90, LAB | 3.8 Gb / 3.1 Gb | Transient expression (agroinfiltration)
Marchantia polymorpha | Tak-1 (male), Tak-2 (female) | 226 Mb | Basal land plant; simple propagation via gemmae
Physcomitrium patens | Gransden 2004 | 480 Mb | Moss; gene targeting via HR
Chlamydomonas reinhardtii | CC-503, CC-1690 | 111 Mb | Green alga; flagella, photosynthesis
Selaginella moellendorffii | | 213 Mb | Lycophyte; early-diverging vascular plant

Fish / amphibians / reptiles

Organism | Strain / line | Genome | Use
Zebrafish (Danio rerio) | AB, Tübingen (TU), TL (Tüpfel long fin), WIK | 1.4 Gb, 25 chromosomes | Transparent embryo; ENU mutagenesis screens (Nüsslein-Volhard, 1990s onward); transgenic (Tg) lines; ZFIN database
Zebrafish | casper (roy−/−; nacre−/−, i.e. mitfa−/−) | | Transparent adult (White 2008)
Medaka (Oryzias latipes) | HdrR (Hd-rR) | 700 Mb | Diploid fish alternative to zebrafish
Xenopus laevis | J strain | 3.1 Gb, allotetraploid | Classical embryology; oocytes; large eggs (~1 mm)
Xenopus tropicalis | Nigerian, Ivory Coast | 1.7 Gb, diploid | Modern Xenopus genetics; faster generation time
Axolotl (Ambystoma mexicanum) | wild type, white (d/d), GFP+ | 32 Gb | Limb, heart, and spinal-cord regeneration
Salamander (Notophthalmus, Pleurodeles) | | very large | Regeneration
Anolis carolinensis | | 1.8 Gb | Reptile model; eye development

Birds

Organism | Strain | Genome | Use
Chicken (Gallus gallus) | Red Junglefowl (UCD001), White Leghorn | 1.05 Gb, ~16,000 genes | Embryology (embryo accessible through a window in the eggshell); immunology (B cells discovered via the bursa of Fabricius)
Quail (Coturnix japonica) | | 1.0 Gb | Chick alternative; faster generation time
Zebra finch (Taeniopygia guttata) | | 1.2 Gb | Vocal learning; song system

Mammals

Organism | Strain | Genome | Use | Note
Mouse (Mus musculus) | C57BL/6J | 2.7 Gb, 20 chromosomes | Reference inbred strain; ES cell-derived gene knockouts (Capecchi-Smithies-Evans Nobel 2007); CRISPR | Jackson Laboratory (JAX), Bar Harbor, ME
Mouse | C57BL/6N | | NIH substrain; subtly different from JAX 6J |
Mouse | BALB/c | | Th2-skewed; allergy and tumor immunology |
Mouse | 129 (129S1/Sv, 129S6, 129S4) | | ES-cell donor strain |
Mouse | FVB/N | | Albino; pronuclear injection (large pronuclei); transgenic standard |
Mouse | CD-1 (outbred), Swiss Webster | | Toxicology, breeding |
Mouse | NOD/SCID, NSG (NOD-scid-IL2Rγnull) | | Humanized mice, xenografts |
Mouse | C57BL/6 + Cre/lox driver collection | | Tissue-specific KO (Albumin-Cre liver, Villin-Cre intestine, etc.) |
Rat (Rattus norvegicus) | Sprague-Dawley | 2.9 Gb | Pharmacology, toxicology, behavior; larger than mouse for surgery |
Rat | Wistar, Wistar-Kyoto (WKY) | | |
Rat | Lewis (LEW), SHR (spontaneously hypertensive) | | Hypertension model (SHR) |
Rat | F344 (Fischer) | | NIH / NTP toxicology standard |
Rabbit (Oryctolagus cuniculus) | New Zealand White, Dutch | 2.7 Gb | Atherosclerosis (Watanabe rabbit, LDLR-deficient); polyclonal antibody production |
Guinea pig (Cavia porcellus) | Hartley, Strain 13 | 2.7 Gb | Vitamin C requirement; vaccines and TB |
Pig (Sus scrofa) | Yucatan, Göttingen minipig | 2.7 Gb | Cardiovascular, transplant; CRISPR PERV-knockout xenotransplant (eGenesis, Revivicor 2024; Bartley kidney recipient) |
Sheep (Ovis aries) | Dolly (1996; Roslin, Wilmut) | 2.7 Gb | First mammal cloned from an adult somatic cell; SCNT |
Dog (Canis familiaris) | Beagle, Boxer (reference genome: Tasha, 2005) | 2.4 Gb | Cardiovascular toxicology, neuroscience |
Ferret (Mustela putorius furo) | | 2.4 Gb | Influenza, respiratory pathogens |
Rhesus macaque (Macaca mulatta) | Indian, Chinese origin | 2.9 Gb | Primate immunology, neurology, HIV/SIV |
Cynomolgus macaque (M. fascicularis) | | 2.9 Gb | Toxicology and biologics; last-stage preclinical |
Marmoset (Callithrix jacchus) | | 2.9 Gb | Transgenic / CRISPR primate; neuroscience (optogenetics, CLARITY); sexual maturity at ~12-18 months |
Squirrel monkey (Saimiri) | | | Cheaper NHP alternative |

Cell lines — human + mouse + rodent

Line | Tissue / origin | Year | Note
HeLa | Cervical adenocarcinoma | 1951 | Henrietta Lacks; first immortal human cell line; HPV-18-driven; Skloot 2010 book; often-quoted estimate of >50 million metric tons cultured to date; ATCC CCL-2
HEK293 (293T, 293FT) | Human embryonic kidney (likely neuronal lineage, per Shaw 2002) | 1973 (Graham) | Transfection workhorse; lentivirus packaging
CHO (Chinese hamster ovary) | Ovary; CHO-K1, DG44 (DHFR−), CHO-S, ExpiCHO | 1957 (Puck) | Recombinant biologics: Activase (1987, the first), Enbrel, Humira, EPO, many mAbs; ~70% of recombinant therapeutic proteins are produced in CHO
Vero | African green monkey kidney | 1962 | Vaccine production (polio, Ebola, COVID-19 Sinovac CoronaVac)
MDCK | Madin-Darby canine kidney | 1958 | Influenza vaccine production; epithelial polarity model
Sf9 / Sf21 / High Five | Spodoptera frugiperda (Sf9, Sf21) / Trichoplusia ni (High Five) insect cells | 1980s | Baculovirus-insect expression (BEVS); Cervarix HPV vaccine; Flublok
A549 | Lung adenocarcinoma | 1972 | NSCLC, drug screening, viral infection (SARS-CoV-2 requires ACE2 overexpression)
MCF-7 | Breast adenocarcinoma, ER+ | 1973 | ER+ model; tamoxifen response
MDA-MB-231 | Breast, triple-negative | 1973 | Triple-negative breast cancer (TNBC) model
HepG2 | Hepatocellular carcinoma | 1979 | Hepatocyte / drug metabolism (limited CYP activity)
HuH-7 | Hepatocellular carcinoma | 1982 | HCV replication (Lohmann-Bartenschlager replicon, 1999)
Hep3B | Hepatocellular carcinoma | 1979 | HBV+
U2OS | Osteosarcoma | 1964 | DNA-damage imaging (large, flat cells)
U-87 MG | Glioblastoma | 1968 | GBM model; ATCC stock shown in 2016 to derive from a different patient than the original line
HCT116, SW480, HT-29, Caco-2 | Colorectal carcinoma | various | Caco-2: transepithelial transport
K562 | CML (Ph+) | 1975 | First human CML line; differentiation studies
Jurkat | T-ALL | 1976 | T cell receptor signaling
THP-1 | Acute monocytic leukemia | 1980 | Monocyte-to-macrophage differentiation (PMA)
U937 | Histiocytic lymphoma | 1976 | Macrophage model
RAW 264.7 | Mouse macrophage (Abelson MLV-induced) | 1978 | NF-κB, TLR signaling
NIH/3T3 | Mouse embryonic fibroblast | 1962 | Transformation focus assay (Weinberg, ras, 1981)
3T3-L1 | Mouse preadipocyte | 1974 | Adipogenesis (Green-Kehinde)
COS-7, COS-1 | African green monkey kidney + SV40 large T antigen | 1981 | Transient transfection (SV40 ori replication)
BHK-21 | Baby hamster kidney | 1962 | Virology (FMDV vaccine); recombinant protein
PC-12 | Rat pheochromocytoma | 1976 | NGF-induced neurite outgrowth
Neuro-2a | Mouse neuroblastoma | 1969 | Inexpensive neuronal model (tumor-derived; interpret with caution)
L929 | Mouse fibroblast | 1948 | Clone of Earle's L strain, the first continuous mammalian cell line; L929 was the first line cloned from a single cell (Sanford)

iPSC lines (induced pluripotent stem cells — Yamanaka Nobel 2012):

  • WiCell H1, H9 (hESC — Thomson 1998); used as comparator.
  • ATCC, Coriell, EBiSC (European Bank for iPSC) repositories.
  • iPSC reprogramming: OSKM (Oct4, Sox2, Klf4, c-Myc) — retro/lenti, episomal, mRNA, Sendai (CytoTune ThermoFisher).
  • Organoids: Sato-Clevers 2009 intestinal; brain organoids (Lancaster 2013); kidney (Takasato 2015); liver, lung, pancreas, retina, cardiac.

Part 2 — Sequencing technologies

Sanger sequencing (1977)

  • Chain termination with ddNTPs (Frederick Sanger); Walter Gilbert developed the parallel Maxam-Gilbert chemical-degradation method; both shared the 1980 Nobel Prize in Chemistry.
  • Read length ~800 bp routine; ~1,000 bp max.
  • Accuracy ~99.99% per base (Q40) in the high-quality core of the read.
  • Capillary electrophoresis: ABI 377 1995 (slab gel) → ABI Prism 3100 → ABI 3730 / 3730xl (Applied Biosystems, 96 capillaries, since 2002).
  • Throughput: 24-96 reads / 1-3 h.
  • Cost 2024: ~$3-5 per reaction, outsourced (Genewiz/Azenta, Eurofins).
  • Used for: Sanger validation of variants (gold standard), plasmid verification, short PCR amplicons, MLST. The Human Genome Project (1990-2003 public effort; Celera private effort, 2000) was finished with Sanger sequencing at a cost of $2.7B over 13 years.
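The Q40 figure above is a Phred score, Q = -10·log10(p), where p is the per-base error probability. A minimal sketch of the conversion, assuming FASTQ quality strings use the Phred+33 ASCII offset (the Sanger / Illumina 1.8+ convention); the helper names are illustrative, not from any particular library.

```python
import math

def phred_to_error(q: float) -> float:
    """Per-base error probability for a Phred score: p = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)

def error_to_phred(p: float) -> float:
    """Phred score for a per-base error probability: Q = -10 * log10(p)."""
    return -10 * math.log10(p)

def decode_phred33(qual: str) -> list:
    """Decode a FASTQ quality string, assuming Phred+33 (ASCII offset 33)."""
    return [ord(ch) - 33 for ch in qual]

if __name__ == "__main__":
    for q in (20, 30, 40):
        p = phred_to_error(q)
        print(f"Q{q}: error rate {p:g}, accuracy {100 * (1 - p):.2f}%")
    print(decode_phred33("I?5+"))
```

Q30 works out to 99.9% and Q40 to 99.99% per-base accuracy; in Phred+33 the quality character "I" decodes to Q40.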

Next-generation sequencing (short-read NGS)

Platform | Vendor | Year | Read length | Throughput | Cost/Gb (2024) | Note
454 GS FLX | Roche (acq. 454 Life Sciences 2007) | 2005 (first commercial NGS) | 400-1,000 bp | 0.7 Gb/run | n/a (discontinued 2016) | Pyrosequencing; PPi → light via luciferase; homopolymer errors
Solexa / Illumina SBS | Illumina (acq. Solexa 2007, $600M) | 2006 | 2x50 → 2x300 bp | varies by instrument | $5-15 | Sequencing by synthesis; reversible terminators; cluster amplification on the flow cell
Illumina MiSeq | Illumina | 2011 | 2x300 bp | 15 Gb/run | $80 | 16S amplicons, small genomes
Illumina NextSeq 550 / 1000 / 2000 | Illumina | 2014 / 2020 | 2x150 bp | 30-360 Gb/run | $20-40 | Mid-throughput; exome, RNA-seq
Illumina HiSeq 2500 / 4000 / X Ten | Illumina | 2014 | 2x150 bp | 1.5 Tb/run | $7 | First $1,000 genome (X Ten cluster, 2014); discontinued 2021
Illumina NovaSeq 6000 | Illumina | 2017 | 2x150 bp | 3 Tb/run (S4) | $5 | High-throughput workhorse 2017-2024
Illumina NovaSeq X Plus | Illumina | 2023 (commercial 2024) | 2x150 bp | 20 Tb/run | $2 | ~$200 per 30x WGS; XLEAP-SBS chemistry
Ion Torrent PGM / Proton / GeneStudio S5 / Genexus | Thermo Fisher (acq. Life Technologies 2014) | 2010 | 200-600 bp | 5-15 Gb (S5) | $30-50 | H+ ion detection on a semiconductor chip (no optics); homopolymer errors remain
SOLiD 5500 | Applied Biosystems (Thermo) | 2007 | 50-75 bp | | n/a (discontinued 2016) | Ligation-based; high accuracy but short reads
BGI / MGISEQ-2000 / DNBSEQ-G400 / T7 / T20 / T20+ | BGI / MGI Tech (acq. Complete Genomics 2013) | 2015+ | 100-200 bp | T20+: 72 Tb/run | $1-3 | DNA nanoballs (DNB) from rolling-circle amplification; ~$100 30x genome on T20+; dominant in Asia
Element AVITI | Element Biosciences (founded 2018 by ex-Illumina scientists) | 2023 | 2x150 bp | 1 Tb/run | $5-10 | Avidity sequencing; lower CapEx (985k)
Singular Genomics G4 | Singular Genomics | 2022 | 2x150 bp | 130 Gb/run | $20 | Mid-throughput
Ultima UG100 | Ultima Genomics | 2022 | 100-300 bp | 1 Tb/run | $1-2 | Wafer-based; $100 30x genome target
Roche SBX (Sequencing by Expansion) | Roche (technology from Stratos Genomics, acquired 2020) | launching 2024-2025 | aimed at long, single-molecule reads | | target <$100/genome | “Xpandomer” expansion
AstraZeneca / Polaris (former Avantium) | various | | | | | Long-read / short-read hybrid, emerging
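The cost/Gb column only becomes a per-genome figure once genome size and coverage are fixed. A rough sketch, assuming a ~3.1 Gb haploid human genome and 30x mean coverage, with an optional overhead factor for duplicates and unmapped reads; library prep, labor, and instrument amortization are ignored, and the function names are hypothetical.

```python
HUMAN_GENOME_GB = 3.1  # approximate haploid human genome size (assumption)

def gb_for_coverage(coverage, genome_gb=HUMAN_GENOME_GB, overhead=1.0):
    """Raw gigabases to sequence for a target mean coverage.

    overhead > 1.0 pads for duplicates, adapter trimming and unmapped reads.
    """
    return genome_gb * coverage * overhead

def genome_cost(cost_per_gb, coverage=30.0, overhead=1.0):
    """Sequencing-only cost (USD) of one genome at the given coverage."""
    return gb_for_coverage(coverage, overhead=overhead) * cost_per_gb

if __name__ == "__main__":
    # Cost/Gb values from the table above (low end where a range is given)
    for platform, usd_per_gb in [("NovaSeq X Plus", 2), ("NovaSeq 6000", 5),
                                 ("NextSeq 2000", 20), ("MiSeq", 80)]:
        print(f"{platform:15s} ~${genome_cost(usd_per_gb):>8,.0f} per 30x genome")
```

Under these assumptions the NovaSeq X Plus row works out to ~$186 of sequencing per 30x genome, in line with the ~$200 figure in the table; an overhead of 1.1-1.2 is a more realistic planning value.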

Long-read sequencing

Platform | Vendor | Year | Read length (typical) | Read length (max) | Accuracy | Throughput | Cost/Gb
PacBio RS / RS II | Pacific Biosciences | 2011 | 5-15 kb | 60 kb | ~87% raw; >99% CCS | low | high
PacBio Sequel | PacBio | 2015 | 10-30 kb | 100 kb | similar | 5-10 Gb/run | mid
PacBio Sequel II / IIe | PacBio | 2019 / 2020 | 15-25 kb HiFi | 100+ kb | 99.9% HiFi (Q30+ CCS) | 30 Gb HiFi per SMRT Cell | $30-50
PacBio Revio | PacBio | 2023 | 15-25 kb HiFi | 100+ kb | Q30+ HiFi | 90 Gb HiFi per SMRT Cell × 4 cells/day = 360 Gb/day | $5-10
PacBio Onso | PacBio | 2023 | 150-200 bp (short read) | | Q40 | 250 Gb/run | $10-15
ONT MinION (Mk1B, Mk1C, Mk1D) | Oxford Nanopore | 2014 (first commercial; USB) | 5-30 kb (100+ kb achievable) | >4 Mb recorded (Loose lab) | 95-99% (R10.4.1) | 20-50 Gb per flow cell | 900)
ONT GridION | ONT | 2017 | as above | as above | as above | 250 Gb total (5 flow cells) | mid
ONT PromethION 24 / 48 / P2 / P2 Solo | ONT | 2018 / 2024 | as above | as above | as above | P48 = 14 Tb/run | $5-15
ONT Flongle (flow-cell adapter) | ONT | 2019 | shorter runs | | | 2-3 Gb | low CapEx
Singular / Element / Ultima long-read variants | various | 2024-2025 | emerging | | | |
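The throughput column translates into coverage per run and genomes per day. A back-of-envelope sketch, assuming the table's peak yields (real runs often deliver less), a ~3.1 Gb haploid human genome, and a per-PromethION-flow-cell yield obtained simply as 14 Tb / 48; function names are hypothetical.

```python
HUMAN_GENOME_GB = 3.1  # approximate haploid human genome size (assumption)

def mean_coverage(yield_gb, genome_gb=HUMAN_GENOME_GB):
    """Mean depth delivered by a given sequencing yield."""
    return yield_gb / genome_gb

def genomes_at_coverage(yield_gb, coverage=30.0, genome_gb=HUMAN_GENOME_GB):
    """How many genomes a given yield supports at the target mean coverage."""
    return yield_gb / (coverage * genome_gb)

if __name__ == "__main__":
    revio_per_day = 90 * 4             # 90 Gb HiFi per SMRT Cell x 4 cells/day
    promethion_per_cell = 14_000 / 48  # P48: 14 Tb spread over 48 flow cells
    print(f"Revio: {genomes_at_coverage(revio_per_day):.1f} genomes/day at 30x")
    print(f"PromethION flow cell: ~{promethion_per_cell:.0f} Gb, "
          f"{mean_coverage(promethion_per_cell):.0f}x of one genome")
    print(f"MinION flow cell (50 Gb): {mean_coverage(50):.0f}x")
```

At these peak figures a Revio delivers roughly four 30x human genomes per day, and one PromethION flow cell at full yield gives ~94x of a single genome.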

ONT chemistry / flow cells: R9.4.1 (legacy, 2018); R10.4.1 (2023; dominant; duplex sequencing pushes accuracy to Q30+). Dorado basecaller (C++, GPU-accelerated; supersedes Guppy and Albacore). Pores: engineered variants of the E. coli CsgG protein. Sample prep: ligation (LSK114), rapid (RAD), PCR-cDNA, and direct RNA (dRNA; RNA-002 / RNA-004 kit, 2024).

Cost trajectory (sequencing) — NHGRI cost benchmarks

  • 2001: $100M per genome (Sanger; HGP-era).
  • 2007: $10M (early NGS).
  • 2008: $300k (Solexa).
  • 2010: $50k.
  • 2014: the “$1000 genome” (HiSeq X Ten).
  • 2020: $600 (NovaSeq 6000).
  • 2024: ~$100 (MGI T20+, Ultima); <$100 (Roche SBX target).
  • Moore’s law (a doubling roughly every two years) is the usual yardstick; sequencing cost has fallen faster than it since 2008.
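The trend can be summarized as an average cost-halving time. A minimal sketch, assuming log-linear decline between the benchmark points listed (the 2014 entry is left out because it has no comparable dollar figure) and Moore's law taken as one doubling every ~2 years.

```python
import math

# (year, approximate USD per genome) from the benchmark list above
BENCHMARKS = [(2001, 100e6), (2007, 10e6), (2008, 300e3), (2010, 50e3), (2020, 600)]

def halving_time_years(y0, cost0, y1, cost1):
    """Average years per 2x cost reduction, assuming log-linear decline."""
    return (y1 - y0) / math.log2(cost0 / cost1)

def moore_fold(years, doubling_years=2.0):
    """Fold improvement Moore's law would predict over a span of years."""
    return 2 ** (years / doubling_years)

if __name__ == "__main__":
    for (y0, c0), (y1, c1) in zip(BENCHMARKS, BENCHMARKS[1:]):
        print(f"{y0}-{y1}: cost halves every {halving_time_years(y0, c0, y1, c1):.2f} yr")
    (y_first, c_first), (y_last, c_last) = BENCHMARKS[0], BENCHMARKS[-1]
    print(f"{y_first}-{y_last}: {c_first / c_last:,.0f}x cheaper vs "
          f"{moore_fold(y_last - y_first):,.0f}x under Moore's law")
```

Between 2001 and 2020 these benchmarks imply a ~1.1-year halving time (about 166,667-fold overall), against ~724-fold for a two-year doubling over the same 19 years; the steepest drop is 2007-2008, when NGS arrived.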

Reduced-representation and targeted

  • Exome capture: ~30-35 Mb (1-2% of genome) coding sequence + UTR. Hybridization probes: Agilent SureSelect Human All Exon V8 (35 Mb, 30 ng input); IDT xGen Exome (39 Mb); Twist Comprehensive Exome (35 Mb, often best uniformity). Indexed pooling 96-384 plex.
  • Targeted gene panels: FoundationOne CDx (Foundation Medicine — Roche; 324 genes; FDA approved 2017 for solid tumor); Tempus xT (648 genes); Caris Molecular Intelligence; Guardant Health Guardant360 (74-gene cfDNA liquid biopsy); Natera Signatera (MRD); MSK-IMPACT (505 genes; CLIA Memorial Sloan Kettering).
  • Cancer hereditary: Myriad myRisk; Color Genomics; Ambry; Invitae.
  • Methylation:
    • Whole-genome bisulfite (WGBS) — Lister 2009.
    • Reduced-representation bisulfite (RRBS) — Meissner 2008.
    • EPIC array v2 (Illumina Infinium MethylationEPIC v2.0) — 935k CpGs; 2023 release; replaces 450k + EPIC v1.
    • oxBS-seq (5hmC vs 5mC discrimination); TAB-seq (Tet-assisted; selective for 5hmC).
    • Long-read native methylation calling: ONT 5mC + 6mA + 5hmC via current; PacBio kinetic detection.
  • RNA-seq:
    • Bulk mRNA polyA+ (TruSeq mRNA, NEBNext Ultra II Directional).
    • Total RNA / rRNA-depleted (TruSeq Stranded Total RNA with Ribo-Zero Gold; RiboCop).
    • 3’ tag-seq (QuantSeq Lexogen; PrimeSeq; Drop-seq derivatives).
    • Ribo-seq translating ribosomes (Ingolia-Weissman 2009).
    • SLAM-seq metabolic labeling (Herzog-Ameres 2017; applied by Muhar-Zuber 2018; s4U labeling + thiol-specific iodoacetamide alkylation).
    • Direct RNA (ONT RNA-004).
  • Chromatin / epigenome:
    • ChIP-seq — classical; replicates (ENCODE standard); spike-in (Drosophila chromatin).
    • CUT&RUN (Henikoff 2017); CUT&Tag (Kaya-Okur 2019) — pA-MNase / pA-Tn5; low input.
    • ATAC-seq (Buenrostro-Greenleaf 2013) — Tn5 transposase tags accessible chromatin; 50,000-cell input.
    • DNase-seq, MNase-seq (legacy).
    • Hi-C (Lieberman-Aiden 2009): 3D genome contact map; Micro-C (Hsieh-Rando 2015 in yeast; mammalian versions 2020): MNase digestion; capture-Hi-C (promoter capture).
    • HiChIP (Mumbach-Chang 2016) — Hi-C + ChIP combined.
    • 4C, 5C; Capture-C (Hughes 2014).
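For the exome designs above, the read budget follows from target size, desired mean depth, and how many sequenced bases land on target. A rough sketch; the default on-target rate (70%) and duplicate rate (10%) are illustrative assumptions, not vendor specifications, and the function name is hypothetical.

```python
import math

def read_pairs_needed(target_mb, mean_depth, read_len=150, paired=True,
                      on_target=0.70, dup_rate=0.10):
    """Fragments (read pairs if paired) needed for a mean on-target depth.

    on_target: fraction of sequenced bases falling inside the capture target.
    dup_rate:  fraction of reads discarded as PCR/optical duplicates.
    """
    bases_needed = target_mb * 1e6 * mean_depth
    usable_fraction = on_target * (1 - dup_rate)
    bases_per_fragment = read_len * (2 if paired else 1)
    return math.ceil(bases_needed / (usable_fraction * bases_per_fragment))

if __name__ == "__main__":
    # ~35 Mb exome (SureSelect V8 / Twist sizes quoted above), 2x150 reads
    for depth in (50, 100, 200):
        pairs = read_pairs_needed(35, depth)
        print(f"{depth:>3}x mean depth: ~{pairs / 1e6:.1f} M read pairs")
```

For a 35 Mb target at 100x with 2x150 reads this gives ~18.5 M read pairs; a better on-target rate or lower duplication shrinks the budget proportionally. Mean depth says nothing about uniformity, which is why capture kits are also compared on evenness.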

Single-cell sequencing

Platform | Vendor | Chemistry | Output | Cells/sample | Year
10x Genomics Chromium iX / X | 10x Genomics (founded 2012; IPO 2019, NASDAQ: TXG; ~$620M revenue 2023) | GEMs (gel beads in emulsion) | Single Cell 3’ v3.1 / 5’ v3 / Multiome / VDJ / Immune Profiling | 500-10k routine; up to 65k per lane | Chromium 2017+
10x Visium | 10x | Spatially barcoded slide | 55 µm spots; ~5k spots per capture area | Tissue section, 6.5 x 6.5 mm | 2019
10x Visium HD | 10x | 2 µm bins (subcellular) | Continuous capture | Tissue section | 2024
10x Xenium | 10x | In situ hybridization | 300-5,000 gene panels; subcellular | Tissue | 2023
BD Rhapsody | BD Biosciences (acquired Cellular Research) | Microwell | mRNA + AbSeq protein + sample tags | 100-40k cells per lane | 2018
Mission Bio Tapestri | Mission Bio | Microfluidic; DNA-focused | Single-cell DNA panel + protein | 1,000-10k | 2018
Parse Biosciences | Parse Bio (founded 2018, from the Seelig lab) | Split-pool combinatorial barcoding (SPLiT-seq descendant) | Evercode kits (1M+ cells) | Up to 1M cells per run; 96-sample multiplexing | 2020
C1 (legacy) → Polaris → CyTOF (mass cytometry) | Standard BioTools (Fluidigm rebrand) | Mass cytometry + scRNA | bridge | |
Bio-Rad ddSEQ | Bio-Rad + Illumina (partnership 2017, ended 2020) | Droplet | mRNA | | Mostly discontinued
Honeycomb HIVE | Honeycomb Biotechnologies | Picowell | mRNA; samples stable and shippable | | 2022
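Split-pool barcoding (Parse, SPLiT-seq) labels each cell with one well index per round, so the barcode space is the product of wells per round and some cells inevitably draw the same combination. A sketch under a uniform-random-assignment assumption; the round layouts below are illustrative, not a specific vendor's kit design, and the function names are hypothetical.

```python
import math

def barcode_space(wells_per_round):
    """Distinct barcode combinations after sequential split-pool rounds."""
    return math.prod(wells_per_round)

def shared_barcode_fraction(n_cells, n_barcodes):
    """Expected fraction of cells whose combination is also drawn by at least
    one other cell, assuming each cell picks a combination uniformly at random."""
    return 1 - (1 - 1 / n_barcodes) ** (n_cells - 1)

if __name__ == "__main__":
    for rounds in ([96, 96, 96], [96, 96, 96, 96]):  # illustrative layouts
        space = barcode_space(rounds)
        for n_cells in (10_000, 100_000, 1_000_000):
            frac = shared_barcode_fraction(n_cells, space)
            print(f"{len(rounds)} rounds ({space:,} combos), {n_cells:>9,} cells: "
                  f"{100 * frac:.2f}% share a barcode")
```

With three 96-well rounds (884,736 combinations), about 1% of cells share a barcode at 10,000 cells but about 11% at 100,000; adding a round, or splitting cells into separately indexed sublibraries, shrinks the rate roughly in proportion while collisions are rare.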

Plate-based / low-throughput single-cell

  • Smart-seq2 (Picelli-Sandberg 2014): full-length cDNA; 96-well plate; high gene detection (~10k genes/cell); sequencing cost alone >$1/cell.
  • Smart-seq3 (Hagemann-Jensen 2020): UMI + 5’ tag + full-length; better isoform resolution.
  • SS-Seq, SCRB-seq, mCEL-Seq2 — UMI-based 3’-tag; cheaper.
  • MARS-seq (Jaitin-Amit 2014) — robotic FACS-sorted into 384-well; immune scRNA.
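UMI-based methods (the 3’-tag protocols above and the droplet platforms) count molecules rather than reads by collapsing reads that share cell barcode, UMI, and gene. A minimal sketch, assuming a 10x-style Read 1 with a 16 bp cell barcode followed by a 12 bp UMI (the 3’ v3 layout) and reads already assigned to genes; production tools (Cell Ranger, STARsolo, UMI-tools) also correct barcode and UMI sequencing errors, which this exact-match version does not.

```python
from collections import defaultdict

BARCODE_LEN = 16  # cell barcode length (assumed 10x 3' v3 layout)
UMI_LEN = 12      # UMI length (assumed 10x 3' v3 layout)

def split_read1(read1):
    """Return (cell_barcode, umi) from the start of Read 1."""
    return read1[:BARCODE_LEN], read1[BARCODE_LEN:BARCODE_LEN + UMI_LEN]

def count_molecules(records):
    """records: iterable of (read1_sequence, gene) pairs.

    Returns {cell_barcode: {gene: number_of_distinct_UMIs}}; reads with a
    truncated or N-containing UMI are dropped.
    """
    umis = defaultdict(set)
    for read1, gene in records:
        barcode, umi = split_read1(read1)
        if len(umi) < UMI_LEN or "N" in umi:
            continue
        umis[(barcode, gene)].add(umi)
    matrix = {}
    for (barcode, gene), seen in umis.items():
        matrix.setdefault(barcode, {})[gene] = len(seen)
    return matrix

if __name__ == "__main__":
    cell = "ACGT" * 4
    reads = [(cell + "AAAACCCCGGGG" + "TTTT", "GAPDH"),  # two reads, same molecule
             (cell + "AAAACCCCGGGG" + "TTTT", "GAPDH"),
             (cell + "TTTTGGGGCCCC" + "TTTT", "GAPDH"),  # second GAPDH molecule
             (cell + "AAAACCCCGGGG" + "TTTT", "ACTB")]
    print(count_molecules(reads))
```

The demo collapses four reads into two GAPDH molecules and one ACTB molecule for the single cell barcode.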

Spatial transcriptomics

Method | Vendor | Type | Resolution | Throughput | Reference
10x Visium | 10x | Barcoded capture (NGS) | 55 µm (Visium); 2 µm bins (Visium HD) | ~5k spots per section | Ståhl-Frisén 2016 (ST); technology later acquired by 10x
MERFISH | Vizgen MERSCOPE | Multiplexed FISH | ~100 nm (subcellular) | 500-1,000 gene panels | Chen-Zhuang 2015
seqFISH+ | (academic) | Multiplexed FISH | Subcellular | 10k genes | Cai lab 2019
CosMx SMI | Bruker (acquired NanoString 2024, $392M) | In situ chemistry | Subcellular | 1k+ genes + 64 proteins | He 2022
Slide-seq / Slide-seqV2 | (academic; commercialized by Curio Bioscience) | Barcoded beads on a puck | 10 µm | Tissue section | Rodriques-Macosko-Chen 2019; Stickels 2021
Slide-tags | Macosko-Chen | Barcoded tags on dissociated cells | Tissue context preserved | | 2024
Seq-Scope | (academic; Yi-Choi 2021) | Illumina flow-cell barcoded capture | Submicron | Small areas |
Stereo-seq | BGI (Stereo-seq) | DNB-CID array | 500 nm | Full section | 2022
Xenium | 10x | In situ | Subcellular | 300-5,000 genes | 2023
CODEX (Akoya PhenoCycler) | Akoya Biosciences (PhenoCycler-Fusion) | Multiplexed IF cycling | Cellular (protein) | 40+ markers | Goltsev-Nolan 2018
MIBI-TOF | IONpath (Stanford, Angelo lab origin) | Mass-tag IF + TOF-MS | Subcellular | 40+ markers | Angelo-Nolan 2014
IMC (Imaging Mass Cytometry) | Standard BioTools Hyperion | Mass-tag IF + laser ablation | 1 µm | 40+ markers | 2014
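MERFISH reads each RNA species as a binary word across imaging rounds and uses an error-robust code; the original design used 16-bit words with exactly four “on” bits and a pairwise Hamming distance of at least 4, so a single misread bit still decodes uniquely. A sketch of that idea; the greedy construction is an assumption for illustration and does not reproduce the published codebook, and the gene labels are hypothetical.

```python
from itertools import combinations

N_BITS, WEIGHT, MIN_DISTANCE = 16, 4, 4

def hamming(a, b):
    """Number of differing bits between two integer-encoded words."""
    return bin(a ^ b).count("1")

def build_codebook(n_bits=N_BITS, weight=WEIGHT, min_distance=MIN_DISTANCE):
    """Greedy constant-weight code: keep each candidate word that is at least
    min_distance bits away from every word already kept."""
    codebook = []
    for on_bits in combinations(range(n_bits), weight):
        word = sum(1 << i for i in on_bits)
        if all(hamming(word, kept) >= min_distance for kept in codebook):
            codebook.append(word)
    return codebook

def decode(observed, codebook, genes, max_errors=1):
    """Return the gene whose codeword is within max_errors flips, or None
    if zero or several codewords qualify."""
    hits = [g for w, g in zip(codebook, genes) if hamming(observed, w) <= max_errors]
    return hits[0] if len(hits) == 1 else None

if __name__ == "__main__":
    book = build_codebook()
    genes = [f"gene_{i}" for i in range(len(book))]  # hypothetical gene labels
    corrupted = book[7] ^ (1 << 3)  # one bit misread in round 4
    print(len(book), "codewords;", decode(corrupted, book, genes))
```

Because codewords are at least 4 bits apart, one misread bit is corrected; two misread bits leave the spot at distance 2 from every candidate, so the decoder returns None (detected but not corrected).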

Multi-omic single-cell

  • 10x Multiome ATAC+GEX — joint chromatin + RNA from same cell; 2020.
  • CITE-seq (Stoeckius-Smibert 2017) — RNA + cell-surface protein via oligo-tagged antibodies (TotalSeq from BioLegend, expanded to 200+ markers).
  • TEA-seq (Swanson 2021): ATAC + RNA + protein.
  • DOGMA-seq (Mimitou-Smibert 2021): ATAC + RNA + protein joint.
  • SHARE-seq (Ma-Buenrostro 2020) — chromatin accessibility + RNA.
  • ASAP-seq — ATAC + protein.
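Antibody-derived tag (ADT) counts from CITE-seq-style assays are commonly normalized per cell with a centered log-ratio (CLR) transform before joint analysis with RNA. A minimal sketch of textbook CLR with a pseudocount of 1; Seurat's "CLR" option and other toolkits differ in details (margin, zero handling), and the marker counts are hypothetical.

```python
import math

def clr(counts, pseudocount=1.0):
    """Centered log-ratio of one cell's ADT counts: log(x + c) minus its mean,
    i.e. each count relative to the geometric mean across markers."""
    logs = [math.log(x + pseudocount) for x in counts]
    center = sum(logs) / len(logs)
    return [v - center for v in logs]

if __name__ == "__main__":
    cell_adt = {"CD3": 850, "CD19": 4, "CD14": 12, "CD8": 310}  # hypothetical counts
    normalized = clr(list(cell_adt.values()))
    for marker, value in zip(cell_adt, normalized):
        print(f"{marker:>5}: {value:+.2f}")
```

Positive values mark markers above the cell's geometric-mean level: in the hypothetical cell, CD3 and CD8 come out positive and CD19 and CD14 negative, as expected for a CD8 T cell.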

Pangenome and reference assemblies

  • GRCh38 — Genome Reference Consortium human 2013; remains the production reference 2024 for clinical interpretation.
  • T2T-CHM13: Telomere-to-Telomere consortium 2022 (Nurk-Phillippy, Science); first gapless human genome (from the CHM13 complete hydatidiform mole, which is effectively homozygous); adds ~200 Mb of new sequence including centromeres, rDNA, and segmental duplications.
  • HPRC v1: Human Pangenome Reference Consortium 2023 (Liao-Asri-Ebler, Nature); 47 diploid genomes (94 haplotypes) from globally diverse individuals; Minigraph-Cactus pangenome graph; target of 350 individuals by 2025.
  • GRCm39 — mouse reference (C57BL/6J) current.

Assembly tools:

  • Sanger / short read: legacy ABySS, SOAPdenovo, ALLPATHS-LG, Velvet, SPAdes (prokaryote), Trinity (transcriptome).
  • Long-read: hifiasm (Cheng-Li 2021; HiFi standard), Canu (Koren-Phillippy 2017), wtdbg2 (Ruan-Li 2020), Flye / metaFlye (Kolmogorov 2019), Verkko (T2T pipeline 2023), nextDenovo (BGI).
  • Polishing: Pilon (Illumina-polish), Racon, Medaka (ONT), DeepVariant + DeepConsensus.
  • Scaffolding: 3D-DNA + Juicer (Hi-C), SALSA2.

Aligners (read → reference):

  • Short read: BWA-MEM (Li 2009/2013), Bowtie2 (Langmead-Salzberg 2012), HISAT2 (Kim-Salzberg 2015; splice-aware), STAR (Dobin-Gingeras 2013; splice-aware; default scRNA), Minimap2 (Li 2018; long + short).
  • Long read: Minimap2 dominant; NGMLR (Sedlazeck), LRA, Winnowmap (Jain 2022).

Variant callers:

  • Germline: GATK4 HaplotypeCaller + GVCF (Broad); DeepVariant (Poplin-Carroll 2018 Google; CNN); Strelka2 (Illumina, Kim 2018); FreeBayes (Garrison 2012); BCFtools.
  • Somatic / cancer: Mutect2 (Broad), Strelka2-Somatic, VarScan2, MuSE, SomaticSniper, DeepSomatic.
  • Structural variants: Manta, Delly, Lumpy (short-read); Sniffles2, CuteSV, pbsv (long-read).
  • CNV: CNVkit, GATK gCNV, Sequenza, FACETS.

Genome editing technology

Tech | Year | Type | Effect | Therapy / company
Zinc-finger nucleases (ZFN) | 1996-2002 | Engineered FokI + zinc-finger protein | DSB → NHEJ or HDR | Sangamo SB-728 (CCR5, HIV); pre-CRISPR
TALENs | 2009-2011 | FokI + TAL effector | DSB | Cellectis UCART19; Allogene
CRISPR-Cas9 (S. pyogenes) | 2012 Doudna-Charpentier (Science); 2013 Zhang-Cong (Science), mammalian cells | RNA-guided DSB | KO via NHEJ, KI via HDR; Nobel Chemistry 2020 | Casgevy (exa-cel; Vertex + CRISPR Therapeutics, 2023): first FDA-approved CRISPR therapy; sickle cell + β-thalassemia; ex vivo HSC edit of BCL11A
Cas9 variants: SaCas9 (S. aureus, smaller), ScCas9, dCas9 (catalytically dead; CRISPRi/a) | 2015+ | Various PAMs | Smaller for AAV packaging (SaCas9 ~3.2 kb vs ~4.1 kb SpCas9) | Editas EDIT-101 (LCA10, CEP290): first in vivo CRISPR in humans, 2020
Cas12a / Cpf1 (LbCas12a, AsCas12a, MAD7) | 2015 Zhang | RNA-guided; TTTV PAM; staggered cut | KO; multiplexing | Inscripta MAD7; ToolGen
Cas13a/b/d | 2017 Zhang | RNA-targeting; collateral cleavage | RNA editing / knockdown; diagnostics (SHERLOCK 2017) | Tessera, Locanabio
Base editing | Komor-Liu 2016 CBE (C→T); Gaudelli-Liu 2017 ABE (A→G) | Cas9 nickase (nCas9) or dCas9 + APOBEC1 or TadA deaminase | No DSB; precise | Beam Therapeutics BEAM-101 (sickle cell; HBG1/2 promoter); Verve VERVE-101 → VERVE-102 (PCSK9 ABE; cardiovascular)
Prime editing | Anzalone-Liu 2019 | nCas9 + reverse transcriptase + pegRNA | Targeted insertions, deletions, point edits | Prime Medicine (PM359, CGD; clinical 2024)
Twin prime editing (twinPE) / PASTE | Anzalone-Liu 2021 (twinPE); PASTE from the Abudayyeh-Gootenberg lab | Dual pegRNAs → integrase | Larger payloads |
Search-and-replace / SeekRNA | various | | |
RLR / Retron / DAD | various 2020s | Bacterial retron + RNA template | |
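The PAM requirements in the table determine where each nuclease can act. A minimal sketch that lists candidate SpCas9 sites (20-nt protospacer immediately 5’ of NGG) and Cas12a sites (TTTV PAM 5’ of the protospacer) on both strands; the 20-nt Cas12a protospacer length is an assumption (roughly 20-23 nt is common), and real guide design also scores off-targets and on-target efficiency.

```python
import re

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

# Site patterns (protospacer + PAM), written on the strand the guide matches
SPCAS9_SITE = r"[ACGT]{20}[ACGT]GG"   # 20-nt protospacer, then NGG PAM
CAS12A_SITE = r"TTT[ACG][ACGT]{20}"   # TTTV PAM, then 20-nt protospacer

def revcomp(seq):
    return seq.translate(_COMPLEMENT)[::-1]

def find_sites(seq, pattern):
    """Return (strand, start, site) for every match on either strand.

    start is the 0-based leftmost coordinate of the site on the + strand;
    the lookahead lets overlapping sites all be reported.
    """
    seq = seq.upper()
    regex = re.compile(f"(?=({pattern}))")
    hits = []
    for strand, s in (("+", seq), ("-", revcomp(seq))):
        for m in regex.finditer(s):
            site = m.group(1)
            start = m.start() if strand == "+" else len(seq) - m.start() - len(site)
            hits.append((strand, start, site))
    return sorted(hits, key=lambda h: (h[1], h[0]))

if __name__ == "__main__":
    target = "GATTTACCTGAGCTAGCTTAGCATCGATCGGCTAGGCATTTGCA"  # arbitrary demo sequence
    for strand, start, site in find_sites(target, SPCAS9_SITE):
        print("SpCas9", strand, start, site[:20], "PAM", site[20:])
    for strand, start, site in find_sites(target, CAS12A_SITE):
        print("Cas12a", strand, start, "PAM", site[:4], site[4:])
```

Each hit reports the strand, the leftmost + strand coordinate, and the site in guide orientation, so minus-strand sites appear reverse-complemented.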

Bioinformatics software ecosystem

  • Read QC: FastQC, MultiQC (Ewels 2016), Trimmomatic, fastp (Chen 2018), cutadapt, BBDuk.
  • BAM / VCF tools: samtools (Li-Durbin 2009), bcftools, picard (Broad), GATK (Broad), bedtools (Quinlan 2010).
  • Visualizers: IGV (Robinson 2011 Broad), JBrowse2, UCSC Genome Browser, Ensembl Browser, UCSC Cell Browser, cellxgene (Chan-Zuckerberg).
  • Transcript quant: Salmon (Patro 2017), kallisto (Bray-Pachter 2016), RSEM (Li-Dewey 2011), HTSeq (Anders 2015), featureCounts (Liao-Smyth 2014).
  • Differential expression: DESeq2 (Love-Anders-Huber 2014), edgeR (Robinson-McCarthy-Smyth 2010), limma + voom (Smyth 2004 + Law 2014), sleuth (kallisto downstream).
  • Single-cell: Seurat v5 (Satija lab 2024), Scanpy (Wolf-Theis 2018; Python; AnnData), Cell Ranger (10x official), STARsolo (Dobin), scVI / scvi-tools (Lopez-Yosef 2018 deep generative), MOFA+ (Argelaguet 2020 multi-omic), CellOracle (Kamimoto-Morris 2023 GRN), CellChat (Jin-Nie 2021), inferCNV (Broad), monocle3 (Trapnell pseudotime).
  • Dimensionality reduction / viz: PCA, t-SNE (van der Maaten-Hinton 2008), UMAP (McInnes-Healy-Melville 2018; Python and R), PHATE.
  • Pathway / enrichment: GSEA (Subramanian-Mootha-Mesirov 2005 PNAS; ssGSEA; preranked); MSigDB (H, C1-C8); g:Profiler (Reimand 2007+); Enrichr (Kuleshov 2016); IPA (QIAGEN Ingenuity); DAVID (Huang-Sherman 2009); WebGestalt.
  • Pathway databases: KEGG (Kanehisa Kyoto), Reactome (EMBL-EBI + OICR), WikiPathways, BioCyc / MetaCyc, BioGRID, STRING (Szklarczyk 2023), CORUM (complexes), SignaLink.
  • Ontologies + identifiers: Gene Ontology (GO; biological process / molecular function / cellular component), MeSH, HPO (Human Phenotype Ontology — Köhler 2021), OMIM (McKusick 1966 → online), DisGeNET, MONDO, Cell Ontology, Uberon (anatomy), ChEBI (chemicals).
  • Gene symbol authorities: HGNC (human), MGI (mouse), RGD (rat), ZFIN (zebrafish), FlyBase (Drosophila), WormBase (C. elegans), SGD (yeast), TAIR (Arabidopsis).
  • Variant + clinical: ClinVar (NCBI), gnomAD v4 (Broad 2023; ~807k individuals: ~731k exomes + ~76k genomes), 1000 Genomes / IGSR, TCGA (cancer genomics 2008-2018), ICGC, COSMIC (Sanger), PharmGKB, CIViC.
  • Protein: UniProt (SwissProt curated + TrEMBL automatic), AlphaFold DB (Jumper-Hassabis 2021; AlphaFold3 2024; Nobel Chemistry 2024 Hassabis-Jumper-Baker), Pfam / InterPro, PDB (~220k structures 2024), PROSITE, OpenFold (Bonneau 2022).
  • Workflow managers: Snakemake (Köster 2012), Nextflow (Di Tommaso 2017; nf-core community), WDL + Cromwell (Broad), CWL.
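DESeq2's default library-size normalization is the median-of-ratios method: each sample's size factor is the median, over genes, of its count divided by that gene's geometric mean across samples. A minimal pure-Python sketch of that calculation (genes with a zero in any sample are skipped because their geometric mean is zero); for real analyses use DESeq2's estimateSizeFactors.

```python
import math
from statistics import median

def size_factors(counts):
    """Median-of-ratios size factors.

    counts: list of genes, each a list of raw counts (one per sample).
    Returns one size factor per sample.
    """
    n_samples = len(counts[0])
    log_ratios = [[] for _ in range(n_samples)]
    for gene_counts in counts:
        if min(gene_counts) == 0:
            continue  # geometric mean would be 0; gene is uninformative here
        logs = [math.log(c) for c in gene_counts]
        log_geo_mean = sum(logs) / n_samples
        for j, value in enumerate(logs):
            log_ratios[j].append(value - log_geo_mean)
    return [math.exp(median(r)) for r in log_ratios]

if __name__ == "__main__":
    # rows = genes, columns = samples; sample 2 was sequenced ~2x deeper
    counts = [[10, 20], [5, 10], [100, 210], [0, 7], [40, 80]]
    sf = size_factors(counts)
    print([round(s, 3) for s in sf])
    print([c / s for c, s in zip(counts[0], sf)])  # first gene, normalized
```

In the demo, sample 2's size factor comes out about twice sample 1's; the median ignores the one gene whose ratio (2.1x) deviates, and dividing by the size factors puts both samples on a common scale.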

Adjacent