Single-Cell Genomics — Deep Reference

A Tier 2 deep-dive into the platforms, chemistries, library structures, atlases, and analysis stacks that constitute modern single-cell genomics. The field arose at the boundary of microfluidics and short-read sequencing — Macosko-Klein-McCarroll’s Drop-seq (2015 Cell) and Zheng et al.’s 10x Chromium (2017 Nat Commun) defined the droplet paradigm; Cao-Shendure’s sci-RNA-seq (2017 Science) demonstrated combinatorial indexing without microfluidics; in 2024-2026 the dominant outputs are 10x Chromium GEM-X (3M cells/run), Parse Biosciences Evercode (1M cells, no instrument), and emerging in situ approaches (Vizgen MERSCOPE, 10x Xenium 5K) that resolve transcripts in tissue without dissociation. By 2026 published datasets aggregate ~200M cells: CELLxGENE Census (CZI; >100M cells), HuBMAP, Human Cell Atlas (HCA) progress, Tabula Sapiens, Tahoe-100M (drug perturbation atlas, Vevo + Arc), and the Allen Brain Cell Atlas. This note maps the chemistries, the library structures, the analysis stack from Cell Ranger / STARsolo through scVI and foundation models, perturbation methods (Perturb-seq, CROP-seq), and the spatial + 3D-genome extensions.

Platform overview

Droplet microfluidics

Aqueous-in-oil droplets carry a single cell + a single barcoded gel bead, lysing the cell and capturing mRNA on bead-bound oligos. Each droplet’s RNA inherits the bead’s cell barcode (CB) and unique molecular identifier (UMI). Hundreds of thousands of cells per run; ~30-90 minute encapsulation.

10x Chromium / Chromium X / Chromium GEM-X (2023-2024). Current market leader. v3 chemistry is the 3’ Gene Expression standard: R1 is 28 bp (16 nt CB + 12 nt UMI, followed by the polyT primer) and R2 captures the cDNA insert. GEM-X delivers ~3× sensitivity vs v3 (median ~5k genes/cell PBMC) and supports 8 lanes × 60k cells = 480k cells per run; ATAC + multiome supported. Catalog: PN-1000128 (Chromium Next GEM Single Cell 3’ GEM, Library + Gel Bead Kit v3.1); GEM-X SC 3’ (PN-1000648). Per-cell sequencing cost ~$0.10-0.30 at depth.
Drop-seq. Open-source, custom microfluidic device (Dolomite Bio); ~10x cheaper consumables but more hands-on. Original Macosko 2015 Cell.
inDrops. Klein lab, Harvard; hydrogel beads instead of polystyrene; now mostly historical.
DroNc-seq. Single-nucleus variant (frozen tissue, post-mortem brain).
BD Rhapsody. Microwell cartridge (~200k wells in a honeycomb array) loaded with barcoded magnetic beads; microwell-based rather than droplet-based; integrated with surface protein detection (AbSeq). Less dominant than 10x but strong in immune profiling.
Honeycomb HIVE. Standalone microwell device; cells captured into 30k wells; sample collection by patients/clinicians; no instrument; preserves at room T for shipping.

Combinatorial indexing (split-pool)

No microfluidics — cells permeabilized, distributed across N wells, RNA-tagged with well-specific barcode round 1, pooled, redistributed, tagged round 2, … up to 4 rounds → unique 4-barcode combination per cell with probability ≪ 1 of collision at 10k-1M cell scale.
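The collision claim above can be checked with a birthday-problem style estimate. The following is a minimal sketch, assuming every cell lands in a uniformly random well at each round; the 96-well layout and the function names are illustrative assumptions, not a vendor specification.

```python
import math


def barcode_space(wells_per_round):
    """Number of distinct barcode combinations across all split-pool rounds."""
    total = 1
    for wells in wells_per_round:
        total *= wells
    return total


def collision_fraction(n_cells, wells_per_round):
    """Expected fraction of cells sharing their full barcode with another cell.

    Assumes uniform, independent well assignment in every round.
    """
    space = barcode_space(wells_per_round)
    # P(no other cell draws the same combination) = (1 - 1/space)^(n_cells - 1)
    p_unique = math.exp((n_cells - 1) * math.log1p(-1.0 / space))
    return 1.0 - p_unique


if __name__ == "__main__":
    rounds = [96, 96, 96, 96]  # hypothetical layout: four rounds of 96 wells
    for n in (10_000, 100_000, 1_000_000):
        print(n, f"{collision_fraction(n, rounds):.3%}")
```

Under these assumptions the collision fraction stays around 1% even at 1M cells, which is why a fourth barcoding round (or a larger plate format) matters mainly at the top of the scale range.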

sci-RNA-seq (Cao-Shendure 2017 Science). First demonstration; ~49,000 cells from C. elegans L2 larvae. sci-RNA-seq3 (Cao-Shendure 2019 Nature) → ~2M cells from mouse embryos at ~$0.01 per cell.
SPLiT-seq (Rosenberg-Roco-Seelig 2018 Science). Founders later commercialized as Parse Biosciences Evercode.
Parse Biosciences Evercode WT. Commercial split-pool; up to 1M cells, 96 samples per run; no instrument required (uses standard pipettes + 96-well plates + benchtop centrifuge). Mega kit (1M cells, 96 samples, ~$0.05-0.15/cell consumable).
scifi-RNA-seq. Datlinger-Bock 2021 Nat Methods; one round of combinatorial pre-indexing followed by droplet overloading on a microfluidic chip.
sci-Plex. Srivatsan-Cao-Shendure 2020 Science; nuclear-hashing barcode + sci-RNA-seq for chemical-perturbation screens; 188 compounds across ~5000 independent samples (~650k cells).

In situ / spatial-resolved

No dissociation — measure transcripts within tissue. See Spatial transcriptomics section below.

Library structure summary

3’ Gene Expression — typical 10x v3 / GEM-X. R1: 16 nt CB + 12 nt UMI + polyT primed. R2: 91 nt cDNA fragment near 3’UTR. Cheapest, deepest cell counts.
5’ Gene Expression — R1: 16 nt CB + 10 nt UMI (10x 5’ v2). cDNA primed at 5’ (template switch). Enables paired V(D)J seq + antigen-specific TCR/BCR.
V(D)J seq (TCR / BCR). Targeted amplification of TRA/TRB or IGH/IGK/IGL from 5’ library. 10x Single Cell V(D)J (PN-1000005); Parse Evercode TCR.
CITE-seq / TotalSeq (BioLegend). Surface protein measurement via DNA-tagged antibody (~150-300 markers). TotalSeq-A (poly-A tail for 3’ library), TotalSeq-B (capture sequence for 10x), TotalSeq-C (5’ library), TotalSeq-D (BD Rhapsody). Catalog example: TotalSeq-A0034 (anti-human CD3, clone UCHT1; BioLegend 300475).
CRISPR perturbation (CROP-seq / Perturb-seq / 10x CRISPR Screening Library). sgRNA cassette captured alongside transcriptome; assigns each cell to a perturbation.
Multiome (RNA + ATAC). 10x Multiome ATAC + Gene Expression (PN-1000283). Nuclei processed once → both libraries.
TEA-seq (transcript + epitope + ATAC). Swanson-Bhardwaj-Slichter-Cole-Greenleaf-Skene 2021 eLife.
DOGMA-seq. Mimitou-Smibert 2021 Nat Biotechnol — transcriptome + epitope + chromatin accessibility.
Methylome-RNA (snmC2T-seq, scNMT). Combined DNA methylation + transcriptome + chromatin accessibility.
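The R1 layout listed for 3’ Gene Expression can be illustrated with a small parser. This is a minimal sketch assuming the 16 nt CB + 12 nt UMI layout given above; `split_r1` and the example read are hypothetical and not part of any 10x tool.

```python
CB_LEN = 16   # cell barcode length, as stated for 10x 3' v3
UMI_LEN = 12  # UMI length, as stated for 10x 3' v3


def split_r1(read1, cb_len=CB_LEN, umi_len=UMI_LEN):
    """Split an R1 sequence into (cell_barcode, umi).

    Bases after CB + UMI (the polyT stretch) are ignored.
    """
    if len(read1) < cb_len + umi_len:
        raise ValueError("R1 is shorter than CB + UMI")
    cell_barcode = read1[:cb_len]
    umi = read1[cb_len:cb_len + umi_len]
    return cell_barcode, umi


if __name__ == "__main__":
    example = "ACGTACGTACGTACGT" + "TTGGCCAATTGG" + "TTTTTTTT"  # made-up read
    print(split_r1(example))
```

For the 5’ v2 layout described above, the same function would be called with `umi_len=10`.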

Sequencing depth and design

Read allocation

Typical 10x 3’ v3 GEM-X PBMC: 20-30k read pairs per cell → 5-8k genes detected/cell, 15-25k median UMIs/cell. Brain tissue (lower RNA per nucleus): 15-50k reads/nucleus → 1-3k genes/nucleus.

Saturation: sequence until % saturation (1 − unique UMIs / total reads) ~70%; sequencing more deeply increases gene detection sublinearly.
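The saturation formula above can be computed directly from read-level records. This is a minimal sketch, assuming each read has already been assigned a (cell barcode, UMI, gene) triple; the record format and function name are hypothetical.

```python
def sequencing_saturation(reads):
    """Return 1 - (unique molecules / total reads).

    `reads` is an iterable of hashable (cell_barcode, umi, gene) records,
    one per confidently mapped read.
    """
    reads = list(reads)
    if not reads:
        raise ValueError("no reads supplied")
    unique_molecules = len(set(reads))
    return 1.0 - unique_molecules / len(reads)


if __name__ == "__main__":
    toy = [("cell1", "UMI_A", "CD3E")] * 3 + [("cell1", "UMI_B", "CD3E")]
    print(sequencing_saturation(toy))  # 2 unique molecules out of 4 reads
```

A value near 0.7 means roughly seven in ten additional reads would re-sample a molecule that has already been seen.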

Sample multiplexing

Cell hashing (BioLegend TotalSeq-A/B/C Hashtag). Sample-specific DNA-tagged antibody → pool 6-16 samples per channel → demultiplex by hashtag barcode counts. Avoids batch effects from separate channels.
MULTI-seq (McGinnis-Patterson-Gartner 2019 Nat Methods). Lipid-anchored barcode oligos; works on live or fixed cells of any species.
CellPlex (10x). Lipid-anchored CMOs (cell multiplexing oligos); 12-plex.
Demuxlet / souporcell. Genotype-based demultiplexing of pooled-donor samples — leverages natural SNPs.
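Hashtag demultiplexing, as used by the cell hashing and lipid-barcode methods above, comes down to asking which sample barcode dominates each cell. The sketch below is a deliberately simple threshold rule, not the model used by any published demultiplexer; the thresholds, function name, and hashtag names are hypothetical.

```python
def call_hashtag(counts, min_count=50, min_ratio=5.0):
    """Classify one cell from its hashtag counts.

    Returns the winning hashtag name, "doublet", or "negative".
    `counts` maps hashtag name -> count and needs at least two hashtags.
    """
    if len(counts) < 2:
        raise ValueError("need counts for at least two hashtags")
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    (top_tag, top), (_, second) = ranked[0], ranked[1]
    if top < min_count:
        return "negative"  # no hashtag clearly present
    if second >= min_count and top < min_ratio * second:
        return "doublet"   # two hashtags present at comparable levels
    return top_tag


if __name__ == "__main__":
    print(call_hashtag({"HTO1": 500, "HTO2": 10, "HTO3": 4}))
    print(call_hashtag({"HTO1": 300, "HTO2": 250, "HTO3": 2}))
    print(call_hashtag({"HTO1": 5, "HTO2": 3, "HTO3": 1}))
```

In practice the thresholds would be fitted per hashtag from the count distributions rather than fixed, which is what dedicated tools do.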

Doublet detection

DoubletFinder, Scrublet, scDblFinder. Genetic demultiplexing (souporcell) also catches multi-donor doublets directly.

Spatial transcriptomics

Spatially resolved gene expression at the tissue level. Two paradigms:

Sequencing-based (capture-then-NGS)

10x Visium. Tissue placed on slide with 55-μm spots, 100 μm center-to-center, ~5000 spots per capture area. Whole-transcriptome unbiased capture. Resolution multi-cell per spot.
Visium HD (2024). Continuous lawn of 2-μm × 2-μm barcoded squares; analysis binned at 8 μm or 16 μm for cell-comparable resolution; ~11M barcoded squares per capture area.
Slide-seq / Slide-seqV2 (Stickels-Macosko-Chen 2021 Nat Biotechnol). 10-μm barcoded beads on slide.
Stereo-seq (BGI; Chen-Liu 2022 Cell). DNA nanoball spot array at ~500 nm resolution; whole-transcriptome.
Open-ST, Pixel-seq, sci-Space. Academic/open variants.
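The Visium HD binning step mentioned above (2-μm squares aggregated to 8-μm or 16-μm bins) is integer division on grid coordinates. This is a minimal sketch for a single gene, assuming squares are indexed by (column, row) on the 2-μm grid; the data layout and function name are hypothetical.

```python
from collections import Counter


def bin_squares(square_counts, square_um=2, bin_um=8):
    """Aggregate per-square counts into larger square bins.

    `square_counts` maps (col, row) on the fine grid -> count.
    Returns a dict mapping (bin_col, bin_row) -> summed count.
    """
    if bin_um % square_um != 0:
        raise ValueError("bin size must be a multiple of the square size")
    factor = bin_um // square_um  # e.g. 4 squares per bin side for 8 um
    binned = Counter()
    for (col, row), count in square_counts.items():
        binned[(col // factor, row // factor)] += count
    return dict(binned)


if __name__ == "__main__":
    toy = {(0, 0): 1, (3, 3): 2, (4, 0): 5}
    print(bin_squares(toy))             # 8 um bins
    print(bin_squares(toy, bin_um=16))  # 16 um bins
```

Each 8-μm bin pools 16 squares and each 16-μm bin pools 64, so per-bin counts rise as spatial resolution drops.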

Imaging-based (single-molecule FISH / multiplexed)

10x Xenium / Xenium 5K (2024). In situ imaging of padlock probes amplified by rolling-circle amplification; ~5000 genes per panel; subcellular resolution. Pre-designed panels (Multi-Tissue, Brain, Immune).
Vizgen MERSCOPE / MERFISH. Multiplexed error-robust FISH; 500-1000 genes; encoding by combinatorial probe rounds.
NanoString CosMx SMI. 1000-plex RNA; protein co-detection (~64 antibodies); fixed slide imaging.
Resolve Bioscience Molecular Cartography. Up to 200 genes.
Akoya PhenoCycler (formerly CODEX). Antibody-based; ~60 proteins; spatial proteomics analogue.
MIBI (multiplexed ion beam imaging; Ionpath). Metal-tagged antibodies read out by secondary-ion mass spectrometry under a primary ion beam; spatial proteomics at 800 nm.
seqFISH+ (Cai, Caltech). ~10,000 genes; super-resolution imaging.

Spatial + 3D + multiomic

Visium HD + Xenium combined (10x Visium HD on top section + Xenium on adjacent section).
DBiT-seq (Liu-Fan 2020 Cell). Deterministic barcoding in tissue via two perpendicular sets of microfluidic channels.
GeoMx DSP (NanoString). Photo-cleavable barcode oligo + UV illumination at chosen ROIs; whole transcriptome from regions.

Analysis software stack

Preprocessing / cell-by-gene matrix construction

Cell Ranger (10x Genomics). Standard for 10x; STAR-based alignment + UMI deduplication + cell-barcode calling.
STARsolo. Free; comparable to Cell Ranger; faster.
alevin-fry (salmon). Lightweight; ~10× faster; ~equivalent accuracy.
kallisto | bustools. Pachter lab; fast.
CITE-seq-Count (Roelli; open source). ADT/HTO counting from FASTQ.
scNanoGPS, FLAMES. Long-read isoform-aware (Oxford Nanopore / PacBio).

Python / R analysis frameworks

Scanpy (Wolf-Theis 2018 Genome Biol). Python; AnnData-based; dominant academic stack.
Seurat (Hao-Stoeckius-Satija 2024 v5 Nat Biotechnol). R; equally common; v5 supports BPCells on-disk sparse matrices, integration with reference atlases.
scvi-tools (Lopez-Yosef 2018 Nat Methods). Probabilistic deep generative models; scVI, scANVI, totalVI, peakVI, multiVI; jointly handles batch correction + integration + label transfer.
Bioconductor SingleCellExperiment + scran + scater + DropletUtils. Lun-Marioni stack; R.
Monocle 3 (Cao-Trapnell 2019). R; trajectory inference + DE.
Pegasus (Li-Regev 2020 Nat Methods; part of Cumulus). Python; very fast for large atlases.

Batch correction / integration

Harmony (Korsunsky-Raychaudhuri 2019 Nat Methods). Iterative soft k-means clustering with a batch-diversity penalty, followed by per-cluster linear correction of the PCA embedding.
BBKNN (Polanski-Teichmann 2020). Batch-balanced k-NN graph for UMAP.
scVI / scANVI (Lopez-Yosef). Variational autoencoder with batch covariate.
MNN, fastMNN (Haghverdi-Marioni 2018 Nat Biotechnol). Mutual nearest neighbors.
LIGER, rliger (Welch-Macosko 2019 Cell). Integrative NMF; cross-modality integration.
Symphony, scArches. Reference mapping (query → atlas embedding).
PRECAST, scIB benchmarking (Luecken-Theis 2022 Nat Methods) — quantitative comparison.

Clustering, embedding, visualization

UMAP (McInnes-Healy 2018), t-SNE (van der Maaten), PaCMAP, PHATE. Leiden (Traag-Waltman 2019 Sci Rep) > Louvain for community detection on k-NN graph.

Cell type annotation

CellTypist (Domínguez Conde-Teichmann 2022 Science). Logistic-regression classifier trained on Human Cell Atlas reference; >40 tissue models.
Azimuth (Hao-Satija 2021). Reference mapping via PBMC, lung, motor cortex, kidney, fetal references.
SingleR (Aran-Bhattacharya 2019 Nat Immunol). Spearman correlation to bulk references.
scGen, SCimilarity (Heimberg-Genentech 2023 bioRxiv). Foundation-model-based similarity to atlas reference.
scANVI (Xu-Lopez-Yosef). Semi-supervised label propagation.

Foundation models (scLLM era)

Pretrained transformer encoders on millions of cells; fine-tune downstream:

Geneformer (Theodoris-Ellinor 2023 Nature). ~30M cells; encodes rank-ordered gene expression; performs cell-type, perturbation, drug-response prediction.
scGPT (Cui-Wang 2024 Nat Methods). ~33M cells; autoregressive next-gene; fine-tunes for annotation, perturbation, multi-omic integration.
scFoundation (Hao-Zhang 2024 Nat Methods). 50M cells; read-depth-aware xTrimoGene encoder.
scBERT (Yang-Wang-Song 2022 Nat Mach Intell). BERT-style encoder; used for cell-type annotation.
UCE (Universal Cell Embedding; Rosen-Quake-Leskovec 2023 bioRxiv). 36M cells across 8 species; protein-language-aware gene representation.
Nicheformer (Schaar-Theis 2024). Spatial + scRNA pretraining for niche prediction.

Critique: 2023-2024 benchmarks (Boiarsky-Sontag 2023; Kedzierska-Lu 2023) found that scLLMs often underperform tuned linear baselines on simple annotation tasks. Performance gains are reported on harder tasks (perturbation prediction, cross-species mapping).

Trajectory inference

Monocle 3. UMAP + principal-graph + pseudotime assignment.
Slingshot (Street-Risso 2018 BMC Genomics). Branching trajectories on cluster graph.
PAGA (Wolf-Theis 2019 Genome Biol). Partition-based abstract graph.
Diffusion maps + Palantir (Setty-Pe’er 2019 Nat Biotechnol). Probabilistic terminal-state assignment.
RNA velocity (La Manno-Linnarsson 2018 Nature). Spliced vs unspliced ratio → future cell-state direction.
scVelo (Bergen-Theis 2020 Nat Biotechnol). Dynamical model; estimates per-gene kinetic rates.
VeloVI, UniTVelo, cellDancer. Newer velocity methods; address scVelo limitations (boundary cells, stationary states).
CellRank (Lange-Theis 2022 Nat Methods). Markov chain on velocity-informed transitions; terminal/initial-state probability.

Differential expression

MAST (Finak-Gottardo 2015 Genome Biol). Hurdle model; zero-inflated.
DESeq2 / edgeR pseudobulk. Aggregate cells per sample-cluster; Wald or quasi-likelihood tests. Becoming the standard for cross-condition DE: Squair-Courtine 2021 Nat Commun showed pseudobulk to be far better calibrated than single-cell-level Wilcoxon.
GLMM-based methods (NEBULA, dreamlet, miloDE). Mixed-effects models accounting for sample-level variation.
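The pseudobulk aggregation step can be sketched as a sum of raw counts over all cells that share a sample and a cluster. This is a minimal illustration on plain dictionaries, assuming per-cell raw counts are available; the input format and function name are hypothetical, and the resulting matrix would then go to DESeq2 or edgeR.

```python
from collections import defaultdict


def pseudobulk(cells):
    """Sum raw counts per (sample, cluster).

    `cells` is an iterable of (sample_id, cluster, {gene: raw_count}).
    Returns {(sample_id, cluster): {gene: summed_count}}.
    """
    totals = defaultdict(lambda: defaultdict(int))
    for sample_id, cluster, counts in cells:
        for gene, count in counts.items():
            totals[(sample_id, cluster)][gene] += count
    return {key: dict(genes) for key, genes in totals.items()}


if __name__ == "__main__":
    toy = [
        ("donor1", "T cell", {"CD3E": 4, "IL7R": 1}),
        ("donor1", "T cell", {"CD3E": 2}),
        ("donor2", "T cell", {"CD3E": 5, "IL7R": 3}),
    ]
    for key, genes in pseudobulk(toy).items():
        print(key, genes)
```

Summing raw counts (rather than averaging normalized values) keeps the data in the count space that negative-binomial tools expect, and each donor contributes one observation per cluster.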

Compositional / cell abundance analysis

Milo (Dann-Marioni 2022 Nat Biotechnol). Neighborhood-level differential abundance.
scCODA (Büttner-Theis 2021 Nat Commun). Bayesian compositional model.
Cacoa. Cluster-free differential abundance.

Cell type atlases

Human Cell Atlas (HCA)

International consortium (Aviv Regev, Sarah Teichmann founders 2016) building reference atlas of all human cell types. ~50M cells annotated and indexed by 2024. Tissues completed/in-progress: PBMC, bone marrow, lung, gut, kidney, liver, heart, brain, skin, placenta, adipose, etc.

CELLxGENE Census (Chan Zuckerberg Initiative)

Aggregator + query interface for >100M cells from public scRNA-seq datasets. Standardized AnnData schema (HCA Tier 1 metadata). Python/R API (cellxgene_census); harmonized to Ensembl gene IDs; ontology-aligned cell labels (Cell Ontology).

HuBMAP (NIH; Human BioMolecular Atlas Program)

Spatial + single-cell atlas of healthy adult human tissue at sub-cellular resolution; integrates Visium, Xenium, CODEX, MALDI imaging.

Tabula Sapiens (Quake-CZI 2022 Science)

~480k cells, 24 tissues, 15 donors. Hard-labeled by domain experts; gold-standard reference.

Allen Brain Cell Atlas (ABCA, 2023)

~7M cells across mouse + human cortex, striatum, hippocampus. ~5000 cell types with hierarchy.

Tahoe-100M (Vevo Therapeutics + Arc Institute 2024)

~100M cells from 50 cancer cell lines × ~1100 drug treatments (~60k drug-cell-line conditions); drug perturbation atlas to enable foundation-model training on chemical responses.

scTab (Theis lab 2024)

~22M curated cells, train/val/test split for ML benchmark.

Other atlases

Allen Brain Atlas (2003); ENCODE (bulk + scATAC); GTEx (bulk); Human Lung Cell Atlas (Sikkema-Theis 2023); Human Gut Cell Atlas (Elmentaite-Teichmann 2021 Nature); Fetal Atlas (Cao-Shendure 2020 Science); Pan-cancer atlas (Sun-Tirosh-Regev).

Single-cell ATAC and multiome

scATAC-seq

Chromatin accessibility per cell. Methods:

10x scATAC v2 / Multiome. Tn5 transposition on nuclei → 10x droplet capture; per-nucleus fragment file.
sci-ATAC-seq (Cusanovich-Shendure 2015 Science). Combinatorial indexing.
dscATAC-seq. Bio-Rad commercial droplet platform.

Analysis: Signac (Stuart-Satija 2021), ArchR (Granja-Greenleaf 2021), SnapATAC2, muon (multimodal integration). cisTopic provides LDA-based topic modeling of accessibility.

Joint RNA+ATAC multiome

10x Multiome — same nucleus, both modalities. Identify cis-regulatory elements driving expression; chromVAR motif activity; SCENIC+ regulon inference.

Other epigenome single-cell

scCUT&Tag (Bartosovic-Castelo-Branco 2021 Nat Biotechnol). Histone modifications.
CoTECH, Paired-Tag, scNTT-seq. Multiplexed histone marks.
scBisulfite (sc-WGBS, snmC-seq). Methylation.
scNMT-seq (Clark-Stegle-Reik 2018 Nat Comm). Methylation + accessibility + RNA.

Perturbation single-cell

Perturb-seq / CROP-seq / CRISP-seq

CRISPR knockout/i/a library + scRNA-seq → per-cell knockout identity + transcriptome.

Dixit-Regev Perturb-seq (2016 Cell). Original; cytosolic Cas9 + sgRNA-barcode capture.
Datlinger CROP-seq (2017 Nat Methods). sgRNA in 3’ UTR of puro selection cassette → captured by polyA.
Replogle-Weissman 2022 Cell “Mapping information-rich genotype-phenotype landscapes with genome-scale Perturb-seq.” Genome-scale CRISPRi: ~9,900 expressed genes knocked down in K562, plus essential-gene screens in K562 and RPE1; >2.5M cells in total.
Frangieh-Regev-Izar 2021 Nat Genet Perturb-CITE-seq. Targeted CRISPR library (~250 genes) + transcriptome + surface proteome; immune-evasion mechanisms in melanoma cells.

Perturbation variants

CRISPR activation (CRISPRa) Perturb-seq. dCas9-VP64 + sgRNA → upregulation.
CRISPRi-Perturb-seq. dCas9-KRAB; knockdown.
Spatial Perturb-seq, in vivo Perturb-seq. Adaptation to tissue context (Jin-Arlotta 2020 Science).
Combinatorial perturbation. Wessels-Sanjana CaRPool-seq (2023 Nat Methods; Cas13 RNA-targeting guide arrays); multi-gene combinations per cell.

Chemical perturbation

sci-Plex (Srivatsan-Shendure 2020 Science). Combinatorial-indexed scRNA + drug; 188 compounds across ~5000 independent samples.
MIX-seq, MULTI-seq drug screens.
Tahoe-100M (above).

Prediction models

CPA (Compositional Perturbation Autoencoder; Lotfollahi-Theis 2023). Linear-additive perturbation effect in latent space.
scGen. Vector arithmetic in latent space.
GEARS (Roohani-Leskovec 2024 Nat Biotechnol). Graph-NN; predicts unseen combinations.
CellOT (Bunne-Krause 2023 Nat Methods). Optimal transport-based cell-state shift prediction.

V(D)J repertoire profiling

10x 5’ V(D)J or Parse Evercode TCR.

TCR αβ (TRA/TRB). Recovered from CD3+ T cells; identify clonally expanded populations; pair with transcriptome to classify clone phenotype.
BCR (IGH + IGK/IGL). Mutation analysis for somatic hypermutation, clonal lineage trees (Immcantation, Dowser, scoper, alakazam).
MAIT, NKT, γδ T receptors. Custom primers needed.

Pipeline: Cell Ranger V(D)J → contig assembly → CDR3 annotation → clonotype clustering. Integration: scRepertoire (R), scirpy (Python).

Antigen-specific TCR identification via dCODE / barcoded MHC multimers (10x Single Cell Immune Profiling with Feature Barcoding).

3D genome single-cell

Genome-wide chromatin folding per cell — captures cell-to-cell variability of contact maps that bulk Hi-C blurs.

scHi-C (Ramani-Steemers-Shendure 2017 Nat Methods). Combinatorial-indexed Hi-C.
sci-Hi-C, sn-m3C-seq.
scNanoHi-C (Zhang-Liu 2023). Long-read Nanopore Hi-C.
scSPRITE. Multi-way contact via split-pool barcoding.

Analysis: scHiCExplorer, scHiCluster (Zhou-Ecker 2019 PNAS), Higashi, scA/B-compartment.

Single-nucleus alternatives

For tissues that don’t dissociate well (brain, heart, skeletal muscle, frozen biobanks), use nuclei instead of cells:

snRNA-seq. Drop-seq / 10x of nuclei. Nuclei yield roughly 30-50% of the RNA of a matched whole cell, so per-nucleus depth is lower, but the method works on dense tissues and frozen samples.
sNuc-DropSeq, snDrop-seq. Pioneering nuclear methods.
DroNc-seq. Habib-Regev 2017 Nat Methods.
10x Multiome. Nuclei-required (nuclear extraction with 0.1% NP-40 or Triton X-100).

Long-read single-cell

Short-read 3’ v3 misses isoform-level information (loses 5’ splice junctions). Long-read approaches:

ScNapBar / FLAMES. Hybrid short + long read; long reads from same library to inform isoform structure.
PacBio Iso-Seq scRNA. SMRT bell + 10x cell barcode; full-length transcripts.
MAS-Iso-seq (Al’Khafaji-Adiconis-Levin 2024). Concatenated cDNAs in single SMRT-bell; ~16-fold throughput boost.
R2C2 (Volden-Vollmers 2018). Rolling-circle long-read.

Isoform analysis: Sicelore (Lebrigand-Barbry-Waldmann 2020 Nat Commun), FLAMES (Tian-Ritchie 2021 Genome Biol).

Practical workflows

Standard 10x 3’ v3 GEM-X PBMC

Thaw 5-10 × 10⁶ PBMC; viability >85% (AOPI fluorescent count, Nexcelom Cellometer K2).
Load 16,000-20,000 cells per channel onto Chromium Controller / Chromium X with Next GEM Single Cell 3’ v3.1 reagents (PN-1000128) or GEM-X (PN-1000648). Expected recovery 8000-12,000 cells per channel; ~50-70% capture rate.
GEM-RT 53 °C 45 min → emulsion break → cDNA cleanup (SPRIselect; Beckman Coulter B23318).
cDNA amplification (12-14 cycles); QC on Bioanalyzer HS DNA chip (Agilent 5067-4626); peak ~1500-2000 bp.
Library construction: enzymatic fragmentation + end-repair + A-tail + adapter ligation + sample-index PCR (12-14 cycles).
Pool libraries; sequence on NovaSeq 6000 / NovaSeq X / NextSeq 2000 — 28 R1 (CB + UMI) + 90 R2 (cDNA) + 10 i7 (sample index).
Cell Ranger count → filtered_feature_bc_matrix.h5.
Scanpy QC: pct_counts_mt < 15%, n_genes_by_counts > 200 and < 6000 (crude doublet filter); Scrublet doublet score < 0.2.
Normalize (sc.pp.normalize_total → 1e4 + log1p), HVG selection (3000 genes), PCA (50 PC), neighbors, Leiden clustering, UMAP.
CellTypist annotation against Immune_All_High model.
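The QC thresholds and the CP10k + log1p normalization in the steps above can be written out for a single cell. This is a plain-Python sketch for illustration only (Scanpy performs the same arithmetic on sparse matrices); the function names and the `MT-` gene prefix convention are assumptions.

```python
import math


def passes_qc(counts, mito_prefix="MT-", max_pct_mt=15.0,
              min_genes=200, max_genes=6000):
    """Apply the mitochondrial-fraction and gene-count filters to one cell.

    `counts` maps gene symbol -> raw UMI count for a single cell.
    """
    total = sum(counts.values())
    if total == 0:
        return False
    n_genes = sum(1 for c in counts.values() if c > 0)
    mito = sum(c for gene, c in counts.items() if gene.startswith(mito_prefix))
    pct_mt = 100.0 * mito / total
    return pct_mt < max_pct_mt and min_genes < n_genes < max_genes


def normalize_log1p(counts, target_sum=1e4):
    """Scale one cell to `target_sum` total counts, then take log(1 + x)."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("cell has zero counts")
    return {gene: math.log1p(c * target_sum / total) for gene, c in counts.items()}


if __name__ == "__main__":
    cell = {"MT-CO1": 1, "CD3E": 9, "IL7R": 10}
    # min_genes lowered only because the toy cell has three genes
    print(passes_qc(cell, min_genes=1))
    print(normalize_log1p(cell))
```

The fixed thresholds are starting points; per-dataset inspection of the QC distributions usually decides the final cut-offs.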

CITE-seq 200-marker panel

Pellet 1-2 × 10⁶ cells in Cell Staining Buffer (BioLegend 420201) + Human TruStain FcX (BioLegend 422301) 10 min RT.
Add TotalSeq-B 200-panel cocktail (BioLegend Universal Cocktail v1.0; PN-99814); 30 min ice, dark.
Wash 3× in CSB; resuspend in 10x master mix at 1500 cells/μL.
Load 30,000 cells onto Chromium 3’ v3.1; capture; build 3’ GEX library + Feature Barcode (TotalSeq-B) library separately (PN-1000079 Feature Barcode kit).
Pool GEX and ADT libraries at ~5:1 (cDNA:ADT) read ratio; sequence to 30k GEX + 5k ADT reads per cell.
Demultiplex with Cell Ranger Multi (handles gene expression + antibody capture + V(D)J in one pipeline).
CLR-normalize ADT counts (within cell across markers); WNN integration (Seurat v4+) of RNA + ADT into joint UMAP.
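The CLR step can be sketched for one cell as the log count minus the mean log count across markers. This is the textbook centered log-ratio with a pseudocount; Seurat's own CLR implementation differs slightly in detail, so treat the function below (a hypothetical name) as an illustration of the idea rather than a drop-in equivalent.

```python
import math


def clr(counts, pseudocount=1.0):
    """Centered log-ratio transform of one cell's ADT counts.

    `counts` maps marker name -> raw antibody-derived tag count.
    Each value becomes log(count + pseudocount) minus the mean of those logs.
    """
    if not counts:
        raise ValueError("no markers supplied")
    logs = {marker: math.log(c + pseudocount) for marker, c in counts.items()}
    mean_log = sum(logs.values()) / len(logs)
    return {marker: value - mean_log for marker, value in logs.items()}


if __name__ == "__main__":
    cell = {"CD3": 900, "CD4": 450, "CD19": 3, "CD14": 7}
    for marker, value in clr(cell).items():
        print(f"{marker}: {value:+.2f}")
```

Because the transformed values sum to zero within a cell, a marker is read relative to that cell's overall antibody signal, which dampens differences in total ADT capture between cells.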

Perturb-seq genome-scale

Lentiviral package CRISPRi library (e.g., Dolcetto Set A — 57,050 sgRNAs against 19,114 genes; Addgene Pooled #92385) in HEK293T.
Transduce target cells (e.g., K562-dCas9-KRAB) at low MOI 0.3 → mostly single sgRNA per cell; puromycin select 7-10 d.
Pool selected cells; load onto 10x 3’ v3 or Parse Evercode; capture both transcriptome + sgRNA cassette (via direct-capture sgRNA scaffold or CROP-seq-style polyA).
Cell Ranger count with feature-barcoding for sgRNA.
Replogle-style analysis: aggregate cells per sgRNA → pseudobulk DE vs non-targeting controls → energy distance + cluster perturbations by phenotype.
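The reasoning behind the low-MOI transduction step can be made explicit with a Poisson model of integrations per cell. This is a minimal sketch, assuming integrations are Poisson-distributed with mean equal to the MOI and that selection removes all untransduced cells; the function name is hypothetical.

```python
import math


def single_integrant_fraction(moi):
    """Fraction of transduced cells carrying exactly one integration.

    Poisson model: P(k) = exp(-moi) * moi**k / k!
    """
    if moi <= 0:
        raise ValueError("MOI must be positive")
    p_zero = math.exp(-moi)
    p_one = moi * p_zero
    return p_one / (1.0 - p_zero)  # condition on at least one integration


if __name__ == "__main__":
    for moi in (0.1, 0.3, 1.0):
        print(moi, f"{single_integrant_fraction(moi):.1%}")
```

Under this model about 86% of surviving cells at MOI 0.3 carry a single sgRNA; the remainder carry two or more and are usually excluded at the guide-assignment step.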

Common pitfalls

Ambient RNA contamination. Cell-free RNA in suspension → falsely assigned to droplets. Correct with SoupX (Young-Behjati 2020 GigaScience), CellBender (Fleming-Babadi 2023 Nat Methods), or DecontX.
Doublets. Two or more real cells co-encapsulated in one droplet. ~0.8% per 1000 cells recovered for 10x. Filter with Scrublet, DoubletFinder; for pooled-donor samples, souporcell automatically calls inter-donor doublets.
Dropout / sparsity. 80-95% zero entries in matrix; many “zeros” reflect dropout, not biology. Use methods designed for sparse data (scVI, MAST, pseudobulk) rather than t-test on raw counts.
Batch effects. Sample-of-origin, processing date, lot of beads, sequencer flow cell. Confound with biology if not randomized. Include hashtag-based multiplexing to merge donors per channel.
Mitochondrial fraction. High % mt indicates dying / stressed cells. Threshold typically 10-20% (lower for nuclei <5%); brain neurons may legitimately have higher mt %.
Cell cycle. S/G2M genes dominate variance — regress out (Seurat ScaleData regress.vars; scanpy sc.pp.regress_out) or score with cell-cycle gene sets (Tirosh 2016 Science) and use as covariate.
Sample size for DE. Single-cell pseudobulk requires ≥3 donors per condition; cell-by-cell tests within a cluster ignore donor-level variance and give anticonservative p-values.
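The doublet rule of thumb in the list above (~0.8% per 1,000 cells) scales linearly with cell number and can be turned into an expected count. This is a minimal sketch of that linear approximation; the function name is hypothetical, and whether the input should be cells loaded or cells recovered depends on the convention the rate was quoted for.

```python
def expected_doublets(n_cells, rate_per_thousand=0.008):
    """Return (doublet_rate, expected_doublet_count) for one channel.

    Linear rule of thumb: rate = rate_per_thousand * (n_cells / 1000),
    capped at 1.0.
    """
    if n_cells < 0:
        raise ValueError("n_cells must be non-negative")
    rate = min(1.0, rate_per_thousand * n_cells / 1000.0)
    return rate, rate * n_cells


if __name__ == "__main__":
    for n in (2_000, 10_000, 20_000):
        rate, count = expected_doublets(n)
        print(n, f"{rate:.1%}", round(count))
```

Because the rate itself grows with cell number, the expected doublet count grows quadratically, which is the usual argument for hashing or genotype demultiplexing when channels are deliberately overloaded.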

Statistical caveats

Compositional bias

scRNA-seq counts are inherently compositional — total RNA per cell normalizes to 1 (in CPM/CP10k). Treatments shifting one cell type’s abundance shift others numerically without biological change. Use Milo, scCODA, or compositional regression (Aitchison transform) for abundance comparisons.
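A toy calculation makes the compositional effect concrete: if only one cell type expands, the proportions of all other types fall even though their absolute numbers are unchanged. The numbers and function name below are made up for illustration.

```python
def proportions(counts):
    """Convert absolute cell counts per type into fractions that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no cells")
    return {cell_type: n / total for cell_type, n in counts.items()}


if __name__ == "__main__":
    control = {"T cell": 100, "B cell": 100, "Monocyte": 100}
    treated = {"T cell": 200, "B cell": 100, "Monocyte": 100}  # only T cells change
    before, after = proportions(control), proportions(treated)
    for cell_type in control:
        print(cell_type, f"{before[cell_type]:.3f} -> {after[cell_type]:.3f}")
```

B cells and monocytes each drop from one third to one quarter of the sample here, which a naive per-type test on proportions would report as a decrease.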

Pseudoreplication

Treating cells as independent samples understates the variance and severely inflates Type I error, because cells from the same donor are correlated. Always aggregate to donor / replicate level for cross-condition inference. Hierarchical models (NEBULA, dreamlet, miloDE) handle nested structure.
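One way to see the size of the problem is the standard design-effect approximation for clustered data, which converts a cell count into an effective number of independent observations. This is a minimal sketch assuming equal cells per donor and a single intra-donor correlation (ICC); the ICC values used are hypothetical, not estimates from any dataset.

```python
def effective_sample_size(n_donors, cells_per_donor, icc):
    """Effective number of independent observations under clustering.

    Design effect (Kish): deff = 1 + (m - 1) * icc, with m cells per donor.
    """
    if not 0.0 <= icc <= 1.0:
        raise ValueError("icc must be between 0 and 1")
    design_effect = 1.0 + (cells_per_donor - 1) * icc
    return n_donors * cells_per_donor / design_effect


if __name__ == "__main__":
    for icc in (0.0, 0.01, 0.1, 1.0):
        n_eff = effective_sample_size(n_donors=6, cells_per_donor=1000, icc=icc)
        print(f"icc={icc}: effective n = {n_eff:.1f}")
```

Even a modest correlation collapses 6,000 cells to a few dozen effective observations, and at full correlation the effective sample size is just the number of donors.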

Batch confounded with biology

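When condition and processing batch coincide (e.g., each patient group processed on a different day), batch correction removes the biological signal together with the batch effect. Randomize conditions across batches.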

Annotation circularity

If clusters define the cell-type labels and DE is then run between those same labels on the same data, the DE is trivial because the clustering already selected for it. Use reference-based label transfer rather than de novo clustering for downstream condition comparison.

Hardware and reagent vendors

Instruments

10x Genomics Chromium Controller / Chromium X / Chromium iX. ~$120-300k.
BD Rhapsody Express / HT. ~$200-400k.
Parse Biosciences. No instrument — pipettes + centrifuge only.
Honeycomb HIVE. Single-use device; ~$200-500/sample.
MissionBio Tapestri. Single-cell DNA + protein for clonal evolution in leukemia.
Vizgen MERSCOPE, NanoString CosMx, 10x Xenium Analyzer. Spatial imaging instruments $300k-$1M.

Reagents

TotalSeq antibodies (BioLegend), AbSeq (BD), BD Stain Buffer, eBioscience FoxP3 staining kit (intracellular), Cell Staining Buffer (BioLegend 420201), Fc blockers (Human TruStain FcX 422301, Mouse TruStain fcX PLUS 156604).
10x Single Cell 3’ v3.1 (PN-1000128), GEM-X 3’ (PN-1000648), Multiome ATAC + GEX (PN-1000283), V(D)J Human B/T (PN-1000005).
Parse Evercode WT Mega (1M cells, 96 samples).

Sequencer requirements

Illumina NovaSeq 6000 / NovaSeq X / NextSeq 2000. S2/S4 flow cells; ~3-20B reads.
Element AVITI. Cheaper per Gb; gaining single-cell market share.
MGI DNBSEQ-T7. China-dominant; price-competitive.

Cost economics (2024-2026)

10x 3’ v3.1 per cell (chemistry only, 5k reads/cell): ~$0.40-0.60.
GEM-X higher throughput: ~$0.10-0.25/cell.
Parse Evercode WT Mega (1M cells, 96 samples): ~$0.05-0.10/cell.
Sequencing (NovaSeq X 10B reads, 20k reads/cell): ~$0.05-0.10/cell.
Total per 10k-cell 10x experiment: $4-8k all-in (chemistry + sequencing + labor).
Spatial Visium HD: ~$3-5k per capture area.
Xenium 5K: ~$2-4k per tissue section.
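The sequencing line in the list above follows from a simple read budget. The sketch below shows the arithmetic; the run price passed in is a hypothetical placeholder rather than a quoted price, and the function names are illustrative.

```python
def cells_per_run(run_reads, reads_per_cell):
    """How many cells one run covers at the target depth."""
    return run_reads // reads_per_cell


def sequencing_cost_per_cell(run_cost_usd, run_reads, reads_per_cell):
    """Sequencing cost per cell for a run of known price and output."""
    return run_cost_usd * reads_per_cell / run_reads


if __name__ == "__main__":
    run_reads = 10_000_000_000   # 10B-read flow cell, as in the list above
    reads_per_cell = 20_000      # target depth, as in the list above
    run_cost_usd = 30_000        # hypothetical run price for illustration
    print(cells_per_run(run_reads, reads_per_cell))
    print(sequencing_cost_per_cell(run_cost_usd, run_reads, reads_per_cell))
```

At 20k reads per cell a 10B-read run covers 500,000 cells, so the per-cell sequencing cost is the run price divided by that number.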

Sample cost a decade ago (~2015 Drop-seq): $0.05-0.10/cell consumable, $1-3/cell sequencing — sequencing cost has dropped ~30× since.

Single-cell proteomics adjacent to scRNA-seq

CITE-seq (above) reads ~100-300 surface markers via DNA-tagged antibodies; bridges to MS-based single-cell proteomics (proteomics-and-mass-spec-deep single-cell section). Olink Explore and SomaScan now operate at “single-tissue” plasma level but not yet single-cell.

Quantum-Si Platinum single-molecule protein sequencer (2022 launch) reaches ~30 proteins/cell — far below scRNA-seq’s 1000s but at a fraction of the cost. Nautilus Biotechnology and Erisyon competing.

Bridging scRNA + protein: SCoPE-MS / SCoPE2 (Slavov), nanoPOTS (PNNL), plexDIA single-cell DIA — covered in proteomics deep note.

Single-cell ChIP / cut-and-tag / footprinting

Beyond ATAC-seq, profile specific histone modifications + transcription factor binding per cell.

scCUT&Tag (Bartosovic-Castelo-Branco 2021 Nat Biotechnol). Henikoff CUT&Tag adapted to droplet/combinatorial; profiles H3K27me3, H3K27ac, H3K4me3.
CoTECH, Paired-Tag (Zhu-Ren 2021 Nat Methods). Histone modifications profiled jointly with the transcriptome in the same cell.
scNTT-seq (nano-CUT&Tag). Lower input.
scCUT&RUN. Skene lab variant.
scDNase-seq, scMNase-seq. Nucleosome positioning.
scACT-seq. Acetylation patterns.

Software: chromVAR (motif activity from scATAC), SCENIC+ (combined RNA + ATAC → regulon inference), Pando, FigR (Kartha-Buenrostro 2022 Cell Genomics).

Common species and references

Human. GRCh38 / hg38 (primary) + GENCODE v45 annotation. Ensembl IDs preferred over symbols for stability.
Mouse. GRCm39 / mm39 + GENCODE M34.
Rat. mRatBN7.2.
Zebrafish. GRCz11.
Drosophila melanogaster. BDGP6.46.
C. elegans. WBcel235 + WormBase WS294.
S. cerevisiae. R64-1-1.
Arabidopsis thaliana. TAIR10.
Macaca mulatta (rhesus macaque). Mmul_10.

Cell Ranger references: refdata-gex-GRCh38-2024-A, refdata-gex-GRCm39-2024-A, etc. Pre-built at 10x; STARsolo can use custom references.

Single-cell genomics in clinical context

Cancer

Pan-cancer scRNA atlases. Tirosh-Regev melanoma (2016 Science), Sun-Tirosh-Regev pan-cancer (2024).
Tumor microenvironment characterization. T-cell exhaustion states, TAM polarization, CAF subtypes.
Resistance mechanism mapping. Pre-treatment vs post-treatment scRNA → cell state transitions.

Inflammatory / autoimmune

Lupus PBMC atlas (CZI / Accelerating Medicines Partnership AMP-AIM).
Rheumatoid arthritis synovium (Zhang-Brenner-Raychaudhuri 2019 Nat Immunol).
IBD gut atlas (Smillie-Sankaran-Regev 2019 Cell).

Immune monitoring in trials

scRNA / CITE-seq baseline + on-treatment profiling in CAR-T (FLAIR — phenotype of infusion product, ZUMA-1 — pre-infusion T cell composition predicts response), checkpoint inhibitors (CD8+ T-cell stemness predicts pembrolizumab response).

Single-cell as companion diagnostic

Not yet — scRNA-seq is far from CLIA-validated diagnostic use. Bulk RNA-seq + IHC remain CDx standards. Single-cell pathology emerging via Xenium / CosMx / PhenoCycler in research-pathology settings.
