AI & Machine Learning for Climate

Climate is among the most data-rich and simulation-heavy problems in science. Petabyte-scale satellite archives, hundreds of in-situ networks (weather stations, ocean buoys, eddy-covariance towers, ice cores), reanalysis products such as ERA5 (~5 PB at 31 km / 137 vertical levels, hourly back to 1940), and dozens of CMIP6 general circulation models (GCMs), typically run at ~100-250 km horizontal resolution (~25-50 km in HighResMIP), generate signals that no human analyst can fully exploit. Machine learning offers four broad leverage points: (1) accelerating physics-based simulation by 10^3 to 10^5 times via emulation, (2) improving forecast skill by learning patterns directly from data, (3) automating planetary-scale monitoring from remote-sensing imagery, and (4) enabling entirely new measurement modalities (e.g. methane plume retrieval from hyperspectral cubes).

The canonical reference is Rolnick et al 2019 — “Tackling Climate Change with Machine Learning” (arXiv:1906.05433; published in ACM Computing Surveys 2022, 55(2), Article 42). The 97-page survey was co-authored by 22 researchers including David Rolnick (Mila), Priya Donti (MIT), Lynn Kaack (Hertie School), Yoshua Bengio (Mila/Université de Montréal), Andrew Ng (Stanford), Demis Hassabis (DeepMind), and John Platt (Google). It maps ML interventions across 13 sectors — electricity systems, transportation, buildings, industry, farms & forests, CO2 removal, climate prediction, societal impacts, solar geoengineering, individual action, collective decisions, education, finance — and remains the field’s organizing taxonomy. It also seeded the Climate Change AI (CCAI) non-profit, which now runs NeurIPS/ICLR workshops and the CCAI Innovation Grants programme.

This note inventories where ML has produced measured operational gains (weather forecasting, EO monitoring, materials discovery), where it shows credible promise (climate-model emulation, plasma control, grid optimization), and where it remains overclaimed (carbon-credit verification, geoengineering optimization, climate “GPT” chatbots). For deeper subject-specific context see physical-climate-system, carbon-cycle-and-greenhouse-gases, climate-impacts-and-adaptation, and climate-mitigation-and-adaptation.

Why climate is an ML-friendly domain

Several structural features make climate an unusually tractable target for modern deep learning:

Translation-equivariant spatial fields. The fundamental state of the atmosphere, ocean, and surface is a multi-channel raster (or graph on a sphere). Convolutional, transformer, and graph-neural-net architectures all map naturally onto this geometry. The same backbone that achieves SOTA on ImageNet (~10^7 labelled images) can be repurposed for ERA5 (~10^11 grid-point-hours of physical observation).
Abundant labels through physics. Unlike vision tasks dependent on human annotation, climate datasets are self-labelling — every grid cell at every timestep is a vector observation tied to physical laws. Reanalyses such as ERA5 effectively pre-package decades of consistent labels.
Cheap simulators. Even imperfect climate models can generate effectively unlimited training data for emulators. This sidesteps the data-hunger problem facing most scientific ML.
Operationally measurable metrics. Z500 ACC, CRPS, RMSE on temperature, Brier score on precipitation — the community already has decades of consensus scoring that allows like-for-like benchmarking between physics-based and ML-based forecasts.
High economic stakes. Energy markets, insurance, agriculture, aviation, marine, and defense each pay billions of USD for incremental forecast skill. This funds compute and operational integration in ways pure-research benchmarks do not.
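These scores are normally computed with area weighting, because cells on a regular lat-lon grid shrink toward the poles. A minimal NumPy sketch of latitude-weighted RMSE, assuming cosine-latitude weights normalized to a mean of 1 (the convention WeatherBench uses); the toy grid and error values are hypothetical:

```python
import numpy as np


def lat_weights(lats_deg: np.ndarray) -> np.ndarray:
    """Cosine-latitude area weights, normalised so their mean is 1."""
    w = np.cos(np.deg2rad(lats_deg))
    return w / w.mean()


def weighted_rmse(forecast: np.ndarray, truth: np.ndarray, lats_deg: np.ndarray) -> float:
    """Latitude-weighted RMSE of a (lat, lon) field."""
    w = lat_weights(lats_deg)[:, None]  # broadcast the weights along longitude
    return float(np.sqrt(np.mean(w * (forecast - truth) ** 2)))


# Hypothetical 1.5-degree grid with a uniform 2 K temperature error.
lats = np.linspace(-90.0, 90.0, 121)
lons = np.arange(0.0, 360.0, 1.5)
truth = np.full((lats.size, lons.size), 280.0)
forecast = truth + 2.0
print(weighted_rmse(forecast, truth, lats))  # close to 2.0 for a uniform error

# The same 2 K error confined to the polar rows scores far lower than at the equator.
polar = truth.copy()
polar[[0, -1], :] += 2.0
equator = truth.copy()
equator[60, :] += 2.0  # index 60 is latitude 0
print(weighted_rmse(polar, truth, lats), weighted_rmse(equator, truth, lats))
```

Without the weights, errors at high latitudes would be heavily over-counted relative to the area they represent.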

1. Weather forecasting — the headline domain

1.1 Numerical weather prediction (NWP) baseline

Operational forecasting since the 1950s has relied on numerical integration of the primitive equations on a rotating sphere. Modern global NWP centers and their flagship models:

ECMWF IFS (Integrated Forecasting System) — European Centre for Medium-Range Weather Forecasts, Reading UK. The deterministic high-resolution configuration runs at TCo1279 (~9 km) horizontal resolution with 137 vertical levels; since the June 2023 upgrade (Cycle 48r1) the 51-member ensemble (ENS) also runs at ~9 km. Widely considered the world’s most skillful deterministic global model. 4D-Var data assimilation with a 12 h window.
GFS (Global Forecast System) — NCEP/NOAA US. ~13 km horizontal, 127 vertical levels. Hybrid 4D-EnVar assimilation.
ICON (Icosahedral Nonhydrostatic) — DWD Germany + MPI-M. 13 km global, 6.5 km Europe.
UM (Unified Model) — UK Met Office. 10 km global.
GSM — JMA Japan. 13 km.
KIM (Korean Integrated Model) — KMA Korea. 12 km.
GEM — Environment and Climate Change Canada.

A single 10-day deterministic IFS forecast takes about an hour on ECMWF’s Atos BullSequana XH2000 supercomputer (which replaced the Cray XC40 systems in 2022), drawing ~10 MW. The 51-member ensemble takes comparable wall-clock time but consumes most of the machine.

Forecast skill is measured by anomaly correlation coefficient (ACC) at 500 hPa geopotential height (Z500) — the historical “useful forecast” threshold of ACC ≥ 0.6 had crept from day 5 in 1980 to day 9 by 2020 thanks to model resolution, better physics, and improved data assimilation (the so-called “quiet revolution”; see Bauer-Thorpe-Brunet 2015 Nature 525:47-55).
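ACC is the correlation between forecast and observed anomalies relative to a climatology, and the "useful forecast" range is the lead time at which it first drops below 0.6. A minimal sketch, assuming the uncentred ACC form (operational centres often also remove the area-mean anomaly) and a hypothetical ACC-versus-lead curve:

```python
import numpy as np


def acc(forecast, truth, climatology, weights=None):
    """Uncentred anomaly correlation coefficient.

    weights, if given, must broadcast against the field, e.g. shape (lat, 1).
    """
    fa = np.asarray(forecast, float) - climatology   # forecast anomaly
    ta = np.asarray(truth, float) - climatology      # observed anomaly
    w = np.ones_like(fa) if weights is None else np.broadcast_to(weights, fa.shape)
    num = np.sum(w * fa * ta)
    den = np.sqrt(np.sum(w * fa**2) * np.sum(w * ta**2))
    return float(num / den)


def useful_range(lead_days, acc_values, threshold=0.6):
    """Lead time (linearly interpolated) at which ACC first falls below threshold."""
    lead = np.asarray(lead_days, float)
    a = np.asarray(acc_values, float)
    below = np.flatnonzero(a < threshold)
    if below.size == 0:
        return None  # skill never drops below threshold within the verified window
    i = below[0]
    if i == 0:
        return float(lead[0])
    frac = (a[i - 1] - threshold) / (a[i - 1] - a[i])
    return float(lead[i - 1] + frac * (lead[i] - lead[i - 1]))


rng = np.random.default_rng(0)
clim = np.full((32, 64), 5500.0)                      # hypothetical Z500 climatology (m)
truth = clim + rng.normal(0.0, 50.0, clim.shape)
print(acc(truth, truth, clim))                        # a perfect forecast scores 1

# Hypothetical ACC curve for days 1-10.
leads = np.arange(1, 11)
scores = [0.99, 0.97, 0.94, 0.90, 0.85, 0.79, 0.72, 0.65, 0.58, 0.51]
print(useful_range(leads, scores))                    # between day 8 and day 9
```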

Data assimilation is the half of the NWP problem most people underestimate. ECMWF ingests ~60 M observations per 12 h cycle from ~100 satellite instruments, ~10 000 surface stations, ~1 000 radiosondes, ~3 000 aircraft AMDAR reports, ~5 000 buoys, and dozens of GNSS-RO occultation streams. The 4D-Var minimization that produces the analysis is itself a ~10^8-dimensional inverse problem — and is currently the most expensive single step in the operational workflow. ML-based observation operators (e.g. learned radiative transfer surrogates) are a growing research area, as are end-to-end differentiable assimilation pipelines (Frerix, Kochkov, Smith 2021; Manshausen et al 2024).
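The variational core of 4D-Var can be illustrated with its time-independent special case, 3D-Var. Assuming a linear observation operator H and Gaussian background and observation errors with covariances B and R, the cost J(x) = ½(x − x_b)ᵀB⁻¹(x − x_b) + ½(y − Hx)ᵀR⁻¹(y − Hx) has a closed-form minimizer. Operational 4D-Var adds the forecast model over the 12 h window and minimizes iteratively with an adjoint, so this toy (three state variables, two observations, all values hypothetical) only shows the structure:

```python
import numpy as np


def analysis_3dvar(xb, B, y, H, R):
    """Closed-form minimiser of the 3D-Var cost for linear H and Gaussian errors."""
    S = H @ B @ H.T + R                      # innovation covariance
    K = np.linalg.solve(S, H @ B).T          # gain B H^T S^-1 (B and S symmetric)
    return xb + K @ (y - H @ xb)


def cost_gradient(x, xb, B, y, H, R):
    """Gradient of J(x); it vanishes at the analysis."""
    return np.linalg.solve(B, x - xb) - H.T @ np.linalg.solve(R, y - H @ x)


# Hypothetical 3-point state with exponentially decaying background error correlations.
idx = np.arange(3)
B = np.exp(-np.abs(idx[:, None] - idx[None, :]))
R = 0.5 * np.eye(2)
H = np.array([[1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0]])              # observe only the first and last points
xb = np.zeros(3)
y = np.array([1.0, 1.0])

xa = analysis_3dvar(xb, B, y, H, R)
print(xa)                                    # the unobserved middle point is also updated via B
print(cost_gradient(xa, xb, B, y, H, R))     # approximately zero
```

The off-diagonal terms of B are what spread information from observed to unobserved points; in the real system B is never formed explicitly because the state has ~10^8 dimensions.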

1.2 The 2022-2025 ML weather paradigm shift

Within four years a sequence of deep-learning models matched, then exceeded, the best NWP on many metrics, producing forecasts in seconds to a few minutes on a single GPU or TPU rather than about an hour on a supercomputer. Most are trained on ERA5 reanalysis (Hersbach et al 2020 QJRMS) at 0.25° / 6 h resolution.

FourCastNet — Pathak, Subramanian, Harrington et al, NVIDIA + LBNL + Caltech, arXiv:2202.11214 (Feb 2022). Uses an Adaptive Fourier Neural Operator (AFNO) backbone. 0.25° resolution, 6-hour steps, 20 atmospheric variables. First credible demonstration that a pure-ML model could approach IFS skill on Z500 and U10 (10 m zonal wind). Generates a week-long forecast in under 2 s.
Pangu-Weather — Bi, Xie, Zhang, Chen, Gu, Tian (Huawei Cloud), Nature 619:533-538 (Jul 2023). 3D Earth-Specific Transformer with hierarchical temporal aggregation (separate 1 h, 3 h, 6 h, and 24 h models combined greedily). First ML model to outperform IFS on most upper-air variables at 3-7 day lead. Trained on 39 years of ERA5; inference 1.4 s per 24 h forecast on a single V100 GPU.
GraphCast — Lam, Sanchez-Gonzalez, Willson et al, DeepMind, Science 382:1416-1421 (Dec 2023). 36.7 M-parameter graph neural network on a multi-mesh icosahedral grid; 0.25° + 6 h steps; outperforms HRES on 90% of 1380 variable-level-leadtime combinations. Forecast generation < 60 s on a single Cloud TPU v4. Open-source weights released.
AIFS (Artificial Intelligence Forecasting System) — ECMWF in-house ML model. Pre-operational in 2024; AIFS Single v1.0 became operational in February 2025. Graph-transformer hybrid (GNN encoder and decoder around a transformer processor) pre-trained on ERA5 and fine-tuned on ECMWF’s operational analyses; runs alongside IFS rather than replacing it.
FuXi — Chen, Zhong, Zhang et al, Fudan University + Shanghai Academy of AI for Science, npj Climate and Atmospheric Science 6:190 (Nov 2023). Cascade of three U-Transformer models for short, medium, and long lead times. Skillful 15-day forecasts.
FengWu — Chen, Han, Gong et al, Shanghai AI Lab, arXiv:2304.02948 (Apr 2023). Multi-modal multi-task transformer; first ML model to push skillful Z500 forecasts beyond 10 days.
NeuralGCM — Kochkov, Yuval, Langmore et al, DeepMind + Google Research, Nature 632:1060-1066 (Aug 2024). Hybrid differentiable atmosphere: a learned ML correction is nested inside a JAX-based dynamical core. Runs 1.4° ensembles thousands of times faster than X-SHiELD while matching CMIP6 climatologies — the first ML model with credible long-run (multi-decade) stability.
GenCast — Price, Sanchez-Gonzalez, Alet et al, DeepMind, Nature 637:84-90 (Jan 2025). Diffusion-based generative model for probabilistic ensemble forecasting; outperforms ECMWF’s ENS on 97% of evaluated metrics at 1-15 day leads.
Aurora — Bodnar, Bruinsma, Lucic et al, Microsoft Research AI for Science, arXiv:2405.13063 (May 2024). 1.3 B-parameter foundation model for atmospheric prediction, pre-trained on multiple datasets (ERA5, CMIP6, GFS analysis), fine-tuned for tasks including air-quality (CAMS), ocean wave (HRES-WAM), tropical cyclone tracking. Reached operational consideration at multiple national met services by 2025.
AlphaEarth Foundations — Google DeepMind 2025, multi-modal foundation embeddings for the full Earth surface (Sentinel-2, Sentinel-1, Landsat, climate data, elevation). A 64-dimensional embedding per ~10 m grid cell; designed as a feature library for downstream classifiers.
Prithvi-WxC — IBM + NASA 2024, weather + climate variant of the Prithvi family at 2.3 B parameters, pre-trained on MERRA-2 reanalysis; demonstrated on downstream tasks such as downscaling and gravity-wave flux parameterization.
StormCast — Pathak, Cohen, Garg et al, NVIDIA + Lawrence Berkeley + UW, arXiv:2408.10958 (Aug 2024). Generative diffusion model for km-scale convection-permitting regional forecasts over the US Great Plains.
MetNet-3 — Andrychowicz, Espeholt, Li et al, Google Research, arXiv:2306.06079 (Jun 2023). U-Net + transformer hybrid; high-resolution (4 km, 2 min) precipitation + surface variable nowcasting up to 24 h in CONUS; deployed in Google Search weather snippet 2023.
DGMR (Deep Generative Model of Radar) — Ravuri, Lenc, Willson et al, DeepMind + Met Office, Nature 597:672-677 (Sep 2021). Generative adversarial network for 90-min radar nowcasting; rated higher than competing methods by 56 Met Office expert forecasters in head-to-head evaluation.
NowcastNet — Zhang, Long, Chen et al, Tsinghua University + China Meteorological Administration (with UC Berkeley), Nature 619:526-532 (Jul 2023). Physics-conditional generative model for extreme-precipitation nowcasting up to 3 h ahead, evaluated over the USA and China.
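Pangu-Weather’s greedy hierarchical temporal aggregation is simple to state in code: to reach a lead time, apply the longest-step model as many times as possible, then fill the remainder with shorter steps, so fewer autoregressive iterations (and less accumulated error) are needed. A minimal sketch; the model callables are hypothetical stand-ins for trained networks, and the toy "state" is just elapsed hours:

```python
from typing import Callable, Dict, List


def greedy_schedule(lead_hours: int, steps=(24, 6, 3, 1)) -> List[int]:
    """Decompose a lead time into model steps, using the longest step first."""
    schedule: List[int] = []
    remaining = lead_hours
    for step in sorted(steps, reverse=True):
        n, remaining = divmod(remaining, step)
        schedule.extend([step] * n)
    if remaining:
        raise ValueError(f"{lead_hours} h cannot be reached with steps {steps}")
    return schedule


def rollout(state, lead_hours: int, models: Dict[int, Callable]):
    """Autoregressive rollout: each model maps the state to the state `step` hours later."""
    for step in greedy_schedule(lead_hours, tuple(models)):
        state = models[step](state)
    return state


# Hypothetical stand-in "models" that simply advance a clock by their step length.
toy_models = {s: (lambda t, s=s: t + s) for s in (24, 6, 3, 1)}

print(greedy_schedule(56))              # [24, 24, 6, 1, 1]
print(len(greedy_schedule(24 * 7)))     # 7 steps for a one-week forecast
print(rollout(0, 56, toy_models))       # 56
```

With a single 6 h model the same 56 h lead would not even be reachable, and a one-week forecast would need 28 iterations instead of 7.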

Quality control and benchmarking

A growing infrastructure of standardized benchmarks tries to keep the ML-weather race honest:

WeatherBench (Rasp et al 2020 JAMES) and WeatherBench 2 (Rasp, Hoyer, Merose, Langmore et al 2024 JAMES) — common evaluation harness on ERA5 0.25° / 1.5° with held-out 2018-2022 verification. WB2 added probabilistic scoring (CRPS, spread-skill ratio) and tropical-cyclone metrics.
ClimateBench (Watson-Parris et al 2022) — emulation benchmark.
ChaosBench (Nathaniel et al 2024) — subseasonal-to-seasonal evaluation.
ExtremeWeatherBench — community effort 2024-2025.

Architectural taxonomy

The leading ML weather models cluster into a handful of architectural families, with consequences for accuracy, calibration, and operationalization:

Fourier neural operators (FNO / AFNO / SFNO) — FourCastNet. AFNO mixes tokens in 2D Fourier space on the lat-lon grid; the later Spherical FNO (Bonev et al 2023) replaces the FFT with spherical harmonics, which better match the natural eigenbasis of the rotating-atmosphere PDE. Cheap, parameter-efficient, but tends to over-smooth small-scale features.
Vision transformers (ViT) on lat-lon grids — Pangu-Weather, Aurora. Tile the globe into patches and apply attention; Pangu’s earth-specific positional bias accounts for the spherical geometry. Strong on synoptic-scale variables; computationally heavier than FNO.
Graph neural networks (GNN) on icosahedral meshes — GraphCast, GenCast (whose denoiser is a graph transformer). Multi-scale message passing on a refined icosahedron handles the pole singularity gracefully and represents long-range teleconnections as a multi-hop graph. Architectural complexity is higher; training is more expensive but inference is fast.
Diffusion models — GenCast, StormCast. Produce calibrated probabilistic ensembles by iterative denoising and avoid the over-smoothing of deterministic MSE-trained models. Inference is slower per sample (tens of denoising steps), but ensemble members are independent and can be generated in parallel. GAN-based nowcasters (DGMR, NowcastNet) pursue similar sharpness through adversarial training.
Hybrid differentiable physics — NeuralGCM. Couples a JAX dynamical core with learned subgrid corrections; the only family demonstrating multi-decade climate stability so far.
Foundation models — Aurora, Prithvi-WxC. Pre-train on massive multi-source data; fine-tune to specific downstream tasks (TC tracks, air quality, sea ice).
Latent-space rollouts — newer designs (Aurora 2.0, Stormer, ClimaX 2024) push the forward integration into a learned latent rather than physical space, then decode only when needed. Cheaper, but adds a layer of opacity.
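The building block of the Fourier-neural-operator family is a spectral convolution: transform to Fourier space, keep only the lowest wavenumbers, multiply them by learned complex weights, and transform back. A minimal NumPy sketch of the original FNO layer (Li et al 2021), not of AFNO’s block-diagonal token mixing or SFNO’s spherical-harmonic transform; the identity weights and test fields are hypothetical. It also shows where over-smoothing comes from: anything above the retained modes is discarded.

```python
import numpy as np


def spectral_conv2d(x, w_pos, w_neg, m1, m2):
    """FNO-style spectral convolution on a periodic 2-D grid.

    x:      real array (c_in, H, W)
    w_pos:  complex weights (c_in, c_out, m1, m2) for non-negative first-axis wavenumbers
    w_neg:  complex weights (c_in, c_out, m1, m2) for negative first-axis wavenumbers
    Modes outside the retained low-wavenumber blocks are set to zero.
    """
    c_in, H, W = x.shape
    c_out = w_pos.shape[1]
    if 2 * m1 > H or m2 > W // 2 + 1:
        raise ValueError("retained modes must not overlap or exceed the grid")
    x_ft = np.fft.rfft2(x, axes=(-2, -1))                     # (c_in, H, W//2 + 1)
    out_ft = np.zeros((c_out, H, W // 2 + 1), dtype=complex)
    out_ft[:, :m1, :m2] = np.einsum("ihw,iohw->ohw", x_ft[:, :m1, :m2], w_pos)
    out_ft[:, -m1:, :m2] = np.einsum("ihw,iohw->ohw", x_ft[:, -m1:, :m2], w_neg)
    return np.fft.irfft2(out_ft, s=(H, W), axes=(-2, -1))


H, W, m1, m2, c = 32, 64, 4, 4, 1
w_id = np.zeros((c, c, m1, m2), dtype=complex)
w_id[0, 0] = 1.0                                              # identity on retained modes

y = np.arange(H)[:, None] * np.ones((1, W))
smooth = np.cos(2 * np.pi * 2 * y / H)[None]                  # wavenumber 2: retained
sharp = np.cos(2 * np.pi * 10 * y / H)[None]                  # wavenumber 10: truncated

print(np.abs(spectral_conv2d(smooth, w_id, w_id, m1, m2) - smooth).max())   # ~0
print(np.abs(spectral_conv2d(sharp, w_id, w_id, m1, m2)).max())             # ~0
```

In a full FNO this layer is added to a pointwise linear path and followed by a nonlinearity, which lets the network regenerate some high-frequency content; the truncation itself is still a low-pass filter.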

1.3 Strengths and limits

ML models are cheap at inference: ~1-60 s per 10-day forecast on a single GPU, versus ~1 h on a Tier-1 supercomputer for IFS. This unlocks ensemble sizes O(10^3) which were previously infeasible — directly relevant to extreme-event probability estimation. However:

Models are trained on ERA5 and inherit its biases and its 0.25° (~28 km) grid; convective-scale dynamics below that spacing are not resolved.
Out-of-distribution extremes are a known weakness: a storm or heatwave outside the training envelope is not guaranteed to be captured (see Charlton-Perez et al 2024 npj Climate and Atmospheric Science 7:93 on the skill of ML models, including Pangu-Weather, during Storm Ciarán in November 2023).
Deterministic models trained on MSE become over-smooth at longer leads, and probabilistic output still needs calibration, assessed with CRPS, spread-skill ratios, and rank histograms.
Operationalization (24/7 production runs, version control, regulatory acceptance for aviation/marine warnings) is still emerging. ECMWF AIFS, NOAA’s Graph-EFS exploration, and DWD’s preliminary integration are the bellwethers.
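Two quantities behind these points can be made concrete. The CRPS of an ensemble can be estimated directly from its members, and the sampling error of an exceedance probability shrinks only as 1/√N, which is why O(10^3) members matter for rare extremes. A minimal sketch using the kernel form of the ensemble CRPS (with an optional "fair" correction for finite ensemble size) and the binomial standard error; the member values and probabilities are hypothetical:

```python
import numpy as np


def crps_ensemble(members, obs, fair=False):
    """Ensemble CRPS in kernel form: mean|X - y| - 0.5 * mean|X - X'|.

    With fair=True the pairwise term is divided by m(m - 1) instead of m^2,
    removing the bias that penalises small ensembles.
    """
    x = np.asarray(members, float)
    m = x.size
    skill = np.mean(np.abs(x - obs))
    pair_sum = np.abs(x[:, None] - x[None, :]).sum()
    if fair:
        if m < 2:
            raise ValueError("fair CRPS needs at least two members")
        spread = pair_sum / (2.0 * m * (m - 1))
    else:
        spread = pair_sum / (2.0 * m * m)
    return float(skill - spread)


def exceedance_std_error(p, n_members):
    """Binomial standard error of an exceedance probability estimated from n members."""
    return float(np.sqrt(p * (1.0 - p) / n_members))


print(crps_ensemble([3.0], 1.0))                  # single member: absolute error, 2.0
print(crps_ensemble([0.0, 2.0], 1.0))             # 0.5
print(crps_ensemble([0.0, 2.0], 1.0, fair=True))  # 0.0

# A 1-in-100 event: with 50 members the standard error exceeds the probability itself.
for n in (50, 1000):
    print(n, exceedance_std_error(0.01, n))
```

For a deterministic (single-member) forecast the CRPS reduces to the absolute error, which is why the two can be compared on one scale.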

2. Climate-model emulation

Climate (decadal-to-centennial) prediction differs from weather (sub-2-week) prediction: it is a boundary-value problem rather than an initial-value one, it draws on the multi-model ensemble of opportunity provided by CMIP6, and at long horizons the forced response (CO2, aerosols) dominates internal variability.

GCM emulators — Watson-Parris, Rao, Olivié et al 2022 J. Adv. Mod. Earth Sys. 14:e2021MS002954 published ClimateBench v1.0, a benchmark for emulating CMIP6 responses under different SSP scenarios. Beucler, Pritchard, Rasp et al 2024 follow-up. Mansfield et al 2020 npj Climate and Atmospheric Science 3:44 predicted long-term regional climate-change patterns from short simulations with Gaussian-process and ridge-regression emulators.
Subgrid parameterization — Convection, clouds, turbulence, and boundary-layer fluxes operate below the GCM grid (typically <25 km). Rasp, Pritchard, Gentine 2018 PNAS 115:9684-9689 trained a neural net on superparameterized output to replace convection in an NCAR CAM aquaplanet, the first end-to-end demonstration of stable ML subgrid replacement. Yuval-O’Gorman 2020 Nature Communications 11:3295 showed that a random-forest parameterization trained on coarse-grained high-resolution simulations remains stable across a range of grid spacings. Brenowitz-Bretherton 2018 GRL 45:6289 trained NN unified physics on a coarse-grained near-global cloud-resolving simulation. Gentine et al 2018, Bretherton et al 2022, Beucler et al 2021 PRL 126:098302 (physics-constrained NN parameterizations).
CliMA — Climate Modeling Alliance — founded 2018 and led by Tapio Schneider (Caltech + MIT + JPL + Naval Postgraduate School), with Andrew Stuart among its core investigators. Building a next-generation Earth system model in Julia: ClimaAtmos, ClimaLand, ClimaOcean, with online learning + uncertainty quantification + ML-aided closures. Its Calibrate-Emulate-Sample (CES) framework is described in Cleary, Garbuno-Inigo, Lan, Schneider, Stuart 2021 J. Comput. Phys.
ESMs being ML-augmented: ICON-A (DWD / MPI-M), CESM2 (NCAR), E3SM (DOE), GFDL ESM4 (NOAA Princeton), IPSL-CM (France), HadGEM3 (Met Office), MPI-ESM, NorESM, CMCC.
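ClimateBench’s baselines include Gaussian-process emulators, and the idea is compact: condition a GP prior on a handful of expensive simulations and read off a predictive mean and variance anywhere else. A minimal NumPy sketch with a squared-exponential kernel and fixed, hand-picked hyperparameters (a real emulator would fit them by maximizing the marginal likelihood); the training pairs are synthetic and hypothetical, standing in for cumulative CO2 emissions versus global-mean warming from a GCM:

```python
import numpy as np


def rbf(a, b, length=1.0, var=1.0):
    """Squared-exponential covariance between 1-D input arrays a and b."""
    d = a[:, None] - b[None, :]
    return var * np.exp(-0.5 * (d / length) ** 2)


def gp_posterior(x_train, y_train, x_test, length=1.0, var=1.0, noise=1e-4):
    """Posterior mean and variance of a zero-mean GP at x_test."""
    K = rbf(x_train, x_train, length, var) + noise * np.eye(x_train.size)
    L = np.linalg.cholesky(K)                               # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    Ks = rbf(x_test, x_train, length, var)
    mean = Ks @ alpha
    v = np.linalg.solve(L, Ks.T)
    variance = var - np.sum(v**2, axis=0)
    return mean, variance


# Synthetic "simulations": cumulative emissions (1000 GtCO2) -> warming (K), assumed linear.
x_train = np.arange(0.0, 3.01, 0.25)
y_train = 0.45 * x_train

x_test = np.array([1.6, 6.0])                              # inside vs far outside the data
mean, variance = gp_posterior(x_train, y_train, x_test)
print(mean, variance)
```

Inside the sampled range the emulator interpolates with small variance; far outside it the mean relaxes toward the prior mean and the variance grows back toward the prior variance, a useful built-in warning against extrapolating to unsimulated scenarios.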

Stability and constraint enforcement

A persistent issue with ML emulators of long-time-horizon dynamical systems is drift: small biases compound across many forward steps, eventually producing nonphysical states (negative humidity, supersonic winds, energy nonconservation). Mitigations include:

Physics-constrained architectures — embed conservation laws as hard constraints (e.g. Beucler et al PRL 2021); use divergence-free vector field parameterizations; soft penalties on global mean energy/mass.
Hybrid differentiable models — keep a conventional dynamical core for the resolved large-scale flow (NeuralGCM; CliMA’s models, calibrated with Calibrate-Emulate-Sample) and learn only the fast subgrid corrections.
Spectral nudging / climatology anchoring — periodically pull the emulator back toward observed climatological statistics.
Ensembles and noise injection — stochastic perturbations during training break overconfident single-trajectory failure modes (key to GenCast and NeuralGCM ensembles).
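Hard constraints of this kind can be illustrated with a post-hoc projection: given a network’s raw output y and a linear conservation law Cy = d, replace y with the closest vector that satisfies the law exactly. This is a swapped-in technique for illustration (Beucler et al’s constraint layer instead solves for a subset of outputs from the others); the five-layer column, mass weights, and budget value are hypothetical:

```python
import numpy as np


def project_onto_constraints(y, C, d):
    """Orthogonal projection of y onto the affine set {z : C z = d}."""
    C = np.atleast_2d(np.asarray(C, float))
    d = np.atleast_1d(np.asarray(d, float))
    residual = C @ y - d                                  # constraint violation
    correction = C.T @ np.linalg.solve(C @ C.T, residual)
    return y - correction


# Hypothetical column: heating rates (K/day) in 5 layers with mass weights (arbitrary units).
mass = np.array([0.10, 0.15, 0.20, 0.25, 0.30])
raw_heating = np.array([1.2, 0.8, 0.5, 0.3, 0.1])         # raw emulator output
column_budget = 0.40                                      # required mass-weighted total

fixed = project_onto_constraints(raw_heating, mass, column_budget)
print(mass @ raw_heating, mass @ fixed)                   # violated vs satisfied budget
```

Because the correction is applied at every step, the conserved quantity cannot drift no matter how long the emulator is rolled out; the price is that the projection knows nothing about which layer was actually wrong.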

See physical-climate-system for the dynamical-systems foundations these models discretize, and numerical methods and pdes for the discretization theory the hybrid models rely on.

3. Remote sensing and Earth observation

Satellite Earth observation has become the central nervous system of climate monitoring. Volumes are staggering: Sentinel-1/2/3 alone produce >12 TB/day of open data; Landsat archive >20 PB; commercial PlanetScope ~3.5 million images/day.

3.1 Satellite missions of climate relevance

Optical / multispectral: Landsat 1 (1972, NASA/USGS) through Landsat 9 (2021), 30 m resolution, 16-day revisit; Sentinel-2A/B/C (ESA Copernicus, 2015-2024, 10-60 m, 5-day revisit); MODIS Terra (1999) / Aqua (2002) — 250-1000 m, daily; VIIRS on Suomi NPP (2011) + NOAA-20/21 (2017/2022); commercial PlanetScope, BlackSky, Maxar WorldView-3/4, Airbus Pleiades Neo (30 cm).
Geostationary: GOES-R series (NOAA GOES-16/17/18/19); Himawari-8/9 (JMA Japan); Meteosat Second/Third Generation (EUMETSAT); GeoKompsat-2A (Korea); Fengyun-4 (China).
Microwave / radar: Sentinel-1A/C SAR; GPM Core Observatory (NASA JAXA 2014 — global precipitation); SMAP (NASA 2015 — soil moisture); CYGNSS (NASA 2016 — ocean wind).
Gravity & altimetry: GRACE-FO (NASA-DLR 2018 — successor to GRACE 2002-2017, monthly groundwater + ice-mass anomalies); ICESat-2 (NASA 2018 — photon-counting laser altimetry of ice, vegetation, and oceans with ~0.7 m along-track sampling).
Atmospheric composition: OCO-2/3 (NASA — CO2 columns); TROPOMI on Sentinel-5P (ESA 2017 — exposed Turkmenistan, Texas Permian, Russian methane super-emitters); GOSAT-1/2 (JAXA — CH4 + CO2); MERLIN (CNES-DLR, lidar methane planned).
Hyperspectral: EnMAP (DLR Germany 2022); PRISMA (ASI Italy 2019); EMIT (NASA JPL 2022, on ISS — Earth Surface Mineral Dust Source Investigation, repurposed for methane plume detection); MethaneSAT (Environmental Defense Fund + Harvard & Smithsonian Center for Astrophysics, launched Mar 2024 — high-precision methane mapping for oil & gas regions); Carbon Mapper (Planet Labs + JPL + state of California; Tanager-1 launched Aug 2024); GHGSat commercial constellation.
Open-data foundations: Copernicus Open Access Hub (superseded in 2023 by the Copernicus Data Space Ecosystem), USGS EarthExplorer, NASA Earthdata, Microsoft Planetary Computer (PB-scale STAC catalog).

3.2 ML applications across EO

Deforestation alerts — Global Forest Watch (WRI 2014) with GLAD alerts (Hansen, Tyukavina, Potapov et al, University of Maryland Global Land Analysis & Discovery Lab) — weekly 30 m Landsat detections, plus RADD alerts (Reiche, Wageningen) Sentinel-1 radar-based, fortnightly across humid tropics.
Land-use classification — Dynamic World (Google + WRI 2022, near-real-time 10 m global land cover); Esri Living Atlas Land Cover (Karra et al, Impact Observatory + Esri); ESA WorldCover 10 m 2020/2021.
Crop yield prediction — Climate Corp (Bayer, FieldView platform), Indigo Ag, Granular (Corteva); academic — You, Li, Low, Ermon (Stanford 2017 AAAI), Wang et al, M3 prediction systems used by USDA NASS via Mountain View Research.
Wildfire detection — NASA FIRMS (Fire Information for Resource Management System, MODIS+VIIRS active-fire pixels); Pano AI (panoramic mountaintop cameras + computer vision, 1500+ camera sites); ALERTCalifornia (UCSD + Cal Fire — 1100 camera network with ML classifier).
Methane plume detection — GHGSat (Montréal, 12-satellite constellation); MethaneSAT; Kayrros (Paris — fuses Sentinel-5P, Sentinel-2, GHGSat); SRON / TROPOMI super-emitter catalog (Pandey, Lauvaux, Sherwin et al). SRON’s automated ML detection pipeline (Schuit et al 2023) found ~3000 plumes in TROPOMI data from 2021 alone.
Coastal change and bathymetry — Allen Coral Atlas (Vulcan + Planet 2020, global reef maps), Coast Train USGS, ICESat-2 bathymetric lidar inversions (Parrish et al 2019); shoreline-change ML monitoring used by NOAA Office for Coastal Management.
Glacier and ice-sheet monitoring — NSIDC MEaSUREs ice-velocity (Joughin, Smith et al), ITS_LIVE (NASA JPL — automated time-series of glacier surface velocities globally), Calving front detection (Drews, Greene et al 2024). Sentinel-1 InSAR enables sub-cm vertical-motion detection.
Air quality — fine-scale PM2.5 and NO2 mapping at urban resolution via TROPOMI + ground monitors + ML; CMU OpenAQ pipeline; Aclima vehicle-mounted sensors + Google Street View; PurpleAir crowd-sourced sensor calibration via ML.
Biodiversity proxies — eBird + iNaturalist + Macaulay Library acoustic recordings; BirdNET (Cornell Lab), Wildbook for individual animal re-ID, Merlin Bird ID app (Cornell Lab).
Oil-tank fill estimation — Orbital Insight (founded 2013; floating-roof tank shadow analysis on Maxar / Airbus imagery as proxy for global crude stockpiles).
Insurance & risk: First Street Foundation Risk Factor flood/fire models; Jupiter Intelligence ClimateScore; Tomorrow.io.

3.3 Foundation models for Earth observation

A 2023-2024 wave of pre-trained vision foundation models specific to EO:

Prithvi-100M — NASA-IBM, Aug 2023. Vision transformer pre-trained on HLS (Harmonized Landsat Sentinel-2) imagery; tasks include flood mapping, multi-temporal cloud removal, burn scars. Released open-weights on Hugging Face.
SatlasPretrain — AI2 Allen Institute (Bastani, Wolters et al 2023); 302 M labels across 137 categories on Sentinel-2 and NAIP imagery.
SkyScript — Wang et al — multimodal vision-language pretraining for satellite imagery.
USat and SeasonalContrast (SeCo) are related self-supervised pretraining approaches; GEO-Bench (Lacoste et al 2023 NeurIPS) provides a common benchmark for comparing such models.
Clay Foundation Model — Clay project 2024, open licensed, billion-parameter geospatial encoder.
TerraMind — IBM + ESA 2025 follow-on with broader sensor coverage.

3.4 Sensor fusion and downstream products

The most impactful EO ML deployments are not single-sensor classifiers but multi-sensor data-fusion pipelines:

Methane attribution — combining TROPOMI (daily global but ~7 km), Sentinel-2 (10-20 m but reflectance-only), GHGSat / EMIT / MethaneSAT (high-precision targeted), wind reanalysis (HRRR / ECMWF), and infrastructure databases (Global Oil & Gas Infrastructure Map, Climate TRACE). End-to-end pipelines (Kayrros, GHGSat Operations Center) localize super-emitter events down to facility level within hours.
Fire weather — fusing geostationary thermal (GOES-R ABI, Himawari AHI), low-orbit fire products (VIIRS I-band, MODIS), fuels maps (LANDFIRE), and ML-downscaled mesoscale weather (HRRR + ML smoke-transport) for rapid initial-attack decision support (e.g. Technosylva wildfire analyst platform used by California IOUs).
Coastal hazard — combining Sentinel-1 SAR interferometry (subsidence, ground motion), tide gauges, GNSS ground stations, ICESat-2 lidar elevation, and storm-surge models for sea-level-rise-aware infrastructure risk assessment.
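Once a plume has been detected and masked, a common way to turn column enhancements plus wind into an emission rate is the integrated-mass-enhancement (IME) method of Varon et al (2018). The text above does not name it, so treat it as an assumption here: Q = U_eff · IME / L, where IME is the total excess methane mass in the plume mask, L is the square root of the plume area, and U_eff is an effective wind speed calibrated per instrument from reanalysis winds. A minimal sketch with a hypothetical uniform plume and an assumed U_eff:

```python
import numpy as np


def ime_emission_rate(delta_omega_kg_m2, pixel_area_m2, mask, u_eff_m_s):
    """Point-source emission rate (kg/h) from a masked methane enhancement map.

    delta_omega_kg_m2: column methane enhancement above background per pixel (kg/m^2)
    pixel_area_m2:     area of one pixel (m^2)
    mask:              boolean array selecting plume pixels
    u_eff_m_s:         effective wind speed (m/s), assumed already calibrated
    """
    ime_kg = np.sum(delta_omega_kg_m2[mask]) * pixel_area_m2   # integrated mass enhancement
    length_m = np.sqrt(mask.sum() * pixel_area_m2)             # plume length scale
    q_kg_s = u_eff_m_s * ime_kg / length_m
    return q_kg_s * 3600.0


# Hypothetical 30 x 30 scene of 25 m pixels with a 10 x 10 plume of uniform enhancement.
enh = np.zeros((30, 30))
mask = np.zeros((30, 30), dtype=bool)
mask[10:20, 5:15] = True
enh[mask] = 1e-3                                                # kg/m^2 above background
print(ime_emission_rate(enh, 25.0 * 25.0, mask, u_eff_m_s=2.0))  # about 1800 kg/h
```

The estimate scales linearly with U_eff, which is why the wind reanalysis in the fusion pipeline is often the dominant source of uncertainty in facility-level emission rates.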

The pattern is consistent: ML provides the “glue” that makes heterogeneous sensors interoperable, while the underlying physics (radiative transfer, atmospheric chemistry, hydrology) provides the structure that prevents the ML from being a black box.

4. Energy systems

See electricity markets and grid economics for the market structure these systems plug into.

4.1 Renewable forecasting

Solar irradiance — OpenClimateFix (London non-profit, founded Jack Kelly 2019) Quartz Solar model, used by UK National Grid ESO; Cohere + ECMWF radiation products; Solcast (DNV) commercial nowcasting.
Wind power — DeepMind + Google 2019 blog post on Google’s ~700 MW US wind portfolio: ML forecasts of power output 36 hours ahead boosted the value of the wind energy by ~20%. NCAR WPP (Wind Power Prediction), Vortex Iberia, DNV WindFarmer.
Grid load forecasting — ISO/RTO-level Neural Prophet / N-BEATS variants; Google + Eskom Africa.

4.2 Battery management

State-of-charge / state-of-health — Severson, Attia, Jin et al, Nature Energy 4:383-391 (May 2019), “Data-driven prediction of battery cycle life before capacity degradation” — Stanford + MIT + Toyota Research Institute. Used regularized (elastic-net) linear models on early-life features, notably the variance of the discharge-capacity difference curve ΔQ_100-10(V), to predict the cycle life of commercial LFP/graphite cells from the first 100 cycles with 9.1% test error.
Closed-loop battery testing — Attia, Grover, Jin et al, Nature 578:397-402 (Feb 2020), “Closed-loop optimization of fast-charging protocols for batteries with machine learning.”
Battery digital twins — Volta Foundation, Voltaiq (commercial), Twaice (Munich, acquired by ACCURE 2024).
Battery chemistry exploration — Aionics (Stanford spinout, generative chemistry for electrolytes), Chemix (Chempix), Wildcat Discovery Technologies.
Lithium extraction — EnergyX, Lilac Solutions, Vulcan Energy (DLE direct lithium extraction), with ML-optimized adsorbent/ion-exchange media discovery (cf. mining and mineral processing for downstream context).
Second-life batteries — ReJoule, Connected Energy, Moment Energy use ML state-of-health screening to grade retired EV packs for stationary storage.
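The key feature in Severson et al is computed from just two discharge curves: the difference ΔQ_100-10(V) between discharge capacity as a function of voltage in cycle 100 and in cycle 10, summarized by the log of its variance. A minimal sketch of that feature plus a one-variable regression on log cycle life; it uses ordinary least squares rather than the paper’s elastic net, and the discharge curves and cycle lives are synthetic, hypothetical values generated to follow a known linear relation:

```python
import numpy as np


def delta_q_variance_feature(q_cycle10, q_cycle100):
    """log10 of the variance of Q_100(V) - Q_10(V) on a shared voltage grid."""
    dq = np.asarray(q_cycle100) - np.asarray(q_cycle10)
    return float(np.log10(np.var(dq)))


voltage = np.linspace(2.0, 3.5, 1000)                          # V, shared grid
base_q = 1.1 * (voltage - 2.0) / 1.5                           # hypothetical Q_10(V), Ah
fade_shape = np.exp(-((voltage - 3.2) / 0.05) ** 2)            # hypothetical fade signature

# Synthetic cells: larger early fade -> shorter life, via an assumed relation.
a_true, b_true = 0.5, -0.5
features, log_life = [], []
for scale in (0.002, 0.005, 0.01, 0.02, 0.04):
    q10 = base_q
    q100 = base_q - scale * fade_shape
    f = delta_q_variance_feature(q10, q100)
    features.append(f)
    log_life.append(a_true + b_true * f)

X = np.column_stack([np.ones(len(features)), features])
coef, *_ = np.linalg.lstsq(X, np.array(log_life), rcond=None)
print(coef)                                                    # recovers [0.5, -0.5]
print(10 ** (X @ coef))                                        # predicted cycle lives
```

The appeal of this feature is that it is available after ~100 cycles, long before capacity visibly fades, so a cell’s eventual life can be screened early.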

4.2.1 EV charging optimization

WeaveGrid (San Francisco) — utility-side EV managed-charging platform; ML for charging-curve forecasting + grid-deferral value.
Smartcar + Camus Energy — connected-vehicle API + grid orchestration.
Octopus Intelligent / Tesla Virtual Power Plant — household-level dispatch tied to wholesale prices.

4.3 Materials discovery

GNoME — Merchant, Batzner, Schoenholz et al, DeepMind, Nature 624:80-85 (Nov 2023), “Scaling deep learning for materials discovery.” Found 2.2 million new inorganic crystal structures predicted by DFT to be stable with respect to previously known materials, of which ~381k lie on the updated convex hull. Among them: 528 candidate Li-ion conductors for solid-state batteries.
A-Lab — Szymanski, Rendy, Fei et al, Lawrence Berkeley National Lab + Google DeepMind, Nature 624:86-91 (Nov 2023). The autonomous lab synthesized 41 of 58 target compounds (selected using Materials Project and Google DeepMind stability data) in 17 days. A follow-on critique by Leeman, Liu et al 2024 questioned the automated phase identification and the novelty of many of the reported products, highlighting that ML-suggested candidates still need expert characterization. The episode is now a case study in the limits of “closed-loop” autonomous discovery.
Citrine Informatics — materials informatics platform (founded 2013, San Francisco).
MatterSim — Microsoft Research AI for Science 2024, deep-learning interatomic potential across the periodic table at 0-5000 K.
Toyota Research IM2 / Matter Lab — automated electrochemistry platforms for battery electrolyte discovery.
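"Stable by DFT" in GNoME-style screening means a structure’s formation energy lies on (or below) the convex hull of all competing phases. For a binary A-B system the hull is a lower convex envelope in (composition, formation energy) space, and the energy above the hull is the vertical distance to it. A minimal sketch using Andrew’s monotone-chain algorithm; real screening uses multi-component hulls (e.g. pymatgen’s PhaseDiagram), and the energies here are hypothetical:

```python
import numpy as np


def _cross(o, a, b):
    """z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def lower_hull(points):
    """Lower convex hull of (x, energy) points, keeping the lowest energy per composition."""
    best = {}
    for x, e in points:
        best[x] = min(e, best.get(x, np.inf))
    hull = []
    for p in sorted(best.items()):
        while len(hull) >= 2 and _cross(hull[-2], hull[-1], p) <= 0:
            hull.pop()
        hull.append(p)
    return hull


def energy_above_hull(x, e, hull):
    """Formation energy minus the hull energy at composition x (eV/atom)."""
    xs, es = zip(*hull)
    return float(e - np.interp(x, xs, es))


# Hypothetical known phases in a binary A-B system: (fraction of B, eV/atom).
known = [(0.0, 0.0), (1.0, 0.0), (0.5, -0.40), (0.25, -0.15), (0.75, -0.10)]
hull = lower_hull(known)
print(hull)                                        # (0.25, -0.15) and (0.75, -0.10) lie above it
print(energy_above_hull(0.25, -0.15, hull))        # about +0.05: metastable known phase
print(energy_above_hull(0.25, -0.25, hull))        # about -0.05: candidate would extend the hull
```

A negative value for a new candidate is what "stable with respect to previously known materials" means; once that candidate is added, the hull is rebuilt and it sits exactly on it.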

4.4 Catalyst discovery

Open Catalyst Project — Meta AI + Carnegie Mellon, 2020-present. OC20 (1.3 M DFT relaxations of adsorbate-catalyst systems for renewable energy applications), OC22 (oxide electrocatalysts), OCx24 (multi-component catalysts). Targets CO2 reduction reaction (CO2RR), nitrogen reduction (NRR), oxygen evolution / hydrogen evolution (OER/HER). Models: SchNet, DimeNet++, GemNet-OC, EquiformerV2.
MACE / Orb / SevenNet / CHGNet — universal machine-learned interatomic potentials (UMLIPs) trained on tens of millions of DFT calculations (Materials Project, OC20, Alexandria); achieve near-DFT accuracy at ~10^4-10^5x speedup. Enables molecular dynamics on green-cement clinker chemistries, solid electrolytes, and electrocatalysts at minute-not-month timescales.
CO2 reduction reactor design — operational deployments by Twelve (formerly Opus 12, Berkeley), Dioxide Materials, Carbon Recycling International combining DFT-guided catalyst screening with flow-cell engineering.

4.5 Fusion plasma control

DeepMind + EPFL TCV tokamak — Degrave, Felici, Buchli et al, Nature 602:414-419 (Feb 2022), “Magnetic control of tokamak plasmas through deep reinforcement learning.” A deep-RL agent, trained model-free in simulation, learned coil commands that stabilize plasma shapes (including snowflake and droplet configurations) on the Tokamak à Configuration Variable (TCV) at EPFL in Lausanne. The deployed policy ran at a 10 kHz control rate.
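Princeton 2024 — Seo et al, Nature 626:746-751: a deep-RL controller that avoided tearing-mode instabilities on the DIII-D tokamak, building on Princeton’s earlier deep-learning disruption predictor trained on DIII-D and JET data (Kates-Harbeck et al 2019 Nature).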
Commonwealth Fusion Systems SPARC + TAE Technologies Norman — proprietary ML stacks for HTS magnet quench prediction and FRC sustained-confinement control.
Helion Polaris — ML-tuned pulsed fusion prototype aimed at direct electricity recovery; Helion’s power-purchase agreement with Microsoft targets first electricity delivery in 2028.

4.6 Grid operations

Predix (GE Vernova), Tibco Spotfire, Bidgely — ML for grid-asset health, fault detection, transformer-loading forecasting.
AutoGrid / Schneider Electric — distributed-energy-resource (DER) virtual power plants.
Octopus Energy Kraken (UK) — ML-driven retail tariff + grid balancing, exporting platform to ~50 M accounts.
Tapestry (Alphabet X) — grid-planning visualization for transmission system operators (PJM, AES Indiana piloting 2023-2024).
PJM, MISO, ERCOT, CAISO — incorporating ML probabilistic load + renewable forecasting into day-ahead and real-time markets; see electricity markets and grid economics.

5. Buildings and cities

HVAC optimization — DeepMind / Google data-center cooling, deployed 2016: 40% reduction in cooling energy use, equivalent to a ~15% reduction in PUE overhead (the energy above the IT load). Commercial spinoffs: BrainBox AI (Montréal), Carbon Lighthouse (acquired by Trane 2022), Buildings IoT, 75F.
Smart thermostats — Nest (Google, since 2011), Ecobee (since 2007), Sense (Cambridge MA — home energy disaggregation / NILM). Foundational NILM work: Kelly & Knottenbelt 2015 BuildSys, “Neural NILM”; UK-DALE dataset.
Heat pump deployment intelligence — Tetra, Quilt, Sealed, Aeroseal combine ML home-energy modeling with retrofit recommendations; Mitsubishi Electric Trane ducted/ductless ML controls.
Water heaters as grid storage — Shifted Energy (Honolulu) and Aquanta ML-controlled electric resistance + heat-pump water heaters as multi-hour thermal batteries for utility load-shifting.
Urban heat island — Sherwood, Mostofi et al; CAPA Heat Watch (Portland State + community campaigns); Climate Resilience Fund work on urban heat mapping at 10 m via Sentinel-2 thermal proxies.
Mobility — Google Maps eco-routing (deployed 2021, claims >1.2 Mt CO2 saved through 2023); flight emissions estimates (Google Flights uses Travel Impact Model open standard 2022); Citymapper; Uber Movement.
Contrail mitigation — non-CO2 aviation forcing (~1-2% of total anthropogenic radiative forcing). Google Research + American Airlines + Breakthrough Energy 2023 trial: ML-guided flight-level adjustments to avoid ice-supersaturated regions reduced contrail formation by ~54% on the test flights, as verified with satellite imagery. See Schumann, Voigt, Kärcher et al CoCiP model.
Embodied carbon in construction — EC3 (Embodied Carbon in Construction Calculator), One Click LCA, Building Transparency for material-substitution decisions in design phase. ML-augmented life-cycle inventory matching reduces analyst effort from days to minutes.
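The difference between PUE and PUE overhead is easy to miss. PUE is total facility energy divided by IT energy, so the overhead is PUE minus 1; cutting cooling energy by 40% reduces that overhead by 40% times cooling’s share of it. A worked sketch with a hypothetical starting PUE of 1.12 and an assumed cooling share of 37.5% of the overhead (the share at which a 40% cooling cut equals a 15% overhead cut):

```python
def pue_after_cooling_cut(pue, cooling_share_of_overhead, cooling_cut):
    """New PUE after reducing cooling energy by the fraction `cooling_cut`.

    pue: total facility energy / IT energy (>= 1)
    cooling_share_of_overhead: fraction of (pue - 1) that is cooling
    """
    overhead = pue - 1.0
    return 1.0 + overhead * (1.0 - cooling_share_of_overhead * cooling_cut)


before = 1.12                                    # hypothetical starting PUE
after = pue_after_cooling_cut(before, cooling_share_of_overhead=0.375, cooling_cut=0.40)
overhead_cut = 1.0 - (after - 1.0) / (before - 1.0)
pue_cut = 1.0 - after / before
print(round(after, 4), round(overhead_cut, 3), round(pue_cut, 4))   # 1.102 0.15 0.0161
```

Under these assumptions a 15% overhead reduction moves the headline PUE by under 2%, which is why the two figures should not be conflated.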

6. Industry and materials

See process engineering and control for the chemical engineering foundations.

6.1 Cement and concrete

Cement contributes ~7-8% of global anthropogenic CO2.

Sublime Systems (Cambridge MA, spun out of MIT Chiang lab 2020) — electrochemical low-temperature lime production, eliminates CaCO3 calcination CO2.
Brimstone Energy (Oakland, founded 2019) — uses calcium silicate (instead of limestone) as feedstock, co-producing magnesium oxide.
Fortera (Saratoga CA) — calcium silicate carbonation, CO2-cured cement.
Heliogen — Bill Gross + Caltech 2019, AI-controlled heliostat array for high-temperature solar process heat (cement kilns, hydrogen).
Solidia Technologies (Piscataway NJ) — CO2-cured low-clinker precast concrete.
CarbonCure (Halifax NS) — CO2 mineralization injection during ready-mix concrete batching.
AI-designed mixes — Concrete.ai, Converge.io (UCL spinout); Ge, Friedrich et al, Materials & Design 2024 on Bayesian-optimized green cementitious blends.

6.2 Steel

Steel ~7% of global CO2.

HYBRIT (SSAB + LKAB + Vattenfall, Sweden 2016-) — hydrogen direct-reduced iron (H2-DRI) demonstration plant Luleå.
Stegra (formerly H2 Green Steel, Boden Sweden) — 2.5 Mt/y green steel commercial plant under construction, first production targeted for 2026.
Boston Metal (Woburn MA, MIT Sadoway lab) — Molten Oxide Electrolysis (MOE), zero-emissions iron from ore.
Form Energy (Somerville MA, founded 2017 by Chiang + Mateo Jaramillo) — multi-day iron-air battery, 100-hour duration; first commercial deployment 2024 with Great River Energy.
ML in steel-plant operations: Fero Labs (control optimization for re-rolling mills), Tata Steel + IBM smart manufacturing.

6.3 Plastics and recycling

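Carbios (Clermont-Ferrand France) — engineered PET-depolymerizing enzyme (a leaf-branch compost cutinase variant; Tournier et al 2020 Nature); first industrial plant at Longlaville originally scheduled to open 2025.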
Plastic Energy (London) — pyrolysis for chemical recycling (TacOil feedstock).
Polymateria (London) — biotransformation additives.
Solbar — solar-driven plastic depolymerization research.

7. Agriculture and food

See design-autonomous-electric-tractor for the autonomous platform context that hosts much of this stack. See also agricultural machinery and automation.

7.1 Precision agriculture

John Deere See & Spray Ultimate / Select — built on Blue River Technology (acquired 2017 for $305 M, Sunnyvale CA, founded by Jorge Heraud + Lee Redden ex-Stanford). Real-time computer-vision-based weed-vs-crop discrimination at boom speed (~22 km/h), spot-spraying reduces herbicide ~66% in row crops.
Carbon Robotics LaserWeeder (Seattle, founded Paul Mikesell 2018) — CV-targeted laser thermal weeding from an array of lasers on a tractor-pulled implement; 30 weeds/s.
Sabanto (Chicago) — retrofit autonomy kits for legacy Class 5-9 tractors.
FarmWise Titan (San Francisco) — mechanical weeding with CV guidance for specialty vegetables.
Naïo Technologies Oz / Dino / Orio (Toulouse) — autonomous platforms for orchards and vegetables.

7.2 Crop disease and field intelligence

Plantix (Berlin) — smartphone-based disease/pest/nutrient diagnosis from leaf photos, 30+ crops.
PEAT GmbH (Hannover, founded Simone Strey + Pierre Munzel 2015) — the company behind the Plantix app.
AGCO Fendt — IDEAL combines with IDEALharvest automatic setting optimization, grain-flow sensors, and crop-stress prediction.
Sugar beet weed RCT — Buchanan, Esser et al 2022 Smart Agricultural Technology, demonstrating ML-guided herbicide reductions in production trials.

7.3 Aquaculture

Aquabyte (Bay Area + Bergen) — underwater computer vision for salmon biomass + lice counting; Cargill partnership.
Tidal X (Alphabet/X 2017-2024) — underwater perception system for aquaculture; wound down 2024.
ReelData AI (Halifax) — biomass and feed-conversion estimation for land-based recirculating aquaculture systems (RAS).
Innovasea, AKVA Group — sensor + ML for net-pen aquaculture monitoring.
OceanMind, Global Fishing Watch — vessel monitoring (AIS + SAR + dark-target detection) for illegal, unreported, and unregulated (IUU) fishing enforcement, which can carry emissions co-benefits via stock recovery.

7.4 Livestock methane

Mootral (Wales) — garlic + citrus feed additive (~30% enteric CH4 reduction in trials).
Symbrosia (Hawaii) — Asparagopsis taxiformis red seaweed cultivation; livestock feeding trials show 50-90% enteric CH4 reduction.
Bovaer (3-NOP) — DSM-Firmenich feed additive; approved in the EU 2022, US FDA review completed 2024.

7.5 Food-system optimization

Cargill, Bunge, ADM, COFCO — using ML in trade-route + storage + crush-margin optimization; carbon-intensity and traceability reporting is increasingly shaped by the EU CSRD (first reports for FY2024) and the EU Deforestation Regulation (EUDR).
Just Eat / DoorDash / Wolt — route optimization with carbon-aware variants.
Olio, Too Good To Go, Imperfect Foods — food-waste-redistribution platforms (food waste ~8-10% of global GHG, see carbon-cycle-and-greenhouse-gases).
Alternative proteins — Perfect Day (precision-fermentation dairy), Solar Foods Solein (gas-fermentation single-cell protein), Air Protein, The Every Company — several report ML-guided strain and enzyme design, with AlphaFold-style structure prediction increasingly used in protein-engineering pipelines.

8. Carbon accounting and verification

8.1 Forest carbon

Pachama (San Francisco, founded Diego Saez-Gil 2018) — remote-sensing-based forest-carbon project assessment + monitoring.
Sylvera (London) — independent carbon-credit ratings, scoring projects on a D-to-AAA scale.
BeZero Carbon (London) — comparable rating agency.
CarbonPlan critiques — Badgley, Freeman, Hamman, Cullenward et al 2022 Global Change Biology 28:1433 — documented systematic over-crediting in California's forest offset program: ~29% of the credits analyzed (~30 M tCO2e) were over-credited, in what is the largest source of offsets for California's cap-and-trade market.
NCX / Forest Carbon Works (formerly SilviaTerra) — short-duration carbon contracts for small US forest landowners; pivoted 2024 from credits to direct biomass valuation following methodology critiques.
Funga (Austin, founded Colin Averill 2021) — soil-microbiome (mycorrhizal) inoculation to enhance tree growth + below-ground carbon.

8.2 Soil carbon

Indigo Ag (Boston) — Carbon by Indigo soil-C credit programme.
Boomitra (San Mateo, founded 2020) — satellite + ML soil-organic-carbon estimation, Global South focus.
CIBO Technologies (Cambridge MA) — DayCent + SALUS process-based crop models + ML.
Yard Stick (Boston, founded 2020) — in-situ soil-C measurement probe (in-field optical spectroscopy).

8.3 MRV automation

CTrees (Pasadena, founded Sassan Saatchi JPL 2022) — global, satellite-derived tree-level carbon stock and flux estimates.
CoverCress (St Louis) — cover-crop + winter oilseed; data platform tracks acreage.
Climate TRACE — coalition launched 2021 (Al Gore + WattTime + RMI + CarbonPlan + Earthrise Media + others); facility-level emissions inventory from satellite + sensor data; ~352 M sources covered by 2024 inventory. The data are openly licensed and aggregated by sector, fuel, and country — increasingly cited in national inventory submissions to UNFCCC.
Verra Verified Carbon Standard methodology updates (VM0047 ARR dynamic baselines, VM0048 REDD+ jurisdictional consolidated 2024) — ML-MRV-friendly methodologies replacing static project-by-project baselines.
Gold Standard for the Global Goals — comparable update path, including methane and waste sectors.
Isometric (London, founded 2022 by Eamon Jubbawy) — open-protocol carbon-removal registry with peer-reviewed methodologies and full data transparency.
Puro.earth (Helsinki, acquired by Nasdaq 2021) — biochar, BECCS, mineralization registry.

8.4 Supply-chain emissions (Scope 3)

Watershed (San Francisco, founded ex-Stripe 2019) — enterprise carbon accounting.
Persefoni (Tempe AZ) — climate-management software, partnered with Big-4 accounting.
Sweep (Paris).
Plan A (Berlin).
Greenly (Paris).
Normative (Stockholm).
CarbonChain (London) — commodity-trade emissions tracing using vessel + facility data.
Sourcemap (NY) — physical-supply-chain mapping with EUDR / Uyghur Forced Labor Prevention Act compliance overlays.

ML role in Scope-3 accounting is mostly LLM-assisted product-to-emission-factor matching, spend-based proxy refinement, and supplier-questionnaire automation. Direct measurement remains rare outside Scope 1/2.
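Spend-based Scope-3 estimation multiplies each ledger line's spend by an emission factor for its matched category; the ML/LLM step is the matching. A minimal sketch in which difflib's string similarity stands in for an LLM or embedding matcher; the categories and kg CO2e/USD factors are hypothetical placeholders, not values from any real EEIO database, and unmatched lines are routed to review rather than guessed.

```python
import difflib

# HYPOTHETICAL spend-based factors (kg CO2e per USD); placeholders, not real EEIO values.
EMISSION_FACTORS = {
    "office paper and stationery": 0.45,
    "air travel": 1.10,
    "cloud computing services": 0.12,
    "steel products": 1.60,
}


def match_category(description, cutoff=0.6):
    """Closest factor-table category for a free-text ledger line, or None.

    difflib similarity stands in for an LLM / embedding matcher.
    """
    matches = difflib.get_close_matches(
        description.lower(), list(EMISSION_FACTORS), n=1, cutoff=cutoff
    )
    return matches[0] if matches else None


def estimate_scope3(ledger):
    """Sum spend x factor over matched lines; return (kg CO2e, unmatched descriptions)."""
    total_kg, unmatched = 0.0, []
    for description, spend_usd in ledger:
        category = match_category(description)
        if category is None:
            unmatched.append(description)   # send to a human reviewer instead of guessing
            continue
        total_kg += spend_usd * EMISSION_FACTORS[category]
    return total_kg, unmatched


ledger = [("Air travel - economy", 10_000.0), ("Cloud computing", 5_000.0),
          ("Steel products (rebar)", 2_000.0), ("Legal fees", 8_000.0)]
total_kg, unmatched = estimate_scope3(ledger)
print(round(total_kg), unmatched)
```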

8.5 Direct air capture and CDR verification

The carbon-removal (CDR) industry — Climeworks Orca/Mammoth, Heirloom, CarbonCapture Inc, Octavia Carbon, Charm Industrial, Lithos Carbon (enhanced rock weathering), Running Tide (ocean alkalinity), Vesta, Planetary Technologies, Ebb Carbon — depends on credible MRV. ML is being applied to:

Inverse modeling of atmospheric CO2 anomalies near DAC plants (OCO-3 plus regional models).
Olivine and basalt weathering rate estimation from soil-sample spectra.
Ocean alkalinity verification from satellite ocean-color and float networks (e.g. BGC-Argo).
Forecasting and routing of biomass feedstock for BECCS plants.
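Atmospheric inverse modelling of the kind listed first above is usually posed as a Bayesian linear inversion: observations y relate to unknown fluxes x through a transport Jacobian H, with Gaussian prior (x_a, S_a) and observation errors S_o, giving x̂ = x_a + K(y − H x_a) with K = S_a Hᵀ(H S_a Hᵀ + S_o)⁻¹. A minimal numpy sketch with a synthetic 2-source, 4-observation setup; H, the prior, and the error covariances are invented for illustration and are not an OCO-3 or DAC-plant configuration.

```python
import numpy as np


def bayesian_inversion(y, H, x_prior, S_prior, S_obs):
    """Posterior mean and covariance for y = H x + noise with a Gaussian prior on x."""
    S_y = H @ S_prior @ H.T + S_obs                 # predicted observation covariance
    K = S_prior @ H.T @ np.linalg.inv(S_y)          # gain matrix
    x_post = x_prior + K @ (y - H @ x_prior)        # update by the innovation
    S_post = S_prior - K @ H @ S_prior
    return x_post, S_post


# HYPOTHETICAL setup: 2 sources, 4 column-CO2 enhancements (ppm) from a transport model.
H = np.array([[0.8, 0.1], [0.5, 0.3], [0.2, 0.7], [0.1, 0.9]])   # ppm per unit flux
x_true = np.array([2.0, 1.0])
rng = np.random.default_rng(0)
y = H @ x_true + rng.normal(0.0, 0.05, size=4)
x_prior = np.array([1.0, 1.0])
S_prior = np.diag([1.0, 1.0])
S_obs = np.diag(np.full(4, 0.05 ** 2))
x_post, S_post = bayesian_inversion(y, H, x_prior, S_prior, S_obs)
print(np.round(x_post, 2), np.round(np.sqrt(np.diag(S_post)), 3))
```

The posterior standard deviations are as informative as the mean: for a small facility they show whether the observing system can constrain the flux at all.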

8.6 CO2 transport and storage

Geological storage requires monitoring of subsurface CO2 plumes via 4D seismic, InSAR surface deformation, microseismicity, downhole pressure / temperature / fluid sampling. ML enters at:

Plume tracking — convolutional + recurrent models trained on history-matched reservoir simulations (Sleipner Norway, Quest Canada, Decatur Illinois operational datasets).
Wellbore integrity — anomaly detection on distributed-fiber-optic sensing (DFOS) — DAS (acoustic), DTS (temperature), DSS (strain). Used by Equinor, Shell, BP Northern Endurance Partnership, Storegga.
Site selection — ML over basin-scale geological databases (NETL CO2 SCOPE, IEA GHG, BGR Germany, GA Australia) to rank prospects.
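Wellbore-integrity monitoring on DTS traces can be illustrated with a robust per-channel anomaly score: compare each depth channel's latest reading with the median and median absolute deviation (MAD) of its own recent history. A minimal sketch; the threshold k, window length, channel count, and synthetic data are assumptions, and operational systems use learned models rather than a fixed z-score.

```python
import numpy as np


def robust_zscores(history: np.ndarray, latest: np.ndarray) -> np.ndarray:
    """Robust z-score of the latest reading per channel.

    history: (n_times, n_channels) recent readings; latest: (n_channels,).
    """
    median = np.median(history, axis=0)
    mad = np.median(np.abs(history - median), axis=0)
    scale = 1.4826 * mad + 1e-9      # MAD -> std for Gaussian noise; epsilon avoids /0
    return (latest - median) / scale


def flag_channels(history, latest, k=5.0):
    """Indices of channels whose |robust z| exceeds k (k is an assumed threshold)."""
    return np.flatnonzero(np.abs(robust_zscores(history, latest)) > k)


# Synthetic DTS example: 200 past readings on 500 depth channels, noise 0.05 degC.
rng = np.random.default_rng(42)
history = 40.0 + rng.normal(0.0, 0.05, size=(200, 500))
latest = 40.0 + rng.normal(0.0, 0.05, size=500)
latest[310] += 1.5               # injected warm anomaly (hypothetical leak signature)
print(flag_channels(history, latest))
```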

9. Climate adaptation

See climate-impacts-and-adaptation for the full impacts inventory.

9.1 Flood forecasting

Google Flood Forecasting Initiative — Nearing, Cohen, Dube et al (Google Research, including the Hassidim / Matias group in Tel Aviv), Nature 627:559-563 (Mar 2024), building on the operational framework of Nevo, Morin, Gerzi Rosenthal et al 2022 (HESS). LSTM models trained on 5680 globally distributed gauges provide 7-day forecasts; coverage reached 80+ countries (~460 M people) by end-2023 and 100+ countries in 2024, focused on ungauged Global South basins.
Floodbase (formerly Cloud-to-Street, NYC, founded Beth Tellman + Bessie Schwarz 2017, acquired by Munich Re unit 2024) — real-time flood-extent monitoring from Sentinel-1 SAR.
Fathom (Bristol UK) — global flood hazard maps at 30 m resolution; widely used by reinsurance.
JBA Risk Management (Skipton UK).
PrecisionHawk — drone-based post-flood damage assessment.
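Streamflow models such as these are commonly scored with Nash-Sutcliffe efficiency (NSE) and Kling-Gupta efficiency (KGE, 2009 form); 1 is a perfect fit, and NSE = 0 means no better than predicting the observed mean. A minimal numpy sketch on a made-up hydrograph; which metrics and variants a given paper reports should be checked against that paper.

```python
import numpy as np


def nse(sim, obs):
    """Nash-Sutcliffe efficiency: 1 - SSE / spread of observations about their mean."""
    sim, obs = np.asarray(sim, float), np.asarray(obs, float)
    return 1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2)


def kge(sim, obs):
    """Kling-Gupta efficiency (2009): correlation, variability ratio, bias ratio."""
    sim, obs = np.asarray(sim, float), np.asarray(obs, float)
    r = np.corrcoef(sim, obs)[0, 1]
    alpha = sim.std() / obs.std()
    beta = sim.mean() / obs.mean()
    return 1.0 - np.sqrt((r - 1.0) ** 2 + (alpha - 1.0) ** 2 + (beta - 1.0) ** 2)


# Made-up 7-day hydrograph (m3/s) around a flood peak.
obs = np.array([10.0, 12.0, 30.0, 55.0, 40.0, 22.0, 15.0])
sim = np.array([11.0, 13.0, 26.0, 50.0, 43.0, 24.0, 14.0])
print(f"NSE={nse(sim, obs):.3f}  KGE={kge(sim, obs):.3f}")
```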

9.2 Wildfire risk

Pano AI (San Francisco) — mountaintop panoramic cameras + ML detection.
Vibrant Planet (Truckee CA, founded Allison Wolff 2020) — wildfire mitigation + forest-management planning platform.
First Street Foundation Risk Factor — property-level fire + flood + heat risk scores.
ALERTCalifornia — UCSD + Cal Fire 1100-camera detection network.
Reax Engineering — wildfire structure-loss modeling.

9.3 Tropical cyclones and commercial ML weather

Salient Predictions (Boston) — subseasonal-to-seasonal forecasting using ML + climate teleconnections.
Tomorrow.io (Boston; formerly ClimaCell, rebranded 2021) — proprietary radar smallsat constellation + ML weather products.
Atmo (San Francisco, founded Alexander Levy + Johan Mathe 2023) — operational ML-only national weather forecasting (Philippines, Indonesia, Tuvalu); Aurora-class transformer architectures.
NOAA NHC HAFS (Hurricane Analysis and Forecast System) — operational since 2023, displacing HWRF; ML-augmented intensity guidance in development for 2026 cycles.
DeepCyclone, HURDAT-trained transformers — academic research on rapid-intensification prediction (Maskey et al 2020; Chen, Wang et al 2024).
Insurance-grade catastrophe modeling — Karen Clark & Company, Verisk AIR Worldwide, RMS (Moody’s), CoreLogic, Reask, Mitiga Solutions increasingly incorporate ML hazard layers alongside physical loss models.

9.4 Drought and index insurance

NDMC US Drought Monitor — University of Nebraska-Lincoln (NDMC) + USDA + NOAA, weekly composite.
Satellite vegetation anomaly products — NDVI / EVI from Sentinel-2 (and MODIS, Landsat); SIF (sun-induced fluorescence) from TROPOMI and OCO-2.
African Risk Capacity (AU specialized agency since 2014) — sovereign-level drought insurance.
WIBI (Weather Index-Based Insurance) — ACRE Africa, Kilimo Salama (Syngenta Foundation + Safaricom), Pula Advisors.
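Index insurance pays out on an index rather than on assessed losses. A minimal sketch, assuming NDVI computed from Sentinel-2 red (B4) and near-infrared (B8) reflectances, a z-score anomaly against the same period in past seasons, and a linear trigger/exit payout schedule; the trigger, exit, sum insured, and NDVI history are hypothetical contract terms and data, not those of ARC or any product listed above.

```python
import numpy as np


def ndvi(red, nir):
    """Normalized difference vegetation index, e.g. Sentinel-2 B4 (red) and B8 (NIR)."""
    red, nir = np.asarray(red, float), np.asarray(nir, float)
    return (nir - red) / (nir + red + 1e-12)


def standardized_anomaly(current, history):
    """z-score of the current value against the same period in past seasons."""
    history = np.asarray(history, float)
    return (current - history.mean(axis=0)) / history.std(axis=0, ddof=1)


def linear_payout(index, trigger, exit_, sum_insured):
    """Zero above the trigger, full sum insured at or below the exit, linear in between."""
    fraction = np.clip((trigger - index) / (trigger - exit_), 0.0, 1.0)
    return fraction * sum_insured


# HYPOTHETICAL zone: mid-season NDVI for 10 past seasons and this season's reflectances.
past_ndvi = np.array([0.62, 0.58, 0.65, 0.60, 0.55, 0.63, 0.59, 0.61, 0.64, 0.57])
this_ndvi = ndvi(red=0.05, nir=0.17)
z = standardized_anomaly(this_ndvi, past_ndvi)
payout = linear_payout(z, trigger=-1.0, exit_=-2.5, sum_insured=50_000.0)  # assumed terms
print(f"NDVI={float(this_ndvi):.3f}  z={float(z):.2f}  payout={float(payout):,.0f}")
```

With these made-up numbers the season's NDVI (about 0.55) sits roughly 1.8 standard deviations below its history, between trigger and exit, so a little over half the sum insured is paid. Basis risk, the mismatch between index and actual farm losses, is the main weakness of this design.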

9.4.1 Water resources

NASA GRACE / GRACE-FO groundwater — Famiglietti et al — monthly water-storage anomalies; ML used for spatial downscaling to aquifer-scale.
OpenET (CalTrust + Desert Research Institute + EDF + NASA + USDA + USGS 2021) — daily 30 m evapotranspiration for the conterminous US, blending Landsat + climate-station data through six independent ET algorithms; used by California SGMA groundwater agencies for crop-water accounting.
Stanford OpenAg / IrrigationFlow — ML pump scheduling.
Israeli + Spanish + Saudi desalination — Acwa Power, IDE Technologies use ML for membrane fouling prediction in reverse osmosis plants.

9.5 Heat-health early warning

NOAA HeatRisk (operational 2024) — 7-day color-coded heat-health forecast, jointly with CDC.
University of Washington Adaptation Hub + Vivid Economics — urban heat-vulnerability scoring.
Climate Resilience for All — global heat action platform.

ML role: downscaling reanalysis to ~1 km urban resolution, fusing with land-surface-temperature satellite retrievals (Landsat 8/9 TIRS, MODIS LST, ECOSTRESS on ISS), and projecting onto health-outcome statistical models (excess-mortality curves, Lancet Countdown indicators).
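The downscaling-and-fusion step can be sketched as regression downscaling: fit, at coarse resolution, how near-surface air temperature relates to satellite LST, apply that relation to the fine-resolution LST, and add back the coarse residual so each coarse cell's mean is preserved. A minimal numpy sketch with one linear predictor and synthetic fields; operational products use richer predictors and ML regressors, and the grid sizes and coefficients here are hypothetical.

```python
import numpy as np


def block_mean(field: np.ndarray, factor: int) -> np.ndarray:
    """Average non-overlapping factor x factor blocks (coarse-graining)."""
    ny, nx = field.shape
    return field.reshape(ny // factor, factor, nx // factor, factor).mean(axis=(1, 3))


def downscale_linear(coarse_t2m, fine_lst, factor):
    """Regression downscaling with residual correction.

    1. Coarse-grain the fine LST to the reanalysis grid.
    2. Fit T2m ~ a * LST + b across coarse cells.
    3. Apply the fit to fine LST and add each coarse cell's residual,
       so block means of the output reproduce the coarse T2m.
    """
    coarse_lst = block_mean(fine_lst, factor)
    a, b = np.polyfit(coarse_lst.ravel(), coarse_t2m.ravel(), deg=1)
    residual = coarse_t2m - (a * coarse_lst + b)
    residual_fine = np.kron(residual, np.ones((factor, factor)))
    return a * fine_lst + b + residual_fine


# Synthetic example: 40 x 40 fine LST pixels, 4 x 4 coarse T2m cells (factor 10).
rng = np.random.default_rng(1)
factor = 10
gradient = np.linspace(0.0, 8.0, 40)[None, :]                       # west-east warming, K
fine_lst = 300.0 + gradient + 3.0 * rng.standard_normal((40, 40))   # hypothetical LST, K
coarse_t2m = 0.6 * block_mean(fine_lst, factor) + 115.0 + rng.normal(0.0, 0.3, (4, 4))
t2m_fine = downscale_linear(coarse_t2m, fine_lst, factor)
print(t2m_fine.shape, float(np.abs(block_mean(t2m_fine, factor) - coarse_t2m).max()))
```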

10. Climate justice and ethics

10.1 Energy cost of training

Strubell, Ganesh, McCallum 2019 ACL “Energy and Policy Considerations for Deep Learning in NLP.” Estimated training a 213 M-parameter Transformer with neural architecture search (NAS) at ~626,000 lb (~284 t) CO2e (US grid average). Sparked the field's attention to ML's energy footprint.
GPT-3 175 B: ~552 t CO2 estimated (Patterson et al 2021).
Llama 3.1 405 B: ~30.8 M H100 GPU-hours (700 W TDP) and ~8,900 t CO2e location-based, per Meta's 2024 model card (market-based emissions reported as zero).
Efficiency advances: FlashAttention-2/3 (Dao 2023/2024), 8-bit / 4-bit quantization (QLoRA Dettmers et al 2023), Mixture-of-Experts (Mixtral 8x7B Mistral 2023), sparse models, distillation (Hinton 2015; DistilBERT Sanh et al 2019), Speculative decoding.
Inference dominates — at frontier-model scale, cumulative inference emissions can exceed training emissions within months of deployment. Commonly cited industry estimates put ~80-90% of ML compute in serving rather than training; Patterson et al 2022 report ~60% of Google's ML energy going to inference (2019-2021).
Comparative scale — global commercial-aviation emissions are ~900 Mt CO2/yr (2023). Total data-center electricity in 2022 was ~460 TWh (IEA), of which the AI-attributable share is rising fast from a small base. Even high-end 2030 projections (~1000-1500 TWh data-center electricity) place AI within an order of magnitude of aviation, not of total fossil-fuel emissions (~37 Gt CO2/yr).
Mitigation co-benefits — most ML-for-climate work targets sectors much larger than AI’s own footprint. The implicit thesis is that every ton of CO2 emitted training and serving climate-relevant models should unlock 10-100x emissions reductions downstream. This is plausible for weather-forecast-driven grid optimization, methane leak detection, precision agriculture, and materials discovery. It is much less obvious for general-purpose LLMs serving consumer chat.
Carbon-aware computing — workload shifting in time and geography to match low-carbon grid hours. Google’s “carbon-intelligent computing platform” (Radovanovic 2020), Microsoft Azure’s region-aware deployment, AWS’s Customer Carbon Footprint Tool. See WattTime (real-time marginal emission factors) and Electricity Maps (formerly Tomorrow) for the data layer.

See datacenter energy and cooling for the data-center side, and ai hardware and accelerators for chip-level efficiency.
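The footprint figures in 10.1 all reduce to the same arithmetic (energy = accelerator-hours × average power × data-center PUE; emissions = energy × grid carbon intensity), and carbon-aware scheduling amounts to choosing when that energy is drawn. A minimal sketch; the PUE, hourly marginal-intensity forecast, and job size are hypothetical inputs, and nameplate power stands in for measured draw.

```python
def training_emissions_t(gpu_hours, avg_power_kw, pue, grid_kg_per_kwh):
    """Operational emissions (t CO2e) = GPU-h x kW x PUE x kg/kWh / 1000."""
    energy_kwh = gpu_hours * avg_power_kw * pue
    return energy_kwh * grid_kg_per_kwh / 1000.0


def best_start_hour(intensity_forecast, job_hours):
    """Start index of the contiguous window with the lowest total carbon intensity."""
    if job_hours > len(intensity_forecast):
        raise ValueError("job longer than forecast horizon")
    window_sums = [sum(intensity_forecast[s:s + job_hours])
                   for s in range(len(intensity_forecast) - job_hours + 1)]
    return min(range(len(window_sums)), key=window_sums.__getitem__)


# HYPOTHETICAL run: 1000 GPU-hours at 0.7 kW, PUE 1.1, 0.4 kg CO2e/kWh grid.
run_t = training_emissions_t(gpu_hours=1_000, avg_power_kw=0.7, pue=1.1, grid_kg_per_kwh=0.4)

# HYPOTHETICAL 24 h marginal-intensity forecast (kg CO2e/kWh) with a midday solar dip.
forecast = [0.55] * 8 + [0.40, 0.30, 0.22, 0.18, 0.17, 0.20, 0.28, 0.38] + [0.52] * 8
start = best_start_hour(forecast, job_hours=4)
job_kwh = 4 * 8 * 0.7 * 1.1          # hypothetical: 8 GPUs at 0.7 kW, PUE 1.1, for 4 h
naive_kg = job_kwh * sum(forecast[0:4]) / 4
shifted_kg = job_kwh * sum(forecast[start:start + 4]) / 4
print(round(run_t, 3), start, round(naive_kg, 1), round(shifted_kg, 1))
```

Because the job draws constant power, shifting it only changes the intensity it is multiplied by; real schedulers also weigh deadlines and the accuracy of marginal-emission forecasts.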

10.2 Equitable deployment

Disparities in compute access (Africa <0.5% of global ML accelerator capacity, 2023 estimate by Lacoste / Mila); ML expertise concentration in Global North; satellite data is open (Sentinel, Landsat) but processing infrastructure is not.
Climate Change AI Innovation Grants explicitly target Global South PIs.
Deep Learning Indaba, Black in AI, LatinX in AI, Khipu Latin American AI/ML conference — communities pushing distribution.

10.3 Climate communication

Climate Central (Princeton non-profit) — attribution-of-events, sea-level-rise mapping (Surging Seas).
IPCC Interactive Atlas (since AR6 WG1 2021) — regional climate-change projections.
Climate spiral — temperature-anomaly visualization originated by Ed Hawkins (University of Reading, also creator of the “warming stripes”), with a NASA version.
Probable Futures (founded Spencer Glendon 2020) — interactive global climate-projection maps for non-specialist audiences.
ClimateAi, Jupiter Intelligence, The Climate Service (acquired by S&P Global 2022) — enterprise climate-risk visualization.
Berkeley Earth (Rohde, Muller et al) — open temperature dataset and analysis, foundational for public-facing communication.
Carbon Brief (London) — open data-journalism on climate science, policy, energy; widely cited by AR6 IPCC authors for visualization.
Our World in Data (Oxford, founded Max Roser 2014) — open energy + emissions + climate datasets and visualizations.
Project Drawdown (founded by Paul Hawken 2014; flagship book 2017) — quantified solution portfolio + technical assessment.

10.3.1 Disaster response and humanitarian use

UN OCHA (Office for the Coordination of Humanitarian Affairs) Centre for Humanitarian Data — coordinates ML for early-action triggers.
510 Netherlands Red Cross — ML-based impact-based forecasting (IBF) for typhoon, flood, drought; partner deployments in Mozambique, Philippines, Bangladesh.
Anticipation Hub (German Red Cross / IFRC / Red Cross Climate Centre) — coordinates anticipatory action funding triggered by ML hazard forecasts.
GFDRR (Global Facility for Disaster Reduction and Recovery, World Bank) — funds open-data and ML hazard mapping in Global South.
FEWS NET (Famine Early Warning Systems Network, USAID + NOAA + NASA + USGS) — ML-augmented agro-climatic projections for ~30 food-insecure countries.

10.4 LLMs in climate workflows

Large language models have become a layer in many climate analyses:

ClimateGPT (EQTY Lab + LMU Munich + Erasmus, 2024) — retrieval-augmented over IPCC + peer-reviewed corpora.
ClimateQA (Ekimetrics + IPCC dev team) — citation-grounded climate Q&A.
ChatClimate (Vaghefi, Stammbach, Muccione et al ETH Zurich 2023) — RAG over IPCC AR6.
Anthropic Claude / OpenAI GPT-4o / Google Gemini in literature synthesis — being trialled for systematic-review acceleration in IPCC AR7 preparation and on evidence-synthesis platforms (e.g. Stanford Cool Earth, Future Earth’s Sustainability Research and Innovation Congress).
Failure modes — hallucinated citations to climate papers, confident inversions of attribution direction, false equivalences. Best practice: cite-only-retrieved, refuse-when-uncertain, human-in-the-loop for high-stakes outputs.
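The cite-only-retrieved practice can be enforced mechanically: every citation key in a model's answer must belong to the set of passages actually retrieved, otherwise the answer is rejected or routed to a human. A minimal sketch; the bracketed [doc-id] citation convention and the IDs are hypothetical, not the format of ClimateGPT, ChatClimate, or any other system named above.

```python
import re

# Hypothetical convention: citations are retrieval IDs in square brackets, e.g. [AR6-WG1-Ch11].
CITATION = re.compile(r"\[([A-Za-z0-9_.:-]+)\]")


def check_citations(answer, retrieved_ids):
    """Return (ok, unknown_ids). ok requires at least one citation and that every
    cited ID was among the passages actually retrieved for this query."""
    cited = set(CITATION.findall(answer))
    unknown = sorted(cited - set(retrieved_ids))
    return bool(cited) and not unknown, unknown


retrieved = {"AR6-WG1-SPM-A.1", "AR6-WG1-Ch11"}
good = "Hot extremes have become more frequent [AR6-WG1-SPM-A.1][AR6-WG1-Ch11]."
bad = "An unsupported claim [Smith2021]."
for answer in (good, bad, "No sources at all."):
    print(check_citations(answer, retrieved))
```

This catches fabricated citation keys but not a real key attached to a claim the passage does not support; that still needs entailment checks or human review.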

11. Key organizations and funders

Climate Change AI (CCAI) — Rolnick, Donti, Kaack et al, established 2019 non-profit out of the 1906.05433 paper. Runs CCAI Workshops at NeurIPS/ICLR/ICML; CCAI Summer School; CCAI Innovation Grants programme (~$1 M/yr).
Stripe Frontier — $1 B+ AMC (Advance Market Commitment) for carbon removal, launched 2022 by Stripe + Alphabet + Meta + Shopify + McKinsey Sustainability.
Microsoft Climate Innovation Fund — $1 B commitment 2020.
Breakthrough Energy Ventures — Bill Gates et al 2016, ~$3.5 B AUM across BEV Fund I/II/III.
Lowercarbon Capital — Chris + Crystal Sacca, $2 B+ AUM.
AI2 (Allen Institute for AI) — Climate Modeling team + Allen Institute Climate Initiative.
Climavision — proprietary radar network + ML, founded 2020 by Chris Goode + Jon Sohn (Louisville KY); private X-band radar gap-filling network in the US.
Atmo, Tomorrow.io, Salient Predictions — commercial ML weather.
ClimateGPT — EQTY Lab + LMU Munich + Erasmus + Roots & Wings 2024, open-source LLM fine-tuned on climate-science corpora (IPCC AR6, peer-reviewed papers).
ClimateBERT — Webersinke, Kraus, Bingler et al 2022, transformer fine-tuned for climate-related text classification + analysis.
Open-source toolchains: ClimateMachine.jl (CliMA Caltech), pysteps (radar nowcasting), wradlib, climetlab (ECMWF), keras-cv-climate, TorchClim, Climsight (LLM + climate database integrator, Yangyang Xu et al 2024).
Specialty datasets: CMORPH (NOAA CPC; passive-microwave precipitation estimates propagated with IR-derived cloud motion), IMERG (NASA GPM), CHIRPS (UCSB; quasi-global, originally Africa-focused, 1981+), PRISM (Oregon State, US gridded climatology), Daymet (ORNL); GFDL ESM2M (Earth system model output); 20CR Twentieth-Century Reanalysis (NOAA, sparse-obs-driven back to 1806); GISTEMP (NASA GISS, monthly surface T anomalies).
Ocean: Copernicus Marine Service (CMEMS), HYCOM, ROMS, Argo float network (~4000 active autonomous profilers globally), BGC-Argo (biogeochemical), OceanParcels Lagrangian particle tracking.
Land surface: MERIT-Hydro (Yamazaki et al — global hydrography), HydroSHEDS, GLEAM evapotranspiration, FluxNet eddy-covariance network (~900 sites globally).

11.1 Conferences and venues

NeurIPS Climate Change AI workshops (annual since 2019) — primary publication venue alongside ICLR/ICML CCAI tracks.
EGU General Assembly (Vienna, ~18 000 attendees) and AGU Fall Meeting (San Francisco, ~25 000) — dedicated ML-for-Earth sessions since ~2018.
ECMWF AI4Earth Forecasting workshops and ECMWF/EUMETSAT ML/AI annual meeting.
CIRA-NOAA AI Workshop — applied operational meteorology.
NeurIPS Datasets & Benchmarks track — increasingly important for climate datasets (e.g. ClimSim, NeurIPS 2023); other benchmarks such as WeatherBench 2 and ClimateBench appeared in JAMES.
Journals: J. Adv. Modeling Earth Systems (JAMES, the field’s home), Geophysical Research Letters (GRL), Bulletin of the American Meteorological Society (BAMS), Environmental Data Science (Cambridge 2022-), Communications Earth & Environment, npj Climate and Atmospheric Science.

12. AGI and transformative-AI debates

A more speculative thread argues that highly capable AI could accelerate the path to net-zero (Karnofsky 2021 Open Philanthropy; Christiano 2023 ARC blog; Hassabis 2024 various) — via faster materials/catalyst/biotech discovery, optimized energy systems, accelerated nuclear/fusion deployment, and aggressive industrial decarbonization. Counter-arguments emphasize that:

ML training energy is growing >2x/yr (Patterson 2021, Epoch AI estimates 2024).
NVIDIA H100 / B100 / B200 production ramps feed mainstream forecasts (IEA, Goldman Sachs 2024) in which data-center electricity demand roughly doubles or more by 2030 (see datacenter energy and cooling).
Cooling water concerns (Microsoft Phoenix 2022 disclosure: ~2.4 million litres of water/day per hyperscale campus; West Des Moines Iowa cluster ~6% of district water use).
Nuclear-powered data centers as response: Microsoft + Constellation Energy 2024 deal to restart Three Mile Island Unit 1 (~835 MW), Amazon + Talen Energy 2024 PPA at Susquehanna nuclear; small modular reactor (SMR) agreements such as Google + Kairos Power and Amazon + X-energy (both 2024), with Oklo and NuScale also courting data-center customers.

See nuclear power economics for the market context and nuclear reactor engineering for the technical foundations.

13. Limitations and critique

Hype vs deployment — many ML-for-climate demos remain papers; operational uptake at NWS, ECMWF, USGS lags by years. The 2022-2025 weather ML transition is the exception, not the norm.
Reproducibility crisis in EO ML — audits report that a large share of remote-sensing ML papers (over half, by some counts) do not release data or full code, and many compare against unfair baselines.
Weak baselines — Persistence, climatology, and well-tuned linear models often match or beat published ML on long-range forecasting tasks (Schultz et al 2021 Phil Trans R Soc A; Dueben et al 2022).
Domain expertise gap — keeping climate scientists in the loop matters; see Reichstein, Camps-Valls, Stevens et al 2019 Nature 566:195-204 “Deep learning and process understanding for data-driven Earth system science.”
Misuse risks — Geoengineering optimization (solar radiation modification deployment via ML-controlled SAI delivery) raises governance issues currently unresolved; Climate Overshoot Commission 2023, AGU 2024 statements.
Greenwashing — ML-generated ESG reports and carbon-credit narratives risk producing plausible-but-unverified disclosures. Regulators (the SEC climate disclosure rule adopted in 2024 and since stayed; EU CSRD/ESRS) increasingly demand audit trails. LLM use in disclosure-drafting workflows should be treated as a known failure mode unless coupled with retrieval grounded in primary measurement data.
Spurious correlations in EO — Vegetation-index time series correlate with hundreds of socioeconomic variables; trained classifiers can latch onto proxies (e.g. predicting “deforestation” from road patterns rather than canopy loss). Manual error analysis remains essential.
Concept drift — Many models trained on pre-2020 data show degraded skill on 2023-2024 extremes because the underlying distribution has shifted. Annual retraining + drift-monitoring (e.g. ADWIN, KS-test on inputs) is now considered minimum operational hygiene.
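The KS-test drift check can be written without dependencies: the two-sample Kolmogorov-Smirnov statistic is the largest gap between the empirical CDFs of a training-period sample and a recent sample, compared against the large-sample critical value c(α)·sqrt((n+m)/(nm)) with c(0.05) ≈ 1.358. A minimal sketch on synthetic data; in practice scipy.stats.ks_2samp returns the statistic together with a p-value.

```python
import bisect
import math
import random


def ks_statistic(a, b):
    """Two-sample KS statistic: max |F_a(x) - F_b(x)| over all sample points."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    for x in a + b:   # the supremum of two step functions is attained at a data point
        cdf_a = bisect.bisect_right(a, x) / len(a)
        cdf_b = bisect.bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


def drifted(train, recent, c_alpha=1.358):
    """True if D exceeds the large-sample critical value (c_alpha=1.358 -> alpha ~ 0.05)."""
    n, m = len(train), len(recent)
    return ks_statistic(train, recent) > c_alpha * math.sqrt((n + m) / (n * m))


# Synthetic model input, e.g. daily Tmax in degC: training era vs a warmer, wider recent window.
random.seed(0)
train = [random.gauss(25.0, 3.0) for _ in range(2000)]
recent = [random.gauss(27.5, 3.5) for _ in range(500)]
print(round(ks_statistic(train, recent), 3), drifted(train, recent))
```

Run per input feature, this flags when the retraining trigger should fire; with many features, the significance level should be corrected for multiple testing.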

14. Key papers (citable canon)

Rolnick, Donti, Kaack, Kochanski, Lacoste, Sankaran et al, 2022. “Tackling Climate Change with Machine Learning.” ACM Computing Surveys 55(2):42. arXiv:1906.05433.
Reichstein, Camps-Valls, Stevens, Jung, Denzler, Carvalhais, Prabhat, 2019. “Deep learning and process understanding for data-driven Earth system science.” Nature 566:195-204.
Watson-Parris et al, 2022. ClimateBench v1.0. J. Adv. Mod. Earth Sys. 14:e2021MS002954.
Lam, Sanchez-Gonzalez, Willson et al, 2023. “Learning skillful medium-range global weather forecasting” (GraphCast). Science 382:1416-1421.
Bi, Xie, Zhang, Wang, Tian, 2023. “Accurate medium-range global weather forecasting with 3D neural networks” (Pangu-Weather). Nature 619:533-538.
Pathak, Subramanian, Harrington et al, 2022. “FourCastNet.” arXiv:2202.11214.
Kochkov, Yuval, Langmore et al, 2024. “Neural general circulation models for weather and climate” (NeuralGCM). Nature 632:1060-1066.
Price, Sanchez-Gonzalez, Alet et al, 2025. “Probabilistic weather forecasting with machine learning” (GenCast). Nature 637:84-90.
Bodnar, Bruinsma, Lucic et al, 2024. “Aurora: A Foundation Model of the Atmosphere.” arXiv:2405.13063.
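Nevo, Morin, Gerzi Rosenthal et al, 2022. “Flood forecasting with machine learning models in an operational framework.” Hydrol. Earth Syst. Sci. 26:4013-4032.
Nearing, Cohen, Dube et al, 2024. “Global prediction of extreme floods in ungauged watersheds” (Google Flood Forecasting Initiative). Nature 627:559-563.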
Merchant, Batzner, Schoenholz et al, 2023. “Scaling deep learning for materials discovery” (GNoME). Nature 624:80-85.
Szymanski, Rendy, Fei et al, 2023. “An autonomous laboratory for the accelerated synthesis of novel materials” (A-Lab). Nature 624:86-91.
Degrave, Felici, Buchli et al, 2022. “Magnetic control of tokamak plasmas through deep reinforcement learning.” Nature 602:414-419.
Severson, Attia, Jin et al, 2019. “Data-driven prediction of battery cycle life before capacity degradation.” Nature Energy 4:383-391.
Strubell, Ganesh, McCallum, 2019. “Energy and Policy Considerations for Deep Learning in NLP.” ACL 2019.
Bauer, Thorpe, Brunet, 2015. “The quiet revolution of numerical weather prediction.” Nature 525:47-55.
Rasp, Pritchard, Gentine, 2018. “Deep learning to represent subgrid processes in climate models.” PNAS 115:9684-9689.
Beucler, Pritchard, Rasp, Ott, Baldi, Gentine, 2021. “Enforcing analytic constraints in neural networks emulating physical systems.” PRL 126:098302.
Hersbach et al, 2020. “The ERA5 global reanalysis.” QJRMS 146:1999-2049.
Badgley, Freeman, Hamman, Cullenward et al, 2022. “Systematic over-crediting in California’s forest carbon offsets program.” Global Change Biology 28:1433-1445.

15. Tools and data libraries

Reanalyses: ERA5 (ECMWF Copernicus Climate Data Store, 1940-present); MERRA-2 (NASA GMAO); JRA-55/JRA-3Q (JMA); NCEP/NCAR.
CMIP6 archive — Earth System Grid Federation (ESGF), 100+ models, 50 PB.
Google Earth Engine — petabyte-scale planetary geospatial analysis; free for non-commercial.
Microsoft Planetary Computer — STAC-indexed PB-scale data + free compute for sustainability projects.
Copernicus Climate Data Store (CDS) + Atmosphere Data Store (ADS) — open ECMWF products.
Python stack: xarray (label-aware n-D arrays), Dask (parallel computing), Zarr (chunked array storage), cfgrib + eccodes (GRIB I/O), cartopy + matplotlib + iris (mapping/plotting), netCDF4.
ML frameworks: PyTorch, JAX (NeuralGCM, GraphCast, GenCast all use JAX), Flax, Equinox, Haiku.
Specialty libraries: pysteps (radar precipitation nowcasting), wradlib (weather radar), climetlab (ECMWF data access), earth2mip (NVIDIA — model intercomparison framework), TorchGeo (geospatial datasets + transforms), Raster Vision (Azavea, EO pipelines), TerraTorch (IBM, fine-tuning foundation models on EO).
Differentiable physics: JAX-CFD (Google Research — differentiable computational fluid dynamics), PhiFlow (TU Munich), NeuralGCM code (DeepMind, open-source released 2024), Pangolin (CliMA — differentiable Julia ocean model).
Workflow orchestration: Pangeo stack (xarray + Dask + Zarr + JupyterHub on Kubernetes; the de-facto Python community standard), Pangeo-Forge for building analysis-ready cloud-optimized datasets, Prefect / Airflow for production pipelines, Snakemake for reproducible bioinformatics-style climate pipelines.
Visualization beyond matplotlib: HoloViz stack (Datashader, hvPlot, Panel), Folium / Kepler.gl for web maps, napari for n-D image inspection, CesiumJS + Mapbox for 3D globe rendering.
Operational forecast platforms: NVIDIA Earth-2, Google WeatherNext (GraphCast / GenCast lineage); the open-source NWP movement.

15.1 Reference workflows

A handful of canonical workflows recur across nearly every applied project:

ERA5 + xarray + Zarr ingest — convert from GRIB to chunked Zarr for random-access training; standard chunking ~(time=24, lat=720, lon=1440).
STAC + Planetary Computer Sentinel-2 pipeline — query by AOI + cloud-cover, lazy load via rioxarray, mosaic with stackstac.
PyTorch Lightning + Weights & Biases ML training loop — distributed multi-GPU with DDP / FSDP; gradient checkpointing for large models.
JAX + Haiku/Equinox + Optax — preferred for differentiable physics (NeuralGCM, GraphCast). Single-program-multiple-data via pjit / shard_map.
earth2mip / earth2studio (NVIDIA) — inference harness for FourCastNet / GraphCast / Pangu rollouts at scale.
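Chunk shapes like the ERA5 one above are easiest to sanity-check by arithmetic: bytes per chunk and chunks per dimension. A minimal sketch assuming one float32 variable on the 0.25° ERA5 grid (721 latitudes × 1440 longitudes) for one year of hourly data; with xarray the chosen chunking would then be applied via Dataset.chunk(...) before Dataset.to_zarr(...).

```python
import math


def chunk_report(shape, chunks, itemsize=4):
    """Bytes in one full chunk and number of chunks along each dimension."""
    chunk_bytes = math.prod(chunks) * itemsize
    n_chunks = [math.ceil(s / c) for s, c in zip(shape, chunks)]
    return chunk_bytes, n_chunks


# Assumed: one float32 variable, one year of hourly 0.25-degree ERA5 (721 x 1440 grid).
shape = (8760, 721, 1440)
for chunks in [(24, 720, 1440), (24, 721, 1440)]:
    nbytes, n = chunk_report(shape, chunks)
    print(chunks, f"{nbytes / 2**20:.1f} MiB per chunk", "chunks per dim:", n)
```

Both choices give roughly 95 MiB chunks, but on the full 721-row grid a 720-row chunk leaves a one-row remainder chunk per time slab; either chunk at 721 rows or crop the grid to 720 rows first.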

16. Outlook (2026-2030)

Operational ML weather will become standard alongside NWP at all major centers: ECMWF's AIFS has run operationally since February 2025, and NOAA Graph-EFS, JMA, and DWD production systems are likely in 2026-2027.
Hybrid models like NeuralGCM extending to coupled atmosphere-ocean-land-ice systems; CliMA, E3SM, ICON, IFS efforts converging.
Foundation models for EO will commoditize: open weights for Prithvi, Clay, Aurora, TerraMind, plus task-specific fine-tunes will reach hundreds of derivative deployments.
Materials and catalyst discovery: GNoME-style at-scale screening combined with self-driving labs (A-Lab, MERMaid, Polybot) will compress 10-year discovery cycles to 1-2 years for battery, electrolyzer, and concrete chemistry.
Carbon-removal MRV: emergence of ML-audited registries (CDR.fyi style) backed by hyperspectral + isotopic ground truth; potential consolidation among credit raters.
National AI-for-climate strategies — the UK AI Safety Institute climate workstream, US DOE AI4Science, EU Destination Earth, China’s AI weather initiatives are all formalizing. Expect coordination + competition dynamics similar to those in foundation-model policy.
Open-data preservation — the open-data foundations the field depends on (Sentinel, Landsat, ERA5, CMIP6) face geopolitical and budget risk. Community mirrors and archive-redundancy efforts (NOAA Data Rescue, Climate Mirror) will matter more.
Hybrid weather-climate-EO models — 2026-2028 may bring end-to-end models that ingest satellite radiances, produce the atmospheric state, run ensemble forecasts, and feed downstream impact models, all in one differentiable pipeline.
Sub-seasonal-to-seasonal (S2S) skill — the “predictability gap” between 2-week weather and decadal climate is where the next round of operational ML revenue and impact will be earned.
AGI-class systems: if realized, could compress R&D timelines on the entire mitigation stack — but their own energy demand becomes a first-order climate variable, motivating nuclear/geothermal/SMR data-center deployment.
Risk axis: regulatory regimes for ML weather forecasts (aviation, marine), ML carbon credits (EU CRCF, Paris Agreement Article 6 mechanisms), and ML-controlled grid operations.

Cost curves and unit economics

A useful framing is to track the “cost per useful forecast” or “cost per ton CO2 verified” over time:

10-day deterministic global NWP: ~10 USD compute per forecast on Tier-1 supercomputers; relatively flat over the past decade despite resolution increases (Moore’s-law-style efficiency offsetting compute growth).
10-day ML deterministic forecast (GraphCast / Pangu / AIFS): ~0.01-0.05 USD per forecast on cloud GPU; 100-1000x cheaper than NWP at inference.
1000-member probabilistic ML ensemble (GenCast-class): ~5-20 USD per ensemble, roughly the cost of a single deterministic NWP run, yet with ~20x the members of a 51-member NWP ensemble.
ML carbon-credit MRV (forest, soil): ~0.05-2.00 USD per ton CO2 verified; vs ~1-10 USD/ton for traditional on-the-ground audit. Trust margins still being established.
Methane super-emitter detection via satellite + ML: ~10-100 USD per detected event flagged (Kayrros, GHGSat operational cost); leak repair often Pareto-dominates flaring/venting penalties under EPA OOOOb and EU Methane Regulation 2024.

These cost compressions are not merely incremental — they have already begun changing which decisions can be made data-driven (e.g. neighborhood-scale climate-risk pricing, intra-day grid-balancing markets, near-real-time supply-chain emissions auditing).
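Working the quoted ranges through makes the ratios concrete. The sketch below is only back-of-envelope arithmetic on the rough per-forecast figures listed above (the constant names are hypothetical, not measured benchmarks): it reproduces the deterministic cost ratio, the per-member cost of a 1000-member ML ensemble, and the member-count ratio against a 51-member NWP ensemble.

```python
# Back-of-envelope unit economics using the rough ranges quoted in the list above.
# These are order-of-magnitude estimates, not measured benchmarks.

NWP_DETERMINISTIC_USD = 10.0          # ~10 USD per 10-day NWP forecast
ML_DETERMINISTIC_USD = (0.01, 0.05)   # ~0.01-0.05 USD per 10-day ML forecast
ML_ENSEMBLE_USD = (5.0, 20.0)         # ~5-20 USD per 1000-member ML ensemble
ML_ENSEMBLE_MEMBERS = 1000
NWP_ENSEMBLE_MEMBERS = 51


def speedup_range(baseline: float, cheap: tuple[float, float]) -> tuple[float, float]:
    """Return the (min, max) ratio baseline / cost over a cost interval."""
    low, high = cheap
    return baseline / high, baseline / low


def per_member_cost(total: tuple[float, float], members: int) -> tuple[float, float]:
    """Spread an ensemble's total cost evenly across its members."""
    return total[0] / members, total[1] / members


if __name__ == "__main__":
    lo, hi = speedup_range(NWP_DETERMINISTIC_USD, ML_DETERMINISTIC_USD)
    print(f"ML vs NWP deterministic: {lo:.0f}x to {hi:.0f}x cheaper")
    m_lo, m_hi = per_member_cost(ML_ENSEMBLE_USD, ML_ENSEMBLE_MEMBERS)
    print(f"ML ensemble cost per member: {m_lo:.3f}-{m_hi:.3f} USD")
    print(f"Member ratio: {ML_ENSEMBLE_MEMBERS / NWP_ENSEMBLE_MEMBERS:.1f}x")
```

With these inputs the deterministic ratio comes out at roughly 200x to 1000x, and each ML ensemble member costs about half a cent to two cents.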

Failure modes to anticipate

Equally informative is to list what is most likely to go wrong in the next 5 years:

An ML weather model gives a confidently wrong forecast on a major extreme — likely scenario: a heat dome, tropical cyclone rapid intensification, or atmospheric river event lies outside the training envelope and one of the deployed models substantially under-forecasts it, exposing the lack of distribution-shift robustness.
A widely cited carbon-credit project, ML-MRV verified, turns out to be inflated — Verra / Gold Standard scandal redux, but with ML-derived backing data that adds a layer of plausible deniability.
A foundation model trained on biased EO data delivers systematically wrong outputs in a Global South region, triggering an accountability crisis (compare the skin-tone disparities that Gender Shades documented in commercial face-analysis systems).
Carbon footprint of generative AI becomes a regulatory issue (EU AI Act, US SEC disclosure rule extensions) faster than the industry expects.
Solar geoengineering moves from research to early field experiments under thin governance; ML-enabled optimization of SAI delivery becomes a flashpoint.
A flagship ML weather model is pulled offline after a high-profile miss during a major event, forcing the operational community to formalize fallback procedures and dual-model verification.
An open-weights foundation model is fine-tuned for misuse — e.g. precision-targeted ecological damage, market manipulation via weather-trading, surveillance of indigenous land use.
Vendor lock-in for compute becomes a climate-policy issue when national met services find themselves dependent on a single cloud provider for forecast inference.

What to watch

First fully operational replacement of a Tier-1 deterministic global NWP run by an ML-only model (likely AIFS at ECMWF or a Pangu/GraphCast variant at a national met service).
First ML-derived material in commercial production for grid-scale storage, electrolyzer, or low-CO2 cement (closest candidates: GNoME Li-ion solid electrolytes via Berkeley A-Lab, MatterSim-screened catalysts).
First nuclear-powered hyperscaler data center brought online (Three Mile Island 2027-2028 target; SMR deployments 2030+).
First ML-MRV-verified carbon credit to clear a major registry’s quality bar (Verra VM0047 dynamic-baseline ARR, Gold Standard updates).
First operational fusion plasma controller trained via RL on a power-relevant device (likely SPARC commissioning 2027-2028).

16.1 Deeper case studies

Case study A — GraphCast operationalization

DeepMind’s GraphCast (Lam et al, Science 2023) provides the cleanest example of an ML model crossing from research to operations within 12 months. Released as open weights December 2023, the model was integrated into ECMWF’s experimental forecast system Q1 2024, then into the European Weather Cloud for member-state access mid-2024. The lessons learned:

Reproducibility matters: open weights + open inference code + working examples in JAX collapsed the time from publication to third-party operational reproduction from years to weeks.
Verification suite is the bottleneck: rather than retraining or fine-tuning the model, ECMWF spent most of the integration effort building automated verification pipelines (Z500, T2m, MSLP, wind, precipitation, TC tracking) so forecasters could trust the outputs.
The “boring” interfaces dominate: ingest ECMWF-IFS-format GRIB analysis, output GRIB forecasts, log to existing forecast verification dashboards. Most of the systems-integration work has nothing to do with ML.
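Much of that verification work reduces to a few well-defined scores computed per variable and lead time. As a minimal sketch, assuming the usual cos(latitude) area weighting used in WeatherBench-style scorecards, latitude-weighted RMSE and anomaly correlation coefficient (ACC) for one 2-D field (for example Z500) look like this. Fields are plain nested lists with one row per latitude; the function names are illustrative, not ECMWF's pipeline API.

```python
import math


def lat_weights(lats_deg: list[float]) -> list[float]:
    """cos(latitude) weights normalised to mean 1, so each grid row counts by its area."""
    raw = [math.cos(math.radians(lat)) for lat in lats_deg]
    mean = sum(raw) / len(raw)
    return [w / mean for w in raw]


def weighted_rmse(forecast, truth, lats_deg) -> float:
    """Latitude-weighted RMSE of one 2-D field (rows = latitudes, columns = longitudes)."""
    w = lat_weights(lats_deg)
    num = den = 0.0
    for wi, f_row, t_row in zip(w, forecast, truth):
        for f, t in zip(f_row, t_row):
            num += wi * (f - t) ** 2
            den += wi
    return math.sqrt(num / den)


def weighted_acc(forecast, truth, climatology, lats_deg) -> float:
    """Latitude-weighted anomaly correlation: correlate departures from climatology."""
    w = lat_weights(lats_deg)
    fo = ff = oo = 0.0
    for wi, f_row, t_row, c_row in zip(w, forecast, truth, climatology):
        for f, t, c in zip(f_row, t_row, c_row):
            fa, ta = f - c, t - c  # forecast and observed anomalies
            fo += wi * fa * ta
            ff += wi * fa * fa
            oo += wi * ta * ta
    return fo / math.sqrt(ff * oo)


if __name__ == "__main__":
    lats = [-45.0, 0.0, 45.0]                    # toy 3 x 2 grid
    truth = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    clim = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
    biased = [[v + 1.0 for v in row] for row in truth]
    print(weighted_rmse(biased, truth, lats))    # uniform +1 bias gives RMSE 1
    print(weighted_acc(truth, truth, clim, lats))
```

An operational suite repeats these scores over every variable, pressure level, and lead time, which is why the scorecard rather than the model becomes the integration bottleneck.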

Case study B — Google Flood Forecasting

First demonstrated in 2018 on the Ganga-Brahmaputra basin, the system covered 80+ countries and 460 M people by 2024. Key choices:

LSTM ensemble over physical hydrology — the model treats catchments as black boxes. Validated against thousands of stream gauges, it outperforms regional hydrological models in ungauged basins via cross-basin transfer learning (Kratzert et al 2019 HESS).
Open delivery channel — flood alerts delivered via Google Search, Maps, and the open Flood Hub API, with SMS partnerships in Bangladesh (BWDB), India (CWC), and Brazil (CEMADEN).
Equity by design — the system was deliberately deployed first in regions with limited national hydrological forecasting capacity.
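To make the "LSTM over catchments" idea concrete, here is a minimal pure-Python sketch of a single LSTM cell rolled over daily forcings. It assumes, in the spirit of regional LSTM rainfall-runoff models, that each day's meteorological forcings are concatenated with static catchment attributes, so one set of weights can serve many basins. The weights are random and untrained, and `run_catchment` and the linear discharge head are hypothetical illustrations, not Google's production model.

```python
import math
import random


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, x):
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in W]


class LSTMCell:
    """Minimal single-layer LSTM cell with random (untrained) weights."""

    def __init__(self, n_in: int, n_hidden: int, seed: int = 0):
        rng = random.Random(seed)
        n_cat = n_in + n_hidden
        # One weight matrix and bias per gate: input (i), forget (f), candidate (g), output (o).
        self.W = {gate: [[rng.uniform(-0.1, 0.1) for _ in range(n_cat)]
                         for _ in range(n_hidden)] for gate in "ifgo"}
        self.b = {gate: [0.0] * n_hidden for gate in "ifgo"}
        self.n_hidden = n_hidden

    def step(self, x, h, c):
        z = list(x) + list(h)  # concatenate current input with previous hidden state
        pre = {gate: [a + b for a, b in zip(matvec(self.W[gate], z), self.b[gate])]
               for gate in "ifgo"}
        i = [sigmoid(v) for v in pre["i"]]
        f = [sigmoid(v) for v in pre["f"]]
        cand = [math.tanh(v) for v in pre["g"]]
        o = [sigmoid(v) for v in pre["o"]]
        c_new = [fi * ci + ii * gi for fi, ci, ii, gi in zip(f, c, i, cand)]
        h_new = [oi * math.tanh(ci) for oi, ci in zip(o, c_new)]
        return h_new, c_new


def run_catchment(cell, forcings, static, w_out, b_out=0.0) -> float:
    """Roll the cell over daily forcings (plus static attributes); map final state to discharge."""
    h = [0.0] * cell.n_hidden
    c = [0.0] * cell.n_hidden
    for day in forcings:
        h, c = cell.step(list(day) + list(static), h, c)
    return sum(w * hi for w, hi in zip(w_out, h)) + b_out


if __name__ == "__main__":
    cell = LSTMCell(n_in=4, n_hidden=8, seed=1)  # 2 daily forcings + 2 static attributes
    q = run_catchment(cell, [[1.0, 0.5]] * 30, static=[0.2, 0.3], w_out=[0.1] * 8)
    print(f"untrained discharge estimate: {q:.4f}")
```

Training (backpropagation through time against gauge records) is omitted; the point is that changing only `static` lets the same weights describe a different, possibly ungauged, basin.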

Case study C — Pano AI wildfire detection

Pano AI’s mountaintop camera network plus ML classifier illustrates the operational sweet spot for “boring AI”:

Constrained ML problem — binary smoke-vs-no-smoke classification on a fixed-position camera with known background.
Human-in-loop: every alert is reviewed by a 24/7 monitoring center before it is forwarded to fire-service dispatch, so false positives are filtered out before they cost responders time.
Hardware-software co-design — cameras specifically designed for the task (4K thermal + visible, pan-tilt-zoom), reducing the ML difficulty.
Commercial viability — sold to electric utilities (PG&E, SDG&E, Idaho Power) as wildfire-liability-mitigation, not to government agencies, sidestepping public procurement bottlenecks.

By 2024 Pano had deployed >2 000 cameras across the Western US, Australia, and Chile.
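Why the human review layer matters follows from base-rate arithmetic: smoke is rare in any given frame, so even a very specific classifier raises mostly false alerts at network scale. The sketch below applies Bayes' rule with purely hypothetical numbers (one frame per camera per minute, a 1-in-100,000 smoke rate, 95% sensitivity, 99.9% specificity); none of these are Pano's published figures, and only the ~2,000-camera count comes from the text.

```python
def alert_precision(base_rate: float, sensitivity: float, specificity: float) -> float:
    """P(real smoke | alert) via Bayes' rule."""
    true_pos = base_rate * sensitivity
    false_pos = (1.0 - base_rate) * (1.0 - specificity)
    return true_pos / (true_pos + false_pos)


def daily_false_alerts(n_cameras: int, frames_per_day: int,
                       base_rate: float, specificity: float) -> float:
    """Expected number of false alerts per day across the whole camera network."""
    negatives = n_cameras * frames_per_day * (1.0 - base_rate)
    return negatives * (1.0 - specificity)


if __name__ == "__main__":
    # Hypothetical operating point, for illustration only.
    p = alert_precision(base_rate=1e-5, sensitivity=0.95, specificity=0.999)
    print(f"share of alerts that are real smoke: {p:.2%}")
    n = daily_false_alerts(2000, 1440, base_rate=1e-5, specificity=0.999)
    print(f"false alerts per day: {n:.0f}")
```

Under these assumptions fewer than 1% of raw alerts are real fires and the network would raise thousands of false alerts a day, which a review center can absorb but a dispatch center could not.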

16.2 Cross-cutting infrastructure

A few horizontal pieces of infrastructure underlie a large share of the projects listed above. They are easy to overlook because they are not “ML” per se, but their absence would stop most operational deployment:

STAC (SpatioTemporal Asset Catalog) — open metadata specification for geospatial assets, now the lingua franca of Earth observation (USGS, NASA, ESA Copernicus, Microsoft Planetary Computer, Element 84). Allows uniform discovery + querying across petabyte-scale, multi-vendor archives.
Zarr — chunked, compressed, multidimensional array storage; the standard for cloud-native climate datasets. Native support in xarray; backed by Anaconda + Earthmover (commercial Arraylake).
COG (Cloud-Optimized GeoTIFF) — HTTP-byte-range-readable GeoTIFF; powers Pangeo + Planetary Computer raster I/O.
STAC + COG + Zarr combined enable lazy, parallel, lat-lon-cropped reads over the open internet — no full-archive downloads needed.
OGC API - Features / Coverages / EDR — modern web-standard alternatives to legacy WMS/WFS for serving climate + EO data.
Pangeo — the umbrella community + reference deployment (Pangeo Forge cloud-native data pipelines; Pangeo Cloud JupyterHub deployments on AWS/GCP/Azure).
NetCDF + CF conventions — the long-standing metadata convention for self-describing climate data; still the most widely supported scientific format.
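The lazy, cropped reads that STAC + COG + Zarr enable rest on one simple idea: an array is split into fixed-size chunks (Zarr) or tiles (COG), each fetchable as its own object or HTTP byte range, so a client only requests the chunks its bounding box intersects. A minimal sketch of that chunk selection on a regular lat-lon grid follows; the function names are hypothetical (this is not the zarr or xarray API), the 100 x 100 chunk shape is an assumption, and dateline wrap-around is ignored.

```python
import math


def index_range(start: float, step: float, n: int, lo: float, hi: float) -> range:
    """Indices i on a regular 1-D axis whose coordinate start + i*step lies in [lo, hi]."""
    idx = [i for i in range(n) if lo <= start + i * step <= hi]
    return range(min(idx), max(idx) + 1) if idx else range(0)


def chunk_ids(indices: range, chunk: int) -> list[int]:
    """Chunk numbers along one axis touched by a contiguous index range."""
    if len(indices) == 0:
        return []
    return list(range(indices.start // chunk, (indices.stop - 1) // chunk + 1))


def chunks_to_fetch(lat_axis, lon_axis, chunk_shape, bbox):
    """(lat_chunk, lon_chunk) keys needed for bbox = (lat_min, lat_max, lon_min, lon_max)."""
    lat_idx = index_range(*lat_axis, bbox[0], bbox[1])
    lon_idx = index_range(*lon_axis, bbox[2], bbox[3])
    return [(i, j) for i in chunk_ids(lat_idx, chunk_shape[0])
            for j in chunk_ids(lon_idx, chunk_shape[1])]


def total_chunks(n_lat: int, n_lon: int, chunk_shape) -> int:
    return math.ceil(n_lat / chunk_shape[0]) * math.ceil(n_lon / chunk_shape[1])


if __name__ == "__main__":
    # ERA5-like 0.25 degree global grid: latitude runs 90 -> -90, longitude 0 -> 359.75.
    lat_axis, lon_axis = (90.0, -0.25, 721), (0.0, 0.25, 1440)
    needed = chunks_to_fetch(lat_axis, lon_axis, (100, 100), (35.0, 70.0, 0.0, 30.0))
    print(f"fetch {len(needed)} of {total_chunks(721, 1440, (100, 100))} chunks")
```

For this European box the client touches 6 of 120 chunks per field and time step, which is the whole reason full-archive downloads become unnecessary.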

16.2.1 Compute substrate

A note on the compute backing this stack. ML weather and emulation training runs are now in the same regime as foundation-model training:

GraphCast: 32 Cloud TPU v4 devices for ~4 weeks for the final release model (DeepMind 2023).
Pangu-Weather: ~192 NVIDIA V100 GPUs for 16 days (Huawei 2023).
NeuralGCM: ~128 TPU v5 chips for several weeks (Google 2024).
Aurora: ~256 H100 GPUs for ~3 weeks for the base model (Microsoft 2024).

Inference is far cheaper: a single A100 / H100 / TPU v5e can run a 10-day GraphCast or Pangu forecast in under a minute, which is why the operational economics work even when training is concentrated at a few institutions. See ai hardware and accelerators for the accelerator landscape.
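The training/inference asymmetry is easy to quantify in accelerator-hours. The sketch below uses the Pangu-Weather figure from the list (192 GPUs for 16 days) and a hypothetical 60 seconds of single-device time per 10-day forecast; it ignores differences between GPU generations, so treat the result as an order of magnitude only.

```python
def device_hours(n_devices: int, days: float) -> float:
    """Accelerator-hours for n_devices running continuously for `days` days."""
    return n_devices * days * 24.0


def forecasts_per_training_run(train_device_hours: float, seconds_per_forecast: float) -> float:
    """Number of single-device forecasts costing the same device time as one training run."""
    return train_device_hours * 3600.0 / seconds_per_forecast


if __name__ == "__main__":
    pangu = device_hours(192, 16)                         # 73,728 GPU-hours
    n = forecasts_per_training_run(pangu, seconds_per_forecast=60.0)  # assumed 60 s per forecast
    print(f"Pangu training ~ {pangu:,.0f} GPU-hours ~ {n:,.0f} forecasts")
```

One training run is worth millions of forecasts of inference, so once weights are shared, every additional operational user adds almost nothing to the compute bill.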

16.3 Geographic concentration

Despite open data, ML-for-climate activity remains heavily concentrated geographically:

Weather-model R&D clusters in London (Google DeepMind), Reading (ECMWF headquarters), Exeter (Met Office), Boulder (NCAR), Princeton (GFDL), Beijing (Huawei, Tsinghua), Shanghai (Fudan, Shanghai AI Lab), Tokyo and Kobe (JMA, RIKEN R-CCS), and Paris (CNRS).
EO foundation models cluster around IBM Research, Microsoft AI for Science, NASA Goddard/JPL/IMPACT, Google DeepMind, ESA Φ-lab Frascati.
Materials discovery clusters at DeepMind, Microsoft Research, Berkeley Lab, MIT, CMU, Toyota Research Institute, Mila.
Climate-MRV startup activity clusters in San Francisco, Boston, London, Berlin, Toronto, Bengaluru (driven by EU CSRD + US SEC + Article 6 demand).

This concentration is a known equity issue and an explicit target of CCAI’s grant programme and the Lacuna Fund (Rockefeller + Google.org + IDRC) for African data infrastructure.

17. Open problems

A non-exhaustive list of unsolved or under-explored questions where serious ML progress would have outsized climate impact, framed as research questions where possible:

How do we evaluate ML weather models on extremes when extremes are rare by definition? WeatherBench 2 added tropical-cyclone metrics and ExtremeWeatherBench is broadening the scope of event types covered, but rare-event evaluation remains under-developed.
Can probabilistic ML emulators of GCMs span the full SSP scenario uncertainty cone? Or do they collapse around the training-set scenario distribution?
What is the right interface between ML weather forecasts and downstream economic decisions? Probabilistic outputs are scientifically appropriate but operationally underused.
How should ML-MRV systems be audited? Third-party reproducibility of carbon-credit verification remains immature; what does an independent audit standard look like?
How do we attribute responsibility when an ML forecast misses an extreme and lives are lost? Liability and indemnification regimes for ML weather and flood forecasting are not yet codified.
Long-horizon hybrid models with reliable extremes — NeuralGCM-class models are stable in the mean but their tail behavior remains poorly characterized. Quantifying out-of-distribution heat dome / atmospheric river / TC rapid-intensification events under high warming is an open research frontier.
Differentiable Earth system models — end-to-end gradients from satellite observations through dynamical core to ML parameterizations and back. Pieces exist (NeuralGCM, JAX-CFD, ClimaAtmos) but a full coupled differentiable ESM does not.
Continual learning under non-stationary climate — models trained on 1979-2022 ERA5 may degrade as the climate state shifts beyond the training envelope. Online updating without catastrophic forgetting is an active topic.
Causal inference for attribution — distinguishing climate-change-driven signals from natural variability in operational decisions requires causal machinery beyond pure correlation; Camps-Valls, Reichstein and Runge groups lead here.
Trustworthy carbon-credit MRV — designing ML systems that are robust to gaming, regress to ground truth, and survive third-party audit at scale.
Open data for solar geoengineering — should models capable of optimizing stratospheric aerosol injection delivery be openly published? Governance precedent currently nonexistent (see Reynolds 2024 Science perspective).
Equity of operational deployment — ensuring ML-derived early-warning, flood, drought, and heat-health products reach populations most exposed.
Energy of inference — as climate-aware ML moves from research demos to always-on operational pipelines, the inference energy cost itself becomes a climate-relevant variable. Carbon-intensity-aware scheduling (running heavy inference at low-carbon grid hours) is an under-deployed mitigation.
Adversarial robustness — climate-MRV and grid-control systems are increasingly attractive targets for adversarial inputs (spoofed satellite imagery, manipulated sensor streams). The security literature here is thin.
Privacy and dual-use: sub-meter satellite imagery of farms, factories, and homes carries genuine privacy and competitive-intelligence concerns. Regulatory frameworks (e.g. US commercial remote-sensing licensing under NOAA’s Office of Space Commerce) are still maturing.
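The carbon-intensity-aware scheduling mentioned under energy of inference is, in its simplest form, a sliding-window minimum: given an hourly forecast of grid carbon intensity, start a fixed-length batch job at the hour whose window has the lowest summed intensity. The sketch below assumes a hypothetical 24-hour profile with a midday solar dip; a real scheduler would consume a grid-intensity forecast and handle deadlines and preemption.

```python
def best_start_hour(intensity: list[float], duration: int) -> tuple[int, float]:
    """Start hour minimising summed carbon intensity over a contiguous window of `duration` hours."""
    if not 0 < duration <= len(intensity):
        raise ValueError("duration must be between 1 and len(intensity)")
    window = sum(intensity[:duration])
    best_start, best_sum = 0, window
    for start in range(1, len(intensity) - duration + 1):
        # Slide the window one hour: add the new last hour, drop the old first hour.
        window += intensity[start + duration - 1] - intensity[start - 1]
        if window < best_sum:
            best_start, best_sum = start, window
    return best_start, best_sum


def emissions_kg(kwh_per_hour: float, window_sum_g_per_kwh: float) -> float:
    """Job emissions for a constant hourly load, given the window's summed gCO2/kWh."""
    return kwh_per_hour * window_sum_g_per_kwh / 1000.0


if __name__ == "__main__":
    # Hypothetical gCO2/kWh profile: fossil-heavy night, solar dip around midday.
    profile = [450] * 6 + [400, 350, 300, 250, 200, 150, 150, 200, 250, 300, 350, 400] + [450] * 6
    start, total = best_start_hour(profile, duration=3)
    print(f"start at hour {start}: {emissions_kg(50.0, total):.1f} kg "
          f"vs {emissions_kg(50.0, sum(profile[:3])):.1f} kg at midnight")
```

On this toy profile a 3-hour, 50 kW job started at hour 10 emits 25 kg instead of 67.5 kg at midnight, with no change to the model itself.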

18. Practitioner notes

A handful of practical observations from working in the field, intended to save future agents some time:

Pre-2020 EO ML benchmarks are mostly broken. Many of the standard datasets (BigEarthNet, EuroSAT, AID) have severe geographic biases, label noise, and leakage. Treat any model trained only on these as a toy.
ERA5 is not ground truth. It is a reanalysis — a model-data hybrid — and inherits biases in regions with sparse observations (interior tropics, polar regions before 1979). Be especially cautious about precipitation, surface flux, and boundary-layer cloud fields.
6-hourly resolution is a constraint, not a feature. Most ML weather models are trained on 6-hourly ERA5 subsets and step forward 6 h at a time, even though ERA5 itself is hourly. Sub-daily extremes (convective storms, diurnal cycles in the tropics) are systematically under-represented.
Probabilistic > deterministic for decisions. Forecast users care about thresholds and tail risks, not point predictions. Generative ensemble models (GenCast, MetNet-3 probabilistic head) are decisively the right interface for downstream decisions.
Compute is rarely the bottleneck for impact. Inference of a SOTA weather model on a single A100 takes seconds. The bottleneck is data plumbing, deployment infrastructure, and credibility with the operational forecaster community.
Climate science is collegial. Most data is open (ECMWF, NOAA, NASA, ESA, CMIP6, Copernicus, NCAR), most code is open, most researchers are reachable. Read the methods sections of the source papers carefully before assuming a benchmark number means what you think it means.
Save your findings. Per machine-wide protocol, when a non-obvious result emerges, write it down with provenance — vault search will surface it for the next agent.
Beware single-number metrics. A GraphCast that beats IFS on Z500 ACC at day 7 may underperform on precipitation, tropical cyclones, or stratospheric variables. The right table has dozens of metric-variable-leadtime cells; demand to see them.
The reanalysis is not the world. A model that fits ERA5 perfectly may still mispredict the real atmosphere because ERA5 itself is biased. Evaluation against independent observations (radiosondes, ARGO floats, FluxNet towers) is non-negotiable.
Open-weights ≠ reproducible. Many open-weights releases lack training data, training code, evaluation code, or hyperparameters. Reproduction often takes more effort than the original paper because of missing scaffolding.
Climate timescales are slow; product timescales are fast. A useful annual-mean signal can be obscured by noise for years. Be honest about the difference between weather skill (verifiable in days) and climate-projection skill (decades).
Defer to operational meteorologists for interpretation. ML models can produce technically valid forecasts that experienced forecasters know are physically suspect. The human-in-the-loop is not vestigial.
Treat “AI” as a feature, not the product. End users — farmers, grid operators, insurers, foresters, water managers — care about decisions, not architectures. If an ML capability cannot be integrated into an existing decision workflow, it does not matter how good the benchmark numbers are.
Use the vault. Run obsidian-research.mjs query "<topic>" --hybrid before reinventing background; save back findings with the canonical tag, log breakpoints, re-index after batches. The protocol exists because climate research is wide, slow-moving, and rewards cumulative context.
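The "probabilistic > deterministic" note can be made precise with the textbook static cost-loss model: a user who can protect at cost C against a loss L should act whenever the event probability exceeds C/L. The sketch below assumes a hypothetical 50-member temperature ensemble, a 35 °C threshold, and C/L = 0.1; it shows a case where the ensemble mean stays below the threshold while the exceedance probability still justifies action.

```python
def exceedance_probability(members: list[float], threshold: float) -> float:
    """Fraction of ensemble members above the decision threshold."""
    return sum(m > threshold for m in members) / len(members)


def should_protect(p_event: float, cost: float, loss: float) -> bool:
    """Static cost-loss rule: protect when the event probability exceeds cost / loss."""
    return p_event > cost / loss


def expected_expense(p_event: float, cost: float, loss: float, protect: bool) -> float:
    """Expected expense of a decision: pay the cost, or risk the loss."""
    return cost if protect else p_event * loss


if __name__ == "__main__":
    members = [30.0] * 38 + [36.0] * 12        # hypothetical max-temperature ensemble (deg C)
    mean = sum(members) / len(members)         # 31.44: a point forecast says "no heat event"
    p = exceedance_probability(members, 35.0)  # 0.24
    act = should_protect(p, cost=1.0, loss=10.0)
    print(f"mean={mean:.2f}, p={p:.2f}, protect={act}, "
          f"expense={expected_expense(p, 1.0, 10.0, act):.2f} "
          f"vs {expected_expense(p, 1.0, 10.0, False):.2f} if ignored")
```

Acting on the probability cuts expected expense from 2.4 to 1.0 units here, a decision that the ensemble-mean point forecast alone would never trigger.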

Adjacent

physical-climate-system — dynamical foundations of GCMs, NWP, and the atmosphere being emulated.
carbon-cycle-and-greenhouse-gases — quantities the satellites and accounting tools track.
climate-impacts-and-adaptation — sectoral impacts the adaptation tooling targets.
climate-mitigation-and-adaptation — broader mitigation portfolio in which ML is one lever.
datacenter energy and cooling — the energy footprint of training and inference.
electricity markets and grid economics — where renewable-forecasting and battery-management products meet markets.
optimization and control theory — the optimization backbone behind grid + plasma + HVAC control.
nuclear reactor engineering — the SMR + reactor restart hardware backing AI compute scale-up.
process engineering and control — the chemical-engineering substrate for industrial decarbonization stacks.
agricultural machinery and automation — the precision-ag hardware platform for ML in farms.
ai hardware and accelerators — the GPU/TPU/ASIC stack ML weather and emulation models rely on.
numerical methods and pdes — discretization, solvers, and stability analysis underpinning hybrid models.
nuclear power economics — economics behind the SMR and existing-reactor-restart wave powering datacenters.
