Walkthrough: Design a Hyperscale Data Center Cooling System (50 MW IT load)
A 50 MW IT-load hyperscale data center is a mid-sized hyperscaler hall, or a single phase of a multi-phase campus. Microsoft Azure Quincy (WA), Meta Prineville (OR), Google Council Bluffs (IA), and Equinix DC campuses in Ashburn (Northern Virginia, “Data Center Alley”) all build halls in this rough class and then aggregate to 200 MW to 1 GW total. The hard problem is not generating compute — it is rejecting 50 MW of heat reliably, every second, for 15 to 20 years, in whatever climate the site happens to sit in, while pushing PUE (Power Usage Effectiveness) under 1.20 and managing both grid and water usage to satisfy 24/7 carbon-free energy commitments.
This walkthrough builds a cooling architecture for a 50 MW IT-load facility and compares three reference sites: Ashburn (Northern Virginia), Stockholm (Sweden), and Singapore. The design choice is resolved as a function of climate, water availability, rack power density, regulatory regime, and heat-reuse opportunity.
1. Heat-load envelope: what 50 MW actually means
50 MW IT load means 50 MW (~170.6 MMBtu/h) of waste heat that has to leave the white space at a temperature the silicon can tolerate. At rack scale:
- Legacy enterprise rack: 5 to 10 kW per rack — air-cooled with raised-floor CRAH (Computer Room Air Handler) plus cold-aisle containment, no exotic technology
- High-density CPU rack: 15 to 25 kW per rack — still air-coolable with cold-aisle containment, in-row coolers, or rear-door heat exchangers (RDHx)
- AI inference rack (Nvidia H100 or AMD MI300X air-cooled): 30 to 50 kW per rack — at the upper limit of practical air cooling; AMD MI300X air-cooled SKUs draw ~50 kW per rack
- AI training rack (Nvidia GB200 NVL72 with 72 Blackwell GPUs and 36 Grace CPUs in one rack): 132 kW per rack, direct-to-chip liquid cooled; air cooling alone is not practical at this density
- Nvidia GB300 NVL72 (refreshed 2025 platform): ~140 kW per rack
- Custom hyperscaler reference (Microsoft, Meta, Google internal racks for training pods): 100 to 200 kW per rack with direct-to-chip cold plates and rear-door exchangers as a backup
At 50 MW IT load with a mixed fleet of training, inference, and storage, plan for ~5,000 racks at an average of 10 kW (legacy), ~400 racks at 125 kW (Blackwell training pod), or somewhere in between. The cooling architecture is fundamentally different at the two ends of that range. As of 2026 the industry is shifting hard toward average densities of 80+ kW per rack, driven by GPU adoption.
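The rack counts above are simple division; a minimal sketch is below. The 10 kW and 125 kW densities come from the text, while the 40 kW AI-inference value is an illustrative midpoint of the 30 to 50 kW range, not a quoted figure.

```python
import math

IT_LOAD_KW = 50_000  # 50 MW IT load, from the text


def racks_needed(it_load_kw: float, avg_kw_per_rack: float) -> int:
    """Racks required to host the IT load at a given average rack density."""
    return math.ceil(it_load_kw / avg_kw_per_rack)


# Densities in kW per rack. 10 and 125 come from the text; 40 is an
# illustrative midpoint of the 30 to 50 kW AI-inference range.
scenarios = {"legacy": 10, "AI inference": 40, "Blackwell training": 125}

for name, kw in scenarios.items():
    print(f"{name:>20}: {racks_needed(IT_LOAD_KW, kw):>5} racks at {kw} kW/rack")
```

A real fleet mixes these classes, so the true count lands between the extremes.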
Heat-density consequence
Air at sea level, 20°C, has volumetric heat capacity of ~1,200 J/m³·K. Pushing 50 MW through air at a 12°C ΔT (delivered cold-aisle 22°C, return hot-aisle 34°C) requires:
V_air = 50e6 W / (1,200 J/m³·K × 12 K) = 3,470 m³/s = 7.35 million CFM
That is ~370 commercial CRAH units at 20,000 CFM each. Practical but enormous; the fan power alone is a significant fraction of the PUE budget. For liquid cooling, water at 20°C carries ~4.18 MJ/m³·K, roughly 3,500x the volumetric heat capacity of air:
V_water = 50e6 W / (4.18e6 J/m³·K × 12 K) = 1.0 m³/s = ~16,000 GPM
That fits in a single 24-inch (600 mm) header pair feeding the facility. This is why liquid is unavoidable for AI-density compute, and why every hyperscaler is moving the racks that carry its densest training loads off air.
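The air-versus-water comparison can be reproduced with a few lines. This is a sketch using the same heat load, ΔT, and volumetric heat capacities as the text; the unit-conversion constants are standard values, and the function name is chosen here for illustration.

```python
HEAT_W = 50e6          # 50 MW of heat to remove
DELTA_T_K = 12.0       # supply-to-return temperature rise used in the text
AIR_VHC = 1_200.0      # J/(m^3*K), air at ~20 C, sea level
WATER_VHC = 4.18e6     # J/(m^3*K), water at ~20 C
M3S_TO_CFM = 2118.88   # cubic feet per minute per m^3/s
M3S_TO_GPM = 15850.3   # US gallons per minute per m^3/s


def volumetric_flow_m3s(heat_w: float, vhc_j_m3k: float, delta_t_k: float) -> float:
    """Volumetric flow needed to carry heat_w at the given temperature rise."""
    return heat_w / (vhc_j_m3k * delta_t_k)


air = volumetric_flow_m3s(HEAT_W, AIR_VHC, DELTA_T_K)
water = volumetric_flow_m3s(HEAT_W, WATER_VHC, DELTA_T_K)

print(f"air:   {air:,.0f} m^3/s = {air * M3S_TO_CFM / 1e6:.2f} million CFM")
print(f"water: {water:.2f} m^3/s = {water * M3S_TO_GPM:,.0f} GPM")
print(f"air needs ~{air / water:,.0f}x the volume flow of water")
```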
2. Cooling-stack reference architectures
There are three canonical stacks, plus an emerging fourth (immersion). A 50 MW hall will typically run two or three of them in parallel for different rack classes.
Stack A: Air-cooled CRAH plus chilled-water (legacy enterprise / general compute)
Topology:
Server fans → cold aisle (~22°C) → server intake → server exhaust → hot aisle (~34°C)
→ CRAH coil → chilled-water primary loop (7°C supply, 14°C return)
→ chillers OR water-side economizer → cooling towers / dry coolers
→ atmosphere
Chillers are typically Trane Sintesis (R-1234ze, low-GWP refrigerant), York YK (Johnson Controls; R-134a or R-1233zd), or Carrier 19DV (centrifugal, magnetic-bearing, R-1233zd) at 1.5 to 3 MW thermal each. At ~2.5 MW apiece, a 50 MW hall needs ~20 duty chillers, or 40 installed at 2N redundancy. Cooling towers from BAC (Baltimore Aircoil), EVAPCO, or Marley/SPX handle heat rejection: counterflow induced-draft, sized for 1.5x peak duty for fouling and degradation margin.
PUE achievable: 1.30 to 1.45 in temperate climates; this is the colo / older hyperscaler norm.
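The chiller count for Stack A depends on the unit size chosen within the 1.5 to 3 MW range. The sketch below shows the duty (N) count and the installed count under a redundancy scheme; the function names are illustrative, and it assumes identical chillers sized against the full 50 MW thermal load.

```python
import math


def duty_units(load_mw: float, unit_mw: float) -> int:
    """Units needed to carry the load with no redundancy (the 'N' count)."""
    return math.ceil(load_mw / unit_mw)


def installed_units(n: int, scheme: str) -> int:
    """Installed count for a redundancy scheme: 'N', 'N+1', or '2N'."""
    counts = {"N": n, "N+1": n + 1, "2N": 2 * n}
    if scheme not in counts:
        raise ValueError(f"unknown redundancy scheme: {scheme}")
    return counts[scheme]


for unit_mw in (1.5, 2.5, 3.0):
    n = duty_units(50.0, unit_mw)
    print(f"{unit_mw} MW chillers: N={n}, 2N={installed_units(n, '2N')}")
```

The ~20 duty / 40 installed figure corresponds to the 2.5 MW case; smaller machines push the installed count well above that.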
Stack B: Free cooling (air-side or water-side economizer)
Air-side: outside air is filtered (MERV 13 to 14, F7/F8 EN779) and ducted directly into the white space when ambient is cold enough. Google’s Pacific Northwest sites, Meta Luleå (Sweden, just below the Arctic Circle), and Microsoft Azure Quincy (WA, high-desert dry climate) all run heavy air-side economization. Adiabatic / evaporative augmentation (water spray into the inlet air) extends the operating range into hotter, drier climates (Texas, Arizona, Iowa).
Water-side: chilled-water loop bypasses the chillers when wet-bulb is low enough — the cooling tower or dry cooler directly produces the required chilled-water temperature.
Achievable hours of “100% free cooling” per year:
- Stockholm: ~7,500 to 8,200 hours per year (>85%, dry bulb often below 10°C)
- Ashburn (Northern Virginia): ~4,500 to 5,500 hours per year (~55%, summer wet-bulb hits 26°C)
- Singapore: ~0 hours (year-round dry bulb 25 to 32°C, wet bulb 24 to 27°C — equatorial; mechanical cooling required at all times)
Stack B PUE: 1.10 to 1.20 in cold/dry climates; 1.25 to 1.40 in mixed climates.
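To see how free-cooling hours turn into an annual PUE, a rough time-weighted blend is sketched below. It assumes constant IT load (so time-weighting equals energy-weighting) and two illustrative operating points that are assumptions, not measured values: PUE 1.10 during free-cooling hours and PUE 1.45 during mechanical-cooling hours. The hour counts are midpoints of the ranges listed above.

```python
HOURS_PER_YEAR = 8760


def free_cooling_fraction(free_hours: float) -> float:
    """Share of the year spent in 100% free cooling."""
    return free_hours / HOURS_PER_YEAR


def blended_pue(free_hours: float, pue_free: float, pue_mech: float) -> float:
    """Time-weighted annual PUE, valid when IT load is constant."""
    f = free_cooling_fraction(free_hours)
    return f * pue_free + (1.0 - f) * pue_mech


# Midpoints of the hour ranges in the text; the PUE operating points are assumed.
sites = {"Stockholm": 7850, "Ashburn": 5000}
for site, hours in sites.items():
    pue = blended_pue(hours, pue_free=1.10, pue_mech=1.45)
    print(f"{site}: {free_cooling_fraction(hours):.0%} free cooling, blended PUE ~{pue:.2f}")
```

Under these assumptions the blended values fall inside the cold-climate and mixed-climate ranges quoted for Stack B, which is the point of the exercise rather than a prediction for any specific site.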
Stack C: Direct-to-chip liquid + heat reuse (AI / hyperscaler training pods, 80+ kW racks)
Topology:
GPU/CPU die → cold plate (CoolIT OMNI, Asetek RackCDU, Motivair MCDU, JetCool SmartPlate two-phase)
→ secondary loop (PG25 propylene glycol 25%, corrosion inhibitor, biocide, ~30 L/min per rack)
→ CDU (Coolant Distribution Unit — typically 1.5 to 2 MW each; Motivair, CoolIT, Vertiv Liebert XDU)
→ primary loop (warm water, 35 to 45°C supply, 45 to 55°C return)
→ dry cooler OR district heating handoff OR chilled-water loop
The CDU isolates the in-rack secondary loop (clean, filtered, monitored coolant) from the facility’s primary loop and provides leak detection, flow balancing, and pressure control. The ASHRAE TC 9.9 liquid-cooling classes W32, W45, and W+ define facility supply water temperatures of up to 32°C, up to 45°C, and above 45°C respectively, enabling year-round dry-cooler heat rejection (no chiller, no cooling tower water) even in warm climates. This is the architecture every new hyperscale AI build is centered on as of 2025-2026.
Rear-door heat exchangers (RDHx — Motivair ChilledDoor, Vertiv Liebert XDR, Stulz CyberRow series) catch the residual ~20% of heat that the cold plate misses (memory, NICs, PSU losses). With cold plate + RDHx, a rack handles 132 kW of GB200 with no white-space airflow at all — total in-room return temperature drops by 12 to 18°C compared to all-air.
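The cold-plate / RDHx split for a 132 kW rack can be sketched as follows. The 80/20 split comes from the text; the 12 K air-side ΔT and the air heat capacity are borrowed from the heat-load section and are assumptions when applied to a rear door.

```python
AIR_VHC = 1_200.0     # J/(m^3*K), air at ~20 C, sea level
M3S_TO_CFM = 2118.88  # cubic feet per minute per m^3/s


def split_rack_heat(rack_kw: float, cold_plate_fraction: float = 0.80):
    """Return (liquid_kw, air_kw): heat captured by cold plates vs. left in air."""
    liquid_kw = rack_kw * cold_plate_fraction
    return liquid_kw, rack_kw - liquid_kw


def rdhx_airflow_m3s(air_kw: float, delta_t_k: float = 12.0) -> float:
    """Airflow through the rear door needed to carry the residual heat."""
    return air_kw * 1_000.0 / (AIR_VHC * delta_t_k)


liquid_kw, air_kw = split_rack_heat(132.0)
flow = rdhx_airflow_m3s(air_kw)
print(f"cold plates: {liquid_kw:.1f} kW, rear door: {air_kw:.1f} kW")
print(f"rear-door airflow: {flow:.2f} m^3/s (~{flow * M3S_TO_CFM:,.0f} CFM)")
```

Even the residual 20% is more heat than a whole legacy rack produces, which is why the rear door is not optional at this density.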
Stack D: Immersion cooling (emerging for high-density)
Single-phase: server PCBs submerged in dielectric mineral oil or synthetic fluid; pump circulates fluid through external heat exchanger. Submer SmartPodX, GRC ICEraQ, LiquidStack, Asperitas. Useful at 50 to 100 kW per tank.
Two-phase: refrigerant boils on hot components, and vapor condenses on an overhead coil. Heat-flux capability is higher than single-phase, but the fluids are expensive and heavily regulated. The 3M Novec 7100 / Fluorinert FC-3284 family was the legacy fluid; 3M announced its exit from PFAS (per- and polyfluoroalkyl substances) production in December 2022, with wind-down completing by the end of 2025 under environmental and regulatory pressure. The industry pivoted to Opteon (Chemours) and Solvay Galden replacements; some operators paused two-phase deployments pending alternatives. LiquidStack and Submer continue with these next-gen fluids; the PFAS regulatory landscape in the EU (ECHA universal PFAS restriction proposal, 2023) and under US EPA TSCA continues to evolve.
Immersion PUE: <1.05 achievable for cooling alone, but operational complexity and fluid cost have kept it niche; ~2 to 5% of global hyperscale capacity by 2026.
3. Site-specific design: Ashburn, Stockholm, Singapore
Ashburn (Northern Virginia)
- Climate: hot-humid summer (35°C dry bulb, 26°C wet bulb peak); mild winter
- Water: Loudoun County water authority adequacy contested; data-center sector consumes ~16% of county potable water (2024 reports); county has imposed restrictions on new water-cooled builds, pushing operators toward dry coolers and adiabatic-only
- Grid: PJM Interconnection; nuclear-heavy mix plus rapidly growing natural gas; transmission saturation since 2023 forced ~7-year interconnect queues
- Architecture: Hybrid. Chilled-water plant with R-1234ze magnetic-bearing chillers (50 MW peak rejection at 35°C ambient); water-side economizer cuts in ~4,500 h/yr; direct-to-chip liquid loops on GPU racks reject to a dry-cooler array (warm-water W45 operation means no water consumption, even above 32°C ambient); evaporative augmentation limited by water restrictions
- Expected PUE: 1.18 to 1.25; WUE (Water Usage Effectiveness): 0.3 to 0.8 L/kWh-IT under restriction-compliant operation
Stockholm (Sweden)
- Climate: cold (winter -10°C, summer 25°C peak); low wet-bulb year-round
- Grid: Nordic synchronous area; hydro + nuclear dominant; among world’s cleanest grids
- Heat reuse: Stockholm Data Parks initiative (Stockholm Exergi district-heating partner); guaranteed offtake of waste heat at 35 to 65°C for district heating network serving 800,000+ residences. Bahnhof Pionen and Equinix ST1/ST2 already supply heat to the Stockholm grid. The economic and regulatory environment specifically incentivizes designing the facility around heat export — heat is sold, not rejected.
- Architecture: Direct-to-chip liquid cooling at ~50°C return → heat pump (Mayekawa NewTon-R NH3 ammonia heat pump, or MAN Energy Solutions ETES) upgrades to 80 to 90°C for district heating supply. Air-side economization for legacy racks ~8,000 h/yr. Backup chiller plant minimal.
- Expected PUE: 1.07 to 1.12. PUE itself cannot fall below 1.00, but once exported heat is credited (Energy Reuse Effectiveness, ERE) the effective figure drops below 1.00, i.e., the heat sold more than offsets the facility overhead
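One common way to express the heat-reuse credit is The Green Grid's Energy Reuse Effectiveness, ERE = (facility energy - reused energy) / IT energy, which reduces to PUE minus the reused energy per unit of IT energy. The sketch below uses a PUE of 1.10 from the range above; the reuse fractions are hypothetical, chosen only to show where the metric crosses 1.00.

```python
def ere(pue: float, reused_per_it: float) -> float:
    """ERE = (facility energy - reused energy) / IT energy = PUE - reused/IT.

    reused_per_it is the exported heat expressed as a fraction of IT energy.
    """
    if reused_per_it < 0:
        raise ValueError("reused energy cannot be negative")
    return pue - reused_per_it


# Hypothetical reuse fractions: none, a quarter, and half of IT energy exported.
for fraction in (0.0, 0.25, 0.50):
    print(f"reuse {fraction:.0%} of IT energy -> ERE {ere(1.10, fraction):.2f}")
```

With no reuse ERE equals PUE; it drops below 1.00 as soon as the exported heat exceeds the facility overhead (here, 10% of IT energy).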
Singapore
- Climate: equatorial, with dry bulb 25 to 32°C and wet bulb 24 to 27°C every day of the year; tropical maritime; relative humidity 70 to 90%
- Regulatory: Singapore IMDA imposed a data center moratorium 2019 to 2022; reopened with Pilot Programme tying new permits to PUE ≤1.30, water restrictions, renewable PPAs
- Architecture: Water-cooled chillers mandatory year-round (free cooling impossible); high-COP chillers (Carrier 19DV magnetic-bearing oil-free centrifugal, COP 6.5 to 7.0 at design); cooling towers with reclaimed NEWater; direct-to-chip liquid cooling at W32 (32°C primary supply) to reduce chiller lift; tropical-design enclosures + high-temperature operating envelopes (ASHRAE A2/A3) for IT
- Expected PUE: 1.28 to 1.35 (the equatorial penalty); regulators willing to grant licenses only at the floor of this range
4. The cooling loop architecture in detail
A reference 50 MW DC built for AI workload (the 2026 default) layers:
Loop 1: In-rack secondary
- Coolant: 25% propylene glycol + corrosion inhibitor (Chemours, Dynalene, Houghton); deionized-water alternative for closed cold-plate loops with internal demin
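- Flow: 30 to 60 L/min per rack at 35°C supply, 50°C return (15 K ΔT under W45 ASHRAE class) covers roughly 30 to 60 kW of cold-plate load; a 130 kW-class rack at the same ΔT needs on the order of 130 L/min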
- Materials: copper cold plates (CoolIT, Motivair), 316L SS / EPDM manifolds, quick-disconnect fittings (Staubli RBE, CPC LQ8)
- Quality control: 0.5 µm filtration, conductivity monitoring (<5 µS/cm), biocide dosing
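The relationship between rack heat, ΔT, and secondary-loop flow can be checked with a short calculation. The PG25 properties below (density ~1,020 kg/m³, specific heat ~3,900 J/kg·K) are approximate assumed values, not vendor data, so treat the results as order-of-magnitude.

```python
PG25_DENSITY = 1_020.0  # kg/m^3, approximate value assumed for 25% propylene glycol
PG25_CP = 3_900.0       # J/(kg*K), approximate value assumed for 25% propylene glycol


def flow_lpm(heat_w: float, delta_t_k: float) -> float:
    """Coolant flow in L/min needed to carry heat_w at the given temperature rise."""
    m3_per_s = heat_w / (PG25_DENSITY * PG25_CP * delta_t_k)
    return m3_per_s * 60_000.0


def heat_kw(lpm: float, delta_t_k: float) -> float:
    """Heat in kW carried by a given flow at the given temperature rise."""
    m3_per_s = lpm / 60_000.0
    return m3_per_s * PG25_DENSITY * PG25_CP * delta_t_k / 1_000.0


for lpm in (30, 60):
    print(f"{lpm} L/min at 15 K carries ~{heat_kw(lpm, 15.0):.0f} kW")
print(f"132 kW rack at 15 K needs ~{flow_lpm(132_000, 15.0):.0f} L/min")
```

A useful rule of thumb falls out of this: at a 15 K rise, PG25 carries about 1 kW per L/min.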
Loop 2: Coolant Distribution Unit (CDU)
- Rated 1.5 to 2 MW thermal per CDU; a 50 MW hall needs ~30 duty CDUs, so ~60 installed at 2N, or ~31 and up at N+1 (one spare hall-wide, more if each pod carries its own spare, which gives a smaller blast radius); typical products are Motivair MCDU-1500, CoolIT CHx1500, and Vertiv Liebert XDU
- Brazed-plate heat exchanger between secondary and primary; redundant pumps; flow control valves per rack manifold; leak detection (capacitive sensors, optical)
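How many CDUs end up installed depends on how the redundancy is grouped. The sketch below compares hall-wide 2N with per-pod N+1; the pod size of five duty CDUs is a hypothetical choice for illustration, and the function and key names are not from any vendor tool.

```python
import math


def cdu_plan(load_mw: float, cdu_mw: float, duty_per_pod: int) -> dict:
    """Size a CDU fleet under 2N and under per-pod N+1 redundancy."""
    duty = math.ceil(load_mw / cdu_mw)     # CDUs needed with no redundancy
    pods = math.ceil(duty / duty_per_pod)  # each pod gets one spare CDU
    return {
        "duty": duty,
        "pods": pods,
        "installed_2n": 2 * duty,
        "installed_pod_n_plus_1": duty + pods,
        "pod_load_mw": duty_per_pod * cdu_mw,  # load exposed if a whole pod fails
    }


for cdu_mw in (1.5, 2.0):
    plan = cdu_plan(50.0, cdu_mw, duty_per_pod=5)
    print(f"{cdu_mw} MW CDUs: {plan}")
```

Per-pod N+1 installs far fewer units than 2N, at the cost of tolerating only one CDU failure per pod.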
Loop 3: Primary (facility) chilled-water or warm-water
- Pipe header: 600 to 750 mm diameter Schedule 40 carbon steel or HDPE-lined
- Flow: 50 MW at 15 K ΔT = ~800 kg/s = ~0.8 m³/s (~2,900 m³/h)
- Pumps: variable-speed centrifugal (Grundfos NK, KSB Etanorm, Bell & Gossett), four duty + standby
- Temperature: W32 (32°C supply, 47°C return) for chilled operation; W45 for warm-water dry-cooler operation
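The primary-loop flow and the resulting header velocity follow directly from the heat load and ΔT. This sketch assumes plain water with a specific heat of 4,180 J/kg·K and a density of 1,000 kg/m³ (warm water is slightly less dense), and treats the header as a single full-bore pipe.

```python
import math

CP_WATER = 4_180.0   # J/(kg*K)
RHO_WATER = 1_000.0  # kg/m^3, approximate; warm water is slightly less dense


def primary_flow(heat_w: float, delta_t_k: float):
    """Return (mass flow in kg/s, volumetric flow in m^3/s) for the facility loop."""
    mass = heat_w / (CP_WATER * delta_t_k)
    return mass, mass / RHO_WATER


def pipe_velocity(vol_m3s: float, diameter_m: float) -> float:
    """Mean velocity in a full circular pipe of the given inner diameter."""
    area = math.pi * (diameter_m / 2.0) ** 2
    return vol_m3s / area


mass, vol = primary_flow(50e6, 15.0)
print(f"mass flow: {mass:.0f} kg/s, volume flow: {vol:.2f} m^3/s = {vol * 3600:,.0f} m^3/h")
for d in (0.600, 0.750):
    print(f"{d * 1000:.0f} mm header: {pipe_velocity(vol, d):.1f} m/s")
```

The 600 mm header runs at a fairly brisk velocity; the 750 mm option trades pipe cost for lower pumping losses.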
Loop 4: Heat rejection
Air-cooled chillers (Trane Sintesis, Daikin Pathfinder) OR water-cooled chillers + cooling towers (BAC, EVAPCO, Marley) OR dry coolers (Güntner, Modine, Frigel) OR district-heating handoff.
For a 50 MW AI-density build, the dominant 2026 choice is high-temperature dry-cooler arrays (no water, no chiller, just air-blast on a warm-water loop) — this is what enables Microsoft to publicly commit to zero-water-consumption builds starting 2026 (announced December 2024).
5. PUE, WUE, CUE — the performance metrics
PUE = Total Facility Energy / IT Energy
- 1.00 = theoretical ideal (no overhead)
- ~1.10 = best-in-class hyperscaler (Google fleet 2024 average 1.10, Meta 1.08, Microsoft 1.12)
- 1.25 to 1.50 = colocation industry average (Equinix global fleet 1.40, Digital Realty 1.45, NTT 1.50)
- >1.60 = aging enterprise data centers; replacement candidates
WUE = Liters of Water / kWh-IT
- 0.0 = zero-water design (air-cooled chillers + dry coolers); Microsoft target post-2026 for new builds
- 0.2 to 0.8 = adiabatic / partial-evaporative
- 1.0 to 2.0 = cooling-tower-heavy (Singapore, hot climates)
CUE (Carbon Usage Effectiveness) = kg CO2e / kWh-IT; depends on grid mix: Stockholm <0.05, Ashburn ~0.35 (PJM), Singapore ~0.40
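All three metrics can be computed from annual meter totals. The sketch below assumes a hypothetical facility running 50 MW of IT load flat for a year; the facility energy, water volume, and grid carbon intensity are illustrative inputs, not data from any of the sites discussed.

```python
from dataclasses import dataclass


@dataclass
class AnnualMeterData:
    it_kwh: float               # energy delivered to IT equipment
    facility_kwh: float         # total energy drawn by the facility
    water_liters: float         # water consumed on site
    grid_kgco2e_per_kwh: float  # carbon intensity of purchased electricity


def pue(d: AnnualMeterData) -> float:
    return d.facility_kwh / d.it_kwh


def wue(d: AnnualMeterData) -> float:
    return d.water_liters / d.it_kwh


def cue(d: AnnualMeterData) -> float:
    # Emissions from all facility energy, normalized to IT energy.
    return d.facility_kwh * d.grid_kgco2e_per_kwh / d.it_kwh


# Hypothetical year: 50 MW IT for 8,760 h, 20% overhead, 0.30 kg CO2e/kWh grid.
year = AnnualMeterData(
    it_kwh=50_000 * 8760,
    facility_kwh=50_000 * 8760 * 1.2,
    water_liters=219_000_000,
    grid_kgco2e_per_kwh=0.30,
)
print(f"PUE {pue(year):.2f}, WUE {wue(year):.2f} L/kWh, CUE {cue(year):.2f} kg CO2e/kWh")
```

Note that CUE works out to grid carbon intensity times PUE, so a lower PUE helps twice on a dirty grid.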
6. Power side (yes, you have to design this too)
Cooling cannot be designed independent of power. A 50 MW IT load with PUE 1.20 requires 60 MW total facility, served by:
- Utility feed: 230 kV transmission to a dedicated substation, two independent feeders (N+1); transformers to 34.5 kV or 13.8 kV distribution (ABB, Siemens, Hitachi Energy, GE-Prolec)
- Medium-voltage switchgear: 13.8 kV double-bus, two-circuit-breaker, electrically and mechanically interlocked (ABB Relion + GIS, Siemens 8DJH); arc-flash mitigation
- UPS: lithium-ion (Vertiv Liebert EXL S1, Schneider Galaxy VL, Eaton 93PM) increasingly preferred over VRLA lead-acid for footprint and lifecycle (~10x cycle life); 2N redundancy at the system level; ride-through ~5 to 10 minutes
- Backup generators: diesel reciprocating (Caterpillar 3516E rated 2,500 kVA, Cummins QSK95, MTU 4000 series, Kohler KD) at roughly 2 to 4 MW each. Gensets must carry the ~60 MW total facility load, not just the 50 MW of IT, so the hall needs ~15 gensets at 4 MW (30 installed at 2N), and more with smaller units. Cat 3516E is the workhorse; 95% load step in ~7 to 10 seconds. Some operators (Microsoft, Switch) are piloting natural-gas reciprocating engines, fuel cells (Bloom Energy SOFC), and lithium-ion-battery-only solutions to escape the diesel-fuel logistics and emissions penalty
- PDU + rack PDU: Schneider Square D, Vertiv Liebert PowerIT, ABB Smissline; per-outlet metering, remote switching, environmental monitoring
Resilience tier: Uptime Institute Tier III (concurrently maintainable, N+1 across all systems) is the colocation standard. Tier IV (fault-tolerant, 2N+1) is reserved for the most critical workloads; most hyperscalers instead run Tier III and put redundancy at the application / distributed-systems layer.
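The power-side sizing can be sketched in the same spirit. This assumes gensets are sized against total facility load (IT times PUE), that the UPS carries only the IT load during ride-through, and a 4 MW genset rating; all three are simplifying assumptions for illustration.

```python
import math


def facility_load_mw(it_mw: float, pue: float) -> float:
    """Total facility demand implied by the IT load and PUE."""
    return it_mw * pue


def genset_count(facility_mw: float, genset_mw: float) -> int:
    """Duty gensets needed; rounding guards against float noise before ceil."""
    return math.ceil(round(facility_mw / genset_mw, 9))


def ups_energy_mwh(it_mw: float, ride_through_min: float) -> float:
    """Usable UPS energy needed to hold the IT load for the ride-through time."""
    return it_mw * ride_through_min / 60.0


facility = facility_load_mw(50.0, 1.20)
n = genset_count(facility, 4.0)
print(f"facility load: {facility:.0f} MW, gensets: N={n}, 2N={2 * n}")
for minutes in (5, 10):
    print(f"{minutes} min ride-through: {ups_energy_mwh(50.0, minutes):.2f} MWh")
```

The ride-through window only has to bridge genset start and load acceptance, which is why a few MWh of battery is enough for a 50 MW hall.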
7. Sustainability and 24/7 carbon-free energy
The 2024-2026 wave of hyperscaler nuclear PPAs reshapes the carbon picture:
- Microsoft: 20-year PPA with Constellation to restart Three Mile Island Unit 1 (renamed Crane Clean Energy Center), targeting 2028 operation, ~835 MW capacity
- Amazon (AWS): $650M acquisition of Talen Energy’s Cumulus data center campus in Pennsylvania, adjacent to the Susquehanna nuclear plant, for direct nuclear-to-DC power; followed by an Energy Northwest agreement for X-energy SMRs and others, totaling multi-GW commitments through 2030. Separately, Equinix announced a partnership with Oklo
- Google: agreements with Kairos Power for 500 MW small modular reactor deployment (first reactor ~2030), with Constellation, with NextEra
- Meta: RFP issued Dec 2024 seeking 1 to 4 GW of nuclear capacity
- Oracle: 1 GW nuclear data center campus announced 2024
Beyond nuclear: solar + battery + wind matched on a 24/7 hourly basis (Google’s 24/7 CFE program, 2030 target; Microsoft’s 100/100/0 commitment: 100% of electricity, 100% of the time, from zero-carbon sources by 2030). Heat reuse for district heating where geography permits (Sweden, Denmark, Finland, Netherlands).
8. Refrigerants in transition
- R-22 (HCFC): phased out under the Montreal Protocol (US: banned in new equipment from 2010, production and import ended 2020)
- R-410A (HFC, GWP 2,088): being phased down under AIM Act 2020 (US) and F-Gas regulation (EU) — high-GWP HFCs reduce 85% by 2036
- R-454B (HFC blend, GWP ~466): R-410A successor for residential / small commercial chillers
- R-1234ze (HFO, GWP <1): preferred for large centrifugal chillers (Trane Sintesis, Daikin); mildly flammable (ASHRAE safety class A2L)
- R-1233zd (HFO, GWP ~5): low-pressure centrifugal preferred (York YK, Carrier 19DV)
- Ammonia R-717 (GWP 0, ODP 0): excellent thermodynamic performance, toxic — restricted to industrial settings and machinery rooms with leak detection; common in heat-pump applications (Mayekawa NewTon, GEA RedAstrum)
- CO2 R-744 (GWP 1): emerging for low-temperature and transcritical applications
- Glycol secondary loops (ethylene or propylene glycol 20 to 40%): standard between chiller and remote air handlers
The R-410A phase-down forces a 2025-2030 refit cycle across older colocation; new hyperscaler builds default to R-1234ze / R-1233zd or pure water (warm-water cooling, no refrigerant in the facility loop).
9. Reference projects and what they tell us
- Microsoft Azure Quincy (Washington): 600+ MW campus, hydropower from Columbia River; aggressive air-side economization; PUE ~1.12
- Meta Prineville (Oregon) and Meta Luleå (Sweden): pioneered direct fresh-air cooling at scale (~100% economized hours at Luleå); Open Compute Project hardware reduces fan power 30%+; PUE ~1.08
- Google Council Bluffs (Iowa) and Google Hamina (Finland): Hamina uses seawater free cooling (former paper mill site); Council Bluffs deploys evaporative cooling on flat plains
- Equinix DC campus (Ashburn, VA): 1+ GW aggregate across multiple Tier III facilities; colocation operator average PUE 1.40
- NVIDIA SuperPOD reference designs (DGX H100, DGX GB200): published cooling guidance for partner data centers; specifies cold-plate + RDHx hybrid at >50 kW/rack
- CoreWeave (Wisconsin): greenfield AI-cloud build 2024-2025; direct-to-chip liquid throughout, dry-cooler heat rejection, ~120 MW Phase 1
- xAI Memphis (“Colossus”): 100,000 H100 cluster brought online mid-2024 (later expansion to 200,000); supplemental power via on-site Solar Energy + ENGIE natural-gas turbines + diesel generators (regulatory controversy); cooling via Supermicro liquid-cooled rack reference design
- Mistral (Paris, with Equinix): European sovereign-AI build; Equinix-supplied colocation with French nuclear-grid power; ~40 MW
- Switch SUPERNAP (Las Vegas, Reno, etc.): high-density colo specialized for hot-aisle separation and thermally-zoned air management
10. Build-out timeline for a 50 MW hall
- Months 0 to 6: site selection (utility availability, fiber, climate, water, permitting, tax incentives); RFI to utility
- Months 6 to 12: substation and transformer order (12 to 18-month lead times; supply-chain constrained since 2022); design freeze
- Months 12 to 24: civil + structural + MEP construction; chiller and CDU procurement (Trane / Daikin / Motivair lead times 9 to 14 months in 2025-2026)
- Months 18 to 30: commissioning (Cx) — Level 1 component, Level 2 system, Level 3 integrated, Level 4 load-bank-tested
- Months 24 to 36: utility energization, IT deployment, ramp to full IT load
For an AI-pod retrofit in an existing hall, the cycle compresses to 6 to 12 months — but only if the facility was originally built with floor-loading, header, and CDU space for liquid cooling. Most hyperscaler 2024+ designs include this even if not initially commissioned.
11. The mistake everyone makes
Most data-center cooling problems trace to one of three errors:
- Designing for steady-state full-load only — real workloads have a 3:1 to 5:1 ratio of peak to idle; cooling that throttles efficiently down to 20% load is essential (variable-speed chillers, magnetic-bearing oil-free centrifugals, modular CDU staging)
- Mixing rack-density classes on a shared loop — a 10 kW legacy rack and a 130 kW Blackwell rack want completely different supply temperatures; force them onto a single chilled-water loop and the high-density rack throttles while the legacy rack over-cools
- Underestimating supply-water temperature tolerance — designing to W17 (17°C supply) when the silicon is rated for W45 means you carry a chiller load and a refrigerant inventory you never needed; the W32/W45 default for new AI builds is the right answer
12. Adjacent
- design-district-energy-system — for the heat-reuse side when the data center exports thermal energy to a city district network
- design-turbomachinery-cooling-loop — the closely-related industrial cooling loop with similar primary/secondary/tertiary architecture
- HVAC-systems-overview — HVAC fundamentals
- Heat-exchangers — plate, shell-and-tube, finned-tube heat-transfer surfaces relevant to CDUs and dry coolers
- Data-center-power-demand — the macro picture of AI compute pulling on grid capacity
- Refrigerants-and-GWP — global warming potential of HFCs/HFOs and the phase-down timetable