Cloud Provider Service Mapping — AWS, GCP, Azure, Oracle, Alibaba, IBM
This is a Tier 3 family-index note. It maps equivalent services across the six major public clouds so that when an architecture references “S3” you can find the GCP / Azure / Oracle / Alibaba / IBM analog in one lookup, and when a service is unique (BigQuery serverless, Cloudflare R2 egress-free, Lambda 15-minute ceiling) you see why. The companion deep-dives live in adjacent Tier 3 notes (database engines, LLM landscape, ML frameworks, observability tools) and Tier 2 ecosystem notes for each hyperscaler.
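The "one lookup" idea can be sketched as a small table keyed by capability. This is only an illustration: the `SERVICE_MAP` structure and the `find_analog` helper are hypothetical names, and the entries are a handful copied from this note rather than a complete or authoritative catalog.

```python
from typing import Optional

# Rows are capability categories, columns are vendors.
# Entries are copied from this note's tables and lists (illustrative subset).
SERVICE_MAP: dict[str, dict[str, str]] = {
    "object_storage": {
        "aws": "S3",
        "gcp": "GCS",
        "azure": "Blob Storage",
        "oracle": "OCI Object Storage",
        "alibaba": "OSS",
    },
    "managed_kubernetes": {
        "aws": "EKS",
        "gcp": "GKE",
        "azure": "AKS",
        "oracle": "OKE",
        "alibaba": "ACK",
        "ibm": "IBM Cloud Kubernetes Service",
    },
    "data_warehouse": {
        "aws": "Redshift",
        "gcp": "BigQuery",
        "azure": "Synapse Analytics",
    },
}


def find_analog(service: str, target_vendor: str) -> Optional[str]:
    """Return target_vendor's equivalent of `service`, or None if unmapped."""
    wanted = service.casefold()
    for vendors in SERVICE_MAP.values():
        if any(name.casefold() == wanted for name in vendors.values()):
            return vendors.get(target_vendor.casefold())
    return None


print(find_analog("S3", "gcp"))           # GCS
print(find_analog("BigQuery", "oracle"))  # None: no analog recorded in this subset
```

A `None` result means only that this subset records no analog, which is different from the service being genuinely unique to one vendor.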
Market context (2024 hyperscaler share)
Approximate public-cloud infrastructure share, Q4 2024 (estimates vary by analyst firm and by what is counted as IaaS+PaaS):
- AWS — ~31% of global IaaS+PaaS revenue. Still the leader by a wide margin; deceleration vs Azure’s growth rate is the running narrative.
- Microsoft Azure — ~25%. Fastest-growing of the top three, riding the OpenAI partnership (multi-tens-of-billions, multi-year) and the Entra ID / Microsoft 365 cross-sell into enterprise.
- Google Cloud (GCP) — ~11%. Smaller but profitable for the first full year in 2023 and growing at 30%+ on the back of BigQuery, Vertex AI, and TPU capacity.
- Alibaba Cloud — ~4%. Dominant in mainland China; weak outside APAC. Sells through Aliyun.
- Oracle Cloud Infrastructure (OCI) — ~3%. Smaller but punching above its weight on price/performance for compute and database workloads; the Oracle Database + OCI lock-in is a real moat.
- IBM Cloud — ~3%. Mainly hybrid + Red Hat OpenShift; the strategic story is hybrid Kubernetes via OpenShift on any cloud.
- Salesforce — ~3%. Counted in some IaaS+PaaS league tables because of Heroku + the Hyperforce migration of Sales/Service Cloud onto AWS+GCP infrastructure.
- Others (DigitalOcean, Linode/Akamai, Vultr, Hetzner, OVHcloud, Tencent Cloud, Huawei Cloud, IBM Watson, Cloudflare Workers + R2 as a partial cloud) make up the long tail.
The relevant fact for architecture choices: if you’re optimizing for a single vendor’s gravity, AWS and Azure each have enough surface area to do almost anything; GCP wins on data-warehouse and ML; OCI wins on Oracle DB workloads; Alibaba wins on China-resident traffic; IBM wins on regulated hybrid.
Compute
Virtual machines (general-purpose VMs)
| Vendor | Family / product | Notable instance types | Notes |
|---|---|---|---|
| AWS | EC2 | m6i / m7i (general Intel), m6a / m7a (AMD), m7g (Graviton ARM), c7i (compute-opt), r7i (memory-opt), x2idn (extra-large memory), g5 / g6 (NVIDIA A10G / L4), p4d / p5 / p5e / p6 (NVIDIA A100 / H100 / H200 / B200), Trn1 / Trn2 (Trainium), Inf2 (Inferentia2) | Most type diversity. Graviton ARM (m7g/c7g/r7g) offers up to ~40% better price/perf than comparable x86 instances for many workloads. |
| GCP | Compute Engine | N2 (Intel general), N2D (AMD), N4 (newest Intel), E2 (cost-optimized burst), T2D / T2A (Tau AMD/Arm), C3 (Intel Sapphire Rapids), H3 / H4 (HPC), A2 / A3 / A4 (NVIDIA A100 / H100 / B200), TPU v4 / v5e / v5p / v6 (“Trillium” 2024) | TPU is GCP-exclusive; v6 Trillium was announced in May 2024 and reached general availability in December 2024, with 4.7× the peak compute per chip of v5e. |
| Azure | Azure VMs | D-series (general), F (compute), M (memory ≥4 TB), L (storage), ND / NV / NC (GPU — H100, A100, MI300X), NCC H100 v5 (confidential GPU 2024) | Azure offers AMD MI300X (Instinct) GPUs at scale ahead of AWS/GCP; ND H100 v5 is the OpenAI training fleet. |
| Oracle | OCI Compute | E4 / E5 (AMD EPYC), A1 / A2 (Ampere Altra ARM 80–128 cores), Standard.E4 / Standard.E5 Flex, BM.GPU.H100 / BM.GPU.GB200 (NVIDIA) | OCI’s Ampere ARM is the cheapest mainstream ARM cloud VM (Always Free tier includes 4 cores + 24 GB RAM). |
| Alibaba | ECS | ecs.g7 / g8 (general), ecs.c7 (compute), ecs.r7 (memory), ecs.gn7i / gn7e (NVIDIA A10 / A100), ecs.ebmgn8 (H100) | Largest cloud presence in mainland China. |
| IBM | IBM Cloud Virtual Servers | bx2 (balanced), cx2 (compute), mx2 (memory), gx3 (NVIDIA H100/L40S) | IBM’s compute is thinner; the story is OpenShift + Red Hat. |
The Graviton story for AWS is the most consequential single VM change of the last five years: code compiled for ARM, and runtimes with mature ARM builds (Python, Node, Ruby, the JVM), typically runs 20–40% cheaper on Graviton than on Intel x86 for the same throughput, and AWS has said that Graviton accounts for more than half of the new CPU capacity it has added in recent years.
Bare-metal
- AWS — bare-metal EC2 instances (e.g. m7i.metal-48xl), AWS Outposts (AWS hardware in your data center), Snow Family for edge.
- GCP — Bare Metal Solution (mostly for SAP HANA and Oracle workloads), Google Distributed Cloud Edge.
- Azure — Azure Stack Hub / Edge / HCI for on-premises; bare-metal Azure VMware Solution.
- Oracle — Bare Metal Compute is a first-class OCI offering; price/perf-competitive for HPC.
- Alibaba — ECS Bare Metal (ebm series).
- IBM — Bare Metal Servers; Power Systems Virtual Server for AIX/IBM i.
Spot / preemptible
| Cloud | Name | Typical discount | Interruption notice / limits |
|---|---|---|---|
| AWS | Spot Instances | up to 90% off on-demand | 2-minute warning; can be reclaimed any time |
| GCP | Spot VMs (replaced Preemptible 2021) | 60–91% off | 30-second warning; max 24h for preemptible-legacy, no max for Spot |
| Azure | Azure Spot VMs | up to 90% off | 30-second warning |
| Oracle | OCI Preemptible Instances | ~50% off | hard 24-hour max |
| Alibaba | Preemptible Instances (Spot) | up to 90% off | similar to AWS |
For training jobs that can checkpoint and resume, spot is the single biggest cost lever in cloud compute. SageMaker, Vertex AI, and Azure ML all expose managed spot integration with auto-resume.
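The checkpoint-and-resume pattern that makes spot capacity usable can be sketched in plain Python. This is a minimal sketch under stated assumptions, not any vendor's managed-spot implementation: the `Checkpointer` class, the `train` loop, and the `sigterm_flag` helper are hypothetical, the "training step" is a stand-in, and it assumes the platform delivers SIGTERM before reclaiming the instance (otherwise the interruption notice would have to be polled from the provider).

```python
import json
import os
import signal
import tempfile
from typing import Callable


class Checkpointer:
    """Persist training progress so a reclaimed spot instance can resume."""

    def __init__(self, path: str):
        self.path = path

    def load(self) -> dict:
        if os.path.exists(self.path):
            with open(self.path) as f:
                return json.load(f)
        return {"step": 0, "loss": None}

    def save(self, state: dict) -> None:
        # Write to a temp file and rename, so an interruption mid-write
        # never leaves a half-written checkpoint behind.
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
        os.replace(tmp, self.path)


def sigterm_flag() -> Callable[[], bool]:
    """Return a callable that turns True once SIGTERM arrives (assumed reclaim signal).

    Must be called from the main thread, as required by signal.signal.
    """
    flag = {"stop": False}

    def handler(signum, frame):
        flag["stop"] = True

    signal.signal(signal.SIGTERM, handler)
    return lambda: flag["stop"]


def train(ckpt: Checkpointer, total_steps: int,
          should_stop: Callable[[], bool], checkpoint_every: int = 100) -> dict:
    """Resume from the last checkpoint and run until done or told to stop."""
    state = ckpt.load()
    while state["step"] < total_steps and not should_stop():
        state["step"] += 1
        state["loss"] = 1.0 / state["step"]  # stand-in for a real training step
        if state["step"] % checkpoint_every == 0:
            ckpt.save(state)
    ckpt.save(state)  # final save on completion or interruption
    return state
```

On a real instance the loop would be started as `train(ckpt, total_steps, sigterm_flag())`, with the checkpoint path on durable storage (object storage or a network volume) rather than the instance's local disk.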
Container orchestration (managed Kubernetes)
- AWS — EKS (managed K8s control plane, $0.10/hr/cluster), EKS Anywhere (on-prem K8s), EKS Fargate (serverless pods), ECS (AWS-proprietary container orchestrator, simpler than K8s), ECS Fargate (serverless containers on ECS), App Runner (PaaS for containers).
- GCP — GKE (managed K8s; the original — Google open-sourced K8s in 2014), GKE Autopilot (serverless K8s — Google manages nodes), Cloud Run (serverless containers, request-scaled 0+), Cloud Run Jobs (batch).
- Azure — AKS (managed K8s), AKS Automatic (Auto-managed analog of Autopilot, 2024), Container Apps (serverless KEDA-based, similar to Cloud Run), Container Instances (single-container hosting).
- Oracle — OKE (Oracle Kubernetes Engine), Container Instances.
- Alibaba — ACK (Container Service for Kubernetes), Serverless Kubernetes.
- IBM — IBM Cloud Kubernetes Service, Red Hat OpenShift on IBM Cloud (premium managed OpenShift).
The cross-vendor pattern: every major cloud has a “managed K8s” (charge for control plane + worker nodes) and a “serverless container” tier (charge per request or per second of running pod). For greenfield, Cloud Run and Container Apps are the most ergonomic; for large fleets, EKS / GKE / AKS give you the full K8s API surface.
Serverless functions
| Vendor | Product | Max wall time | Max memory | Cold-start typical | Notes |
|---|---|---|---|---|---|
| AWS | Lambda | 15 minutes | 10,240 MB | 200ms–2s (longer with VPC) | Most mature; supports container images up to 10 GB; SnapStart for Java + .NET; Provisioned Concurrency removes cold starts. |
| AWS | Lambda@Edge | 5s (viewer), 30s (origin) | 10,240 MB (origin), 128 MB (viewer) | low | Runs on CloudFront PoPs. |
| GCP | Cloud Functions (2nd gen) | 60 minutes (HTTP), 9 minutes (event) | 16 GB | 200ms–2s | Built on Cloud Run under the hood. |
| GCP | Cloud Run | 60 minutes | 32 GB | 50ms–1s; min-instances=0 scales to zero when idle (no idle cost), min-instances≥1 avoids cold starts at a cost | Best fit for HTTP services that want container flexibility; CPU can be always allocated or allocated only during requests. |
| Azure | Azure Functions | 5 minutes default, 10 minutes max (Consumption); 30 minutes default (Premium); unlimited (Dedicated) | 1.5 GB Consumption, 14 GB Premium | varies | Multiple hosting plans; Durable Functions for orchestration. |
| Cloudflare | Workers | 10ms CPU (free), 30s CPU (paid), wall time unlimited via Durable Object | 128 MB | <5 ms (V8 isolates, not containers) | Cheapest at scale; runs in 300+ PoPs globally. |
| Cloudflare | Workers Durable Objects | unlimited | 128 MB | similar | Stateful single-instance objects globally addressable. |
| Fastly | Compute@Edge | 50ms typical | 256 MB | <1 ms (WebAssembly) | WASI-based; cold start essentially zero. |
| Vercel | Functions (Node + Edge) | 10s Hobby, 5min Pro, 15min Enterprise | up to 3 GB | varies | Edge runtime is V8 isolate (similar to Workers); Serverless functions run on AWS Lambda. |
| Netlify | Functions + Edge Functions | 10s (synchronous) up to 15 minutes (Background Functions) | 1 GB | varies | Edge Functions are Deno-based and run on Deno Deploy infrastructure. |
| Deno Deploy | — | 50ms CPU per request, wall time unlimited | 512 MB | <50ms | Deno’s own V8 isolate platform. |
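For workloads that fit in 50ms of CPU per request and are globally distributed, Cloudflare Workers can be roughly 10× cheaper than Lambda, largely because Workers bills CPU time while Lambda bills wall-clock duration multiplied by configured memory. For traditional 100–1000 ms request handlers with VPC access, Lambda still wins on ecosystem.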
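The wall-time and memory ceilings in the table lend themselves to a quick feasibility filter. The sketch below is illustrative only: the `Limits` dataclass and `candidates` function are hypothetical names, the figures are the table's headline values for the container and function platforms (HTTP-triggered or default limits), and the CPU-time-limited isolate platforms are left out because their limits are not directly comparable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limits:
    platform: str
    max_seconds: int     # maximum wall time per invocation
    max_memory_mb: int   # maximum configurable memory


# Headline figures copied from the table above; check current vendor docs before relying on them.
PLATFORMS = [
    Limits("AWS Lambda", 15 * 60, 10_240),
    Limits("GCP Cloud Functions (2nd gen, HTTP)", 60 * 60, 16 * 1024),
    Limits("GCP Cloud Run", 60 * 60, 32 * 1024),
    Limits("Azure Functions (Consumption)", 5 * 60, 1_536),
    Limits("Azure Functions (Premium)", 30 * 60, 14 * 1024),
]


def candidates(run_seconds: int, memory_mb: int) -> list[str]:
    """Platforms whose limits accommodate a job of the given duration and memory."""
    return [
        p.platform
        for p in PLATFORMS
        if run_seconds <= p.max_seconds and memory_mb <= p.max_memory_mb
    ]


# A 20-minute job needing 8 GB rules out Lambda and the Consumption plan.
print(candidates(run_seconds=20 * 60, memory_mb=8 * 1024))
```

Anything that returns an empty list (for example a two-hour job) is a signal to move to a batch or container-orchestration service rather than a function platform.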
Storage
Object storage
| Vendor | Product | Durability | Egress pricing | Notable |
|---|---|---|---|---|
| AWS | S3 | 99.999999999% (11 nines) | $0.09/GB out (first 100 GB free/month) | The reference object store; S3 Intelligent-Tiering auto-moves between Standard / IA / Archive. |
| GCP | GCS (Google Cloud Storage) | same 11 nines | $0.085–0.12/GB | Single multi-region buckets (US, EU, ASIA) for global low-latency reads. |
| Azure | Blob Storage | 11 nines (LRS), 12 nines (ZRS), 16 nines (GRS / GZRS) | $0.087/GB | Hot / Cool / Cold / Archive tiers. |
| Cloudflare | R2 | 11 nines | $0/GB egress | The competitive lever — egress-free pricing breaks AWS’s data-gravity moat for many workloads. $0.015/GB stored. |
| Wasabi | Wasabi Hot Cloud | 11 nines | $0/GB egress (capped at stored amount) | Egress-free if you don’t egress more than 100% of your stored capacity per month. |
| Backblaze | B2 | 11 nines | $0.01/GB | Cheap, S3-compatible API, popular for backups. |
| MinIO | self-host | depends on deployment | n/a | Open-source, S3-API-compatible; runs in K8s or bare metal. |
| Oracle | OCI Object Storage | 11 nines | 10 TB/month free egress | Best free egress allowance of any major cloud. |
| Alibaba | OSS | 11 nines | varies | The default object store across Alibaba Cloud (Aliyun) regions. |
Block storage
- AWS EBS — gp3 (default; baseline 3,000 IOPS and 125 MB/s, with more provisionable independently of volume size), io2 Block Express (up to 256k IOPS, 4 GB/s throughput, and 64 TiB per volume; aimed at SAP HANA-class databases), sc1 / st1 (cold / throughput-optimized HDD).
- GCP Persistent Disk — pd-standard, pd-balanced (default), pd-ssd, pd-extreme. Hyperdisk (2023) is the newer tier with separately-tunable capacity / IOPS / throughput.
- Azure Managed Disks — Standard HDD / Standard SSD / Premium SSD / Premium SSD v2 (newer, tunable like Hyperdisk) / Ultra Disk.
- OCI Block Volumes — Balanced / Higher Performance / Ultra High Performance.
File / NFS storage
- AWS — EFS (NFSv4 fully-managed elastic file system), FSx for NetApp ONTAP (commercial NetApp), FSx for Lustre (HPC parallel filesystem), FSx for OpenZFS, FSx for Windows File Server.
- GCP — Filestore (NFS, Basic / Enterprise / High Scale tiers).
- Azure — Azure Files (SMB + NFS), NetApp Files (premium NFS via NetApp partnership), Azure HPC Cache for compute-near caching.
- OCI — File Storage (NFS), OCI File Storage with Lustre for HPC.
Archive / cold storage
| Cloud | Service | Cents per GB-month |
|---|---|---|
| AWS | S3 Glacier Instant Retrieval | 0.4 |
| AWS | S3 Glacier Flexible Retrieval | 0.36 |
| AWS | S3 Glacier Deep Archive | 0.099 (cheapest in cloud) |
| GCP | GCS Coldline | 0.4 |
| GCP | GCS Archive | 0.12 |
| Azure | Blob Cool | 1.0 |
| Azure | Blob Cold | 0.36 |
| Azure | Blob Archive | 0.099 |
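Glacier Deep Archive and Azure Archive are at parity for the absolute floor of cloud storage pricing; both take hours to retrieve (Deep Archive standard retrievals complete within 12 hours, Azure Archive standard rehydration can take up to 15 hours).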
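To make the cents-per-GB figures concrete, the arithmetic for a given archive size can be sketched as follows. The helper names are hypothetical, the prices are copied from the table above, 1 TB is taken as 1,000 GB, and the sketch covers storage only: retrieval fees, request charges, and minimum-storage-duration penalties are deliberately ignored.

```python
# Cents per GB-month, copied from the table above.
ARCHIVE_CENTS_PER_GB_MONTH = {
    "AWS S3 Glacier Instant Retrieval": 0.4,
    "AWS S3 Glacier Flexible Retrieval": 0.36,
    "AWS S3 Glacier Deep Archive": 0.099,
    "GCP GCS Coldline": 0.4,
    "GCP GCS Archive": 0.12,
    "Azure Blob Cool": 1.0,
    "Azure Blob Cold": 0.36,
    "Azure Blob Archive": 0.099,
}

GB_PER_TB = 1000  # decimal terabytes (assumption)


def monthly_storage_usd(service: str, terabytes: float) -> float:
    """Storage-only monthly cost in USD for the given tier and volume."""
    cents = ARCHIVE_CENTS_PER_GB_MONTH[service] * terabytes * GB_PER_TB
    return cents / 100


def cheapest_tiers() -> list[str]:
    """All tiers tied for the lowest per-GB price, sorted by name."""
    floor = min(ARCHIVE_CENTS_PER_GB_MONTH.values())
    return sorted(n for n, c in ARCHIVE_CENTS_PER_GB_MONTH.items() if c == floor)


for name in ARCHIVE_CENTS_PER_GB_MONTH:
    print(f"{name:36s} ${monthly_storage_usd(name, 100):>9,.2f} / month for 100 TB")
print("floor:", cheapest_tiers())
```

At 100 TB the gap between the cheapest and most expensive tier in the table is roughly an order of magnitude, which is why retrieval latency and retrieval fees, not storage price, usually decide the tier.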
Networking
Virtual networks
- AWS VPC — regional (a VPC lives in one region); subnets are per-AZ; Transit Gateway for hub-and-spoke; VPC Peering for direct pairs; PrivateLink for service exposure across VPCs/accounts.
- GCP VPC — global by default (a single VPC spans all regions); subnets are regional; VPC Network Peering; Private Service Connect (GCP’s PrivateLink analog).
- Azure VNet — regional; VNet Peering; Azure Virtual WAN for global hub-and-spoke; Private Link.
- Oracle VCN — regional; Dynamic Routing Gateway (DRG) for connectivity.
The single biggest design difference: GCP’s global VPC means cross-region traffic between your own resources is on the Google backbone with no Transit Gateway in between. AWS forces you to wire up Transit Gateway or peering for cross-region private traffic.
Load balancing
| Cloud | L4 (TCP/UDP) | L7 (HTTP) | Global |
|---|---|---|---|
| AWS | Network Load Balancer (NLB) | Application Load Balancer (ALB) + Classic Load Balancer (CLB, deprecated) | Global Accelerator (anycast) |
| GCP | Internal/External Network Load Balancer | Cloud Load Balancing (HTTP(S)) — single product is globally anycast by default | n/a — Cloud LB is global by default |
| Azure | Azure Load Balancer | Application Gateway | Front Door (global HTTP) + Traffic Manager (DNS-based) |
| OCI | OCI Load Balancer (Public / Private) | same product handles L7 | n/a |
| Cloudflare | Spectrum (TCP/UDP) | Cloudflare Load Balancer | yes (default — anycast IPs in 300+ PoPs) |
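Among AWS, Azure, GCP, and OCI, GCP is the only one whose default HTTP(S) load balancer is itself global anycast: a single VIP serves users from the nearest PoP without DNS-based routing. On AWS and Azure the same effect requires a separate product (Global Accelerator, Front Door).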
CDN + edge
- AWS CloudFront — 450+ edge locations, integrates with Lambda@Edge and CloudFront Functions, Shield Standard included free.
- Google Cloud CDN — integrates with the global Cloud Load Balancer; uses Google’s backbone.
- Azure CDN (now Front Door) — front-end + WAF + CDN combined.
- Cloudflare CDN — 300+ PoPs; the highest-traffic CDN by request volume; bundles with Workers + R2 + WAF + DDoS.
- Fastly — 70+ PoPs; pioneer of instant purge + VCL configurability; used by Shopify, GitHub, NYT, Stripe.
- Akamai — the original CDN; still owns the largest enterprise CDN footprint and now includes Linode IaaS (acquired in 2022 for $900M).
- KeyCDN — small Swiss-based CDN, popular for indie use.
- bunny.net — value-priced CDN gaining traction in 2024.
DNS
| Provider | Product | Notable |
|---|---|---|
| AWS | Route 53 | Launched 2010; health-checked DNS failover (added 2013); Route 53 Resolver for hybrid. |
| GCP | Cloud DNS | Anycast; private zones. |
| Azure | Azure DNS + Private DNS | |
| Cloudflare | Cloudflare DNS | Public resolver at 1.1.1.1; one of the fastest authoritative DNS on the internet. |
| Google | Google Public DNS | Public resolver at 8.8.8.8; the most-used public resolver. |
| IBM (NS1 acq 2023) | NS1 | Most flexible programmable DNS for traffic-steering. |
| Quad9 | 9.9.9.9 | Privacy + threat-filtering resolver, nonprofit-operated. |
VPN + connectivity
- AWS Site-to-Site VPN + Client VPN, Direct Connect (dedicated 1/10/100 Gbps fiber), Transit Gateway (hub), PrivateLink (expose service to other VPCs without internet).
- GCP Cloud VPN, Cloud Interconnect (Dedicated + Partner), Cross-Cloud Interconnect (direct connection from GCP to AWS / Azure / Oracle, 2023), Private Service Connect.
- Azure VPN Gateway, ExpressRoute (dedicated, with peering relationships to many telco providers), Azure Virtual WAN (managed hub), Private Link.
- OCI FastConnect for dedicated connectivity.
Databases
(Cross-reference: database-engine-taxonomy is the deeper note on RDBMS / NoSQL / vector / etc engines themselves; this section is just the hosted-by-cloud service mapping.)
Relational (managed Postgres / MySQL / SQL Server / Oracle)
- AWS RDS — Postgres, MySQL, MariaDB, Oracle, SQL Server. Aurora is AWS’s proprietary fork of MySQL + Postgres with separated compute/storage and 3-AZ replication; Aurora Serverless v2 auto-scales by ACU (Aurora Capacity Unit). Babelfish lets Aurora speak the SQL Server wire protocol.
- GCP Cloud SQL — Postgres, MySQL, SQL Server. AlloyDB for PostgreSQL (2022) is GCP’s Aurora analog — separated compute/storage, columnar accelerator for analytical queries.
- Azure Database — Database for PostgreSQL (Flexible Server), MySQL, MariaDB, Azure SQL Database, Azure SQL Managed Instance, Azure Cosmos DB for PostgreSQL (formerly Azure Database for PostgreSQL Hyperscale (Citus)).
- OCI — Autonomous Database is Oracle’s flagship — self-patching, self-tuning Oracle DB with ATP (transaction processing) and ADW (data warehouse) flavors; MySQL HeatWave combines OLTP + analytics in one engine.
NoSQL (managed)
- AWS DynamoDB — single-digit-ms key-value/document store; DynamoDB Streams for CDC; on-demand or provisioned billing; global tables (multi-region active-active).
- GCP Firestore (document, real-time sync, Firebase heritage) + Cloud Bigtable (wide-column, HBase-compatible; the externally exposed version of Google's internal Bigtable system).
- Azure Cosmos DB — multi-model (Core/SQL, MongoDB, Cassandra, Gremlin, Table); five consistency levels; multi-region writes.
- AWS DocumentDB — MongoDB wire-compatible store, AWS-managed; not actually MongoDB underneath.
- AWS Keyspaces — Cassandra-compatible.
- OCI NoSQL Database, Alibaba Tablestore.
Data warehouses
| Vendor | Product | Pricing model | Notable |
|---|---|---|---|
| AWS | Redshift + Redshift Serverless | per-node or per-RPU | Columnar; RA3 nodes separate compute from S3 storage. |
| GCP | BigQuery | per-byte-scanned or per-slot (flat rate / Editions) — serverless | The category-defining serverless warehouse; BigQuery ML lets you train models in SQL. |
| Azure | Synapse Analytics + Microsoft Fabric (newer unified data platform, 2023) | DWU or per-capacity | Successor to SQL Data Warehouse. |
| Snowflake | Snowflake Data Cloud (multi-cloud on AWS / GCP / Azure) | per-credit, separately-billed warehouses | The most-popular standalone warehouse; works across all three hyperscalers. |
| Databricks | Databricks SQL + Lakehouse (multi-cloud) | per-DBU | Photon engine; Delta Lake table format; competes with Snowflake. |
Analytics / query engines
- AWS Athena — serverless SQL on S3 using Trino (formerly PrestoSQL).
- AWS Glue — managed ETL + Data Catalog (Hive Metastore-compatible).
- AWS EMR — managed Spark / Hive / Presto / HBase / Flink on EC2 or EKS.
- GCP Dataproc — managed Spark / Hadoop / Presto on GCE.
- GCP Dataflow — managed Apache Beam (unified batch + streaming).
- GCP Dataform + Dataplex for data orchestration + governance.
- Azure HDInsight — managed Hadoop / Spark / Kafka.
- Azure Databricks — Databricks is first-class on Azure (Microsoft is an investor).
- Azure Synapse Pipelines for ETL.
Caching
- AWS ElastiCache — Valkey, Redis OSS, and Memcached engines. Redis moved from BSD to dual RSALv2/SSPLv1 licensing in March 2024 (taking effect with Redis 7.4); AWS and others responded by backing the Valkey fork under the Linux Foundation.
- AWS MemoryDB for Redis — durable Redis-compatible store.
- GCP Memorystore — Redis, Memcached, Redis Cluster.
- Azure Cache for Redis — Basic / Standard / Premium / Enterprise / Enterprise Flash (uses Redis Enterprise under the hood for Enterprise tier).
Search
- AWS OpenSearch Service — fork of Elasticsearch from when Elastic relicensed to SSPL in 2021; OpenSearch is now 2.x-series under Apache 2.0, governed by the OpenSearch Software Foundation.
- Elastic Cloud — managed Elasticsearch from Elastic NV (the original vendor); runs on AWS / GCP / Azure.
- Azure AI Search — formerly Azure Cognitive Search; combines text + vector search.
- GCP Vertex AI Search — built on Google Search infrastructure.
Streaming / messaging
- AWS Kinesis Data Streams (high-throughput streaming; provisioned shards or on-demand capacity), Amazon Data Firehose (formerly Kinesis Data Firehose; loads into S3 / Redshift / OpenSearch), MSK (managed Apache Kafka), MSK Serverless, SNS (pub/sub topics), SQS (queues), EventBridge (event bus).
- GCP Pub/Sub (global pub/sub), Pub/Sub Lite (cheaper, regional), Dataflow for streaming processing.
- Azure Event Hubs (Kafka-compatible), Service Bus (queues), Event Grid (event routing), Stream Analytics.
- Confluent Cloud — managed Kafka by Confluent (Jay Kreps + the original Kafka authors), runs on AWS / GCP / Azure.
AI / ML services
(Cross-references: llm-landscape, ml-framework-comparison.)
Foundation model hosting
- AWS Bedrock — Anthropic Claude (3.5 / 3.7 Sonnet, 4.5 / 4.6 / 4.7, Opus 4.x, Haiku 3.5), Meta Llama 3 / 3.1 / 3.3, Mistral 7B / 8x7B / Large, Cohere Command R / R+, AI21 Jurassic / Jamba, Stability Stable Diffusion / Stable Image Core, Amazon Nova (own family — Micro, Lite, Pro, Premier launched 2024).
- GCP Vertex AI Model Garden — Gemini 1.5 / 2.0 / 2.5 Pro + Flash + Nano, Claude via Anthropic partnership, Llama, Mistral, Gemma (open Google models), Imagen, Veo (video gen).
- Azure OpenAI Service — GPT-4o, GPT-4o mini, GPT-4.5 (Orion), o1 / o1-mini / o3 / o3-mini reasoning models, DALL-E 3, Whisper, Sora; Azure has multi-year exclusive on serving OpenAI models for the enterprise.
- Azure AI Foundry (2024 rebrand of Azure AI Studio) — broader catalog including Mistral, Cohere, Llama, Phi (Microsoft’s own), DeepSeek.
- IBM watsonx.ai — Granite (IBM’s own), Mistral, Meta Llama, plus IBM-customized variants.
- Anthropic direct API, OpenAI direct API, Mistral La Plateforme, Together AI, Anyscale, Fireworks AI, Lepton AI, Modal, Replicate.
Custom training
- AWS SageMaker — Studio IDE + Training Jobs + Pipelines + Feature Store + Model Registry + Endpoints.
- GCP Vertex AI — Workbench + Custom Training + Pipelines + Feature Store + Model Registry + Endpoints. Native TPU support.
- Azure Machine Learning — workspace-centric; integrates with Azure DevOps.
- Databricks Mosaic AI (formerly MosaicML, acquired June 2023 $1.3B) — multi-cloud training + serving + RAG; available across AWS / Azure / GCP.
GPU rental (specialized “neoclouds”)
- CoreWeave — NVIDIA-blessed; valued at $19B in a May 2024 round, then $35B in later private rounds. Backed by NVIDIA itself; serves Microsoft and OpenAI overflow capacity.
- Lambda Labs — long-running GPU cloud; raised at a $1.5B valuation; offers reserved H100 / H200 and on-demand A10 / A100 / H100.
- Crusoe Cloud — flare-gas-powered data centers; raised at a $2.8B valuation; partnered with NVIDIA on liquid-cooled data center designs.
- Voltage Park — Jed McCaleb-founded (Mt. Gox / Stellar / Ripple); 24,000 H100s, nonprofit-style pricing for AI researchers.
- Together AI — Together Compute (bare-metal GPU rental) + Together Inference (managed inference); $3.3B valuation.
- RunPod — community-cloud + secure-cloud GPU rental, popular for indie ML.
- Vast.ai — peer-to-peer GPU marketplace (consumer + datacenter GPUs).
- Foundry / Foundry Local — research-focused GPU access.
- Nebius (the renamed Yandex N.V., after it divested its Russian business) — Netherlands-based GPU cloud; resumed trading on Nasdaq in October 2024.
- Hyperstack (NexGen Cloud) — UK GPU cloud.
Notebooks / development environments
- SageMaker Studio, SageMaker Notebooks, SageMaker Code Editor.
- Colab (free tier + Pro + Pro+), Colab Enterprise (managed in Vertex AI), Vertex AI Workbench.
- Azure ML Notebooks (in the AML workspace).
- Databricks Notebooks (on any cloud).
- JupyterHub (self-host).
AutoML
- SageMaker Autopilot, SageMaker Canvas (no-code).
- Vertex AI AutoML (tables, image, video, text).
- Azure ML Automated ML.
- DataRobot (enterprise AutoML platform), H2O Driverless AI, dotData.
Inference (managed model serving)
- SageMaker Endpoints — Real-time, Serverless Inference, Async Inference, Batch Transform.
- Bedrock On-Demand + Bedrock Provisioned Throughput (reserved capacity for predictable inference).
- Vertex AI Endpoints + Vertex AI Online Prediction.
- Azure ML Online Endpoints + Azure ML Batch Endpoints.
- NVIDIA NIM — containerized inference microservices, deploy anywhere; Triton Inference Server (open source NVIDIA serving).
- vLLM (PagedAttention + continuous batching; the de facto open inference engine in 2024–2026).
- TensorRT-LLM (NVIDIA’s optimized inference for LLMs on Hopper/Blackwell).
- TGI (Hugging Face Text Generation Inference), Ollama (local inference), LM Studio (GUI for local).
Speech / Vision / NLP (legacy “Cognitive Services”)
| Capability | AWS | GCP | Azure |
|---|---|---|---|
| Text-to-Speech | Polly | Text-to-Speech | Speech (Cognitive Services Speech) |
| Speech-to-Text | Transcribe | Speech-to-Text | Speech |
| Image labeling / OCR | Rekognition + Textract | Vision AI + Document AI | Vision + Document Intelligence (Form Recognizer) |
| Translation | Translate | Translation API | Translator |
| NLP / sentiment | Comprehend | Natural Language API | Language (Cognitive Services Language) |
| Personalization | Personalize | Recommendations AI | Personalizer (deprecated 2024) |
Identity + security
(Cross-reference: auth-provider-catalog is the deeper note on auth and identity vendors as standalone products.)
IAM (within-cloud identity + access)
- AWS IAM + IAM Identity Center (formerly AWS SSO; renamed 2022) — IAM Identity Center is the modern way to wire AWS to external IdPs like Okta, Entra ID, Google Workspace.
- GCP Cloud IAM + Workload Identity Federation (lets you bind external identities — AWS IAM roles, Azure AD, GitHub Actions OIDC — to GCP service accounts without long-lived keys).
- Azure Entra ID (formerly Azure AD, rebranded July 2023) + Azure RBAC (on top of Entra for ARM resources).
- OCI IAM (compartments + policies; tag-based access control).
Key management + secrets
- AWS KMS + CloudHSM + Secrets Manager + Parameter Store.
- GCP Cloud KMS + Cloud HSM + Secret Manager.
- Azure Key Vault (keys, secrets, certificates) + Managed HSM.
- OCI Vault + Key Management Service.
- HashiCorp Vault (multi-cloud; managed via HCP Vault Dedicated or HCP Vault Secrets, 2024).
WAF + DDoS
| Cloud | WAF | DDoS |
|---|---|---|
| AWS | AWS WAF | Shield Standard (free, included), Shield Advanced ($3,000/mo + traffic) |
| GCP | Cloud Armor | included in Cloud Armor |
| Azure | Azure Front Door WAF + Application Gateway WAF | DDoS Network Protection, formerly DDoS Protection Standard (~$2,944/mo covering up to 100 public IP resources) |
| Cloudflare | Cloudflare WAF + Managed Rulesets | unmetered DDoS mitigation included in all plans, including Free |
| Akamai | Kona Site Defender + Prolexic | Prolexic (network-layer DDoS scrubbing) |
Audit + compliance logs
- AWS CloudTrail — every API call, retained 90 days free + S3 for longer.
- GCP Cloud Audit Logs — Admin Activity (free) + Data Access + System Event + Policy Denied.
- Azure Activity Log + Microsoft Sentinel for SIEM.
- OCI Audit + integration with Logging service.
DevOps + observability
(Cross-reference: observability-tools-catalog.)
CI/CD
- AWS — CodePipeline + CodeBuild + CodeDeploy + CodeStar + CodeCommit (CodeCommit was closed to new customers in 2024); most AWS users actually run GitHub Actions or GitLab on top.
- GCP — Cloud Build + Cloud Deploy + Artifact Registry; deep integration with GitHub.
- Azure — Azure DevOps (Boards + Repos + Pipelines + Test Plans + Artifacts) + GitHub Actions (Microsoft-owned).
- Multi-cloud — GitHub Actions (the de facto winner for cloud-native CI; integrates with OIDC for keyless auth to AWS / GCP / Azure), GitLab CI/CD (especially in regulated / self-hosted environments), Jenkins (legacy but huge install base), CircleCI, Drone, Buildkite.
Monitoring + APM
- AWS CloudWatch — Metrics + Logs + Alarms + Dashboards + Synthetics + RUM + Container Insights + Application Signals (2024, APM-flavored).
- GCP Cloud Monitoring + Cloud Logging + Cloud Trace + Cloud Profiler (the operations suite formerly known as Stackdriver; Cloud Debugger was shut down in 2023).
- Azure Monitor + Application Insights + Log Analytics + Container Insights + VM Insights.
- Multi-cloud — Datadog, New Relic, Dynatrace, Splunk Observability Cloud, Grafana Cloud (Grafana + Mimir + Loki + Tempo + Pyroscope), Honeycomb, Chronosphere, Lightstep (acquired by ServiceNow 2021).
Container registries
| Cloud | Service |
|---|---|
| AWS | ECR (Elastic Container Registry), ECR Public |
| GCP | Artifact Registry (superseded Container Registry 2023) |
| Azure | ACR (Azure Container Registry) |
| OCI | OCI Registry |
| Multi | Docker Hub (still the default for docker pull without prefix), GitHub Container Registry (ghcr.io), Quay (Red Hat / IBM), JFrog Artifactory, Harbor (CNCF open source) |
Specialty regions
- AWS GovCloud (US) — separate partition, US persons only, FedRAMP High + DoD IL5; AWS China (Beijing + Ningxia) operated by Sinnet + NWCD respectively.
- Azure Government, Azure Government Secret, Azure Government Top Secret (air-gapped clouds for US intelligence); Azure China operated by 21Vianet.
- GCP Government (Assured Workloads + Distributed Cloud Hosted air-gapped); a sovereign-cloud push.
- OCI Government, OCI Dedicated Region (Oracle drops an OCI region in your data center, starting ~$1M/year).
- Sovereign-cloud trend 2024–2026 — every major hyperscaler has a sovereign / regulated EU offering (Microsoft Cloud for Sovereignty, AWS European Sovereign Cloud announced October 2023, Google Cloud Sovereign Solutions).
Pricing models
Discount mechanisms
- Pay-as-you-go — default, no commitment, highest unit rate.
- Reserved Instances / Reserved Capacity (1-year, 3-year) — typically 40–72% savings vs on-demand; the discount deepens with the payment option: All Upfront > Partial Upfront > No Upfront.
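- Savings Plans (AWS, 2019) — more flexible than RIs; commit to $/hour of compute spend. Compute Savings Plans apply across EC2 + Fargate + Lambda (up to 66% savings); EC2 Instance Savings Plans are tied to an instance family and region (up to 72%).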
- GCP CUDs (Committed Use Discounts) — 1-year (37%) or 3-year (55–70%) for vCPU + RAM commitments, separately bought per region.
- Azure Reserved VM Instances + Azure Savings Plans for compute (2022).
- Spot / Preemptible — up to 90% off, no SLA, can be reclaimed.
- Enterprise Agreements — bespoke pricing for 7-figure-plus annual spend; common for large enterprises.
- AWS RDS Reserved Instances + Aurora Savings + Redshift Reserved Nodes.
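The trade-off between pay-as-you-go and a commitment reduces to a break-even utilization: a commitment bills for every hour whether the instance runs or not, so it only wins if the workload runs for more than (1 − discount) of the time. The sketch below works that through; the function names and the $0.20/hour on-demand rate are hypothetical, and the 40% and 72% discounts are the ends of the reserved-instance range quoted in the list above.

```python
HOURS_PER_MONTH = 730  # common billing approximation


def on_demand_monthly_usd(rate_per_hour: float, utilization: float) -> float:
    """Pay-as-you-go: billed only for the fraction of hours actually used."""
    return rate_per_hour * utilization * HOURS_PER_MONTH


def committed_monthly_usd(rate_per_hour: float, discount: float) -> float:
    """Reserved / committed: billed for every hour at the discounted rate."""
    return rate_per_hour * (1 - discount) * HOURS_PER_MONTH


def break_even_utilization(discount: float) -> float:
    """Utilization above which the commitment is cheaper than on-demand."""
    return 1 - discount


RATE = 0.20  # hypothetical on-demand $/hour, for illustration only

for discount in (0.40, 0.72):
    threshold = break_even_utilization(discount)
    print(
        f"{discount:.0%} discount: commit costs "
        f"${committed_monthly_usd(RATE, discount):.2f}/month, "
        f"worth it above {threshold:.0%} utilization"
    )
```

A shallow 1-year discount therefore only pays off for workloads that run most of the month, while a deep 3-year discount pays off even for instances that are idle most of the time, at the cost of a longer lock-in.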
Egress as the strategic lever
| Cloud | Egress to the internet (list price per GB at the lowest-volume tier) |
|---|---|
| AWS | $0.09/GB (first 100 GB free per month) |
| GCP | $0.085–0.12/GB (zone matters) |
| Azure | $0.087/GB |
| Oracle | first 10 TB/month free, then $0.0085/GB |
| Cloudflare R2 | $0/GB |
| Backblaze B2 | $0.01/GB |
| Wasabi | $0/GB (capped at 100% of stored amount/month) |
Egress is the single biggest moat for the legacy hyperscalers and the single biggest competitive lever for the newcomers (R2, Wasabi, B2, Oracle). The EU Data Act (applicable from September 2025) restricts switching and egress charges by regulation; AWS, Azure, and GCP all responded in early 2024 by waiving egress fees for customers migrating off their cloud.
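The size of the egress lever is easiest to see with a worked number. The sketch below is a rough estimator under stated assumptions: the helper names are hypothetical, the rates and free allowances are copied from the table above and applied as a flat rate (real price lists have volume tiers), 10 TB is taken as 10,000 GB, and GCP and Wasabi are omitted because the table gives a range and a usage-dependent cap rather than a single rate.

```python
# provider -> (free GB per month, USD per GB beyond that), copied from the table above.
EGRESS_PRICING = {
    "AWS": (100, 0.09),
    "Azure": (0, 0.087),
    "Oracle": (10_000, 0.0085),  # first 10 TB free, taken as 10,000 GB
    "Cloudflare R2": (0, 0.0),
    "Backblaze B2": (0, 0.01),
}


def monthly_egress_usd(provider: str, egress_gb: float) -> float:
    """Flat-rate estimate of monthly internet egress cost."""
    free_gb, rate = EGRESS_PRICING[provider]
    billable_gb = max(0.0, egress_gb - free_gb)
    return billable_gb * rate


def compare(egress_gb: float) -> list[tuple[str, float]]:
    """Providers sorted from cheapest to most expensive for this volume."""
    costs = ((p, round(monthly_egress_usd(p, egress_gb), 2)) for p in EGRESS_PRICING)
    return sorted(costs, key=lambda item: item[1])


for provider, usd in compare(50_000):  # 50 TB out per month
    print(f"{provider:14s} ${usd:>9,.2f}")
```

At 50 TB per month the flat-rate estimate ranges from zero to several thousand dollars, which is the gap the egress-free providers are competing on.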
Major case studies (cloud commitments worth knowing)
- Netflix on AWS — the canonical reference customer; ran all-in on AWS by 2016; OpenConnect CDN is the only piece they self-host.
- Spotify on GCP — moved from on-prem + AWS to GCP in 2016; deep BigQuery + Pub/Sub usage; one of GCP’s flagship references.
- BMW on Azure — multi-year strategic deal; manufacturing cloud, BMW Group Cloud Data Hub.
- Capital One on AWS — first major US bank to publicly go all-in (announced 2015, completed 2020); closed all eight data centers.
- Pinterest on AWS + GCP — multi-cloud (mostly AWS, GCP for data + ML); a frequently-cited multi-cloud case.
- OpenAI on Azure — Microsoft’s multi-year, multi-tens-of-billions investment; OpenAI runs on dedicated Azure ND H100 v5 + GB200 capacity. Reports of Stargate joint-venture ($100B+ scope) over 2024–2026.
- Anthropic on AWS — Amazon invested a further $4B in Nov 2024 (total $8B); Claude on Bedrock + Trainium 2 training. Also has commitments to GCP for inference.
- NVIDIA + hyperscalers — H100/B100/GB200 capacity deals across AWS (Project Ceiba supercomputer), GCP (A3 / A4 instances), Azure (NDv5 fleet); the chip allocation is the bottleneck for all three.
- Cerebras + G42 — UAE’s G42 owns dozens of Cerebras CS-3 wafer-scale systems for sovereign-AI build-out.
When to choose which cloud
- AWS — when ecosystem breadth, AWS-native vendor integrations, or the existence of every conceivable managed service in one place is the deciding factor. Also when the team already knows it.
- Azure — when you need OpenAI access, when you’re already on Microsoft 365 / Entra ID, when you have hybrid Windows Server / SQL Server / .NET workloads, or when the regulatory story (government clouds + sovereignty) matters.
- GCP — when the data + ML pipeline is the workload (BigQuery + Vertex AI + Dataflow), or when you want a global single VPC.
- Oracle OCI — when Oracle DB workloads dominate (the licensing math beats AWS RDS for Oracle), or when ARM (Ampere A1/A2) compute price/perf is the lever.
- Alibaba — when mainland-China end users are the audience.
- IBM Cloud — when OpenShift + hybrid + regulated industries (banking, healthcare on z/OS bridging) are the use case.
- Cloudflare — when edge compute + zero-egress storage + globally-distributed apps are the architecture (often paired with a hyperscaler for origin).
Adjacent
- database-engine-taxonomy — full RDBMS / NoSQL / vector / streaming / search engine catalog, including which clouds host each.
- llm-landscape — foundation-model catalog across Bedrock / Vertex AI / Azure OpenAI / direct vendor APIs.
- ml-framework-comparison — PyTorch / TF / JAX / scikit-learn comparison feeding into Vertex AI / SageMaker / Azure ML choice.
- observability-tools-catalog — CloudWatch / Cloud Monitoring / Azure Monitor and the multi-cloud Datadog / New Relic / Grafana stack.
- auth-provider-catalog — IAM and identity providers, including IAM Identity Center / Workload Identity Federation / Entra ID and the standalone Okta / Auth0 / Clerk ecosystem.
- _index — Tier 3 family-index root for the Compute library.