Service Architectures — Cross-System Comparison
This note compares the architectural styles a team picks between when shaping a system: monolith, modular monolith, microservices, nanoservices, serverless functions, event-driven, CQRS, event sourcing, hexagonal/ports-and-adapters, onion, clean, lambda, kappa, reactive, space-based, and pipeline. Each row maps to a real-world exemplar (Netflix, Amazon, Shopify, Stripe, Uber DOMA, LinkedIn, Twitter, Square, GitLab, Basecamp HEY, dbt Cloud, Vercel) and the tables compare on team size, deployment unit, data consistency, observability, cost, refactor cost, and lock-in. Read the dimension tables first, decision tree last.
See also
- microservices-patterns
- distributed-systems-fundamentals
- containers-service-mesh
- kubernetes-deep
- observability-stack
- networking-foundations
- _compare_consistency_models
- cloud-provider-service-mapping
- container-orchestrator-and-build-tool-catalog
- message-queue-and-streaming-catalog
1. The architectural spectrum
monolithic                                                                 distributed
|                                                                                    |
big-ball-of-mud → monolith → modular → CQRS → microservices → nanoservices → functions
                                                                                 |
                                                                            serverless
                    ← event-sourcing → reactive → space-based →
                             pipeline (lambda / kappa)
Cross-cutting styles (orthogonal to deployment granularity):
hexagonal / ports-and-adapters
onion architecture
clean architecture
domain-driven design (DDD)
event-driven architecture (EDA)
CQRS (separate read + write models)
event sourcing (append-only ledger)
2. Deployment-unit granularity
| Style | Deployment unit | Code repo | Database | Team size sweet spot | Real exemplar |
|---|---|---|---|---|---|
| Big-ball-of-mud | 1 binary | 1 repo | 1 schema | 1–5 | early-stage startup MVP |
| Monolith | 1 binary | 1 repo | 1 schema | 5–30 | early Basecamp, GitHub through 2014 |
| Modular monolith | 1 binary, internal modules | 1 repo | 1 schema (logically partitioned) | 15–200 | Shopify, GitHub 2014+, Basecamp HEY, dbt Cloud (partial), early Stripe |
| Macroservices | ~5–30 services | mono or multi-repo | per-service-cluster | 30–200 | Stripe (2020s), GitLab (mostly mono), Square (early) |
| Microservices | hundreds | multi-repo or monorepo | per-service | 100–10,000 | Netflix (~2000 svcs), Amazon (two-pizza teams, thousands of svcs), Uber peak (~4000 svcs) |
| Nanoservices | 1 function = 1 service | multi-repo | shared or n/a | rare; usually anti-pattern | over-decomposed orgs (cautionary tale) |
| Serverless functions | 1 fn | usually monorepo | external (Aurora, Dynamo) | any | Lego Group on AWS Lambda; Bustle.com migration 2017; Liberty Mutual claims |
| Edge functions | 1 fn at edge | usually monorepo | edge KV / regional DB | any | Vercel Edge Functions, Cloudflare Workers, Fastly Compute@Edge, Deno Deploy |
The 2020s consensus: a modular monolith is the default for teams under 50 engineers; microservices start to pay off around 100+ engineers. Before 2018 the pendulum swung too far toward microservices. Uber's 2020 DOMA (Domain-Oriented Microservice Architecture) write-up by Adam Gluck crystallized the consolidation: Uber went from ~4000 microservices back to ~2200 by organizing them into ~70 domains.
3. Real exemplars — what teams actually run
| Company | Architecture (2025) | Service count | Notable | Recovery from |
|---|---|---|---|---|
| Netflix | microservices on AWS | ~2000 | Chaos Monkey 2010, Hystrix → Resilience4j, Spinnaker, Eureka, Zuul, Conductor; multi-region active-active | rapid post-Qwikster scaling |
| Amazon | “two-pizza teams” microservices | thousands | famous Bezos 2002 mandate; service-as-contract; Werner Vogels’ “you build it, you run it” | in 2023 Prime Video moved one monitoring service from serverless components back to a monolith for cost reasons (90% savings; see §9) |
| Uber | DOMA (Domain-Oriented) | ~2200 (down from ~4000 peak) | Cadence/Temporal workflow engine, M3 metrics, Jaeger tracing | over-decomposition lessons published 2020 (Gluck, “Introducing Domain-Oriented Microservice Architecture”) |
| Shopify | modular monolith (Rails) | 1 main app + ~200 modules | “majestic monolith” branding; intentional resistance to microservices | 2019 Kirsten Westeinde blog on “deconstructing the monolith”: modules, not services |
| Stripe | API-first macroservices | ~50–100 main services | Ruby + Scala; strong API design culture; rare downtime | minimal, mostly forward growth |
| GitLab | intentional monolith | 1 Ruby app + a few Go services (Gitaly, GitLab Pages, Workhorse) | self-host viability requires monolith | reinforced in 2020 platform docs |
| Basecamp HEY | modular monolith (Rails) | 1 app | DHH’s “Majestic Monolith” essay 2016; Basecamp / HEY use the same pattern | n/a |
| dbt Cloud | Python microservices | dozens | distinct platform, query, scheduler services | migrated from monolith ~2019-2021 |
| LinkedIn | microservices + Espresso storage + Kafka stream | thousands | Kafka born here (Jay Kreps), Pegasus 2.0 RPC, Pinot OLAP | Kafka revolution 2011+ |
| Twitter / X | microservices on Manhattan + Heron stream | thousands | Heron stream (post-Storm), Manhattan KV, Pelikan cache | massive 2022 layoff + post-acquisition consolidation |
| Square | service mesh microservices | hundreds | Envoy + Istio adoption; protobuf-RPC | gradual migration from Ruby monolith |
| Pinterest | microservices on AWS | hundreds | sharded MySQL + PinLater + Kafka Connect-like Singer | gradual scaling |
| Airbnb | microservices (“Service-Oriented” then “SOA-to-microservices”) | hundreds | Apache Airflow born here, Presto extensive, Spinnaker user | post-IPO platform consolidation 2021+ |
| DoorDash | microservices on AWS + GCP | hundreds | Kotlin + gRPC; Cadence/Temporal heavy use | post-IPO platform scaling |
| Vercel | edge serverless | varies | Edge Functions on V8 isolates; Next.js convergence | edge-first since founding |
| Cloudflare | edge runtime + microservices in DCs | hundreds | Workers (V8 isolates), Durable Objects, R2, KV | constant evolution |
4. Cross-cutting architectural styles
| Style | Origin | What it adds | Used by |
|---|---|---|---|
| Hexagonal / Ports-and-Adapters | Alistair Cockburn 2005 | application core isolated from I/O via ports | Spring Boot apps, many DDD shops, dbt Cloud |
| Onion | Jeffrey Palermo 2008 | concentric dependency rule — outer depends on inner | .NET community heavily |
| Clean Architecture | Robert C. Martin (Uncle Bob) 2012 / book 2017 | unified hex + onion; dependency inversion | popular in iOS, Android, Spring |
| Domain-Driven Design (DDD) | Eric Evans 2003 | bounded contexts, aggregates, ubiquitous language | most large enterprise; Uber DOMA explicitly DDD |
| CQRS | Greg Young 2010 | separate read + write models | many event-sourced systems; Axon Framework |
| Event Sourcing | Greg Young 2006+; Martin Fowler | append-only event log as source of truth | Walmart Inventory, Klarna, EventStoreDB (commercial), Axon |
| Event-driven | broad — Faust, Spring Cloud Stream, EventBridge, Kafka Streams | services react to events on a bus | LinkedIn, Confluent customers, Netflix |
| Reactive | Reactive Manifesto 2014 (Bonér, Farley, Kuhn, Thompson) | responsive + resilient + elastic + message-driven | Akka, Vert.x, Spring WebFlux, RxJS |
| Space-based architecture | Roger Sessions 2008; Tarball / Tibco; rooted in the earlier “tuple spaces” model | in-memory grid + replicated data | Hazelcast, GigaSpaces, low-latency trading |
| Lambda architecture | Nathan Marz 2011 | batch + speed layer + serving | early big-data; mostly displaced by Kappa |
| Kappa architecture | Jay Kreps 2014 | unified stream-only architecture | Kafka-centric shops; Flink-first; Netflix Keystone |
| Pipeline / pipes-and-filters | classic UNIX | data flows through transformations | dbt, Airflow DAGs, Apache Beam |
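The hexagonal row is easier to judge with a concrete shape in front of you. The following is a minimal sketch, not taken from any exemplar above: `OrderRepository`, `PlaceOrder`, and `InMemoryOrderRepository` are hypothetical names chosen for illustration. The core depends only on the port; the adapter can be swapped without touching the core.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Protocol


class OrderRepository(Protocol):
    """Port: what the core needs from storage, with no I/O details."""

    def save(self, order_id: str, total_cents: int) -> None: ...

    def get_total(self, order_id: str) -> Optional[int]: ...


@dataclass
class PlaceOrder:
    """Application core: depends only on the port, never on a concrete adapter."""

    repo: OrderRepository

    def execute(self, order_id: str, prices_cents: List[int]) -> int:
        if not prices_cents:
            raise ValueError("an order needs at least one line item")
        total = sum(prices_cents)
        self.repo.save(order_id, total)
        return total


class InMemoryOrderRepository:
    """Adapter: one interchangeable implementation of the port (handy in tests)."""

    def __init__(self) -> None:
        self._rows: Dict[str, int] = {}

    def save(self, order_id: str, total_cents: int) -> None:
        self._rows[order_id] = total_cents

    def get_total(self, order_id: str) -> Optional[int]:
        return self._rows.get(order_id)


# Hypothetical usage: wire the core to the in-memory adapter.
repo = InMemoryOrderRepository()
use_case = PlaceOrder(repo)
total = use_case.execute("order-1", [1200, 350])
```

A database-backed adapter would implement the same two methods; `PlaceOrder` would not change, which is the "port-swap" refactor cost the trade-off table refers to.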
5. Per-architecture trade-offs
| Style | Latency overhead | Operational cost | Refactor cost | Lock-in | Observability burden |
|---|---|---|---|---|---|
| Monolith | minimal | low | low (rename in IDE) | none | low (1 stack trace) |
| Modular monolith | minimal | low | medium (need module boundary discipline) | none | low |
| Microservices | high (10+ ms per hop) | high (K8s + service mesh) | high (distributed change) | medium (service-mesh, OTel) | very high (distributed tracing required) |
| Serverless functions | cold-start 100ms–10s | depends on usage | medium (per-function deploy) | high (per-cloud runtime) | provider-specific (CloudWatch, GCP Logs, Azure Monitor) |
| Event-driven | async (bus latency) | medium (Kafka cluster) | medium (event-schema evolution) | medium (broker) | very high (event tracing) |
| CQRS | medium (read model lag) | medium (two persistence layers) | high (must maintain two models) | low (concept-only) | high |
| Event sourcing | high (event replay cost) | medium (event store) | very high (event schema is forever) | medium-high | very high |
| Hexagonal | minimal | minimal | low (port-swap) | none | minimal |
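To make the event-sourcing row's "event replay cost" concrete, here is a toy sketch of an append-only log with periodic snapshots. Everything in it is an assumption made for illustration (the `Account` class, the two event kinds, snapshotting every 3 events); a real event store adds persistence, versioned schemas, and concurrency control.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Event = Tuple[str, int]  # (kind, amount_cents); kinds: "deposited" / "withdrawn"


def apply(balance: int, event: Event) -> int:
    """Fold one event into the current state."""
    kind, amount = event
    if kind == "deposited":
        return balance + amount
    if kind == "withdrawn":
        return balance - amount
    raise ValueError(f"unknown event kind: {kind}")


@dataclass
class Account:
    log: List[Event] = field(default_factory=list)  # append-only source of truth
    snapshot: Tuple[int, int] = (0, 0)               # (events_covered, balance)
    snapshot_every: int = 3                          # illustrative value

    def append(self, event: Event) -> None:
        self.log.append(event)
        if len(self.log) % self.snapshot_every == 0:
            self.snapshot = (len(self.log), self.balance())

    def balance(self) -> int:
        """Replay only the events recorded after the last snapshot."""
        covered, balance = self.snapshot
        for event in self.log[covered:]:
            balance = apply(balance, event)
        return balance

    def full_replay(self) -> int:
        """Rebuild state from the first event; cost grows with log length."""
        balance = 0
        for event in self.log:
            balance = apply(balance, event)
        return balance


account = Account()
for e in [("deposited", 100), ("withdrawn", 30), ("deposited", 5), ("deposited", 20)]:
    account.append(e)
```

Here `balance()` replays one event on top of the snapshot while `full_replay()` walks all four; both must agree, and the snapshot is only a cache that can be discarded and rebuilt from the log.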
6. Data consistency boundary — where it bites
| Style | Cross-component consistency | Typical answer |
|---|---|---|
| Monolith | 1 ACID transaction across all features | trivial |
| Modular monolith | 1 schema; 1 ACID; modules respect logical bounds | trivial |
| Microservices | per-service DB; cross-service eventually consistent | saga (orchestration via Temporal / Cadence; choreography via events on Kafka) |
| Event-driven | each consumer at its own offset; eventual | dead-letter queues + retry budget + idempotency keys |
| CQRS | write model is consistent; read model lags | inform UI of staleness; “your change has been recorded, may take a moment to appear” |
| Event sourcing | eventual on materialized views; perfect on event log | replay; snapshot every N events for performance |
| Lambda | batch is consistent; speed layer is approximate | reconcile (batch overwrites speed daily) |
| Kappa | stream is source-of-truth | replay from earliest |
Two-phase commit (2PC) across services is effectively dead: it is brittle, blocks on the coordinator, and tolerates partitions poorly. Modern services use sagas, the outbox pattern, and event sourcing instead. See microservices-patterns §saga, outbox, and CDC.
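As a rough illustration of the saga idea (orchestration flavor), the sketch below runs steps in order and, when one fails, runs the compensations of the completed steps in reverse. The `run_saga` helper and the order-flow step names are hypothetical; in production this role is usually played by a workflow engine rather than an in-process loop.

```python
from typing import Callable, Dict, List, Tuple

# (name, action, compensation); all three are illustrative
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_saga(steps: List[Step]) -> Tuple[bool, List[str]]:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    journal: List[str] = []
    done: List[Step] = []
    for step in steps:
        name, action, _ = step
        try:
            action()
        except Exception:
            journal.append(f"failed:{name}")
            for undo_name, _, compensate in reversed(done):
                compensate()
                journal.append(f"compensated:{undo_name}")
            return False, journal
        done.append(step)
        journal.append(f"done:{name}")
    return True, journal


state: Dict[str, bool] = {"reserved": False, "charged": False}


def reserve() -> None:
    state["reserved"] = True


def release() -> None:
    state["reserved"] = False


def charge() -> None:
    state["charged"] = True


def refund() -> None:
    state["charged"] = False


def ship() -> None:
    raise RuntimeError("carrier unavailable")  # simulated downstream failure


def cancel_shipment() -> None:
    pass


ok, journal = run_saga([
    ("reserve_inventory", reserve, release),
    ("charge_card", charge, refund),
    ("ship_order", ship, cancel_shipment),
])
```

Unlike 2PC, no participant holds locks while waiting: each step commits locally, and consistency is restored after the fact by compensation, which is why compensations need to be idempotent and safe to retry.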
7. The four service-coordination patterns
| Pattern | Mechanism | Used in | Trade-off |
|---|---|---|---|
| Synchronous RPC | HTTP/REST, gRPC, GraphQL request-response | most “regular” microservices | cascading failures; tight coupling |
| Asynchronous messaging | RabbitMQ, SNS, ActiveMQ | classical EDA | at-least-once delivery; ordering not guaranteed |
| Event streaming | Kafka, Pulsar, Kinesis, Redpanda | LinkedIn-style platforms | per-partition order; replayable |
| Workflow orchestration | Temporal, AWS Step Functions, Camunda, Cadence | Uber-DoorDash-Coinbase style | durable state machine; coding model not infra model |
Temporal (2019, Maxim Fateev + Samar Abbas, ex-Uber Cadence) deserves a callout: by 2024 it is the default answer to “I need a multi-step workflow that survives crashes”, used by Snap, Coinbase, Stripe (partial), Datadog, Box, and DoorDash. Step Functions is the AWS-locked equivalent; Camunda 8 is the BPM-flavored variant.
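The "durable state machine" idea behind workflow orchestration can be shown with a toy: record each completed step's result in a journal, and on restart skip whatever the journal already contains. This is not the Temporal, Cadence, or Step Functions API; `run_workflow`, the JSON journal file, and the step names are all assumptions made for the sketch.

```python
import json
import os
import tempfile
from typing import Callable, Dict, List, Tuple


def run_workflow(journal_path: str,
                 steps: List[Tuple[str, Callable[[], str]]]) -> Dict[str, str]:
    """Run steps in order, persisting each result; completed steps are skipped on rerun."""
    results: Dict[str, str] = {}
    if os.path.exists(journal_path):
        with open(journal_path) as f:
            results = json.load(f)
    for name, step in steps:
        if name in results:
            continue  # finished before the crash; do not execute again
        results[name] = step()
        # A real engine would write atomically to replicated storage.
        with open(journal_path, "w") as f:
            json.dump(results, f)
    return results


calls: List[str] = []
crash_once = {"armed": True}


def charge() -> str:
    calls.append("charge")
    return "charged"


def send_email() -> str:
    calls.append("send_email")
    if crash_once["armed"]:
        crash_once["armed"] = False
        raise RuntimeError("process died")  # simulated crash
    return "sent"


steps = [("charge", charge), ("send_email", send_email)]
path = os.path.join(tempfile.mkdtemp(), "journal.json")
try:
    run_workflow(path, steps)
except RuntimeError:
    pass  # the "process" crashed mid-workflow
final = run_workflow(path, steps)  # restart: resumes after the last journaled step
```

After the restart, `charge` is not executed a second time. Getting this right across machines, timers, and retries is the hard part, which is the argument for using an engine instead of rolling your own state machine.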
8. Service mesh — when sidecars matter
| Mesh | Sidecar / dataplane | Adoption stage (2025) |
|---|---|---|
| Istio | Envoy | mature; consolidating with Cilium |
| Linkerd | linkerd2-proxy (Rust) | mature; lightweight |
| Consul Connect | Envoy + Consul | enterprise (HashiCorp) |
| Cilium Service Mesh | eBPF (no sidecar) | growing fast; sidecar-less |
| Kuma / Kong Mesh | Envoy | small footprint |
| AWS App Mesh | Envoy | end of support announced 2024 (effective September 30, 2026); AWS points users to ECS Service Connect or VPC Lattice |
| Open Service Mesh (Microsoft) | Envoy | discontinued; project archived (announced 2023) |
| Solo.io Gloo Mesh | Envoy | commercial Istio packaging |
eBPF-based meshes (Cilium) are the late-2020s direction — no sidecar overhead, kernel-level enforcement. See ebpf-and-kernel-observability and containers-service-mesh.
9. The Prime Video 2023 reversal
Amazon Prime Video published a now-famous case study (Marcin Kolny, March 2023): an audio/video monitoring service originally built from serverless Lambda functions orchestrated by Step Functions was rewritten as a single process running in an ECS task. The result: 90% cost reduction. Lesson: per-frame data flow is the wrong workload for serverless function pricing; long-lived shared memory wins.
This is not “microservices are dead” — it’s “match the workload to the architecture”. Prime Video’s broader catalog discovery / playback / DRM stack remains microservices.
10. Cloud provider preferences
| Provider | Microservices substrate | Serverless | Edge |
|---|---|---|---|
| AWS | ECS, EKS, App Runner | Lambda, Step Functions, AppSync, EventBridge | CloudFront Functions, Lambda@Edge |
| GCP | GKE, Cloud Run | Cloud Functions, Workflows, Eventarc | Cloud Run + Cloud CDN |
| Azure | AKS, Container Apps | Azure Functions, Durable Functions, Logic Apps | Azure Front Door / Functions Premium |
| Vercel | n/a | Serverless Functions, Edge Functions | yes (default) |
| Cloudflare | n/a | Workers | yes (default) |
| Fly.io | Fly Machines | n/a | yes (regional VM-per-app) |
| Render | Web Services | Background Workers | partial |
| Fastly | n/a | Compute@Edge (WASM) | yes (default) |
See cloud-provider-service-mapping for the full mapping.
11. Observability burden
| Architecture | Logs | Metrics | Traces | Required tooling |
|---|---|---|---|---|
| Monolith | 1 stream per replica | service-level | function-level | Datadog / New Relic / one-stack |
| Microservices | per-service streams | per-service + service-mesh | OpenTelemetry + Jaeger / Tempo / Honeycomb / Datadog APM | distributed tracing mandatory |
| Serverless | per-invocation | per-function | provider-native + OTel | CloudWatch + X-Ray, GCP Logging + Trace, etc. |
| Event-driven | per-consumer | per-topic + lag | event-level correlation IDs | Kafka UI / Datastreams |
| Edge functions | per-region | per-region | OTel + provider | Cloudflare Trace Workers, Vercel Observability |
OpenTelemetry (OTel, CNCF 2019 merger of OpenTracing + OpenCensus) is the de-facto standard 2023+ for distributed tracing. See observability-stack.
12. Cost profile
| Architecture | Compute cost | Network cost | Operational cost | Engineering cost |
|---|---|---|---|---|
| Monolith | low (reserved instance) | low (intra-process) | low | low |
| Modular monolith | low | low | low | medium (discipline) |
| Microservices | medium (overprovisioned to avoid cascading failure) | high (inter-service hops, NAT) | high (K8s + mesh + observability stack) | high (oncall + tooling) |
| Serverless | usage-based; explosive past breakeven | usage-based | low (provider-managed) | medium (cold start + provider lock-in) |
| Event-driven | medium | medium (broker) | medium (broker ops) | high (event schema discipline) |
| Edge functions | usage-based | minimal | low | medium (cold start, runtime limits) |
The serverless breakeven point is roughly 30% sustained utilization versus reserved capacity: below 30%, serverless wins on cost; above it, reserved instances win. Provisioned concurrency softens the cold-start tax but is billed much like reserved capacity.
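The breakeven arithmetic can be sketched as follows. Reserved capacity bills every hour whether busy or idle, while serverless bills only busy time, so parity occurs where reserved cost equals utilization times the serverless cost of a fully busy hour. The prices below are hypothetical, picked so the breakeven lands at 30%; real numbers depend on provider, memory size, and request mix.

```python
def breakeven_utilization(reserved_per_hour: float,
                          serverless_per_busy_hour: float) -> float:
    """Utilization at which reserved and serverless cost the same.

    Parity condition: reserved_per_hour == utilization * serverless_per_busy_hour.
    """
    if reserved_per_hour <= 0 or serverless_per_busy_hour <= 0:
        raise ValueError("prices must be positive")
    return reserved_per_hour / serverless_per_busy_hour


def cheaper_option(utilization: float,
                   reserved_per_hour: float,
                   serverless_per_busy_hour: float) -> str:
    """Compare the hourly cost of both options at a given sustained utilization."""
    serverless_cost = utilization * serverless_per_busy_hour
    return "serverless" if serverless_cost < reserved_per_hour else "reserved"


# Hypothetical prices (USD per hour), not taken from any provider's price list.
RESERVED = 0.06
SERVERLESS = 0.20

breakeven = breakeven_utilization(RESERVED, SERVERLESS)
low = cheaper_option(0.10, RESERVED, SERVERLESS)
high = cheaper_option(0.60, RESERVED, SERVERLESS)
```

With these inputs the breakeven is about 0.30, a 10%-utilized workload favors serverless, and a 60%-utilized one favors reserved capacity. The model ignores request fees, data transfer, and operational labor, all of which shift the line.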
13. Decision tree
What size is your team / how mature is the product?
├─ Solo / pre-PMF / < 5 engineers
│ → Monolith. Stop. Don't read the rest.
│ → e.g., Heroku-deployed Rails / Django / Phoenix
├─ 5–30 engineers, single product
│ → Modular monolith.
│ → Shopify, Basecamp, GitLab examples.
│ → Use bounded contexts (DDD) inside the monolith.
├─ 30–100 engineers, multiple product lines
│ → Macroservices (~5–30 services).
│ → Stripe-style: each service is large + owned by 1 team.
│ → Avoid premature decomposition.
├─ 100+ engineers, clear bounded contexts, scaling pain in deploy / data
│ → Microservices, organized into DDD domains.
│ → DOMA-style aggregation to avoid "service explosion".
│ → Need: service mesh, distributed tracing, K8s.
├─ 1000+ engineers
│ → Microservices with strong platform team.
│ → Netflix, Amazon, Uber DOMA scale.
│ → Internal developer platform (IDP) becomes a product.
├─ "I have spiky traffic and don't want to manage servers"
│ → Serverless (Lambda, Cloud Run, Vercel, Cloudflare Workers).
│ → Watch breakeven; cold start; vendor lock-in.
├─ "I need an audit log of every change, replayable"
│ → Event sourcing + CQRS.
│ → Financial systems, healthcare, regulated; Klarna, Walmart Inventory.
│ → Event schema discipline is mandatory; events live forever.
├─ "I need real-time analytics + transactional"
│ → Lambda or Kappa.
│ → Kappa preferred 2020+; Flink + Kafka.
├─ "I have a long-running multi-step workflow that needs to survive crashes"
│ → Workflow orchestration (Temporal, Step Functions, Cadence, Camunda).
│ → Don't roll your own state machine.
├─ "I want global low-latency"
│ → Edge functions (Cloudflare Workers, Vercel Edge, Fastly).
│ → Combine with edge DB (Turso libSQL, D1, Upstash).
└─ "I'm doing collaborative real-time"
→ CRDT-based + websocket/SSE.
→ Yjs + Liveblocks + edge.
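The tree above can also be read as a lookup: team size picks a default style, and workload needs add styles on top. The sketch below encodes that reading. The function names and need keys are made up for illustration, and the handling of boundary values (for example, treating exactly 30 engineers as modular monolith) is an assumption, since the tree's ranges overlap at their edges.

```python
from typing import Dict, List, Sequence

# Workload branches of the tree, keyed by hypothetical short names.
WORKLOAD_STYLES: Dict[str, str] = {
    "spiky_traffic": "serverless",
    "audit_log": "event sourcing + CQRS",
    "realtime_analytics": "Lambda or Kappa",
    "durable_workflow": "workflow orchestration",
    "global_low_latency": "edge functions",
    "collaborative_realtime": "CRDT-based + websocket/SSE",
}


def default_style(engineers: int) -> str:
    """Team-size branches of the tree; boundary handling is an assumption."""
    if engineers < 0:
        raise ValueError("engineers must be non-negative")
    if engineers < 5:
        return "monolith"
    if engineers <= 30:
        return "modular monolith"
    if engineers < 100:
        return "macroservices"
    if engineers < 1000:
        return "microservices (DDD domains)"
    return "microservices + platform team"


def suggest(engineers: int, needs: Sequence[str] = ()) -> List[str]:
    """Default style for the team size, followed by one style per stated need."""
    return [default_style(engineers)] + [WORKLOAD_STYLES[n] for n in needs]


small_team = suggest(12)
large_team = suggest(150, ["durable_workflow", "audit_log"])
```

Treat the output as a starting point for discussion, not a verdict: the tree's own caveats (breakeven, cold start, event schema discipline) still apply to whatever it returns.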
14. The anti-patterns
- Distributed monolith — services that must deploy together because of tight schema coupling. Worse than a monolith.
- Nanoservices — one-function-one-service. Communication cost dwarfs compute. Uber learned this 2018–2020.
- Shared database across microservices — destroys the bounded-context boundary. Tempting; always wrong.
- Synchronous chain of calls through 5+ services — cascading failures, latency multiplication.
- 2PC across microservices — see §6 above. Use sagas.
- Event sourcing for everything — overkill for CRUD apps. Reserve for audit-heavy domains.
- Microservices before product-market fit — wastes engineering capacity solving a problem you don’t have.
- Monolith with no module boundary discipline — becomes a big-ball-of-mud at scale 50+.
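The cost of a synchronous call chain is easy to underestimate, so here is the arithmetic as a small sketch. It assumes failures are independent and calls are strictly sequential, and it uses illustrative numbers (99.9% availability per service, 10 ms per hop as in the trade-off table); correlated failures and retries make real systems behave differently.

```python
def chain_availability(per_service: float, hops: int) -> float:
    """Probability that every service in a sequential chain answers successfully."""
    if not 0.0 <= per_service <= 1.0 or hops < 0:
        raise ValueError("per_service must be in [0, 1] and hops non-negative")
    return per_service ** hops


def chain_latency_ms(per_hop_ms: float, hops: int) -> float:
    """Added latency when each hop waits for the next one to finish."""
    return per_hop_ms * hops


availability_5 = chain_availability(0.999, 5)  # about 0.995
latency_5 = chain_latency_ms(10.0, 5)          # 50 ms of hop overhead
```

Five services at 99.9% each yield roughly 99.5% for the chain, which is about five times the unavailability of any single service, before counting timeouts and retry storms.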
15. The 2024–2026 frontier
- Internal Developer Platforms (IDPs) — Backstage (Spotify, CNCF), Humanitec, Port, Cortex. Codifies the platform team’s offerings as a product.
- Cell-based architecture (AWS, Netflix Vault) — bulkhead microservices into cells; each cell is independent. Recovery and blast-radius gains.
- Sidecar-less mesh (Cilium) — eBPF instead of Envoy sidecar; 50–80% latency reduction.
- WASM Components + WASI Preview 2 (2024) — WASM-as-microservice; Cosmonic, Fermyon Spin, Suborbital.
- Active-active multi-region by default — Aurora DSQL, Spanner, CockroachDB, Yugabyte ship with this.
- Service catalog + ownership (Backstage + OpsLevel + Cortex) — answering “who owns this service” at 1000+ service count is its own engineering problem.
- Server Components / RSC (React 18+, Next.js 13+, Remix) — blur monolith/edge boundary at the rendering layer.
- Database-per-tenant (Turso libSQL, Neon branches, Crunchy Bridge) — many small DBs instead of one big DB with row-level security.
- The Prime Video reversal — public lesson that “microservices for everything” was an overcorrection. Match workload to architecture.
Adjacent
- Distributed systems primitives — distributed-systems-fundamentals for FLP, consensus, and clock theory.
- Consensus — consensus-protocols for Paxos/Raft/ZAB underlying coordination.
- Consistency models — _compare_consistency_models for the data-side trade-offs.
- Containers + orchestration — containers-service-mesh, kubernetes-deep for the deployment substrate.
- Networking — networking-foundations, http2-http3-quic for service-to-service transport.
- Observability — observability-stack for OTel + Tempo / Jaeger / Honeycomb / Datadog.
- Microservices patterns — microservices-patterns for saga, outbox, CDC, dead-letter, bulkhead.
- Cloud mapping — cloud-provider-service-mapping.
- Queues + streams — message-queue-and-streaming-catalog for Kafka, Pulsar, RabbitMQ, NATS, SQS, EventBridge, Kinesis.
- Auth — auth-authz for service-to-service authentication patterns (mTLS, JWT, OAuth).
When to pick what
The fastest narrowing: < 30 engineers → modular monolith; 30–100 + bounded contexts → macroservices (5–30 large services); 100+ + clear domains → microservices organized DDD-style (DOMA); spiky / event-driven workload → serverless functions; global low-latency static-heavy → edge functions; audit-heavy financial / regulated → event sourcing + CQRS; multi-step durable workflow → Temporal/Step Functions. The single biggest mistake of 2015–2020 was premature microservices: splitting a monolith before there were bounded contexts to split along. The 2020–2025 lesson runs in the opposite direction: consolidate when the inter-service tax outweighs the team-autonomy gain. Choose by team topology, deployment cadence, and blast-radius requirements; ignore microservices hype on social media.