FPGA & Hardware Acceleration

Field-Programmable Gate Arrays (FPGAs) are integrated circuits whose digital logic can be reconfigured after manufacturing. Unlike fixed-function ASICs (which freeze a design in silicon at the foundry) or general-purpose CPUs/GPUs (which execute instruction streams on fixed hardware), FPGAs are configured to implement arbitrary digital circuits: adders, state machines, packet parsers, neural-network accelerators, anything that fits in their resource budget. They are the usual choice when you need near-custom-silicon performance with the flexibility to iterate, when the latency floor must be set by hardware rather than software, or when production volumes don’t justify a multi-million-dollar ASIC tape-out.

Architecture

A modern FPGA is a sea of small configurable blocks tiled across a die with a programmable interconnect.

Logic Fabric

Look-Up Tables (LUTs) — small SRAM-backed truth tables that implement arbitrary Boolean functions. 4-input LUTs were standard through the 1990s; modern AMD devices such as UltraScale+ use 6-input LUTs that can also operate as two 5-input LUTs sharing inputs, while Intel’s Adaptive Logic Module (ALM) is built around an 8-input fracturable LUT. A single 6-LUT implements any function of 6 variables.
Flip-flops — typically one or two per LUT, for sequential state.
Carry chains — dedicated fast arithmetic propagation, critical for adders/subtractors.
Distributed RAM and shift registers — LUTs can also be configured as small RAMs or shift-register chains.
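
The truth-table view of a LUT can be made concrete with a small software model. The Python sketch below is illustrative only: the names make_lut and lut_eval are hypothetical, and the bit ordering (input 0 as the least-significant address bit) is an assumption of this model rather than any vendor's documented convention.

```python
def make_lut(func, k=6):
    """Build the 2**k-bit truth table for a k-input Boolean function."""
    init = 0
    for addr in range(2 ** k):
        # Unpack the address into k input bits (assumed order: input 0 = LSB).
        bits = [(addr >> i) & 1 for i in range(k)]
        if func(*bits):
            init |= 1 << addr
    return init


def lut_eval(init, inputs):
    """Evaluate the LUT: the inputs form an address into the stored table."""
    addr = sum(bit << i for i, bit in enumerate(inputs))
    return (init >> addr) & 1


# Example: 6-input odd parity, stored as 64 configuration bits.
parity6 = make_lut(lambda a, b, c, d, e, f: a ^ b ^ c ^ d ^ e ^ f)
one_bit_set = lut_eval(parity6, [1, 0, 0, 0, 0, 0])   # odd number of ones
two_bits_set = lut_eval(parity6, [1, 1, 0, 0, 0, 0])  # even number of ones
```

Any other 6-input function fits in the same 64 bits of configuration; only the stored contents change, which is why a single 6-LUT covers every function of 6 variables.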

DSP Slices

Hardened multiply-accumulate blocks for arithmetic. The AMD UltraScale+ DSP48E2 handles 27×18 signed multiplication plus a 48-bit accumulator at >700 MHz; Versal’s DSP58 widens this to 27×24 with a 58-bit accumulator. Versal AI Engines layer vector DSPs on top.
A typical large FPGA has thousands to tens of thousands of DSP slices — the workhorse for signal processing and ML.
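
To illustrate what a 27×18 multiplier feeding a 48-bit accumulator implies for fixed-point arithmetic, here is a minimal bit-accurate sketch in Python. The function names are hypothetical, and the wrap-around behaviour on overflow is an assumption of this model; a real slice offers further modes (pre-adder, pattern detection, cascading) that are not modelled here.

```python
def to_signed(value, bits):
    """Wrap an integer into a two's-complement signed value of `bits` width."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def mac(acc, a, b, a_bits=27, b_bits=18, acc_bits=48):
    """One multiply-accumulate step with operand and accumulator widths enforced."""
    a = to_signed(a, a_bits)
    b = to_signed(b, b_bits)
    # Assumption: the accumulator wraps silently when it overflows.
    return to_signed(acc + a * b, acc_bits)


def dot(xs, ws):
    """Dot product computed as a chain of MAC steps, as one slice would over time."""
    acc = 0
    for x, w in zip(xs, ws):
        acc = mac(acc, x, w)
    return acc


result = dot([1, 2, 3], [4, 5, 6])      # 1*4 + 2*5 + 3*6
wrapped = mac(2 ** 47 - 1, 1, 1)        # overflows the 48-bit accumulator
```

The practical consequence is that quantised formats such as INT8 or INT16 map onto DSP slices directly, while wider or floating-point operands need several slices plus fabric logic.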

Memory

Block RAM (BRAM) — 36-kbit dual-port SRAM blocks scattered across the fabric (not to be confused with the LUT-based distributed RAM above). A large FPGA carries 2-10 MB of BRAM total.
UltraRAM (URAM) — larger 288-kbit blocks in UltraScale+ and Versal devices; up to ~36 MB total in Versal Premium parts.
HBM2/HBM2e/HBM3 — high-bandwidth memory integrated into the package via silicon interposer or substrate. AMD Alveo U280 ships 8 GB HBM2 at 460 GB/s; Versal HBM parts go to 32 GB HBM2e. Intel Stratix 10 MX integrates HBM2.
Off-chip DDR4/DDR5 — most accelerator cards add 16-128 GB of off-package DDR via memory controllers in the fabric or hardened MCs.
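
A quick sizing check shows how the on-chip block sizes translate into resource counts for a buffer. The sketch below is a simple lower bound under stated assumptions: it treats a "36-kbit" block as 36 × 1024 bits and a "288-kbit" block as 288 × 1024 bits, and it ignores the fixed depth/width aspect ratios that real tools must map onto, so actual usage can be higher. The helper name blocks_needed is hypothetical.

```python
import math

BRAM_BITS = 36 * 1024    # assumed size of one 36-kbit block RAM
URAM_BITS = 288 * 1024   # assumed size of one 288-kbit UltraRAM


def blocks_needed(depth, width_bits, block_bits):
    """Lower bound on the number of memory blocks for a depth x width buffer."""
    return math.ceil(depth * width_bits / block_bits)


# Example: a 4096-entry buffer of 64-bit words (262,144 bits in total).
brams = blocks_needed(4096, 64, BRAM_BITS)
urams = blocks_needed(4096, 64, URAM_BITS)
```

Under these assumptions the example buffer needs at least 8 BRAMs or a single URAM, which is the kind of trade-off that decides which tier a given buffer should live in.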

I/O and SerDes

GTH / GTY / GTM transceivers (AMD) and equivalents (Intel E-tile, P-tile) handle serial links at 1-112 Gb/s per lane.
400 Gb/s and 800 Gb/s Ethernet, PCIe Gen5/Gen6, CXL 2.0/3.0 endpoints are all built from SerDes lanes.
I/O blocks provide LVDS, single-ended, and protocol-specific signalling.
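
How lane rate turns into link bandwidth can be shown with a short calculation. The sketch below assumes 128b/130b line coding, which applies to PCIe Gen3 through Gen5 (Gen6 and high-speed Ethernet use different coding and FEC schemes and are not modelled). It ignores packet and protocol overhead, and the function name is hypothetical.

```python
def link_bandwidth_gbytes(lane_rate_gt, lanes, payload_bits=128, coded_bits=130):
    """Per-direction bandwidth in GB/s after line-coding overhead."""
    return lane_rate_gt * lanes * payload_bits / coded_bits / 8


# PCIe Gen5 runs each lane at 32 GT/s; a x16 link aggregates 16 lanes.
gen5_x16 = link_bandwidth_gbytes(32, 16)   # roughly 63 GB/s per direction
gen3_x16 = link_bandwidth_gbytes(8, 16)    # 15.75 GB/s per direction
```

Usable throughput is lower once transaction-layer headers and flow control are included, so these figures are ceilings rather than measurements.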

Clock Networks

Global, regional, and clock-region trees deliver low-skew clocks. Modern FPGAs run user logic at 200-700 MHz typical; hardened blocks (DSPs, transceivers, NoC) can exceed 1 GHz.

Columnar Layout

AMD/Xilinx 7-series, UltraScale, UltraScale+ all use a columnar floorplan — alternating columns of CLBs (Configurable Logic Blocks), DSPs, BRAMs, and IO. Versal adopts a Network-on-Chip (NoC) for cross-die communication.

Size

FPGA size is roughly captured by LUT count:

Small/IoT — 1k-100k LUTs (Lattice iCE40, Efinix Trion, low-end Cyclone).
Mid-range — 100k-500k LUTs (mid Cyclone, Arria, mid Kintex, Zynq).
High-end — 1M-3M LUTs (Virtex UltraScale+ VU13P, Stratix 10).
Datacenter monsters — 9M+ LUTs (Versal Premium VP1902 ~9.4M LUTs across multiple chiplets, the largest commercially available FPGA as of 2026).

Major Vendors

AMD (Xilinx)

AMD acquired Xilinx in an all-stock deal announced in October 2020 at $35B and closed in February 2022, making it the largest semiconductor acquisition at the time. Xilinx was founded in 1984 and invented the FPGA. Current portfolio:

7-series (28 nm) — Spartan-7, Artix-7, Kintex-7, Virtex-7. Mature, still shipping.
UltraScale (20 nm), UltraScale+ (16 nm FF+) — Kintex, Virtex; the workhorse datacenter parts. Zynq UltraScale+ adds quad-core ARM Cortex-A53 + dual Cortex-R5 + Mali GPU on the same die.
Versal ACAP (Adaptive Compute Acceleration Platform) — 7 nm TSMC. Combines:
- FPGA programmable logic
- AI Engines (vector-DSP tiles for ML/DSP, INT4/INT8/BF16)
- ARM Cortex-A72 application processors and Cortex-R5 real-time processors
- Hardened NoC fabric for on-die data movement
- PCIe Gen5, CXL, HBM2e, 600G Ethernet
- Variants: Versal AI Core (ML), AI Edge, Premium (datacenter), HBM (memory-bandwidth), Prime
Alveo accelerator cards — pre-validated PCIe FPGA cards:
- U25 / U30 (smaller, video transcode)
- U50 (low-profile, ~$1500)
- U55C (HPC, HBM, ~$5000)
- U250, U280 (larger, U280 with HBM2 ~$9500)
- V70 (Versal AI inference, ~$10k+)
- X3/AMA T1/T2 (video transcoding for live streaming, post-Pensando integration)

Intel (Altera)

Intel acquired Altera in 2015 for $16.7B. After years of integration friction and missed roadmap targets, Intel announced in 2024 that Altera would spin out as an independent company (with Intel retaining a stake; private equity firm Silver Lake led a $4.46B investment for 51% in April 2025, valuing Altera at ~$8.75B, far below the 2015 purchase price). Portfolio:

Cyclone V/10 — low-cost, mid-range; widely used in embedded.
Arria V/10 — mid-range; 28/20 nm.
Stratix V / 10 / 10 MX (HBM) — high-end; 28 nm and 14 nm Intel.
Agilex 5 / 7 / 9 — current generation on Intel 7 (formerly 10 nm) and TSMC. Agilex 7 includes hardened crypto and AI tensor blocks.
eASIC (acquired 2018) — structured-ASIC technology; Agilex 5 family integrates eASIC for fixed-function blocks.
Programmable Acceleration Cards — PAC D5005 (Stratix 10), N3000 (smartNIC), N6000 (Agilex). Many are reaching end-of-life as Altera refocuses.

Lattice Semiconductor

Focuses on low-power, small, edge FPGAs. Far from AMD/Intel’s datacenter scale, but dominant in its niche.

iCE40 UltraPlus — ultra-low-power (sub-mW idle), used in always-on devices, mobile cameras, drones. Famously the first FPGA family with a fully open-source toolchain (yosys + nextpnr + Project IceStorm by Claire Wolf).
ECP5 / ECP5-5G — mid-range, also fully open-source toolchain.
MachXO3 / MachXO5 — control-plane FPGAs.
CertusPro-NX — Lattice Nexus 28 nm FD-SOI platform; aimed at server motherboard management, 5G fronthaul.
Avant — newer mid-range platform on the Nexus 2 architecture, announced 2023.

Microchip (Microsemi)

Microchip acquired Microsemi in 2018 for $8.35B, inheriting Microsemi’s FPGA line (which Microsemi itself acquired from Actel in 2010).

PolarFire — mid-range, low-power 28 nm SONOS-based; flash-based config (no boot SPI, instant-on).
PolarFire SoC — adds a hardened RISC-V subsystem (four SiFive U54 application cores plus one E51 monitor core) instead of ARM.
IGLOO / IGLOO2 — low-power legacy.
SmartFusion / SmartFusion2 — FPGA + ARM Cortex-M3.
RT PolarFire — radiation-tolerant for space (Mars rovers, satellites).

Achronix

US fabless company, ~$50M+ revenue, focus on high-performance datacenter FPGAs and embedded FPGA (eFPGA) IP licensed into ASICs.

Speedster7t — 7 nm TSMC, 1+ million LUTs, hardened ML matrix engine, 2D NoC, 400G Ethernet, GDDR6. Direct competitor to high-end Xilinx Versal and Intel Stratix 10.
Speedcore — eFPGA IP for licensing.

Efinix

Small fabless FPGA vendor focused on edge AI and embedded.

Trion — 40 nm, ultra-small footprint.
Titanium — 16 nm, mid-range; targets edge inference, vision.

QuickLogic

Specialises in low-power eFPGA IP and full-stack open-source tools. Major contributor to open-source FPGA flows (F4PGA).

Gowin

Chinese FPGA vendor, low-cost mid-range parts (LittleBee, Arora), widely used in consumer electronics inside China; gaining international visibility on cost.

Other Players

Cologne Chip — German vendor (GateMate), aimed at low-cost.
Menta, Flex Logix — eFPGA IP licensing.
TSMC, Samsung, GlobalFoundries — foundries that fabricate FPGAs.

ASIC Competitors

FPGAs share the “specialised silicon” market with:

General ASICs — designed once, fabricated on TSMC/Samsung/Intel Foundry. Networking ASICs (Broadcom Tomahawk, Trident, Jericho; Marvell), mobile SoCs (MediaTek, Qualcomm), custom cloud silicon (AWS Graviton/Nitro, Google TPU, Microsoft Cobalt/Maia).
GPUs — NVIDIA H100 (~$30k), H200, B200/B100 (Blackwell), AMD MI300X, MI325X.
TPUs — Google’s Tensor Processing Units, v1 (2015) through v5p (2023) and v6/Trillium (2024); systolic-array architecture.
AWS Trainium / Inferentia — custom AWS ML chips; Trainium 2 (2024).
Cerebras WSE-3 — wafer-scale engine, 900k cores on a single 46,225 mm² wafer.
Tenstorrent Wormhole / Grayskull / Blackhole — RISC-V based ML processor; led by Jim Keller post-2021.
Groq LPU — Language Processing Unit; deterministic dataflow, optimised for LLM inference latency.
SambaNova SN40L — reconfigurable dataflow architecture.
Furiosa AI Warboy / RNGD — Korean ML accelerator.
Rebellions ATOM / REBEL — Korean inference chip.
MatX — startup, LLM-specific.
Etched Sohu (2024) — first announced “transformer ASIC”, baking attention into silicon; bet that transformer architectures stay stable.

Programming Models

Hardware Description Languages

Verilog — Phil Moorby (Gateway Design Automation), 1984. Standardised IEEE 1364 (1995, 2001, 2005). The dominant HDL.
SystemVerilog — IEEE 1800 (2005, current revision 2023). Superset of Verilog adding interfaces, assertions, classes, randomisation, coverage. Standard for verification.
VHDL — DARPA VHSIC programme 1983-1985, IEEE 1076 (1987, 2008, 2019). Strongly typed; popular in Europe, defence, aerospace.
Chisel — Berkeley 2012, Scala-based hardware DSL by Jonathan Bachrach et al. Used by SiFive; the RISC-V Rocket and BOOM cores are written in Chisel. Generates Verilog via FIRRTL.
SpinalHDL — Scala-based; cleaner than Chisel for many uses; growing adoption.
MyHDL — Python-based.
Bluespec (BSV) — guarded atomic actions, very expressive; Bluespec Inc. open-sourced 2020.
Amaranth — Python-based, by M-Labs / lambdaconcept; modern alternative to MyHDL.

High-Level Synthesis (HLS)

HLS compiles C/C++ (with pragmas) or OpenCL into RTL. Productivity gain at the cost of less direct control.

Vitis HLS (AMD) — successor to Vivado HLS. C/C++ → RTL with #pragma HLS directives for pipelining, unrolling, dataflow.
Intel HLS Compiler / oneAPI for FPGAs — Intel’s HLS flow; integrates with oneAPI SYCL-based programming.
Catapult HLS (Siemens, formerly Mentor Graphics) — high-end commercial HLS.
Stratus HLS (Cadence) — high-end commercial HLS, popular in ASIC HLS too.
OpenCL FPGA flows — both AMD/Xilinx (SDAccel → Vitis) and Intel (Intel FPGA SDK for OpenCL → oneAPI) compile OpenCL kernels to FPGA.
LegUp (academic, → Microchip SmartHLS) — C/C++ HLS, now Microchip’s flow for PolarFire.
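
The effect of a pipelining directive (for example `#pragma HLS PIPELINE II=1` in Vitis HLS) can be estimated with the usual first-order model: a loop with a given per-iteration latency and initiation interval (II) finishes in latency + II × (trip count − 1) cycles, instead of latency × trip count without pipelining. The Python sketch below encodes that model; the function names and the example numbers (1024 iterations, 8-cycle latency, 300 MHz clock) are hypothetical, and real tool reports also include loop entry/exit overhead.

```python
def loop_cycles(trip_count, iteration_latency, ii=None):
    """Estimated cycle count for a loop; ii=None means the loop is not pipelined."""
    if ii is None:
        # Each iteration must finish before the next one starts.
        return trip_count * iteration_latency
    # A new iteration starts every `ii` cycles; the last one still needs
    # the full iteration latency to drain.
    return iteration_latency + ii * (trip_count - 1)


def cycles_to_ns(cycles, clock_mhz):
    """Convert a cycle count to nanoseconds at the given clock frequency."""
    return cycles * 1000.0 / clock_mhz


sequential = loop_cycles(1024, 8)          # 8192 cycles
pipelined = loop_cycles(1024, 8, ii=1)     # 1031 cycles
pipelined_ns = cycles_to_ns(pipelined, 300)
```

In this model the pipelined loop is roughly eight times faster, and the gain shrinks in proportion if memory port conflicts or loop-carried dependencies force the tool to a larger II.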

Datacenter Acceleration Runtimes

XRT (Xilinx Runtime) — userspace + kernel modules for managing Alveo cards; OpenCL-style host code with kernel binaries.
oneAPI — Intel’s unified programming model spanning CPU/GPU/FPGA via SYCL.
PYNQ — Python framework on Zynq SoCs; popular for prototyping.

Tooling

Vivado (AMD) — primary 7-series + UltraScale+ flow; replaced ISE (which was retired ~2015). Synthesis, P&R, timing, bitgen.
Vitis (AMD) — Unified suite spanning hardware (Vivado), software (cross-compilers), AI Engine (Versal AIE compiler), and accelerator libraries.
Quartus Prime (Intel) — Pro, Standard, and Lite editions. Modelsim → Questa for simulation.
Lattice Diamond / Radiant — Lattice toolchains.
Libero SoC (Microchip) — Microchip’s tool flow.

Open-source Tooling

Yosys — open-source synthesis (Claire Wolf, originally for Lattice iCE40; now supports many targets).
nextpnr — open-source place-and-route; supports iCE40, ECP5, Nexus, some Gowin parts.
Project IceStorm / Trellis / Oxide — reverse-engineered bitstream documentation for Lattice families.
F4PGA (Free and Open Source Flow for FPGAs) — umbrella project; supports Lattice, some Xilinx 7-series (limited subset), QuickLogic.
Verilator — open-source Verilog simulator; cycle-accurate; very fast; used by RISC-V projects extensively.
GHDL — open-source VHDL simulator.

Cloud FPGA

AWS F1 — launched 2017 as the first major cloud FPGA offering. The original f1.2xlarge / f1.4xlarge / f1.16xlarge instances use Xilinx UltraScale+ VU9P parts (one to eight per instance), with designs packaged as a custom AFI (Amazon FPGA Image). F2 instances, announced in late 2024, use Virtex UltraScale+ HBM VU47P parts with 16 GB of on-package HBM each.
Azure NP-series — Xilinx U250 FPGAs; aimed at financial workloads. Microsoft also operates a much larger internal FPGA fleet via Project Catapult / Brainwave (see Bing/Azure deployments).
Alibaba Cloud F3 — Xilinx FPGA-as-a-service instances.
Tencent / Baidu / Huawei — internal FPGA fleets; some external offerings.
Bittware (now part of Molex, acquired 2017) — preferred OEM/ODM for FPGA accelerator cards; IA-420F (Agilex 7), IA-840F (Agilex 7 HBM), 520N-MX (Stratix 10 MX HBM). Bittware ships pre-validated cards to OEMs and end users; many AWS F1 / Azure NP cards are Bittware OEM.

Applications

Networking

SmartNICs and DPUs — many use FPGAs:
- NVIDIA/Mellanox ConnectX-7 (ASIC) and BlueField-3 DPU (ASIC + ARM); FPGAs less central here, ASICs dominate.
- Intel IPU E2000 — ASIC.
- AMD Pensando DSC2/DSC3 — ASIC (Pensando acquired by AMD 2022 for $1.9B).
- Marvell Octeon — ASIC.
- FPGA SmartNICs — Napatech, Silicom, Bittware FPGA-based offload cards; Microsoft Azure historically used FPGA SmartNICs (Catapult v2) before transitioning some workloads to ASIC.
F5 BIG-IP — historically used Xilinx FPGAs for SSL offload and traffic management; newer ASIC-heavy.
Fortinet — uses custom security ASICs but with FPGAs in some product lines.
5G fronthaul / RAN — Lattice CertusPro-NX, Xilinx Zynq RFSoC (FPGA + ADC/DAC integrated). O-RAN deployments lean on Zynq RFSoC for the L1 layer.

High-Frequency Trading

The classic FPGA application. Sub-microsecond market-data-to-order latency.

Solarflare (acquired by AMD/Xilinx in 2019) — FPGA-based NICs (Onload, X2522) used by major HFT firms.
Algo-Logic Systems — FPGA trading IP for major exchanges.
AMD U10/U15/U20 SmartNICs — pre-validated low-latency cards for tape arbitrage.
Customers (publicly discussed) — Jane Street, Jump Trading, IMC Trading, Optiver, Citadel Securities, Tower, Hudson River Trading, DRW, Virtu. The state of the art is sub-100 nanosecond tape arbitrage on heavily customised FPGAs reading from co-located exchange feeds.
CME, Nasdaq, ICE, Eurex — many exchanges themselves use FPGA matching engines.

Genomics

Illumina DRAGEN — Illumina acquired Edico Genome in 2018 (~$100M reported); DRAGEN platform uses Xilinx FPGAs (UltraScale+) for accelerated BWA short-read alignment, variant calling. Used by major sequencing centres.
Sentieon — competing genomics pipeline with FPGA acceleration via Bittware cards.
Roche / NovaSeq — sequencer-side FPGA processing.

AI / ML Inference

FPGAs had a moment 2015-2020 as ML accelerators but have been largely outclassed by GPUs and dedicated ASICs (TPU, Trainium, Cerebras, Groq) for LLM workloads.

Microsoft Project Catapult — pioneering FPGA datacenter deployment (announced 2014; ISCA 2014 paper “A Reconfigurable Fabric for Accelerating Large-Scale Datacenter Services” by Putnam et al.). Bing search ranking accelerated by FPGAs in production. Project Brainwave extended Catapult to ML inference (Stratix 10), publicised 2017.
AMD Versal AI Engines — vector DSP tiles; INT4/INT8/BF16; FP32 fallback. Targeted at edge ML, signal processing, sparse workloads. Alveo V70 packages Versal AI Edge.
Intel Stratix 10 NX — AI Tensor blocks (INT4/INT8/BF16 MAC arrays).
Microsoft Azure ML inference — historically used FPGAs (Brainwave) for some Bing ranking and Office services; transitioning to ASIC (Maia 100, Cobalt).
Edge AI — Lattice sensAI (iCE40 + ML), Microchip VectorBlox, Efinix Quantum — small FPGAs for always-on vision, keyword spotting, anomaly detection at <100 mW.

Video

NETINT Quadra T1/T2 — FPGA-based H.265/AV1 video transcoders for live streaming.
AMD AMA T1 / T2 (Alveo Media Accelerators) — succeeding Xilinx VCU1525, deployed by Twitch and others for live transcoding.
Broadcast infrastructure — Grass Valley, Evertz, NEP — heavy FPGA use for SMPTE 2110, ST 2022, 4K/8K SDI conversion.

Aerospace and Defence

Radar processing — Northrop Grumman, Lockheed, Raytheon, BAE all use FPGAs for radar/EW (electronic warfare) signal processing chains.
Signal intelligence — agencies use rad-hard FPGAs.
Rad-hard parts — Microchip RT PolarFire (TID 100 krad, SEU-immune flash config), Xilinx Kintex UltraScale RT/Virtex 5QV (Mars Perseverance rover uses Xilinx Virtex 5QV for image processing), NanoXplore NG-MEDIUM (European space FPGAs).
Phased array radar — Zynq RFSoC dominates ground-based and shipboard radar.

Quantum Computing Control

The classical-quantum interface needs nanosecond-latency arbitrary waveform generation and feedback.

Quantum Machines OPX / OPX+ / OPX1000 — uses Xilinx UltraScale+ + RFSoC; market leader for academic and commercial quantum control.
Zurich Instruments SHFQC / HDAWG — Xilinx-based.
Keysight, Tektronix — high-end AWG/digitiser instruments use FPGAs throughout.
IBM Quantum, Google Quantum AI, Rigetti, Quantinuum — all use custom FPGA-based control electronics.

Test and Measurement

Keysight, Tektronix, Rohde & Schwarz, NI/Emerson — virtually every modern oscilloscope, signal analyser, network analyser ships Xilinx or Intel FPGAs as the central data pipe between ADCs and the processor.
National Instruments PXI / FlexRIO / LabVIEW FPGA — FPGAs are NI’s bread and butter.

Cryptocurrency

Pre-ASIC era (Bitcoin ~2011-2013), FPGAs were the dominant miner — Spartan-6 LX150 cards. Once Bitcoin ASICs arrived in 2013, FPGAs were uneconomical for SHA-256. They retain some role for ASIC-resistant or new privacy coins (some Monero variants, periodic algorithm forks).

Performance Metrics

Throughput — Gb/s for networking, GFLOPS or TOPS for compute. Versal Premium tops 14 TFLOPS FP32, 200+ TOPS INT8.
Latency — nanoseconds. The defining FPGA advantage: deterministic single-digit-microsecond or sub-microsecond response. GPUs simply cannot match this because of memory hierarchies and kernel launch overhead.
Power — single-digit watts for edge parts up to ~75 W for low-profile accelerator cards; full-height cards such as the Alveo U250/U280 are rated up to 225 W. AWS F1’s f1.2xlarge holds one VU9P at ~75 W TDP. Compare to ~300-700 W for a single H100 GPU.
Perf/W — FPGA generally wins vs GPU for streaming, fixed-precision, low-batch workloads; loses to GPU for batched dense matrix multiplication; loses to ASIC almost everywhere except where flexibility matters.

Cost

Approximate retail / list prices (2024-2026):

Card	Approx. price
Alveo U50	$1,500-2,500
Alveo U55C	$5,000-8,000
Alveo U250	$5,000-9,000
Alveo U280	$9,000-12,000
Alveo V70	$10,000+
Versal Premium VPK180 dev board	$40,000+
Bittware IA-840F	$20,000-50,000
Stratix 10 dev boards	$10,000-50,000

ASIC NRE (non-recurring engineering) for a 5 nm tape-out runs $20-50M+ before any silicon ships; 3 nm closer to $80-100M+. Mask sets alone for a 5 nm full-reticle chip are $5-15M. This is the economic reason FPGAs exist: volumes under ~1M units per year usually can’t justify ASIC NRE, and time-to-market is 6-18 months for FPGA vs 18-36 months for ASIC.
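
The volume argument reduces to a break-even calculation: the ASIC pays off once its NRE is recovered through a lower per-unit cost. The sketch below is a simplified model with hypothetical unit prices ($80 per FPGA, $50 per ASIC) chosen only for illustration; it ignores time-to-market, respins, and engineering salaries on the FPGA side.

```python
import math


def break_even_units(asic_nre, asic_unit_cost, fpga_unit_cost):
    """Units at which total ASIC cost (NRE + units) drops to the FPGA total.

    Returns None when the FPGA is no more expensive per unit, in which
    case the ASIC never recovers its NRE under this model.
    """
    saving_per_unit = fpga_unit_cost - asic_unit_cost
    if saving_per_unit <= 0:
        return None
    return math.ceil(asic_nre / saving_per_unit)


# Hypothetical inputs: $30M NRE (within the 5 nm range quoted above),
# $50 per ASIC, $80 per FPGA.
units = break_even_units(30_000_000, 50, 80)
```

With these assumed prices the break-even lands at one million units, and a smaller per-unit saving pushes it higher still.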

Trade-offs vs Other Accelerators

Property	FPGA	GPU	ASIC
Flexibility	High (reprogram per workload)	Medium (software)	None (fixed at tape-out)
Latency	ns-μs deterministic	μs-ms variable	ns-μs deterministic
Throughput per chip	Medium	Very high	Highest
Perf/W	Good for streaming	Good for batch	Best
Time-to-market	Weeks-months	Off-shelf	Years
NRE	$0 (use off-shelf)	$0 (off-shelf)	$10M-100M+
Volume economics	Good for 10s-100k units	Good for any volume	Needs >1M units
Floating-point	Available but expensive	Native	As designed
Sparse / irregular	Excellent	Good (with caveats)	As designed

Memory Hierarchy

A representative AMD Versal Premium accelerator:

Tier	Capacity	Bandwidth	Latency
Per-LUT distributed RAM	bits	~1 GHz access	~1 ns
Block RAM (BRAM)	~2-3 MB total	~hundreds GB/s aggregate	1-2 cycles
UltraRAM (URAM)	up to ~36 MB	high aggregate	few cycles
HBM2/HBM2e (on-package)	8-32 GB	460-820 GB/s	tens of ns
DDR4/DDR5 (off-package)	16-128 GB	25-50 GB/s	~50-100 ns
Host PCIe Gen5 x16	unbounded	~63 GB/s per direction	μs

Designing an FPGA accelerator is largely a memory-hierarchy problem: arrange BRAMs and URAMs as on-chip caches to keep DSP slices fed without going to HBM/DDR on the critical path.
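
A roofline-style estimate makes the "keep the DSP slices fed" point quantitative: attainable throughput is the smaller of peak compute and memory bandwidth multiplied by arithmetic intensity (operations per byte fetched). The sketch below uses the HBM and DDR bandwidths from the table above; the DSP count (6,000), clock (500 MHz), and two operations per slice per cycle are hypothetical values chosen for illustration, as are the function names.

```python
def peak_gops(dsp_slices, clock_ghz, ops_per_dsp_per_cycle=2):
    """Peak compute in GOPS, counting a multiply and an add per slice per cycle."""
    return dsp_slices * clock_ghz * ops_per_dsp_per_cycle


def attainable_gops(peak, bandwidth_gbs, ops_per_byte):
    """Roofline bound: limited by either compute or memory traffic."""
    return min(peak, bandwidth_gbs * ops_per_byte)


def required_intensity(peak, bandwidth_gbs):
    """Operations per fetched byte needed to saturate the compute peak."""
    return peak / bandwidth_gbs


peak = peak_gops(6000, 0.5)                    # hypothetical device
from_hbm = attainable_gops(peak, 460, 4)       # 4 ops/byte out of HBM
hbm_needed = required_intensity(peak, 460)     # about 13 ops/byte
ddr_needed = required_intensity(peak, 50)      # 120 ops/byte
```

Under these assumptions a kernel streaming from DDR must reuse each byte about 120 times to stay compute-bound, which is only achievable by holding working sets in BRAM and URAM.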

Future Trends

Chiplet-based FPGAs

Versal Premium and Intel Agilex 9 both use chiplets / multi-die packaging: separate dies for fabric, transceivers, and memory controllers, connected via silicon interposer (CoWoS for AMD, EMIB for Intel). This lets vendors mix and match die generations and yields, and ship larger effective devices than single-die reticle limits allow.

AI Engines + NoC

Versal’s split between programmable logic, hardened AI Engine arrays, scalar processors, and the NoC is the model future devices will follow — fabric as one accelerator among many, all wired together on-die.

Embedded FPGA (eFPGA)

eFPGA blocks licensed into custom ASICs let designers reserve a reconfigurable region for post-tape-out flexibility. Vendors:

Achronix Speedcore — high-density eFPGA IP.
QuickLogic ArcticPro / EOS — low-power eFPGA.
Menta (France) — eFPGA IP.
Flex Logix EFLX — eFPGA IP licensed into networking and aerospace ASICs.

CXL-attached FPGAs

CXL 2.0/3.0 enables FPGAs to act as memory expanders or coherent accelerators on the CXL fabric. Samsung CXL SmartSSD prototypes used FPGAs; AMD Versal Premium supports CXL endpoint mode. Promising for near-memory computation in disaggregated datacentres.

Open-source Silicon and FPGAs

The convergence of yosys + nextpnr + RISC-V cores (Rocket, BOOM, CVA6, Ibex) means a non-trivial fraction of new chips can be designed with a fully open toolchain on FPGA first, then taped out via Open Source Silicon (OpenROAD, Skywater 130nm PDK, Google/Efabless shuttle runs).

When to Use FPGAs

Good fit:

Sub-microsecond deterministic latency (HFT, radar, control)
Custom networking / packet processing (SmartNICs, line-rate filters)
Signal processing with high I/O bandwidth (RFSoC, radar, software-defined radio)
Low-volume specialised acceleration (genomics, broadcast)
Prototyping ASICs before tape-out
Edge AI with tight power budgets

Bad fit:

Generic batched ML inference / training (GPUs and TPUs dominate)
Volumes >1M units where ASIC NRE amortises (just build the ASIC)
Pure floating-point HPC workloads (GPUs win)
Workloads requiring fast iteration on programming model (software accelerators win)

Adjacent

gpus and parallel architectures
asics and custom silicon
dpus and smartnics
digital design and rtl
high frequency trading systems
digital signal processing
