Digital Logic
1. At a glance
Digital logic is the foundation of all modern computing — every microprocessor, microcontroller, FPGA, GPU, ASIC, communication chip, memory device, and digital sensor is built from it. The discipline operates at two abstraction levels:
- Gate level — primitive operators (AND, OR, NAND, NOR, XOR, NOT, inverter, buffer) and primitive storage elements (latches, flip-flops). Maps directly to transistors on silicon; consumed mainly by IC design, FPGA synthesis tools, and the dwindling world of discrete 74-series glue logic.
- Register-transfer level (RTL) — datapaths (adders, multipliers, register banks, multiplexers) connected by control logic (finite-state machines). Captured in VHDL or SystemVerilog, then synthesised to gates by tools (Vivado, Quartus, Yosys, Design Compiler). This is where modern hardware engineers spend almost all their time.
The physical implementation today is overwhelmingly CMOS in silicon — complementary NMOS+PMOS transistor pairs etched in process nodes from 28 nm down to 2 nm. CMOS dominates because it dissipates near-zero static power: current flows only during transitions, charging and discharging capacitances. Every other logic family (TTL, ECL, BiCMOS, NMOS-only) has been displaced for general-purpose digital except in narrow niches (ECL for sub-nanosecond serial-link clock drivers, BiCMOS for high-speed mixed-signal).
Engineering work splits roughly three ways. SoC / ASIC design captures the design in RTL and pushes it through synthesis, place-and-route, sign-off, and tapeout to a foundry — long cycle times (months to years), high NRE (USD 1 M to USD 100 M+), high volume amortisation. FPGA design uses the same RTL languages but synthesises to look-up tables (LUTs) and flip-flops baked into a programmable device — no silicon spin needed, weeks not years, fits prototypes, low-volume products, and applications where flexibility outvalues the silicon-efficiency penalty (3–10× larger, 3–5× slower, 5–20× more power than equivalent ASIC). Discrete-logic design wires 74-series single gates and small functions on a PCB — still used for glue logic, level shifting, fan-out, and protection on otherwise mostly-analog boards, but no longer for any system of meaningful size.
The two issues that consume more debug time than any other in real digital systems are metastability (and the clock-domain crossings that cause it) and timing closure (setup, hold, and clock-skew analysis). They are covered in 5p and 7p respectively.
2. First principles
Boolean algebra (Shannon 1938)
Claude Shannon’s MIT master’s thesis showed that two-valued (Boolean) algebra — developed by George Boole in the 1850s for symbolic logic — maps directly to switching circuits made of relays. The same algebra applies, unchanged, to vacuum-tube gates, transistor gates, and CMOS gates. The variables take values in {0, 1}; the three primitive operators are:
- AND (·, ∧, conjunction): A·B = 1 iff A = B = 1.
- OR (+, ∨, disjunction): A + B = 1 iff A = 1 or B = 1.
- NOT (¬, ’, overbar, negation): A’ = 1 iff A = 0.
Voltage levels carry the logical values: 0 = LOW (typically Gnd, 0 V) and 1 = HIGH (typically V_CC: 5.0 V, 3.3 V, 2.5 V, 1.8 V, 1.2 V, 0.9 V depending on family and node). Anything in between is a forbidden region — the receiving gate amplifies it back toward a defined level, or oscillates, or burns extra static power.
Combinational vs sequential
- Combinational logic — output is a Boolean function of the current inputs alone: out = f(in). No memory. Examples: AND/OR/NAND/NOR gates, multiplexers, adders, decoders, comparators, ALUs. Modelled as a directed acyclic graph of gates; analysed by truth table or Karnaugh map.
- Sequential logic — output depends on current inputs and on stored state: out = f(in, state); state_next = g(in, state). State is held in flip-flops or latches between clock edges. Examples: registers, counters, finite-state machines, memories, processor cores.
The discipline of synchronous design demands that every storage element be clocked from a single distribution network. State updates happen at well-defined moments (clock edges); between edges, combinational logic propagates from one register’s output to the next register’s input. This makes timing analysis tractable: you only need to verify that the slowest combinational path settles before the next clock edge arrives (setup) and that the fastest path does not overwrite data still being captured on the current edge (hold).
De Morgan’s laws
Two identities that are used so often they deserve highlighting:
- ¬(A · B) = ¬A + ¬B (NOT of AND is OR of NOTs)
- ¬(A + B) = ¬A · ¬B (NOT of OR is AND of NOTs)
The practical consequence: any AND-OR network can be converted into a NAND-NAND network of identical depth, and any OR-AND into NOR-NOR. CMOS implements NAND and NOR more efficiently than AND and OR (an AND gate is internally a NAND followed by an inverter), so synthesis tools almost always emit NAND/NOR-dominated netlists. De Morgan also enables “bubble-pushing” optimisation, where inversions are slid across gate boundaries until they cancel out, often eliminating an inverter from the critical path.
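Both identities, and the AND-OR to NAND-NAND conversion they justify, can be confirmed by brute force over every input combination. The sketch below is a minimal check; the function names and the four-input example network are illustrative, not taken from any tool.

```python
from itertools import product

def nand(a, b):
    """2-input NAND on 0/1 values."""
    return 1 - (a & b)

def and_or(a, b, c, d):
    """Two-level AND-OR network: (a AND b) OR (c AND d)."""
    return (a & b) | (c & d)

def nand_nand(a, b, c, d):
    """Same two-level structure with every gate replaced by a NAND."""
    return nand(nand(a, b), nand(c, d))

def de_morgan_holds():
    """Check both De Morgan identities for all four (a, b) combinations."""
    for a, b in product((0, 1), repeat=2):
        if 1 - (a & b) != ((1 - a) | (1 - b)):
            return False
        if 1 - (a | b) != ((1 - a) & (1 - b)):
            return False
    return True

def networks_equivalent():
    """Compare AND-OR and NAND-NAND over all 16 input vectors."""
    return all(and_or(*v) == nand_nand(*v) for v in product((0, 1), repeat=4))

print(de_morgan_holds(), networks_equivalent())
```

The NAND-NAND form has the same two gate levels as the AND-OR form, which is the "identical depth" property noted above.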
Minimisation: K-maps and Quine-McCluskey
A truth table with n inputs has 2^n rows. The naïve sum-of-products implementation uses one AND gate per row whose output is 1. Minimisation reduces the gate count by combining adjacent rows (Karnaugh, 1953):
- Karnaugh map (K-map) — geometric arrangement of the truth table such that adjacent cells differ in exactly one input variable. Practical for n ≤ 6; rectangles of 1, 2, 4, 8, 16 cells become product terms with successively fewer literals. Five and six-variable K-maps work but are unwieldy.
- Quine-McCluskey — tabular minimisation algorithm; finds all prime implicants and then solves the covering problem. Exact (gives minimum cost) but exponential time in the worst case.
- Espresso heuristic (Brayton 1984) — finds near-optimal solutions in polynomial time for hundreds of variables; embedded in every commercial synthesis tool.
Software handles all real synthesis; hand minimisation is taught for intuition and used only in trivial cases.
3. Practical math / design equations
Boolean identities (memorise these)
| Identity | Form |
|---|---|
| Identity | A + 0 = A; A · 1 = A |
| Null | A + 1 = 1; A · 0 = 0 |
| Idempotence | A + A = A; A · A = A |
| Complement | A + A’ = 1; A · A’ = 0 |
| Double negation | (A’)’ = A |
| Commutativity | A + B = B + A; A · B = B · A |
| Associativity | (A + B) + C = A + (B + C); same for · |
| Distributivity | A · (B + C) = A·B + A·C; A + (B · C) = (A + B) · (A + C) |
| Absorption | A + A·B = A; A · (A + B) = A |
| Consensus | A·B + A’·C + B·C = A·B + A’·C |
| De Morgan | (A·B)’ = A’ + B’; (A + B)’ = A’·B’ |
Universal gates
Either NAND alone or NOR alone is functionally complete: any Boolean function can be implemented using only that one gate type. One NAND gate makes an inverter (tie its inputs together), two NANDs make an AND (a NAND followed by a NAND-inverter), three NANDs make an OR (invert both inputs, then NAND them, via De Morgan), and so on. This is why the simplest standard-cell CMOS libraries use NAND and NOR as their primary building blocks; complex gates (AOI and OAI, i.e. AND-OR-Invert and OR-AND-Invert) are added because they can be implemented in a single CMOS stack more efficiently than the equivalent NAND/NOR tree.
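As a quick illustration of functional completeness, the sketch below builds NOT, AND, OR and XOR from a single `nand` primitive and checks each against Python's bitwise operators. The helper names are illustrative; the four-NAND XOR is a standard construction rather than something stated above.

```python
from itertools import product

def nand(a, b):
    return 1 - (a & b)

def not_(a):
    return nand(a, a)                    # inputs tied together

def and_(a, b):
    return not_(nand(a, b))              # NAND followed by a NAND-inverter

def or_(a, b):
    return nand(not_(a), not_(b))        # De Morgan: a + b = (a'.b')'

def xor_(a, b):
    n = nand(a, b)                       # shared first-level NAND
    return nand(nand(a, n), nand(b, n))  # four NANDs in total

def all_match():
    """Compare every NAND-built gate with the reference operator."""
    for a, b in product((0, 1), repeat=2):
        if and_(a, b) != (a & b) or or_(a, b) != (a | b) or xor_(a, b) != (a ^ b):
            return False
    return not_(0) == 1 and not_(1) == 0

print(all_match())
```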
Common combinational blocks
- Multiplexer (n-to-1 mux) — selects one of n inputs based on log₂(n) select lines. The most-used digital element after the gate; every register file, ALU, and bus is built from muxes.
- Demultiplexer / decoder — n select lines, 2^n outputs, exactly one active. Address-decoders inside SRAMs and instruction-decoders in CPUs.
- Encoder — opposite of decoder; converts a one-hot input vector into a binary code. Priority encoder breaks ties.
- Half-adder — A XOR B = sum, A AND B = carry-out. Two inputs only.
- Full-adder — sum = A XOR B XOR C_in; carry-out = (A·B) + (C_in·(A XOR B)). Cascade n full-adders for a ripple-carry adder; or use carry-lookahead / carry-save / Kogge-Stone for log-depth instead of linear.
- Comparator — emits A > B, A = B, A < B. Built from XNOR (equality) plus a chain.
- ALU — arithmetic-logic unit; combines adder, logic operators, shifter, and result mux under control signals.
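The full-adder equations above can be exercised directly. This minimal sketch chains bit-level full-adders into a ripple-carry adder; the function names and the 4-bit width used in the check are illustrative choices.

```python
def full_adder(a, b, c_in):
    """One-bit full adder: returns (sum, carry_out)."""
    p = a ^ b                      # propagate term, reused by both outputs
    s = p ^ c_in                   # sum = A xor B xor C_in
    c_out = (a & b) | (c_in & p)   # carry = A.B + C_in.(A xor B)
    return s, c_out

def ripple_carry_add(x, y, n_bits):
    """Add two n-bit unsigned values bit by bit; returns (sum, carry_out)."""
    carry = 0
    result = 0
    for i in range(n_bits):        # the carry ripples from LSB to MSB
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result, carry

print(ripple_carry_add(9, 8, 4))   # 9 + 8 = 17 -> sum bits 0001, carry out 1
```

The loop makes the linear carry chain explicit: bit i cannot finish until bit i-1 has produced its carry, which is the delay that carry-lookahead and parallel-prefix adders are designed to shorten.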
Storage elements
- SR latch — set/reset, asynchronous, prone to invalid state when both S and R are asserted. Educational; rarely used in synthesis.
- D latch — transparent when enable is high (output follows D), opaque when low (output holds). Level-sensitive.
- D flip-flop (D-FF) — edge-triggered; samples D on the rising (or falling) clock edge and holds until the next edge. The dominant storage element in all modern synthesis — virtually every register in an FPGA or standard-cell ASIC is a D-FF.
- JK flip-flop — like D-FF but with J/K inputs; J=K=1 toggles. Mostly historic; synthesis re-encodes any JK as a D-FF with combinational logic in front.
- T flip-flop — toggles on each clock edge when T=1. Used in counters; again, typically synthesised from a D-FF with feedback.
Sequential building blocks
- Register — bank of n D-FFs sharing a clock; stores an n-bit word.
- Shift register — chain of D-FFs where each output feeds the next input. Serial-in/serial-out, serial-in/parallel-out, parallel-in/serial-out variants. Used in UARTs, SPI, JTAG scan chains.
- Counter — register that increments on each clock edge. Binary counter (counts 0..2^n−1), BCD counter (0..9 then resets), ring counter (one-hot, only one bit set, rotates), Johnson counter (twisted-ring, 2n unique states from n bits), Gray-code counter (adjacent states differ in exactly one bit — used for clock-domain crossing FIFO pointers because no intermediate invalid count appears during a transition).
- Finite-state machine (FSM) — combinational next-state logic + state register + combinational output logic. Two flavours:
- Moore FSM — output = f(state). One clock edge of latency from input to output; outputs are glitch-free between clock edges.
- Mealy FSM — output = f(state, input). Faster (output reacts in the same cycle as input changes) but outputs can glitch with input changes; care needed if the output drives anything edge-sensitive.
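The Gray-code property mentioned for counters above (adjacent states differ in exactly one bit) follows from the usual binary-reflected conversion, shown here as a minimal sketch. The function names are illustrative.

```python
def binary_to_gray(b):
    """Binary-reflected Gray code of a non-negative integer."""
    return b ^ (b >> 1)

def gray_to_binary(g):
    """Inverse conversion: XOR together all right-shifts of g."""
    b = 0
    while g:
        b ^= g
        g >>= 1
    return b

def gray_sequence(n_bits):
    """Count sequence of an n-bit Gray-code counter."""
    return [binary_to_gray(i) for i in range(1 << n_bits)]

def single_bit_steps(seq):
    """True if every step, including the wrap-around, flips exactly one bit."""
    n = len(seq)
    return all(bin(seq[i] ^ seq[(i + 1) % n]).count("1") == 1 for i in range(n))

print(gray_sequence(3))                      # [0, 1, 3, 2, 6, 7, 5, 4]
print(single_bit_steps(gray_sequence(4)))
```

Because only one bit changes per increment, a pointer sampled in another clock domain is either the old value or the new one, never a mixture of the two.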
Timing parameters
Every flip-flop has three fundamental timing numbers, all measured against the clock edge:
- t_CO (clock-to-Q, also t_clk-to-Q, t_pd, t_co) — propagation delay from active clock edge to the Q output becoming valid.
- t_setup (t_su) — minimum time the D input must be stable before the active clock edge.
- t_hold (t_h) — minimum time the D input must remain stable after the active clock edge.
Combinational gates have t_pd (propagation delay) and t_cd (contamination delay, minimum delay through the gate, used for hold analysis).
The fundamental synchronous-design inequality, for a path from one register through combinational logic to the next register:
Setup constraint: t_CO + t_comb_max + t_setup ≤ T_clk − t_skew
This must hold to avoid setup violations: data must arrive at the destination register input at least t_setup before the next clock edge. The maximum clock frequency is f_max = 1 / T_clk,min, where T_clk,min is the smallest clock period that satisfies the constraint.
Hold constraint: t_CO + t_comb_min ≥ t_hold + t_skew
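This must hold to avoid hold violations: data launched by a clock edge must not race through the combinational logic and reach the destination register before that register has finished capturing the previous value on the same edge. Hold violations are independent of clock period and cannot be fixed by slowing the clock; they require added delay (typically buffer insertion) in the data path.
Clock skew (t_skew) is the difference in clock arrival time at the source and destination flip-flops. The two inequalities above treat t_skew as a worst-case magnitude, so it penalises both checks. With a signed definition (destination arrival minus source arrival), positive skew (clock arrives later at destination) relaxes setup, because the capture edge effectively moves to T_clk + t_skew, but tightens hold; negative skew does the opposite. Skew is one of the most subtle parameters in real designs; clock-tree synthesis (CTS) tools spend significant effort balancing it.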
Worked example 1 — Boolean simplification by K-map
Minimise F(A, B, C) = A·B’·C’ + A·B’·C + A·B·C + A’·B·C’.
Truth table (1 where F is asserted):
| A | B | C | F |
|---|---|---|---|
| 0 | 0 | 0 | 0 |
| 0 | 0 | 1 | 0 |
| 0 | 1 | 0 | 1 |
| 0 | 1 | 1 | 0 |
| 1 | 0 | 0 | 1 |
| 1 | 0 | 1 | 1 |
| 1 | 1 | 0 | 0 |
| 1 | 1 | 1 | 1 |
K-map with rows AB ∈ {00, 01, 11, 10} (Gray-coded) and columns C ∈ {0, 1}:
| AB | C=0 | C=1 |
|---|---|---|
| AB=00 | 0 | 0 |
| AB=01 | 1 | 0 |
| AB=11 | 0 | 1 |
| AB=10 | 1 | 1 |
Grouping rectangles of 1s (each rectangle must have 1, 2, 4, 8 cells in a power-of-2 shape, wrapping at edges):
- Pair (AB=10, C=0) + (AB=10, C=1) → A·B’ (eliminates C).
- Pair (AB=10, C=1) + (AB=11, C=1) → A·C (eliminates B).
- Single (AB=01, C=0) → A’·B·C’ (no neighbour).
Minimal sum-of-products: F = A·B’ + A·C + A’·B·C’ (seven literals, three product terms, down from the original four product terms with twelve literals).
Verification at (A,B,C) = (1,1,1): A·B’ = 0, A·C = 1, A’·B·C’ = 0 → F = 1. Matches. At (0,1,0): A·B’ = 0, A·C = 0, A’·B·C’ = 1 → F = 1. Matches.
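The two spot checks above can be extended to all eight input combinations. The sketch below compares the original and minimised expressions exhaustively; the function names are illustrative.

```python
from itertools import product

def f_original(a, b, c):
    """F as given: A.B'.C' + A.B'.C + A.B.C + A'.B.C'"""
    na, nb, nc = 1 - a, 1 - b, 1 - c
    return (a & nb & nc) | (a & nb & c) | (a & b & c) | (na & b & nc)

def f_minimised(a, b, c):
    """F after K-map grouping: A.B' + A.C + A'.B.C'"""
    na, nb, nc = 1 - a, 1 - b, 1 - c
    return (a & nb) | (a & c) | (na & b & nc)

# Any input vector on which the two forms disagree would appear here.
mismatches = [v for v in product((0, 1), repeat=3)
              if f_original(*v) != f_minimised(*v)]

# Minterm indices (A is the most significant bit) where F = 1.
minterms = [4 * a + 2 * b + c for a, b, c in product((0, 1), repeat=3)
            if f_minimised(a, b, c)]

print(mismatches)   # []
print(minterms)     # [2, 4, 5, 7]
```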
Worked example 2 — Maximum clock frequency
A pipeline stage between two D-FFs has the following parameters:
- t_CO = 1.0 ns
- Combinational worst-case (4 gate-delays of 2 ns each + wire delay 1 ns) = 9 ns
- t_setup = 0.5 ns
- Clock skew = 0 (synchronous, well-balanced clock tree)
Setup constraint: T_clk ≥ t_CO + t_comb_max + t_setup + t_skew = 1.0 + 9.0 + 0.5 + 0 = 10.5 ns.
f_max = 1 / 10.5 ns ≈ 95.2 MHz.
If we deepen the pipeline by inserting an extra register stage in the middle (cutting the combinational path roughly in half), each new stage has t_comb_max ≈ 4.5 ns:
T_clk ≥ 1.0 + 4.5 + 0.5 = 6.0 ns → f_max ≈ 167 MHz — a 75 % speed-up at the cost of one extra cycle of latency. This is the central trade-off of pipelining.
Hold check on the original stage (assuming t_comb_min = 1.0 ns from contamination-delay analysis of the same logic and t_hold = 0.2 ns):
t_CO + t_comb_min = 1.0 + 1.0 = 2.0 ns ≥ t_hold + t_skew = 0.2 + 0 = 0.2 ns. Hold passes with 1.8 ns of margin. (Hold typically passes by wide margins in a balanced clock tree; it becomes critical when clock skew is large or when the combinational path is unusually short.)
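The arithmetic in this example is easy to script, which helps when exploring what-if changes to the path. The sketch below is a minimal version under the same assumptions as the example: all times in nanoseconds and skew treated as a non-negative penalty. The function names are illustrative.

```python
def min_period_ns(t_co, t_comb_max, t_setup, t_skew=0.0):
    """Smallest clock period that satisfies the setup constraint."""
    return t_co + t_comb_max + t_setup + t_skew

def f_max_mhz(period_ns):
    """Convert a period in nanoseconds to a frequency in MHz."""
    return 1000.0 / period_ns

def hold_margin_ns(t_co, t_comb_min, t_hold, t_skew=0.0):
    """Positive result means the hold constraint is met."""
    return (t_co + t_comb_min) - (t_hold + t_skew)

t_comb = 4 * 2.0 + 1.0                        # four 2 ns gates plus 1 ns of wire
original = min_period_ns(1.0, t_comb, 0.5)    # single stage
pipelined = min_period_ns(1.0, t_comb / 2, 0.5)  # path split roughly in half

print(original, round(f_max_mhz(original), 1))
print(pipelined, round(f_max_mhz(pipelined), 1))
print(round(original / pipelined, 2))         # speed-up factor
print(round(hold_margin_ns(1.0, 1.0, 0.2), 2))
```

Note that the speed-up is less than 2x even though the logic was halved, because t_CO and t_setup are paid once per stage regardless of how little logic sits between the registers.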
Worked example 3 — Mealy FSM for “1011” sequence detector
A serial bitstream arrives one bit per clock cycle on input D. The output Y must pulse high for one cycle when the most recent four bits are “1011” (overlapping detection allowed — “10111011” should fire twice).
State encoding (4 states):
| State | Meaning |
|---|---|
| S0 | No partial match. |
| S1 | Last bit was 1. |
| S2 | Last two bits were 10. |
| S3 | Last three bits were 101. |
Transition table (Mealy: output associated with the transition, not the state):
| Current | D | Next | Y |
|---|---|---|---|
| S0 | 0 | S0 | 0 |
| S0 | 1 | S1 | 0 |
| S1 | 0 | S2 | 0 |
| S1 | 1 | S1 | 0 |
| S2 | 0 | S0 | 0 |
| S2 | 1 | S3 | 0 |
| S3 | 0 | S2 | 0 |
| S3 | 1 | S1 | 1 |
The transition S3 + D=1 → S1 is the only one with Y = 1, and the next state is S1 (not S0) so that the final 1 of a match can serve as the first bit of the next one. For example, “1011011” fires twice. This is the overlap support.
With binary state encoding (state = {s1, s0}: S0 = 00, S1 = 01, S2 = 10, S3 = 11), the next-state and output logic synthesised from the table:
- s1_next = (s1’·s0·D’) + (s1·s0·D’) + (s1·s0’·D) = s0·D’ + s1·s0’·D → set when a 0 follows a trailing 1 (S1 or S3 → S2) or when a 1 follows “10” (S2 → S3).
- s0_next = D → every transition on D = 1 lands in S1 or S3, and every transition on D = 0 lands in S0 or S2.
- Y = s1·s0·D.
In SystemVerilog:
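```systemverilog
// Module name and port list are illustrative; the original fragment omitted them.
module seq_detect_1011 (
  input  logic clk,
  input  logic rst_n,  // active-low asynchronous reset
  input  logic D,      // serial input, one bit per clock
  output logic Y       // pulses high when the last four bits are 1011
);
  typedef enum logic [1:0] {S0=2'b00, S1=2'b01, S2=2'b10, S3=2'b11} state_t;
  state_t state, next_state;

  // State register
  always_ff @(posedge clk or negedge rst_n) begin
    if (!rst_n) state <= S0;
    else        state <= next_state;
  end

  // Next-state and Mealy output logic
  always_comb begin
    next_state = state;  // defaults keep the block purely combinational
    Y = 1'b0;
    case (state)
      S0: next_state = D ? S1 : S0;
      S1: next_state = D ? S1 : S2;
      S2: next_state = D ? S3 : S0;
      S3: begin
        next_state = D ? S1 : S2;
        Y = D;  // Mealy: output asserts on the "1" that completes 1011
      end
    endcase
  end
endmodule
```

A Moore version would register Y as well, costing one cycle of latency but eliminating any combinational glitch on the output.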
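Before committing a transition table to RTL, it can be useful to run it in software against a brute-force reference. The sketch below is a minimal table-driven model of the Mealy machine above, compared with a sliding-window check; the dictionary layout and function names are illustrative.

```python
from itertools import product

# (current state, D) -> (next state, Y), copied from the transition table.
TRANSITIONS = {
    ("S0", 0): ("S0", 0), ("S0", 1): ("S1", 0),
    ("S1", 0): ("S2", 0), ("S1", 1): ("S1", 0),
    ("S2", 0): ("S0", 0), ("S2", 1): ("S3", 0),
    ("S3", 0): ("S2", 0), ("S3", 1): ("S1", 1),
}

def detect_1011(bits):
    """Run the FSM over a string of '0'/'1'; returns Y for each cycle."""
    state = "S0"
    outputs = []
    for ch in bits:
        state, y = TRANSITIONS[(state, int(ch))]
        outputs.append(y)
    return outputs

def reference(bits):
    """Brute force: Y = 1 wherever the last four bits read 1011."""
    return [1 if i >= 3 and bits[i - 3:i + 1] == "1011" else 0
            for i in range(len(bits))]

print(detect_1011("10111011"))   # [0, 0, 0, 1, 0, 0, 0, 1]

# Exhaustive comparison over every 8-bit input string.
agree = all(detect_1011("".join(p)) == reference("".join(p))
            for p in product("01", repeat=8))
print(agree)
```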
4. Reference data
Logic family voltage levels (5 V, 3.3 V, 2.5 V, 1.8 V supplies)
| Family | V_CC | V_OH min | V_IH min | V_IL max | V_OL max | Notes |
|---|---|---|---|---|---|---|
| 74LSxx (TTL) | 5.0 V | 2.7 V | 2.0 V | 0.8 V | 0.5 V | Bipolar low-power Schottky; legacy. |
| 74HCxx (CMOS) | 2–6 V | V_CC − 0.1 V | 0.7·V_CC | 0.3·V_CC | 0.1 V | CMOS; symmetric thresholds. |
| 74HCTxx | 4.5–5.5 V | V_CC − 0.1 V | 2.0 V | 0.8 V | 0.1 V | CMOS device, TTL-compatible inputs (bridges 5 V worlds). |
| 74ACxx, 74AHCxx | 2–5.5 V | V_CC − 0.1 V | 0.7·V_CC | 0.3·V_CC | 0.1 V | Advanced CMOS; ~3× faster than HC. |
| 74LVCxx | 1.65–5.5 V | V_CC − 0.2 V | 2.0 V | 0.8 V | 0.4 V | Modern default for 3.3 V designs; 5 V tolerant inputs. |
| 74AVCxx | 1.4–3.6 V | V_CC − 0.2 V | 0.65·V_CC | 0.35·V_CC | 0.2 V | High-speed translator class. |
| 74AUCxx | 0.8–2.7 V | V_CC − 0.2 V | 0.7·V_CC | 0.3·V_CC | 0.2 V | 1.8 V mobile / DDR companion logic. |
| LVCMOS33, LVCMOS25, LVCMOS18, LVCMOS12 | various | V_CC − 0.4 V | 0.7·V_CC | 0.3·V_CC | 0.4 V | FPGA single-ended I/O standards (JEDEC JESD8-x). |
| LVDS | 1.2 V common-mode, ±350 mV swing | n/a (differential) | n/a | n/a | n/a | High-speed point-to-point links. |
| SSTL, HSTL, POD12, POD15 | various | varies | varies | varies | varies | DDR memory interfaces; Vref-centred. |
| SLVS-EC, MIPI D-PHY | sub-volt swings | n/a | n/a | n/a | n/a | Camera and display serial. |
The “noise margin” of a family is min(V_OH − V_IH, V_IL − V_OL). Modern 1.2 V LVCMOS has noise margins of ~240 mV, roughly six times less than 5 V CMOS at ~1.5 V, which is why high-speed boards need careful power-delivery-network design and termination.
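The noise-margin formula can be applied to any driver and receiver pair from the table, which also shows why mixed-family interfaces need care. The sketch below uses the 74LS, 74HC and 74HCT rows with V_CC assumed to be 5.0 V; the function and variable names are illustrative.

```python
def noise_margin(v_oh_min, v_ol_max, v_ih_min, v_il_max):
    """Worst-case margin for a driver (V_OH, V_OL) feeding a receiver (V_IH, V_IL)."""
    return min(v_oh_min - v_ih_min, v_il_max - v_ol_max)

V_CC = 5.0                           # assumed supply for this comparison
hc_out = (V_CC - 0.1, 0.1)           # 74HC driver: V_OH min, V_OL max
hc_in = (0.7 * V_CC, 0.3 * V_CC)     # 74HC receiver: V_IH min, V_IL max
ls_out = (2.7, 0.5)                  # 74LS driver
ls_in = (2.0, 0.8)                   # 74LS receiver
hct_in = (2.0, 0.8)                  # 74HCT receiver (TTL-compatible thresholds)

hc_to_hc = round(noise_margin(*hc_out, *hc_in), 2)
ls_to_ls = round(noise_margin(*ls_out, *ls_in), 2)
ls_to_hc = round(noise_margin(*ls_out, *hc_in), 2)
ls_to_hct = round(noise_margin(*ls_out, *hct_in), 2)

print(hc_to_hc, ls_to_ls, ls_to_hc, ls_to_hct)   # 1.4 0.3 -0.8 0.3
```

The negative result for a 74LS output driving a 74HC input means the guaranteed HIGH level does not reach the receiver's threshold, which is the gap the TTL-compatible 74HCT inputs close.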
Propagation delays (representative 2-input NAND, 25 °C, V_CC nominal)
| Part | V_CC | t_pd typ | Notes |
|---|---|---|---|
| 7400 (original TTL) | 5 V | 10 ns | 1960s technology; still made. |
| 74LS00 | 5 V | 10 ns | Low-power Schottky. |
| 74HC00 | 5 V | 7 ns | CMOS. |
| 74HC00 | 2 V | 90 ns | Same part at low supply — much slower. |
| 74AC00 | 5 V | 4 ns | Advanced CMOS. |
| 74AHC00 | 5 V | 5 ns | Advanced HC. |
| 74LVC1G00 | 3.3 V | 4 ns | Single-gate package (SC-70, SOT-23-5). |
| 74AUC1G00 | 1.8 V | 1.7 ns | Mobile-class. |
| 74AUP1G00 | 1.8 V | 4 ns | Low-power; for battery devices. |
| ECL 10K | −5.2 V | 1.5 ns | Differential current-steering; high power. |
| Standard-cell 28 nm NAND | 0.9 V | ~25 ps (gate) | Inside an SoC. |
| Standard-cell 7 nm NAND | 0.7 V | ~10 ps (gate) | Modern leading-edge SoC. |
Power: CMOS dynamic and static
CMOS power dissipation decomposes into three terms:
- Dynamic (switching) — P_dyn = α · C_L · V_DD² · f. α is activity factor (probability a node toggles each clock cycle, typically 0.05–0.25), C_L is total capacitive load, V_DD is supply, f is clock frequency. This V² dependence drives every low-power-design decision in the last 40 years (clock gating, voltage scaling, near-threshold computing).
- Short-circuit — P_sc = V_DD · I_peak · t_sc · f. Both PMOS and NMOS briefly conduct during each transition. Small fraction of total in well-designed standard cells (~10 %); becomes important when input slew is slow.
- Static (leakage) — subthreshold drain current + gate-oxide tunnelling + reverse-bias junction. In 90 nm and older, leakage was nanoamps per gate and negligible. At 28 nm and below, leakage is comparable to dynamic at full activity and dominates idle power. Mitigation: multi-V_t libraries (high-V_t cells in non-critical paths), power gating (sleep transistors), back-bias.
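To make the V² dependence of the dynamic term concrete, the sketch below evaluates P_dyn for one operating point and then lowers only the supply. The numbers (activity 0.1, 1 nF of total switched capacitance, 1 GHz, 0.9 V versus 0.7 V) are illustrative assumptions, not figures from any specific chip.

```python
def dynamic_power_w(alpha, c_load_f, v_dd, f_hz):
    """Switching power: alpha * C_L * V_DD^2 * f, in watts."""
    return alpha * c_load_f * v_dd ** 2 * f_hz

ALPHA = 0.1        # assumed activity factor
C_TOTAL = 1e-9     # assumed total switched capacitance, farads
F_CLK = 1e9        # assumed clock frequency, hertz

p_nominal = dynamic_power_w(ALPHA, C_TOTAL, 0.9, F_CLK)
p_scaled = dynamic_power_w(ALPHA, C_TOTAL, 0.7, F_CLK)

print(round(p_nominal * 1e3, 1), "mW at 0.9 V")
print(round(p_scaled * 1e3, 1), "mW at 0.7 V")
print(round(p_scaled / p_nominal, 3), "ratio, equal to (0.7 / 0.9) squared")
```

A 22 % supply reduction removes about 40 % of the switching power in this model, before counting any frequency reduction that the lower voltage might force.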
74-series part-number nomenclature
| Number | Function |
|---|---|
| 74xx00 | quad 2-input NAND |
| 74xx02 | quad 2-input NOR |
| 74xx04 | hex inverter |
| 74xx08 | quad 2-input AND |
| 74xx14 | hex Schmitt-trigger inverter |
| 74xx32 | quad 2-input OR |
| 74xx74 | dual D-FF with set and reset |
| 74xx86 | quad 2-input XOR |
| 74xx138 | 3-to-8 line decoder |
| 74xx139 | dual 2-to-4 decoder |
| 74xx151 | 8-to-1 multiplexer |
| 74xx153 | dual 4-to-1 multiplexer |
| 74xx157 | quad 2-to-1 mux |
| 74xx164 | 8-bit serial-in, parallel-out shift register |
| 74xx165 | 8-bit parallel-in, serial-out shift register |
| 74xx244 | octal buffer / line driver |
| 74xx245 | octal bidirectional bus transceiver |
| 74xx373 | octal transparent latch |
| 74xx374 | octal D flip-flop with tri-state |
| 74xx390 | dual decade ripple counter |
| 74xx595 | 8-bit shift-register with output latch — hobby favourite for I/O expansion |
| 74xx1G00 | single 2-input NAND in SOT-23-5 / SC-70 |
The single-gate (1G) suffix collection — 74LVC1G00 (NAND), 1G02 (NOR), 1G04 (INV), 1G08 (AND), 1G14 (Schmitt-trigger inverter), 1G32 (OR), 1G86 (XOR), 1G125 (buffer with enable) — is the modern way to add a missing gate on a PCB without a full quad package.
4000-series CMOS (legacy, wide V_CC range)
The 4000 series (introduced by RCA in 1968) accepts V_CC = 3–18 V and is still in production for high-voltage / industrial applications where its slow speed is acceptable.
| Part | Function |
|---|---|
| 4011 | quad 2-input NAND |
| 4013 | dual D flip-flop |
| 4017 | decade counter (one-hot of 10) |
| 4024 | 7-stage ripple counter |
| 4040 | 12-stage ripple counter |
| 4046 | phase-locked loop |
| 4060 | 14-stage ripple counter + oscillator |
| 4066 | quad analogue bilateral switch |
| 4093 | quad Schmitt-trigger NAND |
| 4511 | BCD-to-7-segment decoder |
5p. Theory
CMOS gate construction
A CMOS gate is built from two complementary networks:
- Pull-up network (PUN) — PMOS transistors between V_DD and the output. Conducts (pulls output high) when input(s) are low.
- Pull-down network (PDN) — NMOS transistors between output and Gnd. Conducts (pulls output low) when input(s) are high.
The two networks are duals: where PUN has series PMOS, PDN has parallel NMOS, and vice versa. This duality guarantees that at any static input combination, exactly one network conducts and the other is off — zero static current from V_DD to Gnd.
A 2-input NAND is two PMOS in parallel (PUN) and two NMOS in series (PDN). A 2-input NOR is two PMOS in series and two NMOS in parallel. NOR’s stacked PMOS make it slow (PMOS has ~2× lower mobility than NMOS for the same area, so to match drive strength its width must be doubled — and stacked-PMOS multiplies that area cost). For this reason, almost every CMOS standard-cell library is NAND-dominant: synthesis tools preferentially map combinational logic to NAND trees with inversions absorbed into adjacent cells.
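The duality argument can be checked with a switch-level model in which each network is simply "conducting" or "not conducting". The sketch below models the NAND and NOR structures just described and confirms that exactly one network conducts for every input; the function names are illustrative, and analog effects such as leakage are ignored.

```python
from itertools import product

def nand2_networks(a, b):
    """Returns (pull-up conducts, pull-down conducts) for a 2-input NAND."""
    pun = (a == 0) or (b == 0)    # parallel PMOS: either low input conducts
    pdn = (a == 1) and (b == 1)   # series NMOS: both inputs must be high
    return pun, pdn

def nor2_networks(a, b):
    """Returns (pull-up conducts, pull-down conducts) for a 2-input NOR."""
    pun = (a == 0) and (b == 0)   # series PMOS: both inputs must be low
    pdn = (a == 1) or (b == 1)    # parallel NMOS: either high input conducts
    return pun, pdn

def output_level(pun, pdn):
    """Logic level at the output; rejects contention and floating nodes."""
    if pun == pdn:
        raise ValueError("both networks on (short) or both off (floating)")
    return 1 if pun else 0

def check_gate(networks, reference):
    """True if the gate is complementary and matches the reference function."""
    return all(output_level(*networks(a, b)) == reference(a, b)
               for a, b in product((0, 1), repeat=2))

print(check_gate(nand2_networks, lambda a, b: 1 - (a & b)))
print(check_gate(nor2_networks, lambda a, b: 1 - (a | b)))
```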
From RTL to silicon — the synthesis flow
- RTL — designer writes VHDL or SystemVerilog at the register-transfer level. Code is functional: registers, arithmetic, FSMs.
- Elaboration — tool flattens the design hierarchy, resolves generics/parameters, builds a generic gate-and-FF representation.
- Logic synthesis — combinational logic is optimised (Espresso-style minimisation, retiming across registers, resource sharing) and mapped to the target library (a particular FPGA’s LUTs, or an ASIC standard-cell library).
- Floorplan / placement — gates are assigned physical locations. Goal: minimise wire length, satisfy region constraints (power-domain partitioning, sensitive analog regions).
- Clock-tree synthesis (CTS) — the clock signal is buffered and routed to every flip-flop. Balanced trees, H-trees, mesh networks; goal is to minimise skew within and across regions.
- Routing — signal wires are placed in metal layers (modern processes have 8–15 metal layers). DRC, antenna rules, electromigration limits.
- Static timing analysis (STA) — every register-to-register path is analysed. Slack at each endpoint = (data required time) − (data arrival time). Negative slack means a setup or hold violation.
- Sign-off — STA in all corners (slow-slow-cold, fast-fast-hot, plus on-chip variation), power analysis, IR-drop, electromigration, antenna, DRC, LVS. Sign-off failures send the design back to RTL.
- For FPGAs: bitstream generation. For ASICs: GDSII tapeout to the foundry.
The single biggest difference between FPGA and ASIC flows is that FPGA tools complete in minutes-to-hours and target a fixed silicon device; ASIC tools complete in days-to-weeks and produce a new mask set.
Static Timing Analysis (STA)
STA is not simulation. It computes worst-case delays through every register-to-register path in the design, deterministically, without applying input vectors. Every path is checked against the setup and hold inequalities of section 3.
Slack at a register’s input = T_clk − (t_CO_source + t_comb + t_setup − t_skew), where t_skew is signed (destination clock arrival minus source clock arrival). Positive slack means the path meets timing with margin. Designers iterate by attacking the most negative-slack paths first, usually by re-pipelining, replicating, or rewriting RTL for the critical region.
Modern STA is multi-corner, multi-mode (MCMM): every path is analysed in all temperature, voltage, and process corners simultaneously (slow-slow-cold for setup, fast-fast-hot for hold typically), and across operating modes (functional, test, low-power). At advanced nodes (≤16 nm) on-chip variation (OCV) statistics push margins further, and statistical STA (SSTA) treats delays as random variables to avoid over-pessimism.
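The vector-free nature of STA is easiest to see in a toy version: arrival time at each gate output is its own delay plus the latest arrival among its fan-ins, computed once over the netlist graph. The sketch below is a single-corner, zero-skew illustration; the four-gate netlist, its delays, and all names are hypothetical.

```python
from functools import lru_cache

# Hypothetical netlist: gate -> (propagation delay in ns, list of fan-in nodes).
# Names not in the dictionary ("FF_A.Q", "FF_B.Q") are launch-register outputs.
NETLIST = {
    "g1": (2.0, ["FF_A.Q", "FF_B.Q"]),
    "g2": (2.0, ["g1"]),
    "g3": (2.0, ["g1", "FF_B.Q"]),
    "g4": (2.0, ["g2", "g3"]),   # g4 drives the capture register's D input
}
T_CO = 1.0       # clock-to-Q of the launch registers, ns
T_SETUP = 0.5    # setup time of the capture register, ns

@lru_cache(maxsize=None)
def arrival(node):
    """Worst-case (latest) arrival time at a node, measured from the clock edge."""
    if node not in NETLIST:
        return T_CO                          # register output
    delay, fanin = NETLIST[node]
    return delay + max(arrival(src) for src in fanin)

def setup_slack(endpoint, t_clk):
    """Setup slack at an endpoint for a given clock period, zero skew assumed."""
    return t_clk - (arrival(endpoint) + T_SETUP)

print(arrival("g4"))             # 7.0
print(setup_slack("g4", 10.0))   # 2.5
print(setup_slack("g4", 7.0))    # -0.5
```

No input values are ever applied: the result depends only on the graph and the delays, which is why STA covers every path but can also report paths that no real stimulus would exercise.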
Clock-domain crossing (CDC) and metastability
When data must travel from a register clocked by clk_A to a register clocked by clk_B with no defined phase relationship between the two clocks, the receiving flip-flop may sample at the exact moment the data is changing. The flip-flop enters a metastable state — Q sits between 0 and 1 for an indefinite time before resolving randomly to one or the other.
The probability of metastability lasting longer than t after the clock edge falls exponentially: P(t) ≈ (T_w / T_clk) · exp(−t / τ), where T_w is the metastability window (~ns) and τ is the resolution time constant (~10 ps for modern logic). The mean time between failure (MTBF) of a single flip-flop sampling an asynchronous signal is:
MTBF = (exp(t_resolve / τ)) / (T_w · f_data · f_clk)
With a single flip-flop, t_resolve is only the slack left over after the downstream combinational logic, often a small fraction of the clock period, so MTBF can be as short as milliseconds, which is unacceptable. The standard fix is the two-flip-flop synchroniser: a second FF in series gives the first FF’s metastability one full clock period to resolve before being sampled again. This multiplies MTBF to years or centuries:
MTBF_2FF = (exp(T_clk / τ)) × MTBF_1FF
For control signals: 2-FF synchroniser (sometimes 3-FF in safety-critical designs). For data buses: synchronisers do not work in general — different bits resolve at different times, yielding corrupt intermediate values. Use an asynchronous FIFO with Gray-coded read/write pointers, or a handshake protocol (req/ack with full synchronisation in both directions). For vector control where one bit is the “valid” flag and the others are payload: use the synchroniser on the valid bit, and let it gate a stable payload that was held for at least one clk_B period before the valid pulse.
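The MTBF formula spans so many orders of magnitude that evaluating exp(t_resolve / τ) directly can overflow a double, so the sketch below works with log10(MTBF) instead. The operating point (τ = 10 ps, T_w = 1 ns, 100 MHz sampling clock, 10 MHz data rate, 0.1 ns of slack for the single flip-flop) is an illustrative assumption in line with the rough magnitudes quoted above.

```python
import math

def log10_mtbf_s(t_resolve_s, tau_s, t_w_s, f_data_hz, f_clk_hz):
    """log10 of MTBF in seconds: exp(t_resolve / tau) / (T_w * f_data * f_clk)."""
    return (t_resolve_s / tau_s) / math.log(10) - math.log10(t_w_s * f_data_hz * f_clk_hz)

TAU = 10e-12         # resolution time constant (assumed)
T_W = 1e-9           # metastability window (assumed)
F_CLK = 100e6        # sampling clock (assumed)
F_DATA = 10e6        # asynchronous data toggle rate (assumed)
T_CLK = 1.0 / F_CLK

# Single FF: only a small slack is available for resolution.
single = log10_mtbf_s(0.1e-9, TAU, T_W, F_DATA, F_CLK)
# Two-FF synchroniser: one extra full clock period of resolution time.
double = log10_mtbf_s(0.1e-9 + T_CLK, TAU, T_W, F_DATA, F_CLK)

print(round(single, 2), "-> MTBF of about", round(10 ** single * 1e3, 1), "ms")
print(round(double - single, 1), "extra decades from the second flip-flop")
```

The improvement in decades equals (T_clk / τ) / ln 10, which is why adding a synchroniser stage helps so much more than any realistic reduction in T_w or data rate.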
Synchronous vs asynchronous design
Synchronous design uses one global clock (or a small number of synchronised clock domains, with explicit CDC infrastructure between them). All FFs sample on the same clock edge; all combinational logic settles between edges; STA is well-defined. This is the methodology of every modern ASIC and FPGA design — and the reason synthesis tools work.
Asynchronous (clockless) design has FFs / state-elements that update on completion handshakes rather than a clock. Theoretical advantages: no clock distribution power, no skew, automatic adaptation to process and temperature, lower EMI. Practical disadvantages: no automated synthesis flow comparable to synchronous, very few engineers trained in it, harder debug, no commercial verification tools. Used in narrow specialty contexts: parts of NoCs in some research chips, the IBM EPN async network, the Achronix early architectures (later moved to synchronous). Building blocks include Muller C-elements (output transitions only when all inputs agree), bundled-data (synchronous data + async-handshake control), dual-rail encoding (every wire is a pair encoding 0, 1, or “not yet”). Outside academia and a few specialty vendors, async is not commercially viable today.
6p. Application
Discrete logic on PCBs
The 74xx single-gate packages (1G family from TI, NXP, Diodes Incorporated, Nexperia, etc.) are the modern way to add small bits of logic on an otherwise mostly-MCU board: level shift a single signal, invert a strobe, gate a clock, debounce with a Schmitt trigger, build a simple OR. Cost is microscopic (USD 0.03–0.10 per gate); the only real cost is PCB area and the BOM line. Anything larger than a few gates belongs in a CPLD or in the MCU’s firmware.
CPLDs (Complex Programmable Logic Devices)
100–2000 macrocells (each macrocell is a flip-flop plus a product-term array). Non-volatile flash configuration (instant-on at power-up). Tens of nanoseconds clock-to-output. Tiny BGA / QFN packages. Use cases: board-level glue logic, boot sequencing, power-up state machines, system-level resets, supervisor functions. Representative parts:
- Lattice MachXO3LF / MachXO3D — 640–6,900 LUTs, on-chip flash, secure boot in MachXO3D; USD 1–8.
- Altera/Intel MAX V / MAX 10 — 240–8,000 LEs (MAX 10 has block RAM and ADC, blurring CPLD/FPGA boundary).
- Microchip ATF15xx, ATF16xx — legacy GAL/SPLD replacements; tiny.
- Xilinx CoolRunner-II (legacy, EOL).
FPGAs (Field-Programmable Gate Arrays)
Modern FPGAs contain millions of LUTs (lookup tables, typically 6-input each), flip-flops, dedicated DSP blocks (multipliers + accumulators), block RAMs (kilobits each), transceivers (multi-Gb/s SerDes), high-speed memory interfaces (DDR4/DDR5 / HBM), and increasingly hardened ARM Cortex-A53 / Cortex-A78 cores in SoC variants. The configuration is held in SRAM and reloaded on power-up from external flash (Xilinx, Intel) or in on-chip flash (Lattice ECP5 has external; iCE40 has dual on-chip).
| Vendor | Family | Use case |
|---|---|---|
| AMD (formerly Xilinx) | Spartan-7, Artix-7, Kintex-7, Virtex-7 (legacy 28 nm); UltraScale+; Versal (7 nm, AI engines) | Industry leader; Zynq SoC line embeds Cortex-A; Versal for AI inference and 5G. |
| Intel (formerly Altera) | Cyclone-V (low-cost), Arria-10 (mid), Stratix-10 (high-end), Agilex (10 nm SuperFin) | Strong in datacenter SmartNICs. |
| Lattice | iCE40 (tiny, low-power), MachXO3 (CPLD-class), ECP5 (mid-range), CertusPro-NX, Avant (mid-to-high) | iCE40 has fully open-source toolchain (yosys + nextpnr + icestorm); popular in hobbyist and edge-AI. |
| Microchip | PolarFire, IGLOO2, SmartFusion2 (Cortex-M3 + FPGA) | Non-volatile flash-based FPGAs; security focus; industrial / aerospace / defence. |
| QuickLogic | EOS S3, EOS S0 | Sensor hubs, eFPGA. |
| Achronix | Speedster7t, Speedcore eFPGA | Embedded FPGA IP in ASICs. |
| Efinix | Trion, Titanium | Quantum-architecture FPGA; very efficient mid-range. |
| GOWIN | LittleBee, Arora | Low-cost Chinese vendor; rapidly adopted in consumer/IoT. |
Typical applications: software-defined radio, real-time signal processing, video pipelines, ASIC prototyping, custom motor controllers, low-latency networking (HFT), space hardware (rad-hard variants).
ASICs
Full-custom or standard-cell silicon at TSMC / Samsung / Intel Foundry / GlobalFoundries / UMC / SMIC. Per-mask-set NRE: USD 1 M at 65 nm, USD 5–10 M at 28 nm, USD 50 M+ at 5 nm, USD 100 M+ at 3 nm. Per-die cost at high volume is far lower than the equivalent FPGA. Worth it at million-unit volumes or for extreme performance / efficiency (mobile SoCs, GPUs, custom accelerators). Intermediate options are structured ASICs (eFPGA + structured logic + IP) and multi-project wafers (MPWs), which share a mask set across many small designs (USD 10–100 k for a 1–10 mm² die) and suit prototyping and low-volume or research work.
HDLs
| Language | Pedigree | Notes |
|---|---|---|
| VHDL (IEEE 1076, originated 1987 from US DoD VHSIC) | Rigorous, strongly typed, verbose | Standard in aerospace, defence, European industry. Latest revision 2019 includes generics, fixed-point, protected types. |
| Verilog (IEEE 1364, originated 1985 commercial; now subsumed) | Closer to C in feel | Dominant in 1990s ASIC; superseded by SystemVerilog. |
| SystemVerilog (IEEE 1800-2017) | Superset of Verilog with OOP, constrained-random, assertions | Industry standard for ASIC RTL and verification (UVM testbenches). Default for almost any new SoC project. |
| Chisel (Scala-based, UC Berkeley, 2012) | Generates Verilog | Used by SiFive (RISC-V), Esperanto, Google TPUs. Higher abstraction with Scala metaprogramming. |
| SpinalHDL (Scala-based) | Generates Verilog/VHDL | Open-source community-favoured Chisel alternative; cleaner syntax. |
| Amaranth (formerly nMigen, Python-based) | Generates Verilog | Open-source ecosystem; tight integration with iCE40 / ECP5 open toolchains. |
| Migen, MyHDL (Python) | Generates Verilog | Earlier-generation Python HDLs; still active but less momentum than Amaranth. |
| HLS C/C++ (Vitis HLS, Catapult, Stratus) | Auto-synthesises C/C++ to RTL | Used for algorithm-heavy datapaths (DSP, vision, ML); usually wrapped in a hand-written RTL shell. |
See [[Languages/Tier3/hdl]] for deeper coverage of HDL syntax, idioms, and toolchains.
7p. Edge cases & assumptions
Glitches in combinational logic. When an input to combinational logic transitions, the output can briefly transition through an intermediate value before settling at the new correct steady-state — even if the steady-state output would be the same before and after. Example: a 4-to-1 mux whose select changes while two of its inputs are at the same value can still glitch on the output. Glitches are harmless if the next stage is a flip-flop that only samples on the clock edge, after the combinational logic has settled. They are catastrophic if a combinational signal is used as a clock for another flip-flop, as a write-enable for a memory, or as an asynchronous reset. Rule: never use a combinational signal as a clock. Always use the master clock with a synchronous enable.
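A glitch of this kind can be reproduced with a small unit-delay simulation. The sketch below uses a 2-to-1 mux, `y = (a AND NOT s) OR (b AND s)`, rather than the 4-to-1 mux mentioned above, and assumes every gate has exactly one time unit of delay; both choices are simplifications made for illustration.

```python
def simulate_mux(a, b, s_before, s_after, steps=6):
    """Unit-delay model of a 2:1 mux; returns the output value at each time step."""
    # Let the circuit settle with the old select value.
    s = s_before
    ns = 1 - s          # inverter output
    t0 = a & ns         # AND gate for input a
    t1 = b & s          # AND gate for input b
    y = t0 | t1         # OR gate output
    trace = [y]

    # Change the select line, then advance one gate delay per step.
    s = s_after
    for _ in range(steps):
        # Each gate output is computed from the previous step's values.
        ns, t0, t1, y = 1 - s, a & ns, b & s, t0 | t1
        trace.append(y)
    return trace


# Both data inputs are 1, so the steady-state output is 1 before and after.
falling = simulate_mux(a=1, b=1, s_before=1, s_after=0)
rising = simulate_mux(a=1, b=1, s_before=0, s_after=1)
print("select 1 -> 0:", falling)
print("select 0 -> 1:", rising)
```

In this model the falling select edge produces a brief 0 on the output because the `b` path turns off one inverter delay before the `a` path turns on. A flip-flop sampling after the logic settles never sees it; a clock or asynchronous reset input would.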
Metastability is real and inevitable at any clock-domain crossing or asynchronous input. The two-flip-flop synchroniser raises MTBF to acceptable levels; ignoring CDC produces designs that work in simulation and on the bench, then fail randomly at customer sites after thousands of operating hours. CDC errors are among the most common causes of bugs in production digital systems that pass functional verification, because RTL simulation does not model metastability.
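The effect of the second flip-flop can be estimated with the commonly used exponential MTBF model, MTBF = exp(t_r / τ) / (T_w · f_clk · f_data). The sketch below assumes that model; the symbol names and every numeric value are illustrative assumptions, not data for any real device, and real τ and T_w values come from the vendor or from characterisation.

```python
import math


def synchroniser_mtbf_s(t_resolve_s, tau_s, t_window_s, f_clk_hz, f_data_hz):
    """Mean time between synchroniser failures, in seconds.

    t_resolve_s : time available for a metastable output to resolve
    tau_s       : regeneration time constant of the flip-flop
    t_window_s  : metastability capture window
    f_clk_hz    : sampling clock frequency
    f_data_hz   : rate of asynchronous input transitions
    """
    return math.exp(t_resolve_s / tau_s) / (t_window_s * f_clk_hz * f_data_hz)


# Hypothetical parameters chosen only to show the trend.
TAU = 100e-12
T_W = 100e-12
F_CLK = 100e6
F_DATA = 1e6
T_CLK = 1 / F_CLK

# One stage: only a fraction of a cycle is left for resolution.
one_ff = synchroniser_mtbf_s(2e-9, TAU, T_W, F_CLK, F_DATA)
# Second stage: roughly one extra clock period of resolution time.
two_ff = synchroniser_mtbf_s(2e-9 + T_CLK, TAU, T_W, F_CLK, F_DATA)

print(f"one stage : {one_ff:.3e} s")
print(f"two stages: {two_ff:.3e} s")
```

Because resolution time sits in the exponent, each added clock period multiplies MTBF by exp(T_clk / τ), which is why one extra flip-flop is usually enough.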
Hold violations cannot be fixed by slowing the clock. Setup violations can be fixed by raising T_clk (lowering the clock frequency). Hold violations are independent of T_clk: they depend only on absolute path delays and clock skew at a single clock edge. Fix them by inserting buffer cells in fast data paths or by retiming.
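The asymmetry between the two checks is visible in the standard slack equations. The sketch below is a minimal single-path model; the function names, the skew sign convention (capture clock arrival minus launch clock arrival), and all delay values are assumptions for illustration. Times are integer picoseconds.

```python
def setup_slack_ps(t_clk, t_cq_max, t_logic_max, t_setup, t_skew=0):
    """Setup slack: data must arrive t_setup before the NEXT capture edge."""
    return t_clk + t_skew - (t_cq_max + t_logic_max + t_setup)


def hold_slack_ps(t_cq_min, t_logic_min, t_hold, t_skew=0):
    """Hold slack: data must stay stable t_hold after the SAME capture edge.

    Note that t_clk does not appear anywhere in this expression.
    """
    return t_cq_min + t_logic_min - (t_hold + t_skew)


# Hypothetical slow path: fails setup at 4 ns, passes at 5 ns.
for t_clk in (4000, 5000):
    slack = setup_slack_ps(t_clk, t_cq_max=300, t_logic_max=4000, t_setup=100)
    print(f"T_clk = {t_clk} ps -> setup slack = {slack} ps")

# Hypothetical fast path with 150 ps of skew: fails hold at any clock period.
before = hold_slack_ps(t_cq_min=100, t_logic_min=50, t_hold=100, t_skew=150)
# Inserting 100 ps of buffer delay in the data path repairs it.
after = hold_slack_ps(t_cq_min=100, t_logic_min=150, t_hold=100, t_skew=150)
print(f"hold slack before buffering = {before} ps, after = {after} ps")
```

Negative slack means a violation. Changing `t_clk` moves only the setup result, while added data-path delay moves the hold result (and costs setup margin on the same path).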
Reset strategies: asynchronous assert, synchronous deassert. A purely asynchronous reset has a race condition at deassertion: if reset deasserts very close to a clock edge, different flip-flops may or may not come out of reset on that edge, depending on their individual clock-tree paths. The fix is to assert the reset asynchronously (instant response when reset is needed) and deassert it synchronously to the clock (a small reset-synchroniser circuit). ASICs and most FPGAs follow this convention; some FPGA flows prefer fully synchronous reset for register-packing reasons (in AMD/Xilinx devices, for example, the registers inside DSP and block-RAM primitives support only synchronous reset, so asynchronously reset registers cannot be absorbed into them). Reset strategy must be decided early and stuck with; mixing styles invites subtle bugs.
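The behaviour of a reset synchroniser can be illustrated with a small behavioural model. This is a sketch of the usual two-flip-flop arrangement with an active-low reset, written in Python rather than an HDL; the class and method names are hypothetical, and the model does not represent metastability, only the ordering of events.

```python
class ResetSynchroniser:
    """Two flip-flops with asynchronous clear, D input of the first tied to 1."""

    def __init__(self):
        self.rst_n_async = 0  # external reset, asserted (0) at power-up
        self.ff1 = 0
        self.ff2 = 0

    def drive_async_reset_n(self, level):
        """Change the external reset pin; assertion clears both flops at once."""
        self.rst_n_async = level
        if level == 0:
            self.ff1 = 0
            self.ff2 = 0

    def clock_edge(self):
        """Rising clock edge: shift a 1 through the chain while not in reset."""
        if self.rst_n_async == 1:
            self.ff1, self.ff2 = 1, self.ff1

    @property
    def rst_n_sync(self):
        """Reset distributed to the rest of the design (active low)."""
        return self.ff2


sync = ResetSynchroniser()
sync.drive_async_reset_n(1)        # external reset released
after_release = sync.rst_n_sync    # design is still held in reset
sync.clock_edge()
after_one_edge = sync.rst_n_sync   # still in reset
sync.clock_edge()
after_two_edges = sync.rst_n_sync  # released, aligned to the clock
sync.drive_async_reset_n(0)        # external reset asserted again
after_assert = sync.rst_n_sync     # takes effect without waiting for a clock
print(after_release, after_one_edge, after_two_edges, after_assert)
```

Assertion propagates immediately, while release reaches the design only after two clock edges, so every downstream flip-flop sees deassertion with a full clock period of margin.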
Tri-state buses: avoid on-chip. Two drivers actively driving opposite values on the same wire create a short-circuit V_DD → Gnd path that can destroy both drivers in microseconds. On-chip, multiplexers (one driver, n inputs selected) are always preferable. Tri-state is acceptable only on chip-to-chip buses with strict bus arbitration and turnaround cycles (legacy PCI, some external memory buses), and even there, modern point-to-point links such as PCIe have largely replaced multi-drop tri-state buses.
Crosstalk at gigabit speeds. Adjacent signal wires in a PCB or on-chip metal layer couple capacitively (a fast dV/dt edge on the aggressor injects current into the victim) and inductively (a di/dt spike on the aggressor induces a voltage in the victim). Below ~50 MHz, crosstalk is usually negligible. From roughly 100 MHz to 10 GHz, crosstalk between long parallel traces requires deliberate management: differential signalling, ground stitching, controlled spacing, length matching, and avoiding long parallel runs.
Power-supply noise and simultaneous switching outputs (SSO). A large output buffer switching a load C_L between V_DD and Gnd moves a charge of C_L · V_DD coulombs through the bond-wire inductance L_bond as a short current spike. The instantaneous voltage drop is L_bond · di/dt, which can be hundreds of millivolts. With 32 outputs switching simultaneously, the V_DD pin “bounces” and Gnd “lifts” by an aggregated amount that may corrupt other on-chip logic. Mitigation: decoupling capacitors (10 nF to 100 nF close to the chip, plus bulk electrolytics), low-inductance package (BGA over QFP), staggered output transitions, slew-rate control. On modern SoCs, on-die decoupling capacitance and per-domain PDN engineering are entire sub-disciplines.
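A first-order estimate of the bounce follows from V = N · L · di/dt. The sketch below assumes each output delivers its charge C_L · V_DD as a triangular current pulse of width t_edge, so the peak current is 2Q / t_edge and the peak slope is 4Q / t_edge²; that pulse shape, the function name, and all numeric values are illustrative assumptions. It also assumes all N outputs return through one shared inductance, which is pessimistic for packages with many supply pins.

```python
def ground_bounce_volts(n_outputs, l_shared_h, c_load_f, v_dd, t_edge_s):
    """Estimate supply or ground bounce for simultaneously switching outputs."""
    charge = c_load_f * v_dd                    # Q = C_L * V_DD per output
    di_dt_peak = 4.0 * charge / t_edge_s ** 2   # triangular pulse: 2Q/t reached in t/2
    return n_outputs * l_shared_h * di_dt_peak


# Hypothetical case: 15 pF loads, 3.3 V swing, 5 nH shared inductance.
for n in (1, 8, 32):
    for t_edge in (2e-9, 4e-9):
        v = ground_bounce_volts(n, 5e-9, 15e-12, 3.3, t_edge)
        print(f"{n:2d} outputs, t_edge = {t_edge * 1e9:.0f} ns -> {v:.3f} V")
```

The estimate grows linearly with the number of outputs and falls with the square of the edge time, which is why staggering transitions and slew-rate control are both effective.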
ESD on inputs. I/O pads have integrated diode protection (typically clamping to V_DD and Gnd) rated to 2–8 kV HBM (human-body model). Standard manufacturing handling stays well below that; field stress (cable connections, user contact) can far exceed it. External TVS diodes on connector pins (USBLC6, SMAJ series, PESD3V3) are mandatory on any user-touchable port.
Latch-up in CMOS. Bulk silicon contains parasitic PNP and NPN transistors in the well/substrate structure that together form a parasitic SCR (thyristor). If a transient pulse on a pin briefly forward-biases the well-to-substrate diode, the SCR can trigger and create a low-impedance path from V_DD to Gnd that persists until the power is cycled — sometimes destroying the chip first by I²R heating. Modern CMOS processes have guard rings, deep trenches, and substrate ties to prevent latch-up under specified conditions (typically I_inj = ±100 mA at any pin per JEDEC JESD78). Designers must respect those conditions; abrupt power sequencing (V_OUT before V_DD, or any pin driven above V_DD or below Gnd by more than 0.5 V) is the most common trigger.
Datasheet timing is at nominal conditions. Like analog parts, digital parts have process-voltage-temperature (PVT) corners. A 74LVC1G00 datasheet says t_pd = 4 ns at V_CC = 3.3 V, T = 25 °C — but at V_CC = 1.65 V and T = 125 °C, t_pd can easily be 15 ns. Always design with margin and consult the worst-case spec for the operating envelope.
8p. Tools & software
Simulation
- ModelSim / Questa (Siemens EDA, commercial) — the industrial-default RTL simulator; UVM-aware; mixed-language (VHDL + Verilog + SystemVerilog).
- Xcelium (Cadence, commercial) — peer to Questa; preferred in many big-SoC houses.
- VCS (Synopsys, commercial) — peer to Questa and Xcelium; strong constrained-random performance.
- Verilator (open-source) — translates Verilog/SystemVerilog to C++, compiles to a native simulator. 10–100× faster than ModelSim for pure RTL; lacks event semantics so testbenches need adaptation (or use Cocotb). Standard for open-source FPGA / ASIC dev.
- Icarus Verilog (open-source) — interpreted Verilog simulator; slower than Verilator but supports event-driven constructs.
- GHDL (open-source) — VHDL simulator.
- Cocotb (open-source) — Python-based testbench framework that wraps a Verilog/VHDL simulator (Verilator, Icarus, GHDL, ModelSim, VCS, Xcelium). Lets engineers write testbenches in Python instead of SystemVerilog; rapidly gaining adoption.
Synthesis and place-and-route
- Vivado / Vitis (AMD/Xilinx, freemium) — synthesis + P&R + bitstream for all Xilinx FPGAs from the 7 series onward.
- Quartus Prime (Intel/Altera, freemium) — synthesis + P&R for Cyclone, Arria, Stratix, Agilex.
- Diamond / Radiant / Lattice Propel (Lattice, freemium) — for ECP5, MachXO3, iCE40, Avant.
- Libero SoC (Microchip, freemium) — for PolarFire, IGLOO2, SmartFusion2.
- Synplify Pro (Synopsys, commercial) — third-party synthesis used by many ASIC teams targeting FPGAs.
- Design Compiler (Synopsys, commercial) — ASIC RTL synthesis golden tool.
- Genus (Cadence, commercial) — ASIC synthesis peer to Design Compiler.
- Yosys (open-source) — open synthesis for FPGAs and (with OpenROAD downstream) ASICs; supports SystemVerilog subset.
- nextpnr (open-source) — open P&R for iCE40, ECP5, Gowin, Nexus; pairs with Yosys.
- Project IceStorm / Project Trellis / Project Oxide / Apicula (open-source) — bitstream documentation projects for Lattice iCE40, ECP5, and Nexus devices and for GOWIN devices, respectively; enable the open Yosys+nextpnr flow.
ASIC physical design
- Innovus (Cadence) and IC Compiler II (Synopsys) — commercial place-and-route; industry-standard sign-off.
- OpenROAD (open-source) — full RTL-to-GDSII open flow; silicon-proven on the open SkyWater 130 nm and GlobalFoundries 180 nm PDKs through MPW shuttles, supported by the OpenLane / OpenLane2 flow integration.
- PrimeTime (Synopsys) — ASIC STA golden sign-off.
- Tempus (Cadence) — STA peer to PrimeTime.
Formal verification and equivalence checking
- JasperGold (Cadence), VC Formal (Synopsys), Questa Formal (Siemens) — commercial formal property verification.
- SymbiYosys (open-source) — formal flow built on Yosys with backend solvers (Z3, Yices, Boolector); good for property checking and bounded model checking.
- Formality (Synopsys), Conformal (Cadence) — logic equivalence checking, used to confirm that the synthesised netlist matches the RTL at sign-off.
See [[Languages/Tier3/theorem-prover-dsls]] for the formal-property-language ecosystem (SVA, PSL, TLA+).
Bitstream programming
- Vendor flashers — programmer-cable + GUI tools for each FPGA family.
- OpenOCD (open-source) — generic JTAG / SWD driver supporting many debug adapters and target devices.
- iceprog, ecpprog, openFPGALoader (open-source) — bitstream loaders for iCE40, ECP5, and many others.
- Tigard, Bus Pirate, FT2232H breakouts — common JTAG / SPI dongles.
11. Cross-references
- [[Engineering/semiconductor-devices]] — CMOS gates are paired NMOS + PMOS transistors; the device physics (V_th, R_DS(on), C_gd, C_gs) ultimately sets every digital-logic delay, power, and noise number.
- [[Engineering/op-amps]] — analog precursor to digital signal processing; comparators bridge the boundary; ADCs / DACs interface analog and digital.
- [[Engineering/circuit-analysis]] — DC analysis sets logic levels and noise margins; AC analysis sets bandwidth of interconnects and decoupling networks.
- [[Engineering/pcb-design]] — high-speed digital PCB design (controlled-impedance traces, ground planes, decoupling, length matching) is a specialty in itself; everything in section 7p has a PCB-design counterpart.
- [[Engineering/microcontrollers]] — embedded processors are large synchronous digital systems wrapped around a CPU core and peripherals.
- [[Engineering/fpga-design]] — extends this note’s FPGA section into the full design flow (toolchain, IP integration, debug, partial reconfiguration).
- [[Engineering/realtime-embedded]] — software running on digital hardware; RTOS scheduling, ISRs, DMA all sit atop the digital substrate.
- [[Engineering/power-electronics]] — digital control loops in motor drives and converters interact with high-power switching transients via digital-isolator and gate-driver interfaces.
- [[Robotics/comm-buses]] — every sensor’s digital interface (I2C, SPI, CAN, UART) is built from the building blocks here.
- [[Languages/Tier3/hdl]] — VHDL, SystemVerilog, Chisel, Amaranth, SpinalHDL catalogued there with idioms and examples.
- [[Languages/Tier3/assembly-and-encoding]] — what runs on top of the silicon; instruction set encoding is a digital-logic decode problem.
- [[Languages/Tier3/notation-spec]] — formal specification languages (SVA, PSL, TLA+) for property verification.
- [[Languages/Tier3/theorem-prover-dsls]] — formal-verification ecosystem.
- [[Languages/Tier3/hdl]] — for analog co-simulation of digital outputs into PCB and package parasitics.
12. Citations
- Mano, M. M., Kime, C. R. & Martin, T. (2015). Logic and Computer Design Fundamentals (5th ed.). Pearson. Standard introductory text; gates through simple CPU design.
- Wakerly, J. F. (2017). Digital Design: Principles and Practices (5th ed.). Pearson. Practical engineering focus, deep on real 74-series and CPLD/FPGA implementation.
- Harris, D. & Harris, S. (2021). Digital Design and Computer Architecture (2nd RISC-V ed.; also ARM ed.). Morgan Kaufmann. Gate-level through pipelined RISC-V processor in one volume.
- Patterson, D. A. & Hennessy, J. L. (2020). Computer Organization and Design: The Hardware/Software Interface (RISC-V ed., 2nd printing). Morgan Kaufmann. The classic for register-transfer-level processor design and quantitative trade-offs.
- Mead, C. & Conway, L. (1980). Introduction to VLSI Systems. Addison-Wesley. Foundational text that defined the structured VLSI design methodology still used today.
- Weste, N. H. E. & Harris, D. (2010). CMOS VLSI Design: A Circuits and Systems Perspective (4th ed.). Pearson. Standard graduate text on CMOS-cell-level design, layout, and the synthesis flow.
- Rabaey, J. M., Chandrakasan, A. & Nikolić, B. (2003). Digital Integrated Circuits: A Design Perspective (2nd ed.). Pearson. Strong on dynamic logic, low-power techniques, and on-chip interconnect.
- Kilts, S. (2007). Advanced FPGA Design: Architecture, Implementation, and Optimization. Wiley. Practical FPGA timing closure, including CDC, pipelining, retiming.
- Bhasker, J. (1999). A VHDL Primer (3rd ed.). Prentice Hall. Long-running VHDL reference.
- Sutherland, S., Davidmann, S. & Flake, P. (2006). SystemVerilog for Design (2nd ed.). Springer. SystemVerilog RTL constructs and idioms.
- Spear, C. & Tumbush, G. (2012). SystemVerilog for Verification (3rd ed.). Springer. Constrained-random verification; UVM foundations.
- Cummings, C. E. (2008). Clock Domain Crossing (CDC) Design & Verification Techniques Using SystemVerilog. SNUG Boston paper. Standard practitioner reference for CDC; the two-FF synchroniser MTBF derivation in section 5p follows this paper’s conventions.
- Brayton, R. K., Hachtel, G. D., McMullen, C. T. & Sangiovanni-Vincentelli, A. (1984). Logic Minimization Algorithms for VLSI Synthesis. Kluwer. Espresso algorithm and follow-ons.
- IEEE Std 91-1984. IEEE Standard Graphic Symbols for Logic Functions. Standardised gate symbology.
- IEEE Std 1076-2019. IEEE Standard VHDL Language Reference Manual. Latest VHDL revision.
- IEEE Std 1800-2017. IEEE Standard for SystemVerilog — Unified Hardware Design, Specification, and Verification Language.
- JEDEC JESD8-A through JESD8-26. Logic Voltage Level Standards. Family of supply-and-threshold specifications for LVCMOS, LVDS, HSTL, SSTL, POD, SLVS-EC.
- JEDEC JESD78. IC Latch-Up Test. Specifies the I_inj and over-voltage robustness benchmark for CMOS parts.
- Xilinx UG901, UG903, UG949; Intel/Altera Quartus Handbook; Lattice Diamond / Radiant user guides. Vendor synthesis, P&R, and timing-closure guides — the canonical practical references for actual FPGA timing closure work.
- The Yosys Open Synthesis Suite and OpenROAD documentation (yosyshq.net, theopenroadproject.org). Open-source equivalents of the commercial flows, increasingly production-grade at older process nodes.