PCIe Series — You Made It — A Look Back at Everything — VLSI Trainers
PCIe Series · Complete

You Made It 🎉

A look back at every major concept, mechanism, and skill built across the full 30-post PCIe series — from the first differential pair to 64 GT/s PAM4, from a Vendor ID probe to SR-IOV VF assignment, from 8b/10b to Reed-Solomon FEC and flit mode.

30 Posts · 6 Generations · 3 Protocol Layers · 200+ Register Fields · 100+ SVG Figures

PCIe is not one thing; it is a complete system. It is the differential pair of wires carrying a 64 GT/s PAM4 signal. It is the 4-DW (128-bit) header describing a 64-bit-addressed memory write. It is the depth-first scan that discovers every device at boot. It is the IOMMU page table that enforces DMA isolation between virtual machines. It is the SR-IOV Virtual Function assigned to a VM as its NIC in a cloud data centre. All of it connects.

This wrap-up page catalogues every post in the series, maps which concepts connect to which, and lists the practical skills you built along the way. Use it as a reference when something comes up in hardware bring-up, driver debugging, or system design — the answer is almost certainly in one of these posts.

📋 Every Post in the Series

Part 1 — Foundations and Architecture

PCIe-01
Introduction to PCIe
Why PCIe replaced PCI, the serial point-to-point model, and the ecosystem that emerged
Foundations
PCIe-02
Architecture — Topology and Components
Root Complex, Endpoints, Switches, Bridges, and how the tree topology is built
Architecture
PCIe-03
The Three-Layer Model
Transaction, Data Link, and Physical layers — what each does and what it hands to the layer above
Architecture
PCIe-04
PCIe Generations — Gen 1 to Gen 6
Bandwidth numbers, encoding evolution, and what changed in each generation at a glance
Generations

Part 2 — Transaction Layer Protocol (TLPs)

PCIe-05
TLP Structure — Common Header Fields
Fmt, Type, TC, TD, EP, Attr, AT, Length: the key DW0 fields and what each one controls
TLP
PCIe-06
Memory Read and Write TLPs
3DW vs 4DW headers, MRd split transaction, MWr posted, byte enables, and addressing
TLP
PCIe-07
Completion TLPs
Cpl and CplD, completion status codes (SC/UR/CA/CRS), byte count, lower address, and tag matching
TLP
PCIe-08
Configuration TLPs
Type 0 vs Type 1 config accesses, BDF routing, and how configuration reads reach downstream devices
TLP
PCIe-09
Message TLPs
INTx Assert/Deassert, PME, error signalling, slot power, vendor-defined messages, and routing rules
TLP
PCIe-10
TLP Ordering Rules
Posted vs non-posted ordering table, Relaxed Ordering, IDO, No Snoop, and why order matters for DMA
TLP
PCIe-11
Address Translation — AT Field, ATS, PRI
Untranslated (Default), Translation Request, and Translated AT field values, and how ATS caches IOMMU translations
TLP

Part 3 — Data Link and Flow Control

PCIe-12
Data Link Layer — DLLPs and Reliability
LCRC, Sequence Number, ACK/NAK protocol, Replay Buffer, and REPLAY_NUM/REPLAY_TIMER
DLL
PCIe-13
Flow Control
FC credits (P/NP/Cpl × Header/Data), InitFC/UpdateFC DLLPs, infinite credits, and credit starvation
DLL

Part 4 — Physical Layer

PCIe-14
Physical Layer — Lanes, Differential Signalling and Electrical
Lane structure, differential pairs, common mode voltage, electrical idle, lane reversal and polarity inversion
Physical
PCIe-15
8b/10b Encoding
Running disparity, K-codes, DC balance, clock recovery, and why 8b/10b was replaced at Gen 3
Physical
PCIe-16
128b/130b Encoding (Gen 3+)
Sync header, data/ordered-set blocks, scrambling, LFSR, and why its ~1.54% overhead beats the 20% of 8b/10b
Physical
PCIe-17
Link Training and LTSSM
All 11 LTSSM states, TS1/TS2 ordered sets, equalization phases, Recovery, L-state transitions
Physical
PCIe-30
PCIe 5.0, 6.0 — PAM4, FEC and What Changed
NRZ wall, PAM4 four-level signalling, Reed-Solomon FEC, Gen 6 flit mode, retimers
Physical

Part 5 — Configuration Space

PCIe-18
Configuration Space — Type 0 Header
All 16 DWs of the endpoint header: Vendor/Device ID, Command/Status, Class Code, BARs, capabilities pointer
Config
PCIe-19
Configuration Space — Type 1 Header (Bridge)
Primary/Secondary/Subordinate bus numbers, I/O and memory Base/Limit windows, Bridge Control
Config
PCIe-20
Base Address Registers (BARs)
Memory vs I/O BARs, 32 vs 64-bit, prefetchable, sizing algorithm (write FFFFFFFFh, read back), ROM BAR
Config
PCIe-21
Capability Structures
PCI capability linked list, PM, PCIe Capability, MSI, MSI-X — structure and key register fields
Config
PCIe-22
Extended Configuration Space
ECAM, 4 KB per function, 32-bit extended cap header, AER, VC, DSN, ATS, PASID, LTR, L1SS, ACS, VSEC
Config

Part 6 — System Software and OS Interaction

PCIe-23
PCIe Enumeration
Depth-first scan, BDF assignment, FFh Subordinate placeholder, BAR sizing, bridge window programming, enable sequence
System SW
PCIe-24
Interrupts — INTx, MSI, and MSI-X
Virtual wire Assert/Deassert TLPs, INTx swizzling, MSI APIC write, MSI-X table in BAR, configuration sequences
System SW
PCIe-25
Power Management — ASPM and Link States
L0 through L3, L0p (Gen 6), L0s entry/exit with FTS, L1 DLLP handshake, L1.1/L1.2 sub-states, ASPM policy
System SW
PCIe-26
Device Power States — D0 to D3
D0 Uninitialized vs Active, D1/D2 optional, D3hot with No Soft Reset, D3cold, PME, PMCSR, Vaux
System SW
PCIe-29
DMA and IOMMU
Bus Master Enable, MWr/MRd DMA TLPs, IOVA translation, IOMMU page tables, domains, ATS, PASID, VFIO
System SW

Part 7 — Error Handling and Reliability

PCIe-27
Advanced Error Reporting (AER)
Correctable vs uncorrectable (non-fatal and fatal) taxonomy, all bit positions, Header Log, First Error Pointer, ECRC, data poisoning, advisory non-fatal
Reliability

Part 8 — Virtualisation

PCIe-28
SR-IOV — Single Root I/O Virtualisation
PF vs VF, SR-IOV Extended Cap (0010h), TotalVFs/NumVFs/VF Enable, VF BDF formula, ARI, VF BARs, ACS, IOMMU
Virtualisation

📋 How the Concepts Connect

PCIe is layered, but the layers are not independent — every layer depends on mechanisms in the layers below it, and every system feature connects to both hardware and software.

Physical Layer feeds into Data Link Layer

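  • 8b/10b K-codes (STP, SDP, END) frame TLPs and DLLPs, and COM starts ordered sets → Physical Layer hands the DLL clean packet boundaries
  • 128b/130b sync headers separate data blocks from ordered-set blocks; framing tokens (STP, SDP) inside data blocks separate TLPs from DLLPs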
  • Gen 6 flit boundaries (256 B) → DLL ACK/NAK operates at flit granularity, not TLP granularity
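  • RS-FEC corrects symbol errors before the flit CRC check → replay is rarely triggered in Gen 6 flit mode (Gen 5 still uses 128b/130b without FEC)
  • LTSSM reaching L0 (Physical LinkUp) → DLL runs flow-control initialisation and moves to DL_Active; L0s and L1 keep LinkUp, while losing it (e.g. falling back to Detect) returns the DLL to DL_Inactive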

Data Link Layer feeds into Transaction Layer

  • Flow control credits (P/NP/Cpl × H/D) gate when TLPs can be sent
  • NAK or Replay Timer expiry → DLL replays TLPs from the Replay Buffer; the Transaction Layer never re-issues them
  • DL_Active status must be true before TLPs can flow
  • DLLP types (FC, ACK/NAK, PM) are invisible to the Transaction Layer
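The credit gate in the first bullet fits in a few lines. The receiver advertises a CREDIT_LIMIT through InitFC/UpdateFC, the transmitter tracks CREDITS_CONSUMED, and the comparison is done modulo the counter width so both counters can wrap. The Python below is a minimal sketch, assuming non-scaled flow control (8-bit header and 12-bit data counters) and 16-byte data credits; the function names are hypothetical and infinite credits are not modelled.

```python
HDR_FIELD_BITS = 8       # header credit counter width (non-scaled flow control, assumed)
DATA_FIELD_BITS = 12     # data credit counter width (non-scaled flow control, assumed)
DATA_CREDIT_BYTES = 16   # one data credit covers 4 DW of payload


def data_credits_for(payload_bytes: int) -> int:
    """Data credits needed for a payload, rounded up to whole credits."""
    return -(-payload_bytes // DATA_CREDIT_BYTES)


def gate_ok(credit_limit: int, credits_consumed: int, needed: int, field_bits: int) -> bool:
    """Spec-style gate: (LIMIT - (CONSUMED + NEEDED)) mod 2^n <= 2^n / 2."""
    modulus = 1 << field_bits
    return (credit_limit - (credits_consumed + needed)) % modulus <= modulus // 2


def can_send_posted(hdr_limit: int, hdr_consumed: int,
                    data_limit: int, data_consumed: int, payload_bytes: int) -> bool:
    """A posted write (MWr) needs one P header credit plus enough P data credits."""
    return (gate_ok(hdr_limit, hdr_consumed, 1, HDR_FIELD_BITS) and
            gate_ok(data_limit, data_consumed, data_credits_for(payload_bytes), DATA_FIELD_BITS))


# Counters wrap: 254 consumed + 2 needed = 256, which is 0 mod 256, still within a limit of 2.
print(gate_ok(credit_limit=2, credits_consumed=254, needed=2, field_bits=HDR_FIELD_BITS))  # True
```

A 256-byte MWr, for example, needs one posted header credit and 16 posted data credits; if either gate fails, the TLP waits for an UpdateFC.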

TLP format and routing

  • Fmt+Type together select the TLP type (PCIe-05)
  • Address in header → Memory routing (MRd/MWr go to matching BAR)
  • BDF in header → Config and Completion routing
  • TC field → mapped to VC by VC Capability (PCIe-22) → separate FC credit pools
  • AT field → IOMMU treatment (PCIe-11, PCIe-29)
  • PASID TLP Prefix → per-process IOMMU context selection
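Here is the Fmt+Type decode from the first bullet as a short Python sketch. It assumes the non-flit DW0 layout with byte 0 in bits 31:24, names only a handful of common Type encodings, and uses hypothetical helper names (`classify`, `decode_dw0`); it is an illustration, not a full TLP parser.

```python
FMT_NAMES = {0b000: "3DW, no data", 0b001: "4DW, no data",
             0b010: "3DW, with data", 0b011: "4DW, with data"}


def classify(fmt: int, typ: int) -> str:
    """Name a few common TLP types from Fmt[2:0] and Type[4:0]."""
    has_data = fmt in (0b010, 0b011)
    if typ == 0b00000:
        return "MWr" if has_data else "MRd"
    if typ in (0b00100, 0b00101):
        return ("CfgWr" if has_data else "CfgRd") + ("0" if typ == 0b00100 else "1")
    if typ == 0b01010:
        return "CplD" if has_data else "Cpl"
    if (typ >> 3) == 0b10:                 # 10rrr: message, rrr = routing subfield
        return "MsgD" if has_data else "Msg"
    return "other"


def decode_dw0(dw0: int) -> dict:
    """Split header DW0 (byte 0 in bits 31:24) into its main fields."""
    fmt = (dw0 >> 29) & 0x7
    typ = (dw0 >> 24) & 0x1F
    length = dw0 & 0x3FF
    return {
        "fmt": FMT_NAMES.get(fmt, "prefix/reserved"),
        "kind": classify(fmt, typ),
        "tc": (dw0 >> 20) & 0x7,
        "td": (dw0 >> 15) & 0x1,       # TLP Digest (ECRC) present
        "ep": (dw0 >> 14) & 0x1,       # poisoned
        "attr": (dw0 >> 12) & 0x3,     # bit 1 = Relaxed Ordering, bit 0 = No Snoop
        "at": (dw0 >> 10) & 0x3,
        "length_dw": length or 1024,   # Length 0 encodes 1024 DW
    }


print(decode_dw0(0x60000001)["kind"])  # MWr (4DW header, 1 DW payload)
```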

Configuration space and enumeration

  • Vendor ID read = presence probe (PCIe-23) → uses Config TLP (PCIe-08)
  • Header Type byte → bridge vs endpoint decision in depth-first scan
  • BAR sizing writes → determines MMIO window (PCIe-20)
  • Bridge windows programmed bottom-up (PCIe-23) → Type 1 header registers (PCIe-19)
  • Capability pointer → walks to MSI/MSI-X (PCIe-24) and PM (PCIe-26)
  • Extended Cap list (100h+) → AER (PCIe-27), L1SS (PCIe-22), SR-IOV (PCIe-28)
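The enumeration bullets can be simulated without hardware. This sketch walks a made-up topology depth-first, gives each bridge a Secondary bus number and an FFh Subordinate placeholder, then writes the real Subordinate on the way back up. It assumes function 0 only, replaces Vendor ID config reads with a dictionary lookup, and uses hypothetical names (`Function`, `enumerate_bus`).

```python
from dataclasses import dataclass, field


@dataclass
class Function:
    """A simulated function 0 of a device; bridges carry the devices on their secondary bus."""
    name: str
    is_bridge: bool = False                         # Header Type 1 (bridge) vs Type 0
    children: dict = field(default_factory=dict)    # device number -> Function
    bus_numbers: tuple = ()                         # (primary, secondary, subordinate)


def enumerate_bus(bus: int, devices: dict, next_bus: list, found: list) -> int:
    """Depth-first scan of one bus; returns the highest bus number assigned so far."""
    for dev in range(32):
        fn = devices.get(dev)
        if fn is None:                    # a real Vendor ID read would return FFFFh here
            continue
        found.append((bus, dev, 0, fn.name))
        if fn.is_bridge:
            next_bus[0] += 1
            secondary = next_bus[0]
            fn.bus_numbers = (bus, secondary, 0xFF)         # FFh placeholder while scanning below
            subordinate = enumerate_bus(secondary, fn.children, next_bus, found)
            fn.bus_numbers = (bus, secondary, subordinate)  # tighten on the way back up
    return next_bus[0]


# Two root ports on bus 0; the first leads to a switch with two downstream ports.
nvme, nic, gpu = Function("NVMe"), Function("NIC"), Function("GPU")
dsp1, dsp2 = Function("DSP1", True, {0: nvme}), Function("DSP2", True, {0: nic})
usp = Function("USP", True, {1: dsp1, 2: dsp2})
rp1, rp2 = Function("RP1", True, {0: usp}), Function("RP2", True, {0: gpu})
found = []
last_bus = enumerate_bus(0, {1: rp1, 2: rp2}, [0], found)
print(rp1.bus_numbers, rp2.bus_numbers)  # (0, 1, 4) (0, 5, 5)
```

Because the scan is depth-first, each bridge's Secondary..Subordinate range covers exactly the buses below it, which is what lets Type 1 config requests be routed by a simple range check.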

Interrupts depend on TLPs and config

  • INTx Assert/Deassert are Message TLPs (Local routing) — PCIe-09, PCIe-24
  • MSI = MWr TLP to APIC address — address/data programmed in MSI capability
  • MSI-X = MWr TLP per vector — address/data in MMIO BAR table (needs BAR and Memory Space Enable)
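  • Set Interrupt Disable when switching to MSI/MSI-X (a function with MSI or MSI-X enabled is already prohibited from using INTx, but the bit makes the intent explicit)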
  • Bus Master Enable must be off during interrupt setup or MSI write could fire before vector is registered
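The MSI-X bullet comes down to filling 16-byte table entries in the BAR. The sketch below packs one entry, assuming the x86 compatibility MSI format (address FEEx_xxxxh with the destination APIC ID in bits 19:12, data carrying only the vector) and no interrupt remapping; `build_msix_entry` and `x86_msi_address` are hypothetical names. The entry starts masked, in the same spirit as the Bus Master Enable bullet: unmask only after the handler is registered.

```python
import struct

VECTOR_CTRL_MASK = 0x1   # Vector Control bit 0: per-vector mask


def x86_msi_address(dest_apic_id: int) -> int:
    """Compatibility-format x86 MSI address: FEEx_xxxxh, destination ID in bits 19:12 (assumed)."""
    return 0xFEE00000 | ((dest_apic_id & 0xFF) << 12)


def build_msix_entry(dest_apic_id: int, vector: int, masked: bool = True) -> bytes:
    """Pack one 16-byte MSI-X table entry as four little-endian DWs."""
    addr = x86_msi_address(dest_apic_id)
    data = vector & 0xFF       # fixed delivery, edge-triggered: only the vector bits are set
    ctrl = VECTOR_CTRL_MASK if masked else 0
    # DW0 Message Address, DW1 Message Upper Address, DW2 Message Data, DW3 Vector Control
    return struct.pack("<IIII", addr, 0, data, ctrl)


entry = build_msix_entry(dest_apic_id=3, vector=0x41)   # starts masked
print(entry.hex())
```

A driver would write these 16 bytes at the table's BAR offset plus 16 × n for vector n, then clear the mask bit once the handler is in place.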

Power management connects D-states and L-states

  • D0 Active → link free to use L0, L0s, L1 ASPM (PCIe-25)
  • D0 Active + Gen 6 → L0p in-band bandwidth reduction possible
  • D1/D2/D3hot → PM_Enter_L1 DLLP → link forced to L1 (PCIe-25, PCIe-26)
  • D3hot → PME_Turn_Off/PME_TO_Ack → PM_Enter_L23 DLLP → L2/L3 Ready → power removed → D3cold with the link in L2 (Vaux present) or L3 (PCIe-25, PCIe-26)
  • LTR messages (PCIe-22) gate L1.2 entry — device reports latency tolerance
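The LTR bullet is a comparison once the encoding is decoded: each 16-bit LTR field holds a 10-bit value, a 3-bit scale (multiples of 1, 32, 1024, ... ns) and a Requirement bit. This is a simplified Python sketch, assuming the threshold has already been converted to nanoseconds and treating a cleared Requirement bit as no constraint; the function names are hypothetical.

```python
LTR_SCALE_NS = {0: 1, 1: 32, 2: 1_024, 3: 32_768, 4: 1_048_576, 5: 33_554_432}


def decode_ltr_field(raw16: int):
    """Decode one 16-bit LTR latency field to ns; None means 'no requirement'."""
    if not ((raw16 >> 15) & 0x1):                 # Requirement bit (15) clear
        return None
    scale = (raw16 >> 10) & 0x7                   # Latency Scale, bits 12:10
    if scale not in LTR_SCALE_NS:
        raise ValueError("LTR scale values 6 and 7 are not permitted")
    return (raw16 & 0x3FF) * LTR_SCALE_NS[scale]  # Latency Value, bits 9:0


def l1_2_allowed(snoop_raw: int, no_snoop_raw: int, threshold_ns: int) -> bool:
    """Simplified gate: every reported requirement must tolerate at least the threshold."""
    for latency in (decode_ltr_field(snoop_raw), decode_ltr_field(no_snoop_raw)):
        if latency is not None and latency < threshold_ns:
            return False
    return True


snoop = 0x8000 | (2 << 10) | 100   # requirement set, 100 x 1024 ns = 102.4 us
print(l1_2_allowed(snoop, 0x0000, threshold_ns=50_000))  # True
```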

SR-IOV, ACS, and IOMMU are inseparable

  • SR-IOV PF contains the VF-creation registers (PCIe-28)
  • ACS P2P Request Redirect forces VF DMA upstream through IOMMU (PCIe-22, PCIe-28)
  • IOMMU domain per VM — page tables translate VF’s IOVA to physical pages (PCIe-29)
  • ATS (PCIe-11, PCIe-22) caches IOMMU translations in device for high-throughput DMA
  • PASID (PCIe-22, PCIe-29) gives each process or accelerator context its own IOMMU address space
  • ARI (PCIe-28) extends function numbering to 256 per bus — needed for >8 VFs
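The VF routing-ID arithmetic behind the SR-IOV and ARI bullets is simple: VF n sits at PF RID + First VF Offset + (n - 1) × VF Stride. The sketch below computes it and splits the result with and without ARI. The offset and stride values are examples, and the function names are hypothetical.

```python
def vf_routing_id(pf_rid: int, first_vf_offset: int, vf_stride: int, n: int) -> int:
    """Routing ID of VF n (1-based): PF RID + First VF Offset + (n - 1) * VF Stride."""
    if n < 1:
        raise ValueError("VFs are numbered from 1")
    rid = pf_rid + first_vf_offset + (n - 1) * vf_stride
    if rid > 0xFFFF:
        raise ValueError("VF Routing ID does not fit in 16 bits")
    return rid


def split_rid(rid: int, ari: bool) -> tuple:
    """Split a Routing ID into (bus, device, function)."""
    bus = rid >> 8
    if ari:                              # ARI: 8-bit function number, device number is 0
        return bus, 0, rid & 0xFF
    return bus, (rid >> 3) & 0x1F, rid & 0x7


# PF at 03:00.0 with First VF Offset = 1 and VF Stride = 1 (example values)
pf = 0x0300
print(split_rid(vf_routing_id(pf, 1, 1, 1), ari=True))    # (3, 0, 1)
print(split_rid(vf_routing_id(pf, 1, 1, 256), ari=True))  # (4, 0, 0): spills onto the next bus
```

Note how VF 256 lands on the next bus number, which is why a large VF count can consume more than one bus number below a port.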

AER connects to every layer

  • Correctable errors originate at the Physical Layer (Receiver Error) and the DLL (Bad TLP from an LCRC failure, Bad DLLP, Replay Timer Timeout, REPLAY_NUM Rollover)
  • Uncorrectable errors include TLP-level (Malformed TLP, UR, Poisoned TLP, ECRC)
  • Header Log captures the TLP header — needs Transaction Layer knowledge to decode
  • Error messages are Message TLPs (Route-to-Root) — needs TLP routing knowledge
  • AER Extended Capability (PCIe-22) stores the registers — needs ECAM to read
  • Error Source Identification register (in the Root Port's AER capability) gives the BDF, which needs BDF/enumeration knowledge to interpret
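Finding the reporting device is mostly bit slicing. The sketch below takes two Root Port AER registers, Root Error Status (offset 30h) and Error Source ID (offset 34h), as plain integers and turns the logged Requester IDs into bb:dd.f strings. Register access is left out and the helper names are hypothetical.

```python
def bdf_str(rid: int) -> str:
    """Format a 16-bit Requester/Routing ID as bb:dd.f."""
    return f"{rid >> 8:02x}:{(rid >> 3) & 0x1F:02x}.{rid & 0x7}"


def decode_error_sources(root_err_status: int, err_src_id: int) -> dict:
    """Pick the reporting BDFs out of Root Error Status (+30h) and Error Source ID (+34h)."""
    sources = {}
    if root_err_status & 0x1:    # bit 0: ERR_COR Received
        sources["correctable"] = bdf_str(err_src_id & 0xFFFF)            # bits 15:0
    if root_err_status & 0x4:    # bit 2: ERR_FATAL/NONFATAL Received
        sources["uncorrectable"] = bdf_str((err_src_id >> 16) & 0xFFFF)  # bits 31:16
    return sources


print(decode_error_sources(0x5, 0x0300_0100))  # {'correctable': '01:00.0', 'uncorrectable': '03:00.0'}
```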

📋 Practical Skills Built in This Series

Completing the series means you can do all of the following from first principles — without looking up basic definitions:

Hardware Bring-Up

  • Read a link status register and interpret the negotiated width and speed
  • Identify which LTSSM state a link is stuck in from a logic analyser trace
  • Explain why a link trains to Gen 3 instead of Gen 5 (equalization failure, excessive channel loss, missing retimer)
  • Decode a TLP header from a PCIe protocol analyser trace (Fmt, Type, TC, Address, Length)
  • Size a BAR by writing 0xFFFFFFFF and reading back, then determine if it’s 32-bit or 64-bit
  • Distinguish a completion timeout from a Completer Abort from an Unsupported Request in a protocol analyser capture
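The BAR sizing step from this list, as a sketch: the function takes the values read back after writing FFFFFFFFh (and, for a 64-bit BAR, the upper half too) and returns type, prefetchability and size. It assumes the original BAR contents are saved and restored around the probe, handles memory BARs only, and `decode_bar_sizing` is a hypothetical name.

```python
def decode_bar_sizing(readback_lo: int, readback_hi: int = 0) -> dict:
    """Interpret a memory BAR read back after writing FFFFFFFFh (and its upper half, if 64-bit)."""
    if readback_lo & 0x1:
        raise ValueError("I/O BAR: not handled in this sketch")
    bar_type = (readback_lo >> 1) & 0x3        # 00b = 32-bit, 10b = 64-bit
    prefetchable = bool(readback_lo & 0x8)
    if bar_type == 0b10:
        mask = (readback_hi << 32) | (readback_lo & 0xFFFFFFF0)
        size = (~mask + 1) & 0xFFFFFFFFFFFFFFFF   # two's complement of the writable bits
        return {"kind": "mem64", "prefetchable": prefetchable, "size": size}
    mask = readback_lo & 0xFFFFFFF0
    return {"kind": "mem32", "prefetchable": prefetchable, "size": (~mask + 1) & 0xFFFFFFFF}


print(decode_bar_sizing(0xFFF00000))              # 1 MiB, 32-bit, non-prefetchable
print(decode_bar_sizing(0x0000000C, 0xFFFFFFFC))  # 16 GiB, 64-bit, prefetchable
```

A readback of 0 yields size 0, the usual sign of an unimplemented BAR.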

Driver Development

  • Walk the PCI capability linked list to find MSI and MSI-X capability structures
  • Configure MSI-X with per-vector APIC targeting for NUMA-aware IRQ assignment
  • Drain Transactions Pending before writing D3hot to PMCSR
  • Handle D3hot → D0 resume with and without No Soft Reset bit
  • Map and unmap DMA buffers correctly, with IOTLB invalidation on unmap
  • Configure ASPM safely by reading Acceptable Latency and comparing to path latency
  • Enable Bus Master only after IOMMU domain is configured
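The first driver skill, walking the capability list, can be shown against a synthetic config space. The sketch follows the Capabilities Pointer at 34h through each ID/Next pair, masks the reserved low bits, and stops on a loop as well as on Next = 00h. The config bytes are made up for the example and `walk_capabilities` is a hypothetical name.

```python
CAP_NAMES = {0x01: "PM", 0x05: "MSI", 0x10: "PCIe", 0x11: "MSI-X"}


def walk_capabilities(cfg: bytes) -> list:
    """Return (offset, name) for each entry in the standard capability list."""
    status = int.from_bytes(cfg[0x06:0x08], "little")
    if not (status & 0x10):                  # Status bit 4: Capabilities List
        return []
    caps, seen = [], set()
    ptr = cfg[0x34] & 0xFC                   # Capabilities Pointer, low 2 bits reserved
    while ptr and ptr not in seen:           # Next = 00h ends the list; 'seen' stops loops
        seen.add(ptr)
        cap_id, next_ptr = cfg[ptr], cfg[ptr + 1]
        caps.append((ptr, CAP_NAMES.get(cap_id, f"0x{cap_id:02x}")))
        ptr = next_ptr & 0xFC
    return caps


# A synthetic 256-byte config space: PM -> MSI -> PCIe -> MSI-X
cfg = bytearray(256)
cfg[0x06] = 0x10
cfg[0x34] = 0x40
cfg[0x40:0x42] = bytes([0x01, 0x50])
cfg[0x50:0x52] = bytes([0x05, 0x70])
cfg[0x70:0x72] = bytes([0x10, 0xB0])
cfg[0xB0:0xB2] = bytes([0x11, 0x00])
print(walk_capabilities(bytes(cfg)))
```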

System Architecture

  • Calculate the maximum bandwidth for a given link width and generation
  • Design a PCIe switch topology for a multi-GPU server with correct BAR allocation headroom
  • Choose between INTx, MSI, and MSI-X for a given device class and OS environment
  • Configure SR-IOV VF count, ARI, and IOMMU domains for a multi-tenant cloud system
  • Explain why Gen 6 often needs retimers on longer add-in card channels
  • Design an ACS policy that allows SR-IOV device assignment while blocking P2P DMA escapes
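For the bandwidth bullet (and the link status one under Hardware Bring-Up), the sketch below decodes Current Link Speed and Negotiated Link Width from a Link Status value, then multiplies rate, encoding efficiency and width. It assumes speed codes 1 to 6 map to Gen 1 to Gen 6, which holds when the Supported Link Speeds vector is contiguous from 2.5 GT/s, and it ignores TLP, DLLP and flit overheads, so real throughput is lower. The names are hypothetical.

```python
# Generation -> (GT/s per lane, line-encoding efficiency)
GENERATIONS = {
    1: (2.5, 8 / 10),      # 8b/10b
    2: (5.0, 8 / 10),
    3: (8.0, 128 / 130),   # 128b/130b
    4: (16.0, 128 / 130),
    5: (32.0, 128 / 130),
    6: (64.0, 1.0),        # PAM4 flit mode: no line code; flit overhead not modelled here
}


def decode_link_status(link_status: int) -> tuple:
    """(speed code, width) from Link Status (PCIe Capability + 12h)."""
    speed = link_status & 0xF            # Current Link Speed, bits 3:0
    width = (link_status >> 4) & 0x3F    # Negotiated Link Width, bits 9:4
    return speed, width


def bandwidth_gbytes(gen: int, lanes: int) -> float:
    """Per-direction GB/s after line encoding, before TLP/DLLP overheads."""
    rate, efficiency = GENERATIONS[gen]
    return rate * efficiency * lanes / 8


gen, width = decode_link_status(0x1104)   # example readback: speed code 4, x16
print(f"Gen {gen} x{width}: {bandwidth_gbytes(gen, width):.2f} GB/s per direction")
```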

Error Investigation

  • Read Root Error Status to find the source BDF of an AER error interrupt
  • Decode the Header Log to identify the address, requester, and TLP type of the guilty TLP
  • Use the First Error Pointer to find the first uncorrectable error when multiple bits are set
  • Distinguish data poisoning from ECRC failure from Malformed TLP
  • Recognise advisory non-fatal errors and why they generate ERR_COR instead of ERR_NONFATAL
  • Trace a completion timeout to its root cause (device offline, IOMMU fault, or routing error)
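For the First Error Pointer skill: when several Uncorrectable Error Status bits are set, the pointer in the Advanced Error Capabilities and Control register names the one that was logged first. The sketch decodes both registers from plain integers, names only the bit positions listed, and uses a hypothetical function name.

```python
UNCOR_BITS = {
    4: "Data Link Protocol Error", 5: "Surprise Down", 12: "Poisoned TLP Received",
    13: "Flow Control Protocol Error", 14: "Completion Timeout", 15: "Completer Abort",
    16: "Unexpected Completion", 17: "Receiver Overflow", 18: "Malformed TLP",
    19: "ECRC Error", 20: "Unsupported Request", 21: "ACS Violation",
}


def decode_uncorrectable(uncor_status: int, aer_cap_ctrl: int):
    """Return (first_error, all_errors) from Uncorrectable Error Status (+04h)
    and Advanced Error Capabilities and Control (+18h)."""
    errors = [UNCOR_BITS.get(b, f"bit {b}") for b in range(32) if (uncor_status >> b) & 1]
    fep = aer_cap_ctrl & 0x1F                       # First Error Pointer, bits 4:0
    first = UNCOR_BITS.get(fep, f"bit {fep}") if (uncor_status >> fep) & 1 else None
    return first, errors


# Completion Timeout and UR both logged; the pointer says UR came first.
print(decode_uncorrectable((1 << 14) | (1 << 20), 20))
```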

Security and Virtualisation

  • Explain why unrestricted DMA (no IOMMU) is a critical security vulnerability
  • Configure an IOMMU domain with correct page table mapping for VF passthrough
  • Verify that ACS is enabled on all switch ports before SR-IOV device assignment
  • Use PASID for per-process IOMMU isolation in shared GPU/accelerator scenarios
  • Explain what PCIe IDE (Integrity and Data Encryption) adds to DMA security and why it is a separate mechanism from Gen 6 FEC
  • Configure VFIO correctly including IOMMU group validation and interrupt remapping
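The ACS check in this list can be expressed as a set difference per port. The required set below (Source Validation, P2P Request Redirect, P2P Completion Redirect, Upstream Forwarding) is an assumed isolation policy rather than a spec rule, the port names and register values are invented for the example, and `acs_gaps` is hypothetical.

```python
ACS_SV, ACS_RR, ACS_CR, ACS_UF = 0x01, 0x04, 0x08, 0x10   # ACS Control bits 0, 2, 3, 4
REQUIRED = ACS_SV | ACS_RR | ACS_CR | ACS_UF             # assumed policy, not a spec mandate
NAMES = {ACS_SV: "SourceValidation", ACS_RR: "P2PRequestRedirect",
         ACS_CR: "P2PCompletionRedirect", ACS_UF: "UpstreamForwarding"}


def acs_gaps(ports: dict) -> dict:
    """ports: name -> ACS Control value (+06h), or None if the port has no ACS capability.
    Returns name -> sorted list of required controls that are not enabled."""
    gaps = {}
    for name, ctrl in ports.items():
        missing = REQUIRED if ctrl is None else REQUIRED & ~ctrl
        if missing:
            gaps[name] = sorted(NAMES[bit] for bit in NAMES if missing & bit)
    return gaps


ports = {"switch-dsp-1": 0x1D, "switch-dsp-2": 0x01, "root-port-3": None}   # example readings
print(acs_gaps(ports))
```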

Gen 6 Specifics

  • Explain why NRZ runs out of margin beyond 32 GT/s and why PAM4 was chosen for 64 GT/s
  • Describe how Reed-Solomon FEC corrects PAM4 symbol errors before the Data Link Layer sees them
  • Explain flit mode: how 256-byte flits pack multiple TLPs and why per-TLP LCRC is gone
  • Identify which Gen 6 features are visible to software (L0p capability, Physical Layer 64 GT/s capability, IDE) vs largely transparent to drivers (flit mode, FEC), even though flit mode support is reported in capability registers
  • Explain why ATS, PASID, and IOMMU are more critical at Gen 6 than at Gen 3/4
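The flit bullets reduce to a little arithmetic. The sketch assumes the 256-byte flit layout of 236 TLP bytes, 6 DLP bytes, 8 CRC bytes and 6 FEC bytes, and PAM4's 2 bits per symbol; the helper names are hypothetical and idle flits are ignored.

```python
FLIT_LAYOUT = {"tlp": 236, "dlp": 6, "crc": 8, "fec": 6}   # bytes in one flit (assumed layout)
FLIT_BYTES = sum(FLIT_LAYOUT.values())                      # 256


def tlp_fraction() -> float:
    """Share of each flit that carries TLP bytes (headers and payload)."""
    return FLIT_LAYOUT["tlp"] / FLIT_BYTES


def pam4_baud(gt_per_s: float) -> float:
    """PAM4 carries 2 bits per symbol, so 64 GT/s needs only 32 GBd."""
    return gt_per_s / 2


def tlp_bytes_per_s(lanes: int, gt_per_s: float = 64.0) -> float:
    """GB/s per direction available for TLP bytes, ignoring idle flits."""
    return gt_per_s * lanes / 8 * tlp_fraction()


print(FLIT_BYTES, tlp_fraction(), pam4_baud(64.0), tlp_bytes_per_s(16))
```

At x16 that leaves 118 GB/s per direction for TLP bytes, and TLP headers come out of that too, so payload throughput is lower still.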

📋 What to Read Next

The series covered PCIe from the spec perspective. The natural next areas to explore — all of which build directly on what you learned here:

Thank you for following the full series. PCIe is one of the most deeply specified interconnects ever built: 1600+ pages of spec, backward compatible from 2003 to 2022, and still doubling bandwidth every three to four years. You now understand not just what it does but why each decision was made the way it was. That knowledge transfers to every PCIe-based system you will ever work on, and since virtually every modern compute device uses PCIe in some form, that means almost everything.