PCIe Series — PCIe-03: The Three-Layer Model in Detail — VLSI Trainers

The Three-Layer Model in Detail

How all three PCIe layers fit together: TLP header fields, flow control credits, virtual channels, the ACK/NAK state machine, the replay buffer, and how the Physical Layer changes from Gen 1 through Gen 6 while the transaction model above it stays the same.

📋 Layer Responsibilities — the Full Picture

Every PCIe port implements exactly three layers. They stack the same way whether the port is in a Root Complex, a Switch or an Endpoint. The layers define what each piece of hardware is responsible for, and that behaviour is not optional. A design does not, however, have to be physically partitioned along these layer boundaries to be spec-compliant.

Transmit side, top to bottom:
• Software / device core: generates requests and consumes completions. Provides type, address, data, TC, length and attributes.
• Transaction Layer TX: builds the TLP header + payload + optional ECRC; places the TLP in its VC buffer; checks FC credits and enforces ordering; passes the TLP to the DLL when credits allow and the arbiter selects it.
• Data Link Layer TX: assigns a 12-bit Sequence Number and calculates the LCRC; copies the TLP into the Replay Buffer and forwards it to the Physical Layer; generates ACK/NAK and Flow Control Update DLLPs.
• Physical Layer TX: Gen 1/2 adds STP/SDP/END characters, scrambles and 8b/10b-encodes; Gen 3–5 adds framing tokens and sync headers, scrambles with a stronger LFSR and uses 128b/130b; Gen 6 packs data into 256-byte flits, adds CRC and FEC, and maps bits to PAM4 symbols. All generations byte-stripe across N lanes, serialise and drive the differential pairs. The Physical Layer also runs the LTSSM for link training and power-state transitions.
Receive side, bottom to top:
• Physical Layer RX: CDR clock recovery → deserialise → elastic buffer → decode (Gen 1/2: 8b/10b decode and descramble; Gen 3–5: block lock, 128b/130b decode and descramble) → lane de-skew → de-stripe. Gen 1/2 then strips STP/END; Gen 6 applies FEC correction and checks the flit CRC before unpacking TLPs/DLLPs. The elastic buffer compensates for the TX/RX clock-frequency difference.
• Data Link Layer RX: checks the LCRC and validates the Sequence Number. Good → ACK DLLP and pass the TLP up; error → NAK DLLP and discard. Consumes FC and PM DLLPs from the link partner.
• Transaction Layer RX: checks the optional ECRC and decodes the TLP header; routes it to an egress port (Switch) or delivers it to the core (Endpoint); updates FC credit counters as buffers drain.
• Software / device core: receives decoded commands or completion data after Transaction Layer decoding.
The two sides are joined by the link's differential pairs. Transaction and Data Link layers produce the same packets from Gen 1 to Gen 5 (Gen 6 Flit Mode changes header encoding and link-level protection). The Physical Layer changes significantly each generation.
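Figure 1 — All three PCIe layers, transmit and receive sides. From Gen 1 at 2.5 GT/s to Gen 5 at 32 GT/s, the Physical Layer is the only layer that changes between generations: the Transaction and Data Link layers produce and consume the same packets. Gen 6 at 64 GT/s keeps the same transaction model, but its Flit Mode also changes the TLP header encoding and replaces the per-TLP LCRC with flit-level CRC and FEC.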

📋 TLP Common Header — Every Field

Every TLP header starts with the same first Doubleword (DW0) layout. It tells the receiver what kind of packet is arriving and how to parse the rest of the header.

TLP Common Header: DW0 (bits 31 → 0)
• [31:29] Fmt: header size (3DW/4DW) and whether a data payload follows
• [28:24] Type: MRd, MWr, Cpl, CfgRd… (interpreted together with Fmt)
• [23] Reserved in the original layout; Tag bit 9 when 10-bit Tags are used (Gen 4 and later)
• [22:20] TC: Traffic Class, 0 = lowest, 7 = highest
• [19] Reserved in the original layout; Tag bit 8 when 10-bit Tags are used
• [18] Attr[2]: IDO (ID-Based Ordering)
• [17] LN: Lightweight Notification hint
• [16] TH: TLP Processing Hints (TPH) present
• [15] TD: TLP Digest, ECRC appended
• [14] EP: poisoned data
• [13:12] Attr[1:0]: RO (bit 13) and NS (bit 12)
• [11:10] AT: Address Type
• [9:0] Length: payload size in DW; 0x000 = 1024 DW = 4096 bytes (maximum)
Fmt encoding (determines header size and whether a payload follows):
000 → 3DW, no data (MRd 32-bit, Cpl)
001 → 4DW, no data (MRd 64-bit)
010 → 3DW, with data (MWr 32-bit, CplD)
011 → 4DW, with data (MWr 64-bit)
100 → TLP Prefix
Field meanings you’ll see most often:
• TC (Traffic Class): 3-bit priority (0 = lowest, 7 = highest). Determines which VC buffer the TLP enters; TC 0 always goes to VC0. Ordering rules apply within a single TC. Unchanged Gen 1–6.
• TD (TLP Digest): set to 1 when an ECRC field is appended after the payload. ECRC is end-to-end: it survives routing through Switches, unlike LCRC, which protects a single hop. Optional but recommended.
• Attr (Attributes): 3 bits, split across bit 18 and bits [13:12]. Attr[2] = IDO (ID-Based Ordering). Attr[1] = RO (Relaxed Ordering): the TLP may pass posted writes ahead of it. Attr[0] = NS (No Snoop): hardware cache coherency (snooping) is not required for this access. Used by GPUs and high-bandwidth DMA.
• Length (payload size): 10-bit field counted in DW. 0x001 = 1 DW = 4 bytes; 0x3FF = 1023 DW; 0x000 = 1024 DW = 4096 bytes (the maximum, because the field wraps). MRd, Cpl and Msg carry no payload; for MRd the Length field gives the amount of data requested.
Figure 2 — TLP header DW0. All TLP types share this first-DW layout (non-flit mode). Reserved bits must be transmitted as 0 and are ignored by receivers.
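As a sketch of how a receiver pulls these fields apart, the Python below decodes a 32-bit DW0. It uses the non-flit-mode DW0 bit positions from the PCIe Base Specification: Fmt in bits 31:29, Type in 28:24, TC in 22:20, and Attr split across bit 18 and bits 13:12. The names `Dw0` and `decode_dw0` are hypothetical. The sketch ignores TLP Prefixes and Flit Mode headers.

```python
from dataclasses import dataclass

FMT_NAMES = {
    0b000: "3DW header, no data",
    0b001: "4DW header, no data",
    0b010: "3DW header, with data",
    0b011: "4DW header, with data",
    0b100: "TLP Prefix",
}


@dataclass
class Dw0:
    fmt: int
    tlp_type: int
    tc: int
    attr: int        # Attr[2] = IDO, Attr[1] = RO, Attr[0] = NS
    ln: int
    th: int
    td: int
    ep: int
    at: int
    length_dw: int   # already decoded: a field value of 0 is reported as 1024

    @property
    def fmt_name(self) -> str:
        return FMT_NAMES.get(self.fmt, "reserved")

    @property
    def header_dws(self) -> int:
        # Fmt bit 0 selects a 4DW header (meaningless for a TLP Prefix)
        return 4 if self.fmt & 0b001 else 3

    @property
    def has_data(self) -> bool:
        return bool(self.fmt & 0b010)


def decode_dw0(dw0: int) -> Dw0:
    """Split a 32-bit DW0 (byte 0 in bits 31:24) into its fields."""
    def bits(hi: int, lo: int) -> int:
        return (dw0 >> lo) & ((1 << (hi - lo + 1)) - 1)

    raw_len = bits(9, 0)
    return Dw0(
        fmt=bits(31, 29),
        tlp_type=bits(28, 24),
        tc=bits(22, 20),
        attr=(bits(18, 18) << 2) | bits(13, 12),
        ln=bits(17, 17),
        th=bits(16, 16),
        td=bits(15, 15),
        ep=bits(14, 14),
        at=bits(11, 10),
        length_dw=raw_len if raw_len else 1024,
    )


# 32-bit MWr carrying 1 DW: Fmt=010, Type=00000, Length=1
mwr = decode_dw0(0x40000001)
# 64-bit MRd of 1024 DW with ECRC (TD=1) and Relaxed Ordering (Attr[1]) set
mrd = decode_dw0(0x2000A000)
print(mwr.fmt_name, mwr.length_dw, mrd.fmt_name, mrd.length_dw, mrd.attr)
```

A Length field of zero is decoded here as 1024 DW, matching the wrap-around rule for the maximum payload.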

📋 Complete TLP on the Wire

By the time a TLP reaches the physical link it has grown from what the Transaction Layer built. Each layer below adds its own fields:

Transaction Layer builds: Header (3 or 4 DW) | Data Payload (0–1024 DW, optional) | ECRC (optional, present when TD=1)
Data Link Layer adds: SeqNo | Header | Payload | ECRC | LCRC
Physical Layer adds (Gen 1/2 example): STP | SeqNo | Header | Payload | ECRC | LCRC | END
Gen 6 Flit Mode: there are no STP/END characters and no per-TLP SeqNo or LCRC. The TLP (Header + Payload + optional ECRC) is packed into the TLP bytes of one or more 256-byte flits, and each flit carries its own CRC and FEC (ECC) bytes.
Figure 3 — TLP assembly across all three layers. Row 1 = what the Transaction Layer builds. Row 2 = after the Data Link Layer adds its 12-bit Sequence Number (carried in a 2-byte field) and 32-bit LCRC. Row 3 = after the Physical Layer adds framing: STP/END symbols for Gen 1/2, framing tokens inside sync-headed blocks for Gen 3–5, and flit packing for Gen 6. The receiver strips these fields in reverse order.
LCRC vs ECRC: the key difference. LCRC (Link CRC) is calculated fresh at every hop. When a Switch receives a TLP, it checks the LCRC; when it forwards the TLP, it calculates a new LCRC for the outgoing link. ECRC (End-to-End CRC, present when TD = 1) is calculated by the original sender, carried unchanged through Switches, and checked by the final destination. ECRC can therefore detect errors that happen inside a Switch’s internal forwarding path. LCRC cannot catch these errors, because it is stripped and recalculated at the Switch boundary.
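To make the per-hop versus end-to-end distinction concrete, here is a toy model in Python. It uses `zlib.crc32` as a stand-in for both CRCs. Real LCRC and ECRC use PCIe-specific bit ordering and coverage, and the sequence number is left out here. All function names are hypothetical. A bit flip inside the Switch slips past the freshly recomputed LCRC, but the ECRC catches it.

```python
import zlib
from typing import Tuple

Packet = Tuple[bytes, int, int]   # (TLP header + payload, ECRC, LCRC)


def crc32(data: bytes) -> int:
    # Stand-in for both CRCs; not the PCIe bit-exact algorithm
    return zlib.crc32(data) & 0xFFFFFFFF


def link_crc(tlp: bytes, ecrc: int) -> int:
    # The LCRC covers everything the DLL forwards, here TLP + ECRC
    return crc32(tlp + ecrc.to_bytes(4, "big"))


def requester_send(tlp: bytes) -> Packet:
    ecrc = crc32(tlp)                   # computed once, end to end
    return tlp, ecrc, link_crc(tlp, ecrc)


def switch_forward(pkt: Packet, corrupt_inside: bool = False) -> Packet:
    tlp, ecrc, lcrc = pkt
    if link_crc(tlp, ecrc) != lcrc:     # ingress check on the first link
        raise ValueError("ingress LCRC error: NAK and replay on this link")
    if corrupt_inside:                  # bit flip in the Switch's internal path
        tlp = bytes([tlp[0] ^ 0x01]) + tlp[1:]
    return tlp, ecrc, link_crc(tlp, ecrc)   # fresh LCRC for the egress link


def completer_check(pkt: Packet) -> str:
    tlp, ecrc, lcrc = pkt
    if link_crc(tlp, ecrc) != lcrc:
        return "LCRC error"
    if crc32(tlp) != ecrc:
        return "ECRC error"
    return "ok"


pkt = requester_send(b"\x40\x00\x00\x01" + b"DATA")
print(completer_check(switch_forward(pkt)))                       # clean path
print(completer_check(switch_forward(pkt, corrupt_inside=True)))  # internal fault
```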

📋 Flow Control — Credit-Based

PCIe uses a credit-based flow control scheme. The receiver advertises its available buffer space to the transmitter as credits. The transmitter tracks those credits and only sends a TLP when enough credits are available. As the receiver processes TLPs and frees buffer space, it returns credits to the transmitter via FC Update DLLPs.

This is not a stop-and-wait scheme: the transmitter can send continuously as long as credits allow. There is no separate wait or ready signal, because the credit count itself is the back-pressure. If credits for a given type run out, the transmitter stalls TLPs of that type until an UpdateFC DLLP arrives.

Flow Control Credit Loop: Runtime Operation
Transmitter FC credit counters (example values): PH = 12 · PD = 48 · NPH = 4 · NPD = 4 · CPLH = ∞ · CPLD = ∞
Before sending each TLP:
① Check: credits available ≥ credits required for this TLP
② Yes → send the TLP and deduct the credits it consumes
③ No → stall TLPs of this type and wait for an UpdateFC DLLP
④ On UpdateFC: raise the credit limit to the newly advertised value (UpdateFC carries the receiver’s running total, not a delta), then retry the TLP
Receiver VC buffers: P-Hdr · P-Data · NP-Hdr · NP-Data
The device core processes TLPs. As a VC buffer drains, the freed space triggers an UpdateFC DLLP back to the transmitter.
Endpoint completion credits are advertised as ∞, because an Endpoint must always accept its completions.
Credit sizes:
• 1 header credit = space for 1 TLP header
• 1 data credit = 16 bytes (4 DW) of payload
• Header credit field: 8 bits (up to 127 advertised; 0 = infinite)
• Data credit field: 12 bits (up to 2047 advertised; 0 = infinite)
UpdateFC DLLPs can always be sent regardless of credit state. DLLPs are not subject to TLP flow control, which prevents deadlock.
Figure 4 — Flow control credit loop. Credits flow in the opposite direction to TLPs: TLPs consume credits, FC Update DLLPs return them. The transmitter’s counters track what the receiver has available — it never sends more than the receiver can hold.
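The credit check in step ① can be sketched with the modular comparison described in the PCIe Base Specification. The transmitter keeps a running CREDITS_CONSUMED count and a CREDIT_LIMIT taken from the latest InitFC or UpdateFC value, and both counters wrap at the field width. The `CreditGate` class is a hypothetical illustration for one credit type on one VC. It ignores scaled flow control.

```python
class CreditGate:
    """Transmitter-side gate for one credit type (e.g. PH or PD) on one VC."""

    def __init__(self, field_bits: int, advertised: int) -> None:
        self.mod = 1 << field_bits          # 8 bits for header, 12 for data
        self.infinite = advertised == 0     # 0 advertised at init = infinite
        self.credit_limit = advertised % self.mod
        self.credits_consumed = 0

    def can_send(self, required: int) -> bool:
        if self.infinite:
            return True
        # Modular comparison so the counters may wrap safely
        room = (self.credit_limit - (self.credits_consumed + required)) % self.mod
        return room <= self.mod // 2

    def consume(self, required: int) -> None:
        if not self.can_send(required):
            raise RuntimeError("not enough credits: stall until UpdateFC")
        self.credits_consumed = (self.credits_consumed + required) % self.mod

    def on_update_fc(self, new_limit: int) -> None:
        # UpdateFC carries the receiver's running total, not an increment
        if not self.infinite:
            self.credit_limit = new_limit % self.mod


ph = CreditGate(field_bits=8, advertised=12)   # PH = 12, as in Figure 4
for _ in range(12):
    ph.consume(1)                              # twelve posted headers sent
print(ph.can_send(1))                          # receiver's header buffers are full
ph.on_update_fc(15)                            # receiver freed three headers
print(ph.can_send(3), ph.can_send(4))
```

UpdateFC carries an absolute running total, so losing one UpdateFC DLLP does no lasting harm. The next one restores the correct limit.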

📋 The Six Credit Types

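Credits are tracked separately for three TLP categories: Posted, Non-Posted and Completion. Within each category, header and data credits are counted independently. That gives six pools in total, kept separately for each Virtual Channel.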

Abbrev. | Covers | 1 unit = | Infinite? | Why?
PH | Posted Header (MWr, Msg/MsgD headers) | 1 TLP header | Usually finite | The buffer must have space for the header before accepting the TLP
PD | Posted Data (MWr and MsgD payloads) | 16 bytes (4 DW) | Usually finite | The buffer must have space for the payload bytes
NPH | Non-Posted Header (MRd, IORd/IOWr, CfgRd/CfgWr, AtomicOp headers) | 1 TLP header | Usually finite | Each request header occupies the NP buffer until the receiver services it
NPD | Non-Posted Data (IOWr, CfgWr and AtomicOp payloads) | 16 bytes (4 DW) | Usually finite | Small pool: only IOWr, CfgWr and AtomicOp requests carry NP data
CPLH | Completion Header (Cpl, CplD headers) | 1 TLP header | Must be ∞ at Endpoints | Deadlock prevention: a device that sent a read request must always be able to receive its completion
CPLD | Completion Data (CplD payload) | 16 bytes (4 DW) | Must be ∞ at Endpoints | Same reason: completion data must always have space in the requester’s receive buffer
The deadlock scenario: why Endpoints advertise infinite CPLH/CPLD. Suppose an NVMe SSD has sent 64 read requests, and the RC now holds 64 completions waiting to be sent back. The SSD’s receive buffer, however, is full of posted writes. Those writes are waiting because the SSD’s core is busy, and the core is busy waiting for its read completions. If the RC needed completion credits to send CplD TLPs, no one could move and the whole system would freeze. PCIe prevents this by requiring Endpoints to advertise infinite CPLH/CPLD, so the RC can always send completions back regardless of the Endpoint’s state. This is safe because a requester only issues a read when it has buffer space set aside for the returning completion data.
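The unit sizes translate directly into how many credits each TLP consumes. One caution on units: the PCIe Base Specification defines a data FC unit as 4 DW (16 bytes), so the sketch below divides the payload by 16. The `POOL` table and `credits_needed` helper are hypothetical and cover only common TLP types.

```python
import math

DATA_FC_UNIT_BYTES = 16   # one data credit = 4 DW

# Header pool, data pool (None = no payload) for common TLP types
POOL = {
    "MWr": ("PH", "PD"),
    "Msg": ("PH", None),
    "MsgD": ("PH", "PD"),
    "MRd": ("NPH", None),
    "IORd": ("NPH", None),
    "IOWr": ("NPH", "NPD"),
    "CfgRd": ("NPH", None),
    "CfgWr": ("NPH", "NPD"),
    "FetchAdd": ("NPH", "NPD"),     # AtomicOps are non-posted with data
    "Cpl": ("CPLH", None),
    "CplD": ("CPLH", "CPLD"),
}


def credits_needed(kind: str, payload_bytes: int = 0) -> dict:
    """Credits a single TLP consumes: 1 header credit plus ceil(payload / 16)."""
    hdr_pool, data_pool = POOL[kind]
    need = {hdr_pool: 1}
    if data_pool is not None:
        need[data_pool] = math.ceil(payload_bytes / DATA_FC_UNIT_BYTES)
    return need


print(credits_needed("MWr", 256))    # a 256-byte DMA write
print(credits_needed("MRd"))         # a read request carries no payload
print(credits_needed("CfgWr", 4))    # 1 DW of config data still costs a full data credit
```

With PD = 48, as in the Figure 4 example, the receiver has room for the payloads of three such 256-byte writes.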

📋 Credit Initialisation — The DLCMSM

Credits are exchanged before any TLP can flow. This happens automatically in hardware after Physical Layer link training completes. The Data Link Control and Management State Machine (DLCMSM) runs the process.

DLCMSM: Data Link Control & Management State Machine
• DL_Inactive: DL_Down is reported to the upper layers while the port waits for the Physical Layer. It exits when the LTSSM reports LinkUp = 1.
• DL_Init, sub-state FC_Init1: both sides send InitFC1 DLLPs in the order P → NP → Cpl. They repeat the sequence until they have received the other side’s values.
• DL_Init, sub-state FC_Init2: both sides send InitFC2 DLLPs, confirming that the values were received. FC initialisation is then complete.
• DL_Active: DL_Up is reported, and TLPs may now flow.
InitFC1 DLLP: 6 bytes (4 bytes of content + 2-byte CRC), which becomes 8 bytes on the wire once framing is added.
Layout: Type (4 bits) · 0 · VC ID [2:0] · HdrFC [7:0] · DataFC [11:0] · CRC16
Type field values for FC DLLPs (upper nibble of byte 0):
0100 = InitFC1-P · 0101 = InitFC1-NP · 0110 = InitFC1-Cpl
1100 = InitFC2-P · 1101 = InitFC2-NP · 1110 = InitFC2-Cpl
1000 = UpdateFC-P · 1001 = UpdateFC-NP · 1010 = UpdateFC-Cpl
HdrFC = header credit value · DataFC = data credit value · 0 in HdrFC or DataFC at initialisation = infinite credits
Figure 5 — DLCMSM states and FC initialisation. The DLCMSM starts in DL_Inactive after reset. Once the Physical Layer’s LTSSM reports LinkUp, it enters DL_Init, starting in the FC_Init1 sub-state. Both sides simultaneously advertise their buffer sizes using InitFC1 DLLPs. After receiving the other side’s values, each side sends InitFC2 and then moves to DL_Active. At that point TLPs can flow for the first time.
InitFC1 DLLPs are sent repeatedly, not just once, because a single DLLP could be corrupted during the noisy first moments after link training. The spec requires a port to keep sending its InitFC1 sequence (P, NP, Cpl) until it has received the partner’s values for all three credit types. Only then does it move to FC_Init2.
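To show what actually travels during FC_Init1 and FC_Init2, the sketch below packs the 4 content bytes of an FC DLLP. The type nibble and VC ID go in byte 0, HdrFC in bits 21:14 of the 32-bit word, and DataFC in bits 11:0. The sketch leaves the two reserved bit pairs (later revisions use them as scale fields) at zero and does not compute the 16-bit CRC. `fc_dllp_body` is a hypothetical helper.

```python
FC_TYPE_NIBBLE = {
    ("InitFC1", "P"): 0b0100, ("InitFC1", "NP"): 0b0101, ("InitFC1", "Cpl"): 0b0110,
    ("InitFC2", "P"): 0b1100, ("InitFC2", "NP"): 0b1101, ("InitFC2", "Cpl"): 0b1110,
    ("UpdateFC", "P"): 0b1000, ("UpdateFC", "NP"): 0b1001, ("UpdateFC", "Cpl"): 0b1010,
}


def fc_dllp_body(kind: str, credit_type: str, vc: int, hdr_fc: int, data_fc: int) -> bytes:
    """Pack the 4 content bytes of an FC DLLP (the trailing CRC16 is not computed)."""
    if not (0 <= vc <= 7 and 0 <= hdr_fc <= 0xFF and 0 <= data_fc <= 0xFFF):
        raise ValueError("VC ID, HdrFC or DataFC out of range")
    word = FC_TYPE_NIBBLE[(kind, credit_type)] << 28   # byte 0, bits 7:4
    word |= vc << 24                                   # byte 0, bits 2:0 (bit 3 = 0)
    word |= hdr_fc << 14                               # HdrFC in word bits 21:14
    word |= data_fc                                    # DataFC in word bits 11:0
    return word.to_bytes(4, "big")


# Posted credits from the Figure 4 example: PH = 12, PD = 48, advertised on VC0
init_p = fc_dllp_body("InitFC1", "P", vc=0, hdr_fc=12, data_fc=48)
print(init_p.hex())
```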

📋 Virtual Channels and Traffic Classes

The Traffic Class (TC) field in the TLP header is a 3-bit priority selector. The hardware maps TCs to Virtual Channels (VCs) — physical buffer partitions — and uses an arbiter to decide which VC’s packets are sent next.

TC → VC Mapping and Arbitration
TLP TC field: TC 7 (highest), TC 6, TC 5 … TC 1, TC 0 (default). TC 0 always maps to VC0, and this cannot be changed.
VC transmit buffers (example mapping):
• VC2, high priority: receives TC 7 and TC 6
• VC1, medium priority: receives TC 5 … TC 1
• VC0, default (mandatory): always receives TC 0
VC arbiter: strict priority or weighted round-robin, software-configurable, feeding the link.
Key rules:
• VC0 is the only mandatory VC
• TC 0 must always map to VC0
• TC 1–7 can map to any enabled VC
• Ordering is enforced within each TC
• Ordering across different TCs is undefined
• VC1–7 are initialised only when software enables them
• Most real systems use only VC0; additional VCs appear mainly in QoS-sensitive designs such as storage and video
• Gen 6: same mapping, unchanged
Example: camera stream → TC 7 → VC2 (guaranteed bandwidth) · background DMA → TC 0 → VC0 (best effort)
Figure 6 — TC→VC mapping. The TC field in the TLP header selects a priority level. Hardware maps TCs to VC buffers according to a software-programmed TC-VC map. The VC arbiter selects which VC sends next — higher VCs can be given strict priority to guarantee bandwidth for time-sensitive traffic.
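A TC-to-VC map is just a per-VC bitmask of Traffic Classes. The sketch models it loosely on the TC/VC Map field that software programs for each VC in the VC capability. It builds the lookup, enforces the TC0 → VC0 rule, and picks the next VC with a plain strict-priority arbiter. The function names are hypothetical, and real arbiters also support weighted round-robin.

```python
from typing import Dict, List, Optional


def build_tc_to_vc(vc_tc_masks: Dict[int, int]) -> Dict[int, int]:
    """vc_tc_masks maps VC ID -> 8-bit TC mask (bit n set = TC n uses this VC)."""
    tc_to_vc: Dict[int, int] = {}
    for vc, mask in vc_tc_masks.items():
        for tc in range(8):
            if (mask >> tc) & 1:
                if tc in tc_to_vc:
                    raise ValueError(f"TC{tc} mapped to more than one VC")
                tc_to_vc[tc] = vc
    if tc_to_vc.get(0) != 0:
        raise ValueError("TC0 must map to VC0")
    return tc_to_vc


def strict_priority_pick(queues: Dict[int, List[str]]) -> Optional[int]:
    """Return the highest-numbered VC that has a TLP waiting, or None."""
    ready = [vc for vc, q in queues.items() if q]
    return max(ready) if ready else None


# Mapping from Figure 6: VC0 <- TC0, VC1 <- TC1..TC5, VC2 <- TC6..TC7
tc_map = build_tc_to_vc({0: 0b0000_0001, 1: 0b0011_1110, 2: 0b1100_0000})
queues = {0: ["background DMA"], 1: [], 2: ["camera frame"]}
print(tc_map[7], strict_priority_pick(queues))
```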

📋 Transaction Ordering Rules

Within a Virtual Channel, TLPs normally exit in order. The ordering rules define the exceptions — when a TLP waiting behind another may “pass” it. These rules exist to prevent deadlocks and to match the ordering guarantees PCI software already depends on.

New TLP (waiting) \ TLP ahead in queue | Posted (MWr, Msg) | Non-Posted Read | Non-Posted with data (IOWr, CfgWr, AtomicOp) | Completion
Posted (MWr, Msg) | No* | Yes | Yes | Yes/No
Non-Posted Read | No* | Yes/No | Yes/No | Yes/No
Non-Posted with data | No* | Yes/No | Yes/No | Yes/No
Completion | No* | Yes | Yes | Yes/No**
Yes = must be allowed to pass (required to avoid deadlock) · No = must not pass · Yes/No = permitted to pass, but not required
* Default rule; the Relaxed Ordering and ID-Based Ordering attributes relax some of these entries.
** Completions with the same Transaction ID must not pass each other.

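The single most important rule: posted writes must not pass posted writes. Memory write ordering is the foundation of DMA correctness. If a CPU writes a “data ready” flag after writing the data, the flag write must arrive after the data write. The same reasoning is why reads and completions must not pass posted writes either: a completion that reports “done” must not overtake the data writes it describes.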

When set, the Relaxed Ordering (RO) attribute bit in the TLP header relaxes specific entries of these rules. A posted write or completion with RO = 1 is permitted, though not required, to pass posted writes ahead of it. GPUs and high-bandwidth DMA engines use this to increase throughput when out-of-order delivery is safe for that particular transfer type.
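The passing rules can be encoded as a lookup table. This Python sketch follows the default entries (RO and IDO clear) of the PCIe ordering table. In it, `must_not` means must not pass, `must` means must be allowed to pass, and `may` means permitted but not required. The RO handling is a simplification: it only covers a posted write or completion with RO set passing a posted write. ID-Based Ordering and the same-Transaction-ID completion rule are left out. All names are hypothetical.

```python
POSTED, NP_READ, NP_DATA, CPL = "posted", "np_read", "np_with_data", "completion"

# PASS_TABLE[later][earlier]: may the later TLP pass the earlier one?
#   "must_not" = must not pass
#   "must"     = must be allowed to pass (needed to avoid deadlock)
#   "may"      = permitted but not required
PASS_TABLE = {
    POSTED:  {POSTED: "must_not", NP_READ: "must", NP_DATA: "must", CPL: "may"},
    NP_READ: {POSTED: "must_not", NP_READ: "may",  NP_DATA: "may",  CPL: "may"},
    NP_DATA: {POSTED: "must_not", NP_READ: "may",  NP_DATA: "may",  CPL: "may"},
    CPL:     {POSTED: "must_not", NP_READ: "must", NP_DATA: "must", CPL: "may"},
}


def may_pass(later: str, earlier: str, relaxed_ordering: bool = False) -> str:
    """Look up the passing rule; RO handling is a deliberate simplification."""
    if relaxed_ordering and earlier == POSTED and later in (POSTED, CPL):
        return "may"   # RO on a posted write or completion relaxes "must_not"
    return PASS_TABLE[later][earlier]


print(may_pass(POSTED, POSTED))                         # DMA write ordering
print(may_pass(POSTED, POSTED, relaxed_ordering=True))  # RO set on the later write
print(may_pass(NP_READ, POSTED))                        # a read must not overtake a write
```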

📋 Data Link Layer — ACK/NAK State Machine

The Data Link Layer’s core job is to make the unreliable physical link look reliable to the Transaction Layer above. It does this with Sequence Numbers, LCRC, and the ACK/NAK retry protocol. The whole thing is hardware — no software is involved, and the Transaction Layer never knows it happened.

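ACK/NAK: Transmit-Side State Machine
• IDLE: no unacknowledged TLPs in the replay buffer. A TLP arriving from the Transaction Layer moves the machine to SENDING.
• SENDING: assign the next Sequence Number, append the LCRC, copy the TLP into the replay buffer and hand it to the Physical Layer. Once the TLP is sent, move to WAITING.
• WAITING: wait for ACK/NAK with the replay timer running. More TLPs may be sent in the meantime.
• ACK(N) received: purge every TLP with Sequence Number ≤ N from the replay buffer. When nothing is left unacknowledged, return to IDLE.
• NAK(N) or replay-timer timeout: a NAK also acknowledges every TLP ≤ N. Replay all remaining unacknowledged TLPs, oldest first, by re-entering SENDING.
• Replay limit: the 2-bit REPLAY_NUM counter rolls over when a fourth consecutive replay is needed without forward progress. The Data Link Layer then directs the LTSSM into Recovery to retrain the link before replaying.
Sequence Number rules:
• 12 bits wide (0–4095), wrapping to 0 after 4095
• ACK(N) is cumulative: purge everything ≤ N from the replay buffer
• NAK(N): N is the last TLP received correctly, so replay from N + 1 onwards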
Figure 7 — ACK/NAK transmit state machine. Every sent TLP awaits acknowledgement. An ACK purges the replay buffer up to its sequence number (cumulatively). A NAK or a replay-timer timeout triggers replay of all unacknowledged TLPs. If replays keep failing, REPLAY_NUM rolls over and the Data Link Layer has the LTSSM retrain the link through Recovery.
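The figure covers the transmitter. The receiver's half of the protocol is small enough to sketch, assuming the non-flit-mode rules:
• A TLP whose Sequence Number equals NEXT_RCV_SEQ is accepted.
• A TLP that is "behind" by at most half the 12-bit space is a duplicate from a replay. It is dropped and answered with an ACK.
• Anything else means a TLP was lost.

ACK and NAK both carry the last Sequence Number received correctly. `DllReceiver` is hypothetical. It returns one ACK/NAK per TLP instead of coalescing them, and it omits the NAK_SCHEDULED flag that suppresses repeated NAKs.

```python
from typing import List, Tuple

SEQ_MOD = 4096  # 12-bit Sequence Numbers wrap after 4095


class DllReceiver:
    def __init__(self) -> None:
        self.next_rcv_seq = 0
        self.delivered: List[str] = []   # TLPs passed up to the Transaction Layer

    def on_tlp(self, seq: int, lcrc_ok: bool, tlp: str) -> Tuple[str, int]:
        last_good = (self.next_rcv_seq - 1) % SEQ_MOD
        if not lcrc_ok:
            return ("NAK", last_good)        # corrupted: discard, request replay
        if seq == self.next_rcv_seq:
            self.delivered.append(tlp)       # in order: accept and acknowledge
            self.next_rcv_seq = (seq + 1) % SEQ_MOD
            return ("ACK", seq)
        if (self.next_rcv_seq - seq) % SEQ_MOD <= SEQ_MOD // 2:
            return ("ACK", last_good)        # duplicate from a replay: discard
        return ("NAK", last_good)            # a TLP went missing: discard


rx = DllReceiver()
print(rx.on_tlp(0, True, "T0"))    # accepted
print(rx.on_tlp(1, False, "T1"))   # LCRC error
print(rx.on_tlp(2, True, "T2"))    # arrives after the lost T1
print(rx.on_tlp(1, True, "T1"))    # replayed T1 accepted
```

Before anything has been received, the "last good" value is 4095. A NAK(4095) therefore tells the transmitter to replay from Sequence Number 0.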

📋 Replay Buffer Mechanics

The replay buffer is the insurance policy of the Data Link Layer. Every TLP that leaves the transmitter’s Data Link Layer is copied into the replay buffer before being handed to the Physical Layer. It stays there until an ACK DLLP arrives confirming safe delivery at the neighbour.

Replay Buffer: contents at one point in time (transmitter side)
• TLP SeqN=0: ACKed
• TLP SeqN=1: ACKed (ACK(1) received → purge SeqN 0 and 1)
• TLP SeqN=2: awaiting ACK
• TLP SeqN=3: awaiting ACK
• (free space)
SeqN 2 and 3 stay in the buffer until ACK(2) and ACK(3), or a single ACK(3), arrive.
Replay buffer rules:
• Holds a copy of every sent TLP until it is acknowledged
• ACK(N) is cumulative: it purges everything ≤ N in one go
• On NAK(N): TLPs ≤ N are acknowledged. Replay everything from N + 1, not just the one bad TLP, because the receiver discards every TLP that arrives after the one it missed.
• Minimum useful size: enough to cover one ACK round trip at full rate, so far more bytes are needed at 64 GT/s than at 2.5 GT/s
• Gen 6 Flit Mode: replay operates on whole flits
Figure 8 — Replay buffer contents. Sequence numbers 0 and 1 have been ACKed and can be freed. Numbers 2 and 3 are still in the buffer awaiting acknowledgement. ACK(N) means “flush everything up to and including N” — it takes only one ACK DLLP to free multiple TLPs.
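Here is a minimal Python model of the transmitter's replay buffer, assuming the non-flit-mode rules. ACK(N) and NAK(N) both acknowledge everything up to N, because the N in a NAK is the last TLP received correctly. A NAK therefore replays from N + 1. The 2-bit `replay_num` mirrors the REPLAY_NUM counter: it resets on forward progress and requests link retraining when it rolls over. The class and attribute names are hypothetical.

```python
from collections import OrderedDict
from typing import List

SEQ_MOD = 4096   # 12-bit Sequence Numbers


class ReplayBuffer:
    def __init__(self) -> None:
        self.next_transmit_seq = 0
        self.unacked: "OrderedDict[int, str]" = OrderedDict()  # oldest first
        self.replay_num = 0            # models the 2-bit REPLAY_NUM counter
        self.retrain_requested = False

    def send(self, tlp: str) -> int:
        seq = self.next_transmit_seq
        self.unacked[seq] = tlp        # keep a copy until acknowledged
        self.next_transmit_seq = (seq + 1) % SEQ_MOD
        return seq

    def _purge_through(self, n: int) -> bool:
        """Drop stored TLPs up to and including n; True if any were dropped."""
        progressed = False
        while self.unacked:
            oldest = next(iter(self.unacked))
            if (n - oldest) % SEQ_MOD >= SEQ_MOD // 2:
                break                  # oldest is newer than n: stop
            self.unacked.popitem(last=False)
            progressed = True
        return progressed

    def on_ack(self, n: int) -> None:
        if self._purge_through(n):
            self.replay_num = 0        # forward progress resets the counter

    def on_nak(self, n: int) -> List[str]:
        self.on_ack(n)                 # n = last TLP the receiver got right
        return self.replay()

    def replay(self) -> List[str]:
        """On NAK or replay-timer expiry: resend everything still stored."""
        self.replay_num = (self.replay_num + 1) % 4
        if self.replay_num == 0:       # rolled over from 3 to 0
            self.retrain_requested = True   # LTSSM -> Recovery, then replay
        return list(self.unacked.values())


rb = ReplayBuffer()
for tlp in ["T0", "T1", "T2", "T3"]:
    rb.send(tlp)
rb.on_ack(1)                 # as in Figure 8: SeqN 0 and 1 purged
print(list(rb.unacked))      # SeqN 2 and 3 still held
print(rb.on_nak(2))          # 2 acknowledged, replay starts at 3
```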

📋 DLLP Types

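DLLPs are 6 bytes: 4 bytes of content plus a 16-bit CRC. In non-flit mode, framing brings them to 8 bytes on the wire: SDP and END symbols in Gen 1/2, or a 2-byte SDP token in Gen 3–5. DLLPs are created and consumed only within the Data Link Layer. The Transaction Layer never sees them, and Switches never route them.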

DLLP Type | Direction | Purpose
Ack | TLP receiver → TLP transmitter | Cumulative acknowledgement up to SeqNum N; the transmitter may purge its replay buffer ≤ N
Nak | TLP receiver → TLP transmitter | Error report. N is the last TLP received correctly, so the transmitter purges ≤ N and replays from N + 1 onwards
UpdateFC-P | TLP receiver → TLP transmitter | Returns Posted (PH/PD) flow control credits as the receiver drains its buffers
UpdateFC-NP | TLP receiver → TLP transmitter | Returns Non-Posted (NPH/NPD) flow control credits
UpdateFC-Cpl | TLP receiver → TLP transmitter | Returns Completion (CPLH/CPLD) credits; not needed when infinite completion credits were advertised, as at Endpoints
InitFC1-P/NP/Cpl | Both (simultaneous) | Advertise initial buffer sizes during FC initialisation (sent in order: P → NP → Cpl)
InitFC2-P/NP/Cpl | Both (simultaneous) | Confirm receipt of the partner’s InitFC1 values; leads the link to DL_Active
PM_Enter_L1 | Downstream → Upstream | Request L1 entry under software (PCI-PM) control; ASPM L1 entry uses PM_Active_State_Request_L1 instead
PM_Enter_L23 | Downstream → Upstream | Request entry into the L2/L3 Ready state
PM_Request_Ack | Upstream → Downstream | Acknowledge the power-management request from the downstream device
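When debugging a trace, it helps to decode the DLLP type byte. This sketch covers only the encodings listed above, plus PM_Active_State_Request_L1 and the vendor-specific type. Every other value is reported as not covered. The helper name is hypothetical.

```python
FIXED_TYPES = {
    0x00: "Ack",
    0x10: "Nak",
    0x20: "PM_Enter_L1",
    0x21: "PM_Enter_L23",
    0x23: "PM_Active_State_Request_L1",
    0x24: "PM_Request_Ack",
    0x30: "Vendor-specific",
}

FC_TYPES = {  # upper nibble of byte 0; bit 3 is 0 and bits 2:0 carry the VC ID
    0b0100: "InitFC1-P", 0b0101: "InitFC1-NP", 0b0110: "InitFC1-Cpl",
    0b1100: "InitFC2-P", 0b1101: "InitFC2-NP", 0b1110: "InitFC2-Cpl",
    0b1000: "UpdateFC-P", 0b1001: "UpdateFC-NP", 0b1010: "UpdateFC-Cpl",
}


def dllp_type(byte0: int) -> str:
    """Name the DLLP type encoded in the first content byte."""
    if byte0 in FIXED_TYPES:
        return FIXED_TYPES[byte0]
    nibble, vc = byte0 >> 4, byte0 & 0x07
    if nibble in FC_TYPES and (byte0 & 0x08) == 0:
        return f"{FC_TYPES[nibble]} (VC{vc})"
    return "not covered by this sketch"


for b in (0x00, 0x10, 0x40, 0x91, 0xE0):
    print(f"0x{b:02X} -> {dllp_type(b)}")
```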

📋 Physical Layer Gen 1/2 — 8b/10b Logical Sub-block

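Gen 1 (2.5 GT/s) and Gen 2 (5.0 GT/s) use 8b/10b encoding. Every 8-bit byte is sent as a 10-bit symbol, one of two variants chosen by the running disparity. That costs 20% of the line rate, but it delivers three properties the link needs: DC balance, enough transitions for clock recovery (CDR), and special K-characters for packet framing.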

Gen 1/2 Transmit Pipeline (8b/10b path)
DLL output (TLP/DLLP with SeqNo and LCRC) → Add framing: STP (start TLP), SDP (start DLLP), END (end packet) → Byte stripe: bytes split across N lanes (x1, x4, x16…) → Scramble (per lane): an LFSR breaks up repetitive patterns to reduce EMI; applied to data characters only → 8b/10b encode: 8 bits → 10 bits, 20% overhead, DC balance and CDR lock → Serialise + differential TX: 2.5 GT/s (Gen 1) or 5.0 GT/s (Gen 2), NRZ with 2 voltage levels
Receive path (reverse order, plus clock compensation): CDR → deserialise → elastic buffer (clock tolerance) → 8b/10b decode → descramble → de-stripe → strip STP/END → pass TLP to DLL
The elastic buffer compensates for the clock-frequency difference between the two connected devices. Each reference clock may deviate by ±300 ppm, so the two ends can differ by up to 600 ppm.
K-characters: special 10-bit symbols distinct from every data symbol, used for STP/SDP/END framing and in ordered sets such as COM, SKP and IDLE
Figure 9 — Gen 1/2 transmit pipeline. The 20% encoding overhead is real cost, but 8b/10b provides guaranteed DC balance and sufficient transition density for CDR lock — both essential properties for a high-speed serial link. K-characters enable unambiguous packet framing.
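As a rough illustration of the scrambler stage, here is an additive LFSR scrambler. It uses the Gen 1/2 polynomial G(X) = X^16 + X^5 + X^4 + X^3 + 1 seeded with 0xFFFF. This is a sketch, not a bit-exact model. The bit ordering is an assumption, and the real rules (COM resets the LFSR, SKP symbols do not advance it) are reduced to two simplifications. A `reset()` method stands in for the COM reset. An `is_k` flag passes K-characters through unscrambled while still advancing the LFSR. Because scrambling is an XOR with a keystream, the receiver recovers the data by running an identical LFSR.

```python
class Scrambler:
    """Additive (keystream XOR) scrambler with G(X) = X^16 + X^5 + X^4 + X^3 + 1."""

    TAPS = 0x0039  # X^5 + X^4 + X^3 + 1; the X^16 term is the bit shifted out

    def __init__(self) -> None:
        self.reset()

    def reset(self) -> None:
        self.state = 0xFFFF          # seed value; real hardware re-seeds on COM

    def _next_bit(self) -> int:
        out = (self.state >> 15) & 1             # Galois-form LFSR step
        self.state = (self.state << 1) & 0xFFFF
        if out:
            self.state ^= self.TAPS
        return out

    def process(self, byte: int, is_k: bool = False) -> int:
        keystream = 0
        for i in range(8):                       # LFSR advances for every symbol
            keystream |= self._next_bit() << i   # LSB-first, an assumption
        return byte if is_k else byte ^ keystream  # K-characters pass untouched


tx, rx = Scrambler(), Scrambler()
symbols = [(0x00, False), (0x00, False), (0xFF, False), (0xBC, True)]
wire = [tx.process(b, k) for b, k in symbols]
recovered = [rx.process(w, k) for w, (_, k) in zip(wire, symbols)]
print(wire, recovered)
```

Runs of identical data bytes (the two 0x00 bytes here) leave the transmitter as varying patterns, which is what spreads the EMI spectrum.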

📋 Physical Layer Gen 3–5 — 128b/130b Logical Sub-block

Gen 3 (8 GT/s), Gen 4 (16 GT/s) and Gen 5 (32 GT/s) all use 128b/130b encoding. The headline improvement is about 1.5% overhead (2/130) instead of 20%. This is why Gen 3 at 8 GT/s delivers almost the same effective throughput per lane (about 7.88 Gb/s) as an 8b/10b link at 10 GT/s would have (8 Gb/s). That is roughly double Gen 2's 4 Gb/s, without the 8b/10b encoding tax.
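The arithmetic behind this comparison fits in a few lines. The sketch multiplies the raw transfer rate by the line-code efficiency (8/10 or 128/130). It ignores all packet framing and protocol overhead, so real throughput is lower. The names `GENS` and `lane_gbps` are illustrative.

```python
from fractions import Fraction

# (raw rate in GT/s per lane, payload bits, coded bits) for each line code
GENS = {
    "Gen 1": (Fraction(5, 2), 8, 10),
    "Gen 2": (Fraction(5), 8, 10),
    "Gen 3": (Fraction(8), 128, 130),
    "Gen 4": (Fraction(16), 128, 130),
    "Gen 5": (Fraction(32), 128, 130),
}


def lane_gbps(gt_per_s: Fraction, payload_bits: int, coded_bits: int) -> Fraction:
    """Usable Gb/s per lane after line coding only."""
    return gt_per_s * payload_bits / coded_bits


for gen, params in GENS.items():
    per_lane = lane_gbps(*params)
    x16_gbytes = per_lane * 16 / 8          # 16 lanes, 8 bits per byte
    print(f"{gen}: {float(per_lane):.3f} Gb/s per lane, "
          f"{float(x16_gbytes):.2f} GB/s per direction on x16")

# The hypothetical 8b/10b link at 10 GT/s from the comparison above
print(f"8b/10b at 10 GT/s: {float(lane_gbps(Fraction(10), 8, 10)):.3f} Gb/s per lane")
```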

128b/130b Block: Gen 3, 4, 5
[Sync Hdr: 2 bits][128 bits of scrambled data]
Total = 130 bits per block · Overhead = 2/130 ≈ 1.5% (vs 20% for 8b/10b), so 128b/130b recaptures most of that bandwidth
Sync header (2 bits):
• 01 = Data Block: the block carries scrambled TLP/DLLP bytes, framed by tokens (STP, SDP, IDL, EDB, EDS) rather than K-characters
• 10 = Ordered Set Block: the block carries an ordered set (SKP, EIEOS, TS1/TS2…)
Other differences from Gen 1/2:
• No K-characters: framing uses the sync header plus framing tokens
• Block lock: the receiver must find the sync-header boundary before data can be decoded, which is more complex than 8b/10b symbol alignment
• Scrambling is mandatory and uses a longer (23-bit) LFSR than Gen 1/2
• Link equalization added in Gen 3: TX FIR filter coefficients are negotiated during the LTSSM’s Recovery.Equalization phase
• Gen 4/5: same encoding at higher rates (16 GT/s and 32 GT/s), still NRZ
Figure 10 — 128b/130b block structure: a 2-bit sync header plus 128 bits of payload, 130 bits in total. The sync header bits (01 or 10) tell the receiver whether the block carries data or an ordered set. Together with framing tokens inside data blocks, the sync header replaces the K-character framing of 8b/10b. Scrambling is mandatory to ensure adequate CDR transitions, because 128b/130b has no inherent DC-balance mechanism of its own.

📋 Physical Layer Gen 6 — PAM4 + FEC + Flit

Gen 6 (64 GT/s) is the biggest Physical Layer change since 8b/10b → 128b/130b in Gen 3. Three mechanisms work together — none of them can be used without the others at this speed.

Gen 6: Three Interdependent Mechanisms
① PAM4: 2 bits per symbol
• NRZ (Gen 5): 2 levels, 1 bit per symbol · PAM4 (Gen 6): 4 levels (00, 01, 11, 10, Gray-coded), 2 bits per symbol
• Same 32 GBaud symbol rate as Gen 5 NRZ → 64 GT/s effective bit rate
• Each eye is about 1/3 the height of an NRZ eye → the raw error rate rises sharply → FEC becomes mandatory
② FEC: Forward Error Correction
• A lightweight, low-latency FEC: 6 ECC bytes per flit, organised as three interleaved codes that can each correct one symbol (byte) error
• Corrects errors in hardware before the Data Link Layer sees the data
• Errors the FEC cannot correct are caught by the 8-byte flit CRC and recovered by flit replay
• Raw PAM4 error rate: ~10⁻⁶ to 10⁻⁸
③ Flit: 256-byte fixed frame
• Replaces STP/END and sync-header framing
• 236 bytes TLP + 6 bytes DLP (Data Link Layer payload, carrying flit sequencing information and DLLPs) + 8 bytes CRC + 6 bytes ECC = 256 bytes
• Multiple TLPs per flit; large TLPs span multiple flits; DLLPs travel in the DLP bytes
• ACK/NAK replay operates at flit granularity in Gen 6
• Why fixed size? FEC and CRC work efficiently on fixed-size blocks.
Figure 11 — Gen 6’s three interdependent mechanisms. PAM4 doubles the bit rate at the same symbol rate but shrinks voltage margins. FEC corrects most of the resulting errors in hardware, and the flit CRC plus replay catches the rest. Flit framing provides fixed-size blocks that make FEC and CRC coverage efficient and clean. At this speed you can’t have one without the others.
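The flit byte budget and the PAM4 symbol mapping can both be checked with a few lines of Python. The byte split (236 TLP, 6 DLP, 8 CRC, 6 ECC) follows the PCIe 6.0 Flit Mode layout. The Gray-coded level assignment and MSB-first bit pairing are assumptions for illustration, and `pam4_levels` is a hypothetical helper.

```python
# PCIe 6.0 Flit Mode: byte budget of one 256-byte flit
FLIT_LAYOUT = {"TLP": 236, "DLP": 6, "CRC": 8, "ECC": 6}
FLIT_BYTES = 256
assert sum(FLIT_LAYOUT.values()) == FLIT_BYTES

tlp_share = FLIT_LAYOUT["TLP"] / FLIT_BYTES          # fraction of flit for TLPs

# PAM4 with Gray coding (level assignment assumed): neighbouring voltage
# levels differ in exactly one bit, so the most likely slicing error
# (landing one level off) corrupts only one bit.
GRAY_PAM4 = {0b00: 0, 0b01: 1, 0b11: 2, 0b10: 3}     # bit pair -> level, 0 = lowest


def pam4_levels(data: bytes) -> list:
    """Map bytes to PAM4 levels, two bits per symbol, MSB-first (assumed)."""
    levels = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            levels.append(GRAY_PAM4[(byte >> shift) & 0b11])
    return levels


SYMBOL_RATE_GBAUD = 32                          # same symbol rate as Gen 5 NRZ
BIT_RATE_GT_S = SYMBOL_RATE_GBAUD * 2           # 2 bits per PAM4 symbol

print(f"TLP share of each flit: {tlp_share:.1%}")
print(pam4_levels(b"\x1b"), BIT_RATE_GT_S)
```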
What stays the same in Gen 6 from the upper layers’ perspective. The transaction model is unchanged. It uses the same request and completion types, the same six flow control credit types, the same TC/VC model and ordering rules, and the same ACK/NAK replay concept in the Data Link Layer. A device driver or firmware written for Gen 3 hardware works on Gen 6 without modification, because configuration space and the software model are unchanged. What does change lies below the transaction model. Flit Mode encodes TLP headers in a new format and replaces the per-TLP Sequence Number and LCRC with flit-level sequencing, CRC and FEC. The Data Link and Physical Layer hardware absorbs this change, not software.

📋 What Changes Between Generations

Layer / Feature | Gen 1/2 | Gen 3–5 | Gen 6
Transaction Layer | TLP formats, FC credit types, VC/TC model, ordering rules | Same | Same transaction model and credit types; Flit Mode uses a new TLP header format
Data Link Layer | ACK/NAK, 12-bit SeqNo, LCRC, replay buffer, DLLP types | Same | ACK/NAK replay at flit granularity; flit CRC replaces LCRC; DLLPs carried in the flit’s DLP bytes
PL: Encoding | 8b/10b (20% overhead) | 128b/130b (~1.5% overhead) | Flit Mode, no 8b/10b or 128b/130b block encoding
PL: Framing | STP / SDP / END K-characters | Sync header bits (01 or 10) plus framing tokens | 256-byte flits
PL: Error correction | Detection only (LCRC) | Detection only (LCRC) | Mandatory lightweight FEC per flit + 8-byte flit CRC
PL: Equalization | Fixed TX de-emphasis | Explicit TX FIR coefficient negotiation (LTSSM) | Advanced multi-tap FIR + DSP
PL: Modulation | NRZ, 2 voltage levels | NRZ, 2 voltage levels | PAM4, 4 voltage levels
Software view | Unchanged across all generations: same BDF, config registers, memory map and driver model

📋 Quick Reference

Concept | Key Point
Transaction Layer role | Build and decode TLPs; check FC credits; enforce ordering; manage VC/TC priority
TLP DW0 | Fmt (3/4 DW, data or not) + Type (MRd/MWr/Cpl…) + TC (priority 0–7) + Attr (IDO/RO/NS) + LN + TH + TD (ECRC present) + EP (poisoned) + AT + Length
ECRC vs LCRC | LCRC: per hop, mandatory, recalculated at each Switch. ECRC: end to end, optional, survives routing, detects intra-Switch errors.
Six credit types | PH · PD · NPH · NPD (normally finite) + CPLH · CPLD (infinite at Endpoints, for deadlock prevention)
1 data credit | 16 bytes (4 DW) of payload space in the receiver’s VC buffer
FC initialisation | DLCMSM: DL_Inactive → DL_Init (FC_Init1: both sides advertise → FC_Init2: both confirm) → DL_Active → TLPs can flow
TC/VC | TC is a priority label in the TLP; VC is a physical buffer. TC 0 always maps to VC0. Ordering rules apply within a TC.
Ordering rule #1 | Posted writes must not pass posted writes, which guarantees DMA write ordering
Relaxed Ordering | Attr bit in the TLP that permits passing posted writes. Used by GPU scatter-gather and high-bandwidth DMA.
Data Link Layer role | Add 12-bit SeqNo + 32-bit LCRC (non-flit mode); copy to replay buffer; run ACK/NAK; generate FC and PM DLLPs
ACK DLLP | Cumulative: “all TLPs ≤ SeqN received correctly; purge them from the replay buffer”
NAK DLLP | “Last good TLP was SeqN; replay from N + 1 onwards (all unACKed TLPs after N)”
Replay limit | REPLAY_NUM rolls over on the fourth consecutive replay → the Data Link Layer has the LTSSM retrain the link via Recovery
Gen 1/2 encoding | 8b/10b: 20% overhead, K-characters for framing, DC balance guaranteed, at most 5 identical bits in a row
Gen 3–5 encoding | 128b/130b: ~1.5% overhead, sync header (01/10) marks data vs ordered-set blocks, framing tokens, mandatory scrambling, explicit equalization
Gen 6 PAM4 | 4 voltage levels, 2 bits/symbol, 32 GBaud = 64 GT/s, eye height about 1/3 of NRZ → raw error rate rises
Gen 6 FEC | Mandatory lightweight FEC (6 ECC bytes per flit), corrects errors in hardware before the DLL; residual errors are caught by the flit CRC and replayed
Gen 6 Flit | 256-byte fixed frame: 236 B TLP + 6 B DLP + 8 B CRC + 6 B ECC. Multiple TLPs per flit. Replay at flit granularity.
Coming next: PCIe-04 covers PCIe Generations Gen 1 to Gen 6 — the detailed bandwidth maths behind each generation, why Gen 3 chose 8 GT/s instead of 10 GT/s, what drove each design decision from the spec, and a deep dive into the Gen 6 flit format and FEC structure with worked numbers.