PCIe Series — PCIe-16: 128b/130b Encoding (Gen 3+) — VLSI Trainers

128b/130b Encoding (Gen 3+)

Why 8b/10b was abandoned at Gen 3, how 128b/130b works — 16-byte data blocks, the 2-bit sync header, block types, framing tokens, the Gen 3 scrambler, block alignment, and how the encoding carries through Gen 4, Gen 5, and into Gen 6’s flit model.

📋 Why 8b/10b Was Replaced

Gen 3 was the first PCIe generation to double bandwidth without doubling frequency. Gen 1 ran at 2.5 GT/s, Gen 2 at 5 GT/s. Simply doubling to 10 GT/s for Gen 3 was considered impractical — the signal conditioning required at 5 GHz Nyquist frequency would demand expensive board materials and aggressive equalization that would price PCIe out of mainstream use.

The solution came from a different direction: keep the frequency increase modest (5 GT/s to 8 GT/s is only 60% more) and reclaim the 20% overhead that 8b/10b encoding had been wasting. The result is Gen 3 at 8 GT/s with ~98.5% efficiency — delivering approximately the same useful throughput as a hypothetical 10 GT/s system with 8b/10b.

The Two Ways to Double Bandwidth (Gen 3 Chose the Second)

Option A: Double the Frequency
Gen 2: 5 GT/s with 8b/10b → 500 MB/s per lane
Option A: 10 GT/s with 8b/10b → ~1000 MB/s per lane
Problems: expensive laminates, aggressive equalization, high power
Board infrastructure incompatible: not backward compatible at system level

Option B: Drop 8b/10b (chosen for Gen 3)
Gen 2: 5 GT/s with 8b/10b → 500 MB/s per lane
Gen 3: 8 GT/s with 128b/130b → ~984 MB/s per lane
Only 60% more frequency, but nearly 100% more throughput
Existing connectors and board stackups still usable with better laminates
Figure 1 — Two paths to doubling Gen 2 bandwidth. Option A (10 GT/s with 8b/10b) would require prohibitively expensive signal integrity measures. Option B (8 GT/s with 128b/130b) recovers the 20% overhead while keeping frequency increases manageable. Gen 3 chose Option B.

Three specific problems with simply doubling frequency drove this decision: higher frequencies require far more expensive PCB laminate materials due to dielectric loss; signal conditioning logic (equalization) at 5 GHz+ is complex and power-hungry; and the existing PCIe connector and board infrastructure designed for 5 GT/s would have needed complete redesign for 10 GT/s signals.

📋 The Bandwidth Maths

The arithmetic behind 128b/130b is simple but important to understand precisely:

Encoding Efficiency Comparison (Per Lane)

Gen 2 with 8b/10b
Raw rate: 5.0 GT/s
Overhead: 2 bits per 10 = 20%
Efficiency: 8/10 = 80%
Useful bandwidth: 5.0 × 0.80 = 4.0 Gbps; 4.0 Gbps / 8 = 500 MB/s per lane

Gen 3 with 128b/130b
Raw rate: 8.0 GT/s
Overhead: 2 bits per 130 = ~1.54%
Efficiency: 128/130 = ~98.46%
Useful bandwidth: 8.0 × 0.9846 = 7.877 Gbps; 7.877 Gbps / 8 ≈ 984 MB/s per lane

Why it matters
Frequency increase: 5→8 GT/s = +60%
Useful bandwidth increase: +97%
Nearly double throughput from only 60% more line rate
x16 Gen 3: 16 lanes × 984 MB/s ≈ 15.75 GB/s per direction (~31.5 GB/s counting both directions)
x16 Gen 2: 16 lanes × 500 MB/s = 8 GB/s per direction (16 GB/s counting both directions)
Figure 2 — Bandwidth maths. Gen 2 at 5 GT/s with 8b/10b delivers 500 MB/s useful data per lane. Gen 3 at 8 GT/s with 128b/130b delivers ~984 MB/s — nearly double — despite only 60% more raw frequency. The efficiency jump from 80% to 98.5% is what makes this possible.
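The arithmetic in Figure 2 can be reproduced in a few lines. This is a minimal sketch; the function name `useful_mb_per_s` is made up for illustration, and the 10 GT/s case is the hypothetical Option A from Figure 1, not a real PCIe rate.

```python
def useful_mb_per_s(rate_gt_s, payload_bits, total_bits):
    """Useful per-lane throughput in MB/s for a line rate and an encoding."""
    efficiency = payload_bits / total_bits      # e.g. 8/10 or 128/130
    useful_gbps = rate_gt_s * efficiency        # payload bits per second, in Gb/s
    return useful_gbps * 1000 / 8               # Gb/s -> MB/s

gen2 = useful_mb_per_s(5.0, 8, 10)              # Gen 2, 8b/10b
gen3 = useful_mb_per_s(8.0, 128, 130)           # Gen 3, 128b/130b
option_a = useful_mb_per_s(10.0, 8, 10)         # hypothetical 10 GT/s with 8b/10b
gain = gen3 / gen2 - 1                          # fractional increase over Gen 2

print(f"Gen 2:    {gen2:.1f} MB/s per lane")
print(f"Gen 3:    {gen3:.1f} MB/s per lane")
print(f"Option A: {option_a:.1f} MB/s per lane")
print(f"Gen 3 gain over Gen 2: {gain:.1%}")
```

The Gen 3 figure lands within about 1.5% of the hypothetical 10 GT/s result, which is the point of Option B.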

📋 128b/130b Block Structure

128b/130b groups 128 bits (16 bytes) of data into a single block and prepends a 2-bit sync header. This gives 130 bits on the wire for every 128 bits of payload — hence the name. Unlike 8b/10b which operates symbol-by-symbol, 128b/130b operates block-by-block.

128b/130b Block: Sync Header + 16 Data Bytes

Sync header (2 bits): 01 = data, 10 = ordered set
Payload (128 bits = 16 bytes): data or ordered set content
Data block payload: scrambled (mandatory); STP/SDP tokens + TLP/DLLP bytes + IDL padding
Ordered set block payload: 16 bytes of the ordered set pattern (same on all lanes)
Total: 2 bits + 128 bits = 130 bits
Figure 3 — A 128b/130b block. The 2-bit sync header is transmitted first, followed by 128 bits (16 bytes) of payload. The sync header tells the receiver whether to interpret the following 128 bits as a data block (TLPs, DLLPs, idle) or an ordered set block. All 128 payload bits are scrambled.

The block boundary is fundamental to how Gen 3 framing works. Unlike 8b/10b where any K-code can appear between data characters, 128b/130b has no K-codes in the middle of data. Instead, the block structure defines what kind of content is present. A receiver must first achieve block alignment — knowing exactly where each 130-bit block starts in the serial bitstream — before it can interpret any data.
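As a rough illustration of the block layout, the sketch below assembles and splits a 130-bit block. The header values follow the article's listing (01 for data, 10 for ordered set); the least-significant-bit-first byte order and the helper names `build_block` and `split_block` are assumptions made for this example, and scrambling is left out.

```python
SYNC_DATA = (0, 1)          # data block header, as listed in this article
SYNC_ORDERED_SET = (1, 0)   # ordered set block header, as listed in this article

def build_block(sync, payload):
    """Return 130 bits: the 2-bit sync header followed by 16 payload bytes."""
    if len(payload) != 16:
        raise ValueError("a block carries exactly 16 payload bytes")
    bits = list(sync)
    for byte in payload:
        # Bit order within a byte is an assumption here (LSB first).
        bits.extend((byte >> i) & 1 for i in range(8))
    return bits

def split_block(bits):
    """Return (sync header, payload bytes) from a 130-bit block."""
    if len(bits) != 130:
        raise ValueError("a block is exactly 130 bits")
    sync = tuple(bits[:2])
    payload = bytes(
        sum(bit << i for i, bit in enumerate(bits[2 + 8 * n: 10 + 8 * n]))
        for n in range(16)
    )
    return sync, payload

block = build_block(SYNC_DATA, bytes(range(16)))
sync, payload = split_block(block)
print(len(block), sync, payload.hex())
```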

📋 The 2-bit Sync Header

The sync header is always exactly 2 bits, transmitted first in the block. Only two values are defined:

Sync Header Values: Only Two Are Defined

01: Data Block
The 128-bit payload contains TLPs, DLLPs, idle bytes, or framing tokens. All bytes are scrambled. Receiver strips the 2-bit header before passing to the Data Link Layer. Most blocks in a working link will be Data Blocks.

10: Ordered Set Block
The 128-bit payload contains 16 bytes of an ordered set (SKIP, EIEOS, EIOS, SDS, etc). Ordered set blocks are NOT scrambled. They must appear on all active lanes simultaneously and are used for deskew and alignment.
Figure 4 — Sync header values. 01 means data block (scrambled, contains TLPs/DLLPs). 10 means ordered set block (not scrambled, same content on all lanes). The values 00 and 11 are illegal — if a receiver detects an illegal sync header it reports a block alignment error and initiates link retraining.
Why 01 and 10 instead of 0 and 1? With a 2-bit header, there are four possible values: 00, 01, 10, 11. The spec defines only 01 (data) and 10 (ordered set). These two values differ in both bit positions, so a single bit error can never turn one valid header into the other: a single bit flip on 01 gives 11 or 00, and a single bit flip on 10 gives 00 or 11, all of which are illegal and therefore detectable. Each valid header also contains a guaranteed 0/1 transition. This gives the sync header a self-detecting quality for single-bit errors; only an error in both bits (01 ↔ 10) passes unnoticed.
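The single-bit-error property is easy to check exhaustively. A small sketch, using the header values as written in this article; the `flip` helper is invented for the example.

```python
VALID = {"01": "data block", "10": "ordered set block"}

def flip(header, position):
    """Return the header with the bit at `position` inverted."""
    bits = list(header)
    bits[position] = "1" if bits[position] == "0" else "0"
    return "".join(bits)

single_flip_results = set()
for header in VALID:
    for position in range(2):
        corrupted = flip(header, position)
        single_flip_results.add(corrupted)
        status = "valid" if corrupted in VALID else "illegal"
        print(f"{header} -> {corrupted} ({status})")

# Flipping both bits is the one case the header alone cannot catch.
double_flip = flip(flip("01", 0), 1)
print("01 with both bits flipped ->", double_flip)
```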

📋 Block Types — Data vs Ordered Set

The sync header divides blocks into two categories with fundamentally different behaviour:

Property | Data Block (sync=01) | Ordered Set Block (sync=10)
Content | TLP bytes · DLLP bytes · STP/SDP framing tokens · IDL padding · EDB · EDS tokens | One of the defined ordered sets: SKIP, EIEOS, EIOS, SDS, FTS
Scrambling | Yes: all 128 payload bits are scrambled with the LFSR | No for the ordered sets listed here: they are sent in the clear so the receiver sees exact patterns for alignment (the TS1/TS2 training sets are the exception, with most of their symbols scrambled)
Lane requirement | Content may differ per lane (each lane carries its stripe of the packet) | Same on all active lanes simultaneously, as required for lane-to-lane deskew
Data Stream context | Part of the Data Stream: TLPs and DLLPs flow in data blocks | Interrupts the Data Stream when it is something other than SKIP; SKIP ordered sets may appear within a Data Stream without interrupting it
How framed | STP token marks TLP start; SDP marks DLLP start; the length field in the STP token locates the TLP end | Entire block is the ordered set, so no separate framing tokens are needed

📋 Framing Tokens in Data Blocks

In 8b/10b, K-codes (STP, SDP, END, EDB) are control characters that mark packet boundaries. In 128b/130b there are no K-codes — framing is done instead by special Framing Tokens embedded within data blocks. These tokens are specific byte patterns that the receiver recognises within the 128-bit payload of a data block.

Five Framing Tokens (Replace 8b/10b K-codes in Data Blocks)

STP (Start TLP): 4-byte token. Nibble of 1111 + 11-bit length + parity bit + 4-bit Frame CRC + 12-bit TLP sequence number. Includes the TLP length.
SDP (Start DLLP): 2-byte token. Fixed pattern, no length field. A DLLP always occupies 8 bytes including the token, so no end marker is needed.
EDB (End Bad Packet): 4-byte token (4× EDB bytes). Appended to nullified TLPs, whose LCRC is inverted. Used with switch cut-through.
EDS (End of Data Stream): 4-byte token in the last DW of a data block. Signals that the next block is an ordered set block. Does not end the stream if that block is a SKIP ordered set.
IDL (Logical Idle): All-zero bytes. Fills data blocks when there are no TLPs or DLLPs to send. Scrambled like all data content.
Figure 5 — Five framing tokens embedded inside data blocks. STP now includes the TLP’s full DW length count (11-bit field), allowing the receiver to find the end of a TLP by counting DWs rather than looking for an END K-code. SDP marks DLLPs; no end marker is needed since DLLPs are always 8 bytes. EDS marks the end of a data stream before an ordered set block interrupts it.

STP token — the key improvement over 8b/10b STP

In 8b/10b, the TLP framing was: STP K-code at start, END K-code at end. The receiver had to wait to see the END to know where the TLP finished. In 128b/130b the STP token includes the TLP’s complete DW count as an 11-bit length field. The receiver can calculate exactly where the TLP ends from the moment it sees the STP. This enables faster cut-through forwarding at switches and simpler receiver logic.

The STP length field also has a 4-bit Frame CRC and an additional parity bit to protect against errors in the length itself — an error in the length field would cause the receiver to misalign on all subsequent packet boundaries until recovery. The triple-bit-flip detection capability of this combined protection makes the length field very robust.
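To make the "count DWs instead of waiting for END" idea concrete, here is a minimal sketch that locates packet boundaries purely from length values. It assumes the length counts every DW of the TLP starting at the STP token, and it takes already-decoded lengths as input rather than parsing token bits; `tlp_end` and `token_offsets` are hypothetical helper names.

```python
DW_BYTES = 4                    # one doubleword
MAX_LENGTH_DW = 2**11 - 1       # largest value an 11-bit field can hold

def tlp_end(stp_offset, length_dw):
    """Byte offset of the first byte after a TLP whose STP sits at stp_offset."""
    if not 1 <= length_dw <= MAX_LENGTH_DW:
        raise ValueError("length must fit the 11-bit STP field")
    return stp_offset + length_dw * DW_BYTES

def token_offsets(lengths_dw, first_offset=0):
    """Walk back-to-back TLPs and return each STP offset plus the final end offset."""
    offsets = []
    offset = first_offset
    for length in lengths_dw:
        offsets.append(offset)
        offset = tlp_end(offset, length)    # known as soon as the STP is decoded
    return offsets, offset

starts, stream_end = token_offsets([5, 7, 37])
print(starts, stream_end)
```

A corrupted length would shift every later offset in this walk, which is why the field carries its own CRC and parity.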

📋 No K-Codes — What Replaced Each One

Every K-code from 8b/10b that had a function in PCIe Gen 1/2 is replaced in Gen 3 by either a framing token or an ordered set mechanism:

8b/10b K-code | Function | Gen 3 replacement
COM (K28.5) | Ordered set start, symbol lock | Block alignment via sync header: receiver finds block boundaries by searching for the valid 01/10 sync header pattern
STP (K27.7) | Start of TLP | STP framing token (4 bytes in data block with embedded length)
SDP (K28.2) | Start of DLLP | SDP framing token (2 bytes in data block)
END (K29.7) | End of good packet | Not needed: TLP end is calculated from STP length field. Absence of EDB means packet is good.
EDB (K30.7) | End of bad (nullified) packet | EDB framing token (4 bytes = four EDB bytes) appended to nullified TLPs
SKP (K28.0) | Clock compensation | SKIP ordered set: still exists but now an ordered set block (sync=10) instead of K-code characters; SKP bytes may be added/removed
FTS (K28.1) | L0s exit training | FTS ordered set block: same purpose, now sync=10 block type
IDL (K28.3) | Logical idle, electrical idle entry | IDL framing token for logical idle; EIOS ordered set block for electrical idle entry
PAD (K23.7) | Lane padding on wide links | IDL framing tokens fill unused space in data blocks
EIE (K28.7) | Electrical idle exit | EIEOS ordered set block (same purpose, now sync=10)

📋 Gen 3 Scrambler — Why Mandatory

In 8b/10b, scrambling was optional (a bit in the training sequence could disable it). In Gen 3, scrambling is mandatory and cannot be disabled. This is because 128b/130b encoding no longer guarantees transition density or DC balance by itself — it only provides the 2-bit sync header. The scrambler is the only mechanism that prevents long runs of the same bit value in the 128-bit payload, ensures sufficient transitions for CDR (Clock and Data Recovery), and maintains DC balance across the link.

Scrambling Role: 8b/10b vs 128b/130b

8b/10b (Gen 1/2): Scrambling Optional
DC balance: maintained by CRD mechanism (running disparity)
Transition density: guaranteed ≤5 same bits by 8b/10b rules
Scrambling: optional · “disable scrambling” bit in TS1/TS2
Spectral flatness: scrambling helps eliminate periodic tones
Without scrambling: 8b/10b still works but has spectral peaks

128b/130b (Gen 3+): Scrambling Mandatory
DC balance: ONLY from scrambling, no CRD mechanism
Transition density: ONLY from scrambling, no max run limit
Scrambling: cannot be disabled at 8 GT/s or above
Without scrambling: all-zero payload → DC imbalance → CDR fails
Without scrambling: repeated 0xFF payload → long runs of identical bits with no guaranteed transition
Figure 6 — Scrambling role comparison. In 8b/10b, the CRD mechanism and encoding rules guarantee DC balance and transition density independently. In 128b/130b, there is no CRD and no run-length limit — the scrambler is the only thing providing both properties. Disabling it would cause immediate link failures at any meaningful data pattern.

📋 Scrambler LFSR Details

The Gen 3 scrambler uses a 23-bit Linear Feedback Shift Register (LFSR), longer than the 16-bit LFSR used in Gen 1/2. The generator polynomial given in the PCIe Base Specification is x²³ + x²¹ + x¹⁶ + x⁸ + x⁵ + x² + 1. Each lane has its own LFSR, and neighbouring lanes start from different seed values (the seed is selected by lane number), so even if all lanes carry identical data bytes the scrambled bitstreams on adjacent lanes differ, which keeps crosstalk from becoming coherent.

Property | Gen 1/2 scrambler | Gen 3+ scrambler
LFSR length | 16 bits | 23 bits
Generator polynomial | x¹⁶ + x⁵ + x⁴ + x³ + 1 | x²³ + x²¹ + x¹⁶ + x⁸ + x⁵ + x² + 1
Reset/resync trigger | Every COM K-code resets all lanes’ LFSRs simultaneously | Every EIEOS ordered set resets all lanes to defined per-lane seeds
Per-lane seeding | Same seed on all lanes | Seed selected by lane number, giving intentional scrambling diversity between neighbouring lanes
Disable option | Yes: “disable scrambling” bit in TS1/TS2 | No: cannot be disabled at 8 GT/s or higher
What is scrambled | Data bytes before 8b/10b encoding · K-codes not scrambled | All data block payload bytes · SKP, EIEOS, EIOS, FTS and SDS ordered sets not scrambled (TS1/TS2 are scrambled after their first symbol)
Different LFSR seeds per lane. On a x16 link, the initial LFSR seed is chosen by lane number, with eight seed values reused modulo 8, so physically adjacent lanes never start from the same seed. This means Lane 0’s scrambled bitstream differs from Lane 1’s scrambled bitstream even when both carry exactly the same data. This is deliberately designed to prevent systematic crosstalk: if all lanes had the same scrambling pattern, any noise that coupled identically from one lane to another would be correlated and could cause systematic bit errors.
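The sketch below shows why the scrambler supplies the transitions that the encoding itself no longer guarantees. It is a generic Fibonacci LFSR, not a bit-exact model of the PCIe scrambler: the tap list corresponds to the 23-bit polynomial X^23 + X^21 + X^16 + X^8 + X^5 + X^2 + 1 from the PCIe Base Specification, but the bit ordering, the one-bit-per-step stepping, and both seed values are arbitrary choices made for illustration.

```python
WIDTH = 23
TAPS = (23, 21, 16, 8, 5, 2)    # exponents of the non-constant polynomial terms
MASK = (1 << WIDTH) - 1

def keystream(seed, nbits):
    """Generate nbits of LFSR output from a non-zero seed."""
    state = seed & MASK
    if state == 0:
        raise ValueError("an all-zero seed would lock the LFSR at zero")
    out = []
    for _ in range(nbits):
        feedback = 0
        for tap in TAPS:
            feedback ^= (state >> (tap - 1)) & 1
        state = ((state << 1) | feedback) & MASK
        out.append(feedback)
    return out

def scramble(bits, seed):
    """XOR the data with the keystream; applying it twice restores the data."""
    return [b ^ k for b, k in zip(bits, keystream(seed, len(bits)))]

def longest_run(bits):
    """Length of the longest run of identical consecutive bits."""
    best = run = 1
    for prev, cur in zip(bits, bits[1:]):
        run = run + 1 if cur == prev else 1
        best = max(best, run)
    return best

payload = [0] * 128                           # worst case: all-zero payload
lane0 = scramble(payload, seed=0x123456)      # arbitrary illustrative seed
lane1 = scramble(payload, seed=0x2ABCDE)      # a different arbitrary seed

print("unscrambled longest run:", longest_run(payload))
print("lane 0 longest run:", longest_run(lane0))
print("lanes differ:", lane0 != lane1)
```

The same payload produces different bit patterns on the two lanes, and descrambling is the same XOR with the same seed.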

📋 Block Alignment at the Receiver

Before a Gen 3 receiver can decode any data, it must achieve block alignment — determining exactly which bit in the incoming serial stream is the first bit of each 130-bit block. This replaces the symbol lock that 8b/10b achieved using the COM character’s unique pattern.

The procedure for achieving block alignment:

  1. The receiver tentatively picks a starting bit position and looks at the first 2 bits as the sync header. It checks whether the value is 01 or 10 (valid) or 00 or 11 (invalid).
  2. If valid, it jumps 130 bits forward and checks the next 2-bit sync header. If that is also valid, it repeats several more times.
  3. After seeing a sufficient number of consecutive valid sync headers at the expected 130-bit intervals, the receiver declares block alignment achieved.
  4. If at any point an invalid sync header appears (00 or 11), the receiver tries the next bit position and starts over.
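  5. The EIEOS ordered set (received as a sync=10 block carrying alternating 00h and FFh bytes at 8 GT/s) provides a reliable reference point because it is unscrambled and has a well-known pattern; the PCIe specification designates it as the pattern receivers use to locate block boundaries during link training.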
Block alignment, not symbol lock. In 8b/10b, symbol lock was per-lane (finding 10-bit symbol boundaries). In 128b/130b, block alignment is also per-lane, but the unit is the 130-bit block. The principle is similar (use a recognisable pattern at known intervals), but the mechanism is entirely different because there are no K-code characters at arbitrary positions in the bitstream.
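The header-hunting steps above can be modelled in a few lines. This is a simplified sketch, not receiver RTL: random bits stand in for scrambled payload, the threshold of 32 consecutive valid headers is an arbitrary choice, and a real receiver would also use the EIEOS pattern as its reference. With random payload, a wrong offset survives each check with probability 1/2, so a false lock over 32 checks is vanishingly unlikely.

```python
import random

BLOCK_BITS = 130
HEADERS = [(0, 1), (1, 0)]      # the two valid sync header values

def make_stream(num_blocks, lead_in_bits, rng):
    """Random lead-in bits (unknown phase) followed by well-formed blocks."""
    bits = [rng.randint(0, 1) for _ in range(lead_in_bits)]
    for _ in range(num_blocks):
        bits.extend(rng.choice(HEADERS))
        bits.extend(rng.randint(0, 1) for _ in range(128))
    return bits

def find_alignment(bits, required=32):
    """Return the first bit offset with `required` valid headers in a row."""
    for offset in range(BLOCK_BITS):
        positions = range(offset, offset + required * BLOCK_BITS, BLOCK_BITS)
        if positions[-1] + 2 > len(bits):
            break                                   # stream too short to decide
        if all(tuple(bits[p:p + 2]) in HEADERS for p in positions):
            return offset                           # alignment declared
    return None

rng = random.Random(1)
true_offset = 37
stream = make_stream(num_blocks=64, lead_in_bits=true_offset, rng=rng)
found = find_alignment(stream)
print("true offset:", true_offset, "found:", found)
```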

📋 Lane-to-Lane Deskew in Gen 3

On a multi-lane link, the parallel bitstreams on different lanes travel slightly different path lengths and through slightly different electrical characteristics. They arrive at the receiver at slightly different times — lane-to-lane skew. The receiver must re-align all lanes before reassembling the packet byte stream.

In 8b/10b, the COM K-code served as the deskew reference — it appeared on all lanes simultaneously. In Gen 3, COM no longer exists. Deskew is performed using ordered set blocks, which must be transmitted simultaneously on all lanes. Any ordered set can serve as the deskew marker, but SKIP (SOS), SDS (Start Data Stream), and EIEOS are most commonly used because they appear regularly.

Property | Gen 1/2 | Gen 3
Deskew reference | COM K-code (K28.5) detected on all lanes simultaneously | Ordered set blocks (SKIP/SDS/EIEOS) appearing simultaneously on all lanes
Max receivable skew | 20 ns (Gen 1) / 8 ns (Gen 2) = 5 and 4 symbol times respectively | 6 ns = 6 symbol times at 1 ns per symbol
Mechanism | Delay early-arriving COM characters until all lanes are in sync | Delay early-arriving ordered set blocks until all lanes show the ordered set simultaneously
Ongoing deskew | Every COM character provides an opportunity for adjustment | SKIP ordered sets (SOS) sent every 370–375 blocks provide regular adjustment opportunities
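The "delay the early lanes" mechanism amounts to padding every lane up to the latest arrival. A minimal sketch, assuming arrival times are measured in symbol times at the point where each lane sees the same ordered set block; the function name and the sample arrival values are invented for the example, and the 6-symbol limit is the Gen 3 figure from the table.

```python
MAX_SKEW_SYMBOLS = 6    # Gen 3 limit from the table: 6 ns at 1 ns per symbol

def deskew_delays(arrival_symbol_times):
    """Per-lane delay (in symbol times) that lines every lane up with the latest one."""
    latest = max(arrival_symbol_times)
    skew = latest - min(arrival_symbol_times)
    if skew > MAX_SKEW_SYMBOLS:
        raise ValueError(f"skew of {skew} symbols exceeds the receivable limit")
    return [latest - t for t in arrival_symbol_times]

# Hypothetical x4 link: symbol time at which each lane saw the ordered set.
arrivals = [100, 102, 101, 104]
delays = deskew_delays(arrivals)
aligned = [t + d for t, d in zip(arrivals, delays)]
print("delays:", delays)
print("aligned:", aligned)
```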

📋 SKIP Ordered Set for Clock Compensation

Clock tolerance compensation still works in Gen 3 via SKIP ordered sets (SOS), but the mechanism differs from 8b/10b. In Gen 1/2, the transmitter could insert SKP K-codes at fairly fine granularity within the bitstream. In Gen 3, insertion and deletion happen at block boundaries in multiples of 4 SKP symbols (bytes) per SOS.

Coarser compensation granularity in Gen 3. In Gen 1/2, SKP characters could be added or removed one character at a time. In Gen 3, they are added or removed 4 bytes at a time within an SOS block. The ±300 ppm clock tolerance itself is unchanged at 8 GT/s, so the worst-case 600 ppm difference between the two ends accumulates to just under 4 symbol times over the 370–375 block interval between SKIP ordered sets; the elastic buffer is sized to absorb adjustments in these 4-symbol steps.
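A quick calculation shows how the 4-symbol step relates to the SOS interval. This sketch assumes the worst case of one end running 300 ppm fast and the other 300 ppm slow (600 ppm total) and counts 8 bit times per symbol; the function name is made up for the example.

```python
PPM_WORST_CASE = 600     # +300 ppm at one end, -300 ppm at the other
BLOCK_UI = 130           # bit times (unit intervals) per 128b/130b block
SYMBOL_UI = 8            # bit times per symbol (byte)

def drift_symbols(blocks_between_sos):
    """Worst-case clock drift, in symbol times, accumulated between two SOS."""
    elapsed_ui = blocks_between_sos * BLOCK_UI
    drift_ui = elapsed_ui * PPM_WORST_CASE * 1e-6
    return drift_ui / SYMBOL_UI

shortest = drift_symbols(370)
longest = drift_symbols(375)
print(f"drift over 370 blocks: {shortest:.2f} symbols")
print(f"drift over 375 blocks: {longest:.2f} symbols")
```

Both values stay below 4 symbols, so one add or remove of 4 SKP bytes per SOS is enough to keep the elastic buffer centred under these assumptions.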

📋 Gen 4 and Gen 5 Extensions

Gen 4 (16 GT/s) and Gen 5 (32 GT/s) both continue to use 128b/130b encoding. The block structure, sync header values, data/ordered set distinction, framing tokens, and scrambler mechanism are all carried forward unchanged. What changes generation-to-generation is the raw bit rate and the equalization requirements.

Generation | Data rate | Encoding | Useful BW per lane | Key additions vs Gen 3
Gen 3 | 8 GT/s | 128b/130b | ~984 MB/s | Introduction of 128b/130b, mandatory scrambling, 3-tap Tx FIR
Gen 4 | 16 GT/s | 128b/130b | ~1.97 GB/s | Wider Tx FIR coefficient range, stricter eye mask, retimer support
Gen 5 | 32 GT/s | 128b/130b | ~3.94 GB/s | Tighter coefficient resolution, adaptive equalization, optional precoding
Gen 6 | 64 GT/s (PAM4) | Flit + FEC | ~7.5 GB/s | PAM4 modulation, mandatory FEC, 256-byte flit framing, no 128b/130b

For Gen 4 and Gen 5, the encoding overhead and block structure are identical to Gen 3. The equalization training (Recovery.Equalization state in the LTSSM) runs more iterations with a larger coefficient search space, and the physical channel requirements become tighter — lower-loss laminates, tighter impedance control, and more aggressive equalization are all needed.

Gen 6 — Beyond 128b/130b

Gen 6 does not use 128b/130b encoding. The switch to PAM4 modulation at 32 Gbaud creates challenges that 128b/130b was not designed to handle. Instead, Gen 6 uses a completely different approach: flit-based framing with mandatory FEC.

Gen 3–5 (128b/130b) vs Gen 6 (Flit + FEC)

Gen 3, 4, 5: 128b/130b
Block size: 130 bits (2-bit header + 128 payload)
Framing: sync header identifies block type
Error detection: code violation (illegal sync=00/11)
Error correction: none, relies on DLL ACK/NAK replay
Overhead: 2/130 = 1.54%
NRZ: 2 levels · 1 bit per symbol
Max raw BER before replay: ~10⁻¹²

Gen 6: Flit + FEC
Flit size: 256 bytes (2048 bits), of which 236 bytes carry TLP data
Framing: flit header (position-based, no sync header bits)
Error detection: 8-byte CRC per flit
Error correction: lightweight 3-way interleaved FEC (6 bytes per flit), corrected in hardware
Overhead: CRC and FEC bytes take 14 of every 256 bytes (~5.5%)
PAM4: 4 levels · 2 bits per symbol
Raw BER ~10⁻⁶ tolerable: FEC corrects to 10⁻¹⁵
Figure 7 — 128b/130b vs Gen 6 flit+FEC. The shift to PAM4 at Gen 6 reduces the eye opening margin dramatically — raw BER rises to ~10⁻⁶ which would be completely unusable with ACK/NAK replay alone. FEC corrects errors in hardware before the Data Link Layer sees them. The flit structure also allows multiple TLPs to be packed efficiently at 64 GT/s speeds.

Why 128b/130b could not scale to Gen 6

PAM4’s reduced eye opening means bit errors occur far more frequently than with NRZ. At 10⁻⁶ raw BER, a Gen 6 x16 link would see roughly one raw bit error every microsecond. Without FEC, each of those errors would force an ACK/NAK replay, and the retransmissions would consume a large share of the link’s bandwidth. 128b/130b has no error correction: it can detect a bad sync header but cannot fix it. FEC is the only practical solution at PAM4 error rates.
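The "one error per microsecond" figure follows directly from the link rate. The sketch below also estimates how often a 256-byte flit would contain at least one raw bit error, assuming independent bit errors (a simplification, since real PAM4 errors tend to cluster) and treating 64 GT/s as 64 Gb/s per lane.

```python
lanes = 16
bits_per_second_per_lane = 64e9      # Gen 6: 64 GT/s per lane
raw_ber = 1e-6                       # raw bit error rate quoted for PAM4
flit_bits = 256 * 8                  # one 256-byte flit

errors_per_second = lanes * bits_per_second_per_lane * raw_ber
mean_gap_us = 1e6 / errors_per_second            # microseconds between raw errors

# Probability that a flit contains at least one raw bit error.
p_flit_hit = 1 - (1 - raw_ber) ** flit_bits

print(f"raw bit errors per second: {errors_per_second:.3e}")
print(f"mean time between errors:  {mean_gap_us:.2f} us")
print(f"flits with >= 1 raw error: {p_flit_hit:.3%}")
```

Roughly one flit in five hundred would need a replay if nothing corrected errors first, which is the load that hardware FEC is there to remove.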

Additionally, 128b/130b’s 130-bit block is too small a unit for efficient FEC coding. Gen 6 instead protects each fixed-size 256-byte (2048-bit) flit, which is more than 15 times larger than a 130-bit block, with FEC and CRC bytes carried at known positions inside the flit. Flit-based framing was designed specifically so that framing and error protection share the same fixed-size unit.

128b/130b is used in Gen 3, Gen 4, and Gen 5 only. Gen 6 uses flit-based framing. If you are designing or debugging a Gen 3–5 interface, the 2-bit sync header, scrambler LFSR, block alignment, and framing tokens described in this post are exactly what you will encounter. For Gen 6, refer to the PCIe-12 post on the Data Link Layer flit mechanism.

📋 Quick Reference

Item | Value / Rule
Reason for change | 8b/10b’s 20% overhead made doubling Gen 2’s bandwidth impossible at a reasonable frequency increase. Dropping 8b/10b and using 128b/130b delivers nearly the same useful throughput at 8 GT/s as 10 GT/s with 8b/10b would have.
Block structure | 130 bits total: 2-bit sync header + 128-bit payload (16 bytes)
Overhead | 2/130 = ~1.54%, vs 20% for 8b/10b
Sync header: 01 | Data block: payload contains TLPs, DLLPs, framing tokens, IDL. All bytes scrambled.
Sync header: 10 | Ordered set block: payload contains ordered set pattern. Not scrambled. Same on all lanes.
Sync header: 00 or 11 | Illegal: block alignment error. Link must retrain.
Framing tokens | STP (with length+CRC) · SDP · EDB (4 bytes) · EDS · IDL, all embedded inside data blocks
STP improvement | Includes 11-bit TLP DW count + 4-bit Frame CRC + parity. Receiver knows where TLP ends from start, without waiting for END K-code.
No END K-code | TLP end determined by STP length field. If no EDB follows, packet is assumed good.
Scrambling | Mandatory at Gen 3 and above, cannot be disabled. Only mechanism for DC balance and transition density.
Gen 3 LFSR | 23-bit, polynomial x²³+x²¹+x¹⁶+x⁸+x⁵+x²+1. Seed selected by lane number. Reset on EIEOS ordered set.
Block alignment | Receiver finds 130-bit block boundaries by searching for valid 01/10 sync headers at 130-bit intervals. Replaces 8b/10b symbol lock via COM.
Deskew reference | Ordered set blocks (SKIP/SDS/EIEOS) appearing simultaneously on all lanes. Replaces COM-based deskew of Gen 1/2.
SKIP ordered set | Every 370–375 data blocks. Sync=10 block. Receiver adds/removes 4 SKP bytes for clock compensation. No consecutive SOS blocks allowed at Gen 3.
Gen 4 and Gen 5 | Same 128b/130b block structure, same tokens, same scrambler approach. Only line rate and equalization complexity change.
Gen 6 | Does not use 128b/130b. PAM4 with 256-byte flit framing, per-flit CRC and lightweight FEC replaces it. FEC corrects 10⁻⁶ raw BER to 10⁻¹⁵ effective.
Generations using 128b/130b | Gen 3 (8 GT/s), Gen 4 (16 GT/s), Gen 5 (32 GT/s)
