PCIe 128b/130b Encoding Explained — Gen3 and Beyond

📋 Why 8b/10b Was Replaced

Gen 3 was the first PCIe generation to double bandwidth without doubling frequency. Gen 1 ran at 2.5 GT/s, Gen 2 at 5 GT/s. Simply doubling to 10 GT/s for Gen 3 was considered impractical — the signal conditioning required at 5 GHz Nyquist frequency would demand expensive board materials and aggressive equalization that would price PCIe out of mainstream use.

The solution came from a different direction: keep the frequency increase modest (5 GT/s to 8 GT/s is only 60% more) and reclaim the 20% overhead that 8b/10b encoding had been wasting. The result is Gen 3 at 8 GT/s with ~98.5% efficiency — delivering approximately the same useful throughput as a hypothetical 10 GT/s system with 8b/10b.

Figure 1 — Two paths to doubling Gen 2 bandwidth. Option A (10 GT/s with 8b/10b) would require prohibitively expensive signal integrity measures. Option B (8 GT/s with 128b/130b) recovers the 20% overhead while keeping frequency increases manageable. Gen 3 chose Option B.

Three specific problems with simply doubling frequency drove this decision: higher frequencies require far more expensive PCB laminate materials due to dielectric loss; signal conditioning logic (equalization) at 5 GHz+ is complex and power-hungry; and the existing PCIe connector and board infrastructure designed for 5 GT/s would have needed complete redesign for 10 GT/s signals.

📋 The Bandwidth Maths

The arithmetic behind 128b/130b is simple but important to understand precisely:

Figure 2 — Bandwidth maths. Gen 2 at 5 GT/s with 8b/10b delivers 500 MB/s useful data per lane. Gen 3 at 8 GT/s with 128b/130b delivers ~984 MB/s — nearly double — despite only 60% more raw frequency. The efficiency jump from 80% to 98.5% is what makes this possible.
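The arithmetic in Figure 2 can be reproduced in a few lines. This is a minimal sketch; the function name `useful_mb_per_s` is an illustrative choice, and the calculation counts only encoding overhead, ignoring packet and protocol overhead.

```python
def useful_mb_per_s(gt_per_s: float, payload_bits: int, wire_bits: int) -> float:
    """Per-lane useful throughput in MB/s for a line rate and an encoding ratio."""
    efficiency = payload_bits / wire_bits          # 8/10 = 0.80, 128/130 ~ 0.985
    useful_bits_per_s = gt_per_s * 1e9 * efficiency
    return useful_bits_per_s / 8 / 1e6             # bits -> bytes -> megabytes

gen2 = useful_mb_per_s(5.0, 8, 10)                 # Gen 2: 5 GT/s with 8b/10b
gen3 = useful_mb_per_s(8.0, 128, 130)              # Gen 3: 8 GT/s with 128b/130b
hypothetical = useful_mb_per_s(10.0, 8, 10)        # the rejected 10 GT/s option

print(f"Gen 2:              {gen2:7.1f} MB/s")
print(f"Gen 3:              {gen3:7.1f} MB/s")
print(f"10 GT/s w/ 8b/10b:  {hypothetical:7.1f} MB/s")
print(f"Gen 3 / Gen 2 ratio: {gen3 / gen2:.3f}")
```

The ratio comes out just under 2, which is the "nearly double" result: 60% more raw bit rate multiplied by the efficiency gain from 80% to about 98.5%.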

📋 128b/130b Block Structure

128b/130b groups 128 bits (16 bytes) of data into a single block and prepends a 2-bit sync header. This gives 130 bits on the wire for every 128 bits of payload — hence the name. Unlike 8b/10b which operates symbol-by-symbol, 128b/130b operates block-by-block.

Figure 3 — A 128b/130b block. The 2-bit sync header is transmitted first, followed by 128 bits (16 bytes) of payload. The sync header tells the receiver whether to interpret the following 128 bits as a data block (TLPs, DLLPs, idle) or an ordered set block. All 128 payload bits are scrambled.

The block boundary is fundamental to how Gen 3 framing works. Unlike 8b/10b where any K-code can appear between data characters, 128b/130b has no K-codes in the middle of data. Instead, the block structure defines what kind of content is present. A receiver must first achieve block alignment — knowing exactly where each 130-bit block starts in the serial bitstream — before it can interpret any data.
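As a rough illustration of the block layout, the sketch below packs a 2-bit sync header and 16 payload bytes into a 130-bit list and unpacks it again. The helper names are hypothetical, the sync header constants follow the values used in this post, the LSB-first bit order within each byte is an assumption made for the example, and scrambling is deliberately left out.

```python
DATA_BLOCK = (0, 1)          # sync header values as given in this post
ORDERED_SET_BLOCK = (1, 0)

def make_block(sync, payload: bytes):
    """Return the 130 wire bits: 2 sync header bits, then 16 payload bytes."""
    if len(payload) != 16:
        raise ValueError("a block carries exactly 16 payload bytes")
    bits = list(sync)
    for byte in payload:
        # Assumed bit order for this sketch: least significant bit first.
        bits.extend((byte >> i) & 1 for i in range(8))
    return bits

def split_block(bits):
    """Inverse of make_block: recover (sync, payload) from 130 bits."""
    if len(bits) != 130:
        raise ValueError("a block is exactly 130 bits")
    sync = tuple(bits[:2])
    payload = bytes(
        sum(bits[2 + 8 * n + i] << i for i in range(8)) for n in range(16)
    )
    return sync, payload

block = make_block(DATA_BLOCK, bytes(range(16)))
print(len(block))            # 130 bits on the wire for 128 bits of payload
print(split_block(block))
```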

📋 The 2-bit Sync Header

The sync header is always exactly 2 bits, transmitted first in the block. Only two values are defined:

Figure 4 — Sync header values. 01 means data block (scrambled, contains TLPs/DLLPs). 10 means ordered set block (not scrambled, same content on all lanes). The values 00 and 11 are illegal — if a receiver detects an illegal sync header it reports a block alignment error and initiates link retraining.

Why 01 and 10 instead of 0 and 1? With a 2-bit header, there are four possible values: 00, 01, 10, 11. The spec defines only 01 (data) and 10 (ordered set). These two valid values differ in both bit positions, so a single bit error can never turn one valid header into the other. A single bit flip on 01 gives either 00 or 11, and a single bit flip on 10 likewise gives either 00 or 11; every one of these results is illegal and therefore detectable. Only a double error that flips both header bits can make a data block look like an ordered set block or vice versa. This gives the sync header a self-detecting quality for single-bit errors.
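The single-bit-error property is easy to check exhaustively. A minimal sketch, with hypothetical helper names and headers written as two-character strings:

```python
VALID = {"01": "data block", "10": "ordered set block"}

def classify(header: str) -> str:
    """Map a 2-bit sync header to its block type, or 'illegal'."""
    return VALID.get(header, "illegal")

def single_bit_flips(header: str):
    """Both headers reachable from `header` by flipping exactly one bit."""
    flip = {"0": "1", "1": "0"}
    return [flip[header[0]] + header[1], header[0] + flip[header[1]]]

for header in VALID:
    for corrupted in single_bit_flips(header):
        print(header, "->", corrupted, ":", classify(corrupted))
```

Every line printed ends in "illegal", because the two valid headers are two bit flips apart.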

📋 Block Types — Data vs Ordered Set

The sync header divides blocks into two categories with fundamentally different behaviour:

Property	Data Block (sync=01)	Ordered Set Block (sync=10)
Content	TLP bytes · DLLP bytes · STP/SDP framing tokens · IDL padding · EDB · EDS tokens	One of the defined ordered sets: TS1, TS2, SKIP, EIEOS, EIOS, SDS, FTS
Scrambling	Yes — all 128 payload bits are scrambled with the LFSR	No — sent in the clear, receiver must see exact patterns for alignment
Lane requirement	Content may differ per lane (each lane carries its stripe of the packet)	Same on all active lanes simultaneously — required for lane-to-lane deskew
Data Stream context	Part of the Data Stream — TLPs and DLLPs flow in data blocks	Interrupts the Data Stream when it is something other than SKIP; SKIP ordered sets may appear within a Data Stream without interrupting it
How framed	STP token marks TLP start; SDP marks DLLP start; the length field in the STP token locates the TLP end	Entire block is the ordered set; no separate framing tokens needed

📋 Framing Tokens in Data Blocks

In 8b/10b, K-codes (STP, SDP, END, EDB) are control characters that mark packet boundaries. In 128b/130b there are no K-codes — framing is done instead by special Framing Tokens embedded within data blocks. These tokens are specific byte patterns that the receiver recognises within the 128-bit payload of a data block.

Figure 5 — Five framing tokens embedded inside data blocks. STP now includes the TLP’s full DW length count (11-bit field), allowing the receiver to find the end of a TLP by counting DWs rather than looking for an END K-code. SDP marks DLLPs; no end marker is needed since DLLPs are always 8 bytes. EDS marks the end of a data stream before an ordered set block interrupts it.

STP token — the key improvement over 8b/10b STP

In 8b/10b, the TLP framing was: STP K-code at start, END K-code at end. The receiver had to wait to see the END to know where the TLP finished. In 128b/130b the STP token includes the TLP’s complete DW count as an 11-bit length field. The receiver can calculate exactly where the TLP ends from the moment it sees the STP. This enables faster cut-through forwarding at switches and simpler receiver logic.

The STP length field also has a 4-bit Frame CRC and an additional parity bit to protect against errors in the length itself — an error in the length field would cause the receiver to misalign on all subsequent packet boundaries until recovery. The triple-bit-flip detection capability of this combined protection makes the length field very robust.
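The practical effect of the length field can be sketched as simple arithmetic: once the STP token has been seen, the end of the TLP is known without scanning for a terminator. The function names below are hypothetical, and the sketch assumes the length counts the whole TLP in DWs (4-byte units) starting from the first byte of the STP token.

```python
BYTES_PER_DW = 4
BYTES_PER_BLOCK = 16     # payload bytes carried by one 128b/130b block

def tlp_end_offset(stp_offset: int, length_dw: int) -> int:
    """Byte offset, within the data stream, one past the last byte of the TLP."""
    if not 0 < length_dw < 2 ** 11:
        raise ValueError("length must fit the 11-bit field and be non-zero")
    return stp_offset + BYTES_PER_DW * length_dw

def locate(offset: int):
    """Translate a stream byte offset into (block index, byte within block)."""
    return divmod(offset, BYTES_PER_BLOCK)

# A TLP whose STP token starts at stream byte 8 and declares 6 DW.
end = tlp_end_offset(8, 6)
print(end, locate(end))      # the next token starts at the top of block 2
```

The receiver can therefore start looking for the next framing token at a computed position, which is what makes cut-through forwarding straightforward.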

📋 No K-Codes — What Replaced Each One

Every K-code from 8b/10b that had a function in PCIe Gen 1/2 is replaced in Gen 3 by either a framing token or an ordered set mechanism:

8b/10b K-code	Function	Gen 3 replacement
COM (K28.5)	Ordered set start, symbol lock	Block alignment via sync header — receiver finds block boundaries by searching for the valid 01/10 sync header pattern
STP (K27.7)	Start of TLP	STP framing token (4 bytes in data block, carrying the embedded length, Frame CRC, parity and TLP sequence number)
SDP (K28.2)	Start of DLLP	SDP framing token (2 bytes in data block)
END (K29.7)	End of good packet	Not needed — TLP end is calculated from STP length field. Absence of EDB means packet is good.
EDB (K30.7)	End of bad (nullified) packet	EDB framing token (4 bytes = four EDB bytes) appended to nullified TLPs
SKP (K28.0)	Clock compensation	SKIP ordered set — still exists but now an ordered set block (sync=10) instead of K-code characters; SKP bytes may be added/removed
FTS (K28.1)	L0s exit training	FTS ordered set block — same purpose, now sync=10 block type
IDL (K28.3)	Logical idle, electrical idle entry	IDL framing token for logical idle; EIOS ordered set block for electrical idle entry
PAD (K23.7)	Lane padding on wide links	IDL framing tokens fill unused space in data blocks
EIE (K28.7)	Electrical idle exit	EIEOS ordered set block (same purpose, now sync=10)

📋 Gen 3 Scrambler — Why Mandatory

In 8b/10b, scrambling was optional (a bit in the training sequence could disable it). In Gen 3, scrambling is mandatory and cannot be disabled. This is because 128b/130b encoding no longer guarantees transition density or DC balance by itself — it only provides the 2-bit sync header. The scrambler is the only mechanism that prevents long runs of the same bit value in the 128-bit payload, ensures sufficient transitions for CDR (Clock and Data Recovery), and maintains DC balance across the link.

Figure 6 — Scrambling role comparison. In 8b/10b, the CRD mechanism and encoding rules guarantee DC balance and transition density independently. In 128b/130b, there is no CRD and no run-length limit — the scrambler is the only thing providing both properties. Disabling it would cause immediate link failures at any meaningful data pattern.

📋 Scrambler LFSR Details

The Gen 3 scrambler uses a 23-bit Linear Feedback Shift Register (LFSR), significantly more complex than the 16-bit LFSR used in Gen 1/2. The generator polynomial is x²³ + x²¹ + x¹⁶ + x⁸ + x⁵ + x² + 1. Each lane has its own independent LFSR, seeded from a set of eight defined initial values selected by lane number modulo 8, so adjacent lanes always start from different seeds. This ensures that even if all lanes carry identical data bytes, the scrambled bitstreams on adjacent lanes are different, preventing crosstalk from becoming coherent.

Property	Gen 1/2 scrambler	Gen 3+ scrambler
LFSR length	16 bits	23 bits
Generator polynomial	x¹⁶ + x⁵ + x⁴ + x³ + 1	x²³ + x²¹ + x¹⁶ + x⁸ + x⁵ + x² + 1
Reset/resync trigger	Every COM K-code resets all lanes’ LFSRs simultaneously	Every EIEOS ordered set resets all lanes to defined per-lane seeds
Per-lane seeding	Same seed on all lanes	Different seeds on adjacent lanes (eight seeds, selected by lane number modulo 8) for intentional scrambling diversity
Disable option	Yes — “disable scrambling” bit in TS1/TS2	No — cannot be disabled at 8 GT/s or higher
What is scrambled	Data bytes before 8b/10b encoding · K-codes not scrambled	All data block payload bytes · ordered set blocks not scrambled

Different LFSR seeds per lane. Eight seed values are defined, and each lane takes its seed from its lane number modulo 8, so on a x16 link Lane 0 and Lane 8 share a seed while physically adjacent lanes never do. This means Lane 0’s scrambled bitstream differs from Lane 1’s scrambled bitstream even when both carry exactly the same data. This is deliberately designed to prevent systematic crosstalk: if all lanes had the same scrambling pattern, any noise that coupled identically from one lane to another would be correlated and could cause systematic bit errors.
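The scrambling principle can be illustrated with a small 23-bit LFSR. This is a sketch, not a bit-exact model of the spec: the tap set assumed here corresponds to x²³ + x²¹ + x¹⁶ + x⁸ + x⁵ + x² + 1, the Fibonacci-style structure and output bit order are simplifications, and the two seeds are arbitrary illustrative values rather than entries from the real seed table.

```python
MASK23 = (1 << 23) - 1
TAPS = (23, 21, 16, 8, 5, 2)      # assumed tap positions for this sketch

def lfsr_step(state: int) -> int:
    """Advance the 23-bit LFSR by one bit."""
    feedback = 0
    for tap in TAPS:
        feedback ^= (state >> (tap - 1)) & 1
    return ((state << 1) | feedback) & MASK23

def scramble(data: bytes, seed: int) -> bytes:
    """XOR each byte with 8 LFSR output bits. Applying it twice restores the data."""
    state = seed & MASK23
    out = bytearray()
    for byte in data:
        key = 0
        for i in range(8):
            key |= ((state >> 22) & 1) << i   # take the top bit as output
            state = lfsr_step(state)
        out.append(byte ^ key)
    return bytes(out)

payload = bytes(16)                           # worst case: 128 zero bits
lane_a = scramble(payload, 0x123456)          # illustrative seed, not from the spec
lane_b = scramble(payload, 0x0ABCDE)          # illustrative seed, not from the spec
print(lane_a.hex())
print(lane_b.hex())
print(scramble(lane_a, 0x123456) == payload)  # descrambling is the same operation
```

Two lanes carrying the same all-zero payload produce different bit patterns on the wire, and the receiver recovers the payload by running the same LFSR from the same seed.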

📋 Block Alignment at the Receiver

Before a Gen 3 receiver can decode any data, it must achieve block alignment — determining exactly which bit in the incoming serial stream is the first bit of each 130-bit block. This replaces the symbol lock that 8b/10b achieved using the COM character’s unique pattern.

The procedure for achieving block alignment:

The receiver tentatively picks a starting bit position and looks at the first 2 bits as the sync header. It checks whether the value is 01 or 10 (valid) or 00 or 11 (invalid).
If valid, it jumps 130 bits forward and checks the next 2-bit sync header. If that is also valid, it repeats several more times.
After seeing a sufficient number of consecutive valid sync headers at the expected 130-bit intervals, the receiver declares block alignment achieved.
If at any point an invalid sync header appears (00 or 11), the receiver tries the next bit position and starts over.
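The EIEOS ordered set (received as a sync=10 block whose payload alternates 00h and FFh bytes, giving runs of eight zeros and eight ones) provides a reliable reference point because it is unscrambled and has a well-known pattern. In practice, receivers search for the EIEOS pattern during link training to establish the initial block boundary.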
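The sync-header search described in the steps above can be sketched as follows. The function name and the threshold of 64 consecutive valid headers are assumptions for illustration; the test stream uses all-zero payloads and alternating headers so that the result is deterministic, whereas real scrambled payloads look random, which is why many consecutive checks are needed.

```python
BLOCK_BITS = 130
VALID_HEADERS = {(0, 1), (1, 0)}

def find_block_alignment(bits, required=64):
    """Return the first bit offset where `required` consecutive sync headers
    are valid at 130-bit intervals, or None if no offset qualifies."""
    for offset in range(BLOCK_BITS):
        aligned = True
        for k in range(required):
            pos = offset + k * BLOCK_BITS
            if pos + 2 > len(bits):
                aligned = False           # ran out of data before enough checks
                break
            if (bits[pos], bits[pos + 1]) not in VALID_HEADERS:
                aligned = False           # illegal header: try the next offset
                break
        if aligned:
            return offset
    return None

# Build a test stream: 37 bits of leading garbage, then 80 blocks.
stream = [0] * 37
for n in range(80):
    stream += [0, 1] if n % 2 == 0 else [1, 0]   # alternate data / ordered set
    stream += [0] * 128                          # placeholder payload
found = find_block_alignment(stream)
print(found)
```

The search returns the offset of the first true block boundary, here the 37 bits of leading garbage.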

Block alignment, not symbol lock. In 8b/10b, symbol lock was per-lane (finding 10-bit symbol boundaries). In 128b/130b, block alignment is also per-lane (finding 130-bit block boundaries). The principle is similar (use a recognisable pattern at known intervals), but the mechanism is entirely different because there are no K-code characters at arbitrary positions in the bitstream.

📋 Lane-to-Lane Deskew in Gen 3

On a multi-lane link, the parallel bitstreams on different lanes travel slightly different path lengths and through slightly different electrical characteristics. They arrive at the receiver at slightly different times — lane-to-lane skew. The receiver must re-align all lanes before reassembling the packet byte stream.

In 8b/10b, the COM K-code served as the deskew reference — it appeared on all lanes simultaneously. In Gen 3, COM no longer exists. Deskew is performed using ordered set blocks, which must be transmitted simultaneously on all lanes. Any ordered set can serve as the deskew marker, but SKIP (SOS), SDS (Start Data Stream), and EIEOS are most commonly used because they appear regularly.

Property	Gen 1/2	Gen 3
Deskew reference	COM K-code (K28.5) detected on all lanes simultaneously	Ordered set blocks (SKIP/SDS/EIEOS) appearing simultaneously on all lanes
Max receivable skew	20 ns at Gen 1 (5 symbol times of 4 ns) / 8 ns at Gen 2 (4 symbol times of 2 ns)	6 ns = 6 symbol times at 1 ns per symbol
Mechanism	Delay early-arriving COM characters until all lanes are in sync	Delay early-arriving ordered set blocks until all lanes show the ordered set simultaneously
Ongoing deskew	Every COM character provides an opportunity for adjustment	SKIP ordered sets (SOS) sent every 370–375 blocks provide regular adjustment opportunities
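The "delay the early lanes" mechanism in the table reduces to a small calculation. A minimal sketch, assuming the receiver has timestamped the arrival of the same ordered set block on each lane; the function names and the sample arrival times are hypothetical, and the 6 ns limit is the Gen 3 figure from the table.

```python
def deskew_delays(arrival_ns):
    """Delay to add to each lane so every lane lines up with the latest one."""
    latest = max(arrival_ns)
    return [latest - t for t in arrival_ns]

def within_skew_limit(arrival_ns, limit_ns=6.0):
    """True if the spread between earliest and latest lane is receivable."""
    return max(arrival_ns) - min(arrival_ns) <= limit_ns

arrivals = [0.0, 1.5, 0.5, 3.0]          # hypothetical x4 link, times in ns
print(deskew_delays(arrivals))           # earliest lane gets the largest delay
print(within_skew_limit(arrivals))
```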

📋 SKIP Ordered Set for Clock Compensation

Clock tolerance compensation still works in Gen 3 via SKIP ordered sets (SOS), but the mechanism differs from 8b/10b. In Gen 1/2, the transmitter could insert SKP K-codes at fairly fine granularity within the bitstream. In Gen 3, insertion and deletion happen at block boundaries in multiples of 4 SKP symbols (bytes) per SOS.

The transmitter sends an SOS every 370–375 blocks (each block being 130 bits). At 8 GT/s this is roughly 48,100 to 48,750 UI, or approximately every 6 µs.
The SOS is an ordered set block (sync=10) containing 16 bytes of SKIP content. The receiver may add or remove 4 SKP bytes from the SOS to adjust its elastic buffer level.
Two consecutive SOS blocks are not allowed at Gen 3 — they must be separated by at least one data block.
If a large TLP is in progress when an SOS is scheduled, the SOS transmission is deferred to the next packet boundary. Receivers must tolerate SOS separations up to: 375 blocks + (maximum packet size / 16 bytes per block) blocks.

Coarser compensation granularity in Gen 3. In Gen 1/2, SKP characters could be added or removed one character at a time. In Gen 3, they are added or removed 4 bytes at a time within an SOS block. The ±300 ppm clock tolerance itself is unchanged, but at 8 GT/s the same ppm offset accumulates more bits of drift per unit of time, so the elastic buffer is designed to absorb these larger 4-byte adjustments.
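The SOS numbers above can be checked with a short calculation. This is a sketch under stated assumptions: the helper names are hypothetical, the 4096-byte maximum packet size is just an example value, and the worst-case drift assumes the two ends sit at opposite extremes of the ±300 ppm tolerance (600 ppm apart).

```python
BLOCK_BITS = 130
UI_PS = 125                      # one unit interval at 8 GT/s, in picoseconds

def sos_interval(blocks: int):
    """Return (UI, microseconds) between SOS for a given block count."""
    ui = blocks * BLOCK_BITS
    return ui, ui * UI_PS / 1e6

def max_sos_separation_blocks(max_packet_bytes: int) -> int:
    """375 blocks plus the blocks needed to finish a maximum-size packet."""
    return 375 + -(-max_packet_bytes // 16)    # ceiling division by 16 bytes

def worst_case_drift_bits(ui: int, ppm_difference: float = 600.0) -> float:
    """Bits of clock drift accumulated over `ui` unit intervals."""
    return ui * ppm_difference * 1e-6

print(sos_interval(370))                 # shortest scheduling interval
print(sos_interval(375))                 # longest scheduling interval
print(max_sos_separation_blocks(4096))   # example with a 4096-byte packet
print(worst_case_drift_bits(375 * BLOCK_BITS))
```

The drift over one SOS interval comes out a little under 32 bits, which is consistent with adjusting the elastic buffer in 4-byte steps.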

📋 Gen 4 and Gen 5 Extensions

Gen 4 (16 GT/s) and Gen 5 (32 GT/s) both continue to use 128b/130b encoding. The block structure, sync header values, data/ordered set distinction, framing tokens, and scrambler mechanism are all carried forward unchanged. What changes generation-to-generation is the raw bit rate and the equalization requirements.

Generation	Data rate	Encoding	Useful BW per lane	Key additions vs Gen 3
Gen 3	8 GT/s	128b/130b	~984 MB/s	Introduction of 128b/130b, mandatory scrambling, 3-tap Tx FIR
Gen 4	16 GT/s	128b/130b	~1.97 GB/s	Wider Tx FIR coefficient range, stricter eye mask, retimer support
Gen 5	32 GT/s	128b/130b	~3.94 GB/s	Tighter coefficient resolution, adaptive equalization, optional precoding
Gen 6	64 GT/s (PAM4)	Flit + FEC	~7.5 GB/s	PAM4 modulation, mandatory lightweight FEC plus flit CRC, 256-byte flit framing, no 128b/130b

For Gen 4 and Gen 5, the encoding overhead and block structure are identical to Gen 3. The equalization training (Recovery.Equalization state in the LTSSM) runs more iterations with a larger coefficient search space, and the physical channel requirements become tighter — lower-loss laminates, tighter impedance control, and more aggressive equalization are all needed.

⚡ Gen 6 — Beyond 128b/130b

Gen 6 does not use 128b/130b encoding. The switch to PAM4 modulation at 32 Gbaud creates challenges that 128b/130b was not designed to handle. Instead, Gen 6 uses a completely different approach: flit-based framing with mandatory FEC.

Figure 7 — 128b/130b vs Gen 6 flit+FEC. The shift to PAM4 at Gen 6 reduces the eye opening margin dramatically — raw BER rises to ~10⁻⁶ which would be completely unusable with ACK/NAK replay alone. FEC corrects errors in hardware before the Data Link Layer sees them. The flit structure also allows multiple TLPs to be packed efficiently at 64 GT/s speeds.

Why 128b/130b could not scale to Gen 6

PAM4’s reduced eye opening means bit errors occur far more frequently than with NRZ. At 10⁻⁶ raw BER, a Gen 6 x16 link would have roughly one uncorrected bit error every microsecond without FEC. Each of those errors would trigger an ACK/NAK replay, and replaying whole packets costs far more bandwidth than the single corrupted bit. 128b/130b has no error correction: it can detect a bad sync header but cannot fix it. FEC is the only practical solution at PAM4 error rates.
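The "one error every microsecond" figure follows directly from the link rate and the raw BER. A minimal sketch of the arithmetic, with a hypothetical helper name:

```python
def raw_errors_per_second(gt_per_s: float, lanes: int, ber: float) -> float:
    """Expected raw bit errors per second across the whole link."""
    bits_per_second = gt_per_s * 1e9 * lanes
    return bits_per_second * ber

errors = raw_errors_per_second(64, 16, 1e-6)     # Gen 6 x16 at 10^-6 raw BER
mean_gap_us = 1e6 / errors                       # average time between errors

print(f"{errors:.3e} raw bit errors per second")
print(f"one error every {mean_gap_us:.2f} microseconds on average")
```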

Additionally, 128b/130b’s 130-bit block size is too fine-grained for efficient FEC coding. Gen 6 FEC operates over a 256-byte (2048-bit) flit, more than 15 times larger than a 130-bit block. Flit-based framing gives the FEC and the flit CRC a fixed-size unit to protect, so errors are corrected or detected per flit rather than per packet.

128b/130b is used in Gen 3, Gen 4, and Gen 5 only. Gen 6 uses flit-based framing. If you are designing or debugging a Gen 3–5 interface, the 2-bit sync header, scrambler LFSR, block alignment, and framing tokens described in this post are exactly what you will encounter. For Gen 6, refer to the PCIe-12 post on the Data Link Layer flit mechanism.

📋 Quick Reference

Item	Value / Rule
Reason for change	8b/10b’s 20% overhead made doubling Gen 2’s bandwidth impossible at a reasonable frequency increase. Dropping 8b/10b and using 128b/130b delivers nearly the same useful throughput at 8 GT/s as 10 GT/s with 8b/10b would have.
Block structure	130 bits total: 2-bit sync header + 128-bit payload (16 bytes)
Overhead	2/130 = ~1.54% — vs 20% for 8b/10b
Sync header: 01	Data block — payload contains TLPs, DLLPs, framing tokens, IDL. All bytes scrambled.
Sync header: 10	Ordered set block — payload contains ordered set pattern. Not scrambled. Same on all lanes.
Sync header: 00 or 11	Illegal — block alignment error. Link must retrain.
Framing tokens	STP (with length+CRC) · SDP · EDB (4 bytes) · EDS · IDL — embedded inside data blocks
STP improvement	Includes 11-bit TLP DW count + 4-bit Frame CRC + parity. Receiver knows where TLP ends from start, without waiting for END K-code.
No END K-code	TLP end determined by STP length field. If no EDB follows, packet is assumed good.
Scrambling	Mandatory at Gen 3 and above — cannot be disabled. Only mechanism for DC balance and transition density.
Gen 3 LFSR	23-bit, polynomial x²³+x²¹+x¹⁶+x⁸+x⁵+x²+1. Different seeds on adjacent lanes (selected by lane number modulo 8). Reset on EIEOS ordered set.
Block alignment	Receiver finds 130-bit block boundaries by searching for valid 01/10 sync headers at 130-bit intervals. Replaces 8b/10b symbol lock via COM.
Deskew reference	Ordered set blocks (SKIP/SDS/EIEOS) appearing simultaneously on all lanes. Replaces COM-based deskew of Gen 1/2.
SKIP ordered set	Every 370–375 data blocks. Sync=10 block. Receiver adds/removes 4 SKP bytes for clock compensation. No consecutive SOS blocks allowed at Gen 3.
Gen 4 and Gen 5	Same 128b/130b block structure, same tokens, same scrambler approach. Only line rate and equalization complexity change.
Gen 6	Does not use 128b/130b. PAM4 with 256-byte flit framing and mandatory FEC replaces it. FEC corrects 10⁻⁶ raw BER to 10⁻¹⁵ effective.
Generations using 128b/130b	Gen 3 (8 GT/s), Gen 4 (16 GT/s), Gen 5 (32 GT/s)