In our previous lectures, we discussed how PCI Express (PCIe) shifted to a serial transport model and utilized differential signaling to break the physical speed barriers of legacy parallel buses. Now, let’s look at how PCIe actually scales its performance to meet the varying demands of different devices by utilizing flexible Links and Lanes.
Here is a breakdown of how the PCIe physical connection is structured and how bandwidth is calculated across the first three generations of the technology.
The Building Blocks: Links and Lanes
In the PCIe architecture, the complete physical connection between two devices is called a Link.
This Link is constructed from fundamental building blocks known as Lanes. A single Lane consists of exactly one differential transmit signal pair and one differential receive signal pair. Because this is a dual-simplex architecture, data can travel in both directions simultaneously.
While a single Lane (x1) is sufficient for basic communication between devices, modern systems require vastly different levels of performance depending on the peripheral.
Scaling Performance with Link Widths
To achieve scalable performance, PCIe allows designers to aggregate multiple Lanes together into a single Link. The total number of Lanes used in a connection is called the Link Width.
The PCIe specification supports Link Widths in powers of 2 up to 32 Lanes: x1, x2, x4, x8, x16, and x32 (a unique x12 Link is also supported, originally intended to align with InfiniBand standards).
This flexibility allows a platform designer to easily scale performance up or down. The engineering trade-off is straightforward: adding more Lanes drastically increases the bandwidth of the Link, but it proportionally increases the physical space required on the board, the manufacturing cost, and the overall power consumption.
Calculating Bandwidth Across Gen1, Gen2, and Gen3
When calculating the aggregate bandwidth of a PCIe Lane, we must account for the raw bit rate, the fact that data travels in two directions simultaneously (x2), and the overhead introduced by the data encoding process.
Here is how the bandwidth math breaks down across the first three generations:
Gen1 (PCIe 1.x)
- The Math: Gen1 operates at a raw bit rate of 2.5 GT/s. However, the data stream uses 8b/10b encoding to embed the clock and prevent errors, meaning 10 bits are physically transmitted for every 8 bits (1 byte) of actual data.
- Calculation: (2.5 GT/s x 2 directions) / 10 bits per byte transmitted = 0.5 GB/s aggregate bandwidth per Lane.
Gen2 (PCIe 2.x)
- The Math: To double the performance for Gen2, engineers simply doubled the raw frequency to 5.0 GT/s. It still relies on the same 8b/10b encoding scheme.
- Calculation: (5.0 GT/s x 2 directions) / 10 bits per byte transmitted = 1.0 GB/s aggregate bandwidth per Lane.
Gen3 (PCIe 3.0)
- The Math: For Gen3, the industry wanted to double bandwidth again. However, simply doubling the frequency to 10 GT/s posed severe physical and electrical challenges. Instead, engineers raised the raw rate only to 8.0 GT/s and replaced the inefficient 8b/10b encoding with a far more efficient 128b/130b scheme. Because the overhead is now just 2 bits for every 128 payload bits (about 1.5%), we can simply divide by 8 bits per byte in the calculation.
- Calculation: (8.0 GT/s x 2 directions) / 8 bits per byte = 2.0 GB/s aggregate bandwidth per Lane.
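The per-Lane math above can be sketched as a short calculation. This is just an illustration of the formula (raw rate x 2 directions x encoding efficiency / 8 bits per byte); the function and dictionary names are mine, not part of any PCIe specification. Note that applying the exact 128/130 efficiency for Gen3 yields about 1.97 GB/s, which the lecture rounds to 2.0 GB/s because the overhead is so small.

```python
# Per-Lane aggregate bandwidth for PCIe Gen1-Gen3 (illustrative names).
# Each entry: (raw bit rate in GT/s, encoding efficiency = payload bits / encoded bits)
GENERATIONS = {
    "Gen1": (2.5, 8 / 10),     # 8b/10b encoding
    "Gen2": (5.0, 8 / 10),     # 8b/10b encoding
    "Gen3": (8.0, 128 / 130),  # 128b/130b encoding
}

def lane_bandwidth_gbps(raw_gt_s: float, efficiency: float) -> float:
    """Aggregate (both directions) per-Lane bandwidth in GB/s."""
    # raw bit rate * 2 directions * encoding efficiency, then / 8 bits per byte
    return raw_gt_s * 2 * efficiency / 8

for gen, (rate, eff) in GENERATIONS.items():
    print(f"{gen}: {lane_bandwidth_gbps(rate, eff):.3f} GB/s per Lane")
```

Running this prints 0.500 for Gen1, 1.000 for Gen2, and roughly 1.969 for Gen3 (the "2.0 GB/s" figure ignores the tiny 128b/130b overhead).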
The Multiplier Effect
To find the total peak bandwidth of any PCIe connection, you simply take the single-Lane bandwidth for that generation and multiply it by the Link Width.
For example, a modern graphics card utilizing a Gen3 x16 slot takes the 2.0 GB/s per Lane and multiplies it across all 16 Lanes, delivering a staggering 32 GB/s of aggregate bandwidth.
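The multiplier effect is a one-line computation. A minimal sketch, using the rounded per-Lane figures from this lecture (the names here are illustrative):

```python
# Peak Link bandwidth = per-Lane bandwidth x Link Width.
PER_LANE_GBPS = {"Gen1": 0.5, "Gen2": 1.0, "Gen3": 2.0}  # rounded lecture figures

def link_bandwidth_gbps(gen: str, width: int) -> float:
    """Aggregate peak bandwidth of a Link in GB/s."""
    return PER_LANE_GBPS[gen] * width

# A graphics card in a Gen3 x16 slot:
print(link_bandwidth_gbps("Gen3", 16))  # prints 32.0
```

The same function reproduces any combination, e.g. a Gen2 x4 storage controller gives 4.0 GB/s.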
