In our final deep dive into the PCIe layered architecture, we reach the foundation: the Physical Layer. This is the lowest hierarchical layer, responsible for taking the fully assembled packets from the Data Link Layer and transforming them into the raw electrical signals that physically travel across the wire.
Here is a breakdown of how the Physical Layer preps, encodes, and successfully negotiates the transmission of data at blistering speeds.
Prepping the Data: Striping and Scrambling
Before data can be serialized and sent over the pins, the Logical portion of the Physical Layer must prep it for the journey.
- Framing: For Gen1 and Gen2 speeds, the layer first adds Start and End “framing” characters so the receiver can easily detect the exact boundaries of the packet.
- Byte Striping: If the connection uses multiple Lanes (like a x4 or x16 slot), the layer performs byte striping, dealing the packet’s bytes out across the available Lanes in round-robin order: the first byte goes to Lane 0, the next to Lane 1, and so on. Each Lane then operates as its own serial path, so a wider Link moves proportionally more data in the same amount of time.
- Scrambling: To prevent repetitive bit patterns from traveling down the wire and to reduce the electromagnetic interference (EMI) generated by the Link, each byte is XORed with a pseudo-random value produced by a linear feedback shift register (LFSR). The receiver runs an identical LFSR in step with the transmitter, so applying the same XOR again recovers the original bytes.
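As a rough illustration of the striping and scrambling steps above, here is a minimal Python sketch. The function names (stripe, unstripe, scramble) are made up for this example. The scrambler uses the 16-bit feedback polynomial x^16 + x^5 + x^4 + x^3 + 1 associated with Gen1/Gen2, but its bit ordering and reset behavior are simplified, so treat it as a model of the idea rather than a bit-exact implementation.

```python
def stripe(packet: bytes, num_lanes: int) -> list[bytes]:
    """Deal bytes round-robin: byte 0 to Lane 0, byte 1 to Lane 1, and so on."""
    return [packet[lane::num_lanes] for lane in range(num_lanes)]


def unstripe(lanes: list[bytes]) -> bytes:
    """Interleave the per-Lane byte streams back into the original order."""
    out = bytearray()
    for i in range(max(len(lane) for lane in lanes)):
        for lane in lanes:
            if i < len(lane):
                out.append(lane[i])
    return bytes(out)


def scramble(data: bytes, seed: int = 0xFFFF) -> bytes:
    """XOR each byte with 8 bits of LFSR output (simplified, not bit-exact).

    Running the same function again with the same seed undoes the
    scrambling, because XOR is its own inverse.
    """
    state = seed
    out = bytearray()
    for byte in data:
        key = 0
        for bit in range(8):
            msb = (state >> 15) & 1
            key |= msb << bit
            # Galois-style LFSR step for x^16 + x^5 + x^4 + x^3 + 1
            state = ((state << 1) & 0xFFFF) ^ (0x0039 if msb else 0)
        out.append(byte ^ key)
    return bytes(out)


# Hypothetical 10-byte packet sent over a x4 Link.
packet = bytes(range(10))
tx_lanes = [scramble(lane) for lane in stripe(packet, 4)]   # transmit side
rx_packet = unstripe([scramble(lane) for lane in tx_lanes])  # receive side
assert rx_packet == packet
```

Each Lane is scrambled separately here, after striping, which mirrors the order in which the steps are listed above.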
The Encoding Evolution: 8b/10b vs. 128b/130b
As we learned in previous lectures, PCIe embeds the clock directly into the data stream, which requires a specific data encoding process.
- Gen1 and Gen2 (8b/10b Encoding): The first two generations of PCIe use an 8b/10b encoding scheme. This logic takes the 8-bit data characters and converts them into 10-bit symbols before transmission. While this guarantees enough signal transitions for the receiver to recover the clock, it means 20% of the bits on the wire carry no data: a Gen1 Lane signaling at 2.5 GT/s delivers only 2.0 Gb/s.
- Gen3 (128b/130b Encoding): When designing PCIe Gen3, engineers realized that carrying over the 20% overhead of 8b/10b encoding would be too inefficient at 8.0 GT/s speeds. To solve this, Gen3 hardware drops the 8b/10b step and uses 128b/130b encoding, which prepends a 2-bit sync header to each 128-bit block of data and cuts the overhead to about 1.5%. Because this scheme does not guarantee signal transitions on its own, Gen3 relies on scrambling to keep the transition density high enough for clock recovery.
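The overhead figures can be checked with a few lines of arithmetic. The sketch below assumes the commonly quoted per-Lane signaling rates of 2.5, 5.0, and 8.0 GT/s for Gen1, Gen2, and Gen3. The names ENCODINGS, effective_gbps, and overhead_percent are invented for this example.

```python
# (signaling rate in GT/s, payload bits, total bits on the wire)
ENCODINGS = {
    "Gen1": (2.5, 8, 10),
    "Gen2": (5.0, 8, 10),
    "Gen3": (8.0, 128, 130),
}


def effective_gbps(gen: str, lanes: int = 1) -> float:
    """Data rate left over after encoding overhead, per direction."""
    rate, payload_bits, total_bits = ENCODINGS[gen]
    return rate * payload_bits / total_bits * lanes


def overhead_percent(gen: str) -> float:
    """Share of transmitted bits that carry no payload."""
    _, payload_bits, total_bits = ENCODINGS[gen]
    return (1 - payload_bits / total_bits) * 100


for gen in ENCODINGS:
    print(f"{gen}: {effective_gbps(gen):.3f} Gb/s per Lane, "
          f"{overhead_percent(gen):.2f}% encoding overhead")
```

These numbers cover encoding overhead only. Packet headers, framing, and Data Link Layer traffic reduce the usable throughput further.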
Once encoded, the bits are serialized and clocked out over the physical traces as differential signals.
Receiving and Reassembling
When the receiver takes in this high-speed bit stream, it must reverse the entire process.
After converting the serial bits back into symbols (or bytes for Gen3), the data is pushed through an elastic buffer. This is a clever hardware mechanism designed to compensate for the slight frequency drift and clock tolerances between the transmitting and receiving devices’ internal clocks. The data is then decoded, de-scrambled, and un-striped from the multiple Lanes back into a single, cohesive packet. Once the framing characters are stripped away, the completed packet is pushed up to the Data Link Layer.
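To get a feel for how much drift the elastic buffer has to absorb, here is a small back-of-the-envelope calculation. It assumes a reference clock tolerance of ±300 ppm per device, a figure commonly cited for PCIe but not stated in this article. The function names are made up for illustration.

```python
PPM_TOLERANCE = 300  # assumed tolerance of each device's clock, in parts per million


def symbols_per_slip(ppm_each_side: float = PPM_TOLERANCE) -> float:
    """Symbols that pass before the two clocks drift apart by one full symbol.

    Worst case: one clock runs fast by the full tolerance while the
    other runs slow by the same amount.
    """
    worst_case_ppm = 2 * ppm_each_side
    return 1_000_000 / worst_case_ppm


def drift_symbols(symbols_sent: int, ppm_each_side: float = PPM_TOLERANCE) -> float:
    """Worst-case drift, in symbols, accumulated over a run of symbols."""
    return symbols_sent * 2 * ppm_each_side / 1_000_000


print(f"One symbol of drift roughly every {symbols_per_slip():.0f} symbols")
print(f"Drift over one million symbols: {drift_symbols(1_000_000):.0f} symbols")
```

Under that assumption the clocks can slip by one symbol roughly every 1,667 symbols. This is why the transmitter periodically sends SKP Ordered Sets, filler that the elastic buffer can lengthen or shorten without touching real data. At Gen1 and Gen2 speeds they are scheduled every 1180 to 1538 symbol times, comfortably inside that window.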
Link Training and Initialization
Before any of this high-speed communication can happen, the two connected devices must figure out how to talk to each other. The Physical Layer is responsible for an automatic, hardware-driven initialization process known as Link Training.
Managed by the Link Training and Status State Machine (LTSSM), this process uses special sequences of symbols called Ordered Sets to establish the connection. Ordered Sets are exchanged only between the Physical Layers at the two ends of a Link and are never routed onward to other devices. During Link Training, the hardware automatically discovers and resolves several crucial connection variables, including:
- Link Width and Data Rate: Determining how many Lanes are physically wired and what maximum speed both devices can support (Gen1, Gen2, or Gen3).
- Lane Reversal and Polarity Inversion: Automatically compensating when the board designer has wired the Lanes in reverse order or swapped the positive/negative pins of a differential pair, whether by accident or deliberately to simplify routing.
- Bit Lock and Symbol Lock: Ensuring the receiver’s clock recovery circuit (built around a Phase-Locked Loop, or PLL) has locked onto the incoming bit transitions and that the receiver has found the correct symbol boundaries in the bit stream.
- Lane-to-Lane De-skew: Because signals traveling across multiple parallel Lanes may arrive at slightly different times, the receiver aligns the incoming data across all Lanes to perfectly reassemble the striped bytes.
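The outcome of two of these steps can be sketched in a few lines of Python. This is a deliberately simplified model: the real LTSSM arrives at these results by exchanging Ordered Sets through a series of states, not by calling a function. The Port class, the negotiate and reverse_lanes helpers, and the example port capabilities are all hypothetical.

```python
from dataclasses import dataclass

RATES_GT_S = {"Gen1": 2.5, "Gen2": 5.0, "Gen3": 8.0}


@dataclass
class Port:
    widths: set[int]  # Link widths this port can operate at, e.g. {1, 4, 8}
    max_gen: str      # highest generation this port supports


def negotiate(a: Port, b: Port) -> tuple[int, str]:
    """Pick the widest Link and fastest data rate that both ports support."""
    common_widths = a.widths & b.widths
    width = max(common_widths)  # assumes both ports support at least x1
    gen = min(a.max_gen, b.max_gen, key=lambda g: RATES_GT_S[g])
    return width, gen


def reverse_lanes(lane_numbers: list[int]) -> list[int]:
    """Lane reversal: logical Lane 0 maps to the highest physical Lane."""
    return lane_numbers[::-1]


# Hypothetical example: a x16 Gen3 slot holding a x4 Gen2 card.
slot = Port(widths={1, 4, 8, 16}, max_gen="Gen3")
card = Port(widths={1, 4}, max_gen="Gen2")
print(negotiate(slot, card))        # (4, 'Gen2')
print(reverse_lanes([0, 1, 2, 3]))  # [3, 2, 1, 0]
```

The key point is that the Link settles on the best configuration both ends share, which is why a narrower or slower card still works in a wider, faster slot.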
