In our previous lectures, we explored the foundations of the PCI bus and how it revolutionized the PC industry with high-speed, plug-and-play parallel data transfers. However, as processors grew faster and peripheral bandwidth demands skyrocketed, the legacy PCI architecture eventually hit a wall. Parallel buses inevitably reach a practical ceiling on effective bandwidth and cannot readily be made to go faster.
Here is a look at the physical and electrical limitations that caused the parallel PCI bus to hit its speed limit, setting the stage for the next revolution in computer hardware.
Reflected-Wave Signaling and the Load Limit
To understand the PCI speed limit, we must first look at how the bus transmits electrical signals. To reduce power consumption and manufacturing costs, PCI devices use a technique called “reflected-wave signaling”.
Instead of driving a signal at full strength, devices use weak transmit buffers that drive the line to only about half the voltage needed to register a logic transition. Because the transmission line is intentionally left unterminated, the open end presents an effectively infinite impedance, and the wave reflects back in phase. The reflection is additive: as the wave travels back toward the transmitter, the incident and reflected waves sum, doubling the signal to the full required voltage.
The problem? The total elapsed time for this entire process—the signal propagating down the wire, reflecting back, and allowing for setup time at the receiver—must be less than a single clock period.
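To make that constraint concrete, here is a minimal Python sketch of the round-trip timing check. The function name and the sample delay figures are illustrative assumptions, not values from the PCI specification:

```python
def reflected_wave_ok(one_way_ns: float, setup_ns: float, clock_mhz: float) -> bool:
    """Check whether a reflected-wave signal settles within one clock period.

    The wave must travel to the unterminated end of the bus, reflect back
    (doubling the half-strength drive to full voltage), and still leave
    setup time at the receiver before the next clock edge.
    """
    period_ns = 1000.0 / clock_mhz      # clock period in nanoseconds
    round_trip_ns = 2 * one_way_ns      # out to the open end and back
    return round_trip_ns + setup_ns <= period_ns

# Illustrative numbers (assumed): a 10 ns one-way propagation delay
# fits comfortably in a ~30 ns period at 33 MHz, but not in ~15 ns at 66 MHz.
print(reflected_wave_ok(one_way_ns=10.0, setup_ns=3.0, clock_mhz=33))  # True
print(reflected_wave_ok(one_way_ns=10.0, setup_ns=3.0, clock_mhz=66))  # False
```

Doubling the clock frequency halves the period, so a bus layout that passes the check at 33 MHz can fail it at 66 MHz with no other change.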
- At 33 MHz: The clock period is 30 nanoseconds, which allows the signal enough time to safely bounce across about 10 to 12 electrical loads (roughly 4 to 5 add-in card slots).
- At 66 MHz: The clock period is halved to 15 nanoseconds. With only half the timing budget, the number of loads must be drastically reduced, meaning a 66 MHz PCI bus can typically only support a single add-in card slot.
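The two bullet points above follow directly from the shrinking period: fewer nanoseconds in the budget means fewer electrical loads the reflection can traverse. A rough back-of-the-envelope sketch, where the per-load delay and fixed overhead figures are assumed purely for illustration:

```python
def max_loads(clock_mhz: float, ns_per_load: float, overhead_ns: float) -> int:
    """Rough estimate of how many electrical loads fit in one clock period."""
    period_ns = 1000.0 / clock_mhz
    budget_ns = period_ns - overhead_ns       # time left after fixed delays
    return max(int(budget_ns // ns_per_load), 0)

# Assumed ~2.4 ns cost per load and 3 ns of fixed overhead:
print(max_loads(33, ns_per_load=2.4, overhead_ns=3.0))  # 11 loads
print(max_loads(66, ns_per_load=2.4, overhead_ns=3.0))  # 5 loads
```

With these assumed numbers, halving the period roughly halves the supportable load count, matching the drop from four or five slots at 33 MHz to a single slot at 66 MHz.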
The Shrinking Timing Budget
Parallel bus designs rely on a common (or distributed) clock, meaning data is driven out by the transmitter on one clock edge and latched by the receiver on the very next clock edge. Therefore, your entire “timing budget” to move data is exactly one clock period. As frequencies increase, that clock period shrinks, magnifying several inherent physical problems:
1. Signal Skew: In a parallel bus like PCI, 32 or 64 bits of data are sent across parallel wires at the exact same time. However, because of minute physical differences in the traces and pins, these signals experience slightly different delays and arrive at the receiver at slightly different times. This is known as signal skew. Because the receiver cannot latch the data until every single bit is ready and stable, the system is always forced to wait for the slowest bit.
2. Clock Skew: Just as the data signals experience delays, so does the shared system clock. The arrival time of the common clock at the transmitting device is not precisely the same as its arrival time at the receiving device. This difference, known as clock skew, further reduces the already tight timing budget. Board layout designers work tirelessly to minimize clock skew, but in a parallel architecture, it can never be completely eliminated.
3. Flight Time and the Limits of Parallel Trace Lengths: The actual time it takes for a signal to propagate from the transmitter to the receiver is called the flight time. For the parallel bus model to work, the flight time must be less than the clock period.
With a 66 MHz clock, the total period is just 15 nanoseconds, and after subtracting the required setup time at the receiver (3 ns) and output delays at the transmitter, the remaining time for the signal to physically fly across the motherboard is incredibly small. To ensure the signal arrives within this tiny window, motherboard designers are forced to implement shorter and shorter physical signal traces. Eventually, making traces short enough to beat the shrinking clock period becomes completely unrealistic for practical board design.
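The budget arithmetic in this paragraph can be sketched as follows. The 3 ns setup time comes from the text above; the output-delay and skew allowances are assumed placeholders:

```python
def flight_time_budget_ns(clock_mhz: float, setup_ns: float,
                          output_delay_ns: float, skew_ns: float) -> float:
    """Time left for the signal to cross the board after fixed costs.

    One clock period, minus the receiver's setup time, the transmitter's
    clock-to-output delay, and an allowance for clock/signal skew.
    """
    period_ns = 1000.0 / clock_mhz
    return period_ns - setup_ns - output_delay_ns - skew_ns

# 3 ns setup is from the text; the 4 ns output delay and 2 ns skew are assumed:
print(round(flight_time_budget_ns(33, 3.0, 4.0, 2.0), 1))  # 21.3 ns to work with
print(round(flight_time_budget_ns(66, 3.0, 4.0, 2.0), 1))  # 6.2 ns: traces must be very short
```

Note that the fixed costs do not shrink when the clock speeds up, so they consume a rapidly growing fraction of the period, which is exactly why trace lengths become the binding constraint.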
Reaching the Ceiling
Because of the combined hurdles of reflected-wave signaling, signal skew, clock skew, and flight time limits, the PCI bus frequency could not realistically be increased beyond 66 MHz using its original “non-registered input” model. These physical limits of parallel architectures ultimately forced the industry to rethink how devices communicate, leading first to the registered inputs of PCI-X, and eventually to the abandonment of shared parallel buses entirely in favor of the serial point-to-point architecture of PCI Express (PCIe).
