As we saw in the previous lecture, the traditional parallel PCI bus eventually hit a physical speed ceiling around 66 MHz. To address the relentless demand for higher bandwidth without abandoning the massive existing ecosystem of PCI hardware and software, the industry introduced PCI-X (PCI-eXtended).
PCI-X was designed as a logical extension of the PCI architecture, maintaining complete backward compatibility while drastically improving bus performance and efficiency. Here is a look at the engineering innovations that allowed PCI-X to push parallel bus bandwidth up to 4 GB/s.
Beating the Clock with Registered Inputs
To push clock frequencies beyond 66 MHz, engineers had to shrink the bus’s required “timing budget”: the portion of each clock period consumed by fixed delays rather than by actual signal propagation. The traditional PCI model used “non-registered inputs,” which required a relatively long signal setup time (about 3 nanoseconds) at the receiver.
PCI-X solved this by using registered inputs. By registering (or latching) all input signals with a flip-flop at the input pin of the target device, PCI-X reduced the required signal setup time to less than 1 nanosecond. Additionally, PCI-X implemented internal Phase-Locked Loops (PLLs) to provide phase-shifted clocks, allowing outputs to be driven slightly earlier and inputs to be sampled slightly later.
These clever timing tricks bought back enough time for signal propagation to allow the shared parallel bus frequency to be doubled to 133 MHz (roughly 1 GB/s on a 64-bit bus). Later, the PCI-X 2.0 revision achieved staggering peak data rates of up to 4 GB/s by implementing a source-synchronous clocking model that supported Double Data Rate (DDR) and Quad Data Rate (QDR) transfers.
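The timing-budget arithmetic above can be sketched numerically. The setup times (about 3 ns versus under 1 ns) come from the discussion here; the clock-to-output and skew figures in this sketch are illustrative assumptions, not values taken from the PCI or PCI-X specifications.

```python
# Illustrative timing-budget arithmetic for a shared parallel bus.
# Only the setup times reflect the figures quoted in the text; the
# clock-to-output and skew numbers are assumed for illustration.

def flight_time_budget_ns(clock_mhz, setup_ns, clk_to_out_ns, skew_ns):
    """Time left for signal propagation within one clock period."""
    period_ns = 1000.0 / clock_mhz
    return period_ns - setup_ns - clk_to_out_ns - skew_ns

# Conventional PCI at 66 MHz: a ~3 ns setup time eats into the budget.
pci_margin = flight_time_budget_ns(66, setup_ns=3.0,
                                   clk_to_out_ns=6.0, skew_ns=1.0)

# PCI-X at 133 MHz: registered inputs cut setup below 1 ns, so a
# half-length clock period still leaves time for signal flight.
pcix_margin = flight_time_budget_ns(133, setup_ns=1.0,
                                    clk_to_out_ns=3.8, skew_ns=0.5)

print(f"PCI   66 MHz flight-time budget: {pci_margin:.1f} ns")
print(f"PCI-X 133 MHz flight-time budget: {pcix_margin:.1f} ns")
```

Even though the 133 MHz period is only about 7.5 ns (half of the 15.2 ns available at 66 MHz), the shrunken setup time keeps the remaining propagation budget positive, which is the whole point of registering the inputs.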
Maximizing Efficiency with the Split-Transaction Model
Increasing the raw clock speed was only half the battle; PCI-X also fundamentally changed how devices handled delays. In legacy PCI, if a target device wasn’t immediately ready to return requested data, it would either stall the entire bus by inserting “Wait States” or force the initiator to repeatedly “Retry” the transaction. Both methods wasted valuable bus time.
PCI-X introduced the highly efficient Split-Transaction Model, which explicitly divides communicating devices into two roles: the Requester and the Completer. Here is how it eliminates the stalls and wasted retries described above:
- The Request: The Requester initiates a read transaction on the bus.
- The Split Response: If the Completer cannot return the data immediately, it records the transaction details and terminates the cycle with a “split response”.
- Freeing the Bus: The Requester receives this response, puts the transaction in a queue, and completely releases the bus to the idle state. The bus is now free for other devices to use, and the Requester is free to do other work—including initiating new requests.
- The Split Completion: Once the Completer has finally gathered the requested data, it arbitrates for bus ownership and initiates a “split completion” bus cycle to deliver the data back to the Requester.
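The four steps above can be sketched as a small simulation. The role names (Requester, Completer, split response, split completion) follow the text; the queueing details and identifiers are illustrative, not taken from the PCI-X specification.

```python
# Minimal sketch of the split-transaction handshake described above.
from collections import deque

class Completer:
    """Target side of a read: remembers split requests, answers later."""
    def __init__(self, data):
        self.data = data
        self.pending = deque()   # split requests waiting for data

    def read_request(self, requester_id, byte_count):
        # Data is not ready yet: remember who asked and for how much,
        # then terminate the transaction with a "split response".
        self.pending.append((requester_id, byte_count))
        return "split_response"

    def split_completion(self):
        # Later, after winning bus arbitration, deliver the data
        # back to the device that originally asked for it.
        requester_id, byte_count = self.pending.popleft()
        return requester_id, self.data[:byte_count]

class Requester:
    """Initiator side: queues the transaction and releases the bus."""
    def __init__(self, rid):
        self.rid = rid                 # (bus, device, function) identity
        self.outstanding = []

    def issue_read(self, completer, byte_count):
        if completer.read_request(self.rid, byte_count) == "split_response":
            self.outstanding.append(byte_count)  # queue it; bus goes idle

completer = Completer(data=b"payload-bytes")
requester = Requester(rid=(0, 4, 0))
requester.issue_read(completer, byte_count=7)

# The bus is free for other devices here; eventually the completer replies:
rid, data = completer.split_completion()
print(rid, data)
```

The key design point is visible in `issue_read`: the Requester does not wait for data at all. It records the outstanding request and returns, which models the bus dropping to idle between the split response and the split completion.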
The Attribute Phase and Overall Efficiency
To make the Split-Transaction Model possible, PCI-X added a new Attribute Phase to the beginning of each transaction. During this phase, the Requester broadcasts critical information, including exactly how much data it wants (the byte count) and its unique identity (its Bus, Device, and Function number). Because the Completer knows exactly who asked for the data and how much is needed, it can later address the split completion to the correct device.
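The attribute fields named above (byte count plus the Requester’s Bus/Device/Function identity) can be pictured as bit fields packed into a single attribute word. The bit widths and layout in this sketch are illustrative assumptions, not the exact encoding defined by the PCI-X specification.

```python
# Illustrative packing of attribute-phase fields into one word.
# Field widths and positions are assumptions for illustration only.
from dataclasses import dataclass

@dataclass
class Attributes:
    byte_count: int   # how much data the Requester wants (assumed 12 bits)
    bus: int          # Requester bus number (assumed 8 bits)
    device: int       # Requester device number (assumed 5 bits)
    function: int     # Requester function number (assumed 3 bits)

    def pack(self) -> int:
        """Assemble the fields into a single attribute word."""
        assert 0 < self.byte_count <= 4096
        word = self.byte_count % 4096       # encode a full 4096 bytes as 0
        word |= self.bus << 12
        word |= self.device << 20
        word |= self.function << 25
        return word

# A hypothetical Requester at bus 2, device 4, function 0 asking for 512 bytes:
attr = Attributes(byte_count=512, bus=2, device=4, function=0)
print(hex(attr.pack()))
```

Whatever the real field layout, the principle is the same: every split request carries enough identity information on the bus that the eventual split completion can be routed back to exactly one device.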
By preventing devices from holding the bus hostage while waiting for data, these protocol enhancements drastically increased real-world throughput. While standard PCI generally hovered around 50% to 60% transfer efficiency, the PCI-X protocol pushed bus utilization efficiency to an impressive 85%.
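A quick back-of-the-envelope calculation shows what those efficiency figures mean in practice. The clock rates, bus width, and utilization percentages come from the text; effective throughput is simply peak bandwidth scaled by utilization.

```python
# Effective throughput = peak bandwidth x bus-utilization efficiency.
# Figures (66/133 MHz, 64-bit bus, ~55% vs ~85%) are from the text.

def effective_mb_s(clock_mhz, bus_width_bits, efficiency):
    peak_mb_s = clock_mhz * bus_width_bits / 8   # peak in MB/s
    return peak_mb_s * efficiency

pci_eff  = effective_mb_s(66,  64, 0.55)   # legacy PCI, midpoint of 50-60%
pcix_eff = effective_mb_s(133, 64, 0.85)   # PCI-X 133 with split transactions

print(f"PCI   66 MHz/64-bit effective: ~{pci_eff:.0f} MB/s")
print(f"PCI-X 133 MHz/64-bit effective: ~{pcix_eff:.0f} MB/s")
```

In other words, the protocol improvements compound with the clock-speed increase: doubling the clock alone would have doubled throughput, but the jump from roughly 290 MB/s to roughly 900 MB/s of usable bandwidth also reflects the higher bus utilization.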
