In our exploration of PCI Express (PCIe) architecture, we have established that devices use configuration space to achieve a “plug-and-play” environment. However, an important rule governs this space: only the Root Complex is permitted to originate Configuration Requests. This restriction ensures that the system processor acts as the central authority, preventing the anarchy that would result if any device could alter the configuration of another.
Because standard processors are generally only capable of generating Memory and IO requests, the Root Complex must serve as a translator, converting specific processor accesses into Configuration Requests on the bus. Over the evolution of PC architecture, the method for doing this has shifted dramatically. Here is a look at the legacy PCI mechanism and the modern PCIe Enhanced Access method.
The Legacy PCI Mechanism: The IO-Indirect Method
When the original PCI standard was developed, the dominant Intel x86 processors were limited to a tiny 64KB IO address space, which was already highly cluttered with existing system resources. Because mapping the configuration registers for every possible device directly into IO or memory space was not seen as a feasible solution at the time, engineers opted for an “indirect” address mapping mechanism.
This legacy IO-indirect method relies on two 32-bit IO ports located in the Host Bridge of the Root Complex: the Configuration Address Port (at IO addresses 0CF8h – 0CFBh) and the Configuration Data Port (at IO addresses 0CFCh – 0CFFh).
Generating a transaction requires a strict two-step process:
- Setting the Address: The CPU first executes an IO write to the Configuration Address Port (0CF8h). This 32-bit write contains the target Bus, Device, and Function (BDF) numbers, the specific Register Number being targeted, and an “Enable” bit (bit 31) set to 1.
- Accessing the Data: Next, the CPU executes an IO read or write to the Configuration Data Port (0CFCh). The Host Bridge sees this access, recognizes that the Enable bit was set, and translates the CPU’s IO request into a true Configuration Request on the PCI bus.
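The value written in the first step can be sketched as a small encoding function. This is a minimal illustration, assuming the conventional layout of the address register (bus in bits 23:16, device in bits 15:11, function in bits 10:8, dword-aligned register number in bits 7:2). The function and constant names are made up for this example, and nothing here touches real IO ports.

```python
# Sketch of the legacy Configuration Address Port encoding.
# Names are illustrative; the bit positions are the conventional PCI layout.

CONFIG_ADDRESS_PORT = 0x0CF8  # 32-bit Configuration Address Port
CONFIG_DATA_PORT = 0x0CFC     # 32-bit Configuration Data Port
ENABLE_BIT = 1 << 31          # bit 31 must be 1 for the bridge to translate


def legacy_config_address(bus: int, device: int, function: int, offset: int) -> int:
    """Build the 32-bit value written to the Configuration Address Port."""
    if not 0 <= bus <= 255:
        raise ValueError("bus must be 0..255")
    if not 0 <= device <= 31:
        raise ValueError("device must be 0..31")
    if not 0 <= function <= 7:
        raise ValueError("function must be 0..7")
    if not 0 <= offset <= 0xFF:
        raise ValueError("offset must be within the first 256 bytes")
    return (
        ENABLE_BIT
        | (bus << 16)
        | (device << 11)
        | (function << 8)
        | (offset & 0xFC)  # register number, aligned to a doubleword
    )


# Step 1 would write this value to port 0CF8h; step 2 would then read or
# write port 0CFCh to trigger the Configuration Request.
value = legacy_config_address(bus=0, device=31, function=3, offset=0x10)
print(f"{value:#010x}")  # 0x8000fb10
```

On real hardware the two port accesses would be privileged IO instructions issued by firmware or an operating system kernel; the sketch only shows how the address is packed.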
Hitting the Limits of the Legacy Model
While the IO-indirect method solved the address space limitations of the 1990s, it eventually became a serious limitation for two reasons:
- The 256-Byte Limit: The format of the legacy Configuration Address Port only allocates enough bits to target the first 64 doublewords (256 bytes) of a function’s configuration space. It cannot address anything beyond offset FFh, so the rest of the 4KB Extended Configuration Space introduced by PCIe is out of its reach.
- The Multi-Threading Problem: Early PCs used single-core processors running single threads, so taking two separate steps to complete an access was perfectly safe. However, in modern multi-core, multi-threaded CPUs, this two-step process is highly vulnerable. Without locking around the pair of accesses, Thread A could write its target to the Address Port, but before it can access the Data Port, Thread B might overwrite the Address Port with a completely different request. Thread A’s data access would then land on Thread B’s target.
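The race described above can be replayed with a toy model. This is only a sketch: `SimulatedHostBridge`, its methods, and the two target values are hypothetical stand-ins, and the interleaving is written out by hand so the result is deterministic rather than depending on real thread timing.

```python
import threading


class SimulatedHostBridge:
    """Toy model of the two legacy ports; it performs no hardware access."""

    def __init__(self, registers):
        self.registers = registers  # maps an address-port value to its data
        self.address_port = 0
        self.lock = threading.Lock()

    def write_address(self, value):
        # Step 1: latch the target in the shared Address Port.
        self.address_port = value

    def read_data(self):
        # Step 2: the access goes to whatever target is currently latched.
        return self.registers[self.address_port]

    def locked_read(self, value):
        # Holding the lock keeps both steps together with respect to
        # every other caller of locked_read.
        with self.lock:
            self.write_address(value)
            return self.read_data()


# Hypothetical targets: bus 0, devices 1 and 2, register 0.
REG_A, REG_B = 0x80000800, 0x80001000
bridge = SimulatedHostBridge({REG_A: "data for A", REG_B: "data for B"})

# Unlucky interleaving of two unsynchronized threads, replayed by hand:
bridge.write_address(REG_A)     # Thread A, step 1
bridge.write_address(REG_B)     # Thread B, step 1 overwrites A's target
seen_by_a = bridge.read_data()  # Thread A, step 2
print(seen_by_a)                # data for B

# With the lock, Thread A's two steps cannot be split apart.
print(bridge.locked_read(REG_A))  # data for A
```

A lock fixes correctness but serializes every configuration access in the system, which is part of why a single-step mechanism is attractive.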
The PCIe Solution: Enhanced Configuration Access Mechanism
To solve both the size limitations and the multi-threading conflicts, the PCIe specification introduced the Enhanced Configuration Access Mechanism (ECAM). Instead of using cumbersome indirect IO ports, PCIe directly maps all configuration space into the system’s memory address space.
PCIe allows up to 4KB of configuration space for every possible function, and a system can support up to 256 buses, 32 devices per bus, and 8 functions per device. Covering every possible function therefore takes 256 × 32 × 8 × 4KB = 256MB, so the system allocates a dedicated 256MB block of memory address space strictly for configuration mapping. Within this 256MB range, the specific memory address itself carries the routing information (the target Bus, Device, Function, and Register).
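The address arithmetic can be sketched in a few lines. The example assumes the standard enhanced layout, in which the bus number occupies address bits 27:20, the device bits 19:15, the function bits 14:12, and the register offset bits 11:0. The base address `ECAM_BASE` is a hypothetical value chosen for illustration; a real platform reports its own base through firmware tables.

```python
# Sketch of the memory-mapped configuration address calculation.
ECAM_BASE = 0xE000_0000  # hypothetical base; real platforms report their own

BUSES, DEVICES, FUNCTIONS, BYTES_PER_FUNCTION = 256, 32, 8, 4096
ECAM_SIZE = BUSES * DEVICES * FUNCTIONS * BYTES_PER_FUNCTION


def ecam_address(bus, device, function, offset, base=ECAM_BASE):
    """Return the memory address that maps to one configuration register."""
    if not 0 <= bus < BUSES:
        raise ValueError("bus must be 0..255")
    if not 0 <= device < DEVICES:
        raise ValueError("device must be 0..31")
    if not 0 <= function < FUNCTIONS:
        raise ValueError("function must be 0..7")
    if not 0 <= offset < BYTES_PER_FUNCTION:
        raise ValueError("offset must be within the 4KB configuration space")
    # Each field fills its own bit range, so OR-ing them yields the offset
    # of the register inside the 256MB window.
    return base + ((bus << 20) | (device << 15) | (function << 12) | offset)


print(ECAM_SIZE // (1024 * 1024))         # 256
print(hex(ecam_address(1, 0, 0, 0x100)))  # 0xe0100100
```

Offset 100h in the second call is the first register beyond the legacy 256-byte range, which the 12-bit offset field reaches without any special handling.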
This memory-mapped approach brings three concrete benefits:
- Thread Safety: It transforms configuration access into a single-step process. A single memory read or write instruction to the specified address automatically generates the Configuration Request, eliminating the race-condition vulnerabilities of the two-step legacy IO method.
- Full Access: It seamlessly provides access to the full 4KB PCIe Extended Configuration Space, enabling advanced PCIe capabilities.
- Negligible Cost: While dedicating 256MB of memory space would have been impossible in the early days of PCI, modern architectures support massive 36-bit to 48-bit physical memory address spaces, making a 256MB allocation completely insignificant to overall system resources.
