How the OS discovers every bus, device, and function from power-on — the depth-first scan algorithm, BDF assignment, what happens when a device is missing or not ready, BAR sizing and allocation, bridge window programming, interrupt routing, and how enumeration adapts for Gen 6 topologies.
When a computer powers on, the PCIe fabric is a mystery. The processor knows nothing about what is plugged in, how many buses exist, where each device sits, or what address ranges each device needs. Enumeration is the process that resolves all of this — systematically walking the fabric, discovering every device, and configuring it for use.
Enumeration has four concrete outcomes:
Enumeration runs twice in a typical system boot:
Only the Root Complex can initiate configuration transactions — no PCIe device can read or modify another device’s configuration space. This restriction prevents misbehaving devices from corrupting the system configuration during or after enumeration.
Every PCIe function is uniquely identified by its BDF — a three-part address encoded in configuration TLPs. The full BDF is 16 bits:
| BDF component | Bits | Range | Assigned by | PCIe constraint |
|---|---|---|---|---|
| Bus | 8 | 0–255 | Software (enumeration) | Bus 0 = RC internal. Each bridge creates one new bus. Max 256 buses total per hierarchy. |
| Device | 5 | 0–31 | Hardware (strapped) | External PCIe links always attach to Device 0. Devices 1–31 only used on RC/switch internal virtual buses. |
| Function | 3 | 0–7 | Hardware (designed in) | Function 0 must always be implemented. Functions 1–7 are optional. Non-sequential function numbers are allowed. |
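The bit layout in the table can be sketched in a few lines. This is a minimal illustration, not a platform API: the helper names `pack_bdf`, `unpack_bdf`, and `ecam_offset` are hypothetical, and the ECAM offset assumes the standard layout of 4 KB of configuration space per function.

```python
def pack_bdf(bus, device, function):
    """Pack bus/device/function into the 16-bit BDF: bus[15:8], device[7:3], function[2:0]."""
    if not (0 <= bus <= 0xFF and 0 <= device <= 0x1F and 0 <= function <= 0x7):
        raise ValueError("BDF component out of range")
    return (bus << 8) | (device << 3) | function


def unpack_bdf(bdf):
    """Split a 16-bit BDF back into (bus, device, function)."""
    return (bdf >> 8) & 0xFF, (bdf >> 3) & 0x1F, bdf & 0x7


def ecam_offset(bus, device, function, register=0):
    """Byte offset into the ECAM region: each function owns 4 KB (12 bits of register offset)."""
    return (pack_bdf(bus, device, function) << 12) | (register & 0xFFF)


print(hex(pack_bdf(1, 0, 0)))           # bus 1, device 0, function 0
print(hex(ecam_offset(1, 0, 0, 0x0C)))  # offset of dword 0Ch for that function
```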
The fundamental probe operation is a 32-bit configuration read to offset 00h (Vendor ID + Device ID) of the target BDF. A single read returns both registers simultaneously. Software examines the Vendor ID field (bits [15:0]) for two cases:
Software reads one BDF at a time — Bus 0, Device 0, Function 0 first. Then Bus 0, Device 0, Function 1 (only if Function 0 was found and had the MFD bit set). Then Bus 0, Device 1, Function 0. And so on, following the depth-first rule when a bridge is found.
When a configuration read targets a BDF for which no device exists, the bridge upstream of the target bus returns a Completion with UR (Unsupported Request) status. For backward compatibility with legacy PCI software (which expected all-1s on a timeout), the Root Complex converts this UR into an all-1s data response. The processor reads 0xFFFF_FFFF from the Configuration Data Port or ECAM address.
Software checks Vendor ID bits [15:0]. If they equal FFFFh, the device does not exist and software moves to the next probe address. This approach never generates a system error — UR responses during enumeration are expected and are silently absorbed by the Root Complex.
A device that is present but not yet ready to handle configuration accesses (e.g. it is still loading firmware from SPI flash) returns a Completion with Configuration Request Retry Status (CRS). CRS is a special completion status that only exists for configuration TLPs — it is illegal in response to memory or I/O requests.
The Root Control register has a CRS Software Visibility Enable bit. When this bit is set:
CRS is legal only within the first second (1.0s) after reset deassertion. A device that still returns CRS after 1 second is considered non-functional. For Gen 3 and above, the additional constraint is that software must wait at least 100ms after link training completes (not just after reset) before sending the first configuration read — because Gen 3 link equalization can take up to 50ms on its own.
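The probe outcomes described above can be sketched as a small classifier plus a polling loop. This is a simplified illustration under stated assumptions: CRS Software Visibility is enabled, so a CRS completion is reported to software as Vendor ID 0001h; `read_dword` is a hypothetical callable standing in for the platform's configuration read; and the 1 second budget is measured from the start of polling rather than from reset deassertion.

```python
import time

ABSENT, NOT_READY, PRESENT = "absent", "not_ready", "present"


def classify_probe(dword):
    """Classify the 32-bit value read from offset 00h by its Vendor ID field."""
    vendor_id = dword & 0xFFFF
    if vendor_id == 0xFFFF:
        return ABSENT        # UR converted to all-1s by the Root Complex
    if vendor_id == 0x0001:
        return NOT_READY     # CRS reported via CRS Software Visibility
    return PRESENT


def probe_with_retry(read_dword, timeout_s=1.0, poll_s=0.01,
                     clock=time.monotonic, sleep=time.sleep):
    """Re-issue the probe while the device reports CRS, up to timeout_s."""
    deadline = clock() + timeout_s
    while True:
        dword = read_dword()
        state = classify_probe(dword)
        if state != NOT_READY:
            return state, dword
        if clock() >= deadline:
            return NOT_READY, dword   # still CRS after the budget: treat as non-functional
        sleep(poll_s)
```

Passing `clock` and `sleep` as parameters is only a convenience so the loop can be exercised without real delays.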
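Once a valid Vendor ID is confirmed, software reads the Header Type register at offset 0Eh, which is bits [23:16] of the dword at offset 0Ch. Bits [6:0] of the register give the header layout and bit 7 is the multi-function flag. This single byte tells software how to interpret the rest of the configuration space and whether to descend further into the topology: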
A device with bit 7 of the Header Type register set to 1 implements more than one function. Software must probe all eight possible function numbers (0–7) to discover which are present. Multi-function devices do not need to implement functions sequentially — a device might implement only Functions 0, 2, and 5, leaving 1, 3, 4, 6, and 7 absent (probes return FFFFh).
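Decoding the Header Type byte is a two-field split. The sketch below assumes the byte has been read from offset 0Eh (the third byte of the dword at 0Ch); the function names are illustrative only.

```python
def header_type_from_dword(dword_0c):
    """Extract the Header Type byte (offset 0Eh) from the dword read at offset 0Ch."""
    return (dword_0c >> 16) & 0xFF


def decode_header_type(header_type):
    """Return (layout, multi_function): layout is bits [6:0], multi-function flag is bit 7."""
    layout = header_type & 0x7F
    multi_function = bool(header_type & 0x80)
    return layout, multi_function


print(decode_header_type(0x00))  # single-function endpoint (Type 0 header)
print(decode_header_type(0x80))  # multi-function endpoint
print(decode_header_type(0x01))  # single-function bridge (Type 1 header)
```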
Common multi-function devices include:
The enumeration algorithm is strictly depth-first: whenever a bridge is found, the algorithm immediately descends to the new bus before continuing to scan the rest of the current bus. This approach ensures that bus numbers are assigned in a predictable, consistent order and that the Subordinate Bus Number can be correctly determined through backtracking.
The following captures the complete depth-first scan logic. It runs recursively — whenever a bridge is found, scan_bus is called on the new bus before the current loop continues:
    next_bus = 1                                      // Bus 0 is the RC internal bus
    function scan_bus(bus_num):
        for device in 0..31:
            id = config_read(bus_num, device, 0, offset=0x00)
            if id[15:0] == 0xFFFF: continue           // Function 0 absent: no device here
            header_type = config_read(bus_num, device, 0, offset=0x0E)
            mfd = header_type[7]                      // multi-function bit
            fn_list = [0..7] if mfd else [0]
            for fn in fn_list:
                if fn > 0 and config_read(bus_num, device, fn, offset=0x00)[15:0] == 0xFFFF:
                    continue                          // function not implemented
                htype = config_read(bus_num, device, fn, offset=0x0E)[6:0]
                if htype == 1:                        // Type 1 header: bridge
                    sec_bus = next_bus; next_bus += 1
                    config_write(bus_num, device, fn, Pri=bus_num, Sec=sec_bus, Sub=0xFF)
                    max_bus = scan_bus(sec_bus)       // RECURSE
                    config_write(bus_num, device, fn, Sub=max_bus)  // fix Sub
                else:                                 // Type 0 header: endpoint
                    allocate_bars(bus_num, device, fn)
                    configure_interrupts(bus_num, device, fn)
        return next_bus - 1                           // highest bus number assigned so far
    scan_bus(0)                                       // start from Bus 0
| Register | Written when | Value | Updated when |
|---|---|---|---|
| Primary Bus Number | Bridge first found | Bus number of the bus the bridge sits on | Never (permanent) |
| Secondary Bus Number | Bridge first found | Next available bus number (next_bus) | Never (permanent) |
| Subordinate Bus Number | Bridge first found (placeholder) | FFh (maximum — assume all buses downstream) | After recursion completes — set to actual highest bus found downstream |
Writing Subordinate = FFh immediately when a bridge is found is essential — it makes the bridge temporarily forward all type-1 configuration TLPs downstream for any bus number. Without this, configuration reads to the newly-discovered downstream bus would not route correctly, because intermediate bridges would not know that those bus numbers are within their range.
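The effect of the placeholder is easier to see as the routing decision a bridge makes for each configuration request arriving on its primary side. The function below is a simplified model of that decision, with hypothetical names, not a description of any particular bridge implementation.

```python
def route_config(secondary, subordinate, target_bus):
    """Decide what a bridge does with a Type 1 configuration request for target_bus."""
    if target_bus == secondary:
        return "convert_to_type0"   # target is on the bus directly below this bridge
    if secondary < target_bus <= subordinate:
        return "forward_type1"      # target is further downstream: pass it on unchanged
    return "not_claimed"            # outside this bridge's bus range


# With the FFh placeholder, bus 2 is reachable through a bridge whose secondary bus is 1.
print(route_config(secondary=1, subordinate=0xFF, target_bus=2))
# Without it (Subordinate still equal to Secondary), the same request is not forwarded.
print(route_config(secondary=1, subordinate=1, target_bus=2))
```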
A concrete step-by-step trace through the depth-first algorithm for a simple topology: Root Complex with one downstream bridge (Bridge A), which connects to an endpoint and one more bridge (Bridge B) which connects to an endpoint.
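One way to reproduce such a trace is to run the depth-first algorithm against a small in-memory model of this topology. The sketch below is a simulation only: the `Node` class and `enumerate_fabric` function are hypothetical, multi-function probing is omitted, and Endpoint 1 and Bridge B are placed at devices 0 and 1 on Bridge A's secondary bus purely as a modelling assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    is_bridge: bool = False
    children: dict = field(default_factory=dict)  # device number -> Node on the secondary bus
    pri: int = -1
    sec: int = -1
    sub: int = -1


def enumerate_fabric(root_bus_devices):
    """Depth-first scan of a modelled fabric; returns (log, highest bus number assigned)."""
    log = []
    next_bus = 1  # Bus 0 belongs to the Root Complex

    def scan_bus(bus_num, devices):
        nonlocal next_bus
        for dev_num in sorted(devices):
            node = devices[dev_num]
            if node.is_bridge:
                node.pri, node.sec, node.sub = bus_num, next_bus, 0xFF
                next_bus += 1
                log.append(f"{node.name}: Pri={node.pri} Sec={node.sec} Sub=FFh (placeholder)")
                scan_bus(node.sec, node.children)  # recurse before continuing on this bus
                node.sub = next_bus - 1            # highest bus number assigned so far
                log.append(f"{node.name}: Sub fixed to {node.sub}")
            else:
                log.append(f"{node.name}: endpoint at {bus_num:02x}:{dev_num:02x}.0")

    scan_bus(0, root_bus_devices)
    return log, next_bus - 1


bridge_b = Node("Bridge B", is_bridge=True, children={0: Node("Endpoint 2")})
bridge_a = Node("Bridge A", is_bridge=True,
                children={0: Node("Endpoint 1"), 1: bridge_b})
log, max_bus = enumerate_fabric({0: bridge_a})
for line in log:
    print(line)
```

In this model Bridge A ends with Primary 0, Secondary 1, Subordinate 2, and Bridge B with Primary 1, Secondary 2, Subordinate 2.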
After an endpoint is found and its type is confirmed (Header Type = 0), software allocates address space for each of its BARs. The sizing procedure is:
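The arithmetic behind sizing can be sketched as follows: after software writes all 1s to a BAR and reads it back, the type bits are masked off and the lowest remaining set bit gives the size. The function names are illustrative, and the sketch covers only the decode step, not the save and restore of the original BAR value.

```python
def is_64bit_memory_bar(readback):
    """Memory BAR (bit 0 = 0) whose type field [2:1] is 10b occupies two BAR slots."""
    return (readback & 0x1) == 0 and ((readback >> 1) & 0x3) == 0b10


def bar_size_32(readback):
    """Size in bytes of a 32-bit BAR from the value read back after writing 0xFFFFFFFF."""
    if readback & 0x1:
        masked = readback & 0xFFFFFFFC   # I/O BAR: bits [1:0] are type bits
    else:
        masked = readback & 0xFFFFFFF0   # memory BAR: bits [3:0] are type bits
    if masked == 0:
        return 0                         # unimplemented BAR reads back as all zeros
    return masked & -masked              # lowest set bit, i.e. 2^N bytes


def bar_size_64(readback_lo, readback_hi):
    """Size of a 64-bit BAR pair from the two readback values (low dword, high dword)."""
    masked = (readback_hi << 32) | (readback_lo & 0xFFFFFFF0)
    return masked & -masked if masked else 0


print(bar_size_32(0xFFF00000))              # 1 MB memory BAR
print(bar_size_64(0x0000000C, 0xFFFFFFFC))  # 16 GB prefetchable 64-bit BAR
```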
After all endpoint BARs in a downstream subtree are allocated, software programs each bridge’s address windows to cover exactly the ranges used by all downstream devices. The bridge’s three window registers must be set to the tightest fit possible:
The granularity mismatch matters: if a downstream endpoint needs only 4 KB of NP-MMIO, the bridge must still open a 1 MB window. The unused remainder of that window cannot be assigned to devices on other buses, so it is wasted address space (only address range is lost, not physical memory). This is why BIOS allocation order matters: allocating large devices first, then packing small devices into the same window, minimises wasted space.
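A short calculation makes the cost of the 1 MB granularity concrete. The helper below is a hypothetical sketch that takes the BAR ranges already allocated below a bridge and returns the smallest aligned memory window that covers them.

```python
MIB = 1 << 20


def memory_window(ranges, granularity=MIB):
    """ranges: list of (base, size) in bytes. Returns (window_base, window_limit), limit inclusive."""
    if not ranges:
        return None                                    # nothing downstream: window stays disabled
    lowest = min(base for base, _ in ranges)
    highest_end = max(base + size for base, size in ranges)
    window_base = lowest - (lowest % granularity)      # round base down
    window_end = -(-highest_end // granularity) * granularity  # round end up
    return window_base, window_end - 1


base, limit = memory_window([(0xE0000000, 0x1000)])    # one 4 KB BAR
used = 0x1000
wasted = (limit - base + 1) - used
print(hex(base), hex(limit), wasted // 1024, "KB unused")
```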
Configuring BARs and windows is not enough — devices must be explicitly enabled before they respond to memory or I/O accesses, and before they can initiate DMA. The Command register at offset 04h has three relevant bits that must be set in a specific order:
| Step | Register / Bit | Action | Effect |
|---|---|---|---|
| 1 | Command bit 1: Memory Space Enable | Set to 1 after BAR programming | Device now responds to memory TLPs targeting its BAR address range. Driver MMIO accesses work. |
| 2 | Command bit 0: I/O Space Enable | Set to 1 if device has I/O BARs | Device responds to I/O port accesses targeting its I/O BAR. Only needed for legacy devices. |
| 3 | Command bit 2: Bus Master Enable | Set to 1 after IOMMU configuration | Device can now initiate DMA (MRd/MWr TLPs as Requester). If an IOMMU is in use, program its page tables before setting this bit. DMA is impossible without this bit. |
| 4 | Command bit 10: Interrupt Disable | Set to 1 before enabling MSI/MSI-X | Disables legacy INTx. Must be done before enabling MSI or MSI-X to prevent both firing simultaneously. |
| 5 | MSI or MSI-X Enable bit | Set to 1 after programming message address and data | Device uses MSI/MSI-X for interrupts. Driver loads and interrupt handling begins. |
Bridges also need their Command register programmed. Memory Space Enable (bit 1) must be set for the bridge to forward memory TLPs downstream through its windows, and Bus Master Enable (bit 2) must be set for it to forward memory and I/O requests from downstream devices upstream, which includes DMA and MSI/MSI-X writes. If a bridge’s Memory Space Enable is 0, it will not forward any memory TLPs downstream, so the BARs of every device below it are unreachable.
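The ordering can be sketched as a series of read-modify-write values for the Command register. The bit positions are the standard ones (I/O Space bit 0, Memory Space bit 1, Bus Master bit 2, Interrupt Disable bit 10, all in the Command register); the function `enable_sequence` and its parameters are hypothetical, and the MSI/MSI-X enable step is left out because it lives in a separate capability structure.

```python
CMD_IO_SPACE = 1 << 0       # I/O Space Enable
CMD_MEMORY_SPACE = 1 << 1   # Memory Space Enable
CMD_BUS_MASTER = 1 << 2     # Bus Master Enable
CMD_INTX_DISABLE = 1 << 10  # Interrupt Disable (legacy INTx)


def enable_sequence(command=0, has_io_bar=False, iommu_ready=True, use_msi=True):
    """Return the successive Command register values software would write, in order."""
    writes = []
    command |= CMD_MEMORY_SPACE
    writes.append(command)
    if has_io_bar:
        command |= CMD_IO_SPACE
        writes.append(command)
    if not iommu_ready:
        raise RuntimeError("configure the IOMMU before enabling bus mastering")
    command |= CMD_BUS_MASTER
    writes.append(command)
    if use_msi:
        command |= CMD_INTX_DISABLE   # mask INTx before MSI/MSI-X is switched on
        writes.append(command)
    return writes


print([hex(v) for v in enable_sequence()])
```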
After BARs are allocated and devices can be accessed, interrupt routing must be configured. Modern systems prefer MSI-X over INTx for every device that supports it.
If neither MSI nor MSI-X is present, the driver uses legacy INTx. The Interrupt Pin register (offset 3Ch [15:8]) declares which virtual pin (INTA#–INTD#) the device uses. BIOS programs the Interrupt Line register with the IRQ number. Each PCIe bridge may apply an interrupt swizzle (rotating INTA→INTB→INTC→INTD) to prevent all devices on a multi-device bus from sharing the same IRQ.
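The conventional swizzle is a rotation by the device number on the bridge's secondary bus. The sketch below shows that rotation; it assumes the standard PCI-to-PCI bridge mapping, and actual platforms may route interrupts differently, so firmware tables remain the authority.

```python
PIN_NAMES = {1: "INTA#", 2: "INTB#", 3: "INTC#", 4: "INTD#"}


def swizzle(pin, device):
    """Map a device's Interrupt Pin (1..4) to the pin seen on the upstream side of its bridge."""
    if pin not in PIN_NAMES:
        raise ValueError("Interrupt Pin must be 1..4 (0 means no INTx)")
    return ((pin - 1) + device) % 4 + 1


# INTA# from devices 0..3 on the same bus lands on four different upstream pins.
for dev in range(4):
    print(dev, PIN_NAMES[swizzle(1, dev)])
```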
| Property | BIOS/UEFI enumeration (POST) | OS PCI bus driver |
|---|---|---|
| Goal | Get boot devices working — storage, video, network | Enumerate all devices, load drivers, enable IOMMU |
| BAR strategy | Conservative — small allocations, may leave some BARs unallocated | Full allocation — resizes BARs, enables ReBAR (Resizable BAR) |
| Interrupts | Legacy INTx or BIOS-assigned MSI. No MSI-X. | MSI-X preferred, per-vector CPU affinity, NUMA-aware IRQ assignment |
| IOMMU | Usually disabled during POST | IOMMU enabled before Bus Master Enable bit is set |
| Error handling | Minimal — UR responses silently absorbed | AER configured, error handlers registered |
| Speed | Must complete within a few seconds | Asynchronous; driver loading happens after the kernel is up |
| Bus numbers | Assigned. OS generally preserves these. | May rescan but usually inherits BIOS bus numbers |
| Bridge windows | May leave extra space for hot-plug | May rebalance after all devices discovered |
The enumeration algorithm — depth-first BDF assignment, BAR sizing, bridge window programming, Command register sequencing — is completely unchanged in Gen 6. Configuration TLPs still use the same BDF addressing. The Vendor ID probe still returns FFFFh for absent devices. The Header Type register still distinguishes bridges from endpoints.
What is different about Gen 6 enumeration in practice:
| Aspect | Change for Gen 6 deployments |
|---|---|
| Waiting before first config read | Equalization at 64 GT/s PAM4 may take longer than Gen 3 equalization, so the link may take longer to become stable. The rule itself is unchanged: software must still wait 100ms after link training completes before sending the first configuration read. |
| BAR sizes | AI accelerators with 80–512 GB of HBM device memory require 64-bit prefetchable BARs large enough to cover it. BAR sizes are powers of two, so an 80 GB device needs a 128 GB BAR. Enumeration software and BIOS must have Resizable BAR (ReBAR) support to expose full device memory. BIOS memory maps above 4 GB must accommodate these enormous BARs. |
| Bus number allocation | Unchanged. 256 buses maximum. Gen 6 systems with massive scale-out (hundreds of accelerators) may approach the bus number limit — requiring careful topology design with fewer bridge layers. |
| CXL devices | CXL.io devices appear as standard PCIe endpoints to the enumeration algorithm — same Vendor/Device ID probe, same BAR allocation. The CXL Alternate Protocol capability (Extended Cap 002Bh) is discovered during capability walking, not during bus scanning. |
| Flit-mode enumeration | Flit mode is negotiated during link training, before the link reaches L0. Once negotiated, it applies to every TLP on that link, configuration requests included. Enumeration software is unaffected: it issues the same configuration reads and writes, and the Root Complex and link hardware handle the flit encapsulation. |
| IDE / SPDM | After enumeration, software may run SPDM device attestation before setting Bus Master Enable — verifying device identity before granting DMA access. This adds a pre-BusMaster step not present in earlier generations. |
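Because a BAR's size is always a power of two, the BAR needed to expose a given amount of device memory is the next power of two at or above it. The one-line helper below (a hypothetical name) shows that rounding for the memory sizes mentioned in the table.

```python
GIB = 1 << 30


def min_bar_size(memory_bytes):
    """Smallest power-of-two BAR size that covers memory_bytes."""
    if memory_bytes <= 0:
        raise ValueError("memory size must be positive")
    return 1 << (memory_bytes - 1).bit_length()


for mem_gib in (80, 192, 512):
    print(mem_gib, "GB of device memory ->", min_bar_size(mem_gib * GIB) // GIB, "GB BAR")
```

A BAR is also naturally aligned to its size, so a 128 GB BAR needs a 128 GB-aligned slot in the address map above 4 GB.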
| Item | Rule / Value |
|---|---|
| Enumeration initiator | Root Complex only — no PCIe device can originate config TLPs to other devices |
| Bus 0 | Always assigned to Root Complex internal bus before enumeration starts |
| Presence probe | 32-bit config read to offset 00h. Vendor ID = FFFFh → absent (or UR returned). Vendor ID = 0001h → present but not ready (CRS, reported this way only when CRS Software Visibility is enabled). |
| CRS timeout | Device must be ready within 1.0s of reset deassertion. For Gen 3+, wait 100ms after link training completes before first config read. |
| Header Type [6:0] | 0 = Type 0 / endpoint (6 BARs, no bus number registers). 1 = Type 1 / bridge (2 BARs, bus number registers, windows). |
| Multi-function bit | Header Type bit 7. If 1, probe all Functions 0–7. If 0, only Function 0 is present — skip the other 7 probes. |
| Depth-first rule | When a bridge is found, assign its Secondary Bus, write Sub=FFh, then recurse into the new bus before continuing to scan remaining devices on the current bus. |
| Sub=FFh placeholder | Must be written immediately when a bridge is found, before descending. Allows config TLPs to any downstream bus to route through this bridge temporarily. |
| Subordinate Bus update | After recursion completes, write the actual highest bus number assigned downstream. This permanently closes the window at the correct boundary. |
| BAR allocation step 1 | Clear Memory Space Enable and I/O Space Enable in Command register |
| BAR sizing | Write 0xFFFFFFFF to BAR. Read back. Mask type bits. Lowest 1-bit position N → size = 2^N bytes. |
| 64-bit BAR pair | Types [2:1] = 10b → next consecutive BAR is the upper 32 bits. Write base address to both. Software skips the upper BAR in the loop. |
| Unimplemented BAR | Reads all zeros after write 0xFFFFFFFF. Software skips allocation and moves to next slot. |
| Three allocation pools | I/O (avoid), NP-MMIO (32-bit, reads have side effects), P-MMIO (64-bit capable, no read side effects). Each BAR’s type bits determine which pool. |
| Bridge window disable | Set Limit < Base for any unused window type. Never leave uninitialized — could forward unintended addresses. |
| Bridge window granularity | NP-MMIO and P-MMIO: 1 MB minimum. I/O: 4 KB minimum. |
| Enable sequence | Memory Space Enable (1) → I/O Space Enable if needed (2) → IOMMU config → Bus Master Enable (3) → Interrupt Disable (4) → MSI/MSI-X Enable (5) |
| Error reporting during enum | Keep all error reporting disabled (Device Control bits [3:0] = 0) until device driver loads and registers error handlers. |
| Gen 6 changes | Algorithm unchanged. BAR sizes can reach hundreds of GB for AI accelerators (ReBAR required). CXL devices enumerate as standard endpoints. Flit mode is negotiated during link training and is transparent to enumeration software. SPDM attestation may precede Bus Master Enable. |