PCIe Series — PCIe-23: PCIe Enumeration — VLSI Trainers

PCIe Enumeration

How the OS discovers every bus, device, and function from power-on — the depth-first scan algorithm, BDF assignment, what happens when a device is missing or not ready, BAR sizing and allocation, bridge window programming, interrupt routing, and how enumeration adapts for Gen 6 topologies.

📋 What Enumeration Does

When a computer powers on, the PCIe fabric is a mystery. The processor knows nothing about what is plugged in, how many buses exist, where each device sits, or what address ranges each device needs. Enumeration is the process that resolves all of this — systematically walking the fabric, discovering every device, and configuring it for use.

Enumeration has four concrete outcomes:

📋 Who Runs Enumeration

Enumeration runs twice in a typical system boot:

Two-Phase Enumeration: BIOS, then OS

Phase 1: BIOS / UEFI firmware. Runs in real mode (x86) or the early UEFI DXE phase. Assigns bus numbers and allocates BARs for boot devices. Configures bridge windows and enables storage, network, and video. Result: ACPI tables (MCFG, DSDT, SSDT) describe the topology for the OS.

Phase 2: OS PCI bus driver. Runs during kernel init (PCI bus scan). Re-scans all buses, then re-reads the BIOS assignments or replaces them. Loads device drivers and configures interrupts (MSI/MSI-X). Linux: /sys/bus/pci. Windows: PCI bus driver in HAL.
Figure 1 — Two-phase enumeration. BIOS runs first to get the system to a bootable state — boot device drivers need to work before the OS loads. The OS then re-enumerates to build its own device tree, overriding BIOS assignments where needed, enabling IOMMU, and assigning MSI/MSI-X vectors through its interrupt management infrastructure.

Only the Root Complex can initiate configuration transactions — no PCIe device can read or modify another device’s configuration space. This restriction prevents misbehaving devices from corrupting the system configuration during or after enumeration.

📋 Bus, Device, Function (BDF) Addressing

Every PCIe function is uniquely identified by its BDF — a three-part address encoded in configuration TLPs. The full BDF is 16 bits:

BDF (Bus:Device:Function), 16-bit address space
Bus Number [15:8]: 8 bits → 256 buses (0–255). Bus 0 is always the Root Complex internal bus.
Device Number [7:3]: 5 bits → 32 devices per bus (0–31).
Function Number [2:0]: 3 bits → 8 functions per device.
PCIe constraint: each link has exactly one device (always Device 0). Only the Root Complex and switches expose multiple virtual devices on a bus. Maximum 65,536 functions per system (256 × 32 × 8).
Figure 2 — BDF address space. The 8-bit Bus field gives 256 buses. The 5-bit Device field gives 32 device slots per bus. The 3-bit Function field gives 8 functions per device. In native PCIe, each physical link connects exactly one device (Device 0) — the 32-device slots only matter for Root Complex and switch internal virtual buses.
| BDF component | Bits | Range | Assigned by | PCIe constraint |
|---|---|---|---|---|
| Bus | 8 | 0–255 | Software (enumeration) | Bus 0 = RC internal. Each bridge creates one new bus. Max 256 buses total per hierarchy. |
| Device | 5 | 0–31 | Hardware (strapped) | External PCIe links always attach to Device 0. Devices 1–31 only used on RC/switch internal virtual buses. |
| Function | 3 | 0–7 | Hardware (designed in) | Function 0 must always be implemented. Functions 1–7 are optional. Non-sequential function numbers are allowed. |
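The bit layout above can be sketched in a few lines of Python. This is a minimal illustration, not firmware code: the helper names `pack_bdf`, `unpack_bdf`, and `ecam_offset` are made up for this example, and the ECAM offset calculation assumes the standard layout of 4 KB of configuration space per function, with the 16-bit BDF sitting directly above the 12-bit register offset.

```python
def pack_bdf(bus: int, device: int, function: int) -> int:
    """Pack Bus[15:8], Device[7:3], Function[2:0] into a 16-bit BDF."""
    if not (0 <= bus <= 255 and 0 <= device <= 31 and 0 <= function <= 7):
        raise ValueError("bus/device/function out of range")
    return (bus << 8) | (device << 3) | function


def unpack_bdf(bdf: int) -> tuple[int, int, int]:
    """Split a 16-bit BDF back into (bus, device, function)."""
    return (bdf >> 8) & 0xFF, (bdf >> 3) & 0x1F, bdf & 0x7


def ecam_offset(bus: int, device: int, function: int, register: int = 0) -> int:
    """Byte offset from the ECAM base address (the base itself is platform-specific)."""
    if not 0 <= register < 4096:
        raise ValueError("register offset must fit in 4 KB")
    return (pack_bdf(bus, device, function) << 12) | register


if __name__ == "__main__":
    bdf = pack_bdf(0x3A, 0, 1)
    print(f"BDF = 0x{bdf:04X}, fields = {unpack_bdf(bdf)}")
    print(f"ECAM offset of register 0Ch = 0x{ecam_offset(0x3A, 0, 1, 0x0C):X}")
```

Each bus therefore occupies 1 MB of ECAM space, each device 32 KB, and each function 4 KB.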

📋 How a Device is Detected

The fundamental probe operation is a 32-bit configuration read to offset 00h (Vendor ID + Device ID) of the target BDF. A single read returns both registers simultaneously. Software examines the Vendor ID field (bits [15:0]) for two cases:

Software reads one BDF at a time — Bus 0, Device 0, Function 0 first. Then Bus 0, Device 0, Function 1 (only if Function 0 was found and had the MFD bit set). Then Bus 0, Device 1, Function 0. And so on, following the depth-first rule when a bridge is found.

📋 Device Not Present — FFFFh Response

When a configuration read targets a BDF for which no device exists, the bridge upstream of the target bus returns a Completion with UR (Unsupported Request) status. For backward compatibility with legacy PCI software (which expected all-1s on a timeout), the Root Complex converts this UR into an all-1s data response. The processor reads 0xFFFF_FFFF from the Configuration Data Port or ECAM address.

Software checks Vendor ID bits [15:0]. If they equal FFFFh, the device does not exist and software moves to the next probe address. This approach never generates a system error — UR responses during enumeration are expected and are silently absorbed by the Root Complex.
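As a rough sketch of that check, the snippet below splits the dword read from offset 00h and tests the Vendor ID half. The function names are hypothetical, and the sample value `0x12345678` is an arbitrary placeholder rather than a real device ID.

```python
ABSENT_VENDOR_ID = 0xFFFF  # all-1s Vendor ID means nothing answered the probe


def split_id_dword(dword: int) -> tuple[int, int]:
    """Return (vendor_id, device_id) from the dword at offset 00h."""
    return dword & 0xFFFF, (dword >> 16) & 0xFFFF


def is_present(dword: int) -> bool:
    """True when the probe found a function at the target BDF."""
    vendor_id, _ = split_id_dword(dword)
    return vendor_id != ABSENT_VENDOR_ID


if __name__ == "__main__":
    for value in (0xFFFFFFFF, 0x12345678):
        print(f"0x{value:08X}: present={is_present(value)}")
```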

Don’t enable error reporting during enumeration. Device Control register bits 0–3 (correctable, non-fatal, fatal, and UR error reporting enables) must all be 0 during enumeration. If UR reporting were enabled, every probe of an empty slot would trigger an error message — potentially crashing early firmware that cannot handle error interrupts. Error reporting should only be enabled after a device is fully configured and its driver loaded.

📋 Device Not Ready — CRS Response

A device that is present but not yet ready to handle configuration accesses (e.g. it is still loading firmware from SPI flash) returns a Completion with Configuration Request Retry Status (CRS). CRS is a special completion status that only exists for configuration TLPs — it is illegal in response to memory or I/O requests.

The Root Control register has a CRS Software Visibility Enable bit. When this bit is set:

CRS is legal only within the first second (1.0s) after reset deassertion. A device that still returns CRS after 1 second is considered non-functional. For Gen 3 and above, the additional constraint is that software must wait at least 100ms after link training completes (not just after reset) before sending the first configuration read — because Gen 3 link equalization can take up to 50ms on its own.
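The retry behaviour can be sketched as a polling loop with a deadline. This is a simplified model under several assumptions: CRS Software Visibility is enabled, so a not-ready device shows up as Vendor ID 0001h; polling starts at reset deassertion, so the 1.0 s budget is measured from the first read; and `FakeClock`, `wait_until_ready`, and the 10 ms poll interval are invented for the example.

```python
CRS_VENDOR_ID = 0x0001     # Vendor ID seen while the device returns CRS
ABSENT_VENDOR_ID = 0xFFFF  # Vendor ID seen when no device exists
CRS_TIMEOUT_S = 1.0        # device must be ready within 1.0 s of reset


class FakeClock:
    """Stand-in for a real timer so the loop can run without hardware."""

    def __init__(self):
        self.t = 0.0

    def now(self) -> float:
        return self.t

    def sleep(self, seconds: float) -> None:
        self.t += seconds


def wait_until_ready(read_id_dword, clock, poll_interval_s: float = 0.01):
    """Poll offset 00h; return the ID dword once ready, or None if absent or timed out."""
    deadline = clock.now() + CRS_TIMEOUT_S
    while True:
        dword = read_id_dword()
        vendor_id = dword & 0xFFFF
        if vendor_id == ABSENT_VENDOR_ID:
            return None          # no device at this BDF
        if vendor_id != CRS_VENDOR_ID:
            return dword         # device answered with a real Vendor ID
        if clock.now() >= deadline:
            return None          # still CRS after 1.0 s: treat as non-functional
        clock.sleep(poll_interval_s)


if __name__ == "__main__":
    replies = iter([0xFFFF0001, 0xFFFF0001, 0x12345678])
    result = wait_until_ready(lambda: next(replies), FakeClock())
    print(f"ready, ID dword = 0x{result:08X}")
```

Injecting the read function and the clock keeps the loop testable; real firmware would substitute its own config-read primitive and timer.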

📋 Bridge or Endpoint — Header Type Check

Once a valid Vendor ID is confirmed, software reads the dword at offset 0Ch and extracts bits [23:16], the Header Type register (byte offset 0Eh). This single byte tells software how to interpret the rest of the configuration space and whether to descend further into the topology:

Header Type register: bits [6:0] determine the topology action
Type 0 → Endpoint. Header Type bits [6:0] = 000_0000b. Has 6 BARs and no downstream bus. Allocate BARs and continue scanning the current bus.
Type 1 → Bridge. Header Type bits [6:0] = 000_0001b. Has 2 BARs plus bus number registers. Assign bus numbers, then go depth-first into the downstream bus.
Bit 7 = Multi-Function bit. 0 → single function; skip the Fn 1–7 probes. 1 → multi-function device; probe all of Fn 0–7. Always check bit 7 of Function 0 before probing Fn 1–7.
Figure 3 — Header Type register determines enumeration action. Type 0 = endpoint, no further descending needed for bus numbers. Type 1 = bridge, assign bus numbers and recurse depth-first. Bit 7 = Multi-Function Device (MFD) flag — only probe Functions 1–7 if bit 7 = 1, saving 7 unnecessary config reads per single-function device.
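A small decoding sketch may help here. It assumes software has read the full dword at offset 0Ch, in which case the Header Type byte (offset 0Eh) occupies bits [23:16] of that dword. The `HeaderType` tuple and `decode_header_type` function are illustrative names only.

```python
from typing import NamedTuple


class HeaderType(NamedTuple):
    layout: int           # 0 = Type 0 (endpoint), 1 = Type 1 (bridge)
    multi_function: bool  # bit 7 of the Header Type byte


def decode_header_type(dword_0c: int) -> HeaderType:
    """Decode the Header Type byte from the dword read at offset 0Ch."""
    header_byte = (dword_0c >> 16) & 0xFF
    return HeaderType(layout=header_byte & 0x7F,
                      multi_function=bool(header_byte & 0x80))


def is_bridge(dword_0c: int) -> bool:
    """True when the function uses the Type 1 (bridge) header layout."""
    return decode_header_type(dword_0c).layout == 1


if __name__ == "__main__":
    print(decode_header_type(0x00810000))  # multi-function bridge
    print(decode_header_type(0x00000010))  # single-function endpoint
```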

📋 Multi-Function Devices

A device with bit 7 of the Header Type register set to 1 implements more than one function. Software must probe all eight possible function numbers (0–7) to discover which are present. Multi-function devices do not need to implement functions sequentially — a device might implement only Functions 0, 2, and 5, leaving 1, 3, 4, 6, and 7 absent (probes return FFFFh).

Common multi-function devices include:

Optimisation: skip Functions 1–7 if MFD = 0. Checking MFD bit 7 before probing Functions 1–7 eliminates 7 configuration reads per single-function device. In a system with 16 endpoints, this saves 112 unnecessary config reads during boot — meaningful since each config read requires a PCIe round-trip, and many BIOS enumeration routines run before interrupts are enabled so they must poll for completion.
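The saving is easy to check with a short calculation. In this sketch, `functions_to_probe` and `probes_saved` are hypothetical helpers, and each list entry stands for the Header Type byte read from Function 0 of one device.

```python
def functions_to_probe(header_type_byte: int) -> list[int]:
    """Function numbers worth probing, given Function 0's Header Type byte."""
    multi_function = bool(header_type_byte & 0x80)
    return list(range(8)) if multi_function else [0]


def probes_saved(header_type_bytes: list[int]) -> int:
    """Config reads avoided compared with blindly probing all 8 functions per device."""
    return sum(8 - len(functions_to_probe(h)) for h in header_type_bytes)


if __name__ == "__main__":
    sixteen_single_function_endpoints = [0x00] * 16
    print(probes_saved(sixteen_single_function_endpoints))
```

For 16 single-function endpoints this gives 16 × 7 = 112, matching the figure quoted above. Multi-function devices contribute no saving because all eight functions must still be probed.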

📋 Depth-First Algorithm

The enumeration algorithm is strictly depth-first: whenever a bridge is found, the algorithm immediately descends to the new bus before continuing to scan the rest of the current bus. This approach ensures that bus numbers are assigned in a predictable, consistent order and that the Subordinate Bus Number can be correctly determined through backtracking.

Depth-first scan order (numbers show discovery sequence)
Root Complex (Bus 0)
  ① Bridge A: Pri=0 · Sec=1 · Sub=255→3
    ② Bridge C: Pri=1 · Sec=2 · Sub=255→2
      ③ Endpoint (Bus=2), then backtrack: Sub=2
    ④ Bridge D: Pri=1 · Sec=3 · Sub=255→3
      ⑤a Endpoint (Bus=3), then backtrack: Sub=3
  ⑤ Bridge B: Pri=0 · Sec=4 · Sub=255→4
    ⑥ Endpoint (Bus=4)
Figure 4 — Depth-first scan order. Circled numbers show the discovery sequence. Bridge A is found first ①. The algorithm immediately descends to Bus 1. Bridge C is found ② and descended to Bus 2. Endpoint ③ has no children — backtrack updates Bridge C’s Subordinate to 2. Continue on Bus 1: Bridge D ④, endpoint ⑤a, backtrack updates D and A. Only then does the algorithm return to Bus 0 and find Bridge B ⑤.

Algorithm pseudocode

The following captures the complete depth-first scan logic. It runs recursively — whenever a bridge is found, scan_bus is called on the new bus before the current loop continues:

next_bus = 1 // global: next unassigned bus number (Bus 0 is the RC)

function scan_bus(bus_num):
  for device in 0..31:
    vendor_id = config_read(bus_num, device, 0, offset=0x00)
    if vendor_id[15:0] == 0xFFFF: continue // Function 0 absent: no device here

    header_type = config_read(bus_num, device, 0, offset=0x0E)
    mfd = header_type[7] // multi-function bit
    fn_list = [0] + (1..7 if mfd else [])

    for fn in fn_list:
      // Function 0 was probed above; Functions 1-7 are probed individually
      if fn > 0 and config_read(bus_num, device, fn, offset=0x00)[15:0] == 0xFFFF: continue
      htype = config_read(bus_num, device, fn, offset=0x0E)[6:0]
      if htype == 1: // bridge
        sec_bus = next_bus; next_bus += 1
        config_write(bus_num, device, fn, Pri=bus_num, Sec=sec_bus, Sub=0xFF)
        max_bus = scan_bus(sec_bus) // RECURSE
        config_write(bus_num, device, fn, Sub=max_bus) // fix Sub

      else: // endpoint
        allocate_bars(bus_num, device, fn)
        configure_interrupts(bus_num, device, fn)

  return next_bus - 1 // highest bus number assigned so far

scan_bus(0) // start from Bus 0
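To see the bus numbering play out, the following Python sketch runs the same recursion against an in-memory model of the Figure 4 topology. It is a simulation under simplifying assumptions, not real enumeration code: there are no configuration reads, every device is single-function, and the `Endpoint`, `Bridge`, `Enumerator`, and `build_figure4_topology` names are invented for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Endpoint:
    name: str
    bus: int = -1


@dataclass
class Bridge:
    name: str
    downstream: dict = field(default_factory=dict)  # device number -> node
    primary: int = -1
    secondary: int = -1
    subordinate: int = -1


class Enumerator:
    def __init__(self):
        self.next_bus = 1   # Bus 0 belongs to the Root Complex
        self.order = []     # discovery sequence, for inspection

    def scan_bus(self, bus_num: int, devices: dict) -> int:
        """Scan one bus depth-first; return the highest bus number assigned so far."""
        for dev_num in sorted(devices):
            node = devices[dev_num]
            self.order.append(node.name)
            if isinstance(node, Bridge):
                node.primary = bus_num
                node.secondary = self.next_bus
                node.subordinate = 0xFF  # placeholder while descending
                self.next_bus += 1
                # Recurse before touching the rest of this bus, then fix Sub.
                node.subordinate = self.scan_bus(node.secondary, node.downstream)
            else:
                node.bus = bus_num
        return self.next_bus - 1


def build_figure4_topology() -> dict:
    """Devices on Bus 0, modelled on Figure 4 (device numbers are assumed)."""
    bridge_c = Bridge("Bridge C", {0: Endpoint("EP under C")})
    bridge_d = Bridge("Bridge D", {0: Endpoint("EP under D")})
    bridge_a = Bridge("Bridge A", {0: bridge_c, 1: bridge_d})
    bridge_b = Bridge("Bridge B", {0: Endpoint("EP under B")})
    return {0: bridge_a, 1: bridge_b}


if __name__ == "__main__":
    root = build_figure4_topology()
    enumerator = Enumerator()
    enumerator.scan_bus(0, root)
    for bridge in (root[0], root[0].downstream[0], root[0].downstream[1], root[1]):
        print(f"{bridge.name}: Pri={bridge.primary} Sec={bridge.secondary} "
              f"Sub={bridge.subordinate}")
```

The resulting Primary/Secondary/Subordinate values match the ones shown in Figure 4, and `enumerator.order` lists the devices in depth-first discovery order.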

📋 Bus Number Assignment Rules

| Register | Written when | Value | Updated when |
|---|---|---|---|
| Primary Bus Number | Bridge first found | Bus number of the bus the bridge sits on | Never (permanent) |
| Secondary Bus Number | Bridge first found | Next available bus number (next_bus) | Never (permanent) |
| Subordinate Bus Number | Bridge first found (placeholder) | FFh (maximum; assume all buses downstream) | After recursion completes: set to actual highest bus found downstream |

Writing Subordinate = FFh immediately when a bridge is found is essential — it makes the bridge temporarily forward all type-1 configuration TLPs downstream for any bus number. Without this, configuration reads to the newly-discovered downstream bus would not route correctly, because intermediate bridges would not know that those bus numbers are within their range.

Sub = FFh must be written before descending. Failing to write Subordinate = FFh before recursing into the new bus will cause the bridge to drop configuration TLPs destined for downstream devices — those devices will appear absent. This is a common firmware bug in early bring-up of new SoC designs.
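The routing decision behind this rule can be sketched as a simple range check on the bridge's Secondary and Subordinate registers. The function name and the returned strings are illustrative. The logic follows the usual Type 1 bridge behaviour: a request for the secondary bus is delivered there, a request for a bus in (Secondary, Subordinate] is forwarded, and anything else is not claimed.

```python
def route_config_request(target_bus: int, secondary: int, subordinate: int) -> str:
    """Decide what a bridge does with a config request arriving from upstream."""
    if target_bus == secondary:
        return "deliver on secondary bus"
    if secondary < target_bus <= subordinate:
        return "forward downstream"
    return "not claimed"


if __name__ == "__main__":
    # Bridge A has Sec=1. Enumeration now wants to probe Bus 2 beneath it.
    print(route_config_request(2, secondary=1, subordinate=0xFF))  # placeholder written
    print(route_config_request(2, secondary=1, subordinate=1))     # placeholder forgotten
```

With the FFh placeholder the probe of Bus 2 is forwarded. With a stale Subordinate value equal to the Secondary bus, the same probe is not claimed and the downstream device appears absent.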

Worked Enumeration Example

A concrete step-by-step trace through the depth-first algorithm for a simple topology: Root Complex with one downstream bridge (Bridge A), which connects to an endpoint and one more bridge (Bridge B) which connects to an endpoint.

Step-by-step bus number assignment
After Step 1: RC (Bus 0) → Bridge A found (P=0 · S=1 · Sub=FF) → descend to Bus 1.
After Step 2: RC (Bus 0) → Bridge A (P=0 · S=1 · Sub=FF) → EP (Bus 1) and Bridge B (P=1 · S=2 · Sub=FF) → descend to Bus 2.
After Step 3: RC (Bus 0) → Bridge A (P=0 · S=1 · Sub=FF) → EP (Bus 1) and Bridge B (P=1 · S=2 · Sub=FF) → EP (Bus 2).
After Step 4 (final): RC (Bus 0) → Bridge A (P=0 · S=1 · Sub=2 ✓) → EP (Bus 1) and Bridge B (P=1 · S=2 · Sub=2 ✓) → EP (Bus 2).
Figure 5 — Four-step bus number assignment trace. Step 1: Bridge A found, S=1, Sub=FFh placeholder. Step 2: Bus 1 scanned; EP and Bridge B found; Bridge B gets S=2, Sub=FFh. Step 3: Bus 2 scanned; EP found (no bridges). Step 4 (backtrack): Bridge B Sub updated to 2; Bridge A Sub updated to 2 (highest bus under it). Final state is consistent and permanent.

📋 BAR Allocation During Enumeration

After an endpoint is found and its type is confirmed (Header Type = 0), software allocates address space for each of its BARs. The sizing procedure is:

  1. Disable address decoding — clear Memory Space Enable and I/O Space Enable in the Command register. Prevents the device from claiming TLPs during sizing.
  2. Write 0xFFFFFFFF to the BAR. For a 64-bit BAR, also write 0xFFFFFFFF to the next BAR slot.
  3. Read back. The lowest 1-bit in the address field reveals the required alignment and size: size = 2^(position of lowest 1-bit).
  4. Allocate a region of that size from a pool of available address space. Maintain three separate pools: I/O, NP-MMIO, and Prefetchable-MMIO.
  5. Write the base address into the BAR (upper address bits). For 64-bit BARs, write the upper 32 bits into the next BAR slot.
  6. Repeat for all six BARs. Skip unimplemented BARs (read back all zeros).
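Step 3 of the procedure above is the part that most often trips people up, so here is a hedged sketch of the decode. It assumes the all-1s write has already happened and works purely on the value read back. `BarInfo` and `decode_bar` are names made up for this example, and the type-bit positions follow the standard BAR layout: bit 0 selects I/O versus memory, bits [2:1] = 10b mark a 64-bit memory BAR, and bit 3 marks it prefetchable.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BarInfo:
    kind: str           # "io" or "mem"
    size: int           # bytes
    is_64bit: bool
    prefetchable: bool


def decode_bar(readback: int, upper_readback: int = 0) -> Optional[BarInfo]:
    """Interpret the value read back after writing all 1s to a BAR.

    upper_readback is the next BAR slot's readback and only matters for 64-bit BARs.
    Returns None for an unimplemented BAR.
    """
    if readback == 0:
        return None                              # unimplemented: reads all zeros
    if readback & 0x1:                           # I/O BAR: bits [1:0] are type bits
        mask = readback & 0xFFFFFFFC
        return BarInfo("io", mask & -mask, False, False) if mask else None
    is_64bit = ((readback >> 1) & 0x3) == 0b10
    mask = readback & 0xFFFFFFF0                 # memory BAR: bits [3:0] are type bits
    if is_64bit:
        mask |= (upper_readback & 0xFFFFFFFF) << 32
    if mask == 0:
        return None
    # The lowest set address bit gives both the size and the required alignment.
    return BarInfo("mem", mask & -mask, is_64bit, bool(readback & 0x8))


if __name__ == "__main__":
    print(decode_bar(0xFFF00000))              # 32-bit non-prefetchable memory BAR
    print(decode_bar(0x0000000C, 0xFFFFFFFC))  # 64-bit prefetchable BAR above 4 GB in size
```

The expression `mask & -mask` isolates the lowest set bit, which is exactly the "position of lowest 1-bit" rule expressed as a byte count.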
BAR allocation: three address space pools
I/O Space Pool: max 64 KB total (16-bit decode). Heavily fragmented in practice. Avoid for new PCIe designs; only for legacy device compatibility.
NP-MMIO Pool: non-prefetchable memory space. Must be in 32-bit range (below 4 GB). 1 MB minimum window granularity. Used for control registers, FIFOs, status registers.
P-MMIO Pool: prefetchable memory with no read side effects. Can be 64-bit (anywhere in address space). 1 MB minimum window granularity. Used for GPU framebuffers, AI accelerator memory.
Figure 6 — Three address space pools maintained by enumeration software. I/O space is avoided for modern devices. NP-MMIO must fit below 4 GB because a bridge's non-prefetchable memory window is 32-bit only. P-MMIO can span 64 bits, which is critical for Gen 6 AI accelerators with tens of gigabytes of device memory that need 64-bit BAR placement above 4 GB.
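Allocation from one of these pools can be sketched as a bump allocator that places each BAR on its natural alignment (base is a multiple of size). This is one simple strategy under stated assumptions, not the method any particular BIOS uses: sizes are powers of two, the pool is unbounded, BARs are placed largest first, and `align_up` and `allocate_from_pool` are hypothetical names.

```python
def align_up(value: int, alignment: int) -> int:
    """Round value up to a multiple of alignment (alignment must be a power of two)."""
    return (value + alignment - 1) & ~(alignment - 1)


def allocate_from_pool(pool_base: int, sizes: dict[str, int]) -> dict[str, int]:
    """Assign a naturally aligned base address to every BAR, largest BAR first."""
    cursor = pool_base
    bases = {}
    for name, size in sorted(sizes.items(), key=lambda item: item[1], reverse=True):
        cursor = align_up(cursor, size)  # BAR base must be a multiple of its size
        bases[name] = cursor
        cursor += size
    return bases


if __name__ == "__main__":
    np_mmio = allocate_from_pool(
        0xE0000000,
        {"nic_regs": 0x1000, "gpu_regs": 0x100000, "nvme_regs": 0x4000},
    )
    for name, base in np_mmio.items():
        print(f"{name}: 0x{base:08X}")
```

Placing large BARs first avoids the alignment holes that appear when a small BAR is allocated just before a large one.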

📋 Bridge Window Programming

After all endpoint BARs in a downstream subtree are allocated, software programs each bridge’s address windows to cover exactly the ranges used by all downstream devices. The bridge’s three window registers must be set to the tightest fit possible:

Windows must be programmed bottom-up. After allocating BARs for all endpoints in a subtree (leaves), software programs the bridge directly above them. Then it moves up to the bridge above that, widening the windows as needed to include all the downstream allocations. This bottom-up programming ensures each bridge’s window correctly covers all devices below it before the bridge above it is programmed.

The granularity mismatch is important: if a downstream endpoint needs only 4 KB of NP-MMIO, the bridge must still open a 1 MB window. The unused remainder of the window cannot be assigned to devices on other buses, so it is wasted: the bridge forwards those addresses downstream, but no device there decodes them. This is why BIOS allocation order matters: allocating large devices first, then packing small devices into the same window, minimises wasted space.
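The 1 MB rounding and the resulting waste can be worked through with a short sketch. The `memory_window` and `wasted_bytes` helpers are illustrative names. Each downstream allocation is given as a (base, size) pair, and the returned limit is inclusive.

```python
MB = 1 << 20  # memory window granularity


def memory_window(allocations: list[tuple[int, int]]) -> tuple[int, int]:
    """Smallest 1 MB-granular (base, limit) window covering all (base, size) ranges."""
    lowest = min(base for base, _ in allocations)
    end = max(base + size for base, size in allocations)  # exclusive end address
    window_base = lowest & ~(MB - 1)                       # round down to 1 MB
    window_limit = ((end + MB - 1) & ~(MB - 1)) - 1        # round up, inclusive
    return window_base, window_limit


def wasted_bytes(allocations: list[tuple[int, int]]) -> int:
    """Window bytes that no downstream BAR actually uses."""
    base, limit = memory_window(allocations)
    return (limit - base + 1) - sum(size for _, size in allocations)


if __name__ == "__main__":
    single_4k_bar = [(0xE0000000, 0x1000)]
    base, limit = memory_window(single_4k_bar)
    print(f"window 0x{base:08X}-0x{limit:08X}, wasted {wasted_bytes(single_4k_bar)} bytes")
```

For the single 4 KB BAR in the example, the window is 1 MB and 1 MB − 4 KB = 1,044,480 bytes go unused.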

📋 Enabling Devices — Command Register Sequence

Configuring BARs and windows is not enough — devices must be explicitly enabled before they respond to memory or I/O accesses, and before they can initiate DMA. The Command register at offset 04h has three relevant bits that must be set in a specific order:

| Step | Register / Bit | Action | Effect |
|---|---|---|---|
| 1 | Command bit 1 (Memory Space Enable) | Set to 1 after BAR programming | Device now responds to memory TLPs targeting its BAR address range. Driver MMIO accesses work. |
| 2 | Command bit 0 (I/O Space Enable) | Set to 1 if device has I/O BARs | Device responds to I/O port accesses targeting its I/O BAR. Only needed for legacy devices. |
| 3 | Command bit 2 (Bus Master Enable) | Set to 1 after IOMMU configuration | Device can now initiate DMA (MRd/MWr TLPs as Requester). If an IOMMU is enabled, program its page tables before setting this bit. DMA is impossible without this bit. |
| 4 | Command bit 10 (Interrupt Disable) | Set to 1 before enabling MSI/MSI-X | Disables legacy INTx. Must be done before enabling MSI or MSI-X to prevent both firing simultaneously. |
| 5 | MSI or MSI-X Enable bit | Set to 1 after programming message address and data | Device uses MSI/MSI-X for interrupts. Driver loads and interrupt handling begins. |

Bridges also need the Command register set. Specifically, bridges need Memory Space Enable (bit 1) set to forward memory TLPs through their windows, and Bus Master Enable (bit 2) set to forward memory and I/O requests upstream on behalf of the devices below them, which includes DMA and MSI/MSI-X writes. If a bridge’s Memory Space Enable is 0, it will not forward any memory TLPs downstream, so devices below it will be completely unreachable by MMIO.
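The Command register part of the sequence can be sketched as a series of read-modify-write values. This models only the register contents, under the assumption that Interrupt Disable is bit 10 of the Command register. The IOMMU programming and MSI/MSI-X capability writes happen outside this function, and `enable_sequence` is a hypothetical helper name.

```python
IO_SPACE_ENABLE = 1 << 0
MEMORY_SPACE_ENABLE = 1 << 1
BUS_MASTER_ENABLE = 1 << 2
INTERRUPT_DISABLE = 1 << 10


def enable_sequence(has_io_bar: bool, use_msi: bool) -> list[tuple[str, int]]:
    """Return (step label, Command register value) after each step is applied."""
    value = 0
    steps = []

    def apply(label: str, bit: int) -> None:
        nonlocal value
        value |= bit
        steps.append((label, value))

    apply("Memory Space Enable", MEMORY_SPACE_ENABLE)
    if has_io_bar:
        apply("I/O Space Enable", IO_SPACE_ENABLE)
    # IOMMU page tables, if any, must be programmed before the next write.
    apply("Bus Master Enable", BUS_MASTER_ENABLE)
    if use_msi:
        # INTx is masked before the MSI/MSI-X Enable bit is set in the capability.
        apply("Interrupt Disable", INTERRUPT_DISABLE)
    return steps


if __name__ == "__main__":
    for label, value in enable_sequence(has_io_bar=False, use_msi=True):
        print(f"{label:<20} Command = 0x{value:04X}")
```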

📋 Interrupt Assignment

After BARs are allocated and devices can be accessed, interrupt routing must be configured. Modern systems prefer MSI-X over INTx for every device that supports it.

MSI/MSI-X assignment (preferred)

  1. Walk the capability list to find Cap ID 05h (MSI) or Cap ID 11h (MSI-X).
  2. For MSI: read Multiple Message Capable bits [3:1] to learn how many vectors requested. Allocate a contiguous block of interrupt vectors from the OS interrupt controller. Write Message Address (APIC address) and Message Data (base vector) to the capability structure. Set Multiple Message Enable. Set MSI Enable bit. Set Interrupt Disable in Command register.
  3. For MSI-X: determine Table Size from Message Control bits [10:0] (the field is encoded as N−1, so a value of 63 means 64 entries). Allocate one vector per table entry needed. Map the MSI-X Table BAR. Write each table entry’s Address, Data, and Vector Control fields individually. Set MSI-X Enable. Set Interrupt Disable in Command register.
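The first step of each path, finding the capability and reading how many vectors the device wants, can be sketched as below. The example works on a `bytes`-like snapshot of the first 256 bytes of configuration space. The function names are invented, the Capabilities Pointer at offset 34h and the Status register bit 4 check follow the standard header layout, and the MSI-X Table Size field is treated as N−1 encoded.

```python
from typing import Optional

CAP_ID_MSI = 0x05
CAP_ID_MSIX = 0x11


def find_capability(config: bytes, cap_id: int) -> Optional[int]:
    """Return the offset of the first capability with cap_id, or None."""
    status = config[0x06] | (config[0x07] << 8)
    if not status & (1 << 4):            # Status bit 4: capability list present
        return None
    pointer = config[0x34] & 0xFC
    visited = set()                      # guards against a malformed, looping list
    while pointer and pointer not in visited:
        visited.add(pointer)
        if config[pointer] == cap_id:
            return pointer
        pointer = config[pointer + 1] & 0xFC
    return None


def msi_vectors_requested(message_control: int) -> int:
    """Multiple Message Capable, bits [3:1], holds log2 of the vector count."""
    encoded = (message_control >> 1) & 0x7
    if encoded > 5:
        raise ValueError("reserved Multiple Message Capable encoding")
    return 1 << encoded


def msix_table_size(message_control: int) -> int:
    """Table Size, bits [10:0], is stored as (number of entries - 1)."""
    return (message_control & 0x7FF) + 1


if __name__ == "__main__":
    cfg = bytearray(256)
    cfg[0x06] = 0x10                     # capability list present
    cfg[0x34] = 0x50
    cfg[0x50], cfg[0x51] = CAP_ID_MSI, 0x60
    cfg[0x60], cfg[0x61] = CAP_ID_MSIX, 0x00
    print(hex(find_capability(cfg, CAP_ID_MSIX)))
    print(msi_vectors_requested(0b0110), msix_table_size(0x003F))
```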

Legacy INTx (fallback)

If neither MSI nor MSI-X is present, the driver uses legacy INTx. The Interrupt Pin register (offset 3Ch [15:8]) declares which virtual pin (INTA#–INTD#) the device uses. BIOS programs the Interrupt Line register with the IRQ number. Each PCIe bridge may apply an interrupt swizzle (rotating INTA→INTB→INTC→INTD) to prevent all devices on a multi-device bus from sharing the same IRQ.
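The rotation can be sketched with the conventional PCI bridge swizzle, in which the pin seen on the parent side depends on the device number of the interrupting device on the child bus. This formula is the commonly used convention and is assumed here rather than taken from the text above; `swizzle` and `PIN_NAMES` are illustrative names.

```python
PIN_NAMES = {1: "INTA#", 2: "INTB#", 3: "INTC#", 4: "INTD#"}


def swizzle(pin: int, device: int) -> int:
    """Map a device's interrupt pin (1=INTA .. 4=INTD) to the pin on the parent bus."""
    if pin not in PIN_NAMES:
        raise ValueError("pin must be 1..4")
    return ((pin - 1) + device) % 4 + 1


if __name__ == "__main__":
    # Four devices on one bus, all using INTA#, end up on four different parent pins.
    for device in range(4):
        print(f"device {device}: INTA# -> {PIN_NAMES[swizzle(1, device)]}")
```

Because the rotation depends on the device number, devices that all declare INTA# are spread across the four virtual wires instead of piling onto one IRQ.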

📋 BIOS Enumeration vs OS Enumeration

| Property | BIOS/UEFI enumeration (POST) | OS PCI bus driver |
|---|---|---|
| Goal | Get boot devices working: storage, video, network | Enumerate all devices, load drivers, enable IOMMU |
| BAR strategy | Conservative: small allocations, may leave some BARs unallocated | Full allocation: resizes BARs, enables ReBAR (Resizable BAR) |
| Interrupts | Legacy INTx or BIOS-assigned MSI. No MSI-X. | MSI-X preferred, per-vector CPU affinity, NUMA-aware IRQ assignment |
| IOMMU | Usually disabled during POST | IOMMU enabled before Bus Master Enable bit is set |
| Error handling | Minimal: UR responses silently absorbed | AER configured, error handlers registered |
| Speed | Must complete within a few seconds | Async: driver loading happens after kernel is up |
| Bus numbers | Assigned. OS generally preserves these. | May rescan but usually inherits BIOS bus numbers |
| Bridge windows | May leave extra space for hot-plug | May rebalance after all devices discovered |

Enumeration in Gen 6

The enumeration algorithm — depth-first BDF assignment, BAR sizing, bridge window programming, Command register sequencing — is completely unchanged in Gen 6. Configuration TLPs still use the same BDF addressing. The Vendor ID probe still returns FFFFh for absent devices. The Header Type register still distinguishes bridges from endpoints.

What is different about Gen 6 enumeration in practice:

| Aspect | Change for Gen 6 deployments |
|---|---|
| Waiting before first config read | Gen 6 equalization at 64 GT/s PAM4 may take longer than Gen 3 equalization, so the link may take longer to come up. The rule itself is unchanged: software must still wait 100ms after link training completes. |
| BAR sizes | AI accelerators with 80–512 GB of HBM device memory require 64-bit prefetchable BARs of matching size. Enumeration software and BIOS must have Resizable BAR (ReBAR) support to expose full device memory. BIOS memory maps above 4 GB must accommodate these enormous BARs. |
| Bus number allocation | Unchanged. 256 buses maximum. Gen 6 systems with massive scale-out (hundreds of accelerators) may approach the bus number limit, which requires careful topology design with fewer bridge layers. |
| CXL devices | CXL.io devices appear as standard PCIe endpoints to the enumeration algorithm: same Vendor/Device ID probe, same BAR allocation. The CXL Alternate Protocol capability (Extended Cap 002Bh) is discovered during capability walking, not during bus scanning. |
| Flit-mode enumeration | Flit mode is negotiated by hardware during link training, before the link reaches L0, and on a flit-mode link configuration TLPs travel inside flits like all other TLPs. None of this is visible to enumeration software: configuration reads and writes use the same BDF addressing and register offsets as before. |
| IDE / SPDM | After enumeration, software may run SPDM device attestation before setting Bus Master Enable, verifying device identity before granting DMA access. This adds a pre-BusMaster step not present in earlier generations. |
The invariant that makes PCIe timeless. PCIe’s backward compatibility guarantee means a kernel PCI bus driver written for Gen 1 hardware can enumerate Gen 6 hardware without modification — it will find the correct Vendor ID, read the Header Type, assign bus numbers, size BARs, and set the Command register exactly as before. The Gen 6-specific features (PAM4, FEC, flit mode, SPDM) are all invisible to the configuration-space algorithm and only surface through extended capabilities or link status registers, not through enumeration itself.

📋 Quick Reference

| Item | Rule / Value |
|---|---|
| Enumeration initiator | Root Complex only. No PCIe device can originate config TLPs to other devices. |
| Bus 0 | Always assigned to Root Complex internal bus before enumeration starts |
| Presence probe | 32-bit config read to offset 00h. Vendor ID = FFFFh → absent (or UR returned). Vendor ID = 0001h → present but not ready (CRS). |
| CRS timeout | Device must be ready within 1.0s of reset deassertion. For Gen 3+, wait 100ms after link training completes before first config read. |
| Header Type [6:0] | 0 = Type 0 / endpoint (6 BARs, no bus number registers). 1 = Type 1 / bridge (2 BARs, bus number registers, windows). |
| Multi-function bit | Header Type bit 7. If 1, probe all Functions 0–7. If 0, only Function 0 is present, so skip the other 7 probes. |
| Depth-first rule | When a bridge is found, assign its Secondary Bus, write Sub=FFh, then recurse into the new bus before continuing to scan remaining devices on the current bus. |
| Sub=FFh placeholder | Must be written immediately when a bridge is found, before descending. Allows config TLPs to any downstream bus to route through this bridge temporarily. |
| Subordinate Bus update | After recursion completes, write the actual highest bus number assigned downstream. This permanently closes the window at the correct boundary. |
| BAR allocation step 1 | Clear Memory Space Enable and I/O Space Enable in Command register |
| BAR sizing | Write 0xFFFFFFFF to BAR. Read back. Mask type bits. Lowest 1-bit position N → size = 2^N bytes. |
| 64-bit BAR pair | Type bits [2:1] = 10b → next consecutive BAR is the upper 32 bits. Write base address to both. Software skips the upper BAR in the loop. |
| Unimplemented BAR | Reads all zeros after write 0xFFFFFFFF. Software skips allocation and moves to next slot. |
| Three allocation pools | I/O (avoid), NP-MMIO (32-bit, reads have side effects), P-MMIO (64-bit capable, no read side effects). Each BAR’s type bits determine which pool. |
| Bridge window disable | Set Limit < Base for any unused window type. Never leave uninitialized, because the bridge could forward unintended addresses. |
| Bridge window granularity | NP-MMIO and P-MMIO: 1 MB minimum. I/O: 4 KB minimum. |
| Enable sequence | Memory Space Enable (1) → I/O Space Enable if needed (2) → IOMMU config → Bus Master Enable (3) → Interrupt Disable (4) → MSI/MSI-X Enable (5) |
| Error reporting during enum | Keep all error reporting disabled (Device Control bits [3:0] = 0) until device driver loads and registers error handlers. |
| Gen 6 changes | Algorithm unchanged. BAR sizes can reach hundreds of GB for AI accelerators (ReBAR required). CXL devices enumerate as standard endpoints. Flit mode is negotiated by hardware during link training and is transparent to enumeration. SPDM attestation may precede Bus Master Enable. |