Why configuration space had to grow beyond 256 bytes, how ECAM maps the full 4 KB per function into memory, the 32-bit extended capability header format, and every significant extended capability structure — AER, Virtual Channel, PASID, LTR, L1 Sub-States, and more.
The original PCI configuration space was 256 bytes per function. The first 64 bytes are the standard header. The remaining 192 bytes hold PCI capability structures — PM, MSI, the PCIe Capability. By the time PCIe added mandatory capabilities (PM, MSI, MSI-X, PCIe Capability, each 8–24 bytes) and then AER, VC, ATS, PASID, and dozens of other optional structures, 192 bytes became a tight constraint.
Two problems drove the expansion. First, there was simply no room for new capability structures as PCIe evolved through generations. Second, the legacy PCI IO-indirect access mechanism (writing to CF8h then reading/writing CFCh) was not multi-thread safe — two CPU threads accessing configuration space simultaneously could corrupt each other’s accesses.
PCIe solves both problems by extending configuration space to 4 KB per function and providing a new memory-mapped access mechanism called ECAM. The first 256 bytes remain backward compatible with legacy PCI software. The remaining 3840 bytes (offsets 100h–FFFh) form the Extended Configuration Space — a new region accessible only via ECAM.
ECAM (Enhanced Configuration Access Mechanism) maps the entire 4 KB configuration space of every function as a flat memory-mapped region. A single aligned memory read or write of up to 32 bits to the correct address generates exactly one configuration read or write, atomically and without the two-step CF8/CFC dance.
ECAM requires up to 256 MB of physical address space per PCI segment. This region is partitioned per bus/device/function. The base address of the ECAM region is advertised to the OS through the ACPI MCFG table (Memory Mapped Configuration table). Software maps this region and then accesses any function’s configuration space via ordinary memory operations.
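Every ECAM access targets exactly one function at exactly one register offset. The address is the ECAM base plus four components: the bus number in address bits [27:20] (1 MB per bus), the device number in bits [19:15] (32 KB per device), the function number in bits [14:12] (4 KB per function), and the register offset in bits [11:0]. In other words, Address = ECAM_BASE + (Bus × 0x100000) + (Dev × 0x8000) + (Fn × 0x1000) + Offset.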
Example: access offset 108h (the AER Uncorrectable Error Mask register, assuming the AER capability starts at 100h, which puts Uncorrectable Error Status at 104h) in Bus 2, Device 0, Function 0 with ECAM_BASE = 0xE0000000:
Address = 0xE0000000 + (2 × 0x100000) + (0 × 0x8000) + (0 × 0x1000) + 0x108 = 0xE0200108
A 32-bit read to 0xE0200108 returns that AER register directly, with no two-step IO dance needed.
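The address arithmetic above can be sketched as a small helper. This is a minimal illustration rather than firmware code: the function name `ecam_address` and its range checks are assumptions made for the example, and on a real system `ecam_base` would come from the MCFG table.

```python
def ecam_address(ecam_base: int, bus: int, device: int, function: int, offset: int) -> int:
    """Return the memory address of one configuration register under ECAM."""
    if not 0 <= bus <= 255:
        raise ValueError("bus must be 0-255")
    if not 0 <= device <= 31:
        raise ValueError("device must be 0-31")
    if not 0 <= function <= 7:
        raise ValueError("function must be 0-7")
    if not 0 <= offset <= 0xFFF:
        raise ValueError("offset must be within the 4 KB configuration space")
    # 1 MB per bus, 32 KB per device, 4 KB per function.
    return ecam_base + (bus << 20) + (device << 15) + (function << 12) + offset


# Same numbers as the worked example: Bus 2, Device 0, Function 0, offset 108h.
addr = ecam_address(0xE0000000, bus=2, device=0, function=0, offset=0x108)
print(hex(addr))  # 0xe0200108
```

Because each field occupies its own bit range, the additions never carry into a neighbouring field, so the shifts could equally be combined with bitwise OR.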
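Extended capabilities use a different header format from the PCI capabilities in the 00h–FFh region. Instead of a 2-byte header (ID + Next Pointer), each extended capability starts with a 32-bit header that packs three fields together: the Capability ID in bits [15:0], the Capability Version in bits [19:16], and the Next Capability Offset in bits [31:20].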
The 12-bit Next Capability Offset allows pointing anywhere in the 4 KB space (offsets 000h–FFFh, DWORD-aligned). The first extended capability always starts at offset 100h. If Next Offset = 000h, the list ends. Note that 000h is used as the end-of-list sentinel precisely because offset 000h is the Vendor ID register in the PCI-compatible header — no extended capability can ever be at offset 0, so 000h is unambiguous as “no next entry.”
Software walks the extended capability list starting at offset 100h:
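A minimal sketch of that walk, operating on a 4 KB configuration-space image held in a `bytes` object. The helper names (`parse_ext_cap_header`, `walk_ext_caps`, `make_header`) and the synthetic image are assumptions for illustration. The loop-detection set and the check for all-zero or all-ones headers are defensive additions, not requirements stated in the text above.

```python
import struct

EXT_CAP_START = 0x100  # the first extended capability header always lives here


def parse_ext_cap_header(header: int):
    """Split a 32-bit extended capability header into (id, version, next)."""
    cap_id = header & 0xFFFF
    version = (header >> 16) & 0xF
    next_offset = (header >> 20) & 0xFFC  # DWORD-aligned: low two bits ignored
    return cap_id, version, next_offset


def walk_ext_caps(config: bytes):
    """Return [(offset, cap_id, version), ...] for every extended capability."""
    caps = []
    visited = set()
    offset = EXT_CAP_START
    # A next offset of 000h falls below 100h and therefore ends the loop.
    while offset >= EXT_CAP_START and offset + 4 <= len(config):
        if offset in visited:  # guard against a malformed, looping list
            break
        visited.add(offset)
        (header,) = struct.unpack_from("<I", config, offset)
        if header in (0x00000000, 0xFFFFFFFF):  # no capability / no device
            break
        cap_id, version, next_offset = parse_ext_cap_header(header)
        caps.append((offset, cap_id, version))
        offset = next_offset
    return caps


def make_header(cap_id: int, version: int, next_offset: int) -> int:
    """Build a header DWORD (used only to construct the demo image)."""
    return cap_id | (version << 16) | (next_offset << 20)


# Synthetic image: AER (0001h) at 100h pointing to DSN (0003h) at 148h.
image = bytearray(4096)
struct.pack_into("<I", image, 0x100, make_header(0x0001, 2, 0x148))
struct.pack_into("<I", image, 0x148, make_header(0x0003, 1, 0x000))

for offset, cap_id, version in walk_ext_caps(bytes(image)):
    print(f"{offset:03X}h: ID {cap_id:04X}h v{version}")
```

The same function works on any 4 KB dump of a function's configuration space, however it was obtained.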
On Linux, this is the view that /sys/bus/pci/devices/<BDF>/config provides. On Windows, the PCIe Bus Driver uses MCFG-mapped ECAM. Hardware debuggers and BIOS use the raw ECAM memory region.
AER is the most important extended capability structure. It provides detailed, standardised error logging and reporting beyond what the basic PCI Status register offers. While the PCI Status register can only tell you “a fatal error occurred,” AER tells you exactly which error type, which TLP caused it (via the Header Log), which packet prefix caused it (TLP Prefix Log), and whether the error was the first in a sequence. AER is optional in the specification but strongly recommended, and it is widely implemented by PCIe-native endpoints and bridges.

Uncorrectable error bits (shared by the Uncorrectable Error Status, Mask, and Severity registers):

| Bit | Error type | Default Severity | Description |
|---|---|---|---|
| 4 | Data Link Protocol Error | Fatal | Data Link Layer protocol violation, such as an Ack/Nak DLLP whose sequence number matches no unacknowledged TLP. Fatal: the link is unreliable. |
| 5 | Surprise Down Error | Fatal | Link went down unexpectedly (DL_Down without EIOS). Fatal — link lost. |
| 12 | Poisoned TLP Received | Non-Fatal | Received a TLP with EP (Error Poison) bit set. |
| 13 | Flow Control Protocol Error | Fatal | Flow control credits exceeded or invalid FC DLLP received. |
| 14 | Completion Timeout | Non-Fatal | Sent a non-posted request but the completion never arrived within timeout period. |
| 15 | Completer Abort | Non-Fatal | The completer could not complete a request and returned Completer Abort (CA) status. Logged by the completer. |
| 16 | Unexpected Completion | Non-Fatal | Received a completion that doesn’t match any outstanding request tag. |
| 17 | Receiver Overflow | Fatal | Received TLP that overflowed the receive buffer (flow control violated). |
| 18 | Malformed TLP | Fatal | Received a TLP that violated formatting rules (bad length, mismatched fields, etc.). |
| 19 | ECRC Error | Non-Fatal | ECRC check failed — data was corrupted in flight. (Only if ECRC enabled.) |
| 20 | Unsupported Request Error | Non-Fatal | Received a request that the device does not support (UR status returned). |
| 21 | ACS Violation | Non-Fatal | TLP violated ACS (Access Control Services) policy at a switch port. |

Correctable error bits (Correctable Error Status and Mask registers):

| Bit | Error type | Description |
|---|---|---|
| 0 | Receiver Error | 8b/10b code violation, disparity error, or 128b/130b sync header error at the physical layer. |
| 6 | Bad TLP | TLP received with LCRC error or sequence number error — replayed automatically by DLL. |
| 7 | Bad DLLP | DLLP received with CRC error. |
| 8 | Replay Number Rollover | The 2-bit REPLAY_NUM counter rolled over after repeated replays of the same TLPs, so link integrity is suspect and the link retrains. |
| 12 | Replay Timer Timeout | ACK not received before REPLAY_TIMER expired — ACK latency too high for this configuration. |
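| 13 | Advisory Non-Fatal Error | A Non-Fatal uncorrectable error was treated as advisory and signalled with ERR_COR instead of ERR_NONFATAL (e.g. a completer returning UR, where the requester is expected to handle the failure). |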
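The two bit tables above translate directly into lookup tables. The sketch below decodes raw status values into error names. The dictionary and function names are assumptions for illustration. The severities shown are the defaults from the table; a real decoder would read the Uncorrectable Error Severity register, since software can reprogram them.

```python
# bit -> (error name, default severity), copied from the uncorrectable table
UNCORRECTABLE_ERRORS = {
    4: ("Data Link Protocol Error", "Fatal"),
    5: ("Surprise Down Error", "Fatal"),
    12: ("Poisoned TLP Received", "Non-Fatal"),
    13: ("Flow Control Protocol Error", "Fatal"),
    14: ("Completion Timeout", "Non-Fatal"),
    15: ("Completer Abort", "Non-Fatal"),
    16: ("Unexpected Completion", "Non-Fatal"),
    17: ("Receiver Overflow", "Fatal"),
    18: ("Malformed TLP", "Fatal"),
    19: ("ECRC Error", "Non-Fatal"),
    20: ("Unsupported Request Error", "Non-Fatal"),
    21: ("ACS Violation", "Non-Fatal"),
}

# bit -> error name, copied from the correctable table
CORRECTABLE_ERRORS = {
    0: "Receiver Error",
    6: "Bad TLP",
    7: "Bad DLLP",
    8: "Replay Number Rollover",
    12: "Replay Timer Timeout",
    13: "Advisory Non-Fatal Error",
}


def decode_uncorrectable(status: int):
    """Return [(bit, name, default_severity)] for every set bit we know."""
    return [
        (bit, name, severity)
        for bit, (name, severity) in UNCORRECTABLE_ERRORS.items()
        if status & (1 << bit)
    ]


def decode_correctable(status: int):
    """Return [(bit, name)] for every set bit we know."""
    return [(bit, name) for bit, name in CORRECTABLE_ERRORS.items() if status & (1 << bit)]


# Example values: bits 14 and 18 set, then bits 6 and 12 set.
for bit, name, severity in decode_uncorrectable(0x00044000):
    print(f"uncorrectable bit {bit}: {name} ({severity})")
for bit, name in decode_correctable(0x00001040):
    print(f"correctable bit {bit}: {name}")
```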

Advanced Error Capabilities and Control Register (AECR) fields:

| Bits | Field | Description |
|---|---|---|
| [4:0] | First Error Pointer | Read-only. Bit position of the first uncorrectable error that set a bit in Uncorrectable Error Status. Identifies which error was logged first in the Header Log when multiple errors hit simultaneously. |
| 5 | ECRC Generation Capable | Read-only. Device supports generating ECRC (End-to-End CRC) on outgoing TLPs. |
| 6 | ECRC Generation Enable | Read/Write. When 1, device adds ECRC to all outgoing TLPs (sets TD bit in header). |
| 7 | ECRC Check Capable | Read-only. Device can verify ECRC on incoming TLPs. |
| 8 | ECRC Check Enable | Read/Write. When 1, device checks ECRC on incoming TLPs and reports failures. |
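A short sketch of how software might decode these fields and turn on ECRC only where the hardware advertises support. The function names and the sample register values are assumptions for illustration.

```python
FIRST_ERROR_POINTER_MASK = 0x1F  # bits [4:0]
ECRC_GEN_CAPABLE = 1 << 5
ECRC_GEN_ENABLE = 1 << 6
ECRC_CHECK_CAPABLE = 1 << 7
ECRC_CHECK_ENABLE = 1 << 8


def decode_aecr(aecr: int) -> dict:
    """Break the register value into its named fields."""
    return {
        "first_error_pointer": aecr & FIRST_ERROR_POINTER_MASK,
        "ecrc_gen_capable": bool(aecr & ECRC_GEN_CAPABLE),
        "ecrc_gen_enable": bool(aecr & ECRC_GEN_ENABLE),
        "ecrc_check_capable": bool(aecr & ECRC_CHECK_CAPABLE),
        "ecrc_check_enable": bool(aecr & ECRC_CHECK_ENABLE),
    }


def enable_ecrc(aecr: int) -> int:
    """Return the value to write back, setting each enable only if capable."""
    if aecr & ECRC_GEN_CAPABLE:
        aecr |= ECRC_GEN_ENABLE
    if aecr & ECRC_CHECK_CAPABLE:
        aecr |= ECRC_CHECK_ENABLE
    return aecr


# Sample value: First Error Pointer = 20 (Unsupported Request), both
# capabilities present, checking already enabled, generation not yet enabled.
fields = decode_aecr(0x1B4)
print(fields["first_error_pointer"], fields["ecrc_gen_enable"])
print(hex(enable_ecrc(0x0A0)))
```

The First Error Pointer value indexes the uncorrectable error bit table, so 20 here identifies an Unsupported Request Error as the first error logged.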
The Virtual Channel capability structure configures which Traffic Classes (TC0–TC7) map to which Virtual Channels (VC0–VC7) and controls VC arbitration. VC0 is always present and always assigned to TC0. Additional VCs (VC1–VC7) are optional and allow QoS-critical traffic to travel through dedicated buffers that cannot be blocked by lower-priority traffic.
| Register group | Offset from cap start | Key fields |
|---|---|---|
| VC Capability Register 1 | +04h | Extended VC Count (how many VCs beyond VC0), Low Priority Extended VC Count, Reference Clock, Port Arbitration Table Entry Size |
| VC Capability Register 2 | +08h | VC Arbitration Capability — bitmask of supported arbitration schemes (Round Robin, WRR, Time-Based WRR, etc.) |
| Port VC Control and Status | +0Ch | VC Arbitration Select: chooses the current arbitration scheme. Load VC Arbitration Table: triggers loading of the arbitration table. |
| VC0 Resource Capability | +10h | Port Arbitration Capability (which arbitration types VC0 supports), Maximum Time Slots |
| VC0 Resource Control | +14h | TC/VC Map [7:0] — bitmask of TCs mapped to VC0. Enable VC0 bit. Port Arbitration Select for VC0. |
| VC0 Resource Status | +18h | Negotiation Pending — when 1, VC0 TC mapping is being re-negotiated. Port Arbitration Table Status. |
| VCn registers (for n=1–7) | repeating | Same structure as VC0 for each additional VC implemented. TC/VC Map controls which TCs flow through this VC. |
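To make the TC/VC Map concrete, the sketch below decodes a VC Resource Control value and checks that every Traffic Class is mapped to exactly one enabled VC. The TC/VC Map position [7:0] comes from the table. The VC ID field at [26:24] and the VC Enable bit at [31] are not stated in the table and are taken here as assumptions about the register layout. The function names are illustrative.

```python
def decode_vc_resource_control(value: int) -> dict:
    """Decode one VC Resource Control register value."""
    tc_map = value & 0xFF  # TC/VC Map [7:0]
    return {
        "tcs": [tc for tc in range(8) if tc_map & (1 << tc)],
        "vc_id": (value >> 24) & 0x7,        # assumed position [26:24]
        "enabled": bool(value & (1 << 31)),  # assumed position [31]
    }


def check_tc_mapping(controls) -> bool:
    """True if each of TC0-TC7 is mapped to exactly one enabled VC."""
    owners = [0] * 8
    for value in controls:
        vc = decode_vc_resource_control(value)
        if not vc["enabled"]:
            continue
        for tc in vc["tcs"]:
            owners[tc] += 1
    return all(count == 1 for count in owners)


# VC0 carries TC0 only; VC1 carries TC1-TC7.
vc0 = 0x80000001
vc1 = 0x810000FE
print(decode_vc_resource_control(vc1))
print(check_tc_mapping([vc0, vc1]))
```

A check like this catches the common configuration mistake of leaving a Traffic Class unmapped or mapped to two VCs at once.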
The Device Serial Number capability provides a globally unique 64-bit serial number for the device. The serial number is split across two DWs: DW1 holds the lower 32 bits and DW2 holds the upper 32 bits. The format is defined such that the upper 24 bits come from a registered IEEE OUI (Organizationally Unique Identifier) ensuring global uniqueness.
DSN is used by virtualisation hypervisors to uniquely identify SR-IOV physical functions across reboots, by hot-plug systems to detect that a card has been replaced by another one with the same PCI Device ID, and by Thunderbolt-over-USB4 to verify trusted device chains. In PCIe 4.0 and later, the DSN also forms part of the SPDM (Security Protocol and Data Model) device authentication chain.
| DW | Offset from cap start | Content |
|---|---|---|
| DW0 | +00h | Extended Capability Header (ID=0003h) |
| DW1 | +04h | Serial Number [31:0] — lower 32 bits of 64-bit unique number |
| DW2 | +08h | Serial Number [63:32] — upper 32 bits, includes IEEE OUI in bits [63:40] |
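Assembling the serial number from the two DWORDs is a one-line shift. The sketch below also extracts the OUI and formats the value as dash-separated bytes. The function names and sample register values are assumptions for illustration.

```python
def serial_number(dw1: int, dw2: int) -> int:
    """Combine DW1 (bits [31:0]) and DW2 (bits [63:32]) into one 64-bit value."""
    return ((dw2 & 0xFFFFFFFF) << 32) | (dw1 & 0xFFFFFFFF)


def oui(serial: int) -> int:
    """Return the upper 24 bits [63:40], the IEEE OUI."""
    return serial >> 40


def format_dsn(serial: int) -> str:
    """Format as eight dash-separated hex bytes, most significant first."""
    return "-".join(f"{byte:02x}" for byte in serial.to_bytes(8, "big"))


sn = serial_number(dw1=0x89ABCDEF, dw2=0x00123456)
print(format_dsn(sn))   # 00-12-34-56-89-ab-cd-ef
print(f"{oui(sn):06x}")  # 001234
```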
ATS enables a PCIe device to cache address translations from an IOMMU. Instead of every DMA requiring an IOMMU lookup, the device requests a translation (getting a Translated address back), caches it locally, and then marks subsequent DMA TLPs with AT=10b (Translated) — bypassing the IOMMU lookup and reducing latency. This is essential for SR-IOV and virtual machine DMA performance.
| Register | Offset | Key bits |
|---|---|---|
| ATS Capability | +04h | Invalidate Queue Depth [4:0] — max outstanding invalidate requests device can queue. Page Aligned Request bit — device only sends aligned Translation Requests. |
| ATS Control | +06h | Enable [15] — when 1, device may issue Translation Requests and use AT=10b in DMA TLPs. STU [4:0] — Smallest Translation Unit (smallest region device will request a translation for). |
ATS works in conjunction with the AT field in TLP headers (PCIe-11). Without ATS, devices use AT=00b (Untranslated: every DMA goes through the IOMMU). With ATS enabled and a cached translation, DMA TLPs carry AT=10b and bypass the IOMMU table walk. When mappings change, the IOMMU invalidates cached translations by sending the device an Invalidate Request message.
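The two ATS registers can be decoded as below. This is a sketch under stated assumptions: the Page Aligned Request bit is taken to be bit 5 (the table names it without a position), an Invalidate Queue Depth of 0 is read as 32 entries, and the Smallest Translation Unit is interpreted as 4096 × 2^STU bytes. Function names are illustrative.

```python
def decode_ats_capability(cap: int) -> dict:
    """Decode the 16-bit ATS Capability register (+04h)."""
    depth = cap & 0x1F  # Invalidate Queue Depth [4:0]
    return {
        # Assumption: an encoded depth of 0 means the maximum of 32.
        "invalidate_queue_depth": depth if depth else 32,
        "page_aligned_request": bool(cap & (1 << 5)),  # assumed bit 5
    }


def decode_ats_control(ctl: int) -> dict:
    """Decode the 16-bit ATS Control register (+06h)."""
    stu = ctl & 0x1F  # Smallest Translation Unit [4:0]
    return {
        "enabled": bool(ctl & (1 << 15)),
        "stu": stu,
        # Assumption: the unit is 4 KB scaled by a power of two.
        "smallest_translation_bytes": 4096 << stu,
    }


print(decode_ats_capability(0x0020))
print(decode_ats_control(0x8009))  # enabled, STU = 9 gives 2 MiB units
```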
PASID allows a PCIe device to tag its DMA transactions with a Process Address Space Identifier — effectively telling the IOMMU “this DMA should be translated using process P’s address space, not just the VM’s address space.” This enables per-process IOMMU isolation, critical for shared GPU compute (CUDA/OpenCL) and SR-IOV environments where multiple processes share one physical device.
Without PASID, all DMA from a device is translated using a single address space (the VM or driver context). With PASID, each DMA TLP carries a 20-bit PASID value that the IOMMU uses to select the correct page table — allowing true per-process memory isolation even when multiple processes share the same physical function.
| Register | Offset | Key bits |
|---|---|---|
| PASID Capability | +04h | Execute Permission Supported [1]: device can tag requests with execute permission intent. Privileged Mode Supported [2]: device can indicate privileged-mode DMA. Max PASID Width [12:8]: width in bits of the PASID field the device supports (0–20); a value of n means PASIDs 0 through 2^n − 1 are usable. |
| PASID Control | +06h | PASID Enable [0] — when 1, device may include PASID TLP Prefix in its DMA TLPs. Execute Permission Enable [1] · Privileged Mode Enable [2]. |
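A small decoder for the PASID Capability register, using the bit positions from the table. The sketch assumes Max PASID Width is the number of PASID bits supported, so the number of distinct PASIDs is 2 raised to that width. The function name and sample value are illustrative.

```python
def decode_pasid_capability(cap: int) -> dict:
    """Decode the 16-bit PASID Capability register (+04h)."""
    width = (cap >> 8) & 0x1F  # Max PASID Width [12:8]
    if width > 20:
        raise ValueError("PASID values are at most 20 bits wide")
    return {
        "execute_permission_supported": bool(cap & (1 << 1)),
        "privileged_mode_supported": bool(cap & (1 << 2)),
        "max_pasid_width": width,
        # Assumption: width n means PASIDs 0 .. 2**n - 1 are available.
        "pasid_count": 1 << width,
    }


# Sample value: full 20-bit PASID support with both optional features.
info = decode_pasid_capability(0x1406)
print(info["max_pasid_width"], info["pasid_count"])  # 20 1048576
```

Software typically compares this width with what the IOMMU supports and uses the smaller of the two when allocating PASIDs.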
LTR allows a device to report how much service latency it can tolerate before performance degrades or data is lost. The Root Complex uses this information to make intelligent decisions about when to service requests — for example, keeping memory in self-refresh longer if all attached devices advertise high latency tolerance, saving significant platform power.
LTR is sent as a Message TLP with local routing: it terminates at the receiving port rather than being forwarded unchanged to the Root Complex. Switches aggregate LTR messages from all downstream ports, reporting the minimum (most demanding) latency upstream. The Root Complex is not required to honour LTR requests, but is strongly encouraged to. Devices should send an updated LTR message whenever their service requirements change and must send one with Requirement=0 before entering a low-power state.
| Register | Offset | Key bits |
|---|---|---|
| LTR Extended Capability Header | +00h | Cap ID = 0018h. The structure has no enable bit of its own; LTR is enabled via LTR Mechanism Enable in the PCIe Capability Device Control 2 register. |
| Max Snoop Latency | +04h | Maximum Snoop Latency Value [9:0] · Scale [12:10]. Programmed by software: the largest snoop latency the device is permitted to request in its LTR messages. |
| Max No-Snoop Latency | +06h | Same encoding as Max Snoop Latency, for the no-snoop path. |
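The Value and Scale fields combine into a latency in nanoseconds. The sketch below assumes the usual LTR encoding in which each scale step multiplies the unit by 32 (1 ns, 32 ns, 1024 ns, and so on up to scale 5). That multiplier is not given in the table, so treat it as an assumption. The function name is illustrative.

```python
def ltr_latency_ns(reg: int) -> int:
    """Convert a Max Snoop / Max No-Snoop Latency register value to ns."""
    value = reg & 0x3FF         # Latency Value [9:0]
    scale = (reg >> 10) & 0x7   # Latency Scale [12:10]
    if scale > 5:
        raise ValueError("scale encodings 6 and 7 are not permitted")
    # Assumption: unit = 32**scale ns, i.e. a left shift by 5 bits per step.
    return value << (5 * scale)


print(ltr_latency_ns(0x1003))  # value 3, scale 4: 3 * 1,048,576 ns
print(ltr_latency_ns(0x0BFF))  # value 1023, scale 2: 1023 * 1024 ns
```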
The L1 PM Sub-States capability adds additional power levels within the L1 link state. Standard L1 requires both the upstream and downstream component to be in L1 simultaneously. L1 PM Sub-States introduce two sub-states, L1.1 and L1.2, each of which can be entered through either ASPM or PCI-PM (hence four support and enable bits). They allow even more aggressive power savings while keeping the entry/exit latency predictable.
| Register | Offset | Key bits |
|---|---|---|
| L1SS Capabilities | +04h | PCI-PM L1.2 Supported [0] · PCI-PM L1.1 Supported [1] · ASPM L1.2 Supported [2] · ASPM L1.1 Supported [3] · L1 PM Substates Supported [4]. Port Common_Mode_Restore_Time [15:8]. Port T_POWER_ON Scale [17:16] · Port T_POWER_ON Value [23:19]. |
| L1SS Control 1 | +08h | PCI-PM L1.2 Enable [0] · PCI-PM L1.1 Enable [1] · ASPM L1.2 Enable [2] · ASPM L1.1 Enable [3]. Common_Mode_Restore_Time [15:8]: time needed to restore common mode on exit. LTR_L1.2_THRESHOLD Value [25:16] · Scale [31:29]: programmed threshold for L1.2 eligibility. |
| L1SS Control 2 | +0Ch | T_POWER_ON Scale [1:0]: units for T_POWER_ON (2 µs, 10 µs, 100 µs). T_POWER_ON Value [7:3]: multiplied by the scale to give the time needed after power is restored on L1.2 exit. |
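The timing fields can be turned into real units as sketched below. The bit positions are assumptions matching the layout used by Linux's `pci_regs.h`: T_POWER_ON Scale in Control 2 bits [1:0] and Value in [7:3], and the LTR_L1.2_THRESHOLD Value in Control 1 bits [25:16] with its Scale in [31:29]. The function names and the eligibility rule (L1.2 is allowed when the reported LTR latency is at least the threshold) are illustrative.

```python
T_POWER_ON_SCALE_US = {0: 2, 1: 10, 2: 100}  # scale encoding -> microseconds


def t_power_on_us(ctl2: int) -> int:
    """Return T_POWER_ON in microseconds from L1SS Control 2."""
    scale = ctl2 & 0x3           # assumed position [1:0]
    value = (ctl2 >> 3) & 0x1F   # assumed position [7:3]
    if scale not in T_POWER_ON_SCALE_US:
        raise ValueError("reserved T_POWER_ON scale encoding")
    return value * T_POWER_ON_SCALE_US[scale]


def ltr_l12_threshold_ns(ctl1: int) -> int:
    """Return LTR_L1.2_THRESHOLD in nanoseconds from L1SS Control 1."""
    value = (ctl1 >> 16) & 0x3FF  # assumed position [25:16]
    scale = (ctl1 >> 29) & 0x7    # assumed position [31:29]
    if scale > 5:
        raise ValueError("scale encodings 6 and 7 are not permitted")
    return value << (5 * scale)   # same 32x-per-step scaling as LTR


def l12_allowed(reported_ltr_ns: int, ctl1: int) -> bool:
    """L1.2 is eligible only if the tolerated latency meets the threshold."""
    return reported_ltr_ns >= ltr_l12_threshold_ns(ctl1)


print(t_power_on_us(0x29))                # value 5 at 10 us per unit
print(ltr_l12_threshold_ns(0x40370000))   # value 55 at 1024 ns per unit
print(l12_allowed(100_000, 0x40370000))
```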
ACS prevents a PCIe device from bypassing the IOMMU by sending DMA directly to another PCIe device (peer-to-peer DMA). Without ACS, a compromised SR-IOV Virtual Function could DMA directly to another VF’s memory — bypassing all OS memory protection. ACS enforces that all DMA passes through the Root Complex and IOMMU, even when the source and destination are on the same switch.
| ACS feature bit | When enabled |
|---|---|
| ACS Source Validation (SV) | Switch verifies that the Requester ID in incoming upstream TLPs matches a valid downstream port. Prevents spoofed requester IDs. |
| ACS Translation Blocking (TB) | Switch/root port blocks upstream memory requests whose AT field is not 00b (Untranslated). Prevents devices from sending pre-translated addresses that bypass the IOMMU. |
| ACS P2P Request Redirect (RR) | All peer-to-peer memory requests are redirected upstream to the Root Complex (and IOMMU) instead of being forwarded directly downstream to the target device. |
| ACS P2P Completion Redirect (CR) | Completions resulting from redirected P2P requests are also sent upstream through the Root Complex. |
| ACS Upstream Forwarding (UF) | Requests and completions received on a downstream port that would be routed back out of that same port are forwarded upstream instead, so the Root Complex can validate and redirect them. |
| ACS P2P Egress Control (EC) | Fine-grained control of which downstream ports may receive peer-to-peer TLPs based on a bitmask table. Allows P2P between trusted ports but not untrusted ones. |
| ACS Direct Translated P2P (DT) | Allows translated (AT=10b) P2P requests between trusted ports — used for high-performance GPU-to-GPU DMA via IOMMU-managed translations. |
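A sketch of how software might inspect an ACS Control value. It assumes the seven features occupy bits 0 to 6 in the order of the table above (SV = bit 0 through DT = bit 6), which the table itself does not state. The isolation check requires SV, RR, CR, and UF together, a combination modelled on the set Linux requires when forming IOMMU groups; treat that policy as an assumption rather than a rule from the specification.

```python
ACS_BITS = {
    "SV": 0x01,  # Source Validation
    "TB": 0x02,  # Translation Blocking
    "RR": 0x04,  # P2P Request Redirect
    "CR": 0x08,  # P2P Completion Redirect
    "UF": 0x10,  # Upstream Forwarding
    "EC": 0x20,  # P2P Egress Control
    "DT": 0x40,  # Direct Translated P2P
}

# Assumed policy: these four together force peer traffic through the IOMMU.
ISOLATION_MASK = ACS_BITS["SV"] | ACS_BITS["RR"] | ACS_BITS["CR"] | ACS_BITS["UF"]


def enabled_features(acs_control: int):
    """Return the short names of all ACS features enabled in the register."""
    return [name for name, mask in ACS_BITS.items() if acs_control & mask]


def provides_isolation(acs_control: int) -> bool:
    """True if every feature in ISOLATION_MASK is enabled."""
    return (acs_control & ISOLATION_MASK) == ISOLATION_MASK


print(enabled_features(0x1D))     # ['SV', 'RR', 'CR', 'UF']
print(provides_isolation(0x1D))   # True
print(provides_isolation(0x05))   # False: CR and UF are missing
```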
VSEC allows vendors to expose proprietary extended registers in the 100h–FFFh region in a format that is self-describing and discoverable via the standard linked list. Software can skip VSEC structures it doesn’t recognise without corrupting the chain. Any number of VSEC structures may be present.
| Field | Offset | Content |
|---|---|---|
| Extended Cap Header | +00h | Cap ID = 000Bh · Version · Next Offset |
| VSEC Header | +04h | VSEC ID [15:0] — vendor-assigned ID for this specific VSEC structure. VSEC Rev [19:16] — version. VSEC Length [31:20] — total byte length of this VSEC structure including the 8-byte header. |
| VSEC Registers | +08h onwards | Vendor-defined. Size = VSEC Length − 8 bytes. |
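Decoding the VSEC header follows directly from the field layout in the table. The function name and the sample header value are assumptions for illustration.

```python
def decode_vsec_header(header: int) -> dict:
    """Decode the 32-bit VSEC Header found at +04h in a VSEC structure."""
    length = (header >> 20) & 0xFFF  # VSEC Length [31:20], in bytes
    return {
        "vsec_id": header & 0xFFFF,       # VSEC ID [15:0]
        "rev": (header >> 16) & 0xF,      # VSEC Rev [19:16]
        "length": length,
        # Vendor-defined registers follow the two header DWORDs.
        "payload_bytes": max(length - 8, 0),
    }


# Sample: VSEC ID 1234h, revision 1, 36 bytes in total.
info = decode_vsec_header(0x02411234)
print(hex(info["vsec_id"]), info["rev"], info["length"], info["payload_bytes"])
```

Software that does not recognise the VSEC ID can ignore the payload and simply follow the Next Offset in the extended capability header.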
Common uses of VSEC: proprietary debug/trace registers, firmware update interfaces, thermal monitoring, hardware performance counters, vendor-specific link health statistics, and manufacturing test access ports. Examples: Intel’s VSEC for PCIe Performance Monitoring Units (PMU), NVIDIA’s VSEC for GPU health status and throttle reasons.
The existing extended capability structures carry forward into Gen 6. AER, VC, DSN, ATS, PASID, LTR, L1SS, ACS, VSEC: their capability IDs, register offsets, and basic software interfaces are preserved, so software written against earlier generations continues to find and use them.
Gen 6 adds new extended capability IDs for its new features:
| Extended Cap ID | Structure | Gen 6 purpose |
|---|---|---|
| 0031h | Physical Layer 64.0 GT/s Capability | Reports Gen 6 PAM4 equalization status, FEC capability and enable, flit mode negotiation status, lane margining at Gen 6 speeds. Counterpart of the Gen 3 (Secondary PCI Express, 0019h), Gen 4 (0026h), and Gen 5 (002Ah) PHY capability structures, but for 64 GT/s. |
| 002Bh | Alternate Protocol | Negotiates CXL protocol over PCIe 6.0 physical link. CXL 3.0 uses PCIe 6.0 PHY and flit mode but adds cache coherence protocols. This capability identifies whether a link is operating as CXL.io, CXL.cache, or CXL.mem in addition to or instead of standard PCIe. |
| 002Eh | DOE (Data Object Exchange), the transport for SPDM | Device identity and attestation using SPDM (Security Protocol and Data Model) messages carried over the DOE mailbox; SPDM has no extended capability ID of its own. Critical for secure AI accelerator deployment: allows the host to verify firmware integrity and device authenticity before trusting it with sensitive model weights or user data. |
| 0030h | IDE (Integrity and Data Encryption) | PCIe IDE builds on SPDM to provide TLP-level encryption and integrity protection. Protects PCIe traffic from physical tapping on the PCB. Critical for Gen 6 AI infrastructure where model IP protection is required. |

Quick reference:

| Item | Value / Rule |
|---|---|
| Extended Config Space location | Offsets 100h–FFFh — 3840 bytes (960 DWs) per function |
| Access mechanism | ECAM (MMCFG) only — CF8/CFC cannot reach 100h+ |
| ECAM region size | 256 MB total — 256 buses × 32 devices × 8 functions × 4 KB |
| ECAM address formula | ECAM_BASE + (Bus×1MB) + (Dev×32KB) + (Fn×4KB) + Offset |
| ECAM base advertisement | ACPI MCFG table — consumed by OS during boot |
| Extended cap header | 32-bit DW: Cap ID [15:0] · Cap Version [19:16] · Next Offset [31:20] |
| First extended cap offset | Always 100h — first DWORD at 100h is the first extended cap header |
| End-of-list sentinel | Next Offset = 000h (points to Vendor ID register — never a valid cap location) |
| AER Cap ID | 0001h — Uncorrectable Error Status/Mask/Severity + Correctable Status/Mask + AECR + Header Log (4 DWs) + Root Error registers (RC only) |
| AER ECRC fields in AECR | Bits 5–8: ECRC Gen Capable, Gen Enable, Check Capable, Check Enable |
| AER First Error Pointer | AECR bits [4:0] — bit position of first uncorrectable error type that fired |
| AER Header Log | 16 bytes (4 DWs at 11Ch–12Bh) — captures TLP header that caused first uncorrectable error |
| VC Cap ID | 0002h — TC/VC mapping, VC arbitration. VC0+TC0 always linked. VC1–7 optional. |
| DSN Cap ID | 0003h — 64-bit globally unique serial number. Upper 24 bits = IEEE OUI. |
| ATS Cap ID | 000Fh — enables IOMMU translation caching. Device marks DMA with AT=10b after caching translation. |
| ACS Cap ID | 000Dh — prevents P2P DMA bypassing IOMMU. Required for SR-IOV security. Linux VFIO checks ACS before device assignment. |
| PASID Cap ID | 001Bh — 20-bit Process Address Space ID tags DMA with per-process IOMMU context. Required for per-process isolation on shared accelerators. |
| LTR Cap ID | 0018h: device reports snoop and no-snoop latency tolerance. Sent as a locally routed Message TLP that terminates at the receiving port; switches aggregate towards the Root Complex. |
| LTR enable path | LTR Mechanism Enable, bit 10 of the Device Control 2 register (offset 28h in the PCIe Capability): must be set before the device sends LTR messages. |
| L1SS Cap ID | 001Eh — L1.1/L1.2 ASPM and PM sub-states. L1.2 deepest — PLL off. Requires LTR to know when it’s safe to enter L1.2. |
| VSEC Cap ID | 000Bh — vendor-specific extended registers. VSEC Length [31:20] tells software how many bytes to skip. |
| Gen 6 new cap IDs | 0031h: PHY 64.0 GT/s · 002Bh: CXL Alternate Protocol · 002Eh: DOE (SPDM attestation transport) · 0030h: IDE encryption |

Extended capability ID summary:

| Extended Cap ID | Name | Mandatory? | Key purpose |
|---|---|---|---|
| 0001h | AER | Strongly recommended | Detailed error logging, Header Log, ECRC |
| 0002h | Virtual Channel | Required if multi-VC | TC→VC mapping, VC arbitration |
| 0003h | Device Serial Number | Optional | Globally unique 64-bit ID |
| 0004h | Power Budgeting | Optional | Hot-plug power budgeting |
| 000Bh | VSEC | Optional | Vendor-specific registers |
| 000Dh | ACS | Required for SR-IOV | P2P DMA blocking, IOMMU enforcement |
| 000Fh | ATS | Optional (perf) | IOMMU translation caching |
| 0018h | LTR | Optional | Latency tolerance → platform PM |
| 001Bh | PASID | Required for per-process isolation | Per-process IOMMU context tagging |
| 001Eh | L1 PM Sub-States | Optional | L1.1/L1.2 deeper power states |
| 0031h | Physical Layer 64.0 GT/s | Required for Gen 6 | PAM4 EQ status, FEC capability |
| 002Bh | Alternate Protocol (CXL) | Required for CXL | CXL protocol negotiation over PCIe 6.0 PHY |
| 0030h | IDE | Optional (security) | TLP-level encryption and integrity |