The 2-bit Address Type field in every TLP header, why IOMMUs need hardware address translation support, how Address Translation Services negotiate translated addresses, and how the Page Request Interface lets devices handle page faults gracefully. Gen 6 context throughout.
In a standard PCIe system, a device uses physical host memory addresses directly. When it performs a DMA write, the address in the TLP header is a physical address that the Root Complex maps straight to system RAM. This works perfectly for a single operating system with full trust in its devices.
Virtualisation breaks this model. A virtual machine believes it owns contiguous physical memory starting at address 0. In reality that memory is scattered across the real physical address space by the hypervisor. A PCIe device assigned to a virtual machine must be able to use the VM’s view of addresses — called Guest Physical Addresses (GPA) — while the hardware translates them transparently to real Host Physical Addresses (HPA) before they reach memory.
Without an IOMMU, either the hypervisor must reprogramme every device BAR and DMA address every time a VM migrates (expensive and slow), or devices can DMA to arbitrary physical addresses — a severe security hole that lets a malicious VM corrupt any other VM’s memory or the hypervisor itself.
An IOMMU (Input/Output Memory Management Unit) sits in the Root Complex and intercepts every inbound DMA request before it reaches the memory controller. It maintains page tables — similar to the CPU’s MMU page tables — mapping each device’s view of addresses to real physical addresses.
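The per-device lookup described above can be pictured with a toy model. This is only a sketch under stated assumptions: `ToyIommu`, its methods, the flat one-level table, and the 4 KB page size are all hypothetical illustrations; real IOMMUs use multi-level tables located through per-device context entries.

```python
PAGE_SIZE = 4096  # assumption: 4 KB pages only


class ToyIommu:
    """Hypothetical model: Requester ID -> {guest page number: host page number}."""

    def __init__(self):
        self.tables = {}

    def map_page(self, requester_id, gpa, hpa):
        # Record one page-sized GPA -> HPA mapping for a single device.
        self.tables.setdefault(requester_id, {})[gpa // PAGE_SIZE] = hpa // PAGE_SIZE

    def translate(self, requester_id, gpa):
        # Look up the device's own table; other devices' mappings are invisible to it.
        table = self.tables.get(requester_id, {})
        page = gpa // PAGE_SIZE
        if page not in table:
            raise PermissionError(
                f"device {requester_id:#06x}: no mapping for GPA {gpa:#x}"
            )
        return table[page] * PAGE_SIZE + gpa % PAGE_SIZE


iommu = ToyIommu()
iommu.map_page(0x0100, gpa=0x0000_1000, hpa=0x7F3A_2000)
print(hex(iommu.translate(0x0100, 0x1234)))  # prints 0x7f3a2234
```

A request from a device with no table entry for that page raises an error rather than reaching memory, which is the isolation property the paragraph describes.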
The IOMMU provides three security and isolation guarantees:
The challenge: IOMMU translation adds latency to every DMA access — the IOMMU must walk its page tables before forwarding each TLP. For high-throughput devices like Gen 5/6 GPUs and NVMe arrays, this latency is unacceptable. Address Translation Services (ATS) solves this.
The AT (Address Type) field is a 2-bit field at DW0 bits 11:10 of every memory and AtomicOp TLP header. It tells the Root Complex and any intermediate IOMMU-aware hardware whether the address in this TLP has already been translated by the IOMMU — or whether it still needs translation.
| AT[1:0] | Name | Meaning at the IOMMU / Root Complex | Who sets it |
|---|---|---|---|
| 00 | Default / Untranslated | Normal DMA transaction. The IOMMU must translate this address through its page tables before forwarding to memory. Every DMA access incurs the full table-walk latency. | All devices that have not negotiated ATS. Every PCIe device that does not implement ATS uses AT=00 exclusively. |
| 01 | Translation Request | This TLP is an ATS Translation Request — the device is asking the Root Complex to provide the translated physical address for a given GPA. The RC consults the IOMMU and returns the result. This is not a data access — it is a request for an address mapping. | Devices that implement ATS, during the pre-translation phase before a DMA window is open. |
| 10 | Translated | This TLP carries an address that has already been translated by the IOMMU into a Host Physical Address. The RC and IOMMU may skip the page-table walk and forward directly to the memory controller. The device caches the translated address in its Translation Lookaside Buffer (TLB). | Devices that have previously obtained a translation via AT=01 and are now issuing the actual DMA using the cached HPA. |
| 11 | Reserved | Must not be used. A TLP with AT=11 is treated as a Malformed TLP by the receiver. | Never — reserved by the spec. |
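
As a small illustration of the encoding in the table above, the sketch below extracts and sets AT[1:0] in DW0. It assumes DW0 is held as a 32-bit integer whose bit 31 is the first header bit on the wire; the helper names `at_field` and `set_at` are hypothetical.

```python
AT_NAMES = {
    0b00: "Default / Untranslated",
    0b01: "Translation Request",
    0b10: "Translated",
    0b11: "Reserved",
}


def at_field(dw0: int) -> int:
    """Return AT[1:0], taken from DW0 bits 11:10."""
    return (dw0 >> 10) & 0b11


def set_at(dw0: int, at: int) -> int:
    """Return a copy of DW0 with bits 11:10 replaced by the given AT value."""
    return (dw0 & ~(0b11 << 10) & 0xFFFF_FFFF) | ((at & 0b11) << 10)


dw0 = 0x0000_0001                      # a 3DW MRd header with Length = 1 DW
request = set_at(dw0, 0b01)            # turn it into a Translation Request
print(hex(request), AT_NAMES[at_field(request)])
```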
ATS is the mechanism that allows a device to pre-fetch and cache translated addresses from the IOMMU, so that subsequent DMA accesses use AT=10 (already translated) and bypass the per-access page-table walk. The device builds its own Translation Lookaside Buffer (TLB) of GPA→HPA mappings, keyed by GPA range.
An ATS Translation Request is a standard Memory Read Request (MRd) with AT=01. The device sends it to ask the IOMMU (via the RC) to translate a GPA into an HPA. The header format is identical to an MRd — the only distinguishing feature is AT[1:0] = 01 in DW0.
In a normal MRd, Length specifies how many DWs of data to return. In an ATS Translation Request, Length instead indicates how many translations the device is prepared to receive: each translation entry is 8 bytes (2 DW), so Length is always an even number of DWs. The IOMMU returns the HPA base for the requested address along with attributes indicating the access permissions and cache hints.
The Root Complex/IOMMU responds to an ATS Translation Request with a standard Completion with Data (CplD). The payload carries the translated address plus permission and attribute bits. The format of the payload is defined by the ATS capability:
| Completion field | Value |
|---|---|
| Fmt, Type | CplD (Fmt=010, Type=0_1010) |
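| Status | SC (Successful Completion): the request was processed. A GPA with no valid mapping is still reported with SC, but with both the Read and Write permission bits clear; the device can then raise a page request via PRI. UR: ATS is not supported or not enabled for this function on the platform side. |
| Length | 2 DW (8 bytes) per translation entry returned |
| Payload entry format | Each entry is 64 bits: HPA[63:12] in bits [63:12] · S (Size) bit in bit 11 · U bit (Untranslated Access Only) in bit 2 · Write permission in bit 1 · Read permission in bit 0 |
| Translation size (S bit) | When the S bit is set, the translation covers a range larger than 4 KB, and the size of that range is encoded in the low-order bits of the returned address |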
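
To make the payload layout concrete, here is a minimal sketch that unpacks one 64-bit translation entry using only the bit positions listed in the table above (address in bits 63:12, U in bit 2, W in bit 1, R in bit 0). The `Translation` class and `decode_translation_entry` function are hypothetical names, and the entry is assumed to have already been assembled into a Python integer.

```python
from dataclasses import dataclass


@dataclass
class Translation:
    hpa_base: int            # translated address with the low 12 bits cleared
    read: bool               # bit 0
    write: bool              # bit 1
    untranslated_only: bool  # bit 2 (U)


def decode_translation_entry(entry: int) -> Translation:
    """Split a 64-bit translation entry into address and permission bits."""
    return Translation(
        hpa_base=entry & 0xFFFF_FFFF_FFFF_F000,
        read=bool(entry & 0x1),
        write=bool(entry & 0x2),
        untranslated_only=bool(entry & 0x4),
    )


t = decode_translation_entry(0x0000_0001_2345_6003)
print(hex(t.hpa_base), t.read, t.write, t.untranslated_only)
```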
The device stores the returned HPA in its TLB, tagged with the original GPA, its size, and the permission bits. All subsequent DMA accesses to that GPA range use AT=10 with the cached HPA, completely bypassing the IOMMU table walk. The latency saving is significant — a table walk on a modern system takes 50–200 ns; AT=10 bypasses it entirely.
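The caching behaviour described above can be sketched as a tiny device-side TLB. Everything here is illustrative: `DeviceTlb` and the `request_translation` callback (standing in for the AT=01 exchange) are hypothetical names, pages are assumed to be 4 KB, and permission bits are omitted for brevity.

```python
PAGE_SIZE = 4096  # assumption: 4 KB translations only


class DeviceTlb:
    """Hypothetical device-side cache of GPA page -> HPA page base."""

    def __init__(self, request_translation):
        # request_translation(gpa_page_base) -> hpa_page_base models an AT=01 round trip.
        self._request_translation = request_translation
        self._entries = {}
        self.misses = 0

    def dma_address(self, gpa):
        """Return (AT value, address) to place in the DMA TLP for this GPA."""
        page = gpa // PAGE_SIZE
        if page not in self._entries:
            self.misses += 1  # one translation request per uncached page
            self._entries[page] = self._request_translation(page * PAGE_SIZE)
        return 0b10, self._entries[page] + gpa % PAGE_SIZE


tlb = DeviceTlb(lambda gpa_base: 0x8000_0000 + gpa_base)  # made-up mapping
first = tlb.dma_address(0x2010)
second = tlb.dma_address(0x2FF0)   # same page, served from the cache
print(hex(first[1]), hex(second[1]), tlb.misses)
```

Only the first access to a page pays for a translation round trip; later accesses to the same page reuse the cached HPA.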
When the IOMMU needs to change a mapping — for example when a VM is migrated, a memory page is reclaimed, or an IOMMU page table entry is updated — it must ensure the device’s TLB no longer holds a stale translation for that address range. The ATS Invalidation mechanism handles this.
The Invalidate Request is carried as an ATS Invalidate Request Message (a Message TLP with data, routed by ID to the device), and the device answers with an Invalidate Completion Message. Invalidation completion is mandatory: the IOMMU must not remap a page until the device confirms its TLB is clean for that range. This prevents a device from issuing an AT=10 DMA to an HPA that has since been reassigned to a different owner.
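The ordering rule above (invalidate, wait for confirmation, only then remap) can be sketched as follows. `ToyDevice`, `remap_page`, and the string used as a completion token are hypothetical; a real implementation exchanges TLPs and tracks outstanding invalidations by tag.

```python
class ToyDevice:
    """Hypothetical device holding cached translations (GPA page -> HPA page)."""

    def __init__(self):
        self.tlb = {}

    def invalidate(self, first_page, page_count):
        # Drop every cached translation in the range, then confirm.
        for page in range(first_page, first_page + page_count):
            self.tlb.pop(page, None)
        return "Invalidate Completion"


def remap_page(page_table, device, gpa_page, new_hpa_page):
    """Change a mapping only after the device confirms its TLB is clean."""
    reply = device.invalidate(gpa_page, 1)
    if reply != "Invalidate Completion":
        raise RuntimeError("device did not confirm invalidation; mapping unchanged")
    page_table[gpa_page] = new_hpa_page


device = ToyDevice()
device.tlb[5] = 100
page_table = {5: 100}
remap_page(page_table, device, gpa_page=5, new_hpa_page=200)
print(page_table, device.tlb)
```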
ATS support is declared and negotiated via the ATS Extended Capability structure, which lives in the PCIe Extended Configuration Space (above offset 0x100) of the endpoint. The key registers are:
| Register | Key fields | Purpose |
|---|---|---|
| ATS Capability Register | Invalidate Queue Depth (5 bits) · Page Aligned Request | Declares the maximum number of Invalidate Requests the device can queue before it applies back-pressure. A value of 0 means a depth of 32, not unlimited. The Page Aligned Request bit says the device always requests page-aligned translations. |
| ATS Control Register | Enable bit · STU (Smallest Translation Unit) field | Enable bit: software sets this to activate ATS for this function. STU field: the minimum page size the device supports for translations. Encodings: 0=4 KB, 1=8 KB, 2=16 KB, and so on in powers of 2. |
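
The STU encoding in the table is a power-of-two multiplier on 4 KB, which a short sketch makes explicit. The function names are hypothetical, and the 0 to 31 range check assumes the STU field is 5 bits wide.

```python
def stu_to_bytes(stu: int) -> int:
    """Translate an STU encoding into a size in bytes: 4 KB * 2**stu."""
    if not 0 <= stu <= 31:  # assumption: 5-bit field
        raise ValueError("STU must fit in 5 bits")
    return 4096 << stu


def is_stu_aligned(address: int, stu: int) -> bool:
    """Check that an address is aligned to the smallest translation unit."""
    return address % stu_to_bytes(stu) == 0


for stu in (0, 1, 2):
    print(stu, stu_to_bytes(stu))   # 4096, 8192, 16384
```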
ATS is only active when the platform enables it. Software discovers the ATS capability by scanning the extended capability chain in config space for Capability ID = 0x000F. Both the endpoint and the platform (RC) must support ATS before the device may issue AT=01 or AT=10 TLPs. A device must not issue AT≠00 TLPs until ATS has been enabled; a port with ACS Translation Blocking enabled treats such a TLP as an ACS Violation.
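The capability scan can be sketched against a raw dump of a function's 4 KB config space. The sketch relies on the standard extended capability header layout (Capability ID in bits 15:0, version in bits 19:16, next-capability offset in bits 31:20) and little-endian config space; `find_ext_capability` and the synthetic dump are hypothetical, and how the dump is obtained is left open.

```python
import struct

ATS_EXT_CAP_ID = 0x000F
EXT_CAP_START = 0x100


def find_ext_capability(config: bytes, cap_id: int):
    """Walk the extended capability chain; return the matching offset or None."""
    offset = EXT_CAP_START
    visited = set()  # guards against a malformed, looping chain
    while offset >= EXT_CAP_START and offset not in visited and offset + 4 <= len(config):
        visited.add(offset)
        (header,) = struct.unpack_from("<I", config, offset)
        if header in (0, 0xFFFF_FFFF):  # no capability present at this offset
            return None
        if header & 0xFFFF == cap_id:
            return offset
        offset = (header >> 20) & 0xFFC  # next pointer is DW-aligned
    return None


# Synthetic example: a first capability at 0x100 chaining to ATS at 0x148.
cfg = bytearray(4096)
struct.pack_into("<I", cfg, 0x100, 0x0001 | (1 << 16) | (0x148 << 20))
struct.pack_into("<I", cfg, 0x148, ATS_EXT_CAP_ID | (1 << 16))
print(hex(find_ext_capability(bytes(cfg), ATS_EXT_CAP_ID)))  # prints 0x148
```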
ATS Translation Requests work well when the target GPA is mapped and the page is already resident in physical memory. But in a virtualised system pages can be paged out to disk, swapped, or simply not yet allocated. When a device sends an AT=01 Translation Request for a GPA that has no current valid mapping, the IOMMU cannot simply return an error — the device needs to wait while the page is brought back into memory.
The Page Request Interface (PRI) is the mechanism that handles this gracefully. Instead of failing the DMA and crashing the device driver, PRI allows the device to signal the OS page manager that a page fault has occurred and to wait for the OS to service it before resuming DMA.
PRI operates through three message types, all carried as standard Message TLPs (Msg):
| Message | Direction | Routing | Purpose |
|---|---|---|---|
| Page Request | Device → RC | To Root Complex (routing code 000) | Device signals that it needs a specific GPA range to be made resident in physical memory before it can proceed with DMA. Carries the GPA, requested access type (read/write), and a PRG (Page Request Group) index. |
| Page Response | RC → Device | To Device (ID routing to requester) | OS has serviced the page request. Response codes: Success (page is now resident), Invalid Request (bad address), Response Failure (OS cannot service). Device retries AT=01 on Success. |
| Stop Markers | Device → RC | To Root Complex | Signals the end of a PRI request stream. Used when a device needs to flush all outstanding page requests and wait for all corresponding Page Responses before proceeding. |
A PRG is a group of related page requests that all belong to the same DMA transaction. The device gives each group a PRG index. When all pages in a PRG are resident, the OS sends a single Page Response referencing the PRG index. This allows one OS response to unblock multiple related DMA pages at once rather than responding one page at a time.
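The grouping idea can be sketched as a small bookkeeping structure on the device side. `PrgTracker` and its method names are hypothetical, and the response codes are simply the strings used in the table above.

```python
from collections import defaultdict


class PrgTracker:
    """Hypothetical record of outstanding page requests, keyed by PRG index."""

    def __init__(self):
        self.pending = defaultdict(list)  # prg_index -> list of requested GPAs

    def page_request(self, prg_index, gpa):
        # Each Page Request carries the PRG index of the group it belongs to.
        self.pending[prg_index].append(gpa)

    def page_response(self, prg_index, code):
        """Apply one response to the whole group; return the pages it unblocks."""
        pages = self.pending.pop(prg_index, [])
        return pages if code == "Success" else []


tracker = PrgTracker()
for gpa in (0x1000, 0x2000, 0x3000):
    tracker.page_request(prg_index=7, gpa=gpa)
unblocked = tracker.page_response(prg_index=7, code="Success")
print([hex(g) for g in unblocked])  # one response releases all three pages
```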
The complete PRI sequence for a device performing DMA to a paged-out memory region:
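As a rough sketch of that sequence, based only on the message descriptions in this article (translation attempt, Page Request, OS servicing, Page Response, retry, then DMA), the function below logs each step. `dma_with_pri` and the `os_page_in` callback are hypothetical, and the dictionary stands in for the IOMMU's page tables.

```python
def dma_with_pri(gpa, iommu_map, os_page_in):
    """Return (hpa or None, log of steps) for a DMA to a possibly paged-out GPA."""
    log = ["AT=01 Translation Request"]
    if gpa not in iommu_map:
        log.append("no valid mapping")
        log.append("Page Request")
        code = os_page_in(gpa, iommu_map)  # OS makes the page resident
        log.append(f"Page Response: {code}")
        if code != "Success":
            return None, log               # device must not proceed with the DMA
        log.append("AT=01 Translation Request (retry)")
    log.append("AT=10 DMA")
    return iommu_map[gpa], log


def os_page_in(gpa, iommu_map):
    # Made-up OS behaviour: always succeeds and installs a mapping.
    iommu_map[gpa] = 0x4000_0000 + gpa
    return "Success"


hpa, log = dma_with_pri(0x3000, {}, os_page_in)
print(hex(hpa))
for step in log:
    print(" ", step)
```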
Access Control Services (ACS) is the security layer that prevents devices from abusing the AT field. Without ACS, a malicious device could set AT=10 on a TLP carrying an arbitrary HPA, bypassing the IOMMU entirely and reading or writing any physical memory location.
| ACS function | What it prevents |
|---|---|
| Source Validation | The downstream port checks that the Requester ID of each upstream request falls within the bus number range behind that port. A device therefore cannot spoof another device's Requester ID to borrow its IOMMU mappings or its translated-access rights. |
| Translation Blocking | When ACS Translation Blocking is enabled, the downstream port blocks any upstream memory request with AT≠00. Software enables it on ports facing devices that have not been granted ATS, so untrusted devices cannot issue translated addresses. |
| Peer-to-Peer Control | Prevents one PCIe device from issuing DMA directly to another PCIe device’s memory space, bypassing the IOMMU. All peer-to-peer transactions are redirected through the RC so the IOMMU can enforce access control. |
| Upstream Forwarding | Forces all peer-to-peer requests upstream through the RC, ensuring IOMMU visibility on every access regardless of the source or destination device. |
ACS is configured via the ACS Extended Capability structure (Capability ID = 0x000D) in the config space of switches and Root Ports. A switch downstream port with ACS Translation Blocking enabled will block AT=10 TLPs arriving from the device below it and report an ACS Violation; a blocked non-posted request is completed with Completer Abort (CA) status.
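The Translation Blocking check amounts to a simple filter at the downstream port, sketched below. `DownstreamPort` and its `check` method are hypothetical names; error reporting and the completion returned to the device are deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class DownstreamPort:
    """Hypothetical switch or Root Port facing one device."""

    translation_blocking: bool = True  # software clears this for trusted ATS devices

    def check(self, at: int) -> str:
        # With Translation Blocking on, only untranslated requests may pass upstream.
        if self.translation_blocking and at != 0b00:
            return "blocked"
        return "forwarded"


untrusted = DownstreamPort(translation_blocking=True)
trusted = DownstreamPort(translation_blocking=False)
print(untrusted.check(0b10), untrusted.check(0b00), trusted.check(0b10))
```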
The AT field encoding, the ATS protocol, and the PRI message semantics are all unchanged in Gen 6. They are Transaction Layer features, while Gen 6 mainly changes the layers beneath. ATS Translation Requests and PRI Page Request messages are packed into Gen 6 flits the same way any other TLP is. One caveat: Flit Mode defines its own TLP header layout, so the bit positions quoted in this article should be read as those of the non-flit header format.
| Item | Value / Rule |
|---|---|
| AT field location | DW0 bits 11:10 of every Memory and AtomicOp TLP header |
| AT=00 | Default / Untranslated. Normal DMA — IOMMU walks page tables. All devices that do not implement ATS use this exclusively. |
| AT=01 | Translation Request. Device asking IOMMU/RC to translate a GPA into an HPA. Looks like a standard MRd to the fabric but is intercepted by the IOMMU. |
| AT=10 | Translated. Address is already an HPA, obtained via a prior AT=01 exchange. IOMMU may skip page-table walk. Zero translation latency for the DMA access. |
| AT=11 | Reserved. Malformed TLP if received. |
| AT applies to | Memory TLPs (MRd, MWr) and AtomicOps only. Must be 00 for IO, Config, and Message TLPs. |
| ATS purpose | Pre-translate GPA→HPA, cache in device TLB. Subsequent DMA uses AT=10, bypassing IOMMU table walks. Critical for high-bandwidth Gen 6 devices. |
| ATS Translation Request | Standard MRd with AT=01. GPA in the address field. Length field gives the number of translation entries requested, at 2 DW (8 bytes) each. |
| ATS Translation Completion | Standard CplD. Payload carries HPA plus permission bits (read/write). Device stores this in its TLB. |
| ATS Invalidation | IOMMU sends Invalidate Request to device when a page table entry changes. Device flushes TLB entry and sends Invalidate Complete. IOMMU waits before remapping. |
| ATS config space | ATS Extended Capability, Capability ID=0x000F. Enable bit in ATS Control Register. STU field sets minimum translation unit. |
| PRI purpose | Allows device to handle paged-out memory gracefully. Device sends Page Request, OS pages in the data, OS sends Page Response, device retries translation. |
| Page Request | Device → RC. Routing code 000 (to Root). Carries GPA, access type, PRG index. |
| Page Response | RC → Device. ID routing. Response codes: Success / Invalid Request / Response Failure. |
| PRG | Page Request Group — groups related page requests. One Page Response can unblock an entire PRG. |
| ACS | Access Control Services — prevents misuse of AT=10 by unvalidated devices. Source Validation + Translation Blocking. Required for security in multi-tenant deployments. |
| PASID | Process Address Space Identifier — TLP Prefix that tags translations with a per-process identity. Used with ATS in SR-IOV and multi-tenant Gen 6 scenarios. |
| Gen 6 impact | AT field, ATS, and PRI formats are all unchanged. ATS becomes more valuable at Gen 6 speeds. CXL 3.0 and PASID extend the usage model without changing the core protocol. |