PCIe Series — PCIe-11: Address Translation — AT Field, ATS, and PRI — VLSI Trainers

Address Translation — AT Field, ATS, and PRI

The 2-bit Address Type field in every TLP header, why IOMMUs need hardware address translation support, how Address Translation Services negotiate translated addresses, and how the Page Request Interface lets devices handle page faults gracefully. Gen 6 context throughout.

📋 The Virtualisation Problem

In a standard PCIe system, a device uses physical host memory addresses directly. When it performs a DMA write, the address in the TLP header is a physical address that the Root Complex maps straight to system RAM. This works perfectly for a single operating system with full trust in its devices.

Virtualisation breaks this model. A virtual machine believes it owns contiguous physical memory starting at address 0. In reality that memory is scattered across the real physical address space by the hypervisor. A PCIe device assigned to a virtual machine must be able to use the VM’s view of addresses — called Guest Physical Addresses (GPA) — while the hardware translates them transparently to real Host Physical Addresses (HPA) before they reach memory.

[Diagram: The Virtualisation Address Problem. Virtual Machine: believes it owns RAM at address 0x0000 (Guest Physical Address, GPA). PCIe device assigned to the VM: DMA to GPA 0x1000 goes into the TLP header as-is. IOMMU: intercepts the DMA address and, by page-table lookup, translates GPA 0x1000 to HPA 0x8F3A_1000. Physical RAM: the real location is 0x8F3A_1000, not 0x1000.]
Figure 1 — Virtualisation address gap. The VM and its assigned PCIe device think the memory is at GPA 0x1000. The IOMMU translates that to the real Host Physical Address before the access reaches DRAM. Without IOMMU translation, the device’s DMA would corrupt memory belonging to a different VM or the hypervisor itself.

Without an IOMMU, either the hypervisor must reprogramme every device BAR and DMA address every time a VM migrates (expensive and slow), or devices can DMA to arbitrary physical addresses — a severe security hole that lets a malicious VM corrupt any other VM’s memory or the hypervisor itself.

📋 What an IOMMU Does

An IOMMU (Input/Output Memory Management Unit) sits in the Root Complex and intercepts every inbound DMA request before it reaches the memory controller. It maintains page tables — similar to the CPU’s MMU page tables — mapping each device’s view of addresses to real physical addresses.

The IOMMU provides three security and isolation guarantees:

  1. Isolation — a device assigned to VM-A cannot access memory owned by VM-B or the hypervisor, even if compromised or malicious
  2. Remapping — a VM can be migrated to different physical memory without reprogramming the device. The IOMMU page tables are updated; the device sees the same guest addresses
  3. Translation — the device uses logical addresses; the IOMMU silently translates to physical. The device driver never needs to know physical addresses
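The three guarantees above can be illustrated with a minimal sketch. It assumes a single-level, per-device page table with 4 KB pages; real IOMMUs use multi-level tables, and the class name `ToyIommu`, its methods, and the Requester ID value are hypothetical names chosen for illustration only.

```python
PAGE_SHIFT = 12                      # 4 KB pages (assumption for this sketch)
PAGE_SIZE = 1 << PAGE_SHIFT


class ToyIommu:
    """Hypothetical single-level page table per device (not a real IOMMU layout)."""

    def __init__(self):
        # requester_id -> {guest page number: host page number}
        self.tables = {}

    def map_page(self, requester_id, gpa, hpa):
        table = self.tables.setdefault(requester_id, {})
        table[gpa >> PAGE_SHIFT] = hpa >> PAGE_SHIFT

    def translate(self, requester_id, gpa):
        # Isolation: a device only ever sees its own table.
        table = self.tables.get(requester_id, {})
        host_page = table.get(gpa >> PAGE_SHIFT)
        if host_page is None:
            raise PermissionError(
                f"device {requester_id:#06x} has no mapping for GPA {gpa:#x}")
        # Translation: swap the page number, keep the offset within the page.
        return (host_page << PAGE_SHIFT) | (gpa & (PAGE_SIZE - 1))


iommu = ToyIommu()
iommu.map_page(0x0100, gpa=0x1000, hpa=0x8F3A_1000)
print(hex(iommu.translate(0x0100, 0x1234)))   # 0x8f3a1234
```

Remapping in this model is just another `map_page` call for the same guest page: the device keeps using GPA 0x1000 while the host page behind it changes.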

The challenge: IOMMU translation adds latency to DMA accesses, because the IOMMU must look up each address and, whenever its own translation cache misses, walk its page tables before forwarding the TLP. For high-throughput devices like Gen 5/6 GPUs and NVMe arrays, this latency can become a bottleneck. Address Translation Services (ATS) addresses this.

📋 The AT Field in TLP Header DW0

The AT (Address Type) field is a 2-bit field at DW0 bits 11:10 of every memory and AtomicOp TLP header. It tells the Root Complex and any intermediate IOMMU-aware hardware whether the address in this TLP has already been translated by the IOMMU — or whether it still needs translation.

[Diagram: AT field position in DW0 and all encodings. Fmt [31:29] · Type [28:24] · TC, LN, TH, TD, EP, Attr [23:12] · AT [11:10] (2 bits) · Length [9:0] (payload size in DW). Encodings: 00 = Default (normal DMA) · 01 = Translation Request (asking the IOMMU to translate) · 10 = Translated (address already translated by the IOMMU) · 11 = Reserved.]
Figure 2 — AT field position (DW0 bits 11:10) and all four encodings. AT=00 is the default for all normal DMA traffic. AT=01 marks an ATS Translation Request — asking the IOMMU/RC to translate a GPA. AT=10 marks a TLP that already carries a translated HPA, bypassing IOMMU table walks.

📋 AT Field Encodings

| AT[1:0] | Name | Meaning at the IOMMU / Root Complex | Who sets it |
| --- | --- | --- | --- |
| 00 | Default / Untranslated | Normal DMA transaction. The IOMMU must translate this address through its page tables before forwarding to memory. Every DMA access incurs the full table-walk latency. | All devices that have not negotiated ATS. Every PCIe device that does not implement ATS uses AT=00 exclusively. |
| 01 | Translation Request | This TLP is an ATS Translation Request: the device is asking the Root Complex to provide the translated physical address for a given GPA. The RC consults the IOMMU and returns the result. This is not a data access; it is a request for an address mapping. | Devices that implement ATS, during the pre-translation phase before a DMA window is open. |
| 10 | Translated | This TLP carries an address that has already been translated by the IOMMU into a Host Physical Address. The RC and IOMMU may skip the page-table walk and forward directly to the memory controller. The device caches the translated address in its Translation Lookaside Buffer (TLB). | Devices that have previously obtained a translation via AT=01 and are now issuing the actual DMA using the cached HPA. |
| 11 | Reserved | Must not be used. A TLP with AT=11 is treated as a Malformed TLP by the receiver. | Never. Reserved by the spec. |

AT=00 is the only value allowed for IO, Configuration, and Message TLPs. The AT field is meaningful only for Memory and AtomicOp TLPs. For all other TLP types, AT must be 00 or the receiver may treat it as a Malformed TLP error.
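As a small illustration of the encodings above, the following sketch extracts and sets the AT field in a 32-bit DW0 value using the bit positions 11:10 given in this section. The helper names `get_at` and `set_at` and the sample DW0 value are hypothetical, and DW0 is treated as a plain integer rather than as bytes on the wire.

```python
AT_SHIFT = 10
AT_MASK = 0b11 << AT_SHIFT           # DW0 bits 11:10

AT_NAMES = {
    0b00: "Untranslated",
    0b01: "Translation Request",
    0b10: "Translated",
    0b11: "Reserved",
}


def get_at(dw0: int) -> int:
    """Return the 2-bit AT field of a header DW0."""
    return (dw0 & AT_MASK) >> AT_SHIFT


def set_at(dw0: int, at: int) -> int:
    """Return DW0 with its AT field replaced, leaving all other bits alone."""
    if not 0 <= at <= 0b11:
        raise ValueError("AT is a 2-bit field")
    return (dw0 & ~AT_MASK & 0xFFFF_FFFF) | (at << AT_SHIFT)


dw0 = 0x0000_0001                    # illustrative MRd DW0: Length = 1 DW, AT = 00
dw0_req = set_at(dw0, 0b01)          # mark it as a Translation Request
print(f"{dw0_req:#010x}", AT_NAMES[get_at(dw0_req)])   # 0x00000401 Translation Request
```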

📋 Address Translation Services — Overview

ATS is the mechanism that allows a device to pre-fetch and cache translated addresses from the IOMMU, so that subsequent DMA accesses use AT=10 (already translated) and bypass the per-access page-table walk. The device builds its own Translation Lookaside Buffer (TLB) of GPA→HPA mappings, keyed by GPA range. The ATS specification calls this device-side cache the Address Translation Cache (ATC).

[Diagram: ATS two-phase operation. Phase 1, Translation Request (AT=01): the device has a TLB miss for GPA 0x1000 and sends an AT=01 request to the RC; the RC/IOMMU walks its page table (GPA 0x1000 → HPA 0x8F3A_1000) and returns the translation in a CplD, which the device stores as a TLB entry. Phase 2, real DMA access (AT=10): the device TLB holds the cached HPA 0x8F3A_1000, so it issues MWr or MRd with AT=10; the RC sees AT=10, skips the IOMMU page walk, and the data goes to DRAM. Phase 1 runs once per DMA window (or on a TLB miss). Phase 2 runs for every subsequent access in that window, with zero IOMMU table-walk latency.]
Figure 3 — ATS two-phase operation. Phase 1 (AT=01): device asks for the translation of a GPA, IOMMU walks its page tables, returns the HPA via CplD. Device stores this in its TLB. Phase 2 (AT=10): all subsequent DMA accesses use the cached HPA directly, bypassing IOMMU table walks entirely.
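The two phases can be sketched as a device-side cache in front of a translation agent. This is a toy model under stated assumptions: 4 KB pages, a dictionary in place of real page tables, and hypothetical class names (`ToyTranslationAgent`, `ToyDevice`). It only shows that the table walk happens once per page while every later access reuses the cached result.

```python
PAGE_SHIFT = 12
OFFSET_MASK = (1 << PAGE_SHIFT) - 1


class ToyTranslationAgent:
    """Stands in for the RC/IOMMU; counts how often it has to walk its table."""

    def __init__(self, mappings):
        self.mappings = mappings      # guest page number -> host page number
        self.walks = 0

    def translation_request(self, gpa):
        # Phase 1: the AT=01 request triggers one table walk.
        self.walks += 1
        return self.mappings[gpa >> PAGE_SHIFT]


class ToyDevice:
    def __init__(self, agent):
        self.agent = agent
        self.cache = {}               # device-side translation cache (TLB)

    def dma_address(self, gpa):
        page = gpa >> PAGE_SHIFT
        if page not in self.cache:    # miss: ask for a translation first
            self.cache[page] = self.agent.translation_request(gpa)
        hpa = (self.cache[page] << PAGE_SHIFT) | (gpa & OFFSET_MASK)
        return 0b10, hpa              # Phase 2: issue the DMA with AT=10


agent = ToyTranslationAgent({0x1: 0x8F3A1})
dev = ToyDevice(agent)
for offset in (0x000, 0x040, 0x080):
    at, hpa = dev.dma_address(0x1000 + offset)
print(agent.walks)                    # 1
```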

📋 ATS Translation Request TLP

An ATS Translation Request is a standard Memory Read Request (MRd) with AT=01. The device sends it to ask the IOMMU (via the RC) to translate a GPA into an HPA. The header format is identical to an MRd — the only distinguishing feature is AT[1:0] = 01 in DW0.

[Diagram: ATS Translation Request, an MRd header with AT=01. DW0: Fmt=000/001 · Type=0_0000 (MRd) · TC · Attr · AT=01 · Length. Identical to an MRd DW0 except that the AT field is set to 01. DW1: Requester ID · Tag · Last DW BE=0 · First DW BE. DW2/3: Guest Physical Address (GPA), the address the device wants translated.]
Figure 4 — ATS Translation Request is a standard MRd (no payload) with AT=01. The GPA being translated goes in the address field exactly where a normal DMA address would go. The RC recognises AT=01 and routes the request to the IOMMU instead of forwarding it to the memory controller.

What the Length field means in a Translation Request

In a normal MRd, Length specifies how many DWs of data to return. The same is true of an ATS Translation Request, with one twist: each translation entry in the completion is 8 bytes (2 DW), so Length is set to twice the number of translations the device is prepared to receive. That in turn bounds how large a contiguous block of GPA space, starting at the requested address, a single request can cover. The IOMMU returns the HPA for each translated unit along with attribute bits indicating the access permissions and cache hints.
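A short sketch of how a device might size the Length field for a given GPA window. It assumes that each translation entry occupies 8 bytes (2 DW), which matches the 64-bit payload entry format described in the next section, and that each entry covers one Smallest Translation Unit of 4 KB × 2^STU. The function name is hypothetical, and limits such as the maximum read request size are ignored.

```python
def translation_request_length(window_bytes: int, stu: int = 0) -> int:
    """Length (in DW) for a Translation Request covering window_bytes of GPA space.

    Assumes one 2-DW translation entry per Smallest Translation Unit.
    """
    unit = 4096 << stu                      # bytes covered by one translation
    if window_bytes <= 0 or window_bytes % unit:
        raise ValueError("window must be a positive multiple of the translation unit")
    translations = window_bytes // unit
    length_dw = 2 * translations            # 8 bytes = 2 DW per entry
    if length_dw > 1024:                    # Length is a 10-bit field
        raise ValueError("window too large for a single request")
    return length_dw


print(translation_request_length(32 * 1024))          # 16
```

In this model a 32 KB window with 4 KB units needs 8 translations, so the request carries Length = 16 DW.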

📋 ATS Translation Completion

The Root Complex/IOMMU responds to an ATS Translation Request with a standard Completion with Data (CplD). The payload carries the translated address plus permission and attribute bits. The format of the payload is defined by the ATS capability:

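| Completion field | Value |
| --- | --- |
| Fmt, Type | CplD (Fmt=010, Type=0_1010) |
| Status | SC (Successful Completion): the request was processed. A GPA with no valid or resident mapping is still reported with SC, but with the Read and Write permission bits both clear, which is the device's cue to raise a PRI Page Request or give up. UR: ATS is not supported or not enabled for this function at the Root Complex. |
| Length | 2 DW (8 bytes) per returned translation entry |
| Payload entry format | HPA[63:12] in bits [63:12] · U bit (Untranslated Access Only) in bit 2 · Write permission in bit 1 · Read permission in bit 0 |
| Size indicator | S bit (bit 11 of each entry): when set, the translation covers more than 4 KB, with the size encoded in the low-order bits of the translated address |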

The device stores the returned HPA in its TLB, tagged with the original GPA, its size, and the permission bits. All subsequent DMA accesses to that GPA range use AT=10 with the cached HPA, completely bypassing the IOMMU table walk. The latency saving is significant: a table walk can take on the order of 50–200 ns depending on the platform, and AT=10 avoids it entirely.
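A minimal sketch of decoding one 64-bit translation entry, using only the bit positions listed in the table above (HPA in bits 63:12, U in bit 2, W in bit 1, R in bit 0). The function name and the sample entry are hypothetical, and byte ordering on the wire is not modelled.

```python
def decode_translation_entry(entry: int) -> dict:
    """Split a 64-bit translation entry into its address and permission bits."""
    return {
        "hpa": entry & 0xFFFF_FFFF_FFFF_F000,     # bits 63:12
        "read": bool(entry & 0x1),                # bit 0
        "write": bool(entry & 0x2),               # bit 1
        "untranslated_only": bool(entry & 0x4),   # bit 2 (U)
    }


entry = 0x8F3A_1000 | 0b011          # HPA 0x8F3A_1000, read and write allowed
decoded = decode_translation_entry(entry)
print(hex(decoded["hpa"]), decoded["read"], decoded["write"])   # 0x8f3a1000 True True
```

A device would cache `decoded["hpa"]` only when at least one of the permission bits is set, and would keep using AT=00 for the range if `untranslated_only` is set.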

One translation covers many DMA operations. A device might request a 2 MB translation window for a DMA ring buffer. That window spans 512 pages of 4 KB, and every DMA to any of them, however many times the ring is reused, goes out with AT=10. The amortised cost of the translation lookup approaches zero for large DMA workloads.

📋 ATS Invalidation — Keeping Caches Fresh

When the IOMMU needs to change a mapping — for example when a VM is migrated, a memory page is reclaimed, or an IOMMU page table entry is updated — it must ensure the device’s TLB no longer holds a stale translation for that address range. The ATS Invalidation mechanism handles this.

[Diagram: ATS invalidation handshake. RC/IOMMU: page table changed, must revoke the TLB entry, sends an Invalidate Request message carrying the GPA range to revoke. Device: receives the Invalidate, flushes matching TLB entries, sends Invalidate Complete. Key rule: the RC must not use the new page table mapping until the device has confirmed its TLB entry is gone. DMA operations with AT=10 after this are safe. In-flight AT=10 TLPs must also complete before the remap.]
Figure 5 — ATS Invalidation handshake. The RC sends an Invalidate Request specifying the GPA range to revoke. The device flushes those TLB entries and replies with Invalidate Complete. Only after receiving the completion can the IOMMU safely remap that GPA range to a different HPA.

The Invalidate Request is carried as a Message with Data (MsgD) TLP routed by ID to the device, and the device answers with an Invalidate Completion Message. Invalidation completion is mandatory: the IOMMU must not remap a page until the device confirms its TLB is clean for that range. This prevents a device from issuing an AT=10 DMA to an HPA that the IOMMU has already remapped to a different GPA.
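The ordering rule can be sketched as follows. This is a toy model that follows the sequence described above (flush first, remap only after the device confirms); the names `ToyAtc` and `remap` are hypothetical, and in-flight AT=10 TLPs are not modelled.

```python
PAGE_SHIFT = 12


class ToyAtc:
    """Hypothetical device-side translation cache: guest page -> host page."""

    def __init__(self, entries):
        self.entries = dict(entries)

    def invalidate(self, gpa_base, size):
        # Drop every cached page that overlaps [gpa_base, gpa_base + size).
        first = gpa_base >> PAGE_SHIFT
        last = (gpa_base + size - 1) >> PAGE_SHIFT
        for page in [p for p in self.entries if first <= p <= last]:
            del self.entries[page]
        return "Invalidate Completion"


def remap(iommu_table, atc, gpa, new_hpa):
    # 1. Revoke the device's cached translation and wait for its confirmation.
    ack = atc.invalidate(gpa, 1 << PAGE_SHIFT)
    if ack != "Invalidate Completion":
        raise RuntimeError("device did not confirm the flush")
    # 2. Only now point the guest page at the new host page.
    iommu_table[gpa >> PAGE_SHIFT] = new_hpa >> PAGE_SHIFT


iommu_table = {0x1: 0x8F3A1, 0x2: 0x8F3A2}
atc = ToyAtc(iommu_table)            # device has both pages cached
remap(iommu_table, atc, gpa=0x1000, new_hpa=0x4000_0000)
print(sorted(atc.entries))           # [2]
```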

📋 ATS Configuration Registers

ATS support is declared and negotiated via the ATS Extended Capability structure, which lives in the PCIe Extended Configuration Space (above offset 0x100) of the endpoint. The key registers are:

| Register | Key fields | Purpose |
| --- | --- | --- |
| ATS Capability Register | Invalidate Queue Depth (5 bits) · Page Aligned Request | Declares how many Invalidate Requests the device can accept before it applies back-pressure. A value of 0 means 32. The Page Aligned Request bit says the device always requests page-aligned translations. |
| ATS Control Register | Enable bit · STU (Smallest Translation Unit) field | Enable bit: software sets this to activate ATS for this function. STU field: the smallest translation size in use, expressed as 2^STU blocks of 4 KB. Encodings: 0=4 KB, 1=8 KB, 2=16 KB, and so on in powers of 2. |

ATS is only active when the platform enables it. Software discovers the ATS capability by scanning the extended capability chain in config space for Capability ID = 0x000F. Both the endpoint and the platform (RC) must support ATS before the device may issue AT=01 or AT=10 TLPs. A TLP with AT≠00 from a function that has not been enabled for ATS can be blocked, for example by Access Control Services (ACS) Translation Blocking on the port above it.
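The discovery step can be sketched against a raw 4 KB config-space image. The extended capability header layout used here (Capability ID in bits 15:0, next pointer in bits 31:20, chain starting at 0x100) is standard. The register offsets within the ATS capability (Control Register at +0x06, Enable in bit 15, STU in bits 4:0) reflect my reading of the ATS capability layout and should be checked against the specification. The function name and the fabricated config image are hypothetical.

```python
import struct

ATS_CAP_ID = 0x000F


def find_ext_capability(config: bytes, cap_id: int):
    """Walk the extended capability chain; return the offset of cap_id or None."""
    offset, seen = 0x100, set()
    while offset and offset not in seen and offset + 4 <= len(config):
        seen.add(offset)                              # guard against loops
        (header,) = struct.unpack_from("<I", config, offset)
        if header in (0, 0xFFFF_FFFF):                # empty chain / no device
            return None
        if header & 0xFFFF == cap_id:
            return offset
        offset = (header >> 20) & 0xFFC               # next capability pointer
    return None


# Fabricated image: some other capability at 0x100 chained to ATS at 0x148.
config = bytearray(4096)
struct.pack_into("<I", config, 0x100, 0x0001 | (1 << 16) | (0x148 << 20))
struct.pack_into("<I", config, 0x148, ATS_CAP_ID | (1 << 16))
struct.pack_into("<H", config, 0x148 + 6, 0x8000 | 1)   # Enable = 1, STU = 1

ats = find_ext_capability(bytes(config), ATS_CAP_ID)
(control,) = struct.unpack_from("<H", config, ats + 6)
enabled = bool(control & 0x8000)
stu_bytes = 4096 << (control & 0x1F)
print(hex(ats), enabled, stu_bytes)                   # 0x148 True 8192
```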

📋 Page Request Interface — Overview

ATS Translation Requests work well when the target GPA is mapped and the page is already resident in physical memory. But in a virtualised system pages can be paged out to disk, swapped, or simply not yet allocated. When a device sends an AT=01 Translation Request for a GPA that has no current valid mapping, the IOMMU cannot simply return an error — the device needs to wait while the page is brought back into memory.

The Page Request Interface (PRI) is the mechanism that handles this gracefully. Instead of failing the DMA and crashing the device driver, PRI allows the device to signal the OS page manager that a page fault has occurred and to wait for the OS to service it before resuming DMA.

[Diagram: PRI vs ATS, what happens on a page fault. Without PRI, translation fails: ① device sends AT=01 for GPA 0x5000; ② IOMMU finds the page not resident (paged out); ③ the Translation Request yields no usable translation; ④ device driver sees the error and aborts the DMA; ⑤ application crash or DMA failure; ⑥ no recovery possible at hardware level. With PRI, graceful page-fault handling: ① device sends AT=01 for GPA 0x5000; ② IOMMU finds the page not resident; ③ device sends a PRI Page Request message; ④ OS pages in the memory from disk; ⑤ OS sends Page Response, “page ready”; ⑥ device retries AT=01, succeeds, and performs the DMA.]
Figure 6 — PRI enables graceful page-fault recovery. Without PRI, a Translation Request for a paged-out page comes back without a usable translation and the DMA fails. With PRI, the device sends a Page Request message and waits while the OS pages in the data. Once the OS signals “page ready” via Page Response, the device retries and succeeds.

📋 PRI Messages

PRI operates through three message types, all carried as Message TLPs (Msg):

| Message | Direction | Routing | Purpose |
| --- | --- | --- | --- |
| Page Request | Device → RC | To Root Complex (routing code 000) | Device signals that it needs a specific GPA range to be made resident in physical memory before it can proceed with DMA. Carries the GPA, requested access type (read/write), and a PRG (Page Request Group) index. |
| Page Response | RC → Device | To Device (ID routing to requester) | OS has serviced the page request. Response codes: Success (page is now resident), Invalid Request (bad address), Response Failure (OS cannot service). Device retries AT=01 on Success. |
| Stop Marker | Device → RC | To Root Complex | A special form of Page Request that marks the end of a PRI request stream, typically for a PASID the device has stopped using. It tells software that no further page requests for that stream are in flight. |

Page Request Group (PRG)

A PRG is a group of related page requests that all belong to the same DMA transaction. The device gives each group a PRG index. When all pages in a PRG are resident, the OS sends a single Page Response referencing the PRG index. This allows one OS response to unblock multiple related DMA pages at once rather than responding one page at a time.
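The grouping idea can be sketched from the OS side. This toy tracker assumes the device marks the final request of a group with a "last" flag, so the OS knows when the group is complete. The class name `PrgTracker` and its methods are hypothetical.

```python
from collections import defaultdict


class PrgTracker:
    """Hypothetical OS-side bookkeeping for Page Request Groups."""

    def __init__(self):
        self.pending = defaultdict(set)   # PRG index -> pages still to service
        self.closed = set()               # groups whose last request has arrived

    def page_request(self, prg_index, page, last=False):
        self.pending[prg_index].add(page)
        if last:
            self.closed.add(prg_index)

    def page_serviced(self, prg_index, page):
        """Mark one page resident; return a response once the whole group is done."""
        self.pending[prg_index].discard(page)
        if prg_index in self.closed and not self.pending[prg_index]:
            self.closed.discard(prg_index)
            del self.pending[prg_index]
            return ("Page Response", prg_index, "Success")
        return None


tracker = PrgTracker()
tracker.page_request(5, 0x10)
tracker.page_request(5, 0x11, last=True)
first = tracker.page_serviced(5, 0x10)    # group not finished yet
second = tracker.page_serviced(5, 0x11)   # whole group resident
print(first, second)                      # None ('Page Response', 5, 'Success')
```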

📋 PRI Full Flow

The complete PRI sequence for a device performing DMA to a paged-out memory region:

  1. Device issues ATS Translation Request (AT=01) for GPA range. IOMMU checks page tables — page not resident.
  2. IOMMU completes the request without a usable translation (a successful completion whose Read and Write permission bits are both clear). Device’s ATS logic sees the miss.
  3. Device sends Page Request Message to the RC, specifying the GPA, access permissions, and PRG index. Device pauses this DMA context — it does not abort, just waits.
  4. RC forwards the Page Request to the OS page manager (via an interrupt or kernel notification path).
  5. OS pages in the required memory from disk or swap, updates the IOMMU page tables, marks the page as resident.
  6. OS sends a Page Response to the device (via the RC) indicating success and the PRG index that was satisfied.
  7. Device receives the Page Response. It retries the ATS Translation Request (AT=01) for the same GPA.
  8. This time IOMMU finds the page resident, returns the HPA via CplD.
  9. Device adds the translation to its TLB. Subsequent DMA uses AT=10. DMA proceeds normally.
PRI requires OS support. The OS must implement a PRI page fault handler. On Linux this is part of the IOMMU subsystem (intel-iommu or AMD IOMMU driver). On Windows it is handled by the HAL and IOMMU driver stack. A device that sends a Page Request to an OS without PRI support will receive a Response Failure and must fall back to AT=00 default DMA.
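The nine steps above can be condensed into a toy simulation. It models the OS and IOMMU together as one object, treats "no usable translation" as `None`, and uses hypothetical names (`ToyOs`, `dma_with_pri`). Interrupt delivery, PRG grouping, and timeouts are left out.

```python
PAGE_SHIFT = 12
OFFSET_MASK = (1 << PAGE_SHIFT) - 1


class ToyOs:
    """Stands in for the OS page manager plus the IOMMU page tables."""

    def __init__(self, paged_out):
        self.paged_out = dict(paged_out)  # guest page -> host page once paged in
        self.iommu_table = {}             # currently resident mappings

    def translation_request(self, gpa):
        # AT=01: returns the host page, or None when nothing usable is mapped.
        return self.iommu_table.get(gpa >> PAGE_SHIFT)

    def page_request(self, gpa, prg_index):
        page = gpa >> PAGE_SHIFT
        if page not in self.paged_out:
            return prg_index, "Invalid Request"
        self.iommu_table[page] = self.paged_out.pop(page)   # page in and map
        return prg_index, "Success"


def dma_with_pri(os_, gpa, prg_index=0):
    log = []
    host_page = os_.translation_request(gpa)                # steps 1-2
    if host_page is None:
        log.append("translation miss")
        _, code = os_.page_request(gpa, prg_index)          # steps 3-6
        log.append(f"page response: {code}")
        if code != "Success":
            raise RuntimeError(f"page request failed: {code}")
        host_page = os_.translation_request(gpa)            # steps 7-8
    log.append("DMA with AT=10")                            # step 9
    return (host_page << PAGE_SHIFT) | (gpa & OFFSET_MASK), log


os_ = ToyOs({0x5: 0x9ABC0})
hpa, log = dma_with_pri(os_, 0x5010)
print(hex(hpa), log)
```

A second call for the same page skips the Page Request entirely, because the mapping is now resident.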

📋 ACS — Access Control Services

Access Control Services (ACS) is the security layer that prevents devices from abusing the AT field. Without ACS, a malicious device could set AT=10 on a TLP carrying an arbitrary HPA, bypassing the IOMMU entirely and reading or writing any physical memory location.

| ACS function | What it prevents |
| --- | --- |
| Source Validation | Requester ID spoofing. The Downstream Port checks that the bus number in the Requester ID of each upstream request falls within the bus range assigned below that port, so a device cannot impersonate a function that has been granted translated access. |
| Translation Blocking | When ACS Translation Blocking is enabled on a Downstream Port, the port blocks upstream Memory Requests whose AT field is not 00. This ensures untrusted devices below that port cannot issue translated addresses. |
| Peer-to-Peer Control | Prevents one PCIe device from issuing DMA directly to another PCIe device’s memory space, bypassing the IOMMU. Peer-to-peer transactions are redirected through the RC so the IOMMU can enforce access control. |
| Upstream Forwarding | Forces peer-to-peer requests upstream through the RC, ensuring IOMMU visibility on every access regardless of the source or destination device. |

ACS is configured via the ACS Extended Capability structure (Capability ID = 0x000D) in the config space of switches and Root Ports. A switch port with Translation Blocking enabled will reject AT≠00 requests arriving from below it and report an ACS Violation; for a Non-Posted Request the offending device receives a Completer Abort completion.

AT Field, ATS, and PRI in Gen 6

The AT field encoding, ATS protocol, and PRI message format are all unchanged in Gen 6. They are Transaction Layer features — Gen 6 changes only the Physical Layer. ATS Translation Requests and PRI Page Request messages are packed into Gen 6 flits the same way any other TLP is, with no modifications to their header or payload formats.

ATS relevance in Gen 6 scale-out systems

What changes for Gen 6 AT/ATS/PRI. Nothing in the TLP format. The AT field stays at DW0 bits 11:10. AT=00/01/10/11 encodings are unchanged. ATS Translation Request and Completion formats are unchanged. PRI message formats are unchanged. The only Gen 6 additions are capability register updates (FEC capability, Flit Mode capability) and PASID usage patterns in multi-tenant deployments — neither of which affects the AT/ATS/PRI protocol itself.

📋 Quick Reference

| Item | Value / Rule |
| --- | --- |
| AT field location | DW0 bits 11:10 of every Memory and AtomicOp TLP header |
| AT=00 | Default / Untranslated. Normal DMA: the IOMMU walks page tables. All devices that do not implement ATS use this exclusively. |
| AT=01 | Translation Request. Device asking IOMMU/RC to translate a GPA into an HPA. Looks like a standard MRd to the fabric but is intercepted by the IOMMU. |
| AT=10 | Translated. Address is already an HPA, obtained via a prior AT=01 exchange. IOMMU may skip page-table walk. Zero translation latency for the DMA access. |
| AT=11 | Reserved. Malformed TLP if received. |
| AT applies to | Memory TLPs (MRd, MWr) and AtomicOps only. Must be 00 for IO, Config, and Message TLPs. |
| ATS purpose | Pre-translate GPA→HPA, cache in device TLB. Subsequent DMA uses AT=10, bypassing IOMMU table walks. Critical for high-bandwidth Gen 6 devices. |
| ATS Translation Request | Standard MRd with AT=01. GPA in the address field. Length field is 2 DW per translation requested, which bounds the size of the GPA window to translate. |
| ATS Translation Completion | Standard CplD. Payload carries HPA plus permission bits (read/write). Device stores this in its TLB. |
| ATS Invalidation | IOMMU sends Invalidate Request to device when a page table entry changes. Device flushes TLB entry and sends Invalidate Complete. IOMMU waits before remapping. |
| ATS config space | ATS Extended Capability, Capability ID=0x000F. Enable bit in ATS Control Register. STU field sets minimum translation unit. |
| PRI purpose | Allows device to handle paged-out memory gracefully. Device sends Page Request, OS pages in the data, OS sends Page Response, device retries translation. |
| Page Request | Device → RC. Routing code 000 (to Root). Carries GPA, access type, PRG index. |
| Page Response | RC → Device. ID routing. Response codes: Success / Invalid Request / Response Failure. |
| PRG | Page Request Group: groups related page requests. One Page Response can unblock an entire PRG. |
| ACS | Access Control Services: prevents misuse of AT=10 by unvalidated devices. Source Validation + Translation Blocking. Required for security in multi-tenant deployments. |
| PASID | Process Address Space Identifier: TLP Prefix that tags translations with a per-process identity. Used with ATS in SR-IOV and multi-tenant Gen 6 scenarios. |
| Gen 6 impact | AT field, ATS, and PRI formats are all unchanged. ATS becomes more valuable at Gen 6 speeds. CXL 3.0 and PASID extend the usage model without changing the core protocol. |