PCIe Series — PCIe-05: Transaction Layer in Depth — VLSI Trainers

Transaction Layer in Depth

Every TLP type explained with full header diagrams — Memory Read/Write, Completions, Config, IO, Messages, and AtomicOps. Address routing vs ID routing vs implicit routing. Byte enables, the Tag field, split transactions, and how routing decisions are made at every node. Gen 6 TLP considerations throughout.

📋 TLP Structure Overview

A TLP (Transaction Layer Packet) is the fundamental unit of communication in PCIe. It carries commands and data between the software stacks of two devices — one the requester, one the completer. Every TLP starts with a header of 3 or 4 Doublewords (12 or 16 bytes), optionally followed by a data payload, and optionally ending with an ECRC.

TLP Anatomy: Three Parts, Only the Header Mandatory
Header: 3 DW (12 bytes) or 4 DW (16 bytes). Mandatory in every TLP. DW0 holds Fmt, Type, TC, TD, EP, Attr, AT, and Length; DW1–2 (or DW1–3) hold Requester ID, Tag, Byte Enables, and Address or ID. Purpose: type, address, length, requester, tag, attributes.
Data Payload: 1–1024 DW (4–4096 bytes). Optional; present when Fmt[1] = 1, which applies only to MWr, CplD, IOWr, CfgWr, MsgD, and AtomicOp. Purpose: the data being transferred.
ECRC: 1 DW (4 bytes). Optional; present when the TD bit = 1. Purpose: end-to-end integrity check over header and payload that survives switches.
Below the Transaction Layer: the Data Link Layer adds a Sequence Number (2 bytes) and an LCRC (4 bytes), and the Physical Layer adds framing. In Gen 6, TLPs are packed into 256-byte flits.
Figure 1 — TLP anatomy. The header is mandatory; payload and ECRC are conditional. The Data Link Layer wraps the entire TLP with a Sequence Number and LCRC. In Gen 6 Flit Mode, TLPs are instead packed into 256-byte flits that carry their own CRC and FEC bytes in place of the per-TLP Sequence Number and LCRC.

📋 Fmt and Type Field Encoding

The first two fields in DW0 tell the receiver everything about the TLP’s structure before it parses any other field. Fmt sets the header size and whether a payload follows; Type selects the TLP variety.

Fmt[2:0] Format Field Encoding

| Fmt[2:0] | Meaning | Typical use |
|---|---|---|
| 000 | 3DW header, no data | MRd 32-bit |
| 001 | 4DW header, no data | MRd 64-bit, Msg |
| 010 | 3DW header, with data | MWr 32-bit, CplD |
| 011 | 4DW header, with data | MWr 64-bit, MsgD |
| 100 | TLP Prefix (1DW) | LPrfx / EPrfx |

Type[4:0] Key Values

| Type[4:0] | TLP name(s) | Routing |
|---|---|---|
| 0_0000 | MRd, MWr | Address |
| 0_0001 | MRdLk (locked memory read) | Address |
| 0_0010 | IORd, IOWr | Address |
| 0_0100 / 0_0101 | CfgRd0/Wr0 · CfgRd1/Wr1 | ID |
| 0_1010 | Cpl · CplD | ID |
| 0_1011 | CplLk · CplDLk | ID |
| 0_1100–0_1110 | FetchAdd · Swap · CAS (AtomicOp) | Address |
| 1_0rrr | Msg · MsgD (rrr = routing code) | Address / ID / Implicit |
Figure 2 — Fmt encoding (left) and Type[4:0] key values (right). Each row in the Type table is its own TLP group with a clean Type field value, name, and routing method. The receiver parses Fmt first to know header size, then Type to know what to do with it.
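The parse order described above (Fmt first, then Type) can be sketched in a few lines of Python. This is a minimal illustration rather than a reference decoder: the function name `decode_dw0` and the dictionary keys are hypothetical, and the bit positions assume DW0 is held as a 32-bit integer with Fmt in bits [31:29], Type in [28:24], TD in bit 15, EP in bit 14, and Length in [9:0]. TLP Prefixes (Fmt = 100) are deliberately rejected to keep the sketch short.

```python
def decode_dw0(dw0: int) -> dict:
    """Decode the fields of DW0 that determine a TLP's overall shape."""
    fmt = (dw0 >> 29) & 0x7
    if fmt > 0b011:
        # Assumption: this sketch does not handle TLP Prefixes (Fmt = 100).
        raise ValueError("TLP Prefix or reserved Fmt value")
    length = dw0 & 0x3FF
    return {
        "fmt": fmt,
        "type": (dw0 >> 24) & 0x1F,
        "header_dw": 4 if fmt & 0b001 else 3,   # Fmt[0] selects 4DW header
        "has_data": bool(fmt & 0b010),          # Fmt[1] signals a payload
        "td": bool((dw0 >> 15) & 1),            # ECRC present
        "ep": bool((dw0 >> 14) & 1),            # payload poisoned
        "length_dw": length or 1024,            # Length = 0 encodes 1024 DW
    }
```

A receiver built this way knows how many header DWs to consume, and whether a payload follows, before it looks at any address or ID field.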

📋 Memory Read (MRd) — 3DW and 4DW Headers

A Memory Read request (MRd) asks the completer to return a block of data from a memory-mapped address. It is non-posted — a completion with data (CplD) must come back. The requester uses a Tag to match the completion to its request.

MRd 3DW Header (32-bit address, target < 4 GB)
Fmt = 000 · Type = 0_0000 · no payload · Address[1:0] always 00 (DW-aligned)
DW0: Fmt (000) · Type (0_0000) · R · TC[2:0] · R · TD (ECRC present) · EP (poisoned) · Attr (RO, NS) · AT[1:0] · R · Length[9:0] in DW (0 = 1024 DW)
DW1: Requester ID [31:16] (Bus[15:8] · Dev[7:3] · Fn[2:0]) · Tag [15:8], 8-bit (0–255) · Last DW BE [7:4] · First DW BE [3:0] (bit set = byte valid)
DW2: Address[31:2], a 32-bit MMIO or memory address, followed by 00 (DW-aligned). The 3DW header ends here.

MRd 4DW Header (64-bit address, target ≥ 4 GB)
Fmt = 001 · DW0 and DW1 identical to the 3DW header
DW2: Address[63:32], the upper 32 bits of the 64-bit address
DW3: Address[31:2] + 00, the lower 32 bits. The 4DW header is 16 bytes in total.

Rule: an address above 0xFFFF_FFFF must use the 4DW header; an address at or below 0xFFFF_FFFF must use the 3DW header, which is also the more efficient of the two.
Figure 3 — MRd header layouts. Left: 3DW (12-byte) for 32-bit addresses below 4 GB. Right: 4DW (16-byte) for 64-bit addresses. The Requester ID identifies who sent the request; the Tag distinguishes this read from up to 255 other simultaneous reads from the same function. Byte Enables indicate which bytes in the first and last DW are valid.
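As a rough sketch of how a requester might assemble these headers, the Python below picks the 3DW or 4DW form from the address and packs the fields described in Figure 3. The function name `build_mrd_header` and its parameters are illustrative assumptions, TC, Attr, AT, TD, and EP are left at zero, and each DW is returned as a plain integer rather than as wire-order bytes.

```python
def build_mrd_header(addr: int, length_dw: int, requester_id: int, tag: int,
                     first_be: int = 0xF, last_be: int = 0xF) -> list[int]:
    """Return the header DWs of a Memory Read request as 32-bit integers."""
    if addr & 0x3:
        raise ValueError("address must be DW-aligned; use byte enables instead")
    if not 1 <= length_dw <= 1024:
        raise ValueError("length must be 1..1024 DW")
    if length_dw == 1:
        last_be = 0                      # Last BE must be 0000 for 1-DW requests
    four_dw = addr > 0xFFFF_FFFF         # 64-bit address needs the 4DW header
    fmt = 0b001 if four_dw else 0b000    # no data in either case
    dw0 = (fmt << 29) | (0b00000 << 24) | (length_dw & 0x3FF)  # 1024 -> 0
    dw1 = (requester_id << 16) | (tag << 8) | (last_be << 4) | first_be
    if four_dw:
        return [dw0, dw1, addr >> 32, addr & 0xFFFF_FFFC]
    return [dw0, dw1, addr & 0xFFFF_FFFC]
```

For example, a 16-DW read of address 0x1000 from requester 01:00.0 (Requester ID 0x0100) yields three DWs, while the same read of an address at or above 4 GB yields four.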

📋 Memory Write (MWr)

A Memory Write (MWr) posts data to a memory address. It is posted — no completion comes back. The requester sends it and immediately continues. This is what makes DMA writes fast: the CPU or DMA engine fires the write and moves on without waiting for confirmation.

MWr 4DW Header (64-bit address) with Payload
Fmt = 011 (4DW, with data) · Type = 0_0000 · posted · no completion returns
DW0: Fmt=011 · Type=0_0000 · TC · TD · EP · Attr · AT · Length (in DW)
DW1: Requester ID [31:16] · Tag [15:8] · Last BE [7:4] · First BE [3:0]
DW2: Address[63:32] (upper 32 bits)
DW3: Address[31:2] + 00 (lower 32 bits)
Data Payload: 1–1024 DW of data. Length in DW0 specifies the count; the First and Last DW BEs select the valid bytes at the boundaries.
Key differences vs MRd: (1) the Fmt data bit is set, so the payload follows the header immediately; (2) the request is posted, so no completion TLP ever returns to the requester. A link-level ACK DLLP still returns per hop (Data Link Layer reliability).
Figure 4 — MWr header with payload. The header is identical to MRd but with Fmt’s data bit set, and the payload immediately follows the last header DW. Since MWr is posted, a completion TLP never comes back — but the Data Link Layer’s per-hop ACK DLLP still confirms delivery to the adjacent neighbour.
Why MWr is posted but MRd is not. With a write, the data is in the TLP itself — the requester has everything it needs to continue its work. With a read, the data is at the target — the requester must wait for the completion to arrive before it can proceed. Posting memory writes and returning completions only for reads is the PCIe version of the “fire and forget” principle that makes DMA engines so efficient.

📋 Byte Enables — First DW and Last DW

Every MRd and MWr (and IORd/IOWr) header carries two 4-bit Byte Enable fields: one for the first DW of the transfer and one for the last DW. Each bit selects one byte within its DW.

Byte Enable Fields: What Each Bit Means
First DW Byte Enables [3:0]: header DW1[3:0], one bit per byte of the first addressed DW. BE[0]=1 means byte 0 is valid, BE[1]=1 byte 1, BE[2]=1 byte 2, BE[3]=1 byte 3. Common patterns: 1111 (0xF) = all 4 bytes; 0001 (0x1) = byte 0 only.
Last DW Byte Enables [3:0]: header DW1[7:4], for the last addressed DW. When Length = 1 DW, Last BE must be 0000 (only First BE applies). When Length ≥ 2 DW, Last BE selects the valid bytes in the final DW.
Example: 6 bytes starting at byte offset 1 span 2 DW, so First BE=1110 (bytes 1–3 of the first DW) and Last BE=0111 (bytes 0–2 of the second DW).
Figure 5 — Byte Enable fields. First DW BE controls validity of bytes in the first DW of the transfer; Last DW BE controls the final DW. When Length=1, Last BE must be 0000. Together they allow sub-DW granularity at both ends of any transfer.
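The byte-enable rules can be expressed as a small calculation from a byte address and a byte count. The sketch below is one way to do it, assuming a contiguous transfer; the function name `byte_enables` and its return shape (Length in DW, First BE, Last BE) are hypothetical choices for illustration.

```python
def byte_enables(byte_addr: int, nbytes: int) -> tuple[int, int, int]:
    """Return (length_dw, first_be, last_be) for a contiguous byte range."""
    if nbytes < 1:
        raise ValueError("transfer must cover at least one byte")
    offset = byte_addr & 0x3             # position of first byte in first DW
    end = offset + nbytes                # exclusive end, relative to first DW
    length_dw = (end + 3) // 4
    first_be = (0xF << offset) & 0xF     # enable bytes from offset upward
    if length_dw == 1:
        # Single DW: First BE also has to cut off bytes past the end,
        # and Last BE must be 0000.
        first_be &= 0xF >> (4 - end)
        return length_dw, first_be, 0
    tail = (end & 0x3) or 4              # valid bytes in the final DW
    last_be = 0xF >> (4 - tail)
    return length_dw, first_be, last_be
```

A single-byte access to byte 2 of a DW, for instance, comes out as Length = 1, First BE = 0100, Last BE = 0000, and an aligned 8-byte access as Length = 2 with both fields set to 1111.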

📋 Tag Field — Multiple Outstanding Requests

The Tag field allows a single function to have multiple non-posted requests outstanding simultaneously without confusing the completions when they arrive back. Each outstanding request gets a unique Tag; the completer echoes it in the completion header; the requester uses it to match the completion to the original request.

Tag Field: Matching Completions to Requests
NVMe SSD (requester, BDF 03:00.0): issues MRd Tag=5, MRd Tag=6, and MRd Tag=7, leaving three reads in flight.
Root Complex (completer): fetches data from RAM for each Tag separately and sends a CplD carrying the matching ReqID and Tag back to the requester.
Return order: CplD Tag=6 arrives first, CplD Tag=7 second, CplD Tag=5 last.
Tag matching at the NVMe: each arriving CplD is checked against the outstanding MRd with the same Tag and delivered to that context.
Completions arrive in any order; Tag matching sorts them correctly.
Figure 6 — Tag field enables multiple outstanding requests. The NVMe sends three simultaneous MRd TLPs, each with a different Tag (5, 6, 7). Completions return in any order — Tag=6 first, Tag=7 second, Tag=5 last. The NVMe’s Transaction Layer uses the Tag to deliver each completion to the correct waiting context.
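| Tag capability | Tag bits | Max simultaneous requests per function | Enabled by |
|---|---|---|---|
| Default | 5 bits | 32 | Extended Tag Field Enable clear |
| Extended Tag | 8 bits | 256 | Extended Tag Field Enable bit in Device Control register |
| 10-Bit Tag (Gen 4+) | 10 bits | 768 (values with Tag[9:8]=00 are not used) | 10-Bit Tag Requester Enable bit in Device Control 2 register |
| 14-Bit Tag (Gen 6 Flit Mode) | 14 bits | Larger still, for high-bandwidth devices | 14-Bit Tag Requester Enable, on Flit Mode links only |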
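The bookkeeping a requester does with Tags can be modelled as a small pool: take a free Tag when issuing a non-posted request, and return it when the matching completion arrives. The class below is a simplified sketch under that assumption; `TagPool`, `issue`, and `complete` are hypothetical names, and real hardware would also handle timeouts and multi-part completions.

```python
class TagPool:
    """Track outstanding non-posted requests by Tag."""

    def __init__(self, tag_bits: int = 8):
        self.free = list(range(1 << tag_bits))   # all Tag values start free
        self.pending = {}                        # Tag -> caller's context

    def issue(self, context):
        """Reserve a Tag for a new request; the caller places it in the TLP."""
        if not self.free:
            raise RuntimeError("no free Tags: request must wait")
        tag = self.free.pop(0)
        self.pending[tag] = context
        return tag

    def complete(self, tag: int):
        """Match a completion to its request and release the Tag."""
        if tag not in self.pending:
            raise KeyError(f"unexpected completion for Tag {tag}")
        context = self.pending.pop(tag)
        self.free.append(tag)
        return context
```

Because `complete` looks the context up by Tag, it does not matter in which order the completions for different requests come back.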

📋 Completion (Cpl and CplD)

A Completion is the response to any non-posted request. It routes back to the requester using the Requester ID (BDF) embedded in the original request. It carries the Tag from the request to match back to the original transaction, a Completion Status code, and — for reads — the requested data payload.

CplD Header: 3DW (12 bytes), Completion with Data
DW0: Fmt = 010 (3DW, with data) · Type = 0_1010 · TC · Attr · Length (data DWs). A Cpl without data uses Fmt = 000; it is the status-only response to non-posted writes and failed reads.
DW1: Completer ID [31:16] (Bus·Dev·Fn of the device sending the completion) · Status [15:13] · BCM [12] · Byte Count [11:0] (bytes remaining in the original request, used for split completions as explained below)
DW2: Requester ID [31:16] (BDF of the device that sent the original MRd, used to route the completion back to it) · Tag [15:8] (echo of the Tag from the original MRd) · R [7] · Lower Address [6:0] (byte address of the first byte of data in this completion, used for split completions)
Data Payload (CplD only): the requested data, which may be partial in a split completion.
Completion Status codes: SC (000) = Successful Completion · UR (001) = Unsupported Request (target does not recognise the address or command) · CRS (010) = Config Request Retry Status · CA (100) = Completer Abort
Figure 7 — CplD (Completion with Data) header. DW1 carries the Completer ID (who is sending this completion), Status code, and Byte Count. DW2 carries the Requester ID (who gets the completion — used for routing), the echoed Tag, and Lower Address. The data payload follows immediately after DW2.
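To make the DW1 and DW2 layout concrete, here is a small Python sketch that unpacks the completion fields from two 32-bit integers. The names `parse_completion` and `format_bdf` are illustrative, and the code assumes the bit positions listed for Figure 7; a Byte Count of 0 is treated as 4096 bytes.

```python
STATUS_NAMES = {0b000: "SC", 0b001: "UR", 0b010: "CRS", 0b100: "CA"}

def format_bdf(bdf: int) -> str:
    """Render a 16-bit ID as bus:device.function."""
    return f"{bdf >> 8:02x}:{(bdf >> 3) & 0x1F:02x}.{bdf & 0x7}"

def parse_completion(dw1: int, dw2: int) -> dict:
    """Extract the completion-specific fields from header DW1 and DW2."""
    status = (dw1 >> 13) & 0x7
    return {
        "completer_id": format_bdf(dw1 >> 16),
        "status": STATUS_NAMES.get(status, "reserved"),
        "byte_count": (dw1 & 0xFFF) or 4096,      # 0 encodes 4096 bytes
        "requester_id": format_bdf(dw2 >> 16),    # used for ID routing
        "tag": (dw2 >> 8) & 0xFF,                 # echoed from the request
        "lower_address": dw2 & 0x7F,
    }
```

The requester would use `requester_id` and `tag` to find the waiting request, then check `status` before consuming any payload.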

Split Transaction — End-to-End Walk-Through

PCIe uses a split transaction model for all non-posted requests. The request and response are two separate TLPs on potentially different paths. A single read may even return in multiple completion TLPs if the data straddles buffer boundaries at the completer.

Split Transaction: GPU DMA Read from System Memory
Participants: GPU (requester, BDF 01:00.0, Tag pool 0–255) · Switch (routes MRd upstream; routes CplD downstream by Requester ID, bus = 01) · Root Complex (completer, BDF 00:01.0, fetches from RAM)
Step 1: The GPU sends MRd (Tag=42, Addr=0x1000, ReqID=01:00.0, Len=16 DW). The Switch returns an ACK DLLP (link-level only) and forwards the same TLP upstream with a new LCRC for that hop.
Step 2: The Root Complex reads RAM and builds CplD (Tag=42, Status=SC, CplID=00:01.0, ReqID=01:00.0, 64 bytes of data payload). The Switch sees Bus=01 in the Requester ID and selects the downstream port facing Bus 1.
Step 3: The CplD is delivered; the GPU matches Tag=42 and releases the DMA context.
Figure 8 — Split transaction flow. The MRd travels upstream (GPU → Switch → RC). The RC fetches from RAM and returns a CplD with the same Tag (42) and the requester’s BDF (01:00.0) for routing. The Switch routes the CplD downstream based on the Requester ID’s bus number (bus 01). The GPU’s Transaction Layer matches Tag 42 and delivers the data.

Split completion — when one read returns multiple CplD TLPs

A completer is allowed to return fewer bytes than requested in a single CplD if its internal buffer or packet size constraints require it. Multiple CplD TLPs can satisfy one MRd. The requester reassembles them using the Byte Count field (tracks remaining bytes) and the Lower Address field (tracks the byte offset of the current chunk).
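The reassembly logic can be sketched as a running check: each CplD should report a Byte Count equal to the bytes still owed, and a Lower Address equal to the low 7 bits of the next expected byte address. The Python below illustrates this under simplifying assumptions: the function name `reassemble` is hypothetical, each completion is given as a `(byte_count, lower_address, payload)` tuple, payloads contain only valid bytes, and completions for one request arrive in address order.

```python
def reassemble(start_addr: int, total_bytes: int, completions) -> bytes:
    """Rebuild one read's data from a sequence of split completions."""
    data = bytearray()
    for byte_count, lower_address, payload in completions:
        remaining = total_bytes - len(data)
        expected_lower = (start_addr + len(data)) & 0x7F
        # Byte Count reports what is still owed, including this chunk.
        if byte_count != remaining:
            raise ValueError("Byte Count does not match bytes outstanding")
        # Lower Address locates the first byte carried by this chunk.
        if lower_address != expected_lower:
            raise ValueError("Lower Address does not match expected offset")
        data += payload
    if len(data) != total_bytes:
        raise ValueError("read is still incomplete")
    return bytes(data)
```

For a 128-byte read at 0x1000 returned as two 64-byte chunks, the first CplD would carry Byte Count 128 and Lower Address 0x00, and the second Byte Count 64 and Lower Address 0x40.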

📋 Configuration TLPs — Type 0 and Type 1

Configuration TLPs access the 4 KB configuration space of PCIe functions. They are non-posted — a completion always comes back. Only the Root Complex may generate configuration requests (no peer-to-peer configuration is allowed).

Config TLP Header: 3DW, used by CfgRd0 / CfgRd1 / CfgWr0 / CfgWr1
DW0: Fmt (000 = read, 010 = write) · Type (0_0100 = Type 0, 0_0101 = Type 1) · TC · Attr · Length = 1 (a config access is always 1 DW)
DW1: Requester ID [31:16] (BDF of the Root Complex port) · Tag [15:8] · Last DW BE [7:4] = 0000 (Length is always 1) · First DW BE [3:0]
DW2: Bus [31:24] (target bus) · Device [23:19] (5 bits) · Function [18:16] (3 bits) · Register Number [11:2] (DW address within config space) · bits [1:0] always 00
Figure 9 — Configuration TLP header. DW2 carries the target Bus/Device/Function and the Register Number (DW offset within the 4 KB config space). Type0 (Type=0_0100) targets a device on the Secondary Bus; Type1 (Type=0_0101) is forwarded further downstream until a bridge converts it to Type0 at the target bus.
| TLP | Type field | When used | Completer action |
|---|---|---|---|
| CfgRd0 | 0_0100 | Target device is on the Secondary Bus of the forwarding bridge, so it sees Type 0 directly | Reads config register, returns CplD |
| CfgRd1 | 0_0101 | Target device is further downstream: bridges forward it until a bridge's Secondary Bus matches the target bus, then convert it to CfgRd0 | Forwarded until Type 1 → Type 0 conversion |
| CfgWr0 | 0_0100 | Write to local bus device | Writes register, returns Cpl (no data) |
| CfgWr1 | 0_0101 | Write to downstream device | Forwarded, converted, then Cpl returns |
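The DW2 packing and the Type 0 versus Type 1 choice can be sketched as two short helpers. The names `cfg_dw2` and `cfg_type` are hypothetical; `cfg_dw2` assumes the register is given as a DW-aligned byte offset into the 4 KB configuration space, and `cfg_type` models the decision a bridge makes from its Secondary and Subordinate bus numbers.

```python
def cfg_dw2(bus: int, dev: int, fn: int, reg_offset: int) -> int:
    """Pack the target BDF and register offset into config header DW2."""
    if not (0 <= bus <= 0xFF and 0 <= dev <= 0x1F and 0 <= fn <= 0x7):
        raise ValueError("BDF out of range")
    if reg_offset & 0x3 or not 0 <= reg_offset < 4096:
        raise ValueError("register offset must be DW-aligned and below 4 KB")
    # Byte-offset bits [11:2] land directly in DW2 bits [11:2].
    return (bus << 24) | (dev << 19) | (fn << 16) | reg_offset

def cfg_type(target_bus: int, secondary: int, subordinate: int) -> int:
    """Return 0 or 1: the config request type a bridge sends downstream."""
    if target_bus == secondary:
        return 0                  # target sits directly on the Secondary Bus
    if secondary < target_bus <= subordinate:
        return 1                  # still further downstream: keep as Type 1
    raise ValueError("target bus is not behind this bridge")
```

Reading offset 0x10 (BAR0) of device 03:00.0 would therefore carry DW2 = 0x03000010, sent as Type 0 by the bridge whose Secondary Bus is 3.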

📋 IO Read and Write (IORd / IOWr)

IO TLPs target the legacy IO address space (16-bit on x86 systems). They always use 3DW headers because IO addresses are at most 32 bits wide. Both are non-posted: IOWr must return a Cpl to confirm the write landed, which is essential because legacy device drivers often depend on write-ordering guarantees in IO space.

The PCIe spec discourages IO address space and indicates it may be deprecated in a future revision. Only Legacy PCIe Endpoints (older PCI/PCI-X devices with a PCIe interface) should use IO space. Native PCIe Endpoints use MMIO only.

📋 Message TLPs

Message TLPs replaced the sideband signals of PCI — interrupt pins, error pins, power management signals — with in-band packets. They always use a 4DW header. They are posted (no completion). Their routing is controlled by the lower 3 bits of the Type field.

| Message | Routing | Purpose |
|---|---|---|
| INTx Assert/Deassert | Implicit → Root | Legacy PCI interrupt emulation (INTA#, INTB#, INTC#, INTD# replacement) |
| PM_PME (Power Management Event) | Implicit → Root | Device wakeup request: the device has data and wants the link powered up |
| ERR_COR, ERR_NONFATAL, ERR_FATAL | Implicit → Root | Error reporting to Root Complex for AER handling |
| Unlock | Implicit → broadcast down | Terminates locked transaction sequence (legacy) |
| Set_Slot_Power_Limit | Implicit → local (terminates at receiver) | Downstream Port informs the card of the physical slot power budget |
| Vendor-Defined Type 0/1 | Address or ID | Vendor-specific message routed to a specific address or BDF |
| Attention_Button_Pressed | Implicit → local (terminates at receiver) | Hot-plug slot attention button event (legacy in-band hot-plug signalling) |
| Presence Detect Changed | Not a Message TLP | Hot-plug card insertion/removal event, reported through the Slot Status register and a hot-plug interrupt |
Why implicit routing “to Root”? The Root Complex is always at the top of the tree. A message routed “to Root” just travels upstream at every hop — no address or BDF needed. Any Switch receiving a “to Root” message on a downstream port forwards it upstream. The message terminates at the Root Complex, which is always upstream of everything.

📋 AtomicOp TLPs — Read-Modify-Write in Hardware

AtomicOp TLPs (introduced in PCIe 2.1) allow a requester to perform an atomic read-modify-write operation on a memory location at the completer, without software locks. The operation is performed atomically — no other requester can access that location between the read and the write. A completion returns the original value of the location before the operation.

| AtomicOp | Type[4:0] | Payload | Operation |
|---|---|---|---|
| FetchAdd | 0_1100 | 1 or 2 DW (operand) | Reads current value, adds operand, writes back. Returns old value. |
| Swap | 0_1101 | 1 or 2 DW (new value) | Reads current value, writes new value. Returns old value. |
| CAS | 0_1110 | 2, 4, or 8 DW (compare value + swap value) | Compares current value with the compare value. If they match, writes the swap value. Always returns old value. |

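All AtomicOps are non-posted (the completion returns the old value). FetchAdd and Swap operate on 32-bit or 64-bit operands (1 DW or 2 DW); CAS additionally supports 128-bit operands. A completer advertises which AtomicOp sizes it supports in its Device Capabilities 2 register, and Switches and Root Ports on the path advertise AtomicOp routing support in the same register.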
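The completer-side behaviour of the three operations can be modelled in a few lines. This is a behavioural sketch only: memory is a plain Python dict keyed by address (an assumption made for illustration), the function names are hypothetical, and atomicity is implied by each function running to completion rather than enforced by any hardware mechanism.

```python
def fetch_add(mem: dict, addr: int, operand: int, bits: int = 32) -> int:
    """Add operand to the location, ignoring carry-out; return the old value."""
    old = mem[addr]
    mem[addr] = (old + operand) & ((1 << bits) - 1)
    return old

def swap(mem: dict, addr: int, new_value: int) -> int:
    """Unconditionally write new_value; return the old value."""
    old = mem[addr]
    mem[addr] = new_value
    return old

def compare_and_swap(mem: dict, addr: int, compare: int, new_value: int) -> int:
    """Write new_value only if the location equals compare; return old value."""
    old = mem[addr]
    if old == compare:
        mem[addr] = new_value
    return old
```

In each case the value returned is what the completion TLP would carry back to the requester, which lets the requester tell whether a CAS succeeded by comparing it with the compare value it sent.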

📋 TLP Routing — Three Methods

Every TLP that arrives at a port is inspected by the Transaction Layer to determine if it should be consumed locally or forwarded to another port. The routing method is determined by the TLP Type field.

| Routing method | TLP types | How routing is decided |
|---|---|---|
| Address Routing | MRd, MWr, IORd, IOWr, AtomicOp | The address in the TLP header is compared against the port's BAR values and the Base/Limit registers in the Type 1 header |
| ID Routing | CfgRd/Wr, Cpl/CplD, some Msg | The Bus/Device/Function number in the header is compared against the port's BDF and its Secondary/Subordinate range |
| Implicit Routing | Most Msg TLPs | The routing sub-field (Type[2:0]) specifies "toward Root", "broadcast downstream", or "terminate here"; no address or ID comparison is needed |

📋 Address Routing — How a Switch Makes Its Decision

When a TLP using address routing arrives at a Switch port, the Switch works through the following checks in order:

Address Routing: Switch Decision Tree for an Incoming TLP
Step 1, check own BARs: does the address match any BAR in this port's Type 1 header? If yes, consume the TLP locally; the port is the target.
Step 2, check Base/Limit registers: is the target address in any downstream port's NP-MMIO, P-MMIO, or IO range? If yes, forward the TLP to the downstream port whose Base/Limit registers contain the target address.
Step 3, no match: return an Unsupported Request (UR) completion.
Figure 10 — Address routing decision tree at a Switch port. Step 1: check own BARs. Step 2: check Base/Limit register ranges for downstream ports. If neither matches: Unsupported Request. For upstream-traveling TLPs, the same logic applies in reverse — step 2 would check if the address should go further upstream.
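The decision tree can be written as a short function for the downstream-travelling case. This sketch makes several assumptions: BARs are given as `(base, size)` pairs, each downstream port has a single inclusive `(base, limit)` window rather than separate NP-MMIO, P-MMIO, and IO ranges, and the names `route_by_address`, `bars`, and `windows` are purely illustrative.

```python
def route_by_address(addr: int, bars, windows: dict) -> str:
    """Decide what a Switch port does with an address-routed TLP."""
    # Step 1: the port's own BARs take priority.
    for base, size in bars:
        if base <= addr < base + size:
            return "consume"
    # Step 2: find the downstream port whose Base/Limit window holds addr.
    for port, (base, limit) in windows.items():
        if base <= addr <= limit:
            return f"forward:{port}"
    # Step 3: nobody claims the address.
    return "unsupported-request"
```

With windows of 0xE000_0000–0xE00F_FFFF on one port and 0xE010_0000–0xE01F_FFFF on another, an access to 0xE010_0040 would be forwarded to the second port, and an access to 0xD000_0000 would end as an Unsupported Request.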

📋 ID Routing — Completions and Configuration

Completions (Cpl/CplD) use ID routing to get back to the requester. The Requester ID field in the completion header contains the BDF of the original requester. Every Switch compares the target bus number against its Secondary/Subordinate range to decide whether to forward downstream — the same mechanism used for Type 1 configuration packets.

ID Routing for Completions: Travelling Downstream
Root Complex: sends CplD with ReqID=03:00.0, Tag=14.
Switch: checks the ReqID bus number (03). Downstream Port 2 has Secondary=3, Subordinate=5, so it matches and the CplD is forwarded to Port 2.
NVMe SSD (BDF 03:00.0): matches ReqID and Tag=14, then delivers the data to its DMA engine.
The Switch routes by ReqID bus number; no address comparison is needed for completions.
Figure 11 — ID routing for a completion. The Switch extracts the target bus number (03) from the Requester ID field. It checks which downstream port’s Secondary/Subordinate range contains bus 03 — matches Port 2 (Secondary=3, Subordinate=5). Forwards on that port. NVMe SSD (03:00.0) receives it and matches Tag=14.
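The same lookup can be sketched in Python for a TLP arriving from upstream. The function name `route_by_id` and the `ports` mapping (port name to an inclusive Secondary/Subordinate pair) are assumptions for illustration, and IDs are passed as 16-bit BDF values with the bus number in the top byte.

```python
def route_by_id(target_bdf: int, own_bdf: int, ports: dict) -> str:
    """Decide what a Switch does with an ID-routed TLP from upstream."""
    if target_bdf == own_bdf:
        return "consume"                      # the TLP is for this port itself
    bus = target_bdf >> 8                     # bus number is bits [15:8]
    for port, (secondary, subordinate) in ports.items():
        if secondary <= bus <= subordinate:
            return f"forward:{port}"          # bus lives behind this port
    return "no-match"
```

Using the values from Figure 11, a completion for 03:00.0 (0x0300) matches the port with Secondary=3 and Subordinate=5 and is forwarded there.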

📋 Implicit Routing — Messages

Message TLPs use implicit routing — a 3-bit code in Type[2:0] tells every Switch how to route it without needing an address or BDF lookup.

| Type[2:0] | Routing | Behaviour | Example messages |
|---|---|---|---|
| 000 | → Root Complex | Every Switch forwards upstream. Root Complex terminates. | INTx, PM_PME, ERR_* |
| 001 | By Address | Normal address lookup in Base/Limit registers | Vendor-defined |
| 010 | By ID | Normal BDF lookup | Vendor-defined |
| 011 | Broadcast downstream | Switch duplicates message to all downstream ports | Unlock, PME_Turn_Off |
| 100 | Local (terminate here) | Message is consumed by the receiving port, not forwarded | Set_Slot_Power_Limit |
| 101 | Gather → Root | Forwarded upstream; a Switch collects the message from all its downstream ports and sends a single one upstream | PME_TO_Ack |
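Decoding the routing code from a Message TLP's Type field is a simple mask and lookup. The sketch below assumes the 5-bit Type value has already been extracted from DW0; the name `message_routing` and the description strings are illustrative, and codes 110 and 111 are treated as reserved.

```python
MSG_ROUTING = {
    0b000: "to Root Complex",
    0b001: "by address",
    0b010: "by ID",
    0b011: "broadcast downstream",
    0b100: "local (terminate at receiver)",
    0b101: "gather and route to Root Complex",
}

def message_routing(tlp_type: int) -> str:
    """Return the routing method encoded in a Message TLP's Type[4:0]."""
    if (tlp_type >> 3) != 0b10:
        raise ValueError("Type[4:3] is not 10: not a Message TLP")
    code = tlp_type & 0x7                 # Type[2:0] is the routing sub-field
    if code not in MSG_ROUTING:
        raise ValueError("reserved routing code")
    return MSG_ROUTING[code]
```

A Switch would branch on this result before consulting any address or ID tables, which is what makes implicit routing cheap for the common cases.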

Gen 6 — TLP in Flit Context

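In Gen 6, TLPs are carried inside 256-byte flits. The transaction types, routing methods, byte-enable semantics, and completion matching described above are unchanged from earlier generations. What changes is how TLPs are transported across the link and, on links running in Flit Mode, how the header is encoded: PCIe 6.0 defines a reorganised Flit Mode header with a wider Tag field, while links running in Non-Flit Mode keep the header layouts shown in this article.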

The bottom line for TLP authors. If you are writing firmware or drivers, nothing in the transaction model changes for Gen 6: request types, byte-enable semantics, tag matching, routing methods, and completion handling all work as described here. If you are writing RTL that builds or parses headers, note that a Flit Mode link uses the Flit Mode header layout, so field positions differ from the Non-Flit Mode diagrams above even though the same kinds of information are carried.

📋 Quick Reference

| Concept | Key Point |
|---|---|
| TLP structure | Header (3 or 4 DW) + optional payload (1–1024 DW) + optional ECRC (1 DW) |
| Fmt[2:0] | 000=3DW/no-data · 001=4DW/no-data · 010=3DW/data · 011=4DW/data · 100=prefix |
| 3DW vs 4DW header | 3DW for addresses < 4 GB (32-bit) · 4DW for addresses ≥ 4 GB (64-bit) or Msg |
| MRd | Non-posted · Fmt=000/001 · Type=0_0000 · No payload · Completion with data returns |
| MWr | Posted · Fmt=010/011 · Type=0_0000 · Payload required · No completion |
| Byte Enables | First DW BE[3:0] and Last DW BE[7:4] select valid bytes in boundary DWs · Last BE=0000 when Length=1 |
| Tag field | 5-bit (32 tags) default · 8-bit (256 tags) with Extended Tag · 10-bit with 10-Bit Tag (Gen 4+) · enables multiple simultaneous outstanding reads |
| CplD header | Completer ID + Status + Byte Count (DW1) + Requester ID + Tag + Lower Address (DW2) + data payload |
| Completion Status | SC=000 (success) · UR=001 (unsupported) · CRS=010 (config retry) · CA=100 (abort) |
| Split transaction | Request and completion are separate TLPs · one read may return multiple CplD TLPs · Byte Count tracks remaining bytes |
| CfgRd0/CfgWr0 | Type=0_0100 · targets device on Secondary Bus of receiving bridge · always 1 DW access |
| CfgRd1/CfgWr1 | Type=0_0101 · forwarded downstream until bridge converts it to Type 0 at target bus |
| Address routing | Compare target address against BAR (consume), then Base/Limit (forward), else UR |
| ID routing | Compare target BDF against own BDF (consume), then Secondary/Subordinate range (forward downstream) |
| Implicit routing | Type[2:0]=000 → Root · 011 → broadcast · 100 → local · no address comparison needed |
| Gen 6 TLP impact | Transaction semantics unchanged · flit packing is handled below the Transaction Layer · Flit Mode links use their own header layout with wider tags · same Max_Payload_Size |
Coming next: PCIe-06 covers the TLP Ordering Rules in depth — the full 12-rule ordering table, why posted writes must not pass posted writes, the Relaxed Ordering and No Snoop attributes, and how ordering interacts with Virtual Channels and Traffic Classes.