PCIe Series — PCIe-07: Completion TLPs — VLSI Trainers

Completion TLPs

The full Cpl and CplD header — Completer ID, Status codes, Byte Count, Lower Address, Requester ID, and Tag — plus split-completion mechanics, Completion Timeout, CplLk/CplDLk for locked transactions, and what none of this changes in Gen 6.

📋 What a Completion Is

Every non-posted request — MRd, MRdLk, IOWr, IORd, CfgRd, CfgWr, AtomicOp — must eventually receive a completion TLP in return. The completion is how the completer tells the requester: “I have processed your request. Here is the result.” Without the completion, the requester has no way to know whether its request succeeded, failed, or was even received.

Split-Transaction Model: Request and Completion Are Separate TLPs
Requester: NVMe SSD, BDF 03:00.0. Holds Tag=42 pending and waits for the CplD with Tag=42.
① MRd · Tag=42 · Len=16DW · non-posted · travels upstream
Completer: Root Complex, BDF 00:00.0. Reads from RAM, builds the CplD, echoes Tag=42.
② CplD · Tag=42 · Status=SC · carries 64 bytes of data
Key properties:
• Always 3DW header (12 B)
• Routes by Requester ID
• Tag echoed from request
• Always infinite FC credits
• TC matches original request
Figure 1: Split-transaction model. The MRd travels upstream (① blue arrow). While it is being processed, the requester continues other work. The CplD returns downstream (② orange arrow) with the echoed Tag=42 to identify which request it satisfies. These two TLPs are independent packets: the request is routed by address, and the completion is routed by ID back to the requester.

Only non-posted requests generate completions. Posted requests (MWr, Msg) do not — by design. Completions must always have infinite receive buffer credits at the requester to prevent deadlock (covered in PCIe-13 on Flow Control).

📋 Cpl, CplD, CplLk, CplDLk

There are four completion TLP types, all sharing the same 3DW header layout. The Fmt field distinguishes whether data is present; the Type field is 0_1010 for Cpl and CplD, and 0_1011 for the locked variants CplLk and CplDLk.

| Name | Fmt | Type | Data? | Used for |
| --- | --- | --- | --- | --- |
| Cpl | 000 | 0_1010 | No | Status-only response to IOWr, CfgWr, and failed reads (UR/CA) |
| CplD | 010 | 0_1010 | Yes | Response to MRd, IORd, CfgRd, AtomicOp; carries the requested data |
| CplLk | 000 | 0_1011 | No | Status-only response to MRdLk when no data (error case) |
| CplDLk | 010 | 0_1011 | Yes | Response to MRdLk; carries data, used in legacy locked transactions |

How to tell Cpl from CplD at a glance. Look at Fmt bit [1], which is DW0 bit 30 (byte 0 bit 6). If it is 0 (Fmt=000), this is a Cpl and no data follows the header. If it is 1 (Fmt=010), this is a CplD and a data payload immediately follows the 12-byte header. The receiver determines this before reading any further into the packet.
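The Fmt/Type decode described above can be sketched in a few lines of Python. This is a minimal illustration, not RTL: the function name `classify_completion` and the lookup table are hypothetical, and the field positions assume the classic layout where byte 0 holds Fmt in bits [7:5] and Type in bits [4:0].

```python
# Completion TLP names keyed by (Fmt, Type), using the values from the table above.
COMPLETION_TYPES = {
    (0b000, 0b01010): "Cpl",
    (0b010, 0b01010): "CplD",
    (0b000, 0b01011): "CplLk",
    (0b010, 0b01011): "CplDLk",
}


def classify_completion(byte0: int):
    """Return (name, has_data) for header byte 0, or None if it is not a completion."""
    fmt = (byte0 >> 5) & 0x7   # byte 0 bits [7:5] = DW0 bits [31:29]
    tlp_type = byte0 & 0x1F    # byte 0 bits [4:0] = DW0 bits [28:24]
    name = COMPLETION_TYPES.get((fmt, tlp_type))
    if name is None:
        return None
    has_data = bool(fmt & 0b010)  # Fmt bit [1] set means a payload follows
    return name, has_data


if __name__ == "__main__":
    for b in (0x0A, 0x4A, 0x0B, 0x4B, 0x40):
        print(f"byte0=0x{b:02X} ->", classify_completion(b))
```

Byte 0 value 0x40 (Fmt=010, Type=0_0000) is included only to show that a non-completion TLP falls through to `None`.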

📋 Completer ID (DW1, Bytes 4–5)

The Completer ID is the 16-bit BDF (Bus:Device:Function) of the device that is sending this completion. It is placed in DW1 bytes 4–5 — the same position that would be the “Requester ID” in a request TLP, but in a completion it identifies the completer rather than the requester.

During normal successful operation the Completer ID is largely informational: it is useful for debug when tracking which device returned data for a given request. The Requester ID in DW2 is what actually routes the completion back to its destination. The Completer ID is nonetheless valuable for error diagnosis: if a completion arrives with an unexpected status code, it tells you exactly which device failed.

Completer ID vs Requester ID — easy to confuse. In DW1 (Bytes 4–5) lives the Completer ID — who is sending this completion. In DW2 (Bytes 8–9) lives the Requester ID — who originally sent the request and who should receive this completion. Switches use the Requester ID to route the completion downstream. The Completer ID is not used for routing.

📋 Completion Status Codes

The 3-bit Status field in DW1 byte 6 bits [7:5] tells the requester whether its request was serviced correctly and, if not, what went wrong.

Completion Status Codes: Byte 6 bits [7:5]

• 000, SC (Successful Completion): Request processed correctly. CplD carries the data for reads; Cpl carries just status for writes. This is normal operation, and most completions will have this status code. Byte Count must equal total bytes remaining in a split completion. AER: no error reported.
• 001, UR (Unsupported Request): Completer does not recognise or cannot handle this request. No data is returned with UR. Common causes: access to an unimplemented BAR range, an unsupported TLP type for the device, or an address outside the device MMIO space. Advisory Non-Fatal Error since PCIe 1.1. AER: ERR_NONFATAL may be sent.
• 010, CRS (Config Request Retry Status): Device is not yet ready to respond to a configuration request. Only valid for Config requests. The RC should retry the config access. The device has up to 1 second after reset to stop returning CRS. If CRS SW Visibility is enabled, a Vendor ID read returns 0x0001 to signal “not ready yet” to software.
• 100, CA (Completer Abort): Completer could have serviced the request but has failed. Distinct from UR: the device recognised the request but could not complete it. Uncorrectable error condition. No data is returned with CA. Examples: internal device error, or an ECC error that cannot be corrected. AER: ERR_FATAL or ERR_NONFATAL.
Figure 3 — The four completion status codes. Codes 011 and 101–111 are reserved — a completion with a reserved status code is treated as if it were UR. A status code other than SC terminates the transaction — no further completions are expected for that request, and any data already received should be discarded.
A non-SC status terminates the transaction immediately. If a completion with Status=CA arrives in place of the second of three expected split completions, the requester must discard all data received so far and consider the request failed. Because no data is returned with CA, that completion is a Cpl rather than a CplD. No further completions will come for this Tag. The Tag is freed, and the error is reported to the device driver.
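As a rough sketch of the status handling described above, the following Python decodes the 3-bit Status field from header byte 6 and folds the reserved encodings into UR. The names `decode_status` and `transaction_continues` are illustrative assumptions, not taken from any real driver or library.

```python
# Defined status encodings from byte 6 bits [7:5].
STATUS_NAMES = {0b000: "SC", 0b001: "UR", 0b010: "CRS", 0b100: "CA"}


def decode_status(byte6: int) -> str:
    """Decode the Status field; reserved codes (011, 101-111) are treated as UR."""
    status = (byte6 >> 5) & 0x7
    return STATUS_NAMES.get(status, "UR")


def transaction_continues(byte6: int) -> bool:
    """Only SC leaves the request alive; any other status ends the transaction."""
    return decode_status(byte6) == "SC"


if __name__ == "__main__":
    for b in (0x00, 0x20, 0x40, 0x80, 0x60):
        print(f"byte6=0x{b:02X} -> {decode_status(b)}, continue={transaction_continues(b)}")
```

The low five bits of byte 6 (BCM and the top of Byte Count) are masked out by the shift, so they do not affect the decode.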

📋 BCM — Byte Count Modified

BCM is a 1-bit field at DW1 byte 6 bit [4]. It is a legacy compatibility flag used only by PCI-X completers that may exist behind a PCIe-to-PCI-X bridge. In a pure PCIe system, BCM is always 0.

When BCM=1 (PCI-X bridges only): the Byte Count field in this first completion reports the size of this completion’s payload rather than the total remaining bytes for the original request. Subsequent completions reset BCM to 0 and Byte Count reverts to its normal meaning (remaining total). This distinction matters because a requester uses Byte Count to know when all splits of a request have arrived — BCM=1 signals “do not use this Byte Count to determine completion”.

For all native PCIe completers, BCM is always 0 and can be ignored by the receiver.

📋 Byte Count Field (DW1, 12 bits)

The 12-bit Byte Count field at DW1 byte 6 bits [3:0] + byte 7 bits [7:0] carries the number of bytes remaining to satisfy the original request, including the bytes in this completion. It counts down with each successive completion TLP.

Byte Count: Counting Down Across Split Completions
Example: an MRd requesting 128 bytes (32 DW), split into two CplDs of 64 bytes each.
• Original MRd: Length = 32 DW = 128 bytes requested.
• First CplD: Length = 16 DW = 64 bytes, Byte Count = 128 (total remaining, including this CplD). Requester: 128 ≠ 64, so more is to come.
• Second CplD: Length = 16 DW = 64 bytes, Byte Count = 64 (remaining, including this CplD). Requester: 64 = 64, so done.
• Complete: 128 bytes received, Tag freed, data delivered.
Byte Count rule (how the requester knows it has all the data): Byte Count starts at the total bytes requested and counts down. When the Byte Count in a CplD equals the payload size of that CplD, the requester knows this is the last completion and all data has arrived. Byte Count is in bytes and Length is in DWs, so the test for the last CplD is Byte Count = Length × 4.
Figure 4 — Byte Count counting down. The original request asks for 128 bytes. The first CplD returns 64 bytes and has Byte Count=128 (128 total remaining, including this completion’s 64 bytes). The second CplD returns the remaining 64 bytes with Byte Count=64. When Byte Count equals Length×4, the requester knows this is the final completion.
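The countdown rule can be sketched as a small check on the requester side. This is a simplified illustration that assumes a DW-aligned request with all byte enables set, so that the payload size is exactly Length × 4; the helper names `is_last_completion` and `collect` are hypothetical.

```python
def is_last_completion(byte_count: int, length_dw: int) -> bool:
    """Last CplD when the remaining Byte Count equals this payload (Length x 4 bytes)."""
    return byte_count == length_dw * 4


def collect(completions):
    """Walk (length_dw, byte_count) pairs in arrival order.

    Returns the total bytes received once the final completion is seen,
    or None if the sequence ends with data still outstanding.
    """
    received = 0
    for length_dw, byte_count in completions:
        received += length_dw * 4
        if is_last_completion(byte_count, length_dw):
            return received
    return None


if __name__ == "__main__":
    # The 128-byte example from Figure 4: two CplDs of 16 DW each.
    print(collect([(16, 128), (16, 64)]))
```

A real requester would also verify that each Byte Count matches what it expects to be remaining; that check is left out to keep the sketch short.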

📋 Requester ID and Tag (DW2)

DW2 carries the two fields that tie a completion back to its original request: the Requester ID and the Tag.

| Field | Location | Value | Purpose |
| --- | --- | --- | --- |
| Requester ID | Bytes 8–9 | Copied from the original request’s Requester ID field | Used for routing: switches compare the bus number against their Secondary/Subordinate ranges to forward the completion downstream to the correct port |
| Tag | Byte 10 | Copied exactly from the original request’s Tag field | Used for matching: the requester’s Transaction Layer matches incoming CplD tags against its table of outstanding requests to deliver data to the correct waiting context |

The Requester ID routes; the Tag matches. These are distinct jobs. A switch that receives a completion looks at the Requester ID bus number (bytes 8–9) to decide which downstream port to forward it to. It does not look at the Tag at all. The Tag is only examined by the final destination — the original requester — which uses it to find the pending request and deliver the data.

📋 Lower Address Field (DW2 Byte 11 bits [6:0])

Lower Address is a 7-bit field at byte 11 bits [6:0] of DW2. It carries the byte address of the first valid data byte being returned in this completion. It is not the full address — it is the low 7 bits of the byte-level start address of the data in this completion.

Its primary purpose is to help the requester calculate how many bytes remain before hitting the next Read Completion Boundary (RCB). RCB is either 64 or 128 bytes depending on a Root Complex configuration register. Completions that are entirely within one RCB must be returned in a single CplD. The Lower Address field tells the requester the exact byte position within the current RCB so it can do this calculation.

For AtomicOp completions, Lower Address is reserved (set to zero). For every completion other than a memory read completion (for example, a Cpl with no data answering a write, or an I/O or configuration read completion), Lower Address is also zero.
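Pulling the DW1 and DW2 field positions together, here is a minimal Python sketch that unpacks a 12-byte completion header. The `CompletionHeader` dataclass and the helper names are illustrative assumptions; the byte offsets follow the positions given in the sections above, and the Tag is treated as the 8-bit value in byte 10.

```python
from dataclasses import dataclass


@dataclass
class CompletionHeader:
    completer_id: int   # bytes 4-5
    status: int         # byte 6 bits [7:5]
    bcm: int            # byte 6 bit [4]
    byte_count: int     # byte 6 bits [3:0] + byte 7
    requester_id: int   # bytes 8-9
    tag: int            # byte 10
    lower_address: int  # byte 11 bits [6:0]


def parse_completion_header(hdr: bytes) -> CompletionHeader:
    """Unpack DW1 and DW2 of a 3DW completion header (DW0 is not decoded here)."""
    if len(hdr) != 12:
        raise ValueError("a completion header is exactly 12 bytes")
    return CompletionHeader(
        completer_id=(hdr[4] << 8) | hdr[5],
        status=(hdr[6] >> 5) & 0x7,
        bcm=(hdr[6] >> 4) & 0x1,
        byte_count=((hdr[6] & 0x0F) << 8) | hdr[7],
        requester_id=(hdr[8] << 8) | hdr[9],
        tag=hdr[10],
        lower_address=hdr[11] & 0x7F,
    )


def format_bdf(rid: int) -> str:
    """Render a 16-bit ID as Bus:Device.Function (8 + 5 + 3 bits)."""
    return f"{rid >> 8:02x}:{(rid >> 3) & 0x1F:02x}.{rid & 0x7}"


if __name__ == "__main__":
    # CplD from 00:00.0 to 03:00.0, Tag=42, Byte Count=64, Length=16 DW.
    raw = bytes([0x4A, 0x00, 0x00, 0x10,
                 0x00, 0x00, 0x00, 0x40,
                 0x03, 0x00, 0x2A, 0x00])
    h = parse_completion_header(raw)
    print(format_bdf(h.completer_id), format_bdf(h.requester_id), h.tag, h.byte_count)
```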

📋 Calculating the Lower Address Field

The completer calculates Lower Address from the DW-aligned start address and the First DW Byte Enable pattern of the original request. The result is the byte offset within the first DW at which the first valid data byte lives.

Lower Address Calculation: From First DW BE and Start Address

| First DW BE | Byte offset in first DW | Lower Address formula | Example (start addr = 0x100) |
| --- | --- | --- | --- |
| 1111 | All 4 bytes valid · offset = 0 | Lower Address = start_addr[6:0] | 0x100 → Lower Addr = 0x00 |
| 1110 | Byte 0 skipped · offset = 1 | Lower Address = start_addr[6:0] + 1 | 0x100 → Lower Addr = 0x01 |
| 1100 | Bytes 0–1 skipped · offset = 2 | Lower Address = start_addr[6:0] + 2 | 0x100 → Lower Addr = 0x02 |
| 1000 | Bytes 0–2 skipped · offset = 3 | Lower Address = start_addr[6:0] + 3 | 0x100 → Lower Addr = 0x03 |

Only 7 bits are kept: Lower Address = (start_addr + byte_offset) mod 128. It is used with the RCB to calculate the bytes remaining until the next RCB boundary.
Figure 5 — Lower Address calculation. The First DW BE from the original request determines the byte offset within the starting DW. Lower Address is the low 7 bits of (DW-aligned start address + offset from First DW BE). The requester uses it to calculate how many bytes reach the next RCB boundary.
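The calculation in Figure 5 can be sketched as follows. The function name `lower_address` is hypothetical. The offset is taken as the position of the lowest enabled byte in the First DW BE, which reproduces the four rows of the figure; treating an all-zero byte enable as offset 0 is an assumption made here for completeness.

```python
def lower_address(dw_start_addr: int, first_dw_be: int) -> int:
    """Low 7 bits of (DW-aligned start address + offset of the first enabled byte)."""
    if dw_start_addr % 4:
        raise ValueError("start address must be DW-aligned")
    be = first_dw_be & 0xF
    if be == 0:
        offset = 0  # assumption: no enabled bytes, offset treated as 0
    else:
        # Index of the lowest set bit = number of skipped leading bytes.
        offset = (be & -be).bit_length() - 1
    return (dw_start_addr + offset) & 0x7F  # same as mod 128


if __name__ == "__main__":
    for be in (0b1111, 0b1110, 0b1100, 0b1000):
        print(f"BE={be:04b} -> Lower Address = 0x{lower_address(0x100, be):02X}")
```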

📋 How Completions Are Routed

Completions use ID routing — not address routing. Every switch that receives a completion extracts the Requester ID bus number (DW2 bytes 8–9) and compares it against the Secondary and Subordinate bus numbers of its downstream ports to determine which port to forward the completion to.

Completion Routing: Switch Uses Requester ID Bus Number
• RC (Completer, BDF 00:00.0): sends a CplD with ReqID=03:00.0, Tag=42, 64 bytes.
• Switch: checks Requester ID bus = 03. Port 2 has Sec=3, Sub=5; 3 is within 3–5, so it matches and the CplD is forwarded to Port 2.
• NVMe SSD (BDF 03:00.0): checks Tag=42, matches the pending MRd, delivers 64 bytes to the DMA engine, and frees Tag 42.
The switch routes by Requester ID bus number; the requester matches by Tag.
Figure 6 — Completion routing. The switch extracts the Requester ID bus number (03) from DW2 bytes 8–9 and compares it against each downstream port’s Secondary/Subordinate bus range. Port 2 has Secondary=3, Subordinate=5 — bus 03 is in this range, so the completion is forwarded there. At the NVMe SSD, Tag=42 matches the pending MRd and the data is delivered.
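A switch's forwarding decision from Figure 6 can be modelled roughly as a range check per downstream port. The `DownstreamPort` class and `route_completion` function are illustrative names, and the bus range for "Port 1" is invented for the example; only Port 2 (Secondary=3, Subordinate=5) comes from the figure.

```python
from dataclasses import dataclass


@dataclass
class DownstreamPort:
    name: str
    secondary: int    # Secondary Bus Number
    subordinate: int  # Subordinate Bus Number


def route_completion(requester_id: int, ports):
    """Return the name of the port whose bus range contains the Requester ID bus.

    Returns None when no downstream port claims the bus number.
    """
    bus = (requester_id >> 8) & 0xFF  # bus number is the top byte of the ID
    for port in ports:
        if port.secondary <= bus <= port.subordinate:
            return port.name
    return None


if __name__ == "__main__":
    ports = [
        DownstreamPort("Port 1", secondary=1, subordinate=2),  # hypothetical range
        DownstreamPort("Port 2", secondary=3, subordinate=5),  # from Figure 6
    ]
    print(route_completion(0x0300, ports))  # Requester ID 03:00.0
```

Note that the Tag never appears in this function, which mirrors the point that switches ignore it.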

📋 Split Completions and the Read Completion Boundary

A completer is allowed to satisfy a single read request with multiple completion TLPs. This is called a split completion. The spec defines when this is legal and how the completions must be structured.

Read Completion Boundary (RCB)

The RCB is a naturally-aligned boundary in the memory address space, at either 64 or 128 bytes. The Root Complex’s RCB size is readable from a configuration register. Bridges and endpoints may also implement an RCB configuration bit.

The rules from the spec:

  1. A completion that is entirely within a single RCB-aligned block must be returned in one CplD — it cannot be split at a point within an RCB.
  2. A completion spanning an RCB boundary may be split at the boundary. The first CplD covers up to (but not including) the next RCB boundary; subsequent CplDs cover one or more full RCB blocks; the final CplD covers the remaining bytes.
  3. Multiple completions for one request must be returned in strictly increasing address order.
  4. A single completion can only satisfy one request — it cannot partially satisfy two requests.
  5. For IO and Config reads, the completer always returns exactly 1 DW (4 bytes) in a single completion — never split.
RCB Split: Request Crossing Two 64-byte RCB Boundaries
MRd: start=0x38, length=96 bytes (24 DW) · RCB=64 bytes · split required at 0x40 and 0x80
• 0x38–0x3F (8 B, up to the first RCB boundary at 0x40): CplD #1 · 8 B · Byte Count=96
• 0x40–0x7F (64 B, full RCB block up to the boundary at 0x80): CplD #2 · 64 B · Byte Count=88
• 0x80–0x97 (24 B, tail of the request): CplD #3 · 24 B · Byte Count=24
Figure 7 — Split completion at RCB=64 bytes. The original MRd covers bytes 0x38–0x97 (96 bytes across three segments). CplD #1 covers up to the first RCB boundary (8 bytes, Byte Count=96). CplD #2 covers the full middle RCB block (64 bytes, Byte Count=88). CplD #3 covers the tail (24 bytes, Byte Count=24 — equals its own Length×4, signalling last completion).
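The split in Figure 7 can be reproduced with a short sketch. It models a completer that splits at every RCB boundary, which is only one legal choice since a completer may also return several RCB blocks in one CplD. The function name `split_at_rcb` and the dictionary keys are hypothetical, and the request is assumed to be DW-aligned with all bytes enabled.

```python
def split_at_rcb(start_addr: int, total_bytes: int, rcb: int = 64):
    """Split a read into completions that each end at an RCB boundary (or at the end)."""
    completions = []
    addr = start_addr
    remaining = total_bytes
    while remaining > 0:
        to_boundary = rcb - (addr % rcb)   # bytes left before the next RCB boundary
        chunk = min(to_boundary, remaining)
        completions.append({
            "lower_address": addr & 0x7F,  # low 7 bits of the first byte returned
            "payload_bytes": chunk,
            "byte_count": remaining,       # remaining bytes, including this chunk
        })
        addr += chunk
        remaining -= chunk
    return completions


if __name__ == "__main__":
    for c in split_at_rcb(0x38, 96, rcb=64):
        print(c)
```

Because addresses only ever increase inside the loop, the completions come out in the increasing address order that rule 3 requires.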

📋 Completion Timeout

A requester that sends a non-posted request and never receives a completion is in trouble — its Tag is stuck, its receive buffer may be held, and dependent operations cannot proceed. The PCIe spec defines a Completion Timeout mechanism to recover from this situation.

| Property | Value / Description |
| --- | --- |
| Where enabled | Device Control 2 register: Completion Timeout Value field and Completion Timeout Disable bit |
| Default timeout range | 50 µs to 50 ms by default; configurable to ranges such as 16 ms–55 ms, 65 ms–210 ms, 260 ms–900 ms, or 1 s–3.5 s |
| What happens on timeout | Device reports a Completion Timeout error to software (AER: advisory non-fatal error). The pending Tag is released. The outstanding request is considered failed. |
| Disable option | Software can set the Completion Timeout Disable bit to prevent timeouts. Only appropriate for specific test scenarios, because disabled timeouts risk hanging the device indefinitely on a lost completion. |
| CRS exception | CRS (Config Request Retry Status) completions do not start the Completion Timeout for the RC, because the RC is expected to retry. But software must eventually stop retrying if CRS continues beyond 1 second after reset. |
| Gen 6 consideration | At Gen 6 speeds, a small timeout value may trigger spuriously if the completer is processing a large AtomicOp or slow memory operation. Firmware should configure the timeout range appropriately for the workload. |

Completion Timeout is mandatory in PCIe 2.0 and later. Devices designed to PCIe 1.x had no standard timeout mechanism — firmware had to poll the device and manually detect hangs. From PCIe 2.0 onward, the timeout is a required hardware feature, making system error recovery far more reliable.
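To make the timeout behaviour concrete, here is a small software model of a requester's outstanding-request table. It is only a sketch of the bookkeeping: real hardware uses per-Tag timers, and the class name `OutstandingRequests`, its methods, and the 50 ms value used in the example are assumptions chosen for illustration. Time is passed in explicitly so the behaviour is deterministic.

```python
class OutstandingRequests:
    """Tracks pending Tags and releases any that exceed the completion timeout."""

    def __init__(self, timeout_s: float):
        self.timeout_s = timeout_s
        self._pending = {}  # tag -> time the request was issued

    def issue(self, tag: int, now: float) -> None:
        if tag in self._pending:
            raise ValueError(f"Tag {tag} is already in flight")
        self._pending[tag] = now

    def complete(self, tag: int) -> bool:
        """Free the Tag on completion; False means an unexpected completion."""
        return self._pending.pop(tag, None) is not None

    def expire(self, now: float):
        """Release and return every Tag whose completion has timed out."""
        timed_out = [t for t, t0 in self._pending.items()
                     if now - t0 >= self.timeout_s]
        for t in timed_out:
            del self._pending[t]  # Tag released; the request is considered failed
        return timed_out


if __name__ == "__main__":
    table = OutstandingRequests(timeout_s=0.050)
    table.issue(42, now=0.000)
    table.issue(7, now=0.040)
    print(table.expire(now=0.050))  # Tags that timed out
    print(table.complete(7))        # Tag 7 completes normally
```

A completion that arrives after its Tag has expired shows up as `complete()` returning False, which is where a real design would flag an unexpected completion.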

Completions in Gen 6

The completion TLP header format — every byte, every field, every bit position — is identical in Gen 6 compared to Gen 1 through Gen 5. The same 3DW, 12-byte header. The same Completer ID in DW1. The same Requester ID and Tag in DW2. The same four Status codes. The same Byte Count and Lower Address fields.

Gen 6 flit packing of completions

The practical rule for Gen 6 completion handling. If you are writing RTL that builds or parses completion TLPs, no changes are needed for Gen 6. The header fields, Byte Count logic, Lower Address calculation, RCB rules, and Tag matching all work identically. The Physical Layer handles flit packing transparently and the Transaction Layer sees the same clean TLP stream it has always seen.

📋 Quick Reference

| Item | Value / Rule |
| --- | --- |
| Cpl: Fmt, Type | Fmt=000 · Type=0_1010 · no data · 12 bytes |
| CplD: Fmt, Type | Fmt=010 · Type=0_1010 · data follows · 12-byte header |
| CplLk / CplDLk | Type=0_1011 · locked transaction legacy variants |
| Header size | Always 3DW (12 bytes) · no 4DW variant for completions |
| DW0 | Fmt + Type + TC (must match request) + Attr (must match request) + Length (payload DWs) |
| DW1: Completer ID | BDF of the device sending this completion · bytes 4–5 · for debug only, not used for routing |
| DW1: Status [7:5] | 000=SC (success) · 001=UR (unsupported) · 010=CRS (retry, config only) · 100=CA (abort) |
| DW1: BCM [4] | Byte Count Modified · PCI-X legacy only · always 0 in native PCIe |
| DW1: Byte Count [11:0] | Bytes remaining to satisfy original request, including this completion’s payload · counts down across splits |
| DW2: Requester ID | Copied from original request · bytes 8–9 · used by switches to route completion downstream |
| DW2: Tag | Copied from original request · byte 10 · used by requester to match completion to pending request |
| DW2: Lower Address [6:0] | Byte 11 bits [6:0] · low 7 bits of first valid data byte address · derived from DW start addr + First DW BE offset |
| Routing method | ID routing: switches compare Requester ID bus number to downstream port Secondary/Subordinate ranges |
| Non-SC status | Terminates transaction immediately · no data delivered · Tag freed · error reported to software |
| Last completion detection | When Byte Count in a CplD equals its Length × 4, this is the final completion for that request |
| RCB | Read Completion Boundary · 64 or 128 bytes · completions entirely within one RCB must not be split |
| Completion Timeout | Mandatory from PCIe 2.0 · Device Control 2 register · default 50 µs–50 ms, configurable ranges from 16 ms–55 ms up to 1 s–3.5 s · AER advisory non-fatal on expiry |
| Gen 6 impact | Zero: header unchanged · Byte Count/Lower Address/Tag/Status identical · flit packing transparent |

Coming next — PCIe-08: Configuration TLPs — CfgRd0, CfgRd1, CfgWr0, CfgWr1 — Type 0 vs Type 1, how Bus/Device/Function addressing works in config space, and how Type 1 is converted to Type 0 at the target bus.