PCIe Series — PCIe-10: TLP Ordering Rules — VLSI Trainers

TLP Ordering Rules

Why order matters in a packet-switched fabric, the three TLP categories, the complete ordering table explained in plain English, the Producer/Consumer model that motivates it all, deadlock prevention, Relaxed Ordering, ID-Based Ordering, and Gen 6.

📋 Why Ordering Rules Exist

PCIe is a packet-switched fabric. Every switch port has independent buffers that can stall or drain at different rates. Without ordering rules a small, lightweight packet can slide past a large stalled packet and arrive at the destination first — even though it was sent second. Software that depends on arrival order gets wrong results with no error flag raised anywhere.

The ordering rules solve three problems at once:

  1. Data correctness: written data must arrive before the flag that says it is ready
  2. Deadlock prevention: Posted writes and completions must always be able to make forward progress, even when Non-Posted request queues are backed up
  3. PCI compatibility: the rules are derived from the PCI/PCI-X Producer/Consumer ordering model, so legacy software that relies on it works unchanged on PCIe
Ordering applies within one Traffic Class. Two TLPs with different TC values have no ordering relationship — they may freely overtake each other. Everything in this post applies to TLPs sharing the same TC moving through the same Virtual Channel.

📋 Three TLP Categories

The ordering table groups TLPs into three buckets. Every TLP in the system belongs to exactly one of them:

[Figure 1: Three TLP categories, each with its own sub-buffer and credit pool. Posted (P): MWr, Msg, MsgD; fire-and-forget, no completion; buffered in the P sub-buffer. Non-Posted (NP): MRd, IORd, IOWr, CfgRd, CfgWr, AtomicOp; a completion must return; buffered in the NP sub-buffer. Completion (CPL): Cpl, CplD, CplLk, CplDLk; the response to an NP request, with credits usually advertised as infinite by Endpoints; buffered in the CPL sub-buffer.]
Figure 1 — Three TLP categories with separate VC sub-buffers and independent flow control credit pools. Separating them is what makes the ordering rules implementable: a Posted write or a Completion can bypass a stalled Non-Posted queue because each category has its own buffer and its own credits.

The Producer/Consumer Model

Most ordering rules exist to protect one specific programming pattern: a Producer writes data to memory and sets a flag, a Consumer polls the flag and reads the data when it is 1. This pattern is everywhere — NIC DMA, GPU command queues, NVMe submission rings.

[Figure 2: Producer/Consumer, two writes and one critical ordering requirement. Producer (NIC / DMA engine): ① MWr data payload (Posted), then ② MWr flag = 1 (Posted), into memory (Data Buffer, Flag). Consumer (CPU / software): ③ MRd poll flag (Non-Posted); ④ MRd read data once flag = 1. Requirement: ① must reach memory before ②. Both are Posted MWr, so only the ordering rule keeps them in sequence.]
Figure 2 — Producer/Consumer pattern. Steps ① and ② are both posted MWr TLPs. If ② arrives before ①, the Consumer reads Flag=1 and fetches the data buffer — but finds whatever stale bytes were there before ① landed. No error is raised. The rule “Posted must not pass Posted” is the only thing preventing this.
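The race in Figure 2 can be reproduced with a toy model. This is a minimal Python sketch, assuming memory is a plain dict, each posted MWr lands atomically, and the Consumer polls the flag right after every write; `consumer_view` and the values used are hypothetical.

```python
STALE = 0xDEAD  # whatever bytes sat in the data buffer before the DMA

def consumer_view(delivery_order):
    """Apply posted writes in delivery order; the Consumer polls after each one.

    Returns the data value the Consumer reads the first time it sees flag == 1,
    or None if the flag never becomes 1.
    """
    memory = {"data": STALE, "flag": 0}
    for addr, value in delivery_order:
        memory[addr] = value              # one posted MWr lands in memory
        if memory["flag"] == 1:           # Consumer polls the flag
            return memory["data"]         # ...and trusts the data buffer
    return None

data_write = ("data", 0x1234)  # step 1: payload
flag_write = ("flag", 1)       # step 2: flag = 1

print(hex(consumer_view([data_write, flag_write])))  # 0x1234: correct data
print(hex(consumer_view([flag_write, data_write])))  # 0xdead: stale data, no error
```

Nothing in the second run signals a problem: the function returns normally with the wrong value, which is exactly the silent failure the ordering rule prevents.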

📋 What Breaks Without Ordering

[Figure 3: Flag write overtakes data write, silent corruption. Producer sends ① then ②. At the Switch, the large data write ① waits for enough Posted data credits, while the small flag write ② would fit in the credits that remain. In memory, Flag = 1 lands while Data is still stale. Consumer reads Flag = 1, then reads the data buffer and gets stale bytes. No error, no warning.]
Figure 3 — A non-compliant switch that sent whatever fits first would let the small flag write (②) go ahead of the large data write (①), which is still waiting for Posted data credits. Flag=1 appears in memory first, and the Consumer fetches data that has not arrived. The ordering rule "Posted must not pass Posted" prevents ② from ever overtaking ①.
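The credit mismatch that tempts a switch to reorder can be sketched as follows. This is a hypothetical Python model, assuming one data credit covers 16 bytes (4 DW) and ignoring header credits; `next_posted` and its `strict` switch are illustrative names, not part of any real controller.

```python
import math

CREDIT_BYTES = 16  # assumption: one Posted Data credit covers 16 bytes (4 DW)

def data_credits(payload_bytes):
    """Posted Data credits a TLP with this payload consumes."""
    return math.ceil(payload_bytes / CREDIT_BYTES)

def next_posted(queue, available_pd, strict):
    """Name of the next Posted TLP to send, or None if nothing may go yet.

    strict=True  -> only the head may leave ("Posted must not pass Posted").
    strict=False -> hypothetical NON-compliant arbiter: first TLP that fits.
    Header credits are ignored in this sketch.
    """
    candidates = queue[:1] if strict else queue
    for tlp in candidates:
        if data_credits(tlp["bytes"]) <= available_pd:
            return tlp["name"]
    return None

queue = [
    {"name": "data MWr", "bytes": 256},  # ① needs 16 credits
    {"name": "flag MWr", "bytes": 4},    # ② needs 1 credit
]
print(next_posted(queue, available_pd=4, strict=False))  # flag MWr (overtakes ①)
print(next_posted(queue, available_pd=4, strict=True))   # None: ① waits for credits
```

The compliant arbiter simply waits until ① fits; the throughput lost while waiting is the price of correctness.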

📋 How to Read the Ordering Table

The table is structured as Row passes Column. Columns represent TLPs already waiting in an egress queue. Rows represent a newly arrived TLP that wants to go out the same port. Each cell answers one question: may the row TLP jump ahead of the column TLP?

Three possible cell values:
  No: must not pass. Violating this causes data corruption.
  Yes: must be allowed to pass. Blocking this causes deadlock. Mandatory.
  Yes/No: may pass but is not required to. Implementation choice; both are compliant.
Figure 4 — Cell meanings. “No” entries protect data correctness. “Yes” entries prevent deadlock. “Yes/No” entries give implementers freedom to optimise or to keep logic simple — neither choice is wrong.

📋 The Ordering Table

Read each row as: “This newly arrived TLP — may it pass a TLP that is already queued?”

Newly arrived TLP ↓ / Already queued TLP → | Posted (MWr · Msg · MsgD) | Non-Posted Read (MRd · IORd · CfgRd) | Non-Posted Write (IOWr · CfgWr · AtomicOp) | Completion (Cpl · CplD)
---|---|---|---|---
Posted Write / Message | No: core ordering rule (Yes/No with RO, or IDO and different IDs) | Yes: deadlock prevention | Yes: deadlock prevention | Yes/No: impl. choice
Non-Posted Read | No: write-before-read (Yes/No with IDO and different IDs) | Yes/No | Yes/No | Yes/No
Non-Posted Write | No: write-before-read (Yes/No with IDO and different IDs) | Yes/No | Yes/No | Yes/No
Completion | No: completion-after-write (Yes/No with RO, or IDO and different IDs) | Yes: deadlock prevention | Yes: deadlock prevention | Different Transaction IDs: Yes/No; same Transaction ID: No

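The hard rules in this table are nine cells: five No cells and four mandatory Yes cells. Nothing may pass an earlier Posted TLP unless RO or IDO explicitly allows it, completions for the same request stay in order, and Posted writes and Completions must always be able to pass Non-Posted requests. Everything else is implementation choice. The sections below explain each group in plain English.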
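One way to make the table concrete is a lookup function. The sketch below is a hypothetical Python encoding (`Cat` and `may_pass` are illustrative names) that follows the Base Specification's ordering summary, in which a Completion must not pass a Posted TLP unless RO or IDO applies. It returns "yes" for mandatory passes, "no" for forbidden ones and "y/n" for implementation choices; the PCIe-to-PCI bridge case and the I/O or Configuration Write completion exception are not modelled.

```python
from enum import Enum

class Cat(Enum):
    POSTED = "P"      # MWr, Msg, MsgD
    NP_READ = "NPR"   # MRd, IORd, CfgRd
    NP_DATA = "NPD"   # IOWr, CfgWr, AtomicOp
    CPL = "CPL"       # Cpl, CplD

NP = (Cat.NP_READ, Cat.NP_DATA)

def may_pass(row, col, *, ro=False, ido_diff_ids=False, same_transaction_id=False):
    """May `row` (newly arrived) pass `col` (already queued)?

    Returns "yes" (must be able to pass), "no" (must not pass) or "y/n" (optional).
    ro:  Relaxed Ordering set on the row TLP.
    ido_diff_ids: IDO set on the row TLP and the two TLPs come from different IDs.
    same_transaction_id: only used for Completion vs Completion.
    """
    if col is Cat.POSTED:
        if row in (Cat.POSTED, Cat.CPL):
            return "y/n" if (ro or ido_diff_ids) else "no"
        return "y/n" if ido_diff_ids else "no"   # NP requests: only IDO relaxes
    if col in NP:
        return "yes" if row in (Cat.POSTED, Cat.CPL) else "y/n"
    # col is a Completion
    if row is Cat.CPL:
        return "no" if same_transaction_id else "y/n"
    return "y/n"  # bridge-specific "must pass" case not modelled

print(may_pass(Cat.POSTED, Cat.POSTED))           # no
print(may_pass(Cat.CPL, Cat.NP_READ))             # yes
print(may_pass(Cat.CPL, Cat.POSTED, ro=True))     # y/n
```

An RTL or verification team might generate such a table from one source of truth so that the arbiter and the checker cannot disagree.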

📋 Posted vs Posted — The Core Rule

A Posted Write must not pass a Posted Write that arrived earlier.

This single rule protects the Producer/Consumer pattern. The data write (①) and the flag write (②) are both MWr TLPs. Without this rule a lightweight flag write can bypass a heavy data write inside a switch buffer and arrive first. The Consumer then reads Flag=1 before the data has landed — silent data corruption with no error flag raised anywhere in the system.

[Figure 5: Posted Must Not Pass Posted, queue behaviour. Egress queue (front first): ① Data MWr, ② Flag MWr. ① is delivered first, then ②, so the data is guaranteed to be in memory before the flag signals ready. If ② overtook ①, the Consumer would read Flag=1 and then stale data: corruption. RO on flag writes is not safe.]
Figure 5 — Posted writes leave the egress port in the same order they entered it. ① always exits before ②. The switch must not allow a newer Posted to bypass an older Posted regardless of size difference.

There are two opt-in exceptions. A Posted Write with Relaxed Ordering (RO) set may pass an earlier Posted Write, and when ID-Based Ordering (IDO) is set, a Posted Write may pass an earlier one from a different Requester ID, because packets from different devices are almost certainly unrelated. RO must never be set on a flag write that depends on earlier data. Both features are explained later in this post.

📋 Posted vs Non-Posted — Mandatory Pass

A Posted Write must be allowed to pass a queued Non-Posted request. This is not optional — it is mandatory to prevent deadlock.

The scenario that demands this rule: a read request (MRd) is stuck at an egress port because the NP credits at the next hop are exhausted. The next hop may not free those credits until it makes progress on work that depends on Posted writes arriving, for example a bridge or completer that must push earlier posted data through before it services new reads. If an MWr queued behind the stuck MRd is not allowed to bypass it, the MWr never lands, the NP credits never return, and the MRd never moves. The switch, the requester, and the target all wait on each other forever.

Allowing the MWr to pass the stuck MRd breaks this circle. The data lands. The target reads it. The completion flows back. The MRd resolves. Everything drains.

This applies equally to Non-Posted writes (IOWr, CfgWr). A Posted Write that arrives behind a stalled IOWr or CfgWr must also be allowed to bypass it. The deadlock scenario is identical. Blocking MWr behind any stuck Non-Posted request is prohibited.
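A switch egress arbiter can honour these mandatory passes by scanning its queue for the first TLP that has credits and is allowed to pass everything still waiting ahead of it. This Python sketch is hypothetical and deliberately conservative: it assumes no RO or IDO, lets only Posted and Completion TLPs bypass stalled Non-Posted requests, and never uses the optional Yes/No freedoms.

```python
def can_bypass(later, earlier):
    """Conservative subset of the ordering table, assuming no RO/IDO and no
    optional passing: only Posted and Completion TLPs bypass Non-Posted requests."""
    return later in ("P", "CPL") and earlier == "NP"

def pick_next(queue, blocked):
    """Index of the next TLP to transmit, or None.

    queue:   list of (category, name) in arrival order, category in {"P", "NP", "CPL"}.
    blocked: categories with no credits at the next hop.
    """
    for i, (cat, _name) in enumerate(queue):
        if cat in blocked:
            continue
        # Every TLP still ahead of this one must be one it is allowed to pass.
        if all(can_bypass(cat, earlier) for earlier, _ in queue[:i]):
            return i
    return None

q = [("NP", "MRd"), ("P", "MWr"), ("CPL", "CplD")]
print(pick_next(q, blocked={"NP"}))   # 1: MWr bypasses the stalled MRd
```

Note that the same function keeps a CplD behind a stalled Posted write, because nothing in this sketch may pass a Posted TLP.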

📋 Posted vs Completion — Implementation Choice

A newly arrived Posted Write may optionally pass a queued Completion going in the same direction. This is a Yes/No cell — both choices are compliant. Neither data corruption nor deadlock results from either decision.

Note the direction: this cell covers a Posted Write overtaking a Completion. The reverse case, a Completion with RO set overtaking queued writes (for example, a GPU read completion returning frame data to the CPU ahead of queued DMA writes), is a separate cell covered under Completion Rules.

There is one bridge-specific exception: in a PCIe-to-PCI/PCI-X bridge translating traffic from PCIe into PCI, a Posted Write must be able to pass a Completion or a deadlock can form due to PCI’s legacy delayed-transaction model. For all native PCIe-to-PCIe paths this does not apply.

📋 Non-Posted Rules

Non-Posted must not pass Posted

A read request or non-posted write must not bypass an earlier Posted Write; the only relaxation is IDO, for requests from a different Requester ID. This enforces write-before-read ordering. If a read bypassed an earlier write and returned data, the data it returned could be the pre-write value: old data, delivered to software as if it were current, with no error flag.

This is the read-side mirror of the core Posted rule. Together they ensure that all writes a device has issued are visible at the target before any subsequent read from that same device can return.

Non-Posted may pass Non-Posted — Yes/No

Non-posted requests may optionally reorder relative to each other. If an MRd is stalled because the NP buffer at the next hop is full, a later NP request may be allowed to bypass it. This carries no correctness risk because the fabric never promises NP-to-NP order: a requester that needs one request to finish before another must wait for the first completion before issuing the second. This is called weak ordering, and it exists to stop head-of-line blocking from spreading across unrelated traffic.

Non-Posted may pass Completion — Yes/No

A non-posted request may optionally bypass a queued completion. Again, this is purely an implementation choice. A request and a completion sitting in the same egress queue travel in the same direction, so they can never be a request and its own response.

📋 Completion Rules

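Completion must not pass Posted (No; Yes/No with RO or IDO)

A Completion heading toward the original requester must not bypass Posted Writes queued ahead of it in the same direction. This is the read-side Producer/Consumer guarantee: if a device DMA-writes data into host memory and the CPU then reads the device's status register, the CplD carrying the status must not reach the CPU before the data writes land. The cell relaxes to Yes/No in three cases: the Completion has RO=1 (the classic use is a read completion bypassing a long queue of DMA writes to cut read latency), the Completion has IDO set and its Completer ID differs from the Posted write's Requester ID, or it is an I/O or Configuration Write completion.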

Completion must pass Non-Posted — mandatory Yes

A Completion must always be allowed to pass a queued Non-Posted request. This is the second mandatory rule and it exists for the same reason as the Posted-passes-NP rule: deadlock.

The scenario: Device A and Device B issue reads to each other. Each device's NP receive buffer fills with the other's reads, and a device typically cannot service those reads, and so free NP credits, until it can transmit the completions. At each egress port the outgoing MRds are stalled for lack of NP credits, and the completions that would unblock the other side are queued behind them. If a completion could not bypass the stalled MRd, neither device would ever make progress. Allowing it to pass breaks the deadlock.

Completions with different IDs may pass each other — Yes/No

Two completions with different Transaction IDs (Requester ID plus Tag) may optionally reorder relative to each other. They answer different requests, even when both requests came from the same requester, so neither depends on the order in which the other arrives.

Completions for the same request must not reorder — hard No

When a single large read is satisfied by multiple CplD TLPs (a split completion), those partial completions must arrive at the requester in ascending address order. CplD #2 must not arrive before CplD #1. If it did, the requester would assemble the pieces in the wrong order — corrupted data, no error flag.
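A requester can detect out-of-order completions for one read by checking the Byte Count field, which for a memory read reports the bytes still to be returned including the current completion's payload. The sketch below is a hypothetical Python reassembly helper for a single request, assuming completions have already been filtered by Transaction ID.

```python
def reassemble(completions, request_bytes):
    """Reassemble one read's CplD payloads, rejecting out-of-order arrival.

    completions: list of (byte_count, payload) in arrival order for ONE Transaction ID.
    byte_count:  bytes still to be returned, including this completion's payload.
    """
    out = bytearray()
    remaining = request_bytes
    for byte_count, payload in completions:
        if byte_count != remaining:
            raise ValueError(
                f"out of order: expected Byte Count {remaining}, got {byte_count}")
        out += payload
        remaining -= len(payload)
    if remaining:
        raise ValueError(f"{remaining} bytes still outstanding")
    return bytes(out)

data = bytes(range(256))
cpls = [(256 - off, data[off:off + 64]) for off in range(0, 256, 64)]
assert reassemble(cpls, 256) == data          # ascending address order: accepted
swapped = [cpls[1], cpls[0]] + cpls[2:]
try:
    reassemble(swapped, 256)
except ValueError as err:
    print(err)  # out of order: expected Byte Count 256, got 192
```

In a real fabric the "same ID: No" rule means this check should never fire; in a testbench it turns a silent reassembly error into a visible one.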

📋 Deadlock — Why Some Cells Are Mandatory

Four cells in the table are mandatory — not optional performance hints but hard requirements without which the fabric can permanently stall. The two “Posted must pass Non-Posted” entries and the two “Completion must pass Non-Posted” entries all exist to prevent circular waits.

[Figure 6: Deadlock circle, completion blocked behind Non-Posted. Endpoint: has an outstanding MRd and waits for its CplD before freeing NP receive space. Switch: egress NP queue stalled, with an MRd waiting for NP credits from the Endpoint; the CplD for the Endpoint is queued behind that MRd. Root Complex: CplD sent, blocked at the Switch. Endpoint waits for CplD, CplD waits for MRd, MRd waits for Endpoint NP credits: DEADLOCK. Fix: the CplD must be allowed to pass the queued MRd.]
Figure 6 — Deadlock circle. Suppose the Endpoint frees NP receive space only after its outstanding read completes. The CplD is queued at the Switch behind an MRd that needs NP credits from that same Endpoint, and the Endpoint cannot return those credits until the CplD arrives. The mandatory rule "Completion must pass Non-Posted" cuts this circle at the Switch.
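A deadlock circle like this is a cycle in a wait-for graph, so checking a set of passing rules for deadlock amounts to cycle detection. This hypothetical Python sketch models the chain named in the Figure 6 title (a Completion blocked behind a Non-Posted request), assuming the Endpoint frees NP credits only after its CplD arrives; the node names are illustrative.

```python
def find_cycle(waits_for):
    """Return one wait-for cycle as a list of nodes (first node repeated at the end),
    or None if the graph is acyclic. waits_for maps node -> nodes it waits on."""
    visiting, done, path = set(), set(), []

    def dfs(node):
        visiting.add(node)
        path.append(node)
        for nxt in waits_for.get(node, []):
            if nxt in visiting:                      # back edge: cycle found
                return path[path.index(nxt):] + [nxt]
            if nxt not in done:
                cycle = dfs(nxt)
                if cycle:
                    return cycle
        visiting.discard(node)
        path.pop()
        done.add(node)
        return None

    for start in list(waits_for):
        if start not in done:
            cycle = dfs(start)
            if cycle:
                return cycle
    return None

# Hypothetical chain: Endpoint NP credits -> CplD -> stalled MRd -> Endpoint NP credits
without_rule = {
    "endpoint_np_credits": ["cpld_at_switch"],  # Endpoint waits for its CplD
    "cpld_at_switch": ["mrd_at_switch"],         # CplD queued behind the MRd
    "mrd_at_switch": ["endpoint_np_credits"],    # MRd needs Endpoint NP credits
}
with_rule = dict(without_rule, cpld_at_switch=[])  # CplD may pass the MRd

print(find_cycle(without_rule))  # the three nodes, first repeated: a deadlock
print(find_cycle(with_rule))     # None
```

Removing a single edge, which is what one mandatory "Yes" cell does, is enough to make the graph acyclic.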

📋 Relaxed Ordering (RO)

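Relaxed Ordering is a single bit in the TLP header (Attr[1], DW0 bit 13 in the non-Flit-Mode header). A requester may set it only if software has enabled it through the Enable Relaxed Ordering bit in the Device Control register, and setting it is a declaration: "I guarantee this packet has no ordering dependency on earlier posted writes. You may let it pass them."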

A switch that sees RO=1 is permitted — but not required — to reorder that packet ahead of earlier posted writes. This makes it an advisory hint rather than a command.
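Extracting the ordering-related bits from a captured header is a common first step in a trace checker. This Python sketch assumes the non-Flit-Mode header, with DW0 held as a 32-bit integer whose bits 31:24 are header byte 0; `decode_attr` is a hypothetical helper.

```python
def decode_attr(dw0):
    """Ordering-related fields from DW0 of a non-Flit-Mode TLP header.

    Assumes dw0 is a 32-bit int with header byte 0 in bits 31:24.
    """
    return {
        "tc":  (dw0 >> 20) & 0x7,         # Traffic Class: byte 1, bits 6:4
        "ido": bool((dw0 >> 18) & 1),     # Attr[2], ID-Based Ordering: byte 1, bit 2
        "ro":  bool((dw0 >> 13) & 1),     # Attr[1], Relaxed Ordering: byte 2, bit 5
        "ns":  bool((dw0 >> 12) & 1),     # Attr[0], No Snoop: byte 2, bit 4
    }

# 3DW MWr (Fmt/Type byte 0x40), Length = 1 DW, RO set
dw0 = 0x40000001 | (1 << 13)
print(decode_attr(dw0))  # {'tc': 0, 'ido': False, 'ro': True, 'ns': False}
```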

Where RO helps

Where RO is unsafe

Never set RO on a flag write that follows a data write. The flag write depends on the data write having arrived. Marking the flag write as RO allows it to bypass the data write — reintroducing the exact Producer/Consumer corruption the ordering rules exist to prevent. RO is only safe when software can genuinely guarantee the marked packet has zero dependency on anything that came before it.

TLP with RO=1 | Can pass…
---|---
Posted Write | Earlier Posted Writes and Messages
Message | Earlier Posted Writes and Messages
Read Completion (CplD) | Earlier Posted Writes and Messages
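Whether a given RO marking is safe can be checked exhaustively for a short queue. This hypothetical Python sketch enumerates every reordering of a Posted-only queue that the RO rule would permit (a later TLP may end up ahead of an earlier one only if the later TLP has RO set) and confirms that a flag write without RO always stays last.

```python
from itertools import permutations

def legal(order, ro):
    """True if `order` (a permutation of arrival indices) is reachable under the
    Posted-vs-Posted rule: TLP j may end up ahead of earlier TLP i only if ro[j]."""
    pos = {tlp: k for k, tlp in enumerate(order)}
    n = len(ro)
    return all(pos[i] < pos[j] or ro[j] for i in range(n) for j in range(i + 1, n))

def flag_always_last(ro):
    """The last arrival is the flag write; check every legal order keeps it last."""
    n = len(ro)
    return all(order[-1] == n - 1
               for order in permutations(range(n)) if legal(order, ro))

print(flag_always_last([True, True, True, False]))  # True: data RO, flag not RO
print(flag_always_last([False, False, True]))       # False: RO on the flag is unsafe
```

The first case shows that RO on the data writes is harmless for this pattern: what matters is the RO bit on the TLP that would do the passing.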

📋 ID-Based Ordering (IDO)

ID-Based Ordering (IDO, added in PCIe 2.1) is a performance enhancement based on a simple observation: packets from different Requester IDs almost certainly have no ordering relationship with each other. A write from Device A and a write from Device B are almost always independent — they come from different software contexts, targeting different memory regions.

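IDO allows a TLP that has the IDO attribute set to pass an earlier Posted TLP that would normally block it, as long as the two come from different sources: different Requester IDs, or, for a Completion, a Completer ID that differs from the Posted request's Requester ID. It effectively says: "treat each device's traffic as its own independent ordered stream; don't let one device's blockage stall another device's unrelated traffic."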

[Figure 7: IDO, blockage from one device does not spread to other devices. Without IDO: MWr Dev-A (STUCK) blocks MWr Dev-B and CplD for Dev-C, even though neither depends on Dev-A. With IDO: MWr Dev-A (STUCK); MWr Dev-B passes; CplD for Dev-C passes. Different Requester IDs mean independent streams, so Dev-A only blocks Dev-A.]
Figure 7 — IDO effect. Without IDO, a stuck MWr from Device A blocks all subsequent packets regardless of source. With IDO enabled, packets from Device B and Device C are recognised as independent streams and bypass the stuck Dev-A packet freely.
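The Figure 7 behaviour can be sketched as a queue scan in which a TLP may leave early only if every TLP ahead of it comes from a different Requester ID. This Python model is hypothetical: it assumes Posted TLPs only, IDO set on every TLP when `ido_enabled` is true, and an abstract set of TLPs that are stuck for reasons outside the model.

```python
def first_sendable(queue, stuck, ido_enabled):
    """Name of the first Posted TLP allowed to leave, or None.

    queue: list of (requester_id, name) in arrival order.
    stuck: names that cannot leave yet, for reasons outside this model.
    Without IDO only the head may leave; with IDO a TLP may pass earlier TLPs
    as long as none of them shares its Requester ID.
    """
    for i, (rid, name) in enumerate(queue):
        if name in stuck:
            continue
        earlier_ids = {r for r, _ in queue[:i]}
        if i == 0 or (ido_enabled and rid not in earlier_ids):
            return name
    return None

q = [("A", "MWr A1"), ("B", "MWr B1"), ("A", "MWr A2")]
print(first_sendable(q, {"MWr A1"}, ido_enabled=False))  # None: everything waits
print(first_sendable(q, {"MWr A1"}, ido_enabled=True))   # MWr B1
```

MWr A2 never leaves ahead of MWr A1 in either mode, so each device's own stream keeps its order.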

Enabling and using IDO safely

📋 Ordering and Virtual Channels

Every Virtual Channel buffer is split into three independent sub-buffers: Posted (P), Non-Posted (NP), and Completion (CPL). Each sub-buffer has its own flow-control credit pool. This physical separation is what makes the mandatory "Yes" rules implementable: Posted and Completion TLPs can always drain past a stalled NP queue because the queues are different hardware structures with independent credits.
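The independence of the three credit pools can be sketched as a small class. This is a hypothetical Python model, assuming header credits count one per TLP, data credits count 16-byte units, and finite values for every pool (the infinite-credit case is not modelled); `CreditPools` is an illustrative name.

```python
import math

class CreditPools:
    """Independent P / NP / CPL flow-control credits for one VC (hypothetical model).

    Header credits: one per TLP. Data credits: one per 16 bytes of payload.
    """

    def __init__(self, hdr, data):
        self.hdr = dict(hdr)
        self.data = dict(data)

    @staticmethod
    def data_needed(payload_bytes):
        return math.ceil(payload_bytes / 16)

    def can_send(self, cat, payload_bytes=0):
        return self.hdr[cat] >= 1 and self.data[cat] >= self.data_needed(payload_bytes)

    def consume(self, cat, payload_bytes=0):
        if not self.can_send(cat, payload_bytes):
            raise RuntimeError(f"no {cat} credits")
        self.hdr[cat] -= 1
        self.data[cat] -= self.data_needed(payload_bytes)

pools = CreditPools(hdr={"P": 2, "NP": 0, "CPL": 2}, data={"P": 8, "NP": 0, "CPL": 8})
print(pools.can_send("NP"))        # False: NP pool exhausted
print(pools.can_send("P", 64))     # True: Posted pool is unaffected
pools.consume("CPL", 128)          # a 128-byte CplD uses 8 CPL data credits
print(pools.data["CPL"])           # 0
```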

Ordering rules apply strictly within a single VC. Two TLPs travelling in different VCs have no ordering relationship whatsoever — a TC 7 packet in VC 1 can freely overtake a TC 0 packet in VC 0 without any restriction.

When multiple Traffic Classes are mapped to the same VC (the common case — most systems use only VC 0 for all TC values), the implementation may choose to apply ordering rules across all traffic within that VC for simplicity. The rules only require enforcement within a single TC; applying them across a full VC is a valid superset.

Ordering in Gen 6

The transaction ordering rules carry over unchanged to Gen 6. Flit Mode packs TLPs into fixed-size flits protected by FEC and CRC, and it also introduces a new TLP header layout, but none of that changes which TLP may pass which.

No change to the ordering rules in Gen 6. Every rule in this post applies identically to a Gen 6 link. If you are writing switch RTL or a PCIe controller, the pass/no-pass decisions are the same for Gen 1 through Gen 6; only the parsing of the header fields that carry RO, IDO and the TC changes with the Flit Mode header format.

📋 Quick Reference

Rule | Cell | Plain-English Meaning
---|---|---
Posted must not pass Posted | No: hard | Data writes must arrive before flag writes. The foundation of Producer/Consumer correctness. The only exceptions are opt-in: the later write has RO set, or has IDO set and comes from a different Requester ID.
Posted must pass Non-Posted Read | Yes: mandatory | A write must be able to bypass a stuck read request. Required to break circular deadlocks. Cannot be blocked.
Posted must pass Non-Posted Write | Yes: mandatory | Same deadlock reason as above, for IOWr/CfgWr/AtomicOp stuck in the NP queue.
Posted may pass Completion | Yes/No | Implementation choice, except inside a PCIe-to-PCI/PCI-X bridge in the PCIe-to-PCI direction, where it must pass.
Non-Posted must not pass Posted | No: hard | A read must not bypass a write that preceded it; that would break write-before-read ordering. IDO with different Requester IDs relaxes it.
Non-Posted may pass Non-Posted | Yes/No | Weak ordering: NP requests may reorder. A requester that needs order waits for the first completion.
Non-Posted may pass Completion | Yes/No | Implementation choice.
Completion must not pass Posted | No: hard | A completion must not overtake writes that preceded it in the same direction. RO (or IDO with different IDs) relaxes it, letting read completions bypass write queues.
Completion must pass Non-Posted Read | Yes: mandatory | A completion must bypass a stuck read request. Mandatory deadlock prevention.
Completion must pass Non-Posted Write | Yes: mandatory | Same as above, for IOWr/CfgWr/AtomicOp.
Completion may pass Completion (different Transaction IDs) | Yes/No | Two completions that answer different requests may reorder.
Completion must not pass Completion (same Transaction ID) | No: hard | Split completions for the same read must arrive in ascending address order. Out-of-order assembly = corruption.
Relaxed Ordering (RO) | Attr[1], DW0 bit 13 (non-Flit header) | Requester declares no dependency on prior writes (when enabled in Device Control). A switch may (not must) let the TLP pass earlier posted writes. Never safe for flag writes.
ID-Based Ordering (IDO) | Attr[2], DW0 bit 18 (non-Flit header) | A TLP with IDO set may pass earlier Posted TLPs from a different Requester ID. Safe only when devices share no state through common memory. Enabled in Device Control 2.
Ordering scope | Within TC/VC | Different TCs have no ordering relationship. Different VCs have no ordering relationship.
Gen 6 impact | None on the rules | All rules, RO, IDO, and P/NP/CPL buffer separation are identical in Gen 6. Flit Mode changes framing and header layout, not ordering.
