Why order matters in a packet-switched fabric, the three TLP categories, the complete ordering table explained in plain English, the Producer/Consumer model that motivates it all, deadlock prevention, Relaxed Ordering, ID-Based Ordering, and Gen 6.
PCIe is a packet-switched fabric. Every switch port has independent buffers that can stall or drain at different rates. Without ordering rules a small, lightweight packet can slide past a large stalled packet and arrive at the destination first — even though it was sent second. Software that depends on arrival order gets wrong results with no error flag raised anywhere.
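The ordering rules solve three problems at once: they keep the Producer/Consumer pattern correct, they prevent deadlock, and they leave room for performance by letting unrelated traffic reorder.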
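The ordering table groups TLPs into three buckets. Every TLP in the system belongs to exactly one of them: Posted (MWr, Msg, MsgD), Non-Posted (reads such as MRd and IORd, plus non-posted writes such as IOWr and CfgWr), or Completion (Cpl, CplD).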
Most ordering rules exist to protect one specific programming pattern: a Producer writes data to memory and sets a flag, a Consumer polls the flag and reads the data when it is 1. This pattern is everywhere — NIC DMA, GPU command queues, NVMe submission rings.
The table is structured as Row passes Column. Columns represent TLPs already waiting in an egress queue. Rows represent a newly arrived TLP that wants to go out the same port. Each cell answers one question: may the row TLP jump ahead of the column TLP?
Read each row as: “This newly arrived TLP — may it pass a TLP that is already queued?”
| Newly arrived TLP ↓ / Already queued TLP → | Posted (MWr · Msg) | Non-Posted Read (MRd · IORd) | Non-Posted Write (IOWr · CfgWr) | Completion (Cpl · CplD) |
|---|---|---|---|---|
| Posted Write / Message (MWr · Msg · MsgD) | No (core ordering rule) | Yes (deadlock prevention) | Yes (deadlock prevention) | Yes/No (impl. choice) |
| Non-Posted Read (MRd · IORd) | No (write-before-read) | Yes/No | Yes/No | Yes/No |
| Non-Posted Write / with data (IOWr · CfgWr0/1 · AtomicOp) | No (write-before-read) | Yes/No | Yes/No | Yes/No |
| Completion (Cpl · CplD) | Yes/No (Yes if RO set) | Yes (deadlock prevention) | Yes (deadlock prevention) | Yes/No (different IDs: Yes/No; same ID: No) |
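The table can also be read as a lookup structure. The following is a minimal Python sketch of that idea, not a model of any real switch: the names `Kind`, `ORDERING`, and `may_pass` are hypothetical, and the RO and IDO refinements are deliberately left out so that the cells match the base table.

```python
from enum import Enum


class Kind(Enum):
    POSTED = "Posted"
    NP_READ = "Non-Posted Read"
    NP_WRITE = "Non-Posted Write"
    COMPLETION = "Completion"


NO, YES, EITHER = "No", "Yes", "Yes/No"

# ORDERING[arriving][queued]: may the arriving (row) TLP pass the queued (column) TLP?
_NON_POSTED_ROW = {
    Kind.POSTED: NO,          # write-before-read
    Kind.NP_READ: EITHER,
    Kind.NP_WRITE: EITHER,
    Kind.COMPLETION: EITHER,
}
ORDERING = {
    Kind.POSTED: {
        Kind.POSTED: NO,          # core ordering rule
        Kind.NP_READ: YES,        # deadlock prevention
        Kind.NP_WRITE: YES,       # deadlock prevention
        Kind.COMPLETION: EITHER,  # implementation choice
    },
    Kind.NP_READ: dict(_NON_POSTED_ROW),
    Kind.NP_WRITE: dict(_NON_POSTED_ROW),
    Kind.COMPLETION: {
        Kind.POSTED: EITHER,      # "Yes" is explicitly permitted when RO is set
        Kind.NP_READ: YES,        # deadlock prevention
        Kind.NP_WRITE: YES,       # deadlock prevention
        Kind.COMPLETION: EITHER,  # refined below for the same-ID case
    },
}


def may_pass(arriving, queued, same_transaction=False):
    """Return "Yes", "No" or "Yes/No" for the row-passes-column question."""
    both_completions = arriving is Kind.COMPLETION and queued is Kind.COMPLETION
    if both_completions and same_transaction:
        return NO  # split completions of one read must stay in order
    return ORDERING[arriving][queued]


if __name__ == "__main__":
    print(may_pass(Kind.POSTED, Kind.POSTED))
    print(may_pass(Kind.COMPLETION, Kind.NP_READ))
    print(may_pass(Kind.COMPLETION, Kind.COMPLETION, same_transaction=True))
```

Keeping the table as data rather than as nested conditionals makes it easy to count the hard cells or to compare an implementation's choices against the mandatory ones.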
The hard rules in this table are three No cells, four mandatory Yes cells, and the same-ID case of Completion-passes-Completion. Everything else is implementation choice. The sections below explain each group in plain English.
A Posted Write must not pass a Posted Write that arrived earlier.
This single rule protects the Producer/Consumer pattern. The data write (①) and the flag write (②) are both MWr TLPs. Without this rule a lightweight flag write can bypass a heavy data write inside a switch buffer and arrive first. The Consumer then reads Flag=1 before the data has landed — silent data corruption with no error flag raised anywhere in the system.
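The failure mode described above can be reproduced with a toy model. This is a sketch under loud assumptions: the "fabric" here simply lets smaller TLPs overtake larger ones when reordering is allowed, which is a stand-in for a stalled buffer and not how real switches schedule traffic. The names `deliver` and `consumer_view` are hypothetical.

```python
def deliver(tlps, allow_posted_reorder):
    """Return the arrival order of a list of posted writes.

    Assumption for illustration: when reordering is allowed, smaller
    payloads overtake larger ones, mimicking a lightweight packet
    sliding past a heavy stalled one.
    """
    if allow_posted_reorder:
        return sorted(tlps, key=lambda tlp: tlp["bytes"])
    return list(tlps)


def consumer_view(arrivals):
    """Apply writes in arrival order and return what the Consumer reads
    at the moment it first observes flag == 1."""
    memory = {"data": "stale", "flag": 0}
    for tlp in arrivals:
        memory[tlp["addr"]] = tlp["value"]
        if memory["flag"] == 1:
            return memory["data"]
    return None


# The Producer issues the data write first, then the flag write.
sent = [
    {"addr": "data", "value": "fresh", "bytes": 4096},
    {"addr": "flag", "value": 1, "bytes": 4},
]

if __name__ == "__main__":
    print(consumer_view(deliver(sent, allow_posted_reorder=False)))
    print(consumer_view(deliver(sent, allow_posted_reorder=True)))
```

With ordering enforced the Consumer sees the fresh data. With reordering allowed it sees the flag first and reads the stale value, and nothing in the model raises an error.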
There are two opt-in exceptions for advanced use. A Posted Write with Relaxed Ordering (RO) set may pass earlier Posted Writes, and when ID-Based Ordering (IDO) is enabled, two Posted packets from devices with different Requester IDs may be allowed to reorder, because packets from different devices are almost certainly unrelated. Both are explained later in this post.
A Posted Write must be allowed to pass a queued Non-Posted request. This is not optional — it is mandatory to prevent deadlock.
The scenario that demands this rule: a read request (MRd) is stuck at an egress port because the NP buffer at the next hop is full. If an MWr is not allowed to bypass that stuck MRd, the MWr is also stuck. If the MWr carries data that the read’s target needs to return in its completion, and the completion cannot return until the MWr lands — nothing moves. The switch, the requester, and the target all wait on each other forever.
Allowing the MWr to pass the stuck MRd breaks this circle. The data lands. The target reads it. The completion flows back. The MRd resolves. Everything drains.
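One way to reason about the circle described above is as a wait-for graph: a deadlock is a cycle in that graph. The sketch below is a simplified model, not a description of real hardware state. The edge "MRd waits on Completion" is an assumption standing in for "the NP buffer only frees up once the outstanding work completes", and `has_cycle` is a hypothetical helper.

```python
def has_cycle(waits_on):
    """Depth-first search for a cycle in a wait-for graph.

    waits_on maps each node to the nodes it is waiting for.
    """
    visiting, done = set(), set()

    def visit(node):
        if node in done:
            return False
        if node in visiting:
            return True  # reached a node that is still on the current path
        visiting.add(node)
        for nxt in waits_on.get(node, ()):
            if visit(nxt):
                return True
        visiting.remove(node)
        done.add(node)
        return False

    return any(visit(node) for node in waits_on)


# Posted is NOT allowed to pass Non-Posted: the MWr queues behind the MRd.
blocked = {
    "MWr": ["MRd"],         # stuck behind the stalled read
    "MRd": ["Completion"],  # assumed: NP buffer frees only after completion
    "Completion": ["MWr"],  # target needs the written data first
}

# Posted IS allowed to pass Non-Posted: the MWr no longer waits on the MRd.
allowed = dict(blocked, MWr=[])

if __name__ == "__main__":
    print(has_cycle(blocked))
    print(has_cycle(allowed))
```

Removing the single MWr-to-MRd edge is enough to break the cycle, which mirrors why this one cell of the table has to be a mandatory Yes.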
A newly arrived Posted Write may optionally pass a queued Completion going in the same direction. This is a Yes/No cell — both choices are compliant. Neither data corruption nor deadlock results from either decision.
The reverse direction, a Completion passing a queued Posted Write, is governed by a different cell. When Relaxed Ordering (RO) is set on the Completion, passing the queued MWr is specifically permitted. This is the most useful case: a GPU read completion bypassing queued DMA writes to deliver frame data back to the CPU faster.
There is one bridge-specific exception: in a PCIe-to-PCI/PCI-X bridge translating traffic from PCIe into PCI, a Posted Write must be able to pass a Completion or a deadlock can form due to PCI’s legacy delayed-transaction model. For all native PCIe-to-PCIe paths this does not apply.
A read request or non-posted write must never bypass an earlier Posted Write. This enforces write-before-read ordering. If a read bypassed an earlier write and returned data, the data it returned could be the pre-write value — old data, delivered to software as if it were current. No error flag.
This is the read-side mirror of the core Posted rule. Together they ensure that all writes a device has issued are visible at the target before any subsequent read from that same device can return.
Two non-posted requests from different contexts may optionally reorder relative to each other. If an MRd is stalled because the NP buffer at the next hop is full, a subsequent MRd from an unrelated context may be allowed to bypass it. No correctness risk — the two reads target different addresses and have no dependency between them. This is called weak ordering and exists to prevent head-of-line blocking from spreading across unrelated traffic.
A non-posted request may optionally bypass a queued completion. Again, purely implementation choice. The read request and the completion are almost certainly unrelated.
A Completion going toward the original requester may optionally bypass queued Posted Writes heading in the same direction. Without Relaxed Ordering this is an implementation choice. With RO=1 on the completion, switches are specifically permitted to let it pass — improving read latency by not making completions wait behind write queues.
A Completion must always be allowed to pass a queued Non-Posted request. This is the second mandatory rule and it exists for the same reason as the Posted-passes-NP rule: deadlock.
The scenario: a requester holds Non-Posted flow-control credits while waiting for a completion. The completion is stuck behind a queued MRd at an intermediate switch. The MRd is stuck because the NP buffer downstream is full. The NP buffer is full because the requester’s own NP buffer is backed up waiting for… the completion. If the completion cannot bypass the stuck MRd, nothing moves. Allowing it to pass breaks the deadlock.
Two completions with different Transaction IDs (different Requester ID + Tag combinations) may optionally reorder relative to each other. They are delivering data to different waiting contexts, even when both return to the same device. Neither one cares what order the other arrives in.
When a single large read is satisfied by multiple CplD TLPs (a split completion), those partial completions must arrive at the requester in ascending address order. CplD #2 must not arrive before CplD #1. If it did, the requester would assemble the pieces in the wrong order — corrupted data, no error flag.
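The two completion rules above can be checked together with a small validator. This is an illustrative sketch: a completion is modelled as a `(transaction_id, address)` pair, where the transaction ID is a hypothetical `(requester_id, tag)` tuple, and `split_completions_in_order` is an invented helper name.

```python
def split_completions_in_order(arrivals):
    """Check that completions sharing a transaction ID arrive in ascending
    address order. Completions with different IDs may interleave freely."""
    last_address = {}
    for transaction_id, address in arrivals:
        previous = last_address.get(transaction_id)
        if previous is not None and address <= previous:
            return False  # a later piece arrived before an earlier one
        last_address[transaction_id] = address
    return True


read_a = (0x0100, 5)  # (requester_id, tag), hypothetical values
read_b = (0x0200, 1)

# Different IDs interleave, same-ID pieces stay ascending: compliant.
good = [(read_a, 0x1000), (read_b, 0x8000), (read_a, 0x1040)]

# Second piece of read_a arrives before the first: not compliant.
bad = [(read_a, 0x1040), (read_b, 0x8000), (read_a, 0x1000)]

if __name__ == "__main__":
    print(split_completions_in_order(good))
    print(split_completions_in_order(bad))
```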
Four cells in the table are mandatory — not optional performance hints but hard requirements without which the fabric can permanently stall. The two “Posted must pass Non-Posted” entries and the two “Completion must pass Non-Posted” entries all exist to prevent circular waits.
Relaxed Ordering is a single bit in the TLP header (DW0 bit 13, Attr[1]). A device sets it to 1 only when software has allowed it through the Enable Relaxed Ordering bit in the Device Control register, and setting it is a declaration: “I guarantee this packet has no ordering dependency on earlier posted writes. You may let it pass them.”
A switch that sees RO=1 is permitted — but not required — to reorder that packet ahead of earlier posted writes. This makes it an advisory hint rather than a command.
Never set RO on a flag write that follows a data write. The flag write depends on the data write having arrived. Marking the flag write as RO allows it to bypass the data write — reintroducing the exact Producer/Consumer corruption the ordering rules exist to prevent. RO is only safe when software can genuinely guarantee the marked packet has zero dependency on anything that came before it.
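For readers who inspect raw headers, the bit position quoted above can be tested with a few lines of Python. This sketch assumes DW0 is viewed as a 32-bit value with header byte 0 in bits 31:24, which is the numbering that makes RO land on bit 13. The function names and the sample header bytes are made up for illustration.

```python
RO_BIT = 13  # Attr[1] in DW0, using the bit numbering from the text


def dw0_from_header(header: bytes) -> int:
    """Interpret the first four header bytes as DW0, byte 0 most significant."""
    return int.from_bytes(header[:4], "big")


def has_relaxed_ordering(dw0: int) -> bool:
    """True when the Relaxed Ordering attribute bit is set."""
    return bool((dw0 >> RO_BIT) & 1)


def with_relaxed_ordering(dw0: int, enabled: bool) -> int:
    """Return a copy of DW0 with the RO bit forced on or off."""
    mask = 1 << RO_BIT
    return (dw0 | mask) if enabled else (dw0 & ~mask)


# Hypothetical headers: the only difference is bit 5 of byte 2.
ordered_header = bytes([0x40, 0x00, 0x00, 0x01])
relaxed_header = bytes([0x40, 0x00, 0x20, 0x01])

if __name__ == "__main__":
    print(has_relaxed_ordering(dw0_from_header(ordered_header)))
    print(has_relaxed_ordering(dw0_from_header(relaxed_header)))
```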
| TLP with RO=1 | Can pass… |
|---|---|
| Posted Write | Earlier Posted Writes and Messages |
| Message | Earlier Posted Writes and Messages |
| Read Completion (CplD) | Earlier Posted Writes and Messages |
ID-Based Ordering (IDO, added in PCIe 2.1) is a performance enhancement based on a simple observation: packets from different Requester IDs almost certainly have no ordering relationship with each other. A write from Device A and a write from Device B are almost always independent — they come from different software contexts, targeting different memory regions.
IDO allows a switch to reorder two TLPs that would normally be kept in order, as long as they have different Requester IDs. It effectively says: “treat each device’s traffic as its own independent ordered stream — don’t let one device’s blockage stall another device’s unrelated traffic.”
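The IDO decision reduces to a comparison of Requester IDs. The sketch below is one hedged reading of that rule for the Posted-passes-Posted cell only. The `PostedTlp` type and `ido_permits_reorder` function are hypothetical names, and the sketch assumes the IDO attribute is carried on the TLP that wants to pass.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PostedTlp:
    requester_id: int  # Bus/Device/Function of the sender, as one integer
    ido: bool = False  # ID-Based Ordering attribute on this TLP


def ido_permits_reorder(arriving: PostedTlp, queued: PostedTlp) -> bool:
    """May the arriving posted TLP pass the queued one on IDO grounds?

    Permission, not obligation: a switch may still keep them in order.
    """
    return arriving.ido and arriving.requester_id != queued.requester_id


device_a_write = PostedTlp(requester_id=0x0100)
device_a_flag = PostedTlp(requester_id=0x0100, ido=True)
device_b_write = PostedTlp(requester_id=0x0200, ido=True)

if __name__ == "__main__":
    # Different devices: device B's write may overtake device A's.
    print(ido_permits_reorder(device_b_write, device_a_write))
    # Same device: the flag write stays behind the data write.
    print(ido_permits_reorder(device_a_flag, device_a_write))
```

Note that IDO never relaxes ordering within one device's own stream, which is why the Producer/Consumer pattern on a single device survives it.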
Every Virtual Channel buffer is split into three independent sub-buffers: Posted (P), Non-Posted (NP), and Completion (CPL). Each sub-buffer has its own flow-control credit pool. This physical separation is what makes the mandatory “Yes” rules implementable: the P and CPL queues can always drain past a full NP queue because they are different hardware structures with independent credits.
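A rough software analogue of this buffer split is three queues with three separate credit counters. The sketch below makes a strong simplification: one credit per TLP, whereas real flow control accounts for header and data credits separately. `VirtualChannel` and its methods are hypothetical names.

```python
from collections import deque


class VirtualChannel:
    """Toy model: three independent queues, each with its own credit pool."""

    def __init__(self, credits):
        self.queues = {"P": deque(), "NP": deque(), "CPL": deque()}
        self.credits = dict(credits)  # simplification: one credit per TLP

    def enqueue(self, kind, tlp):
        self.queues[kind].append(tlp)

    def drain(self):
        """Send whatever each queue can afford. A stalled queue does not
        block the other two."""
        sent = []
        for kind in ("P", "NP", "CPL"):
            queue = self.queues[kind]
            while queue and self.credits[kind] > 0:
                self.credits[kind] -= 1
                sent.append(queue.popleft())
        return sent


# The next hop has no Non-Posted credits left, but P and CPL credits remain.
vc = VirtualChannel({"P": 2, "NP": 0, "CPL": 1})
vc.enqueue("NP", "MRd")   # arrived first, now stuck
vc.enqueue("P", "MWr")
vc.enqueue("CPL", "CplD")

if __name__ == "__main__":
    print(vc.drain())
    print(list(vc.queues["NP"]))
```

The read stays queued while the write and the completion leave, which is the behaviour the mandatory Yes cells require.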
Ordering rules never cross Virtual Channels. Two TLPs travelling in different VCs have no ordering relationship whatsoever: a TC 7 packet in VC 1 can freely overtake a TC 0 packet in VC 0 without any restriction.
When multiple Traffic Classes are mapped to the same VC (the common case — most systems use only VC 0 for all TC values), the implementation may choose to apply ordering rules across all traffic within that VC for simplicity. The rules only require enforcement within a single TC; applying them across a full VC is a valid superset.
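The scoping described in the last two paragraphs can be summarised as a three-way decision. This is a sketch of that summary only. The `tc_to_vc` mapping is a made-up example configuration, and `ordering_scope` is a hypothetical helper.

```python
def ordering_scope(tc_a, tc_b, tc_to_vc):
    """Classify the ordering relationship between two Traffic Classes.

    "required": same TC, the ordering rules must be enforced.
    "optional": different TCs sharing a VC, an implementation may
                enforce the rules across the whole VC for simplicity.
    "none":     different VCs, no ordering relationship at all.
    """
    if tc_a == tc_b:
        return "required"
    if tc_to_vc[tc_a] == tc_to_vc[tc_b]:
        return "optional"
    return "none"


# Hypothetical mapping: TC 0 and TC 1 share VC 0, TC 7 has VC 1 to itself.
tc_to_vc = {0: 0, 1: 0, 7: 1}

if __name__ == "__main__":
    print(ordering_scope(0, 0, tc_to_vc))
    print(ordering_scope(0, 1, tc_to_vc))
    print(ordering_scope(0, 7, tc_to_vc))
```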
The transaction ordering rules are completely unchanged in Gen 6. They live in the Transaction Layer, while the Gen 6 changes discussed here, flit packing and FEC, sit below the ordering logic and are entirely transparent to it.
| Rule | Cell | Plain-English Meaning |
|---|---|---|
| Posted must not pass Posted | No — hard | Data writes must arrive before flag writes. The foundation of Producer/Consumer correctness. The only exceptions are opt-in: RO set on the later write, or IDO with different Requester IDs. |
| Posted must pass Non-Posted Read | Yes — mandatory | A write must be able to bypass a stuck read request. Required to break circular deadlocks. Cannot be blocked. |
| Posted must pass Non-Posted Write | Yes — mandatory | Same deadlock reason as above, for IOWr/CfgWr stuck in the NP queue. |
| Posted may pass Completion | Yes/No | Implementation choice. No correctness or deadlock risk either way. |
| Non-Posted must not pass Posted | No — hard | A read must not bypass a write that preceded it. Would break write-before-read ordering. |
| Non-Posted may pass Non-Posted | Yes/No | Weak ordering — independent reads from unrelated contexts may reorder. No correctness risk. |
| Non-Posted may pass Completion | Yes/No | Implementation choice. |
| Completion may pass Posted | Yes/No | Yes when Relaxed Ordering bit is set. Enables fast read completion bypass of write queues. |
| Completion must pass Non-Posted Read | Yes — mandatory | A completion must bypass a stuck read request. Mandatory deadlock prevention. |
| Completion must pass Non-Posted Write | Yes — mandatory | Same as above, for IOWr/CfgWr. |
| Completion may pass Completion (diff IDs) | Yes/No | Two completions returning to different requesters may reorder. Each goes to a different context. |
| Completion must not pass Completion (same ID) | No — hard | Split completions for the same read must arrive in ascending address order. Out-of-order assembly = corruption. |
| Relaxed Ordering (RO) | DW0 bit 13 | Driver declares no dependency on prior writes. Switch may (not must) let the TLP pass earlier posted writes. Never safe for flag writes. |
| ID-Based Ordering (IDO) | DW0 bit 18 (Attr[2]) | Packets from different Requester IDs may reorder. Safe only when devices share no state through common memory. Enabled in Device Control 2. |
| Ordering scope | Within TC/VC | Different TCs have no ordering relationship. Different VCs have no ordering relationship. |
| Gen 6 impact | None | All rules, RO, IDO, P/NP/CPL buffer separation — identical in Gen 6. Flit packing is transparent. |