In Lecture 20, we explored how the Relaxed Ordering (RO) attribute allows software to flag specific packets to bypass blocked traffic. But what happens when a traffic jam involves packets originating from completely different, unrelated devices?
In Lecture 21, we explore a newer PCIe performance optimization: ID-Based Ordering (IDO). We will see how IDO prevents unnecessary transaction stalls by looking at who sent each packet, which lets the fabric safely reorder packets that come from different Requesters.
Part 1: The Problem with Unrelated Traffic
In our previous discussions on strict ordering, we learned that if a transaction is blocked (for instance, a posted write delayed due to a lack of Flow Control credits at a Root port), every subsequent transaction must wait in line behind it.
However, this strict rule ignores the nature of traffic streams. To use the PCIe specification’s terminology, packets coming from the exact same Requester are called a TLP stream. It is highly unlikely that packets from completely different requesters (different TLP streams) have any logical dependencies on one another.
Without IDO, if “Device A” sends a transaction that gets blocked, and “Device B” sends a completely unrelated transaction a microsecond later, Device B’s packet is unnecessarily delayed simply because it must stay in order with Device A’s packet.
Part 2: The ID-Based Ordering Solution
The solution to this performance bottleneck is straightforward: allow a packet to pass another packet if the two do not share the same Requester ID (or, for Completion packets, the same Completer ID).
When ID-Based Ordering is enabled, a switch port can look at the incoming packets, recognize from their IDs that they belong to different TLP streams, and let a packet from an unblocked stream bypass stalled packets from another stream. Packets within the same stream still stay in order relative to each other. By treating traffic from different sources independently, the fabric maintains high throughput and prevents one device’s bottleneck from paralyzing the entire system.
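To make the rule concrete, here is a minimal Python sketch of the decision a port might make for posted writes queued behind a stalled packet. The `Tlp` class and the functions `may_pass_with_ido` and `forwardable_while_stalled` are hypothetical names invented for this illustration, not anything defined by the specification. The model assumes the bypassing packet is the one that must carry the IDO attribute, and it deliberately ignores Relaxed Ordering and every other entry in the ordering rules.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Tlp:
    """Illustrative stand-in for a posted write (hypothetical model, not a real header)."""
    requester_id: int      # 16-bit Bus/Device/Function of the sender
    ido: bool = False      # True if the IDO attribute is set in this packet
    label: str = ""


def may_pass_with_ido(later: Tlp, stalled: Tlp) -> bool:
    """IDO rule only: `later` may pass `stalled` if it carries IDO
    and the two packets come from different Requesters."""
    return later.ido and later.requester_id != stalled.requester_id


def forwardable_while_stalled(queue: List[Tlp]) -> List[Tlp]:
    """Assume queue[0] is blocked (for example, no Flow Control credits).
    Return the later packets that may be forwarded anyway."""
    if not queue:
        return []
    held = [queue[0]]          # packets that must keep waiting, in order
    forwarded = []
    for tlp in queue[1:]:
        # A packet may go only if it can pass every packet held ahead of it.
        if all(may_pass_with_ido(tlp, h) for h in held):
            forwarded.append(tlp)
        else:
            held.append(tlp)
    return forwarded
```

Calling `forwardable_while_stalled` with a stalled write from Device A followed by an IDO-marked write from Device B returns only Device B's write. A second write from Device A stays held even if it has IDO set, because it belongs to the same TLP stream as the stalled packet.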
Part 3: How IDO is Controlled
Just like Relaxed Ordering, ID-Based Ordering requires coordination between software and hardware:
- Software Enablement: Software permits the use of IDO for Requests or Completions coming from a given port by setting the corresponding enable bits (IDO Request Enable and IDO Completion Enable) in the device’s Device Control 2 Register.
- The IDO Attribute Bit: Setting the enable bits only permits IDO; it does not force it onto every packet. The device still decides, packet by packet, whether to use the feature, and it marks each packet that does by setting a dedicated attribute bit, Attr[2], in the TLP header.
- Independent Completions: Interestingly, Completers can enable and use IDO independently. This means that a returning Completion packet might use IDO to bypass traffic, even if the original Request that initiated it did not have IDO enabled.
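The control points above can be sketched in a few lines of Python. The bit positions used here (bits 8 and 9 of Device Control 2 for the two enables, and bit 2 of header byte 1 for the IDO attribute) reflect our reading of the register and header layouts and should be checked against the specification revision you are targeting. The helper names `enable_ido`, `tlp_uses_ido`, and `set_ido_if_permitted` are hypothetical and operate on plain integers and byte strings rather than real hardware.

```python
# Assumed bit positions; verify against the PCIe spec revision in use.
IDO_REQUEST_ENABLE = 1 << 8      # Device Control 2, IDO Request Enable
IDO_COMPLETION_ENABLE = 1 << 9   # Device Control 2, IDO Completion Enable
ATTR2_IDO_MASK = 0x04            # TLP header byte 1, bit 2 (Attr[2])


def enable_ido(devctl2: int, requests: bool = True, completions: bool = True) -> int:
    """Return a new 16-bit Device Control 2 value with the chosen IDO enables set."""
    if requests:
        devctl2 |= IDO_REQUEST_ENABLE
    if completions:
        devctl2 |= IDO_COMPLETION_ENABLE
    return devctl2 & 0xFFFF


def tlp_uses_ido(header: bytes) -> bool:
    """Check whether the IDO attribute bit is set in a raw TLP header."""
    return bool(header[1] & ATTR2_IDO_MASK)


def set_ido_if_permitted(header: bytes, devctl2: int, is_completion: bool) -> bytes:
    """Set the IDO attribute only when the matching enable bit allows it.
    Requests and Completions are gated by separate enable bits."""
    enable = IDO_COMPLETION_ENABLE if is_completion else IDO_REQUEST_ENABLE
    if not devctl2 & enable:
        return header                # not permitted: leave the packet unchanged
    out = bytearray(header)
    out[1] |= ATTR2_IDO_MASK
    return bytes(out)
```

Because the two enables are independent, a device configured with only IDO Completion Enable can return Completions marked with IDO while its Requests remain strictly ordered, which mirrors the independence described in the last bullet.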
Part 4: When is it Safe to use IDO?
The PCIe specification strongly recommends using IDO (and Relaxed Ordering) whenever it is safe to do so. However, it is not a blanket solution for every scenario:
- When it is Safe: It is generally safe for an Endpoint to use IDO for all its TLPs if it communicates directly with only one other entity, such as the Root Complex.
- When it is Unsafe: IDO becomes unsafe if an Endpoint communicates with multiple agents and relies on ordering between them. For example, suppose a device performs a DMA write to system memory and then immediately performs a peer-to-peer write to set a flag in a second device. If the second device sees the flag and issues its own DMA write, the two writes to memory have different Requester IDs. IDO might therefore allow the second device’s write to arrive at memory before the first device’s write, breaking the intended sequence.
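The unsafe case can be replayed with a small Python sketch. It assumes a simplified model in which the first device's DMA write is momentarily stalled at an upstream port when the second device's write arrives behind it. The function `memory_arrival_order` and the tuple layout are hypothetical, chosen only for this illustration.

```python
from typing import List, Tuple

# (requester name, description, IDO attribute set?)
Packet = Tuple[str, str, bool]


def memory_arrival_order(upstream_queue: List[Packet]) -> List[str]:
    """Toy model: the head of the queue is stalled. A later packet bypasses
    only if it has IDO set and no held packet shares its requester."""
    bypassed: List[Packet] = []
    held: List[Packet] = []
    for pkt in upstream_queue:
        requester, _, ido = pkt
        different_stream = all(r != requester for r, _, _ in held)
        if held and ido and different_stream:
            bypassed.append(pkt)     # reaches memory ahead of the stalled traffic
        else:
            held.append(pkt)         # waits in order (the first packet always lands here)
    return [desc for _, desc, _ in bypassed + held]


# Device A's data write is stalled; Device B reacts to A's peer-to-peer flag.
with_ido = memory_arrival_order([
    ("A", "A: data write", False),
    ("B", "B: follow-up write", True),
])
without_ido = memory_arrival_order([
    ("A", "A: data write", False),
    ("B", "B: follow-up write", False),
])
```

In this model `with_ido` lists Device B's follow-up write first, which is exactly the broken sequence described above, while `without_ido` preserves the intended order. The fabric has no way to know that the second write logically depends on the first, so the device that introduces the dependency is the one that has to avoid IDO here.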
