How PCIe prevents a fast transmitter from overwhelming a slow receiver — the six credit types (PH, PD, NPH, NPD, CPLH, CPLD), credit unit sizes, FC DLLP format, the two-phase FC_INIT handshake, infinite credits and why completions need them, UpdateFC credit return, and how Gen 6 handles flow control inside flits.
PCIe is a packet-switched, full-duplex link. At Gen 6 speeds a x16 link can deliver ~122 GB/s in each direction simultaneously. A device at one end of the link has finite receive buffers. Without flow control, a transmitter sending TLPs faster than the receiver can process and consume them would overflow those buffers, and TLPs would be lost.
Conventional PCI handled this with a very different mechanism: target-initiated retry. A busy target asserted STOP# without TRDY# to signal a Retry, and the initiator tried the transaction again later. On a shared bus this is cheap, because the bus is stalled anyway. A PCIe point-to-point link has no such retry signal, and dropping a TLP and retransmitting it would cost a full round-trip latency and waste link bandwidth.
PCIe’s answer is credit-based flow control: the receiver tells the transmitter exactly how much buffer space it has available, and the transmitter never sends more than the receiver has space to hold. No overflow, no drops, no retries: just a clean credit accounting system implemented jointly by the Transaction Layer (which tracks credits and gates TLPs) and the Data Link Layer (which carries credit information in FC DLLPs).
A credit is a unit of receive buffer space. The receiver counts its available buffer space in credits and advertises that count to the transmitter. The transmitter maintains a running credit balance. Before sending any TLP, it checks whether it has enough credits for that TLP. If yes, it deducts credits and sends the TLP. If no, it waits.
Credits are not consumed permanently. When the receiver’s Transaction Layer processes a TLP and frees the buffer space that held it, the receiver returns those credits to the transmitter via an UpdateFC DLLP. The transmitter adds them back to its balance. This circulation continues indefinitely during normal operation.
Flow control credits are tracked independently for six buffer types. The three TLP categories (Posted, Non-Posted, Completion) each have separate header and data credit pools, giving six credit types in total. This separation is what makes the mandatory ordering rules implementable: Posted Requests and Completions must be able to pass a blocked Non-Posted Request to avoid deadlock, and that is only possible because they draw on credit pools of their own.
| Credit type | Short name | What it covers |
|---|---|---|
| Posted Header | PH | One credit per MWr, Msg, or MsgD header regardless of payload size. Covers the header only. |
| Posted Data | PD | Covers the data payload of MWr, MsgD. Separate from PH — a large MWr uses one PH credit but many PD credits. |
| Non-Posted Header | NPH | One credit per MRd, IORd, IOWr, CfgRd, CfgWr or AtomicOp request header. |
| Non-Posted Data | NPD | Covers the payload of IOWr and CfgWr requests and the operands of AtomicOp requests (reads have no payload, so they consume only NPH). |
| Completion Header | CPLH | One credit per Cpl or CplD header. |
| Completion Data | CPLD | Covers the data payload of CplD. A large split read may return many CplD TLPs, each consuming one CPLH credit plus CPLD credits for its payload. |
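A credit is not a byte. It is a fixed-size unit chosen to balance accounting precision against counter width, and header and data credits use different units. One header credit (PH, NPH, CPLH) holds one TLP header, sized for the largest header plus TLP Digest (5 DW for requests, 4 DW for completions). One data credit (PD, NPD, CPLD) covers 4 DW = 16 bytes of payload.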
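For example, a Memory Write (MWr) with a 256-byte payload consumes 1 PH credit and 256 / 16 = 16 PD credits, so it can be sent only if the receiver has advertised at least that much space in both pools.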
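The arithmetic behind that example can be sketched in Python. The `credit_cost` helper and the `POOLS` mapping are illustrative names introduced here, not taken from the specification; only a handful of TLP kinds are listed, and payload alignment details are ignored.

```python
import math

# Data credits (PD, NPD, CPLD) are counted in 4 DW = 16-byte units.
DATA_CREDIT_BYTES = 16

# Illustrative subset: TLP kind -> (header credit pool, data credit pool).
POOLS = {
    "MWr": ("PH", "PD"), "Msg": ("PH", "PD"), "MsgD": ("PH", "PD"),
    "MRd": ("NPH", "NPD"), "IORd": ("NPH", "NPD"), "IOWr": ("NPH", "NPD"),
    "CfgRd": ("NPH", "NPD"), "CfgWr": ("NPH", "NPD"),
    "Cpl": ("CPLH", "CPLD"), "CplD": ("CPLH", "CPLD"),
}


def credit_cost(kind: str, payload_bytes: int = 0) -> dict:
    """Credits one TLP consumes: 1 header credit + ceil(payload / 16) data credits."""
    header_pool, data_pool = POOLS[kind]
    cost = {header_pool: 1}
    if payload_bytes > 0:
        cost[data_pool] = math.ceil(payload_bytes / DATA_CREDIT_BYTES)
    return cost


print(credit_cost("MWr", 256))   # {'PH': 1, 'PD': 16}
print(credit_cost("MRd"))        # {'NPH': 1}
print(credit_cost("CplD", 100))  # {'CPLH': 1, 'CPLD': 7}
```

A partial last unit still costs a whole credit, which is why the 100-byte completion needs 7 CPLD credits rather than 6.25.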
Before the transmitter's Transaction Layer hands any TLP to the Data Link Layer, it performs a credit check. The check is a simple comparison of two counters per credit type:
A TLP may be sent if and only if: CREDITS_LIMIT − CREDITS_CONSUMED ≥ credits_needed_for_this_TLP
This check is performed independently for each credit type. A TLP can only be sent when all relevant credit pools (header + data, for the appropriate TLP category) have sufficient space. If either pool is exhausted, the transmitter must wait — the TLP sits in the Transaction Layer’s egress queue until UpdateFC returns enough credits.
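A minimal sketch of that gating for one TLP category is shown below. `PostedQueue` and its methods are hypothetical names; the sketch uses plain integers (advertised minus consumed) instead of the wrapping counters described later, so that the "every pool must have room" rule stays visible.

```python
from collections import deque


class PostedQueue:
    """Sketch of FC gating for one TLP category (Posted): a FIFO in front of PH and PD pools.

    `available` uses plain integers (advertised minus consumed); the wrap-around
    counters a real transmitter uses are left out to keep the gating logic visible.
    """

    def __init__(self, ph: int, pd: int):
        self.available = {"PH": ph, "PD": pd}
        self.queue = deque()  # (name, cost) pairs waiting to be sent, oldest first

    def submit(self, name: str, cost: dict) -> None:
        self.queue.append((name, cost))

    def drain(self) -> list:
        """Send TLPs from the head of the queue while every pool they need has room."""
        sent = []
        while self.queue:
            name, cost = self.queue[0]
            if any(self.available[pool] < n for pool, n in cost.items()):
                break  # head TLP must wait; it stays queued until credits return
            for pool, n in cost.items():
                self.available[pool] -= n
            self.queue.popleft()
            sent.append(name)
        return sent

    def credits_returned(self, pool: str, n: int) -> None:
        """Net effect of an UpdateFC in this simplified model."""
        self.available[pool] += n


q = PostedQueue(ph=4, pd=20)
q.submit("w1", {"PH": 1, "PD": 16})
q.submit("w2", {"PH": 1, "PD": 16})
print(q.drain())  # ['w1']  (w2 blocked: PH has room, PD does not)
q.credits_returned("PD", 16)
print(q.drain())  # ['w2']
```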
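All Flow Control DLLPs (InitFC1, InitFC2, and UpdateFC) use the same format: 4 bytes of content (a type byte carrying the VC ID, an 8-bit HdrFC field and a 12-bit DataFC field) followed by a 16-bit CRC. That is 6 bytes in total, or 8 bytes on the wire once non-Flit Mode framing is added. Only the type byte changes.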
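The field packing can be illustrated with a small encoder and decoder. This is a sketch under an assumed bit layout (stated in the comments); `pack_fc` and `unpack_fc` are hypothetical helpers, the CRC is not computed, and the few leftover bits in bytes 1 and 2 are simply left at zero.

```python
# Assumed layout of the 4 bytes that precede the 16-bit CRC in a non-Flit Mode FC DLLP:
#   byte 0: [7:4] FC type code, [3] 0, [2:0] VC ID
#   byte 1: [7:6] not modelled (left 0), [5:0] HdrFC[7:2]
#   byte 2: [7:6] HdrFC[1:0], [5:4] not modelled (left 0), [3:0] DataFC[11:8]
#   byte 3: DataFC[7:0]
# Viewed as one big-endian 32-bit word, HdrFC sits in bits [21:14] and DataFC in [11:0].


def pack_fc(type_code: int, vc: int, hdr_fc: int, data_fc: int) -> bytes:
    if not (0 <= type_code <= 0xF and 0 <= vc <= 7):
        raise ValueError("type code is 4 bits, VC ID is 3 bits")
    if not (0 <= hdr_fc <= 0xFF and 0 <= data_fc <= 0xFFF):
        raise ValueError("HdrFC is 8 bits, DataFC is 12 bits")
    word = (type_code << 28) | (vc << 24) | (hdr_fc << 14) | data_fc
    return word.to_bytes(4, "big")


def unpack_fc(body: bytes) -> tuple:
    word = int.from_bytes(body[:4], "big")
    return (word >> 28) & 0xF, (word >> 24) & 0x7, (word >> 14) & 0xFF, word & 0xFFF


body = pack_fc(0x8, 0, hdr_fc=42, data_fc=100)  # UpdateFC-P, VC0
print(body.hex())       # 800a8064
print(unpack_fc(body))  # (8, 0, 42, 100)
```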
No TLPs may be sent on a link until flow control initialisation is complete. This process happens automatically in hardware immediately after Physical Layer link training succeeds. It involves a two-phase handshake (FC_INIT1 and FC_INIT2) carried out simultaneously in both directions.
The two-phase design handles the case where the two devices finish phase 1 at different times. In FC_INIT1, a device transmits InitFC1 DLLPs and records the neighbour's credit values from any InitFC1 or InitFC2 DLLP it receives; once it has values for P, NP and Cpl, it moves to FC_INIT2. In FC_INIT2, it transmits InitFC2 DLLPs carrying the same values, ignores the values in any InitFC DLLPs it receives, and enters DL_Active as soon as it receives an InitFC2 (an UpdateFC or a TLP also counts). If Device A finishes phase 1 before Device B, A's InitFC2 DLLPs still deliver A's credit values, so B can record them, complete phase 1, and start sending InitFC2. That InitFC2 tells A that B has its values, and both sides converge to DL_Active.
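The handshake can be simulated with a small behavioural model. This is a sketch, not a hardware description: `FcInitVc` and `run_handshake` are hypothetical names, the link is modelled as lock-step rounds for a single VC, Device B is assumed to miss everything A sends in the first round, and only InitFC2 (not UpdateFC or a TLP) completes phase 2.

```python
from dataclasses import dataclass, field

CLASSES = ("P", "NP", "Cpl")


@dataclass
class FcInitVc:
    """One side of FC initialisation for a single VC (behavioural sketch)."""
    name: str
    advertise: dict                                # class -> (HdrFC, DataFC) this side offers
    state: str = "FC_INIT1"
    recorded: dict = field(default_factory=dict)   # neighbour's values, per class

    def transmit(self) -> list:
        kind = "InitFC1" if self.state == "FC_INIT1" else "InitFC2"
        return [(kind, cls, self.advertise[cls]) for cls in CLASSES]

    def receive(self, dllp) -> None:
        kind, cls, values = dllp
        if self.state == "FC_INIT1":
            # Phase 1: record values from InitFC1 or InitFC2 (both carry the same values).
            self.recorded[cls] = values
            if len(self.recorded) == len(CLASSES):
                self.state = "FC_INIT2"
        elif self.state == "FC_INIT2" and kind == "InitFC2":
            # Phase 2: values are ignored; an InitFC2 shows the neighbour has ours.
            # (UpdateFC or a TLP would also complete this phase; not modelled.)
            self.state = "DL_Active"


def run_handshake(a, b, b_deaf_rounds=1, max_rounds=10):
    """Exchange DLLPs in rounds; B misses everything A sends for the first b_deaf_rounds."""
    for rnd in range(max_rounds):
        from_a, from_b = a.transmit(), b.transmit()
        for dllp in from_b:
            a.receive(dllp)
        if rnd >= b_deaf_rounds:
            for dllp in from_a:
                b.receive(dllp)
        if a.state == b.state == "DL_Active":
            return rnd
    return None


a = FcInitVc("A", {"P": (32, 256), "NP": (16, 2), "Cpl": (0, 0)})
b = FcInitVc("B", {"P": (64, 512), "NP": (8, 2), "Cpl": (0, 0)})
print(run_handshake(a, b))  # 2: A reaches FC_INIT2 in round 0, B in round 1
```

Because B records values from A's InitFC2 DLLPs while still in FC_INIT1, A's head start costs nothing but one extra round.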
A value of 0 in the HdrFC or DataFC field (0x00 or 0x000) of an InitFC DLLP has a special meaning: infinite credits. A receiver that advertises infinite credits is guaranteeing that its receive buffer for that credit type will never overflow, so the transmitter may send without checking that credit pool.
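On the transmitter side, the infinite encoding can be handled by treating an advertised 0 as "no limit". In this sketch, `credit_limit_from_init` and `pool_has_room` are hypothetical helpers, infinity is modelled as `None`, and the finite comparison ignores counter wrap-around.

```python
def credit_limit_from_init(field_value: int):
    """InitFC field -> credit limit; a value of 0 means infinite, modelled here as None."""
    return None if field_value == 0 else field_value


def pool_has_room(limit, consumed: int, needed: int) -> bool:
    """Infinite pools never block; finite pools use a plain comparison (no wrap-around)."""
    return limit is None or limit - consumed >= needed


cplh = credit_limit_from_init(0x00)   # endpoint-style infinite CPLH
cpld = credit_limit_from_init(0x000)  # endpoint-style infinite CPLD
ph = credit_limit_from_init(32)
print(pool_has_room(cplh, 10_000, 1), pool_has_room(cpld, 10_000, 64))  # True True
print(pool_has_room(ph, 32, 1))                                         # False
```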
| Device type | Infinite completion credits? | Why |
|---|---|---|
| Endpoint (NVMe, GPU, NIC) | Yes: CPLH = 0x00, CPLD = 0x000 | Originates the requests whose completions it receives, so it can reserve buffer space before issuing each request and then accept every resulting completion without stalling the completer. There is no risk of overflow because the endpoint controls how many requests it has outstanding. |
| Root Complex (no peer-to-peer) | Yes: CPLH = 0x00, CPLD = 0x000 | Receives completions only for requests it originated (for example, CPU reads of device registers), so, like an endpoint, it can reserve space up front. Infinite completion credits ensure completions are never blocked, which prevents deadlock. |
| Switch port (upstream or downstream) | No: finite completion credits | Forwards completions between devices and has finite buffers, so it must advertise its actual buffer space to keep its link partner from flooding it. |
| Root Complex (with peer-to-peer) | No — finite completion credits | May need to buffer completions for peer-to-peer transactions between devices. Finite credits prevent its own buffers from overflowing. |
After FC initialisation, the receiver sends UpdateFC DLLPs as its Transaction Layer processes TLPs and frees buffer space. UpdateFC DLLPs use the same byte layout as InitFC DLLPs; only the type code differs. UpdateFC uses 0x80/0x90/0xA0 (P/NP/Cpl), InitFC1 uses 0x40/0x50/0x60 and InitFC2 uses 0xC0/0xD0/0xE0, with the VC ID in the low three bits of the type byte.
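A receiver's Data Link Layer has to classify each FC DLLP from its type byte. The lookup below is a sketch built from the encodings just listed; `decode_fc_type` is a hypothetical helper, and it only knows these nine non-Flit Mode FC encodings.

```python
# Non-Flit Mode FC DLLP type byte: [7:4] type code, [3] = 0, [2:0] = VC ID.
FC_TYPE_CODES = {
    0x4: ("InitFC1", "P"), 0x5: ("InitFC1", "NP"), 0x6: ("InitFC1", "Cpl"),
    0xC: ("InitFC2", "P"), 0xD: ("InitFC2", "NP"), 0xE: ("InitFC2", "Cpl"),
    0x8: ("UpdateFC", "P"), 0x9: ("UpdateFC", "NP"), 0xA: ("UpdateFC", "Cpl"),
}


def decode_fc_type(type_byte: int):
    """Return (dllp_kind, credit_class, vc_id), or None if not one of the encodings above."""
    if type_byte & 0x08:  # bit 3 is 0 in every FC DLLP type listed here
        return None
    entry = FC_TYPE_CODES.get((type_byte >> 4) & 0xF)
    if entry is None:
        return None
    kind, credit_class = entry
    return kind, credit_class, type_byte & 0x7


print(decode_fc_type(0x80))  # ('UpdateFC', 'P', 0)
print(decode_fc_type(0xA3))  # ('UpdateFC', 'Cpl', 3)
print(decode_fc_type(0xC0))  # ('InitFC2', 'P', 0)
```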
UpdateFC carries a cumulative count: the total number of credits the receiver has made available since initialisation (modulo the field width), not just the credits being returned in this DLLP. DLLPs are not retransmitted, so a corrupted UpdateFC is simply discarded. Thanks to the cumulative encoding this leaks no credits, because the next UpdateFC that arrives contains the accumulated total, including everything the lost one would have returned.
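The benefit of the cumulative encoding shows up when one UpdateFC is lost. In this sketch, `Receiver` and `Transmitter` are hypothetical classes tracking a single 8-bit header-credit pool; the receiver keeps a running total of credits granted, and the transmitter overwrites its limit with whatever value arrives.

```python
HDR_MASK = (1 << 8) - 1  # header-credit counters are 8 bits wide


class Receiver:
    def __init__(self, initial: int):
        # Running total of header credits granted: initial advertisement + all freed since.
        self.granted = initial & HDR_MASK

    def free(self, n: int) -> int:
        """Buffer slots released; returns the cumulative value an UpdateFC would carry."""
        self.granted = (self.granted + n) & HDR_MASK
        return self.granted


class Transmitter:
    def __init__(self, initial: int):
        self.limit = initial & HDR_MASK

    def on_update_fc(self, value: int) -> None:
        self.limit = value  # overwrite with the cumulative value; never add


rx, tx = Receiver(8), Transmitter(8)
rx.free(2)                    # the UpdateFC carrying 10 is corrupted and discarded
tx.on_update_fc(rx.free(3))   # the next UpdateFC carries 13
print(tx.limit)               # 13: all 5 freed credits are back despite the lost DLLP
```

Had UpdateFC carried deltas instead, the transmitter would have ended at 8 + 3 = 11 and permanently lost 2 credits.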
The transmitter maintains two counters per credit type, per Virtual Channel. Together they implement the credit check:
| Counter | Width | Meaning | Updated when |
|---|---|---|---|
| CREDITS_LIMIT | 8 bits (header) / 12 bits (data) | Cumulative credit limit advertised by the receiver. Starts at the value from the neighbour's InitFC DLLPs and is overwritten with the cumulative value carried by each UpdateFC DLLP. | InitFC received (set initial value). UpdateFC received (overwrite). |
| CREDITS_CONSUMED | 8 bits (header) / 12 bits (data) | Total credits consumed by all TLPs sent since initialisation, modulo the counter width. It is never decremented; returned credits show up as an increase in CREDITS_LIMIT instead. | Incremented by the credit cost of each TLP sent. |
Available credits at any moment = CREDITS_LIMIT − CREDITS_CONSUMED, computed modulo 2⁸ for header credits and modulo 2¹² for data credits. Because both counters wrap, the result is interpreted as a signed difference: the two values must never be more than half the counter range apart (128 header credits, 2048 data credits), or the counter semantics become ambiguous.
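A wrap-safe version of the credit check can be sketched as follows. The comparison is modelled on the half-range rule described above (the difference, taken modulo 2^N, must land in the lower half of the range); `available` and `may_send` are hypothetical helper names.

```python
def available(limit: int, consumed: int, bits: int) -> int:
    """Credits left, computed modulo 2^bits (8 for header pools, 12 for data pools)."""
    return (limit - consumed) % (1 << bits)


def may_send(limit: int, consumed: int, required: int, bits: int) -> bool:
    """Wrap-safe gate.

    (limit - (consumed + required)) mod 2^bits lands in the lower half of the range
    when enough credits remain, and in the upper half (a "negative" difference) when
    the TLP would overrun the advertised limit.
    """
    modulus = 1 << bits
    return (limit - (consumed + required)) % modulus <= modulus // 2


# Header pool: CREDITS_LIMIT has wrapped to 3 (true value 259); 250 credits consumed.
print(available(3, 250, bits=8))             # 9
print(may_send(3, 250, required=1, bits=8))  # True
print(3 - 250 >= 1)                          # False: naive subtraction gets this wrong
print(may_send(3, 3, required=1, bits=8))    # False: pool exhausted
```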
| Credit type | Minimum advertisement | Maximum advertisement |
|---|---|---|
| PH | 1 credit (covers one 4DW request header + ECRC = 5DW = 20 bytes) | 128 credits (128 × 20 bytes = 2,560 bytes) |
| PD | Credits for the largest Max_Payload_Size the device supports (e.g. 1024-byte MPS = 64 credits) | 2048 credits (2048 × 16 bytes = 32,768 bytes), half the range of the 12-bit DataFC field |
| NPH | 1 credit | 128 credits |
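| NPD | 1 credit (2 credits for AtomicOp-capable devices) | Bounded only by the 12-bit DataFC field (2048 credits); since IOWr and CfgWr carry 1 DW and AtomicOp operands at most 32 bytes (2 credits), no more than about 2 × NPH credits can be in use at once |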
| CPLH | 1 credit (switches and P2P RC) · Infinite (endpoints, standard RC) | 128 credits (switches, P2P RC) · Infinite (endpoints, standard RC) |
| CPLD | Credits for max MPS or max read request size (whichever smaller) · Infinite (endpoints, standard RC) | 2048 credits (switches, P2P RC) · Infinite (endpoints, standard RC) |
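The minimum-advertisement rules in the table reduce to simple divisions by the 16-byte data unit. The functions below are a sketch that follows the rules as the table states them; the names are hypothetical, and `None` stands for an infinite advertisement.

```python
import math

DATA_CREDIT_BYTES = 16


def min_pd(max_payload_size: int) -> int:
    """PD minimum: enough for one TLP at the largest supported Max_Payload_Size."""
    return math.ceil(max_payload_size / DATA_CREDIT_BYTES)


def min_npd(atomicop_capable: bool) -> int:
    """NPD minimum: 1 credit, or 2 for AtomicOp-capable devices."""
    return 2 if atomicop_capable else 1


def min_cpld(max_payload_size: int, max_read_request: int, infinite: bool):
    """CPLD minimum for finite pools; None stands for infinite (DataFC advertised as 0)."""
    if infinite:
        return None
    return math.ceil(min(max_payload_size, max_read_request) / DATA_CREDIT_BYTES)


print(min_pd(1024))                        # 64, the example in the table
print(min_npd(atomicop_capable=True))      # 2
print(min_cpld(256, 512, infinite=False))  # 16
print(min_cpld(256, 512, infinite=True))   # None (endpoint / standard RC)
```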
All six credit types are tracked independently per Virtual Channel. A 6-credit-type × 8-VC system has 48 independent credit pools. Each VC initialises its own flow control when that VC is enabled by software — VC0 initialises automatically (it is always enabled), VC1–VC7 initialise when software enables them.
This per-VC independence is what makes Quality-of-Service guarantees possible. A high-priority video stream in VC7 has its own credit pool, completely separate from the pool used by best-effort DMA traffic in VC0, so congestion in one cannot block the other: even if VC0 PD credits are exhausted by a flood of best-effort writes, VC7 PD credits are unaffected and video TLPs continue to flow. An actual bandwidth guarantee also depends on how the transmitter arbitrates between VCs; separate credits ensure that VC7 is never stuck behind VC0.
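The isolation property can be shown with one counter per (VC, credit type). This sketch models credit isolation only, not VC arbitration; `make_pools` and `try_consume` are hypothetical helpers, and the advertised values are invented for illustration rather than taken from any real device.

```python
from itertools import product

CREDIT_TYPES = ("PH", "PD", "NPH", "NPD", "CPLH", "CPLD")


def make_pools(enabled_vcs, advertised: dict) -> dict:
    """One independent counter per (VC, credit type)."""
    return {(vc, t): advertised[t] for vc, t in product(enabled_vcs, CREDIT_TYPES)}


def try_consume(pools: dict, vc: int, cost: dict) -> bool:
    """Charge a TLP to its own VC's pools only; other VCs are never touched."""
    if any(pools[(vc, t)] < n for t, n in cost.items()):
        return False
    for t, n in cost.items():
        pools[(vc, t)] -= n
    return True


# Finite values chosen for illustration (e.g. a switch port); not from any real device.
advertised = {"PH": 32, "PD": 64, "NPH": 16, "NPD": 2, "CPLH": 32, "CPLD": 256}
pools = make_pools(range(8), advertised)
print(len(pools))  # 48

mwr_256 = {"PH": 1, "PD": 16}
flood = 0
while try_consume(pools, 0, mwr_256):  # best-effort writes exhaust VC0's PD pool
    flood += 1
print(flood, pools[(0, "PD")])         # 4 0
print(try_consume(pools, 7, mwr_256))  # True: VC7 still has its own credits
```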
The core flow control credit model (six types, credit unit sizes, the FC_INIT handshake, infinite completion credits, cumulative UpdateFC return) carries over to Gen 6. Flow control is a Transaction Layer and Data Link Layer mechanism, and what Gen 6 Flit Mode chiefly changes is how TLPs and DLLPs are packaged on the link.
In Gen 1–5 (non-Flit Mode), each FC DLLP is an independently framed packet transmitted between TLPs. In Gen 6 Flit Mode, TLPs are packed back-to-back into fixed 256-byte flits, each made up of 236 bytes of TLP space, 6 bytes of Data Link Layer Payload (DLP), 8 bytes of CRC and 6 bytes of FEC. FC DLLPs travel in the DLP bytes of each flit rather than as separate packets, and the receiver's Data Link Layer extracts them from the flit while passing the TLP bytes up to the Transaction Layer.
The FC fields inside the DLLP (type, VC ID, HdrFC, DataFC) keep the same meaning; only the packaging around them changes. An UpdateFC-P DLLP carrying HdrFC=42, DataFC=100 advertises the same credit limits in Gen 6 as in Gen 1.
In Gen 6, multiple TLPs are packed into one flit, and a TLP may straddle two flits. The credit check still happens per TLP, not per flit: before each TLP is added to the flit stream, the transmitter checks whether sufficient credits exist for that TLP. Flit packing happens below the Transaction Layer, so by the time the TL hands a TLP down for packing, its credit check has already been performed and cleared.
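The order of operations (gate each TLP, then pack the cleared ones) can be sketched as below. `pack` and `chunk_into_flits` are hypothetical helpers; only the 236-byte TLP area of each flit is modelled, the TLP sizes are illustrative rather than exact Flit Mode lengths, and a real transmitter would pad the final partial flit.

```python
FLIT_TLP_BYTES = 236  # TLP bytes in each 256-byte flit (the rest: 6 DLP, 8 CRC, 6 FEC)


def chunk_into_flits(segments):
    """Split an ordered list of (tlp_name, size) into flits of FLIT_TLP_BYTES each.

    A TLP may straddle a flit boundary. The last flit is returned partly filled.
    """
    flits, current, room = [], [], FLIT_TLP_BYTES
    for name, size in segments:
        while size > 0:
            take = min(size, room)
            current.append((name, take))
            size -= take
            room -= take
            if room == 0:
                flits.append(current)
                current, room = [], FLIT_TLP_BYTES
    if current:
        flits.append(current)
    return flits


def pack(tlps, available: dict):
    """Credit-check each TLP individually, then hand the cleared ones to flit packing.

    tlps: list of (name, size_bytes, cost). `available` is updated in place.
    Returns (flits, names_still_waiting).
    """
    cleared = []
    for i, (name, size, cost) in enumerate(tlps):
        if any(available[p] < n for p, n in cost.items()):
            return chunk_into_flits(cleared), [t[0] for t in tlps[i:]]
        for p, n in cost.items():
            available[p] -= n
        cleared.append((name, size))
    return chunk_into_flits(cleared), []


credits = {"PH": 2, "PD": 40}
tlps = [("w1", 272, {"PH": 1, "PD": 16}),  # sizes are illustrative, not exact
        ("w2", 272, {"PH": 1, "PD": 16}),  # Flit Mode TLP lengths
        ("w3", 80, {"PH": 1, "PD": 4})]
flits, waiting = pack(tlps, credits)
print(flits)    # [[('w1', 236)], [('w1', 36), ('w2', 200)], [('w2', 72)]]
print(waiting)  # ['w3']  (no PH credit left, though PD still has 8)
```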
At ~122 GB/s on a Gen 6 x16 link, credit pools drain far faster than at Gen 1 speeds. A PD pool of 128 credits (2,048 bytes) takes roughly 8 µs to drain at Gen 1 x1 (about 250 MB/s after 8b/10b encoding), but only about 17 ns at Gen 6 x16. To avoid credit starvation, a receiver must advertise enough credits to cover the data that can be in flight during one credit round trip (link bandwidth × the time to process a TLP and return an UpdateFC), and it must return UpdateFC DLLPs far more frequently. Gen 6 receivers therefore have to generate UpdateFC at much tighter intervals than Gen 1 designs. The protocol is unchanged; the timing expectations are far more aggressive.
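The numbers above come from a simple bandwidth calculation, sketched below. The helpers are hypothetical, TLP and DLLP overheads are ignored, and the 100 ns credit round-trip time in the last line is an assumed figure for illustration, not a measured or specified value.

```python
import math


def drain_time_ns(pool_bytes: int, gbytes_per_s: float) -> float:
    """Time to transmit pool_bytes of payload; 1 GB/s is 1 byte per ns. Overheads ignored."""
    return pool_bytes / gbytes_per_s


def data_credits_to_cover(gbytes_per_s: float, round_trip_ns: float, unit: int = 16) -> int:
    """Data credits needed to keep the link busy for one credit round trip."""
    return math.ceil(gbytes_per_s * round_trip_ns / unit)


print(drain_time_ns(2048, 0.25))           # 8192.0 ns (~8 us) at Gen 1 x1
print(round(drain_time_ns(2048, 122), 1))  # 16.8 ns at Gen 6 x16
print(data_credits_to_cover(122, 100))     # 763 credits for an assumed 100 ns round trip
```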
| Item | Value / Rule |
|---|---|
| Purpose of flow control | Prevent receiver buffer overflow. Transmitter never sends more than available credits allow. No drops, no retries. |
| Six credit types | PH (Posted Header) · PD (Posted Data) · NPH (Non-Posted Header) · NPD (Non-Posted Data) · CPLH (Completion Header) · CPLD (Completion Data) |
| Header credit unit size | 1 credit = 1 header slot. Request headers = 5 DW (20 bytes). Completion headers = 4 DW (16 bytes). |
| Data credit unit size | 1 credit = 4 DW = 16 bytes. All data credit types (PD, NPD, CPLD) use the same 16-byte unit. |
| Credits per TLP | 1 header credit + ⌈payload_bytes / 16⌉ data credits. For TLPs with no payload (MRd), only 1 header credit. |
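| FC DLLP format | 6 bytes: Type byte (with VC ID) · HdrFC (8-bit) · DataFC (12-bit) · 16-bit CRC; 8 bytes on the wire with non-Flit Mode framing. Shared by InitFC1, InitFC2, UpdateFC. |
| FC_INIT1 | Both devices repeatedly send three InitFC1 DLLPs (P, NP, Cpl, in that order) advertising their receive buffer sizes, and record the neighbour's values from any InitFC1 or InitFC2 received. A device leaves FC_INIT1 once it has recorded P, NP and Cpl values. |
| FC_INIT2 | Each device then repeatedly sends three InitFC2 DLLPs carrying the same values and ignores the values in any InitFC DLLPs it receives. It enters DL_Active on receiving an InitFC2 (or an UpdateFC or TLP) from its neighbour; only then may TLPs flow. |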
| TLPs blocked until… | Both FC_INIT1 and FC_INIT2 phases complete on the link. No TLP may be sent before DL_Active. |
| Infinite credits encoding | InitFC1/2 with HdrFC=0x00 and/or DataFC=0x000 means infinite credits for that type. No UpdateFC is sent for infinite-credit pools. |
| Who advertises infinite CPLH/CPLD | Endpoints and Root Complexes without peer-to-peer support. Required — otherwise deadlock is possible when all completion credits are in use. |
| CREDITS_LIMIT counter | Transmitter counter tracking the receiver's cumulative credit limit. Set from the neighbour's InitFC values, then overwritten by each UpdateFC. |
| CREDITS_CONSUMED counter | Transmitter counter tracking total credits consumed by all TLPs sent since initialisation. Incremented per TLP sent; never decremented. |
| Available credits formula | Available = CREDITS_LIMIT − CREDITS_CONSUMED (mod 2⁸ for header, mod 2¹² for data). TLP sent only if Available ≥ credits_needed. |
| UpdateFC | Receiver sends periodically as it processes TLPs and frees buffer space. Carries cumulative total credit count, not delta. Transmitter updates CREDITS_LIMIT. |
| Minimum PH/NPH | 1 credit minimum advertisement. Ensures at least one request can always be accepted. |
| Minimum PD | Credits for the largest Max_Payload_Size the device supports. Ensures a maximum-size MWr can always be received. |
| Maximum PH/NPH/CPLH | 128 credits (for CPLH, only where finite completion credits are advertised). |
| Maximum PD/CPLD | 2048 credits (for devices with finite credit pools). |
| Per-VC independence | All six credit types tracked independently per Virtual Channel. VC0 always initialised. VC1–7 init when enabled by software. |
| Gen 6 impact | Credit types, unit sizes, InitFC handshake, infinite-credit encoding and FC field meanings carry over. FC DLLPs ride in each flit's DLP bytes instead of being framed separately. Credit check still per-TLP. UpdateFC return must keep pace with link bandwidth: advertised credits must cover bandwidth × credit round-trip time. |