PCIe Flow Control Explained | Credits, Tags & Credit Return

📋 Why Flow Control Exists

PCIe is a packet-switched, full-duplex link. At Gen 6 speeds a x16 link can deliver ~122 GB/s in each direction simultaneously. A device at one end of the link has finite receive buffers — if the transmitter sends TLPs faster than the receiver can process and consume them, the buffers overflow and TLPs are dropped.

PCI solved this with a very different mechanism: a target-initiated retry. A busy target asserted STOP# to end the transaction with a Retry, and the initiator tried again later. On a shared bus this is cheap, because the bus is stalled anyway. A PCIe point-to-point link has no retry signal, so dropping a TLP and retransmitting it costs a full round-trip latency and wastes link bandwidth.

PCIe’s answer is credit-based flow control: the receiver tells the transmitter exactly how much buffer space it has available. The transmitter never sends more than the receiver has space to hold. No overflow, no drops, no retries: just a credit accounting system run by the Transaction Layer, with the Data Link Layer carrying the credit updates as DLLPs.

Figure 1 — Credit-based flow control. The receiver periodically sends FC DLLPs advertising available buffer space in credits. The transmitter deducts credits for each TLP sent and never transmits when the credit count for that TLP type would go negative. As the receiver processes TLPs and frees buffer space, it returns credits via UpdateFC DLLPs.

📋 The Credit Concept

A credit is a unit of receive buffer space. The receiver counts its available buffer space in credits and advertises that count to the transmitter. The transmitter maintains a running credit balance. Before sending any TLP, it checks whether it has enough credits for that TLP. If yes, it deducts credits and sends the TLP. If no, it waits.

Credits are not consumed permanently. When the receiver’s Transaction Layer processes a TLP and frees the buffer space that held it, the receiver returns those credits to the transmitter via an UpdateFC DLLP. The transmitter adds them back to its balance. This circulation continues indefinitely during normal operation.

Figure 2 — Credit lifecycle. ① Advertise: receiver tells transmitter the total buffer space at link startup. ② Consume: transmitter deducts credits per TLP sent. ③ Return: as the receiver processes TLPs and frees buffer space, UpdateFC DLLPs return credits to the transmitter. The cycle repeats continuously.

📋 Six Credit Types

Flow control credits are tracked independently for six buffer types. The three TLP categories (Posted, Non-Posted, Completion) each have separate header and data credit pools, giving six credit types in total. This separation is what allows the mandatory ordering rules (completions must pass non-posted) to be implementable — they use physically separate credit pools.

Figure 3 — Six credit types organised by TLP category (rows) and header vs data (columns). Each cell is an independently managed credit pool with its own DLLP counter. The separation is essential for deadlock prevention — the ordering rules that mandate completions pass non-posted traffic require independent pools.

Credit type	Short name	What it covers
Posted Header	PH	One credit per MWr, Msg, or MsgD header regardless of payload size. Covers the header only.
Posted Data	PD	Covers the data payload of MWr, MsgD. Separate from PH — a large MWr uses one PH credit but many PD credits.
Non-Posted Header	NPH	One credit per MRd, IORd, IOWr, CfgRd, CfgWr header.
Non-Posted Data	NPD	Covers the payload of IOWr and CfgWr requests (reads have no payload so they only consume NPH).
Completion Header	CPLH	One credit per Cpl or CplD header.
Completion Data	CPLD	Covers the data payload of CplD. A large split read may return many CplD TLPs each consuming CPLD credits.

📋 Credit Unit Sizes

A credit is not a byte — it is a fixed-size unit chosen to balance accounting precision against counter width. Different credit types have different unit sizes:

Figure 4 — Credit unit sizes. Header credits cover one header slot (5 DW = 20 bytes for requests, 4 DW = 16 bytes for completions). Data credits are always 4 DW = 16 bytes per credit. A single TLP always consumes one header credit plus as many data credits as its payload requires.

Worked example — credits consumed by one TLP

A Memory Write (MWr) with a 256-byte payload sent to a receiver that has advertised PH and PD credits:

PH credits consumed: 1 (one header, regardless of payload size)
PD credits consumed: 256 bytes ÷ 16 bytes per credit = 16
NPH, NPD, CPLH, CPLD: unchanged — MWr does not use these pools
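
The arithmetic above generalises to any TLP. The following is a minimal Python sketch, not taken from any real driver or RTL; the function name `credits_for_tlp` and the category labels are illustrative assumptions.

```python
import math

DATA_CREDIT_BYTES = 16  # one data credit = 4 DW = 16 bytes

# Header pool and data pool used by each TLP category.
POOLS = {
    "posted": ("PH", "PD"),
    "non_posted": ("NPH", "NPD"),
    "completion": ("CPLH", "CPLD"),
}

def credits_for_tlp(category: str, payload_bytes: int) -> dict:
    """Return the credits one TLP consumes, keyed by pool name."""
    header_pool, data_pool = POOLS[category]
    cost = {header_pool: 1}  # always exactly one header credit
    if payload_bytes > 0:
        # Partial 16-byte units still occupy a whole data credit.
        cost[data_pool] = math.ceil(payload_bytes / DATA_CREDIT_BYTES)
    return cost

print(credits_for_tlp("posted", 256))    # MWr with a 256-byte payload
print(credits_for_tlp("non_posted", 0))  # MRd carries no payload
```

The first call reproduces the worked example (1 PH, 16 PD); the second shows that a read request touches only the NPH pool.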

📋 Transmitter Credit Check

Before the Transaction Layer releases any TLP for transmission, it performs a credit check. The check is a simple comparison of two counters per credit type:

CREDITS_LIMIT: the most recent cumulative credit total advertised by the receiver (set during FC initialisation, then updated by UpdateFC DLLPs)
CREDITS_CONSUMED: the running total of credits charged for every TLP sent since FC initialisation; it only counts up and is never decremented

A TLP may be sent if and only if: CREDITS_LIMIT − CREDITS_CONSUMED ≥ credits_needed_for_this_TLP

This check is performed independently for each credit type. A TLP can only be sent when all relevant credit pools (header + data, for the appropriate TLP category) have sufficient space. If either pool is exhausted, the transmitter must wait — the TLP sits in the Transaction Layer’s egress queue until UpdateFC returns enough credits.

Credit check failure stalls only that TLP category. If the PD (Posted Data) pool is exhausted but NPH and CPLD are fine, the Non-Posted and Completion queues continue to drain. This is why the ordering rules mandate that completions must be able to pass non-posted traffic — their credit pools are independent, so a stalled Posted queue does not prevent completion delivery.
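
The gating rule and the independence of the pools can be sketched in a few lines of Python. This is an illustrative model only: the class names `Pool` and `TxCreditGate` are hypothetical, the advertised values are made up, and counter wrap-around is deliberately ignored to keep the comparison readable.

```python
from dataclasses import dataclass

@dataclass
class Pool:
    limit: int = 0     # CREDITS_LIMIT: latest total advertised by the receiver
    consumed: int = 0  # CREDITS_CONSUMED: running total charged for sent TLPs

    def available(self) -> int:
        return self.limit - self.consumed

class TxCreditGate:
    def __init__(self, advertised: dict):
        self.pools = {name: Pool(limit=value) for name, value in advertised.items()}

    def try_send(self, cost: dict) -> bool:
        """cost maps pool name -> credits needed, e.g. {"PH": 1, "PD": 16}."""
        if any(self.pools[p].available() < n for p, n in cost.items()):
            return False  # at least one required pool is short: the TLP waits
        for p, n in cost.items():
            self.pools[p].consumed += n
        return True

gate = TxCreditGate({"PH": 4, "PD": 16, "NPH": 4, "NPD": 2, "CPLH": 4, "CPLD": 16})
first_write = gate.try_send({"PH": 1, "PD": 16})  # uses up every PD credit
second_write = gate.try_send({"PH": 1, "PD": 1})  # PH is fine, PD is empty
read_request = gate.try_send({"NPH": 1})          # Non-Posted pool is untouched
print(first_write, second_write, read_request)    # True False True
```

The second write stalls even though PH credits remain, while the read request still goes out because it draws only on NPH.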

📋 FC DLLP Format

All Flow Control DLLPs (InitFC1, InitFC2, and UpdateFC) use the same format: a 6-byte DLLP made of 4 bytes of content plus a 16-bit CRC, which occupies 8 symbols on the wire once framing is added. Only the type byte changes.

Figure 5 — FC DLLP format. Byte 0 encodes the type and VC number (lower 3 bits). HdrFC is an 8-bit header credit value. DataFC is a 12-bit data credit value (field widths without Scaled Flow Control). InitFC1, InitFC2, and UpdateFC all use this same layout: they share the same field positions and differ only in the type byte’s upper nibble.
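
To make the field packing concrete, here is a hedged Python sketch that encodes and decodes the 4 content bytes of an FC DLLP. The bit positions (HdrFC[7:2] in byte 1, HdrFC[1:0] in the top of byte 2, DataFC split across bytes 2 and 3) and the InitFC2 type codes reflect my reading of the base specification and should be checked against it. The function names are hypothetical, and the 16-bit CRC is intentionally omitted.

```python
# Upper-nibble type codes; the low 3 bits of byte 0 carry the VC ID.
FC_TYPES = {
    "InitFC1-P": 0x40, "InitFC1-NP": 0x50, "InitFC1-Cpl": 0x60,
    "UpdateFC-P": 0x80, "UpdateFC-NP": 0x90, "UpdateFC-Cpl": 0xA0,
    "InitFC2-P": 0xC0, "InitFC2-NP": 0xD0, "InitFC2-Cpl": 0xE0,
}

def pack_fc_dllp(kind: str, vc: int, hdr_fc: int, data_fc: int) -> bytes:
    """Pack the 4 content bytes of an FC DLLP (CRC not included)."""
    if not (0 <= vc <= 7 and 0 <= hdr_fc <= 0xFF and 0 <= data_fc <= 0xFFF):
        raise ValueError("field out of range")
    byte0 = FC_TYPES[kind] | vc
    byte1 = hdr_fc >> 2                             # HdrFC[7:2] in bits 5:0
    byte2 = ((hdr_fc & 0x3) << 6) | (data_fc >> 8)  # HdrFC[1:0], DataFC[11:8]
    byte3 = data_fc & 0xFF                          # DataFC[7:0]
    return bytes([byte0, byte1, byte2, byte3])

def unpack_fc_dllp(body: bytes):
    """Inverse of pack_fc_dllp: returns (kind, vc, hdr_fc, data_fc)."""
    byte0, byte1, byte2, byte3 = body
    kind = next(k for k, v in FC_TYPES.items() if v == (byte0 & 0xF0))
    vc = byte0 & 0x07
    hdr_fc = ((byte1 & 0x3F) << 2) | (byte2 >> 6)
    data_fc = ((byte2 & 0x0F) << 8) | byte3
    return kind, vc, hdr_fc, data_fc

body = pack_fc_dllp("UpdateFC-P", vc=0, hdr_fc=42, data_fc=100)
print(body.hex())            # 800a8064
print(unpack_fc_dllp(body))  # ('UpdateFC-P', 0, 42, 100)
```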

📋 FC Initialisation — Two Phases

No TLPs may be sent on a link until flow control initialisation is complete. This process happens automatically in hardware immediately after Physical Layer link training succeeds. It involves a two-phase handshake (FC_INIT1 and FC_INIT2) carried out simultaneously in both directions.

Figure 6 — FC initialisation sequence. Phase 1 (InitFC1): both devices continuously send three InitFC1 DLLPs — one for Posted, one for Non-Posted, one for Completions — advertising their receive buffer sizes. Phase 2 (InitFC2): once Phase 1 values are registered, both devices send InitFC2 DLLPs to confirm. After both phases complete, the link enters DL_Active and TLPs may flow.

Why two phases?

The second phase lets each device confirm that its partner has registered the advertised credits before any TLP is sent, even when the two devices finish phase 1 at different times. Suppose Device A records B's credits first: A moves to phase 2 and starts sending InitFC2 DLLPs, which carry the same credit values as its InitFC1 DLLPs. Device B, still in phase 1, records credit values from whichever InitFC1 or InitFC2 DLLPs it receives, so it can complete phase 1 from A's InitFC2 traffic and move to phase 2 itself. A device leaves phase 2 once it receives an InitFC2 DLLP (or an UpdateFC DLLP or a TLP) from its partner, which proves the partner has also finished phase 1. Both sides then converge on DL_Active.
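
The handshake can be modelled as a small state machine. The Python sketch below is a simplification under stated assumptions: a port in FC_INIT1 records credit values from either InitFC1 or InitFC2 DLLPs (my reading of the base specification), exit from FC_INIT2 is triggered only by a received InitFC2 (real hardware also accepts an UpdateFC or a TLP), and `Port`, `run`, and `lost_rounds` are hypothetical names. The `lost_rounds` parameter drops A-to-B traffic for the first few rounds so that A finishes phase 1 before B.

```python
from enum import Enum

class State(Enum):
    FC_INIT1 = 1
    FC_INIT2 = 2
    DONE = 3

KINDS = ("P", "NP", "Cpl")

class Port:
    def __init__(self, advertised):
        self.advertised = advertised  # our receive buffers: kind -> (hdr, data)
        self.peer_credits = {}        # credits recorded from the link partner
        self.state = State.FC_INIT1

    def transmit(self):
        """DLLPs sent this round: one per kind, typed by the current phase."""
        if self.state is State.DONE:
            return []
        phase = "InitFC1" if self.state is State.FC_INIT1 else "InitFC2"
        return [(phase, kind, self.advertised[kind]) for kind in KINDS]

    def receive(self, dllps):
        for phase, kind, credits in dllps:
            if self.state is State.FC_INIT1:
                # Values are recorded from InitFC1 and InitFC2 alike.
                self.peer_credits[kind] = credits
                if len(self.peer_credits) == len(KINDS):
                    self.state = State.FC_INIT2
            elif self.state is State.FC_INIT2 and phase == "InitFC2":
                self.state = State.DONE  # partner has finished phase 1 too

def run(a, b, lost_rounds=0, max_rounds=20):
    """Exchange DLLPs round by round; return the round in which both finish."""
    for rnd in range(max_rounds):
        a_tx, b_tx = a.transmit(), b.transmit()
        b.receive([] if rnd < lost_rounds else a_tx)  # early A->B traffic is lost
        a.receive(b_tx)
        if a.state is State.DONE and b.state is State.DONE:
            return rnd
    raise RuntimeError("flow control initialisation did not converge")

a = Port({"P": (32, 64), "NP": (16, 2), "Cpl": (0, 0)})
b = Port({"P": (8, 128), "NP": (8, 1), "Cpl": (16, 256)})
rounds = run(a, b, lost_rounds=2)
print(rounds, a.peer_credits == b.advertised, b.peer_credits == a.advertised)
```

In this run A reaches FC_INIT2 in round 0, B catches up from A's InitFC2 DLLPs in round 2, and both sides finish in round 3.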

📋 Infinite Credits and Why They Exist

A credit value of 0x000 during FC initialisation has a special meaning: infinite credits. A receiver that advertises infinite credits is guaranteeing that its receive buffer for that credit type will never overflow — the transmitter may send without checking that credit pool.

Figure 7 — Why infinite completion credits are mandatory for endpoints. An endpoint that sends read requests must guarantee it can receive all the resulting CplD TLPs. If it advertised finite completion credits and those ran out, the RC could not return completions and a deadlock would form. Infinite credits prevent this permanently.

Which devices advertise infinite credits

Device type	Infinite completion credits?	Why
Endpoint (NVMe, GPU, NIC)	Yes — CPLH = 0x000, CPLD = 0x000	Originates requests. Must be able to receive all resulting completions without stalling the completer. No risk of overflow because the endpoint controls how many requests it sends.
Root Complex (no peer-to-peer)	Yes — CPLH = 0x000, CPLD = 0x000	Acts as completer for device-initiated reads. Must always be able to receive completions for its own requests. Infinite completion credits prevent deadlock.
Switch downstream port	No — finite completion credits	Forwards completions between devices. Has finite buffers. Must advertise actual buffer space so upstream devices do not flood it.
Root Complex (with peer-to-peer)	No — finite completion credits	May need to buffer completions for peer-to-peer transactions between devices. Finite credits prevent its own buffers from overflowing.

Infinite credits do not mean unbounded buffer space. An endpoint that advertises infinite completion credits must implement logic to never have more outstanding requests than its completion receive buffer can hold. It controls its own flow by limiting how many read requests it sends simultaneously (through Tag exhaustion — once all Tags are in use, it cannot send more requests regardless of completion credits).
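
A hedged sketch of both ideas: decoding the 0x000 encoding, and an endpoint throttling itself by Tag availability. The `ReadIssuer` class and its parameters are hypothetical; sizing the Tag pool as the smaller of the Tag count and the number of maximum-size reads the completion buffer can hold is one plausible policy, not a rule from the specification.

```python
import math

def decode_init_credits(field_value: int) -> float:
    """Interpret a credit field received during FC initialisation."""
    return math.inf if field_value == 0 else field_value

class ReadIssuer:
    """Endpoint-side limiter for outstanding read requests (illustrative)."""

    def __init__(self, num_tags: int, max_read_bytes: int, cpl_buffer_bytes: int):
        # Never keep more read data in flight than the completion buffer holds.
        self.max_outstanding = min(num_tags, cpl_buffer_bytes // max_read_bytes)
        self.free_tags = list(range(self.max_outstanding))
        self.in_flight = set()

    def issue_read(self):
        """Return a Tag for a new read, or None if every Tag is in use."""
        if not self.free_tags:
            return None
        tag = self.free_tags.pop()
        self.in_flight.add(tag)
        return tag

    def complete(self, tag: int):
        """All completions for this Tag have arrived; the Tag is reusable."""
        self.in_flight.remove(tag)
        self.free_tags.append(tag)

issuer = ReadIssuer(num_tags=32, max_read_bytes=512, cpl_buffer_bytes=4096)
tags = [issuer.issue_read() for _ in range(9)]
print(issuer.max_outstanding, tags[-1])  # 8 None
```

With a 4096-byte completion buffer and 512-byte reads, only 8 of the 32 Tags are ever used, so the ninth request is refused locally without any credit exchange.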

📋 UpdateFC — Returning Credits

After FC initialisation, the receiver continuously sends UpdateFC DLLPs as its Transaction Layer processes TLPs and frees buffer space. UpdateFC DLLPs use the same byte layout as InitFC DLLPs but with type bytes in the 0x80/0x90/0xA0 range, rather than 0x40/0x50/0x60 (InitFC1) or 0xC0/0xD0/0xE0 (InitFC2).

Figure 8 — UpdateFC credit return. After 10 PD credits are consumed, the transmitter stalls. The receiver processes TLPs, frees buffer space, and sends UpdateFC-P with the new total cumulative credit count (12). Transmitter updates CREDITS_LIMIT to 12. Now available = 12 − 10 = 2 credits, and the transmitter can resume.

UpdateFC carries a cumulative count — the total number of credits that have ever been available to the transmitter, not just the credits being returned in this DLLP. This means a single delayed UpdateFC DLLP does not create a problem — the next one that arrives will contain the accumulated total, including all the credits the previous DLLP would have returned.
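
The benefit of cumulative counts is easy to demonstrate. In this illustrative Python sketch (class and method names are assumptions, and counter wrap-around is ignored), one UpdateFC is dropped and the next one still brings the transmitter fully up to date.

```python
class Receiver:
    def __init__(self, initial_credits: int):
        self.credits_allocated = initial_credits  # cumulative total ever granted

    def free_buffer(self, credits: int) -> int:
        """Buffer space freed; return the value the next UpdateFC would carry."""
        self.credits_allocated += credits
        return self.credits_allocated

class Transmitter:
    def __init__(self, initial_credits: int):
        self.credits_limit = initial_credits
        self.credits_consumed = 0

    def send(self, credits: int):
        self.credits_consumed += credits

    def on_update_fc(self, cumulative_total: int):
        self.credits_limit = cumulative_total  # overwrite, do not add

    def available(self) -> int:
        return self.credits_limit - self.credits_consumed

rx, tx = Receiver(10), Transmitter(10)
tx.send(10)                    # transmitter is now stalled
lost = rx.free_buffer(2)       # UpdateFC carrying 12 is dropped on the link
delivered = rx.free_buffer(3)  # UpdateFC carrying 15 arrives
tx.on_update_fc(delivered)
print(tx.available())          # 5
```

Had the DLLPs carried deltas instead, the transmitter would have recovered only 3 credits and the other 2 would have been lost permanently.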

📋 Counters Inside the Transmitter

The transmitter maintains two counters per credit type, per Virtual Channel. Together they implement the credit check:

Counter	Width	Meaning	Updated when
CREDITS_LIMIT	8 bits (header) / 12 bits (data)	Most recent cumulative credit total advertised by the receiver. Starts at the value advertised during FC initialisation and is overwritten by the value carried in each UpdateFC DLLP.	InitFC received (set initial value). UpdateFC received (replace with the new cumulative value).
CREDITS_CONSUMED	8 bits (header) / 12 bits (data)	Running total of credits charged for all TLPs sent since FC initialisation. It is never decremented. This is a wrapping counter.	Incremented by the credit cost of each TLP sent.

Available credits at any moment = (CREDITS_LIMIT − CREDITS_CONSUMED) modulo 2ⁿ, where n is the counter width: 8 for header credits, 12 for data credits. Because both counters wrap, the check is done in modular arithmetic. The result is only unambiguous while the two values are no more than half the counter range apart (128 for header credits, 2048 for data credits); beyond that, a large positive balance cannot be told apart from a negative one.
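
A wrap-safe version of the check fits in one expression. The sketch below follows the form I understand the base specification to use, assuming 8-bit header and 12-bit data counters (no Scaled Flow Control); the function name `can_send` is hypothetical.

```python
def can_send(credits_limit: int, credits_consumed: int, needed: int,
             field_bits: int) -> bool:
    """Wrap-safe gating check for one credit pool.

    field_bits is the counter width: 8 for header pools, 12 for data pools.
    The TLP may go if the balance left after sending it, taken modulo
    2**field_bits, falls in the lower half of the counter range.
    """
    modulus = 1 << field_bits
    remaining = (credits_limit - (credits_consumed + needed)) % modulus
    return remaining <= modulus // 2

# Header pool whose limit has wrapped past 255: the true limit is 259 (stored
# as 3) and 250 credits are consumed, so 9 credits are actually available.
print(can_send(3, 250, 1, field_bits=8))    # True
print(can_send(250, 250, 1, field_bits=8))  # False: nothing available
print(can_send(12, 10, 2, field_bits=12))   # True: exactly 2 data credits left
print(can_send(12, 10, 3, field_bits=12))   # False: one credit short
```

Python's `%` operator always returns a non-negative result for a positive modulus, which is what makes the single comparison sufficient here.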

📋 Minimum and Maximum Advertisements

Credit type	Minimum advertisement	Maximum advertisement
PH	1 credit (covers one 4DW request header + ECRC = 5DW = 20 bytes)	128 credits (128 × 20 bytes = 2,560 bytes)
PD	Credits for the maximum Max_Payload_Size supported by the device (e.g. 1024B MPS = 64 credits)	2048 credits (2048 × 16 bytes = 32,768 bytes = 8 × 4096 bytes, max from 8 functions)
NPH	1 credit	128 credits
NPD	1 credit (2 credits for AtomicOp capable devices)	128 credits (same as NPH since NP data always travels with NP headers)
CPLH	1 credit (switches and P2P RC) · Infinite (endpoints, standard RC)	128 credits (switches, P2P RC) · Infinite (endpoints, standard RC)
CPLD	Credits for max MPS or max read request size (whichever smaller) · Infinite (endpoints)	2048 credits (switches, P2P RC) · Infinite (endpoints, standard RC)

The minimum PD advertisement matters. The PD minimum must equal the credits needed to hold the largest payload the device might receive. If a device advertises PD=4 credits (64 bytes) but the sender sends a 256-byte MWr (16 PD credits), the sender would stall waiting for credits that will never arrive from a tiny buffer. The minimum ensures the first MWr can always be accepted without waiting.
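
A design-time check against the minimums in the table might look like the following Python sketch. It covers only the Posted and Non-Posted pools, treats an advertised value of 0 as infinite, and uses hypothetical function names.

```python
DATA_CREDIT_BYTES = 16  # one data credit = 16 bytes

def minimum_advertisement(max_payload_size: int, atomic_ops: bool = False) -> dict:
    """Smallest legal finite advertisement per pool, per the table above."""
    return {
        "PH": 1,
        "PD": max_payload_size // DATA_CREDIT_BYTES,
        "NPH": 1,
        "NPD": 2 if atomic_ops else 1,
    }

def shortfalls(advertised: dict, max_payload_size: int,
               atomic_ops: bool = False) -> dict:
    """Return {pool: (advertised, required)} for every pool that is too small."""
    required = minimum_advertisement(max_payload_size, atomic_ops)
    return {
        pool: (advertised[pool], need)
        for pool, need in required.items()
        if advertised[pool] != 0 and advertised[pool] < need  # 0 means infinite
    }

# The PD=4 case from the text, for a device supporting 256-byte payloads.
problems = shortfalls({"PH": 8, "PD": 4, "NPH": 8, "NPD": 1}, max_payload_size=256)
print(problems)  # {'PD': (4, 16)}
```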

📋 Per-Virtual-Channel Independence

All six credit types are tracked independently per Virtual Channel. A 6-credit-type × 8-VC system has 48 independent credit pools. Each VC initialises its own flow control when that VC is enabled by software — VC0 initialises automatically (it is always enabled), VC1–VC7 initialise when software enables them.

This per-VC independence is what enables Quality-of-Service guarantees. A high-priority video stream in VC7 can be guaranteed a minimum bandwidth allocation because its credit pool is completely separate from the VC0 best-effort DMA traffic pool. Even if VC0 PD credits are exhausted by a flood of best-effort writes, VC7 PD credits are unaffected and video TLPs continue to flow.
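
One way to picture the 48 pools is a table keyed by (VC, credit type). The Python sketch below is illustrative: the `VcCreditTable` class is hypothetical, and it keeps a simple running balance per pool rather than the limit/consumed counter pair.

```python
CREDIT_TYPES = ("PH", "PD", "NPH", "NPD", "CPLH", "CPLD")

class VcCreditTable:
    """One independent credit balance per (Virtual Channel, credit type)."""

    def __init__(self, num_vcs: int = 8):
        self.available = {(vc, t): 0 for vc in range(num_vcs) for t in CREDIT_TYPES}

    def advertise(self, vc: int, credit_type: str, credits: int):
        self.available[(vc, credit_type)] = credits

    def consume(self, vc: int, credit_type: str, credits: int) -> bool:
        key = (vc, credit_type)
        if self.available[key] < credits:
            return False  # only this VC's pool of this type is stalled
        self.available[key] -= credits
        return True

table = VcCreditTable()
table.advertise(0, "PD", 16)
table.advertise(7, "PD", 16)

vc0_flood = table.consume(0, "PD", 16)  # best-effort traffic drains VC0 PD
vc0_next = table.consume(0, "PD", 1)    # VC0 is now stalled
vc7_video = table.consume(7, "PD", 4)   # VC7 still has its own credits
print(len(table.available), vc0_flood, vc0_next, vc7_video)  # 48 True False True
```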

⚡ Flow Control in Gen 6

The flow control credit model described here (six types, credit unit sizes, FC_INIT handshake, infinite completion credits, UpdateFC return) carries over to Gen 6. Flow control is a Transaction Layer and Data Link Layer mechanism, while the headline Gen 6 changes (PAM4 signalling, forward error correction, and flit-based framing) sit beneath the credit accounting.

FC DLLPs inside Gen 6 flits

In Gen 1–5, FC DLLPs are sent as separately framed packets interleaved between TLPs. In Gen 6 flit mode, TLPs are packed continuously into 256-byte flits and there are no separately framed DLLPs. Each flit instead reserves a fixed 6-byte Data Link Layer field alongside its 236 bytes of TLP payload, 8 bytes of CRC, and 6 bytes of FEC, and flow control information travels in that field. The receiver’s Data Link Layer extracts it from each flit before passing TLPs to the Transaction Layer.

The flow control information itself (type, VC, HdrFC, DataFC) is the same. Only the packaging changes: an UpdateFC-P carrying HdrFC=42, DataFC=100 conveys exactly the same credit values in Gen 6 as in Gen 1.

Credit check is unchanged per TLP

In Gen 6, multiple TLPs are packed into one flit. The credit check still happens per-TLP, not per-flit. Before each TLP is added to a flit, the transmitter checks whether sufficient credits exist for that TLP. The flit packing is a Physical Layer concern — by the time the TL hands TLPs to the DLL for flit packing, the credit check has already been performed and cleared.

Gen 6 and high-bandwidth credit scaling

At ~122 GB/s on a Gen 6 x16 link, credit pools drain far faster than at Gen 1 speeds. A receiver’s PD credit pool of 128 credits (2048 bytes) takes roughly 8 µs to drain on a Gen 1 x1 link (about 250 MB/s) and roughly 0.5 µs on a Gen 1 x16 link (about 4 GB/s). At Gen 6 x16, the same pool drains in about 17 ns. This means UpdateFC DLLPs must be sent far more frequently in Gen 6 systems to avoid credit starvation. Hardware implementations of Gen 6 endpoints must service their UpdateFC generation at much tighter intervals than Gen 1 designs. The protocol is unchanged, but the timing expectations are far more aggressive.
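
The drain time is simply pool size divided by bandwidth. A small Python sketch makes the scaling explicit; the Gen 1 figures of 0.25 GB/s per lane and 4 GB/s for x16 are nominal assumptions, the ~122 GB/s figure is the one used in this article, and the function name is hypothetical.

```python
def drain_time_ns(pool_credits: int, bytes_per_credit: int,
                  bandwidth_gb_per_s: float) -> float:
    """Time for a transmitter at full rate to exhaust a credit pool.

    1 GB/s equals 1 byte per nanosecond, so bytes / (GB/s) gives nanoseconds.
    """
    return pool_credits * bytes_per_credit / bandwidth_gb_per_s

links = {
    "Gen 1 x1": 0.25,    # GB/s, nominal
    "Gen 1 x16": 4.0,    # GB/s, nominal
    "Gen 6 x16": 122.0,  # GB/s, figure used in this article
}

for name, bandwidth in links.items():
    ns = drain_time_ns(pool_credits=128, bytes_per_credit=16,
                       bandwidth_gb_per_s=bandwidth)
    print(f"{name}: {ns:.1f} ns")
```

The same 2048-byte pool that lasts microseconds on a Gen 1 x1 link is gone in under 20 ns at Gen 6 x16, which is why the UpdateFC return path has to keep pace.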

Gen 6 practical implication for RTL designers. The FC credit types, unit sizes, InitFC1/2 handshake logic, infinite credit encoding (0x000), and UpdateFC format are all identical to Gen 1–5. What changes is the UpdateFC generation frequency and the flit unpacking logic that extracts FC DLLPs from incoming flits. Credit pool management logic needs no changes for Gen 6 compatibility.

📋 Quick Reference

Item	Value / Rule
Purpose of flow control	Prevent receiver buffer overflow. Transmitter never sends more than available credits allow. No drops, no retries.
Six credit types	PH (Posted Header) · PD (Posted Data) · NPH (Non-Posted Header) · NPD (Non-Posted Data) · CPLH (Completion Header) · CPLD (Completion Data)
Header credit unit size	1 credit = 1 header slot. Request headers = 5 DW (20 bytes). Completion headers = 4 DW (16 bytes).
Data credit unit size	1 credit = 4 DW = 16 bytes. All data credit types (PD, NPD, CPLD) use the same 16-byte unit.
Credits per TLP	1 header credit + ⌈payload_bytes / 16⌉ data credits. For TLPs with no payload (MRd), only 1 header credit.
FC DLLP format	6-byte DLLP (8 symbols on the wire with framing): Type byte (with VC ID) · HdrFC (8-bit) · DataFC (12-bit) · 16-bit CRC. Shared by InitFC1, InitFC2, UpdateFC.
FC_INIT1	Both devices continuously send three InitFC1 DLLPs (P, NP, Cpl in that order) advertising receive buffer sizes. Repeats until neighbour values confirmed.
FC_INIT2	After registering neighbour credits, send three InitFC2 DLLPs to confirm. When both sides complete InitFC2, link enters DL_Active and TLPs may flow.
TLPs blocked until…	Both FC_INIT1 and FC_INIT2 phases complete on the link. No TLP may be sent before DL_Active.
Infinite credits encoding	InitFC1/2 with HdrFC=0x000 and/or DataFC=0x000 means infinite credits for that type. No UpdateFC is sent for infinite-credit pools.
Who advertises infinite CPLH/CPLD	Endpoints and Root Complexes without peer-to-peer support. Required — otherwise deadlock is possible when all completion credits are in use.
CREDITS_LIMIT counter	Transmitter counter tracking total available credits from receiver. Set by InitFC1, updated by UpdateFC.
CREDITS_CONSUMED counter	Transmitter counter tracking the running total of credits charged for all TLPs sent since FC initialisation. Incremented per TLP sent, never decremented.
Available credits formula	Available = (CREDITS_LIMIT − CREDITS_CONSUMED) mod 2ⁿ, with n = 8 for header credits and n = 12 for data credits. TLP sent only if Available ≥ credits_needed.
UpdateFC	Receiver sends periodically as it processes TLPs and frees buffer space. Carries cumulative total credit count, not delta. Transmitter updates CREDITS_LIMIT.
Minimum PH/NPH	1 credit minimum advertisement. Ensures at least one request can always be accepted.
Minimum PD	Credits for the largest Max_Payload_Size the device supports. Ensures the first MWr can always be received.
Maximum PH/NPH/CPLH	128 credits (for devices with finite completion credits).
Maximum PD/CPLD	2048 credits (for devices with finite credit pools).
Per-VC independence	All six credit types tracked independently per Virtual Channel. VC0 always initialised. VC1–7 init when enabled by software.
Gen 6 impact	Credit types, unit sizes, InitFC handshake, infinite credit encoding, UpdateFC format — all identical. FC DLLPs embedded in flits. Credit check still per-TLP. UpdateFC frequency must increase proportionally to link bandwidth.