PCIe Series — PCIe-13: Flow Control — VLSI Trainers

Flow Control

How PCIe prevents a fast transmitter from overwhelming a slow receiver — the six credit types (PH, PD, NPH, NPD, CPLH, CPLD), credit unit sizes, FC DLLP format, the two-phase FC_INIT handshake, infinite credits and why completions need them, UpdateFC credit return, and how Gen 6 handles flow control inside flits.

📋 Why Flow Control Exists

PCIe is a packet-switched, full-duplex link. At Gen 6 speeds a x16 link can deliver ~122 GB/s in each direction simultaneously. A device at one end of the link has finite receive buffers — if the transmitter sends TLPs faster than the receiver can process and consume them, the buffers overflow and TLPs are dropped.

PCI solved this with a very different mechanism: target-initiated Retry. A busy target asserted STOP# without transferring data, and the initiator tried the transaction again later. On a shared bus this is cheap, because the bus is stalled anyway. A PCIe point-to-point link has no retry signal. Dropping a TLP and retransmitting it costs a full round-trip latency and wastes link bandwidth.

PCIe’s answer is credit-based flow control: the receiver tells the transmitter exactly how much buffer space it has available. The transmitter never sends more than what the receiver has space to hold. No overflow, no drops, no retries — just a clean credit accounting system operating at the Data Link Layer.

[Diagram: the Transmitter tracks credits and sends TLPs, each drawing credits from the pool. The Receiver has finite buffers and sends FC DLLPs back ("I have N credits free"), returning credits as TLPs are consumed.]
Figure 1 — Credit-based flow control. The receiver periodically sends FC DLLPs advertising available buffer space in credits. The transmitter deducts credits for each TLP sent and never transmits when the credit count for that TLP type would go negative. As the receiver processes TLPs and frees buffer space, it returns credits via UpdateFC DLLPs.

📋 The Credit Concept

A credit is a unit of receive buffer space. The receiver counts its available buffer space in credits and advertises that count to the transmitter. The transmitter maintains a running credit balance. Before sending any TLP, it checks whether it has enough credits for that TLP. If yes, it deducts credits and sends the TLP. If no, it waits.

Credits are not consumed permanently. When the receiver’s Transaction Layer processes a TLP and frees the buffer space that held it, the receiver returns those credits to the transmitter via an UpdateFC DLLP. The transmitter adds them back to its balance. This circulation continues indefinitely during normal operation.

Credit lifecycle: Advertised → Consumed → Returned
① Advertise. During FC_INIT, the receiver sends InitFC1 and InitFC2 DLLPs telling the transmitter how many credits are available per buffer type. This happens once at link startup. The transmitter stores the value in CREDITS_LIMIT.
② Consume. The transmitter checks its credit balance before every TLP. If there is enough, it adds the TLP's cost to CREDITS_CONSUMED and sends the TLP. This happens per TLP, and the TLP blocks if the balance is insufficient.
③ Return. The receiver's TL processes the TLP and frees the buffer. The receiver sends an UpdateFC DLLP carrying the new credit count back to the sender. The transmitter updates CREDITS_LIMIT to that value.
Figure 2 — Credit lifecycle. ① Advertise: receiver tells transmitter the total buffer space at link startup. ② Consume: transmitter deducts credits per TLP sent. ③ Return: as the receiver processes TLPs and frees buffer space, UpdateFC DLLPs return credits to the transmitter. The cycle repeats continuously.

📋 Six Credit Types

Flow control credits are tracked independently for six buffer types. The three TLP categories (Posted, Non-Posted, Completion) each have separate header and data credit pools, giving six credit types in total. This separation is what allows the mandatory ordering rules (completions must pass non-posted) to be implementable — they use physically separate credit pools.

Six credit types, tracked independently per Virtual Channel

| Category | TLP types | Header credits | Data credits |
| --- | --- | --- | --- |
| Posted | MWr · Msg · MsgD | PH (Posted Header Credits) | PD (Posted Data Credits) |
| Non-Posted | MRd · IORd · IOWr · Cfg | NPH (Non-Posted Header Credits) | NPD (Non-Posted Data Credits; IOWr, CfgWr) |
| Completion | Cpl · CplD | CPLH (Completion Header Credits) | CPLD (Completion Data Credits) |

All six types are tracked independently · a full PD buffer does not block NPD or CPLD traffic · separate credit pools per Virtual Channel
Figure 3 — Six credit types organised by TLP category (rows) and header vs data (columns). Each cell is an independently managed credit pool with its own DLLP counter. The separation is essential for deadlock prevention — the ordering rules that mandate completions pass non-posted traffic require independent pools.
| Credit type | Short name | What it covers |
| --- | --- | --- |
| Posted Header | PH | One credit per MWr, Msg, or MsgD header regardless of payload size. Covers the header only. |
| Posted Data | PD | Covers the data payload of MWr, MsgD. Separate from PH: a large MWr uses one PH credit but many PD credits. |
| Non-Posted Header | NPH | One credit per MRd, IORd, IOWr, CfgRd, CfgWr header. |
| Non-Posted Data | NPD | Covers the payload of IOWr and CfgWr requests (reads have no payload so they only consume NPH). |
| Completion Header | CPLH | One credit per Cpl or CplD header. |
| Completion Data | CPLD | Covers the data payload of CplD. A large split read may return many CplD TLPs each consuming CPLD credits. |
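
As a rough illustration of the table above, the lookup below maps each TLP type to the credit pools it must check before it can be sent. It is a minimal sketch in Python: the dictionary `CREDIT_POOLS` and the helper `pools_for` are names invented for this example, not part of any PCIe API.

```python
# Which credit pools each TLP type draws from: (header pool, data pool or None).
# The mapping follows the credit-type table; names are illustrative only.
CREDIT_POOLS = {
    "MWr":   ("PH", "PD"),
    "Msg":   ("PH", None),
    "MsgD":  ("PH", "PD"),
    "MRd":   ("NPH", None),
    "IORd":  ("NPH", None),
    "IOWr":  ("NPH", "NPD"),
    "CfgRd": ("NPH", None),
    "CfgWr": ("NPH", "NPD"),
    "Cpl":   ("CPLH", None),
    "CplD":  ("CPLH", "CPLD"),
}


def pools_for(tlp_type: str) -> tuple:
    """Return the credit pools a TLP of this type must have credits in."""
    header_pool, data_pool = CREDIT_POOLS[tlp_type]
    return (header_pool,) if data_pool is None else (header_pool, data_pool)


print(pools_for("MRd"))   # ('NPH',)
print(pools_for("MWr"))   # ('PH', 'PD')
print(pools_for("CplD"))  # ('CPLH', 'CPLD')
```

A read request touches only NPH, which is why a burst of reads never competes with write payloads for PD credits.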

📋 Credit Unit Sizes

A credit is not a byte — it is a fixed-size unit chosen to balance accounting precision against counter width. Different credit types have different unit sizes:

Credit unit sizes: what one credit represents

Header credits: 1 credit = 1 TLP header slot
Request headers (MWr, MRd, Msg, IOWr, CfgWr): 1 credit = space for 1 header (4DW + optional ECRC) = 5 DW = 20 bytes
Completion headers (Cpl, CplD): 1 credit = space for 1 header (3DW + optional ECRC) = 4 DW = 16 bytes
One header credit is consumed per TLP, regardless of payload size.

Data credits: 1 credit = 4 DW = 16 bytes
Every data credit type (PD, NPD, CPLD) uses the same unit: 1 data credit = 4 Doublewords = 16 bytes of payload space.
Example: a 64-byte MWr payload = 16 DW = 4 data credits (PD)
Example: a 4096-byte CplD payload = 1024 DW = 256 data credits (CPLD)
If the payload is not a multiple of 16 bytes, round up to the next 16-byte boundary.
Figure 4 — Credit unit sizes. Header credits cover one header slot (5 DW = 20 bytes for requests, 4 DW = 16 bytes for completions). Data credits are always 4 DW = 16 bytes per credit. A single TLP always consumes one header credit plus as many data credits as its payload requires.

Worked example — credits consumed by one TLP

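A Memory Write (MWr) with a 256-byte payload, sent to a receiver that has advertised PH and PD credits, consumes 1 PH credit for its header and 256 / 16 = 16 PD credits for its payload. No Non-Posted or Completion credits are touched.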
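
For the 256-byte MWr the cost works out to 1 PH credit and 16 PD credits. The same arithmetic can be sketched in a few lines of Python. The function names are made up for this example; the only inputs are the 16-byte data credit unit and the one-header-credit-per-TLP rule described above.

```python
DATA_CREDIT_BYTES = 16  # one data credit = 4 DW = 16 bytes


def data_credits(payload_bytes: int) -> int:
    """Data credits for a payload, rounded up to the next 16-byte unit."""
    return -(-payload_bytes // DATA_CREDIT_BYTES)  # ceiling division


def credits_for_tlp(payload_bytes: int) -> tuple:
    """Return (header credits, data credits) consumed by one TLP."""
    return 1, data_credits(payload_bytes)


print(credits_for_tlp(256))   # (1, 16)  256-byte MWr: 1 PH + 16 PD
print(credits_for_tlp(0))     # (1, 0)   MRd: header credit only
print(credits_for_tlp(100))   # (1, 7)   100 bytes rounds up to 112
print(credits_for_tlp(4096))  # (1, 256) 4096-byte CplD
```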

📋 Transmitter Credit Check

Before the transmitter's Transaction Layer releases any TLP to the Data Link Layer for transmission, it performs a credit check. The check is a simple comparison of two counters per credit type:

A TLP may be sent if and only if: CREDITS_LIMIT − CREDITS_CONSUMED ≥ credits_needed_for_this_TLP

This check is performed independently for each credit type. A TLP can only be sent when all relevant credit pools (header + data, for the appropriate TLP category) have sufficient space. If either pool is exhausted, the transmitter must wait — the TLP sits in the Transaction Layer’s egress queue until UpdateFC returns enough credits.
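
The rule above can be sketched as a small Python model. This is an illustration under simplifying assumptions, not a reference implementation: the `Pool` class and `try_send` function are hypothetical names, and the counters are plain integers that never wrap.

```python
from dataclasses import dataclass


@dataclass
class Pool:
    limit: int          # CREDITS_LIMIT: credits granted by the receiver so far
    consumed: int = 0   # CREDITS_CONSUMED: credits spent by TLPs sent so far

    def available(self) -> int:
        return self.limit - self.consumed


def try_send(header: Pool, data: Pool, payload_bytes: int) -> bool:
    """Send one TLP only if BOTH the header and the data pool can cover it."""
    need_data = -(-payload_bytes // 16)   # ceiling of payload / 16 bytes
    if header.available() < 1 or data.available() < need_data:
        return False                      # TLP waits in the egress queue
    header.consumed += 1
    data.consumed += need_data
    return True


ph, pd = Pool(limit=4), Pool(limit=20)
print(try_send(ph, pd, 256))  # True:  uses 1 PH + 16 PD
print(try_send(ph, pd, 256))  # False: only 4 PD credits remain
print(try_send(ph, pd, 64))   # True:  uses 1 PH + 4 PD
```

Note that a failed check leaves both counters untouched, so the TLP can be retried unchanged once more credits arrive.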

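A credit check failure stalls only the TLP category whose pool is empty. If the NPH pool is exhausted but the Posted and Completion pools still have credits, Posted and Completion TLPs continue to drain past the stalled Non-Posted queue. This is what makes the deadlock-avoidance ordering rules implementable: Posted requests and Completions must be able to pass blocked Non-Posted requests, and independent credit pools allow exactly that. Credit independence does not override ordering, though. A TLP that has credits available may still have to wait behind an earlier Posted write that it is not permitted to pass.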

📋 FC DLLP Format

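All Flow Control DLLPs (InitFC1, InitFC2, and UpdateFC) use the same format: 4 bytes of content followed by a 16-bit CRC, 6 bytes in total (8 bytes on the wire once Physical Layer framing is added). Only the type byte changes.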

Flow Control DLLP format (all FC types share this layout)

| Field | Location | Contents |
| --- | --- | --- |
| Type [7:0] | Byte 0 | xxxx 0 · VC ID [2:0] |
| HdrFC [7:0] | Byte 1 [5:0] + Byte 2 [7:6] | 8-bit header credit value |
| Rsvd | Between HdrFC and DataFC | Reserved |
| DataFC [11:0] | Byte 2 [3:0] + Byte 3 [7:0] | 12-bit data credit value |
| CRC [15:0] | Bytes 4–5 | 16-bit CRC |

Type byte encodings (upper nibble selects phase and TLP category, lower nibble carries the VC number; xxx = VC number 0–7):

| DLLP | Type byte |
| --- | --- |
| InitFC1-P | 0100_0xxx |
| InitFC1-NP | 0101_0xxx |
| InitFC1-Cpl | 0110_0xxx |
| UpdateFC-P | 1000_0xxx |
| UpdateFC-NP | 1001_0xxx |
| UpdateFC-Cpl | 1010_0xxx |

Figure 5 — FC DLLP format. Byte 0 encodes the type and VC number (lower 3 bits). HdrFC is an 8-bit header credit value (the six bits of byte 1 plus the top two bits of byte 2). DataFC is a 12-bit data credit value. InitFC1, InitFC2, and UpdateFC all use this same layout: they share the same field positions and differ only in the type byte's upper nibble.
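
To make the bit positions concrete, here is a hedged Python sketch that packs and unpacks bytes 0 to 3 of an FC DLLP. It assumes the field positions listed in Figure 5: byte 1 bits [5:0] plus byte 2 bits [7:6] give HdrFC eight bits, and byte 2 bits [3:0] plus byte 3 give DataFC twelve bits. Bits not listed are left at zero, the CRC is not computed, and only the type encodings shown in the figure are included. `pack_fc` and `unpack_fc` are names invented for this example.

```python
# Upper-nibble type encodings listed in Figure 5 (low 3 bits carry the VC).
FC_TYPES = {
    "InitFC1-P": 0x40, "InitFC1-NP": 0x50, "InitFC1-Cpl": 0x60,
    "UpdateFC-P": 0x80, "UpdateFC-NP": 0x90, "UpdateFC-Cpl": 0xA0,
}
FC_KINDS = {code: name for name, code in FC_TYPES.items()}


def pack_fc(kind: str, vc: int, hdr_fc: int, data_fc: int) -> bytes:
    """Build bytes 0-3 of an FC DLLP. The 16-bit CRC is NOT appended."""
    if not (0 <= vc <= 7 and 0 <= hdr_fc <= 0xFF and 0 <= data_fc <= 0xFFF):
        raise ValueError("field out of range")
    b0 = FC_TYPES[kind] | vc
    b1 = hdr_fc >> 2                              # HdrFC[7:2] -> byte 1 [5:0]
    b2 = ((hdr_fc & 0x3) << 6) | (data_fc >> 8)   # HdrFC[1:0] | DataFC[11:8]
    b3 = data_fc & 0xFF                           # DataFC[7:0]
    return bytes([b0, b1, b2, b3])


def unpack_fc(raw: bytes) -> tuple:
    """Decode bytes 0-3 into (kind, vc, HdrFC, DataFC)."""
    b0, b1, b2, b3 = raw[:4]
    kind = FC_KINDS[b0 & 0xF0]   # KeyError for types not in the table above
    vc = b0 & 0x07
    hdr_fc = ((b1 & 0x3F) << 2) | (b2 >> 6)
    data_fc = ((b2 & 0x0F) << 8) | b3
    return kind, vc, hdr_fc, data_fc


dllp = pack_fc("UpdateFC-P", 0, 42, 100)
print(dllp.hex())        # 800a8064
print(unpack_fc(dllp))   # ('UpdateFC-P', 0, 42, 100)
```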

📋 FC Initialisation — Two Phases

No TLPs may be sent on a link until flow control initialisation is complete. This process happens automatically in hardware immediately after Physical Layer link training succeeds. It involves a two-phase handshake (FC_INIT1 and FC_INIT2) carried out simultaneously in both directions.

FC initialisation: FC_INIT1 then FC_INIT2, in both directions. Device A and Device B each run the same sequence, simultaneously and independently:
① Physical link up.
② Send InitFC1 ×3 (InitFC1-P · InitFC1-NP · InitFC1-Cpl).
③ Receive the neighbour's InitFC1 DLLPs, store its credits, and set the FI1 flag. Phase 1 complete: both sides know the other's credits.
④ Send InitFC2 ×3 (InitFC2-P · InitFC2-NP · InitFC2-Cpl).
⑤ Phase 2 complete: DL_Active, and TLPs may now flow.
Required sending order: Posted first, then Non-Posted, then Completions.
Figure 6 — FC initialisation sequence. Phase 1 (InitFC1): both devices continuously send three InitFC1 DLLPs — one for Posted, one for Non-Posted, one for Completions — advertising their receive buffer sizes. Phase 2 (InitFC2): once Phase 1 values are registered, both devices send InitFC2 DLLPs to confirm. After both phases complete, the link enters DL_Active and TLPs may flow.

Why two phases?

The two-phase design handles the case where the two devices finish phase 1 at different times. If Device A finishes phase 1 before Device B, A transitions to phase 2 and starts sending InitFC2 DLLPs instead of InitFC1. Device B, still in phase 1, does not need A to keep sending InitFC1: InitFC2 DLLPs carry the same credit values, so B can record them from either kind. Once B has values for all three categories it completes phase 1 and starts sending InitFC2 itself. Receiving an InitFC2 while in phase 2 tells each side that its neighbour has recorded the advertised credits, and both sides then converge to DL_Active.
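
The handshake can be sketched as a tiny state machine. The model below is a simplification under stated assumptions: it covers one VC, ignores timers and retransmission, and assumes that a port still in phase 1 records credit values from either InitFC1 or InitFC2 DLLPs while a port in phase 2 only waits for an InitFC2. The class `FcInitPort` and its methods are hypothetical names for this example.

```python
CATEGORIES = ("P", "NP", "Cpl")   # required sending order


class FcInitPort:
    """One side of the link during flow control initialisation (one VC)."""

    def __init__(self):
        self.state = "FC_INIT1"
        self.peer_credits = {}    # category -> (HdrFC, DataFC) from the peer

    def dllps_to_send(self):
        """DLLPs this port repeats in its current state, in P, NP, Cpl order."""
        if self.state == "FC_INIT1":
            return [("InitFC1", c) for c in CATEGORIES]
        if self.state == "FC_INIT2":
            return [("InitFC2", c) for c in CATEGORIES]
        return []                 # DL_ACTIVE: initialisation is over

    def receive(self, kind, category, hdr_fc=0, data_fc=0):
        if self.state == "FC_INIT1":
            # Assumption: InitFC1 and InitFC2 both carry usable credit values.
            self.peer_credits[category] = (hdr_fc, data_fc)
            if all(c in self.peer_credits for c in CATEGORIES):
                self.state = "FC_INIT2"    # phase 1 done (FI1 set)
        elif self.state == "FC_INIT2" and kind == "InitFC2":
            self.state = "DL_ACTIVE"       # phase 2 done (FI2 set)


a, b = FcInitPort(), FcInitPort()
for kind, cat in b.dllps_to_send():        # B's InitFC1 DLLPs reach A first
    a.receive(kind, cat, 32, 64)
print(a.state, b.state)                    # FC_INIT2 FC_INIT1
for kind, cat in a.dllps_to_send():        # A now sends InitFC2; B is in phase 1
    b.receive(kind, cat, 16, 128)
print(a.state, b.state)                    # FC_INIT2 FC_INIT2
from_a, from_b = a.dllps_to_send(), b.dllps_to_send()
for kind, cat in from_b:
    a.receive(kind, cat, 32, 64)
for kind, cat in from_a:
    b.receive(kind, cat, 16, 128)
print(a.state, b.state)                    # DL_ACTIVE DL_ACTIVE
```

The credit values (32/64 and 16/128) are arbitrary placeholders. The point of the run is that A finishing phase 1 early does not strand B.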

📋 Infinite Credits and Why They Exist

A credit value of 0x000 during FC initialisation has a special meaning: infinite credits. A receiver that advertises infinite credits is guaranteeing that its receive buffer for that credit type will never overflow — the transmitter may send without checking that credit pool.

Infinite credits: why completions must be infinite at endpoints

Without infinite CPLH/CPLD (deadlock risk): Device A sends MRd. Device B (the RC) processes it and wants to send back CplD. But Device A only advertised N=4 CPLD credits, and all 4 are consumed by previous completions. The RC cannot send CplD because no credits are left. Device A is waiting for CplD to free its read tag. Neither side can progress. Deadlock.

With infinite CPLH/CPLD (deadlock impossible): Endpoints that originate requests advertise infinite completion credits (InitFC1 HdrFC=0x000, DataFC=0x000). The completer (RC) is guaranteed that, no matter how many read requests are outstanding, the endpoint can always receive the corresponding CplD TLPs. No deadlock is possible.
Figure 7 — Why infinite completion credits are mandatory for endpoints. An endpoint that sends read requests must guarantee it can receive all the resulting CplD TLPs. If it advertised finite completion credits and those ran out, the RC could not return completions and a deadlock would form. Infinite credits prevent this permanently.

Which devices advertise infinite credits

| Device type | Infinite completion credits? | Why |
| --- | --- | --- |
| Endpoint (NVMe, GPU, NIC) | Yes: CPLH = 0x000, CPLD = 0x000 | Originates requests. Must be able to receive all resulting completions without stalling the completer. No risk of overflow because the endpoint controls how many requests it sends. |
| Root Complex (no peer-to-peer) | Yes: CPLH = 0x000, CPLD = 0x000 | Acts as completer for device-initiated reads. Must always be able to receive completions for its own requests. Infinite completion credits prevent deadlock. |
| Switch downstream port | No: finite completion credits | Forwards completions between devices. Has finite buffers. Must advertise actual buffer space so upstream devices do not flood it. |
| Root Complex (with peer-to-peer) | No: finite completion credits | May need to buffer completions for peer-to-peer transactions between devices. Finite credits prevent its own buffers from overflowing. |

Infinite credits do not mean unbounded buffer space. An endpoint that advertises infinite completion credits must implement logic to never have more outstanding requests than its completion receive buffer can hold. It controls its own flow by limiting how many read requests it sends simultaneously (through Tag exhaustion — once all Tags are in use, it cannot send more requests regardless of completion credits).
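
The self-limiting behaviour described above can be sketched as follows. This is an illustrative model only: `ReadRequester` is a hypothetical class, each read is assumed to finish with a single completion, and the Tag pool is assumed to be sized so that every outstanding read fits in the completion buffer.

```python
class ReadRequester:
    """Endpoint side: outstanding reads are bounded by the number of Tags."""

    def __init__(self, num_tags: int):
        self.free_tags = list(range(num_tags))
        self.outstanding = set()

    def issue_read(self):
        """Return the Tag used, or None when every Tag is already in flight."""
        if not self.free_tags:
            return None               # self-throttled: no MRd leaves the device
        tag = self.free_tags.pop(0)
        self.outstanding.add(tag)
        return tag

    def completion_received(self, tag: int) -> None:
        """The final completion for a read frees its Tag for reuse."""
        self.outstanding.remove(tag)
        self.free_tags.append(tag)


ep = ReadRequester(num_tags=2)
print(ep.issue_read(), ep.issue_read(), ep.issue_read())  # 0 1 None
ep.completion_received(0)
print(ep.issue_read())                                    # 0
```

The completer never consults a completion credit count for this endpoint. The bound comes entirely from the requester refusing to issue a third read.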

📋 UpdateFC — Returning Credits

After FC initialisation, the receiver continuously sends UpdateFC DLLPs as its Transaction Layer processes TLPs and frees buffer space. UpdateFC DLLPs use the same format as InitFC DLLPs (same byte layout) but with type bytes in the 0x80/0x90/0xA0 range instead of the 0x40/0x50/0x60 range.

UpdateFC: how credits return to the transmitter
① Transmitter: CREDITS_LIMIT = 10, CREDITS_CONSUMED = 10 (TLPs consumed 10 PD credits). Available = 0, so the transmitter is stalled.
② Receiver: the TL processes TLPs and frees buffer space. The new cumulative total is 12. The receiver sends UpdateFC-P with DataFC = 12.
③ Transmitter: after UpdateFC(12), CREDITS_LIMIT = 12.
Figure 8 — UpdateFC credit return. After 10 PD credits are consumed, the transmitter stalls. The receiver processes TLPs, frees buffer space, and sends UpdateFC-P with the new total cumulative credit count (12). Transmitter updates CREDITS_LIMIT to 12. Now available = 12 − 10 = 2 credits, and the transmitter can resume.

UpdateFC carries a cumulative count — the total number of credits that have ever been available to the transmitter, not just the credits being returned in this DLLP. This means a single delayed UpdateFC DLLP does not create a problem — the next one that arrives will contain the accumulated total, including all the credits the previous DLLP would have returned.
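
A short Python model shows why a cumulative count tolerates a missing UpdateFC. It is a sketch under simplifying assumptions: the counters are plain integers with no wrap-around, and the `Receiver` and `Transmitter` classes are names invented for this example.

```python
class Receiver:
    def __init__(self, initial_credits: int):
        self.allocated = initial_credits   # cumulative credits ever granted

    def free_buffers(self, credits: int) -> int:
        """Free buffer space; return the value the next UpdateFC would carry."""
        self.allocated += credits
        return self.allocated


class Transmitter:
    def __init__(self, initial_credits: int):
        self.limit = initial_credits       # CREDITS_LIMIT
        self.consumed = 0                  # CREDITS_CONSUMED

    def send(self, credits: int) -> None:
        self.consumed += credits

    def on_update_fc(self, cumulative: int) -> None:
        self.limit = cumulative            # overwrite with the total, do not add

    def available(self) -> int:
        return self.limit - self.consumed


rx, tx = Receiver(10), Transmitter(10)
tx.send(10)                        # all 10 credits used, transmitter stalls
dropped = rx.free_buffers(2)       # UpdateFC(12) never reaches the transmitter
delivered = rx.free_buffers(3)     # UpdateFC(15) does arrive
tx.on_update_fc(delivered)
print(dropped, delivered)          # 12 15
print(tx.available())              # 5
```

With a delta scheme the transmitter would have seen only the 3 credits in the second DLLP. With the cumulative value it recovers all 5.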

📋 Counters Inside the Transmitter

The transmitter maintains two counters per credit type, per Virtual Channel. Together they implement the credit check:

| Counter | Width | Meaning | Updated when |
| --- | --- | --- | --- |
| CREDITS_LIMIT | 8 bits (header) / 12 bits (data) | Cumulative total of credits the receiver has granted, modulo the counter width. Starts at the InitFC advertisement value. | InitFC received (set initial value). UpdateFC received (overwritten with the cumulative value carried in the DLLP). |
| CREDITS_CONSUMED | 8 bits (header) / 12 bits (data) | Cumulative total of credits spent by every TLP sent since initialisation, modulo the counter width. This is a wrapping counter and is never decremented. | Incremented by the credit cost of each TLP sent. |

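Available credits at any moment = (CREDITS_LIMIT − CREDITS_CONSUMED) modulo the counter range: 2⁸ for header credits and 2¹² for data credits. Because both counters wrap, the comparison is done in modular arithmetic rather than as a plain subtraction. The result is unambiguous only while the two values are never more than half the counter range apart (128 for header counters, 2048 for data counters), which is why the receiver may never have more than that many credits outstanding.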
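
One common way to write the wrap-safe check is shown below as a Python sketch. The function `may_send` is a made-up name, the counter width is a parameter, and the examples use the 12-bit data-credit width. The test passes when the headroom left after the TLP, computed modulo the counter range, is no more than half the range.

```python
def may_send(limit: int, consumed: int, needed: int, field_bits: int = 12) -> bool:
    """Wrap-safe credit check on modulo-2**field_bits counters."""
    modulus = 1 << field_bits
    headroom = (limit - (consumed + needed)) % modulus
    # A small result means credits remain; a result above half the range
    # means the subtraction "went negative" and wrapped around.
    return headroom <= modulus // 2


def consume(consumed: int, needed: int, field_bits: int = 12) -> int:
    """Advance CREDITS_CONSUMED, wrapping at the counter width."""
    return (consumed + needed) % (1 << field_bits)


print(may_send(limit=12, consumed=10, needed=2))    # True:  exactly 2 available
print(may_send(limit=12, consumed=10, needed=3))    # False: 1 credit short
# CREDITS_LIMIT has wrapped past 4095 (true total 4101) while CONSUMED is 4090:
print(may_send(limit=5, consumed=4090, needed=10))  # True:  11 available
print(may_send(limit=5, consumed=4090, needed=16))  # False: only 11 available
print(consume(4090, 10))                            # 4
```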

📋 Minimum and Maximum Advertisements

| Credit type | Minimum advertisement | Maximum advertisement |
| --- | --- | --- |
| PH | 1 credit (covers one 4DW request header + ECRC = 5 DW = 20 bytes) | 127 credits (127 × 20 bytes = 2,540 bytes) |
| PD | Credits for the largest Max_Payload_Size supported by the device (e.g. 1024B MPS = 64 credits) | 2047 credits (2047 × 16 bytes = 32,752 bytes) |
| NPH | 1 credit | 127 credits |
| NPD | 1 credit (2 credits for AtomicOp capable devices) | 2047 credits by field width, although far fewer are useful because NP data always travels with an NP header |
| CPLH | 1 credit (switches and P2P RC) · Infinite (endpoints, standard RC) | 127 credits (switches, P2P RC) · Infinite (endpoints, standard RC) |
| CPLD | Credits for max MPS or max read request size, whichever is smaller (switches, P2P RC) · Infinite (endpoints, standard RC) | 2047 credits (switches, P2P RC) · Infinite (endpoints, standard RC) |

The maxima sit one below half the counter range (128 for 8-bit header counters, 2048 for 12-bit data counters) so that the wrap-around comparison stays unambiguous.

The minimum PD advertisement matters. The PD advertisement must cover at least the largest payload the device might receive. If a device advertised PD=4 credits (64 bytes) and the sender had a 256-byte MWr (16 PD credits) to send, the sender would stall forever, because a 4-credit buffer can never return the 16 credits that TLP needs. The minimum ensures that any single maximum-size MWr can always be accepted.

📋 Per-Virtual-Channel Independence

All six credit types are tracked independently per Virtual Channel. A 6-credit-type × 8-VC system has 48 independent credit pools. Each VC initialises its own flow control when that VC is enabled by software — VC0 initialises automatically (it is always enabled), VC1–VC7 initialise when software enables them.

This per-VC independence is what enables Quality-of-Service guarantees. A high-priority video stream in VC7 can be guaranteed a minimum bandwidth allocation because its credit pool is completely separate from the VC0 best-effort DMA traffic pool. Even if VC0 PD credits are exhausted by a flood of best-effort writes, VC7 PD credits are unaffected and video TLPs continue to flow.
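
The 48 pools can be pictured as a table keyed by (VC, credit type). The Python sketch below is purely illustrative: the dictionary layout, the helper `available`, and the initial limit of 64 credits per pool are all assumptions made for the example.

```python
from itertools import product

CREDIT_TYPES = ("PH", "PD", "NPH", "NPD", "CPLH", "CPLD")
NUM_VCS = 8

# One independent limit/consumed pair per (VC, credit type).
# The starting limit of 64 is an arbitrary placeholder.
pools = {
    (vc, ct): {"limit": 64, "consumed": 0}
    for vc, ct in product(range(NUM_VCS), CREDIT_TYPES)
}


def available(vc: int, credit_type: str) -> int:
    pool = pools[(vc, credit_type)]
    return pool["limit"] - pool["consumed"]


pools[(0, "PD")]["consumed"] = 64   # VC0 best-effort writes use up their PD credits
print(len(pools))                   # 48
print(available(0, "PD"))           # 0
print(available(7, "PD"))           # 64
```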

Flow Control in Gen 6

The flow control credit model (six types, credit unit sizes, FC_INIT handshake, infinite completion credits, UpdateFC return) carries over to Gen 6. Flow control remains a Transaction Layer and Data Link Layer mechanism. What Gen 6 changes is the packaging underneath it: in Flit Mode, TLPs and Data Link Layer information travel inside fixed-size flits instead of as individually framed packets.

FC DLLPs inside Gen 6 flits

In Gen 1–5, FC DLLPs are sent as separately framed packets interleaved between TLPs. In Gen 6 Flit Mode there are no inter-packet gaps: TLPs are packed continuously into 256-byte flits. FC information travels inside the flit alongside the TLPs, in a small region of each flit reserved for Data Link Layer content. The receiver's Data Link Layer extracts that content from the flit and passes the TLPs to the Transaction Layer.

The credit information itself (type, VC, HdrFC, DataFC) means the same thing as before. Only the packaging changes: the flit's own CRC and FEC protect everything inside it, so the DLLP content does not need its separate 16-bit CRC. An UpdateFC-P carrying HdrFC=42, DataFC=100 conveys exactly the same credit values in Gen 6 as in Gen 1.

Credit check is unchanged per TLP

In Gen 6, multiple TLPs are packed into one flit. The credit check still happens per-TLP, not per-flit. Before each TLP is added to a flit, the transmitter checks whether sufficient credits exist for that TLP. The flit packing is a Physical Layer concern — by the time the TL hands TLPs to the DLL for flit packing, the credit check has already been performed and cleared.

Gen 6 and high-bandwidth credit scaling

At ~122 GB/s on a Gen 6 x16 link, credit pools drain far faster than at Gen 1 speeds. A PD credit pool of 128 credits (2048 bytes) takes roughly 8 µs to drain on a Gen 1 x1 link (~250 MB/s) and about 0.5 µs on a Gen 1 x16 link (~4 GB/s). On a Gen 6 x16 link the same pool drains in about 17 ns. This means UpdateFC DLLPs must be sent far more frequently in Gen 6 systems to avoid credit starvation. Hardware implementations of Gen 6 endpoints must service their UpdateFC generation at much tighter intervals than Gen 1 designs. The protocol is unchanged, but the timing expectations are far more aggressive.
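
A back-of-envelope calculation makes the scaling concrete. The Python sketch below assumes approximate payload bandwidths of 0.25 GB/s for Gen 1 x1, 4 GB/s for Gen 1 x16, and the ~122 GB/s figure quoted above for Gen 6 x16. It ignores header and protocol overhead, and `drain_time_ns` is a name made up for this example.

```python
def drain_time_ns(credits: int, bytes_per_credit: int, bandwidth_gb_per_s: float) -> float:
    """Time to send one full credit pool at line rate, in nanoseconds."""
    pool_bytes = credits * bytes_per_credit
    return pool_bytes / bandwidth_gb_per_s   # bytes / (GB/s) comes out in ns


POOL = (128, 16)   # 128 PD credits of 16 bytes = 2048 bytes

print(round(drain_time_ns(*POOL, 0.25)))      # 8192  ns, Gen 1 x1
print(round(drain_time_ns(*POOL, 4.0)))       # 512   ns, Gen 1 x16
print(round(drain_time_ns(*POOL, 122.0), 1))  # 16.8  ns, Gen 6 x16
```

The ratio between the first and last figure is roughly 500×, which is the factor by which credit return has to speed up (or buffers have to grow) to keep the link busy.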

Gen 6 practical implication for RTL designers. The FC credit types, unit sizes, InitFC1/2 handshake logic, infinite credit encoding (0x000), and UpdateFC format are all identical to Gen 1–5. What changes is the UpdateFC generation frequency and the flit unpacking logic that extracts FC DLLPs from incoming flits. Credit pool management logic needs no changes for Gen 6 compatibility.

📋 Quick Reference

| Item | Value / Rule |
| --- | --- |
| Purpose of flow control | Prevent receiver buffer overflow. Transmitter never sends more than available credits allow. No drops, no retries. |
| Six credit types | PH (Posted Header) · PD (Posted Data) · NPH (Non-Posted Header) · NPD (Non-Posted Data) · CPLH (Completion Header) · CPLD (Completion Data) |
| Header credit unit size | 1 credit = 1 header slot. Request headers = 5 DW (20 bytes). Completion headers = 4 DW (16 bytes). |
| Data credit unit size | 1 credit = 4 DW = 16 bytes. All data credit types (PD, NPD, CPLD) use the same 16-byte unit. |
| Credits per TLP | 1 header credit + ⌈payload_bytes / 16⌉ data credits. For TLPs with no payload (MRd), only 1 header credit. |
| FC DLLP format | 6 bytes: Type byte (with VC ID) · HdrFC (8-bit) · DataFC (12-bit) · 16-bit CRC. Shared by InitFC1, InitFC2, UpdateFC. |
| FC_INIT1 | Both devices continuously send three InitFC1 DLLPs (P, NP, Cpl in that order) advertising receive buffer sizes. Repeats until the neighbour's values for all three categories are recorded. |
| FC_INIT2 | After recording the neighbour's credits, send three InitFC2 DLLPs to confirm. When both sides complete FC_INIT2, the link enters DL_Active and TLPs may flow. |
| TLPs blocked until… | Both FC_INIT1 and FC_INIT2 phases complete on the link. No TLP may be sent before DL_Active. |
| Infinite credits encoding | InitFC1/2 with HdrFC = 0 and/or DataFC = 0 means infinite credits for that type. No UpdateFC is needed for infinite-credit pools. |
| Who advertises infinite CPLH/CPLD | Endpoints and Root Complexes without peer-to-peer support. Required, because otherwise deadlock is possible when all completion credits are in use. |
| CREDITS_LIMIT counter | Transmitter counter holding the cumulative total of credits granted by the receiver. Set by InitFC, overwritten by each UpdateFC. |
| CREDITS_CONSUMED counter | Transmitter counter holding the cumulative total of credits spent by TLPs sent. Incremented per TLP sent, never decremented. |
| Available credits formula | Available = (CREDITS_LIMIT − CREDITS_CONSUMED) mod 2⁸ for header credits, mod 2¹² for data credits. TLP sent only if Available ≥ credits_needed. |
| UpdateFC | Receiver sends periodically as it processes TLPs and frees buffer space. Carries the cumulative total credit count, not a delta. Transmitter updates CREDITS_LIMIT. |
| Minimum PH/NPH | 1 credit minimum advertisement. Ensures at least one request can always be accepted. |
| Minimum PD | Credits for the largest Max_Payload_Size the device supports. Ensures a maximum-size MWr can always be received. |
| Maximum PH/NPH/CPLH | 127 credits (CPLH limit applies to devices with finite completion credits). |
| Maximum PD/CPLD | 2047 credits (for devices with finite credit pools). |
| Per-VC independence | All six credit types tracked independently per Virtual Channel. VC0 always initialised. VC1–7 initialise when enabled by software. |
| Gen 6 impact | Credit types, unit sizes, InitFC handshake, infinite credit encoding, and UpdateFC semantics carry over. FC information is embedded in flits. Credit check is still per-TLP. UpdateFC frequency must increase in proportion to link bandwidth. |