PCIe Series — PCIe-26: Device Power States — D0 to D3 — VLSI Trainers

Device Power States — D0 to D3

The four PCIe device power states in full detail — D0 Uninitialized vs D0 Active, the optional D1 and D2 intermediate states, D3hot with its No Soft Reset rule, D3cold with auxiliary power, PME context preservation, wake signalling, state transition delays, and how D-states interact with link L-states in Gen 6 systems.

📋 D-States and L-States — Two Separate Systems

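PCIe power management operates at two distinct levels that are coupled but not identical: device power states (D-states), which software sets per function by writing the PMCSR, and link power states (L-states), which the link hardware (LTSSM) manages on each link.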

The coupling between them: when software places a device in D1, D2, or D3hot, the device autonomously triggers an L1 transition on its link — hardware handles this without further software involvement. When software returns a device to D0, the link exits L1 automatically as the first configuration write causes Recovery. This coupling is summarised throughout this post.

D-states originate from the ACPI specification (D0–D3) and the PCI Bus Power Management Interface Specification. PCIe inherits them fully and makes the PM Capability structure (ID 01h) mandatory for all functions — unlike PCI where it was optional.

📋 Device Power State Map

PCIe Device Power States: Power vs Capability Trade-off

State | Summary | Accepted traffic | Context | Link state | PMCSR [1:0] | Required
D0 | Fully operational | All transactions | Maintained | L0 / L0s / L1 (ASPM on link) | 00b | Mandatory
D1 | Light sleep | Config + Msgs only | May be lost | L1 (forced) | 01b | Optional
D2 | Deep sleep | Config + Msgs only | May be lost | L1 (forced) | 10b | Optional
D3hot | Full off, power on | Config + PME only | Likely lost | L1 (forced) | 11b | Mandatory
D3cold | Power removed | No communication | All lost | L2 (Vaux) or L3 | PMCSR inaccessible | Mandatory

D0 to D3hot are software-controlled: write the PMCSR Power State field [1:0]. D0 and D3 are mandatory. D1 and D2 are optional; a device advertises support in PMC register bits 9 (D1) and 10 (D2). D3cold is a hardware event (Vcc removed): entry by power removal, exit by power restore plus fundamental reset.
Figure 1 — Device power state overview. D0 and D3 are mandatory for all PCIe functions. D1 and D2 are optional; a device declares support in PMC register bits 9 (D1) and 10 (D2). D3cold is entered by hardware (Vcc removal), not software. The PMCSR Power State field controls D0–D3hot transitions. Each state below D0 forces the link to L1, which stays in L1 until the next configuration access.
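As a minimal sketch of how software could discover which D-states a function implements, the snippet below decodes a raw 16-bit PMC value. The bit positions (D1_Support = bit 9, D2_Support = bit 10) follow the PCI PM register layout and match the Linux constants PCI_PM_CAP_D1 (0x0200) and PCI_PM_CAP_D2 (0x0400); the function name and the idea of passing PMC in as a plain integer are illustrative assumptions.

```python
# PMC (Power Management Capabilities) is the 16-bit register at offset 02h
# of the PM Capability structure. Only the two optional-state bits are used.
PMC_D1_SUPPORT = 1 << 9    # D1_Support
PMC_D2_SUPPORT = 1 << 10   # D2_Support


def supported_d_states(pmc: int) -> list[str]:
    """Return the D-states a function implements, given its raw PMC value.

    D0, D3hot and D3cold are always present; D1 and D2 are added only
    when the corresponding PMC bit is set.
    """
    states = ["D0"]
    if pmc & PMC_D1_SUPPORT:
        states.append("D1")
    if pmc & PMC_D2_SUPPORT:
        states.append("D2")
    states += ["D3hot", "D3cold"]
    return states


if __name__ == "__main__":
    # Hypothetical PMC value: version field = 3, D1 supported, D2 not.
    print(supported_d_states(0x0203))
```

A driver would normally make this check once at probe time and never request a state the function does not advertise.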

📋 D0 — Full On

D0 is the only fully operational state. The function can originate any PCIe transaction type, respond to any request, generate interrupts, and perform DMA. All PCIe functions must implement D0 — it is the state in which drivers load, initialise, and operate the device.

D0 is the only state where ASPM may operate on the link. In all other D-states, the link is forced to L1 (or lower) because the device cannot respond quickly enough to the exit latency requirements of L0s. When a device is in D0, the LTSSM is free to enter L0s and L1 ASPM based on traffic patterns.

Technically, D0 has two sub-states: D0 Uninitialized and D0 Active. These are not directly controlled by the PMCSR register — D0 Uninitialized is entered automatically after reset, and D0 Active is entered when the driver finishes configuration.

📋 D0 Uninitialized vs D0 Active

D0 Sub-States: Uninitialized and Active

D0 Uninitialized
  - Entered after: Fundamental Reset, or D3hot→D0 (if No Soft Reset = 0)
  - Only configuration transactions accepted
  - Command register enable bits at default (all 0): no DMA, no MMIO
  - Registers at reset default values; BAR contents may be lost
  - Driver must re-program all configuration before the device is usable

D0 Active
  - Entered after: driver completes configuration and sets the Command register
  - All transaction types permitted: MRd, MWr, IO, Config, Messages
  - Bus Master Enable set: DMA can be initiated by the device
  - Memory/IO Space Enable set: device responds to BAR accesses
  - ASPM on link: L0s and L1 may operate per the ASPM Control setting
Figure 2 — D0 sub-states. D0 Uninitialized is the state after any reset or after a D3hot→D0 transition where context was not retained. The device only accepts configuration reads and writes. D0 Active is the normal operating state — the driver has programmed BARs, enabled the Command register, and the device is fully functional. The transition from Uninitialized to Active is entirely handled by software (driver init sequence).
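Because the D0 sub-state is not a PMCSR field, software infers it from the Command register. The sketch below assumes the usual rule that a function counts as D0 Active once any of I/O Space Enable, Memory Space Enable, or Bus Master Enable is set; the helper name is hypothetical and the Command value is passed in as a plain integer.

```python
# Command register (configuration offset 04h) enable bits.
CMD_IO_SPACE = 1 << 0     # I/O Space Enable
CMD_MEM_SPACE = 1 << 1    # Memory Space Enable
CMD_BUS_MASTER = 1 << 2   # Bus Master Enable
CMD_ENABLES = CMD_IO_SPACE | CMD_MEM_SPACE | CMD_BUS_MASTER


def d0_substate(command: int) -> str:
    """Classify a function that is already in D0 (PMCSR[1:0] = 00b).

    Any enable bit set means the driver has moved the function out of
    D0 Uninitialized; all enables clear means it is still uninitialized.
    """
    return "D0 Active" if command & CMD_ENABLES else "D0 Uninitialized"


if __name__ == "__main__":
    print(d0_substate(0x0000))  # enables at reset default
    print(d0_substate(0x0006))  # Memory Space + Bus Master enabled
```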

📋 Dynamic Power Allocation — D0 Substates

The PCIe 2.1 revision added Dynamic Power Allocation (DPA), an optional extended capability (located in extended configuration space, which begins at offset 100h) that defines up to 32 numbered substates within D0. The goal is to allow software to negotiate fine-grained power reduction with a device that remains technically in D0 (and therefore does not go offline) but operates at reduced internal performance.

Unlike D1/D2/D3 which take the device partially or fully offline, DPA substates keep the device fully in D0 Active — software can still issue any transaction type. The device internally reduces power (fewer active processing units, reduced clocks on internal logic, power-gated subsystems) while still accepting all requests. Substate 0 always represents the highest power/performance level. Higher substate numbers represent progressively lower power.

DPA is particularly useful for GPUs and AI accelerators where internal compute engines can be clock-gated between workloads without needing a full D3→D0 cycle (which would require driver re-initialisation).

📋 D1 — Light Sleep

D1 is an optional, lightly defined power state. The spec intentionally leaves most of its behaviour device-class-specific. All that is guaranteed is that D1 consumes less power than D0 and more than D2. In practice, D1 is rarely used in modern PCIe deployments; most devices implement only D0 and D3hot. The few properties that are specified concern the forced link state, the requests the device accepts, and the exit delay.

Software must drain outstanding completions before entering D1. Before writing D1 to PMCSR, software must poll the Transactions Pending bit (Device Status register bit 5 in the PCIe Capability structure) until it reads 0. Only then is it safe to request the state change. If software skips this check, a pending completion returning from upstream will arrive when the device is in D1 and may be treated as an Unexpected Completion — corrupting the outstanding transaction.

📋 D2 — Deep Sleep

D2 is the second optional intermediate state, deeper than D1 but less aggressive than D3hot. Like D1, most of its characteristics are device-class-specific. In practice D2 is even less commonly implemented than D1; most PCIe device designers skip directly from D0 to D3hot.

The same pre-transition requirement applies: drain all non-posted requests (poll Transactions Pending = 0) before entering D2.

Property | D1 | D2
Mandatory | No | No
Link state forced | L1 | L1
Accepted requests | Config + Messages | Config + Messages
Can send PME | Yes (if supported) | Yes (if supported)
Context retention | May be lost | May be lost
D→D0 delay | 0 µs | 200 µs minimum
Practical usage | Rare, mostly legacy devices | Very rare, almost never implemented

📋 D3hot — Full Off (Power On)

D3hot is the mandatory deepest software-accessible power state. Software writes PMCSR Power State bits [1:0] = 11b to enter it. The device is maximally powered down while main power (Vcc) remains applied — the device retains just enough logic to respond to configuration accesses and maintain PME capability. Unlike D1/D2 which are lightly defined, D3hot has precise rules:

D3hot: Capabilities and Constraints

What the device CAN do in D3hot
  - Accept configuration reads and writes (mandatory)
  - Accept the PME_Turn_Off broadcast message
  - Send a PME message to request wake-up (if PME supported)
  - Send the PME_TO_ACK message (response to PME_Turn_Off)
  - Send the PM_Enter_L23 DLLP (for the system power-down sequence)
  - Return to D0 when software writes the PMCSR Power State to D0

What the device CANNOT do in D3hot
  - Accept memory read or write requests (UR is returned)
  - Accept I/O transactions
  - Initiate DMA (Bus Master is effectively disabled)
  - Generate interrupts (except the PME message if supported)
  - Guarantee context retention (may be lost)
  - Use I/O or memory BARs (BAR decoding is suspended)
Figure 3 — D3hot capabilities and constraints. The device is largely inactive but configuration space remains accessible. The minimum required capability is responding to configuration reads/writes and the PME_Turn_Off Message. All other request types return UR. The link enters L1 on D3hot entry. A 10 ms delay is required after the PMCSR write before any access (including configuration reads).

📋 No Soft Reset — Context Retention

In early PCIe versions, a D3hot → D0 transition always implied a soft reset — the device re-initialised all registers to their default values, and the driver had to re-program everything from scratch. The 1.2 revision of the PCI PM spec added the No Soft Reset bit in PMCSR to change this:

No Soft Reset Bit (PMCSR bit 3): D3hot→D0 Behaviour

No Soft Reset = 0 (default / legacy)
  - D3hot → D0 transition includes a soft reset
  - All PCI configuration registers reset to power-on defaults
  - All BARs cleared; must be re-programmed by the driver
  - Command register reset; Memory/IO Space Enable cleared
  - Driver must perform full re-initialisation after returning to D0

No Soft Reset = 1 (context retained)
  - D3hot → D0 preserves PCI configuration register context
  - BARs retain their programmed values
  - Command register retains its enable bits
  - Device-specific registers may still need re-initialisation
  - Driver may skip full BAR re-programming and resume faster

Figure 4 — No Soft Reset bit (PMCSR bit 3). When 0 (legacy behaviour), D3hot→D0 is equivalent to a hardware reset: all PCI configuration registers clear, BARs are zeroed, and the driver must do a full re-initialisation. When 1, the device promises to retain its PCI configuration space context across D3hot→D0. Note: device-specific registers (internal hardware state) may still be lost even when No Soft Reset = 1.

Check No Soft Reset before assuming fast D3→D0. Software should always read PMCSR bit 3 after the device enumerates. If No Soft Reset = 0, every D3hot→D0 cycle requires re-programming the BARs and re-initialising the Command register, even if the driver remembers the previous values, because the hardware may have cleared them. If No Soft Reset = 1, the driver can safely skip BAR re-programming and directly re-enable the device. Modern power-managed drivers (Linux power management framework) check this bit during D3→D0 resume.
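The decision described here can be sketched as a small planning function. It takes No_Soft_Reset as PMCSR bit 3 (mask 0x08), which is the position used by the PCI PM 1.2 register layout and by the Linux constant PCI_PM_CTRL_NO_SOFT_RESET (0x0008). The function name and the step strings are illustrative only; a real driver would perform the accesses rather than list them.

```python
PMCSR_NO_SOFT_RESET = 1 << 3  # read-only, set by hardware


def d3hot_resume_steps(pmcsr: int) -> list[str]:
    """List the work needed to bring a function from D3hot back to D0.

    `pmcsr` is the PMCSR value read while the function is still
    accessible (for example at enumeration time).
    """
    steps = ["write PMCSR[1:0] = 00b", "wait 10 ms"]
    if pmcsr & PMCSR_NO_SOFT_RESET:
        # Configuration context survived; only internal state is suspect.
        steps.append("re-initialise device-specific registers as needed")
    else:
        # Treat the function as freshly reset (D0 Uninitialized).
        steps += [
            "restore BARs",
            "restore Command register enables",
            "re-initialise device-specific registers",
        ]
    return steps


if __name__ == "__main__":
    for value in (0x0008, 0x0000):
        print(hex(value), d3hot_resume_steps(value))
```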

📋 D3cold — Full Off (Power Removed)

D3cold is entered when main power (Vcc) is physically removed from the device. This is a hardware event, not a software write — it happens after the L2/L3 Ready handshake completes and the OS/BIOS triggers actual power removal on the platform. All PCIe functions are required to implement D3cold (the specification is that every function must tolerate Vcc removal).

D3cold has distinct characteristics from D3hot:

D3cold ≠ D3hot. These are often confused. D3hot: device still has Vcc, software writes PMCSR to enter and exit, link is in L1, 10 ms recovery delay after PMCSR write to D0. D3cold: Vcc removed, entered by hardware power removal (not PMCSR write), link is in L2 or L3, recovery requires fundamental reset and full re-enumeration. A driver that handles D3hot resume (10 ms wait then re-enable) cannot use the same code path for D3cold resume (full re-init including BAR re-sizing).

📋 Auxiliary Power (Vaux)

Vaux is a secondary 3.3V standby power supply that remains active even when the main power rail (Vcc/+12V/+3.3V main) is removed. Its presence determines whether the device enters L2 (Vaux present) or L3 (no Vaux) when the system powers down to D3cold.

Condition | Link state | Device capability in this state
Vcc present, device in D3hot | L1 | Can respond to config accesses. Can send PME if enabled and powered.
Vcc removed, Vaux present | L2 | Can monitor for external events. Can assert Beacon or WAKE# to request power restore.
Vcc removed, no Vaux | L3 | No capability. Device is completely powerless.
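The table reduces to a short decision function. This is only a sketch of the mapping as stated above; the function name and the boolean inputs are assumptions made for illustration, and the D0 case returns "L0" as shorthand for "L0, with ASPM free to use L0s/L1".

```python
def link_state(vcc: bool, vaux: bool, d_state: str) -> str:
    """Map power-rail presence and D-state to the resulting link state."""
    if not vcc:
        # D3cold: the only question is whether standby power remains.
        return "L2" if vaux else "L3"
    if d_state in ("D1", "D2", "D3hot"):
        return "L1"  # forced by the non-D0 device state
    return "L0"      # D0: ASPM may still move the link to L0s or L1


if __name__ == "__main__":
    print(link_state(True, True, "D3hot"))
    print(link_state(False, True, "D3cold"))
    print(link_state(False, False, "D3cold"))
```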

Vaux-powered devices are commonly found in:

Whether a device supports Vaux operation is declared in the PMC register’s PME Support bits [15:11]. Specifically, bit 15 = PME from D3cold support. If this bit is set, the device can send a PME (via Beacon or WAKE# signal) from D3cold — implying it must have Vaux capability.

📋 PME Context — What Must Be Retained

PME context is the minimal set of state that a device must preserve in a low-power state if it supports PME (Power Management Events). Without PME context, the device cannot detect the event that requires waking, cannot generate the PME message, and cannot correctly re-initialise after the wake-up.

PME context includes:

The requirement by state:

D-state | PME context requirement
D0 | Full context always maintained (device is fully powered)
D1 | Must retain PME context if PME is supported in D1 (PMC bit 12 = 1)
D2 | Must retain PME context if PME is supported in D2 (PMC bit 13 = 1)
D3hot | Must retain PME context if PME is supported in D3hot (PMC bit 14 = 1)
D3cold | Must retain PME context on Vaux if PME is supported in D3cold (PMC bit 15 = 1). Requires Vaux-powered logic.
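A small decoder makes the per-state rule concrete. The sketch assumes the standard PME_Support layout in PMC bits [15:11], one bit per state in the order D0, D1, D2, D3hot, D3cold (the same order as the Linux constants PCI_PM_CAP_PME_D0 through PCI_PM_CAP_PME_D3cold). The function names are hypothetical.

```python
# PME_Support field: PMC bits [15:11], one bit per D-state.
PME_SUPPORT_BIT = {"D0": 11, "D1": 12, "D2": 13, "D3hot": 14, "D3cold": 15}


def pme_capable_states(pmc: int) -> list[str]:
    """States from which the function can signal PME.

    These are also the states in which PME context must be retained.
    """
    return [s for s, bit in PME_SUPPORT_BIT.items() if (pmc >> bit) & 1]


def wake_from_d3cold(pmc: int) -> bool:
    """True if the function claims PME from D3cold, which implies Vaux logic."""
    return bool((pmc >> PME_SUPPORT_BIT["D3cold"]) & 1)


if __name__ == "__main__":
    pmc = 0xC803  # hypothetical value: PME from D0, D3hot and D3cold
    print(pme_capable_states(pmc), wake_from_d3cold(pmc))
```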

📋 PME Message — Wake Signalling

When a device in a low-power state detects an event that requires the system to restore its power (a wake event), it signals this by sending a PME message TLP to the Root Complex. The PME message is a standard PCIe Message TLP routed to the Root Complex. It carries the Requester ID (Bus:Device:Function) of the device that generated the event, allowing PM software to identify exactly which device needs service.

PME Message TLP: Wake Signalling Through the Fabric
  1. Device (in D1/D2/D3hot): event detected; the device exits L1 first (TS1s).
  2. Device sends the PME TLP upstream. Message Code 18h · Routing: Route to Root Complex.
  3. Root Complex: sets PME Status, generates an interrupt, PM software wakes.
  4. CPU / OS: PM ISR runs, writes D0 to PMCSR, device restored.
Figure 5 — PME wake flow. The device detects a wake event while in a low-power state. If the link is in L1, the device first exits L1 (initiating Recovery to L0). Once in L0, it sends the PME Message TLP upstream. The message routes to the Root Complex which records PME Status and generates an interrupt. The PM interrupt service routine reads the Requester ID, identifies the source device, and writes D0 to its PMCSR to restore it to full operation.
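The PM interrupt handler's first job is to turn the Requester ID into a Bus:Device:Function address. A minimal sketch, assuming the conventional 16-bit layout (bus in bits [15:8], device in [7:3], function in [2:0]) and ignoring ARI, where the device field is repurposed; both function names are illustrative.

```python
def decode_requester_id(rid: int) -> tuple[int, int, int]:
    """Split a 16-bit Requester ID into (bus, device, function)."""
    if not 0 <= rid <= 0xFFFF:
        raise ValueError("Requester ID must fit in 16 bits")
    return (rid >> 8) & 0xFF, (rid >> 3) & 0x1F, rid & 0x7


def format_bdf(rid: int) -> str:
    """Render a Requester ID in the familiar bb:dd.f notation."""
    bus, dev, fn = decode_requester_id(rid)
    return f"{bus:02x}:{dev:02x}.{fn}"


if __name__ == "__main__":
    # Hypothetical ID carried by a PME message: bus 3, device 1, function 0.
    print(format_bdf(0x0308))
```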

PME message constraints

📋 State Transitions and Delays

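D-State Transition Diagram
  - Power-on / Reset → D0 Uninit
  - D0 Uninit → D0 Active: driver init
  - D0 Active → D1 (opt): SW writes PMCSR=01b
  - D0 Active → D2 (opt): SW writes PMCSR=10b
  - D0 Active → D3hot: SW writes PMCSR=11b
  - D1 / D2 → D3hot: PMCSR=11b
  - D3hot → D3cold: Vcc removed
  - D3hot → D0 Uninit: PMCSR=00b (10 ms wait)
  - D3cold → D0 Uninit: power restore + reset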
Figure 6 — Full D-state transition diagram. Arrows show allowed transitions. D0 can go to D1, D2, or D3hot directly (software PMCSR writes). D1 can go to D2 or D3hot. D2 can go to D3hot. D3hot becomes D3cold when Vcc is removed (hardware event). Recovery from D3hot/D3cold always lands in D0 Uninitialized, not D0 Active — driver must re-initialise. Note: D1 → D0 is not directly shown but is allowed (PMCSR=00b, zero delay).

Transition | Method | Mandatory delay before first access
D0 → D1 | Software: PMCSR[1:0] = 01b | 0 (immediate)
D0 or D1 → D2 | Software: PMCSR[1:0] = 10b | 200 µs
D0, D1, or D2 → D3hot | Software: PMCSR[1:0] = 11b | 10 ms
D1 → D0 | Software: PMCSR[1:0] = 00b | 0 (immediate)
D2 → D0 | Software: PMCSR[1:0] = 00b | 200 µs
D3hot → D0 | Software: PMCSR[1:0] = 00b | 10 ms
D3hot → D3cold | Hardware: Vcc removal (after L2/L3 Ready handshake) | N/A (platform timing)
D3cold → D0 | Hardware: Vcc restore → Fundamental Reset → enumeration | Platform-specific minimum delay
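The software-controlled rows of the table translate directly into a lookup that doubles as a legality check. This is a sketch only: the dictionary and function names are made up for illustration, and any pair not listed in the table is simply rejected.

```python
# (from_state, to_state) -> minimum delay in microseconds before the
# first access after the PMCSR write. Hardware-driven D3cold rows are
# omitted because their timing is platform-specific.
DELAY_US = {
    ("D0", "D1"): 0,
    ("D0", "D2"): 200,
    ("D1", "D2"): 200,
    ("D0", "D3hot"): 10_000,
    ("D1", "D3hot"): 10_000,
    ("D2", "D3hot"): 10_000,
    ("D1", "D0"): 0,
    ("D2", "D0"): 200,
    ("D3hot", "D0"): 10_000,
}


def transition_delay_us(src: str, dst: str) -> int:
    """Return the mandatory delay for a PMCSR-driven transition."""
    try:
        return DELAY_US[(src, dst)]
    except KeyError:
        raise ValueError(f"{src} -> {dst} is not a listed transition") from None


if __name__ == "__main__":
    print(transition_delay_us("D0", "D3hot"))
    print(transition_delay_us("D2", "D0"))
```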

📋 Pre-Transition Software Requirements

Before writing any power state below D0 to the PMCSR, software must ensure no outstanding non-posted requests are pending. Entering D1/D2/D3hot while completions are in flight leaves orphan transactions that will never receive their completions, potentially hanging the device driver.

  1. Quiesce the driver — stop issuing new DMA requests and new MMIO reads.
  2. Read Device Status register (PCIe Capability DW2 bits [31:16]) bit 5 — Transactions Pending.
  3. If Transactions Pending = 1, wait and re-poll. Allow sufficient time for all outstanding completions to return (this time depends on the device’s completion timeout setting — up to 50 ms by default).
  4. Only when Transactions Pending = 0: write the desired power state to PMCSR[1:0].
  5. Wait the mandatory delay (0 µs, 200 µs, or 10 ms depending on target state) before attempting any access.
Skipping the Transactions Pending check is a common driver bug. Linux PM drivers have had multiple incidents where the D3 path omitted this check. The result is that the device enters D3hot while a DMA completion is in flight — the completion returns after D3hot entry, is treated as an Unexpected Completion (or ignored), and the DMA buffer is never properly released. The kernel’s pci_set_power_state() function handles this correctly, but drivers that bypass it may not.
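The five steps can be sketched against abstract register accessors. Everything about the interface here is an assumption: read_devsta, read_pmcsr, and write_pmcsr are hypothetical callables supplied by the caller, the 50 ms polling budget mirrors the default completion timeout mentioned above, and the driver is assumed to have been quiesced already (step 1). The sketch also masks PME Status before writing back, since that bit is RW1C and a plain read-modify-write would clear a pending event.

```python
import time

DEVSTA_TRANS_PENDING = 1 << 5      # Device Status bit 5
PMCSR_STATE_MASK = 0x3             # PMCSR[1:0]
PMCSR_PME_STATUS = 1 << 15         # RW1C: writing 1 clears it
DELAY_S = {1: 0.0, 2: 200e-6, 3: 10e-3}  # D1, D2, D3hot


def enter_low_power(read_devsta, read_pmcsr, write_pmcsr, target,
                    timeout_s=0.05, sleep=time.sleep):
    """Poll Transactions Pending, then request D1/D2/D3hot.

    Returns False without touching PMCSR if completions are still
    outstanding when the polling budget runs out.
    """
    if target not in DELAY_S:
        raise ValueError("target must be 1 (D1), 2 (D2) or 3 (D3hot)")
    deadline = time.monotonic() + timeout_s
    while read_devsta() & DEVSTA_TRANS_PENDING:
        if time.monotonic() >= deadline:
            return False
        sleep(0.001)
    # Preserve other fields, but never write 1 back to PME Status.
    value = read_pmcsr() & 0xFFFF & ~PMCSR_PME_STATUS
    write_pmcsr((value & ~PMCSR_STATE_MASK) | target)
    sleep(DELAY_S[target])  # mandatory delay before any further access
    return True


if __name__ == "__main__":
    regs = {"pmcsr": 0x0100}  # fake device: PME Enable set, currently D0
    ok = enter_low_power(lambda: 0, lambda: regs["pmcsr"],
                         lambda v: regs.update(pmcsr=v), target=3)
    print(ok, hex(regs["pmcsr"]))
```

Passing the sleep function in as a parameter keeps the sketch testable without real delays.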

📋 PMCSR — The Runtime Control Register

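The PM Control/Status Register (PMCSR) is the primary runtime register for device power management. It sits at DW1 (offset 04h) of the PM Capability structure (Cap ID 01h). The structure's own base offset varies by device; software locates it by walking the capability list that starts at the Capabilities Pointer (configuration offset 34h).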
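Locating PMCSR therefore starts with a capability-list walk. The sketch below operates on a snapshot of the first 256 bytes of configuration space held in a bytes object; how that snapshot is obtained is outside its scope. It relies on standard PCI layout (Status register at 06h with Capabilities List in bit 4, Capabilities Pointer at 34h, each capability starting with an ID byte followed by a next-pointer byte). The function names are hypothetical.

```python
from typing import Optional

PCI_STATUS = 0x06
PCI_STATUS_CAP_LIST = 1 << 4
PCI_CAP_PTR = 0x34
CAP_ID_PM = 0x01


def find_capability(cfg: bytes, cap_id: int) -> Optional[int]:
    """Return the config-space offset of a capability, or None."""
    status = int.from_bytes(cfg[PCI_STATUS:PCI_STATUS + 2], "little")
    if not status & PCI_STATUS_CAP_LIST:
        return None
    ptr = cfg[PCI_CAP_PTR] & 0xFC       # low two bits are reserved
    seen = set()
    while ptr and ptr not in seen:      # stop on a malformed, looping list
        seen.add(ptr)
        if cfg[ptr] == cap_id:
            return ptr
        ptr = cfg[ptr + 1] & 0xFC
    return None


def pmcsr_offset(cfg: bytes) -> Optional[int]:
    """PMCSR lives 04h past the PM Capability base."""
    base = find_capability(cfg, CAP_ID_PM)
    return None if base is None else base + 0x04


if __name__ == "__main__":
    cfg = bytearray(256)                 # fabricated config space
    cfg[PCI_STATUS] = PCI_STATUS_CAP_LIST
    cfg[PCI_CAP_PTR] = 0x40
    cfg[0x40:0x42] = bytes([0x05, 0x50])  # MSI capability, next = 50h
    cfg[0x50:0x52] = bytes([0x01, 0x00])  # PM capability, end of list
    print(hex(pmcsr_offset(bytes(cfg))))
```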

Bit(s) | Field | Access | Purpose
[1:0] | Power State | RW | Current/requested D-state. 00b=D0 · 01b=D1 · 10b=D2 · 11b=D3hot. Write here to change power state. Hardware transitions begin immediately on write, so respect the mandatory delays before accessing the device again.
[2] | Reserved | - | Must return 0 when read. Must not write non-zero values.
[3] | No Soft Reset | RO | When 1: device retains PCI configuration context across D3hot→D0. When 0: registers reset to defaults on D3hot→D0. Hardware-set, not writable by software.
[7:4] | Reserved | - | Must return 0 when read. Must not write non-zero values.
[8] | PME Enable | RW | When 1: device is enabled to generate PME messages from the current power state (if PMC says PME is supported in this state). Set this before entering a low-power state if wake is desired.
[12:9] | Data Select | RW | Selects which metric the Data register (PMCSR bits [31:24]) reports: power consumption or heat dissipation data, indexed per power state. Legacy PCI field rarely used in PCIe.
[14:13] | Data Scale | RO | Scale factor (0.1W per unit or similar) for the Data register reading. Legacy field.
[15] | PME Status | RW1C | Set when the device has sent (or wants to send) a PME message. Sticky: persists until software writes 1 to clear it. PM software clears this bit after servicing the wake event.
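For debugging, it helps to decode a raw PMCSR value into named fields. This sketch covers the low 16 bits only and places No Soft Reset at bit 3, consistent with the PCI PM 1.2 layout and the Linux constant PCI_PM_CTRL_NO_SOFT_RESET (0x0008); the function name and the dictionary keys are illustrative.

```python
D_STATE_NAMES = ("D0", "D1", "D2", "D3hot")


def decode_pmcsr(value: int) -> dict:
    """Break the low 16 bits of PMCSR into its fields."""
    return {
        "power_state": D_STATE_NAMES[value & 0x3],   # bits [1:0]
        "no_soft_reset": bool(value & (1 << 3)),     # bit 3, RO
        "pme_enable": bool(value & (1 << 8)),        # bit 8, RW
        "data_select": (value >> 9) & 0xF,           # bits [12:9]
        "data_scale": (value >> 13) & 0x3,           # bits [14:13]
        "pme_status": bool(value & (1 << 15)),       # bit 15, RW1C
    }


if __name__ == "__main__":
    # Hypothetical readback: D3hot, No Soft Reset, PME enabled and pending.
    for field, val in decode_pmcsr(0x810B).items():
        print(f"{field:14} {val}")
```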

D-States in Gen 6

The D-state model — D0, D0 Uninitialized, D0 Active, D1, D2, D3hot, D3cold — and all their characteristics (context retention, transition delays, PME support, No Soft Reset behaviour, Transactions Pending requirement) are completely unchanged in Gen 6. The PMCSR register layout, PMC register fields, and all PMCSR field definitions are identical across all PCIe generations.

What changes in Gen 6 device power management practice:

Aspect | Gen 6 impact
D-state register format | Unchanged: same PMC and PMCSR layout, same bit definitions, same delays
D3hot→D0 after No Soft Reset | Unchanged: same 10 ms delay, same context retention rules
PME message format | Unchanged: same TLP format, same Message Code, same routing
Transactions Pending check | More critical at Gen 6: higher throughput means more in-flight requests; the Transactions Pending window may be longer for AI accelerators with many outstanding DMA completions
D3hot and L1 coupling | Gen 6 adds L0p within L0 (PCIe-25), but L0p is not used during D3hot. The link remains in L1 while every function of the device is in D3hot (a multi-function device requests L1 only once all of its functions have left D0)
D0 Active and L0p | L0p (the Gen 6 in-band bandwidth reduction) only operates in D0 Active; it requires the device to be fully operational to negotiate bandwidth reduction with the link partner
DPA substates in Gen 6 | More relevant for Gen 6 AI accelerators: allows fine-grained power control of compute engines (SM clusters, HBM memory controllers) without entering D3hot and losing driver state
PME from D3cold via WAKE# | Unchanged: WAKE# sideband signalling works the same at all generations. CXL.mem devices may have additional protocol-level wake mechanisms, but these are in addition to, not a replacement for, the standard PME path.
The D-state model is the stable foundation for all PCIe power management. From PCIe Gen 1 to Gen 6, the same PMCSR register, the same four states, the same Transactions Pending check, and the same 10 ms D3hot→D0 delay have been consistent. Drivers written for Gen 3 power management work correctly on Gen 6 hardware without modification to the D-state code paths.

📋 Quick Reference

Item | Value / Rule
Mandatory D-states | D0 and D3 (D3hot + D3cold). D1 and D2 are optional.
D1/D2 support declaration | PMC register bit 9 = D1 Supported · bit 10 = D2 Supported. Read-only, set by designer.
PMCSR Power State field | Bits [1:0]: 00b=D0 · 01b=D1 · 10b=D2 · 11b=D3hot. Write to change state.
Pre-transition requirement | Poll Device Status bit 5 (Transactions Pending) = 0 before any PMCSR state write below D0.
D0→D1 delay | 0 µs (immediate)
D0/D1→D2 delay | 200 µs minimum before first access after PMCSR write
D0/D1/D2→D3hot delay | 10 ms minimum before first access after PMCSR write
D1→D0 delay | 0 µs (immediate)
D2→D0 delay | 200 µs minimum
D3hot→D0 delay | 10 ms minimum
No Soft Reset (PMCSR bit 3) | 0=D3hot→D0 resets all PCI config registers (driver must re-init). 1=PCI config registers retained across D3hot→D0.
D3hot entry | Software writes PMCSR[1:0]=11b. Device sends PM_Enter_L1 DLLP → link enters L1. PMCSR remains accessible. Only config + PME accepted.
D3cold entry | Hardware event: Vcc removed (after L2/L3 Ready handshake). PMCSR inaccessible. Link enters L2 (Vaux) or L3 (no Vaux).
D3cold exit | Vcc restore → Fundamental Reset → D0 Uninitialized. Full re-enumeration and driver init required.
D0 Uninitialized | After any reset or D3hot→D0 with No Soft Reset=0. Config only. Command register enables cleared. BARs zeroed.
D0 Active | After driver configures BARs and sets Command register. All transaction types enabled. ASPM may operate.
Link state vs D-state | D0→L0 (ASPM free). D1/D2/D3hot→L1 (mandatory, via PM_Enter_L1 DLLP). D3cold→L2 or L3.
PME Enable (PMCSR bit 8) | Must be set by software before entering low-power state if wake is desired. Controls whether device may send PME TLP.
PME Status (PMCSR bit 15) | RW1C. Set when PME sent. PM software clears by writing 1 after handling wake event.
PME from D3cold | PMC bit 15. Uses Beacon or WAKE# sideband (in-band TLP unavailable when link is in L2/L3). Requires Vaux.
PME context | Minimum state device must retain to detect and signal a wake event. Required in any state where PME is supported.
Vaux present → L2 | Device can monitor events, signal WAKE# or Beacon. No PCIe communication possible.
No Vaux → L3 | Device has no power. Cannot detect or signal anything. Wake only by physical power restore.
Gen 6 changes | D-state formats, delays, and protocols unchanged. L0p only available in D0 Active. DPA substates more relevant for AI accelerators. Transactions Pending window may be longer at high throughput.