PCIe Series — PCIe-24: Interrupts — INTx, MSI, and MSI-X — VLSI Trainers
Interrupts — INTx, MSI, and MSI-X
Three interrupt mechanisms in one post — how legacy PCI INTx pins became virtual in-band TLPs, how MSI replaces pins with a single memory write, how MSI-X gives each queue its own independent vector with per-CPU targeting, and exactly which to use and when.
📋 Why Interrupts Exist in PCIe
A PCIe device needs to notify the CPU when an event requires attention — a DMA transfer completed, a new packet arrived, an error occurred, a queue is empty. Without interrupts, the CPU would have to poll device status registers continuously, wasting cycles even when nothing is happening.
PCIe supports three interrupt mechanisms, each a generation of the same idea evolved to address limitations of the previous. All three remain valid in Gen 6 — which mechanism a device uses depends on its capability declaration in configuration space and how the driver configures it.
📋 Interrupt Evolution: Pins → Messages
Figure 1 — Three PCIe interrupt mechanisms. INTx is the legacy PCI model emulated in-band. MSI replaced physical pins with a single memory write. MSI-X extended MSI to 2048 independent vectors with per-CPU targeting — addressing all MSI limitations for modern multi-queue devices.
📋 Legacy PCI INTx# Pins
The original PCI interrupt model uses physical open-drain wires: INTA#, INTB#, INTC#, and INTD#. These are active-low signals — a device asserts its interrupt by pulling the line low. Because they are open-drain, multiple devices can share the same physical wire by pulling it low simultaneously without electrical damage. The interrupt controller sees the shared line as asserted if any device is pulling it.
Each PCI function declares which pin it uses in the Interrupt Pin register at offset 3Ch [15:8]:
00h — no interrupt pin (device uses polling or MSI only)
01h — INTA#
02h — INTB#
03h — INTC#
04h — INTD#
The Interrupt Line register at offset 3Ch [7:0] stores the IRQ number (0–254) assigned by system firmware. This register has no hardware effect — it is purely a software convention for drivers to know which interrupt controller input to attach their handler to. Value FFh means “not connected to any interrupt controller input.”
When a function generates an interrupt, it both asserts its INTx# pin and sets the Interrupt Status bit. This is bit 3 of the 16-bit Status register at offset 06h, which is the same as bit 19 of the DWORD at offset 04h. The driver’s interrupt service routine checks its device to confirm that it is the source of the current interrupt. On shared IRQs, the handlers of every driver sharing the line run in sequence until one of them claims the interrupt.
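The register split described above can be sketched in a few lines of Python. This is only an illustration: the function name `decode_interrupt_dword` and the returned dictionary layout are hypothetical, and the input is assumed to be the raw DWORD read from configuration offset 3Ch.

```python
# Interrupt Pin encodings from the list above; other values are reserved.
PIN_NAMES = {0x00: None, 0x01: "INTA#", 0x02: "INTB#", 0x03: "INTC#", 0x04: "INTD#"}


def decode_interrupt_dword(dword_3c):
    """Split the config-space DWORD at offset 3Ch into Interrupt Line and Pin."""
    line = dword_3c & 0xFF         # bits [7:0]  = Interrupt Line
    pin = (dword_3c >> 8) & 0xFF   # bits [15:8] = Interrupt Pin
    if pin not in PIN_NAMES:
        raise ValueError(f"reserved Interrupt Pin value {pin:#04x}")
    # FFh in Interrupt Line means "not connected to any interrupt controller input".
    return {"irq": None if line == 0xFF else line, "pin": PIN_NAMES[pin]}
```

For example, a DWORD whose low 16 bits are 010Bh decodes to IRQ 11 on INTA#.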
📋 PCIe Virtual INTx — Assert/Deassert Messages
PCIe has no physical interrupt pins on the connector or link. There are no INTA#–INTD# wires. Instead, PCIe emulates the legacy interrupt model using in-band Message TLPs:
Assert_INTA / Assert_INTB / Assert_INTC / Assert_INTD — sent upstream when the device’s virtual INTx wire transitions from inactive to active (equivalent to pulling the physical pin low).
Deassert_INTA / Deassert_INTB / Deassert_INTC / Deassert_INTD — sent upstream when the virtual wire transitions from active to inactive (releasing the physical pin).
This pair of TLPs emulates the level-sensitive nature of the physical signal. The Root Complex tracks the assertion state of all four virtual wires for each downstream port and drives the corresponding interrupt controller input based on the cumulative Assert/Deassert state.
Figure 2 — Virtual INTx timing. When the device needs service it sends Assert_INTA upstream. The Root Complex marks that port’s virtual INTA wire as active and triggers the interrupt controller. When the driver reads and clears the device’s interrupt status, the device sends Deassert_INTA. The Root Complex deactivates the virtual wire. Only one Assert is needed even if the interrupt fires repeatedly before the driver clears it.
📋 INTx TLP Format
INTx messages are Message TLPs without data. Like all Message requests, they use a 4-DW header (Fmt = 001b). The Message Code field encodes both the direction (Assert vs Deassert) and the specific pin (INTA–INTD), while the low three bits of the Type field carry the routing. The Routing field uses “Local - Terminate at Receiver” rather than “Route to Root” for two reasons: each intermediate bridge may remap the virtual wire to a different pin (see Swizzling below), and the TLP should be absorbed at each bridge rather than passed transparently.
Figure 3 — INTx Message TLP header. The header is four DWs; for INTx messages the third and fourth DWs are reserved. Routing field = 100b (Local) means each bridge in the path terminates the message and sends its own message upstream with potential pin remapping, so the TLP is never forwarded transparently. The Message Code selects which virtual pin (INTA–INTD) is being asserted or deasserted.
| Message Code | Message | Meaning |
| --- | --- | --- |
| 20h | Assert_INTA | INTA virtual wire: inactive → active |
| 21h | Assert_INTB | INTB virtual wire: inactive → active |
| 22h | Assert_INTC | INTC virtual wire: inactive → active |
| 23h | Assert_INTD | INTD virtual wire: inactive → active |
| 24h | Deassert_INTA | INTA virtual wire: active → inactive |
| 25h | Deassert_INTB | INTB virtual wire: active → inactive |
| 26h | Deassert_INTC | INTC virtual wire: active → inactive |
| 27h | Deassert_INTD | INTD virtual wire: active → inactive |
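The eight codes follow a regular pattern: bit 2 distinguishes Deassert from Assert and bits [1:0] select the pin. A minimal sketch of that encoding follows; the helper names `intx_message_code` and `decode_intx_message_code` are made up for illustration.

```python
PINS = ("A", "B", "C", "D")


def intx_message_code(pin, asserted):
    """Return the Message Code for Assert_INTx / Deassert_INTx, pin in 'A'..'D'."""
    if pin not in PINS:
        raise ValueError(f"unknown pin {pin!r}")
    # 20h base, +4 for Deassert, + pin index (A=0 .. D=3).
    return 0x20 | (0x00 if asserted else 0x04) | PINS.index(pin)


def decode_intx_message_code(code):
    """Map a Message Code in 20h..27h back to its message name."""
    if not 0x20 <= code <= 0x27:
        raise ValueError(f"{code:#04x} is not an INTx Message Code")
    kind = "Deassert" if code & 0x04 else "Assert"
    return f"{kind}_INT{PINS[code & 0x03]}"
```

Decoding 25h with this sketch gives Deassert_INTB, matching the table.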
📋 INTx Mapping, Swizzling, and Collapsing
Mapping (Swizzling)
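When a Switch or Root Port forwards an INTx message upstream, it may remap the virtual interrupt wire to a different pin. This is called swizzling, and the mapping is fixed per port by the device (slot) number on the downstream side. With INTA = 0, INTB = 1, INTC = 2, and INTD = 3, the remapping formula is:
upstream pin = (downstream pin + device number) mod 4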
This rotation ensures that devices with different slot numbers use different IRQ inputs at the interrupt controller, preventing all slots from sharing a single IRQ. A device in slot 0 with INTA maps to INTA; slot 1 with INTA maps to INTB; slot 2 with INTA maps to INTC; and so on.
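The rotation can be sketched as follows, assuming the conventional bridge mapping of (pin + device number) mod 4, which is consistent with the slot examples above. The function name `swizzle` is hypothetical.

```python
PINS = ("INTA", "INTB", "INTC", "INTD")


def swizzle(pin, device_number):
    """Return the upstream virtual wire for a downstream pin and device number."""
    # Rotate the pin index by the device number, wrapping after INTD.
    return PINS[(PINS.index(pin) + device_number) % 4]
```

Because each bridge applies the same rotation, a message that crosses two bridges is rotated twice: `swizzle(swizzle("INTA", 1), 2)` lands on INTD.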
Collapsing
Because the virtual wires behave like wire-ORed signals, a Switch must never send two consecutive Assert messages for the same virtual wire without an intervening Deassert. If two devices at different downstream ports both assert INTA, the switch sends one Assert_INTA upstream when the first one fires. The second Assert from the other device is collapsed — absorbed silently because the wire is already asserted. When one device deasserts but the other is still asserting, no Deassert message is sent upstream — the shared virtual wire stays asserted. Only when the last device deasserts does the Deassert message flow upstream.
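The collapsing rule amounts to forwarding a message only when the wire-ORed state changes. A small model of one virtual wire inside a switch is sketched below; the class `VirtualWire` and its `receive` method are illustrative names, not part of any real API.

```python
class VirtualWire:
    """Tracks which downstream ports currently assert one virtual INTx wire."""

    def __init__(self, name="INTA"):
        self.name = name
        self.asserting = set()  # downstream ports with the wire asserted

    def receive(self, port, asserted):
        """Process an Assert/Deassert from a port.

        Returns the message to send upstream, or None if it is collapsed.
        """
        was_active = bool(self.asserting)
        if asserted:
            self.asserting.add(port)
        else:
            self.asserting.discard(port)
        is_active = bool(self.asserting)
        if is_active == was_active:
            return None  # wire-OR state unchanged: nothing goes upstream
        return f"{'Assert' if is_active else 'Deassert'}_{self.name}"
```

With two ports asserting in turn, only the first Assert and the last Deassert produce an upstream message; the two events in between return None.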
INTx shared interrupts cost performance. When multiple devices share an IRQ through collapsing, the CPU cannot determine which device fired without polling all of them. The interrupt service routine chains through handlers from all sharing devices until one claims the interrupt. On a busy system with 3–4 devices sharing INTA, this can triple or quadruple the interrupt handling latency. This is the primary reason MSI was introduced.
📋 INTx Configuration Registers
| Register | Offset | Access | Function |
| --- | --- | --- | --- |
| Interrupt Pin | 3Ch [15:8] | Read-only | Hardcoded by designer: 0=none, 1=INTA, 2=INTB, 3=INTC, 4=INTD |
| Interrupt Line | 3Ch [7:0] | Read/Write | OS/BIOS writes the assigned IRQ number here. No hardware effect; annotation only for drivers. |
| Interrupt Disable | Command bit 10 | Read/Write | When 1: device must not send Assert_INTx messages, and any virtual wire that is currently asserted must be deasserted. Set to 1 before enabling MSI/MSI-X. |
| Interrupt Status | Status bit 3 (bit 19 of the DWORD at 04h) | Read-only | Set by hardware when a virtual INTx assertion is pending. Cleared when the device’s interrupt cause is cleared. Not affected by Interrupt Disable bit. |
📋 MSI — Message Signaled Interrupt Concept
MSI eliminates the virtual pin entirely. Instead of sending an Assert_INTx Message TLP, the device signals an interrupt by performing a Memory Write transaction (a standard MWr TLP) to an address that software has programmed into the device. On x86 that address lies in the Local APIC range and selects the target CPU core, and the data value carries the interrupt vector number.
The interrupt controller sees this write and immediately delivers the interrupt to the targeted CPU without any acknowledgment protocol. Because the address and data together uniquely identify the device and event, the CPU does not need to poll devices to find the interrupt source. The vector number itself tells the CPU which ISR to call.
Figure 4 — MSI mechanism. The device generates a standard Memory Write TLP targeting the APIC’s memory-mapped register at FEEx_xxxxh (on x86). The data payload is the 32-bit interrupt vector (upper 16 bits always zero). The APIC delivers interrupt vector N to the designated CPU core. The entire transaction is a normal PCIe TLP — no special handling required at any switch or bridge.
📋 MSI TLP — What the Device Sends
An MSI interrupt is delivered as a standard Memory Write TLP. The only things that make it an interrupt are the target address (APIC register) and the data value (vector number). The PCIe fabric treats it identically to any other DMA write — it flows through switches, is subject to flow control, and is protected by LCRC.
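Figure 5 — MSI TLP header. The figure shows a 4DW Memory Write header (Fmt=011b = 4DW + data), which is the format used when the Message Address lies above 4 GB; native PCIe endpoints must support 64-bit addressing. For an address below 4 GB, such as the x86 FEEx_xxxxh range, PCIe requires the 3DW format (Fmt=010b) instead. Length field = 1 DW (single DWORD payload). First BE = 1111b (all bytes valid). Last BE = 0000b (only one DW). The data DW (not shown) carries the 32-bit interrupt vector value: the upper 16 bits are always zero and the lower 16 bits come from the MSI Message Data register.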
The MSI write must have the Relaxed Ordering and No Snoop attributes cleared (both 0). With Relaxed Ordering clear, the interrupt TLP is strictly ordered behind all prior posted writes from the same device. This ordering guarantee is critical for interrupt-driven DMA: when the CPU receives the interrupt vector, all DMA writes that preceded the MSI write in the device’s output are already visible in memory.
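To make the header fields concrete, here is a rough sketch that assembles the DWORDs of an MSI Memory Write. It assumes the usual PCIe header layout (Fmt in bits [31:29] and Type in bits [28:24] of DW0, Attr in bits [13:12], Length in bits [9:0]) and follows the PCIe rule that addresses below 4 GB use the 3DW format. The function `msi_tlp` and its parameters are hypothetical, and the result is a list of DWORD values rather than wire byte order.

```python
def msi_tlp(requester_id, tag, address, data):
    """Build the header and payload DWORDs of an MSI Memory Write TLP."""
    if address & 0x3:
        raise ValueError("MSI address must be DWORD-aligned")
    use_4dw = (address >> 32) != 0
    fmt = 0b011 if use_4dw else 0b010   # 4DW or 3DW header, with data
    tlp_type = 0b00000                  # Memory request
    attr = 0b00                         # Relaxed Ordering = 0, No Snoop = 0
    length = 1                          # one DWORD of payload
    dw0 = (fmt << 29) | (tlp_type << 24) | (attr << 12) | length
    # Last BE = 0000b (single DW), First BE = 1111b.
    dw1 = (requester_id << 16) | (tag << 8) | (0b0000 << 4) | 0b1111
    low = address & 0xFFFFFFFC
    addr_dws = [address >> 32, low] if use_4dw else [low]
    # Upper 16 bits of the payload are zero; the lower 16 carry Message Data.
    return [dw0, dw1, *addr_dws, data & 0xFFFF]
```

Calling it with an x86-style address such as FEE00000h yields a 3DW header, while an address above 4 GB switches to the 4DW form with the upper address in the third DWORD.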
📋 MSI Capability Structure and Registers
The MSI Capability (ID 05h) lives in the PCI-compatible capability space (offsets 40h–FFh). It has four variants based on address width and per-vector masking support. Native PCIe endpoints must implement the 64-bit variant.
Figure 6 — Four MSI Capability variants. Native PCIe endpoints must implement 64-bit addressing. The per-vector masking variants add 32-bit Mask Bits and Pending Bits registers. Each bit in the Mask register controls one vector (bit 0 = base vector). The Pending register tracks which masked vectors have pending interrupts.
MSI Message Control register key bits
| Bit(s) | Field | Access | Meaning |
| --- | --- | --- | --- |
| 0 | MSI Enable | RW | 1 = MSI active. Device uses MWr for interrupts and must not use INTx. Software must not enable MSI and MSI-X at the same time. |
| [3:1] | Multiple Message Capable | RO | How many vectors the device wants. 000=1 · 001=2 · 010=4 · 011=8 · 100=16 · 101=32. Must be power of two. |
| [6:4] | Multiple Message Enable | RW | How many vectors software actually allocated. Same encoding. With N vectors allocated, the device varies the lowest log2(N) bits of Message Data. |
| 7 | 64-bit Address Capable | RO | 1 = Message Address Upper register present. All native PCIe devices must set this. |
| 8 | Per-Vector Masking Capable | RO | 1 = Mask Bits and Pending Bits registers present. |
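The bit fields in this table can be unpacked as shown in the following sketch. The function name `decode_msi_message_control` and the dictionary keys are illustrative choices, and the input is assumed to be the 16-bit Message Control value read from the capability.

```python
def decode_msi_message_control(mc):
    """Decode the key fields of the 16-bit MSI Message Control register."""
    mmc = (mc >> 1) & 0x7   # bits [3:1] Multiple Message Capable
    mme = (mc >> 4) & 0x7   # bits [6:4] Multiple Message Enable
    if mmc > 5 or mme > 5:
        raise ValueError("encodings 110b and 111b are reserved")
    return {
        "enabled": bool(mc & (1 << 0)),
        "vectors_requested": 1 << mmc,   # 000b=1 ... 101b=32
        "vectors_allocated": 1 << mme,
        "addr_64bit": bool(mc & (1 << 7)),
        "per_vector_masking": bool(mc & (1 << 8)),
    }
```

A value of 00A4h, for instance, describes a disabled 64-bit capability that requested four vectors and was granted four.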
📋 MSI Multiple Vectors
When more than one vector is allocated (Multiple Message Enable ≥ 1), the device still holds a single base Message Data value in the register. For each event type, it sends a slightly different data value by modifying the low-order bits: with N vectors allocated, it may change the lowest log2(N) bits. With 4 vectors allocated (Enable = 010b), the device can send Data+0, Data+1, Data+2, or Data+3 for its four distinct events, provided software programmed a base value whose two low bits are zero.
Figure 7 — MSI multiple vectors using one base Message Data value. With 4 messages allocated, the device modifies bits [1:0] of the data value. With 8 messages, it would modify bits [2:0]. The interrupt vectors allocated by the OS must be contiguous — e.g. 49A0h, 49A1h, 49A2h, 49A3h — because MSI has no way to assign non-contiguous vectors.
All MSI vectors share one APIC address. This means all MSI vectors for a device target the same CPU core — only the data value (vector number) differs. If software wants different events to be handled by different CPUs, it cannot use MSI. It must use MSI-X, where each table entry has an independent Message Address that can target any CPU’s APIC.
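As a small worked example, the data values a device may send can be derived from the base Message Data and the Multiple Message Enable encoding. The helper `msi_data_values` is a hypothetical name, and treating a misaligned base as an error is an assumption of this sketch.

```python
def msi_data_values(base_data, mme):
    """List the Message Data values used for 2**mme allocated vectors."""
    count = 1 << mme          # Multiple Message Enable encoding -> vector count
    mask = count - 1          # low-order bits the device is allowed to modify
    if base_data & mask:
        raise ValueError("base Message Data must have its modifiable low bits clear")
    return [base_data | i for i in range(count)]
```

With a base of 49A0h and Enable = 010b this gives 49A0h through 49A3h, the same contiguous run shown in Figure 7.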
▶ MSI Configuration Sequence
1. Walk the PCI capability list from offset 34h. Find Cap ID = 05h (MSI).
2. Read Message Control (bits [8:0]):
   - Bits [3:1] = Multiple Message Capable → how many vectors the device wants
   - Bit 7 = 64-bit Address Capable → which register layout is present
   - Bit 8 = Per-Vector Masking Capable → whether Mask Bits register is present
3. Allocate interrupt vectors from the OS interrupt controller. Allocate a power-of-two count ≤ what the device requested. Get the base vector number (e.g. N) and the APIC address (e.g. FEEFxx0Ch for CPU core X).
4. Write Multiple Message Enable bits [6:4] with the count allocated (same encoding as Capable field).
5. Write Message Address Lower [31:0] = APIC address. If 64-bit capable, write Message Address Upper [63:32] (often 0 on x86).
6. Write Message Data = base interrupt vector N (lower 16 bits of the 32-bit data field).
7. Set Interrupt Disable in the Command register (bit 10 = 1) to disable INTx.
8. Set MSI Enable (Message Control bit 0 = 1). Device is now using MSI.
9. Register the ISR for vector N (and N+1, N+2, … for multiple-vector allocation). In practice the handler should already be in place before MSI Enable is set, so that no interrupt arrives without a handler.
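The sequence above can be sketched against a simulated configuration space. This is a toy model, not driver code: `cfg` is assumed to be a 256-byte `bytearray`, the helper names (`rd16`, `wr16`, `wr32`, `find_capability`, `enable_msi`) are made up, and vector allocation and ISR registration are left to the caller.

```python
import struct

CAP_PTR, COMMAND, CAP_ID_MSI = 0x34, 0x04, 0x05


def rd16(cfg, off):
    return struct.unpack_from("<H", cfg, off)[0]


def wr16(cfg, off, value):
    struct.pack_into("<H", cfg, off, value)


def wr32(cfg, off, value):
    struct.pack_into("<I", cfg, off, value)


def find_capability(cfg, cap_id):
    """Walk the capability list from offset 34h; return the offset or None."""
    ptr = cfg[CAP_PTR] & 0xFC
    for _ in range(48):               # bound the walk against malformed lists
        if ptr == 0:
            return None
        if cfg[ptr] == cap_id:
            return ptr
        ptr = cfg[ptr + 1] & 0xFC     # next-capability pointer
    return None


def enable_msi(cfg, apic_addr, base_vector, vectors_granted):
    """Program and enable MSI; returns the number of vectors enabled."""
    if vectors_granted < 1:
        raise ValueError("need at least one vector")
    cap = find_capability(cfg, CAP_ID_MSI)
    if cap is None:
        raise LookupError("no MSI capability")
    ctrl = rd16(cfg, cap + 2)
    requested = 1 << ((ctrl >> 1) & 0x7)
    # Largest power of two not exceeding both the request and the grant.
    count = 1 << (min(requested, vectors_granted).bit_length() - 1)
    mme = count.bit_length() - 1
    wr32(cfg, cap + 4, apic_addr & 0xFFFFFFFC)        # Message Address Lower
    if ctrl & (1 << 7):                               # 64-bit layout
        wr32(cfg, cap + 8, apic_addr >> 32)           # Message Address Upper
        data_off = cap + 0x0C
    else:
        data_off = cap + 0x08
    wr16(cfg, data_off, base_vector)                  # Message Data
    wr16(cfg, COMMAND, rd16(cfg, COMMAND) | (1 << 10))  # Interrupt Disable
    ctrl = (ctrl & ~(0x7 << 4)) | (mme << 4) | 1      # MME + MSI Enable
    wr16(cfg, cap + 2, ctrl)
    return count
```

The Multiple Message Enable field and MSI Enable are written together here as the final step, which keeps the device from seeing a half-programmed capability.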
📋 MSI-X — Extended MSI Concept
MSI-X was designed to remove all three remaining limitations of MSI:
32-vector limit → MSI-X supports up to 2048 vectors per function. A 100 Gbps NIC with 64 queues can have one vector per queue on each of 32 CPU cores.
Contiguous vectors only → MSI-X allows completely non-contiguous vector assignments. High-priority and low-priority events from the same device can have vectors with different relative priorities in the interrupt controller.
Single APIC address → MSI-X gives each vector its own Message Address. Different vectors can target different CPU cores (or even NUMA nodes), enabling proper multi-core interrupt load balancing without kernel IRQ affinity hacks.
The key architectural difference is where the vector information is stored. MSI stores everything in configuration space (a few registers). MSI-X stores the per-vector information in a table in MMIO space — one 128-bit entry per vector, located in a BAR-mapped region. Only the table pointer and enable bits are in configuration space.
📋 MSI-X Table and PBA
Figure 8 — MSI-X Table. Entry 0 targets CPU core X (APIC FEEFxx0Ch) with vector 0x4A. Entry 1 targets a different CPU core Y (APIC FEEFyy0Ch) with a completely different non-contiguous vector 0x6B. Neither the APIC addresses nor the vector numbers need to be related. The Mask bit in Vector Control [0] individually enables/disables each vector.
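Each 16-byte table entry is laid out as four DWs:

| Entry DW | Contents |
| --- | --- |
| DW0 (+00h) | Message Address Lower [31:0]: DWORD-aligned target address |
| DW1 (+04h) | Message Address Upper [63:32] |
| DW2 (+08h) | Message Data [31:0]: interrupt vector (upper 16 bits zero on x86) |
| DW3 (+0Ch) | Vector Control [31:0]: bit 0 = Mask (1=masked, 0=enabled). All other bits reserved. |

The Pending Bit Array (PBA) sits in a BAR-mapped region located by PBA BIR and PBA Offset and holds one pending bit per vector. The device sets a vector’s bit when that vector fires while masked, and sends the message once the vector is unmasked.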
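A table entry can be modelled as four little-endian DWORDs, assuming the standard layout of address low, address high, Message Data, and Vector Control. The sketch below uses Python's `struct` module; the function names `pack_msix_entry` and `unpack_msix_entry` are hypothetical.

```python
import struct

ENTRY_SIZE = 16  # one 128-bit entry per vector


def pack_msix_entry(address, data, masked):
    """Serialize one MSI-X table entry as four little-endian DWORDs."""
    return struct.pack(
        "<IIII",
        address & 0xFFFFFFFC,     # DW0: Message Address Lower
        address >> 32,            # DW1: Message Address Upper
        data,                     # DW2: Message Data
        1 if masked else 0,       # DW3: Vector Control, bit 0 = Mask
    )


def unpack_msix_entry(raw):
    """Parse a 16-byte MSI-X table entry back into its fields."""
    low, high, data, control = struct.unpack("<IIII", raw)
    return {"address": (high << 32) | low, "data": data, "masked": bool(control & 1)}
```

Entry N of a table would then live at byte offset `N * ENTRY_SIZE` from the table base.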
▶ MSI-X Configuration Sequence
Find Cap ID = 11h in the capability list.
Read Message Control: Table Size [10:0] + 1 = total vectors supported. Read Table BIR and Table Offset. Read PBA BIR and PBA Offset.
Set Function Mask bit (Message Control bit 14 = 1) — globally mask all vectors while programming. This prevents spurious interrupts during table setup.
Set MSI-X Enable (bit 15 = 1). Memory Space Enable must already be set in Command register so the BAR is accessible.
Map the BAR identified by Table BIR into kernel virtual address space.
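For each vector N to configure: write the MMIO table entry at offset (Table_Offset + N×16):
DW0: Message Address Lower = APIC address of target CPU core
DW1: Message Address Upper = upper 32 bits of the address (0 on x86)
DW2: Message Data = interrupt vector for this event
DW3: Vector Control = 0 (bit 0 cleared to unmask the vector)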
Set Interrupt Disable in Command register (bit 10 = 1) to disable INTx.
Clear Function Mask bit (bit 14 = 0) — all configured vectors are now active.
Register ISR handlers for each allocated vector.
Function Mask is atomic. Setting Function Mask before programming the table prevents the device from generating interrupts with partially-written table entries — an entry with old Address but new Data would deliver a spurious interrupt to the wrong CPU. Always use Function Mask as a critical section wrapper around MSI-X table updates.
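The Function Mask bracket around table programming can be sketched against simulated memory. In this toy model `cfg` is a `bytearray` standing in for configuration space, `bar_mem` maps a BIR to a `bytearray` standing in for the mapped BAR, and `program_msix` is a hypothetical helper; the Table Offset/BIR register is assumed to sit at capability offset +04h with the BIR in bits [2:0].

```python
import struct

COMMAND = 0x04
FUNCTION_MASK = 1 << 14
MSIX_ENABLE = 1 << 15


def program_msix(cfg, cap, bar_mem, vectors):
    """Program MSI-X entries from a list of (address, data) pairs.

    Returns the table size advertised by the function.
    """
    ctrl = struct.unpack_from("<H", cfg, cap + 2)[0]
    table_size = (ctrl & 0x7FF) + 1          # Table Size field is N-1 encoded
    if len(vectors) > table_size:
        raise ValueError("more vectors than table entries")
    table_reg = struct.unpack_from("<I", cfg, cap + 4)[0]
    bir, table_off = table_reg & 0x7, table_reg & ~0x7

    # Mask everything and enable MSI-X before touching the table.
    ctrl |= FUNCTION_MASK | MSIX_ENABLE
    struct.pack_into("<H", cfg, cap + 2, ctrl)

    table = bar_mem[bir]
    for n, (address, data) in enumerate(vectors):
        struct.pack_into(
            "<IIII", table, table_off + n * 16,
            address & 0xFFFFFFFC, address >> 32, data, 0,  # Vector Control = 0
        )

    # Disable INTx, then release the function-wide mask.
    cmd = struct.unpack_from("<H", cfg, COMMAND)[0]
    struct.pack_into("<H", cfg, COMMAND, cmd | (1 << 10))
    ctrl &= ~FUNCTION_MASK
    struct.pack_into("<H", cfg, cap + 2, ctrl)
    return table_size
```

All table writes happen while Function Mask is set, so in this model no entry is ever live with a new address and stale data.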
📋 INTx vs MSI vs MSI-X — Decision Guide
Figure 9 — Interrupt mechanism selection guide. INTx is a fallback for legacy scenarios. MSI is the baseline for PCIe-native devices with simple interrupt needs. MSI-X is the correct choice for any device with multiple queues, multiple CPUs, or more than 32 event types. Modern drivers (Linux, Windows) detect and prefer MSI-X automatically when it is present.
| Property | INTx (Virtual) | MSI | MSI-X |
| --- | --- | --- | --- |
| Max vectors per function | 1 (one of INTA–INTD) | 32 | 2048 |
| Vector contiguity required | N/A | Yes (contiguous) | No (any vectors) |
| Per-vector CPU targeting | No (single IRQ) | No (one APIC addr) | Yes (per-entry address) |
| Trigger model | Level-sensitive | Edge-triggered | Edge-triggered |
| Sharing between devices | Yes (wire-ORed) | No | No |
| Configuration location | Config space registers | Config space capability | Config space + MMIO BAR table |
| Interrupt storm risk | High (shared IRQ) | Low | Very low (per-vector masking) |
| Per-vector masking | No | Optional (Mask Bits reg) | Always (per table entry) |
| Boot-time usability | Yes (BIOS) | Limited (OS required) | No (requires OS + driver) |
| Mandatory for PCIe | Yes (always emulated) | MSI or MSI-X required (at least one) | Optional if MSI is implemented |
⚡ Interrupts in Gen 6
All three interrupt mechanisms (INTx virtual wire messages, MSI memory writes, and MSI-X table-based writes) behave the same way in Gen 6. The capability structure layouts and configuration sequences are unchanged. Gen 6’s major changes are in the Physical and Data Link Layers (PAM4 signaling, FEC, and Flit Mode). Flit Mode does repackage TLP headers, but the interrupt semantics that software sees are the same.
What changes in Gen 6 interrupt practice:
MSI-X vector counts saturate. AI accelerators running thousands of parallel compute kernels may approach or reach the 2048 MSI-X vector limit. Architectures using SR-IOV + MSI-X across hundreds of VFs can hit this limit on a single physical function. Future spec extensions (beyond PCIe 6.0) may address this, but for PCIe 6.0 the limit remains 2048.
Interrupt latency still matters. At Gen 6 speeds (64 GT/s PAM4), the throughput is enormous but interrupt latency is still a few microseconds due to PCIe round-trip propagation. High-frequency trading, low-latency storage, and real-time control systems still need careful IRQ affinity and CPU isolation to achieve sub-microsecond interrupt response.
CXL devices use MSI-X. CXL.io devices present as standard PCIe endpoints to interrupt management. They use MSI-X for all interrupt delivery. CXL.cache and CXL.mem protocols have their own notification mechanisms but these are not PCIe interrupts — they operate through CXL-specific MMIO channels.
IDE-protected interrupts. When PCIe IDE (Integrity and Data Encryption) is enabled, MSI writes are encrypted like any other TLP. The interrupt value is decrypted by the Root Complex before delivery to the APIC, ensuring that compromised MSI writes cannot inject false interrupt vectors even with physical access to the PCB.