PCIe Advanced Error Reporting (AER) Explained – Your VLSI Journey Starts Here

📋 Why AER Exists

The basic PCI Status register has only a handful of error bits — detected parity error, signalled system error, received master/target abort. These tell you something went wrong but almost nothing about what. PCI error reporting was designed for a parallel shared bus where all devices could observe each other’s signals; a single SERR# or PERR# pin carried the entire error notification.

PCIe replaces physical error pins with in-band error messages and adds a rich standardised logging structure through the Advanced Error Reporting (AER) capability. AER enables software to determine: which specific error type fired (from a taxonomy of 20+ error categories), which transaction caused the error (via the 128-bit Header Log capturing the guilty TLP’s full header), which error fired first when multiple errors accumulate (First Error Pointer), and whether the error is correctable by hardware or requires software intervention.

AER is implemented as an Extended Capability (Cap ID 0001h) in the extended configuration space (offsets 100h+), accessible only via ECAM. All native PCIe endpoints and Root Ports are strongly recommended to implement AER.
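As a rough illustration of how software locates the capability, the sketch below walks the extended capability list in a config-space image and looks for Cap ID 0001h. The `find_ext_cap` helper and the synthetic 4 KiB image are assumptions made for this example; a real tool would read config space through ECAM or an OS interface. The image deliberately places AER at 148h to show why a walk is more robust than assuming 100h.

```python
import struct

AER_CAP_ID = 0x0001  # Extended Capability ID for AER

def find_ext_cap(cfg, cap_id):
    """Walk the extended capability list; return the offset of cap_id, or None."""
    offset = 0x100                        # extended capabilities start at 100h
    visited = set()                       # guards against a looping list
    while offset and offset not in visited and offset + 4 <= len(cfg):
        visited.add(offset)
        (header,) = struct.unpack_from("<I", cfg, offset)
        if header in (0, 0xFFFFFFFF):     # empty list or no device present
            return None
        if header & 0xFFFF == cap_id:     # bits [15:0] hold the Capability ID
            return offset
        offset = (header >> 20) & 0xFFC   # bits [31:20] hold the next offset
    return None

# Hypothetical config space: Device Serial Number (0003h) at 100h, then AER at 148h.
cfg = bytearray(4096)
struct.pack_into("<I", cfg, 0x100, (0x148 << 20) | (1 << 16) | 0x0003)
struct.pack_into("<I", cfg, 0x148, (0x000 << 20) | (2 << 16) | AER_CAP_ID)

aer_base = find_ext_cap(bytes(cfg), AER_CAP_ID)
print(hex(aer_base))  # 0x148
```

Register offsets quoted in this article as 104h, 108h and so on correspond to `aer_base + 0x04`, `aer_base + 0x08` and so on when the capability is not located at 100h.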

📋 Baseline vs AER Error Reporting

PCIe defines two tiers of error reporting:

Figure 1 — Baseline vs AER. Baseline error reporting (mandatory for all PCIe devices) provides coarse error categorisation through the PCIe Capability Device Control register and the PCI-compatible Command/Status register. AER (Extended Capability 0001h) adds fine-grained per-error-type status, masking, severity control, first-error tracking, and the 128-bit Header Log. Without AER, error investigation is essentially impossible in complex systems.

Baseline Device Control error enable bits (PCIe Capability DW2 bits [3:0])

Bit	Field	Effect when 1
0	Correctable Error Reporting Enable	Device sends ERR_COR message when a correctable error is detected
1	Non-Fatal Error Reporting Enable	Device sends ERR_NONFATAL message for non-fatal uncorrectable errors
2	Fatal Error Reporting Enable	Device sends ERR_FATAL message for fatal uncorrectable errors
3	Unsupported Request Reporting Enable	UR errors are reported as non-fatal errors. When 0: UR errors are silently ignored (no message sent). Must remain 0 during enumeration.
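The table above maps directly onto a small decoder. This is a minimal sketch; the function name and the sample register value 0007h are hypothetical.

```python
# Bit positions taken from the Device Control table above.
DEVCTL_ERROR_ENABLES = {
    0: "Correctable Error Reporting Enable",
    1: "Non-Fatal Error Reporting Enable",
    2: "Fatal Error Reporting Enable",
    3: "Unsupported Request Reporting Enable",
}

def decode_devctl_error_enables(devctl):
    """Return {field name: True/False} for Device Control bits [3:0]."""
    return {name: bool((devctl >> bit) & 1)
            for bit, name in DEVCTL_ERROR_ENABLES.items()}

# Hypothetical value: all three message enables set, UR reporting still off.
enables = decode_devctl_error_enables(0x0007)
for name, on in enables.items():
    print(f"{name}: {'on' if on else 'off'}")
```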

📋 Error Taxonomy — Three Severities

Figure 2 — Three error severities. Correctable errors are handled silently by hardware (LCRC retry, DLLP retransmit) — software may receive ERR_COR messages to track frequency trends but no action is required. Non-fatal uncorrectable errors require software attention but the link remains usable. Fatal uncorrectable errors indicate the link may be non-functional and require a reset. Software controls severity promotion: a non-fatal error can be escalated to fatal via the AER Severity register.

📋 Error Messages — ERR_COR, ERR_NONFATAL, ERR_FATAL

PCIe error messages are Message TLPs with a 4-DW header and no data payload, routed to the Root Complex using Route-to-Root-Complex routing. The detecting device sends the appropriate error message when an error occurs and error reporting is enabled. The message carries the Requester ID of the detecting device so the Root Complex knows the source.

Message	Code	Severity	Link still functional?	Software response
ERR_COR	30h	Correctable	Yes	Optional — monitor frequency. Too many correctable errors may indicate hardware degradation.
ERR_NONFATAL	31h	Non-fatal uncorrectable	Yes	Driver must handle. Read AER Uncorrectable Error Status + Header Log. Retry or abort the failed transaction.
ERR_FATAL	33h	Fatal uncorrectable	No	Mandatory reset. Read AER registers before reset. Reset affected device or link segment. Re-enumerate.

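Error messages must be enabled at every bridge in the path. On each bridge between the error source and the Root Complex, the SERR# Enable bit in the Bridge Control register (bit 1) must be set for error messages arriving on the secondary side to be forwarded to the primary side. For ERR_NONFATAL and ERR_FATAL, the bridge must also be allowed to transmit upstream, through SERR# Enable in its Command register (bit 8) or the matching reporting enable in its Device Control register. If any bridge in the path is not enabled, the error message is silently dropped at that bridge. The baseline reporting enables (Device Control bits [2:0]) on the source device must also be set; they belong to the PCIe Capability, not to AER.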

📋 AER Capability Structure Layout

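Figure 3 — AER Capability structure register map. Five core register groups: Extended Cap Header, Uncorrectable Status/Mask/Severity (3 registers), Correctable Status/Mask (2 registers), AECR (1 register), Header Log (4 DWs). For Root Ports and Root Complex Event Collectors only, three additional registers follow at 12Ch–134h: Root Error Command, Root Error Status, Error Source ID. The absolute offsets used in this article (104h, 108h, and so on) assume the AER capability starts at 100h; if it sits elsewhere in the extended capability list, add the same relative offsets (04h, 08h, ...) to its actual base. All Status registers are RW1C (write 1 to clear). All Mask and Severity registers are RWS (read/write sticky).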

📋 Uncorrectable Error Status Register (offset 104h)

Each bit corresponds to one specific uncorrectable error type. Hardware sets the bit when the error is detected, regardless of masking or severity settings. Software clears bits by writing 1 to them (RW1C). Multiple bits can be set simultaneously.

Bit	Error Name	Default Severity	Description
4	Data Link Protocol Error	Fatal	DLL received ACK/NAK with a sequence number that doesn’t match any unacknowledged TLP or the ACKD_SEQ number. Indicates protocol violation at Data Link Layer.
5	Surprise Down Error	Fatal	Physical Layer reports LinkUp = 0 unexpectedly — link communication failed. Only valid for downstream ports. Fatal because the link is no longer communicating.
12	Poisoned TLP Received	Non-Fatal	Received a TLP with the EP (Error Poisoned) bit set. Data in the TLP is known to be corrupt. Default Non-Fatal because some devices can handle poisoned data (e.g. audio stream).
13	Flow Control Protocol Error	Fatal	Flow control credits exceeded or invalid FC DLLP received. Fatal because FC violations indicate the device is not maintaining correct buffer accounting.
14	Completion Timeout	Non-Fatal	A non-posted request was sent but no completion arrived within the configured timeout period (default 50 µs–50 ms). Non-fatal because a retry is often possible.
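15	Completer Abort	Non-Fatal	The function, acting as Completer, terminated a request with Completer Abort (CA) status because the request violated its programming model or hit an internal error. The error is logged at the Completer; the Requester sees the CA status in the returned completion.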
16	Unexpected Completion	Non-Fatal	Received a completion that does not match any outstanding request tag. May be a mis-routed completion. Advisory non-fatal (handled as ERR_COR) in some scenarios.
17	Receiver Overflow	Fatal	More TLPs arrived than the receive buffer could hold — buffer overflow. Fatal because data was lost.
18	Malformed TLP	Fatal	TLP header violated formatting rules — bad length, mismatched byte enables, payload exceeds Max Payload Size, illegal type field, etc. Fatal because it indicates a serious protocol violation.
19	ECRC Error	Non-Fatal	ECRC check failed on a received TLP — data was corrupted end-to-end. Only set if ECRC checking is enabled. Non-fatal as a retry may succeed.
20	Unsupported Request	Non-Fatal	Completer could not handle the request type. Request was correctly formed but unsupported — e.g. wrong request type for this device.
21	ACS Violation	Non-Fatal	TLP violated Access Control Services policy at a switch port — e.g. peer-to-peer DMA when ACS source validation rejected the requester ID.
22	Uncorrectable Internal Error	Fatal	Internal device error that could not be corrected by hardware. Device-specific: what constitutes an internal error is implementation-defined. Its mask bit defaults to set, so it is not reported until software unmasks it.
23	MC Blocked TLP	Non-Fatal	A multicast TLP was blocked by an egress port configured to deny forwarding multicast to untranslated addresses.
24	AtomicOp Egress Blocked	Non-Fatal	An AtomicOp TLP was blocked at an egress port that does not allow AtomicOps to flow to the downstream device.
25	TLP Prefix Blocked Error	Non-Fatal	A TLP containing an End-to-End TLP Prefix was blocked at an egress port configured to not forward such TLPs.
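A decoder for this register is little more than a lookup over the bit positions in the table above. The sketch below is illustrative only; the function name and the sample status value are assumptions.

```python
# Bit positions and names from the Uncorrectable Error Status table above.
UNCORRECTABLE_BITS = {
    4: "Data Link Protocol Error",
    5: "Surprise Down Error",
    12: "Poisoned TLP Received",
    13: "Flow Control Protocol Error",
    14: "Completion Timeout",
    15: "Completer Abort",
    16: "Unexpected Completion",
    17: "Receiver Overflow",
    18: "Malformed TLP",
    19: "ECRC Error",
    20: "Unsupported Request",
    21: "ACS Violation",
    22: "Uncorrectable Internal Error",
    23: "MC Blocked TLP",
    24: "AtomicOp Egress Blocked",
    25: "TLP Prefix Blocked Error",
}

def decode_uncorrectable(status):
    """List the names of all error bits set in an Uncorrectable Error Status value."""
    return [name for bit, name in sorted(UNCORRECTABLE_BITS.items())
            if (status >> bit) & 1]

# Hypothetical register read: Completion Timeout and Unsupported Request both set.
status = (1 << 14) | (1 << 20)
print(decode_uncorrectable(status))  # ['Completion Timeout', 'Unsupported Request']

# Because the register is RW1C, writing the value just read back clears exactly
# the bits that were observed and leaves any newly set bits alone.
clear_value = status
```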

📋 Uncorrectable Error Severity Register (offset 10Ch)

The Severity register has the same bit positions as the Uncorrectable Error Status register. Each bit controls whether the corresponding error type is treated as Fatal (1) or Non-Fatal (0). The default values reflect the PCIe specification’s judgment of how serious each error is, but software can change them based on application requirements.

Figure 4 — Default severity values. Fatal errors are those that fundamentally break the link protocol or overflow buffers. Non-fatal errors are those where the link remains functional and the error is recoverable by software retry or device driver intervention. Software can override any severity bit — for example, escalating Completion Timeout to Fatal in a mission-critical storage controller that must not silently lose I/O requests.
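Since the Severity register shares bit positions with the Status register, splitting pending errors into fatal and non-fatal is a pair of bitwise operations. The sketch below uses a hypothetical severity value rather than a device's real defaults, and shows the Completion Timeout escalation mentioned above.

```python
def split_by_severity(status, severity):
    """Return (fatal_bits, nonfatal_bits) for the errors set in status."""
    fatal = status & severity
    nonfatal = status & ~severity & 0xFFFFFFFF
    return fatal, nonfatal

COMPLETION_TIMEOUT = 1 << 14
MALFORMED_TLP = 1 << 18

# Hypothetical register contents chosen for the example.
status = COMPLETION_TIMEOUT | MALFORMED_TLP
severity = (1 << 4) | MALFORMED_TLP          # Completion Timeout is non-fatal here

fatal, nonfatal = split_by_severity(status, severity)
print(hex(fatal), hex(nonfatal))             # 0x40000 0x4000

# Escalate Completion Timeout to Fatal by setting its Severity bit.
severity |= COMPLETION_TIMEOUT
fatal_after, nonfatal_after = split_by_severity(status, severity)
print(hex(fatal_after), hex(nonfatal_after)) # 0x44000 0x0
```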

📋 Uncorrectable Error Mask Register (offset 108h)

The Mask register prevents an error message from being sent for the corresponding error type. A masked error still sets its Status bit, but it is not recorded in the Header Log or the First Error Pointer. Masking is useful when a particular error type is expected in a specific context and sending an error message would be misleading.

Mask = 0 (default): error message is sent if reporting is enabled (Device Control bit set) and severity selects the message type.
Mask = 1: no error message is sent for this error type. The Status bit is still set by hardware, but the error does not update the First Error Pointer or the Header Log.

Common masking use cases: masking Completion Timeout during hot-plug removal (completions are expected to not return), masking Unsupported Request during enumeration probing (configuration reads to absent devices are expected to complete with UR and read back as 0xFFFFFFFF), masking advisory errors that generate noise in specific platform configurations.
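The interaction of Mask, Severity, and the Device Control enables can be sketched as a single decision function. This is a simplified model under stated assumptions: it ignores the advisory non-fatal case, the Unsupported Request Reporting Enable bit, and SERR# Enable, and the function name is made up for the example.

```python
ERR_NONFATAL_CODE = 0x31   # message codes from the error message table
ERR_FATAL_CODE = 0x33

DEVCTL_NONFATAL_EN = 1 << 1
DEVCTL_FATAL_EN = 1 << 2

def uncorrectable_message(bit, mask, severity, devctl):
    """Return the message an unmasked, enabled uncorrectable error would send, or None."""
    if (mask >> bit) & 1:
        return None                                   # masked: Status bit only
    if (severity >> bit) & 1:
        return "ERR_FATAL" if devctl & DEVCTL_FATAL_EN else None
    return "ERR_NONFATAL" if devctl & DEVCTL_NONFATAL_EN else None

COMPLETION_TIMEOUT = 14
devctl = 0x0007                                       # hypothetical: all enables set

print(uncorrectable_message(COMPLETION_TIMEOUT, mask=0, severity=0, devctl=devctl))
print(uncorrectable_message(COMPLETION_TIMEOUT, mask=1 << 14, severity=0, devctl=devctl))
print(uncorrectable_message(COMPLETION_TIMEOUT, mask=0, severity=1 << 14, devctl=devctl))
```

The three calls print `ERR_NONFATAL`, `None`, and `ERR_FATAL`: the same error is reported, suppressed, or escalated purely by register settings.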

📋 Correctable Error Status Register (offset 110h)

Correctable errors are automatically fixed by hardware — the bit is set purely for logging purposes. All correctable errors are reported with ERR_COR messages (if enabled) regardless of severity — there is no correctable error severity register because by definition all correctable errors use ERR_COR.

Bit	Error Name	Description
0	Receiver Error	Physical Layer detected an error in an incoming packet — 8b/10b code violation, disparity error, or 128b/130b sync header error. Packet discarded. Link Layer informed. Buffer space released.
6	Bad TLP	Data Link Layer received a TLP with a bad LCRC, an out-of-sequence Sequence Number, or an incorrectly nullified packet. Packet discarded. NAK DLLP sent to trigger retransmission from the Replay Buffer.
7	Bad DLLP	Data Link Layer received a DLLP with a CRC failure. DLLP dropped. A subsequent DLLP of the same type is expected to carry the same information.
8	REPLAY_NUM Rollover	The retry counter has rolled over — a set of TLPs has been transmitted four consecutive times without receiving an ACK, and the counter has returned to zero. Hardware automatically retrains the link.
12	Replay Timer Timeout	Transmitted TLPs did not receive an ACK or NAK within the allowed timeout. Hardware replays all unacknowledged TLPs from the Replay Buffer.
13	Advisory Non-Fatal Error	An uncorrectable error was downgraded to a correctable advisory notification. The corresponding uncorrectable error bit is also set. An ERR_COR is sent here instead of ERR_NONFATAL to avoid confusing the error source identification.
14	Corrected Internal Error	An internal device error was detected and corrected by hardware without any data loss or improper behaviour. Device-specific — e.g. ECC correction on internal SRAM.
15	Header Log Overflow	The Header Log register capacity has been reached — a subsequent error’s header could not be captured. Only relevant when Multiple Header Recording is enabled.
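Because correctable errors mainly matter as a trend, a monitor would typically accumulate counts per error type rather than react to each one. The sketch below decodes the register using the bit positions above and tallies a few hypothetical samples; the polling values are invented for the example.

```python
from collections import Counter

# Bit positions and names from the Correctable Error Status table above.
CORRECTABLE_BITS = {
    0: "Receiver Error",
    6: "Bad TLP",
    7: "Bad DLLP",
    8: "REPLAY_NUM Rollover",
    12: "Replay Timer Timeout",
    13: "Advisory Non-Fatal Error",
    14: "Corrected Internal Error",
    15: "Header Log Overflow",
}

def decode_correctable(status):
    """List the names of all error bits set in a Correctable Error Status value."""
    return [name for bit, name in sorted(CORRECTABLE_BITS.items())
            if (status >> bit) & 1]

# Hypothetical register values seen on three successive polls.
trend = Counter()
for sample in (0x0040, 0x1040, 0x0001):
    trend.update(decode_correctable(sample))

print(trend.most_common(1))  # [('Bad TLP', 2)]
```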

📋 Correctable Error Mask Register (offset 114h)

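Same per-bit structure as the Correctable Error Status register. When a mask bit is set, the corresponding correctable error does not generate an ERR_COR message but still sets its Status bit. Most bits default to clear, so those errors generate ERR_COR if enabled; the Advisory Non-Fatal Error, Corrected Internal Error, and Header Log Overflow mask bits default to set and must be unmasked by software before they are reported.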

A common practical use: masking Replay Timer Timeout and Bad TLP in systems with slightly marginal signal integrity — these correctable errors may occur at low frequency during normal operation, and sending ERR_COR messages for every LCRC retry would generate unnecessary interrupt overhead without requiring any corrective action.

📋 Advanced Error Capability and Control Register (AECR — offset 118h)

The AECR contains the First Error Pointer and ECRC control fields. It is the most operationally important register in the AER structure for error diagnosis.

Bit(s)	Field	Access	Description
[4:0]	First Error Pointer	ROS	The bit position in the Uncorrectable Error Status register of the first unmasked uncorrectable error that was logged. The pointer is only valid while the Status bit it points to is set. When software clears that Status bit by writing 1, the pointer advances to the next-oldest recorded error if Multiple Header Recording is enabled; otherwise it stays stale until the next error is logged. Value of 0–31 maps directly to a bit position.
5	ECRC Generation Capable	RO	Hardware supports generating ECRC on outgoing TLPs (sets TD bit in header and appends ECRC DW). Set by designer.
6	ECRC Generation Enable	RW	When 1: device generates ECRC on all outgoing TLPs. The TD bit in the TLP header is set and a 32-bit CRC DW is appended to the TLP after the data payload.
7	ECRC Check Capable	RO	Hardware supports verifying ECRC on incoming TLPs. Set by designer.
8	ECRC Check Enable	RW	When 1: device checks ECRC on incoming TLPs with TD bit set. ECRC failures set the ECRC Error Status bit and (if not masked and if error reporting enabled) generate ERR_NONFATAL.
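9	Multiple Header Recording Capable	RO	Hardware can record headers for more than one uncorrectable error. Set by designer.
10	Multiple Header Recording Enable	RW	When 1: device records headers for multiple uncorrectable errors (up to a device-specific count). When 0: only the first error’s header is logged, and subsequent errors set the Header Log Overflow bit.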

Using the First Error Pointer. When multiple uncorrectable errors occur close together, all their Status bits are set. The First Error Pointer identifies which bit position fired first, and that error’s header is the one in the Header Log. Error handling software should: (1) read First Error Pointer to know which error to investigate first, (2) read Header Log to identify the guilty TLP, (3) service that error, (4) write 1 to the corresponding Status bit to clear it. With Multiple Header Recording enabled, this causes the First Error Pointer to advance to the next-oldest error; without it, the pointer is no longer valid once its Status bit is clear, and any remaining set bits must be handled without header information.
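The first and last steps of that procedure can be sketched as follows. The helper names and register values are hypothetical, and only the First Error Pointer and ECRC fields (bits [8:0]) are parsed here.

```python
def parse_aecr(aecr):
    """Extract the First Error Pointer and ECRC control fields from an AECR value."""
    return {
        "first_error_pointer": aecr & 0x1F,          # bits [4:0]
        "ecrc_gen_capable": bool((aecr >> 5) & 1),
        "ecrc_gen_enable": bool((aecr >> 6) & 1),
        "ecrc_check_capable": bool((aecr >> 7) & 1),
        "ecrc_check_enable": bool((aecr >> 8) & 1),
    }

def next_error_to_service(uncor_status, aecr):
    """Return (bit, write_1_to_clear_value) for the first error, or (None, 0).

    The pointer is treated as meaningful only while the Status bit it
    points at is still set.
    """
    fep = aecr & 0x1F
    if (uncor_status >> fep) & 1:
        return fep, 1 << fep
    return None, 0

# Hypothetical reads: Completion Timeout (14) and Unsupported Request (20) pending,
# First Error Pointer = 20, ECRC generation/check capable, checking enabled.
status = (1 << 14) | (1 << 20)
aecr = 0x1A0 | 20

bit, clear_value = next_error_to_service(status, aecr)
print(bit, hex(clear_value))  # 20 0x100000
```

Writing `clear_value` to the Uncorrectable Error Status register clears only the serviced error and leaves Completion Timeout pending for the next pass.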

📋 Header Log — 128-bit TLP Capture

The Header Log register (offsets 11Ch–128h) captures the full 128-bit (4-DW) header of the TLP that caused the first uncorrectable error. Not all error types record a header — only those where the TLP itself was the cause and where the header is meaningful for diagnosis.

Figure 5 — Header Log captures the complete 128-bit TLP header of the first uncorrectable error. DW0 always holds TLP Fmt, Type, TC, TD, EP, Attr, AT, Length. For requests, DW1 holds the Requester ID (Bus/Device/Function of the originator), Tag, and Byte Enables, and DW2 and DW3 hold the remaining fields, which differ per TLP type: for memory requests, the 32-bit address (DW2) or the 64-bit address (DW2 and DW3). For completions, which have a 3-DW header, DW1 holds the Completer ID, Completion Status, and Byte Count, and DW2 holds the Requester ID, Tag, and Lower Address.

Errors that log headers in the Header Log

Error type	Is header logged?	What the header tells you
Poisoned TLP Received	Yes	Address of the poisoned write, Requester ID of the sender, whether it’s a DMA or completion
Malformed TLP	Yes	Which TLP type violated the formatting rules and what the offending fields were
ECRC Error	Yes	Identifies the TLP whose ECRC failed — address, requester, type
Unsupported Request	Yes	What transaction type was not supported and what address/BDF it targeted
Unexpected Completion	Yes	The Completion’s Completer ID, Requester ID, and Tag that didn’t match any outstanding request
Completion Timeout	No (no TLP to capture)	No header — the error is the absence of a completion, not a received TLP
Data Link Protocol Error	No	DLL error — no TLP header involved
Flow Control Protocol Error	No	FC error — no TLP header involved
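To make the Header Log layout concrete, here is a minimal decoder for a logged memory request. It assumes each Header Log DW is read as a 32-bit value with header byte 0 in bits [31:24], and it does not handle completions or other TLP types; the function names and sample values are hypothetical.

```python
def format_bdf(id16):
    """Render a 16-bit ID as bus:device.function."""
    return f"{id16 >> 8:02x}:{(id16 >> 3) & 0x1F:02x}.{id16 & 0x7}"

def decode_request_header(dw0, dw1, dw2, dw3):
    """Decode DW0 and the request-specific fields of a logged memory request."""
    fmt = (dw0 >> 29) & 0x7
    four_dw = bool(fmt & 0b001)                 # 4-DW header carries a 64-bit address
    if four_dw:
        address = (dw2 << 32) | (dw3 & 0xFFFFFFFC)
    else:
        address = dw2 & 0xFFFFFFFC              # low two bits are not address bits
    return {
        "fmt": fmt,
        "type": (dw0 >> 24) & 0x1F,
        "has_data": bool(fmt & 0b010),
        "tc": (dw0 >> 20) & 0x7,
        "td": bool((dw0 >> 15) & 1),
        "ep": bool((dw0 >> 14) & 1),
        "length_dw": dw0 & 0x3FF,
        "requester": format_bdf(dw1 >> 16),
        "tag": (dw1 >> 8) & 0xFF,
        "first_be": dw1 & 0xF,
        "address": address,
    }

# Hypothetical log contents: a poisoned 1-DW Memory Write from 01:00.0 to FEB00000h.
hdr = decode_request_header(0x40004001, 0x0100050F, 0xFEB00000, 0x00000000)
print(hdr["requester"], hex(hdr["address"]), "poisoned" if hdr["ep"] else "clean")
```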

📋 Root Complex AER Registers

Root Ports (and Root Complex Event Collectors) have three additional AER registers that endpoints and switch ports do not have. These are the final error collection point for the entire fabric:

Register	Offset from cap start	Key fields
Root Error Command	12Ch	3 enable bits: Correctable Error Reporting Enable [0], Non-Fatal Error Reporting Enable [1], Fatal Error Reporting Enable [2]. When set, the Root Complex generates an MSI/MSI-X interrupt when the corresponding error type is received from downstream.
Root Error Status	130h	ERR_COR Received [0], Multiple ERR_COR Received [1], ERR_FATAL/NONFATAL Received [2], Multiple ERR_FATAL/NONFATAL Received [3], First Uncorrectable Fatal [4], Non-Fatal Error Messages Received [5], Fatal Error Messages Received [6]. Advanced Error Interrupt Message Number [31:27] — MSI/MSI-X vector used for error interrupts.
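Error Source Identification	134h	ERR_COR Source ID [15:0]: BDF of the first device that sent ERR_COR. ERR_FATAL/NONFATAL Source ID [31:16]: BDF of the first device that sent ERR_FATAL or ERR_NONFATAL. Read-only sticky (ROS): software cannot write it. Each field holds the first source until software clears the matching Received bit in Root Error Status, after which the next message can be captured.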

Error Source ID is the key to error triage. When an error interrupt fires at the Root Complex, the Error Source ID register immediately tells software the BDF of the originating device — no need to poll all devices. Software reads the BDF from Error Source ID, walks to that device’s AER registers, reads the Uncorrectable Error Status to find the error type, reads the Header Log to identify the guilty TLP, and then decides on the response (retry, reset, or declare the device failed).
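The first two steps of that triage flow, decoding Root Error Status and splitting Error Source ID into BDFs, might look like the sketch below. The `triage` function and the sample register values are hypothetical.

```python
# Bit positions from the Root Error Status row above.
ROOT_STATUS_BITS = {
    0: "ERR_COR Received",
    1: "Multiple ERR_COR Received",
    2: "ERR_FATAL/NONFATAL Received",
    3: "Multiple ERR_FATAL/NONFATAL Received",
    4: "First Uncorrectable Fatal",
    5: "Non-Fatal Error Messages Received",
    6: "Fatal Error Messages Received",
}

def format_bdf(id16):
    """Render a 16-bit Source ID as bus:device.function."""
    return f"{id16 >> 8:02x}:{(id16 >> 3) & 0x1F:02x}.{id16 & 0x7}"

def triage(root_status, source_id):
    """Summarise Root Error Status and pick out the valid Source ID fields."""
    return {
        "flags": [name for bit, name in sorted(ROOT_STATUS_BITS.items())
                  if (root_status >> bit) & 1],
        "interrupt_message_number": (root_status >> 27) & 0x1F,
        # A Source ID field is only meaningful if its Received bit is set.
        "cor_source": format_bdf(source_id & 0xFFFF) if root_status & 0x1 else None,
        "uncor_source": format_bdf(source_id >> 16) if root_status & 0x4 else None,
    }

# Hypothetical reads: one fatal message received from device 03:00.0.
report = triage(root_status=0x00000044, source_id=0x03000000)
print(report["flags"], report["uncor_source"])
```

From here software would move to the reported BDF and read that device's own AER registers.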

📋 ECRC — End-to-End CRC

LCRC (Link CRC, added by the Data Link Layer) protects each TLP on a single link segment — it is stripped and re-computed at every switch hop. If data is corrupted inside a switch (internal memory error), the new LCRC covers the corrupted data and the receiver will accept the corrupted TLP without knowing it is wrong.

ECRC (End-to-End CRC) is computed by the originating Requester over the TLP header and data payload, and survives all the way to the final destination. Intermediate devices (switches) forward the ECRC unchanged. The final Completer checks the ECRC and reports an error if it fails. ECRC catches corruption that occurs inside switches — something LCRC cannot do.

Figure 6 — ECRC vs LCRC coverage. LCRC-A protects Link 1 only — stripped and recomputed by the switch. LCRC-B protects Link 2 only. If the switch’s internal memory corrupts the TLP between Link 1 and Link 2, LCRC-B covers the corrupt data and the Root Complex accepts the corrupted TLP. ECRC covers the entire path from originating Endpoint through the switch to the Root Complex — internal switch corruption is detected.

ECRC format details

The ECRC is a 32-bit CRC computed by the originating Transaction Layer over all header bytes and data payload bytes.
Two header bits are variant bits and must be treated as 1 for ECRC computation: bit 0 of the Type field (changes in Config TLPs from Type 1 to Type 0) and the EP bit (can be set by an intermediate device to report a data error). These bits can legally change in transit, so including them in ECRC would cause false failures.
The ECRC DW is appended after the data payload (or after the header for TLPs with no data). The TD (TLP Digest) bit in DW0 of the header is set to indicate ECRC is present.
Both endpoints must have ECRC capability enabled for ECRC to be useful. Switches forward the ECRC DW unchanged — they do not check it unless they have AER ECRC checking capability themselves.
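The effect of the variant-bit rule can be demonstrated with a stand-in CRC. Note the assumption: `zlib.crc32` is used only to show the masking step and is not bit-exact with the PCIe ECRC, which defines its own bit ordering. The helper names and header values are hypothetical.

```python
import zlib

TYPE_BIT0 = 1 << 24   # bit 0 of the Type field in DW0
EP_BIT = 1 << 14      # EP (poisoned) bit in DW0

def ecrc_input(header_dws, payload=b""):
    """Serialise header + payload with both variant bits forced to 1."""
    dw0 = header_dws[0] | TYPE_BIT0 | EP_BIT
    dws = [dw0] + list(header_dws[1:])
    return b"".join(dw.to_bytes(4, "big") for dw in dws) + payload

def stand_in_ecrc(header_dws, payload=b""):
    """Illustrative CRC only; NOT the real PCIe ECRC bit mapping."""
    return zlib.crc32(ecrc_input(header_dws, payload))

payload = bytes([0xDE, 0xAD, 0xBE, 0xEF])
clean = [0x40008001, 0x0100050F, 0xFEB00000]              # 3-DW MWr, TD=1, EP=0
poisoned_in_transit = [clean[0] | EP_BIT] + clean[1:]     # a switch set EP
wrong_address = clean[:2] + [0xFEB00004]                  # genuine corruption

same = stand_in_ecrc(clean, payload) == stand_in_ecrc(poisoned_in_transit, payload)
differs = stand_in_ecrc(clean, payload) != stand_in_ecrc(wrong_address, payload)
print(same, differs)  # True True
```

Setting EP in transit leaves the check value untouched, while a change to any invariant field such as the address is still detected.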

📋 Error Forwarding — Data Poisoning

Data poisoning (also called error forwarding) is a mechanism for a device to indicate that a TLP’s data payload is known to be corrupted, without discarding the TLP. The device sets the EP (Error Poisoned) bit in the TLP header’s DW0. The TLP is then forwarded normally — any device that receives a TLP with EP=1 knows the data is bad.

Figure 7 — Data poisoning use cases. Sending a completion with EP=1 is preferable to no completion at all because the Requester can immediately identify that the round-trip path worked (so the problem is at the Completer or inside a switch) rather than seeing a vague Completion Timeout. For streaming data like audio, the receiver may choose to accept poisoned data with a glitch rather than stalling for error recovery.

Rules for data poisoning

EP can only be set on TLPs that have a data payload — Memory Write requests, I/O Write requests, Configuration Write requests, and Completions with data. A memory read request (no data payload) with EP=1 is undefined — receivers may discard it.
Poisoned writes must never change control state. A poisoned Configuration Write, or a poisoned I/O or Memory Write that targets a control register or control structure, must not modify the register value; the Completer discards the data and handles the request as an Unsupported Request.
Switches that detect internal data corruption can set EP on the forwarded TLP if they support error forwarding (optional capability). If they do not support error forwarding, they must report the error as a non-fatal uncorrectable error instead.
Receiving a TLP with EP=1 sets the Poisoned TLP Received bit in the Uncorrectable Error Status register. Whether this triggers an error message depends on the mask and severity settings.

📋 Advisory Non-Fatal Errors

Some uncorrectable errors should be reported as ERR_COR (correctable) rather than ERR_NONFATAL to avoid confusion about the source. The rationale: when multiple devices detect the same underlying error event, only the most appropriate device should send the “real” ERR_NONFATAL. Other detectors send ERR_COR as an advisory notification.

PCIe 1.1 introduced Role-Based Error Reporting — devices compliant with 1.1 or later set the Role-Based Error Reporting bit in Device Capabilities register. These devices follow the advisory non-fatal rules. Older 1.0 devices do not.

The five advisory non-fatal cases where ERR_COR is sent instead of ERR_NONFATAL:

Completer sent UR or CA completion. The completer sends ERR_COR (not ERR_NONFATAL) because the Requester is better positioned to report the uncorrectable error when it receives the UR/CA completion.
Intermediate device detected poisoned TLP. A switch forwarding a poisoned TLP sends ERR_COR. The final destination is the right device to send ERR_NONFATAL if it cannot handle the data.
Destination received poisoned TLP but can handle it. An audio device receiving a poisoned audio packet accepts the data (glitch better than stall) and sends only ERR_COR.
Requester experienced Completion Timeout but can retry. If the requester retries and expects to succeed, it sends ERR_COR for the timeout.
Unexpected Completion received. Always advisory — the real Requester will eventually timeout and send the appropriate error message.

Advisory errors still set the Uncorrectable Status bit. An advisory non-fatal error sets both: the corresponding Uncorrectable Error Status bit (in the AER Uncorrectable register) and the Advisory Non-Fatal Error Status bit (in the AER Correctable register). The uncorrectable bit is set for tracking but no ERR_NONFATAL message is sent — only ERR_COR. Software can distinguish advisory errors by noting that a correctable error message arrived despite an uncorrectable error bit being set.
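One way software might apply that observation is sketched below: if the Advisory Non-Fatal Error bit is set, any pending uncorrectable bits configured as non-fatal are candidates for having been reported as advisory. This is a heuristic under stated assumptions, not a definitive rule, and the function name and register values are hypothetical.

```python
ADVISORY_NONFATAL = 1 << 13   # bit 13 of the Correctable Error Status register

def advisory_candidates(uncor_status, uncor_severity, cor_status):
    """Return the uncorrectable Status bits that may have been sent as ERR_COR.

    Only errors whose Severity bit selects Non-Fatal can be advisory.
    """
    if not cor_status & ADVISORY_NONFATAL:
        return 0
    return uncor_status & ~uncor_severity & 0xFFFFFFFF

UNSUPPORTED_REQUEST = 1 << 20
MALFORMED_TLP = 1 << 18

# Hypothetical snapshot: UR pending with the advisory bit set; Malformed TLP is fatal.
candidates = advisory_candidates(
    uncor_status=UNSUPPORTED_REQUEST | MALFORMED_TLP,
    uncor_severity=MALFORMED_TLP,
    cor_status=ADVISORY_NONFATAL,
)
print(hex(candidates))  # 0x100000
```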

⚡ AER in Gen 6

The core AER capability structure (Cap ID 0001h, the register offsets and error bit positions described above, the ECRC mechanism, advisory non-fatal rules, error message codes) carries forward into Gen 6. Gen 6 changes the Physical Layer (PAM4 signalling, FEC) and introduces flit mode, but the error model that AER exposes to software stays the same, so AER driver code written for Gen 3 hardware continues to work on Gen 6 hardware.

What changes in Gen 6 AER practice:

Aspect	Gen 6 change or new consideration
AER register layout	Unchanged — same offsets, same bit definitions, same error codes
New physical layer error types	Gen 6 adds FEC (forward error correction) at the flit level. FEC corrects bit errors silently — correctly operating FEC does not generate AER events. FEC decode failures that result in corrupted flits will appear as Malformed TLP or ECRC errors in the AER Uncorrectable Status register.
ECRC and flit mode	ECRC is computed at the Transaction Layer before flit encapsulation and checked after flit de-encapsulation. Flit-mode framing is transparent to ECRC — it covers the same TLP header and data payload fields as before Gen 6.
Receiver Overflow	At 64 GT/s, the rate of TLP arrival is much higher — receive buffer overflow errors are more likely if the device has insufficient buffer depth. Ensuring adequate buffer sizing is critical for Gen 6 designs.
REPLAY_NUM Rollover	More likely at high data rates with long links (e.g. retimers add latency) — ACK latency may be proportionally longer relative to the retransmission window. Increasing the completion timeout and replay timer settings may be needed for Gen 6 links with multiple retimers.
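IDE (Integrity and Data Encryption)	PCIe 6.0 includes the IDE extended capability (Cap ID 0030h). When IDE is active, TLP data payloads are encrypted and each TLP is integrity-protected, while the header stays in clear text so that switches can still route it. The Header Log therefore remains readable, but payload contents are not visible on the link, and IDE integrity failures become a further error source. Systems using IDE must factor this into their error investigation procedures.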
Error investigation tooling	Standard AER tools (Linux `aer-inject`, Windows AER testing, PCIe Gen 6 compliance test suites) apply without modification. The AER error API is identical.

📋 Quick Reference

Item	Value / Rule
AER Cap ID	0001h — Extended Capability in 100h+ space, accessible only via ECAM
Correctable errors	Fixed by hardware. Status bit set. ERR_COR sent if enabled. No software action needed. Examples: Bad TLP, DLLP CRC, Replay Timer Timeout.
Non-fatal uncorrectable	Not hardware-correctable. Software attention required. Link still functional. ERR_NONFATAL sent. Examples: Completion Timeout, Poisoned TLP, UR, ECRC Error.
Fatal uncorrectable	Link integrity compromised. Reset required. ERR_FATAL sent. Examples: DL Protocol Error, Surprise Down, Receiver Overflow, Malformed TLP.
Status registers	All RW1C (write 1 to clear). Hardware sets on error detection. Software clears after handling.
Mask registers	1=suppress error message for this error type. Status bit still set. First Error Pointer and Header Log are not updated for masked errors.
Severity register	1=Fatal, 0=Non-Fatal per error type. Default: see table above. Software can escalate severity.
First Error Pointer [4:0]	In AECR (offset 118h). Bit position of first uncorrectable error. Valid only while that Status bit is set; advances on clear when Multiple Header Recording is enabled.
Header Log	128 bits (4 DWs at 11Ch–128h). Captures TLP header of first uncorrectable error. Not all error types produce a logged header.
ECRC Generation Enable	AECR bit 6. When 1: device appends 32-bit ECRC DW and sets TD bit in all outgoing TLPs.
ECRC Check Enable	AECR bit 8. When 1: device verifies ECRC on incoming TLPs with TD=1. Failure → ECRC Error Status bit + ERR_NONFATAL.
ECRC variant bits	Type bit 0 and EP bit treated as 1 during ECRC generation/checking — both can legally change in transit.
Data poisoning (EP=1)	Indicates TLP data is known corrupt. Only legal on TLPs with data payload. Poisoned control writes must be discarded by receiver.
Advisory Non-Fatal	Uncorrectable error sent as ERR_COR (not ERR_NONFATAL). Both correctable and uncorrectable Status bits set. Role-Based Error Reporting (Device Cap bit) must be set.
Root Error Command	Offset 12Ch. Three enable bits: ERR_COR [0], ERR_NONFATAL [1], ERR_FATAL [2] interrupt generation. Root Complex only.
Error Source ID	Offset 134h. BDF of first ERR_COR source [15:0] and first ERR_FATAL/NONFATAL source [31:16]. ROS — read-only sticky. Root Complex only.
Error investigation sequence	Read Root Error Status → read Error Source ID → go to source BDF → read Uncorrectable Error Status → read First Error Pointer → read Header Log → decode guilty TLP → service error → clear status bits.
Gen 6 changes	AER format unchanged. FEC failures appear as Malformed TLP or ECRC errors. IDE encrypts payloads but leaves TLP headers in clear text, so the Header Log stays readable. Higher bandwidth increases risk of Receiver Overflow if buffers insufficient.