PCIe Series — PCIe-20: Base Address Registers (BARs) — VLSI Trainers
Base Address Registers (BARs)
How PCIe devices request address space from software — the full BAR bit encoding, Memory vs I/O BARs, 32-bit vs 64-bit addressing, prefetchable vs non-prefetchable, the three-step sizing algorithm, the Expansion ROM BAR, 64-bit BAR pairing, unimplemented BARs, and Gen 6 implications.
📋 Why BARs Exist
A PCIe device has internal registers and storage — control registers, status registers, device memory, DMA ring buffers. Software needs to be able to read and write these locations over the PCIe fabric. To do that, each location must have an address in one of the three PCIe address spaces: Memory, I/O, or Configuration.
Configuration space is used for standardised hardware identification and capability registers. That leaves Memory and I/O space for device-specific registers. But the device itself cannot choose which addresses to use — that would create conflicts when multiple devices compete for overlapping address ranges. Address assignment is exclusively the job of system software (BIOS during POST, then the OS during enumeration).
The Base Address Registers solve this negotiation problem. The device designer hardcodes the lower bits of each BAR to communicate what kind and how much address space the device needs. Software reads these hardcoded bits, allocates a suitable range from available address space, and programs the upper bits with the base address of that allocation. From then on, any TLP whose address falls within a programmed BAR range is claimed and processed by the device.
📋 Where BARs Live
Figure 1 — BAR locations. Type 0 (endpoint) has six 32-bit BAR slots. Type 1 (bridge) has only two. A 64-bit BAR consumes two consecutive slots — so a device using three 64-bit BARs uses all six Type 0 slots and has no room for additional 32-bit BARs.
Each BAR slot is always 32 bits wide in the hardware register. A 64-bit address requirement uses two adjacent slots: the lower BAR holds bits [31:0] of the address and the upper BAR holds bits [63:32]. Software reads both, writes both, and treats them as a single logical 64-bit register.
📋 Bit 0 — Memory vs I/O
The single most important bit in any BAR is bit 0 — it determines whether the BAR requests memory address space or I/O address space. This bit is hardcoded by the device designer and cannot be changed by software.
Figure 2 — BAR bit 0 determines the address space type. When bit 0 = 0, the BAR requests memory-mapped (MMIO) space, and bits [2:1] and bit 3 carry further encoding. When bit 0 = 1, the BAR requests I/O port space; bit 1 is reserved and there are no Type or Prefetchable fields.
📋 Memory BAR Bit Fields
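A Memory BAR (bit 0 = 0) uses bits [3:0] to fully encode its type. Bits [31:4] form the base address field, extended by bits [63:32] in the next slot for a 64-bit BAR. The device hardwires the lowest of these address bits to 0 to encode its size; the rest are writable.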
Figure 3 — Memory BAR full layout. Bit 0 = 0 identifies a memory BAR. Bits [2:1] are the Type field selecting 32-bit or 64-bit decoding. Bit 3 is the Prefetchable flag. All remaining upper bits are the writable Base Address field — their count is determined by how many lower bits the device designer hardwires to 0 to specify the required size alignment.
📋 Type Field — 32-bit vs 64-bit
The Type field at bits [2:1] of a Memory BAR tells software whether the device can be placed anywhere in the 32-bit address space or whether it requires (or supports) placement anywhere in the full 64-bit address space.
Figure 4 — 32-bit vs 64-bit BAR placement. A 32-bit BAR must be placed below 4 GB — software may not assign a base address with bits [63:32] set. A 64-bit BAR can be placed anywhere — software may assign any 64-bit address. The 64-bit BAR uses two consecutive BAR slots, so a device with one 64-bit BAR consumes two of the six available Type 0 slots.
Why 64-bit BARs matter for modern devices. Systems with large DRAM (256 GB+ servers) have memory maps that place many MMIO regions above the 4 GB boundary. An AI accelerator that exposes 80 GB of HBM through a BAR must declare a 64-bit BAR: a 32-bit BAR can neither hold an address above 4 GB nor describe a region that large. Gen 6-class devices also tend to have large MMIO windows (tens of gigabytes) that require 64-bit placement regardless of system RAM size.
📋 Prefetchable Bit — What It Really Means
Bit 3 of a Memory BAR declares whether the region is prefetchable. This is a promise from the device designer to software about how the device behaves when its memory region is read:
Figure 5 — Prefetchable vs non-prefetchable. The bit 3 declaration controls how CPU caches and PCIe bridges may handle the memory region. Non-prefetchable (0) means every read is an actual device access — no caching, no speculation. Prefetchable (1) means the CPU and bridges may cache, prefetch, and merge writes freely without unexpected side effects.
Practical impact on CPU MTRR / PAT settings
On x86 systems, the CPU’s MTRR (Memory Type Range Registers) or PAT (Page Attribute Table) must be configured to match the BAR’s prefetchable declaration:
Non-prefetchable BAR → CPU maps the region as UC (Uncacheable) or UC- (UC minus). Every read and write goes directly to the device with no CPU buffering.
Prefetchable BAR → CPU may map as WC (Write Combining), or as WB (Write-Back) if the device supports it. Write Combining coalesces multiple small writes into larger burst writes, which substantially improves throughput for workloads such as GPU framebuffer writes.
Setting a non-prefetchable BAR region as WC is a software bug — speculative reads to a status register would change device state silently, causing unpredictable behaviour. The BAR’s bit 3 is the hardware contract that drives this CPU configuration decision.
📋 I/O BAR Bit Fields
When BAR bit 0 = 1, the BAR requests I/O port address space. The format is simpler than a Memory BAR — there is no type field and no prefetchable bit.
Figure 6 — I/O BAR layout. Bit 0 = 1 identifies an I/O BAR. Bit 1 is reserved (must be 0). Bits [31:2] are the I/O base address field. The minimum I/O allocation is 4 bytes (bit 2 is the lowest possible writable address bit). In practice, I/O space behind a bridge is handed out in 4 KB-aligned windows, because that is the granularity of the bridge's I/O window registers.
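The two layouts described above (Memory BAR and I/O BAR) can be decoded with a few bit tests. The following is a minimal sketch, not a driver-grade implementation: `BarInfo` and `decode_bar` are hypothetical names introduced here, and the function rejects any Type field value other than the 00b and 10b encodings this article covers.

```python
from dataclasses import dataclass


@dataclass
class BarInfo:
    kind: str           # "memory" or "io"
    is_64bit: bool      # True when the Type field (bits [2:1]) is 10b
    prefetchable: bool  # bit 3 of a Memory BAR
    address: int        # low 32 bits of the base with attribute bits cleared


def decode_bar(value: int) -> BarInfo:
    """Decode the attribute bits of one 32-bit BAR slot."""
    value &= 0xFFFFFFFF
    if value & 0x1:
        # Bit 0 = 1: I/O BAR. Bit 1 is reserved, bits [31:2] are the address.
        return BarInfo("io", False, False, value & 0xFFFFFFFC)
    bar_type = (value >> 1) & 0x3
    if bar_type not in (0b00, 0b10):
        # Only the two encodings discussed in the text are handled here.
        raise ValueError(f"unsupported Type field {bar_type:02b}")
    return BarInfo(
        kind="memory",
        is_64bit=(bar_type == 0b10),
        prefetchable=bool(value & 0x8),
        address=value & 0xFFFFFFF0,
    )
```

For a 64-bit BAR the returned `address` covers only bits [31:0]; the caller still has to read the next slot for bits [63:32].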
📋 The Sizing Algorithm
Software cannot read the size of a BAR directly. The size is encoded by how many lower address bits the device designer has hardcoded to 0. To discover the size, software uses a three-step write-and-read procedure:
Figure 7 — BAR sizing algorithm. Step ①: save current value, then write 0xFFFFFFFF. Step ②: read back, mask type bits, find lowest 1-bit → that position is log₂(size). Step ③: allocate a region of that size, write the base address, enable the BAR. Software must also clear Memory Space Enable (and I/O Space Enable for I/O BARs) in the Command register before step ① and set it again only after step ③, to prevent accidental device claims during sizing.
How software computes size from the read-back value
After writing all-1s and reading back the BAR, software:
Saves the raw value.
Clears the type bits: for a Memory BAR, masks out bits [3:0] (sets them to 0); for an I/O BAR, masks out bits [1:0].
Bitwise-negates the result (flips all bits). The result is (size − 1).
Adds 1 to get the size.
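Equivalently: size = ~(readback & ~0xF) + 1 for Memory BARs, evaluated in 32-bit unsigned arithmetic (for a 64-bit BAR, combine both halves into one 64-bit value first). This formula works for any BAR size from 16 bytes upward.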
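The formula above translates directly into code. This is a sketch for a single 32-bit slot; `bar_size_32` is a hypothetical helper name, and it assumes the device implements all upper address bits as writable, as in the examples in this article. The explicit `& 0xFFFFFFFF` stands in for 32-bit unsigned arithmetic, because Python integers are unbounded.

```python
MEM_ATTR_MASK = 0xF  # bits [3:0] of a Memory BAR
IO_ATTR_MASK = 0x3   # bits [1:0] of an I/O BAR


def bar_size_32(readback: int) -> int:
    """Size in bytes from the value read back after writing 0xFFFFFFFF.

    Returns 0 for an unimplemented BAR (readback of all zeros).
    """
    readback &= 0xFFFFFFFF
    if readback == 0:
        return 0
    attr_mask = IO_ATTR_MASK if readback & 0x1 else MEM_ATTR_MASK
    addr_bits = readback & ~attr_mask & 0xFFFFFFFF
    # Negate and add 1, truncated to 32 bits.
    return (~addr_bits + 1) & 0xFFFFFFFF


print(hex(bar_size_32(0xFFFFF000)))  # 0x1000 (4 KB)
```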
▶ Example: 32-bit Non-Prefetchable Memory BAR
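A device needs 4 KB of non-prefetchable MMIO for its control registers. The device designer hardcodes BAR0 as follows:
BAR0 bit 0 = 0 → memory
BAR0 bits [2:1] = 00b → 32-bit type
BAR0 bit 3 = 0 → non-prefetchable
BAR0 bits [11:4] = hardwired 0 → 4 KB size (bits [31:12] are writable)
▶ Example: 64-bit Memory BAR
A device needs a 64 MB region that can be placed above 4 GB. The device designer uses BAR1 and BAR2 as a 64-bit pair:
BAR1 bits [2:1] = 10b → 64-bit type (next BAR = upper 32 bits)
BAR1 bit 0 = 0 → memory
BAR2 bits [31:0] = all writable (upper 32 bits of 64-bit address)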
Figure 8 — 64-bit BAR pair. BAR1 holds the lower 32 bits and the type encoding. BAR2 holds the upper 32 bits with no special encoding. After sizing, software programs both: BAR1 gets the lower 32 bits of the base address (preserving type bits), BAR2 gets the upper 32 bits. This places the 64 MB region at address 0000_0002_4000_0000h, above the 4 GB boundary.
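As a sketch of the arithmetic behind a 64-bit pair: the two read-back values are joined into one 64-bit quantity before the size is computed, and the chosen base is split again when it is written. The helper names `size_64bit_bar` and `split_base` are hypothetical, and the read-back value 0xFC000004 is an assumption (a 64 MB BAR with bits [2:1] = 10b and bit 3 taken as 0).

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def size_64bit_bar(lower_readback: int, upper_readback: int) -> int:
    """Size in bytes of a 64-bit BAR from the read-backs of both halves."""
    combined = ((upper_readback & 0xFFFFFFFF) << 32) | (lower_readback & 0xFFFFFFF0)
    return (~combined + 1) & MASK64


def split_base(base: int, lower_attr_bits: int) -> tuple[int, int]:
    """Split a 64-bit base into (lower slot value, upper slot value).

    The attribute bits [3:0] are read-only in hardware; they are kept here
    only so the result matches what software would read back.
    """
    lower = (base & 0xFFFFFFF0) | (lower_attr_bits & 0xF)
    upper = (base >> 32) & 0xFFFFFFFF
    return lower, upper


size = size_64bit_bar(0xFC000004, 0xFFFFFFFF)
lower, upper = split_base(0x0000_0002_4000_0000, 0x4)
print(size // (1024 * 1024), hex(lower), hex(upper))  # 64 0x40000004 0x2
```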
▶ Example: I/O BAR
A legacy NIC requests 256 bytes of I/O port space on BAR3. The device designer hardcodes bit 0 = 1 (I/O), bit 1 = 0 (reserved), and bits [7:2] = 0 (size).
After writing FFFF_FFFFh, the BAR reads back FFFF_FF01h. The lowest 1-bit in the address field is at position 8, so size = 2⁸ = 256 bytes of I/O space. Bit 0 = 1 confirms an I/O BAR.
After programming base 4000h, the BAR reads 0000_4001h. The I/O base is at 4000h, and the device claims I/O transactions to ports 4000h–40FFh. Software must set I/O Space Enable (Command register bit 0).
📋 64-bit BAR Pairing Rules
When a device declares a 64-bit BAR (Type field bits [2:1] = 10b), software must follow specific rules to read it correctly:
The next sequential BAR slot is automatically the upper half. If BAR0 is a 64-bit BAR, then BAR1 is the upper 32 bits. Software must not evaluate BAR1’s type bits — they have no type meaning when BAR1 is the upper half of a 64-bit BAR.
Software skips the upper BAR in its iteration. After reading and evaluating BAR0 as a 64-bit BAR, software jumps its index to BAR2 for the next independent BAR check.
The upper BAR is written with the high 32 bits of the base address. If the allocation is below 4 GB, BAR+1 is written as 0x00000000.
A 64-bit BAR may start at any slot that has a successor. In a Type 0 header that means BAR0 through BAR4. The slot that follows is its upper half and cannot itself start a new BAR. For example, a device may use BAR0 as a 32-bit BAR and BAR1+BAR2 as a 64-bit pair.
If BAR5 is a 64-bit lower half, there is no BAR6 in a Type 0 header. This is an illegal configuration — the device designer must not declare BAR5 as 64-bit in a Type 0 header because the required upper half BAR6 does not exist.
Figure 9 — Valid 64-bit BAR pair positions. A Type 0 header can hold at most three 64-bit BARs, for example BAR0+BAR1, BAR2+BAR3, BAR4+BAR5. A pair may start at any slot from BAR0 to BAR4 and always occupies that slot plus the next one. The upper-half slot must not be interpreted as a BAR of its own: it has no type meaning.
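The iteration rules above (treat the next slot as the upper half, skip it, and reject a 64-bit BAR in the last slot) can be sketched as a loop over already-programmed BAR values. `walk_bars` is a hypothetical helper, and it simply skips slots that read as zero; a real enumerator would size each slot first to tell an unimplemented BAR from an unprogrammed one.

```python
def walk_bars(raw_bars: list[int]) -> list[tuple[int, str, int]]:
    """Return (slot index, kind, base address) for each logical BAR."""
    found = []
    i = 0
    while i < len(raw_bars):
        value = raw_bars[i] & 0xFFFFFFFF
        if value == 0:
            i += 1  # unused slot in this toy model
            continue
        is_memory = (value & 0x1) == 0
        is_64bit = is_memory and ((value >> 1) & 0x3) == 0b10
        if is_64bit:
            if i + 1 >= len(raw_bars):
                raise ValueError(f"BAR{i} is 64-bit but has no upper-half slot")
            upper = raw_bars[i + 1] & 0xFFFFFFFF
            # The upper slot is pure address bits; its low bits are not decoded.
            found.append((i, "mem64", (upper << 32) | (value & 0xFFFFFFF0)))
            i += 2  # skip the upper half
        elif is_memory:
            found.append((i, "mem32", value & 0xFFFFFFF0))
            i += 1
        else:
            found.append((i, "io", value & 0xFFFFFFFC))
            i += 1
    return found


bars = [0x4000000C, 0x00000002, 0xFEB00000, 0x00004001, 0, 0]
for slot, kind, base in walk_bars(bars):
    print(f"BAR{slot}: {kind} at {base:#x}")
```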
📋 Unimplemented BARs
If a device does not need all six BAR slots, the unused ones are hardwired to all zeros. When software writes 0xFFFFFFFF to an unimplemented BAR and reads it back, the result is 0x00000000 — all bits return 0 regardless of what was written. Software uses this to detect unimplemented BARs and skip them during allocation.
Scenario | Write 0xFFFF_FFFF, read back result | Software action
Memory BAR requesting 4 KB | FFFF_F000h | Allocate 4 KB, write base address, skip to next BAR
64-bit BAR lower (upper half follows) | FFC0_000Ch | Allocate per size, write lower base, then write upper base to BAR+1, skip BAR+1 in loop
Unimplemented BAR (hardwired 0) | 0000_0000h | No allocation. Move to next BAR index.
I/O BAR requesting 256 bytes | FFFF_FF01h | Allocate 256 bytes of I/O space, write base address, skip to next BAR
Always save and restore the BAR during sizing. A real-world complication: the sizing procedure writes 0xFFFFFFFF to the BAR temporarily. If an interrupt fires or another access occurs between the write and the read-back, the device might claim unrelated addresses. The correct procedure is to save the original BAR value before writing 0xFFFFFFFF, complete the read-back, then restore the saved value (or program the new base address) before re-enabling address decoding.
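The save, probe, restore sequence can be exercised against a toy register model. Everything here is illustrative: `SimulatedBar` is a hypothetical stand-in for a hardware BAR (a writable-bit mask plus hardwired attribute bits), and `probe_bar` returns `None` for an unimplemented slot.

```python
class SimulatedBar:
    """Toy model of one 32-bit BAR: some bits writable, the rest hardwired."""

    def __init__(self, writable_mask: int, hardwired_bits: int, value: int = 0):
        self.writable_mask = writable_mask
        self.hardwired_bits = hardwired_bits
        self.value = (value & writable_mask) | hardwired_bits

    def read(self) -> int:
        return self.value

    def write(self, data: int) -> None:
        # Writes only affect the writable address bits.
        self.value = (data & self.writable_mask) | self.hardwired_bits


def probe_bar(bar: SimulatedBar):
    """Size a BAR and leave its original contents in place afterwards."""
    saved = bar.read()            # save
    bar.write(0xFFFFFFFF)         # write all 1s
    readback = bar.read()         # read back
    bar.write(saved)              # restore
    if readback == 0:
        return None               # unimplemented BAR
    attr_mask = 0x3 if readback & 0x1 else 0xF
    addr_bits = readback & ~attr_mask & 0xFFFFFFFF
    return (~addr_bits + 1) & 0xFFFFFFFF


mem_bar = SimulatedBar(0xFFFFF000, 0x0, value=0xFEB00000)
print(probe_bar(mem_bar), hex(mem_bar.read()))  # 4096 0xfeb00000
```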
📋 Expansion ROM BAR
The Expansion ROM BAR at offset 30h in Type 0 headers and offset 38h in Type 1 headers is a special BAR used exclusively for the device’s option ROM — firmware executed at POST time to initialise the hardware before the OS loads. Its format is similar to a 32-bit memory BAR but with a different lower-bit encoding.
Figure 10 — Expansion ROM BAR. Bit 0 is the Enable flag — software must set this to 1 after programming the base address for the ROM to be decoded. Bits [10:1] are hardwired to 0, making the minimum ROM alignment 2 KB. The sizing procedure is identical to a regular BAR: write 0xFFFFFFFF, read back, find lowest 1-bit in bits [31:11] for size.
ROM BAR sizing and programming
The sizing procedure is the same as for regular BARs, but the mask applied is 0xFFFFF800 (instead of 0xFFFFFFF0) to isolate the writable address bits. After sizing:
Software allocates a ROM-sized region at an appropriate 32-bit address.
Programs the base address into bits [31:11] with bit 0 = 0 (disabled).
Sets bit 0 = 1 to enable decoding — ROM is now accessible for reading.
BIOS reads and executes the ROM at POST. After POST, bit 0 may be cleared to free the address space if the ROM is no longer needed.
ROM BAR is always 32-bit. There is no 64-bit variant of the Expansion ROM BAR. The ROM must be placed below the 4 GB boundary. This is acceptable because firmware only needs the window at POST time to read the ROM image: a legacy x86 option ROM is copied (shadowed) into low memory and executed there in 16-bit real mode, so it never needs an address above 4 GB.
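The ROM BAR steps above differ from a regular BAR only in the mask and in the enable bit. A minimal sketch, with `rom_size` and `rom_bar_value` as hypothetical helper names and a 64 KB ROM chosen purely for illustration:

```python
ROM_ADDR_MASK = 0xFFFFF800  # bits [31:11] carry the address
ROM_ENABLE = 0x1            # bit 0 enables ROM decoding


def rom_size(readback: int) -> int:
    """ROM size in bytes from the read-back after writing 0xFFFFFFFF."""
    addr_bits = readback & ROM_ADDR_MASK
    if addr_bits == 0:
        return 0  # no Expansion ROM implemented
    return (~addr_bits + 1) & 0xFFFFFFFF


def rom_bar_value(base: int, enable: bool) -> int:
    """Value to write to the ROM BAR for a given base and enable state."""
    if base & ~ROM_ADDR_MASK:
        # Catches bases that are not 2 KB aligned or do not fit in 32 bits.
        raise ValueError("ROM base must be 2 KB aligned and below 4 GB")
    return base | (ROM_ENABLE if enable else 0)


print(rom_size(0xFFFF0001))                      # 65536
print(hex(rom_bar_value(0xFEA00000, True)))      # 0xfea00001
```

Writing the base with `enable=False` first and then again with `enable=True` mirrors the two-step programming order in the list above.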
📋 Enabling and Disabling BARs
BARs do not activate just by having a base address programmed into them. The device only decodes (claims) TLPs targeting a BAR’s address range when the appropriate enable bit in the Command register is set. This separation of programming from activation is intentional — it allows software to safely size and program all BARs before enabling any of them.
BAR type | Enable bit in Command register | Typical software sequence
Memory BAR (bit 0 = 0) | Bit 1: Memory Space Enable | Clear bit 1. Size all memory BARs. Allocate ranges. Program base addresses. Set bit 1.
I/O BAR (bit 0 = 1) | Bit 0: I/O Space Enable | Clear bit 0. Size all I/O BARs. Allocate I/O ranges. Program base addresses. Set bit 0.
Expansion ROM BAR | Bit 1: Memory Space Enable AND ROM BAR bit 0 | Both Memory Space Enable and ROM BAR bit 0 must be set for the ROM to respond.
During BAR sizing, software must temporarily clear Memory Space Enable (and I/O Space Enable for I/O BARs) in the Command register. This prevents the device from accidentally claiming TLPs that happen to hit the all-1s value written to the BAR during sizing. After programming all base addresses, software re-enables the relevant bits.
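The ordering described above (disable decoding, probe, then re-enable) can be checked with a toy device model. `ToyFunction` and `size_all_bars` are hypothetical names; the model counts BAR writes that arrive while either enable bit is still set, and for brevity it treats every slot as an independent 32-bit BAR and restores the old values instead of programming new ones.

```python
CMD_IO_SPACE = 0x1   # Command register bit 0
CMD_MEM_SPACE = 0x2  # Command register bit 1


class ToyFunction:
    """Toy PCIe function: a Command register plus a few 32-bit BARs."""

    def __init__(self, bar_masks: list[int], bar_fixed: list[int]):
        self.command = CMD_IO_SPACE | CMD_MEM_SPACE
        self.masks = bar_masks
        self.fixed = bar_fixed
        self.bars = list(bar_fixed)
        self.unsafe_writes = 0  # BAR writes seen while decoding was enabled

    def write_bar(self, index: int, data: int) -> None:
        if self.command & (CMD_IO_SPACE | CMD_MEM_SPACE):
            self.unsafe_writes += 1
        self.bars[index] = (data & self.masks[index]) | self.fixed[index]

    def read_bar(self, index: int) -> int:
        return self.bars[index]


def size_all_bars(dev: ToyFunction) -> list[int]:
    """Probe every BAR with address decoding switched off."""
    saved_cmd = dev.command
    dev.command = saved_cmd & ~(CMD_IO_SPACE | CMD_MEM_SPACE)  # disable first
    readbacks = []
    for i in range(len(dev.bars)):
        saved = dev.read_bar(i)
        dev.write_bar(i, 0xFFFFFFFF)
        readbacks.append(dev.read_bar(i))
        dev.write_bar(i, saved)
    dev.command = saved_cmd  # re-enable only after all BARs are settled
    return readbacks


dev = ToyFunction([0xFFFFF000, 0xFFFFFF00], [0x0, 0x1])
print([hex(v) for v in size_all_bars(dev)], dev.unsafe_writes)
```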
📋 Real-Device BAR Layouts
Different device classes have characteristic BAR layouts that reflect their MMIO needs:
Device class | Typical BAR layout | Reason
NVMe SSD | BAR0+BAR1: one 64-bit non-prefetchable BAR, 16 KB–256 KB | Single MMIO window for NVMe controller registers and submission/completion queue doorbells
GPU / AI accelerator | Large 64-bit prefetchable BAR for device memory | Massive prefetchable BAR for HBM device memory. Requires 64-bit placement and PCIe 4.0/5.0 Resizable BAR.
Resizable BAR (ReBAR). Modern GPUs and AI accelerators need BARs in the tens of gigabytes, but the default BAR size (before ReBAR negotiation) may be only 256 MB for compatibility. The PCIe Resizable BAR Extended Capability (Extended Cap ID 0015h) allows a device to advertise multiple possible BAR sizes and lets software select the largest that fits in available address space. This is how a GPU with 80 GB of device memory exposes all of it through a single BAR in modern systems.
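The selection policy described above ("largest that fits") is simple to sketch. This example deliberately ignores the capability's register encoding: `pick_rebar_size` is a hypothetical helper that takes the advertised sizes as a plain list of byte counts and the free address space as a single contiguous figure, both of which are simplifying assumptions.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def pick_rebar_size(supported_sizes: list[int], free_bytes: int):
    """Return the largest advertised BAR size that fits, or None."""
    fitting = [size for size in supported_sizes if size <= free_bytes]
    return max(fitting) if fitting else None


advertised = [256 * MIB, 8 * GIB, 16 * GIB, 32 * GIB]
print(pick_rebar_size(advertised, 20 * GIB) // GIB)  # 16
```

A real allocator would also have to honour natural alignment (a BAR of size S sits on an S-aligned address), which can make a size unusable even when enough total space is free.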
⚡ BARs in Gen 6
The BAR format (bit encoding, sizing algorithm, 64-bit pairing rules, ROM BAR structure, Command register enable bits) is completely unchanged in Gen 6. BARs are a Configuration Space feature; the Gen 6 changes (PAM4 signalling, FEC, FLIT mode) sit in the Physical and Data Link layers and leave the BAR registers untouched.
What changes with Gen 6 in practice:
BAR sizes grow dramatically. Gen 6 AI accelerators (512 GB+ HBM systems, coherent memory fabrics) have MMIO windows that dwarf Gen 3 GPUs. A 512 GB device memory BAR requires a 64-bit prefetchable BAR and a 64-bit-capable operating environment. The BAR format supports this — 64-bit BARs can address the full 2⁶⁴-byte space.
Resizable BAR becomes essential, not optional. Systems with Gen 6 devices almost universally need ReBAR to expose full device memory. BIOS must support Resizable BAR and allocate above-4 GB address space for 64-bit prefetchable BARs.
CXL devices use BAR space differently. CXL.mem devices map their large coherent memory regions into the host address space through HDM (Host-managed Device Memory) decoders rather than through BARs; BARs still expose the MMIO control and component registers. CXL 3.0 on the PCIe 6.0 PHY introduces new Extended Capability structures, but the underlying BAR mechanism is unchanged.
PASID + AT field usage requires no BAR changes. Address Translation Services (ATS, PCIe-11) and PASID (per-process address space identifiers) operate using the AT field in TLP headers and Extended Capabilities, not in the BAR format itself.
📋 Quick Reference
Item | Value / Rule
BAR count per device | Type 0 (endpoint): up to 6 × 32-bit slots. Type 1 (bridge): up to 2 × 32-bit slots. A 64-bit BAR uses 2 slots.
Bit 0 = 0 | Memory BAR: bits [2:1] and bit 3 carry further meaning
Bit 0 = 1 | I/O BAR: bit 1 reserved, no type or prefetchable fields
Bits [2:1] = 00b | 32-bit memory decode: base address must be below 4 GB. One BAR slot.
Bits [2:1] = 10b | 64-bit memory decode: base address may be anywhere in 64-bit space. Two consecutive BAR slots (lower + upper).
Bit 3 = 0 | Non-prefetchable: reads may have side effects. CPU must map as UC/UC-. No speculative reads, no write merging.
Bit 3 = 1 | Prefetchable: reads have no side effects. CPU may map as WC. Speculative prefetch and write merging permitted.
Sizing step ① | Clear Memory/IO Space Enable. Save current BAR value. Write 0xFFFFFFFF to BAR.
Sizing step ② | Read BAR back. Mask attribute bits [3:0] (or [1:0] for I/O). Find lowest 1-bit position N. Size = 2^N bytes.
Sizing step ③ | Allocate 2^N bytes at an aligned address. Write base address to BAR. Re-enable Space Enable bits.
Unimplemented BAR | Reads all-zeros after write 0xFFFFFFFF. Software skips allocation.
64-bit pair rule | Lower BAR may sit in any slot that has a successor (BAR0 to BAR4 in a Type 0 header). Upper BAR is the next slot. Software must not evaluate upper BAR's type bits.
ROM BAR offset | Type 0: offset 30h. Type 1: offset 38h. Always 32-bit. Minimum 2 KB alignment.
ROM BAR bit 0 | Enable: must be 1 for ROM to be decoded. Memory Space Enable must also be set. Bit 0 = 0 disables ROM even when base is programmed.
Memory Space Enable | Command register bit 1. Must be set before device responds to Memory BAR address range TLPs.
I/O Space Enable | Command register bit 0. Must be set before device responds to I/O BAR address range TLPs.
Prefetchable and MTRR | NP BAR → UC memory type. P BAR → WC or WB memory type (CPU-configurable). Mismatch causes bugs.
Resizable BAR (ReBAR) | Extended Cap 0015h. Device advertises multiple possible BAR sizes. Software selects largest that fits. Essential for Gen 6 AI accelerators.
Gen 6 impact | BAR format, sizing algorithm, pairing rules: all unchanged. BAR sizes grow. ReBAR essential. 64-bit prefetchable BARs required for large device memories.