How one physical PCIe device appears as hundreds of separate devices to virtual machines — Physical Functions vs Virtual Functions, the SR-IOV Extended Capability register set, VF BARs, the VF Offset and Stride addressing formula, ARI and the 8-bit function number, IOMMU isolation, and how SR-IOV scales in Gen 6 AI infrastructure.
In a virtualised server running 100 virtual machines, each VM needs network access, storage access, and potentially GPU or accelerator access. The naive approach — one physical NIC per VM — is completely impractical: 100 NICs per server would require 100 PCIe slots, 100 drivers, 100 separate configurations, and hundreds of watts of power.
Software virtualisation (the hypervisor emulates a virtual NIC for each VM) works but has a serious overhead: every network packet sent by a VM must be intercepted by the hypervisor, translated, and forwarded to the physical NIC — adding latency and consuming CPU cycles that should be running VM workloads. At 100 GbE speeds, software emulation becomes a bottleneck.
SR-IOV solves this by moving the virtualisation into the hardware of the PCIe device itself. A single SR-IOV-capable NIC (or storage controller, or GPU) can present many independent virtual devices to the system, each called a Virtual Function (VF). With ARI, up to 256 functions fit on one bus number; the 16-bit TotalVFs field allows more, spread across additional bus numbers. Each VM is assigned one VF directly, bypassing the hypervisor for data-plane operations. The hypervisor retains control over resource allocation and policy through the Physical Function (PF). The result: near-native I/O performance in virtualised environments with a single physical PCIe card.
From the PCIe fabric’s perspective, SR-IOV VFs are ordinary PCIe functions. They have BDF addresses, they send and receive TLPs, and they respond to memory accesses. The fabric routes TLPs to them exactly as to any other function. The only difference is inside the physical device, which decodes each VF’s MMIO region and delivers data to the appropriate internal hardware queue.
Standard PCIe addressing allows at most 8 functions per device (3-bit Function field in BDF). For a single SR-IOV device to present many VFs, 8 functions is not enough. ARI (Alternative Routing-ID Interpretation) solves this by repurposing the 5-bit Device Number field in the BDF to extend the Function Number to 8 bits — giving 256 functions per bus.
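The two interpretations of the same 16-bit Routing ID can be compared with a minimal Python sketch. The function name `decode_routing_id` and the dictionary layout are illustrative assumptions, not part of any PCIe library.

```python
def decode_routing_id(rid: int, ari: bool = False) -> dict:
    """Split a 16-bit Routing ID into bus, device and function fields."""
    if not 0 <= rid <= 0xFFFF:
        raise ValueError("Routing ID must fit in 16 bits")
    bus = rid >> 8
    if ari:
        # ARI: the Device Number is implicitly 0 and all of the low
        # 8 bits are interpreted as the Function Number (0..255).
        return {"bus": bus, "device": 0, "function": rid & 0xFF}
    # Conventional: 5-bit Device Number, 3-bit Function Number.
    return {"bus": bus, "device": (rid >> 3) & 0x1F, "function": rid & 0x07}


rid = 0x0341
print(decode_routing_id(rid))            # {'bus': 3, 'device': 8, 'function': 1}
print(decode_routing_id(rid, ari=True))  # {'bus': 3, 'device': 0, 'function': 65}
```

The bits on the wire are identical in both cases; only the interpretation by the port directly above the device and by the device itself changes.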
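The SR-IOV Extended Capability (Cap ID 0010h) lives in the PF’s extended configuration space (offset 100h or above). VFs do not have this structure; only the PF does. The structure contains the control and status registers for creating and managing VFs. The offsets in the tables below assume the capability starts at 100h; on a real device, software finds it by walking the extended capability list and applies the same relative offsets.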
| Register / Field | Offset | Access | Description |
|---|---|---|---|
| VF Migration Capable | 104h bit 0 | RO | When 1: device supports live migration of VFs between physical hosts. Requires VF Migration State Array in PF MMIO space. |
| VF Migration Interrupt Msg# | 104h [31:21] | RO | MSI/MSI-X vector number for VF migration state change notifications. Allows hypervisor to receive interrupt when a VF’s migration state changes. |
| TotalVFs | 10Ch [31:16] | RO | Maximum number of VFs the device can expose simultaneously. Hardware limit. Software may not write NumVFs greater than TotalVFs. Typical values: 16, 32, 64, 128, 256 depending on device. |
| InitialVFs | 10Ch [15:0] | RO | Number of VFs initially associated with the PF. On devices that do not use VF Migration, InitialVFs equals TotalVFs. It does not make any VF visible by itself: VFs appear only after VF Enable is set. |
| NumVFs | 110h [15:0] | RW | The number of VFs to make active. Software writes this before setting VF Enable. After VF Enable, VFs with numbers 1 through NumVFs are present and visible to the OS. Bits [23:16] of the same dword hold the Function Dependency Link. |
| First VF Offset | 114h [15:0] | RO | The routing ID offset from the PF to VF 1. Used in the VF BDF calculation formula. |
| VF Stride | 114h [31:16] | RO | The routing ID increment between successive VFs. VF n+1 has BDF = VF n BDF + VF Stride. |
| VF Device ID | 118h [31:16] | RO | The Device ID that all VFs present. The Vendor ID is the same as the PF’s Vendor ID. Drivers match VFs using this Device ID. |
| Supported Page Sizes | 11Ch | RO | Bitmask of memory page sizes the device can use for VF BAR base address alignment. Bit 0 = 4 KB, bit 1 = 8 KB, bit 2 = 16 KB, etc. |
| System Page Size | 120h | RW | Software writes the page size it will use for VF BAR alignment. Must be one of the supported page sizes. Determines alignment granularity of VF BAR segments. |
| SR-IOV Control Bit | Offset | Access | Function |
|---|---|---|---|
| VF Enable | 108h bit 0 | RW | The master VF enable bit. When set to 1, the VFs numbered 1 through NumVFs become visible as PCIe functions. Setting to 0 disables all VFs. Must write NumVFs before setting this bit. After setting, software must wait at least 100 ms before attempting to access VF configuration space, because the device needs time to initialise VF hardware. |
| VF Migration Enable | 108h bit 1 | RW | Enables live VF migration. Only settable if VF Migration Capable = 1. |
| VF Migration Interrupt Enable | 108h bit 2 | RW | Enables generation of MSI/MSI-X interrupt when VF migration state changes. |
| VF MSE (Memory Space Enable) | 108h bit 3 | RW | When 1: VFs respond to memory TLPs targeting their BAR address ranges. Equivalent to the Memory Space Enable bit in the Command register, but applies to all VFs collectively. Must be set after VF Enable for VFs to respond to MMIO accesses. |
| ARI Capable Hierarchy | 108h bit 4 | RW | Tells the device that it sits in an ARI-capable hierarchy, so it may place VFs at function numbers above 7. Must be set before VF Enable if ARI is required (which is almost always). ARI Forwarding must also be enabled in the Downstream Port (switch or Root Port) directly above the device; both settings are needed for ARI to work. |
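The ordering rules in this table (ARI Capable Hierarchy and NumVFs first, then VF Enable, a wait, then VF MSE) can be sketched as follows. `FakeConfigSpace` is a hypothetical in-memory stand-in for PF configuration space, not a real API, and the register offsets are relative to the capability base.

```python
import struct
import time

SRIOV_CTRL = 0x08       # SR-IOV Control register (16 bits)
SRIOV_TOTAL_VF = 0x0E   # TotalVFs (16 bits)
SRIOV_NUM_VF = 0x10     # NumVFs (16 bits)
CTRL_VFE = 1 << 0       # VF Enable
CTRL_MSE = 1 << 3       # VF Memory Space Enable
CTRL_ARI = 1 << 4       # ARI Capable Hierarchy


class FakeConfigSpace:
    """Hypothetical stand-in for PF config space; records every write."""

    def __init__(self, size=4096):
        self.data = bytearray(size)
        self.writes = []

    def read16(self, off):
        return struct.unpack_from("<H", self.data, off)[0]

    def write16(self, off, val):
        struct.pack_into("<H", self.data, off, val)
        self.writes.append((off, val))


def enable_vfs(cfg, cap, num_vfs, use_ari=True, sleep=time.sleep):
    """Enable num_vfs VFs on the PF whose SR-IOV capability starts at cap."""
    total = cfg.read16(cap + SRIOV_TOTAL_VF)
    if not 1 <= num_vfs <= total:
        raise ValueError(f"NumVFs must be in 1..{total}")
    ctrl = cfg.read16(cap + SRIOV_CTRL)
    if ctrl & CTRL_VFE:
        raise RuntimeError("VF Enable already set; clear it before changing NumVFs")
    if use_ari:
        ctrl |= CTRL_ARI                      # must precede VF Enable
        cfg.write16(cap + SRIOV_CTRL, ctrl)
    cfg.write16(cap + SRIOV_NUM_VF, num_vfs)  # must precede VF Enable
    ctrl |= CTRL_VFE
    cfg.write16(cap + SRIOV_CTRL, ctrl)
    sleep(0.1)                                # let the device initialise its VFs
    ctrl |= CTRL_MSE                          # VFs now decode their MMIO ranges
    cfg.write16(cap + SRIOV_CTRL, ctrl)
    return ctrl


cfg = FakeConfigSpace()
struct.pack_into("<H", cfg.data, 0x100 + SRIOV_TOTAL_VF, 64)  # pretend TotalVFs = 64
final_ctrl = enable_vfs(cfg, 0x100, 8, sleep=lambda _seconds: None)
print(hex(final_ctrl), [(hex(o), hex(v)) for o, v in cfg.writes])
```

The `sleep` parameter is injectable only so the example runs instantly; real software would also program the VF BARs before turning on VF MSE.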
Each VF has a unique BDF (Routing ID) computed from the PF’s BDF using the First VF Offset and VF Stride fields. This lets the hardware designer place VFs at any function numbers and with any spacing, as long as the pattern follows: VF n Routing ID = PF Routing ID + First VF Offset + (n − 1) × VF Stride, where n counts from 1.
Routing ID arithmetic is done on 16-bit integers, with the bus number in bits [15:8]. If the sum carries out of the low 8 bits (past Fn 255 when ARI is in use), the VF lands on the next bus number. Before enabling VFs, software must check that every VF Routing ID is valid and that any extra bus numbers are reserved in the bus range of the bridge above the device.
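The Routing ID calculation, VF n = PF Routing ID + First VF Offset + (n − 1) × VF Stride, can be sketched in a few lines of Python. The function names and the example offset and stride values are assumptions chosen to show a rollover onto the next bus.

```python
def vf_routing_ids(pf_rid, first_vf_offset, vf_stride, num_vfs):
    """Return the 16-bit Routing IDs of VF 1..num_vfs."""
    rids = []
    for n in range(1, num_vfs + 1):
        rid = pf_rid + first_vf_offset + (n - 1) * vf_stride
        if rid > 0xFFFF:
            raise ValueError(f"VF {n} Routing ID {rid:#x} does not fit in 16 bits")
        rids.append(rid)
    return rids


def extra_buses_needed(pf_rid, rids):
    """How many bus numbers beyond the PF's own bus the VFs occupy."""
    return max(rid >> 8 for rid in rids) - (pf_rid >> 8)


# PF at bus 3, function 0; sample First VF Offset = 0x80, VF Stride = 2.
pf = 0x0300
rids = vf_routing_ids(pf, first_vf_offset=0x80, vf_stride=2, num_vfs=70)
print(hex(rids[0]), hex(rids[-1]))   # 0x380 0x40a
print(extra_buses_needed(pf, rids))  # 1
```

Here VF 70 falls on bus 4, so the bridge above the device would need a bus range that covers at least one bus beyond the PF's.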
VFs expose MMIO space to their assigned VMs, but the BAR registers in a VF’s Type 0 configuration header are not used (they read as zero). Instead, the PF’s SR-IOV capability has six VF BAR registers (VF BAR0 through VF BAR5). Each VF BAR describes one contiguous MMIO aperture that is split into equal, consecutive slices, one per active VF. The System Page Size sets the minimum size and alignment of each slice.
VF BAR sizing in the PF’s SR-IOV capability works like regular BAR sizing: write 0xFFFFFFFF, read back, mask off the low type bits, and find the lowest set bit to get the per-VF size. The total aperture to allocate is per_VF_size × NumVFs, and its base must be aligned to per_VF_size. Software allocates this region and writes the base address to the VF BAR register in the SR-IOV Extended Capability. The hypervisor then maps each VF’s slice of the aperture into the address space of the VM that owns the VF.
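A minimal sketch of the sizing and slicing arithmetic follows, assuming a 32-bit memory VF BAR. The function names and the sample read-back value and base address are illustrative only.

```python
def bar_size_from_readback(readback: int) -> int:
    """Per-VF size implied by the value read back after writing 0xFFFFFFFF."""
    mask = readback & 0xFFFFFFF0  # bits [3:0] of a memory BAR are type flags
    if mask == 0:
        return 0                  # BAR not implemented
    return mask & -mask           # lowest set bit gives the size


def vf_bar_slices(base: int, per_vf_size: int, num_vfs: int) -> list:
    """Return (start, size) of each VF's MMIO slice, for VF 1..num_vfs."""
    if base % per_vf_size:
        raise ValueError("VF BAR base must be aligned to the per-VF size")
    return [(base + (n - 1) * per_vf_size, per_vf_size)
            for n in range(1, num_vfs + 1)]


per_vf = bar_size_from_readback(0xFFFFC000)  # sample read-back value
slices = vf_bar_slices(0xF0000000, per_vf, num_vfs=8)
print(per_vf, per_vf * 8)   # 16384 131072
print(hex(slices[2][0]))    # 0xf0008000 (VF 3)
```

VF n starts at VF_BAR_base + (n − 1) × per_VF_size, so the slices are contiguous and never overlap.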
The most critical security requirement for SR-IOV deployments is preventing VMs from accessing each other’s memory. When a VM performs DMA via its assigned VF, the DMA request carries the VF’s Requester ID. The IOMMU uses this Requester ID to look up which address space translation to apply — only addresses in the VM’s allocated IOMMU pages are accessible.
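The Requester ID lookup can be pictured with a toy model. This is only a conceptual sketch: `ToyIommu` is a made-up class with a flat one-level page table per Requester ID, whereas real IOMMUs use multi-level context and page tables.

```python
PAGE_SIZE = 4096


class ToyIommu:
    """Toy model: one flat page table per Requester ID."""

    def __init__(self):
        self._tables = {}  # requester_id -> {iova page number: host page number}

    def map_page(self, requester_id, iova, host_phys):
        table = self._tables.setdefault(requester_id, {})
        table[iova // PAGE_SIZE] = host_phys // PAGE_SIZE

    def translate(self, requester_id, iova):
        """Translate a DMA address, or fault if this requester has no mapping."""
        page = self._tables.get(requester_id, {}).get(iova // PAGE_SIZE)
        if page is None:
            raise PermissionError(
                f"DMA fault: RID {requester_id:#06x} has no mapping for {iova:#x}")
        return page * PAGE_SIZE + iova % PAGE_SIZE


iommu = ToyIommu()
iommu.map_page(0x0380, iova=0x1000, host_phys=0x7A000000)  # VF 1 -> VM A's page
print(hex(iommu.translate(0x0380, 0x1010)))                # 0x7a000010
try:
    iommu.translate(0x0382, 0x1010)                        # VF 2 uses the same IOVA
except PermissionError as fault:
    print(fault)
```

The same I/O virtual address resolves for one VF and faults for another, because the lookup key is the Requester ID carried in the TLP rather than anything the guest can choose.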
Without ACS (Access Control Services), a PCIe switch between the SR-IOV device and the Root Complex could allow peer-to-peer DMA between two VFs on the same switch — bypassing the IOMMU entirely. A compromised VM could DMA directly to another VM’s memory without the IOMMU seeing the request.
The ACS features needed for SR-IOV are: ACS Source Validation (rejects spoofed Requester IDs), ACS Translation Blocking (blocks requests whose AT field is not 00b, so a device cannot claim that its addresses are already translated; it cannot be used on paths where ATS is deliberately in use), and ACS P2P Request Redirect (forces all peer-to-peer requests upstream through the IOMMU). These must be enabled on every Downstream Port in the path from the VF to the Root Complex.
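A check of the ACS Control register along a path might look like the sketch below. The bit positions follow the ACS Control register layout (Source Validation in bit 0, Translation Blocking in bit 1, P2P Request Redirect in bit 2, and so on); the function names, the default required set, and the sample register values are assumptions for illustration.

```python
ACS_BITS = {
    "SV": 0x01,  # Source Validation
    "TB": 0x02,  # Translation Blocking
    "RR": 0x04,  # P2P Request Redirect
    "CR": 0x08,  # P2P Completion Redirect
    "UF": 0x10,  # Upstream Forwarding
}
REQUIRED = ("SV", "TB", "RR")  # the three features named in the text


def missing_acs(ctrl: int, required=REQUIRED) -> list:
    """Names of required ACS features not enabled in this ACS Control value."""
    return [name for name in required if not ctrl & ACS_BITS[name]]


def path_is_isolated(port_ctrls, required=REQUIRED) -> bool:
    """True if every port on the path has all required ACS features enabled."""
    return all(not missing_acs(ctrl, required) for ctrl in port_ctrls)


print(missing_acs(0x05))               # ['TB']
print(path_is_isolated([0x1F, 0x07]))  # True
print(path_is_isolated([0x1F, 0x05]))  # False
```

A single port with a missing feature is enough to break isolation for everything behind it, which is why the check covers the whole path and not just the port nearest the device.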
VF Migration allows a running virtual machine (with its VF assigned) to be migrated from one physical server to another without stopping the VM. This is much harder than software-only VM migration because the VF’s hardware state (DMA queues, receive buffers, hardware ring buffer pointers) must be saved and restored on the destination server.
SR-IOV defines a VF Migration State Array in the PF’s MMIO space (pointed to by a PF BAR). Each VF has a migration state register that the hypervisor monitors. During migration, the hypervisor quiesces the VF (stops new DMA), checkpoints the hardware state, transfers it to the destination host, and restores it there. The VF Migration Enable bit in SR-IOV Control turns the feature on, and the separate VF Migration Interrupt Enable bit enables the interrupt that signals a VF migration state change.
In practice, VF migration is rarely implemented in hardware — the complexity of saving all device-specific hardware state makes it device-class specific. Most SR-IOV deployments use “cold migration” (VM suspended, VF released, VM re-started on destination server with a new VF) rather than live migration.
| Property | PF Driver (hypervisor) | VF Driver (VM) |
|---|---|---|
| Runs in | Privileged domain (hypervisor, Dom0, host kernel) | Guest VM (unprivileged) |
| Controls | VF creation/destruction (NumVFs, VF Enable), hardware resource allocation, queue configuration | Only its own VF’s data plane — Tx/Rx queues, interrupts |
| Sees | Full PF configuration space including SR-IOV Extended Capability, all VF status | Only the VF’s own configuration space — Vendor ID, Device ID, VF BARs, minimal capabilities |
| DMA address space | Host-owned IOMMU domain; not confined to any one VM’s memory | Only the IOMMU pages assigned to this VF’s VM, with hardware-enforced isolation |
| MSI-X vectors | PF’s own MSI-X table for management events | VF’s own separate MSI-X table — independent vectors targeting VM’s vCPU |
| Example (NIC) | Intel PF driver: manages switchdev, traffic shaping, VF MAC assignment, VLAN configuration | Intel VF driver (iavf): sends/receives packets on its allocated hardware queue |
| Example (GPU) | NVIDIA MIG: partitions GPU compute resources per VF | CUDA driver in VM: uses its partition without seeing other partitions |
| Device | SR-IOV usage | VF count | Per-VM benefit |
|---|---|---|---|
| 100 GbE NIC (Mellanox ConnectX, Intel E810) | Each VM gets a dedicated hardware Tx/Rx queue pair — direct DMA to VM memory without hypervisor | 64–128 VFs | Near wire-speed networking, sub-microsecond latency in HFT/HPC |
| NVMe SSD with SR-IOV (Samsung PM9A3) | Each VM gets its own NVMe queue pair — isolated namespace access, hardware-enforced I/O isolation | 8–64 VFs | Dedicated I/O bandwidth, no shared queue contention |
| GPU (NVIDIA A100 MIG, AMD MI300X) | GPU is partitioned into slices (Multi-Instance GPU) — each slice presented as a VF | 2–7 MIG instances | GPU compute isolation — each VM gets a fraction of the silicon |
| SmartNIC/DPU (NVIDIA BlueField) | Host NIC ports virtualised as VFs for tenants; DPU runs control plane | 64+ VFs | Zero-touch network offload — encryption, flow steering in hardware |
| SR-IOV in Kubernetes | Each pod gets a VF allocated by the SR-IOV Network Device Plugin | 1 VF per pod | Container networking at NIC line rate — replaces software overlay |
The SR-IOV Extended Capability structure (Cap ID 0010h, all register offsets, the VF Offset + Stride formula, the VF BAR sizing procedure, ARI, the enable sequence) is completely unchanged in Gen 6. SR-IOV is defined in configuration space and software, above the Transaction Layer; the main Gen 6 changes (PAM4 signalling, FEC, FLIT mode) are in the Physical and Data Link layers.
What changes in Gen 6 SR-IOV deployments:
| Aspect | Gen 6 impact |
|---|---|
| SR-IOV register layout | Unchanged — same Extended Capability, same VF BDF formula, same enable sequence |
| VF count limits | Unchanged — TotalVFs max is 65535, ARI supports 256 functions per bus. The limiting factor is now the device hardware (queue count) not the PCIe spec. |
| Bandwidth per VF | Gen 6 at 64 GT/s × 16 lanes = 1024 Gb/s, or about 128 GB/s raw per direction (256 GB/s counting both directions). With 64 VFs sharing the link equally, each VF averages about 2 GB/s per direction, roughly a dedicated Gen 4 x1 link, and a single VF can burst much higher while the others are idle. |
| AI accelerator SR-IOV | Gen 6 AI accelerators (next-generation H200 successors, AMD MI400 class) are expected to use SR-IOV or MIG-style partitioning to share GPU silicon across tenants. The higher link rate leaves more headroom per VF, which makes PCIe less likely to be the bottleneck for DMA-heavy workloads such as LLM inference. |
| PASID + SR-IOV | Gen 6 workloads increasingly combine SR-IOV with PASID (PCIe-11, PCIe-22) — each compute kernel within a VF can have its own IOMMU address space via PASID, enabling per-process isolation within a VM that itself has a VF assigned. |
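| ACS and IDE | IDE (Integrity and Data Encryption, Extended Cap ID 0030h) was introduced as an ECN in the PCIe 5.0 timeframe and is part of the Gen 6 base specification. It encrypts and integrity-protects TLPs, so it can protect VF DMA traffic from physical eavesdropping on the PCIe traces or cables. Critical for multi-tenant cloud confidential computing. |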
| VF MSI-X vectors | 2048 MSI-X vectors per VF (PCIe spec max) may be reached by AI accelerator VFs with many compute queues. Gen 6 AI workloads may push this limit. |
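Per-VF bandwidth follows from the raw link rate: each GT/s carries 1 Gb/s of raw bits per lane, so the figure is a short calculation. The sketch below ignores FLIT, FEC and protocol overhead and assumes an equal split between VFs, which real traffic rarely follows; the function name is illustrative.

```python
def raw_link_bandwidth(gt_per_s: float, lanes: int, num_vfs: int = 1):
    """Return (raw GB/s per direction, equal share per VF in GB/s)."""
    raw_gbytes = gt_per_s * lanes / 8  # 8 bits per byte
    return raw_gbytes, raw_gbytes / num_vfs


print(raw_link_bandwidth(64, 16, num_vfs=64))  # (128.0, 2.0)   Gen 6 x16, 64 VFs
print(raw_link_bandwidth(16, 1))               # (2.0, 2.0)     Gen 4 x1
print(raw_link_bandwidth(16, 4))               # (8.0, 8.0)     Gen 4 x4
```

An equal share of a Gen 6 x16 link among 64 VFs is therefore about the raw rate of a Gen 4 x1 link per direction.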
| Item | Value / Rule |
|---|---|
| SR-IOV Extended Capability ID | 0010h — in PF extended config space (100h+), accessible via ECAM |
| Physical Function (PF) | Standard PCIe endpoint. Contains SR-IOV capability. Managed by hypervisor. One per SR-IOV capable function. |
| Virtual Function (VF) | Lightweight endpoint. Own BDF address. Own VF BARs. No SR-IOV capability of its own. Assigned to VM. |
| TotalVFs | Hardware maximum VF count. Read-only. Software may not set NumVFs > TotalVFs. |
| NumVFs | Write before VF Enable. Sets how many VFs are created. Can only be changed when VF Enable = 0. |
| VF Enable (bit 0) | Write 1 to create VFs. Wait ≥100ms before VF config space access. Write 0 to destroy all VFs. |
| VF MSE (bit 3) | Must be set for VFs to respond to memory TLPs. Independent of Command register MSE in PF. |
| ARI Capable Hierarchy (bit 4) | Set before VF Enable when using ARI. Also requires ARI Forwarding Enable in Device Control 2 of the Downstream Port (switch or Root Port) directly above the device. |
| First VF Offset | Routing ID offset from PF to VF1. Read-only, set by hardware. |
| VF Stride | Routing ID increment between consecutive VFs. Read-only, set by hardware. |
| VF BDF formula | VF_n = PF_Routing_ID + First_VF_Offset + (n−1) × VF_Stride. n counts from 1. |
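| ARI | Extends the Function number to 8 bits (256 functions per bus) by treating the Device number as implicitly 0 and reusing its 5 bits. Needed in practice when a PF exposes more VFs than fit in function numbers 1 to 7. |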
| VF BAR sizing | Same write-0xFFFFFFFF / read-back procedure as regular BARs. VF BAR total size = per-VF-size × NumVFs. |
| VF MMIO address formula | VF_n base = VF_BAR_base + (n−1) × per_VF_size |
| VF Device ID | All VFs have the same VF Device ID (from SR-IOV Cap register). Vendor ID = PF’s Vendor ID. |
| ACS requirement | ACS Source Validation + P2P Request Redirect must be enabled on all switch ports between SR-IOV device and Root Complex. Linux VFIO verifies this. |
| IOMMU isolation | Each VF’s DMA is isolated by IOMMU per-VF address space. PASID extends this to per-process within the VM. |
| VF config space | Minimal — Vendor ID, VF Device ID, Subsystem IDs, a few mandatory capabilities. No BARs in Type 0 header (BARs are in PF’s SR-IOV Extended Capability). |
| Gen 6 changes | SR-IOV mechanism unchanged. Per-VF bandwidth dramatically higher at 64 GT/s. IDE adds per-VF TLP encryption. PASID + SR-IOV for per-process IOMMU isolation in AI workloads. |