CA-04: System Bus & Bus Design – Your VLSI Journey Starts Here

🚌What Is a Bus?

A bus is a shared communication pathway connecting two or more devices. The word “bus” comes from the Latin omnibus — “for all” — reflecting that the pathway is available to all connected devices, not just a point-to-point link between two.

Three key properties define a bus:

Shared medium: Multiple devices connect to the same physical wires. Any signal placed on the bus by one device is simultaneously visible to all other devices.
Broadcast: Transmission is inherently broadcast — a signal from device A reaches all devices B, C, D at the same time.
One transmitter at a time: Because the medium is shared, only one device may transmit at any given moment. If two devices drive the bus simultaneously, their signals overlap and corrupt each other; this is called bus contention.

Bus organisation: A bus is usually a group of parallel wires bundled together, each carrying one bit. A 32-bit data bus is 32 separate single-bit lines. In physical hardware, buses appear as: parallel copper traces on a PCB, ribbon cables, or edge-connector slots (like PCI/PCIe slots on a motherboard).

Figure 1 — Bus as shared medium: one transmits, all receive

The CPU is transmitting — its signal is broadcast to all connected devices simultaneously. Memory, I/O Module 1, I/O Module 2, and the DMA controller all see the signal at the same time. The address on the address bus identifies which device should respond. Only the addressed device acts on the data.
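The shared-medium behaviour described above can be sketched in a few lines of Python. This is a minimal model, not a description of any real bus: the class names `Device` and `SharedBus`, the address ranges, and the `acquire`/`release` methods are all hypothetical and chosen only for illustration.

```python
class Device:
    """A bus device that responds only to addresses in its own range."""

    def __init__(self, name, base, size):
        self.name, self.base, self.size = name, base, size
        self.seen = []    # every address that was broadcast on the bus
        self.store = {}   # data this device actually accepted

    def snoop(self, address, data):
        self.seen.append(address)               # every device sees the signal
        if self.base <= address < self.base + self.size:
            self.store[address] = data          # only the addressed one acts
            return True
        return False


class SharedBus:
    """One set of wires: a single driver at a time, all devices listen."""

    def __init__(self, devices):
        self.devices = devices
        self.driver = None   # name of the device currently transmitting

    def acquire(self, name):
        if self.driver is not None:
            raise RuntimeError(f"bus contention: {self.driver} and {name}")
        self.driver = name

    def release(self):
        self.driver = None

    def write(self, address, data):
        # Broadcast to all devices; return the names of those that responded.
        return [d.name for d in self.devices if d.snoop(address, data)]


memory = Device("memory", 0x0000, 0x8000)   # assumed address map
io1 = Device("io1", 0x8000, 0x0100)
bus = SharedBus([memory, io1])

bus.acquire("cpu")
responders = bus.write(0x8004, 0x5A)
bus.release()
print(responders)
```

Both devices record the address in `seen`, but only `io1` stores the data. A second `acquire` before `release` raises an error, which stands in for the electrical corruption that real contention would cause.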

🔌Bus Structure — Three Line Types

A system bus typically consists of 50 to 100 separate lines grouped into three functional sets. Each line is a single-bit channel; together they form the full bus:

Figure 2 — System bus: address, data, and control line groups

The three system bus line groups. Address lines (blue) run unidirectionally from CPU to all devices and specify which memory location or I/O port is being accessed. Data lines (orange) are bidirectional: data flows from the CPU to memory on a write and from memory to the CPU on a read. Control lines (purple) carry command and timing signals in both directions.

Bus group	Direction	Width	What it carries	Impact on system
Address Bus	CPU → all (unidirectional)	32 or 64 lines	Memory address or I/O port number being accessed	Width determines max addressable memory: 32-bit → 4 GB, 64-bit → 16 EB
Data Bus	Bidirectional	8, 16, 32, or 64 lines	The actual data value being read or written	Width determines transfer bandwidth: 64-bit bus moves 8 bytes per cycle vs 1 byte for 8-bit
Control Bus	Bidirectional	~10–20 lines	Read/write commands, IRQ, clock, ACK, bus grant/request	Determines bus protocol richness and arbitration capability

🔍 Worked Example — Data bus width and instruction fetch efficiency

Scenario: A CPU has an 8-bit data bus. Each instruction is 16 bits (2 bytes). How many memory accesses are needed per instruction fetch?

Answer: 2 memory accesses, one per byte (which byte comes first depends on the CPU's byte ordering). This halves effective instruction fetch throughput compared with a bus that delivers the whole instruction in one access.

Solution: Widen the data bus to 16 bits → 1 access per instruction fetch. This doubles throughput with no change to clock frequency. Most modern CPUs use 64-bit data buses and fetch multiple instructions per cycle.

Address bus example: The Intel 8080 (1974) had a 16-bit address bus → 2¹⁶ = 64 KB maximum memory. The Intel 8086 (1978) had a 20-bit address bus → 1 MB. The 80386 (1985) had a 32-bit address bus → 4 GB. Modern 64-bit processors implement fewer than 64 address bits; a 48-bit address, for example, reaches 2⁴⁸ = 256 TB.
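The arithmetic behind both worked examples is simple enough to check in code. The sketch below uses hypothetical helper names (`accesses_per_fetch`, `addressable_bytes`, `human`) and assumes byte-addressable memory with binary units (1 KB = 1024 bytes), matching the figures quoted above.

```python
def accesses_per_fetch(instruction_bits, data_bus_bits):
    """Memory accesses needed to fetch one instruction (ceiling division)."""
    return -(-instruction_bits // data_bus_bits)


def addressable_bytes(address_bits):
    """Number of byte locations reachable with the given address width."""
    return 2 ** address_bits


def human(n_bytes):
    """Format a power-of-two byte count using binary units."""
    units = ["B", "KB", "MB", "GB", "TB", "PB", "EB"]
    i = 0
    while n_bytes >= 1024 and i < len(units) - 1:
        n_bytes //= 1024
        i += 1
    return f"{n_bytes} {units[i]}"


print(accesses_per_fetch(16, 8))    # 16-bit instruction over an 8-bit bus
print(accesses_per_fetch(16, 16))   # same instruction over a 16-bit bus
for bits in (16, 20, 32, 48, 64):
    print(bits, "address bits ->", human(addressable_bytes(bits)))
```

Each extra address line doubles the reachable memory, which is why going from 16 to 20 lines moved the limit from 64 KB to 1 MB.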

🎛️Control Lines Reference

The control bus carries all the signalling needed to orchestrate bus transactions. Each line has a specific function:

Control signal	Direction	Meaning
Memory Write	CPU → Memory	Data on the data bus should be written to the address on the address bus
Memory Read	CPU → Memory	Data at the addressed memory location should be placed on the data bus
I/O Write	CPU → I/O	Data on the data bus should be output to the addressed I/O port
I/O Read	CPU → I/O	Data from the addressed I/O port should be placed on the data bus
Transfer ACK	Slave → Master	Data has been accepted from or placed on the bus (handshake acknowledgement)
Bus Request	Device → Arbiter	A module (e.g. DMA) needs to gain control of the bus
Bus Grant	Arbiter → Device	The requesting module has been granted control of the bus — it may now drive address and data
Interrupt Request	Device → CPU	An interrupt is pending — the device needs CPU attention
Interrupt ACK	CPU → Device	The pending interrupt has been recognised — the CPU is about to handle it
Clock	Clock source → all	Synchronises all bus operations to a common clock edge (synchronous buses)
Reset	CPU/System → all	Initialises all modules to a known power-on state

🔗Single vs Multiple Bus

A single system bus connecting all components is the simplest design, but it creates a bottleneck as the system grows. Adding more devices increases contention — only one device can transmit at a time, so traffic from many devices queues up.

Single bus problems: (1) Propagation delay increases as more devices load the bus — the bus takes longer to settle. (2) If aggregate data transfer demand approaches bus bandwidth, the bus becomes a bottleneck. A single fast CPU paired with a single slow bus will idle most of the time waiting for memory. Solution: multiple buses at different speed tiers.

Figure 3 — Single bus vs multi-bus hierarchy

Single bus (left): all devices share one path — fast CPU is delayed by slow disk or USB traffic. Multi-bus hierarchy (right): high-speed bus handles CPU-cache-RAM traffic; a bridge connects to a slower expansion bus for I/O devices. The bridge isolates the two domains so slow I/O does not stall fast CPU-memory operations.

🧩Elements of Bus Design — Overview

Five elements must be specified when designing a bus. Together they determine the bus’s performance, cost, and complexity:

① BUS TYPE — Dedicated vs Multiplexed

Are address and data lines separate wires, or do they share the same lines at different times?

② ARBITRATION — Centralised vs Distributed

When multiple devices want the bus simultaneously, which device decides who gets it?

③ TIMING — Synchronous vs Asynchronous

Are bus events locked to a clock, or do they use handshake signals to self-time?

④ BUS WIDTH — Address & Data widths

How many address bits (determines max memory)? How many data bits (determines bandwidth)?

⑤ DATA TRANSFER TYPE — Read, Write, Block…

What kinds of transfers does the bus protocol support — single word, burst, read-modify-write?

①Element 1 — Bus Type: Dedicated vs Multiplexed

Type	How it works	Advantage	Disadvantage
Dedicated	Separate physical lines permanently assigned to address, separate lines for data. Both sets active simultaneously.	Highest throughput — address and data transfer can overlap. Simpler control logic. Lower latency.	More physical wires needed — higher pin count, larger connectors, more PCB traces.
Multiplexed	Address and data share the same lines. An “address valid” control signal indicates when the lines carry an address; “data valid” indicates when they carry data.	Fewer lines → lower pin count → cheaper packaging, smaller connectors, simpler PCB layout. Critical for ICs with limited pins.	More complex control logic. Address and data cannot transfer simultaneously — adds latency. Lower peak throughput.

Real example — Intel 8086: The original 8086 used a multiplexed address/data bus (AD0–AD15). The same 16 pins carried the address in the first clock cycle and data in subsequent cycles. The ALE (Address Latch Enable) signal told external logic when to latch the address. This saved 16 pins but required an external 8282/8283 latch chip — exactly the trade-off between cost and complexity described above.
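To make the role of the address latch concrete, here is a simplified two-phase model of a multiplexed write. It is a sketch under stated assumptions: `AddressLatch` and `multiplexed_write` are hypothetical names, memory is a plain dictionary, and real 8086 bus cycles involve more states and signals than the two phases shown.

```python
class AddressLatch:
    """External latch: captures the shared AD lines while ALE is high."""

    def __init__(self):
        self.q = 0   # latched address presented to memory

    def clock(self, ad_lines, ale):
        if ale:
            self.q = ad_lines   # follow the lines only when ALE is asserted
        return self.q


def multiplexed_write(memory, latch, address, data):
    """Write one word over shared address/data lines; returns phases used."""
    # Phase 1: the shared lines carry the address and ALE is asserted.
    latch.clock(ad_lines=address, ale=True)
    # Phase 2: the same lines now carry data; ALE is low, so the latch
    # keeps holding the address for the memory.
    held_address = latch.clock(ad_lines=data, ale=False)
    memory[held_address] = data
    return 2


memory = {}
phases = multiplexed_write(memory, AddressLatch(), 0x1234, 0xBEEF)
print(phases, memory)
```

Without the latch, the address would vanish from the lines as soon as data appeared on them. A dedicated bus needs no latch and could present both in a single phase, at the cost of the extra wires.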

②Element 2 — Arbitration: Centralised vs Distributed

When more than one device wants bus access simultaneously (CPU, DMA controller, and I/O module all need the bus), arbitration determines which device gets control. Only one device may be the bus master at any time — others must wait.

Figure 4 — Bus arbitration: centralised (left) vs distributed (right)

Centralised arbitration (left): one Bus Arbiter receives Bus Requests from all devices and sends Bus Grants. Simple but a single point of failure. Distributed arbitration (right): each device contains its own priority logic and negotiates access via shared arbitration wires — more resilient but more complex per device.
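The two schemes can be contrasted with a short sketch. Both functions are illustrative assumptions rather than any specific bus standard: the centralised version uses a fixed priority order, and the distributed version models self-selection, where the shared arbitration lines are assumed to show the highest requesting ID and each device checks locally whether it won.

```python
def central_arbiter(requests, priority):
    """Centralised: one arbiter scans requests in fixed priority order."""
    for device in priority:
        if device in requests:
            return device        # Bus Grant goes to this device only
    return None                  # nobody asked for the bus


def distributed_winner(requesting_ids):
    """Distributed: every requester drives its ID onto shared lines."""
    if not requesting_ids:
        return None
    lines = max(requesting_ids)  # what all devices observe on the lines
    # Each device compares the lines with its own ID; the match is the master.
    winners = [dev for dev in requesting_ids if dev == lines]
    return winners[0]


priority = ["dma", "io_module", "cpu"]   # assumed order, highest first
print(central_arbiter({"cpu", "dma"}, priority))
print(central_arbiter({"cpu"}, priority))
print(distributed_winner([3, 7, 5]))
```

In the centralised case, all decision logic lives in one function, which mirrors the single point of failure noted above. In the distributed case, the comparison is repeated inside every device, so no single component decides alone.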

③Element 3 — Timing: Synchronous vs Asynchronous

Timing defines how bus events are coordinated between devices. The two approaches are fundamentally different in their clock relationship:

Figure 5 — Synchronous vs asynchronous bus timing for a read operation

Synchronous (top): all events happen at predetermined clock edges — simple, fast, but all devices must run at the same speed. Asynchronous (bottom): MSYN/SSYN handshake signals allow any-speed slave — the master waits until the slave signals data ready, regardless of how many cycles that takes.

Aspect	Synchronous	Asynchronous
Clock	Shared clock line — all events on clock edges	No clock — handshake signals only
Speed	Faster — no handshake overhead	Slower per transfer — handshake adds latency
Device speed	All devices must run at the same bus speed	Works with any-speed device — slave dictates its own readiness
Complexity	Simple — count clock edges	More complex — handshake state machine needed
Max cable length	Limited by clock skew	Longer cables possible — no skew problem
Examples	PCI, DDR SDRAM, SPI	Original Unibus, VMEbus
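The asynchronous read in Figure 5 follows a four-step handshake, which the following sketch walks through. It is a sequential model, not a timing-accurate simulation: `async_read` is a hypothetical function, and `slave_delay` is an assumed parameter standing for however long the slave takes to produce data.

```python
def async_read(memory, address, slave_delay):
    """Four-phase MSYN/SSYN read; returns (data, list of events)."""
    trace = []

    # 1. Master drives the address and asserts MSYN.
    msyn = True
    trace.append("MSYN up")

    # 2. Slave takes as long as it needs, then drives data and asserts SSYN.
    for _ in range(slave_delay):
        trace.append("master waits")
    data_lines = memory[address]
    ssyn = True
    trace.append("SSYN up")

    # 3. Master latches the data and negates MSYN.
    data = data_lines
    msyn = False
    trace.append("MSYN down")

    # 4. Slave sees MSYN low, releases the data lines and negates SSYN.
    ssyn = False
    trace.append("SSYN down")

    assert not msyn and not ssyn   # bus is idle again
    return data, trace


fast = async_read({0x10: 7}, 0x10, slave_delay=0)
slow = async_read({0x10: 7}, 0x10, slave_delay=3)
print(fast)
print(slow)
```

Both calls return the same data; only the number of wait events differs. On a synchronous bus, the number of cycles would instead be fixed in advance by the protocol.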

④Element 4 — Bus Width

Bus width affects two separate system properties — address bus width affects memory capacity; data bus width affects bandwidth:

Bus dimension	Effect of more bits	Impact	Example
Address bus width	Larger address space	32-bit → 2³² = 4 GB max memory. 64-bit → 2⁶⁴ = 16 EB max memory	Intel 8080: 16-bit → 64 KB. x86-64 with 48 address bits → 256 TB
Data bus width	More data per cycle	8-bit bus: 1 byte/cycle. 64-bit bus: 8 bytes/cycle, 8× bandwidth for same clock rate	8088: 8-bit external. 8086: 16-bit. Pentium: 64-bit. Modern CPUs: 64-bit memory channels (a DDR5 DIMM splits these into two 32-bit subchannels)

Key insight: internal and external data widths can differ. The Intel 8088 had a 16-bit internal data bus but only an 8-bit external data bus, which allowed it to be used with cheaper 8-bit memory and peripheral hardware. It accessed memory twice per 16-bit word fetch. The 8086 had a 16-bit external data bus and was faster despite the same clock rate. External data bus width directly limits instruction fetch and data transfer throughput.

⑤Element 5 — Data Transfer Types

A bus protocol defines which types of data transfers it supports. All buses support basic read and write; richer protocols add compound and burst operations:

Transfer type	Description	Use case
Read	Slave → Master: master requests data at an address; slave places it on the data bus	CPU reads from memory or I/O port
Write	Master → Slave: master sends address + data; slave stores it	CPU writes to memory or I/O port
Read-modify-write	Atomic read then write to the same address without releasing the bus between operations	Semaphore operations, test-and-set, compare-and-swap — critical for multiprocessor synchronisation
Read-after-write	Write followed immediately by a read from the same address to verify the write succeeded	Verifying writes to I/O-mapped hardware registers
Block transfer	Multiple consecutive words transferred in a burst — one address phase, multiple data phases	Cache line fill (typically 64 bytes = 8 × 64-bit words), DMA transfers, disk sector reads
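The read-modify-write row is the least intuitive of these, so here is a small sketch of a test-and-set built on it. Everything here is hypothetical (`LockableBus`, its `owner` field, the semaphore address), and in single-threaded Python the operation is trivially atomic; the point is only to show that the bus is held across both the read and the write.

```python
class LockableBus:
    """Toy bus whose memory can be accessed by one master at a time."""

    def __init__(self):
        self.memory = {}
        self.owner = None   # master currently holding the bus

    def test_and_set(self, master, address):
        """Read the old value and write 1 within a single bus tenure."""
        if self.owner is not None:
            raise RuntimeError("bus busy")
        self.owner = master                 # bus held for both halves
        old = self.memory.get(address, 0)   # read phase
        self.memory[address] = 1            # write phase, no release between
        self.owner = None                   # bus released only now
        return old


bus = LockableBus()
SEMAPHORE = 0x40                            # assumed semaphore location
first = bus.test_and_set("cpu0", SEMAPHORE)
second = bus.test_and_set("cpu1", SEMAPHORE)
print(first, second)
```

The first caller reads 0 and therefore owns the semaphore; the second reads 1 and knows it must retry. If another master could slip in between the read and the write, both could read 0 and both would believe they had acquired it.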

Why block transfer matters: A cache line fill requires 64 bytes. With a 64-bit data bus at 100 MHz (10 ns per cycle), fetching 64 bytes one word at a time takes 8 transfers × 3 cycles each (address, data, ACK) = 24 cycles minimum, or 240 ns. A burst transfer needs one address phase plus 8 back-to-back data phases = 9 cycles, or 90 ns. Block transfers are the key reason modern memory systems can sustain high bandwidth despite long initial latency.
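The cycle counts in this example can be reproduced with a short calculation. The function names are hypothetical, and the 3-cycle single transfer and 1-cycle address phase are the assumptions stated in the paragraph above, not properties of any particular bus.

```python
def single_word_cycles(words, cycles_per_transfer=3):
    """Each word pays for its own address, data and ACK cycle."""
    return words * cycles_per_transfer


def burst_cycles(words, address_cycles=1):
    """One address phase, then one data phase per word."""
    return address_cycles + words


def bandwidth_mb_s(total_bytes, cycles, clock_hz):
    """Effective bandwidth in MB/s (1 MB = 10**6 bytes)."""
    return total_bytes * clock_hz / cycles / 1e6


LINE_BYTES, BUS_BYTES, CLOCK_HZ = 64, 8, 100_000_000
words = LINE_BYTES // BUS_BYTES

slow = single_word_cycles(words)
fast = burst_cycles(words)
print(slow, fast)
print(round(bandwidth_mb_s(LINE_BYTES, slow, CLOCK_HZ), 1))
print(round(bandwidth_mb_s(LINE_BYTES, fast, CLOCK_HZ), 1))
```

Under these assumptions the burst moves the same cache line in 9 cycles instead of 24, and the advantage grows with burst length because the address phase is paid only once.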

🔬VLSI Connections

🔬 System bus → AMBA (AXI / AHB / APB) in modern SoCs

Every concept in this article maps directly to the AMBA bus family used in ARM-based SoCs. The three bus line types (address, data, control) become the AXI4 channel signals: AWADDR/ARADDR (address), WDATA/RDATA (data), and AWVALID/AWREADY/WVALID/WREADY/BVALID/BREADY (control/handshake). AXI4’s separate read and write channels are the “dedicated bus” approach applied at the channel level. Synchronous timing is used — all signals are sampled on the rising clock edge. Bus arbitration is implemented in the AXI interconnect fabric (crossbar or bus matrix) — the functional equivalent of the centralised arbiter. The AMBA specification is essentially a formal, standardised version of the bus design elements you just learned.

🔬 AXI4 handshake = asynchronous-style protocol over a synchronous bus

AXI4’s VALID/READY handshake follows the same idea as the MSYN/SSYN asynchronous handshake in Figure 5, but it is implemented on a synchronous (clocked) bus. The sender asserts VALID when it has a valid address or data; the receiver asserts READY when it can accept. A transfer occurs only when both VALID and READY are HIGH on the same rising clock edge. This gives the same “any-speed slave” flexibility as asynchronous buses while retaining the simplicity of synchronous clocking. Understanding the asynchronous handshake concept from this article is the key to understanding why AXI4 works the way it does. The receiver can hold READY low for as long as it needs, which creates the back-pressure mechanism used for flow control; the sender, once it has asserted VALID, must keep it asserted until the handshake completes. Crossing between clock domains is a separate problem, handled by dedicated bridges or asynchronous FIFOs rather than by the handshake itself.
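A cycle-by-cycle sketch makes the VALID/READY rule easy to see. This is a behavioural model of one channel only, with a hypothetical function name and an assumed `ready_pattern` input describing when the receiver is ready; it assumes the sender keeps VALID asserted until each item is accepted.

```python
def simulate_channel(payloads, ready_pattern):
    """Return the (cycle, data) pairs transferred over a VALID/READY channel."""
    transfers = []
    pending = list(payloads)
    for cycle, ready in enumerate(ready_pattern):
        valid = bool(pending)        # sender holds VALID while it has data
        if valid and ready:          # handshake: both high on this clock edge
            transfers.append((cycle, pending.pop(0)))
    return transfers


# Receiver is busy in cycles 0, 2 and 3, so the sender is back-pressured.
result = simulate_channel(["A", "B"], [False, True, False, False, True])
print(result)
```

Item "A" moves in cycle 1 and item "B" waits until cycle 4. The sender never needs to know why the receiver was slow, which is the same property the MSYN/SSYN handshake provides without a clock.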

🔬 Bus width → memory interface design (LPDDR5, HBM)

The bus width trade-off is central to memory interface design. LPDDR5 uses a 16-bit data bus per channel to minimise power and pin count on mobile SoCs: width is traded for lower power. HBM (High Bandwidth Memory), used in GPUs and AI accelerators, achieves massive bandwidth via a 1024-bit-wide bus per stack across a short interposer: width is maximised at the cost of physical complexity. A DDR5 DIMM splits its 64 data bits into two independent 32-bit subchannels, with additional ECC bits per subchannel on ECC modules. Every memory PHY you implement or verify expresses bus width as the number of DQ (data) pins. Understanding bus width from first principles helps you reason about why multi-stack HBM configurations reach roughly 900 GB/s while LPDDR5 configurations deliver roughly 68 GB/s: the difference comes mainly from the number of data pins, not from a faster per-pin rate.
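Peak bandwidth follows directly from width and per-pin data rate. The sketch below uses illustrative per-pin rates that are assumptions for the sake of the arithmetic (2.0 Gb/s for an HBM2-class stack, 6.4 Gb/s for an LPDDR5 channel); real parts vary by speed grade, and system totals depend on how many stacks or channels are fitted.

```python
def peak_bandwidth_gb_s(width_bits, gbit_per_pin):
    """Peak bandwidth in GB/s = data pins x per-pin rate / 8 bits per byte."""
    return width_bits * gbit_per_pin / 8


# Assumed, illustrative figures; check the datasheet of the actual part.
hbm_stack = peak_bandwidth_gb_s(width_bits=1024, gbit_per_pin=2.0)
lpddr5_channel = peak_bandwidth_gb_s(width_bits=16, gbit_per_pin=6.4)

print(hbm_stack)        # one 1024-bit stack
print(lpddr5_channel)   # one 16-bit channel
print(hbm_stack / lpddr5_channel)
```

Even with a per-pin rate more than three times lower in this example, the wide interface comes out about 20 times faster per stack than the narrow one does per channel, which is the width trade-off in numbers.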

Summary — CA-04 key points: A bus is a shared broadcast medium — only one device transmits at a time; contention causes corruption. Three bus line types: Address (specifies location, unidirectional), Data (carries value, bidirectional), Control (command + timing + interrupt signals, bidirectional). Five bus design elements: (1) Type — dedicated (separate lines, higher performance) vs multiplexed (shared lines, fewer pins); (2) Arbitration — centralised (single arbiter, simpler) vs distributed (per-device logic, more resilient); (3) Timing — synchronous (clock-based, faster, fixed speed) vs asynchronous (handshake-based, any-speed slave); (4) Width — address width determines max memory; data width determines transfer bandwidth; (5) Data transfer types — read, write, read-modify-write, read-after-write, block transfer. Multiple bus hierarchies solve the single-bus bottleneck by separating fast and slow traffic.
