Single Port RAM — 128×8
Complete single-port synchronous RAM designs — synchronous write / synchronous read, synchronous write / asynchronous read, read-first and write-first modes, and byte-enable parameterised RAM — with timing diagrams, function tables, and an exhaustive self-checking testbench covering all address locations.
💿 Introduction & Theory
A Single Port RAM has exactly one address bus, one data input bus, and one data output bus. Read and write operations share the same address port — only one can occur per clock cycle. A write-enable signal selects between reading (we=0) and writing (we=1). The 128×8 configuration means 128 memory locations, each 8 bits wide, for a total of 1024 bits (128 bytes).
When we=1, data_in is written to the location addressed by addr on the rising clock edge. When we=0, the location is read.
Memory Organisation
📋 Port Description & Function Table
| Port | Dir | Width | Description |
|---|---|---|---|
| clk | in | 1 | Clock — active rising edge for write (and sync read) |
| we | in | 1 | Write enable: 1=write, 0=read |
| addr | in | 7 | Address [6:0] — selects one of 128 locations (0x00–0x7F) |
| data_in | in | 8 | Write data [7:0] — captured when we=1 at clock edge |
| data_out | out | 8 | Read data [7:0] — registered (sync) or combinational (async) |
Function Table
| clk edge | we | addr | data_in | Operation | data_out |
|---|---|---|---|---|---|
| ↑ | 1 | A | D | Write: mem[A] <= D | varies by mode |
| ↑ | 0 | A | x | Read (sync): data_out <= mem[A] | mem[A] (next cycle) |
| — | 0 | A | x | Read (async): data_out = mem[A] | mem[A] (immediate) |
| — | x | x | x | No clock edge | Holds previous value (sync) |
🔌 Block Diagram
⚫ Implementation 1 — Synchronous Write / Synchronous Read (Write-First)
Both write and read are registered on the rising clock edge. Because the RAM has a single address port, a write cycle (we=1) also drives data_out: in write-first (or read-new-data) mode, data_out takes the newly written value on the same clock edge that updates the memory. FPGA block RAMs commonly support this mode.
// ============================================================
// Module   : sp_ram_sync
// Config   : 128 locations x 8 bits (128 bytes)
// Write    : Synchronous (rising edge, we=1)
// Read     : Synchronous (rising edge, we=0)
// Behaviour: Write-first -- on simultaneous write+read to same
//            address, data_out = data_in (new data wins)
// FPGA     : Maps to Block RAM (BRAM) on Xilinx/Intel/Lattice
// ============================================================
`timescale 1ns/1ps
`default_nettype none
module sp_ram_sync #(
    parameter DEPTH  = 128,             // number of locations
    parameter WIDTH  = 8,               // bits per location
    parameter ADDR_W = $clog2(DEPTH)    // 7 for DEPTH=128
) (
    input  wire              clk,
    input  wire              we,        // write enable
    input  wire [ADDR_W-1:0] addr,      // 7-bit address
    input  wire [WIDTH-1:0]  data_in,
    output reg  [WIDTH-1:0]  data_out
);
    // Memory array: DEPTH locations, each WIDTH bits wide
    reg [WIDTH-1:0] mem [0:DEPTH-1];

    always @(posedge clk) begin
        if (we) begin
            mem[addr] <= data_in;   // synchronous write
            data_out  <= data_in;   // write-first: output reflects new data
        end else begin
            data_out  <= mem[addr]; // synchronous read
        end
    end
endmodule
`default_nettype wire
On a write, data_out is assigned data_in directly, bypassing the memory array, while the array is updated on the same edge. The value seen on data_out after a write therefore comes from the bypass path rather than from a read of the array. On Xilinx 7-series and UltraScale FPGAs, this corresponds to WRITE_MODE = "WRITE_FIRST" on the RAMB18/RAMB36 primitives.
🔵 Implementation 2 — Read-First Mode
In read-first (also called read-before-write), a simultaneous read and write to the same address returns the old data (the value stored before the write), and then updates the memory. This is useful for implementing FIFOs where you need to read the outgoing value while simultaneously writing the incoming one.
// ============================================================
// Module   : sp_ram_read_first
// Behaviour: READ-FIRST -- on simultaneous write+read to same
//            address, data_out = old data (value before write).
//            Memory is then updated on the same clock edge.
// FPGA     : Xilinx READ_FIRST mode, Intel "old_data" mode
// ============================================================
`timescale 1ns/1ps
`default_nettype none
module sp_ram_read_first #(
    parameter DEPTH  = 128,
    parameter WIDTH  = 8,
    parameter ADDR_W = $clog2(DEPTH)
) (
    input  wire              clk,
    input  wire              we,
    input  wire [ADDR_W-1:0] addr,
    input  wire [WIDTH-1:0]  data_in,
    output reg  [WIDTH-1:0]  data_out
);
    reg [WIDTH-1:0] mem [0:DEPTH-1];

    always @(posedge clk) begin
        // Read ALWAYS happens (old data captured before write)
        data_out <= mem[addr];           // read first (old value)
        if (we) mem[addr] <= data_in;    // write after (updates memory)
    end
endmodule
`default_nettype wire
Scenario: mem[5] currently holds 0xAA. In cycle N, 0xBB is written to addr=5 (we=1, addr=5, data_in=0xBB).
| After the clock edge | Write-First (Impl 1) | Read-First (Impl 2) |
|---|---|---|
| mem[5] | 0xBB | 0xBB |
| data_out | 0xBB (new data, bypasses memory) | 0xAA (old data, read before the write happened) |
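The read-first behaviour is what lets one address per cycle serve both the outgoing and the incoming value. The sketch below is a minimal illustration and not part of the original design set: the module name delay_line_rf and its ptr counter are hypothetical, DEPTH is assumed to be at least 2, and the memory is coded inline with the same read-first pattern instead of instantiating sp_ram_read_first.

```verilog
// Hypothetical delay line built on read-first behaviour.
// A value sampled on one clock edge appears on dout DEPTH edges later.
module delay_line_rf #(
    parameter DEPTH  = 8,               // assumed >= 2
    parameter WIDTH  = 8,
    parameter ADDR_W = $clog2(DEPTH)
) (
    input  wire             clk,
    input  wire [WIDTH-1:0] din,
    output reg  [WIDTH-1:0] dout
);
    reg [WIDTH-1:0]  mem [0:DEPTH-1];
    reg [ADDR_W-1:0] ptr = 0;           // relies on an initial value, no reset

    always @(posedge clk) begin
        dout     <= mem[ptr];           // old sample, stored DEPTH edges ago
        mem[ptr] <= din;                // overwrite the same location
        ptr      <= (ptr == DEPTH-1) ? 0 : ptr + 1'b1;
    end
endmodule
```

With write-first coding the same structure would return din itself, so the delay would collapse to a single register stage.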
🟠 Implementation 3 — Synchronous Write / Asynchronous Read
Write is registered (requires clock edge), but read is combinational: data_out changes immediately whenever addr changes, without waiting for a clock edge. This maps to distributed RAM (LUT-based) on FPGAs rather than BRAM, and to asynchronous SRAM on ASICs.
// ============================================================
// Module : sp_ram_async_rd
// Write  : Synchronous (rising edge, we=1)
// Read   : Asynchronous (combinational) -- no clock needed
//          data_out changes immediately with addr
// FPGA   : Maps to Distributed RAM (LUT RAM)
//          - Xilinx: LUT-based RAM (block RAM has no async read port)
//          - Faster read access, more flexible, uses LUTs
// ASIC   : Models asynchronous SRAM interface
// ============================================================
`timescale 1ns/1ps
`default_nettype none
module sp_ram_async_rd #(
    parameter DEPTH  = 128,
    parameter WIDTH  = 8,
    parameter ADDR_W = $clog2(DEPTH)
) (
    input  wire              clk,
    input  wire              we,
    input  wire [ADDR_W-1:0] addr,
    input  wire [WIDTH-1:0]  data_in,
    output wire [WIDTH-1:0]  data_out   // wire, not reg
);
    reg [WIDTH-1:0] mem [0:DEPTH-1];

    // Synchronous write
    always @(posedge clk)
        if (we) mem[addr] <= data_in;

    // Asynchronous read (combinational -- no clock, immediate)
    assign data_out = mem[addr];
endmodule
`default_nettype wire
data_out can glitch briefly while addr transitions between two values, because the address bits are not guaranteed to change at the same instant. If data_out feeds combinational logic downstream, these glitches may propagate. For robust designs, register the read data on the clock edge, either by using a synchronous-read RAM or by adding an output register after the asynchronous read. Prefer the asynchronous-read style only when zero-cycle read latency is mandatory.
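Registering the asynchronous read data might look like the sketch below. It is an illustration under assumptions, not one of the original modules: the name sp_ram_async_rd_reg and the two separate outputs (data_out_async, data_out_reg) are hypothetical and exist only to show both paths side by side.

```verilog
// Hypothetical variant: asynchronous read followed by an output register.
module sp_ram_async_rd_reg #(
    parameter DEPTH  = 128,
    parameter WIDTH  = 8,
    parameter ADDR_W = $clog2(DEPTH)
) (
    input  wire              clk,
    input  wire              we,
    input  wire [ADDR_W-1:0] addr,
    input  wire [WIDTH-1:0]  data_in,
    output wire [WIDTH-1:0]  data_out_async, // combinational, may glitch
    output reg  [WIDTH-1:0]  data_out_reg    // sampled once per clock edge
);
    reg [WIDTH-1:0] mem [0:DEPTH-1];

    // Synchronous write
    always @(posedge clk)
        if (we) mem[addr] <= data_in;

    // Zero-latency read path
    assign data_out_async = mem[addr];

    // Output register: downstream logic only sees values captured at the edge
    always @(posedge clk)
        data_out_reg <= data_out_async;
endmodule
```

On a write cycle the register captures the value stored before the write, so data_out_reg behaves like the read-first RAM with one cycle of latency. A synthesis tool may consequently treat the registered path as a synchronous-read memory.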
🟣 Implementation 4 — Byte-Enable Single Port RAM
Extends the synchronous RAM with a byte-enable signal (be) that controls which bytes within a wider word are written. For a 16-bit wide RAM, be[0] enables the low byte and be[1] enables the high byte. This is the standard pattern used in processor data buses where sub-word writes are common.
// ============================================================
// Module   : sp_ram_byteen
// Config   : 64 locations x 16 bits (2 bytes per word)
// be[0]    : enables write to bits [7:0]  (low byte)
// be[1]    : enables write to bits [15:8] (high byte)
// Use case : 16-bit processor data bus, byte/halfword stores
// ============================================================
`timescale 1ns/1ps
`default_nettype none
module sp_ram_byteen #(
    parameter DEPTH  = 64,
    parameter WIDTH  = 16,              // 2 bytes per word
    parameter NBYTES = WIDTH/8,         // 2 byte-enables
    parameter ADDR_W = $clog2(DEPTH)
) (
    input  wire              clk,
    input  wire [NBYTES-1:0] be,        // byte enables
    input  wire [ADDR_W-1:0] addr,
    input  wire [WIDTH-1:0]  data_in,
    output reg  [WIDTH-1:0]  data_out
);
    reg [WIDTH-1:0] mem [0:DEPTH-1];
    integer b;

    always @(posedge clk) begin
        // Write only the byte(s) whose enable is asserted
        for (b = 0; b < NBYTES; b = b + 1) begin
            if (be[b])
                mem[addr][b*8 +: 8] <= data_in[b*8 +: 8];
        end
        // Synchronous read (read entire word regardless of be)
        data_out <= mem[addr];
    end
endmodule
`default_nettype wire
Before write: mem[10] = 0xABCD (16-bit word)
Operation: addr=10, data_in=0x1234, be=2'b01 (only low byte enabled)
After write:
mem[10][15:8] = 0xAB (unchanged -- be[1]=0)
mem[10][ 7:0] = 0x34 (updated -- be[0]=1)
mem[10] = 0xAB34
If be=2'b10: high byte updated, low byte unchanged: 0x12CD
If be=2'b11: full word write: 0x1234
If be=2'b00: no write (read only)
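The byte-merge arithmetic in this example can be checked in isolation. The sketch below is a simulation-only illustration: the module byteen_merge_demo and the function merge_bytes are hypothetical names, and the function models only the stored word after the write, not the RAM's clocking or its data_out.

```verilog
// Hypothetical helper that reproduces the worked byte-enable example.
module byteen_merge_demo;
    // Returns the word held in memory after a byte-enabled write.
    function [15:0] merge_bytes;
        input [15:0] old_word;   // value stored before the write
        input [15:0] new_word;   // data_in
        input [1:0]  be;         // byte enables
        reg   [15:0] result;
        integer      b;
        begin
            result = old_word;
            for (b = 0; b < 2; b = b + 1)
                if (be[b]) result[b*8 +: 8] = new_word[b*8 +: 8];
            merge_bytes = result;
        end
    endfunction

    initial begin
        $display("be=01 -> %h", merge_bytes(16'hABCD, 16'h1234, 2'b01)); // ab34
        $display("be=10 -> %h", merge_bytes(16'hABCD, 16'h1234, 2'b10)); // 12cd
        $display("be=11 -> %h", merge_bytes(16'hABCD, 16'h1234, 2'b11)); // 1234
        $display("be=00 -> %h", merge_bytes(16'hABCD, 16'h1234, 2'b00)); // abcd
    end
endmodule
```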
🧪 Comprehensive Testbench
The testbench verifies all three 128×8 RAM variants simultaneously. It covers: write-then-read across all 128 addresses, simultaneous read-write collision (verifying write-first vs read-first behaviour), back-to-back writes, and address boundary conditions.
// ============================================================
// Testbench : sp_ram_tb
// DUTs      : sp_ram_sync (write-first)
//             sp_ram_read_first
//             sp_ram_async_rd
// Tests     :
//   1. Write all 128 locations with unique data (addr^0xA5)
//   2. Read all 128 locations and verify
//   3. Write-first collision: write+read same addr same cycle
//   4. Read-first collision: verify old data returned
//   5. Async read: verify immediate response to addr change
//   6. Address boundary: addr=0x00 and addr=0x7F
// ============================================================
`timescale 1ns/1ps
`default_nettype none
module sp_ram_tb;
    reg        clk = 0, we = 0;
    reg  [6:0] addr = 0;
    reg  [7:0] data_in = 0;
    wire [7:0] dout_sync, dout_rf, dout_async;

    sp_ram_sync       u_sync  (.clk(clk),.we(we),.addr(addr),.data_in(data_in),.data_out(dout_sync));
    sp_ram_read_first u_rf    (.clk(clk),.we(we),.addr(addr),.data_in(data_in),.data_out(dout_rf));
    sp_ram_async_rd   u_async (.clk(clk),.we(we),.addr(addr),.data_in(data_in),.data_out(dout_async));

    always #5 clk = ~clk;

    initial begin $dumpfile("sp_ram.vcd"); $dumpvars(0,sp_ram_tb); end

    integer pass_cnt=0, fail_cnt=0, test_num=0;
    integer i;
    reg [7:0] exp;

    // Advance one clock edge, then wait 1 ns so registered outputs have settled
    task tick;
        begin @(posedge clk); #1; end
    endtask

    task do_write;
        input [6:0] a;
        input [7:0] d;
        begin addr=a; data_in=d; we=1; tick; we=0; end
    endtask

    // Checks both synchronous-read DUTs against the same expected value
    task do_read_sync;
        input [6:0]   a;
        input [7:0]   expected;
        input [255:0] msg;
        begin
            addr=a; we=0; tick;
            test_num = test_num + 1;
            if (dout_sync===expected && dout_rf===expected) begin
                $display(" PASS [%3d] %s addr=%02h data=%02h",test_num,msg,a,dout_sync);
                pass_cnt = pass_cnt + 1;
            end else begin
                $display(" FAIL [%3d] %s addr=%02h sync=%02h rf=%02h exp=%02h",
                         test_num,msg,a,dout_sync,dout_rf,expected);
                fail_cnt = fail_cnt + 1;
            end
        end
    endtask

    initial begin
        $display("\n======================================================");
        $display(" Single Port RAM 128x8 Testbench");
        $display("======================================================");

        // Phase 1: Write all 128 locations
        $display("\n --- Phase 1: Write all 128 locations ---");
        for (i=0; i<128; i=i+1)
            do_write(i[6:0], i[7:0] ^ 8'hA5);   // data = addr XOR 0xA5

        // Phase 2: Read and verify all 128 locations
        $display("\n --- Phase 2: Read all 128 locations ---");
        for (i=0; i<128; i=i+1)
            do_read_sync(i[6:0], i[7:0] ^ 8'hA5, "READ");

        // Phase 3: Write-first collision test
        $display("\n --- Phase 3: Write-First Collision (addr=0x10) ---");
        do_write(7'h10, 8'hAA);                 // pre-load 0xAA at addr 0x10
        addr=7'h10; data_in=8'hBB; we=1; tick; we=0;
        // Write-first: dout_sync should be 0xBB (new data)
        test_num = test_num + 1;
        if (dout_sync===8'hBB) begin
            $display(" PASS [%3d] Write-First: dout=0xBB (new data)",test_num);
            pass_cnt = pass_cnt + 1;
        end else begin
            $display(" FAIL [%3d] Write-First: dout=%02h exp=BB",test_num,dout_sync);
            fail_cnt = fail_cnt + 1;
        end
        // Read-first: dout_rf should be 0xAA (old data)
        test_num = test_num + 1;
        if (dout_rf===8'hAA) begin
            $display(" PASS [%3d] Read-First: dout=0xAA (old data)",test_num);
            pass_cnt = pass_cnt + 1;
        end else begin
            $display(" FAIL [%3d] Read-First: dout=%02h exp=AA",test_num,dout_rf);
            fail_cnt = fail_cnt + 1;
        end

        // Phase 4: Async read -- immediate response
        $display("\n --- Phase 4: Async Read Verification ---");
        do_write(7'h20, 8'hCC);
        do_write(7'h21, 8'hDD);
        addr=7'h20; #1;                         // no clock edge -- async read
        test_num = test_num + 1;
        if (dout_async===8'hCC) begin
            $display(" PASS [%3d] Async read addr=0x20 -> 0xCC",test_num);
            pass_cnt = pass_cnt + 1;
        end else begin
            $display(" FAIL [%3d] Async read: %02h exp CC",test_num,dout_async);
            fail_cnt = fail_cnt + 1;
        end
        addr=7'h21; #1;                         // just change addr, no clock
        test_num = test_num + 1;
        if (dout_async===8'hDD) begin
            $display(" PASS [%3d] Async read addr=0x21 -> 0xDD (no clk!)",test_num);
            pass_cnt = pass_cnt + 1;
        end else begin
            $display(" FAIL [%3d] Async read: %02h exp DD",test_num,dout_async);
            fail_cnt = fail_cnt + 1;
        end

        // Phase 5: Boundary addresses
        $display("\n --- Phase 5: Boundary Addresses ---");
        do_write(7'h00, 8'h11);
        do_write(7'h7F, 8'hFF);
        do_read_sync(7'h00, 8'h11, "Boundary addr=0x00");
        do_read_sync(7'h7F, 8'hFF, "Boundary addr=0x7F");

        $display("\n======================================================");
        $display(" RESULTS: %0d / %0d PASS | %0d FAIL",pass_cnt,test_num,fail_cnt);
        $display("======================================================");
        if (fail_cnt==0) $display(" ALL TESTS PASSED\n");
        else $fatal(1," %0d FAILURE(S)\n",fail_cnt);
        #20; $finish;
    end
endmodule
`default_nettype wire
📈 Simulation Waveform
At the collision cycle (t=5): write-first dout_sync = 0xCC (new data written); read-first dout_rf = 0xBB (old data before this write). The divergence at t=5 is the defining difference between the two modes.
💻 Simulation Console Output
How to Run
# Icarus Verilog (-g2012 enables the SystemVerilog constructs used by the testbench)
iverilog -g2012 -o ram_sim \
    sp_ram_sync.v \
    sp_ram_read_first.v \
    sp_ram_async_rd.v \
    sp_ram_byteen.v \
    sp_ram_tb.v
vvp ram_sim
gtkwave sp_ram.vcd

# ModelSim (the testbench is compiled in SystemVerilog mode)
vlog sp_ram_sync.v sp_ram_read_first.v sp_ram_async_rd.v sp_ram_byteen.v
vlog -sv sp_ram_tb.v
vsim -c sp_ram_tb -do "run -all; quit -f"
🔬 Design Analysis & FPGA Mapping
Implementation Comparison
| Module | Write | Read | Collision (RW same addr) | FPGA target | Read latency |
|---|---|---|---|---|---|
| sp_ram_sync | Sync | Sync | Write-first (new data) | Block RAM (BRAM) | 1 cycle |
| sp_ram_read_first | Sync | Sync | Read-first (old data) | Block RAM (BRAM) | 1 cycle |
| sp_ram_async_rd | Sync | Async (comb) | N/A (immediate) | Distributed RAM (LUT) | 0 cycles |
| sp_ram_byteen | Sync (byte-enable) | Sync | Read-first (old data) | Block RAM w/ BE | 1 cycle |
FPGA BRAM Inference Rules
Block RAM (BRAM) is inferred when:
- Both read and write are synchronous (clocked)
- Array is large enough to justify a block RAM (the threshold depends on the tool and device)
- Single or dual port access
- Enables and write-enables are cleanly separated
- The memory array itself is not cleared by a reset signal
Distributed RAM (LUT RAM) is inferred when:
- Read is asynchronous (combinational output)
- The array is small enough that LUTs are cheaper than a block RAM (tool- and device-dependent)
- Multiple independent read ports needed
- Synthesis tool cannot find a matching BRAM primitive
- Explicitly constrained with synthesis directives
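A synthesis directive is usually written as an attribute on the memory declaration. The sketch below assumes Vivado's ram_style attribute, which accepts values such as "block" and "distributed"; Intel Quartus uses a differently named ramstyle attribute with device-specific values, so treat the attribute string as tool-specific. The module name sp_ram_lutram is hypothetical, and simulators ignore the attribute.

```verilog
// Hypothetical example: requesting LUT RAM explicitly (Vivado attribute assumed).
module sp_ram_lutram #(
    parameter DEPTH  = 128,
    parameter WIDTH  = 8,
    parameter ADDR_W = $clog2(DEPTH)
) (
    input  wire              clk,
    input  wire              we,
    input  wire [ADDR_W-1:0] addr,
    input  wire [WIDTH-1:0]  data_in,
    output wire [WIDTH-1:0]  data_out
);
    // Use "block" instead to request block RAM; that also needs a registered read.
    (* ram_style = "distributed" *)
    reg [WIDTH-1:0] mem [0:DEPTH-1];

    // Synchronous write
    always @(posedge clk)
        if (we) mem[addr] <= data_in;

    // Asynchronous read, which matches the LUT RAM read port
    assign data_out = mem[addr];
endmodule
```

The attribute is a request, not a guarantee: if the coding style does not match the requested primitive, the tool may fall back to another implementation and usually reports this in the synthesis log.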
In simulation, the mem array initialises to x (unknown) by default. For a ROM or pre-loaded RAM, use $readmemh or $readmemb to load from a hex or binary file: initial $readmemh("init.hex", mem);. In FPGA synthesis, the same construct sets the BRAM's initial contents in the bitstream; Xilinx and Intel block RAMs support initial values. ASIC SRAM has no initial contents: fixed data must be implemented as a ROM, or the RAM must be loaded externally after power-up.
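A pre-loadable variant could be sketched as follows. The module name sp_ram_init and the INIT_FILE parameter are hypothetical, and the zero-fill fallback when no file is given is an assumption of this sketch, not something the original modules do.

```verilog
// Hypothetical read-first RAM with optional initial contents.
module sp_ram_init #(
    parameter DEPTH     = 128,
    parameter WIDTH     = 8,
    parameter ADDR_W    = $clog2(DEPTH),
    parameter INIT_FILE = ""            // hex file path; "" selects zero-fill
) (
    input  wire              clk,
    input  wire              we,
    input  wire [ADDR_W-1:0] addr,
    input  wire [WIDTH-1:0]  data_in,
    output reg  [WIDTH-1:0]  data_out
);
    reg [WIDTH-1:0] mem [0:DEPTH-1];
    integer i;

    initial begin
        if (INIT_FILE != "")
            $readmemh(INIT_FILE, mem);          // load hex words from the file
        else
            for (i = 0; i < DEPTH; i = i + 1)   // assumption: default to zeros
                mem[i] = {WIDTH{1'b0}};
    end

    always @(posedge clk) begin
        data_out <= mem[addr];                  // read-first, as in Implementation 2
        if (we) mem[addr] <= data_in;
    end
endmodule
```

An instance such as sp_ram_init #(.INIT_FILE("init.hex")) u_ram (...) would then load the file at time zero. $readmemh expects whitespace-separated hex words, one per memory location, with optional @address markers.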
All modules are parameterised by DEPTH and WIDTH. Common variants: #(.DEPTH(256),.WIDTH(8)) = 256×8 (256 bytes), #(.DEPTH(512),.WIDTH(16)) = 512×16 (1 KB), #(.DEPTH(1024),.WIDTH(32)) = 1024×32 (4 KB). The address width ADDR_W = $clog2(DEPTH) is derived automatically. A Xilinx RAMB18 holds 16 Kb of data (2 KB, excluding parity bits) and a RAMB36 holds 32 Kb (4 KB). Synthesis automatically combines multiple BRAM primitives for larger memories.
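The sizes quoted above follow directly from DEPTH and WIDTH. The simulation-only sketch below prints the derived address width and capacity for each configuration; the module ram_size_demo, the function total_bits, and the task report are hypothetical names used only for this illustration.

```verilog
// Hypothetical helper: prints address width and capacity for several configs.
module ram_size_demo;
    // Capacity in bits for a DEPTH x WIDTH memory
    function integer total_bits;
        input integer depth;
        input integer width;
        begin
            total_bits = depth * width;
        end
    endfunction

    task report;
        input integer depth;
        input integer width;
        begin
            $display("%0dx%0d: ADDR_W=%0d, %0d bits = %0d bytes",
                     depth, width, $clog2(depth),
                     total_bits(depth, width), total_bits(depth, width) / 8);
        end
    endtask

    initial begin
        report(128,  8);    // 128x8: ADDR_W=7, 1024 bits = 128 bytes
        report(256,  8);    // 256x8: ADDR_W=8, 2048 bits = 256 bytes
        report(512,  16);   // 512x16: ADDR_W=9, 8192 bits = 1024 bytes
        report(1024, 32);   // 1024x32: ADDR_W=10, 32768 bits = 4096 bytes
    end
endmodule
```

Capacity alone gives only a lower bound on the number of BRAM primitives, because the supported width and depth combinations of each primitive also constrain the mapping.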
