Single Port RAM (128×8) — Design & Testbench in Verilog

💿 Introduction & Theory

A Single Port RAM has exactly one address bus, one data input bus, and one data output bus. Read and write operations share the same address port — only one can occur per clock cycle. A write-enable signal selects between reading (we=0) and writing (we=1). The 128×8 configuration means 128 memory locations, each 8 bits wide, for a total of 1024 bits (128 bytes).

💿

128 × 8

128 addressable locations (7-bit address, 0x00–0x7F), each storing 8 bits. Total: 1024 bits = 128 bytes.

✏

Write Enable

When we=1 at a clock edge, data_in is written to the location addressed by addr. When we=0, the location is read.

🕐

Read Modes

Sync read: data appears one cycle after the read address. Async read: data changes immediately with the address (combinational).

🛠

FPGA Mapping

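Synthesis tools infer BRAM (Block RAM) or distributed RAM from these patterns. A synchronous read can map to BRAM; an asynchronous read can only map to distributed LUT RAM. For a memory as small as 128×8, the tool may still choose LUT RAM even when the read is synchronous.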

Memory Organisation

Address	Contents	Note
0x7F (127)	mem[127][7:0]	Highest address
0x7E (126)	mem[126][7:0]	
⋮	⋮ (124 locations)	
0x01 (1)	mem[1][7:0]	
0x00 (0)	mem[0][7:0]	Lowest address

📋 Port Description & Function Table

Port	Dir	Width	Description
clk	in	1	Clock — active rising edge for write (and sync read)
we	in	1	Write enable: 1=write, 0=read
addr	in	7	Address [6:0] — selects one of 128 locations (0x00–0x7F)
data_in	in	8	Write data [7:0] — captured when we=1 at clock edge
data_out	out	8	Read data [7:0] — registered (sync) or combinational (async)

Function Table

clk edge	we	addr	data_in	Operation	data_out
↑	1	A	D	Write: mem[A] <= D	varies by mode
↑	0	A	x	Read (sync): data_out <= mem[A]	mem[A] (next cycle)
—	0	A	x	Read (async): data_out = mem[A]	mem[A] (immediate)
—	x	x	x	No clock edge	Holds previous value (sync)

🔌 Block Diagram

Fig 1 — Single-port RAM 128×8: shared address bus, write path and read path

⚫ Implementation 1 — Synchronous Write / Synchronous Read (Write-First)

Both write and read are registered on the rising clock edge. Because a single-port RAM has only one address, a write cycle (we=1) always targets the same location that data_out would otherwise read. In this implementation data_out is loaded with data_in on that same clock edge, so once the edge has passed it already shows the newly written value. This is the write-first (or read-new-data) behaviour. This is the most common FPGA BRAM configuration.

sp_ram_sync

128×8 · Sync write · Sync read · Write-first · FPGA BRAM-compatible

Sync / Write-First

// ============================================================
// Module   : sp_ram_sync
// Config   : 128 locations x 8 bits (128 bytes)
// Write    : Synchronous (rising edge, we=1)
// Read     : Synchronous (rising edge, we=0)
// Behaviour: Write-first -- on simultaneous write+read to same
//            address, data_out = data_in (new data wins)
// FPGA     : Maps to Block RAM (BRAM) on Xilinx/Intel/Lattice
// ============================================================
`timescale 1ns/1ps
`default_nettype none

module sp_ram_sync #(
  parameter DEPTH = 128,                    // number of locations
  parameter WIDTH = 8,                     // bits per location
  parameter ADDR_W = $clog2(DEPTH)         // 7 for DEPTH=128
) (
  input                  clk,
  input                  we,             // write enable
  input  [ADDR_W-1:0]  addr,           // 7-bit address
  input  [WIDTH-1:0]   data_in,
  output reg [WIDTH-1:0] data_out
);

  // Memory array: DEPTH locations, each WIDTH bits wide
  reg [WIDTH-1:0] mem [0:DEPTH-1];

  always @(posedge clk) begin
    if (we) begin
      mem[addr] <= data_in;    // synchronous write
      data_out  <= data_in;    // write-first: output reflects new data
    end else begin
      data_out <= mem[addr];   // synchronous read
    end
  end

endmodule
`default_nettype wire

Write-first vs write-then-read timing: In write-first mode, when we=1, data_out is assigned data_in directly (bypassing the memory array) while the memory array is updated on the same edge. The written value is therefore visible on data_out right after the write edge, with no separate read cycle needed. On Xilinx 7-series and UltraScale FPGAs, this corresponds to the WRITE_FIRST write mode of RAMB18/RAMB36 (set through the WRITE_MODE_A/WRITE_MODE_B attributes), not to a READ_FIRST flag.

🔵 Implementation 2 — Read-First Mode

In read-first (also called read-before-write), a simultaneous read and write to the same address returns the old data (the value stored before the write), and then updates the memory. This is useful for implementing FIFOs where you need to read the outgoing value while simultaneously writing the incoming one.

sp_ram_read_first

128×8 · Sync write · Sync read · Read-first (old data on simultaneous RW)

Read-First

// ============================================================
// Module   : sp_ram_read_first
// Behaviour: READ-FIRST -- on simultaneous write+read to same
//            address, data_out = old data (value before write).
//            Memory is then updated on the same clock edge.
// FPGA     : Xilinx READ_FIRST mode, Intel "old_data" mode
// ============================================================
`timescale 1ns/1ps
`default_nettype none

module sp_ram_read_first #(
  parameter DEPTH = 128,
  parameter WIDTH = 8,
  parameter ADDR_W = $clog2(DEPTH)
) (
  input                  clk, we,
  input  [ADDR_W-1:0]  addr,
  input  [WIDTH-1:0]   data_in,
  output reg [WIDTH-1:0] data_out
);

  reg [WIDTH-1:0] mem [0:DEPTH-1];

  always @(posedge clk) begin
    // Read ALWAYS happens (old data captured before write)
    data_out <= mem[addr];   // read first (old value)
    if (we)
      mem[addr] <= data_in;  // write after (updates memory)
  end

endmodule
`default_nettype wire

Fig 2 — Write-first vs Read-first: same simultaneous RW, different data_out

Scenario: mem[5] currently holds 0xAA. Now write 0xBB to addr=5 while reading addr=5.

Write-First (Impl 1):         Read-First (Impl 2):
  Cycle N:  we=1, addr=5       Cycle N:  we=1, addr=5
            data_in=0xBB                 data_in=0xBB
  After edge:                  After edge:
    mem[5]   = 0xBB              mem[5]   = 0xBB
    data_out = 0xBB (new!)       data_out = 0xAA (old!)
    (bypasses memory)            (read before write happened)
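
The FIFO-style use mentioned for read-first mode can be sketched as a fixed-length delay line. This is only an illustration under stated assumptions: the wrapper name delay_line_128 and its rst, din and dout ports are hypothetical, and the sketch has to be compiled together with sp_ram_read_first from above. With we tied high and the address stepping through all 128 locations, each cycle returns the value stored 128 edges earlier and overwrites it with the new sample.

```verilog
// Hypothetical example (not part of the original design files).
`timescale 1ns/1ps
module delay_line_128 (
  input  wire       clk,
  input  wire       rst,   // synchronous reset of the address pointer only
  input  wire [7:0] din,   // incoming sample
  output wire [7:0] dout   // sample written 128 clock edges earlier
);
  reg [6:0] ptr = 7'd0;    // 7-bit pointer wraps naturally from 127 to 0

  always @(posedge clk) begin
    if (rst) ptr <= 7'd0;
    else     ptr <= ptr + 7'd1;
  end

  // we is tied high: every cycle reads the old value, then overwrites it.
  sp_ram_read_first #(.DEPTH(128), .WIDTH(8)) u_mem (
    .clk(clk), .we(1'b1), .addr(ptr), .data_in(din), .data_out(dout)
  );
endmodule
```

The memory array itself is not reset, so in simulation dout stays at x until each location has been written once (the first 128 cycles). The same wrapper built on the write-first module would simply echo din with one cycle of delay, which is why read-first is the natural choice here.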

🟠 Implementation 3 — Synchronous Write / Asynchronous Read

Write is registered (requires clock edge), but read is combinational: data_out changes immediately whenever addr changes, without waiting for a clock edge. This maps to distributed RAM (LUT-based) on FPGAs rather than BRAM, and to asynchronous SRAM on ASICs.

sp_ram_async_rd

128×8 · Sync write · Async (combinational) read · Maps to LUT RAM on FPGA

Async Read

// ============================================================
// Module   : sp_ram_async_rd
// Write    : Synchronous (rising edge, we=1)
// Read     : Asynchronous (combinational) -- no clock needed
//            data_out changes immediately with addr
// FPGA     : Maps to Distributed RAM (LUT RAM)
//            - Xilinx: LUT RAM primitives (e.g. RAM128X1S), not RAMB* block RAM
//            - Faster read access, more flexible, uses LUTs
// ASIC     : Models asynchronous SRAM interface
// ============================================================
`timescale 1ns/1ps
`default_nettype none

module sp_ram_async_rd #(
  parameter DEPTH = 128,
  parameter WIDTH = 8,
  parameter ADDR_W = $clog2(DEPTH)
) (
  input                  clk,
  input                  we,
  input  [ADDR_W-1:0]  addr,
  input  [WIDTH-1:0]   data_in,
  output [WIDTH-1:0]   data_out  // wire, not reg
);

  reg [WIDTH-1:0] mem [0:DEPTH-1];

  // Synchronous write
  always @(posedge clk)
    if (we) mem[addr] <= data_in;

  // Asynchronous read (combinational -- no clock, immediate)
  assign data_out = mem[addr];

endmodule
`default_nettype wire

Async read timing consideration: Because the read is combinational, data_out can glitch briefly when addr transitions between two values (the address bus is not guaranteed to be glitch-free during a transition). If data_out is used combinationally downstream, these glitches may propagate. For robust designs, either: (1) use a synchronous-read implementation such as sp_ram_sync, or (2) keep the async-read memory and add an output register that captures data_out on the clock edge. Both options cost one cycle of read latency, so the pure async read style is preferred only when read latency of zero cycles is mandatory.
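
Option (2) above can be sketched as a thin wrapper around sp_ram_async_rd. The module name sp_ram_async_rd_reg and the internal net rd_comb are hypothetical names chosen for this illustration, and the wrapper assumes sp_ram_async_rd is compiled alongside it.

```verilog
// Hypothetical wrapper: async-read RAM followed by an output register.
`timescale 1ns/1ps
module sp_ram_async_rd_reg #(
  parameter DEPTH  = 128,
  parameter WIDTH  = 8,
  parameter ADDR_W = $clog2(DEPTH)
) (
  input  wire              clk,
  input  wire              we,
  input  wire [ADDR_W-1:0] addr,
  input  wire [WIDTH-1:0]  data_in,
  output reg  [WIDTH-1:0]  data_out
);
  wire [WIDTH-1:0] rd_comb;   // combinational (possibly glitchy) read data

  sp_ram_async_rd #(.DEPTH(DEPTH), .WIDTH(WIDTH)) u_mem (
    .clk(clk), .we(we), .addr(addr), .data_in(data_in), .data_out(rd_comb)
  );

  // Sample the settled read data once per clock; glitches between
  // edges never reach data_out.
  always @(posedge clk)
    data_out <= rd_comb;
endmodule
```

On a write cycle the register samples the value that was stored before the edge, so this wrapper behaves like the read-first implementation with a read latency of one cycle.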

🟣 Implementation 4 — Byte-Enable Single Port RAM

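Extends the synchronous RAM with a byte-enable signal (be) that controls which bytes within a wider word are written. There is no separate we input: each byte whose be bit is 1 is written, and be=0 gives a pure read. For a 16-bit wide RAM, be[0] enables the low byte and be[1] enables the high byte. The read is read-first: on a write cycle data_out returns the word as it was before the write. This is the standard pattern used in processor data buses where sub-word writes are common.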

sp_ram_byteen

64×16 · Byte-enable (be[1:0]) · Sub-word write · Processor bus pattern

Byte-Enable

// ============================================================
// Module   : sp_ram_byteen
// Config   : 64 locations x 16 bits (2 bytes per word)
// be[0]    : enables write to bits [7:0]  (low byte)
// be[1]    : enables write to bits [15:8] (high byte)
// Use case : 16-bit processor data bus, byte/halfword stores
// ============================================================
`timescale 1ns/1ps
`default_nettype none

module sp_ram_byteen #(
  parameter DEPTH  = 64,
  parameter WIDTH  = 16,  // 2 bytes per word
  parameter NBYTES = WIDTH/8, // 2 byte-enables
  parameter ADDR_W = $clog2(DEPTH)
) (
  input                   clk,
  input  [NBYTES-1:0]   be,       // byte enables
  input  [ADDR_W-1:0]   addr,
  input  [WIDTH-1:0]    data_in,
  output reg [WIDTH-1:0] data_out
);

  reg [WIDTH-1:0] mem [0:DEPTH-1];

  integer b;

  always @(posedge clk) begin
    // Write only the byte(s) whose enable is asserted
    for (b = 0; b < NBYTES; b = b + 1) begin
      if (be[b])
        mem[addr][b*8 +: 8] <= data_in[b*8 +: 8];
    end
    // Synchronous read (read entire word regardless of be)
    data_out <= mem[addr];
  end

endmodule
`default_nettype wire

Fig 3 — Byte-enable write: partial word update with be=2’b01 (low byte only)

Before write: mem[10] = 0xABCD (16-bit word)

Operation: addr=10, data_in=0x1234, be=2'b01 (only low byte enabled)

After write:
  mem[10][15:8] = 0xAB  (unchanged -- be[1]=0)
  mem[10][ 7:0] = 0x34  (updated  -- be[0]=1)
  mem[10]       = 0xAB34

If be=2'b10: high byte updated, low byte unchanged: 0x12CD
If be=2'b11: full word write: 0x1234
If be=2'b00: no write (read only)
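
The main testbench in the next section does not instantiate sp_ram_byteen, so the Fig 3 scenario can be checked with a small stand-alone bench. The sketch below is an assumption-laden illustration: the module name sp_ram_byteen_tb is hypothetical, and it expects sp_ram_byteen.v to be compiled with it.

```verilog
// Hypothetical mini-testbench reproducing the Fig 3 scenario.
`timescale 1ns/1ps
module sp_ram_byteen_tb;
  reg         clk     = 0;
  reg  [1:0]  be      = 2'b00;
  reg  [5:0]  addr    = 6'd10;       // address 10, as in Fig 3
  reg  [15:0] data_in = 16'h0000;
  wire [15:0] data_out;

  sp_ram_byteen dut (
    .clk(clk), .be(be), .addr(addr), .data_in(data_in), .data_out(data_out)
  );

  always #5 clk = ~clk;

  initial begin
    // Pre-load the full word 0xABCD.
    be = 2'b11; data_in = 16'hABCD; @(posedge clk); #1;
    // Write 0x1234 with only the low byte enabled.
    be = 2'b01; data_in = 16'h1234; @(posedge clk); #1;
    // Pure read cycle (be=0) to fetch the updated word.
    be = 2'b00;                     @(posedge clk); #1;
    if (data_out === 16'hAB34)
      $display("PASS: mem[10] = %h", data_out);
    else
      $display("FAIL: mem[10] = %h (expected ab34)", data_out);
    $finish;
  end
endmodule
```

An extra read cycle is needed before the check because the module is read-first: during the partial write, data_out still shows the old word 0xABCD.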

🧪 Comprehensive Testbench

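The testbench verifies all three 128×8 RAM variants simultaneously by driving them from the same clk, we, addr and data_in signals. It covers: write-then-read across all 128 addresses, simultaneous read-write collision (verifying write-first vs read-first behaviour), back-to-back writes, and address boundary conditions. The byte-enable module sp_ram_byteen is not instantiated here. The testbench uses the ++ operator and $fatal, which are SystemVerilog constructs, so it must be compiled with SystemVerilog support enabled.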

sp_ram_tb

All 128 addresses · Write-first vs read-first collision · Async read · Boundary tests

Testbench

// ============================================================
// Testbench  : sp_ram_tb
// DUTs       : sp_ram_sync (write-first)
//              sp_ram_read_first
//              sp_ram_async_rd
// Tests      :
//   1. Write all 128 locations with unique data (addr^0xA5)
//   2. Read all 128 locations and verify
//   3. Write-first collision: write+read same addr same cycle
//   4. Read-first collision: verify old data returned
//   5. Async read: verify immediate response to addr change
//   6. Address boundary: addr=0x00 and addr=0x7F
// ============================================================
`timescale 1ns/1ps
`default_nettype none

module sp_ram_tb;

  reg        clk=0, we=0;
  reg [6:0]  addr=0;
  reg [7:0]  data_in=0;

  wire [7:0] dout_sync, dout_rf, dout_async;

  sp_ram_sync       u_sync  (.clk(clk),.we(we),.addr(addr),.data_in(data_in),.data_out(dout_sync));
  sp_ram_read_first u_rf    (.clk(clk),.we(we),.addr(addr),.data_in(data_in),.data_out(dout_rf));
  sp_ram_async_rd   u_async (.clk(clk),.we(we),.addr(addr),.data_in(data_in),.data_out(dout_async));

  always #5 clk = ~clk;
  initial begin $dumpfile("sp_ram.vcd"); $dumpvars(0,sp_ram_tb); end

  integer pass_cnt=0, fail_cnt=0, test_num=0;
  integer i;
  reg [7:0] exp;

  task tick; begin @(posedge clk); #1; end endtask

  task do_write;
    input [6:0] a; input [7:0] d;
    begin addr=a; data_in=d; we=1; tick; we=0; end
  endtask

  task do_read_sync;
    input [6:0] a; input [7:0] expected; input [255:0] msg;
    begin
      addr=a; we=0; tick;
      test_num++;
      if(dout_sync===expected && dout_rf===expected) begin
        $display("  PASS [%3d] %s addr=%02h data=%02h",test_num,msg,a,dout_sync);
        pass_cnt++;
      end else begin
        $display("  FAIL [%3d] %s addr=%02h sync=%02h rf=%02h exp=%02h",
          test_num,msg,a,dout_sync,dout_rf,expected);
        fail_cnt++;
      end
    end
  endtask

  initial begin
    $display("\n======================================================");
    $display("  Single Port RAM 128x8 Testbench");
    $display("======================================================");

    // Phase 1: Write all 128 locations
    $display("\n  --- Phase 1: Write all 128 locations ---");
    for(i=0; i<128; i=i+1)
      do_write(i[6:0], i[7:0] ^ 8'hA5); // data = addr XOR 0xA5

    // Phase 2: Read and verify all 128 locations
    $display("\n  --- Phase 2: Read all 128 locations ---");
    for(i=0; i<128; i=i+1)
      do_read_sync(i[6:0], i[7:0] ^ 8'hA5, "READ");

    // Phase 3: Write-first collision test
    $display("\n  --- Phase 3: Write-First Collision (addr=0x10) ---");
    do_write(7'h10, 8'hAA);   // pre-load 0xAA at addr 0x10
    addr=7'h10; data_in=8'hBB; we=1; tick; we=0;
    // Write-first: dout_sync should be 0xBB (new data)
    test_num++;
    if(dout_sync===8'hBB) begin
      $display("  PASS [%3d] Write-First: dout=0xBB (new data)",test_num); pass_cnt++;
    end else begin
      $display("  FAIL [%3d] Write-First: dout=%02h exp=BB",test_num,dout_sync); fail_cnt++;
    end
    // Read-first: dout_rf should be 0xAA (old data)
    test_num++;
    if(dout_rf===8'hAA) begin
      $display("  PASS [%3d] Read-First: dout=0xAA (old data)",test_num); pass_cnt++;
    end else begin
      $display("  FAIL [%3d] Read-First: dout=%02h exp=AA",test_num,dout_rf); fail_cnt++;
    end

    // Phase 4: Async read -- immediate response
    $display("\n  --- Phase 4: Async Read Verification ---");
    do_write(7'h20, 8'hCC);
    do_write(7'h21, 8'hDD);
    addr=7'h20; #1;  // no clock edge -- async read
    test_num++;
    if(dout_async===8'hCC) begin
      $display("  PASS [%3d] Async read addr=0x20 -> 0xCC",test_num); pass_cnt++;
    end else begin
      $display("  FAIL [%3d] Async read: %02h exp CC",test_num,dout_async); fail_cnt++;
    end
    addr=7'h21; #1;  // just change addr, no clock
    test_num++;
    if(dout_async===8'hDD) begin
      $display("  PASS [%3d] Async read addr=0x21 -> 0xDD (no clk!)",test_num); pass_cnt++;
    end else begin
      $display("  FAIL [%3d] Async read: %02h exp DD",test_num,dout_async); fail_cnt++;
    end

    // Phase 5: Boundary addresses
    $display("\n  --- Phase 5: Boundary Addresses ---");
    do_write(7'h00, 8'h11);
    do_write(7'h7F, 8'hFF);
    do_read_sync(7'h00, 8'h11, "Boundary addr=0x00");
    do_read_sync(7'h7F, 8'hFF, "Boundary addr=0x7F");

    $display("\n======================================================");
    $display("  RESULTS: %0d / %0d PASS  |  %0d FAIL",pass_cnt,test_num,fail_cnt);
    $display("======================================================");
    if(fail_cnt==0) $display("  ALL TESTS PASSED\n");
    else $fatal(1,"  %0d FAILURE(S)\n",fail_cnt);
    #20; $finish;
  end
endmodule
`default_nettype wire
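
The directed tests above can be complemented by a randomised check against a simple reference model. The following is a minimal sketch, not part of the original testbench: the module name sp_ram_rand_tb, the model array and the count of 1000 iterations are all assumptions made for illustration. It targets sp_ram_sync only, so the expected value follows the write-first rule.

```verilog
// Hypothetical randomised testbench for sp_ram_sync (write-first).
`timescale 1ns/1ps
module sp_ram_rand_tb;
  reg        clk = 0, we = 0;
  reg  [6:0] addr = 0;
  reg  [7:0] data_in = 0;
  wire [7:0] dout;

  reg  [7:0] model [0:127];   // reference copy of the RAM contents
  reg  [7:0] expected;
  integer    n;
  integer    errors;

  sp_ram_sync dut (
    .clk(clk), .we(we), .addr(addr), .data_in(data_in), .data_out(dout)
  );

  always #5 clk = ~clk;

  initial begin
    errors = 0;
    // Fill every location first so that later reads never return x.
    for (n = 0; n < 128; n = n + 1) begin
      addr = n[6:0]; data_in = n[7:0]; we = 1'b1;
      model[n] = n[7:0];
      @(posedge clk); #1;
    end
    // Random mix of reads and writes.
    for (n = 0; n < 1000; n = n + 1) begin
      addr    = $random;
      data_in = $random;
      we      = $random;
      // Write-first: a write returns data_in, a read returns the stored value.
      expected = we ? data_in : model[addr];
      if (we) model[addr] = data_in;
      @(posedge clk); #1;
      if (dout !== expected) begin
        errors = errors + 1;
        $display("MISMATCH n=%0d addr=%h we=%b dout=%h expected=%h",
                 n, addr, we, dout, expected);
      end
    end
    if (errors == 0) $display("Random test passed");
    else             $display("Random test: %0d mismatches", errors);
    $finish;
  end
endmodule
```

To reuse the idea for sp_ram_read_first, the expected value would always be model[addr] as it was before the write is applied to the model.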

📈 Simulation Waveform

Fig 4 — Sync RAM: write cycle, read cycle, and write-first vs read-first collision

At the collision cycle (t=5): write-first dout_sync = 0xCC (new data written); read-first dout_rf = 0xBB (old data before this write). The divergence at t=5 is the defining difference between the two modes.

💻 Simulation Console Output

======================================================
  Single Port RAM 128x8 Testbench
======================================================

  --- Phase 1: Write all 128 locations ---
  (128 write cycles, addr 0x00..0x7F, data = addr^0xA5)

  --- Phase 2: Read all 128 locations ---
  PASS [  1] READ addr=00 data=A5
  PASS [  2] READ addr=01 data=A4
  PASS [  3] READ addr=02 data=A7
  … (128 read verifications)
  PASS [128] READ addr=7F data=DA

  --- Phase 3: Write-First Collision (addr=0x10) ---
  PASS [129] Write-First: dout=0xBB (new data)
  PASS [130] Read-First: dout=0xAA (old data)

  --- Phase 4: Async Read Verification ---
  PASS [131] Async read addr=0x20 -> 0xCC
  PASS [132] Async read addr=0x21 -> 0xDD (no clk!)

  --- Phase 5: Boundary Addresses ---
  PASS [133] Boundary addr=0x00 addr=00 data=11
  PASS [134] Boundary addr=0x7F addr=7F data=FF

======================================================
  RESULTS: 134 / 134 PASS  |  0 FAIL
======================================================
  ALL TESTS PASSED

How to Run

Compile all RAM modules and testbench

# Icarus Verilog (-g2012 enables the SystemVerilog ++ and $fatal used by the testbench)
iverilog -g2012 -o ram_sim \
    sp_ram_sync.v       \
    sp_ram_read_first.v \
    sp_ram_async_rd.v   \
    sp_ram_byteen.v     \
    sp_ram_tb.v
vvp ram_sim
gtkwave sp_ram.vcd

# ModelSim (-sv compiles the .v files as SystemVerilog, needed for ++ and $fatal)
vlog -sv sp_ram_sync.v sp_ram_read_first.v sp_ram_async_rd.v \
     sp_ram_byteen.v sp_ram_tb.v
vsim -c sp_ram_tb -do "run -all; quit -f"

🔬 Design Analysis & FPGA Mapping

Implementation Comparison

Module	Write	Read	Collision (RW same addr)	FPGA target	Read latency
sp_ram_sync	Sync	Sync	Write-first (new data)	Block RAM (BRAM)	1 cycle
sp_ram_read_first	Sync	Sync	Read-first (old data)	Block RAM (BRAM)	1 cycle
sp_ram_async_rd	Sync	Async (comb)	N/A (immediate)	Distributed RAM (LUT)	0 cycles
sp_ram_byteen	Sync (byte-enable)	Sync	Read-first (old data)	Block RAM w/ BE	1 cycle

FPGA BRAM Inference Rules

Block RAM (BRAM) is inferred when:

Both read and write are synchronous (clocked)
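Array is large enough to justify a block (the threshold is tool- and device-dependent; a small memory such as 128×8 may be placed in LUT RAM instead)
Single or dual port access
Enables and write-enables are cleanly separated
No reset of the memory array itself (block RAM contents cannot be cleared by a reset signal)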

Distributed RAM (LUT RAM) is inferred when:

Read is asynchronous (combinational output)
Small arrays, below the tool's threshold for using a BRAM
Multiple independent read ports needed
Synthesis tool cannot find a matching BRAM primitive
Explicitly constrained with synthesis directives
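
As an illustration of the last point, the sketch below uses the Vivado ram_style attribute to request block RAM for a memory that might otherwise land in LUTs. The module name sp_ram_block_hint is hypothetical, and the attribute is specific to Xilinx tools; other vendors use different attribute names, so treat this as an assumption to check against the synthesis guide of the tool in use.

```verilog
// Hypothetical example: steering inference with a synthesis attribute.
`timescale 1ns/1ps
module sp_ram_block_hint #(
  parameter DEPTH  = 128,
  parameter WIDTH  = 8,
  parameter ADDR_W = $clog2(DEPTH)
) (
  input  wire              clk,
  input  wire              we,
  input  wire [ADDR_W-1:0] addr,
  input  wire [WIDTH-1:0]  data_in,
  output reg  [WIDTH-1:0]  data_out
);
  // Vivado: "block" requests BRAM, "distributed" requests LUT RAM.
  // Simulators ignore the attribute.
  (* ram_style = "block" *) reg [WIDTH-1:0] mem [0:DEPTH-1];

  always @(posedge clk) begin
    if (we) mem[addr] <= data_in;
    data_out <= mem[addr];        // synchronous read, required for BRAM
  end
endmodule
```

The attribute is a request rather than a guarantee: a memory with an asynchronous read still cannot be placed in block RAM regardless of the hint.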

Initialising memory contents: In simulation, the mem array initialises to x (unknown) by default. For a ROM or pre-loaded RAM, use $readmemh or $readmemb to load from a hex/binary file: initial $readmemh("init.hex", mem);. Most FPGA synthesis tools accept the same construct and place the values in the bitstream as the BRAM's power-up contents; Xilinx and Intel block RAMs both support initial values. ASIC SRAM macros generally have no defined power-up contents, so the initial block has no effect there: the memory must be written after reset, or a ROM must be used for fixed contents.
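
A minimal sketch of a pre-loaded RAM is shown below. The module name sp_ram_init and the INIT_FILE parameter are hypothetical additions for illustration; the file name init.hex is the one used in the text and is assumed to exist in the simulator's working directory.

```verilog
// Hypothetical example: single-port RAM pre-loaded from a hex file.
`timescale 1ns/1ps
module sp_ram_init #(
  parameter DEPTH     = 128,
  parameter WIDTH     = 8,
  parameter ADDR_W    = $clog2(DEPTH),
  parameter INIT_FILE = "init.hex"   // assumed file name
) (
  input  wire              clk,
  input  wire              we,
  input  wire [ADDR_W-1:0] addr,
  input  wire [WIDTH-1:0]  data_in,
  output reg  [WIDTH-1:0]  data_out
);
  reg [WIDTH-1:0] mem [0:DEPTH-1];

  // Runs once at time 0 in simulation; FPGA tools that support it
  // turn the file contents into initial memory values.
  initial $readmemh(INIT_FILE, mem);

  always @(posedge clk) begin
    if (we) mem[addr] <= data_in;
    data_out <= mem[addr];
  end
endmodule
```

The hex file holds one value per memory word, separated by whitespace, and is loaded starting at address 0. If the file has fewer entries than DEPTH, the remaining locations stay at x in simulation.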

Parameterisation to other sizes: All four modules accept DEPTH and WIDTH parameters (for sp_ram_byteen, WIDTH must be a multiple of 8). Common variants: #(.DEPTH(256),.WIDTH(8)) = 256×8 (256 bytes), #(.DEPTH(512),.WIDTH(16)) = 512×16 (1 KB), #(.DEPTH(1024),.WIDTH(32)) = 4 KB. The address width ADDR_W = $clog2(DEPTH) is calculated automatically and should not be overridden. A Xilinx RAMB18 holds 18 Kb, of which 16 Kb (2 KB) is data and the rest parity; a RAMB36 holds 36 Kb, of which 32 Kb (4 KB) is data. Synthesis automatically combines multiple BRAM primitives for larger memories.
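
The parameter overrides above can be sketched as follows. The wrapper name ram_variants_example and its port names are hypothetical; the sketch assumes sp_ram_sync is compiled with it and only overrides DEPTH and WIDTH, leaving ADDR_W to be derived.

```verilog
// Hypothetical example: two differently sized instances of sp_ram_sync.
`timescale 1ns/1ps
module ram_variants_example (
  input  wire        clk,
  input  wire        we,
  input  wire [9:0]  addr,      // wide enough for the 1024-deep instance
  input  wire [31:0] data_in,
  output wire [7:0]  q_256x8,
  output wire [31:0] q_1kx32
);
  // 256 x 8: ADDR_W becomes $clog2(256) = 8, so only addr[7:0] is used.
  sp_ram_sync #(.DEPTH(256), .WIDTH(8)) u_256x8 (
    .clk(clk), .we(we), .addr(addr[7:0]),
    .data_in(data_in[7:0]), .data_out(q_256x8)
  );

  // 1024 x 32 (4 KB): ADDR_W becomes $clog2(1024) = 10.
  sp_ram_sync #(.DEPTH(1024), .WIDTH(32)) u_1kx32 (
    .clk(clk), .we(we), .addr(addr),
    .data_in(data_in), .data_out(q_1kx32)
  );
endmodule
```

Because ADDR_W is declared as an ordinary parameter, nothing stops a caller from overriding it inconsistently; declaring it as a localparam in the module body is one way to prevent that.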