Dual Port RAM — 128×8
Complete dual-port RAM designs — True Dual Port (TDP) with independent read/write on both ports, Simple Dual Port (SDP) with dedicated write and read ports, dual-clock SDP for clock domain crossing, and a FIFO-ready dual-port RAM template — with collision handling, timing diagrams, and an exhaustive testbench.
💿 Introduction & Types
A Dual Port RAM provides two independent access ports to a shared memory array. Unlike a single-port RAM where only one operation can occur per cycle, dual-port RAM allows simultaneous read and write — or even two independent reads — increasing memory bandwidth and enabling powerful design patterns such as producer-consumer pipelines, FIFOs, and cross-clock-domain data exchange.
📋 Port Description & Function Tables
Port A
| Signal | Width | Description |
|---|---|---|
| clk_a | 1 | Port A clock |
| we_a | 1 | Port A write enable |
| addr_a | 7 | Port A address [6:0] |
| din_a | 8 | Port A write data |
| dout_a | 8 | Port A read data |

Port B
| Signal | Width | Description |
|---|---|---|
| clk_b | 1 | Port B clock |
| we_b | 1 | Port B write enable |
| addr_b | 7 | Port B address [6:0] |
| din_b | 8 | Port B write data |
| dout_b | 8 | Port B read data |
TDP Function Table — All Combinations
| we_a | we_b | addr_a vs addr_b | Port A result | Port B result | Notes |
|---|---|---|---|---|---|
| 0 | 0 | any | Read mem[a] | Read mem[b] | Simultaneous reads (always safe) |
| 1 | 0 | a != b | Write mem[a] | Read mem[b] | Normal operation |
| 0 | 1 | a != b | Read mem[a] | Write mem[b] | Normal operation |
| 1 | 0 | a == b | Write din_a | Read: undefined | Read-during-write (mode-dependent) |
| 0 | 1 | a == b | Read: undefined | Write din_b | Read-during-write (mode-dependent) |
| 1 | 1 | a != b | Write mem[a] | Write mem[b] | Independent writes (safe) |
| 1 | 1 | a == b | Write din_a | Write din_b | COLLISION: undefined result |
🔌 Block Diagram
⚫ Implementation 1 — True Dual Port (TDP) RAM
Both ports have independent read and write access. Each port has its own clock, write-enable, address, data input, and data output. The two always blocks run independently, each sensitive to its own clock. When both ports write to the same address simultaneously, the result is undefined (as in real FPGA BRAMs), so the testbench avoids this scenario.
```verilog
// ============================================================
// Module    : dp_ram_tdp
// Config    : 128 x 8, True Dual Port
// Port A    : independent clk_a, we_a, addr_a, din_a -> dout_a
// Port B    : independent clk_b, we_b, addr_b, din_b -> dout_b
// Collision : simultaneous writes to same addr -> undefined
//             a writing port sees its own new data (write-first);
//             a read of that address on the other port is not guaranteed
// FPGA      : Maps directly to RAMB18/RAMB36 TDP mode
// ============================================================
`timescale 1ns/1ps
`default_nettype none

module dp_ram_tdp #(
    parameter DEPTH  = 128,
    parameter WIDTH  = 8,
    parameter ADDR_W = $clog2(DEPTH)   // 7 for DEPTH=128
) (
    // Port A
    input  wire              clk_a,
    input  wire              we_a,
    input  wire [ADDR_W-1:0] addr_a,
    input  wire [WIDTH-1:0]  din_a,
    output reg  [WIDTH-1:0]  dout_a,
    // Port B
    input  wire              clk_b,
    input  wire              we_b,
    input  wire [ADDR_W-1:0] addr_b,
    input  wire [WIDTH-1:0]  din_b,
    output reg  [WIDTH-1:0]  dout_b
);

    // Shared memory array
    reg [WIDTH-1:0] mem [0:DEPTH-1];

    // ── Port A: independent clk_a ────────────────────────────
    always @(posedge clk_a) begin
        if (we_a) begin
            mem[addr_a] <= din_a;      // write
            dout_a      <= din_a;      // write-first on Port A
        end else
            dout_a <= mem[addr_a];     // read
    end

    // ── Port B: independent clk_b ────────────────────────────
    always @(posedge clk_b) begin
        if (we_b) begin
            mem[addr_b] <= din_b;      // write
            dout_b      <= din_b;      // write-first on Port B
        end else
            dout_b <= mem[addr_b];     // read
    end

endmodule

`default_nettype wire
```
The two always blocks are sensitive to different clocks. In simulation, each executes independently when its own clock rises. In synthesis, the tool recognises two always blocks accessing one shared array as a true dual-port RAM and typically maps it to a block RAM primitive such as RAMB18/RAMB36 (Xilinx) or M20K (Intel). The shared mem array is what lets the tool identify the dual-port structure.
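If the tool does not choose block RAM on its own (a 128×8 array is small, and tools may place small memories elsewhere), a synthesis attribute on the array declaration can steer the mapping. The fragment below is a minimal sketch that assumes Xilinx Vivado's `ram_style` attribute; Intel Quartus uses a differently named `ramstyle` attribute. The module name `ram_style_demo` and its port names are hypothetical and not part of the designs in this article.

```verilog
`timescale 1ns/1ps
`default_nettype none

// Hypothetical demo module: a shared-clock RAM with a block-RAM hint.
module ram_style_demo #(
    parameter DEPTH  = 128,
    parameter WIDTH  = 8,
    parameter ADDR_W = $clog2(DEPTH)
) (
    input  wire              clk,
    input  wire              we,
    input  wire [ADDR_W-1:0] waddr,
    input  wire [ADDR_W-1:0] raddr,
    input  wire [WIDTH-1:0]  din,
    output reg  [WIDTH-1:0]  dout
);

    // Assumption: Vivado attribute syntax. Simulators ignore the attribute.
    (* ram_style = "block" *) reg [WIDTH-1:0] mem [0:DEPTH-1];

    always @(posedge clk) begin
        if (we) mem[waddr] <= din;   // synchronous write
        dout <= mem[raddr];          // synchronous read (needed for block RAM)
    end

endmodule

`default_nettype wire
```

The attribute is only a request. The synthesis report is the place to confirm which primitive was actually used.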
🔵 Implementation 2 — Simple Dual Port (SDP) RAM
Port A is write-only and Port B is read-only. This restriction removes the write-write collision case entirely and gives synthesis tools a simple pattern to map onto BRAM. SDP is a very common dual-port pattern for FIFO buffers and streaming pipelines.
```verilog
// ============================================================
// Module    : dp_ram_sdp
// Config    : 128 x 8, Simple Dual Port
// Port A    : WRITE ONLY (writes gated by we_a, no dout_a)
// Port B    : READ ONLY  (no we_b, always reading)
// Collision : write A + read B to same address -> read-first
//             (Port B reads old data; new data appears next cycle)
// FPGA      : Simplest BRAM inference -- maps to RAMB18/RAMB36
// ============================================================
`timescale 1ns/1ps
`default_nettype none

module dp_ram_sdp #(
    parameter DEPTH  = 128,
    parameter WIDTH  = 8,
    parameter ADDR_W = $clog2(DEPTH)
) (
    input  wire              clk,      // shared clock
    // Port A: write
    input  wire              we_a,
    input  wire [ADDR_W-1:0] addr_a,
    input  wire [WIDTH-1:0]  din_a,
    // Port B: read
    input  wire [ADDR_W-1:0] addr_b,
    output reg  [WIDTH-1:0]  dout_b
);

    reg [WIDTH-1:0] mem [0:DEPTH-1];

    always @(posedge clk) begin
        if (we_a)
            mem[addr_a] <= din_a;      // Port A: write
        dout_b <= mem[addr_b];         // Port B: read always (read-first)
    end

endmodule

`default_nettype wire
```
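When Port A writes address X in the same cycle that Port B reads address X, dout_b captures mem[X] before the write updates it, so Port B sees the old data. The new data is available from Port B in the next cycle. This is read-first (read-before-write) behaviour. Whether a given FPGA block RAM reproduces it depends on the configured write mode, so check the synthesis report and the device's memory guide.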
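When the reader needs the new data on a same-address collision instead, a common approach is to keep the read-first RAM and add a small bypass path around it. The sketch below illustrates that idea under stated assumptions: the module name `dp_ram_sdp_bypass` and the internal register names are hypothetical, and the port list simply mirrors dp_ram_sdp.

```verilog
`timescale 1ns/1ps
`default_nettype none

// Hypothetical write-first variant of dp_ram_sdp using a bypass mux.
module dp_ram_sdp_bypass #(
    parameter DEPTH  = 128,
    parameter WIDTH  = 8,
    parameter ADDR_W = $clog2(DEPTH)
) (
    input  wire              clk,
    input  wire              we_a,
    input  wire [ADDR_W-1:0] addr_a,
    input  wire [WIDTH-1:0]  din_a,
    input  wire [ADDR_W-1:0] addr_b,
    output wire [WIDTH-1:0]  dout_b
);

    reg [WIDTH-1:0] mem [0:DEPTH-1];
    reg [WIDTH-1:0] ram_q;   // read-first RAM output (old data on a collision)
    reg [WIDTH-1:0] din_q;   // write data captured on the same clock edge
    reg             hit_q;   // 1 if that edge wrote the address being read

    always @(posedge clk) begin
        if (we_a)
            mem[addr_a] <= din_a;
        ram_q <= mem[addr_b];
        din_q <= din_a;
        hit_q <= we_a && (addr_a == addr_b);
    end

    // On a collision forward the captured write data, otherwise use the RAM output.
    assign dout_b = hit_q ? din_q : ram_q;

endmodule

`default_nettype wire
```

Read latency stays at one cycle. The cost is an address comparator, one extra data register, and a mux on the output path.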
⏲ Implementation 3 — Dual-Clock SDP (CDC RAM)
The most important variant for system-level design: write port and read port operate on completely independent clocks. This is the standard memory element used inside asynchronous FIFOs to transfer data safely across clock domain boundaries. The memory itself is safe — only the pointer logic requires Gray-code synchronisation.
```verilog
// ============================================================
// Module   : dp_ram_2clk
// Config   : 128 x 8, Simple Dual Port, DUAL CLOCK
// wr_clk   : write port clock domain
// rd_clk   : read port clock domain (independent, any freq)
// Use case : Async FIFO backing store -- data memory only
//            (FIFO control/pointers are handled separately)
// Safety   : Memory is safe across clock boundaries because
//            each access is registered in its own domain.
//            The caller ensures wr_addr and rd_addr are
//            properly managed (e.g. Gray-coded pointers).
// ============================================================
`timescale 1ns/1ps
`default_nettype none

module dp_ram_2clk #(
    parameter DEPTH  = 128,
    parameter WIDTH  = 8,
    parameter ADDR_W = $clog2(DEPTH)
) (
    // Write port (wr_clk domain)
    input  wire              wr_clk,
    input  wire              wr_en,
    input  wire [ADDR_W-1:0] wr_addr,
    input  wire [WIDTH-1:0]  wr_data,
    // Read port (rd_clk domain)
    input  wire              rd_clk,
    input  wire              rd_en,
    input  wire [ADDR_W-1:0] rd_addr,
    output reg  [WIDTH-1:0]  rd_data
);

    reg [WIDTH-1:0] mem [0:DEPTH-1];

    // Write port: registered to wr_clk
    always @(posedge wr_clk)
        if (wr_en)
            mem[wr_addr] <= wr_data;

    // Read port: registered to rd_clk (independent domain)
    always @(posedge rd_clk)
        if (rd_en)
            rd_data <= mem[rd_addr];

endmodule

`default_nettype wire
```
Each access to mem is registered in its own clock domain: the write is clocked by wr_clk and the read is clocked by rd_clk. The memory array itself is not a flip-flop; it is an array of storage that is only updated or sampled at clock edges. As long as the write and read addresses are not pointing to the same location simultaneously (guaranteed by proper FIFO pointer management), there is no metastability risk in the data path. The only place synchronisers are required is for the Gray-coded write/read pointers crossing between domains.
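Gray coding matters for the pointers because consecutive values differ in exactly one bit, so a synchroniser that samples mid-transition sees either the old or the new pointer and never an unrelated value. The conversions can be sketched as below; the module name `gray_conv` and its port names are hypothetical helpers, not part of the designs in this article.

```verilog
`timescale 1ns/1ps
`default_nettype none

// Hypothetical helper: binary -> Gray and Gray -> binary conversion.
module gray_conv #(
    parameter W = 8
) (
    input  wire [W-1:0] bin_in,    // binary pointer
    output wire [W-1:0] gray,      // Gray-coded form of bin_in
    input  wire [W-1:0] gray_in,   // Gray-coded value to decode
    output reg  [W-1:0] bin_out    // binary form of gray_in
);

    // Binary to Gray: each bit XORed with its more significant neighbour.
    assign gray = bin_in ^ (bin_in >> 1);

    // Gray to binary: bit k is the XOR of Gray bits k..W-1.
    integer k;
    always @* begin
        for (k = 0; k < W; k = k + 1)
            bin_out[k] = ^(gray_in >> k);
    end

endmodule

`default_nettype wire
```

Only the binary-to-Gray direction is needed for full/empty comparison. The Gray-to-binary direction becomes useful when a fill level has to be computed from a synchronised pointer.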
🆕 Implementation 4 — FIFO-Ready Dual Port RAM
A FIFO-ready RAM that pairs the dual-clock memory with Gray-coded pointer management and full/empty flag generation. It can serve as the backing store and control logic for a synthesisable asynchronous FIFO. DEPTH must be a power of two so that the Gray-coded pointers wrap correctly.
```verilog
// ============================================================
// Module   : dp_ram_fifo
// Function : Dual-clock dual-port RAM + FIFO control logic
// Pointers : Gray-coded and registered (safe for CDC synchronisation)
// Flags    : full (write domain), empty (read domain)
// Note     : DEPTH must be a power of two
// ============================================================
`timescale 1ns/1ps
`default_nettype none

module dp_ram_fifo #(
    parameter DEPTH  = 128,
    parameter WIDTH  = 8,
    parameter ADDR_W = $clog2(DEPTH)
) (
    // Write port
    input  wire             wr_clk,
    input  wire             wr_rst_n,
    input  wire             wr_en,
    input  wire [WIDTH-1:0] wr_data,
    output wire             full,
    // Read port
    input  wire             rd_clk,
    input  wire             rd_rst_n,
    input  wire             rd_en,
    output wire [WIDTH-1:0] rd_data,
    output wire             empty
);

    reg [WIDTH-1:0] mem [0:DEPTH-1];

    // ── Write pointer (wr_clk domain) ──────────────────────
    reg  [ADDR_W:0] wr_ptr_bin;    // binary (extra MSB for full detect)
    reg  [ADDR_W:0] wr_ptr_gray;   // registered Gray copy: glitch-free source for CDC
    wire [ADDR_W:0] wr_ptr_bin_nxt = wr_ptr_bin + 1'b1;

    always @(posedge wr_clk) begin
        if (!wr_rst_n) begin
            wr_ptr_bin  <= 0;
            wr_ptr_gray <= 0;
        end else if (wr_en && !full) begin
            mem[wr_ptr_bin[ADDR_W-1:0]] <= wr_data;
            wr_ptr_bin  <= wr_ptr_bin_nxt;
            wr_ptr_gray <= wr_ptr_bin_nxt ^ (wr_ptr_bin_nxt >> 1);
        end
    end

    // ── Read pointer (rd_clk domain) ───────────────────────
    reg  [ADDR_W:0] rd_ptr_bin;
    reg  [ADDR_W:0] rd_ptr_gray;   // registered Gray copy
    wire [ADDR_W:0] rd_ptr_bin_nxt = rd_ptr_bin + 1'b1;

    always @(posedge rd_clk) begin
        if (!rd_rst_n) begin
            rd_ptr_bin  <= 0;
            rd_ptr_gray <= 0;
        end else if (rd_en && !empty) begin
            rd_ptr_bin  <= rd_ptr_bin_nxt;
            rd_ptr_gray <= rd_ptr_bin_nxt ^ (rd_ptr_bin_nxt >> 1);
        end
    end

    // ── Read data (combinational -- use mem[rd_ptr]) ───────
    // An asynchronous read like this generally infers distributed (LUT) RAM;
    // block RAM requires a registered read.
    assign rd_data = mem[rd_ptr_bin[ADDR_W-1:0]];

    // ── Synchronise pointers across domains ────────────────
    // (2-FF synchroniser for each Gray pointer)
    // In a full design these would be separate synchroniser
    // modules -- simplified here for clarity
    reg [ADDR_W:0] rd_ptr_gray_s1, rd_ptr_gray_s2;   // in wr_clk
    reg [ADDR_W:0] wr_ptr_gray_s1, wr_ptr_gray_s2;   // in rd_clk

    always @(posedge wr_clk)
        if (!wr_rst_n) {rd_ptr_gray_s2, rd_ptr_gray_s1} <= 0;
        else           {rd_ptr_gray_s2, rd_ptr_gray_s1} <= {rd_ptr_gray_s1, rd_ptr_gray};

    always @(posedge rd_clk)
        if (!rd_rst_n) {wr_ptr_gray_s2, wr_ptr_gray_s1} <= 0;
        else           {wr_ptr_gray_s2, wr_ptr_gray_s1} <= {wr_ptr_gray_s1, wr_ptr_gray};

    // ── Full (wr_clk domain): two MSBs differ, rest equal ──
    assign full = (wr_ptr_gray[ADDR_W  ]   != rd_ptr_gray_s2[ADDR_W  ]) &&
                  (wr_ptr_gray[ADDR_W-1]   != rd_ptr_gray_s2[ADDR_W-1]) &&
                  (wr_ptr_gray[ADDR_W-2:0] == rd_ptr_gray_s2[ADDR_W-2:0]);

    // ── Empty (rd_clk domain): pointers equal ──────────────
    assign empty = (rd_ptr_gray == wr_ptr_gray_s2);

endmodule

`default_nettype wire
```
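The comment in dp_ram_fifo notes that the synchronisers would normally be separate modules. A minimal sketch of such a module follows. The name `sync_2ff`, its ports, and the synchronous active-low reset are assumptions made for illustration, and `ASYNC_REG` is a Xilinx Vivado attribute that other tools may ignore or spell differently.

```verilog
`timescale 1ns/1ps
`default_nettype none

// Hypothetical 2-flop synchroniser for a Gray-coded pointer.
module sync_2ff #(
    parameter W = 8
) (
    input  wire         clk,     // destination-domain clock
    input  wire         rst_n,   // synchronous, active-low, destination domain
    input  wire [W-1:0] d,       // Gray-coded value registered in the source domain
    output wire [W-1:0] q        // value safe to use in the destination domain
);

    // Assumption: Vivado attribute asking the tool to keep both flops
    // together and treat them as synchroniser registers.
    (* ASYNC_REG = "TRUE" *) reg [W-1:0] s1, s2;

    always @(posedge clk) begin
        if (!rst_n) begin
            s1 <= {W{1'b0}};
            s2 <= {W{1'b0}};
        end else begin
            s1 <= d;     // first stage may go metastable
            s2 <= s1;    // second stage gives it a full cycle to settle
        end
    end

    assign q = s2;

endmodule

`default_nettype wire
```

The input should come straight from a register in the source domain. Combinational logic in front of a synchroniser can glitch, and a glitch on a multi-bit bus defeats the one-bit-change property of Gray code.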
🧪 Comprehensive Testbench
The testbench exercises all three core RAM variants (TDP, SDP, dual-clock); dp_ram_fifo is compiled alongside them but not exercised. It writes data from Port A and reads from Port B across all 128 addresses (checking every 16th address), verifies simultaneous read/write on different addresses, checks the read-during-write collision behaviour, and exercises the dual-clock variant with unrelated write and read clocks.
```verilog
// ============================================================
// Testbench : dp_ram_tb
// Tests     :
//   1. TDP: Write via Port A, read via Port B -- all 128 addrs
//   2. TDP: Simultaneous RW different addresses
//   3. TDP: Read-during-write same address (write-first check)
//   4. SDP: Write Port A, read Port B -- all 128 addrs
//   5. SDP: Simultaneous write A + read B same address
//   6. 2CLK: Write wr_clk, read rd_clk (different phases)
// ============================================================
`timescale 1ns/1ps
`default_nettype none

module dp_ram_tb;

    // Clocks: clk_a=100 MHz, clk_b~71.4 MHz, wr_clk=100 MHz, rd_clk=62.5 MHz
    reg clk_a=0, clk_b=0, wr_clk=0, rd_clk=0;
    always #5 clk_a  = ~clk_a;    // 100 MHz
    always #7 clk_b  = ~clk_b;    // ~71.4 MHz (phase drifts against clk_a)
    always #5 wr_clk = ~wr_clk;   // 100 MHz
    always #8 rd_clk = ~rd_clk;   // 62.5 MHz

    reg        we_a=0, we_b=0, we_sdp=0, wr_en=0, rd_en=0;
    reg  [6:0] addr_a=0, addr_b=0, addr_sdp_a=0, addr_sdp_b=0;
    reg  [6:0] wr_addr=0, rd_addr=0;
    reg  [7:0] din_a=0, din_b=0, din_sdp=0, wr_data=0;
    wire [7:0] dout_a_tdp, dout_b_tdp, dout_sdp, rd_data_2clk;

    dp_ram_tdp u_tdp (
        .clk_a(clk_a), .we_a(we_a), .addr_a(addr_a), .din_a(din_a), .dout_a(dout_a_tdp),
        .clk_b(clk_b), .we_b(we_b), .addr_b(addr_b), .din_b(din_b), .dout_b(dout_b_tdp));

    dp_ram_sdp u_sdp (
        .clk(clk_a), .we_a(we_sdp), .addr_a(addr_sdp_a), .din_a(din_sdp),
        .addr_b(addr_sdp_b), .dout_b(dout_sdp));

    dp_ram_2clk u_2clk (
        .wr_clk(wr_clk), .wr_en(wr_en), .wr_addr(wr_addr), .wr_data(wr_data),
        .rd_clk(rd_clk), .rd_en(rd_en), .rd_addr(rd_addr), .rd_data(rd_data_2clk));

    initial begin
        $dumpfile("dp_ram.vcd");
        $dumpvars(0, dp_ram_tb);
    end

    integer pass_cnt=0, fail_cnt=0, test_num=0, i;

    // begin/end and "x = x + 1" keep the tasks legal in plain Verilog (.v files)
    task tick_a;  begin @(posedge clk_a);  #1; end endtask
    task tick_b;  begin @(posedge clk_b);  #1; end endtask
    task tick_wr; begin @(posedge wr_clk); #1; end endtask
    task tick_rd; begin @(posedge rd_clk); #1; end endtask

    task chk;
        input [7:0]   got, exp;
        input [255:0] msg;
        begin
            test_num = test_num + 1;
            if (got === exp) begin
                $display(" PASS [%3d] %s | got=%02h", test_num, msg, got);
                pass_cnt = pass_cnt + 1;
            end else begin
                $display(" FAIL [%3d] %s | got=%02h exp=%02h", test_num, msg, got, exp);
                fail_cnt = fail_cnt + 1;
            end
        end
    endtask

    initial begin
        $display("\n======================================================");
        $display(" Dual Port RAM 128x8 Testbench");
        $display("======================================================");

        // ── TDP: Port A writes, Port B reads ─────────────────
        $display("\n --- TDP: Write A / Read B (all 128) ---");
        for (i=0; i<128; i=i+1) begin
            addr_a=i; din_a=i^8'hC3; we_a=1; tick_a; we_a=0;
        end
        for (i=0; i<128; i=i+1) begin
            addr_b=i; we_b=0; tick_b;
            if (i%16==0) chk(dout_b_tdp, i^8'hC3, "TDP B read");
        end

        // ── TDP: Simultaneous RW different addresses ─────────
        $display("\n --- TDP: Simultaneous RW, different addresses ---");
        addr_a=7'h01; din_a=8'hAA; we_a=1;
        addr_b=7'h02; we_b=0;
        fork @(posedge clk_a); @(posedge clk_b); join
        #1; we_a=0;
        // read addr_b=0x02 (written in the loop above)
        chk(dout_b_tdp, 8'h02^8'hC3, "TDP simultaneous RW diff addr");

        // ── TDP: Write-first (read addr == write addr) ────────
        $display("\n --- TDP: Write-First collision same address ---");
        addr_a=7'h10; din_a=8'hBB; we_a=1;
        addr_b=7'h10; we_b=0;
        fork @(posedge clk_a); @(posedge clk_b); join
        #1; we_a=0;
        chk(dout_a_tdp, 8'hBB, "TDP write-first: dout_a=new");

        // ── SDP: All 128 addresses ────────────────────────────
        $display("\n --- SDP: Write A / Read B (all 128) ---");
        for (i=0; i<128; i=i+1) begin
            addr_sdp_a=i; din_sdp=i[7:0]^8'h5A; we_sdp=1; tick_a; we_sdp=0;
        end
        for (i=0; i<128; i=i+1) begin
            addr_sdp_b=i; tick_a;
            if (i%16==0) chk(dout_sdp, i[7:0]^8'h5A, "SDP B read");
        end

        // ── SDP: Write A + read B, same address (read-first) ──
        $display("\n --- SDP: Read-First collision same address ---");
        addr_sdp_a=7'h20; din_sdp=8'h77; we_sdp=1; addr_sdp_b=7'h20;
        tick_a; we_sdp=0;
        chk(dout_sdp, 8'h20^8'h5A, "SDP read-first: old data");
        tick_a;
        chk(dout_sdp, 8'h77, "SDP next cycle: new data");

        // ── Dual-Clock: Write wr_clk, Read rd_clk ────────────
        $display("\n --- Dual-Clock: wr_clk=100M, rd_clk=62M ---");
        wr_addr=7'h30; wr_data=8'hEE; wr_en=1; tick_wr; wr_en=0;
        rd_addr=7'h30; rd_en=1;
        repeat(3) tick_rd;   // wait for rd_clk domain to settle
        rd_en=0;
        chk(rd_data_2clk, 8'hEE, "2CLK: wr_clk write, rd_clk read");

        wr_addr=7'h7F; wr_data=8'hFF; wr_en=1; tick_wr; wr_en=0;
        rd_addr=7'h7F; rd_en=1;
        repeat(3) tick_rd;
        rd_en=0;
        chk(rd_data_2clk, 8'hFF, "2CLK: boundary 0x7F");

        $display("\n======================================================");
        $display(" RESULTS: %0d / %0d PASS | %0d FAIL", pass_cnt, test_num, fail_cnt);
        $display("======================================================");
        if (fail_cnt==0) $display(" ALL TESTS PASSED\n");
        else $fatal(1, " %0d FAILURE(S)\n", fail_cnt);
        #100; $finish;
    end

endmodule

`default_nettype wire
```
📈 Simulation Waveform
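At collision cycles (t=3 and t=5): Port A's write-first behaviour drives the newly written data onto dout_a in the same cycle. dout_b is clocked by clk_b, so it reflects the new value only once a clk_b edge samples the updated location. The two clocks run at different frequencies, which is visible as a drifting offset between the clock waveforms.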
💻 Simulation Console Output
How to Run
```shell
# Icarus Verilog
iverilog -o dp_ram_sim \
    dp_ram_tdp.v \
    dp_ram_sdp.v \
    dp_ram_2clk.v \
    dp_ram_fifo.v \
    dp_ram_tb.v
vvp dp_ram_sim
gtkwave dp_ram.vcd

# ModelSim
vlog dp_ram_tdp.v dp_ram_sdp.v dp_ram_2clk.v dp_ram_fifo.v dp_ram_tb.v
vsim -c dp_ram_tb -do "run -all; quit -f"
```
🔬 Design Analysis & Collision Handling
Dual Port RAM Type Comparison
| Module | Port A | Port B | Clock | Collision handling | FPGA target |
|---|---|---|---|---|---|
| dp_ram_tdp | Read + Write | Read + Write | Independent | Write-first per port | RAMB36 TDP mode |
| dp_ram_sdp | Write only | Read only | Shared | Read-first (old data) | RAMB36 SDP mode |
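| dp_ram_2clk | Write only | Read only | Independent | Avoided by the caller's address management | RAMB36 dual-clock |
| dp_ram_fifo | Write + WR ptr | Read + RD ptr | Independent | Full/Empty flags | Distributed RAM as written (combinational read); RAMB36 if the read is registered, plus Gray logic |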
Write-Write Collision: What Happens?
Write-Write at same address (undefined)
```verilog
// Both ports write at the same clock edge to addr=0x10:
// Port A writes 0xAA, Port B writes 0xBB
// Result in mem[0x10] is UNDEFINED
// Could be: 0xAA, 0xBB, or a mix of bits
// In Verilog simulation: the always block that
// executes last wins (non-deterministic with delta-cycles)
// In real BRAM: undefined / implementation-specific
// SOLUTION: The designer must guarantee that
// simultaneous writes to the same address cannot occur
// (use arbitration, or use SDP which has no WW collision)
```
Avoiding WW collision: arbitration
```verilog
// Simple priority arbiter: Port A has priority
// If both want to write same addr, B is stalled
// (the comparison is only meaningful when both ports share one clock)
wire conflict  = we_a && we_b && (addr_a == addr_b);
wire we_b_safe = we_b && !conflict;   // block B on clash
// Or: use SDP (eliminates WW entirely since only
// one port can write at any time)
```
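During verification it also helps to flag write-write collisions as they happen, so a missing arbiter shows up in the simulation log. The monitor below is a simulation-only sketch. It assumes both ports share a single clock, and the module name `ww_collision_mon` and its `collisions` counter are hypothetical.

```verilog
`timescale 1ns/1ps
`default_nettype none

// Hypothetical simulation-only monitor: counts and reports
// same-address writes from both ports on the same clock edge.
module ww_collision_mon #(
    parameter ADDR_W = 7
) (
    input  wire              clk,     // assumption: clock shared by both ports
    input  wire              we_a,
    input  wire              we_b,
    input  wire [ADDR_W-1:0] addr_a,
    input  wire [ADDR_W-1:0] addr_b
);

    integer collisions = 0;   // readable from a testbench via hierarchical reference

    always @(posedge clk) begin
        if (we_a && we_b && (addr_a == addr_b)) begin
            collisions = collisions + 1;
            $display("%0t: write-write collision at addr 0x%0h", $time, addr_a);
        end
    end

endmodule

`default_nettype wire
```

Instantiated next to the RAM in a testbench, the monitor lets the final report fail the run whenever `collisions` is non-zero. With truly independent clocks there is no common edge to sample on, so a check like this does not apply directly.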
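A single RAMB36 holds 36 Kb, so the 128 × 8 array in this article uses only a small part of one block. To fill a whole block, use for example DEPTH=4096, WIDTH=9 (4K × 9 bits = 36 Kb), or use cascading for wider words. In SDP mode, RAMB36 provides a 512 × 72 configuration (twice the maximum TDP data width), which is useful for 64-bit bus interfaces with 8 ECC bits.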
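The capacity figures can be checked with a few lines of arithmetic, taking 1 Kb as 1024 bits:

```latex
\begin{align*}
4096 \times 9 &= 36\,864 \text{ bits} = 36 \times 1024 \text{ bits (36 Kb)} \\
512 \times 72 &= 36\,864 \text{ bits}, \qquad 72 = 64 \text{ data bits} + 8 \text{ ECC bits} \\
128 \times 8  &= 1\,024 \text{ bits} \approx 2.8\% \text{ of one RAMB36}
\end{align*}
```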
