Synchronous & Asynchronous FIFO
Complete FIFO implementations — synchronous FIFO with counter-based flags, synchronous FIFO with pointer comparison, asynchronous FIFO with Gray-coded pointers and 2-FF synchronisers, and a parameterised async FIFO — with full/empty flag derivation, waveforms, and exhaustive self-checking testbenches.
🆕 Introduction & Theory
A FIFO (First-In First-Out) buffer is a storage element where data is read out in the same order it was written in. It decouples a producer and a consumer that may operate at different rates, bursts, or — in the asynchronous case — entirely different clock frequencies. The two key status flags are full (no more room to write) and empty (nothing available to read).
FIFO Pointer Visualisation (DEPTH=8)
WP = write pointer, RP = read pointer. Count = WP − RP.
⏲ Synchronous FIFO Theory
A synchronous FIFO uses a circular buffer in RAM. Two pointers, the write pointer (WP) and the read pointer (RP), advance through the buffer on write and read operations. In a naive design WP == RP in both the full and the empty state, so a common technique adds one extra most-significant bit to each pointer. With N = log2(DEPTH) address bits, each pointer is N+1 bits wide and the FIFO is:
- Empty when WP == RP (all N+1 bits equal)
- Full when WP[N] != RP[N] and WP[N-1:0] == RP[N-1:0] (the write pointer is exactly DEPTH entries ahead)
⚫ Sync FIFO 1 — Counter-Based Full/Empty
Uses an explicit count register that increments on write and decrements on read. Full and empty flags are directly derived from the count value. The count approach is intuitive and also provides a fill_level output for monitoring.
```verilog
// ============================================================
// Module    : sync_fifo_count
// Type      : Synchronous FIFO, counter-based flags
// DEPTH     : 8 entries (parameterised, must be a power of two)
// WIDTH     : 8 bits per entry (parameterised)
// full      : count == DEPTH
// empty     : count == 0
// fill_level: current occupancy (0..DEPTH)
// Simultaneous write+read when neither full nor empty: allowed
// ============================================================
`timescale 1ns/1ps
`default_nettype none
module sync_fifo_count #(
  parameter DEPTH = 8,
  parameter WIDTH = 8
) (
  input  wire                       clk, rst_n,
  input  wire                       wr_en,      // write enable
  input  wire [WIDTH-1:0]           wr_data,
  input  wire                       rd_en,      // read enable
  output reg  [WIDTH-1:0]           rd_data,
  output wire                       full,
  output wire                       empty,
  output wire [$clog2(DEPTH+1)-1:0] fill_level
);
  localparam ADDR_W = $clog2(DEPTH);

  reg [WIDTH-1:0]           mem [0:DEPTH-1];
  reg [ADDR_W-1:0]          wr_ptr, rd_ptr;   // wrap naturally at 2**ADDR_W
  reg [$clog2(DEPTH+1)-1:0] count;

  // Writes are ignored when full, reads are ignored when empty
  wire do_write = wr_en && !full;
  wire do_read  = rd_en && !empty;

  always @(posedge clk) begin
    if (!rst_n) begin
      wr_ptr <= 0;
      rd_ptr <= 0;
      count  <= 0;
    end else begin
      if (do_write) begin
        mem[wr_ptr] <= wr_data;
        wr_ptr      <= wr_ptr + 1;
      end
      if (do_read) begin
        rd_data <= mem[rd_ptr];
        rd_ptr  <= rd_ptr + 1;
      end
      // Counter: simultaneous write+read keeps count unchanged
      case ({do_write, do_read})
        2'b10:   count <= count + 1;
        2'b01:   count <= count - 1;
        default: ;                    // 00 or 11: no change
      endcase
    end
  end

  assign full       = (count == DEPTH);
  assign empty      = (count == 0);
  assign fill_level = count;
endmodule
`default_nettype wire
```
When wr_en and rd_en are asserted in the same cycle and the FIFO is neither full nor empty, the write and the read both take effect on that clock edge and the count remains unchanged. This gives maximum throughput: the FIFO can sustain one write and one read every clock cycle indefinitely without draining or filling.
🔵 Sync FIFO 2 — Pointer-Comparison (Extra MSB)
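Eliminates the counter entirely by using (N+1)-bit pointers. The extra MSB toggles each time a pointer wraps around, so it records whether the two pointers are on the same pass through the buffer. Full and empty detection is a direct comparison of the extended pointers, which removes the count register and its up/down adder at the cost of one extra bit per pointer.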
```verilog
// ============================================================
// Module : sync_fifo_ptr
// Method : (ADDR_W+1)-bit pointers; extra MSB detects wrap
// empty  : wr_ptr == rd_ptr (all bits match)
// full   : wr_ptr[ADDR_W] != rd_ptr[ADDR_W] AND
//          wr_ptr[ADDR_W-1:0] == rd_ptr[ADDR_W-1:0]
//          (write pointer is one wrap ahead: exactly DEPTH apart)
// No counter register needed; costs one extra bit per pointer.
// DEPTH must be a power of two.
// ============================================================
`timescale 1ns/1ps
`default_nettype none
module sync_fifo_ptr #(
  parameter DEPTH = 8,
  parameter WIDTH = 8
) (
  input  wire             clk, rst_n,
  input  wire             wr_en,
  input  wire [WIDTH-1:0] wr_data,
  input  wire             rd_en,
  output reg  [WIDTH-1:0] rd_data,
  output wire             full,
  output wire             empty
);
  localparam ADDR_W = $clog2(DEPTH);

  reg [WIDTH-1:0] mem [0:DEPTH-1];
  reg [ADDR_W:0]  wr_ptr, rd_ptr;   // N+1 bits each

  wire do_write = wr_en && !full;
  wire do_read  = rd_en && !empty;

  always @(posedge clk) begin
    if (!rst_n) begin
      wr_ptr <= 0;
      rd_ptr <= 0;
    end else begin
      if (do_write) begin
        mem[wr_ptr[ADDR_W-1:0]] <= wr_data;   // address uses the lower N bits
        wr_ptr                  <= wr_ptr + 1;
      end
      if (do_read) begin
        rd_data <= mem[rd_ptr[ADDR_W-1:0]];
        rd_ptr  <= rd_ptr + 1;
      end
    end
  end

  // Full: MSBs differ, lower bits equal (exactly DEPTH apart)
  assign full  = (wr_ptr[ADDR_W]     != rd_ptr[ADDR_W]) &&
                 (wr_ptr[ADDR_W-1:0] == rd_ptr[ADDR_W-1:0]);
  // Empty: all bits equal
  assign empty = (wr_ptr == rd_ptr);
endmodule
`default_nettype wire
```
Pointer width = ADDR_W + 1 = 4 bits for DEPTH=8. Address = ptr[2:0], wrap = ptr[3].

| State | wr_ptr | rd_ptr | full | empty | Note |
|---|---|---|---|---|---|
| Reset | 0000 | 0000 | 0 | 1 | pointers equal: EMPTY |
| After 3 writes | 0011 | 0000 | 0 | 0 | |
| After 8 writes | 1000 | 0000 | 1 | 0 | MSBs differ, addresses equal: FULL |
| After 4 reads | 1000 | 0100 | 0 | 0 | |
| After 8 reads | 1000 | 1000 | 0 | 1 | equal again: EMPTY (both pointers have wrapped) |

Key insight: pointers use N+1 bits but the memory address uses only N bits.
The address into memory is ptr[ADDR_W-1:0] (lower N bits).
The MSB (ptr[ADDR_W]) is the "generation bit": it toggles on each wrap.
⇄ Asynchronous FIFO Theory
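When the write and read clocks are independent, direct pointer comparison is unsafe. A binary pointer can change several bits on one increment (0111 to 1000 changes four), so the receiving clock domain may sample some bits before the change and some after it, and see a pointer value that never existed. The solution is to convert pointers to Gray code before crossing the clock domain boundary so that only one bit changes per count step: the sampled value is then always either the old or the new pointer, and the 2-FF synchronisers resolve any metastability on that single bit.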
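The single-bit-change property can be checked directly in simulation. The following is a minimal sketch, assuming a 4-bit pointer (ADDR_W+1 for DEPTH=8); the module name `gray_step_check` and the helper `bin2gray` are illustrative and are not part of the FIFO modules on this page.

```verilog
`timescale 1ns/1ps
// Sketch: verifies that consecutive Gray values differ in exactly one bit,
// including the wrap from the last value back to zero.
module gray_step_check;
  localparam W = 4;                     // assumed pointer width (ADDR_W+1)
  reg [W-1:0] b, b_next, g, g_next, diff;
  integer i, errors;

  // Binary-to-Gray conversion: each bit XORed with its upper neighbour
  function [W-1:0] bin2gray;
    input [W-1:0] bin;
    begin
      bin2gray = bin ^ (bin >> 1);
    end
  endfunction

  initial begin
    errors = 0;
    for (i = 0; i < (1 << W); i = i + 1) begin
      b      = i;
      b_next = i + 1;                   // truncation wraps 15 + 1 back to 0
      g      = bin2gray(b);
      g_next = bin2gray(b_next);
      diff   = g ^ g_next;
      // Exactly one bit set means: nonzero and a power of two
      if (diff == 0 || (diff & (diff - 1)) != 0) errors = errors + 1;
      $display("bin=%b gray=%b", b, g);
    end
    if (errors == 0) $display("OK: one bit changes per step");
    else             $display("ERROR: %0d bad steps", errors);
    $finish;
  end
endmodule
```

The same `bin ^ (bin >> 1)` expression is what the asynchronous FIFO uses to encode its pointers before they cross into the other clock domain.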
🟠 Asynchronous FIFO — Gray-Coded Pointers
```verilog
// ============================================================
// Module   : async_fifo
// Write    : wr_clk domain: wr_en, wr_data, full
// Read     : rd_clk domain: rd_en, rd_data, empty
// Pointers : (ADDR_W+1)-bit binary internally,
//            Gray-coded and registered before crossing domains
// Sync     : 2-FF synchroniser for each Gray pointer
// Full     : in wr_clk domain (pessimistic: deasserts a few cycles late)
// Empty    : in rd_clk domain (pessimistic: deasserts a few cycles late)
// Note     : DEPTH must be a power of two and at least 4
// ============================================================
`timescale 1ns/1ps
`default_nettype none
module async_fifo #(
  parameter DEPTH = 8,
  parameter WIDTH = 8
) (
  // Write port
  input  wire             wr_clk, wr_rst_n,
  input  wire             wr_en,
  input  wire [WIDTH-1:0] wr_data,
  output wire             full,
  // Read port
  input  wire             rd_clk, rd_rst_n,
  input  wire             rd_en,
  output reg  [WIDTH-1:0] rd_data,
  output wire             empty
);
  localparam ADDR_W = $clog2(DEPTH);

  // Shared memory (dual-port: write on wr_clk, read on rd_clk)
  reg [WIDTH-1:0] mem [0:DEPTH-1];

  // ---- Write domain ----
  reg  [ADDR_W:0] wr_ptr_bin;                       // binary write pointer
  reg  [ADDR_W:0] wr_ptr_gray;                      // registered Gray copy
  wire [ADDR_W:0] wr_bin_next = wr_ptr_bin + 1'b1;

  always @(posedge wr_clk or negedge wr_rst_n) begin
    if (!wr_rst_n) begin
      wr_ptr_bin  <= 0;
      wr_ptr_gray <= 0;
    end else if (wr_en && !full) begin
      mem[wr_ptr_bin[ADDR_W-1:0]] <= wr_data;
      wr_ptr_bin  <= wr_bin_next;
      // Register the Gray value so no combinational glitch can
      // reach the synchroniser in the other domain
      wr_ptr_gray <= wr_bin_next ^ (wr_bin_next >> 1);
    end
  end

  // 2-FF sync: wr_ptr_gray into rd_clk domain
  reg [ADDR_W:0] wr_gray_s1, wr_gray_s2;
  always @(posedge rd_clk or negedge rd_rst_n) begin
    if (!rd_rst_n) {wr_gray_s2, wr_gray_s1} <= 0;
    else           {wr_gray_s2, wr_gray_s1} <= {wr_gray_s1, wr_ptr_gray};
  end

  // ---- Read domain ----
  reg  [ADDR_W:0] rd_ptr_bin;
  reg  [ADDR_W:0] rd_ptr_gray;
  wire [ADDR_W:0] rd_bin_next = rd_ptr_bin + 1'b1;

  always @(posedge rd_clk or negedge rd_rst_n) begin
    if (!rd_rst_n) begin
      rd_ptr_bin  <= 0;
      rd_ptr_gray <= 0;
    end else if (rd_en && !empty) begin
      rd_data     <= mem[rd_ptr_bin[ADDR_W-1:0]];
      rd_ptr_bin  <= rd_bin_next;
      rd_ptr_gray <= rd_bin_next ^ (rd_bin_next >> 1);
    end
  end

  // 2-FF sync: rd_ptr_gray into wr_clk domain
  reg [ADDR_W:0] rd_gray_s1, rd_gray_s2;
  always @(posedge wr_clk or negedge wr_rst_n) begin
    if (!wr_rst_n) {rd_gray_s2, rd_gray_s1} <= 0;
    else           {rd_gray_s2, rd_gray_s1} <= {rd_gray_s1, rd_ptr_gray};
  end

  // ---- Full: in wr_clk domain, compare Gray pointers ----
  // Full when the top 2 bits differ AND the remaining bits are equal
  assign full  = (wr_ptr_gray[ADDR_W]     != rd_gray_s2[ADDR_W])   &&
                 (wr_ptr_gray[ADDR_W-1]   != rd_gray_s2[ADDR_W-1]) &&
                 (wr_ptr_gray[ADDR_W-2:0] == rd_gray_s2[ADDR_W-2:0]);

  // ---- Empty: in rd_clk domain, Gray pointers equal ----
  assign empty = (rd_ptr_gray == wr_gray_s2);
endmodule
`default_nettype wire
```
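The full flag is generated in the write domain by comparing the local write pointer with the synchronised Gray read pointer; the empty flag is generated in the read domain from the synchronised Gray write pointer. Because each synchronised pointer lags its source by about two destination clock cycles, both flags can only err on the safe side: full and empty may stay asserted slightly too long, but neither deasserts early.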
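The Gray-domain full test (top two bits differ, remaining bits equal) is less obvious than its binary counterpart. The sketch below, assuming DEPTH=8, exhaustively compares the Gray-coded full and empty tests against the binary definitions "write pointer is DEPTH ahead" and "pointers equal". The module name `gray_full_check` and its signal names are illustrative only.

```verilog
`timescale 1ns/1ps
// Sketch: exhaustive check that the Gray-coded flag tests agree with the
// binary definitions for every pair of (ADDR_W+1)-bit pointer values.
module gray_full_check;
  localparam ADDR_W = 3;                    // assumed: DEPTH = 8
  localparam DEPTH  = 1 << ADDR_W;
  reg [ADDR_W:0] wr, rd, wg, rg, dist;
  reg full_gray, full_bin, empty_gray, empty_bin;
  integer w, r, errors;

  initial begin
    errors = 0;
    for (w = 0; w < 2*DEPTH; w = w + 1)
      for (r = 0; r < 2*DEPTH; r = r + 1) begin
        wr   = w;
        rd   = r;
        wg   = wr ^ (wr >> 1);              // Gray-coded write pointer
        rg   = rd ^ (rd >> 1);              // Gray-coded read pointer
        dist = wr - rd;                     // occupancy modulo 2*DEPTH
        full_bin   = (dist == DEPTH);
        empty_bin  = (dist == 0);
        full_gray  = (wg[ADDR_W]     != rg[ADDR_W])   &&
                     (wg[ADDR_W-1]   != rg[ADDR_W-1]) &&
                     (wg[ADDR_W-2:0] == rg[ADDR_W-2:0]);
        empty_gray = (wg == rg);
        if (full_gray !== full_bin || empty_gray !== empty_bin)
          errors = errors + 1;
      end
    if (errors == 0) $display("OK: Gray flags match binary flags");
    else             $display("ERROR: %0d mismatches", errors);
    $finish;
  end
endmodule
```

The two-bit rule follows from the encoding: adding DEPTH to a binary pointer flips only its MSB, and because Gray bit ADDR_W-1 is the XOR of binary bits ADDR_W and ADDR_W-1, that single flip inverts the top two Gray bits and leaves the rest unchanged.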
🔧 Asynchronous FIFO — Parameterised with almost-full/almost-empty
```verilog
// ============================================================
// Module   : async_fifo_param
// Adds     : almost_full  : asserts when write-side fill >= AF_THRESH
//            almost_empty : asserts when read-side fill <= AE_THRESH
// Use case : Flow control. Assert almost_full early to give
//            upstream time to stop sending before hard full.
// Note     : DEPTH must be a power of two and at least 4
// ============================================================
`timescale 1ns/1ps
`default_nettype none
module async_fifo_param #(
  parameter DEPTH     = 16,
  parameter WIDTH     = 8,
  parameter AF_THRESH = DEPTH - 2,   // almost-full threshold
  parameter AE_THRESH = 2            // almost-empty threshold
) (
  input  wire             wr_clk, wr_rst_n,
  input  wire             wr_en,
  input  wire [WIDTH-1:0] wr_data,
  output wire             full, almost_full,
  input  wire             rd_clk, rd_rst_n,
  input  wire             rd_en,
  output reg  [WIDTH-1:0] rd_data,
  output wire             empty, almost_empty
);
  localparam ADDR_W = $clog2(DEPTH);

  reg  [WIDTH-1:0] mem [0:DEPTH-1];
  reg  [ADDR_W:0]  wr_ptr_bin, rd_ptr_bin;
  reg  [ADDR_W:0]  wr_gray, rd_gray;          // registered Gray pointers
  wire [ADDR_W:0]  wr_bin_next = wr_ptr_bin + 1'b1;
  wire [ADDR_W:0]  rd_bin_next = rd_ptr_bin + 1'b1;

  // 2-FF synchronisers
  reg [ADDR_W:0] wr_g_s1, wr_g_s2, rd_g_s1, rd_g_s2;
  always @(posedge rd_clk or negedge rd_rst_n)
    if (!rd_rst_n) {wr_g_s2, wr_g_s1} <= 0;
    else           {wr_g_s2, wr_g_s1} <= {wr_g_s1, wr_gray};
  always @(posedge wr_clk or negedge wr_rst_n)
    if (!wr_rst_n) {rd_g_s2, rd_g_s1} <= 0;
    else           {rd_g_s2, rd_g_s1} <= {rd_g_s1, rd_gray};

  // Write pointer update (binary and registered Gray together)
  always @(posedge wr_clk or negedge wr_rst_n) begin
    if (!wr_rst_n) begin
      wr_ptr_bin <= 0;
      wr_gray    <= 0;
    end else if (wr_en && !full) begin
      mem[wr_ptr_bin[ADDR_W-1:0]] <= wr_data;
      wr_ptr_bin <= wr_bin_next;
      wr_gray    <= wr_bin_next ^ (wr_bin_next >> 1);
    end
  end

  // Read pointer update
  always @(posedge rd_clk or negedge rd_rst_n) begin
    if (!rd_rst_n) begin
      rd_ptr_bin <= 0;
      rd_gray    <= 0;
    end else if (rd_en && !empty) begin
      rd_data    <= mem[rd_ptr_bin[ADDR_W-1:0]];
      rd_ptr_bin <= rd_bin_next;
      rd_gray    <= rd_bin_next ^ (rd_bin_next >> 1);
    end
  end

  // Full/Empty (same as async_fifo)
  assign full  = (wr_gray[ADDR_W]     != rd_g_s2[ADDR_W])   &&
                 (wr_gray[ADDR_W-1]   != rd_g_s2[ADDR_W-1]) &&
                 (wr_gray[ADDR_W-2:0] == rd_g_s2[ADDR_W-2:0]);
  assign empty = (rd_gray == wr_g_s2);

  // Convert a synchronised Gray pointer back to binary
  function [ADDR_W:0] gray2bin;
    input [ADDR_W:0] g;
    integer k;
    begin
      gray2bin[ADDR_W] = g[ADDR_W];
      for (k = ADDR_W-1; k >= 0; k = k - 1)
        gray2bin[k] = gray2bin[k+1] ^ g[k];
    end
  endfunction

  // Approximate fill levels, each computed in the domain that uses it.
  // Write side: never lower than the true occupancy.
  wire [ADDR_W:0] wr_fill = wr_ptr_bin - gray2bin(rd_g_s2);
  // Read side: never higher than the true occupancy.
  wire [ADDR_W:0] rd_fill = gray2bin(wr_g_s2) - rd_ptr_bin;

  assign almost_full  = (wr_fill >= AF_THRESH);   // wr_clk domain
  assign almost_empty = (rd_fill <= AE_THRESH);   // rd_clk domain
endmodule
`default_nettype wire
```
🧪 Synchronous FIFO Testbench
```verilog
// ============================================================
// Testbench : sync_fifo_tb
// DUTs      : sync_fifo_count, sync_fifo_ptr
// Tests     :
//   1. Fill to full (8 writes), verify full flag
//   2. Write-when-full blocked (overflow guard)
//   3. Drain to empty (8 reads), verify data order
//   4. Read-when-empty blocked (underflow guard)
//   5. Simultaneous read+write (steady-state throughput)
//   6. Reset mid-operation
// Note      : uses SystemVerilog constructs (++, $fatal,
//             multi-statement tasks); compile in SystemVerilog mode
// ============================================================
`timescale 1ns/1ps
`default_nettype none
module sync_fifo_tb;
  reg        clk=0, rst_n=1, wr_en=0, rd_en=0;
  reg  [7:0] wr_data=0;
  wire [7:0] rd_cnt, rd_ptr;   // rd_data of the counter and pointer DUTs
  wire       full_c, empty_c, full_p, empty_p;
  wire [3:0] fill;

  sync_fifo_count #(.DEPTH(8),.WIDTH(8)) u_cnt (
    .clk(clk),.rst_n(rst_n),.wr_en(wr_en),.wr_data(wr_data),
    .rd_en(rd_en),.rd_data(rd_cnt),.full(full_c),.empty(empty_c),.fill_level(fill));
  sync_fifo_ptr #(.DEPTH(8),.WIDTH(8)) u_ptr (
    .clk(clk),.rst_n(rst_n),.wr_en(wr_en),.wr_data(wr_data),
    .rd_en(rd_en),.rd_data(rd_ptr),.full(full_p),.empty(empty_p));

  always #5 clk=~clk;
  initial begin $dumpfile("sync_fifo.vcd"); $dumpvars(0,sync_fifo_tb); end

  integer pass_cnt=0,fail_cnt=0,test_num=0,i;

  // Advance one clock and sample 1 ns after the rising edge
  task tick; @(posedge clk); #1; endtask

  // msg holds up to 40 characters so longer messages are not truncated
  task chk; input cond; input [8*40-1:0] msg;
    begin
      test_num++;
      if(cond) begin $display("  PASS [%2d] %s",test_num,msg); pass_cnt++; end
      else     begin $display("  FAIL [%2d] %s",test_num,msg); fail_cnt++; end
    end
  endtask

  initial begin
    $display("\n======================================================");
    $display(" Synchronous FIFO Testbench (DEPTH=8, WIDTH=8)");
    $display("======================================================");
    rst_n=0; tick; rst_n=1;
    chk(empty_c && empty_p && !full_c && !full_p, "Reset: empty=1 full=0");

    // Fill FIFO to full
    $display("\n  --- Fill to full (8 writes) ---");
    for(i=1; i<=8; i=i+1) begin wr_data=i; wr_en=1; tick; wr_en=0; end
    chk(full_c && full_p, "full=1 after 8 writes");
    chk(fill==8, "fill_level=8");

    // Overflow guard: write when full must be blocked
    $display("\n  --- Overflow guard ---");
    wr_data=8'hFF; wr_en=1; tick; wr_en=0;
    chk(full_c, "Still full after blocked write");
    chk(fill==8, "fill_level still 8 (no overflow)");

    // Drain FIFO, check order
    $display("\n  --- Drain and verify FIFO order ---");
    for(i=1; i<=8; i=i+1) begin
      rd_en=1; tick; rd_en=0;
      chk(rd_cnt==i && rd_ptr==i, "FIFO order preserved");
    end
    chk(empty_c && empty_p, "empty=1 after 8 reads");

    // Underflow guard: read when empty must be blocked
    $display("\n  --- Underflow guard ---");
    rd_en=1; tick; rd_en=0;
    chk(empty_c, "Still empty after blocked read");

    // Simultaneous read+write (sustained throughput)
    $display("\n  --- Simultaneous R+W (4 cycles) ---");
    wr_data=8'hA1; wr_en=1; tick;   // prime: put A1 in
    for(i=0; i<4; i=i+1) begin
      wr_data=i+8'hB0; wr_en=1; rd_en=1; tick;
    end
    wr_en=0; rd_en=0;
    chk(!empty_c, "FIFO not empty after simultaneous R+W");

    // Reset mid-operation
    $display("\n  --- Reset mid-operation ---");
    rst_n=0; tick; rst_n=1;
    chk(empty_c && empty_p, "Reset clears FIFO: empty=1");
    chk(fill==0, "fill_level=0 after reset");

    $display("\n======================================================");
    $display(" RESULTS: %0d / %0d PASS | %0d FAIL",pass_cnt,test_num,fail_cnt);
    $display("======================================================");
    if(fail_cnt==0) $display(" ALL TESTS PASSED\n");
    else $fatal(1," %0d FAILURE(S)\n",fail_cnt);
    #20; $finish;
  end
endmodule
```
🧪 Asynchronous FIFO Testbench
```verilog
// ============================================================
// Testbench : async_fifo_tb
// DUT       : async_fifo (DEPTH=8)
// Clocks    : wr_clk=100MHz, rd_clk=62.5MHz (independent)
// Tests     :
//   1. Reset both domains
//   2. Burst write 8 entries from wr_clk domain
//   3. Verify full flag (in wr_clk domain)
//   4. Burst read 8 entries in rd_clk domain
//   5. Verify data order preserved across domains
//   6. Verify empty flag (in rd_clk domain)
//   7. Write 4 slow, read 4 fast (rate mismatch)
// Note      : uses SystemVerilog constructs (++, $fatal,
//             multi-statement tasks); compile in SystemVerilog mode
// ============================================================
`timescale 1ns/1ps
`default_nettype none
module async_fifo_tb;
  reg        wr_clk=0, wr_rst_n=1, wr_en=0;
  reg        rd_clk=0, rd_rst_n=1, rd_en=0;
  reg  [7:0] wr_data=0;
  wire [7:0] rd_data;
  wire       full, empty;

  async_fifo #(.DEPTH(8),.WIDTH(8)) dut (
    .wr_clk(wr_clk),.wr_rst_n(wr_rst_n),.wr_en(wr_en),.wr_data(wr_data),.full(full),
    .rd_clk(rd_clk),.rd_rst_n(rd_rst_n),.rd_en(rd_en),.rd_data(rd_data),.empty(empty));

  always #5 wr_clk=~wr_clk;   // 100 MHz  (10 ns period)
  always #8 rd_clk=~rd_clk;   // 62.5 MHz (16 ns period)
  initial begin $dumpfile("async_fifo.vcd"); $dumpvars(0,async_fifo_tb); end

  // i is used by the sequential tests and the writer thread, j by the reader thread
  integer pass_cnt=0,fail_cnt=0,test_num=0,i,j;
  reg [7:0] expected [0:7];

  task wr_tick; @(posedge wr_clk); #1; endtask
  task rd_tick; @(posedge rd_clk); #1; endtask

  // msg holds up to 40 characters so longer messages are not truncated
  task chk; input cond; input [8*40-1:0] msg;
    begin
      test_num++;
      if(cond) begin $display("  PASS [%2d] %s",test_num,msg); pass_cnt++; end
      else     begin $display("  FAIL [%2d] %s",test_num,msg); fail_cnt++; end
    end
  endtask

  initial begin
    $display("\n======================================================");
    $display(" Asynchronous FIFO Testbench (wr=100M, rd=62.5M)");
    $display("======================================================");

    // Reset both domains
    wr_rst_n=0; rd_rst_n=0;
    repeat(3) wr_tick;
    repeat(3) rd_tick;
    wr_rst_n=1; rd_rst_n=1;
    repeat(2) wr_tick;
    chk(empty, "After reset: empty=1");

    // Burst write 8 in wr domain
    $display("\n  --- Burst write 8 entries (wr_clk) ---");
    for(i=0; i<8; i=i+1) begin
      expected[i] = i+8'hA0;
      wr_data=expected[i]; wr_en=1; wr_tick;
    end
    wr_en=0;
    // full asserts from the local write pointer; the extra cycles are margin only
    repeat(4) wr_tick;
    chk(full, "full=1 after 8 writes (wr domain)");

    // Read 8 in rd domain, check order
    $display("\n  --- Read 8 entries (rd_clk), verify order ---");
    repeat(4) rd_tick;   // let the write pointer reach the read domain
    for(i=0; i<8; i=i+1) begin
      rd_en=1; rd_tick; rd_en=0; rd_tick;
      chk(rd_data==expected[i], "Data order preserved");
    end
    repeat(4) rd_tick;
    chk(empty, "empty=1 after 8 reads (rd domain)");

    // Rate mismatch: write slowly (1 in 3 cycles), read fast
    $display("\n  --- Rate mismatch: slow write, fast read ---");
    fork
      begin : slow_writer
        for(i=0; i<4; i=i+1) begin
          wr_data=i+8'hC0; wr_en=1; wr_tick; wr_en=0;
          repeat(2) wr_tick;   // insert gaps
        end
      end
      begin : fast_reader
        repeat(8) rd_tick;     // wait for some data
        for(j=0; j<4; j=j+1) begin
          rd_en=!empty; rd_tick; rd_en=0;
        end
      end
    join
    chk(!full, "Not full during slow-write fast-read");

    $display("\n======================================================");
    $display(" RESULTS: %0d / %0d PASS | %0d FAIL",pass_cnt,test_num,fail_cnt);
    $display("======================================================");
    if(fail_cnt==0) $display(" ALL TESTS PASSED\n");
    else $fatal(1," %0d FAILURE(S)\n",fail_cnt);
    #100; $finish;
  end
endmodule
```
How to Run
```sh
# Icarus Verilog, sync (the testbenches use SystemVerilog constructs, hence -g2012)
iverilog -g2012 -o sfifo_sim sync_fifo_count.v sync_fifo_ptr.v sync_fifo_tb.v
vvp sfifo_sim; gtkwave sync_fifo.vcd

# Icarus Verilog, async
iverilog -g2012 -o afifo_sim async_fifo.v async_fifo_param.v async_fifo_tb.v
vvp afifo_sim; gtkwave async_fifo.vcd

# ModelSim (-sv enables SystemVerilog for .v files)
vlog -sv sync_fifo_count.v sync_fifo_ptr.v sync_fifo_tb.v
vsim -c sync_fifo_tb -do "run -all; quit -f"
vlog -sv async_fifo.v async_fifo_param.v async_fifo_tb.v
vsim -c async_fifo_tb -do "run -all; quit -f"
```
🔬 Design Analysis & Comparison
| Feature | Sync FIFO (counter) | Sync FIFO (pointer) | Async FIFO |
|---|---|---|---|
| Clock domains | 1 | 1 | 2 (independent) |
| Full/empty logic | count == DEPTH / count == 0 | MSB XOR + lower bits compare | Gray compare (domain-specific) |
| Extra registers | count register (clog2(DEPTH+1) FFs) | 1 extra bit per pointer | 2 synchroniser stages per pointer crossing, each ADDR_W+1 bits wide |
| Fill level output | Yes (direct from count) | No (needs subtraction) | Approximate (needs Gray decode) |
| Metastability risk | None | None | Mitigated by 2-FF sync |
| CDC-safe? | No (single clock only) | No (single clock only) | Yes |
| Typical use | Rate buffer, stream FIFO | Area-optimised FIFO | Async UART, PCIe, cross-board |
Sync FIFO: counter vs pointer method
// Counter method: uses extra adder and count reg
// but gives direct fill_level without subtraction.
// Good when fill monitoring is needed (DMA, flow ctrl).
// Pointer method: no count register, just 2 comparisons.
// Trades the clog2(DEPTH+1)-bit counter for one extra bit
// per pointer. A common choice for area-optimised FIFOs.
// fill_level = wr_ptr - rd_ptr (needs separate logic).
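The "separate logic" for a fill level in the pointer method can be a single subtractor. The sketch below assumes the (ADDR_W+1)-bit extended pointers of sync_fifo_ptr; the module `fifo_fill_level` is a hypothetical helper and is not part of the original design.

```verilog
// Hypothetical helper: occupancy from two extended pointers.
module fifo_fill_level #(
  parameter ADDR_W = 3                    // DEPTH = 2**ADDR_W
) (
  input  wire [ADDR_W:0] wr_ptr,          // extended write pointer
  input  wire [ADDR_W:0] rd_ptr,          // extended read pointer
  output wire [ADDR_W:0] fill_level       // 0 .. DEPTH
);
  // Unsigned subtraction wraps modulo 2**(ADDR_W+1), so the result is
  // correct even after the write pointer has wrapped past the read pointer.
  assign fill_level = wr_ptr - rd_ptr;
endmodule
```

For DEPTH=8, wr_ptr=1000 and rd_ptr=0100 give 0100 (4 entries). After the write pointer has rolled over, wr_ptr=0010 and rd_ptr=1110 still give 0100, because 2 − 14 is 4 modulo 16.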
Async FIFO: key design rules
// Rule 1: Gray code pointers crossing domain boundaries.
// Never cross binary multi-bit signals.
// Rule 2: 2-FF minimum synchroniser (3-FF for safety-critical).
// Rule 3: full in wr_clk domain; empty in rd_clk domain.
// Rule 4: FIFO depth must account for synchroniser latency
// (typically 2-3 cycles). AF_THRESH = DEPTH - 3
// provides safe margin before hard-full.
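Rule 2 can be captured in a reusable block whose depth is a parameter. This is a minimal sketch under the assumption that the Gray pointer is registered in its source domain before it arrives here; the module name `gray_sync` and the `STAGES` parameter are illustrative and do not appear in the FIFO modules on this page.

```verilog
// Hypothetical multi-stage synchroniser for a Gray-coded pointer.
module gray_sync #(
  parameter WIDTH  = 4,     // ADDR_W + 1
  parameter STAGES = 2      // 2 minimum; 3 for extra metastability margin
) (
  input  wire             clk,       // destination clock
  input  wire             rst_n,     // destination-domain reset, active low
  input  wire [WIDTH-1:0] gray_in,   // Gray pointer from the source domain
  output wire [WIDTH-1:0] gray_out   // Gray pointer, safe to use on clk
);
  reg [WIDTH-1:0] stage [0:STAGES-1];
  integer s;

  always @(posedge clk or negedge rst_n) begin
    if (!rst_n) begin
      for (s = 0; s < STAGES; s = s + 1)
        stage[s] <= {WIDTH{1'b0}};
    end else begin
      stage[0] <= gray_in;                  // first capture, may go metastable
      for (s = 1; s < STAGES; s = s + 1)
        stage[s] <= stage[s-1];             // later stages give it time to settle
    end
  end

  assign gray_out = stage[STAGES-1];
endmodule
```

Each extra stage adds one destination clock cycle of latency to the pointer, which makes the full and empty flags slightly more pessimistic; that added latency is the quantity Rule 4 asks the depth and thresholds to absorb.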
The almost_full flag in async_fifo_param asserts when fill ≥ AF_THRESH (default DEPTH−2), that is, when at most 2 free entries remain. The upstream producer can therefore complete at most 2 more writes after seeing the flag; if it takes longer than that to stop, the hard full flag blocks the extra writes and that data is dropped. To guarantee no data is lost even when the producer has already committed data that is still in flight, the AF threshold is typically set to DEPTH − (pipeline latency + 1).
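The threshold rule is simple arithmetic and can be kept as a derived constant. The sketch below assumes a producer that needs 2 cycles to stop after seeing almost_full; the name `STOP_LATENCY` and the module `af_thresh_calc` are hypothetical and used for illustration only.

```verilog
`timescale 1ns/1ps
// Sketch: derive the almost-full threshold from an assumed stop latency.
module af_thresh_calc;
  localparam DEPTH        = 16;
  localparam STOP_LATENCY = 2;                           // assumed pipeline latency
  localparam AF_THRESH    = DEPTH - (STOP_LATENCY + 1);  // 16 - 3 = 13

  initial begin
    $display("AF_THRESH = %0d, free entries at assertion = %0d",
             AF_THRESH, DEPTH - AF_THRESH);
    $finish;
  end
endmodule
```

This prints `AF_THRESH = 13, free entries at assertion = 3`. The derived value would be passed to the FIFO as `async_fifo_param #(.DEPTH(16), .AF_THRESH(13))`, leaving three free entries for writes already in flight when the flag rises.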
