Behavioral Modelling — Part 2
Assignments with delays, the wait construct, multiple always blocks, complete RTL designs, blocking vs non-blocking deep-dive, the case statement family, and the Verilog simulation flow.
⏱ Assignments with Delays
Inside procedural blocks, delays can be added to assignments to control when signals change during simulation. Verilog supports three distinct forms of procedural delay, each with different behaviour.
Delays (#) are ignored by synthesis tools. They exist purely for simulation timing control: testbench stimulus generation, modelling propagation delays in behavioural models, and waveform generation.

Regular delay:

    // Wait 10 units, then assign
    #10 a = 1'b1;
    // Wait 5 units, then compute and assign
    #5 b = x & y;

Intra-assignment delay:

    // Evaluate RHS now, assign after 10
    a = #10 b & c;   // b and c read NOW, a updated at t+10

Event control:

    // Wait for rising edge of clk
    @(posedge clk) a = b;
    // Wait for any change on a or b
    @(a or b) c = a ^ b;
📐 Three Delay Forms — Side by Side
Understanding the difference between these three forms is critical for writing accurate simulation models and testbenches.
    // --- Regular delay: waits BEFORE evaluating the RHS ---
    initial begin
      a = 1'b0;
      #10 a = 1'b1;   // at t=10: read RHS (1'b1), assign to a
      // The RHS is a constant here, so the moment it is read makes no difference
    end

    // --- Intra-assignment delay: evaluates RHS NOW, assigns LATER ---
    // The non-blocking form is used so that the next statement still runs at t=0.
    // A blocking "a = #10 b;" would stall the whole block until t=10.
    initial begin
      b = 4'hA;
      a <= #10 b;     // at t=0: capture b = 4'hA and schedule the update
      b = 4'hF;       // at t=0: b changes to 4'hF immediately
                      // at t=10: a is assigned 4'hA (old value of b!)
    end
    // Key: intra-assignment delay captures the RHS snapshot when the statement executes

    // --- Event control: waits for a specific event ---
    always begin
      @(posedge clk);   // suspend until next rising edge
      q = d;            // execute immediately after edge
    end

    // --- Combined: wait for edge, then delay ---
    always begin
      @(posedge clk);   // wait for clock
      #2 q = d;         // then wait 2 more time units; d is sampled 2 units after the edge
    end
Fig 2 — Intra-Assignment Delay: Waveform
Fig 3 — Testbench Stimulus Using Delays
    initial begin
      // Apply stimulus: each # delay is relative to the previous statement
      rst_n = 0; a = 8'h00; b = 8'h00;   // t=0:   assert reset
      #20 rst_n = 1;                     // t=20:  release reset
      #10 a = 8'hAB;                     // t=30:  apply data A
      #5  b = 8'hCD;                     // t=35:  apply data B
      #10 a = 8'hFF; b = 8'h00;          // t=45:  change both
      #20 a = 8'h00;                     // t=65:  final value
      #50 $finish;                       // t=115: end simulation
    end
⏳ The wait Construct
The wait statement suspends execution of a procedural block until a specified level-sensitive condition becomes true. Unlike @(posedge clk) which triggers on a signal edge, wait checks a level — if the condition is already true when the simulator reaches the statement, execution continues immediately without any delay.
🔵 wait — Level Sensitive
    // Waits until 'done' is HIGH (level)
    wait(done);
    // If done is already 1 → no wait
    // If done is 0 → suspends until done=1
🟣 @ — Edge Sensitive
    // Waits for rising edge of 'done'
    @(posedge done);
    // Always waits for next 0→1 transition
    // Even if done is already 1
    // --- Basic wait ---
    wait(ready);              // pause until ready = 1
    data = bus;               // capture data once ready

    // --- Wait with expression ---
    wait(count == 8'hFF);     // wait until counter reaches max
    wait(!busy && enable);    // wait for compound condition

    // --- Wait in a testbench handshake ---
    initial begin
      start = 1'b1;
      wait(ack);              // wait for DUT to acknowledge
      start = 1'b0;
      wait(!ack);             // wait for ACK to deassert
      $display("Handshake complete at t=%0t", $time);
    end

    // --- Wait with timeout (safety measure) ---
    // Plain Verilog: a named fork plus disable ends whichever branch is still pending.
    // (SystemVerilog would use join_any followed by disable fork.)
    initial begin
      fork : wait_or_timeout
        begin
          wait(done);
          $display("Done!");
          disable wait_or_timeout;   // cancel the timeout branch
        end
        begin
          #1000;
          $display("TIMEOUT");
          $finish;
        end
      join
    end

Use wait when you want to synchronise to a signal level, e.g., waiting for a handshake flag, a FIFO non-empty signal, or a bus-ready signal. Use @(posedge clk) when you want to synchronise to a specific transition, e.g., capturing data at a clock edge.
🔀 Multiple Always Blocks
A single Verilog module can contain any number of always blocks. They all start at simulation time zero and run concurrently and independently — each block responds to its own sensitivity list, modeling different hardware elements in the same module.
Each always block should drive its own set of reg variables. Multiple blocks driving the same variable cause race conditions.

    module cpu_datapath (
      input            clk, rst_n,
      input      [7:0] instr,
      output reg [7:0] acc, pc, flags
    );
      wire [7:0] alu_result;
      // Assumption: the instruction byte is used directly as the ALU operand
      // (the operand was left undriven in the original listing)
      wire [7:0] operand = instr;

      // --- Always block 1: Accumulator register (sequential) ---
      always @(posedge clk or negedge rst_n) begin
        if (!rst_n) acc <= 8'h00;
        else        acc <= alu_result;
      end

      // --- Always block 2: Program counter (sequential) ---
      always @(posedge clk or negedge rst_n) begin
        if (!rst_n) pc <= 8'h00;
        else        pc <= pc + 8'h01;
      end

      // --- Always block 3: Flag computation (combinational) ---
      always @(*) begin
        flags[0]   = ~|acc;     // zero flag
        flags[1]   = acc[7];    // negative flag
        flags[7:2] = 6'b0;
      end

      // --- Continuous assignment: ALU ---
      assign alu_result = acc + operand;
    endmodule

If two always blocks drive the same reg variable without a clear priority mechanism, the result depends on simulation scheduling: a race condition. Always ensure each reg is driven by exactly one always block.
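As an illustration of the one-driver rule, the sketch below merges a set request and a clear request for a single flag into one always block with an explicit priority. The module name flag_reg and the signals set, clr and flag are hypothetical and do not belong to the designs in this section.

```verilog
// Hypothetical example: one reg, one driving always block.
module flag_reg (
  input      clk, rst_n,
  input      set, clr,
  output reg flag
);
  // Problematic style, shown only as a comment: two blocks drive 'flag'.
  //   always @(posedge clk) if (set) flag <= 1'b1;
  //   always @(posedge clk) if (clr) flag <= 1'b0;
  // When set and clr are both high, the simulated value depends on
  // which block the simulator happens to execute last.

  // Single driver with an explicit priority: reset, then clear, then set.
  always @(posedge clk or negedge rst_n) begin
    if (!rst_n)   flag <= 1'b0;
    else if (clr) flag <= 1'b0;   // clear wins over set
    else if (set) flag <= 1'b1;
  end
endmodule
```

Choosing clear over set is an arbitrary design decision here; the point is only that the priority is written down in one place instead of being left to the scheduler.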
Fig 7 — Multiple always blocks timeline
🏗 Designs at Behavioral Level
Behavioral modelling is the primary method for writing synthesizable RTL. Here are complete, production-quality designs demonstrating best practices.
Fig 8 — Synchronous FIFO (First-In First-Out Buffer)
    module fifo_sync #(
      parameter DEPTH = 8,
      parameter WIDTH = 8,
      parameter PTR_W = 3            // log2(DEPTH)
    ) (
      input                  clk, rst_n,
      input                  wr_en, rd_en,
      input      [WIDTH-1:0] wr_data,
      output reg [WIDTH-1:0] rd_data,
      output                 full, empty
    );
      reg [WIDTH-1:0] mem [0:DEPTH-1];
      reg [PTR_W:0]   wr_ptr, rd_ptr;   // extra bit for full/empty detect

      // Write port (synchronous, active-low reset)
      always @(posedge clk) begin
        if (!rst_n)
          wr_ptr <= 0;
        else if (wr_en && !full) begin
          mem[wr_ptr[PTR_W-1:0]] <= wr_data;
          wr_ptr <= wr_ptr + 1;
        end
      end

      // Read port (synchronous, active-low reset)
      always @(posedge clk) begin
        if (!rst_n)
          rd_ptr <= 0;
        else if (rd_en && !empty) begin
          rd_data <= mem[rd_ptr[PTR_W-1:0]];
          rd_ptr  <= rd_ptr + 1;
        end
      end

      // Status flags: combinational
      assign full  = (wr_ptr[PTR_W] != rd_ptr[PTR_W]) &&
                     (wr_ptr[PTR_W-1:0] == rd_ptr[PTR_W-1:0]);
      assign empty = (wr_ptr == rd_ptr);
    endmodule
Fig 9 — PWM (Pulse Width Modulation) Generator
    module pwm_gen #(parameter N = 8) (
      input          clk, rst_n,
      input  [N-1:0] duty,       // for N = 8: 0 = 0%, 255 = ~100%
      output reg     pwm_out
    );
      reg [N-1:0] counter;

      always @(posedge clk or negedge rst_n) begin
        if (!rst_n) begin
          counter <= 0;
          pwm_out <= 0;
        end else begin
          counter <= counter + 1;         // free-running counter
          pwm_out <= (counter < duty);    // high when counter < duty
        end
      end
    endmodule
⚖️ Blocking and Non-Blocking Assignments — Deep Dive
This is the single most important distinction in behavioral Verilog. Getting it wrong produces code that simulates correctly but synthesizes to wrong hardware — or simulates wrong but synthesizes correctly. Both are dangerous.
Fig 11 — Shift Register: the most important non-blocking example
❌ With blocking (=) — BROKEN
    always @(posedge clk) begin
      q1 = d;    // q1 ← d (new)
      q2 = q1;   // q2 ← NEW q1 = d
      q3 = q2;   // q3 ← NEW q2 = d
    end
    // All three become d immediately
    // No shifting: broken pipeline!
✅ With non-blocking (<=) — CORRECT
    always @(posedge clk) begin
      q1 <= d;    // q1 ← d as sampled at the edge
      q2 <= q1;   // q2 ← OLD q1
      q3 <= q2;   // q3 ← OLD q2
    end
    // All use old values: correct
    // shift: d→q1→q2→q3 over 3 cycles
Summary Reference
| Aspect | Blocking ( = ) | Non-Blocking ( <= ) |
|---|---|---|
| Execution | Sequential — one at a time | Parallel — all evaluate then update |
| RHS evaluated | Immediately, one statement at a time | All simultaneously before any update |
| LHS updated | Immediately after RHS | All together at end of time step |
| Models | Combinational logic, C-like flow | Flip-flops, registers, pipelines |
| Use in | always @(*) — combinational blocks | always @(posedge clk) — sequential |
| Race conditions | Order-dependent — careful ordering needed | Order-independent — safe |
| Typical synthesis result | Combinational gates (in always @(*) blocks) | Flip-flops (in edge-triggered blocks such as always @(posedge clk)) |
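The guidelines in the table can be sketched with a small counter that keeps its next-state logic and its state register in separate always blocks. The module name counter_en and its ports are hypothetical, chosen only for this illustration.

```verilog
// Hypothetical example: blocking for combinational logic,
// non-blocking for the clocked register.
module counter_en #(parameter N = 4) (
  input              clk, rst_n,
  input              en,
  output reg [N-1:0] count
);
  reg [N-1:0] next_count;

  // Combinational next-state logic: blocking assignments
  always @(*) begin
    next_count = count;             // default value avoids an inferred latch
    if (en)
      next_count = count + 1'b1;
  end

  // Sequential state register: non-blocking assignments
  always @(posedge clk or negedge rst_n) begin
    if (!rst_n) count <= {N{1'b0}};
    else        count <= next_count;
  end
endmodule
```

Splitting the logic this way is one common style rather than a requirement; the same counter could be written in a single clocked block using only non-blocking assignments.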
🔀 The case Statement
The case statement provides a clean, readable alternative to long if-else if chains. It compares an expression against a set of values and executes the first matching branch. When the case items are mutually exclusive, it typically synthesizes to a parallel multiplexer with no priority between inputs, whereas an if-else chain describes priority logic.

    always @(*) begin
      case (expression)           // expression to match
        value1:                   // single value
          statement1;
        value2, value3:           // multiple values share one branch
          statement2;
        value4: begin             // multi-statement branch needs begin/end
          a = x;
          b = y;
        end
        default:                  // ← always include! helps prevent latches
          statement_default;
      endcase
    end
Fig 13 — case vs if-else: hardware implications
🔵 case — Equal priority MUX
    case (sel)
      2'b00: y = in0;
      2'b01: y = in1;
      2'b10: y = in2;
      2'b11: y = in3;
    endcase
    // Synthesizes: balanced 4-to-1 MUX
    // All branches equal cost
🟣 if-else — Priority chain
    if      (sel == 2'b00) y = in0;
    else if (sel == 2'b01) y = in1;
    else if (sel == 2'b10) y = in2;
    else                   y = in3;
    // Synthesizes: priority tree
    // First condition checked first
Fig 14 — Practical case examples
    // --- ALU operation selector ---
    always @(*) begin
      case (alu_op)
        4'b0000: result = a + b;
        4'b0001: result = a - b;
        4'b0010: result = a & b;
        4'b0011: result = a | b;
        4'b0100: result = a ^ b;
        4'b0101: result = ~a;
        4'b0110: result = a << 1;
        4'b0111: result = a >> 1;
        default: result = 8'bx;
      endcase
    end

    // --- 7-segment display decoder (0–9) ---
    always @(*) begin
      case (digit)        // segments: gfedcba
        4'd0: seg = 7'b0111111;
        4'd1: seg = 7'b0000110;
        4'd2: seg = 7'b1011011;
        4'd3: seg = 7'b1001111;
        4'd4: seg = 7'b1100110;
        4'd5: seg = 7'b1101101;
        4'd6: seg = 7'b1111101;
        4'd7: seg = 7'b0000111;
        4'd8: seg = 7'b1111111;
        4'd9: seg = 7'b1101111;
        default: seg = 7'b0000000;
      endcase
    end
🃏 casex and casez
Two variants of case allow wildcard matching — essential for priority encoders and instruction decoders where some bits are “don’t care”.
casez: a z or ? in either operand matches any value at that bit position. Recommended for priority encoders and instruction decode with don’t-cares.
casex: x is treated as a wildcard as well as z and ?. Avoid it where possible and use casez with ? instead.
    // 8-to-3 Priority Encoder using casez
    // Outputs the index of the highest-priority (leftmost) set bit
    module priority_enc_8to3 (
      input      [7:0] req,
      output reg [2:0] grant,
      output reg       valid
    );
      always @(*) begin
        valid = 1'b1;
        casez (req)
          8'b1???????: grant = 3'd7;   // bit 7 has priority
          8'b01??????: grant = 3'd6;
          8'b001?????: grant = 3'd5;
          8'b0001????: grant = 3'd4;
          8'b00001???: grant = 3'd3;
          8'b000001??: grant = 3'd2;
          8'b0000001?: grant = 3'd1;
          8'b00000001: grant = 3'd0;
          default: begin
            grant = 3'd0;
            valid = 1'b0;              // no request active
          end
        endcase
      end
    endmodule
casex treats real x values in simulation as wildcards — masking genuine unknowns that should be caught as bugs. casez with ? gives you don’t-care matching while keeping x values visible during simulation.
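The difference can be observed with a small simulation-only sketch that feeds an unknown select value into two otherwise identical decoders. The testbench name and the signals sel, y_x and y_z are hypothetical.

```verilog
// Hypothetical testbench: same case items, casex versus casez.
module casex_vs_casez_tb;
  reg [1:0] sel;
  reg [3:0] y_x, y_z;

  // casex: an x in sel acts as a wildcard, so the first item matches
  always @(*) begin
    casex (sel)
      2'b1?:   y_x = 4'd2;
      2'b01:   y_x = 4'd1;
      default: y_x = 4'bxxxx;
    endcase
  end

  // casez: an x in sel matches neither 0 nor 1, so default is taken
  always @(*) begin
    casez (sel)
      2'b1?:   y_z = 4'd2;
      2'b01:   y_z = 4'd1;
      default: y_z = 4'bxxxx;
    endcase
  end

  initial begin
    sel = 2'b00;
    #1 sel = 2'bxx;   // model an uninitialised or corrupted select
    #1 $display("casex: y_x=%b  casez: y_z=%b", y_x, y_z);
    // Expected: casex: y_x=0010  casez: y_z=xxxx
    $finish;
  end
endmodule
```

With casex the unknown select silently produces a clean-looking value, while casez lets the unknown propagate to the output where it can be noticed in the waveform.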
🔄 Simulation Flow
Understanding how a Verilog simulator executes a design is essential for writing correct behavioral code — especially when mixing blocking and non-blocking assignments across multiple concurrent blocks.
🗂 Simulation Regions — Within a Time Step
Within a single simulation time step, Verilog processes events in a defined set of scheduling regions. This ordering is what makes non-blocking assignments work correctly and what separates correct behavioral code from code with race conditions.
Active region: blocking assignments, continuous assignments, evaluation of non-blocking right-hand sides, and $display execute here.
Inactive region: zero-delay statements (#0) are deferred until the active region is empty. Rarely used.
NBA region: the left-hand sides of non-blocking assignments are updated here.
Postponed (monitor) region: $monitor and $strobe print here, after all updates, so they see guaranteed stable values.

    // At posedge clk, what happens inside one time step:

    // --- Active Region ---
    // Non-blocking RHS evaluated using CURRENT values:
    //   q1_new = d        (capture d now)
    //   q2_new = q1_old   (capture old q1, not new!)
    //   q3_new = q2_old   (capture old q2)
    always @(posedge clk) begin
      q1 <= d;    // schedule q1 ← d
      q2 <= q1;   // schedule q2 ← q1_old (q1 not yet updated!)
      q3 <= q2;   // schedule q3 ← q2_old
    end

    // --- NBA Region (later in same time step) ---
    // All three assignments execute simultaneously:
    //   q1 = d        ← d moves one stage
    //   q2 = q1_old   ← old q1 moves one stage
    //   q3 = q2_old   ← old q2 moves one stage
    // ✅ Perfect shift register: all captured simultaneously
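The region ordering also explains the classic swap race. In the sketch below, two pairs of registers try to exchange values on the same clock edge, one pair with blocking and one with non-blocking assignments. The testbench and signal names are hypothetical.

```verilog
// Hypothetical testbench: swapping two registers across separate always blocks.
module swap_race_tb;
  reg clk = 0;
  reg a_blk, b_blk;   // swapped with blocking assignments (racy)
  reg a_nba, b_nba;   // swapped with non-blocking assignments (safe)

  always #5 clk = ~clk;   // first rising edge at t=5

  // Racy: both blocks run in the Active region, in a simulator-dependent order.
  // Whichever runs second reads the value the first one just wrote.
  always @(posedge clk) a_blk = b_blk;
  always @(posedge clk) b_blk = a_blk;

  // Safe: both RHS values are read in the Active region,
  // both updates happen later in the NBA region.
  always @(posedge clk) a_nba <= b_nba;
  always @(posedge clk) b_nba <= a_nba;

  initial begin
    a_blk = 1'b0; b_blk = 1'b1;
    a_nba = 1'b0; b_nba = 1'b1;
    #6;   // just after the first rising edge
    $display("blocking:     a=%b b=%b (equal; which value survives is not defined)",
             a_blk, b_blk);
    $display("non-blocking: a=%b b=%b", a_nba, b_nba);   // a=1 b=0, a true swap
    $finish;
  end
endmodule
```

The blocking pair ends with both registers holding the same value, and which one survives may differ between simulators; the non-blocking pair swaps reliably.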
$display vs $strobe vs $monitor
| Task | When it prints | Use for |
|---|---|---|
| $display | Immediately in the Active region when executed | Quick debug — may show intermediate values before NBA updates |
| $strobe | End of current time step — after all NBA updates | See final stable values at each time step |
| $monitor | End of time step, whenever listed signals change | Continuous automatic logging of signal changes |
    always @(posedge clk) begin
      q <= d;
      // $display executes NOW in the Active region: q not yet updated
      $display("$display: q=%b (may show OLD value)", q);
    end

    // $strobe fires after the NBA region: q is updated
    always @(posedge clk)
      $strobe("$strobe: q=%b (shows NEW value)", q);

    // $monitor fires whenever q changes (end of time step)
    initial
      $monitor("$monitor: t=%0t q=%b", $time, q);
Fig 19 — Complete Simulation Flow Example
    // DUT
    module dff(input clk, d, output reg q);
      always @(posedge clk)
        q <= d;   // NBA: q updated after active region
    endmodule

    // Testbench
    module dff_tb;
      reg  clk = 0, d = 0;
      wire q;

      dff dut(.clk(clk), .d(d), .q(q));

      always #5 clk = ~clk;   // rising edges at t=5, 15, 25

      initial begin
        $monitor("t=%0t clk=%b d=%b q=%b", $time, clk, d, q);
        #3  d = 1'b1;   // t=3:  d goes high
        #10 d = 1'b0;   // t=13: d goes low
        #20 $finish;    // t=33: end simulation
      end

      // Simulation order at t=5 (first posedge clk):
      //   Active:  DUT always @posedge fires, schedules q←d(=1)
      //   NBA:     q = 1 (update happens here)
      //   Monitor: $monitor prints "t=5 clk=1 d=1 q=1"
    endmodule
