Behavioral Modelling Part 2 — VLSI Trainers

Verilog Series · Module 11

Behavioral Modelling — Part 2

Assignments with delays, the wait construct, multiple always blocks, complete RTL designs, blocking vs non-blocking deep-dive, the case statement family, and the Verilog simulation flow.

⏱ Assignments with Delays

Inside procedural blocks, delays can be added to assignments to control when signals change during simulation. Verilog supports three distinct forms of procedural delay, each with different behaviour.

Simulation only. All procedural delays (#) are ignored by synthesis tools. They exist purely for simulation timing control — testbench stimulus generation, modelling propagation delays in behavioural models, and waveform generation.

Regular Delay

Delays execution of the entire statement. The simulator waits the specified time, then evaluates the RHS and updates the LHS.

// Wait 10 units, then assign
#10 a = 1'b1;

// Wait 5 units, then compute and assign
#5  b = x & y;

Intra-Assignment Delay

RHS is evaluated immediately, but the assignment to LHS is delayed. The captured value is held and applied after the delay.

// Evaluate RHS now, assign after 10
a = #10 b & c;

// b and c read NOW, a updated at t+10

Event Control

Wait for a specific event before proceeding: a signal edge, any change in value, or a named event.

// Wait for rising edge of clk
@(posedge clk) a = b;

// Wait for any change on a or b
@(a or b) c = a ^ b;

📐 Three Delay Forms — Side by Side

Understanding the difference between these three forms is critical for writing accurate simulation models and testbenches.

Fig 1 — Regular delay vs intra-assignment delay vs event control

// ── Regular Delay — waits BEFORE evaluating RHS ───────────────
initial begin
  a = 1'b0;
  #10 a = 1'b1;   // at t=10: read RHS (1'b1), assign to a
  // The RHS is a constant here, so reading it after the 10-unit
  // wait gives the same result as reading it at t=0
end

// ── Intra-Assignment Delay — evaluates RHS NOW, assigns LATER ──
initial begin
  b = 4'hA;
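  a <= #10 b;     // at t=0: capture b=4'hA, schedule the update, do not block
  b = 4'hF;       // at t=0: b changes to 4'hF immediately
                   // at t=10: a is assigned 4'hA (old value of b!)
                   // (a blocking "a = #10 b;" would suspend this block, so
                   //  the b = 4'hF line would not run until t=10)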
end
// Key: intra-assignment captures the RHS snapshot at trigger time

// ── Event Control — waits for a specific event ─────────────────
always begin
  @(posedge clk);  // suspend until next rising edge
  q = d;            // execute immediately after edge
end

// ── Combined: wait for edge, then delay ───────────────────────
always begin
  @(posedge clk);  // wait for clock
  #2 q = d;          // then wait 2 more time units; d is sampled at edge+2
end

Fig 2 — Intra-Assignment Delay: Waveform

a = #10 b — RHS (b) captured at t=0, a updates at t=10
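The waveform described above can be reproduced with a small simulation-only module. This is a minimal sketch: the module name and the signals a_regular and a_intra are hypothetical, chosen only to contrast the two delay forms when b changes halfway through the delay.

```verilog
// Hypothetical demo module; all names are illustrative only.
module intra_delay_demo;
  reg [3:0] b, a_regular, a_intra;

  initial begin
    b = 4'hA;
    fork
      #5  b = 4'hF;          // b changes halfway through both delays
      #10 a_regular = b;     // regular delay: b is read at t=10
      a_intra = #10 b;       // intra-assignment delay: b is read at t=0
    join
    // Both delayed assignments complete at t=10, so the join is reached then.
    // a_regular holds the new value of b (4'hF), a_intra the old one (4'hA).
    $display("t=%0t a_regular=%h a_intra=%h", $time, a_regular, a_intra);
  end
endmodule
```

All three branches start only after b has been initialised inside the same initial block, which avoids a time-zero race between separate initial blocks.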

Fig 3 — Testbench Stimulus Using Delays

Building a stimulus waveform with regular delays

initial begin
  // Apply stimulus — each # delay is relative to previous line
  rst_n = 0; a = 8'h00; b = 8'h00;   // t=0:  assert reset
  #20  rst_n = 1;                        // t=20: release reset
  #10  a     = 8'hAB;                    // t=30: apply data A
  #5   b     = 8'hCD;                    // t=35: apply data B
  #10  a     = 8'hFF; b = 8'h00;        // t=45: change both
  #20  a     = 8'h00;                    // t=65: final value
  #50  $finish;                          // t=115: end simulation
end

⏳ The wait Construct

The wait statement suspends execution of a procedural block until a specified level-sensitive condition becomes true. Unlike @(posedge clk) which triggers on a signal edge, wait checks a level — if the condition is already true when the simulator reaches the statement, execution continues immediately without any delay.

🔵 wait — Level Sensitive

// Waits until 'done' is HIGH (level)
wait(done);
// If done is already 1 → no wait
// If done is 0 → suspends until done=1

🟣 @ — Edge Sensitive

// Waits for rising edge of 'done'
@(posedge done);
// Always waits for next 0→1 transition
// Even if done is already 1

Fig 4 — wait construct: syntax and practical usage patterns

// ── Basic wait ────────────────────────────────────────────────
wait(ready);              // pause until ready = 1
data = bus;               // capture data once ready

// ── Wait with expression ──────────────────────────────────────
wait(count == 8'hFF);     // wait until counter reaches max
wait(!busy && enable);    // wait for compound condition

// ── Wait in a testbench handshake ─────────────────────────────
initial begin
  start = 1'b1;
  wait(ack);               // wait for DUT to acknowledge
  start = 1'b0;
  wait(!ack);              // wait for ACK to deassert
  $display("Handshake complete at t=%0t", $time);
end

// ── Wait with timeout (safety measure) ───────────────────────
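initial begin
  fork : done_or_timeout
    begin wait(done); $display("Done!"); disable done_or_timeout; end
    begin #1000; $display("TIMEOUT"); $finish; end
  join       // a plain join waits for both branches, so the first branch
             // disables the named fork to cancel the pending timeout
             // (SystemVerilog alternative: join_any followed by disable fork)
end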

Fig 5 — wait behaviour: if condition already true, no pause

wait vs @: Use wait when you want to synchronise to a signal level — e.g., waiting for a handshake flag, a FIFO non-empty signal, or a bus-ready signal. Use @(posedge clk) when you want to synchronise to a specific transition — e.g., capturing data at a clock edge.
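The difference between the two can be observed directly in simulation. The sketch below is a hypothetical demo (module name and timings are assumptions): done is already high when both processes start looking at it at t=1, and its only rising edge after that occurs at t=20.

```verilog
// Hypothetical demo module; names and timings are illustrative only.
module wait_vs_edge_demo;
  reg done;

  initial begin
    done = 1'b1;        // already high before either process checks it
    #10 done = 1'b0;
    #10 done = 1'b1;    // the only rising edge after t=1, at t=20
  end

  // Level-sensitive: done is already 1 at t=1, so this resumes at t=1.
  initial begin
    #1 wait (done);
    $display("wait(done) resumed at t=%0t", $time);
  end

  // Edge-sensitive: must see a 0->1 transition, so this resumes at t=20.
  initial begin
    #1 @(posedge done);
    $display("@(posedge done) resumed at t=%0t", $time);
  end

  initial #30 $finish;
endmodule
```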

🔀 Multiple Always Blocks

A single Verilog module can contain any number of always blocks. They all start at simulation time zero and run concurrently and independently — each block responds to its own sensitivity list, modeling different hardware elements in the same module.

⚡ Fully Concurrent

All always blocks in a module are active simultaneously. They react to their own events independently.

🔒 Own Variables

Each always block should drive its own set of reg variables. Multiple blocks driving the same variable cause race conditions.

📐 Separation of Concerns

Best practice: one block for the state register, one for next-state logic, one for output logic. This keeps the structure clean and modular.

🔁 Each Loops Forever

Every always block loops continuously. They are not functions that return; they model hardware elements that always exist.
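The three-block split mentioned under Separation of Concerns can be sketched as a small state machine. This is only an illustration: the module name, ports, and states below are hypothetical and do not come from a specific design.

```verilog
// Hypothetical three-block FSM; names and states are illustrative only.
module handshake_fsm (
  input      clk, rst_n,
  input      req, done,
  output reg busy, ack
);
  localparam S_IDLE = 2'd0, S_WORK = 2'd1, S_ACK = 2'd2;
  reg [1:0] state, next_state;

  // Block 1: state register (sequential, non-blocking)
  always @(posedge clk or negedge rst_n) begin
    if (!rst_n) state <= S_IDLE;
    else        state <= next_state;
  end

  // Block 2: next-state logic (combinational, blocking)
  always @(*) begin
    next_state = state;                      // default: hold state
    case (state)
      S_IDLE:  if (req)  next_state = S_WORK;
      S_WORK:  if (done) next_state = S_ACK;
      S_ACK:   next_state = S_IDLE;
      default: next_state = S_IDLE;
    endcase
  end

  // Block 3: output logic (combinational, blocking)
  always @(*) begin
    busy = (state == S_WORK);
    ack  = (state == S_ACK);
  end
endmodule
```

Each reg (state, next_state, busy, ack) is assigned in exactly one block, which keeps the blocks independent of each other's scheduling order.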

Fig 6 — Multiple always blocks modelling different hardware units

module cpu_datapath (
  input        clk, rst_n,
  input  [7:0] instr,
  output reg [7:0] acc, pc, flags
);

  wire [7:0] alu_result;
  wire [7:0] operand = instr;   // assumption: operand taken directly from instr

  // ── Always block 1: Accumulator register (sequential) ──────────
  always @(posedge clk or negedge rst_n) begin
    if (!rst_n) acc <= 8'h00;
    else        acc <= alu_result;
  end

  // ── Always block 2: Program counter (sequential) ───────────────
  always @(posedge clk or negedge rst_n) begin
    if (!rst_n) pc <= 8'h00;
    else        pc <= pc + 8'h01;
  end

  // ── Always block 3: Flag computation (combinational) ───────────
  always @(*) begin
    flags[0] = ~|acc;      // zero flag
    flags[1] = acc[7];      // negative flag
    flags[7:2] = 6'b0;
  end

  // ── Continuous assignment: ALU ─────────────────────────────────
  assign alu_result = acc + operand;

endmodule

Race condition warning: If two always blocks both drive the same reg variable, the simulated result depends on the order in which the simulator happens to schedule them (a race condition), and synthesis tools commonly report the variable as having multiple drivers. Always ensure each reg is driven by exactly one always block.
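One common way to follow this rule is to merge competing updates into a single block with an explicit priority. The counter below is a hypothetical sketch (module and port names are assumptions) in which clear takes priority over incr.

```verilog
// Hypothetical example; names are illustrative only.
// Avoid splitting this into one always block that does
//   if (clear) count <= 0;
// and a second that does
//   if (incr) count <= count + 1;
// because count would then have two drivers.
module single_driver_counter (
  input            clk, rst_n,
  input            clear, incr,
  output reg [3:0] count
);
  always @(posedge clk or negedge rst_n) begin
    if (!rst_n)     count <= 4'd0;          // asynchronous reset
    else if (clear) count <= 4'd0;          // clear wins over incr
    else if (incr)  count <= count + 4'd1;
  end
endmodule
```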

Fig 7 — Multiple always blocks timeline

All always blocks start at t=0 and run independently

🏗 Designs at Behavioral Level

Behavioral modelling is the primary method for writing synthesizable RTL. Here are two complete designs, a synchronous FIFO and a PWM generator, that demonstrate common practices.

Fig 8 — Synchronous FIFO (First-In First-Out Buffer)

8-deep, 8-bit wide FIFO with full/empty flags

module fifo_sync #(
  parameter DEPTH = 8,
  parameter WIDTH = 8,
  parameter PTR_W = 3   // log2(DEPTH)
) (
  input              clk, rst_n,
  input              wr_en, rd_en,
  input  [WIDTH-1:0] wr_data,
  output reg [WIDTH-1:0] rd_data,
  output             full, empty
);
  reg [WIDTH-1:0] mem  [0:DEPTH-1];
  reg [PTR_W:0]   wr_ptr, rd_ptr;   // extra bit for full/empty detect

  // Write port
  always @(posedge clk) begin
    if (!rst_n)              wr_ptr <= 0;
    else if (wr_en && !full) begin
      mem[wr_ptr[PTR_W-1:0]] <= wr_data;
      wr_ptr <= wr_ptr + 1;
    end
  end

  // Read port
  always @(posedge clk) begin
    if (!rst_n)               rd_ptr <= 0;
    else if (rd_en && !empty) begin
      rd_data <= mem[rd_ptr[PTR_W-1:0]];
      rd_ptr  <= rd_ptr + 1;
    end
  end

  // Status flags — combinational
  assign full  = (wr_ptr[PTR_W] != rd_ptr[PTR_W]) &&
                 (wr_ptr[PTR_W-1:0] == rd_ptr[PTR_W-1:0]);
  assign empty = (wr_ptr == rd_ptr);
endmodule
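A short testbench can exercise the FIFO above. This is a minimal sketch, not a full verification environment: the testbench name, the data values, and the choice to change inputs on the falling clock edge are all assumptions made for illustration. It writes three bytes and reads them back, which should return them in the order written.

```verilog
// Hypothetical testbench for fifo_sync; stimulus values are illustrative.
module fifo_sync_tb;
  reg        clk = 0, rst_n = 0, wr_en = 0, rd_en = 0;
  reg  [7:0] wr_data = 8'h00;
  wire [7:0] rd_data;
  wire       full, empty;
  integer    i;

  fifo_sync dut (
    .clk(clk), .rst_n(rst_n), .wr_en(wr_en), .rd_en(rd_en),
    .wr_data(wr_data), .rd_data(rd_data), .full(full), .empty(empty)
  );

  always #5 clk = ~clk;

  initial begin
    // The reset is synchronous, so hold it across two rising edges.
    repeat (2) @(negedge clk);
    rst_n = 1;

    // Write three bytes; inputs change on the falling edge.
    for (i = 0; i < 3; i = i + 1) begin
      wr_en   = 1;
      wr_data = 8'h10 + i;
      @(negedge clk);
    end
    wr_en = 0;

    // Read them back; rd_data is registered, so sample after each rising edge.
    for (i = 0; i < 3; i = i + 1) begin
      rd_en = 1;
      @(negedge clk);
      $display("t=%0t rd_data=%h empty=%b", $time, rd_data, empty);
    end
    rd_en = 0;
    $finish;
  end
endmodule
```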

Fig 9 — PWM (Pulse Width Modulation) Generator

8-bit PWM: duty cycle set by ‘duty’ input (0 = 0%, 255 = 255/256 ≈ 99.6%)

module pwm_gen #(parameter N = 8) (
  input          clk, rst_n,
  input  [N-1:0] duty,      // 0 = 0%, 255 = ~100%
  output reg     pwm_out
);
  reg [N-1:0] counter;

  always @(posedge clk or negedge rst_n) begin
    if (!rst_n) begin
      counter <= 0;
      pwm_out <= 0;
    end else begin
      counter <= counter + 1;            // free-running counter
      pwm_out <= (counter < duty);       // high when counter < duty
    end
  end
endmodule
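Because the counter visits each of its 2^N values once per period and the output is high whenever counter < duty, the output is high for duty cycles out of every 2^N. The testbench below is a hypothetical sketch (its name, the duty value of 64, and the sampling scheme are assumptions) that counts high cycles over one full 256-cycle period, where 64 high cycles (25%) would be expected.

```verilog
// Hypothetical testbench for pwm_gen; the duty value is illustrative.
module pwm_gen_tb;
  reg        clk = 0, rst_n = 0;
  reg  [7:0] duty = 8'd64;
  wire       pwm_out;
  integer    high_cycles, n;

  pwm_gen #(.N(8)) dut (
    .clk(clk), .rst_n(rst_n), .duty(duty), .pwm_out(pwm_out)
  );

  always #5 clk = ~clk;

  initial begin
    high_cycles = 0;
    #12 rst_n = 1;                      // release the asynchronous reset
    repeat (4) @(posedge clk);          // let a few cycles pass first

    // Sample once per clock, on the falling edge, for one full period.
    for (n = 0; n < 256; n = n + 1) begin
      @(negedge clk);
      if (pwm_out) high_cycles = high_cycles + 1;
    end

    $display("pwm_out high for %0d of 256 cycles", high_cycles);
    $finish;
  end
endmodule
```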

⚖️ Blocking and Non-Blocking Assignments — Deep Dive

This is the single most important distinction in behavioral Verilog. Getting it wrong produces code that simulates correctly but synthesizes to wrong hardware — or simulates wrong but synthesizes correctly. Both are dangerous.

Fig 10 — Execution model comparison: time step view

Fig 11 — Shift Register: the most important non-blocking example

3-stage shift register — why non-blocking is mandatory here

❌ With blocking (=) — BROKEN

always @(posedge clk) begin
  q1 = d;   // q1 ← d (new)
  q2 = q1;  // q2 ← NEW q1 = d
  q3 = q2;  // q3 ← NEW q2 = d
end
// All three become d immediately
// No shifting — broken pipeline!

✅ With non-blocking (<=) — CORRECT

always @(posedge clk) begin
  q1 <= d;   // q1 ← d (sampled at the edge)
  q2 <= q1;  // q2 ← OLD q1
  q3 <= q2;  // q3 ← OLD q2
end
// All use old values — correct
// shift: d→q1→q2→q3 over 3 cycles

Summary Reference

Aspect	Blocking ( = )	Non-Blocking ( <= )
Execution	Sequential — one at a time	Parallel — all evaluate then update
RHS evaluated	Immediately, one statement at a time	All simultaneously before any update
LHS updated	Immediately after RHS	All together at end of time step
Models	Combinational logic, C-like flow	Flip-flops, registers, pipelines
Use in	always @(*) — combinational blocks	always @(posedge clk) — sequential
Race conditions	Order-dependent — careful ordering needed	Order-independent — safe
Synthesis	Combinational gates (in always @(*))	Flip-flops (in an edge-triggered always block)
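The order dependence summarised in the table shows up clearly when two registers exchange values. The module below is a hypothetical demo (all names are assumptions): the non-blocking pair swaps, while the blocking pair both end up holding the old value of b.

```verilog
// Hypothetical demo module; names are illustrative only.
module swap_demo;
  reg       clk = 0;
  reg [3:0] a_nb, b_nb;   // updated with non-blocking assignments
  reg [3:0] a_bl, b_bl;   // updated with blocking assignments

  // Non-blocking: both RHS values are read before either LHS changes.
  always @(posedge clk) begin
    a_nb <= b_nb;
    b_nb <= a_nb;
  end

  // Blocking: a_bl is overwritten before the second statement reads it.
  always @(posedge clk) begin
    a_bl = b_bl;
    b_bl = a_bl;
  end

  initial begin
    a_nb = 4'h1; b_nb = 4'h2;
    a_bl = 4'h1; b_bl = 4'h2;
    #5 clk = 1;            // single rising edge
    // After the edge: a_nb=2, b_nb=1 (swapped); a_bl=2, b_bl=2 (not swapped).
    #1 $display("non-blocking: a=%h b=%h   blocking: a=%h b=%h",
                a_nb, b_nb, a_bl, b_bl);
  end
endmodule
```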

🔀 The case Statement

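The case statement provides a clean, readable alternative to long if-else if chains. It compares an expression against a set of values and executes the first matching branch. When the case items are mutually exclusive, as in a fully decoded select, it typically synthesizes to a parallel multiplexer with no priority among branches, whereas an if-else chain describes priority logic in which earlier conditions take precedence.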

Fig 12 — case statement: full syntax

always @(*) begin
  case (expression)            // expression to match

    value1:                     // single value
      statement1;

    value2, value3:             // multiple values share one branch
      statement2;

    value4: begin               // multi-statement branch needs begin/end
      a = x;
      b = y;
    end

    default:                    // ← always include! prevents latches
      statement_default;

  endcase
end

Fig 13 — case vs if-else: hardware implications

🔵 case — Equal priority MUX

case (sel)
  2'b00: y = in0;
  2'b01: y = in1;
  2'b10: y = in2;
  2'b11: y = in3;
endcase
// Synthesizes: balanced 4-to-1 MUX
// All branches equal cost

🟣 if-else — Priority chain

if (sel==2'b00)      y=in0;
else if (sel==2'b01) y=in1;
else if (sel==2'b10) y=in2;
else                   y=in3;
// Synthesizes: priority tree
// First condition checked first

Fig 14 — Practical case examples

ALU operation select and 7-segment display decoder

// ── ALU operation selector ────────────────────────────────────
always @(*) begin
  case (alu_op)
    4'b0000: result = a + b;
    4'b0001: result = a - b;
    4'b0010: result = a & b;
    4'b0011: result = a | b;
    4'b0100: result = a ^ b;
    4'b0101: result = ~a;
    4'b0110: result = a << 1;
    4'b0111: result = a >> 1;
    default: result = 8'bx;
  endcase
end

// ── 7-segment display decoder (0–9) ───────────────────────────
always @(*) begin
  case (digit)         // segments: gfedcba
    4'd0: seg = 7'b0111111;
    4'd1: seg = 7'b0000110;
    4'd2: seg = 7'b1011011;
    4'd3: seg = 7'b1001111;
    4'd4: seg = 7'b1100110;
    4'd5: seg = 7'b1101101;
    4'd6: seg = 7'b1111101;
    4'd7: seg = 7'b0000111;
    4'd8: seg = 7'b1111111;
    4'd9: seg = 7'b1101111;
    default: seg = 7'b0000000;
  endcase
end

🃏 casex and casez

Two variants of case allow wildcard matching — essential for priority encoders and instruction decoders where some bits are “don’t care”.

case Exact match — no wildcards

Compares every bit exactly, including x and z, using 4-state identity (as with ===): an x in the selector matches only an x in the case item, and a z matches only a z. Use for normal decoders where every bit matters.

casez z and ? are wildcards

z or ? in either operand matches any value at that bit position. Recommended for priority encoders and instruction decode with don’t-cares.

casex x, z and ? are wildcards

Both x and z act as wildcards. Avoid in RTL — x wildcards mask real unknown values in simulation, hiding bugs. Use casez with ? instead.
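How the three forms treat an unknown bit can be checked with a selector that contains an x. The module below is a hypothetical demo (names are assumptions); the comments give the branch each statement is expected to take under the standard matching rules.

```verilog
// Hypothetical demo module; names are illustrative only.
module case_match_demo;
  reg [1:0] sel;

  initial begin
    sel = 2'b1x;   // low bit is unknown

    case (sel)     // exact 4-state comparison
      2'b10:   $display("case : matched 2'b10");
      2'b1x:   $display("case : matched 2'b1x");   // taken: x matches x literally
      default: $display("case : default");
    endcase

    casez (sel)    // only z and ? are wildcards, so the x must still match
      2'b10:   $display("casez: matched 2'b10");
      default: $display("casez: default");          // taken: x does not match 0
    endcase

    casex (sel)    // x in the selector is a wildcard as well
      2'b10:   $display("casex: matched 2'b10");    // taken: unknown bit ignored
      default: $display("casex: default");
    endcase
  end
endmodule
```

The casex result is the hazard to watch for: an unknown selector bit silently selects a real branch instead of falling through to default.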

Fig 15 — casez priority encoder with don’t-care bits

// 8-to-3 Priority Encoder using casez
// Outputs the index of the highest-priority (leftmost) set bit
module priority_enc_8to3 (
  input  [7:0] req,
  output reg [2:0] grant,
  output reg       valid
);
  always @(*) begin
    valid = 1'b1;
    casez (req)
      8'b1???????: grant = 3'd7;  // bit 7 has priority
      8'b01??????: grant = 3'd6;
      8'b001?????: grant = 3'd5;
      8'b0001????: grant = 3'd4;
      8'b00001???: grant = 3'd3;
      8'b000001??: grant = 3'd2;
      8'b0000001?: grant = 3'd1;
      8'b00000001: grant = 3'd0;
      default:     begin
        grant = 3'd0;
        valid = 1'b0;            // no request active
      end
    endcase
  end
endmodule

Prefer casez with ? over casex. Using casex treats real x values in simulation as wildcards — masking genuine unknowns that should be caught as bugs. casez with ? gives you don’t-care matching while keeping x values visible during simulation.

🔄 Simulation Flow

Understanding how a Verilog simulator executes a design is essential for writing correct behavioral code — especially when mixing blocking and non-blocking assignments across multiple concurrent blocks.

Fig 16 — Verilog simulation lifecycle: from start to finish

Elaboration (pre-simulation)

Build module hierarchy, resolve parameters, create all net/reg instances, connect ports

↓

Initialisation (t=0)

All regs → x, nets → z (except supply0/1). All initial and always blocks become active.

↓

Simulate One Time Step

Process all events scheduled for the current time — evaluate active blocks, propagate continuous assignments

↓

Advance Time

Move to the next time step that has pending events. If none, simulation ends.

↓

↺

Repeat until $finish or no more events

Go back to step 3 for the next time step

🗂 Simulation Regions — Within a Time Step

Within a single simulation time step, Verilog processes events in a defined set of scheduling regions. This ordering is what makes non-blocking assignments work correctly and what separates correct behavioral code from code with race conditions.

Active Region

Blocking assignments (=), continuous assigns, evaluate non-blocking RHS, gate outputs, $display

↓

Inactive Region

Zero-delay assignments (#0) — deferred to end of active region. Rarely used.

↓

NBA Region (Non-Blocking Assignment Update)

All scheduled non-blocking LHS updates happen here simultaneously — this is why <= sees old values

↓

Monitor (Postponed) Region

$monitor and $strobe print here — after all updates, guaranteed stable values

↺ Repeat until no more events in this time step, then advance time
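The region ordering can be observed from a single initial block. This is a hypothetical demo (module and signal names are assumptions): the $display calls run before the NBA region, so they are expected to print 0, while $strobe prints last and should show 1.

```verilog
// Hypothetical demo module; names are illustrative only.
module region_order_demo;
  reg a;

  initial begin
    a = 1'b0;                           // blocking: takes effect at once
    a <= 1'b1;                          // RHS read now, update queued for NBA region
    $strobe ("strobe  : a=%b", a);      // queued for the end of the time step
    $display("display : a=%b", a);      // Active region: a is still 0
    #0 $display("after #0: a=%b", a);   // Inactive region, resumed before NBA: still 0
  end
endmodule
```

The expected print order is display, after #0, then strobe, even though $strobe appears first in the source.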

Fig 17 — Simulation regions: why non-blocking sees old values

// At posedge clk — what happens inside one time step:

// ── Active Region ─────────────────────────────────────────────
// Non-blocking RHS evaluated using CURRENT values:
//   q1_new = d       (capture d now)
//   q2_new = q1_old  (capture old q1, not new!)
//   q3_new = q2_old  (capture old q2)

always @(posedge clk) begin
  q1 <= d;   // schedule q1 ← d
  q2 <= q1;  // schedule q2 ← q1_old (q1 not yet updated!)
  q3 <= q2;  // schedule q3 ← q2_old
end

// ── NBA Region (later in same time step) ──────────────────────
// All three assignments execute simultaneously:
//   q1 = d       ← d moves one stage
//   q2 = q1_old  ← old q1 moves one stage
//   q3 = q2_old  ← old q2 moves one stage
// ✅ Perfect shift register — all captured simultaneously

$display vs $strobe vs $monitor

Task	When it prints	Use for
$display	Immediately in the Active region when executed	Quick debug — may show intermediate values before NBA updates
$strobe	End of current time step — after all NBA updates	See final stable values at each time step
$monitor	End of time step, whenever listed signals change	Continuous automatic logging of signal changes

Fig 18 — $display vs $strobe: showing the NBA region difference

always @(posedge clk) begin
  q <= d;

  // $display executes NOW in Active region — q not yet updated
  $display("$display: q=%b (may show OLD value)", q);
end

// $strobe fires after NBA region — q is updated
always @(posedge clk)
  $strobe("$strobe:  q=%b (shows NEW value)", q);

// $monitor fires whenever q changes (end of time step)
initial
  $monitor("$monitor: t=%0t q=%b", $time, q);

Use $strobe in testbenches instead of $display when monitoring non-blocking assignments. $strobe waits until the NBA region completes, so you always see the final updated value — never an intermediate state that would confuse debugging.

Fig 19 — Complete Simulation Flow Example

D flip-flop testbench showing simulation region order at one clock edge

// DUT
module dff(input clk,d, output reg q);
  always @(posedge clk) q <= d;      // NBA: q updated after active region
endmodule

// Testbench
module dff_tb;
  reg clk=0, d=0;
  wire q;

  dff dut(.clk(clk), .d(d), .q(q));
  always #5 clk = ~clk;

  initial begin
    $monitor("t=%0t clk=%b d=%b q=%b", $time, clk, d, q);
    #3  d = 1'b1;  // t=3: d goes high
    #10 d = 1'b0;  // t=13: d goes low
    #20 $finish;   // t=33: end simulation
  end

  // Simulation order at t=5 (first posedge clk; d has been 1 since t=3):
  //   Active:   DUT always @posedge fires, schedules q←d(=1)
  //   NBA:      q = 1 (update happens here)
  //   Monitor:  $monitor prints "t=5 clk=1 d=1 q=1"
endmodule
