Let’s try to simulate a non-trivial circuit in Racket using the techniques
proposed in the previous post.
Virgule is a 32-bit RISC processor core that supports most of the
base instruction set of the RISC-V specification (RV32I).
It was initially designed and implemented in VHDL to serve as an illustration
for the digital electronics course that I teach.
The VHDL source code is not publicly available yet but you can get an overview
of what Virgule is by using its web-based simulator emulsiV
and reading its documentation.
I have written two implementations of Virgule: one uses a
finite state machine as a sequencer; the other is organized as a five-stage
Both implementations use the same set of components in their datapaths.
The architecture of Virgule favours simplicity over completeness, speed or size.
If you are looking for an optimized, production-ready RISC-V core, there are
plenty of other implementations to choose from.
The pipelined version could be the topic of another post.
In this post, I will focus on the state-based implementation because it is
easier to understand.
Here is an overview of its internal architecture with a state diagram of its
Since the architecture is fully synchronous, the clock signal is not represented
to avoid cluttering the block diagram.
Virgule uses a similar bus interface as the PicoRV32.
The same interface is used for fetching instructions and for load/store
Here is a description of Virgule’s I/O ports:
| ||Input||1||Forces the processor to restart in its initial state.|
| ||Output||1||When asserted, the processor initiates a data transfer.|
| ||Input||1||Indicates that the current responder is ready for a new data transfer.|
| ||Output||32||The address where to read or write.|
| ||Output||4||Write enables for each byte of |
| ||Output||32||The data to write.|
| ||Input||32||The data read from the current responder.|
| ||Input||1||Interrupt request.|
The processor asserts its output
valid to indicate that
wdata are all valid and stable.
If a memory or peripheral device is mapped to the given address, it responds by
rdata and asserting
A data transfer is considered complete when
ready are both high
on the same clock edge.
In the following sections, I will detail the internal operation of the processor
and show the Racket code that implements it.
But before that, let’s have a look at the Racket forms that we will need.
My previous blog post, Simulating digital circuits in Racket,
introduced several constructs to represent and manipulate hardware signals in Racket.
This section explains the choices that I have made, and the syntactic sugar
that I have added since that post was written.
A combinational component can be written as a function.
define-signal automatically lifts a Racket function that operates on
values into a function that operates on signals.
The instruction decoder and the arithmetic and logic unit are defined like this:
(define-signal (decoder data) ... instr) (define-signal (arith-logic-unit instr a b) ... r)
In the previous blog post, my implementation of
define-signal could only define
functions with a single output value.
It is an error to return
(values ...) as the result of such a function.
decoder, I work around this limitation by returning a struct value of type:
(struct instruction (...))
To address other situations, when creating a struct type is not relevant, I have
#:returns clause to
It is used in
load-store-unit like this:
(define-signal (load-store-unit #:instr instr #:address address #:store-enable store-enable #:store-data store-data) #:rdata rdata #:returns (wstrobe wdata load-data) ... (define wstrobe ...) (define wdata ...) (define load-data ...))
load-store-unit will bundle
into a signal of lists. When calling
load-store-unit, this signal will be
unbundled into three separate signals:
(define-values (wstrobe wdata load-data) (load-store-unit #:instr ... #:address ... #:store-enable ... #:store-data ... #:rdata ...))
If you like to use named associations in your VHDL or Verilog instantiation
statements, you will appreciate the addition of keyword arguments to
Sequential components such as
branch-unit are represented
by ordinary functions using
As a consequence, if a component has several outputs, the corresponding Racket
function can use
(values ...) to return the output signals.
virgule itself is defined like this:
(define (virgule #:reset reset #:rdata rdata #:ready ready #:irq irq) ... (values valid address wstrobe wdata))
In the function body, we can create sequential signals with
or any of its variants, and combinational signals with
(define (register-unit #:reset reset #:enable enable #:src-instr src-instr #:dest-instr dest-instr #:xd xd) (define x-reg (register/r ... reset (for/signal (enable dest-instr xd this-reg) ...))) (for/signal (src-instr x-reg) #:returns (xs1 xs2) (define xs1 ...) (define xs2 ...)))
To make the code easier to read, I have modified
for/signal so that
the following forms are equivalent:
(for/signal (src-instr x-reg) ...) (for/signal ([src-instr src-instr] [x-reg x-reg]) ...)
Referring to a signal that is defined later
In a circuit that contains circular dependencies, a signal can appear in an
expression before it is assigned.
signal-defer is a macro that delays the evaluation of a Racket
variable containing a signal.
It could be defined like this:
(define-simple-macro (signal-defer sig) (signal-delay (signal-force sig)))
If I write:
(define x (signal-defer y))
the value of
y will not be read until
x is forced.
x will also automatically force
However, since we know that
signal-delay creates a function, and
considering that we do not need an extra level of memoization, we
can replace it by a simple lambda:
(define-simple-macro (signal-defer sig) (λ () (signal-force sig)))
Logic values and logic vectors
Hardware description languages usually have special support for binary
data types of arbitrary width. Before writing Virgule, my first impulse was to
create a Racket module that would provide data types and operations in the
spirit of VHDL packages
Such a module would be useful if my intent was to use Racket as a hardware
But you might remember that my ultimate goal is to use Racket as
a platform for a hardware description DSL.
While I expect this DSL to come with data types for logic vectors,
I am not convinced that we need sophisticated abstractions for these types
For this reason, the implementation of Virgule in Racket uses built-in
types for logic values and logic vectors: booleans for flags and control signals,
integers for general-purpose data and numbers.
In fact, Racket integers already provide all the facilities that I need
to represent logic vectors at runtime:
- They can be arbitrarily large.
- They support all the bitwise operations
needed for logic vectors.
- And obviously, they support integer arithmetic operations.
The missing features are: the ability to restrict the width of an integer,
sign extension of an integer with a given width, and vector concatenation.
For this reason, I have written the following helpers:
(unsigned-slice v left right) (signed-slice v left right) (unsigned v width) (signed v width) (unsigned-concat [v left right] ...) (signed-concat [v left right] ...)
signed-slicetake a slice of a vector
leftis the index of the most significant bit and
rightis the index of
the least significant bit.
The signed version sign-extends the result starting from index
signedare shorthands to take a right-aligned slice of a given width.
signed-concatare macros that concatenate slices from
one or more logic vectors.
rightcan be omitted like in
You can find the complete source code of these functions and macros
in module logic.rkt.
There is no support for uninitialized or indeterminate binary values,
'X' in VHDL’s
std_logic type, or
x in Verilog.
There is a lot of Racket code in this section.
The biggest code snippets are collapsed so you are not obliged to scroll
through them if you are only interested in the explanations.
The sequencer is a Moore machine where each state activates a boolean command
store, the sequencer waits for a data transfer
to complete (
execute state, the sequencer uses information from the current
decoded instruction (
has-rd?) to decide where to go next.
In Racket, the state machine is composed of a
that stores the current state, a
match expression that computes the next
state, and one combinational signal for each action.
I chose to represent states as symbols.
In VHDL, I would declare an enumerated type.
Click on the snippet to expand it.
(define (virgule #:reset reset #:rdata rdata #:ready ready #:irq irq) (define state-reg (register/r 'state-fetch reset (for/signal (instr ready [state this-reg]) (match state ['state-fetch (if ready 'state-decode state)] ['state-decode 'state-execute] ['state-execute (cond [(instruction-load? instr) 'state-load] [(instruction-store? instr) 'state-store] [(instruction-has-rd? instr) 'state-writeback] [else 'state-fetch])] ['state-load (if ready 'state-writeback state)] ['state-store (if ready 'state-fetch state)] ['state-writeback 'state-fetch])))) (define (state-equal? sym) (for/signal (state-reg) (equal? state-reg sym))) (define fetch-en (state-equal? 'state-fetch)) (define decode-en (state-equal? 'state-decode)) (define execute-en (state-equal? 'state-execute)) (define load-en (state-equal? 'state-load)) (define store-en (state-equal? 'state-store)) (define writeback-en (state-equal? 'state-writeback)) ...)
fetch state, the processor copies the program counter register
pc-reg to the address bus and asserts the
At the end of the memory transfer (
ready), the input data bus
rdata is copied to register
This is the complete definition of signals
virgule, they are part of the code that manages all memory
(define rdata-reg (register/e 0 (signal-and valid ready) rdata)) (define valid (signal-or fetch-en store-en load-en)) (define address (signal-if fetch-en pc-reg alu-result-reg))
decoder function extracts information from the instruction word
rdata-reg, producing the
instr signal of type:
(struct instruction (rd funct3 rs1 rs2 imm alu-fn use-pc? use-imm? has-rd? load? store? jump? branch? mret?))
| ||5-bit unsigned||The index of the destination register.|
| ||3-bit unsigned||The |
| ||5-bit unsigned||The index of the first source register.|
| ||5-bit unsigned||The index of the second source register.|
| ||32-bit signed||The immediate value encoded in the instruction.|
| ||symbol||The arithmetic or logic operation to execute.|
| ||boolean||Does this instruction use the program counter as the first ALU operand?|
| ||boolean||Does this instruction use an immediate as the second ALU operand?|
| ||boolean||Does this instruction write a result to a destination register?|
| ||boolean||Is this instruction a memory load?|
| ||boolean||Is this instruction a memory store?|
| ||boolean||Is this instruction a jump ( |
| ||boolean||Is this instruction a conditional branch?|
| ||boolean||Is this instruction a return from an interrupt handler?|
Several other operations happen in the
register-unitreads the source registers needed by the instruction,
and outputs their values to
- Two multiplexers select the operands for the arithmetic and logic unit.
The first operand (
alu-a) can be either the current value of the program
The second operand (
alu-b) can be either an immediate value or
- At the end of the clock cycle,
are stored to registers.
(define instr (decoder (signal-defer rdata-reg))) (define instr-reg (register/e instr-nop decode-en instr)) (define-values (xs1 xs2) (register-unit ... #:src-instr instr ...)) (define xs1-reg (register/e 0 decode-en xs1)) (define xs2-reg (register/e 0 decode-en xs2)) (define alu-a-reg (register/e 0 decode-en (for/signal (instr xs1 [pc (signal-defer pc-reg)]) (if (instruction-use-pc? instr) pc xs1)))) (define alu-b-reg (register/e 0 decode-en (for/signal (instr xs2) (if (instruction-use-imm? instr) (instruction-imm instr) xs2))))
decoder is defined in datapath-components.rkt
It uses constants and functions defined in module opcodes.rkt.
instruction-fmtidentifies the format of an instruction.
word->fieldsextracts the fields from the instruction word.
decode-alu-fnidentifies the arithmetic or logic operation to execute.
(define (instruction-fmt opcode) (match opcode [(== opcode-op) 'fmt-r] [(== opcode-store) 'fmt-s] [(== opcode-branch) 'fmt-b] [(or (== opcode-lui) (== opcode-auipc)) 'fmt-u] [(== opcode-jal) 'fmt-j] [_ 'fmt-i])) (define (word->fields w) (define opcode (unsigned-slice w 6 0)) (define imm (match (instruction-fmt opcode) ['fmt-i (signed-concat [w 31 20])] ['fmt-s (signed-concat [w 31 25] [w 11 7])] ['fmt-b (signed-concat [w 31] [w 7] [w 30 25] [w 11 8] [0 0])] ['fmt-u (signed-concat [w 31 12] [0 11 0])] ['fmt-j (signed-concat [w 31] [w 19 12] [w 20] [w 30 21] [0 0])] [_ 0])) (values opcode (unsigned-slice w 11 7) (unsigned-slice w 14 12) (unsigned-slice w 19 15) (unsigned-slice w 24 20) (unsigned-slice w 31 25) imm)) (define (decode-alu-fn opcode funct3 funct7) (match (list opcode funct3 funct7) [(list (== opcode-lui) _ _ ) 'alu-nop] [(list (== opcode-op) (== funct3-add-sub) (== funct7-sub-sra)) 'alu-sub] [(list (or (== opcode-op-imm) (== opcode-op)) (== funct3-slt) _ ) 'alu-slt] [(list (or (== opcode-op-imm) (== opcode-op)) (== funct3-sltu) _ ) 'alu-sltu] [(list (or (== opcode-op-imm) (== opcode-op)) (== funct3-xor) _ ) 'alu-xor] [(list (or (== opcode-op-imm) (== opcode-op)) (== funct3-or) _ ) 'alu-or] [(list (or (== opcode-op-imm) (== opcode-op)) (== funct3-and) _ ) 'alu-and] [(list (or (== opcode-op-imm) (== opcode-op)) (== funct3-sll) _ ) 'alu-sll] [(list (or (== opcode-op-imm) (== opcode-op)) (== funct3-srl-sra) (== funct7-sub-sra)) 'alu-sra] [(list (or (== opcode-op-imm) (== opcode-op)) (== funct3-srl-sra) _ ) 'alu-srl] [_ 'alu-add])) (define-signal (decoder data) (define-values (opcode rd funct3 rs1 rs2 funct7 imm) (word->fields data)) (define use-pc? (in? opcode (opcode-auipc opcode-jal opcode-branch))) (define use-imm? (not (= opcode opcode-op))) (define load? (= opcode opcode-load)) (define store? (= opcode opcode-store)) (define mret? (and (= opcode opcode-system) (= funct3 funct3-mret) (= imm imm-mret))) (define jump? (in? opcode (opcode-jal opcode-jalr))) (define branch? (= opcode opcode-branch)) (define has-rd? (nor branch? store? (zero? rd))) (define alu-fn (decode-alu-fn opcode funct3 funct7)) (instruction rd funct3 rs1 rs2 imm alu-fn use-pc? use-imm? has-rd? load? store? jump? branch? mret?))
Arithmetic and logic operations, branches
arith-logic-unit performs an arithmetic or logic operation
branch-unit computes the address of the next instruction.
pc+4 receives the address of the next instruction in memory.
It is stored in a register (
pc+4-reg) for later use in the
In branch and jump instructions, the target address is the result of an
addition performed by the arithmetic and logic unit.
If the instruction is a conditional branch,
branch-unit will compare the
values of two source registers, available in
and decide whether the branch is taken or not.
(define alu-result (arith-logic-unit instr-reg alu-a-reg alu-b-reg)) (define alu-result-reg (register/e 0 execute-en alu-result)) (define pc-reg (register/re 0 reset execute-en (branch-unit #:reset reset #:enable execute-en #:irq irq #:instr instr-reg #:xs1 xs1-reg #:xs2 xs2-reg #:address alu-result #:pc+4 (signal-defer pc+4)))) (define pc+4 (for/signal (pc-reg) (word (+ 4 pc-reg)))) (define pc+4-reg (register/e 0 execute-en pc+4))
branch-unit also handles interrupts. When
irq is asserted,
and when the processor is not already serving an interrupt request,
- switch to a non-interruptible state,
- save the address of the next instruction to an internal register (
- branch to the interrupt service routine at address 4.
If the current instruction is
- switch back to the interruptible state,
- branch to the address saved in
branch-unit are defined in module
(define-signal (arith-logic-unit instr a b) (define sa (signed-word a)) (define sb (signed-word b)) (define sh (unsigned-slice b 5 0)) (word (match (instruction-alu-fn instr) ['alu-nop b] ['alu-add (+ a b)] ['alu-sub (- a b)] ['alu-slt (if (< sa sb) 1 0)] ['alu-sltu (if (< a b) 1 0)] ['alu-xor (bitwise-xor a b)] ['alu-or (bitwise-ior a b)] ['alu-and (bitwise-and a b)] ['alu-sll (arithmetic-shift a sh)] ['alu-srl (arithmetic-shift a (- sh))] ['alu-sra (arithmetic-shift sa (- sh))]))) (define (branch-taken? instr a b) (and (instruction-branch? instr) (let ([sa (signed-word a)] [sb (signed-word b)]) (match (instruction-funct3 instr) [(== funct3-beq) (= a b)] [(== funct3-bne) (not (= a b))] [(== funct3-blt) (< sa sb)] [(== funct3-bge) (>= sa sb)] [(== funct3-bltu) (< a b)] [(== funct3-bgeu) (>= a b)] [_ #f])))) (define (branch-unit #:reset reset #:enable enable #:irq irq #:instr instr #:xs1 xs1 #:xs2 xs2 #:address address #:pc+4 pc+4) (define pc-target (for/signal (instr xs1 xs2 address pc+4 [mepc (signal-defer mepc-reg)]) (define aligned-address (unsigned-concat [address 31 2] [0 1 0])) (cond [(instruction-mret? instr) mepc] [(instruction-jump? instr) aligned-address] [(branch-taken? instr xs1 xs2) aligned-address] [else pc+4]))) (define irq-state-reg (register/re #f reset enable (for/signal (instr irq [state this-reg]) (cond [(instruction-mret? instr) #f] [irq #t] [else state])))) (define accept-irq (signal-and-not irq irq-state-reg)) (define mepc-reg (register/re 0 reset (signal-and enable accept-irq) pc-target)) (signal-if accept-irq (signal irq-addr) pc-target))
In load and store operations, the address is always the result of an
addition performed by the arithmetic and logic unit.
valid output is asserted in the
At the end of the memory transfer (
ready), the input data bus
rdata is copied to register
store state, the role of
load-store-unit consists in copying
the data from
wdata, ensuring that it is properly aligned
with respect to the target address and data size.
load-store-unit also sets the
(define rdata-reg (register/e 0 (signal-and valid ready) rdata)) (define valid (signal-or fetch-en store-en load-en)) (define address (signal-if fetch-en pc-reg alu-result-reg)) (define-values (wstrobe wdata ...) (load-store-unit #:instr instr-reg #:address alu-result-reg #:store-enable store-en #:store-data xs2-reg ...))
Here is the part of
load-store-unit that handles store operations:
(define-signal (load-store-unit #:instr instr #:address address #:store-enable store-enable #:store-data store-data #:rdata rdata) #:returns (wstrobe wdata load-data) (define align (unsigned-slice address 1 0)) (define wdata (match (instruction-funct3 instr) [(== funct3-lb-sb) (unsigned-concat [store-data 7 0] [store-data 7 0] [store-data 7 0] [store-data 7 0])] [(== funct3-lh-sh) (unsigned-concat [store-data 15 0] [store-data 15 0])] [_ store-data])) (define wstrobe (if store-enable (match (instruction-funct3 instr) [(or (== funct3-lb-sb) (== funct3-lbu)) (arithmetic-shift #b0001 align)] [(or (== funct3-lh-sh) (== funct3-lhu)) (arithmetic-shift #b0011 align)] [(== funct3-lw-sw) #b1111] [_ #b0000]) #b0000)) ...))
Finally, in the
writeback state, the processor stores the result of the
current instruction into a destination register.
The register index is available in the
rd field of
- If the instruction is a load, the result is taken from
load-store-unit, ensuring that it is properly aligned and sign-extended.
- If the instruction is a jump, the destination register receives a return
- In other cases, the destination register receives the result from
(define-values (xs1 xs2) (register-unit #:reset reset #:enable writeback-en #:src-instr instr #:dest-instr instr-reg #:xd (signal-defer xd))) (define-values (wstrobe wdata load-data) (load-store-unit #:instr instr-reg #:address alu-result-reg #:store-enable store-en #:store-data xs2-reg #:rdata rdata-reg)) (define xd (for/signal (instr-reg load-data pc+4-reg alu-result-reg) (cond [(instruction-load? instr-reg) load-data] [(instruction-jump? instr-reg) pc+4-reg] [else alu-result-reg])))
Here is the code for
register-unit. It uses a persistent vector
pvector) to store register values. You will find an explanation of this
implementation choice in section Performance considerations.
(define (register-unit #:reset reset #:enable enable #:src-instr src-instr #:dest-instr dest-instr #:xd xd) (define x-reg (register/r (make-pvector reg-count 0) reset (for/signal (enable dest-instr xd this-reg) (if (and enable (instruction-has-rd? dest-instr)) (set-nth this-reg (instruction-rd dest-instr) xd) this-reg)))) (for/signal (src-instr x-reg) #:returns (xs1 xs2) (define xs1 (nth x-reg (instruction-rs1 src-instr))) (define xs2 (nth x-reg (instruction-rs2 src-instr)))))
And this is the part of
load-store-unit that handles load operations:
(define-signal (load-store-unit #:instr instr #:address address #:store-enable store-enable #:store-data store-data #:rdata rdata) #:returns (wstrobe wdata load-data) (define align (unsigned-slice address 1 0)) ... (define aligned-rdata (unsigned-slice rdata 31 (* 8 align))) (define load-data (word (match (instruction-funct3 instr) [(== funct3-lb-sb) (signed-slice aligned-rdata 7 0)] [(== funct3-lh-sh) (signed-slice aligned-rdata 15 0)] [(== funct3-lbu) (unsigned-slice aligned-rdata 7 0)] [(== funct3-lhu) (unsigned-slice aligned-rdata 15 0)] [_ aligned-rdata]))))
The repository contains several modules that can help simulate a system
with a processor core, memory and peripheral devices:
Example programs for a simple system with a fake text output device are available
in the examples
While writing the register unit and the memory components, I had to choose
between two structures for the memory cells: they could be defined as a
signal of vectors, or as a vector of signals.
In VHDL, such a choice does not exist: if I declare a signal with an
array type, I am allowed to manipulate it as a whole,
or I can reference each array element as if it were a separate signal.
My first impression was that a signal of vectors would be less efficient
because each write operation would create a new vector.
But using a vector of signals turned out to be far worse, because the cost of reading
at any arbitrary location outweighed the benefits.
folder contains five programs that compare the speed an memory usage of various
The following choices are evaluated:
- Using a signal of vectors vs a vector of signals.
- Using built-in Racket vectors vs persistent vectors.
The third item compares the use of these patterns:
(register/e q0 e (for/signal (...) expr)) (register q0 (for/signal (this-reg ...) (if e expr this-reg)))
In the first case, the register is updated only when
e is true, but
the expression in
for/signal is evaluated even when
e is false.
In the second case, the register is updated in every clock cycle,
expr is evaluated only when
e is true.
Both produce signals with the same values, but are not equivalent
in terms of operations and intermediate data.
The results below have been obtained for a memory containing 1000 integers,
a simulation duration of 10000 clock cycles with one write every two clock cycles.
All memory cells are written and read in turns.
They show that the most efficient combination is to use a signal of
persistent vectors with
|Storage||Register type||Time (ms)||Memory use (MB)|
| || ||21000||2000|
|Signal of || ||110||86|
|Signal of || ||58||45|
|Signal of || ||39||14|
|Signal of || ||28||10|