
RISC-V: An Open CPU Architecture

RISC-V is an open CPU architecture that is a neat development in the history of CPUs and computing. This page is a start at collecting information and links about it. It also has direct content to help NMSU CS 370 students.

The official RISC-V Home is always a good place to start, but its documentation wiki is probably better for technical info.

External ISA and Assembly Language Documentation

This Medium post is a RISC-V assembly tutorial. A RISC-V Assembly Manual is available; it looks to be unofficial but very comprehensive.

A RISC-V Assembly and Programming Textbook has some good online content, and there is also a decent RISC-V Assembly Reference. An open-access computer organization textbook using RISC-V has an appendix with example programs.

Simulators, Assemblers, Etc.

For CS 370 we are using the RARS simulator. This is based on the long-standing MARS MIPS simulator, and is a good and solid tool. You can download a pre-built jar file from the Releases page. Other tools have moved to the “Resources” section at the bottom of this page.

The RISC-V ISA (Instruction Set Architecture)

To program and use a CPU, it must have a well-defined interface. This is known as its Instruction Set Architecture, because it is centered around the machine instructions that the CPU actually executes in its circuits. But there is lots of other detail around the instructions that needs to be specified to actually program and use the CPU.

What is the native operand size? RISC-V comes in both a 32-bit and a 64-bit version. In CS 370 we will use the 32-bit version.

Does a CPU expect multi-byte values to be ordered most significant byte first (big endian), or least significant byte first (little endian)? The RISC-V CPU is little endian. (Intel x86 is also little endian, but some others are big endian).
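A minimal sketch of what little endian means in practice, assuming the RARS simulator. The label w and the test constant are made up for illustration. Loading the single byte at the lowest address of a stored word yields the least significant byte of that word:

```asm
	.data
w:	.word	0x11223344      # hypothetical test value

	.text
	la	t0, w           # t0 = address of the word
	lbu	t1, 0(t0)       # byte at the lowest address: t1 = 0x44
	lbu	t2, 3(t0)       # byte at the highest address: t2 = 0x11
	li	a7, 10
	ecall                   # RARS syscall 10: exit
```

On a big endian CPU the two loaded bytes would be swapped.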

CPUs use registers to efficiently store values and have instructions operate on them. RISC-V has 32 registers, but many have particular names and purposes, as seen in the table in the next section below.

How does the stack work in the CPU? In RISC-V the stack grows downward (towards smaller memory addresses), and the stack pointer register stores the address of the top item on the stack, not the next available location.

Pseudo-Instructions

RISC stands for Reduced Instruction Set Computer, meaning that the actual machine instructions are a very minimal set of instructions. RISC-V is just one example of a RISC CPU architecture.

This means that it can be annoyingly difficult for humans to program in assembly language, because it might take multiple instructions to do simple things. So, the RISC-V assembly language also includes pseudo-instructions that are allowed to be used in assembly programming, but that the assembler translates into one or more real machine instructions (usually one or two). Some common ones are la (load address), li (load immediate), and mv (move/copy).
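As a rough illustration, here are a few pseudo-instructions with the real instructions they typically become, written as comments. The exact expansion is chosen by the assembler, so treat these as typical rather than guaranteed; the label msg is hypothetical and only exists to give la a target. A small li or an mv needs only one real instruction, while la and a large li need two:

```asm
	.data
msg:	.string	"hi"            # hypothetical label

	.text
	li	t0, 42          # typically: addi t0, zero, 42
	mv	a1, t0          # typically: addi a1, t0, 0
	li	t1, 0x12345678  # typically: lui t1, 0x12345 then addi t1, t1, 0x678
	la	a0, msg         # typically: auipc a0, <upper part> then addi a0, a0, <lower part>
	li	a7, 10
	ecall                   # RARS syscall 10: exit
```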

In the RARS simulator, the help screen lists both actual and pseudo instructions; we can use both in our compiler project.

Register Names and Calling Convention

Programming using machine instructions involves a lot of instructions that simply move (copy, actually) values between memory and registers, and from one register to another. This is because different registers are used for different things. Code in functions also needs to know which registers it can freely use, and which registers hold data that needs to be saved first if the function is to use the register.

The table below describes the registers available in RISC-V, how they are used, and who is responsible for saving them when needed.

Register   ASM Name   Description                 Saver
x0         zero       Always zero (hard-wired)    n/a
x1         ra         Return Address              caller
x2         sp         Stack Pointer               callee
x3         gp         Global Pointer              n/a
x4         tp         Thread Pointer              n/a
x5-7       t0-t2      Temporary values            caller
x8         s0/fp      Frame Pointer (saved reg)   callee
x9         s1         Saved register              callee
x10-17     a0-a7      Argument values             caller
x18-27     s2-s11     Saved registers             callee
x28-31     t3-t6      Temporary values            caller
pc         n/a        Program Counter             n/a

A register marked as “caller” saved is freely available to a function to use however it wants; but if the function calls another function and expects the register to be unchanged after the call, then it must save it somewhere (usually on the stack). A register marked “callee” saved is not freely available for use; if a function wants to change one of these registers, it must first save it somewhere; however, a function can assume that these registers will not change value when it calls some other function.

The stack pointer is callee-saved in the sense that when returning from a function, whatever the function did on the stack must be undone, and the stack pointer must be back to whatever it was when the function was called.

Function return values are passed in the argument registers a0 and a1.

If a function does not make any calls to any other function (i.e., it is a leaf function), then it does not need to save the return address register. However, any function that does make calls must save its own return address, and then restore it back into the ra register before executing its ret instruction.

A function that does make calls to other functions will probably need to save its own incoming arguments so that it can use the argument registers to set up its own calls.
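The rules above can be sketched with one non-leaf and one leaf function. This is only an illustration for RARS: the function names addTwice and twice are made up, and the stack handling it uses is explained in the “Stack Operations” section further down. The non-leaf function saves ra and its incoming argument before making its own call, while the leaf function touches neither:

```asm
	.text
main:
	li	a0, 5
	jal	addTwice        # a0 = twice(5) + 5 = 15
	li	a7, 1
	ecall                   # RARS syscall 1: print int in a0
	li	a7, 10
	ecall                   # RARS syscall 10: exit

# Hypothetical non-leaf function: addTwice(x) = twice(x) + x
addTwice:
	addi	sp, sp, -8      # room for ra and x
	sw	ra, 4(sp)       # save return address; the jal below overwrites ra
	sw	a0, 0(sp)       # save incoming argument; a0 is caller-saved
	jal	twice           # result comes back in a0
	lw	t0, 0(sp)       # reload x
	add	a0, a0, t0      # return value goes in a0
	lw	ra, 4(sp)       # restore our own return address
	addi	sp, sp, 8       # sp is back to its value at entry
	ret

# Hypothetical leaf function: twice(x) = 2*x; makes no calls, so ra is left alone
twice:
	slli	a0, a0, 1
	ret
```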

Basic Assembly Program Format

Assembly language is basically the textual, human-readable form of a CPU’s native machine instructions. However, to create a complete program we need to also declare data, mark functions and branch targets, and create other organization of our code and data.

So, assembly language includes not only instructions, but also directives and labels. And of course comments.

In RISC-V, as in many assembly languages, comments begin with a hash/number sign (#) and end at the end of the line. Instructions and directives are single-tab-indented, forming a column. Labels (human-created names for things) are not indented and end with a colon (:).

Labels can be on the same line as an instruction or directive, or on their own line, in which case they apply to the next directive or instruction below them. Labels are addresses! A label is a name for the address where something is in memory.

Directives begin with a dot (.), as in most assembly languages, and tell the assembler to do something in the program organization. They are not machine instructions, rather they are instructions to the assembler program!

A sample program is:

#
# Sample RISC-V assembly program
#

	.data
.SC0:	.string	"The value is: "
val:	.word 0

	.text
#
# main program instructions
#
program:
	li	t0, 42
	sw	t0, val, t1     # store to a label needs a temporary register (t1) for the address
	la	a0, .SC0
	jal	printStr
	lw	a0, val
	jal	printInt
	li	a0, 0
	li	a7, 93
	ecall   # syscall 93: exit

# Function printStr
printStr:
	li	a7, 4
	ecall   # RARS syscall: print string
	ret

# Function printInt
printInt:
	li	a7, 1
	ecall   # RARS syscall: print int
	ret

The .data directive tells the assembler that what follows is for the data section of the program, and the .text directive tells the assembler that what follows is machine instructions. The .string directive puts the given string into data memory, while the .word directive puts the given integer into memory as a 32-bit signed value (2’s complement).

The labels .SC0 and val are the names we can use in the program to refer to the string and the integer variable, respectively. Think of val as the programmer’s choice for that variable name, and think of .SC0 as a compiler-generated name for the string constant.

The labels program, printStr, and printInt are all function names, although program is more the label for the main program, and doesn’t expect to be called, just started.

Everything else in the .text section is an assembly instruction that the assembler turns into machine code.

Stack Operations

The RISC-V ISA does not have push and pop instructions; rather, you have to modify the sp (stack pointer) yourself, and then do normal memory load and store instructions.

The convention is that the sp register is pointing to (i.e., contains the address of) the item on the top of the stack, so a push operation involves first subtracting an offset from the sp register to make room on the stack and then storing the item onto the stack. This can look like:

    addi sp, sp, -4   # make room for a 4-byte integer (32 bits)
    sw   a1, (sp)     # store value in a1 into memory at sp address

Similarly, a pop operation would first copy the item on the top of the stack into a register, and then add an offset to the stack pointer to remove the space of the item from the stack. This can look like:

    lw   a1, (sp)     # load value in memory at sp address into a1
    addi sp, sp, 4    # remove room of a 4-byte integer (32 bits)
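A small sketch, assuming RARS, that puts the two operations together. The values 11 and 22 are arbitrary. It pushes two values and pops them again, showing that they come back in reverse order and that sp ends up where it started:

```asm
	.text
	li	a1, 11
	li	a2, 22
	addi	sp, sp, -4
	sw	a1, 0(sp)       # push 11
	addi	sp, sp, -4
	sw	a2, 0(sp)       # push 22; this is now the top of the stack
	lw	t0, 0(sp)       # pop: t0 = 22 (last pushed, first popped)
	addi	sp, sp, 4
	lw	t1, 0(sp)       # pop: t1 = 11
	addi	sp, sp, 4       # sp is back where it started
	li	a7, 10
	ecall                   # RARS syscall 10: exit
```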

Defining a Function

The functions printStr and printInt in the above example are a bit simplistic, even though they work fine. For a generic function that you are compiling from source code of a programming language, you need to generate a code preamble that saves the return address on the stack, and then generate a code postamble that pops the return address off the stack and then executes a return instruction. For a function named myFunc this looks like:

# Function myFunc
myFunc:
	addi  sp, sp, -4
	sw    ra, 0(sp)
	# code for actual function here
	lw	ra, 0(sp)
	addi	sp, sp, 4
	ret

In practice, as functions get more complex, more space can be allocated on (“pushed onto”) the stack for saving arguments and holding local variables. However, by the time the lw instruction at the end restores the return address into the ra register, the stack pointer (sp register) must again point at the spot where the return address was pushed.
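One way this could look, as a sketch for RARS with a made-up function myFunc2 that has a single 4-byte local variable. The local is allocated after the preamble and released before the postamble, so the lw that restores ra finds sp pointing at the saved return address:

```asm
	.text
main:
	jal	myFunc2
	li	a7, 10
	ecall                   # RARS syscall 10: exit

# Hypothetical function with one 4-byte local variable
myFunc2:
	addi	sp, sp, -4
	sw	ra, 0(sp)       # preamble: push return address
	addi	sp, sp, -4      # allocate the local at 0(sp); saved ra is now at 4(sp)
	li	t0, 7
	sw	t0, 0(sp)       # local = 7
	lw	a0, 0(sp)       # return the local's value in a0
	addi	sp, sp, 4       # release the local; sp points at the saved ra again
	lw	ra, 0(sp)       # postamble: pop return address
	addi	sp, sp, 4
	ret
```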

Function Calls

A function call is essentially just a jal instruction, short for “jump and link”. The instruction jumps to the code where the function is, and the link part is that it saves the return address (the address of the instruction right after the jal) in the ra register. That way the function can return to the caller when it is done.
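A minimal sketch of the call and return pair, assuming RARS; the function name hello is made up. Both jal with a single operand and ret are themselves shorthand forms, and the comments show the full instructions they stand for:

```asm
	.text
main:
	jal	hello           # shorthand for: jal ra, hello
	li	a7, 10          # execution resumes here after hello returns
	ecall                   # RARS syscall 10: exit

# Hypothetical leaf function that does nothing but return
hello:
	ret                     # shorthand for: jalr zero, 0(ra)
```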

However, the harder part of making a function call is setting up the arguments. In RISC-V, argument values are placed in the registers a0 through a7, and then on the stack if there are more than eight arguments. It can get more complicated than that, but for our compiler project we will only handle integer-type arguments (integers and addresses), and never more than eight arguments.

We basically have just two kinds of arguments: a string (address), and a numeric expression (perhaps as simple as a constant or a variable). For a string, we just load the address into the correct argument register, e.g.,:

	la  a2, .SC1

The example above loads the address of a string with the label .SC1 into the third argument register. Since all numeric expressions leave their result in the t0 register, for this we just need to move (copy) the value into the argument register, e.g.,:

	mv  a1, t0

The instruction above copies the value in t0 into the second argument register a1.

So, for example, the function call myFunc("hello world!", 76+myVar) might look like:

	la	a0, .SC5
	# code for integer expression here
	mv	a1, t0
	jal	myFunc

This assumes that .SC5 is the label assigned to that string constant.
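For concreteness, here is a sketch of the same call as a complete RARS program, with the integer expression filled in using the push/pop template from the “Expressions” section below. The initial value of myVar and the body of myFunc (which just prints its two arguments) are invented for the example:

```asm
	.data
.SC5:	.string	"hello world!"
myVar:	.word	4               # hypothetical initial value

	.text
main:
	la	a0, .SC5        # first argument: address of the string
	li	t0, 76          # left operand
	addi	sp, sp, -4
	sw	t0, 0(sp)       # push left value
	lw	t0, myVar       # right operand
	lw	t1, 0(sp)       # pop left value into t1
	addi	sp, sp, 4
	add	t0, t1, t0      # t0 = 76 + myVar
	mv	a1, t0          # second argument
	jal	myFunc
	li	a7, 10
	ecall                   # RARS syscall 10: exit

# Hypothetical myFunc(str, n): prints str, then n
myFunc:
	li	a7, 4
	ecall                   # RARS syscall 4: print string at a0
	mv	a0, a1
	li	a7, 1
	ecall                   # RARS syscall 1: print int in a0
	ret
```

Loading a0 before evaluating the expression is safe here only because the expression code uses just t0, t1, and the stack.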

Expressions

For our RISC-V compiler project, we are not going to try to do anything fancy or complicated to optimize the code for the expressions. In particular, we are not even going to try to use all of the temporary value registers t0 to t6 that the RISC-V architecture has.

We will only use the registers t0 and t1, since for binary operators we need two registers to hold the two values that the operator instruction will use.

Every expression has this one simple rule: leave its resulting value in register t0. This way it does not matter if the expression requires just one instruction (e.g., loading a numeric constant into t0) or many instructions (a whole parenthesized subexpression); we always know where the resulting value is: register t0.

For a numeric constant, this is just one “load immediate” instruction, e.g.,:

	li	t0, 42

For a variable, see the next section.

For an expression that must compute a binary operation (e.g., addition), the following code template will be used (with add as the example operation):

	# code for left subexpression here
	addi  sp, sp, -4
	sw    t0, 0(sp)
	# code for right subexpression here
	lw    t1, 0(sp)
	addi  sp, sp, 4
	add	t0, t0, t1

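This code computes the left subexpression’s value and pushes it on the stack, then computes the right subexpression’s value, and then pops the left value into the t1 register (because t0 contains the right value). Finally, the add instruction performs the addition on the two values and leaves the result in t0, just like it is supposed to do. Note that the left operand ends up in t1 and the right operand in t0; this does not matter for addition, but for an order-sensitive operation such as subtraction the source operands must be given as t1, t0 (e.g., sub t0, t1, t0).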

This code will work no matter how complicated the two subexpressions are, since the stack can hold many subexpression values.
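As a worked sketch, assuming RARS, here is the template applied by hand to the made-up expression 10 - (3 + 2). The right operand of the subtraction is itself a binary expression, so two values sit on the stack at the deepest point. The left operand is always popped into t1, so the subtraction lists t1 first:

```asm
	.text
main:
	li	t0, 10          # left operand of '-'
	addi	sp, sp, -4
	sw	t0, 0(sp)       # push 10
	# right operand of '-' is the subexpression (3 + 2)
	li	t0, 3           # left operand of '+'
	addi	sp, sp, -4
	sw	t0, 0(sp)       # push 3
	li	t0, 2           # right operand of '+'
	lw	t1, 0(sp)       # pop 3 into t1
	addi	sp, sp, 4
	add	t0, t1, t0      # t0 = 3 + 2
	lw	t1, 0(sp)       # pop 10 into t1
	addi	sp, sp, 4
	sub	t0, t1, t0      # t0 = 10 - 5; left operand (t1) comes first
	mv	a0, t0
	li	a7, 1
	ecall                   # RARS syscall 1: print int (prints 5)
	li	a7, 10
	ecall                   # RARS syscall 10: exit
```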

Simple Variable Assignments and Uses

The right-hand side of an assignment statement is an expression, so by the rule in the section above the value to be assigned will be in register t0. All we have to do to make the variable assignment is save that value into the variable’s memory location, using the “store word” instruction, like:

	sw	t0, myvariable, t1

When storing to a label, the sw instruction requires a second register to use as a temporary (the assembler uses it to build the variable’s address), and so we use t1 for this purpose. This one instruction, when it follows the right-hand side’s expression evaluation code, saves the value into the variable.

To use a variable’s value, we should just follow the rule for expressions, which is to leave the value in the t0 register. For a simple variable, this is just one “load word” instruction:

	lw	t0, myvariable
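Putting the use and the assignment together, the statement count = count + 5 might compile to something like the following sketch for RARS. The variable name count and its initial value are invented for the example:

```asm
	.data
count:	.word	7               # hypothetical variable and initial value

	.text
	lw	t0, count       # left operand: use of the variable
	addi	sp, sp, -4
	sw	t0, 0(sp)       # push left value
	li	t0, 5           # right operand
	lw	t1, 0(sp)       # pop left value into t1
	addi	sp, sp, 4
	add	t0, t1, t0      # t0 = count + 5 = 12
	sw	t0, count, t1   # assignment: store t0 into count, t1 is the temporary
	li	a7, 10
	ecall                   # RARS syscall 10: exit
```

Reusing t1 as the temporary for the store is fine because the popped left value is no longer needed after the add.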

Conditionals and Loops

Coming…

Arrays

Coming…

Resources

Several online simulation environments exist, including: BRisc-V, Kite, and one at Cornell.

Many functional RISC-V instruction simulators take a compiled binary as input, not the textual assembly language, which means you need some entire RISC-V toolchain to compile or assemble code into a correct binary executable file. I just want a simple assembly simulator!

The main RISC-V site lists simulators (and other tools).

The project Masimulator may provide near-assembly simulation. It uses (and embeds?) an assembler to produce a binary, but this simple toolchain may not be a bad way to go.

PyRISC is a python-based RISC-V simulator, but I am pretty sure it only uses binary executables as input, and so requires a compiler/assembler toolchain.

TinyFive simulates RISC-V instructions in Python, but seems oriented to AI-type applications, focusing on numerical operations in neural nets.

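MARSS-RISC-V may be an option. As far as I can tell, it is a cycle-level, full-system simulator derived from the MARSS x86 simulator (not from the MARS MIPS simulator that RARS is based on), so it is probably heavier than a simple assembly simulator.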

This RISC-V ISA Model seems to be someone’s beginning attempt at some Python tool. It contains the defined components of the ISA, but that seems to be it. It might be something to build on.