Machine Instructions and Assembly Code
Machine Instructions and Assembly Language
So, now we know about instructions, addresses, and registers. Let's get to specifics.
The machine instructions look like "01101010" -- they are just a sequence of bits
- E.g., the 16-bit word 0001101100010010 ($1B12) means "Subtract register 18 from register 17" to an AVR CPU
But we don't want to program using 1s and 0s, do we?
- Actually, that is how computers were first programmed. The programmer entered the machine instruction using switches that were set to 1 or 0.
We use words and abbreviations for each instruction, and then use a program to assemble the words into the machine instructions.
The AVR calls the instruction 000110########## "SUB", short for "Subtract some register from another"; the bits marked "#" designate which registers will be used.
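To see how the "#" bits carry the register numbers, here is a minimal Python sketch that packs and unpacks a SUB instruction. It assumes the field layout 0001 10rd dddd rrrr from the AVR instruction set manual, where "d" is the destination register and "r" is the register being subtracted. The function names encode_sub and decode_sub are made up for this illustration.

```python
def encode_sub(rd, rr):
    """Build the 16-bit AVR word for "sub rd, rr" (layout 0001 10rd dddd rrrr)."""
    if not (0 <= rd <= 31 and 0 <= rr <= 31):
        raise ValueError("AVR register numbers run from 0 to 31")
    return (
        (0b000110 << 10)       # fixed bits that identify SUB
        | ((rr & 0x10) << 5)   # top bit of rr goes to bit 9
        | (rd << 4)            # all five bits of rd go to bits 8..4
        | (rr & 0x0F)          # low four bits of rr go to bits 3..0
    )


def decode_sub(word):
    """Return (rd, rr) for a 16-bit SUB word, or raise if it is not a SUB."""
    if word >> 10 != 0b000110:
        raise ValueError("not a SUB instruction")
    rd = (word >> 4) & 0x1F
    rr = ((word >> 5) & 0x10) | (word & 0x0F)
    return rd, rr
```

With these definitions, encode_sub(17, 18) gives $1B12, the word from the example above, and decode_sub(0x1B12) gives back registers 17 and 18.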
We refer to the numerical encoding of a machine instruction as its opcode, and to the name we give it as its mnemonic.
So, this "language" of words and abbreviations for the machine instructions is called assembly language.
The assembler is the program that reads a program written in assembly language and produces the machine code.
This is not the same as compiling, but the gcc compiler, which we are using, also includes an assembler, so we can use it to assemble our programs.
Each assembly instruction is exactly one machine instruction
- Unlike a C/Java assignment statement, which a compiler may have to translate into several instructions
The assembly instructions are specific to a type of microprocessor (CPU)
- An AVR assembly program cannot be assembled to run on a Pentium
Assembly language also lets us assign names, or symbols, to certain values.
An example AVR assembly language function with global data (not a complete program):
#
# Global data (val1 and val2) (new change)
#
.data
.comm val1,1
.global val1
.comm val2,1
.global val2
#
# Program code (compute function)
#
.text
.global compute
compute:
lds r18, val2
ldi r19, 23
add r18, r19
sts val1, r18
ret
An assembly language contains:
Mnemonic names for the machine instructions
- In the above program, these are: lds, ldi, add, sts, ret
Register names: r18, r19
Assembler directives for creating data and specifying addresses: .data, .comm, .global, .text
Symbols that are names for values, such as data values or addresses: val1, val2, compute
Syntax for specifying addressing modes (we'll get to this)
Numerical values: 23
Comments: lines beginning with '#'
NOTE: the assembly language does not contain anything that the microprocessor does not directly support. It is not a high-level programming language!
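Because each assembly line is one primitive step that the CPU supports, the compute function can be traced by hand. The Python sketch below is a simplified model, not an AVR simulator: it assumes 32 eight-bit registers, stands in a dictionary for data memory, and ignores the status flags that a real "add" also updates. The name run_compute is hypothetical.

```python
def run_compute(memory):
    """Trace the five instructions of compute on a dictionary standing in for data memory."""
    regs = [0] * 32                           # r0..r31, each 8 bits wide

    regs[18] = memory["val2"] & 0xFF          # lds r18, val2
    regs[19] = 23                             # ldi r19, 23
    regs[18] = (regs[18] + regs[19]) & 0xFF   # add r18, r19 (result wraps at 8 bits)
    memory["val1"] = regs[18]                 # sts val1, r18
    return memory                             # ret
```

Starting from val2 = 10, the model leaves 33 in val1. Starting from val2 = 250, the 8-bit sum wraps around and val1 becomes 17.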
What an assembler does:
- It translates the instruction names into opcodes
- It translates the symbols into numerical values
- It generates a sequence of bits that is the machine program
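The three steps above can be sketched as a toy assembler in Python. This is an illustration under stated assumptions, not how avr-as is implemented: it handles only ldi, add, and ret, uses the encodings 1110 KKKK dddd KKKK (ldi), 0000 11rd dddd rrrr (add), and $9508 (ret) from the AVR instruction set manual, and takes symbols from a plain dictionary. The function assemble and the symbol LIMIT used below are hypothetical names.

```python
import struct


def assemble(lines, symbols=None):
    """Translate a few AVR mnemonics into machine code bytes (low byte first)."""
    symbols = symbols or {}

    def reg(token):
        return int(token.lstrip("r"))          # "r19" -> 19

    def value(token):
        # Step 2: a symbol is replaced by the number it stands for.
        return symbols[token] if token in symbols else int(token, 0)

    words = []
    for line in lines:
        line = line.split("#", 1)[0].strip()   # drop comments and blank lines
        if not line:
            continue
        parts = line.split(None, 1)
        mnemonic = parts[0]
        operands = [t.strip() for t in parts[1].split(",")] if len(parts) > 1 else []

        # Step 1: each mnemonic selects an opcode pattern.
        if mnemonic == "ldi":
            d, k = reg(operands[0]), value(operands[1])
            if not 16 <= d <= 31 or not 0 <= k <= 255:
                raise ValueError("ldi needs r16..r31 and an 8-bit value")
            words.append(0xE000 | ((k & 0xF0) << 4) | ((d - 16) << 4) | (k & 0x0F))
        elif mnemonic == "add":
            d, r = reg(operands[0]), reg(operands[1])
            words.append(0x0C00 | ((r & 0x10) << 5) | (d << 4) | (r & 0x0F))
        elif mnemonic == "ret":
            words.append(0x9508)
        else:
            raise ValueError("unsupported mnemonic: " + mnemonic)

    # Step 3: emit the bit sequence, each 16-bit word stored low byte first.
    return b"".join(struct.pack("<H", w) for w in words)
```

Calling assemble(["ldi r19, 23", "add r18, r19", "ret"]).hex() returns "37e1230f0895". Passing symbols={"LIMIT": 23} lets "ldi r19, LIMIT" assemble to the same bytes as "ldi r19, 23".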
An assembler listing for the above program
The listing below is produced by running "avr-as -a=compute.lst compute.s", which assembles the file "compute.s" and puts the listing in "compute.lst".
GAS LISTING compute.s page 1
1
2 #
3 # Global data (val1 and val2) (new change)
4 #
5 .data
6 .comm val1,1
7 .global val1
8 .comm val2,1
9 .global val2
10
11 #
12 # Program code (compute function)
13 #
14 .text
15 .global compute
16 compute:
17 0000 2091 0000 lds r18, val2
18 0004 37E1 ldi r19, 23
19 0006 230F add r18, r19
20 0008 2093 0000 sts val1, r18
21 000c 0895 ret
22
GAS LISTING compute.s page 2
DEFINED SYMBOLS
*COM*:00000001 val1
*COM*:00000001 val2
compute.s:16 .text:00000000 compute
NO UNDEFINED SYMBOLS
This listing shows our program and the resulting machine code and the memory that it takes up. The first column is the line number of the listing (and of the program). The second column is a relative memory address in hexadecimal. These relative addresses begin at 0, but that does not mean that when the program runs it will be at memory address 0! Following the relative address is the machine code that each instruction results in. Most are 16-bit machine instructions, but some are 32-bit. Finally, the program text line is displayed.
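Two details of the listing can be checked with a few lines of Python. First, each relative address is just the sum of the sizes of the instructions before it. Second, the listing prints the bytes in the order they sit in memory, low byte first, so the group "37E1" is the 16-bit word $E137. The helper names layout and listing_word are made up for this sketch, and the sizes in COMPUTE are read off the listing above.

```python
def layout(instructions):
    """Return (offset, mnemonic) pairs for a list of (mnemonic, size_in_bytes)."""
    offset = 0
    placed = []
    for mnemonic, size in instructions:
        placed.append((offset, mnemonic))
        offset += size          # the next instruction starts where this one ends
    return placed


def listing_word(hex_bytes):
    """Convert a listing group such as "37E1" (memory order) into its 16-bit word."""
    return int.from_bytes(bytes.fromhex(hex_bytes), "little")


# Instruction sizes in bytes, as shown in the listing: lds and sts take 32 bits.
COMPUTE = [("lds", 4), ("ldi", 2), ("add", 2), ("sts", 4), ("ret", 2)]
```

Formatting the offsets from layout(COMPUTE) as four hex digits gives 0000, 0004, 0006, 0008, and 000c, matching the second column of the listing.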
The second page of the listing shows a table of defined symbols, and then a table of undefined symbols if there are any (in this case, none).
Important: each of these symbols represents a memory address, but we don't know yet what that address is! The two instructions that refer to "val1" and "val2" have a second 16-bit value of 0, but this is temporary: that value will be changed to the memory address represented by the symbol when the linker creates the final executable program.
What's a linker? It is the final step that a compiler or assembler uses to create an executable program. Its job is to connect all the pieces together ("link"), figure out the actual addresses used in the pieces, and then put those real addresses into the code wherever they are needed.
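The address-patching part of that job can be sketched in Python. This is a simplified model: a real linker reads relocation records from the object file, while here they are written out by hand, and the final addresses 0x0100 and 0x0101 are hypothetical values chosen only for illustration. The function name link is also made up.

```python
import struct

# Machine code for compute, copied from the listing (bytes in memory order).
OBJECT_CODE = bytes.fromhex("20910000" "37E1" "230F" "20930000" "0895")

# Hand-written stand-in for relocation records: (byte offset of the
# placeholder word, symbol whose address belongs there).
RELOCATIONS = [(2, "val2"), (10, "val1")]

# Hypothetical final addresses; a real linker decides these.
ADDRESSES = {"val1": 0x0100, "val2": 0x0101}


def link(code, relocations, addresses):
    """Return a copy of code with each placeholder replaced by a real address."""
    patched = bytearray(code)
    for offset, symbol in relocations:
        # Overwrite the 16-bit placeholder, low byte first.
        struct.pack_into("<H", patched, offset, addresses[symbol])
    return bytes(patched)
```

After link(OBJECT_CODE, RELOCATIONS, ADDRESSES), the zero words following the lds and sts opcodes hold the assumed addresses of val2 and val1, and the other instructions are unchanged.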