Machine Instructions and Assembly Code • Jonathan Cook

Machine Instructions and Assembly Language

So, now we know about instructions, addresses, and registers. Let's get to specifics.

The machine instructions look like "01101010" -- they are just a sequence of bits

E.g., the 16-bit word 0001101100010010 ($1B12) means "Subtract register 18 from register 17" to an AVR CPU

But we don't want to program using 1s and 0s, do we?

Actually, that is how computers were first programmed. The programmer entered the machine instruction using switches that were set to 1 or 0.

We use words and abbreviations for each instruction, and then use a program to assemble the words into the machine instructions.

The AVR calls the instruction 000110########## "SUB", short for "Subtract some register from another"; the bits marked "#" designate which registers will be used.
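To make the bit pattern concrete, here is a minimal Python sketch (Python is not part of these notes; it is used only for illustration) that pulls the two register numbers out of a SUB instruction word. It assumes the field layout 0001 10rd dddd rrrr from the AVR instruction set manual, where the d bits select the register being subtracted from and the r bits select the register being subtracted. The function name decode_sub is made up for this example.

```python
def decode_sub(word):
    """Decode a 16-bit AVR word as SUB Rd, Rr; return (d, r) or None."""
    # The top six bits must be 000110 for SUB.
    if (word >> 10) != 0b000110:
        return None
    # Bits 8..4 hold the five-bit register number d.
    d = (word >> 4) & 0x1F
    # The register number r is split: bit 9 is its high bit,
    # bits 3..0 are its low four bits.
    r = ((word >> 5) & 0x10) | (word & 0x0F)
    return d, r


d, r = decode_sub(0b0001101100010010)   # the word $1B12 from the text
print(f"sub r{d}, r{r}")                # prints "sub r17, r18"
```

The ten "#" bits are therefore two five-bit register numbers, which is enough to name any of the AVR's 32 registers.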

We refer to the numerical form of a machine instruction as its opcode, and to the name we give it as its mnemonic.

So, this "language" of words and abbreviations for the machine instructions is called assembly language.

The assembler is the program that reads a program written in assembly language and produces the machine code.

This is not the same as compiling, but the gcc compiler, which we are using, also includes an assembler, so we can use it to assemble our programs.

Each assembly instruction is exactly one machine instruction

Unlike a C/Java assignment statement, which a compiler may have to translate into several machine instructions

The assembly instructions are specific to a type of microprocessor (CPU)

An AVR assembly program cannot be assembled to run on a Pentium

Assembly language also lets us assign names, or symbols, to certain values.

An example AVR assembly language function w/ global data (not a complete program)

#
# Global data (val1 and val2) (new change)
#
    .data
    .comm val1,1
    .global val1
    .comm val2,1
    .global val2

#
# Program code (compute function)
#
    .text
    .global compute
compute: 
    lds  r18, val2
    ldi  r19, 23
    add  r18, r19
    sts  val1, r18
    ret

An assembly language contains:

Mnemonic names for the machine instructions

In the above program, these are: lds, ldi, add, sts, ret

Assembler directives for creating data and specifying addresses: .data, .comm, .global, .text

Symbols that are names for values, such as data values or addresses: val1, val2, compute

Syntax for specifying addressing modes (we'll get to this)

Numerical values: 23

Comments: lines beginning with '#'

NOTE: the assembly language does not contain anything that the microprocessor does not directly support. It is not a high-level programming language!

What an assembler does:

It translates the instruction names into opcodes
It translates the symbols into numerical values
It generates a sequence of bits that is the machine program
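The three steps above can be sketched as a toy encoder in Python. This is not how avr-as is implemented; it only handles the five instructions of the example program, the enc_* function names are invented for this sketch, and the bit layouts are taken from the AVR instruction set manual rather than from these notes. The symbols val1 and val2 are given the value 0 because their addresses are not known at assembly time.

```python
import struct

# Hypothetical toy encoders for the five instructions in the example.
# Each returns a list of 16-bit words.

def enc_lds(d, addr):
    # 1001 000d dddd 0000, followed by a 16-bit address (32 bits total)
    return [0x9000 | (d << 4), addr & 0xFFFF]

def enc_sts(addr, d):
    # 1001 001d dddd 0000, followed by a 16-bit address (32 bits total)
    return [0x9200 | (d << 4), addr & 0xFFFF]

def enc_ldi(d, k):
    # 1110 KKKK dddd KKKK; ldi only works on r16..r31, so d - 16 is stored
    return [0xE000 | ((k & 0xF0) << 4) | ((d - 16) << 4) | (k & 0x0F)]

def enc_add(d, r):
    # 0000 11rd dddd rrrr
    return [0x0C00 | ((r & 0x10) << 5) | (d << 4) | (r & 0x0F)]

def enc_ret():
    return [0x9508]

# Symbols become numbers; these are placeholders until the addresses are known.
symbols = {"val1": 0, "val2": 0}

words = (
    enc_lds(18, symbols["val2"])
    + enc_ldi(19, 23)
    + enc_add(18, 19)
    + enc_sts(symbols["val1"], 18)
    + enc_ret()
)

# Each 16-bit word is stored low byte first.
machine_code = b"".join(struct.pack("<H", w) for w in words)
print(" ".join(f"{b:02X}" for b in machine_code))
# prints "20 91 00 00 37 E1 23 0F 20 93 00 00 08 95"
```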

An assembler listing for the above program

The listing below is produced by running "avr-as -a=compute.lst compute.s", which assembles the file "compute.s" and puts the listing in "compute.lst".

GAS LISTING compute.s          page 1


   1                
   2                #
   3                # Global data (val1 and val2) (new change)
   4                #
   5                    .data
   6                    .comm val1,1
   7                    .global val1
   8                    .comm val2,1
   9                    .global val2
  10                
  11                #
  12                # Program code (compute function)
  13                #
  14                    .text
  15                    .global compute
  16                compute: 
  17 0000 2091 0000         lds  r18, val2
  18 0004 37E1              ldi  r19, 23
  19 0006 230F              add  r18, r19
  20 0008 2093 0000         sts  val1, r18
  21 000c 0895              ret
  22                    

GAS LISTING compute.s           page 2


DEFINED SYMBOLS
                            *COM*:00000001 val1
                            *COM*:00000001 val2
           compute.s:16     .text:00000000 compute

NO UNDEFINED SYMBOLS

This listing shows our program and the resulting machine code and the memory that it takes up. The first column is the line number of the listing (and of the program). The second column is a relative memory address in hexadecimal. These relative addresses begin at 0, but that does not mean that when the program runs it will be at memory address 0! Following the relative address is the machine code that each instruction results in. Most are 16-bit machine instructions, but some are 32-bit. Finally, the program text line is displayed.
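As a check on how to read the listing, the short Python sketch below recomputes the relative addresses from the size of each instruction. It also assumes that the listing prints the machine code bytes in memory order, low byte first, so that "37E1" is the 16-bit word $E137. The helper names relative_addresses and first_word are made up for this example.

```python
# (mnemonic, machine-code bytes copied from the listing)
listing = [
    ("lds", bytes.fromhex("20910000")),
    ("ldi", bytes.fromhex("37E1")),
    ("add", bytes.fromhex("230F")),
    ("sts", bytes.fromhex("20930000")),
    ("ret", bytes.fromhex("0895")),
]

def relative_addresses(entries):
    """Return the starting byte offset of each entry, counting from 0."""
    offsets = []
    offset = 0
    for _, code in entries:
        offsets.append(offset)
        offset += len(code)      # 2 bytes for 16-bit, 4 for 32-bit
    return offsets

def first_word(code):
    """Combine the first two bytes (low byte first) into a 16-bit word."""
    return int.from_bytes(code[:2], "little")

for (name, code), offset in zip(listing, relative_addresses(listing)):
    print(f"{offset:04x}  {name}  ${first_word(code):04X}")
# prints:
# 0000  lds  $9120
# 0004  ldi  $E137
# 0006  add  $0F23
# 0008  sts  $9320
# 000c  ret  $9508
```

The offsets match the second column of the listing: the two 32-bit instructions (lds and sts) each advance the address by 4, the others by 2.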

The second page of the listing shows a table of defined symbols, and then a table of undefined symbols if there are any (in this case, none).

Important: each of these symbols represents a memory address, but we don't know yet what that address is! The two instructions that refer to "val1" and "val2" have a second 16-bit value of 0, but this is temporary: that value will be changed to the memory address represented by that symbol when the linker creates the final executable program.

What's a linker? It is the final step that a compiler or assembler uses to create an executable program. Its job is to connect all the pieces together ("link"), figure out the actual addresses used in the pieces, and then put those real addresses into the code wherever they are needed.
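The address-patching part of the linker's job can be sketched in a few lines of Python. This is a simplification, not a description of the real avr-ld: the relocation table format, the function name link, and the addresses chosen for val1 and val2 are all hypothetical, picked only to show the 0 placeholders being overwritten.

```python
import struct

# Machine code from the listing, with 0 placeholders for val1 and val2.
object_code = bytes.fromhex("2091000037e1230f209300000895")

# Hypothetical relocation table: (byte offset of the placeholder, symbol).
# The lds starts at offset 0 and the sts at offset 8; in both, the
# address is the second 16-bit word of the instruction.
relocations = [(2, "val2"), (10, "val1")]

# Hypothetical final addresses; a real linker chooses these itself.
addresses = {"val1": 0x0100, "val2": 0x0101}

def link(code, relocs, addrs):
    """Return a copy of code with each placeholder replaced by an address."""
    linked = bytearray(code)
    for offset, symbol in relocs:
        # Addresses are stored low byte first, like the instruction words.
        struct.pack_into("<H", linked, offset, addrs[symbol])
    return bytes(linked)

linked = link(object_code, relocations, addresses)
print(" ".join(f"{b:02X}" for b in linked))
# prints "20 91 01 01 37 E1 23 0F 20 93 00 01 08 95"
```

Only the two placeholder words change; the opcodes produced by the assembler are left exactly as they were.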