CS 273: Compiling, Linking and Loading Programs

This page discusses how we go about translating a program from a high level language (like C or Java) into machine code, and then getting that machine code into a runnable state.

The main points we'll make are:

  1. Compilation translates a program in a high level language into assembly code.
  2. Assembly translates a program in assembly code into machine (also known as object) code.
  3. The differences between a high level language and assembly language are
    1. As the name implies, high level languages work with concepts at a "higher" level. For instance, a high level language has for and while loops, while assembly code has only branches (though the low-level operations an assembly language provides are primarily intended to support the concepts of high level languages).
    2. High level languages are intended to be portable across multiple instruction sets; an assembly language is just a human-readable representation of a single instruction set.
    3. A statement in a high level language generally compiles into more than one statement in assembly code.
    4. A statement in assembly code generally assembles into exactly one machine code instruction.
  4. Linking combines separately compiled (and assembled) code modules with system libraries to produce executable code. While that's important in many environments, it won't be used in ours.
  5. Loading is putting the final machine-code program in the memory of the computer so it can be executed.

Note: some compilers skip the assembly step and compile directly to object code.

Compilation

Compiling a program translates it from the high level language into assembly code. We'll show the general idea in terms of a simple program that just adds some numbers together. Here's the program, in C:

// tiny example program that adds up the numbers 0+1+2+3+4
#include <stdio.h>
char count;
char sum;
int main() {
    sum = 0;
    count = 0;
    while (count < 5) {
        sum = sum + count;
        count = count + 1;
    }
    printf("sum = %d\n", sum);
}

The compilation process converts this code into assembly code. We'll talk more about just how this works later (and CS370 will talk about it a lot!). For now, since we'll be hand-compiling, we can just think in these terms: for every statement in the C code, we need to find some corresponding statements in assembly code. Here's the assembly code we get, with the corresponding C code appearing as comments.

*// tiny example program that adds up the numbers 0+1+2+3+4
*#include <stdio.h>
* variables
        org 0
count   rmb     1       * char count;
sum     rmb     1       * char sum;

* code
        org     $f800
main
*                       * int main() {
        ldaa    #0      *     sum = 0;
        staa    sum

        ldaa    #0      *    count = 0;
        staa    count

loop
        ldaa    count   *     while (count < 5) {
        cmpa    #5
        bge     out

        ldaa    sum     *        sum = sum + count;
        adda    count
        staa    sum

        ldaa    count   *        count = count + 1;
        adda    #1
        staa    count

        bra     loop    *    }
*                       *    printf("sum = %d\n", sum);

out     bra     out
        
        end     main    * }
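
To make the statement-by-statement correspondence concrete, here is a rough sketch of the same loop written in C using only assignments, one comparison, and goto, so that each line matches one group of assembly instructions above. The function name sum_with_branches and the variable a (standing in for accumulator A) are made up for this illustration; this is not what a real compiler emits.

```c
/* Hypothetical illustration: the while loop from main(), rewritten with
   explicit branches so each line mirrors the hand-compiled assembly. */
signed char sum_with_branches(void) {
    signed char a;              /* stands in for accumulator A */
    signed char count, sum;

    a = 0; sum = a;             /* ldaa #0    / staa sum   */
    a = 0; count = a;           /* ldaa #0    / staa count */
loop:
    a = count;                  /* ldaa count              */
    if (a >= 5) goto out;       /* cmpa #5    / bge out    */

    a = sum;                    /* ldaa sum                */
    a = (signed char)(a + count);   /* adda count          */
    sum = a;                    /* staa sum                */

    a = count;                  /* ldaa count              */
    a = (signed char)(a + 1);   /* adda #1                 */
    count = a;                  /* staa count              */

    goto loop;                  /* bra loop                */
out:
    return sum;                 /* the assembly just spins at out */
}
```

Calling sum_with_branches() returns 0+1+2+3+4. Notice that the loop test is inverted: the C code says "keep going while count < 5", but the branch says "leave when count >= 5".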

Assembly

The assembly process converts the assembly code we got in the last step into machine code (note: some compilers skip the assembly step, and convert directly to machine code). If we assemble this code, using the command

as11 sum.asm -l > sum.lst

we get two new files as a result: one is called sum.lst, and the other is sum.s19. Of the two, sum.lst is more important to us, and sum.s19 is more important to the computer. Let's take a look at sum.lst:


  Assembling sum.asm
0001                              *// tiny example program that adds up the numbers 0+1+2+3+4
0002                              *#include <stdio.h>
0003                              * variables
0004 0000                                 org 0
0005 0000                         count   rmb     1       * char count;
0006 0001                         sum     rmb     1       * char sum;
0007                              
0008                              * code
0009 f800                                 org     $f800
0010                              main
0011                              *                       * int main() {
0012 f800 86 00                           ldaa    #0      *     sum = 0;
0013 f802 97 01                           staa    sum
0014                              
0015 f804 86 00                           ldaa    #0      *    count = 0;
0016 f806 97 00                           staa    count
0017                              
0018                              loop
0019 f808 96 00                           ldaa    count   *     while (count < 5) {
0020 f80a 81 05                           cmpa    #5
0021 f80c 2c 0e                           bge     out
0022                              
0023 f80e 96 01                           ldaa    sum     *        sum = sum + count;
0024 f810 9b 00                           adda    count
0025 f812 97 01                           staa    sum
0026                              
0027 f814 96 00                           ldaa    count   *        count = count + 1;
0028 f816 8b 01                           adda    #1
0029 f818 97 00                           staa    count
0030                              
0031 f81a 20 ec                           bra     loop    *    }
0032                              *                       *    printf("sum = %d\n", sum);
0033                              
0034 f81c 20 fe                   out     bra     out
0035                                      
0036 f81e                                 end     main    * }


Number of errors 0
Number of warnings 0

You can see that this is exactly our assembly code, and also the machine code it translates into.
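
The second byte of each branch instruction in the listing (0e, ec, fe) is a signed 8-bit offset, counted from the address of the instruction that follows the branch. The sketch below shows that arithmetic in C; the function name branch_offset is hypothetical, and it assumes every branch is two bytes long, as all the branches in this listing are.

```c
#include <stdint.h>

/* Hypothetical helper: compute the offset byte for a 2-byte relative
   branch located at branch_addr that jumps to target_addr.
   Returns the offset as 0..255, or -1 if the target is out of range. */
int branch_offset(uint16_t branch_addr, uint16_t target_addr) {
    /* The offset is relative to the next instruction, 2 bytes later. */
    int32_t delta = (int32_t)target_addr - ((int32_t)branch_addr + 2);

    if (delta < -128 || delta > 127) {
        return -1;              /* does not fit in a signed byte */
    }
    return (int)(delta & 0xff); /* two's complement encoding */
}
```

For example, bge out sits at $f80c and out is at $f81c, so the offset is $f81c - $f80e = $0e. The backward branch bra loop at $f81a gives a negative offset, which shows up as $ec.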

Linking

Think about a program in C. Just about every C program ever written uses the printf() function. But where does it come from? It certainly isn't in your program. The answer is that it is in a system library. Adding that function (and all the other system functions) into your program so it can be used is called linking the program.

For large programs, you don't want to have to compile the whole thing every time you make a change. So, instead of the program being one huge file, it's a bunch of smaller files which are compiled separately. Linking also stitches all this code together.

In the case of the HC11, the total amount of memory available isn't large enough to make separate assembly worthwhile. So, there is no linking step for this processor.

Loading

Finally, the generated machine code has to be loaded into the computer's memory so it can be executed — that's what the S19 file is for. It isn't intended to be read by human beings; it's just something that's really easy for a computer program to read so it can be loaded into memory. Just for grins, we can take a look at the S19 file generated for our summation program:

S121F8008600970186009700960081052C0E96019B00970196008B01970020EC20FEA8
S903F80004

This is called an S19 file because every line starts with either S1 or S9. S1 lines contain code; the S9 line at the end tells where the program should start executing. No, you don't need to actually know the file format!
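
If you are curious what a loader does with an S1 line, here is a minimal sketch in C. It assumes the usual layout of an S1 record (two hex digits of byte count, a four-digit address, the data bytes, then a checksum chosen so that the low byte of the sum of all those bytes is $FF). The names hex_digit, hex_byte and load_s1_record are made up for this example; this is not the code of our simulator or downloader.

```c
#include <stdint.h>
#include <string.h>

/* Convert one hex digit to its value, or -1 if it is not a hex digit. */
static int hex_digit(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Convert two hex digits at s to a byte value, or -1 on bad input. */
static int hex_byte(const char *s) {
    int hi = hex_digit(s[0]);
    if (hi < 0) return -1;
    int lo = hex_digit(s[1]);
    if (lo < 0) return -1;
    return hi * 16 + lo;
}

/* Hypothetical loader: copy the data bytes of one S1 record into mem,
   which must be a 64K array indexed by address.
   Returns the number of data bytes loaded, or -1 on any error. */
int load_s1_record(const char *line, uint8_t *mem) {
    if (line[0] != 'S' || line[1] != '1') return -1;

    /* Byte count covers address (2) + data + checksum (1). */
    int count = hex_byte(line + 2);
    if (count < 3) return -1;
    if (strlen(line) < (size_t)(4 + 2 * count)) return -1;

    int bytes[255];
    unsigned sum = (unsigned)count;
    for (int i = 0; i < count; i++) {
        int b = hex_byte(line + 4 + 2 * i);
        if (b < 0) return -1;
        bytes[i] = b;
        sum += (unsigned)b;
    }

    /* With the checksum byte included, the low byte must be 0xFF. */
    if ((sum & 0xff) != 0xff) return -1;

    unsigned addr = ((unsigned)bytes[0] << 8) | (unsigned)bytes[1];
    int ndata = count - 3;
    for (int i = 0; i < ndata; i++) {
        mem[(addr + (unsigned)i) & 0xffff] = (uint8_t)bytes[2 + i];
    }
    return ndata;
}
```

Given the S1 line above, this would place 30 bytes in memory starting at $f800, which are exactly the machine code bytes shown in sum.lst.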

Both our simulator and the downloader that loads code onto our computers for this class understand the S19 format. The tksim11 frontend to the simulator also understands the .lst files, so it can display the current line being executed.


Last modified: Wed Jan 28 13:06:47 MST 2004
Last modified: Fri Sep 4 08:39:56 MDT 2009
