PL: Semantics of PLs • Jonathan Cook

From old CS 471 notes…

Semantics of Programming Languages

Already have seen static semantics, which are constraints on the behavior of the program that can be evaluated by the compiler.

Now we address the “pure” dynamic semantics of a programming language, which determine what our program will actually compute as it runs.

Semantics are important!

Knowing what will happen when you use some construct in your programming language is of utmost and critical importance for actually creating a program that will do what you expect!

Semantics are rarely described formally!

Full semantics of a programming language are so hard to describe that it is most often done informally, assuming that the programmer will be able to “just get it”.

Typical Classes of Semantics

Operational, Axiomatic, and Denotational

The Wikipedia page on formal PL semantics is useful, and has good links.

Operational Semantics

Essentially, formally describe the operation of each language construct in another language. The other language should generally be a simpler language that encodes the operation of a simpler computational machine.

Sometimes the other language is English: in this case, just informally describe the operation of each language construct.

Eventually, a simple machine must be described mathematically, or just left as informally understood because it is so simple.

Worst informal case: “Here’s my compiler/interpreter, the language does whatever the compiler/interpreter produces for it. If you want a more precise definition, look at the source code for the compiler/interpreter.”

Axiomatic Semantics

Essentially, provide mathematical rules (axioms) for each language construct detailing what postconditions hold after the construct is executed, given some assumed preconditions.

Was developed to prove the correctness of individual programs, but in doing so it establishes the meaning of programming language constructs.

Reasoning over axioms is in reverse: given a postcondition, what is the weakest precondition that will satisfy the postcondition?

For program correctness proofs, start with an assumed final postcondition of the program, work all the way backwards to the beginning of the program. The necessary (weakest) precondition of the program should only be the correct initial values of any static global variables, and the value(s) of the input data.

Instead of writing formulas, I’ll just use the Wikipedia page on Hoare Logic, which is close to the syntax the book uses.

Stickiest issue: loop invariants!

int sum(int vals[], int length)
{
   int i=0; int s=0;
   while (i < length)
   {
      s = s + vals[i];
      i++;
   }
   return s; // post: s = SUM(vals[0]:vals[length-1])
}

Denotational Semantics

Essentially, provide a full semantic description in a functional language, detailing a functional mapping of mathematical objects for each language construct.

What is a functional language?

A pure functional language has no statements that have side effects.

In other words, there is no “memory”, no variables that hold values beyond the context of some expression.

In a functional language, every piece of the program is a function that returns some value, but does not do anything else.

A function can have parameters, but these parameters are read-only, and the only thing a function can do is compute something and return a value.

Some programming languages have been based on this: LISP, Scheme, ML, etc.

In some sense, denotational semantics is similar to operational semantics, except that:

the “simpler” language used is functional; and
the “abstract machine” is mathematics; and
it is completely formal and mathematically exact.

Denotational semantics is “syntax-directed semantics” (quote from book), meaning:

an abstract syntax is given;
all built-in datatypes are given mathematical meaning; and
semantic valuation functions are defined for each syntactic entity and rule.

The semantic meaning of a program is the composition of the valuation functions of all the syntactic rules and entities that occur in the parse tree of the program.

Denotational semantics have often been written in lambda calculus, but we will not discuss this at all.

Denotational semantics are very natural for languages like LISP.

But imperative languages, like C/C++/Java, are harder.

Imperative languages assume a memory store over which statements operate – i.e., variables that contain values.

Statements update the memory store – i.e., they assign new values to variables.

Updating a variable is a side effect.

Main Idea for describing imperative languages in denotational semantics: the entire memory store must be passed as a parameter through valuation functions; for statements that update one or more variables, the functions for these statements return a new store as their return value.

A memory store is simply a mapping from variable names to values.

Variables are not updated, a new memory store is created where the same variable name is mapped to a new value.

Below are abbreviated examples from textbook.

Given a grammar that allows “expr” to be expanded into a number, variable, or +/* binary expression, then valuation function is:

   <num> -> Mnum(<num>,S)
   <var> -> VARMAP(<var>,S)
   <bin_expr> -> if (<bin_expr>.op == '+')
             Mexpr(<bin_expr>.lhs,S) + Mexpr(<bin_expr>.rhs,S)
          else if (<bin_expr>.op == '*')
             Mexpr(<bin_expr>.lhs,S) * Mexpr(<bin_expr>.rhs,S)

given an assignment statement of “var = expr”, the valuation function is:

     newS where foreach v in S,
       if (v.id == <var>.id)
          newS.add(v.id, Mexpr(<expr>,S))
       else
          newS.add(v.id,v.val)

Loop valuation, with “while TEST do BODY” loop:

     if (Mbool(<test>) == false) then S
     else
        Mloop(<test>,<body>,Mstmtlist(<body>,S))

In the above, Mstmtlist is a valuation function that takes an (initial) store and returns a new store that is the result of evaluating all the statements in the statement list (in sequence).