PL: Type Checking, Binding, and Scopes • Jonathan Cook

From old CS 471 notes…

Names, Type Checking, Binding, and Scopes

Names in a Program (Identifiers)

Language designer must consider several design decisions:

Case sensitivity? (usually yes these days)
Allowed characters
Initial allowed character
Length of allowed identifiers
Internal vs. external names
Keywords excluded?

Popular naming syntax: first character is underscore or letter, rest are underscore, letter, or digit; all case sensitive.

What things can be named?

Variables
Procedures
User defined types
Classes
Constants
Macros
Namespaces

Q: What sorts of naming standards are there for each?

Q: Should a language enforce naming standards?

Variables

From textbook: “A variable can be characterized as a sextuple of attributes: (name, address, value, type, lifetime, scope)”

Name: all of the considerations above.

Address: some value which indicates how to find the location of the abstract memory cell that hold’s the variable’s value. It is usually a hardware memory address. It is also called an l-value.

Type: the data type that constrains what values the variable can hold, and the set of operations that are allowed on it.

Value: The contents of the variable’s abstract memory cell. This is also called the r-value.

Scope: Where in the program can the variable be used.

Some scripting languages force the use of a $ sign to “access” the contents of a variable: in essense, the name is the l-value, and $name is the r-value.

Binding

Binding is a generic concept: the establishing of an association between two entities.

For example, binding the type int to the variable x.

The binding time is the point at which the association is established.

From the textbook, for the assignment count = count + 5:

The type of count is bound at compile time;
The set of possible values for count is bound at compiler design time;
The meaning of the operator “+” is bound at compile time, by the determination of its operand types;
The internal representation of 5 is bound at compiler design time;
The [resulting] value of count is bound at execution time;

Bindings are static if they are decided before runtime and do not change during runtime.

Bindings are dynamic if they are decided at runtime or can be changed during runtime.

Type Bindings

Type bindings are probably the most important type of binding! :-)

Static type binding: most common in “regular” programming languages.

Explicit declaration of variable types is also most common in “regular” programming languages.

Fortran allowed some implicit declaration: an undeclared variable beginning with (or only?) I through N is automatically Integer, otherwise it is Real.

Q: does C have any implicit declaration? Does Java?

Perl has semi-implicit: variables beginning with $ are scalar, those beginning with @ are arrays, and those beginning with % are hash structures.

Dynamic type binding: most common in scripting languages and other language paradigms (functional, logic, etc.)

Variables are bound to the type of whatever value they currently hold.

One can think of dynamic variable typing as static value typing. Each value has a statically bound type, which the variable inherits when it is assigned that value.

Static variable typing allows many decisions to be made at compile time and allows efficient machine code to be generated.

Dynamic typing allows programmer flexibility but requires run-time type information to be carried around and checked, and is less reliable because a variable might contain the wrong type of value in some particular execution context.

Dynamically typed languages, if not careful, can experience type shimmering, which drastically hurts performance. E.g.:

i = 1
while (i < 10)
  i++
  str = "the value of i is " + i
  print str
}

In the above, the variable i can shimmer between an integer type and a string type, if the interpreter is not smart enough.

Important note: in programming language theory, classes in OO languages are simply user-defined types. So when thinking “what does this language do with types?” we must also, for an OO language, think “what does this language do with classes and objects?”.

Type Inference

Type inference is the act of inferring or deciding the type of something (variable, expression, return value of function) without the programmer explicitly declaring the type.

Type inference is closely associated with type coercion (or type conversion), where the language decides it can automatically convert a value of one type into a value of another type.

Textbook examples in ML:

fun circumference(r) = 3.14159 * r * r;   // ok, constant forces real

fun timesten(x) = 10 * x;                 // ok, constant forces int

fun square(x) = x * x;                    // not ok, unconstrained

fun square(x):real = x * x;               // ok, func returns real

fun square(x) = (x:real) * x;             // ok, parameter must be real

Type Checking

Type checking is the process of ensuring that operands of operators, parameters, values for assignment, etc., are of the proper (same or “compatible”) type.

Compatible types means that the compiler or runtime system knows how to convert a value of one type into the other type (and that the rules of the language allow it).

Static type checking by the compiler ensures that no runtime type violations can occur.

More dynamic languages need dynamic (runtime) type checking.

Many languages use both.

Strong Typing: the notion that a program in a language can never have a type error (explicitly or implicitly).

A language that is not strongly typed is weakly typed (although there are degrees of weakness) or untyped.

C, C++: weakly typed because unions are not checked, incompatible operands are not always checked. Also variable argument lists.

Java, C#: strongly typed except for user-enforced type casting.

C and C++ also have gobs and gobs of built-in type coercion rules, which is “weak typing” in the sense of allowing too much and possibly causing a program error.

Type Equivalence

Q: When are the types of two variables compatible – i.e., directly substitutable without conversion? This is type equivalence.

Sounds easy…

Name type equivalence says that two variables have equivalent types if they are defined in the same declaration or declared using the same type name.

Structural type equivalence says that two variables have equivalent types if their types have equivalent structures.

Q: Which does Java use?

Q: Which does C use?

Storage Bindings and Lifetime

Every variable has a specific lifetime and a specific mechanism that allocates and deallocates its storage.

Programs typically have three areas for variable allocation: a static memory area, the stack, and the heap.

Static variables are bound to storage before the program begins and retain that storage until the program ends.

Stack dynamic variables are bound to storage on the stack during some point of program execution, and unbound at a point where the program execution pops that portion of the stack off.

Explicit heap dynamic variables are bound to heap storage when the program explicitly requests allocation for the variable, and are unbound in one of two ways: explicit de-allocation by the program, or automatic garbage collection.

Implicit heap dynamic variables are bound to heap storage only when assigned values. Popular in scripting languages. Does this occur in Java? C++?

Scope

The scope of a variable is the area of code statements where the variable is visible and can be referenced.

For most code this is “easy” and obvious, but many languages have constructs such as nested procedures, nested classes, inheritance visibility modifiers, etc. that can make for some difficult cases.

Static scoping: scope of every variable can be determined at compile-time.

Nested declared procedures: the enclosing procedure is the static parent of the enclosed procedure. Transitively, static ancestors.

Many languages also allow arbitrary block scoping: i.e., you can declare a variable within any compound statement block, such as the body of a loop.

C++ and Java allows variable declarations anywhere. Q: What are the scope rules for a variable declared in the middle of a method?

Dynamic scoping means that at least some variable scoping and accessibility must be decided at runtime.

Example: rather than “x” referring to a variable in the static ancestor, it will refer to a variable in a dynamic ancestor – that is, an active procedure on the call stack.

Not very popular these days.

OO concepts employ some sense of dynamic scoping, although it is restricted and can be type-checked at compile time.

Scope and variable lifetime are usually “similar”, but not always.

When a procedure is called, the caller’s local variables go out of scope, but they are still alive (until the caller finishes).

Modifiers like static in C/C++/Java change the lifetime of a variable but not the scope.

Also object data members can be public (global scope) but lifetime is the object lifetime.

The referencing environment of a statement in a program is the set of all variables that are accessible by the statement.

Named constants and const modifiers

Book says “a named constant is a variable that is bound to a value only once.”

Ok, but that means it is not variable. Plus, compilers do not always need to treat it as such (i.e., allocate space, etc.)

Typical constants are declared with some fixed value in the program text.

Some languages allow dynamic binding of constants – in other words, the constant value is computed at run time, and can involve an expression that uses variables. C++ const and Java final allow this.

C++ also allows const to be used as an interface specification. It declares that a method will not modify the parameter that is qualified as const.

OO visibility modifiers

public: member scope is whole program

private: member scope is the class only

protected: member scope is the class and all subclasses