Text: Handouts...
The Java Modeling Language: at http://www.cs.iastate.edu/~leavens/JML//index.shtml
A Formal Methods collection page
http://findbugs.sourceforge.net/
We wrote a program. We think we know what it does. We might not be right. It might have bugs. What do we do?!?!??!
Test it. and test it. and test it. and test it.
But testing proves correctness for only a very few concrete examples.
Ok, well, there's all these machines sitting idle. Let's just create a framework that automatically generates the next unique input, and runs the program on it. That way we'll run all possible executions!
How many bits of input do most programs have?
Let's say 10 32-bit integers = 320 bits. This means 2^320 different executions.
There's maybe about 10^78 atoms in the universe. Another way of saying this is there's about 2^259 atoms in the universe. And you think you're going to run a program 2^320 different times?!?
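A quick check of this arithmetic, as a small sketch. The 10^78 figure is just the rough estimate used in these notes, and the variable names are made up for illustration.

```python
import math

input_bits = 10 * 32            # ten 32-bit integers
executions = 2 ** input_bits    # one execution per distinct input
atoms = 10 ** 78                # rough estimate used in these notes

atom_bits = math.log2(atoms)    # exponent e such that 2**e is about `atoms`
print(f"atoms ~ 2^{atom_bits:.1f}")
print(f"executions per atom ~ 2^{input_bits - atom_bits:.1f}")
```

Even with one machine per atom, each machine would still have to run the program roughly 2^61 times.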
Besides, we still have the halting problem -- how do you know whether a given execution is stuck in an infinite loop or just taking a long time to compute?
Rather than trying to run the program over all inputs, look at the program and decide statically that it "works".
"works" generally translates into "the program has some property X".
Hopefully you can check property X in a reasonable amount of time.
These analyses are also called formal methods, formal verification, and others.
The "hello world" example of static analysis: type checking
Property: Every variable Vi of type Ti only holds values of type Ti.
Compiler knows about all types, and knows the allowed conversions between types.
If your program violates the property, the compiler issues you an error message and refuses to compile your program.
Ok, that may sound easy....
but what if your language didn't require you to declare variable types?
or what if you wanted to verify that a valid date string was always displayed to the user?
or a variable always contained the distance in meters?
Type checking is not always as easy as it is in C.
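int x;
unsigned int y;
for (x=0; x<10; x++)
    do_something;
y = x;
Does C care about the above example? Yes! Assigning a signed int to an unsigned int is a sign conversion (most compilers only warn about this, and some only when asked to, but it is flagged as a potential "error").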
But it is correct, no?
This easy example shows that static analyses tend to be sound and conservative
Sound: if the SA says the property holds, it does.
Conservative: if the SA says the property does not hold, it still might.
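In statistical lingo (taking "positive" to mean "the SA said the property holds"), this means there will never be false positives (said true but it really was false), but there might be false negatives (said false but it really was true). (We can also say we allow type II errors but not type I.) Beware that many bug-finding tools use the opposite convention, where a "false positive" is a spurious warning.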
Why? Because the static analysis must consider a simplified version of the program, and because of this it loses precision and thus becomes an approximation. The whole trick is to have a safe approximation.

Abstraction always hides some detail and thus loses some information, but it can help make a problem tractable. That's why we use it!
Conservative can be viewed as pessimistic: if I don't know for sure that something bad cannot happen, then I assume it will!
p0 int x;
p1 x = input();
p2 if (x % 2) {
p3 x = x - 1;
}
p4 x = 3 * x;
p5 output(x);
Property: Output of this program is always an even integer
Solution: rather than run program over integers, run program over a binary type E which has two values, even and odd.
Must know that int -> E (or E -> Int?)
This is called abstract interpretation. Exhaustively "running" the resulting small model is also the basic idea behind model checking.
Now we have just two possible input values, and so "testing" the program is easy:
Yes! our program only outputs even numbers.
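The two abstract runs can be sketched as follows. This is a minimal illustration, not a real abstract interpreter: the names alpha, is_odd, minus_one, times_three and abstract_run are invented here, and the abstract operators are written by hand for exactly the operations the program uses.

```python
EVEN, ODD = "even", "odd"

def alpha(n):
    """Abstraction: map a concrete int to its parity."""
    return EVEN if n % 2 == 0 else ODD

def is_odd(p):         # abstract version of the test (x % 2)
    return p == ODD

def minus_one(p):      # abstract version of x - 1: parity flips
    return ODD if p == EVEN else EVEN

def times_three(p):    # abstract version of 3 * x: parity is preserved
    return p

def abstract_run(x):
    """Run the program p0..p5 over parities instead of integers."""
    if is_odd(x):
        x = minus_one(x)
    return times_three(x)

results = {x: abstract_run(x) for x in (EVEN, ODD)}
print(results)  # {'even': 'even', 'odd': 'even'}
```

Both abstract inputs produce an even output, and every integer maps to one of the two abstract inputs, so the claim covers all runs rather than two sample runs.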
How is this different than testing on inputs 2 and 3?
Testing shows correct operation on two concrete runs. You must extrapolate from there.
Abstract interpretation declares that for all runs, the property holds.
But is there a catch?
Yes. abstract interpretation also had to make some assumptions. What are they?
Suppose the if condition was (x % 3)
....
Whoa, this seems unanalyzable under the "parity" datatype!
But really we just need to define (x % 3) -> "any", because here the even-ness of x does not decide the true/false-ness of the expression (think about what 13, 15, 28, and 30 do).
So now under our parity abstract interpretation, we must follow both paths. So we have:
No! our program sometimes outputs odd numbers!
The important concept here is that when we say no, we have a counter-example to back it up. The analysis can show us the path (or paths) taken that produced the odd output.
If our formalisms are reversible, we can concretize the counter-example by mapping back to our real datatypes -- i.e., choosing an integer that produces the counter-example.
Almost all static analysis tools can print out a counter-example when they report that your property doesn't hold.
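A sketch of the (x % 3) case under stated assumptions: the condition is treated as "any", so both branches are followed for each parity, and an abstract counter-example is concretized by a brute-force search over small integers. The function names and the search range are made up for illustration; real tools concretize with a solver rather than by enumeration.

```python
EVEN, ODD = "even", "odd"

def flip(p):
    return ODD if p == EVEN else EVEN

def abstract_paths(x):
    """Follow both branches of `if (x % 3)`, since parity cannot decide it.

    Returns a list of (branch_taken, output_parity) pairs.
    """
    taken = flip(x)      # x = x - 1 flips parity; 3 * x then preserves it
    skipped = x          # only 3 * x, which preserves parity
    return [(True, taken), (False, skipped)]

def concrete_run(n):
    """The concrete program with the (x % 3) condition: (branch, output)."""
    branch = n % 3 != 0
    if branch:
        n = n - 1
    return branch, 3 * n

def concretize(x, branch, candidates=range(0, 100)):
    """Search for an integer with parity x that takes the given branch."""
    for n in candidates:
        parity = EVEN if n % 2 == 0 else ODD
        if parity == x and concrete_run(n)[0] == branch:
            return n
    return None

counterexamples = []
for x in (EVEN, ODD):
    for branch, out in abstract_paths(x):
        if out == ODD:
            counterexamples.append((x, branch, concretize(x, branch)))
print(counterexamples)  # [('even', True, 2), ('odd', False, 3)]
```

Input 2 takes the branch and outputs 3; input 3 skips it and outputs 9. Both are real odd outputs, so here the abstract "no" is backed by concrete runs.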
Testing instantiates a concrete platform and some concrete inputs.
E.g., for an i586 (15.1.2) processor running Linux v2.4.25 (compiled with gcc v2.95.3 20010315 (release)), and my application program compiled with gcc v3.2.1, and with input 2, my program prints an even integer.
Static analysis generally assumes an infinitely correct execution platform for the program.
A mapping between datatype "int" and datatype "parity"
An understanding of the control flow of the program
An understanding of the data flow of the program
A formal definition of the modulus operator -- not only that, but a formal definition of the "mod by 2" operator
A formal definition of the subtract and multiply operators (esp. subtract 1 and multiply by 3, as used in the program)
The operator defs above are needed for the "parity" datatype, not just the "int" datatype.
A CFG is a graph representation of the control flow of the program.
A node in the graph can represent either a basic block or, at a finer grain, an individual statement (or expression).
A directed edge between two nodes means that control can proceed from the source node to the sink node.
int n,m;
1 n = input();
2 m = 1;
3 while (n > 1) {
4 m = m * n;
5 n = n - 1;
}
6 output(m);
Goal: Find out which values can be used where
Each node in the CFG may define some variables and may use some variables.
Define (we'll use def) means that a new value is stored in a variable.
A variable X has def nodes (nodes in the CFG where it is assigned to) and use nodes (nodes in the CFG where it is read).
A def(X) is live at (reaches) a use(X) iff there is a path on the CFG from the def(X) node to the use(X) node that does not contain any other def(X) nodes.
A def(X) is killed at another def(X) iff there is a path on the CFG from the first def(X) node to the second def(X) node (with no intervening def(X) nodes).
In the program above, for use(n,5), def(n,1) is live, and so is def(n,5) (around the loop).
CFG can be augmented with dataflow dependence edges, where each edge represents a reaching definition to a use.
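The def/use definitions can be made concrete with a small reaching-definitions sketch over the numbered program. The CFG, def sets and use sets below are transcribed by hand from the six numbered statements; the function name and the data layout are assumptions for illustration, not any particular tool's representation.

```python
# CFG of the numbered program: node -> successor nodes.
succ = {1: [2], 2: [3], 3: [4, 6], 4: [5], 5: [3], 6: []}
defs = {1: {"n"}, 2: {"m"}, 4: {"m"}, 5: {"n"}}        # variables written
uses = {3: {"n"}, 4: {"m", "n"}, 5: {"n"}, 6: {"m"}}    # variables read

def reaching_definitions(succ, defs):
    """Iterate to a fixed point; returns node -> set of (var, def_node) at entry."""
    preds = {n: [] for n in succ}
    for n, targets in succ.items():
        for t in targets:
            preds[t].append(n)
    reach_in = {n: set() for n in succ}
    reach_out = {n: set() for n in succ}
    changed = True
    while changed:
        changed = False
        for n in succ:
            new_in = set().union(*(reach_out[p] for p in preds[n]))
            killed = defs.get(n, set())
            # A def at n kills every other def of the same variable.
            new_out = {(v, d) for (v, d) in new_in if v not in killed}
            new_out |= {(v, n) for v in killed}
            if new_in != reach_in[n] or new_out != reach_out[n]:
                reach_in[n], reach_out[n] = new_in, new_out
                changed = True
    return reach_in

reach_in = reaching_definitions(succ, defs)
def_use = sorted((v, d, u) for u, used in uses.items()
                 for (v, d) in reach_in[u] if v in used)
for var, d, u in def_use:
    print(f"def({var},{d}) reaches use({var},{u})")
```

Each printed pair corresponds to one data dependence edge that could be added to the CFG.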
[Figure: the CFG of the program above, augmented with data dependence edges]
If we remove the control flow edges from the above graph, we then have a graph showing the data dependencies.
[Figure: the data dependence graph of the same program, with the control flow edges removed]
If we then add control dependence edges, we have a program dependence graph, or PDG.
[Figure: the program dependence graph (PDG) of the same program]
A control dependence edge goes from a decision node to a node whose execution depends on the outcome of that decision node.
What is this good for?
Dead code elimination, Clone detection, automatic parallelization, ...
Property verification? Certain kinds...
A big security issue is information flow: Can information "leak" out of this system?
Slicing
int n,m,*p;
1 n = input();
2 m = 1;
A p = &m;
3 while (n > 1) {
4 m = m * n;
5 n = n - 1;
B *p = n+1;
}
6 output(m);

int n,m,*p;
1 n = input();
2 m = 1;
A p = &n;
3 while (n > 1) {
4 m = m * n;
5 n = n - 1;
B *p = n+1;
}
6 output(m);

int n,m,*p;
1 n = input();
2 m = 1;
A p = random();
3 while (n > 1) {
4 m = m * n;
5 n = n - 1;
B *p = n+1;
}
6 output(m);

int n,m,*p;
1 n = input();
2 m = 1;
A p = &m + 4;
3 while (n > 1) {
4 m = m * n;
5 n = n - 1;
B *p = n+1;
}
6 output(m);
What are the def-use chains for these programs?
What are the PDG's?
How do you know?
Analyzing a program to decide where pointers can point to is called a points-to analysis.
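To see why the answer matters for def-use chains, here is a deliberately tiny sketch that only understands the four variants of line A above. String matching stands in for a real points-to analysis, and the names points_to, may_define and UNKNOWN are invented for illustration. The point is the conservative fallback: when the analysis cannot tell where p points, line B (*p = n+1) must be treated as a possible def of every variable.

```python
VARIABLES = {"n", "m", "p"}
UNKNOWN = object()   # stands for "p could point anywhere"

def points_to(statement_a):
    """May-point-to set for p after line A (handles the four variants only)."""
    rhs = statement_a.split("=", 1)[1].strip().rstrip(";")
    if rhs in ("&m", "&n"):
        return {rhs[1:]}
    return UNKNOWN   # random(), pointer arithmetic: give up conservatively

def may_define(statement_a):
    """Variables that line B (*p = n+1) may write, given line A."""
    targets = points_to(statement_a)
    return set(VARIABLES) if targets is UNKNOWN else targets

for a in ("p = &m;", "p = &n;", "p = random();", "p = &m + 4;"):
    print(a, "->", sorted(may_define(a)))
```

With p = &m, line B becomes a def of m that reaches the use at line 4; with p = &n, it becomes a def of n that reaches the loop test.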
Pythagorean Theorem: A^2 + B^2 = C^2
Facts:
Can a computer prove the Pythagorean theorem?
Many static analyses make use of a theorem proving engine or some other solver behind the scenes. (we're not going to talk about how that is done)
The theorem is a formulation of the expected property. The facts are derived from your program (or a model of it).
Often, the theorem prover uses a negation of your theorem (e.g., see the ESC paper). ESC claims this gives a better error context.
Most "interesting" analyses are NP-hard or worse; many are undecidable in general.
Created by mathematicians, for mathematicians
User had to be fluent in some arcane formal language
Even smart, fluent users often got formulas wrong
Had to specify everything (adopt it all or none)
Make FV as easy as using your compiler
Allow incremental adoption of FV benefits
Do as much as possible with no user specifications
Handle partial systems (and libraries)
Make property specification languages "familiar"
Drop the requirement of soundness? (ESC paper)
"lint" was a popular tool that checked a bunch of ad-hoc "common" C programming errors. The nice thing was that it was just as automatic as a compiler.
Splint's goal was (and is) the same, but with support for more powerful, formally defined, and extensible property checks.
It uses the Larch theorem prover behind the scenes. (The Larch Home Page)
NOT geared towards checking functional correctness. Assumes you are doing that already through testing.
Splint checks for "hard-to-test" conditions, like memory access violations, accidental aliasing, memory bounds violations, interface violations or mismatched assumptions, etc.
These are the things that you don't see if your program is passing its test cases, but that could easily be latent errors in your program.
It already "knows" a lot about C and the C libraries (i.e. someone has written the specs down for these so you don't have to)
But you can provide annotations to your own code to let Splint check your program better.
Annotations are in the form of stylized comments, using /*@ expr @*/
Does not support C++.
As the paper details, ESC was originally developed for Modula-3
That went over like a lead balloon...so
ESC/Java!, and now ESC/Java2
Similar idea to Splint, but allows functional property verification
Annotations are in comments
Annotations essentially use base PL syntax for expressions. Also extend to allow universal and existential quantification.
Their annotations are based around the idea of design by contract
A procedure (method) requires some conditions to be true when it is called and if they are, it then ensures that some other conditions will be true when it is done. ESC also adds a modifies clause to indicate side effects.
Additionally, classes can have invariants that do not change.
ESC checks the code against these annotations, and also checks the annotations themselves -- i.e., module-module checking
Annotations are familiar to programmers -- kind of like assert()
statements. Ohh, but very different....
ESC's purpose is to find errors, not to prove they don't exist.
"...we don't try to prove that a program does what it is supposed
to do, only to check for certain specific types of errors..."
"...we are free to declare that some kinds of errors are out of
the tool's range."
On success, tool output is "Sorry, can't find any more errors"!
Error list on p24:
These are handled soundly, no? Maybe not always...
modularity introduces unsoundness -- different interpretations
(see paper for details)
Loops are hard! (always!)
Unsound reduction: ESC "ewp" technique only considers computations
that execute the loop 0 or 1 times!
k-limiting loop executions is a well-known static analysis
technique.
Only consider executions (and corresponding inputs) that
exercise the loop up to k times.
Is this unsound?
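A minimal sketch of k-limiting, assuming the factorial-style loop used earlier in these notes and a made-up property (the output m must fit in a byte, m <= 255). It enumerates a few inputs instead of reasoning symbolically, and the function names are hypothetical; the only point is what gets ignored when executions with more than k loop iterations are dropped.

```python
def run_k_limited(n, k):
    """Run the loop; return m, or None when more than k iterations are needed."""
    m = 1
    iterations = 0
    while n > 1:
        if iterations == k:
            return None      # execution lies outside the k-limited analysis
        m = m * n
        n = n - 1
        iterations += 1
    return m

def violations(inputs, k, limit=255):
    """Inputs whose execution stays within k iterations and ends with m > limit."""
    found = []
    for n in inputs:
        m = run_k_limited(n, k)
        if m is not None and m > limit:
            found.append(n)
    return found

inputs = range(0, 10)
print(violations(inputs, k=1))   # []
print(violations(inputs, k=5))   # [6]
```

With k=1 the check reports nothing, even though n=6 already breaks the property: that run needs five iterations, so it was never considered.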
The even/odd example was an example of model checking
Model checking is generally considered to be the exhaustive
simulation of a model of a program to determine if it always
obeys some property.
The model is an abstraction of the program. It should
have a much smaller execution space (but still might be large).
The model is chosen with the property in mind.
E.g., it would be useless to choose the even/odd abstraction
if the property we wanted to confirm was whether the variable x
always contained values in the range {0..10}.
The model generally must be finite-state. Thus, models
must take finite input, or do finite things.
You do!
Well, hopefully not all...
For some common analyses (say, type checking), our tools already
know everything they need to. So you don't have to write anything
extra.
For more specific property checks, you at least need to write
down the formal specification of the property you want.
Hopefully, then, a tool can automatically extract a formal model
of your program that "matches" the property you are checking.
If it can't, then you need to write a model of the system, too!
Ok, I do all that, and my checker says there's an error in my
system. I can't find it! Now what?
A discovered error may not be a bug in your system!
It might be a bug in your property specification.
Or it might be a bug in your system model (esp. if hand-created)
Or it might be a bug in the model extractor (if auto-extracted)
Or it might be a bug in the model checker itself!
What, you mean they didn't verify the verifier!?!?!?
No. Generally speaking, the model checker (a software system itself)
is too big to verify.
Much of this may seem like overkill to you. After all,
most of your programs work.
Indeed, very high coverage can be achieved by
testing...
(let's ignore the low level problems in C/C++ that can
cause many headaches, security breaches, etc., etc.)
Programs with concurrency are the killer app
for model checkers.
Why?
Because it is generally impossible to test all the interleavings
(you don't control them), and the "equivalence classes" are hard
to see (and control).
Many very subtle concurrency problems: deadlock,
livelock, fairness, starvation
A very popular model checker. Find it at
http://spinroot.com/spin/whatispin.html.
Spin takes a model, written in Promela, and possibly
a property, and does a variety of things with it:
Spin has been used to verify (or find bugs in!) published algorithms,
safety-critical software, protocol standards (e.g., TCP/IP), and
many many other applications and research ideas.
An ongoing SPIN Workshop series (1st in 1995, 11th in 2004)
continues to be successful.
Many research papers have been written as: define problem, define
formal semantics, build translator to Promela, have Spin verify.
Or, take industrial system, extract model, translate to Promela,
have Spin verify, report bugs!
Important modes of spin:
Simple example:
The above is a simple concurrent process with a race condition.
There is no attempt to protect the shared variable.
If we "spin" it, it might work and it might not, depending on
the random number seed. To verify it, we "spin -a" it (which
generates the verifier source, pan.c), then "gcc pan.c", then
run "a.out", and we see that it can fail. We then "spin -t -p"
it ("-t" replays the saved error trail, "-p" prints out all
steps) to produce the example that it failed on.
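What the verifier adds over a single simulation run can be imitated in a few lines: enumerate every interleaving of the two processes' steps instead of sampling one. This is only a sketch of the idea, not how Spin is implemented; the step functions model the unprotected inc1/inc2 processes (read v into a local, then write back), and all names are invented for illustration.

```python
def interleavings(a, b):
    """Yield every merge of step lists a and b that keeps each list's order."""
    if not a or not b:
        yield list(a) + list(b)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest

def read1(s):  s["tv1"] = s["v"]        # inc1: tv = v
def write1(s): s["v"] = s["tv1"] + 1    # inc1: v = tv + 1
def read2(s):  s["tv2"] = s["v"]        # inc2: tv = v
def write2(s): s["v"] = s["tv2"] + 2    # inc2: v = tv + 2

def run(schedule):
    state = {"v": 0, "tv1": 0, "tv2": 0}
    for step in schedule:
        step(state)
    return state["v"]

inc1 = [read1, write1]
inc2 = [read2, write2]

schedules = list(interleavings(inc1, inc2))
final_values = sorted({run(s) for s in schedules})
print(len(schedules), final_values)  # 6 [1, 2, 3]
```

Only two of the six schedules end with v == 3, which is why a random simulation may or may not hit the failure.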
An attempted fix:
Here we've made a dumb attempt at a semaphore. Does it work?
Well, again, just running spin as a simulator will maybe show
it working, maybe not. We have to "spin -a" it, then compile
and run the verifier.
When we do this, we see that it still doesn't work. Finally,
we can truly get semaphores right by using atomic:
NOW when we verify it, we see that we have a correct solution.
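The difference between the two semaphore attempts can be reproduced with a small explicit-state search, sketched below under some assumptions. Each process is a list of (guard, action) steps, a blocked step simply waits, byte variables wrap at 256 as in Promela, and the "atomic" version fuses the test (s > 0) and the decrement into one step. The names explore and make_inc are made up, and this is an illustration of exhaustive exploration, not Spin's algorithm.

```python
def explore(procs, init):
    """Return the set of final values of v over all reachable terminal states."""
    finals, seen = set(), set()
    stack = [(tuple(0 for _ in procs), tuple(sorted(init.items())))]
    while stack:
        pcs, frozen = stack.pop()
        if (pcs, frozen) in seen:
            continue
        seen.add((pcs, frozen))
        state = dict(frozen)
        moved = False
        for i, proc in enumerate(procs):
            if pcs[i] == len(proc):
                continue                      # this process has finished
            guard, action = proc[pcs[i]]
            if guard(state):
                nxt = dict(state)
                action(nxt)
                new_pcs = pcs[:i] + (pcs[i] + 1,) + pcs[i + 1:]
                stack.append((new_pcs, tuple(sorted(nxt.items()))))
                moved = True
        if not moved:                         # all done, or nothing can run
            finals.add(state["v"])
    return finals

def make_inc(k, tv, atomic):
    """Build inc1/inc2: acquire s, tv = v, v = tv + k, release s."""
    def always(s): return True
    def sem_free(s): return s["s"] > 0
    def skip(s): pass
    def dec(s): s["s"] = (s["s"] - 1) % 256   # byte arithmetic wraps
    def read(s): s[tv] = s["v"]
    def write(s): s["v"] = (s[tv] + k) % 256
    def inc(s): s["s"] = (s["s"] + 1) % 256
    acquire = [(sem_free, dec)] if atomic else [(sem_free, skip), (always, dec)]
    return acquire + [(always, read), (always, write), (always, inc)]

init = {"v": 0, "s": 1, "tv1": 0, "tv2": 0}
naive = explore([make_inc(1, "tv1", False), make_inc(2, "tv2", False)], init)
fixed = explore([make_inc(1, "tv1", True), make_inc(2, "tv2", True)], init)
print(sorted(naive), sorted(fixed))  # [1, 2, 3] [3]
```

In the naive version both processes can pass the test (s > 0) before either decrements s, so the race on v is still there.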
Logic: reasoning over facts with and, or, not, if-then, forall,
exists, ...
Temporal logic: reasoning about sequences of actions.
See
http://plato.stanford.edu/entries/logic-temporal/
LTL extends regular logic with generally three primitives:
until, always and eventually. Think of the last two as
kind of analogous to forall and exists in the non-temporal logic.
Always (a box, G, or two brackets []) and
eventually (a diamond, F, or two angle brackets <>)
are unary operators.
Until (U) is a binary operator: P Until Q means that P is true
until at some point Q is true.
Strong until: Q must become true. Weak until: Q need never become true
Q: Must P become false when Q becomes true? I believe the correct
answer is no. P U Q says nothing about what P does once Q becomes
true, it only says that P must be true until Q is true. I tested
this in Spin and that is how Spin interprets it. In Spin, P must be
true up to and including the state just before Q becomes true. P
can become false at the same time Q becomes true (tested using "atomic"),
P can become false later, or P can remain true.
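The reading of until described above can be written down directly. The sketch below evaluates the operators over finite traces (lists of states), which is an assumption made for brevity: LTL proper is defined over infinite executions, and the function names are invented here rather than taken from Spin.

```python
def always(p, trace):
    return all(p(s) for s in trace)

def eventually(p, trace):
    return any(p(s) for s in trace)

def strong_until(p, q, trace):
    """p U q: q eventually holds, and p holds in every state before that."""
    for s in trace:
        if q(s):
            return True       # q reached; p is not inspected in this state
        if not p(s):
            return False      # p failed before q became true
    return False              # q never became true

def weak_until(p, q, trace):
    return strong_until(p, q, trace) or always(p, trace)

P = lambda s: s["P"]
Q = lambda s: s["Q"]

# P turns false in the very state where Q turns true.
handoff = [{"P": True, "Q": False}, {"P": True, "Q": False}, {"P": False, "Q": True}]
# P turns false one state too early.
gap = [{"P": True, "Q": False}, {"P": False, "Q": False}, {"P": False, "Q": True}]
# Q never becomes true.
never_q = [{"P": True, "Q": False}, {"P": True, "Q": False}]

print(strong_until(P, Q, handoff))   # True
print(strong_until(P, Q, gap))       # False
print(strong_until(P, Q, never_q), weak_until(P, Q, never_q))   # False True
```

The handoff trace matches the observation above: P may become false at the same moment Q becomes true and P U Q still holds.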
A nice overview of operators and properties at
http://www.cs.dartmouth.edu/~doug/cs88/ltl.pdf
Some other operators sometimes are added as user-friendly extensions.
Very often a next (X) operator.
Since LTL extends regular logic, basic logical operators are
still available.
Traffic light behavior: atoms: nsgreen, nsred, ewgreen, ewred
In LTL, atoms are time-varying true/false values rather
than fixed.
NS and EW should never be green at the same time:
This is just simple mutual exclusion. Should also exclude own
lights:
By duality (the temporal analogue of DeMorgan's Laws), []!p and !<>p are equivalent, and so are <>!p and ![]p
What if we wanted to ensure that one of them was always green?
If a light turns green, it should eventually turn red:
The above is a typical construct for a request-response action.
This is also getting close to fairness (or liveness?):
Add yellow: a light cannot turn directly from red to green
And yellow must intervene at least for one state:
How to say that yellow must be on for three states?
Spin supports checking LTL properties
Spin LTL grammar:
Spin optionally has a next operator if you compile for it
LTL properties are translated into Promela code, and then
executed along with your system
LTL property translation is into what Spin calls "never-claims",
and they are supposed to represent what should not happen.
Thus, the negation of your LTL formula is what you need to use
to create the "never-claim" -- if your formula represents
a property you want the system to have
Use spin -f "LTL-formula" > LTL-file for
producing the LTL Promela code
For our add-to-three system: <>(v==3) OR []<>(v==3)?
Spin shortcut for always-eventually reaching some state(s):
progress labels.
Java Pathfinder (http://ase.arc.nasa.gov/visser/jpf/) translates
Java to Promela.
The Bandera project (http://bandera.projects.cis.ksu.edu/)
is a more Java-centric model checking framework.
A description of Ada-to-Promela is at
http://www.cis.ksu.edu/~dwyer/ada-modelcheck/
ACL2, a theorem prover used to verify systems, many of them hardware.
Uppaal, another model checker.
It supports real-time specifications.
NuSMV, an open source
migration of the long-standing SMV model checker.
A
Formal Methods collection page
ESC claims it is unsound...how?
Model Checking
Who writes all these formal things down???
Where's the error??
What are the good problems for model checking?
The SPIN Model Checker
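/* Simple example: two processes update the shared variable v
   with no protection (race condition). */
byte v;

proctype inc1 ()
{
    byte tv;
    tv = v;        /* read shared v into a local copy */
    v = tv + 1;    /* write back; the other process may have run in between */
}

proctype inc2 ()
{
    byte tv;
    tv = v;
    v = tv + 2;
}

init {
    assert (v==0);
    atomic {       /* start both processes together */
        run inc1();
        run inc2();
    }
    timeout -> assert (v==3);    /* runs once nothing else can execute */
}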
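/* Attempted fix: s is used as a semaphore, but the test (s > 0)
   and the decrement s-- are separate steps. */
byte v;
byte s;

proctype inc1 ()
{
    byte tv;
    (s > 0);       /* block until the semaphore looks free */
    s--;           /* another process can slip in before this runs */
    tv = v;
    v = tv + 1;
    s++;           /* release */
}

proctype inc2 ()
{
    byte tv;
    (s > 0);
    s--;
    tv = v;
    v = tv + 2;
    s++;
}

init {
    s = 1;
    assert (v==0);
    atomic {
        run inc1();
        run inc2();
    }
    timeout -> assert (v==3);
}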
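/* Correct version: the test and the decrement of s are done
   together inside atomic. */
byte v;
byte s;

proctype inc1 ()
{
    byte tv;
    atomic {       /* test-and-decrement as one indivisible step */
        (s > 0);
        s--;
    }
    tv = v;
    v = tv + 1;
    s++;           /* release */
}

proctype inc2 ()
{
    byte tv;
    atomic {
        (s > 0);
        s--;
    }
    tv = v;
    v = tv + 2;
    s++;
}

init {
    s = 1;
    assert (v==0);
    atomic {
        run inc1();
        run inc2();
    }
    timeout -> assert (v==3);
}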
LTL: Linear Time Temporal Logic
Always (Not (nsgreen And ewgreen))
[]!(nsgreen && ewgreen)
([]!(nsgreen && nsred)) && ([]!(ewgreen && ewred))
[](nsgreen || ewgreen)
Always (ewgreen -> Eventually ewred) [and same for NS]
[](ewgreen -> <> ewred)
[](<>ewgreen && <>nsgreen)
Always (ewred -> Not Next ewgreen) [and same for NS]
[] (ewred -> ! X ewgreen)
[] (ewred -> (ewred U ewyellow) U ewgreen)
[] (ewred -> (ewred U ewyellow) U ewgreen)
Spin and LTL
Spin LTL formulas
Links