Dynamic Analysis: Program Monitoring and Visualization

lecture #1 began here

Program execution monitoring is the collection and analysis of information about program executions. It is the yang to the yin of dynamic analysis, and the two terms are often used synonymously. If there is a difference, it is that dynamic analysis is concerned with what execution information means (that is, what we should look for), while program monitoring is concerned with how to collect information about a running program. A reasonable analogy might be drawn with declarative and imperative programming.
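As a rough illustration of the "how" side, the following Python sketch collects execution information automatically through the interpreter's own tracing hook, `sys.settrace`. That hook calls a user function on call, line, return, and exception events. The target `fib` and the helper `monitor` are hypothetical examples, not part of any tool described in these notes. Notice how many raw events even a tiny computation produces compared with the small per-function summary the monitor keeps.

```python
import sys
from collections import Counter


def fib(n):
    """Toy target program: naive recursive Fibonacci."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)


def monitor(func, *args):
    """Run func(*args) under sys.settrace.

    Returns (result, raw_events, calls): raw_events counts every trace event
    delivered, and calls counts 'call' events per function name.
    """
    raw_events = 0
    calls = Counter()

    def local_tracer(frame, event, arg):
        # Receives 'line', 'return', and 'exception' events for a traced frame.
        nonlocal raw_events
        raw_events += 1
        return local_tracer

    def global_tracer(frame, event, arg):
        # Invoked with a 'call' event each time a new frame starts.
        nonlocal raw_events
        raw_events += 1
        calls[frame.f_code.co_name] += 1
        return local_tracer  # keep tracing events inside this frame

    sys.settrace(global_tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(None)  # always uninstall the tracer
    return result, raw_events, calls


if __name__ == "__main__":
    result, raw, calls = monitor(fib, 10)
    print(result, raw, dict(calls))
```

Tracing every event this way typically slows the target considerably, which is a small taste of the intrusion problem.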

Categorizing Execution Monitoring and Visualization Systems

All monitoring systems have to address four fundamental problems: volume, dimensionality, intrusion, and access. Volume refers to the large amount of information involved in a program execution. A monitor abstracts this and presents a tiny subset of the total execution information, but if the subset is not chosen well, the result may be of little or no value. Dimensionality refers to the many different kinds of information involved in a program execution: source code location, stack depth, heap activity, I/O, and so forth. Intrusion refers to the Heisenbergian problem that the act of monitoring changes the behavior being monitored. Access refers to the extent and method by which the monitor extracts state information (variables' values, pointer dereferences, and so on). Monitoring and visualization systems can be categorized according to the following criteria:

  1. Information sources and access methods. Does the monitor rely on a programmer manually instrumenting the program? Is instrumentation inserted automatically by a preprocessor or compiler, at link time, or at load time? At what semantic level and granularity is the information? This may range from machine-level access to very limited information, up to source-level access to the full program state.
  2. Execution models. There are one-process, two-process, and thread models for the relationship between the monitor and the program under study. The one-process model offers the highest performance but is usually the most intrusive. The two-process model is potentially the least intrusive, but also potentially the slowest. The thread model offers a compromise between the two.
  3. User interaction. Is the monitor I/O textual or graphical? Does it report information continuously, in periodic batches, or once at the end? Does it allow the user to start and stop execution, to replay it or slow it down, to access program state beyond what is reported, or to modify state to see what would happen if things were different?
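The difference between the one-process and thread execution models can be sketched in Python. In this hypothetical example, `target_program` is manually instrumented: it calls a `report` function at points the programmer chose. In the one-process version the monitor is just a function called synchronously inside the target. In the thread version the target runs in its own thread and posts events to a `queue.Queue`, which the monitor thread drains. The event tuples and all function names are illustrative assumptions.

```python
import queue
import threading


def target_program(report):
    """Toy target program, manually instrumented with report() calls."""
    total = 0
    for i in range(5):
        total += i
        report(("assign", total))
    report(("done", total))
    return total


def one_process_monitor():
    """One-process model: the monitor is called directly from inside the target."""
    seen = []
    target_program(seen.append)
    return seen


def thread_model_monitor():
    """Thread model: the target runs in a worker thread and posts events to a
    queue; the monitor consumes them in the calling thread."""
    events = queue.Queue()

    def run_target():
        try:
            target_program(events.put)
        finally:
            events.put(None)  # sentinel: no more events, even on error

    worker = threading.Thread(target=run_target)
    worker.start()
    seen = []
    while (event := events.get()) is not None:
        seen.append(event)
    worker.join()
    return seen


if __name__ == "__main__":
    print(one_process_monitor())
    print(thread_model_monitor())
```

Both monitors see the same events here; what differs is who controls the pace and where the monitoring cost is paid. A two-process variant could run the target with `multiprocessing.Process` and pass events through a `multiprocessing.Queue`, which adds inter-process communication to every event.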

Another way to classify monitoring tools is by whether they operate entirely at runtime, or whether they involve post-mortem computation.
Runtime | Post-mortem
immediate | analysis
user-directed | condensation
potentially interactive | summary
  | passive
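To make the runtime/post-mortem distinction concrete, here is a small hypothetical Python sketch. The runtime monitor sees each event as it happens and can act immediately, here by stopping the run once a value exceeds a limit. The post-mortem analysis only sees a recorded trace after execution has ended, and condenses it into a summary. The toy target and the `StopRun` exception are assumptions made for the example.

```python
def target(report):
    """Toy target program, manually instrumented to report each new value of x."""
    x = 1
    for _ in range(10):
        x = (x * 7) % 11
        report(x)


class StopRun(Exception):
    """Raised by a runtime monitor to halt the target early."""


def runtime_monitor(limit):
    """Runtime: inspect each event immediately; halt once a value exceeds limit."""
    seen = []

    def on_event(value):
        seen.append(value)
        if value > limit:
            raise StopRun

    try:
        target(on_event)
    except StopRun:
        pass
    return seen


def postmortem_summary(trace):
    """Post-mortem: execution is over; condense the recorded trace into a summary."""
    return {"events": len(trace), "distinct": len(set(trace)), "max": max(trace, default=None)}


if __name__ == "__main__":
    print(runtime_monitor(8))       # stops at the first value above 8
    trace = []
    target(trace.append)            # record everything now, analyze later
    print(postmortem_summary(trace))
```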

How would you classify conventional debuggers such as GDB? What about conventional profilers such as gprof?

Alamo

Motivation | Feature
Monitor and target program (TP) need to be separate programs; we want to mix and match monitors. | Dynamic loading
Monitors should be as easy to write as ordinary applications. | Synchronous execution; two-process or thread-like model
Monitors should be fast enough to use on real programs. | Coroutine model; the monitor must run in the same address space
Information from the execution should be plentiful and free. | Runtime system instrumentation
Execution must be controlled and filtered. | Event masks
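Two of the features above, the coroutine model and event masks, can be approximated in Python with a generator. In this hypothetical sketch the instrumented target is a generator: each `yield` hands one event to the monitor and suspends the target until the monitor resumes it, so both run synchronously in the same address space. The target checks the mask before producing an event, so an unselected event costs only a bit test. The event names in `Ev` are illustrative, not Alamo's actual event codes, and a Python generator is only a semi-coroutine, a simplification of a full coroutine switch.

```python
from enum import Flag, auto


class Ev(Flag):
    """Hypothetical event kinds, usable as bits in an event mask."""
    CALL = auto()
    ASSIGN = auto()
    RETURN = auto()


def instrumented_target(mask):
    """Toy target (sum of squares 1..3) with instrumentation guarded by the mask.

    Each yield suspends the target and passes control, plus one event, to the
    monitor; events whose kind is not in the mask are never produced.
    """
    if Ev.CALL in mask:
        yield (Ev.CALL, "sum_squares")
    total = 0
    for i in range(1, 4):
        total += i * i
        if Ev.ASSIGN in mask:
            yield (Ev.ASSIGN, total)
    if Ev.RETURN in mask:
        yield (Ev.RETURN, total)


def coroutine_monitor(mask):
    """Monitor: repeatedly resume the target and collect the events it reports."""
    return [event for event in instrumented_target(mask)]


if __name__ == "__main__":
    print(coroutine_monitor(Ev.ASSIGN))            # assignment events only
    print(coroutine_monitor(Ev.CALL | Ev.RETURN))  # call and return events only
```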

ULMA

ULMA (UltraLight Monitor Architecture) is a successor to Alamo, first conceived last summer; so far it has seen only the sketchiest and most preliminary implementation efforts.
Motivation | Feature
Want to monitor other languages' programs | Language-neutral events
Want to handle different monitors' performance requirements | Hybrid execution model; hybrid communication model

lecture #2 began here

Program Visualization

Program visualization (PV) is the use of graphics to depict aspects of programs, especially their behavior. PV tools are used for debugging; for understanding, maintaining, and improving an existing system; and for educational purposes. They may be geared towards a single algorithm or aspect of a program, a whole program, or a large software system. They may visualize static information about the program or dynamic, execution-based information. The best tools combine static and dynamic information.

The main problem PV addresses, compared with textual techniques, is the volume problem. You can capture almost any aspect of program behavior textually, but for all but the smallest toy programs the resulting log files easily grow to megabytes and beyond.
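A toy illustration of why condensed views help with volume: the hypothetical Python helper below, with text standing in for graphics, turns an arbitrarily long event log into one scaled bar per distinct event. The log in the usage example is made up purely for illustration.

```python
from collections import Counter


def bar_chart(events, width=40):
    """Condense an event log into one text bar per distinct event.

    Bars are scaled so the most frequent event gets `width` characters;
    every event that occurs gets at least one character.
    """
    counts = Counter(events)
    if not counts:
        return ""
    biggest = max(counts.values())
    label_width = max(len(str(name)) for name in counts)
    lines = []
    for name, n in counts.most_common():  # most frequent first
        bar = "#" * max(1, round(width * n / biggest))
        lines.append(f"{str(name):<{label_width}} {bar} {n}")
    return "\n".join(lines)


if __name__ == "__main__":
    # A made-up 10,000-event log condenses to three lines of output.
    log = ["loop"] * 9000 + ["io"] * 900 + ["error"] * 100
    print(bar_chart(log))
```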

In addition to using graphics to deal with volume, PV tools must also solve two other hard problems: intrusion (observing some behavior modifies it) and access to program behavior.

Kinds of Program Visualization

There are several different kinds of program visualization tools, including

Biggest needs

What does program visualization build on?

Maps (5000+ years)
Statistical graphs (350 years)
    The early work in this area relies on analogies to the physical world.
Data graphics (200 years)
    More abstract, relational views of data (scatterplot, bar chart, pie chart, etc.)
Visualization (20 years)
    Scientific visualization, modern computer graphics. Low resolution and small screen space. Animation, color, and sound.

Principles of Graphic Excellence

Many of these suggestions are basic, commonsense human-computer interaction: minimize surprises for the user. The major players in graphic design for information presentation include Edward Tufte (The Visual Display of Quantitative Information) and Jacques Bertin (Semiology of Graphics). Tufte is better known in the US, while the Frenchman Bertin is more highly regarded by some professionals in the field.