Dynamic Analysis: Program Monitoring and Visualization
lecture #1 began here
- Class Notes: http://www.cs.nmsu.edu/~jeffery/courses/581/monvis.html
Program execution monitoring is the collection and analysis of information
about program executions. It is the yang to the yin of dynamic analysis,
and the two terms are often used synonymously. If there is a difference, it
is that dynamic analysis is concerned with what execution information
means (i.e., what we should look for), while program monitoring is
concerned with how to collect information about a running program. A
reasonable analogy might be drawn with declarative and imperative
programming.
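To make the distinction concrete, here is a minimal sketch in Python (an illustration, not taken from these notes) built on the standard `sys.settrace` hook. The monitoring half, `collect_trace`, only records raw `(event, function, line)` tuples while a target function runs; the analysis half, `line_counts`, decides what those records mean, here how many times each source line executed. The function names and the `target` program are hypothetical.

```python
import sys
from collections import Counter


def collect_trace(func, *args):
    """Monitoring: run func and record every trace event it generates."""
    events = []

    def tracer(frame, event, arg):
        # Record the event kind, the function name, and the current line.
        events.append((event, frame.f_code.co_name, frame.f_lineno))
        return tracer  # keep receiving events for this frame

    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(None)  # always detach the monitor
    return result, events


def line_counts(events):
    """Analysis: interpret raw events as per-line execution counts."""
    return Counter((name, line) for kind, name, line in events if kind == "line")


def target(n):
    total = 0
    for i in range(n):
        total += i
    return total


if __name__ == "__main__":
    result, events = collect_trace(target, 5)
    print("result:", result)
    for (name, line), count in sorted(line_counts(events).items()):
        print(f"{name}:{line} executed {count} times")
```

The tracer is called on every executed line of the target, so even this tiny monitor adds work to the program it observes.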
Categorizing Execution Monitoring and Visualization Systems
All monitoring systems have to address four fundamental problems:
volume, dimensionality, intrusion, and access. Volume refers to the large
amount of information involved in a program execution. A monitor abstracts
this and presents a tiny subset of the total execution information, but if
the subset is not chosen well the result may be of little or no value.
Dimensionality refers to the many different kinds of information involved
in a program execution: source code location, stack depth, heap activity,
I/O, and so forth. Intrusion refers to the Heisenbergian problem that the
act of monitoring changes the behavior being monitored. Access refers to
the extent and method by which the monitor extracts state information
(variables' values, pointer dereferences, and so on).
Monitoring systems can be categorized along several axes:
- Information sources and access methods. Does the monitor rely
on a programmer manually instrumenting the program? Is instrumentation
inserted automatically by a preprocessor or compiler, at link time,
or at load time? At what semantic level and granularity is the
information collected? This may range from very limited, machine-level
information up to source-level access to the full program state.
- Execution models. There are one-process, two-process, and
thread models for the relationship between the monitor and the program
under study. The one-process model offers the highest performance but is
usually the most intrusive. The two-process model is potentially
the least intrusive, but also potentially the slowest. The thread
model offers a compromise between the two.
- User interaction. Is the monitor I/O textual or graphical?
Does it report information continuously, in periodic batches,
or once at the end? Does it allow the user to start/stop
execution, to replay or slow things down, to access program
state beyond what is reported, or to modify state to see
what would happen if things were different?
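The following Python sketch (illustrative only) combines two of the choices above: manual instrumentation and the thread model. The programmer has inserted calls to an `emit` callback into a hypothetical `target_program`; a separate monitor thread in the same address space consumes those events from a `queue.Queue`. The target does not wait for the monitor, so events are observed asynchronously.

```python
import queue
import threading

DONE = object()  # sentinel that marks the end of the event stream


def target_program(data, emit):
    """Target program with hand-inserted instrumentation calls."""
    total = 0
    for x in data:
        emit(("read", x))      # manual instrumentation point
        total += x
    emit(("result", total))    # manual instrumentation point
    return total


def monitor(events, log):
    """Monitor thread: consume events until the sentinel arrives."""
    while True:
        item = events.get()
        if item is DONE:
            break
        log.append(item)


def run_monitored(data):
    events = queue.Queue()
    log = []
    watcher = threading.Thread(target=monitor, args=(events, log))
    watcher.start()
    try:
        total = target_program(data, events.put)
    finally:
        events.put(DONE)  # let the monitor thread finish
        watcher.join()
    return total, log


if __name__ == "__main__":
    total, log = run_monitored([3, 1, 4])
    print(total)  # 8
    print(log)    # [('read', 3), ('read', 1), ('read', 4), ('result', 8)]
```

Replacing the in-memory queue with a pipe or socket to another process would turn this into a two-process arrangement, at the cost of more expensive communication.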
Another way to classify monitoring tools is by whether they operate
entirely at runtime, or whether they involve post-mortem computation.
| Runtime | Post-mortem |
|---|---|
| immediate; user-directed; potentially interactive | analysis; condensation; summary; passive |
How would you classify conventional debuggers such as GDB? What about
conventional profilers such as gprof?
Alamo
| Motivation | Feature |
|---|---|
| The monitor and the target program (TP) need to be separate programs; we want to mix and match monitors | Dynamic loading |
| Monitors should be as easy to write as ordinary applications | Synchronous execution; two-process or thread-like model |
| Monitors should be fast enough to use on real programs | Coroutine model; monitor must run in the same address space |
| Information from the execution should be plentiful and free | Runtime system instrumentation |
| Execution must be controlled and filtered | Event masks |
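The coroutine model and event masks can be sketched in Python with a generator standing in for the target program. This is not Alamo's actual interface (Alamo builds its instrumentation into a language runtime and loads monitors dynamically); the names `target_program`, `monitor`, and the event kinds `"read"` and `"assign"` are hypothetical. Execution is synchronous: the TP runs only when the monitor resumes it, and because the mask is checked at each instrumentation point, events outside the mask never cause a transfer of control.

```python
def target_program(data, mask):
    """TP as a generator: each instrumentation point checks the event mask,
    so events the monitor did not ask for are never reported."""
    total = 0
    for x in data:
        if "read" in mask:
            yield ("read", x)
        total += x
        if "assign" in mask:
            yield ("assign", total)
    return total


def monitor(data, mask):
    """Monitor: resume the TP repeatedly, collecting each reported event."""
    tp = target_program(data, mask)
    seen = []
    while True:
        try:
            # Transfer control to the TP; it runs until its next masked event.
            seen.append(next(tp))
        except StopIteration as stop:
            return stop.value, seen  # the TP finished; keep its result


if __name__ == "__main__":
    print(monitor([3, 1, 4], {"assign"}))  # (8, [('assign', 3), ('assign', 4), ('assign', 8)])
    print(monitor([3, 1, 4], set()))       # (8, [])
```

With an empty mask the TP runs to completion in a single resumption, which is the point of filtering at the source rather than in the monitor.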
ULMA
ULMA (UltraLight Monitor Architecture) is a successor to Alamo, first
conceived last summer, so far with only the sketchiest and most preliminary
of implementation efforts.
| Motivation | Feature |
|---|---|
| Want to monitor programs written in other languages | Language-neutral events |
| Want to handle different monitors' performance requirements | Hybrid execution model; hybrid communication model |
lecture #2 began here
Program Visualization
Program visualization (PV) is the use of graphics to depict aspects of programs, especially
behavior. PV tools are used for debugging, understanding, maintaining and
improving an existing system, and for educational purposes. PV tools may be
geared towards a single algorithm or aspect of a program, a whole program,
or a large software system. They may visualize static information about the
program, or dynamic execution-based information. The best tools combine
static and dynamic information.
The big problem that PV addresses, compared with textual techniques,
is the volume problem. You can capture almost any aspect of program
behavior textually, but the resulting log files easily grow to megabytes
and beyond for all but the smallest toy programs.
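A quick way to see the volume problem is to count the trace events produced by a toy program. The sketch below (illustrative, using Python's standard `sys.settrace`) counts only `line` events while an insertion sort orders 50 items given in reverse order; even this produces thousands of events, and a monitor that logged a record for each one would need far more storage than the program's own data. The function names are hypothetical.

```python
import sys


def insertion_sort(items):
    a = list(items)
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key
    return a


def count_line_events(func, *args):
    """Run func under a tracer and count the 'line' events it produces."""
    count = 0

    def tracer(frame, event, arg):
        nonlocal count
        if event == "line":
            count += 1
        return tracer  # keep tracing inside each called frame

    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(None)
    return result, count


if __name__ == "__main__":
    data = list(range(50, 0, -1))  # 50 items in reverse order: the worst case
    result, count = count_line_events(insertion_sort, data)
    print(result == sorted(data), count)
```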
In addition to using graphics to deal with volume, PV tools must also solve
two other hard problems: intrusion (observing some behavior modifies it),
and access to program behavior.
Kinds of Program Visualization
There are several different kinds of program visualization tools, including
- Algorithm animation
- Data visualization
- Data structure visualization
- Heap and stack visualization; variable usage patterns
- Database visualization, file system visualization, etc.
- Machine/hardware visualization
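Data structure visualization often begins by extracting a structure's pointer graph and handing it to a separate layout tool. As one possible approach (an assumption, not a tool these notes prescribe), the sketch below emits Graphviz DOT text for a singly linked list: one box per cell and one edge per `next` pointer, stopping if the list contains a cycle. The `Node` class and `to_dot` function are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    value: int
    next: Optional["Node"] = None


def to_dot(head):
    """Emit a Graphviz DOT description of a singly linked list."""
    lines = ["digraph list {", "  node [shape=box];"]
    names = {}  # id(node) -> DOT node name
    node = head
    # First pass: name each distinct cell; stop at the end or on a cycle.
    while node is not None and id(node) not in names:
        names[id(node)] = f"n{len(names)}"
        lines.append(f'  {names[id(node)]} [label="{node.value}"];')
        node = node.next
    # Second pass: one edge per next pointer, including any cycle back-edge.
    node = head
    for _ in range(len(names)):
        if node.next is not None:
            lines.append(f"  {names[id(node)]} -> {names[id(node.next)]};")
        node = node.next
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_dot(Node(1, Node(2, Node(3)))))
```

Saved to a file such as `list.dot`, the output can be rendered with Graphviz, e.g. `dot -Tpng list.dot -o list.png`.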
Biggest needs
- Legible - if the user can't interpret it, it's useless. Graphic design helps.
Using familiar metaphors helps. Tying results back to program source code helps.
- Scalable - handling large volumes of data requires one or more strategies, such as
navigation through large data spaces, fisheye views, and logarithmic scales.
- Automated - PV tools should know and look for common problems, and should be usable
without substantial manual investment for each application.
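Fisheye views are one scalability strategy that is easy to state precisely. A minimal sketch, assuming Sarkar and Brown's graphical fisheye transformation G(x) = (d + 1)x / (dx + 1), where x is a distance from the focus normalized to [0, 1] and d ≥ 0 is a distortion factor: positions near the focus are spread apart and positions near the edges are squeezed together, while the endpoints stay fixed so the whole data range remains on screen. The function names are illustrative, and only one dimension is shown.

```python
def fisheye(x, d):
    """Sarkar-Brown distortion of a normalized distance x in [0, 1]."""
    return (d + 1) * x / (d * x + 1)


def fisheye_position(p, focus, d=3.0):
    """Map a position p in [0, 1] so the region near `focus` is magnified."""
    if p > focus:
        span = 1.0 - focus  # distance from the focus to the right boundary
        return focus + fisheye((p - focus) / span, d) * span
    if p < focus:
        span = focus        # distance from the focus to the left boundary
        return focus - fisheye((focus - p) / span, d) * span
    return p


if __name__ == "__main__":
    points = [i / 10 for i in range(11)]
    print([round(fisheye_position(p, focus=0.5), 3) for p in points])
```

Larger values of `d` magnify the focus region more strongly; `d = 0` leaves positions unchanged.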
What does program visualization build on?
- Maps (5000+ years)
- Statistical graphs (350 years)
  - The early work in this area relies on analogies to the physical world.
- Data graphics (200 years)
  - More abstract, relational views of data (scatterplot, bar chart, pie chart, etc.)
- Visualization (20 years)
  - Scientific visualization, modern computer graphics. Compared with print: low
    resolution and small screen space, but also animation, color, and sound.
Principles of Graphic Excellence
Many of these suggestions are basic common-sense human-computer interaction:
minimize the user's surprises. The major figures in graphic design for
information presentation include Edward Tufte (The Visual Display of
Quantitative Information) and Jacques Bertin (Semiology of Graphics).
Tufte is better known in the US, while the Frenchman Bertin is more
highly regarded by some professionals in the field.
- Graphic designs should tell the truth about a complex situation using
multiple variables. Give the viewer the greatest number of ideas in the shortest
time, using the least "ink", in the smallest space possible.
- The default aspect ratio should approximate the golden rectangle
(about 16 wide to 10 high). Why "landscape" instead of "portrait" mode?
- Use solid fills, not patterns. Humans can comfortably distinguish only
a very few (say, 3 to 9) distinct fill styles or colors; we are much better at
distinguishing gray scales. We are pretty good at distinguishing sizes,
slopes, and distances.
- Avoid newlines in labels. Write a label on one line, like this,
  not
  like
  this
- Use the x-axis for cause and the y-axis for effect.
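The golden-rectangle guideline comes down to simple arithmetic: the golden ratio is φ = (1 + √5)/2 ≈ 1.618, so a plot w units wide should be about w/φ units high, and 16/10 = 1.6 is a convenient approximation. A tiny illustrative helper (the names `PHI` and `golden_height` are hypothetical):

```python
import math

PHI = (1 + math.sqrt(5)) / 2  # golden ratio, about 1.618


def golden_height(width):
    """Height of a landscape golden rectangle with the given width."""
    return width / PHI


if __name__ == "__main__":
    print(round(golden_height(16), 2))  # 9.89, close to the 16:10 rule of thumb
    print(round(golden_height(800)))    # height in pixels for an 800-pixel-wide plot
```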