Control Flow Visualization of Python


Description:

Procedure:

Visualize Python executions, especially control flow, at coarse fine granularities.

  • Study Python monitoring/debugging hooks API.
  • Extend instrumentation to atleast line-level granularity.
  • Write visualizations.
  • Run on a suite of Python programs.

Weekly Meetings

01/27 (Thursday)

Search for graphics tools that are availabe for Python and see which one is better.

02/03 (Thursday)

We decided to use VPython (Visual Python) as our graphics tool. We looked at systems like PyOpenGL, CGKit, Blender and few other. We thought that VPython is easy to use and high level graphics tool.

Also looked at some of the trace functions provided by Python. sys.settrace( func ) is used to set a trace function and the trace function receives argument 'line' when the interpreter is about to execue a new line of code. We thought that this will be helpful in tracing line numbers

see the python documentation, http://docs.python.org/lib/debugger-hooks.html for more information on sys.settrace(func).

02/10 (Thursday)

Write a python program that will print the line numbers as the program is executed. We thought that the C interface of Python will be more helpful than direct python functions to get the line numbers.

The function int (*Py_tracefunc)(PyObject *obj, PyFrameObject *frame, int PyTrace_LINE, PyObject *arg). PyTrace_LINE is the value passed to trace function when a line number event is generated.

PyEval_SetTrace(Py_tracefunc func,PyObject obj) is used to set the tracing function to func.

Refer http://python.fyxm.net/doc/2.3.4/api/profiling.html for more information.

But the question is how to dynamically load C functions into Python program and how the interpreter should understand C functions.

02/14 (Monday)

While trying to write the trace program, I could not get idea how to dynamically load C function into python program. The result was that, i have to wite a simple Python program and it has to dynamically load a simple C function.

02/17 (Thursday)

Discussed about dynamic loading of C function. I got some more material on that and must implement the program.

http://docs.python.org/ext/intro.html has information on extending python.

Looked at pdb(python debugger) code. We want to see whether it is going to be useful. Looks as if some modifications to the pdb code can help the project, but the question is about performance(may be slow).

02/25 (Thursday)

Dynamic loading of a simple C program into python worked well. Now to concentrate on writing tracing functions in C and dynamically loading into python modules.

Look out for benchmarks that can measure the performance of python programs in terms of time.

03/03 (Thursday)

Program for tracing the line numbers worked. Next step is to print method names, call stack depth and line numbers in a given call and represent them graphically.

Look out for benchmark programs that are used to measure the performance of pyhton programs.

03/10 (Thursday)

still looking for hooks to print method names and other parameters. Try to build graphics program with some rough data.

03/24 (Thursday)

Spring Break

03/31 (Thursday)

Obtained all the data including method names and stack depth. The data is also visualized (simple graph). Try it on some larger programs and check it. Also try to scale the graph depending on number of events and draw axis lines.

04/11 (Monday)

Run the tool on much larger programs. Examine how the program scales. Remove the old points before plotting new ones. Try to color the spheres to separate library calls from others. Is there any way to show which spheres corresponds to which line and method. Also look into file writing (avoid multiple open stmts).

Think on how to make the visualizations look helpful for the user. Try to figure out some animations for visualizations. Think of dynamic visualization techniques.

04/14 (Thursday)

The problem of opening and closing the log file which costed most of the time in tracing is solved. Find some benchmark programs and run them on our first tool. Also improvize the first visualization tool to make it better (mainly removing the old points before plotting the new ones in a particular call).

Gather information and also look into the literature for designing the next visualization tool. A call stack would be a good idea to proceed.

04/22 (Friday)

Visualization now shows method call, returns, starting and termination points and final event in particular call before new call is made. Information about each sphere is also displayed in the shell when a particular sphere is selected. But still the visualization does not seem informative and effective.

Testing the benchmarks provided some interesting results. Log files seem to be large. Though the size of log files isn't a big problem, the performance in terms of speed is slow as a result of trace. The benchmarks seem to run nearly 10 times slower when they are traced. Also it seems my visualization does not terminate or plot all the points when the log file is large.

Thus the first visualization raises different issues that are to be looked into before the final version and acts as a good practice exercise. Now its time to start on the final version and look for some ideas.

As proposed in the last meeting, a call stack would be a good idea. We can use sockets or pipes or threads or shared memory for the visualization. We could also maintain bins of ages for points and delete them after a certain time.

Method calls with in a class and cross class calls would be interesting to visualize. We can represent classes and methods as clusters and draw the control flow between them. It may be interseting to look into the executions of loops also.

04/26 (Tuesday)

Visualizing the calls between classes seems to be a good idea and will be the next version. y-axis represents the stack depth and classes will be represnted by spheres. Calls between the classes will be represented using arrows between these spheres. Fade arrows represent the historical calls. It will be a good idea to use the arrow width to represent the number of call between the classes. Methods of a class can be represented by segments in each sphere.

Benchmarks

Guido van Rossum wrote benchmark programs, Parrotbench, to compare the performances of Python and Parrot. Parrot is virtual machine which executes byte code for interpreted languages and is aimed at Perl 6 compiler. More information about parrotbench is available at http://mail.python.org/pipermail/python-dev/2003-December/041527.html. The b.py program imports all the other programs on the parrotbench and calls them. This test was executed on b.py except the program b5.py is not included.

Pystone is a becnhmark program which is included in standard python distributions. It can be found in Python2.4 as Lib/test/pystone.py.

Another benchmark suite for python is Pybench. It is designed by marc-Andre Lemburg for measuring the perfomance of python implementation. More information on Pybench can be seen at http://mail.python.org/pipermail/python-announce-list/2001-November/001081.html

The following table depicts the reults of my benchmark tests.

 
With out Trace
With Trace
Trace includes imported modules
PyStone
0m2.742s
0m11.127s
0m11.505s
Parrotbench
0m18.259s
0m22.101s
0m41.329s
Pybench (rounds=1)
0m6.756s
0m19.099s
0m48.735s

Screenshots

A python program, Polynomal.py, written by Rick Mulller is used for our visualization. To see the code, click Polynomial.py. The trace module, lineTracer is imported into it and is executed to obtain a log file. This log file is provided as input to the visualization program. The output is a graph with 3D sphere objects repreenting the events.

These are some of the screen shots taken:

Image 1

Image 2

Image 3

Image 4

05/13 (Friday)

Discussed lot of issues involved in the next visualization. The method can be summarized as writing all the events into a log file and then maintain a data structure to hold all the classes, their methods and number of times each method is called. The visualization uses this information to represent each class as an arc and each segment of an arc represents a method. Each arc will be divided into segments according to their weights assigned based on number of times a method is called. The thickness of an arrow represents number of calls made between the classes (especially when there is a for loop with a sequence of function calls). Other issue is how to keep track of control flow i.e the order of events? Draw continuous arrows to represent order of events.

05/19 (Thursday)

When a call event is reported to the trace fucntion, I don't know how to identify whether it is function call or a method call. Disussed this issue and sent email to Rich Saunders and Mike Wilder for suggestions. Try posting in python forums.

05/27 (Friday)

Thanks for the reply from Matther Dixon Cowels of Python-help forums. It helps in identifying whether a call occured is a function call or method call. The solution worked well if the trace function is coded in python but didn't do well when it is written in C. It could differentiate between a function call and a method call but the self object and the class name in which the method is defined couldn't be found.

The reason may be the f_locals of frame is not updated by the time event is reported to the trace function in C but updated for python hook. Try running the program on Python2.4 (may be a bug in python2.3). Also try looking into the interpreter and see if we can add instrumentation to get self object and class name. Also we have another choice of coding the trace function in python.

06/07 (Tuesday)

Tried on Python 2.4 but doesn't work. Looked at the python interpreter (ceval.c and implementation of frameobject). The frame is not receiving the f_locals by the time event is reported to trace function in C. Still have to figure out why the trace fucntion in C doesnot recieve the locals. Try by comaparing the implementations and codes of PyEval_SetTrace() and sys.settrace(). Also try comparing the implementation of call hooks of trace functions written in Python and C.

Also working on the visualization. Coded the trace function in python and obtained the log file. This log file will used in visualization. There is no arc object in vpython so better to use line segments for each class or function. The weighting scheme should also be determined (length of arc varies for all functions and classes).

06/27 (Monday)

Tried adding some code in the Python/ceavl.c (main loop of the interpreter and this is where the trace functions are called by the interpreter). But didnot work. Discussed the differences in the implemetations of python and C trace functions. The interpreter is updating the frame by calling PyFrame_FastToLocals(f) before sending it to the python trace function. I tried including this call in my C trace function. This worked partly. Without this call, f->f_locals was NULL object but after including this call, f->f_locals is a dictionary {"self":''}. The PyFrame_FastToLocals(f) is building f->f_locals dictionary from map and f_fastlocals. The f_fastlocals doesnot have the value for self. So the fastlocals itself is not being updated when there is a C trace function. Also looked at the implementation of frame object in python.

The visualization is a regular polygon with number of sides equal to total number of functions and classes in the program. Further each line representing a class is subdivided into methods. The methods in a class can be differentiated with blue and green colored lines.

The function calls and method calls are represented using arrows from the caller to the called functions. Also all the functions, methods and classes are labelled. Some improvements to be made to the visualization were discussed.

All the functions and methods should be given weights according to the number of calls they make and receive and then draw the lines representing them according to their weights. Also the labels are overlapping, so make them clear. Also width of the arrows are not contant. Needs modification.

06/30 (Thursday)

Weights have been assigned to classes, methods, functions and the Visualization is done according to the weights assigned. Also the width of the arrows is constant now. The visualization seems good but still doesnot show the control flow effectively. Discussed about some modifications to be done.

Use z-axis to show the control flow efectively. May be using z-axis in drawing arrows will be helpful. Also highlight the latest calls and try to remove the old call from the visualization. Try to make visualization effective by trying different possibilities.

07/04 (Monday)

Modified the previous visualization by adding z-axis. z-coord is added for the arrows representing the calls. Also a menu is added for seeing the visualization call by call or all at a time or deleting the previous arrows drawn.The visualization looks decent and the control flow can also be seen.

Give an option to set frequency. Also modify the option of typing 'n' for every call. Instead of clicking continuously, the option should be "continuously print call be call" and also ask for the frequency. The frequency option should be using a widget like a scroller to change the frequency. The default option should be visualizing all the events in a single step.

The visualization must delete the 1st arrow when drawing the kth arrow so that the visualization will be in a sliding window format.

Also test the visualization on different programs (benchmarks and other oops programs). Take screenshots in different views showing differnt kinds of information. This will be useful in writing the report and also in preparing the presentation slides. Also find why the visualization is not able to depict all the calls. So use a monitoring system to find the reason for slow down and also where the CPU processing power is wasted or completely taken.

Also make the previous 2-D visualization more effective.

09/15 (Thursday)

Back from the trip of Colorado Springs. Added a control window that can be used to control the frequency of plotting arrows. Also made changes such that only 50 arrows are shown in the visualization at an instance.

2D visualization is also designed which shows call traffic between the functions, classes and methods. The z-axis is ignored and all the arrows representing call are plotted in XY-axis. A call between two functions is represented by plotting an arrow. If another call happens between these 2 functions, the older arrow is deleted and a new arrow with increased width is plotted.

09/22 (Thursday)

Installed VPython on CS machine. Tested the visualizaions in CS.

10/06 (Thursday)

Tested the visualization on differentt programs. The visualizations look very interesting and informative. This ends the coding phase for the project. Next is to write report and prepare for defense.

The defense is scheduled for 27th October

10/20 (Thursday)

Finished writing report and preparing slides.

10/27 (Thursday)

Passed the Defense Examination.