Published and Unpublished Papers by Jonathan Cook

This page contains titles and abstracts of selected papers. Where possible, links to on-line versions are provided.

The entries are listed in reverse order of authoring or publication date. Unless otherwise noted, all on-line versions are gzipped postscript files.

This page was last updated August 2002.


List by Title and Abstract


Discovering Models of Behavior for Concurrent Systems

Jonathan E. Cook, Zhidian Du, Chongbing Liu, and Alexander L. Wolf

NMSU Technical Report NMSU-CSTR-2002-010.

Understanding the behavior of a system is crucial in being able to modify, maintain, and improve the system. A particularly difficult aspect of some system behaviors is concurrency. While there are many techniques to specify intended concurrent behavior, there are few techniques to capture and model actual concurrent behavior. This paper presents techniques to discover patterns of concurrent behavior from traces of system events. The techniques are based on a probabilistic analysis of the event traces. Using metrics for the number, frequency, and regularity of event occurrences, a determination is made of the likely concurrent behavior being manifested by the system. After a descriptive model of the behavior structure is discovered, further techniques are used to infer areas of mutual exclusion and synchronization. These techniques are useful in a wide variety of software engineering tasks, including architecture discovery, reverse engineering, user interaction modeling, and software process improvement.

Retrieve Postscript.


Supporting Quick and Dirty CORBA Introspection and Manipulation

Jonathan E. Cook and Abdulmalik Al-Gahmi

NMSU Technical Report NMSU-CSTR-2002-009.

Large scale system development and maintenance projects often need to build scaffolding---tools that help build the target system---that is customized to the project. For some classes of tools, the cost barrier is too high to consider implementing customized support that might be beneficial to the project, and thus the project makes do with whatever off-the-shelf support is available. Run-time monitoring and manipulation tools are one such category.

This paper presents a framework design and prototype implementation of generic support for high-level, flexible, and programmable introspection and manipulation of CORBA-based applications. This type of support can be an effective aid in maintaining existing CORBA components as they evolve throughout the system lifecycle. Introspection allows system engineers to observe the dynamic behavior to better understand how to integrate components together, and manipulation allows them to "glue" together components that have slightly different expectations of their interaction.

Our framework implementation is being accomplished by tying a new CORBA dynamic access feature known as Portable Interceptors with an existing very high level programming language, Tcl/Tk.

Retrieve gzip'd Postscript.


Reliable Upgrading of Unix Shared Libraries through Multi-Version Execution

Jonathan E. Cook and Navin Vedagiri

NMSU Technical Report NMSU-CSTR-2002-008.

After a system is deployed, fixes, enhancements, and modifications all occur that change the components that make up the system. Unfortunately, new versions of components can introduce new errors and break existing, depended-upon behavior. When this happens, the old component version could have provided the correct behavior, but it is no longer part of the system.

We have been experimenting with overlapping the deployment of multiple versions such that they all execute the same requests for a time, until the decision is made that the new version of the component is reliable enough to justify removing earlier versions and thus completing the upgrade. In this paper we describe ideas for doing this on Unix (C/C++) shared libraries.

Retrieve gzip'd Postscript.


Discovering Thread Interactions in a Concurrent System

Jonathan E. Cook and Zhidian Du

To appear in the Proceedings of the 2002 Working Conference on Reverse Engineering, Richmond, Virginia, October 2002.

Also as NMSU Technical Report NMSU-CSTR-2002-010.

Understanding the behavior of a system is a central reverse engineering task, and is crucial for being able to modify, maintain, and improve the system. An often difficult aspect of some system behaviors is concurrency, in particular identifying those areas that exhibit mutual exclusion and those that exhibit synchronization. In this paper we present a technique that builds on our previous work in behavior discovery to find the points in the system that demonstrate mutually exclusive and synchronized behavior. Finding these points in the behavior of the system is an important aid in reverse engineering a complete and correct model of the system.

Retrieve gzip'd Postscript.


Measuring Behavioral Correspondence to a Timed Concurrent Model

Jonathan E. Cook, Cha He, and Changjun Ma

Proceedings of the 2001 International Conference on Software Maintenance, Florence, Italy, November 2001.

Also as NMSU Technical Report NMSU-CSTR-2000-02.

Research in formal methods has produced fruitful techniques that can verify global properties of a design of a real-time system, or exact behavioral correspondence to the design. However, exactness is often not achieved. Understanding how closely the design and the system correspond would still be very valuable, whether to direct further efforts toward exactness or to modify the design where the system simply cannot meet the requirements. This paper describes a method and tool that quantitatively measure how closely the behavior of a real-time system corresponds to its specification, given in a timed, concurrent model.

Retrieve gzip'd Postscript.


Supporting Rapid Prototyping through Frequent and Reliable Deployment of Evolving Components

Jonathan E. Cook

In the 12th IEEE Workshop on Rapid System Prototyping, Monterey Bay, California, June 2001.

In rapid system prototyping, there is a need to quickly deploy new versions of software components into test systems, in order to get feedback on how those new versions operate within the system. Compounding this, multiple configurations of various versions may need to be tested together. Incompatibilities between component versions can cause a serious loss of time and effort, as the errors are tracked down and the testbed is reconfigured with known working versions.

We are developing the Hercules framework to safely and reliably deploy and evolve component-based systems by executing and controlling multiple versions of software components at run-time. Hercules naturally fills a need in rapid system prototyping, and can enhance and streamline the overall process of developing a component-based system.

Retrieve Postscript.


Open Source Development: An Arthurian Legend

Jonathan E. Cook

In the Proceedings of the Workshop on Open Source Software Development @ ICSE 2001, Toronto, Canada, May 2001, pp 16--19.

OSSD (Open Source Software Development) achieves remarkable success in delivering complex software systems -- systems which are incredibly reliable and robust -- in a short amount of time and without even paying anyone! Naturally, in the face of this success, organizations are interested in seeing if the mechanisms behind OSSD success can be migrated into their own practices, hopefully improving their systems and their productivity.

In this paper, we look (lightheartedly at first) at the motivations of those involved in OSSD and describe the problems that need to be overcome if OSSD-type practices are to be migrated into traditional organizations.

Retrieve Postscript, PDF, or PPT Slides.


Software Engineering Concerns for Mobile Agent Systems

Jonathan E. Cook

In the Proceedings of the Workshop on Software Engineering and Mobility @ ICSE 2001, Toronto, Canada, May 2001, paper 7.

It seems certain that building software systems composed of mobile agents introduces interesting new concerns for software engineering research, but what exactly ought those concerns be? One approach to determining them is to look at the assumptions behind the interest in mobile agent systems, and then deduce some requirements for how these systems will have to be built. These requirements elucidate novel areas of research that will need to be undertaken in order to make the goal of widespread mobile agents a reality.

Retrieve Postscript, or PDF.


Internet-based Software Engineering Enables and Requires Event-Based Management Tools

Jonathan E. Cook

In the 3rd Workshop on Software Engineering over the Internet (at ICSE 2000). Also NMSU Technical Report NMSU-CSTR-2000-03.

Distributed software engineering (DSE) efforts offer difficult challenges to those who need to monitor and manage the overall process. Without the capability to know what is happening in the process, the risk of failing to produce a quality product on schedule increases greatly. With Internet-based DSE, the opportunity exists to capture data from the process at relatively low cost, since so much of the process is already being supported by technology which can be instrumented. Our position is that capturing event traces from these processes can be done fairly easily, and will enable the use of analysis and management tools that provide feedback to the enactors and managers, thus improving the overall process and enabling the efficient production of high-quality software products.

Retrieve gzip'd Postscript.


Highly Reliable Upgrading of Components

Jonathan E. Cook and Jeffery A. Dage

Technical Report NMSU-CSTR-9811

A version appeared in the 1999 International Conference on Software Engineering (ICSE'99), May 1999, Los Angeles, pp 203--212.

After a system is deployed, fixes, enhancements, and modifications all occur that change the components that make up the system. Unfortunately, new versions of components can introduce new errors and break existing, depended-upon behavior. When this happens, the old component version could have provided  the correct behavior, but it is no longer part of the system. We propose a framework for upgrading system components that, instead of removing the old version of the component, keeps multiple versions of a component running. Doing so allows behavior to be utilized from all versions, and maintains system integrity and correctness even in the presence of newly introduced errors. This framework ensures that the move towards dynamic, configurable software systems does not lessen, but rather provides capabilities to enhance, the reliability that software will achieve through the next century.
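
The idea of keeping several versions of a component running can be illustrated with a minimal Python sketch. It is not the framework described in the paper: the `MultiVersion` wrapper, the two `mean` versions, and the arbitration policy (prefer the newest version that did not fail) are all assumptions made for this example.

```python
class MultiVersion:
    """Hypothetical wrapper that runs every version of a component on each request."""

    def __init__(self, *versions):
        # Versions are callables, ordered from oldest to newest.
        self.versions = list(versions)
        self.disagreements = 0

    def __call__(self, *args):
        outcomes = []
        for version in self.versions:
            try:
                outcomes.append((True, version(*args)))
            except Exception as exc:
                outcomes.append((False, exc))
        successes = [value for ok, value in outcomes if ok]
        if not successes:
            # Every version failed: re-raise the newest version's error.
            raise outcomes[-1][1]
        # Record requests on which the versions failed or disagreed.
        if len(successes) < len(outcomes) or any(v != successes[0] for v in successes):
            self.disagreements += 1
        # Assumed arbitration policy: newest version that succeeded.
        return successes[-1]


def mean_v1(xs):
    # Old version: defines the mean of an empty list as 0.0.
    return sum(xs) / len(xs) if xs else 0.0


def mean_v2(xs):
    # New version: introduces an error on empty input.
    return sum(xs) / len(xs)


mean = MultiVersion(mean_v1, mean_v2)
print(mean([2, 4]))        # both versions agree
print(mean([]))            # new version fails, old behavior is still available
print(mean.disagreements)  # number of requests where the versions diverged
```

In this sketch the count of disagreements is the kind of evidence one might use to decide when the new version is trustworthy enough for the old one to be retired.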

Retrieve gzip'd Postscript.


Supporting Reliable Evolution of Distributed Objects

Jonathan E. Cook and Jeffery A. Dage

In the Proceedings of the Workshop on Engineering Distributed Objects (EDO'99), Los Angeles, California, USA, May 1999 (position paper), pp. 34--39.

Distributed object systems offer a foundation for systems to be highly malleable and configurable, even after deployment. While this malleability offers many benefits and opportunities for creating novel systems, it also becomes a potential source of problems. This is because, unfortunately, new versions of objects can introduce new errors and break existing, depended-upon behavior.

We believe that for this move towards distributed, component-based systems to not have a negative impact on system reliability, the middleware frameworks must allow and support the composition, manipulation, and execution of multiple versions of components. Doing so will ensure that the move towards distributed, component-based software systems does not lessen, but rather provides opportunities to enhance, the reliability that software will achieve through the next century.

Retrieve gzip'd Postscript.


Software Process Validation: Quantitatively Measuring the Correspondence of a Process to a Model Using Event-Based Data

Jonathan E. Cook and Alexander L. Wolf

Technical report CU-CS-820-96

Also in ACM Transactions on Software Engineering and Methodology, vol. 8(2), Apr 1999, pp. 147--176.

To a great extent, the usefulness of a formal model of a software process lies in its ability to accurately predict the behavior of the executing process. Similarly, the usefulness of an executing process lies largely in its ability to fulfill the requirements embodied in a formal model of the process. When process models and process executions diverge, something significant is happening.

We have developed techniques for uncovering and measuring the discrepancies between models and executions, which we call process validation. Process validation takes a process execution and a process model, and measures the level of correspondence between the two. Our metrics are tailorable and give the engineers control over determining the severity of different types of discrepancies. The metrics are also hierarchical, providing detailed information once a high-level measurement indicates the presence of a problem.

We have applied our process validation methods in a real-world, industrial study, of which a small portion is highlighted in this paper. The success of our techniques leads us to view this work as a first step toward a suite of useful methods for process validation.

Retrieve gzip'ed Postscript.


Event-Based Detection of Concurrency

Jonathan E. Cook and Alexander L. Wolf

Technical report NMSU-CSTR-9808, CU-CS-860-98

A version appeared in the 6th SIGSOFT Foundations of Software Engineering Conference (FSE-6), Nov 1998, Orlando, FL, pp. 35--45.

Understanding the behavior of a system is crucial in being able to modify, maintain, and improve the system. A particularly difficult aspect of some system behaviors is concurrency. While there are many techniques to specify intended concurrent behavior, there are few, if any, techniques to capture and model actual concurrent behavior. This paper presents a technique to discover patterns of concurrent behavior from traces of system events. The technique is based on a probabilistic analysis of the event traces. Using metrics for the number, frequency, and regularity of event occurrences, a determination is made of the likely concurrent behavior being manifested by the system. The technique is useful in a wide variety of software engineering tasks, including architecture discovery, reengineering, user interaction modeling, and software process improvement.
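
As a rough illustration of frequency-based analysis of event traces, the Python sketch below counts how often one event immediately follows another and flags pairs that occur in both orders with similar frequency as candidates for concurrency. This heuristic, the function names, and the `balance` threshold are assumptions chosen for the example; they are much simpler than the metrics in the paper and are not taken from it.

```python
from collections import Counter


def direct_follows(traces):
    """Count how often event a is immediately followed by event b."""
    counts = Counter()
    for trace in traces:
        for a, b in zip(trace, trace[1:]):
            counts[(a, b)] += 1
    return counts


def concurrency_candidates(traces, balance=0.5):
    """Return unordered event pairs seen in both orders with similar frequency.

    `balance` is a hypothetical threshold: the rarer order must occur at
    least `balance` times as often as the more common order.
    """
    counts = direct_follows(traces)
    candidates = set()
    for (a, b), ab in counts.items():
        ba = counts.get((b, a), 0)
        if a != b and ba > 0 and min(ab, ba) / max(ab, ba) >= balance:
            candidates.add(frozenset((a, b)))
    return candidates


traces = [
    ["start", "A", "B", "end"],
    ["start", "B", "A", "end"],
    ["start", "A", "B", "end"],
]
print(concurrency_candidates(traces))
```

Here A and B appear in both orders, so they are reported as a candidate pair, while strictly ordered pairs such as start and A are not.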

Retrieve gzip'ed Postscript.


Balboa: A Framework for Event-Based Process Data Analysis

Jonathan E. Cook and Alexander L. Wolf

Technical Report NMSU-CSTR-9809, CU-CS-851-98

A version appeared in the Proceedings of the 5th International Conference on the Software Process, Chicago, IL, June 1998, pp. 99--110.

Software process research has suffered from a lack of focussed data analysis techniques and tools. Part of the problem is the ad hoc and heterogeneous nature of the data, as well as the methods of collecting those data. While collection methods must be specific to their data source, analysis tools should be shielded from specific data formats and idiosyncrasies. We have built Balboa as a bridge between the data collection and the analysis tools, facilitating the gathering and management of event data, and simplifying the construction of tools to analyze the data. Balboa is a framework that provides to collection methods a flexible data registration mechanism, and provides to tools a consistent set of data manipulation, management, and access services. It unites the variety of collection mechanisms with the variety of tools, thus paving the way for more extensive application of process improvement techniques based on data analysis.

Retrieve gzip'ed Postscript.


Cost-Effective Analysis of In-Place Software Processes

Jonathan E. Cook and Lawrence G. Votta and Alexander L. Wolf

Technical report CU-CS-??

A version appeared in IEEE Transactions on Software Engineering, vol. 24(8), Aug 1998, pp. 650--663.

Process affects product---after all, something is done or not done each time a defect is inserted into a product. Improving a process implies understanding what process factors or features may cause defects or cost inefficiencies, so that they can be modified or eliminated. Showing that a specific process feature causes some problem requires extensive, expensive, and intrusive data collection and experimental control placed on the process. Real-world (industrial) settings will rarely allow for this.

For many processes, however, extensive historical process and product data already exists. Can this existing data be used to empirically explore what process factors might be affecting the outcome of the process? If it can, organizations would have a cost-effective method for quantitatively, if not causally, understanding their process and how it relates to the product being produced.

We present a study that provides a concrete example of such a method. This study makes use of several readily available repositories of process data in an industrial organization. Our results show that some elements of the data can be used to correlate both simple aggregate metrics and complex process metrics with defects in the product. Through this study we demonstrate that process features can be measurably related to product outcomes, and that taking advantage of historical data is an effective method for analyzing processes.

Retrieve gzip'ed Postscript.


Discovering Models of Software Processes from Event-Based Data

Jonathan E. Cook and Alexander L. Wolf

Technical report CU-CS-819-96

A version appeared in ACM Transactions on Software Engineering and Methodology, vol. 7(3), Jul 1998, pp. 215--249.

Many software process methods and tools presuppose the existence of a formal model of a process. Unfortunately, developing a formal model for an on-going, complex process can be difficult, costly, and error prone. This presents a practical barrier to the adoption of process technologies. The barrier would be lowered by automating the creation of formal models.

We have developed techniques that can use basic event data captured from an on-going process to generate a formal model of process behavior. We term this kind of data analysis process discovery. This paper describes three methods for process discovery that we have developed, implemented, and applied in an industrial case study. These methods span the range from purely algorithmic, to algorithmic and statistical, to purely statistical (neural net).

We show that not only is process discovery possible, it is practical and effective in real-world situations.

Retrieve gzip'ed Postscript.


Assertions for the Tcl Language

Jonathan E. Cook

5th Tcl Workshop, July 1997, Boston, Massachusetts, USA

Assertions, even as simple as the C assert macro, offer important self-checking properties to programs, and improve the robustness of software when they are used. This paper describes assertcl, an assertion package for the Tcl programming language. Our assertions take the form of commands in the program text, and cover point assertions about the computation state, assertions about procedure input values and the return value, and assertions about the values that variables may take on over their whole lifetime. In addition, universal and existential quantifiers are provided for both lists and arrays, not only for individual elements but for sequences of elements as well.

Retrieve gzip'ed Postscript. Also online here.


Process Discovery and Validation through Event-Data Analysis

Jonathan E. Cook

Ph.D. Thesis, September 1996, University of Colorado

Technical report CU-CS-817-96

Software process is how an organization goes about developing or maintaining a software system. It is the methodology employed when people use machines, tools, and artifacts to create a product. Recent work has applied formal modeling to software process, with the hope of reaping the benefits of unambiguous and analyzable formalisms. Yet industry has been slow to adopt formal model technologies. Two reasons are that it is costly to develop a formal model and, once developed, there are no methods to ensure that the model indeed reflects reality.

This thesis develops techniques for process event data analysis that help solve these two problems, which are termed process discovery and process validation.

For process discovery, event data captured from an on-going process is used to generate a formal model of process behavior. To do this, results from the field of grammar inference are applied, and a new method is also developed. The methods are shown to be efficient and practical to use in an interactive tool that is developed in the course of this work.

For process validation, event data is used to measure the correspondence between existing process models and the actual process, yet allowing discrepancies to exist. A paradigm based on string distance metrics is developed, and several validation metrics in this paradigm are described. How these metrics can be calculated is then shown, and a tool set for doing process validation is provided.
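
To give a feel for what a string distance between a process model and an execution might look like, here is a minimal Python sketch of a weighted insertion/deletion distance between two event sequences. It is an illustration only: the function name, the cost parameters, and the example events are assumptions, and the thesis's actual metrics may be defined differently.

```python
def weighted_distance(observed, predicted, insert_cost=1.0, delete_cost=1.0):
    """Minimum cost of insertions and deletions that turn `predicted` into `observed`.

    A deletion is an event the model predicts but the execution lacks;
    an insertion is an event in the execution that the model does not predict.
    """
    rows, cols = len(predicted) + 1, len(observed) + 1
    dist = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dist[i][0] = i * delete_cost
    for j in range(1, cols):
        dist[0][j] = j * insert_cost
    for i in range(1, rows):
        for j in range(1, cols):
            best = min(dist[i - 1][j] + delete_cost, dist[i][j - 1] + insert_cost)
            if predicted[i - 1] == observed[j - 1]:
                # Matching events cost nothing.
                best = min(best, dist[i - 1][j - 1])
            dist[i][j] = best
    return dist[-1][-1]


model_stream = ["edit", "compile", "test", "checkin"]
execution = ["edit", "compile", "checkin"]  # the test step was skipped

print(weighted_distance(execution, model_stream))
print(weighted_distance(execution, model_stream, delete_cost=2.0))
```

Raising `delete_cost` makes skipped steps count as more severe than extra ones, which is one simple way to let an engineer weight different kinds of discrepancy.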

In implementing these methods, a framework is developed, called Balboa, for managing process data and facilitating the construction of analysis tools. This framework serves to unite the variety of collection mechanisms and tools by providing consistent data manipulation, management, and access services, and assistance in tool construction.

Finally, the techniques developed in this thesis are applied in an industrial study. This study provides concrete results showing that one can relate the quality of a process as prescribed by a model to the quality of the product. In doing so, it also shows that the discovery and validation techniques are able to capture important aspects about software process, and can be applied in the real world.

Retrieve gzip'ed Postscript.


Balboa, Discovery, and Validation User Manuals

Jonathan E. Cook

These manuals cover the basics in how to use the Balboa framework, the process discovery tools, and the process validation tools.

Retrieve tar'ed gzip'ed Postscript.


Automating Process Discovery through Event-Data Analysis

Jonathan E. Cook and Alexander L. Wolf

ICSE 17, April 1995, Seattle, Washington, USA

Many software process methods and tools presuppose the existence of a formal model of a process. Unfortunately, developing a formal model for an on-going, complex process can be difficult, costly, and error prone. This presents a practical barrier to the adoption of process technologies. The barrier would be lowered by automating the creation of formal models. We are currently exploring techniques that can use basic event data captured from an on-going process to generate a formal model of process behavior. We term this kind of data analysis process discovery. This paper describes and illustrates three methods with which we have been experimenting: algorithmic grammar inference, Markov models, and neural networks.
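
Of the three methods named above, the Markov approach is the easiest to sketch briefly. The Python example below estimates first-order transition probabilities from an event stream and keeps only transitions above a threshold as edges of a candidate model. The function names, the threshold value, and the example events are assumptions for illustration and do not reproduce the paper's method in detail.

```python
from collections import Counter, defaultdict


def transition_probabilities(events):
    """Estimate P(next event | current event) from one event stream."""
    counts = defaultdict(Counter)
    for a, b in zip(events, events[1:]):
        counts[a][b] += 1
    return {
        a: {b: n / sum(nexts.values()) for b, n in nexts.items()}
        for a, nexts in counts.items()
    }


def prune(probs, threshold=0.3):
    """Keep only transitions at or above `threshold` (a hypothetical cutoff)."""
    return {
        a: sorted(b for b, p in nexts.items() if p >= threshold)
        for a, nexts in probs.items()
    }


stream = ("edit compile test edit compile test "
          "edit compile edit compile test checkin").split()

probs = transition_probabilities(stream)
print(probs["compile"])
print(prune(probs))
```

In this stream, a compile is followed by another edit only once in four times, so that transition falls below the cutoff and is dropped from the candidate model.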

Retrieve gzip'ed Postscript.


Toward Metrics for Process Validation

Jonathan E. Cook and Alexander L. Wolf

ICSP 3, October 1994, Reston, Virginia, USA

To a great extent, the usefulness of a formal model of a software process lies in its ability to accurately predict the behavior of the executing process. Similarly, the usefulness of an executing process lies largely in its ability to fulfill the requirements embodied in a formal model of the process. When process models and process executions diverge, something significant is happening. We are developing techniques for uncovering discrepancies between models and executions under the rubric of process validation. Further, we are developing metrics for process validation that give engineers a feel for the severity of the discrepancy. We view the metrics presented here as a first step toward a suite of useful metrics for process validation.

Retrieve gzip'ed Postscript.

Slides.ps.gz


Partition Selection Policies in Object Database Garbage Collection

Jonathan E. Cook, Alexander L. Wolf, and Benjamin G. Zorn

SIGMOD, May 1994, Minneapolis, Minnesota, USA

The automatic reclamation of storage for unreferenced objects is very important in object databases. Existing language system algorithms for automatic storage reclamation have been shown to be inappropriate. In this paper, we investigate methods to improve the performance of algorithms for automatic storage reclamation of object databases. These algorithms are based on a technique called partitioned garbage collection, in which a subset of the entire database is collected independently of the rest. Specifically, we investigate the policy that is used to select what partition in the database should be collected. The policies that we propose and investigate are based on the intuition that the values of overwritten pointers provide good hints about where to find garbage. Using trace-driven simulation, we show that one of our policies requires less I/O to collect more garbage than any existing implementable policy and performs close to a near-optimal policy over a wide range of database sizes and object connectivities.
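
The intuition that overwritten pointers hint at where garbage lives can be sketched in a few lines of Python. The example is an assumption-laden illustration, not one of the paper's policies: the `OverwriteTracker` class, its methods, and the rule "collect the partition that overwritten pointers most often pointed into" are hypothetical.

```python
from collections import Counter


class OverwriteTracker:
    """Hypothetical bookkeeping for choosing which partition to collect."""

    def __init__(self, partition_of):
        # Maps an object id to the id of the partition that stores it.
        self.partition_of = partition_of
        self.overwrites = Counter()

    def record_overwrite(self, old_target):
        """Call when a pointer is overwritten; `old_target` is its previous value."""
        if old_target is not None:
            self.overwrites[self.partition_of[old_target]] += 1

    def select_partition(self):
        """Pick the partition that overwritten pointers most often pointed into."""
        if not self.overwrites:
            return None
        partition, _ = self.overwrites.most_common(1)[0]
        return partition

    def collected(self, partition):
        """Reset the count for a partition once it has been collected."""
        self.overwrites.pop(partition, None)


tracker = OverwriteTracker({"o1": 0, "o2": 0, "o3": 1, "o4": 2})
for old in ["o3", "o1", "o3", None, "o4"]:
    tracker.record_overwrite(old)

print(tracker.select_partition())
```

Two of the overwritten pointers used to refer to objects in partition 1, so that partition is selected first.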

Retrieve compressed Postscript.


Assertions for C++

Jonathan E. Cook

Class project for Programming Languages

In this report I describe types of program annotations that could be used in C++ to enhance the testability, assurance, and overall quality of the code being developed. These annotations are formal, processable assertions which capture constraints and specifications which cannot be discerned from the program code itself. I first describe in some depth previous work in this area, and then try to apply this work to the C++ programming language.

Retrieve gzip'ed Postscript.


AgentSim: A Simulation of a Petri Net Based Hardware Description Language

Jonathan E. Cook

M.S. Thesis, May 1991, Case Western Reserve University

Gdl, the hardware description language of the Agent design environment, is used by an engineer to describe the behavioral, structural, and physical aspects of his design. Gdl supports the design specification at the architectural and organizational levels of abstraction. Agent currently allows a designer to create and edit his design, and has the capability to do some static complexity analysis and synthesis. AgentSim is that part of Agent in which simulation of designs specified in Gdl is done. The behavioral specification in Gdl is based on Petri net theory, with extensions added to make the use of Petri nets practical. AgentSim takes the complete behavioral specification, the complete or partial structural specification, and initialization information supplied by the handles simulation of multiple processes and synchronization between them.

Retrieve gzip'ed Postscript.