The following list is in no particular order -- specifically, the list should not be interpreted as a prioritized list.
Last updated: May 9, 2014 (could still use some more editing, though)
TEAMS is an AOP system we are building; TEAMS stands for The Extensible Aspect Monitoring System. It will be designed specifically for creating tools that monitor software systems.
Use TEAMS on applications(MS)
Select one large or a few small Java applications, along with their monitoring needs, as case studies, and use TEAMS to accomplish the monitoring goals on those applications. Since TEAMS is in the prototype stage, this may involve finding and fixing issues in TEAMS.
Implement new PCD(s) using TEAMS(MS)
TEAMS is meant to be extensible, meaning new capabilities called pointcut designators can be added to it. This project would design and implement one or a few related PCDs in TEAMS, and demonstrate that they are effective and efficient.
SciMon TEAMS?(MS, PHD?)
Begin the design and construction of a non-Java-based TEAMS that is specifically for monitoring parallel scientific applications.
In general I am interested in exploring the use of runtime monitoring to help understand and improve applications that run on HPC platforms. This extends from the development side (where highly intrusive monitoring can be done) to the production side (where barely any monitoring can be done).
ProMon is short for "Production Monitoring" and is the result of a Ph.D. student's research. It is a framework for performing lightweight production-side monitoring of scientific applications. Projects related to ProMon include:
1: Extend ProMon to use MRNet (Multicast Reduction Network), a tool developed at UNM;
2: Extend ProMon to use LDMS/OVIS (a project out of Sandia Labs).
Application Profile Inference(PHD,MS)
Input into ProMon includes a specification of where in the application we want to monitor. Ideally, we could automatically infer the places in the application that should be monitored. This involves extracting as much information as possible out of an application that is heavily monitored in the lab, data-mining that information to infer general behavior, and then mapping that behavior to the application code. This project is being undertaken by Omar Aaziz as a Ph.D. project, but there is probably room for an M.S. project or two in it. Nasim Ghazfarani is already completing an M.S. project on this topic.
Java client for FTB(MS)
Build a Java library for creating FTB clients in Java. (Not high on the list right now.)
Agile software development is a new-ish software development model that is much less structured than others. Can it be measured? What does it mean to be measured? Can it be modeled? Can one discover a model of what is actually being done? Can one measure conformance to a model?
Modeling Agile Processes(MS/PHD)
How can agile processes be modeled? Can models be discovered? What is the right kind of model?
Measuring Agile Processes(MS/PHD)
What are meaningful measurements that could actually be achieved in an agile process?
Use and extend declarative mining tools(MS?)
Get some of the declarative process mining tools and use them. Find their weaknesses and implement extensions.
1. Opteron Simulator Enhancement
We have a mostly completed stochastic simulator of the AMD Opteron CPU, written in C++. It needs to be completed, tested, and enhanced in a variety of directions, mostly related to its memory model (including the instruction fetch model). Other needs are in using it to explore CPU design decisions and in generalizing the model away from Opteron-specific capabilities. This project would also involve using PAPI and Pin tools.
EZM is a small system for "easy monitoring" of MPI programs running on computational clusters. It needs to be extended in a variety of ways. You can read about it here.
1. EZM to CIFTS
CIFTS is a large system for cluster fault tolerance, and it has a component called FTB, or "Fault Tolerant Backplane". FTB is essentially a mechanism that accepts published event data and delivers it to the subscribers who are interested in that event (this is called a publish-subscribe mechanism). EZM is an ideal publisher of events, so it needs the capability to connect to FTB/CIFTS and publish events to it.
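To make the publish-subscribe idea concrete, here is a minimal in-process sketch in Java. It is not the FTB API: the class `EventBus`, its methods, and the event name `mpi.slow_rank` are all hypothetical names chosen for illustration. It only shows the pattern of a publisher handing events to a broker that forwards them to interested subscribers.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical in-process event broker; illustrates publish-subscribe only,
// and does not reflect the real FTB interfaces.
class EventBus {
    // Maps an event name to the handlers that subscribed to it.
    private final Map<String, List<Consumer<String>>> subscribers = new HashMap<>();

    // Register interest in one named event.
    void subscribe(String eventName, Consumer<String> handler) {
        subscribers.computeIfAbsent(eventName, k -> new ArrayList<>()).add(handler);
    }

    // Deliver the payload to every subscriber of this event.
    // Returns the number of subscribers that were notified.
    int publish(String eventName, String payload) {
        List<Consumer<String>> handlers =
                subscribers.getOrDefault(eventName, Collections.emptyList());
        for (Consumer<String> handler : handlers) {
            handler.accept(payload);
        }
        return handlers.size();
    }
}

class PubSubDemo {
    public static void main(String[] args) {
        EventBus bus = new EventBus();
        bus.subscribe("mpi.slow_rank", p -> System.out.println("subscriber got: " + p));
        bus.publish("mpi.slow_rank", "rank=3 wait=2.5s"); // delivered to one subscriber
        bus.publish("mpi.other", "nobody is listening");  // no subscribers, silently dropped
    }
}
```

In this picture EZM would play the role of the code calling `publish`, while the broker and subscribers would live in FTB/CIFTS rather than in the same process.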
2. EZM to MRNet
MRNet is a framework for efficient extraction of data from a cluster. The name stands for "Multicast Reduction Network". Since EZM generates data on every node of a cluster, and potentially a lot of it, it could use MRNet to move the data out of the cluster efficiently.
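The general idea behind a reduction network can be sketched as follows: instead of every node sending its data straight to one collector, nodes are arranged in a tree and each internal node combines its children's values before forwarding a single result upward. The Java sketch below simulates this in one process over an array; the class `TreeReduce`, the implicit k-ary tree layout, and the `fanout` parameter are assumptions made for illustration and are not MRNet's actual API.

```java
import java.util.function.LongBinaryOperator;

// Hypothetical single-process simulation of a tree-based reduction.
class TreeReduce {
    // values[i] is the measurement held by node i. The tree is implicit:
    // the children of node i are nodes i*fanout+1 through i*fanout+fanout.
    // Each node combines its own value with the reduced values of its
    // children, so the root ends up with one aggregate for the whole tree.
    static long reduce(long[] values, int node, int fanout, LongBinaryOperator op) {
        long acc = values[node];
        for (int c = 1; c <= fanout; c++) {
            long child = (long) node * fanout + c;
            if (child < values.length) {
                acc = op.applyAsLong(acc, reduce(values, (int) child, fanout, op));
            }
        }
        return acc;
    }

    public static void main(String[] args) {
        long[] perNodeCounts = {1, 2, 3, 4, 5, 6, 7};
        System.out.println("sum = " + reduce(perNodeCounts, 0, 2, Long::sum));
        System.out.println("max = " + reduce(perNodeCounts, 0, 2, Math::max));
    }
}
```

With a fanout of k, the root receives at most k messages per reduction rather than one from every node, which is the property that makes this style of data extraction attractive on large clusters.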
3. EZM to OVIS
OVIS is an end-user system for monitoring the status of a cluster and the applications running on it. Since EZM generates application monitoring data, it could communicate that data to OVIS, and OVIS could be extended to include a panel that shows application-specific information.
4. EZM (and others) to Web Services
Web Services offer a web-centric mechanism for making services and data available to a variety of clients. A Web Service portal to EZM data would be nice. It should also integrate other data and possibly enable some control capabilities.
Cloud computing has the potential to revolutionize the way we deploy computations and services. It has several models. In the IaaS ("Infrastructure as a Service") model, a user typically prepares an entire system image, including the operating system and applications, and uploads it to a cloud, where the image is run on virtualized hardware (e.g., using VMware, VirtualBox, Xen, or other hypervisors). Most of my interests are in the IaaS model.
1. Custom kernel for cloud application monitoring
Since users have to prepare an entire OS and application image in IaaS, we can create a custom IaaS OS kernel that includes some application monitoring capability. This project would do just that, probably focusing on using KProbes-like hooks and dynamic linker hooks for inserting application monitoring capability. There is more work to be done here, so another project is possible.
2. SBRT ported to cloud kernel
SBRT is a dynamic linker extension created as part of a PhD thesis. We could port it to an existing cloud Linux kernel distribution and use it for cloud application management and monitoring.
3. Computing Workflow
Workflow systems can automate and control the process of multi-step computational projects (and many other business processes). BPEL is an industry-standard workflow language that also works with Web Services to control and coordinate workflow. Apache ODE is a free BPEL server. An M.S. project would be to set up a BPEL workflow server and to program a workflow along with the web services needed to implement it.
AVR Java-based simulator and environment
Create an Arduino/AVR simulator in Java so that CS 273 students and others around the world can simulate and debug their Arduino programs.
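As a rough illustration of what the core of such a simulator might look like, the sketch below implements a fetch-decode-execute step for just three AVR instructions (`LDI`, `ADD`, and `NOP`), using the opcode bit layouts from the AVR instruction set. The class name `TinyAvr` and its fields are hypothetical, only the carry flag is modeled, and a real simulator would need the full instruction set, the remaining status flags, data memory, and I/O.

```java
// Hypothetical, heavily simplified AVR core: three instructions, carry flag only.
class TinyAvr {
    final int[] r = new int[32]; // general-purpose registers R0..R31, 8 bits each
    int pc = 0;                  // program counter, counted in 16-bit words
    boolean carry;               // C flag; the other status flags are omitted

    // Fetch one 16-bit opcode from flash, decode it, and execute it.
    void step(int[] flash) {
        int op = flash[pc++] & 0xFFFF;
        if ((op & 0xF000) == 0xE000) {
            // LDI Rd,K  encoded as 1110 KKKK dddd KKKK, with d in 16..31
            int d = 16 + ((op >> 4) & 0x0F);
            int k = ((op >> 4) & 0xF0) | (op & 0x0F);
            r[d] = k;
        } else if ((op & 0xFC00) == 0x0C00) {
            // ADD Rd,Rr  encoded as 0000 11rd dddd rrrr
            int d = (op >> 4) & 0x1F;
            int rr = ((op >> 5) & 0x10) | (op & 0x0F);
            int sum = r[d] + r[rr];
            carry = sum > 0xFF;  // carry out of bit 7
            r[d] = sum & 0xFF;
        } else if (op == 0x0000) {
            // NOP: nothing to do
        } else {
            throw new IllegalStateException(
                    String.format("unsupported opcode 0x%04X at word %d", op, pc - 1));
        }
    }

    public static void main(String[] args) {
        int[] flash = {
            0xEC08, // LDI R16, 200
            0xE614, // LDI R17, 100
            0x0F01  // ADD R16, R17
        };
        TinyAvr cpu = new TinyAvr();
        for (int i = 0; i < flash.length; i++) {
            cpu.step(flash);
        }
        System.out.println("R16 = " + cpu.r[16] + ", carry = " + cpu.carry);
    }
}
```

Keeping all CPU state in plain fields like this makes it straightforward for a debugger front end to display registers after every step, which is the main thing students would want from such a tool.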
Open source VHDL definition of the AVR processor
Build a VHDL implementation of the AVR processor definition; this could be used to fab an AVR clone, or it could be embedded in an FPGA that includes other capabilities.
Evaluate the Kitten OS as a cloud-capable OS; deploy an application software stack on Kitten running in a VM.
Arduino simulator / debugger
Build some Arduino tools that we can use in our undergraduate labs. Maybe a graphical simulator, on-board debugging tools, or other tools to be determined.