The Notio API: Design Principles And Description
Last Updated: 98/08/04
NOTE: This document is out of date. It will be revised or replaced
shortly. The only major change is that a Referent class has been added
that contains the Designator and Quantifier classes, and is contained by
Concepts.
Contents
What is Notio?
Notio is a Java class library for constructing and manipulating
conceptual graphs. Currently, the API provides facilities for operations
on single graphs or pairs of graphs. It is not a DBMS, though it
is intended to provide a means for working with data retrieved from conceptual
graph databases. Most conceptual graph software consists of applications,
not tools for building
applications. To address this deficiency, the primary focus in
creating Notio has been to design an interface rather than an implementation.
Thus, the classes and methods described in the documentation may be implemented
in a variety of ways. The hope is that applications will be written
on top of the Notio API and that the actual implementation could be seamlessly
replaced should the need arise. This project has therefore produced
two things:
-
documentation of the required classes
-
a reference implementation of the classes
The design places the highest priority on convenience
for the user of the API and only makes concessions to the implementor of
the package itself where they do not materially injure the simplicity and
usability of the API. It is anticipated that many different implementations
could be tried, some involving communications with custom CG databases
via a network client or calls into native libraries, some memory-based,
some in pure Java (like the reference implementation), and hybrid approaches.
Since these different approaches may involve radically different designs,
the API is intended as a set of abstractions that can sit on top of any
implementation and separate the application from the internal representation,
storage and manipulation routines.
Currently, Notio provides support for the management
of individual graphs, not large groups of graphs. Most of its operations
act on only one or two graphs at a time. As such, it is ideal as
a basis for CG editors or as a representation for data retrieved from a
large-scale system. Future development of Notio to include implementation
independent interfaces for dealing with large numbers of graphs may be
very useful and any discussion on this subject would be greatly valued.
Goals for Notio Design
-
Implementation Independence
-
Application Independence
-
Simplicity
-
Thread Safety
-
Exception-style Error Handling
-
Consistency
-
Interoperability
-
Extensibility
The Scope of Notio
The Notio API is not intended to implement all of Sowa's
original vision of Conceptual Graphs as outlined in his book [Sowa 1984].
In the main, it attempts to follow the ANSI CG98 Draft Standard as closely
as possible. The following sections of the 1984 book are, or will
be, covered by this API:
-
All of Chapter 3 except:
-
No support for the formula operator discussed in 3.3.2.
-
No explicit implementation of a canon (3.4.5) though all components
needed to construct a canon are present.
-
No special support for quantity contraction. This may be represented
using a relationship.
-
The following sections of Chapter 4:
-
Contexts and lines of identity (4.2.1 - 4.2.5).
An Overview of the API
Notio is carefully designed to avoid forcing any particular
organization on the application. For example, one is not forced to
create a knowledge base in which graphs are stored, there are no limits
on the number of type hierarchies and knowledge bases that may be created,
nodes can be created and manipulated without belonging to any graph, and
all structures can be connected and disconnected fairly freely. The
underlying implementation may impose constraints by requiring that components
follow some rules, but the API itself only imposes such restrictions where
reason requires it. This design principle avoids preconceptions that
may limit the range of possible applications of the API.
The following sections will provide an overview of
some of the more important classes and features of the API, along with
an account of the reasoning behind their design. There will also
be a limited discussion of the issues that should be considered while trying
to create an implementation of the API.
Graphs
(Graph)
The focus of most operations will be the Graph class.
Many applications will build and modify graphs with relatively little reference
to the substructures. Some applications, such as editors, will have
to address this more detailed level.
Graphs are broken down into nodes: concepts, relations,
and actors. The arcs are stored as part of the relation (or actor) objects.
Concepts may be added one by one to the graph, allowing unconnected graphs
which may be of use to graph editing software. Graphs can be set
to allow or forbid partially connected relations and actors. Editors
will probably want to allow such incomplete nodes whereas other applications
would prefer these to throw an exception.
If the concepts attached to relational nodes are
not already present in the graph, they are automatically added. There
is also a method which allows multiple relations (and their associated
concepts) to be added at the same time. This allows the creation
of a complete graph, or complex additions to a graph, in one atomic, thread-safe
operation.
Implementations may store the information about arcs
and nodes in whatever manner they please but the interface will always
give the impression that the nodes are contained by the graph and the arcs
are contained by the relations and actors.
Nodes
(Node, Concept, Relation, Actor)
The nodes are considered to be largely immutable objects.
Once created, the type of a node may not be changed. In the case of concept
nodes, the referent may not be changed either. Any operations that might
be viewed as changing a node (e.g. restriction) actually produce a new
node, distinct from the original but with the required differences. This
new node can then replace the existing node to effect a change in a graph.
These replacement operations should be atomic in order to provide thread
safety.
Note that in the current API, the arguments to relations
and actors may be incompletely specified and also changed at any time which
may present problems to thread-safe implementations. Since this feature
is chiefly intended for editing applications which will probably not require
thread-safety, implementations may choose to guarantee thread-safety only
when graphs are set to allow the addition of complete relations only.
This immutability of nodes has one major motivation
behind it. By requiring that changes to the components of a graph
be made via the Graph object, we reduce the amount of inter-class communication
required to update indices and other structures used for storing the graph.
This avoids the creation of a large number of call-back and notification
methods on the various classes. The immutable nodes force the removal
and addition of nodes in order to make changes. However, this principle
is not carried over to the arcs since it begins to seriously impede the
application developer.
Concepts
(Concept, Designator)
Concepts consist of a type, a quantifier, and a designator.
The latter two components constitute the referent of the concept.
Types are discussed below. Quantifiers are presently poorly defined
in Notio. At present they are represented by using a class that implements
the Macro interface. Macros can either be used as a simple placeholder
object with a name, or can actually provide an executable operation that
'executes' the macro and changes the graph accordingly. It is hoped
that future discussions in the CG community will lead to a more specific
and useful conception of quantifiers.
There are several kinds of designators implemented
in Notio. They are:
NameDesignator
Name designators are not really necessary since they can be expanded into
name relations.
They may be removed in the future but may be a useful means for speeding
up matching tasks.
One major problem is that it is not possible to combine a name designator
with other designators.
LiteralDesignator
These designators can be associated with any arbitrary Java Object.
This allows the embedding of strings, numbers, images, sounds, or any other
data type in a graph.
More work is needed to determine what features should be included to support
literals.
MarkerDesignator
Notio provides a MarkerSet class for storing Marker instances.
This combination of MarkerSets, Markers, and MarkerDesignators allows designation
of specific individuals.
Markers can also be associated with literal objects.
DescriptorDesignator
This designator is associated with a Graph object that describes the concept.
Compound graphs can be constructed in this manner. A concept with
a DescriptorDesignator is also known as a ‘context’.
SetDesignator
Set designators are not really necessary since they can be represented
in terms of relations and the other designator types, but could be used
to reduce computational complexity.
These designators consist of a set of other designators.
Set designators are not specified in CG98, and may be removed or provided
as a Notio extension.
LambdaDesignator
These designators act as bindable variables or wildcards depending on how
they are configured.
When set to bind, they bind against the first designator they are matched
against. Thereafter, they assume that designator's form for the purposes
of matching.
These can be used for complex operations similar in nature to Prolog queries.
Additions should perhaps be made to assist backtracking query systems.
Relations
(Relation)
Relations consist of a type and arguments (arcs) which
are an ordered list of Concept instances. Relations in Notio use
the CG convention that the first argument is an outgoing arc while the
rest are incoming arcs. With this convention, it is unnecessary to
specify arc directions. If a significant number of CG applications
and/or underlying implementations use directed arcs outside of this convention,
the Notio specification should be adjusted accordingly.
Actors
(Actor)
Notio currently provides a baseline level of support
for actors. Essentially, they are treated like Relations but may
have multiple incoming and multiple outgoing arcs. No 'active' functionality
has been defined yet. It is hoped that the CG community will suggest
ways of implementing active actors. It is even possible that a pluggable
actor implementation may be designed so that the behaviour of actors can
be selected dynamically.
Types
(ConceptType, ConceptTypeHierarchy, RelationType, RelationTypeHierarchy)
Concept types and their hierarchies are maintained by
creating a ConceptTypeHierarchy instance and adding ConceptType instances
to the hierarchy. The hierarchy instance can be used to find ConceptType
instances by type label, find subtypes and supertypes, and rearrange type
relationships. The type hierarchy instance acts as a container that
can control queries and changes in order to facilitate thread-safety.
Concept types may be described by a label, a type
definition, or both. For convenience, type relationships can be queried
by calling methods on the type instances themselves. The type instances
are mutable since it would prove difficult to replace a type in all existing
nodes that use it. Since all types are associated with a hierarchy,
this mutability does not seriously impede thread-safety as all changes
can be synchronized via the hierarchy.
RelationTypes and RelationTypeHierarchies work in
a manner almost identical to ConceptTypes and ConceptTypeHiearchies except
that RelationTypes may also specify a 'valence' (number of arcs).
This valence may be left unspecified in which case no restrictions are
placed on the number of arcs a relation may have. This feature is
particularly useful for editing applications where the user may want to
add arcs one at a time. Both Relations and Actors use the RelationTypes
since Actors are considered to be an extension of conceptual relations.
ConceptTypeDefinitions and RelationTypeDefinitions
are based on an Abstraction class which contains a Graph object and specifies
the Concepts that act as formal parameters in the Abstraction.
Knowledge Bases
(KnowledgeBase, MarkerSet, IndividualMarker)
A knowledge base is an optional component provided chiefly
as a convenient means for grouping together the type hierarchies and a
marker set into a single object. Additionally, a KnowledgeBase can
have an 'outermost context', which is a single Concept node that has a
DescriptorDesignator. This allows a KnowledgeBase to act as the ultimate
container of a compound graph and is consistent with the CG98 definition
of a knowledge base.
A MarkerSet is a set of Markers. Markers are
conceptually associated with an individual in the universe of discourse.
They have a unique integer identifier and are optionally associated with
a Java object. The MarkerSet roughly corresponds to the 'catalogue
of individuals' defined in CG98.
The only time a KnowledgeBase is absolutely required
is when initializing a parser or generator (see Translation).
This is because most useful translators require type hierarchies and a
marker set, and these are most conveniently packaged using a KnowledgeBase.
Matching
(MatchingScheme, MatchResult, NodeMapping)
One essential feature of almost any conceptual graph
system is the ability to provide some form of matching between graphs.
Different conceptual graph databases provide different types of matching
capabilities, often depending on the way that they store the graphs.
Applications may require many different forms of matching in order to extract
the information required. In the Notio API, very few restrictions
are imposed on how matching can be performed.
One complication involves the matching of compund
graphs. In order to deal with compound graphs and allow for almost
unlimited matching types, applications construct a MatchingScheme instance
which they pass to the matching methods. These matching schemes consist
of a set of predefined flags that specify how to match graphs and the various
substructures of graphs. Implementations need not allow all forms of matching.
The matching scheme requested by an application can be rejected by a Notio
implementation if it cannot handle that scheme. The matching scheme
is applied recursively to compound graphs but Notio also allows for nested
matching schemes so that different matching schemes can be applied to different
contexts.
Notio distributes matching capabilities across the
classes, so it is possible to match individual Concepts, Relations, Actors,
and types, rather than just graphs. Notio implementations may implement
their graph matching routines in terms of these simpler matching routines
or not as they choose, but should make these simpler routines available
if possible.
Matching results for graphs may be returned as a
MatchResult object. These objects contain a set of NodeMappings which
represent all of the different matches found between two graphs.
A NodeMapping lists the matching pairs of nodes from the two graphs.
Like the matching schemes, the MatchResults are nested in order to provide
the details of matches for compound graphs.
Copying
(CopyingScheme)
Copy is one of the four canonical operations defined
on conceptual graphs. It seems on the surface to be a simple operation
but when considering compound graphs with coreference links, there are
several possible ways to view the copy operation. It is also useful
to be able to copy individual nodes. Finally, the implementational
aspects of copying complicate the purely theoretical interpretation.
Notio attempts to provide a general purpose copy operation by using an
approach that is similar to the way matching is handled.
Copying is distributed across the classes so that
individual nodes may be copied. Notio implementations are not required
to use the simpler routines as a basis for the more complex, but they can
if they choose. The copying is controlled by creating an instance
of the CopyingScheme class which is essentially a collection of flags that
describe how each component should be copied. Like the MatchingScheme
class, one can create nested CopyingSchemes so that components in different
contexts may be copied differently. It is also possible to specify
a CopyingScheme for all contexts contained within a given context.
The copying process also uses a 'substitution table'
which ensures that once a component is copied, all subsequent references
to it will refer to that copy, rather than creating a new copy every time
the component is referenced. This is similar to the way in which
Java performs object serialization. These substitution tables can
be used to customize the copying process and also provide a means for mapping
the original components to the copies once the operation is complete.
Extensions
(Extensions)
Since many different variations and features will be
required in different applications, Notio provides a means for formally
and programmatically identifying and controlling extensions to the API.
This means that applications can ask a Notio implementation if it supports
a given extension, activate it, and then configure it if necessary.
Using Java, it may be possible to write implementation-independent extensions.
Such extensions would not directly access the underlying implementation
but instead would only use the standard Notio API. These non-native
extensions could be added to any Notio implementation. Native extensions
could exploit unique features of the underlying implementation or provide
more powerful or efficient alternatives to standard API calls.
Translation
(Parser, Generator, TranslationContext)
In order to be useful, the Notio API must provide means
for importing and exporting graph information. Given that there are
several formats available for interchanging graphs (CGIF, LF, KIF, and
others) it was decided that the Notio API itself would not support any
particular format but instead provides a pluggable architecture for Parsers
and Generators. Parsers read in graph descriptions and use standard
Notio calls to generate the graphs. Generators use standard Notio
calls to examine graphs and generate output.
Parser and Generator are interfaces which any parser
or generator must implement in order to conform to the Notio pluggable
translation architecture. These interfaces provide some standard
method signatures that applications can use to invoke any translation implementation.
Since the translators use standard Notio calls, they can be used on any
Notio implementation and need not be replicated for each implementation.
All translators are stream-based, so they can be used on files, network
stream, or any of the many stream classes that Java provides.
The reference implementation currently includes a
CGIF parser and a CGIF generator. LF translation implementations
are also underway. These reference implementations are contained
in a separate package (notio.translators) since they are not part of the
Notio API but they can be used with any Notio implementation.
The TranslationContext class allows for the storage
of symbol tables and other data constructed during parsing and generating.
This allows information to preserved between different translation sessions.
For example, using a translation context means that variable names or individual
markers can be preserved between parsing and subsequent generation.
It is also necessary since the parsers and generators cannot always determine
when it is appropriate to clear symbol tables, etc. Repeated requests
to parse a graph may be either a set of distinct graphs or a sequence of
nested graphs. In the first case, the symbol tables might need to
be cleared between sessions whereas in the second case, clearing the tables
would destroy the sense of the graph. This class still needs considerable
work in order to provide a truly universal context for translation.
At present it is too specific to the requirements of CGIF translations.
Operations
There are many operations that may be performed on graphs
or their substructures. Notio attempts to provide many of the common
operations and more may be implemented by using combinations of calls.
The standard canonical operations on graphs and nodes are defined as part
of the API (Join, Restrict, Simplify, Copy). Type expansion operations
are also included amongst others. In the current phase of Notio development,
the focus is on the data structures, basic building operations, and translators.
The reference implementation provides some of the API's defined operation
but much work remains to be done in this area. There are also many
operations still to be added to the API. Available or planned operations
include:
-
Join
-
Notio allows for single-point, multi-point, and maximal joins.
-
Simplify
-
Notio allows for the simplification of a specific relation node, all nodes
of a given type, or an entire graph.
-
Copy
-
Restrict
-
Restriction is actually implemented on the nodes and typically produces
a new node with the required restriction. This approach suffers from
the same problems as the copy operation and will need to be reconsidered.
-
Type Expansion
-
Notio allows for expansion of concept and relation types.
-
Type Contraction
-
This has yet to be added to the API.
-
Set Coercion and Expansion
-
This has yet to be added to the API and depends partly on whether sets
remain part of the standard API or are made into an extension.
-
Denotation
-
This has yet to be added to the API. Since Notio is not currently
designed for large numbers of graphs, it is not clear how this should be
added.