next up previous
Next: Utilizing Word-Sense Distinctions Up: Acquiring Lexical Knowledge Previous: Corpus analysis (using statistical

Analysis of dictionary definitions

In the mid-80's, a trend began towards building more realistic applications of natural language processing. Earlier work, in addition to being restricted to toy domains, generally dealt with limited lexicons. The analysis of machine readable dictionaries (MRD's) therefore became a popular way to overcome this limitation. The initial approaches concentrated on using the information explicitly provided, such as grammatical codes; the main exception was that definitions were analyzed to establish the ISA hierarchies implicitly specified for the terms defined. This involved the extraction of the genus headwords along the lines of Amsler's [3] manual analysis of Webster's pocket dictionary. Much work was done with Longman's Dictionary of Contemporary English (LDOCE), both because of favorable research licensing and because of its restricted vocabulary of defining terms. [12] contains a good survey of early LDOCE-related research. It also illustrates some of the commonly encountered difficulties, such as dealing with errors in the typesetting formatting and with inconsistencies in the definitions.

Later on, additional implicit information was extracted from the dictionaries. Alshawi [2] and Jensen and Binot [63] applied pattern matchers to extract case or thematic relations among words defined in the entries. Similar approaches have been applied since, most recently by Barrière [7] and Vanderwende [114,116]. The review will concentrate on this work since it is most relevant for the research to be performed. See [125] for a comprehensive survey of MRD-related research.

The main contribution of Amsler's [3] thesis is the development of procedures for the extraction of genus hierarchies from machine readable dictionaries. This is a manually intensive process because the genus terms must be disambiguated by human informants. The noun and verb hierarchies extracted from the Merriam-Webster Pocket Dictionary were analyzed in depth. In addition, this work contains useful information on other aspects of analyzing dictionary definitions, such as in the analysis of the differentia descriptors used in motion verbs, suggestions for parsing dictionary definitions, indications of what might be expected from a deep analysis (unraveling morphological relations), and the description of a technique for disambiguating dictionary definitions. A similar disambiguation technique was popularized by Lesk [72]. Note that these analyses establish a practical limit for what might be expected from automated analysis of dictionary definitions.

The collection of papers in [12] provides a comprehensive account of extracting information from LDOCE, implicit as well as explicit. Several of these deal with extracting syntactic information through analysis of the subcategorization codes [1,14,26]. The general consensus is that, compared to other dictionaries, the LDOCE grammatical coding is very useful but needs to be refined for the application at hand to account for a few minor inconsistencies. Specifically, LDOCE is sufficient as the basis of a syntactic lexicon, but manual verification is required to ensure accuracy. The article by Boguraev and Briscoe [14] is noteworthy in that it presents an early test of Levin's [73] hypothesis regarding verbal classes and syntactic realization. Using the grammar codes, they determined when certain verb senses allowed the dative alternation.5 These verb senses were then classified into one of the Levin classes that license the alternation (transfer-of-possession verbs and change-of-position verbs). The results show that 73% of the cases alternate, but only 50% of these were in the licensing classes, indicating that the hypothesis is more of a tendency than a hard and fast rule. A few articles discuss extracting semantic information from LDOCE. Vossen et al. [120] characterize the four major types of nominal definitions in LDOCE, termed link (basic genus format), synonym, linker (empty-head), and shunter (e.g., cross-reference to verbal definition). The last two types are the most frequent, which indicates that genus identification might not be as simple as it seems. One reason for the large number of shunters is that derivatives occur often.

Alshawi [2] discusses how to extract semantic information from dictionary definitions. This is important because it was one of the first articles to discuss the extraction of information beyond the genus terms. Pattern matching rules are applied to the definitions, such as the following (converted into extended regular expressions6):
\begin{example}N .* (DET)? .* (ADJ)* (NOUN)? \\
N (DET)? (ADJ)* (NOUN)* NOUN THAT-WHICH $\langle$VERB-PRED$\rangle$
\end{example}
When a rule applies, an associated structure building rule is processed. For instance, in the first case, the genus will be identified along with the optional modifiers. In the second case, the predication will also be identified. For efficiency, the rules are organized hierarchically, which in this case would make the second rule a child of the first. Performance results indicate that the rules are effective in extracting 77% of the genus terms for a sample of 387 definitions. Furthermore, 53% of the cases had additional information correctly extracted.
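The rule-plus-structure-building scheme above can be sketched in code. This is an illustrative reconstruction, not Alshawi's actual rules: the two regexes below are invented stand-ins for the patterns shown, with the more specific rule (the child in the hierarchy) tried before its parent, and each rule paired with a structure-building function.

```python
import re

# Illustrative stand-ins for Alshawi-style rules (not the actual rule set):
# each pattern is paired with a structure-building rule; the more specific
# rule (a child in the hierarchy) is tried before the more general one.
RULES = [
    # "... NOUN that/which <verb phrase>": genus plus a predication
    (re.compile(r"(?:a|an|the)?\s*((?:\w+\s)*?)(\w+)\s+(?:that|which)\s+(.+)"),
     lambda m: {"genus": m.group(2), "modifiers": m.group(1).split(),
                "predication": m.group(3)}),
    # "... (ADJ)* NOUN": plain genus with optional premodifiers
    (re.compile(r"(?:a|an|the)?\s*((?:\w+\s)*?)(\w+)$"),
     lambda m: {"genus": m.group(2), "modifiers": m.group(1).split()}),
]

def analyze(definition: str):
    """Apply the first matching pattern and run its structure builder."""
    for pattern, build in RULES:
        m = pattern.match(definition.strip())
        if m:
            return build(m)
    return None

print(analyze("a large building that houses aircraft"))
print(analyze("a small domesticated carnivore"))
```

A real system would match over part-of-speech tags rather than raw words, but the control flow (ordered rules, associated builders) is the point being illustrated.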

Markowitz et al. (1986) describe several common patterns used in dictionary definitions. Some of these serve specifically to resolve the genus relationships given uninformative genus headwords such as ``any''. This is the empty-head problem noted by Bruce and Guthrie [22]. One common pattern for these is Any-NP or Any-of-NP, in which the genus is given by the NP element. There is a special case of this pattern for definitions of terms in biology:
\begin{example}x: any of / [mod] taxon (formal name) / of [mod] superordinate /
...
...amily (Gramineae) of monocotyledonous mostly
herbaceous plants ...
\end{example}
Other taxonomy-related patterns are used to indicate set membership and to implicitly indicate certain aspects of the term. For instance, one that ... describes an agent (usually human), and Act of $\langle$verb$\rangle$ing indicates that the verb associated with the noun is active rather than just stative. There are also patterns for the active/stative distinction in adjectives: being ... suggests active, whereas of or relating to suggests stative. This article presents several useful patterns for analyzing dictionary definitions, although the emphasis is on what can be inferred from definitions rather than on describing just what is inferred. However, unlike [2] and [63], Markowitz et al. present statistics indicating how often the rules apply in dictionary definitions. This is very important for determining how prevalent the patterns are. Unfortunately, there are no coverage statistics to indicate the extent to which other patterns are required.
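The Markowitz-style patterns discussed above lend themselves to a simple classifier. The regexes and pattern names below are illustrative encodings of the examples in the text, not the article's actual rules (and the suffix stripping is deliberately naive):

```python
import re

# Hypothetical encodings of the Markowitz-style patterns discussed above;
# the names and regexes are illustrative, not taken from the article.
PATTERNS = {
    # empty head: the real genus is the NP following "any (of)"
    "empty_head_genus": re.compile(r"^any (?:of (?:a|an|the)\s+)?(\w+)"),
    # "one that ..." marks an agent (usually human)
    "agent": re.compile(r"^one that\b"),
    # "act of <verb>ing" marks the associated verb as active, not stative
    "active_nominal": re.compile(r"^act of (\w+)ing\b"),
}

def classify(definition: str):
    """Return the patterns that match, with any captured filler."""
    hits = {}
    for name, pat in PATTERNS.items():
        m = pat.match(definition.lower())
        if m:
            hits[name] = m.group(1) if m.groups() else True
    return hits

print(classify("any of a genus of grasses"))
print(classify("act of stealing"))
```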

Wilks et al. [122] describe three different methods for analyzing LDOCE. All deal with aspects of taking the existing information in LDOCE and converting into machine tractable dictionaries. Method one is based on co-occurrence analysis of the control vocabulary usage in the definitions and examples. Using Pathfinder, reduced networks are produced showing the connectivity of related terms. Also, comparisons of the co-occurrence-based semantic relatedness scores versus human ratings show high correlations. Method two is based on bootstrapping a lexicon from a handcrafted lexicon for a subset of the controlled vocabulary. Details are sketchy, but the process iteratively produces new lexical entries based on the lexical entries for the defining words along with parse analysis of the definition and examples. Each lexical entry includes the superordinate term along with preferred values for specific case roles. Method three (Lexicon-Producer) creates lexical entries based on the explicit information in the online version of LDOCE (grammar code, box code, and subject code), as well as from pattern matching over parses of the definition to yield the genus term, basic features (e.g., modifiers), and some functional properties (e.g., used-for). Again details are sketchy.
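The co-occurrence analysis of method one can be sketched on toy data. The definitions below are invented, and the normalized joint-count score is a stand-in for whatever relatedness measure Wilks et al. actually fed to Pathfinder:

```python
from collections import Counter
from itertools import combinations
import math

# Toy sketch of method one: relatedness of defining-vocabulary words from
# their co-occurrence in definition texts. The definitions are invented;
# Wilks et al. worked over the full LDOCE control vocabulary.
definitions = [
    "a vehicle with wheels used on roads",
    "a road vehicle with an engine and wheels",
    "a path or road for walking",
    "a small vehicle moved by pushing",
]

# count, for each word pair, how many definitions contain both words
cooc = Counter()
for text in definitions:
    for a, b in combinations(sorted(set(text.split())), 2):
        cooc[(a, b)] += 1

def relatedness(w1, w2):
    """Joint count normalized by individual frequencies (cosine-style)."""
    freq = Counter(w for text in definitions for w in set(text.split()))
    joint = cooc[tuple(sorted((w1, w2)))]
    return joint / math.sqrt(freq[w1] * freq[w2])

print(relatedness("vehicle", "wheels"))  # both occur in 2 of 4 definitions
```

A network-reduction step such as Pathfinder would then prune the weighted graph induced by these scores down to its strongest links.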

Two methods are described for using this information. One is the Lexicon-Consumer, which parses text using the word-sense frames from the Lexicon-Producer. Alternative parses are maintained, ordered by preference satisfaction. The other is the system of collative semantics, which is designed to produce mappings between sense frames to capture their relatedness. Noun sense frames have structural and functional properties. Verbal sense frames and other two-place predicates (e.g., prepositions) have case information. Other sense frames (one-place predicates such as adjectives and adverbs) have preferences (restrictions) and assertions (defaults or expectations). Note that the frames for prepositions, adjectives, and adverbs are richer than those produced by the Lexicon-Producer, so this article only sketches the problems to be expected if this information were derived from LDOCE.

In work related to the above, Slator and Wilks [109] sketch out an approach for deriving rich lexical entries from the information present in Longman's. This first incorporates the explicit information available in the online version of LDOCE: the extended grammatical category, semantic restrictions, and pragmatics code. The definitions are then parsed to extract information from the differentia. The parse tree is added to the entry as well as case information derived via pattern-matching rules. A preliminary investigation of the patterns in LDOCE suggests that the case usage is fairly uniform.

The collection of papers in [50] discusses the issues involved in creating machine tractable dictionaries (MTD's), which are basically machine readable dictionaries (MRD's) made more explicit. Most of these papers are favorable to the notion of taking information from existing dictionaries and making it more explicit, although additional sources will be needed, such as linguistically-motivated corpus analysis [57] and the data lexicographers use in preparing dictionaries [25]. Atkins [6] voices the main objections, as discussed above, feeling that MTD's should be designed entirely from scratch. Nirenburg [87] also favors manual construction and illustrates some of the problems that need to be addressed, based on practical experience in developing a large lexicon for KBMT. For instance, in the language-independent ontology there needs to be a balance between having too many concepts that are not universal and having too few concepts (i.e., primitives), which makes it difficult to specify mappings. The collection closes with an extended article by Guo [84], which is basically a condensed version of his thesis. This fleshes out the bootstrapping process described above. In addition, it discusses how lexical entries are created by parsing the LDOCE definitions, guided by manually encoded preference knowledge for a subset of the defining vocabulary. This knowledge consists of thematic-style relations for pairs of word senses and is used primarily for word-sense disambiguation of the definition text. Analysis of the definitions and example sentences yields further preference information; inductive machine learning techniques are used to generalize these (via the genus hierarchy) to cover a larger number of cases. Note that this inferred preference information consists solely of associations among word senses (i.e., no thematic relations).
The end result will be a lexicon consisting of word-sense disambiguated definitions along with the preference information; but, only a small sample has been processed so far.
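The generalization-via-genus-hierarchy step can be illustrated with a toy sketch. The hierarchy, the preference pairs, and the counting threshold below are all invented; Guo's system used inductive learning over word-sense associations rather than this simple lifting rule:

```python
from collections import Counter

# Invented toy genus hierarchy: child word -> its genus (superordinate).
parent = {"sparrow": "bird", "eagle": "bird", "bird": "animal"}

def generalize(preferences, threshold=2):
    """Lift a preference to a genus when enough of its children share it,
    so that the preference covers unseen sibling words as well.
    `preferences` is a set of (word, preferred-associate) pairs."""
    lifted = Counter()
    for word, pref in preferences:
        if word in parent:
            lifted[(parent[word], pref)] += 1
    return set(preferences) | {pair for pair, n in lifted.items()
                               if n >= threshold}

prefs = {("sparrow", "fly"), ("eagle", "fly")}
print(generalize(prefs))  # adds ("bird", "fly") to the original pairs
```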

Amsler's [4] introduction to the collection is noteworthy in providing an honest appraisal of the articles presented and in adding more depth to the discussion, in fact being more informative on the main issues than several of the individual articles. A common theme in his criticisms is how dictionaries are perceived by computational linguists as being more definitive than they actually are. For instance, he notes the common fallacies of 1) considering different dictionaries to have uniform quality or 2) feeling that logical definitions should be sufficient to cover a broad range of uses.

Vanderwende [114] shows how thematic information extracted from dictionary definitions can be used to interpret noun sequences, specifically for the case of two word sequences. For each pair of the senses for the head and the modifier of the sequence, the algorithm checks a series of rules, each of which lists criteria on the thematic relationship between the head and modifier. It first checks the rules for exact matches of the head and modifier relationships using the definitions from the two words. If no matches are found, then indirect matching is applied by checking whether some thematic relation can be established from either the head or modifier to another with which direct matching can be performed.
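The two-stage matching just described can be sketched as follows. The relation tables are invented stand-ins for what would be extracted from each word's definition, and the control flow (direct match first, then one indirect hop) is a simplification of Vanderwende's rule series:

```python
# Invented relation tables standing in for information extracted from the
# dictionary definitions of each word; relations[word] maps a thematic
# relation to the set of words it holds of.
relations = {
    "knife": {"purpose": {"cut"}, "material": {"steel"}},
    "bread": {"typical_object_of": {"cut", "bake"}},
}

def interpret(modifier, head):
    """Interpret the noun sequence `modifier head`."""
    # direct match: a relation in the head's definition mentions the modifier
    for rel, fillers in relations.get(head, {}).items():
        if modifier in fillers:
            return rel
    # direct match from the modifier's definition instead
    for rel, fillers in relations.get(modifier, {}).items():
        if head in fillers:
            return rel
    # indirect match: follow one relation from the head, then retry
    for rel, fillers in relations.get(head, {}).items():
        for mid in fillers:
            result = interpret(modifier, mid)
            if result:
                return f"{rel} -> {result}"
    return None

print(interpret("steel", "knife"))  # direct match on the head's definition
print(interpret("bread", "knife"))  # indirect match via "cut"
```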

Her thesis [116] provides more details on the rules used to extract semantic information from MRD's. This approach, based on the work of Jensen and Binot [63], relies heavily on the syntactic analyses provided by the MEG parser7. Some points of interest regarding the grammar follow. First, it is not customized to dictionary definitions at all. In addition, subcategorizations are only used to guide the parse, not to rule out potential parses, which is important because definitions incorporate various forms of ellipsis more than normal text does. And a single parse is always produced (using default right attachment but indicating other potential attachment sites). The following is a typical rule from her system [116, p. 193]:


\begin{example}LOCATION-OF pattern: if the hypernym is post-modified by a
relati...
...ION-OF relation with the head of the relative clause as the value.
\end{example}

In addition to extracting thematic roles, similar rules are used to extract functional information, such as the underlying subject and object for embedded verbals. The difficulty in extracting this type of information via simple pattern matching over definitions, as well as complications due to coordination, suggests that more superficial approaches will not be suitable. The full set of relations extracted is shown in Table 4. Note that the rules for extracting several of these were not described.
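A rule of the LOCATION-OF kind quoted above can be sketched over a toy parse representation. The dict-based parse and the trigger words are assumptions; the MEG parser's real output is a full syntactic analysis:

```python
# Hedged sketch of a LOCATION-OF-style rule. The parse is a simple dict
# rather than real MEG parser output, and the clause introducers tested
# for are assumptions.
def location_of(parse):
    """If the hypernym is post-modified by a relative clause introduced by
    'where' or 'in which', record a LOCATION-OF relation whose value is
    the head of that relative clause."""
    clause = parse.get("rel_clause")
    if clause and clause.get("introducer") in ("where", "in which"):
        return {"LOCATION-OF": clause["head"]}
    return {}

# e.g. "school: a place where children are taught"
parse = {"hypernym": "place",
         "rel_clause": {"introducer": "where", "head": "taught"}}
print(location_of(parse))  # {'LOCATION-OF': 'taught'}
```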


 
Table 4: Relations extracted by Vanderwende's system
Cause, Domain, Hypernym, Location, Manner, Material, Means, Part, Possessor, Purpose, Quasi-hypernym, Synonym, Time, TypicalObject, TypicalSubject, User

Adapted from [101, Table 2.1]

 

Barrière's thesis [7] presents a coherent account of all the steps for acquiring semantic knowledge from a dictionary written for children, in particular the American Heritage First Dictionary (AHFD). The strengths of this work include the following: the use of the conceptual graph formalism as a unifying framework; the empirical analysis which supported the research, especially regarding the semantic relations; and the useful comparisons between AHFD and the American Heritage Dictionary (AHD) throughout the analysis. There are four basic steps in her process: parse the definitions using a general grammar; transform the parses to conceptual graphs; refine the lexical relations in the conceptual graphs; and combine the conceptual graphs from different definitions. The grammar is a simple context free grammar with some customization to dictionary definitions, such as the use of the ``meaning verb'' category. Parsing heuristics are used for filtering out certain cases, adding a lightweight semantic component to the parser. The conversion into conceptual graphs is basically a surface-level transformation from the parse tree into the conceptual graph notation. Some rules are more general than dictionary definitions alone require, since they are geared towards the typical usage sentences (e.g., ``ash is what is left ...''). Semantic Relations Transformation Graphs (SRTG) are used to extract relations from the conceptual graph representation of the shallow parses for the MRD entries. Some rules are quite specific and lead to unambiguous semantic relations; others are just heuristics about plausible interpretations. See Table 5 for the list of relations extracted; note that some of these, such as home (``a hive is a home for bees''), are very special purpose. It seems that some of these work due to the limited range of the AHFD entries, thus restricting the applicability to more general texts.
These transformation rules constitute the most important part of the system in terms of the extraction of lexical relations.
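The flavor of these transformation rules can be shown with a minimal sketch. A real SRTG rewrites one conceptual graph into another; the surface pattern below is an invented stand-in modeled on the home example from the text:

```python
import re

# Illustrative stand-in for an SRTG rule: a surface pattern over the
# definition instead of a real conceptual-graph rewrite. The rule is
# invented, modeled on the "a hive is a home for bees" example.
HOME_RULE = re.compile(r"^(?:a|an)\s+(\w+) is a home for (\w+)")

def apply_srtg(definition):
    """Return a (concept, relation, concept) triple if the rule fires."""
    m = HOME_RULE.match(definition)
    if m:
        return (m.group(1), "home", m.group(2))
    return None

print(apply_srtg("a hive is a home for bees"))  # ('hive', 'home', 'bees')
```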

The final step is to combine the conceptual graphs from different definitions to integrate the information about particular concepts. This is useful because the information from a subordinate concept might refine that of the superordinate (e.g., stem-partof$\rightarrow$plant $\Rightarrow$ plant-haspart$\rightarrow$stem). Richardson's work [101] exploits this information by inverting links rather than through graph operations. Note that it would be interesting to see how this scales up to encyclopedic entries: some useful relations should be extractable.
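The link-inversion idea can be sketched directly. The inverse-relation table below is illustrative, not Richardson's actual inventory:

```python
# Minimal sketch of link inversion as described above; the table of
# inverse relations is illustrative.
INVERSE = {"partof": "haspart", "haspart": "partof",
           "hypernym": "hyponym", "hyponym": "hypernym"}

def invert(triples):
    """Given (head, relation, tail) triples extracted from definitions,
    add the inverted links so the information is reachable from either
    concept (e.g., partof on 'stem' yields haspart on 'plant')."""
    inverted = set(triples)
    for head, rel, tail in triples:
        if rel in INVERSE:
            inverted.add((tail, INVERSE[rel], head))
    return inverted

print(invert({("stem", "partof", "plant")}))
```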


 
Table 5: Relations extracted by Barrière's system
About, Accompaniment, Act, Agent, As, Attribute, Cause, Content, Direction, During, Event, Experiencer, Frequence, Function, Goal, Home, Instrument, Intention, Like, Location, Manner, Material, Method, Modification, Name, Object, Obligation, Opposite, Path, Possession, Process, Recipient, Result, Sequence, Synonymy, Taxonomy, Transformation

Based on transformation rules in Appendix E of [7]

 

