
Subcategorizations, etc.

Levin [73] has characterized the semantics of verbs according to alternations in the expression of arguments that affect meaning (termed diathesis alternations). The motivating principle is that verbal meaning determines syntactic realization. A practical consequence would be that lexicon entries need not encode as much information about subcategorization, since this would be determined by the word's subtype. More importantly, syntax can serve as a constraint on the possible meanings of a given verb by corroborating certain classes and disqualifying others. A simple example illustrates the distinctions that the various classes make: Figure 1 shows four classes for verbs of contact, which can be categorized by the types of alternations they participate in.


  
Figure 1: Examples of Levin verb classes
[Table of alternations and example verbs by class; rows truncated in the source ("Alternat... ...tch"). Surviving row: Break: hack, split, tear]
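The idea that alternation behaviour can corroborate some classes and disqualify others can be sketched as a simple filter. This is a minimal illustration, not Levin's procedure: the three alternation names and the yes/no judgments below are assumptions, following the commonly cited touch/hit/cut/break example rather than the contents of Figure 1, and `classes_for` is a hypothetical helper.

```python
# Assumed alternation inventory (illustrative only, not taken from Figure 1).
ALTERNATIONS = ("conative", "body_part_ascension", "middle")

# Assumed participation profile for each class, in the order of ALTERNATIONS.
CLASS_PROFILES = {
    "Touch": (False, True, False),
    "Hit": (True, True, False),
    "Cut": (True, True, True),
    "Break": (False, False, True),
}


def classes_for(observed):
    """Return the classes consistent with every observed alternation judgment.

    `observed` maps an alternation name to True (the verb participates)
    or False (it does not); alternations that are not mentioned are
    treated as unknown and do not constrain the result.
    """
    consistent = []
    for name, profile in CLASS_PROFILES.items():
        expected = dict(zip(ALTERNATIONS, profile))
        if all(expected[alt] == value for alt, value in observed.items()):
            consistent.append(name)
    return consistent


# Evidence from a single alternation narrows the candidates;
# evidence from two can single out one class.
one_clue = classes_for({"middle": False})
two_clues = classes_for({"conative": True, "middle": True})
```

Each additional judgment can only shrink the candidate list, which is the sense in which syntax acts as a constraint on meaning here.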

One of the first experiments to test this claim was conducted by Dorr and Jones [35]. They extracted syntactic signatures from superficial parses of the example sentences for each class, including both positive and negative cases. Of the 191 Levin verb classes, 187 (97.9%) had unique signatures.
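The uniqueness check can be sketched as follows. The representation is an assumption: each signature is modelled as a set of (pattern, polarity) pairs, where the polarity records whether the pattern came from a positive or a negative example, and the class names and patterns in `toy_signatures` are invented. This is not Dorr and Jones's actual encoding.

```python
from collections import Counter


def unique_signature_classes(class_signatures):
    """Return the classes whose signature is shared with no other class."""
    counts = Counter(class_signatures.values())
    return sorted(c for c, sig in class_signatures.items() if counts[sig] == 1)


# Hypothetical signatures: (pattern, True) for a pattern seen in a positive
# example, (pattern, False) for one seen in a negative example.
toy_signatures = {
    "class_a": frozenset({("NP V NP", True), ("NP V", False)}),
    "class_b": frozenset({("NP V NP", True), ("NP V", True)}),
    "class_c": frozenset({("NP V NP PP", True)}),
    "class_d": frozenset({("NP V NP PP", True)}),
}

unique = unique_signature_classes(toy_signatures)
toy_coverage = len(unique) / len(toy_signatures)

# The reported figure is the same ratio computed over the real classes.
reported_percent = round(100 * 187 / 191, 1)
```

In the toy data, `class_a` and `class_b` are separated only by the negative example, which shows why including negative cases in the signature matters.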

A potential problem with the Levin verb classes is that a verb can belong to more than one class. To determine the resulting ambiguity, the verbs were grouped according to the union of the signatures of all the verb classes each one occurs in. This produced 768 groups, of which only 12 mapped directly onto a Levin class. So only 6.3% of the 191 Levin verb classes can be determined uniquely from the signatures of ambiguous verbs. This suggests that resolving the verb class directly from context would alleviate semantic ambiguity. However, as discussed later, Boguraev and Briscoe [14] performed a simple test of the Levin hypothesis that produced less positive results.
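The grouping step can be sketched in the same style. Two assumptions are made here: signatures are plain sets of pattern labels, and a group "maps directly" onto a class when its union signature is identical to the signature of exactly one class. The source does not spell out that criterion, so it is an interpretation, and all names and data below are invented.

```python
from collections import defaultdict


def group_verbs(verb_classes, class_signatures):
    """Group verbs by the union of the signatures of the classes they occur in."""
    groups = defaultdict(list)
    for verb, classes in verb_classes.items():
        union = frozenset().union(*(class_signatures[c] for c in classes))
        groups[union].append(verb)
    return dict(groups)


def directly_mapped(groups, class_signatures):
    """Map each group signature to a class when exactly one class has it."""
    by_signature = defaultdict(list)
    for cls, sig in class_signatures.items():
        by_signature[sig].append(cls)
    return {
        sig: by_signature[sig][0]
        for sig in groups
        if len(by_signature.get(sig, [])) == 1
    }


# Hypothetical classes and verbs.
toy_class_signatures = {
    "c1": frozenset({"A"}),
    "c2": frozenset({"B"}),
    "c3": frozenset({"A", "B"}),
    "c4": frozenset({"C"}),
}
toy_verb_classes = {
    "v1": ["c1"],
    "v2": ["c1", "c2"],  # ambiguous: its union looks like c3
    "v3": ["c2"],
    "v4": ["c3"],
    "v5": ["c1", "c4"],  # ambiguous: its union matches no class
}

toy_groups = group_verbs(toy_verb_classes, toy_class_signatures)
toy_mapped = directly_mapped(toy_groups, toy_class_signatures)

# The reported percentage: 12 directly mapped groups over 191 classes.
reported_percent = round(100 * 12 / 191, 1)
```

The verb `v2` shows how ambiguity can mislead: its union signature coincides with that of `c3`, a class it does not belong to. The verb `v5` ends up in a group that corresponds to no class at all.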

NYU's COMLEX Syntax resulted from another project that places strong emphasis on subcategorizations, although the connection to semantics is not as stressed [47,74]. COMLEX contains a lexicon of 38,000 root forms, detailing each word's subcategorization frames and providing citations of the subcategorizations in several different corpora (100 MB total). It was constructed from various sources, including LDOCE, OALD (Oxford Advanced Learner's Dictionary), ACQUILEX, and the Brandeis Verb Lexicon. One potential use would be to assess the plausibility of certain verb senses, given the subcategorization frame. However, since COMLEX does not sense-tag its data, this requires obtaining a separate sense annotation for the same corpus, which limits its applicability.
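A minimal sketch of the suggested use, filtering verb senses by the observed subcategorization frame, might look like this. Everything in it is hypothetical: the frame labels are not COMLEX's own notation, the sense labels are invented, and the sense-to-frame table `SENSE_FRAMES` stands in for the separate sense annotation that COMLEX does not supply.

```python
# Hypothetical sense-to-frame table; in practice this would have to come
# from a sense-annotated corpus aligned with the subcategorization data.
SENSE_FRAMES = {
    "run": {
        "run/move_fast": {"intransitive", "pp"},
        "run/operate": {"np"},
    },
}


def plausible_senses(verb, observed_frame, sense_frames=SENSE_FRAMES):
    """Return the senses of `verb` attested with `observed_frame`.

    An unknown verb yields an empty list rather than an error, so the
    caller can fall back to other evidence.
    """
    senses = sense_frames.get(verb, {})
    return sorted(s for s, frames in senses.items() if observed_frame in frames)


transitive_reading = plausible_senses("run", "np")
unknown_verb = plausible_senses("walk", "np")
```

The filter is only as good as the sense-to-frame table behind it, which is the limitation noted above.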

