Melanie Martin
Paper for Ling 501
Dr. Foltz
December 12, 2000
 
Discourse Processing in Psycholinguistics
Introduction

In their summary of psycholinguistics research in the 1990s, McKoon and Ratcliff claim that the job of psycholinguistic research is to describe and explain how readers come to experience narrative worlds (McKoon & Ratcliff 1998). This is a tall order, and while much progress has been made, we are clearly not there yet. I believe that success in this endeavor will require a multidisciplinary approach that includes work from psychology, linguistics, and natural language processing (computational linguistics). To date, some lack of coordination among these areas results from the fact that researchers in different areas often have different goals. For example, psychologists may be primarily concerned with models of brain function or human behavior, while computational linguists may be more concerned with building a system that works, with little interest in whether it models brain function or psychological reality. However, there has been fertile cross-pollination between psychology and computer science, to the extent that it is sometimes difficult to determine whether the way we understand the brain comes from our understanding of computers, or whether the way we construct computers comes from our understanding of brain processes. In this paper I explore the state of the art in discourse processing from both the psychological and computational points of view, in particular where the two converge and diverge, and what the prospects for the future are. Since I am concerned here with discourse or text processing, I will only minimally address issues of sentence processing.

Two relatively recent developments bear on discourse processing: the increase in the processing power and memory of computers, and the availability of large corpora. Together they provide the opportunity to test complex psychological and discourse models on realistic corpora, and they enable the use of statistical techniques. Large corpora are also a rich source of linguistic examples.

Even on the surface of discourse processing, some issues immediately arise:
1. To what extent does the text stand on its own, or do we take the writer's or reader's perspectives into account (Kintsch 1994)?
2. Can we view text as primarily for information transmission, or have we oversimplified by omitting the social, emotional, persuasive, and entertainment aspects?
3. Can or should a model be both useful for applications and psychologically valid?
4. How should world knowledge be integrated into the implementation of a model? Can sufficient general world knowledge actually be input in the form of text or propositional logic?
5. Can a finite number of semantic primitives, or rhetorical relations, be sufficient? And if not, how do we deal with this?

Psychology and Psycholinguistics

Here we will focus primarily on Kintsch's (1994) Construction-Integration Model as an example of a general model of discourse comprehension. There are competing models, but unfortunately, to date there does not appear to have been a systematic comparison among them. A reasonable amount of empirical evidence supports the model, but some aspects, in particular the bottom-up processing during the construction phase, are controversial (Whitney 1998). It incorporates the idea, first proposed by van Dijk and Kintsch (1983), that we form multiple memories for discourse.

Kintsch's (1994) summary of the construction-integration model describes the sequence of cognitive states, the mental representation of texts, the processing cycles, knowledge elaboration, macroprocesses, and inferences. We will consider each one in turn:

1. The Sequence of Cognitive States in Text Comprehension
Kintsch characterizes cognition as a sequence of cognitive states, each of which can be considered the contents of short-term memory, the focus of attention, or the state of consciousness. A cognitive state (focus of attention) can be viewed as a word, sentence, phrase, or paragraph, depending on the type of analysis to be done. (Here we will assume sentences are states.) The state is the result of input and analyses. Input can come from the outside world or from long-term memory, including lexical, perceptual, and general world knowledge and beliefs. The contents of short-term memory can be viewed as retrieval cues which activate the items they are linked to in long-term memory and retrieve them. The input is analyzed with the use of temporary buffers. For each state there is a processing cycle consisting of receiving the input and analyzing it. In order that there be some coherence between cycles, it is assumed that some elements generated in a given cycle are carried over to the next cycle in a buffer. Kintsch hypothesizes that these elements are the ones most strongly activated.

2. Mental Representations of Texts
After processing, the mental representation of a text contains at least three levels: surface (linguistic or word), propositional (conceptual or semantic), and situation model. Additional levels may come into play, for example, in poetry or mathematics. The propositional level and the situation model are generally the most important, both in the laboratory and in real life.

The propositional level is modeled by constructing (usually by hand coding) a hierarchical structure of propositions and arguments based on lexical information from the text. A proposition consists of a relational term, or predicate, and one or more arguments. Arguments may be concepts or other propositions, classified in terms of their semantic case roles.
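
For concreteness, the following Python sketch shows one way such propositions might be encoded. This is a minimal illustration of the predicate-plus-arguments idea; the class and role names are mine, not Kintsch's coding scheme.

from dataclasses import dataclass

# A proposition: a predicate plus (case role, filler) argument pairs.
# Fillers may be concepts (strings here) or embedded Propositions.
@dataclass(frozen=True)
class Proposition:
    predicate: str
    arguments: tuple

# "Mary gave the book to John" as a hand-coded proposition:
p1 = Proposition("GIVE", (("agent", "MARY"), ("object", "BOOK"), ("recipient", "JOHN")))

# Propositions may embed other propositions as arguments,
# e.g. "John knew that Mary gave him the book":
p2 = Proposition("KNOW", (("experiencer", "JOHN"), ("content", p1)))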

The situation model, representing the situation described in the text, may be a mental image, procedural, abstract, or propositional, depending on the text.

Once these representational units are constructed, their interrelations may be viewed as a graph, where the elements are the nodes and the relations between the elements are the edges. The graph can be translated into matrix form, where the rows and columns are indexed by the elements and nonzero entries in the matrix correspond to relations between the elements. Generally, nodes are considered to be related if they have a common element; however, depending on the level of analysis necessary, additional relations may be desirable.
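
As an illustration, the short sketch below builds such a matrix under the assumption that two propositions are related when they share an argument; the representation of propositions and the related() predicate are illustrative choices, not part of Kintsch's specification.

import numpy as np

# Translate a list of elements into matrix form: rows and columns are
# indexed by the elements, and entry (i, j) is nonzero when elements
# i and j are related (or when i == j).
def to_matrix(elements, related):
    n = len(elements)
    M = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i == j or related(elements[i], elements[j]):
                M[i, j] = 1.0
    return M

# Toy propositions as (predicate, argument set) pairs; propositions
# sharing a common argument are linked.
props = [("GIVE", {"MARY", "BOOK", "JOHN"}),
         ("READ", {"JOHN", "BOOK"}),
         ("RAIN", {"SKY"})]
M = to_matrix(props, lambda p, q: bool(p[1] & q[1]))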

The first step in simulating text comprehension is to construct a graph or network, as described above, based on the text. The construction process may be rough (imprecise); for example, all meanings of ambiguous words may be included, and parsing ambiguities may be considered in parallel. The network may contain irrelevant or contradictory elements.

The next step is the integration process, which can be viewed as spreading activation in the network until it reaches a stable state. Here it is necessary to use the matrix representation of the network. A vector, called the activity vector, is initialized with equal activation values for all elements. The activity vector is multiplied by the matrix and renormalized repeatedly, until the activity values stabilize. This has the effect of strengthening strongly interconnected parts of the network, while isolated parts become deactivated. The result is a coherent mental representation of the text.
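
The integration step can be sketched as a simple power iteration over the connectivity matrix. The toy network below is illustrative: three interconnected nodes and one isolated node (standing in, say, for a contextually inappropriate sense of an ambiguous word); after integration, the isolated node's activation approaches zero.

import numpy as np

# Spread activation by repeated multiplication and renormalization
# until the activity vector stabilizes.
def integrate(W, tol=1e-6, max_iter=1000):
    a = np.ones(W.shape[0]) / W.shape[0]   # equal initial activation
    for _ in range(max_iter):
        a_new = W @ a
        a_new /= np.abs(a_new).sum()       # renormalize
        if np.abs(a_new - a).max() < tol:  # stable state reached
            return a_new
        a = a_new
    return a

W = np.array([[1, 1, 1, 0],
              [1, 1, 1, 0],
              [1, 1, 1, 0],
              [0, 0, 0, 1]], dtype=float)
print(integrate(W))   # activation concentrates on the connected part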

This procedure is in sharp contrast to schema theories, which provide a control structure to ensure context sensitivity in the construction phase, thereby eliminating the need for an integration phase, but at the cost of a much more complex construction process.

3. Processing Cycles
Text is processed sequentially: a sentence is read, a network is constructed and integrated, and then, when another sentence is read, parts of the old network participate in the new integration process. In order to maintain coherence in the network, it is assumed that the strongest propositions from the previous sentence are maintained in the focus of attention while the current sentence is processed.

4. Knowledge Elaboration
The fact that the mental representation of a text contains both information derived from the text and knowledge elaborations from long-term memory raises the question: Is a reader reminded of only relevant, contextually appropriate things, or are irrelevant, contextually inappropriate things also activated, and if the latter is the case, why are we not conscious of them? One possible explanation is that schemata or other structures filter out the irrelevant or inappropriate knowledge. The explanation offered by the construction-integration model is that many irrelevant and contradictory pieces of knowledge will be retrieved in the construction process, but they will be quickly deactivated in the integration process. The model views knowledge activation as an uncontrolled, bottom-up process, determined only by the strength of associations between the items in the text and the items in long-term memory. Although the reliance of this model on bottom-up processing is somewhat controversial, empirical lexical decision studies of priming indicate that irrelevant items are activated but quickly inhibited. This tends to support the construction-integration model's view.

5. Macroprocesses
In addition to the global model of the situation described by the text, there is a global model of the propositional structure of the text, called the macrostructure. The macrostructure is a hierarchy of propositions which reflects the rhetorical structure of the text. The macrostructure may or may not be correlated with the situation model.

The macrostructure is constructed strategically in response to cues indicating the relative importance of various portions of the text. Three types of operators that reduce information are deletion, generalization, and construction. These operators can be regarded as inference processes.

6. Inferences
Inferences may be divided into two categories: those that reduce information (the macro-operators above) and those that add information to the text. Those that add information can be further classified along two dimensions: where the information comes from (long-term memory or newly created), and whether the inference process is automatic or controlled.

Knowledge retrieval during comprehension can be either automatic or controlled. If it is automatic, it is a locally determined, associative process where much of what is retrieved may be irrelevant or contradictory and will be removed in the integration process. The process retrieves items in long-term memory that are strongly linked to the text. The controlled process occurs when there is a comprehension problem or in response to special tasks or goals.

Controlled knowledge generation can occur during comprehension or later as the text is studied again or reviewed in memory. Like controlled knowledge retrieval it occurs when there is a comprehension problem or in response to special tasks or goals. This type is what is commonly considered inferencing. The importance of distinguishing between inferences generated at the time of reading and inferences generated later is also noted by McKoon and Ratcliff (1998).

The Construction-Integration model has been implemented by Kintsch and others (Goldman and Varma 1995; Kintsch 1988, 1992; Kintsch et al. 1990; Langston, Trabasso, and Magliano 1999; Mross and Roberts 1992; Tapiero and Denhiere 1995). In a standard implementation, text is represented by a series of interconnected nodes, each corresponding to a concept, sentence, phrase, or word in the text. The model's long-term memory contains all the nodes and their connections that have been processed so far. When a new node is input, it is processed in working memory, which contains the n most activated of the nodes processed so far. Processing consists of spreading activation among the nodes in working memory until the process stabilizes or settles. At this point, the activation values for the nodes in long-term memory are adjusted based on the settled activation values, and the model determines which nodes should be kept in working memory by choosing the n most highly activated. The model is then ready to receive new input, and the process is repeated until all nodes have been processed.
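
A rough sketch of this processing loop is given below, under several simplifying assumptions of my own: the text arrives as a sequence of node indices, connection strengths are given in a global matrix W, and long-term memory adjustment is simple accumulation of settled activations. Actual implementations differ in these details.

import numpy as np

def settle(W, tol=1e-6, max_iter=1000):
    # Spread activation until stable (as in the integration sketch above).
    a = np.ones(W.shape[0]) / W.shape[0]
    for _ in range(max_iter):
        a_new = W @ a
        a_new /= np.abs(a_new).sum()
        if np.abs(a_new - a).max() < tol:
            return a_new
        a = a_new
    return a

def process_text(node_sequence, W, n=4):
    long_term = {}   # node index -> accumulated activation
    working = []     # the most activated nodes (capacity n) plus new input
    for node in node_sequence:
        if node not in working:
            working.append(node)
        idx = np.array(working)
        a = settle(W[np.ix_(idx, idx)])      # settle the working-memory subnetwork
        for i, v in zip(working, a):         # adjust long-term memory values
            long_term[i] = long_term.get(i, 0.0) + v
        # carry over only the n most activated nodes to the next cycle
        working = sorted(working, key=lambda i: long_term[i], reverse=True)[:n]
    return long_term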

Starting with the standard implementation, Goldman and Varma (1995) examined the impact of relaxing the assumption of a fixed working-memory buffer size, finding that this simplifying assumption was psychologically implausible (Fletcher 1999).
Langston, Trabasso, and Magliano (1999) assumed a relaxed spreading activation mechanism so that nodes in long-term memory that are connected to nodes in working memory are allowed to participate in the processing. Their model, coupled with discourse analysis of text relations, was able to account for on-line comprehension as measured by reading time or fit judgments.

While these implementations have been used successfully to test theories of retrieval from long-term memory, and various theories of mental representation and organization within a construction-integration framework, they do not fully implement Kintsch's model. Notably lacking is an implementation of the situation model. In addition, the input data structures require preprocessing, which may involve the use of different discourse analyses based on different theories of discourse processing. From the computational linguist's point of view, it does not solve any real-world problems. One might also ask to what extent syntactic processing can, or should, be integrated into this approach.

Computational Linguistics

There are several ways to categorize the work of computational linguists on discourse structure; Kintsch (1994) does so in terms of linguistic or psychological orientation. However, since much of the work that has been done is application-oriented, it makes sense to classify it in terms of the types of applications that are expected to come out of it: the two main areas of application are language generation and interpretation. Here we will focus on how discourse structure theory might be used in the interpretation of text.

We will look at three important papers on the theory of discourse structure (Hobbs 1979, Grosz & Sidner 1986, Mann & Thompson 1988), and at subsequent papers which discuss, extend, and reconcile these theories. There has been significant debate in the computational discourse community between proponents of these theories, in particular between Rhetorical Structure Theory (Mann & Thompson 1988) and the intention-based theory of Grosz & Sidner (1986). Much of this theoretical work was done at a time when resources (large corpora, thesauri, etc.) were not available to adequately test the theoretical frameworks. As these resources became available and implementation and testing became possible, the trend has been toward a synthesis of these theories (Moser & Moore 1996, Marcu 2000).

We will look at each of these theories in turn, considering their usefulness for text interpretation, their shortcomings, and areas where additional research is needed.

Hobbs
Hobbs's research interest is in discourse interpretation, and this work seems quite compatible with memory-based work in psycholinguistics. He defines a finite set of coherence relations that hold between portions of a discourse. The relations are defined with computable precision in the framework of the inference component of a language processor (Hobbs 1978). The coherence relations are defined in terms of the inferences that a reader makes, using world knowledge, to recognize them. Understanding the discourse structure is thus equivalent to finding the best proof explaining the information in a portion of discourse.

Taking world knowledge in propositional form into account can be viewed as analogous to Kintsch's text representation. But Hobbs's reliance on an underlying logical representation of world and language knowledge, while providing a concise characterization of when and why a coherence relation holds, requires a great deal of preprocessing, and his inferencing process is computationally costly.

Mann and Thompson
Mann and Thompson's Rhetorical Structure Theory (RST) (1988) has found wide use in the area of text generation, for example, in generating text summaries. They claim that approximately twenty-three rhetorical relations are necessary to account for discourse coherence. The relations link different portions of text, called "spans", which can range in size from clauses to paragraphs. Adjacent spans are related by exactly one of the possible rhetorical relations, forming new spans that are subsequently related to their neighboring spans until all spans are connected. In this way a hierarchy, or tree structure, is formed.
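
One way to encode such an analysis is as a binary tree of spans, as in the illustrative Python sketch below. The class names and the example relation are mine; the full theory also distinguishes multinuclear relations, omitted here.

from dataclasses import dataclass
from typing import Union

@dataclass
class Span:
    text: str                              # a clause, sentence, or larger unit

@dataclass
class Relation:
    name: str                              # e.g. "EVIDENCE", "ELABORATION"
    nucleus: Union[Span, "Relation"]       # the more essential span
    satellite: Union[Span, "Relation"]     # the supporting span

# An EVIDENCE relation linking a claim (nucleus) and its support (satellite):
tree = Relation("EVIDENCE",
                nucleus=Span("The committee will reject the proposal."),
                satellite=Span("Three members have already voiced objections."))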

Rhetorical relations constrain the components of the span and the intended effects of the span. Component spans are either nuclei or satellites, the nucleus being the more important span, "more essential to the writer's purpose than the other" (Mann and Thompson 1988), and the satellite the less important. All relations contain an "effect" field which describes the intended effect of the text on the reader. So, as with Hobbs, the focus is on the reader rather than on the intentions of the writer. However, a weakness of RST is that its model of the effects that each span has on the reader's mental state is imprecise.

Moore and Pollack (1992) argue that RST does not take proper account of the distinction between informational and intentional relations, and that the restriction that only a single relation can hold between a pair of adjacent spans is incorrect, because discourse elements are related simultaneously on multiple levels (Grosz and Sidner 1986). Despite this, RST has been widely implemented.

The construction of an RST tree for an entire text can be likened to the construction of the macrostructure in Kintsch's model.

Grosz and Sidner
Grosz and Sidner (1986) view discourse structure as three interrelated components: a linguistic structure, an intentional structure, and an attentional state. The linguistic structure consists of discourse segments and an embedding relationship that can hold between them. The intentional structure consists of discourse segment purposes (DSPs) and discourse purposes (DPs). The DSPs are related to each other by one of two relations: dominance and satisfaction-precedence. The attentional state distinguishes the most salient information from other, less salient information to aid in the interpretation of subsequent discourse segments. It can be viewed as an abstraction of the discourse participants' focus of attention and is modeled by a stack of focus spaces, each holding the most salient information from a given discourse segment. The transition rules that add to or delete from the stack correspond to the dominance relation from the intentional structure.
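
The stack discipline can be sketched directly, as below; the fields of FocusSpace and the method names are illustrative, not Grosz and Sidner's notation.

from dataclasses import dataclass, field

@dataclass
class FocusSpace:
    purpose: str                 # the discourse segment purpose (DSP)
    entities: set = field(default_factory=set)  # salient objects, properties, relations

class AttentionalState:
    def __init__(self):
        self.stack = []          # stack of focus spaces

    def push(self, space):       # open a segment whose DSP is dominated by the current one
        self.stack.append(space)

    def pop(self):               # the current segment's purpose is complete
        return self.stack.pop()

    def salient_entities(self):
        # Entities in the top space are the most salient for interpreting
        # referring expressions in the current segment.
        return self.stack[-1].entities if self.stack else set()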

It appears that the attentional state may be to some extent analogous to Kintsch's cognitive states.

While on the surface it appears that the discourse structure theories of Grosz and Sidner (1986) and Mann and Thompson (1988) are quite different, and in fact there has been a decade-long debate in the computational linguistics community between their respective proponents, recent work has moved toward a synthesis. Moser and Moore (1996) have found considerable common ground between the two theories, based primarily on understanding the correspondence between the notion of dominance in Grosz and Sidner and that of nuclearity in RST. Building on this work, Marcu (2000) has extended his formalization of RST (Marcu 1996) to incorporate the intentional structure of Grosz and Sidner (1986) in order to reduce the ambiguity of discourse.

What is still not incorporated is the attentional structure, which seems to hold the most promise for approximating a psychological model.

The three theories above provide a sample of discourse theory from a computational linguist's point of view. These theories were developed, at least to some extent, with applications in mind, rather than in the interest of modeling brain function. All three rely to some extent on intentional structure and take into account the intended effect on the reader. This still leaves open the question of to what extent the intent of the writer should be factored in.

To my knowledge, Grosz is primarily interested in planning, so it is not clear to what extent Grosz and Sidner's theories have been implemented in discourse comprehension applications. RST, on the other hand, has been implemented and used in applications; however, the development of an annotated corpus of text, parsed for rhetorical structure, is costly. As in the case of Hobbs and Langston et al., incorporating sufficient knowledge or preprocessing still requires substantial human effort. This contrasts sharply with implemented models like LSA and HAL.
 

The Intersection of Computation and Psychology

Here we will look at two computational techniques, developed by psychologists, which claim to at least partially model brain function as well as being useful for many text processing applications: Latent Semantic Analysis (LSA) (Landauer, Foltz and Laham 1998) and the Hyperspace Analogue to Language (HAL) (Burgess 1998).

LSA is a fully automatic corpus-based statistical method for extracting and inferring relations of expected contextual usage of words in discourse (Landauer, Foltz and Laham 1998). In LSA the text is represented as a matrix, with a row for each unique word in the text and a column for each text passage or other context. The entries in this matrix are the frequencies of the words in the contexts. A preliminary information-theoretic weighting is applied to the entries, followed by singular value decomposition (SVD) of the matrix. The result is a 100-150 dimensional "semantic space", in which the original words and passages are represented as vectors. The meaning of a passage is the average of the vectors of the words in the passage (Landauer, Laham, Rehder, and Schreiner 1997).
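
The pipeline can be sketched on a toy term-by-passage count matrix as follows. The log-entropy weighting shown is one common information-theoretic scheme, and k = 2 dimensions is for illustration only; as noted above, real applications retain 100-150 dimensions.

import numpy as np

def lsa(counts, k=2):
    # counts: words x passages frequency matrix (numpy array).
    # Log-entropy weighting of the entries.
    log_f = np.log(counts + 1.0)
    p = counts / np.maximum(counts.sum(axis=1, keepdims=True), 1e-12)
    with np.errstate(divide="ignore", invalid="ignore"):
        plogp = np.where(counts > 0, p * np.log(p), 0.0)
    entropy = 1.0 + plogp.sum(axis=1) / np.log(counts.shape[1])
    X = log_f * entropy[:, None]
    # Truncated SVD: keep the k largest singular values.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :k] * s[:k]      # word vectors in the semantic space

def passage_vector(word_vectors, word_indices):
    # The meaning of a passage is the average of its word vectors.
    return word_vectors[word_indices].mean(axis=0)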

LSA can be viewed as a tool to characterize the semantic contents of words and documents, but it can also be viewed as a model of semantic knowledge representation and semantic word learning (Foltz 1998). While LSA has been able to simulate human abilities and comprehension in a variety of experiments, there is still some controversy over its validity as a model. The main objection seems to center on the fact that it ignores word order and syntax. Objections raised by Perfetti (1998) have been, I believe, successfully refuted by Landauer (1999).

LSA does not claim to be a complete model of discourse processing. Landauer (1999) points out that the more general class of models to which LSA belongs, those based on associative learning and spectral decomposition, is well understood both in terms of formal modeling properties and as existing phenomena at the psychological and physiological levels. Perhaps this is the beginning of an explanation of why LSA does so well at simulating human abilities with so little machinery.

HAL is a model similar to LSA. It is a computational model of a high-dimensional context space in which word vector representations are based on a 10-word moving window. HAL's vector representations can characterize a variety of grammatical and semantic features of words, in a way similar to human representations (Foltz 1998). Since many of the issues with HAL are similar to those of LSA, we will not go into detail here.
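
A rough sketch of the windowed co-occurrence counting follows; the inverse-distance weighting (nearer words count more) follows the usual description of HAL, while details such as the direction-sensitive concatenation of row and column vectors in the full model are omitted.

import numpy as np

def hal_matrix(tokens, window=10):
    vocab = sorted(set(tokens))
    index = {w: i for i, w in enumerate(vocab)}
    M = np.zeros((len(vocab), len(vocab)))
    for i, w in enumerate(tokens):
        # Each word following w within the window co-occurs with it,
        # weighted by proximity: distance 1 gets weight `window`,
        # distance `window` gets weight 1.
        for d in range(1, window + 1):
            if i + d < len(tokens):
                M[index[w], index[tokens[i + d]]] += window + 1 - d
    return M, vocab

# A word's vector is its row (and column) of co-occurrence weights;
# words used in similar contexts end up with similar vectors.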

Summary and Conclusion

We have considered six models of discourse structure and comprehension and looked, at least superficially, at their similarities and differences. None of them claims to be, and none of them is, a complete psychological model of discourse comprehension. The computational linguists have made more of an attempt to deal with the intentions of the reader than the psycholinguists have. Incorporation of syntax in the models ranges from minimal to nonexistent. All processing is sequential, and one wonders whether we need to wait until we have a better handle on the processing before considering the possibility of incorporating parallelism. Perhaps this would be a way to deal with syntax, when it is finally incorporated.

HAL and LSA appear to show much promise for the future. Both are still able to exploit their present structure, so it may be a while before we see a serious attempt to make the models more complete. Currently, one of their big advantages is the minimization of preprocessing: all of the other models require a great deal of it at this point, which is costly and can lead to a certain artificiality in the data or initial structures. No doubt the debate over how far one can go without considering word order will continue on some fronts until the issue can be definitively decided.

The issue of how and how much to attempt to incorporate general world knowledge into computational models is still unresolved in general. This will require more research in both the psychological and computational areas. Similarly, the issue of whether or not there are a finite number of semantic primitives or rhetorical relations will require more research to settle, if it can be settled at all.

As models improve, we will need to investigate the incorporation of the reader's and writer's perspectives, as well as consider text as more than information transmission. We will also need to incorporate the situation model computationally.

Where does this leave us, and how can we work together across disciplines? As computational linguistic models are developed, perhaps more attention should be paid to their psychological plausibility. It seems likely that psychological models will be implemented computationally as they are developed, as an aid to modeling and experimentation, but some systematic comparison needs to take place. While the ideal model would be both useful for applications and psychologically valid, our best bet is to continue to approach it from both directions, each side keeping the other's view in mind.
 

 

References

Burgess, C. (1998) From simple associations to the building blocks of language: Modeling meaning in memory with the HAL model. Behavior Research Methods, Instruments, & Computers, 30, 188-198. [invited address].

Fletcher, C. R. (1999). Computational Models of Reading and Understanding: What Good Are They? In Computational Models of Reading and Understanding, Ashwin Ram and Kenneth Moorman, (Eds.), The MIT Press.

Foltz, P. (1998). Quantitative Approaches to Semantic Knowledge Representations. Discourse Processes, 25, 127-130.

Grosz, Barbara J. & Sidner, Candace L. (1986). Attention, Intentions, and the Structure of Discourse. Computational Linguistics 12(3), pp. 175-204.

Hobbs, Jerry R. (1979). Coherence and Coreference. Cognitive Science 3(1), pp. 67-90.

Kintsch, W. (1994). The Psychology of Discourse Processing. In Handbook of Psycholinguistics, M. Gernsbacher (Ed.), San Diego: Academic Press. pp. 721-739.

Landauer, T. K. (1999). Latent Semantic Analysis is a Theory of the Psychology of Language and Mind. Discourse Processes, 27, 303-310.

Landauer, T. K., Foltz, P. W., & Laham, D. (1998). Introduction to Latent Semantic Analysis. Discourse Processes, 25, 259-284.

Landauer, T. K., Laham, D., Rehder, B., & Schreiner, M. E. (1997). How well can passage meaning be derived without using word order? A comparison of Latent Semantic Analysis and humans. In M. G. Shafto & P. Langley (Eds.), Proceedings of the 19th annual meeting of the Cognitive Science Society (pp. 412-417). Mahwah, NJ: Erlbaum.

Langston, M. C., Trabasso, T., & Magliano, J. P. (1999). A Connectionist Model of Narrative Comprehension. In Computational Models of Reading and Understanding, Ashwin Ram and Kenneth Moorman, (Eds.), The MIT Press.

Mann, William C. & Thompson, Sandra (1988). Rhetorical Structure Theory: Towards a Functional Theory of Text Organization. Text 8, pp. 243-281.

Marcu, Daniel (1996). Building Up Rhetorical Structure Trees. In Proceedings of the Thirteenth National Conference on Artificial Intelligence, Vol. 2, pp. 1069-1074, Portland, Oregon, August 1996.

Marcu, Daniel (2000). Extending a Formal and Computational Model of Rhetorical
Structure Theory with Intentional Structures a la Grosz and Sidner. The 18th International Conference on Computational Linguistics COLING'2000, Luxembourg, July 31-August 4, 2000.

McKoon, G., & Ratcliff, R. (1998). Memory-based language processing: Psycholinguistic research in the 1990s. Annual Review of Psychology, 49, 25-42.

Moore, Johanna D. & Pollack, Martha E. (1992). A Problem for RST: The Need for
Multi-Level Discourse Analysis. Computational Linguistics, 18, pp. 537-544.

Moser, M. G. & Moore, Johanna D. (1996). Toward a Synthesis of Two Accounts of Discourse Structure. Computational Linguistics 22(3), pp. 409-420.

Perfetti, C. A. (1998). The Limits of Co-Occurrence: Tools and Theories in Language Research. Discourse Processes, 25, 363-377.
 
Whitney, P. (1998). The Psychology of Language. Houghton Mifflin Company. Chapter 8.
 

Secondary References

Goldman, S. R., & Varma, S. (1995). CAPping the construction-integration model of discourse comprehension. In C. Weaver, S. Mannes, & C. Fletcher (Eds.), Discourse comprehension: Essays in honor of Walter Kintsch (pp. 337-358). Hillsdale, NJ: Erlbaum.

Kintsch, W. (1988) The role of knowledge in discourse comprehension: A construction-integration model. Psychological Review 95: 163-82.

Kintsch, W. (1992). How readers construct situation models for stories: The role of syntactic cues and causal inferences. In A. F. Healy, S. M. Kosslyn, and R. M. Shiffrin (Eds.), Essays in Honor of William K. Estes. Hillsdale, NJ: Lawrence Erlbaum.

Kintsch, W., Welsch, D., Schmalhofer, F., & Zimny, S. (1990). Sentence memory: A theoretical analysis. Journal of Memory and Language 29:133-59.

Mross, E.F., & Roberts, J.O. (1992). The construction-integration model: A program and manual. Institute of Cognitive Science Technical Report, #92-14.

Tapiero, I. & Denhiere, G. (1995). Simulating recall and recognition by using Kintsch's Construction-Integration model. In C. Weaver, S. Mannes, & R. C. Fletcher (Eds.), Discourse comprehension: Models of processing revisited. Essays in honor of Walter Kintsch (pp. 211-233). Hillsdale, NJ: Lawrence Erlbaum Associates.

van Dijk, T. A., & Kintsch, W. (1983). Strategies of discourse comprehension. New York: Academic Press.