AI Seminar
September 16, 2002
Dan Tappan

Toward Automated Scene Generation from Textual Descriptions


Natural language communicates descriptions of the world.  Humans can
decompose a complex visual scene into salient details, represent it with
relatively few words, transmit it in written or verbal form, and then
effortlessly reconstruct it with high fidelity.  Very little information
is actually stated, so humans rely heavily on commonsense knowledge and
reasoning to fill in the gaps.  Together, this explicit and implicit
information helps the receiver build and manipulate a corresponding mental
model of the scene.

Text understanding by computers is generally limited to superficial
processing of grammar and vocabulary.  Consequently, most systems cannot
exploit important contextual cues to narrow interpretations or reduce
ambiguity, and their performance suffers accordingly.

This prototype computational-linguistics system works toward bridging the
gap between human and computer language processing.  It employs concepts
from linguistics, psychology, computer science, and related areas to
convert textual descriptions into plausible visual interpretations.
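
As a rough illustration of this text-to-scene idea, the Python sketch
below maps a few fixed prepositional patterns onto explicit
figure/relation/ground facts.  The patterns, object names, and data
structures are invented for this example; they are not the system's
actual grammar or representation.

import re
from dataclasses import dataclass

@dataclass
class SpatialFact:
    figure: str    # the object being located
    relation: str  # e.g. "on", "next_to"
    ground: str    # the reference object

# Toy patterns for a few prepositional constructions.
PATTERNS = [
    (re.compile(r"the (\w+) is on the (\w+)"), "on"),
    (re.compile(r"the (\w+) is next to the (\w+)"), "next_to"),
    (re.compile(r"the (\w+) is behind the (\w+)"), "behind"),
]

def extract_facts(description):
    """Map each recognized clause onto a figure/relation/ground triple."""
    facts = []
    for sentence in description.lower().split("."):
        for pattern, relation in PATTERNS:
            match = pattern.search(sentence)
            if match:
                facts.append(SpatialFact(match.group(1), relation, match.group(2)))
    return facts

if __name__ == "__main__":
    text = "The lamp is on the desk. The chair is next to the desk."
    for fact in extract_facts(text):
        print(fact)

Each extracted fact would then serve as a constraint on where the named
objects may appear in the reconstructed scene.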

Formal reasoning and inference methods are applied to these
interpretation problems in conjunction with a source of relevant
background knowledge.  The final output is a simple graphical rendering
of a virtual-reality environment that roughly corresponds to the textual
description.  The display is augmented with further information to
account for alternative interpretations.
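
As a purely hypothetical illustration of how alternatives might be
handled, the sketch below expands a vague relation such as "near" into
several candidate placements scored against simple default knowledge.
The offsets, scores, and ranking scheme are assumptions made for this
example; the abstract does not specify the system's actual inference
method or knowledge source.

from dataclasses import dataclass

@dataclass
class Placement:
    label: str    # human-readable gloss of the interpretation
    x: float
    y: float
    score: float  # plausibility under the background knowledge

def interpret_near(figure, ground_x, ground_y):
    """Enumerate plausible positions for '<figure> near the reference object'."""
    # Invented defaults: offset from the reference object and a prior score.
    offsets = {
        "to the right of it": (1.0, 0.0, 0.6),
        "to the left of it":  (-1.0, 0.0, 0.6),
        "in front of it":     (0.0, -1.0, 0.3),
        "behind it":          (0.0, 1.0, 0.1),  # rarely intended, ranked lowest
    }
    candidates = [
        Placement(figure + " " + gloss, ground_x + dx, ground_y + dy, score)
        for gloss, (dx, dy, score) in offsets.items()
    ]
    # The top-ranked interpretation would be rendered; the rest would
    # annotate the display as alternatives.
    return sorted(candidates, key=lambda p: p.score, reverse=True)

if __name__ == "__main__":
    for p in interpret_near("chair", 0.0, 0.0):
        print("%-25s (%+.1f, %+.1f)  score=%.1f" % (p.label, p.x, p.y, p.score))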