Building a Concordance

Object

To test your understanding of arrays and strings, especially the use of an array of objects. Read pages 68-71 of the textbook.

Problem Statement

Write a program in Java to build and print a concordance from a piece of arbitrary English text. A concordance is a list of all the different words that occur in a piece of text, together with the number of times each occurs.

Outline

Assuming the concordance will essentially be an array of objects, each of which contains a string and a count, a number of problems will have to be solved:

  1. How many different words to expect.
  2. How to recognize a word, i.e. which characters are in the word, and which are not.
  3. What to do about punctuation.
  4. What to do about capital letters, i.e. "this" and "This" should be counted as the same word.

The success of your program depends on answering these questions correctly, incorporating the answers into a design, and then writing the appropriate code.

The design

Your design should use three classes: one that stores a word and its frequency count, one that stores the concordance list itself, and one that contains the main method.
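
One possible shape for the three classes is sketched below. It is only an illustration, not the required solution: the class names (WordEntry, Concordance, ConcordanceDemo), the method names, and the starting capacity of 16 are all assumptions made for the example. The sketch also shows one way to cope with not knowing how many different words to expect, by doubling the array whenever it fills up.

```java
// Sketch only: every name and the starting capacity are assumptions.

/** One concordance entry: a word and the number of times it has been seen. */
class WordEntry {
    private final String word;
    private int count;

    WordEntry(String word) {
        this.word = word;
        this.count = 1; // an entry is created when its word is first seen
    }

    String getWord() { return word; }
    int getCount() { return count; }
    void increment() { count++; }
}

/** The concordance list: an array of entries in order of first appearance. */
class Concordance {
    private WordEntry[] entries = new WordEntry[16]; // assumed starting capacity
    private int size = 0;

    /** Records one occurrence of a word that has already been lower-cased. */
    void add(String word) {
        for (int i = 0; i < size; i++) {
            if (entries[i].getWord().equals(word)) { // compare contents, not references
                entries[i].increment();
                return;
            }
        }
        if (size == entries.length) { // array is full: copy into one twice as long
            WordEntry[] bigger = new WordEntry[entries.length * 2];
            System.arraycopy(entries, 0, bigger, 0, size);
            entries = bigger;
        }
        entries[size++] = new WordEntry(word);
    }

    int size() { return size; }
    WordEntry get(int i) { return entries[i]; }
}

/** Holds the main method; here it only feeds in a few hard-coded words. */
class ConcordanceDemo {
    public static void main(String[] args) {
        Concordance c = new Concordance();
        String[] words = { "the", "cat", "sat", "on", "the", "mat" };
        for (String w : words) {
            c.add(w);
        }
        for (int i = 0; i < c.size(); i++) {
            WordEntry e = c.get(i);
            System.out.println((i + 1) + ": " + e.getWord() + "\t" + e.getCount());
        }
    }
}
```

Keeping the search inside Concordance means the main class never touches the array directly. A fixed-size array with a documented maximum would also be a defensible answer to the "how many words" question.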

Checking for words already seen

So that the concordance list has no duplicates, you must check the list every time a word is read from the input stream to see whether the word is already present. To do this you will have to compare two strings. Since instances of the String class are objects, you cannot use the == operator, which only tests whether two references point to the same object. You must use the equals method, which compares the characters. See the documentation for class String for all the String methods.
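
The difference between the two comparisons can be seen in a short experiment. The class name StringCompareDemo and the helper sameWord are made up for this illustration; only String.equals and the == operator come from Java itself.

```java
// Sketch only: the class and method names are assumptions.
class StringCompareDemo {
    /** Returns true when the two strings contain the same characters. */
    static boolean sameWord(String a, String b) {
        return a.equals(b);
    }

    public static void main(String[] args) {
        String first = "mat";
        String second = new String("mat"); // a distinct object with the same characters

        System.out.println(first == second);         // prints false: two different objects
        System.out.println(sameWord(first, second)); // prints true: same characters
    }
}
```

Words built up character by character from the input are new objects, just like second above, so == would report nearly every word as unseen.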

Capital letters

Although capital letters can be handled easily with your own Java code, the Character class contains a number of useful routines for handling characters. In particular, the class method "toLowerCase" converts a character to lower case, while "toUpperCase" converts it to upper case. In addition, there are testing functions such as "isDigit" and "isLetter". Visit the documentation page for the class Character.
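
As a rough illustration of how these routines might be combined, the sketch below pulls the words out of one line of text. It assumes that a word is simply an unbroken run of letters, so punctuation and digits act as separators and "don't" would come out as "don" and "t"; you may well decide on a different rule. The class name WordSplitter and its split method are invented for the example, and it returns a java.util.List purely for brevity.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: the class name, method name, and word rule are assumptions.
class WordSplitter {
    /** Splits a line into lower-case words; a word is a maximal run of letters. */
    static List<String> split(String line) {
        List<String> words = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < line.length(); i++) {
            char ch = line.charAt(i);
            if (Character.isLetter(ch)) {
                current.append(Character.toLowerCase(ch)); // fold "This" into "this"
            } else if (current.length() > 0) {
                words.add(current.toString()); // a non-letter ends the current word
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            words.add(current.toString()); // the line ended in the middle of a word
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(split("The cat sat on the mat."));
        // prints [the, cat, sat, on, the, mat]
    }
}
```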

Input

The input will be from a file and can be any piece of text. Do not use a Java program as input. Instead you should use a piece of normal English text. We will test your program on a simple piece of text that we will choose.
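
One way to read such a file is a line at a time through a BufferedReader, as in the sketch below. The class name InputDemo, the method readLines, and the convention of passing the file name as the first command-line argument are assumptions for this example; the handout does not say how the file name is supplied.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

// Sketch only: the names and the command-line convention are assumptions.
class InputDemo {
    /** Reads every line from the given character source. */
    static List<String> readLines(Reader source) throws IOException {
        List<String> lines = new ArrayList<>();
        BufferedReader in = new BufferedReader(source);
        String line;
        while ((line = in.readLine()) != null) { // readLine returns null at end of input
            lines.add(line);
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java InputDemo <text-file>");
            return;
        }
        try (FileReader file = new FileReader(args[0])) {
            for (String line : readLines(file)) {
                System.out.println(line); // a real program would split the line into words here
            }
        }
    }
}
```

Because readLines accepts any Reader, it can be tried out on a java.io.StringReader before a real file is involved.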

Output

Your program should print a list of the different words found in the text, together with the number of times each occurs. For example, for the text:

The cat sat on the mat. She was sleepy and comfortable on the mat.

You should get:

1: the		3
2: cat		1
3: sat		1
4: on		2
5: mat		2
6: she		1
7: was		1
8: sleepy	1
9: and		1
10: comfortable	1
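
A numbered listing like the one above could be produced with String.format, as in this small sketch. The class name OutputDemo, the method formatLine, and the column width of 12 are assumptions; the sample uses tabs, and the exact spacing is not prescribed.

```java
// Sketch only: the names and the column width are assumptions.
class OutputDemo {
    /** Formats one line: position, then the word left-justified in 12 columns, then the count. */
    static String formatLine(int position, String word, int count) {
        return String.format("%d: %-12s%d", position, word, count);
    }

    public static void main(String[] args) {
        System.out.println(formatLine(1, "the", 3));
        System.out.println(formatLine(10, "comfortable", 1)); // prints "10: comfortable 1"
    }
}
```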

 

Deliverables

  1. The source code, with good layout and comments.
  2. Your sample output.

Mail your source code to the grader, login hhuang.

Due date

September 26th, 1997, before 5:00pm.