CS167 C Programming
Spring 1998
Building a Concordance
Object
To test understanding of arrays and pointers, especially the use of an array of strings (pointers to characters).
Problem Statement
Write a program in C to build a concordance from a piece of arbitrary text. A concordance is a list of all the different words that occur in a piece of text.
Outline
Assuming the concordance will essentially be an array of pointers to characters (i.e. strings), a number of problems will have to be solved:
The success of your program depends on answering these questions correctly, incorporating the answers into a design, and then writing the appropriate code.
The design
Your design should use at least three functions, including the main function (see the section below).
Space allocation for new words
A pointer to each new word that you read from the input stream must be stored in the array of pointers. Your design should include a separate function to allocate space for the word. I will be sending you the correct code for this operation via email. You should take my code and incorporate it, with suitable comments, into your function. Here is what you will get:
/***********************************************************************
 This function takes a pointer to a character that is assumed to be the
 start of an array. The length of the string (up to but not including
 the null character at the end) is calculated and space allocated for a
 copy of the string using calloc, a standard library function. The
 string is then copied into the new space by strcpy, a string library
 function. Finally, a pointer to the new string is returned. To use this
 function include both <stdlib.h> and <string.h> in your program.
************************************************************************/
char *AllocateString(char *s)
{
    char *t = calloc(strlen(s) + 1, sizeof(char));
    strcpy(t, s);
    return t;
}
Note that you will need to include both stdlib.h and string.h for this to work.
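As a rough illustration of how the allocated copy might be stored in the array of pointers, here is a minimal sketch. The helper name AddWord, the capacity MAX_WORDS, and the NULL check inside AllocateString are assumptions made for this example and are not part of the handout.

```c
#include <stdlib.h>
#include <string.h>

#define MAX_WORDS 1000   /* assumed capacity; the handout does not give one */

/* The handout's allocation function, with a NULL check added
   (the check is an addition for this sketch). */
char *AllocateString(char *s)
{
    char *t = calloc(strlen(s) + 1, sizeof(char));
    if (t != NULL)
        strcpy(t, s);
    return t;
}

/* Hypothetical helper: store a private copy of word at the end of list.
   Returns 1 on success, 0 if the list is full or allocation fails. */
int AddWord(char *list[], int *count, char *word)
{
    char *copy;

    if (*count >= MAX_WORDS)
        return 0;
    copy = AllocateString(word);
    if (copy == NULL)
        return 0;
    list[*count] = copy;   /* the array keeps the pointer to the copy */
    (*count)++;
    return 1;
}
```

Because the array holds a pointer to a fresh copy, the buffer used to read the next word can be overwritten without disturbing words already stored.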
Checking for words already seen
So that the concordance list has no duplicates, you must check the list every time a word is read from the input stream to see whether the word is already present. Use the library function strcmp to do this; it is declared in the header file string.h, which you must include at the top of your file.
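One possible shape for the duplicate check is a linear search with strcmp. The function name FindWord and its return convention are assumptions for this sketch; your own design may differ.

```c
#include <string.h>

/* Hypothetical helper: return the index of word in list[0..count-1],
   or -1 if the word is not present. */
int FindWord(char *list[], int count, const char *word)
{
    int i;

    for (i = 0; i < count; i++) {
        if (strcmp(list[i], word) == 0)   /* strcmp returns 0 when the strings match */
            return i;
    }
    return -1;
}
```

Note that strcmp is case-sensitive, so "The" and "the" compare as different unless the case is normalized first.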
Capital letters
Although these can be handled easily using your own C code, the library header ctype.h declares a number of useful routines for handling characters. In particular, "tolower" converts a character to lower case, while "toupper" converts it to upper case. In addition, there are testing functions like "isdigit", "isalpha", etc. Type "man ctype" at the UNIX prompt to get a full list of these useful routines.
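For instance, a word could be folded to lower case before it is compared or stored. This is only a sketch: the function name ToLowerString is hypothetical, and whether the concordance should be case-insensitive at all is a design decision the handout leaves to you.

```c
#include <ctype.h>

/* Hypothetical helper: convert every character of s to lower case, in place.
   The cast to unsigned char avoids undefined behaviour for negative char values. */
void ToLowerString(char *s)
{
    for (; *s != '\0'; s++)
        *s = (char) tolower((unsigned char) *s);
}
```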
Input
The input will be from standard input (no file opening and closing is necessary) and can be any piece of text. Do not use a C program as input. Instead you should use a piece of normal English text. We will test your program on a simple piece of text that we will choose.
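The handout does not define exactly what counts as a word. The sketch below assumes a word is a run of alphabetic characters and that everything else (spaces, punctuation, digits) is a separator. The function name ReadWord and its parameters are hypothetical; pass stdin as the first argument to read from standard input.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper: read the next word from in into buf, storing at most
   size - 1 characters plus the terminating null (size should be at least 2).
   Returns 1 if a word was read, 0 at end of input. */
int ReadWord(FILE *in, char *buf, int size)
{
    int c;
    int n = 0;

    /* Skip anything that is not a letter. */
    while ((c = getc(in)) != EOF && !isalpha(c))
        ;
    if (c == EOF)
        return 0;

    /* Collect letters; extra letters of an over-long word are dropped. */
    do {
        if (n < size - 1)
            buf[n++] = (char) c;
    } while ((c = getc(in)) != EOF && isalpha(c));

    buf[n] = '\0';
    return 1;
}
```

A main loop could then call ReadWord(stdin, buf, sizeof buf) repeatedly until it returns 0.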
Output
Your program should print a list of the different words found in the text, e.g.:
1: the
2: cat
3: sat
4: on
5: mat
6: she
7: was
8: sleepy
9: and
10: comfortable
...
...
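A numbered list in this style can be produced with a single printf call per word. The sketch assumes one entry per line and numbering that starts at 1; the function name PrintConcordance and the FILE parameter are hypothetical (pass stdout to print to the screen).

```c
#include <stdio.h>

/* Hypothetical helper: print each stored word as "N: word", one per line,
   with N counting from 1. */
void PrintConcordance(FILE *out, char *list[], int count)
{
    int i;

    for (i = 0; i < count; i++)
        fprintf(out, "%d: %s\n", i + 1, list[i]);
}
```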
Deliverables
Due date
Monday, April 13th, 1998, before 5:00pm.