A Text Analyzer

Write a program to read any text file and print out a table listing the frequency of occurrence of each letter of the alphabet. The program should also print the number of whitespace characters in the file, the number of characters that are neither alphabetic nor whitespace, and the text itself.

Procedure

  1. Write a one- or two-sentence problem statement.
  2. Write a problem description, including any aspects of the problem that you consider important. You will want to treat the input file as a stream of characters.
  3. Study the attached class declaration of CType and derive three new classes from it--one for alphabetic characters, one for whitespace characters and one for all others. You will need to use the constructor of CType and its member functions addChar and addRange to do this. Each class should encapsulate a counting mechanism specialized for it.
  4. Design an application class to prompt for a file name, read the text file and analyze it using the classes you have derived from CType.
  5. Write and compile the program.
  6. Consider how to test this program, and write a brief description of your test suite. Refer to the Testing handout for this. You will need several test input files corresponding to (a) special cases, (b) extreme cases and (c) representative cases. Think about line length and file length, and whether word length is relevant.
  7. Test your program on each of your chosen test files and print out the results. Use redirection of standard output (or any other technique you know) to do this.

Hints


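     #include <fstream>	// standard header (the older <fstream.h> is not accepted by current compilers)
     #include <iostream>
     using namespace std;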
     ...
     char fn[256];
     cin >> fn;	// reads the file name from the user
     ifstream in(fn);	// opens the file for reading
     ...
     char c;
     in.get(c)...	// reads a single character from the file
     ...

Here is the code for the class CType (this will be mailed to you):

class CType {
private:
  class CSubRange { // a private nested class
  private:
    char low, high;  // the lower and upper bounds of the range
  public:
    CSubRange(int l = 0, int h = 0) : low(l), high(h) {}
    bool inRange(char c) { return c >= low && c <= high; }
  };
protected:
  char *singles;        // a pointer to an array of characters
  CSubRange *ranges;    // a pointer to an array of sub-ranges
  int nSingles, nRanges;  // the number of characters in singles and
                          // the number of sub-ranges in ranges
public:
  CType(int nMaxRanges = 0, int nMaxSingles = 0) :
    singles(new char[nMaxSingles]), ranges(new CSubRange[nMaxRanges]),
    nSingles(0), nRanges(0) {}
  ~CType() { delete [] singles; delete [] ranges; }
  void addChar(char c) {   // add a new character
    singles[nSingles++] = c;
  }
  void addRange(int l, int h) { // add a new sub-range
    ranges[nRanges++] = CSubRange(l, h);
  }
  bool contains(char c) const; // returns true if c is a member of
                         // the type
};

bool CType::contains(char c) const {
  for (int s = 0; s < nSingles; s++)  // is it one of the single characters?
    if (c == singles[s])
      return true;
  for (int r = 0; r < nRanges; r++)  // is it in one of the ranges?
    if (ranges[r].inRange(c))
      return true;
  return false;
}
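
To illustrate how a class might be derived from CType, here is one possible sketch of a whitespace class. The name CWhitespace, its members count and getTotal, and the choice of characters are all assumptions rather than the required design, and the range '\t' to '\r' relies on ASCII ordering. CType is repeated in condensed form only so that the sketch compiles on its own.

```cpp
// Condensed copy of the handout's CType so this sketch is self-contained.
class CType {
    class CSubRange {
        char low, high;
    public:
        CSubRange(int l = 0, int h = 0) : low(l), high(h) {}
        bool inRange(char c) { return c >= low && c <= high; }
    };
protected:
    char *singles;
    CSubRange *ranges;
    int nSingles, nRanges;
public:
    CType(int nMaxRanges = 0, int nMaxSingles = 0) :
        singles(new char[nMaxSingles]), ranges(new CSubRange[nMaxRanges]),
        nSingles(0), nRanges(0) {}
    ~CType() { delete [] singles; delete [] ranges; }
    void addChar(char c) { singles[nSingles++] = c; }
    void addRange(int l, int h) { ranges[nRanges++] = CSubRange(l, h); }
    bool contains(char c) const;
};

bool CType::contains(char c) const {
    for (int s = 0; s < nSingles; s++)
        if (c == singles[s])
            return true;
    for (int r = 0; r < nRanges; r++)
        if (ranges[r].inRange(c))
            return true;
    return false;
}

// Hypothetical derived class: the name and interface are assumptions.
class CWhitespace : public CType {
    long total;  // how many whitespace characters have been counted
public:
    // Reserve room for one range and one single character, then fill them.
    CWhitespace() : CType(1, 1), total(0) {
        addRange('\t', '\r');  // tab, newline, vertical tab, form feed,
                               // carriage return (contiguous in ASCII)
        addChar(' ');
    }
    // Counts c if it belongs to this type; returns whether it did.
    bool count(char c) {
        if (!contains(c))
            return false;
        ++total;
        return true;
    }
    long getTotal() const { return total; }
};
```

The constructor arguments must be at least as large as the number of addRange and addChar calls that follow, because CType does not check its array bounds. CType also owns raw arrays and declares no copy constructor, so objects of these classes are best passed by reference rather than copied.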

Deliverables

  1. Your design document (problem statement, description and object design).
  2. A printout of your program source code.
  3. A description of your test suite, with printouts of your test files.
  4. Printouts of your program running on each of the test files.

Due Date

Hand in your documents to me (RTH) on November 6th, before 5:00 pm.