Grammars

Before trying the quiz, please read section 3.3.1 through to 3.3.1.5 of Sebesta.

A grammar is a formal way of defining the syntax of a language. By formal we simply mean that its form, i.e. the way it is presented, has rules of its own. In fact, the languages in which grammars are written themselves have syntax definitions, but it is usual to present them informally at first. The most important formal grammar notation is called BNF, or Backus-Naur Form, after two of the grandfathers of computer science, John Backus and Peter Naur. A few years earlier, the linguist Noam Chomsky had described essentially the same kind of grammar, which he called a context-free grammar, along with several other kinds in what is now known as the Chomsky hierarchy of languages. Context-free grammars are the most important class of grammars for describing computer languages formally.

Their idea was to present the syntax of a language as a set of rules, each one describing the makeup of a category or token of the language. The steps in defining a BNF grammar are:

1. Construct a set of lexemes for the language. These are the basic constituents of programs. They are called terminal symbols since they do not need refining any further (unless it is into ASCII characters).

2. Construct a set of tokens for the language. These syntactic categories are called non-terminal symbols in BNF. Each one needs a definition that shows how it can be refined.

3. Construct a set of rules, at least one per non-terminal symbol, that show how the non-terminal can be expanded. The expansion can consist of a number of alternatives, and each alternative can consist of a sequence of terminals and/or non-terminals.

4. Designate one non-terminal as the start symbol. In a simple grammar such as the one below, this will be the only non-terminal that does not appear in the expansion part of any rule.

As an example, let us build a grammar for binary numbers such as 0, 1, 100011, 111101001001000100100.

1. The alphabet of this language consists of the two characters 0 and 1. These are also the lexemes or terminal symbols.

2. We will need three non-terminals: one, called binary-num, that will be the start symbol; one, called binary-seq, that will expand to a sequence of binary digits; and one, called binary-digit, that can be either a 0 or a 1.

3. The rules are as follows. The traditional syntax of BNF uses angle brackets (diamond brackets) around non-terminals to distinguish them from terminals, the vertical bar to separate alternatives, and the production symbol ::= to define the expansion of each non-terminal.

<binary-num> ::= <binary-seq>
<binary-seq> ::= <binary-digit> | <binary-digit> <binary-seq>
<binary-digit> ::= 0 | 1
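
The four parts of this grammar (terminals, non-terminals, rules and start symbol) can also be written down as plain data. The following Python sketch is only one possible representation: the names TERMINALS, NONTERMINALS, RULES, START, check_grammar and recursive_nonterminals are made up for illustration and do not come from BNF or from Sebesta. It checks that every non-terminal has at least one rule and reports which rules are directly recursive.

```python
# Hypothetical representation of the binary-number grammar as plain data.
TERMINALS = {"0", "1"}
NONTERMINALS = {"binary-num", "binary-seq", "binary-digit"}
START = "binary-num"

# Each non-terminal maps to a list of alternatives;
# each alternative is a sequence (tuple) of symbols.
RULES = {
    "binary-num": [("binary-seq",)],
    "binary-seq": [("binary-digit",), ("binary-digit", "binary-seq")],
    "binary-digit": [("0",), ("1",)],
}


def check_grammar(terminals, nonterminals, rules, start):
    """Return a list of problems; an empty list means nothing was found."""
    problems = []
    if start not in nonterminals:
        problems.append(f"start symbol {start!r} is not a non-terminal")
    for nt in sorted(nonterminals):
        if not rules.get(nt):
            problems.append(f"no rule for {nt!r}")
    for nt, alternatives in rules.items():
        for alt in alternatives:
            for symbol in alt:
                if symbol not in terminals and symbol not in nonterminals:
                    problems.append(f"unknown symbol {symbol!r} in rule for {nt!r}")
    return problems


def recursive_nonterminals(rules):
    """Non-terminals that mention themselves in one of their own alternatives."""
    return {nt for nt, alts in rules.items() if any(nt in alt for alt in alts)}


print(check_grammar(TERMINALS, NONTERMINALS, RULES, START))  # prints []
print(recursive_nonterminals(RULES))  # prints {'binary-seq'}
```

Tuples are used for the alternatives so that each alternative is an ordered sequence of symbols, matching the description in step 3.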

The second rule is the most important and the most interesting. It contains two alternatives, one of which mentions the non-terminal being defined: it is recursive. Recursion is the only way that BNF can define unbounded sequences of lexemes (Sebesta calls them lists). This is vital for programming languages since programs can be of any size or shape, limited only by the memory of the computer and the ingenuity of the programmer.
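
To see why the recursive alternative matters, here is a small Python sketch (the function name binary_seq_strings and its parameter are invented for this illustration). It lists every string that binary-seq can produce when the recursive alternative may be used at most a given number of times. With no recursion only the two one-digit strings are possible; each extra level of recursion allows strings one digit longer.

```python
def binary_seq_strings(max_recursions):
    """All strings derivable from <binary-seq> when the recursive
    alternative is used at most max_recursions times."""
    digits = ["0", "1"]          # <binary-digit> ::= 0 | 1
    results = list(digits)       # first alternative: <binary-digit>
    if max_recursions > 0:
        # second alternative: <binary-digit> <binary-seq>
        for d in digits:
            for rest in binary_seq_strings(max_recursions - 1):
                results.append(d + rest)
    return results


for n in range(4):
    strings = binary_seq_strings(n)
    # columns: recursion limit, number of strings, length of longest string
    print(n, len(strings), max(len(s) for s in strings))
# prints:
# 0 2 1
# 1 6 2
# 2 14 3
# 3 30 4
```

Because there is no fixed limit on how often the second alternative may be chosen, there is no fixed limit on the length of the strings either.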

This grammar can be used as a generator or a recognizer. The next section deals with parsing and parse trees, but here, we can show how the grammar can generate any binary number. The sequence is as follows:

1. Take the start symbol and expand it. There are no alternatives.

binary-num becomes binary-seq

2. Take the expansion and expand every non-terminal in it. If there are alternatives, choose one at random.

binary-seq becomes binary-digit binary-seq (taking the second alternative)

3. Repeat step 2 until there are no more non-terminals to expand.

1 binary-digit binary-seq (choosing 1 for binary-digit, and the second alternative for binary-seq again)

1 0 binary-digit binary-seq (choosing 0 for binary-digit, and the second alternative for binary-seq again)

1 0 1 binary-digit (choosing 1 for binary-digit, and the first alternative for binary-seq; this will stop the recursion)

1 0 1 1 (choosing 1 for binary-digit)
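
The generation procedure above can be sketched in Python as follows. This is an illustration under stated assumptions, not a definitive implementation: the names RULES and derive are invented here, and the max_steps cut-off is an addition that relies on the first alternative of every rule in this particular grammar being non-recursive. Each pass expands every non-terminal in the current string, choosing among the alternatives at random, and records the result.

```python
import random

# The binary-number grammar: non-terminal -> list of alternatives.
RULES = {
    "binary-num": [("binary-seq",)],
    "binary-seq": [("binary-digit",), ("binary-digit", "binary-seq")],
    "binary-digit": [("0",), ("1",)],
}


def derive(rules, start, rng, max_steps=20):
    """Return every intermediate form, from the start symbol to the final string."""
    form = [start]
    forms = [form]
    step = 0
    while any(symbol in rules for symbol in form):
        next_form = []
        for symbol in form:
            if symbol in rules:
                alternatives = rules[symbol]
                # Assumption for this sketch: after max_steps passes, always take
                # the first (non-recursive) alternative so the loop must finish.
                if step >= max_steps:
                    chosen = alternatives[0]
                else:
                    chosen = rng.choice(alternatives)
                next_form.extend(chosen)
            else:
                next_form.append(symbol)  # terminals are copied unchanged
        form = next_form
        forms.append(form)
        step += 1
    return forms


rng = random.Random()  # pass a seed, e.g. random.Random(1), for repeatable runs
for form in derive(RULES, "binary-num", rng):
    print(" ".join(form))
```

Each run prints a different derivation, always starting with binary-num followed by binary-seq and ending with a line that contains only 0s and 1s.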

Since there are no more non-terminal symbols, this is the final string generated by the grammar. It is easy to see (but hard to prove!) that this grammar can generate all possible binary numbers, including oddities like 00000000.
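
The grammar can also be used the other way round, as a recognizer. The Python sketch below uses one function per rule (the function names are invented for this illustration) and then checks by brute force that every string of 0s and 1s of up to eight digits is accepted. This is evidence for short strings only, not a proof for all lengths.

```python
from itertools import product


def is_binary_digit(s):
    # <binary-digit> ::= 0 | 1
    return s in ("0", "1")


def is_binary_seq(s):
    # <binary-seq> ::= <binary-digit> | <binary-digit> <binary-seq>
    if is_binary_digit(s):
        return True
    return len(s) > 1 and is_binary_digit(s[0]) and is_binary_seq(s[1:])


def is_binary_num(s):
    # <binary-num> ::= <binary-seq>
    return is_binary_seq(s)


# Brute-force check: every non-empty string over {0, 1} up to length 8.
for length in range(1, 9):
    for digits in product("01", repeat=length):
        assert is_binary_num("".join(digits))

print(is_binary_num("00000000"))  # True
print(is_binary_num(""))          # False: the grammar needs at least one digit
print(is_binary_num("1021"))      # False: 2 is not a terminal symbol
```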

Now try the self-test called grammars by clicking on Self-Test in the Action Menu.