Ambiguous grammars (and what to do about them)

Before trying the quiz, please read sections 3.3.1.7 through 3.3.1.10 of Sebesta.

An ambiguous grammar is one for which some sentence in the language has at least two parse trees. This is a bad thing, since the semantics the compiler attaches to each tree may well be different, so the same program could behave differently when compiled by different compilers. In natural language, ambiguity is tolerated but is still a problem. A famous example in English is the sentence "time flies like an arrow", which has several different meanings depending on how it is parsed. Such ambiguity, however, cannot be tolerated in a computer language. It must be removed either syntactically or semantically.

It is reasonably certain that any BNF rule that has double recursion (the non-terminal being defined is mentioned twice in the expansion) leads to ambiguity. Sometimes it doesn't matter, as in this grammar:

<num> ::= <dig> | <num> <num>
<dig> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

The sentence 123 can be parsed in two ways:

                        num                                 num
                         /\                                  /\
                        /  \                                /  \
                       /    \                              /    \
                     num     \                            /    num
                      /\      \                          /      /\
                     /  \      \                        /      /  \
                    /    \      \                      /      /    \
                   num   num   num                    num    num   num
                    |     |     |                      |      |     |
                    |     |     |                      |      |     |
                   dig   dig   dig                    dig    dig   dig
                    |     |     |                      |      |     |
                    |     |     |                      |      |     |
                    1     2     3                      1      2     3
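
To see why the choice of tree is harmless here, one can attach a value to each node and evaluate both trees. The following C sketch does this under an assumed semantic rule that is not part of the grammar itself: the value of <num> <num> is the left value shifted past the digits on the right, plus the right value. The names Num, num_len, num_value, value_left_tree and value_right_tree are hypothetical, chosen only for illustration.

```c
#include <stddef.h>

/* A parse-tree node for <num> ::= <dig> | <num> <num>.
   A leaf (left == NULL) holds one digit; an inner node joins two <num>s. */
struct Num {
    const struct Num *left;
    const struct Num *right;
    int digit;                  /* used only by leaves */
};

/* Number of digits spanned by a subtree. */
static int num_len(const struct Num *n)
{
    if (n->left == NULL)
        return 1;
    return num_len(n->left) + num_len(n->right);
}

/* Assumed rule: shift the left value past the right digits, then add. */
static long num_value(const struct Num *n)
{
    if (n->left == NULL)
        return n->digit;
    long v = num_value(n->left);
    for (int i = num_len(n->right); i > 0; i--)
        v *= 10;
    return v + num_value(n->right);
}

/* Left-hand tree from the diagram: ((1 2) 3). */
long value_left_tree(void)
{
    struct Num d1 = {NULL, NULL, 1}, d2 = {NULL, NULL, 2}, d3 = {NULL, NULL, 3};
    struct Num n12 = {&d1, &d2, 0};
    struct Num root = {&n12, &d3, 0};
    return num_value(&root);
}

/* Right-hand tree from the diagram: (1 (2 3)). */
long value_right_tree(void)
{
    struct Num d1 = {NULL, NULL, 1}, d2 = {NULL, NULL, 2}, d3 = {NULL, NULL, 3};
    struct Num n23 = {&d2, &d3, 0};
    struct Num root = {&d1, &n23, 0};
    return num_value(&root);
}
```

Under this rule, value_left_tree() and value_right_tree() both return 123, so the two parses are indistinguishable to the rest of the compiler.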

It doesn't matter here, since a compiler would probably not parse numbers with either tree, and the value produced would be the same in each case. However, the examples in Sebesta for arithmetic expressions and for the if-then-else construct show that there can indeed be a difference. It matters whether the sentence 2 + 3 * 4 is parsed as 2 + (3 * 4), which gives 14, or as (2 + 3) * 4, which gives 20. It also matters whether

if E1 then if E2 then S1 else S2

is parsed as

if E1 then begin if E2 then S1 else S2 end (if form 1)

or as

if E1 then begin if E2 then S1 end else S2 (if form 2)

The placement of the end in two different places gives the conditional two different outcomes. This table summarizes the possibilities:

E1                   True   True      False     False
E2                   True   False     True      False
if form 1 executes:  S1     S2        neither   neither
if form 2 executes:  S1     neither   S2        S2
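
The table can be checked mechanically. In this C sketch, the hypothetical functions if_form1 and if_form2 hard-code the two bracketings with explicit braces and report which statement would run. The enum Ran and its values are illustrative names only.

```c
/* Which statement ran under a given pair of conditions. */
enum Ran { NEITHER = 0, RAN_S1 = 1, RAN_S2 = 2 };

/* if form 1: the else belongs to the inner if. */
enum Ran if_form1(int e1, int e2)
{
    enum Ran ran = NEITHER;
    if (e1) {
        if (e2)
            ran = RAN_S1;
        else
            ran = RAN_S2;
    }
    return ran;
}

/* if form 2: the else belongs to the outer if. */
enum Ran if_form2(int e1, int e2)
{
    enum Ran ran = NEITHER;
    if (e1) {
        if (e2)
            ran = RAN_S1;
    } else {
        ran = RAN_S2;
    }
    return ran;
}
```

Calling both functions with the four combinations of e1 and e2 reproduces the two rows of the table. The forms agree only when E1 and E2 are both true.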

Removing ambiguity

Syntactic ambiguity can be handled in two ways. It can be removed in the grammar, since unambiguous syntax leads to unambiguous semantics, or it can be handled by the semantic phase of the compiler, by applying a rule. Since many details of languages are handled at the semantic level, especially those that cannot be expressed by BNF grammars, it would seem appropriate to take the second way out. However, many language designers feel that all syntactic ambiguity should be removed. Section 3.3.1.7 describes the standard removal of ambiguity in expressions by building a notion of operator precedence into the BNF rules themselves. This is done by removing the double recursion and creating two rules out of one. Section 3.3.1.10 does the same with the if-then-else form in C.

We can also change the syntax to avoid the problem altogether. Modula 2, for instance, insists on an END reserved word to terminate every if statement. That is, what is ambiguous in C:

if (x > 0)
  if (y == 0)
  {
    x = 1;
    y = 2;
  }
else
  y = 3;

becomes unambiguous in Modula 2:

IF x > 0 THEN
  IF y = 0 THEN
    x := 1;
    y := 2
  END
ELSE
  y := 3
END;
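
Returning to expressions, the two-rule, precedence-based grammar can be sketched as a small recursive-descent evaluator in C. The grammar in the comments is the usual textbook form and is assumed here to match what Section 3.3.1.7 describes. Assumptions made for brevity: operands are single digits, only + and * are supported, and error handling is omitted. All function names are hypothetical.

```c
#include <ctype.h>

/* Cursor into the input text (helper state for this sketch). */
static const char *pos;

static long parse_expr(void);

static void skip_spaces(void)
{
    while (*pos == ' ')
        pos++;
}

/* <factor> ::= <dig> | ( <expr> ) */
static long parse_factor(void)
{
    skip_spaces();
    if (*pos == '(') {
        pos++;                          /* consume '(' */
        long v = parse_expr();
        skip_spaces();
        if (*pos == ')')
            pos++;                      /* consume ')' */
        return v;
    }
    if (isdigit((unsigned char)*pos))
        return *pos++ - '0';
    return 0;                           /* malformed input: not handled here */
}

/* <term> ::= <term> * <factor> | <factor>
   The left recursion is written as a loop. */
static long parse_term(void)
{
    long v = parse_factor();
    for (;;) {
        skip_spaces();
        if (*pos != '*')
            return v;
        pos++;                          /* consume '*' */
        v *= parse_factor();
    }
}

/* <expr> ::= <expr> + <term> | <term>
   Because + is handled one level above *, multiplication binds tighter. */
static long parse_expr(void)
{
    long v = parse_term();
    for (;;) {
        skip_spaces();
        if (*pos != '+')
            return v;
        pos++;                          /* consume '+' */
        v += parse_term();
    }
}

/* Evaluate a whole expression string. */
long evaluate(const char *text)
{
    pos = text;
    return parse_expr();
}
```

With this grammar there is only one way to parse 2 + 3 * 4: evaluate("2 + 3 * 4") returns 14, and the other reading has to be requested explicitly, as in evaluate("(2 + 3) * 4"), which returns 20.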

Now try the self-test called "ambiguity" by clicking on Self-Test in the Action Menu.