Parse Trees

Before you try the quiz, please read sections 3.3.1.6 to 3.3.1.10 in Sebesta.

Although using a grammar to generate sentences in a language is instructive, it is much more useful to use a grammar for parsing, i.e. for recognizing (or accepting) sentences as conforming to the rules of the grammar. It should be emphasized, however, that a grammar is more often used as a definition of the syntax of a language than as a guide to building a real parser. For information on real parsing algorithms and techniques, look at this book. The only thing we will learn here is how to use parse trees, or derivations from the grammar, to examine choices in language design.

Let us take the simple grammar for binary numbers again:

<binary-num> ::= <binary-seq>
<binary-seq> ::= <binary-digit> | <binary-digit> <binary-seq>
<binary-digit> ::= 0 | 1

Now consider the sentence 1011. Our job is to determine whether or not it belongs to the language. We will use the rules to build a parse tree (if possible) that shows the structure of the sentence in terms of the non-terminal symbols. There are many ways to do this, just as there are many ways to design algorithms to do it. We will use a bottom-up, left-to-right approach to build the tree.

The first character in the sentence is 1. Since 1 only appears in the rule for binary-digit, we must classify it as such. We can then start drawing a parse tree with terminal symbols at the leaves and non-terminals in the internal nodes:

 binary-digit

      |

      |
      1


We can do the same with all four digits:

binary-digit    binary-digit    binary-digit     binary-digit

     |               |               |                |

     |               |               |                |

     1               0               1                1
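This first step can be sketched in Python. The sketch below is only an illustration, not part of the course material: the function name classify_digits and the choice to represent a tree node as a tuple (label, children...) are assumptions made here.

```python
# Hypothetical representation: a tree node is a tuple (label, child, ...),
# and a leaf is simply the terminal character itself.
def classify_digits(sentence):
    """Apply <binary-digit> ::= 0 | 1 to each character in turn.

    Returns one small tree per character, in left-to-right order.
    """
    trees = []
    for ch in sentence:
        if ch not in ("0", "1"):
            # No rule mentions this character, so parsing cannot continue.
            raise ValueError(f"{ch!r} is not a terminal of this grammar")
        trees.append(("binary-digit", ch))
    return trees


print(classify_digits("1011"))
```

Each tuple corresponds to one of the four small trees drawn above; a character other than 0 or 1 is rejected immediately because no rule can classify it.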

Our job now is to attempt to combine these trees into a single tree. This is where we have a choice. We could choose to classify each binary-digit as binary-seq, but our parsing would come to a halt because none of the rules have binary-seq followed by binary-seq. What we need to do is to consume (that's a parsing word) each binary-digit in turn using the second alternative of rule 2. What's more, we need to start from the right-most one, so that the binary-seq we build can be used again at the next step. But in order to use the second alternative of rule 2, we need a binary-digit followed by a binary-seq. So first we need to classify the right-most binary-digit as binary-seq using the first alternative of rule 2:

                                                 binary-seq
                                                      |
                                                      |

binary-digit    binary-digit    binary-digit     binary-digit

     |               |               |                |

     |               |               |                |

     1               0               1                1

We can then use our second alternative to combine the last two. Notice how the use of a rule involves going backwards through the rule from right to left, gathering up the terminals and non-terminals on its right-hand side into the single non-terminal on its left:

                                         binary-seq
                                             /\
                                            /  \
                                           /   binary-seq
                                          /           |
                                         /            |
binary-digit    binary-digit    binary-digit     binary-digit

     |               |               |                |

     |               |               |                |

     1               0               1                1

If we continue using the same rule, we end up with a single tree with binary-seq as its root. Applying the first rule then completes the parse tree:

         
                           binary-num
                                |
                                |
                           binary-seq
                               /\
                              /  \
                             /    \
        ---------------------    binary-seq
        |                             /\
        |                            /  \
        |                           /    \
        |                          /     binary-seq
        |              ------------          /\
        |              |                    /  \
        |              |                   /   binary-seq
        |              |                  /           |
        |              |                 /            |
binary-digit    binary-digit    binary-digit     binary-digit

     |               |               |                |

     |               |               |                |

     1               0               1                1
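The whole walk-through can be sketched as a short Python function. This is a minimal sketch that hard-codes the three rules of this particular grammar; it is not a general parsing algorithm, and the name parse_binary_num and the tuple representation of tree nodes are assumptions made for illustration.

```python
def parse_binary_num(sentence):
    """Build a parse tree bottom-up, or return None if none exists.

    A tree node is a tuple (label, child, ...); leaves are characters.
    """
    # An empty sentence, or one containing any character other than
    # 0 or 1, cannot be classified by any rule.
    if not sentence or any(ch not in ("0", "1") for ch in sentence):
        return None

    # Rule 3: classify every character as a binary-digit.
    digits = [("binary-digit", ch) for ch in sentence]

    # Rule 2, first alternative: the right-most binary-digit
    # on its own becomes a binary-seq.
    seq = ("binary-seq", digits[-1])

    # Rule 2, second alternative: consume the remaining digits from
    # right to left, each time combining binary-digit + binary-seq.
    for digit in reversed(digits[:-1]):
        seq = ("binary-seq", digit, seq)

    # Rule 1: the finished binary-seq is a binary-num.
    return ("binary-num", seq)


tree = parse_binary_num("1011")
print(tree)
print(parse_binary_num("1021"))  # None: 2 is not in the language
```

The nested tuples returned for 1011 have the same shape as the tree above: each binary-seq holds a binary-digit on the left and the rest of the sequence on the right.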

The right-leaning shape of the tree is a visual indication that rule number 2 uses right recursion, i.e. the recursive reference to the non-terminal being defined is at the right-most end of the alternative in which it appears.
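Right recursion can also be seen by writing rule 2 directly as a recursive function. Note that this sketch works top-down, unlike the bottom-up walk-through above; it is offered only to show where the recursive call sits. The name parse_seq and the tuple representation are assumptions made for illustration.

```python
def parse_seq(s):
    """Mirror <binary-seq> ::= <binary-digit> | <binary-digit> <binary-seq>."""
    if not s or s[0] not in ("0", "1"):
        raise ValueError(f"cannot parse {s!r} as a binary-seq")
    digit = ("binary-digit", s[0])
    if len(s) == 1:
        # First alternative: a single binary-digit.
        return ("binary-seq", digit)
    # Second alternative: the recursive reference to binary-seq is the
    # right-most item, so the tree grows downwards on the right.
    return ("binary-seq", digit, parse_seq(s[1:]))


print(parse_seq("10"))
```

Because the recursive call is always the last child of a node, every additional digit adds one more level on the right-hand side of the tree.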

Now try the self-test called parse trees by clicking on Self-Test in the Action Menu.