Components of Syntax

Before trying the quiz at the end of this section, please read section 3.2 of Sebesta.

Almost all computer languages are written with the letters, digits and symbols of the ASCII character set (strictly speaking, only its printing characters and whitespace). Internationalization has made other character sets available, but they are not commonly used in programming.

i.e. A-Z, a-z, 0-9, and the graphic symbols ~!@#$%^&*()_+`-={}|[]\:";'<>,./?
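As a small illustration, the character groups above can be listed with Python's standard string module. The function name in_source_alphabet is made up for this sketch and does not come from Sebesta.

```python
import string

# The printing ASCII characters, grouped as in the text above.
letters = string.ascii_uppercase + string.ascii_lowercase  # A-Z, a-z
digits = string.digits                                     # 0-9
symbols = string.punctuation                               # the graphic symbols

def in_source_alphabet(ch):
    """Return True if ch is one printing ASCII character or ASCII whitespace.

    Hypothetical helper, written only for this illustration.
    """
    return len(ch) == 1 and ch in letters + digits + symbols + string.whitespace

print(len(letters), len(digits), len(symbols))
```

Python's string.punctuation holds the 32 ASCII graphic symbols, so the three groups together give 94 printing characters, not counting the space.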

Characters form lexemes such as identifiers, literals (also called constants), operators and reserved words.

e.g. fred, A_SMALL_NUMBER, 123.456, "abcdg", +=, if, else

A token is a category of lexemes, i.e. a set of lexemes (possibly infinite).

e.g. the set of identifiers contains fred and A_SMALL_NUMBER and lots more; the set of plus_operators contains only one: +. Different languages will need different tokens.
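The relationship between lexemes and tokens can be sketched as a small program that splits a string of characters into lexemes and labels each one with its token. The token names below (float_literal, plus_assign_op and so on) and the pattern for each are assumptions made for this example; a real language defines its own.

```python
import re

# Hypothetical token names and patterns; a real language would define its own.
# Order matters: longer alternatives come before their prefixes.
TOKEN_SPEC = [
    ("float_literal", r"\d+\.\d+"),
    ("int_literal", r"\d+"),
    ("string_literal", r'"[^"]*"'),
    ("identifier", r"[A-Za-z_][A-Za-z_0-9]*"),
    ("plus_assign_op", r"\+="),
    ("plus_operator", r"\+"),
    ("whitespace", r"\s+"),
]
RESERVED_WORDS = {"if", "else"}

# One regular expression with a named group per token.
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(text):
    """Return a list of (token, lexeme) pairs, skipping whitespace."""
    pairs = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        token, lexeme = match.lastgroup, match.group()
        # Reserved words look like identifiers, so they are reclassified here.
        if token == "identifier" and lexeme in RESERVED_WORDS:
            token = "reserved_word"
        if token != "whitespace":
            pairs.append((token, lexeme))
        pos = match.end()
    return pairs

print(tokenize("fred += 123.456"))
# [('identifier', 'fred'), ('plus_assign_op', '+='), ('float_literal', '123.456')]
```

Note that many different lexemes (fred, A_SMALL_NUMBER) map to the single token identifier, while the token plus_operator has exactly one lexeme.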

A language recognizer is a mechanism that analyzes a string of characters and checks it for conformity with a grammar, which is a formal definition of the language's syntax. Programs that implement this mechanism are called parsers.
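A minimal sketch of a recognizer follows. The grammar it checks is a toy one invented for this example (it is not taken from Sebesta), and the function names are likewise hypothetical.

```python
import re

# Hypothetical grammar, chosen only for illustration:
#   expr -> term { "+" term }
#   term -> identifier | integer
LEXEME = re.compile(r"\s*([A-Za-z_][A-Za-z_0-9]*|\d+|\+)")
TERM = re.compile(r"[A-Za-z_][A-Za-z_0-9]*|\d+")

def split_lexemes(text):
    """Split text into lexemes, or return None if some character fits no lexeme."""
    lexemes, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = LEXEME.match(text, pos)
        if match is None:
            return None
        lexemes.append(match.group(1))
        pos = match.end()
    return lexemes

def recognizes(text):
    """Return True if text conforms to the expr grammar above."""
    lexemes = split_lexemes(text)
    if not lexemes:
        return False
    # An expr must start with a term ...
    if not TERM.fullmatch(lexemes[0]):
        return False
    # ... followed by zero or more ("+" term) pairs.
    rest = lexemes[1:]
    if len(rest) % 2 != 0:
        return False
    for i in range(0, len(rest), 2):
        if rest[i] != "+" or not TERM.fullmatch(rest[i + 1]):
            return False
    return True

print(recognizes("fred + 123"))  # True
print(recognizes("fred +"))      # False
```

The recognizer only answers yes or no. A practical parser would usually also build a structure, such as a parse tree, describing how the string matches the grammar.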

A language generator is a mechanism that produces strings of characters that conform to a grammar. Programs can be written to implement this mechanism, but they are of limited usefulness.
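A generator can be sketched in the same spirit. The toy grammar, the list of identifiers and the function names below are all assumptions made for this example.

```python
import random

# Hypothetical grammar, chosen only for illustration:
#   expr -> term { "+" term }
#   term -> identifier | integer
IDENTIFIERS = ["fred", "A_SMALL_NUMBER"]

def generate_term(rng):
    """Produce one term: either an identifier or an integer literal."""
    if rng.random() < 0.5:
        return rng.choice(IDENTIFIERS)
    return str(rng.randint(0, 999))

def generate_expr(rng, max_terms=4):
    """Produce a string that conforms to the expr rule."""
    count = rng.randint(1, max_terms)
    return " + ".join(generate_term(rng) for _ in range(count))

rng = random.Random(0)  # fixed seed so that runs are repeatable
for _ in range(3):
    print(generate_expr(rng))
```

Every string printed conforms to the grammar, but which strings appear depends on random choices. This is one reason such programs are of limited use by themselves.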

Now try the self-test on Syntax Components by clicking on Self-Test in the Action Menu.