Code Reading: Token identification

Alphabet

C++ source code is written with the printing and non-printing characters produced by a standard computer keyboard. These characters belong to the ASCII set of character codes. The non-printing characters include the whitespace characters: space, tab and newline.

We always try to use a fixed-width font, such as Courier, to print out program listings. This ensures that any layout done with whitespace characters appears in print exactly as it was typed.

Tokens

A token is a sequence of consecutive characters that forms a meaningful unit to the C++ compiler. Tokens can be names, keywords, operators, constants or comments. Tokens are separated by punctuation, which can be whitespace, a semi-colon, matched parentheses or braces and, in certain cases, the equals sign and the comma.
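To make the idea of splitting text into tokens concrete, here is a minimal sketch of a token splitter. The function names split_tokens and demo_split_tokens are hypothetical, and the rules are deliberately simplified: a run of letters, digits and underscores is one token, whitespace is skipped, and every other character is a one-character token. A real compiler also handles multi-character operators, string constants and comments, which this sketch does not.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper, for illustration only: a greatly simplified tokenizer.
// It does NOT handle multi-character operators, string constants or comments.
std::vector<std::string> split_tokens(const std::string& text) {
    std::vector<std::string> tokens;
    std::size_t i = 0;
    while (i < text.size()) {
        unsigned char c = static_cast<unsigned char>(text[i]);
        if (std::isspace(c)) {
            ++i;  // whitespace separates tokens but is not a token itself
        } else if (std::isalnum(c) || c == '_') {
            // Collect a run of letters, digits and underscores as one token.
            std::size_t start = i;
            while (i < text.size() &&
                   (std::isalnum(static_cast<unsigned char>(text[i])) ||
                    text[i] == '_')) {
                ++i;
            }
            tokens.push_back(text.substr(start, i - start));
        } else {
            // Any other character becomes a one-character token.
            tokens.push_back(std::string(1, text[i]));
            ++i;
        }
    }
    return tokens;
}

// Usage sketch: the statement splits into int, total, =, count, +, 1 and ;
void demo_split_tokens() {
    std::vector<std::string> t = split_tokens("int total = count + 1;");
    assert(t.size() == 7);
    assert(t[0] == "int");
    assert(t[6] == ";");
}
```

Note that the spaces in the statement do not appear in the result: they only mark where one token ends and the next begins.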

Names

A name is any sequence of letters, digits and underscores that does not begin with a digit and is terminated by punctuation. Names are case-sensitive.
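The rules for names can be illustrated with a few declarations. This is only a sketch: the function name names_demo and the variable names are made up for the example.

```cpp
#include <cassert>

// Illustrative function (hypothetical name) declaring several valid names.
int names_demo() {
    int total = 1;
    int Total = 2;       // differs from total only in case, so it is a separate name
    int total_2 = 3;     // digits are allowed after the first character
    int _scratch = 4;    // a name may begin with an underscore
    // int 2total = 5;   // would not compile: a name cannot begin with a digit

    assert(total != Total);  // two distinct variables with distinct values
    return total + Total + total_2 + _scratch;
}
```

In each declaration the name ends where the punctuation begins: the space before the equals sign terminates the name.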

Keywords

These names are reserved and cannot be used for any purpose other than the one designated by the language. Note that since the language is case-sensitive, merely changing one character to upper case makes the name usable (but this is not recommended!).

type and type-associated words:
bool, char, class, double, enum, false, float, int, long, namespace, private, protected, public, short, sizeof, struct, template, this, true, typedef, union, void

type modifiers:
auto, const, extern, inline, register, signed, static, unsigned, virtual, volatile

control words:
break, case, catch, continue, default, do, else, for, if, return, throw, try, while

operators:
operator, new, delete
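The remark about case-sensitivity can be seen in a short sketch. The function name keyword_case_demo and the variable names are hypothetical, and the style is shown only to illustrate the rule, not to suggest it.

```cpp
#include <cassert>

// Illustrative function (hypothetical name): keywords are reserved only in
// their exact, lower-case spelling.
int keyword_case_demo() {
    // int while = 1;    // would not compile: while is a keyword
    int While = 1;       // compiles: While is an ordinary name
    int Double = 2;      // compiles: Double is not the keyword double

    assert(While + Double == 3);
    return While + Double;
}
```

Names like these are legal but easy to misread as keywords, which is why the practice is discouraged.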

Operators

Parentheses and brackets: () []

Access: . -> .* ->*

Unary: + - ++ -- ! ~ & *

Bitwise: & | ^ >> <<

Arithmetic: + - * / %

Relational: == != < > <= >=

Logical: && ||

Assignment: = += -= *= /= %= &= |= ^= <<= >>=

Special:
? : (the conditional expression operator)
, (the comma operator)
:: (the scope-resolution operator)
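Several symbols appear in more than one group above: & and * are both unary and binary operators, and the compiler decides which is meant from the context. The sketch below, with the hypothetical function name operator_demo, exercises one or two operators from most of the groups.

```cpp
#include <cassert>

// Illustrative function (hypothetical name) using operators from several groups.
int operator_demo() {
    int a = 6;
    int b = 3;

    int* p = &a;                     // unary &: takes the address of a
    int product = *p * b;            // unary * reads through p, binary * multiplies
    assert(product == 18);

    int masked = a & b;              // binary &: bitwise AND of 110 and 011
    assert(masked == 2);

    int shifted = a << 1;            // bitwise shift left by one place
    assert(shifted == 12);

    int diff = -a + b;               // unary minus, then binary plus
    assert(diff == -3);

    bool both = (a > b) && (b != 0); // relational operators joined by logical AND
    assert(both);

    int larger = (a > b) ? a : b;    // conditional expression operator
    assert(larger == 6);

    int total = product;             // plain assignment
    total += masked;                 // compound assignment: total is now 20
    total += shifted;                // 32
    total += diff;                   // 29
    total += larger;                 // 35
    return total;
}
```

Reading the line that computes product token by token shows why whitespace and context matter: the first * has no operand on its left, so it is the unary form, while the second sits between two operands and is the binary form.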

Constants

Strings are any sequence of characters surrounded by double quotes.

Characters are single characters, or escape characters, surrounded by single quotes.

Numbers use standard decimal format for integers and decimals, with an optional exponent (engineering) format for floating-point values.
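The three kinds of constant might look like this in a program. This is a sketch only: the function name constants_demo and all the values are made up for illustration, and the character codes checked assume the ASCII set described earlier.

```cpp
#include <cassert>
#include <string>

// Illustrative function (hypothetical name) showing each kind of constant.
double constants_demo() {
    std::string greeting = "Hello, world";  // string constant in double quotes
    char letter = 'A';                      // character constant in single quotes
    char newline = '\n';                    // escape character: still one character
    int answer = 42;                        // integer constant
    double price = 3.75;                    // decimal constant
    double distance = 2.5e2;                // exponent format: 2.5 times 10 squared

    assert(greeting.size() == 12);
    assert(letter == 65);                   // assumes ASCII character codes
    assert(newline == 10);                  // assumes ASCII character codes
    assert(answer == 42);
    assert(distance == 250.0);
    return price + distance;
}
```

The quotes are part of the constant as written in the source, but they are not stored: the string above holds twelve characters, not fourteen.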

Comments

Single line comments start with two forward slashes, like this:

// this is a single line comment

Multiple line comments start with /* and end with */, like this:

/*************************************************
* a multiple line comment very often is laid out *
* to look like a box                             *
*************************************************/

Punctuation

Braces {}, parentheses () and brackets [] are used in pairs. Other punctuation characters are the semi-colon, colon and comma.