One of the themes of this class is that "everything's a number": instructions are just numbers, and the data those instructions manipulate are numbers too. One important kind of data is character data, or text. So how does a computer represent text as numbers?
Probably the best place to start is with the sort of "unbreakable" codes we all used as kids: we decide that in our super-secret messages, we'll use a "1" instead of an "A", a "2" instead of a "B", and so forth until we use "26" to represent "Z". Now when we want to send the message "CAT", we'll send 3 1 20.
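Here is the kids' code as a minimal Python sketch. The point is that the "code" is nothing more than a lookup table both sides agree on. The names `ALPHABET`, `encode`, and `decode` are illustrative, and the sketch assumes messages contain only the uppercase letters A through Z.

```python
# The agreed-upon alphabet: A is position 1, B is 2, ..., Z is 26.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

# Build the table in both directions: letter -> number and number -> letter.
LETTER_TO_CODE = {letter: i + 1 for i, letter in enumerate(ALPHABET)}
CODE_TO_LETTER = {code: letter for letter, code in LETTER_TO_CODE.items()}


def encode(message):
    """Replace each letter with its agreed-upon number."""
    return [LETTER_TO_CODE[ch] for ch in message]


def decode(codes):
    """Replace each number with its agreed-upon letter."""
    return "".join(CODE_TO_LETTER[c] for c in codes)


print(encode("CAT"))       # [3, 1, 20]
print(decode([3, 1, 20]))  # CAT
```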
Computer character sets are just agreed-upon encodings like this. In the early days there were a vast number of encodings: typically every manufacturer had its own, and the widths of the encodings varied as well. CDC, whose machines had a 60-bit word, used a six-bit encoding so that ten characters fit in one word. One of the most famous manufacturer-specific encodings was IBM's Extended Binary-Coded Decimal Interchange Code (EBCDIC).
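The arithmetic behind "ten characters in a word" is 10 × 6 = 60 bits. The sketch below packs ten 6-bit codes into a single 60-bit integer and unpacks them again. The numbers are the toy A=1 ... Z=26 values from the kids' code, *not* CDC's actual character set, and the helpers `pack` and `unpack` are hypothetical.

```python
WORD_BITS = 60
CHAR_BITS = 6
CHARS_PER_WORD = WORD_BITS // CHAR_BITS  # 60 // 6 == 10


def pack(codes):
    """Pack up to ten 6-bit codes into one 60-bit word, first code in the high bits."""
    if len(codes) > CHARS_PER_WORD:
        raise ValueError("too many characters for one word")
    word = 0
    for code in codes:
        if not 0 <= code < 2 ** CHAR_BITS:
            raise ValueError(f"{code} does not fit in {CHAR_BITS} bits")
        word = (word << CHAR_BITS) | code  # make room, then drop the code in
    # Left-justify a short message by filling the unused low slots with 0.
    word <<= CHAR_BITS * (CHARS_PER_WORD - len(codes))
    return word


def unpack(word):
    """Split a 60-bit word back into its ten 6-bit codes."""
    mask = 2 ** CHAR_BITS - 1  # 0b111111
    return [(word >> (CHAR_BITS * (CHARS_PER_WORD - 1 - i))) & mask
            for i in range(CHARS_PER_WORD)]


print(unpack(pack([3, 1, 20])))  # [3, 1, 20, 0, 0, 0, 0, 0, 0, 0]
```

Padding with zeros on the right is just one choice for this sketch; a real system also has to decide what code (if any) marks the end of a short string.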
The American Standard Code for Information Interchange (ASCII) was proposed in 1963 and standardized in 1968, with its most recent revision in 1986. ASCII is a seven-bit code (there's a good reason for this, which we'll get to later), so it can represent 2^7 = 128 different symbols of various types. The Internet Assigned Numbers Authority (IANA) prefers the name US-ASCII.
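Python's built-in `ascii` codec (which also accepts the name `us-ascii`) shows ASCII acting as the agreed-upon table. In the kids' code "CAT" was 3 1 20; in ASCII it is 67 65 84. A minimal sketch:

```python
# Seven bits give 2**7 = 128 possible codes, numbered 0 through 127.
NUM_ASCII_CODES = 2 ** 7
print(NUM_ASCII_CODES)  # 128

# Encode text with the ASCII table; the result is a sequence of byte values.
data = "CAT".encode("us-ascii")
print(list(data))  # [67, 65, 84]

# Every ASCII code fits in 7 bits, so the top bit of each 8-bit byte is 0.
print(all(b < NUM_ASCII_CODES for b in data))  # True
print(all(b & 0x80 == 0 for b in data))        # True
```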
ASCII was designed to represent the subset of the English alphabet used in data processing. It's perfectly adequate for that, but it can't represent the "extra" characters used by other languages, such as the Spanish ñ or the German ö, let alone other alphabets like Greek or Hebrew, and let's not even think about Asian languages! However, the more modern character encodings in wide use today are descended from ASCII, and most of them are supersets of it.
For instance, the ISO-8859 family consists of 8-bit codes; every member uses ASCII for the first 128 characters and language-specific extensions for the second 128.
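A quick way to see the "superset" relationship is to encode the same text with several of Python's standard codecs. This sketch assumes the codec names `ascii`, `iso-8859-1` (Western European, also called Latin-1), `iso-8859-7` (Greek), and `utf-8`; the byte 0xE4 is just an arbitrary example from the upper half.

```python
# Plain ASCII text comes out as the same bytes under every one of these
# encodings, because each keeps ASCII as its first 128 codes.
text = "CAT"
encodings = ["ascii", "iso-8859-1", "iso-8859-7", "utf-8"]
print({enc: text.encode(enc) for enc in encodings})  # every value is b'CAT'

# ASCII itself has no code for the Spanish ñ...
try:
    "ñ".encode("ascii")
except UnicodeEncodeError as err:
    print("not ASCII:", err.reason)  # not ASCII: ordinal not in range(128)

# ...but ISO-8859-1 places it in the upper half, at 0xF1.
print("ñ".encode("iso-8859-1"))  # b'\xf1'

# The upper 128 codes are language-specific: one byte, two meanings.
byte = bytes([0xE4])
print(byte.decode("iso-8859-1"))  # ä  (Western European)
print(byte.decode("iso-8859-7"))  # δ  (Greek)
```

This is also why text decoded with the wrong member of the family looks fine in its English parts and garbled everywhere else.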
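Here's an ASCII code chart. Each row gives the high hexadecimal digit of a code and each column gives the low digit, so 'A', in row 4 and column 1, is 0x41 (65 decimal):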
|   | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | NUL | SOH | STX | ETX | EOT | ENQ | ACK | BEL | BS | HT | LF | VT | FF | CR | SO | SI |
| 1 | DLE | DC1 | DC2 | DC3 | DC4 | NAK | SYN | ETB | CAN | EM | SUB | ESC | FS | GS | RS | US |
| 2 | SP | ! | " | # | $ | % | & | ' | ( | ) | * | + | , | - | . | / |
| 3 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | : | ; | < | = | > | ? |
| 4 | @ | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O |
| 5 | P | Q | R | S | T | U | V | W | X | Y | Z | [ | \\ | ] | ^ | _ |
| 6 | \` | a | b | c | d | e | f | g | h | i | j | k | l | m | n | o |
| 7 | p | q | r | s | t | u | v | w | x | y | z | { | \| | } | ~ | DEL |
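Looking at it, we can see it is divided into several sections:

- Codes 0x00 through 0x1F, plus 0x7F (DEL), are control characters such as LF (line feed), CR (carriage return), and BEL (bell), which control devices rather than print visible symbols.
- 0x20 is the space (SP); the punctuation marks and other symbols fill the remaining slots in rows 2 through 7 around the digits and letters.
- The uppercase letters A through Z are 0x41 through 0x5A, and the lowercase letters a through z are 0x61 through 0x7A, exactly 0x20 (32) higher.
- The digits 0 through 9 are 0x30 through 0x39, so a digit's numeric value is its code minus the code for '0'.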
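The regions of the chart, and two classic tricks that fall out of its layout, can be checked directly in Python. The helper names `ascii_category`, `digit_value`, and `toggle_case` are hypothetical, and the category labels are just this sketch's own names for the regions.

```python
def ascii_category(code):
    """Classify a 7-bit code by which region of the chart it falls in."""
    if not 0 <= code <= 127:
        raise ValueError("not an ASCII code")
    if code < 0x20 or code == 0x7F:
        return "control"
    if 0x30 <= code <= 0x39:
        return "digit"
    if 0x41 <= code <= 0x5A:
        return "uppercase"
    if 0x61 <= code <= 0x7A:
        return "lowercase"
    return "space/punctuation"  # SP (0x20) and the symbols in between


def digit_value(ch):
    """The digits are contiguous, so '7' - '0' == 7."""
    return ord(ch) - ord("0")


def toggle_case(ch):
    """Flip the 0x20 bit: 'A' (0x41) <-> 'a' (0x61). Only meaningful for letters."""
    return chr(ord(ch) ^ 0x20)


print(ascii_category(ord("A")), ascii_category(ord("7")), ascii_category(0x0A))
# uppercase digit control
print(digit_value("7"))                    # 7
print(toggle_case("A"), toggle_case("z"))  # a Z
```

`toggle_case` does no checking, so applied to a non-letter it silently produces some other character (for example, '1' becomes '\x11'); real code would test `ascii_category` first.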
The IBM PC defined an eight-bit "extended ASCII" that used the other 128 codes for various graphics and line-drawing characters.
ASCII has also been adopted as the basis of the ISO 8859-x family of character encodings; in these, the remaining 128 codes are used for various language-dependent extensions, such as German umlauted vowels.
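The IBM PC's character set survives as code page 437, which Python ships as the `cp437` codec. The sketch below decodes the same four bytes (an illustrative choice: the top edge of a double-line box) as the IBM PC character set and as ISO 8859-1, showing that the lower 128 codes agree while the upper 128 do not.

```python
# Four bytes from the upper half of the 8-bit range.
box_top = bytes([0xC9, 0xCD, 0xCD, 0xBB])

print(box_top.decode("cp437"))       # ╔══╗  (IBM PC line-drawing characters)
print(box_top.decode("iso-8859-1"))  # ÉÍÍ»  (the same bytes read as Latin-1)

# Both encodings inherit ASCII, so plain ASCII text is unaffected.
print("CAT".encode("cp437") == "CAT".encode("iso-8859-1"))  # True
```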