Character encodings work on the same principle as those "unbreakable" codes kids use to communicate: A is 1, B is 2, and so forth. Messages are just sequences of numbers; to send CAT we send 3 1 20.
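As a rough illustration of the kids' code, here is a minimal Python sketch. The function names `encode` and `decode` are made up for this example, and it assumes the message contains only the letters A to Z.

```python
def encode(message: str) -> list[int]:
    """Map each letter to its position in the alphabet (A=1 ... Z=26)."""
    return [ord(ch) - ord("A") + 1 for ch in message.upper()]


def decode(codes: list[int]) -> str:
    """Invert encode(): turn alphabet positions back into uppercase letters."""
    return "".join(chr(code - 1 + ord("A")) for code in codes)


print(encode("CAT"))       # [3, 1, 20]
print(decode([3, 1, 20]))  # CAT
```

Both sides only need to agree on the table; the numbers carry no meaning of their own.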
Computer character sets are just agreed-upon encodings like this. In the early days of computing there were a great many of them: typically every manufacturer had its own, and the widths of the encodings varied as well. CDC, whose machines had a 60-bit word, used a six-bit encoding so that ten characters fit in one word. One of the most famous manufacturer-specific encodings was IBM's Extended Binary Coded Decimal Interchange Code (EBCDIC).
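To see why a six-bit code suits a 60-bit word, the sketch below packs ten six-bit values into one integer and unpacks them again. It only illustrates the arithmetic: the values used are arbitrary, not CDC's actual character table, and `pack_word` and `unpack_word` are hypothetical names.

```python
BITS_PER_CHAR = 6
CHARS_PER_WORD = 10  # 10 characters * 6 bits = 60 bits


def pack_word(codes: list[int]) -> int:
    """Pack ten six-bit codes into one 60-bit integer, first code highest."""
    if len(codes) != CHARS_PER_WORD:
        raise ValueError("a word holds exactly ten characters")
    word = 0
    for code in codes:
        if not 0 <= code < (1 << BITS_PER_CHAR):
            raise ValueError("each code must fit in six bits")
        word = (word << BITS_PER_CHAR) | code
    return word


def unpack_word(word: int) -> list[int]:
    """Split a 60-bit integer back into its ten six-bit codes."""
    mask = (1 << BITS_PER_CHAR) - 1
    # Shifts run 54, 48, ..., 0 so the first code comes out first.
    return [(word >> shift) & mask for shift in range(54, -1, -BITS_PER_CHAR)]


codes = list(range(1, 11))      # arbitrary sample values
word = pack_word(codes)
print(word.bit_length() <= 60)  # True
print(unpack_word(word) == codes)  # True
```

The price of six bits is a repertoire of only 64 symbols, which is one reason such codes typically had a single case of letters.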
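The American Standard Code for Information Interchange (ASCII) was proposed in 1963 and standardized in 1968. ASCII is a seven-bit code, so it can represent 128 different characters: 33 control characters and 95 printable ones (counting the space). Here's an ASCII code chart; the row label is the high hexadecimal digit of a character's code and the column label is the low digit: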
|   | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | NUL | SOH | STX | ETX | EOT | ENQ | ACK | BEL | BS | HT | LF | VT | FF | CR | SO | SI |
| 1 | DLE | DC1 | DC2 | DC3 | DC4 | NAK | SYN | ETB | CAN | EM | SUB | ESC | FS | GS | RS | US |
| 2 | SP | ! | " | # | $ | % | & | ' | ( | ) | * | + | , | - | . | / |
| 3 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | : | ; | < | = | > | ? |
| 4 | @ | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O |
| 5 | P | Q | R | S | T | U | V | W | X | Y | Z | [ | \ | ] | ^ | _ |
| 6 | ` | a | b | c | d | e | f | g | h | i | j | k | l | m | n | o |
| 7 | p | q | r | s | t | u | v | w | x | y | z | { | \| | } | ~ | DEL |
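Reading a code off the chart is a matter of combining the row and column labels as two hexadecimal digits. A small Python sketch of that lookup, with `chart_code` as an illustrative name:

```python
def chart_code(row: int, column: int) -> int:
    """Return the character code at a given chart row and column (both 0-15)."""
    return row * 16 + column


code = chart_code(4, 1)       # row 4, column 1
print(hex(code), chr(code))   # 0x41 A

# Going the other way: which row and column hold 'z'?
print(divmod(ord("z"), 16))   # (7, 10), i.e. row 7, column A
```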
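Looking at it, we can see it is divided into several sections:
- Rows 0 and 1 (codes 00 to 1F), plus DEL at 7F, hold the control characters.
- Row 2 holds the space and most of the punctuation; the rest of the punctuation fills the gaps around the digits and letters.
- The digits occupy 30 to 39 in order, so the value of a digit character is its code minus the code for '0'.
- The uppercase letters occupy 41 to 5A and the lowercase letters 61 to 7A, so the two cases differ by exactly 20 hexadecimal.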
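The chart's layout turns a few common operations into simple arithmetic. The following Python sketch uses ranges read directly off the chart; the names `classify`, `digit_value`, and `swap_case` are illustrative only.

```python
def classify(code: int) -> str:
    """Name the section of the ASCII chart that a code falls in."""
    if not 0 <= code <= 0x7F:
        raise ValueError("not a seven-bit ASCII code")
    if code < 0x20 or code == 0x7F:
        return "control"
    if 0x30 <= code <= 0x39:
        return "digit"
    if 0x41 <= code <= 0x5A:
        return "uppercase"
    if 0x61 <= code <= 0x7A:
        return "lowercase"
    return "punctuation"  # everything else printable, including the space


def digit_value(ch: str) -> int:
    """Digits are contiguous from '0', so subtraction gives the value."""
    return ord(ch) - ord("0")


def swap_case(ch: str) -> str:
    """Upper and lower case differ only in the 0x20 bit (letters only)."""
    return chr(ord(ch) ^ 0x20)


print(classify(ord("A")))  # uppercase
print(digit_value("7"))    # 7
print(swap_case("a"))      # A
```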
The IBM PC defined an eight-bit "extended ASCII" that used the additional 128 codes (80 to FF) for various graphics and line-drawing characters.
ASCII was also adopted as the basis of the ISO 8859-x family of character encodings; in these, the upper 128 code values are used for language-dependent extensions such as the German umlauted vowels.
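Because these eight-bit extensions agree with ASCII only in the lower half, the same byte can mean different things depending on which one is assumed. A short Python sketch, using the standard `latin-1` (ISO 8859-1) codec and `cp437` (the codec for the original IBM PC character set) as two representative choices:

```python
raw = bytes([0x41, 0xC4])

# The first byte is plain ASCII and means the same thing everywhere.
print(raw[:1].decode("ascii"))  # A

# The second byte lies outside ASCII, so a strict ASCII decode fails.
try:
    raw.decode("ascii")
except UnicodeDecodeError:
    print("0xC4 is not an ASCII code")

# Under ISO 8859-1 it is an umlauted vowel ...
latin = raw.decode("latin-1")
print(latin == "A\u00c4")       # True (capital A with diaeresis)

# ... while under the IBM PC set it is a line-drawing character.
ibm = raw.decode("cp437")
print(ibm == "A\u2500")         # True (light horizontal box-drawing line)
```

This is why text in an eight-bit encoding is only readable when the encoding travels with it or is known by convention.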