Character Encodings

Character encodings are based on a similar principle to those "unbreakable" codes kids use to communicate: A is 1, B is 2, and so forth. Messages are just sequences of numbers; to send CAT we send 3 1 20.
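
Here is a minimal sketch of that kids' code in Python. The names `ALPHABET`, `encode`, and `decode` are made up for this illustration, and the sketch assumes the message contains only the uppercase letters A through Z.

```python
# Toy code from the text: A is 1, B is 2, ..., Z is 26.
# Assumes the message uses only the uppercase letters A-Z.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def encode(message):
    """Turn a string of letters into a list of numbers."""
    return [ALPHABET.index(ch) + 1 for ch in message]

def decode(numbers):
    """Turn a list of numbers back into a string of letters."""
    return "".join(ALPHABET[n - 1] for n in numbers)

print(encode("CAT"))         # [3, 1, 20]
print(decode([3, 1, 20]))    # CAT
```

Both sides have to agree on `ALPHABET` ahead of time; that shared table is all an encoding really is.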

Computer character sets are just agreed-upon encodings like this. In the early days there were a vast number of them: typically every manufacturer had its own, and the widths of the encodings varied as well. CDC, whose machines had a 60-bit word, used a six-bit encoding so that ten characters fit in one word. One of the most famous manufacturer-specific encodings was IBM's Extended Binary-Coded Decimal Interchange Code (EBCDIC).
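
To make the arithmetic concrete, here is a rough sketch of packing ten six-bit codes into one 60-bit word. The function names are invented for this example, and the character codes are the toy A = 1, B = 2 scheme with 0 as padding, not CDC's actual code table; putting the first character in the high-order bits is also just an assumption of the sketch.

```python
# Pack six-bit character codes into a single 60-bit word.
# The codes are the toy A=1..Z=26 scheme (0 = padding), NOT CDC's real table.
BITS_PER_CHAR = 6
CHARS_PER_WORD = 60 // BITS_PER_CHAR    # 10 characters per word

def pack_word(codes):
    """Pack up to ten 6-bit codes into one integer, first code in the high bits."""
    if len(codes) > CHARS_PER_WORD:
        raise ValueError("too many characters for one word")
    padded = list(codes) + [0] * (CHARS_PER_WORD - len(codes))
    word = 0
    for code in padded:
        if not 0 <= code < 2 ** BITS_PER_CHAR:
            raise ValueError("code does not fit in six bits")
        word = (word << BITS_PER_CHAR) | code
    return word

def unpack_word(word):
    """Recover the ten 6-bit codes from a 60-bit word."""
    mask = 2 ** BITS_PER_CHAR - 1
    return [(word >> (BITS_PER_CHAR * i)) & mask
            for i in reversed(range(CHARS_PER_WORD))]

word = pack_word([3, 1, 20])             # "CAT" in the toy code
print(word.bit_length() <= 60)           # True
print(unpack_word(word))                 # [3, 1, 20, 0, 0, 0, 0, 0, 0, 0]
```

Six bits give only 2^6 = 64 distinct codes, which is why an encoding this narrow has no room for both uppercase and lowercase letters alongside digits and punctuation.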

ASCII

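The American Standard Code for Information Interchange (ASCII) was first published in 1963 and revised into essentially its present form in 1967 and 1968. ASCII is a seven-bit code, so it can represent 2^7 = 128 different symbols of various types. Here's an ASCII code chart, with the high hex digit of each code down the side and the low hex digit across the top: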

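    0    1    2    3    4    5    6    7    8    9    A    B    C    D    E    F
0   NUL  SOH  STX  ETX  EOT  ENQ  ACK  BEL  BS   HT   LF   VT   FF   CR   SO   SI
1   DLE  DC1  DC2  DC3  DC4  NAK  SYN  ETB  CAN  EM   SUB  ESC  FS   GS   RS   US
2   SP   !    "    #    $    %    &    '    (    )    *    +    ,    -    .    /
3   0    1    2    3    4    5    6    7    8    9    :    ;    <    =    >    ?
4   @    A    B    C    D    E    F    G    H    I    J    K    L    M    N    O
5   P    Q    R    S    T    U    V    W    X    Y    Z    [    \    ]    ^    _
6   `    a    b    c    d    e    f    g    h    i    j    k    l    m    n    o
7   p    q    r    s    t    u    v    w    x    y    z    {    |    }    ~    DEL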
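
To read the chart, take the row label as the high hex digit and the column label as the low hex digit of the character's code. The short sketch below does the lookup in the other direction; `chart_position` is a name made up for this example.

```python
def chart_position(ch):
    """Return the (row, column) hex digits of an ASCII character in the chart."""
    code = ord(ch)
    if code > 127:
        raise ValueError("not an ASCII character")
    # High three bits select the row, low four bits select the column.
    return format(code >> 4, "X"), format(code & 0xF, "X")

print(chart_position("A"))             # ('4', '1'): A is 0x41, or 65
print(chart_position("a"))             # ('6', '1'): a is 0x61, or 97
print("CAT".encode("ascii").hex())     # 434154
```

So where the kids' code sends CAT as 3 1 20, ASCII sends it as 0x43 0x41 0x54.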

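Looking at the chart, we can see it is divided into several sections: control characters in rows 0 and 1 (plus DEL at the very end), the space and punctuation in row 2, the digits followed by more punctuation in row 3, the uppercase letters in rows 4 and 5, and the lowercase letters in rows 6 and 7, with a few more symbols filling out the letter rows.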

Extensions to ASCII

The IBM PC defined an eight-bit "extended ASCII" that used the additional 128 code values for accented letters and various graphics and line-drawing characters.
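
The original IBM PC character set is commonly known as code page 437, and Python ships a codec for it under the name `cp437`. As a small illustration (the three byte values are just ones picked for this example), here is what some codes above 0x7F decode to:

```python
import unicodedata

# Three bytes from the upper half of the IBM PC character set (code page 437).
for value in (0xC9, 0xCD, 0xBB):
    ch = bytes([value]).decode("cp437")
    # Print the Unicode name rather than the glyph so the output is plain ASCII.
    print(hex(value), unicodedata.name(ch))

# Bytes below 0x80 still decode exactly as they do in ASCII.
print(b"CAT".decode("cp437"))    # CAT
```

Run together, those three bytes draw the top edge of a double-lined box, which is the sort of thing text-mode PC programs used them for.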

ASCII has been adopted as the basis of the ISO 8859-x family of character encodings; in these, the upper 128 code values are used for various language-dependent extensions such as German umlauted vowels and the like.
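
A quick way to see what "language-dependent" means is to decode the same byte under several members of the family. This is only a sketch: `decode_byte` is a made-up helper, and the three codecs are ones Python happens to provide for ISO 8859-1 (Western European), ISO 8859-5 (Cyrillic), and ISO 8859-7 (Greek).

```python
import unicodedata

def decode_byte(value, codec):
    """Decode a single byte value using the named codec."""
    return bytes([value]).decode(codec)

# The same byte, 0xE4, means a different letter in each encoding.
for codec in ("iso8859_1", "iso8859_5", "iso8859_7"):
    ch = decode_byte(0xE4, codec)
    # Print the Unicode name rather than the glyph so the output is plain ASCII.
    print(codec, unicodedata.name(ch))

# The lower half is plain ASCII in every member of the family.
print(b"CAT".decode("iso8859_1") == b"CAT".decode("iso8859_7"))    # True
```

In ISO 8859-1 the byte 0xE4 is the German a-umlaut; the other two encodings put a Cyrillic and a Greek letter in the same slot, so a receiver has to know which member of the family the sender used.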


Last modified: Mon Oct 31 13:20:49 MST 2005