Floating point notation is essentially the same as scientific notation, only translated to binary. There are three fields: the sign (which is the sign of the number), the exponent (some representations have used a separate exponent sign and exponent magnitude; IEEE format does not), and a mantissa (which IEEE format calls a significand).
As we discuss the details of the format, you'll find that the motivations used to select some features seem like they should have driven other features in directions other than what was actually used. It seems inconsistent to me, too...
One other thing to mention is that the IEEE floating point format actually specifies several different formats: a ``single-precision'' format that takes 32 bits (i.e., one word in most machines) to represent a value, and a ``double-precision'' format that allows for both greater precision and greater range, but uses 64 bits. We'll be talking about single-precision here.
The one-bit sign is 0 for positive, or 1 for negative. The representation is sign-magnitude.
The exponent gives a power of two, rather than a power of ten as in scientific notation (again, there have been floating point formats using a power of eight or sixteen; IEEE uses two).
The eight-bit exponent uses excess-127 notation. What this means is that the exponent is represented in the field by a number 127 greater than its value, so the stored field is never negative. Why? Because it lets us use an integer comparison to tell if one floating point number is larger than another, so long as both have the same sign: the exponent sits above the fraction in the word, so a larger bit pattern means a larger magnitude.
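A minimal Python sketch to check both claims on real bit patterns. It assumes Python's `struct` module encodes the `'f'` format as IEEE single precision, which holds on essentially all current platforms; the helper names `float_bits` and `stored_exponent` are made up for this illustration.

```python
import struct

def float_bits(x: float) -> int:
    """Return the 32-bit pattern of x rounded to IEEE single precision."""
    return struct.unpack('>I', struct.pack('>f', x))[0]

def stored_exponent(x: float) -> int:
    # Bits 30..23 hold the exponent in excess-127 form.
    return (float_bits(x) >> 23) & 0xFF

# 1.0 = 1.0 x 2^0, so the stored exponent is 0 + 127.
print(stored_exponent(1.0))   # 127
print(stored_exponent(8.0))   # 130  (8 = 2^3)
print(stored_exponent(0.25))  # 125  (0.25 = 2^-2)

# For positive numbers, comparing the bit patterns as unsigned
# integers gives the same ordering as comparing the floats.
values = [0.1, 0.25, 1.0, 1.5, 3.0, 1000.0]
patterns = [float_bits(v) for v in values]
print(patterns == sorted(patterns))  # True
```

For two negative numbers the same integer comparison still orders the magnitudes, so the answer has to be flipped.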
Using a binary exponent gives us an unexpected benefit. In scientific notation, we always work with a ``normalized'' number: a number whose mantissa is at least 1 but less than 10 (a single nonzero digit before the decimal point). If a binary floating point number is normalized, it must have the form 1.f -- the most significant bit must be a 1. Well, if we *know* what it is, we don't need to explicitly represent it, right? So we just store the fraction part in the word, and put in the 1. when we're actually inside the floating point unit. Sometimes this is called using a ``phantom bit'' or a ``hidden bit.''
Since we're going to fill a 32-bit word, and the sign and exponent take 9 bits, the stored field is 23 bits, but represents a 24-bit value. The 23 bits actually stored are called the fraction (the f in 1.f), and the full 24 bits including the hidden bit make up the ``significand.''
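The 1 + 8 + 23 layout can be pulled apart with shifts and masks. A minimal Python sketch, assuming `struct`'s `'f'` format is IEEE single precision; `fields` is a hypothetical helper name. It splits -6.5 (binary -110.1, i.e. -1.101 x 2^2) into its three fields and then restores the hidden bit.

```python
import struct

def fields(x: float) -> tuple[int, int, int]:
    """Split a single-precision value into (sign, exponent field, fraction field)."""
    bits = struct.unpack('>I', struct.pack('>f', x))[0]
    sign = bits >> 31                 # 1 bit
    exponent = (bits >> 23) & 0xFF    # 8 bits, excess-127
    fraction = bits & 0x7FFFFF        # 23 bits: the f in 1.f
    return sign, exponent, fraction

s, e, f = fields(-6.5)
# Fraction bits are 101 followed by 20 zeros, i.e. 0b101 << 20.
print(s, e, hex(f))        # 1 129 0x500000

# For a normalized number, putting the hidden 1 back in front
# gives the full 24-bit significand 1.101 (binary).
significand = (1 << 23) | f
print(hex(significand))    # 0xd00000
```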
The value represented by a normalized IEEE single-precision number is

$$(-1)^s \times 1.f \times 2^{\mathrm{exp}-127}$$
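The formula can be evaluated directly from the three fields. A sketch for normalized numbers only (exponent field 01 through fe); `decode_normalized` is a hypothetical helper, and the result is compared with what Python's `struct` decodes for the same 32 bits. The pattern 0x40200000 encodes 2.5, and 0x40490FDB is the single-precision pattern nearest pi.

```python
import struct

def decode_normalized(bits: int) -> float:
    """Evaluate (-1)^s * 1.f * 2^(exp-127) for a normalized single-precision pattern."""
    s = bits >> 31
    exp = (bits >> 23) & 0xFF
    f = bits & 0x7FFFFF
    assert 0 < exp < 0xFF, "only normalized numbers are handled here"
    significand = 1 + f / 2**23          # 1.f, with the hidden bit restored
    return (-1)**s * significand * 2.0**(exp - 127)

print(decode_normalized(0x40200000))     # 2.5

bits = 0x40490FDB
# Both lines print the same value.
print(decode_normalized(bits))
print(struct.unpack('>f', bits.to_bytes(4, 'big'))[0])
```

Every step here is exact in Python's double precision, so the two results agree bit for bit.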
IEEE FP uses a normalized representation where possible, and also extends its range at the expense of normalization with denormalized numbers.
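It extends the range of representation (at the cost of precision for really small numbers) with ``denormals.'' These have an exp field of 0, and represent

$$(-1)^s \times 0.f \times 2^{-126}$$

So we can actually represent numbers as small as $2^{-149}$ (only the last fraction bit set: $2^{-23} \times 2^{-126}$), though with less and less precision as they get smaller. The exponent is $-126$ rather than $-127$ so that the largest denormal sits just below the smallest normalized number, $1.0 \times 2^{-126}$. This also gives us a way to precisely represent 0, and to use the same representation (all zero bits) for floating point 0 as for integer 0.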
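A short Python check of the denormal formula, assuming `struct`'s `'f'` format is IEEE single precision; `from_bits` and `decode_denormal` are illustrative names, not a standard API.

```python
import struct

def from_bits(bits: int) -> float:
    """Interpret a 32-bit pattern as an IEEE single-precision value."""
    return struct.unpack('>f', bits.to_bytes(4, 'big'))[0]

def decode_denormal(bits: int) -> float:
    """Evaluate (-1)^s * 0.f * 2^-126 for a pattern whose exponent field is 0."""
    s = bits >> 31
    f = bits & 0x7FFFFF
    assert (bits >> 23) & 0xFF == 0, "exponent field must be 0"
    return (-1)**s * (f / 2**23) * 2.0**-126   # no hidden bit: 0.f

smallest = 0x00000001                  # only the last fraction bit set
print(decode_denormal(smallest) == 2.0**-149)   # True
print(from_bits(smallest) == 2.0**-149)         # True

largest = 0x007FFFFF                   # all fraction bits set
print(decode_denormal(largest) < 2.0**-126)     # True: just below the smallest normal
print(from_bits(0x00000000) == 0)               # True: all-zero bits is 0.0
```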
An exp field of ff (all ones) is used for other goodies: if the fraction field is 0, the value is +/- infinity (depending on the sign); any other fraction is Not a Number (NaN).
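These special patterns can be decoded the same way. A sketch assuming Python's `struct` treats `'f'` as IEEE single precision (true on common platforms); `from_bits` is a hypothetical helper.

```python
import math
import struct

def from_bits(bits: int) -> float:
    """Interpret a 32-bit pattern as an IEEE single-precision value."""
    return struct.unpack('>f', bits.to_bytes(4, 'big'))[0]

print(from_bits(0x7F800000))              # inf   (sign 0, exp ff, fraction 0)
print(from_bits(0xFF800000))              # -inf  (sign 1, exp ff, fraction 0)
print(math.isnan(from_bits(0x7FC00000)))  # True  (exp ff, fraction != 0)
print(math.isnan(from_bits(0x7F800001)))  # True  (any nonzero fraction)
```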
So we can express everything possible in the format like this:
| Sign | Exponent (hex) | Fraction | Represents | Notes |
|---|---|---|---|---|
| 1 | ff | != 0 | NaN | |
| 1 | ff | 0 | -infinity | |
| 1 | 01-fe | anything | $-1.f \times 2^{\mathrm{exp}-127}$ | |
| 1 | 00 | != 0 | $-0.f \times 2^{-126}$ | |
| 1 | 00 | 0 | -0 | (special case of previous line) |
| 0 | 00 | 0 | 0 | (special case of next line) |
| 0 | 00 | != 0 | $0.f \times 2^{-126}$ | |
| 0 | 01-fe | anything | $1.f \times 2^{\mathrm{exp}-127}$ | |
| 0 | ff | 0 | infinity | |
| 0 | ff | != 0 | NaN | |
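The table translates almost line for line into a classifier. A sketch, assuming Python's `struct` gives the single-precision bits; `classify` and its label strings are invented for illustration.

```python
import struct

def classify(x: float) -> str:
    """Name the row of the table that a single-precision value falls into."""
    bits = struct.unpack('>I', struct.pack('>f', x))[0]
    sign = '-' if bits >> 31 else '+'
    exp = (bits >> 23) & 0xFF
    frac = bits & 0x7FFFFF
    if exp == 0xFF:                       # rows with exponent ff
        return 'NaN' if frac else sign + 'infinity'
    if exp == 0:                          # rows with exponent 00
        return sign + ('zero' if frac == 0 else 'denormal')
    return sign + 'normalized'            # exponent 01-fe

for v in [1.5, -0.0, 1e-40, float('inf'), float('nan')]:
    print(v, classify(v))
```

Here 1e-40 is below the smallest normalized single-precision value (about 1.18e-38), so it lands in a denormal row.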
IEEE FP has many, many more features; this is just scratching the surface. For example, it's possible to specify the rounding behavior, the use of guard bits, behavior on overflow and underflow, and so on.
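Let's add 2.5 + 4.75. First we convert 2.5 to binary. The integer part, 2, is converted by repeatedly dividing by 2; the remainder at each step is the next bit, starting with the least significant: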
| Old | Old/2 | Bit |
|---|---|---|
| 2 | 1 | 0 |
| 1 | 0 | 1 |
Reading the bits from bottom to top, 2 is 10 in binary.
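The repeated-division procedure in the table, as a small Python sketch (`int_to_binary` is a hypothetical name; Python's built-in `format(n, 'b')` does the same job):

```python
def int_to_binary(n: int) -> str:
    """Repeatedly divide by 2; the remainders are the bits, least significant first."""
    if n == 0:
        return '0'
    bits = []
    while n > 0:
        n, bit = divmod(n, 2)       # Old/2 and the remainder bit
        bits.append(str(bit))
    return ''.join(reversed(bits))  # read the bits from bottom to top

print(int_to_binary(2))   # 10
print(int_to_binary(4))   # 100
```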
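The fraction part, .5, is converted by repeatedly multiplying by 2: the integer part of each product is the next bit, and its fractional part becomes New, which carries over to the next row. We stop when New is 0.

| Old | Bit | New |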
|---|---|---|
| .5 | 1 | 0 |
So the fraction part is .1
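The repeated-multiplication procedure as a Python sketch. It uses `fractions.Fraction` so the arithmetic stays exact; `frac_to_binary` and its `max_bits` cutoff (defaulting to the 23 fraction bits of single precision) are assumptions for illustration, since many decimal fractions never terminate in binary.

```python
from fractions import Fraction

def frac_to_binary(x: Fraction, max_bits: int = 23) -> str:
    """Repeatedly multiply by 2; the integer part of each product is the next bit."""
    bits = []
    while x != 0 and len(bits) < max_bits:
        x *= 2
        bit = int(x)          # 1 if the product reached 1, else 0
        bits.append(str(bit))
        x -= bit              # keep only the fractional part ("New")
    return '.' + ''.join(bits)

print(frac_to_binary(Fraction(1, 2)))   # .1   (2.5's fraction part)
print(frac_to_binary(Fraction(3, 4)))   # .11  (4.75's fraction part)
```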
The number we're converting is $10.1_2$, which is $1.01_2 \times 2^1$. The exponent is $127 + 1 = 128_{10}$, or $10000000_2$, and the fraction is $010\ldots0_2$ (01 followed by zeros, 23 bits in all).
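Assembling those three fields into a 32-bit word and decoding it again is a quick sanity check. A sketch assuming Python's `struct` `'f'` format is IEEE single precision:

```python
import struct

sign = 0
exponent = 127 + 1              # 2.5 = 1.01 (binary) x 2^1
fraction = 0b01 << 21           # "01" followed by 21 zeros: 23 bits in all
bits = (sign << 31) | (exponent << 23) | fraction

print(hex(bits))                                        # 0x40200000
print(struct.unpack('>f', bits.to_bytes(4, 'big'))[0])  # 2.5
```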
| 2.5 | |
|---|---|
| Sign: | 0 |
| Exponent: | 10000000 |
| Mantissa: | 1.01 |
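
Converting 4.75 the same way gives $100.11_2 = 1.0011_2 \times 2^2$, so its exponent is $127 + 2 = 129_{10}$, or $10000001_2$:

| 4.75 | |
|---|---|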
| Sign: | 0 |
| Exponent: | 10000001 |
| Mantissa: | 1.0011 |
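To add, the exponents must match. 2.5 has the smaller exponent (1, versus 2 for 4.75), so we shift its significand one place to the right: $1.01 \times 2^1 = 0.101 \times 2^2$. Then we add the significands:

$$
\begin{array}{r}
0.1010 \\
+\ 1.0011 \\
\hline
1.1101
\end{array}
$$

The sum is $1.1101 \times 2^2$. It is already normalized (a single 1 before the binary point), so the exponent field stays $10000001$ and the stored fraction is 1101 followed by zeros.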
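The align-and-add step can be mimicked with plain integers by treating each significand as a whole number scaled by 2^4 (four bits after the binary point). This is a simplified sketch: it ignores rounding and guard bits (the bit shifted out here happens to be 0) and only checks, rather than performs, renormalization.

```python
# Significands scaled by 2^4: 1.0100 -> 0b10100, 1.0011 -> 0b10011.
a_sig, a_exp = 0b10100, 1   # 2.5  = 1.0100 x 2^1
b_sig, b_exp = 0b10011, 2   # 4.75 = 1.0011 x 2^2

# Align: shift the operand with the smaller exponent to the right.
shift = b_exp - a_exp
a_sig >>= shift             # 1.0100 x 2^1 -> 0.1010 x 2^2
total = a_sig + b_sig       # 0.1010 + 1.0011

# A sum of 2.0 or more (0b100000 at this scale) would need renormalizing.
assert total < 2 << 4

print(format(total, '05b'))       # 11101, i.e. 1.1101
print(total / 2**4 * 2**b_exp)    # 7.25
```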
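To convert the result back to decimal, note that $1.1101 \times 2^2 = 111.01_2$. The integer part, 111, is converted by scanning its bits from left to right; each row's New is $2 \times \text{Old} + \text{Bit}$:

| Old | Bit | New |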
|---|---|---|
| 0 | 1 | 1 |
| 1 | 1 | 3 |
| 3 | 1 | 7 |
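The doubling rule in the table, sketched in Python (`binary_to_int` is a hypothetical name; `int(bits, 2)` is the built-in equivalent):

```python
def binary_to_int(bits: str) -> int:
    """Scan the bits left to right: New = 2 * Old + Bit."""
    value = 0
    for b in bits:
        value = 2 * value + int(b)
    return value

print(binary_to_int('111'))  # 7
```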
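So the integer part is 7. The fraction part, .01, is converted by scanning its bits from right to left; each row's New is Old + Bit, and the next row's Old is New / 2:

| Old | Bit | New |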
|---|---|---|
| 0 | 1 | 1 |
| .5 | 0 | 0.5 |
| .25 | | |

So the fraction part is .25, and the sum is 7.25, which is indeed 2.5 + 4.75.
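The right-to-left halving rule from the last table, as a Python sketch using exact `fractions.Fraction` arithmetic (`binary_frac_to_value` is a hypothetical name):

```python
from fractions import Fraction

def binary_frac_to_value(bits: str) -> Fraction:
    """Scan the fraction bits right to left: New = Old + Bit, next Old = New / 2."""
    old = Fraction(0)
    for b in reversed(bits):
        new = old + int(b)
        old = new / 2
    return old

print(binary_frac_to_value('01'))         # 1/4
print(float(binary_frac_to_value('01')))  # 0.25
```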