Search results
Results From The WOW.Com Content Network
The standard defines five basic formats that are named for their numeric base and the number of bits used in their interchange encoding. There are three binary floating-point basic formats (encoded with 32, 64 or 128 bits) and two decimal floating-point basic formats (encoded with 64 or 128 bits).
In the IEEE 754 standard, the 64-bit base-2 format is officially referred to as binary64; it was called double in IEEE 754-1985. IEEE 754 specifies additional floating-point formats, including 32-bit base-2 single precision and, more recently, base-10 representations (decimal floating point).
A floating-point variable can represent a wider range of numbers than a fixed-point variable of the same bit width at the cost of precision. A signed 32-bit integer variable has a maximum value of 2 31 − 1 = 2,147,483,647, whereas an IEEE 754 32-bit base-2 floating-point variable has a maximum value of (2 − 2 −23) × 2 127 ≈ 3.4028235 ...
The otherwise binary Wang VS machine supported a 64-bit decimal floating-point format in 1977. [2] The Motorola 68881 supported a format with 17 digits of mantissa and 3 of exponent in 1984, with the floating-point support library for the Motorola 68040 processor providing a compatible 96-bit decimal floating-point storage format in 1990. [2]
The last two examples illustrate what happens if x is a rather small number. In the second from last example, x = 1.110111⋯111 × 2 −50 ; 15 bits altogether. The binary is replaced very crudely by a single power of 2 (in this example, 2 −49) and its decimal equivalent is used.
Huberto M. Sierra noted in his 1956 patent "Floating Decimal Point Arithmetic Control Means for Calculator": [1] Thus under some conditions, the major portion of the significant data digits may lie beyond the capacity of the registers.
The precision limit is different from the range limit, as it affects the significand, not the exponent. The significand is a binary fraction that doesn't necessarily perfectly match a decimal fraction. In many cases a sum of reciprocal powers of 2 does not match a specific decimal fraction, and the results of computations will be slightly off.
The 2008 revision extended the previous standard where it was necessary, added decimal arithmetic and formats, tightened up certain areas of the original standard which were left undefined, and merged in IEEE 854 (the radix-independent floating-point standard). In a few cases, where stricter definitions of binary floating-point arithmetic might ...