Smallissimo: September 2011

Sunday, September 25, 2011

Reviewing fraction asFloat

After changing Integer>>asFloat, let's inspect Fraction>>asFloat. The spirit is quite simple: perform an Euclidean division after shifting the numerator or the denominator enough to obtain a certain precision in the quotient. The quotient will be the mantissa of the Float and the remainder are the truncated bits. Then come a dissertation on the conditions when to round upper or not. Let's see the code:

asFloat
    "Answer a Float that closely approximates the value of the receiver.
    This implementation will answer the closest floating point number to
    the receiver.
    It uses the IEEE 754 round to nearest even mode
    (can happen in case denominator is a power of two)"

    | a b q r exponent floatExponent n ha hb hq q1 |
    a := numerator abs.
    b := denominator abs.
    ha := a highBit.
    hb := b highBit.

    "If both numerator and denominator are represented exactly in floating point number,
    then fastest thing to do is to use hardwired float division"
    (ha < 54 and: [hb < 54]) ifTrue: [^numerator asFloat / denominator asFloat].

    "Try and obtain a mantissa with 54 bits.
    First guess is rough, we might get one more bit or one less"
    exponent := ha - hb - 54.
    exponent > 0
        ifTrue: [b := b bitShift: exponent]
        ifFalse: [a := a bitShift: exponent negated].
    q := a quo: b.
    r := a - (q * b).
    hq := q highBit.

    "check for gradual underflow, in which case we should use less bits"
    floatExponent := exponent + hq - 1.
    n := floatExponent > -1023
        ifTrue: [54]
        ifFalse: [54 + floatExponent + 1022].

    hq > n
        ifTrue: [exponent := exponent + hq - n.
            r := (q bitAnd: (1 bitShift: hq - n) - 1) * b + r.
            q := q bitShift: n - hq].
    hq < n
        ifTrue: [exponent := exponent + hq - n.
            q1 := (r bitShift: n - hq) quo: b.
            q := (q bitShift: n - hq) bitAnd: q1.
            r := (r bitShift: n - hq) - (q1 * b)].

    "check if we should round upward.
    The case of exact half (q bitAnd: 1) isZero not & (r isZero)
    will be handled by Integer>>asFloat"
    ((q bitAnd: 1) isZero or: [r isZero])
        ifFalse: [q := q + 1].

    ^ (self positive
        ifTrue: [q asFloat]
        ifFalse: [q = 0
            ifTrue: [Float negativeZero]
            ifFalse: [q asFloat negated]])
        timesTwoPower: exponent

Let's review the three parts in yellow : first the comment (can happen in case denominator is a power of two) is obscure. It is supposed to explain that in case of tie, the algorithm will round to nearest even. The case of tie can only happen if the denominator is a power of two indeed, any other factor in the denominator will lead to an infinite series of digits and thus cannot produce a tie. But this information is not essential enough to be placed in the header. After all it is not essential at all.

The second part in yellow tells that the quotient could be one bit less. Let's see how false it is.

Let's write that b have hb bits, a ha bits and q hq bits, and the conditions of Euclidean division:

(1<< (ha-1)) <= a, a < (1 << ha)
b < (1 << hb)
q < (1 << hq)
a=(b*q+r)
r < b

From the last two relations we have a < (b*q+b), that is a < (b * (q+1)).

We still have, q + 1 <= (1 << hq), thus b * (q+1) < ((1 < hb) * (1 << hq)), that is a < (1 < (hb+hq)).

But we also have (1 << (ha - 1)) <= a, thus ha - 1 < (hb + hq).

This can be written hq > (ha - hb - 1), or hq >= (ha - hb). So the quotient will have at least ha - hb bits, maybe more, but never less.

The third part in yellow is trying to perform an addition with bitAnd: - ouch ! It could eventually be bitOr:, but why not simply writing q := (q bitShift: n - q) + q1...

Fortunately, this is the case of one bit less which can never happen.

There are other things to be enhanced. The numbers 54 and 1022 are kind of magical and use implicit knowledge that 54 = (Float precision + 1) and -1022 = Float emin, the minimum possible exponent of a Float before gradual underflow occurs.

So we should better re-write this code and:

change the heading comment;
remove the unreachable branch;
use the new floating point inquiry messages precision and emin;
name the mantissa mantissa rather than q;
avoid to compute the remainder r, but just check if it is zero or hasTruncatedBits;
let the floating point hardware handle the case of negative zero by itself
(we just have to keep at least one bit in the mantissa).

asFloat
    "Answer a Float that closely approximates the value of the receiver.
    This implementation will answer the closest floating point number to the receiver.
    In case of a tie, it will use the IEEE 754 round to nearest even mode.
    In case of overflow, it will answer +/- Float infinity."

    | a b mantissa exponent hasTruncatedBits lostBit n ha hb hm |
    a := numerator abs.
    b := denominator.    "denominator is always positive"
    ha := a highBit.
    hb := b highBit.

    "Number of bits to keep in mantissa plus one to handle rounding."
    n := 1 + Float precision.

    "If both numerator and denominator are represented exactly in floating point number,
    then fastest thing to do is to use hardwired float division."
    (ha < n and: [hb < n]) ifTrue: [^numerator asFloat / denominator asFloat].

    "Shift the fraction by a power of two exponent so as to obtain a mantissa with n bits.
    First guess is rough, the mantissa might have n+1 bits."
    exponent := ha - hb - n.
    exponent >= 0
        ifTrue: [b := b bitShift: exponent]
        ifFalse: [a := a bitShift: exponent negated].
    mantissa := a quo: b.
    hasTruncatedBits := a > (mantissa * b).
    hm := mantissa highBit.

    "Check for gradual underflow, in which case the mantissa will loose bits.

    Keep at least 1 bit to let the underflow preserve the sign of zero."
    lostBit := Float emin - (exponent + hm - 1).
    lostBit > 0 ifTrue: [n := n - lostBit max: 1].

    "Remove excess bits in the mantissa."
    hm > n
        ifTrue:
            [exponent := exponent + hm - n.
            hasTruncatedBits := hasTruncatedBits or: [mantissa anyBitOfMagnitudeFrom: 1 to: hm - n].
            mantissa := mantissa bitShift: n - hm].

    "Check if mantissa must be rounded upward.
    The case of tie (mantissa odd & hasTruncatedBits not)
    will be handled by Integer>>asFloat."
    (hasTruncatedBits and: [mantissa odd])
        ifTrue: [mantissa := mantissa + 1].

    ^ (self positive
            ifTrue: [mantissa asFloat]
            ifFalse: [mantissa asFloat negated])
        timesTwoPower: exponent

That's it. No revolution, but I hope it's a bit clearer, and for the same price, a tiny bit more efficient.

I could also explain in this blog why it is better to use floating point division if both numerator and denominator can be converted asFloat exactly... Or why the mantissa loose bits in case of underflow... But, oh, I'm so lazy, ain't that boring enough?

Thursday, September 22, 2011

Clarifying and optimizing Integer>>asFloat

I wanted to explain some notions about floating point and thought using Integer>>asFloat in Squeak as example. Unfortunately, I found the code was both not that clear and not that efficient and some of the comments were inaccurate (I can tell so without any diplomacy since it's my own production).

So I decided to change it once again.

First, let's see the comment. Three lines should be enough. Note that I didn't mention nearest Float, because in case of overflow we will answer Float infinity, while the nearest Float would be Float fmax.

Integer>>asFloat
   "Answer a Float that best approximates the value of the receiver.
   This algorithm is optimized to process only the significant digits of a LargeInteger.
   And it does honour IEEE 754 round to nearest even mode in case of excess precision (see details below)."

Then, it might be useful to detail what round to nearest even means exactly. It's quite long and I'm not sure this belongs to the method or if it should just be replaced by a link... But it does also help understanding the code below.

   "How numbers are rounded in IEEE 754 default rounding mode:
   A shift is applied so that the highest 53 bits are placed before the floating point to form a mantissa.
   The trailing bits form the fraction part placed after the floating point.
   This fractional number must be rounded to the nearest integer.
   If fraction part is 2r0.1, exactly between two consecutive integers, there is a tie.
   The nearest even integer is chosen in this case.
   Examples (First 52bits of mantissa are omitted for brevity):
   2r0.00001 is rounded downward to 2r0
   2r1.00001 is rounded downward to 2r1
   2r0.1 is a tie and rounded to 2r0 (nearest even)
   2r1.1 is a tie and rounded to 2r10 (nearest even)
   2r0.10001 is rounded upward to 2r1
   2r1.10001 is rounded upward to 2r10
   Thus, if the next bit after floating point is 0, the mantissa is left unchanged.
   If next bit after floating point is 1, an odd mantissa is always rounded upper.
   An even mantissa is rounded upper only if the fraction part is not a tie."

Then we can come with a cleaner implementation by just naming the variables more explicitly to fit the comments:

| mantissa shift sum excess mask trailingBits nextBitIsSet tie |

mantissa := self abs.

"Check how many bits excess the maximum precision of a Float mantissa."

excess := mantissa highBit - Float precision.

excess > 0

ifTrue:

["Remove the excess bits"

mask := (1 bitShift: excess) - 1.

trailingBits := mantissa bitAnd: mask.

"trailingBits isZero ifFalse: [Inexact signal]."

mantissa := mantissa bitShift: excess negated.

shift := excess.

"But care to honour IEEE 754 round to nearest even mode."

nextBitIsSet := trailingBits highBit = excess.

(nextBitIsSet and: [mantissa odd or:

[tie := trailingBits isPowerOfTwo.

tie not]])

ifTrue: [mantissa := mantissa + 1]]

ifFalse: [shift := 0].

"Now that mantissa has no more excess precision, the following floating point operations will be exact."

sum := 0.0.

1 to: mantissa digitLength do:

[:byteIndex |

sum := sum + ((mantissa digitAt: byteIndex) asFloat timesTwoPower: shift)

shift := shift + 8].

^ self positive

ifTrue: [sum]

ifFalse: [sum negated]

Note that in this case, the rounding is performed entirely in Smalltalk and the algorithm will round to nearest even whatever the hardware rounding mode is, because all successive floating point operations will be exact.

The hardware inexact flag thus won't be set. We could replace this by a soft Inexact signal, but this is commented out because we don't care of and don't have such exception in Smalltalk until now. It's just a Smalltalkish comment.

But this code still performs many bit operations on LargeIntegers in case of excess precision. We could try alternatives that won't produce intermediate LargeIntegers.

excess > 0

ifTrue:

["Take care to honour IEEE 754 round to nearest even mode."

mustRoundUpper := (mantissa bitAt: excess) > 0 "The bit after floating point is set"

and: [(mantissa bitAt: 1 + excess) > 0 "The mantissa is odd"

or: [excess > 1 and: [self anyBitOfMagnitudeFrom: 1 to: excess - 1 "This is not a tie"]]].

"Remove the excess bits"

mantissa := mantissa bitShift: excess negated.

shift := excess.

mustRoundUpper ifTrue: [mantissa := mantissa + 1]]

But see how we must assist the code with many comments... Not that good. And not that faster...

We can still write more efficient code: we just have to let the hardware perform the rounding by itself. In effect, if we just add one excess bit, we are sure that there will be at most one inexact operation, and thus that we won't cumulate round off errors. But in this case, it is necessary to explain in preamble:

"Algorihm details:

Floating point hardware will correctly handle the rounding by itself with a single inexact operation if mantissa has one excess bit of precision.

Except in the last case when extra bits are present after an even mantissa, we must round upper by ourselves.

Note 1: the inexact flag in floating point hardware must not be trusted because it won't take into account the bits we truncated by ourselves.

Note 2: the floating point hardware is presumed configured in default rounding mode."

    | mantissa shift sum excess |
   mantissa := self abs.

   "Check how many bits excess the maximum precision of a Float mantissa."
   excess := mantissa highBit - Float precision.
   excess > 1
       ifTrue:
           ["Remove the excess bits but one"
           mantissa := mantissa bitShift: 1 - excess.
           shift := excess - 1.
           "Handle the case of extra bits truncated after an even mantissa."
           ((mantissa bitAnd: 2r11) = 2r01 and: [self anyBitOfMagnitudeFrom: 1 to: shift])
               ifTrue: [mantissa := mantissa + 1]]
       ifFalse: [shift := 0].

   "Now that mantissa has at most 1 excess bit of precision, let floating point operations perform the final rounding."
   sum := 0.0.
   1 to: mantissa digitLength do:
       [:byteIndex |
       sum := sum + ((mantissa digitAt: byteIndex) asFloat timesTwoPower: shift).
       shift := shift + 8].
   ^ self positive
       ifTrue: [sum]
       ifFalse: [sum negated]

This is shorter and faster, and a bit more fragile because now depending on hardware settings... But we never change these settings in Smalltalk, so we don't mind. Eventually, we could also come with a version that will always honour hardware rounding mode in the future - only if we can control it from within Smalltalk. The most annoying thing is that we trade some didactic properties now hidden into hardware black box. Now I must decide wether to commit immediately or wait for further comments...