Friday, March 5, 2021

Autopsy of an accuracy problem of log2 in Squeak

Some times ago, I've been annoyed by self log: 2 which was performing (self ln / 2 ln).
The problem is that it is inexact for some trivial cases. Whatever the libm used by the VM, we'll always find an integer n, such that (1.0 timesTwoPower: n) ln / 2 ln ~= n.

Rather than hacking the VM and adding a primitive, I have implemented the log2 function like this in Squeak which is way more simple:

    ^self significand ln / Ln2 + self exponent

In Squeak, every Float (but nan as usual) verifies:

    self = (self significand timesTwoPower: self exponent).

Also for every finite Float (but 0.0 and -0.0) - even the denormals (which means that the exponent is more evolved than just extracting the exponent bits):

    1 <= self significand and: [self significand < 2].

Beware when porting, this may differ in other dialects, for example in Visualworks:

    self class emin - 1 <= self exponent and: [self class exponent <= (self class emax + 1)].

Where emin - 1 is for denormals, and emax + 1 is for special values (infinities and NaNs).

Back to our formulation, the nice thing is that self significand = 1.0 if and only if the Float is a power of two. It's so simple that it's the new definition of Float>>isPowerOfTwo in Squeak from now on.

And another nice thing is that 1.0 ln = 0.0 whatever libm implementation. the result is that we have an exact log2 for every exact power of two. That's  goodness, goal achieved!

From the accuracy side, we have an inexact function with ln, an inexact representation of Ln2 (2 ln), and inexact division, and the last summation which can be inexact too. Even with the best libm library (a correctly rounded one), that's up to 2 ulp of error. That's not too bad for such a simple 1 liner, and just half ulp more than previous implementation!

Err! except in case of catastrophic cancellation!
That's exactly what happens in the neighbourhood of 1.0.
Let's take one example:

    x := 1.0 predecessor.
    x significand = 2.0 predecessor.
    x exponent = -1.
    2.0 predecessor ln/2 ln = 1.0 predecessor.
    1.0 predecessor - 1  = 0.5 ulp negated.
   
x significand ln / 2 ln + x exponent  = 0.5 ulp negated.
    -> -1.1102230246251565e-16

Compared to old implementation:

    1.0 predecessor ln / 2 ln.
    -> -1.6017132519074588e-16

Which we can verify with doubled precision ArbitraryPrecisionFloat:

    (1.0 predecessor asArbitraryPrecisionFloatNumBits: Float precision * 2) log2 asFloat
    -> -1.6017132519074588e-16

That's about 50% of error for the new implementation, more than 10^14 ulp, abysmally inacceptable for a math library!

Where is the problem?
The problem of catastrophic cancellation is when evaluating the logarithm, then subtracting 1.
In other words, ln(1- x) is easy to evaluate for small x, but ln(2 - 2x) - ln(2) is not.
Bummer, that's precisely the operation that significand is bringing.

We must scale down (divide by 2) if we're getting a significand near 2, but must not spoil the accuracy of 1.0 successor log2... That's easy:

    | s |
    s := self significand.
    ^s > 1.3333333333333333
        ifTrue: [(0.5 * s) ln / Ln2 + (1 + self exponent)]
        ifFalse: [s ln / Ln2 + self exponent]

With this formulation, s will remain in the interval [2/3 , 4/3]. It remains close to 1 for receiver close to 1. Our pathological case is now cured:

    1.0 predecessor log2.
    -> -1.6017132519074588e-16

Float can be tricky, they are not always 1 liner friendly...
The alternative is to profit by already existing wisdom and implementations of log2 from the external world, there must be some good one, somewhere...
That means, adding more and more primitives or FFI calls. But dealing with all the library path and other complexity brought by the external world is neither fun, nor a 1 liner! I've been there before. Until someone find another flaw,
I prefer my 5 liners, that's managed entropy.


Wednesday, March 3, 2021

A hidden feature of Interval

This short post to advertise a hidden feature of Interval which I documented in Cincom store SYSBUG-FloatIntervalTest some time ago and rediscovered recently.

This feature is shared by vast majority of Smalltalk dialects, Dolphin, Pharo, Squeak, Visualworks, ...

Certainly more an accidental feature than a deliberate one:

     ((10 to: 12) at: 1.5) = 10.5.

Funny, isn't it?

Friday, February 19, 2021

Cross compiling for Windows target with cmake

This is not directly related to Smalltalk, but it is a common source of trouble: how to compile a third-party package for windows platform especially when it comes with cmake files?

My latest example: recompile an upgraded version of lapack from netlib. Lapack sources are now conveniently located on github.

I do not want to use mingw directly (mingw stands for minimal gnu for windows), because I did spend too much time for installing the various tools on mingw in the past.

Instead, I either use cygwin cross-compilation toolchain for mingw target, or linux equivalent thru wsl (windows subsystem for linux). The main advantage of cygwin or linux distributions is the large amount of packages ready to use and easy to install (FORTRAN compilers and other niceties upgraded from the past).

Usually, we find many half-working solutions on the web, like setting a bunch of environment variable. For example I first tried this little shell after cloning the git source in /home/nice/git/lapack.

#!/bin/bash
#
# my own compilation of LAPACK dll for mingw target via cygwin cross-compilation
#
[ -d build_win64 ] || mkdir build_win64
cd build_win64
CC=x86_64-w64-mingw32-gcc \
FC=x86_64-w64-mingw32-gfortran \
AR=x86_64-w64-mingw32-ar \
NM=x86_64-w64-mingw32-nm \
DLLTOOL=x86_64-w64-mingw32-dlltool \
DLLWRAP=x86_64-w64-mingw32-dllwrap \
OBJCOPY=x86_64-w64-mingw32-objcopy \
STRP=x86_64-w64-mingw32-strip \
cmake .. -DBUILD_SHARED_LIBS=ON -DBUILD_INDEX64=ON
make

The worse thing is that it somehow worked...

Except that the generated dll had a ugly and incorrect cyg prefix. And except that when I tried to add the C interface with additionnal options -DCBLAS=ON -DLAPACKE=ON that miserably failed with this message:

CMake Error at /usr/share/cmake-3.17.3/Modules/FortranCInterface.cmake:383 (message):
 The Fortran compiler:

   /usr/bin/x86_64-w64-mingw32-gfortran.exe

 and the C compiler:

   /usr/bin/cc

 failed to compile a simple test project using both languages.  The output
 was:

   Change Dir: /home/nice/git/lapack/build_win64/CMakeFiles/FortranCInterface/VerifyC

Ouch this sticky /usr/bin/gcc is annoying, that's not what I specified! I tried to have more sticky environment variable thru export CC=... but that failed - possibly because I did not want to reconstruct the whole thing from scratch (it takes hours).

I knew that I should have instructed cmake about the toolchain instead. But how?

The wrong answer: spend some times in the terrific, horribilific, unhelpful cmake documentation. I'm not exagerating. Tell me how much are your skill increased on a scale of 10 after reading this toolchain reference. For me, the answer is very close to zero ! I've never seen such a poor documentation in my already long life!

The good answer: it's on the web of course, somewhere... In order to increase the probability of hitting one, I decided to duplicate the already existing answers and share my findings.

Here is the little toolchain file /home/nice/toolchain_x86_64-w64-mingw32.cmake that I assembled after an hour or more of browsing:

# For cross compiling for a windows target
# using cygwin or linux gcc toolchain
#
# The target System
set(CMAKE_SYSTEM_NAME Windows)

# Factor out the prefix which is common to our set of tools
set(TOOLCHAIN_PREFIX x86_64-w64-mingw32)

# The various compilers to use
set(CMAKE_C_COMPILER ${TOOLCHAIN_PREFIX}-gcc)
set(CMAKE_Fortran_COMPILER ${TOOLCHAIN_PREFIX}-gfortran)
set(CMAKE_RC_COMPILER ${TOOLCHAIN_PREFIX}-windres)

# and where to find includes etc...
SET(CMAKE_FIND_ROOT_PATH /usr/${TOOLCHAIN_PREFIX})

Then I just have to change my script like this:

#!/bin/bash
#
# my own compilation of LAPACK dll for mingw target via cygwin cross-compilation
#
[ -d build_win64 ] || mkdir build_win64
cd build_win64
cmake .. -DBUILD_SHARED_LIBS=ON -DBUILD_INDEX64=ON -DCBLAS=ON -DLAPACKE=ON \
  -DCMAKE_TOOLCHAIN_FILE=../toolchain_x86_64-w64-mingw32.cmake
make

Et voilĂ ... This snapshot is somehow a proof of concept:

 

It's simple and magic! Don't ask why ortran is lowercase, nor any other question about this masterpiece.

Magic is exactly the main grief I have toward cmake. As I always tell, cmake is great, as long as someone writes guesses the files for you...

There are other griefs, like not recompiling what is strictly necessary, but that's the IT trend, waste cycles of our sophisticated machines for repeating the same compilation again and again... Bad for the planet!

If this post can help someone, that will be already something. At least, I'm pretty sure that it will help myself in the near future, because with the age, I developped a new capability: forget the un-important things (we can delegate that to google, no matter the cost).