Thursday, February 7, 2013

Compiling the Squeak COG virtual machine on linux with gcc version > 4.2.1

The obsolescence problem

It's currently not possible to compile an optimized COG VM with a modern C compiler. The C standards have evolved, and modern compilers reflect these evolutions.

On the Squeak side, the code has recently evolved too, but mostly on the feature side (clever use of the native stack and JIT). Adoption of new standards (if we can call C99 new) was left well behind. Consequently, the Squeak VM now relies on some Undefined Behavior (the so-called UB) and cannot be compiled with gcc version > 4.2.1 or with clang without soon hitting a Segmentation Fault.

How to attack the problem?

Correcting the whole source base

The C compiler emits a bunch of warnings to help us detect and kill those potentially broken constructs. Compiling a VM produces several hundred of these... That's what we can call technical debt! Not all of them will lead to a VM crash, but some will. How do we find them? Ideally, we should have no warnings at all, and that should be a mid-term goal...

Unfortunately, we have a few handicaps:
  • The code base is large.
  • There are not so many people working on the VM.
  • There are too many warnings to be fixed by too few people.
  • Understanding the C standards, what used to work but no longer does, and how to work around it is often very technical.
  • Moreover, the VM code is translated into C from Slang (a limited set of Smalltalk messages). So there is no point in correcting the generated code; we must correct the generator.
Unfortunately, the generator is a clever hack working on the Smalltalk Abstract Syntax Tree, but unaware of any property of the underlying C types which are just programmer's hints. The generator does not model the underlying C language complexity. It ain't gonna be easy to make it aware of UB, and we'll likely have to fix Slang with convoluted workarounds (like using those nice (unsigned) casts)...
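
To make the (unsigned) cast workaround concrete, here is a minimal sketch in C. The helper names tag_small_int and untag_small_int are hypothetical and are not taken from the VM sources; the 32-bit, one-tag-bit layout is only an assumption for illustration. The point is that shifting a negative signed value left is undefined in C99, while the same shift on the unsigned representation is well defined.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers for illustration only, not the VM's actual code.
   They tag a 31-bit integer by shifting it left and setting the low bit. */

/* Risky form: left-shifting a negative signed value is undefined in C99.
       int32_t tagged = (v << 1) | 1;
*/

/* Workaround: shift the unsigned representation, which is defined
   modulo 2^32, then convert back. The conversion back to int32_t is
   implementation-defined rather than undefined; gcc wraps it. */
static int32_t tag_small_int(int32_t v)
{
    /* The value must fit in 31 bits for the tag to be reversible. */
    assert(v >= -(INT32_C(1) << 30) && v < (INT32_C(1) << 30));
    return (int32_t)(((uint32_t)v << 1) | 1u);
}

/* Right-shifting a negative value is implementation-defined;
   gcc performs an arithmetic shift, which is what is assumed here. */
static int32_t untag_small_int(int32_t tagged)
{
    return tagged >> 1;
}
```

A generator that is unaware of C types would have to emit such casts everywhere, which is why the fix belongs in Slang rather than in the generated files.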

Focusing corrections on a few hot spots

I tried to analyze some of these warnings and to remove some UB, for example testing signed arithmetic overflow in a post-condition. (Remember, you can find the Monticello package along with some other experiments at SmalltalkHub.) But such guesses were essentially random, so it's not surprising that I have so far failed to produce a VM that doesn't crash with gcc version > 4.2.1.
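
For readers who have not met this construct, here is a small sketch in C of the difference between testing overflow after the addition and testing it before. The function name checked_add is an assumption made for this example; it is not the code found in the VM or in the generator.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Post-condition test: the addition itself may overflow, which is UB, so
   an optimizing compiler may assume the test can never succeed.
       int sum = a + b;
       bool overflowed = (b > 0 && sum < a) || (b < 0 && sum > a);
*/

/* Pre-condition test: decide before adding, using only operations that
   cannot overflow. Returns true and stores the sum when it is
   representable, returns false otherwise. (Hypothetical helper.) */
static bool checked_add(int a, int b, int *sum)
{
    assert(sum != NULL);
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return false;
    *sum = a + b;
    return true;
}
```

The pre-condition form costs a couple of extra comparisons, but its result does not depend on what the optimizer is allowed to assume.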

How to identify the most nasty construct(s)?

Debugging the VM

We have a way to simulate our VM code inside Smalltalk, but that is of course of no help: what we debug there is Slang code, and Slang is correct, since Smalltalk behavior is generally simple and well defined.
Debugging generated code, especially jitted code, is another thing. It's not that the jitted code is faulty. It just gets in our way and fools the debugger enough to make our life hard. Moreover, the Segmentation Fault can strike well after the erroneous section. I rapidly abandoned this approach...

Comparing generated code

I had this brilliant (™) idea to compare generated code between two versions of gcc. It's possible to access the assembly via options like gcc -S, or gcc -Wa,-ahdln -g. Oh, but this can't work with optimizations, because the C compiler will reorder functions and blocks, perform inlining, etc... Between two versions of gcc, there are enough differences to make this comparison totally impractical: something like 150,000 lines out of 200,000 differ. In the end, this idea was kind of stupid (™ too)!

Selectively enabling compiler optimizations

Our main source of problems is that the C compiler now has much more aggressive optimizations, because it has a license to presume that we don't rely on UB (no one should, obviously).
Fortunately, without optimization the VM works correctly; it even works with gcc version 4.6.3 at level one optimization, -O1. So I wanted to know if we were killed by some strict-aliasing or strict-overflow assumptions, or by one of the signed arithmetic grey zones... I started to selectively disable some options starting from -O2, or to enable some others starting from -O1, according to the gcc documentation. I made many attempts, as shown in the table below... As an excuse, I also got a nasty gcc error report, and I can't resist offering you the French localization:

$ /bin/rm `find . -name '*.[ao]'`
$ ../../platforms/unix/config/configure --without-npsqueak CFLAGS="-g -O1 -fno-strict-aliasing -fno-strict-overflow -fthread-jumps -falign-functions  -falign-jumps -falign-loops  -falign-labels  -fcaller-saves -fcrossjumping -fcse-follow-jumps  -fcse-skip-blocks -fdevirtualize -fexpensive-optimizations -fgcse  -fgcse-lm -finline-small-functions -findirect-inlining -fipa-sra -foptimize-sibling-calls -fpartial-inlining -fpeephole2 -fregmove -freorder-blocks  -freorder-functions -frerun-cse-after-loop  -fsched-interblock  -fsched-spec -fschedule-insns  -fschedule-insns2 -ftree-switch-conversion -ftree-pre -ftree-vrp -Wall -Wextra -msse2 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -DNDEBUG -DITIMER_HEARTBEAT=1 -DNO_VM_PROFILE=1 -DCOGMTVM=0 -DDEBUGVM=0" LIBS=-lpthread
$ make


/Squeak/vm_cog/src/vm/gcc3x-cointerp.c:10010:1: erreur: unable to find a register to spill in class ‘GENERAL_REGS’
/Squeak/vm_cog/src/vm/gcc3x-cointerp.c:10010:1: erreur: ceci est le insn:
(insn 3247 3251 3248 223 (set (mem/c/i:SI (symbol_ref:SI ("instructionPointer") [flags 0x2] <var_decl 0xb6564540 instructionPointer>) [0 instructionPointer+0 S4 A32])
        (reg/f:SI 2426 [ localIP.4333 ])) /Squeak/vm_cog/src/vm/gcc3x-cointerp.c:4869 50 {*movsi_internal}
     (expr_list:REG_DEAD (reg/f:SI 2426 [ localIP.4333 ])

/Squeak/vm_cog/src/vm/gcc3x-cointerp.c:10010: embrouillé par les erreurs précédentes, abandon

I'm proud of this imbroglio, though it was not my primary goal...

And the winner is -fcaller-saves, which the gcc documentation describes as follows:
    Enable values to be allocated in registers that will be clobbered by function calls, by emitting extra instructions to save and restore the registers around such calls. Such allocation is done only when it seems to result in better code than would otherwise be produced.

    This option is always enabled by default on certain machines, usually those which have no call-preserved registers to use instead.

    Enabled at levels -O2, -O3, -Os.

So we can compile the VM with the following options on Linux/gcc 4.6.3:
$ cd unibuild/bld
$ /bin/rm `find . -name '*.[ao]'`
$ ../../platforms/unix/config/configure --without-npsqueak CFLAGS="-g -O2 -fno-caller-saves -Wall -Wextra -msse2 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -DNDEBUG -DITIMER_HEARTBEAT=1 -DNO_VM_PROFILE=1 -DCOGMTVM=0 -DDEBUGVM=0" LIBS=-lpthread
$ make

This was for my own VMMaker.oscog-nice.276, and it might work only on some versions of gcc. If you use a regular .oscog branch, this is as yet untested, and you might want to add -fno-strict-overflow.

Also note that the FloatMathPlugin requires the -fno-strict-aliasing option. OpenJDK also uses fdlibm for bit-identical float functions and sets this flag, apparently with some success.
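
To illustrate why code in the style of fdlibm needs -fno-strict-aliasing, here is a hedged sketch in C. The function high_word is a name made up for this example, and it assumes an IEEE 754 double whose byte order matches that of 64-bit integers (as on x86). The commented-out line shows the kind of pointer cast that breaks the aliasing rules; the function shows an alias-safe alternative based on memcpy.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* fdlibm-style access: reads a double through an integer pointer, which
   violates strict aliasing unless -fno-strict-aliasing is given.
       uint32_t hi = ((uint32_t *)&x)[1];   (little-endian layout assumed)
*/

/* Alias-safe variant (hypothetical helper): copy the bytes into an
   integer of the same size, then extract the high 32 bits. */
static uint32_t high_word(double x)
{
    uint64_t bits;
    assert(sizeof bits == sizeof x);
    memcpy(&bits, &x, sizeof bits);
    return (uint32_t)(bits >> 32);
}
```

Rewriting a whole math library this way is a lot of work, so passing the flag for that plugin is the pragmatic choice.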

Well, this post does not contain much Smalltalk code, that's quite boring... 
But we now have a way to compile the VM with decent optimization on a modern gcc.
It does not address clang, but that's already something.

That's not all: we could also compare the generated assembly with and without -fno-caller-saves and locate one source of UB problems.
That'll be for another day, or maybe another hacker...