Monday, April 1, 2013

32-bit word LargeIntegers backport in Interpreter VM

For three months, I have been using a modified COG VM with LargeIntegersPlugin v2.0. The plugin is stable and well behaved: no crash so far.

This plugin is a hack that handles a LargeInteger as natively ordered 32-bit digits on the VM side, while the class is still seen as a sequence of 8-bit digits on the image side.

Yesterday, I wanted to check how easy it would be to backport this version to an Interpreter VM. Normally, the answer should be "very easy", because most of the plugin code is shared between VMs. Well, most is shared, but some dust will inevitably jam the cogs.

The first thing is that COG still provides some class variables that have disappeared from the Interpreter, and of course the plugin uses some of these (VMBIGENDIAN, BytesPerWord, BaseHeaderSize, ...). So we have to modify it a bit (simple enough: the vmEndianness, bytesPerWord and baseHeaderSize messages are available)...

Then, Eliot Miranda has corrected a lot of C code generation quirks in the COG branch, and these are percolating back into the Interpreter branch very slowly. LargeIntegersPlugin v2.0 uses 64-bit integers to store the results of operations on 32-bit words, and then splits the results with bit operations, bitAnd: 16rFFFFFFFF and bitShift: -32 (>> 32), all along the code. But the old code generator casts every right-shifted operand to an unsigned int (usqInt) in order to avoid C's implementation-defined behavior when right-shifting negative signed ints (Tsss!). Since usqInt is 32 bits long in a 32-bit VM, this cast truncates 64-bit ints before the shift, so the result is wrong. LargeIntegersPlugin v2.0 requires a backport of this specific fix.
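To make the problem concrete, here is a minimal C sketch of the split described above, next to what happens when the 64-bit operand is narrowed to 32 bits before the shift. The function names are hypothetical and this is not the code emitted by either generator; `uint32_t` merely stands in for usqInt on a 32-bit VM.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Add two 32-bit digits plus an incoming carry.
   Stores the low 32 bits in *low and returns the outgoing carry. */
static uint32_t add_digits(uint32_t a, uint32_t b, uint32_t carry_in, uint32_t *low)
{
    assert(low != NULL);
    uint64_t sum = (uint64_t)a + b + carry_in;  /* cannot overflow 64 bits */
    *low = (uint32_t)(sum & 0xFFFFFFFFu);       /* bitAnd: 16rFFFFFFFF */
    return (uint32_t)(sum >> 32);               /* bitShift: -32 */
}

/* Illustration of the faulty pattern: the operand is narrowed to a
   32-bit unsigned type (an assumed stand-in for usqInt) before the shift,
   so the high word is already lost. It is widened again here only to keep
   the shift well defined in C. */
static uint32_t carry_after_narrowing(uint64_t sum)
{
    uint32_t narrowed = (uint32_t)sum;            /* high 32 bits discarded */
    return (uint32_t)((uint64_t)narrowed >> 32);  /* always 0 */
}
```

With 16rFFFFFFFF + 1, the first function yields a low word of 0 and a carry of 1, while the narrowed variant reports a carry of 0, which silently drops the high word of every intermediate result.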

But that's not all. The SqueakVMUNIXPATHS.xcodeproj project used to compile on Mac lacks a setting for operating on 64-bit ints:
Missing Xcode Project Setting


I think that's all I had to do to make it work, so here are the first results of LargeIntegersPlugin v2.0 (right column), compared to a 4.10.10 VM (left column) compiled on the same old Mac mini.
Micro benchmark on basic LargeInteger operations (# ops per seconds)

The micro-benchmark shows poor performance on +. As we can see, the cost of this operation, which should theoretically be proportional to bit length, is not. This means that most of the time is spent in primitive overhead. The v2.0 plugin has more overhead because it operates on 32-bit words, and the final word is generally too large (it has 8 or more leading zero bits). So a final normalization requires one more allocation and a copy into a smaller LargeInteger, which spoils efficiency by a factor of 2. Theoretically, we could avoid the copy and just hack the header of the LargeInteger to modify its length, but this part of the object model is unbelievably complex, so I have avoided hacking it so far.
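As an illustration of why the normalization step is needed, the following C sketch computes how many 8-bit digits are actually significant in a result held as 32-bit words. It is not the plugin's code: the function name is hypothetical, and it assumes the words are stored least significant first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of significant 8-bit digits in a magnitude stored as 32-bit
   words, least significant word first (an assumption of this sketch). */
static size_t byte_digit_length(const uint32_t *words, size_t word_count)
{
    assert(words != NULL || word_count == 0);

    /* Drop high-order words that are entirely zero. */
    while (word_count > 0 && words[word_count - 1] == 0)
        word_count--;
    if (word_count == 0)
        return 0;

    /* Drop leading zero bytes of the top word (top is non-zero here,
       so the loop terminates). */
    uint32_t top = words[word_count - 1];
    size_t bytes = word_count * 4;
    while ((top >> 24) == 0) {
        top <<= 8;
        bytes--;
    }
    return bytes;
}
```

Whenever the returned length is smaller than four times the word count, the image side expects a shorter LargeInteger than the one the primitive computed, hence the extra allocation and copy.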

Another dumb benchmark: running the Squeak 4.5 KernelTests-Number tests takes 6.2 seconds with the v2.0 plugin versus 7.8 seconds for VM 4.10.10 (2.0 seconds versus 3.6 seconds in COG).

Source code can be found in my SmalltalkHub repository http://smalltalkhub.com/#!/~nice/NiceVMExperiments, at VMMaker-nice.311 or the more up-to-date VMMaker-nice.315.

There are still a few items on the TODO list, all for the BigEndian VM cases:
  • implement decently fast 8-bit digit at: and at:put: primitives on both COG and the Interpreter;
  • check about image segments (they might require byte swapping too);
  • handle the primitives that copy bytes (at least abort them).
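
For the first item, one possible approach is to address 8-bit digits through shifts on the 32-bit words rather than through raw memory, so that the result does not depend on the byte order of the machine. The sketch below uses hypothetical names, zero-based indices (Smalltalk's at: is one-based) and assumes words are stored least significant first; it is not taken from the plugin.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read the 8-bit digit at a zero-based index from 32-bit words stored
   least significant first. Shifts operate on values, not on memory
   layout, so this works on both little- and big-endian machines. */
static uint8_t byte_digit_at(const uint32_t *words, size_t index)
{
    assert(words != NULL);
    return (uint8_t)(words[index / 4] >> (8 * (index % 4)));
}

/* Store an 8-bit digit at a zero-based index, leaving the other three
   bytes of the containing word untouched. */
static void byte_digit_at_put(uint32_t *words, size_t index, uint8_t value)
{
    assert(words != NULL);
    unsigned shift = 8u * (unsigned)(index % 4);
    uint32_t *word = &words[index / 4];
    *word = (*word & ~((uint32_t)0xFFu << shift)) | ((uint32_t)value << shift);
}
```

Neither function checks bounds; a real primitive would have to validate the index against the object size and fail otherwise.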


