Friday, March 27, 2015

Is bitShift: equivalent to division in Smalltalk? And in Slang?

Let's try it in Squeak/Pharo:

(1 to: 8) allSatisfy: [:shift |
    (-255 to: 255) allSatisfy: [:e | (e//(1<<shift) ) = (e >> shift)]]

-> true

Isn't it surprising? It is well known that arithmetic shift for negative numbers is NOT equivalent to integer division (quotient of Euclidean division). See Non-equivalence_of_arithmetic_right_shift_and_division

Let's see. Above article assumes that:
  • negative are represented in 2-complement. We can assume that in Smalltalk for every bit operation (in fact large integers are represented as sign-magnitude, and we emulate two-complement but that's a detail) ;
  • bitShift: is well defined for negative and is propagating the sign bit. Oh yes it is, integers are not bounded and we assume an infinite sequence of 1 left of negatives, so what else would be propagated ;
  • the quotient is rounded toward zero.
Ah the last point is explaining the difference with common knowledge: // is not rounding toward zero, it's rounding toward negative infinity.

If we use quo: instead, then the answer is false and matches wikipedia's answer.

What about Slang? Slang is a subset of Smalltalk which can be translated to C code and which is used for generating the Squeak/Pharo Virtual Machine.

We must have a look at initializeCTranslationDictionary method in CCodeGenerator from the VMMaker package. It maps // to C operation / and \\ to C operation %. Ouch, but / is rounding toward zero as quo: does, and % behaves like rem:.

So what is true in Smalltalk is NOT in Slang. Too bad that we did not use quo: and rem: in Slang, we did not even map them!

This kind of mismatch gave us a famous bug once upon a time. Since right shift was for historical reason translated as a logical shift ((unsigned) a >> b), we thought we could just use // for arithmetic shifts, and the worse thing is that when simulating the Slang code, all was working as expected... Alas not in the VM once compiled.

Today, for the purpose of eliminating false positive warning in dead branches, I was about to generalize constant folding like bindVariableUsesIn:andConstantFoldIf:in: of TSendNode... This method is evaluating the constant in Smalltalk and is generating the value in C. This will work as long as no one mix a negative constant expression with //, \\ or >>. Otherwise, the same expression might well lead to two different results in the generated code, depending on the inlining level! For example, self minSmallInteger + 1 // 2 might well lead to a tricky variant of this bug.

The right thing would be to write slang exclusively with quo: and rem: and to change the generation of >> since we can allways force the logical one by sending a asUnsigned >> b. We can catch all the senders at runtime with a halt, but it's tedious... And there is more than a branch of VMMaker.

If we really insist on using a//b, it translates as:
(a/b)-(int)(a & (((unsigned)a)>>(sizeof(int)*8-1))
 But adopting the C philosophy, we ain't gonna pay for what we don't buy ;)