Sunday, April 5, 2015

Removing UB in bytecodePrimMultiply

In last post, I said that LLVM did not seem to be a problem w.r.t. testing overflow in post-condition in bytecodePrimMultiply.

But I was fooled by the Xcode assembly output tool, and when I compile a VM with Xcode 42, LLVM 3.0, I get failures like this one:

self assert: (Duration days: 1 hours: 2 minutes: 3 seconds: 4 nanoSeconds: 5 ) asMilliSeconds positive.

It should be 93784000, but I get -906.

The conversion asMilliSeconds is going through a 47 bits intermediate integer holding the number of nanoseconds in the delay:
     ^ ((seconds * NanosInSecond) + nanos) // (10 raisedToInteger: 6)

With an undetected overflow it would be equivalent to something like:
    ^((seconds * NanosInSecond) + nanos bitAnd: 16rFFFFFFFF) - (1<<32) // (10 raisedToInteger: 6)

...exactly -906 BINGO! Typically the overflow we were supposed to detect in post-condition!

OK, that's not a new thing, the bug was already known when I published this  (Stefan Marr reported the bug somewhere in the thread). I provided a fix at the google-code issue tracker, but this was closed by Pharo team, so now the fix is roting at Pharo issue tracker du jour (unfortunately with restricted access):

Anyway, with 64bits Spur, the fix must be revised, so let's go back to Smalltalk code. We can easily implement the overflow test in precondition as prescribed in the link of last post:
    | rcvr arg result overflow oop |
    <var: #result type: 'sqLong'>
    rcvr := self internalStackValue: 1.
    arg := self internalStackValue: 0.
    (objectMemory areIntegers: rcvr and: arg)
        ifTrue: [rcvr := objectMemory integerValueOf: rcvr.
                arg := objectMemory integerValueOf: arg.
                result := rcvr.
                (self sizeof: rcvr) + (self sizeof: arg) <= 8
                        [overflow := false.]
                        [overflow := rcvr > 0
                            ifTrue:  [arg > 0
                                ifTrue: [rcvr > (16r7FFFFFFFFFFFFFFF / arg)]
                                ifFalse: [arg < (-16r8000000000000000 / rcvr)]]
                            ifFalse: [arg > 0
                                ifTrue: [rcvr < (-16r8000000000000000 / arg)]
                                ifFalse: [(rcvr < 0) and: [arg < (16r7FFFFFFFFFFFFFFF / rcvr)]]]].
                        [result := result * arg.
                        oop := self signed64BitIntegerFor: result.
                          self internalPop: 2 thenPush: oop.
                        ^self fetchNextBytecode "success"]]
        ifFalse: [self initPrimCall.
                self externalizeIPandSP.
                self primitiveFloatMultiply: rcvr byArg: arg.
                self internalizeIPandSP.
                self successful ifTrue: [^ self fetchNextBytecode "success"]].

    messageSelector := self specialSelector: 8.
    argumentCount := 1.
    self normalSend

If we are on a 32 bits machine, then we never generate overflow because the product is evaluated on 64 bits, so we have a very fast solution using Large Integers.

If SmallInteger can be longer than 32bits (which is the case of Spur64), then the overflow might happen - unless sqLong is 128 bits but we don't even bother to test it, since we don't have a portable signed128BitIntegerFor: available anyway. 

All is well, except that... bad luck, the code generated is incorrect on 64bits for other reasons: the literals are generated as unsigned long long!!! Comparing signed and unsigned of same length will promote to unsigned, and the overflow tests will miserably fail. Writing correct C code is hard enough, but generating correct C code in all cases is a nightmare, especially when whe hope to do it with clever hacks...

After some iterations, I came to this implementation in CCodeGenerator:
cLiteralFor: anObject
    "Return a string representing the C literal value for the given object."
    anObject isNumber
            [anObject isInteger ifTrue:
                [| printString |
                 printString := (anObject > 0
                                and: [(anObject >> anObject lowBit + 1) isPowerOfTwo
                                and: [(anObject highBit = anObject lowBit and: [anObject > 65536])
                                      or: [anObject highBit - anObject lowBit >= 4]]])
                                    ifTrue: ['0x', (anObject printStringBase: 16)]
                                    ifFalse: [anObject printString].
                ^anObject > 16r7FFFFFFFFFFFFFFF
                        ifTrue: [printString, self unsignedLong64Suffix]
                        ifFalse: [anObject > 16rFFFFFFFF
                            ifTrue: [printString,self signedLong64Suffix]
                                ifFalse: [anObject > 16r7FFFFFFF
                                ifTrue: [printString,self unsignedLong32Suffix]
                                ifFalse: [anObject < -16r80000000
                                    ifTrue: [anObject = -16r8000000000000000
                                            ifTrue: ['(-0x7FFFFFFFFFFFFFFF',self signedLong64Suffix,'-1)']
                                            ifFalse: [printString,self signedLong64Suffix]]
                                    ifFalse: [anObject = -16r80000000
                                            ifTrue: ['(-0x7FFFFFFF-1)']
                                            ifFalse: [printString]]]]]].
            anObject isFloat ifTrue:
                [^anObject printString]]
            [anObject isSymbol ifTrue:
                [^self cFunctionNameFor: anObject].
            anObject isString ifTrue:
                [^'"', (anObject copyReplaceAll: (String with: Character cr) with: '\n') , '"'].
            anObject == nil ifTrue: [^ 'null' ].
            anObject == true ifTrue: [^ '1' ].
            anObject == false ifTrue: [^ '0' ].
            anObject isCharacter ifTrue:
                [^anObject == $'
                    ifTrue: ['''\'''''] "i.e. '\''"
                    ifFalse: [anObject asString printString]]].
    self error: 'Warning: A Smalltalk literal could not be translated into a C constant: ', anObject printString.

Note that the generation is all but uniform, it goes through signed, unsigned, signed, unsigned when the magnitude grows... These hacks are those reducing the number of compiler warnings and currently lead to mostly correct code (I can't remember if it was necessary to force in slang code an asUnsignedInteger here or there, or the contrary). Note that I also implemented unsignedLong32Suffix, signedLong64Suffix and unsignedLong64Suffix in the CCodeGenerator, like this:
    "Answer the suffix that should be appended to a 64bits literal in generated code."

    ^(self sizeOfIntegralCType: #'unsigned long') = 4
        ifTrue: ['ULL']
        ifFalse: ['UL']

The method cLiteralFor: has a sister cLiteralFor:name: which should be modified too.

So, for compiling cog and spur 32bits with LLVM 3.0 and without exotic (heretic) compiler options, one should at least fix bytecodePrimMultiply, and for 64bits spur, at least cLiteralFor: and co.

I can tell that my Spur32 and cog32 VM are working fine with these changes. Not yet the Spur64, I still can't compile it for other reasons (MacOSX SDK problems...).

As usual, all the material is published on my VMMaker branch at NiceVMExperiments VMMaker.oscog-nice.1157, but mixed with other changes.

1 comment:

  1. > Writing correct C code is hard enough, but generating correct C code in all cases is a nightmare, especially when we hope to do it with clever hacks...

    And the same goes for using C via FFI. Realizing this is a problem just begs the question... so we do that why, exactly? Or, at the very least, how about doing it without clever hacks? One might even wonder whether a well written C snippet would be smaller than the to-be-translated Smalltalk.

    These types of situations hint something could be done better. Since the alternative is potentially -906, there is little choice but to carefully reconsider the situation.

    Keep going :).