Monday, August 24, 2015

Smallapack moved to github

After ArbitraryPrecisionFloat, I also moved the Smallapack project from https://code.google.com/p/smallapack to https://github.com/nicolas-cellier-aka-nice/smallapack.

Like ArbitraryPrecisionFloat, there is no much code on the repository, except a bit outdated Dolphin verison. The Visualworks and Squeak versions are handled with Store and Monticello in traditional dialect repositories...

Maybe it's time to update a bit the project and open a Pharo 5 branch based on Opal Compiler, but we'll see that...

Tuesday, August 11, 2015

Moving ArbitraryPrecisionFloat from google-code to github

I've tested the export to github button for https://code.google.com/p/arbitrary-precision-float then completed the wiki export with https://github.com/morgant/finishGoogleCodeGitHubWikiMigration.

It mostly worked except a few links that I manually restored, and I can say that I'm satisfied with the result.

Note that the code is still not hosted on github, but rather on http://www.squeaksource.com/ArbitraryPrecisionFl.html or cincom public store...

There is also a fork in SciSmalltalk but for now I prefer to keep a standalone project because it has some value per se, and also to maintain code for more dialects. This will imply some extra sync burden, but it should be manageable.


Sunday, August 9, 2015

Compiling Cog.Spur on MacOSX Yosemite

The MacOSX target of Squeak VM currently uses some old Mac Carbon API.

This might be a problem with a recent version of Xcode, none of which is distributed with such backward compatibility support.

However, there is at least one solution: using the shell script provided at https://github.com/devernay/xcodelegacy worked like a charm for me (on OSX 10.10.4, with Xcode 6.4).

I can even compile my on brand of squeak.cog.spur (VMMaker.oscog-nice.1428 from http://smalltalkhub.com/mc/nice/NiceVMExperiments/main) with Apple LLVM 6.1.0 (clang-602.0.53) and get no regression against Eliot's r3410 when running the trunk tests.

That's allways good to know.

Sunday, April 5, 2015

Removing UB in bytecodePrimMultiply


In last post, I said that LLVM did not seem to be a problem w.r.t. testing overflow in post-condition in bytecodePrimMultiply.

But I was fooled by the Xcode assembly output tool, and when I compile a VM with Xcode 42, LLVM 3.0, I get failures like this one:

self assert: (Duration days: 1 hours: 2 minutes: 3 seconds: 4 nanoSeconds: 5 ) asMilliSeconds positive.

It should be 93784000, but I get -906.

The conversion asMilliSeconds is going through a 47 bits intermediate integer holding the number of nanoseconds in the delay:
     ^ ((seconds * NanosInSecond) + nanos) // (10 raisedToInteger: 6)

With an undetected overflow it would be equivalent to something like:
    ^((seconds * NanosInSecond) + nanos bitAnd: 16rFFFFFFFF) - (1<<32) // (10 raisedToInteger: 6)

...exactly -906 BINGO! Typically the overflow we were supposed to detect in post-condition!

OK, that's not a new thing, the bug was already known when I published this http://lists.squeakfoundation.org/pipermail/vm-dev/2012-August/011161.html  (Stefan Marr reported the bug somewhere in the thread). I provided a fix at the google-code issue tracker, but this was closed by Pharo team, so now the fix is roting at Pharo issue tracker du jour (unfortunately with restricted access): https://pharo.fogbugz.com/f/cases/11364/Don-t-test-signed-SmallInteger-multiply-overflow-in-post-condition

Anyway, with 64bits Spur, the fix must be revised, so let's go back to Smalltalk code. We can easily implement the overflow test in precondition as prescribed in the link of last post:
bytecodePrimMultiply
    | rcvr arg result overflow oop |
    <var: #result type: 'sqLong'>
    rcvr := self internalStackValue: 1.
    arg := self internalStackValue: 0.
    (objectMemory areIntegers: rcvr and: arg)
        ifTrue: [rcvr := objectMemory integerValueOf: rcvr.
                arg := objectMemory integerValueOf: arg.
                result := rcvr.
                (self sizeof: rcvr) + (self sizeof: arg) <= 8
                    ifTrue:
                        [overflow := false.]
                    ifFalse:
                        [overflow := rcvr > 0
                            ifTrue:  [arg > 0
                                ifTrue: [rcvr > (16r7FFFFFFFFFFFFFFF / arg)]
                                ifFalse: [arg < (-16r8000000000000000 / rcvr)]]
                            ifFalse: [arg > 0
                                ifTrue: [rcvr < (-16r8000000000000000 / arg)]
                                ifFalse: [(rcvr < 0) and: [arg < (16r7FFFFFFFFFFFFFFF / rcvr)]]]].
                overflow
                    ifFalse:
                        [result := result * arg.
                        oop := self signed64BitIntegerFor: result.
                          self internalPop: 2 thenPush: oop.
                        ^self fetchNextBytecode "success"]]
        ifFalse: [self initPrimCall.
                self externalizeIPandSP.
                self primitiveFloatMultiply: rcvr byArg: arg.
                self internalizeIPandSP.
                self successful ifTrue: [^ self fetchNextBytecode "success"]].

    messageSelector := self specialSelector: 8.
    argumentCount := 1.
    self normalSend


If we are on a 32 bits machine, then we never generate overflow because the product is evaluated on 64 bits, so we have a very fast solution using Large Integers.

If SmallInteger can be longer than 32bits (which is the case of Spur64), then the overflow might happen - unless sqLong is 128 bits but we don't even bother to test it, since we don't have a portable signed128BitIntegerFor: available anyway. 

All is well, except that... bad luck, the code generated is incorrect on 64bits for other reasons: the literals are generated as unsigned long long!!! Comparing signed and unsigned of same length will promote to unsigned, and the overflow tests will miserably fail. Writing correct C code is hard enough, but generating correct C code in all cases is a nightmare, especially when whe hope to do it with clever hacks...

After some iterations, I came to this implementation in CCodeGenerator:
cLiteralFor: anObject
    "Return a string representing the C literal value for the given object."
    anObject isNumber
        ifTrue:
            [anObject isInteger ifTrue:
                [| printString |
                 printString := (anObject > 0
                                and: [(anObject >> anObject lowBit + 1) isPowerOfTwo
                                and: [(anObject highBit = anObject lowBit and: [anObject > 65536])
                                      or: [anObject highBit - anObject lowBit >= 4]]])
                                    ifTrue: ['0x', (anObject printStringBase: 16)]
                                    ifFalse: [anObject printString].
                ^anObject > 16r7FFFFFFFFFFFFFFF
                        ifTrue: [printString, self unsignedLong64Suffix]
                        ifFalse: [anObject > 16rFFFFFFFF
                            ifTrue: [printString,self signedLong64Suffix]
                                ifFalse: [anObject > 16r7FFFFFFF
                                ifTrue: [printString,self unsignedLong32Suffix]
                                ifFalse: [anObject < -16r80000000
                                    ifTrue: [anObject = -16r8000000000000000
                                            ifTrue: ['(-0x7FFFFFFFFFFFFFFF',self signedLong64Suffix,'-1)']
                                            ifFalse: [printString,self signedLong64Suffix]]
                                    ifFalse: [anObject = -16r80000000
                                            ifTrue: ['(-0x7FFFFFFF-1)']
                                            ifFalse: [printString]]]]]].
            anObject isFloat ifTrue:
                [^anObject printString]]
        ifFalse:
            [anObject isSymbol ifTrue:
                [^self cFunctionNameFor: anObject].
            anObject isString ifTrue:
                [^'"', (anObject copyReplaceAll: (String with: Character cr) with: '\n') , '"'].
            anObject == nil ifTrue: [^ 'null' ].
            anObject == true ifTrue: [^ '1' ].
            anObject == false ifTrue: [^ '0' ].
            anObject isCharacter ifTrue:
                [^anObject == $'
                    ifTrue: ['''\'''''] "i.e. '\''"
                    ifFalse: [anObject asString printString]]].
    self error: 'Warning: A Smalltalk literal could not be translated into a C constant: ', anObject printString.
    ^'"XXX UNTRANSLATABLE CONSTANT XXX"'


Note that the generation is all but uniform, it goes through signed, unsigned, signed, unsigned when the magnitude grows... These hacks are those reducing the number of compiler warnings and currently lead to mostly correct code (I can't remember if it was necessary to force in slang code an asUnsignedInteger here or there, or the contrary). Note that I also implemented unsignedLong32Suffix, signedLong64Suffix and unsignedLong64Suffix in the CCodeGenerator, like this:
unsignedLong64Suffix
    "Answer the suffix that should be appended to a 64bits literal in generated code."

    ^(self sizeOfIntegralCType: #'unsigned long') = 4
        ifTrue: ['ULL']
        ifFalse: ['UL']


The method cLiteralFor: has a sister cLiteralFor:name: which should be modified too.

So, for compiling cog and spur 32bits with LLVM 3.0 and without exotic (heretic) compiler options, one should at least fix bytecodePrimMultiply, and for 64bits spur, at least cLiteralFor: and co.

I can tell that my Spur32 and cog32 VM are working fine with these changes. Not yet the Spur64, I still can't compile it for other reasons (MacOSX SDK problems...).

As usual, all the material is published on my VMMaker branch at smalltalkhub.com NiceVMExperiments VMMaker.oscog-nice.1157, but mixed with other changes.

Wednesday, April 1, 2015

The more or less defined behavior we are relying upon in Squeak/Pharo VM


C is a tough language with many pitfalls, backed by a great philosophy:
      You don't pay for what you don't buy

That means that if an implementer has a choice between a cheap implementation and a right implementation, then the cheap one shall not be excluded by the standard.

The consequence is that the standard gives very few guaranty on a number of behaviour. And the computing model is a sort of minimal common denominator. Some operations are implementation defined, some are unspecified and some are explicitely undefined. That makes C not so portable at the end. And terribly difficult because there are so many negative things to learn about what we should not do, what will not work, what won't be portable, etc... The standard is worth reading for who wants to dive in the VM. I use this draft version http://www.open-std.org/JTC1/SC22/WG14/www/docs/n1256.pdf.

Squeak and Pharo VM are written in Smalltalk and translated in C. The VM essentially depends on integer and pointer arithmetic. But going from a high level model (no Integer bounds, so no Integer overflow, perfectly defined semantic for bit operations, etc...) to a very low level language may reveal adventurous and full of traps.


We'll see here a few implementation defined and undefined behaviors that we rely upon for Squeak/Pharo VM. The list is incomplete, first because it's only a human review, and also because this post is already too long. Nonetheless, it's good to know where we put our feats.

The implementation defined behaviour 

First we rely upon a 2-complement representation for negative integers. This is implementation defined (§6.2.6.2 rule 2), but such architecture is so common that the risk of portability issues is very very low.

Then we rely upon the right bit shift on signed int to be an arithmetic shift propagating the sign bit. Again, implementation defined (§6.5.7 rule 5) but in good accordance with previous assumption. This is very common and low risk.

Then we assume that casting an unsigned to signed when it is not representable will overflow on sign bit. This is implementation defined (§6.3.1.3 rule 3), but in good agreement with previous two assumptions. Another common feature with low risk.

We also rely on converting integer to pointer and pointer to integer, which is implementation defined (§6.3.2.2 rule 5 and §6.3.2.3 rule 6). Modern architectures have a uniform addressing of memory, and we care of using a type long enough to hold a pointer (either #sqInt or #usqInt sometimes in the plugins), so the risk is again pretty low, even if this kind of conversion drives the C compilers nut optimization-wise.

The undefined behaviour

We rely on two undefined behaviors (§6.5.7 rule 4) with left shift on signed int:
  • shift of positive int leading to unrepresentable value will overflow onto the sign bit
  • shift of negative int will behave same as logical shift on unsigned.
This is bad but evitable! Since we assume it behaves as for unsigned, then we should use shift on unsigned, and cast back to signed.

So instead of non portable:
    int a;
    a << 3;

We should generate something like:
    int a;
    (int)( (unsigned) a << 3);

As said above this is implementation defined, but far more portable.
I should propose a fix for the CCodeGenerator.

Then we rely on signed int overflow. For example in this StackInterpreter method:
bytecodePrimMultiply
    | rcvr arg result |
    rcvr := self internalStackValue: 1.
    arg := self internalStackValue: 0.
    (objectMemory areIntegers: rcvr and: arg)
        ifTrue: [rcvr := objectMemory integerValueOf: rcvr.
                arg := objectMemory integerValueOf: arg.
                result := rcvr * arg.
                (arg = 0
                 or: [(result // arg) = rcvr and: [objectMemory isIntegerValue: result]]) ifTrue:
                    [self internalPop: 2 thenPush: (objectMemory integerObjectOf: result).
                     ^self fetchNextBytecode "success"]]
        ifFalse: [self initPrimCall.
                self externalizeIPandSP.
                self primitiveFloatMultiply: rcvr byArg: arg.
                self internalizeIPandSP.
                self successful ifTrue: [^ self fetchNextBytecode "success"]].

    messageSelector := self specialSelector: 8.
    argumentCount := 1.
    self normalSend


The bad thing is to test for overflow in post-condition (bold code). The modern interpretation of the standard is that one: you shall not rely on UB. Since overflow is UB (§6.5 rule 5), you shall not rely on overflow, and the compiler has the right to assume that the expression (result // arg) = rcvr will allways be true. It may well eliminate it for the sake of aggressive optimizations. This method was reported to fail once with clang optimization at work. I really recommend reading http://blog.llvm.org/2011/05/what-every-c-programmer-should-know.html and the following posts for better understanding. I could not observe the problem in assembly output with Xcode4.2 and Apple LLVM 3.0, neither with -Os, -O2, nor -O3 flags, but we have a sword of Damocles hanging...

The workaround is well know. The overflow test must be written in pre-condition. All is explained at https://www.securecoding.cert.org/confluence/display/c/INT32-C.+Ensure+that+operations+on+signed+integers+do+not+result+in+overflow.

Phew, in assembler, that's just IntegerMULtiply and JumpifOverflow. You don't pay for what you don't buy? I'll add this:
      but when you'll have to buy, it'll cost you an arm.

I'm not only speaking of development time, but even at runtime, the holy runtime for which so many sacrifices were done. That sounds like a miserable failure.

What would have sane people done? Maybe they would have defined a virtual machine with uniform and well defined behaviour whatever the hardware. Ah, err, it's Smalltalk, it must be a joke.

Back to our VM, there are other UB like
  • comparing pointers that do not point to the same array or aggregate (§6.5.8 rule 5);  this is unfortunately hardly avoidable because generation scavenger depends on this;
  • pointer aliasing (§6.5 rule 7) - mostly expunged from the VM, but still present in FFI for example (primitiveFFIDoubleAt).

I'll stop here for today.

Friday, March 27, 2015

Is bitShift: equivalent to division in Smalltalk? And in Slang?

Let's try it in Squeak/Pharo:

(1 to: 8) allSatisfy: [:shift |
    (-255 to: 255) allSatisfy: [:e | (e//(1<<shift) ) = (e >> shift)]]

-> true

Isn't it surprising? It is well known that arithmetic shift for negative numbers is NOT equivalent to integer division (quotient of Euclidean division). See Non-equivalence_of_arithmetic_right_shift_and_division

Let's see. Above article assumes that:
  • negative are represented in 2-complement. We can assume that in Smalltalk for every bit operation (in fact large integers are represented as sign-magnitude, and we emulate two-complement but that's a detail) ;
  • bitShift: is well defined for negative and is propagating the sign bit. Oh yes it is, integers are not bounded and we assume an infinite sequence of 1 left of negatives, so what else would be propagated ;
  • the quotient is rounded toward zero.
Ah the last point is explaining the difference with common knowledge: // is not rounding toward zero, it's rounding toward negative infinity.

If we use quo: instead, then the answer is false and matches wikipedia's answer.

What about Slang? Slang is a subset of Smalltalk which can be translated to C code and which is used for generating the Squeak/Pharo Virtual Machine.

We must have a look at initializeCTranslationDictionary method in CCodeGenerator from the VMMaker package. It maps // to C operation / and \\ to C operation %. Ouch, but / is rounding toward zero as quo: does, and % behaves like rem:.

So what is true in Smalltalk is NOT in Slang. Too bad that we did not use quo: and rem: in Slang, we did not even map them!

This kind of mismatch gave us a famous bug once upon a time. Since right shift was for historical reason translated as a logical shift ((unsigned) a >> b), we thought we could just use // for arithmetic shifts, and the worse thing is that when simulating the Slang code, all was working as expected... Alas not in the VM once compiled.

Today, for the purpose of eliminating false positive warning in dead branches, I was about to generalize constant folding like bindVariableUsesIn:andConstantFoldIf:in: of TSendNode... This method is evaluating the constant in Smalltalk and is generating the value in C. This will work as long as no one mix a negative constant expression with //, \\ or >>. Otherwise, the same expression might well lead to two different results in the generated code, depending on the inlining level! For example, self minSmallInteger + 1 // 2 might well lead to a tricky variant of this bug.

The right thing would be to write slang exclusively with quo: and rem: and to change the generation of >> since we can allways force the logical one by sending a asUnsigned >> b. We can catch all the senders at runtime with a halt, but it's tedious... And there is more than a branch of VMMaker.

If we really insist on using a//b, it translates as:
(a/b)-(int)(a & (((unsigned)a)>>(sizeof(int)*8-1))
 But adopting the C philosophy, we ain't gonna pay for what we don't buy ;)


Friday, April 4, 2014

Little rant about some runaway pharo refactorings

I'm still trying to explore different ways to play with MC/git/filetree on the pharo-vm substrat.

But there's a serious hurdle, it's named Pharo2.0 image, and unfortunately the VM is produced by this brand.

IMO, it's the worst brand ever. I much much prefer Pharo3.0.

Pharo2.0 tools are constantly getting on my way, here is a small example of a disastrous change.

I have merged several branches directly with git in a batch, I know it won't managed MC meta data, but I want  to see how it goes.
In parallel, I have merged the MC version from package-cache at image side.
When I try to save my merged package on filetree with MC, there is a wait dialog opening but not progressing.
 

Hmm let's interrupt with CMD+SHIFT+. the process is blocked in resultSemaphore wait 
MCRepositoryGroup>>includesVersionNamed: aString
    " check for existing version name in parallel over all repositories "
    | resultSemaphore results |
    resultSemaphore := Semaphore new.
    results := Array new: self repositories size.
   
    self repositories doWithIndex: [:repository :index |
        " fork of test for each repository "
        [[ results
            at: index
            put: (repository includesVersionNamed: aString)
        ] ensure: [ resultSemaphore signal ]] fork ].

    " wait for all requests to finish "
    self repositories size timesRepeat: [ resultSemaphore wait ].
    " check if any repository included the given versionName already"
    ^ results anySatisfy: [:result| result = true ]  
 


Huh? The purpose of anySatisfy: is to exit the loop at first match.
Here we will scan all the repositories, in parallel (well there is a single native thread so far, but this can eventually avoid some long network interrupts if only I/O are non blocking).
Alas, in all case, it will scan all repositories, a tremendous work defeating the purpose of anySatisfy:. Not even thinking what will happen if one is unreachable?
Though, it can make an interesting subject for a future post: how to implement a real parallel version of anySatisfy: that is to launch several processes, stop them at this first positive hit and return?

But there is worse than this unprogressive dialog, since I broke a filetree monticello.meta/version via incorrect git merge, a debugger wanted to open in background, but didn't until I manually interrupted the wait. Ah great tools.

Now let's see how it was cleaned up in Pharo 3.0:

includesVersionNamed: aString
    " check for existing version name in parallel over all repositories "
    | results |

    results := Array new: self repositories size.
   
    self repositories doWithIndex: [:repository :index |
        " fork of test for each repository "
        results
            at: index
            put: (repository includesVersionNamed: aString) ].

    " check if any repository included the given versionName already"
    ^ results anySatisfy: [:result| result = true ] 


Good to sea there is no more fork but in the comments! For sure Pharo code is better... than Pharo.
But don't you see several smells remaining?

In parallel, take a look at Squeak method:

includesVersionNamed: aString
    ^ repositories anySatisfy: [ : each | [each includesVersionNamed: aString] on: Error do: [false]]



OK, OK, Pharo has a license to break things when they are too much entangled. It's not my favourite strategy, but it can reward faster, especially when starting from Squeak.
But in this particular case I fail to see how. Bad code is bad code.

OK, OK, It's easy to pick one wrong fix and criticize.
But the problem is that 2.0 tools are so unusable that I'm quite sure that such wrong refactorings are not isolated.

Squeak is plenty of dust and has room for improvment, no doubt about that.
But it took many years to accumulate all that mud in Squeak, I would say.
If Pharo is accumulating its own at this pace, waouh, I better understand now how the number of closed issues can grow exponentially ;)

Hey, it's a joke, I don't want to ruin the moral of Stephane and Marcus, so I'll mitigate: I see plenty of positive refactorings coming with 3.0, and I wish Pharo will succeed in making a better Squeak. So please enforce the reviews!
Count this post as my contribution of the day to the review process.