Wednesday, April 1, 2015

The more or less defined behavior we are relying upon in Squeak/Pharo VM


C is a tough language with many pitfalls, backed by a great philosophy:
      You don't pay for what you don't buy

That means that if an implementer has a choice between a cheap implementation and a right implementation, then the cheap one shall not be excluded by the standard.

The consequence is that the standard gives very few guaranty on a number of behaviour. And the computing model is a sort of minimal common denominator. Some operations are implementation defined, some are unspecified and some are explicitely undefined. That makes C not so portable at the end. And terribly difficult because there are so many negative things to learn about what we should not do, what will not work, what won't be portable, etc... The standard is worth reading for who wants to dive in the VM. I use this draft version http://www.open-std.org/JTC1/SC22/WG14/www/docs/n1256.pdf.

Squeak and Pharo VM are written in Smalltalk and translated in C. The VM essentially depends on integer and pointer arithmetic. But going from a high level model (no Integer bounds, so no Integer overflow, perfectly defined semantic for bit operations, etc...) to a very low level language may reveal adventurous and full of traps.


We'll see here a few implementation defined and undefined behaviors that we rely upon for Squeak/Pharo VM. The list is incomplete, first because it's only a human review, and also because this post is already too long. Nonetheless, it's good to know where we put our feats.

The implementation defined behaviour 

First we rely upon a 2-complement representation for negative integers. This is implementation defined (§6.2.6.2 rule 2), but such architecture is so common that the risk of portability issues is very very low.

Then we rely upon the right bit shift on signed int to be an arithmetic shift propagating the sign bit. Again, implementation defined (§6.5.7 rule 5) but in good accordance with previous assumption. This is very common and low risk.

Then we assume that casting an unsigned to signed when it is not representable will overflow on sign bit. This is implementation defined (§6.3.1.3 rule 3), but in good agreement with previous two assumptions. Another common feature with low risk.

We also rely on converting integer to pointer and pointer to integer, which is implementation defined (§6.3.2.2 rule 5 and §6.3.2.3 rule 6). Modern architectures have a uniform addressing of memory, and we care of using a type long enough to hold a pointer (either #sqInt or #usqInt sometimes in the plugins), so the risk is again pretty low, even if this kind of conversion drives the C compilers nut optimization-wise.

The undefined behaviour

We rely on two undefined behaviors (§6.5.7 rule 4) with left shift on signed int:
  • shift of positive int leading to unrepresentable value will overflow onto the sign bit
  • shift of negative int will behave same as logical shift on unsigned.
This is bad but evitable! Since we assume it behaves as for unsigned, then we should use shift on unsigned, and cast back to signed.

So instead of non portable:
    int a;
    a << 3;

We should generate something like:
    int a;
    (int)( (unsigned) a << 3);

As said above this is implementation defined, but far more portable.
I should propose a fix for the CCodeGenerator.

Then we rely on signed int overflow. For example in this StackInterpreter method:
bytecodePrimMultiply
    | rcvr arg result |
    rcvr := self internalStackValue: 1.
    arg := self internalStackValue: 0.
    (objectMemory areIntegers: rcvr and: arg)
        ifTrue: [rcvr := objectMemory integerValueOf: rcvr.
                arg := objectMemory integerValueOf: arg.
                result := rcvr * arg.
                (arg = 0
                 or: [(result // arg) = rcvr and: [objectMemory isIntegerValue: result]]) ifTrue:
                    [self internalPop: 2 thenPush: (objectMemory integerObjectOf: result).
                     ^self fetchNextBytecode "success"]]
        ifFalse: [self initPrimCall.
                self externalizeIPandSP.
                self primitiveFloatMultiply: rcvr byArg: arg.
                self internalizeIPandSP.
                self successful ifTrue: [^ self fetchNextBytecode "success"]].

    messageSelector := self specialSelector: 8.
    argumentCount := 1.
    self normalSend


The bad thing is to test for overflow in post-condition (bold code). The modern interpretation of the standard is that one: you shall not rely on UB. Since overflow is UB (§6.5 rule 5), you shall not rely on overflow, and the compiler has the right to assume that the expression (result // arg) = rcvr will allways be true. It may well eliminate it for the sake of aggressive optimizations. This method was reported to fail once with clang optimization at work. I really recommend reading http://blog.llvm.org/2011/05/what-every-c-programmer-should-know.html and the following posts for better understanding. I could not observe the problem in assembly output with Xcode4.2 and Apple LLVM 3.0, neither with -Os, -O2, nor -O3 flags, but we have a sword of Damocles hanging...

The workaround is well know. The overflow test must be written in pre-condition. All is explained at https://www.securecoding.cert.org/confluence/display/c/INT32-C.+Ensure+that+operations+on+signed+integers+do+not+result+in+overflow.

Phew, in assembler, that's just IntegerMULtiply and JumpifOverflow. You don't pay for what you don't buy? I'll add this:
      but when you'll have to buy, it'll cost you an arm.

I'm not only speaking of development time, but even at runtime, the holy runtime for which so many sacrifices were done. That sounds like a miserable failure.

What would have sane people done? Maybe they would have defined a virtual machine with uniform and well defined behaviour whatever the hardware. Ah, err, it's Smalltalk, it must be a joke.

Back to our VM, there are other UB like
  • comparing pointers that do not point to the same array or aggregate (§6.5.8 rule 5);  this is unfortunately hardly avoidable because generation scavenger depends on this;
  • pointer aliasing (§6.5 rule 7) - mostly expunged from the VM, but still present in FFI for example (primitiveFFIDoubleAt).

I'll stop here for today.

1 comment:

  1. Surely you are aware by now that this work can be tedious and seemingly unrewarding. But the good thing is that when one obeys the laws and follows best practice patterns, in the long run one pays way less maintenance costs. In the end, you go faster.

    I really like the citations to the relevant specs. Keep going :).

    PS1: pointer comparison and 6.5.8/5 (quoting your citation here), that's a pain because it's such a common idiom. However, I'd be curious to see if one can prove doing something like (uint_ptr)(ptrA) > (uint_ptr)(ptrB) has well defined behavior.

    PS2: pointer aliasing in FFI... another big pain... you might want to look into unions for that.

    PS3: keep in mind one of the compilers' counterparts to UB is the -O flag with (customarily) arguments greater than 2. In that context, "unsafe optimization" == "undefined behavior". And it's not a compiler bug for e.g. -O3 to cause such problems, because the user is asking for it...

    PS4: the standard should be mandatory reading material for anybody doing FFI work from the image, not just VM developers --- interfacing via FFI will expose your Smalltalk program to the spec lawyering world of C, POSIX, MSDN, etc.

    PS5: IIRC the C spec says "there will be one stack, or else UB". Funny that in the context of JIT VMs.

    ReplyDelete