The shootout benchmark is written for Visualworks, so I tried to port the meteor contest in VW7.8 non commercial.
Here are the required modifications:
- << and >> are not understood by Integer and must be replaced by proper signed bitShift:
- bitsDo: bitReverse: and bitCount operations are missing and must be added as extensions
- SequenceableCollection>>reversed is not understood, the VW version is reverse.
Yes, but... Unlike Squeak, VW positive SmallInteger don't have 30 bits but only 29. The Squeak solution was thus a bit sub-optimal in this context (I like to read a bit in double sense).
Oh, let's just change that! The major generator of 30-bits Integer is the bitReverse: operation when we turn the board 180° once first 6 rows were filled. After the reversal, the boardMask has two rows of barrier toward north, remember:
1 1 1 1 1
1 1 1 1 1
1 0 1 1 0
1 0 0 0 0
0 1 0 0 0
0 0 0 0 0
Yes, that's one more row than necessary, this would work too: 0 0 0 0 0
1 1 1 1 1
1 0 1 1 0
1 0 0 0 0
0 1 0 0 0
0 0 0 0 0
The variant is quite easy in:searchPuzzlesWithColorMask: colorMask boardMask: bMask rowOffset: rowOff pieces: pArray ifFound: solutionBlock
| nextFreeCell possibles colorBit iRow boardMask rowStatus |
colorMask = 0 ifTrue: [ ^solutionBlock value: (self boardStringWithPieces: pieces) ].
loopCount := loopCount + 1.
boardMask := bMask.
iRow := rowOff.
[(nextFreeCell := (boardMask + 1) lowBit) > twoRows]
whileTrue:
[boardMask := (iRow := iRow + 2) = 6
ifTrue: [boardMask bitReverse: sixRows]
ifFalse: [boardMask bitShift: 0 - twoRows]].
| nextFreeCell possibles colorBit iRow boardMask rowStatus |
colorMask = 0 ifTrue: [ ^solutionBlock value: (self boardStringWithPieces: pieces) ].
loopCount := loopCount + 1.
boardMask := bMask.
iRow := rowOff.
[(nextFreeCell := (boardMask + 1) lowBit) > twoRows]
whileTrue:
[boardMask := (iRow := iRow + 2) = 6
ifTrue: [boardMask bitReverse: sixRows]
ifFalse: [boardMask bitShift: 0 - twoRows]].
...snip...
just change the last but one line with:
ifTrue: [(boardMask bitShift: 0 - ncol) bitReverse: sixRows - ncol]
ifTrue: [(boardMask bitShift: 0 - ncol) bitReverse: sixRows - ncol]
This change makes VW slightly faster than Cog/Squeak again.
The good thing with VW is that we can play with:
TimeProfiler new
samplingInterval: 3;
profile:[ShootoutMeteorBoard solveDefault].
And this one tells us we waste a bunch of time in SequenceableCollection>>do:samplingInterval: 3;
profile:[ShootoutMeteorBoard solveDefault].
Ah yes, Squeak MessageTally told that too, but I didn't trust it enough.
Deceivingly, we will have to inline this do: loop by ourselves. So we also change the end of above method:
...snip...
rowStatus :=
(iRow < 6 ifTrue: [0] ifFalse: [2]) +
(((iRow = 0 or: [iRow = 6]) and: [nextFreeCell <= ncol]) ifTrue: [1] ifFalse: [2]).
possibles := (positionsPerPiece at: rowStatus) at: nextFreeCell.
colorBit := 1.
1 to: pieces size do: [:pieceNumber |
(colorMask bitAnd: colorBit) = 0
ifFalse:
[ | positions |
positions := possibles at: pieceNumber.
1 to: positions size do: [:i |
| aPiece |
((aPiece := positions at: i) fitOnBoard: boardMask)
ifTrue:
[pieces at: pieceNumber put: (aPiece forRow: iRow).
self searchPuzzlesWithColorMask: colorMask - colorBit boardMask: boardMask + aPiece mask rowOffset: iRow pieces: pArray ifFound: solutionBlock]]].
colorBit := colorBit * 2].
^nil
(iRow < 6 ifTrue: [0] ifFalse: [2]) +
(((iRow = 0 or: [iRow = 6]) and: [nextFreeCell <= ncol]) ifTrue: [1] ifFalse: [2]).
possibles := (positionsPerPiece at: rowStatus) at: nextFreeCell.
colorBit := 1.
1 to: pieces size do: [:pieceNumber |
(colorMask bitAnd: colorBit) = 0
ifFalse:
[ | positions |
positions := possibles at: pieceNumber.
1 to: positions size do: [:i |
| aPiece |
((aPiece := positions at: i) fitOnBoard: boardMask)
ifTrue:
[pieces at: pieceNumber put: (aPiece forRow: iRow).
self searchPuzzlesWithColorMask: colorMask - colorBit boardMask: boardMask + aPiece mask rowOffset: iRow pieces: pArray ifFound: solutionBlock]]].
colorBit := colorBit * 2].
^nil
And measure the time again:
[ShootoutMeteorBoard solveDefault] timeToRun.
-> 708 milliseconds (fraction omitted).Good thing, this applies to Cog/Squeak too:
[ShootoutMeteorBoard solveDefault] timeToRun.
-> 746 (milliseconds is implicit in Squeak).For 885075 loops, that's less than 1 microsecond per loop, not that bad after all.
For single-core oriented programs, my machine has more or less same performance than shootout reference machine, as can be verified with the reference g++ solution (which runs in 80ms as reported on the shootout site).
/usr/bin/g++ --version
i686-apple-darwin10-g++-4.2.1 (GCC) 4.2.1 (Apple Inc. build 5666) (dot 3)
Copyright (C) 2007 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
/usr/bin/g++ -m32 -c -pipe -O3 -fomit-frame-pointer meteor.gpp-4.c++ -o meteor.gpp-4.c++.o && \
/usr/bin/g++ -m32 meteor.gpp-4.c++.o -o meteor.gpp-4.gpp_run
/usr/bin/g++ -m32 meteor.gpp-4.c++.o -o meteor.gpp-4.gpp_run
time ./meteor.gpp-4.gpp_run 2098
real 0m0.103s
user 0m0.090s
sys 0m0.002s
And with -m64
real 0m0.089s
user 0m0.079s
sys 0m0.002s
user 0m0.079s
sys 0m0.002s
I'm reluctant to play with all the g++ code generation options, but that makes roughly a factor 10 between C++ and Smalltalk for that one (not accounting for image startup time).
Sure, I played with very low level bit tricks, but hey, that was the spirit of the game.
very nice thread. Pun intended!
ReplyDelete