## Sunday, April 15, 2012

### Meteor Contest - Part 8 dumb and fast

Remember our clever trick of rotating the board 180° once 6th row is filled up?
The possible positions of pieces were simple, just one case for south edge, and another for all the other pairs of rows...
...and another two cases after we rotate the board 180°, because we have to take complementary orientations of the 6th piece into account in order to generate only half the solutions.

All in all, that was not so simple. So let's go back to the dumb case of generating all possible positions for all rows.
We keep the shift of south row pairs as soon as filled to avoid generating costly LargePositiveInteger, so we still generate all possible positions on two rows:

ShootoutMeteorBoard>>possiblePositionsOnTwoRows
| possible iRot |
possible := (Array new: twoRows) collect: [:freeCell | Array new: 12 withAll: (ShootoutMeteorPiece new mask: 0)].
iRot := 0.
self rotationsOf: aPieceMask do: [:rotated |
iRot := iRot + 1.
self placesFor: rotated do: [:shifted |
(possible at: shifted lowBit) at: iRot put: (ShootoutMeteorPiece
mask: ((self hasEastOrWestIsland: shifted) ifTrue: [0] ifFalse: [shifted])
islands: (self islandsFor: (shifted bitOr: shifted - 1)))]].
possible]

Note that (shifted bitOr: shifted - 1) previously was in islandsFor: but we have to change it for finding north islands.

We just have to eliminate different cases for each of the 5 different twoRows offsets.
For this, we will have to detect the case of north edge again.
So our possible positions becomes:

initializePossiblePositions
| positionsPerPiecePerCell thePieceWhichBreakSymmetry |
positionsPerPiecePerCell := self possiblePositionsOnTwoRows.
thePieceWhichBreakSymmetry := (1 to: pieces size) detectMax: [:i | (positionsPerPiecePerCell at: i) detectSum: [:orientations | orientations count: [:e | e mask > 0]]].
positionsPerPiece := (1 to: 5) collect: [:iRow |
maxMaskForRow := (1 bitShift: (#(6 6 6 4 2) at: iRow) * ncol) - 1.
northRow :=  southEdge bitShift: ((#(-1 -1 6 4 2) at: iRow) - 1 * ncol).
(1 to: twoRows) collect: [:cellNumber |
(1 to: pieces size) collect: [:pieceNumber |
| orientations n |
orientations := (positionsPerPiecePerCell at: pieceNumber) at: cellNumber.
n := pieceNumber = thePieceWhichBreakSymmetry ifTrue: [6] ifFalse: [12].
Array new: n streamContents: [:str |
1 to: n do: [:i |
| aPiece |
aPiece := orientations at: i.
ifTrue:
[(iRow = 1 and: [cellNumber <= ncol])
ifFalse: [(aPiece mask bitAnd: northRow) > 0
ifTrue: [(self hasNorthIsland: aPiece mask row: iRow)
ifFalse:
[| isle |
isle := iRow = 5
ifTrue: [0]
ifFalse: [self northIslandsFor: aPiece mask row: iRow].
ifFalse: [str nextPut: aPiece]]]]]]]]

Quite a long method...
But we now offer the luxury of recomputing which piece shall better have half orientations removed (thePieceWhichBreakSymmetry), instead of hardcoding 6.
And we compute the case of northern islands (except on the last two rows, we'll see later why).

For detecting north islands, we change our existing island detection to stop a bit earlier:

fillMaskStartingAt: pos stoppingAbove: maxCell ifFoundEnough: exitBlock
(fillMask bitAnd: pos) = 0 ifFalse: [^self].
(pos > maxCell) ifTrue: [^exitBlock value].
(self canShiftE: pos) ifTrue: [self fillMaskStartingAt: (self shiftE: pos) stoppingAbove: maxCell ifFoundEnough: exitBlock].
(self canShiftNE: pos) ifTrue: [self fillMaskStartingAt: (self shiftNE: pos) stoppingAbove: maxCell ifFoundEnough: exitBlock].
(self canShiftNW: pos) ifTrue: [self fillMaskStartingAt: (self shiftNW: pos) stoppingAbove: maxCell ifFoundEnough: exitBlock].
(self canShiftW: pos) ifTrue: [self fillMaskStartingAt: (self shiftW: pos) stoppingAbove: maxCell ifFoundEnough: exitBlock].
^self

islands := 0.
top := 1 << (fillMask highBit - 1 // ncol * ncol - 1).
whileTrue:
[open := false.
^islands

Finding case of bad northern island is same as bad southern island once we reverse the bits:
| bitReverse |
bitReverse := (#(-1 -1 6 4 2) at: iRow) * ncol.

And finding the possibly bad islands is also a bit more tricky on north:
| filled isleSEW bitReverse isleNE isleNW |
bitReverse := (#(-1 -1 6 4 2) at: iRow) * ncol.
isleSEW := self islandsFor: filled.
(aPieceMask bitAnd: (eastEdge bitOr: westEdge)) = 0 ifFalse: [^isleSEW].
(isleSEW bitAnd: (eastEdge bitOr: westEdge)) = 0 ifFalse: [^isleSEW].
ifTrue: [filled := filled >> ncol << ncol].
isleNE := ((self islandsFor: (filled bitReverse: bitReverse)) bitReverse: bitReverse) bitOr: isleSEW.
isleNW := (1 << bitReverse - 1 - (isleNE bitOr: (aPieceMask bitOr: aPieceMask - 1))) bitOr: isleSEW.
^isleNW bitCount < isleNE bitCount
ifTrue: [isleNW]
ifFalse: [isleNE]

Why being so clever? This deserves some explanations.
Obviously, if we put a barrier on south with aPieceMask bitOr: aPieceMask - 1 then every remaining cell will be an island. We must isolate more useful information.
Knowing that previous cells were filled for sure (dark gray), we first see if we can find an isle (light gray) on east or west like these cases:

or on south, like this case:

If the piece touches an east or west edge, then we're done because if island there is, it already has been handled by hasNorthIsland:row:
If the islands touches east or west edge, then we're done too, we gathered enough information.

Else, we then find isleNE only in north-east side by removing the south row bits which belong to previous pieces:

But the complementary island isleNW in north-west side might be smaller.

So we consider only the island with lesser bitCount, because it will fast up our tests inside inner loops.

Last thing, there was one case we forgot, which is when a piece touches both east and west edges, it then must have a multiple of 5 free cells below (and above), otherwise be eliminated.
^ (self hasInsetZero: southEdge * (eastEdge bitAnd: aMask))
or: [(self hasInsetZero: southEdge * (westEdge bitAnd: aMask))
or: [(aMask bitAnd: eastEdge) > 0 and: [(aMask bitAnd: westEdge) > 0 and: [(self findIsland: aMask) bitCount \\ 5 > 0]]]]

Let's see how many pieces*orientations we get for each pair of rows:
| possible |
possible := ShootoutMeteorBoard default instVarNamed: 'positionsPerPiece'.
possible collect: [:e | e inject: 0 into: [:s1 :e1 |
s1 + (e1 inject: 0 into: [:s2 :e2 |
s2 + (e2 size)])]]
-> #(487 600 594 326 19)

Yes, that's pretty few on the last pair, so this should accelerate the solving a lot! And that's also why we did not bother about island tests on the last two rows...

Now we modify the search algorithm again, and remove the bitReverse:
| nextFreeCell possibles colorBit iRow boardMask |
colorMask = 0 ifTrue: [ ^solutionBlock value: (self boardStringWithPieces: pieces) ].
loopCount := loopCount + 1.
iRow := rowOff.
[(nextFreeCell := (boardMask + 1) lowBit) > twoRows]
whileTrue:
[ iRow := iRow + 2.
possibles := (positionsPerPiece at: iRow // 2 + 1) at: nextFreeCell.
colorBit := 1.
1 to: pieces size do: [:pieceNumber |
ifFalse:
[ | positions |
positions := possibles at: pieceNumber.
1 to: positions size do: [:i |
| aPiece |
((aPiece := positions at: i) fitOnBoard: boardMask)
ifTrue:
[pieces at: pieceNumber put: (aPiece forRow: iRow).
self addPiece: aPiece index: pieceNumber forRow: iRow.
self removePiece: aPiece index: pieceNumber forRow: iRow]]].
colorBit := colorBit * 2].
^nil

Since we don't bitReverse anymore, we have to correct filling of board with a piece mask:
ShootoutMeteorPiece>>fillSolution: aString ncol: ncol withColor: c
| offset |
offset := row * ncol.
mask bitsDo: [:k | aString at: offset + k put: c]

ShootoutMeteorBoardMorph>>fillPiece: aPiece forRow: row withColor: hColor
| offset |
offset := row * 5.
aPiece mask bitsDo: [:k | (submorphs at: offset + k) fillColor: hColor].
self changed.

Twice the same code, but we prefer to duplicate for CPU optimization.

If we try it without the instrumentation hooks,
[ShootoutMeteorBoard solveDefault] timeToRun.
-> 507 (milliseconds)
That's a few milliseconds more with the hooks (525 to 530).
Only 6 times slower than the faster C++ candidate, great!
And we get:
545906 loops
The C++ solution only has
309378 loops

But there is more memory and more island tests involved, I'm not sure whether I can find a better balance in Smalltalk...

Here is an animation, you can see by yourself that first solution is found faster:
(ShootoutMeteorInstrumentedBoard default) forMovie: true; solvedPuzzleDo: [:e | ^self].

Ah, it's good to be dumb!
The materials is at http://ss3.gemstone.com/ss/Shootout/Shootout.blog-nice.9.mcz

## Saturday, April 14, 2012

### Meteor Contest - Part 7 Visualisation

Until there, we played with low level bit tricks, and have used the official ASCII board printing, but these bits are frustrating, we are in Smalltalk, a graphical environment. So we are going to illustrate the algorithm with a poor man Morph in Squeak. I say poor man because the morph will be dedicated to visualisation, not interaction, but that will already serve.

First, we craft a morph just for displaying an hexagonal cell. We hardcode the edge length as an integer between 10 and 20 pixels, that fall near a whole pixel when multiplied by 60 degreeSin and 60 degreeCos: that is an even integer, and we catch it with a centered modulo in interval [-1/2,1/2[
(10 to: 20 by: 2) detectMin: [:n | (n * 3 sqrt + 1 / 2 \\ 1 - (1 / 2)) abs]
->14

Supposing the centre of the polygon is in the middle of pixel 0@0, the vertices would thus be:
(30 to: 330 by: 60) collect: [:a | ((a degreeCos@a degreeSin)*14) rounded].
-> {12@7 . 0@14 . -12@7 . -12@ -7 . 0@ -14 . 12@ -7}

We thus create:
Morph subclass: #ShootoutMeteorHexaMorph
instanceVariableNames: 'fillColor'
classVariableNames: ''
poolDictionaries: ''
category: 'Shootout'
initialize
super initialize.
bounds := 0@0 extent: 25@29.
fillColor := Color transparent.
bounds: newBounds
bounds := (-12 @ -14 extent: 25@29) translateBy: newBounds center rounded.
fillColor
^fillColor
fillColor: aColor
fillColor := aColor.
self changed
vertices
^{ 12 @ 7. 0 @ 14. -12 @ 7. -12 @ -7. 0 @ -14. 12 @ -7 }
drawOn: aCanvas
aCanvas
drawPolygon: (self vertices collect: [:e | e + self bounds center])
fillStyle: self fillColor
borderWidth: 1
borderColor: Color black

And test if magnified drawing is OK:
hexa := ShootoutMeteorHexaMorph new.
aFormCanvas := FormCanvas extent: hexa extent depth: 32.
aFormCanvas fillRectangle: (0@0 extent: hexa extent) fillStyle: Color lightGray.
hexa drawOn: aFormCanvas.
aFormCanvas := FormCanvas on: (aFormCanvas form magnifyBy: 16).
w := hexa bounds width.
h := hexa bounds height.
c := w@h*8.
aFormCanvas fillRectangle: (c-6 corner: c+6) fillStyle: Color green.
vertices := (30 to: 330 by: 60) collect: [:a | ((a degreeCos@a degreeSin)*14*16+c) rounded].
vertices do: [:p | aFormCanvas fillRectangle: (p-5 corner: p+5) fillStyle: Color blue].
vertices do: [:p | aFormCanvas line: c to: p width: 2 color: Color yellow].
aFormCanvas drawPolygon: vertices color: Color transparent borderWidth: 2 borderColor: Color green.
0 to: w-1 do: [:x | aFormCanvas line: x@0*16 to: x@h*16 width: 1 color: Color veryLightGray].
0 to: h-1 do: [:y | aFormCanvas line: 0@y*16 to: w@y*16 width: 1 color: Color veryLightGray].
aFormCanvas line: w@0*16-1 to: w@h*16-1 width: 1 color: Color veryLightGray.
aFormCanvas line: 0@h*16-1 to: w@h*16-1 width: 1 color: Color veryLightGray.
aFormCanvas form asMorph openInWorld.

The line drawing is not very accurate, and don't have anti-alisasing when we draw on a Form, but that will do.
Then we create a board of 50 cells.
Morph subclass: #ShootoutMeteorBoardMorph
instanceVariableNames: ''
classVariableNames: ''
poolDictionaries: ''
category: 'Shootout'
initialize
| cells center hexa x y |
super initialize.
cells := Array new: 50.
x := #(0 2 4 6 8 1 3 5 7 9).
y := (0 to: 9).
0 to: 49 do: [:i|
center := ((x at: 10 - (i \\ 10))*12+12) @ ((y at: 10 - (i // 5))*21+14).
hexa := ShootoutMeteorHexaMorph new.
hexa bounds: (center - (12@14) corner: center + (12@14)).
cells at: i + 1 put: hexa].
submorphs := cells.
bounds := cells inject: cells first bounds into: [:b :h | b merge: h bounds]
drawOn: aCanvas

We can test it:
ShootoutMeteorBoardMorph new openInWorld.
Or if we want to create a form that will later be exported as png and feed this blog:.
board := ShootoutMeteorBoardMorph new.
board position: 0@0.
aFormCanvas := FormCanvas extent: board extent depth: 32.
aFormCanvas fillRectangle: (0@0 extent: board extent) fillStyle: Color veryLightGray.
board fullDrawOn: aFormCanvas.
aFormCanvas form asMorph openInWorld.

At this time, we don't reify the piece, we just add a hook to fill a bit mask with pre-defined colours (with an eleventh one useful to mark islands for example).
ShootoutMeteorBoardMorph>>rgbColorSpecs
^#(
#(1.0  0.2  0.2)
#(0.0  1.0  0.0)
#(0.2  0.2  1.0)
#(1.0  1.0  0.0)
#(0.0  1.0  1.0)
#(0.8  0.2  0.8)
#(1.0  0.6  0.1)
#(0.6  0.1  0.1)
#(0.1  0.6  0.1)
#(0.1  0.1  0.6)
#(0.2  0.2  0.2)
)
fillPiece: aPiece forRow: row withColor: hColor
| offset |
row >= 6
ifTrue:
[offset := submorphs size + 1 - (row - 6 * 5).
aPiece mask bitsDo: [:k | (submorphs at: offset - k) fillColor: hColor]]
ifFalse:
[offset := row * 5.
aPiece mask bitsDo: [:k | (submorphs at: offset + k) fillColor: hColor]].
self changed.
addPiece: aPiece index: pieceIndex forRow: row
| rgb hColor |
rgb := self rgbColorSpecs at: pieceIndex.
hColor := Color r: rgb first g: rgb second b: rgb third.
self fillPiece: aPiece forRow: row withColor: hColor
removePiece: aPiece index: pieceIndex forRow: row
self fillPiece: aPiece forRow: row withColor: Color transparent

We can now instrument our solver to display its own progress. Of course, since last number of loops was more than 800,000 that means that visualizing the whole algorithm would require more than 80,000 seconds even if we display 10 pieces additions per second. But we will stop the animation before the end.

First, we add two empty hooks
ShootoutMeteorBoard>>addPiece: aPiece index: pieceNumber forRow: iRow
ShootoutMeteorBoard>>removePiece: aPiece index: pieceNumber forRow: iRow
And modify inner loop of the solving method
...snip...
((aPiece := positions at: i) fitOnBoard: boardMask)
ifTrue:
[pieces at: pieceNumber put: (aPiece forRow: iRow).
self addPiece: aPiece index: pieceNumber forRow: iRow.
self removePiece: aPiece index: pieceNumber forRow: iRow]]].
...snip...

Then we define a subclass
ShootoutMeteorBoard subclass: #ShootoutMeteorInstrumentedBoard
instanceVariableNames: 'boardMorph'
classVariableNames: ''
poolDictionaries: ''
category: 'Shootout'
initializeMorph
boardMorph := ShootoutMeteorBoardMorph new.
boardMorph position: Display boundingBox center.
boardMorph openInWorld.
fromString: boardString
super fromString: boardString.
self initializeMorph
solvedPuzzleDo: solutionBlock
^[super solvedPuzzleDo: [:aSolution |
boardMorph flash.
(Delay forSeconds: 1) wait.
solutionBlock value: aSolution]]
ensure: [boardMorph delete]
addPiece: aPiece index: pieceNumber forRow: iRow
boardMorph addPiece: aPiece index: pieceNumber forRow: iRow.
boardMorph displayWorld.
(Delay forMilliseconds: 200) wait.
removePiece: aPiece index: pieceNumber forRow: iRow
(Delay forMilliseconds: 200) wait.
boardMorph removePiece: aPiece index: pieceNumber forRow: iRow.
boardMorph displayWorld.

And get it to work:
ShootoutMeteorInstrumentedBoard solveDefault.

Here is the first solution found:

This animation clearly confirms that we do not eliminate every island. For example after this case:

3 more trials are necessary before abandoning the solution:

And the second confirmation is that our idea to turn the board above 6th row was very clever for minimizing the possible positions set-up , but not at all optimal for abandoning bad solutions early, because some islands are rejected in upper rows once we rotate the board, as illustrated on this snapshot:

So there is room for further improvements, and in next post I'll be back to a more dumb solution. Also, I created this little movie with the first minutes of solving by simply generating the PNG with instrumented code (and an iFrame instance variable initialized to 0),

ShootoutMeteorInstrumentedBoard>>addPiece: aPiece index: pieceNumber forRow: iRow
boardMorph addPiece: aPiece index: pieceNumber forRow: iRow.
boardMorph world displayWorld.
iFrame := iFrame + 1.
boardMorph exportAsPNGNamed: 'meteor_' , (iFrame printPaddedWith: \$0 to: 5) , '.png'.
removePiece: aPiece index: pieceNumber forRow: iRow
boardMorph removePiece: aPiece index: pieceNumber forRow: iRow.
boardMorph world displayWorld.
iFrame := iFrame + 1.
boardMorph exportAsPNGNamed: 'meteor_' , (iFrame printPaddedWith: \$0 to: 5) , '.png'.

and then aggregating the PNG with iMovie (the interactive part clearly does not scale, I should better have installed ffmpeg):

This stuff is in http://ss3.gemstone.com/ss/Shootout/Shootout.blog-nice.8.mcz

## Thursday, April 12, 2012

### Meteor Contest Part 6 - Visualworks solution

The shootout benchmark is written for Visualworks, so I tried to port the meteor contest in VW7.8 non commercial.
Here are the required modifications:
1. << and >> are not understood by Integer and must be replaced by proper signed bitShift:
2. bitsDo: bitReverse: and bitCount operations are missing and must be added as extensions
3. SequenceableCollection>>reversed is not understood, the VW version is reverse.
Surprise, VW did not perform better than Cog on my mac mini. 1.5 seconds instead of 1.3s for solving the board.

Yes, but... Unlike Squeak, VW positive SmallInteger don't have 30 bits but only 29. The Squeak solution was thus a bit sub-optimal in this context (I like to read a bit in double sense).
Oh, let's just change that! The major generator of 30-bits Integer is the bitReverse: operation when we turn the board 180° once first 6 rows were filled. After the reversal, the boardMask has two rows of barrier toward north, remember:
1 1 1 1 1
1 1 1 1 1
1 0 1 1 0
1 0 0 0 0
0 1 0 0 0
0 0 0 0 0
Yes, that's one more row than necessary, this would work too:
0 0 0 0 0
1 1 1 1 1
1 0 1 1 0
1 0 0 0 0
0 1 0 0 0
0 0 0 0 0
The variant is quite easy in:
| nextFreeCell possibles colorBit iRow boardMask rowStatus |
colorMask = 0 ifTrue: [ ^solutionBlock value: (self boardStringWithPieces: pieces) ].
loopCount := loopCount + 1.
iRow := rowOff.
[(nextFreeCell := (boardMask + 1) lowBit) > twoRows]
whileTrue:
[boardMask := (iRow := iRow + 2) = 6
ifFalse: [boardMask bitShift: 0 - twoRows]].
...snip...
just change the last but one line with:
ifTrue: [(boardMask bitShift: 0 - ncol) bitReverse: sixRows - ncol]
This change makes VW slightly faster than Cog/Squeak again.

The good thing with VW is that we can play with:
TimeProfiler new
samplingInterval: 3;
profile:[ShootoutMeteorBoard solveDefault].
And this one tells us we waste a bunch of time in SequenceableCollection>>do:
Ah yes, Squeak MessageTally told that too, but I didn't trust it enough.
Deceivingly, we will have to inline this do: loop by ourselves. So we also change the end of above method:
...snip...
rowStatus :=
(iRow < 6 ifTrue: [0] ifFalse: [2]) +
(((iRow = 0 or: [iRow = 6]) and: [nextFreeCell <= ncol]) ifTrue: [1] ifFalse: [2]).
possibles := (positionsPerPiece at: rowStatus) at: nextFreeCell.
colorBit := 1.
1 to: pieces size do: [:pieceNumber |
ifFalse:
[ | positions |
positions := possibles at: pieceNumber.
1 to: positions size do: [:i |
| aPiece |
((aPiece := positions at: i) fitOnBoard: boardMask)
ifTrue:
[pieces at: pieceNumber put: (aPiece forRow: iRow).
colorBit := colorBit * 2].
^nil

And measure the time again:
[ShootoutMeteorBoard solveDefault] timeToRun.
-> 708 milliseconds (fraction omitted).

Good thing, this applies to Cog/Squeak too:
[ShootoutMeteorBoard solveDefault] timeToRun.
-> 746  (milliseconds is implicit in Squeak).

For 885075 loops, that's less than 1 microsecond per loop, not that bad after all.

For single-core oriented programs, my machine has more or less same performance than shootout reference machine, as can be verified with the reference g++ solution (which runs in 80ms as reported on the shootout site).
/usr/bin/g++ --version
i686-apple-darwin10-g++-4.2.1 (GCC) 4.2.1 (Apple Inc. build 5666) (dot 3)
Copyright (C) 2007 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

/usr/bin/g++ -m32 -c -pipe -O3 -fomit-frame-pointer meteor.gpp-4.c++ -o meteor.gpp-4.c++.o &&  \
/usr/bin/g++ -m32 meteor.gpp-4.c++.o -o meteor.gpp-4.gpp_run

time ./meteor.gpp-4.gpp_run 2098
real    0m0.103s
user    0m0.090s
sys    0m0.002s

And with -m64

real    0m0.089s
user    0m0.079s
sys    0m0.002s

I'm reluctant to play with all the g++ code generation options, but that makes roughly a factor 10 between C++ and Smalltalk for that one (not accounting for image startup time).
Sure, I played with very low level bit tricks, but hey, that was the spirit of the game.

## Tuesday, April 10, 2012

### Meteor Contest - Part 5 little progress

We have sped up individual loops enough and can now go back to a bit more reduction of combinations.
Next idea is to detect more islands cases at pre-process time. We already removed pieces having two cell groups on the same edge, but what if they have a single cell group?
For example:
...
* * * * 1
* * * 1 *
* * 1 1 *
* * 1 X X

The green Xs indicate filling of the board by previous pieces, and blue 1s is the new piece we want to insert.
It obviously creates an island indicated here by red stars *.
In above figure, there are at least 3 stars that should hold 1s, or maybe 8, because the number of 1s is always a multiple of 5 - the piece size. So the read stars could well be filled by previous pieces. But if it contains one or more 0 holes, then the blue piece cannot fit this place.
This can be detected early. We first have to modify our filling algorithm because it must stop at north most row of the piece, which indicates an open sea. For this, we set the fillMask as an instance variable.
ShootoutMeteorBoard>>fillMaskStartingAt: pos stoppingAbove: maxCell ifFoundEnough: exitBlock
(fillMask bitAnd: pos) = 0 ifFalse: [^self].
(pos > maxCell) ifTrue: [^exitBlock value].
(self canShiftE: pos) ifTrue: [self fillMaskStartingAt: (self shiftE: pos) stoppingAbove: maxCell ifFoundEnough: exitBlock].
(self canShiftNE: pos) ifTrue: [self fillMaskStartingAt: (self shiftNE: pos) stoppingAbove: maxCell ifFoundEnough: exitBlock].
(self canShiftNW: pos) ifTrue: [self fillMaskStartingAt: (self shiftNW: pos) stoppingAbove: maxCell ifFoundEnough: exitBlock].
(self canShiftW: pos) ifTrue: [self fillMaskStartingAt: (self shiftW: pos) stoppingAbove: maxCell ifFoundEnough: exitBlock].
^self
And we use it like this:
open := false.
self fillMaskStartingAt: nextFreeCellMask stoppingAbove: 1 << (fillMask highBit - 1 // ncol * ncol - 1) ifFoundEnough: [open := true].
^open
ifTrue: [0]

fillMask - aMask will just return the red stars bit pattern, unless of course the filling can reach north most row, which is a case of open sea.

1 << (fillMask highBit - 1 // ncol * ncol - 1) is the west-most cell of north-most but one cell of fillMask for testing those open seas.

And 1 + aMask bitAnd: -1 - aMask is yet another bit hack to isolate the lowest significant free cell (0 bit) and replace it with 1. We start filling at this next free cell.

Indeed, aMask bitAnd: aMask negated isolates the low bit, and the low bit of aMask + 1 is the low 0 bit of aMask, as we used it previously to detect next free cell on the board.

...00000000100 1 << (aMask lowBit - 1)

Note that we can use this new fill algorithm to detect edge or corner islands on south edge, and thus remove the old fill:
^(self findIsland: aMask) bitCount \\ 5 > 0
or: [(self findIsland: fillMask) bitCount \\ 5 > 0]

Now, if we want to find all islands created by a piece on the second row and above (that is with south row and east columns already filled with Xs) we write it:
| islands |
islands := 0.
whileFalse:
[islands := islands + (self findIsland: fillMask)].
^islands

We stop only when we have filled all the bits up to last row: (fillMask + 1 bitAnd: fillMask) = 0

To use it, we create a new class,
ShootoutMeteorPiece subclass: #ShootoutMeteorPieceWithIsland
instanceVariableNames: 'islands aPieceCouldFitIntoIsland'
classVariableNames: ''
poolDictionaries: ''
category: 'Shootout'
aPieceCouldFitIntoIsland := islands bitCount >= 5
| occupied |
[(occupied := aBoardMask bitAnd: islands) = islands
or: [aPieceCouldFitIntoIsland and: [(islands - occupied) bitCount = 5]]]

We create a factory:
^i = 0
ifFalse: [ShootoutMeteorPieceWithIsland new mask: p; islands: i]

And we change our generation of possible positions:
initializePossiblePositions
| positionsPerPiecePerCell |
positionsPerPiecePerCell := pieces collect: [:aPieceMask |
| possible iRot |
possible := (Array new: twoRows) collect: [:freeCell | Array new: 12 withAll: (ShootoutMeteorPiece new mask: 0)].
iRot := 0.
self rotationsOf: aPieceMask do: [:rotated |
iRot := iRot + 1.
self placesFor: rotated do: [:shifted |
(possible at: shifted lowBit) at: iRot put: (ShootoutMeteorPiece
mask: ((self hasEastOrWestIsland: shifted) ifTrue: [0] ifFalse: [shifted])
islands: (self islandsFor: shifted))]].
(possible at: shifted lowBit) at: iRot put: (ShootoutMeteorPiece new mask:
((self hasEastOrWestIsland: shifted) ifTrue: [0] ifFalse: [shifted]))]].
possible].
positionsPerPiece := (1 to: 4) collect: [:rowStatus |
(1 to: twoRows - (rowStatus \\ 2 * ncol)) collect: [:cellNumber |
(1 to: pieces size)
collect: [:pieceNumber |
| possible |
possible := (positionsPerPiecePerCell at: pieceNumber) at: cellNumber.
rowStatus odd ifTrue: [possible := possible collect: [:e | ShootoutMeteorPiece mask: e mask islands: 0]].
((pieceNumber = 6)
ifTrue: [possible copyFrom: rowStatus // 3 * 6 + 1 to: rowStatus // 3 * 6 + 6]
ifFalse: [possible])
reject: [:aPiece | aPiece mask = 0
or: [rowStatus odd and: [self hasSouthIsland: aPiece mask]]]]]].
Now let's try how it performs:
MessageTally spyOn: [ShootoutMeteorBoard solveDefault].
-> 885075 loops
and 1.3 to 1.4 seconds.
We did not save as many CPU cycles as loops, because our additional fitOnBoard: tests are paid on many loops.
But that's already something worth the few lines of codes.
So it's in http://ss3.gemstone.com/ss/Shootout/Shootout.blog-nice.6.mcz

## Saturday, April 7, 2012

### Meteor contest - Part 4 the SmallInteger

In previous step, we were at about 1,000,000 loops for only 10,000 productive.
And we have about a factor 100 to gain compared to C++.
I'm not sure we can be that clever and avoid any false solution.
Yes, sure, we cannot reach speed of C++ yet with our VM, even the Cog one.
But we can do something to reduce a single loop cost.

One thing to notice is that the 1st part generating all pieces positions do a lot of useless work. Until pieces reach the north edge, we repeat the same bit pattern every two rows. That's a clue indicating something is wrong.
Moreover, in 32 bit Squeak image, we only have 30 bits left for representing a positive SmallInteger.
That's exactly 6 rows. Then we'll start memory intensive LargePositiveInteger allocations, and more involving computations too.

Now the idea: instead of shifting the pieces position north, why not shift the board south every time the pair of bottom rows is full ?
Since we fill east to west and from south to north, this is doable and we then have to generate the possible pieces positions only for the first pair of row. For example
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
0 0 0 1 0
0 0 0 1 0
0 1 1 1 0
1 1 1 1 1
1 1 1 1 1
can be shifted two rows south

0 0 0 0 0
0 0 0 0 0

0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
0 0 0 1 0
0 0 0 1 0
0 1 1 1 0
Since a piece can span over 5 rows, and we consider filling on row 1 and 2, the northern two rows are useless and the board requires only 6 rows, that is 30 bits, a positive SmallInteger...

Good. For the first row, we can use a specially crafted southPossiblePositions.
But how to handle north rows above row 6 and prevent them to spread out of north edge?
Well, we don't have to care of north. We can use the 180° symmetry and as soon as 6 rows is full, reverse the whole board and start at south again.

Of course, we still have to shift the solution for filling the board. To avoid creating a LargeInteger, we'll store the rowOffset shift in a new ShootoutMeteorPiece object, and use it in possible positions, and in solutions.
Object subclass: #ShootoutMeteorPiece
classVariableNames: ''
poolDictionaries: ''
category: 'Shootout'
forRow: rowOffset
row := rowOffset
We also delegate whether the piece fits on the board and how to fill the compact string solution (with implicit knowledge of our board rotation trick above 6th row).
fillSolution: aString ncol: ncol withColor: c
| offset |
^ row >= 6
ifTrue:
[offset := aString size + 1 - (row - 6 * ncol).
mask bitsDo: [:k | aString at: offset - k put: c]]
ifFalse:
[offset := row * ncol.
mask bitsDo: [:k | aString at: offset + k put: c]]

ShootoutMeteorBoard>>boardStringWithPieces: pArray
| board |
board := String new: ncell.
1 to: pArray size do: [:i | | c |
c := '0123456789*' at: i.
(pArray at: i) fillSolution: board ncol: ncol withColor: c].
^board

One subtle thing is that we already handled the symmetry by removing half orientations of piece #6. If we rotate the board, we have to consider the rotated orientations of piece #6. Hmm, this will be a bit technical to keep this optimization but we'll do.
First, we must stop generating pieces positions above twoRows:

[(cellNumber := westMask lowBit) > twoRows ifTrue: [^self].

Let now start the massacre of our previous code.
ShootoutMeteorBoard>>initializePossiblePositions
| positionsPerPiecePerCell |
positionsPerPiecePerCell := pieces collect: [:aPieceMask |
| possible iRot |
possible := (Array new: twoRows) collect: [:freeCell | Array new: 12 withAll: (ShootoutMeteorPiece new mask: 0)].
iRot := 0.
self rotationsOf: aPieceMask do: [:rotated |
iRot := iRot + 1.
self placesFor: rotated do: [:shifted |
(possible at: shifted lowBit) at: iRot put: (ShootoutMeteorPiece new mask:
((self hasEastOrWestIsland: shifted) ifTrue: [0] ifFalse: [shifted]))]].
possible].
positionsPerPiece := (1 to: 4) collect: [:rowStatus |
(1 to: twoRows - (rowStatus \\ 2 * ncol)) collect: [:cellNumber |
(1 to: pieces size)
collect: [:pieceNumber |
| possible |
possible := (positionsPerPiecePerCell at: pieceNumber) at: cellNumber.
((pieceNumber = 6)
ifTrue: [possible copyFrom: rowStatus // 3 * 6 + 1 to: rowStatus // 3 * 6 + 6]
ifFalse: [possible])
reject: [:aPiece | aPiece mask = 0
or: [rowStatus odd and: [self hasSouthIsland: aPiece mask]]]]]].
Here the rowStatus will be 1 on first row, 2 from row 2 to 6, 3 at row 10 (after we just rotated the board), 4 from rows 9 down to 7.
Notice also that we prefer to handle the cellNumber index first, before the pieceIndex, as that fits better our solve message.

Of course, since we moved the 6th piece trick above, we remove it from generation of rotations
| next |
aBlock value: (next := aPieceMask); value: (self flip: next).
5 timesRepeat:  [aBlock value: (next := self rotate: next); value: (self flip: next)]

Now that we lost the northEdge, why bother filling toward south ?
fill: aMask startingAt: pos count: countBlock
| filled |
countBlock value.
(self canShiftE: pos) ifTrue: [filled := self fill: filled startingAt: (self shiftE: pos) count: countBlock].
(self canShiftNE: pos) ifTrue: [filled := self fill: filled startingAt: (self shiftNE: pos) count: countBlock].
(self canShiftNW: pos) ifTrue: [filled := self fill: filled startingAt: (self shiftNW: pos) count: countBlock].
(self canShiftW: pos) ifTrue: [filled := self fill: filled startingAt: (self shiftW: pos) count: countBlock].
^filled

We will have to search islands only at south
or: [(self hasCornerIsland: aMask edge: southEdge edge: eastEdge)
or: [(self hasCornerIsland: aMask edge: southEdge edge: westEdge)]]

A bit more clean-ups to stop masks at sixRows:
fromString: aString
| rawString |
rawString := aString reject: [:e | e isSeparator].
ncell := rawString size.
ncol := (aString readStream upTo: Character cr) count: [:e | e isSeparator not].
twoRows := ncol * 2.
sixRows := ncol * 6.
pieces := rawString asSet asSortedArray collect: [:char |
self shiftSEmost:
(rawString inject: 0 into: [:pmask :c | pmask * 2 + (c = char ifTrue: [1] ifFalse: [0])])].
self initializePossiblePositions.

southEdge := 1 << ncol - 1.
southToNorthMasks := (0 to: 5) collect: [:i | southEdge << (ncol * i)].
eastEdge := 1<<sixRows-1/southEdge.
eastToWestMasks := (0 to: ncol - 1) collect: [:i | eastEdge << i].

^String new: ncol * 2 + 1 * 6 streamContents: [:aStream | self printMask: aPieceMask on: aStream]

6 to: 1 by: -1 do: [:irow |
irow odd ifTrue: [aStream space].
1 to: ncol do: [:icol |
aStream
ifTrue: [\$0]
ifFalse: [\$1]);
space].
aStream cr]

We are ready to write our new solving algorithm:
| nextFreeCell possibles colorBit iRow boardMask rowStatus |
colorMask = 0 ifTrue: [ ^solutionBlock value: (self boardStringWithPieces: pieces) ].
loopCount := loopCount + 1.
iRow := rowOff.
[(nextFreeCell := (boardMask + 1) lowBit) > twoRows]
whileTrue:
[boardMask := (iRow := iRow + 2) = 6
rowStatus :=
(iRow < 6 ifTrue: [0] ifFalse: [2]) +
(((iRow = 0 or: [iRow = 6]) and: [nextFreeCell <= ncol]) ifTrue: [1] ifFalse: [2]).
possibles := (positionsPerPiece at: rowStatus) at: nextFreeCell.
colorBit := 1.
1 to: pieces size do: [:pieceNumber |
ifFalse: [(possibles at: pieceNumber) do: [:aPiece |
ifTrue:
[pieces at: pieceNumber put: (aPiece forRow: iRow).
colorBit := colorBit * 2].
^nil

solvedPuzzleDo: solutionBlock
loopCount := 0.
self searchPuzzlesWithColorMask: 1 << pieces size - 1 boardMask: 0 rowOffset: 0 pieces: pieces copy ifFound: solutionBlock.
^loopCount

That is a bit more complex right now, that's the tribute to optimizations. Especially the rowStatus management cries for a refactoring. But right now, we just focus on CPU efficiency, not source sustainability.
Note that bitReverse: may be absent from a Pharo image. Just pick it in Squeak.
What is interesting is to spy what occurs at bit reversal
Just before the first reversal:
0 0 0 0 0
0 0 0 1 0
0 0 0 0 1
0 1 1 0 1
1 1 1 1 1
1 1 1 1 1
And just after:
1 1 1 1 1
1 1 1 1 1
1 0 1 1 0
1 0 0 0 0
0 1 0 0 0
0 0 0 0 0
The now two north-most rows are filled with ones, and it's on purpose that we did not shift before reversing, to keep a barrier toward north.