## Saturday, April 7, 2012

### Meteor contest - Part 4 the SmallInteger

In previous step, we were at about 1,000,000 loops for only 10,000 productive.
And we have about a factor 100 to gain compared to C++.
I'm not sure we can be that clever and avoid any false solution.
Yes, sure, we cannot reach speed of C++ yet with our VM, even the Cog one.
But we can do something to reduce a single loop cost.

One thing to notice is that the 1st part generating all pieces positions do a lot of useless work. Until pieces reach the north edge, we repeat the same bit pattern every two rows. That's a clue indicating something is wrong.
Moreover, in 32 bit Squeak image, we only have 30 bits left for representing a positive SmallInteger.
That's exactly 6 rows. Then we'll start memory intensive LargePositiveInteger allocations, and more involving computations too.

Now the idea: instead of shifting the pieces position north, why not shift the board south every time the pair of bottom rows is full ?
Since we fill east to west and from south to north, this is doable and we then have to generate the possible pieces positions only for the first pair of row. For example
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
0 0 0 1 0
0 0 0 1 0
0 1 1 1 0
1 1 1 1 1
1 1 1 1 1
can be shifted two rows south

0 0 0 0 0
0 0 0 0 0

0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
0 0 0 1 0
0 0 0 1 0
0 1 1 1 0
Since a piece can span over 5 rows, and we consider filling on row 1 and 2, the northern two rows are useless and the board requires only 6 rows, that is 30 bits, a positive SmallInteger...

Good. For the first row, we can use a specially crafted southPossiblePositions.
But how to handle north rows above row 6 and prevent them to spread out of north edge?
Well, we don't have to care of north. We can use the 180° symmetry and as soon as 6 rows is full, reverse the whole board and start at south again.

Of course, we still have to shift the solution for filling the board. To avoid creating a LargeInteger, we'll store the rowOffset shift in a new ShootoutMeteorPiece object, and use it in possible positions, and in solutions.
Object subclass: #ShootoutMeteorPiece
classVariableNames: ''
poolDictionaries: ''
category: 'Shootout'
forRow: rowOffset
row := rowOffset
We also delegate whether the piece fits on the board and how to fill the compact string solution (with implicit knowledge of our board rotation trick above 6th row).
fillSolution: aString ncol: ncol withColor: c
| offset |
^ row >= 6
ifTrue:
[offset := aString size + 1 - (row - 6 * ncol).
mask bitsDo: [:k | aString at: offset - k put: c]]
ifFalse:
[offset := row * ncol.
mask bitsDo: [:k | aString at: offset + k put: c]]

ShootoutMeteorBoard>>boardStringWithPieces: pArray
| board |
board := String new: ncell.
1 to: pArray size do: [:i | | c |
c := '0123456789*' at: i.
(pArray at: i) fillSolution: board ncol: ncol withColor: c].
^board

One subtle thing is that we already handled the symmetry by removing half orientations of piece #6. If we rotate the board, we have to consider the rotated orientations of piece #6. Hmm, this will be a bit technical to keep this optimization but we'll do.
First, we must stop generating pieces positions above twoRows:

[(cellNumber := westMask lowBit) > twoRows ifTrue: [^self].

Let now start the massacre of our previous code.
ShootoutMeteorBoard>>initializePossiblePositions
| positionsPerPiecePerCell |
positionsPerPiecePerCell := pieces collect: [:aPieceMask |
| possible iRot |
possible := (Array new: twoRows) collect: [:freeCell | Array new: 12 withAll: (ShootoutMeteorPiece new mask: 0)].
iRot := 0.
self rotationsOf: aPieceMask do: [:rotated |
iRot := iRot + 1.
self placesFor: rotated do: [:shifted |
(possible at: shifted lowBit) at: iRot put: (ShootoutMeteorPiece new mask:
((self hasEastOrWestIsland: shifted) ifTrue: [0] ifFalse: [shifted]))]].
possible].
positionsPerPiece := (1 to: 4) collect: [:rowStatus |
(1 to: twoRows - (rowStatus \\ 2 * ncol)) collect: [:cellNumber |
(1 to: pieces size)
collect: [:pieceNumber |
| possible |
possible := (positionsPerPiecePerCell at: pieceNumber) at: cellNumber.
((pieceNumber = 6)
ifTrue: [possible copyFrom: rowStatus // 3 * 6 + 1 to: rowStatus // 3 * 6 + 6]
ifFalse: [possible])
reject: [:aPiece | aPiece mask = 0
or: [rowStatus odd and: [self hasSouthIsland: aPiece mask]]]]]].
Here the rowStatus will be 1 on first row, 2 from row 2 to 6, 3 at row 10 (after we just rotated the board), 4 from rows 9 down to 7.
Notice also that we prefer to handle the cellNumber index first, before the pieceIndex, as that fits better our solve message.

Of course, since we moved the 6th piece trick above, we remove it from generation of rotations
| next |
aBlock value: (next := aPieceMask); value: (self flip: next).
5 timesRepeat:  [aBlock value: (next := self rotate: next); value: (self flip: next)]

Now that we lost the northEdge, why bother filling toward south ?
fill: aMask startingAt: pos count: countBlock
| filled |
countBlock value.
(self canShiftE: pos) ifTrue: [filled := self fill: filled startingAt: (self shiftE: pos) count: countBlock].
(self canShiftNE: pos) ifTrue: [filled := self fill: filled startingAt: (self shiftNE: pos) count: countBlock].
(self canShiftNW: pos) ifTrue: [filled := self fill: filled startingAt: (self shiftNW: pos) count: countBlock].
(self canShiftW: pos) ifTrue: [filled := self fill: filled startingAt: (self shiftW: pos) count: countBlock].
^filled

We will have to search islands only at south
or: [(self hasCornerIsland: aMask edge: southEdge edge: eastEdge)
or: [(self hasCornerIsland: aMask edge: southEdge edge: westEdge)]]

A bit more clean-ups to stop masks at sixRows:
fromString: aString
| rawString |
rawString := aString reject: [:e | e isSeparator].
ncell := rawString size.
ncol := (aString readStream upTo: Character cr) count: [:e | e isSeparator not].
twoRows := ncol * 2.
sixRows := ncol * 6.
pieces := rawString asSet asSortedArray collect: [:char |
self shiftSEmost:
(rawString inject: 0 into: [:pmask :c | pmask * 2 + (c = char ifTrue: [1] ifFalse: [0])])].
self initializePossiblePositions.

southEdge := 1 << ncol - 1.
southToNorthMasks := (0 to: 5) collect: [:i | southEdge << (ncol * i)].
eastEdge := 1<<sixRows-1/southEdge.
eastToWestMasks := (0 to: ncol - 1) collect: [:i | eastEdge << i].

^String new: ncol * 2 + 1 * 6 streamContents: [:aStream | self printMask: aPieceMask on: aStream]

6 to: 1 by: -1 do: [:irow |
irow odd ifTrue: [aStream space].
1 to: ncol do: [:icol |
aStream
ifTrue: [\$0]
ifFalse: [\$1]);
space].
aStream cr]

We are ready to write our new solving algorithm:
| nextFreeCell possibles colorBit iRow boardMask rowStatus |
colorMask = 0 ifTrue: [ ^solutionBlock value: (self boardStringWithPieces: pieces) ].
loopCount := loopCount + 1.
iRow := rowOff.
[(nextFreeCell := (boardMask + 1) lowBit) > twoRows]
whileTrue:
[boardMask := (iRow := iRow + 2) = 6
rowStatus :=
(iRow < 6 ifTrue: [0] ifFalse: [2]) +
(((iRow = 0 or: [iRow = 6]) and: [nextFreeCell <= ncol]) ifTrue: [1] ifFalse: [2]).
possibles := (positionsPerPiece at: rowStatus) at: nextFreeCell.
colorBit := 1.
1 to: pieces size do: [:pieceNumber |
ifFalse: [(possibles at: pieceNumber) do: [:aPiece |
ifTrue:
[pieces at: pieceNumber put: (aPiece forRow: iRow).
colorBit := colorBit * 2].
^nil

solvedPuzzleDo: solutionBlock
loopCount := 0.
self searchPuzzlesWithColorMask: 1 << pieces size - 1 boardMask: 0 rowOffset: 0 pieces: pieces copy ifFound: solutionBlock.
^loopCount

That is a bit more complex right now, that's the tribute to optimizations. Especially the rowStatus management cries for a refactoring. But right now, we just focus on CPU efficiency, not source sustainability.
Note that bitReverse: may be absent from a Pharo image. Just pick it in Squeak.
What is interesting is to spy what occurs at bit reversal
Just before the first reversal:
0 0 0 0 0
0 0 0 1 0
0 0 0 0 1
0 1 1 0 1
1 1 1 1 1
1 1 1 1 1
And just after:
1 1 1 1 1
1 1 1 1 1
1 0 1 1 0
1 0 0 0 0
0 1 0 0 0
0 0 0 0 0
The now two north-most rows are filled with ones, and it's on purpose that we did not shift before reversing, to keep a barrier toward north.

MessageTally spyOn: [ShootoutMeteorBoard solveDefault].
->1218297 loops

That's a 20% more loops than Part3, but that's possible, we didn't use the same order for filling the board, especially after the rotation we started back at southEast in a rather less constrained area, and we delayed detection of bad boards.