Saturday, April 7, 2012

Meteor contest - Part 4 the SmallInteger

In previous step, we were at about 1,000,000 loops for only 10,000 productive.
And we have about a factor 100 to gain compared to C++.
I'm not sure we can be that clever and avoid any false solution.
Yes, sure, we cannot reach speed of C++ yet with our VM, even the Cog one.
But we can do something to reduce a single loop cost.

One thing to notice is that the 1st part generating all pieces positions do a lot of useless work. Until pieces reach the north edge, we repeat the same bit pattern every two rows. That's a clue indicating something is wrong.
Moreover, in 32 bit Squeak image, we only have 30 bits left for representing a positive SmallInteger.
That's exactly 6 rows. Then we'll start memory intensive LargePositiveInteger allocations, and more involving computations too.

Now the idea: instead of shifting the pieces position north, why not shift the board south every time the pair of bottom rows is full ?
Since we fill east to west and from south to north, this is doable and we then have to generate the possible pieces positions only for the first pair of row. For example
0 0 0 0 0
 0 0 0 0 0
0 0 0 0 0
 0 0 0 0 0
0 0 0 0 0
 0 0 0 1 0
0 0 0 1 0
 0 1 1 1 0
1 1 1 1 1
 1 1 1 1 1
can be shifted two rows south

0 0 0 0 0
 0 0 0 0 0

0 0 0 0 0
 0 0 0 0 0
0 0 0 0 0
 0 0 0 1 0
0 0 0 1 0
 0 1 1 1 0 
Since a piece can span over 5 rows, and we consider filling on row 1 and 2, the northern two rows are useless and the board requires only 6 rows, that is 30 bits, a positive SmallInteger...

Good. For the first row, we can use a specially crafted southPossiblePositions.
But how to handle north rows above row 6 and prevent them to spread out of north edge?
Well, we don't have to care of north. We can use the 180° symmetry and as soon as 6 rows is full, reverse the whole board and start at south again.

Of course, we still have to shift the solution for filling the board. To avoid creating a LargeInteger, we'll store the rowOffset shift in a new ShootoutMeteorPiece object, and use it in possible positions, and in solutions.
Object subclass: #ShootoutMeteorPiece
    instanceVariableNames: 'mask row'
    classVariableNames: ''
    poolDictionaries: ''
    category: 'Shootout'
mask: aPieceMask
    mask := aPieceMask
forRow: rowOffset
    row := rowOffset
mask
    ^mask
We also delegate whether the piece fits on the board and how to fill the compact string solution (with implicit knowledge of our board rotation trick above 6th row).
fitOnBoard: aBoardMask
    ^0 == (aBoardMask bitAnd: mask)
fillSolution: aString ncol: ncol withColor: c
    | offset |
    ^ row >= 6
        ifTrue:
            [offset := aString size + 1 - (row - 6 * ncol).
            mask bitsDo: [:k | aString at: offset - k put: c]]
        ifFalse:
            [offset := row * ncol.
            mask bitsDo: [:k | aString at: offset + k put: c]]

ShootoutMeteorBoard>>boardStringWithPieces: pArray
    | board |
    board := String new: ncell.
    1 to: pArray size do: [:i | | c |
        c := '0123456789*' at: i.
        (pArray at: i) fillSolution: board ncol: ncol withColor: c].
    ^board

One subtle thing is that we already handled the symmetry by removing half orientations of piece #6. If we rotate the board, we have to consider the rotated orientations of piece #6. Hmm, this will be a bit technical to keep this optimization but we'll do.
First, we must stop generating pieces positions above twoRows:
ShootoutMeteorBoard>>placesFor: aPieceMask do: aBlock
    | westMask eastMask cellNumber |
    eastMask := self shiftSEmost: aPieceMask.
  
    [[westMask := eastMask.
    [(cellNumber := westMask lowBit) > twoRows ifTrue: [^self].
    (self hasEastOrWestIsland: westMask) ifFalse: [aBlock value: westMask].
    self canShiftW: westMask] whileTrue: [westMask := self shiftW: westMask].
    self canShiftNE: eastMask] whileTrue: [eastMask := self shiftNE: eastMask].
    self canShiftNW: eastMask] whileTrue: [eastMask := self shiftNW: eastMask]

Let now start the massacre of our previous code.
ShootoutMeteorBoard>>initializePossiblePositions
    | positionsPerPiecePerCell |
    positionsPerPiecePerCell := pieces collect: [:aPieceMask |
        | possible iRot |
        possible := (Array new: twoRows) collect: [:freeCell | Array new: 12 withAll: (ShootoutMeteorPiece new mask: 0)].
        iRot := 0.
        self rotationsOf: aPieceMask do: [:rotated |
            iRot := iRot + 1.
            self placesFor: rotated do: [:shifted |
                (possible at: shifted lowBit) at: iRot put: (ShootoutMeteorPiece new mask:
                    ((self hasEastOrWestIsland: shifted) ifTrue: [0] ifFalse: [shifted]))]].
        possible].
    positionsPerPiece := (1 to: 4) collect: [:rowStatus |
        (1 to: twoRows - (rowStatus \\ 2 * ncol)) collect: [:cellNumber |
            (1 to: pieces size)
                collect: [:pieceNumber |
                    | possible |
                    possible := (positionsPerPiecePerCell at: pieceNumber) at: cellNumber.
                     ((pieceNumber = 6)
                        ifTrue: [possible copyFrom: rowStatus // 3 * 6 + 1 to: rowStatus // 3 * 6 + 6]
                        ifFalse: [possible])
                            reject: [:aPiece | aPiece mask = 0
                                or: [rowStatus odd and: [self hasSouthIsland: aPiece mask]]]]]].
Here the rowStatus will be 1 on first row, 2 from row 2 to 6, 3 at row 10 (after we just rotated the board), 4 from rows 9 down to 7.
Notice also that we prefer to handle the cellNumber index first, before the pieceIndex, as that fits better our solve message.

Of course, since we moved the 6th piece trick above, we remove it from generation of rotations
rotationsOf: aPieceMask do: aBlock
    | next |
    aBlock value: (next := aPieceMask); value: (self flip: next).
    5 timesRepeat:  [aBlock value: (next := self rotate: next); value: (self flip: next)]

Now that we lost the northEdge, why bother filling toward south ?
fill: aMask startingAt: pos count: countBlock
    | filled |
    (aMask bitAnd: pos) = 0 ifFalse: [^aMask].
    countBlock value.
    filled := aMask + pos.
    (self canShiftE: pos) ifTrue: [filled := self fill: filled startingAt: (self shiftE: pos) count: countBlock].
    (self canShiftNE: pos) ifTrue: [filled := self fill: filled startingAt: (self shiftNE: pos) count: countBlock].
    (self canShiftNW: pos) ifTrue: [filled := self fill: filled startingAt: (self shiftNW: pos) count: countBlock].
    (self canShiftW: pos) ifTrue: [filled := self fill: filled startingAt: (self shiftW: pos) count: countBlock].
    ^filled

We will have to search islands only at south
hasSouthIsland: aMask
    ^(self hasInsetZero: (southEdge bitAnd: aMask))
    or: [(self hasCornerIsland: aMask edge: southEdge edge: eastEdge)
    or: [(self hasCornerIsland: aMask edge: southEdge edge: westEdge)]]

A bit more clean-ups to stop masks at sixRows:
fromString: aString
    | rawString |
    rawString := aString reject: [:e | e isSeparator].
    ncell := rawString size.
    ncol := (aString readStream upTo: Character cr) count: [:e | e isSeparator not].
    twoRows := ncol * 2.
    sixRows := ncol * 6.
    self initializeRowColMasks.
    pieces := rawString asSet asSortedArray collect: [:char |
        self shiftSEmost:
            (rawString inject: 0 into: [:pmask :c | pmask * 2 + (c = char ifTrue: [1] ifFalse: [0])])].
    self initializePossiblePositions.

initializeRowColMasks
    southEdge := 1 << ncol - 1.
    southToNorthMasks := (0 to: 5) collect: [:i | southEdge << (ncol * i)].
    eastEdge := 1<<sixRows-1/southEdge.
    eastToWestMasks := (0 to: ncol - 1) collect: [:i | eastEdge << i].
    westEdge := eastToWestMasks last.
    oddRowsMask := 1<<sixRows-1/(1<<twoRows-1)*southEdge.
    evenRowsMask := oddRowsMask << ncol.
    northWestMask := westEdge bitAnd: evenRowsMask.
    northEastMask := eastEdge bitAnd: oddRowsMask.
    southWestMask := southEdge bitOr: (westEdge bitAnd: evenRowsMask).
    southEastMask := southEdge bitOr: (eastEdge bitAnd: oddRowsMask).

printMask: aPieceMask
    ^String new: ncol * 2 + 1 * 6 streamContents: [:aStream | self printMask: aPieceMask on: aStream]

printMask: aPieceMask on: aStream
    6 to: 1 by: -1 do: [:irow |
        | mask |
        irow odd ifTrue: [aStream space].
        mask := 1 << (southToNorthMasks at: irow) highBit.
        1 to: ncol do: [:icol |
            aStream
                nextPut: ((aPieceMask bitAnd: (mask := mask >> 1)) = 0
                    ifTrue: [$0]
                    ifFalse: [$1]);
                space].
        aStream cr]

We are ready to write our new solving algorithm:
searchPuzzlesWithColorMask: colorMask boardMask: bMask rowOffset: rowOff pieces: pArray ifFound: solutionBlock
    | nextFreeCell possibles colorBit iRow boardMask rowStatus |
    colorMask = 0 ifTrue: [ ^solutionBlock value: (self boardStringWithPieces: pieces) ].
    loopCount := loopCount + 1.
    boardMask := bMask.
    iRow := rowOff.
    [(nextFreeCell := (boardMask + 1) lowBit) > twoRows]
        whileTrue:
            [boardMask := (iRow := iRow + 2) = 6
                ifTrue: [boardMask bitReverse: sixRows]
                ifFalse: [ boardMask >> twoRows]].
    rowStatus :=
        (iRow < 6 ifTrue: [0] ifFalse: [2]) +
        (((iRow = 0 or: [iRow = 6]) and: [nextFreeCell <= ncol]) ifTrue: [1] ifFalse: [2]).
    possibles := (positionsPerPiece at: rowStatus) at: nextFreeCell.
    colorBit := 1.
    1 to: pieces size do: [:pieceNumber |
        (colorMask bitAnd: colorBit) = 0
            ifFalse: [(possibles at: pieceNumber) do: [:aPiece |
                (aPiece fitOnBoard: boardMask)
                    ifTrue:
                        [pieces at: pieceNumber put: (aPiece forRow: iRow).
                        self searchPuzzlesWithColorMask: colorMask - colorBit boardMask: boardMask + aPiece mask rowOffset: iRow pieces: pArray ifFound: solutionBlock]]].
        colorBit := colorBit * 2].
    ^nil

solvedPuzzleDo: solutionBlock
    loopCount := 0.
    self searchPuzzlesWithColorMask: 1 << pieces size - 1 boardMask: 0 rowOffset: 0 pieces: pieces copy ifFound: solutionBlock.
    ^loopCount

That is a bit more complex right now, that's the tribute to optimizations. Especially the rowStatus management cries for a refactoring. But right now, we just focus on CPU efficiency, not source sustainability.
Note that bitReverse: may be absent from a Pharo image. Just pick it in Squeak.
What is interesting is to spy what occurs at bit reversal
Just before the first reversal:
0 0 0 0 0
 0 0 0 1 0
0 0 0 0 1
 0 1 1 0 1
1 1 1 1 1
 1 1 1 1 1
And just after:
1 1 1 1 1
 1 1 1 1 1
1 0 1 1 0
 1 0 0 0 0
0 1 0 0 0
 0 0 0 0 0
The now two north-most rows are filled with ones, and it's on purpose that we did not shift before reversing, to keep a barrier toward north.

What about the performance?

MessageTally spyOn: [ShootoutMeteorBoard solveDefault].
->1218297 loops

That's a 20% more loops than Part3, but that's possible, we didn't use the same order for filling the board, especially after the rotation we started back at southEast in a rather less constrained area, and we delayed detection of bad boards.
But now, the cost is about 1.7 seconds instead of 7.

Great! This would place Smalltalk at a not so ridiculous rank among those shootout languages.
And we still have room to reduce the number of combinations.
The code is at http://ss3.gemstone.com/ss/Shootout/Shootout.blog-nice.5.mcz

No comments:

Post a Comment