Tuesday, July 12, 2011

Xtreams version of the optimized prime factors factorial

I now want to use Xtreams on the optimized version of primeFactorFactorial:

optimizedPrimeFactorFactorialViaXtreams
    | primes uniqPowers highestPower prevPower primeGroup primesGroupedByPower bitIterator terms numGroupsEstimate |

Forming a generator of primes:

     primes := [:out | self orLessAllPrimesDo: [:prime | out put: prime ] ] reading.

Removing first prime, because we will handle power of 2 by a final bitShift:

     primes get.

Forming two collections, the group of primes having the same power, and their unique powers.

    numGroupsEstimate := 1 << (self highBit // 2).
    uniqPowers := (Array new: numGroupsEstimate) writing.
    primesGroupedByPower := (Array new: numGroupsEstimate) writing.
    prevPower := nil.
    primeGroup := Array new writing.
    primes do: [:p |
        | primePower |
        primePower := self - (self sumDigitsInBase: p) // (p - 1).
        prevPower
            ifNil:
                [uniqPowers put: primePower.
                prevPower := highestPower := primePower]
            ifNotNil: [prevPower ~= (prevPower := primePower)
                ifTrue:
                    [primesGroupedByPower put: primeGroup conclusion reading.
                    uniqPowers put: primePower.
                    primeGroup := Array new writing]].
        primeGroup put: p].
    primesGroupedByPower put: primeGroup conclusion reading.
    primesGroupedByPower := primesGroupedByPower conclusion reading.
    uniqPowers := uniqPowers conclusion reading.

Note that this part is not lazy, because I later want to reuse these two collections several times. This part is not the most elegant I came up with. It is much like using a (ending:) slicing sequence, but ending: is not splitting the groups where I want to...
Another option would be to count the number of equal powers, and use that in a (limiting:) slicing sequence, but limiting: is not taking a block argument...

Forming an iterator on the powers bits:

    bitIterator := [:out |
        | bitProbe |
        bitProbe := 1.
        [ bitProbe <= highestPower ]
            whileTrue:
            [out put: bitProbe.
            bitProbe := bitProbe bitShift: 1 ] ] reading.

For each bit, grouping primes having the bit set in their associated power, and raise the product of this group to the power of corresponding bit:

    terms := bitIterator collect: [:bit |
        primesGroupedByPower += 0.
        uniqPowers += 0.
        (primesGroupedByPower selecting: [:pp |
            pp += 0.
            (uniqPowers get bitAnd: bit) ~= 0])
                stitching rest karatsubaProduct raisedToInteger: bit].

Note that stitching is concatenating the substreams into a single stream, while += 0 is resetting the streams at the beginning.

Performing the product of products and shifting to account the power of 2:

    ^terms karatsubaProduct bitShift: self - self bitCount

There is still an operation I didn't Xtreamize, the karatsubaProduct. It consists in a recursive split. Not sure how to do it...

Now let us see if timing is decent:
Smalltalk garbageCollect.
[50000 optimizedPrimeFactorFactorialViaXtreams] timeToRun
-> 1321

Approximately the same as the non Xtreams version (+4%). That's pretty good.
But this version has a tricky bug: stitching is raising Incomplete if there is no substream, and this happens for some numbers. For example,
     9 optimizedPrimeFactorFactorialViaXtreams
produces the groups ((3) (5 7)) with the powers (4 1). When the bitIterator generator is evaluating with bit = 2, there is no group having a power with this bit set, Incomplete is raised by stitching, and caught by the generator which closes the stream without the final term (3 raisedTo: 4). Thus it returns a wrong result... Ouch! That's one reason for I dislike Exceptions.

My single test on a huge number did not fail, and I only discovered the bug with the generator free version:

optimizedPrimeFactorFactorialViaXtreams2
    | uniqPowers highestPower prevPower primeGroup primesGroupedByPower terms numGroupsEstimate |
    numGroupsEstimate := 1 << (self highBit >> 1).
    uniqPowers := (Array new: numGroupsEstimate) writing.
    primesGroupedByPower := (Array new: numGroupsEstimate) writing.
    prevPower := nil.
    highestPower := 0.
    primeGroup := Array new writing.
    Integer primesUpTo: self + 1 do: [:p | p = 2 ifFalse: [
        | primePower |
        primePower := self - (self sumDigitsInBase: p) // (p - 1).
        prevPower
            ifNil:
                [uniqPowers put: primePower.
                prevPower := highestPower := primePower]
            ifNotNil: [prevPower ~= (prevPower := primePower)
                ifTrue:
                    [primesGroupedByPower put: primeGroup conclusion reading.
                    uniqPowers put: primePower.
                    primeGroup := Array new writing]].
        primeGroup put: p]].
    primesGroupedByPower put: primeGroup conclusion reading.
    primesGroupedByPower := primesGroupedByPower conclusion reading.
    uniqPowers := uniqPowers conclusion reading.
    terms := (1 to: highestPower highBit) collect: [:bit |
        primesGroupedByPower += 0.
        uniqPowers += 0.
        [| product |
        product := (primesGroupedByPower selecting: [:pp |
            pp += 0.
            (uniqPowers get bitAt: bit) ~= 0])
                stitching rest karatsubaProduct.
        1 to: bit - 1 do: [:i | product := product squared].
        product] on: Incomplete do: [1]].
    ^terms karatsubaProduct bitShift: self - self bitCount

Overall, I find Xtreams powerfull, far more than Smalltalk Stream. But there are still things lacking. For example, the grouping was not obvious. The stitching of empty streams was surprising. I might need a readingStitching that works with an empty stream.

Xtreams is interesting both for its functional like aspects and for its potential use in parallelism, a lot like this. From this later point of view, I find it easy to pipe serial operations with Xtreams in case of a single assembly line. It also seems possible to fork processing for each element when they are independent (this is data parallelism - and could be applied for performing karatsubaProduct of each term in above example, and thus get a very efficient parallel version of factorial).
On the other hand, having two parallel assembly lines does not seem that easy. There is a duplicating: operation that should allow such parallel streams, but it necessarily uses a writeStream. I'd like the assembly chain to be as easy as:

fork := [:in :out1 :out2 | | item | item := in get. out1 put: item. out2 put: item] forking.
line1 := [:in :out | out put: in get op1] filtering.
line2 := [:in :out | out put: in get op2] filtering.
join := [:in1 :in2 :out | out put: (in1 get op3: in2 get)] joining.
factory := generator | fork | (line1 & line2) | join.

Funily, I used unix pipe | to denote serial and ampersand & to denote parallel, but I find the contrary more natural: || makes me think of parallelism, and & of sequencing. Just a point of view...

From above point of view, the differences between read and write streams seems a bit arbitrary. Filters are inherently both. The Generator pattern could be used to turn any WriteStream (push) into a ReadStream (pull), but it's current implementation has potential dead lock conditions. But wait, it's possible I missed some features, or miss-used some others, Xtreams is evolving rapidly. There is a new post about using Xtreams with parallel VM, I have to inquire a bit more!

No comments:

Post a Comment