## Tuesday, July 12, 2011

### Xtreams version of the optimized prime factors factorial

I now want to use Xtreams on the optimized version of primeFactorFactorial:

optimizedPrimeFactorFactorialViaXtreams
| primes uniqPowers highestPower prevPower primeGroup primesGroupedByPower bitIterator terms numGroupsEstimate |

Forming a generator of primes:

primes := [:out | self orLessAllPrimesDo: [:prime | out put: prime ] ] reading.

Removing first prime, because we will handle power of 2 by a final bitShift:

primes get.

Forming two collections, the group of primes having the same power, and their unique powers.

numGroupsEstimate := 1 << (self highBit // 2).
uniqPowers := (Array new: numGroupsEstimate) writing.
primesGroupedByPower := (Array new: numGroupsEstimate) writing.
prevPower := nil.
primeGroup := Array new writing.
primes do: [:p |
| primePower |
primePower := self - (self sumDigitsInBase: p) // (p - 1).
prevPower
ifNil:
[uniqPowers put: primePower.
prevPower := highestPower := primePower]
ifNotNil: [prevPower ~= (prevPower := primePower)
ifTrue:
uniqPowers put: primePower.
primeGroup := Array new writing]].
primeGroup put: p].

Note that this part is not lazy, because I later want to reuse these two collections several times. This part is not the most elegant I came up with. It is much like using a (ending:) slicing sequence, but ending: is not splitting the groups where I want to...
Another option would be to count the number of equal powers, and use that in a (limiting:) slicing sequence, but limiting: is not taking a block argument...

Forming an iterator on the powers bits:

bitIterator := [:out |
| bitProbe |
bitProbe := 1.
[ bitProbe <= highestPower ]
whileTrue:
[out put: bitProbe.
bitProbe := bitProbe bitShift: 1 ] ] reading.

For each bit, grouping primes having the bit set in their associated power, and raise the product of this group to the power of corresponding bit:

terms := bitIterator collect: [:bit |
primesGroupedByPower += 0.
uniqPowers += 0.
(primesGroupedByPower selecting: [:pp |
pp += 0.
(uniqPowers get bitAnd: bit) ~= 0])
stitching rest karatsubaProduct raisedToInteger: bit].

Note that stitching is concatenating the substreams into a single stream, while += 0 is resetting the streams at the beginning.

Performing the product of products and shifting to account the power of 2:

^terms karatsubaProduct bitShift: self - self bitCount

There is still an operation I didn't Xtreamize, the karatsubaProduct. It consists in a recursive split. Not sure how to do it...

Now let us see if timing is decent:
Smalltalk garbageCollect.
[50000 optimizedPrimeFactorFactorialViaXtreams] timeToRun
-> 1321

Approximately the same as the non Xtreams version (+4%). That's pretty good.
But this version has a tricky bug: stitching is raising Incomplete if there is no substream, and this happens for some numbers. For example,
9 optimizedPrimeFactorFactorialViaXtreams
produces the groups ((3) (5 7)) with the powers (4 1). When the bitIterator generator is evaluating with bit = 2, there is no group having a power with this bit set, Incomplete is raised by stitching, and caught by the generator which closes the stream without the final term (3 raisedTo: 4). Thus it returns a wrong result... Ouch! That's one reason for I dislike Exceptions.

My single test on a huge number did not fail, and I only discovered the bug with the generator free version:

optimizedPrimeFactorFactorialViaXtreams2
| uniqPowers highestPower prevPower primeGroup primesGroupedByPower terms numGroupsEstimate |
numGroupsEstimate := 1 << (self highBit >> 1).
uniqPowers := (Array new: numGroupsEstimate) writing.
primesGroupedByPower := (Array new: numGroupsEstimate) writing.
prevPower := nil.
highestPower := 0.
primeGroup := Array new writing.
Integer primesUpTo: self + 1 do: [:p | p = 2 ifFalse: [
| primePower |
primePower := self - (self sumDigitsInBase: p) // (p - 1).
prevPower
ifNil:
[uniqPowers put: primePower.
prevPower := highestPower := primePower]
ifNotNil: [prevPower ~= (prevPower := primePower)
ifTrue:
uniqPowers put: primePower.
primeGroup := Array new writing]].
primeGroup put: p]].
terms := (1 to: highestPower highBit) collect: [:bit |
primesGroupedByPower += 0.
uniqPowers += 0.
[| product |
product := (primesGroupedByPower selecting: [:pp |
pp += 0.
(uniqPowers get bitAt: bit) ~= 0])
stitching rest karatsubaProduct.
1 to: bit - 1 do: [:i | product := product squared].
product] on: Incomplete do: [1]].
^terms karatsubaProduct bitShift: self - self bitCount

Overall, I find Xtreams powerfull, far more than Smalltalk Stream. But there are still things lacking. For example, the grouping was not obvious. The stitching of empty streams was surprising. I might need a readingStitching that works with an empty stream.

Xtreams is interesting both for its functional like aspects and for its potential use in parallelism, a lot like this. From this later point of view, I find it easy to pipe serial operations with Xtreams in case of a single assembly line. It also seems possible to fork processing for each element when they are independent (this is data parallelism - and could be applied for performing karatsubaProduct of each term in above example, and thus get a very efficient parallel version of factorial).
On the other hand, having two parallel assembly lines does not seem that easy. There is a duplicating: operation that should allow such parallel streams, but it necessarily uses a writeStream. I'd like the assembly chain to be as easy as:

fork := [:in :out1 :out2 | | item | item := in get. out1 put: item. out2 put: item] forking.
line1 := [:in :out | out put: in get op1] filtering.
line2 := [:in :out | out put: in get op2] filtering.
join := [:in1 :in2 :out | out put: (in1 get op3: in2 get)] joining.
factory := generator | fork | (line1 & line2) | join.

Funily, I used unix pipe | to denote serial and ampersand & to denote parallel, but I find the contrary more natural: || makes me think of parallelism, and & of sequencing. Just a point of view...

From above point of view, the differences between read and write streams seems a bit arbitrary. Filters are inherently both. The Generator pattern could be used to turn any WriteStream (push) into a ReadStream (pull), but it's current implementation has potential dead lock conditions. But wait, it's possible I missed some features, or miss-used some others, Xtreams is evolving rapidly. There is a new post about using Xtreams with parallel VM, I have to inquire a bit more!