Version VMMaker.oscog-nice.271 is quite safe on a little endian machine, Kernel tests are passing (you need to merge the Kernel version which is provided in same repository along with tests to update the montgomeryDigitLength to 32 instead of 8 as currently hardcoded in raisedTo:modulo:).
But don't ever think of using a modified StackInterpreter VM on a big-endian machine, nor on an image saved from a big endian machine! The Big-Endian bootstrap is not ready yet. It will require:
- an additional bit in image format for specifying that LargeIntegers have digits stored in 32-bits-bigEndian format.
- a modified VM initialization phase to swap or unswap the instance of LargeIntegers (I began looking at updateObjectsPostByteSwapFrom:to:, but above bit must be first allocated).
- an image side modification of digitAt: and digitAt:put: to call the primitives that swap bytes to mimic 8-bit digit order.
There is room for further improvment because the normalization step copies the result if ever its length is not a multiple of 4. Instead, we would just have to let allocated size always be multiple of 4, and connect digitLength to a specialized primitive for handling the leading zeroes.
The micro benchmark is now a bit modified:
#(100 1000 10000) do: [:n |
| p1 p2 |
p1 := 1<<n - 1.
p2 := p1 sqrtFloor*p1+1.
Transcript cr; show: n printString , ' bits'.
Transcript cr; show: '+' , ' -> '; show: [p1+p2] bench.
Transcript cr; show: '-' , ' -> '; show: [p2-p1] bench.
Transcript cr; show: '*' , ' -> '; show: [p1*p2] bench.
Transcript cr; show: '//' , ' -> '; show: [p2//p1] bench.
Transcript cr; show: '\\' , ' -> '; show: [p2\\p1] bench.
Transcript cr; show: 'quo:' , ' -> '; show: [p2 quo: p1] bench.
Transcript cr; show: 'rem:' , ' -> '; show: [p2 rem: p1] bench.
Transcript cr; show: 'quo: 10' , ' -> '; show: [p1 quo: 10] bench.
Transcript cr; show: 'rem: 10' , ' -> '; show: [p1 rem: 10] bench.
Transcript cr; show: '<<53' , ' -> '; show: [p1 << 53] bench.
Transcript cr; show: '>>35' , ' -> '; show: [p2 >> 35] bench.
I experienced a problem of repeatability of timings on these micro-benchmarks since my last post. The reference VM was COG 2640 . Now with COG 2672, +, - ,<< and >> apparently have severe lost of performance. It might be some kind of artefact, because 2672 does not experience any noticeable slowdown in macro benchmarks... So we will only consider 3rd column containing improvment ratio.
|Number of operations per second in 8-bit vs 32-bit Large Integer Plugin|
Macro benchmark won't change significantly, unless they are oriented toward LargeInteger crunching...
But for LargeInteger crunching, speed-up is noticeable:
[ArbitraryPrecisionFloatTest suite run] timeToRun.
-> 29250 COG 2640 (eem.238)
-> 29835 COG 2672 (eem.255)
-> 20823 modified nice.271
select: [:e | (e category beginsWith: 'KernelTests-Numbers')
and: [e inheritsFrom: TestCase]])
do: [:e | e suite run]] timeToRun.
-> 3851 COG 2640
-> 3642 COG 2672
-> 1948 modified nice.271
Same kind of ratio with Shootout piDigits...
The speedup might also be very interesting in cryptography.