In my previous post, I created a short, simple, sweet, and très élégant Perl 6 function to find all the primes up to a given maximum.
Unfortunately, Rakudo spent over 11 seconds to find the 168 primes up to 1000.
Now, granted, this algorithm is exceedingly naïve. It divides every candidate integer by every prime that came before it, resulting in some 91,873 trial divisions (and another 91,873 boolean tests). Even so, that boils down to some 280,000 CPU clock cycles per inner loop (on my old 2.33 GHz Intel Core 2 Duo, with one core tied behind its back).
As a comparison, an implementation of the same algorithm in Perl 5 took only 44 milliseconds to find the same 168 primes using the same number of loops, or about 1100 CPU clock cycles per loop. Still not blazing fast, but more like what I would expect from an interpreted language… Of course I have Rakudo running on the JVM, and I’m benchmarking it after the JIT compiler has squeezed every last little bit of performance it can out of the code it’s given. A nativeJava implementation ran even faster— but I’m getting ahead of myself.
On the other hand, the Perl 5 implementation is a bit messier than the Perl 6 version. Here is, to refresh your memory, the Perl 6 function:
sub primesany (Int $max) { my Int @primes; @primes.push($_) unless $_ %% any(@primes) for 2 .. $max; return @primes; }
For what Perl 6 did in a few lines Perl 5 required a dozen. (And my Java rendition used up twenty.) So maybe Perl 6 might be useful at least for scripting jobs that can benefit from its powerful expressiveness… and can afford to run 250 times slower than Perl 5.
Or maybe we can make the Perl 6 code run faster.
Junctions Are Slow
One of the first experiments I tried was to replace the any
operator with my own code. Again, P6’s expressive power came to the rescue:
sub infix:<%%any> ($n, @a > Bool) is looser(&infix:<==>) is assoc<non> { return True if $n %% $_ for @a; return False; } sub primesloop (Int $max) { my Int @primes; @primes.push($_) unless $_ %%any @primes for 2 .. $max; return @primes; }
As you can see here, I’ve replaced the %% any()
phrase above (pronounced “…is divisible by any…”) with a new, custom %%any
infix operator. Yeah, Perl 6 lets you do that.
Unfortunately, all is not skittles and roses, as Rakudo is only mumblemumblepercent implemented, and not even all at that. A few notes about the above code: Firstly, the signature of the above function might want to be (Int $n, Int @a > Bool)
, but unfortunately Rakudo doesn’t always handle array types correctly. (Actually, Rakudo’s builtin %%
operator is dynamically typed; and though for wellbehaved types it always returns a Bool
, it is not declared as such. Probably just as well. Perl 6 has even more powerful dynamic typing features than Perl 5. But that’s another post.)
Secondly, the precedence of our new operator really wants to be is equiv(&infix:<X>)
, that is, the same precedence as other list operators (like the X
operator); but unfortunately Rakudo doesn’t seem to recognize the X
operator as something that has an accessible precedence. So we compromise: we make it looser than the ==
operator (but tighter precedence than the operators under ==
). There are several precedences between ==
and X
, and I just have to be careful when using %%any
, because it doesn’t parse like other list infix operators.
I’m finding that I waste an awful lot of time in these situations. Because when the compiler chokes, I don’t know whether there’s something wrong with my code, or something wrong with the compiler. (Or both. In one case, Rakudo rightly rejected my code, but gave me a useless and incorrect error message.) So not only do I have to debug my code, but I also have to debug the tools, discover how far I can push the compiler before it breaks, and then develop a workaround. That offsets some of the benefit of being able to write in an expressive language, to be able to throw together elegant code quickly and be fairly certain that it will work the first time.
My %%any
operator doesn’t try to do anything fancy to speed up execution, except that it returns a result as soon as it can determine one. This reduces the number of times it has to run through the inner loop. This means primesloop(1000)
only has to do 15,620 trial divisions (an 83% reduction).
Rakudo’s Junction
implementation, I’m sure evaluates $n %% $_
for every member of @a
. Presumably, it could be optimized to be more intelligent, to evaluate operations on the Junction only so far as is necessary to obtain the desired result. But on the current Perl 6 platform, that is likely to be a fruitless optimization, as we’ll see later.
Even so, this simple adjustment produced a significant speedup, primesloop(1000)
found all 168 primes in that range in “only” 2.3 seconds. Still really, really slow. And unfortunately, it’s as fast as Perl 6 gets.
Eliminating More Needless Loops
The next logical optimization—and I’m sure the mathematicians reading this have been yelling at me through their computer screens, waiting for me to mention this—is to reduce the number of primes we need to examine before we discover a new prime.
Let’s take an example. Let’s say we’ve discovered the following list of primes: 2, 3, 5, 7
. Let’s say we’re now examining 11
to see whether it is a prime. (It is, but let’s not jump ahead of ourselves.) In the above code, we try 11 %% 2
(which is False
), and we try 11 %% 3
(also False
). Then for good measure we try 11 %% 5
and 11 %% 7
.
The thing is, if neither 2 nor 3 are factors of 11, then no other prime will be, either. How do we know? Because we can compute the next number that has at least two prime factors, none of which are less than 5: it’s 5 * 5
, which is 25. (That’s the smallest number that has at least two prime factors which are all at least as great as 5.) Or to turn it around, if we’re checking whether 11 is a prime number, we only have to check up to sqrt 11
.
So, the easy way in Perl 6:
sub infix:<upto> (@a, $max) is equiv(&infix:<%%any>) is assoc<left> { return (for @a { if $_ <= $max { $_ } else { last } }); } sub primesloopuptosqrt (Int $max) { my Int @primes; @primes.push($_) unless $_ %%any (@primes upto sqrt $_) for 2 .. $max; return @primes; }
There are a couple of new things happening here. The first is in the nifty new upto
operator, which is a list comprehension. (Yes, you can do list comprehensions in Perl 6.) The expression we’re returning is a for
loop, which evaluates to a list of all the values produced by all the iterations of the loop.
Actually, we’ve been using list comprehensions from the first. Remember this line?
@primes.push($_) unless $_ %% any(@primes) for 2 .. $max;
It takes a list, applies some condition to the elements of the list, and produces a list of results based on those elements. It just so happens that in this case, we’re only interested in the sideeffect of the result expression: @primes.push($_)
.
(The push
method returns the new array, so the value of this list comprehension is all the subarrays generated by each push
, all concatenated together. Fortunately, that result array never gets constructed, because we’re evaluating it in sink context. At least if I understand the theory correctly.)
Now we can construct a sublist of primes “upto” the squareroot of the number we’re examining, and only do trial divisions for those primes. This means primesloopuptosqrt(1000)
will execute its inner loop only 2,801 times. That’s an 82% reduction in the number of iterations, at the price of doing another comparison each iteration.
Unfortunately, this makes the code slower (not faster). It runs in about 4.3 seconds, which is almost twice as slow as the unoptimized code.
Arrays Are Slow?!
Could it be that list comprehension that’s causing the slowdown?
For now, very briefly (because this post is running long), I’ll leave you with one more series of experiments. I tried manually inlining the inner loops. To hell with elegant code. Let’s squeeze out every last bit of performance we can by eliminating any interim array variables and function calls.
There’s a lot of cruft in this code, but I’ve highlighted the significant lines, so you can easily see how the three functions differ.
sub primesinlineloop (Int $max) { my Int @primes; loop (my $n = 2; $n <= $max; $n++) { # Use a loopexit flag because labels are not implemented yet in Rakudo my Bool $isprime = True; # assume until proven otherwise for @primes > Int $prime { ($isprime = False, last) if $n %% $prime; } @primes.push($n) if $isprime; } return @primes; } sub primesinlineloopuptosqrt (Int $max) { my Int @primes; loop (my $n = 2; $n <= $max; $n++) { my Num $sqrtn = sqrt $n; # Use a loopexit flag because labels are not implemented yet in Rakudo my Bool $isprime = True; # assume until proven otherwise for @primes > Int $prime { ($isprime = False, last) if $n %% $prime; last if $prime >= $sqrtn; # OK to bail if ==; we checked %% above } @primes.push($n) if $isprime; } return @primes; } sub primesinlineloopuptointsqrt (Int $max) { my Int @primes; loop (my $n = 2; $n <= $max; $n++) { my Int $sqrtn = truncate sqrt $n; # Use a loopexit flag because labels are not implemented yet in Rakudo my Bool $isprime = True; # assume until proven otherwise for @primes > Int $prime { ($isprime = False, last) if $n %% $prime; last if $prime >= $sqrtn; # OK to bail if ==; we checked %% above } @primes.push($n) if $isprime; } return @primes; }
The top function, primesinlineloop()
, in its inner loop, performs a trial division for each prime so far discovered, until it finds one that is a factor. That first highlighted line really wants to read next CANDIDATE if $n %% $prime
(where CANDIDATE
is a label on the outer loop), and get rid of the $isprime
variable (which is not really a loopexit variable, though it does change state when the inner loop prematurely exits). Unfortunately, I tried next CANDIDATE
, and Rakudo spewed nonsense all over my terminal window, because loop labels are NYI.
The middle function, primesinlineloopuptosqrt
, does the same, except that it also bails out of the inner loop, with $isprime = True
, as soon as we check up to the squareroot of the candidate prime. At that point, we’ve checked for enough factors to know conclusively that our current candidate is indeed a prime. So we continue on to push it onto @primes
.
The bottom function, primesinlineloopuptointsqrt
, extends the idea further. In the outer loop, it truncates the squareroot to force integer comparisons in the inner loop’s conditional.
The results:

primesinlineloop(1000)
ran in 2.425 seconds (σ = 0.213 seconds). 
primesinlineloopuptosqrt(1000)
ran in 2.311 seconds (σ = 0.131 seconds). 
primesinlineloopuptointsqrt(1000)
ran in 2.274 seconds (σ = 0.161).
(I listed the sigma, that is, the measured standard deviation, of these benchmarks, because even though each function seems to be a little faster that the function before it, the difference between runs of the same function is much greater than the difference between functions.)
There are two interesting conclusions here. Firstly, even though the uptosqrt
version performs 82% fewer iterations than the plain loop
, that extra if $prime >= $sqrtn
is enough to undo this optimization. It still isn’t significantly faster, even though it does far fewer iterations of the inner loop.
Secondly, it appears that the list constructed and passed around by the upto
function above indeed was slowing down the code. I don’t know whether it was the list comprehension. (Shouldn’t have been.) Or whether it was just the process of stuffing values into a whole second set of storage locations… which in a normal universe wouldn’t be that big of a deal. In a normal universe, it might be worth a slight slowdown to get more maintainable code.
But in the Perl 6 universe? We’re dealing with seconds of time here. Tens of thousands of CPU cycles wasted just by adding or removing one seemingly insignificant line.
Recently, Perl 6 has undergone great improvements in its stability and performance. But clearly it still has a long way to go.
TimK
By Matt Oates December 7, 2013  10:00 am
Are your time measurements removing the startup time for the JVM version of Rakudo which is considerable?
By Tim King December 9, 2013  3:12 pm
Hi, Matt. In the test in this post, I’m timing just the execution of the function under test. I am making no attempt to subtract the functioncall time; however, the VM starting up, the parsing and compilation, and any other app setup, that all happens outside of the period being timed. And when run on the JVM, I’m also allowing a period for warmup (JIT optimization), which is also not timed. TimK
By Mouq April 13, 2014  2:18 pm
FWIW, the latest Rakudo on MoarVM gives these results on my Mac:
$ time perl6 e’sub primesany (Int $max) {
my Int @primes;
@primes.push($_) unless $_ %% any(@primes) for 2 .. $max;
return @primes;
}(1000)’
perl6 3.25s user 0.10s system 99% cpu 3.355 total
$ time perl6 e’sub infix: ($n, @a –> Bool) is looser(&infix:) is assoc {
return True if $n %% $_ for @a;
return False;
}
sub primesloop (Int $max) {
my Int @primes;
@primes.push($_) unless $_ %%any @primes for 2 .. $max;
return @primes;
}(1000)’
perl6 1.15s user 0.06s system 99% cpu 1.223 total
$ time perl6 e’sub primesinlineloopuptointsqrt (Int $max) {
my Int @primes;
loop (my $n = 2; $n Int $prime {
($isprime = False, last) if $n %% $prime;
last if $prime >= $sqrtn; # OK to bail if ==; we checked %% above
}
@primes.push($n) if $isprime;
}
return @primes;
}(1000)’
perl6 0.69s user 0.05s system 99% cpu 0.746 total
By Mouq April 13, 2014  2:19 pm
Sorry for the horrendous formatting 🙁
By Matt Oates August 30, 2016  6:41 am
For updated timings and benchmarks check out https://gist.github.com/MattOates/c2e19950f46d1a1c241a