-
Notifications
You must be signed in to change notification settings - Fork 227
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Prime Number Functions #400
base: develop
Are you sure you want to change the base?
Conversation
FYI there's a Miller Rabin test in Multiprecision already. |
@mborland : OMG thank you; we've needed a prime sieve for so long. |
@mborland : I would definitely change the name from |
@mborland : IIRC, the Joy of Factoring also creates bitsets which set prime at bit i and has it zeroed at composites. Is this a useful API? |
@NAThompson That sounds like wheel factorization. That method is added to the sieve of Eratosthenes, but would not be applicable here. |
Would it be more ergonomic to have std::vector<int64_t> primes(1000); // gimme 1000 primes
prime_sieve(primes.begin(), primes.end()); (I need to stay in my lane here; @jeremy-murphy is much better at ergonomics.) |
@mborland : I have begun a performance comparison with Kim Walish's prime sieve. Here is the code: #include <vector>
#include <boost/math/special_functions/prime_sieve.hpp>
#include <benchmark/benchmark.h>
#include <primesieve.hpp>
template <class Z>
void prime_sieve(benchmark::State& state)
{
Z upper = static_cast<Z>(state.range(0));
for(auto _ : state)
{
std::vector<Z> primes;
benchmark::DoNotOptimize(boost::math::prime_sieve(static_cast<Z>(2), upper, std::back_inserter(primes)));
}
state.SetComplexityN(state.range(0));
}
BENCHMARK_TEMPLATE(prime_sieve, int64_t)->RangeMultiplier(2)->Range(1 << 1, 1 << 22)->Complexity();
template <class Z>
void kimwalish_primes(benchmark::State& state)
{
Z upper = static_cast<Z>(state.range(0));
for (auto _ : state)
{
std::vector<Z> primes;
primesieve::generate_primes(upper, &primes);
benchmark::DoNotOptimize(primes.back());
}
state.SetComplexityN(state.range(0));
}
BENCHMARK_TEMPLATE(kimwalish_primes, int64_t)->RangeMultiplier(2)->Range(1 << 1, 1 << 22)->Complexity();
BENCHMARK_MAIN(); and the results:
So it looks like there is 1-2 orders of magnitude of performance improvement left in the boost implementation; presumably we need to find it. . . |
@NAThompson It would be fairly easy to create light wrappers to the current I will have to do some digging to see where more performance can be squeezed out. |
Nah, if it's not obviously a better way to do it, I'm not interested in it. |
@mborland : Also, make sure to tag your commit messages with |
I just went through The Joy of Factoring to refresh my memory on how these sieves work. I implemented algorithm 8.2 of that book, which the author claims runs in O(J*log(log(J)) time. He also references P.A. Pritchard who has given an algorithm which runs in O(J/log(log(J)) time; see A sublinear additive sieve for finding prime numbers, Comm. ACM 24, 1981. In any case, this is how my naive implementation of algorithm 8.2 looks:
And the results:
Algorithm 8.2 is actually competitive with Kim Walish for smaller N but starts to lag for larger N. |
@NAThompson I am seeing a similar result using a segmented sieve. Currently faster than kimwalish, and |
Just ran this benchmark under So Kim Walish's prime generator does indeed compute logarithms and fill up a table of primes; quite a bit of time in However if you look at the Joy of Factoring algorithm, it spends way more time in the So there are some opportunities there; you'll see that I'm only checking odd numbers after 2, there's probably some way to extend that to making the fill faster. |
@NAThompson I have found some performance improvements using C-style arrays instead of |
349ca6f
to
a2aedbb
Compare
I think that for the other optimisations to work, and for the wheel to be in synch with the underlying bytes, we have to use a compacted data structure - one that contains only the 8 values in the wheel that form the prime candidates on each rotation (ie the spokes of the wheel). Soronson in this paper: https://www.researchgate.net/publication/2274357_Trading_Time_for_Space_in_Prime_Number_Sieves has some very lucid descriptions of the various sieves and how to index a compacted sieve (see "reducing space with a wheel" in the above ref). Note that this is normally much slower than a straight SoE as the inner loop become much more complex. However, once the small and medium prime optimisations I listed above are applied they are all effectively the same complexity, and now just differ in how far they skip forward with each loop / wheel turn. Another way to look at this: my odd's only sieve is actually a wheel sieve with just one prime (2) in the wheel. So you get 16 numbers and 8 candidate primes per byte. A 2,3,5 wheel has 8 candidates per wheel turn (30), so we would normally start with all bits 1, and each byte represents the 8 possible primes in each block of 30 values of a full wheel turn. Now lets suppose you take that data structure and sieve out 7, 11 and 13 as well. Now you have a repeating bit pattern 71113 bytes long you can use to initialize the sieve, you can then start sieving at 17 rather than at 7 as you normally would. I hope that makes sense, but maybe I misunderstood you? |
Each turn of the wheel being represented in one 8 bit byte since there are only 8 possible primes using the 2,3,5 wheel was what I was getting at. Let me see if I can get this working with the 2,3,5 wheel and then expanding to a larger wheel should be easy enough. In |
types until the max of size_t has been reached [CI SKIP]
@jzmaddock I am making some good headway on your list of additions and optimizations. In commit eafbefc all multiples of the wheel basis in the bitset have been removed so each turn of the wheel now takes only 8 bits to represent. The implementation passes all unit tests, but the sieve is many times slower than it was, especially when using |
Apologies for being late back here - I'll take a look as soon as I can, but I would expect the code to be significantly slower until the sieving optimisations are in place. Sorenson notes this in his paper, the reason being that indexing into the sieve is now much more complex until the optimisations are applied after which all the sieves (compressed or not, wheel or not) are equal. Hope this makes sense! |
Last known point where all composite tests pass [CI SKIP]
An initial addition of two prime number functions:
prime_sieve
is a linear prime sieve algorithm. Currently benchmarks O(n).prime_range
returns all the prime numbers in the range [lower_bound, upper_bound]. For the first 1000 primes it can use the already built in lookup tables; outside of that it will callprime_sieve
. This is the function intended for end users to call.Future additions would include a Miller-Rabin primality test.