Quick question about PP and PP_INVERTED defines

Brett Kuntz kuntz at shaw.ca
Sun Jun 19 03:26:19 CEST 2022


Hello Torbjörn and thank you for your feedback.

1. We always look for improvements! Which ones do you have in mind? 

I am still analyzing and performance-testing a few things, so I will have to get back to you with my findings. One small, easy improvement could be separate functions for different word sizes when testing smaller primes.

"Fast Primality Testing for Integers That Fit into a Machine Word" - Michal Forišek and Jakub Jančina

The algorithms in that paper might be faster for mpz_probab_prime_p's section commented "/* Handle small and negative n.  */" and for the code below it commented "/* Do more dividing.  We collect small primes, using umul_ppmm, until we", since both paths call isprime. That may yield a small improvement.

For this line:

ln2 = mpz_sizeinbase (n, 2);	/* FIXME: tune this limit */

For large inputs (32 million bits) this test will loop millions of times; is this beneficial? For smaller candidates the trial division is efficient, because many primes can be packed into a 64-bit word and a single call to MPN_MOD_OR_MODEXACT_1_ODD can then reveal whether any of them divides the input. But as the collected primes climb into the millions, only 2 or 3 fit into a 64-bit word, and on top of that the statistical chance of any of those large primes dividing the input approaches 0. I am planning to run tests to find the point where trial division stops paying for itself; past that point we might as well move straight on to the Miller base-2 test... speaking of which:

2. Ouch, insane page faults are no good.

They seem to be coming from mpz_powm, but I have not looked at this function yet. If it is allocating lots of temporary memory that is almost certainly going to be used, then it makes sense to allocate and commit that memory all at once in 1 or 2 system calls (mlock on *nix, VirtualAlloc on Windows) rather than just reserving it and letting the OS page-fault every 4 kilobytes (each fault raising an exception and a lengthy trip through the kernel handler). For very large primes (millions of digits) the page faults are so problematic that the process's CPU usage dropped below 50%, with most of the time spent in the kernel handling the faults.

3. Which "others" do you have in mind?

In gmp-impl.h, at approximately line 3941 ("/* Stuff used by mpn/generic/perfsqr.c and mpz/prime_p.c */"), there are the following defines:

#if GMP_NUMB_BITS == 2
#define PP 0x3					/* 3 */
#define PP_FIRST_OMITTED 5
#endif
#if GMP_NUMB_BITS == 4
#define PP 0xF					/* 3 x 5 */
#define PP_FIRST_OMITTED 7
#endif
#if GMP_NUMB_BITS == 8
#define PP 0x69					/* 3 x 5 x 7 */
#define PP_FIRST_OMITTED 11
#endif
#if GMP_NUMB_BITS == 16
#define PP 0x3AA7				/* 3 x 5 x 7 x 11 x 13 */
#define PP_FIRST_OMITTED 17
#endif
#if GMP_NUMB_BITS == 32
#define PP 0xC0CFD797L				/* 3 x 5 x 7 x 11 x ... x 29 */
#define PP_INVERTED 0x53E5645CL
#define PP_FIRST_OMITTED 31
#endif
#if GMP_NUMB_BITS == 64
#define PP CNST_LIMB(0xE221F97C30E94E1D)	/* 3 x 5 x 7 x 11 x ... x 53 */
#define PP_INVERTED CNST_LIMB(0x21CFE6CFC938B36B)
#define PP_FIRST_OMITTED 59
#endif
#ifndef PP_FIRST_OMITTED
#define PP_FIRST_OMITTED 3
#endif

There is a define for PP_INVERTED for 32 and 64 bits but not for 2, 4, 8, or 16. Since I was not sure what PP_INVERTED actually was, I was confused as to why it exists for some of those word sizes but not all of them.

4. You claim GMP is slow, not us.

I did not make that claim. I only said that for very large primes (100,000 bits to 32 million bits) it is slow, specifically because of page faults. It was while looking into the page-fault problem that I saw the potential for other improvements. I will run some tests, implement my improvements, and then let you know when I am finished.

-Brett Kuntz



----- Original Message -----
From: "Torbjörn Granlund" <tg at gmplib.org>
To: "Brett Kuntz" <kuntz at shaw.ca>
Cc: gmp-discuss at gmplib.org
Sent: Saturday, June 18, 2022 9:09:23 AM
Subject: Re: Quick question about PP and PP_INVERTED defines

Brett Kuntz <kuntz at shaw.ca> writes: 

Hello, for a quick background on my question: I have been working on 
large (100k++) prime numbers recently and discovered that the 
primality testing algorithms could use some improvements, both to 

We always look for improvements! Which ones do you have in mind? 

reduce cpu operations and also to reduce insane page faults (50k 
faults per second in some cases). I have been analyzing 

Ouch, insane page faults are no good. 

If you cannot fit "large 100k++ prime number" (not sure exactly what 
that means) into the RAM of your computer, then I suggest that you get 
yourself some more RAM. I here guess that you are looking for primes 
with around 100.000 decimal digits, and that really should never need 
more than a few hundred KiB of RAM. 

Unless you've written a buggy GMP application which leaks a lot... 

buffers. It is pretty ugly now.". Since the first miller test is to 
base 2 and that the mod will always be an odd number I think I can 
make a custom function specifically for the miller base 2 test... 

I think a special modexp function for small bases could indeed be useful. 

I get what PP is, and I get why it's needed. But I do not understand 
what the PP_INVERTED define is, how it's calculated, or why? 

2^128/PP with the most significant bit suppressed. 

See https://gmplib.org/~tege/divcnst-pldi94.pdf 

Why is there a PP_INVERTED for 32/64 bits but not the others? 

Which "others" do you have in mind? 

I am trying to over-haul the primality testing specifically for 64-bit 
limbs and VERY large numbers, so any help would be greatly 
appreciated. 

You claim GMP is slow, not us. We actually disagree. But we'd love to 
hear about the results of your efforts at making GMP fast. 

-- 
Torbjörn 
Please encrypt, key id 0xC8601622 
