GMP «Arithmetic without limitations» GMP developers' corner



Documentation: Online | PDF
Development sources: GMP repository - Repo usage tips | Daily snapshots
Testing: Current status | Current coverage
Tuneup: Current threshold tables
Asm: Assembly loops
Computers: GMP developer's systems

GMP release update

GMP 5.0.0 was received well, and no major bugs have been reported. Initially, there were a number of issues with Mac OS X, memory appetite issues with multiplication, and problems with fat builds. We solved most of these problems in 5.0.1, and several other such minor issues in 5.0.2


GMP 5.1 plans

List of planned GMP 5.1 improvements


New binomial code

[Marco has now merged the new binomial code, see the repo.]


New division interface

We are working in a modernised set of low-level division loops, splitting several of the current functions that have multiple loops into separate functions. Here we will try to describe the progress.

The function naming scheme is not completely coherent, and is far from finalised.

symbol meaning
1n,2n function expects normalised divisor, 1 and 2 limbs respectively
1u,2u function expects unnormalised divisor, 1 and 2 limbs respectively
; delimiter between output and input operands
qp, np, dp pointers to quotient, numerator, and divisior
qh high quotient limb
d divisor limb
r remainder limb
di some sort of inverse of divisor

Euclidian division/mod by a 1-limb divisor
function intfunction frac
r = mpn_div_qr_1n_pi1(qp,qh;np,nn,d,di) r = mpn_divf_qr_1_pi1(qp;fn,r,d,di)
r = mpn_div_qr_1u_pi1(qp,qh;np,nn,d,di)  
r = mpn_div_qr_1n_pi2(qp,qh;np,nn,d,di) r = mpn_divf_qr_1_pi2(qp;fn,r,d,di)
r = mpn_div_qr_1u_pi2(qp,qh;np,nn,d,di)  
r = mpn_div_r_1n_pi1(;np,nn,d,di)
r = mpn_div_r_1u_pi1(;np,nn,d,di)
r = mpn_div_r_1n_pi2(;np,nn,d,di)
r = mpn_div_r_1u_pi2(;np,nn,d,di)
Euclidian division/mod by a 2-limb divisor
function intfunction frac
qh = mpn_div_qr_2n_pi1(qp,rp[2];np,nn,dp[2],di) mpn_divf_qr_2n_pi1(qp;fn,np[2],dp[2],di)
qh = mpn_div_qr_2u_pi1(qp,rp[2];np,nn,dp[2],di)  
qh = mpn_div_qr_2n_pi2(qp,rp[2];np,nn,dp[2],di) mpn_divf_qr_2n_pi2(qp;fn,np[2],dp[2],di)
qh = mpn_div_qr_2u_pi2(qp,rp[2];np,nn,dp[2],di)  
mpn_div_r_2n_pi1(rp[2];np,nn,dp[2],di)
mpn_div_r_2u_pi1(rp[2];np,nn,dp[2],di)
mpn_div_r_2n_pi2(rp[2];np,nn,dp[2],di)
mpn_div_r_2u_pi2(rp[2];np,nn,dp[2],di)
Euclidian division/mod by an m-limb divisor
function
mpn_div_qr
mpn_div_qr
mpn_div_q
mpn_div_q
mpn_div_r
mpn_div_r
Euclidian mod by a 1-limb divisor
function
mpn_mod_1_1p
mpn_mod_1s_2p
mpn_mod_1s_3p
mpn_mod_1s_4p
Hensel division/mod by a 1-limb divisor
function
mpn_bdiv_qr_1_pi1
mpn_bdiv_qr_1_pi2
mpn_bdiv_r_1_pi1
mpn_bdiv_r_1_pi2
Hensel division/mod by a 2-limb divisor
function
mpn_bdiv_qr_2_pi1
mpn_bdiv_qr_2_pi2
mpn_bdiv_r_2_pi1
mpn_bdiv_r_2_pi2
Hensel division/mod by an m-limb divisor
function
mpn_bdiv_qr
mpn_bdiv_qr
mpn_bdiv_q
mpn_bdiv_q
mpn_bdiv_r
mpn_bdiv_r
Hensel mod/redc
function
mpn_redc_1
mpn_redc_2
mpn_redc_n


New Toom multiplication code for unbalanced operands

We have been adding more Toom functions during the last years, mainly for better handling of unbalanced operands.

We believe that even more Toom functions might be needed for optimal performance, probably going further than 8 evaluation points (used by toom54, toom63, toom72). This will largely depend on ongoing work on FFT multiplication; the faster that will become, the less room for higher-order Toom. The large number of functions is a bit disturbing, of course, since complexity is always bad. (But performance is GMP's main objective.)

These pictures show the best current multiplication primitive for operand sizes 1 ≤ bn ≤ an ≤ N limbs, where N is indicated in each diagram. The x-axis represents an, which is the size of the first operand; the y-axis represents bn, the size of the second operand. The first picture uses linear scale, the second uses log/log scale. For larger scale, click the pictures.

Comments:

Multiplication diagrams for AMD Phenom, X4, 2.4 GHz, 512 Kibyte L2, 2048 Kibyte L3. To the left linear scale, to the right log/log scale.


Multiplication diagrams for Intel Nehalem, 2.67 GHz, 256 Kibyte L2, 8192 Kibyte L3. To the left linear scale, to the right log/log scale. An older Core 2 system, with its completely different cache and memory subsystems, give almost identical diagrams, down to the smallest artifact.


Multiplication diagrams for Intel Pentium 4 (64-bit), 3.2 GHz, 2048 Kibyte L2. To the left linear scale, to the right log/log scale.


The ideas behind much of the Toom multiply improvements are due to Marco Bodrato and Alberto Zanoni. Please see Marco Bodrato's site for more information about Toom multiplication. The algorithmic foundation was made by Anatolii Alexeevitch Karatsuba in 1962, and Andrei Toom in 1963. The GMP implementations were made by Marco Bodrato, Niels Möller, and Torbjörn Granlund.


This tool is useful for for optimising the evaluation sequences of Toom functions. It exhaustively searches for possible sequences, given a goal function. It is similar to the GNU superoptimizer, except that this program optimises over a a simple algebraic structure and handles m-ary -> n-ary projections (the GNU superoptimizer optimises over an assembly language subset, and handles m-ary -> 1-ary projections). This is very immature code, but it seems to work.

Download files Last changed What changed?
superopt-va.c 2007-01-16/2 speedup+UI improvement
goal-va.def 2007-01-16 bug fixes



Please send comments about this page to gmp-discuss at gmplib.org
Copyright 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012 Free Software Foundation
Verbatim copying and distribution of this entire article is permitted in any medium, provided this notice is preserved.