mpz_jacobi
Niels Möller
nisse at lysator.liu.se
Wed May 5 16:30:04 CEST 2010
nisse at lysator.liu.se (Niels Möller) writes:
> mpz_jacobi: The new code currently #if:ed out, it probably handles
> jacobi of sizes n/1 and 1/n less efficiently than the old code.
Here's an updated version of that file, which I think I'll push in soon.
Changes since the #if:ed out version in the repository:
1. Strips out powers of two from *both* a and b.
2. Handles the cases of a or b fitting in a single limb specially,
with no allocation (the old code did something similar).
3. To reduce number of cases, swap A and B if asize < bsize (depends
on (1)).
4. Uses the "bit1"-representation for the initial processing, and uses
more of the JACOBI macros (which are a too many for my taste,
really. It's particularly annoying to have two macros that do the
same thing, JACOBI_LS0 and JACOBI_0LS).
For the division when asize > bsize, I try to keep allocation down,
depending on which of a and b needs to be shifted. This isn't terribly
pretty.
As usual, some comments on missing GMP functions.
It would help if one could avoid storing the quotient. In principle, I
think it should be easy to compute A mod B using O(bsize) storage, even
for very large a. (That would be a div_r function. Whether it can also
save some computation time over div_qr in general was discussed some
months ago, and I think the conclusion was "maybe, but not very much").
For both jacobi and gcd, there are fairly efficient base case functions
for handling two-limb inputs. We could make more use of them if we had a
good mod_2. mod_2 can be implemented as a loop around udiv_qr_3by2, but
I suspect it might be more efficient to use the same trick as for mod_1,
computing B^3 mod <d_1, d_0> and maybe higher powers once, and use this
to reduce the input using two independent multiplies per limb.
Reards,
/Niels
/* mpz_jacobi, mpz_legendre, mpz_kronecker -- mpz/mpz Jacobi symbols.
Copyright 2000, 2001, 2002, 2005, 2010 Free Software Foundation, Inc.
This file is part of the GNU MP Library.
The GNU MP Library is free software; you can redistribute it and/or modify it
under the terms of the GNU Lesser General Public License as published by the
Free Software Foundation; either version 3 of the License, or (at your
option) any later version.
The GNU MP Library is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License
for more details.
You should have received a copy of the GNU Lesser General Public License along
with the GNU MP Library. If not, see http://www.gnu.org/licenses/. */
#include <stdio.h>
#include "gmp.h"
#include "gmp-impl.h"
#include "longlong.h"
/* This code does triple duty as mpz_jacobi, mpz_legendre and
mpz_kronecker. For ABI compatibility, the link symbol is
__gmpz_jacobi, not __gmpz_kronecker, even though the latter would
be more logical.
mpz_jacobi could assume b is odd, but the improvements from that seem
small compared to other operations, and anything significant should be
checked at run-time since we'd like odd b to go fast in mpz_kronecker
too.
mpz_legendre could assume b is an odd prime, but knowing this doesn't
present any obvious benefits. Result 0 wouldn't arise (unless "a" is a
multiple of b), but the checking for that takes little time compared to
other operations.
Enhancements:
mpn_bdiv_qr should be used instead of mpn_tdiv_qr.
Current code uses the binary algorithm for the smallest sizes, then
Lehmer. It could stright-forwardly be made subquadratic by using
hgcd in the same way as mpn_gcd. */
/* (a/2) = -1 iff a = 3 or a = -3 (mod 8), and (2/b) = -1 iff b = 3 or
b = - 3 (mod 8). Note that when this is used, we have already
excluded the case that a and both have a common factor of two. */
#define STRIP_TWOS(bit1, twos, other_low, p, n, low) \
do { \
JACOBI_STRIP_LOW_ZEROS ((bit1), (other_low), (p), (n), (low)); \
count_trailing_zeros ((twos), (low)); \
(bit1) ^= JACOBI_TWOS_U_BIT1((twos), (other_low)); \
(low) >>= (twos); \
if ((n) > 1 && (twos) > 0) \
{ \
mp_limb_t __second = (p)[1]; \
(low) |= __second << (GMP_NUMB_BITS - (twos)); \
if ((n) == 2 && (__second >> (twos) == 0)) \
n = 1; \
} \
} while(0)
int
mpz_jacobi (mpz_srcptr a, mpz_srcptr b)
{
mp_srcptr asrcp, bsrcp;
mp_size_t asize, bsize, itch;
mp_limb_t alow, blow;
mp_ptr ap, bp, scratch;
unsigned atwos, btwos;
int result_bit1;
int res;
TMP_DECL;
asize = SIZ(a);
asrcp = PTR(a);
alow = asrcp[0];
bsize = SIZ(b);
bsrcp = PTR(b);
blow = bsrcp[0];
/* The MPN jacobi functions requies positive a and b, and b odd. So
we must to handle the cases of a or b zero, then signs, and then
the case of even b.
In addition, to reduce the number of cases, we arrange so that a
is odd, and asize >= bsize. */
if ( (((alow | blow) & 1) == 0))
/* Common factor of 2 ==> (a/b) = 0 */
return 0;
if (bsize == 0)
/* (a/0) = [ a = 1 or a = -1 ] */
return JACOBI_LS0 (alow, asize);
if (asize == 0)
/* (0/b) = [ b = 1 or b = - 1 ] */
return JACOBI_0LS (blow, bsize);
if (bsize < 0)
{
/* (a/-1) = -1 if a < 0, +1 if a >= 0 */
result_bit1 = (asize < 0) << 1;
bsize = -bsize;
}
else
result_bit1 = 0;
STRIP_TWOS (result_bit1, btwos, alow, bsrcp, bsize, blow);
if (asize < 0)
{
/* (-1/b) = -1 iff b = 3 (mod 4) */
result_bit1 ^= JACOBI_N1B_BIT1(blow);
asize = -asize;
}
STRIP_TWOS(result_bit1, atwos, blow, asrcp, asize, alow);
/* Both numbers odd, so arrange so that asize >= bsize */
if (asize < bsize)
{
unsigned t;
MPN_SRCPTR_SWAP (asrcp, asize, bsrcp, bsize);
MP_LIMB_T_SWAP (alow, blow);
t = atwos;
atwos = btwos;
btwos = t;
result_bit1 ^= JACOBI_RECIP_UU_BIT1 (alow, blow);
}
if (bsize == 1)
{
if (blow == 1)
return JACOBI_BIT1_TO_PN (result_bit1);
if (asize > 1)
{
/* We work with {asrcp, asize} mod b, hence throw away the
old alow and undo the shift right by atwos. */
result_bit1 ^= JACOBI_TWOS_U_BIT1 (atwos, blow);
JACOBI_MOD_OR_MODEXACT_1_ODD (result_bit1, alow, asrcp, asize, blow);
}
return mpn_jacobi_base (alow, blow, result_bit1);
}
TMP_MARK;
itch = 3*bsize;
if (asize > bsize)
{
if (btwos > 0)
{
if (asize >= 2 * bsize)
itch = asize + bsize + 1;
}
else if (atwos > 0)
{
if (asize >= 2*bsize)
itch = 2*asize - bsize + 1;
}
else
{
if (asize >= 3*bsize)
itch = asize + 1;
}
}
ap = TMP_ALLOC_LIMBS (itch);
bp = ap + bsize;
scratch = bp + bsize;
if (asize > bsize)
{
/* Do an initial divide. */
if (btwos > 0)
{
/* Result size: 2*bsize, extra: asize - bsize + 1 for
quotient, total: asize + bsize + 1 */
ASSERT (atwos == 0);
ASSERT_NOCARRY (mpn_rshift (bp, bsrcp, bsize, btwos));
bsize -= bp[bsize-1] == 0;
mpn_tdiv_qr (scratch, ap, 0, asrcp, asize, bp, bsize);
}
else
{
if (atwos > 0)
{
/* Result size: bsize, extra: (asize - bsize) + (asize -
bsize + 1) for shifted value, and quotient, total: 2
asize - bsize + 1 */
ASSERT_NOCARRY (mpn_rshift (ap, asrcp, asize, atwos));
mpn_tdiv_qr (ap + asize, ap, 0, ap, asize, bsrcp, bsize);
}
else
/* Result size: bsize, extra: asize - bsize + 1 for
quotient, total asize + 1. */
mpn_tdiv_qr (bp, ap, 0, asrcp, asize, bsrcp, bsize);
MPN_COPY (bp, bsrcp, bsize);
}
alow = ap[0];
}
else
{
/* Result size: 2 * bsize, extra: 0. */
if (atwos > 0)
ASSERT_NOCARRY (mpn_rshift (ap, asrcp, asize, atwos));
else
MPN_COPY (ap, asrcp, asize);
if (btwos > 0)
ASSERT_NOCARRY (mpn_rshift (bp, bsrcp, bsize, btwos));
else
MPN_COPY (bp, bsrcp, bsize);
bsize -= (ap[bsize-1] | bp[bsize-1]) == 0;
}
/* Scratch need: bsize */
res = mpn_jacobi_lehmer (ap, bp, bsize,
mpn_jacobi_init (alow, blow, (result_bit1>>1) & 1),
scratch);
TMP_FREE;
return res;
}
--
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
More information about the gmp-devel
mailing list