bug in longlong.h for aarch64 sub_ddmmss
Niels Möller
nisse at lysator.liu.se
Wed Jun 17 12:05:09 UTC 2020
tg at gmplib.org (Torbjörn Granlund) writes:
> Using the ARM "subs rd,rm,imm12" instruction, we compute
>
> {cout, rd} = rm + ~imm + 1
>
> while using the "adds rd,rm,imm12" instruction, we compute
>
> {cout, rd} = rm + imm
>
> which is quite different. The former will for example always set
> cout when rm = imm = 0 as in Vincent's example. The latter will never
> set carry when imm = 0 or rm = 0.
Right, it's a bit subtle. The case we're trying to handle specially is
  {ah, al} - {bh, bl}
with bl = B - x, x small.
I would expect that the existing code could be fixed if we exclude bl =
0 (since we'd then get x = B, which qualifies as "x small" only
modulo B, but not as a plain mathematical integer).
  if (__builtin_constant_p (bl) && bl != 0 && -(UDItype)(bl) < 0x1000)
Then, if bl = B - x, we get (modulo B^2):
  {ah, al} - {bh, bl} = (ah - bh) B + al + x - B
                      = (ah + ~bh + 1) B + al + x - B
                      = (ah + ~bh) B + al + x
which should be computed correctly with the sequence adds, sbc, using
carry out from al + x.
Do you agree?
The excluded case,
  sub_ddmmss(ah, al, bh, /*compile time constant*/0)
could clearly be optimized, in a different way, but I'd guess it's rare
enough in real code to not be worth the effort?
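For anyone who wants to check the argument above, here is a rough C model
of the adds, sbc sequence, compared against a plain double-word
subtraction. It is only a sketch, not the longlong.h code: it assumes
B = 2^64, and the names dword, sub_ref, sub_adds_sbc and agrees_for_bl are
made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Two-word value {hi, lo} = hi * B + lo, assuming B = 2^64. */
typedef struct { uint64_t hi, lo; } dword;

/* Reference: {ah, al} - {bh, bl} modulo B^2, with an explicit borrow. */
static dword sub_ref(uint64_t ah, uint64_t al, uint64_t bh, uint64_t bl)
{
  dword r;
  r.lo = al - bl;
  r.hi = ah - bh - (al < bl);     /* borrow out of the low word */
  return r;
}

/* Hypothetical model of the adds, sbc sequence, with x = -bl mod B:
     adds:  {c, lo} = al + x
     sbc:   hi      = ah + ~bh + c  */
static dword sub_adds_sbc(uint64_t ah, uint64_t al, uint64_t bh, uint64_t bl)
{
  uint64_t x = 0 - bl;            /* x = B - bl, reduced modulo B */
  uint64_t c;
  dword r;
  r.lo = al + x;
  c = r.lo < al;                  /* carry out of al + x; always 0 when x = 0 */
  r.hi = ah + ~bh + c;
  return r;
}

/* Returns 1 if the model matches the reference for this bl on a few
   edge values of al (ah and bh are arbitrary fixed values). */
static int agrees_for_bl(uint64_t bl)
{
  static const uint64_t al_values[] =
    { 0, 1, 0xfff, 0x1000, UINT64_MAX - 1, UINT64_MAX };
  const uint64_t ah = 5, bh = 3;
  unsigned i;
  for (i = 0; i < sizeof al_values / sizeof al_values[0]; i++)
    {
      dword want = sub_ref(ah, al_values[i], bh, bl);
      dword got = sub_adds_sbc(ah, al_values[i], bh, bl);
      if (want.hi != got.hi || want.lo != got.lo)
        return 0;
    }
  return 1;
}
```

In this model, agrees_for_bl holds for bl = B - x with x = 1 and x = 0xfff,
but fails for bl = 0: there x = 0, the add never carries, and the high word
comes out one too small. That is the case the added bl != 0 test excludes.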
Regards,
/Niels
--
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.