bug in longlong.h for aarch64 sub_ddmmss

Wed Jun 17 12:05:09 UTC 2020

tg at gmplib.org (Torbjörn Granlund) writes:

> Using the ARM "subs rd,rm,imm12" instruction, we compute
>
>     {cout, rd} = rm + ~imm + 1
>
> while with the "adds rd,rm,imm12" instruction, we compute
>
>     {cout, rd} = rm + imm
>
> which is quite different.  The former will for example always set
> cout when rm = imm = 0 as in Vincent's example.  The latter will never
> set carry when imm = 0 or rm = 0.
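
To make the two carry conventions concrete, here is a minimal C sketch
of the quoted formulas, assuming 64-bit registers. The helper names
subs_carry and adds_carry are made up for illustration; they are not
part of longlong.h.

```c
#include <stdint.h>

/* Carry out of "subs rd, rm, imm", modelled as in the quoted formula:
   {cout, rd} = rm + ~imm + 1.  Hypothetical helper, 64-bit registers. */
static int subs_carry(uint64_t rm, uint64_t imm)
{
    uint64_t t = rm + ~imm;     /* first addition, may wrap */
    int c1 = t < rm;            /* carry out of rm + ~imm */
    uint64_t r = t + 1;         /* add the final 1 */
    int c2 = r < t;             /* carry out of t + 1 (only when t is all ones) */
    return c1 | c2;             /* at most one of the two can be set */
}

/* Carry out of "adds rd, rm, imm": {cout, rd} = rm + imm. */
static int adds_carry(uint64_t rm, uint64_t imm)
{
    uint64_t r = rm + imm;
    return r < rm;              /* wrapped around iff there was a carry */
}
```

With rm = imm = 0 the first helper reports a carry and the second does
not, which is exactly the mismatch discussed here.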

Right, it's a bit subtle. The case we're trying to handle specially is

  {ah, al} - {bh, bl}

with bl = B - x, x small.

I would expect that the existing code could be fixed if we exclude
bl = 0 (since we'd then get x = B, which qualifies as "x small" only
modulo B, but not as a plain mathematical integer).

  if (__builtin_constant_p (bl) && bl != 0 && -(UDItype)(bl) < 0x1000)

Then, if bl = B - x, we get (modulo B^2):

  {ah, al} - {bh, bl} = (ah - bh) B      + al + x - B
                      = (ah + ~bh + 1) B + al + x - B
                      = (ah + ~bh) B     + al + x 

which should be computed correctly with the sequence adds, sbc, using
carry out from al + x.
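
As a sanity check of the derivation, here is a rough C model of the
adds, sbc sequence next to a plain reference subtraction. It assumes
B = 2^64 and takes UDItype to be uint64_t; the function names sub_ref,
sub_adds_sbc and agree are invented for this sketch, and the real
macro in longlong.h is inline asm, not C like this.

```c
#include <stdint.h>

typedef uint64_t UDItype;   /* assumption: 64-bit limbs, B = 2^64 */

/* Reference: {sh, sl} = {ah, al} - {bh, bl} modulo B^2. */
static void sub_ref(UDItype *sh, UDItype *sl,
                    UDItype ah, UDItype al, UDItype bh, UDItype bl)
{
    *sl = al - bl;
    *sh = ah - bh - (al < bl);          /* borrow from the low limb */
}

/* Model of the adds, sbc sequence, using x = B - bl (modulo B). */
static void sub_adds_sbc(UDItype *sh, UDItype *sl,
                         UDItype ah, UDItype al, UDItype bh, UDItype bl)
{
    UDItype x = (UDItype)0 - bl;        /* x = B - bl, or 0 when bl = 0 */
    UDItype lo = al + x;                /* adds: low limb */
    UDItype carry = lo < al;            /* carry out from al + x */
    *sl = lo;
    *sh = ah + ~bh + carry;             /* sbc: (ah + ~bh) + carry */
}

/* Returns 1 if the model matches the reference for these inputs. */
static int agree(UDItype ah, UDItype al, UDItype bh, UDItype bl)
{
    UDItype rh, rl, mh, ml;
    sub_ref(&rh, &rl, ah, al, bh, bl);
    sub_adds_sbc(&mh, &ml, ah, al, bh, bl);
    return rh == mh && rl == ml;
}
```

For bl != 0 the two versions agree in this model, whether or not the
low limb borrows. For bl = 0 the adds never carries, so the high limb
comes out one too small, which is why that case has to be excluded.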

Do you agree?

The excluded case,

  sub_ddmmss(ah, al, bh, /*compile time constant*/0) 

could clearly be optimized, in a different way, but I'd guess it's rare
enough in real code to not be worth the effort?
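
For what it's worth, one possible reading of "a different way": with
bl = 0 the low limb is unchanged and no borrow can occur, so only the
high limbs need subtracting. A hypothetical C sketch (the name
sub_ddmmss_bl0 is made up, and 64-bit limbs are assumed):

```c
#include <stdint.h>

/* {sh, sl} = {ah, al} - {bh, 0}: the low limb passes through and
   there is never a borrow into the high limb.  Illustration only. */
static void sub_ddmmss_bl0(uint64_t *sh, uint64_t *sl,
                           uint64_t ah, uint64_t al, uint64_t bh)
{
    *sl = al;
    *sh = ah - bh;
}
```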

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.