Computing A mod d (for small odd d without division and multiplication)
marco.bodrato at tutanota.com
marco.bodrato at tutanota.com
Sat Mar 14 11:18:11 CET 2026
Ciao,
9 mar 2026, 17:58 da marco.bodrato at tutanota.com:
> Do we need all the value in rems[]? I believe we don't.
>
> Instead of accumulating in acc, we can multiply lo by the inverse of B mod d,
> and add hi to it, then we can keep on accumulating.
> At the end of the cycle, we will have in [hi,lo] a number equivalent to A mod d.
>
We can also avoid to compute the inverse of B mod b, and directly use the residue
of B mod b. We just have to add up the lines by the reverse order.
I tested both approaches and the return value is correct. Is it fast?
I know that using tune/speed-ext.c is not the best way to check the actual speed.
but it's the easiest :-) That's why I attach a file with the two functions that can
replace that file...
Everything depends on the (multiplicative) order of B in the ring Z/bZ.
If the order is 1, then the residue of B mod b is 1, and its inverse is 1
(i.e. b divides B-1) and both codes are quite fast, and of course one could
write an even more specialized code for that case.
An example, if b=17:
$ n=17; build/tune/speed-ext -p10000000 -C -s 1-1700000 -f4 mpn_mod_1.$n mpn_mod_1_a.$n mpn_mod_1_b.$n
overhead 4.89 cycles, precision 10000000 units of 7.71e-10 secs, CPU freq 1297.01 MHz
mpn_mod_1.17 mpn_mod_1_a.17 mpn_mod_1_b.17
1 #64.7109 144.3273 167.0704
4 #20.9006 37.0905 42.7984
16 #9.5543 10.6102 11.8468
64 4.3204 #4.2093 4.3162
256 2.9799 2.1959 #2.0831
1024 2.5893 1.6899 #1.4976
4096 2.5008 1.5674 #1.3904
16384 2.4772 1.5701 #1.3881
65536 2.4775 1.6103 #1.4673
262144 2.5024 1.6479 #1.5027
1048576 2.9615 2.1354 #2.0525
If the base is small, so that the order is small, we can look-up the order from a table,
and it is easy to compute the residue of B mod b, to compute its inverse we need
order multiplications and reductions (asymptotically we'd use gcdext, but here we don't).
Assume order is small but not tiny, e.g. if b=417, the order is 69,
computing the inverse is heavy, so that the code not needing it is faster
in a range of lengths.
But there one more problem, swiping the operand many times is not efficient,
if the operand does not fit in the cache!
$ n=417; build/tune/speed-ext -p10000000 -C -s 1-1700000 -f4 mpn_mod_1.$n mpn_mod_1_a.$n mpn_mod_1_b.$n
overhead 4.89 cycles, precision 10000000 units of 5.28e-10 secs, CPU freq 1895.00 MHz
mpn_mod_1.417 mpn_mod_1_a.417 mpn_mod_1_b.417
1 #64.8495 146.5563 2822.5922
4 #20.9894 42.9108 714.6674
16 #9.5659 17.4111 180.0995
64 #4.3715 12.2414 45.2863
256 #2.9932 4.2965 12.2776
1024 2.5905 #2.5338 4.0825
4096 2.5009 #2.2388 2.4175
16384 2.4773 1.7700 #1.6322
65536 #2.4769 2.7296 3.0141
262144 #2.5216 4.0898 4.2677
1048576 #2.9732 8.0419 9.4231
If the order is tiny, even if it is not exactly 1, we need to run over the operand only a few times.
The effect of cache misses is smaller, but at last it is heavy anyway.
An example, if b=274177, the order is 2:
$ n=274177; build/tune/speed-ext -p10000000 -C -s 1-1700000 -f4 mpn_mod_1.$n mpn_mod_1_a.$n mpn_mod_1_b.$n
overhead 4.91 cycles, precision 10000000 units of 7.71e-10 secs, CPU freq 1296.95 MHz
mpn_mod_1.274177 mpn_mod_1_a.274177 mpn_mod_1_b.274177
1 #64.9447 169.8672 266.9416
4 #27.8788 45.2729 50.4220
16 #9.6185 12.4918 13.5013
64 #4.3203 4.6472 5.0284
256 2.9716 2.4373 #2.3494
1024 2.5923 1.7532 #1.5798
4096 2.5010 1.5898 #1.4298
16384 2.5057 1.7232 #1.5530
65536 2.4807 2.0696 #2.0327
262144 2.5032 2.1419 #2.0908
1048576 #2.9749 3.3043 3.2164
On the other side , if b=274175, the order is 415, and the new codes are both inefficient:
$ n=274175; build/tune/speed-ext -p10000000 -C -s 1-1700000 -f4 mpn_mod_1.$n mpn_mod_1_a.$n mpn_mod_1_b.$n
overhead 4.89 cycles, precision 10000000 units of 5.28e-10 secs, CPU freq 1895.61 MHz
mpn_mod_1.274175 mpn_mod_1_a.274175 mpn_mod_1_b.274175
1 #64.7279 14927.8659 16765.8239
4 #20.9373 3736.0827 4192.1050
16 #9.5638 941.4242 1051.1517
64 #4.3743 243.4748 263.8422
256 #2.9712 68.3601 66.6708
1024 #2.6175 19.7176 17.5073
4096 #2.5028 6.2934 5.2420
16384 #2.4967 3.4427 2.8903
65536 #2.4779 2.7415 2.7957
262144 #2.4965 3.3008 3.6258
1048576 #2.9634 20.7607 20.7154
Ĝis,
mb
-------------- next part --------------
A non-text attachment was scrubbed...
Name: speed-ext.c
Type: text/x-csrc
Size: 12785 bytes
Desc: not available
URL: <https://gmplib.org/list-archives/gmp-devel/attachments/20260314/153d483c/attachment-0001.bin>
More information about the gmp-devel
mailing list