Small-base modexp
Marco Bodrato
bodrato at mail.dm.unipi.it
Fri Oct 4 06:13:58 UTC 2019
Il 2019-10-03 21:10 Marco Bodrato ha scritto:
> Il Gio, 3 Ottobre 2019 8:39 am, Niels Möller ha scritto:
>> "Marco Bodrato" <bodrato at mail.dm.unipi.it> writes:
>>> if (r0 <= (m0 >> 1))
>>> r0 <<= 1;
>>> else if (r0 >= m0)
>>> r0 = (r0 - m0) << 1;
>>> else
>>> r0 = (r0 << 1) - m0;
> Anyway, I was simply surprised by the fact that a code with two
> branches
> seems faster than the same idea reimplemented in a branchless-way, with
> masks. Maybe because the compiler translate the branchy code in
> assembler
> using cmovs, and obtain a more efficient branchless object code than
> the
> code with masks? I will check...
No, there are branches in the code above, once compiled.
The code that gets easily comipled in efficient code is that below:
r0 = MIN (r0, r0 - m0);
if (r0 <= (m0 >> 1))
r0 <<= 1;
else
r0 = (r0 << 1) - m0;
On shell, with -O2 is translated into
#DEBUG_VALUE: m0 <- %rcx
#DEBUG_VALUE: r0 <- %rdx
.loc 2 314 13 is_stmt 1 # powm.c:314:13
movq %rdx, %rax
subq %rcx, %rax
cmpq %rax, %rdx
cmovbq %rdx, %rax
#DEBUG_VALUE: r0 <- %rax
.loc 2 315 15 # powm.c:315:15
# %r9 contains m0 >> 1, computed once, out of the loop
cmpq %r9, %rax
leaq (%rax,%rax), %rdx
.loc 2 315 12 is_stmt 0 # powm.c:315:12
movl $0, %eax
cmovaq %rcx, %rax
subq %rax, %rdx
It seems interesting, no branches and no masks.
Ĝis,
m
--
http://bodrato.it/papers/
More information about the gmp-devel
mailing list