Small-base modexp

Fri Oct 4 06:13:58 UTC 2019

Il 2019-10-03 21:10 Marco Bodrato ha scritto:
> Il Gio, 3 Ottobre 2019 8:39 am, Niels Möller ha scritto:
>> "Marco Bodrato" <bodrato at mail.dm.unipi.it> writes:

>>>               if (r0 <= (m0 >> 1))
>>>                 r0 <<= 1;
>>>               else if (r0 >= m0)
>>>                 r0 = (r0 - m0) << 1;
>>>               else
>>>                 r0 = (r0 << 1) - m0;

> Anyway, I was simply surprised by the fact that a code with two 
> branches
> seems faster than the same idea reimplemented in a branchless-way, with
> masks. Maybe because the compiler translate the branchy code in 
> assembler
> using cmovs, and obtain a more efficient branchless object code than 
> the
> code with masks? I will check...

No, there are branches in the code above, once compiled.

The code that gets easily comipled in efficient code is that below:

   r0 = MIN (r0, r0 - m0);
   if (r0 <= (m0 >> 1))
     r0 <<= 1;
   else
     r0 = (r0 << 1) - m0;

On shell, with -O2 is translated into

         #DEBUG_VALUE: m0 <- %rcx
         #DEBUG_VALUE: r0 <- %rdx
         .loc    2 314 13 is_stmt 1      # powm.c:314:13
         movq    %rdx, %rax
         subq    %rcx, %rax
         cmpq    %rax, %rdx
         cmovbq  %rdx, %rax
         #DEBUG_VALUE: r0 <- %rax
         .loc    2 315 15                # powm.c:315:15
# %r9 contains m0 >> 1, computed once, out of the loop
         cmpq    %r9, %rax
         leaq    (%rax,%rax), %rdx
         .loc    2 315 12 is_stmt 0      # powm.c:315:12
         movl    $0, %eax
         cmovaq  %rcx, %rax
         subq    %rax, %rdx

It seems interesting, no branches and no masks.

Ĝis,
m

-- 
http://bodrato.it/papers/