[PATCH] Fibonacci/Lucas tweaks

Tue Feb 17 01:54:06 CET 2026

Sent with Proton Mail secure email.

On Monday, February 16th, 2026 at 23:49, marco.bodrato at tutanota.com <marco.bodrato at tutanota.com> wrote:

> Ciao David,
> 
> 13 feb 2026, 05:36 da sparks05 at proton.me:
> 
> > Here's the results of my poking at the Lucas number evaluation code.
> > 
> > ----8<----
> > Patch 1/3: mpz_lucnum_ui: Use a simpler Lucas formula.
> > 
> > The code previously used
> > L[2k+1] = 5*F[k-1]*(2*F[k]+F[k-1]) - 4*{-1)^k
> > = 5*F[k-1]*F[k+2] - 4*{-1)^k
> > But it's much simpler to ask for the next Fibonacci pair and use
> > L[2k+1] = 5*F[k]*F[k+1] + (-1)^k
> 
> You are right!
> It is known that
> L[k+m] = F[k]*F[m]+ L[m-k](-1)^k
> 
> Using m=k+1, and L[1]=1 as you propose is much simpler and straightforward
> than using F[k-1] and F[k+2]=2F[k]+F[k-1], and L[k+2-(k-1)]=L[3]=4.
> 
> > The diff is larger that expected because this also swaps the
> > order of tests in the loop, so the test for the FIB_TABLE comes first.
> 
> Why did you swap that test?
> 
> The current code:
>  - check if n is smaller than the table limit
>  - then loop and in the loop
>    + checks if n is odd;
>    + otherwise divides by 2, and
>    + checks if the divided n is smaller than the table limit.
> 
> Checking at the beginning should not be worth it, because we just checked before the loop, didn't we? When we start the loop we already know that n does not fit as a table index, we can wait after the division before checking again.

Because I *intended* to get rid of the first test and only use the
one in the loop.

But then I realized that it short-circuits buffer allocation if the
small test succeeds and I couldn't delete it.

Would you like a revised patch that puts it back the way it
should be, or are you happy to do that yourself?

> I wonder if it's wort adding a condition:
>   if (FIB_TABLE_LUCNUM_LIMIT < FIB_TABLE_LIMIT)
>     /* L[n] = F[n+1] + F[n-1] */
>     lp[0] = FIB_TABLE ((int) n + 1) + FIB_TABLE ((int) n - 1);
>   else
>     /* L[n] = F[n] + 2F[n-1] */
>     lp[0] = FIB_TABLE (n) + 2 * FIB_TABLE ((int) n - 1);
> 
> This is not a branch at runtime, because the condition can be checked at compile time.
> But, is it better to read two adjacent values from memory or to avoid a not expensive shift?
> I'm not sure about the answer.

Er, FIB_TABLE_LIMIT - 2 <= FIB_TABLE_LUCNUM_LIMIT <= FIB_TABLE_LIMIT-1.

It should be pretty obvious.
F[n+1] = F[n] + F[n-1] < F[n+1] F[n-1] = L[n] < F[n+1] + F[n] = F[n+2].
If F[n+1] fits into one limb and F[n+2] does not, then L[n-1]
definitely fits, L[n] might, and L[n+1] definitely doesn't.

> > /* check MPN_FIB2_SIZE seems right, compared to actual size and
> > - compared to our float formula */
> > - if (MPN_FIB2_SIZE (n) < MPZ_FIB_SIZE_FLOAT (n))
> > + compared to our float formula. MPN_FIB2_SIZE has a known problem
> > + when n == 1 (it returns 0 rather than 1) which doesn't affect
> > + any real caller, so we accomodate it in the test. */
> 
> Does MPN_FIB2_SIZE really have that problem?
> The 1+ at the beginning should address the issue, doesn't it?

When I tightened it, it definitely does.
MPN_FIB2_SIZE(1) == 0, which isn't enough for the value 1.
For odd n, the approximation is an underestimate, and in
particular you get something like log2(0.7) ~ -0.5.
Rounding down produces -1, so the +1 brings it back up
to 0.
> 
> > ----8<----
> > Patch 3/3: MPN_FIB2_SIZE: Tighten approximation.
> 
> Ok, let's look at it.
> 
> > -#define MPN_FIB2_SIZE(n) \
> > - ((mp_size_t) ((n) / 32 * 23 / GMP_NUMB_BITS) + 4)
> 
> The old code needed: a couple of shifts, a multiplication, an addition.
> 
> > +#define MPN_FIB2_SIZE(n) (1 + \
> > + (mp_size_t)( (n) / 65536 * MPN_FIB2_MULT) + \
> > + (mp_size_t)(((n)|1) % 65536 * MPN_FIB2_MULT + 65536-MPN_FIB2_SUB) / 65536)
> 
> The proposed one: a couple of shifts, an and mask, 3 additions.
> How much does the MPN_FIB2_SUB term change the result?

By 1.16 bits, i.e. one 64-bit limb 1.8% of the time.
I'd have omitted it, but we need an additive constant for the +1
limb anyway, so fine-tuning it doesn't cost more.

The main thing is the time to compute the size estimate is dwarfed
by the computation of the Fibonacci number itself, so it's more the
cache benefits of the 3.4% smaller buffer.  

The place it gets questionable is small (2-3 limb) sized results.
There, the computation isn't so big and the estimate cost matters.
9424

> Should we replace this estimate with a single umul_ppmm (_result, _dummy, n, _const) like we do in macros like DIGITS_IN_BASE_FROM_BITS?
> 
> The constant could be generated by gen-fib.c

No objection, if that's okay.  There's no need for gen-fib; the
constant is just (1+(mp_limb_t)(ldexp(0.6942419136306173017387902668986 ),GMP_NUMB_BITS)/GMP_NUMB_BITS).

Or, if your compiler can't evaluate ldexp() at compile time,
#define GMP_RADIX (1.0 * (CNST_LIB(1) << GMP_NUMB_BITS/2) * (CNST_LIB(1) << (GMP_NUMB_BITS+1)/2)
#define MPN_FIB2_MULT (1+ (mp_limb_t)(0.6942419136306173017387902668986 * GMP_RADIX / GMP_NUMB_BITS))

It's simple compile-time math.