Uninitialized memory bug found in /mpn/generic/mod_1_1.c

Brett Kuntz kuntz at shaw.ca
Thu Aug 31 02:21:45 CEST 2023


Hello, today I found a bug in the latest GMP (6.3): uninitialized memory is used in /mpn/generic/mod_1_1.c. I ran into it while trying to understand some functions such as mpn_mod_1_1p_cps.

Take a look at the function mpn_mod_1() in /mpn/generic/mod_1.c, lines 248-250:

mp_limb_t pre[4];
mpn_mod_1_1p_cps (pre, b);
return mpn_mod_1_1p (ap, n, b, pre);

mp_limb_t pre[4] is never initialized, and mpn_mod_1_1p_cps() never writes to pre[2]. If, as a test, we change the declaration to

mp_limb_t pre[4] = { -1, -1, -1, -1 };

we can quickly see that inside mpn_mod_1_1p_cps() the value cps[2] (pre[2]) is never written, and printing the four elements of pre after the call gives output like:

0x21CFE6CFC938B36BU, 0x0000000000000000U, 0xFFFFFFFFFFFFFFFFU, 0x9581CA13918612E1U

i.e. pre[2] is still -1.
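For concreteness, here is that test as a rough standalone sketch. It assumes a build inside the GMP source tree, since mpn_mod_1_1p_cps is internal and declared in gmp-impl.h rather than gmp.h, and it assumes 64-bit limbs; the divisor value is arbitrary, any limb with the high bit set will do.

#include <stdio.h>
#include "gmp-impl.h"   /* internal GMP header: declares mpn_mod_1_1p_cps */

int
main (void)
{
  /* Any divisor with the high bit set, the same situation as the
     mpn_mod_1 () branch quoted above.  */
  mp_limb_t b = GMP_NUMB_HIGHBIT | 12345;

  /* Poison the array so an untouched slot is obvious afterwards.  */
  mp_limb_t pre[4] = { -1, -1, -1, -1 };

  mpn_mod_1_1p_cps (pre, b);

  /* With MOD_1_1P_METHOD 2 (or the x86_64 asm) pre[2] still prints
     as all ones here.  */
  for (int i = 0; i < 4; i++)
    printf ("pre[%d] = 0x%016llX\n", i, (unsigned long long) pre[i]);

  return 0;
}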

/mpn/generic/mod_1_1.c lines 260-266:

  if (LIKELY (cnt != 0))
    {
      mp_limb_t B1modb = -b;
      B1modb *= ((bi >> (GMP_LIMB_BITS-cnt)) | (CNST_LIMB(1) << cnt));
      ASSERT (B1modb <= b);		/* NB: not fully reduced mod b */
      cps[2] = B1modb >> cnt;
    }

cnt will ALWAYS equal 0 on this path, because the caller in /mpn/generic/mod_1.c only takes the branch quoted above when the high bit of b is set. So b has no leading zeros by the time mpn_mod_1_1p_cps() gets called, and the block that writes cps[2] never executes.
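A tiny standalone illustration of that point, using GCC/Clang's __builtin_clzll as a stand-in for GMP's count_leading_zeros macro and assuming 64-bit limbs (the divisor value here is arbitrary, only the high bit matters):

#include <stdio.h>
#include <stdint.h>

int
main (void)
{
  uint64_t b = UINT64_C(0x9000000000000001);  /* high bit set, like the mod_1.c path */
  int cnt = __builtin_clzll (b);              /* 0 for any b with the high bit set */

  /* So the "if (LIKELY (cnt != 0))" block quoted above never runs on
     this path, and cps[2] keeps whatever the caller left in it.  */
  printf ("cnt = %d\n", cnt);
  return 0;
}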

ALSO there are unexplained differences between the MOD_1_1P_METHOD 1 and MOD_1_1P_METHOD 2 versions of mpn_mod_1_1p_cps(). In one of them (method 1) all four elements of pre[] are always written, while in the other (method 2) the store to pre[2] is skipped when cnt is 0 (see the sketch below). Why are these two versions different? Their job appears to be to compute the inverses used by the following mod functions, so one would expect them to behave the same.

Method 2 is the variant that matches the x86_64 assembly implementation.
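To make the difference concrete, here is a simplified standalone model of just the cps[2] store in the two shapes. This is my own sketch in plain uint64_t, not GMP's code, and it leaves out the bi-based correction that only runs when cnt != 0 (which never happens on the path above):

#include <stdio.h>
#include <stdint.h>

/* METHOD-1-like shape: cps[2] is written on every call.  */
static void
cps2_always (uint64_t cps[4], uint64_t b, int cnt)
{
  uint64_t B1modb = -(b << cnt);
  cps[2] = B1modb >> cnt;
}

/* METHOD-2-like shape: the computation and the store both sit inside
   "if (cnt != 0)", so cps[2] is left untouched when cnt == 0.  */
static void
cps2_guarded (uint64_t cps[4], uint64_t b, int cnt)
{
  if (cnt != 0)
    {
      uint64_t B1modb = -(b << cnt);
      cps[2] = B1modb >> cnt;
    }
}

int
main (void)
{
  uint64_t b = UINT64_C(0x9000000000000001);   /* normalized divisor, so cnt == 0 */
  uint64_t a[4] = { UINT64_MAX, UINT64_MAX, UINT64_MAX, UINT64_MAX };
  uint64_t g[4] = { UINT64_MAX, UINT64_MAX, UINT64_MAX, UINT64_MAX };

  cps2_always (a, b, 0);
  cps2_guarded (g, b, 0);

  printf ("always-store  cps[2] = 0x%016llX\n", (unsigned long long) a[2]);
  printf ("guarded-store cps[2] = 0x%016llX  (untouched)\n",
          (unsigned long long) g[2]);
  return 0;
}

The guarded variant leaves cps[2] at the poison value, which matches what printing pre[] after the real mpn_mod_1_1p_cps() showed above.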

ALSO the x86_64 assembly version of this code likewise does not always set all four values of pre[], and this has led to incorrect results when using the exported function mpn_mod_1().

/mpn/x86_64/mod_1_1.asm

	neg	%r12
	mov	%r12, %r8
	mov	%rax, (%rbx)		C store bi
	mov	%rbp, 8(%rbx)		C store cnt
	imul	%rax, %r12
	mov	%r12, 24(%rbx)		C store B2modb
	mov	R32(%rbp), R32(%rcx)
	test	R32(%rcx), R32(%rcx)
	jz	L(z)

	mov	$1, R32(%rdx)
ifdef(`SHLD_SLOW',`
	C Destroys %rax, unlike shld. Otherwise, we could do B1modb
	C before B2modb, and get rid of the move %r12, %r8 above.

	shl	R8(%rcx), %rdx
	neg	R32(%rcx)
	shr	R8(%rcx), %rax
	or	%rax, %rdx
	neg	R32(%rcx)
',`
	shld	R8(%rcx), %rax, %rdx
')
	imul	%rdx, %r8
	shr	R8(%rcx), %r8
	mov	%r8, 16(%rbx)		C store B1modb
L(z):
	pop	%r12
	pop	%rbx
	pop	%rbp
	FUNC_EXIT()
	ret

Notice that line 230 of the asm file (mov %r8, 16(%rbx)	C store B1modb) gets skipped when cnt is 0, which it will be 100% of the time when the high bit is set: the jz jumps straight to L(z), past the B1modb store.

I checked mod_1_2.c, mod_1_3.c, and mod_1_4.c and they do not seem to have this problem.

Also note that with most compilers and runtimes I have tried, the uninitialized slot happens to hold a fluky 0 (which is the correct value when the high bit is set), so that might be why no one has stumbled on this bug yet, and why it can pass the tests.

I do not understand these functions well enough to suggest a fix, and I do not know whether initializing pre[] to all zeros would produce correct results.

Which is the correct version of mpn_mod_1_1p_cps() to use for now? (Method 1 or Method 2)

My apologies if this is not the best bug report ever.

-Brett Kuntz

