make check errors on recent MacOS versions appear to be due to new default linker in XCode

Mark Mentovai mark at mentovai.com
Wed Oct 2 18:50:10 CEST 2024


If you read nothing else, read this:

gmp-6.3.0 ships libtool-2.4.6 (2015-02-16). Update to libtool-2.4.7
(2022-03-17) to solve this problem.

Details:

There does appear to be a bug in Apple’s new linker (ld-new or ld-prime)
when targeting x86_64, producing a Mach-O dynamic library (clang
-dynamiclib), and using the flat namespace option (-flat_namespace). I
observed this as a variety of crashes in `make check`. I
investigated t-bdiv raising SIGILL in particular:

% lldb tests/mpn/.libs/t-bdiv
(lldb) target create "tests/mpn/.libs/t-bdiv"
Current executable set to '…/gmp-6.3.0.build/tests/mpn/.libs/t-bdiv' (x86_64).
(lldb) env DYLD_LIBRARY_PATH=.libs
(lldb) run
Process 19802 launched: '…/gmp-6.3.0.build/tests/mpn/.libs/t-bdiv' (x86_64)
Process 19802 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_INSTRUCTION (code=EXC_I386_INVOP, subcode=0x0)
    frame #0: 0x00000001000de806 libgmp.10.dylib`__gmpn_sub_n + 3
Target 0: (t-bdiv) stopped.
(lldb) disassemble
libgmp.10.dylib`:
    0x1000de803 <+0>: jmpq   *0x11ecf(%rip)            ; (void *)0x00000001000a1a00: __gmpn_sub_n
(lldb) disassemble -s 0x1000de803 -e 0x1000de80f
libgmp.10.dylib`:
    0x1000de803 <+0>: jmpq   *0x11ecf(%rip)            ; (void *)0x00000001000a1a00: __gmpn_sub_n

libgmp.10.dylib`:
    0x1000de809 <+0>: jmpq   *0x11ed1(%rip)            ; (void *)0x00000001000a1aae: __gmpn_sub_nc

With the fault address at 0x1000de806 falling partway through the
instruction at 0x1000de803, this certainly would be a bad instruction. This
code was assembled from
https://gmplib.org/repo/gmp-6.3/file/62abbaeaab13/mpn/x86_64/core2/aors_n.asm,
which at the bottom of the file has __gmpn_sub_nc jumping to within (but
not the beginning of) __gmpn_sub_n. Duplicating that structure in a reduced
testcase:

% cat ts_x86-64.s
.text
.globl _F
.p2align 4, 0x90
_F:
  movl $1, %eax
Lcommon:
  shll %eax
  retq

.globl _G
.p2align 4, 0x90
_G:
  movl $2, %eax
  jmp Lcommon
% cat tc.c
int F();
int G();

int main(int argc, char* argv[]) {
  return G();
}

The problem is easily reproduced:

% clang -dynamiclib -flat_namespace -o libt.dylib ts_x86-64.s
% clang -o t tc.c libt.dylib
% ./t
zsh: segmentation fault  ./t

This dylib is small enough to observe what’s going on inside directly:

% objdump -d libt.dylib

libt.dylib: file format mach-o 64-bit x86-64

Disassembly of section __TEXT,__text:

0000000000000f80 <_F>:
     f80: b8 01 00 00 00               movl $1, %eax
     f85: d1 e0                         shll %eax
     f87: c3                           retq
     f88: 0f 1f 84 00 00 00 00 00       nopl (%rax,%rax)

0000000000000f90 <_G>:
     f90: b8 02 00 00 00               movl $2, %eax
     f95: e9 05 00 00 00               jmp 0xf9f

Disassembly of section __TEXT,__stubs:

0000000000000f9a <__stubs>:
     f9a: ff 25 60 00 00 00             jmpq *96(%rip)               ## 0x1000

The jump at 0xf95 is bad: 0xf9f is a bad jump target. As before, that
address lies within another instruction (in this case, the last byte of the
instruction at 0xf9a). In fact, that’s the very last byte of the section:

% otool -l libt.dylib
[…]
Section
  sectname __stubs
   segname __TEXT
      addr 0x0000000000000f9a
      size 0x0000000000000006
[…]
Section
  sectname __unwind_info
   segname __TEXT
      addr 0x0000000000000fa0
      size 0x0000000000000058

The jump at 0xf95 should target 0xf85, or _F + 0x5. For some reason, the
linker created a stub for this jump (which itself shouldn’t be necessary)
and then, instead of arranging for the stub to resolve and jump to _F +
0x5, jumped to offset 0x5 within the stub.
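
The addresses above can be checked by hand. This is only a sketch of the
arithmetic, using the values from the objdump listing; the variable names
are made up for illustration:

```shell
# An x86-64 `jmp rel32` (opcode e9) is 5 bytes long. Its target is the
# address of the following instruction plus the signed 32-bit displacement.
jmp_addr=$((0xf95))
next_insn=$((jmp_addr + 5))   # 0xf9a, which is also where __stubs begins

# Target as actually encoded (e9 05 00 00 00, displacement +5).
actual_target=$(printf '0x%x' $((next_insn + 0x5)))

# Displacement that would have reached Lcommon at 0xf85, shown as a
# 32-bit two's-complement value.
wanted_disp=$(printf '0x%08x' $(( (0xf85 - next_insn) & 0xffffffff )))

echo "actual target:       $actual_target"
echo "wanted displacement: $wanted_disp"
```

So a correct link would have encoded the jump as e9 eb ff ff ff rather than
e9 05 00 00 00.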

This is a clear bug in the linker, and I’ll report it to Apple, but I don’t
know that anyone could expect much traction.

That doesn’t need to be the end of the story. There’s another concern here:
this bug only occurs with -flat_namespace. gmp shouldn’t need
-flat_namespace, and in fact it’s undesirable to enable it. It’s coming
into this build from configure, via aclocal.m4, having been included from
libtool.m4. In libtool-2.4.6, which gmp-6.3.0 is using, that’s
https://git.savannah.gnu.org/cgit/libtool.git/tree/m4/libtool.m4?h=v2.4.6#n1070.
In particular, it intends to enable -flat_namespace only on very early Mac
OS X versions (pre-10.4, in the PowerPC-only era). But the case that we’d
like to hit, assuming MACOSX_DEPLOYMENT_TARGET is unset (as it normally
would be), doesn’t match $host on a modern macOS system, because the Darwin
version has marched past 20, while the pattern only contemplates versions
up to 19.
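
To illustrate how the match falls through, here is a simplified sketch
modeled on that check. The patterns are my approximation of the 2.4.6
logic, not a verbatim copy (the real code also prepends $wl to each flag),
and allow_undefined_flags is a hypothetical name:

```shell
# Hypothetical helper approximating the Darwin case statement in
# libtool-2.4.6's libtool.m4. Patterns are an assumption; consult the
# linked source for the exact text.
allow_undefined_flags() {
  host=$1
  case ${MACOSX_DEPLOYMENT_TARGET-10.0},$host in
    10.0,*86*-darwin8*|10.0,*-darwin[91]*)
      # Intended case: Darwin 9 through 19 with no deployment target set.
      echo "-undefined dynamic_lookup" ;;
    10.[012][,.]*)
      # Meant for pre-10.4 targets, but also catches the "10.0" default
      # when the host pattern above fails to match.
      echo "-flat_namespace -undefined suppress" ;;
    10.*)
      echo "-undefined dynamic_lookup" ;;
  esac
}

unset MACOSX_DEPLOYMENT_TARGET
allow_undefined_flags x86_64-apple-darwin19.6.0
allow_undefined_flags nehalem-apple-darwin23.6.0
```

With the deployment target unset, a darwin19 host lands in the first case,
while a darwin23 host falls through to the second and picks up
-flat_namespace.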

The commit at
https://git.savannah.gnu.org/cgit/libtool.git/commit/m4/libtool.m4?id=9e8c882517082fe5755f2524d23efb02f1522490,
in libtool-2.4.7, modernizes this check. With that version in use, libtool
does not enable -flat_namespace in this situation. Upgrading libtool in gmp
to that version will fix this problem. I ran `autoreconf --install` with
autoconf-2.69, automake-1.15, and libtool-2.4.7, and observed a clean
`make check` on macOS 14.7 x86_64 (nehalem-apple-darwin23.6.0)/Xcode 15.4
and macOS 15.0 x86_64 (nehalem-apple-darwin24.0.0)/Xcode 16.0. In both
cases, the linker is ld-new/ld-prime (no -ld_classic).
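
One way to confirm that a rebuilt library no longer uses the flat
namespace is to look for the TWOLEVEL flag in its Mach-O header, which
`otool -hv` prints symbolically. This is a sketch, not something I ran as
part of the testing above; is_two_level is a hypothetical helper and the
library path is assumed from the lldb session:

```shell
# Hypothetical helper: reads `otool -hv` output on stdin and succeeds
# if the Mach-O header flags include TWOLEVEL (two-level namespace).
is_two_level() {
  grep -q 'TWOLEVEL'
}

# Assumed usage from the top of the gmp build tree on macOS:
#   otool -hv .libs/libgmp.10.dylib | is_two_level \
#     && echo "two-level namespace" \
#     || echo "flat namespace"
```

A dylib linked with -flat_namespace lacks that flag, so the check fails
for it.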

