[PATCH 1/1] aarch64: support PAC and BTI

Bill Roberts bill.roberts at foss.arm.com
Thu Aug 29 21:26:46 CEST 2024


On 8/12/24 1:54 PM, Bill Roberts wrote:
> Enable Pointer Authentication Codes (PAC) and Branch Target
> Identification (BTI) support for ARM 64 targets.
>
> PAC works by signing the LR with either an A key or B key and verifying
> the return address. There are quite a few instructions capable of doing
> this, however, the Linux ARM ABI is to use hint compatible instructions
> that can be safely NOP'd on older hardware and can be assembled and
> linked with older binutils. This limits the instruction set to paciasp,
> pacibsp, autiasp and autibsp. Instructions prefixed with pac are for
> signing and instructions prefixed with aut are for verifying. Both
> instructions are then followed with an a or b to indicate which signing
> key they are using. The keys can be controlled using
> -mbranch-protection=pac-ret for the A key and
> -mbranch-protection=pac-ret+b-key for the B key.
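
A minimal sketch of the key selection described above, assuming the compiler reports the chosen key through __ARM_FEATURE_PAC_DEFAULT as 1 for the A key and 2 for the B key (the two values the patch's m4 handles). The function name pac_insns is hypothetical and not part of the patch. The compiler may report other values for other -mbranch-protection options; like the patch's m4, this sketch sends them to the default branch.

```shell
#!/bin/sh
# Hypothetical helper, not part of the patch: map the value of
# __ARM_FEATURE_PAC_DEFAULT to the hint-space sign/verify mnemonics.
# Assumes 1 selects the A key and 2 selects the B key.
pac_insns() {
  case "$1" in
    1) echo "paciasp autiasp" ;;   # sign / verify LR with the A key
    2) echo "pacibsp autibsp" ;;   # sign / verify LR with the B key
    *) echo "none none" ;;         # PAC not enabled, or a value not handled here
  esac
}
```

Calling `pac_insns 2` prints the B-key pair; any value other than 1 or 2 prints `none none`.
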
>
> BTI works by marking all indirect call and jump targets with bti c and
> bti j instructions. If an indirect branch transfers control to an
> instruction other than a valid BTI landing pad, the process is killed
> via SIGILL. Note that, to save one instruction, the aforementioned pac
> signing instructions also work as a BTI landing pad for bti c usages.
>
> For BTI to work, all object files linked for a unit of execution,
> whether an executable or a library, must have the GNU Notes section of
> the ELF file marked to indicate BTI support. This is so loaders/linkers
> can apply the proper permission bits (PROT_BTI) on the memory region.
>
> PAC can also be annotated in the GNU ELF notes section, but it's not
> required for enablement, as interleaved PAC and non-pac code works as
> expected since it's the callee that performs all the checking. The
> linker follows the same rules as BTI for discarding the PAC flag from
> the GNU Notes section.
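
As a rough illustration of the note contents, the feature word stored under GNU_PROPERTY_AARCH64_FEATURE_1_AND can be computed as below. This sketch assumes the same inputs as the patch (the compiler's BTI and PAC default macro values) and the same bit values the m4 uses (1 for BTI, 2 for PAC). The function name feature_1_and is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper, not part of the patch: compute the feature word
# that ADD_GNU_NOTES_IF_NEEDED would place in .note.gnu.property.
#   $1 = value of __ARM_FEATURE_BTI_DEFAULT (1 when BTI is enabled)
#   $2 = value of __ARM_FEATURE_PAC_DEFAULT (1 = A key, 2 = B key)
feature_1_and() {
  bti=0
  pac=0
  if [ "$1" = 1 ]; then
    bti=1                      # BTI bit
  fi
  case "$2" in
    1|2) pac=2 ;;              # PAC bit, the same for either key
  esac
  echo $((bti + pac))
}
```

With both features enabled, `feature_1_and 1 1` yields 3; readelf -n on the resulting object should then list both properties, assuming the note was emitted.
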
>
> Testing was done under the following CFLAGS and CXXFLAGS for all
> combinations:
> 1. -mbranch-protection=none
> 2. -mbranch-protection=standard
> 3. -mbranch-protection=pac-ret
> 4. -mbranch-protection=pac-ret+b-key
> 5. -mbranch-protection=bti
>
> Additional Notes:
> MPN was handled differently than the standard approach of all PROLOGUES
> getting a SIGN_LR macro. This is because MPN does not make use of
> saving x30, aka the link register (LR), to the stack in almost all
> instances. However, some functions do, and they were explicitly handled.
> This not only avoids the cost of the operations to sign and verify the
> LR but also handles instances where branches are taken to labels where
> indirect branches are used over branch and link to optimize the assembly.
>
> Also, within the configure.ac are a myriad of options for different
> architectures, chipsets, ABIs, etc. To compound that, additional
> architecture-specific features could be enabled within CFLAGS that
> need to be respected in order to get a correct output. For instance in
> aarch64, the PAC and BTI instructions need to be output in the generated
> assembly as well as the GNU notes section added to the ELF output to get
> those security features. Hacking it into the configure options seems
> baroque, especially considering that distro packaging will often just
> set a set of CFLAGS to be respected and move on, and that's what most
> users would expect. Taking this all into consideration, allowing for a per
> architecture script that can be executed to generate additional m4
> allows for internal definitions, like in the PAC case, to be exposed, or
> any multitude of options if other archs need something like this. This
> introduces the variable gen_path_m4 that arch's can set to the script of
> their choosing to generate whatever m4 they need that is prepended to
> the m4 generation command after the defines.
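
A minimal sketch of what such a per-architecture generator could look like when restricted to POSIX sh. It assumes the preprocessor's macro dump (for example from "$CC -dM -E - </dev/null") is piped in on stdin. The function name gen_m4 is hypothetical. Note that the script in the patch feeds its loop with `<<<`, which is a bash extension and may not work where /bin/sh is a stricter shell such as dash; reading from a pipe avoids that.

```shell
#!/bin/sh
# Hypothetical sketch, not the script from the patch: read
# "#define NAME VALUE" lines on stdin and print m4 defines on stdout.
gen_m4() {
  bti=0
  pac=0
  elf=0
  # Split each line into "#define", the macro name, and the rest.
  while read -r hash name value; do
    [ "$hash" = "#define" ] || continue
    case "$name" in
      __ARM_FEATURE_BTI_DEFAULT) bti=$value ;;
      __ARM_FEATURE_PAC_DEFAULT) pac=$value ;;
      __ELF__)                   elf=$value ;;
    esac
  done
  # Always emit all three defines so the m4 side can use a plain ifelse.
  printf "define(\`ARM64_FEATURE_BTI_DEFAULT', \`%s')\n" "$bti"
  printf "define(\`ARM64_FEATURE_PAC_DEFAULT', \`%s')\n" "$pac"
  printf "define(\`ARM64_ELF', \`%s')\n" "$elf"
}
```

A caller would run something like `$CC $CFLAGS -dM -E - </dev/null | gen_m4`; because the loop and the printf calls live in the same function, the values survive even though the pipeline runs it in a subshell.
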
>
> Signed-off-by: Bill Roberts <bill.roberts at arm.com>
> ---
>   configure.ac                   | 12 +++++
>   mpn/Makeasm.am                 |  3 +-
>   mpn/arm64/aors_n.asm           |  4 ++
>   mpn/arm64/aorsmul_1.asm        |  3 ++
>   mpn/arm64/aorsorrlsh1_n.asm    |  2 +
>   mpn/arm64/aorsorrlsh2_n.asm    |  2 +
>   mpn/arm64/aorsorrlshC_n.asm    |  1 +
>   mpn/arm64/arm64-defs.m4        | 67 ++++++++++++++++++++++++++++
>   mpn/arm64/bdiv_dbm1c.asm       |  2 +
>   mpn/arm64/bdiv_q_1.asm         |  3 ++
>   mpn/arm64/cnd_aors_n.asm       |  3 ++
>   mpn/arm64/com.asm              |  2 +
>   mpn/arm64/copyd.asm            |  2 +
>   mpn/arm64/copyi.asm            |  2 +
>   mpn/arm64/divrem_1.asm         |  9 ++++
>   mpn/arm64/gcd_11.asm           |  2 +
>   mpn/arm64/gcd_22.asm           |  2 +
>   mpn/arm64/gen-extra-m4.sh      | 81 ++++++++++++++++++++++++++++++++++
>   mpn/arm64/hamdist.asm          |  7 ++-
>   mpn/arm64/invert_limb.asm      |  2 +
>   mpn/arm64/logops_n.asm         |  3 ++
>   mpn/arm64/lshift.asm           |  2 +
>   mpn/arm64/lshiftc.asm          |  2 +
>   mpn/arm64/mod_34lsub1.asm      |  2 +
>   mpn/arm64/mul_1.asm            |  3 ++
>   mpn/arm64/popcount.asm         |  8 +++-
>   mpn/arm64/rsh1aors_n.asm       |  3 ++
>   mpn/arm64/rshift.asm           |  2 +
>   mpn/arm64/sec_tabselect.asm    |  2 +
>   mpn/arm64/sqr_diag_addlsh1.asm |  2 +
>   mpn/m4-ccas                    | 23 ++++++++--
>   31 files changed, 255 insertions(+), 8 deletions(-)
>   create mode 100755 mpn/arm64/gen-extra-m4.sh
>
> diff --git a/configure.ac b/configure.ac
> index c3a4a9bf8..83a73f3a0 100644
> --- a/configure.ac
> +++ b/configure.ac
> @@ -473,6 +473,11 @@ cc_64_cflags="-O"
>   SPEED_CYCLECOUNTER_OBJ=
>   cyclecounter_size=2
>   
> +# architectures can set this to add defines dynamically to m4 generation.
> +# For example, in arm64 it is used to determine if PAC and BTI are enabled
> +# and enable generation of those instructions in m4 asm.
> +gen_path_m4=
> +
>   AC_SUBST(HAVE_HOST_CPU_FAMILY_power,  0)
>   AC_SUBST(HAVE_HOST_CPU_FAMILY_powerpc,0)
>   
> @@ -781,6 +786,7 @@ case $host in
>   	gcc_cflags_arch="-march=armv8-a"
>   	gcc_cflags_neon="-mfpu=neon"
>   	gcc_cflags_tune=""
> +	gen_path_m4="arm64/gen-extra-m4.sh"
>   	;;
>         [applem[1-9]*])
>   	abilist="64"
> @@ -4051,6 +4057,12 @@ fi
>   AC_PROG_YACC
>   AM_PROG_LEX
>   
> +# This may appear odd, however prefixing with m4 is
> +# reserved in m4/autoconf but not in automake and
> +# beyond. The prefixed version matches things like
> +# gcc_c_flags.
> +AC_SUBST([M4_GEN_PATH], [$gen_path_m4])
> +
>   # Create config.m4.
>   GMP_FINISH
>   
> diff --git a/mpn/Makeasm.am b/mpn/Makeasm.am
> index 5d7306c22..bfdc632fe 100644
> --- a/mpn/Makeasm.am
> +++ b/mpn/Makeasm.am
> @@ -115,4 +115,5 @@ RM_TMP = rm -f
>   	$(CCAS) $(COMPILE_FLAGS) tmp-$*.s -o $@
>   	$(RM_TMP) tmp-$*.s
>   .asm.lo:
> -	$(LIBTOOL) --mode=compile --tag=CC $(top_srcdir)/mpn/m4-ccas --m4="$(M4)" $(CCAS) $(COMPILE_FLAGS) `test -f '$<' || echo '$(srcdir)/'`$<
> +	$(LIBTOOL) --mode=compile --tag=CC $(top_srcdir)/mpn/m4-ccas --m4-gen-path=$(top_srcdir)/mpn/$(M4_GEN_PATH) --m4="$(M4)" \
> +		$(CCAS) $(COMPILE_FLAGS) `test -f '$<' || echo '$(srcdir)/'`$<
> diff --git a/mpn/arm64/aors_n.asm b/mpn/arm64/aors_n.asm
> index b4a6da6ff..a5b542d4d 100644
> --- a/mpn/arm64/aors_n.asm
> +++ b/mpn/arm64/aors_n.asm
> @@ -60,13 +60,16 @@ ifdef(`OPERATION_sub_n', `
>     define(`func_nc',	mpn_sub_nc)')
>   
>   MULFUNC_PROLOGUE(mpn_add_n mpn_add_nc mpn_sub_n mpn_sub_nc)
> +	BTI_C
>   
>   ASM_START()
>   PROLOGUE(func_nc)
> +	BTI_C
>   	SETCY(	x4)
>   	b	L(ent)
>   EPILOGUE()
>   PROLOGUE(func_n)
> +	BTI_C
>   	CLRCY
>   L(ent):	lsr	x17, n, #2
>   	tbz	n, #0, L(bx0)
> @@ -123,3 +126,4 @@ L(end):	ADDSUBC	x12, x6, x10
>   L(ret):	RETVAL
>   	ret
>   EPILOGUE()
> +ADD_GNU_NOTES_IF_NEEDED
> diff --git a/mpn/arm64/aorsmul_1.asm b/mpn/arm64/aorsmul_1.asm
> index 81ec1dabb..05091330d 100644
> --- a/mpn/arm64/aorsmul_1.asm
> +++ b/mpn/arm64/aorsmul_1.asm
> @@ -68,8 +68,10 @@ ifdef(`OPERATION_submul_1', `
>     define(`func',	mpn_submul_1)')
>   
>   MULFUNC_PROLOGUE(mpn_addmul_1 mpn_submul_1)
> +	BTI_C
>   
>   PROLOGUE(func)
> +	BTI_C
>   	adds	x15, xzr, xzr
>   
>   	tbz	n, #0, L(1)
> @@ -143,3 +145,4 @@ L(mid):	sub	n, n, #1
>   	csinc	x0, x15, x15, COND
>   	ret
>   EPILOGUE()
> +ADD_GNU_NOTES_IF_NEEDED
> diff --git a/mpn/arm64/aorsorrlsh1_n.asm b/mpn/arm64/aorsorrlsh1_n.asm
> index c617a67a9..2de3ff992 100644
> --- a/mpn/arm64/aorsorrlsh1_n.asm
> +++ b/mpn/arm64/aorsorrlsh1_n.asm
> @@ -39,5 +39,7 @@ ifdef(`OPERATION_sublsh1_n',`define(`DO_sub')')
>   ifdef(`OPERATION_rsblsh1_n',`define(`DO_rsb')')
>   
>   MULFUNC_PROLOGUE(mpn_addlsh1_n mpn_sublsh1_n mpn_rsblsh1_n)
> +	BTI_C
>   
>   include_mpn(`arm64/aorsorrlshC_n.asm')
> +ADD_GNU_NOTES_IF_NEEDED
> diff --git a/mpn/arm64/aorsorrlsh2_n.asm b/mpn/arm64/aorsorrlsh2_n.asm
> index 852d11720..2161ae2a9 100644
> --- a/mpn/arm64/aorsorrlsh2_n.asm
> +++ b/mpn/arm64/aorsorrlsh2_n.asm
> @@ -39,5 +39,7 @@ ifdef(`OPERATION_sublsh2_n',`define(`DO_sub')')
>   ifdef(`OPERATION_rsblsh2_n',`define(`DO_rsb')')
>   
>   MULFUNC_PROLOGUE(mpn_addlsh2_n mpn_sublsh2_n mpn_rsblsh2_n)
> +	BTI_C
>   
>   include_mpn(`arm64/aorsorrlshC_n.asm')
> +ADD_GNU_NOTES_IF_NEEDED
> diff --git a/mpn/arm64/aorsorrlshC_n.asm b/mpn/arm64/aorsorrlshC_n.asm
> index 1718b7757..97df8c6f0 100644
> --- a/mpn/arm64/aorsorrlshC_n.asm
> +++ b/mpn/arm64/aorsorrlshC_n.asm
> @@ -65,6 +65,7 @@ ifdef(`DO_rsb', `
>   
>   ASM_START()
>   PROLOGUE(func_n)
> +	BTI_C
>   	lsr	x6, n, #2
>   	tbz	n, #0, L(bx0)
>   
> diff --git a/mpn/arm64/arm64-defs.m4 b/mpn/arm64/arm64-defs.m4
> index 46149f7bf..d0ad4b63c 100644
> --- a/mpn/arm64/arm64-defs.m4
> +++ b/mpn/arm64/arm64-defs.m4
> @@ -36,6 +36,73 @@ dnl  don't want to disable macro expansions in or after them.
>   
>   changecom
>   
> +dnl use the hint instructions so they NOP on older machines.
> +dnl Add comments so the assembly is notated with the instruction
> +
> +
> +define(`BTI_C', `hint #34    /* bti c */')
> +define(`PACIASP', `hint #25  /* paciasp */')
> +define(`AUTIASP', `hint #29  /* autiasp */')
> +define(`PACIBSP', `hint #27  /* pacibsp */')
> +define(`AUTIBSP', `hint #31  /* autibsp */')
> +
> +dnl if BTI is enabled we want the SIGN_LR to be a valid
> +dnl landing pad, we don't need VERIFY_LR and we need to
> +dnl indicate the valid BTI support for gnu notes.
> +
> +
> +ifelse(ARM64_FEATURE_BTI_DEFAULT, `1',
> +  `define(`SIGN_LR', `BTI_C')
> +   define(`GNU_PROPERTY_AARCH64_BTI', `1')
> +   define(`PAC_OR_BTI')',
> +   define(`GNU_PROPERTY_AARCH64_BTI', `0')'
> +')
> +
> +dnl define instructions for PAC, which can use the A
> +dnl or the B key. PAC instructions are also valid BTI
> +dnl landing pads, so we re-define SIGN_LR if BTI is
> +dnl enabled.
> +
> +
> +ifelse(ARM64_FEATURE_PAC_DEFAULT, `1',
> +    `define(`SIGN_LR', `PACIASP')
> +     define(`VERIFY_LR', `AUTIASP')
> +     define(`GNU_PROPERTY_AARCH64_POINTER_AUTH', `2')
> +     define(`PAC_OR_BTI')',
> +   ARM64_FEATURE_PAC_DEFAULT, `2',
> +    `define(`SIGN_LR', `PACIBSP')
> +     define(`VERIFY_LR', `AUTIBSP')
> +     define(`GNU_PROPERTY_AARCH64_POINTER_AUTH', `2')
> +     define(`PAC_OR_BTI')',
> +    `ifdef(`SIGN_LR', , `define(`SIGN_LR', `')')
> +     define(`VERIFY_LR', `')
> +     define(`GNU_PROPERTY_AARCH64_POINTER_AUTH', `0')'
> +')
> +
> +dnl ADD_GNU_NOTES_IF_NEEDED
> +dnl
> +dnl Conditionally add into ELF assembly files the GNU notes indicating if
> +dnl BTI or PAC is supported. BTI is required by the linkers and loaders, however
> +dnl PAC is a nice to have for auditing. Use readelf -n to display.
> +
> +
> +define(`ADD_GNU_NOTES_IF_NEEDED', `
> +  ifdef(`ARM64_ELF', `
> +    ifdef(`PAC_OR_BTI', `
> +      .pushsection .note.gnu.property, "a";
> +      .balign 8;
> +      .long 4;
> +      .long 0x10;
> +      .long 0x5;
> +      .asciz "GNU";
> +      .long 0xc0000000; /* GNU_PROPERTY_AARCH64_FEATURE_1_AND */
> +      .long 4;
> +      .long eval(indir(`GNU_PROPERTY_AARCH64_POINTER_AUTH') + indir(`GNU_PROPERTY_AARCH64_BTI'));
> +      .long 0;
> +      .popsection;
> +    ')
> +  ')
> +')
>   
>   dnl  LEA_HI(reg,gmp_symbol), LEA_LO(reg,gmp_symbol)
>   dnl
> diff --git a/mpn/arm64/bdiv_dbm1c.asm b/mpn/arm64/bdiv_dbm1c.asm
> index 78984b426..9f15f8e59 100644
> --- a/mpn/arm64/bdiv_dbm1c.asm
> +++ b/mpn/arm64/bdiv_dbm1c.asm
> @@ -45,6 +45,7 @@ ASM_START()
>   	TEXT
>   	ALIGN(16)
>   PROLOGUE(mpn_bdiv_dbm1c)
> +	BTI_C
>   	ldr	x5, [up], #8
>   	ands	x6, n, #3
>   	b.eq	L(fi0)
> @@ -109,3 +110,4 @@ L(wd1):	subs	x4, x4, x12
>   	sbc	x0, x4, x13
>   	ret
>   EPILOGUE()
> +ADD_GNU_NOTES_IF_NEEDED
> diff --git a/mpn/arm64/bdiv_q_1.asm b/mpn/arm64/bdiv_q_1.asm
> index 7fffc9369..401227a83 100644
> --- a/mpn/arm64/bdiv_q_1.asm
> +++ b/mpn/arm64/bdiv_q_1.asm
> @@ -56,6 +56,7 @@ define(`tnc', `x8')
>   
>   ASM_START()
>   PROLOGUE(mpn_bdiv_q_1)
> +	BTI_C
>   
>   	rbit	x6, d
>   	clz	cnt, x6
> @@ -79,6 +80,7 @@ PROLOGUE(mpn_bdiv_q_1)
>   EPILOGUE()
>   
>   PROLOGUE(mpn_pi1_bdiv_q_1)
> +	BTI_C
>   	sub	n, n, #1
>   	subs	x6, x6, x6		C clear r6 and C flag
>   	ldr	x9, [up],#8
> @@ -120,3 +122,4 @@ L(tpn):	ldr	x9, [up],#8
>   
>   L(en1):	ret
>   EPILOGUE()
> +ADD_GNU_NOTES_IF_NEEDED
> diff --git a/mpn/arm64/cnd_aors_n.asm b/mpn/arm64/cnd_aors_n.asm
> index 397aa5100..916708885 100644
> --- a/mpn/arm64/cnd_aors_n.asm
> +++ b/mpn/arm64/cnd_aors_n.asm
> @@ -57,9 +57,11 @@ ifdef(`OPERATION_cnd_sub_n', `
>     define(`func',	mpn_cnd_sub_n)')
>   
>   MULFUNC_PROLOGUE(mpn_cnd_add_n mpn_cnd_sub_n)
> +	BTI_C
>   
>   ASM_START()
>   PROLOGUE(func)
> +	BTI_C
>   	cmp	cnd, #1
>   	sbc	cnd, cnd, cnd
>   
> @@ -127,3 +129,4 @@ L(end):	bic	x6, x12, cnd
>   L(rt):	RETVAL
>   	ret
>   EPILOGUE()
> +ADD_GNU_NOTES_IF_NEEDED
> diff --git a/mpn/arm64/com.asm b/mpn/arm64/com.asm
> index d59494380..82b6787bf 100644
> --- a/mpn/arm64/com.asm
> +++ b/mpn/arm64/com.asm
> @@ -47,6 +47,7 @@ define(`n',  `x2')
>   
>   ASM_START()
>   PROLOGUE(mpn_com)
> +	BTI_C
>   	cmp	n, #3
>   	b.le	L(bc)
>   
> @@ -90,3 +91,4 @@ L(tl1):	tbz	n, #0, L(tl2)
>   	str	x4, [rp]
>   L(tl2):	ret
>   EPILOGUE()
> +ADD_GNU_NOTES_IF_NEEDED
> diff --git a/mpn/arm64/copyd.asm b/mpn/arm64/copyd.asm
> index d542970b7..b221d23a8 100644
> --- a/mpn/arm64/copyd.asm
> +++ b/mpn/arm64/copyd.asm
> @@ -47,6 +47,7 @@ define(`n',  `x2')
>   
>   ASM_START()
>   PROLOGUE(mpn_copyd)
> +	BTI_C
>   	add	rp, rp, n, lsl #3
>   	add	up, up, n, lsl #3
>   
> @@ -83,3 +84,4 @@ L(tl1):	tbz	n, #0, L(tl2)
>   	str	x4, [rp,#-8]
>   L(tl2):	ret
>   EPILOGUE()
> +ADD_GNU_NOTES_IF_NEEDED
> diff --git a/mpn/arm64/copyi.asm b/mpn/arm64/copyi.asm
> index 0de40c5d7..360266c67 100644
> --- a/mpn/arm64/copyi.asm
> +++ b/mpn/arm64/copyi.asm
> @@ -47,6 +47,7 @@ define(`n',  `x2')
>   
>   ASM_START()
>   PROLOGUE(mpn_copyi)
> +	BTI_C
>   	cmp	n, #3
>   	b.le	L(bc)
>   
> @@ -80,3 +81,4 @@ L(tl1):	tbz	n, #0, L(tl2)
>   	str	x4, [rp]
>   L(tl2):	ret
>   EPILOGUE()
> +ADD_GNU_NOTES_IF_NEEDED
> diff --git a/mpn/arm64/divrem_1.asm b/mpn/arm64/divrem_1.asm
> index 9d5bb5959..2bb8850d9 100644
> --- a/mpn/arm64/divrem_1.asm
> +++ b/mpn/arm64/divrem_1.asm
> @@ -66,6 +66,8 @@ dnl                      mp_limb_t d_unnorm, mp_limb_t dinv, int cnt)
>   ASM_START()
>   
>   PROLOGUE(mpn_preinv_divrem_1)
> +	BTI_C
> +	SIGN_LR
>   	cbz	n_arg, L(fz)
>   	stp	x29, x30, [sp, #-80]!
>   	mov	x29, sp
> @@ -86,6 +88,8 @@ PROLOGUE(mpn_preinv_divrem_1)
>   EPILOGUE()
>   
>   PROLOGUE(mpn_divrem_1)
> +	BTI_C
> +	SIGN_LR
>   	cbz	n_arg, L(fz)
>   	stp	x29, x30, [sp, #-80]!
>   	mov	x29, sp
> @@ -154,6 +158,7 @@ L(uend):add	x2, x11, #1
>   	ldp	x21, x22, [sp, #32]
>   	ldp	x23, x24, [sp, #48]
>   	ldp	x29, x30, [sp], #80
> +	VERIFY_LR
>   	ret
>   
>   L(ufx):	add	x2, x2, #1
> @@ -194,6 +199,7 @@ L(nend):cbnz	fn, L(frac)
>   	ldp	x21, x22, [sp, #32]
>   	ldp	x23, x24, [sp, #48]
>   	ldp	x29, x30, [sp], #80
> +	VERIFY_LR
>   	ret
>   
>   L(nfx):	add	x2, x2, #1
> @@ -219,6 +225,7 @@ L(ftop):add	x2, x11, #1
>   	ldp	x21, x22, [sp, #32]
>   	ldp	x23, x24, [sp, #48]
>   	ldp	x29, x30, [sp], #80
> +	VERIFY_LR
>   	ret
>   
>   C Block zero. We need this for the degenerated case of n = 0, fn != 0.
> @@ -227,5 +234,7 @@ L(ztop):str	xzr, [qp_arg], #8
>   	sub	fn_arg, fn_arg, #1
>   	cbnz	fn_arg, L(ztop)
>   L(zend):mov	x0, #0
> +	VERIFY_LR
>   	ret
>   EPILOGUE()
> +ADD_GNU_NOTES_IF_NEEDED
> diff --git a/mpn/arm64/gcd_11.asm b/mpn/arm64/gcd_11.asm
> index d8cc3e2cf..5e18fa21b 100644
> --- a/mpn/arm64/gcd_11.asm
> +++ b/mpn/arm64/gcd_11.asm
> @@ -54,6 +54,7 @@ ASM_START()
>   	TEXT
>   	ALIGN(16)
>   PROLOGUE(mpn_gcd_11)
> +	BTI_C
>   	subs	x3, u0, v0		C			0
>   	b.eq	L(end)			C
>   
> @@ -68,3 +69,4 @@ L(top):	rbit	x12, x3			C			1,5
>   
>   L(end):	ret
>   EPILOGUE()
> +ADD_GNU_NOTES_IF_NEEDED
> diff --git a/mpn/arm64/gcd_22.asm b/mpn/arm64/gcd_22.asm
> index 5367fea02..4a0b902b7 100644
> --- a/mpn/arm64/gcd_22.asm
> +++ b/mpn/arm64/gcd_22.asm
> @@ -56,6 +56,7 @@ define(`tnc',   `x8')
>   
>   ASM_START()
>   PROLOGUE(mpn_gcd_22)
> +	BTI_C
>   
>   	ALIGN(16)
>   L(top):	subs	t0, u0, v0		C 0 6
> @@ -110,3 +111,4 @@ L(end):	mov	x0, v0
>   	mov	x1, v1
>   	ret
>   EPILOGUE()
> +ADD_GNU_NOTES_IF_NEEDED
> diff --git a/mpn/arm64/gen-extra-m4.sh b/mpn/arm64/gen-extra-m4.sh
> new file mode 100755
> index 000000000..09c5b8975
> --- /dev/null
> +++ b/mpn/arm64/gen-extra-m4.sh
> @@ -0,0 +1,81 @@
> +#!/bin/sh
> +#
> +# A script for dynamically generating m4 definitions for aarch64 based on compilation flags.
> +#
> +# Copyright 2024 ARM Ltd.
> +#
> +#  This file is part of the GNU MP Library.
> +#
> +#  The GNU MP Library is free software; you can redistribute it and/or modify
> +#  it under the terms of either:
> +#
> +#    * the GNU Lesser General Public License as published by the Free
> +#      Software Foundation; either version 3 of the License, or (at your
> +#      option) any later version.
> +#
> +#  or
> +#
> +#    * the GNU General Public License as published by the Free Software
> +#      Foundation; either version 2 of the License, or (at your option) any
> +#      later version.
> +#
> +#  or both in parallel, as here.
> +#
> +#  The GNU MP Library is distributed in the hope that it will be useful, but
> +#  WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
> +#  or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
> +#  for more details.
> +#
> +#  You should have received copies of the GNU General Public License and the
> +#  GNU Lesser General Public License along with the GNU MP Library.  If not,
> +#  see https://www.gnu.org/licenses/.
> +
> +# Usage: ./gen-extra-m4.sh "$CC"
> +# Returns: valid M4 to stdout.
> +
> +if test "$#" -ne 1; then
> +  echo "Expected 1 argument, the CC. Got: $#"
> +  exit 1
> +fi
> +
> +CC=$1
> +
> +ARM64_FEATURE_BTI_DEFAULT="0"
> +ARM64_FEATURE_PAC_DEFAULT="0"
> +ARM64_ELF="0"
> +
> +# strip -o from CC line so -dM works
> +_CC=$(echo "$CC" | sed 's/-o [^ ]*//')
> +output=$($_CC -dM -E - < /dev/null || exit $?)
> +while IFS= read -r line; do
> +  # Skip empty lines
> +  if test -z "$line"; then
> +    continue
> +  fi
> +  # Match the #define pattern and extract the macro name and value
> +  case "$line" in
> +    \#define\ *\ *)
> +      macro_name=`echo "$line" | awk '{print $2}'`
> +      macro_value=`echo "$line" | cut -d ' ' -f 3- | sed 's/^"\(.*\)"$/\1/'`
> +      # map's would be nice in POSIX shell, could use eval to simplify, but
> +      # I won't do that to others.
> +      case "$macro_name" in
> +        __ARM_FEATURE_BTI_DEFAULT)
> +          ARM64_FEATURE_BTI_DEFAULT="$macro_value"
> +        ;;
> +        __ARM_FEATURE_PAC_DEFAULT)
> +          ARM64_FEATURE_PAC_DEFAULT="$macro_value"
> +        ;;
> +        __ELF__)
> +          ARM64_ELF="$macro_value"
> +        ;;
> +      esac # end assignments
> +      ;;
> +  esac # end define
> +done <<< "$output"
> +
> +# Output the M4 define statement. To make m4 simpler always output something so we can
> +# use an ifelse without needing to nest it within an ifdef.
> +echo "define(\`ARM64_FEATURE_BTI_DEFAULT', \`$ARM64_FEATURE_BTI_DEFAULT')"
> +echo "define(\`ARM64_FEATURE_PAC_DEFAULT', \`$ARM64_FEATURE_PAC_DEFAULT')"
> +echo "define(\`ARM64_ELF', \`$ARM64_ELF')"
> diff --git a/mpn/arm64/hamdist.asm b/mpn/arm64/hamdist.asm
> index c72ca55b3..418519458 100644
> --- a/mpn/arm64/hamdist.asm
> +++ b/mpn/arm64/hamdist.asm
> @@ -60,12 +60,13 @@ define(`chunksize',0x1ff0)
>   
>   ASM_START()
>   PROLOGUE(mpn_hamdist)
> +	BTI_C
>   
>   	mov	x11, #maxsize
>   	cmp	n, x11
>   	b.hi	L(gt8k)
>   
> -L(lt8k):
> +L(lt8k):	BTI_C
>   	movi	v4.16b, #0			C clear summation register
>   	movi	v5.16b, #0			C clear summation register
>   
> @@ -103,7 +104,8 @@ L(gt4):	ld1	{v2.2d,v3.2d}, [ap], #32	C load 4 limbs
>   L(000):	subs	n, n, #8
>   	b.lo	L(e0)
>   
> -L(chu):	ld1	{v2.2d,v3.2d}, [ap], #32	C load 4 limbs
> +L(chu):	BTI_C
> +	ld1	{v2.2d,v3.2d}, [ap], #32	C load 4 limbs
>   	ld1	{v0.2d,v1.2d}, [ap], #32	C load 4 limbs
>   	ld1	{v18.2d,v19.2d}, [bp], #32	C load 4 limbs
>   	ld1	{v16.2d,v17.2d}, [bp], #32	C load 4 limbs
> @@ -179,3 +181,4 @@ L(gt8k):
>   	mov	x30, x8
>   	ret
>   EPILOGUE()
> +ADD_GNU_NOTES_IF_NEEDED
> diff --git a/mpn/arm64/invert_limb.asm b/mpn/arm64/invert_limb.asm
> index 6a99bf002..a42a3c751 100644
> --- a/mpn/arm64/invert_limb.asm
> +++ b/mpn/arm64/invert_limb.asm
> @@ -40,6 +40,7 @@ C Compiler generated, mildly edited.  Could surely be further optimised.
>   
>   ASM_START()
>   PROLOGUE(mpn_invert_limb)
> +	BTI_C
>   	lsr	x2, x0, #54
>   	LEA_HI(	x1, approx_tab)
>   	and	x2, x2, #0x1fe
> @@ -81,3 +82,4 @@ approx_tab:
>   forloop(i,256,512-1,dnl
>   `	.hword	eval(0x7fd00/i)
>   ')dnl
> +ADD_GNU_NOTES_IF_NEEDED
> diff --git a/mpn/arm64/logops_n.asm b/mpn/arm64/logops_n.asm
> index e959abc71..c3400c760 100644
> --- a/mpn/arm64/logops_n.asm
> +++ b/mpn/arm64/logops_n.asm
> @@ -75,9 +75,11 @@ ifdef(`OPERATION_xnor_n',`
>     define(`LOGOP',   `eon	$1, $2, $3')')
>   
>   MULFUNC_PROLOGUE(mpn_and_n mpn_andn_n mpn_nand_n mpn_ior_n mpn_iorn_n mpn_nior_n mpn_xor_n mpn_xnor_n)
> +	BTI_C
>   
>   ASM_START()
>   PROLOGUE(func)
> +	BTI_C
>   	lsr	x17, n, #2
>   	tbz	n, #0, L(bx0)
>   
> @@ -137,3 +139,4 @@ L(end):	LOGOP(	x12, x6, x10)
>   	stp	x12, x13, [rp]
>   L(ret):	ret
>   EPILOGUE()
> +ADD_GNU_NOTES_IF_NEEDED
> diff --git a/mpn/arm64/lshift.asm b/mpn/arm64/lshift.asm
> index fe8a1aa18..a0cf9a3db 100644
> --- a/mpn/arm64/lshift.asm
> +++ b/mpn/arm64/lshift.asm
> @@ -58,6 +58,7 @@ define(`NSHIFT', lsr)
>   
>   ASM_START()
>   PROLOGUE(mpn_lshift)
> +	BTI_C
>   	add	rp, rp_arg, n, lsl #3
>   	add	up, up, n, lsl #3
>   	sub	tnc, xzr, cnt
> @@ -136,3 +137,4 @@ L(end):	orr	x10, x10, x13
>   	str	x2, [rp,#-24]
>   	ret
>   EPILOGUE()
> +ADD_GNU_NOTES_IF_NEEDED
> diff --git a/mpn/arm64/lshiftc.asm b/mpn/arm64/lshiftc.asm
> index 6bf584400..5880912de 100644
> --- a/mpn/arm64/lshiftc.asm
> +++ b/mpn/arm64/lshiftc.asm
> @@ -58,6 +58,7 @@ define(`NSHIFT', lsr)
>   
>   ASM_START()
>   PROLOGUE(mpn_lshiftc)
> +	BTI_C
>   	add	rp, rp_arg, n, lsl #3
>   	add	up, up, n, lsl #3
>   	sub	tnc, xzr, cnt
> @@ -139,3 +140,4 @@ L(end):	eon	x10, x10, x13
>   	str	x2, [rp,#-24]
>   	ret
>   EPILOGUE()
> +ADD_GNU_NOTES_IF_NEEDED
> diff --git a/mpn/arm64/mod_34lsub1.asm b/mpn/arm64/mod_34lsub1.asm
> index 7945fe72c..ac84675b7 100644
> --- a/mpn/arm64/mod_34lsub1.asm
> +++ b/mpn/arm64/mod_34lsub1.asm
> @@ -62,6 +62,7 @@ ASM_START()
>   	TEXT
>   	ALIGN(32)
>   PROLOGUE(mpn_mod_34lsub1)
> +	BTI_C
>   	subs	n, n, #3
>   	mov	x8, #0
>   	b.lt	L(le2)			C n <= 2
> @@ -122,3 +123,4 @@ L(1):	ldr	x2, [ap]
>   	add	x0, x0, x2, lsr #48
>   	ret
>   EPILOGUE()
> +ADD_GNU_NOTES_IF_NEEDED
> diff --git a/mpn/arm64/mul_1.asm b/mpn/arm64/mul_1.asm
> index fb965efff..87760191d 100644
> --- a/mpn/arm64/mul_1.asm
> +++ b/mpn/arm64/mul_1.asm
> @@ -51,11 +51,13 @@ define(`v0', `x3')
>   
>   
>   PROLOGUE(mpn_mul_1c)
> +	BTI_C
>   	adds	xzr, xzr, xzr		C clear cy flag
>   	b	L(com)
>   EPILOGUE()
>   
>   PROLOGUE(mpn_mul_1)
> +	BTI_C
>   	adds	x4, xzr, xzr		C clear register and cy flag
>   L(com):	lsr	x17, n, #2
>   	tbnz	n, #0, L(bx1)
> @@ -126,3 +128,4 @@ L(2e):	adcs	x12, x8, x11
>   L(1):	adc	x0, x11, xzr
>   	ret
>   EPILOGUE()
> +ADD_GNU_NOTES_IF_NEEDED
> diff --git a/mpn/arm64/popcount.asm b/mpn/arm64/popcount.asm
> index 74de3fc01..4ea179faf 100644
> --- a/mpn/arm64/popcount.asm
> +++ b/mpn/arm64/popcount.asm
> @@ -59,12 +59,14 @@ define(`chunksize',0x1ff0)
>   
>   ASM_START()
>   PROLOGUE(mpn_popcount)
> +	BTI_C
>   
>   	mov	x11, #maxsize
>   	cmp	n, x11
>   	b.hi	L(gt8k)
>   
> -L(lt8k):
> +L(lt8k):	BTI_C
> +
>   	movi	v4.16b, #0			C clear summation register
>   	movi	v5.16b, #0			C clear summation register
>   
> @@ -94,7 +96,8 @@ L(gt4):	ld1	{v2.2d,v3.2d}, [ap], #32	C load 4 limbs
>   L(000):	subs	n, n, #8
>   	b.lo	L(e0)
>   
> -L(chu):	ld1	{v2.2d,v3.2d}, [ap], #32	C load 4 limbs
> +L(chu):	BTI_C
> +	ld1	{v2.2d,v3.2d}, [ap], #32	C load 4 limbs
>   	ld1	{v0.2d,v1.2d}, [ap], #32	C load 4 limbs
>   	cnt	v6.16b, v2.16b
>   	cnt	v7.16b, v3.16b
> @@ -155,3 +158,4 @@ L(gt8k):
>   	mov	x30, x8
>   	ret
>   EPILOGUE()
> +ADD_GNU_NOTES_IF_NEEDED
> diff --git a/mpn/arm64/rsh1aors_n.asm b/mpn/arm64/rsh1aors_n.asm
> index afd3d5be4..17487d5d7 100644
> --- a/mpn/arm64/rsh1aors_n.asm
> +++ b/mpn/arm64/rsh1aors_n.asm
> @@ -56,9 +56,11 @@ ifdef(`OPERATION_rsh1sub_n', `
>     define(`func_n',	mpn_rsh1sub_n)')
>   
>   MULFUNC_PROLOGUE(mpn_rsh1add_n mpn_rsh1sub_n)
> +	BTI_C
>   
>   ASM_START()
>   PROLOGUE(func_n)
> +	BTI_C
>   	lsr	x6, n, #2
>   
>   	tbz	n, #0, L(bx0)
> @@ -166,3 +168,4 @@ L(2):	cset	x14, COND
>   L(ret):	mov	x0, x10
>   	ret
>   EPILOGUE()
> +ADD_GNU_NOTES_IF_NEEDED
> diff --git a/mpn/arm64/rshift.asm b/mpn/arm64/rshift.asm
> index 90187ad51..d3fc16301 100644
> --- a/mpn/arm64/rshift.asm
> +++ b/mpn/arm64/rshift.asm
> @@ -58,6 +58,7 @@ define(`NSHIFT', lsl)
>   
>   ASM_START()
>   PROLOGUE(mpn_rshift)
> +	BTI_C
>   	mov	rp, rp_arg
>   	sub	tnc, xzr, cnt
>   	lsr	x17, n, #2
> @@ -134,3 +135,4 @@ L(end):	orr	x10, x10, x13
>   	str	x2, [rp,#32]
>   	ret
>   EPILOGUE()
> +ADD_GNU_NOTES_IF_NEEDED
> diff --git a/mpn/arm64/sec_tabselect.asm b/mpn/arm64/sec_tabselect.asm
> index 18a268ace..d671b6f74 100644
> --- a/mpn/arm64/sec_tabselect.asm
> +++ b/mpn/arm64/sec_tabselect.asm
> @@ -57,6 +57,7 @@ define(`maskq',  `v4')
>   
>   ASM_START()
>   PROLOGUE(mpn_sec_tabselect)
> +	BTI_C
>   	dup	v7.2d, x4			C 2 `which' copies
>   
>   	mov	x10, #1
> @@ -120,3 +121,4 @@ L(tp1):	cmeq	maskq.2d, v5.2d, v7.2d
>   
>   L(b00):	ret
>   EPILOGUE()
> +ADD_GNU_NOTES_IF_NEEDED
> diff --git a/mpn/arm64/sqr_diag_addlsh1.asm b/mpn/arm64/sqr_diag_addlsh1.asm
> index 39f1cb1bc..599717d3a 100644
> --- a/mpn/arm64/sqr_diag_addlsh1.asm
> +++ b/mpn/arm64/sqr_diag_addlsh1.asm
> @@ -46,6 +46,7 @@ define(`n',  `x3')
>   
>   ASM_START()
>   PROLOGUE(mpn_sqr_diag_addlsh1)
> +	BTI_C
>   	ldr	x15, [up],#8
>   	lsr	x14, n, #1
>   	tbz	n, #0, L(bx0)
> @@ -100,3 +101,4 @@ L(end):	extr	x9, x6, x5, #63
>   
>   	ret
>   EPILOGUE()
> +ADD_GNU_NOTES_IF_NEEDED
> diff --git a/mpn/m4-ccas b/mpn/m4-ccas
> index 16d80c6f5..1d68bfe8b 100755
> --- a/mpn/m4-ccas
> +++ b/mpn/m4-ccas
> @@ -49,6 +49,8 @@ CC=
>   DEFS=
>   ASM=
>   SEEN_O=no
> +M4_GENPATH=
> +M4_GENERATED=
>   
>   for i in "$@"; do
>     case $i in
> @@ -73,6 +75,9 @@ for i in "$@"; do
>         SEEN_O=yes
>         CC="$CC $i"
>         ;;
> +    --m4-gen-path=*)
> +      M4_GENPATH=`echo "$i" | sed 's/^--m4-gen-path=//'`
> +      ;;
>       *)
>         CC="$CC $i"
>         ;;
> @@ -97,11 +102,23 @@ if test $SEEN_O = no; then
>     CC="$CC -o $BASENAME.o"
>   fi
>   
> -echo "$M4 $DEFS $ASM >$TMP"
> -$M4 $DEFS $ASM >$TMP || exit
> +# Does the architecture have any dynamically generated m4?
> +# if so execute the generation script
> +if test -n "$M4_GENPATH"; then
> +  if ! test -f "$M4_GENPATH"; then
> +    echo "$M4_GENPATH not found."
> +    exit 1
> +  fi
> +  echo "$M4_GENPATH \"$CC\""
> +  M4_GENERATED="${TMP%.*}.m4"
> +  "$M4_GENPATH" "$CC" > "$M4_GENERATED" || exit
> +fi
> +
> +echo "$M4 $DEFS $M4_GENERATED $ASM >$TMP"
> +$M4 $DEFS "$M4_GENERATED" $ASM >$TMP || exit
>   
>   echo "$CC"
>   $CC || exit
>   
>   # Comment this out to preserve .s intermediates
> -rm -f $TMP
> +rm -f $TMP "$M4_GENERATED"


Bump... anything?



