I notice that many of the umul_ppmm implementations in longlong.h do not use the commutative constraint modifier for their inputs, i.e. *"%r" (m0), "r" (m1)*. Is there a reason for this? Omitting it would seem to deny the compiler better register allocation opportunities, e.g. on ARM / AArch64.
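For concreteness, here is a minimal sketch of the kind of definition I mean, written as a hypothetical inline function (`umul_ppmm_sketch` is my own name, not GMP's macro). On AArch64 the high half comes from UMULH with the first input marked `"%r"`. Per the GCC docs, `%` declares that operand and the following one commutative, so the compiler *may* swap them when that satisfies the constraints more cheaply. Whether this changes the generated code is exactly what I'm asking. Other targets fall back to a portable 32-bit-halves multiply, roughly like the generic C path.

```c
#include <stdint.h>

/* Hypothetical helper (not GMP's actual umul_ppmm): computes the full
   128-bit product m0 * m1, returning the high word in *ph and the low
   word in *pl. */
static inline void umul_ppmm_sketch(uint64_t *ph, uint64_t *pl,
                                    uint64_t m0, uint64_t m1)
{
#if defined(__aarch64__) && defined(__GNUC__)
    uint64_t hi;
    /* "%r" marks m0 and m1 as a commutative pair, letting GCC
       interchange them if that is cheaper for register allocation. */
    __asm__("umulh\t%0, %1, %2" : "=r"(hi) : "%r"(m0), "r"(m1));
    *ph = hi;
    *pl = m0 * m1;              /* low half: ordinary wrapping multiply */
#else
    /* Portable fallback: schoolbook multiply on 32-bit halves. */
    uint64_t a0 = m0 & 0xffffffffu, a1 = m0 >> 32;
    uint64_t b0 = m1 & 0xffffffffu, b1 = m1 >> 32;
    uint64_t p00 = a0 * b0, p01 = a0 * b1;
    uint64_t p10 = a1 * b0, p11 = a1 * b1;
    /* Middle column sum; at most about 3 * 2^32, so it cannot overflow. */
    uint64_t mid = (p00 >> 32) + (p01 & 0xffffffffu) + (p10 & 0xffffffffu);
    *ph = p11 + (p01 >> 32) + (p10 >> 32) + (mid >> 32);
    *pl = (mid << 32) | (p00 & 0xffffffffu);
#endif
}
```

For example, `umul_ppmm_sketch(&hi, &lo, UINT64_MAX, UINT64_MAX)` gives `hi == 0xfffffffffffffffe` and `lo == 1`, since (2^64 - 1)^2 = 2^128 - 2^65 + 1.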