Some secondary asm T3,T4,T5 functions

Torbjorn Granlund tg at gmplib.org
Thu Apr 4 12:50:04 CEST 2013


David Miller <davem at davemloft.net> writes:

  Attached is a dive_1.asm that works for me on real hardware as
  well as T4 timings from:
  
  tune/speed -p10000000 -s1-1000 -f1.1 -C mpn_divexact_1.3
  
This timing is most curious.  The cost of inversion computation should
be clearly visible for tiny sizes; I'd expect more than 43 cycles there.
There are 6 chained mulx instructions for the inversion!
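
For readers unfamiliar with the inversion step, here is a minimal C sketch
of why it forms a long dependent chain. It assumes the usual scheme of an
8-bit seed followed by three Newton steps for a 64-bit limb; the function
name binvert64_sketch is hypothetical, and the brute-force seed search is
only a stand-in for a table lookup. It is not the code from dive_1.asm.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not GMP's code): inverse of an odd n modulo 2^64. */
static uint64_t binvert64_sketch(uint64_t n)
{
    assert(n & 1);  /* only odd values are invertible modulo 2^64 */

    /* Seed: inverse modulo 2^8, found by brute force here as a
       stand-in for a small lookup table. */
    uint64_t inv = 1;
    for (uint64_t c = 1; c < 256; c += 2) {
        if (((c * n) & 0xff) == 1) {
            inv = c;
            break;
        }
    }

    /* Newton steps: each doubles the number of correct low bits
       (8 -> 16 -> 32 -> 64).  Each step needs two multiplies, and each
       multiply depends on the previous one, giving 6 chained multiplies. */
    inv = 2 * inv - inv * inv * n;
    inv = 2 * inv - inv * inv * n;
    inv = 2 * inv - inv * inv * n;

    return inv;
}
```

Since no multiply can start before the previous one finishes, the latency
of this chain is paid in full on every call, which is why it should
dominate the timing at size 1.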

And then performance hits about 20 c/l early (sizes 5 to 12), only to
drop to something much worse: it jumps to about 30.8 c/l at size 30 and
settles near 26 c/l for large sizes.

Have you seen anything similar for other routines?

  overhead 6.00 cycles, precision 10000000 units of 3.51e-10 secs, CPU freq 2847.41 MHz
          mpn_divexact_1.3
  1             43.0004
  2             26.9174
  3             22.7782
  4             20.5837
  5             19.4004
  6             20.1670
  7             20.0004
  8             20.2504
  9             19.8338
  10            19.7004
  11            19.8186
  12            20.0008
  13            20.3851
  14            20.7862
  15            21.1338
  16            21.4380
  17            21.7067
  18            21.9450
  19            22.1584
  20            22.3505
  22            22.6828
  24            22.9593
  26            23.1933
  28            23.3939
  30            30.7669
  33            29.5761
  36            29.2781
  39            29.0260
  42            28.8098
  46            28.5655
  50            28.3603
  55            28.1458
  60            27.9670
  66            27.7882
  72            27.6392
  79            27.4940
  86            27.3724
  94            27.2556
  103           27.1459
  113           27.0446
  124           26.9519
  136           26.8680
  149           26.7922
  163           26.7242
  179           26.6595
  196           26.6023
  215           26.5491
  236           26.5003
  259           26.4559
  284           26.4158
  312           26.3785
  343           26.3443
  377           26.3133
  414           26.2853
  455           26.2596
  500           26.2363
  550           26.2148
  605           26.1953
  665           26.1778
  731           26.1617
  804           26.1471
  884           26.1338
  972           26.1218

-- 
Torbjörn

