  I cooked a modern alternative:

I went ahead and committed that version, replacing the old
HGCD2_METHOD=2.  I expect it to be the fastest method on some platforms.

(We might want to arrange for longlong.h to use lzcnt instead of bsr on
modern AMD processors; the initial two count_leading_zeros operations
would then complete in one cycle instead of 8!)
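
To illustrate the lzcnt/bsr point, here is a minimal sketch. It is not the
longlong.h macro; clz64_builtin and clz64_portable are hypothetical names
made up for this example. It assumes GCC or Clang on x86-64, where
__builtin_clzll is compiled to lzcnt when the target enables it (e.g. with
-mlzcnt) and to bsr plus an xor otherwise.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not GMP's count_leading_zeros macro).
   With GCC/Clang, __builtin_clzll becomes a single lzcnt when the
   target supports it (e.g. -mlzcnt), else bsr followed by xor 63. */
static inline unsigned clz64_builtin(uint64_t x)
{
    assert(x != 0);  /* __builtin_clzll(0) is undefined */
    return (unsigned) __builtin_clzll(x);
}

/* Portable reference: shift left until the top bit is set,
   counting the shifts.  Used only to cross-check the builtin. */
static inline unsigned clz64_portable(uint64_t x)
{
    unsigned n = 0;
    assert(x != 0);
    while ((x & (UINT64_C(1) << 63)) == 0) {
        x <<= 1;
        n++;
    }
    return n;
}
```

Compiling this with and without -mlzcnt and comparing the assembly should
show which instruction the compiler picks; the actual cycle counts depend
on the processor.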

