Best way to carry on 2-input architecture?
Niels Möller
nisse at lysator.liu.se
Sun Aug 17 20:00:57 UTC 2014
tg at gmplib.org (Torbjörn Granlund) writes:
> My work is available here: https://gmplib.org/~tege/fisa.pdf
And mine at http://www.lysator.liu.se/~nisse/misc/instr16.pdf (a toy
project, trying to squeeze a decent instruction set with 16 general
purpose registers into 16-bit opcodes).
> I think a separate carry flag might not be good for modern designs.
> Carry/borrow state is better to keep in a plain register.
Maybe. I think the main reason I decided to keep a conditional flag
(for carry as well as for other purposes) was that otherwise, I had
trouble fitting all the needed conditional branch and conditional move
instructions.
If you have the opcode space, I think it also makes some sense to
provide a couple of single-bit flags, say 2-4 of them, and let operations
handling carry, as well as other comparisons and conditionals, specify
which flag (if any) they want to operate on.
I'm not familiar with implementing cpu's (I'd like to learn some day),
but I imagine that additional read/write ports to a couple of flag bits
are cheaper than additional read/write ports to the much wider register
file.
> For multiply, d = (a * b + c) mod B (B being the word base) and d = [(a
> * b + c) / B] are very useful. Again, the encoding and read port
> problem might arise.
Or extending that a bit, one needs an efficient way to compute a*b + c +
d, where the inputs are single-word values, and the result always fits
in two words, with no overflow. E.g., on ARM there's the "umaal"
instruction; to be really useful, it has to be implemented so that it
can start the multiply as soon as a and b are available, with the data
dependency on c and d happening later on in the pipeline. It doesn't
have to be a *single* instruction, of course, as long as it can be
efficiently implemented with available instructions.
A bit more specialized, if you don't have a divide instruction, you want
to be able to do division via multiplication. One then needs an
efficient way to compute the approximate reciprocal,
v = floor ((B^2 - 1) / d) - B,
with input B/2 <= d < B, and where the output v always fits in a single
word. See https://gmplib.org/~tege/division-paper.pdf. And a
count-leading-zeros instruction in order to handle arbitrary,
un-normalized, divisors (count-trailing-zeros is useful too (but it can
be implemented in terms of count-leading-zeros).
Regards,
/Niels
--
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
More information about the gmp-devel
mailing list