Best way to carry on 2-input architecture?

Niels Möller nisse at
Mon Aug 18 12:35:41 UTC 2014

"Wesley W. Terpstra" <wesley at> writes:

> On Mon, Aug 18, 2014 at 9:01 AM, Niels Möller <nisse at> wrote:
>> I don't think it has to be that bad. First, the prefix flag and register
>> should be saved and restored on irq, so there should be no problem with
>> irq:s or page faults and the like in the "middle" of an instruction.
> Yes, that's the advantage. Keep in mind, though, that dealing with
> variable-length instructions is a well understood and not-so-difficult
> problem. I just need to report the PC as being at the start of the
> prefix-chain. This change is local to the decoder.

I think the decoder could implement the prefix instruction, as I've
defined it, in that way, treating a sequence of prefix instructions +
non-prefix instruction as an indivisible longer instruction. Supervisor
mode/kernel mode code might need to know if there really is a prefix
register or not, but otherwise, it's an implementation detail not
visible to user code.
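
To make that concrete, here is a rough sketch in C of a decoder that
folds a chain of prefix words into the following instruction and reports
the PC of the first prefix. The encoding is made up for illustration
(top bit marks a prefix, low 16 bits are payload), and the names
decode_one and struct decoded are hypothetical, not from any real
design.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical encoding, for illustration only: a word with the top
   bit set is a prefix carrying a 16-bit payload in its low bits. */
#define IS_PREFIX(w)   (((w) >> 31) & 1u)
#define PREFIX_BITS(w) ((w) & 0xffffu)

struct decoded {
  size_t pc;       /* index of the first word of the prefix chain */
  size_t next_pc;  /* index of the word after the final instruction */
  uint64_t prefix; /* accumulated prefix payloads, 0 if no prefix */
  uint32_t insn;   /* the final, non-prefix instruction word */
};

/* Decode one architectural instruction starting at word index pc.
   Prefix words are consumed together with the instruction they
   modify, so the reported pc is always the start of the chain.
   Returns 0 on success, -1 if the chain runs off the end of the code. */
int decode_one(const uint32_t *code, size_t n, size_t pc,
               struct decoded *d)
{
  size_t i = pc;
  uint64_t prefix = 0;

  while (i < n && IS_PREFIX(code[i])) {
    prefix = (prefix << 16) | PREFIX_BITS(code[i]);
    i++;
  }
  if (i >= n)
    return -1;

  d->pc = pc;
  d->next_pc = i + 1;
  d->prefix = prefix;
  d->insn = code[i];
  return 0;
}
```

With this approach an exception handler only ever sees pc values that
point at the start of a chain, so there is no separate prefix state to
save and restore.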

> The value in the decoder is ahead of the values seen in the execution
> units. If an exception occurs, you need to be able to rewind/reset the
> value in the decoder to the state it would have had if execution had
> gone to the correct destination at that point.

How to deal with exceptions in an out-of-order cpu is a bit of a mystery
to me. We're drifting off-topic, but if you can educate me a bit on
that, I'd appreciate it.

E.g., for a page fault at instruction fetch, or an external irq, it
seems reasonably simple to stop decoding and issuing any new
instructions, then wait until all previously issued instructions have
completed, and at that point transfer control to the exception handler.
But if you get a page fault from a reordered load or store, or some
other exception associated with the execution of a particular
instruction, how do you stop the instruction flow at the correct point
before the control transfer to the handler? Thinking aloud, it seems one
needs to somehow

(1) cancel execution of all later (in instruction order) instructions,
    or discard any results or exceptions they might generate.

(2) complete all earlier (in instruction order) instructions. And in
    case one of those generates another exception, you need to "rewind"
    further and forget the original exception and its corresponding

and then wait until the dust settles, with no pending instructions in
the machine, ready to handle the first (in instruction order)
exception. One would also need to pay particular attention to stores
and other instructions with side effects.
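
One way to get both (1) and (2), as far as I understand it, is to let
instructions execute out of order but commit their results strictly in
program order, from a buffer of in-flight instructions. Below is a toy
model of that idea in C. All names (struct cpu, issue, complete, retire)
are made up for this sketch, it ignores the buffer wrapping around, and
stores are not modelled at all; presumably they would be held back in
the same way until they reach the head of the buffer.

```c
#include <stdbool.h>
#include <stdint.h>

#define ROB_SIZE 8
#define NREGS 4

struct rob_entry {
  bool done;      /* execution finished, possibly out of order */
  bool fault;     /* execution raised an exception */
  int dest;       /* destination register, or -1 for none */
  uint32_t value; /* speculative result, held until retirement */
};

struct cpu {
  uint32_t regs[NREGS];           /* architectural (committed) state */
  struct rob_entry rob[ROB_SIZE]; /* entries in program order */
  int head;                       /* oldest instruction not yet retired */
  int tail;                       /* one past the youngest issued one */
};

/* Issue the next instruction in program order. Returns its index,
   or -1 if the buffer is full. */
int issue(struct cpu *c, int dest)
{
  if (c->tail >= ROB_SIZE)
    return -1;
  c->rob[c->tail] = (struct rob_entry) { false, false, dest, 0 };
  return c->tail++;
}

/* An execution unit reports a result or a fault, in any order. */
void complete(struct cpu *c, int idx, uint32_t value, bool fault)
{
  c->rob[idx].done = true;
  c->rob[idx].fault = fault;
  c->rob[idx].value = value;
}

/* Commit finished instructions in program order. If the oldest
   instruction has faulted, squash it and everything younger, and
   return its index so the handler can be entered. Otherwise return -1. */
int retire(struct cpu *c)
{
  while (c->head < c->tail && c->rob[c->head].done) {
    struct rob_entry *e = &c->rob[c->head];
    if (e->fault) {
      int faulting = c->head;
      c->tail = c->head; /* discard younger results and faults */
      return faulting;
    }
    if (e->dest >= 0)
      c->regs[e->dest] = e->value;
    c->head++;
  }
  return -1;
}
```

A fault is only acted on when it reaches the head, so an earlier
instruction that faults later in time still wins automatically, and
nothing younger than the faulting instruction ever touches regs.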

> That said, from what you describe, it sounds to me like they've
> actually decomposed the FMA into two micro-ops.

I also don't know the ARM internals. But short latency between carry in
and carry out is important to make the umaal and umlal instructions
useful for bignum multiplication.
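
To illustrate why the carry latency matters, here is a small C model of
umaal (which computes the 64-bit value a*b + lo + hi, and cannot
overflow) used in an addmul_1-style loop, in the spirit of GMP's
mpn_addmul_1. The function names and the 32-bit limb type are just for
this sketch. Note that each iteration needs the cy produced by the
previous one, so the carry-in to carry-out latency sets the speed of
the whole loop.

```c
#include <stddef.h>
#include <stdint.h>

/* Model of ARM umaal: returns a * b + lo + hi as a 64-bit value.
   The maximum is (2^32-1)^2 + 2*(2^32-1) = 2^64 - 1, so the sum
   always fits in 64 bits. */
uint64_t umaal(uint32_t lo, uint32_t hi, uint32_t a, uint32_t b)
{
  return (uint64_t) a * b + lo + hi;
}

/* rp[0..n-1] += up[0..n-1] * v, returning the carry-out limb.
   One umaal per limb; cy is the loop-carried dependency. */
uint32_t addmul_1(uint32_t *rp, const uint32_t *up, size_t n, uint32_t v)
{
  uint32_t cy = 0;
  for (size_t i = 0; i < n; i++) {
    uint64_t t = umaal(rp[i], cy, up[i], v);
    rp[i] = (uint32_t) t;      /* low half is the result limb */
    cy = (uint32_t) (t >> 32); /* high half feeds the next limb */
  }
  return cy;
}
```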


Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.

More information about the gmp-devel mailing list