There is a small complication: pipelined and superscalar processors,
i.e., RISC and some modern CISC machines, can process independent
instructions while waiting for the result of an earlier instruction. The
compiler usually reorders (schedules) the instructions so that these
delay slots are put to good use. However, on our first tries
the compiler did not do well on scheduling primitives. E.g., for +
implemented as
n=sp[0]+sp[1]; sp++; sp[0]=n; NEXT;
the NEXT comes strictly after the other code, i.e., there is almost no
scheduling. After a little thought the problem becomes clear: the
compiler cannot know that sp and ip point to different addresses (and
the version of gcc we used would not know it even if it were
possible), so it could not move the load of the cfa above the store to
the TOS. Indeed, the pointers could be the same if code on or very near
the top of stack were executed. In the interest of speed we chose to
forbid this probably unused "feature" and helped the compiler in
scheduling: NEXT is divided into the loading part (NEXT_P1) and
the goto part (NEXT_P2). + now looks like:
n=sp[0]+sp[1]; sp++; NEXT_P1; sp[0]=n; NEXT_P2;
This can be scheduled optimally by the compiler.
This division can be turned off with the switch -DCISC_NEXT. This
switch is on by default on machines that do not profit from scheduling
(e.g., the 80386), in order to preserve registers.
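
The split of NEXT described above can be illustrated with a small,
self-contained sketch. It is not the actual engine source: the names
run_add_demo, Cell, do_add and do_halt are hypothetical, the macro
bodies are an assumption about how such a split might be written, and
the code relies on the GNU C "labels as values" extension (gcc or
clang). The threaded code consists of just two primitives, + and a
halt that returns the TOS.

```c
typedef long Cell;

/* Hypothetical demo: executes the threaded code "+ halt" on a
   two-element stack and returns the resulting top of stack. */
Cell run_add_demo(Cell a, Cell b)
{
    /* Threaded code: addresses of the primitives' labels (GNU C). */
    void *code[] = { &&do_add, &&do_halt };
    void **ip = code;   /* instruction pointer into the threaded code */
    void *cfa;          /* address of the next primitive to execute   */
    Cell stack[2];
    Cell *sp = stack;   /* sp[0] is the TOS, sp[1] the item below it  */
    Cell n;

    stack[0] = a;
    stack[1] = b;

#ifdef CISC_NEXT
    /* Undivided NEXT: load and jump stay together in NEXT_P2. */
#define NEXT_P1 ((void)0)
#define NEXT_P2 do { cfa = *ip++; goto *cfa; } while (0)
#else
    /* Divided NEXT: the load can be scheduled before the store. */
#define NEXT_P1 (cfa = *ip++)   /* loading part */
#define NEXT_P2 goto *cfa       /* goto part    */
#endif
#define NEXT do { NEXT_P1; NEXT_P2; } while (0)

    NEXT;               /* start executing the threaded code */

do_add:
    n = sp[0] + sp[1];
    sp++;
    NEXT_P1;            /* load of the cfa, placed above the store */
    sp[0] = n;          /* store to the TOS */
    NEXT_P2;

do_halt:
    return sp[0];

#undef NEXT
#undef NEXT_P2
#undef NEXT_P1
}
```

Calling run_add_demo(3, 4) adds the two stack items and returns their
sum. Compiling with -DCISC_NEXT selects the undivided variant; the
result is the same, only the placement of the load differs.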