sub- in- routine direct direct switch call CPU, Machine, gcc 6.2 4.2 4.7 5.1 8.9 Xeon 5160 3GHz, gcc 4.1.2 20061115 8.6 14.2 15.3 23.4 30.2 Pentium 4 2.26, gcc-2.95.3 4.9 5.6 4.3 5.1 7.64 Pentium M 755 2GHz, gcc-3.4.2 4.7 8.1 9.5 19.0 21.3 Pentium III 1000MHz, gcc-2.95.3 18.5 8.8 10.9 24.8 27.9 Opteron 240 1400MHz, gcc-2.95.3 (32-bit code) 18.4 8.5 10.3 24.5 29.0 Athlon (Thunderbird) 1200MHz, gcc-2.95.1 4.3 9.0 11.0 11.7 12.5 K6-2 333MHz, gcc-3.2.2 -fno-reorder-blocks 13.3 10.3 12.3 15.7 18.7 Itanium 2 (McKinley) 900MHz, rx2600, gcc-3.3 -fno-crossjumping 9.6 8.0 9.5 23.1 38.6 Alpha 21264B 800MHz, UP1500, gcc-2.95.2 7.9 8.4 9.9 18.2 18.0 Alpha 21164A 600MHz, 164LX, gcc-2.95.2 7.8 8.7 10.7 18.5 16.9 Alpha 21164PC 533MHz, SX164, gcc-3.3.2 -fno-reorder-blocks 7.2 9.6 12.0 24.6 19.8 Alpha 21064a, 300MHz, AlphaPC64, gcc-2.95.1 7.8 12.8 12.9 30.2 39.0 PPC 970 2000MHz, PowerMac G5, gcc-2.95 (32 bit) 5.7 9.2 12.3 16.3 17.9 PPC 7447A 1066MHz, iBook G4, gcc-2.95 4.2 5.8 7.7 11.3 11.3 PPC 7400 450MHz, PowerMac G4, gcc-3.3.2 -fno-reorder-blocks 5.7 7.2 9.2 13.6 13.4 PPC 604e 200MHz, Mac 7500, gcc-2.95.2 5.8 10.8 13.7 20.2 46.1 PA8500 360MHz, HP 9000/L2000, gcc-3.2 -fno-reorder-blocks 14.2 14.6 18.7 25.3 72.5 PA8200 240MHz, HP 9000/K260, gcc-2.95.2 7.3 11.4 7.9 11.4 20.4 PA7100LC 64MHz, HP 9000/816, gcc-3.3.2 -fno-reorder-blocks 6.2 6.1 8.0 14.6 12.6 MicroSPARC II 110MHz, gcc-3.3.2 -fno-reorder-blocks 10 11 14 21 29 UltraSPARC T1 1GHz, Sun Fire T1000, gcc-4.0.2 17.53 8.3 10.4 14.7 17.4 MIPS R10000 195MHz, SGI PowerChallenge, gcc-2.8.1 13.0 5.6 7.6 17.7 18.4 MIPS R3000 20MHz, DecStation 5000/120, gcc-3.3.1 -fno-reorder-blocks 7.0 5.7 6.8 9.8 13.5 strongARM SA-1110, iPAQ 3650, gcc-2.95.4 20010703 8.8 8.2 10.9 14.5 19.5 XScale IOP321, Iyonix, gcc 4.1.2 20061115 Comparison of compilers and options: 18.9 16.4 18.6 24.6 35.1 Opteron 240 1400MHz, gcc-3.3.1 -fno-crossjumping 16.0 15.7 18.8 21.4 37.1 Opteron 240 1400MHz, gcc-3.3.1 -fno-crossjumping -m32 (32-bit code) 18.5 8.8 10.9 24.8 27.9 Opteron 240 1400MHz, gcc-2.95.3 (32-bit code) 18.4 8.5 10.3 24.5 29.0 Athlon (Thunderbird) 1200MHz, gcc-2.95.1 21.5 18.0 21.8 24.5 30.5 Athlon (Thunderbird) 900MHz, gcc-3.2.2 -fno-reorder-blocks
wget http://www.complang.tuwien.ac.at/forth/threading/threading.tar.gz gzip -cd threading.tar.gz|tar xf - cd threading makeYou can compute the cycles per dispatch as
cycles = measured user time * clock frequency in MHz / 100If you want to provide options to the compiler, do this through the
make variables CC (to provide additional options) or
CFLAGS (to override the default options -O
-fomit-frame-pointer):
make CC="gcc -V2.95"
The V2 switch dispatch benchmark is written in a way that introduces a jump back to the dispatch code on the gcc versions we have tested, which is what will happen in real-world interpreters (whereas in the V1 version gcc optimized this jump away for 90% of the executed dispatches).
There is still the problem of the benchmark not being realistic wrt the non-dispatch work done. This work costs different amounts of time on different machines, and with different dispatch techniques (much of this work can be done in parallel with the dispatch work on some machines and with some dispatch techniques).
main() is put after the other routines. Using the
restricted convention gives significant speedups: 5.97 cycles on the
21064a, 5.22 cycles on the 21164a, 4.6 cycles on the 21264. Which
convention is more appropriate depends on the system (for RAFTS/Alpha
we will probably use the restricted convention).
Name Last modified Size Description
Parent Directory -
Makefile 13-May-2007 10:35 792
call.c 02-Sep-2003 18:18 480
direct.c 03-Sep-2003 11:53 517
indirect.c 03-Sep-2003 11:53 562
repl-switch.c 13-May-2007 10:56 624
subroutine.c 02-Sep-2003 18:32 477
switch.c 02-Sep-2003 22:17 642
threading.tar.gz 13-May-2007 10:58 1.5K