21Jan91 I have finally been able to spend a little time with ForST, John Redmond's 68000 subroutine-threaded Forth (Forth Dimensions XII #2 et seq.). I found some peculiarities (one of the oddest is that DEPTH returns negative values!) and some "surprise features" (I hesitate to say "bugs" without documentation of the author's intent). My primary interest was to get some sense of the effectiveness of the "native code" compiler and optimizer. Redmond offers a benchmark, the familiar sieve algorithm, so as a comparison I fed the source into two conventional 68000 Forth implementations that I use frequently and got the following results on ten iterations: F32: 36.28 sec. STP_F83: 23.72 sec. ForST with CALLS: 17.86 sec. ForST with MACROS: 5.74 sec. (F32 is a conventional, absolute-address 32-bit ITC implementation, and STP_F83 is a position-independent 16-bit ITC implementation. Both are based on the Laxen & Perry F83 model, and both use byte-wise "fetch" and "store" to avoid the need for address alignment of data. Times were measured using the Atari's 200 Hz system clock.) This is impressive, but one aspect of the sieve benchmark annoys me: it is essentially a single colon definition built almost totally from Forth "primitives" -- that is, words which are code definitions in most Forths -- and arguably a "best case" test for ForST. Casting about for a "better" benchmark, I chanced on the "Eight Queens" (see LeVan, Forth Dimensions II #1 p.6, and my adaptation below). This looks more like the Forth code I usually read and write -- several colon definitions, a little bit of CREATE-DOES, and a moderate level of nesting. The results (with output off): F32: 10.00 sec. STP_F83: 11.35 sec. ForST with CALLS: 12.38 sec. ForST with MACROS: 10.04 sec. As the people in that annoying commercial scream, "Where's my big savings?" Obviously, something is wrong here, and fortunately it is so obviously wrong that I didn't just accept the results and throw my ForST disk away. A little investigation showed that ForST is wasting a lot of time in the array accesses, and digging deeper revealed that all of the waste can be blamed on the multiplication in SWAP CELLSIZE * + It turns out that ForST's signed multiply operator does a massive amount of quite unnecessary work (read the ForST source for details) while F32 has a nice, compact code definition. That determined, I evened the score by replacing CELLSIZE * with two repetitions of 2* (concisely coded in all implementations). The results: F32: 8.90 sec. STP_F83: (omitted) ForST with CALLS: 7.23 sec. ForST with MACROS: 3.77 sec. Not quite as dramatic as the sieve results, but a very useful degree of improvement, and, I submit, a plausible estimate of what can be expected in practice. (I'll discount the malign multiplication as the sort of thing that falls out in beta test.) If ForST were a full-featured system (with vocabularies, block support, a multitasker, and a metacompiler), I'd switch now. Thanks, John. Wilson M. Federici GEnie W.FEDERICI Compuserve 74756,2716 ------------------------------------------------------------ (Eight Queens benchmark, "slow" version) ( queens32 19Jan91 wmf ) ( Eight-queens problem, based on LeVan - F.D. II #1 p.6 ) ( Modified for benchmarking portability. ) decimal 4 constant cellsize \ change to 2 for 16-bit system! 0 constant falseflag -1 constant trueflag 8 constant boardsize : iarray ( n -- ) create cellsize * allot does> ( n -- adr ) swap cellsize * + ; boardsize iarray ax boardsize 2 * iarray bx boardsize 2 * iarray cx boardsize iarray xx : safe? ( n1 n2 -- f ) dup ax @ rot rot ( -- ax[n2] n1 n2 ) over over + bx @ rot rot ( -- ax[n2] bx[n1+n2] n1 n2 ) swap - boardsize 1- + cx @ ( -- ax[n2] bx[n1+n2] cx[n2-n1+7] ) and and ( -- f ) ; : mark ( n1 n2 -- ) dup ax falseflag swap ! ( -- n1 n2 ) over over + bx falseflag swap ! ( -- n1 n2 ) swap - boardsize 1- + cx falseflag swap ! ( -- ) ; : unmark ( n1 n2 -- ) dup ax trueflag swap ! ( -- n1 n2 ) over over + bx trueflag swap ! ( -- n1 n2 ) swap - boardsize 1- + cx trueflag swap ! ( -- ) ; variable printsol? falseflag printsol? ! variable tries : printsol ( -- ) printsol? @ if ." Found on try " tries @ 6 .r cr boardsize 0 do i xx @ 1+ 5 .r loop cr then ; : try ( 0 -- ) ( -- recursion count ) boardsize 0 do 1 tries +! ( key? if quit then ) dup i safe? if dup i mark dup i swap xx ! dup boardsize 1- < if dup 1+ recurse else printsol then dup i unmark then loop drop ; : unmark-all ( -- ) boardsize 0 do trueflag i ax ! trueflag i xx ! loop boardsize 2 * 0 do trueflag i bx ! trueflag i cx ! loop ; : do-it ( -- ) 0 tries ! unmark-all 0 ( counter ) try ; ------------------------------------------------------------