Transcript of a GEnie Forth RoundTable Conference with Mike Haas, network specialist and author of JForth, the JSR threaded Forth for Amiga. Mike's topic, "JSR Threaded Forth". The entire contents of this transcript are Copyright (c) 1991 GEnie Forth RoundTable. The contents may be freely copied and distributed in whole or in part provided origination credit is included. Date: 12/12/91 Time:21:45 EST Attendees: <[Mike-H] FIGGUEST> <-- Guest, Mike Haas, author JForth <[Host SysOp] GARY-S> <-- Moderator SysOp <[Elliott] ELLIOTT.C> <[wheels] S.WHEELER> <[Perry] P.MITCHELL9> <[Len] NMORGENSTERN> <[IRV] I.MONTANEZ> <[Dennis] D.RUFFER> <[Rob] R.ANDRE> <[Prez of Vice] JAX> <[Phil] PLBURK> <[side] M.CHRISTOPH2> Minutes: <[Host SysOp] GARY-S> The GEnie Forth RoundTable is pleased to welcome as tonight's guest, Mike Haas, employed as a Mac networking developer for Starnine and better known to Forthers as author of JSR threaded JForth for Amiga. Mike originally built his own computer (6800-based, 64k, 4 disk dirves, etc..) needed software to run on it, bought the fig-forth listing for $15, and typed it in. Eventually that became his full multi-tasking OS, including cross-compiler, assembler all written in forth. As the writer of Jforth, Mike has tried to implement many modern concepts (files, creation of small standalone executables (small as 3K!), wanting Forth to become competitiveg in the software development arena. Please welcome our special guest, Mike Haas. <[Mike-H] FIGGUEST> hey guys. first, I'd like to say that I'm not supposed to make this a commercial for JForth. So, even though I'm going to talk about it, don't anyone BUY IT! :-) JForth has been around about 5 years, and is currently the only one on the Amiga. It employees (primarily) JSR threading. It also caches the top-of-stack in a 68xxx data register. These two concepts put together some pretty fast code. Using assembly for the execution code (vs.indirect-threaded stuff) lends itself to some pretty interesting things, such as being able to write optimizers for the compiler. One of the most interesting & unique features of a JSR threaded forth is that it is a true compiler.It interprets source and ALWAYS builds assembly directly executable by the hardware. The 'inner interpreter' or NEXT-loop is the actual MICRO-CODE of the cpu! When conventional forths execute their 'programmed steps', they spend some amount of processing just finding the next 'instruction'. (sorry if this is too basic for some, but I'm not real clear on the experience level here). anyway, an indirect-threaded forth (which the great majority are) use several CPU instruction in sequencing through their programs. This is because they are all SIMULATING an execution environment one that is independent of the hard- ware they are actually running on. By contrast, a JSR-threaded forth takes advantage of the fact that there is a "sequencer" built-into the CPU itself this 'strips' away that extra level of indirection between the high-level forth code and the actual execution of the CPU instructions (which, after all, is all you ultimately have). Lets look at a simple short word such as : foo dup @ ; When this word is entered by a conventional inner interpreter, it has to push on the return stack all the info about the word that called it. This includes standard forth registers like IP (the place in the calling word we will return to), W (the 'beginning of that particular word) and others. Then FOO takes over if it doesn't call other high level words (and how many forth words don't do that?), things proceed relatively quick. in between the dup and the @ the process has to 'unwind' all this overhead takes time and doesn't directly contribute to the function of FOO to save a parameter (dup) and then read it (@). JSR threading comes into play by actually compiling CPU instructions to do the DUP followed by more to do the @, instead of just references to dup & @. The result is a much faster 'thread' of the program all CPU instructions executed contribute directly to "getting the job done". Actually, pure JSR threading would put together code something like... JSR dup JSR @ RTS and even the hardware doing all those JSR's and RTS's (which take time since the CPU also has to save it's return addresses) can be improved. This is done in JForth by compiling such short functions completely 'inline'. As I said at the beginning, JForth caches the top-of-stack in a cpu register (d7) and, like other forths, keeps the remaining stack parameters in memory. So to DUP an item means simply "write d7 to memory, pre-decremented". For this reason, a DUP in JForth is one cpu instruction... MOVE.L D7,-(A6) (of course, A6 points to the rest of the 'stack'). Similarly, the @ is also one instruction put the two together... MOVE.L D7,-(a6) MOVE.L 0(A4,D7),D7 The A4 reference is because JForth keeps it's addresses relative to itself it's base address is in A4. So A4 and D7 are added together, and the CPU reads the data at that location into D7. Add an RTS to the above, and you have the entire code that the JForth compiler will put together for the above FOO word. No wasted instructions. To make this even faster, JForth will allow you to compile those twoinstructions INLINE (minus if course the RTS) when a reference to FOO is made. To illustrate...assume another word... : FOO2 FOO + ; The JForth compiler will generate... MOVE.L D7,-(a6) move.l 0(a4,d7),d7 add.l (a6)+,d7 Notice that items are 'dropped' efficiently via the post- increment mode of the 68xxx family. Any questions? <[Host SysOp] GARY-S> Phil Burk has joined us. Phil is co-author of JForth, but wishes to participate only as a spectator at this time. <[IRV] I.MONTANEZ> Is version 2.0 of JForth the most current? <[Mike-H] FIGGUEST> Yes, soon 3.0 will be released with many AmigaDOS 2.0 features supported. <[side] M.CHRISTOPH2> This J forth should be competitibve with C for speed? <[Mike-H] FIGGUEST> Yes, it is VERY competitive with C for speed...the sieve of eros-u-know-what on a 7.16mhz 6800 finishes in about 8 seconds. <[IRV] I.MONTANEZ> Have you added any enhancements to the JForth Target Compiler <[Mike-H] FIGGUEST> CLONE (the target compiler) now creates code about twice as fast as 2.0. It also support interrupt code, allowing to to force BSR's (cause A4 ain't set up) other enhancements. I meant that CLONE actually runs TWICE as fast itself the code it creates runs about the same speed. <[wheels] S.WHEELER> A few short (I hope) questions ... Have you found any Forth constructs which will effectively compile into a DBcc loop? Do you do peephole optimization once code is compiled? How many levels to the optimizer? <[Mike-H] FIGGUEST> no DBcc is limited to 16-bit loop iterations of course anyone could create their own special DO LOOPs for this purpose, and there's always assembly. What do you mean by peepholing (not done yet). The optimizer acts to cache MORE than one stack parameter... up to 5 if memory serves me right. <[Host SysOp] GARY-S> Steve - any follow on peepholing? <[wheels] S.WHEELER> By peepholing, I mean do you optimize a sequence where one word's code may end in a "drop" and the next's starts with a "dup", and if you don't optimize to "move.l (a6), tos" you have "move.l (a6)+,tos" followed by... "move.l tos,-(a6)". <[Mike-H] FIGGUEST> yes, all of the standard kernal words do this we call it HEAD-TAIL optimization. <[Host SysOp] GARY-S> Phil has a comment to add regarding optimization. <[Phil] PLBURK> Mike already mentioned the HEAD-TAIL, there is a second optimizer that is optional that does global optimization using as many registers as it can get. Once parameters are in registers then words like SWAP and ROT may only have an effect at compile and not actually generate any code. The compiler just shuffles its table to where it keeps the stack items. <[Mike-H] FIGGUEST> Anoother level of optimization that is in effact all the time is with conditionals. When in assembly, the compiler is not used unless you intentionally invoke it so no optimization is done. Normally, you are free to use A0 & A1, D0-D4. <[side] M.CHRISTOPH2> Does this imply that there are no free registers while doing code words? do all regs need to be pusshed? Not Clear, if the compiler can let the registers dance around you would never know which ones are free. Does the compiler expect the registers to stay correct across other words (code words)? <[Mike-H] FIGGUEST> No. When a call is made to any other word, the regs are flushed. This is mainly good for long series of short words. Note that the compiler doesn't look for specific sequences of words it will optimize ANY series as long as the words being referenced are known "optimizable". <[Len] NMORGENSTERN> Do you have a version for other 68xxx machines such as the Mac? <[Mike-H] FIGGUEST> Not currently. There are other packages available for the Mac, some even are JSR threaded. Phil himself has written one that he includes in th Mac version of HMSL. But no other forth I know of includes the level of optimization and features like CLONE. <[Host SysOp] GARY-S> Mike, you mentioned several packages to me, why don't you mention a few now, and give folks your phone/address/ e-mail. <[Mike-H] FIGGUEST> Well, MACH2 comes to mind...Phil has used that extensively. MacForth is not (i believe) jsr threaded, but it's another Mac 4th package. One PD forth that include object orientation is YERK (mac only, formerly known as NEON). My e-mail address is mikeh@starnine.com Delta Research's address (makers of JForth...that's us) is PO Box 1051, San Rafael CA, 94915 <[Host SysOp] GARY-S> No - I meant goodies from Delta besides JForth - don't you have some things that are turn key ? <[Mike-H] FIGGUEST> I have released several PD products...moist notably TEXTRA, a user-friendly editor that includes an AREXX interface, and LCD CALCULATOR all written in JForth and CLONEd. <[Phil] PLBURK> Mike mentioned HMSL which is a music programming language based on Forth. The Forth is similar to JForth except that I don't cache Top of Stack in D7 and I use absolute addresses instead of addresses relaitive to the base of Forth. Caching TOS is better and I regret not caching but I prefer absolute addresses, I was just experimenting to see what the differences were like. If anyone wants other HMSL info, I'm at phil@mills.edu <[Host SysOp] GARY-S> Mike , give us your closing remarks please <[Mike-H] FIGGUEST> In JForth, we tried to implement state-of-the-art programming techniques more familiar to users languages such as C. I firmly believe that Forth needs to progress more along these lines...creation of standalone programs, file & memory interfaces, etc. Hopefully, we will move in that direction. Keeping Forth the BEST environment there is. <[Host SysOp] GARY-S> Mike (and Phil) thanks for an informative meeting. Many of us are versed primarily in Indirect and some Direct Threaded code, so this was a treat. Thanks for coming. <[Host SysOp] GARY-S> This meeting is officially closed. All may stay and Chat. === END ===