Diff for /gforth/doc/vmgen.texi between versions 1.1 and 1.7

version 1.1, 2002/05/16 09:07:29 version 1.7, 2002/08/08 08:33:06
Line 3 Line 3
 @c @ifnottex  @c @ifnottex
 This file documents vmgen (Gforth @value{VERSION}).  This file documents vmgen (Gforth @value{VERSION}).
 @section Introduction  @chapter Introduction
 Vmgen is a tool for writing efficient interpreters.  It takes a simple  Vmgen is a tool for writing efficient interpreters.  It takes a simple
 virtual machine description and generates efficient C code for dealing  virtual machine description and generates efficient C code for dealing
Line 29  machine code. Line 29  machine code.
 @end itemize  @end itemize
 Such a division is usually used in interpreters, for modularity as well  Such a division is usually used in interpreters, for modularity as well
 as for efficiency reasons.  The virtual machine code is typically passed  as for efficiency.  The virtual machine code is typically passed between
 between front end and virtual machine interpreter in memory, like in a  front end and virtual machine interpreter in memory, like in a
 load-and-go compiler; this avoids the complexity and time cost of  load-and-go compiler; this avoids the complexity and time cost of
 writing the code to a file and reading it again.  writing the code to a file and reading it again.
Line 84  Replicating VM (super)instructions for b Line 84  Replicating VM (super)instructions for b
 As a result, vmgen-based interpreters are only about an order of  As a result, vmgen-based interpreters are only about an order of
 magnitude slower than native code from an optimizing C compiler on small  magnitude slower than native code from an optimizing C compiler on small
 benchmarks; on large benchmarks, which spend more time in the run-time  benchmarks; on large benchmarks, which spend more time in the run-time
 system, the slowdown is often less (e.g., the slowdown over the best JVM  system, the slowdown is often less (e.g., the slowdown of a
 JIT compiler we measured is only a factor of 2-3 for large benchmarks  Vmgen-generated JVM interpreter over the best JVM JIT compiler we
 (and some other JITs were slower than our interpreter).  measured is only a factor of 2-3 for large benchmarks; some other JITs
   and all other interpreters we looked at were slower than our interpreter).
 VMs are usually designed as stack machines (passing data between VM  VMs are usually designed as stack machines (passing data between VM
 instructions on a stack), and vmgen supports such designs especially  instructions on a stack), and vmgen supports such designs especially
 well; however, you can also use vmgen for implementing a register VM and  well; however, you can also use vmgen for implementing a register VM and
 still benefit from most of the advantages offered by vmgen.  still benefit from most of the advantages offered by vmgen.
 @section Why interpreters?  There are many potential uses of the instruction descriptions that are
   not implemented at the moment, but we are open for feature requests, and
   we will implement new features if someone asks for them; so the feature
   list above is not exhaustive.
   @c *********************************************************************
   @chapter Why interpreters?
   Interpreters are a popular language implementation technique because
   they combine all three of the following advantages:
   @itemize @bullet
   @item Ease of implementation
   @item Portability
   @item Fast edit-compile-run cycle
   @end itemize
   The main disadvantage of interpreters is their run-time speed.  However,
   there are huge differences between different interpreters in this area:
   the slowdown over optimized C code on programs consisting of simple
   operations is typically a factor of 10 for the more efficient
   interpreters, and a factor of 1000 for the less efficient ones (the
   slowdown for programs executing complex operations is less, because the
   time spent in libraries for executing complex operations is the same in
   all implementation strategies).
   Vmgen makes it even easier to implement interpreters.  It also supports
   techniques for building efficient interpreters.
   @c ********************************************************************
   @chapter Concepts
   @c --------------------------------------------------------------------
   @section Front-end and virtual machine interpreter
   @cindex front-end
   Interpretive systems are typically divided into a @emph{front end} that
   parses the input language and produces an intermediate representation
   for the program, and an interpreter that executes the intermediate
   representation of the program.
   @cindex virtual machine
   @cindex VM
   @cindex instruction, VM
   For efficient interpreters the intermediate representation of choice is
   virtual machine code (rather than, e.g., an abstract syntax tree).
   @emph{Virtual machine} (VM) code consists of VM instructions arranged
   sequentially in memory; they are executed in sequence by the VM
   interpreter, except for VM branch instructions, which implement control
   structures.  The conceptual similarity to real machine code results in
   the name @emph{virtual machine}.
   In this framework, vmgen supports building the VM interpreter and any
   other component dealing with VM instructions.  It does not have any
   support for the front end, apart from VM code generation support.  The
   front end can be implemented with classical compiler front-end
   techniques, supported by tools like @command{flex} and @command{bison}.
   The intermediate representation is usually just internal to the
   interpreter, but some systems also support saving it to a file, either
   as an image file, or in a full-blown linkable file format (e.g., JVM).
   Vmgen currently has no special support for such features, but the
   information in the instruction descriptions can be helpful, and we are
   open for feature requests and suggestions.
   @section Data handling
   @cindex stack machine
   @cindex register machine
   Most VMs use one or more stacks for passing temporary data between VM
   instructions.  Another option is to use a register machine architecture
   for the virtual machine; however, this option is either slower or
   significantly more complex to implement than a stack machine architecture.
   Vmgen has special support and optimizations for stack VMs, making their
   implementation easy and efficient.
   You can also implement a register VM with vmgen (@pxref{Register
   Machines}), and you will still profit from most vmgen features.
   @cindex stack item size
   @cindex size, stack items
   Stack items all have the same size, so they typically will be as wide as
   an integer, pointer, or floating-point value.  Vmgen supports treating
   two consecutive stack items as a single value, but anything larger is
   best kept in some other memory area (e.g., the heap), with pointers to
   the data on the stack.
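   The layout described above can be sketched in C as follows.  This is a
   minimal illustration, not vmgen output: the names @code{Cell},
   @code{Pair}, @code{Big} and the push/pop helpers are assumptions made
   for this example.  It shows a single-item value, a value spanning two
   consecutive stack items, and a larger object that stays off the stack
   with only a pointer to it on the stack.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed basic stack item type: wide enough for an integer or a pointer. */
typedef intptr_t Cell;

/* Hypothetical value that occupies two consecutive stack items. */
typedef struct { Cell hi, lo; } Pair;

/* Hypothetical object too large for the stack; only a pointer is pushed. */
typedef struct { double re, im; int tag; } Big;

static Cell stack[32];
static Cell *sp = stack + 32;          /* empty stack, grows towards lower addresses */

static void push(Cell x) { assert(sp > stack); *--sp = x; }
static Cell pop(void)    { assert(sp < stack + 32); return *sp++; }

/* Two stack items treated as a single value. */
void push_pair(Pair p) { push(p.lo); push(p.hi); }
Pair pop_pair(void)    { Pair p; p.hi = pop(); p.lo = pop(); return p; }

/* Larger data lives elsewhere; the stack holds a pointer to it. */
void push_big(Big *b)  { push((Cell)b); }
Big *pop_big(void)     { return (Big *)pop(); }
```

   Using @code{intptr_t} for the item type is a choice made for this
   sketch so that pointers and integers fit in the same item; the examples
   shipped with vmgen define their own @code{Cell}.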
   @cindex instruction stream
   @cindex immediate arguments
   Another source of data is the immediate arguments of VM instructions (in
   the VM instruction stream).  The VM instruction stream is handled
   similarly to a stack in vmgen.
   @cindex garbage collection
   @cindex reference counting
   Vmgen has neither built-in support for nor restrictions against @emph{garbage
   collection}.  If you need garbage collection, you need to provide it in
   your run-time libraries.  Using @emph{reference counting} is probably
   harder, but might be possible (contact us if you are interested).
   @c reference counting might be possible by including counting code in 
   @c the conversion macros.
   @section Dispatch
   Understanding this section is probably not necessary for using vmgen,
   but it may help.  You may want to skip it now, and read it if you find
   statements about dispatch methods confusing.
   After executing one VM instruction, the VM interpreter has to dispatch
   the next VM instruction (vmgen calls the dispatch routine @samp{NEXT}).
   Vmgen supports two methods of dispatch:
   @table @emph
   @item switch dispatch
   In this method the VM interpreter contains a giant @code{switch}
   statement, with one @code{case} for each VM instruction.  The VM
   instructions are represented by integers (e.g., produced by an
   @code{enum}) in the VM code, and dispatch occurs by loading the next
   integer from the VM code, @code{switch}ing on it, and continuing at the
   appropriate @code{case}; after executing the VM instruction, jump back
   to the dispatch code.
   @item threaded code
   This method represents a VM instruction in the VM code by the address of
   the start of the machine code fragment for executing the VM instruction.
   Dispatch consists of loading this address, jumping to it, and
   incrementing the VM instruction pointer.  Typically the threaded-code
   dispatch code is appended directly to the code for executing the VM
   instruction.  Threaded code cannot be implemented in ANSI C, but it can
   be implemented using GNU C's labels-as-values extension (@pxref{labels
   as values}).
   @end table
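   The two dispatch methods can be sketched in C as follows.  This is a
   hand-written illustration under stated assumptions, not code produced
   by vmgen: the instruction set (@code{lit}, @code{sub}, @code{halt}), the
   type names and the function names are invented for this example.  The
   threaded-code variant relies on GNU C's labels-as-values extension and
   is therefore compiled only when @code{__GNUC__} is defined.

```c
#include <assert.h>

typedef long Cell;

/* Hypothetical VM instruction set, for illustration only. */
enum { I_LIT, I_SUB, I_HALT };

/* Switch dispatch: VM code is an array of integers (opcodes and
   immediate arguments); every instruction jumps back to the dispatch code. */
Cell run_switch(const Cell *ip)
{
    Cell stack[16];
    Cell *sp = stack + 16;             /* grows towards lower addresses */

    for (;;) {
        switch (*ip++) {               /* load the next opcode, switch on it */
        case I_LIT:
            *--sp = *ip++;             /* push the immediate argument */
            break;                     /* back to the dispatch code */
        case I_SUB:
            sp[1] = sp[1] - sp[0];
            sp++;
            break;
        case I_HALT:
            return sp[0];
        default:
            assert(0 && "unknown VM instruction");
            return 0;
        }
    }
}

#if defined(__GNUC__)
/* Threaded code: a VM instruction is the address of its machine code. */
typedef union { void *label; Cell arg; } Inst;

Cell run_threaded(void)
{
    /* Labels are visible only inside this function, so the VM code
       for "7 2 sub" is built here. */
    Inst code[] = {
        { .label = &&lit }, { .arg = 7 },
        { .label = &&lit }, { .arg = 2 },
        { .label = &&sub },
        { .label = &&halt },
    };
    Cell stack[16];
    Cell *sp = stack + 16;
    Inst *ip = code;

    goto *(ip++)->label;               /* dispatch the first instruction */
lit:
    *--sp = (ip++)->arg;
    goto *(ip++)->label;               /* dispatch appended to the instruction */
sub:
    sp[1] = sp[1] - sp[0];
    sp++;
    goto *(ip++)->label;
halt:
    return sp[0];
}
#endif
```

   Both functions compute 7 - 2.  In the threaded variant the dispatch
   code is repeated after every instruction instead of being shared.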
   @c *************************************************************
   @chapter Invoking vmgen
   The usual way to invoke vmgen is as follows:
   @example
   vmgen @var{infile}
   @end example
   Here @var{infile} is the VM instruction description file, which usually
   ends in @file{.vmg}.  The output filenames are made by taking the
   basename of @file{@var{infile}} (i.e., the output files will be created in the
   current working directory) and replacing @file{.vmg} with @file{-vm.i},
   @file{-disasm.i}, @file{-gen.i}, @file{-labels.i}, @file{-profile.i},
   and @file{-peephole.i}.  E.g., @command{vmgen hack/foo.vmg} will create
   @file{foo-vm.i} etc.
   The command-line options supported by vmgen are
   @table @option
   @cindex -h, command-line option
   @cindex --help, command-line option
   @item --help
   @itemx -h
   Print a message about the command-line options
   @cindex -v, command-line option
   @cindex --version, command-line option
   @item --version
   @itemx -v
   Print version and exit
   @end table
   @c ****************************************************************
   @chapter Example
   @section Example overview
   There are two versions of the same example for using vmgen:
   @file{vmgen-ex} and @file{vmgen-ex2} (you can also see Gforth as an
   example, but it uses additional (undocumented) features, and also
   differs in some other respects).  The example implements @emph{mini}, a
   tiny Modula-2-like language with a small JavaVM-like virtual machine.
   The difference between the examples is that @file{vmgen-ex} uses many
   casts, and @file{vmgen-ex2} tries to avoid most casts and uses unions
   instead.
   The files provided with each example are:
   @example
   disasm.c           wrapper file
   engine.c           wrapper file
   peephole.c         wrapper file
   profile.c          wrapper file
   mini-inst.vmg      simple VM instructions
   mini-super.vmg     superinstructions (empty at first)
   mini.h             common declarations
   mini.l             scanner
   mini.y             front end (parser, VM code generator)
   support.c          main() and other support functions
   fib.mini           example mini program
   simple.mini        example mini program
   test.mini          example mini program (tests everything)
   test.out           test.mini output
   stat.awk           script for aggregating profile information
   peephole-blacklist list of instructions not allowed in superinstructions
   seq2rule.awk       script for creating superinstructions
   @end example
   For your own interpreter, you would typically copy the following files
   and change little, if anything:
   @example
   disasm.c           wrapper file
   engine.c           wrapper file
   peephole.c         wrapper file
   profile.c          wrapper file
   stat.awk           script for aggregating profile information
   seq2rule.awk       script for creating superinstructions
   @end example
   You would typically change much in or replace the following files:
   @example
   mini-inst.vmg      simple VM instructions
   mini.h             common declarations
   mini.l             scanner
   mini.y             front end (parser, VM code generator)
   support.c          main() and other support functions
   peephole-blacklist list of instructions not allowed in superinstructions
   @end example
   You can build the example by @code{cd}ing into the example's directory,
   and then typing @samp{make}; you can check that it works with @samp{make
   check}.  You can run mini programs like this:
   @example
   ./mini fib.mini
   @end example
   To learn about the options, type @samp{./mini -h}.
   @section Using profiling to create superinstructions
   I have not added rules for this in the @file{Makefile} (there are many
   options for selecting superinstructions, and I did not want to hardcode
   one into the @file{Makefile}), but there are some supporting scripts, and
   here's an example:
   Suppose you want to use @file{fib.mini} and @file{test.mini} as training
   programs; you get the profiles like this:
   @example
   make fib.prof test.prof #takes a few seconds
   @end example
   You can aggregate these profiles with @file{stat.awk}:
   @example
   awk -f stat.awk fib.prof test.prof
   @end example
   The result contains lines like:
   @example
         2      16        36910041 loadlocal lit
   @end example
   This means that the sequence @code{loadlocal lit} statically occurs a
   total of 16 times in 2 profiles, with a dynamic execution count of
   36910041.
   The numbers can be used in various ways to select superinstructions.
   E.g., if you just want to select all sequences with a dynamic
   execution count of at least 10000, you would use the following pipeline:
   @example
   awk -f stat.awk fib.prof test.prof|
   awk '$3>=10000'|                #select sequences
   fgrep -v -f peephole-blacklist| #eliminate wrong instructions
   awk -f seq2rule.awk|      #transform sequences into superinstruction rules
   sort -k 3 >mini-super.vmg       #sort sequences
   @end example
   The file @file{peephole-blacklist} contains all instructions that
   directly access a stack or stack pointer (for mini: @code{call},
   @code{return}); the sort step is necessary to ensure that prefixes
   precede larger superinstructions.
   Now you can create a version of mini with superinstructions by just
   saying @samp{make}.
   @c ***************************************************************
   @chapter Input File Format
   Vmgen takes as input a file containing specifications of virtual machine
   instructions.  This file usually has a name ending in @file{.vmg}.
   Most examples are taken from the example in @file{vmgen-ex}.
   @section Input File Grammar
   The grammar is in EBNF format, with @code{@var{a}|@var{b}} meaning
   ``@var{a} or @var{b}'', @code{@{@var{c}@}} meaning 0 or more repetitions
   of @var{c} and @code{[@var{d}]} meaning 0 or 1 repetitions of @var{d}.
   Vmgen input is not free-format, so you have to take care where you put
   spaces and especially newlines; it's not as bad as makefiles, though:
   any sequence of spaces and tabs is equivalent to a single space.
   @example
   description: @{instruction|comment|eval-escape@}
   instruction: simple-inst|superinst
   simple-inst: ident " (" stack-effect " )" newline c-code newline newline
   stack-effect: @{ident@} " --" @{ident@}
   superinst: ident " =" ident @{ident@}
   comment:      "\ "  text newline
   eval-escape:  "\e " text newline
   @end example
   @c \+ \- \g \f \c
   Note that the @code{\}s in this grammar are meant literally, not as
   C-style encodings for non-printable characters.
   The C code in @code{simple-inst} must not contain empty lines (because
   vmgen would mistake that as the end of the simple-inst).  The text in
   @code{comment} and @code{eval-escape} must not contain a newline.
   @code{Ident} must conform to the usual conventions of C identifiers
   (otherwise the C compiler would choke on the vmgen output).
   Vmgen understands a few extensions beyond the grammar given here, but
   these extensions are only useful for building Gforth.  You can find a
   description of the format used for Gforth in @file{prim}.
   @c woanders?
   The text in @code{eval-escape} is Forth code that is evaluated when
   vmgen reads the line.  If you do not know (and do not want to learn)
   Forth, you can build the text according to the following grammar; these
   rules are normally all Forth you need for using vmgen:
   @example
   text: stack-decl|type-prefix-decl|stack-prefix-decl
   stack-decl: "stack " ident ident ident
   type-prefix-decl:
       's" ' string '" ' ("single"|"double") ident "type-prefix" ident
   stack-prefix-decl:  ident "stack-prefix" string
   @end example
   Note that the syntax of this code is not checked thoroughly (there are
   many other Forth program fragments that could be written there).
   If you know Forth, the stack effects of the non-standard words involved
   are:
   @example
   stack        ( "name" "pointer" "type" -- )
                ( name execution: -- stack )
   type-prefix  ( addr u xt1 xt2 n stack "prefix" -- )
   single       ( -- xt1 xt2 n )
   double       ( -- xt1 xt2 n )
   stack-prefix ( stack "prefix" -- )
   @end example
   @section Simple instructions
   We will use the following simple VM instruction description as example:
   @example
   sub ( i1 i2 -- i )
   i = i1-i2;
   @end example
   The first line specifies the name of the VM instruction (@code{sub}) and
   its stack effect (@code{i1 i2 -- i}).  The rest of the description is
   just plain C code.
   @cindex stack effect
   The stack effect specifies that @code{sub} pulls two integers from the
   data stack and puts them in the C variables @code{i1} and @code{i2} (with
   the rightmost item (@code{i2}) taken from the top of stack) and later
   pushes one integer (@code{i}) on the data stack (the rightmost item is
   on the top afterwards).
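   The effect of this description can be pictured with the following C
   sketch.  It is written by hand to illustrate the stack effect and is
   not the code vmgen actually emits; the function name @code{vm_sub} and
   the convention of passing and returning the stack pointer are
   assumptions made so that the fragment is self-contained.  It assumes a
   stack that grows towards lower addresses, so the top of stack is
   @code{sp[0]}.

```c
#include <assert.h>

typedef long Cell;   /* assumed item type for this sketch */

/* Hand-written picture of "sub ( i1 i2 -- i )" with body "i = i1-i2;". */
Cell *vm_sub(Cell *sp)
{
    Cell i1, i2, i;

    assert(sp != 0);
    i1 = sp[1];          /* leftmost input: second item from the top */
    i2 = sp[0];          /* rightmost input: top of stack */
    sp += 1;             /* two items consumed, one produced */
    {
        i = i1 - i2;     /* the user-supplied C code */
    }
    sp[0] = i;           /* the result is the new top of stack */
    return sp;
}
```

   With 10 below 3 on the stack, the result left on top is 7.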
   How do we know the type and stack of the stack items?  Vmgen uses
   prefixes, similar to Fortran; in contrast to Fortran, you have to
   define the prefix first:
   @example
   \E s" Cell"   single data-stack type-prefix i
   @end example
   This defines the prefix @code{i} to refer to the type @code{Cell}
   (defined as @code{long} in @file{mini.h}) and, by default, to the
   @code{data-stack}.  It also specifies that this type takes one stack
   item (@code{single}).  The type prefix is part of the variable name.
   Before we can use @code{data-stack} in this way, we have to define it:
   @example
   \E stack data-stack sp Cell
   @end example
   @c !! use something other than Cell
   This line defines the stack @code{data-stack}, which uses the stack
   pointer @code{sp}, and each item has the basic type @code{Cell}; other
   types have to fit into one or two @code{Cell}s (depending on whether the
   type is @code{single} or @code{double} wide), and are converted from and
   to Cells on accessing the @code{data-stack} with conversion macros
   (@pxref{Conversion macros}).  Stacks grow towards lower addresses in
   vmgen-erated interpreters.
   We can override the default stack of a stack item by using a stack
   prefix.  E.g., consider the following instruction:
   @example
   lit ( #i -- i )
   @end example
   The VM instruction @code{lit} takes the item @code{i} from the
   instruction stream (indicated by the prefix @code{#}), and pushes it on
   the (default) data stack.  The stack prefix is not part of the variable
   name.  Stack prefixes are defined like this:
   @example
   \E inst-stream stack-prefix #
   @end example
   This definition defines that the stack prefix @code{#} specifies the
   ``stack'' @code{inst-stream}.  Since the instruction stream behaves a
   little differently than an ordinary stack, it is predefined, and you do
   not need to define it.
   The instruction stream contains instructions and their immediate
   arguments, so specifying that an argument comes from the instruction
   stream indicates an immediate argument.  Of course, instruction stream
   arguments can only appear to the left of @code{--} in the stack effect.
   If there are multiple instruction stream arguments, the leftmost is the
   first one (just as the intuition suggests).
   @subsubsection C Code Macros
   Vmgen recognizes the following strings in the C code part of simple
   instructions:
   @table @samp
   @item SET_IP
   As far as vmgen is concerned, a VM instruction containing this ends a VM
   basic block (used in profiling to delimit profiled sequences).  On the C
   level, this also sets the instruction pointer.
   @item SUPER_END
   This ends a basic block (for profiling), without a SET_IP.
   @item TAIL;
   Vmgen replaces @samp{TAIL;} with code for ending a VM instruction and
   dispatching the next VM instruction.  This happens automatically when
   control reaches the end of the C code.  If you want to have this in the
   middle of the C code, you need to use @samp{TAIL;}.  A typical example
   is a conditional VM branch:
   @example
   if (branch_condition) @{
     SET_IP(target); TAIL;
   @}
   /* implicit tail follows here */
   @end example
   In this example, @samp{TAIL;} is not strictly necessary, because there
   is another one implicitly after the if-statement, but using it improves
   branch prediction accuracy slightly and allows other optimizations.
   @item SUPER_CONTINUE
   This indicates that the implicit tail at the end of the VM instruction
   dispatches the sequentially next VM instruction even if there is a
   @code{SET_IP} in the VM instruction.  This enables an optimization that
   is not yet implemented in the vmgen-ex code (but in Gforth).  The
   typical application is in conditional VM branches:
   @example
   if (branch_condition) @{
     SET_IP(target); TAIL; /* now this TAIL is necessary */
   @}
   SUPER_CONTINUE;
   @end example
   @end table
   Note that vmgen is not smart about C-level tokenization, comments,
   strings, or conditional compilation, so it will interpret even a
   commented-out SUPER_END as ending a basic block (or, e.g.,
   @samp{RETAIL;} as @samp{TAIL;}).  Conversely, vmgen requires the literal
   presence of these strings; vmgen will not see them if they are hiding in
   a C preprocessor macro.
   @subsubsection C Code restrictions
   Vmgen generates code and performs some optimizations under the
   assumption that the user-supplied C code does not access the stack
   pointers or stack items, and that accesses to the instruction pointer
   only occur through special macros.  In general you should heed these
   restrictions.  However, if you need to break these restrictions, read
   the following.
   Accessing a stack or stack pointer directly can be a problem for several
   reasons:
   @itemize @bullet
   @item
   You may cache the top-of-stack item in a local variable (that is
   allocated to a register).  This is the most frequent source of trouble.
   You can deal with it either by not using top-of-stack caching (slowdown
   factor 1-1.4, depending on machine), or by inserting flushing code
   (e.g., @samp{IF_spTOS(sp[...] = spTOS);}) at the start and reloading
   code (e.g., @samp{IF_spTOS(spTOS = sp[0])}) at the end of problematic C
   code.  Vmgen inserts a stack pointer update before the start of the
   user-supplied C code, so the flushing code has to use an index that
   corrects for that.  In the future, this flushing may be done
   automatically by mentioning a special string in the C code.
   @c sometimes flushing and/or reloading unnecessary
   @item
   The vmgen-erated code loads the stack items from stack-pointer-indexed
   memory into variables before the user-supplied C code, and stores them
   from variables to stack-pointer-indexed memory afterwards.  If you do
   any writes to the stack through its stack pointer in your C code, it
   will not affect the variables, and your write may be overwritten by the
   stores after the C code.  Similarly, a read from a stack using a stack
   pointer will not reflect computations of stack items in the same VM
   instruction.
   @item
   Superinstructions keep stack items in variables across the whole
   superinstruction.  So you should not include VM instructions that
   access a stack or stack pointer as components of superinstructions.
   @end itemize
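   The flushing and reloading described in the first item can be sketched
   as follows.  This is an illustration under assumptions, not generated
   code: the helper @code{negate_top_in_memory} and the function
   @code{demo} are invented, and top-of-stack caching is simply modelled
   by a local variable @code{spTOS}.  No stack pointer update precedes the
   problematic code here, so the flush uses index 0; in a real VM
   instruction the index has to correct for the update that vmgen inserts.

```c
#include <assert.h>

typedef long Cell;

/* With top-of-stack caching enabled, IF_spTOS executes its argument. */
#define IF_spTOS(x) x

/* Hypothetical run-time helper that accesses the stack in memory
   directly, bypassing the cached top-of-stack variable. */
static void negate_top_in_memory(Cell *sp)
{
    sp[0] = -sp[0];
}

Cell demo(void)
{
    Cell stack[8];
    Cell *sp = stack + 7;          /* one item on the stack ... */
    Cell spTOS = 5;                /* ... whose value lives in the cache */

    sp[0] = 0;                     /* the memory copy is stale */

    IF_spTOS(sp[0] = spTOS);       /* flush the cached item to memory */
    negate_top_in_memory(sp);      /* problematic code: direct stack access */
    IF_spTOS(spTOS = sp[0]);       /* reload the cache afterwards */

    assert(spTOS == sp[0]);
    return spTOS;
}
```

   Without the flush the helper would negate the stale 0, and without the
   reload the cached variable would still hold 5.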
   You should access the instruction pointer only through its special
   macros (@samp{IP}, @samp{SET_IP}, @samp{IPTOS}); this ensures that these
   macros can be implemented in several ways for best performance.
   @samp{IP} points to the next instruction, and @samp{IPTOS} is its
   contents.
   @section Superinstructions
   Here is an example of a superinstruction definition:
   @example
   lit_sub = lit sub
   @end example
   @code{lit_sub} is the name of the superinstruction, and @code{lit} and
   @code{sub} are its components.  This superinstruction performs the same
   action as the sequence @code{lit} and @code{sub}.  It is generated
   automatically by the VM code generation functions whenever that sequence
   occurs, so you only need to add this definition if you want to use this
   superinstruction (and even that can be partially automated with the
   profiling scripts described in the section on using profiling to create
   superinstructions).
   Vmgen requires that the component instructions are simple instructions
   defined before superinstructions using the components.  Currently, vmgen
   also requires that all the subsequences at the start of a
   superinstruction (prefixes) must be defined as superinstructions before
   the superinstruction.  I.e., if you want to define a superinstruction
   @example
   sumof5 = add add add add
   @end example
   you first have to define
   @example
   add ( n1 n2 -- n )
   n = n1+n2;

   sumof3 = add add
   sumof4 = add add add
   @end example
   Here, @code{sumof4} is the longest prefix of @code{sumof5}, and @code{sumof3}
   is the longest prefix of @code{sumof4}.
   Note that vmgen assumes that only the code it generates accesses stack
   pointers, the instruction pointer, and various stack items, and it
   performs optimizations based on this assumption.  Therefore, VM
   instructions that change the instruction pointer should only be used as
   last component; a VM instruction that accesses a stack pointer should
   not be used as component at all.  Vmgen does not check these
   restrictions; violating them just results in bugs in your interpreter.
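   Why a superinstruction helps can be seen in the following C sketch of
   @code{lit_sub}.  It is a hand-written illustration, not vmgen output;
   the function names and the way the stack pointer and instruction
   pointer are passed are assumptions made to keep the fragment
   self-contained.  The first function performs @code{lit} and @code{sub}
   one after the other, the second keeps the intermediate item in a C
   variable for the whole superinstruction.

```c
#include <assert.h>

typedef long Cell;

/* lit ( #i -- i ) followed by sub ( i1 i2 -- i ), as two separate steps. */
Cell *lit_then_sub(Cell *sp, const Cell **ipp)
{
    assert(sp != 0 && ipp != 0);
    *--sp = *(*ipp)++;        /* lit: push the immediate argument */
    sp[1] = sp[1] - sp[0];    /* sub: operate on the two top items */
    sp++;
    return sp;
}

/* The combined superinstruction: the literal never touches the stack. */
Cell *lit_sub(Cell *sp, const Cell **ipp)
{
    Cell i1, i2;

    assert(sp != 0 && ipp != 0);
    i2 = *(*ipp)++;           /* immediate argument, kept in a variable */
    i1 = sp[0];
    sp[0] = i1 - i2;          /* one stack store, no stack pointer update */
    return sp;
}
```

   Both functions leave the same stack contents and the same instruction
   pointer.  The sketch also hints at why components must not access the
   stack pointer themselves: in the combined version the intermediate
   item is never stored in stack memory.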
   @c ********************************************************************
   @chapter Using the generated code
   The easiest way to create a working VM interpreter with vmgen is
   probably to start with one of the examples, and modify it for your
   purposes.  This chapter is just the reference manual for the macros
   etc. used by the generated code, and the other context expected by the
   generated code, and what you can do with the various generated files.
   @section VM engine
   The VM engine is the VM interpreter that executes the VM code.  It is
   essential for an interpretive system.
   Vmgen supports two methods of VM instruction dispatch: @emph{threaded
   code} (fast, but gcc-specific), and @emph{switch dispatch} (slow, but
   portable across C compilers); you can use conditional compilation
   (@samp{defined(__GNUC__)}) to choose between these methods, and our
   example does so.
   For both methods, the VM engine is contained in a C-level function.
   Vmgen generates most of the contents of the function for you
   (@file{@var{name}-vm.i}), but you have to define this function, and
   macros and variables used in the engine, and initialize the variables.
   In our example the engine function also includes
   @file{@var{name}-labels.i} (@pxref{VM instruction table}).
   The following macros and variables are used in @file{@var{name}-vm.i}:
   @table @code
   @item LABEL(@var{inst_name})
   This is used just before each VM instruction to provide a jump or
   @code{switch} label (the @samp{:} is provided by vmgen).  For switch
   dispatch this should expand to @samp{case @var{label}}; for
   threaded-code dispatch this should just expand to
   @samp{@var{label}}.  In either case @var{label} is usually the @var{inst_name}
   with some prefix or suffix to avoid naming conflicts.
   @item NAME(@var{inst_name_string})
   Called on entering a VM instruction with a string containing the name of
   the VM instruction as parameter.  In normal execution this should be a
   noop, but for tracing this usually prints the name, and possibly other
   information (several VM registers in our example).
   @item DEF_CA
   Usually empty.  Called just inside a new scope at the start of a VM
   instruction.  Can be used to define variables that should be visible
   during every VM instruction.  If you define this macro as non-empty, you
   have to provide the finishing @samp{;} in the macro.
   @item NEXT_P0 NEXT_P1 NEXT_P2
   The three parts of instruction dispatch.  They can be defined in
   different ways for best performance on various processors (see
   @file{engine.c} in the example or @file{engine/threaded.h} in Gforth).
   @samp{NEXT_P0} is invoked right at the start of the VM instruction (but
   after @samp{DEF_CA}), @samp{NEXT_P1} right after the user-supplied C
   code, and @samp{NEXT_P2} at the end.  The actual jump has to be
   performed by @samp{NEXT_P2}.
   The simplest variant is if @samp{NEXT_P2} does everything and the other
   macros do nothing.  Then also related macros like @samp{IP},
   @samp{SET_IP}, @samp{INC_IP} and @samp{IPTOS} are very
   straightforward to define.  For switch dispatch this code consists just
   of a jump to the dispatch code (@samp{goto next_inst;} in our example);
   for direct threaded code it consists of something like
   @samp{(@{cfa=*ip++; goto *cfa;@})}.
   Pulling code (usually the @samp{cfa=*ip;}) up into @samp{NEXT_P1}
   usually does not cause problems, but pulling things up into
   @samp{NEXT_P0} usually requires changing the other macros (and, at least
   for Gforth on Alpha, it does not buy much, because the compiler often
   manages to schedule the relevant stuff up by itself).  An even more
   extreme variant is to pull code up even further, into, e.g., NEXT_P1 of
   the previous VM instruction (prefetching, useful on PowerPCs).
   @item INC_IP(@var{n})
   This increments IP by @var{n}.
   @item vm_@var{A}2@var{B}(a,b)
   Type casting macro that assigns @samp{a} (of type @var{A}) to @samp{b}
   (of type @var{B}).  This is mainly used for getting stack items into
   variables and back.  So you need to define macros for every combination
   of stack basic type (@code{Cell} in our example) and type-prefix types
   used with that stack (in both directions).  For the type-prefix type,
   you use the type-prefix (not the C type string) as type name (e.g.,
   @samp{vm_Cell2i}, not @samp{vm_Cell2Cell}).  In addition, you have to
   define a vm_@var{X}2@var{X} macro for the stack basic type (used in
   superinstructions).
   The stack basic type for the predefined @samp{inst-stream} is
   @samp{Cell}.  If you want a stack with the same item size, making its
   basic type @samp{Cell} usually reduces the number of macros you have to
   define.
   Here our examples differ a lot: @file{vmgen-ex} uses casts in these
   macros, whereas @file{vmgen-ex2} uses union-field selection (or
   assignment to union fields).
   @item vm_two@var{A}2@var{B}(a1,a2,b)
   @item vm_@var{B}2two@var{A}(b,a1,a2)
   Conversions between two stack items (@code{a1}, @code{a2}) and a
   variable @code{b} of a type that takes two stack items.  This does not
   occur in our small examples, but you can look at Gforth for examples.
   @item @var{stackpointer}
   For each stack used, the stackpointer name given in the stack
   declaration is used.  For a regular stack this must be an l-value;
   typically it is a variable declared as a pointer to the stack's basic
   type.  For @samp{inst-stream}, the name is @samp{IP}, and it can be a
   plain r-value; typically it is a macro that abstracts away the
   differences between the various implementations of NEXT_P*.
   @item @var{stackpointer}TOS
   The top-of-stack for the stack pointed to by @var{stackpointer}.  If you
   are using top-of-stack caching for that stack, this should be defined as
   variable; if you are not using top-of-stack caching for that stack, this
   should be a macro expanding to @samp{@var{stackpointer}[0]}.  The stack
   pointer for the predefined @samp{inst-stream} is called @samp{IP}, so
   the top-of-stack is called @samp{IPTOS}.
   @item IF_@var{stackpointer}TOS(@var{expr})
   Macro for executing @var{expr}, if top-of-stack caching is used for the
   @var{stackpointer} stack.  I.e., this should do @var{expr} if there is
   top-of-stack caching for @var{stackpointer}; otherwise it should do
   @item VM_DEBUG
   If this is defined, the tracing code will be compiled in (slower
   interpretation, but better debugging).  Our example compiles two
   versions of the engine, a fast-running one that cannot trace, and one
   with potential tracing and profiling.
   @item vm_debug
   Needed only if @samp{VM_DEBUG} is defined.  If this variable contains
   true, the VM instructions produce trace output.  It can be turned on or
   off at any time.
   @item vm_out
   Needed only if @samp{VM_DEBUG} is defined.  Specifies the file on which
   to print the trace output (type @samp{FILE *}).
   @item printarg_@var{type}(@var{value})
   Needed only if @samp{VM_DEBUG} is defined.  Macro or function for
   printing @var{value} in a way appropriate for the @var{type}.  This is
   used for printing the values of stack items during tracing.  @var{Type}
   is normally the type prefix specified in a @code{type-prefix} definition
   (e.g., @samp{printarg_i}); in superinstructions it is currently the
   basic type of the stack.
   @end table
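
As an illustration of the macros described in this table, the following is a minimal sketch in the cast-based style, assuming a stack whose basic type is @code{Cell}, a type prefix @code{i} with C type @code{long}, no top-of-stack caching, and a stack that grows towards lower addresses. The function @code{vm_add} is hypothetical; it only imitates, by hand, the shape of the code that might be generated for an instruction @samp{add ( i1 i2 -- i )}.

```c
typedef long Cell;                 /* assumed stack basic type */

/* vm_A2B(a,b): assign a (of type A) to b (of type B), using casts. */
#define vm_Cell2i(_cell, _x)   ((_x) = (long)(_cell))
#define vm_i2Cell(_x, _cell)   ((_cell) = (Cell)(_x))
/* The X2X macro for the stack basic type. */
#define vm_Cell2Cell(_a, _b)   ((_b) = (_a))

/* No top-of-stack caching: spTOS is just the item sp points to,
   and IF_spTOS(expr) expands to nothing. */
#define spTOS         (sp[0])
#define IF_spTOS(x)

/* Hypothetical hand-written body for "add ( i1 i2 -- i )".
   Returns the updated stack pointer. */
Cell *vm_add(Cell *sp)
{
    long i1, i2, i;

    vm_Cell2i(sp[1], i1);   /* second stack item into i1 */
    vm_Cell2i(spTOS, i2);   /* top of stack into i2 */
    sp += 1;                /* two items consumed, one produced */
    i = i1 + i2;
    vm_i2Cell(i, spTOS);    /* result back onto the stack */
    return sp;
}
```

With a stack holding 3 on top of 4, calling @code{vm_add} leaves a single item with the value 7 where the 4 was.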
   @section VM instruction table
   For threaded code we also need to produce a table containing the labels
   of all VM instructions.  This is needed for VM code generation
   (@pxref{VM code generation}), and it has to be done in the engine
   function, because the labels are not visible outside.  It then has to be
   passed outside the function (and assigned to @samp{vm_prim}), to be used
   by the VM code generation functions.
   This means that the engine function has to be called first to produce
   the VM instruction table, and later, after generating VM code, it has to
   be called again to execute the generated VM code (yes, this is ugly).
   In our example program, these two modes of calling the engine function
   are differentiated by the value of the parameter ip0 (if it equals 0,
   then the table is passed out, otherwise the VM code is executed); in our
   example, we pass the table out by assigning it to @samp{vm_prim} and
   returning from @samp{engine}.
   In our example, we also build such a table for switch dispatch; this is
   mainly done for uniformity.
   For switch dispatch, we also need to define the VM instruction opcodes
   used as case labels in an @code{enum}.
   For both purposes (VM instruction table, and enum), the file
   @file{@var{name}-labels.i} is generated by vmgen.  You have to define
   the following macro used in this file:
   @table @samp
   @item INST_ADDR(@var{inst_name})
   For switch dispatch, this is just the name of the switch label (the same
   name as used in @samp{LABEL(@var{inst_name})}), for both uses of
   @file{@var{name}-labels.i}.  For threaded-code dispatch, this is the
   address of the label defined in @samp{LABEL(@var{inst_name})}; the
   address is taken with @samp{&&} (@pxref{labels-as-values}).
   @end table
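
The two uses of the labels file, and the two modes of calling the engine, can be sketched for switch dispatch as follows. This is an assumption-laden illustration, not the code from the examples: the macro @code{DEMO_LABELS} stands in for including @file{@var{name}-labels.i} (assumed here to consist of one @samp{INST_ADDR(...)} entry per VM instruction, each followed by a comma), the three instructions and the @code{I_} prefix are made up, and immediate arguments are stored as plain @code{int}s for brevity.

```c
#include <stddef.h>

typedef int Inst;                  /* switch dispatch: integer opcodes */

/* Stand-in for the generated labels file (hypothetical contents). */
#define DEMO_LABELS \
    INST_ADDR(lit), \
    INST_ADDR(add), \
    INST_ADDR(end),

/* For switch dispatch, INST_ADDR is the case label name in both uses. */
#define INST_ADDR(name) I_##name

enum { DEMO_LABELS };              /* use 1: opcodes for case labels */

Inst *vm_prim;                     /* the VM instruction table */

/* If ip0 is NULL, pass the table out; otherwise execute the VM code.
   sp points just past the end of the (downward-growing) stack. */
long engine(Inst *ip0, long *sp)
{
    static Inst labels[] = { DEMO_LABELS };   /* use 2: the table */
    Inst *ip = ip0;

    if (ip0 == NULL) {
        vm_prim = labels;
        return 0;
    }
    for (;;) {
        switch (*ip++) {
        case I_lit: *--sp = (long)*ip++; break;    /* push immediate */
        case I_add: sp[1] += sp[0]; sp++; break;   /* add top two items */
        case I_end: return sp[0];                  /* result is on top */
        default:    return -1;                     /* unknown opcode */
        }
    }
}
```

For threaded code, the table would instead hold label addresses taken with @samp{&&}, which is a GNU C extension and therefore left out of this portable sketch.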
   @section VM code generation
   Vmgen generates VM code generation functions in @file{@var{name}-gen.i}
   that the front end can call to generate VM code.  This is essential for
   an interpretive system.
   For a VM instruction @samp{x ( #a b #c -- d )}, vmgen generates a
   function with the prototype
   @example
   void gen_x(Inst **ctp, a_type a, c_type c)
   @end example
   The @code{ctp} argument points to a pointer to the next instruction.
   @code{*ctp} is increased by the generation functions; i.e., you should
   allocate memory for the code to be generated beforehand, and start with
   @code{*ctp} set to the start of this memory area.  Before running out of
   memory, allocate a new area, and generate a VM-level jump to the new
   area (this is not implemented in our examples).
   The other arguments correspond to the immediate arguments of the VM
   instruction (with their appropriate types as defined in the
   @code{type-prefix} declaration).
   The following types, variables, and functions are used in
   @table @samp
   @item Inst
   The type of the VM instruction; if you use threaded code, this is
   @code{void *}; for switch dispatch this is an integer type.
   @item vm_prim
   The VM instruction table (type: @code{Inst *}, @pxref{VM instruction table}).
   @item gen_inst(Inst **ctp, Inst i)
   This function compiles the instruction @code{i}.  Take a look at it in
   @file{vmgen-ex/peephole.c}.  It is trivial when you don't want to use
   superinstructions (just the last two lines of the example function), and
   slightly more complicated in the example due to its ability to use
   superinstructions (@pxref{Peephole optimization}).
   @item genarg_@var{type_prefix}(Inst **ctp, @var{type} @var{type_prefix})
   This compiles an immediate argument of @var{type} (as defined in a
   @code{type-prefix} definition).  These functions are trivial to define
   (see @file{vmgen-ex/support.c}).  You need one of these functions for
   every type that you use as immediate argument.
   @end table
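
How these pieces fit together can be sketched as follows, assuming switch dispatch (so @code{Inst} is an integer type), no superinstructions, and a single type prefix @code{i} with C type @code{long}. The opcodes and the functions @code{gen_lit} and @code{gen_add} are hypothetical, hand-written approximations of what the generated code generation functions might look like for @samp{lit ( #i -- i )} and @samp{add ( i1 i2 -- i )}; they are not copied from @file{@var{name}-gen.i}.

```c
typedef long Cell;
typedef Cell Inst;                 /* assumption: integer opcodes */

enum { I_lit, I_add, I_end };      /* hypothetical opcodes */

static Inst demo_prim[] = { I_lit, I_add, I_end };
Inst *vm_prim = demo_prim;         /* the VM instruction table */

/* Trivial gen_inst: lay down the instruction and advance *ctp. */
void gen_inst(Inst **ctp, Inst i)
{
    **ctp = i;
    (*ctp)++;
}

/* Compile an immediate argument with type prefix i. */
void genarg_i(Inst **ctp, long i)
{
    *((Cell *)*ctp) = (Cell)i;
    (*ctp)++;
}

/* Approximation of a generated function for "lit ( #i -- i )". */
void gen_lit(Inst **ctp, long i)
{
    gen_inst(ctp, vm_prim[I_lit]);
    genarg_i(ctp, i);
}

/* Approximation of a generated function for "add ( i1 i2 -- i )":
   no immediate arguments, so only the instruction is compiled. */
void gen_add(Inst **ctp)
{
    gen_inst(ctp, vm_prim[I_add]);
}
```

A front end would allocate a code area, point a variable of type @code{Inst *} at its start, and pass the address of that variable to each @code{gen_} call; after @code{gen_lit} and @code{gen_add} the pointer has advanced by three cells.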
   In addition to using these functions to generate code, you should call
   @code{BB_BOUNDARY} at every basic block entry point if you ever want to
   use superinstructions (or if you want to use the profiling supported by
   vmgen; however, this is mainly useful for selecting superinstructions).
   If you use @code{BB_BOUNDARY}, you should also define it (take a look at
   its definition in @file{vmgen-ex/mini.y}).
   You do not need to call @code{BB_BOUNDARY} after branches, because you
   will not define superinstructions that contain branches in the middle
   (and if you did, and it worked, there would be no reason to end the
   superinstruction at the branch), and because the branches announce
   themselves to the profiler.
   @section Peephole optimization
   You need peephole optimization only if you want to use
   superinstructions.  But having the code for it does not hurt much if you
   do not use superinstructions.
   A simple greedy peephole optimization algorithm is used for
   superinstruction selection: every time @code{gen_inst} compiles a VM
   instruction, it checks whether it can combine it with the last VM instruction
   (which may also be a superinstruction resulting from a previous peephole
   optimization); if so, it changes the last instruction to the combined
   instruction instead of laying down @code{i} at the current @samp{*ctp}.
   The code for peephole optimization is in @file{vmgen-ex/peephole.c}.
   You can use this file almost verbatim.  Vmgen generates
   @file{@var{file}-peephole.i} which contains data for the peephole
   You have to call @samp{init_peeptable()} after initializing
   @samp{vm_prim}, and before compiling any VM code to initialize data
   structures for peephole optimization.  After that, compiling with the VM
   code generation functions will automatically combine VM instructions
   into superinstructions.  Since you do not want to combine instructions
   across VM branch targets (otherwise there will not be a proper VM
   instruction to branch to), you have to call @code{BB_BOUNDARY}
   (@pxref{VM code generation}) at branch targets.
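
The greedy combination step and the effect of @code{BB_BOUNDARY} can be sketched as follows. This is not the code from @file{vmgen-ex/peephole.c}: the small linear table @code{combos} stands in for the generated peephole data, the opcodes are made up, and the variable name @code{last_compiled} and this definition of @code{BB_BOUNDARY} are assumptions chosen for the illustration.

```c
#include <stddef.h>

typedef int Inst;                  /* assumption: integer opcodes */

enum { I_dup, I_mul, I_end, I_dup_mul };   /* hypothetical opcodes */

/* Hypothetical combination table: first + second -> combined. */
struct combo { Inst first, second, combined; };
static const struct combo combos[] = {
    { I_dup, I_mul, I_dup_mul }
};

/* Points to the last compiled instruction, or NULL if combining
   with it is not allowed (e.g. right after a basic block boundary). */
static Inst *last_compiled = NULL;

/* Forget the last instruction so nothing is combined across
   a branch target. */
#define BB_BOUNDARY (last_compiled = NULL)

void gen_inst(Inst **ctp, Inst i)
{
    if (last_compiled != NULL) {
        size_t k;
        for (k = 0; k < sizeof combos / sizeof combos[0]; k++) {
            if (combos[k].first == *last_compiled &&
                combos[k].second == i) {
                /* Rewrite the last instruction in place instead of
                   laying down i at *ctp. */
                *last_compiled = combos[k].combined;
                return;
            }
        }
    }
    **ctp = i;                     /* the trivial case */
    last_compiled = *ctp;
    (*ctp)++;
}
```

Compiling @code{dup} followed by @code{mul} then yields the single instruction @code{dup_mul}, whereas placing @code{BB_BOUNDARY} between the two calls keeps them as separate instructions.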
   @section VM disassembler
   A VM code disassembler is optional for an interpretive system, but
   highly recommended during its development and maintenance, because it is
   very useful for detecting bugs in the front end (and for distinguishing
   them from VM interpreter bugs).
   Vmgen supports VM code disassembling by generating
   @file{@var{file}-disasm.i}.  This code has to be wrapped into a
   function, as is done in @file{vmgen-ex/disasm.i}.  You can use this file
   almost verbatim.  In addition to @samp{vm_@var{A}2@var{B}(a,b)},
   @samp{vm_out}, @samp{printarg_@var{type}(@var{value})}, which are
   explained above, the following macros and variables are used in
   @file{@var{file}-disasm.i} (and you have to define them):
   @table @samp
   @item ip
   This variable points to the opcode of the current VM instruction.
   @item IP IPTOS
   @samp{IPTOS} is the first argument of the current VM instruction, and
   @samp{IP} points to it; this is just as in the engine, but here
   @samp{ip} points to the opcode of the VM instruction (in contrast to the
   engine, where @samp{ip} points to the next cell, or even one further).
   @item VM_IS_INST(Inst i, int n)
   Tests if the opcode @samp{i} is the same as the @samp{n}th entry in the
   VM instruction table.
   @end table
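
A possible set of definitions, together with a hand-written loop that imitates the kind of code the disassembler performs, is sketched below. It assumes integer opcodes, a type prefix @code{i} with C type @code{long}, and three made-up instructions; the function @code{vm_disassemble} and its instruction-by-instruction tests are written by hand for illustration and are not the contents of @file{@var{file}-disasm.i}.

```c
#include <stdio.h>

typedef long Cell;
typedef Cell Inst;                 /* assumption: integer opcodes */

enum { I_lit, I_add, I_end };      /* hypothetical opcodes */

static Inst demo_prim[] = { I_lit, I_add, I_end };
Inst *vm_prim = demo_prim;         /* the VM instruction table */
FILE *vm_out;                      /* where the output goes */

/* Is opcode inst the same as the n-th entry of the table? */
#define VM_IS_INST(inst, n)  ((inst) == vm_prim[n])
/* Here ip points to the opcode, so the first argument is at ip+1. */
#define IP                   (ip + 1)
#define IPTOS                (IP[0])
#define printarg_i(i)        fprintf(vm_out, "%ld ", (long)(i))

/* Print the VM code in [ip, endp); returns the number of
   instructions printed. */
int vm_disassemble(Inst *ip, Inst *endp)
{
    int count = 0;

    while (ip < endp) {
        fprintf(vm_out, "%p: ", (void *)ip);
        if (VM_IS_INST(*ip, I_lit)) {
            fputs("lit ", vm_out);
            printarg_i(IPTOS);     /* one immediate argument */
            ip += 2;
        } else if (VM_IS_INST(*ip, I_add)) {
            fputs("add", vm_out);
            ip += 1;
        } else if (VM_IS_INST(*ip, I_end)) {
            fputs("end", vm_out);
            ip += 1;
        } else {
            fputs("unknown", vm_out);
            ip += 1;
        }
        fputc('\n', vm_out);
        count++;
    }
    return count;
}
```

Before calling @code{vm_disassemble}, @code{vm_out} has to be set, for example to @code{stdout}.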
   @section VM profiler
   The VM profiler is designed for getting execution and occurrence counts
   for VM instruction sequences, and these counts can then be used for
   selecting sequences as superinstructions.  The VM profiler is probably
   not useful as a profiling tool for the interpretive system (i.e., the VM
   profiler is useful for the developers, but not the users of the
   interpretive system).
   Input Syntax
   Concepts: Front end, VM, Stacks,  Types, input stream
   Required changes:
   vm_...2... -> two arguments
   "vm_two...2...(arg1,arg2,arg3);" -> "vm_two...2...(arg3,arg1,arg2)" (no ";").
   define INST_ADDR and LABEL
   define VM_IS_INST also for disassembler
