--- gforth/doc/vmgen.texi	2002/06/02 10:31:29	1.3
+++ gforth/doc/vmgen.texi	2007/12/31 18:56:18	1.30
@@ -1,8 +1,129 @@
+\input texinfo    @c -*-texinfo-*-
+@comment %**start of header
+@setfilename vmgen.info
 @include version.texi
+@settitle Vmgen (Gforth @value{VERSION})
+@c @syncodeindex pg cp
+@comment %**end of header
+@copying
+This manual is for Vmgen
+(version @value{VERSION}, @value{UPDATED}),
+the virtual machine interpreter generator.
+
+Copyright @copyright{} 2002,2003,2005 Free Software Foundation, Inc.
+
+@quotation
+Permission is granted to copy, distribute and/or modify this document
+under the terms of the GNU Free Documentation License, Version 1.2 or
+any later version published by the Free Software Foundation; with no
+Invariant Sections, with the Front-Cover texts being ``A GNU Manual,''
+and with the Back-Cover Texts as in (a) below.  A copy of the
+license is included in the section entitled ``GNU Free Documentation
+License.''
+
+(a) The FSF's Back-Cover Text is: ``You have freedom to copy and modify
+this GNU Manual, like GNU software.  Copies published by the Free
+Software Foundation raise funds for GNU development.''
+@end quotation
+@end copying
+
+@dircategory Software development
+@direntry
+* Vmgen: (vmgen).               Virtual machine interpreter generator
+@end direntry
+
+@titlepage
+@title Vmgen
+@subtitle for Gforth version @value{VERSION}, @value{UPDATED}
+@author M. Anton Ertl (@email{anton@@mips.complang.tuwien.ac.at})
+@page
+@vskip 0pt plus 1filll
+@insertcopying
+@end titlepage
+
+@contents
+
+@ifnottex
+@node Top, Introduction, (dir), (dir)
+@top Vmgen
+
+@insertcopying
+@end ifnottex
+
+@menu
+* Introduction::                What can Vmgen do for you?
+* Why interpreters?::           Advantages and disadvantages
+* Concepts::                    VM interpreter background
+* Invoking Vmgen::              
+* Example::                     
+* Input File Format::           
+* Error messages::              reported by Vmgen
+* Using the generated code::    
+* Hints::                       VM architecture, efficiency
+* The future::                  
+* Changes::                     from earlier versions
+* Contact::                     Bug reporting etc.
+* Copying This Manual::         Manual License
+* Index::                       
+
+@detailmenu
+ --- The Detailed Node Listing ---
+
+Concepts
+
+* Front end and VM interpreter::  Modularizing an interpretive system
+* Data handling::               Stacks, registers, immediate arguments
+* Dispatch::                    From one VM instruction to the next
+
+Example
+
+* Example overview::            
+* Using profiling to create superinstructions::  
+
+Input File Format
+
+* Input File Grammar::          
+* Simple instructions::         
+* Superinstructions::           
+* Store Optimization::          
+* Register Machines::           How to define register VM instructions
+
+Input File Grammar
+
+* Eval escapes::                what follows \E
+
+Simple instructions
+
+* Explicit stack access::       If the C code accesses a stack pointer
+* C Code Macros::               Macros recognized by Vmgen
+* C Code restrictions::         Vmgen makes assumptions about C code
+* Stack growth direction::      is configurable per stack
+
+Using the generated code
+
+* VM engine::                   Executing VM code
+* VM instruction table::        
+* VM code generation::          Creating VM code (in the front-end)
+* Peephole optimization::       Creating VM superinstructions
+* VM disassembler::             for debugging the front end
+* VM profiler::                 for finding worthwhile superinstructions
+
+Hints
+
+* Floating point::              and stacks
+
+Copying This Manual
+
+* GNU Free Documentation License::  License for copying this manual.
+
+@end detailmenu
+@end menu
 
 @c @ifnottex
-This file documents vmgen (Gforth @value{VERSION}).
+@c This file documents Vmgen (Gforth @value{VERSION}).
 
+@c ************************************************************
+@node Introduction, Why interpreters?, Top, Top
 @chapter Introduction
 
 Vmgen is a tool for writing efficient interpreters.  It takes a simple
@@ -12,7 +133,7 @@ it).  The run-time efficiency of the res
 within a factor of 10 of machine code produced by an optimizing
 compiler.
 
-The interpreter design strategy supported by vmgen is to divide the
+The interpreter design strategy supported by Vmgen is to divide the
 interpreter into two parts:
 
 @itemize @bullet
@@ -29,8 +150,8 @@ machine code.
 @end itemize
 
 Such a division is usually used in interpreters, for modularity as well
-as for efficiency reasons.  The virtual machine code is typically passed
-between front end and virtual machine interpreter in memory, like in a
+as for efficiency.  The virtual machine code is typically passed between
+front end and virtual machine interpreter in memory, like in a
 load-and-go compiler; this avoids the complexity and time cost of
 writing the code to a file and reading it again.
 
@@ -39,11 +160,12 @@ A @emph{virtual machine} (VM) represents
 machine code.  Control flow occurs through VM branch instructions, like
 in a real machine.
 
-In this setup, vmgen can generate most of the code dealing with virtual
+@cindex functionality features overview
+In this setup, Vmgen can generate most of the code dealing with virtual
 machine instructions from a simple description of the virtual machine
-instructions (@pxref...), in particular:
+instructions (@pxref{Input File Format}), in particular:
 
-@table @emph
+@table @strong
 
 @item VM instruction execution
 
@@ -59,15 +181,21 @@ typically provide other means for debugg
 source level.
 
 @item VM code profiling
-Useful for optimizing the VM insterpreter with superinstructions
-(@pxref...).
+Useful for optimizing the VM interpreter with superinstructions
+(@pxref{VM profiler}).
 
 @end table
 
-VMgen supports efficient interpreters though various optimizations, in
+To create parts of the interpretive system that do not deal with VM
+instructions, you have to use other tools (e.g., @command{bison}) and/or
+hand-code them.
+
+@cindex efficiency features overview
+@noindent
+Vmgen supports efficient interpreters through various optimizations, in
 particular
 
-@itemize
+@itemize @bullet
 
 @item Threaded code
 
@@ -81,8 +209,9 @@ Replicating VM (super)instructions for b
 
 @end itemize
 
-As a result, vmgen-based interpreters are only about an order of
-magintude slower than native code from an optimizing C compiler on small
+@cindex speed for JVM
+As a result, Vmgen-based interpreters are only about an order of
+magnitude slower than native code from an optimizing C compiler on small
 benchmarks; on large benchmarks, which spend more time in the run-time
 system, the slowdown is often less (e.g., the slowdown of a
 Vmgen-generated JVM interpreter over the best JVM JIT compiler we
@@ -91,22 +220,27 @@ and all other interpreters we looked at
 interpreter).
 
 VMs are usually designed as stack machines (passing data between VM
-instructions on a stack), and vmgen supports such designs especially
-well; however, you can also use vmgen for implementing a register VM and
-still benefit from most of the advantages offered by vmgen.
+instructions on a stack), and Vmgen supports such designs especially
+well; however, you can also use Vmgen for implementing a register VM
+(@pxref{Register Machines}) and still benefit from most of the advantages
+offered by Vmgen.
 
 There are many potential uses of the instruction descriptions that are
 not implemented at the moment, but we are open for feature requests, and
-we will implement new features if someone asks for them; so the feature
+we will consider new features if someone asks for them; so the feature
 list above is not exhaustive.
 
 @c *********************************************************************
+@node Why interpreters?, Concepts, Introduction, Top
 @chapter Why interpreters?
+@cindex interpreters, advantages
+@cindex advantages of interpreters
+@cindex advantages of vmgen
 
 Interpreters are a popular language implementation technique because
 they combine all three of the following advantages:
 
-@itemize
+@itemize @bullet
 
 @item Ease of implementation
 
@@ -116,6 +250,9 @@ they combine all three of the following
 
 @end itemize
 
+Vmgen makes it even easier to implement interpreters.
+
+@cindex speed of interpreters
 The main disadvantage of interpreters is their run-time speed.  However,
 there are huge differences between different interpreters in this area:
 the slowdown over optimized C code on programs consisting of simple
@@ -125,15 +262,22 @@ slowdown for programs executing complex
 time spent in libraries for executing complex operations is the same in
 all implementation strategies).
 
-Vmgen makes it even easier to implement interpreters.  It also supports
-techniques for building efficient interpreters.
+Vmgen supports techniques for building efficient interpreters.
 
 @c ********************************************************************
-
+@node Concepts, Invoking Vmgen, Why interpreters?, Top
 @chapter Concepts
 
+@menu
+* Front end and VM interpreter::  Modularizing an interpretive system
+* Data handling::               Stacks, registers, immediate arguments
+* Dispatch::                    From one VM instruction to the next
+@end menu
+
 @c --------------------------------------------------------------------
-@section Front-end and virtual machine interpreter
+@node Front end and VM interpreter, Data handling, Concepts, Concepts
+@section Front end and VM interpreter
+@cindex modularization of interpreters
 
 @cindex front-end
 Interpretive systems are typically divided into a @emph{front end} that
@@ -143,16 +287,27 @@ representation of the program.
 
 @cindex virtual machine
 @cindex VM
+@cindex VM instruction
 @cindex instruction, VM
+@cindex VM branch instruction
+@cindex branch instruction, VM
+@cindex VM register
+@cindex register, VM
+@cindex opcode, VM instruction
+@cindex immediate argument, VM instruction
 For efficient interpreters the intermediate representation of choice is
 virtual machine code (rather than, e.g., an abstract syntax tree).
 @emph{Virtual machine} (VM) code consists of VM instructions arranged
 sequentially in memory; they are executed in sequence by the VM
-interpreter, except for VM branch instructions, which implement control
-structures.  The conceptual similarity to real machine code results in
-the name @emph{virtual machine}.
+interpreter, but VM branch instructions can change the control flow and
+are used for implementing control structures.  The conceptual similarity
+to real machine code results in the name @emph{virtual machine}.
+Various terms similar to terms for real machines are used; e.g., there
+are @emph{VM registers} (like the instruction pointer and stack
+pointer(s)), and the VM instruction consists of an @emph{opcode} and
+@emph{immediate arguments}.
 
-In this framework, vmgen supports building the VM interpreter and any
+In this framework, Vmgen supports building the VM interpreter and any
 other component dealing with VM instructions.  It does not have any
 support for the front end, apart from VM code generation support.  The
 front end can be implemented with classical compiler front-end
@@ -163,22 +318,27 @@ interpreter, but some systems also suppo
 as an image file, or in a full-blown linkable file format (e.g., JVM).
 Vmgen currently has no special support for such features, but the
 information in the instruction descriptions can be helpful, and we are
-open for feature requests and suggestions.
+open to feature requests and suggestions.
 
+@c --------------------------------------------------------------------
+@node Data handling, Dispatch, Front end and VM interpreter, Concepts
 @section Data handling
 
 @cindex stack machine
 @cindex register machine
 Most VMs use one or more stacks for passing temporary data between VM
 instructions.  Another option is to use a register machine architecture
-for the virtual machine; however, this option is either slower or
+for the virtual machine; we believe that using a stack architecture is
+usually both simpler and faster.
+
+A register machine is either slower or
 significantly more complex to implement than a stack machine architecture.
 
 Vmgen has special support and optimizations for stack VMs, making their
 implementation easy and efficient.
 
-You can also implement a register VM with vmgen (@pxref{Register
-Machines}), and you will still profit from most vmgen features.
+You can also implement a register VM with Vmgen (@pxref{Register
+Machines}), and you will still profit from most Vmgen features.
 
 @cindex stack item size
 @cindex size, stack items
@@ -192,35 +352,82 @@ the data on the stack.
 @cindex immediate arguments
 Another source of data is immediate arguments VM instructions (in the VM
 instruction stream).  The VM instruction stream is handled similar to a
-stack in vmgen.
+stack in Vmgen.
 
 @cindex garbage collection
 @cindex reference counting
-Vmgen has no built-in support for nor restrictions against @emph{garbage
-collection}.  If you need garbage collection, you need to provide it in
-your run-time libraries.  Using @emph{reference counting} is probably
-harder, but might be possible (contact us if you are interested).
+Vmgen has no built-in support for, nor restrictions against
+@emph{garbage collection}.  If you need garbage collection, you need to
+provide it in your run-time libraries.  Using @emph{reference counting}
+is probably harder, but might be possible (contact us if you are
+interested).
 @c reference counting might be possible by including counting code in 
 @c the conversion macros.
 
+@c --------------------------------------------------------------------
+@node Dispatch,  , Data handling, Concepts
+@section Dispatch
+@cindex Dispatch of VM instructions
+@cindex main interpreter loop
+
+Understanding this section is probably not necessary for using Vmgen,
+but it may help.  You may want to skip it now, and read it if you find statements about dispatch methods confusing.
+
+After executing one VM instruction, the VM interpreter has to dispatch
+the next VM instruction (Vmgen calls the dispatch routine @samp{NEXT}).
+Vmgen supports two methods of dispatch:
+
+@table @strong
+
+@item switch dispatch
+@cindex switch dispatch
+In this method the VM interpreter contains a giant @code{switch}
+statement, with one @code{case} for each VM instruction.  The VM
+instruction opcodes are represented by integers (e.g., produced by an
+@code{enum}) in the VM code, and dispatch occurs by loading the next
+opcode, @code{switch}ing on it, and continuing at the appropriate
+@code{case}; after executing the VM instruction, the VM interpreter
+jumps back to the dispatch code.
+
+@item threaded code
+@cindex threaded code
+This method represents a VM instruction opcode by the address of the
+start of the machine code fragment for executing the VM instruction.
+Dispatch consists of loading this address, jumping to it, and
+incrementing the VM instruction pointer.  Typically the threaded-code
+dispatch code is appended directly to the code for executing the VM
+instruction.  Threaded code cannot be implemented in ANSI C, but it can
+be implemented using GNU C's labels-as-values extension (@pxref{Labels
+as Values, , Labels as Values, gcc.info, GNU C Manual}).
+
+@c call threading
+@end table
+
+Threaded code can be twice as fast as switch dispatch, depending on the
+interpreter, the benchmark, and the machine.
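+
+As a rough illustration, the two dispatch methods might look as follows
+in a hand-written toy interpreter.  This is a sketch under assumptions,
+not code produced by Vmgen: the names @code{toy_op}, @code{toy_cell},
+@code{run_switch} and @code{run_threaded} are made up for this example,
+and the threaded variant relies on GNU C's labels-as-values extension.
+
```c
/* Sketch only: a hand-written toy VM, not code generated by Vmgen.
   All names (toy_op, toy_cell, run_switch, run_threaded) are hypothetical. */
#include <assert.h>
#include <stddef.h>

enum toy_op { OP_LIT, OP_ADD, OP_HALT };

/* Switch dispatch: opcodes are small integers in the VM code. */
long run_switch(const long *ip)
{
  long stack[16];
  long *sp = stack;               /* next free slot; grows upwards here */

  for (;;) {
    switch (*ip++) {              /* load the next opcode, switch on it */
    case OP_LIT:                  /* one immediate argument follows */
      assert(sp < stack + 16);
      *sp++ = *ip++;
      break;                      /* jump back to the dispatch code */
    case OP_ADD:
      assert(sp - stack >= 2);
      sp[-2] += sp[-1];
      sp--;
      break;
    case OP_HALT:
      assert(sp > stack);
      return sp[-1];
    default:
      return -1;                  /* unknown opcode */
    }
  }
}

#ifdef __GNUC__
/* Threaded code: an opcode is the address of the code that executes it. */
typedef union toy_cell { void *label; long imm; } toy_cell;

long run_threaded(const toy_cell *ip, void **labels_out)
{
  static void *labels[] = { &&lit, &&add, &&halt };
  long stack[16];
  long *sp = stack;
  int i;

  if (ip == NULL) {               /* caller asks for the opcode addresses */
    for (i = 0; i < 3; i++)
      labels_out[i] = labels[i];
    return 0;
  }

/* Dispatch: load the address, increment the VM ip, jump. */
#define NEXT goto *(ip++)->label
  NEXT;
lit:
  *sp++ = (ip++)->imm;
  NEXT;                           /* dispatch is appended to each instruction */
add:
  sp[-2] += sp[-1];
  sp--;
  NEXT;
halt:
  return sp[-1];
#undef NEXT
}
#endif
```
+
+Here @code{run_threaded} hands out its label addresses when called with
+a null instruction pointer, so that a front end can store them as
+opcodes; this is only one possible arrangement.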
+
 @c *************************************************************
-@chapter Invoking vmgen
+@node Invoking Vmgen, Example, Concepts, Top
+@chapter Invoking Vmgen
+@cindex Invoking Vmgen
 
-The usual way to invoke vmgen is as follows:
+The usual way to invoke Vmgen is as follows:
 
 @example
-vmgen @var{infile}
+vmgen @var{inputfile}
 @end example
 
-Here @var{infile} is the VM instruction description file, which usually
-ends in @file{.vmg}.  The output filenames are made by taking the
-basename of @file{infile} (i.e., the output files will be created in the
-current working directory) and replacing @file{.vmg} with @file{-vm.i},
-@file{-disasm.i}, @file{-gen.i}, @file{-labels.i}, @file{-profile.i},
-and @file{-peephole.i}.  E.g., @command{bison hack/foo.vmg} will create
-@file{foo-vm.i} etc.
+Here @var{inputfile} is the VM instruction description file, which
+usually ends in @file{.vmg}.  The output filenames are made by taking
+the basename of @file{inputfile} (i.e., the output files will be created
+in the current working directory) and replacing @file{.vmg} with
+@file{-vm.i}, @file{-disasm.i}, @file{-gen.i}, @file{-labels.i},
+@file{-profile.i}, and @file{-peephole.i}.  E.g., @command{vmgen
+hack/foo.vmg} will create @file{foo-vm.i}, @file{foo-disasm.i},
+@file{foo-gen.i}, @file{foo-labels.i}, @file{foo-profile.i} and
+@file{foo-peephole.i}.
 
-The command-line options supported by vmgen are
+The command-line options supported by Vmgen are
 
 @table @option
 
@@ -239,86 +446,316 @@ Print version and exit
 
 @c env vars GFORTHDIR GFORTHDATADIR
 
+@c ****************************************************************
+@node Example, Input File Format, Invoking Vmgen, Top
+@chapter Example
+@cindex example of a Vmgen-based interpreter
+
+@menu
+* Example overview::            
+* Using profiling to create superinstructions::  
+@end menu
+
+@c --------------------------------------------------------------------
+@node Example overview, Using profiling to create superinstructions, Example, Example
+@section Example overview
+@cindex example overview
+@cindex @file{vmgen-ex}
+@cindex @file{vmgen-ex2}
+
+There are two versions of the same example for using Vmgen:
+@file{vmgen-ex} and @file{vmgen-ex2} (you can also see Gforth as an
+example, but it uses additional (undocumented) features, and also
+differs in some other respects).  The example implements @emph{mini}, a
+tiny Modula-2-like language with a small JavaVM-like virtual machine.
+
+The difference between the examples is that @file{vmgen-ex} uses many
+casts, and @file{vmgen-ex2} tries to avoid most casts and uses unions
+instead.  In the rest of this manual we usually mention just files in
+@file{vmgen-ex}; if you want to use unions, use the equivalent file in
+@file{vmgen-ex2}.
+@cindex unions example
+@cindex casts example
+
+The files provided with each example are:
+@cindex example files
+
+@example
+Makefile
+README
+disasm.c           wrapper file
+engine.c           wrapper file
+peephole.c         wrapper file
+profile.c          wrapper file
+mini-inst.vmg      simple VM instructions
+mini-super.vmg     superinstructions (empty at first)
+mini.h             common declarations
+mini.l             scanner
+mini.y             front end (parser, VM code generator)
+support.c          main() and other support functions
+fib.mini           example mini program
+simple.mini        example mini program
+test.mini          example mini program (tests everything)
+test.out           test.mini output
+stat.awk           script for aggregating profile information
+peephole-blacklist list of instructions not allowed in superinstructions
+seq2rule.awk       script for creating superinstructions
+@end example
+
+For your own interpreter, you would typically copy the following files
+and change little, if anything:
+@cindex wrapper files
+
+@example
+disasm.c           wrapper file
+engine.c           wrapper file
+peephole.c         wrapper file
+profile.c          wrapper file
+stat.awk           script for aggregating profile information
+seq2rule.awk       script for creating superinstructions
+@end example
+
+@noindent
+You would typically change much in or replace the following files:
+
+@example
+Makefile
+mini-inst.vmg      simple VM instructions
+mini.h             common declarations
+mini.l             scanner
+mini.y             front end (parser, VM code generator)
+support.c          main() and other support functions
+peephole-blacklist list of instructions not allowed in superinstructions
+@end example
+
+You can build the example by @code{cd}ing into the example's directory,
+and then typing @code{make}; you can check that it works with @code{make
+check}.  You can run mini programs like this:
+
+@example
+./mini fib.mini
+@end example
+
+To learn about the options, type @code{./mini -h}.
+
+@c --------------------------------------------------------------------
+@node Using profiling to create superinstructions,  , Example overview, Example
+@section Using profiling to create superinstructions
+@cindex profiling example
+@cindex superinstructions example
+
+I have not added rules for this in the @file{Makefile} (there are many
+options for selecting superinstructions, and I did not want to hardcode
+one into the @file{Makefile}), but there are some supporting scripts, and
+here's an example:
+
+To use @file{fib.mini} and @file{test.mini} as training programs, get
+the profiles like this:
+
+@example
+make fib.prof test.prof #takes a few seconds
+@end example
+
+You can aggregate these profiles with @file{stat.awk}:
+
+@example
+awk -f stat.awk fib.prof test.prof
+@end example
+
+The result contains lines like:
+
+@example
+      2      16        36910041 loadlocal lit
+@end example
+
+This means that the sequence @code{loadlocal lit} statically occurs a
+total of 16 times in 2 profiles, with a dynamic execution count of
+36910041.
+
+The numbers can be used in various ways to select superinstructions.
+E.g., if you just want to select all sequences with a dynamic
+execution count of at least 10000, you would use the following pipeline:
+
+@example
+awk -f stat.awk fib.prof test.prof|
+awk '$3>=10000'|                #select sequences
+fgrep -v -f peephole-blacklist| #eliminate wrong instructions
+awk -f seq2rule.awk|  #transform sequences into superinstruction rules
+sort -k 3 >mini-super.vmg       #sort sequences
+@end example
+
+The file @file{peephole-blacklist} contains all instructions that
+directly access a stack or stack pointer (for mini: @code{call},
+@code{return}); the sort step is necessary to ensure that prefixes
+precede larger superinstructions.
+
+Now you can create a version of mini with superinstructions by just
+saying @samp{make}.
+
+
 @c ***************************************************************
+@node Input File Format, Error messages, Example, Top
 @chapter Input File Format
+@cindex input file format
+@cindex format, input file
 
 Vmgen takes as input a file containing specifications of virtual machine
 instructions.  This file usually has a name ending in @file{.vmg}.
 
-The examples are taken from the example in @file{vmgen-ex}.
+Most examples are taken from the example in @file{vmgen-ex}.
+
+@menu
+* Input File Grammar::          
+* Simple instructions::         
+* Superinstructions::           
+* Store Optimization::          
+* Register Machines::           How to define register VM instructions
+@end menu
 
+@c --------------------------------------------------------------------
+@node Input File Grammar, Simple instructions, Input File Format, Input File Format
 @section Input File Grammar
+@cindex grammar, input file
+@cindex input file grammar
 
 The grammar is in EBNF format, with @code{@var{a}|@var{b}} meaning
 ``@var{a} or @var{b}'', @code{@{@var{c}@}} meaning 0 or more repetitions
 of @var{c} and @code{[@var{d}]} meaning 0 or 1 repetitions of @var{d}.
 
+@cindex free-format, not
+@cindex newlines, significance in syntax
 Vmgen input is not free-format, so you have to take care where you put
-spaces and especially newlines; it's not as bad as makefiles, though:
-any sequence of spaces and tabs is equivalent to a single space.
+newlines (and, in a few cases, white space).
 
 @example
-description: {instruction|comment|eval-escape}
+description: @{instruction|comment|eval-escape|c-escape@}
 
 instruction: simple-inst|superinst
 
-simple-inst: ident " (" stack-effect " )" newline c-code newline newline
+simple-inst: ident '(' stack-effect ')' newline c-code newline newline
+
+stack-effect: @{ident@} '--' @{ident@}
 
-stack-effect: {ident} " --" {ident}
+superinst: ident '=' ident @{ident@}
 
-super-inst: ident " =" ident {ident}  
+comment:      '\ '  text newline
 
-comment:      "\ "  text newline
+eval-escape:  '\E ' text newline
 
-eval-escape:  "\e " text newline
+c-escape:     '\C ' text newline
 @end example
 @c \+ \- \g \f \c
 
 Note that the @code{\}s in this grammar are meant literally, not as
-C-style encodings for no-printable characters.
+C-style encodings for non-printable characters.
+
+There are two ways to delimit the C code in @code{simple-inst}:
+
+@itemize @bullet
+
+@item
+If you start it with a @samp{@{} at the start of a line (i.e., not even
+white space before it), you have to end it with a @samp{@}} at the start
+of a line (followed by a newline).  In this case you may have empty
+lines within the C code (typically used between variable definitions and
+statements).
 
-The C code in @code{simple-inst} must not contain empty lines (because
-vmgen would mistake that as the end of the simple-inst.  The text in
-@code{comment} and @code{eval-escape} must not contain a newline.
-@code{Ident} must conform to the usual conventions of C identifiers
-(otherwise the C compiler would choke on the vmgen output).
+@item
+You do not start it with @samp{@{}.  Then the C code ends at the first
+empty line, so you cannot have empty lines within this code.
+
+@end itemize
+
+The text in @code{comment}, @code{eval-escape} and @code{c-escape} must
+not contain a newline.  @code{Ident} must conform to the usual
+conventions of C identifiers (otherwise the C compiler would choke on
+the Vmgen output), except that idents in @code{stack-effect} may have a
+stack prefix (for stack prefix syntax, @pxref{Eval escapes}).
+
+@cindex C escape
+@cindex @code{\C}
+@cindex conditional compilation of Vmgen output
+The @code{c-escape} passes the text through to each output file (without
+the @samp{\C}).  This is useful mainly for conditional compilation
+(i.e., you write @samp{\C #if ...} etc.).
+
+@cindex sync lines
+@cindex @code{#line}
+In addition to the syntax given in the grammar, Vmgen also processes
+sync lines (lines starting with @samp{#line}), as produced by @samp{m4
+-s} (@pxref{Invoking m4, , Invoking m4, m4.info, GNU m4}) and similar
+tools.  This allows associating C compiler error messages with the
+original source of the C code.
 
 Vmgen understands a few extensions beyond the grammar given here, but
 these extensions are only useful for building Gforth.  You can find a
 description of the format used for Gforth in @file{prim}.
 
-@subsection
+@menu
+* Eval escapes::                what follows \E
+@end menu
+
+@node Eval escapes,  , Input File Grammar, Input File Grammar
+@subsection Eval escapes
+@cindex escape to Forth
+@cindex eval escape
+@cindex @code{\E}
+
 @c woanders?
 The text in @code{eval-escape} is Forth code that is evaluated when
-vmgen reads the line.  If you do not know (and do not want to learn)
-Forth, you can build the text according to the following grammar; these
-rules are normally all Forth you need for using vmgen:
+Vmgen reads the line.  You will normally use this feature to define
+stacks and types.
+
+If you do not know (and do not want to learn) Forth, you can build the
+text according to the following grammar; these rules are normally all
+Forth you need for using Vmgen:
 
 @example
-text: stack-decl|type-prefix-decl|stack-prefix-decl
+text: stack-decl|type-prefix-decl|stack-prefix-decl|set-flag
 
-stack-decl: "stack " ident ident ident
+stack-decl: 'stack ' ident ident ident
 type-prefix-decl: 
-    's" ' string '" ' ("single"|"double") ident "type-prefix" ident
-stack-prefix-decl:  ident "stack-prefix" string
+    's" ' string '" ' ('single'|'double') ident 'type-prefix' ident
+stack-prefix-decl:  ident 'stack-prefix' string
+set-flag: ('store-optimization'|'include-skipped-insts') ('on'|'off')
 @end example
 
 Note that the syntax of this code is not checked thoroughly (there are
-many other Forth program fragments that could be written there).
+many other Forth program fragments that could be written in an
+eval-escape).
+
+A stack prefix can contain letters, digits, or @samp{:}, and may start
+with a @samp{#}; e.g., in Gforth the return stack has the stack prefix
+@samp{R:}.  This restriction is not checked during the stack prefix
+definition, but it is enforced by the parsing rules for stack items
+later.
 
 If you know Forth, the stack effects of the non-standard words involved
 are:
-
+@findex stack
+@findex type-prefix
+@findex single
+@findex double
+@findex stack-prefix
+@findex store-optimization
 @example
-stack        ( "name" "pointer" "type" -- )
-             ( name execution: -- stack )
-type-prefix  ( addr u xt1 xt2 n stack "prefix" -- )
-single       ( -- xt1 xt2 n )
-double       ( -- xt1 xt2 n )
-stack-prefix ( stack "prefix" -- )
+stack                 ( "name" "pointer" "type" -- )
+                      ( name execution: -- stack )
+type-prefix           ( addr u item-size stack "prefix" -- )
+single                ( -- item-size )
+double                ( -- item-size )
+stack-prefix          ( stack "prefix" -- )
+store-optimization    ( -- addr )
+include-skipped-insts ( -- addr )
 @end example
 
+An @var{item-size} takes three cells on the stack.
+
+@c --------------------------------------------------------------------
+@node Simple instructions, Superinstructions, Input File Grammar, Input File Format
 @section Simple instructions
+@cindex simple VM instruction
+@cindex instruction, simple VM
 
 We will use the following simple VM instruction description as example:
 
@@ -332,12 +769,18 @@ its stack effect (@code{i1 i2 -- i}).  T
 just plain C code.
 
 @cindex stack effect
+@cindex effect, stack
 The stack effect specifies that @code{sub} pulls two integers from the
-data stack and puts them in the C variable @code{i1} and @code{i2} (with
-the rightmost item (@code{i2}) taken from the top of stack) and later
-pushes one integer (@code{i)) on the data stack (the rightmost item is
-on the top afterwards).
-
+data stack and puts them in the C variables @code{i1} and @code{i2}
+(with the rightmost item (@code{i2}) taken from the top of stack;
+intuition: if you push @code{i1}, then @code{i2} on the stack, the
+resulting stack picture is @code{i1 i2}) and later pushes one integer
+(@code{i}) on the data stack (the rightmost item is on the top
+afterwards).
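+
+The following C sketch spells out this stack effect by hand.  It is not
+the code Vmgen emits; it assumes that @code{Cell} is a @code{long}, that
+the stack grows towards lower addresses (so the stack pointer points at
+the top item), and that the C code of @code{sub} computes
+@code{i = i1-i2}.  The function name @code{toy_sub} is made up for this
+example.
+
```c
/* Sketch only: hand-written, not the code Vmgen generates.
   toy_sub and the Cell typedef are assumptions for illustration. */
#include <assert.h>

typedef long Cell;      /* assumption: stands in for the stack's basic type */

/* Performs the effect of  sub ( i1 i2 -- i )  on a stack that grows
   towards lower addresses; returns the new stack pointer. */
Cell *toy_sub(Cell *sp)
{
  int i1, i2, i;

  assert(sp);           /* the caller must pass a valid stack pointer */
  i1 = (int)sp[1];      /* leftmost input: one below the top of stack */
  i2 = (int)sp[0];      /* rightmost input: the top of stack */

  i = i1 - i2;          /* the instruction's C code (assumed) */

  sp += 1;              /* two items consumed, one produced */
  sp[0] = (Cell)i;      /* the result is the new top of stack */
  return sp;
}
```
+
+Pushing 7 and then 3 and calling @code{toy_sub} leaves 4 on top of the
+stack, matching the stack picture @code{i1 i2 -- i}.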
+
+@cindex prefix, type
+@cindex type prefix
+@cindex default stack of a type prefix
 How do we know the type and stack of the stack items?  Vmgen uses
 prefixes, similar to Fortran; in contrast to Fortran, you have to
 define the prefix first:
@@ -351,6 +794,8 @@ This defines the prefix @code{i} to refe
 @code{data-stack}.  It also specifies that this type takes one stack
 item (@code{single}).  The type prefix is part of the variable name.
 
+@cindex stack definition
+@cindex defining a stack
 Before we can use @code{data-stack} in this way, we have to define it:
 
 @example
@@ -358,14 +803,19 @@ Before we can use @code{data-stack} in t
 @end example
 @c !! use something other than Cell
 
+@cindex stack basic type
+@cindex basic type of a stack
+@cindex type of a stack, basic
 This line defines the stack @code{data-stack}, which uses the stack
 pointer @code{sp}, and each item has the basic type @code{Cell}; other
 types have to fit into one or two @code{Cell}s (depending on whether the
-type is @code{single} or @code{double} wide), and are converted from and
-to Cells on accessing the @code{data-stack) with conversion macros
-(@pxref{Conversion macros}).  Stacks grow towards lower addresses in
-vmgen.
+type is @code{single} or @code{double} wide), and are cast from and to
+Cells on accessing the @code{data-stack} with type cast macros
+(@pxref{VM engine}).  By default, stacks grow towards lower addresses in
+Vmgen-erated interpreters (@pxref{Stack growth direction}).
 
+@cindex stack prefix
+@cindex prefix, stack
 We can override the default stack of a stack item by using a stack
 prefix.  E.g., consider the following instruction:
 
@@ -374,19 +824,21 @@ lit ( #i -- i )
 @end example
 
 The VM instruction @code{lit} takes the item @code{i} from the
-instruction stream (indicated by the prefix @code{#}, and pushes it on
+instruction stream (indicated by the prefix @code{#}), and pushes it on
 the (default) data stack.  The stack prefix is not part of the variable
 name.  Stack prefixes are defined like this:
 
 @example
 \E inst-stream stack-prefix #
+\E data-stack  stack-prefix S:
 @end example
 
-This definition defines that the stack prefix @code{#} to specifies the
+This definition defines that the stack prefix @code{#} specifies the
 ``stack'' @code{inst-stream}.  Since the instruction stream behaves a
 little differently than an ordinary stack, it is predefined, and you do
 not need to define it.
 
+@cindex instruction stream
 The instruction stream contains instructions and their immediate
 arguments, so specifying that an argument comes from the instruction
 stream indicates an immediate argument.  Of course, instruction stream
@@ -394,16 +846,1164 @@ arguments can only appear to the left of
 If there are multiple instruction stream arguments, the leftmost is the
 first one (just as the intuition suggests).
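+
+A minimal C sketch of how an immediate argument might be fetched for
+@code{lit}, assuming that the instruction pointer already points past
+the opcode.  The names @code{lit_sketch} and @code{lit_demo} and the
+type @code{Cell} are illustrative only and not part of Vmgen.

```c
typedef long Cell;            /* assumed basic type of both "stacks" */

/* Sketch of: lit ( #i -- i )
   ip points just past the opcode of lit, i.e. at its immediate argument. */
static void lit_sketch(Cell **ipp, Cell **spp)
{
  Cell *ip = *ipp, *sp = *spp;
  long i = (long)ip[0];       /* #i: fetched from the instruction stream */
  ip += 1;                    /* skip the immediate argument */
  sp -= 1;                    /* make room on the data stack */
  sp[0] = (Cell)i;            /* push i */
  *ipp = ip;
  *spp = sp;
}

/* Hypothetical helper: returns 1 if 42 was pushed and ip advanced by one. */
static int lit_demo(void)
{
  Cell code[] = { 42, 99 };   /* immediate of lit, then the next cell */
  Cell stack[4];
  Cell *ip = code, *sp = stack + 4;
  lit_sketch(&ip, &sp);
  return sp[0] == 42 && ip == code + 1;
}
```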
 
+@menu
+* Explicit stack access::       If the C code accesses a stack pointer
+* C Code Macros::               Macros recognized by Vmgen
+* C Code restrictions::         Vmgen makes assumptions about C code
+* Stack growth direction::      is configurable per stack
+@end menu
+
+@c --------------------------------------------------------------------
+@node  Explicit stack access, C Code Macros, Simple instructions, Simple instructions
+@subsection Explicit stack access
+@cindex stack access, explicit
+@cindex Stack pointer access
+@cindex explicit stack access
+
+Not all stack effects can be specified using the stack effect
+specifications above.  For VM instructions that have other stack
+effects, you can specify them explicitly by accessing the stack
+pointer in the C code; however, you have to notify Vmgen of such
+explicit stack accesses, otherwise Vmgen's optimizations could conflict
+with your explicit stack accesses.
+
+You notify Vmgen by putting @code{...} with the appropriate stack
+prefix into the stack comment.  Then the VM instruction will first
+take the other stack items specified in the stack effect into C
+variables, then make sure that all other stack items for that stack
+are in memory, and that the stack pointer for the stack points to the
+top-of-stack (by default, unless you change the stack access
+transformation: @pxref{Stack growth direction}).
+
+The general rule is: If you mention a stack pointer in the C code of a
+VM instruction, you should put a @code{...} for that stack in the stack
+effect.
+
+Consider this example:
+
+@example
+return ( #iadjust S:... target afp i1 -- i2 )
+SET_IP(target);
+sp = (Cell *)(((char *)sp)+iadjust);
+fp = afp;
+i2=i1;
+@end example
+
+First the variables @code{target afp i1} are popped off the stack,
+then the stack pointer @code{sp} is set correctly for the new stack
+depth, then the C code changes the stack depth and does other things,
+and finally @code{i2} is pushed on the stack with the new depth.
+
+The position of the @code{...} within the stack effect does not
+matter.  You can use several @code{...}s, for different stacks, and
+also several for the same stack (that has no additional effect).  If
+you use @code{...} without a stack prefix, this specifies all the
+stacks except the instruction stream.
+
+You cannot use @code{...} for the instruction stream, but that is not
+necessary: At the start of the C code, @code{IP} points to the start
+of the next VM instruction (i.e., right beyond the end of the current
+VM instruction), and you can change the instruction pointer with
+@code{SET_IP} (@pxref{VM engine}).
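+
+The order of events described for @code{return} can be sketched in C
+like this.  Only the data-stack handling is modelled; @code{SET_IP} and
+@code{fp} are left out, and the function names, the type @code{Cell}
+and the demo values are assumptions of the sketch.

```c
typedef long Cell;                      /* assumed basic type */

/* Sketch of the data-stack handling for
     return ( #iadjust S:... target afp i1 -- i2 )  */
static Cell *return_sketch(Cell *sp, long iadjust)
{
  Cell target, afp;
  long i1, i2;
  target = sp[2];                       /* leftmost input, deepest */
  afp    = sp[1];
  i1     = (long)sp[0];                 /* rightmost input, top of stack */
  sp += 3;                              /* sp now reflects the popped items */
  (void)target; (void)afp;              /* not used further in this sketch */
  /* user-supplied C code: drop iadjust bytes, pass the value through */
  sp = (Cell *)(((char *)sp) + iadjust);
  i2 = i1;
  /* push the output relative to the sp the C code left behind */
  sp -= 1;
  sp[0] = (Cell)i2;
  return sp;
}

/* Hypothetical helper: two extra cells are dropped by the adjustment. */
static int return_demo(void)
{
  Cell stack[8];
  Cell *sp = stack + 8;
  *--sp = 111; *--sp = 222;             /* two cells to be dropped */
  *--sp = 0;                            /* target */
  *--sp = 0;                            /* afp */
  *--sp = 5;                            /* i1 */
  sp = return_sketch(sp, 2 * (long)sizeof(Cell));
  return sp == stack + 7 && sp[0] == 5;
}
```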
+
+
+@c --------------------------------------------------------------------
+@node C Code Macros, C Code restrictions, Explicit stack access, Simple instructions
+@subsection C Code Macros
+@cindex macros recognized by Vmgen
+@cindex basic block, VM level
+
+Vmgen recognizes the following strings in the C code part of simple
+instructions:
+
+@table @code
+
+@item SET_IP
+@findex SET_IP
+As far as Vmgen is concerned, a VM instruction containing this ends a VM
+basic block (used in profiling to delimit profiled sequences).  On the C
+level, this also sets the instruction pointer.
+
+@item SUPER_END
+@findex SUPER_END
+This ends a basic block (for profiling), even if the instruction
+contains no @code{SET_IP}.
+
+@item INST_TAIL;
+@findex INST_TAIL;
+Vmgen replaces @samp{INST_TAIL;} with code for ending a VM instruction and
+dispatching the next VM instruction.  Even without an @samp{INST_TAIL;} this
+happens automatically when control reaches the end of the C code.  If
+you want to have this in the middle of the C code, you need to use
+@samp{INST_TAIL;}.  A typical example is a conditional VM branch:
+
+@example
+if (branch_condition) @{
+  SET_IP(target); INST_TAIL;
+@}
+/* implicit tail follows here */
+@end example
+
+In this example, @samp{INST_TAIL;} is not strictly necessary, because there
+is another one implicitly after the if-statement, but using it improves
+branch prediction accuracy slightly and allows other optimizations.
+
+@item SUPER_CONTINUE
+@findex SUPER_CONTINUE
+This indicates that the implicit tail at the end of the VM instruction
+dispatches the sequentially next VM instruction even if there is a
+@code{SET_IP} in the VM instruction.  This enables an optimization that
+is not yet implemented in the vmgen-ex code (but in Gforth).  The
+typical application is in conditional VM branches:
+
+@example
+if (branch_condition) @{
+  SET_IP(target); INST_TAIL; /* now this INST_TAIL is necessary */
+@}
+SUPER_CONTINUE;
+@end example
+
+@item VM_JUMP
+@findex VM_JUMP
+@code{VM_JUMP(target)} is equivalent to @code{goto *(target)}, but
+allows Vmgen to do dynamic superinstructions and replication.  You
+still need to say @code{SUPER_END}.  Also, the goto only happens at
+the end (wherever the VM_JUMP is).  Essentially, this just suppresses
+much of the ordinary dispatch mechanism.
+
+@end table
+
+Note that Vmgen is not smart about C-level tokenization, comments,
+strings, or conditional compilation, so it will interpret even a
+commented-out SUPER_END as ending a basic block (or, e.g.,
+@samp{RESET_IP;} as @samp{SET_IP;}).  Conversely, Vmgen requires the literal
+presence of these strings; Vmgen will not see them if they are hiding in
+a C preprocessor macro.
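+
+The effect of purely textual matching can be reproduced with a plain
+substring search.  The following C sketch is only an analogy (Vmgen
+itself is not implemented this way), and @code{mentions} is a
+hypothetical helper.

```c
#include <string.h>

/* Naive substring test, standing in for textual matching that knows
   nothing about C tokens, comments or strings. */
static int mentions(const char *c_code, const char *macro)
{
  return strstr(c_code, macro) != NULL;
}
```

+With this helper, @code{mentions("RESET_IP;", "SET_IP")} and
+@code{mentions("/* SUPER_END */", "SUPER_END")} are both true, which
+mirrors the two pitfalls named above.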
+
+
+@c --------------------------------------------------------------------
+@node C Code restrictions, Stack growth direction, C Code Macros, Simple instructions
+@subsection C Code restrictions
+@cindex C code restrictions
+@cindex restrictions on C code
+@cindex assumptions about C code
+
+@cindex accessing stack (pointer)
+@cindex stack pointer, access
+@cindex instruction pointer, access
+Vmgen generates code and performs some optimizations under the
+assumption that the user-supplied C code does not access the stack
+pointers or stack items, and that accesses to the instruction pointer
+only occur through special macros.  In general you should heed these
+restrictions.  However, if you need to break these restrictions, read
+the following.
+
+Accessing a stack or stack pointer directly can be a problem for several
+reasons: 
+@cindex stack caching, restriction on C code
+@cindex superinstructions, restrictions on components
+
+@itemize @bullet
+
+@item
+Vmgen optionally supports caching the top-of-stack item in a local
+variable (that is allocated to a register).  This is the most frequent
+source of trouble.  You can deal with it either by not using
+top-of-stack caching (slowdown factor 1-1.4, depending on machine), or
+by inserting flushing code (e.g., @samp{IF_spTOS(sp[...] = spTOS);}) at
+the start and reloading code (e.g., @samp{IF_spTOS(spTOS = sp[0])}) at
+the end of problematic C code.  Vmgen inserts a stack pointer update
+before the start of the user-supplied C code, so the flushing code has
+to use an index that corrects for that.  In the future, this flushing
+may be done automatically by mentioning a special string in the C code.
+@c sometimes flushing and/or reloading unnecessary
+
+@item
+The Vmgen-erated code loads the stack items from stack-pointer-indexed
+memory into variables before the user-supplied C code, and stores them
+from variables to stack-pointer-indexed memory afterwards.  If you do
+any writes to the stack through its stack pointer in your C code, it
+will not affect the variables, and your write may be overwritten by the
+stores after the C code.  Similarly, a read from a stack using a stack
+pointer will not reflect computations of stack items in the same VM
+instruction.
+
+@item
+Superinstructions keep stack items in variables across the whole
+superinstruction.  So you should not include VM instructions that
+access a stack or stack pointer as components of superinstructions
+(@pxref{VM profiler}).
+
+@end itemize
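+
+The flush-and-reload pattern from the first item can be sketched in C
+as follows.  The sketch assumes that top-of-stack caching is enabled
+and that no stack pointer update happens before the C code, so the
+flush index is simply 0; @code{USE_spTOS} and @code{tos_demo} are
+names invented for the illustration.

```c
typedef long Cell;                  /* assumed basic type */

#define USE_spTOS 1                 /* assumption: caching is enabled */
#if USE_spTOS
#define IF_spTOS(x) x
#else
#define IF_spTOS(x)
#endif

/* Returns 1 if memory was stale before the flush and the cached
   top of stack sees the direct update after the reload. */
static int tos_demo(void)
{
  Cell stack[4] = { 0, 0, 0, 0 };
  Cell *sp = stack + 3;             /* one item on the stack */
  Cell spTOS = 42;                  /* cached top of stack */
  int stale_before = (sp[0] != 42); /* memory does not hold 42 yet */

  IF_spTOS(sp[0] = spTOS);          /* flush the cache to memory */
  sp[0] = sp[0] + 1;                /* problematic C code: direct access */
  IF_spTOS(spTOS = sp[0]);          /* reload the cache */

  return stale_before && spTOS == 43;
}
```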
+
+You should access the instruction pointer only through its special
+macros (@samp{IP}, @samp{SET_IP}, @samp{IPTOS}); this ensures that these
+macros can be implemented in several ways for best performance.
+@samp{IP} points to the next instruction, and @samp{IPTOS} is its
+contents.
+
+@c --------------------------------------------------------------------
+@node Stack growth direction,  , C Code restrictions, Simple instructions
+@subsection Stack growth direction
+@cindex stack growth direction
+
+@cindex @code{stack-access-transform}
+By default, the stacks grow towards lower addresses.  You can change
+this for a stack by setting the @code{stack-access-transform} field of
+the stack to an xt @code{( itemnum -- index )} that performs the
+appropriate index transformation.
+
+E.g., if you want to let @code{data-stack} grow towards higher
+addresses, with the stack pointer always pointing just beyond the
+top-of-stack, use this right after defining @code{data-stack}:
+
+@example
+\E : sp-access-transform ( itemnum -- index ) negate 1- ;
+\E ' sp-access-transform ' data-stack >body stack-access-transform !
+@end example
+
+This means that @code{sp-access-transform} will be used to generate
+indexes for accessing @code{data-stack}.  The definition of
+@code{sp-access-transform} above transforms n into -n-1, e.g., 1 into -2.
+This will access the 0th data-stack element (top-of-stack) at sp[-1],
+the 1st at sp[-2], etc., which is the typical way upward-growing
+stacks are used.  If you need a different transform and do not know
+enough Forth to program it, let me know.
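+
+For readers more at home in C than in Forth, the same index
+transformation can be written and checked like this.  The C function
+is only a rendering of the Forth word for illustration; Vmgen itself
+expects the Forth definition, and @code{upward_demo} is a hypothetical
+helper.

```c
typedef long Cell;                          /* assumed basic type */

/* C rendering of the Forth word above: negate 1- */
static int sp_access_transform(int itemnum)
{
  return -itemnum - 1;
}

/* Upward-growing stack with sp pointing just beyond the top of stack. */
static int upward_demo(void)
{
  Cell stack[4];
  Cell *sp = stack;                         /* empty stack */
  *sp++ = 10;                               /* push 10 */
  *sp++ = 20;                               /* push 20, now top of stack */
  return sp[sp_access_transform(0)] == 20   /* item 0 at sp[-1] */
      && sp[sp_access_transform(1)] == 10;  /* item 1 at sp[-2] */
}
```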
+
+@c --------------------------------------------------------------------
+@node Superinstructions, Store Optimization, Simple instructions, Input File Format
 @section Superinstructions
+@cindex superinstructions, defining
+@cindex defining superinstructions
+
+Note: don't invest too much work in (static) superinstructions; a future
+version of Vmgen will support dynamic superinstructions (see Ian
+Piumarta and Fabio Riccardi, @cite{Optimizing Direct Threaded Code by
+Selective Inlining}, PLDI'98), and static superinstructions have much
+less benefit in that context (preliminary results indicate only a factor
+1.1 speedup).
+
+Here is an example of a superinstruction definition:
+
+@example
+lit_sub = lit sub
+@end example
+
+@code{lit_sub} is the name of the superinstruction, and @code{lit} and
+@code{sub} are its components.  This superinstruction performs the same
+action as the sequence @code{lit} and @code{sub}.  It is generated
+automatically by the VM code generation functions whenever that sequence
+occurs, so if you want to use this superinstruction, you just need to
+add this definition (and even that can be partially automated,
+@pxref{VM profiler}).
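+
+To make the equivalence concrete, here is a hand-written C sketch of
+@code{lit} followed by @code{sub} next to a fused @code{lit_sub}.  It
+is not Vmgen output; the function names and the type @code{Cell} are
+assumptions, and the stack is assumed to grow towards lower addresses.

```c
typedef long Cell;                 /* assumed basic type */

/* lit ( #i -- i ) followed by sub ( i1 i2 -- i ), as two instructions */
static long run_lit_then_sub(Cell *sp, const Cell *ip)
{
  /* lit: push the immediate argument */
  sp -= 1;
  sp[0] = ip[0];
  /* sub: pop two items, push the difference */
  {
    long i1 = sp[1], i2 = sp[0];
    sp += 1;
    sp[0] = i1 - i2;
  }
  return sp[0];
}

/* lit_sub = lit sub, fused: the literal never touches stack memory */
static long run_lit_sub(Cell *sp, const Cell *ip)
{
  long i2 = ip[0];                 /* immediate argument of lit */
  long i1 = sp[0];                 /* the item already on the stack */
  sp[0] = i1 - i2;
  return sp[0];
}

/* Hypothetical helper: both variants compute 10 - 3. */
static int lit_sub_demo(void)
{
  Cell code[] = { 3 };
  Cell a[4] = { 0, 0, 0, 10 }, b[4] = { 0, 0, 0, 10 };
  return run_lit_then_sub(a + 3, code) == 7
      && run_lit_sub(b + 3, code) == 7;
}
```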
+
+@cindex prefixes of superinstructions
+Vmgen requires that the component instructions are simple instructions
+defined before superinstructions using the components.  Currently, Vmgen
+also requires that all the subsequences at the start of a
+superinstruction (prefixes) must be defined as superinstructions before
+the superinstruction.  I.e., if you want to define a superinstruction
+
+@example
+foo4 = load add sub mul
+@end example
+
+you first have to define @code{load}, @code{add}, @code{sub} and
+@code{mul}, plus
+
+@example
+foo2 = load add
+foo3 = load add sub
+@end example
+
+Here, @code{foo3} is the longest prefix of @code{foo4}, and @code{foo2}
+is the longest prefix of @code{foo3}.
+
+Note that Vmgen assumes that only the code it generates accesses stack
+pointers, the instruction pointer, and various stack items, and it
+performs optimizations based on this assumption.  Therefore, VM
+instructions where your C code changes the instruction pointer should
+only be used as the last component; a VM instruction where your C code
+accesses a stack pointer should not be used as a component at all.  Vmgen
+does not check these restrictions; violating them just results in bugs
+in your interpreter.
+
+@cindex include-skipped-insts
+The Vmgen flag @code{include-skipped-insts} influences superinstruction
+code generation.  Currently there is no support in the peephole
+optimizer for both variations, so leave this flag alone for now.
+
+@c -------------------------------------------------------------------
+@node  Store Optimization, Register Machines, Superinstructions, Input File Format
+@section Store Optimization
+@cindex store optimization
+@cindex optimization, stack stores
+@cindex stack stores, optimization
+@cindex eliminating stack stores
+
+This minor optimization (0.6%--0.8% reduction in executed instructions
+for Gforth) puts additional requirements on the instruction descriptions
+and is therefore disabled by default.
+
+What does it do?  Consider an instruction like
+
+@example
+dup ( n -- n n )
+@end example
+
+For simplicity, also assume that we are not caching the top-of-stack in
+a register.  Now, the C code for dup first loads @code{n} from the
+stack, and then stores it twice to the stack, one time to the address
+where it came from; that time is unnecessary, but gcc does not optimize
+it away, so Vmgen can do it instead (if you turn on the store
+optimization).
+
+Vmgen uses the stack item's name to determine if the stack item contains
+the same value as it did at the start.  Therefore, if you use the store
+optimization, you have to ensure that stack items that have the same
+name on input and output also have the same value, and are not changed
+in the C code you supply.  I.e., the following code could fail if you
+turn on the store optimization:
+
+@example
+add1 ( n -- n )
+n++;
+@end example
+
+Instead, you have to use different names, i.e.:
+
+@example
+add1 ( n1 -- n2 )
+n2=n1+1;
+@end example
+
+Similarly, the store optimization assumes that the stack pointer is only
+changed by Vmgen-erated code.  If your C code changes the stack pointer,
+use different names in input and output stack items to avoid a (probably
+wrong) store optimization, or turn the store optimization off for this
+VM instruction.
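+
+The following C sketch contrasts the stores performed with and without
+the optimization, and shows why reusing a name for a changed value can
+go wrong.  It is a hand-written model, not Vmgen output; all function
+names and the type @code{Cell} are assumptions.

```c
typedef long Cell;                   /* assumed basic type */

/* dup ( n -- n n ) without the store optimization */
static Cell *dup_plain(Cell *sp)
{
  Cell n = sp[0];
  sp -= 1;
  sp[1] = n;                         /* redundant: writes n back in place */
  sp[0] = n;
  return sp;
}

/* dup ( n -- n n ) with the store optimization */
static Cell *dup_optimized(Cell *sp)
{
  Cell n = sp[0];
  sp -= 1;
  sp[0] = n;                         /* only the new top of stack is stored */
  return sp;
}

/* add1 ( n -- n ) with C code "n++;" and the store omitted because
   the input and output names match */
static Cell *add1_same_name_optimized(Cell *sp)
{
  Cell n = sp[0];
  n++;                               /* the change stays in the variable */
  (void)n;
  return sp;
}

/* add1 ( n1 -- n2 ): different names, so the store is kept */
static Cell *add1_renamed(Cell *sp)
{
  Cell n1 = sp[0], n2;
  n2 = n1 + 1;
  sp[0] = n2;
  return sp;
}

/* Hypothetical helper comparing the four variants on the value 5. */
static int store_demo(void)
{
  Cell a[4] = { 0, 0, 0, 5 }, b[4] = { 0, 0, 0, 5 };
  Cell c[4] = { 0, 0, 0, 5 }, d[4] = { 0, 0, 0, 5 };
  Cell *pa = dup_plain(a + 3), *pb = dup_optimized(b + 3);
  Cell *pc = add1_same_name_optimized(c + 3), *pd = add1_renamed(d + 3);
  return pa[0] == 5 && pa[1] == 5    /* both dups agree */
      && pb[0] == 5 && pb[1] == 5
      && pc[0] == 5                  /* increment lost: still 5 */
      && pd[0] == 6;                 /* increment stored */
}
```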
+
+To turn on the store optimization, write
+
+@example
+\E store-optimization on
+@end example
+
+at the start of the file.  You can turn this optimization on or off
+between any two VM instruction descriptions.  For turning it off again,
+you can use
+
+@example
+\E store-optimization off
+@end example
+
+@c -------------------------------------------------------------------
+@node Register Machines,  , Store Optimization, Input File Format
+@section Register Machines
+@cindex Register VM
+@cindex Superinstructions for register VMs
+@cindex tracing of register VMs
+
+If you want to implement a register VM rather than a stack VM with
+Vmgen, there are two ways to do it: Directly and through
+superinstructions.
+
+If you use the direct way, you define instructions that take the
+register numbers as immediate arguments, like this:
+
+@example
+add3 ( #src1 #src2 #dest -- )
+reg[dest] = reg[src1]+reg[src2];
+@end example
+
+A disadvantage of this method is that during tracing you only see the
+register numbers, but not the register contents.  Actually, with an
+appropriate definition of @code{printarg_src} (@pxref{VM engine}), you
+can print the values of the source registers on entry, but you cannot
+print the value of the destination register on exit.
+
+If you use superinstructions to define a register VM, you define simple
+instructions that use a stack, and then define superinstructions that
+have no overall stack effect, like this:
+
+@example
+loadreg ( #src -- n )
+n = reg[src];
+
+storereg ( n #dest -- )
+reg[dest] = n;
+
+adds ( n1 n2 -- n )
+n = n1+n2;
+
+add3 = loadreg loadreg adds storereg
+@end example
+
+An advantage of this method is that you see the values and not just the
+register numbers in tracing.  A disadvantage of this method is that
+currently you cannot generate superinstructions directly, but only
+through generating a sequence of simple instructions (we might change
+this in the future if there is demand).
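+
+As a sketch of why both approaches compute the same result, the two
+variants of @code{add3} can be modelled in C like this.  The register
+file size, the temporary stack and the function names are assumptions
+of the sketch, and immediate arguments are read in the order in which
+the components consume them.

```c
typedef long Cell;               /* assumed basic type */
static Cell reg[8];              /* assumed register file */

/* Direct way: add3 ( #src1 #src2 #dest -- ) */
static void add3_direct(const Cell *ip)
{
  Cell src1 = ip[0], src2 = ip[1], dest = ip[2];
  reg[dest] = reg[src1] + reg[src2];
}

/* Superinstruction way: add3 = loadreg loadreg adds storereg */
static void add3_components(const Cell *ip)
{
  Cell stack[4];
  Cell *sp = stack + 4;
  *--sp = reg[*ip++];            /* loadreg  ( #src -- n ) */
  *--sp = reg[*ip++];            /* loadreg  ( #src -- n ) */
  {                              /* adds     ( n1 n2 -- n ) */
    Cell n2 = sp[0], n1 = sp[1];
    sp += 1;
    sp[0] = n1 + n2;
  }
  reg[*ip++] = sp[0];            /* storereg ( n #dest -- ) */
}

/* Hypothetical helper: r3 = r1 + r2 in both styles. */
static int add3_demo(void)
{
  const Cell code[] = { 1, 2, 3 };
  long direct, composed;
  reg[1] = 4; reg[2] = 5;
  reg[3] = 0; add3_direct(code);     direct = reg[3];
  reg[3] = 0; add3_components(code); composed = reg[3];
  return direct == 9 && composed == 9;
}
```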
+
+Could the register VM support be improved, apart from the issues
+mentioned above?  It is hard to see how to do it in a general way,
+because there are a number of different designs that different people
+mean when they use the term @emph{register machine} in connection with
+VM interpreters.  However, if you have ideas or requests in that
+direction, please let me know (@pxref{Contact}).
+
+@c ********************************************************************
+@node Error messages, Using the generated code, Input File Format, Top
+@chapter Error messages
+@cindex error messages
+
+These error messages are created by Vmgen:
+
+@table @code
+
+@cindex @code{# can only be on the input side} error
+@item # can only be on the input side
+You have used an instruction-stream prefix (usually @samp{#}) after the
+@samp{--} (the output side); you can only use it before (the input
+side).
+
+@cindex @code{prefix for this combination must be defined earlier} error
+@item the prefix for this superinstruction must be defined earlier
+You have defined a superinstruction (e.g., @code{abc = a b c}) without
+defining its direct prefix (e.g., @code{ab = a b}).
+@xref{Superinstructions}.
+
+@cindex @code{sync line syntax} error
+@item sync line syntax
+If you are using a preprocessor (e.g., @command{m4}) to generate Vmgen
+input code, you may want to create @code{#line} directives (aka sync
+lines).  This error indicates that such a line is not in the syntax
+expected by Vmgen (this should not happen; please report the offending
+line in a bug report).
+
+@cindex @code{syntax error, wrong char} error
+@item syntax error, wrong char
+A syntax error.  If you do not see right away where the error is, it may
+be helpful to check the following: Did you put an empty line in a VM
+instruction where the C code is not delimited by braces (then the empty
+line ends the VM instruction)?  If you used brace-delimited C code, did
+you put the delimiting braces (and only those) at the start of the line,
+without preceding white space?  Did you forget a delimiting brace?
+
+@cindex @code{too many stacks} error
+@item too many stacks
+Vmgen currently supports 3 stacks (plus the instruction stream); if you
+need more, let us know.
+
+@cindex @code{unknown prefix} error
+@item unknown prefix
+The stack item does not match any defined type prefix (after stripping
+away any stack prefix).  You should either declare the type prefix you
+want for that stack item, or use a different type prefix.
+
+@cindex @code{unknown primitive} error
+@item unknown primitive
+You have used the name of a simple VM instruction in a superinstruction
+definition without defining the simple VM instruction first.
+
+@end table
+
+In addition, the C compiler can produce errors due to code produced by
+Vmgen; e.g., you need to define type cast functions.
+
+@c ********************************************************************
+@node Using the generated code, Hints, Error messages, Top
+@chapter Using the generated code
+@cindex generated code, usage
+@cindex Using vmgen-erated code
+
+The easiest way to create a working VM interpreter with Vmgen is
+probably to start with @file{vmgen-ex}, and modify it for your purposes.
+This chapter explains what the various wrapper and generated files do.
+It also contains reference-manual style descriptions of the macros,
+variables etc. used by the generated code, and you can skip that on
+first reading.
+
+@menu
+* VM engine::                   Executing VM code
+* VM instruction table::        
+* VM code generation::          Creating VM code (in the front-end)
+* Peephole optimization::       Creating VM superinstructions
+* VM disassembler::             for debugging the front end
+* VM profiler::                 for finding worthwhile superinstructions
+@end menu
+
+@c --------------------------------------------------------------------
+@node VM engine, VM instruction table, Using the generated code, Using the generated code
+@section VM engine
+@cindex VM instruction execution
+@cindex engine
+@cindex executing VM code
+@cindex @file{engine.c}
+@cindex @file{-vm.i} output file
+
+The VM engine is the VM interpreter that executes the VM code.  It is
+essential for an interpretive system.
+
+Vmgen supports two methods of VM instruction dispatch: @emph{threaded
+code} (fast, but gcc-specific), and @emph{switch dispatch} (slow, but
+portable across C compilers); you can use conditional compilation
+(@samp{defined(__GNUC__)}) to choose between these methods, and our
+example does so.
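+
+A much reduced C sketch of selecting the dispatch method with
+conditional compilation follows.  It is not the code from
+@file{vmgen-ex}: the three instructions, the opcode names and the
+functions are assumptions, and the GNU C branch looks labels up in a
+table at run time instead of storing label addresses in the VM code,
+which keeps the sketch short but is slower than real threaded code.

```c
typedef long Cell;                        /* assumed basic type */

enum { OP_lit, OP_sub, OP_halt };         /* hypothetical opcodes */

/* Executes VM code made of the opcodes above; returns the top of stack. */
static long run(const Cell *ip)
{
  Cell stack[16];
  Cell *sp = stack + 16;
#if defined(__GNUC__)
  /* labels as values (GNU C extension) */
  static void *labels[] = { &&I_lit, &&I_sub, &&I_halt };
# define LABEL(name) I_##name:
# define NEXT_P2 goto *labels[*ip++]
  NEXT_P2;
#else
  /* portable switch dispatch */
# define LABEL(name) case OP_##name:
# define NEXT_P2 goto next_inst
 next_inst:
  switch (*ip++) {
#endif
  LABEL(lit)  sp -= 1; sp[0] = *ip++;          NEXT_P2;
  LABEL(sub)  sp[1] = sp[1] - sp[0]; sp += 1;  NEXT_P2;
  LABEL(halt) return sp[0];
#if !defined(__GNUC__)
  }
#endif
  return 0;                               /* not reached */
}

/* Hypothetical helper: lit 7, lit 3, sub, halt. */
static long dispatch_demo(void)
{
  static const Cell code[] = { OP_lit, 7, OP_lit, 3, OP_sub, OP_halt };
  return run(code);
}
```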
+
+For both methods, the VM engine is contained in a C-level function.
+Vmgen generates most of the contents of the function for you
+(@file{@var{name}-vm.i}), but you have to define this function, and
+macros and variables used in the engine, and initialize the variables.
+In our example the engine function also includes
+@file{@var{name}-labels.i} (@pxref{VM instruction table}).
+
+@cindex tracing VM code
+@cindex superinstructions and tracing
+In addition to executing the code, the VM engine can optionally also
+print out a trace of the executed instructions, their arguments and
+results.  For superinstructions it prints the trace as if only component
+instructions were executed; this allows you to introduce new
+superinstructions while keeping the traces comparable to old ones
+(important for regression tests).
+
+It costs significant performance to check in each instruction whether to
+print tracing code, so we recommend producing two copies of the engine:
+one for fast execution, and one for tracing.  See the rules for
+@file{engine.o} and @file{engine-debug.o} in @file{vmgen-ex/Makefile}
+for an example.
+
+The following macros and variables are used in @file{@var{name}-vm.i}:
+
+@table @code
+
+@findex LABEL
+@item LABEL(@var{inst_name})
+This is used just before each VM instruction to provide a jump or
+@code{switch} label (the @samp{:} is provided by Vmgen).  For switch
+dispatch this should expand to @samp{case @var{label}:}; for
+threaded-code dispatch this should just expand to @samp{@var{label}:}.
+In either case @var{label} is usually the @var{inst_name} with some
+prefix or suffix to avoid naming conflicts.
+
+@findex LABEL2
+@item LABEL2(@var{inst_name})
+This will be used for dynamic superinstructions; at the moment, this
+should expand to nothing.
+
+@findex NAME
+@item NAME(@var{inst_name_string})
+Called on entering a VM instruction with a string containing the name of
+the VM instruction as parameter.  In normal execution this should
+expand to nothing, but for tracing this usually prints the name, and
+possibly other information (several VM registers in our example).
+
+@findex DEF_CA
+@item DEF_CA
+Usually empty.  Called just inside a new scope at the start of a VM
+instruction.  Can be used to define variables that should be visible
+during every VM instruction.  If you define this macro as non-empty, you
+have to provide the finishing @samp{;} in the macro.
+
+@findex NEXT_P0
+@findex NEXT_P1
+@findex NEXT_P2
+@item NEXT_P0 NEXT_P1 NEXT_P2
+The three parts of instruction dispatch.  They can be defined in
+different ways for best performance on various processors (see
+@file{engine.c} in the example or @file{engine/threaded.h} in Gforth).
+@samp{NEXT_P0} is invoked right at the start of the VM instruction (but
+after @samp{DEF_CA}), @samp{NEXT_P1} right after the user-supplied C
+code, and @samp{NEXT_P2} at the end.  The actual jump has to be
+performed by @samp{NEXT_P2} (if you did it earlier, important parts
+of the VM instruction would not be executed).
+
+The simplest variant is if @samp{NEXT_P2} does everything and the other
+macros do nothing.  Then related macros like @samp{IP},
+@samp{SET_IP}, @samp{INC_IP} and @samp{IPTOS} are also very
+straightforward to define.  For switch dispatch this code consists just
+of a jump to the dispatch code (@samp{goto next_inst;} in our example);
+for direct threaded code it consists of something like
+@samp{(@{cfa=*ip++; goto *cfa;@})}.
+
+Pulling code (usually the @samp{cfa=*ip++;}) up into @samp{NEXT_P1}
+usually does not cause problems, but pulling things up into
+@samp{NEXT_P0} usually requires changing the other macros (and, at least
+for Gforth on Alpha, it does not buy much, because the compiler often
+manages to schedule the relevant stuff up by itself).  An even more
+extreme variant is to pull code up even further, into, e.g., NEXT_P1 of
+the previous VM instruction (prefetching, useful on PowerPCs).
+
+@findex INC_IP
+@item INC_IP(@var{n})
+This increments @code{IP} by @var{n}.
+
+@findex SET_IP
+@item SET_IP(@var{target})
+This sets @code{IP} to @var{target}.
+
+@cindex type cast macro
+@findex vm_@var{A}2@var{B}
+@item vm_@var{A}2@var{B}(a,b)
+Type casting macro that assigns @samp{a} (of type @var{A}) to @samp{b}
+(of type @var{B}).  This is mainly used for getting stack items into
+variables and back.  So you need to define macros for every combination
+of stack basic type (@code{Cell} in our example) and type-prefix types
+used with that stack (in both directions).  For the type-prefix type,
+you use the type-prefix (not the C type string) as type name (e.g.,
+@samp{vm_Cell2i}, not @samp{vm_Cell2Cell}).  In addition, you have to
+define a vm_@var{X}2@var{X} macro for the stack's basic type @var{X}
+(used in superinstructions).
+
+@cindex instruction stream, basic type
+The stack basic type for the predefined @samp{inst-stream} is
+@samp{Cell}.  If you want a stack with the same item size, making its
+basic type @samp{Cell} usually reduces the number of macros you have to
+define.
+
+@cindex unions in type cast macros
+@cindex casts in type cast macros
+@cindex type casting between floats and integers
+Here our examples differ a lot: @file{vmgen-ex} uses casts in these
+macros, whereas @file{vmgen-ex2} uses union-field selection (or
+assignment to union fields).  Note that casting floats into integers and
+vice versa changes the bit pattern (and you do not want that).  In this
+case your options are to use a (temporary) union, or to take the address
+of the value, cast the pointer, and dereference that (not always
+possible, and sometimes expensive).
+
+@findex vm_two@var{A}2@var{B}
+@findex vm_@var{B}2two@var{A}
+@item vm_two@var{A}2@var{B}(a1,a2,b)
+@item vm_@var{B}2two@var{A}(b,a1,a2)
+Type casting between two stack items (@code{a1}, @code{a2}) and a
+variable @code{b} of a type that takes two stack items.  This does not
+occur in our small examples, but you can look at Gforth for examples
+(see @code{vm_twoCell2d} in @file{engine/forth.h}).
+
+@cindex stack pointer definition
+@cindex instruction pointer definition
+@item @var{stackpointer}
+For each stack used, the stackpointer name given in the stack
+declaration is used.  For a regular stack this must be an l-expression;
+typically it is a variable declared as a pointer to the stack's basic
+type.  For @samp{inst-stream}, the name is @samp{IP}, and it can be a
+plain r-value; typically it is a macro that abstracts away the
+differences between the various implementations of @code{NEXT_P*}.
+
+@cindex IMM_ARG
+@findex IMM_ARG
+@item IMM_ARG(access,value)
+Define this to expand to ``(access)''.  This is just a placeholder for
+future extensions.
+
+@cindex top of stack caching
+@cindex stack caching
+@cindex TOS
+@findex IPTOS
+@item @var{stackpointer}TOS
+The top-of-stack for the stack pointed to by @var{stackpointer}.  If you
+are using top-of-stack caching for that stack, this should be defined as
+a variable; if you are not using top-of-stack caching for that stack, this
+should be a macro expanding to @samp{@var{stackpointer}[0]}.  The stack
+pointer for the predefined @samp{inst-stream} is called @samp{IP}, so
+the top-of-stack is called @samp{IPTOS}.
+
+@findex IF_@var{stackpointer}TOS
+@item IF_@var{stackpointer}TOS(@var{expr})
+Macro for executing @var{expr}, if top-of-stack caching is used for the
+@var{stackpointer} stack.  I.e., this should do @var{expr} if there is
+top-of-stack caching for @var{stackpointer}; otherwise it should do
+nothing.
+
+@findex SUPER_END
+@item SUPER_END
+This is used by the VM profiler (@pxref{VM profiler}); it should not do
+anything in normal operation, and call @code{vm_count_block(IP)} for
+profiling.
+
+@findex SUPER_CONTINUE
+@item SUPER_CONTINUE
+This is just a hint to Vmgen and does nothing at the C level.
+
+@findex MAYBE_UNUSED
+@item MAYBE_UNUSED
+This should be defined as @code{__attribute__((unused))} for gcc-2.7 and
+higher.  It suppresses the warnings about unused variables in the code
+for superinstructions.  You need to define this only if you are using
+superinstructions.
+
+@findex VM_DEBUG
+@item VM_DEBUG
+If this is defined, the tracing code will be compiled in (slower
+interpretation, but better debugging).  Our example compiles two
+versions of the engine, a fast-running one that cannot trace, and one
+with potential tracing and profiling.
+
+@findex vm_debug
+@item vm_debug
+Needed only if @samp{VM_DEBUG} is defined.  If this variable contains
+true, the VM instructions produce trace output.  It can be turned on or
+off at any time.
+
+@findex vm_out
+@item vm_out
+Needed only if @samp{VM_DEBUG} is defined.  Specifies the file on which
+to print the trace output (type @samp{FILE *}).
+
+@findex printarg_@var{type}
+@item printarg_@var{type}(@var{value})
+Needed only if @samp{VM_DEBUG} is defined.  Macro or function for
+printing @var{value} in a way appropriate for the @var{type}.  This is
+used for printing the values of stack items during tracing.  @var{Type}
+is normally the type prefix specified in a @code{type-prefix} definition
+(e.g., @samp{printarg_i}); in superinstructions it is currently the
+basic type of the stack.
+
+@end table
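+
+As an illustration of the type cast macros described in the table,
+here is a C sketch for an integer prefix @code{i} and a hypothetical
+float prefix @code{f}, using a temporary union for the float case.
+The prefix @code{f}, the union type and the demo function are
+assumptions of the sketch and are not taken from @file{vmgen-ex} or
+@file{vmgen-ex2}.

```c
typedef long Cell;                       /* assumed stack basic type */

/* integer prefix i: a plain cast is enough */
#define vm_Cell2i(_cell, _i)  ((_i) = (long)(_cell))
#define vm_i2Cell(_i, _cell)  ((_cell) = (Cell)(_i))
/* basic type to itself (used in superinstructions) */
#define vm_Cell2Cell(_a, _b)  ((_b) = (_a))

/* hypothetical float prefix f: keep the bit pattern via a union */
typedef union { Cell c; float f; } CellOrFloat;
#define vm_Cell2f(_cell, _f) \
  do { CellOrFloat u_; u_.c = (_cell); (_f) = u_.f; } while (0)
#define vm_f2Cell(_f, _cell) \
  do { CellOrFloat u_; u_.c = 0; u_.f = (_f); (_cell) = u_.c; } while (0)

/* Returns 1 if the union round trip keeps 1.5 while a cast loses it. */
static int cast_demo(void)
{
  Cell cell, copy;
  float back;
  long n;
  vm_f2Cell(1.5f, cell);                 /* store the bits of 1.5 */
  vm_Cell2Cell(cell, copy);
  vm_Cell2f(copy, back);                 /* get the same bits back */
  vm_i2Cell(7L, cell);
  vm_Cell2i(cell, n);
  return back == 1.5f                    /* bit pattern preserved */
      && (float)(Cell)1.5f == 1.0f       /* a cast converts the value */
      && n == 7;
}
```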
+
+
+@c --------------------------------------------------------------------
+@node VM instruction table, VM code generation, VM engine, Using the generated code
+@section VM instruction table
+@cindex instruction table
+@cindex opcode definition
+@cindex labels for threaded code
+@cindex @code{vm_prim}, definition
+@cindex @file{-labels.i} output file
+
+For threaded code we also need to produce a table containing the labels
+of all VM instructions.  This is needed for VM code generation
+(@pxref{VM code generation}), and it has to be done in the engine
+function, because the labels are not visible outside.  It then has to be
+passed outside the function (and assigned to @samp{vm_prim}), to be used
+by the VM code generation functions.
+
+This means that the engine function has to be called first to produce
+the VM instruction table, and later, after generating VM code, it has to
+be called again to execute the generated VM code (yes, this is ugly).
+In our example program, these two modes of calling the engine function
+are differentiated by the value of the parameter @samp{ip0}: if it
+equals 0, the table is passed out by assigning it to @samp{vm_prim} and
+returning from @samp{engine}; otherwise the VM code is executed.
+
+In our example (@file{vmgen-ex/engine.c}), we also build such a table for
+switch dispatch; this is mainly done for uniformity.
+
+For switch dispatch, we also need to define the VM instruction opcodes
+used as case labels in an @code{enum}.
+
+For both purposes (VM instruction table, and enum), the file
+@file{@var{name}-labels.i} is generated by Vmgen.  You have to define
+the following macro used in this file:
+
+@table @code
+
+@findex INST_ADDR
+@item INST_ADDR(@var{inst_name})
+For switch dispatch, this is just the name of the switch label (the same
+name as used in @samp{LABEL(@var{inst_name})}), for both uses of
+@file{@var{name}-labels.i}.  For threaded-code dispatch, this is the
+address of the label defined in @samp{LABEL(@var{inst_name})}; the
+address is taken with @samp{&&} (@pxref{Labels as Values, , Labels as
+Values, gcc.info, GNU C Manual}).
+
+@end table
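+
+As a rough illustration of the two calling modes and of @code{INST_ADDR}
+for threaded code, here is a minimal self-contained sketch.  It is not
+taken from @file{vmgen-ex/engine.c}: the instructions @samp{inc} and
+@samp{end}, the @samp{I_} label prefix, the @code{NEXT} macro, and the
+hand-written table (standing in for the contents of
+@file{@var{name}-labels.i}) are assumptions made for this example, and it
+requires GNU C for @samp{&&} and @samp{goto *}.
+
+@example
+#include <stdio.h>
+
+typedef void *Inst;      /* threaded code: an Inst is a label address */
+Inst *vm_prim;           /* VM instruction table, set by engine(0) */
+
+#define LABEL(name)     I_##name:
+#define INST_ADDR(name) &&I_##name   /* GNU C: labels as values */
+#define NEXT            goto **ip++  /* dispatch the next instruction */
+
+/* engine(0) passes the table out; engine(code) executes code */
+long engine(Inst *ip0)
+@{
+  /* stand-in for #include "name-labels.i" */
+  static Inst labels[] = @{ INST_ADDR(inc), INST_ADDR(end) @};
+  Inst *ip = ip0;
+  long acc = 0;
+
+  if (ip0 == 0) @{
+    vm_prim = labels;    /* labels are not visible outside engine() */
+    return 0;
+  @}
+  NEXT;
+  LABEL(inc) acc++; NEXT;
+  LABEL(end) return acc;
+@}
+
+int main(void)
+@{
+  Inst code[3];
+
+  engine(0);                      /* first call: fetch the table */
+  code[0] = vm_prim[0];           /* inc */
+  code[1] = vm_prim[0];           /* inc */
+  code[2] = vm_prim[1];           /* end */
+  printf("%ld\n", engine(code));  /* second call: run the VM code */
+  return 0;
+@}
+@end example
+
+@noindent
+For switch dispatch, the same structure would apply, with
+@code{INST_ADDR} expanding to the name of the switch label instead of a
+label address.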
+
+
+@c --------------------------------------------------------------------
+@node VM code generation, Peephole optimization, VM instruction table, Using the generated code
+@section VM code generation
+@cindex VM code generation
+@cindex code generation, VM
+@cindex @file{-gen.i} output file
+
+Vmgen generates VM code generation functions in @file{@var{name}-gen.i}
+that the front end can call to generate VM code.  This is essential for
+an interpretive system.
+
+@findex gen_@var{inst}
+For a VM instruction @samp{x ( #a b #c -- d )}, Vmgen generates a
+function with the prototype
+
+@example
+void gen_x(Inst **ctp, a_type a, c_type c)
+@end example
+
+The @code{ctp} argument points to a pointer to the next instruction.
+@code{*ctp} is increased by the generation functions; i.e., you should
+allocate memory for the code to be generated beforehand, and start with
+@code{*ctp} set at the start of this memory area.  Before running out of
+memory, allocate a new area, and generate a VM-level jump to the new
+area (this overflow handling is not implemented in our examples).
+
+@cindex immediate arguments, VM code generation
+The other arguments correspond to the immediate arguments of the VM
+instruction (with their appropriate types as defined in the
+@code{type-prefix} declaration).
+
+The following types, variables, and functions are used in
+@file{@var{name}-gen.i}:
+
+@table @code
+
+@findex Inst
+@item Inst
+The type of the VM instruction; if you use threaded code, this is
+@code{void *}; for switch dispatch this is an integer type.
+
+@cindex @code{vm_prim}, use
+@item vm_prim
+The VM instruction table (type: @code{Inst *}, @pxref{VM instruction table}).
+
+@findex gen_inst
+@item gen_inst(Inst **ctp, Inst i)
+This function compiles the instruction @code{i}.  Take a look at it in
+@file{vmgen-ex/peephole.c}.  It is trivial when you don't want to use
+superinstructions (just the last two lines of the example function), and
+slightly more complicated in the example due to its ability to use
+superinstructions (@pxref{Peephole optimization}).
+
+@findex genarg_@var{type_prefix}
+@item genarg_@var{type_prefix}(Inst **ctp, @var{type} @var{type_prefix})
+This compiles an immediate argument of @var{type} (as defined in a
+@code{type-prefix} definition).  These functions are trivial to define
+(see @file{vmgen-ex/support.c}).  You need one of these functions for
+every type that you use as immediate argument.
+
+@end table
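+
+To make the roles of these functions more concrete, the following
+self-contained sketch shows how they could fit together.  It is an
+illustration under assumptions, not a copy of the example sources: it
+shows only the trivial @code{gen_inst} without superinstructions, the
+instruction @samp{lit ( #i -- i )} and its table index 0 are
+hypothetical, @code{gen_lit} is written by hand in the style of a
+generated function, and the immediate argument is stored with a cast
+through @code{intptr_t}.
+
+@example
+#include <stdint.h>
+#include <stdio.h>
+
+typedef void *Inst;
+
+static Inst dummy_prims[1];     /* stand-in for the real table */
+Inst *vm_prim = dummy_prims;
+
+/* trivial gen_inst: lay down i and advance *ctp */
+void gen_inst(Inst **ctp, Inst i)
+@{
+  **ctp = i;
+  (*ctp)++;
+@}
+
+/* compile an immediate argument with type prefix i */
+void genarg_i(Inst **ctp, long i)
+@{
+  **ctp = (Inst)(intptr_t)i;
+  (*ctp)++;
+@}
+
+/* hand-written stand-in for a generated function */
+void gen_lit(Inst **ctp, long i)
+@{
+  gen_inst(ctp, vm_prim[0]);
+  genarg_i(ctp, i);
+@}
+
+int main(void)
+@{
+  Inst code[16];                /* allocated beforehand */
+  Inst *cp = code;              /* *ctp starts at the area start */
+
+  gen_lit(&cp, 42);
+  printf("%d cells, immediate %ld\n",
+         (int)(cp - code), (long)(intptr_t)code[1]);
+  return 0;
+@}
+@end example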
+
+@findex BB_BOUNDARY
+In addition to using these functions to generate code, you should call
+@code{BB_BOUNDARY} at every basic block entry point if you ever want to
+use superinstructions (or if you want to use the profiling supported by
+Vmgen; but this support is also useful mainly for selecting
+superinstructions).  If you use @code{BB_BOUNDARY}, you should also
+define it (take a look at its definition in @file{vmgen-ex/mini.y}).
+
+You do not need to call @code{BB_BOUNDARY} after branches, because you
+will not define superinstructions that contain branches in the middle
+(and if you did, and it worked, there would be no reason to end the
+superinstruction at the branch), and because the branches announce
+themselves to the profiler.
+
+
+@c --------------------------------------------------------------------
+@node Peephole optimization, VM disassembler, VM code generation, Using the generated code
+@section Peephole optimization
+@cindex peephole optimization
+@cindex superinstructions, generating
+@cindex @file{peephole.c}
+@cindex @file{-peephole.i} output file
+
+You need peephole optimization only if you want to use
+superinstructions.  But having the code for it does not hurt much if you
+do not use superinstructions.
+
+A simple greedy peephole optimization algorithm is used for
+superinstruction selection: every time @code{gen_inst} compiles a VM
+instruction, it checks if it can combine it with the last VM instruction
+(which may also be a superinstruction resulting from a previous peephole
+optimization); if so, it changes the last instruction to the combined
+instruction instead of laying down @code{i} at the current @samp{*ctp}.
+
+The code for peephole optimization is in @file{vmgen-ex/peephole.c}.
+You can use this file almost verbatim.  Vmgen generates
+@file{@var{file}-peephole.i} which contains data for the peephole
+optimizer.
+
+@findex init_peeptable
+To initialize the data structures for peephole optimization, you have to
+call @samp{init_peeptable()} after initializing @samp{vm_prim} and
+before compiling any VM code.  After that, compiling with the VM
+code generation functions will automatically combine VM instructions
+into superinstructions.  Since you do not want to combine instructions
+across VM branch targets (otherwise there will not be a proper VM
+instruction to branch to), you have to call @code{BB_BOUNDARY}
+(@pxref{VM code generation}) at branch targets.
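+
+The greedy combination step and the effect of @code{BB_BOUNDARY} can be
+sketched as follows.  This is a simplified illustration, not the code in
+@file{vmgen-ex/peephole.c}: the opcodes, the @code{combine} function
+(which replaces the table built from @file{@var{file}-peephole.i}), the
+@code{last_compiled} variable, and this definition of
+@code{BB_BOUNDARY} are assumptions made for the example, and
+instructions with immediate arguments are left out.
+
+@example
+#include <stdio.h>
+
+typedef int Inst;               /* opcodes, as in switch dispatch */
+enum @{ DUP, MUL, DUP_MUL @};     /* DUP_MUL: a superinstruction */
+
+static Inst *last_compiled = NULL;
+#define BB_BOUNDARY (last_compiled = NULL)
+
+/* return the superinstruction for the sequence a b, or -1 */
+static int combine(Inst a, Inst b)
+@{
+  return (a == DUP && b == MUL) ? DUP_MUL : -1;
+@}
+
+void gen_inst(Inst **ctp, Inst i)
+@{
+  if (last_compiled != NULL) @{
+    int c = combine(*last_compiled, i);
+    if (c >= 0) @{
+      *last_compiled = c;       /* change the last instruction */
+      return;                   /* do not lay down i */
+    @}
+  @}
+  last_compiled = *ctp;
+  **ctp = i;
+  (*ctp)++;
+@}
+
+int main(void)
+@{
+  Inst code[8];
+  Inst *cp = code;
+
+  gen_inst(&cp, DUP);
+  gen_inst(&cp, MUL);           /* combined into DUP_MUL */
+  BB_BOUNDARY;
+  gen_inst(&cp, DUP);
+  BB_BOUNDARY;                  /* a branch target is here */
+  gen_inst(&cp, MUL);           /* not combined */
+  printf("%d\n", (int)(cp - code));   /* prints 3 */
+  return 0;
+@}
+@end example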
+
+
+@c --------------------------------------------------------------------
+@node VM disassembler, VM profiler, Peephole optimization, Using the generated code
+@section VM disassembler
+@cindex VM disassembler
+@cindex disassembler, VM code
+@cindex @file{disasm.c}
+@cindex @file{-disasm.i} output file
+
+A VM code disassembler is optional for an interpretive system, but
+highly recommended during its development and maintenance, because it is
+very useful for detecting bugs in the front end (and for distinguishing
+them from VM interpreter bugs).
+
+Vmgen supports VM code disassembling by generating
+@file{@var{file}-disasm.i}.  This code has to be wrapped into a
+function, as is done in @file{vmgen-ex/disasm.c}.  You can use this file
+almost verbatim.  In addition to @samp{vm_@var{A}2@var{B}(a,b)},
+@samp{vm_out}, @samp{printarg_@var{type}(@var{value})}, which are
+explained above, the following macros and variables are used in
+@file{@var{file}-disasm.i} (and you have to define them):
+
+@table @code
+
+@item ip
+This variable points to the opcode of the current VM instruction.
+
+@cindex @code{IP}, @code{IPTOS} in disassembler
+@item IP IPTOS
+@samp{IPTOS} is the first argument of the current VM instruction, and
+@samp{IP} points to it; this is just as in the engine, but here
+@samp{ip} points to the opcode of the VM instruction (in contrast to the
+engine, where @samp{ip} points to the next cell, or even one further).
+
+@findex VM_IS_INST
+@item VM_IS_INST(Inst i, int n)
+Tests if the opcode @samp{i} is the same as the @samp{n}th entry in the
+VM instruction table.
+
+@end table
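+
+One plausible set of definitions, consistent with the descriptions
+above, is sketched below together with a hand-written loop in place of
+the generated @file{@var{file}-disasm.i}.  The opcode values, the
+instructions @samp{lit} and @samp{add}, and the function
+@code{disassemble} are hypothetical; see @file{vmgen-ex/disasm.c} for the
+real wrapper.
+
+@example
+#include <stdio.h>
+
+typedef int Inst;
+
+static Inst prims[] = @{ 100, 101 @};   /* 0: lit, 1: add */
+Inst *vm_prim = prims;
+
+#define VM_IS_INST(inst, n) ((inst) == vm_prim[n])
+#define IP    (ip + 1)        /* points to the first argument */
+#define IPTOS (*IP)           /* the first argument itself */
+
+void disassemble(Inst *ip, Inst *endp, FILE *vm_out)
+@{
+  while (ip < endp) @{
+    if (VM_IS_INST(*ip, 0)) @{
+      fprintf(vm_out, "lit %d\n", (int)IPTOS);
+      ip += 2;                /* opcode and one immediate argument */
+    @} else if (VM_IS_INST(*ip, 1)) @{
+      fputs("add\n", vm_out);
+      ip += 1;
+    @} else @{
+      fprintf(vm_out, "unknown opcode %d\n", (int)*ip);
+      ip += 1;
+    @}
+  @}
+@}
+
+int main(void)
+@{
+  Inst code[] = @{ 100, 7, 100, 5, 101 @};   /* lit 7 lit 5 add */
+
+  disassemble(code, code + 5, stdout);
+  return 0;
+@}
+@end example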
+
+
+@c --------------------------------------------------------------------
+@node VM profiler,  , VM disassembler, Using the generated code
+@section VM profiler
+@cindex VM profiler
+@cindex profiling for selecting superinstructions
+@cindex superinstructions and profiling
+@cindex @file{profile.c}
+@cindex @file{-profile.i} output file
+
+The VM profiler is designed for getting execution and occurrence counts
+for VM instruction sequences, and these counts can then be used for
+selecting sequences as superinstructions.  The VM profiler is probably
+not useful as a profiling tool for the interpretive system; i.e., the VM
+profiler is useful for the developers, but not for the users, of the
+interpretive system.
+
+For each basic block that was executed at least once, the profiler
+outputs the dynamic execution count of that basic block and of all its
+subsequences; e.g.,
+
+@example
+       9227465  lit storelocal 
+       9227465  storelocal branch 
+       9227465  lit storelocal branch 
+@end example
+
+I.e., a basic block consisting of @samp{lit storelocal branch} is
+executed 9227465 times.
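+
+The enumeration of subsequences can be sketched in a few lines of C.
+This is only an illustration of the idea, not the code in
+@file{vmgen-ex/profile.c}; the function name @code{print_subseqs} is
+hypothetical, and the sketch assumes (as in the output shown above) that
+only sequences of at least two instructions are of interest.
+
+@example
+#include <stdio.h>
+
+/* print count and names for every subsequence of length >= 2 */
+static void print_subseqs(FILE *f, long count,
+                          const char *const *names, int n)
+@{
+  for (int len = 2; len <= n; len++)
+    for (int start = 0; start + len <= n; start++) @{
+      fprintf(f, "%14ld  ", count);
+      for (int k = 0; k < len; k++)
+        fprintf(f, "%s ", names[start + k]);
+      fputc('\n', f);
+    @}
+@}
+
+int main(void)
+@{
+  static const char *const block[] =
+    @{ "lit", "storelocal", "branch" @};
+
+  print_subseqs(stdout, 9227465L, block, 3);
+  return 0;
+@}
+@end example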
+
+@cindex @file{stat.awk}
+@cindex @file{seq2rule.awk}
+This output can be combined in various ways.  E.g.,
+@file{vmgen-ex/stat.awk} adds up the occurrences of a given sequence with
+respect to dynamic execution, static occurrence, and per-program
+occurrence.  E.g.,
+
+@example
+      2      16        36910041 loadlocal lit 
+@end example
+
+@noindent
+indicates that the sequence @samp{loadlocal lit} occurs in 2 programs,
+in 16 places, and has been executed 36910041 times.  Now you can select
+superinstructions in any way you like (note that compile time and space
+typically limit the number of superinstructions to 100--1000).  After
+you have done that, @file{vmgen-ex/seq2rule.awk} turns lines of the form
+above into rules for inclusion in a Vmgen input file.  Note that this
+script does not ensure that all prefixes are defined, so you have to do
+that in other ways.  So, an overall script for turning profiles into
+superinstructions can look like this:
+
+@example
+awk -f stat.awk fib.prof test.prof|
+awk '$3>=10000'|                #select sequences
+fgrep -v -f peephole-blacklist| #eliminate wrong instructions
+awk -f seq2rule.awk|            #turn into superinstructions
+sort -k 3 >mini-super.vmg       #sort sequences
+@end example
+
+Here the dynamic count is used for selecting sequences (preliminary
+results indicate that the static count gives better results, though);
+the third line eliminates sequences containing instructions that must not
+occur in a superinstruction, because they access a stack directly.  The
+dynamic count selection ensures that all subsequences (including
+prefixes) of longer sequences occur (because subsequences have at least
+the same count as the longer sequences); the sort in the last line
+ensures that longer superinstructions occur after their prefixes.
+
+But before using this, you have to have the profiler.  Vmgen supports its
+creation by generating @file{@var{file}-profile.i}; you also need the
+wrapper file @file{vmgen-ex/profile.c} that you can use almost verbatim.
+
+@cindex @code{SUPER_END} in profiling
+@cindex @code{BB_BOUNDARY} in profiling
+The profiler works by recording the targets of all VM control flow
+changes (through @code{SUPER_END} during execution, and through
+@code{BB_BOUNDARY} in the front end), and counting (through
+@code{SUPER_END}) how often they were targeted.  After the program run,
+the numbers are corrected such that each VM basic block has the correct
+count (entering a block without executing a branch does not increase the
+count, and the correction fixes that), then the subsequences of all
+basic blocks are printed.  To get all this, you just have to define
+@code{SUPER_END} (and @code{BB_BOUNDARY}) appropriately, and call
+@code{vm_print_profile(FILE *file)} when you want to output the profile
+on @code{file}.
+
+@cindex @code{VM_IS_INST} in profiling
+The @file{@var{file}-profile.i} is similar to the disassembler file, and
+it uses variables and functions defined in @file{vmgen-ex/profile.c},
+plus @code{VM_IS_INST} already defined for the VM disassembler
+(@pxref{VM disassembler}).
+
+@c **********************************************************
+@node Hints, The future, Using the generated code, Top
+@chapter Hints
+@cindex hints
+
+@menu
+* Floating point::              and stacks
+@end menu
+
+@c --------------------------------------------------------------------
+@node Floating point,  , Hints, Hints
+@section Floating point
+
+How should you deal with floating point values?  Should you use the same
+stack as for integers/pointers, or a different one?  This section
+discusses this issue with a view to execution speed.
+
+The simpler approach is to use a separate floating-point stack.  This
+allows you to choose FP value size without considering the size of the
+integers/pointers, and you avoid a number of performance problems.  The
+main downside is that this needs an FP stack pointer (and that may not
+fit in the register file on the 386 architecture, costing some
+performance, but comparatively little if you take the other option into
+account).  If you use a separate FP stack (with stack pointer @code{fp}),
+using an @code{fpTOS} is helpful on most machines, but some spill the
+@code{fpTOS} register into memory, and @code{fpTOS} should not be used there.
+
+The other approach is to share one stack (pointed to by, say, @code{sp})
+between integer/pointer and floating-point values.  This is ok if you do
+not use @code{spTOS}.  If you do use @code{spTOS}, the compiler has to
+decide whether to put that variable into an integer or a floating point
+register, and the other type of operation becomes quite expensive on
+most machines (because moving values between integer and FP registers is
+quite expensive).  If a value of one type has to be synthesized out of
+two values of the other type (@code{double} types), things are even more
+interesting.
+
+One way around this problem would be to not use the @code{spTOS}
+supported by Vmgen, but to use explicit top-of-stack variables (one for
+integers, one for FP values), and to have a kind of accumulator+stack
+architecture (e.g., OCaml bytecode uses this approach); however, this is
+a major change, and its ramifications are not completely clear.
+
+@c **********************************************************
+@node The future, Changes, Hints, Top
+@chapter The future
+@cindex future ideas
+
+We have a number of ideas for future versions of Vmgen.  However, there
+are so many possible things to do that we would like some feedback from
+you.  What are you doing with Vmgen, what features are you missing, and
+why?
+
+One idea we are thinking about is to generate just one @file{.c} file
+instead of letting you copy and adapt all the wrapper files (you would
+still have to define stuff like the type-specific macros, and stack
+pointers etc. somewhere).  The advantage would be that, if we change the
+wrapper files between versions, you would not need to integrate your
+changes and our changes to them; Vmgen would also be easier to use for
+beginners.  The main disadvantage of that is that it would reduce the
+flexibility of Vmgen a little (well, those who like flexibility could
+still patch the resulting @file{.c} file, like they are now doing for
+the wrapper files).  In any case, if you are doing things to the wrapper
+files that would cause problems in a generated-@file{.c}-file approach,
+please let us know.
+
+@c **********************************************************
+@node Changes, Contact, The future, Top
+@chapter Changes
+@cindex Changes from old versions
+
+User-visible changes between 0.5.9-20020822 and 0.5.9-20020901:
+
+The store optimization is now disabled by default, but can be enabled by
+the user (@pxref{Store Optimization}).  Documentation for this
+optimization is also new.
+
+User-visible changes between 0.5.9-20010501 and 0.5.9-20020822:
+
+There is now a manual (in info, HTML, PostScript, or plain text format).
+
+There is the vmgen-ex2 variant of the vmgen-ex example; the new
+variant uses a union type instead of lots of casting.
+
+Both variants of the example can now be compiled with an ANSI C compiler
+(using switch dispatch and losing quite a bit of performance); tested
+with @command{lcc}.
+
+Users of the gforth-0.5.9-20010501 version of Vmgen need to change
+several things in their source code to use the current version.  I
+recommend keeping the gforth-0.5.9-20010501 version until you have
+completed the change (note that you can have several versions of Gforth
+installed at the same time).  I hope to avoid such incompatible changes
+in the future.
+
+The required changes are:
+
+@table @code
+
+@cindex @code{TAIL;}, changes
+@item TAIL;
+has been renamed to @code{INST_TAIL;} (less chance of an accidental
+match).
+
+@cindex @code{vm_@var{A}2@var{B}}, changes
+@item vm_@var{A}2@var{B}
+now takes two arguments.
+
+@cindex @code{vm_two@var{A}2@var{B}}, changes
+@item vm_two@var{A}2@var{B}(b,a1,a2);
+changed to @code{vm_two@var{A}2@var{B}(a1,a2,b)} (note the absence of the @samp{;}).
+
+@end table
+
+Also some new macros have to be defined, e.g., @code{INST_ADDR}, and
+@code{LABEL}; some macros have to be defined in new contexts, e.g.,
+@code{VM_IS_INST} is now also needed in the disassembler.
+
+@c *********************************************************
+@node Contact, Copying This Manual, Changes, Top
+@chapter Contact
+
+To report a bug, use
+@url{https://savannah.gnu.org/bugs/?func=addbug&group_id=2672}.
+
+For discussion on Vmgen (e.g., how to use it), use the mailing list
+@email{bug-vmgen@@mail.freesoftware.fsf.org} (use
+@url{http://mail.gnu.org/mailman/listinfo/help-vmgen} to subscribe).
+
+You can find vmgen information at
+@url{http://www.complang.tuwien.ac.at/anton/vmgen/}.
 
-@section Stacks, types, and prefixes
+@c ***********************************************************
+@node Copying This Manual, Index, Contact, Top
+@appendix Copying This Manual
 
+@menu
+* GNU Free Documentation License::  License for copying this manual.
+@end menu
 
+@node GNU Free Documentation License,  , Copying This Manual, Copying This Manual
+@appendixsec GNU Free Documentation License
+@include fdl.texi
 
-Invocation
 
-Input Syntax
+@node Index,  , Copying This Manual, Top
+@unnumbered Index
 
-Concepts: Front end, VM, Stacks,  Types, input stream
+@printindex cp
 
-Contact
+@bye