--- gforth/doc/vmgen.texi	2002/08/01 21:14:25	1.5
+++ gforth/doc/vmgen.texi	2002/08/09 09:42:35	1.9
@@ -29,8 +29,8 @@ machine code.
 @end itemize
 
 Such a division is usually used in interpreters, for modularity as well
-as for efficiency reasons.  The virtual machine code is typically passed
-between front end and virtual machine interpreter in memory, like in a
+as for efficiency.  The virtual machine code is typically passed between
+front end and virtual machine interpreter in memory, like in a
 load-and-go compiler; this avoids the complexity and time cost of
 writing the code to a file and reading it again.
 
@@ -129,7 +129,6 @@ Vmgen makes it even easier to implement
 techniques for building efficient interpreters.
 
 @c ********************************************************************
-
 @chapter Concepts
 
 @c --------------------------------------------------------------------
@@ -203,6 +202,38 @@ harder, but might be possible (contact u
 @c reference counting might be possible by including counting code in 
 @c the conversion macros.
 
+@section Dispatch
+
+Understanding this section is probably not necessary for using vmgen,
+but it may help.  You may want to skip it for now and return to it if you find statements about dispatch methods confusing.
+
+After executing one VM instruction, the VM interpreter has to dispatch
+the next VM instruction (vmgen calls the dispatch routine @samp{NEXT}).
+Vmgen supports two methods of dispatch:
+
+@table @emph
+
+@item switch dispatch
+In this method the VM interpreter contains a giant @code{switch}
+statement, with one @code{case} for each VM instruction.  The VM
+instructions are represented by integers (e.g., produced by an
+@code{enum}) in the VM code, and dispatch occurs by loading the next
+integer from the VM code, @code{switch}ing on it, and continuing at the
+appropriate @code{case}; after executing the VM instruction, control
+jumps back to the dispatch code.
+
+@item threaded code
+This method represents a VM instruction in the VM code by the address of
+the start of the machine code fragment for executing the VM instruction.
+Dispatch consists of loading this address, incrementing the VM instruction
+pointer, and jumping to the loaded address.  Typically the threaded-code
+dispatch code is appended directly to the code for executing the VM
+instruction.  Threaded code cannot be implemented in ANSI C, but it can
+be implemented using GNU C's labels-as-values extension (@pxref{labels
+as values}).
+
+@end table
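As a rough illustration of the two dispatch methods (this is not code generated by vmgen), the following C sketch runs the same tiny VM program with each method. The opcode names, the functions `run_switch` and `run_threaded`, and the `Cell` union are assumptions made for this sketch. To stay self-contained, the threaded variant translates integer opcodes into label addresses inside the function, which is a simplification and not necessarily how a vmgen-based system builds its VM code.

```c
#include <stddef.h>

enum { OP_LIT, OP_ADD, OP_HALT };   /* hypothetical opcodes for this sketch */

/* Switch dispatch: the VM code is an array of integers. */
long run_switch(const long *ip)
{
    long stack[16], *sp = stack;

    for (;;) {
        switch (*ip++) {                          /* load next opcode, dispatch */
        case OP_LIT:  *sp++ = *ip++;         break; /* push immediate operand */
        case OP_ADD:  sp--; sp[-1] += sp[0]; break; /* add the top two items */
        case OP_HALT: return sp[-1];              /* result is top of stack */
        default:      return -1;                  /* unknown opcode */
        }
    }
}

#ifdef __GNUC__
/* Threaded code: each instruction is the address of its code fragment. */
typedef union { void *inst; long imm; } Cell;

long run_threaded(const long *prog, size_t n)
{
    static void *labels[] = { &&lit, &&add, &&halt }; /* labels as values */
    Cell code[32], *ip = code;
    long stack[16], *sp = stack;
    size_t i;

    if (n > 32)
        return -1;
    /* Translate integer opcodes into label addresses (sketch only). */
    for (i = 0; i < n; i++) {
        if (prog[i] < OP_LIT || prog[i] > OP_HALT)
            return -1;
        code[i].inst = labels[prog[i]];
        if (prog[i] == OP_LIT && i + 1 < n) {     /* copy the immediate operand */
            i++;
            code[i].imm = prog[i];
        }
    }

/* Dispatch: load the address, increment ip, jump. */
#define NEXT goto *(ip++)->inst
    NEXT;
lit:  *sp++ = (ip++)->imm;   NEXT;   /* dispatch code appended to each fragment */
add:  sp--; sp[-1] += sp[0]; NEXT;
halt: return sp[-1];
#undef NEXT
}
#endif
```

Note that the switch variant returns to a single dispatch point after every instruction, while the threaded variant repeats the dispatch code at the end of each fragment.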
+
 @c *************************************************************
 @chapter Invoking vmgen
 
@@ -627,6 +658,12 @@ contents.
 
 @section Superinstructions
 
+Note: don't invest too much work in (static) superinstructions; a future
+version of vmgen will support dynamic superinstructions (see Ian
+Piumarta and Fabio Riccardi, @cite{Optimizing Direct Threaded Code by
+Selective Inlining}, PLDI'98), and static superinstructions have much
+less benefit in that context.
+
 Here is an example of a superinstruction definition:
 
 @example
@@ -681,14 +718,26 @@ purposes.  This chapter is just the refe
 etc. used by the generated code, and the other context expected by the
 generated code, and what you can do with the various generated files.
 
+
 @section VM engine
 
 The VM engine is the VM interpreter that executes the VM code.  It is
 essential for an interpretive system.
 
-The main file generated for the VM interpreter is
-@file{@var{name}-vm.i}.  It uses the following macros and variables (and
-you have to define them):
+Vmgen supports two methods of VM instruction dispatch: @emph{threaded
+code} (fast, but gcc-specific), and @emph{switch dispatch} (slow, but
+portable across C compilers); you can use conditional compilation
+(@samp{defined(__GNUC__)}) to choose between these methods, and our
+example does so.
+
+For both methods, the VM engine is contained in a C-level function.
+Vmgen generates most of the contents of the function for you
+(@file{@var{name}-vm.i}), but you have to define this function, and
+macros and variables used in the engine, and initialize the variables.
+In our example the engine function also includes
+@file{@var{name}-labels.i} (@pxref{VM instruction table}).
+
+The following macros and variables are used in @file{@var{name}-vm.i}:
 
 @table @code
 
@@ -700,6 +749,10 @@ threaded-code dispatch this should just
 @var{label}}.  In either case @var{label} is usually the @var{inst_name}
 with some prefix or suffix to avoid naming conflicts.
 
+@item LABEL2(@var{inst_name})
+This will be used for dynamic superinstructions; at the moment, this
+should expand to nothing.
+
 @item NAME(@var{inst_name_string})
 Called on entering a VM instruction with a string containing the name of
 the VM instruction as parameter.  In normal execution this should be a
@@ -738,7 +791,10 @@ extreme variant is to pull code up even
 the previous VM instruction (prefetching, useful on PowerPCs).
 
 @item INC_IP(@var{n})
-This increments IP by @var{n}.
+This increments @code{IP} by @var{n}.
+
+@item SET_IP(@var{target})
+This sets @code{IP} to @var{target}.
 
 @item vm_@var{A}2@var{B}(a,b)
 Type casting macro that assigns @samp{a} (of type @var{A}) to @samp{b}
@@ -788,6 +844,14 @@ Macro for executing @var{expr}, if top-o
 top-of-stack caching for @var{stackpointer}; otherwise it should do
 nothing.
 
+@item SUPER_END
+This is used by the VM profiler (@pxref{VM profiler}); it should not do
+anything in normal operation, and call @code{vm_count_block(IP)} for
+profiling.
+
+@item SUPER_CONTINUE
+This is just a hint to vmgen and does nothing at the C level.
+
 @item VM_DEBUG
 If this is defined, the tracing code will be compiled in (slower
 interpretation, but better debugging).  Our example compiles two
@@ -813,37 +877,274 @@ basic type of the stack.
 
 @end table
 
-The file @file{@var{name}-labels.i} is used for enumerating or listing
-all virtual machine instructions and uses the following macro:
+
+@section VM instruction table
+
+For threaded code we also need to produce a table containing the labels
+of all VM instructions.  This is needed for VM code generation
+(@pxref{VM code generation}), and it has to be done in the engine
+function, because the labels are not visible outside.  It then has to be
+passed outside the function (and assigned to @samp{vm_prim}), to be used
+by the VM code generation functions.
+
+This means that the engine function has to be called first to produce
+the VM instruction table, and later, after generating VM code, it has to
+be called again to execute the generated VM code (yes, this is ugly).
+In our example program, these two modes of calling the engine function
+are differentiated by the value of the parameter @code{ip0}: if it
+equals 0, the engine passes the table out by assigning it to
+@samp{vm_prim} and returning from @samp{engine}; otherwise it executes
+the VM code.
+
+In our example, we also build such a table for switch dispatch; this is
+mainly done for uniformity.
+
+For switch dispatch, we also need to define the VM instruction opcodes
+used as case labels in an @code{enum}.
+
+For both purposes (VM instruction table, and enum), the file
+@file{@var{name}-labels.i} is generated by vmgen.  You have to define
+the following macro used in this file:
 
 @table @samp
 
 @item INST_ADDR(@var{inst_name})
 For switch dispatch, this is just the name of the switch label (the same
-name as used in @samp{LABEL(@var{inst_name})}).  For threaded-code
-dispatch, this is the address of the label defined in
-@samp{LABEL(@var{inst_name})}); the address is taken with @samp{&&}
-(@pxref{labels-as-values}).
+name as used in @samp{LABEL(@var{inst_name})}), for both uses of
+@file{@var{name}-labels.i}.  For threaded-code dispatch, this is the
+address of the label defined in @samp{LABEL(@var{inst_name})}; the
+address is taken with @samp{&&} (@pxref{labels-as-values}).
 
 @end table
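The two-phase use of the engine function can be sketched as follows. This is a minimal hand-written illustration, not the vmgen example code: the instruction set (`lit`, `add`, `halt`), the `I_` prefix, the `NEXT` definitions, and the hand-written list that stands in for the generated labels file are all assumptions of this sketch. Only the names `engine`, `ip0`, `vm_prim`, `LABEL`, and `INST_ADDR` come from the text above.

```c
#include <stddef.h>
#include <stdint.h>

#ifdef __GNUC__                        /* threaded code */
typedef void *Inst;
#define LABEL(name)     I_##name:
#define INST_ADDR(name) &&I_##name     /* address of the label */
#define NEXT            goto **ip++
#else                                  /* switch dispatch */
typedef long Inst;
#define LABEL(name)     case I_##name:
#define INST_ADDR(name) I_##name       /* name of the case label */
#define NEXT            goto dispatch
/* Second use of the same list: define the opcodes. */
enum { INST_ADDR(lit), INST_ADDR(add), INST_ADDR(halt) };
#endif

Inst *vm_prim;                         /* VM instruction table */

long engine(Inst *ip0)
{
    /* Stands in for including the generated labels file (assumption). */
    static Inst labels[] = { INST_ADDR(lit), INST_ADDR(add), INST_ADDR(halt) };
    Inst *ip = ip0;
    long stack[16], *sp = stack;

    if (ip0 == NULL) {                 /* first call: pass the table out */
        vm_prim = labels;
        return 0;
    }
#ifdef __GNUC__
    NEXT;                              /* second call: start executing */
#else
dispatch:
    switch (*ip++) {
#endif
    LABEL(lit)  *sp++ = (long)(intptr_t)*ip++; NEXT;
    LABEL(add)  sp--; sp[-1] += sp[0];         NEXT;
    LABEL(halt) return sp[-1];
#ifndef __GNUC__
    }
    return -1;                         /* unknown opcode */
#endif
}
```

A caller would first run `engine(NULL)` to obtain `vm_prim`, then build VM code from entries of that table, and finally call `engine` again with a pointer to the generated code.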
 
 
+@section VM code generation
+
+Vmgen generates VM code generation functions in @file{@var{name}-gen.i}
+that the front end can call to generate VM code.  This is essential for
+an interpretive system.
+
+For a VM instruction @samp{x ( #a b #c -- d )}, vmgen generates a
+function with the prototype
+
+@example
+void gen_x(Inst **ctp, a_type a, c_type c)
+@end example
+
+The @code{ctp} argument points to a pointer to the next instruction.
+@code{*ctp} is increased by the generation functions; i.e., you should
+allocate memory for the code to be generated beforehand, and start with
+@code{*ctp} set to the start of this memory area.  Before running out of
+memory, allocate a new area, and generate a VM-level jump to the new
+area (this is not implemented in our examples).
 
-@section Stacks, types, and prefixes
+The other arguments correspond to the immediate arguments of the VM
+instruction (with their appropriate types as defined in the
+@code{type-prefix} declaration).
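To make the role of `ctp` concrete, here is a minimal sketch of what the generation functions might look like for switch dispatch without superinstructions. The instruction numbers, the `Cell` type, the `vm_prim_table` array, and the bodies of `gen_lit` and `gen_add` are assumptions for illustration; the code vmgen actually generates may differ.

```c
#include <stddef.h>

typedef long Inst;                  /* switch dispatch: integer opcodes (assumption) */
typedef long Cell;                  /* type of an immediate argument (assumption) */

enum { I_LIT, I_ADD };              /* hypothetical instruction numbers */
static Inst vm_prim_table[] = { I_LIT, I_ADD };
Inst *vm_prim = vm_prim_table;      /* VM instruction table */

/* Lay down one instruction at *ctp and advance *ctp. */
void gen_inst(Inst **ctp, Inst i)
{
    **ctp = i;
    (*ctp)++;
}

/* Lay down an immediate argument of type Cell and advance *ctp. */
void genarg_i(Inst **ctp, Cell i)
{
    *((Cell *)*ctp) = i;
    (*ctp)++;
}

/* Roughly what a generated function for "lit ( #i -- i )" could look like. */
void gen_lit(Inst **ctp, Cell i)
{
    gen_inst(ctp, vm_prim[I_LIT]);
    genarg_i(ctp, i);
}

/* An instruction without immediate arguments takes only ctp. */
void gen_add(Inst **ctp)
{
    gen_inst(ctp, vm_prim[I_ADD]);
}
```

The front end would allocate a code area, point a variable at its start, and pass the address of that variable to each generation call, so that the variable always points to the next free cell.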
 
+The following types, variables, and functions are used in
+@file{@var{name}-gen.i}:
 
+@table @samp
+
+@item Inst
+The type of the VM instruction; if you use threaded code, this is
+@code{void *}; for switch dispatch this is an integer type.
+
+@item vm_prim
+The VM instruction table (type: @code{Inst *}, @pxref{VM instruction table}).
+
+@item gen_inst(Inst **ctp, Inst i)
+This function compiles the instruction @code{i}.  Take a look at it in
+@file{vmgen-ex/peephole.c}.  It is trivial when you don't want to use
+superinstructions (just the last two lines of the example function), and
+slightly more complicated in the example due to its ability to use
+superinstructions (@pxref{Peephole optimization}).
+
+@item genarg_@var{type_prefix}(Inst **ctp, @var{type} @var{type_prefix})
+This compiles an immediate argument of @var{type} (as defined in a
+@code{type-prefix} definition).  These functions are trivial to define
+(see @file{vmgen-ex/support.c}).  You need one of these functions for
+every type that you use as immediate argument.
+
+@end table
+
+In addition to using these functions to generate code, you should call
+@code{BB_BOUNDARY} at every basic block entry point if you ever want to
+use superinstructions (or if you want to use the profiling supported by
+vmgen; however, this is mainly useful for selecting superinstructions).
+If you use @code{BB_BOUNDARY}, you should also define it (take a look at
+its definition in @file{vmgen-ex/mini.y}).
+
+You do not need to call @code{BB_BOUNDARY} after branches, because you
+will not define superinstructions that contain branches in the middle
+(and if you did, and it worked, there would be no reason to end the
+superinstruction at the branch), and because the branches announce
+themselves to the profiler.
+
+
+@section Peephole optimization
+
+You need peephole optimization only if you want to use
+superinstructions.  But having the code for it does not hurt much if you
+do not use superinstructions.
+
+A simple greedy peephole optimization algorithm is used for
+superinstruction selection: every time @code{gen_inst} compiles a VM
+instruction, it checks whether it can combine it with the last VM instruction
+(which may also be a superinstruction resulting from a previous peephole
+optimization); if so, it changes the last instruction to the combined
+instruction instead of laying down @code{i} at the current @samp{*ctp}.
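The greedy algorithm described above can be sketched in a few lines of C. This is an illustration under stated assumptions, not the code from the example directory: the instruction numbers, the `combinations` table with its linear search, the `combine` helper, and this particular definition of `BB_BOUNDARY` are all made up for the sketch.

```c
#include <stddef.h>

typedef long Inst;

/* Hypothetical instructions, including two superinstructions. */
enum { I_LIT, I_ADD, I_STORE, I_LIT_ADD, I_LIT_ADD_STORE };

/* Each rule combines a (possibly already combined) prefix with one
   further instruction. */
typedef struct { Inst first, second, combined; } Combination;
static const Combination combinations[] = {
    { I_LIT,     I_ADD,   I_LIT_ADD },
    { I_LIT_ADD, I_STORE, I_LIT_ADD_STORE },
};

/* Last instruction laid down in the current basic block, or NULL. */
static Inst *last_compiled = NULL;

/* At a basic block boundary, forget the last instruction so that
   nothing is combined across a branch target. */
#define BB_BOUNDARY (last_compiled = NULL)

/* Look up a combination; returns 1 and sets *out if one exists. */
static int combine(Inst first, Inst second, Inst *out)
{
    size_t k;
    for (k = 0; k < sizeof combinations / sizeof combinations[0]; k++) {
        if (combinations[k].first == first && combinations[k].second == second) {
            *out = combinations[k].combined;
            return 1;
        }
    }
    return 0;
}

void gen_inst(Inst **ctp, Inst i)
{
    Inst combined;

    if (last_compiled != NULL && combine(*last_compiled, i, &combined)) {
        *last_compiled = combined;   /* rewrite in place, lay down nothing */
        return;
    }
    last_compiled = *ctp;            /* remember where i is laid down */
    **ctp = i;
    (*ctp)++;
}
```

Because a combined instruction can itself be the first half of a longer combination, a sequence such as `lit add store` collapses step by step into a single superinstruction, provided every prefix has a rule.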
+
+The code for peephole optimization is in @file{vmgen-ex/peephole.c}.
+You can use this file almost verbatim.  Vmgen generates
+@file{@var{file}-peephole.i} which contains data for the peephole
+optimizer.
+
+You have to call @samp{init_peeptable()} after initializing
+@samp{vm_prim}, and before compiling any VM code to initialize data
+structures for peephole optimization.  After that, compiling with the VM
+code generation functions will automatically combine VM instructions
+into superinstructions.  Since you do not want to combine instructions
+across VM branch targets (otherwise there will not be a proper VM
+instruction to branch to), you have to call @code{BB_BOUNDARY}
+(@pxref{VM code generation}) at branch targets.
+
+
+@section VM disassembler
+
+A VM code disassembler is optional for an interpretive system, but
+highly recommended during its development and maintenance, because it is
+very useful for detecting bugs in the front end (and for distinguishing
+them from VM interpreter bugs).
+
+Vmgen supports VM code disassembling by generating
+@file{@var{file}-disasm.i}.  This code has to be wrapped into a
+function, as is done in @file{vmgen-ex/disasm.c}.  You can use this file
+almost verbatim.  In addition to @samp{vm_@var{A}2@var{B}(a,b)},
+@samp{vm_out}, @samp{printarg_@var{type}(@var{value})}, which are
+explained above, the following macros and variables are used in
+@file{@var{file}-disasm.i} (and you have to define them):
+
+@table @samp
 
-Invocation
+@item ip
+This variable points to the opcode of the current VM instruction.
 
-Input Syntax
+@item IP IPTOS
+@samp{IPTOS} is the first argument of the current VM instruction, and
+@samp{IP} points to it; this is just as in the engine, but here
+@samp{ip} points to the opcode of the VM instruction (in contrast to the
+engine, where @samp{ip} points to the next cell, or even one further).
+
+@item VM_IS_INST(Inst i, int n)
+Tests if the opcode @samp{i} is the same as the @samp{n}th entry in the
+VM instruction table.
 
-Concepts: Front end, VM, Stacks,  Types, input stream
+@end table
+
+
+@section VM profiler
+
+The VM profiler is designed for getting execution and occurrence counts
+for VM instruction sequences, and these counts can then be used for
+selecting sequences as superinstructions.  The VM profiler is probably
+not useful as a profiling tool for the interpretive system; i.e., the VM
+profiler is useful for the developers, but not the users, of the
+interpretive system.
+
+The output of the profiler is: for each basic block (executed at least
+once), it produces the dynamic execution count of that basic block and
+all its subsequences; e.g.,
+
+@example
+       9227465  lit storelocal 
+       9227465  storelocal branch 
+       9227465  lit storelocal branch 
+@end example
+
+I.e., a basic block consisting of @samp{lit storelocal branch} is
+executed 9227465 times.
+
+This output can be combined in various ways.  E.g.,
+@file{vmgen/stat.awk} adds up the occurrences of a given sequence with
+respect to dynamic execution, static occurrence, and per-program
+occurrence.  E.g.,
+
+@example
+      2      16        36910041 loadlocal lit 
+@end example
+
+indicates that the sequence @samp{loadlocal lit} occurs in 2 programs,
+in 16 places, and has been executed 36910041 times.  Now you can select
+superinstructions in any way you like (note that compile time and space
+typically limit the number of superinstructions to 100--1000).  After
+you have done that, @file{vmgen/seq2rule.awk} turns lines of the form
+above into rules for inclusion in a vmgen input file.  Note that this
+script does not ensure that all prefixes are defined, so you have to do
+that in other ways.  So, an overall script for turning profiles into
+superinstructions can look like this:
+
+@example
+awk -f stat.awk fib.prof test.prof|
+awk '$3>=10000'|                #select sequences
+fgrep -v -f peephole-blacklist| #eliminate wrong instructions
+awk -f seq2rule.awk|            #turn into superinstructions
+sort -k 3 >mini-super.vmg       #sort sequences
+@end example
+
+Here the dynamic count is used for selecting sequences (preliminary
+results indicate that the static count gives better results, though);
+the third line eliminates sequences containing instructions that must not
+occur in a superinstruction, because they access a stack directly.  The
+dynamic count selection ensures that all subsequences (including
+prefixes) of longer sequences occur (because subsequences have at least
+the same count as the longer sequences); the sort in the last line
+ensures that longer superinstructions occur after their prefixes.
+
+Before you can use this script, you need the profiler itself.  Vmgen supports its
+creation by generating @file{@var{file}-profile.i}; you also need the
+wrapper file @file{vmgen-ex/profile.c} that you can use almost verbatim.
+
+The profiler works by recording the targets of all VM control flow
+changes (through @code{SUPER_END} during execution, and through
+@code{BB_BOUNDARY} in the front end), and counting (through
+@code{SUPER_END}) how often they were targeted.  After the program run,
+the numbers are corrected such that each VM basic block has the correct
+count (originally entering a block without executing a branch does not
+increase the count), then the subsequences of all basic blocks are
+printed.  To get all this, you just have to define @code{SUPER_END} (and
+@code{BB_BOUNDARY}) appropriately, and call @code{vm_print_profile(FILE
+*file)} when you want to output the profile on @code{file}.
+
+The @file{@var{file}-profile.i} file is similar to the disassembler file, and
+it uses variables and functions defined in @file{vmgen-ex/profile.c},
+plus @code{VM_IS_INST} already defined for the VM disassembler
+(@pxref{VM disassembler}).
+
+@chapter Changes
+
+Users of the gforth-0.5.9-20010501 version of vmgen need to change
+several things in their source code to use the current version.  I
+recommend keeping the gforth-0.5.9-20010501 version until you have
+completed the change (note that you can have several versions of Gforth
+installed at the same time).  I hope to avoid such incompatible changes
+in the future.
+
+The required changes are:
+
+@table @code
+
+@item vm_@var{A}2@var{B}
+now takes two arguments.
+
+@item vm_two@var{A}2@var{B}(b,a1,a2);
+changed to @code{vm_two@var{A}2@var{B}(a1,a2,b)} (note the absence of the @samp{;}).
+
+@end table
 
-Contact
+Also some new macros have to be defined, e.g., @code{INST_ADDR}, and
+@code{LABEL}; some macros have to be defined in new contexts, e.g.,
+@code{VM_IS_INST} is now also needed in the disassembler.
 
+@chapter Contact
 
-Required changes:
-vm_...2... -> two arguments
-"vm_two...2...(arg1,arg2,arg3);" -> "vm_two...2...(arg3,arg1,arg2)" (no ";").
-define INST_ADDR and LABEL
-define VM_IS_INST also for disassembler