File: gforth/doc/vmgen.texi
Revision 1.2
Tue May 28 08:54:28 2002 UTC (17 years, 5 months ago) by anton
Branches: MAIN
CVS tags: HEAD
Documentation changes

@include version.texi

@c @ifnottex
This file documents vmgen (Gforth @value{VERSION}).

@chapter Introduction

Vmgen is a tool for writing efficient interpreters.  It takes a simple
virtual machine description and generates efficient C code for dealing
with the virtual machine code in various ways (in particular, executing
it).  The run-time efficiency of the resulting interpreters is usually
within a factor of 10 of machine code produced by an optimizing
compiler.

The interpreter design strategy supported by vmgen is to divide the
interpreter into two parts:

@itemize @bullet

@item The @emph{front end} takes the source code of the language to be
implemented and translates it into virtual machine code.  This is
similar to an ordinary compiler front end; typically an interpreter
front end performs no optimization, so it is relatively simple to
implement and runs fast.

@item The @emph{virtual machine interpreter} executes the virtual
machine code.

@end itemize

Such a division is common in interpreters, both for modularity and for
efficiency.  The virtual machine code is typically passed between the
front end and the virtual machine interpreter in memory, as in a
load-and-go compiler; this avoids the complexity and time cost of
writing the code to a file and reading it back.
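As a minimal sketch of this division (hand-written, not vmgen output; the postfix input language, the opcodes, and the functions @code{front_end} and @code{vm_run} are hypothetical), the front end below writes VM code into a buffer supplied by the caller, and the interpreter executes that same buffer directly, with no file in between:

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

typedef long Cell;

/* Hypothetical opcodes of a tiny stack VM (hand-written, not vmgen output). */
enum { OP_LIT, OP_ADD, OP_MUL, OP_HALT };

/* Front end: translate a postfix expression such as "2 3 + 4 *" into
   VM code in the caller's buffer.  Returns the number of cells used. */
size_t front_end(const char *src, Cell *code, size_t max)
{
  size_t n = 0;
  while (*src != '\0') {
    if (isdigit((unsigned char)*src)) {
      Cell v = 0;
      while (isdigit((unsigned char)*src))
        v = v * 10 + (*src++ - '0');
      assert(n + 2 < max);          /* keep room for the final HALT */
      code[n++] = OP_LIT;
      code[n++] = v;                /* inline operand follows the opcode */
    } else if (*src == '+' || *src == '*') {
      assert(n + 1 < max);
      code[n++] = (*src++ == '+') ? OP_ADD : OP_MUL;
    } else {
      src++;                        /* skip blanks and anything else */
    }
  }
  assert(n < max);
  code[n++] = OP_HALT;
  return n;
}

/* VM interpreter: fetch, dispatch and execute the instructions in memory. */
Cell vm_run(const Cell *ip)
{
  Cell stack[64];
  int sp = 0;                       /* stack[sp - 1] is the top of stack */
  for (;;) {
    switch (*ip++) {
    case OP_LIT:  assert(sp < 64); stack[sp++] = *ip++; break;
    case OP_ADD:  sp--; stack[sp - 1] += stack[sp]; break;
    case OP_MUL:  sp--; stack[sp - 1] *= stack[sp]; break;
    case OP_HALT: return stack[sp - 1];
    default:      assert(!"unknown opcode"); return 0;
    }
  }
}
```

For the input @code{2 3 + 4 *}, the front end produces nine cells (@code{LIT 2 LIT 3 ADD LIT 4 MUL HALT}), and @code{vm_run} returns 20.  With vmgen, the execution and code-generation parts would be generated from the instruction descriptions instead of being written by hand as here.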

A @emph{virtual machine} (VM) represents the program as a sequence of
@emph{VM instructions}, following each other in memory, similar to real
machine code.  Control flow occurs through VM branch instructions, as
in a real machine.
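To illustrate a VM branch instruction, here is a hand-written sketch in which all opcodes and functions are hypothetical (not vmgen's): the conditional branch @code{BNZ} pops a flag and, if it is nonzero, continues at the code index given by its inline operand.  This turns the sequential VM code into a loop that sums the integers from n down to 1.

```c
#include <assert.h>

typedef long Cell;

/* Hypothetical opcodes; BNZ takes an absolute code index as its operand. */
enum { LIT, SWAP, OVER, ADD, DEC, DUP, DROP, BNZ, HALT };

Cell vm_run(const Cell *code)
{
  Cell s[16];
  int sp = 0;            /* s[sp - 1] is the top of stack */
  long pc = 0;           /* index of the next VM instruction */
  Cell t;
  for (;;) {
    switch (code[pc++]) {
    case LIT:  assert(sp < 16); s[sp++] = code[pc++]; break;
    case SWAP: t = s[sp - 1]; s[sp - 1] = s[sp - 2]; s[sp - 2] = t; break;
    case OVER: assert(sp < 16); s[sp] = s[sp - 2]; sp++; break;
    case ADD:  sp--; s[sp - 1] += s[sp]; break;
    case DEC:  s[sp - 1]--; break;
    case DUP:  assert(sp < 16); s[sp] = s[sp - 1]; sp++; break;
    case DROP: sp--; break;
    case BNZ:  /* pop the flag; branch if nonzero, else skip the operand */
      if (s[--sp] != 0)
        pc = code[pc];
      else
        pc++;
      break;
    case HALT: return s[sp - 1];
    default:   assert(!"unknown opcode"); return 0;
    }
  }
}

/* VM code for: sum = 0; do { sum += n; n--; } while (n != 0); */
Cell sum_down_from(Cell n)
{
  assert(n > 0);         /* the do-while loop would not terminate otherwise */
  const Cell code[] = {
    LIT, 0,              /*  0: ( sum )                     */
    LIT, n,              /*  2: ( sum n )                   */
    SWAP,                /*  4: loop start: ( n sum )       */
    OVER, ADD,           /*  5: ( n sum+n )                 */
    SWAP, DEC,           /*  7: ( sum' n-1 )                */
    DUP, BNZ, 4,         /*  9: back to index 4 if n-1 != 0 */
    DROP, HALT           /* 12: ( sum' )                    */
  };
  return vm_run(code);
}
```

Apart from @code{BNZ}, every instruction simply falls through to the next one in memory, just as in real machine code.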

In this setup, vmgen can generate most of the code dealing with virtual
machine instructions from a simple description of the virtual machine
instructions (@pxref...), in particular:

@table @emph

@item VM instruction execution

@item VM code generation
Useful in the front end.

@item VM code decompiler
Useful for debugging the front end.

@item VM code tracing
Useful for debugging the front end and the VM interpreter.  You will
typically provide other means for debugging the user's programs at the
source level.

@item VM code profiling
Useful for optimizing the VM interpreter with superinstructions
(@pxref...).

@end table
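Vmgen generates the profiling support from the instruction descriptions; the hand-written sketch below only illustrates the underlying idea, assuming that counting executed pairs of adjacent VM instructions is enough to spot candidates (the names @code{profile}, @code{vm_run_profiled} and @code{hottest_pair} are hypothetical).  The most frequently executed pair is a natural candidate for a superinstruction.

```c
#include <assert.h>
#include <string.h>

typedef long Cell;

/* Hypothetical opcodes; N_OPS is the number of opcodes. */
enum { LIT, ADD, MUL, HALT, N_OPS };

/* profile[a][b] counts how often instruction b executed right after a. */
static unsigned long profile[N_OPS][N_OPS];

void profile_reset(void)
{
  memset(profile, 0, sizeof profile);
}

/* Stack VM interpreter that records executed instruction pairs. */
Cell vm_run_profiled(const Cell *ip)
{
  Cell s[32];
  int sp = 0;
  int prev = -1;                      /* no previous instruction yet */
  for (;;) {
    int op = (int)*ip++;
    assert(op >= 0 && op < N_OPS);
    if (prev >= 0)
      profile[prev][op]++;
    prev = op;
    switch (op) {
    case LIT:  assert(sp < 32); s[sp++] = *ip++; break;
    case ADD:  sp--; s[sp - 1] += s[sp]; break;
    case MUL:  sp--; s[sp - 1] *= s[sp]; break;
    case HALT: return s[sp - 1];
    }
  }
}

/* Find the most frequently executed pair; returns its count. */
unsigned long hottest_pair(int *first, int *second)
{
  unsigned long best = 0;
  *first = *second = -1;
  for (int a = 0; a < N_OPS; a++)
    for (int b = 0; b < N_OPS; b++)
      if (profile[a][b] > best) {
        best = profile[a][b];
        *first = a;
        *second = b;
      }
  return best;
}
```

Running @code{LIT 1 LIT 2 ADD LIT 3 ADD LIT 4 ADD HALT} executes the pair @code{LIT ADD} three times, so a combined @code{LIT_ADD} superinstruction would save three dispatches in that run.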

Vmgen supports efficient interpreters through various optimizations, in
particular:

@itemize

@item Threaded code

@item Caching the top-of-stack in a register

@item Combining VM instructions into superinstructions

@item
Replicating VM (super)instructions for better branch target buffer
(BTB) prediction accuracy (not yet in vmgen-ex, but already in Gforth).

@end itemize
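The first three optimizations can be sketched in a few lines of GNU C; Gforth, for example, implements threaded code with GCC's labels-as-values extension (@code{&&label} and @code{goto *}).  This hand-written sketch is not vmgen output and all names in it are hypothetical: each VM code slot holds either the address of an instruction's code or an inline operand, so dispatch is a single indirect jump; the top of stack is kept in the local variable @code{tos}; and @code{lit_add} is a superinstruction for the sequence @code{lit add}.  Replication is not shown.

```c
#include <assert.h>
#include <stddef.h>

typedef long Cell;

/* A threaded-code slot: an instruction's code address, or an inline operand. */
typedef union { void *inst; Cell lit; } Slot;

/* Hypothetical instruction numbers, used to look up label addresses. */
enum { I_LIT, I_ADD, I_MUL, I_LIT_ADD, I_HALT, N_INSTS };

/* With code == NULL, export the label addresses into labels[] (they are
   only accessible inside this function); otherwise run the code. */
Cell engine(const Slot *code, void **labels)
{
  static void *const table[N_INSTS] = { &&lit, &&add, &&mul, &&lit_add, &&halt };
  Cell stack[64];
  Cell *sp = stack;        /* in-memory part of the stack, below tos */
  Cell tos = 0;            /* cached top of stack (first push saves a dummy) */
  const Slot *ip = code;   /* VM instruction pointer */

  if (code == NULL) {
    for (int i = 0; i < N_INSTS; i++)
      labels[i] = table[i];
    return 0;
  }

#define NEXT goto *(ip++)->inst   /* dispatch: jump to the next instruction */
  NEXT;
lit:                       /* ( -- n ) */
  assert(sp < stack + 64);
  *sp++ = tos;
  tos = (ip++)->lit;
  NEXT;
add:                       /* ( n1 n2 -- n1+n2 ): n2 is already in tos */
  tos = *--sp + tos;
  NEXT;
mul:                       /* ( n1 n2 -- n1*n2 ) */
  tos = *--sp * tos;
  NEXT;
lit_add:                   /* superinstruction for "lit n add" */
  tos += (ip++)->lit;
  NEXT;
halt:
  return tos;
#undef NEXT
}

/* Hypothetical front end for (a + b) * c.  With super != 0, the pair
   "lit b, add" is replaced by the superinstruction "lit_add b". */
Cell eval_sum_times(Cell a, Cell b, Cell c, int super)
{
  void *L[N_INSTS];
  Slot code[9];
  int n = 0;

  engine(NULL, L);
  code[n++].inst = L[I_LIT];     code[n++].lit = a;
  if (super) {
    code[n++].inst = L[I_LIT_ADD]; code[n++].lit = b;
  } else {
    code[n++].inst = L[I_LIT];   code[n++].lit = b;
    code[n++].inst = L[I_ADD];
  }
  code[n++].inst = L[I_LIT];     code[n++].lit = c;
  code[n++].inst = L[I_MUL];
  code[n++].inst = L[I_HALT];
  return engine(code, NULL);
}
```

Both variants compute the same result, but the superinstruction version executes one dispatch fewer.  Compilers without labels as values need a @code{switch}-based dispatch loop instead.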

As a result, vmgen-based interpreters are only about an order of
magnitude slower than native code from an optimizing C compiler on small
benchmarks; on large benchmarks, which spend more time in the run-time
system, the slowdown is often smaller (e.g., the slowdown of a
vmgen-generated JVM interpreter over the best JVM JIT compiler we
measured is only a factor of 2-3 for large benchmarks; some other JITs
and all other interpreters we looked at were slower than our
interpreter).

VMs are usually designed as stack machines (passing data between VM
instructions on a stack), and vmgen supports such designs especially
well; however, you can also use vmgen for implementing a register VM and
still benefit from most of the advantages offered by vmgen.
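For comparison, a register VM names its operands explicitly.  In the hypothetical three-address encoding sketched below (not a vmgen format), a + b * c needs two instructions plus a halt, where a stack VM would typically use five (push a, push b, push c, multiply, add); in exchange, every register instruction carries operand numbers that the interpreter must decode.

```c
#include <assert.h>

typedef long Cell;

/* Hypothetical register-VM opcodes. */
enum { R_ADD, R_MUL, R_HALT };

/* Three-address instruction: reg[dst] = reg[src1] op reg[src2]. */
typedef struct { int op, dst, src1, src2; } RegInst;

/* Execute register VM code; the result is returned from register 0. */
Cell reg_vm_run(const RegInst *ip, Cell *reg)
{
  for (;; ip++) {
    switch (ip->op) {
    case R_ADD:  reg[ip->dst] = reg[ip->src1] + reg[ip->src2]; break;
    case R_MUL:  reg[ip->dst] = reg[ip->src1] * reg[ip->src2]; break;
    case R_HALT: return reg[0];
    default:     assert(!"unknown opcode"); return 0;
    }
  }
}

/* r0 = r0 + r1 * r2, using r3 as a temporary (a in r0, b in r1, c in r2). */
static const RegInst a_plus_b_times_c[] = {
  { R_MUL, 3, 1, 2 },
  { R_ADD, 0, 0, 3 },
  { R_HALT, 0, 0, 0 },
};
```

With a = 2, b = 3 and c = 4 in registers 0 to 2, @code{reg_vm_run} leaves 12 in the temporary register 3 and returns 14.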

There are many potential uses of the instruction descriptions that are
not implemented at the moment, but we are open to feature requests, and
we will implement new features if someone asks for them; so the feature
list above is not exhaustive.

@c *********************************************************************
@chapter Why interpreters?

Interpreters are a popular language implementation technique because
they combine all three of the following advantages:

@itemize

@item Ease of implementation

@item Portability

@item Fast edit-compile-run cycle

@end itemize

The main disadvantage of interpreters is their run-time speed.  However,
interpreters differ enormously in this respect: the slowdown over
optimized C code on programs consisting of simple operations is
typically a factor of 10 for the more efficient interpreters, and a
factor of 1000 for the less efficient ones (the slowdown for programs
executing complex operations is smaller, because the time spent in
libraries for executing complex operations is the same in all
implementation strategies).
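The effect of library time can be made explicit with a simple model (an illustration, not a measurement): let T be the run time of the optimized C program, f the fraction of T spent in library code, which runs at the same speed in every implementation strategy, and s the slowdown of the remaining simple operations.  The overall slowdown S is then:

```latex
\begin{aligned}
S &= \frac{f\,T + (1-f)\,s\,T}{T} = f + (1-f)\,s \\
S\big|_{s=10,\ f=0.9} &= 0.9 + 0.1 \cdot 10 = 1.9
\end{aligned}
```

With the hypothetical values s = 10 and f = 0.9, a program spending most of its time in libraries is slowed down by less than a factor of 2, whereas f = 0 gives the full factor of 10.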

Vmgen makes it even easier to implement interpreters.  It also supports
techniques for building efficient interpreters.

@c ********************************************************************

@chapter Concepts

@c --------------------------------------------------------------------
@section Front-end and virtual machine interpreter

@cindex front-end
Interpretive systems are typically divided into a @emph{front end} that
parses the input language and produces an intermediate representation
for the program, and an interpreter that executes the intermediate
representation of the program.

@cindex virtual machine
@cindex VM
@cindex instruction, VM
For efficient interpreters the intermediate representation of choice is
virtual machine code (rather than, e.g., an abstract syntax tree).
@emph{Virtual machine} (VM) code consists of VM instructions arranged
sequentially in memory; they are executed in sequence by the VM
interpreter, except for VM branch instructions, which implement control
structures.  The conceptual similarity to real machine code results in
the name @emph{virtual machine}.
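The difference between the two representations can be sketched in C (hand-written; the node type, opcodes and functions are hypothetical): evaluating an abstract syntax tree needs a recursive walk with pointer chasing at every node, whereas a single post-order walk in the front end flattens the same tree into sequential VM code, which an interpreter loop can then execute without recursion.

```c
#include <assert.h>
#include <stddef.h>

typedef long Cell;

/* Hypothetical opcodes; a node's op field uses the same values. */
enum { LIT, ADD, MUL, HALT };

/* Abstract syntax tree node for a tiny expression language. */
typedef struct Node {
  int op;                          /* LIT, ADD or MUL */
  Cell value;                      /* only used when op == LIT */
  const struct Node *left, *right; /* only used for ADD and MUL */
} Node;

/* AST interpretation: one recursive call per node. */
Cell eval_tree(const Node *n)
{
  switch (n->op) {
  case LIT: return n->value;
  case ADD: return eval_tree(n->left) + eval_tree(n->right);
  default:  return eval_tree(n->left) * eval_tree(n->right);
  }
}

/* Post-order walk: operands first, then the operator, giving stack VM code. */
static size_t gen(const Node *n, Cell *code, size_t i, size_t max)
{
  if (n->op == LIT) {
    assert(i + 2 <= max);
    code[i++] = LIT;
    code[i++] = n->value;
  } else {
    i = gen(n->left, code, i, max);
    i = gen(n->right, code, i, max);
    assert(i < max);
    code[i++] = n->op;
  }
  return i;
}

/* Compile a whole expression; returns the number of cells written. */
size_t compile(const Node *n, Cell *code, size_t max)
{
  size_t len = gen(n, code, 0, max);
  assert(len < max);
  code[len++] = HALT;
  return len;
}
```

For the tree of (2 + 3) * 4, @code{compile} produces the sequence @code{LIT 2 LIT 3 ADD LIT 4 MUL HALT}, and @code{eval_tree} returns 20 for the same tree.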

In this framework, vmgen supports building the VM interpreter and any
other component dealing with VM instructions.  Apart from VM code
generation, it offers no support for the front end, which can be
implemented with classical compiler front-end techniques, supported by
tools like @command{flex} and @command{bison}.

The intermediate representation is usually just internal to the
interpreter, but some systems also support saving it to a file, either
as an image file, or in a full-blown linkable file format (e.g., JVM
class files).  Vmgen currently has no special support for such features,
but the information in the instruction descriptions can be helpful, and
we are open to feature requests and suggestions.

Invocation

Input Syntax

Concepts: Front end, VM, Stacks, Types, input stream

Contact
