Annotation of gforth/doc/vmgen.texi, revision 1.2

1.1       anton       1: @include version.texi
                      2: 
                      3: @c @ifnottex
                      4: This file documents vmgen (Gforth @value{VERSION}).
                      5: 
1.2     ! anton       6: @chapter Introduction
1.1       anton       7: 
                      8: Vmgen is a tool for writing efficient interpreters.  It takes a simple
                      9: virtual machine description and generates efficient C code for dealing
                     10: with the virtual machine code in various ways (in particular, executing
                     11: it).  The run-time efficiency of the resulting interpreters is usually
                     12: within a factor of 10 of machine code produced by an optimizing
                     13: compiler.
                     14: 
                     15: The interpreter design strategy supported by vmgen is to divide the
                     16: interpreter into two parts:
                     17: 
                     18: @itemize @bullet
                     19: 
                     20: @item The @emph{front end} takes the source code of the language to be
                     21: implemented, and translates it into virtual machine code.  This is
                     22: similar to an ordinary compiler front end; typically an interpreter
                     23: front end performs no optimization, so it is relatively simple to
                     24: implement and runs fast.
                     25: 
                     26: @item The @emph{virtual machine interpreter} executes the virtual
                     27: machine code.
                     28: 
                     29: @end itemize
                     30: 
                     31: Such a division is usually used in interpreters, for modularity as well
                     32: as for efficiency reasons.  The virtual machine code is typically passed
                     33: between front end and virtual machine interpreter in memory, as in a
                     34: load-and-go compiler; this avoids the complexity and time cost of
                     35: writing the code to a file and reading it again.
                     36: 
                     37: A @emph{virtual machine} (VM) represents the program as a sequence of
                     38: @emph{VM instructions}, following each other in memory, similar to real
                     39: machine code.  Control flow occurs through VM branch instructions, as
                     40: in a real machine.
                     41: 
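To make the notion of VM code concrete, here is a minimal hand-written sketch of a stack VM whose instructions follow each other in an array and whose control flow goes through a branch instruction. The opcode names (@code{OP_LIT}, @code{OP_JNZ}, etc.), the cell layout, and the function @code{run} are assumptions made for this illustration; they are not vmgen's instruction set or its generated code, and a vmgen-built interpreter would dispatch differently.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcodes for illustration; not vmgen's instruction set. */
enum { OP_LIT, OP_ADD, OP_SUB, OP_DUP, OP_JNZ, OP_HALT };

/* Executes VM code stored as a flat array of cells and returns the
   top of stack when OP_HALT is reached. */
long run(const long *code)
{
    long stack[64];
    size_t sp = 0;          /* number of items on the stack */
    const long *ip = code;  /* VM instruction pointer */

    for (;;) {
        switch (*ip++) {
        case OP_LIT:   /* push the immediate operand that follows */
            assert(sp < 64);
            stack[sp++] = *ip++;
            break;
        case OP_ADD:
            assert(sp >= 2);
            stack[sp - 2] += stack[sp - 1];
            sp--;
            break;
        case OP_SUB:
            assert(sp >= 2);
            stack[sp - 2] -= stack[sp - 1];
            sp--;
            break;
        case OP_DUP:
            assert(sp >= 1 && sp < 64);
            stack[sp] = stack[sp - 1];
            sp++;
            break;
        case OP_JNZ: { /* VM branch: pop, jump to target if nonzero */
            long target = *ip++;
            assert(sp >= 1);
            if (stack[--sp] != 0)
                ip = code + target;
            break;
        }
        case OP_HALT:
            assert(sp >= 1);
            return stack[sp - 1];
        default:
            assert(!"unknown opcode");
            return 0;
        }
    }
}
```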
                     42: In this setup, vmgen can generate most of the code dealing with virtual
                     43: machine instructions from a simple description of the virtual machine
                     44: instructions (@pxref...), in particular:
                     45: 
                     46: @table @emph
                     47: 
                     48: @item VM instruction execution
                     49: 
                     50: @item VM code generation
                     51: Useful in the front end.
                     52: 
                     53: @item VM code decompiler
                     54: Useful for debugging the front end.
                     55: 
                     56: @item VM code tracing
                     57: Useful for debugging the front end and the VM interpreter.  You will
                     58: typically provide other means for debugging the user's programs at the
                     59: source level.
                     60: 
                     61: @item VM code profiling
                     62: Useful for optimizing the VM interpreter with superinstructions
                     63: (@pxref...).
                     64: 
                     65: @end table
                     66: 
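As an illustration of how profile data can guide the choice of superinstructions, the following sketch counts how often one VM instruction directly follows another in an execution trace and reports the most frequent pair. The trace format, @code{N_OPS}, and the functions @code{count_pairs} and @code{best_pair} are hypothetical; this is not the format or interface of vmgen's own profiling support.

```c
#include <assert.h>
#include <stddef.h>

enum { N_OPS = 4 };  /* hypothetical number of VM instructions */

/* Counts how often opcode b directly follows opcode a in a trace of
   executed opcodes. The caller must zero-initialize counts. */
void count_pairs(const int *trace, size_t len,
                 unsigned counts[N_OPS][N_OPS])
{
    for (size_t i = 0; i + 1 < len; i++) {
        assert(trace[i] >= 0 && trace[i] < N_OPS);
        assert(trace[i + 1] >= 0 && trace[i + 1] < N_OPS);
        counts[trace[i]][trace[i + 1]]++;
    }
}

/* Finds the most frequent adjacent pair and returns its count.
   If no pair was seen, *first and *second are set to -1. */
unsigned best_pair(unsigned counts[N_OPS][N_OPS], int *first, int *second)
{
    unsigned best = 0;
    *first = *second = -1;
    for (int a = 0; a < N_OPS; a++) {
        for (int b = 0; b < N_OPS; b++) {
            if (counts[a][b] > best) {
                best = counts[a][b];
                *first = a;
                *second = b;
            }
        }
    }
    return best;
}
```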
                     67: Vmgen supports efficient interpreters through various optimizations, in
                     68: particular:
                     69: 
                     70: @itemize
                     71: 
                     72: @item Threaded code
                     73: 
                     74: @item Caching the top-of-stack in a register
                     75: 
                     76: @item Combining VM instructions into superinstructions
                     77: 
                     78: @item
                     79: Replicating VM (super)instructions for better BTB prediction accuracy
                     80: (not yet in vmgen-ex, but already in Gforth).
                     81: 
                     82: @end itemize
                     83: 
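The first three optimizations can be sketched by hand as follows. This is a minimal illustration and not code generated by vmgen: it assumes the GNU C "labels as values" extension (supported by GCC and Clang) for threaded code, keeps the top of stack in the local variable @code{tos}, and includes one superinstruction, @code{lit_add}, that does the work of @code{lit} followed by @code{add} with a single dispatch. All names (@code{engine}, @code{Cell}, the @code{I_} constants) are hypothetical, and stack bounds are not checked.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical instruction numbering, used only to index the label table. */
enum { I_LIT, I_ADD, I_LIT_ADD, I_HALT };

/* One cell of threaded code: a handler address or an inline operand. */
typedef union { void *addr; long n; } Cell;

/* With ip == NULL, stores the handler table in *labels_out and returns 0,
   so that a front end can build threaded code. Otherwise runs the code
   at ip and returns the top of stack when the halt handler is reached. */
long engine(Cell *ip, void ***labels_out)
{
    static void *labels[] = { &&lit, &&add, &&lit_add, &&halt };
    long stack[64];
    long *sp = stack;  /* next free slot for items below the top */
    long tos = 0;      /* top of stack, cached in a local variable */

    if (ip == NULL) {
        assert(labels_out != NULL);
        *labels_out = labels;
        return 0;
    }
    goto *(ip++)->addr;  /* dispatch the first instruction */

lit:      /* spill the cached top, then load the inline operand */
    *sp++ = tos;
    tos = (ip++)->n;
    goto *(ip++)->addr;
add:
    tos += *--sp;
    goto *(ip++)->addr;
lit_add:  /* superinstruction: lit followed by add, no stack traffic */
    tos += (ip++)->n;
    goto *(ip++)->addr;
halt:
    return tos;
}
```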
                     84: As a result, vmgen-based interpreters are only about an order of
                     85: magnitude slower than native code from an optimizing C compiler on small
                     86: benchmarks; on large benchmarks, which spend more time in the run-time
1.2     ! anton      87: system, the slowdown is often less (e.g., the slowdown of a
        !            88: vmgen-generated JVM interpreter over the best JVM JIT compiler we
        !            89: measured is only a factor of 2-3 for large benchmarks; some other JITs
        !            90: and all other interpreters we looked at were slower than our
        !            91: interpreter).
1.1       anton      92: 
                     93: VMs are usually designed as stack machines (passing data between VM
                     94: instructions on a stack), and vmgen supports such designs especially
                     95: well; however, you can also use vmgen for implementing a register VM and
                     96: still benefit from most of the advantages offered by vmgen.
                     97: 
1.2     ! anton      98: There are many potential uses of the instruction descriptions that are
        !            99: not implemented at the moment, but we are open to feature requests, and
        !           100: we will implement new features if someone asks for them; so the feature
        !           101: list above is not exhaustive.
1.1       anton     102: 
1.2     ! anton     103: @c *********************************************************************
        !           104: @chapter Why interpreters?
        !           105: 
        !           106: Interpreters are a popular language implementation technique because
        !           107: they combine all three of the following advantages:
        !           108: 
        !           109: @itemize
        !           110: 
        !           111: @item Ease of implementation
        !           112: 
        !           113: @item Portability
        !           114: 
        !           115: @item Fast edit-compile-run cycle
        !           116: 
        !           117: @end itemize
        !           118: 
        !           119: The main disadvantage of interpreters is their run-time speed.  However,
        !           120: there are huge differences between different interpreters in this area:
        !           121: the slowdown over optimized C code on programs consisting of simple
        !           122: operations is typically a factor of 10 for the more efficient
        !           123: interpreters, and a factor of 1000 for the less efficient ones (the
        !           124: slowdown for programs executing complex operations is less, because the
        !           125: time spent in libraries for executing complex operations is the same in
        !           126: all implementation strategies).
        !           127: 
        !           128: Vmgen makes it even easier to implement interpreters.  It also supports
        !           129: techniques for building efficient interpreters.
        !           130: 
        !           131: @c ********************************************************************
        !           132: 
        !           133: @chapter Concepts
        !           134: 
        !           135: @c --------------------------------------------------------------------
        !           136: @section Front-end and virtual machine interpreter
        !           137: 
        !           138: @cindex front-end
        !           139: Interpretive systems are typically divided into a @emph{front end} that
        !           140: parses the input language and produces an intermediate representation
        !           141: for the program, and an interpreter that executes the intermediate
        !           142: representation of the program.
        !           143: 
        !           144: @cindex virtual machine
        !           145: @cindex VM
        !           146: @cindex instruction, VM
        !           147: For efficient interpreters the intermediate representation of choice is
        !           148: virtual machine code (rather than, e.g., an abstract syntax tree).
        !           149: @emph{Virtual machine} (VM) code consists of VM instructions arranged
        !           150: sequentially in memory; they are executed in sequence by the VM
        !           151: interpreter, except for VM branch instructions, which implement control
        !           152: structures.  The conceptual similarity to real machine code results in
        !           153: the name @emph{virtual machine}.
        !           154: 
        !           155: In this framework, vmgen supports building the VM interpreter and any
        !           156: other component dealing with VM instructions.  It does not have any
        !           157: support for the front end, apart from VM code generation support.  The
        !           158: front end can be implemented with classical compiler front-end
        !           159: techniques, which are supported by tools like @command{flex} and
        !           160: @command{bison}.
        !           161: 
        !           162: The intermediate representation is usually just internal to the
        !           163: interpreter, but some systems also support saving it to a file, either
        !           164: as an image file, or in a full-blown linkable file format (e.g., JVM).
        !           165: Vmgen currently has no special support for such features, but the
        !           166: information in the instruction descriptions can be helpful, and we are
        !           167: open to feature requests and suggestions.
        !           168: 
        !           169: 
        !           170: 
        !           171: Invocation
        !           172: 
        !           173: Input Syntax
        !           174: 
        !           175: Concepts: Front end, VM, Stacks,  Types, input stream
        !           176: 
        !           177: Contact
