File:  [gforth] / gforth / doc / vmgen.texi
Revision 1.2: download - view: text, annotated - select for diffs
Tue May 28 08:54:28 2002 UTC (21 years, 10 months ago) by anton
Branches: MAIN
CVS tags: HEAD
Documentation changes

@include version.texi

@c @ifnottex
This file documents vmgen (Gforth @value{VERSION}).

@chapter Introduction

Vmgen is a tool for writing efficient interpreters.  It takes a simple
virtual machine description and generates efficient C code for dealing
with the virtual machine code in various ways (in particular, executing
it).  The run-time efficiency of the resulting interpreters is usually
within a factor of 10 of machine code produced by an optimizing
compiler.

The interpreter design strategy supported by vmgen is to divide the
interpreter into two parts:

@itemize @bullet

@item The @emph{front end} takes the source code of the language to be
implemented, and translates it into virtual machine code.  This is
similar to an ordinary compiler front end; typically an interpreter
front-end performs no optimization, so it is relatively simple to
implement and runs fast.

@item The @emph{virtual machine interpreter} executes the virtual
machine code.

@end itemize

Such a division is usually used in interpreters, for modularity as well
as for efficiency reasons.  The virtual machine code is typically passed
between front end and virtual machine interpreter in memory, like in a
load-and-go compiler; this avoids the complexity and time cost of
writing the code to a file and reading it again.

A @emph{virtual machine} (VM) represents the program as a sequence of
@emph{VM instructions}, following each other in memory, similar to real
machine code.  Control flow occurs through VM branch instructions, like
in a real machine.

In this setup, vmgen can generate most of the code dealing with virtual
machine instructions from a simple description of the virtual machine
instructions (@pxref...), in particular:

@table @emph

@item VM instruction execution

@item VM code generation
Useful in the front end.

@item VM code decompiler
Useful for debugging the front end.

@item VM code tracing
Useful for debugging the front end and the VM interpreter.  You will
typically provide other means for debugging the user's programs at the
source level.

@item VM code profiling
Useful for optimizing the VM insterpreter with superinstructions
(@pxref...).

@end table

VMgen supports efficient interpreters though various optimizations, in
particular

@itemize

@item Threaded code

@item Caching the top-of-stack in a register

@item Combining VM instructions into superinstructions

@item
Replicating VM (super)instructions for better BTB prediction accuracy
(not yet in vmgen-ex, but already in Gforth).

@end itemize

As a result, vmgen-based interpreters are only about an order of
magintude slower than native code from an optimizing C compiler on small
benchmarks; on large benchmarks, which spend more time in the run-time
system, the slowdown is often less (e.g., the slowdown of a
Vmgen-generated JVM interpreter over the best JVM JIT compiler we
measured is only a factor of 2-3 for large benchmarks; some other JITs
and all other interpreters we looked at were slower than our
interpreter).

VMs are usually designed as stack machines (passing data between VM
instructions on a stack), and vmgen supports such designs especially
well; however, you can also use vmgen for implementing a register VM and
still benefit from most of the advantages offered by vmgen.

There are many potential uses of the instruction descriptions that are
not implemented at the moment, but we are open for feature requests, and
we will implement new features if someone asks for them; so the feature
list above is not exhaustive.

@c *********************************************************************
@chapter Why interpreters?

Interpreters are a popular language implementation technique because
they combine all three of the following advantages:

@itemize

@item Ease of implementation

@item Portability

@item Fast edit-compile-run cycle

@end itemize

The main disadvantage of interpreters is their run-time speed.  However,
there are huge differences between different interpreters in this area:
the slowdown over optimized C code on programs consisting of simple
operations is typically a factor of 10 for the more efficient
interpreters, and a factor of 1000 for the less efficient ones (the
slowdown for programs executing complex operations is less, because the
time spent in libraries for executing complex operations is the same in
all implementation strategies).

Vmgen makes it even easier to implement interpreters.  It also supports
techniques for building efficient interpreters.

@c ********************************************************************

@chapter Concepts

@c --------------------------------------------------------------------
@section Front-end and virtual machine interpreter

@cindex front-end
Interpretive systems are typically divided into a @emph{front end} that
parses the input language and produces an intermediate representation
for the program, and an interpreter that executes the intermediate
representation of the program.

@cindex virtual machine
@cindex VM
@cindex instruction, VM
For efficient interpreters the intermediate representation of choice is
virtual machine code (rather than, e.g., an abstract syntax tree).
@emph{Virtual machine} (VM) code consists of VM instructions arranged
sequentially in memory; they are executed in sequence by the VM
interpreter, except for VM branch instructions, which implement control
structures.  The conceptual similarity to real machine code results in
the name @emph{virtual machine}.

In this framework, vmgen supports building the VM interpreter and any
other component dealing with VM instructions.  It does not have any
support for the front end, apart from VM code generation support.  The
front end can be implemented with classical compiler front-end
techniques, which are supported by tools like @command{flex} and
@command{bison}.

The intermediate representation is usually just internal to the
interpreter, but some systems also support saving it to a file, either
as an image file, or in a full-blown linkable file format (e.g., JVM).
Vmgen currently has no special support for such features, but the
information in the instruction descriptions can be helpful, and we are
open for feature requests and suggestions.



Invocation

Input Syntax

Concepts: Front end, VM, Stacks,  Types, input stream

Contact

FreeBSD-CVSweb <freebsd-cvsweb@FreeBSD.org>