Annotation of gforth/doc/vmgen.texi, revision 1.10

1.10    ! anton       1: \input texinfo    @c -*-texinfo-*-
        !             2: @comment %**start of header
        !             3: @setfilename vmgen.info
1.1       anton       4: @include version.texi
1.10    ! anton       5: @settitle Vmgen (Gforth @value{VERSION})
        !             6: @c @syncodeindex pg cp
        !             7: @comment %**end of header
        !             8: @copying
        !             9: This manual is for Vmgen
        !            10: (version @value{VERSION}, @value{UPDATED}),
        !            11: the virtual machine interpreter generator
        !            12: 
        !            13: Copyright @copyright{} 2002 Free Software Foundation, Inc.
        !            14: 
        !            15: @quotation
        !            16: Permission is granted to copy, distribute and/or modify this document
        !            17: under the terms of the GNU Free Documentation License, Version 1.1 or
        !            18: any later version published by the Free Software Foundation; with no
        !            19: Invariant Sections, with the Front-Cover texts being ``A GNU Manual,''
        !            20: and with the Back-Cover Texts as in (a) below.  A copy of the
        !            21: license is included in the section entitled ``GNU Free Documentation
        !            22: License.''
        !            23: 
        !            24: (a) The FSF's Back-Cover Text is: ``You have freedom to copy and modify
        !            25: this GNU Manual, like GNU software.  Copies published by the Free
        !            26: Software Foundation raise funds for GNU development.''
        !            27: @end quotation
        !            28: @end copying
        !            29: 
        !            30: @dircategory GNU programming tools
        !            31: @direntry
        !            32: * vmgen: (vmgen).               Interpreter generator
        !            33: @end direntry
        !            34: 
        !            35: @titlepage
        !            36: @title Vmgen
        !            37: @subtitle for Gforth version @value{VERSION}, @value{UPDATED}
        !            38: @author M. Anton Ertl (@email{anton@mips.complang.tuwien.ac.at})
        !            39: @page
        !            40: @vskip 0pt plus 1filll
        !            41: @insertcopying
        !            42: @end titlepage
        !            43: 
        !            44: @contents
        !            45: 
        !            46: @ifnottex
        !            47: @node Top, Introduction, (dir), (dir)
        !            48: @top Vmgen
        !            49: 
        !            50: @insertcopying
        !            51: @end ifnottex
        !            52: 
        !            53: @menu
        !            54: * Introduction::                What can Vmgen do for you?
        !            55: * Why interpreters?::           Advantages and disadvantages
        !            56: * Concepts::                    VM interpreter background
        !            57: * Invoking vmgen::              
        !            58: * Example::                     
        !            59: * Input File Format::           
        !            60: * Using the generated code::    
        !            61: * Changes::                     from earlier versions
        !            62: * Contact::                     Bug reporting etc.
        !            63: * Copying This Manual::         Manual License
        !            64: * Index::                       
        !            65: 
        !            66: @detailmenu
        !            67:  --- The Detailed Node Listing ---
        !            68: 
        !            69: Concepts
        !            70: 
        !            71: * Front end and VM interpreter::  Modularizing an interpretive system
        !            72: * Data handling::               Stacks, registers, immediate arguments
        !            73: * Dispatch::                    From one VM instruction to the next
        !            74: 
        !            75: Example
        !            76: 
        !            77: * Example overview::            
        !            78: * Using profiling to create superinstructions::  
        !            79: 
        !            80: Input File Format
        !            81: 
        !            82: * Input File Grammar::          
        !            83: * Simple instructions::         
        !            84: * Superinstructions::           
        !            85: 
        !            86: Simple instructions
        !            87: 
        !            88: * C Code Macros::               Macros recognized by Vmgen
        !            89: * C Code restrictions::         Vmgen makes assumptions about C code
        !            90: 
        !            91: Using the generated code
        !            92: 
        !            93: * VM engine::                   Executing VM code
        !            94: * VM instruction table::        
        !            95: * VM code generation::          Creating VM code (in the front-end)
        !            96: * Peephole optimization::       Creating VM superinstructions
        !            97: * VM disassembler::             for debugging the front end
        !            98: * VM profiler::                 for finding worthwhile superinstructions
        !            99: 
        !           100: Copying This Manual
        !           101: 
        !           102: * GNU Free Documentation License::  License for copying this manual.
        !           103: 
        !           104: @end detailmenu
        !           105: @end menu
1.1       anton     106: 
                    107: @c @ifnottex
1.10    ! anton     108: This file documents Vmgen (Gforth @value{VERSION}).
1.1       anton     109: 
1.10    ! anton     110: @c ************************************************************
        !           111: @node Introduction, Why interpreters?, Top, Top
1.2       anton     112: @chapter Introduction
1.1       anton     113: 
                    114: Vmgen is a tool for writing efficient interpreters.  It takes a simple
                    115: virtual machine description and generates efficient C code for dealing
                    116: with the virtual machine code in various ways (in particular, executing
                    117: it).  The run-time efficiency of the resulting interpreters is usually
                    118: within a factor of 10 of machine code produced by an optimizing
                    119: compiler.
                    120: 
                    121: The interpreter design strategy supported by vmgen is to divide the
                    122: interpreter into two parts:
                    123: 
                    124: @itemize @bullet
                    125: 
                    126: @item The @emph{front end} takes the source code of the language to be
                    127: implemented, and translates it into virtual machine code.  This is
                    128: similar to an ordinary compiler front end; typically an interpreter
                    129: front-end performs no optimization, so it is relatively simple to
                    130: implement and runs fast.
                    131: 
                    132: @item The @emph{virtual machine interpreter} executes the virtual
                    133: machine code.
                    134: 
                    135: @end itemize
                    136: 
                    137: Such a division is usually used in interpreters, for modularity as well
1.6       anton     138: as for efficiency.  The virtual machine code is typically passed between
                    139: front end and virtual machine interpreter in memory, like in a
1.1       anton     140: load-and-go compiler; this avoids the complexity and time cost of
                    141: writing the code to a file and reading it again.
                    142: 
                    143: A @emph{virtual machine} (VM) represents the program as a sequence of
                    144: @emph{VM instructions}, following each other in memory, similar to real
                    145: machine code.  Control flow occurs through VM branch instructions, like
                    146: in a real machine.
                    147: 
                    148: In this setup, vmgen can generate most of the code dealing with virtual
                    149: machine instructions from a simple description of the virtual machine
                    150: instructions (@pxref...), in particular:
                    151: 
                    152: @table @emph
                    153: 
                    154: @item VM instruction execution
                    155: 
                    156: @item VM code generation
                    157: Useful in the front end.
                    158: 
                    159: @item VM code decompiler
                    160: Useful for debugging the front end.
                    161: 
                    162: @item VM code tracing
                    163: Useful for debugging the front end and the VM interpreter.  You will
                    164: typically provide other means for debugging the user's programs at the
                    165: source level.
                    166: 
                    167: @item VM code profiling
                    168: Useful for optimizing the VM interpreter with superinstructions
                    169: (@pxref...).
                    170: 
                    171: @end table
                    172: 
                    173: Vmgen supports efficient interpreters through various optimizations, in
                    174: particular
                    175: 
                    176: @itemize
                    177: 
                    178: @item Threaded code
                    179: 
                    180: @item Caching the top-of-stack in a register
                    181: 
                    182: @item Combining VM instructions into superinstructions
                    183: 
                    184: @item
                    185: Replicating VM (super)instructions for better BTB prediction accuracy
                    186: (not yet in vmgen-ex, but already in Gforth).
                    187: 
                    188: @end itemize
                    189: 
                    190: As a result, vmgen-based interpreters are only about an order of
                    191: magnitude slower than native code from an optimizing C compiler on small
                    192: benchmarks; on large benchmarks, which spend more time in the run-time
1.2       anton     193: system, the slowdown is often less (e.g., the slowdown of a
                    194: Vmgen-generated JVM interpreter over the best JVM JIT compiler we
                    195: measured is only a factor of 2-3 for large benchmarks; some other JITs
                    196: and all other interpreters we looked at were slower than our
                    197: interpreter).
1.1       anton     198: 
                    199: VMs are usually designed as stack machines (passing data between VM
                    200: instructions on a stack), and vmgen supports such designs especially
                    201: well; however, you can also use vmgen for implementing a register VM and
                    202: still benefit from most of the advantages offered by vmgen.
                    203: 
1.2       anton     204: There are many potential uses of the instruction descriptions that are
                    205: not implemented at the moment, but we are open for feature requests, and
                    206: we will implement new features if someone asks for them; so the feature
                    207: list above is not exhaustive.
1.1       anton     208: 
1.2       anton     209: @c *********************************************************************
1.10    ! anton     210: @node Why interpreters?, Concepts, Introduction, Top
1.2       anton     211: @chapter Why interpreters?
                    212: 
                    213: Interpreters are a popular language implementation technique because
                    214: they combine all three of the following advantages:
                    215: 
                    216: @itemize
                    217: 
                    218: @item Ease of implementation
                    219: 
                    220: @item Portability
                    221: 
                    222: @item Fast edit-compile-run cycle
                    223: 
                    224: @end itemize
                    225: 
                    226: The main disadvantage of interpreters is their run-time speed.  However,
                    227: there are huge differences between different interpreters in this area:
                    228: the slowdown over optimized C code on programs consisting of simple
                    229: operations is typically a factor of 10 for the more efficient
                    230: interpreters, and a factor of 1000 for the less efficient ones (the
                    231: slowdown for programs executing complex operations is less, because the
                    232: time spent in libraries for executing complex operations is the same in
                    233: all implementation strategies).
                    234: 
                    235: Vmgen makes it even easier to implement interpreters.  It also supports
                    236: techniques for building efficient interpreters.
                    237: 
                    238: @c ********************************************************************
1.10    ! anton     239: @node Concepts, Invoking vmgen, Why interpreters?, Top
1.2       anton     240: @chapter Concepts
                    241: 
1.10    ! anton     242: @menu
        !           243: * Front end and VM interpreter::  Modularizing an interpretive system
        !           244: * Data handling::               Stacks, registers, immediate arguments
        !           245: * Dispatch::                    From one VM instruction to the next
        !           246: @end menu
        !           247: 
1.2       anton     248: @c --------------------------------------------------------------------
1.10    ! anton     249: @node Front end and VM interpreter, Data handling, Concepts, Concepts
        !           250: @section Front end and VM interpreter
1.2       anton     251: 
                    252: @cindex front-end
                    253: Interpretive systems are typically divided into a @emph{front end} that
                    254: parses the input language and produces an intermediate representation
                    255: for the program, and an interpreter that executes the intermediate
                    256: representation of the program.
                    257: 
                    258: @cindex virtual machine
                    259: @cindex VM
                    260: @cindex instruction, VM
                    261: For efficient interpreters the intermediate representation of choice is
                    262: virtual machine code (rather than, e.g., an abstract syntax tree).
                    263: @emph{Virtual machine} (VM) code consists of VM instructions arranged
                    264: sequentially in memory; they are executed in sequence by the VM
                    265: interpreter, except for VM branch instructions, which implement control
                    266: structures.  The conceptual similarity to real machine code results in
                    267: the name @emph{virtual machine}.
                    268: 
                    269: In this framework, vmgen supports building the VM interpreter and any
                    270: other component dealing with VM instructions.  It does not have any
                    271: support for the front end, apart from VM code generation support.  The
                    272: front end can be implemented with classical compiler front-end
1.3       anton     273: techniques, supported by tools like @command{flex} and @command{bison}.
1.2       anton     274: 
                    275: The intermediate representation is usually just internal to the
                    276: interpreter, but some systems also support saving it to a file, either
                    277: as an image file, or in a full-blown linkable file format (e.g., JVM).
                    278: Vmgen currently has no special support for such features, but the
                    279: information in the instruction descriptions can be helpful, and we are
                    280: open for feature requests and suggestions.
1.3       anton     281: 
1.10    ! anton     282: @c --------------------------------------------------------------------
        !           283: @node Data handling, Dispatch, Front end and VM interpreter, Concepts
1.3       anton     284: @section Data handling
                    285: 
                    286: @cindex stack machine
                    287: @cindex register machine
                    288: Most VMs use one or more stacks for passing temporary data between VM
                    289: instructions.  Another option is to use a register machine architecture
                    290: for the virtual machine; however, this option is either slower or
                    291: significantly more complex to implement than a stack machine architecture.
                    292: 
                    293: Vmgen has special support and optimizations for stack VMs, making their
                    294: implementation easy and efficient.
                    295: 
                    296: You can also implement a register VM with vmgen (@pxref{Register
                    297: Machines}), and you will still profit from most vmgen features.
                    298: 
                    299: @cindex stack item size
                    300: @cindex size, stack items
                    301: Stack items all have the same size, so they typically will be as wide as
                    302: an integer, pointer, or floating-point value.  Vmgen supports treating
                    303: two consecutive stack items as a single value, but anything larger is
                    304: best kept in some other memory area (e.g., the heap), with pointers to
                    305: the data on the stack.
                    306: 
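The fixed stack item size can be pictured with the following C sketch.
It is only an illustration under stated assumptions: the names
Cell, push_string and demo_string_length are hypothetical and are not
part of Vmgen or of its generated code.  It shows a union-based stack
item type and a larger value (a string) that is kept on the heap, with
only a pointer to it on the stack.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stack item type: every member occupies one stack item. */
typedef union {
  long   n;   /* integer */
  void  *p;   /* pointer, e.g., to a heap object */
  float  f;   /* floating-point value */
} Cell;

/* A value too large for a stack item lives on the heap;
   only the pointer to it is pushed.  Returns the new stack pointer. */
static Cell *push_string(Cell *sp, const char *s)
{
  char *copy = malloc(strlen(s) + 1);
  assert(copy != NULL);
  strcpy(copy, s);
  sp->p = copy;
  return sp + 1;
}

/* Push a string, read it back through the top-of-stack pointer,
   and release the heap object again. */
static size_t demo_string_length(void)
{
  Cell stack[8];
  Cell *sp = stack;        /* points to the next free item */
  size_t len;

  sp = push_string(sp, "hello");
  len = strlen((const char *)sp[-1].p);
  free(sp[-1].p);
  return len;
}
```

In this sketch the stack itself never grows beyond one item per value,
whatever the size of the heap object behind the pointer.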
                    307: @cindex instruction stream
                    308: @cindex immediate arguments
                    309: Another source of data is immediate arguments of VM instructions (in the VM
                    310: instruction stream).  The VM instruction stream is handled similarly to a
                    311: stack in vmgen.
                    312: 
                    313: @cindex garbage collection
                    314: @cindex reference counting
                    315: Vmgen has no built-in support for nor restrictions against @emph{garbage
                    316: collection}.  If you need garbage collection, you need to provide it in
                    317: your run-time libraries.  Using @emph{reference counting} is probably
                    318: harder, but might be possible (contact us if you are interested).
                    319: @c reference counting might be possible by including counting code in 
                    320: @c the conversion macros.
                    321: 
1.10    ! anton     322: @c --------------------------------------------------------------------
        !           323: @node Dispatch,  , Data handling, Concepts
1.6       anton     324: @section Dispatch
                    325: 
                    326: Understanding this section is probably not necessary for using vmgen,
                    327: but it may help.  You may want to skip it now, and read it if you find statements about dispatch methods confusing.
                    328: 
                    329: After executing one VM instruction, the VM interpreter has to dispatch
                    330: the next VM instruction (vmgen calls the dispatch routine @samp{NEXT}).
                    331: Vmgen supports two methods of dispatch:
                    332: 
                    333: @table @emph
                    334: 
                    335: @item switch dispatch
                    336: In this method the VM interpreter contains a giant @code{switch}
                    337: statement, with one @code{case} for each VM instruction.  The VM
                    338: instructions are represented by integers (e.g., produced by an
                    339: @code{enum}) in the VM code, and dispatch occurs by loading the next
                    340: integer from the VM code, @code{switch}ing on it, and continuing at the
                    341: appropriate @code{case}; after executing the VM instruction, control jumps
                    342: back to the dispatch code.
                    343: 
                    344: @item threaded code
                    345: This method represents a VM instruction in the VM code by the address of
                    346: the start of the machine code fragment for executing the VM instruction.
                    347: Dispatch consists of loading this address, jumping to it, and
                    348: incrementing the VM instruction pointer.  Typically the threaded-code
                    349: dispatch code is appended directly to the code for executing the VM
                    350: instruction.  Threaded code cannot be implemented in ANSI C, but it can
                    351: be implemented using GNU C's labels-as-values extension (@pxref{labels
                    352: as values}).
                    353: 
                    354: @end table
                    355: 
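The two dispatch methods can be compared in a small hand-written C
sketch.  This is not code generated by Vmgen; the three-instruction VM
and the names run_switch, run_threaded, Cell and demo_prog are
hypothetical and chosen only for illustration.  The threaded-code
variant assumes a compiler with GNU C's labels-as-values extension and
translates the integer opcodes into code addresses before running.

```c
#include <assert.h>

/* Hypothetical three-instruction stack VM, for illustration only. */
enum { OP_LIT, OP_ADD, OP_HALT };

/* lit 2, lit 3, add, halt */
static const long demo_prog[] = { OP_LIT, 2, OP_LIT, 3, OP_ADD, OP_HALT };

/* Switch dispatch: VM instructions are integers in the VM code. */
static long run_switch(const long *ip)
{
  long stack[16];
  long *sp = stack;                  /* points to the next free item */
  for (;;) {
    switch (*ip++) {                 /* load the next opcode, dispatch on it */
    case OP_LIT:  *sp++ = *ip++;         break;  /* immediate argument */
    case OP_ADD:  sp--; sp[-1] += sp[0]; break;
    case OP_HALT: return sp[-1];
    default:      return -1;         /* unknown opcode */
    }
  }
}

#ifdef __GNUC__
typedef union { void *inst; long n; } Cell;

/* Threaded code: VM instructions are addresses of code fragments. */
static long run_threaded(const long *code, int len)
{
  static void *labels[] = { &&lit, &&add, &&halt };
  Cell tc[32];
  Cell *ip = tc;
  long stack[16];
  long *sp = stack;
  int i;

  assert(len <= 32);
  /* Translate opcodes into label addresses; copy immediates unchanged. */
  for (i = 0; i < len; i++) {
    long op = code[i];
    tc[i].inst = labels[op];
    if (op == OP_LIT) { i++; tc[i].n = code[i]; }
  }

/* Dispatch: load the address, increment the VM instruction pointer, jump. */
#define NEXT goto *(ip++)->inst
  NEXT;
lit:  *sp++ = (ip++)->n;   NEXT;
add:  sp--; sp[-1] += sp[0]; NEXT;
halt: return sp[-1];
#undef NEXT
}
#endif
```

Both functions compute the same result for demo_prog; note that in the
threaded variant the dispatch code (NEXT) is appended to each
instruction's code instead of being shared in one central place.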
1.3       anton     356: @c *************************************************************
1.10    ! anton     357: @node Invoking vmgen, Example, Concepts, Top
1.3       anton     358: @chapter Invoking vmgen
                    359: 
                    360: The usual way to invoke vmgen is as follows:
                    361: 
                    362: @example
                    363: vmgen @var{infile}
                    364: @end example
                    365: 
                    366: Here @var{infile} is the VM instruction description file, which usually
                    367: ends in @file{.vmg}.  The output filenames are made by taking the
                    368: basename of @var{infile} (i.e., the output files will be created in the
                    369: current working directory) and replacing @file{.vmg} with @file{-vm.i},
                    370: @file{-disasm.i}, @file{-gen.i}, @file{-labels.i}, @file{-profile.i},
                    371: and @file{-peephole.i}.  E.g., @command{vmgen hack/foo.vmg} will create
                    372: @file{foo-vm.i} etc.
                    373: 
                    374: The command-line options supported by vmgen are
                    375: 
                    376: @table @option
                    377: 
                    378: @cindex -h, command-line option
                    379: @cindex --help, command-line option
                    380: @item --help
                    381: @itemx -h
                    382: Print a message about the command-line options
                    383: 
                    384: @cindex -v, command-line option
                    385: @cindex --version, command-line option
                    386: @item --version
                    387: @itemx -v
                    388: Print version and exit
                    389: @end table
                    390: 
                    391: @c env vars GFORTHDIR GFORTHDATADIR
                    392: 
1.5       anton     393: @c ****************************************************************
1.10    ! anton     394: @node Example, Input File Format, Invoking vmgen, Top
1.5       anton     395: @chapter Example
                    396: 
1.10    ! anton     397: @menu
        !           398: * Example overview::            
        !           399: * Using profiling to create superinstructions::  
        !           400: @end menu
        !           401: 
        !           402: @c --------------------------------------------------------------------
        !           403: @node Example overview, Using profiling to create superinstructions, Example, Example
1.5       anton     404: @section Example overview
                    405: 
                    406: There are two versions of the same example for using vmgen:
                    407: @file{vmgen-ex} and @file{vmgen-ex2} (you can also see Gforth as an
                    408: example, but it uses additional (undocumented) features, and also
                    409: differs in some other respects).  The example implements @emph{mini}, a
                    410: tiny Modula-2-like language with a small JavaVM-like virtual machine.
                    411: The difference between the examples is that @file{vmgen-ex} uses many
                    412: casts, and @file{vmgen-ex2} tries to avoid most casts and uses unions
                    413: instead.
                    414: 
                    415: The files provided with each example are:
                    416: 
                    417: @example
                    418: Makefile
                    419: README
                    420: disasm.c           wrapper file
                    421: engine.c           wrapper file
                    422: peephole.c         wrapper file
                    423: profile.c          wrapper file
                    424: mini-inst.vmg      simple VM instructions
                    425: mini-super.vmg     superinstructions (empty at first)
                    426: mini.h             common declarations
                    427: mini.l             scanner
                    428: mini.y             front end (parser, VM code generator)
                    429: support.c          main() and other support functions
                    430: fib.mini           example mini program
                    431: simple.mini        example mini program
                    432: test.mini          example mini program (tests everything)
                    433: test.out           test.mini output
                    434: stat.awk           script for aggregating profile information
                    435: peephole-blacklist list of instructions not allowed in superinstructions
                    436: seq2rule.awk       script for creating superinstructions
                    437: @end example
                    438: 
                    439: For your own interpreter, you would typically copy the following files
                    440: and change little, if anything:
                    441: 
                    442: @example
                    443: disasm.c           wrapper file
                    444: engine.c           wrapper file
                    445: peephole.c         wrapper file
                    446: profile.c          wrapper file
                    447: stat.awk           script for aggregating profile information
                    448: seq2rule.awk       script for creating superinstructions
                    449: @end example
                    450: 
                    451: You would typically change much in or replace the following files:
                    452: 
                    453: @example
                    454: Makefile
                    455: mini-inst.vmg      simple VM instructions
                    456: mini.h             common declarations
                    457: mini.l             scanner
                    458: mini.y             front end (parser, VM code generator)
                    459: support.c          main() and other support functions
                    460: peephole-blacklist list of instructions not allowed in superinstructions
                    461: @end example
                    462: 
                    463: You can build the example by @code{cd}ing into the example's directory,
                    464: and then typing @samp{make}; you can check that it works with @samp{make
                    465: check}.  You can run mini programs like this:
                    466: 
                    467: @example
                    468: ./mini fib.mini
                    469: @end example
                    470: 
                    471: To learn about the options, type @samp{./mini -h}.
                    472: 
1.10    ! anton     473: @c --------------------------------------------------------------------
        !           474: @node Using profiling to create superinstructions,  , Example overview, Example
1.5       anton     475: @section Using profiling to create superinstructions
                    476: 
                    477: I have not added rules for this in the @file{Makefile} (there are many
                    478: options for selecting superinstructions, and I did not want to hardcode
                    479: one into the @file{Makefile}), but there are some supporting scripts, and
                    480: here's an example:
                    481: 
                    482: If you want to use @file{fib.mini} and @file{test.mini} as training
                    483: programs, you get the profiles like this:
                    484: 
                    485: @example
                    486: make fib.prof test.prof #takes a few seconds
                    487: @end example
                    488: 
                    489: You can aggregate these profiles with @file{stat.awk}:
                    490: 
                    491: @example
                    492: awk -f stat.awk fib.prof test.prof
                    493: @end example
                    494: 
                    495: The result contains lines like:
                    496: 
                    497: @example
                    498:       2      16        36910041 loadlocal lit
                    499: @end example
                    500: 
                    501: This means that the sequence @code{loadlocal lit} statically occurs a
                    502: total of 16 times in 2 profiles, with a dynamic execution count of
                    503: 36910041.
                    504: 
                    505: The numbers can be used in various ways to select superinstructions.
                    506: E.g., if you just want to select all sequences with a dynamic
                    507: execution count of at least 10000, you would use the following pipeline:
                    508: 
                    509: @example
                    510: awk -f stat.awk fib.prof test.prof|
                    511: awk '$3>=10000'|                #select sequences
                    512: fgrep -v -f peephole-blacklist| #eliminate wrong instructions
                    513: awk -f seq2rule.awk|      #transform sequences into superinstruction rules
                    514: sort -k 3 >mini-super.vmg       #sort sequences
                    515: @end example
                    516: 
                    517: The file @file{peephole-blacklist} contains all instructions that
                    518: directly access a stack or stack pointer (for mini: @code{call},
                    519: @code{return}); the sort step is necessary to ensure that prefixes
                    520: precede larger superinstructions.
                    521: 
                    522: Now you can create a version of mini with superinstructions by just
                    523: saying @samp{make}.
                    524: 
1.10    ! anton     525: 
1.3       anton     526: @c ***************************************************************
1.10    ! anton     527: @node Input File Format, Using the generated code, Example, Top
1.3       anton     528: @chapter Input File Format
                    529: 
                    530: Vmgen takes as input a file containing specifications of virtual machine
                    531: instructions.  This file usually has a name ending in @file{.vmg}.
                    532: 
1.5       anton     533: Most examples are taken from the example in @file{vmgen-ex}.
1.3       anton     534: 
1.10    ! anton     535: @menu
        !           536: * Input File Grammar::          
        !           537: * Simple instructions::         
        !           538: * Superinstructions::           
        !           539: @end menu
        !           540: 
        !           541: @c --------------------------------------------------------------------
        !           542: @node Input File Grammar, Simple instructions, Input File Format, Input File Format
1.3       anton     543: @section Input File Grammar
                    544: 
                    545: The grammar is in EBNF format, with @code{@var{a}|@var{b}} meaning
                    546: ``@var{a} or @var{b}'', @code{@{@var{c}@}} meaning 0 or more repetitions
                    547: of @var{c} and @code{[@var{d}]} meaning 0 or 1 repetitions of @var{d}.
                    548: 
                    549: Vmgen input is not free-format, so you have to take care where you put
                    550: spaces and especially newlines; it's not as bad as makefiles, though:
                    551: any sequence of spaces and tabs is equivalent to a single space.
                    552: 
                    553: @example
                    554: description: @{instruction|comment|eval-escape@}
                    555: 
                    556: instruction: simple-inst|super-inst
                    557: 
                    558: simple-inst: ident " (" stack-effect " )" newline c-code newline newline
                    559: 
                    560: stack-effect: @{ident@} " --" @{ident@}
                    561: 
                    562: super-inst: ident " =" ident @{ident@}
                    563: 
                    564: comment:      "\ "  text newline
                    565: 
                    566: eval-escape:  "\E " text newline
                    567: @end example
                    568: @c \+ \- \g \f \c
                    569: 
                    570: Note that the @code{\}s in this grammar are meant literally, not as
1.5       anton     571: C-style encodings for non-printable characters.
1.3       anton     572: 
                    573: The C code in @code{simple-inst} must not contain empty lines (because
                    574: vmgen would mistake that for the end of the simple-inst).  The text in
                    575: @code{comment} and @code{eval-escape} must not contain a newline.
                    576: @code{Ident} must conform to the usual conventions of C identifiers
                    577: (otherwise the C compiler would choke on the vmgen output).
                    578: 
                    579: Vmgen understands a few extensions beyond the grammar given here, but
                    580: these extensions are only useful for building Gforth.  You can find a
                    581: description of the format used for Gforth in @file{prim}.
                    582: 
1.10    ! anton     583: @subsection Eval escapes
1.3       anton     584: @c woanders?
                    585: The text in @code{eval-escape} is Forth code that is evaluated when
                    586: vmgen reads the line.  If you do not know (and do not want to learn)
                    587: Forth, you can build the text according to the following grammar; these
                    588: rules are normally all Forth you need for using vmgen:
                    589: 
                    590: @example
                    591: text: stack-decl|type-prefix-decl|stack-prefix-decl
                    592: 
                    593: stack-decl: "stack " ident ident ident
                    594: type-prefix-decl: 
                    595:     's" ' string '" ' ("single"|"double") ident "type-prefix" ident
                    596: stack-prefix-decl:  ident "stack-prefix" string
                    597: @end example
                    598: 
                    599: Note that the syntax of this code is not checked thoroughly (there are
                    600: many other Forth program fragments that could be written there).
                    601: 
                    602: If you know Forth, the stack effects of the non-standard words involved
                    603: are:
                    604: 
                    605: @example
                    606: stack        ( "name" "pointer" "type" -- )
                    607:              ( name execution: -- stack )
                    608: type-prefix  ( addr u xt1 xt2 n stack "prefix" -- )
                    609: single       ( -- xt1 xt2 n )
                    610: double       ( -- xt1 xt2 n )
                    611: stack-prefix ( stack "prefix" -- )
                    612: @end example
                    613: 
1.5       anton     614: 
1.10    ! anton     615: @c --------------------------------------------------------------------
        !           616: @node Simple instructions, Superinstructions, Input File Grammar, Input File Format
1.3       anton     617: @section Simple instructions
                    618: 
                    619: We will use the following simple VM instruction description as an example:
                    620: 
                    621: @example
                    622: sub ( i1 i2 -- i )
                    623: i = i1-i2;
                    624: @end example
                    625: 
                    626: The first line specifies the name of the VM instruction (@code{sub}) and
                    627: its stack effect (@code{i1 i2 -- i}).  The rest of the description is
                    628: just plain C code.
                    629: 
                    630: @cindex stack effect
                    631: The stack effect specifies that @code{sub} pulls two integers from the
1.5       anton     632: data stack and puts them in the C variables @code{i1} and @code{i2} (with
1.3       anton     633: the rightmost item (@code{i2}) taken from the top of stack) and later
                    634: pushes one integer (@code{i}) on the data stack (the rightmost item is
                    635: on the top afterwards).
                    636: 
                    637: How do we know the type and stack of the stack items?  Vmgen uses
                    638: prefixes, similar to Fortran; in contrast to Fortran, you have to
                    639: define the prefix first:
                    640: 
                    641: @example
                    642: \E s" Cell"   single data-stack type-prefix i
                    643: @end example
                    644: 
                    645: This defines the prefix @code{i} to refer to the type @code{Cell}
                    646: (defined as @code{long} in @file{mini.h}) and, by default, to the
                    647: @code{data-stack}.  It also specifies that this type takes one stack
                    648: item (@code{single}).  The type prefix is part of the variable name.
                    649: 
                    650: Before we can use @code{data-stack} in this way, we have to define it:
                    651: 
                    652: @example
                    653: \E stack data-stack sp Cell
                    654: @end example
                    655: @c !! use something other than Cell
                    656: 
                    657: This line defines the stack @code{data-stack}, which uses the stack
                    658: pointer @code{sp}, and each item has the basic type @code{Cell}; other
                    659: types have to fit into one or two @code{Cell}s (depending on whether the
                    660: type is @code{single} or @code{double} wide), and are converted from and
                    661: to Cells on accessing the @code{data-stack} with conversion macros
                    662: (@pxref{Conversion macros}).  Stacks grow towards lower addresses in
1.5       anton     663: vmgen-erated interpreters.
1.3       anton     664: 
                    665: We can override the default stack of a stack item by using a stack
                    666: prefix.  E.g., consider the following instruction:
                    667: 
                    668: @example
                    669: lit ( #i -- i )
                    670: @end example
                    671: 
                    672: The VM instruction @code{lit} takes the item @code{i} from the
1.5       anton     673: instruction stream (indicated by the prefix @code{#}), and pushes it on
1.3       anton     674: the (default) data stack.  The stack prefix is not part of the variable
                    675: name.  Stack prefixes are defined like this:
                    676: 
                    677: @example
                    678: \E inst-stream stack-prefix #
                    679: @end example
                    680: 
1.5       anton     681: This definition specifies that the stack prefix @code{#} refers to the
1.3       anton     682: ``stack'' @code{inst-stream}.  Since the instruction stream behaves a
                    683: little differently than an ordinary stack, it is predefined, and you do
                    684: not need to define it.
                    685: 
                    686: The instruction stream contains instructions and their immediate
                    687: arguments, so specifying that an argument comes from the instruction
                    688: stream indicates an immediate argument.  Of course, instruction stream
                    689: arguments can only appear to the left of @code{--} in the stack effect.
                    690: If there are multiple instruction stream arguments, the leftmost is the
                    691: first one (just as the intuition suggests).
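
                         Putting the declarations and instructions of this section together, a
                         minimal input file could look like the following sketch.  It only
                         combines the fragments shown above; the order of the declarations is an
                         assumption that follows from the text (a stack has to be defined before
                         a type prefix refers to it):

                         @example
                         \E stack data-stack sp Cell
                         \E s" Cell"   single data-stack type-prefix i
                         \E inst-stream stack-prefix #

                         lit ( #i -- i )

                         sub ( i1 i2 -- i )
                         i = i1-i2;
                         @end example

                         Here @code{lit} needs no C code at all: the stack effect alone says that
                         @code{i} is taken from the instruction stream and pushed on the data
                         stack.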
                    692: 
1.10    ! anton     693: @menu
        !           694: * C Code Macros::               Macros recognized by Vmgen
        !           695: * C Code restrictions::         Vmgen makes assumptions about C code
        !           696: @end menu
        !           697: 
        !           698: @c --------------------------------------------------------------------
        !           699: @node C Code Macros, C Code restrictions, Simple instructions, Simple instructions
        !           700: @subsection C Code Macros
1.5       anton     701: 
                    702: Vmgen recognizes the following strings in the C code part of simple
                    703: instructions:
                    704: 
                    705: @table @samp
                    706: 
                    707: @item SET_IP
                    708: As far as vmgen is concerned, a VM instruction containing this ends a VM
                    709: basic block (used in profiling to delimit profiled sequences).  On the C
                    710: level, this also sets the instruction pointer.
                    711: 
                    712: @item SUPER_END
                    713: This ends a basic block (for profiling), without a SET_IP.
                    714: 
                    715: @item TAIL;
                    716: Vmgen replaces @samp{TAIL;} with code for ending a VM instruction and
                    717: dispatching the next VM instruction.  This happens automatically when
                    718: control reaches the end of the C code.  If you want to have this in the
                    719: middle of the C code, you need to use @samp{TAIL;}.  A typical example
                    720: is a conditional VM branch:
                    721: 
                    722: @example
                    723: if (branch_condition) @{
                    724:   SET_IP(target); TAIL;
                    725: @}
                    726: /* implicit tail follows here */
                    727: @end example
                    728: 
                    729: In this example, @samp{TAIL;} is not strictly necessary, because there
                    730: is another one implicitly after the if-statement, but using it improves
                    731: branch prediction accuracy slightly and allows other optimizations.
                    732: 
                    733: @item SUPER_CONTINUE
                    734: This indicates that the implicit tail at the end of the VM instruction
                    735: dispatches the sequentially next VM instruction even if there is a
                    736: @code{SET_IP} in the VM instruction.  This enables an optimization that
                    737: is not yet implemented in the vmgen-ex code (but in Gforth).  The
                    738: typical application is in conditional VM branches:
                    739: 
                    740: @example
                    741: if (branch_condition) @{
                    742:   SET_IP(target); TAIL; /* now this TAIL is necessary */
                    743: @}
                    744: SUPER_CONTINUE;
                    745: @end example
                    746: 
                    747: @end table
                    748: 
                    749: Note that vmgen is not smart about C-level tokenization, comments,
                    750: strings, or conditional compilation, so it will interpret even a
                    751: commented-out SUPER_END as ending a basic block (or, e.g.,
                    752: @samp{RETAIL;} as @samp{TAIL;}).  Conversely, vmgen requires the literal
                    753: presence of these strings; vmgen will not see them if they are hiding in
                    754: a C preprocessor macro.
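
                         As a sketch of how these strings appear in a complete instruction
                         description, a conditional branch might be written as follows.  The
                         instruction name @code{zbranch}, the C type @code{Inst}, and the type
                         prefix @code{target} with its declaration are assumptions made for this
                         illustration; the prefix @code{i} is assumed to be declared as in the
                         previous section:

                         @example
                         \E s" Inst *" single inst-stream type-prefix target

                         zbranch ( #target i -- )
                         if (i==0) @{
                           SET_IP(target); TAIL;
                         @}
                         @end example

                         Since @samp{SET_IP} occurs literally in the C code, vmgen treats
                         @code{zbranch} as ending a VM basic block.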
                    755: 
                    756: 
1.10    ! anton     757: @c --------------------------------------------------------------------
        !           758: @node C Code restrictions,  , C Code Macros, Simple instructions
        !           759: @subsection C Code restrictions
1.5       anton     760: 
                    761: Vmgen generates code and performs some optimizations under the
                    762: assumption that the user-supplied C code does not access the stack
                    763: pointers or stack items, and that accesses to the instruction pointer
                    764: only occur through special macros.  In general you should heed these
                    765: restrictions.  However, if you need to break these restrictions, read
                    766: the following.
                    767: 
                    768: Accessing a stack or stack pointer directly can be a problem for several
                    769: reasons: 
                    770: 
                    771: @itemize
                    772: 
                    773: @item
                    774: You may cache the top-of-stack item in a local variable (that is
                    775: allocated to a register).  This is the most frequent source of trouble.
                    776: You can deal with it either by not using top-of-stack caching (slowdown
                    777: factor 1-1.4, depending on machine), or by inserting flushing code
                    778: (e.g., @samp{IF_spTOS(sp[...] = spTOS);}) at the start and reloading
                    779: code (e.g., @samp{IF_spTOS(spTOS = sp[0])}) at the end of problematic C
                    780: code.  Vmgen inserts a stack pointer update before the start of the
                    781: user-supplied C code, so the flushing code has to use an index that
                    782: corrects for that.  In the future, this flushing may be done
                    783: automatically by mentioning a special string in the C code.
                    784: @c sometimes flushing and/or reloading unnecessary
                    785: 
                    786: @item
                    787: The vmgen-erated code loads the stack items from stack-pointer-indexed
                    788: memory into variables before the user-supplied C code, and stores them
                    789: from variables to stack-pointer-indexed memory afterwards.  If you do
                    790: any writes to the stack through its stack pointer in your C code, it
                    791: will not affect the variables, and your write may be overwritten by the
                    792: stores after the C code.  Similarly, a read from a stack using a stack
                    793: pointer will not reflect computations of stack items in the same VM
                    794: instruction.
                    795: 
                    796: @item
                    797: Superinstructions keep stack items in variables across the whole
                    798: superinstruction.  So you should not include VM instructions that
                    799: access a stack or stack pointer as components of superinstructions.
                    800: 
                    801: @end itemize
                    802: 
                    803: You should access the instruction pointer only through its special
                    804: macros (@samp{IP}, @samp{SET_IP}, @samp{IPTOS}); this ensures that these
                    805: macros can be implemented in several ways for best performance.
                    806: @samp{IP} points to the next instruction, and @samp{IPTOS} is its
                    807: contents.
                    808: 
                    809: 
1.10    ! anton     810: @c --------------------------------------------------------------------
        !           811: @node Superinstructions,  , Simple instructions, Input File Format
1.3       anton     812: @section Superinstructions
1.5       anton     813: 
1.8       anton     814: Note: don't invest too much work in (static) superinstructions; a future
                    815: version of vmgen will support dynamic superinstructions (see Ian
                    816: Piumarta and Fabio Riccardi, @cite{Optimizing Direct Threaded Code by
                    817: Selective Inlining}, PLDI'98), and static superinstructions have much
                    818: less benefit in that context.
                    819: 
1.5       anton     820: Here is an example of a superinstruction definition:
                    821: 
                    822: @example
                    823: lit_sub = lit sub
                    824: @end example
                    825: 
                    826: @code{lit_sub} is the name of the superinstruction, and @code{lit} and
                    827: @code{sub} are its components.  This superinstruction performs the same
                    828: action as the sequence @code{lit} and @code{sub}.  It is generated
                    829: automatically by the VM code generation functions whenever that sequence
                    830: occurs, so you only need to add this definition if you want to use this
                    831: superinstruction (and even that can be partially automated,
                    832: @pxref{...}).
                    833: 
                    834: Vmgen requires that the component instructions are simple instructions
                    835: defined before superinstructions using the components.  Currently, vmgen
                    836: also requires that all the subsequences at the start of a
                    837: superinstruction (prefixes) must be defined as superinstructions before
                    838: the superinstruction.  I.e., if you want to define a superinstruction
                    839: 
                    840: @example
                    841: sumof5 = add add add add
                    842: @end example
                    843: 
                    844: you first have to define
                    845: 
                    846: @example
                    847: add ( n1 n2 -- n )
                    848: n = n1+n2;
                    849: 
                    850: sumof3 = add add
                    851: sumof4 = add add add
                    852: @end example
                    853: 
                    854: Here, @code{sumof4} is the longest prefix of @code{sumof5}, and @code{sumof3}
                    855: is the longest prefix of @code{sumof4}.
                    856: 
                    857: Note that vmgen assumes that only the code it generates accesses stack
                    858: pointers, the instruction pointer, and various stack items, and it
                    859: performs optimizations based on this assumption.  Therefore, VM
                    860: instructions that change the instruction pointer should only be used as
                    861: the last component; a VM instruction that accesses a stack pointer should
                    862: not be used as a component at all.  Vmgen does not check these
                    863: restrictions; violating them just results in bugs in your interpreter.
                    864: 
                    865: @c ********************************************************************
1.10    ! anton     866: @node Using the generated code, Changes, Input File Format, Top
1.5       anton     867: @chapter Using the generated code
                    868: 
                    869: The easiest way to create a working VM interpreter with vmgen is
                    870: probably to start with one of the examples, and modify it for your
                    871: purposes.  This chapter is just the reference manual for the macros
1.10    ! anton     872: etc. used by the generated code, the other context expected by the
1.5       anton     873: generated code, and what you can do with the various generated files.
                    874: 
1.10    ! anton     875: @menu
        !           876: * VM engine::                   Executing VM code
        !           877: * VM instruction table::        
        !           878: * VM code generation::          Creating VM code (in the front-end)
        !           879: * Peephole optimization::       Creating VM superinstructions
        !           880: * VM disassembler::             for debugging the front end
        !           881: * VM profiler::                 for finding worthwhile superinstructions
        !           882: @end menu
1.6       anton     883: 
1.10    ! anton     884: @c --------------------------------------------------------------------
        !           885: @node VM engine, VM instruction table, Using the generated code, Using the generated code
1.5       anton     886: @section VM engine
                    887: 
                    888: The VM engine is the VM interpreter that executes the VM code.  It is
                    889: essential for an interpretive system.
                    890: 
1.6       anton     891: Vmgen supports two methods of VM instruction dispatch: @emph{threaded
                    892: code} (fast, but gcc-specific), and @emph{switch dispatch} (slow, but
                    893: portable across C compilers); you can use conditional compilation
                    894: (@samp{defined(__GNUC__)}) to choose between these methods, and our
                    895: example does so.
                    896: 
                    897: For both methods, the VM engine is contained in a C-level function.
                    898: Vmgen generates most of the contents of the function for you
                    899: (@file{@var{name}-vm.i}), but you have to define this function, and
                    900: macros and variables used in the engine, and initialize the variables.
                    901: In our example the engine function also includes
                    902: @file{@var{name}-labels.i} (@pxref{VM instruction table}).
                    903: 
                    904: The following macros and variables are used in @file{@var{name}-vm.i}:
1.5       anton     905: 
                    906: @table @code
                    907: 
                    908: @item LABEL(@var{inst_name})
                    909: This is used just before each VM instruction to provide a jump or
                    910: @code{switch} label (the @samp{:} is provided by vmgen).  For switch
                    911: dispatch this should expand to @samp{case @var{label}}; for
                    912: threaded-code dispatch this should just expand to
                    913: @samp{@var{label}}.  In either case @var{label} is usually the @var{inst_name}
                    914: with some prefix or suffix to avoid naming conflicts.
                    915: 
1.9       anton     916: @item LABEL2(@var{inst_name})
                    917: This will be used for dynamic superinstructions; at the moment, this
                    918: should expand to nothing.
                    919: 
1.5       anton     920: @item NAME(@var{inst_name_string})
                    921: Called on entering a VM instruction with a string containing the name of
                    922: the VM instruction as parameter.  In normal execution this should be a
                    923: noop, but for tracing this usually prints the name, and possibly other
                    924: information (several VM registers in our example).
                    925: 
                    926: @item DEF_CA
                    927: Usually empty.  Called just inside a new scope at the start of a VM
                    928: instruction.  Can be used to define variables that should be visible
                    929: during every VM instruction.  If you define this macro as non-empty, you
                    930: have to provide the finishing @samp{;} in the macro.
                    931: 
                    932: @item NEXT_P0 NEXT_P1 NEXT_P2
                    933: The three parts of instruction dispatch.  They can be defined in
                    934: different ways for best performance on various processors (see
                    935: @file{engine.c} in the example or @file{engine/threaded.h} in Gforth).
                    936: @samp{NEXT_P0} is invoked right at the start of the VM instruction (but
                    937: after @samp{DEF_CA}), @samp{NEXT_P1} right after the user-supplied C
                    938: code, and @samp{NEXT_P2} at the end.  The actual jump has to be
                    939: performed by @samp{NEXT_P2}.
                    940: 
                    941: The simplest variant is if @samp{NEXT_P2} does everything and the other
                    942: macros do nothing.  Then also related macros like @samp{IP},
                    943: @samp{SET_IP}, @samp{INC_IP} and @samp{IPTOS} are very
                    944: straightforward to define.  For switch dispatch this code consists just
                    945: of a jump to the dispatch code (@samp{goto next_inst;} in our example);
                    946: for direct threaded code it consists of something like
                    947: @samp{(@{cfa=*ip++; goto *cfa;@})}.
                    948: 
                    949: Pulling code (usually the @samp{cfa=*ip;}) up into @samp{NEXT_P1}
                    950: usually does not cause problems, but pulling things up into
                    951: @samp{NEXT_P0} usually requires changing the other macros (and, at least
                    952: for Gforth on Alpha, it does not buy much, because the compiler often
                    953: manages to schedule the relevant stuff up by itself).  An even more
                    954: extreme variant is to pull code up even further, into, e.g., NEXT_P1 of
                    955: the previous VM instruction (prefetching, useful on PowerPCs).
                    956: 
                    957: @item INC_IP(@var{n})
1.8       anton     958: This increments @code{IP} by @var{n}.
                    959: 
                    960: @item SET_IP(@var{target})
                    961: This sets @code{IP} to @var{target}.
1.5       anton     962: 
                    963: @item vm_@var{A}2@var{B}(a,b)
                    964: Type casting macro that assigns @samp{a} (of type @var{A}) to @samp{b}
                    965: (of type @var{B}).  This is mainly used for getting stack items into
                    966: variables and back.  So you need to define macros for every combination
                    967: of stack basic type (@code{Cell} in our example) and type-prefix types
                    968: used with that stack (in both directions).  For the type-prefix type,
                    969: you use the type-prefix (not the C type string) as type name (e.g.,
                    970: @samp{vm_Cell2i}, not @samp{vm_Cell2Cell}).  In addition, you have to
                    971: define a vm_@var{X}2@var{X} macro for the stack basic type (used in
                    972: superinstructions).
                    973: 
                    974: The stack basic type for the predefined @samp{inst-stream} is
                    975: @samp{Cell}.  If you want a stack with the same item size, making its
                    976: basic type @samp{Cell} usually reduces the number of macros you have to
                    977: define.
                    978: 
                    979: Here our examples differ a lot: @file{vmgen-ex} uses casts in these
                    980: macros, whereas @file{vmgen-ex2} uses union-field selection (or
                    981: assignment to union fields).
                    982: 
                    983: @item vm_two@var{A}2@var{B}(a1,a2,b)
                    984: @item vm_@var{B}2two@var{A}(b,a1,a2)
                    985: Conversions between two stack items (@code{a1}, @code{a2}) and a
                    986: variable @code{b} of a type that takes two stack items.  This does not
                    987: occur in our small examples, but you can look at Gforth for examples.
                    988: 
                    989: @item @var{stackpointer}
                    990: For each stack used, the stackpointer name given in the stack
                    991: declaration is used.  For a regular stack this must be an l-value;
                    992: typically it is a variable declared as a pointer to the stack's basic
                    993: type.  For @samp{inst-stream}, the name is @samp{IP}, and it can be a
                    994: plain r-value; typically it is a macro that abstracts away the
                    995: differences between the various implementations of NEXT_P*.
                    996: 
                    997: @item @var{stackpointer}TOS
                    998: The top-of-stack for the stack pointed to by @var{stackpointer}.  If you
                    999: are using top-of-stack caching for that stack, this should be defined as
                   1000: a variable; if you are not using top-of-stack caching for that stack, this
                   1001: should be a macro expanding to @samp{@var{stackpointer}[0]}.  The stack
                   1002: pointer for the predefined @samp{inst-stream} is called @samp{IP}, so
                   1003: the top-of-stack is called @samp{IPTOS}.
                   1004: 
                   1005: @item IF_@var{stackpointer}TOS(@var{expr})
                   1006: Macro for executing @var{expr}, if top-of-stack caching is used for the
                   1007: @var{stackpointer} stack.  I.e., this should do @var{expr} if there is
                   1008: top-of-stack caching for @var{stackpointer}; otherwise it should do
                   1009: nothing.
                   1010: 
1.8       anton    1011: @item SUPER_END
                   1012: This is used by the VM profiler (@pxref{VM profiler}); it should not do
                   1013: anything in normal operation, and call @code{vm_count_block(IP)} for
                   1014: profiling.
                   1015: 
                   1016: @item SUPER_CONTINUE
                   1017: This is just a hint to vmgen and does nothing at the C level.
                   1018: 
1.5       anton    1019: @item VM_DEBUG
                   1020: If this is defined, the tracing code will be compiled in (slower
                   1021: interpretation, but better debugging).  Our example compiles two
                   1022: versions of the engine, a fast-running one that cannot trace, and one
                   1023: with potential tracing and profiling.
                   1024: 
                   1025: @item vm_debug
                   1026: Needed only if @samp{VM_DEBUG} is defined.  If this variable contains
                   1027: true, the VM instructions produce trace output.  It can be turned on or
                   1028: off at any time.
                   1029: 
                   1030: @item vm_out
                   1031: Needed only if @samp{VM_DEBUG} is defined.  Specifies the file on which
                   1032: to print the trace output (type @samp{FILE *}).
                   1033: 
                   1034: @item printarg_@var{type}(@var{value})
                   1035: Needed only if @samp{VM_DEBUG} is defined.  Macro or function for
                   1036: printing @var{value} in a way appropriate for the @var{type}.  This is
                   1037: used for printing the values of stack items during tracing.  @var{Type}
                   1038: is normally the type prefix specified in a @code{type-prefix} definition
                   1039: (e.g., @samp{printarg_i}); in superinstructions it is currently the
                   1040: basic type of the stack.
                   1041: 
                   1042: @end table
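
The interplay of @samp{@var{stackpointer}TOS} and
@samp{IF_@var{stackpointer}TOS} can be illustrated with a minimal
sketch.  It assumes a stack pointer named @code{sp}, a basic type
@code{Cell}, and a hand-written body for an @samp{add ( i1 i2 -- i )}
instruction; the function name and the exact layout of the body are
illustrative assumptions and not the code that vmgen emits.

```c
#include <assert.h>

typedef long Cell;               /* assumed basic type of the stack */

/* With top-of-stack caching for the stack pointed to by sp: */
#define IF_spTOS(expr) expr
/* Without caching one would instead define:
 *   #define spTOS (sp[0])
 *   #define IF_spTOS(expr)
 */

/* Hypothetical helper: pushes a and b, then runs an "add" body. */
Cell add_with_tos_cache(Cell a, Cell b)
{
    Cell stack[4];
    Cell *sp = stack + 4;        /* empty stack, grows downwards */
    Cell spTOS = 0;              /* cached copy of sp[0] */

    *--sp = a;
    *--sp = b;
    IF_spTOS(spTOS = sp[0]);     /* load the cache */

    /* body of "add ( i1 i2 -- i )" */
    {
        Cell i1 = sp[1];
        Cell i2 = spTOS;         /* the top item comes from the cache */
        Cell i = i1 + i2;
        sp += 1;
        spTOS = i;               /* the result stays in the cache */
    }
    assert(sp == stack + 3);     /* exactly one item is left */

    IF_spTOS(sp[0] = spTOS);     /* flush the cache to memory */
    return sp[0];
}
```

With caching, @code{spTOS} is an ordinary variable, so
@code{IF_spTOS} has to do the loads and stores that keep it
consistent with the stack in memory.  Without caching both of those
statements expand to nothing.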
                   1043: 
1.6       anton    1044: 
1.10    ! anton    1045: @c --------------------------------------------------------------------
        !          1046: @node VM instruction table, VM code generation, VM engine, Using the generated code
        !          1047: @section VM instruction table
1.6       anton    1048: 
                   1049: For threaded code we also need to produce a table containing the labels
                   1050: of all VM instructions.  This is needed for VM code generation
                   1051: (@pxref{VM code generation}), and it has to be done in the engine
                   1052: function, because the labels are not visible outside.  It then has to be
                   1053: passed outside the function (and assigned to @samp{vm_prim}), to be used
                   1054: by the VM code generation functions.
                   1055: 
                   1056: This means that the engine function has to be called first to produce
                   1057: the VM instruction table, and later, after generating VM code, it has to
                   1058: be called again to execute the generated VM code (yes, this is ugly).
                   1059: In our example program, these two modes of calling the engine function
                   1060: are distinguished by the value of the parameter @code{ip0} (if it equals 0,
                   1061: the table is passed out, otherwise the VM code is executed); the table
                   1062: is passed out by assigning it to @samp{vm_prim} and
                   1063: returning from @samp{engine}.
                   1064: 
                   1065: In our example, we also build such a table for switch dispatch; this is
                   1066: mainly done for uniformity.
                   1067: 
                   1068: For switch dispatch, we also need to define the VM instruction opcodes
                   1069: used as case labels in an @code{enum}.
                   1070: 
                   1071: For both purposes (VM instruction table, and enum), the file
                   1072: @file{@var{name}-labels.i} is generated by vmgen.  You have to define
                   1073: the following macro used in this file:
1.5       anton    1074: 
                   1075: @table @samp
                   1076: 
                   1077: @item INST_ADDR(@var{inst_name})
                   1078: For switch dispatch, this is just the name of the switch label (the same
1.6       anton    1079: name as used in @samp{LABEL(@var{inst_name})}), for both uses of
                   1080: @file{@var{name}-labels.i}.  For threaded-code dispatch, this is the
                   1081: address of the label defined in @samp{LABEL(@var{inst_name})}; the
                   1082: address is taken with @samp{&&} (@pxref{labels-as-values}).
1.5       anton    1083: 
                   1084: @end table
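
The two modes of calling the engine function can be sketched as
follows for threaded-code dispatch.  This is a minimal sketch under
several assumptions: it needs GNU C labels-as-values (GCC or Clang),
the three instructions and the @code{N_*} indices are made up, the
array @code{labels} stands in for what @file{@var{name}-labels.i}
would provide, and @code{run_demo} is a hypothetical driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef void *Inst;              /* threaded code: a label address */
typedef intptr_t Cell;

Inst *vm_prim;                   /* VM instruction table */

enum { N_LIT, N_ADD, N_HALT };   /* hypothetical indices into vm_prim */

#define LABEL(name)     I_##name:
#define INST_ADDR(name) &&I_##name    /* GNU C labels-as-values */
#define NEXT            goto **ip++

Cell engine(Inst *ip0)
{
    /* stands in for the table built from name-labels.i */
    static Inst labels[] = {
        INST_ADDR(lit), INST_ADDR(add), INST_ADDR(halt)
    };
    Cell stack[16];
    Cell *sp = stack + 16;       /* grows downwards */
    Inst *ip = ip0;

    if (ip0 == NULL) {           /* first call: only pass the table out */
        vm_prim = labels;
        return 0;
    }
    NEXT;                        /* second call: execute VM code */

    LABEL(lit)  assert(sp > stack); *--sp = (Cell)*ip++; NEXT;
    LABEL(add)  sp[1] += sp[0]; sp++; NEXT;
    LABEL(halt) return sp[0];
}

/* Hypothetical driver: fetch the table, build "lit 2 lit 3 add halt". */
Cell run_demo(void)
{
    Inst code[6];

    engine(NULL);
    code[0] = vm_prim[N_LIT];  code[1] = (Inst)(Cell)2;
    code[2] = vm_prim[N_LIT];  code[3] = (Inst)(Cell)3;
    code[4] = vm_prim[N_ADD];
    code[5] = vm_prim[N_HALT];
    return engine(code);
}
```

The labels are only visible inside @code{engine}, which is why the
table has to be built there and handed out through @code{vm_prim}.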
                   1085: 
                   1086: 
1.10    ! anton    1087: @c --------------------------------------------------------------------
        !          1088: @node VM code generation, Peephole optimization, VM instruction table, Using the generated code
1.6       anton    1089: @section VM code generation
                   1090: 
                   1091: Vmgen generates VM code generation functions in @file{@var{name}-gen.i}
                   1092: that the front end can call to generate VM code.  This is essential for
                   1093: an interpretive system.
                   1094: 
                   1095: For a VM instruction @samp{x ( #a b #c -- d )}, vmgen generates a
                   1096: function with the prototype
                   1097: 
                   1098: @example
                   1099: void gen_x(Inst **ctp, a_type a, c_type c)
                   1100: @end example
                   1101: 
                   1102: The @code{ctp} argument points to a pointer to the next instruction.
                   1103: @code{*ctp} is increased by the generation functions; i.e., you should
                   1104: allocate memory for the code to be generated beforehand, and start with
                   1105: @code{*ctp} set to the start of this memory area.  Before running out of
                   1106: memory, allocate a new area, and generate a VM-level jump to the new
                   1107: area (this is not implemented in our examples).
                   1108: 
                   1109: The other arguments correspond to the immediate arguments of the VM
                   1110: instruction (with their appropriate types as defined in the
                   1111: @code{type-prefix} declaration).
                   1112: 
                   1113: The following types, variables, and functions are used in
                   1114: @file{@var{name}-gen.i}:
                   1115: 
                   1116: @table @samp
                   1117: 
                   1118: @item Inst
                   1119: The type of the VM instruction; if you use threaded code, this is
                   1120: @code{void *}; for switch dispatch this is an integer type.
                   1121: 
                   1122: @item vm_prim
                   1123: The VM instruction table (type: @code{Inst *}, @pxref{VM instruction table}).
                   1124: 
                   1125: @item gen_inst(Inst **ctp, Inst i)
                   1126: This function compiles the instruction @code{i}.  Take a look at it in
                   1127: @file{vmgen-ex/peephole.c}.  It is trivial when you don't want to use
                   1128: superinstructions (just the last two lines of the example function), and
                   1129: slightly more complicated in the example due to its ability to use
                   1130: superinstructions (@pxref{Peephole optimization}).
                   1131: 
                   1132: @item genarg_@var{type_prefix}(Inst **ctp, @var{type} @var{type_prefix})
                   1133: This compiles an immediate argument of @var{type} (as defined in a
                   1134: @code{type-prefix} definition).  These functions are trivial to define
                   1135: (see @file{vmgen-ex/support.c}).  You need one of these functions for
                   1136: every type that you use as immediate argument.
                   1137: 
                   1138: @end table
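
How these pieces fit together can be sketched without
superinstructions.  The following is a minimal sketch, not the code
in @file{vmgen-ex}: the instruction set, the @code{N_*} indices, the
dummy table contents and @code{demo_codegen} are assumptions made for
illustration, and @code{gen_lit} and @code{gen_add} stand in for
functions that vmgen would generate into @file{@var{name}-gen.i}.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef void *Inst;
typedef intptr_t Cell;

enum { N_LIT, N_ADD, N_INSTS };       /* hypothetical instruction numbers */

static char dummy_labels[N_INSTS];    /* stand-ins for label addresses */
static Inst table[N_INSTS] = { &dummy_labels[N_LIT], &dummy_labels[N_ADD] };
Inst *vm_prim = table;

/* Compile one instruction; the trivial version without peephole. */
void gen_inst(Inst **ctp, Inst i)
{
    assert(ctp != NULL && *ctp != NULL);
    **ctp = i;
    (*ctp)++;
}

/* Compile an immediate argument with type prefix i. */
void genarg_i(Inst **ctp, Cell i)
{
    **ctp = (Inst)i;
    (*ctp)++;
}

/* Stand-ins for generated functions: lit ( #i -- i ), add ( i1 i2 -- i ) */
void gen_lit(Inst **ctp, Cell i)
{
    gen_inst(ctp, vm_prim[N_LIT]);
    genarg_i(ctp, i);
}

void gen_add(Inst **ctp)
{
    gen_inst(ctp, vm_prim[N_ADD]);
}

/* Hypothetical front-end fragment: returns the number of cells used. */
size_t demo_codegen(Inst *code)
{
    Inst *cp = code;             /* *ctp starts at the start of the area */

    gen_lit(&cp, 2);
    gen_lit(&cp, 3);
    gen_add(&cp);
    return (size_t)(cp - code);
}
```

Only the immediate arguments (those marked with @samp{#} in the
stack effect) show up as parameters of the generation functions.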
                   1139: 
                   1140: In addition to using these functions to generate code, you should call
                   1141: @code{BB_BOUNDARY} at every basic block entry point if you ever want to
                   1142: use superinstructions (or if you want to use the profiling supported by
                   1143: vmgen; however, this is mainly useful for selecting superinstructions).
                   1144: If you use @code{BB_BOUNDARY}, you should also define it (take a look at
                   1145: its definition in @file{vmgen-ex/mini.y}).
                   1146: 
                   1147: You do not need to call @code{BB_BOUNDARY} after branches, because you
                   1148: will not define superinstructions that contain branches in the middle
                   1149: (and if you did, and it worked, there would be no reason to end the
                   1150: superinstruction at the branch), and because the branches announce
                   1151: themselves to the profiler.
                   1152: 
                   1153: 
1.10    ! anton    1154: @c --------------------------------------------------------------------
        !          1155: @node Peephole optimization, VM disassembler, VM code generation, Using the generated code
1.6       anton    1156: @section Peephole optimization
                   1157: 
                   1158: You need peephole optimization only if you want to use
                   1159: superinstructions.  But having the code for it does not hurt much if you
                   1160: do not use superinstructions.
                   1161: 
                   1162: A simple greedy peephole optimization algorithm is used for
                   1163: superinstruction selection: every time @code{gen_inst} compiles a VM
                   1164: instruction, it checks whether it can combine it with the last VM instruction
                   1165: (which may also be a superinstruction resulting from a previous peephole
                   1166: optimization); if so, it changes the last instruction to the combined
                   1167: instruction instead of laying down @code{i} at the current @samp{*ctp}.
                   1168: 
                   1169: The code for peephole optimization is in @file{vmgen-ex/peephole.c}.
                   1170: You can use this file almost verbatim.  Vmgen generates
                   1171: @file{@var{file}-peephole.i} which contains data for the peephole
                   1172: optimizer.
                   1173: 
                   1174: You have to call @samp{init_peeptable()} after initializing
                   1175: @samp{vm_prim}, and before compiling any VM code to initialize data
                   1176: structures for peephole optimization.  After that, compiling with the VM
                   1177: code generation functions will automatically combine VM instructions
                   1178: into superinstructions.  Since you do not want to combine instructions
                   1179: across VM branch targets (otherwise there will not be a proper VM
                   1180: instruction to branch to), you have to call @code{BB_BOUNDARY}
                   1181: (@pxref{VM code generation}) at branch targets.
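
The greedy combination step can be sketched as follows.  This is a
simplified illustration and not the code in
@file{vmgen-ex/peephole.c}: it assumes switch dispatch (integer
opcodes), replaces the generated peephole data by a small hand-written
table that is searched linearly, ignores immediate arguments, and
reduces @code{BB_BOUNDARY} to resetting the pointer to the last
compiled instruction.  All opcode names are made up.

```c
#include <assert.h>
#include <stddef.h>

typedef int Inst;                /* switch dispatch: integer opcodes */

enum { I_LIT, I_ADD, I_STORE, I_LIT_ADD, I_LIT_ADD_STORE };

/* Hypothetical stand-in for the generated peephole data. */
static const struct { Inst prefix, last, combined; } peeptable[] = {
    { I_LIT,     I_ADD,   I_LIT_ADD },
    { I_LIT_ADD, I_STORE, I_LIT_ADD_STORE },
};

static Inst *last_compiled = NULL;

/* Simplified: forget the last instruction at a basic block entry. */
#define BB_BOUNDARY (last_compiled = NULL)

void gen_inst(Inst **ctp, Inst i)
{
    assert(ctp != NULL && *ctp != NULL);
    if (last_compiled != NULL) {
        size_t k;
        for (k = 0; k < sizeof peeptable / sizeof peeptable[0]; k++) {
            if (peeptable[k].prefix == *last_compiled
                && peeptable[k].last == i) {
                /* rewrite the last instruction instead of laying down i */
                *last_compiled = peeptable[k].combined;
                return;
            }
        }
    }
    last_compiled = *ctp;
    **ctp = i;
    (*ctp)++;
}
```

Because the last instruction may itself be a combination, compiling
@code{I_LIT}, @code{I_ADD}, @code{I_STORE} in sequence ends up as the
single opcode @code{I_LIT_ADD_STORE} in this sketch, which is also why
every prefix of a superinstruction has to exist as a superinstruction
itself.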
                   1182: 
                   1183: 
1.10    ! anton    1184: @c --------------------------------------------------------------------
        !          1185: @node VM disassembler, VM profiler, Peephole optimization, Using the generated code
1.6       anton    1186: @section VM disassembler
                   1187: 
                   1188: A VM code disassembler is optional for an interpretive system, but
                   1189: highly recommended during its development and maintenance, because it is
                   1190: very useful for detecting bugs in the front end (and for distinguishing
                   1191: them from VM interpreter bugs).
                   1192: 
                   1193: Vmgen supports VM code disassembling by generating
                   1194: @file{@var{file}-disasm.i}.  This code has to be wrapped into a
                   1195: function, as is done in @file{vmgen-ex/disasm.c}.  You can use this file
                   1196: almost verbatim.  In addition to @samp{vm_@var{A}2@var{B}(a,b)},
                   1197: @samp{vm_out}, @samp{printarg_@var{type}(@var{value})}, which are
                   1198: explained above, the following macros and variables are used in
                   1199: @file{@var{file}-disasm.i} (and you have to define them):
                   1200: 
                   1201: @table @samp
                   1202: 
                   1203: @item ip
                   1204: This variable points to the opcode of the current VM instruction.
                   1205: 
                   1206: @item IP IPTOS
                   1207: @samp{IPTOS} is the first argument of the current VM instruction, and
                   1208: @samp{IP} points to it; this is just as in the engine, but here
                   1209: @samp{ip} points to the opcode of the VM instruction (in contrast to the
                   1210: engine, where @samp{ip} points to the next cell, or even one further).
                   1211: 
                   1212: @item VM_IS_INST(Inst i, int n)
                   1213: Tests if the opcode @samp{i} is the same as the @samp{n}th entry in the
                   1214: VM instruction table.
                   1215: 
                   1216: @end table
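
A wrapper function using these definitions might look like the
following minimal sketch.  The if-chain stands in for the generated
@file{@var{file}-disasm.i}; the instruction set, the @code{N_*}
numbers, the dummy table and the signature of @code{vm_disassemble}
are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

typedef void *Inst;
typedef intptr_t Cell;

enum { N_LIT, N_ADD, N_INSTS };       /* hypothetical instruction numbers */

static char dummy_labels[N_INSTS];    /* stand-ins for label addresses */
static Inst table[N_INSTS] = { &dummy_labels[N_LIT], &dummy_labels[N_ADD] };
Inst *vm_prim = table;
FILE *vm_out;                         /* must be set before disassembling */

#define VM_IS_INST(inst, n) ((inst) == vm_prim[n])
#define IP    (ip + 1)                /* the first argument follows the opcode */
#define IPTOS (IP[0])

/* Disassemble the code from ip up to endp; returns the number of
   VM instructions printed. */
int vm_disassemble(Inst *ip, Inst *endp)
{
    int count = 0;

    assert(vm_out != NULL);
    while (ip < endp) {
        fprintf(vm_out, "%p: ", (void *)ip);
        /* stand-in for the generated disassembler code */
        if (VM_IS_INST(*ip, N_LIT)) {
            fprintf(vm_out, "lit %ld\n", (long)(Cell)IPTOS);
            ip += 2;                  /* opcode plus one immediate argument */
        } else if (VM_IS_INST(*ip, N_ADD)) {
            fputs("add\n", vm_out);
            ip += 1;
        } else {
            fputs("unknown instruction\n", vm_out);
            ip += 1;
        }
        count++;
    }
    return count;
}
```

Here @code{ip} stays on the opcode while an instruction is printed,
so @code{IP} is simply @code{ip + 1}.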
                   1217: 
                   1218: 
1.10    ! anton    1219: @c --------------------------------------------------------------------
        !          1220: @node VM profiler,  , VM disassembler, Using the generated code
1.7       anton    1221: @section VM profiler
                   1222: 
                   1223: The VM profiler is designed for getting execution and occurrence counts
                   1224: for VM instruction sequences, and these counts can then be used for
                   1225: selecting sequences as superinstructions.  The VM profiler is probably
1.8       anton    1226: not useful as a profiling tool for the interpretive system.  I.e., the VM
1.7       anton    1227: profiler is useful for the developers, but not the users of the
1.8       anton    1228: interpretive system.
1.7       anton    1229: 
1.8       anton    1230: The output of the profiler is: for each basic block (executed at least
                   1231: once), it produces the dynamic execution count of that basic block and
                   1232: all its subsequences; e.g.,
1.7       anton    1233: 
1.8       anton    1234: @example
                   1235:        9227465  lit storelocal 
                   1236:        9227465  storelocal branch 
                   1237:        9227465  lit storelocal branch 
                   1238: @end example
1.7       anton    1239: 
1.8       anton    1240: I.e., a basic block consisting of @samp{lit storelocal branch} is
                   1241: executed 9227465 times.
1.6       anton    1242: 
1.8       anton    1243: This output can be combined in various ways.  E.g.,
                   1244: @file{vmgen/stat.awk} adds up the occurrences of a given sequence with respect to
                   1245: dynamic execution, static occurrence, and per-program occurrence.  E.g.,
1.3       anton    1246: 
1.8       anton    1247: @example
                   1248:       2      16        36910041 loadlocal lit 
                   1249: @end example
1.2       anton    1250: 
1.8       anton    1251: indicates that the sequence @samp{loadlocal lit} occurs in 2 programs,
                   1252: in 16 places, and has been executed 36910041 times.  Now you can select
                   1253: superinstructions in any way you like (note that compile time and space
                   1254: typically limit the number of superinstructions to 100--1000).  After
                   1255: you have done that, @file{vmgen/seq2rule.awk} turns lines of the form
                   1256: above into rules for inclusion in a vmgen input file.  Note that this
                   1257: script does not ensure that all prefixes are defined, so you have to do
                   1258: that in other ways.  So, an overall script for turning profiles into
                   1259: superinstructions can look like this:
1.2       anton    1260: 
1.8       anton    1261: @example
                   1262: awk -f stat.awk fib.prof test.prof|
                   1263: awk '$3>=10000'|                #select sequences
                   1264: fgrep -v -f peephole-blacklist| #eliminate wrong instructions
                   1265: awk -f seq2rule.awk|            #turn into superinstructions
                   1266: sort -k 3 >mini-super.vmg       #sort sequences
                   1267: @end example
1.2       anton    1268: 
1.8       anton    1269: Here the dynamic count is used for selecting sequences (preliminary
                   1270: results indicate that the static count gives better results, though);
                   1271: the third line eliminates sequences containing instructions that must not
                   1272: occur in a superinstruction, because they access a stack directly.  The
                   1273: dynamic count selection ensures that all subsequences (including
                   1274: prefixes) of longer sequences occur (because subsequences have at least
                   1275: the same count as the longer sequences); the sort in the last line
                   1276: ensures that longer superinstructions occur after their prefixes.
                   1277: 
                   1278: Before you can use any of this, you need the profiler.  Vmgen supports its
                   1279: creation by generating @file{@var{file}-profile.i}; you also need the
                   1280: wrapper file @file{vmgen-ex/profile.c} that you can use almost verbatim.
                   1281: 
                   1282: The profiler works by recording the targets of all VM control flow
                   1283: changes (through @code{SUPER_END} during execution, and through
                   1284: @code{BB_BOUNDARY} in the front end), and counting (through
                   1285: @code{SUPER_END}) how often they were targeted.  After the program run,
                   1286: the numbers are corrected such that each VM basic block has the correct
                   1287: count (originally entering a block without executing a branch does not
                   1288: increase the count), then the subsequences of all basic blocks are
                   1289: printed.  To get all this, you just have to define @code{SUPER_END} (and
                   1290: @code{BB_BOUNDARY}) appropriately, and call @code{vm_print_profile(FILE
                   1291: *file)} when you want to output the profile on @code{file}.
                   1292: 
                   1293: The @file{@var{file}-profile.i} is similar to the disassembler file, and
                   1294: it uses variables and functions defined in @file{vmgen-ex/profile.c},
                   1295: plus @code{VM_IS_INST} already defined for the VM disassembler
                   1296: (@pxref{VM disassembler}).
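
The counting part of the profiler can be sketched as follows.  This
is a simplified illustration, not the code in
@file{vmgen-ex/profile.c}: it uses a small fixed array with linear
search, leaves out the count correction and the printing of
subsequences, and assumes a configuration macro named
@code{VM_PROFILING} for selecting the profiling variant of
@code{SUPER_END}.

```c
#include <assert.h>
#include <stddef.h>

typedef void *Inst;

#define MAX_BLOCKS 64            /* arbitrary limit for this sketch */

typedef struct {
    Inst *ip;                    /* start of the basic block */
    long  count;                 /* how often control flow targeted it */
} block_count;

static block_count blocks[MAX_BLOCKS];
static size_t nblocks = 0;

/* Find the record for ip, creating it with count 0 if necessary.
   The front end would do this through BB_BOUNDARY. */
block_count *block_insert(Inst *ip)
{
    size_t k;

    for (k = 0; k < nblocks; k++)
        if (blocks[k].ip == ip)
            return &blocks[k];
    assert(nblocks < MAX_BLOCKS);
    blocks[nblocks].ip = ip;
    blocks[nblocks].count = 0;
    return &blocks[nblocks++];
}

/* Called through SUPER_END on every VM control flow change. */
void vm_count_block(Inst *ip)
{
    block_insert(ip)->count++;
}

#ifdef VM_PROFILING              /* assumed configuration macro */
#define SUPER_END vm_count_block(IP)
#else
#define SUPER_END                /* nothing in normal operation */
#endif
```

A block that was only registered by the front end keeps a count of 0
until it is targeted during execution; this is the gap that the
correction after the program run fills in.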
                   1297: 
1.10    ! anton    1298: 
        !          1299: @c **********************************************************
        !          1300: @node Changes, Contact, Using the generated code, Top
1.8       anton    1301: @chapter Changes
                   1302: 
                   1303: Users of the gforth-0.5.9-20010501 version of vmgen need to change
                   1304: several things in their source code to use the current version.  I
                   1305: recommend keeping the gforth-0.5.9-20010501 version until you have
                   1306: completed the change (note that you can have several versions of Gforth
                   1307: installed at the same time).  I hope to avoid such incompatible changes
                   1308: in the future.
1.2       anton    1309: 
1.8       anton    1310: The required changes are:
                   1311: 
                   1312: @table @code
1.2       anton    1313: 
1.8       anton    1314: @item vm_@var{A}2@var{B}
                   1315: now takes two arguments.
                   1316: 
                   1317: @item vm_two@var{A}2@var{B}(b,a1,a2);
                   1318: changed to vm_two@var{A}2@var{B}(a1,a2,b) (note the absence of the @samp{;}).
                   1319: 
                   1320: @end table
1.2       anton    1321: 
1.8       anton    1322: Also some new macros have to be defined, e.g., @code{INST_ADDR}, and
                   1323: @code{LABEL}; some macros have to be defined in new contexts, e.g.,
                   1324: @code{VM_IS_INST} is now also needed in the disassembler.
1.4       anton    1325: 
1.10    ! anton    1326: @node Contact, Copying This Manual, Changes, Top
1.8       anton    1327: @chapter Contact
1.4       anton    1328: 
1.10    ! anton    1329: @node Copying This Manual, Index, Contact, Top
        !          1330: @appendix Copying This Manual
        !          1331: 
        !          1332: @menu
        !          1333: * GNU Free Documentation License::  License for copying this manual.
        !          1334: @end menu
        !          1335: 
        !          1336: @include fdl.texi
        !          1337: 
        !          1338: 
        !          1339: @node Index,  , Copying This Manual, Top
        !          1340: @unnumbered Index
        !          1341: 
        !          1342: @printindex cp
        !          1343: 
        !          1344: @bye
