--- gforth/doc/vmgen.texi 2002/08/16 09:43:49 1.12 +++ gforth/doc/vmgen.texi 2002/09/01 15:15:07 1.19 @@ -57,7 +57,10 @@ Software Foundation raise funds for GNU * Invoking Vmgen:: * Example:: * Input File Format:: +* Error messages:: reported by Vmgen * Using the generated code:: +* Hints:: VM architecture, efficiency +* The future:: * Changes:: from earlier versions * Contact:: Bug reporting etc. * Copying This Manual:: Manual License @@ -82,8 +85,13 @@ Input File Format * Input File Grammar:: * Simple instructions:: * Superinstructions:: +* Store Optimization:: * Register Machines:: How to define register VM instructions +Input File Grammar + +* Eval escapes:: what follows \E + Simple instructions * C Code Macros:: Macros recognized by Vmgen @@ -98,6 +106,10 @@ Using the generated code * VM disassembler:: for debugging the front end * VM profiler:: for finding worthwhile superinstructions +Hints + +* Floating point:: and stacks + Copying This Manual * GNU Free Documentation License:: License for copying this manual. @@ -151,7 +163,7 @@ In this setup, Vmgen can generate most o machine instructions from a simple description of the virtual machine instructions (@pxref{Input File Format}), in particular: -@table @asis +@table @strong @item VM instruction execution @@ -172,6 +184,10 @@ Useful for optimizing the VM interpreter @end table +To create parts of the interpretive system that do not deal with VM +instructions, you have to use other tools (e.g., @command{bison}) and/or +hand-code them. + @cindex efficiency features overview @noindent Vmgen supports efficient interpreters though various optimizations, in @@ -209,7 +225,7 @@ offered by Vmgen. There are many potential uses of the instruction descriptions that are not implemented at the moment, but we are open for feature requests, and -we will implement new features if someone asks for them; so the feature +we will consider new features if someone asks for them; so the feature list above is not exhaustive. 
@c ********************************************************************* @@ -300,7 +316,7 @@ interpreter, but some systems also suppo as an image file, or in a full-blown linkable file format (e.g., JVM). Vmgen currently has no special support for such features, but the information in the instruction descriptions can be helpful, and we are -open for feature requests and suggestions. +open to feature requests and suggestions. @c -------------------------------------------------------------------- @node Data handling, Dispatch, Front end and VM interpreter, Concepts @@ -310,7 +326,10 @@ open for feature requests and suggestion @cindex register machine Most VMs use one or more stacks for passing temporary data between VM instructions. Another option is to use a register machine architecture -for the virtual machine; however, this option is either slower or +for the virtual machine; we believe that using a stack architecture is +usually both simpler and faster. + +however, this option is slower or significantly more complex to implement than a stack machine architecture. Vmgen has special support and optimizations for stack VMs, making their @@ -356,7 +375,7 @@ After executing one VM instruction, the the next VM instruction (Vmgen calls the dispatch routine @samp{NEXT}). Vmgen supports two methods of dispatch: -@table @asis +@table @strong @item switch dispatch @cindex switch dispatch @@ -379,6 +398,7 @@ instruction. Threaded code cannot be im be implemented using GNU C's labels-as-values extension (@pxref{Labels as Values, , Labels as Values, gcc.info, GNU C Manual}). +@c call threading @end table Threaded code can be twice as fast as switch dispatch, depending on the @@ -392,16 +412,18 @@ interpreter, the benchmark, and the mach The usual way to invoke Vmgen is as follows: @example -vmgen @var{infile} +vmgen @var{inputfile} @end example -Here @var{infile} is the VM instruction description file, which usually -ends in @file{.vmg}. 
The output filenames are made by taking the -basename of @file{infile} (i.e., the output files will be created in the -current working directory) and replacing @file{.vmg} with @file{-vm.i}, -@file{-disasm.i}, @file{-gen.i}, @file{-labels.i}, @file{-profile.i}, -and @file{-peephole.i}. E.g., @command{vmgen hack/foo.vmg} will create -@file{foo-vm.i} etc. +Here @var{inputfile} is the VM instruction description file, which +usually ends in @file{.vmg}. The output filenames are made by taking +the basename of @file{inputfile} (i.e., the output files will be created +in the current working directory) and replacing @file{.vmg} with +@file{-vm.i}, @file{-disasm.i}, @file{-gen.i}, @file{-labels.i}, +@file{-profile.i}, and @file{-peephole.i}. E.g., @command{vmgen +hack/foo.vmg} will create @file{foo-vm.i}, @file{foo-disasm.i}, +@file{foo-gen.i}, @file{foo-labels.i}, @file{foo-profile.i} and +@file{foo-peephole.i}. The command-line options supported by Vmgen are @@ -563,14 +585,14 @@ sort -k 3 >mini-super.vmg #sort se The file @file{peephole-blacklist} contains all instructions that directly access a stack or stack pointer (for mini: @code{call}, @code{return}); the sort step is necessary to ensure that prefixes -preceed larger superinstructions. +precede larger superinstructions. 
Now you can create a version of mini with superinstructions by just saying @samp{make} @c *************************************************************** -@node Input File Format, Using the generated code, Example, Top +@node Input File Format, Error messages, Example, Top @chapter Input File Format @cindex input file format @cindex format, input file @@ -584,6 +606,7 @@ Most examples are taken from the example * Input File Grammar:: * Simple instructions:: * Superinstructions:: +* Store Optimization:: * Register Machines:: How to define register VM instructions @end menu @@ -598,61 +621,112 @@ The grammar is in EBNF format, with @cod of @var{c} and @code{[@var{d}]} meaning 0 or 1 repetitions of @var{d}. @cindex free-format, not +@cindex newlines, significance in syntax Vmgen input is not free-format, so you have to take care where you put -spaces and especially newlines; it's not as bad as makefiles, though: -any sequence of spaces and tabs is equivalent to a single space. +newlines (and, in a few cases, white space). @example -description: @{instruction|comment|eval-escape@} +description: @{instruction|comment|eval-escape|c-escape@} instruction: simple-inst|superinst -simple-inst: ident ' (' stack-effect ' )' newline c-code newline newline +simple-inst: ident '(' stack-effect ')' newline c-code newline newline -stack-effect: @{ident@} ' --' @{ident@} +stack-effect: @{ident@} '--' @{ident@} -super-inst: ident ' =' ident @{ident@} +super-inst: ident '=' ident @{ident@} comment: '\ ' text newline -eval-escape: '\e ' text newline +eval-escape: '\E ' text newline + +c-escape: '\C ' text newline @end example @c \+ \- \g \f \c Note that the @code{\}s in this grammar are meant literally, not as C-style encodings for non-printable characters. -The C code in @code{simple-inst} must not contain empty lines (because -Vmgen would mistake that as the end of the simple-inst. The text in -@code{comment} and @code{eval-escape} must not contain a newline. 
-@code{Ident} must conform to the usual conventions of C identifiers -(otherwise the C compiler would choke on the Vmgen output). +There are two ways to delimit the C code in @code{simple-inst}: + +@itemize @bullet + +@item +If you start it with a @samp{@{} at the start of a line (i.e., not even +white space before it), you have to end it with a @samp{@}} at the start +of a line (followed by a newline). In this case you may have empty +lines within the C code (typically used between variable definitions and +statements). + +@item +You do not start it with @samp{@{}. Then the C code ends at the first +empty line, so you cannot have empty lines within this code. + +@end itemize + +The text in @code{comment}, @code{eval-escape} and @code{c-escape} must +not contain a newline. @code{Ident} must conform to the usual +conventions of C identifiers (otherwise the C compiler would choke on +the Vmgen output), except that idents in @code{stack-effect} may have a +stack prefix (for stack prefix syntax, @pxref{Eval escapes}). + +@cindex C escape +@cindex @code{\C} +@cindex conditional compilation of Vmgen output +The @code{c-escape} passes the text through to each output file (without +the @samp{\C}). This is useful mainly for conditional compilation +(i.e., you write @samp{\C #if ...} etc.). + +@cindex sync lines +@cindex @code{#line} +In addition to the syntax given in the grammar, Vmgen also processes +sync lines (lines starting with @samp{#line}), as produced by @samp{m4 +-s} (@pxref{Invoking m4, , Invoking m4, m4.info, GNU m4}) and similar +tools. This allows associating C compiler error messages with the +original source of the C code. Vmgen understands a few extensions beyond the grammar given here, but these extensions are only useful for building Gforth. You can find a description of the format used for Gforth in @file{prim}. 
+@menu +* Eval escapes:: what follows \E +@end menu + +@node Eval escapes, , Input File Grammar, Input File Grammar @subsection Eval escapes @cindex escape to Forth @cindex eval escape +@cindex @code{\E} @c woanders? The text in @code{eval-escape} is Forth code that is evaluated when -Vmgen reads the line. If you do not know (and do not want to learn) -Forth, you can build the text according to the following grammar; these -rules are normally all Forth you need for using Vmgen: +Vmgen reads the line. You will normally use this feature to define +stacks and types. + +If you do not know (and do not want to learn) Forth, you can build the +text according to the following grammar; these rules are normally all +Forth you need for using Vmgen: @example -text: stack-decl|type-prefix-decl|stack-prefix-decl +text: stack-decl|type-prefix-decl|stack-prefix-decl|set-flag stack-decl: 'stack ' ident ident ident type-prefix-decl: 's" ' string '" ' ('single'|'double') ident 'type-prefix' ident stack-prefix-decl: ident 'stack-prefix' string +set-flag: 'store-optimization' ('on'|'off') @end example Note that the syntax of this code is not checked thoroughly (there are -many other Forth program fragments that could be written there). +many other Forth program fragments that could be written in an +eval-escape). + +A stack prefix can contain letters, digits, or @samp{:}, and may start +with an @samp{#}; e.g., in Gforth the return stack has the stack prefix +@samp{R:}. This restriction is not checked during the stack prefix +definition, but it is enforced by the parsing rules for stack items +later. 
If you know Forth, the stack effects of the non-standard words involved are: @@ -661,15 +735,18 @@ are: @findex single @findex double @findex stack-prefix +@findex store-optimization @example -stack ( "name" "pointer" "type" -- ) - ( name execution: -- stack ) -type-prefix ( addr u xt1 xt2 n stack "prefix" -- ) -single ( -- xt1 xt2 n ) -double ( -- xt1 xt2 n ) -stack-prefix ( stack "prefix" -- ) +stack ( "name" "pointer" "type" -- ) + ( name execution: -- stack ) +type-prefix ( addr u item-size stack "prefix" -- ) +single ( -- item-size ) +double ( -- item-size ) +stack-prefix ( stack "prefix" -- ) +store-optimization ( -- addr ) @end example +An @var{item-size} takes three cells on the stack. @c -------------------------------------------------------------------- @node Simple instructions, Superinstructions, Input File Grammar, Input File Format @@ -793,22 +870,22 @@ level, this also sets the instruction po This ends a basic block (for profiling), even if the instruction contains no @code{SET_IP}. -@item TAIL; -@findex TAIL; -Vmgen replaces @samp{TAIL;} with code for ending a VM instruction and -dispatching the next VM instruction. Even without a @samp{TAIL;} this +@item INST_TAIL; +@findex INST_TAIL; +Vmgen replaces @samp{INST_TAIL;} with code for ending a VM instruction and +dispatching the next VM instruction. Even without a @samp{INST_TAIL;} this happens automatically when control reaches the end of the C code. If you want to have this in the middle of the C code, you need to use -@samp{TAIL;}. A typical example is a conditional VM branch: +@samp{INST_TAIL;}. 
A typical example is a conditional VM branch: @example if (branch_condition) @{ - SET_IP(target); TAIL; + SET_IP(target); INST_TAIL; @} /* implicit tail follows here */ @end example -In this example, @samp{TAIL;} is not strictly necessary, because there +In this example, @samp{INST_TAIL;} is not strictly necessary, because there is another one implicitly after the if-statement, but using it improves branch prediction accuracy slightly and allows other optimizations. @@ -822,7 +899,7 @@ typical application is in conditional VM @example if (branch_condition) @{ - SET_IP(target); TAIL; /* now this TAIL is necessary */ + SET_IP(target); INST_TAIL; /* now this INST_TAIL is necessary */ @} SUPER_CONTINUE; @end example @@ -832,7 +909,7 @@ SUPER_CONTINUE; Note that Vmgen is not smart about C-level tokenization, comments, strings, or conditional compilation, so it will interpret even a commented-out SUPER_END as ending a basic block (or, e.g., -@samp{RETAIL;} as @samp{TAIL;}). Conversely, Vmgen requires the literal +@samp{RESET_IP;} as @samp{SET_IP;}). Conversely, Vmgen requires the literal presence of these strings; Vmgen will not see them if they are hiding in a C preprocessor macro. @@ -879,7 +956,7 @@ The Vmgen-erated code loads the stack it memory into variables before the user-supplied C code, and stores them from variables to stack-pointer-indexed memory afterwards. If you do any writes to the stack through its stack pointer in your C code, it -will not affact the variables, and your write may be overwritten by the +will not affect the variables, and your write may be overwritten by the stores after the C code. Similarly, a read from a stack using a stack pointer will not reflect computations of stack items in the same VM instruction. @@ -900,7 +977,7 @@ contents. 
@c -------------------------------------------------------------------- -@node Superinstructions, Register Machines, Simple instructions, Input File Format +@node Superinstructions, Store Optimization, Simple instructions, Input File Format @section Superinstructions @cindex superinstructions, defining @cindex defining superinstructions @@ -958,7 +1035,65 @@ does not check these restrictions, they interpreter. @c ------------------------------------------------------------------- -@node Register Machines, , Superinstructions, Input File Format +@node Store Optimization, Register Machines, Superinstructions, Input File Format +@section Store Optimization +@cindex store optimization +@cindex optimization, stack stores +@cindex stack stores, optimization +@cindex eliminating stack stores + +This minor optimization (0.6\%--0.8\% reduction in executed instructions +for Gforth) puts additional requirements on the instruction descriptions +and is therefore disabled by default. + +What does it do? Consider an instruction like + +@example +dup ( n -- n n ) +@end example + +For simplicity, also assume that we are not caching the top-of-stack in +a register. Now, the C code for dup first loads @code{n} from the +stack, and then stores it twice to the stack, one time to the address +where it came from; that time is unnecessary, but gcc does not optimize +it away, so vmgen can do it instead (if you turn on the store +optimization). + +Vmgen uses the stack item's name to determine if the stack item contains +the same value as it did at the start. Therefore, if you use the store +optimization, you have to ensure that stack items that have the same +name on input and output also have the same value, and are not changed +in the C code you supply. 
I.e., the following code could fail if you +turn on the store optimization: + +@example +add1 ( n -- n ) +n++; +@end example + +Instead, you have to use different names, i.e.: + +@example +add1 ( n1 -- n2 ) +n2=n1+1; +@end example + +To turn on the store optimization, write + +@example +\E store-optimization on +@end example + +at the start of the file. You can turn this optimization on or off +between any two VM instruction descriptions. For turning it off again, +you can use + +@example +\E store-optimization off +@end example + +@c ------------------------------------------------------------------- +@node Register Machines, , Store Optimization, Input File Format @section Register Machines @cindex Register VM @cindex Superinstructions for register VMs @@ -1013,16 +1148,76 @@ VM interpreters. However, if you have i direction, please let me know (@pxref{Contact}). @c ******************************************************************** -@node Using the generated code, Changes, Input File Format, Top +@node Error messages, Using the generated code, Input File Format, Top +@chapter Error messages +@cindex error messages + +These error messages are created by Vmgen: + +@table @code + +@cindex @code{# can only be on the input side} error +@item # can only be on the input side +You have used an instruction-stream prefix (usually @samp{#}) after the +@samp{--} (the output side); you can only use it before (the input +side). + +@cindex @code{prefix for this combination must be defined earlier} error +@item the prefix for this combination must be defined earlier +You have defined a superinstruction (e.g., @code{abc = a b c}) without +defining its direct prefix (e.g., @code{ab = a b}). +@xref{Superinstructions}. + +@cindex @code{sync line syntax} error +@item sync line syntax +If you are using a preprocessor (e.g., @command{m4}) to generate Vmgen +input code, you may want to create @code{#line} directives (aka sync +lines). 
This error indicates that such a line is not in the syntax +expected by Vmgen (this should not happen; please report the offending +line in a bug report). + +@cindex @code{syntax error, wrong char} error +@item syntax error, wrong char +A syntax error. If you do not see right away where the error is, it may +be helpful to check the following: Did you put an empty line in a VM +instruction where the C code is not delimited by braces (then the empty +line ends the VM instruction)? If you used brace-delimited C code, did +you put the delimiting braces (and only those) at the start of the line, +without preceding white space? Did you forget a delimiting brace? + +@cindex @code{too many stacks} error +@item too many stacks +Vmgen currently supports 3 stacks (plus the instruction stream); if you +need more, let us know. + +@cindex @code{unknown prefix} error +@item unknown prefix +The stack item does not match any defined type prefix (after stripping +away any stack prefix). You should either declare the type prefix you +want for that stack item, or use a different type prefix. + +@cindex @code{unknown primitive} error +@item unknown primitive +You have used the name of a simple VM instruction in a superinstruction +definition without defining the simple VM instruction first. + +@end table + +In addition, the C compiler can produce errors due to code produced by +Vmgen; e.g., you need to define type cast functions. + +@c ******************************************************************** +@node Using the generated code, Hints, Error messages, Top @chapter Using the generated code @cindex generated code, usage @cindex Using vmgen-erated code The easiest way to create a working VM interpreter with Vmgen is probably to start with @file{vmgen-ex}, and modify it for your purposes. -This chapter is just the reference manual for the macros etc. used by -the generated code, the other context expected by the generated code, -and what you can do with the various generated files. 
+This chapter explains what the various wrapper and generated files do. +It also contains reference-manual style descriptions of the macros, +variables etc. used by the generated code; you can skip those on +first reading. @menu * VM engine:: Executing VM code @@ -1059,6 +1254,7 @@ In our example the engine function also @file{@var{name}-labels.i} (@pxref{VM instruction table}). @cindex tracing VM code +@cindex superinstructions and tracing In addition to executing the code, the VM engine can optionally also print out a trace of the executed instructions, their arguments and results. For superinstructions it prints the trace as if only component @@ -1080,8 +1276,8 @@ The following macros and variables are u @item LABEL(@var{inst_name}) This is used just before each VM instruction to provide a jump or @code{switch} label (the @samp{:} is provided by Vmgen). For switch -dispatch this should expand to @samp{case @var{label}}; for -threaded-code dispatch this should just expand to @samp{@var{label}}. +dispatch this should expand to @samp{case @var{label}:}; for +threaded-code dispatch this should just expand to @samp{@var{label}:}. In either case @var{label} is usually the @var{inst_name} with some prefix or suffix to avoid naming conflicts. @@ -1093,9 +1289,9 @@ should expand to nothing. @findex NAME @item NAME(@var{inst_name_string}) Called on entering a VM instruction with a string containing the name of -the VM instruction as parameter. In normal execution this should be a -noop, but for tracing this usually prints the name, and possibly other -information (several VM registers in our example). +the VM instruction as parameter. In normal execution this should +expand to nothing, but for tracing this usually prints the name, and +possibly other information (several VM registers in our example). 
@findex DEF_CA @item DEF_CA @@ -1114,7 +1310,8 @@ different ways for best performance on v @samp{NEXT_P0} is invoked right at the start of the VM instruction (but after @samp{DEF_CA}), @samp{NEXT_P1} right after the user-supplied C code, and @samp{NEXT_P2} at the end. The actual jump has to be -performed by @samp{NEXT_P2}. +performed by @samp{NEXT_P2} (if you did it earlier, important parts +of the VM instruction would not be executed). The simplest variant is if @samp{NEXT_P2} does everything and the other macros do nothing. Then also related macros like @samp{IP}, @@ -1541,12 +1738,94 @@ it uses variables and functions defined plus @code{VM_IS_INST} already defined for the VM disassembler (@pxref{VM disassembler}). +@c ********************************************************** +@node Hints, The future, Using the generated code, Top +@chapter Hints +@cindex hints + +@menu +* Floating point:: and stacks +@end menu + +@c -------------------------------------------------------------------- +@node Floating point, , Hints, Hints +@section Floating point + +How should you deal with floating point values? Should you use the same +stack as for integers/pointers, or a different one? This section +discusses this issue with a view to execution speed. + +The simpler approach is to use a separate floating-point stack. This +allows you to choose FP value size without considering the size of the +integers/pointers, and you avoid a number of performance problems. The +main downside is that this needs an FP stack pointer (and that may not +fit in the register file on the 386 architecture, costing some +performance, but comparatively little if you take the other option into +account). If you use a separate FP stack (with stack pointer @code{fp}), +using an fpTOS is helpful on most machines, but some spill the fpTOS +register into memory, and fpTOS should not be used there. 
+ +The other approach is to share one stack (pointed to by, say, @code{sp}) +between integer/pointer and floating-point values. This is ok if you do +not use @code{spTOS}. If you do use @code{spTOS}, the compiler has to +decide whether to put that variable into an integer or a floating point +register, and the other type of operation becomes quite expensive on +most machines (because moving values between integer and FP registers is +quite expensive). If a value of one type has to be synthesized out of +two values of the other type (@code{double} types), things are even more +interesting. + +One way around this problem would be to not use the @code{spTOS} +supported by Vmgen, but to use explicit top-of-stack variables (one for +integers, one for FP values), and to have a kind of accumulator+stack +architecture (e.g., OCaml bytecode uses this approach); however, this is +a major change, and its ramifications are not completely clear. @c ********************************************************** -@node Changes, Contact, Using the generated code, Top +@node The future, Changes, Hints, Top +@chapter The future +@cindex future ideas + +We have a number of ideas for future versions of Gforth. However, there +are so many possible things to do that we would like some feedback from +you. What are you doing with Vmgen, what features are you missing, and +why? + +One idea we are thinking about is to generate just one @file{.c} file +instead of letting you copy and adapt all the wrapper files (you would +still have to define stuff like the type-specific macros, and stack +pointers etc. somewhere). The advantage would be that, if we change the +wrapper files between versions, you would not need to integrate your +changes and our changes to them; Vmgen would also be easier to use for +beginners. 
The main disadvantage of that is that it would reduce the +flexibility of Vmgen a little (well, those who like flexibility could +still patch the resulting @file{.c} file, like they are now doing for +the wrapper files). In any case, if you are doing things to the wrapper +files that would cause problems in a generated-@file{.c}-file approach, +please let us know. + +@c ********************************************************** +@node Changes, Contact, The future, Top @chapter Changes @cindex Changes from old versions +User-visible changes between 0.5.9-20020822 and 0.5.9-20020901: + +The store optimization is now disabled by default, but can be enabled by +the user (@pxref{Store Optimization}). Documentation for this +optimization is also new. + +User-visible changes between 0.5.9-20010501 and 0.5.9-20020822: + +There is now a manual (in info, HTML, PostScript, or plain text format). + +There is now a vmgen-ex2 variant of the vmgen-ex example; the new +variant uses a union type instead of lots of casting. + +Both variants of the example can now be compiled with an ANSI C compiler +(using switch dispatch and losing quite a bit of performance); tested +with @command{lcc}. + Users of the gforth-0.5.9-20010501 version of Vmgen need to change several things in their source code to use the current version. I recommend keeping the gforth-0.5.9-20010501 version until you have @@ -1558,6 +1837,11 @@ The required changes are: @table @code +@cindex @code{TAIL;}, changes +@item TAIL; +has been renamed to @code{INST_TAIL;} (less chance of an accidental +match). + @cindex @code{vm_@var{A}2@var{B}}, changes @item vm_@var{A}2@var{B} now takes two arguments. @@ -1576,6 +1860,16 @@ Also some new macros have to be defined, @node Contact, Copying This Manual, Changes, Top @chapter Contact +To report a bug, use +@url{https://savannah.gnu.org/bugs/?func=addbug&group_id=2672}. 
+ +For discussion on Vmgen (e.g., how to use it), use the mailing list +@email{bug-vmgen@@mail.freesoftware.fsf.org} (use +@url{http://mail.gnu.org/mailman/listinfo/help-vmgen} to subscribe). + +You can find vmgen information at +@url{http://www.complang.tuwien.ac.at/anton/vmgen/}. + @c *********************************************************** @node Copying This Manual, Index, Contact, Top @appendix Copying This Manual