@include version.texi

@c @ifnottex
This file documents vmgen (Gforth @value{VERSION}).

@chapter Introduction

Vmgen is a tool for writing efficient interpreters. It takes a simple
virtual machine description and generates efficient C code for dealing
with the virtual machine code in various ways (in particular, executing
it). The run-time efficiency of the resulting interpreters is usually
within a factor of 10 of machine code produced by an optimizing
compiler.

The interpreter design strategy supported by vmgen is to divide the
interpreter into two parts:

@itemize @bullet

@item The @emph{front end} takes the source code of the language to be
implemented, and translates it into virtual machine code. This is
similar to an ordinary compiler front end; typically an interpreter
front-end performs no optimization, so it is relatively simple to
implement and runs fast.

@item The @emph{virtual machine interpreter} executes the virtual
machine code.

@end itemize

Such a division is usually used in interpreters, for modularity as well
as for efficiency. The virtual machine code is typically passed between
front end and virtual machine interpreter in memory, as in a
load-and-go compiler; this avoids the complexity and time cost of
writing the code to a file and reading it again.

A @emph{virtual machine} (VM) represents the program as a sequence of
@emph{VM instructions}, following each other in memory, similar to real
machine code. Control flow occurs through VM branch instructions, as
in a real machine.

In this setup, vmgen can generate most of the code dealing with virtual
machine instructions from a simple description of the virtual machine
instructions (@pxref...), in particular:

@table @emph

@item VM instruction execution

@item VM code generation
Useful in the front end.

@item VM code decompiler
Useful for debugging the front end.

@item VM code tracing
Useful for debugging the front end and the VM interpreter. You will
typically provide other means for debugging the user's programs at the
source level.

@item VM code profiling
Useful for optimizing the VM interpreter with superinstructions
(@pxref...).

@end table

Vmgen supports efficient interpreters through various optimizations, in
particular

@itemize

@item Threaded code

@item Caching the top-of-stack in a register

@item Combining VM instructions into superinstructions

@item
Replicating VM (super)instructions for better branch target buffer (BTB)
prediction accuracy (not yet in vmgen-ex, but already in Gforth).

@end itemize

As a result, vmgen-based interpreters are only about an order of
magnitude slower than native code from an optimizing C compiler on small
benchmarks; on large benchmarks, which spend more time in the run-time
system, the slowdown is often less (e.g., the slowdown of a
Vmgen-generated JVM interpreter over the best JVM JIT compiler we
measured is only a factor of 2-3 for large benchmarks; some other JITs
and all other interpreters we looked at were slower than our
interpreter).

VMs are usually designed as stack machines (passing data between VM
instructions on a stack), and vmgen supports such designs especially
well; however, you can also use vmgen for implementing a register VM and
still benefit from most of the advantages offered by vmgen.

There are many potential uses of the instruction descriptions that are
not implemented at the moment, but we are open to feature requests, and
we will implement new features if someone asks for them; so the feature
list above is not exhaustive.

@c *********************************************************************
@chapter Why interpreters?

Interpreters are a popular language implementation technique because
they combine all three of the following advantages:

@itemize

@item Ease of implementation

@item Portability

@item Fast edit-compile-run cycle

@end itemize

The main disadvantage of interpreters is their run-time speed. However,
there are huge differences between different interpreters in this area:
the slowdown over optimized C code on programs consisting of simple
operations is typically a factor of 10 for the more efficient
interpreters, and a factor of 1000 for the less efficient ones (the
slowdown for programs executing complex operations is less, because the
time spent in libraries for executing complex operations is the same in
all implementation strategies).

Vmgen makes it even easier to implement interpreters. It also supports
techniques for building efficient interpreters.

@c ********************************************************************
@chapter Concepts

@c --------------------------------------------------------------------
@section Front-end and virtual machine interpreter

@cindex front-end
Interpretive systems are typically divided into a @emph{front end} that
parses the input language and produces an intermediate representation
for the program, and an interpreter that executes the intermediate
representation of the program.

@cindex virtual machine
@cindex VM
@cindex instruction, VM
For efficient interpreters the intermediate representation of choice is
virtual machine code (rather than, e.g., an abstract syntax tree).
@emph{Virtual machine} (VM) code consists of VM instructions arranged
sequentially in memory; they are executed in sequence by the VM
interpreter, except for VM branch instructions, which implement control
structures. The conceptual similarity to real machine code results in
the name @emph{virtual machine}.

In this framework, vmgen supports building the VM interpreter and any
other component dealing with VM instructions. It does not have any
support for the front end, apart from VM code generation support. The
front end can be implemented with classical compiler front-end
techniques, supported by tools like @command{flex} and @command{bison}.

The intermediate representation is usually just internal to the
interpreter, but some systems also support saving it to a file, either
as an image file, or in a full-blown linkable file format (e.g., JVM).
Vmgen currently has no special support for such features, but the
information in the instruction descriptions can be helpful, and we are
open to feature requests and suggestions.

@section Data handling

@cindex stack machine
@cindex register machine
Most VMs use one or more stacks for passing temporary data between VM
instructions. Another option is to use a register machine architecture
for the virtual machine; however, this option is either slower or
significantly more complex to implement than a stack machine architecture.

Vmgen has special support and optimizations for stack VMs, making their
implementation easy and efficient.

You can also implement a register VM with vmgen (@pxref{Register
Machines}), and you will still profit from most vmgen features.

@cindex stack item size
@cindex size, stack items
Stack items all have the same size, so they typically will be as wide as
an integer, pointer, or floating-point value. Vmgen supports treating
two consecutive stack items as a single value, but anything larger is
best kept in some other memory area (e.g., the heap), with pointers to
the data on the stack.
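
The idea of a value that spans two consecutive stack items can be
sketched in plain C as follows. This is only an illustration under
stated assumptions, not code produced by vmgen: the types @code{Cell}
and @code{DCell}, the @code{Stack} structure and all function names are
hypothetical, and the 32-bit item size is chosen only so that a 64-bit
value needs exactly two items.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef int32_t Cell;    /* hypothetical single-width stack item */
typedef int64_t DCell;   /* hypothetical value spanning two stack items */

#define STACK_CELLS 32

typedef struct {
    Cell items[STACK_CELLS];
    Cell *sp;            /* points at the top item; the stack grows downwards */
} Stack;

void stack_init(Stack *s) { s->sp = s->items + STACK_CELLS; }

void push(Stack *s, Cell c)
{
    assert(s->sp > s->items);
    *--s->sp = c;
}

Cell pop(Stack *s)
{
    assert(s->sp < s->items + STACK_CELLS);
    return *s->sp++;
}

/* A double-wide value occupies two consecutive stack items. */
void push_d(Stack *s, DCell d)
{
    assert(sizeof d == 2 * sizeof(Cell));
    assert(s->sp - s->items >= 2);
    s->sp -= 2;
    memcpy(s->sp, &d, sizeof d);   /* byte copy avoids aliasing problems */
}

DCell pop_d(Stack *s)
{
    DCell d;
    assert(s->items + STACK_CELLS - s->sp >= 2);
    memcpy(&d, s->sp, sizeof d);
    s->sp += 2;
    return d;
}
```

Anything wider than two items would, as described above, live elsewhere
(e.g., on the heap), with only a pointer stored in a stack item.
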

@cindex instruction stream
@cindex immediate arguments
Another source of data is the immediate arguments of VM instructions (in
the VM instruction stream). The VM instruction stream is handled
similarly to a stack in vmgen.

@cindex garbage collection
@cindex reference counting
Vmgen has neither built-in support for nor restrictions against
@emph{garbage collection}. If you need garbage collection, you need to
provide it in your run-time libraries. Using @emph{reference counting}
is probably harder, but might be possible (contact us if you are
interested).
@c reference counting might be possible by including counting code in
@c the conversion macros.

@section Dispatch

Understanding this section is probably not necessary for using vmgen,
but it may help. You may want to skip it now, and read it if you find
statements about dispatch methods confusing.

After executing one VM instruction, the VM interpreter has to dispatch
the next VM instruction (vmgen calls the dispatch routine @samp{NEXT}).
Vmgen supports two methods of dispatch:

@table @emph

@item switch dispatch
In this method the VM interpreter contains a giant @code{switch}
statement, with one @code{case} for each VM instruction. The VM
instructions are represented by integers (e.g., produced by an
@code{enum}) in the VM code, and dispatch occurs by loading the next
integer from the VM code, @code{switch}ing on it, and continuing at the
appropriate @code{case}; after executing the VM instruction, the
interpreter jumps back to the dispatch code.

@item threaded code
This method represents a VM instruction in the VM code by the address of
the start of the machine code fragment for executing the VM instruction.
Dispatch consists of loading this address, jumping to it, and
incrementing the VM instruction pointer. Typically the threaded-code
dispatch code is appended directly to the code for executing the VM
instruction. Threaded code cannot be implemented in ANSI C, but it can
be implemented using GNU C's labels-as-values extension (@pxref{labels
as values}).

@end table
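
The two dispatch methods can be contrasted with a hand-written sketch.
It is not vmgen output: the three-instruction VM, the names
@code{run_switch} and @code{run_threaded}, and the @code{TCell} union
are assumptions made for this illustration only. The second function
needs GNU C (labels as values), so it compiles with GCC and compatible
compilers but not with a strict ANSI C compiler.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical three-instruction VM used only for this illustration. */
enum { OP_LIT, OP_ADD, OP_HALT };

/* Switch dispatch: VM instructions are small integers in the VM code. */
long run_switch(const long *code)
{
    long stack[16];
    long *sp = stack + 16;       /* empty stack; grows towards lower addresses */
    const long *ip = code;

    for (;;) {
        switch (*ip++) {         /* load the next opcode and switch on it */
        case OP_LIT:
            *--sp = *ip++;       /* push the immediate argument */
            break;               /* back to the dispatch code */
        case OP_ADD:
            sp[1] += sp[0];
            sp++;
            break;
        case OP_HALT:
            return sp[0];
        default:
            return -1;           /* unknown opcode */
        }
    }
}

/* Threaded code: a VM code cell holds either the address of the code
   fragment for an instruction or an immediate argument. */
typedef union { void *label; long value; } TCell;

#define MAX_CODE 64

long run_threaded(const long *code, size_t n)
{
    static void *const labels[] = { &&do_lit, &&do_add, &&do_halt };
    TCell threaded[MAX_CODE];
    long stack[16];
    long *sp = stack + 16;
    const TCell *ip = threaded;
    size_t i;

    assert(n <= MAX_CODE);
    /* Translate opcodes into code addresses; copy immediates unchanged. */
    for (i = 0; i < n; i++) {
        threaded[i].label = labels[code[i]];
        if (code[i] == OP_LIT && i + 1 < n) {
            i++;
            threaded[i].value = code[i];
        }
    }

#define NEXT goto *(ip++)->label  /* load address, advance ip, jump */
    NEXT;
do_lit:
    *--sp = (ip++)->value;
    NEXT;                         /* dispatch appended to each instruction */
do_add:
    sp[1] += sp[0];
    sp++;
    NEXT;
do_halt:
    return sp[0];
#undef NEXT
}
```

Both functions execute the same VM program; the threaded version has no
central dispatch point, because every code fragment ends with its own
copy of the dispatch code.
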

@c *************************************************************
@chapter Invoking vmgen

The usual way to invoke vmgen is as follows:

@example
vmgen @var{infile}
@end example

Here @var{infile} is the VM instruction description file, which usually
ends in @file{.vmg}. The output filenames are made by taking the
basename of @var{infile} (i.e., the output files will be created in the
current working directory) and replacing @file{.vmg} with @file{-vm.i},
@file{-disasm.i}, @file{-gen.i}, @file{-labels.i}, @file{-profile.i},
and @file{-peephole.i}. E.g., @command{vmgen hack/foo.vmg} will create
@file{foo-vm.i} etc.
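
The naming rule can be written down as a small C function. This is a
sketch of the rule as described above, not the code vmgen itself uses
(vmgen is not written in C); the function name
@code{vmg_output_name} is hypothetical, and the behaviour for an input
name that does not end in @file{.vmg} (the suffix is simply appended) is
an assumption.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build one output file name from infile and a suffix such as "-vm.i".
   Returns 0 on success, -1 if the result does not fit into out. */
int vmg_output_name(const char *infile, const char *suffix,
                    char *out, size_t outsize)
{
    const char *base;
    size_t len;
    int written;

    assert(infile && suffix && out);
    base = strrchr(infile, '/');
    base = base ? base + 1 : infile;          /* drop any directory part */
    len = strlen(base);
    if (len >= 4 && strcmp(base + len - 4, ".vmg") == 0)
        len -= 4;                             /* strip the .vmg extension */
    written = snprintf(out, outsize, "%.*s%s", (int)len, base, suffix);
    return (written >= 0 && (size_t)written < outsize) ? 0 : -1;
}
```

Called with @file{hack/foo.vmg} and the suffix @file{-vm.i}, the
function produces @file{foo-vm.i}, i.e., a name in the current working
directory.
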

The command-line options supported by vmgen are

@table @option

@cindex -h, command-line option
@cindex --help, command-line option
@item --help
@itemx -h
Print a message about the command-line options.

@cindex -v, command-line option
@cindex --version, command-line option
@item --version
@itemx -v
Print version and exit.
@end table

@c env vars GFORTHDIR GFORTHDATADIR

@c ****************************************************************
@chapter Example

@section Example overview

There are two versions of the same example for using vmgen:
@file{vmgen-ex} and @file{vmgen-ex2} (you can also see Gforth as an
example, but it uses additional (undocumented) features, and also
differs in some other respects). The example implements @emph{mini}, a
tiny Modula-2-like language with a small JavaVM-like virtual machine.
The difference between the examples is that @file{vmgen-ex} uses many
casts, and @file{vmgen-ex2} tries to avoid most casts and uses unions
instead.

The files provided with each example are:

@example
Makefile
README
disasm.c           wrapper file
engine.c           wrapper file
peephole.c         wrapper file
profile.c          wrapper file
mini-inst.vmg      simple VM instructions
mini-super.vmg     superinstructions (empty at first)
mini.h             common declarations
mini.l             scanner
mini.y             front end (parser, VM code generator)
support.c          main() and other support functions
fib.mini           example mini program
simple.mini        example mini program
test.mini          example mini program (tests everything)
test.out           test.mini output
stat.awk           script for aggregating profile information
peephole-blacklist list of instructions not allowed in superinstructions
seq2rule.awk       script for creating superinstructions
@end example

For your own interpreter, you would typically copy the following files
and change little, if anything:

@example
disasm.c           wrapper file
engine.c           wrapper file
peephole.c         wrapper file
profile.c          wrapper file
stat.awk           script for aggregating profile information
seq2rule.awk       script for creating superinstructions
@end example

You would typically change much in or replace the following files:

@example
Makefile
mini-inst.vmg      simple VM instructions
mini.h             common declarations
mini.l             scanner
mini.y             front end (parser, VM code generator)
support.c          main() and other support functions
peephole-blacklist list of instructions not allowed in superinstructions
@end example

You can build the example by @code{cd}ing into the example's directory,
and then typing @samp{make}; you can check that it works with @samp{make
check}. You can run mini programs like this:

@example
./mini fib.mini
@end example

To learn about the options, type @samp{./mini -h}.

@section Using profiling to create superinstructions

I have not added rules for this in the @file{Makefile} (there are many
options for selecting superinstructions, and I did not want to hardcode
one into the @file{Makefile}), but there are some supporting scripts, and
here's an example:

If you want to use @file{fib.mini} and @file{test.mini} as training
programs, you get the profiles like this:

@example
make fib.prof test.prof #takes a few seconds
@end example

You can aggregate these profiles with @file{stat.awk}:

@example
awk -f stat.awk fib.prof test.prof
@end example

The result contains lines like:

@example
2 16 36910041 loadlocal lit
@end example

This means that the sequence @code{loadlocal lit} statically occurs a
total of 16 times in 2 profiles, with a dynamic execution count of
36910041.

The numbers can be used in various ways to select superinstructions.
E.g., if you just want to select all sequences with a dynamic
execution count of at least 10000, you would use the following pipeline:

@example
awk -f stat.awk fib.prof test.prof|
awk '$3>=10000'|                  #select sequences
fgrep -v -f peephole-blacklist|   #eliminate wrong instructions
awk -f seq2rule.awk|              #transform sequences into superinstruction rules
sort -k 3 >mini-super.vmg         #sort sequences
@end example

The file @file{peephole-blacklist} contains all instructions that
directly access a stack or stack pointer (for mini: @code{call},
@code{return}); the sort step is necessary to ensure that prefixes
precede larger superinstructions.
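
For readers who prefer C to awk, the selection criterion can be sketched
as follows. This is only an illustration of how the three numeric
fields and the blacklist are used; it does not replace the scripts
shipped with the example. The structure @code{ProfileLine} and all
function names are hypothetical, and the blacklist test is applied to
the instruction names only, whereas @command{fgrep} matches anywhere in
the line.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef struct {
    long profiles;        /* number of profiles containing the sequence */
    long static_count;    /* static occurrences over all profiles */
    long dynamic_count;   /* dynamic execution count */
    char sequence[128];   /* VM instruction names, separated by spaces */
} ProfileLine;

/* Parse one line of aggregated profile output; returns 1 on success. */
int parse_profile_line(const char *line, ProfileLine *p)
{
    assert(line && p);
    return sscanf(line, "%ld %ld %ld %127[^\n]",
                  &p->profiles, &p->static_count,
                  &p->dynamic_count, p->sequence) == 4;
}

/* Fixed-string search, in the spirit of fgrep: substring matches count. */
int is_blacklisted(const ProfileLine *p,
                   const char *const *blacklist, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        if (strstr(p->sequence, blacklist[i]) != NULL)
            return 1;
    return 0;
}

/* A sequence is a superinstruction candidate if it is executed often
   enough and contains no blacklisted instruction. */
int is_candidate(const ProfileLine *p, long threshold,
                 const char *const *blacklist, size_t n)
{
    return p->dynamic_count >= threshold
        && !is_blacklisted(p, blacklist, n);
}
```

With the threshold 10000 and the blacklist @code{call}, @code{return},
the line shown above (@code{loadlocal lit}, dynamic count 36910041) is
accepted.
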

Now you can create a version of mini with superinstructions by just
saying @samp{make}.

@c ***************************************************************
@chapter Input File Format

Vmgen takes as input a file containing specifications of virtual machine
instructions. This file usually has a name ending in @file{.vmg}.

Most examples are taken from the example in @file{vmgen-ex}.

@section Input File Grammar

The grammar is in EBNF format, with @code{@var{a}|@var{b}} meaning
``@var{a} or @var{b}'', @code{@{@var{c}@}} meaning 0 or more repetitions
of @var{c} and @code{[@var{d}]} meaning 0 or 1 repetitions of @var{d}.

Vmgen input is not free-format, so you have to take care where you put
spaces and especially newlines; it's not as bad as makefiles, though:
any sequence of spaces and tabs is equivalent to a single space.

@example
description: @{instruction|comment|eval-escape@}

instruction: simple-inst|super-inst

simple-inst: ident " (" stack-effect " )" newline c-code newline newline

stack-effect: @{ident@} " --" @{ident@}

super-inst: ident " =" ident @{ident@}

comment: "\ " text newline

eval-escape: "\E " text newline
@end example
@c \+ \- \g \f \c

Note that the @code{\}s in this grammar are meant literally, not as
C-style encodings for non-printable characters.

The C code in @code{simple-inst} must not contain empty lines (because
vmgen would mistake that as the end of the simple-inst). The text in
@code{comment} and @code{eval-escape} must not contain a newline.
@code{Ident} must conform to the usual conventions of C identifiers
(otherwise the C compiler would choke on the vmgen output).

Vmgen understands a few extensions beyond the grammar given here, but
these extensions are only useful for building Gforth. You can find a
description of the format used for Gforth in @file{prim}.

@subsection Eval escapes
@c woanders?
The text in @code{eval-escape} is Forth code that is evaluated when
vmgen reads the line. If you do not know (and do not want to learn)
Forth, you can build the text according to the following grammar; these
rules are normally all the Forth you need for using vmgen:

@example
text: stack-decl|type-prefix-decl|stack-prefix-decl

stack-decl: "stack " ident ident ident
type-prefix-decl:
    's" ' string '" ' ("single"|"double") ident "type-prefix" ident
stack-prefix-decl: ident "stack-prefix" string
@end example

Note that the syntax of this code is not checked thoroughly (there are
many other Forth program fragments that could be written there).

If you know Forth, the stack effects of the non-standard words involved
are:

@example
stack        ( "name" "pointer" "type" -- )
             ( name execution: -- stack )
type-prefix  ( addr u xt1 xt2 n stack "prefix" -- )
single       ( -- xt1 xt2 n )
double       ( -- xt1 xt2 n )
stack-prefix ( stack "prefix" -- )
@end example


@section Simple instructions

We will use the following simple VM instruction description as an
example:

@example
sub ( i1 i2 -- i )
i = i1-i2;
@end example

The first line specifies the name of the VM instruction (@code{sub}) and
its stack effect (@code{i1 i2 -- i}). The rest of the description is
just plain C code.

@cindex stack effect
The stack effect specifies that @code{sub} pulls two integers from the
data stack and puts them in the C variables @code{i1} and @code{i2} (with
the rightmost item (@code{i2}) taken from the top of stack) and later
pushes one integer (@code{i}) on the data stack (the rightmost item is
on the top afterwards).
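
To make the stack effect concrete, here is a hand-written C function
that does for @code{sub} what the paragraph above describes. It is an
illustration only and not the code that vmgen generates; the function
name @code{vm_sub} is hypothetical, @code{Cell} is taken to be
@code{long} as in @file{mini.h}, and the stack is assumed to grow
towards lower addresses.

```c
#include <assert.h>

typedef long Cell;   /* assumption: Cell is long, as in mini.h */

/* Illustration of sub ( i1 i2 -- i ): takes the data stack pointer and
   returns the updated one. */
Cell *vm_sub(Cell *sp)
{
    Cell i1, i2, i;

    assert(sp);
    i2 = sp[0];      /* rightmost input item is the top of stack */
    i1 = sp[1];
    sp += 1;         /* two items consumed, one produced */
    i = i1 - i2;     /* the user-supplied C code */
    sp[0] = i;       /* the result is on top afterwards */
    return sp;
}
```

Only the line @code{i = i1 - i2;} corresponds to the C code written in
the instruction description; the loads, the stack pointer update and
the store are what the stack effect stands for.
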

How do we know the type and stack of the stack items? Vmgen uses
prefixes, similar to Fortran; in contrast to Fortran, you have to
define the prefix first:

@example
\E s" Cell" single data-stack type-prefix i
@end example

This defines the prefix @code{i} to refer to the type @code{Cell}
(defined as @code{long} in @file{mini.h}) and, by default, to the
@code{data-stack}. It also specifies that this type takes one stack
item (@code{single}). The type prefix is part of the variable name.

Before we can use @code{data-stack} in this way, we have to define it:

@example
\E stack data-stack sp Cell
@end example
@c !! use something other than Cell

This line defines the stack @code{data-stack}, which uses the stack
pointer @code{sp}, and each item has the basic type @code{Cell}; other
types have to fit into one or two @code{Cell}s (depending on whether the
type is @code{single} or @code{double} wide), and are converted from and
to Cells on accessing the @code{data-stack} with conversion macros
(@pxref{Conversion macros}). Stacks grow towards lower addresses in
vmgen-erated interpreters.

We can override the default stack of a stack item by using a stack
prefix. E.g., consider the following instruction:

@example
lit ( #i -- i )
@end example

The VM instruction @code{lit} takes the item @code{i} from the
instruction stream (indicated by the prefix @code{#}), and pushes it on
the (default) data stack. The stack prefix is not part of the variable
name. Stack prefixes are defined like this:

@example
\E inst-stream stack-prefix #
@end example

This definition specifies that the stack prefix @code{#} refers to the
``stack'' @code{inst-stream}. Since the instruction stream behaves a
little differently than an ordinary stack, it is predefined, and you do
not need to define it.

The instruction stream contains instructions and their immediate
arguments, so specifying that an argument comes from the instruction
stream indicates an immediate argument. Of course, instruction stream
arguments can only appear to the left of @code{--} in the stack effect.
If there are multiple instruction stream arguments, the leftmost is the
first one (just as the intuition suggests).
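
The handling of immediate arguments can be pictured with two small C
functions. They are hand-written illustrations, not vmgen output: the
function names are hypothetical, and the second instruction,
@code{litsub ( #i1 #i2 -- i )}, is invented here only to show the order
in which several immediate arguments are fetched.

```c
#include <assert.h>

typedef long Cell;   /* assumption: Cell is long, as in mini.h */

/* Illustration of lit ( #i -- i ): fetch the immediate argument from
   the instruction stream and push it on the data stack. */
void vm_lit(const Cell **ipp, Cell **spp)
{
    Cell i;

    assert(ipp && spp);
    i = *(*ipp)++;          /* the immediate follows the instruction */
    *--(*spp) = i;          /* the stack grows towards lower addresses */
}

/* Hypothetical litsub ( #i1 #i2 -- i ): the leftmost immediate argument
   is the first one in the instruction stream. */
void vm_litsub(const Cell **ipp, Cell **spp)
{
    Cell i1, i2;

    assert(ipp && spp);
    i1 = *(*ipp)++;         /* leftmost argument comes first */
    i2 = *(*ipp)++;
    *--(*spp) = i1 - i2;
}
```

In both functions the instruction pointer is assumed to point just
behind the instruction itself, i.e., at its first immediate argument.
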

@subsubsection C Code Macros

Vmgen recognizes the following strings in the C code part of simple
instructions:

@table @samp

@item SET_IP
As far as vmgen is concerned, a VM instruction containing this ends a VM
basic block (used in profiling to delimit profiled sequences). On the C
level, this also sets the instruction pointer.

@item SUPER_END
This ends a basic block (for profiling), without a SET_IP.

@item TAIL;
Vmgen replaces @samp{TAIL;} with code for ending a VM instruction and
dispatching the next VM instruction. This happens automatically when
control reaches the end of the C code. If you want to have this in the
middle of the C code, you need to use @samp{TAIL;}. A typical example
is a conditional VM branch:

@example
if (branch_condition) @{
  SET_IP(target); TAIL;
@}
/* implicit tail follows here */
@end example

In this example, @samp{TAIL;} is not strictly necessary, because there
is another one implicitly after the if-statement, but using it improves
branch prediction accuracy slightly and allows other optimizations.

@item SUPER_CONTINUE
This indicates that the implicit tail at the end of the VM instruction
dispatches the sequentially next VM instruction even if there is a
@code{SET_IP} in the VM instruction. This enables an optimization that
is not yet implemented in the vmgen-ex code (but is in Gforth). The
typical application is in conditional VM branches:

@example
if (branch_condition) @{
  SET_IP(target); TAIL; /* now this TAIL is necessary */
@}
SUPER_CONTINUE;
@end example

@end table
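
The following self-contained sketch shows a conditional VM branch with
an explicit tail in a tiny threaded-code interpreter. It is not taken
from vmgen or its wrapper files: the macros @code{SET_IP} and
@code{TAIL} are hypothetical stand-ins defined here just for the
illustration, the four VM instructions are invented, and the code needs
GNU C (labels as values).

```c
#include <assert.h>

typedef union cell {
    void *label;          /* address of the code for a VM instruction */
    long value;           /* immediate integer argument */
    union cell *target;   /* immediate branch target */
} Cell;

/* Hypothetical stand-ins for the macros named in the text. */
#define SET_IP(p) (ip = (p))
#define TAIL      goto *(ip++)->label

/* Runs "n = start; do n--; while (n != 0);" as VM code and counts the
   executed VM instructions. */
long run_countdown(long start, long *executed)
{
    Cell code[6];
    Cell *ip = code;
    Cell *target;
    long n = 0;

    assert(start > 0 && executed);
    code[0].label  = &&lit;
    code[1].value  = start;
    code[2].label  = &&dec;               /* loop head */
    code[3].label  = &&branch_if_nonzero;
    code[4].target = &code[2];
    code[5].label  = &&halt;
    *executed = 0;
    TAIL;                                 /* dispatch the first instruction */

lit:
    ++*executed;
    n = (ip++)->value;
    TAIL;
dec:
    ++*executed;
    n--;
    TAIL;
branch_if_nonzero:
    ++*executed;
    target = (ip++)->target;
    if (n != 0) {
        SET_IP(target);
        TAIL;                             /* explicit tail in the taken path */
    }
    TAIL;                                 /* the tail vmgen would add implicitly */
halt:
    ++*executed;
    return n;
}
```

The taken and the not-taken path each end in their own indirect jump,
which is the effect of writing @samp{TAIL;} inside the
@code{if}-statement.
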

Note that vmgen is not smart about C-level tokenization, comments,
strings, or conditional compilation, so it will interpret even a
commented-out SUPER_END as ending a basic block (or, e.g.,
@samp{RETAIL;} as @samp{TAIL;}). Conversely, vmgen requires the literal
presence of these strings; vmgen will not see them if they are hiding in
a C preprocessor macro.
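
The consequences of such purely textual matching can be reproduced with
a plain substring search. The function below is only a model of the
behaviour described above, under the assumption that the match is an
ordinary substring match; it is not vmgen's actual implementation, and
the name @code{mentions} is hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if marker occurs anywhere in c_code, without any regard
   for C tokens, comments or strings. */
int mentions(const char *c_code, const char *marker)
{
    assert(c_code && marker);
    return strstr(c_code, marker) != NULL;
}
```

With this model, @samp{RETAIL;} counts as a mention of @samp{TAIL;} and
a commented-out @samp{SUPER_END} is still found, whereas a
@samp{SUPER_END} that only appears in the expansion of a preprocessor
macro is invisible, because the search sees the unexpanded text.
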


@subsubsection C Code restrictions

Vmgen generates code and performs some optimizations under the
assumption that the user-supplied C code does not access the stack
pointers or stack items, and that accesses to the instruction pointer
only occur through special macros. In general you should heed these
restrictions. However, if you need to break these restrictions, read
the following.

Accessing a stack or stack pointer directly can be a problem for several
reasons:

@itemize

@item
You may cache the top-of-stack item in a local variable (that is
allocated to a register). This is the most frequent source of trouble.
You can deal with it either by not using top-of-stack caching (slowdown
factor 1-1.4, depending on machine), or by inserting flushing code
(e.g., @samp{IF_spTOS(sp[...] = spTOS);}) at the start and reloading
code (e.g., @samp{IF_spTOS(spTOS = sp[0])}) at the end of problematic C
code. Vmgen inserts a stack pointer update before the start of the
user-supplied C code, so the flushing code has to use an index that
corrects for that. In the future, this flushing may be done
automatically by mentioning a special string in the C code.
@c sometimes flushing and/or reloading unnecessary
| 634 : |
|
|
|
| 635 : |
|
|
@item |
| 636 : |
|
|
The vmgen-erated code loads the stack items from stack-pointer-indexed |
| 637 : |
|
|
memory into variables before the user-supplied C code, and stores them |
| 638 : |
|
|
from variables to stack-pointer-indexed memory afterwards. If you do |
| 639 : |
|
|
any writes to the stack through its stack pointer in your C code, it |
| 640 : |
|
|
will not affect the variables, and your write may be overwritten by the |
| 641 : |
|
|
stores after the C code. Similarly, a read from a stack using a stack |
| 642 : |
|
|
pointer will not reflect computations of stack items in the same VM |
| 643 : |
|
|
instruction. |
| 644 : |
|
|
|
| 645 : |
|
|
@item |
| 646 : |
|
|
Superinstructions keep stack items in variables across the whole |
| 647 : |
|
|
superinstruction. So you should not include VM instructions that |
| 648 : |
|
|
access a stack or stack pointer as components of superinstructions. |
| 649 : |
|
|
|
| 650 : |
|
|
@end itemize |
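The flushing and reloading described in the first item can be sketched in plain C as follows. This is only an illustration, not vmgen output: @code{Cell}, @code{sp}, @code{spTOS} and @code{IF_spTOS} follow the names used in the text, whereas @code{sum_items} and @code{sum_with_flush} are hypothetical helpers, and the index 0 assumes that no stack pointer update is pending (in real user-supplied code the index has to correct for the update that vmgen inserts).

```c
#include <assert.h>

typedef long Cell;

/* Top-of-stack caching is assumed to be enabled in this sketch, so
   IF_spTOS executes its argument; without caching it would expand to
   nothing. */
#define IF_spTOS(x) x

/* Hypothetical stand-in for "problematic C code": it reads n stack
   items directly through the stack pointer. */
static Cell sum_items(const Cell *sp, int n)
{
    Cell sum = 0;
    for (int i = 0; i < n; i++)
        sum += sp[i];
    return sum;
}

/* Simulates a code fragment in which the top-of-stack item lives in
   the variable spTOS while sp[0] in memory may be stale. */
Cell sum_with_flush(Cell *sp, Cell spTOS, int n)
{
    assert(n >= 1);
    IF_spTOS(sp[0] = spTOS);   /* flush the cached item to memory */
    Cell result = sum_items(sp, n);
    IF_spTOS(spTOS = sp[0]);   /* reload, in case memory was changed */
    (void)spTOS;               /* the sketch has no further use for it */
    return result;
}
```

Without the flush, @code{sum_items} would read whatever stale value happens to be in @code{sp[0]} instead of the cached top-of-stack item.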
| 651 : |
|
|
|
| 652 : |
|
|
You should access the instruction pointer only through its special |
| 653 : |
|
|
macros (@samp{IP}, @samp{SET_IP}, @samp{IPTOS}); this ensures that these |
| 654 : |
|
|
macros can be implemented in several ways for best performance. |
| 655 : |
|
|
@samp{IP} points to the next instruction, and @samp{IPTOS} is its |
| 656 : |
|
|
contents. |
| 657 : |
|
|
|
| 658 : |
|
|
|
| 659 : |
anton
|
1.3
|
@section Superinstructions |
| 660 : |
anton
|
1.5
|
|
| 661 : |
anton
|
1.8
|
Note: don't invest too much work in (static) superinstructions; a future |
| 662 : |
|
|
version of vmgen will support dynamic superinstructions (see Ian |
| 663 : |
|
|
Piumarta and Fabio Riccardi, @cite{Optimizing Direct Threaded Code by |
| 664 : |
|
|
Selective Inlining}, PLDI'98), and static superinstructions have much |
| 665 : |
|
|
less benefit in that context. |
| 666 : |
|
|
|
| 667 : |
anton
|
1.5
|
Here is an example of a superinstruction definition: |
| 668 : |
|
|
|
| 669 : |
|
|
@example |
| 670 : |
|
|
lit_sub = lit sub |
| 671 : |
|
|
@end example |
| 672 : |
|
|
|
| 673 : |
|
|
@code{lit_sub} is the name of the superinstruction, and @code{lit} and |
| 674 : |
|
|
@code{sub} are its components. This superinstruction performs the same |
| 675 : |
|
|
action as the sequence @code{lit} and @code{sub}. It is generated |
| 676 : |
|
|
automatically by the VM code generation functions whenever that sequence |
| 677 : |
|
|
occurs, so you only need to add this definition if you want to use this |
| 678 : |
|
|
superinstruction (and even that can be partially automated, |
| 679 : |
|
|
@pxref{...}). |
| 680 : |
|
|
|
| 681 : |
|
|
Vmgen requires that the component instructions are simple instructions |
| 682 : |
|
|
defined before superinstructions using the components. Currently, vmgen |
| 683 : |
|
|
also requires that all the subsequences at the start of a |
| 684 : |
|
|
superinstruction (prefixes) must be defined as superinstructions before |
| 685 : |
|
|
the superinstruction. I.e., if you want to define a superinstruction |
| 686 : |
|
|
|
| 687 : |
|
|
@example |
| 688 : |
|
|
sumof5 = add add add add |
| 689 : |
|
|
@end example |
| 690 : |
|
|
|
| 691 : |
|
|
you first have to define |
| 692 : |
|
|
|
| 693 : |
|
|
@example |
| 694 : |
|
|
add ( n1 n2 -- n ) |
| 695 : |
|
|
n = n1+n2; |
| 696 : |
|
|
|
| 697 : |
|
|
sumof3 = add add |
| 698 : |
|
|
sumof4 = add add add |
| 699 : |
|
|
@end example |
| 700 : |
|
|
|
| 701 : |
|
|
Here, @code{sumof4} is the longest prefix of @code{sumof5}, and @code{sumof3} |
| 702 : |
|
|
is the longest prefix of @code{sumof4}. |
| 703 : |
|
|
|
| 704 : |
|
|
Note that vmgen assumes that only the code it generates accesses stack |
| 705 : |
|
|
pointers, the instruction pointer, and various stack items, and it |
| 706 : |
|
|
performs optimizations based on this assumption. Therefore, VM |
| 707 : |
|
|
instructions that change the instruction pointer should only be used as |
| 708 : |
|
|
last component; a VM instruction that accesses a stack pointer should |
| 709 : |
|
|
not be used as component at all. Vmgen does not check these |
| 710 : |
|
|
restrictions; violations just result in bugs in your interpreter. |
| 711 : |
|
|
|
| 712 : |
|
|
@c ******************************************************************** |
| 713 : |
|
|
@chapter Using the generated code |
| 714 : |
|
|
|
| 715 : |
|
|
The easiest way to create a working VM interpreter with vmgen is |
| 716 : |
|
|
probably to start with one of the examples, and modify it for your |
| 717 : |
|
|
purposes. This chapter is just the reference manual for the macros |
| 718 : |
|
|
etc. used by the generated code, and the other context expected by the |
| 719 : |
|
|
generated code, and what you can do with the various generated files. |
| 720 : |
|
|
|
| 721 : |
anton
|
1.6
|
|
| 722 : |
anton
|
1.5
|
@section VM engine |
| 723 : |
|
|
|
| 724 : |
|
|
The VM engine is the VM interpreter that executes the VM code. It is |
| 725 : |
|
|
essential for an interpretive system. |
| 726 : |
|
|
|
| 727 : |
anton
|
1.6
|
Vmgen supports two methods of VM instruction dispatch: @emph{threaded |
| 728 : |
|
|
code} (fast, but gcc-specific), and @emph{switch dispatch} (slow, but |
| 729 : |
|
|
portable across C compilers); you can use conditional compilation |
| 730 : |
|
|
(@samp{defined(__GNUC__)}) to choose between these methods, and our |
| 731 : |
|
|
example does so. |
| 732 : |
|
|
|
| 733 : |
|
|
For both methods, the VM engine is contained in a C-level function. |
| 734 : |
|
|
Vmgen generates most of the contents of the function for you |
| 735 : |
|
|
(@file{@var{name}-vm.i}), but you have to define this function, and |
| 736 : |
|
|
macros and variables used in the engine, and initialize the variables. |
| 737 : |
|
|
In our example the engine function also includes |
| 738 : |
|
|
@file{@var{name}-labels.i} (@pxref{VM instruction table}). |
| 739 : |
|
|
|
| 740 : |
|
|
The following macros and variables are used in @file{@var{name}-vm.i}: |
| 741 : |
anton
|
1.5
|
|
| 742 : |
|
|
@table @code |
| 743 : |
|
|
|
| 744 : |
|
|
@item LABEL(@var{inst_name}) |
| 745 : |
|
|
This is used just before each VM instruction to provide a jump or |
| 746 : |
|
|
@code{switch} label (the @samp{:} is provided by vmgen). For switch |
| 747 : |
|
|
dispatch this should expand to @samp{case @var{label}}; for |
| 748 : |
|
|
threaded-code dispatch this should just expand to |
| 749 : |
|
|
@samp{@var{label}}. In either case @var{label} is usually the @var{inst_name} |
| 750 : |
|
|
with some prefix or suffix to avoid naming conflicts. |
| 751 : |
|
|
|
| 752 : |
|
|
@item NAME(@var{inst_name_string}) |
| 753 : |
|
|
Called on entering a VM instruction with a string containing the name of |
| 754 : |
|
|
the VM instruction as parameter. In normal execution this should be a |
| 755 : |
|
|
no-op, but for tracing this usually prints the name, and possibly other |
| 756 : |
|
|
information (several VM registers in our example). |
| 757 : |
|
|
|
| 758 : |
|
|
@item DEF_CA |
| 759 : |
|
|
Usually empty. Called just inside a new scope at the start of a VM |
| 760 : |
|
|
instruction. Can be used to define variables that should be visible |
| 761 : |
|
|
during every VM instruction. If you define this macro as non-empty, you |
| 762 : |
|
|
have to provide the finishing @samp{;} in the macro. |
| 763 : |
|
|
|
| 764 : |
|
|
@item NEXT_P0 NEXT_P1 NEXT_P2 |
| 765 : |
|
|
The three parts of instruction dispatch. They can be defined in |
| 766 : |
|
|
different ways for best performance on various processors (see |
| 767 : |
|
|
@file{engine.c} in the example or @file{engine/threaded.h} in Gforth). |
| 768 : |
|
|
@samp{NEXT_P0} is invoked right at the start of the VM instruction (but |
| 769 : |
|
|
after @samp{DEF_CA}), @samp{NEXT_P1} right after the user-supplied C |
| 770 : |
|
|
code, and @samp{NEXT_P2} at the end. The actual jump has to be |
| 771 : |
|
|
performed by @samp{NEXT_P2}. |
| 772 : |
|
|
|
| 773 : |
|
|
The simplest variant is if @samp{NEXT_P2} does everything and the other |
| 774 : |
|
|
macros do nothing. Then also related macros like @samp{IP}, |
| 775 : |
|
|
@samp{SET_IP}, @samp{INC_IP} and @samp{IPTOS} are very |
| 776 : |
|
|
straightforward to define. For switch dispatch this code consists just |
| 777 : |
|
|
of a jump to the dispatch code (@samp{goto next_inst;} in our example); |
| 778 : |
|
|
for direct threaded code it consists of something like |
| 779 : |
|
|
@samp{({cfa=*ip++; goto *cfa;})}. |
| 780 : |
|
|
|
| 781 : |
|
|
Pulling code (usually the @samp{cfa=*ip;}) up into @samp{NEXT_P1} |
| 782 : |
|
|
usually does not cause problems, but pulling things up into |
| 783 : |
|
|
@samp{NEXT_P0} usually requires changing the other macros (and, at least |
| 784 : |
|
|
for Gforth on Alpha, it does not buy much, because the compiler often |
| 785 : |
|
|
manages to schedule the relevant stuff up by itself). An even more |
| 786 : |
|
|
extreme variant is to pull code up even further, into, e.g., NEXT_P1 of |
| 787 : |
|
|
the previous VM instruction (prefetching, useful on PowerPCs). |
| 788 : |
|
|
|
| 789 : |
|
|
@item INC_IP(@var{n}) |
| 790 : |
anton
|
1.8
|
This increments @code{IP} by @var{n}. |
| 791 : |
|
|
|
| 792 : |
|
|
@item SET_IP(@var{target}) |
| 793 : |
|
|
This sets @code{IP} to @var{target}. |
| 794 : |
anton
|
1.5
|
|
| 795 : |
|
|
@item vm_@var{A}2@var{B}(a,b) |
| 796 : |
|
|
Type casting macro that assigns @samp{a} (of type @var{A}) to @samp{b} |
| 797 : |
|
|
(of type @var{B}). This is mainly used for getting stack items into |
| 798 : |
|
|
variables and back. So you need to define macros for every combination |
| 799 : |
|
|
of stack basic type (@code{Cell} in our example) and type-prefix types |
| 800 : |
|
|
used with that stack (in both directions). For the type-prefix type, |
| 801 : |
|
|
you use the type-prefix (not the C type string) as type name (e.g., |
| 802 : |
|
|
@samp{vm_Cell2i}, not @samp{vm_Cell2Cell}). In addition, you have to |
| 803 : |
|
|
define a vm_@var{X}2@var{X} macro for the stack basic type (used in |
| 804 : |
|
|
superinstructions). |
| 805 : |
|
|
|
| 806 : |
|
|
The stack basic type for the predefined @samp{inst-stream} is |
| 807 : |
|
|
@samp{Cell}. If you want a stack with the same item size, making its |
| 808 : |
|
|
basic type @samp{Cell} usually reduces the number of macros you have to |
| 809 : |
|
|
define. |
| 810 : |
|
|
|
| 811 : |
|
|
Here our examples differ a lot: @file{vmgen-ex} uses casts in these |
| 812 : |
|
|
macros, whereas @file{vmgen-ex2} uses union-field selection (or |
| 813 : |
|
|
assignment to union fields). |
| 814 : |
|
|
|
| 815 : |
|
|
@item vm_two@var{A}2@var{B}(a1,a2,b) |
| 816 : |
|
|
@item vm_@var{B}2two@var{A}(b,a1,a2) |
| 817 : |
|
|
Conversions between two stack items (@code{a1}, @code{a2}) and a |
| 818 : |
|
|
variable @code{b} of a type that takes two stack items. This does not |
| 819 : |
|
|
occur in our small examples, but you can look at Gforth for examples. |
| 820 : |
|
|
|
| 821 : |
|
|
@item @var{stackpointer} |
| 822 : |
|
|
For each stack used, the stackpointer name given in the stack |
| 823 : |
|
|
declaration is used. For a regular stack this must be an l-expression; |
| 824 : |
|
|
typically it is a variable declared as a pointer to the stack's basic |
| 825 : |
|
|
type. For @samp{inst-stream}, the name is @samp{IP}, and it can be a |
| 826 : |
|
|
plain r-value; typically it is a macro that abstracts away the |
| 827 : |
|
|
differences between the various implementations of NEXT_P*. |
| 828 : |
|
|
|
| 829 : |
|
|
@item @var{stackpointer}TOS |
| 830 : |
|
|
The top-of-stack for the stack pointed to by @var{stackpointer}. If you |
| 831 : |
|
|
are using top-of-stack caching for that stack, this should be defined as |
| 832 : |
|
|
a variable; if you are not using top-of-stack caching for that stack, this |
| 833 : |
|
|
should be a macro expanding to @samp{@var{stackpointer}[0]}. The stack |
| 834 : |
|
|
pointer for the predefined @samp{inst-stream} is called @samp{IP}, so |
| 835 : |
|
|
the top-of-stack is called @samp{IPTOS}. |
| 836 : |
|
|
|
| 837 : |
|
|
@item IF_@var{stackpointer}TOS(@var{expr}) |
| 838 : |
|
|
Macro for executing @var{expr}, if top-of-stack caching is used for the |
| 839 : |
|
|
@var{stackpointer} stack. I.e., this should do @var{expr} if there is |
| 840 : |
|
|
top-of-stack caching for @var{stackpointer}; otherwise it should do |
| 841 : |
|
|
nothing. |
| 842 : |
|
|
|
| 843 : |
anton
|
1.8
|
@item SUPER_END |
| 844 : |
|
|
This is used by the VM profiler (@pxref{VM profiler}); it should not do |
| 845 : |
|
|
anything in normal operation, and call @code{vm_count_block(IP)} for |
| 846 : |
|
|
profiling. |
| 847 : |
|
|
|
| 848 : |
|
|
@item SUPER_CONTINUE |
| 849 : |
|
|
This is just a hint to vmgen and does nothing at the C level. |
| 850 : |
|
|
|
| 851 : |
anton
|
1.5
|
@item VM_DEBUG |
| 852 : |
|
|
If this is defined, the tracing code will be compiled in (slower |
| 853 : |
|
|
interpretation, but better debugging). Our example compiles two |
| 854 : |
|
|
versions of the engine, a fast-running one that cannot trace, and one |
| 855 : |
|
|
with potential tracing and profiling. |
| 856 : |
|
|
|
| 857 : |
|
|
@item vm_debug |
| 858 : |
|
|
Needed only if @samp{VM_DEBUG} is defined. If this variable contains |
| 859 : |
|
|
true, the VM instructions produce trace output. It can be turned on or |
| 860 : |
|
|
off at any time. |
| 861 : |
|
|
|
| 862 : |
|
|
@item vm_out |
| 863 : |
|
|
Needed only if @samp{VM_DEBUG} is defined. Specifies the file on which |
| 864 : |
|
|
to print the trace output (type @samp{FILE *}). |
| 865 : |
|
|
|
| 866 : |
|
|
@item printarg_@var{type}(@var{value}) |
| 867 : |
|
|
Needed only if @samp{VM_DEBUG} is defined. Macro or function for |
| 868 : |
|
|
printing @var{value} in a way appropriate for the @var{type}. This is |
| 869 : |
|
|
used for printing the values of stack items during tracing. @var{Type} |
| 870 : |
|
|
is normally the type prefix specified in a @code{type-prefix} definition |
| 871 : |
|
|
(e.g., @samp{printarg_i}); in superinstructions it is currently the |
| 872 : |
|
|
basic type of the stack. |
| 873 : |
|
|
|
| 874 : |
|
|
@end table |
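The two styles of @code{vm_@var{A}2@var{B}} macros mentioned above (casts versus union-field selection) might look roughly like this. It is a sketch under assumptions: the type prefix @samp{i}, the union type @code{UCell}, and the functions @code{add_cast} and @code{add_union} are invented for illustration; the functions only mimic the load/compute/store pattern of generated code for an instruction @samp{add ( i1 i2 -- i )} on a stack that grows towards lower addresses.

```c
#include <assert.h>
#include <stddef.h>

/* Variant 1: stack basic type Cell, conversions done with casts. */
typedef long Cell;
#define vm_Cell2i(a, b)    ((b) = (int)(a))
#define vm_i2Cell(a, b)    ((b) = (Cell)(a))
#define vm_Cell2Cell(a, b) ((b) = (a))    /* used in superinstructions */

/* Variant 2: stack basic type is a union, conversions select a field. */
typedef union { long i; char *a; } UCell;
#define vm_UCell2i(a, b)   ((b) = (a).i)
#define vm_i2UCell(a, b)   ((b).i = (a))

/* Load both items into variables, compute, store the result back. */
Cell *add_cast(Cell *sp)
{
    int i1, i2, i;
    assert(sp != NULL);
    vm_Cell2i(sp[1], i1);
    vm_Cell2i(sp[0], i2);
    i = i1 + i2;
    sp += 1;                   /* two items consumed, one produced */
    vm_i2Cell(i, sp[0]);
    return sp;
}

UCell *add_union(UCell *sp)
{
    long i1, i2, i;
    assert(sp != NULL);
    vm_UCell2i(sp[1], i1);
    vm_UCell2i(sp[0], i2);
    i = i1 + i2;
    sp += 1;
    vm_i2UCell(i, sp[0]);
    return sp;
}
```

The cast variant keeps stack items as plain integers, whereas the union variant avoids casts between unrelated types at the price of a field access at every conversion.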
| 875 : |
|
|
|
| 876 : |
anton
|
1.6
|
|
| 877 : |
|
|
@section VM instruction table |
| 878 : |
|
|
|
| 879 : |
|
|
For threaded code we also need to produce a table containing the labels |
| 880 : |
|
|
of all VM instructions. This is needed for VM code generation |
| 881 : |
|
|
(@pxref{VM code generation}), and it has to be done in the engine |
| 882 : |
|
|
function, because the labels are not visible outside. It then has to be |
| 883 : |
|
|
passed outside the function (and assigned to @samp{vm_prim}), to be used |
| 884 : |
|
|
by the VM code generation functions. |
| 885 : |
|
|
|
| 886 : |
|
|
This means that the engine function has to be called first to produce |
| 887 : |
|
|
the VM instruction table, and later, after generating VM code, it has to |
| 888 : |
|
|
be called again to execute the generated VM code (yes, this is ugly). |
| 889 : |
|
|
In our example program, these two modes of calling the engine function |
| 890 : |
|
|
are differentiated by the value of the parameter @code{ip0} (if it equals 0, |
| 891 : |
|
|
then the table is passed out, otherwise the VM code is executed); in our |
| 892 : |
|
|
example, we pass the table out by assigning it to @samp{vm_prim} and |
| 893 : |
|
|
returning from @samp{engine}. |
| 894 : |
|
|
|
| 895 : |
|
|
In our example, we also build such a table for switch dispatch; this is |
| 896 : |
|
|
mainly done for uniformity. |
| 897 : |
|
|
|
| 898 : |
|
|
For switch dispatch, we also need to define the VM instruction opcodes |
| 899 : |
|
|
used as case labels in an @code{enum}. |
| 900 : |
|
|
|
| 901 : |
|
|
For both purposes (VM instruction table, and enum), the file |
| 902 : |
|
|
@file{@var{name}-labels.i} is generated by vmgen. You have to define |
| 903 : |
|
|
the following macro used in this file: |
| 904 : |
anton
|
1.5
|
|
| 905 : |
|
|
@table @samp |
| 906 : |
|
|
|
| 907 : |
|
|
@item INST_ADDR(@var{inst_name}) |
| 908 : |
|
|
For switch dispatch, this is just the name of the switch label (the same |
| 909 : |
anton
|
1.6
|
name as used in @samp{LABEL(@var{inst_name})}), for both uses of |
| 910 : |
|
|
@file{@var{name}-labels.i}. For threaded-code dispatch, this is the |
| 911 : |
|
|
address of the label defined in @samp{LABEL(@var{inst_name})}; the |
| 912 : |
|
|
address is taken with @samp{&&} (@pxref{labels-as-values}). |
| 913 : |
anton
|
1.5
|
|
| 914 : |
|
|
@end table |
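The following hand-written miniature engine illustrates how @code{LABEL}, @code{INST_ADDR}, @code{NEXT_P2} and the two calling modes fit together. It is a sketch, not the vmgen-ex engine: the three VM instructions, the @samp{I_} label prefix, the @samp{N_} table indices and the labels array are assumptions made for this example, and the instruction bodies are written by hand here instead of being included from @file{@var{name}-vm.i} and @file{@var{name}-labels.i}.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__GNUC__)
/* threaded code: a VM instruction is the address of its C label */
typedef void *Inst;
#define LABEL(name)     I_##name
#define INST_ADDR(name) &&I_##name
#define NEXT_P2         do { Inst cfa = *ip++; goto *cfa; } while (0)
#else
/* switch dispatch: a VM instruction is an integer opcode */
typedef intptr_t Inst;
enum { I_lit, I_add, I_halt };
#define LABEL(name)     case I_##name
#define INST_ADDR(name) I_##name
#define NEXT_P2         goto next_inst
#endif

Inst *vm_prim;                   /* VM instruction table */
enum { N_lit, N_add, N_halt };   /* indices into vm_prim */

intptr_t engine(Inst *ip0)
{
    static Inst labels[] = {
        INST_ADDR(lit), INST_ADDR(add), INST_ADDR(halt)
    };
    intptr_t stack[16];
    intptr_t *sp = stack + 16;   /* grows towards lower addresses */
    Inst *ip = ip0;

    if (ip0 == NULL) {           /* first mode: pass the table out */
        vm_prim = labels;
        return 0;
    }
#if defined(__GNUC__)
    NEXT_P2;                     /* dispatch the first VM instruction */
#else
next_inst:
    switch (*ip++) {
#endif
    LABEL(lit):                  /* lit ( #i -- i ) */
        *--sp = (intptr_t)*ip++;
        NEXT_P2;
    LABEL(add):                  /* add ( i1 i2 -- i ) */
        sp[1] += sp[0];
        sp++;
        NEXT_P2;
    LABEL(halt):                 /* halt ( i -- ), returns i to C */
        return sp[0];
#if !defined(__GNUC__)
    }
    return 0;                    /* unknown opcode */
#endif
}
```

A caller would first invoke @code{engine(NULL)} to obtain @code{vm_prim}, then build VM code from the table entries (e.g., @code{vm_prim[N_lit]} followed by a literal cast to @code{Inst}), and finally call @code{engine} again with a pointer to that code.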
| 915 : |
|
|
|
| 916 : |
|
|
|
| 917 : |
anton
|
1.6
|
@section VM code generation |
| 918 : |
|
|
|
| 919 : |
|
|
Vmgen generates VM code generation functions in @file{@var{name}-gen.i} |
| 920 : |
|
|
that the front end can call to generate VM code. This is essential for |
| 921 : |
|
|
an interpretive system. |
| 922 : |
|
|
|
| 923 : |
|
|
For a VM instruction @samp{x ( #a b #c -- d )}, vmgen generates a |
| 924 : |
|
|
function with the prototype |
| 925 : |
|
|
|
| 926 : |
|
|
@example |
| 927 : |
|
|
void gen_x(Inst **ctp, a_type a, c_type c) |
| 928 : |
|
|
@end example |
| 929 : |
|
|
|
| 930 : |
|
|
The @code{ctp} argument points to a pointer to the next instruction. |
| 931 : |
|
|
@code{*ctp} is increased by the generation functions; i.e., you should |
| 932 : |
|
|
allocate memory for the code to be generated beforehand, and start with |
| 933 : |
|
|
@code{*ctp} set to the start of this memory area. Before running out of |
| 934 : |
|
|
memory, allocate a new area, and generate a VM-level jump to the new |
| 935 : |
|
|
area (this is not implemented in our examples). |
| 936 : |
|
|
|
| 937 : |
|
|
The other arguments correspond to the immediate arguments of the VM |
| 938 : |
|
|
instruction (with their appropriate types as defined in the |
| 939 : |
|
|
@code{type-prefix} declaration). |
| 940 : |
|
|
|
| 941 : |
|
|
The following types, variables, and functions are used in |
| 942 : |
|
|
@file{@var{name}-gen.i}: |
| 943 : |
|
|
|
| 944 : |
|
|
@table @samp |
| 945 : |
|
|
|
| 946 : |
|
|
@item Inst |
| 947 : |
|
|
The type of the VM instruction; if you use threaded code, this is |
| 948 : |
|
|
@code{void *}; for switch dispatch this is an integer type. |
| 949 : |
|
|
|
| 950 : |
|
|
@item vm_prim |
| 951 : |
|
|
The VM instruction table (type: @code{Inst *}, @pxref{VM instruction table}). |
| 952 : |
|
|
|
| 953 : |
|
|
@item gen_inst(Inst **ctp, Inst i) |
| 954 : |
|
|
This function compiles the instruction @code{i}. Take a look at it in |
| 955 : |
|
|
@file{vmgen-ex/peephole.c}. It is trivial when you don't want to use |
| 956 : |
|
|
superinstructions (just the last two lines of the example function), and |
| 957 : |
|
|
slightly more complicated in the example due to its ability to use |
| 958 : |
|
|
superinstructions (@pxref{Peephole optimization}). |
| 959 : |
|
|
|
| 960 : |
|
|
@item genarg_@var{type_prefix}(Inst **ctp, @var{type} @var{type_prefix}) |
| 961 : |
|
|
This compiles an immediate argument of @var{type} (as defined in a |
| 962 : |
|
|
@code{type-prefix} definition). These functions are trivial to define |
| 963 : |
|
|
(see @file{vmgen-ex/support.c}). You need one of these functions for |
| 964 : |
|
|
every type that you use as immediate argument. |
| 965 : |
|
|
|
| 966 : |
|
|
@end table |
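As a rough illustration of how these pieces cooperate, the sketch below defines the simple form of @code{gen_inst} (without superinstruction support), one @code{genarg_} function, and two generation functions of the kind vmgen would emit. The opcodes, the type prefix @samp{i}, the table contents and the bodies of @code{gen_lit} and @code{gen_add} are assumptions for this example; the real functions come from @file{@var{name}-gen.i} and the example sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef intptr_t Inst;           /* switch-dispatch flavour: an integer */

enum { I_lit, I_add, I_halt };   /* hypothetical opcodes */
static Inst vm_prim_storage[] = { I_lit, I_add, I_halt };
Inst *vm_prim = vm_prim_storage; /* VM instruction table */

/* Compile VM instruction i at *ctp and advance the code pointer. */
void gen_inst(Inst **ctp, Inst i)
{
    assert(ctp != NULL && *ctp != NULL);
    **ctp = i;
    (*ctp)++;
}

/* Compile an immediate argument for the (assumed) type prefix "i". */
void genarg_i(Inst **ctp, long i)
{
    assert(ctp != NULL && *ctp != NULL);
    **ctp = (Inst)i;
    (*ctp)++;
}

/* Generation function for "lit ( #i -- i )": opcode, then argument. */
void gen_lit(Inst **ctp, long i)
{
    gen_inst(ctp, vm_prim[I_lit]);
    genarg_i(ctp, i);
}

/* Generation function for "add ( i1 i2 -- i )": no immediates. */
void gen_add(Inst **ctp)
{
    gen_inst(ctp, vm_prim[I_add]);
}
```

A front end would allocate a code area, set a pointer @code{cp} to its start, and then call, e.g., @code{gen_lit(&cp, 3); gen_lit(&cp, 4); gen_add(&cp);}, which lays down five cells.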
| 967 : |
|
|
|
| 968 : |
|
|
In addition to using these functions to generate code, you should call |
| 969 : |
|
|
@code{BB_BOUNDARY} at every basic block entry point if you ever want to |
| 970 : |
|
|
use superinstructions (or if you want to use the profiling supported by |
| 971 : |
|
|
vmgen; however, this is mainly useful for selecting superinstructions). |
| 972 : |
|
|
If you use @code{BB_BOUNDARY}, you should also define it (take a look at |
| 973 : |
|
|
its definition in @file{vmgen-ex/mini.y}). |
| 974 : |
|
|
|
| 975 : |
|
|
You do not need to call @code{BB_BOUNDARY} after branches, because you |
| 976 : |
|
|
will not define superinstructions that contain branches in the middle |
| 977 : |
|
|
(and if you did, and it worked, there would be no reason to end the |
| 978 : |
|
|
superinstruction at the branch), and because the branches announce |
| 979 : |
|
|
themselves to the profiler. |
| 980 : |
|
|
|
| 981 : |
|
|
|
| 982 : |
|
|
@section Peephole optimization |
| 983 : |
|
|
|
| 984 : |
|
|
You need peephole optimization only if you want to use |
| 985 : |
|
|
superinstructions. But having the code for it does not hurt much if you |
| 986 : |
|
|
do not use superinstructions. |
| 987 : |
|
|
|
| 988 : |
|
|
A simple greedy peephole optimization algorithm is used for |
| 989 : |
|
|
superinstruction selection: every time @code{gen_inst} compiles a VM |
| 990 : |
|
|
instruction, it checks whether it can combine it with the last VM instruction |
| 991 : |
|
|
(which may also be a superinstruction resulting from a previous peephole |
| 992 : |
|
|
optimization); if so, it changes the last instruction to the combined |
| 993 : |
|
|
instruction instead of laying down @code{i} at the current @samp{*ctp}. |
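The greedy combination step can be sketched as follows. This is a simplified illustration and not the code in @file{vmgen-ex/peephole.c}: the linear table @code{peeptable}, the variable @code{last_inst}, the function @code{bb_boundary} (standing in for @code{BB_BOUNDARY}) and the opcodes are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>

typedef long Inst;
enum { I_lit, I_sub, I_add, I_lit_sub };
#define NO_COMBINATION (-1L)

/* Hypothetical table: (previous, next) -> superinstruction. */
static const struct { Inst prev, next, combined; } peeptable[] = {
    { I_lit, I_sub, I_lit_sub },
};

static Inst *last_inst = NULL;   /* opcode cell of the last instruction */

/* Forget the last instruction, so nothing is combined across a
   basic block boundary. */
void bb_boundary(void)
{
    last_inst = NULL;
}

static Inst combine(Inst prev, Inst next)
{
    for (size_t k = 0; k < sizeof peeptable / sizeof peeptable[0]; k++)
        if (peeptable[k].prev == prev && peeptable[k].next == next)
            return peeptable[k].combined;
    return NO_COMBINATION;
}

void gen_inst(Inst **ctp, Inst i)
{
    assert(ctp != NULL && *ctp != NULL);
    if (last_inst != NULL) {
        Inst c = combine(*last_inst, i);
        if (c != NO_COMBINATION) {
            *last_inst = c;      /* rewrite the last opcode in place */
            return;              /* and do not lay down i */
        }
    }
    last_inst = *ctp;
    **ctp = i;
    (*ctp)++;
}

/* Immediate arguments bypass the peephole step and stay where they are. */
void genarg_i(Inst **ctp, long i)
{
    assert(ctp != NULL && *ctp != NULL);
    **ctp = (Inst)i;
    (*ctp)++;
}
```

Because the combined instruction overwrites the opcode of the previous one, immediate arguments that were already laid down (such as the literal of @code{lit}) remain in place and become arguments of the superinstruction.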
| 994 : |
|
|
|
| 995 : |
|
|
The code for peephole optimization is in @file{vmgen-ex/peephole.c}. |
| 996 : |
|
|
You can use this file almost verbatim. Vmgen generates |
| 997 : |
|
|
@file{@var{file}-peephole.i} which contains data for the peephole |
| 998 : |
|
|
optimizer. |
| 999 : |
|
|
|
| 1000 : |
|
|
You have to call @samp{init_peeptable()} after initializing |
| 1001 : |
|
|
@samp{vm_prim}, and before compiling any VM code to initialize data |
| 1002 : |
|
|
structures for peephole optimization. After that, compiling with the VM |
| 1003 : |
|
|
code generation functions will automatically combine VM instructions |
| 1004 : |
|
|
into superinstructions. Since you do not want to combine instructions |
| 1005 : |
|
|
across VM branch targets (otherwise there will not be a proper VM |
| 1006 : |
|
|
instruction to branch to), you have to call @code{BB_BOUNDARY} |
| 1007 : |
|
|
(@pxref{VM code generation}) at branch targets. |
| 1008 : |
|
|
|
| 1009 : |
|
|
|
| 1010 : |
|
|
@section VM disassembler |
| 1011 : |
|
|
|
| 1012 : |
|
|
A VM code disassembler is optional for an interpretive system, but |
| 1013 : |
|
|
highly recommended during its development and maintenance, because it is |
| 1014 : |
|
|
very useful for detecting bugs in the front end (and for distinguishing |
| 1015 : |
|
|
them from VM interpreter bugs). |
| 1016 : |
|
|
|
| 1017 : |
|
|
Vmgen supports VM code disassembling by generating |
| 1018 : |
|
|
@file{@var{file}-disasm.i}. This code has to be wrapped into a |
| 1019 : |
|
|
function, as is done in @file{vmgen-ex/disasm.c}. You can use this file |
| 1020 : |
|
|
almost verbatim. In addition to @samp{vm_@var{A}2@var{B}(a,b)}, |
| 1021 : |
|
|
@samp{vm_out}, @samp{printarg_@var{type}(@var{value})}, which are |
| 1022 : |
|
|
explained above, the following macros and variables are used in |
| 1023 : |
|
|
@file{@var{file}-disasm.i} (and you have to define them): |
| 1024 : |
|
|
|
| 1025 : |
|
|
@table @samp |
| 1026 : |
|
|
|
| 1027 : |
|
|
@item ip |
| 1028 : |
|
|
This variable points to the opcode of the current VM instruction. |
| 1029 : |
|
|
|
| 1030 : |
|
|
@item IP IPTOS |
| 1031 : |
|
|
@samp{IPTOS} is the first argument of the current VM instruction, and |
| 1032 : |
|
|
@samp{IP} points to it; this is just as in the engine, but here |
| 1033 : |
|
|
@samp{ip} points to the opcode of the VM instruction (in contrast to the |
| 1034 : |
|
|
engine, where @samp{ip} points to the next cell, or even one further). |
| 1035 : |
|
|
|
| 1036 : |
|
|
@item VM_IS_INST(Inst i, int n) |
| 1037 : |
|
|
Tests if the opcode @samp{i} is the same as the @samp{n}th entry in the |
| 1038 : |
|
|
VM instruction table. |
| 1039 : |
|
|
|
| 1040 : |
|
|
@end table |
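To make the roles of @samp{ip}, @samp{IP}, @samp{IPTOS} and @samp{VM_IS_INST} more concrete, here is a hand-written miniature disassembler. It is only a sketch: in a real system the chain of tests comes from @file{@var{file}-disasm.i}, and the three opcodes, the table contents and the return value (the number of VM instructions printed) are assumptions of this example.

```c
#include <assert.h>
#include <stdio.h>

typedef long Inst;
enum { I_lit, I_add, I_halt };               /* hypothetical opcodes */
static Inst vm_prim[] = { I_lit, I_add, I_halt };

#define VM_IS_INST(inst, n) ((inst) == vm_prim[n])
#define IP    (ip + 1)   /* first argument, right after the opcode */
#define IPTOS (IP[0])

/* Print the VM code in [ip, endp) on vm_out; return the number of
   VM instructions printed. */
int disasm(FILE *vm_out, Inst *ip, Inst *endp)
{
    int count = 0;
    assert(vm_out != NULL && ip <= endp);
    while (ip < endp) {
        fprintf(vm_out, "%p: ", (void *)ip);
        if (VM_IS_INST(*ip, 0)) {            /* lit ( #i -- i ) */
            fprintf(vm_out, "lit %ld\n", (long)IPTOS);
            ip += 2;                         /* opcode plus argument */
        } else if (VM_IS_INST(*ip, 1)) {     /* add */
            fputs("add\n", vm_out);
            ip += 1;
        } else if (VM_IS_INST(*ip, 2)) {     /* halt */
            fputs("halt\n", vm_out);
            ip += 1;
        } else {
            fprintf(vm_out, "unknown instruction %ld\n", (long)*ip);
            ip += 1;
        }
        count++;
    }
    return count;
}
```

Note that here @samp{ip} stays on the opcode while an instruction is printed, and is advanced past the opcode and its arguments only afterwards.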
| 1041 : |
|
|
|
| 1042 : |
|
|
|
| 1043 : |
anton
|
1.7
|
@section VM profiler |
| 1044 : |
|
|
|
| 1045 : |
|
|
The VM profiler is designed for getting execution and occurrence counts |
| 1046 : |
|
|
for VM instruction sequences, and these counts can then be used for |
| 1047 : |
|
|
selecting sequences as superinstructions. The VM profiler is probably |
| 1048 : |
anton
|
1.8
|
not useful as profiling tool for the interpretive system. I.e., the VM |
| 1049 : |
anton
|
1.7
|
profiler is useful for the developers, but not the users of the |
| 1050 : |
anton
|
1.8
|
interpretive system. |
| 1051 : |
anton
|
1.7
|
|
| 1052 : |
anton
|
1.8
|
The output of the profiler is: for each basic block (executed at least |
| 1053 : |
|
|
once), it produces the dynamic execution count of that basic block and |
| 1054 : |
|
|
all its subsequences; e.g., |
| 1055 : |
anton
|
1.7
|
|
| 1056 : |
anton
|
1.8
|
@example |
| 1057 : |
|
|
9227465 lit storelocal |
| 1058 : |
|
|
9227465 storelocal branch |
| 1059 : |
|
|
9227465 lit storelocal branch |
| 1060 : |
|
|
@end example |
| 1061 : |
anton
|
1.7
|
|
| 1062 : |
anton
|
1.8
|
I.e., a basic block consisting of @samp{lit storelocal branch} is |
| 1063 : |
|
|
executed 9227465 times. |
| 1064 : |
anton
|
1.6
|
|
| 1065 : |
anton
|
1.8
|
This output can be combined in various ways. E.g., |
| 1066 : |
|
|
@file{vmgen/stat.awk} adds up the occurrences of a given sequence with respect to |
| 1067 : |
|
|
dynamic execution, static occurrence, and per-program occurrence. E.g., |
| 1068 : |
anton
|
1.3
|
|
| 1069 : |
anton
|
1.8
|
@example |
| 1070 : |
|
|
2 16 36910041 loadlocal lit |
| 1071 : |
|
|
@end example |
| 1072 : |
anton
|
1.2
|
|
| 1073 : |
anton
|
1.8
|
indicates that the sequence @samp{loadlocal lit} occurs in 2 programs, |
| 1074 : |
|
|
in 16 places, and has been executed 36910041 times. Now you can select |
| 1075 : |
|
|
superinstructions in any way you like (note that compile time and space |
| 1076 : |
|
|
typically limit the number of superinstructions to 100--1000). After |
| 1077 : |
|
|
you have done that, @file{vmgen/seq2rule.awk} turns lines of the form |
| 1078 : |
|
|
above into rules for inclusion in a vmgen input file. Note that this |
| 1079 : |
|
|
script does not ensure that all prefixes are defined, so you have to do |
| 1080 : |
|
|
that in other ways. So, an overall script for turning profiles into |
| 1081 : |
|
|
superinstructions can look like this: |
| 1082 : |
anton
|
1.2
|
|
| 1083 : |
anton
|
1.8
|
@example |
| 1084 : |
|
|
awk -f stat.awk fib.prof test.prof| |
| 1085 : |
|
|
awk '$3>=10000'| #select sequences |
| 1086 : |
|
|
fgrep -v -f peephole-blacklist| #eliminate wrong instructions |
| 1087 : |
|
|
awk -f seq2rule.awk| #turn into superinstructions |
| 1088 : |
|
|
sort -k 3 >mini-super.vmg #sort sequences |
| 1089 : |
|
|
@end example |
| 1090 : |
anton
|
1.2
|
|
| 1091 : |
anton
|
1.8
|
Here the dynamic count is used for selecting sequences (preliminary |
| 1092 : |
|
|
results indicate that the static count gives better results, though); |
| 1093 : |
|
|
the third line eliminates sequences containing instructions that must not |
| 1094 : |
|
|
occur in a superinstruction, because they access a stack directly. The |
| 1095 : |
|
|
dynamic count selection ensures that all subsequences (including |
| 1096 : |
|
|
prefixes) of longer sequences occur (because subsequences have at least |
| 1097 : |
|
|
the same count as the longer sequences); the sort in the last line |
| 1098 : |
|
|
ensures that longer superinstructions occur after their prefixes. |
| 1099 : |
|
|
|
| 1100 : |
|
|
But before using it, you have to have the profiler. Vmgen supports its |
| 1101 : |
|
|
creation by generating @file{@var{file}-profile.i}; you also need the |
| 1102 : |
|
|
wrapper file @file{vmgen-ex/profile.c} that you can use almost verbatim. |
| 1103 : |
|
|
|
| 1104 : |
|
|
The profiler works by recording the targets of all VM control flow |
| 1105 : |
|
|
changes (through @code{SUPER_END} during execution, and through |
| 1106 : |
|
|
@code{BB_BOUNDARY} in the front end), and counting (through |
| 1107 : |
|
|
@code{SUPER_END}) how often they were targeted. After the program run, |
| 1108 : |
|
|
the numbers are corrected such that each VM basic block has the correct |
| 1109 : |
|
|
count (originally entering a block without executing a branch does not |
| 1110 : |
|
|
increase the count), then the subsequences of all basic blocks are |
| 1111 : |
|
|
printed. To get all this, you just have to define @code{SUPER_END} (and |
| 1112 : |
|
|
@code{BB_BOUNDARY}) appropriately, and call @code{vm_print_profile(FILE |
| 1113 : |
|
|
*file)} when you want to output the profile on @code{file}. |
| 1114 : |
|
|
|
| 1115 : |
|
|
The @file{@var{file}-profile.i} is similar to the disassembler file, and |
| 1116 : |
|
|
it uses variables and functions defined in @file{vmgen-ex/profile.c}, |
| 1117 : |
|
|
plus @code{VM_IS_INST} already defined for the VM disassembler |
| 1118 : |
|
|
(@pxref{VM disassembler}). |
| 1119 : |
|
|
|
| 1120 : |
|
|
@chapter Changes |
| 1121 : |
|
|
|
| 1122 : |
|
|
Users of the gforth-0.5.9-20010501 version of vmgen need to change |
| 1123 : |
|
|
several things in their source code to use the current version. I |
| 1124 : |
|
|
recommend keeping the gforth-0.5.9-20010501 version until you have |
| 1125 : |
|
|
completed the change (note that you can have several versions of Gforth |
| 1126 : |
|
|
installed at the same time). I hope to avoid such incompatible changes |
| 1127 : |
|
|
in the future. |
| 1128 : |
anton
|
1.2
|
|
| 1129 : |
anton
|
1.8
|
The required changes are: |
| 1130 : |
|
|
|
| 1131 : |
|
|
@table @code |
| 1132 : |
anton
|
1.2
|
|
| 1133 : |
anton
|
1.8
|
@item vm_@var{A}2@var{B} |
| 1134 : |
|
|
now takes two arguments. |
| 1135 : |
|
|
|
| 1136 : |
|
|
@item vm_two@var{A}2@var{B}(b,a1,a2); |
| 1137 : |
|
|
changed to vm_two@var{A}2@var{B}(a1,a2,b) (note the absence of the @samp{;}). |
| 1138 : |
|
|
|
| 1139 : |
|
|
@end table |
| 1140 : |
anton
|
1.2
|
|
| 1141 : |
anton
|
1.8
|
Also some new macros have to be defined, e.g., @code{INST_ADDR}, and |
| 1142 : |
|
|
@code{LABEL}; some macros have to be defined in new contexts, e.g., |
| 1143 : |
|
|
@code{VM_IS_INST} is now also needed in the disassembler. |
| 1144 : |
anton
|
1.4
|
|
| 1145 : |
anton
|
1.8
|
@chapter Contact |
| 1146 : |
anton
|
1.4
|
|