Index of /forth/peep

[ICO]NameLast modifiedSizeDescription

[DIR]Parent Directory  -  
[   ]percent06-Mar-2019 09:39 1.4M 
[   ]percent.awk06-Mar-2019 09:39 178  
[   ]predict-itc16-Dec-1995 18:53 2.9K 
[   ]predict-dtc16-Dec-1995 18:52 2.7K 
[   ]predict-dtc.awk10-Dec-1995 11:29 339  
[   ]predict-itc.awk10-Dec-1995 11:26 328  
[   ]sorted26-Nov-1995 11:13 883K 
[   ]prims2x.peep26-Nov-1995 11:13 346K 
[   ]peep.awk26-Nov-1995 11:13 222  
[   ]gs.peep26-Nov-1995 11:13 405K 
[   ]cross.peep26-Nov-1995 11:13 354K 
[   ]combine.awk26-Nov-1995 11:13 244  

I have collected some data to evaluate the potential of combining primitives by the Forth compiler (called "pinhole optimization" by Wil Baden) for real-world code. I have then used the same data for checking how well the Alpha architecture's static branch target prediction may work for Forth interpreters.

These data are derived from dynamic traces of real applications running on Gforth, not from small benchmarks that might provide unrealistic results:

cross
is a Forth cross compiler written mainly by Bernd Paysan. length 5,779,740 primitives.
gs
is a data and control flow analysis program written by me. It uses locals a lot. length 3,540,682 primitives.
prims2x
is a filter that converts primitives specifications into C code. I have written most of it. length 5,822,924 primitives.

(Here, I use the term "primitive" for anything that performs a NEXT (or NEXT1); this includes routines like docol and dovar).

The files included here are:

README
this file
*.peep
contains the actual data about the frequency of sequences (up to length 6) in the trace for the application *.
peep.awk
generates the *.peep file from a trace.
combine.awk
combines several peep files into one file containing all the data
sorted
combined data, augmented with the total for a sequence (in the first field; the next thee are the counts for cross, gs and prims2x. respectively) sorted by the total.
predict-*
tables containing predictions for the next primitive and the prediction accuracy (derived from the data in sorted). For direct threaded code the next primitive must be a real primitive, not just routines like docol).
predict-*.awk
computes the predict-* table.

You can get all of them in tarred and gzipped form from http://mips.complang.tuwien.ac.at/forth/peep.tar.gz.

Not all of the dynamic sequences shown in the data can be combined into one primitive by a simple compiler. E.g., if they contain a primitive that changes the control flow (e.g., docol, ;s, branch, dodefer) on any but the last position, the sequence cannot be combined (easily). Moreover, when two sequences overlap, only one of them can be combined.

The following table shows how many distinct sequences of a given length appear in the applications and what portion of the dynamic sequences is covered by the top 20 (50,100,200) most frequently occuring sequences. Note that the number of dynamic sequences of length n is #dynprimitives-n+1.

length	distinct %top20	%top50	%top100	%top200	
1	 160	75.53	94.219	99.832	100
2	1278	31.1074	48.8859	65.3516	83.112
3	3321	16.7137	31.2978	44.3291	62.3914
4	5489	13.1306	25.8037	36.8787	53.2986
5	7423	11.8311	23.5362	34.1386	49.9976
6	9138	11.0131	22.015	32.3519	47.7475

Eliminating sequences, that cannot be combined because they contain a control flow change in the wrong place reduces the portion of the length-2 top-100 from 65% to 45%. I do not have an accurate result for the effect of overlapping sequences; however, for length-2 sequences the overlap can at most halve the number of combinable sequences.

So, the results are encouraging: Even if we just combine length-2 sequences, we can reduce the dynamic primitive count by at least 22% (probably significantly more) by adding just 100 new primitives that represent the most frequent combinations. If we also consider longer sequences, the effect should be even more pronounced. It also appears that most of the frequent sequences are in general use; they usually appear frequently in more than one trace.

Related work

Wil Baden implemented 'Pinhole optimization' in ThisForth. Rob Chapman does similar stuff in Timbre.

Peephole optimization itself goes back into the dark ages of computer science, but schuetz92 is the first reference I have that applies it to Forth intermediate code.

@InProceedings{schuetz92,
  author = 	"Udo Sch{\"u}tz",
  title = 	"{Optimierung von Fadencode}",
  booktitle = 	"{FORTH-Tagung 1992}",
  year = 	"1992",
  organization = 	"Forth Gesellscheft e.V."
}

Proebsting95 peephole optimizes already on the tree intermediate representation, enabling more combinations. The combinations are selected automatically and a custom interpreter is generated automatically for the application.

@InProceedings{proebsting95,
  author = 	 "Todd A. Proebsting",
  title = 	 "Optimizing an ANSI C Interpreter with Superoperators",
  crossref =	 "popl95",
  pages =	 "322--332"
}

@Proceedings{popl95,
  booktitle = 	 "Principles of Programming Languages (POPL '95)",
  title = 	 "Principles of Programming Languages (POPL '95)",
  year = 	 "1995",
  key =		 "POPL '95"
}
Anton Ertl