Annotation of gforth/prof-inline.fs, revision 1.9
1.1 anton 1: \ get some data on potential (partial) inlining
2:
1.9 ! anton 3: \ Copyright (C) 2004,2007 Free Software Foundation, Inc.
1.1 anton 4:
5: \ This file is part of Gforth.
6:
7: \ Gforth is free software; you can redistribute it and/or
8: \ modify it under the terms of the GNU General Public License
1.8 anton 9: \ as published by the Free Software Foundation, either version 3
1.1 anton 10: \ of the License, or (at your option) any later version.
11:
12: \ This program is distributed in the hope that it will be useful,
13: \ but WITHOUT ANY WARRANTY; without even the implied warranty of
14: \ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
15: \ GNU General Public License for more details.
16:
17: \ You should have received a copy of the GNU General Public License
1.8 anton 18: \ along with this program. If not, see http://www.gnu.org/licenses/.
1.1 anton 19:
20:
21: \ relies on some Gforth internals
22:
23: \ !! assumption: each file is included only once; otherwise you get
24: \ the counts for just one of the instances of the file. This can be
25: \ fixed by making sure that every source position occurs only once as
26: \ a profile point.
27:
28: true constant count-calls? \ do some profiling of colon definitions etc.
29:
30: \ for true COUNT-CALLS?:
31:
32: \ What data do I need for evaluating the effectiveness of (partial) inlining?
33:
34: \ static and dynamic counts of everything:
35:
36: \ original BB length (histogram and average)
37: \ BB length with partial inlining (histogram and average)
38: \ since we cannot partially inline library calls, we use a parameter
39: \ that represents the amount of partial inlining we can expect there.
40: \ number of tail calls (original and after partial inlining)
41: \ number of calls (original and after partial inlining)
42: \ reason for BB end: call, return, execute, branch
43:
44: \ how many static calls are there to a word? How many of the dynamic
45: \ calls call just a single word?
46:
1.2 anton 47: \ how much does inlining called-once words help?
48: \ how much does inlining words without control flow help?
49: \ how much does partial inlining help?
50: \ what's the overlap?
51: \ optimizing return-to-returns (tail calls), return-to-calls, call-to-calls
52:
1.1 anton 53: struct
1.3 anton 54: cell% field list-next
1.2 anton 55: end-struct list%
56:
57: list%
1.7 anton 58: cell% 2* field profile-count \ how often this profile point is performed
1.1 anton 59: cell% 2* field profile-sourcepos
1.6 anton 60: cell% field profile-char \ character position in line
61: cell% field profile-bblen \ number of primitives in BB
1.7 anton 62: cell% field profile-bblenpi \ bblen after partial inlining
63: cell% field profile-callee-postlude \ 0 or (for calls) callee postlude len
64: cell% field profile-tailof \ 0 or (for tail bbs) pointer to coldef bb
1.6 anton 65: cell% field profile-colondef? \ is this a colon definition start
66: cell% field profile-calls \ static calls to the colon def (calls%)
67: cell% field profile-straight-line \ may contain calls, but no other CF
68: cell% field profile-calls-from \ static calls in the colon def
1.7 anton 69: cell% field profile-exits \ number of exits in this colon def
70: cell% 2* field profile-execs \ number of EXECUTEs etc. of this colon def
71: cell% field profile-prelude \ first BB-len of colon def (incl. callee)
72: cell% field profile-postlude \ last BB-len of colon def (incl. callee)
73: end-struct profile% \ profile point
1.1 anton 74:
1.2 anton 75: list%
1.3 anton 76: cell% field calls-call \ ptr to profile point of bb containing the call
1.2 anton 77: end-struct calls%
78:
1.1 anton 79: variable profile-points \ linked list of profile%
80: 0 profile-points !
81: variable next-profile-point-p \ the address where the next pp will be stored
82: profile-points next-profile-point-p !
1.3 anton 83: variable last-colondef-profile \ pointer to the pp of last colon definition
84: variable current-profile-point
1.5 anton 85: variable library-calls 0 library-calls ! \ list of calls to library colon defs
1.4 anton 86: variable in-compile,? in-compile,? off
1.6 anton 87: variable all-bbs 0 all-bbs ! \ list of all basic blocks
1.2 anton 88:
89: \ list stuff
90:
1.3 anton 91: : map-list ( ... list xt -- ... )
92: { xt } begin { list }
93: list while
94: list xt execute
95: list list-next @
96: repeat ;
97:
98: : drop-1+ drop 1+ ;
99:
100: : list-length ( list -- u )
101: 0 swap ['] drop-1+ map-list ;
102:
103: : insert-list ( listp listpp -- )
104: \ insert list node listp into list pointed to by listpp in front
105: tuck @ over list-next !
106: swap ! ;
107:
108: : insert-list-end ( listp listppp -- )
109: \ insert list node listp into list, with listppp indicating the
110: \ position to insert at, and indicating the position behind the
111: \ new element afterwards.
112: 2dup @ insert-list
113: swap list-next swap ! ;
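\ The tail-pointer idea behind INSERT-LIST-END can be tried in isolation.
\ This is a minimal sketch, not part of the profiler: every demo- name is
\ hypothetical, and it assumes Gforth's struct words and ALLOCATE-backed
\ %ALLOC. The tail variable always holds the address of the cell that the
\ next node must be stored into, so appending needs no list traversal.

```forth
struct
    cell% field demo-next          \ link field, first in the node
    cell% field demo-value
end-struct demo-node%

variable demo-head   0 demo-head !
variable demo-tailp  demo-head demo-tailp !   \ where the next node goes

: demo-append ( n -- )
    demo-node% %alloc >r
    r@ demo-value !
    0 r@ demo-next !               \ new node is the last one
    r@ demo-tailp @ !              \ link it behind the current tail
    r> demo-next demo-tailp ! ;    \ tail now addresses the new link field

: demo-length ( -- u )
    0 demo-head @ begin
        dup while
            swap 1+ swap demo-next @
    repeat drop ;

10 demo-append 20 demo-append 30 demo-append
```

\ After the three appends, DEMO-LENGTH reports the number of nodes, and the
\ nodes are linked in insertion order starting at DEMO-HEAD.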
1.2 anton 114:
1.3 anton 115: \ calls
116:
117: : new-call ( profile-point -- call )
118: calls% %alloc tuck calls-call ! ;
1.2 anton 119:
120: \ profile-point stuff
121:
1.1 anton 122: : new-profile-point ( -- addr )
123: profile% %alloc >r
124: 0. r@ profile-count 2!
125: current-sourcepos r@ profile-sourcepos 2!
126: >in @ r@ profile-char !
1.7 anton 127: 0 r@ profile-callee-postlude !
128: 0 r@ profile-tailof !
1.6 anton 129: r@ profile-colondef? off
130: 0 r@ profile-bblen !
1.7 anton 131: -100000000 r@ profile-bblenpi !
132: current-profile-point @ profile-bblenpi @ -100000000 = if
133: current-profile-point @ dup profile-bblen @ swap profile-bblenpi !
134: endif
1.6 anton 135: 0 r@ profile-calls !
136: r@ profile-straight-line on
137: 0 r@ profile-calls-from !
1.7 anton 138: 0 r@ profile-exits !
139: 0. r@ profile-execs 2!
140: 0 r@ profile-prelude !
141: 0 r@ profile-postlude !
1.3 anton 142: r@ next-profile-point-p insert-list-end
143: r@ current-profile-point !
1.6 anton 144: r@ new-call all-bbs insert-list
1.1 anton 145: r> ;
146:
147: : print-profile ( -- )
148: profile-points @ begin
149: dup while
150: dup >r
151: r@ profile-sourcepos 2@ .sourcepos ." :"
152: r@ profile-char @ 0 .r ." : "
153: r@ profile-count 2@ 0 d.r cr
1.2 anton 154: r> list-next @
1.1 anton 155: repeat
156: drop ;
157:
158: : print-profile-coldef ( -- )
159: profile-points @ begin
160: dup while
161: dup >r
162: r@ profile-colondef? @ if
163: r@ profile-sourcepos 2@ .sourcepos ." :"
164: r@ profile-char @ 3 .r ." : "
165: r@ profile-count 2@ 10 d.r
166: r@ profile-straight-line @ space 2 .r
1.3 anton 167: r@ profile-calls @ list-length 4 .r
1.1 anton 168: cr
169: endif
1.2 anton 170: r> list-next @
1.1 anton 171: repeat
172: drop ;
173:
1.3 anton 174: : 1= ( u -- f )
175: 1 = ;
176:
177: : 2= ( u -- f )
178: 2 = ;
179:
180: : 3= ( u -- f )
181: 3 = ;
182:
183: : 1u> ( u -- f )
184: 1 u> ;
185:
186: : call-count+ ( ud1 callp -- ud2 )
187: calls-call @ profile-count 2@ d+ ;
188:
1.5 anton 189: : count-dyncalls ( calls -- ud )
190: 0. rot ['] call-count+ map-list ;
191:
192: : add-calls ( statistics1 xt-test profpp -- statistics2 xt-test )
192: \ add up the statistics for callee profpp, if the number of static
194: \ calls to profpp satisfies xt-test ( u -- f ); see below for what
195: \ statistics are computed.
1.3 anton 196: { xt-test p }
1.5 anton 197: p profile-colondef? @ if
1.3 anton 198: p profile-calls @ { calls }
199: calls list-length { stat }
1.5 anton 200: stat xt-test execute if
201: { d: ud-dyn-callee d: ud-dyn-caller u-stat u-exec-callees u-callees }
202: ud-dyn-callee p profile-count 2@ 2dup { d: de } d+
203: ud-dyn-caller calls count-dyncalls 2dup { d: dr } d+
204: u-stat stat +
205: u-exec-callees de dr d<> -
206: u-callees 1+
1.3 anton 207: endif
208: endif
209: xt-test ;
210:
211: : print-stat-line ( xt -- )
1.5 anton 212: >r 0. 0. 0 0 0 r> profile-points @ ['] add-calls map-list drop
1.3 anton 213: ( ud-dyn-callee ud-dyn-caller u-stat u-exec-callees u-callees )
1.5 anton 214: 6 u.r 7 u.r 7 u.r 12 ud.r 12 ud.r space ;
215:
216: : print-library-stats ( -- )
217: library-calls @ list-length 20 u.r \ static callers
218: library-calls @ count-dyncalls 12 ud.r \ dynamic callers
219: 13 spaces ;
1.3 anton 220:
1.6 anton 221: : bblen+ ( u1 callp -- u2 )
222: calls-call @ profile-bblen @ + ;
223:
224: : dyn-bblen+ ( ud1 callp -- ud2 )
225: calls-call @ dup profile-count 2@ rot profile-bblen @ 1 m*/ d+ ;
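\ DYN-BBLEN+ weights a basic block's length by its double-cell execution
\ count using M*/. A small sketch of that arithmetic with made-up numbers;
\ WEIGHTED+ and DEMO-TOTAL are hypothetical names, not part of this file.

```forth
: weighted+ ( ud1 ud-count u-len -- ud2 )
    \ ud2 = ud1 + ud-count * u-len, computed in double-cell arithmetic
    1 m*/ d+ ;

: demo-total ( -- ud )
    0.
    1000. 7 weighted+              \ a 7-primitive block executed 1000 times
    3. 5 weighted+ ;               \ a 5-primitive block executed 3 times
```

\ DEMO-TOTAL returns the dynamic primitive count for the two blocks, that
\ is 1000*7 + 3*5 as a double.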
226:
227: : print-bb-statistics ( -- )
228: ." static dynamic" cr
229: all-bbs @ list-length 6 u.r all-bbs @ count-dyncalls 12 ud.r ." basic blocks" cr
230: 0 all-bbs @ ['] bblen+ map-list 6 u.r
231: 0. all-bbs @ ['] dyn-bblen+ map-list 12 ud.r ." primitives" cr
232: ;
233:
1.3 anton 234: : print-statistics ( -- )
1.5 anton 235: ." callee exec'd static dyn-caller dyn-callee condition" cr
1.3 anton 236: ['] 0= print-stat-line ." calls to coldefs with 0 callers" cr
237: ['] 1= print-stat-line ." calls to coldefs with 1 callers" cr
238: ['] 2= print-stat-line ." calls to coldefs with 2 callers" cr
239: ['] 3= print-stat-line ." calls to coldefs with 3 callers" cr
240: ['] 1u> print-stat-line ." calls to coldefs with >1 callers" cr
1.5 anton 241: print-library-stats ." library calls" cr
1.6 anton 242: print-bb-statistics
1.3 anton 243: ;
244:
1.1 anton 245: : dinc ( profilep -- )
246: \ increment the double-cell profile-count of profile point profilep
247: profile-count dup 2@ 1. d+ rot 2! ;
248:
249: : profile-this ( -- )
1.4 anton 250: in-compile,? @ in-compile,? on
251: new-profile-point POSTPONE literal POSTPONE dinc
252: in-compile,? ! ;
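\ PROFILE-THIS works by compiling a literal (the profile point) followed by
\ a call to DINC into the definition being built. The same pattern, reduced
\ to a single-cell counter, might look like the sketch below. All names are
\ hypothetical, and the counter lives on the heap so that nothing is
\ allotted in the middle of the code being compiled.

```forth
variable last-counter              \ remembers the most recent counter cell

: demo-inc ( addr -- )             \ run-time part: bump one counter
    1 swap +! ;

: count-here ( -- )                \ compile-time part
    1 cells allocate throw         \ fresh counter cell on the heap
    0 over !  dup last-counter !
    POSTPONE literal               \ compile the counter address ...
    POSTPONE demo-inc ; immediate  \ ... followed by a call to demo-inc

: demo-word ( -- ) count-here ;
demo-word demo-word demo-word
```

\ Each execution of DEMO-WORD increments the counter that was created while
\ DEMO-WORD was compiled; LAST-COUNTER @ @ fetches the current count.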
1.1 anton 253:
254: \ Various words trigger PROFILE-THIS. In order to avoid getting
255: \ several calls to PROFILE-THIS from a compiling word (like ?EXIT), we
256: \ just wait until the next word is parsed by the text interpreter (in
257: \ compile state) and call PROFILE-THIS only once then. The whole
258: \ BEFORE-WORD hooking etc. is there for this.
259:
260: \ We do this because we use the source position for
261: \ the profiling information, and there's only one source position for
262: \ ?EXIT. If we used the threaded code position instead, we would see
263: \ that ?EXIT compiles to several threaded-code words, and could use
264: \ different profile points for them. However, usually dealing with
265: \ the source is more practical.
266:
267: \ Another benefit is that we can ask for profiling anywhere in a
268: \ control-flow word (even before it compiles its own stuff).
269:
270: \ Potential problem: Consider "COMPILING ] [" where COMPILING compiles
271: \ a whole colon definition (and triggers our profiler), but during the
272: \ compilation of the colon definition there is no parsing. Afterwards
273: \ you get interpret state at first (no profiling, either), but after
274: \ the "]" you get parsing in compile state, and PROFILE-THIS gets
275: \ called (and compiles code that is never executed). It would be
276: \ better if we had a way of knowing whether we are in a colon def or
277: \ not (and used that knowledge instead of STATE).
278:
1.6 anton 279: Defer before-word-profile ( -- )
280: ' noop IS before-word-profile
1.1 anton 281:
1.6 anton 282: : before-word1 ( -- )
283: before-word-profile defers before-word ;
1.1 anton 284:
1.6 anton 285: ' before-word1 IS before-word
1.1 anton 286:
1.6 anton 287: : profile-this-compiling ( -- )
288: state @ if
289: profile-this
290: ['] noop IS before-word-profile
291: endif ;
292:
293: : cock-profiler ( -- )
294: \ as in cocking a gun: arm the profiler; the next parsed word pulls the trigger
295: ['] profile-this-compiling IS before-word-profile
296: [ count-calls? ] [if] \ we are at a non-colondef profile point
297: last-colondef-profile @ profile-straight-line off
298: [endif]
299: ;
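\ The arm-once, fire-once behaviour of COCK-PROFILER and
\ PROFILE-THIS-COMPILING can be sketched without the text interpreter.
\ Here DEMO-BEFORE-WORD stands in for the BEFORE-WORD hook; all demo- names
\ are hypothetical. Arming several times still yields a single firing,
\ because the handler disarms itself.

```forth
defer demo-before-word   ' noop is demo-before-word
variable demo-fired   0 demo-fired !

: demo-fire ( -- )
    1 demo-fired +!
    ['] noop is demo-before-word ; \ disarm after the first shot

: demo-arm ( -- )
    ['] demo-fire is demo-before-word ;

demo-arm demo-arm demo-arm         \ several triggers from one compiling word
demo-before-word demo-before-word  \ the hook runs for two later words
```

\ DEMO-FIRED counts how often the handler actually ran.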
1.1 anton 300:
301: : hook-profiling-into ( "name" -- )
302: \ make (deferred word) "name" call cock-profiler, too
303: ' >body >r :noname
1.6 anton 304: POSTPONE cock-profiler
1.1 anton 305: r@ @ compile, \ old hook behaviour
306: POSTPONE ;
307: r> ! ; \ change hook behaviour
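\ HOOK-PROFILING-INTO wraps the old behaviour of a deferred word in a new
\ :NONAME definition. The sketch below shows the same wrapping on a
\ throwaway deferred word. It assumes a Gforth that provides DEFER@ and
\ uses it in place of the >BODY access above; all demo- names are
\ hypothetical.

```forth
defer demo-hook   ' noop is demo-hook
variable demo-hits   0 demo-hits !

: demo-note ( -- )  1 demo-hits +! ;

: wrap-demo-hook ( -- )
    ['] demo-hook defer@ >r        \ old behaviour, kept off the data stack
    :noname
        POSTPONE demo-note         \ new action first
        r> compile,                \ then whatever the hook did before
    POSTPONE ;
    is demo-hook ;                 \ install the wrapper

wrap-demo-hook  demo-hook
```

\ Calling DEMO-HOOK after wrapping runs DEMO-NOTE and then the previous
\ behaviour (NOOP here). Wrapping again would stack a second DEMO-NOTE.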
308:
309: : note-execute ( -- )
1.7 anton 310: \ end of BB due to execute, dodefer, perform
311: profile-this \ should actually happen after the word, but the
312: \ error is probably small
1.1 anton 313: ;
314:
315: : note-call ( addr -- )
316: \ addr is the body address of a called colon def or does handler
1.5 anton 317: dup ['] (does>2) >body = if \ adjust does handler address
318: 4 cells here 1 cells - +!
1.1 anton 319: endif
1.7 anton 320: { addr }
321: current-profile-point @ { lastbb }
322: profile-this
323: current-profile-point @ { thisbb }
324: thisbb new-call { call-node }
325: addr 3 cells + @ ['] dinc >body = if
1.5 anton 326: \ non-library call
1.7 anton 327: \ !! update profile-bblenpi of last and current pp
328: addr cell+ @ { callee-pp }
329: callee-pp profile-postlude @ thisbb profile-callee-postlude !
330: call-node callee-pp profile-calls insert-list
1.5 anton 331: else \ library call
1.7 anton 332: call-node library-calls insert-list
1.5 anton 333: endif ;
1.4 anton 334:
1.1 anton 335: : prof-compile, ( xt -- )
1.4 anton 336: in-compile,? @ if
337: DEFERS compile, EXIT
338: endif
1.6 anton 339: 1 current-profile-point @ profile-bblen +!
1.7 anton 340: dup CASE
341: ['] execute of note-execute endof
342: ['] perform of note-execute endof
343: dup >does-code if
344: dup >does-code note-call
345: then
346: dup >code-address CASE
347: docol: OF dup >body note-call ENDOF
348: dodefer: OF note-execute ENDOF
349: \ dofield: OF >body @ POSTPONE literal ['] + peephole-compile, EXIT ENDOF
350: \ code words and ;code-defined words (code words could be optimized):
351: ENDCASE
1.1 anton 352: ENDCASE
353: DEFERS compile, ;
354:
1.4 anton 355: : :-hook-profile ( -- )
356: defers :-hook
357: next-profile-point-p @
358: profile-this
1.7 anton 359: @ dup last-colondef-profile ! ( current-profile-point )
360: 1 over profile-bblenpi !
1.4 anton 361: profile-colondef? on ;
362:
1.7 anton 363: : exit-hook-profile ( -- )
364: defers exit-hook
365: 1 last-colondef-profile @ profile-exits +! ;
366:
367: : ;-hook-profile ( -- )
368: \ ;-hook is called before the POSTPONE EXIT
369: defers ;-hook
370: last-colondef-profile @ { col }
371: current-profile-point @ { bb }
372: col profile-bblen @ col profile-prelude +!
373: col profile-exits @ 0= if
374: col bb profile-tailof !
375: bb profile-bblen @ bb profile-callee-postlude @ +
376: col profile-postlude !
377: 1 bb profile-bblenpi !
378: \ not counting the EXIT
379: endif ;
380:
1.6 anton 381: hook-profiling-into then-like
382: \ hook-profiling-into if-like \ subsumed by other-control-flow
383: \ hook-profiling-into ahead-like \ subsumed by other-control-flow
384: hook-profiling-into other-control-flow
385: hook-profiling-into begin-like
386: hook-profiling-into again-like
387: hook-profiling-into until-like
1.1 anton 388: ' :-hook-profile IS :-hook
1.4 anton 389: ' prof-compile, IS compile,
1.7 anton 390: ' exit-hook-profile IS exit-hook
391: ' ;-hook-profile IS ;-hook