Annotation of gforth/prof-inline.fs, revision 1.8

1.1       anton       1: \ get some data on potential (partial) inlining
                      2: 
                      3: \ Copyright (C) 2004 Free Software Foundation, Inc.
                      4: 
                      5: \ This file is part of Gforth.
                      6: 
                      7: \ Gforth is free software; you can redistribute it and/or
                      8: \ modify it under the terms of the GNU General Public License
1.8     ! anton       9: \ as published by the Free Software Foundation, either version 3
1.1       anton      10: \ of the License, or (at your option) any later version.
                     11: 
                     12: \ This program is distributed in the hope that it will be useful,
                     13: \ but WITHOUT ANY WARRANTY; without even the implied warranty of
                     14: \ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
                     15: \ GNU General Public License for more details.
                     16: 
                     17: \ You should have received a copy of the GNU General Public License
1.8     ! anton      18: \ along with this program. If not, see http://www.gnu.org/licenses/.
1.1       anton      19: 
                     20: 
                     21: \ relies on some Gforth internals
                     22: 
                     23: \ !! assumption: each file is included only once; otherwise you get
                     24: \ the counts for just one of the instances of the file.  This can be
                     25: \ fixed by making sure that every source position occurs only once as
                     26: \ a profile point.
                     27: 
                     28: true constant count-calls? \ do some profiling of colon definitions etc.
                     29: 
                     30: \ for true COUNT-CALLS?:
                     31: 
                     32: \ What data do I need for evaluating the effectiveness of (partial) inlining?
                     33: 
                     34: \ static and dynamic counts of everything:
                     35: 
                     36: \ original BB length (histogram and average)
                     37: \ BB length with partial inlining (histogram and average)
                     38: \   since we cannot partially inline library calls, we use a parameter
                     39: \   that represents the amount of partial inlining we can expect there.
                     40: \ number of tail calls (original and after partial inlining)
                     41: \ number of calls (original and after partial inlining)
                     42: \ reason for BB end: call, return, execute, branch
                     43: 
                     44: \ how many static calls are there to a word?  How many of the dynamic
                     45: \ calls call just a single word?
                     46: 
1.2       anton      47: \ how much does inlining called-once words help?
                     48: \ how much does inlining words without control flow help?
                     49: \ how much does partial inlining help?
                     50: \ what's the overlap?
                     51: \ optimizing return-to-returns (tail calls), return-to-calls, call-to-calls
                     52: 
1.1       anton      53: struct
1.3       anton      54:     cell% field list-next
1.2       anton      55: end-struct list%
                     56: 
                     57: list%
1.7       anton      58:     cell% 2* field profile-count \ how often this profile point is executed
1.1       anton      59:     cell% 2* field profile-sourcepos
1.6       anton      60:     cell% field profile-char \ character position in line
                     61:     cell% field profile-bblen \ number of primitives in BB
1.7       anton      62:     cell% field profile-bblenpi \ bblen after partial inlining
                     63:     cell% field profile-callee-postlude \ 0 or (for calls) callee postlude len
                     64:     cell% field profile-tailof \ 0 or (for tail bbs) pointer to coldef bb
1.6       anton      65:     cell% field profile-colondef? \ is this a colon definition start
                     66:     cell% field profile-calls \ static calls to the colon def (calls%)
                     67:     cell% field profile-straight-line \ may contain calls, but no other CF
                     68:     cell% field profile-calls-from \ static calls in the colon def
1.7       anton      69:     cell% field profile-exits \ number of exits in this colon def
                     70:     cell% 2* field profile-execs \ number of EXECUTEs etc. of this colon def
                     71:     cell% field profile-prelude \ first BB-len of colon def (incl. callee)
                     72:     cell% field profile-postlude \ last BB-len of colon def (incl. callee)
                     73: end-struct profile% \ profile point 
1.1       anton      74: 
1.2       anton      75: list%
1.3       anton      76:     cell% field calls-call \ ptr to profile point of bb containing the call
1.2       anton      77: end-struct calls%
                     78: 
1.1       anton      79: variable profile-points \ linked list of profile%
                     80: 0 profile-points !
                     81: variable next-profile-point-p \ the address where the next pp will be stored
                     82: profile-points next-profile-point-p !
1.3       anton      83: variable last-colondef-profile \ pointer to the pp of last colon definition
                     84: variable current-profile-point
1.5       anton      85: variable library-calls 0 library-calls ! \ list of calls to library colon defs
1.4       anton      86: variable in-compile,? in-compile,? off
1.6       anton      87: variable all-bbs 0 all-bbs ! \ list of all basic blocks
1.2       anton      88: 
                     89: \ list stuff
                     90: 
1.3       anton      91: : map-list ( ... list xt -- ... )
                     92:     { xt } begin { list }
                     93:        list while
                     94:            list xt execute
                     95:            list list-next @
                     96:     repeat ;
                     97: 
                      98: : drop-1+ ( u list -- u+1 ) drop 1+ ;
                     99: 
                    100: : list-length ( list -- u )
                    101:     0 swap ['] drop-1+ map-list ;
                    102: 
                    103: : insert-list ( listp listpp -- )
                    104:     \ insert list node listp into list pointed to by listpp in front
                    105:     tuck @ over list-next !
                    106:     swap ! ;
                    107: 
                    108: : insert-list-end ( listp listppp -- )
                    109:     \ insert list node listp into list, with listppp indicating the
                    110:     \ position to insert at, and indicating the position behind the
                    111:     \ new element afterwards.
                    112:     2dup @ insert-list
                    113:     swap list-next swap ! ;
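
                          \ Usage sketch for the list words above (not part of the
                          \ original file; DEMO-LIST is a hypothetical name).  INSERT-LIST
                          \ puts each new node in front, and LIST-LENGTH counts the nodes
                          \ by walking the list with MAP-LIST.  Typed interactively after
                          \ loading the definitions above:
                          \   variable demo-list  0 demo-list !
                          \   list% %alloc demo-list insert-list
                          \   list% %alloc demo-list insert-list
                          \   demo-list @ list-length .   \ 2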
1.2       anton     114: 
1.3       anton     115: \ calls
                    116: 
                    117: : new-call ( profile-point -- call )
                    118:     calls% %alloc tuck calls-call ! ;
1.2       anton     119: 
                    120: \ profile-point stuff   
                    121: 
1.1       anton     122: : new-profile-point ( -- addr )
                    123:     profile% %alloc >r
                    124:     0. r@ profile-count 2!
                    125:     current-sourcepos r@ profile-sourcepos 2!
                    126:     >in @ r@ profile-char !
1.7       anton     127:     0 r@ profile-callee-postlude !
                    128:     0 r@ profile-tailof !
1.6       anton     129:     r@ profile-colondef? off
                    130:     0 r@ profile-bblen !
1.7       anton     131:     -100000000 r@ profile-bblenpi !
                    132:     current-profile-point @ profile-bblenpi @ -100000000 = if
                    133:        current-profile-point @ dup profile-bblen @ swap profile-bblenpi !
                    134:     endif
1.6       anton     135:     0 r@ profile-calls !
                    136:     r@ profile-straight-line on
                    137:     0 r@ profile-calls-from !
1.7       anton     138:     0 r@ profile-exits !
                    139:     0. r@ profile-execs 2!
                    140:     0 r@ profile-prelude !
                    141:     0 r@ profile-postlude !
1.3       anton     142:     r@ next-profile-point-p insert-list-end
                    143:     r@ current-profile-point !
1.6       anton     144:     r@ new-call all-bbs insert-list
1.1       anton     145:     r> ;
                    146: 
                    147: : print-profile ( -- )
                    148:     profile-points @ begin
                    149:        dup while
                    150:            dup >r
                    151:            r@ profile-sourcepos 2@ .sourcepos ." :"
                    152:            r@ profile-char @ 0 .r ." : "
                    153:            r@ profile-count 2@ 0 d.r cr
1.2       anton     154:            r> list-next @
1.1       anton     155:     repeat
                    156:     drop ;
                    157: 
                    158: : print-profile-coldef ( -- )
                    159:     profile-points @ begin
                    160:        dup while
                    161:            dup >r
                    162:            r@ profile-colondef? @ if
                    163:                r@ profile-sourcepos 2@ .sourcepos ." :"
                    164:                r@ profile-char @ 3 .r ." : "
                    165:                r@ profile-count 2@ 10 d.r
                    166:                r@ profile-straight-line @ space 2 .r
1.3       anton     167:                r@ profile-calls @ list-length 4 .r
1.1       anton     168:                cr
                    169:            endif
1.2       anton     170:            r> list-next @
1.1       anton     171:     repeat
                    172:     drop ;
                    173: 
1.3       anton     174: : 1= ( u -- f )
                    175:     1 = ;
                    176: 
                    177: : 2= ( u -- f )
                    178:     2 = ;
                    179: 
                    180: : 3= ( u -- f )
                    181:     3 = ;
                    182: 
                    183: : 1u> ( u -- f )
                    184:     1 u> ;
                    185: 
                    186: : call-count+ ( ud1 callp -- ud2 )
                    187:     calls-call @ profile-count 2@ d+ ;
                    188: 
1.5       anton     189: : count-dyncalls ( calls -- ud )
                    190:     0. rot ['] call-count+ map-list ;
                    191: 
                    192: : add-calls ( statistics1 xt-test profpp -- statistics2 xt-test )
                     193:     \ add up the statistics for callee profpp if the number of static
                    194:     \ calls to profpp satisfies xt-test ( u -- f ); see below for what
                    195:     \ statistics are computed.
1.3       anton     196:     { xt-test p }
1.5       anton     197:     p profile-colondef? @ if
1.3       anton     198:        p profile-calls @ { calls }
                    199:        calls list-length { stat }
1.5       anton     200:        stat xt-test execute if
                    201:            { d: ud-dyn-callee d: ud-dyn-caller u-stat u-exec-callees u-callees }
                    202:            ud-dyn-callee p profile-count 2@ 2dup { d: de } d+
                    203:            ud-dyn-caller calls count-dyncalls 2dup { d: dr } d+
                    204:            u-stat stat +
                    205:            u-exec-callees de dr d<> -
                    206:            u-callees 1+
1.3       anton     207:        endif
                    208:     endif
                    209:     xt-test ;
                    210: 
                    211: : print-stat-line ( xt -- )
1.5       anton     212:     >r 0. 0. 0 0 0 r> profile-points @ ['] add-calls map-list drop
1.3       anton     213:     ( ud-dyn-callee ud-dyn-caller u-stat u-exec-callees u-callees )
1.5       anton     214:     6 u.r 7 u.r 7 u.r 12 ud.r 12 ud.r space ;
                    215: 
                    216: : print-library-stats ( -- )
                    217:     library-calls @ list-length 20 u.r \ static callers
                    218:     library-calls @ count-dyncalls 12 ud.r \ dynamic callers
                    219:     13 spaces ;
1.3       anton     220: 
1.6       anton     221: : bblen+ ( u1 callp -- u2 )
                    222:     calls-call @ profile-bblen @ + ;
                    223: 
                    224: : dyn-bblen+ ( ud1 callp -- ud2 )
                    225:     calls-call @ dup profile-count 2@ rot profile-bblen @ 1 m*/ d+ ;
                    226:     
                    227: : print-bb-statistics ( -- )
                    228:     ." static     dynamic" cr
                    229:     all-bbs @ list-length 6 u.r all-bbs @ count-dyncalls 12 ud.r ."  basic blocks" cr
                    230:     0 all-bbs @ ['] bblen+ map-list 6 u.r
                    231:     0. all-bbs @ ['] dyn-bblen+ map-list 12 ud.r ."  primitives" cr
                    232:     ;
                    233: 
1.3       anton     234: : print-statistics ( -- )
1.5       anton     235:     ." callee exec'd static  dyn-caller  dyn-callee   condition" cr
1.3       anton     236:     ['] 0=  print-stat-line ." calls to coldefs with 0 callers" cr
                    237:     ['] 1=  print-stat-line ." calls to coldefs with 1 callers" cr
                    238:     ['] 2=  print-stat-line ." calls to coldefs with 2 callers" cr
                    239:     ['] 3=  print-stat-line ." calls to coldefs with 3 callers" cr
                    240:     ['] 1u> print-stat-line ." calls to coldefs with >1 callers" cr
1.5       anton     241:     print-library-stats     ." library calls" cr
1.6       anton     242:     print-bb-statistics
1.3       anton     243:     ;
                    244: 
1.1       anton     245: : dinc ( profilep -- )
                     246:     \ increment the double-cell PROFILE-COUNT of profile point profilep
                    247:     profile-count dup 2@ 1. d+ rot 2! ;
                    248: 
                    249: : profile-this ( -- )
1.4       anton     250:     in-compile,? @ in-compile,? on
                    251:     new-profile-point POSTPONE literal POSTPONE dinc
                    252:     in-compile,? ! ;
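
                          \ Sketch (not part of the original file): at run time, the code
                          \ compiled by PROFILE-THIS amounts to "<pp> dinc", where <pp> is
                          \ the address of the new profile point.  With a hand-made
                          \ profile point (DEMO-PP is a hypothetical name), the effect of
                          \ two executions can be tried interactively:
                          \   profile% %alloc constant demo-pp
                          \   0. demo-pp profile-count 2!
                          \   demo-pp dinc  demo-pp dinc
                          \   demo-pp profile-count 2@ d.   \ 2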
1.1       anton     253: 
                    254: \ Various words trigger PROFILE-THIS.  In order to avoid getting
                    255: \ several calls to PROFILE-THIS from a compiling word (like ?EXIT), we
                    256: \ just wait until the next word is parsed by the text interpreter (in
                    257: \ compile state) and call PROFILE-THIS only once then.  The whole
                    258: \ BEFORE-WORD hooking etc. is there for this.
                    259: 
                     260: \ We do this because we use the source position for
                    261: \ the profiling information, and there's only one source position for
                    262: \ ?EXIT.  If we used the threaded code position instead, we would see
                    263: \ that ?EXIT compiles to several threaded-code words, and could use
                    264: \ different profile points for them.  However, usually dealing with
                    265: \ the source is more practical.
                    266: 
                    267: \ Another benefit is that we can ask for profiling anywhere in a
                    268: \ control-flow word (even before it compiles its own stuff).
                    269: 
                    270: \ Potential problem: Consider "COMPILING ] [" where COMPILING compiles
                    271: \ a whole colon definition (and triggers our profiler), but during the
                    272: \ compilation of the colon definition there is no parsing.  Afterwards
                    273: \ you get interpret state at first (no profiling, either), but after
                    274: \ the "]" you get parsing in compile state, and PROFILE-THIS gets
                    275: \ called (and compiles code that is never executed).  It would be
                    276: \ better if we had a way of knowing whether we are in a colon def or
                    277: \ not (and used that knowledge instead of STATE).
                    278: 
1.6       anton     279: Defer before-word-profile ( -- )
                    280: ' noop IS before-word-profile
1.1       anton     281: 
1.6       anton     282: : before-word1 ( -- )
                    283:     before-word-profile defers before-word ;
1.1       anton     284: 
1.6       anton     285: ' before-word1 IS before-word
1.1       anton     286: 
1.6       anton     287: : profile-this-compiling ( -- )
                    288:     state @ if
                    289:        profile-this
                    290:        ['] noop IS before-word-profile
                    291:     endif ;
                    292: 
                    293: : cock-profiler ( -- )
                     294:     \ as in cocking a gun: the next parsed word pulls the trigger
                    295:     ['] profile-this-compiling IS before-word-profile
                    296:     [ count-calls? ] [if] \ we are at a non-colondef profile point
                    297:        last-colondef-profile @ profile-straight-line off
                    298:     [endif]
                    299: ;
1.1       anton     300: 
                    301: : hook-profiling-into ( "name" -- )
                    302:     \ make (deferred word) "name" call cock-profiler, too
                    303:     ' >body >r :noname
1.6       anton     304:     POSTPONE cock-profiler
1.1       anton     305:     r@ @ compile, \ old hook behaviour
                    306:     POSTPONE ;
                    307:     r> ! ; \ change hook behaviour
                    308: 
                    309: : note-execute ( -- )
1.7       anton     310:     \ end of BB due to execute, dodefer, perform
                    311:     profile-this \ should actually happen after the word, but the
                    312:                  \ error is probably small
1.1       anton     313: ;
                    314: 
                    315: : note-call ( addr -- )
                    316:     \ addr is the body address of a called colon def or does handler
1.5       anton     317:     dup ['] (does>2) >body = if \ adjust does handler address
                    318:        4 cells here 1 cells - +!
1.1       anton     319:     endif
1.7       anton     320:     { addr }
                    321:     current-profile-point @ { lastbb }
                    322:     profile-this
                    323:     current-profile-point @ { thisbb }
                    324:     thisbb new-call { call-node }
                     325:     addr 3 cells + @ ['] dinc >body = if
1.5       anton     326:        \ non-library call
1.7       anton     327:        \ !! update profile-bblenpi of last and current pp
                    328:        addr cell+ @ { callee-pp }
                    329:        callee-pp profile-postlude @ thisbb profile-callee-postlude !
                    330:        call-node callee-pp profile-calls insert-list
1.5       anton     331:     else \ library call
1.7       anton     332:        call-node library-calls insert-list
1.5       anton     333:     endif ;
1.4       anton     334: 
1.1       anton     335: : prof-compile, ( xt -- )
1.4       anton     336:     in-compile,? @ if
                    337:        DEFERS compile, EXIT
                    338:     endif
1.6       anton     339:     1 current-profile-point @ profile-bblen +!
1.7       anton     340:     dup CASE
                    341:        ['] execute of note-execute endof
                    342:        ['] perform of note-execute endof
                    343:        dup >does-code if
                    344:            dup >does-code note-call
                    345:        then
                    346:        dup >code-address CASE
                    347:            docol:   OF dup >body note-call ENDOF
                    348:            dodefer: OF note-execute ENDOF
                    349:            \ dofield: OF >body @ POSTPONE literal ['] + peephole-compile, EXIT ENDOF
                    350:            \ code words and ;code-defined words (code words could be optimized):
                    351:        ENDCASE
1.1       anton     352:     ENDCASE
                    353:     DEFERS compile, ;
                    354: 
1.4       anton     355: : :-hook-profile ( -- )
                    356:     defers :-hook
                    357:     next-profile-point-p @
                    358:     profile-this
1.7       anton     359:     @ dup last-colondef-profile ! ( current-profile-point )
                    360:     1 over profile-bblenpi !
1.4       anton     361:     profile-colondef? on ;
                    362: 
1.7       anton     363: : exit-hook-profile ( -- )
                    364:     defers exit-hook
                    365:     1 last-colondef-profile @ profile-exits +! ;
                    366: 
                    367: : ;-hook-profile ( -- )
                    368:     \ ;-hook is called before the POSTPONE EXIT
                    369:     defers ;-hook
                    370:     last-colondef-profile @ { col }
                    371:     current-profile-point @ { bb }
                    372:     col profile-bblen @ col profile-prelude +!
                    373:     col profile-exits @ 0= if
                    374:        col bb profile-tailof !
                    375:        bb profile-bblen @ bb profile-callee-postlude @ +
                    376:        col profile-postlude !
                    377:        1 bb profile-bblenpi !
                    378:        \ not counting the EXIT
                    379:     endif ;
                    380: 
1.6       anton     381: hook-profiling-into then-like
                    382: \ hook-profiling-into if-like    \ subsumed by other-control-flow
                    383: \ hook-profiling-into ahead-like \ subsumed by other-control-flow
                    384: hook-profiling-into other-control-flow
                    385: hook-profiling-into begin-like
                    386: hook-profiling-into again-like
                    387: hook-profiling-into until-like
1.1       anton     388: ' :-hook-profile IS :-hook
1.4       anton     389: ' prof-compile, IS compile,
1.7       anton     390: ' exit-hook-profile IS exit-hook
                    391: ' ;-hook-profile IS ;-hook
