2023W-EFFPROG

Magic Hexagon.
Log | Files | Refs | README

HEADER.html (13414B)


      1 <title>Project for the course Efficient Programs WS23/24</title>
      2 
      3 <h1>Project for the course Efficient Programs WS23/24</h1>
      4 
      5 You can choose an arbitrary project.
      6 
      7 <p>If you don't have any project of your own, you can choose
      8 the <a href="#given">given project</a> for this semester.
      9 
     10 <p>Implementing the given project has some advantages: You don't need
     11 to spend precious presentation time on explaining the problem, and may
     12 also be able to spend less on explaining the algorithm, and the
     13 results are directly comparable to those of other groups.  A
     14 disadvantage of using the given project is that you might be repeating
     15 what others have done and you may produce worse results (especially if
     16 you choose a later presentation date).  If your results are worse,
     17 this does not influence the grading, though; the grading is based on
     18 whether you demonstrate that you have applied the content of the
     19 course.
     20 
     21 <p>Prepare an 18-minute presentation (I recommend doing a trial run).
     22 You do not have as much time as I had in the lecture, so I recommend
     23 that you present most optimization steps only superficially, and focus
     24 on those steps in more detail that are particularly interesting, e.g.,
     25 because they produced surprisingly good or suprisingly bad results.
     26 </p>
     27 
     28 <h2><a name="given">Given Project:</a> Magic Hexagon</h2>
     29 
     30 <h3>Background</h3>
     31 
     32 In a <a href="https://en.wikipedia.org/wiki/Magic_hexagon">Magic
     33 Hexagon</a> all the numbers in each row, in each of the three
     34 directions sum to the same number M; in addition, in our Magic
     35 Hexagons, all numbers are different, and they are all from a set of
     36 consecutive numbers.  E.g.,
     37 
     38 <pre>
     39      3  17  18
     40   19   7   1  11
     41 16   2   5   6   9
     42   12   4   8  14
     43     10  13  15
     44 </pre>
     45 
     46 All lines sum up to M=38 in this magic hexagon.
     47 
     48 <h3>Given Program</h3>
     49 
     50 You can find the given program on the g0
     51 in <code>/nfs/unsafe/httpd/ftp/pub/anton/lvas/effizienz-aufgabe23/</code>
     52 (use <code>cp -r to copy the whole directory to your account on the
     53 g0</code>), or <a href="effizienz-aufgabe23.tar.gz">on the web</a>
     54 
     55 You can compile the given
     56 program <a href="magichex.c"><code>magichex.c</code></a>
     57 with <code>make</code> (you need to copy magichex.c and
     58 the <a href="Makefile"><code>Makefile</code></a> for this.  The
     59 resulting binary is called <code>magichex</code> and you can invoke it
     60 as follows:
     61 
     62 <pre>
     63   ./magichex &lt;n&gt; &lt;d&gt; &lt;value1&gt; ... &lt;valuem&gt;
     64 </pre>
     65 
     66 The values are optional.  n is the size of one side of the magic
     67 hexagon (also called the order of the magic hexagon); it is 3 for the
     68 example above.  d is a parameter that influences the range of the set
     69 of consecutive numbers; the numbers are centered around d(2n-1); for
     70 the example above d=2, and the numbers are centered around 10 (from
     71 1-19).  I.e., you get the result above with
     72 
     73 <pre>
     74   ./magichex 3 2
     75 </pre>
     76 
     77 or with <code>make test1</code>.  There is also a <code>make
     78 test2</code>, which tests <code>./magichex 3 0</code>, which produces
     79 26 solutions.
     80 
     81 <p>If you provide values, they fill in the hexagon starting with the
     82 leftmost element of the first line, then to the right, then the next
     83 line, and so on.  This eliminates a lot of search space, reducing the
     84 possible number of solutions, but also reducing the needed amount of
     85 search time.  E.g, if you perform</p>
     86 
     87 <pre>
     88   ./magichex 3 0 -9
     89 </pre>
     90 
     91 you only get the 5 solutions of the 26 mentioned above where the left
     92 element of the top row is -9.
     93 
     94 <p>To evaluate your optimizations, start with</p>
     95 
     96 <pre>
     97   perf stat -e cycles:u -e instructions:u -e branches:u -e branch-misses:u \
     98   -e L1-dcache-load-misses:u ./magichex 4 3 14 33 30 34 39 6 24 20
     99 </pre>
    100 
    101 You can run this with an output check with <code>make measure</code>,
    102 or without (but showing the solutions on the terminal) with <code>make
    103 nocheck</code>.
    104 
    105 <p>With the given program compiled with -O3 this outputs 40 solutions
    106 (where all solutions are checked with <code>make measure</code>) and
    107 the following performance results:
    108 
    109 <pre>
    110  310194428814      cycles:u             
    111  897820790268      instructions:u        # 2.89  insn per cycle
    112  257049961217      branches:u
    113    2366511820      branch-misses:u       # 0.92% of all branches
    114        141455      L1-dcache-load-misses
    115 
    116  66.160150710 seconds time elapsed
    117 
    118  66.123002000 seconds user
    119   0.003999000 seconds sys
    120 </pre>  
    121 
    122 Your optimized program has to produce the same solutions in any order;
    123 you do not need to preserve the output of the number of visited leafs
    124 (this can be considered to be another kind of performance counter).
    125 To compare the output for correctness, process it with
    126 
    127 <pre>
    128   ./magichex ... |
    129   grep -v solution|
    130   awk '/^leafs visited:/ {printf("\0")} /^leafs visited:/,/^$/ {next} 1'|
    131   sort -z|
    132   tr '\0' '\n\n' |
    133   diff -u reference-output -
    134 </pre>
    135 
    136 <code>make measure</code> does this.
    137 
    138 <p>I expect that your optimizations will produce vast improvements in
    139 run-time.  This may allow your program to solve larger problems while
    140 still taking &lt;90s of CPU time (user+system).  So if you improve
    141 your program, try larger problems by deleting some of the given values
    142 (starting from the right).  Of course, this may produce additional
    143 solutions, so you will have to generate a new reference output (after
    144 first checking that your program still produces the given reference
    145 output for the original values) for your further work, using the same
    146 kind of canonicalization as shown above.
    147 
    148 <p>In your presentation, please report how many values you gave for
    149 evaluating your best version, report the cycles:u value, and the
    150 number of solutions (as a weak form of comaring the correctness with
    151 other solutions that use the same number of given values).  Given the
    152 low number of cache misses, the cycles should be very representative
    153 and vary less than the run-time even in the face of clock frequency
    154 changes due to other load on the machine.
    155 
    156 <p>If you are using parallel programming to improve the elapsed time
    157 of your program, report the cycles:u, the CPU time and the elapsed
    158 time, but still report all solutions for a problem that does not take
    159 more than 90s of CPU time.
    160 
    161 <h3>Program explanation and suggestions</h3>
    162 
    163 This section explains how the program works and suggests some possible improvements.
    164 
    165 <p>The numbers that have to be found (as well as the numbers that are
    166 given) are represented in the <code>Var</code> type, which works like
    167 a variable in constraint logic programming: The value in the variable
    168 is usually not completely known, but is within bounds (represented by
    169 the fields <code>lo</code> and <code>hi</code>; note that the value
    170 of <code>hi</code> is a possible value of the variable, unlike the
    171 convention in many other cases in programming).  There is also the
    172 array <code>occupation</code> in <code>solve()</code> which tells
    173 which variables already contain which values, and that these values
    174 are therefore unavailable to other variables (because they all have to
    175 have different values).  While following a branch of the search tree
    176 towards a leaf, the bounds of the variables are tightened more and
    177 more, until the bounds of a variable are the same value (then the
    178 variable has that value), or until there is no value left for the
    179 variable (then there is no solution in that branch of the search
    180 tree).
    181 
    182 <p>These variables are involved in the following constraints:
    183 
    184 <ul>
    185 <li>An alldifferent constraint involving all the variables.  It
    186 ensures that the variables all get different values.  This constraint
    187 does not occur explicitly in the program, but is expressed in the
    188 first part of <code>solve()</code>
    189 
    190 <li>Sum constraints v1+...+vn=M for all the lines in all three
    191 directions through the hexagon.  You can find them as
    192 function <code>sum()</code>, and in the calls to this function
    193 inside <code>solve()</code>.
    194 
    195 <li>Lessthan constraints (vx&lt;vy) between the corners eliminate
    196 symmetric solutions.  You can find them in the
    197 function <code>lessthan()</code> and in the calls of this function
    198 inside <code>solve()</code>.
    199 </ul>
    200 
    201 <p>The two most relevant functions for understanding (and probably for
    202 optimizing) the program are <code>labeling()</code>
    203 and <code>solve()</code>.
    204 
    205 <p><code>labeling()</code> explores the search tree by trying out one
    206 of the possible values of a variable, and then exploring the rest of
    207 the search tree with <code>labeling()</code>.  If that part of the
    208 search tree is exhausted (whether a solution was found or not), the
    209 next value for the variable is tried, until all values for this
    210 variable have been exhausted.  Given that the search tree grows
    211 exponentially, there is a potential for big speedups by better
    212 labeling.
    213 
    214 <p>In particular, you can choose to process the variables in a
    215 different order (the given program processes them line by line,
    216 left-to-righ).  A common heuristic is to process the variable with the
    217 lowest number of remaining values first.  For another magic hexagon
    218 program, I have used a fixed ordering starting with the corners,
    219 because the corners are involved in two of the sum constraints with
    220 the least variables (i.e., few additional variables need to get a
    221 value to fully determine all values in the constraint), and because
    222 they are involved in the lessthan constraints.
    223 
    224 <p>Instead of setting a variable directly to one of its values in a
    225 labeling step, you can also bisect its range by first setting its
    226 upper bound to some middle value x, and, after exploring that branch,
    227 its lower bound to x+1.  The potential advantage is that you may find
    228 that one of these subranges does not contain a solution without
    229 exploring it in more depth.
    230 
    231 <p>You can get additional inspiration for possible labeling heuristics
    232 from <a href="https://swish.swi-prolog.org/pldoc/man?predicate=labeling/2">the
    233 SWI-Prolog documentation</a>.
    234 
    235 <p>One other interesting aspect of the present labeling approach is
    236 that it copies all variables before every choice it makes.  This is
    237 probably a good choice given the relatively low number of variables
    238 and the probably large number of changes, but a more usual
    239 implementation of (constraint) logic programming instead uses a
    240 (value) trail stack that records changes to the variables as they are
    241 made, and, on backtracking (returning from an
    242 inner <code>labeling()</code> call), all the values since the choice
    243 point at hand are restored from the trail stack.
    244 
    245 <p><code>solve()</code> applies all constraints to reduce the possible
    246 values of all variables as much as possible (within the limits of
    247 looking at each constraint, one at a time).  This is a polynomial
    248 process, but it still consumes a lot of time, and reducing that by a
    249 good factor will certainly help.
    250 
    251 <p>The current solver always starts from the beginning after every
    252 change to a variable or the <code>occupation</code> array.  It may be
    253 faster to process further changes before restarting the solver.
    254 
    255 <p> It also reevaluates constraints that cannot be affected by the
    256 changes since the last evaluation (because the changed variable is not
    257 involved in the constraint).  It may be faster to only reevaluate
    258 constraints that have been affected.  However, if most constraints
    259 have been affected, the overhead of recording whether a constraint is
    260 affected may be larger than the performance improvement.
    261 
    262 <p>If you are really brave, you implement a stronger solver, such as a
    263 stronger implementation of alldifferent, e.g., using the methods
    264 of <a href="https://www.ijcai.org/Proceedings/03/Papers/036.pdf">Lopez-Ortiz
    265 et al.</a>
    266 or <a href="https://cdn.aaai.org/AAAI/1994/AAAI94-055.pdf">Regin</a>.
    267 
    268 <p>You should not pursue all of these ideas (some of which conflict
    269 with each other), but select some promising ideas, implement them,
    270 then optimize that implementation, and report the results.
    271 
    272 <h2>Putting the results online</h2>
    273 
    274 <p>You can publish your program and its documentation (e.g., the
    275 presentation) by putting a project on, e.g., Github
    276 or <a href="https://sr.ht/">Sourcehut</a>, and putting a web page with
    277 a link to that project
    278 on <code>/nfs/unsafe/httpd/ftp/pub/anton/lvas/effizienz-abgaben/2023w</code>
    279 (on the g0).  Alternatively, you can also create a directory there and
    280 put your program and documentation there.  This web page or directory
    281 is visible in the web
    282 through <a href="../effizienz-abgaben/2023w/">this page</a>.
    283 
    284 <p>The publication of your solution is not graded, so publishing it is
    285 completely optional.
    286 
    287 
    288 <p>Project tasks from
    289 [<a href="../effizienz-aufgabe02.html">WS02/03</a> |
    290 <a href="../effizienz-aufgabe03.html">WS03/04</a> | <a
    291 href="../effizienz-aufgabe04.html">WS04/05</a> | <a
    292 href="../effizienz-aufgabe05/">WS05/06</a> | <a
    293 href="../effizienz-aufgabe06/">WS06/07</a> | <a
    294 href="../effizienz-aufgabe07/">WS07/08</a> | <a
    295 href="../effizienz-aufgabe08/">WS08/09</a> | <a
    296 href="../effizienz-aufgabe09/">WS09/10</a> | <a
    297 href="../effizienz-aufgabe10/">WS10/11</a> | <a
    298 href="../effizienz-aufgabe11/">WS11/12</a> | <a
    299 href="../effizienz-aufgabe12/">WS12/13</a> | <a
    300 href="../effizienz-aufgabe13/">WS13/14</a> | <a
    301 href="../effizienz-aufgabe14/">WS14/15</a> | <a
    302 href="../effizienz-aufgabe15/">WS15/16</a> | <a
    303 href="../effizienz-aufgabe16/">WS16/17</a> | <a
    304 href="../effizienz-aufgabe17/">WS17/18</a> | <a
    305 href="../effizienz-aufgabe18/">WS18/19</a> | <a
    306 href="../effizienz-aufgabe19/">WS19/20</a> | <a
    307 href="../effizienz-aufgabe20/">WS20/21</a> | <a
    308 href="../effizienz-aufgabe21/">WS21/22</a> | <a
    309 href="../effizienz-aufgabe22/">WS22/23</a> ]
    310 
    311 <hr>
    312 <a href="../">Anton Ertl</a>
    313