HEADER.html (13414B)
1 <title>Project for the course Efficient Programs WS23/24</title> 2 3 <h1>Project for the course Efficient Programs WS23/24</h1> 4 5 You can choose an arbitrary project. 6 7 <p>If you don't have any project of your own, you can choose 8 the <a href="#given">given project</a> for this semester. 9 10 <p>Implementing the given project has some advantages: You don't need 11 to spend precious presentation time on explaining the problem, and may 12 also be able to spend less on explaining the algorithm, and the 13 results are directly comparable to those of other groups. A 14 disadvantage of using the given project is that you might be repeating 15 what others have done and you may produce worse results (especially if 16 you choose a later presentation date). If your results are worse, 17 this does not influence the grading, though; the grading is based on 18 whether you demonstrate that you have applied the content of the 19 course. 20 21 <p>Prepare an 18-minute presentation (I recommend doing a trial run). 22 You do not have as much time as I had in the lecture, so I recommend 23 that you present most optimization steps only superficially, and focus 24 on those steps in more detail that are particularly interesting, e.g., 25 because they produced surprisingly good or suprisingly bad results. 26 </p> 27 28 <h2><a name="given">Given Project:</a> Magic Hexagon</h2> 29 30 <h3>Background</h3> 31 32 In a <a href="https://en.wikipedia.org/wiki/Magic_hexagon">Magic 33 Hexagon</a> all the numbers in each row, in each of the three 34 directions sum to the same number M; in addition, in our Magic 35 Hexagons, all numbers are different, and they are all from a set of 36 consecutive numbers. E.g., 37 38 <pre> 39 3 17 18 40 19 7 1 11 41 16 2 5 6 9 42 12 4 8 14 43 10 13 15 44 </pre> 45 46 All lines sum up to M=38 in this magic hexagon. 47 48 <h3>Given Program</h3> 49 50 You can find the given program on the g0 51 in <code>/nfs/unsafe/httpd/ftp/pub/anton/lvas/effizienz-aufgabe23/</code> 52 (use <code>cp -r to copy the whole directory to your account on the 53 g0</code>), or <a href="effizienz-aufgabe23.tar.gz">on the web</a> 54 55 You can compile the given 56 program <a href="magichex.c"><code>magichex.c</code></a> 57 with <code>make</code> (you need to copy magichex.c and 58 the <a href="Makefile"><code>Makefile</code></a> for this. The 59 resulting binary is called <code>magichex</code> and you can invoke it 60 as follows: 61 62 <pre> 63 ./magichex <n> <d> <value1> ... <valuem> 64 </pre> 65 66 The values are optional. n is the size of one side of the magic 67 hexagon (also called the order of the magic hexagon); it is 3 for the 68 example above. d is a parameter that influences the range of the set 69 of consecutive numbers; the numbers are centered around d(2n-1); for 70 the example above d=2, and the numbers are centered around 10 (from 71 1-19). I.e., you get the result above with 72 73 <pre> 74 ./magichex 3 2 75 </pre> 76 77 or with <code>make test1</code>. There is also a <code>make 78 test2</code>, which tests <code>./magichex 3 0</code>, which produces 79 26 solutions. 80 81 <p>If you provide values, they fill in the hexagon starting with the 82 leftmost element of the first line, then to the right, then the next 83 line, and so on. This eliminates a lot of search space, reducing the 84 possible number of solutions, but also reducing the needed amount of 85 search time. E.g, if you perform</p> 86 87 <pre> 88 ./magichex 3 0 -9 89 </pre> 90 91 you only get the 5 solutions of the 26 mentioned above where the left 92 element of the top row is -9. 93 94 <p>To evaluate your optimizations, start with</p> 95 96 <pre> 97 perf stat -e cycles:u -e instructions:u -e branches:u -e branch-misses:u \ 98 -e L1-dcache-load-misses:u ./magichex 4 3 14 33 30 34 39 6 24 20 99 </pre> 100 101 You can run this with an output check with <code>make measure</code>, 102 or without (but showing the solutions on the terminal) with <code>make 103 nocheck</code>. 104 105 <p>With the given program compiled with -O3 this outputs 40 solutions 106 (where all solutions are checked with <code>make measure</code>) and 107 the following performance results: 108 109 <pre> 110 310194428814 cycles:u 111 897820790268 instructions:u # 2.89 insn per cycle 112 257049961217 branches:u 113 2366511820 branch-misses:u # 0.92% of all branches 114 141455 L1-dcache-load-misses 115 116 66.160150710 seconds time elapsed 117 118 66.123002000 seconds user 119 0.003999000 seconds sys 120 </pre> 121 122 Your optimized program has to produce the same solutions in any order; 123 you do not need to preserve the output of the number of visited leafs 124 (this can be considered to be another kind of performance counter). 125 To compare the output for correctness, process it with 126 127 <pre> 128 ./magichex ... | 129 grep -v solution| 130 awk '/^leafs visited:/ {printf("\0")} /^leafs visited:/,/^$/ {next} 1'| 131 sort -z| 132 tr '\0' '\n\n' | 133 diff -u reference-output - 134 </pre> 135 136 <code>make measure</code> does this. 137 138 <p>I expect that your optimizations will produce vast improvements in 139 run-time. This may allow your program to solve larger problems while 140 still taking <90s of CPU time (user+system). So if you improve 141 your program, try larger problems by deleting some of the given values 142 (starting from the right). Of course, this may produce additional 143 solutions, so you will have to generate a new reference output (after 144 first checking that your program still produces the given reference 145 output for the original values) for your further work, using the same 146 kind of canonicalization as shown above. 147 148 <p>In your presentation, please report how many values you gave for 149 evaluating your best version, report the cycles:u value, and the 150 number of solutions (as a weak form of comaring the correctness with 151 other solutions that use the same number of given values). Given the 152 low number of cache misses, the cycles should be very representative 153 and vary less than the run-time even in the face of clock frequency 154 changes due to other load on the machine. 155 156 <p>If you are using parallel programming to improve the elapsed time 157 of your program, report the cycles:u, the CPU time and the elapsed 158 time, but still report all solutions for a problem that does not take 159 more than 90s of CPU time. 160 161 <h3>Program explanation and suggestions</h3> 162 163 This section explains how the program works and suggests some possible improvements. 164 165 <p>The numbers that have to be found (as well as the numbers that are 166 given) are represented in the <code>Var</code> type, which works like 167 a variable in constraint logic programming: The value in the variable 168 is usually not completely known, but is within bounds (represented by 169 the fields <code>lo</code> and <code>hi</code>; note that the value 170 of <code>hi</code> is a possible value of the variable, unlike the 171 convention in many other cases in programming). There is also the 172 array <code>occupation</code> in <code>solve()</code> which tells 173 which variables already contain which values, and that these values 174 are therefore unavailable to other variables (because they all have to 175 have different values). While following a branch of the search tree 176 towards a leaf, the bounds of the variables are tightened more and 177 more, until the bounds of a variable are the same value (then the 178 variable has that value), or until there is no value left for the 179 variable (then there is no solution in that branch of the search 180 tree). 181 182 <p>These variables are involved in the following constraints: 183 184 <ul> 185 <li>An alldifferent constraint involving all the variables. It 186 ensures that the variables all get different values. This constraint 187 does not occur explicitly in the program, but is expressed in the 188 first part of <code>solve()</code> 189 190 <li>Sum constraints v1+...+vn=M for all the lines in all three 191 directions through the hexagon. You can find them as 192 function <code>sum()</code>, and in the calls to this function 193 inside <code>solve()</code>. 194 195 <li>Lessthan constraints (vx<vy) between the corners eliminate 196 symmetric solutions. You can find them in the 197 function <code>lessthan()</code> and in the calls of this function 198 inside <code>solve()</code>. 199 </ul> 200 201 <p>The two most relevant functions for understanding (and probably for 202 optimizing) the program are <code>labeling()</code> 203 and <code>solve()</code>. 204 205 <p><code>labeling()</code> explores the search tree by trying out one 206 of the possible values of a variable, and then exploring the rest of 207 the search tree with <code>labeling()</code>. If that part of the 208 search tree is exhausted (whether a solution was found or not), the 209 next value for the variable is tried, until all values for this 210 variable have been exhausted. Given that the search tree grows 211 exponentially, there is a potential for big speedups by better 212 labeling. 213 214 <p>In particular, you can choose to process the variables in a 215 different order (the given program processes them line by line, 216 left-to-righ). A common heuristic is to process the variable with the 217 lowest number of remaining values first. For another magic hexagon 218 program, I have used a fixed ordering starting with the corners, 219 because the corners are involved in two of the sum constraints with 220 the least variables (i.e., few additional variables need to get a 221 value to fully determine all values in the constraint), and because 222 they are involved in the lessthan constraints. 223 224 <p>Instead of setting a variable directly to one of its values in a 225 labeling step, you can also bisect its range by first setting its 226 upper bound to some middle value x, and, after exploring that branch, 227 its lower bound to x+1. The potential advantage is that you may find 228 that one of these subranges does not contain a solution without 229 exploring it in more depth. 230 231 <p>You can get additional inspiration for possible labeling heuristics 232 from <a href="https://swish.swi-prolog.org/pldoc/man?predicate=labeling/2">the 233 SWI-Prolog documentation</a>. 234 235 <p>One other interesting aspect of the present labeling approach is 236 that it copies all variables before every choice it makes. This is 237 probably a good choice given the relatively low number of variables 238 and the probably large number of changes, but a more usual 239 implementation of (constraint) logic programming instead uses a 240 (value) trail stack that records changes to the variables as they are 241 made, and, on backtracking (returning from an 242 inner <code>labeling()</code> call), all the values since the choice 243 point at hand are restored from the trail stack. 244 245 <p><code>solve()</code> applies all constraints to reduce the possible 246 values of all variables as much as possible (within the limits of 247 looking at each constraint, one at a time). This is a polynomial 248 process, but it still consumes a lot of time, and reducing that by a 249 good factor will certainly help. 250 251 <p>The current solver always starts from the beginning after every 252 change to a variable or the <code>occupation</code> array. It may be 253 faster to process further changes before restarting the solver. 254 255 <p> It also reevaluates constraints that cannot be affected by the 256 changes since the last evaluation (because the changed variable is not 257 involved in the constraint). It may be faster to only reevaluate 258 constraints that have been affected. However, if most constraints 259 have been affected, the overhead of recording whether a constraint is 260 affected may be larger than the performance improvement. 261 262 <p>If you are really brave, you implement a stronger solver, such as a 263 stronger implementation of alldifferent, e.g., using the methods 264 of <a href="https://www.ijcai.org/Proceedings/03/Papers/036.pdf">Lopez-Ortiz 265 et al.</a> 266 or <a href="https://cdn.aaai.org/AAAI/1994/AAAI94-055.pdf">Regin</a>. 267 268 <p>You should not pursue all of these ideas (some of which conflict 269 with each other), but select some promising ideas, implement them, 270 then optimize that implementation, and report the results. 271 272 <h2>Putting the results online</h2> 273 274 <p>You can publish your program and its documentation (e.g., the 275 presentation) by putting a project on, e.g., Github 276 or <a href="https://sr.ht/">Sourcehut</a>, and putting a web page with 277 a link to that project 278 on <code>/nfs/unsafe/httpd/ftp/pub/anton/lvas/effizienz-abgaben/2023w</code> 279 (on the g0). Alternatively, you can also create a directory there and 280 put your program and documentation there. This web page or directory 281 is visible in the web 282 through <a href="../effizienz-abgaben/2023w/">this page</a>. 283 284 <p>The publication of your solution is not graded, so publishing it is 285 completely optional. 286 287 288 <p>Project tasks from 289 [<a href="../effizienz-aufgabe02.html">WS02/03</a> | 290 <a href="../effizienz-aufgabe03.html">WS03/04</a> | <a 291 href="../effizienz-aufgabe04.html">WS04/05</a> | <a 292 href="../effizienz-aufgabe05/">WS05/06</a> | <a 293 href="../effizienz-aufgabe06/">WS06/07</a> | <a 294 href="../effizienz-aufgabe07/">WS07/08</a> | <a 295 href="../effizienz-aufgabe08/">WS08/09</a> | <a 296 href="../effizienz-aufgabe09/">WS09/10</a> | <a 297 href="../effizienz-aufgabe10/">WS10/11</a> | <a 298 href="../effizienz-aufgabe11/">WS11/12</a> | <a 299 href="../effizienz-aufgabe12/">WS12/13</a> | <a 300 href="../effizienz-aufgabe13/">WS13/14</a> | <a 301 href="../effizienz-aufgabe14/">WS14/15</a> | <a 302 href="../effizienz-aufgabe15/">WS15/16</a> | <a 303 href="../effizienz-aufgabe16/">WS16/17</a> | <a 304 href="../effizienz-aufgabe17/">WS17/18</a> | <a 305 href="../effizienz-aufgabe18/">WS18/19</a> | <a 306 href="../effizienz-aufgabe19/">WS19/20</a> | <a 307 href="../effizienz-aufgabe20/">WS20/21</a> | <a 308 href="../effizienz-aufgabe21/">WS21/22</a> | <a 309 href="../effizienz-aufgabe22/">WS22/23</a> ] 310 311 <hr> 312 <a href="../">Anton Ertl</a> 313