Index of /anton/lvas/effizienz-aufgabe23

Project for the course Efficient Programs WS23/24

You can choose an arbitrary project.

If you don't have any project of your own, you can choose the given project for this semester.

Implementing the given project has some advantages: You don't need to spend precious presentation time on explaining the problem, and may also be able to spend less on explaining the algorithm, and the results are directly comparable to those of other groups. A disadvantage of using the given project is that you might be repeating what others have done and you may produce worse results (especially if you choose a later presentation date). If your results are worse, this does not influence the grading, though; the grading is based on whether you demonstrate that you have applied the content of the course.

Prepare an 18-minute presentation (I recommend doing a trial run). You do not have as much time as I had in the lecture, so I recommend that you present most optimization steps only superficially, and focus on those steps in more detail that are particularly interesting, e.g., because they produced surprisingly good or suprisingly bad results.

Given Project: Magic Hexagon

Background

In a Magic Hexagon all the numbers in each row, in each of the three directions sum to the same number M; in addition, in our Magic Hexagons, all numbers are different, and they are all from a set of consecutive numbers. E.g.,

     3  17  18
  19   7   1  11
16   2   5   6   9
  12   4   8  14
    10  13  15

All lines sum up to M=38 in this magic hexagon.

Given Program

You can find the given program on the g0 in /nfs/unsafe/httpd/ftp/pub/anton/lvas/effizienz-aufgabe23/ (use

cp -r to copy the whole directory to your account on the
g0

), or on the web You can compile the given program magichex.c with make (you need to copy magichex.c and the Makefile for this. The resulting binary is called magichex and you can invoke it as follows:

  ./magichex <n> <d> <value1> ... <valuem>

The values are optional. n is the size of one side of the magic hexagon (also called the order of the magic hexagon); it is 3 for the example above. d is a parameter that influences the range of the set of consecutive numbers; the numbers are centered around d(2n-1); for the example above d=2, and the numbers are centered around 10 (from 1-19). I.e., you get the result above with

  ./magichex 3 2

or with make test1. There is also a

make
test2

, which tests ./magichex 3 0, which produces 26 solutions.

If you provide values, they fill in the hexagon starting with the leftmost element of the first line, then to the right, then the next line, and so on. This eliminates a lot of search space, reducing the possible number of solutions, but also reducing the needed amount of search time. E.g, if you perform

  ./magichex 3 0 -9

you only get the 5 solutions of the 26 mentioned above where the left element of the top row is -9.

To evaluate your optimizations, start with

  perf stat -e cycles:u -e instructions:u -e branches:u -e branch-misses:u \
  -e L1-dcache-load-misses:u ./magichex 4 3 14 33 30 34 39 6 24 20

You can run this with an output check with make measure, or without (but showing the solutions on the terminal) with

make
nocheck

With the given program compiled with -O3 this outputs 40 solutions (where all solutions are checked with make measure) and the following performance results:

 310194428814      cycles:u             
 897820790268      instructions:u        # 2.89  insn per cycle
 257049961217      branches:u
   2366511820      branch-misses:u       # 0.92% of all branches
       141455      L1-dcache-load-misses

 66.160150710 seconds time elapsed

 66.123002000 seconds user
  0.003999000 seconds sys

Your optimized program has to produce the same solutions in any order; you do not need to preserve the output of the number of visited leafs (this can be considered to be another kind of performance counter). To compare the output for correctness, process it with

  ./magichex ... |
  grep -v solution|
  awk '/^leafs visited:/ {printf("\0")} /^leafs visited:/,/^$/ {next} 1'|
  sort -z|
  tr '\0' '\n\n' |
  diff -u reference-output -

make measure does this.

I expect that your optimizations will produce vast improvements in run-time. This may allow your program to solve larger problems while still taking <90s of CPU time (user+system). So if you improve your program, try larger problems by deleting some of the given values (starting from the right). Of course, this may produce additional solutions, so you will have to generate a new reference output (after first checking that your program still produces the given reference output for the original values) for your further work, using the same kind of canonicalization as shown above.

In your presentation, please report how many values you gave for evaluating your best version, report the cycles:u value, and the number of solutions (as a weak form of comaring the correctness with other solutions that use the same number of given values). Given the low number of cache misses, the cycles should be very representative and vary less than the run-time even in the face of clock frequency changes due to other load on the machine.

If you are using parallel programming to improve the elapsed time of your program, report the cycles:u, the CPU time and the elapsed time, but still report all solutions for a problem that does not take more than 90s of CPU time.

Program explanation and suggestions

This section explains how the program works and suggests some possible improvements.

The numbers that have to be found (as well as the numbers that are given) are represented in the Var type, which works like a variable in constraint logic programming: The value in the variable is usually not completely known, but is within bounds (represented by the fields lo and hi; note that the value of hi is a possible value of the variable, unlike the convention in many other cases in programming). There is also the array occupation in solve() which tells which variables already contain which values, and that these values are therefore unavailable to other variables (because they all have to have different values). While following a branch of the search tree towards a leaf, the bounds of the variables are tightened more and more, until the bounds of a variable are the same value (then the variable has that value), or until there is no value left for the variable (then there is no solution in that branch of the search tree).

These variables are involved in the following constraints:

An alldifferent constraint involving all the variables. It ensures that the variables all get different values. This constraint does not occur explicitly in the program, but is expressed in the first part of solve()
Sum constraints v1+...+vn=M for all the lines in all three directions through the hexagon. You can find them as function sum(), and in the calls to this function inside solve().
Lessthan constraints (vx<vy) between the corners eliminate symmetric solutions. You can find them in the function lessthan() and in the calls of this function inside solve().

The two most relevant functions for understanding (and probably for optimizing) the program are labeling() and solve().

labeling() explores the search tree by trying out one of the possible values of a variable, and then exploring the rest of the search tree with labeling(). If that part of the search tree is exhausted (whether a solution was found or not), the next value for the variable is tried, until all values for this variable have been exhausted. Given that the search tree grows exponentially, there is a potential for big speedups by better labeling.

In particular, you can choose to process the variables in a different order (the given program processes them line by line, left-to-right). A common heuristic is to process the variable with the lowest number of remaining values first. For another magic hexagon program, I have used a fixed ordering starting with the corners, because the corners are involved in two of the sum constraints with the least variables (i.e., few additional variables need to get a value to fully determine all values in the constraint), and because they are involved in the lessthan constraints.

Instead of setting a variable directly to one of its values in a labeling step, you can also bisect its range by first setting its upper bound to some middle value x, and, after exploring that branch, its lower bound to x+1. The potential advantage is that you may find that one of these subranges does not contain a solution without exploring it in more depth.

You can get additional inspiration for possible labeling heuristics from the SWI-Prolog documentation.

One other interesting aspect of the present labeling approach is that it copies all variables before every choice it makes. This is probably a good choice given the relatively low number of variables and the probably large number of changes, but a more usual implementation of (constraint) logic programming instead uses a (value) trail stack that records changes to the variables as they are made, and, on backtracking (returning from an inner labeling() call), all the values since the choice point at hand are restored from the trail stack.

solve() applies all constraints to reduce the possible values of all variables as much as possible (within the limits of looking at each constraint, one at a time). This is a polynomial process, but it still consumes a lot of time, and reducing that by a good factor will certainly help.

The current solver always starts from the beginning after every change to a variable or the occupation array. It may be faster to process further changes before restarting the solver.

It also reevaluates constraints that cannot be affected by the changes since the last evaluation (because the changed variable is not involved in the constraint). It may be faster to only reevaluate constraints that have been affected. However, if most constraints have been affected, the overhead of recording whether a constraint is affected may be larger than the performance improvement.

If you are really brave, you implement a stronger solver, such as a stronger implementation of alldifferent, e.g., using the methods of Lopez-Ortiz et al. or Regin.

You should not pursue all of these ideas (some of which conflict with each other), but select some promising ideas, implement them, then optimize that implementation, and report the results.

Putting the results online

You can publish your program and its documentation (e.g., the presentation) by putting a project on, e.g., Github or Sourcehut, and putting a web page with a link to that project on /nfs/unsafe/httpd/ftp/pub/anton/lvas/effizienz-abgaben/2023w (on the g0). Alternatively, you can also create a directory there and put your program and documentation there. This web page or directory is visible in the web through this page.

The publication of your solution is not graded, so publishing it is completely optional.

Project tasks from [WS02/03 | WS03/04 | WS04/05 | WS05/06 | WS06/07 | WS07/08 | WS08/09 | WS09/10 | WS10/11 | WS11/12 | WS12/13 | WS13/14 | WS14/15 | WS15/16 | WS16/17 | WS17/18 | WS18/19 | WS19/20 | WS20/21 | WS21/22 | WS22/23 ]

Anton Ertl

Name	Last modified	Size

Parent Directory		-
Makefile	21-Nov-2023 10:36	850
effizienz-aufgabe23.tar.gz	21-Nov-2023 18:53	9.8K
magichex	20-Nov-2023 18:57	21K
magichex.c	20-Nov-2023 18:57	8.2K
magichex.o	20-Nov-2023 18:57	12K
reference-output	21-Nov-2023 08:28	14K

Apache/2.2.22 (Debian) DAV/2 mod_fcgid/2.3.6 PHP/5.4.36-0+deb7u3 mod_python/3.3.1 Python/2.7.3 mod_ssl/2.2.22 OpenSSL/1.0.1e mod_perl/2.0.7 Perl/v5.14.2 Server at www.complang.tuwien.ac.at Port 443