If you don't have any project of your own, you can choose the given project for this semester.
Implementing the given project has some advantages: you don't need to spend precious presentation time on explaining the problem, you may also be able to spend less time on explaining the algorithm, and the results are directly comparable to those of other groups. A disadvantage of using the given project is that you might be repeating what others have done, and you may produce worse results (especially if you choose a later presentation date). If your results are worse, this does not influence the grading, though; the grading is based on whether you demonstrate that you have applied the content of the course.
Prepare an 18-minute presentation (I recommend doing a trial run). You do not have as much time as I had in the lecture, so I recommend that you present most optimization steps only superficially and focus in more detail on those steps that are particularly interesting, e.g., because they produced surprisingly good or surprisingly bad results.
    3 17 18
  19  7  1 11
 16  2  5  6  9
  12  4  8 14
   10 13 15

All lines sum up to M=38 in this magic hexagon.
The material for this project is available in

/nfs/unsafe/httpd/ftp/pub/anton/lvas/effizienz-aufgabe23/

(use cp -r to copy the whole directory to your account on the g0), or on the web. You can compile the given program magichex.c with make (you need to copy magichex.c and the Makefile for this). The resulting binary is called magichex, and you can invoke it as follows:
./magichex <n> <d> <value1> ... <valuem>

The values are optional. n is the size of one side of the magic hexagon (also called the order of the magic hexagon); it is 3 for the example above. d is a parameter that influences the range of the set of consecutive numbers; the numbers are centered around d(2n-1); for the example above, d=2 and the numbers are centered around 10 (ranging from 1 to 19). I.e., you get the result above with

./magichex 3 2

or with make test1. There is also a make test2, which tests ./magichex 3 0 and produces 26 solutions.
If you provide values, they fill in the hexagon starting with the leftmost element of the first line, then to the right, then the next line, and so on. This eliminates a lot of search space, reducing the possible number of solutions, but also reducing the needed amount of search time. E.g., if you run

./magichex 3 0 -9

you only get the 5 solutions (of the 26 mentioned above) where the left element of the top row is -9.
To evaluate your optimizations, start with
perf stat -e cycles:u -e instructions:u -e branches:u -e branch-misses:u \
  -e L1-dcache-load-misses:u ./magichex 4 3 14 33 30 34 39 6 24 20

You can run this with an output check with make measure, or without the check (but showing the solutions on the terminal) with make nocheck.
With the given program compiled with -O3 this outputs 40 solutions
(where all solutions are checked with make measure
) and
the following performance results:
     310194428814      cycles:u
     897820790268      instructions:u            #    2.89  insn per cycle
     257049961217      branches:u
       2366511820      branch-misses:u           #    0.92% of all branches
           141455      L1-dcache-load-misses

      66.160150710 seconds time elapsed

      66.123002000 seconds user
       0.003999000 seconds sys

Your optimized program has to produce the same solutions in any order; you do not need to preserve the output of the number of visited leafs (this can be considered to be another kind of performance counter). To compare the output for correctness, process it with
./magichex ... | grep -v solution |
  awk '/^leafs visited:/ {printf("\0")} /^leafs visited:/,/^$/ {next} 1' |
  sort -z | tr '\0' '\n\n' | diff -u reference-output -

make measure does this.
I expect that your optimizations will produce vast improvements in run-time. This may allow your program to solve larger problems while still taking <90s of CPU time (user+system). So if you improve your program, try larger problems by deleting some of the given values (starting from the right). Of course, this may produce additional solutions, so you will have to generate a new reference output (after first checking that your program still produces the given reference output for the original values) for your further work, using the same kind of canonicalization as shown above.
In your presentation, please report how many values you gave for evaluating your best version, report the cycles:u value, and the number of solutions (as a weak form of comparing the correctness with other solutions that use the same number of given values). Given the low number of cache misses, the cycles should be very representative and vary less than the run-time, even in the face of clock frequency changes due to other load on the machine.
If you are using parallel programming to improve the elapsed time of your program, report the cycles:u, the CPU time and the elapsed time, but still report all solutions for a problem that does not take more than 90s of CPU time.
The numbers that have to be found (as well as the numbers that are
given) are represented in the Var
type, which works like
a variable in constraint logic programming: The value in the variable
is usually not completely known, but is within bounds (represented by
the fields lo
and hi
; note that the value
of hi
is a possible value of the variable, unlike the
convention in many other cases in programming). There is also the
array occupation
in solve()
which tells
which variables already contain which values, and that these values
are therefore unavailable to other variables (because they all have to
have different values). While following a branch of the search tree
towards a leaf, the bounds of the variables are tightened more and
more, until the bounds of a variable are the same value (then the
variable has that value), or until there is no value left for the
variable (then there is no solution in that branch of the search
tree).
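As a rough sketch of how such bounded variables can be represented (the actual declarations are in magichex.c and may use different names or additional fields):

/* Sketch only; the real declarations are in magichex.c and may differ. */
typedef struct var {
  long lo;   /* smallest value the variable can still take */
  long hi;   /* largest value the variable can still take (inclusive) */
} Var;

/* A variable is decided once lo == hi; an empty range (lo > hi) means that
   the current branch of the search tree contains no solution.  The
   occupation array in solve() additionally records which values are already
   taken, so that no two variables end up with the same value. */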
These variables are involved in the following constraints:

- the all-different constraint, handled through the occupation array in solve();
- the sum constraints, implemented by sum() and the calls to this function inside solve();
- the less-than constraints, implemented by lessthan() and the calls of this function inside solve().
The two most relevant functions for understanding (and probably for
optimizing) the program are labeling()
and solve()
.
labeling()
explores the search tree by trying out one
of the possible values of a variable, and then exploring the rest of
the search tree with labeling()
. If that part of the
search tree is exhausted (whether a solution was found or not), the
next value for the variable is tried, until all values for this
variable have been exhausted. Given that the search tree grows
exponentially, there is a potential for big speedups by better
labeling.
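A much simplified sketch of this recursive structure (not the actual labeling() from magichex.c; it reuses the Var sketch from above and replaces the real constraint propagation with a trivial check):

#include <stddef.h>
#include <stdbool.h>
#include <string.h>

/* Trivial stand-in for the real constraint propagation in solve():
   it only checks that no variable has an empty range. */
static bool propagate(Var vs[], size_t n)
{
  for (size_t j = 0; j < n; j++)
    if (vs[j].lo > vs[j].hi)
      return false;
  return true;
}

/* Try every possible value of variable i, copy the state before each
   choice, and restore it when backtracking out of that choice. */
static void labeling_sketch(Var vs[], size_t n, size_t i)
{
  if (i == n)
    return;                          /* leaf: all variables are decided */
  long lo = vs[i].lo, hi = vs[i].hi;
  Var saved[n];                      /* copy of all variables (C99 VLA) */
  for (long val = lo; val <= hi; val++) {
    memcpy(saved, vs, n * sizeof(Var));
    vs[i].lo = vs[i].hi = val;       /* try one value */
    if (propagate(vs, n))
      labeling_sketch(vs, n, i + 1); /* explore the rest of the search tree */
    memcpy(vs, saved, n * sizeof(Var));  /* undo the choice */
  }
}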
In particular, you can choose to process the variables in a different order (the given program processes them line by line, left-to-right). A common heuristic is to process the variable with the lowest number of remaining values first. For another magic hexagon program, I have used a fixed ordering starting with the corners, because the corners are involved in two of the sum constraints with the least variables (i.e., few additional variables need to get a value to fully determine all values in the constraint), and because they are involved in the lessthan constraints.
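A possible sketch of the "fewest remaining values first" heuristic, using the range size hi-lo+1 as an approximation of the number of remaining values (it ignores values already taken by other variables):

#include <stddef.h>
#include <limits.h>

/* First-fail heuristic: pick the undecided variable with the smallest
   remaining range.  Returns n if all variables are already decided. */
static size_t pick_variable(const Var vs[], size_t n)
{
  size_t best = n;
  long best_size = LONG_MAX;
  for (size_t i = 0; i < n; i++) {
    long size = vs[i].hi - vs[i].lo + 1;
    if (size > 1 && size < best_size) {   /* undecided and smaller range */
      best = i;
      best_size = size;
    }
  }
  return best;
}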
Instead of setting a variable directly to one of its values in a labeling step, you can also bisect its range by first setting its upper bound to some middle value x, and, after exploring that branch, its lower bound to x+1. The potential advantage is that you may find that one of these subranges does not contain a solution without exploring it in more depth.
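A sketch of such a bisection step (again based on the Var and propagate() sketches above; the interaction with the rest of the labeling is omitted):

#include <stddef.h>
#include <string.h>

/* Split the range of variable i at its midpoint and explore both halves.
   Splitting stops once the variable is decided (lo == hi). */
static void bisect_sketch(Var vs[], size_t n, size_t i)
{
  if (vs[i].lo >= vs[i].hi)
    return;                          /* decided; continue labeling elsewhere */
  long mid = vs[i].lo + (vs[i].hi - vs[i].lo) / 2;
  Var saved[n];
  memcpy(saved, vs, n * sizeof(Var));

  vs[i].hi = mid;                    /* first branch: lo..mid */
  if (propagate(vs, n))
    bisect_sketch(vs, n, i);
  memcpy(vs, saved, n * sizeof(Var));

  vs[i].lo = mid + 1;                /* second branch: mid+1..hi */
  if (propagate(vs, n))
    bisect_sketch(vs, n, i);
  memcpy(vs, saved, n * sizeof(Var));
}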
You can get additional inspiration for possible labeling heuristics from the SWI-Prolog documentation.
One other interesting aspect of the present labeling approach is
that it copies all variables before every choice it makes. This is
probably a good choice given the relatively low number of variables
and the probably large number of changes, but a more usual
implementation of (constraint) logic programming instead uses a
(value) trail stack that records changes to the variables as they are
made, and, on backtracking (returning from an
inner labeling()
call), all the values since the choice
point at hand are restored from the trail stack.
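A minimal sketch of such a value trail (hypothetical; this is not how magichex.c currently works):

#include <stddef.h>

/* One trail entry remembers the old bounds of a variable before a change. */
typedef struct {
  Var *var;
  long old_lo, old_hi;
} TrailEntry;

static TrailEntry trail[1 << 16];   /* size chosen arbitrarily for the sketch */
static size_t trail_top;

/* Call before narrowing the bounds of v. */
static void trail_push(Var *v)
{
  trail[trail_top].var    = v;
  trail[trail_top].old_lo = v->lo;
  trail[trail_top].old_hi = v->hi;
  trail_top++;
}

/* On backtracking, undo all changes made since the choice point 'mark'
   (mark is the value of trail_top when the choice point was created). */
static void trail_undo(size_t mark)
{
  while (trail_top > mark) {
    TrailEntry *e = &trail[--trail_top];
    e->var->lo = e->old_lo;
    e->var->hi = e->old_hi;
  }
}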
solve()
applies all constraints to reduce the possible
values of all variables as much as possible (within the limits of
looking at each constraint, one at a time). This is a polynomial
process, but it still consumes a lot of time, and reducing that by a
good factor will certainly help.
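To illustrate the kind of bounds reasoning involved (this is not the program's actual sum() implementation), consider a constraint v[0]+...+v[k-1] == M over variables with lo/hi bounds:

#include <stddef.h>
#include <stdbool.h>

/* Illustrative bounds tightening for  v[0] + ... + v[k-1] == M:
   each variable can be at most M minus the smallest possible sum of the
   others, and at least M minus the largest possible sum of the others.
   Returns false if some range becomes empty.  A real solver repeats such
   steps (the given program: by restarting) until nothing changes any more. */
static bool tighten_sum(Var *v[], size_t k, long M)
{
  long sum_lo = 0, sum_hi = 0;
  for (size_t i = 0; i < k; i++) {
    sum_lo += v[i]->lo;
    sum_hi += v[i]->hi;
  }
  for (size_t i = 0; i < k; i++) {
    long others_lo = sum_lo - v[i]->lo;   /* smallest sum of the other vars */
    long others_hi = sum_hi - v[i]->hi;   /* largest sum of the other vars */
    if (v[i]->hi > M - others_lo) v[i]->hi = M - others_lo;
    if (v[i]->lo < M - others_hi) v[i]->lo = M - others_hi;
    if (v[i]->lo > v[i]->hi)
      return false;
  }
  return true;
}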
The current solver always starts from the beginning after every
change to a variable or the occupation
array. It may be
faster to process further changes before restarting the solver.
It also reevaluates constraints that cannot be affected by the changes since the last evaluation (because the changed variable is not involved in the constraint). It may be faster to only reevaluate constraints that have been affected. However, if most constraints have been affected, the overhead of recording whether a constraint is affected may be larger than the performance improvement.
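One possible shape of such bookkeeping, with hypothetical helpers constraint_involves() and constraint_eval() standing in for however the constraints end up being represented in your program:

#include <stddef.h>
#include <stdbool.h>

#define NCONSTRAINTS 64       /* arbitrary bound for the sketch */

static bool dirty[NCONSTRAINTS];                        /* needs reevaluation? */

static bool constraint_involves(size_t c, size_t var);  /* hypothetical */
static bool constraint_eval(size_t c);                  /* hypothetical; calls
                                                           mark_affected() for
                                                           variables it changes */

/* Whenever variable 'var' changes, mark all constraints that involve it. */
static void mark_affected(size_t var)
{
  for (size_t c = 0; c < NCONSTRAINTS; c++)
    if (constraint_involves(c, var))
      dirty[c] = true;
}

/* Reevaluate only dirty constraints until none are left.
   Returns false as soon as some variable range becomes empty. */
static bool solve_dirty(void)
{
  bool progress = true;
  while (progress) {
    progress = false;
    for (size_t c = 0; c < NCONSTRAINTS; c++) {
      if (!dirty[c])
        continue;
      dirty[c] = false;
      if (!constraint_eval(c))
        return false;
      progress = true;
    }
  }
  return true;
}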
If you are really brave, you can implement a stronger solver, such as a stronger implementation of alldifferent, e.g., using the methods of Lopez-Ortiz et al. or Regin.
You should not pursue all of these ideas (some of which conflict with each other), but select some promising ideas, implement them, then optimize that implementation, and report the results.
You can publish your program and its documentation (e.g., the
presentation) by putting a project on, e.g., Github
or Sourcehut, and putting a web page with
a link to that project
on /nfs/unsafe/httpd/ftp/pub/anton/lvas/effizienz-abgaben/2023w
(on the g0). Alternatively, you can also create a directory there and
put your program and documentation there. This web page or directory
is visible on the web through this page.
The publication of your solution is not graded, so publishing it is completely optional.
Project tasks from [WS02/03 | WS03/04 | WS04/05 | WS05/06 | WS06/07 | WS07/08 | WS08/09 | WS09/10 | WS10/11 | WS11/12 | WS12/13 | WS13/14 | WS14/15 | WS15/16 | WS16/17 | WS17/18 | WS18/19 | WS19/20 | WS20/21 | WS21/22 | WS22/23 ]
Contents of the effizienz-aufgabe23 directory:

| Name | Last modified | Size |
|---|---|---|
| Makefile | 21-Nov-2023 10:36 | 850 |
| effizienz-aufgabe23.tar.gz | 21-Nov-2023 18:53 | 9.8K |
| magichex | 20-Nov-2023 18:57 | 21K |
| magichex.c | 20-Nov-2023 18:57 | 8.2K |
| magichex.o | 20-Nov-2023 18:57 | 12K |
| reference-output | 21-Nov-2023 08:28 | 14K |