# Optimal and Heuristic Global Code Motion for Minimal Spilling

Gergö Barany, Andreas Krall

{gergo,andi}@complang.tuwien.ac.at



Institute of Computer Languages, Vienna University of Technology



CC 2013, March 21, 2013

Solve global code motion and register allocation as an integrated problem.

Given: Scheduling for minimal spilling is good.

Hypothesis: Global code motion for minimal spilling might be good.

Original program:

```
start:
  j0 := 0
  a := read()
loop:
  j1 := φ(j0, j2)
  b := a + 1
  j2 := j1 + b
  c := f(a)
  compare j2 < c
  d := j2 × 2
  blt loop
end:
  return d
```

- `b := a + 1` is loop invariant.
- `d := j2 × 2` is partially dead.
- The live range of b extends from `b := a + 1` to its use in `j2 := j1 + b`.

After global code motion, with `b := a + 1` hoisted into start and `d := j2 × 2` sunk into end:

```
start:
  j0 := 0
  a := read()
  b := a + 1
loop:
  j1 := φ(j0, j2)
  j2 := j1 + b
  c := f(a)
  compare j2 < c
  blt loop
end:
  d := j2 × 2
  return d
```

# Register allocation: conflict graphs

- Original program: allocation to 3 registers possible
- After global code motion: not 3-colorable!
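The conflict graphs themselves did not survive as text, so the following is only a minimal sketch of the colorability check. The edge list is an assumption derived by hand from the program after code motion: at `compare j2 < c` the values a, b, j2, and c are all live, so they conflict pairwise. The function name `is_k_colorable` is hypothetical.

```python
from itertools import product

def is_k_colorable(vertices, edges, k):
    """Brute-force check: can each vertex get one of k registers so that
    no two conflicting values share a register?"""
    for colors in product(range(k), repeat=len(vertices)):
        assignment = dict(zip(vertices, colors))
        if all(assignment[u] != assignment[v] for u, v in edges):
            return True
    return False

# Assumed conflicts after code motion: a, b, j2 and c are live at the same
# point, which forms a clique of size 4 (hand-derived, not from the figure).
VERTICES = ["a", "b", "j2", "c"]
EDGES = [("a", "b"), ("a", "j2"), ("a", "c"),
         ("b", "j2"), ("b", "c"), ("j2", "c")]

print(is_k_colorable(VERTICES, EDGES, 3))
print(is_k_colorable(VERTICES, EDGES, 4))
```

Exhaustive search is only practical for toy graphs like this one; it is used here because it makes the "not 3-colorable" claim easy to check.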


All avoidable overlaps

| start: |                      |
|--------|----------------------|
| i0:    | j0 := 0              |
| i1:    | a := read()          |
| loop:  |                      |
| i2:    | j1 := $\phi$(j0, j2) |
| i3:    | b := a + 1           |
| i4:    | j2 := j1 + b         |
| i5:    | c := f(a)            |
| i6:    | compare j2 < c       |
| i7:    | d := j2 $\times$ 2   |
| i8:    | blt loop             |
| end:   |                      |
| i9:    | return d             |

| Pair  | Overlapping placement    |
|-------|--------------------------|
| a, d  | i7 in loop               |
| b, c  | i3 in start              |
| b, d  | i3 in start, i7 in loop  |
| b, j0 | i3 in start              |
| b, j2 | i3 in start              |
| c, d  | i7 in loop, i7 before i6 |
| d, j2 | i7 in loop               |

i7 in loop: overlap!

| start: |                      |
|--------|----------------------|
| i0:    | j0 := 0              |
| i1:    | a := read()          |
| loop:  |                      |
| i2:    | j1 := $\phi$(j0, j2) |
| i3:    | b := a + 1           |
| i4:    | j2 := j1 + b         |
| i5:    | c := f(a)            |
| i6:    | compare j2 < c       |
| i8:    | blt loop             |
| end:   |                      |
| i7:    | d := j2 $\times$ 2   |
| i9:    | return d             |

i7 not in loop: no overlap

| start: |                      |
|--------|----------------------|
| i0:    | j0 := 0              |
| i1:    | a := read()          |
| i3:    | b := a + 1           |
| loop:  |                      |
| i2:    | j1 := $\phi$(j0, j2) |
| i4:    | j2 := j1 + b         |
| i5:    | c := f(a)            |
| i6:    | compare j2 < c       |
| i7:    | d := j2 $\times$ 2   |
| i8:    | blt loop             |
| end:   |                      |
| i9:    | return d             |

Pair b, d: i3 in start, i7 in loop: overlap!

| start: |                      |
|--------|----------------------|
| i0:    | j0 := 0              |
| i1:    | a := read()          |
| loop:  |                      |
| i2:    | j1 := $\phi$(j0, j2) |
| i3:    | b := a + 1           |
| i4:    | j2 := j1 + b         |
| i5:    | c := f(a)            |
| i6:    | compare j2 < c       |
| i7:    | d := j2 $\times$ 2   |
| i8:    | blt loop             |
| end:   |                      |
| i9:    | return d             |

Pair b, d: i3 not in start: no overlap

| start: |                      |
|--------|----------------------|
| i0:    | j0 := 0              |
| i1:    | a := read()          |
| i3:    | b := a + 1           |
| loop:  |                      |
| i2:    | j1 := $\phi$(j0, j2) |
| i4:    | j2 := j1 + b         |
| i5:    | c := f(a)            |
| i6:    | compare j2 < c       |
| i8:    | blt loop             |
| end:   |                      |
| i7:    | d := j2 $\times$ 2   |
| i9:    | return d             |

Pair b, d: i7 not in loop: no overlap
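The table of avoidable overlaps can be read as a small rule set: a pair overlaps only if all of its listed conditions hold. The sketch below encodes the rows as data and evaluates them for a given placement. The names `AVOIDABLE_OVERLAPS` and `overlapping_pairs`, and the use of plain strings for conditions, are illustrative assumptions rather than anything from the talk.

```python
# Rows of the "All avoidable overlaps" table. A pair of values overlaps only
# if every condition listed for it holds under the chosen placement/schedule.
AVOIDABLE_OVERLAPS = {
    ("a", "d"): {"i7 in loop"},
    ("b", "c"): {"i3 in start"},
    ("b", "d"): {"i3 in start", "i7 in loop"},
    ("b", "j0"): {"i3 in start"},
    ("b", "j2"): {"i3 in start"},
    ("c", "d"): {"i7 in loop", "i7 before i6"},
    ("d", "j2"): {"i7 in loop"},
}

def overlapping_pairs(facts):
    """Return the pairs whose conditions are all contained in `facts`,
    where `facts` is the set of placement/schedule statements that hold."""
    return sorted(pair for pair, conditions in AVOIDABLE_OVERLAPS.items()
                  if conditions <= facts)

# Original program: i3 and i7 both in loop, i7 after i6.
print(overlapping_pairs({"i7 in loop"}))      # [('a', 'd'), ('d', 'j2')]
# i3 hoisted into start, i7 left in loop.
print(overlapping_pairs({"i3 in start", "i7 in loop"}))
# i3 in loop, i7 sunk into end: no avoidable overlap remains.
print(overlapping_pairs(set()))               # []
```

Unavoidable conflicts are not modeled here; the sketch only covers the overlaps that depend on where i3 and i7 are placed.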

Conflict graph with special edges for avoidable overlaps. Allocate to different registers if possible.

- 5 registers: easy allocation
- 4 registers: place instruction i7 in block end to avoid overlaps
- 3 registers: place i3 in loop and i7 in end

| Pair    | Overlapping placement     | Overlapping schedule   |
|---------|---------------------------|------------------------|
| v1, v9  | instruction 23 in block 0 |                        |
| v9, v10 | instruction 23 in block 1 |                        |
| p61, v4 |                           | instr 3 before instr 0 |
| v3, v2  |                           | instr 0 before instr 3 |
| ⋮       |                           |                        |

- v1, v9 and v9, v10: instruction 23 must be in block 0 or 1!
- p61, v4 and v3, v2: cyclic dependence!

→ Must select a subset of reuses.
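Why only a subset can be selected becomes concrete when the two kinds of conflict are checked mechanically. The sketch below is an illustration under stated assumptions: the dictionary layout, the name `is_valid`, and the set of legal blocks for instruction 23 are hypothetical encodings of the table rows, not the authors' data structures. Placement conflicts are detected by running out of legal blocks, and schedule conflicts by a cycle in the required ordering.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical encoding of the table rows: each reuse candidate lists the
# placements (instruction, block) and orderings (first, second) under which
# its two values would overlap.
CANDIDATES = {
    ("v1", "v9"): {"placement": [(23, 0)], "schedule": []},
    ("v9", "v10"): {"placement": [(23, 1)], "schedule": []},
    ("p61", "v4"): {"placement": [], "schedule": [(3, 0)]},
    ("v3", "v2"): {"placement": [], "schedule": [(0, 3)]},
}
# Assumed legal blocks per instruction ("must be in block 0 or 1!").
LEGAL_BLOCKS = {23: {0, 1}}

def is_valid(selection):
    """True if the overlaps of all selected candidates can be avoided at once."""
    forbidden = {}      # instruction -> blocks it must stay out of
    predecessors = {}   # instruction -> instructions that must come before it
    for cand in selection:
        for instr, block in CANDIDATES[cand]["placement"]:
            forbidden.setdefault(instr, set()).add(block)
        for first, second in CANDIDATES[cand]["schedule"]:
            # The overlap occurs if `first` precedes `second`, so avoiding
            # it requires `second` to be scheduled before `first`.
            predecessors.setdefault(first, set()).add(second)
    # Every constrained instruction still needs at least one legal block.
    if any(not (LEGAL_BLOCKS[i] - blocks) for i, blocks in forbidden.items()):
        return False
    # The required orderings must not form a cycle.
    try:
        tuple(TopologicalSorter(predecessors).static_order())
    except CycleError:
        return False
    return True

print(is_valid([("v1", "v9"), ("p61", "v4")]))
print(is_valid([("v1", "v9"), ("v9", "v10")]))
print(is_valid([("p61", "v4"), ("v3", "v2")]))
```

The first selection is feasible; the other two fail for the two reasons noted under the table.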

#### Which subset to choose?

To minimize spilling, choose valid subset with largest total savings in spill costs.

#### Intuition: Hypergraph Maximum Independent Set

Hypergraph  $\langle V, H \rangle$  with:

- Vertices V: reuse candidate pairs
- Hyperedges H: minimal conflicting sets

Select maximum subset of V that does not contain any  $h \in H$ .
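As a minimal sketch of this intuition, the following exhaustive search finds the heaviest vertex subset that contains no hyperedge. It uses the weighted reading (largest total savings) rather than plain cardinality, and the vertex names, weights, and hyperedges are made-up illustration data, not values from the talk.

```python
from itertools import combinations

def max_weight_independent_set(weights, hyperedges):
    """Exhaustively find the heaviest vertex subset containing no hyperedge.
    `weights` maps vertex -> savings; `hyperedges` is a list of frozensets."""
    vertices = sorted(weights)
    best, best_weight = frozenset(), 0
    for size in range(1, len(vertices) + 1):
        for subset in combinations(vertices, size):
            chosen = frozenset(subset)
            # Skip subsets that fully contain some minimal conflicting set.
            if any(edge <= chosen for edge in hyperedges):
                continue
            total = sum(weights[v] for v in chosen)
            if total > best_weight:
                best, best_weight = chosen, total
    return best, best_weight

# Hypothetical candidates and conflicts.
WEIGHTS = {"A": 5, "B": 4, "C": 3, "D": 2}
HYPEREDGES = [frozenset({"A", "B"}), frozenset({"B", "C", "D"})]

best, total = max_weight_independent_set(WEIGHTS, HYPEREDGES)
print(sorted(best), total)  # ['A', 'C', 'D'] 10
```

The search is exponential in the number of candidates, which is one reason a heuristic or a solver is needed for real inputs.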

Idea: Avoid overlaps with larger spill costs.

#### Greedy heuristic selection

- Sort candidates by descending spill costs
- For each candidate:
  - If no conflict:
    - Add candidate to selected set
    - Commit to code motions for candidate

If greedy approach causes too many overlaps: use given schedule.
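The greedy selection can be sketched as follows, assuming conflicts are given as minimal conflicting sets (hyperedges) over the candidates. The function name `greedy_select` and the example data are hypothetical; committing to code motions is modeled only implicitly through the conflict check, and the fallback to the given schedule is not modeled.

```python
def greedy_select(weights, hyperedges):
    """Pick candidates in order of descending spill-cost savings, skipping
    any candidate that would complete a conflicting set."""
    selected = set()
    for cand in sorted(weights, key=weights.get, reverse=True):
        trial = selected | {cand}
        # A conflict arises if the trial selection contains a whole hyperedge.
        if not any(edge <= trial for edge in hyperedges):
            selected = trial
    return selected

# Hypothetical data: A conflicts with both B and C.
WEIGHTS = {"A": 5, "B": 4, "C": 4}
HYPEREDGES = [frozenset({"A", "B"}), frozenset({"A", "C"})]

chosen = greedy_select(WEIGHTS, HYPEREDGES)
print(sorted(chosen), sum(WEIGHTS[c] for c in chosen))  # ['A'] 5
```

In this example the greedy choice saves 5, while selecting B and C together would save 8, which illustrates why the result is a heuristic and not necessarily optimal.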


Can we do better than the greedy heuristics?

#### Integer linear programming formulation

Variables:

- $select_c$: select candidate $c$ with savings $w_c$
- $place_{i,b}$: place instruction $i$ in block $b$
- ...: variables for relative ordering of instructions

Objective function:

$$\text{maximize} \sum_{c} w_c \, select_c$$

Objective function with an additional term over the placement variables:

$$\text{maximize} \sum_{c} w_c \, select_c + \sum_{i} \sum_{b} place_{i,b}$$
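The slides elide the constraints of the formulation. As a hedged sketch only, one plausible way to tie $select$ variables to $place$ variables is shown here for the reuse candidates (v1, v9) and (v9, v10) from the table of overlapping placements. The constraint shapes are assumptions for illustration, not the authors' formulation; the equality encodes the note that instruction 23 must be in block 0 or 1.

$$
\begin{aligned}
select_{(v1,v9)} + place_{23,0} &\le 1 \\
select_{(v9,v10)} + place_{23,1} &\le 1 \\
place_{23,0} + place_{23,1} &= 1
\end{aligned}
$$

Adding the two inequalities and substituting the equality gives $select_{(v1,v9)} + select_{(v9,v10)} \le 1$, so at most one of the two candidates can be selected.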

# CPLEX solver time



G. Barany, A. Krall (TU Vienna) Global Code Motion for Minimal Spilling


# Results: Greedy heuristics




# Results: Optimal (ILP)




#### Some research directions

- More freedom for code motion:

  $$\text{maximize} \sum_{c} w_c \, select_c + \beta \sum_{i} \sum_{b} \beta_b \, place_{i,b}$$

- Impact of solver time limit
- Other heuristics

#### Summary

- Integrate code motion and register allocation by letting the allocator choose necessary code motions.
- Speedups of up to 4% 🙂
- ... but no improvement on average 😇

Conclusion: Code motion for minimal spilling seems too restrictive.


#### Thank you!

This work was supported by the Austrian Science Fund (Fonds zur Förderung der wissenschaftlichen Forschung) under contract P21842, *Optimal Code Generation for Explicitly Parallel Processors*.
