This is an outdated home-page! Please visit my new one http://www.cs.usyd.edu.au/~scholz .

TU-Logo

Embedded Systems

Institut für Computersprachen E185/1
Home

Research

    Embedded
    Systems

    PBQP

    PDFA

    Symbolic
    Analysis



Code Generation Techniques for Embedded Systems

The number of embedded systems has been exponentially increasing. Nevertheless, most of these systems are still programmed in assembly language due to the lack of optimizing compilers. With the broadening field of applications for embedded systems, software costs has been exploding. Therefore, the shift from assembly language to high level languages is indispensable.

Standard compilation techniques often fail to pursue different compilation objectives for embedded systems, which normally do not have RISC architectures. Therefore, new compilation techniques must be developed. Some of the objectives are listed below:

  • Rarely executed code should have a small footprint to keep the memory size as small as possible.
  • Frequently executed code sequences should be translated in fast assembly code.
  • Power consumption is an issue for systems that are powered by batteries.
  • Hardware features of embedded system processors such as addressing modes are to be utilized.

Most of these compiler objectives lead to NP complete problems, which need to be solved efficiently. Part of this work is to develop an efficient and effective solver for the Partitioned Boolean Quadratic Programming (PBQP) Problem.

For following topics work has been conducted and published in refereed articles:

My work in the field of Embedded Systems is a collaboration with Erik Eckstein from ATAIR Software GmbH and the CD-Lab for Compilation Techniques for Embedded Processors of the Technical University of Vienna, which is sponsored by the Austrian Science Fund and Austrian Industry. For developing our new code generation techniques we use as a testbed the Open Compiler Environment from ATAIR.

Address Mode Selection

The addressing mode selection (AMS) problem deals with the generation of addressing modes for a given offset layout and a given address register assignment. It can be seen as a global code-selection technique for DSP-processors for fully utilizing the address generation unit -- even for complex address modes.

(1) loop { 
(2)   ar0++ 
      if (c) { 
(3)     *ar0 
      } else { 
(4)     *(ar0 + 1) 
      } 
(5)   *(ar0 + 2) 
(6) }

AMS

(1) ar0++ 
(2) loop { 
      if (c) { 
(3)     *(ar0 += 2) 
      } else { 
(4)     ar0++ 
(5)     *ar0++ 
      } 
(6)   *ar0- 
    } 
(7) ar0- 

Consider the pseudo code of the example above. The goal of AMS is to optimize the addressing modes of register ar0. The underlying target architecture supports indirect addressing mode *(ar), post modification addressing mode *(ar0++), *(ar0-),*(ar0+=c), and indexing addressing mode *(ar+c). The optimal output program for minimal execution time is shown in the same figure on the right-hand side. The add instruction can be moved out of the loop and post modification addressing modes can be used instead of indexing addressing modes in line (5) and (6). An explicit add instruction for the address register must be prior to the loop and in line (4), but this takes less cycles than the code given in the original program.

AMS is an separable problem and can be performed for every address register independent of the other ones. The decision, which address mode is the cheapest, is based on a cost model (code size, speed, etc.). In general AMS is NP complete.

The basic transformation is a local offset shift between statements of a program. Let us consider two subsequent statements stmt1;stmt2. Both statements use address register ar for accessing memory and modify the value of the address register. Now, the principle idea of AMS is to shift the value of ar between those two statements. We replace stmt1 by stmt1' and statement stmt2 by stmt2'. The semantics of stmt1' and stmt2' are given by stmt1;ar+=delta; and ar-=delta;stmt2, respectively. Depending on the offset value delta, cheaper addressing modes might be employed for both statements. This is achieved by replacing the increment of the address register and the prior address operation by a cheaper addressing mode (also known as strength reduction).

Register Allocation

For irregular architectures global register allocation is a challenging problem. The graph-coloring analogy of traditional approaches  does not match the needs of register allocation for embedded systems, which feature non-orthogonal instruction sets and small register files. The most cunning problems are that (1) hardware restrictions are too conservatively modeled and graph-coloring results in poor allocations,   (2) often irregularities increase the complexity of the interference graph which badly affects graph-coloring, (3) only a limited subset of restrictions can be modeled - but not all.

Our new approach is based on PBQP. PBQP allows to parameterize the register allocation according to the architecture. A set of cost functions model the interference constraints of the pseudo variables and the architecture. With profile information a better trade off, which pseudo variables have to be spilled, can be achieved. 

Embedded system quite often consists of various register classes as depicted in the diagram below. In this example architecture we have four register classes where some of the registers are shared. Namely, it has four address registers in the address register bank, two index registers in the index register bank, and two 64bit floating point register. Moreover, each 64bit floating point register can either be accessed as one 64bit register or as two 32bit floating point registers.

Modeling register banks with overlapping memory and constraints which are imposed by the instruction set, e.g indexed modes where only certain register pairs are allowed, are very challenging. 

Instruction Selection

Embedded system architectures feature highly irregular instruction sets and complex data paths. Traditional code generation techniques have difficulties to fully utilize the features of such architectures and typically result in inefficient code. In this work we investigate an instruction selection technique that uses static single assignment graphs (SSA-graphs) as underlying data structure for selection. Patterns defined as graph grammar guide the instruction selection to find (nearly) optimal results.

References

[5] Code Instruction Selection based on SSA-Graphs. E. Eckstein, O. König, and B. Scholz. In Proc. of International Workshop on Software and Compilers for Embedded Systems (SCOPES'03), Lecture Notes in Computer Science (LNCS), Vol. 2826, pp. 49-65, Springer Press, Vienna, Austria, September 2003. (best paper award). [pdf]

[4] Graph Coloring vs. Optimal Register Allocation for Optimizing Compilers. U. Hirnschrott, A. Krall, and B. Scholz.  In Proc. of the Joint Modular Languages Conference (JMLC'03), Lecture  Notes in Computer Science (LNCS), Vol. 2789, pp. 202-213, Springer Press, Klagenfurt, Austria, August 2003. [pdf]

[3] Address Mode Selection. E. Eckstein and B. Scholz. In Proc. of  International Symposium on Code Generation and Optimization (CGO'03), pp. 337-346, IEEE Press,  San Francisco, California, USA, March 2003. [pdf]

[2] Register Liveness Analysis for Optimizing Binary Translation. M. Probst, A. Krall, and B. Scholz. In Proc. of Working Conference on Reverse Engineering (WCRE'02), pp. 35-44, IEEE Press, Richmond, Virginia, USA, October/November 2002. [ps.gz]

[1] Register Allocation for Irregular Architectures. B. Scholz and E. Eckstein. In Proc. of Joint-Conference on Languages, Compilers, and Tools for Embedded Systems and Software and Compilers for Embedded Systems (LCTES/SCOPES'02), pp. 139-148, ACM Press, Berlin, Germany, June 2002. [ps.gz]

[TU Wien] [Institut für Computersprachen] [Home] last update: 13 Feb 2004