The goal of this project is the development of an optimizing C compiler for the Vector Signal Processor (VSP) by On Demand Microelectronics. The VSP design is based on VLIW principles and offers SIMD as well as MIMD instructions. A global unit handles control-flow instructions, while a set of independent arithmetic units (slices) performs data processing. Depending on the application, a VSP may offer 4 to 16 slices, each with separate data memories and registers. Communication and data exchange are handled by several bus systems connecting the slices.
The importance of System-on-Chip and System-in-Package solutions in the domain of embedded systems has been steadily increasing in recent years. Due to the rising complexity of embedded applications and the irregularity of processor architectures, highly optimizing compilers are needed to meet the stringent chip area and power dissipation requirements of such platforms. The energy consumption of current processors is dominated by dynamic power dissipation, which can largely be reduced by minimizing the number of memory accesses, the number of execution cycles, and the switching activity on buses. This project contributes improvements to register allocation for an irregular architecture, which reduce memory accesses and execution cycles, and a post-pass code optimization that minimizes the dynamic switching on the instruction memory bus. The techniques in this project are implemented in the context of the xDSPcore architecture, which will be introduced shortly.
To improve overall performance, many modern digital signal processors are equipped with multiple on-chip data memory banks that can be accessed in parallel in a single instruction. In order to exploit this architectural feature effectively, the compiler must partition program variables between the memory banks appropriately: two parallel memory accesses must always target different memory banks. We attempt to resolve this problem by applying an algorithm that operates on the high-level intermediate representation, independently of the target machine. The partitioning scheme is based on an interference graph constructed from control flow, data flow, and alias information. Partitioning of the interference graph is modeled as a Max Cut problem. The variable partitioning algorithm has been designed as an optional optimization phase integrated into the C compiler for the digital signal processor xDSPcore. The experimental results demonstrate that our partitioning algorithm finds a fairly good assignment of variables to memory banks.
Embedded processors for media applications usually have SIMD instructions. SIMD instructions provide a form of vectorization in which a large machine word is viewed as a vector of subwords and the same operation is performed on all subwords in parallel. Systematic use of SIMD instructions can significantly improve program performance. Usually, each memory access must be aligned with the instruction's data access size. With C becoming the dominant language for programming embedded devices, there is a clear need for C compilers that optimize the use of SIMD instructions. An important problem in designing such compilers is determining whether a C pointer is aligned, i.e., whether it refers to the beginning of a machine word.
In this project, we describe a method that determines the alignment of pointers at compile time. The alignment information is used to reduce the number of dynamic alignment checks and the overhead they incur. Our method uses an interprocedural analysis that tracks pointer values propagated through function calls. The effectiveness of our method is substantiated by experimental results showing that the alignment of about 50% of the pointers can typically be determined statically.
A decompiler is a program that translates an executable machine code program into a high-level programming language. Decompilers have many important applications in software engineering: they facilitate reverse engineering of legacy code and the migration of software to new platforms. Our approach includes efficient retargetable static analyses and program optimizations to overcome the shortcomings of state-of-the-art decompilers. In particular, we develop VLIW unscheduling algorithms, software depipelining, construction of program control flow graphs, and memory analyses including alias analysis.
When optimizations were introduced into compilers more than 30 years ago, it was already recognized that debugging optimized programs is difficult. Since then, the development of increasingly sophisticated optimization methods has made debugging such programs even harder. At the same time, the demand for debuggers that can handle optimized code has grown, because most production compilers use optimizations. These optimizations are especially important for embedded processors, which nowadays are also programmed in high-level languages; often they are the only way to achieve the required execution times on a given processor. Current developments in processor design, e.g. VLIW architectures, require that the compiler be able to exploit these additional features. For example, optimizations such as code reordering, software pipelining, or loop transformations are used to achieve a high degree of parallelism on such architectures.
For the user, debugging optimized code is difficult because debugging tools that support optimized code are largely absent. Debugging the unoptimized program is no substitute: an optimized program can behave differently from its unoptimized version. The reason for the different semantic behavior may be an error in the application that is uncovered by the optimization, or an error in the optimizer. An example of such an application error is the use of an array outside its bounds; since optimization can change the data layout, the effect of the out-of-bounds access changes as well. The optimizer itself may apply an unsafe optimization or contain implementation errors.