FWF P19939-N13: "Compiler Technology for Top-Performance Signal Transforms"

General Overview

This page presents the FWF stand-alone project P19939-N13 "Compiler Technology for Top-Performance Signal Transforms", which was carried out by Stefan Kral between August 2007 and February 2011. Project funding was provided by Austria's central funding organization for basic research, the Austrian Science Fund ("Fonds zur Förderung der wissenschaftlichen Forschung", FWF).

This project focused on the development of compilation techniques for speeding up top-performance signal transform codes running on Intel64/AMD64 processors. All compilation and optimization techniques were implemented in NXyn (pronounced "neck-sin"), a synergistic compiler for Intel64/AMD64 processors.

The NXyn compiler comprises two main components:

This structure allows NXyn to process hand-written assembly code and to cooperate with proprietary, closed-source compilers like the state-of-the-art Intel C compiler.

To work well alongside existing compiler technology and state-of-the-art signal transform libraries, NXyn focuses on compiler backend optimizations. Its optimization methods are fully orthogonal to the established optimization techniques of modern C compilers. In particular, NXyn features:

Target Platforms

NXyn is exclusively available for Intel64/AMD64 processors running the GNU/Linux operating system in 64-bit mode. Processors based on the following micro-architectures were used for development and testing:

In addition to these processors, NXyn is likely to be useful for other Intel64/AMD64 processors, including Intel Core i5 "Clarkdale", Intel Core i7 "Nehalem", Intel Atom, and VIA Nano "Isaiah".

Target Codes

Code optimizations implemented in NXyn particularly benefit programs with long basic blocks, that is, long straight-line instruction sequences without branches.
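To illustrate what a long basic block looks like at the source level, the following is a minimal sketch in C of a straight-line 4-point complex DFT kernel, similar in spirit to the unrolled codelets found in signal transform libraries. The function name `dft4` and its split real/imaginary argument layout are assumptions made for this illustration; they are not taken from FFTW or NXyn.

```c
/* Hypothetical straight-line kernel: forward 4-point complex DFT,
   X[k] = sum_n x[n] * exp(-2*pi*i*k*n/4).
   There are no loops or branches, so the body forms one basic block. */
void dft4(const double re_in[4], const double im_in[4],
          double re_out[4], double im_out[4])
{
    /* First stage: sums and differences of inputs 0/2 and 1/3. */
    double ar = re_in[0] + re_in[2], ai = im_in[0] + im_in[2];
    double br = re_in[0] - re_in[2], bi = im_in[0] - im_in[2];
    double cr = re_in[1] + re_in[3], ci = im_in[1] + im_in[3];
    double dr = re_in[1] - re_in[3], di = im_in[1] - im_in[3];

    /* Second stage: multiplying (dr + i*di) by -i gives (di - i*dr),
       so no actual multiplications are needed. */
    re_out[0] = ar + cr;  im_out[0] = ai + ci;
    re_out[1] = br + di;  im_out[1] = bi - dr;
    re_out[2] = ar - cr;  im_out[2] = ai - ci;
    re_out[3] = br - di;  im_out[3] = bi + dr;
}
```

Because the body contains no loops or branches, a compiler would typically translate it into a single basic block. Kernels for larger transform sizes follow the same pattern with far more instructions.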

NXyn works both with scalar code and SIMD code (Intel SSE, Intel SSE2, Intel SSE3, Intel SSSE3, AMD SSE4a, Intel SSE4.1, and Intel SSE4.2), equally supporting integer and floating-point calculations.
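As an illustration of the difference between scalar and SIMD code, the sketch below writes the same complex butterfly (sum and difference of two complex numbers) once with plain C doubles and once with Intel SSE2 intrinsics from `<emmintrin.h>`. The function names `butterfly_scalar` and `butterfly_sse2` are hypothetical and chosen for this example only. The `__SSE2__` guard is an assumption about the build environment (GCC-compatible compilers define it when SSE2 is enabled); without it, the second routine falls back to the scalar one. How NXyn treats the resulting assembly is not shown here.

```c
#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics: __m128d, _mm_add_pd, ... */
#endif

/* Scalar version: each complex number is stored as {re, im}. */
void butterfly_scalar(const double a[2], const double b[2],
                      double sum[2], double diff[2])
{
    sum[0]  = a[0] + b[0];  sum[1]  = a[1] + b[1];
    diff[0] = a[0] - b[0];  diff[1] = a[1] - b[1];
}

/* SIMD version: one 128-bit register holds the real and imaginary
   parts of a complex number, so each add/sub handles both at once. */
void butterfly_sse2(const double a[2], const double b[2],
                    double sum[2], double diff[2])
{
#if defined(__SSE2__)
    __m128d va = _mm_loadu_pd(a);   /* unaligned load of {re, im} */
    __m128d vb = _mm_loadu_pd(b);
    _mm_storeu_pd(sum,  _mm_add_pd(va, vb));
    _mm_storeu_pd(diff, _mm_sub_pd(va, vb));
#else
    /* Fallback for builds without SSE2 support. */
    butterfly_scalar(a, b, sum, diff);
#endif
}
```

Both routines compute the same result; the SSE2 variant needs two arithmetic instructions where the scalar variant needs four.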

Performance Results

Compiling the widely used discrete Fourier transform library FFTW with NXyn (in combination with the Intel C compiler version 11.1) consistently reduces both run time and code size.

The following performance diagrams show that NXyn consistently improves the performance of FFTW routines running on Intel64 and AMD64 processors. All measurements have been performed using a single processor core.

The plots above compare three configurations:

As the performance plots show, NXyn gives significant performance improvements for both Intel and AMD processors, for different problem sizes, and for different instruction sets (SSE, SSE2). More performance plots featuring other problem sizes and problem types are available here.

To maximize FFTW performance, consider generating a larger set of codelets than the one included in the standard FFTW distribution. Information on how to do this is available on the FFTW web page here.


The following publications describe details of the NXyn compiler and the signal-transform-specific compilation techniques that it implements.
BlueGene/L applications: Parallelism On a Massive Scale (2008)
B. R. de Supinski, M. Schulz, V. V. Bulatov, W. Cabot, B. Chan, A. W. Cook, E. W. Draeger, J. N. Glosli, J. A. Greenough, K. Henderson, A. Kubota, S. Louis, B. J. Miller, M. V. Patel, T. E. Spelce, F. H. Streitz, P. L. Williams, R. K. Yates, A. Yoo, G. Almasi, G. Bhanot, A. Gara, J. A. Gunnels, M. Gupta, J. Moreira, J. Sexton, B. Walkup, C. Archer, F. Gygi, T. C. Germann, K. Kadau, P. S. Lomdahl, C. Rendleman, M. L. Welcome, W. McLendon, B. Hendrickson, F. Franchetti, S. Kral, J. Lorenz, C. W. Ueberhuber, E. Chow, and Ü. Çatalyürek.
In the International Journal of High Performance Computing Applications, Volume 22, No. 1, Spring 2008, pages 33-51.
Smaller and faster Intel SSE Code (2011)
Stefan Kral
Submitted to Euro-Par 2011 -- International Conference on Parallel and Distributed Computing.


NXyn is open-source software available under the GNU General Public License (GPL), version 2.

The current version of NXyn is available here (release notes).

Information about installing NXyn is available here.

FFTW 3.2.2 pre-compiled with Intel icc and NXyn is available here (double precision) and here (single precision).

Last update: Sun Jun 26 18:44:18 CEST 2011