Institut für Computersprachen
Home Page of Stefan Kral
Acceptable performance at an affordable price.
Work on the PowerPC 440 FP2 port of FFTW-GEL (for IBM's Blue Gene/L supercomputer) has finished.
Added a "mission statement" (see above).
New FFTW 3 benchmarks are available here.
The SIMD-vectorizing code-generator of FFTW-GEL has become a part of
FFTW 3 (the new release of FFTW).
FFTW-GEL 1.2.1 fixes a bug in FFTW-GEL 1.2 that causes wrong results in some non-complex FFTs.
An important note
FFTW-GEL is a free software project that cannot succeed without the
active participation of its users.
If you use FFTW-GEL, whether you like it or not, if you
have suggestions on how to improve it, or if you have encountered a bug,
please let me know.
FFTW-GEL is an architecture-specific version of FFTW.
It is free software that is covered by the
GNU General Public License Version 2.
It features optimizing code generators for several processor families
(AMD K7 and Intel Pentium 4) that include automatic SIMD
vectorization of basic blocks, an optimizer for the calculation of
effective addresses, and a lot of other cool hacks.
FFTW-GEL is available for the following architectures:
- AMD K6-2/K6-III (no longer supported/maintained)
  - K6/FFTW-GEL 1.1 (single-precision only)
- AMD K7 (Duron, Athlon, Athlon XP, Athlon MP)
  - K7/FFTW-GEL 1.2.1 (single-precision only)
- Intel Pentium 4
  - P4/FFTW-GEL 1.2.1 (double-precision only)
- AMD64 in 32-bit mode (Opteron, Athlon 64)
  - Use K7/FFTW-GEL 1.2.1 for single-precision and P4/FFTW-GEL 1.2.1 for double-precision calculations.
- PPC440FP2 (IBM Blue Gene/L)
  - BGL/FFTW-GEL-2.1.5 (supports both single and double precision)
Compatibility with FFTW
FFTW-GEL is not a 100% drop-in replacement for FFTW, as it
requires that all data passed to it be naturally aligned (128-bit
alignment for double-precision data on the Pentium 4, 64-bit
alignment for single-precision data on K7-based systems).
Replacing FFTW with FFTW-GEL may cause a mild performance
degradation (on K7-based machines) or application crashes (on
Pentium 4 systems) when operating on unaligned data.
You can use fftw_malloc and fftw_free as a
replacement for malloc and free whenever you want to obtain/release
naturally-aligned memory blocks.
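The alignment requirement described above can be illustrated in plain C. The following is only a minimal sketch that does not use FFTW-GEL itself: the names is_aligned, aligned_malloc_sketch and aligned_free_sketch are hypothetical, and the over-allocating allocator is a stand-in for the kind of guarantee fftw_malloc is described as giving, not its actual implementation.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Returns 1 if p is a multiple of `alignment` bytes, 0 otherwise.
 * `alignment` must be a power of two (16 for 128-bit, 8 for 64-bit). */
int is_aligned(const void *p, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return ((uintptr_t)p & (uintptr_t)(alignment - 1)) == 0;
}

/* Hypothetical stand-in for an aligned allocator (NOT fftw_malloc):
 * over-allocate with malloc, round the address up to the requested
 * alignment, and remember the original pointer just below the block. */
void *aligned_malloc_sketch(size_t size, size_t alignment)
{
    void *raw;
    void **aligned;
    uintptr_t addr;

    /* Power of two, and large enough to keep the saved pointer aligned. */
    assert((alignment & (alignment - 1)) == 0);
    assert(alignment >= sizeof(void *));

    raw = malloc(size + alignment + sizeof(void *));
    if (raw == NULL)
        return NULL;

    /* Leave room for the saved pointer, then round up. */
    addr = (uintptr_t)raw + sizeof(void *);
    addr = (addr + alignment - 1) & ~(uintptr_t)(alignment - 1);

    aligned = (void **)addr;
    aligned[-1] = raw;  /* needed later to free the real allocation */
    return aligned;
}

/* Releases a block obtained from aligned_malloc_sketch. */
void aligned_free_sketch(void *p)
{
    if (p != NULL)
        free(((void **)p)[-1]);
}
```

With these helpers, a buffer for double-precision data on a Pentium 4 would be requested with an alignment of 16 bytes (128 bits), and a buffer for single-precision data on a K7 with an alignment of 8 bytes (64 bits). An application that receives buffers from elsewhere could call is_aligned on them before handing them to the library.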
FFTW-GEL 1.2.1 offers the following features:
- All transforms (complex to complex, real to complex, complex to real; 1D/2D/nD) that FFTW 2.1.3 offers.
- The same library interfaces as FFTW 2.1.3.
As a result of this, migration from FFTW 2.1.3 to
FFTW-GEL is particularly easy.
- The same function interfaces as FFTW 2.1.3.
- Compiler independence. While portability is one of FFTW's
great strengths, it also limits the effective utilization of
new processor features (such as SIMD floating-point instructions)
as long as these features are not properly supported by the C compiler.
The code generator of FFTW-GEL compiles directly to assembly
language and does not require a C compiler for the compilation of
Typically, FFTW-GEL 1.2.1 is faster than FFTW 2.1.3
(in combination with gcc and the recommended options).
Also (unlike many other SIMD-optimized FFT libraries) it gives you
a significant speedup not only for relatively large transforms with
a length that is a power of two (e.g., 1024), but also for:
- transforms having a length that is not a power of two
(e.g., 210, 1000, 10368, ...)
- transforms having a relatively short length
(e.g., 8, 11, 15, 16, 24, 30, 32, ...)
- multidimensional transforms (e.g., 8x8x8, ...)
You can download FFTW-GEL packages here.
Here are some benchmark results from
FFTW-GEL was written by Stefan Kral.
FFTW was written by Matteo Frigo and Steven G. Johnson.
For more information on how to use FFTW or FFTW-GEL, please
visit the FFTW homepage at
Last update: Thu Sep 13 10:15:26 CEST 2007