

# Parallel Architectures and Compilation Techniques (PACT 2010)

Vienna, Austria, September 11-15, 2010

### TECHNICAL PROGRAM



## Welcome to PACT 2010

The International Conference on Parallel Architectures and Compilation Techniques (PACT) is the premier international forum for the presentation of research results in parallel computing. As a multi-disciplinary conference, PACT brings together researchers and practitioners from the hardware and software areas to present ground-breaking research on parallel systems, ranging across instruction-level parallelism, thread-level parallelism, multiprocessor parallelism and large-scale systems.

PACT 2010 hosts 3 distinguished keynote speeches on the most recent developments in parallel systems hardware, software and applications:

#### Build Watson: An Overview of DeepQA for The Jeopardy! Challenge

Eric W. Brown, IBM T.J. Watson Research Center

Computer systems that can directly and accurately answer people's questions over a broad domain of human knowledge have been envisioned by scientists and writers since the advent of computers themselves. Open-domain question answering holds tremendous promise for facilitating informed decision making over vast volumes of natural language content.

#### Towards a Science of Parallel Programming

Keshav Pingali, University of Texas at Austin

In spite of more than 40 years of work on parallel programming, we have few insights into how to exploit the performance potential of multicore processors. In this talk, I will argue that this problem arises largely from the limitations of the program-centric abstractions like dependence graphs that we currently use to think about parallelism. I will then propose a novel data-centric abstraction called the operator formulation of algorithms.

#### Raising the Level of Many-Core Programming with Compiler Technology - Meeting a Grand Challenge

Wen-mei Hwu, University of Illinois at Urbana-Champaign

Modern GPUs and CPUs are massively parallel, many-core processors. While application developers for these many-core chips are reporting 10X-100X speedups over sequential code on traditional microprocessors, the current practice of many-core programming based on OpenCL, CUDA, and OpenMP puts strain on software development, testing and support teams.

# **Detailed Technical Program**

# Sunday, September 12, 2010

| Time | Event | Location |
|------|-------|----------|
| 18:30<br>21:30 | Opening Reception | Strandbar Herrmann<br>Address: Herrmannpark, 1030 Vienna (close to Urania) |

# Monday, September 13, 2010

| Time | Event | Room |
|------|-------|------|
| 8:15 | Conference Opening and Welcome | Festsaal |
| 8:30<br>9:30 | Keynote Address I: <i>Eric W. Brown</i><br>Build Watson: An Overview of DeepQA for The Jeopardy! Challenge | Festsaal |
| 10:00<br>12:00 | **Power-Aware Design** | Festsaal |
| | Power and Thermal Characterization of POWER6 System<br>V. Jimenez, C. Boneti, F. Cazorla, R. Gioiosa, E. Kursun, C. Cher, C. Isci, A. Buyuktosunoglu, P. Bose, M. Valero | |
| | System-level Max Power (SYMPO) – A systematic approach for escalating system-level power consumption using synthetic benchmarks<br>K. Ganesan, J. Jo, L. Bircher, D. Kaseridis, Z. Yu, L. John | |
| | Scalable Thread Scheduling and Global Power Management for Heterogeneous Many-Core Architectures<br>J. Winter, D. Albonesi, C. Shoemaker | |
| | Dynamically Managed Multithreaded Reconfigurable Architectures for Chip Multiprocessors<br>M. Watkins, D. Albonesi | |
| | **Analysis and Optimization** | Sitzungssaal |
| | Accelerating Multicore Reuse Distance Analysis with Sampling and Parallelization<br>D. Schuff, M. Kulkarni, V. Pai | |
| | Simple and Fast Biased Locks<br>N. Vasudevan, K. Namjoshi, S. Edwards | |
| | Avoiding Deadlock Avoidance<br>H. Pyla, S. Varadarajan | |
| | DAFT: Decoupled Acyclic Fault Tolerance<br>Y. Zhang, J. Lee, N. Johnson, D. August | |
| 14:00<br>16:00 | **Caches and Coherence I** | Festsaal |
| | WAYPOINT: Scaling Coherence to Thousand-core Architectures<br>J. Kelm, M. Johnson, S. Lumetta, S. Patel | |
| | Subspace Snooping: Filtering Snoops with Operating System Support<br>D. Kim, J. Ahn, J. Kim, J. Huh | |
| | Proximity Coherence for Chip Multiprocessors<br>N. Barrow-Williams, C. Fensch, S. Moore | |
| | SPACE: Sharing Pattern-based Directory Coherence for Multicore Scalability<br>H. Zhao, A. Shriraman, S. Dwarkadas | |
| | **Parallelization and Parallel Programming I** | Sitzungssaal |
| | Feedback Driven Pipelining<br>M. Suleman, M. Qureshi, Khubaib, Y. Patt | |
| | Scalable Hardware Support for Conditional Parallelization<br>Z. Li, O. Certner, J. Duato, O. Temam | |
| | Reducing Task Creation and Termination Overhead in Explicitly Parallel Programs<br>J. Zhao, J. Shirako, V. Nandivada, V. Sarkar | |
| | MEDICS: Ultra-Portable Processing for Medical Image Reconstruction<br>G. Dasika, V. Robby, A. Sethia, T. Mudge, S. Mahlke | |
| 16:00<br>17:30 | Poster Session (with coffee) | Hall |
| | Mayor's Reception (Heurigen)<br>We meet at 18:15 in front of the conference venue.<br>Address: Neustift am Walde 68, 1190 Vienna | Fuhrgassl-Huber |

# Tuesday, September 14, 2010

| Time | Event | Room |
|------|-------|------|
| 8:30<br>9:30 | Keynote Address II: <i>Keshav Pingali</i><br>Towards a Science of Parallel Programming | Festsaal |
| 10:00<br>12:00 | **Heterogeneous Platforms and Frameworks** | Festsaal |
| | An OpenCL Framework for Heterogeneous Multicores with Local Memory<br>J. Lee, J. Kim, S. Seo, S. Kim, J. Park, H. Kim, T. Dao, Y. Cho, S. Seo, S. Lee, S. Cho, H. Song, J. Choi | |
| | A Software Platform for Heterogeneous Computing on General-Purpose and Graphics Processors<br>J. Gummaraju, L. Morichetti, M. Houston, B. Sander, B. Gaster, B. Zheng | |
| | MapCG: Writing Parallel Program Portable between CPU and GPU<br>C. Hong, D. Chen, H. Lin, W. Chen, W. Zheng | |
| | Adaptive Spatiotemporal Node Selection in Dynamic Networks<br>P. Hari, J. McCabe, J. Banafato, M. Henry, K. Ko, E. Koukoumidis, U. Kremer, M. Martonosi, L. Peh | |
| | **Scheduling and Design Optimization** | Sitzungssaal |
| | On Mitigating Memory Bandwidth Contention through Bandwidth-Aware Scheduling<br>D. Xu, C. Wu, P. Yew | |
| | AKULA: A Toolset for Developing Scheduling Algorithms on Multicore Systems<br>S. Zhuravlev, S. Blagodurov, A. Fedorova | |
| | Criticality-driven Superscalar Design Space Exploration<br>S. Navada, N. Choudhary, E. Rotenberg | |
| | A Programmable Parallel Accelerator for Learning and Classification<br>S. Cadambi, A. Majumdar, M. Becchi, S. Chakradhar, H. Graf | |
| 14:00<br>16:00 | **Best Papers** | Festsaal |
| | Discovering and Understanding Performance Bottlenecks in Transactional Applications<br>F. Zyulkyarov, S. Stipic, T. Harris, O. Unsal, A. Cristal, M. Valero, I. Hur | |
| | Efficient Sequential Consistency using Conditional Fences<br>C. Lin, V. Nagarajan, R. Gupta | |
| | Partitioning Streaming Parallelism for Multi-cores: A Machine Learning Based Approach<br>Z. Wang, M. O'Boyle | |
| | Handling the Problems and Opportunities Posed by Multiple On-Chip Memory Controllers<br>M. Awasthi, D. Nellans, K. Sudan, R. Balasubramonian, A. Davis | |
| 16:15<br>18:30 | <b>Afternoon on the town</b>, a guided tour through Vienna<br>The tour starts at 16:15 in front of the ÖAW and ends in front of Palais Ferstel. | |
| 18:30<br>22:30 | Conference Banquet<br>Address: Herrengasse 14, 1010 Vienna | Palais Ferstel |























# Wednesday, September 15, 2010

| Time | Event | Room |
|------|-------|------|
| 8:30<br>9:30 | Keynote Address III: <i>Wen-mei Hwu</i><br>Raising the Level of Many-Core Programming with Compiler Technology – Meeting a Grand Challenge | Festsaal |
| 10:00<br>12:00 | **Languages and Compilers** | Festsaal |
| | The PLUG Architecture and Compiler<br>A. Kumar, L. De Carli, C. Estan, K. Sankaralingam, S. Jha | |
| | A Model for Fusion and Code Motion in an Automatic Parallelizing Compiler<br>U. Bondhugula, O. Gunluk, S. Dash, L. Renganarayanan | |
| | Ocelot: A Dynamic Optimization Framework for Bulk Synchronous Applications in Heterogeneous Systems<br>G. Diamos, A. Kerr, S. Yalamanchili, N. Clark | |
| | An Empirical Characterization of Stream Programs and its Implications for Language and Compiler Design<br>W. Thies, S. Amarasinghe | |
| | **ACM Student Research Competition**<br>Finalists selected from the poster session present their work and compete for prizes | Sitzungssaal |
| 13:30<br>15:30 | **Parallelization and Parallel Programming II** | Festsaal |
| | Semi-Automatic Extraction and Exploitation of Hierarchical Pipeline Parallelism Using Profiling Information<br>G. Tournavitis, B. Franke | |
| | The Paralax Infrastructure: Automatic Parallelization With a Helping Hand<br>H. Vandierendonck, S. Rul, K. De Bosschere | |
| | AM++: A Generalized Active Message Framework<br>J. Willcock, T. Hoefler, N. Edmonds, A. Lumsdaine | |
| | Using Thread-Local Memory Mapping to Support Cactus Stacks in Work-Stealing Runtime Systems<br>I. Lee, S. Boyd-Wickizer, Z. Huang, C. Leiserson | |
| | **Speculation** | Sitzungssaal |
| | Speculative-Aware Execution: A Simple and Efficient Technique for Utilizing Multi-Cores to Improve Single-Thread Performance<br>R. Mameesh, M. Franklin | |
| | The Potential of Using Dynamic Information Flow Analysis in Data Value Prediction<br>W. Ghandour, H. Akkary, W. Masri | |
| | Efficient Runahead Threads<br>T. Ramirez, A. Pajuelo, O. Santana, O. Mutlu, M. Valero | |
| | Energy Efficient Speculative Threads: Dynamic Thread Allocation in Same-ISA Heterogeneous Multicore Systems<br>Y. Luo, V. Packirisamy, W. Hsu, A. Zhai | |
| 16:00<br>17:30 | **Caches and Coherence II** | Festsaal |
| | SWEL: Hardware Cache Coherence Protocols to Map Shared Data onto Shared Caches<br>S. Pugsley, J. Spjut, D. Nellans, R. Balasubramonian | |
| | ATAC: A 1000-Core Cache-Coherent Processor with On-Chip Optical Network<br>G. Kurian, J. Miller, J. Psota, J. Michel, L. Kimerling, A. Agarwal | |
| | Using Dead Blocks as a Virtual Victim Cache<br>S. Khan, D. Jimenez, B. Falsafi, D. Burger | |
| | **Data Distribution and Tiling** | Sitzungssaal |
| | Compiler-assisted Data Distribution for Chip Multiprocessors<br>Y. Li, A. Abousamra, R. Melhem, A. Jones | |
| | Data Layout Transformation Exploiting Memory-Level Parallelism in Structured Grid Many-Core Applications<br>I. Sung, J. Stratton, W. Hwu | |
| | Tiled MapReduce: Optimizing Resource Usages of Data-parallel Applications on Multicore with Tiling<br>R. Chen, H. Chen, B. Zang | |
| 17:30 | Closing Remarks | Festsaal |