Sat, Sep 11 | Sun, Sep 12 | Mon, Sep 13 | Tue, Sep 14 | Wed, Sep 15 | |||||
---|---|---|---|---|---|---|---|---|---|
Opening & Welcome | 8:15 | ||||||||
8:30 | GPUScA WPABA Ct |
PMEA | Keynote Address I | Keynote Address II | Keynote Address III | 8:30 |
|||
Coffee | Coffee | Coffee | 9:30 | ||||||
10:00 | Coffee | Coffee | Power-Aware Design | Analysis & Optimization | Heterogeneous Platforms & Frameworks | Scheduling & Design Optimization | Languages & Compilers | Student Research Competition | 10:00 |
10:30 | GPUScA WPABA Ct |
PMEA | |||||||
12:00 | Lunch | Lunch | Lunch (12:00–14:00) | Lunch (12:00–14:00) | Lunch | 12:00 | |||
1:30 | GPUScA Ct |
Cetus | Parallelization & Parallel Programming II | Speculation | 1:30 | ||||
Caches & Coherence I | Parallelization & Parallel Programming I | Best Papers | |||||||
3:00 | Coffee | Coffee | |||||||
3:30 | GPUScA Ct |
Cetus | Coffee | 3:30 | |||||
Posters & Coffee | Afternoon on the town (tour starts at 4:15) |
Caches & Coherence II | Data Distribution & Tiling | 4:00 | |||||
5:00 | |||||||||
Closing Remarks | 5:30 | ||||||||
6:00 | |||||||||
Opening Reception | Mayor's Reception | Banquet | |||||||
For the workshop and tutorial program please see the Workshops & Tutorials page.
Sunday, September 12, 2010 | ||||||
---|---|---|---|---|---|---|
6:30 | Opening Reception at the Strandbar Herrmann. Address: Herrmannpark (below the Urania), 1030 Vienna |
18:30 | ||||
9:30 | 21:30 | |||||
Monday, September 13, 2010 | ||||||
8:15 | Festsaal Conference Opening and Welcome |
8:15 | ||||
8:30 | Festsaal Keynote Address I: "Build Watson: An Overview of DeepQA for The Jeopardy! Challenge", Eric W. Brown, IBM T.J. Watson Research Center. Session Chair: Michael Gschwind, IBM Systems and Technology Group |
8:30 | ||||
9:30 | Coffee | 9:30 | ||||
10:00 | Festsaal Session 1a: Power-Aware Design. Session Chair: Sandhya Dwarkadas, University of Rochester |
Sitzungssaal Session 1b: Analysis and Optimization. Session Chair: Andreas Krall, Technische Universität Wien |
10:00 | |||
Power and Thermal Characterization of POWER6 System V. Jimenez, C. Boneti, F. Cazorla, R. Gioiosa, E. Kursun, C. Cher, C. Isci, A. Buyuktosunoglu, P. Bose, M. Valero |
Accelerating Multicore Reuse Distance Analysis with Sampling and Parallelization D. Schuff, M. Kulkarni, V. Pai |
|||||
System-level Max Power (SYMPO) – A systematic approach for escalating system-level power consumption using synthetic benchmarks K. Ganesan, J. Jo, L. Bircher, D. Kaseridis, Z. Yu, L. John |
Simple and Fast Biased Locks N. Vasudevan, K. Namjoshi, S. Edwards |
|||||
Scalable Thread Scheduling and Global Power Management for Heterogeneous Many-Core Architectures J. Winter, D. Albonesi, C. Shoemaker |
Avoiding Deadlock Avoidance H. Pyla, S. Varadarajan |
|||||
Dynamically Managed Multithreaded Reconfigurable Architectures for Chip Multiprocessors M. Watkins, D. Albonesi |
DAFT: Decoupled Acyclic Fault Tolerance Y. Zhang, J. Lee, N. Johnson, D. August |
|||||
12:00 | Lunch | 12:00 | ||||
2:00 | Festsaal Session 2a: Caches and Coherence I. Session Chair: Pedro Trancoso, University of Cyprus |
Sitzungssaal Session 2b: Parallelization and Parallel Programming I. Session Chair: Rudolf Eigenmann, Purdue University |
14:00 | |||
WAYPOINT: Scaling Coherence to Thousand-core Architectures J. Kelm, M. Johnson, S. Lumetta, S. Patel |
Feedback Driven Pipelining M. Suleman, M. Qureshi, Khubaib, Y. Patt |
|||||
Subspace Snooping: Filtering Snoops with Operating System Support D. Kim, J. Ahn, J. Kim, J. Huh |
Scalable Hardware Support for Conditional Parallelization Z. Li, O. Certner, J. Duato, O. Temam |
|||||
Proximity Coherence for Chip Multiprocessors N. Barrow-Williams, C. Fensch, S. Moore |
Reducing Task Creation and Termination Overhead in Explicitly Parallel Programs J. Zhao, J. Shirako, V. Nandivada, V. Sarkar |
|||||
SPACE: Sharing Pattern-based Directory Coherence for Multicore Scalability H. Zhao, A. Shriraman, S. Dwarkadas |
MEDICS: Ultra-Portable Processing for Medical Image Reconstruction G. Dasika, V. Robby, A. Sethia, T. Mudge, S. Mahlke |
|||||
4:00 | Posters and Coffee. Session Chair: Michael Franz, University of California at Irvine |
16:00 | ||||
On-Chip Network Design Considerations for Compute Accelerators A. Bakhoda, J. Kim, T. Aamodt |
CoreGenesis: Erasing Core Boundaries for Robust and Configurable Performance S. Gupta, S. Feng, A. Ansari, S. Mahlke |
|||||
Believe It or Not! Multi-core CPUs can Match GPU Performance for a FLOP-intensive Application! R. Bordawekar, U. Bondhugula, R. Rao |
Automatic Vector Instruction Selection for Dynamic Compilation R. Barik, J. Zhao, V. Sarkar |
|||||
Less is More: Trading off Work-Efficiency for Scalability in Irregular Programs M. Hassaan, M. Burtscher, K. Pingali |
Approximating Age-Based Arbitration in On-Chip Networks M. Lee, J. Kim, D. Abts, M. Marty, J. Lee |
|||||
Moths: Mobile Threads for On-Chip Networks M. Misler, N. Jerger |
||||||
Improving Speculative Loop Parallelization via Selective Squash and Speculation Reuse S. Ananthramu, D. Majeti, S. Aggarwal, M. Chaudhuri |
Posters of participants in the ACM Student Research Competition |
|||||
Revisiting Sorting for GPGPU Stream Architectures D. Merrill, A. Grimshaw |
Offloading Java to Graphics Processors P. Calvert |
|||||
Analyzing Cache Performance Bottlenecks of STM applications and addressing them with compiler's help S. Mannarswamy, R. Govindarajan |
FabScalar: Composing Synthesizable RTL Designs of Arbitrary Cores within a Canonical Superscalar Template N.K. Choudhary |
|||||
An Intra-Tile Cache Set Balancing Scheme M. Hammoud, S. Cho, R. Melhem |
GLOpenCL: Compiler and Run-Time Support for OpenCL on Hardware- and Software-Managed Cache Multicores K. Daloukas |
|||||
StatCC: A Statistical Cache Contention Model D. Eklov, D. Black-Schaffer, E. Hagersten |
Massively Parallel Neural Network Simulation P. Fox |
|||||
An Integer Programming Framework for Optimizing Shared Memory Use on GPUs W. Ma, G. Agrawal |
Autonomic Service Management: Semantic Approach H. Haitof |
|||||
Exploiting Subtrace-Level Parallelism in Clustered Processors R. Ubal, J. Sahuquillo, S. Petit, P. López, J. Duato |
Runtime Estimation Methods for Pipelined MPSoCs H. Javaid |
|||||
A Case for NUMA-aware Contention Management on Multicore Systems S. Blagodurov, S. Zhuravlev, A. Fedorova, A. Kamali |
Effective Combination of Transformations via Obstacle Identification N. Johnson |
|||||
DMATiler: Revisiting Loop Tiling for Direct Memory Access H. Lin, T. Liu, L. Renganarayana |
Graph-based Cluster Assignment for VLIW Architectures A. Jordan |
|||||
Fidelity and Scaling of the PARSEC Benchmark Inputs C. Bienia, K. Li |
On stochastic dynamic binary translation in Vx32 K. Kononenko |
|||||
Online Cache Modeling for Commodity Multicore Processors R. West, P. Zaroo, C. Waldspurger, X. Zhang |
A Scalable High-performance Reconfigurable On-chip Network Architecture M. Modarressi |
|||||
NoC-Aware Cache Design for Multithreaded Execution on Tiled Chip Multiprocessors A. Abousamra, A. Jones, R. Melhem |
IIComMP: A Software Framework for Effortless Heterogeneous Architecture Development R. Reyes |
|||||
A Software-SVM-based Transactional Memory for Multicore Accelerator Architectures with Local Memory J. Lee, S. Seo, J. Lee |
Quantifying the Impact of GPU Specific Optimizations: An Experimental Study on a Weather Forecasting Application A. Saeed, E. Elwany, E. Tawadros, K. Abdelsalam, P. Yousry, S. Hafez |
|||||
NUcache – A Multi-core Cache Organization Based on Next-Use Distance R. Manikantan, K. Rajan, R. Govindarajan |
D2C: Deterministic, Deadlock-free Concurrency N. Vasudevan |
|||||
5:30 | Break | 17:30 | ||||
6:15 | Mayor's Reception at the Heurigen Fuhrgassl-Huber. We meet at 6:15 in front of the ÖAW (conference location), where buses will take us to the Heurigen. Address of the Heurigen: Neustift am Walde 68, 1190 Vienna |
18:15 | ||||
10:00 | 22:00 | |||||
Tuesday, September 14, 2010 | ||||||
8:30 | Festsaal Keynote Address II: "Towards a Science of Parallel Programming", Keshav Pingali, University of Texas at Austin. Session Chair: Valentina Salapura, IBM T.J. Watson Research Center |
8:30 | ||||
9:30 | Coffee | 9:30 | ||||
10:00 | Festsaal Session 3a: Heterogeneous Platforms and Frameworks. Session Chair: Calin Cascaval, Qualcomm Research |
Sitzungssaal Session 3b: Scheduling and Design Optimization. Session Chair: Antonia Zhai, University of Minnesota |
10:00 | |||
An OpenCL Framework for Heterogeneous Multicores with Local Memory J. Lee, J. Kim, S. Seo, S. Kim, J. Park, H. Kim, T. Dao, Y. Cho, S. Seo, S. Lee, S. Cho, H. Song, J. Choi |
On Mitigating Memory Bandwidth Contention through Bandwidth-Aware Scheduling D. Xu, C. Wu, P. Yew |
|||||
A Software Platform for Heterogeneous Computing on General-Purpose and Graphics Processors J. Gummaraju, L. Morichetti, M. Houston, B. Sander, B. Gaster, B. Zheng |
AKULA: A Toolset for Developing Scheduling Algorithms on Multicore Systems S. Zhuravlev, S. Blagodurov, A. Fedorova |
|||||
MapCG: Writing Parallel Program Portable between CPU and GPU C. Hong, D. Chen, H. Lin, W. Chen, W. Zheng |
Criticality-driven Superscalar Design Space Exploration S. Navada, N. Choudhary, E. Rotenberg |
|||||
Adaptive Spatiotemporal Node Selection in Dynamic Networks P. Hari, J. McCabe, J. Banafato, M. Henry, K. Ko, E. Koukoumidis, U. Kremer, M. Martonosi, L. Peh |
A Programmable Parallel Accelerator for Learning and Classification S. Cadambi, A. Majumdar, M. Becchi, S. Chakradhar, H. Graf |
|||||
12:00 | Lunch | 12:00 | ||||
2:00 | Festsaal Session 4: Best Papers. Session Chairs: Michael Gschwind, IBM Systems and Technology Group, and Jens Knoop, Technische Universität Wien |
14:00 | ||||
Discovering and Understanding Performance Bottlenecks in Transactional Applications F. Zyulkyarov, S. Stipic, T. Harris, O. Unsal, A. Cristal, I. Hur, M. Valero |
||||||
Efficient Sequential Consistency using Conditional Fences C. Lin, V. Nagarajan, R. Gupta |
||||||
Partitioning Streaming Parallelism for Multi-cores: A Machine Learning Based Approach Z. Wang, M. O'Boyle |
||||||
Handling the Problems and Opportunities Posed by Multiple On-Chip Memory Controllers M. Awasthi, D. Nellans, K. Sudan, R. Balasubramonian, A. Davis |
||||||
4:00 | Break | 16:00 | ||||
4:15 | Afternoon on the town: Guided tour through Vienna's historic city center. The tour starts at 4:15 in front of the ÖAW (conference location) and ends at Palais Ferstel in time for the banquet. |
16:15 | ||||
6:30 | Conference Banquet at Palais Ferstel. Address: Herrengasse 14, 1010 Vienna |
18:30 | ||||
10:30 | 22:30 | |||||
Wednesday, September 15, 2010 | ||||||
8:30 | Festsaal Keynote Address III: "Raising the Level of Many-Core Programming with Compiler Technology – Meeting a Grand Challenge", Wen-mei Hwu, University of Illinois at Urbana-Champaign. Session Chair: Jens Knoop, Technische Universität Wien |
8:30 | ||||
9:30 | Coffee | 9:30 | ||||
10:00 | Festsaal Session 5a: Languages and Compilers. Session Chair: Vivek Sarkar, Rice University |
Sitzungssaal Session 5b: ACM Student Research Competition. Session Chairs: M. Anton Ertl and Franz Puntigam, Technische Universität Wien |
10:00 | |||
The PLUG Architecture and Compiler A. Kumar, L. De Carli, C. Estan, K. Sankaralingam, S. Jha |
Finalists selected from the poster session present their work and compete for the prizes | |||||
A Model for Fusion and Code Motion in an Automatic Parallelizing Compiler U. Bondhugula, O. Gunluk, S. Dash, L. Renganarayana |
||||||
Ocelot: A Dynamic Optimization Framework for Bulk-Synchronous Applications in Heterogeneous Systems G. Diamos, A. Kerr, S. Yalamanchili, N. Clark |
||||||
An Empirical Characterization of Stream Programs and its Implications for Language and Compiler Design W. Thies, S. Amarasinghe |
||||||
12:00 | Lunch | 12:00 | ||||
1:30 | Festsaal Session 6a: Parallelization and Parallel Programming II. Session Chair: David Gregg, Trinity College Dublin |
Sitzungssaal Session 6b: Speculation. Session Chair: Jim Dehnert, Google |
13:30 | |||
Semi-Automatic Extraction and Exploitation of Hierarchical Pipeline Parallelism Using Profiling Information G. Tournavitis, B. Franke |
Speculative-Aware Execution: A Simple and Efficient Technique for Utilizing Multi-Cores to Improve Single-Thread Performance R. Mameesh, M. Franklin |
|||||
The Paralax Infrastructure: Automatic Parallelization With a Helping Hand H. Vandierendonck, S. Rul, K. De Bosschere |
The Potential of Using Dynamic Information Flow Analysis in Data Value Prediction W. Ghandour, H. Akkary, W. Masri |
|||||
AM++: A Generalized Active Message Framework J. Willcock, T. Hoefler, N. Edmonds, A. Lumsdaine |
Efficient Runahead Threads T. Ramirez, A. Pajuelo, O. Santana, O. Mutlu, M. Valero |
|||||
Using Thread-Local Memory Mapping to Support Cactus Stacks in Work-Stealing Runtime Systems I. Lee, S. Boyd-Wickizer, Z. Huang, C. Leiserson |
Energy Efficient Speculative Threads: Dynamic Thread Allocation in Same-ISA Heterogeneous Multicore Systems Y. Luo, V. Packirisamy, W. Hsu, A. Zhai |
|||||
3:30 | Coffee | 15:30 | ||||
4:00 | Festsaal Session 7a: Caches and Coherence II. Session Chair: Aamer Jaleel, Intel |
Sitzungssaal Session 7b: Data Distribution and Tiling. Session Chair: Lawrence Rauchwerger, Texas A&M University |
16:00 | |||
SWEL: Hardware Cache Coherence Protocols to Map Shared Data onto Shared Caches S. Pugsley, J. Spjut, D. Nellans, R. Balasubramonian |
Compiler-assisted Data Distribution for Chip Multiprocessors Y. Li, A. Abousamra, R. Melhem, A. Jones |
|||||
ATAC: A 1000-Core Cache-Coherent Processor with On-Chip Optical Network G. Kurian, J. Miller, J. Psota, J. Michel, L. Kimerling, A. Agarwal |
Data Layout Transformation Exploiting Memory-Level Parallelism in Structured Grid Many-Core Applications I. Sung, J. Stratton, W. Hwu |
|||||
Using Dead Blocks as a Virtual Victim Cache S. Khan, D. Jimenez, B. Falsafi, D. Burger |
Tiled MapReduce: Optimizing Resource Usages of Data-parallel Applications on Multicore with Tiling R. Chen, H. Chen, B. Zang |
|||||
5:30 | Festsaal Closing Remarks |
17:30 | ||||
6:00 | 18:00 |