David I. August
Professor in the Department of Computer Science, Princeton University
Affiliated with the Department of Electrical Engineering, Princeton University
Ph.D. May 2000, Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign

Office: Computer Science Building Room 221
Email: august@princeton.edu
PGP: Public Key
PGP Fingerprint: DD96 3B12 7DA1 EF4E 46EE A23A D2AB 4FCE B365 2C9A
Fax: (609) 964-1699
Administrative Assistant: Pamela DelOrefice, (609) 258-5551



Amortizing Software Queue Overhead for Pipelined Inter-Thread Communication
Ram Rangan and David I. August
Proceedings of the Workshop on Programming Models for Ubiquitous Parallelism (PMUP), September 2006.
Accept Rate: 41% (10/24).

Future chip multiprocessors are expected to contain multiple on-die processing cores. Increased memory system contention and wire delays will result in high inter-core latencies in these processors. Thus, parallelizing applications to efficiently execute on multiple contexts is key to achieving continued performance improvements. Recently proposed pipelined multithreading (PMT) techniques have shown significant promise for both manual and automatic parallelization. They tolerate increasing inter-thread communication delays by enforcing acyclic dependences amongst communicating threads and pipelining communication.

However, the lack of efficient communication support for such programs hinders related language and compiler research. While researchers have proposed dedicated interconnects and storage for inter-core communication, such mechanisms are not cost-effective, consume extra power, demand chip redesign effort, and necessitate complex operating system modifications. Software implementations of shared memory queues avoid these problems, but they tend to have heavy overhead per communication operation, which negates parallelization benefits and, worse still, can make the parallelized program slower than the original single-threaded code. In this paper, we present a simple compiler analysis that coalesces synchronization and queue pointer updates for select communication operations to minimize the intra-thread overhead of software queue implementations. A preliminary comparison of static schedule heights shows a considerable performance improvement over existing software queue implementations.
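As a rough illustration of what coalescing synchronization and queue pointer updates can mean for a software queue, the following C sketch contrasts a per-value produce operation with a batched one on a single-producer, single-consumer ring buffer. It is a hand-written example under stated assumptions, not the compiler analysis from the paper: the names `spsc_queue`, `produce_one`, `produce_batch`, and `consume_batch`, the capacity of 64, and the use of C11 atomics are all hypothetical choices made for this sketch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define QUEUE_CAPACITY 64 /* hypothetical size; must be a power of two */

/* Hypothetical single-producer, single-consumer shared memory queue. */
typedef struct {
    long slots[QUEUE_CAPACITY];
    _Atomic size_t head; /* next slot to read; written only by the consumer */
    _Atomic size_t tail; /* next slot to write; written only by the producer */
} spsc_queue;

static void queue_init(spsc_queue *q) {
    atomic_init(&q->head, 0);
    atomic_init(&q->tail, 0);
}

/* Baseline: one full-check and one tail publication for every value. */
static int produce_one(spsc_queue *q, long value) {
    size_t tail = atomic_load_explicit(&q->tail, memory_order_relaxed);
    size_t head = atomic_load_explicit(&q->head, memory_order_acquire);
    if (tail - head == QUEUE_CAPACITY)
        return 0; /* queue is full */
    q->slots[tail % QUEUE_CAPACITY] = value;
    atomic_store_explicit(&q->tail, tail + 1, memory_order_release);
    return 1;
}

/* Coalesced: one full-check and one tail publication for n values. */
static int produce_batch(spsc_queue *q, const long *values, size_t n) {
    assert(n <= QUEUE_CAPACITY);
    size_t tail = atomic_load_explicit(&q->tail, memory_order_relaxed);
    size_t head = atomic_load_explicit(&q->head, memory_order_acquire);
    if (QUEUE_CAPACITY - (tail - head) < n)
        return 0; /* not enough free slots for the whole batch */
    for (size_t i = 0; i < n; i++)
        q->slots[(tail + i) % QUEUE_CAPACITY] = values[i];
    /* A single release store makes all n values visible to the consumer. */
    atomic_store_explicit(&q->tail, tail + n, memory_order_release);
    return 1;
}

/* Coalesced: one empty-check and one head publication for n values. */
static int consume_batch(spsc_queue *q, long *out, size_t n) {
    assert(n <= QUEUE_CAPACITY);
    size_t head = atomic_load_explicit(&q->head, memory_order_relaxed);
    size_t tail = atomic_load_explicit(&q->tail, memory_order_acquire);
    if (tail - head < n)
        return 0; /* fewer than n values are available */
    for (size_t i = 0; i < n; i++)
        out[i] = q->slots[(head + i) % QUEUE_CAPACITY];
    /* A single release store returns all n slots to the producer. */
    atomic_store_explicit(&q->head, head + n, memory_order_release);
    return 1;
}
```

In this sketch the batched operations pay for the synchronizing loads and stores once per group of values instead of once per value, which is the kind of intra-thread overhead the abstract refers to. How the paper's analysis actually selects which communication operations to coalesce is not shown here.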