David I. August
Professor in the Department of Computer Science, Princeton University
Affiliated with the Department of Electrical Engineering, Princeton University
Ph.D. May 2000, Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign

Office: Computer Science Building Room 221
Email: august@princeton.edu
Phone: (609) 258-2085
Fax: (609) 964-1699
Administrative Assistant: Pamela DelOrefice, (609) 258-5551

Front Page Publication List (with stats) Curriculum Vitae (PDF) The Liberty Research Group


Scalable Speculative Parallelization on Commodity Clusters [abstract] (ACM DL, PDF)
Hanjun Kim, Arun Raman, Feng Liu, Jae W. Lee, and David I. August
Proceedings of the 43rd IEEE/ACM International Symposium on Microarchitecture (MICRO), December 2010.
Accept Rate: 18% (45/248).
Highest ranked paper in double-blind review process.

While clusters of commodity servers and switches are the most popular form of large-scale parallel computers, many programs are not easily parallelized for execution upon them. In particular, high inter-node communication cost and lack of globally shared memory appear to make clusters suitable only for server applications with abundant task-level parallelism and scientific applications with regular and independent units of work. Clever use of pipeline parallelism (DSWP), thread-level speculation (TLS), and speculative pipeline parallelism (Spec-DSWP) can mitigate the costs of inter-thread communication on shared memory multicore machines. This paper presents Distributed Software Multi-threaded Transactional memory (DSMTX), a runtime system which makes these techniques applicable to non-shared memory clusters, allowing them to efficiently address inter-node communication costs. Initial results suggest that DSMTX enables efficient cluster execution of a wider set of application types. For 11 sequential C programs parallelized for a 4-core 32-node (128 total core) cluster without shared memory, DSMTX achieves a geomean speedup of 49x. This compares favorably to the 15x speedup achieved by our implementation of TLS-only support for clusters.