David I. August
Professor in the Department of Computer Science, Princeton University
Affiliated with the Department of Electrical Engineering, Princeton University
Ph.D. May 2000, Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign

Office: Computer Science Building Room 221
Email: august@princeton.edu
Phone: (609) 258-2085
Fax: (609) 964-1699
Administrative Assistant: Pamela DelOrefice, (609) 258-5551

Front Page Publication List (with stats) Curriculum Vitae (PDF) The Liberty Research Group

Publications

Automatically Exploiting Cross-Invocation Parallelism Using Runtime Information [abstract] (PDF)
Jialu Huang, Thomas B. Jablin, Stephen R. Beard, Nick P. Johnson, and David I. August
Proceedings of the 2013 International Symposium on Code Generation and Optimization (CGO), February 2013.
Accept Rate: 28% (33/117).

Automatic parallelization is a promising approach to producing scalable multi-threaded programs for multicore architectures. Many existing automatic techniques only parallelize iterations within a loop invocation and synchronize threads at the end of each loop invocation. When parallel code contains many loop invocations, synchronization can easily become a performance bottleneck. Some automatic techniques address this problem by exploiting crossinvocation parallelism. These techniques use static analysis to partition iterations among threads to avoid cross-thread dependences. However, this partitioning is not always achievable at compile-time, because program input termines dependence patterns at run-time. By contrast, this paper proposes DOMORE, the first automatic parallelization technique that uses runtime information to exploit additional cross-invocation parallelism. Instead of partitioning iterations statically, DOMORE dynamically detects crossthread dependences and synchronizes only when necessary. DOMORE consists of a compiler and a runtime library. At compile time, DOMORE automatically parallelizes loops and inserts a custom runtime engine into programs. At runtime, the engine observes dependences and synchronizes iterations only when necessary. For six programs, DOMORE achieves a geomean loop speedup of 2.1X over parallel execution without crossinvocation parallelization and of 3.2X over sequential execution on eight cores.