David I. August
Professor in the Department of Computer Science, Princeton University
Affiliated with the Department of Electrical Engineering, Princeton University
Ph.D. May 2000, Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign

Office: Computer Science Building Room 221
Email: august@princeton.edu
PGP: Public Key
PGP Fingerprint: DD96 3B12 7DA1 EF4E 46EE A23A D2AB 4FCE B365 2C9A
Fax: (609) 964-1699
Administrative Assistant: Pamela DelOrefice, (609) 258-5551



Spice: Speculative Parallel Iteration Chunk Execution
Easwaran Raman, Neil Vachharajani, Ram Rangan, and David I. August
Proceedings of the 2008 International Symposium on Code Generation and Optimization (CGO), April 2008.
Accept Rate: 31% (21/66).

The recent trend in the processor industry of packing multiple processor cores onto a chip has increased the importance of automatic techniques for extracting thread-level parallelism. A promising approach for extracting thread-level parallelism in general-purpose applications is thread-level speculation (TLS), which uses memory alias or value speculation to break dependences among threads and execute them concurrently. In this work, we present a novel software-only value prediction mechanism for TLS and an associated TLS technique called speculative parallel iteration chunk execution (Spice). Our value prediction technique predicts the loop live-ins of only a few iterations of a given loop, enabling speculative threads to start from those iterations. It also increases the probability of successful speculation by predicting only that the values will be used as live-ins in some future iteration of the loop, not in which iteration they will appear. These twin properties enable our value prediction scheme to achieve high prediction accuracy while exposing significant coarse-grained thread-level parallelism. Spice has been implemented as an automatic transformation in a research compiler. The technique results in up to 157% speedup (101% on average) with 4 threads.
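The chunked value-prediction idea in the abstract can be illustrated with a small sketch. It is not the authors' compiler transformation: it is a sequential Python simulation, assuming a linked-list summation loop whose only live-in is the current node pointer. The names `Node`, `build_list`, `run_chunk`, and `spice_sum` are hypothetical, as is the policy of remembering evenly spaced nodes from the previous invocation. Each chunk starts from a predicted live-in and runs until it reaches the next chunk's predicted start. A prediction counts as correct if that node is reached at any iteration; if it is never reached, the later chunks are squashed and the current chunk simply runs to the end of the list.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(eq=False)  # nodes are compared by identity, not by contents
class Node:
    value: int
    next: Optional["Node"] = None


def build_list(values: List[int]) -> Optional[Node]:
    """Build a singly linked list holding `values` in order."""
    head = None
    for v in reversed(values):
        head = Node(v, head)
    return head


def run_chunk(start: Optional[Node], stop: Optional[Node]):
    """Sum nodes from `start` up to (not including) `stop`, or to the list end.

    Returns (partial_sum, reached_stop, visited_nodes).
    """
    total, visited, node = 0, [], start
    while node is not None and node is not stop:
        total += node.value
        visited.append(node)
        node = node.next
    return total, node is stop, visited


def spice_sum(head: Optional[Node], predicted: List[Node], num_threads: int = 4):
    """Simulate one loop invocation (sequentially; real chunks run in parallel).

    `predicted` holds live-in nodes remembered from the previous invocation.
    Returns (sum, predictions_for_next_invocation, mispredicted).
    """
    starts = [head] + list(predicted[: num_threads - 1])
    total, committed, mispredicted = 0, [], False
    for i, start in enumerate(starts):
        stop = starts[i + 1] if i + 1 < len(starts) else None
        partial, reached, visited = run_chunk(start, stop)
        total += partial
        committed.extend(visited)
        if not reached:
            # The next chunk's predicted live-in never appeared: squash all
            # later chunks. This chunk already ran to the end of the list.
            mispredicted = True
            break
    # Assumed policy: remember evenly spaced nodes as the next predictions.
    n = len(committed)
    new_predicted = (
        [committed[n * k // num_threads] for k in range(1, num_threads)]
        if n >= num_threads
        else []
    )
    return total, new_predicted, mispredicted
```

In this sketch the first invocation runs with no predictions and only records live-ins. Later invocations reuse them, and a list mutation that removes a predicted node causes a squash rather than a wrong result, because the committed chunks still cover every reachable node exactly once.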