David I. August
Professor in the Department of Computer Science, Princeton University
Affiliated with the Department of Electrical Engineering, Princeton University
Ph.D. May 2000, Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign

Office: Computer Science Building Room 221
Email: august@princeton.edu
PGP: Public Key
PGP Fingerprint: DD96 3B12 7DA1 EF4E 46EE A23A D2AB 4FCE B365 2C9A
Fax: (609) 964-1699
Administrative Assistant: Pamela DelOrefice, (609) 258-5551



Exploiting Parallelism and Structure to Accelerate the Simulation of Chip Multi-processors
David A. Penry, Daniel Fay, David Hodgdon, Ryan Wells, Graham Schelle, David I. August, and Daniel A. Connors
Proceedings of the Twelfth International Symposium on High-Performance Computer Architecture (HPCA), February 2006.
Accept Rate: 15% (26/172).

Simulation is an important means of evaluating new microarchitectures. Current trends toward chip multi-processors (CMPs) strain the ability of designers to develop efficient simulators. CMP simulation speed can be improved by exploiting parallelism in the CMP simulation model. This may be done either by running the simulation on multiple processors or by integrating multiple processors into the simulation to replace simulated processors. Doing so usually requires tedious manual parallelization or re-design to encapsulate processors.

Both problems can be avoided by generating the simulator from a concurrent, structural model of the CMP. Such a model not only resembles hardware, making it easy to understand and use, but also provides sufficient information to automatically parallelize the simulator without requiring manual model changes. Furthermore, individual components of the model such as processors may be replaced with equivalent hardware without requiring repartitioning.

This paper presents techniques to perform automated simulator parallelization and hardware integration for CMP structural models. We show that automated parallelization can achieve a 7.60 speedup for a 16-processor CMP model on a conventional 4-processor shared-memory multiprocessor. We demonstrate the power of hardware integration by integrating eight hardware PowerPC cores into a CMP model, achieving a speedup of up to 5.82.