David I. August
Professor in the Department of Computer Science, Princeton University
Affiliated with the Department of Electrical Engineering, Princeton University
Ph.D. May 2000, Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign

Office: Computer Science Building Room 221
Email: august@princeton.edu
Phone: (609) 258-2085
Fax: (609) 964-1699
Administrative Assistant: Pamela DelOrefice, (609) 258-5551

Publications

Software Fault Detection Using Dynamic Instrumentation
George A. Reis, David I. August, Robert Cohn, and Shubhendu S. Mukherjee
Proceedings of the Fourth Annual Boston Area Architecture Workshop (BARC), February 2006.

Software-only approaches to increase hardware reliability have been proposed and evaluated as alternatives to hardware modification. These techniques have shown that they can significantly improve reliability with reasonable performance overhead. Software-only techniques do not require any hardware support and thus are far cheaper and easier to deploy. These techniques can be used for systems that have already been manufactured and now require higher reliability than the hardware can offer.

All previous proposals have been static compilation techniques that rely on source code transformations or alterations to the compilation process. Our proposal is the first application of software fault detection for transient errors that increases reliability dynamically. Applying our technique is trivial, since the only requirement is the program binary; this makes it applicable to legacy programs that no longer have readily available or easily re-compilable source code. Our dynamic reliability technique can seamlessly handle variable-length instructions, mixed code and data, statically unknown indirect jump targets, dynamically generated code, and dynamically loaded libraries. Our technique is also able to attach to an already running application to increase its reliability, and to detach when appropriate, thus returning to faster (although unreliable) execution.