David I. August
Professor in the Department of Computer Science, Princeton University
Affiliated with the Department of Electrical Engineering, Princeton University
Ph.D. May 2000, Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign

Office: Computer Science Building Room 221
Email: august@princeton.edu
PGP: Public Key
PGP Fingerprint: DD96 3B12 7DA1 EF4E 46EE A23A D2AB 4FCE B365 2C9A
Fax: (609) 964-1699
Administrative Assistant: Pamela DelOrefice, (609) 258-5551



Configurable Transient Fault Detection via Dynamic Binary Translation
George A. Reis, Jonathan Chang, David I. August, Robert Cohn, and Shubhendu S. Mukherjee
Proceedings of the 2nd Workshop on Architectural Reliability (WAR), December 2006.

Smaller feature sizes, lower voltage levels, and reduced noise margins have helped improve the performance and lower the power consumption of modern microprocessors. These same advances have made processors more susceptible to transient faults that can corrupt data and make systems unavailable. Designers often compensate for transient faults by adding hardware redundancy and making circuit- and process-level adjustments. However, applications have different data integrity and availability demands, which make hardware approaches such as these too costly for many markets.

Software techniques can provide fault tolerance at a lower cost and with greater flexibility, since they can be selectively deployed in the field even after the hardware has been manufactured. Most existing software-only techniques use recompilation, which requires access to program source code. Regardless of the code transformation method, previous techniques also incur unnecessarily large performance penalties because they protect the entire program uniformly, without taking into account the varying vulnerability of different program regions and state elements to transient faults.

This paper presents Spot, a software-only fault-detection technique that uses dynamic binary translation to provide software-modulated fault tolerance with fine-grained control of redundancy. By using dynamic binary translation, users can improve the reliability of their applications without any assistance from hardware or software vendors. By using software-modulated fault tolerance, Spot can vary the level of protection independently for each register and region of code to provide users with more, and often superior, fault-detection options. This feature of Spot increases the mean work to failure from 1.90x to 17.79x.
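
The idea of varying protection per register can be illustrated with a small sketch. This is not Spot's implementation: Spot operates on binaries through dynamic binary translation, whereas the code below is a toy interpreter written only for illustration. The instruction set (`li`, `add`, `out`), the `run` function, the `protected` set, and the fault-injection tuple are all hypothetical names introduced here. Protected registers get a redundantly computed shadow copy that is compared with the primary copy before a value leaves the program; per-region control is not modeled.

```python
class FaultDetected(Exception):
    """Raised when a protected register disagrees with its shadow copy."""


def run(program, protected, fault=None):
    """Run a toy program on a toy register machine (illustrative assumption).

    program   -- list of tuples: ("li", dst, imm), ("add", dst, a, b), ("out", src)
    protected -- set of register names that receive a redundant shadow copy
    fault     -- optional (step, register, bit): flip that bit in the primary
                 copy of the register just before the given step executes
    """
    regs, shadow, output = {}, {}, []
    for step, (op, *args) in enumerate(program):
        # Simulate a transient fault as a single bit flip in the primary copy.
        if fault is not None and fault[0] == step:
            _, reg, bit = fault
            regs[reg] = regs.get(reg, 0) ^ (1 << bit)

        if op == "li":
            dst, imm = args
            regs[dst] = imm
            if dst in protected:
                shadow[dst] = imm
        elif op == "add":
            dst, a, b = args
            regs[dst] = regs[a] + regs[b]
            if dst in protected:
                # Redundant computation: use shadow operands where they exist,
                # otherwise fall back to the unprotected primary copy.
                shadow[dst] = shadow.get(a, regs[a]) + shadow.get(b, regs[b])
        elif op == "out":
            (src,) = args
            # Check only at the point where the value becomes visible.
            if src in protected and regs[src] != shadow[src]:
                raise FaultDetected(src)
            output.append(regs[src])
    return output


program = [
    ("li", "r1", 2),
    ("li", "r2", 3),
    ("add", "r3", "r1", "r2"),
    ("out", "r3"),
]
bit_flip = (2, "r1", 2)  # corrupt r1 just before the add executes

clean = run(program, protected=set())
unprotected = run(program, protected=set(), fault=bit_flip)
partial = run(program, protected={"r3"}, fault=bit_flip)
try:
    run(program, protected={"r1", "r2", "r3"}, fault=bit_flip)
    full_detected = False
except FaultDetected:
    full_detected = True
```

In this sketch, the same bit flip silently corrupts the output when nothing is protected, still escapes when only `r3` is protected (its shadow is computed from the corrupted, unprotected `r1`), and is caught when all three registers are protected. Each additional protected register costs extra work per instruction, which is the kind of tradeoff that selective protection is meant to expose.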