David I. August
Professor in the Department of Computer Science, Princeton University
Affiliated with the Department of Electrical Engineering, Princeton University
Ph.D. May 2000, Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign

Office: Computer Science Building Room 221
Email: august@princeton.edu
PGP: Public Key
PGP Fingerprint: DD96 3B12 7DA1 EF4E 46EE A23A D2AB 4FCE B365 2C9A
Fax: (609) 964-1699
Administrative Assistant: Pamela DelOrefice, (609) 258-5551

Front Page Publication List (with stats) Curriculum Vitae (PDF) The Liberty Research Group


Fault-tolerant Typed Assembly Language [abstract] (ACM DL, PDF)
Frances Perry, Lester Mackey, George A. Reis, Jay Ligatti, David I. August, and David Walker
Proceedings of the 2007 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), June 2007.
Accept Rate: 25% (45/178).
Winner Best Paper Award.

A transient hardware fault occurs when an energetic particle strikes a transistor, causing it to change state. Although transient faults do not permanently damage the hardware, they may corrupt computations by altering stored values and signal transfers. In this paper, we propose a new scheme for provably safe and reliable computing in the presence of transient hardware faults. In our scheme, software computations are replicated to provide redundancy while special instructions compare the results of replicas to detect errors before writing critical data. In stark contrast to any previous efforts in this area, we have analyzed our fault tolerance scheme from a formal, theoretical perspective. First, we provide an operational semantics for our assembly language, which includes a precise formal denition of our fault model. Formulating such a model is a crucial step forward for the science of hardware fault tolerance as it pins down the assumptions being made about when and where faults may occur in a completely precise and transparent manner. Second, we develop an assembly-level type system designed to detect reliability problems in compiled code. On the one hand, this type system may be viewed as a theoretical tool for proving that code is reliable. On the other hand, it may be used as a practical tool for compiler debugging. In the latter case, the type system is strictly superior to conventional testing as it guarantees full coverage and perfect fault tolerance relative to the fault model. Third, we provide a formal specication for program fault tolerance under the given fault model and prove that all well-typed programs are indeed fault tolerant. In addition to the formal analysis, we evaluate the execution time of our detection scheme to quantify the performance penalty for sound fault tolerance.