Revisiting Fault-Tolerance in Processors


For this project, we are examining ways to tolerate permanent or transient faults in a processor designed without conventional fault-tolerant hardware. The goal is to map around these faults and still produce correct results. More specifically, we inject faults into the floating-point (FP) hardware, since FP paths generally lie on the critical path. One approach is to keep alternative versions of a piece of code available, each of which avoids certain hardware failures. Should the fault condition be exercised, execution can jump to an alternative version that masks the problem and still produces correct output.
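
The idea of switching to an alternative code version can be sketched in C++ as follows. This is only an illustration under stated assumptions: the FaultStatus struct, its fp_multiply_faulty flag, and the three function names are hypothetical and not part of the project's framework, and the choice of a faulty FP multiplier is made up for the example. The alternative version avoids FP multiplication by using repeated addition, which only works here because the weights are small integers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical fault report. In a real system this information would come
// from a fault-detection mechanism, not from a plain struct.
struct FaultStatus {
    bool fp_multiply_faulty = false;  // assumed flag, for illustration only
};

// Primary version: weighted sum using FP multiply and FP add.
double dot_primary(const std::vector<double>& a, const std::vector<int>& w) {
    assert(a.size() == w.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        sum += a[i] * w[i];
    }
    return sum;
}

// Alternative version: same weighted sum, but without FP multiply.
// Each product a[i] * w[i] is replaced by |w[i]| additions of +a[i] or -a[i].
double dot_alternative(const std::vector<double>& a, const std::vector<int>& w) {
    assert(a.size() == w.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const int count = w[i] < 0 ? -w[i] : w[i];
        const double term = w[i] < 0 ? -a[i] : a[i];
        for (int k = 0; k < count; ++k) {
            sum += term;
        }
    }
    return sum;
}

// Dispatcher: jump to the alternative version when the fault is reported.
double dot(const std::vector<double>& a, const std::vector<int>& w,
           const FaultStatus& status) {
    return status.fp_multiply_faulty ? dot_alternative(a, w)
                                     : dot_primary(a, w);
}
```

For inputs that are not exactly representable, the two versions may round differently, so a real alternative version would need its numerical behavior checked against the primary one.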


Milestones:

Mapped the modified Leon3 processor design to an FPGA development board; successfully ran a test program on the FPGA Leon3 and interacted with it through the JTAG debugging interface; modified the fault injection interface to support more precise fault injection. Specifically, faults can now be injected into a particular set of FP registers.


Plans/Outlook:

Currently, we are writing a piece of SPARC assembly code to use for initial testing. When we tried to work from compiled C++ code, it was difficult to trace the precise mapping to SPARC assembly, and thus difficult to determine what particular FP registers meant in relation to the code. Once we have a functional piece of SPARC assembly, we can examine the effect of our fault injection more directly and analyze the alternative pieces of code that can tolerate these faults. Once this has been completed, we can build upon our findings and automate parts of this process.


Each arrow/line represents a register value at some point in time. In the top diagram, a permanent fault in the highest register is propagated to other registers as time advances. In the bottom diagram, we do not use the faulty register and thus the errors are not propagated.
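
The behavior in the two diagrams can be mimicked with a toy software model. The sketch below is an assumption-laden illustration, not the project's fault injector: the RegFile class, the four-register size, and the fault model (the faulty register always reads back as 0.0) are all hypothetical. It computes (a + b) * c while holding the intermediate sum in a chosen register.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Toy model of an FP register file with one permanently faulty register.
class RegFile {
public:
    explicit RegFile(std::size_t faulty_reg) : faulty_(faulty_reg) {}

    void write(std::size_t r, double v) {
        assert(r < regs_.size());
        // Assumed fault model: the faulty register loses every value
        // written to it and always holds 0.0.
        regs_[r] = (r == faulty_) ? 0.0 : v;
    }

    double read(std::size_t r) const {
        assert(r < regs_.size());
        return regs_[r];
    }

private:
    std::array<double, 4> regs_{};
    std::size_t faulty_;
};

// Computes (a + b) * c, keeping the intermediate sum in register `tmp`
// and the final product in register `out`.
double compute(RegFile& rf, std::size_t tmp, std::size_t out,
               double a, double b, double c) {
    rf.write(tmp, a + b);
    rf.write(out, rf.read(tmp) * c);
    return rf.read(out);
}
```

With register 3 marked faulty, calling compute with tmp = 3 returns a corrupted result because the bad intermediate value flows into the output register, while choosing tmp = 1 avoids the faulty register and yields the correct product.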



Category:

Runtime Support


Campus:

UC Irvine

UIUC


People:

PI: Rakesh Kumar; Co-PIs: Nikil Dutt and Alex Nicolau (UC Irvine); Graduate Student: Anthony Brown (UIUC)



Artifacts:

Hardware: Altera Stratix 2 FPGA board, VHDL hardware implementation of the Leon3 processor with fault injector (based on the GRLIB IP Library from Gaisler Research). Software completed: the programmable interface for the fault injection framework, as well as sparse matrix-vector product C++ code to run as a benchmark.
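
For readers unfamiliar with the benchmark kernel, a sparse matrix-vector product can be sketched as below. This is not the project's benchmark code: the compressed sparse row (CSR) storage format, the CsrMatrix struct, and the spmv function name are assumptions chosen for illustration. The inner loop is dominated by FP multiplies and adds, which is what makes this kind of kernel a natural target for FP fault injection.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sparse matrix in compressed sparse row (CSR) form (assumed format).
struct CsrMatrix {
    std::size_t rows = 0;
    std::vector<std::size_t> row_ptr;  // rows + 1 offsets into col_idx/values
    std::vector<std::size_t> col_idx;  // column index of each nonzero
    std::vector<double> values;        // value of each nonzero
};

// Computes y = M * x, visiting only the stored nonzeros of M.
std::vector<double> spmv(const CsrMatrix& m, const std::vector<double>& x) {
    assert(m.row_ptr.size() == m.rows + 1);
    assert(m.col_idx.size() == m.values.size());
    std::vector<double> y(m.rows, 0.0);
    for (std::size_t r = 0; r < m.rows; ++r) {
        for (std::size_t k = m.row_ptr[r]; k < m.row_ptr[r + 1]; ++k) {
            assert(m.col_idx[k] < x.size());
            y[r] += m.values[k] * x[m.col_idx[k]];
        }
    }
    return y;
}
```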




 
