Scalable Shared Memory Multiprocessors: Some Ideas to Make them Reliable
Abstract
Scalable shared memory multiprocessors are promising architectures for achieving teraflops-scale computational power. Because they contain a large number of processors and memory elements, such machines have a high probability of failure. In this paper, we investigate an approach based on backward error recovery to provide a highly available, scalable shared memory architecture that tolerates transient and permanent processor and memory failures.