Towards resilient parallel linear Krylov solvers: recover-restart strategies - INRIA - Institut National de Recherche en Informatique et en Automatique
Conference paper, Year: 2013

Towards resilient parallel linear Krylov solvers: recover-restart strategies

Abstract

The advent of extreme-scale machines will require the use of parallel resources at an unprecedented scale, probably leading to a high rate of hardware faults. High Performance Computing (HPC) applications that aim to exploit all these resources will thus need to be resilient, i.e., able to compute a correct solution in the presence of faults. In this work, we investigate possible remedies in the framework of the solution of large sparse linear systems, which is often the innermost numerical kernel in many scientific and engineering applications and also one of the most time-consuming parts. More precisely, we present recovery-followed-by-restart strategies for Krylov subspace solvers, where lost entries of the iterate are interpolated to define a new initial guess before restarting. In particular, we consider two interpolation policies that preserve key numerical properties of well-known solvers, namely the monotonic decrease of the A-norm of the error of the conjugate gradient (CG) method or of the residual norm of GMRES. We assess the impact of the recovery method, the fault rate and the number of processors on the robustness of the resulting linear solvers. We consider experiments with CG, GMRES and Bi-CGStab.
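The abstract does not spell out the two interpolation policies. A minimal sketch, assuming one standard choice that has each stated property: with the surviving entries x_J held fixed, (a) solve A_II x_I = b_I - A_IJ x_J for the lost entries, which for symmetric positive definite A minimizes the A-norm of the error over x_I, and (b) minimize ||b - A x||_2 over x_I only. Because the pre-fault iterate is one admissible choice in each minimization, the recovered iterate cannot be worse in that norm. The function names `solve_dense`, `linear_interpolation` and `least_squares_interpolation`, the dense list-of-rows storage, and the idea that `lost` holds the indices owned by a failed process are hypothetical illustrations, not the authors' code.

```python
# Hypothetical sketch of two "recover" policies for a recover-restart Krylov
# solver. Assumptions (not from the abstract): A is a small dense nonsingular
# matrix stored as a list of rows, a fault wipes the iterate entries whose
# indices are in `lost`, and all other entries survive unchanged.


def solve_dense(M, rhs):
    """Solve M y = rhs by Gaussian elimination with partial pivoting."""
    n = len(M)
    aug = [[float(v) for v in M[i]] + [float(rhs[i])] for i in range(n)]
    for k in range(n):
        # Swap in the row with the largest pivot candidate in column k.
        p = max(range(k, n), key=lambda i: abs(aug[i][k]))
        aug[k], aug[p] = aug[p], aug[k]
        for i in range(k + 1, n):
            f = aug[i][k] / aug[k][k]
            for j in range(k, n + 1):
                aug[i][j] -= f * aug[k][j]
    # Back substitution on the upper-triangular system.
    y = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * y[j] for j in range(i + 1, n))
        y[i] = (aug[i][n] - s) / aug[i][i]
    return y


def linear_interpolation(A, b, x, lost):
    """Refill x[lost] by solving A[I,I] x_I = b_I - A[I,J] x_J, J = survivors.

    For SPD A this minimizes the A-norm of the error over the lost entries,
    so the recovered iterate's A-norm error is no larger than before the fault.
    """
    kept = [j for j in range(len(b)) if j not in lost]
    A_II = [[A[i][j] for j in lost] for i in lost]
    rhs = [b[i] - sum(A[i][j] * x[j] for j in kept) for i in lost]
    x_new = list(x)
    for i, v in zip(lost, solve_dense(A_II, rhs)):
        x_new[i] = v
    return x_new


def least_squares_interpolation(A, b, x, lost):
    """Refill x[lost] by minimizing ||b - A x||_2 over the lost entries only.

    Solves the normal equations A[:,I]^T A[:,I] x_I = A[:,I]^T (b - A[:,J] x_J),
    so the residual norm is no larger than before the fault.
    """
    n = len(b)
    kept = [j for j in range(n) if j not in lost]
    r = [b[k] - sum(A[k][j] * x[j] for j in kept) for k in range(n)]
    N = [[sum(A[k][p] * A[k][q] for k in range(n)) for q in lost] for p in lost]
    rhs = [sum(A[k][p] * r[k] for k in range(n)) for p in lost]
    x_new = list(x)
    for i, v in zip(lost, solve_dense(N, rhs)):
        x_new[i] = v
    return x_new


# Usage: a 3x3 SPD system whose exact solution is [1, 2, 3].
A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
b = [6.0, 10.0, 8.0]
x_before_fault = [1.5, -1.0, 2.5]  # an inexact iterate held when the fault hits
x_li = linear_interpolation(A, b, x_before_fault, lost=[1])
x_lsi = least_squares_interpolation(A, b, x_before_fault, lost=[0, 1])
# The Krylov solver would then restart from x_li (CG) or x_lsi (GMRES).
```

The first policy only touches the rows of A indexed by the lost entries, while the second needs the corresponding columns; the normal-equations solve assumes those columns are linearly independent, which holds whenever A is nonsingular.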
File not deposited

Dates and versions

hal-00935685, version 1 (23-01-2014)

Identifiers

  • HAL Id: hal-00935685, version 1

Cite

Emmanuel Agullo, Luc Giraud, Abdou Guermouche, Jean Roman, Mawussi Zounon. Towards resilient parallel linear Krylov solvers: recover-restart strategies. Sparse days 2013, CERFACS, Jun 2013, Toulouse, France. ⟨hal-00935685⟩