A Hierarchical Checkpointing Protocol for Parallel Applications in Cluster Federations - INRIA - Institut National de Recherche en Informatique et en Automatique
Report (Research Report) Year: 2004

A Hierarchical Checkpointing Protocol for Parallel Applications in Cluster Federations

Abstract

Code coupling applications can be divided into communicating modules that may be executed on different clusters in a cluster federation. Because a cluster federation comprises a large number of nodes, the probability of a node failure is high. We propose a hierarchical checkpointing protocol that combines a synchronized checkpointing technique inside clusters with a communication-induced technique between clusters. This protocol fits the characteristics of a cluster federation (a large number of nodes, and high-latency, low-bandwidth networking technologies between clusters). A preliminary performance evaluation, carried out with a discrete event simulator, shows that the protocol is suitable for code coupling applications.
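The abstract names the two layers of the protocol but does not spell out its rules. As a rough illustration only, the sketch below assumes a classic index-based rule for the communication-induced layer: each cluster keeps a sequence number for its latest cluster-wide checkpoint, inter-cluster messages carry the sender's number, and a receiver that is behind takes a forced checkpoint before delivering the message. The `Cluster` class, its methods, and the `sn` field are hypothetical names chosen for this sketch; the report's actual protocol may differ.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Toy model of one cluster in a federation (hypothetical, for illustration)."""
    name: str
    sn: int = 0  # sequence number of the latest cluster-wide checkpoint
    checkpoints: list = field(default_factory=list)

    def coordinated_checkpoint(self, reason, sn=None):
        # Stand-in for a synchronized checkpoint of every node in the cluster.
        # By default the sequence number advances by one; a forced checkpoint
        # adopts the number carried by the incoming message instead.
        self.sn = self.sn + 1 if sn is None else sn
        self.checkpoints.append((self.sn, reason))

    def send(self, payload):
        # Inter-cluster messages piggyback the sender's sequence number.
        return {"payload": payload, "sn": self.sn, "src": self.name}

    def receive(self, msg):
        # Assumed communication-induced rule: if the sender's checkpoint
        # number is ahead of ours, checkpoint before delivering the message.
        if msg["sn"] > self.sn:
            self.coordinated_checkpoint("forced by " + msg["src"], sn=msg["sn"])
        return msg["payload"]


a, b = Cluster("A"), Cluster("B")
a.coordinated_checkpoint("timer")   # A checkpoints on its own schedule
b.receive(a.send("boundary data"))  # B is behind, so it is forced to checkpoint
b.receive(a.send("more data"))      # B has caught up, no new checkpoint
print(b.checkpoints)
```

In this toy model a cluster checkpoints independently most of the time and pays for an extra checkpoint only when a message from a more advanced cluster arrives, which is the kind of trade-off that suits slow inter-cluster links.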
Main file
RR-5091.pdf (340.75 KB)

Dates and versions

inria-00071492, version 1 (23-05-2006)

Identifiers

  • HAL Id: inria-00071492, version 1

Cite

Sébastien Monnet, Christine Morin, Ramamurthy Badrinath. A Hierarchical Checkpointing Protocol for Parallel Applications in Cluster Federations. [Research Report] RR-5091, INRIA. 2004. ⟨inria-00071492⟩
