A Generic Approach to Scheduling and Checkpointing Workflows - INRIA - Institut National de Recherche en Informatique et en Automatique
Conference paper, Year: 2018

A Generic Approach to Scheduling and Checkpointing Workflows

Abstract

This work deals with scheduling and checkpointing strategies for executing scientific workflows on failure-prone large-scale platforms. To the best of our knowledge, it is the first to target fail-stop errors for arbitrary workflows. Most previous work addresses soft errors, which corrupt the task being executed by a processor but, unlike fail-stop errors, do not cause the entire memory of that processor to be lost. We revisit classical mapping heuristics such as HEFT and MINMIN and complement them with several checkpointing strategies. The objective is to derive an efficient trade-off between checkpointing every task (CKPTALL), which is overkill when failures are rare events, and checkpointing no task (CKPTNONE), which induces dramatic re-execution overhead even when only a few failures strike during execution. Unlike previous work, our approach applies to arbitrary workflows, not just special classes of dependence graphs such as M-SPGs (Minimal Series-Parallel Graphs). Extensive experiments report significant gains over both CKPTALL and CKPTNONE for a wide variety of workflows.
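
The trade-off between CKPTALL and CKPTNONE can be illustrated on a toy case. The sketch below is not the paper's method: it assumes a single processor, a linear chain of identical tasks, exponentially distributed fail-stop errors, and the classical closed-form expected time for a checkpointed segment. The function names, the parameter values, and the choice to model CKPTNONE as a restart from scratch with no recovery cost are all assumptions made for illustration.

```python
import math


def expected_time(work, ckpt, recovery, rate, downtime=0.0):
    """Expected time to run `work` seconds and then a `ckpt`-second checkpoint.

    Assumes exponential fail-stop errors with the given `rate`; each failure
    costs `downtime` plus `recovery` before the segment is re-executed.
    Classical closed form: e^(rate*R) * (1/rate + D) * (e^(rate*(W+C)) - 1).
    """
    return (math.exp(rate * recovery)
            * (1.0 / rate + downtime)
            * math.expm1(rate * (work + ckpt)))


def chain_makespan(n_tasks, task_time, ckpt, recovery, rate, every):
    """Expected makespan of a chain checkpointed after every `every` tasks.

    Simplification (assumption): every segment, including the first, is
    charged the same recovery cost after a failure.
    """
    full, rest = divmod(n_tasks, every)
    total = full * expected_time(every * task_time, ckpt, recovery, rate)
    if rest:
        # Trailing shorter segment, also followed by a checkpoint.
        total += expected_time(rest * task_time, ckpt, recovery, rate)
    return total


# Hypothetical parameters: 100 tasks of 100 s, 10 s checkpoint and recovery,
# one failure every 10,000 s on average.
N, W, C, R, RATE = 100, 100.0, 10.0, 10.0, 1e-4

ckpt_all = chain_makespan(N, W, C, R, RATE, every=1)
# CKPTNONE: one big segment, no checkpoint, restart from scratch on failure.
ckpt_none = expected_time(N * W, 0.0, 0.0, RATE)
best_every = min(range(1, N + 1),
                 key=lambda k: chain_makespan(N, W, C, R, RATE, k))
best = chain_makespan(N, W, C, R, RATE, best_every)

print(f"CKPTALL : {ckpt_all:.0f} s")
print(f"CKPTNONE: {ckpt_none:.0f} s")
print(f"best    : {best:.0f} s (checkpoint every {best_every} tasks)")
```

With these assumed parameters, an intermediate checkpointing interval gives a shorter expected makespan than either extreme. For arbitrary workflows mapped onto several processors, the problem is harder than this toy chain, which is what the heuristics in the paper address.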
Origin: Files produced by the author(s)

Dates and versions

hal-01798627 , version 1 (23-05-2018)
hal-01798627 , version 2 (31-05-2018)

Cite

Li Han, Valentin Le Fèvre, Louis-Claude Canon, Yves Robert, Frédéric Vivien. A Generic Approach to Scheduling and Checkpointing Workflows. ICPP 2018 - 47th International Conference on Parallel Processing, Aug 2018, Eugene, OR, United States. pp.1-10, ⟨10.1145/3225058.3225145⟩. ⟨hal-01798627v2⟩