Chronos: Failure-Aware Scheduling in Shared Hadoop Clusters

Orcun Yildiz; Shadi Ibrahim; Tran Anh Phuong; Gabriel Antoniu

Conference paper, Year: 2015

Orcun Yildiz

Role: Author
PersonId: 984193

Scalable Storage for Clouds and Beyond

Shadi Ibrahim

Role: Author
PersonId: 13360
IdHAL: shadi-ibrahim

Scalable Storage for Clouds and Beyond

Tran Anh Phuong

Role: Author

Scalable Storage for Clouds and Beyond

Gabriel Antoniu

Role: Author
PersonId: 746326
IdHAL: gabriel-antoniu
ORCID: 0000-0001-6525-3736
IdRef: 095615296

Scalable Storage for Clouds and Beyond

Abstract

Hadoop has emerged as the de facto state-of-the-art system for MapReduce-based data analytics. The reliability of Hadoop systems depends in part on how well they handle failures. Currently, Hadoop handles machine failures by re-executing all the tasks of the failed machines (i.e., executing recovery tasks). Unfortunately, this elegant solution is entrusted entirely to the core of Hadoop and hidden from Hadoop schedulers. This unawareness of failures may therefore prevent Hadoop schedulers from operating correctly towards meeting their objectives (e.g., fairness, job priority) and can significantly impact the performance of MapReduce applications. This paper presents Chronos, a failure-aware scheduling strategy that enables an early yet smart action for fast failure recovery while still operating within a specific scheduler objective. Upon failure detection, rather than waiting an uncertain amount of time to get resources for recovery tasks, Chronos leverages a lightweight preemption technique to carefully allocate these resources. In addition, Chronos considers data locality when scheduling recovery tasks to further improve performance. We demonstrate the utility of Chronos by combining it with the FIFO and Fair schedulers. The experimental results show that Chronos recovers to a correct scheduling behavior within only a couple of seconds and reduces job completion times by up to 55% compared to state-of-the-art schedulers.
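
The two mechanisms named in the abstract, preemption to free resources for recovery tasks and a preference for data-local placement, can be illustrated with a minimal Python sketch. This is not the authors' implementation and does not use any Hadoop API: the `Task` and `Node` classes, the `pick_victim` and `schedule_recovery` functions, and the rule "preempt the least important task that ranks below the recovery task" are all hypothetical, chosen only to show the idea under a FIFO-style priority objective.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Task:
    task_id: str
    job_priority: int  # assumption: smaller value = more important job
    replicas: Set[str] = field(default_factory=set)  # nodes holding the input


@dataclass
class Node:
    name: str
    running: List[Task] = field(default_factory=list)  # one entry per busy slot


def pick_victim(node: Node, recovery: Task) -> Optional[Task]:
    """Return the least important running task that ranks below the recovery task."""
    candidates = [t for t in node.running if t.job_priority > recovery.job_priority]
    return max(candidates, key=lambda t: t.job_priority, default=None)


def schedule_recovery(recovery: Task, nodes: Dict[str, Node]) -> Optional[str]:
    """Place one recovery task by preempting a lower-priority task.

    Nodes holding a replica of the task's input are tried first (data
    locality); the remaining live nodes serve only as a fallback.
    """
    local = [n for name, n in nodes.items() if name in recovery.replicas]
    remote = [n for name, n in nodes.items() if name not in recovery.replicas]
    for node in local + remote:
        victim = pick_victim(node, recovery)
        if victim is not None:
            node.running.remove(victim)  # stand-in for suspending the victim
            node.running.append(recovery)
            return node.name
    return None  # nothing to preempt: the task must wait for a free slot


# Hypothetical cluster: two live nodes, each running one lower-priority task.
nodes = {
    "n1": Node("n1", [Task("j2_m1", 2)]),
    "n2": Node("n2", [Task("j3_m1", 3)]),
}
recovery = Task("j1_m7_retry", 1, replicas={"n2"})
placed = schedule_recovery(recovery, nodes)
```

In this toy cluster the recovery task lands on `n2`, the node that holds a replica of its input, and the task on `n1` keeps running. A recovery task that ranks below every running task is not placed at all, which keeps the sketch inside the scheduler's priority objective rather than overriding it.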

Keywords

Failure, Scheduling, MapReduce, Preemption, Hadoop, Data Management

Domains

Computer Science: Distributed, Parallel, and Cluster Computing [cs.DC]

Main file

IEEEBigData2015.pdf (136.61 KB)

Origin: Files produced by the author(s)

Contributor: Shadi Ibrahim

https://inria.hal.science/hal-01203001

Submitted on: Tuesday, September 22, 2015, 10:26:20

Last modified on: Friday, March 24, 2023, 14:53:01

Long-term archiving on: Tuesday, December 29, 2015, 07:01:00

Dates and versions

hal-01203001, version 1 (22-09-2015)

Identifiers

HAL Id: hal-01203001, version 1

Cite

Orcun Yildiz, Shadi Ibrahim, Tran Anh Phuong, Gabriel Antoniu. Chronos: Failure-Aware Scheduling in Shared Hadoop Clusters. BigData'15-The 2015 IEEE International Conference on Big Data, Oct 2015, Santa Clara, CA, United States. ⟨hal-01203001⟩

Collections

INSTITUT-TELECOM UNIV-RENNES1 CNRS INRIA INSA-RENNES IRISA GRID5000 CENTRALESUPELEC IRISA-D1 INRIA2 UR1-MATH-STIC UR1-UFR-ISTIC UNIV-RENNES SILECS UR1-MATH-NUM

450 views

444 downloads
