Optimizing jobs timeouts on clusters and production grids - INRIA - Institut National de Recherche en Informatique et en Automatique
Report. Year: 2006

Optimizing jobs timeouts on clusters and production grids

Abstract

This paper presents a method to optimize the timeout value of grid computing jobs. It relies on a model of job execution time that represents the latency of the job management system as a random variable. The model also includes a proportion of outliers, so that it covers both reliable clusters and production grids where faults cause job loss. Job management systems are first studied under classical latency distributions. Different behaviors appear, depending on the weight of the distribution tail and on the proportion of outliers. Experimental results are then presented, based on the latency distribution and outlier ratios measured on the EGEE grid infrastructure. These results show that using the optimal timeout value provided by our method reduces the impact of outliers and yields a 1.36 speed-up for reliable systems without outliers.
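The idea of choosing a timeout from a latency distribution and an outlier ratio can be illustrated with a minimal sketch. It is not taken from the report: it assumes that a job is cancelled and resubmitted as soon as the timeout expires, that attempts are independent, and that a fraction `rho` of attempts are outliers that never return. Under those assumptions an attempt succeeds with probability q = (1 - rho) * F(t), each failed attempt costs t, and the expected total time is t * (1 - q) / q plus the mean latency of the attempts that finish before t. The function names, the empirical estimate of F, and the toy sample are all hypothetical.

```python
def expected_time(latencies, rho, timeout):
    """Expected total time when a job is resubmitted every `timeout` seconds.

    latencies: measured latencies of jobs that did return (non-outliers)
    rho:       assumed fraction of attempts that never return (outliers)
    """
    done = [x for x in latencies if x <= timeout]
    if not done or rho >= 1.0:
        return float("inf")  # no attempt can ever finish before the timeout
    f_t = len(done) / len(latencies)   # empirical F(timeout)
    q = (1.0 - rho) * f_t              # probability that one attempt succeeds
    mean_success = sum(done) / len(done)
    # Geometric number of failed attempts, each costing `timeout`,
    # followed by one successful attempt.
    return timeout * (1.0 - q) / q + mean_success


def best_timeout(latencies, rho):
    """Pick the observed latency that minimizes the expected total time."""
    candidates = sorted(set(latencies))
    return min(candidates, key=lambda t: expected_time(latencies, rho, t))


# Toy sample with one slow job in the tail (illustrative values only).
sample = [1, 1, 1, 1, 100]
chosen = best_timeout(sample, rho=0.0)
```

On this toy sample a timeout of 1 gives an expected time of about 1.25, against 20.8 when every job is allowed to run for up to 100, so cutting the tail pays off even with no outliers. Only observed latencies are tried as candidates because the empirical estimate of F changes only at those points.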
Main file
RR-06.35-T.GLATARD.pdf (584.45 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-00691828, version 1 (27-04-2012)

Identifiers

  • HAL Id: hal-00691828, version 1

Cite

Tristan Glatard, Johan Montagnat, Xavier Pennec. Optimizing jobs timeouts on clusters and production grids. 2006, pp.23. ⟨hal-00691828⟩
214 views
152 downloads
