Distributed Termination Detection for HPC Task-Based Environments - INRIA - Institut National de Recherche en Informatique et en Automatique
Report (Research Report) Year: 2018

Distributed Termination Detection for HPC Task-Based Environments

Abstract

This paper revisits distributed termination detection algorithms in the context of high-performance computing applications in task systems. We first outline the need to efficiently detect termination in workflows for which the total number of tasks is data dependent and therefore not known statically but only revealed dynamically during execution. We introduce an efficient variant of the Credit Distribution Algorithm (CDA) and compare it to the original algorithm (HCDA) as well as to its two primary competitors: the Four Counters algorithm (4C) and the Efficient Delay-Optimal Distributed algorithm (EDOD). On the theoretical side, we analyze the behavior of each algorithm for some simplified task-based kernels and show the superiority of CDA in terms of the number of control messages. On the practical side, we provide a highly tuned implementation of each termination detection algorithm within PaRSEC and compare their performance for a variety of benchmarks, extracted from scientific applications that exhibit dynamic behaviors.
Main file
rr9181.pdf (947.52 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-01811823, version 1 (10-06-2018)

Identifiers

  • HAL Id: hal-01811823, version 1

Cite

George Bosilca, Aurelien Bouteiller, Thomas Hérault, Valentin Le Fèvre, Yves Robert, et al. Distributed Termination Detection for HPC Task-Based Environments. [Research Report] RR-9181, Inria - Research Centre Grenoble – Rhône-Alpes. 2018, pp.1-28. ⟨hal-01811823⟩
139 views
255 downloads
