On the Optimization of Iterative Programming with Distributed Data Collections - INRIA - Institut National de Recherche en Informatique et en Automatique
Preprint, Working Paper Year: 2020

On the Optimization of Iterative Programming with Distributed Data Collections

Nils Gesbert
  • Role: Author
  • PersonId : 958488
Pierre Genevès
Nabil Layaïda

Abstract

Big data programming frameworks are becoming increasingly important for the development of applications, for which performance and scalability are critical. In those complex frameworks, optimizing code by hand is hard and time-consuming, making automated optimization particularly necessary. In order to automate optimization, a prerequisite is to find suitable abstractions to represent programs; for instance, algebras based on monads or monoids to represent distributed data collections. Currently, however, such algebras do not represent recursive programs in a way which allows analyzing or rewriting them. In this paper, we extend a monoid algebra with a fixpoint operator for representing recursion as a first class citizen and show how it allows new optimizations. Experiments with the Spark platform illustrate performance gains brought by these systematic optimizations.
Main file
paper (2).pdf (920.69 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-02066649, version 1 (13-03-2019)
hal-02066649, version 2 (16-10-2020)
hal-02066649, version 3 (16-10-2020)
hal-02066649, version 4 (16-10-2020)
hal-02066649, version 5 (02-03-2021)
hal-02066649, version 6 (24-05-2022)

Identifiers

  • HAL Id: hal-02066649, version 2

Cite

Sarah Chlyah, Nils Gesbert, Pierre Genevès, Nabil Layaïda. On the Optimization of Iterative Programming with Distributed Data Collections. 2020. ⟨hal-02066649v2⟩
505 Views
577 Downloads
