From Big Data to Fast Data: Efficient Stream Data Management - INRIA - Institut National de Recherche en Informatique et en Automatique
Habilitation thesis (HDR), Year: 2019

From Big Data to Fast Data: Efficient Stream Data Management

French title: Du Big Data au Fast Data : Gestion efficace des données de flux

Abstract

This manuscript gives a concise overview of my research since my PhD defense. It does not attempt to present my work in its entirety. Instead, it focuses on my contributions to data management in support of stream processing. These results cover every stage of the stream processing pipeline: data collection and in-transit processing at the edge, transfer to the cloud processing sites, ingestion, and persistent storage. I start by presenting the general context of stream data management in light of the recent transition from Big Data to Fast Data. I then highlight the data-level challenges raised by batch and real-time analytics and give a subjective overview of my proposals for addressing them. These proposals cover in-transit stream storage and processing, fast data transfers, distributed metadata management, dynamic ingestion, and transactional storage. They were integrated into functional prototypes and evaluated experimentally at large scale on clusters, clouds and supercomputers. The results demonstrate their effectiveness for several real-life applications, ranging from neuroscience to LHC nuclear physics. Finally, I place these contributions in the perspective of the convergence between High Performance Computing (HPC) and Big Data.
Main file
HDR_Alexandru_COSTAN.pdf (7.32 MB)
HDR_Alexandru_COSTAN_slides.pdf (25.65 MB)
Origin: Files produced by the author(s)

Dates and versions

tel-02059437, version 1 (2019-03-06)
tel-02059437, version 2 (2019-03-14)

Identifiers

  • HAL Id: tel-02059437, version 1

Cite

Alexandru Costan. From Big Data to Fast Data: Efficient Stream Data Management. Distributed, Parallel, and Cluster Computing [cs.DC]. ENS Rennes, 2019. ⟨tel-02059437v1⟩