Lightweight Metric Computation for Distributed Massive Data Streams - INRIA - Institut National de Recherche en Informatique et en Automatique
Journal article. Transactions on Large-Scale Data- and Knowledge-Centered Systems. Year: 2017

Lightweight Metric Computation for Distributed Massive Data Streams

Abstract

The real-time analysis of massive data streams is of utmost importance in data-intensive applications, which need to detect, as quickly and as efficiently as possible (in terms of computation and memory space), any correlation between their inputs or any deviation from some expected nominal behavior. The IoT infrastructure can be used to monitor any events or changes in structural conditions that can compromise safety and increase risk. Determining whether huge data streams received at monitored devices are correlated is thus a recurrent and crucial issue, as correlation may reveal the presence of attacks. We propose a metric, called codeviation, that makes it possible to evaluate the correlation between distributed massive streams. This metric is inspired by a classical metric in statistics and probability theory, and as such shows how observed quantities change together, and in which proportion. We then propose to estimate the codeviation in the data stream model. In this model, functions are estimated over a huge sequence of data items, in an online fashion, and with a very small amount of memory with respect to both the size of the input stream and the domain of values from which data items are drawn. We then generalize our approach by presenting a new metric, the Sketch-metric, which allows us to define a distance between updatable summaries of large data streams. An important feature of the Sketch-metric is that, given a measure on the entire initial data streams, it preserves the axioms of that measure on the sketch. Finally, we present results from extensive experiments conducted on both synthetic traces and real data sets, which allow us to validate the robustness and accuracy of our metrics.
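To make the idea of a covariance-inspired stream metric concrete, here is a minimal Python sketch. The abstract does not state the formula, so this is an assumption: the score is taken to be the covariance of the two item-frequency vectors over a shared domain of values. The function name `codeviation` and its `universe` parameter are hypothetical, and the computation is exact, storing full frequency counts, whereas the article estimates the quantity with a very small amount of memory.

```python
from collections import Counter


def codeviation(stream_x, stream_y, universe):
    """Exact covariance-style score between two streams.

    Assumption (not taken from the abstract): the score is the covariance
    of the item-frequency vectors of the two streams over `universe`.
    """
    freq_x = Counter(stream_x)  # occurrences of each item in stream X
    freq_y = Counter(stream_y)  # occurrences of each item in stream Y
    n = len(universe)
    mean_x = sum(freq_x[v] for v in universe) / n
    mean_y = sum(freq_y[v] for v in universe) / n
    # Average product of the deviations from the mean frequency.
    return sum(
        (freq_x[v] - mean_x) * (freq_y[v] - mean_y) for v in universe
    ) / n


universe = [0, 1, 2, 3]
same = codeviation([0, 0, 0, 1], [0, 0, 0, 1], universe)
opposite = codeviation([0, 0, 0, 1], [2, 2, 2, 3], universe)
print(same, opposite)
```

Under this assumption, two streams that concentrate on the same items get a positive score, and streams that concentrate on disjoint items get a negative one.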
Main file
ab-tldks2017.pdf (653.63 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-01634353, version 1 (14-11-2017)

Identifiers

HAL Id: hal-01634353
DOI: 10.1007/978-3-662-55696-2_1

Cite

Emmanuelle Anceaume, Yann Busnel. Lightweight Metric Computation for Distributed Massive Data Streams. Transactions on Large-Scale Data- and Knowledge-Centered Systems, 2017, 10430 (33), pp. 1-39. ⟨10.1007/978-3-662-55696-2_1⟩. ⟨hal-01634353⟩