Sketch ⋆-metric: Comparing Data Streams via Sketching - INRIA - Institut National de Recherche en Informatique et en Automatique
Report, Year: 2013

Abstract

We consider the problem of estimating the distance between any two large data streams under small-space constraints. This problem is of utmost importance in data-intensive monitoring applications, where input streams are generated rapidly. These streams need to be processed on the fly and accurately in order to quickly detect any deviation from nominal behavior. We present a new metric, the Sketch ⋆-metric, which makes it possible to define a distance between updatable summaries (or sketches) of large data streams. An important feature of the Sketch ⋆-metric is that, given a measure on the entire initial data streams, it preserves the axioms of that measure on the sketch (such as non-negativity, identity, symmetry, and the triangle inequality, but also specific properties of the f-divergence or the Bregman divergence). Extensive experiments conducted on both synthetic traces and real data sets validate the robustness and accuracy of the Sketch ⋆-metric.
Main file
AB13-PODS-RR2001.pdf (442.35 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-00764772, version 1 (13-12-2012)

Identifiers

  • HAL Id: hal-00764772, version 1

Cite

Emmanuelle Anceaume, Yann Busnel. Sketch ⋆-metric: Comparing Data Streams via Sketching. 2013, pp.8. ⟨hal-00764772⟩
364 views
285 downloads
