Sliding HyperLogLog: Estimating cardinality in a data stream - INRIA - Institut National de Recherche en Informatique et en Automatique Accéder directement au contenu
Pré-Publication, Document De Travail Année : 2010

Sliding HyperLogLog: Estimating cardinality in a data stream

Résumé

In this paper, a new algorithm estimating the number of active flows in a data stream is proposed. This algorithm adapts the HyperLogLog algorithm of Flajolet et al to the data stream processing by adding a sliding window mechanism. It has the advantage to estimate at any time the number of flows seen over any duration bounded by the length of the sliding window. The estimate is very accurate with a standard error of about 1.04/\sqrt{m} (the same as in HyperLogLog algorithm). As the new algorithm answers more flexible queries, it needs an additional memory storage compared to HyerLogLog algorithm. It is proved that this additional memory is at most equal to 5mln(n/m) bytes, where n is the real number of flows in the sliding window. For instance, with an additional memory of only 35kB, a standard error of about 3% can be achieved for a data stream of several million flows. Theoretical results are validated on both real and synthetic traffic.
Fichier principal
Vignette du fichier
sliding_HyperLogLog.pdf (272.64 Ko) Télécharger le fichier
Origine : Fichiers produits par l'(les) auteur(s)
Loading...

Dates et versions

hal-00465313 , version 1 (26-03-2010)

Identifiants

  • HAL Id : hal-00465313 , version 1

Citer

Yousra Chabchoub, Georges Hébrail. Sliding HyperLogLog: Estimating cardinality in a data stream. 2010. ⟨hal-00465313⟩
247 Consultations
6490 Téléchargements

Partager

Gmail Facebook X LinkedIn More