C3Ro: An efficient mining algorithm of extended-closed contiguous robust sequential patterns in noisy data - INRIA - Institut National de Recherche en Informatique et en Automatique
Journal article, Expert Systems with Applications, Year: 2019


Abstract

Sequential pattern mining has been the focus of many works, but it still faces a tough challenge when mining large databases: both the efficiency of the mining and the apprehensibility of the resulting pattern set are at stake. To overcome these issues, the most promising direction taken in the literature relies on constraints, including the well-known closedness constraint. However, such mining is not resistant to noise in the data, which is a characteristic of most real-world data. The main research question raised in this paper is thus: how can an apprehensible set of sequential patterns be mined efficiently from noisy data? To address this question, we introduce 1) two original constraints designed for mining noisy data: the robustness and extended-closedness constraints, and 2) a generic pattern mining algorithm, C3Ro, designed to mine a wide range of sequential patterns, from closed or maximal contiguous sequential patterns to closed or maximal regular sequential patterns. C3Ro is intended for practitioners and can handle their multiple constraints. C3Ro is also the first sequential pattern mining algorithm to be this generic and parameterizable. Extensive experiments reveal the high efficiency of C3Ro compared with well-known algorithms from the literature, especially on large datasets. Additional experiments were conducted on a real-world noisy dataset of job offers, with the goal of mining activities. This experiment offers more thorough insight into the C3Ro algorithm: job market experts confirm that the constraints we introduced have a significant positive impact on the apprehensibility of the set of mined activities.
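To make the terms "closed" and "maximal contiguous sequential patterns" concrete, here is a minimal brute-force sketch. It assumes the standard textbook definitions (a contiguous pattern is a run of consecutive items; its support is the number of sequences containing it at least once; a pattern is closed if no strictly longer frequent pattern containing it has the same support, and maximal if no strictly longer frequent pattern contains it at all). It is not C3Ro: it performs no pruning and does not implement the paper's robustness or extended-closedness constraints, whose definitions are not given here. All function names are hypothetical.

```python
from collections import defaultdict

def contiguous_supports(db, min_support):
    """Hypothetical helper: support of every contiguous pattern (run of
    consecutive items), counted at most once per sequence. Brute force."""
    support = defaultdict(int)
    for seq in db:
        seen = set()  # patterns occurring in this sequence
        for i in range(len(seq)):
            for j in range(i + 1, len(seq) + 1):
                seen.add(tuple(seq[i:j]))
        for pat in seen:
            support[pat] += 1
    # Keep only frequent patterns.
    return {p: s for p, s in support.items() if s >= min_support}

def is_contiguous_subpattern(small, big):
    """True if `small` occurs as a consecutive run inside the longer `big`."""
    n, m = len(small), len(big)
    return n < m and any(big[k:k + n] == small for k in range(m - n + 1))

def closed_patterns(frequent):
    """Frequent patterns with no longer frequent super-pattern of equal support."""
    return {
        p: s for p, s in frequent.items()
        if not any(s == s2 and is_contiguous_subpattern(p, q)
                   for q, s2 in frequent.items())
    }

def maximal_patterns(frequent):
    """Frequent patterns with no longer frequent super-pattern at all."""
    return {
        p: s for p, s in frequent.items()
        if not any(is_contiguous_subpattern(p, q) for q in frequent)
    }

clean_db = [["a", "b", "c"], ["a", "b", "d"], ["x", "a", "b", "c"]]
freq = contiguous_supports(clean_db, min_support=2)
print(closed_patterns(freq))   # {('a', 'b'): 3, ('a', 'b', 'c'): 2}
print(maximal_patterns(freq))  # {('a', 'b', 'c'): 2}
```

The same sketch also hints at why noise is a problem for contiguous patterns. If the third sequence is replaced by `["a", "z", "b", "c"]` (one spurious item `z`), the support of `('a', 'b', 'c')` drops to 1, and the closed set becomes `('a',)`, `('b',)`, `('a', 'b')` and `('b', 'c')`: the pattern is fragmented into shorter pieces. This is the kind of effect the abstract's robustness constraint is meant to address, although this sketch does not show how C3Ro does so.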
Main file
Article_VersionAuteur.pdf (240.16 KB)
Origin: files produced by the author(s)

Dates and versions

hal-02977461, version 1 (25-10-2020)

Cite

Y Abboud, Armelle Brun, Anne Boyer. C3Ro: An efficient mining algorithm of extended-closed contiguous robust sequential patterns in noisy data. Expert Systems with Applications, 2019, 131, pp. 172-189. ⟨10.1016/j.eswa.2019.04.058⟩. ⟨hal-02977461⟩