New Directions for Data Quality Mining
Abstract
As data types and data structures change to keep up with evolving technologies and applications, data quality problems have also evolved and become more complex. Data streams, web logs, wikis, biomedical applications, video streams and social networking websites generate a mind-boggling variety of data types. Data quality mining, the use of data mining to manage, measure and improve data quality, has focused mostly on addressing each category of data glitch separately, as a static entity. In this tutorial we highlight new directions in data quality mining, particularly: (a) the applicability and effectiveness of the methodologies for various data types, such as structured, semi-structured and stream data; (b) the detection of concomitant data glitches, such as the occurrence of outliers in data with missing values and duplicates; and (c) the design of sequential approaches to data quality mining, such as workflows composed of a sequence of tasks for data quality exploration and analysis. We give a brief overview of past work, introduce current research in this area, and highlight new directions and open problems in data quality mining. The tutorial includes extensive case studies, applications and practical examples.
Origin: Files produced by the author(s)