Statistical data mining for symbol associations in genomic databases - Université Toulouse III - Paul Sabatier - Toulouse INP
Journal article, International Journal of Genetics and Genomics, Year: 2014

Statistical data mining for symbol associations in genomic databases

Abstract

A methodology is proposed to automatically detect significant symbol associations in genomic databases. A new statistical test assesses the significance of a group of symbols found together in several genesets of a given database. Each pair of symbols is assigned a p-value that depends on the frequency of the two symbols and on the number of their joint occurrences. All pairs with p-values below a given threshold define a graph structure on the set of symbols. The cliques of that graph are significant symbol associations, each linked to the set of genesets in which it can be found. The method can be applied to any database and is illustrated on the MSigDB C2 database. Many of the symbol associations detected in C2 or in non-specific selections correspond to already known interactions. In more specific selections of C2, many previously unknown symbol associations have been detected. These associations unveil new candidates for gene or protein interactions, which require further investigation for biological evidence.
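The pipeline outlined in the abstract (a p-value per pair, a threshold, a graph, then its cliques) can be illustrated with a minimal sketch. The abstract does not give the formula of the authors' test, so the code below substitutes a hypergeometric tail probability as a stand-in. That choice, the function names, the toy genesets and the 0.05 threshold are all assumptions made for illustration, not the published method.

```python
from itertools import combinations
from math import comb


def hypergeom_tail(n_sets, n_a, n_b, n_joint):
    """P(X >= n_joint) for a hypergeometric X.

    Stand-in test (an assumption, not the authors' statistic): the n_b
    genesets containing symbol b are treated as a random draw among
    n_sets genesets, n_a of which contain symbol a.
    """
    upper = min(n_a, n_b)
    favourable = sum(
        comb(n_a, k) * comb(n_sets - n_a, n_b - k)
        for k in range(n_joint, upper + 1)
    )
    return favourable / comb(n_sets, n_b)


def significant_pairs(genesets, threshold):
    """Return ({(a, b): p_value}, {symbol: set of geneset indices})."""
    occurrences = {}
    for idx, symbols in enumerate(genesets):
        for s in set(symbols):
            occurrences.setdefault(s, set()).add(idx)
    edges = {}
    for a, b in combinations(sorted(occurrences), 2):
        joint = len(occurrences[a] & occurrences[b])
        if joint == 0:
            continue  # symbols never seen together: no association
        p = hypergeom_tail(
            len(genesets), len(occurrences[a]), len(occurrences[b]), joint
        )
        if p < threshold:
            edges[(a, b)] = p
    return edges, occurrences


def maximal_cliques(edges):
    """Enumerate maximal cliques with basic Bron-Kerbosch (no pivoting)."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(frozenset(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


def supporting_genesets(clique, occurrences):
    """Indices of the genesets that contain every symbol of the clique."""
    return set.intersection(*(occurrences[s] for s in clique))


# Hypothetical toy database: ten genesets over placeholder symbols A..L.
toy_genesets = [
    {"A", "B", "C"}, {"A", "B", "C"}, {"A", "B", "C", "D"},
    {"D", "E"}, {"E", "F"}, {"F", "G"}, {"G", "H"},
    {"D", "H"}, {"I", "J"}, {"K", "L"},
]

edges, occurrences = significant_pairs(toy_genesets, threshold=0.05)
for clique in maximal_cliques(edges):
    print(sorted(clique), sorted(supporting_genesets(clique, occurrences)))
```

In this toy example only the pairs among A, B and C fall below the threshold, so they form a single clique supported by the first three genesets. On a real database the number of pairs grows quadratically with the number of symbols, so the threshold would normally need to account for multiple testing. How the authors handle this is not stated in the abstract.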

Dates and versions

hal-01810889 , version 1 (08-06-2018)

Cite

Bernard Ycart, Frédéric Pont, Jean-Jacques Fournié. Statistical data mining for symbol associations in genomic databases. International Journal of Genetics and Genomics, 2014, 2 (6), pp.97-104. ⟨10.11648/j.ijgg.20140206.11⟩. ⟨hal-01810889⟩