Statistical data mining for symbol associations in genomic databases

Bernard Ycart; Frédéric Pont; Jean-Jacques Fournié

doi:10.11648/j.ijgg.20140206.11

Journal article, International Journal of Genetics and Genomics, Year: 2014

Bernard Ycart

Role: Author

Inférence Processus Stochastiques

Frédéric Pont

Role: Author
PersonId : 756570
ORCID : 0000-0002-5493-527X
IdRef : 224672517

Centre de Recherches en Cancérologie de Toulouse

Jean-Jacques Fournié

Role: Author
PersonId : 178841
IdHAL : jean-jacques-fournie
ORCID : 0000-0001-6542-6908
IdRef : 031095887

Centre de Recherches en Cancérologie de Toulouse

Abstract

A methodology is proposed to automatically detect significant symbol associations in genomic databases. A new statistical test assesses the significance of a group of symbols found together in several genesets of a given database. Each pair of symbols is assigned a p-value that depends on the frequencies of the two symbols and on their number of joint occurrences. All pairs with p-values below a given threshold define a graph structure on the set of symbols. The cliques of that graph are significant symbol associations, each linked to the set of genesets in which it is found. The method can be applied to any database and is illustrated on the MSigDB C2 database. Many of the symbol associations detected in C2 or in non-specific selections correspond to already known interactions. On more specific selections of C2, many previously unknown symbol associations have been detected. These associations reveal new candidates for gene or protein interactions, which need further investigation for biological evidence.
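
The pipeline summarized above (pairwise p-values, thresholded graph, cliques) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the abstract does not give the formula of the authors' test, so a plain hypergeometric tail probability is swapped in as a stand-in. The function names, the toy genesets, and the 0.05 threshold are all hypothetical and do not come from the article.

```python
from itertools import combinations
from math import comb

# Toy database (invented for illustration): each geneset is a set of symbols.
toy_genesets = [
    {"A", "B", "C"}, {"A", "B", "C"}, {"A", "B", "C", "D"},
    {"D", "E"}, {"E", "F"}, {"D", "F"}, {"E", "G"},
    {"F", "G"}, {"D", "G"}, {"E", "H"},
]


def hypergeom_tail(n_sets, n_a, n_b, n_joint):
    """P(X >= n_joint), X = overlap of two random subsets of sizes n_a and n_b
    drawn among n_sets genesets. Stand-in test, NOT the article's statistic."""
    favourable = sum(
        comb(n_a, k) * comb(n_sets - n_a, n_b - k)
        for k in range(n_joint, min(n_a, n_b) + 1)
    )
    return favourable / comb(n_sets, n_b)


def association_graph(genesets, threshold):
    """Link two symbols when their co-occurrence p-value is below threshold."""
    occurrences = {}  # symbol -> indices of the genesets containing it
    for idx, symbols in enumerate(genesets):
        for symbol in symbols:
            occurrences.setdefault(symbol, set()).add(idx)
    graph = {symbol: set() for symbol in occurrences}
    for a, b in combinations(sorted(occurrences), 2):
        joint = len(occurrences[a] & occurrences[b])
        if joint == 0:
            continue  # never seen together: no association to test
        p_value = hypergeom_tail(
            len(genesets), len(occurrences[a]), len(occurrences[b]), joint
        )
        if p_value < threshold:
            graph[a].add(b)
            graph[b].add(a)
    return graph, occurrences


def maximal_cliques(graph):
    """Enumerate maximal cliques with basic Bron-Kerbosch (no pivoting)."""
    found = []

    def expand(clique, candidates, excluded):
        if not candidates and not excluded:
            found.append(frozenset(clique))
            return
        for v in list(candidates):
            expand(clique | {v}, candidates & graph[v], excluded & graph[v])
            candidates = candidates - {v}
            excluded = excluded | {v}

    expand(set(), set(graph), set())
    return found


graph, occurrences = association_graph(toy_genesets, threshold=0.05)
for clique in maximal_cliques(graph):
    if len(clique) >= 2:
        # Genesets in which every symbol of the clique occurs.
        support = set.intersection(*(occurrences[s] for s in clique))
        print(sorted(clique), "found in genesets", sorted(support))
```

On real data such as MSigDB C2, the number of symbol pairs is large, so a sketch like this would also need a multiple-testing correction and a faster clique search; both are left out here for brevity.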

Keywords

Genomic Databases Protein-Protein Interaction Frequent Itemset Searching P-Value Graph

Domains

Bioinformatics, Systems Biology [q-bio.QM]

Contributor: Brigitte Bidégaray-Fesquet

https://hal.science/hal-01810889

Submitted on: Friday, June 8, 2018, 12:26:01

Last modified on: Thursday, April 4, 2024, 20:58:05

Dates and versions

hal-01810889 , version 1 (08-06-2018)

Identifiers

HAL Id : hal-01810889 , version 1
ARXIV : 1307.1337
DOI : 10.11648/j.ijgg.20140206.11

Cite

Bernard Ycart, Frédéric Pont, Jean-Jacques Fournié. Statistical data mining for symbol associations in genomic databases. International Journal of Genetics and Genomics, 2014, 2 (6), pp.97-104. ⟨10.11648/j.ijgg.20140206.11⟩. ⟨hal-01810889⟩

Collections

UGA CNRS LJK LJK_PS LJK_PS_IPS ANR UNIV-UT3 UT3-TOULOUSEINP
