GrAPFI: predicting enzymatic function of proteins from domain similarity graphs - INRIA - Institut National de Recherche en Informatique et en Automatique
Journal article, BMC Bioinformatics, Year: 2020

Abstract

Background: Thanks to recent developments in genomic sequencing technologies, the number of protein sequences in public databases is growing rapidly. To enrich and exploit this valuable data, it is essential to annotate these sequences with functional properties such as Enzyme Commission (EC) numbers. The January 2019 release of the UniProt Knowledgebase (UniProtKB) contains around 140 million protein sequences. However, only about half a million of these (UniProtKB/Swiss-Prot) have been reviewed and functionally annotated by expert curators using data extracted from the literature and computational analyses. To reduce the gap between annotated and unannotated protein sequences, it is essential to develop accurate automatic protein function annotation techniques.

Results: In this work, we present GrAPFI (Graph-based Automatic Protein Function Inference), which automatically annotates proteins with EC number functional descriptors using a protein domain similarity graph. We validated the performance of GrAPFI using six reference proteomes in UniProtKB/Swiss-Prot, namely Human, Mouse, Rat, Yeast, E. coli and Arabidopsis thaliana. We also compared GrAPFI with existing EC prediction approaches such as ECPred, DEEPre, and SVMProt. The comparison shows that GrAPFI achieves better accuracy and comparable or better coverage than these earlier approaches.

Conclusions: GrAPFI is a novel protein function annotation tool that performs automatic inference on a network of proteins that are related according to their domain composition. Our evaluation of GrAPFI shows that it performs better than other state-of-the-art methods. GrAPFI is available at https://gitlab.inria.fr/bsarker/bmc_grapfi.git as a standalone tool written in Python.
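The idea of inferring EC numbers from a domain similarity graph can be illustrated with a minimal sketch. It is not the GrAPFI implementation: the abstract does not specify the similarity measure or the inference rule, so the Jaccard similarity over domain sets and the weighted vote among labelled neighbours used here are assumptions, as are the function names and the toy protein and domain identifiers.

```python
from collections import defaultdict


def jaccard(a, b):
    """Jaccard similarity between two sets of domain identifiers."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_graph(domains, min_sim=0.0):
    """Link proteins that share domains; edge weight = Jaccard similarity.

    Assumption: the real tool may use a different similarity or threshold.
    """
    graph = defaultdict(dict)
    proteins = sorted(domains)
    for i, p in enumerate(proteins):
        for q in proteins[i + 1:]:
            w = jaccard(domains[p], domains[q])
            if w > min_sim:
                graph[p][q] = w
                graph[q][p] = w
    return graph


def predict_ec(query, graph, ec_labels):
    """Rank EC numbers by the summed edge weights of labelled neighbours."""
    scores = defaultdict(float)
    for neighbour, weight in graph.get(query, {}).items():
        for ec in ec_labels.get(neighbour, ()):
            scores[ec] += weight
    # Highest score first; ties broken by EC number for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy data: protein IDs, domain IDs and label assignments are invented.
domains = {
    "P1": {"IPR_A", "IPR_B"},
    "P2": {"IPR_A", "IPR_B", "IPR_C"},
    "P3": {"IPR_C"},
    "Q": {"IPR_A", "IPR_B"},  # unannotated query protein
}
ec_labels = {"P1": {"1.1.1.1"}, "P2": {"1.1.1.1"}, "P3": {"2.7.11.1"}}

graph = build_graph(domains)
ranking = predict_ec("Q", graph, ec_labels)
print(ranking)
```

In this toy graph the query shares domains with P1 and P2 but not with P3, so only the EC number carried by P1 and P2 receives a score. A real pipeline would build the graph from curated domain annotations and would need a rule for deciding when a score is high enough to report a prediction.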
Main file
Sarker et al BMC Bioinformatics-2020.pdf (2.99 MB)
Origin: Files produced by the author(s)

Dates and versions

hal-03022601, version 1 (24-11-2020)

Cite

Bishnu Sarker, David Ritchie, Sabeur Aridhi. GrAPFI: predicting enzymatic function of proteins from domain similarity graphs. BMC Bioinformatics, 2020, ⟨10.1186/s12859-020-3460-7⟩. ⟨hal-03022601⟩