Conference poster, Year: 2015

Data integration in the agronomic domain : national and international data discovery system

Abstract

Current research in agronomy has produced a vast amount of genomic, genetic and phenomic data. To deal with the Volume, Variety and Velocity of these data, it is necessary first to narrow them down to candidate datasets through data discovery, then to integrate those datasets through semantic web technologies. Data discovery is an approach that makes it easy to search for datasets based on keywords and metadata. The plant bioinformatics node of the Institut Français de Bioinformatique (IFB) gives access to several public information systems hosting domain-specific data. It is composed of five bioinformatics platforms: the South Green platform, the LIPM platform, the Roscoff platform ABiMS, the platform for Agroecosystems Arthropods and the URGI platform. The latter plays a key role in several national and international projects such as the Wheat Initiative. These platforms integrate plant genomic, genetic and phenomic data, which they need to expose in data discovery and integration systems. The distributed data discovery system needs an ETL (Extraction, Transformation and Loading) based integration pipeline implemented on each platform. This ETL can either be developed from scratch or be based on existing technologies such as KarmaWeb, Talend or OpenRefine. The pipeline is being developed at the URGI and will be deployed on all IFB plant nodes. The data discovery system is based on Solr (as implemented in the Transplant portal, http://www.transplantdb.eu), which uses the Lucene search framework at its core for full-text indexing. Here, we present the architecture of the data discovery system and our evaluation and comparison of ETL solutions. This work is funded by the IFB Investment for the Future infrastructure project, IFB_Plant node.
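To make the extract, transform, load and keyword-search flow described in the abstract more concrete, here is a minimal Python sketch. It is not the URGI pipeline: the two sample records, the platform names (`platform_a`, `platform_b`), the field names and the shared schema (`id`, `name`, `description`, `source`) are all hypothetical, and a small in-memory inverted index stands in for the full-text indexing that Solr/Lucene would provide in the real system.

```python
import re
from collections import defaultdict

# Hypothetical raw exports from two platforms; field names are assumptions.
SOURCE_A = [{"id": "A1", "title": "Wheat GWAS panel",
             "desc": "Genetic markers for wheat yield"}]
SOURCE_B = [{"accession": "B7", "name": "Aphid genome",
             "summary": "Genomic assembly of a pea aphid"}]

# Per-platform mapping from source field names to the shared schema.
FIELD_MAP = {
    "platform_a": {"id": "id", "title": "name", "desc": "description"},
    "platform_b": {"accession": "id", "name": "name", "summary": "description"},
}


def extract():
    """Extraction: yield (platform, raw_record) pairs from every source."""
    for rec in SOURCE_A:
        yield "platform_a", rec
    for rec in SOURCE_B:
        yield "platform_b", rec


def transform(platform, raw):
    """Transformation: rename fields to the shared schema and tag the source."""
    mapping = FIELD_MAP[platform]
    doc = {mapping[key]: value for key, value in raw.items() if key in mapping}
    doc["source"] = platform
    return doc


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def load(docs):
    """Loading: store documents and build an inverted index (token -> doc ids)."""
    index = defaultdict(set)
    store = {}
    for doc in docs:
        store[doc["id"]] = doc
        for token in tokenize(doc["name"] + " " + doc["description"]):
            index[token].add(doc["id"])
    return index, store


def search(index, query):
    """Return the ids of documents that contain every keyword in the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


docs = [transform(platform, raw) for platform, raw in extract()]
index, store = load(docs)
```

With these sample records, `search(index, "wheat")` finds the dataset from the first platform and `search(index, "genome aphid")` finds the one from the second, which illustrates why a per-platform mapping to a common schema is needed before datasets from different sources can be searched together.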
No file deposited

Dates and versions

lirmm-01274725, version 1 (16-02-2016)

Identifiers

  • HAL Id: lirmm-01274725, version 1

Cite

Florian Philippe, Aravind Venkatesan, Nordine El Hassouni, Cyril Pommier, Manuel Ruiz, et al. Data integration in the agronomic domain : national and international data discovery system. JOBIM: Journées Ouvertes Biologie Informatique Mathématiques, Jul 2015, Clermont-Ferrand, France. Post-105 (#59), 2015. ⟨lirmm-01274725⟩