A contribution in nanoinformatics to facilitate the collection of structured data for Quality-by-Design in nanomedicine - INRIA - Institut National de Recherche en Informatique et en Automatique
Conference paper. Year: 2019

A contribution in nanoinformatics to facilitate the collection of structured data for Quality-by-Design in nanomedicine

Abstract

Background. The complexity of nanomaterials, of their physico-chemical properties and of their interactions with biological and environmental systems creates uncertainty about whether experimental data can be used for regulatory purposes, which demand sound scientific answers [1]. A major challenge is to establish common languages, standards and harmonised infrastructures that meet the needs of the different stakeholders. Nanoinformatics is the science and practice of determining which information is relevant for comparing nanomaterial characterizations and for designing optimized and safe nanodevices.

Objectives. Our goal is to contribute to nanoinformatics by proposing a text mining solution for the quasi-automatic collection of nanomaterial descriptors suited to the preparation of Quality-by-Design (QbD) studies [2].

Methods. First, an unstructured database is built from selected scientific articles in PDF format. Second, the Quality-by-Design terminology (ICH Q8-Q11) is used to represent the main descriptors of nanomaterials in three categories: (i) Critical Quality Attributes (CQA), which describe the key physical, chemical and biological properties, (ii) Critical Material Attributes (CMA) and (iii) Critical Process Parameters (CPP), which together characterize the key design and manufacturing variables. Third, a text mining algorithm implemented in Python automatically detects CQA, CMA and CPP in the articles and populates an SQL database. A data curation step then completes and cleans up the QbD database. The approach was applied to a set of 30 scientific articles in nanomedicine, and its performance was assessed in terms of sensitivity and specificity.

Results. In total, 1740 words were automatically analyzed, with an overall rate of 83.9% (1459/1740) correct identifications, broken down as follows: 82.2% (235/286) for the CQA descriptors, 92.9% (604/648) for CPP and 76.9% (620/806) for CMA.

Conclusion. The proposed nanoinformatics solution shows promising performance and automates a very time-consuming task: collecting and analyzing relevant scientific data for risk assessment in Quality-by-Design studies. Short-term work will focus on the automatic extraction of more complex descriptors.
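The detection and storage steps described in the Methods can be illustrated with a minimal sketch. It is not the authors' algorithm, which the abstract does not detail: it assumes the article text has already been extracted from the PDF, uses a simple dictionary lookup as a stand-in for the text mining step, and relies on a hypothetical term list (`QBD_TERMS`), table name (`descriptor`) and helper names. The last helper only recomputes a percentage from counts such as those reported in the Results.

```python
import re
import sqlite3

# Hypothetical term dictionary: these descriptors are illustrative
# assumptions, not the vocabulary used by the authors.
QBD_TERMS = {
    "CQA": ["particle size", "zeta potential", "polydispersity index"],
    "CMA": ["polymer concentration", "lipid ratio"],
    "CPP": ["stirring speed", "sonication time", "temperature"],
}


def detect_descriptors(text):
    """Return (category, term, position) for each dictionary term found."""
    hits = []
    for category, terms in QBD_TERMS.items():
        for term in terms:
            pattern = r"\b" + re.escape(term) + r"\b"
            for match in re.finditer(pattern, text, flags=re.IGNORECASE):
                hits.append((category, term, match.start()))
    # Order hits by where they appear in the text.
    return sorted(hits, key=lambda hit: hit[2])


def store_hits(conn, article_id, hits):
    """Insert detected descriptors into a (hypothetical) SQL table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS descriptor "
        "(article_id TEXT, category TEXT, term TEXT, position INTEGER)"
    )
    conn.executemany(
        "INSERT INTO descriptor VALUES (?, ?, ?, ?)",
        [(article_id, cat, term, pos) for cat, term, pos in hits],
    )
    conn.commit()


def percent_correct(correct, total):
    """Share of correct identifications, in percent, rounded to one decimal."""
    return round(100.0 * correct / total, 1)


# Made-up sentence standing in for text extracted from an article.
sample = (
    "Particle size and zeta potential were measured after varying "
    "the stirring speed and the polymer concentration."
)

conn = sqlite3.connect(":memory:")  # in-memory database for the example
hits = detect_descriptors(sample)
store_hits(conn, "article-001", hits)

counts = dict(
    conn.execute("SELECT category, COUNT(*) FROM descriptor GROUP BY category")
)
print(counts)
print(percent_correct(1459, 1740))
```

Applied to the reported counts, `percent_correct` gives 82.2 for CQA (235/286), 76.9 for CMA (620/806) and 83.9 overall (1459/1740). For CPP, 604/648 comes to 93.2 rather than the 92.9 quoted in the abstract, so one of those two figures may contain a typo.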
Main file
Roulette-Abstract-NME19.pdf (229.16 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-02396540, version 1 (06-12-2019)

Identifiers

  • HAL Id: hal-02396540, version 1

Cite

Valentine Roulette, Guillaume Delplanque, Jeanne Deleforterie, Thierry Bastogne. A contribution in nanoinformatics to facilitate the collection of structured data for Quality-by-Design in nanomedicine. NanoMed Europe, NME 2019, Jun 2019, Braga, Portugal. ⟨hal-02396540⟩