MTG-Link: filling gaps in draft genome assemblies with linked read data - INRIA - Institut National de Recherche en Informatique et en Automatique
Conference poster, year: 2020

Abstract

The complete and accurate reconstruction of large genomes remains challenging. The scaffolding step orders and orients contigs but leaves undefined sequences between them, called gaps. Linked read technologies, such as the 10X Genomics Chromium platform, have great potential for filling these gaps: they provide long-range information while retaining the power and accuracy of short-read sequencing. Reads sequenced from the same long DNA molecule (30-50 kb) can be identified by a shared short barcode sequence. Several gap-filling tools have been developed, but none of them uses the long-range information of linked read data. Here, we present MTG-Link, a novel gap-filling tool dedicated to linked read data generated by 10X Genomics. MTG-Link is a Python pipeline that combines the local assembly tool MindTheGap with an efficient read subsampling step based on barcode information. For each gap, it extracts the linked reads whose barcodes are observed in the gap's flanking sequences and assembles them into contigs by traversing their de Bruijn graph. MTG-Link tests different parameter values for gap-filling, followed by an automatic qualitative evaluation of the resulting assemblies. It returns a GFA file containing the gap-filled sequences of each gap. Validation on a set of simulated gaps from real datasets of various genome complexities showed that the read subsampling step of MTG-Link yields better assemblies than MindTheGap alone. We applied MTG-Link to individual genomes of a mimetic butterfly (H. numata); it significantly improved the contiguity of a 1.3 Mb locus of biological interest (https://github.com/anne-gcd/MTG-Link).
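The two core steps described in the abstract are barcode-based read subsampling and de Bruijn graph traversal between the gap flanks. Below is a minimal Python sketch of both, offered only as an illustration under stated assumptions. It assumes reads are held in memory as (barcode, sequence) pairs. It also assumes the reads mapping to each flank have already been identified by a prior alignment, which is not shown. All function names (`barcodes_in_flanks`, `subsample_reads`, `build_graph`, `fill_gap`) are hypothetical and are not MTG-Link's or MindTheGap's actual API. The depth-first path enumeration is a simplified stand-in for MindTheGap's graph traversal, not its real algorithm.

```python
from collections import Counter, defaultdict


def barcodes_in_flanks(flank_reads):
    """Return the barcodes of linked reads mapped to either gap flank.

    flank_reads: iterable of (barcode, sequence) pairs, assumed (hypothetically)
    to come from a prior alignment of the reads against the two flanks.
    """
    return {barcode for barcode, _ in flank_reads}


def subsample_reads(all_reads, barcodes):
    """Keep every read of the whole dataset that carries a flank barcode."""
    return [seq for barcode, seq in all_reads if barcode in barcodes]


def build_graph(reads, k, min_abundance=2):
    """De Bruijn graph: nodes are (k-1)-mers, edges are k-mers seen at least
    min_abundance times (a simple filter against sequencing errors)."""
    kmer_counts = Counter(
        read[i:i + k] for read in reads for i in range(len(read) - k + 1)
    )
    graph = defaultdict(set)
    for kmer, count in kmer_counts.items():
        if count >= min_abundance:
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def fill_gap(left_flank, right_flank, reads, k,
             min_abundance=2, max_len=1000, max_paths=10):
    """Enumerate graph paths from the last (k-1)-mer of the left flank to the
    first (k-1)-mer of the right flank. Each returned sequence starts and ends
    with those (k-1)-mers, i.e. it overlaps both flanks."""
    graph = build_graph(reads, k, min_abundance)
    source = left_flank[-(k - 1):]
    target = right_flank[:k - 1]
    solutions = []
    stack = [(source, source)]  # (current node, sequence spelled so far)
    while stack and len(solutions) < max_paths:
        node, path = stack.pop()
        if node == target and path != source:
            solutions.append(path)
            continue
        if len(path) >= max_len:  # bounds the search, including around cycles
            continue
        for nxt in sorted(graph.get(node, ())):
            stack.append((nxt, path + nxt[-1]))
    return solutions
```

A typical call chains the steps: `fill_gap(left, right, subsample_reads(all_reads, barcodes_in_flanks(flank_reads)), k=31)`. To loosely mimic the parameter exploration mentioned in the abstract, one could repeat the call for several values of `k` and `min_abundance` and compare the candidates. How MTG-Link actually chooses parameters and scores candidates is not described here, so that step is left out.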
BiodiversityGenomics2020_AbstractPoster_AnneGUICHARD.pdf (18.2 KB)
Origin: files produced by the author(s)

Dates and versions

hal-03074227, version 1 (16-12-2020)

Identifiers

  • HAL Id: hal-03074227, version 1

Cite

Anne Guichard, Fabrice Legeai, Arthur Le Bars, Paul Yann Jay, Mathieu Joron, et al. MTG-Link: filling gaps in draft genome assemblies with linked read data. Biodiversity Genomics 2020, Oct 2020, Online, France. ⟨hal-03074227⟩