Conference paper, Year: 2013

Scalable and Versatile k-mer Indexing for High-Throughput Sequencing Data

Abstract

Philippe et al. (2011) proposed a data structure called Gk arrays for indexing and querying large collections of high-throughput sequencing data in main memory. The data structure supports versatile queries for counting, locating, and analysing the coverage profile of k-mers in short-read data. The main drawback of Gk arrays is their space consumption, which can easily reach tens of gigabytes of main memory even for moderate-size inputs. We propose a compressed variant of Gk arrays that supports the same set of queries, but in both near-optimal time and space. In practice, the compressed Gk arrays scale up to much larger inputs with highly competitive query times compared to their non-compressed predecessor. The main applications include variant calling, error correction, coverage profiling, and sequence assembly.
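To make the three query types named in the abstract concrete (counting, locating, and coverage profiling of k-mers), here is a minimal sketch in Python. It is not the Gk arrays data structure or its compressed variant: it uses a plain, uncompressed hash map from k-mers to positions, and every function name in it is hypothetical, chosen only for illustration. It also ignores reverse complements and sequencing errors.

```python
from collections import defaultdict


def build_kmer_index(reads, k):
    """Map each k-mer to the list of (read_id, offset) pairs where it occurs.

    Naive illustration only: memory grows with the total number of k-mer
    occurrences, which is the cost a compressed index aims to avoid.
    """
    index = defaultdict(list)
    for read_id, read in enumerate(reads):
        for offset in range(len(read) - k + 1):
            index[read[offset:offset + k]].append((read_id, offset))
    return index


def count_occurrences(index, kmer):
    """Counting query: total number of occurrences of kmer in all reads."""
    return len(index.get(kmer, ()))


def count_reads(index, kmer):
    """Counting query: number of distinct reads containing kmer."""
    return len({read_id for read_id, _ in index.get(kmer, ())})


def locate(index, kmer):
    """Locating query: all (read_id, offset) positions of kmer."""
    return list(index.get(kmer, ()))


def coverage_profile(index, read, k):
    """For each k-mer of `read`, the number of reads that share it."""
    return [
        count_reads(index, read[i:i + k])
        for i in range(len(read) - k + 1)
    ]


if __name__ == "__main__":
    reads = ["ACGTAC", "CGTACG", "TTTT"]
    index = build_kmer_index(reads, k=3)
    print(locate(index, "ACG"))
    print(coverage_profile(index, reads[0], k=3))
```

In this toy setting, a position in the coverage profile with an unusually low value is a k-mer shared with few other reads, which is the kind of signal that applications such as error correction can use.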
Main file
cgka-RR-2013.pdf (197.68 KB)
Origin: Files produced by the author(s)

Dates and versions

lirmm-00806103 , version 1 (29-03-2013)

Identifiers

  • HAL Id: lirmm-00806103, version 1

Cite

Niko Välimäki, Eric Rivals. Scalable and Versatile k-mer Indexing for High-Throughput Sequencing Data. ISBRA'2013: International Symposium on Bioinformatics Research and Applications, May 2013, Charlotte, NC, United States. pp.237-248. ⟨lirmm-00806103⟩
