Scalable and Versatile k-mer Indexing for High-Throughput Sequencing Data

Niko Välimäki; Eric Rivals

Conference paper, Year: 2013

Niko Välimäki

Role: Author
PersonId: 938641

Department of Computer Science [Helsinki]

Department of Medical and Clinical Genetics [Helsinki]

Eric Rivals

Role: Corresponding author
PersonId: 2002
IdHAL: eric-rivals
ORCID: 0000-0003-3791-3973
IdRef: 118021850

Institut de Biologie Computationnelle

Méthodes et Algorithmes pour la Bioinformatique

Abstract

Philippe et al. (2011) proposed a data structure called Gk arrays for indexing and querying large collections of high-throughput sequencing data in main memory. The data structure supports versatile queries for counting, locating, and analysing the coverage profile of k-mers in short-read data. The main drawback of the Gk arrays is their space consumption, which can easily reach tens of gigabytes of main memory even for moderate-size inputs. We propose a compressed variant of Gk arrays that supports the same set of queries, but in both near-optimal time and space. In practice, the compressed Gk arrays scale up to much larger inputs with highly competitive query times compared to their non-compressed predecessor. The main applications include variant calling, error correction, coverage profiling, and sequence assembly.
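To make the query types named in the abstract concrete, the following is a minimal, uncompressed sketch of a k-mer index over a small read collection. It is not the Gk arrays or the compressed variant proposed in the paper; it is a plain dictionary-based illustration. The function names (`build_kmer_index`, `count_occurrences`, `count_reads`, `locate`, `coverage_profile`) and the exact definition of the coverage profile used here (for each k-mer of a read, the number of reads containing it) are assumptions made for illustration only.

```python
from collections import defaultdict


def build_kmer_index(reads, k):
    """Map every k-mer to the list of (read_id, offset) pairs where it occurs."""
    index = defaultdict(list)
    for read_id, read in enumerate(reads):
        for offset in range(len(read) - k + 1):
            index[read[offset:offset + k]].append((read_id, offset))
    return index


def count_occurrences(index, kmer):
    """Counting query: total number of occurrences of kmer in the collection."""
    return len(index.get(kmer, ()))


def count_reads(index, kmer):
    """Counting query: number of distinct reads that contain kmer."""
    return len({read_id for read_id, _ in index.get(kmer, ())})


def locate(index, kmer):
    """Locating query: all (read_id, offset) positions of kmer."""
    return list(index.get(kmer, ()))


def coverage_profile(index, read, k):
    """Illustrative coverage profile: for each k-mer of read, in order,
    the number of reads in the collection that contain that k-mer."""
    return [count_reads(index, read[i:i + k]) for i in range(len(read) - k + 1)]


# Toy input: three short reads, indexed with k = 3.
reads = ["ACGTAC", "CGTACG", "TTACGT"]
index = build_kmer_index(reads, 3)
```

This sketch stores every position explicitly, so its memory use grows with the total number of k-mer occurrences; that growth is the kind of space cost the abstract identifies as the main drawback of an uncompressed index.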

Keywords

bioinformatics Next Generation Sequencing document listing query text indexing compressed index biological sequence analysis design and analysis of data structures

Domains

Bioinformatics [q-bio.QM]; Bioinformatics, Systems Biology [q-bio.QM]; Data Structures and Algorithms [cs.DS]

Main file

cgka-RR-2013.pdf (197.68 KB)

Origin: Files produced by the author(s)

Contributor: Eric Rivals

https://hal-lirmm.ccsd.cnrs.fr/lirmm-00806103

Submitted on: Friday, March 29, 2013, 14:34:19

Last modified on: Monday, October 30, 2023, 13:50:04

Long-term archiving on: Sunday, June 30, 2013, 04:02:06

Dates and versions

lirmm-00806103 , version 1 (29-03-2013)

Identifiers

HAL Id: lirmm-00806103, version 1

Cite

Niko Välimäki, Eric Rivals. Scalable and Versatile k-mer Indexing for High-Throughput Sequencing Data. ISBRA'2013: International Symposium on Bioinformatics Research and Applications, May 2013, Charlotte, NC, United States. pp.237-248. ⟨lirmm-00806103⟩

Collections

CNRS INRIA INRA MAB LIRMM MIPS UNIV-MONTPELLIER INRAE

489 views

768 downloads
