Approximate Hashing for Bioinformatics - INRIA - Institut National de Recherche en Informatique et en Automatique
Conference paper, Year: 2021

Approximate Hashing for Bioinformatics

Abstract

The paper extends ideas from data compression by deduplication to the field of bioinformatics. The specific problems on which we show our approach to be useful are the clustering of a large set of DNA strings and the search for approximate matches of long substrings, both based on the design of what we call an approximate hashing function. The new procedure produces clustering and search results very similar to those obtained by accurate tools, but in much less time and with less memory.
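The abstract does not spell out the authors' approximate hashing function. As a rough illustration of the general idea only (similar strings should tend to collide, so that bucketing by hash value yields clusters), the sketch below swaps in a standard MinHash-style signature: the minimum hash over all k-mers of a sequence. This is not the paper's method; the helper names `kmer_hash`, `approx_hash` and `cluster`, and the default `k = 8`, are hypothetical choices for this sketch.

```python
import hashlib
from collections import defaultdict


def kmer_hash(kmer: str) -> int:
    # Stable 64-bit hash of a k-mer; Python's built-in hash() is salted
    # per process for str, so a keyed digest is used instead.
    return int.from_bytes(hashlib.blake2b(kmer.encode(), digest_size=8).digest(), "big")


def approx_hash(seq: str, k: int = 8) -> int:
    # Hypothetical stand-in for an "approximate hash": the minimum k-mer hash
    # (a MinHash signature). Two sequences that share most of their k-mers are
    # likely to share this minimum, so small edits often leave it unchanged.
    if len(seq) < k:
        return kmer_hash(seq)
    return min(kmer_hash(seq[i:i + k]) for i in range(len(seq) - k + 1))


def cluster(seqs: list[str], k: int = 8) -> list[list[str]]:
    # Group sequences by their signature: each bucket is one candidate cluster.
    buckets: dict[int, list[str]] = defaultdict(list)
    for s in seqs:
        buckets[approx_hash(s, k)].append(s)
    return list(buckets.values())


if __name__ == "__main__":
    reads = ["ACGTACGTACGT", "ACGTACGTACGT", "TTTTGGGGCCCC"]
    for group in cluster(reads):
        print(group)
```

Identical reads always land in the same bucket; near-identical reads collide with a probability equal to the Jaccard similarity of their k-mer sets (under an idealized random hash), so in practice several independent signatures would be combined to make clustering more robust.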
Main file
CIAA_2021_paper_11.pdf (704.33 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-03219482, version 1 (6 May 2021)

Identifiers

  • HAL Id: hal-03219482, version 1

Cite

Guy Arbitman, Shmuel T Klein, Pierre Peterlongo, Dana Shapira. Approximate Hashing for Bioinformatics. CIAA 2021 - 25th International Conference on Implementation and Application of Automata, Jul 2021, Bremen, Germany. pp.1-12. ⟨hal-03219482⟩