Lower bounds for text indexing with mismatches and differences - INRIA - Institut National de Recherche en Informatique et en Automatique Accéder directement au contenu
Communication Dans Un Congrès Année : 2019

Lower bounds for text indexing with mismatches and differences

Résumé

In this paper we study lower bounds for the fundamental problem of text indexing with mis-matches and differences. In this problem we are given a long string of length n, the "text", and the task is to preprocess it into a data structure such that given a query string Q, one can quickly identify substrings that are within Hamming or edit distance at most k from Q. This problem is at the core of various problems arising in biology and text processing. While exact text indexing allows linear-size data structures with linear query time, text indexing with k mismatches (or k differences) seems to be much harder: All known data structures have exponential dependency on k either in the space, or in the time bound. We provide conditional and pointer-machine lower bounds that make a step toward explaining this phenomenon. We start by demonstrating lower bounds for $k = Θ(log n)$. We show that assuming the Strong Exponential Time Hypothesis, any data structure for text indexing that can be constructed in polynomial time cannot have strongly sublinear query time. This bound also extends to the setting where we only ask for $(1 + ε$)-approximate solutions for text indexing. However, in many applications the value of k is rather small, and one might hope that for small k we can develop more efficient solutions. We show that this would require a radically new approach as using the current methods one cannot avoid exponential dependency on k either in the space, or in the time bound for all even $(8 /√ 3) √ log n ≤ k = o(log n)$. Our lower bounds also apply to the dictionary look-up problem, where instead of a text one is given a set of strings.
Fichier principal
Vignette du fichier
main-blast.pdf (363.91 Ko) Télécharger le fichier
Origine : Fichiers produits par l'(les) auteur(s)
Loading...

Dates et versions

hal-01960182 , version 1 (21-12-2018)

Identifiants

Citer

Vincent Cohen-Addad, Laurent Feuilloley, Tatiana Starikovskaya. Lower bounds for text indexing with mismatches and differences. SODA 2019 - Symposium on Discrete Algorithms, Jan 2019, San Diego, United States. ⟨hal-01960182⟩
191 Consultations
139 Téléchargements

Altmetric

Partager

Gmail Facebook X LinkedIn More