Hardness of sequence alignment from fine-grained complexity

Course project of Computational Biology, 2022

We study the hardness of sequence alignment problem in the context of data structure lower bounds. In particular, the problem is given a database of many sequences, and on each query sequence, find the sequence from the database that best aligns to the query. This characterizes the task of finding similar species from a huge biological database. We show that assuming Orthogonal Vectors Conjecture in fine-grained complexity, even approximately finding the optimal answer is very hard, in the sense that there could not be a data structure that simultaneously achieves $\mathrm{poly}(n)$ space and $n^{1-\epsilon}$ query time, where $n$ is the length of the database. This means that essentially we need to enumerate over the whole database to get the answer.

See the report here.

Yiding Zhang