A Sub-quadratic Sequence Alignment Algorithm for Unrestricted Scoring Matrices
The classical algorithm for computing the similarity between two sequences uses a dynamic programming matrix, and compares two strings of size n in O(n^2) time. We address the challenge of computing the similarity of two strings in sub-quadratic time, for metrics which use a scoring matrix of unrestricted weights. Our algorithm applies to both local and global similarity computations.
The speed-up is achieved by dividing the dynamic programming matrix into variable sized blocks, as induced by Lempel-Ziv parsing of both strings, and utilizing the inherent periodic nature of both strings. This leads to an O(n^2 / log n) algorithm for an input of constant alphabet size. For most texts, the time complexity is actually O(h*n^2 /log n) where h <= 1 is the entropy of the text.
Joint work with Gad M. Landau and Maxime Crochemore.