Computer Science Colloquium, 2002-2003

Michal Ziv-Ukelson
Department of Computer Science, University of Haifa
25th December, 2002

A Sub-quadratic Sequence Alignment Algorithm for Unrestricted Scoring Matrices

The classical algorithm for computing the similarity between two sequences uses a dynamic programming matrix, and compares two strings of size n in O(n^2) time. We address the challenge of computing the similarity of two strings in sub-quadratic time, for metrics which use a scoring matrix of unrestricted weights. Our algorithm applies to both local and global similarity computations.

The speed-up is achieved by dividing the dynamic programming matrix into variable sized blocks, as induced by Lempel-Ziv parsing of both strings, and utilizing the inherent periodic nature of both strings. This leads to an O(n^2 / log n) algorithm for an input of constant alphabet size. For most texts, the time complexity is actually O(h*n^2 /log n) where h <= 1 is the entropy of the text.

Joint work with Gad M. Landau and Maxime Crochemore.


Shuly Wintner
Last modified: Thu Oct 24 11:08:47 IST 2002