String Matching Algorithms Fall-2009, Monday, 15-18

 

Professor Gad M. Landau

 

Office hours: Monday, 18-19

8240103 (office), 052-4895100 (cell)

landau@cs.haifa.ac.il

 

Text Book:

Gusfield, D., Algorithms on Strings, Trees, and Sequences. Cambridge University Press, (1997).

 

List of subjects:

 

1.       Exact String Matching

Knuth, D.E., J.H. Morris and V.R. Pratt, Fast Pattern Matching in Strings, SIAM J. Computing, 6, 323-350, (1977).

 

Karp, R.M. and M.O. Rabin, Efficient randomized pattern-matching algorithms, IBM Journal Of Research and Development, 31, 2, 249{260, (1987).

 

2.       Approximate String Matching

Abrahamson, K.R. Generalized String Matching. SIAM J. Comput., 16, 6, 1039-1051 (1987).

 

Landau, G.M. and U. Vishkin, Fast parallel and serial approximate string matching, Journal of Algorithms, 10, 2, 157{169 (1989).

 

Galil, Z. and R. Giancarlo, Improved String Matching with k Mismatches, SIGACT NEWS, 62, 52-54, (1986).

 

Amir, A., M. Lewenstein and E. Porat, Faster algorithms for string matching with k mis-

matches, SODA, 794-803 (2000).

Presentation by A. Amir,  Presentation by M. Lewenstein.

 

3.       LCA - RMQ

 

                    Presentation by Oren Weimann.  

 

 

Berkman, O. and U. Vishkin, Recursive star-tree parallel data-structure,

SIAM J. Computing, 22, 221-242. (1989).

Presentation by Danny Hermelin.  

 

Bender, M. and M. Farach-Colton, The LCA Problem Revisited, Proceedings of the 4th Latin American Symposium on Theoretical Informatics, Pages: 88 – 94

 

Yuan, H. and M.J. Atallah,  Data Structures for Range Minimum Queries in Multidimensional Arrays, SODA 2010.

 

 

4.       Suffix trees

Ukkonen, E., On-line Construction of Suffix Trees, Algorithmica, 14, 3, 249-260 (1995).

Presentation by Ora Arbel,

 

Weiner, P.,  Linear pattern matching algorithm, Proc. 14 IEEE Symposium on Switching and Automata  Theory, 1--11, 1973.

 

 Farach, M. Optimal suffix tree construction with large alphabets.  Proc. 38th IEEE Symposium on Foundations of Computer Science, pp. 137--143, 1997.

 

5.                   Suffix arrays

 Martin Farach-Colton

Karkkaaainen, J. and P. Sanders. Simple linear work suffix array construction. Proc. 30th International Colloquium on Automata, Languages and Programming (ICALP 03)}, pp. 943--955, 2003. LNCS 2719.

 

Kim, D.K., J.S. Sim, H. Park and K. Park. Linear-time construction of suffix arrays. Symp. Combinatorial Pattern Matching (CPM 03)}, pp. 186-- 199, 2003. LNCS 2676.

 

 Ko, P. and S. Aluru. Space-efficient linear-time construction of suffix arrays. Symp. Combinatorial Pattern Matching (CPM 03)}, pp. 200--210, 2003. LNCS 2676.                                                                 

                     

                     Presentation  by Oren Weimann.           

 

6.                   Construction of Aho Corasick Automaton       

                      Aho, A.V. and M.J. Corasick. Efficient string matching. Comm. ACM,

                      18(6):333--340, 1975. 

                     

                      Dori, S. and G.M. Landau, Construction of Aho Corasick automaton in linear time for integer alphabets, Proc. 16th

                      Combinatorial Pattern Matching Conference (CPM), Lecture Notes in Computer Science,  Springer-Verlag,

                      to appear (2005).

                      Presentation by  S. Dori. Presentation by  G. Landau.

 

7.                      Dynamic Programming, Hamming,  LCS,  Edit Distance

 

           

8.       Local Alignment

Smith, T.f., M.S. Waterman, The identification of common molecular subsequences, J. Mol. Biol., 147, 195-197 (1981).

 

Efraty, N., and G.M. Landau, Sparse Normalized Local Alignment, CPM 2004.

Presentation by Nadav Efraty

 

Arslan, A.N., Ö. Eğecioğlu and P.A. Pevzner, A new approach to sequence comparison:

normalized sequence alignment, Bioinformatics, 17, 4, 327-337, (2001).

Presentation by Dekel Tsur

 

 

9.     Monge

                  

Crochemore, M., G.M. Landau and M. Ziv-Ukelson, A Sub-quadratic sequence

alignment algorithm for unrestricted scoring matrices, SIAM J. Comput., 32, 5, 1654 -- 1673 (2003).

Presentation by  M.  Ziv-Ukelson.

 

Aggarwal, A., M. Klawe, S. Moran, P. Shor, and R. Wilber: Geometric Applications of a Matrix-Searching Algorithm, Algorithmica, 2, 195-208 (1987).

 

Tiskin, A.,  Fast distance multiplication of unit-Monge matrices, SODA 2010.

 

Hermelin, D., G.M. Landau, S. Landau and O. Weimann , A Unified Algorithm for Accelerating Edit-Distance Computation via Text-Compression, STACS  529-540 (2009).

 

Lempel, A., and J. Ziv: On the complexity of finite sequences, IEEE Transactions on Information Theory, 22, 75--81 (1976).

 

Ziv, J., and A. Lempel: A Universal Algorithm for Sequential Data Compression, IEEE Transactions on Information Theory, IT-23(3), 337--343 (1977).     

 

               10. Repetitions

                     Main, M.G. and  and R. J. Lorentz. An O (n log n) Algorithm for Finding all

                     Repetitions in a string.  Journal of Algorithms,  5:422--432, 1984.

                    

                     Kolpakov, R. and  G. Kucherov:  Finding Approximate Repetitions under Hamming

                     Distance,  Theoretical Computer Science, 2003, vol 303 (1), pp 135-156.

                             

 

11. Grade: Test.