Shuly Wintner's Research


Research statement

My work in recent years focuses on two themes: formal grammar, in particular unification grammars; and computational processing of Hebrew. My main motivation is a better understanding of the mathematical and computational properties of formal grammars, with an emphasis on linguistically motivated grammatical formalisms.

In the first track, I apply techniques and paradigms from theoretical computer science to the study of natural languages, and in particular to unification-based grammatical formalisms, one of the most popular computational means for expressing the syntax of natural languages. Unification grammars can be viewed as high-level programming languages. In my Ph.D. thesis I devised a compilation technique for a general unification formalism. This work inspired a continuing interest in the mathematical and computational properties of unification grammars. In particular, I have defined a notion of modules for such grammars and provided a semantics that is compositional and fully abstract with respect to a simple grammar composition operator. I later identified (with Efrat Jaeger and Nissim Francez) a condition on unification grammars that guarantees the decidability of the recognition problem. More recently, I have defined (with Daniel Feinstein) a tighter constraint which guarantees that grammars satisfying it generate only mildly context-sensitive languages.
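To make the basic operation of unification grammars concrete, here is a minimal Python sketch of unifying two feature structures. It is an illustration under simplifying assumptions, not the compilation technique mentioned above: feature structures are plain nested dictionaries with atomic string values, and reentrancy (structure sharing), which real unification formalisms support, is omitted. The function name `unify` is our own.

```python
from typing import Optional, Union

# Simplifying assumption: a feature structure is either an atom (str) or a
# dict from feature names to feature structures. No reentrancy is modeled.
FS = Union[str, dict]


def unify(a: FS, b: FS) -> Optional[FS]:
    """Return the most general structure at least as specific as both a and b,
    or None if they clash. The inputs are never mutated."""
    if isinstance(a, str) and isinstance(b, str):
        # Two atoms unify only if they are identical.
        return a if a == b else None
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)  # shallow copy; sub-structures are replaced, not mutated
        for feat, b_val in b.items():
            if feat in result:
                sub = unify(result[feat], b_val)
                if sub is None:
                    return None  # a clash anywhere makes the whole unification fail
                result[feat] = sub
            else:
                result[feat] = b_val  # feature only in b: simply add it
        return result
    # An atom never unifies with a complex structure.
    return None


subject = {"cat": "np", "agr": {"num": "sg", "pers": "3"}}
verb_requirement = {"agr": {"num": "sg"}}
print(unify(subject, verb_requirement))  # {'cat': 'np', 'agr': {'num': 'sg', 'pers': '3'}}
print(unify({"agr": {"num": "sg"}}, {"agr": {"num": "pl"}}))  # None
```

Unification succeeds when the two structures are compatible and fails on a clash of atomic values; this is the kind of check grammar rules rely on to enforce constraints such as subject-verb agreement.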

These investigations provide a better understanding of unification grammars and their mathematical properties, which is necessary for better computational solutions. My current work (with Yael Cohen-Sygal) focuses on provisions for grammar engineering based on a formal definition of modules in unification grammars.

The other line of work is concerned with natural language resources and applications for the Hebrew language (and, to a lesser extent, also Arabic). My belief is that the special structure of Hebrew, and in particular its derivational morphology and high lexical ambiguity, makes it a perfect testbed for computational investigations. In a survey paper I show that the state of the art in Hebrew processing leaves much to be desired. Under the auspices of the Knowledge Center for Hebrew Telecommunication I am involved in two projects, one whose purpose is to create a WordNet for Hebrew (with Danny Shacham), and the other whose goal is the construction of a finite-state morphological analyzer for the language (with Shlomo Yona). While the main contribution of these projects is practical, I am mostly interested in the linguistic insights that such projects can generate. For example, the work on WordNet has yielded interesting insights into the ways different languages (English, Hebrew and Italian) encode gender differences, and how these differences can be expressed computationally. We are also investigating the interface between morphology and semantics, and in particular the semantic features of the Hebrew root-and-pattern word formation process.

A separate project I am involved in (with Dan Roth and Ezra Daya) is concerned with machine learning for complex morphological tasks, and in particular Hebrew and Arabic root identification and Hebrew morphological disambiguation. Identifying the root of a given word in a Semitic language is an important task, in some cases a crucial part of morphological analysis. It is also a non-trivial task, which many humans find challenging. Given the large number of potential roots, we address the problem as one of combining several classifiers, each predicting the value of one of the root's consonants. We show that when these predictors are combined by enforcing some fairly simple linguistic constraints, they achieve high accuracy, comparing favorably with human performance on this task. We are currently applying the same method to extracting the roots of Arabic words, and we plan to use it for morphological disambiguation in the near future.
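The general idea of combining per-consonant predictions under constraints can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration, not the classifiers or constraints of the actual work: words and roots are written as Latin transliterations of their consonants, the per-position scores are made-up numbers standing in for classifier outputs, and the two constraints (the root letters appear in the word in order; optionally, the root belongs to a list of known roots) are simple stand-ins that ignore weak roots, whose letters may not surface in the word. The function names are hypothetical.

```python
from itertools import product
from math import log
from typing import Dict, List, Optional, Set, Tuple


def is_subsequence(root: Tuple[str, ...], word: str) -> bool:
    """True if the root letters occur in the word in the same order."""
    it = iter(word)
    # `ch in it` advances the iterator, so each later letter must occur later.
    return all(ch in it for ch in root)


def combine_predictions(
    word: str,
    position_scores: List[Dict[str, float]],
    known_roots: Optional[Set[str]] = None,
) -> Optional[Tuple[str, ...]]:
    """Pick the highest-scoring root that satisfies the (assumed) constraints.

    position_scores[i] maps candidate letters to the positive score that a
    hypothetical per-position classifier assigns to root consonant i.
    """
    best, best_score = None, float("-inf")
    for root in product(*(scores.keys() for scores in position_scores)):
        # Assumed constraint 1: root letters must appear in the word, in order.
        if not is_subsequence(root, word):
            continue
        # Assumed constraint 2 (optional): the root must be a known root.
        if known_roots is not None and "".join(root) not in known_roots:
            continue
        # Sum of log-scores = log of the product of per-position scores.
        score = sum(log(scores[ch]) for scores, ch in zip(position_scores, root))
        if score > best_score:
            best, best_score = root, score
    return best


# Made-up classifier outputs for the consonantal skeleton "mktb".
scores = [
    {"m": 0.3, "k": 0.6, "t": 0.1},
    {"t": 0.4, "b": 0.5, "k": 0.1},
    {"b": 0.45, "t": 0.55},
]
independent = tuple(max(s, key=s.get) for s in scores)
print(independent)                          # ('k', 'b', 't'): not in word order
print(combine_predictions("mktb", scores))  # ('k', 't', 'b')
```

Each classifier on its own picks "k", "b", "t", a sequence that cannot be the root of "mktb" because "b" precedes "t" in it only in the wrong order; enforcing the order constraint selects "k-t-b" instead. Exhaustive search over the candidate combinations is affordable here only because the candidate sets are tiny.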

Finally, in a work which integrates my interests in Semitic languages and in formal grammar, I have designed (with Yael Cohen-Sygal) a modest extension of finite-state machines which accounts very naturally for non-concatenative morphological processes such as circumfixation, root-and-pattern and even limited reduplication. We prove that this model is indeed finite-state, maintaining the closure properties of regular languages and relations. We then use it to elegantly describe some non-concatenative phenomena.
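Root-and-pattern interdigitation itself can be illustrated with a small Python function. This is not the finite-state model described above, only a sketch of the phenomenon it handles: the pattern is read left to right, the digits 1 to 9 refer to positions in the root, and every other character is copied as is. The root is held in a small "register" that the pass consults. The pattern notation, the function name, and the transliterated examples (which ignore phonological alternations such as spirantization) are our own assumptions.

```python
def interdigitate(root: str, pattern: str) -> str:
    """Fill the numbered slots of a pattern with root consonants.

    Hypothetical notation: digits 1..9 in the pattern refer to root positions;
    any other character (a vowel or affix of the pattern) is copied unchanged.
    """
    register = tuple(root)  # the root, kept in a bounded "register"
    out = []
    for sym in pattern:
        if "1" <= sym <= "9":
            idx = int(sym) - 1
            if idx >= len(register):
                raise ValueError(f"pattern slot {sym} has no root consonant")
            out.append(register[idx])  # a slot may be read more than once
        else:
            out.append(sym)
    return "".join(out)


print(interdigitate("ktb", "1a2a3a"))   # kataba
print(interdigitate("ktb", "ma12u3"))   # maktub
print(interdigitate("ktb", "1a22a3a"))  # kattaba
print(interdigitate("dbr", "1i22e3"))   # dibber
```

Because a slot may be read more than once, the same mechanism lets a root consonant be copied, as in the doubled middle consonant of "dibber". And because the register holds a bounded number of letters from a finite alphabet, it can take only finitely many values, which gives some informal intuition for why such an extension can stay finite-state; the formal model and proof are those of the work described above, not this sketch.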

A list of current and past research projects is available at the web site of the Computational Linguistics Group.


Shuly Wintner's Home Page, http://cs.haifa.ac.il/~shuly
Maintained by shuly@cs.haifa.ac.il. Last modified: Fri Dec 22 16:24:33 IDT 2006