Unsupervised and Supervised Learning in Natural Language Processing
The research field of natural language processing (NLP) has been receiving growing attention in recent years. In particular, the major focus on empirical methods, which learn vast amounts of knowledge and inference patterns from available text collections (corpora), has boosted the feasibility and robustness of language processing techniques and facilitated real-world applications. This talk will first briefly review the major foundational and applied tasks within NLP and how they are approached by empirical methods. Certain issues will be illustrated with references to my own work, stressing the need to minimize reliance on human supervision and resources. I will then describe in more detail an ongoing line of research on unsupervised semantic learning, addressing several disambiguation and inference tasks. Further, a novel approach to the (essentially supervised) task of text categorization will be presented, which establishes a different category specification scheme based on unsupervised learning and offers several practical advantages. I will conclude with some new directions in corpus-based semantic modeling and the roles they open for unsupervised and supervised learning.