A Machine Learning Approach to Classifying Ellipsis in Dialogue
We address the problem of identifying the interpretation type of non-sentential question fragments in dialogue, a task that forms part of a system for dialogue parsing and understanding. We present a machine learning approach to the disambiguation of bare sluices in dialogue. We extracted a set of heuristic principles from a corpus-based sample and formulated them as probabilistic Horn clauses. We then used the predicates of these clauses to create a set of domain-independent features with which to annotate an input dataset. We ran two different machine learning systems on this dataset: SLIPPER, a rule-based learning algorithm, and TiMBL, a memory-based procedure. Both algorithms performed well, yielding similar success rates of approximately 90%. These results indicate that the features in terms of which we formulated our heuristic principles have significant predictive power, and that rules closely resembling our Horn clauses can be learned automatically from these features.
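As a rough illustration of what memory-based classification over symbolic features involves, the following is a minimal sketch of a plain k-nearest-neighbour classifier using an overlap (mismatch-count) distance. It is not TiMBL itself and does not reproduce its weighting or options. The feature names (wh_word, antecedent_mood, has_quantified_np), the class labels (reading_A, reading_B) and the toy instances are all hypothetical placeholders, not the features or data used in this work.

```python
from collections import Counter


def overlap_distance(a, b):
    """Count the feature positions at which two instances differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def classify(memory, instance, k=1):
    """Return the majority label among the k stored instances closest to `instance`."""
    # sorted() is stable, so ties keep the order in which instances were stored.
    ranked = sorted(memory, key=lambda item: overlap_distance(item[0], instance))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy memory: (wh_word, antecedent_mood, has_quantified_np) -> label.
# These values are invented for illustration only.
MEMORY = [
    (("what", "declarative", "yes"), "reading_A"),
    (("who", "declarative", "yes"), "reading_A"),
    (("what", "interrogative", "no"), "reading_B"),
    (("why", "interrogative", "no"), "reading_B"),
]

# Classify an unseen (also invented) sluice description by its 3 nearest neighbours.
prediction = classify(MEMORY, ("who", "interrogative", "no"), k=3)
print(prediction)
```

In this sketch every feature counts equally; a real memory-based learner would typically weight features by how informative they are, which is one reason the choice of features matters for the results reported above.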
Joint work with Raquel Fernandez and Jonathan Ginzburg