Computer Science Colloquium, 2004-2005

Dan Pelleg
Computer Science Department, Carnegie Mellon University.

January 5th, 2004

Using Tarjan's Red Rule for Fast Dependency Tree Construction

We discuss the problem of generating dependency-trees to model a given set of data. The traditional solution, proposed by Chow and Liu, solves the problem by finding a minimum-spanning tree in the attribute space, and runs in time quadratic in the number of attributes and linear in the number of data-points. We present an improved algorithm that scales better with the size of the data. We begin by proposing a new minimum-spanning tree algorithm. Surprisingly, this algorithm is in fact slower than currently-known algorithms. We then show how to embed it in a framework that considers probabilistic confidence intervals for the edge weights. This leads to a fast dependency-tree implementation. We present experimental results for datasets with hundreds of attributes and millions of points.

Joint work with Andrew Moore.

Shuly Wintner
Last modified: Tue Nov 16 21:13:41 IST 2004