Using Tarjan's Red Rule for Fast Dependency Tree Construction
We discuss the problem of generating dependency-trees to model a given set of data. The traditional solution, proposed by Chow and Liu, solves the problem by finding a minimum-spanning tree in the attribute space, and runs in time quadratic in the number of attributes and linear in the number of data-points. We present an improved algorithm that scales better with the size of the data. We begin by proposing a new minimum-spanning tree algorithm. Surprisingly, this algorithm is in fact slower than currently-known algorithms. We then show how to embed it in a framework that considers probabilistic confidence intervals for the edge weights. This leads to a fast dependency-tree implementation. We present experimental results for datasets with hundreds of attributes and millions of points.
Joint work with Andrew Moore.