May 16, Wednesday 14:15, Room 303, Jacobs Building

Title: A Unified Framework for Approximating and Clustering Data

Lecturer: Michael Langberg

Lecturer homepage : http://www.openu.ac.il/home/mikel/

Affiliation : The Open University

 

One of the great challenges in computational theory is the extraction of patterns from massive and high-dimensional data sets, e.g., clustering. The field of geometric clustering is an extensively studied one with a wide spectrum of ``real world'' applications. A ``coreset'' for a given clustering optimization problem is a ``compressed'' representation of the data at hand, in the sense that a solution for the clustering problem with the (small) coreset as input would yield an approximate solution to the problem with the original (larger) input. Using traditional techniques, a coreset usually implies provable linear time algorithms for the corresponding clustering problem. In this talk I will present a unified and abstract framework that yields the efficient construction of succinct coresets for several clustering problems. Representing the data by a set F of positive functions over a ground set X, our framework forges a link between the combinatorial complexity of the function family F at hand (measured in terms of classical VC dimension) and the paradigm of coresets. Our coresets are obtained via randomized sampling according to a delicately designed sampling distribution. Examples in which our framework has been successful (and improves over previous works) include the k-median, the k-line median, projective clustering, linear regression, low rank approximation, and subspace approximation problems.

This talk is based on joint works with Dan Feldman and Leonard Schulman.