May 16, Wednesday 14:15, Room 303, Jacobs Building
Title: A Unified Framework for Approximating and Clustering Data
Lecturer:
Michael Langberg
Lecturer homepage
: http://www.openu.ac.il/home/mikel/
Affiliation : The Open University
One of the great challenges in computational theory is the extraction
of patterns from massive and high-dimensional data sets, e.g.,
clustering. The field of geometric clustering is an extensively
studied one with a wide spectrum of ``real world'' applications.
A ``coreset'' for a given clustering optimization problem is a
``compressed'' representation of the data at hand, in the sense that a
solution for the clustering problem with the (small) coreset as input
would yield an approximate solution to the problem with the original
(larger) input. Using traditional techniques, a coreset usually
implies provable linear time algorithms for the corresponding
clustering problem.
In this talk I will present a unified and abstract framework that
yields the efficient construction of succinct coresets for several
clustering problems. Representing the data by a set F of positive
functions over a ground set X, our framework forges a link between the
combinatorial complexity of the function family F at hand (measured in
terms of classical VC dimension) and the paradigm of coresets. Our
coresets are obtained via randomized sampling according to a
delicately designed sampling distribution. Examples in which our
framework has been successful (and improves over previous works)
include the k-median, the k-line median, projective clustering, linear
regression, low rank approximation, and subspace approximation
problems.
This talk is based on joint works with Dan Feldman and Leonard Schulman.