|
Tensor Voting: Review,
Applications to Computer Vision and Machine Learning |
Gerard
Medioni – U.S.C.
|
We first
briefly review tensor voting, which is an efficient, non-iterative
framework for tackling perceptual organization problems in spaces of
arbitrary dimension. It is based on data representation by second-order
symmetric tensors, which allow a unified representation of inliers of
smooth structures, discontinuities and outliers, and data
communication by tensor voting, during which tokens propagate
information in their neighborhood by casting tensor votes. These votes
convey the amount of support of the voter for a structure (such as a
curve or a hyper-surface) that goes through the voter and receiver. No
parametric models are assumed for the underlying structure and the
criteria for determining whether a structure goes through the data are
proximity and good continuation. Our framework has proven to be very
robust even under extreme noise corruption, with a single free
parameter, the scale of the voting field.
The
second part of the talk focuses on the application of tensor voting to
real computer vision problems. Since many computer vision problems,
such as stereo and motion analysis, can be expressed as the inference
of smooth structures, they can be addressed within a perceptual
organization framework. For instance, potential pixel correspondences
generate tokens in 3- and 4-D for stereo and motion respectively. In
that space, correct matches should form salient, coherent structures
that correspond to the scene objects, while wrong matches do not align
as well as the correct ones and can be eliminated.
Finally,
we show how tensor voting can be applied to problems in higher
dimensions, while keeping the computational complexity at reasonable
levels. Since the tensors can represent all possible structure types,
which range from junctions to hyper-volumes, multiple structures of
different dimensionality can be inferred at the same time and interact
with each other. Since all processing is local, computational
complexity depends on the number of neighbors of each input point and
remains manageable even for very large numbers of inputs in high
dimensions. Therefore, tensor voting could be an alternative to
methods such as Locally Linear Embedding and Isomap, which are
state-of-the-art algorithms in machine learning.
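As a rough illustration (not the authors' implementation), a simplified first-pass "ball vote" in 2D can be sketched as follows. The Gaussian decay, the perpendicular-normal assumption, and the function name are all simplifications introduced here; the full framework uses oriented stick voting fields.

```python
import numpy as np

def tensor_voting_saliency(points, sigma=1.0):
    """First-pass ('ball vote') tensor voting sketch in 2D.

    Each point casts, to every other point, a stick tensor oriented
    perpendicular to the connecting direction, weighted by a Gaussian
    decay in distance (sigma is the single scale parameter).  The curve
    saliency of a point is lam1 - lam2 of its accumulated second-order
    symmetric tensor: high for inliers of smooth curves, low for outliers.
    """
    n = len(points)
    tensors = np.zeros((n, 2, 2))
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            d = points[j] - points[i]
            r = np.linalg.norm(d)
            w = np.exp(-(r / sigma) ** 2)
            t = d / r                         # unit direction voter -> receiver
            normal = np.array([-t[1], t[0]])  # assumed curve normal at receiver
            tensors[j] += w * np.outer(normal, normal)
    # eigenvalues sorted descending; lam1 - lam2 is the curve saliency
    eig = np.array([np.linalg.eigvalsh(T)[::-1] for T in tensors])
    return eig[:, 0] - eig[:, 1]
```

On collinear points the accumulated tensors are elongated (high saliency), while an isolated outlier receives only negligible votes.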
|
Combining Holistic and Local
Representations using Kernels over Sets |
Amnon
Shashua and Tamir Hazan – Hebrew
U.
|
In the
area of learning from observations there are two main paths that are often
mutually exclusive: (i) the design of learning algorithms, and (ii) the
data representation scheme. The algorithm designers take pride in the fact
that their algorithm can generalize well given straightforward data
representations, whereas
those who work on data representations often demonstrate remarkable
results with sophisticated data representations using only straightforward
learning algorithms. This dichotomy is probably most emphasized in the
area of computer vision, where image understanding from observations
involves data instances of images or image sequences containing huge
amounts of data. Our work is about bridging the gap between algorithms and
representations. The key is to allow advanced algorithms (which typically
require metric structure on the instance space) to work with advanced data
representations (which are often not easily embedded into a metric space).
I will
present a general family of algebraic positive definite similarity
functions over spaces of matrices with varying column rank. The columns
can represent local regions in an image (whereby images have varying
number of local parts), images of an image sequence, motion trajectories
in a multibody motion, and so forth.
The family of similarity measures will be shown to be exhaustive,
thus providing a cook-book of sorts covering the possible "wish lists"
from similarity measures over sets of varying cardinality.
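One concrete member of this family, sketched here as an illustration rather than as the talk's exact construction, is the product of squared cosines of the principal angles between the column spaces of two matrices, which is known to be positive definite over subspaces of equal dimension:

```python
import numpy as np

def principal_angles_kernel(A, B):
    """Similarity between two sets of column vectors via principal angles.

    A, B: matrices whose columns span subspaces of equal dimension
    (an assumption of this sketch).  After orthonormalizing each set,
    the singular values of Qa^T Qb are the cosines of the principal
    angles; the product of their squares is a positive definite
    similarity over such subspaces.
    """
    Qa, _ = np.linalg.qr(A)
    Qb, _ = np.linalg.qr(B)
    sv = np.linalg.svd(Qa.T @ Qb, compute_uv=False)  # cosines of principal angles
    return float(np.prod(sv ** 2))
```

The kernel equals 1 for identical column spaces and 0 for orthogonal ones, so it behaves like an inner product over unordered sets of vectors.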
|
Solving geometric PDEs on
manifolds |
Alon
Spira and Ron Kimmel –
Technion
|
In this
talk we present numerical schemes for implementing geometric flows of
curves and images on manifolds. We consider a 2D parameterization plane
that is mapped to an N-dimensional space. Our approach in devising the
schemes is to implement them on the uniform Cartesian grid of the
parameterization plane instead of doing so in the N-dimensional space.
This enhances the efficiency and robustness of the resulting numerical
schemes.
The first
numerical scheme is an efficient solution to the eikonal equation on
parametric manifolds. The scheme is based on Kimmel and Sethian's solution
for triangulated manifolds, but uses the metric tensor of the parametric
manifold in order to implement the scheme on the parameterization plane.
The scheme is used to devise a short time kernel for the Beltrami image
enhancing flow. The kernel enables an arbitrary time step for the flow for
regular images as well as images painted on manifolds, such as face
images. The numerical scheme is further used for face recognition by
constructing an invariant face signature from distances calculated on the
face manifold.
Another
numerical scheme implements curve evolution by geodesic curvature flow on
parametric manifolds. The flow is implemented by back projecting the curve
from the manifold to the parameterization plane, calculating the flow on
the plane by the level sets method and then mapping it back to the
manifold. Combining this flow with geodesic constant flow enables the
implementation of geodesic active contours for images painted on
parametric manifolds.
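For a flavor of the flat-grid building block underlying such schemes, here is a minimal fast-sweeping solver for the eikonal equation on a uniform Cartesian grid. This is a generic stand-in: the authors' scheme additionally folds the parametric manifold's metric tensor into the update, which this sketch omits.

```python
import numpy as np

def eikonal_fast_sweeping(speed, src, h=1.0, n_sweeps=8):
    """Solve |grad u| = 1/speed on a uniform grid by fast sweeping.

    Repeated Gauss-Seidel passes in four grid orderings with the
    standard two-neighbor upwind update.  src is a (row, col) tuple.
    """
    ny, nx = speed.shape
    u = np.full((ny, nx), np.inf)
    u[src] = 0.0
    orders = [(range(ny), range(nx)),
              (range(ny), range(nx - 1, -1, -1)),
              (range(ny - 1, -1, -1), range(nx)),
              (range(ny - 1, -1, -1), range(nx - 1, -1, -1))]
    for _ in range(n_sweeps):
        for ys, xs in orders:
            for i in ys:
                for j in xs:
                    if (i, j) == src:
                        continue
                    ux = min(u[i, j - 1] if j > 0 else np.inf,
                             u[i, j + 1] if j < nx - 1 else np.inf)
                    uy = min(u[i - 1, j] if i > 0 else np.inf,
                             u[i + 1, j] if i < ny - 1 else np.inf)
                    f = h / speed[i, j]
                    a, b = sorted((ux, uy))
                    if b - a >= f:      # causal update from one neighbor only
                        cand = a + f
                    else:               # two-neighbor quadratic update
                        cand = 0.5 * (a + b + np.sqrt(2 * f * f - (a - b) ** 2))
                    u[i, j] = min(u[i, j], cand)
    return u
```

Distances along grid axes are recovered exactly; off-axis distances carry the usual O(h) upwind discretization error.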
|
Multiscale Segmentation by
Combining Motion and Intensity Cues |
Meirav
Galun, Alexander Apartsin and Ronen Basri - Weizmann
|
Motion
provides a strong cue for segmentation. In this talk we present a
multiscale method for motion segmentation. Our method begins with local,
ambiguous optical flow measurements. It uses a process of aggregation to
resolve the ambiguities and reach reliable estimates of the motion. In
addition, as the process of aggregation proceeds and larger aggregates are
identified it employs a progressively more complex model to describe the
motion. In particular, we proceed by recovering translational motion at
fine levels, through affine transformation at intermediate levels, to 3D
motion (described by a fundamental matrix) at the coarsest levels.
Finally, the method is integrated with a segmentation method that uses
intensity cues. We further demonstrate the utility of the method on both
random dot and real motion sequences.
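The intermediate-level model mentioned above can be illustrated by a least-squares affine fit to the flow vectors of an aggregate (an illustrative sketch, not the authors' solver):

```python
import numpy as np

def fit_affine_motion(points, flows):
    """Least-squares affine motion model for an aggregate of pixels.

    points: (n, 2) pixel coordinates; flows: (n, 2) optical-flow vectors.
    Solves flow = A p + t for the 6 affine parameters.
    """
    n = len(points)
    X = np.hstack([points, np.ones((n, 1))])       # rows [x, y, 1]
    P, *_ = np.linalg.lstsq(X, flows, rcond=None)  # (3, 2) parameter matrix
    A, t = P[:2].T, P[2]
    return A, t
```

Fitting the same data with a pure translation (only t) versus the full affine model is what lets the aggregation process promote a region to a richer motion description as it grows.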
|
The Canonical Correlations of
Color Images |
Yacov
Hel-Or - IDC |
Over the
last decade or so a lot of effort has been invested in an attempt to study
the underlying statistics of natural images. Most of this effort, however,
dealt with gray-scale images, and quite a number of studies attempted to
model the *spatial dependencies* existing between pixel values. Although
impressive results have been achieved in a variety of problems by applying
prior models on gray-scale images, only a few studies have dealt with
prior models on color images. In the latter there is a need to
characterize spatial as well as *spectral* (color) dependencies.
In this
talk I will suggest a new approach that exploits the spectral dependencies
in color images using the Canonical Correlation Analysis (CCA). I
will show how this statistical inference can help solve Inverse Problems
in general and the Demosaicing problem in particular.
It is
an interesting fact that the resulting statistical inference that is
derived solely from the statistical properties of natural images, can also
be derived independently from the characteristics of the human visual
system. This suggests that the human visual system has adapted itself to
the statistical properties of natural color images, and that the proposed
approach is based on a reliable statistical model.
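The CCA machinery itself can be sketched in a few lines (a generic illustration, not the paper's exact formulation over color patches):

```python
import numpy as np

def cca(X, Y, eps=1e-8):
    """Canonical Correlation Analysis sketch.

    X: (n, dx), Y: (n, dy) paired observations, e.g. patches from two
    color channels.  Returns the canonical correlations and projection
    directions maximizing corr(X wx, Y wy); eps regularizes the
    covariances for stability.
    """
    X = X - X.mean(0); Y = Y - Y.mean(0)
    n = len(X)
    Cxx = X.T @ X / n + eps * np.eye(X.shape[1])
    Cyy = Y.T @ Y / n + eps * np.eye(Y.shape[1])
    Cxy = X.T @ Y / n
    # whiten each block (Lx^T Cxx Lx = I), then SVD of the cross-covariance
    Lx = np.linalg.cholesky(np.linalg.inv(Cxx))
    Ly = np.linalg.cholesky(np.linalg.inv(Cyy))
    U, s, Vt = np.linalg.svd(Lx.T @ Cxy @ Ly)
    return s, Lx @ U, Ly @ Vt.T
```

When Y is a linear mixture of X, all canonical correlations approach 1, which is exactly the kind of strong spectral dependency the talk exploits.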
This work
was conducted at HP labs. |
An algorithm based on
Biological Gain control for
High Dynamic Range Compression |
Hedva Spitzer – Tel-Aviv
U.
|
The
visual system has the ability to perceive and extract detailed information
from high dynamic range scenes. For example, a person in a dim room can, at
a single glance, observe items both inside the room and outside through a
window. An algorithm for high dynamic
range compression that can be applied for still and video images is
presented. This algorithm is based on a biological model which is
suggested also for wide dynamic range and lightness constancy. It succeeds in automatically
compressing the dynamic range of images to a 'human vision' appearance (as
is commonly required in cameras and displays) while maintaining contrast
and even improving it. The biological basis is retinal mechanisms of
adaptation (gain control): ‘local’, and ‘remote’. These mechanisms enable
video image applications, since they take into account the dynamics of
human adaptation mechanisms. The results indicate that the contribution of
adaptation mechanisms to image appearance is significant and robust, and
has proven to fit next-generation high dynamic range (CMOS-based) cameras.
|
Dynamosaics: Dynamic
Mosaics with Non-Chronological Time |
Alex
Rav-Acha and Shmuel Peleg – Hebrew
U.
|
With the
limited field of view of human vision, our perception of most scenes is
built over time while our eyes are scanning the scenes. In the case of
static scenes this process can be modeled by panoramic mosaicing:
stitching together images into a panoramic view. Can a dynamic scene,
scanned by a video camera, be represented with a dynamic panoramic
video?
When a
video camera is scanning a dynamic scene, different regions are visible at
different times. The chronological time when a region becomes visible in
the input video is not part of the scene dynamics, and may be ignored.
Only the ``local time'' during the visibility period of each region is
relevant for the dynamics of the scene, and should be used for building
the dynamic mosaics.
We use
the space-time volume, in which 2D image frames are stacked along the time
axis to form a 3D volume, as a basic representation that enables the
creation of dynamic mosaics. Various 2D
slices of the space-time volume can manipulate the chronological time and
generate panoramic movies. The chronological time can even be reversed
without affecting the local time. For example, given a video camera
scanning a waterfall from left to right, we can generate a video scanning
the falls from right to left; yet, unlike a simple reversal of the video
sequence, the water will still flow down!
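The slicing idea can be sketched directly on a space-time volume (an illustrative toy, with a linear "time front"; the talk's slices can be more general):

```python
import numpy as np

def sweep_slice(volume, slope):
    """Cut time-slanted 2D slices from a space-time volume.

    volume: (T, H, W) stack of frames.  For each column x, frame(t0)
    samples the input frame at local time t0 + slope*x; sweeping t0
    produces a movie whose chronological order is decoupled from each
    region's local time.  A negative slope reverses the chronological
    scan without reversing the local dynamics.
    """
    T, H, W = volume.shape
    def frame(t0):
        out = np.empty((H, W), volume.dtype)
        for x in range(W):
            t = int(np.clip(t0 + slope * x, 0, T - 1))
            out[:, x] = volume[t, :, x]
        return out
    return frame
```

Each output column keeps its own local time, which is what preserves the direction of the water's flow under a reversed scan.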
|
Pixels Correlated to
Sound |
Einat
Kidron, Yoav Schechner, and Michael Elad – Technion
|
People
and animals fuse auditory and visual information to obtain robust
perception. A particular benefit of such cross-modality analysis is the
ability to localize visual events associated with sound sources. We are
interested in a computer-vision approach that localizes the image pixels
associated with sound, aided by a single microphone. Past efforts
encountered problems stemming from the huge gap between the dimensions
involved and the available data. This has led to solutions suffering from
low spatio-temporal resolution. We present a rigorous analysis of the
fundamental problems associated with audio-visual localization. We then
present a stable and robust algorithm which overcomes past deficiencies.
It captures dynamic events with high spatial resolution, and derives a
unique and stable result. It exploits the fact that such events are
typically spatially sparse. The algorithm is simple and efficient thanks
to its reliance on linear programming. The formulation is convex and free
of user-defined parameters. Its capabilities are demonstrated in
experiments, where the algorithm overcomes substantial visual distractions
and audio noise.
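The sparsity-plus-linear-programming ingredient can be illustrated generically (a sketch in the spirit of the talk; the paper's exact formulation over audio-visual correlations differs):

```python
import numpy as np
from scipy.optimize import linprog

def sparse_solution(A, b):
    """Recover a sparse x with A x = b by minimizing ||x||_1 via an LP.

    Standard reformulation with variables z = [x; u]:
        min sum(u)  s.t.  x <= u,  -x <= u,  A x = b.
    """
    m, n = A.shape
    c = np.concatenate([np.zeros(n), np.ones(n)])
    A_eq = np.hstack([A, np.zeros((m, n))])
    A_ub = np.vstack([np.hstack([np.eye(n), -np.eye(n)]),    #  x - u <= 0
                      np.hstack([-np.eye(n), -np.eye(n)])])  # -x - u <= 0
    b_ub = np.zeros(2 * n)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b,
                  bounds=[(None, None)] * n + [(0, None)] * n)
    return res.x[:n]
```

Because the objective is linear and convex, the result is stable and parameter-free, matching the qualities the abstract emphasizes.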
|
Space-Time Video
Completion |
Yonatan
Wexler, Eli Shechtman and
Michal Irani – Weizmann
|
We
present a method for space-time completion of large space-time holes in
video sequences of complex dynamic scenes. The missing portions are
filled-in by sampling spatio-temporal patches from the available parts of
the video, while enforcing global spatio-temporal consistency between all
patches in and around the hole. This is obtained by posing the task of
video completion and synthesis as a global optimization problem with a
well-defined objective function. The consistent completion of static scene
parts simultaneously with dynamic behaviors leads to realistic looking
video sequences. Space-time video completion is useful for a variety of
tasks, including, but not limited to:
(i)
Sophisticated video removal (of undesired static or dynamic objects) by
completing the appropriate static or dynamic background information
(ii)
Correction of missing/corrupted video frames in old movies
(iii)
Synthesis of new video frames to extend a visual story, modify it, or
generate a new one.
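The basic matching step behind such completion can be sketched as an exhaustive search for the source patch most consistent with the known part of a target patch (illustrative only; the paper iterates this inside a global spatio-temporal objective):

```python
import numpy as np

def best_patch(video, hole_patch, mask, stride=2):
    """Find the source patch most consistent with a partially known target.

    video: (T, H, W) sequence; hole_patch: (pt, ph, pw) with unknown
    entries (e.g. NaN) where the hole is; mask: bool array of the same
    shape, True where hole_patch is known.  Compares candidates by SSD
    over the known entries only.
    """
    pt, ph, pw = hole_patch.shape
    T, H, W = video.shape
    best, best_cost = None, np.inf
    for t in range(0, T - pt + 1, stride):
        for y in range(0, H - ph + 1, stride):
            for x in range(0, W - pw + 1, stride):
                cand = video[t:t + pt, y:y + ph, x:x + pw]
                cost = np.sum((cand[mask] - hole_patch[mask]) ** 2)
                if cost < best_cost:
                    best, best_cost = cand, cost
    return best
```

Copying the unknown entries of the winning candidate into the hole is one local step; global consistency comes from optimizing all overlapping patches jointly.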
|
Dynamic Visual Search
Using Inner-Scene Similarity:
Algorithms and Inherent
Limitations |
Tamar
Avraham and Micha Lindenbaum – Technion
|
A dynamic
visual search framework based mainly on inner-scene similarity is
proposed. Algorithms as well as measures quantifying the difficulty of
search tasks are suggested.
Given a
number of candidates (e.g. sub-images), our basic hypothesis is that more
visually similar candidates are more likely to have the same identity.
Both deterministic and stochastic approaches, relying on this hypothesis,
are used to quantify this intuition.
Under the
deterministic approach, we suggest a measure similar to Kolmogorov's
$\epsilon$-covering that quantifies the difficulty of a search task and
bounds the performance of all search algorithms. We also suggest a simple
algorithm that meets this bound.
Under the
stochastic approach, we model the identities of the candidates as
correlated random variables and characterize the task using its second
order statistics. We derive a search procedure based on minimum MSE linear
estimation. Simple extensions
enable the algorithm to use top-down and/or bottom-up information, when
available. Both approaches
are evaluated experimentally.
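The stochastic approach can be sketched as follows (a toy illustration of minimum-MSE linear estimation over correlated identities; the covariance construction and function name are assumptions of this sketch):

```python
import numpy as np

def mmse_search_order(C, queried, labels):
    """Rank unqueried candidates by minimum-MSE linear estimation.

    C: covariance of the (zero-mean) candidate identities, built e.g.
    from inner-scene visual similarity; queried: indices already
    checked; labels: their observed identities (+-1).  The linear MMSE
    estimate of the rest is C_ru C_uu^{-1} y; we search the candidate
    with the highest estimate next.
    """
    rest = [i for i in range(len(C)) if i not in queried]
    Cuu = C[np.ix_(queried, queried)]
    Cru = C[np.ix_(rest, queried)]
    est = Cru @ np.linalg.solve(Cuu, np.asarray(labels, float))
    order = np.argsort(-est)          # most-likely-target first
    return [rest[k] for k in order]
```

After a target is found, candidates strongly correlated with it are promoted, which is the dynamic, similarity-driven behavior the framework aims for.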
|
Height from moving
shadows |
Yaron Caspi and Mike Werman –
Hebrew U.
|
Plane +
Parallax has been touted as an excellent representation for 3D
reconstruction. Several ways to recover 3D parallax have been proposed in
the past, most of them relying on point matches. In this talk we
describe how shadows or light stripes may be used to compute a plane + parallax
representation, where the 3D parallax refers to the height from the ground
plane. The method is based on
analyzing shadows of vertical poles (e.g., a tall building's contour) that
sweep the object twice.
Existing
beam scanning approaches (shadow or light stripes) will be reviewed, and
the differences and similarities with the proposed method will be
discussed. We show that, in
contrast to existing methods, which recover the distance of a point from
the camera, our approach measures the height from the ground plane
directly. This is
particularly useful when the camera cannot face the scene orthogonally
and the object is very far from the camera.
|