Abstracts


Global
Motion Estimation from Point Matches

Mica Arie-Nachimson (Weizmann), Shahar
Kovalsky (Weizmann), Ira Kemelmacher-Shlizerman (U of Washington), Amit Singer
(Princeton) and Ronen Basri (Weizmann)

Multi-view structure recovery from a collection
of images requires the recovery of the positions and
orientations of the cameras relative to a global coordinate system. We present
an approach that recovers camera motion as a sequence of two
global optimizations: First, pairwise Essential Matrices are used
to recover the global rotations by applying robust optimization
using either spectral or semidefinite programming relaxations.
Then, we directly employ feature correspondences across images
to recover the global translation vectors using a linear
algorithm based on a novel decomposition of the Essential Matrix.
Our method is efficient and, as demonstrated in our experiments, achieves
highly accurate results on collections of real images for which ground
truth measurements are available.
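The first global optimization, recovering rotations from pairwise relative rotations via a spectral relaxation, can be sketched in its noiseless essence as follows. This is a minimal illustration only, with our own function names; the authors' method additionally handles noise, outliers and missing pairs robustly.

```python
import numpy as np

def random_rotation(rng):
    # QR of a random Gaussian matrix yields a rotation matrix
    q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
    if np.linalg.det(q) < 0:
        q[:, 0] *= -1
    return q

def spectral_rotation_averaging(rel_rot, n):
    """Recover n global rotations (up to one global rotation) from
    pairwise relative rotations rel_rot[(i, j)] ~ R_i @ R_j.T."""
    M = np.zeros((3 * n, 3 * n))
    for (i, j), Rij in rel_rot.items():
        M[3*i:3*i+3, 3*j:3*j+3] = Rij
        M[3*j:3*j+3, 3*i:3*i+3] = Rij.T
    for i in range(n):  # diagonal blocks: R_i @ R_i.T = I
        M[3*i:3*i+3, 3*i:3*i+3] = np.eye(3)
    # M = R @ R.T has rank 3; its top-3 eigenvectors span the stacked R
    w, V = np.linalg.eigh(M)
    U = V[:, -3:] * np.sqrt(n)
    if np.linalg.det(U[:3, :]) < 0:  # fix the global reflection ambiguity
        U = U @ np.diag([1.0, 1.0, -1.0])
    rotations = []
    for i in range(n):  # project each 3x3 block onto SO(3)
        u, _, vt = np.linalg.svd(U[3*i:3*i+3, :])
        rotations.append(u @ vt)
    return rotations
```

Each recovered rotation is determined only up to a single global rotation, which is why consistency is checked on the pairwise products rather than on the rotations themselves.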

Epipolar
Geometry Estimation for Urban Scenes with Repetitive Structures

Maria Kushnir and
Ilan Shimshoni – Haifa Univ

Algorithms for the
estimation of epipolar geometry from a pair of images have been very
successful in recent years, being able to deal with wide baseline images. The
algorithms succeed even when the percentage of correct matches from the
initial set of matches is very low. In this paper the problem of scenes with
repeated structures is addressed, concentrating on the common case of
building facades. In these cases a large number of repeated features is found
and cannot be matched initially, causing state-of-the-art algorithms to
fail. Our algorithm therefore clusters similar features in each of the two
images and matches clusters of features. From these cluster pairs, a set of
hypothesized homographies of the building facade is generated and ranked
mainly according to the support of matches of non-repeating features. Then, in a
separate step, the epipole is recovered, yielding the fundamental matrix. The
algorithm then decides whether the fundamental matrix has been recovered
reliably enough and, if not, returns only the homography. The algorithm has
been tested successfully on a large number of pairs of images of buildings
from the benchmark ZuBuD database, for which several state-of-the-art
algorithms nearly always fail.

Photo Sequencing

Yael Moses (IDC), Tali Basha (TAU) and Shai Avidan (TAU)

Dynamic
events such as family gatherings, concerts or sports events are often
captured by a group of people. The set of still images obtained this way is
rich in dynamic content but lacks accurate temporal information. We propose a
method for photo sequencing: temporally ordering a set of still images
taken asynchronously by a set of uncalibrated cameras. Photo sequencing
is an essential tool in analyzing (or visualizing) a dynamic scene captured
by still images. The first step of the method detects sets of corresponding
static and dynamic
feature points across images. The static features are used to determine the
epipolar geometry between pairs of images, and each dynamic feature
votes for the temporal order of the images in which it appears. The partial
orders provided by the dynamic features are not necessarily consistent, and
we use rank aggregation to combine them into a globally consistent temporal
order of images.
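The last step, combining inconsistent pairwise orderings into one global order, can be illustrated with a simple score-based aggregation. This is a toy stand-in with our own naming; the paper's rank-aggregation machinery is more elaborate.

```python
from collections import defaultdict

def aggregate_order(pairwise_votes, n_images):
    """Combine 'image a appears before image b' votes, which may be
    mutually inconsistent, into one global order by net-win scoring."""
    score = defaultdict(int)
    for a, b in pairwise_votes:
        score[a] -= 1  # voted earlier: lower score
        score[b] += 1  # voted later: higher score
    return sorted(range(n_images), key=lambda i: score[i])
```

A majority of consistent votes outweighs a few contradictory ones, which is the point of aggregating rather than trusting any single dynamic feature.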

Image Processing using Reordering of its Patches

Idan Ram, Michael Elad, and Israel Cohen – Technion

What if we take all the overlapping patches from a given image and organize
them to create the shortest path by using their mutual distances? This
induces a permutation of the image pixels in a way that creates a 1D signal
with maximal regularity. What could we do with such a construction?
In this talk we show that this process enables simple and
intuitive methods for image processing tasks such as denoising and
inpainting, leading to state-of-the-art results. Furthermore, we show how
such reordering of the patches in several scales can lead to a new wavelet
transform which efficiently represents images. We demonstrate the use of
this new transform for various image processing applications, and tie it to
the BM3D algorithm.
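Finding the truly shortest path through all patches is a traveling-salesman problem and NP-hard, so in practice a greedy approximation is natural. A minimal sketch, with illustrative names of our own:

```python
import numpy as np

def greedy_patch_ordering(patches):
    """Approximate the shortest path through the rows of `patches`
    by repeatedly hopping to the nearest unvisited patch."""
    n = len(patches)
    visited = np.zeros(n, dtype=bool)
    order = [0]
    visited[0] = True
    for _ in range(n - 1):
        d = np.linalg.norm(patches - patches[order[-1]], axis=1)
        d[visited] = np.inf          # never revisit a patch
        nxt = int(np.argmin(d))
        order.append(nxt)
        visited[nxt] = True
    return order
```

Reading the center pixels in this order yields the regular 1-D signal to which simple smoothing or interpolation can then be applied for denoising and inpainting.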

Mosaicing of Non-Overlapping Images

Yair Poleg and Shmuel Peleg – HUJI

Image alignment and mosaicing are usually performed on a set of overlapping
images, using features in the area of overlap for alignment and for seamless
stitching. Without image overlap current methods are helpless, and this is the
case we address in this paper. So if a traveler wants to create a panoramic
mosaic of a scene from pictures he has taken, but realizes back home that his
pictures do not overlap, there is still hope. The proposed process has three
stages: (i) Images are extrapolated beyond their original boundaries, hoping
that the extrapolated areas will cover the gaps between them. This
extrapolation becomes more blurred as we move away from the original image.
(ii) The extrapolated images are aligned and their relative positions
recovered. (iii) The gaps between the images are inpainted to create a
seamless mosaic image.
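Stage (i) can be illustrated in one dimension: extend the signal past its boundary with estimates that are blurred more aggressively with distance. This is a toy sketch under our own widening-window scheme, not the paper's extrapolation method.

```python
import numpy as np

def extrapolate_1d(signal, pad, growth=0.5):
    """Extend a 1-D signal beyond its right boundary; the averaging
    window widens with distance, so samples get blurrier farther out."""
    out = list(signal)
    for k in range(pad):
        w = min(len(out), 2 + int(growth * k))
        out.append(float(np.mean(out[-w:])))
    return np.array(out)
```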

Surface Regions of Interest for Viewpoint
Selection

George Leifman, Elizabeth Shtrom and
Ayellet Tal – Technion

While the detection of the interesting regions in images has been extensively
studied, relatively few papers have addressed surfaces. This paper proposes
an algorithm for detecting the regions of interest of surfaces. It looks for
regions that are distinct both locally and globally and accounts for the
distance to the foci of attention. Many applications can utilize these
regions. In this paper we explore one such application—viewpoint selection.
The most informative views are those that collectively provide the most
descriptive presentation of the surface. We show that our results compete
favorably with state-of-the-art results.

Learning Implicit Transfer for Person Re-identification

Tammy Avraham, Ilya Gurvich and Micha
Lindenbaum – Technion

The re-identification problem has
received increasing attention in the last five to six years, especially due
to its important role in surveillance systems. It is desirable that computer
vision systems be able to keep track of people after they have left the
field of view of one camera and entered the field of view of the next, even
when these fields of view do not overlap. This work proposes a novel approach
for pedestrian re-identification. Previous re-identification methods use one
of three approaches: invariant features; designing metrics that aim to bring
instances of shared identities close to one another and instances of
different identities far from one another; or learning a transformation from
the appearance in one domain to the other. Our implicit approach models
camera transfer by a binary relation R = {(x, y) | x and y describe
the same person seen from cameras A and B, respectively}. This solution
implies that the camera transfer function is a multi-valued mapping and not a
single-valued transformation, and does not assume the existence of a metric
with desirable properties. We present an algorithm that follows this approach
and achieves new state-of-the-art performance.
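A minimal instance of this idea, learning the relation R directly as a binary classifier over descriptor pairs, might look as follows. Logistic regression here is our simple stand-in for the paper's learner, and all names are illustrative.

```python
import numpy as np

def train_pair_classifier(pos_pairs, neg_pairs, lr=0.1, iters=2000):
    """Learn R = {(x, y) | same person in cameras A and B} as a binary
    classifier on concatenated descriptors, instead of learning a
    single-valued transfer function or a metric."""
    X = np.vstack([np.hstack(p) for p in pos_pairs + neg_pairs])
    y = np.array([1.0] * len(pos_pairs) + [0.0] * len(neg_pairs))
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(iters):  # plain gradient descent on the logistic loss
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        g = p - y
        w -= lr * X.T @ g / len(y)
        b -= lr * g.mean()
    return lambda x, y2: 1.0 / (1.0 + np.exp(-(np.hstack([x, y2]) @ w + b)))
```

Because the classifier scores pairs, several distinct appearances in camera B can all score highly for one appearance in camera A, which is exactly the multi-valued behavior a single transfer function cannot express.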

A Unified Multiscale Framework for Discrete
Energy Minimization

Shai Bagon and
Meirav Galun – Weizmann

Discrete energy minimization is a ubiquitous task in computer vision, yet is
NP-hard in most cases. In this work we propose a multiscale framework for
coping with the NP-hardness of discrete optimization. Our approach utilizes
algebraic multiscale principles to efficiently explore the discrete solution
space, yielding improved results on challenging, non-submodular energies for
which current methods provide unsatisfactory approximations. In contrast to
popular multiscale methods in computer vision, which build an image pyramid,
our framework acts directly on the energy to construct an energy pyramid.
Deriving a multiscale scheme from the energy itself makes our framework
application independent and widely applicable. Our framework gives rise to
two complementary energy coarsening strategies: one in which coarser scales
involve fewer variables, and a more revolutionary one in which the coarser scales
involve fewer discrete labels. We empirically evaluated our unified framework
on a variety of both non-submodular and submodular energies, including
energies from the Middlebury benchmark.

Subspaces, SIFTs, and Scale Invariance

Tal
Hassner (Open U), Viki Mayzels (Weizmann) and Lihi Zelnik-Manor (Technion)

Scale invariant feature detectors often find stable scales in only a
few image pixels. Consequently, methods for feature matching typically choose
one of two extreme options: matching a sparse set of scale invariant
features, or dense matching using arbitrary scales. In this talk we turn our
attention to the overwhelming majority of pixels, those where stable scales
are not found by standard techniques. We ask: is scale selection necessary
for these pixels when dense, scale-invariant matching is required, and if so,
how can it be achieved? We will show the following: (i) Features computed
over different scales, even in low-contrast areas, can be different;
selecting a single scale, arbitrarily or otherwise, may lead to poor matches
when the images have different scales. (ii) Representing each pixel as a set
of SIFTs, extracted at multiple scales, allows for far better matches than
single-scale descriptors, but at a computational price. Finally, (iii) each
such set may be accurately represented by a low-dimensional, linear subspace.
A subspace-to-point mapping may further be used to produce a novel descriptor
representation, the Scale-Less SIFT (SLS), as an alternative to single-scale
descriptors.
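Point (iii) can be made concrete: the subspace fitted to a pixel's multi-scale descriptor set can be mapped to a single vector through its projection matrix, after which ordinary point distances compare subspaces. This is a minimal sketch of one standard subspace-to-point mapping, not the exact SLS construction.

```python
import numpy as np

def subspace_to_point(descriptors, k=2):
    """Represent a set of multi-scale descriptors (rows) by its best
    rank-k subspace, mapped to a single vector: take the top-k right
    singular vectors V, form the projection matrix P = V @ V.T, and
    flatten its upper triangle."""
    _, _, vt = np.linalg.svd(descriptors, full_matrices=False)
    V = vt[:k].T                      # d x k orthonormal basis
    P = V @ V.T                       # basis-independent representation
    iu = np.triu_indices(P.shape[0])
    scale = np.where(iu[0] == iu[1], 1.0, np.sqrt(2.0))
    return P[iu] * scale              # preserves the Frobenius norm
```

Two descriptor sets spanning the same subspace map to the same point regardless of the basis the SVD happens to return, which is what makes the representation usable as a descriptor.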

Hierarchical Regularization Cascade for Joint
Learning

Alon Zweig and Daphna Weinshall – HUJI

As the sheer volume of available visual categorization benchmark datasets
increases, the problem of joint learning of classifiers becomes more and more
relevant. We present a hierarchical approach which exploits information
sharing among different classification tasks, in multi-task, multi-class and
knowledge-transfer settings. It engages a top-down iterative method, which
begins by posing an optimization problem with an incentive for large-scale
sharing among all classes. This incentive to share is gradually decreased,
until there is no sharing and all tasks are considered separately. The method
therefore exploits different levels of sharing within a given group of
related tasks, without having to make hard decisions about the grouping of
tasks. In order to deal with large-scale problems, with many tasks and many
classes, we extend our batch approach to an online setting and provide a regret
analysis of the algorithm. We tested our approach extensively on synthetic
and real visual categorization datasets, showing significant improvement over
baseline and state-of-the-art methods.

Multi-region image
segmentation with a single level set function

Anastasia Dubrovina and Ron Kimmel –
Technion

Segmenting an image into semantically similar parts is at the core of
image understanding. Many formulations of the task have been suggested
over the years. While axiomatic functionals, such as the
Mumford-Shah model, are hard to implement and analyze, graph-based
alternatives impose a non-geometric metric on the problem. The latter are
sometimes preferred by computer scientists who are trained to optimize and
implement such formulations, at the expense of throwing away the
geometric nature of the problem. Here, we tackle the most basic problem of
image quantization, or piecewise constant segmentation,
while regularizing the boundaries between the regions by a weighted
Euclidean arc length. The problem is shown to be related to the
original Mumford-Shah functional, and formalized as a level set
evolution equation. Yet, unlike most existing methods, the evolution is
executed using a single non-negative level set function, through
the Voronoi Implicit Interface Method for a multiphase interface
evolution. The proposed framework has been applied to synthetic and real
images with various numbers of regions, and compared to
state-of-the-art algorithms for image segmentation.

Patch
Complexity, Finite Pixel Correlations and Optimal Denoising

Anat Levin (Weizmann), Boaz
Nadler (Weizmann), Fredo Durand (MIT) and Bill Freeman (MIT)

Image
restoration tasks are ill-posed problems, typically solved with priors. Since
the optimal prior is the exact unknown density of natural images, actual
priors are only approximate and typically restricted to small patches. This
raises several questions: How much may we hope to improve current restoration
results with future sophisticated algorithms? And more fundamentally, even
with perfect knowledge of natural image statistics, what is the inherent
ambiguity of the problem? In addition, since most current methods are limited
to finite support patches or kernels, what is the relation between the patch
complexity of natural images, patch size, and restoration errors? Focusing on
image denoising, we make several contributions. First, in light of
computational constraints, we study the relation between denoising gain and
sample size requirements in a non-parametric approach. We present a law of
diminishing returns, namely that with increasing patch size, rare patches not
only require a much larger dataset, but also gain little from it. This result
suggests novel adaptive variable-sized patch schemes for denoising. Second,
we study absolute denoising limits, regardless of the algorithm used, and the
convergence rate to them as a function of patch size. Scale invariance of
natural images plays a key role here and implies both a strictly positive
lower bound on denoising and a power-law convergence. Extrapolating this
parametric law gives a ballpark estimate of the best achievable denoising,
suggesting that some improvement, although modest, is still possible.
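The non-parametric estimator whose sample-size behavior is analyzed is essentially a likelihood-weighted average over a patch database. A 1-D toy of the MMSE-style estimate of a patch's center pixel, with illustrative names:

```python
import numpy as np

def denoise_center(noisy_patch, dataset, sigma):
    """Estimate the clean center pixel of a noisy patch as an average of
    database patch centers, weighted by the Gaussian likelihood that
    each database patch generated the noisy observation."""
    d2 = ((dataset - noisy_patch) ** 2).sum(axis=1)
    w = np.exp(-d2 / (2.0 * sigma ** 2))
    c = noisy_patch.size // 2
    return float((w * dataset[:, c]).sum() / w.sum())
```

The diminishing-returns phenomenon shows up here directly: as the patch dimension grows, a rare noisy patch finds almost no database patches with non-negligible weight unless the database grows enormously.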

Blur-Kernel Estimation from Spectral Irregularities

Amit Goldstein and Raanan Fattal – HUJI

We describe a new method for recovering the blur in
motion-blurred images based on statistical irregularities their power
spectrum exhibits. This is achieved by a power-law that refines the one
traditionally used for describing natural images. The new model better
accounts for biases arising from the presence of large and strong edges in
the image. We use this model together with an accurate spectral whitening
formula to estimate the power spectrum of the blur. The blur kernel is then
recovered using a phase retrieval algorithm with improved convergence and
disambiguation capabilities. Unlike many existing methods, the new approach
does not perform a maximum a posteriori estimation, which involves repeated
reconstructions of the latent image, and hence offers favorable running
times. We compare the new method with state-of-the-art methods and report
various advantages, both in terms of efficiency and accuracy.
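The spectral-whitening idea can be illustrated in one dimension: under a power-law prior for sharp signals, dividing the observed power spectrum by the law leaves approximately the blur's power spectrum, whose phase must then be retrieved separately. This toy sketch uses the classical 1/f^alpha law, not the paper's refined model.

```python
import numpy as np

def blur_power_spectrum_estimate(blurred, alpha=2.0):
    """Whitening sketch: if sharp signals have power spectra ~ 1/|f|^alpha,
    then observed_power / prior_power approximates |K(f)|^2 for the
    unknown blur kernel K (a phase retrieval step would recover K itself)."""
    power = np.abs(np.fft.rfft(blurred)) ** 2
    f = np.fft.rfftfreq(len(blurred))
    f[0] = f[1]  # avoid dividing by zero at the DC bin
    return power * f ** alpha  # same as power / (1 / f**alpha)
```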

Motion Interchange Patterns for Action
Recognition in Unconstrained Videos

Orit Kliper-Gross (Weizmann), Yaron Gurovich
(TAU), Tal Hassner (Open U) and Lior Wolf
(TAU)

Action Recognition in videos is an
active research
field that is fueled by an acute need, spanning several application
domains. Still, existing systems fall short of the applications' needs
in real world scenarios, where the quality of the video is
less than optimal and the viewpoint is uncontrolled and often not
static. In this talk, we present an action recognition system in which
we consider the key elements of motion encoding and focus on
capturing local changes in motion directions. In addition, we'll present how
we decouple image edges from motion edges using a suppression
mechanism, and compensate for global camera motion by using an
especially
fitted registration scheme. Combined with a standard bag-of-words
technique, our method achieves state-of-the-art performance in the most
recent and challenging benchmarks.
