2012 Israel Computer Vision Day
Sunday, December 9, 2012


Vision Day Schedule

Mica Arie-Nachimson, Shahar Kovalsky, Ira Kemelmacher-Shlizerman, Amit Singer and Ronen Basri (Weizmann, U. Washington, Princeton) -- Global Motion Estimation from Point Matches

Maria Kushnir and Ilan Shimshoni (Univ. Haifa) -- Epipolar Geometry Estimation for Urban Scenes with Repetitive Structures

Yael Moses, Tali Basha and Shai Avidan (IDC, TAU) -- Photo Sequencing

Idan Ram, Michael Elad and Israel Cohen (Technion) -- Image Processing using Reordering of its Patches

Coffee Break

Shmuel Peleg and Yair Poleg (HUJI) -- Mosaicing of Non-Overlapping Images

George Leifman, Elizabeth Shtrom and Ayellet Tal (Technion) -- Surface Regions of Interest for Viewpoint Selection

Tammy Avraham, Ilya Gurvich and Micha Lindenbaum (Technion) -- Learning Implicit Transfer for Person Re-identification

13:50-14:00

Meirav Galun and Shai Bagon (Weizmann) -- A Unified Multiscale Framework for Discrete Energy Minimization

Tal Hassner, Viki Mayzels and Lihi Zelnik-Manor (Open U, Weizmann, Technion) -- Subspaces, SIFTs, and Scale Invariance

Alon Zweig and Daphna Weinshall (HUJI) -- Hierarchical Regularization Cascade for Joint Learning

Anastasia Dubrovina and Ron Kimmel (Technion) -- Multi-region image segmentation with a single level set function

Coffee Break

Anat Levin, Boaz Nadler, Fredo Durand and Bill Freeman (Weizmann, MIT) -- Patch Complexity, Finite Pixel Correlations and Optimal Denoising

Amit Goldstein and Raanan Fattal (HUJI) -- Blur-Kernel Estimation from Spectral Irregularities

Orit Kliper-Gross, Yaron Gurovich, Tal Hassner and Lior Wolf (Weizmann, TAU, Open U.) -- Motion Interchange Patterns for Action Recognition in Unconstrained Videos

Global Motion Estimation from Point Matches

Mica Arie-Nachimson (Weizmann), Shahar Kovalsky (Weizmann), Ira Kemelmacher-Shlizerman (U of Washington), Amit Singer (Princeton) and Ronen Basri (Weizmann)

Multiview structure recovery from a collection of images requires the recovery of the positions and orientations of the cameras relative to a global coordinate system. We present an approach that recovers camera motion as a sequence of two global optimizations: First, pairwise Essential Matrices are used to recover the global rotations by applying robust optimization using either spectral or semidefinite programming relaxations. Then, we directly employ feature correspondences across images to recover the global translation vectors using a linear algorithm based on a novel decomposition of the Essential Matrix. Our method is efficient and, as demonstrated in our experiments, achieves highly accurate results on collections of real images for which ground truth measurements are available.
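The rotation stage can be illustrated with a plain NumPy sketch of spectral rotation averaging on noise-free pairwise rotations. The convention R_ij = R_i R_j^T, the function name, and the gauge fix are assumptions made for illustration; this is not the authors' implementation.

```python
import numpy as np

def average_rotations(rel_rot, n):
    """Spectral rotation averaging: recover global rotations R_1..R_n from
    pairwise estimates R_ij ~ R_i @ R_j.T (an assumed, common convention)."""
    G = np.zeros((3 * n, 3 * n))
    for (i, j), Rij in rel_rot.items():
        G[3*i:3*i+3, 3*j:3*j+3] = Rij
        G[3*j:3*j+3, 3*i:3*i+3] = Rij.T
    for i in range(n):
        G[3*i:3*i+3, 3*i:3*i+3] = np.eye(3)

    # Noise-free, G = M @ M.T with M the stacked rotations, so the top-3
    # eigenvectors of G span the column space of M.
    _, V = np.linalg.eigh(G)
    M = V[:, -3:] * np.sqrt(n)
    if np.linalg.det(M[:3, :]) < 0:    # resolve the reflection ambiguity
        M[:, -1] *= -1

    rots = []
    for i in range(n):
        B = M[3*i:3*i+3, :]
        U, _, Vt = np.linalg.svd(B)    # project each 3x3 block onto SO(3)
        S = np.diag([1.0, 1.0, np.sign(np.linalg.det(U @ Vt))])
        rots.append(U @ S @ Vt)
    # Fix the global gauge so the first camera has the identity rotation.
    R0 = rots[0]
    return [R @ R0.T for R in rots]
```

With noisy input the same eigenvector computation acts as a least-squares relaxation, which is what makes the approach robust in practice.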


Epipolar Geometry Estimation for Urban Scenes with Repetitive Structures

Maria Kushnir and Ilan Shimshoni -- Univ. Haifa

Algorithms for estimating epipolar geometry from a pair of images have been very successful in recent years and are able to deal with wide-baseline images, succeeding even when the percentage of correct matches in the initial set is very low. In this paper the problem of scenes with repeated structures is addressed, concentrating on the common case of building facades. In these cases a large number of repeated features is found and cannot be matched initially, causing state-of-the-art algorithms to fail. Our algorithm therefore clusters similar features in each of the two images and matches clusters of features. From these cluster pairs, a set of hypothesized homographies of the building facade is generated and ranked mainly according to the support of matches of non-repeating features. Then, in a separate step, the epipole is recovered, yielding the fundamental matrix. The algorithm then decides whether the fundamental matrix has been recovered reliably enough; if not, it returns only the homography. The algorithm has been tested successfully on a large number of pairs of images of buildings from the benchmark ZuBuD database, for which several state-of-the-art algorithms nearly always fail.
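The ranking step, scoring each hypothesized homography by the number of matches it supports, can be sketched generically. The point arrays, threshold, and function names here are hypothetical stand-ins, not the authors' code:

```python
import numpy as np

def homography_support(H, pts_a, pts_b, thresh=3.0):
    """Count matches (pts_a[k] -> pts_b[k]) consistent with homography H,
    i.e. whose transfer error is below `thresh` pixels."""
    ones = np.ones((len(pts_a), 1))
    proj = np.hstack([pts_a, ones]) @ H.T    # homogeneous transfer
    proj = proj[:, :2] / proj[:, 2:3]        # back to inhomogeneous coords
    err = np.linalg.norm(proj - pts_b, axis=1)
    return int(np.sum(err < thresh))

def rank_homographies(hypotheses, pts_a, pts_b):
    """Return the hypotheses sorted by decreasing match support."""
    return sorted(hypotheses, key=lambda H: -homography_support(H, pts_a, pts_b))
```

In the paper's setting the support would come from non-repeating feature matches, which is what disambiguates between the repeated-structure hypotheses.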


Photo Sequencing

Yael Moses (IDC), Tali Basha (TAU) and Shai Avidan (TAU)

Dynamic events such as family gatherings, concerts or sports events are often captured by a group of people. The set of still images obtained this way is rich in dynamic content but lacks accurate temporal information. We propose a method for photo sequencing -- temporally ordering a set of still images taken asynchronously by a set of uncalibrated cameras. Photo sequencing is an essential tool in analyzing (or visualizing) a dynamic scene captured by still images. The first step of the method detects sets of corresponding static and dynamic feature points across images. The static features are used to determine the epipolar geometry between pairs of images, and each dynamic feature votes for the temporal order of the images in which it appears. The partial orders provided by the dynamic features are not necessarily consistent, and we use rank aggregation to combine them into a globally consistent temporal order of images.
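The final aggregation step can be illustrated with a simple Borda-style scheme: each partial order contributes a normalized rank for the images it covers, and images are sorted by their mean rank. This is a generic sketch; the paper's rank-aggregation method may differ.

```python
def aggregate_partial_orders(partial_orders, n):
    """Combine partial temporal orders over images 0..n-1 into one global
    order by averaging each image's normalized rank over the partial orders
    in which it appears (a simple Borda-style aggregation)."""
    total = [0.0] * n
    seen = [0] * n
    for order in partial_orders:
        if len(order) < 2:
            continue                               # a lone image carries no order
        for rank, img in enumerate(order):
            total[img] += rank / (len(order) - 1)  # normalize rank to [0, 1]
            seen[img] += 1
    mean = [total[i] / seen[i] if seen[i] else 0.5 for i in range(n)]
    return sorted(range(n), key=lambda i: mean[i])
```

Averaging over many votes is what lets a globally consistent order emerge even when individual dynamic features disagree.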



Image Processing using Reordering of its Patches

Idan Ram, Michael Elad and Israel Cohen -- Technion

What if we take all the overlapping patches from a given image and organize them to create the shortest path by using their mutual distances? This induces a permutation of the image pixels in a way that creates a 1D signal with maximal regularity. What could we do with such a construction?

In this talk we show that this process enables simple and intuitive methods for image processing tasks such as denoising and inpainting, leading to state-of-the-art results. Furthermore, we show how such reordering of the patches in several scales can lead to a new wavelet transform which efficiently represents images. We demonstrate the use of this new transform for various image processing applications, and tie it to the BM3D algorithm.
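The core idea, order the patches into a short path, smooth the resulting 1D signal, and undo the permutation, can be sketched as follows. The greedy nearest-neighbor ordering and the moving-average smoother are simplified stand-ins for the paper's shortest-path construction and filters:

```python
import numpy as np

def greedy_patch_order(patches):
    """Greedy nearest-neighbor approximation of the shortest path
    through the given list of flattened patches."""
    n = len(patches)
    unvisited = set(range(1, n))
    order = [0]
    while unvisited:
        last = patches[order[-1]]
        nxt = min(unvisited, key=lambda j: np.sum((patches[j] - last) ** 2))
        unvisited.remove(nxt)
        order.append(nxt)
    return order

def denoise_by_reordering(img, patch=3, k=5):
    """Reorder pixels by patch similarity, smooth the 1D signal, reassemble."""
    h, w = img.shape
    r = patch // 2
    pad = np.pad(img, r, mode='reflect')
    patches = [pad[i:i + patch, j:j + patch].ravel()
               for i in range(h) for j in range(w)]
    order = greedy_patch_order(patches)
    sig = img.ravel()[order]                       # 1D signal with high regularity
    kern = np.ones(k) / k                          # simple moving-average smoother
    smooth = np.convolve(np.pad(sig, k // 2, mode='edge'), kern, mode='valid')
    out = np.empty_like(sig)
    out[order] = smooth                            # undo the permutation
    return out.reshape(h, w)
```

Because similar patches end up adjacent along the path, even a crude 1D smoother averages like pixels with like pixels, which is the intuition behind the reordering.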

Mosaicing of Non-Overlapping Images

Yair Poleg and Shmuel Peleg -- HUJI

Image alignment and mosaicing are usually performed on a set of overlapping images, using features in the area of overlap for alignment and for seamless stitching. Without image overlap current methods are helpless, and this is the case we address in this paper. So if a traveler wants to create a panoramic mosaic of a scene from pictures he has taken, but realizes back home that his pictures do not overlap, there is still hope. The proposed process has three stages: (i) Images are extrapolated beyond their original boundaries, hoping that the extrapolated areas will cover the gaps between them. This extrapolation becomes more blurred as we move away from the original image. (ii) The extrapolated images are aligned and their relative positions recovered. (iii) The gaps between the images are inpainted to create a seamless mosaic image.

Surface Regions of Interest for Viewpoint Selection

George Leifman, Elizabeth Shtrom and Ayellet Tal -- Technion

While the detection of interesting regions in images has been extensively studied, relatively few papers have addressed surfaces. This paper proposes an algorithm for detecting the regions of interest of surfaces. It looks for regions that are distinct both locally and globally, and accounts for the distance to the foci of attention. Many applications can utilize these regions. In this paper we explore one such application: viewpoint selection. The most informative views are those that collectively provide the most descriptive presentation of the surface. We show that our results compare favorably with the state-of-the-art.



Learning Implicit Transfer for Person Re-identification  

Tammy Avraham, Ilya Gurvich and Micha Lindenbaum -- Technion


The re-identification problem has received increasing attention in the last five to six years, especially due to its important role in surveillance systems. It is desirable that computer vision systems be able to keep track of people after they have left the field of view of one camera and entered the field of view of the next, even when these fields of view do not overlap. This work proposes a novel approach for pedestrian re-identification. Previous re-identification methods use one of three approaches: invariant features; metrics designed to bring instances of shared identities close to one another and instances of different identities far from one another; or a learned transformation from the appearance in one domain to the other. Our implicit approach models camera transfer by a binary relation R = {(x, y) | x and y describe the same person seen from cameras A and B respectively}. This formulation implies that the camera transfer function is a multi-valued mapping rather than a single-valued transformation, and does not assume the existence of a metric with desirable properties. We present an algorithm that follows this approach and achieves new state-of-the-art performance.


A Unified Multiscale Framework for Discrete Energy Minimization

Shai Bagon and Meirav Galun -- Weizmann

Discrete energy minimization is a ubiquitous task in computer vision, yet it is NP-hard in most cases. In this work we propose a multiscale framework for coping with the NP-hardness of discrete optimization. Our approach utilizes algebraic multiscale principles to efficiently explore the discrete solution space, yielding improved results on challenging, non-submodular energies for which current methods provide unsatisfactory approximations. In contrast to popular multiscale methods in computer vision, which build an image pyramid, our framework acts directly on the energy to construct an energy pyramid. Deriving a multiscale scheme from the energy itself makes our framework application-independent and widely applicable. Our framework gives rise to two complementary energy coarsening strategies: one in which coarser scales involve fewer variables, and a more unconventional one in which the coarser scales involve fewer discrete labels. We empirically evaluated our unified framework on a variety of both non-submodular and submodular energies, including energies from the Middlebury benchmark.


Subspaces, SIFTs, and Scale Invariance  

Tal Hassner (Open U), Viki Mayzels (Weizmann) and Lihi Zelnik-Manor (Technion)

Scale invariant feature detectors often find stable scales in only a few image pixels. Consequently, methods for feature matching typically choose one of two extreme options: matching a sparse set of scale invariant features, or dense matching using arbitrary scales. In this talk we turn our attention to the overwhelming majority of pixels, those where stable scales are not found by standard techniques. We ask, is scale-selection necessary for these pixels, when dense, scale-invariant matching is required and if so, how can it be achieved? We will show the following: (i) Features computed over different scales, even in low-contrast areas, can be different; selecting a single scale, arbitrarily or otherwise, may lead to poor matches when the images have different scales. (ii) Representing each pixel as a set of SIFTs, extracted at multiple scales, allows for far better matches than single-scale descriptors, but at a computational price. Finally, (iii) each such set may be accurately represented by a low-dimensional, linear subspace. A subspace-to-point mapping may further be used to produce a novel descriptor representation, the Scale-Less SIFT (SLS), as an alternative to single-scale descriptors.
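Point (iii) can be illustrated with a standard subspace-to-point embedding: fit a low-dimensional subspace to the multi-scale descriptor set, then vectorize its projection matrix so that Euclidean distances between embedded points equal Frobenius distances between subspaces. Random vectors stand in here for actual multi-scale SIFTs, and this is a sketch of the general construction, not the SLS code itself:

```python
import numpy as np

def subspace_basis(descs, k=2):
    """Orthonormal basis (d x k) of the best k-dim subspace fitting the
    rows of `descs`, an (num_scales x d) stack of per-scale descriptors."""
    _, _, Vt = np.linalg.svd(descs, full_matrices=False)
    return Vt[:k].T

def subspace_to_point(basis):
    """Map the subspace spanned by `basis` to a point: vectorize the upper
    triangle of its projection matrix, scaling off-diagonals by sqrt(2) so
    that Euclidean distance between points equals the Frobenius distance
    between projection matrices."""
    P = basis @ basis.T                    # projection matrix, basis-invariant
    iu = np.triu_indices(P.shape[0])
    v = P[iu] * np.sqrt(2.0)
    v[iu[0] == iu[1]] /= np.sqrt(2.0)      # diagonal entries keep weight 1
    return v
```

Because the projection matrix does not depend on the choice of basis, the embedding is a well-defined fixed-length descriptor for the whole set of scales.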



Hierarchical Regularization Cascade for Joint Learning

Alon Zweig and Daphna Weinshall -- HUJI

As the sheer volume of available visual categorization benchmark datasets increases, the problem of joint learning of classifiers becomes more and more relevant. We present a hierarchical approach which exploits information sharing among different classification tasks, in multi-task, multi-class and knowledge-transfer settings. It engages a top-down iterative method, which begins by posing an optimization problem with an incentive for large-scale sharing among all classes. This incentive to share is gradually decreased, until there is no sharing and all tasks are considered separately. The method therefore exploits different levels of sharing within a given group of related tasks, without having to make hard decisions about the grouping of tasks. In order to deal with large-scale problems, with many tasks and many classes, we extend our batch approach to an online setting and provide regret analysis of the algorithm. We tested our approach extensively on synthetic and real visual categorization datasets, showing significant improvement over baseline and state-of-the-art methods.




Multi-region image segmentation with a single level set function

Anastasia Dubrovina and Ron Kimmel -- Technion   

Segmenting an image into semantically similar parts is at the core of image understanding. Many formulations of the task have been suggested over the years. While axiomatic functionals, such as the Mumford-Shah model, are hard to implement and analyze, graph-based alternatives impose a non-geometric metric on the problem. The latter are sometimes preferred by computer scientists who are trained to optimize and implement such formulations, at the expense of throwing away the geometric nature of the problem. Here, we tackle the most basic image quantization, or piecewise-constant segmentation, problem, while regularizing the boundaries between the regions by a weighted Euclidean arc-length. The problem is shown to be related to the original Mumford-Shah functional, and formalized as a level set evolution equation. Yet, unlike most existing methods, the evolution is executed using a single non-negative level set function, through the Voronoi Implicit Interface Method for multi-phase interface evolution. The proposed framework has been applied to synthetic and real images with various numbers of regions, and compared to state-of-the-art algorithms for image segmentation.



Patch Complexity, Finite Pixel Correlations and Optimal Denoising

Anat Levin (Weizmann), Boaz Nadler (Weizmann), Fredo Durand (MIT) and Bill Freeman (MIT)

Image restoration tasks are ill-posed problems, typically solved with priors. Since the optimal prior is the exact unknown density of natural images, actual priors are only approximate and typically restricted to small patches. This raises several questions: How much may we hope to improve current restoration results with future sophisticated algorithms? And more fundamentally, even with perfect knowledge of natural image statistics, what is the inherent ambiguity of the problem? In addition, since most current methods are limited to finite-support patches or kernels, what is the relation between the patch complexity of natural images, patch size, and restoration errors? Focusing on image denoising, we make several contributions. First, in light of computational constraints, we study the relation between denoising gain and sample size requirements in a non-parametric approach. We present a law of diminishing returns, namely that with increasing patch size, rare patches not only require a much larger dataset, but also gain little from it. This result suggests novel adaptive variable-sized patch schemes for denoising. Second, we study absolute denoising limits, regardless of the algorithm used, and the convergence rate to them as a function of patch size. Scale invariance of natural images plays a key role here and implies both a strictly positive lower bound on denoising and a power-law convergence. Extrapolating this parametric law gives a ballpark estimate of the best achievable denoising, suggesting that some improvement, although modest, is still possible.

Blur-Kernel Estimation from Spectral Irregularities

Amit Goldstein and Raanan Fattal -- HUJI  

We describe a new method for recovering the blur in motion-blurred images based on the statistical irregularities their power spectrum exhibits. This is achieved by a power-law model that refines the one traditionally used for describing natural images. The new model better accounts for biases arising from the presence of large and strong edges in the image. We use this model together with an accurate spectral whitening formula to estimate the power spectrum of the blur. The blur kernel is then recovered using a phase retrieval algorithm with improved convergence and disambiguation capabilities. Unlike many existing methods, the new approach does not perform a maximum a posteriori estimation, which involves repeated reconstructions of the latent image, and hence offers favorable running times. We compare the new method with state-of-the-art methods and report various advantages, both in terms of efficiency and accuracy.


Motion Interchange Patterns for Action Recognition in Unconstrained Videos

Orit Kliper-Gross (Weizmann), Yaron Gurovich (TAU), Tal Hassner (Open U) and Lior Wolf (TAU)


Action recognition in videos is an active research field that is fueled by an acute need spanning several application domains. Still, existing systems fall short of the applications' needs in real-world scenarios, where the quality of the video is less than optimal and the viewpoint is uncontrolled and often not static. In this talk, we present an action recognition system in which we consider the key elements of motion encoding and focus on capturing local changes in motion directions. In addition, we present how we decouple image edges from motion edges using a suppression mechanism, and compensate for global camera motion using an especially fitted registration scheme. Combined with a standard bag-of-words technique, our method achieves state-of-the-art performance on the most recent and challenging benchmarks.