
2013 Israel Computer Vision Day
Monday, January 20, 2014


 

Vision Day Schedule

 

Time        | Speaker and Collaborators | Affiliation | Title

08:50-09:20 | Gathering
09:20-09:40 | Yohay Swirski, Yoav Schechner | Technion | 3Descatter from Motion
09:45-10:05 | Yonatan Aflalo, Ron Kimmel | Technion | Spectral-MDS and Spectral-GMDS
10:10-10:30 | Tal Hassner | Open U. | Viewing Real-World Faces in 3D
10:35-10:55 | Yehonatan Goldman, Ilan Shimshoni, Ehud Rivlin | Technion, Haifa | Robust Epipolar Geometry Estimation Using Noisy Pose Prior
11:00-11:30 | Coffee Break
11:30-11:50 | Simon Korman, Daniel Reichman, Gilad Tsur, Shai Avidan | TAU, Weizmann | FAsT-Match: Fast Affine Template Matching
11:55-12:15 | Elhanan Elboher, Michael Werman, Yacov Hel-Or | HUJI, IDC | The Generalized Laplacian Distance and its Applications for Visual Matching
12:20-12:40 | Alexandra Gilinsky, Lihi Zelnik-Manor | Technion | SIFTpack: a Compact Representation for Efficient SIFT Matching
12:45-13:05 | Ofir Pele, Ben Taskar | Ariel | The Tangent Earth Mover's Distance
13:05-14:10 | Lunch
14:10-14:20 | "Intermezzo"
14:20-14:40 | Raja Giryes, Michael Elad | Technion | Sparsity based Poisson Denoising with Dictionary Learning
14:45-15:05 | Tomer Michaeli, Michal Irani | Weizmann | Nonparametric Blind Super Resolution
15:10-15:30 | Dror Sholomon, Omid David, Nathan S. Netanyahu | BIU | A Genetic Algorithm-Based Solver for Very Large Jigsaw Puzzles
15:35-15:55 | Dan Levi, Shai Silberstein, Aharon Bar-Hillel | GM | Fast multiple-part based object detection using KD-Ferns
16:00-16:20 | Coffee Break
16:20-16:40 | Shahar Gino, Orly Goitein, Eli Konen, Hedva Spitzer | TAU | Video Stabilization and Region-Of-Interest tracking in non-rigid object Cardiac MRI
16:45-17:05 | Yair Hanani, Lior Wolf, Tal Hassner | TAU | The Piggyback Video Representation
17:10-17:30 | Shimrit Haber, Yossi Keller | BIU | A probabilistic graph-based framework for multi-cue visual tracking

 

 

The Computer Vision Day is sponsored by:

 

RTC Vision        GM        Mobileye



 

 

 

 

Abstracts

 

3Descatter from Motion

Yohay Swirski and Yoav Schechner - Technion


Spatio-temporal irradiance variations are created by some structured-light setups. They also occur naturally underwater, where they are termed flicker. Methods for overcoming or exploiting flicker or scatter exist, when the imaging geometry is static or quasi-static. We generalize operations to a free-moving platform that carries standard frame-rate stereo cameras. The 3D scene structure is illumination invariant. Thus, as a reference for motion estimation, we use stereoscopic range maps, rather than object radiance. Consequently, each object point can be tracked and then filtered in time, yielding deflickered videos and 3D recovery, as well as descattering.

 

Spectral-MDS and Spectral-GMDS

Yonatan Aflalo and Ron Kimmel - Technion


Multidimensional scaling (MDS) is a family of methods that embed a given set of points into a simple, usually flat, domain. The points are assumed to be sampled from some metric space, and the mapping attempts to preserve the distances between each pair of points in the set. Distances in the target space can be computed analytically in this setting. Generalized MDS is an extension that allows mapping one metric space into another, that is, multidimensional scaling into target spaces in which distances are evaluated numerically rather than analytically. Here, we propose an efficient approach for computing such mappings between surfaces based on their natural spectral decomposition, where the surfaces are treated as sampled metric spaces. The resulting spectral-GMDS procedure enables efficient embedding by implicitly incorporating smoothness of the mapping into the problem, thereby substantially reducing the complexity involved in its solution while practically overcoming its seemingly non-convex nature. The method is compared to existing techniques that compute dense correspondence between shapes. Numerical experiments demonstrate the efficiency and accuracy of the proposed method compared to state-of-the-art approaches.
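As background, the plain (non-generalized) MDS baseline with a Euclidean target can be sketched in a few lines: classical MDS double-centers the squared distance matrix and reads the embedding off the top eigenpairs. This is only the textbook construction, not the spectral-GMDS of the talk:

```python
import numpy as np

def classical_mds(D, dim=2):
    """Classical MDS: embed points into R^dim so that Euclidean distances
    approximate the given distance matrix D.  Double-center the squared
    distances to obtain a Gram matrix, then take the top eigenpairs."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ (D ** 2) @ J              # Gram matrix of the embedding
    vals, vecs = np.linalg.eigh(B)           # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:dim]       # pick the top `dim` eigenpairs
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0))

# Distances taken from actual planar points are reproduced exactly
# (up to a rigid transform of the embedding).
pts = np.array([[0., 0.], [1., 0.], [0., 2.], [3., 1.]])
D = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)
X = classical_mds(D)
D2 = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
```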

 

Viewing Real-World Faces in 3D

Tal Hassner – Open University


We present a data-driven method for estimating the 3D shapes of faces viewed in single, unconstrained photos (aka "in-the-wild"). Our method was designed with an emphasis on robustness and efficiency - with the explicit goal of deployment in real-world applications which reconstruct and display faces in 3D. Our key observation is that for many practical applications, warping the shape of a reference face to match the appearance of a query, is enough to produce realistic impressions of the query's 3D shape. Doing so, however, requires matching visual features between the (possibly very different) query and reference images, while ensuring that a plausible face shape is produced. To this end, we describe an optimization process which seeks to maximize the similarity of appearances and depths, jointly, to those of a reference model. We describe our system for monocular face shape reconstruction and present both qualitative and quantitative experiments, comparing our method against alternative systems, and demonstrating its capabilities.

 

 

Robust Epipolar Geometry Estimation Using Noisy Pose Prior

Yehonatan Goldman (Technion), Ilan Shimshoni (Haifa) and Ehud Rivlin (Technion) 


Epipolar geometry estimation is fundamental to many computer vision algorithms. It has therefore attracted a lot of interest in recent years, yielding high quality estimation algorithms for wide baseline image pairs. Currently many types of cameras (e.g., in smartphones and robot navigation systems) produce geo-tagged images containing pose and internal calibration data. Exploiting this information as part of an epipolar geometry estimation algorithm may be useful but not trivial, since the pose measurement may be quite noisy. We introduce SOREPP, a novel estimation algorithm designed to exploit pose priors naturally. It sparsely samples the pose space around the measured pose and for a few promising candidates applies a robust optimization procedure. It uses all the putative correspondences simultaneously, even though many of them are outliers, yielding a very efficient algorithm whose runtime is independent of the inlier fractions. SOREPP was extensively tested on synthetic data and on hundreds of real image pairs taken by a smartphone. Its ability to handle challenging scenarios with extremely low inlier fractions of less than 10% was demonstrated as was its ability to handle images taken by close cameras. It outperforms current state-of-the-art algorithms that do not use pose priors as well as other algorithms that do.

FAsT-Match: Fast Affine Template Matching

Simon Korman (TAU), Daniel Reichman (Weizmann), Gilad Tsur (TAU) and Shai Avidan (TAU)


Fast-Match is a fast algorithm for approximate template matching under 2D affine transformations that minimizes the Sum-of-Absolute-Differences (SAD) error measure. There is a huge number of transformations to consider, but we prove that they can be sampled using a density that depends on the smoothness of the image. For each potential transformation, we approximate the SAD error using a sublinear algorithm that randomly examines only a small number of pixels. We further accelerate the algorithm using a branch-and-bound scheme. As images are known to be piecewise smooth, the result is a practical affine template matching algorithm with approximation guarantees, which takes a few seconds to run on a standard machine. We perform several experiments on three different datasets and report very good results. To the best of our knowledge, this is the first template matching algorithm which is guaranteed to handle arbitrary 2D affine transformations.
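For context, the brute-force baseline that FAsT-Match improves on can be sketched for the translation-only case. The actual algorithm searches the full 2D affine space with density-dependent sampling, sublinear SAD approximation, and branch-and-bound, all of which this sketch omits:

```python
import numpy as np

def sad_match(image, template):
    """Exhaustive template matching under translation only, minimizing the
    Sum-of-Absolute-Differences (SAD) error.  Returns the (row, col) of the
    best top-left placement and its SAD score."""
    H, W = image.shape
    h, w = template.shape
    best_loc, best_sad = None, np.inf
    for r in range(H - h + 1):
        for c in range(W - w + 1):
            sad = np.abs(image[r:r+h, c:c+w].astype(float)
                         - template.astype(float)).sum()
            if sad < best_sad:
                best_loc, best_sad = (r, c), sad
    return best_loc, best_sad

# A template cut directly from the image should match itself with SAD = 0.
rng = np.random.default_rng(0)
img = rng.integers(0, 256, (32, 32))
tmpl = img[10:18, 5:13]
loc, score = sad_match(img, tmpl)
```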



The Generalized Laplacian Distance and its Applications for Visual Matching

Elhanan Elboher (HUJI), Michael Werman (HUJI), Yacov Hel-Or (IDC)


The graph Laplacian operator, which originated in spectral graph theory, is commonly used for machine learning applications. However, so far, the graph Laplacian has not been used for the design of sophisticated distance functions. We explore the Laplacian distance, a distance function related to the graph Laplacian, and use it for visual search. We show that previous techniques such as Matching by Tone Mapping (MTM) are particular cases of the Laplacian distance. Generalizing the Laplacian distance results in distance measures which are tolerant to various visual distortions.  A novel algorithm based on linear decomposition makes it possible to compute these generalized distances efficiently.  The proposed approach is demonstrated for tone mapping invariant, outlier robust and multimodal template matching.
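As background, the graph Laplacian L = D - W and its quadratic form, which underlie Laplacian-based distances, can be verified numerically. This is a generic illustration of the operator itself, not the authors' generalized distance:

```python
import numpy as np

# Build the (unnormalized) graph Laplacian L = D - W from a symmetric
# weight matrix W, and check the standard identity
#   x^T L x = 0.5 * sum_ij w_ij (x_i - x_j)^2,
# i.e. the Laplacian quadratic form penalizes disagreement between
# strongly connected nodes.
W = np.array([[0., 1., 2.],
              [1., 0., 0.],
              [2., 0., 0.]])
D = np.diag(W.sum(axis=1))   # degree matrix
L = D - W

x = np.array([3., -1., 2.])
quad = x @ L @ x
pairwise = 0.5 * sum(W[i, j] * (x[i] - x[j]) ** 2
                     for i in range(3) for j in range(3))
```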

 

 

SIFTpack: a Compact Representation for Efficient SIFT Matching

Alexandra Gilinsky and Lihi Zelnik-Manor - Technion

 

Computing distances between large sets of SIFT descriptors is a basic step in numerous algorithms in computer vision. When the number of descriptors is large, as is often the case, computing these distances can be extremely time consuming. In this paper we propose the SIFTpack: a compact way of storing SIFT descriptors, which enables significantly faster calculations between sets of SIFTs than the current solutions. SIFTpack can be used to represent SIFTs densely extracted from a single image or sparsely from multiple different images. We show that the SIFTpack representation saves both storage space and run time, for both finding nearest neighbors and for computing all distances between all descriptors. The usefulness of SIFTpack is also demonstrated as an alternative implementation for K-means dictionaries of visual words.


 

The Tangent Earth Mover's Distance

Ofir Pele and Ben Taskar – Ariel

 

We present a new histogram distance, the Tangent Earth Mover's Distance (TEMD). The TEMD is a generalization of the Earth Mover's Distance (EMD) that is invariant to some global transformations. Thus, like the EMD, it is robust to local deformations. Additionally, it is robust to global transformations such as global translations and rotations of the whole image. The TEMD is formulated as a linear program, which allows efficient computation. Moreover, previous works on the efficient computation of the EMD, which reduced the number of variables in the EMD linear program, can also be used to accelerate the TEMD computation. We present results for image retrieval using the Scale Invariant Feature Transform (SIFT) and color image descriptors. We show that the new TEMD outperforms state-of-the-art distances.
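For reference, the plain EMD that TEMD generalizes is itself a small transportation linear program. A minimal sketch using SciPy's generic LP solver (not the reduced-variable formulations the talk builds on):

```python
import numpy as np
from scipy.optimize import linprog

def emd(p, q, C):
    """Earth Mover's Distance between histograms p and q of equal total
    mass, with ground-distance matrix C, as a transportation LP:
        minimize  sum_ij f_ij C_ij
        s.t.      sum_j f_ij = p_i,   sum_i f_ij = q_j,   f_ij >= 0."""
    n, m = len(p), len(q)
    A_eq = np.zeros((n + m, n * m))
    for i in range(n):
        A_eq[i, i * m:(i + 1) * m] = 1.0   # row sums:    sum_j f_ij = p_i
    for j in range(m):
        A_eq[n + j, j::m] = 1.0            # column sums: sum_i f_ij = q_j
    b_eq = np.concatenate([p, q])
    res = linprog(C.ravel(), A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
    return res.fun

# Moving all mass one bin to the right costs exactly 1.
p = np.array([1.0, 0.0, 0.0])
q = np.array([0.0, 1.0, 0.0])
C = np.abs(np.subtract.outer(np.arange(3), np.arange(3))).astype(float)
d = emd(p, q, C)
```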

 

Sparsity based Poisson Denoising with Dictionary Learning

Raja Giryes and Michael Elad - Technion


The problem of Poisson denoising appears in various imaging applications, such as low-light photography and medical imaging. We propose to harness sparse representation modeling of image patches for this denoising task, handling severe SNR scenarios. We employ an exponential sparsity model, as recently proposed by Salmon et al., relying directly on the true noise statistics. Our scheme uses a greedy pursuit with a bootstrapping-based stopping criterion, and dictionary learning within the denoising process, leading to state-of-the-art results.
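For intuition, Poisson noise differs from additive Gaussian noise in that the variance equals the signal itself, which is what makes the low-count regime hard. A minimal simulation of the noise model only (not the authors' sparsity-based scheme):

```python
import numpy as np

rng = np.random.default_rng(0)

# Poisson observation model for low-light imaging: each pixel count y is
# drawn as y ~ Poisson(x), where x is the clean intensity.  The noise is
# signal-dependent: Var[y] = E[y] = x, so a low peak intensity means a
# low SNR that no fixed additive-noise model captures.
x = np.full(100_000, 4.0)          # clean intensity (low peak -> severe SNR)
y = rng.poisson(x).astype(float)   # noisy observation
```

With 100,000 samples, both the empirical mean and the empirical variance of y concentrate around the clean intensity 4.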

 

Nonparametric Blind Super Resolution

Tomer Michaeli and Michal Irani - Weizmann


Super resolution (SR) algorithms typically assume that the blur kernel is known (either the Point Spread Function 'PSF' of the camera, or some default low-pass filter like a Gaussian). However, the performance of SR methods significantly deteriorates when the assumed blur kernel deviates from the true one. We propose a general framework for "blind" super resolution. In particular, we show that: (i) Unlike the common belief, the PSF of the camera is the wrong blur kernel to use in SR algorithms. (ii) We show how the correct SR blur kernel can be recovered directly from the low-resolution image. This is done by exploiting the inherent recurrence property of small natural image patches (either internally within the same image, or externally in a collection of other natural images). In particular, we show that recurrence of small patches across scales of the low-res image (which forms the basis for single-image SR), can also be used for estimating the optimal blur kernel. This leads to significant improvement in SR results.
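For reference, the standard SR degradation model is that the low-res image is the high-res image blurred by a kernel k and then subsampled by a factor s; the talk's point is that the right k is generally not the camera PSF and must be estimated from patch recurrence. A 1-D sketch of the forward model only:

```python
import numpy as np

def downscale(x, k, s):
    """Standard SR degradation model in 1-D: blur the high-res signal x
    with kernel k, then subsample every s-th value (valid-mode
    convolution keeps only fully-overlapping positions)."""
    blurred = np.convolve(x, k, mode='valid')
    return blurred[::s]

# Averaging blur, then 2x subsampling.
y = downscale(np.arange(10.0), np.array([0.5, 0.5]), 2)
```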

 

 

A Genetic Algorithm-Based Solver for Very Large Jigsaw Puzzles

Dror Sholomon, Omid David, and Nathan S. Netanyahu – Bar Ilan


In this paper we propose the first effective automated, genetic algorithm (GA)-based jigsaw puzzle solver. We introduce a novel crossover procedure that merges two "parent" solutions into an "improved" child solution by detecting, extracting, and combining correctly assembled puzzle segments. The proposed solver exhibits state-of-the-art performance, solving previously attempted puzzles faster and far more accurately, as well as puzzles of sizes never before attempted. Other contributions include the creation of a previously unavailable benchmark of large images. We share the data sets and all of our results for future testing and comparative evaluation of jigsaw puzzle solvers.

 

 

 

Fast multiple-part based object detection using KD-Ferns

Dan Levi, Shai Silberstein, Aharon Bar-Hillel – General Motors


In this work we present a new part-based object detection algorithm that performs real-time detection with hundreds of parts. Part-based models are currently state-of-the-art for object detection due to their ability to represent large appearance variations. However, due to their high computational demands such methods are limited to only several parts and are too slow for practical real-time implementation. Our algorithm is an accelerated version of the "Feature Synthesis" (FS) method [1], which uses multiple object parts for detection and is among the state-of-the-art methods on human detection benchmarks, but also suffers from a high computational cost. The proposed Accelerated Feature Synthesis (AFS) uses several strategies for reducing the number of locations searched for each part. The first strategy uses a novel algorithm we developed for approximate nearest neighbor search, termed "KD-Ferns", to compare each image location to only a subset of the model parts. Candidate locations for a specific part are further reduced using spatial inhibition and an object-level "coarse-to-fine" strategy. In our empirical evaluation on pedestrian detection benchmarks, AFS almost fully maintains the accuracy of the original FS while running more than 4x faster than existing part-based methods which use only several parts. To the best of our knowledge, AFS is the first part-based object detection method to achieve real-time performance: nearly 10 frames per second on 640x480 images on a regular CPU.

 

 

Video Stabilization and Region-Of-Interest tracking in non-rigid object Cardiac MRI

Shahar Gino, Orly Goitein, Eli Konen and Hedva Spitzer – TAU

 

Several Cardiac MRI (CMRI) sequences, such as the perfusion series, are influenced by diaphragm and cardiac motion throughout the respiratory and cardiac cycles [1]. Perfusion is a CMRI sequence that serves as a non-invasive tool for assessing myocardial abnormalities such as ischemia. Myocardial first-pass perfusion schemes track the passage of a contrast agent (Gadolinium) through the heart, and perfusion imaging is a key component of most clinical cardiac MRI exams. Stabilizing these videos is expected to allow a significant improvement in medical diagnosis. Video stabilization and ROI tracking are well-known problems in computer vision, with many practical applications [2]-[3]. However, these two problems become even more challenging for medical grayscale videos, in which separating the ROI from its background under varying texture conditions is difficult. We suggest a novel algorithm for CMRI tracking and stabilization, inspired by mechanisms of the human visual system (VS). It combines information from both edge and region pathways and adaptively weights them according to the ROI state. The algorithm applies cortical receptive fields for contour (edge) detection, and the contour-completion mechanism of the VS for the region-based pathway. The ROI motion is then estimated by a common linear approximation for stabilization. Video stabilization is obtained by solving the ROI-tracking problem and keeping the initial ROI position fixed. The proposed algorithm was tested on several CMRI videos and achieves promising results. It is autonomous, self-adapting, and requires no user intervention. It is robust to image type and highly sensitive to object motion. Moreover, it handles occlusions and deformations and runs at reasonable complexity. Finally, we suggest a method for measuring the ROI stability of a given video, which has been used for estimating our quality of results (QoR). We use both objective and clinical approaches to evaluate our results.
The objective approach is based on Inter-Frame Similarity (ITF) and Structural Similarity (SSIM) metrics. The clinical approach is based on a statistical experiment conducted with radiologists, who compare our video results with the input video and with the results of state-of-the-art competing algorithms; all video results are ranked by the radiologists on a 1-5 scale. Preliminary results are encouraging in terms of both object tracking and video stabilization: the algorithm successfully tracks moving and deforming objects with high sensitivity, which enables promising video stabilization. Stabilizing a perfusion CMRI slice by heart tracking works well over long bursts of frames, and should allow better medical diagnosis.

The Piggyback Video Representation

Yair Hanani, Lior Wolf and Tal Hassner  - TAU  


In video understanding, the spatial patterns formed by local space-time interest points hold discriminative information. We encode these spatial regularities using a word2vec neural network, a recently proposed tool in the field of text processing. Then, building upon recent accumulator-based image representation solutions, input videos are represented in a hybrid manner: the appearance of local space-time interest points is used to collect and associate the learned descriptors, which capture the spatial patterns. Competitive results are shown on recent action recognition benchmarks, using well-established methods as the underlying appearance descriptors.

 

A probabilistic graph-based framework for multi-cue visual tracking

Shimrit Haber and Yossi Keller – Bar-Ilan University

 

Object tracking is a fundamental task in computer vision. Varying tracking scenarios require the use of multiple cues such as color, texture, motion detection, template matching, object detection, and Kalman filtering, to name a few. In this talk we discuss recent results on multi-cue tracking, formulated as a probabilistic inference problem. An image (video frame) is represented by a set of image patches denoted as superpixels, and tracking is formulated as the classification of these superpixels as either foreground or background. The inference is computed by representing the superpixels as graph nodes that are matched to a binary (foreground/background) state graph. We derive a computationally efficient inference scheme based on spectral graph matching. This formulation allows multiple cues to be utilized adaptively and simultaneously, and is exemplified by applying it to surveillance video segments.