

2013 Israel Computer Vision Day
Monday, January 20, 2014


Vision Day Schedule

Speaker and Collaborators - Talk

Yohay Swirski, Yoav Schechner - 3Descatter from Motion
Yonatan Aflalo, Ron Kimmel - Spectral-MDS and Spectral-GMDS
Tal Hassner (Open U.) - Viewing Real-World Faces in 3D
Yehonatan Goldman, Ilan Shimshoni, Ehud Rivlin - Robust Epipolar Geometry Estimation Using Noisy Pose Prior

Coffee Break

Simon Korman, Daniel Reichman, Gilad Tsur, Shai Avidan - FAsT-Match: Fast Affine Template Matching
Elhanan Elboher, Michael Werman, Yacov Hel-Or - The Generalized Laplacian Distance and its Applications for Visual Matching
Alexandra Gilinsky, Lihi Zelnik-Manor - SIFTpack: a Compact Representation for Efficient SIFT Matching
Ofir Pele, Ben Taskar - The Tangent Earth Mover's Distance
Raja Giryes, Michael Elad - Sparsity based Poisson Denoising with Dictionary Learning
Tomer Michaeli, Michal Irani - Nonparametric Blind Super Resolution
Dror Sholomon, Omid David, Nathan S. Netanyahu - A Genetic Algorithm-Based Solver for Very Large Jigsaw Puzzles
Dan Levi, Shai Silberstein, Aharon Bar-Hillel - Fast multiple-part based object detection using KD-Ferns

Coffee Break

Shahar Gino, Orly Goitein, Eli Konen, Hedva Spitzer - Video Stabilization and Region-Of-Interest tracking in non-rigid object Cardiac MRI
Yair Hanani, Lior Wolf, Tal Hassner - The Piggyback Video Representation
Shimrit Haber, Yossi Keller - A probabilistic graph-based framework for multi-cue visual tracking



The Computer Vision Day is sponsored by:


RTC Vision, GM, and Mobileye







3Descatter from Motion

Yohay Swirski and Yoav Schechner - Technion

Spatio-temporal irradiance variations are created by some structured-light setups. They also occur naturally underwater, where they are termed flicker. Methods for overcoming or exploiting flicker or scatter exist when the imaging geometry is static or quasi-static. We generalize these operations to a free-moving platform that carries standard frame-rate stereo cameras. The 3D scene structure is illumination invariant; thus, as a reference for motion estimation, we use stereoscopic range maps rather than object radiance. Consequently, each object point can be tracked and then filtered in time, yielding deflickered videos and 3D recovery, as well as descattering.
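The tracking-then-temporal-filtering step can be illustrated with a toy sketch. The setup below (static scene, global multiplicative flicker, no real tracking or stereo) is a deliberately simplified assumption, not the authors' pipeline: once points are tracked and aligned, a per-pixel temporal median suppresses the flicker.

```python
import numpy as np

rng = np.random.default_rng(4)
scene = rng.random((16, 16)) + 0.1     # ground-truth radiance map

# simulate flicker: each frame is the scene under a different global
# illumination factor (real underwater flicker is spatially varying)
frames = np.stack([scene * (0.5 + rng.random()) for _ in range(15)])

# with points already tracked/aligned, a temporal median per pixel
# removes the flicker up to a single global scale factor
deflickered = np.median(frames, axis=0)
ratio = deflickered / scene            # constant iff flicker is removed
```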


Spectral-MDS and Spectral-GMDS

Yonatan Aflalo and Ron Kimmel - Technion

Multidimensional scaling (MDS) is a family of methods that embed a given set of points into a simple, usually flat, domain. The points are assumed to be sampled from some metric space, and the mapping attempts to preserve the distances between each pair of points in the set. Distances in the target space can be computed analytically in this setting. Generalized MDS is an extension that allows mapping one metric space into another, that is, multidimensional scaling into target spaces in which distances are evaluated numerically rather than analytically. Here, we propose an efficient approach for computing such mappings between surfaces based on their natural spectral decomposition, where the surfaces are treated as sampled metric spaces. The resulting spectral-GMDS procedure enables efficient embedding by implicitly incorporating smoothness of the mapping into the problem, thereby substantially reducing the complexity involved in its solution while practically overcoming its seemingly non-convex nature. The method is compared to existing techniques that compute dense correspondence between shapes. Numerical experiments demonstrate its efficiency and accuracy compared to state-of-the-art approaches.
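The flat-embedding case the abstract starts from can be made concrete with classical MDS, where target-space distances are analytic. This is a minimal NumPy sketch of plain MDS via double centering and eigendecomposition, not the spectral-GMDS method itself; the toy point set is an illustrative assumption.

```python
import numpy as np

def classical_mds(D, k=2):
    """Embed points into R^k from a pairwise distance matrix D
    via double centering and eigendecomposition."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    B = -0.5 * J @ (D ** 2) @ J           # Gram matrix of centered points
    w, V = np.linalg.eigh(B)              # eigenvalues in ascending order
    idx = np.argsort(w)[::-1][:k]         # keep the top-k eigenpairs
    scale = np.sqrt(np.maximum(w[idx], 0))
    return V[:, idx] * scale              # n x k embedding

# toy example: recover a planar configuration from its distances alone
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
Y = classical_mds(D, k=2)
D_rec = np.linalg.norm(Y[:, None] - Y[None, :], axis=-1)
```

For exact Euclidean distance data the embedding reproduces the distances up to rotation and reflection.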


Viewing Real-World Faces in 3D

Tal Hassner – Open University

We present a data-driven method for estimating the 3D shapes of faces viewed in single, unconstrained photos (aka "in-the-wild"). Our method was designed with an emphasis on robustness and efficiency - with the explicit goal of deployment in real-world applications which reconstruct and display faces in 3D. Our key observation is that for many practical applications, warping the shape of a reference face to match the appearance of a query, is enough to produce realistic impressions of the query's 3D shape. Doing so, however, requires matching visual features between the (possibly very different) query and reference images, while ensuring that a plausible face shape is produced. To this end, we describe an optimization process which seeks to maximize the similarity of appearances and depths, jointly, to those of a reference model. We describe our system for monocular face shape reconstruction and present both qualitative and quantitative experiments, comparing our method against alternative systems, and demonstrating its capabilities.



Robust Epipolar Geometry Estimation Using Noisy Pose Prior

Yehonatan Goldman (Technion), Ilan Shimshoni (University of Haifa) and Ehud Rivlin (Technion)

Epipolar geometry estimation is fundamental to many computer vision algorithms. It has therefore attracted a lot of interest in recent years, yielding high quality estimation algorithms for wide baseline image pairs. Currently many types of cameras (e.g., in smartphones and robot navigation systems) produce geo-tagged images containing pose and internal calibration data. Exploiting this information as part of an epipolar geometry estimation algorithm may be useful but not trivial, since the pose measurement may be quite noisy. We introduce SOREPP, a novel estimation algorithm designed to exploit pose priors naturally. It sparsely samples the pose space around the measured pose and for a few promising candidates applies a robust optimization procedure. It uses all the putative correspondences simultaneously, even though many of them are outliers, yielding a very efficient algorithm whose runtime is independent of the inlier fractions. SOREPP was extensively tested on synthetic data and on hundreds of real image pairs taken by a smartphone. Its ability to handle challenging scenarios with extremely low inlier fractions of less than 10% was demonstrated as was its ability to handle images taken by close cameras. It outperforms current state-of-the-art algorithms that do not use pose priors as well as other algorithms that do.
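The "use all putative correspondences at once, even with many outliers" idea can be sketched generically: score a candidate epipolar geometry with a capped residual, so that outliers contribute nothing rather than dominating. The residual choice, the cap tau, and the toy pure-translation geometry below are illustrative assumptions, not SOREPP's actual cost function.

```python
import numpy as np

def epipolar_residuals(F, pts1, pts2):
    """Point-to-epipolar-line distance |x2^T F x1| / ||(F x1)_{1:2}||
    for homogeneous correspondences (one standard residual choice)."""
    lines2 = pts1 @ F.T                       # epipolar lines in image 2
    num = np.abs(np.sum(pts2 * lines2, axis=1))
    return num / np.linalg.norm(lines2[:, :2], axis=1)

def robust_score(F, pts1, pts2, tau=1.0):
    """Soft inlier count: residuals are capped at tau, so a candidate
    can be scored on ALL putative matches despite heavy outlier rates."""
    r = epipolar_residuals(F, pts1, pts2)
    return float(np.sum(np.maximum(0.0, 1.0 - r / tau)))

# toy geometry: pure sideways translation => F = [t]_x, and inlier
# correspondences share the same y coordinate in both views
F = np.array([[0.0, 0.0, 0.0],
              [0.0, 0.0, -1.0],
              [0.0, 1.0, 0.0]])
rng = np.random.default_rng(5)
u, v = rng.random(20) * 100, rng.random(20) * 100
inl1 = np.column_stack([u, v, np.ones(20)])
inl2 = np.column_stack([u + 5.0, v, np.ones(20)])                # inliers
out2 = np.column_stack([u + 5.0, v + rng.random(20) * 50 + 10,
                        np.ones(20)])                            # outliers
good = robust_score(F, inl1, inl2)
bad = robust_score(F, inl1, out2)
```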

FAsT-Match: Fast Affine Template Matching

Simon Korman (TAU), Daniel Reichman (Weizmann), Gilad Tsur (TAU) and Shai Avidan (TAU)

Fast-Match is a fast algorithm for approximate template matching under 2D affine transformations that minimizes the Sum-of-Absolute-Differences (SAD) error measure. There is a huge number of transformations to consider, but we prove that they can be sampled using a density that depends on the smoothness of the image. For each potential transformation, we approximate the SAD error using a sublinear algorithm that randomly examines only a small number of pixels. We further accelerate the algorithm using a branch-and-bound scheme. As images are known to be piecewise smooth, the result is a practical affine template matching algorithm with approximation guarantees, which takes a few seconds to run on a standard machine. We perform several experiments on three different datasets and report very good results. To the best of our knowledge, this is the first template matching algorithm which is guaranteed to handle arbitrary 2D affine transformations.
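The sampled-SAD idea can be sketched for the simplest, translation-only case. The helper names and the sample count are illustrative assumptions; the actual FAsT-Match additionally samples the full affine space with a smoothness-dependent density and prunes it with branch-and-bound.

```python
import numpy as np

rng = np.random.default_rng(0)

def approx_sad(patch, template, n_samples=64):
    """Approximate the Sum-of-Absolute-Differences by examining only a
    random subset of pixels (the sublinear idea from the abstract)."""
    h, w = template.shape
    ys = rng.integers(0, h, n_samples)
    xs = rng.integers(0, w, n_samples)
    return np.abs(patch[ys, xs] - template[ys, xs]).mean()

def match_translation(image, template, step=1):
    """Exhaustive translation-only template matching with sampled SAD."""
    H, W = image.shape
    h, w = template.shape
    best, best_pos = np.inf, None
    for y in range(0, H - h + 1, step):
        for x in range(0, W - w + 1, step):
            err = approx_sad(image[y:y+h, x:x+w], template)
            if err < best:
                best, best_pos = err, (y, x)
    return best_pos, best

image = rng.random((40, 40))
template = image[10:18, 22:30].copy()   # plant the template at (10, 22)
pos, err = match_translation(image, template)
```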

The Generalized Laplacian Distance and its Applications for Visual Matching

Elhanan Elboher (HUJI), Michael Werman (HUJI), Yacov Hel-Or (IDC)

The graph Laplacian operator, which originated in spectral graph theory, is commonly used for machine learning applications. However, so far, the graph Laplacian has not been used for the design of sophisticated distance functions. We explore the Laplacian distance, a distance function related to the graph Laplacian, and use it for visual search. We show that previous techniques such as Matching by Tone Mapping (MTM) are particular cases of the Laplacian distance. Generalizing the Laplacian distance results in distance measures which are tolerant to various visual distortions.  A novel algorithm based on linear decomposition makes it possible to compute these generalized distances efficiently.  The proposed approach is demonstrated for tone mapping invariant, outlier robust and multimodal template matching.
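A minimal sketch of a Laplacian-style distance, under the assumption that edge weights come from a Gaussian on intensity similarity (the weighting scheme and sigma are illustrative, not the paper's exact construction): the quadratic form y^T L(x) y stays small whenever y preserves the pairwise similarity structure of x, e.g. under a monotonic tone mapping, which is the intuition behind the tone-mapping-tolerant matching described above.

```python
import numpy as np

def laplacian_from_signal(x, sigma=0.1):
    """Build a graph Laplacian whose edge weights are high for pairs
    of samples with similar intensities in x."""
    W = np.exp(-(x[:, None] - x[None, :]) ** 2 / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    return np.diag(W.sum(axis=1)) - W

def laplacian_distance(x, y, sigma=0.1):
    """Quadratic form y^T L(x) y: small when y respects the similarity
    structure of x, large when the two signals are unrelated."""
    L = laplacian_from_signal(x, sigma)
    return float(y @ L @ y)

rng = np.random.default_rng(1)
x = rng.random(50)
y_tone = np.sqrt(x)          # monotonic tone mapping of x
y_rand = rng.random(50)      # unrelated signal
d_tone = laplacian_distance(x, y_tone)
d_rand = laplacian_distance(x, y_rand)
```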



SIFTpack: a Compact Representation for Efficient SIFT Matching

Alexandra Gilinsky and Lihi Zelnik-Manor - Technion


Computing distances between large sets of SIFT descriptors is a basic step in numerous algorithms in computer vision. When the number of descriptors is large, as is often the case, computing these distances can be extremely time consuming. In this paper we propose the SIFTpack: a compact way of storing SIFT descriptors, which enables significantly faster calculations between sets of SIFTs than the current solutions. SIFTpack can be used to represent SIFTs densely extracted from a single image or sparsely from multiple different images. We show that the SIFTpack representation saves both storage space and run time, for both finding nearest neighbors and for computing all distances between all descriptors. The usefulness of SIFTpack is also demonstrated as an alternative implementation for K-means dictionaries of visual words.


The Tangent Earth Mover's Distance

Ofir Pele and Ben Taskar – Ariel


We present a new histogram distance, the Tangent Earth Mover's Distance (TEMD). The TEMD is a generalization of the Earth Mover's Distance (EMD) that is invariant to some global transformations. Thus, like the EMD, it is robust to local deformations. Additionally, it is robust to global transformations such as translations and rotations of the whole image. The TEMD is formulated as a linear program, which allows efficient computation. Moreover, previous work on the efficient computation of the EMD, which reduced the number of variables in the EMD linear program, can also be used to accelerate the TEMD computation. We present results for image retrieval using the Scale Invariant Feature Transform (SIFT) and color image descriptors, and show that the new TEMD outperforms state-of-the-art distances.
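The linear-program formulation can be made concrete for plain EMD (the TEMD adds invariance to global transformations on top of this and is not reproduced here); the toy histograms and ground distance below are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import linprog

def emd(p, q, D):
    """Earth Mover's Distance between equal-mass histograms p and q
    with ground-distance matrix D, solved as the standard flow LP."""
    n, m = len(p), len(q)
    c = D.reshape(-1)                       # cost of flow f_ij, row-major
    A_eq = np.zeros((n + m, n * m))
    for i in range(n):                      # flow out of bin i equals p_i
        A_eq[i, i * m:(i + 1) * m] = 1.0
    for j in range(m):                      # flow into bin j equals q_j
        A_eq[n + j, j::m] = 1.0
    b_eq = np.concatenate([p, q])
    res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
    return res.fun

# all mass in bin 0 must travel to bin 3, three bins away
p = np.array([1.0, 0.0, 0.0, 0.0])
q = np.array([0.0, 0.0, 0.0, 1.0])
D = np.abs(np.arange(4)[:, None] - np.arange(4)[None, :]).astype(float)
d = emd(p, q, D)
```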


Sparsity based Poisson Denoising with Dictionary Learning

Raja Giryes and Michael Elad - Technion

The problem of Poisson denoising appears in various imaging applications, such as low-light photography and medical imaging. We propose to harness sparse representation modeling of image patches for this denoising task, handling severe SNR scenarios. We employ an exponential sparsity model, as recently proposed by Salmon et al., relying directly on the true noise statistics. Our scheme uses a greedy pursuit with a bootstrapping-based stopping criterion, and dictionary learning within the denoising process, leading to state-of-the-art results.


Nonparametric Blind Super Resolution

Tomer Michaeli and Michal Irani - Weizmann

Super resolution (SR) algorithms typically assume that the blur kernel is known (either the Point Spread Function (PSF) of the camera, or some default low-pass filter such as a Gaussian). However, the performance of SR methods significantly deteriorates when the assumed blur kernel deviates from the true one. We propose a general framework for "blind" super resolution. In particular, we show that: (i) contrary to common belief, the PSF of the camera is the wrong blur kernel to use in SR algorithms; and (ii) the correct SR blur kernel can be recovered directly from the low-resolution image. This is done by exploiting the inherent recurrence property of small natural image patches (either internally within the same image, or externally in a collection of other natural images). In particular, we show that recurrence of small patches across scales of the low-resolution image (which forms the basis for single-image SR) can also be used for estimating the optimal blur kernel. This leads to significant improvement in SR results.



A Genetic Algorithm-Based Solver for Very Large Jigsaw Puzzles

Dror Sholomon, Omid David, and Nathan S. Netanyahu – Bar Ilan

In this paper we propose the first effective automated, genetic algorithm (GA)-based jigsaw puzzle solver. We introduce a novel crossover procedure that merges two "parent" solutions into an "improved" child solution by detecting, extracting, and combining correctly assembled puzzle segments. The proposed solver exhibits state-of-the-art performance, solving previously attempted puzzles faster and far more accurately, as well as puzzles of sizes never before attempted. Other contributions include the creation of a previously unavailable benchmark of large images. We share the data sets and all of our results for future testing and comparative evaluation of jigsaw puzzle solvers.




Fast multiple-part based object detection using KD-Ferns

Dan Levi, Shai Silberstein, Aharon Bar-Hillel – General Motors

In this work we present a new part-based object detection algorithm with hundreds of parts that performs real-time detection. Part-based models are currently state-of-the-art for object detection due to their ability to represent large appearance variations. However, due to their high computational demands, such methods are limited to several parts only and are too slow for practical real-time implementation. Our algorithm is an accelerated version of the "Feature Synthesis" (FS) method [1], which uses multiple object parts for detection and is among the state-of-the-art methods on human detection benchmarks, but also suffers from a high computational cost. The proposed Accelerated Feature Synthesis (AFS) uses several strategies for reducing the number of locations searched for each part. The first strategy uses a novel algorithm for approximate nearest neighbor search which we developed, termed "KD-Ferns", to compare each image location to only a subset of the model parts. Candidate locations for a specific part are further reduced using spatial inhibition and an object-level "coarse-to-fine" strategy. In our empirical evaluation on pedestrian detection benchmarks, AFS almost fully maintains the accuracy of the original FS while running more than 4x faster than existing part-based methods which use only several parts. AFS is, to the best of our knowledge, the first part-based object detection method to achieve real-time performance: nearly 10 frames per second on 640x480 images on a regular CPU.
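The fern idea, a fixed sequence of binary tests that hashes a descriptor into one of 2^depth buckets so that only co-bucketed candidates are compared, can be sketched as follows. This is a hedged toy version: the random test selection, the depth, and the bucket lookup are illustrative assumptions, not the paper's KD-Ferns construction.

```python
import numpy as np

rng = np.random.default_rng(2)

class Fern:
    """A fern: a fixed sequence of (dimension, threshold) binary tests
    mapping a descriptor to a bucket key for approximate NN lookup."""
    def __init__(self, dim, depth=6):
        self.dims = rng.integers(0, dim, depth)        # tested coordinates
        self.thresholds = rng.random(depth)            # per-test thresholds

    def key(self, v):
        bits = (v[self.dims] > self.thresholds).astype(int)
        return int("".join(map(str, bits)), 2)         # bucket index

    def build(self, points):
        """Hash every database point into its bucket."""
        self.table = {}
        for i, p in enumerate(points):
            self.table.setdefault(self.key(p), []).append(i)

    def query(self, v):
        """Return only the candidates sharing the query's bucket."""
        return self.table.get(self.key(v), [])

points = rng.random((200, 16))
fern = Fern(dim=16)
fern.build(points)
cands = fern.query(points[7])   # a database point finds its own bucket
```

A real system would use several ferns and verify the surviving candidates with exact distances.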



Video Stabilization and Region-Of-Interest tracking in non-rigid object Cardiac MRI

Shahar Gino, Orly Goitein, Eli Konen and Hedva Spitzer – TAU


Several Cardiac MRI (CMRI) sequences, such as the perfusion series, are influenced by diaphragm and cardiac motion throughout the respiratory and cardiac cycles [1]. Perfusion is a CMRI sequence that serves as a non-invasive tool for assessing myocardial abnormalities such as ischemia: myocardial first-pass perfusion schemes track the passage of a contrast agent (gadolinium) through the heart, and perfusion imaging is a key component of most clinical cardiac MRI exams. Stabilizing these videos is expected to allow a significant improvement in medical diagnosis. Video stabilization and ROI tracking are well-known problems in computer vision with many practical applications [2]-[3]; however, both become even more challenging for medical grayscale videos, in which separating the ROI from its background under varying texture conditions is hard. We suggest a novel algorithm for CMRI tracking and stabilization, inspired by mechanisms of the human visual system (VS). It combines information from both edge and region pathways and adaptively weights them according to the ROI state. The algorithm applies cortical receptive fields for contour (edge) detection, and the VS contour-completion mechanism for the region-based pathway. The ROI motion is then estimated by a common linear approximation for stabilization. Video stabilization is obtained by solving the ROI-tracking problem and keeping the ROI's initial position fixed. The proposed algorithm was tested on several CMRI videos and achieves promising results. It is autonomous, self-adapting and requires no user intervention. It is robust to image type and highly sensitive to object motion. Moreover, it handles occlusions and deformations and runs at a reasonable complexity. Finally, we suggest a method for measuring the ROI stability of a given video, which has been used for estimating our quality of results (QoR). We use both objective and clinical approaches to evaluate our results.
The objective approach is based on the Inter-Frame Similarity (ITF) and Structural Similarity (SSIM) metrics. The clinical approach is based on a statistical experiment conducted with radiologists: video results are compared, using dedicated measures, with the input video and with state-of-the-art competing algorithms, and all video results are ranked by the radiologists on a 1-5 scale. Preliminary results are encouraging for both object tracking and video stabilization: the algorithm successfully tracks moving and deforming objects with high sensitivity, which enables promising video stabilization. Stabilizing a perfusion CMRI slice by tracking the heart works well over long bursts of frames, which should allow better medical diagnosis.

The Piggyback Video Representation

Yair Hanani, Lior Wolf and Tal Hassner  - TAU  

In video understanding, the spatial patterns formed by local space-time interest points hold discriminative information. We encode these spatial regularities using a word2vec neural network, a recently proposed tool in the field of text processing. Then, building upon recent accumulator-based image representation solutions, input videos are represented in a hybrid manner: the appearance of local space-time interest points is used to collect and associate the learned descriptors, which capture the spatial patterns. Competitive results are shown on recent action recognition benchmarks, using well-established methods as the underlying appearance descriptors.


A probabilistic graph-based framework for multi-cue visual tracking

Shimrit Haber and Yossi Keller – Bar-Ilan University


Object tracking is a fundamental task in computer vision. Varying tracking scenarios require the use of multiple cues such as color, texture, motion detection, template matching, object detection and Kalman filtering, to name a few. In this talk we discuss recent results on multi-cue tracking, formulated as a probabilistic inference problem. An image (video frame) is represented by a set of image patches denoted as superpixels, and tracking is formulated as the classification of these superpixels as either foreground or background. The inference is computed by representing the superpixels as graph nodes that are matched to a binary (foreground/background) state graph. We derive a computationally efficient inference scheme based on spectral graph matching. This formulation allows us to adaptively utilize multiple cues simultaneously, and is exemplified by applying it to surveillance video segments.