

2015 Israel Computer Vision Day
Sunday, April 17, 2016


Vision Day Schedule



Speaker and Collaborators






Yedid Hoshen

Shmuel Peleg


Visual Learning of Arithmetic Operations




Yaniv Romano

Michael Elad


Con-Patch: When a Patch Meets its Context


Mor Dar

Yael Moses


Temporal Epipolar Regions


Maria Kushnir

Ilan Shimshoni


A General Preprocessing Method for Improved Performance of Epipolar Geometry Estimation Algorithms


                                  Coffee Break 


Nadav Cohen

Or Sharir

Amnon Shashua


On the Expressive Power of Deep Learning: A Tensor Analysis


Yuval Bahat

Michal Irani


Blind Dehazing Using Internal Patch Recurrence


Dana Berman

Tali Treibitz

Shai Avidan

Haifa + TAU

Non-Local Image Dehazing


Lior Wolf

David Gadot


PatchBatch: a Batch Augmented Loss for Optical Flow






Aviad Levis

Yoav Schechner

Amit Aides


Airborne Three-Dimensional Cloud Tomography


Inbar Huberman

Raanan Fattal


Detecting Repeating Objects using Patch Correlation Analysis


Aaron Wetzler

Ron Slossberg

Ron Kimmel


Rule Of Thumb: Deep derotation for improved fingertip detection


Dan Levi

Noa Garnett

Ethan Fetaya


StixelNet: A Deep Convolutional Network for Obstacle Detection and Road Segmentation


                                  Coffee Break  


Leah Bar

Nir Sochen

Nahum Kiryati


Blind Restoration of Images with Piecewise Space-Variant  Blur


Harel Haim
Emanuel Marom
Alex Bronstein


Computational multi-focus imaging combining sparse model with color dependent phase mask


Omry Sendik

Yosi Keller

Bar Ilan

How old are you? DeepAge to the rescue




The Computer Vision Day is sponsored by:


RTC Vision, GM, Mobileye













Visual Learning of Arithmetic Operations

Yedid Hoshen and Shmuel Peleg - HUJI

A simple Neural Network model is presented for end-to-end visual learning of arithmetic operations from pictures of numbers. The input consists of two pictures, each showing a 7-digit number. The output, also a picture, displays the number showing the result of an arithmetic operation (e.g., addition or subtraction) on the two input numbers. The concepts of a number, or of an operator, are not explicitly introduced. This indicates that addition is a simple cognitive task, which can be learned visually using a very small number of neurons.

Other operations, e.g., multiplication, were not learnable using this architecture. Some tasks were not learnable end-to-end (e.g., addition with Roman numerals), but were easily learnable once broken into two separate sub-tasks: a perceptual Character Recognition sub-task and a cognitive Arithmetic sub-task. This indicates that while some tasks may be easily learnable end-to-end, others may need to be broken into sub-tasks.


Con-Patch: When a Patch Meets its Context

Yaniv Romano and Michael Elad - Technion

Measuring the similarity between patches in images is a fundamental building block in various tasks. Naturally, the patch size has a major impact on the matching quality, and on the consequent application performance. Under the assumption that our patch database is sufficiently sampled, large patches (e.g., 21-by-21) should be preferred over small ones (e.g., 7-by-7). However, this "dense-sampling" assumption is rarely true; in most cases large patches cannot find relevant nearby examples. This phenomenon is a consequence of the curse of dimensionality, which states that the database size should grow exponentially with the patch size to ensure proper matches. This explains the favored choice of a small patch size in most applications.

Is there a way to keep the simplicity and work with small patches while getting some of the benefits that large patches provide? In this work we offer such an approach. We propose to concatenate the regular content of a conventional (small) patch with a compact representation of its (large) surroundings -- its context. Thus, with a minor increase in dimension (e.g., 10 additional values in the patch representation), we implicitly/softly describe the information of a large patch. The additional descriptors are computed based on the self-similarity behavior of the patch surroundings.

We show that this approach achieves better matches, compared to the use of conventional-size patches, without the need to increase the database-size. Also, the effectiveness of the proposed method is tested on three distinct problems: (i) External natural image denoising, (ii) Depth image super-resolution, and (iii) Motion-compensated frame-rate up-conversion.
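The concatenation idea can be illustrated with a minimal sketch. It assumes grayscale intensities in [0, 1] stored as nested lists, and it assumes the context is summarized by a normalized 10-bin histogram of L2 distances between the central patch and the shifted patches around it. The function names, the histogram choice, and the parameters are all hypothetical illustrations; the descriptor actually used in the work may differ.

```python
import math

def extract_patch(img, r, c, size):
    """Flatten the size-by-size patch whose top-left corner is (r, c)."""
    return [img[r + i][c + j] for i in range(size) for j in range(size)]

def context_histogram(img, r, c, size, radius, bins):
    """Normalized histogram of L2 distances between the patch at (r, c)
    and every shifted patch within `radius` pixels (an assumed context summary)."""
    rows, cols = len(img), len(img[0])
    center = extract_patch(img, r, c, size)
    max_dist = float(size)  # largest possible distance for intensities in [0, 1]
    hist = [0] * bins
    for dr in range(-radius, radius + 1):
        for dc in range(-radius, radius + 1):
            rr, cc = r + dr, c + dc
            inside = 0 <= rr <= rows - size and 0 <= cc <= cols - size
            if (dr, dc) == (0, 0) or not inside:
                continue
            d = math.dist(center, extract_patch(img, rr, cc, size))
            hist[min(int(d / max_dist * bins), bins - 1)] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else [0.0] * bins

def con_patch(img, r, c, size=3, radius=2, bins=10):
    """Small patch concatenated with a compact description of its surroundings."""
    patch = extract_patch(img, r, c, size)
    return patch + context_histogram(img, r, c, size, radius, bins)
```

With a 3-by-3 patch and 10 bins, the descriptor has 19 values instead of 9, so the matching cost grows only slightly while the extra values reflect a 7-by-7 neighborhood.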



Temporal Epipolar Regions

Mor Dar and Yael Moses – IDC

Dynamic events are often photographed by a number of people from different viewpoints at different times, resulting in an unconstrained set of images. Finding the corresponding moving features in each of the images allows us to extract information about objects of interest in the scene. Computing correspondence of moving features in such a set of images is considerably more challenging than computing correspondence in video due to possible significant differences in viewpoints and inconsistent timing between image captures. The prediction methods used in video for improving robustness and efficiency are not applicable to a set of still images. In this paper we propose a novel method to predict locations of an approximately linear moving feature point, given a small subset of correspondences and the temporal order of image captures. Our method extends the use of epipolar geometry to divide images into valid and invalid regions, termed Temporal Epipolar Regions (TERs). We formally prove that the location of a feature in a new image is restricted to valid TERs. We demonstrate the effectiveness of our method in reducing the search space for correspondence on both synthetic and challenging real world data, and show the improved matching.




A General Preprocessing Method for Improved Performance of Epipolar Geometry Estimation Algorithms

Maria Kushnir and Ilan Shimshoni - Haifa

In this paper a deterministic preprocessing algorithm is presented. It is especially designed to deal with repeated structures and wide baseline image pairs. It generates putative matches and their probabilities. They are then given as input to state-of-the-art epipolar geometry estimation algorithms, improving their results considerably and succeeding on hard cases on which they failed before. The algorithm consists of three steps, whose scope changes from local to global. In the local step, it extracts local features (e.g., SIFT) from a pair of images, clustering similar features from each image. The clusters are matched, yielding a large number of matches. Then pairs of spatially close features (2keypoint) are matched and ranked by a classifier. The highest ranked 2keypoint-matches are selected. In the global step, fundamental matrices are computed from each pair of 2keypoint-matches. A match's score is the number of fundamental matrices that it supports. This number, combined with scores generated by standard methods, is given to a classifier to estimate the match's probability. The ranked matches are given as input to state-of-the-art algorithms such as BEEM, BLOGS and USAC, yielding much better results than the original algorithms. Extensive testing was performed on almost 900 image pairs from six publicly available datasets.
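The global-step score described in this abstract can be sketched as follows. The sketch assumes that a match "supports" a fundamental matrix when the point in the second image lies within a pixel threshold of the epipolar line induced by the point in the first image. The function names, the distance measure, and the threshold value are illustrative assumptions, not details taken from the paper.

```python
import math

def epipolar_distance(F, x1, x2):
    """Distance (in pixels) from x2 to the epipolar line F * x1.

    F is a 3x3 nested list; x1 and x2 are (x, y) pixel coordinates.
    """
    p = (x1[0], x1[1], 1.0)
    # Epipolar line in the second image, as (a, b, c) with a*x + b*y + c = 0.
    a, b, c = (sum(F[r][k] * p[k] for k in range(3)) for r in range(3))
    norm = math.hypot(a, b)
    if norm == 0.0:
        return math.inf  # degenerate line: treat as unsupported
    return abs(a * x2[0] + b * x2[1] + c) / norm

def support_score(match, fundamental_matrices, threshold=1.0):
    """Number of fundamental matrices that the match (x1, x2) supports."""
    x1, x2 = match
    return sum(
        1 for F in fundamental_matrices
        if epipolar_distance(F, x1, x2) <= threshold
    )
```

For example, with the fundamental matrix of a pure horizontal translation, `[[0, 0, 0], [0, 0, -1], [0, 1, 0]]`, epipolar lines are image rows, so a match is supported only when both points share (almost) the same row.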


On the Expressive Power of Deep Learning: A Tensor Analysis

Nadav Cohen, Or Sharir, and Amnon Shashua - HUJI

It has long been conjectured that hypothesis spaces suitable for data that is compositional in nature, such as images or text, may be more efficiently represented with deep hierarchical networks than with shallow ones. Despite the vast empirical evidence supporting this belief, theoretical analyses to date are limited. In particular, they do not account for the locality, sharing and pooling constructs of convolutional networks, the most successful deep learning architecture to date. In this work we derive an equivalence between convolutional networks and hierarchical tensor decompositions. The type of decomposition corresponds to the structure of a network (depth, breadth, receptive fields), and the underlying algebraic operations correspond to the choice of activation and pooling operators.


Using tools from measure theory and tensor analysis, we show that linear activation and product pooling, corresponding to the SimNet architecture, lead to "complete depth efficiency", meaning that besides a negligible set, all functions that can be implemented by a deep network of polynomial size require exponential size in order to be implemented (or even approximated) by a shallow network.  We then show that with rectified linear activation and max or average pooling, corresponding to standard convolutional neural networks, the expressive power deteriorates: average pooling leads to loss of universality, whereas max pooling brings forth incomplete depth efficiency.  This leads us to believe that developing effective methods for training SimNets, thereby fulfilling their expressive potential, may give rise to a deep learning architecture that is provably superior to convolutional neural networks but has so far been overlooked by practitioners.

Blind Dehazing Using Internal Patch Recurrence

Yuval Bahat and Michal Irani – Weizmann

Images of outdoor scenes are often degraded by haze, fog and other scattering phenomena. In this work we show how such images can be dehazed using internal patch recurrence. Small image patches tend to repeat abundantly inside a natural image, both within the same scale, as well as across different scales. This behavior has been used as a strong prior for image denoising, super-resolution, image completion and more. Nevertheless, this strong recurrence property significantly diminishes when the imaging conditions are not ideal, as is the case in images taken under bad weather conditions (haze, fog, underwater scattering, etc.). In this work we show how we can exploit the deviations from the ideal patch recurrence for "Blind Dehazing", namely, recovering the unknown haze parameters and reconstructing a haze-free image. We seek the haze parameters that, when used for dehazing the input image, will maximize the patch recurrence in the dehazed output image. More specifically, pairs of co-occurring patches at different depths (hence undergoing different degrees of haze) allow recovery of the Airlight color, as well as the relative-transmission of each such pair of patches. This in turn leads to dense recovery of the scene structure, and to full image dehazing.




Non-Local Image Dehazing

Dana Berman, Tali Treibitz and Shai Avidan – Haifa + TAU


Haze limits visibility and reduces image contrast in outdoor images. The degradation is different for every pixel and depends on the distance of the scene point from the camera. This dependency is expressed in the transmission coefficients, which control the scene attenuation and the amount of haze in every pixel. Previous methods solve the single image dehazing problem using various patch-based priors. We, on the other hand, propose an algorithm based on a new, non-local prior. The algorithm relies on the assumption that the colors of a haze-free image are well approximated by a few hundred distinct colors that form tight clusters in RGB space. Our key observation is that pixels in a given cluster are often non-local, i.e., they are spread over the entire image plane and are located at different distances from the camera. In the presence of haze these varying distances translate to different transmission coefficients. Therefore, each color cluster in the clear image becomes a line in RGB space, which we term a haze-line. Using these haze-lines, our algorithm recovers both the distance map and the haze-free image. The algorithm is linear in the size of the image, deterministic and requires no training. It performs well on a wide variety of images and is competitive with other state-of-the-art methods.
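Why a color cluster becomes a line can be checked numerically. The sketch below assumes the standard haze formation model, I = t*J + (1 - t)*A, where J is the haze-free color, A the airlight and t the transmission; the abstract does not state this model explicitly. Under it, I - A = t*(J - A), so every hazy version of one clear color lies on a line through A in RGB space. The function names and the sample colors are hypothetical.

```python
def apply_haze(J, A, t):
    """Hazy color under the assumed model I = t*J + (1 - t)*A."""
    return tuple(t * j + (1.0 - t) * a for j, a in zip(J, A))

def cross(u, v):
    """Cross product of two 3-vectors."""
    return (
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    )

def on_same_haze_line(pixels, A, tol=1e-9):
    """True if all pixels lie on a single line through the airlight A."""
    dirs = [tuple(p - a for p, a in zip(px, A)) for px in pixels]
    ref = dirs[0]
    # Two directions are parallel when their cross product vanishes.
    return all(max(abs(c) for c in cross(ref, d)) <= tol for d in dirs[1:])

airlight = (0.9, 0.9, 0.95)
clear_color = (0.8, 0.2, 0.1)
# The same clear color seen at three distances, i.e. three transmissions.
hazy = [apply_haze(clear_color, airlight, t) for t in (0.9, 0.5, 0.2)]
```

Here `on_same_haze_line(hazy, airlight)` holds, while adding a hazy pixel that originates from a different clear color breaks the collinearity.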


PatchBatch: a Batch Augmented Loss for Optical Flow

Lior Wolf and David Gadot - TAU


We propose new loss functions for learning patch-based descriptors via deep Convolutional Neural Networks. The learned descriptors are compared using the L2 norm and do not require network processing of pairs of patches. The success of the method is based on a few technical novelties, including an innovative loss function that, for each training batch, computes higher moments of the score distributions. Combined with an Approximate Nearest Neighbor patch matching method and a flow interpolating method, state-of-the-art performance is obtained on the most challenging and competitive optical flow benchmarks.
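Because the descriptors are compared with the L2 norm, matching needs no network pass over pairs of patches. A minimal sketch of that matching step follows. It substitutes an exhaustive nearest-neighbor search for the Approximate Nearest Neighbor method mentioned in the abstract, and it takes descriptors as plain numeric sequences rather than actual network outputs; `match_descriptors` is a hypothetical name.

```python
import math

def match_descriptors(src, dst):
    """For each descriptor in src, return the index of its L2-nearest
    descriptor in dst (brute force; an ANN method would replace this)."""
    return [
        min(range(len(dst)), key=lambda j: math.dist(s, dst[j]))
        for s in src
    ]
```

The exhaustive search costs one distance computation per pair of descriptors, which is why an approximate method is the practical choice for dense optical flow.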



Airborne Three-Dimensional Cloud Tomography

Aviad Levis, Yoav Schechner, and Amit Aides - Technion

We seek to sense the three-dimensional (3D) volumetric distribution of scatterers in a heterogeneous medium. An important case study for such a medium is the atmosphere. Atmospheric contents and their role in Earth’s radiation balance have significant uncertainties with regard to scattering components: aerosols and clouds. Clouds, made of water droplets, also lead to local effects such as precipitation and shadows. Our sensing approach is computational tomography using passive multi-angular imagery. For light-matter interaction that accounts for multiple scattering, we use the 3D radiative transfer equation as a forward model. Volumetric recovery by inverting this model suffers from a computational bottleneck on large scales, which include many unknowns. Steps taken make this tomography tractable, without approximating the scattering order or angle range.



Detecting Repeating Objects using Patch Correlation Analysis

Inbar Huberman and Raanan Fattal - HUJI

In this paper we describe a new method for detecting and counting a repeating object in an image. While the method relies on a fairly sophisticated deformable part model, unlike existing techniques it estimates the model parameters in an unsupervised fashion, thus alleviating the need for user-annotated training data and avoiding the associated specificity. This automatic fitting process is carried out by exploiting the recurrence of small image patches associated with the repeating object and analyzing their spatial correlation. The analysis allows us to reject outlier patches, recover the visual and shape parameters of the part model, and detect the object instances efficiently.

In order to achieve a practical system which is able to cope with diverse images, we describe a simple and intuitive active-learning procedure that updates the object classification by querying the user on very few carefully chosen marginal classifications. Evaluation of the new method against the state-of-the-art techniques demonstrates its ability to achieve higher accuracy through a better user experience.



Rule Of Thumb: Deep derotation for improved fingertip detection

Aaron Wetzler, Ron Slossberg, and Ron Kimmel – Technion

We investigate a novel global orientation regression approach for articulated objects using a deep convolutional neural network. This is integrated with an in-plane image derotation scheme, DeROT, to tackle the problem of per-frame fingertip detection in depth images. The method reduces the complexity of learning in the space of articulated poses, which is demonstrated by using two distinct state-of-the-art learning-based hand pose estimation methods applied to fingertip detection. Significant classification improvements are shown over the baseline implementation. Our framework involves no tracking, kinematic constraints or explicit prior model of the articulated object in hand. To support our approach we also describe a new pipeline for high accuracy magnetic annotation and labeling of objects imaged by a depth camera.



StixelNet: A Deep Convolutional Network for Obstacle Detection and Road Segmentation

Dan Levi, Noa Garnett, Ethan Fetaya – General Motors

General obstacle detection is a key enabler for obstacle avoidance in mobile robotics and autonomous driving. We address the task of detecting the closest obstacle in each direction from a driving vehicle. As opposed to existing methods based on 3D sensing, we use a single color camera. In our approach the task is reduced to a column-wise regression problem. The regression is then solved using a deep convolutional neural network (CNN). In addition, we introduce a new loss function based on a semi-discrete representation of the obstacle position probability to train the network. The network is trained using ground truth automatically generated from a laser-scanner point cloud. Using the KITTI dataset, we show that our monocular-based approach outperforms existing camera-based methods, including ones using stereo. We also apply the network to the related task of road segmentation, achieving among the best results on the KITTI road segmentation challenge.



Blind Restoration of Images with Piecewise Space-Variant  Blur

Leah Bar, Nir Sochen and Nahum Kiryati – TAU

We address the problem of single-image blind space-variant deblurring, where different parts of the image are blurred by different blur kernels. Assuming a region-wise space variant point spread function, a blur measure is defined, followed by an evolving level set based segmentation procedure which extracts the regions. Then a blind kernel identification is carried out for each blur domain. We define a global space-variant deconvolution process which is stabilized by a unified common regularizer, thus preserving discontinuities between the differently restored image regions. Promising experimental results are presented for real images of two phase shift variant out of focus blur.



Computational multi-focus imaging combining sparse model with color dependent phase mask

Harel Haim, Emanuel Marom and Alex Bronstein - TAU  

A method for extended depth of field imaging based on image acquisition through a thin binary phase plate followed by fast automatic computational post-processing is presented. By placing a wavelength dependent optical mask inside the pupil of a conventional camera lens, one acquires a unique response for each of the three main color channels, which adds valuable information that allows blind reconstruction of blurred images without the need for an iterative search process for estimating the blurring kernel. The presented simulation, as well as the capture of a real life scene, show how acquiring a one-shot image focused at a single plane enables generating a de-blurred scene over an extended range in space.



How old are you? DeepAge to the rescue

Omry Sendik and Yosi Keller – Bar-Ilan University


We present a joint Deep Convolutional Neural Network and Support Vector Regression approach for estimating a person’s age from a face. We start by learning a robust face representation using a deep network, followed by kernel-based support vector regression. We then show that the age estimation accuracy can be further improved by learning an age-related dimensionality reduction metric. The proposed schemes were successfully applied to the MORPH-II and FG-Net datasets, outperforming contemporary state-of-the-art approaches.