Abstracts


Visual Learning of Arithmetic Operations

Yedid Hoshen and Shmuel Peleg – HUJI

A simple Neural Network model is presented for end-to-end visual learning of
arithmetic operations from pictures of numbers. The input consists of two
pictures, each showing a 7-digit number. The output, also a picture, displays
the number showing the result of an arithmetic operation (e.g., addition or
subtraction) on the two input numbers. The concepts of a number, or of an
operator, are not explicitly introduced. This indicates that addition is a
simple cognitive task, which can be learned visually using a very small
number of neurons.
Other operations, e.g., multiplication, were not learnable using this
architecture. Some tasks were not learnable end-to-end (e.g., addition with
Roman numerals), but were easily learnable once broken into two separate
subtasks: a perceptual Character Recognition subtask and a cognitive
Arithmetic subtask. This indicates that while some tasks may be easily
learnable end-to-end, others may need to be broken into subtasks.

ConPatch: When a Patch Meets its Context

Yaniv Romano and Michael Elad – Technion

Measuring the similarity between patches in images is a fundamental building
block in various tasks. Naturally, the patch size has a major impact on the
matching quality, and on the consequent application performance. Under the
assumption that our patch database is sufficiently sampled, using large
patches (e.g. 21-by-21) should be preferred over small ones (e.g. 7-by-7).
However, this "dense-sampling" assumption is rarely true; in most cases large
patches cannot find relevant nearby examples. This phenomenon is a consequence
of the curse of dimensionality, stating that the database size should grow
exponentially with the patch size to ensure proper matches. This explains the
favored choice of a small patch size in most applications.
Is there a way to keep the simplicity and work with small patches
while getting some of the benefits that large patches provide? In this work
we offer such an approach. We propose to concatenate the regular content of a
conventional (small) patch with a compact representation of its (large)
surroundings: its context. Therefore, with a minor increase in the
dimensions (e.g. 10 additional values in the patch representation), we
implicitly/softly describe the information of a large patch. The additional
descriptors are computed based on the self-similarity behavior of the patch
surroundings.
We show that this approach achieves better matches, compared to the
use of conventional-size patches, without the need to increase the
database size. Also, the effectiveness of the proposed method is tested on
three distinct problems: (i) External natural image denoising, (ii) Depth
image super-resolution, and (iii) Motion-compensated frame-rate
up-conversion.
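
The idea of appending a compact context to a small patch can be sketched as follows. This is only an illustration under stated assumptions, not the authors' descriptor: the function names, the choice of eight neighbouring patches, and the use of sum-of-squared-differences (SSD) as the self-similarity measure are all hypothetical.

```python
def extract_patch(img, r, c, half):
    """Flatten the (2*half+1) x (2*half+1) patch centred at (r, c)."""
    return [img[i][j]
            for i in range(r - half, r + half + 1)
            for j in range(c - half, c + half + 1)]


def ssd(a, b):
    """Sum of squared differences between two flattened patches."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def context_descriptor(img, r, c, half, step):
    """Hypothetical context: SSD between the centre patch and the 8 patches
    located `step` pixels away, i.e. a crude self-similarity signature."""
    centre = extract_patch(img, r, c, half)
    offsets = [(dr, dc)
               for dr in (-step, 0, step)
               for dc in (-step, 0, step)
               if (dr, dc) != (0, 0)]
    return [ssd(centre, extract_patch(img, r + dr, c + dc, half))
            for dr, dc in offsets]


def patch_with_context(img, r, c, half=1, step=3):
    """Small patch content followed by a few context values."""
    return (extract_patch(img, r, c, half)
            + context_descriptor(img, r, c, half, step))
```

With the defaults, a 3-by-3 patch (9 values) grows by only 8 context values while summarising a 9-by-9 neighbourhood; two such vectors can then be compared with the same distance used for ordinary patches.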

Temporal Epipolar Regions

Mor Dar and Yael Moses – IDC

Dynamic events are often photographed by a number of people from different viewpoints
at different times, resulting in an unconstrained set of images. Finding the
corresponding moving features in each of the images allows us to extract
information about objects of interest in the scene. Computing correspondence
of moving features in such a set of images is considerably more challenging
than computing correspondence in video due to possible significant
differences in viewpoints and inconsistent timing between image captures. The
prediction methods used in video for improving robustness and efficiency are
not applicable to a set of still images. In this paper we propose a novel
method to predict the locations of a feature point that moves approximately linearly,
given a small subset of correspondences and the temporal order of image
captures. Our method extends the use of epipolar geometry to divide images
into valid and invalid regions, termed Temporal Epipolar Regions (TERs). We
formally prove that the location of a feature in a new image is restricted to
valid TERs. We demonstrate the effectiveness of our method in reducing the
search space for correspondence on both synthetic and challenging real-world
data, and show improved matching.

A General Preprocessing Method for Improved Performance of Epipolar
Geometry Estimation Algorithms

Maria Kushnir and Ilan Shimshoni – Haifa

In this paper a deterministic preprocessing algorithm is presented. It is
especially designed to deal with repeated structures and wide-baseline image
pairs. It generates putative matches and their probabilities. These are then
given as input to state-of-the-art epipolar geometry estimation algorithms,
improving their results considerably and succeeding on hard cases on which
they failed before. The algorithm consists of three steps, whose scope changes
from local to global. In the local step, it extracts local features (e.g.
SIFT) from a pair of images and clusters similar features from each image. The
clusters are matched, yielding a large number of matches. Then pairs of
spatially close features (2-keypoints) are matched and ranked by a classifier.
The highest-ranked 2-keypoint matches are selected. In the global step,
fundamental matrices are computed from each pair of 2-keypoint matches. A
match's score is the number of fundamental matrices that it supports. This
number, combined with scores generated by standard methods, is given to a
classifier to estimate the match's probability. The ranked matches are given
as input to state-of-the-art algorithms such as BEEM, BLOGS and USAC, yielding
much better results than the original algorithms. Extensive testing was
performed on almost 900 image pairs from six publicly available datasets.
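
The global-step score, the number of fundamental matrices a match supports, can be illustrated with a small sketch. The support test used here (distance from the second point to the epipolar line of the first, under a pixel tolerance `tol`) is an assumption made for illustration; the abstract does not state which criterion the authors use, and all function names are hypothetical.

```python
import math


def epipolar_line(F, x):
    """Line l' = F x in image 2 for a point x = (u, v) in image 1."""
    u, v = x
    return tuple(F[i][0] * u + F[i][1] * v + F[i][2] for i in range(3))


def point_line_distance(line, p):
    """Euclidean distance from point p = (u, v) to the line (a, b, c)."""
    a, b, c = line
    norm = math.hypot(a, b)
    if norm == 0.0:
        return float("inf")  # degenerate line: treat as unsupported
    return abs(a * p[0] + b * p[1] + c) / norm


def support_count(match, fundamental_matrices, tol=1.0):
    """Number of candidate matrices whose epipolar line passes near x2."""
    x1, x2 = match
    return sum(
        1 for F in fundamental_matrices
        if point_line_distance(epipolar_line(F, x1), x2) <= tol
    )
```

For example, with one matrix describing a purely horizontal camera shift and one describing a purely vertical shift, a match whose points share the same row is supported only by the first.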

On the Expressive Power of Deep Learning: A Tensor Analysis

Nadav Cohen, Or Sharir, and Amnon Shashua – HUJI

It has long been conjectured that hypothesis spaces suitable for data that is
compositional in nature, such as images or text, may be more efficiently represented
with deep hierarchical networks than with shallow ones. Despite the
vast empirical evidence supporting this belief, theoretical analyses to date
are limited. In particular, they do not account for the locality,
sharing and pooling constructs of convolutional networks, the most successful
deep learning architecture to date. In this work we derive an
equivalence between convolutional networks and hierarchical tensor
decompositions. The type of decomposition corresponds to the structure
of a network (depth, breadth, receptive fields), and the underlying
algebraic operations correspond to the choice of activation and pooling
operators.
Using tools from measure theory and tensor analysis, we show that
linear activation and product pooling, corresponding to the SimNet architecture, lead to "complete depth
efficiency", meaning that besides a negligible set, all functions that
can be implemented by a deep network of polynomial size require exponential
size in order to be implemented (or even approximated) by a shallow
network. We then show that with rectified linear activation and
max or average pooling, corresponding to standard convolutional neural
networks, the expressive power deteriorates: average pooling leads to
loss of universality, whereas max pooling brings forth incomplete depth
efficiency. This leads us to believe that developing effective methods
for training SimNets, thereby fulfilling their
expressive potential, may give rise to a deep learning architecture that is
provably superior to convolutional neural networks but has so far been
overlooked by practitioners.

Blind Dehazing Using Internal Patch Recurrence

Yuval Bahat and Michal Irani – Weizmann

Images of outdoor scenes are often degraded by haze, fog and other scattering
phenomena. In this work we show how such images can be dehazed using internal
patch recurrence. Small image patches tend to repeat abundantly inside a
natural image, both within the same scale, as well as across different
scales. This behavior has been used as a strong prior for image denoising,
super-resolution, image completion and more. Nevertheless, this strong
recurrence property significantly diminishes when the imaging conditions are
not ideal, as is the case in images taken under bad weather conditions (haze,
fog, underwater scattering, etc.). In this work we show how we can exploit
the deviations from the ideal patch recurrence for "Blind
Dehazing", namely, recovering the unknown haze parameters and
reconstructing a haze-free image. We seek the haze parameters that, when used
for dehazing the input image, will maximize the patch recurrence in the
dehazed output image. More specifically, pairs of co-occurring patches at
different depths (hence undergoing different degrees of haze) allow recovery of
the Airlight color, as well as the relative transmission of each such pair of
patches. This in turn leads to dense recovery of the scene structure, and to
full image dehazing.
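
Why a pair of co-occurring patches at different depths constrains the haze parameters can be seen in a small single-channel sketch. It assumes the standard haze formation model, observed = t * clear + (1 - t) * A, with the transmission t constant within each patch; the abstract does not spell this model out, and the estimator below is an illustrative derivation, not necessarily the authors' procedure. Under this model the two observations differ only by a contrast factor (the relative transmission) and an offset that depends on the airlight A.

```python
import math


def mean(values):
    return sum(values) / len(values)


def std(values):
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))


def relative_transmission(p1, p2):
    """t2 / t1: haze scales a patch's contrast by its transmission."""
    return std(p2) / std(p1)


def airlight_from_pair(p1, p2, eps=1e-9):
    """Solve p2 - rho * p1 = (1 - rho) * A for the airlight A."""
    rho = relative_transmission(p1, p2)
    if abs(1.0 - rho) < eps:
        raise ValueError("patches at the same depth do not constrain A")
    return mean([(b - rho * a) / (1.0 - rho) for a, b in zip(p1, p2)])
```

Note that the pair carries no information when both patches have the same transmission, which is why patches at different depths are needed.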

Non-Local Image Dehazing

Dana Berman, Tali Treibitz and Shai Avidan – Haifa + TAU

Haze limits visibility and reduces image contrast in outdoor images.
The degradation is different for every pixel and depends on the distance of
the scene point from the camera. This
dependency is expressed in the transmission coefficients, which control
the scene attenuation and amount of haze in every pixel. Previous methods
solve the single image dehazing problem using various patch-based priors. We,
on the other hand, propose an algorithm based on a new, non-local prior. The
algorithm relies on the assumption that colors of a haze-free image are well
approximated by a few hundred distinct colors that form tight clusters in
RGB space. Our key observation is that pixels in a given cluster are often
non-local, i.e., they are spread over the entire image plane and are located
at different distances from the camera. In the presence of haze these varying
distances translate to different transmission coefficients. Therefore, each
color cluster in the clear image becomes a line in RGB space, which we term a
haze-line. Using these haze-lines, our algorithm recovers both the distance
map and the haze-free image. The algorithm is linear in the size of the
image, deterministic and requires no training. It performs well on a wide
variety of images and is competitive with other state-of-the-art methods.
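
The claim that a color cluster becomes a line in RGB space can be checked numerically. The sketch assumes the standard haze formation model, I = t * J + (1 - t) * A, with J the clear color, A the airlight and t the transmission; this model and the function names are assumptions made for illustration and do not come from the abstract. Rearranged, I - A = t * (J - A), so every hazy version of J lies on the ray from A towards J, at a distance proportional to t.

```python
import math


def hazy_color(J, A, t):
    """Observed RGB color of clear color J seen through transmission t."""
    return tuple(t * j + (1.0 - t) * a for j, a in zip(J, A))


def radius_and_direction(I, A):
    """Polar form of I around the airlight: length and unit direction of I - A."""
    d = [i - a for i, a in zip(I, A)]
    r = math.sqrt(sum(x * x for x in d))
    return r, tuple(x / r for x in d)
```

Pixels that share a direction around A therefore belong to the same haze-line, and their radius orders them by transmission. If the farthest pixel on a line is assumed to be haze-free, the ratio of a pixel's radius to that maximum gives a transmission estimate.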

PatchBatch: a Batch Augmented Loss for Optical Flow

Lior Wolf and David Gadot – TAU

We propose new loss functions for learning patch-based descriptors via
deep Convolutional Neural Networks. The learned descriptors are compared
using the L2 norm and do not require network processing of pairs of patches.
The success of the method is based on a few technical novelties, including an
innovative loss function that, for each training batch, computes higher
moments of the score distributions. Combined with an Approximate Nearest
Neighbor patch matching method and a flow interpolating method, state-of-the-art
performance is obtained on the most challenging and competitive optical
flow benchmarks.
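
One way a loss could combine per-pair terms with batch-level moments of the score distribution is sketched below. It is a guess at the general shape, not the authors' formulation: the contrastive per-pair term, the use of the standard deviation as the higher moment, and the `margin` and `weight` parameters are all assumptions. Descriptors are compared with the L2 norm, as stated in the abstract.

```python
import math


def l2(a, b):
    """L2 distance between two descriptors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def std(values):
    """Population standard deviation; 0.0 for an empty list."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))


def batch_augmented_loss(pairs, labels, margin=1.0, weight=1.0):
    """Contrastive per-pair term plus a penalty on the spread of the
    matching and non-matching distance distributions within the batch."""
    dists = [l2(a, b) for a, b in pairs]
    pos = [d for d, y in zip(dists, labels) if y == 1]  # matching pairs
    neg = [d for d, y in zip(dists, labels) if y == 0]  # non-matching pairs
    per_pair = (sum(d ** 2 for d in pos)
                + sum(max(0.0, margin - d) ** 2 for d in neg)) / len(dists)
    return per_pair + weight * (std(pos) + std(neg))
```

The batch term rewards descriptors whose matching distances are tightly concentrated and whose non-matching distances are likewise consistent, which a purely per-pair loss cannot express.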

Airborne Three-Dimensional Cloud Tomography

Aviad Levis, Yoav Schechner, and Amit Aides – Technion

We seek to sense the three-dimensional (3D) volumetric distribution of scatterers
in a heterogeneous medium. An important case study for such a medium is the
atmosphere. Atmospheric contents and their role in Earth’s radiation balance
have significant uncertainties with regard to scattering components:
aerosols and clouds. Clouds, made of water droplets, also lead to local
effects such as precipitation and shadows. Our sensing approach is computational
tomography using passive multi-angular imagery. For light-matter interaction
that accounts for multiple scattering, we use the 3D radiative transfer
equation as a forward model. Volumetric recovery by inverting this model
suffers from a computational bottleneck on large scales, which include many
unknowns. Steps taken make this tomography tractable, without approximating
the scattering order or angle range.

Detecting Repeating Objects using Patch
Correlation Analysis

Inbar Huberman and Raanan Fattal – HUJI

In this paper we describe a new method for detecting and
counting a repeating object in an image. While the method relies on a fairly
sophisticated deformable part model, unlike existing techniques it estimates
the model parameters in an unsupervised fashion, thus alleviating the need for
user-annotated training data and avoiding the associated specificity. This
automatic fitting process is carried out by exploiting the recurrence of
small image patches associated with the repeating object and analyzing their
spatial correlation. The analysis allows us to reject outlier patches,
recover the visual and shape parameters of the part model, and detect the
object instances efficiently.
In order to achieve a practical system which is able to cope with diverse
images, we describe a simple and intuitive active-learning procedure that
updates the object classification by querying the user on very few carefully
chosen marginal classifications. Evaluation of the new method against the
state-of-the-art techniques demonstrates its ability to achieve higher
accuracy through a better user experience.

Rule Of Thumb: Deep derotation for improved
fingertip detection

Aaron Wetzler, Ron Slossberg, and Ron Kimmel – Technion

We investigate a novel global orientation regression approach for articulated
objects using a deep convolutional neural network. This is integrated with an
in-plane image derotation scheme, DeROT, to tackle the problem of per-frame
fingertip detection in depth images. The method reduces the complexity of
learning in the space of articulated poses, which is demonstrated by using two
distinct state-of-the-art learning-based hand pose estimation methods applied to
fingertip detection. Significant classification improvements are shown over
the baseline implementation. Our framework involves no tracking, kinematic
constraints or explicit prior model of the articulated object in hand. To
support our approach we also describe a new pipeline for high accuracy
magnetic annotation and labeling of objects imaged by a depth camera.

StixelNet: A Deep Convolutional Network for Obstacle Detection and Road Segmentation

Dan Levi, Noa Garnett, Ethan Fetaya – General Motors

General obstacle detection is a key enabler for
obstacle avoidance in mobile robotics and autonomous driving. We address the task
of detecting the closest obstacle in each direction from a driving vehicle.
As opposed to existing methods based on 3D sensing we use a single color
camera. In our approach the task is reduced to a column-wise regression
problem. The regression is then solved using a deep convolutional neural
network (CNN). In addition, we introduce a new loss function based on a
semi-discrete representation of the obstacle position probability to train
the network. The network is trained using ground truth automatically
generated from a laser-scanner point cloud. Using the KITTI dataset, we show
that our monocular-based approach outperforms existing camera-based
methods, including ones using stereo. We also apply the network to the related
task of road segmentation, achieving among the best results on the KITTI road
segmentation challenge.
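
A semi-discrete representation of the obstacle position can be read as follows: the network outputs a probability for each of a few discrete positions along the image column, and the probability of a continuous ground-truth position is interpolated linearly between the two nearest bins. The sketch below implements that reading as a negative log-likelihood. It is an assumption about how such a loss could be built, not the authors' definition, and the names `bin_probs`, `centers` and `y` are hypothetical.

```python
import math


def interpolated_probability(bin_probs, centers, y):
    """Piecewise-linear probability of position y given per-bin probabilities.

    `centers` are the increasing bin positions along the column;
    `bin_probs` are the corresponding (softmax-like) network outputs.
    """
    if y <= centers[0]:
        return bin_probs[0]
    if y >= centers[-1]:
        return bin_probs[-1]
    for i in range(len(centers) - 1):
        lo, hi = centers[i], centers[i + 1]
        if lo <= y <= hi:
            w = (y - lo) / (hi - lo)  # 0 at the lower bin, 1 at the upper bin
            return (1.0 - w) * bin_probs[i] + w * bin_probs[i + 1]


def semi_discrete_loss(bin_probs, centers, y):
    """Negative log of the interpolated probability at the true position."""
    return -math.log(interpolated_probability(bin_probs, centers, y))
```

Compared with plain classification over bins, this keeps sub-bin position information in the training signal while still letting the network express a multi-modal distribution over positions.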

Blind Restoration of Images with Piecewise
Space-Variant Blur

Leah Bar, Nir Sochen and Nahum Kiryati – TAU

We address the problem of single-image blind space-variant deblurring, where
different parts of the image are blurred by different blur kernels. Assuming a
region-wise space-variant point spread function, a blur measure is defined,
followed by an evolving level-set-based segmentation procedure which extracts
the regions. Then a blind kernel identification is carried out for each blur
domain. We define a global space-variant deconvolution process which is
stabilized by a unified common regularizer, thus preserving discontinuities
between the differently restored image regions. Promising experimental results
are presented for real images of two-phase shift-variant out-of-focus blur.

Computational multi-focus imaging combining sparse model
with color-dependent phase mask

Harel Haim, Emanuel Marom and Alex Bronstein – TAU

A method for extended depth of field imaging based on image acquisition through
a thin binary phase plate followed by fast automatic computational
post-processing is presented. By placing a wavelength-dependent optical mask
inside the pupil of a conventional camera lens, one acquires a unique
response for each of the three main color channels, which adds valuable
information that allows blind reconstruction of blurred images without the
need for an iterative search process for estimating the blurring kernel. The
presented simulation, as well as a capture of a real-life scene, shows how
acquiring a one-shot image focused at a single plane enables generating a
deblurred scene over an extended range in space.

How old are you? DeepAge to the rescue

Omry Sendik and Yosi Keller – Bar-Ilan University

We present a joint Deep Convolutional Neural Network and Support
Vector Regression approach for estimating a person’s age from a face. We
start by learning a robust face representation using a deep network, followed
by kernel-based support vector regression. We then show that the age estimation
accuracy can be further improved by learning an age-related
dimensionality reduction metric. The proposed schemes were successfully
applied to the MORPH-II and FG-NET datasets, outperforming contemporary
state-of-the-art approaches.
