2010 Israel Computer Vision Day
Sunday, December 5, 2010

The Efi Arazi School of Computer Science

IDC  Herzliya


Supported by GM - Advanced Technical Center - Israel








Previous Vision Days Web Page:  2003, 2004, 2005, 2006, 2007, 2008, 2009.



Vision Day Schedule

Speaker and Collaborators

Steven Seitz, Guest Lecturer (University of Washington)
Photos of People

Dan Levi, Aharon Bar-Hillel, Eyal Krupka, Chen Goldberg (GM – Israel Lab, Microsoft – Israel)
Part-based feature synthesis for human detection

Benjamin Kimia, Maruthi Narayanan (Brown University)
Perceptual Fragments as a Mid-Level Representation for Object Recognition

Lihi Zelnik, Ayellet Tal, Stas Goferman, Dmitry Rudoy
The good, the bad and the beautiful pixels

Coffee Break

Omer Barkol, Hadas Kogan, Doron Shaked, Mani Fischer
A Robust Measure For Automatic Inspection

Anat Levin, Boaz Nadler
Natural Image Denoising: Optimality and Inherent Bounds

Yacov Hel-Or, Hagit Hel-Or, Eyal David
Matching By Tone Mapping

Tali Basha, Yael Moses
Multi-View Scene Flow Estimation: A View Centered Variational Approach

Lunch

Shai Bagon, Or Brostovski, Meirav Galun, Michal Irani
Detecting and Sketching the Common

Amir Egozi, Hugo Guterman, Yosi Keller
Improving Shape retrieval by Spectral Matching and Meta Similarity

Marina Alterman, Yoav Schechner, Aryeh Weiss
Multiplexed Fluorescence Unmixing

Guy Ben-Yosef, Ohad Ben-Shahar
Curve Completion as Minimum Length in the Tangent Bundle

Coffee Break

Gilad Freedman, Raanan Fattal
Image and Video Upscaling from Local Self-Examples

Yuval Barkan, Hedva Spitzer
New chromatic aberration algorithm – A Model for neuronal compensation

Yonatan Wexler, Boris Epshtein, Eyal Ofek (OrCam Visual Systems, Microsoft WA)
Detecting Text in Natural Scenes with Stroke Width Transform

Dan Raviv, Ron Kimmel, Alex Bronstein, Michael Bronstein (Lugano University)
Volumetric Heat Kernel Signatures

Guy Rosman, Xue-Cheng Tai, Lorina Dascal, Ron Kimmel (Nanyang University / Bergen University)
Polyakov Action for Efficient Color Image Processing




General:  This is the seventh Israel Computer Vision Day. It will be hosted at IDC.

For more details, requests to be added to the mailing list, etc., please contact:

hagit@cs.haifa.ac.il   toky@idc.ac.il


Location and Directions: The Vision Day will take place at the Interdisciplinary Center (IDC), Herzliya, in the Ivcher Auditorium.  For driving instructions see map.

A convenient option is to arrive by train, see time schedule here. Get off at the Herzliya Station, and order a taxi ride by phone. There are two taxi stations that provide this service: Moniyot Av-Yam (09 9501263 or 09 9563111), and Moniyot Pituach (09 9582288 or 09 9588001).








Photos of People

Steven Seitz  - University of Washington


I have 10,000 photos of my five-year-old son.  This number, while it sounds large, is actually very common--most of us have many thousands of photos of family and friends.  These photos track the changes in my son's appearance, shape, and behavior over the course of his life.  They contain detailed information about shape geometry and reflectance, and its evolution over time.  They also characterize the variability in his facial expressions.  Can we reconstruct my son from this collection (and what does that mean exactly)?  In this talk, I explore new directions in photo browsing and modeling from large photo collections of people.




Part-based feature synthesis for human detection

Dan Levi, Aharon Bar-Hillel, Eyal Krupka, Chen Goldberg – GM Israel, Microsoft Israel, TAU


We introduce a new approach for learning part-based object detection through feature synthesis. Our method consists of an iterative process of feature generation and pruning. A feature generation procedure is presented in which basic part-based features are developed into a feature hierarchy using operators for part localization, part refining and part combination. Feature pruning is done using a new feature selection algorithm for linear SVM, termed Predictive Feature Selection (PFS), which is governed by weight prediction. The algorithm makes it possible to choose from O(10^6) features in an efficient but accurate manner.  We analyze the validity and behavior of PFS and empirically demonstrate its speed and accuracy advantages over relevant competitors. We present an empirical evaluation of our method on three human detection datasets including the current de-facto benchmarks (the INRIA and Caltech pedestrian datasets) and a new challenging dataset of children images in difficult poses. The evaluation suggests that our approach is on a par with the best current methods and advances the state-of-the-art on the Caltech pedestrian training dataset.



Perceptual Fragments as a Mid-Level Representation for Object Recognition

Benjamin B. Kimia, Maruthi Narayanan  - Brown University


What kind of representation should mediate between coordinate-bound pixels and coordinate-free categories? The dramatic shift in paradigm from segmentation-then-recognition to the use of bags or structures of appearance-based parts has led to an implicit shift in the mid-level representation used. First, we argue that as the number of categories and the exemplars per category increase to realistic proportions, form will play the critical role in the intermediate-level representation with appearance features augmenting this role. We show that super-pixels and contour fragments are not adequate to represent partial objects and propose the use of atomic fragments, image regions induced by the shock-graph of image curves. While like super-pixels in appearance, these atomic fragments encode boundary end points and boundary relations as well as regional attributes. Second, we posit that the demise of the segmentation-based strategy was due to early commitment to grouping options. Instead, we propose a framework to maintain a set of alternate and conflicting hypotheses which are presented to the higher level processes in the form of image fragments.



The good, the bad and the beautiful pixels

Lihi Zelnik, Ayellet Tal, Stas Goferman and Dmitry Rudoy - Technion


In recent years more and more cameras are recording the world using more and more pixels. Watching and processing all this data takes lots of time, which we don't want to spend. But do we really need all the pixels? In this research I will show that in many cases we don't need all the pixels to convey the content of the recorded scene. More specifically, when multiple cameras view the same scene, often a single "good" view suffices to visualize what's going on. Within a single view, keeping only the "important" pixels suffices to convey the story the image/video tells. I hence intend to develop algorithms for finding such "good" views and "important" pixels.

A Robust Measure For Automatic Inspection

Omer Barkol, Hadas Kogan, Doron Shaked, Mani Fischer - HP


We introduce a new similarity measure that is insensitive to sub-pixel misregistration. The proposed measure is essential in some difference-detection scenarios, for example when a digital reference is compared to an image and the imaging process introduces deformations that appear as non-constant misregistration between the two images. Our goal is to ignore image differences that result from misregistration and detect only the true, albeit minute, defects. In order to define a misregistration-insensitive similarity, we argue that a similarity measure must respect convex combinations. We show that the well-known SSIM [1] does not have this property and propose a modified version of SSIM that respects convex combinations. We then use this measure to define Sub-Pixel misregistration aware SSIM (SPSSIM).



Natural Image Denoising: Optimality and Inherent Bounds

Anat Levin, Boaz Nadler - Weizmann


The goal of natural image denoising is to estimate a clean version of a given noisy image, utilizing prior knowledge on the statistics of natural images. The problem has been studied intensively with considerable progress made in recent years. However, it seems that image denoising algorithms are starting to converge and recent algorithms improve over previous ones by only fractional dB values. It is thus important to understand how much more we can still improve natural image denoising algorithms and what the inherent limits imposed by the actual statistics of the data are. The challenge in evaluating such limits is that constructing proper models of natural image statistics is a long-standing and yet unsolved problem.


To overcome the absence of accurate image priors, this work takes a non-parametric approach and represents the distribution of natural images using a huge set of 10^10 patches. We then derive a simple statistical measure which provides a lower bound on the optimal Bayesian minimum mean square error (MMSE). This imposes a limit on the best possible results of denoising algorithms which utilize a fixed support around a denoised pixel and a generic natural image prior. Our findings suggest that for small windows, state-of-the-art denoising algorithms are approaching optimality and cannot be further improved beyond 0.1dB values.
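As a rough illustration of what a non-parametric, patch-based prior means in practice, the sketch below approximates the Bayesian posterior mean (the MMSE estimate) of a clean patch by weighting database patches with their Gaussian likelihood. It assumes additive white Gaussian noise of known standard deviation; the function name `posterior_mean_patch` and the tiny database are hypothetical, and the sketch does not reproduce the bound derived in the talk.

```python
import numpy as np


def posterior_mean_patch(noisy, clean_patches, sigma):
    """Sample-based MMSE estimate of a clean patch (illustrative sketch).

    noisy:         1-D array, the observed noisy patch.
    clean_patches: 2-D array, one clean database patch per row.
    sigma:         noise standard deviation (assumed known and Gaussian).
    """
    noisy = np.asarray(noisy, dtype=float)
    clean = np.asarray(clean_patches, dtype=float)

    # Log-likelihood of the observation under each database patch.
    sq_dist = np.sum((clean - noisy) ** 2, axis=1)
    log_w = -sq_dist / (2.0 * sigma ** 2)

    # Subtract the maximum before exponentiating for numerical stability.
    w = np.exp(log_w - log_w.max())
    w /= w.sum()

    # Posterior mean: likelihood-weighted average of the database patches.
    return w @ clean
```

With very small noise the estimate collapses onto the nearest database patch, and with very large noise it tends to the plain average of the database, which is the behaviour one expects from a posterior mean.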


Matching By Tone Mapping

Yacov Hel-Or, Hagit Hel-Or, Eyal David – IDC, HAIFA


A fast pattern matching scheme termed Matching by Tone Mapping (MTM) is introduced which allows matching under non-linear tone mappings. We exploit the recently introduced Slice Transform to implement a fast computational scheme requiring computational time similar to the fast implementation of Normalized Cross Correlation (NCC). In fact, the MTM measure can be viewed as a generalization of the NCC for non-linear mappings and actually reduces to NCC when mappings are restricted to be linear. The MTM is shown to be invariant to non-linear tone mappings, and is empirically shown to be highly discriminative and robust to noise.
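To make the idea of matching under a tone mapping concrete, here is a minimal sketch for a single pattern/window pair. It assumes the tone mapping is approximated by a piecewise-constant function over equal-width tone bins and measures how much of the window's variance is left unexplained by the best such mapping. The function name `mtm_distance` and the `bins` parameter are hypothetical, and this direct computation is not the fast Slice Transform scheme described in the talk.

```python
import numpy as np


def mtm_distance(pattern, window, bins=8):
    """Residual of the best piecewise-constant tone mapping pattern -> window.

    Returns a value in [0, 1]: 0 means some tone mapping of `pattern`
    reproduces `window` exactly, values near 1 mean no mapping helps.
    """
    p = np.asarray(pattern, dtype=float).ravel()
    w = np.asarray(window, dtype=float).ravel()
    if p.shape != w.shape:
        raise ValueError("pattern and window must have the same size")

    total = np.sum((w - w.mean()) ** 2)
    if total == 0.0:
        return 0.0  # a constant window is matched by a constant mapping

    # Assign every pattern pixel to one of `bins` equal-width tone ranges.
    edges = np.linspace(p.min(), p.max(), bins + 1)
    labels = np.digitize(p, edges[1:-1])  # integers in 0 .. bins-1

    # The least-squares mapping sends each tone range to the mean of the
    # window pixels it covers, so the residual is the within-bin scatter.
    residual = 0.0
    for j in range(bins):
        vals = w[labels == j]
        if vals.size:
            residual += np.sum((vals - vals.mean()) ** 2)
    return residual / total
```

For a pattern whose grey levels each fall into their own bin, a window obtained by any per-level remapping (for example squaring the values, or inverting them) gives a distance of zero, while an unrelated noise window gives a distance close to one.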



Multi-View Scene Flow Estimation: A View Centered Variational Approach

Tali Basha, Yael Moses – TAU, IDC


We present a novel method for recovering the 3D structure and scene flow from calibrated multi-view sequences. We propose a 3D point cloud parametrization of the 3D structure and scene flow that allows us to directly estimate the desired unknowns. A unified global energy functional is proposed to incorporate the information from the available sequences and simultaneously recover both depth and scene flow. The functional enforces multi-view geometric consistency and imposes brightness constancy and piecewise smoothness assumptions directly on the 3D unknowns. It inherently handles the challenges of discontinuities, occlusions, and large displacements. The main contribution of this work is the fusion of a 3D representation and an advanced variational framework that directly uses the available multi-view information. The minimization of the functional is successfully obtained despite the non-convex optimization problem. The proposed method was tested on real and synthetic data.



Detecting and Sketching the Common

Shai Bagon,  Or Brostovski,  Meirav Galun,   Michal Irani – Weizmann


Given very few images containing a common object of interest under severe variations in appearance, we detect the common object and provide a compact visual representation of that object, depicted by a binary sketch. Our algorithm is composed of two stages:

(i) Detect a mutually common (yet non-trivial) ensemble of 'self-similarity descriptors' shared by all the input images.

(ii) Having found such a mutually common ensemble, 'invert' it to generate a compact sketch which best represents this ensemble.

This provides a simple and compact visual representation of the common object, while eliminating the background clutter of the query images. It can be obtained from very few query images. Such clean sketches may be useful for detection, retrieval, recognition, co-segmentation, and for artistic graphical purposes.



Improving Shape retrieval by Spectral Matching and Meta Similarity

Amir Egozi, Hugo Guterman, Yosi Keller – BGU, BIU


We propose two computational approaches for improving the retrieval of planar shapes. First, we suggest a geometrically motivated quadratic similarity measure that is optimized by way of spectral relaxation of a quadratic assignment. By utilizing state-of-the-art shape descriptors and a pairwise serialization constraint, we derive a formulation that is resilient to boundary noise, articulations and non-rigid deformations. This allows both shape matching and retrieval. We also introduce a shape meta-similarity measure that agglomerates pairwise shape similarities and improves the retrieval accuracy. When applied to the MPEG-7 shape dataset in conjunction with the proposed geometric matching scheme, we obtained a retrieval rate of 92.5%.



Multiplexed Fluorescence Unmixing

Marina Alterman, Yoav Schechner, Aryeh Weiss - Technion


Multiplexed imaging and illumination have been used to recover enhanced arrays of intensity or spectral reflectance samples, per pixel. However, these arrays are often not the ultimate goal of a system, since the intensity is a result of underlying object characteristics, which interest the user. For example, spectral reflectance, emission or absorption distributions stem from an underlying mixture of materials. Therefore, systems try to infer concentrations of these underlying mixed components. Thus, computational analysis does not end with recovery of intensity (or equivalent) arrays. Inversion of mixtures, termed unmixing, is central to many problems. We incorporate the mixing/unmixing process explicitly into the optimization of multiplexing codes. This way, optimal recovery of the underlying components (materials) is directly sought. Without this integrated approach, multiplexing can even degrade the unmixing result. Moreover, by directly defining the goal of data acquisition to be recovery of components (materials) rather than of intensity arrays, the acquisition becomes more efficient. This yields significant generalizations of multiplexing theory. We apply this approach to fluorescence imaging.



Curve completion as minimum length in the tangent bundle

Guy Ben-Yosef, Ohad Ben-Shahar - BGU


Visual curve completion is typically handled in an axiomatic fashion where the shape of the sought-after completed curve follows formal descriptions of desired, image-based perceptual properties (e.g., minimum curvature, roundedness, etc.). Unfortunately, however, these desired properties are still a matter of debate in the perceptual literature. Instead of the image plane, here we study the problem in the mathematical space R^2 × S^1 that abstracts the cortical areas where curve completion occurs. In this space one can apply basic principles from which perceptual properties in the image plane are derived rather than imposed. In particular, we show how a "least action" principle in R^2 × S^1 entails many perceptual properties which have support in the perceptual curve completion literature. We formalize this principle in a variational framework for general parametric curves, we derive its differential properties, we present numerical solutions, and we show results on a variety of images.



Image and Video Upscaling from Local Self-Examples

Gilad Freedman, Raanan Fattal - HUJI


We propose a new high-quality and efficient single-image upscaling technique that extends existing example-based super-resolution frameworks. In our approach we do not rely on an external example database or use the whole input image as a source for example patches. Instead, we follow a local self-similarity assumption on natural images and extract patches from extremely localized regions in the input image. This allows us to reduce considerably the nearest-patch search time without compromising quality in most images. Tests that we perform and report show that the local self-similarity assumption holds better for small scaling factors where there are more example patches of greater relevance. We implement these small scalings using dedicated novel non-dyadic filter banks that we derive based on principles that model the upscaling process. Moreover, the new filters are nearly-biorthogonal and hence produce high-resolution images that are highly consistent with the input image without solving implicit back-projection equations. The local and explicit nature of our algorithm makes it simple and efficient, and allows a trivial parallel implementation on a GPU. We demonstrate the new method's ability to produce high-quality resolution enhancement, its application to video sequences with no algorithmic modifications, and its efficiency in performing real-time enhancement of low-resolution video standards into recent high-definition formats.



A New chromatic aberration algorithm  – A Model for neuronal compensation

Yuval Barkan, Hedva Spitzer - TAU


The human visual system faces many challenges, among them the need to overcome imperfections of its optics, which degrade the retinal image. One of the most dominant limitations is the dependence of the refractive power of the lens on wavelength, which causes the short wavelengths (blue light) to be focused in front of the retina and thus blurs the retinal chromatic image. This phenomenon is termed longitudinal chromatic aberration (LCA). In this paper we ask the intriguing question of how it is that, despite the imperfections of the ocular optics, the perceived visual appearance is still of a sharp and clear chromatic image. We propose a plausible neural mechanism and model which is supported by known physiological and psychophysical evidence, and by a computational model that demonstrates the compensation mechanism on real images. We also test whether the proposed model can be used as an effective algorithm for correction of images with chromatic aberrations. The model and the algorithm are based on the structure of retinal color coding cells, including blue-yellow cells (small bistratified cells). The uniqueness of the blue-yellow cells (i.e., spatial and chromatic masks), which have large and overlapping receptive field regions (that are excited by blue and inhibited by yellow light), can lead to a significant compensation of LCA. This has been demonstrated by computational simulations of our model on a large set of real images. The algorithm can correct LCA in general, and in particular the LCA that appears mainly in cameras with a short depth of field. We further show that an artifact of the neuronal compensation mechanism is a prominent visual phenomenon of large color shifts, which was found recently by Monnier & Shevell (2003).



Detecting Text in Natural Scenes with Stroke Width Transform

Yonatan Wexler, Boris Epshtein and Eyal Ofek  – OrCam Visual Systems, Microsoft - Redmond


In this talk I will present a new image operator that seeks to find the value of stroke width for each image pixel, and demonstrate its use on the task of text detection in natural images. The suggested operator is local and data dependent, which makes it fast and robust enough to eliminate the need for multi-scale computation or scanning windows. Extensive testing shows that the suggested scheme outperforms the latest published algorithms. Its simplicity allows the algorithm to detect texts in many fonts and languages.


This is joint work with B. Epshtein and E. Ofek from Microsoft, Redmond. It was presented at the 2010 Conference on Computer Vision and Pattern Recognition in San Francisco.


Volumetric Heat Kernel Signatures

Dan Raviv, Ron Kimmel, Alex Bronstein, Michael Bronstein – Technion, TAU, Lugano University


Invariant shape descriptors are instrumental in numerous shape analysis tasks including deformable shape comparison, registration, classification, and retrieval. Most existing constructions model a 3D shape as a two-dimensional surface describing the shape boundary, typically represented as a triangular mesh or a point cloud. Using intrinsic properties of the surface, invariant descriptors can be designed. One such example is the recently introduced heat kernel signature, based on the Laplace-Beltrami operator of the surface. In many applications, however, a volumetric shape model is more natural and convenient. Moreover, modeling shape deformations as approximate isometries of the volume of an object, rather than its boundary, better captures natural behavior of non-rigid deformations in many cases. Here, we extend the idea of heat kernel signature to robust isometry-invariant volumetric descriptors, and show their utility in shape retrieval. The proposed approach achieves state-of-the-art results on the SHREC 2010 large-scale shape retrieval benchmark.
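For readers unfamiliar with the heat kernel signature, the sketch below computes it on a voxelized shape. It uses the standard spectral formula HKS(x, t) = sum_i exp(-lambda_i t) phi_i(x)^2, where lambda_i and phi_i are eigenvalues and eigenvectors of a Laplacian. The discretization here (a plain 6-neighbour graph Laplacian on the occupied voxels) and the function names `voxel_laplacian` and `heat_kernel_signature` are assumptions made for illustration, not the construction used in the talk.

```python
import numpy as np


def voxel_laplacian(mask):
    """Graph Laplacian of the occupied voxels, linking axis-aligned neighbours."""
    mask = np.asarray(mask, dtype=bool)
    n = int(mask.sum())
    index = -np.ones(mask.shape, dtype=int)
    index[mask] = np.arange(n)  # voxel -> row number, -1 outside the shape

    lap = np.zeros((n, n))
    for axis in range(mask.ndim):
        # Pairs of voxels that are adjacent along this axis.
        moved = np.moveaxis(index, axis, 0)
        a, b = moved[:-1].ravel(), moved[1:].ravel()
        inside = (a >= 0) & (b >= 0)
        for i, j in zip(a[inside], b[inside]):
            lap[i, j] -= 1.0
            lap[j, i] -= 1.0
            lap[i, i] += 1.0
            lap[j, j] += 1.0
    return lap


def heat_kernel_signature(laplacian, times):
    """Return hks[x, k] = sum_i exp(-eval_i * times[k]) * evec_i(x)**2."""
    evals, evecs = np.linalg.eigh(laplacian)
    decay = np.exp(-np.outer(evals, np.asarray(times, dtype=float)))
    return (evecs ** 2) @ decay
```

A dense eigendecomposition is only practical for small voxel grids; it is used here to keep the example short. Because the signature depends only on the Laplacian, voxels that play the same role in the shape (for example the eight corners of a cube) receive the same values.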



Polyakov Action for Efficient Color Image Processing

Guy Rosman, Xue-Cheng Tai, Lorina Dascal, Ron Kimmel - Technion, Nanyang University / Bergen University


The Laplace-Beltrami operator is an extension of the Laplacian from flat domains to curved manifolds. It has proven useful for color image processing as it models a meaningful coupling between the color channels. This coupling is naturally expressed in the Beltrami framework, in which a color image is regarded as a two-dimensional manifold embedded in a hybrid, five-dimensional, spatial-chromatic (x,y,R,G,B) space.


The Beltrami filter defined by this framework minimizes the Polyakov action, adopted from high-energy physics, which measures the area of the image manifold. Minimization is usually obtained through a geometric heat equation defined by the Laplace-Beltrami operator. Though efficient simplifications such as the bilateral filter have been proposed for the single channel case, so far the coupling between the color channels has posed a non-trivial obstacle when designing fast Beltrami filters.


Here, we propose to use an augmented Lagrangian approach to design an efficient and accurate regularization framework for color image processing by minimizing the Polyakov action. We extend the augmented Lagrangian framework for total variation (TV) image denoising to the more general Polyakov action case for color images, and apply the proposed framework to denoise and deblur color images.
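For reference, the area functional mentioned in this abstract can be written out as follows. This is the usual form found in the Beltrami-framework literature rather than notation taken from the talk; the scale factor \beta between spatial and color coordinates and the channel index a are assumptions introduced here.

```latex
% Embedding of the color image as a surface in (x, y, R, G, B) space
X : (x, y) \mapsto \bigl(x,\; y,\; \beta R(x,y),\; \beta G(x,y),\; \beta B(x,y)\bigr)

% Metric induced on the image surface, with i, j \in \{x, y\}
g_{ij} = \delta_{ij} + \beta^{2} \sum_{a \in \{R,G,B\}} \partial_i I^{a}\, \partial_j I^{a}

% Polyakov action with this induced metric: the area of the image surface
S[X] = \int \sqrt{\det g}\; dx\, dy
```

The cross terms in \det g involve products of gradients from different channels, which is where the coupling between the color channels comes from.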