

2011 Israel Computer Vision Day
Sunday, December 25, 2011

The Efi Arazi School of Computer Science

IDC,  Herzliya



Partially Supported by GM - Advanced Technical Center - Israel




The Vision Day is free for all and no registration is required.


The Vision Day will take place in the Ivcher Auditorium (see directions below).
To handle overflow in the lecture hall, the talks will also be broadcast in a nearby auditorium (Elpern).


Previous Vision Days Web Page:  2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010.



Vision Day Schedule



Speaker and Collaborators






Todd Zickler
Ayan Chakrabarti


Spatio-spectral image statistics (for vision?) 


Orit Kliper
Tal Hassner
Lior Wolf

Open Univ.
Tel-Aviv Univ.

The One-Shot-Similarity Metric Learning (OSSML) for Action Recognition


Michael Lindenbaum
Artiom Myaskouvskey
Yann Gousseau


Beyond independence: An extension of the a-contrario decision procedure


Tali Basha
Yael Moses
Shai Avidan

Tel-Aviv Univ.
Tel-Aviv Univ.

Geometrically Consistent Stereo Seam Carving


Coffee Break 


Dolev Pomeranz, Michal Shemesh

Ohad Ben-Shahar

Ben Gurion Univ.

A fully automated greedy square jigsaw puzzle solver


Amnon Shashua

Shai Shalev-Shwartz

Yonatan Wexler

Hebrew Univ.

ShareBoost: Efficient Multiclass Learning with Feature Sharing


Yonathan Aflalo

Dan Raviv
Ron Kimmel


Scale invariant geometry for non-rigid surface analysis



13:50- 14:00



Daniel Glasner
Shiv N. Vitaladevuni
Ronen Basri


Contour-Based Joint Clustering of Multiple Segmentations


Amit Goldstein
Raanan Fattal

Hebrew Univ.

Video Stabilization using Epipolar Geometry


Meir Cohen
Ilan Shimshoni
Ehud Rivlin
Amit Adam

Haifa Univ.
Rafael (+Technion)

Detecting Mutual Awareness Events


Yoav Schechner
Marina Alterman
Joseph Shamir
Pietro Perona
David Diner
John Martonchik 

Calif. Inst. Tech.

Vision through the air-water surface


Coffee Break  


Oded Shahar
Alon Faktor
Michal Irani


Space-Time Super-Resolution from a Single Video


Aharon Bar Hillel
Dmitri Hanukaev

Dan Levi

General Motors ATCI

Hebrew Univ.

General Motors ATCI

Fusing visual and range imaging for object class recognition



Shai Bagon
Sebastian Nowozin, Carsten Rother
Toby Sharp
Pushmeet Kohli
Bangpeng Yao

Microsoft Res. Camb.

Decision Tree Fields



General:  This is the eighth Israel Computer Vision Day. It will be hosted at IDC.

For more details, requests to be added to the mailing list etc, please contact:

The Vision Day is organized by Yael Moses and Yacov Hel-Or from the Interdisciplinary Center Herzliya,
and Hagit Hel-Or from Haifa University.


Location and Directions: The Vision Day will take place at the Interdisciplinary Center (IDC), Herzliya, in the Ivcher Auditorium. 
For driving instructions see map.

A convenient option is to arrive by train, see time schedule here. Get off at the Herzliya Station, and order a taxi ride by phone. There are two taxi stations that provide this service: Moniyot Av-Yam (09 9501263 or 09 9563111), and Moniyot Pituach (09 9582288 or 09 9588001).






Spatio-spectral image statistics for vision

Todd Zickler and Ayan Chakrabarti -- Harvard

There are many models for spatial statistics of grayscale images, and we use these frequently for tasks like denoising and deblurring. There are also (much simpler) models for statistics of spectral point samples, and these are useful for color constancy. What about correlations between the spatial and spectral dimensions of an image? Are they significant? Are they useful? I'll describe some initial attempts at answering these questions, including our collection and analysis of a database of visible-range hyperspectral natural images.


The One-Shot-Similarity Metric Learning (OSSML) for Action Recognition

Orit Kliper (Tel-Aviv Univ.), Tal Hassner (Open Univ.), and Lior Wolf (Tel-Aviv Univ.)

The One-Shot-Similarity (OSS) is a framework for classifier-based similarity functions. It is based on the use of background samples and was shown to excel in tasks ranging from face recognition to document analysis. However, we found that its performance depends on the ability to effectively learn the underlying classifiers, which in turn depends on the underlying metric. In this work we present a metric learning technique that is geared toward improved OSS performance. We test the proposed technique using the recent ASLAN (Action Similarity LAbeliNg) data set and benchmark. The ASLAN data set includes thousands of videos collected from the web, in over 400 complex action classes. The benchmark focuses on action similarity (same/not-same), and testing is performed on never-before-seen actions. We will also present the baseline results on the ASLAN benchmark, and compare them to human performance. We show that enhanced performance is obtained using the newly presented OSSML, and that this method compares favorably to leading similarity learning techniques.


Beyond independence: An extension of the a-contrario decision procedure

Michael Lindenbaum, Artiom Myaskouvskey, and Yann Gousseau -- Technion

The a contrario approach is a principled method for making algorithmic decisions that has been applied successfully to many tasks in image analysis. The method is based on a background model (or null hypothesis) for the image, relying on independence assumptions and characterizing images in which no detection should be made. This model is often image-dependent, relying on statistics gathered from the image, and therefore adaptive. In this work we propose a generalization for background models which relaxes the independence assumption and instead uses image-dependent second-order properties. The second-order properties are modeled using graphical models. The modified a contrario technique is applied to two tasks: line segment detection and part-based object detection, and its advantages are demonstrated. In particular, we show that the proposed method enables reasonably accurate prediction of the false detection rate with no need for training data.
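
For readers unfamiliar with the baseline, the classical independence-based a contrario test can be sketched as below. This is only an illustration of the standard decision rule (a "number of false alarms" computed from a binomial tail), not the extension presented in the talk. The function names, the binomial background model, and the default threshold of 1 are assumptions made for the example.

```python
from math import comb


def binomial_tail(n, k, p):
    """P[X >= k] for X ~ Binomial(n, p).

    Under the independent background model, this is the chance that at
    least k of n elements agree with a candidate structure by accident.
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def number_of_false_alarms(num_tests, n, k, p):
    """Expected number of accidental detections over all tested candidates."""
    return num_tests * binomial_tail(n, k, p)


def is_meaningful(num_tests, n, k, p, epsilon=1.0):
    """Accept a candidate only if its NFA does not exceed epsilon (assumed 1)."""
    return number_of_false_alarms(num_tests, n, k, p) <= epsilon
```

With 1000 tested candidates, a candidate supported by 10 of 10 elements at p = 0.5 passes this test, while one supported by only 5 of 10 does not.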



Geometrically Consistent Stereo Seam Carving

Tali Basha (Tel-Aviv Univ), Yael Moses (IDC), Shai Avidan (Tel-Aviv Univ)

Image retargeting algorithms attempt to adapt the image content to the screen without distorting the important objects in the scene. Existing methods address retargeting of a single image. In this paper we propose a novel method for retargeting a pair of stereo images. Naively retargeting each image independently will distort the geometric structure and make it impossible to perceive the 3D structure of the scene. We show how to extend a single image seam carving to work on a pair of images. Our method minimizes the visual distortion in each of the images as well as the depth distortion. A key property of the proposed method is that it takes into account the visibility relations between pixels in the image pair (occluded and occluding pixels). As a result, our method guarantees, as we formally prove, that the retargeted pair is geometrically consistent with a feasible 3D scene, similar to the original one. Hence, the retargeted stereo pair can be viewed on a stereoscopic display or processed by any computer vision algorithm. We demonstrate our method on a number of challenging indoor and outdoor stereo images.
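
As background, the single-image seam carving step that the talk extends can be sketched as a dynamic program over a per-pixel energy map. This is a minimal sketch of the standard single-image procedure only; it does not model the stereo, depth, or visibility constraints described above. The function names and the list-of-lists image representation are assumptions made for the example.

```python
def find_vertical_seam(energy):
    """Return one column index per row forming the minimum-energy vertical seam.

    energy is a list of rows (lists of numbers). Consecutive seam pixels
    differ by at most one column.
    """
    rows, cols = len(energy), len(energy[0])
    cost = [list(energy[0])]  # cumulative cost of the best seam ending at each pixel
    back = []                 # back[r - 1][c]: column chosen in row r - 1 for pixel (r, c)
    for r in range(1, rows):
        prev = cost[-1]
        row_cost, row_back = [], []
        for c in range(cols):
            candidates = [k for k in (c - 1, c, c + 1) if 0 <= k < cols]
            best = min(candidates, key=lambda k: prev[k])
            row_cost.append(energy[r][c] + prev[best])
            row_back.append(best)
        cost.append(row_cost)
        back.append(row_back)
    # Start from the cheapest pixel in the last row and walk back to the top.
    c = min(range(cols), key=lambda k: cost[-1][k])
    seam = [c]
    for r in range(rows - 2, -1, -1):
        c = back[r][c]
        seam.append(c)
    seam.reverse()
    return seam


def remove_seam(image, seam):
    """Delete one pixel per row, shrinking the image width by one."""
    return [row[:c] + row[c + 1:] for row, c in zip(image, seam)]
```

Applying `remove_seam` repeatedly, recomputing the energy each time, narrows a single image. Doing this independently on each image of a stereo pair is the naive approach that the abstract says distorts the geometric structure.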


A fully automated greedy square jigsaw puzzle solver

Dolev Pomeranz, Michal Shemesh and Ohad Ben-Shahar -- Ben-Gurion Univ.

In the square jigsaw puzzle problem one is required to reconstruct the complete image from a set of non-overlapping, unordered, square puzzle parts. Here we propose a fully automatic solver for this problem which, unlike some previous work, assumes no clues regarding parts’ location and requires no prior knowledge about the original image or its simplified (e.g., lower resolution) versions. To do so, we introduce a greedy solver which combines both informed piece placement and rearrangement of puzzle segments to find the final solution. Among our other contributions are new compatibility metrics which better predict the chances of two given parts being neighbors, and a novel estimation measure which evaluates the quality of puzzle solutions without the need for ground-truth information. Incorporating these contributions, our approach facilitates solutions that surpass state-of-the-art solvers on puzzles of size larger than ever attempted before.

ShareBoost: Efficient Multiclass Learning with Feature Sharing

Amnon Shashua (Hebrew Univ.), Shai Shalev-Shwartz (Hebrew Univ.), and Yoni Wexler (Orcam)

Multiclass prediction is the problem of classifying an object into a relevant target class. We consider the problem of learning a multiclass predictor that uses only a few features, and in particular, the number of used features should increase sub-linearly with the number of possible classes. This implies that features should be shared by several classes. We describe and analyze the ShareBoost algorithm for learning a multiclass predictor that uses few shared features. We prove that ShareBoost efficiently finds a predictor that uses few shared features (if such a predictor exists) and that it has a small generalization error. We also describe how to use ShareBoost for learning a non-linear predictor that has a fast evaluation time. In a series of experiments with natural data sets we demonstrate the benefits of ShareBoost and evaluate its success relative to other state-of-the-art approaches.



Scale invariant geometry for non-rigid surface analysis

Yonathan Aflalo, Dan Raviv and Ron Kimmel -- Technion


Local scale variations within the same species are common in nature. The shape matching puzzle poses fascinating questions: How should we measure the discrepancy between a small dog with large ears and a large one with small ears? Are there similar geometric structures that are common to an elephant and a giraffe? What is the morphometric similarity between a blue whale and a dolphin? Existing tools that attempt to quantify the resemblance between surfaces while being insensitive to deformations in size are limited to either scale invariant local descriptors or global normalization methods. Here, we propose novel tools for shape exploration by introducing a scale invariant metric for surfaces. The geometric measures we consider can be used for non-rigid shape analysis. They could help in generating local invariant features, produce scale invariant geodesics, embed one surface into another while being robust to changes in local and global size, and assist in the computational study of intrinsic symmetries where size does not matter.


Contour-Based Joint Clustering of Multiple Segmentations

Daniel Glasner, Shiv Vitaladevuni, and  Ronen Basri -- Weizmann

We present an unsupervised, shape-based method for joint clustering of multiple image segmentations. Given two or more closely-related images, such as close frames in a video sequence or images of the same scene taken under different lighting conditions, our method generates a joint segmentation of the images. We introduce a novel contour-based representation that allows us to cast the shape-based joint clustering problem as a quadratic semi-assignment problem. Our score function is additive. We use complex-valued affinities to assess the quality of matching the edge elements at the exterior bounding contour of clusters, while ignoring the contributions of elements that fall in the interior of the clusters. We further combine this contour-based score with region information and use a linear programming relaxation to solve for the joint clusters. We evaluate our approach on the occlusion boundary data-set of Stein et al. 


Video Stabilization using Epipolar Geometry

Amit Goldstein and Raanan Fattal -- Hebrew University

We present a new video stabilization technique that uses projective scene reconstruction to treat jittered video sequences. Unlike methods that recover the full three-dimensional geometry of the scene, this model accounts for simple geometric relations between points and epipolar lines. Using this level of scene understanding, we obtain the physical correctness of 3D stabilization methods yet avoid their lack of robustness and computational costs. Our method consists of tracking feature points in the scene and using them to compute fundamental matrices that model stabilized camera motion. We then project the tracked points onto the novel stabilized frames using epipolar point transfer and synthesize new frames using image-based frame warping. Since this model is only valid for static scenes, we develop a time-view reprojection that accounts for non-stationary points in a principled way. This reprojection is based on modeling the dynamics of smooth inertial object motion in three-dimensional space and allows us to avoid the need to interpolate stabilization for moving objects from their static surroundings. Thus, we achieve an adequate stabilization when both the camera and the objects are moving. We demonstrate the abilities of our approach to stabilize hand-held video shots in various scenarios: scenes with no parallax that challenge 3D approaches, scenes containing non-trivial parallax effects, videos with camera zooming and in-camera stabilization, as well as movies with large moving objects.
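
The epipolar point transfer step mentioned above can be illustrated with the textbook construction: a point seen in two views is located in a third view at the intersection of its two epipolar lines. The sketch below assumes homogeneous image points and fundamental matrices with the convention that `f31` maps a point in view 1 to its epipolar line in view 3 (and likewise `f32` for view 2). The function names and this convention are assumptions for illustration, not the authors' implementation.

```python
def cross(a, b):
    """Cross product of two 3-vectors. In homogeneous coordinates it gives
    the intersection point of two lines."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def mat_vec(m, v):
    """Multiply a 3x3 matrix (list of rows) by a 3-vector."""
    return tuple(sum(m[r][c] * v[c] for c in range(3)) for r in range(3))


def transfer_point(f31, f32, x1, x2):
    """Transfer a point seen at x1 (view 1) and x2 (view 2) into view 3.

    x1 and x2 are homogeneous 3-vectors. Returns inhomogeneous (u, v)
    coordinates in view 3.
    """
    line_from_1 = mat_vec(f31, x1)  # epipolar line of x1 in view 3
    line_from_2 = mat_vec(f32, x2)  # epipolar line of x2 in view 3
    u, v, w = cross(line_from_1, line_from_2)
    if abs(w) < 1e-12:
        # The two lines are parallel or coincide, so the transfer is ill-posed.
        raise ValueError("degenerate configuration for epipolar transfer")
    return (u / w, v / w)
```

The construction degenerates when the two epipolar lines coincide, for example when the scene point lies in the plane through the three camera centers, which is why the sketch raises an error rather than returning an unreliable point.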



Detecting Mutual Awareness Events

Meir Cohen (Technion), Ilan Shimshoni (Haifa Univ.), Ehud Rivlin (Technion) and  Amit Adam (Technion)

It is quite common that multiple human observers attend to a single static interest point. This is known as a mutual awareness (MAW) event. A preferred way to monitor these situations is with a camera that captures the human observers while using existing face detection and head pose estimation algorithms. The current work studies the underlying geometric constraints of MAW events and reformulates them in terms of image measurements. The constraints are then used in a method that (1) detects whether such an interest point does exist, (2) determines where it is located, (3) identifies who was attending to it, and (4) reports where and when each observer was while attending to it. The method is also applied on another interesting event when a single moving human observer fixates on a single static interest point.

The method suits the general case of an uncalibrated camera in a general environment. This is in contrast to other works on similar problems that inherently assume a known environment and a calibrated camera. The method was tested on about 75 images from various scenes and robustly detects MAW events and estimates their related attributes. Most of the images were found by searching the Internet.




Vision through the air-water surface

Yoav Schechner, Marina Alterman (Technion), Joseph Shamir, Pietro Perona, David Diner, John Martonchik (CalTech)

We increase the complexity of visual tasks by considering vision through the water surface. Here, there is a need to handle reflection from the water surface, atmospheric effects, underwater scattering and random surface distortions. Seeing into water from above (e.g., from space) is important for remote sensing of coastal regions, while seeing into air from underwater is related to marine animal vision. We present models and methods for such tasks. In particular, we explain how true object motion can be distinguished from the random dynamic motion of image projection caused by water waves.


Space-Time Super-Resolution from a Single Video

Oded Shahar,  Alon Faktor, and Michal Irani -- Weizmann

Spatial Super Resolution (SR) aims to recover fine image details, smaller than a pixel size. Temporal SR aims to recover rapid dynamic events that occur faster than the video frame-rate, and are therefore invisible or seen incorrectly in the video sequence. Previous methods for Space-Time SR combined information from multiple video recordings of the same dynamic scene. In this talk we show how this can be done from a single video recording. Our approach is based on the observation that small space-time patches ("ST patches", e.g., 5x5x3) of a single ‘natural video’, recur many times inside the same video sequence at multiple spatio-temporal scales. We statistically explore the degree of these ST-patch recurrences inside ‘natural videos’, and show that this is a very strong statistical phenomenon. Space-time SR is obtained by combining information from multiple ST-patches at sub-frame accuracy. We show how finding similar ST-patches can be done both efficiently (with a randomized-based search in space-time), and at sub-frame accuracy (despite severe motion aliasing). Our approach is particularly useful for temporal SR, resolving both severe motion aliasing and severe motion blur in complex ‘natural videos’.


Fusing visual and range imaging for object class recognition

Aharon Bar Hillel (General Motors ATCI), Dmitri Hanukaev (Hebrew Univ) and Dan Levi (General Motors ATCI)

Category level object recognition has improved significantly in the last few years, but machine performance remains unsatisfactory for most real-world applications. We believe this gap may be bridged using additional depth information obtained from range imaging, which was recently used to overcome similar problems in body shape interpretation. This paper presents a system which successfully fuses visual and range imaging for object category classification. We explore fusion at multiple levels: using depth as an attention mechanism, high-level fusion at the classifier level and low-level fusion of local descriptors, and show that each mechanism makes a unique contribution to performance. For low-level fusion we present a new algorithm for training of local descriptors, the Generalized Image Feature Transform (GIFT), which generalizes current representations such as SIFT and spatial pyramids and allows for the creation of new representations based on multiple channels of information. We show that our system improves state-of-the-art visual-only and depth-only methods on a diverse dataset of every-day objects.


Decision Tree Fields

Shai Bagon (Weizmann), Sebastian Nowozin (MSRC), Carsten Rother (MSRC), Toby Sharp (MSRC), Pushmeet Kohli (MSRC) and Bangpeng Yao (Stanford)


This talk introduces a new formulation for discrete image labeling tasks, the Decision Tree Field (DTF), that combines and generalizes random forests and conditional random fields (CRF), which have been widely used in computer vision. In a typical CRF model the unary potentials are derived from sophisticated random forest or boosting based classifiers; however, the pairwise potentials are assumed (1) to have a simple parametric form with a pre-specified and fixed dependence on the image data, and (2) to be defined on the basis of a small and fixed neighborhood. In contrast, in DTF, local interactions between multiple variables are determined by means of decision trees evaluated on the image data, allowing the interactions to be adapted to the image content. This results in powerful graphical models which are able to represent complex label structure. Our key technical contribution is to show that the DTF model can be trained efficiently and jointly using a convex approximate likelihood function, enabling us to learn over a million free model parameters. We show experimentally that for applications which have a rich and complex label structure, our model achieves excellent results.