Abstracts
|
Predicting
Human Body Shape Under Clothing
|
Michael J. Black – Brown University
|
We propose a method to estimate the detailed 3D shape of
a person from images of that person wearing clothing. The approach exploits a model of human body
shapes that is learned from a database of over 2000 range scans. We show that the parameters of this shape
model can be recovered independently of body pose. We further propose a generalization of the
visual hull to account for the fact that observed silhouettes of clothed
people do not provide a tight bound on the true 3D shape. With clothed subjects, different poses
provide different constraints on the possible underlying 3D body shape. We consequently combine constraints across
pose to more accurately estimate 3D body shape in the presence of occluding
clothing. Finally we use the recovered
3D shape to estimate the gender of subjects and then employ gender-specific
body models to refine our shape estimates.
Results on a novel database of thousands of images of
clothed and "naked" subjects, as well as sequences from the HumanEva dataset, suggest the method may be accurate
enough for biometric shape analysis in video.
This is joint work with Alexandru
Balan.
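Below is a minimal numpy sketch of the silhouette-carving idea behind the visual hull, assuming known camera projection matrices; the function names are illustrative, and the pose normalization that the learned body model provides is abstracted away.

    import numpy as np

    def carve_hull(voxels, cameras, silhouettes):
        # voxels      : (N, 3) array of voxel centers
        # cameras     : list of (3, 4) projection matrices (assumed known)
        # silhouettes : list of boolean (H, W) masks, one per view
        X = np.hstack([voxels, np.ones((len(voxels), 1))])  # homogeneous coords
        inside = np.ones(len(voxels), dtype=bool)
        for P, sil in zip(cameras, silhouettes):
            uvw = X @ P.T
            u = np.round(uvw[:, 0] / uvw[:, 2]).astype(int)
            v = np.round(uvw[:, 1] / uvw[:, 2]).astype(int)
            ok = (u >= 0) & (u < sil.shape[1]) & (v >= 0) & (v < sil.shape[0])
            hit = np.zeros(len(voxels), dtype=bool)
            hit[ok] = sil[v[ok], u[ok]]
            inside &= hit          # must project inside this view's silhouette
        return inside

    # Clothing makes every silhouette a loose upper bound on the body, so
    # constraints from several poses are intersected (after bringing the body
    # to a common pose): body = hull(pose 1) & hull(pose 2) & ...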
|
DNA-based
visual identification
|
Lior Wolf, Yoni Donner - TAU
|
The appearance of an animal species is a complex
phenotype partially encoded in its genome. Previous work on linking genotype
to visually-identifiable phenotypes has focused on univariate
or low-dimensional traits such as eye color, principal variations in skeletal
structure and height, as well as on the discovery of specific genes that
contribute to these traits. Here, we go beyond single traits to the direct
genotype-phenotype analysis of photographs and illustrations of animal
species. We address the problems of (1) identification and (2) synthesis of
images of previously unseen animals using genetic data. We demonstrate that
both these problems are feasible: in a multiple choice test, our algorithm
identifies with high accuracy the correct image of previously unseen dogs,
fish, birds and ants, based only on either a short gene sequence or
microsatellite data; additionally, using the same sequence we are able to
approximate the contour images of unseen fish. Our predictions are based on
correlative phenotype-genotype links rather than on specific gene targeting,
and they employ microsatellite data and the cytochrome
c oxidase I mitochondrial gene, both of which are
assumed to have little causal influence on appearance. Such correlative links
enable the use of high-dimensional phenotypes in genetic research, and applications
may range from forensics to personalized medical treatment.
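The abstract leaves the predictor unspecified; as one hedged illustration of a purely correlative genotype-to-appearance link, a ridge regression from genetic features to image features already supports the multiple-choice test (the regression choice and all names below are assumptions, not the authors' method).

    import numpy as np

    def fit_genotype_to_image_map(G, V, lam=1.0):
        # G : (n, dg) genotype features (e.g., encoded microsatellite data)
        # V : (n, dv) visual features of the corresponding species images
        dg = G.shape[1]
        return np.linalg.solve(G.T @ G + lam * np.eye(dg), G.T @ V)

    def multiple_choice(g, candidate_features, W):
        # Pick the candidate image whose features lie closest to the
        # appearance predicted from the gene sequence alone.
        pred = g @ W
        return int(np.argmin(np.linalg.norm(candidate_features - pred, axis=1)))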
|
Descriptor
Based Methods in the Wild
|
Tal Hassner, Lior
Wolf, Yaniv Taigman –
Open University
|
Recent methods for
learning the similarities between images have presented impressive results on
the problem of pair-matching (same-not-same classification) of face images.
In this talk we present pair-matching results comparing the performance of
image descriptor based methods to the state of the art in same/not-same
classification, obtained on the Labeled Faces in the Wild (LFW) image set. We
propose various contributions, spanning several aspects of automatic face
analysis: (i) We present a family of novel image
descriptors which we call the "patch-LBP" descriptors. (ii) We show
that descriptor based methods can obtain performance which is comparable to
existing state of the art methods on both the same-not-same and multi-person
recognition problems. (iii) We present the novel "One-Shot" vector
similarity measure which we have used to improve our same-not-same results
well above leading methods.
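A sketch of the One-Shot similarity idea in an LDA-style form, assuming a fixed "background" set of descriptors belonging to neither identity; the regularization constant is illustrative.

    import numpy as np

    def one_shot_similarity(x, y, negatives):
        # negatives : (n, d) background descriptors unrelated to x and y
        mu = negatives.mean(axis=0)
        cov = np.cov(negatives, rowvar=False) + 1e-6 * np.eye(len(mu))
        cov_inv = np.linalg.inv(cov)

        def side(a, b):
            w = cov_inv @ (a - mu)            # LDA direction: {a} vs negatives
            return w @ (b - (a + mu) / 2.0)   # signed score of b at the midpoint

        return 0.5 * (side(x, y) + side(y, x))  # symmetrize over the pair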
|
Improved
Seam Carving for Video Retargeting
|
Arik Shamir, Michael Rubinstein, Shai Avidan - IDC
|
Video, like images, should support content aware
resizing. We present video retargeting using an improved seam carving
operator. Instead of removing 1D seams from 2D images we remove 2D seam
manifolds from 3D space-time volumes. To achieve this we replace the dynamic
programming method of seam carving with graph cuts that are suitable for 3D
volumes. In the new formulation, a seam is given by a minimal cut in the
graph and we show how to construct a graph such that the resulting cut is a
valid seam. That is, the cut is monotonic and connected. In addition, we
present a novel energy criterion that improves the visual quality of the
retargeted images and videos. The original seam carving operator is focused
on removing seams with the least amount of energy, ignoring energy that is
introduced into the images and video by applying the operator. To counter
this, the new criterion looks forward in time, removing seams that
introduce the least amount of energy into the retargeted result. We show how
to encode the improved criterion into graph cuts (for images and video) as
well as dynamic programming (for images). We apply our technique to images
and videos and present results of various applications.
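For the image case the abstract mentions, the forward-looking criterion fits in a short dynamic program: instead of removing the cheapest existing gradients, each step is charged for the new pixel adjacencies the removal creates. A minimal grayscale sketch (borders wrap for brevity; the graph-cut video formulation is omitted):

    import numpy as np

    def forward_energy_seam(gray):
        I = gray.astype(float)
        h, w = I.shape
        M = np.zeros((h, w))                  # cumulative forward energy
        parent = np.zeros((h, w), dtype=int)
        for i in range(1, h):
            left, right = np.roll(I[i], 1), np.roll(I[i], -1)
            cu = np.abs(right - left)            # seam continues straight up
            cl = cu + np.abs(I[i - 1] - left)    # seam comes from the left
            cr = cu + np.abs(I[i - 1] - right)   # seam comes from the right
            options = np.vstack([np.roll(M[i - 1], 1) + cl,
                                 M[i - 1] + cu,
                                 np.roll(M[i - 1], -1) + cr])
            parent[i] = options.argmin(axis=0) - 1  # parent offset: -1, 0, +1
            M[i] = options.min(axis=0)
        j = int(M[-1].argmin())
        seam = [j]
        for i in range(h - 1, 0, -1):            # trace back the cheapest seam
            j = int(np.clip(j + parent[i, j], 0, w - 1))
            seam.append(j)
        return seam[::-1]                        # one column index per row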
|
Facial
Gesture Analysis in an Interactive Environment
|
Gerard Medioni - USC
|
Facial gesture analysis is an important problem in
computer vision. Facial gestures carry critical information in nonverbal
communication. The difficulty of automatic facial gesture recognition lies in
the complexity of face motions. These motions can be categorized into a
global, rigid head motion, and local, nonrigid
facial deformations. These two components are coupled in an observed facial
motion.
We present our recent research on this topic, which
includes tracking and modeling these two motions for gesture understanding.
It can be divided into three parts: 3D head pose estimation, modeling and
tracking nonrigid facial deformations, and
expression recognition. We have developed a novel hybrid 3D head tracking
algorithm to differentiate these two motions. The hybrid tracker integrates
both intensity and feature correspondence for robust real-time head pose
estimation. Nonrigid motions are analyzed in 3D by
manifold learning techniques. We decompose nonrigid
facial deformations on a basis of 1D manifolds. Each 1D manifold is learned
offline from sequences of labeled basic expressions, such as smile, surprise,
etc. Any expression is then a linear combination of values along these axes,
with the coefficient representing the level of activation. Manifold learning
is accomplished using N-D Tensor Voting. The output of our system is a rich
representation of the face, including the 3D pose, 3D shape, expression label
with probability, and the activation level.
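Once the 1D manifolds are learned, reading off activation levels is a small projection step. A linearized sketch, assuming each learned manifold is summarized by one direction in deformation space (the actual manifolds are learned with N-D Tensor Voting):

    import numpy as np

    def expression_activations(deformation, axes):
        # deformation : (d,) nonrigid facial deformation for the current frame
        # axes        : (k, d) one direction per basic expression (smile, ...)
        coeffs, *_ = np.linalg.lstsq(axes.T, deformation, rcond=None)
        return coeffs   # activation level of each basic expression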
|
Homography
Based Multiple Camera Detection and Tracking of People in a Dense Crowd
|
Ran Eshel, Yael Moses - IDC
|
Tracking people in a dense crowd is a challenging problem
for a single camera tracker due to occlusions and extensive motion that make
human segmentation difficult. In this work we suggest a method for
simultaneously tracking all the people in a densely crowded scene using a set
of cameras with overlapping fields of view. To overcome occlusions, the
cameras are placed at a high elevation and only people's heads are tracked.
Head detection is still difficult since each foreground region may consist of
multiple subjects. By combining data from several views, height information
is extracted and used for head segmentation. The head tops, which are regarded
as 2D patches at various heights, are detected by applying intensity
correlation to aligned frames from the different cameras. The detected head
tops are then tracked using common assumptions on motion direction and
velocity. The method was tested on sequences in indoor and outdoor
environments under challenging illumination conditions. It was successful in
tracking up to 21 people walking in a small area (2.5 people per m^2), in
spite of severe and persistent occlusions.
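A sketch of the alignment-and-agreement step, assuming the homographies induced by the plane at a candidate head height are available from calibration; variance across the aligned views stands in for the paper's intensity correlation.

    import cv2
    import numpy as np

    def head_score_map(frames, homographies):
        # frames       : grayscale views captured at the same time instant
        # homographies : 3x3 maps induced by the plane at candidate height h
        h0, w0 = frames[0].shape
        warped = [cv2.warpPerspective(f, H, (w0, h0)).astype(float)
                  for f, H in zip(frames, homographies)]
        stack = np.stack(warped)
        # Points actually lying on the height-h plane align across views, so
        # their intensities agree; low variance marks candidate head tops.
        return -stack.var(axis=0)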
|
What
is a Good Image Segment? A Unified
Approach to Segment Extraction
|
Shai Bagon, Oren Boiman,
Michal Irani - Weizmann
|
There is a huge diversity of definitions of
"visually meaningful" image segments, ranging from simple uniformly
colored segments, textured segments, through symmetric patterns, and up to
complex semantically meaningful objects. This diversity has led to a wide
range of different approaches for image segmentation. In this paper we
present a single unified framework for addressing this problem - "Segmentation by Composition". We
define a good image segment as one which can be easily composed using its own
pieces, but is difficult to compose using pieces from other parts of the
image. This non-parametric approach captures a large diversity of segment
types, yet requires no pre-definition or modeling of segment types, nor prior training. Based on this
definition, we develop a segment extraction algorithm - i.e., given a single
point-of-interest, provide the "best" image segment containing that
point. This induces a figure-ground image segmentation, which applies to a
range of different segmentation tasks: single image segmentation,
simultaneous co-segmentation of several images, and class-based
segmentations.
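A toy version of the definition, with nearest-neighbor patch distance standing in for the paper's composition likelihood (the descriptor choice and names are assumptions):

    import numpy as np

    def composition_score(patches_in, patches_out, eps=1e-8):
        # patches_in  : (n, d) descriptors inside the candidate segment (n >= 2)
        # patches_out : (m, d) descriptors from the rest of the image
        score = 0.0
        for i, p in enumerate(patches_in):
            others = np.delete(patches_in, i, axis=0)   # leave-one-out
            d_self = np.min(np.linalg.norm(others - p, axis=1)) + eps
            d_rest = np.min(np.linalg.norm(patches_out - p, axis=1)) + eps
            score += np.log(d_rest / d_self)
        # Large positive: easy to compose from its own pieces, hard from
        # elsewhere, i.e. a "good" segment under the paper's definition.
        return score / len(patches_in)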
|
Loose
Shape Model for Discriminative Learning of Object Categories
|
Margarita Osadchy – Haifa
|
We consider the problem of visual
categorization with minimal supervision during training. We propose a part-based
model that loosely captures structural information. We represent images as a
collection of parts characterized by an appearance codeword from a visual
vocabulary and by a neighborhood context, organized in an ordered set of
bag-of-features representations. These bags are computed in local overlapping
areas around the part. A semantic distance between images is obtained by
matching parts associated with the same codeword using their context
distributions. The classification is
done using an SVM with a kernel obtained from the proposed dissimilarity
measure. The experiments show that our method outperforms all the
classification methods from the PASCAL challenge on half of the VOC2006
categories and has the best average EER.
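A compact sketch of how such a dissimilarity could feed an SVM, assuming each part is summarized by its codeword and a context histogram (the ordered, overlapping bags are collapsed to one histogram here for brevity):

    import numpy as np

    def semantic_distance(parts_a, parts_b):
        # parts_a, parts_b : dict codeword -> context histogram (sums to 1)
        shared = set(parts_a) & set(parts_b)
        if not shared:
            return 1.0                    # no matching parts: maximal distance
        d = 0.0
        for c in shared:
            p, q = parts_a[c], parts_b[c]
            d += 0.5 * np.sum((p - q) ** 2 / (p + q + 1e-8))  # chi-square
        return d / len(shared)

    def kernel(parts_a, parts_b, gamma=1.0):
        # Exponentiating the dissimilarity yields an SVM-ready kernel value.
        return np.exp(-gamma * semantic_distance(parts_a, parts_b))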
|
Industry
Session
|
Organized by Chen Sagiv
|
1. Igal Dvir, CTO, Nice Vision
TBA
2. Challenges and Solutions for Bundling
Multiple DAS Applications on a Single Hardware Platform - Gideon Stein, Chief
Scientist, MobilEye
Joint work with: Itay Gat, Gaby Hayon
This talk addresses the key challenges in bundling multiple camera
based Driver Assistance Systems onto the same hardware platform. In
particular, we discuss combinations of lane departure warning (LDW),
Automatic High-beam Control (AHC), traffic sign recognition (TSR) and forward
collision warning (FCW). The advantages of bundling are in cost reduction and
that it allows more functions to be added to the car without increasing the
footprint on the car windshield.
The challenge in bundling is that the different applications
traditionally have different requirements from the image sensor and optics.
We show how algorithms can be modified so that they can all work together by
relying less on the particular physics of the camera and making more use of
advanced pattern recognition techniques.
This shift in algorithm paradigm means an increase in computational
requirements. The introduction of new automotive-qualified, high-performance
vision processors makes these new algorithms both viable and affordable,
paving the way to bundles of applications running on the same platform.
3.
Automatic Image Enhancement - Renato Keshet, Project Leader, HP Labs
This talk presents HIPIE (HP Indigo Photo Image Enhancement), a robust system
that analyzes and automatically enhances images as part of a commercial photobook pipeline. The system can sharpen, denoise, enhance global and local contrast, brighten,
enhance face contrast, boost color and improve resolution, as needed on a
per-image basis (as determined by a series of image analysis modules), all in a
couple of seconds per photo. The system, developed mostly at HP Labs in Haifa, is deployed in many print shops in Israel and
around the world, and works 24/7.
In this presentation, we briefly describe
the technology behind the system, highlighting the analysis modules and the
unified enhancement scheme. We also mention our view of some of the research
challenges for this and other image-related areas in the future.
4. Shai Dekel, Chief Scientist, Imaging Solutions, GE Healthcare
Pathology is the study of diseases by
examining body tissues, typically under magnification. Today, pathologists
use a microscope to look at slides of tissue samples that have been prepared
with stains by a specialist called a histotechnician.
This process has not changed much in over 100 years. However, there is an
emerging movement toward digital pathology called "whole slide
imaging" in which entire slides are digitally scanned so that they can
be viewed on a computer. Note that one uncompressed digital slide scanned at
high resolution can reach a size of 30GB and a typical patient case can
contain 5-30 such slides. The digital representation of the slides motivates
the development of new image analysis algorithms that can potentially assist
the pathologists in their review process.
|
Single
Image Dehazing
|
Raanan Fattal –
HUJI
|
In this talk we present a new method for
estimating the optical transmission in hazy scenes given a single input
image. Based on this estimation, the scattered light is eliminated to
increase scene visibility and recover haze-free scene contrasts. In this new
approach we formulate a refined image formation model that accounts for
surface shading in addition to the transmission function. This allows us to
resolve ambiguities in the data by searching for a solution in which the
resulting shading and transmission functions are locally statistically
uncorrelated. A similar principle is used to estimate the color of the haze.
Results demonstrate the new method's ability to remove the haze layer as well
as to provide a reliable transmission estimate, which can be used for additional
applications such as image refocusing and novel view synthesis.
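The image formation model being refined here is the standard haze equation I(x) = t(x)J(x) + (1 - t(x))A, with transmission t and airlight A. Once t is estimated (the talk's contribution, not shown), recovery is a per-pixel inversion; a minimal sketch:

    import numpy as np

    def recover_scene(I, t, A, t_min=0.1):
        # I : (H, W, 3) hazy image in [0, 1]
        # t : (H, W) estimated transmission
        # A : (3,)  estimated airlight color
        t = np.clip(t, t_min, 1.0)[..., None]  # avoid amplifying noise as t -> 0
        return (I - A) / t + A                 # invert I = t*J + (1 - t)*A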
|
Fenchel
Duality with Applications to Inference in Graphical Models
|
Amnon Shashua,
Tamir Hazan - HUJI
|
Quite a number of problems involving inference from data,
whether visual data or otherwise, fall into the category of optimization. I
will describe a general scheme for message passing update rules based on the
framework of Fenchel duality.
Using the framework we derive known inference
algorithms, such as sum-product and max-product Belief Propagation, as well as
new convergent algorithms for maximum-a-posteriori (MAP) and marginal
estimation using "convex free energies".
|
Unsupervised
estimation of segmentation quality using nonnegative factorization
|
Roman Sandler, Michael
Lindenbaum - Technion
|
We propose an unsupervised method for evaluating image
segmentation. Common methods are typically based on evaluating smoothness
within segments and contrast between them, and the measure they provide is
not explicitly related to segmentation errors. The proposed approach differs
from these methods on several important points and has several advantages
over them. First, it provides a meaningful, quantitative assessment of
segmentation quality, in precision/recall terms, which until now were
applicable only to supervised evaluation. Second, it builds on a new image model, which
characterizes the segments as a mixture of basic feature distributions. The
precision/recall estimates are then obtained by a nonnegative matrix
factorization (NMF) process. A third important advantage is that the
estimates, which are based on intrinsic properties of the specific image
being evaluated and not on a comparison to typical images (learning), are
relatively robust to context factors such as image quality or the presence of
texture.
Experimental results demonstrate the accuracy of the
precision/recall estimates in comparison to ground truth based on human
judgment. Moreover, it is shown that tuning a segmentation algorithm using
the unsupervised measure improves the algorithm’s quality (as measured by a
supervised method).
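The factorization step itself is compact; a hedged sketch using scikit-learn's NMF, with per-segment feature histograms as input (how the precision/recall estimates are read off the factors follows the paper and is not reproduced here):

    import numpy as np
    from sklearn.decomposition import NMF

    def segment_mixtures(segment_histograms, n_basics=2):
        # segment_histograms : (n_segments, n_bins), each row sums to 1
        model = NMF(n_components=n_basics, init='nndsvd', max_iter=500)
        W = model.fit_transform(segment_histograms)  # mixing weights per segment
        H = model.components_                        # basic feature distributions
        return W, H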
|
Removal
of Turbulence Disturbance in a Movie Scene for Static and Moving Camera,
Enabling Vision Applications
|
Tomer Avidor, Moty Golan -
RAFAEL
|
The common method of reconstructing a turbulence-degraded scene is
through the creation of an artificial reference image. The reference image is
usually obtained by averaging the video over time. Optical flow computed from
that reference image to the input images then enables applications such as
super-resolution and tracking.
However, this technique suffers from several drawbacks: the resulting
artificial reference frame is blurred, so the calculated optical-flow fields
are imprecise and degrade the applications built on them, and there is no
accounting for camera motion or for motion within the scene.
We present a mathematical framework to reconstruct the scene as it would have
been seen without turbulence interference, yielding an observable live video
output. We then use both the frames and the optical-flow fields to obtain the
aforementioned applications (tracking, super-resolution, mosaics) while
dealing with camera motion, and we outline guidelines for handling in-scene
motion inherently.
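For concreteness, the baseline the talk improves on can be sketched in a few lines with OpenCV (a static camera and a hypothetical frame buffer are assumed; the talk's framework replaces this blurred-reference scheme):

    import cv2
    import numpy as np

    def baseline_stabilize(frames, k):
        # frames : list of grayscale uint8 frames; k : frame index to de-warp
        ref = np.mean(np.stack(frames), axis=0).astype(np.uint8)  # blurred ref
        flow = cv2.calcOpticalFlowFarneback(ref, frames[k], None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        h, w = ref.shape
        xs, ys = np.meshgrid(np.arange(w), np.arange(h))
        map_x = (xs + flow[..., 0]).astype(np.float32)
        map_y = (ys + flow[..., 1]).astype(np.float32)
        # Pull pixels back along the estimated turbulence displacement field.
        return cv2.remap(frames[k], map_x, map_y, cv2.INTER_LINEAR)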
|
The
patch transform and its applications to image editing
|
Shai Avidan, Taeg Sang Cho, Moshe Butman,
Bill Freeman – Adobe
|
We introduce the patch transform, where an image is
broken into non-overlapping patches, and modifications or constraints are
applied in the "patch domain". A modified image is then reconstructed
from the patches, subject to those constraints. When no constraints are
given, the reconstruction problem reduces to solving a jigsaw puzzle.
Constraints the user may specify include the spatial locations of patches,
the size of the output image, or the pool of patches from which an image is
reconstructed. We define terms in a Markov network to specify a good image
reconstruction from patches: neighboring patches must fit to form a plausible
image, and each patch should be used only once. We find an approximate
solution to the Markov network using loopy belief propagation, introducing an
approximation to handle the combinatorially
difficult patch exclusion constraint. The resulting image reconstructions
show the original image, modified to respect the user's changes. We apply the
patch transform to various image editing tasks and show that the algorithm
performs well on real world images.
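The pairwise term of the Markov network rewards patches whose abutting boundaries match. A small sketch of one such compatibility table (the loopy-BP solver and the exclusion term are omitted):

    import numpy as np

    def left_right_cost(patches):
        # patches : (n, p, p) grayscale patches; cost[i, j] scores placing
        # patch j immediately to the right of patch i.
        n = len(patches)
        cost = np.zeros((n, n))
        for i in range(n):
            for j in range(n):
                seam = (patches[i][:, -1].astype(float)
                        - patches[j][:, 0].astype(float))
                cost[i, j] = np.sum(seam ** 2)
        return cost   # exp(-cost) would serve as the pairwise potential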
|
In
Defense of Nearest-Neighbor Based Image Classification
|
Oren Boiman, Eli Shechtman, Michal Irani – Weizmann
|
State-of-the-art image classification methods require an
intensive learning/training stage (using SVM, Boosting, etc.). In contrast, non-parametric
Nearest-Neighbor (NN) based image classifiers require no training time and
have other favorable properties. However, the large performance gap between
these two families of approaches rendered NN-based image classifiers
useless. We claim that the
effectiveness of non-parametric NN-based image classification has been
considerably under-valued. We argue that two practices commonly used in image
classification methods, have led to the inferior performance of NN-based
image classifiers:
(i) Quantization of local image
descriptors (used to generate "bags-of-words", codebooks).
(ii) Computation of 'Image-to-Image' distance, instead of
'Image-to-Class' distance.
We propose a trivial NN-based classifier, NBNN (Naive-Bayes Nearest-Neighbor), which employs NN-distances in
the space of the local image descriptors (and not in the space of images).
NBNN computes direct 'Image-to-Class' distances without descriptor
quantization. We further show that under the Naive-Bayes
assumption, the theoretically optimal image classifier can be accurately
approximated by NBNN. Although NBNN is
extremely simple, efficient, and requires no learning/training phase, its
performance ranks among the top leading learning-based image classifiers.
Empirical comparisons are shown on several challenging databases
(Caltech-101, Caltech-256, and Graz-01).
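The NBNN decision rule itself is a few lines: for each class, sum over the query's descriptors the distance to that class's nearest descriptor, and pick the class with the smallest total (a direct Image-to-Class distance, with no quantization). A minimal sketch:

    import numpy as np

    def nbnn_classify(query_descriptors, class_descriptors):
        # query_descriptors : (n, d) local descriptors of the test image
        # class_descriptors : dict label -> (m, d) descriptors pooled over all
        #                     training images of that class
        best, best_total = None, np.inf
        for label, D in class_descriptors.items():
            diffs = query_descriptors[:, None, :] - D[None, :, :]
            total = np.sum(np.min(np.sum(diffs ** 2, axis=2), axis=1))
            if total < best_total:
                best, best_total = label, total
        return best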
|