Abstracts
|
Predicting
Human Body Shape Under Clothing
|
Michael J. Black – Brown University
|
We propose a method to estimate the detailed 3D shape of
a person from images of that person wearing clothing. The approach exploits a model of human body
shapes that is learned from a database of over 2000 range scans. We show that the parameters of this shape
model can be recovered independently of body pose. We further propose a generalization of the
visual hull to account for the fact that observed silhouettes of clothed
people do not provide a tight bound on the true 3D shape. With clothed subjects, different poses
provide different constraints on the possible underlying 3D body shape. We consequently combine constraints across
pose to more accurately estimate 3D body shape in the presence of occluding
clothing. Finally we use the recovered
3D shape to estimate the gender of subjects and then employ gender-specific
body models to refine our shape estimates.
Results on a novel database of thousands of images of
clothed and "naked" subjects, as well as sequences from the HumanEva dataset, suggest the method may be accurate
enough for biometric shape analysis in video.
This is joint work with Alexandru
Balan.
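Below is a minimal numpy sketch of the silhouette-carving idea behind the visual hull, assuming known camera projection matrices; the function names are illustrative, and the pose normalization that the learned body model provides is abstracted away.

    import numpy as np

    def carve_hull(voxels, cameras, silhouettes):
        # voxels      : (N, 3) array of voxel centers
        # cameras     : list of (3, 4) projection matrices (assumed known)
        # silhouettes : list of boolean (H, W) masks, one per view
        X = np.hstack([voxels, np.ones((len(voxels), 1))])  # homogeneous coords
        inside = np.ones(len(voxels), dtype=bool)
        for P, sil in zip(cameras, silhouettes):
            uvw = X @ P.T
            u = np.round(uvw[:, 0] / uvw[:, 2]).astype(int)
            v = np.round(uvw[:, 1] / uvw[:, 2]).astype(int)
            ok = (u >= 0) & (u < sil.shape[1]) & (v >= 0) & (v < sil.shape[0])
            hit = np.zeros(len(voxels), dtype=bool)
            hit[ok] = sil[v[ok], u[ok]]
            inside &= hit          # must project inside this view's silhouette
        return inside

    # Clothing makes every silhouette a loose upper bound on the body, so
    # constraints from several poses are intersected (after bringing the body
    # to a common pose): body = hull(pose 1) & hull(pose 2) & ...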
|
DNA-based
visual identification
|
Lior Wolf, Yoni Donner - TAU
|
The appearance of an animal species is a complex
phenotype partially encoded in its genome. Previous work on linking genotype
to visually-identifiable phenotypes has focused on univariate
or low-dimensional traits such as eye color, principal variations in skeletal
structure and height, as well as on the discovery of specific genes that
contribute to these traits. Here, we go beyond single traits to the direct
genotype-phenotype analysis of photographs and illustrations of animal
species. We address the problems of (1) identification and (2) synthesis of
images of previously unseen animals using genetic data. We demonstrate that
both these problems are feasible: in a multiple choice test, our algorithm
identifies with high accuracy the correct image of previously unseen dogs,
fish, birds and ants, based only on either a short gene sequence or
microsatellite data; additionally, using the same sequence we are able to
approximate the contour images of unseen fish. Our predictions are based on
correlative phenotype-genotype links rather than on specific gene targeting,
and they employ microsatellite data and the cytochrome
c oxidase I mitochondrial gene, both of which are
assumed to have little causal influence on appearance. Such correlative links
enable the use of high-dimensional phenotypes in genetic research, and applications
may range from forensics to personalized medical treatment.
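The abstract leaves the predictor unspecified; as one hedged illustration of a purely correlative genotype-to-appearance link, a ridge regression from genetic features to image features already supports the multiple-choice test (the regression choice and all names below are assumptions, not the authors' method).

    import numpy as np

    def fit_genotype_to_image_map(G, V, lam=1.0):
        # G : (n, dg) genotype features (e.g., encoded microsatellite data)
        # V : (n, dv) visual features of the corresponding species images
        dg = G.shape[1]
        return np.linalg.solve(G.T @ G + lam * np.eye(dg), G.T @ V)

    def multiple_choice(g, candidate_features, W):
        # Pick the candidate image whose features lie closest to the
        # appearance predicted from the gene sequence alone.
        pred = g @ W
        return int(np.argmin(np.linalg.norm(candidate_features - pred, axis=1)))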
|
Descriptor
Based Methods in the Wild
|
Tal Hassner, Lior
Wolf, Yaniv Taigman –
Open University
|
Recent methods for
learning the similarities between images have presented impressive results on
the problem of pair-matching (same-not-same classification) of face images.
In this talk we present pair-matching results comparing the performance of
image descriptor based methods to the state of the art in same/not-same
classification, obtained on the Labeled Faces in the Wild (LFW) image set. We
propose various contributions, spanning several aspects of automatic face
analysis: (i) We present a family of novel image
descriptors which we call the "patch-LBP" descriptors. (ii) We show
that descriptor based methods can obtain performance which is comparable to
existing state of the art methods on both the same-not-same and multi-person
recognition problems. (iii) We present the novel "One-Shot" vector
similarity measure which we have used to improve our same-not-same results
well above leading methods.
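A sketch of the One-Shot similarity idea in an LDA-style form, assuming a fixed "background" set of descriptors belonging to neither identity; the regularization constant is illustrative.

    import numpy as np

    def one_shot_similarity(x, y, negatives):
        # negatives : (n, d) background descriptors unrelated to x and y
        mu = negatives.mean(axis=0)
        cov = np.cov(negatives, rowvar=False) + 1e-6 * np.eye(len(mu))
        cov_inv = np.linalg.inv(cov)

        def side(a, b):
            w = cov_inv @ (a - mu)            # LDA direction: {a} vs negatives
            return w @ (b - (a + mu) / 2.0)   # signed score of b at the midpoint

        return 0.5 * (side(x, y) + side(y, x))  # symmetrize over the pair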
|
Improved
Seam Carving for Video Retargeting
|
Arik Shamir, Michael Rubinstein, Shai Avidan - IDC
|
Video, like images, should support content aware
resizing. We present video retargeting using an improved seam carving
operator. Instead of removing 1D seams from 2D images we remove 2D seam
manifolds from 3D space-time volumes. To achieve this we replace the dynamic
programming method of seam carving with graph cuts that are suitable for 3D
volumes. In the new formulation, a seam is given by a minimal cut in the
graph and we show how to construct a graph such that the resulting cut is a
valid seam. That is, the cut is monotonic and connected. In addition, we
present a novel energy criterion that improves the visual quality of the
retargeted images and videos. The original seam carving operator is focused
on removing seams with the least amount of energy, ignoring energy that is
introduced into the images and video by applying the operator. To counter
this, the new criterion looks forward in time, removing seams that
introduce the least amount of energy into the retargeted result. We show how
to encode the improved criterion into graph cuts (for images and video) as
well as dynamic programming (for images). We apply our technique to images
and videos and present results of various applications.
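For the image case the abstract mentions, the forward-looking criterion fits in a short dynamic program: instead of removing the cheapest existing gradients, each step is charged for the new pixel adjacencies the removal creates. A minimal grayscale sketch (borders wrap for brevity; the graph-cut video formulation is omitted):

    import numpy as np

    def forward_energy_seam(gray):
        I = gray.astype(float)
        h, w = I.shape
        M = np.zeros((h, w))                  # cumulative forward energy
        parent = np.zeros((h, w), dtype=int)
        for i in range(1, h):
            left, right = np.roll(I[i], 1), np.roll(I[i], -1)
            cu = np.abs(right - left)            # seam continues straight up
            cl = cu + np.abs(I[i - 1] - left)    # seam comes from the left
            cr = cu + np.abs(I[i - 1] - right)   # seam comes from the right
            options = np.vstack([np.roll(M[i - 1], 1) + cl,
                                 M[i - 1] + cu,
                                 np.roll(M[i - 1], -1) + cr])
            parent[i] = options.argmin(axis=0) - 1  # parent offset: -1, 0, +1
            M[i] = options.min(axis=0)
        j = int(M[-1].argmin())
        seam = [j]
        for i in range(h - 1, 0, -1):            # trace back the cheapest seam
            j = int(np.clip(j + parent[i, j], 0, w - 1))
            seam.append(j)
        return seam[::-1]                        # one column index per row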
|
Facial
Gesture Analysis in an Interactive Environment
|
Gerard Medioni - USC
|
Facial gesture analysis is an important problem in
computer vision. Facial gestures carry critical information in nonverbal
communication. The difficulty of automatic facial gesture recognition lies in
the complexity of face motions. These motions can be categorized into a
global, rigid head motion, and local, nonrigid
facial deformations. These two components are coupled in an observed facial
motion.
We present our recent research on this topic, which
includes tracking and modeling these two motions for gesture understanding.
It can be divided into three parts: 3D head pose estimation, modeling and
tracking nonrigid facial deformations, and
expression recognition. We have developed a novel hybrid 3D head tracking
algorithm to differentiate these two motions. The hybrid tracker integrates
both intensity and feature correspondence for robust real-time head pose
estimation. Nonrigid motions are analyzed in 3D by
manifold learning techniques. We decompose nonrigid
facial deformations on a basis of 1D manifolds. Each 1D manifold is learned
offline from sequences of labeled basic expressions, such as smile, surprise,
etc. Any expression is then a linear combination of values along these axes,
with the coefficient representing the level of activation. Manifold learning
is accomplished using N-D Tensor Voting. The output of our system is a rich
representation of the face, including the 3D pose, 3D shape, expression label
with probability, and the activation level.
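Once the 1D manifolds are learned, reading off activation levels is a small projection step. A linearized sketch, assuming each learned manifold is summarized by one direction in deformation space (the actual manifolds are learned with N-D Tensor Voting):

    import numpy as np

    def expression_activations(deformation, axes):
        # deformation : (d,) nonrigid facial deformation for the current frame
        # axes        : (k, d) one direction per basic expression (smile, ...)
        coeffs, *_ = np.linalg.lstsq(axes.T, deformation, rcond=None)
        return coeffs   # activation level of each basic expression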
|
Homography
Based Multiple Camera Detection and Tracking of People in a Dense Crowd
|
Ran Eshel, Yael Moses - IDC
|
Tracking people in a dense crowd is a challenging problem
for a single camera tracker due to occlusions and extensive motion that make
human segmentation difficult. In this work we suggest a method for
simultaneously tracking all the people in a densely crowded scene using a set
of cameras with overlapping fields of view. To overcome occlusions, the
cameras are placed at a high elevation and only people's heads are tracked.
Head detection is still difficult since each foreground region may consist of
multiple subjects. By combining data from several views, height information
is extracted and used for head segmentation. The head tops, which are regarded
as 2D patches at various heights, are detected by applying intensity
correlation to aligned frames from the different cameras. The detected head
tops are then tracked using common assumptions on motion direction and
velocity. The method was tested on sequences in indoor and outdoor
environments under challenging illumination conditions. It was successful in
tracking up to 21 people walking in a small area (2.5 people per m^2), in
spite of severe and persistent occlusions.
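A sketch of the alignment-and-agreement step, assuming the homographies induced by the plane at a candidate head height are available from calibration; variance across the aligned views stands in for the paper's intensity correlation.

    import cv2
    import numpy as np

    def head_score_map(frames, homographies):
        # frames       : grayscale views captured at the same time instant
        # homographies : 3x3 maps induced by the plane at candidate height h
        h0, w0 = frames[0].shape
        warped = [cv2.warpPerspective(f, H, (w0, h0)).astype(float)
                  for f, H in zip(frames, homographies)]
        stack = np.stack(warped)
        # Points actually lying on the height-h plane align across views, so
        # their intensities agree; low variance marks candidate head tops.
        return -stack.var(axis=0)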
|
What
is a Good Image Segment? A Unified
Approach to Segment Extraction
|
Shai Bagon, Oren Boiman,
Michal Irani - Weizmann
|
There is a huge diversity of definitions of
"visually meaningful" image segments, ranging from simple uniformly
colored segments, textured segments, through symmetric patterns, and up to
complex semantically meaningful objects. This diversity has led to a wide
range of different approaches for image segmentation. In this paper we
present a single unified framework for addressing this problem - "Segmentation by Composition". We
define a good image segment as one which can be easily composed using its own
pieces, but is difficult to compose using pieces from other parts of the
image. This non-parametric approach captures a large diversity of segment
types, yet requires no pre-definition or modeling of segment types, nor prior training. Based on this
definition, we develop a segment extraction algorithm - i.e., given a single
point-of-interest, provide the "best" image segment containing that
point. This induces a figure-ground image segmentation, which applies to a
range of different segmentation tasks: single image segmentation,
simultaneous co-segmentation of several images, and class-based
segmentations.
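A toy version of the definition, with nearest-neighbor patch distance standing in for the paper's composition likelihood (the descriptor choice and names are assumptions):

    import numpy as np

    def composition_score(patches_in, patches_out, eps=1e-8):
        # patches_in  : (n, d) descriptors inside the candidate segment (n >= 2)
        # patches_out : (m, d) descriptors from the rest of the image
        score = 0.0
        for i, p in enumerate(patches_in):
            others = np.delete(patches_in, i, axis=0)   # leave-one-out
            d_self = np.min(np.linalg.norm(others - p, axis=1)) + eps
            d_rest = np.min(np.linalg.norm(patches_out - p, axis=1)) + eps
            score += np.log(d_rest / d_self)
        # Large positive: easy to compose from its own pieces, hard from
        # elsewhere, i.e. a "good" segment under the paper's definition.
        return score / len(patches_in)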
|
Loose
Shape Model for Discriminative Learning of Object Categories
|
Margarita Osadchy – Haifa
|
We consider the problem of visual
categorization with minimal supervision during training. We propose a part-based
model that loosely captures structural information. We represent images as a
collection of parts characterized by an appearance codeword from a visual
vocabulary and by a neighborhood context, organized in an ordered set of
bag-of-features representations. These bags are computed in local overlapping
areas around the part. A semantic distance between images is obtained by
matching parts associated with the same codeword using their context
distributions. The classification is
done using an SVM with a kernel obtained from the proposed dissimilarity
measure. The experiments show that our method outperforms all the
classification methods from the PASCAL challenge on half of the VOC2006
categories and has the best average EER.
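A compact sketch of how such a dissimilarity could feed an SVM, assuming each part is summarized by its codeword and a context histogram (the ordered, overlapping bags are collapsed to one histogram here for brevity):

    import numpy as np

    def semantic_distance(parts_a, parts_b):
        # parts_a, parts_b : dict codeword -> context histogram (sums to 1)
        shared = set(parts_a) & set(parts_b)
        if not shared:
            return 1.0                    # no matching parts: maximal distance
        d = 0.0
        for c in shared:
            p, q = parts_a[c], parts_b[c]
            d += 0.5 * np.sum((p - q) ** 2 / (p + q + 1e-8))  # chi-square
        return d / len(shared)

    def kernel(parts_a, parts_b, gamma=1.0):
        # Exponentiating the dissimilarity yields an SVM-ready kernel value.
        return np.exp(-gamma * semantic_distance(parts_a, parts_b))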
|
Industry
Session
|
Organized by Chen Sagiv
|
1. Igal Dvir, CTO, Nice Vision
TBA
2. Challenges and Solutions for Bundling
Multiple DAS Applications on a Single Hardware Platform - Gideon Stein, Chief
Scientist, MobilEye
Joint work with: Itay Gat, Gaby Hayon
This talk addresses the key challenges in bundling multiple camera
based Driver Assistance Systems onto the same hardware platform. In
particular, we discuss combinations of lane departure warning (LDW),
Automatic High-beam Control (AHC), traffic sign recognition (TSR) and forward
collision warning (FCW). The advantages of bundling are in cost reduction and
that it allows more functions to be added to the car without increasing the
footprint on the car windshield.
The challenge in bundling is that the different applications
traditionally have different requirements from the image sensor and optics.
We show how algorithms can be modified so that they can all work together by
relying less on the particular physics of the camera and making more use of
advanced pattern recognition techniques.
This shift in algorithm paradigm means an increase in computational
requirements. The introduction of new automotive-qualified, high-performance
vision processors makes these new algorithms both viable and affordable,
paving the way to bundles of applications running on the same platform.
3.
Automatic Image Enhancement - Renato Keshet, Project Leader, HP Labs
This talk presents HIPIE (HP Indigo Photo Image Enhancement), a robust system
that analyzes and automatically enhances images as part of a commercial photobook pipeline. The system can sharpen, denoise, enhance global and local contrast, brighten,
enhance face contrast, boost color and improve resolution, as needed on a
per-image basis (as determined by a series of image analysis modules), all in a
couple of seconds per photo. The system, developed mostly at HP Labs in Haifa, is deployed in many print shops in Israel and
around the world, and works 24/7.
In this presentation, we briefly describe
the technology behind the system, highlighting the analysis modules and the
unified enhancement scheme. We also mention our view of some of the research
challenges for this and other image-related areas in the future.
4. Shai Dekel, Chief Scientist, Imaging Solutions, GE Healthcare
Pathology is the study of diseases by
examining body tissues, typically under magnification. Today, pathologists
use a microscope to look at slides of tissue samples that have been prepared
with stains by a specialist called a histotechnician.
This process has not changed much in over 100 years. However, there is an
emerging movement toward digital pathology called "whole slide
imaging" in which entire slides are digitally scanned so that they can
be viewed on a computer. Note that one uncompressed digital slide scanned at
high resolution can reach a size of 30GB and a typical patient case can
contain 5-30 such slides. The digital representation of the slides motivates
the development of new image analysis algorithms that can potentially assist
the pathologists in their review process.
|
Single
Image Dehazing
|
Raanan Fattal –
HUJI
|
In this talk we present a new method for
estimating the optical transmission in hazy scenes given a single input
image. Based on this estimation, the scattered light is eliminated to
increase scene visibility and recover haze-free scene contrasts. In this new
approach we formulate a refined image formation model that accounts for
surface shading in addition to the transmission function. This allows us to
resolve ambiguities in the data by searching for a solution in which the
resulting shading and transmission functions are locally statistically
uncorrelated. A similar principle is used to estimate the color of the haze.
Results demonstrate the new method's ability to remove the haze layer as well
as to provide a reliable transmission estimate, which can be used for additional
applications such as image refocusing and novel view synthesis.
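The image formation model being refined here is the standard haze equation I(x) = t(x)J(x) + (1 - t(x))A, with transmission t and airlight A. Once t is estimated (the talk's contribution, not shown), recovery is a per-pixel inversion; a minimal sketch:

    import numpy as np

    def recover_scene(I, t, A, t_min=0.1):
        # I : (H, W, 3) hazy image in [0, 1]
        # t : (H, W) estimated transmission
        # A : (3,)  estimated airlight color
        t = np.clip(t, t_min, 1.0)[..., None]  # avoid amplifying noise as t -> 0
        return (I - A) / t + A                 # invert I = t*J + (1 - t)*A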
|
Fenchel
Duality with Applications to Inference in Graphical Models
|
Amnon Shashua,
Tamir Hazan - HUJI
|
Quite a number of problems involving inference from data,
whether visual data or otherwise, fall into the category of optimization. I
will describe a general scheme for message passing update rules based on the
framework of Fenchel duality.
Using the framework we derive known inference
algorithms, such as sum-product and max-product Belief Propagation, as well as
new convergent algorithms for maximum-a-posteriori (MAP) and marginal
estimation using "convex free energies".
|
Unsupervised
estimation of segmentation quality using nonnegative factorization
|
Roman Sandler, Michael
Lindenbaum - Technion
|
We propose an unsupervised method for evaluating image
segmentation. Common methods are typically based on evaluating smoothness
within segments and contrast between them, and the measure they provide is
not explicitly related to segmentation errors. The proposed approach differs
from these methods on several important points and has several advantages
over them. First, it provides a meaningful, quantitative assessment of
segmentation quality, in precision/recall terms, which until now were
applicable only to supervised evaluation. Second, it builds on a new image model, which
characterizes the segments as a mixture of basic feature distributions. The
precision/recall estimates are then obtained by a nonnegative matrix
factorization (NMF) process. A third important advantage is that the
estimates, which are based on intrinsic properties of the specific image
being evaluated and not on a comparison to typical images (learning), are
relatively robust to context factors such as image quality or the presence of
texture.
Experimental results demonstrate the accuracy of the
precision/recall estimates in comparison to ground truth based on human
judgment. Moreover, it is shown that tuning a segmentation algorithm using
the unsupervised measure improves the algorithm’s quality (as measured by a
supervised method).
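The factorization step itself is compact; a hedged sketch using scikit-learn's NMF, with per-segment feature histograms as input (how the precision/recall estimates are read off the factors follows the paper and is not reproduced here):

    import numpy as np
    from sklearn.decomposition import NMF

    def segment_mixtures(segment_histograms, n_basics=2):
        # segment_histograms : (n_segments, n_bins), each row sums to 1
        model = NMF(n_components=n_basics, init='nndsvd', max_iter=500)
        W = model.fit_transform(segment_histograms)  # mixing weights per segment
        H = model.components_                        # basic feature distributions
        return W, H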
|
Removal
of Turbulence Disturbance in a Movie Scene for Static and Moving Camera,
Enabling Vision Applications
|
Tomer Avidor, Moty Golan -
RAFAEL
|
The common method of reconstructing a turbulence-degraded scene is
through the creation of an artificial reference image. The reference image is
usually obtained by averaging the video over time. Optical flow computed from
that reference image to the input images then enables applications such as
super-resolution and tracking.
However, this technique suffers from several drawbacks: the resulting
artificial reference frame is blurred, so the calculated optical-flow fields
are imprecise and degrade the applications built on them, and there is no
accounting for camera motion or for motion within the scene.
We present a mathematical framework to reconstruct the scene as it would have
been seen without turbulence interference, yielding an observable live video
output. We then use both the frames and the optical-flow fields to obtain the
aforementioned applications (tracking, super-resolution, mosaics) while
dealing with camera motion, and we outline guidelines for handling in-scene
motion inherently.
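For concreteness, the baseline the talk improves on can be sketched in a few lines with OpenCV (a static camera and a hypothetical frame buffer are assumed; the talk's framework replaces this blurred-reference scheme):

    import cv2
    import numpy as np

    def baseline_stabilize(frames, k):
        # frames : list of grayscale uint8 frames; k : frame index to de-warp
        ref = np.mean(np.stack(frames), axis=0).astype(np.uint8)  # blurred ref
        flow = cv2.calcOpticalFlowFarneback(ref, frames[k], None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        h, w = ref.shape
        xs, ys = np.meshgrid(np.arange(w), np.arange(h))
        map_x = (xs + flow[..., 0]).astype(np.float32)
        map_y = (ys + flow[..., 1]).astype(np.float32)
        # Pull pixels back along the estimated turbulence displacement field.
        return cv2.remap(frames[k], map_x, map_y, cv2.INTER_LINEAR)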
|
The
patch transform and its applications to image editing
|
Shai Avidan, Taeg Sang Cho, Moshe Butman,
Bill Freeman – Adobe
|
We introduce the patch transform, where an image is
broken into non-overlapping patches, and modifications or constraints are
applied in the "patch domain". A modified image is then reconstructed
from the patches, subject to those constraints. When no constraints are
given, the reconstruction problem reduces to solving a jigsaw puzzle.
Constraints the user may specify include the spatial locations of patches,
the size of the output image, or the pool of patches from which an image is
reconstructed. We define terms in a Markov network to specify a good image
reconstruction from patches: neighboring patches must fit to form a plausible
image, and each patch should be used only once. We find an approximate
solution to the Markov network using loopy belief propagation, introducing an
approximation to handle the combinatorially
difficult patch exclusion constraint. The resulting image reconstructions
show the original image, modified to respect the user's changes. We apply the
patch transform to various image editing tasks and show that the algorithm
performs well on real world images.
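The pairwise term of the Markov network rewards patches whose abutting boundaries match. A small sketch of one such compatibility table (the loopy-BP solver and the exclusion term are omitted):

    import numpy as np

    def left_right_cost(patches):
        # patches : (n, p, p) grayscale patches; cost[i, j] scores placing
        # patch j immediately to the right of patch i.
        n = len(patches)
        cost = np.zeros((n, n))
        for i in range(n):
            for j in range(n):
                seam = (patches[i][:, -1].astype(float)
                        - patches[j][:, 0].astype(float))
                cost[i, j] = np.sum(seam ** 2)
        return cost   # exp(-cost) would serve as the pairwise potential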
|
In
Defense of Nearest-Neighbor Based Image Classification
|
Oren Boiman, Eli Shechtman, Michal Irani – Weizmann
|
State-of-the-art image classification methods require an
intensive learning/training stage (using SVM, Boosting, etc.). In contrast, non-parametric
Nearest-Neighbor (NN) based image classifiers require no training time and
have other favorable properties. However, the large performance gap between
these two families of approaches rendered NN-based image classifiers
useless. We claim that the
effectiveness of non-parametric NN-based image classification has been
considerably under-valued. We argue that two practices commonly used in image
classification methods, have led to the inferior performance of NN-based
image classifiers:
(i) Quantization of local image
descriptors (used to generate "bags-of-words", codebooks).
(ii) Computation of 'Image-to-Image' distance, instead of
'Image-to-Class' distance.
We propose a trivial NN-based classifier, NBNN (Naive-Bayes Nearest-Neighbor), which employs NN-distances in
the space of the local image descriptors (and not in the space of images).
NBNN computes direct 'Image-to-Class' distances without descriptor
quantization. We further show that under the Naive-Bayes
assumption, the theoretically optimal image classifier can be accurately
approximated by NBNN. Although NBNN is
extremely simple, efficient, and requires no learning/training phase, its
performance ranks among the top leading learning-based image classifiers.
Empirical comparisons are shown on several challenging databases
(Caltech-101, Caltech-256, and Graz-01).
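The NBNN decision rule itself is a few lines: for each class, sum over the query's descriptors the distance to that class's nearest descriptor, and pick the class with the smallest total (a direct Image-to-Class distance, with no quantization). A minimal sketch:

    import numpy as np

    def nbnn_classify(query_descriptors, class_descriptors):
        # query_descriptors : (n, d) local descriptors of the test image
        # class_descriptors : dict label -> (m, d) descriptors pooled over all
        #                     training images of that class
        best, best_total = None, np.inf
        for label, D in class_descriptors.items():
            diffs = query_descriptors[:, None, :] - D[None, :, :]
            total = np.sum(np.min(np.sum(diffs ** 2, axis=2), axis=1))
            if total < best_total:
                best, best_total = label, total
        return best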
|