Abstracts
 
 
Predicting Human Body Shape Under Clothing
 
Michael J. Black – Brown University
 
We propose a method to estimate the detailed 3D shape of a person from images of that person wearing clothing. The approach exploits a model of human body shapes that is learned from a database of over 2000 range scans. We show that the parameters of this shape model can be recovered independently of body pose. We further propose a generalization of the visual hull to account for the fact that observed silhouettes of clothed people do not provide a tight bound on the true 3D shape. With clothed subjects, different poses provide different constraints on the possible underlying 3D body shape. We consequently combine constraints across pose to more accurately estimate 3D body shape in the presence of occluding clothing. Finally, we use the recovered 3D shape to estimate the gender of subjects and then employ gender-specific body models to refine our shape estimates. Results on a novel database of thousands of images of clothed and "naked" subjects, as well as sequences from the HumanEva dataset, suggest the method may be accurate enough for biometric shape analysis in video. This is joint work with Alexandru Balan.
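
The generalized visual hull described above suggests an asymmetric silhouette term: the body may lie strictly inside the observed silhouette of a clothed person, but must not protrude outside it. Below is a minimal Python sketch of such a term, combined across poses. The `project` routine, which would render the parametric body model into a view, is a hypothetical placeholder; this is an illustration of the idea, not the authors' implementation.

    import numpy as np

    def clothing_aware_silhouette_cost(body_points_2d, silhouette_mask):
        """Asymmetric silhouette term (sketch): penalize projected body
        points that fall OUTSIDE the observed silhouette, but do not
        require the body to fill it, since clothing may inflate the
        observed silhouette beyond the true body."""
        h, w = silhouette_mask.shape
        cost = 0.0
        for x, y in np.round(body_points_2d).astype(int):
            if 0 <= y < h and 0 <= x < w and silhouette_mask[y, x]:
                continue          # inside the silhouette: no penalty
            cost += 1.0           # outside: body cannot protrude past cloth
        return cost

    def total_cost(shape_params, poses, masks, project):
        # Combining constraints across poses: sum the per-pose costs so a
        # single shape parameter vector must be consistent with every pose.
        # `project` (hypothetical) renders the body model for a given
        # shape/pose and returns projected 2D surface points.
        return sum(clothing_aware_silhouette_cost(project(shape_params, p), m)
                   for p, m in zip(poses, masks))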
 
DNA-based Visual Identification
 
Lior Wolf, Yoni Donner – TAU
 
The appearance of an animal species is a complex phenotype partially encoded in its genome. Previous work on linking genotype to visually identifiable phenotypes has focused on univariate or low-dimensional traits such as eye color, principal variations in skeletal structure and height, as well as on the discovery of specific genes that contribute to these traits. Here, we go beyond single traits to the direct genotype-phenotype analysis of photographs and illustrations of animal species. We address the problems of (1) identification and (2) synthesis of images of previously unseen animals using genetic data. We demonstrate that both these problems are feasible: in a multiple choice test, our algorithm identifies with high accuracy the correct image of previously unseen dogs, fish, birds and ants, based only on either a short gene sequence or microsatellite data; additionally, using the same sequence we are able to approximate images of unseen fish contours. Our predictions are based on correlative phenotype-genotype links rather than on specific gene targeting, and they employ microsatellite data and the cytochrome c oxidase I mitochondrial gene, both of which are assumed to have little causal influence on appearance. Such correlative links enable the use of high-dimensional phenotypes in genetic research, and applications may range from forensics to personalized medical treatment.
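
As a concrete, hypothetical illustration of a correlative genotype-to-phenotype link of the kind described above, one can regress image features on genetic features and answer the multiple-choice test by nearest predicted phenotype. The sketch below uses ridge regression as the link; the talk's actual model is not specified here.

    import numpy as np
    from sklearn.linear_model import Ridge

    def fit_link(G, P):
        """Train a correlative genotype->phenotype link on known species.
        G: (n_species, n_genetic_features), e.g. an encoded COI sequence
        or microsatellite data; P: (n_species, n_image_features)."""
        model = Ridge(alpha=1.0)
        model.fit(G, P)
        return model

    def multiple_choice(model, g_query, candidate_images):
        """Predict image features from the gene sequence, then pick the
        candidate image whose features are nearest to the prediction."""
        p_hat = model.predict(g_query[None, :])[0]
        dists = [np.linalg.norm(p_hat - c) for c in candidate_images]
        return int(np.argmin(dists))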
 
Descriptor Based Methods in the Wild
 
Tal Hassner, Lior Wolf, Yaniv Taigman – Open University
 
Recent methods for learning the similarities between images have presented impressive results on the problem of pair-matching (same/not-same classification) of face images. In this talk we present pair-matching results comparing the performance of image descriptor based methods to the state of the art in same/not-same classification, obtained on the Labeled Faces in the Wild (LFW) image set. We propose various contributions, spanning several aspects of automatic face analysis: (i) We present a family of novel image descriptors which we call the "patch-LBP" descriptors. (ii) We show that descriptor based methods can obtain performance comparable to existing state-of-the-art methods on both the same/not-same and multi-person recognition problems. (iii) We present the novel "One-Shot" vector similarity measure, which we have used to improve our same/not-same results to well above those of leading methods.
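
The One-Shot Similarity idea can be sketched as follows: to compare two face descriptors, train a discriminative model using one of them as the single positive example against a fixed background set of faces of other people, score the second descriptor with it, and symmetrize. The LDA-based closed form below is one common instantiation and is my assumption here, not necessarily the exact variant used in the talk.

    import numpy as np

    def one_shot_similarity(x_i, x_j, background):
        """One-Shot Similarity (sketch, LDA version).  `background` is an
        (n, d) array of face descriptors of other people, used only as
        negative examples."""
        mu = background.mean(axis=0)
        centered = background - mu
        # within-class scatter of the background set (+ small ridge)
        Sw = centered.T @ centered / len(background)
        Sw += 1e-6 * np.eye(Sw.shape[0])
        Sw_inv = np.linalg.inv(Sw)

        def half(pos, probe):
            w = Sw_inv @ (pos - mu)          # LDA direction for 1-vs-background
            midpoint = (pos + mu) / 2.0      # decision threshold
            return w @ (probe - midpoint)    # signed score for the probe

        return half(x_i, x_j) + half(x_j, x_i)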
 
Improved Seam Carving for Video Retargeting
 
Arik Shamir, Michael Rubinstein, Shai Avidan – IDC
 
Video, like images, should support content-aware resizing. We present video retargeting using an improved seam carving operator. Instead of removing 1D seams from 2D images, we remove 2D seam manifolds from 3D space-time volumes. To achieve this we replace the dynamic programming method of seam carving with graph cuts that are suitable for 3D volumes. In the new formulation, a seam is given by a minimal cut in the graph, and we show how to construct a graph such that the resulting cut is a valid seam, that is, monotonic and connected. In addition, we present a novel energy criterion that improves the visual quality of the retargeted images and videos. The original seam carving operator focuses on removing seams with the least amount of energy, ignoring energy that is introduced into the images and video by applying the operator. To counter this, the new criterion looks forward in time, removing the seams that introduce the least amount of energy into the retargeted result. We show how to encode the improved criterion into graph cuts (for images and video) as well as dynamic programming (for images). We apply our technique to images and videos and present results of various applications.
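
For the image case, the forward-looking criterion admits a simple dynamic programming form: each candidate seam step is charged the intensity differences newly created between pixels that become neighbors once the seam is removed. A minimal sketch for one vertical seam (grayscale image; a plain loop for clarity, not the paper's code):

    import numpy as np

    def forward_energy_seam(gray):
        """Forward-energy DP for one vertical seam (sketch): charge each
        seam the new gradients its removal introduces, rather than the
        energy of the removed pixels themselves."""
        h, w = gray.shape
        I = gray.astype(np.float64)
        M = np.zeros((h, w))
        back = np.zeros((h, w), dtype=int)          # backtracking pointers
        for i in range(1, h):
            for j in range(w):
                jl, jr = max(j - 1, 0), min(j + 1, w - 1)
                cU = abs(I[i, jr] - I[i, jl])           # come from above
                cL = cU + abs(I[i - 1, j] - I[i, jl])   # come from up-left
                cR = cU + abs(I[i - 1, j] - I[i, jr])   # come from up-right
                cands = [M[i - 1, jl] + cL, M[i - 1, j] + cU, M[i - 1, jr] + cR]
                k = int(np.argmin(cands))
                M[i, j] = cands[k]
                back[i, j] = (jl, j, jr)[k]
        # backtrack the minimal seam from the bottom row
        seam = np.zeros(h, dtype=int)
        seam[-1] = int(np.argmin(M[-1]))
        for i in range(h - 2, -1, -1):
            seam[i] = back[i + 1, seam[i + 1]]
        return seam

The video version described in the abstract replaces this DP with a graph cut over the space-time volume, with the same forward-energy costs on the cut arcs.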
 
Facial Gesture Analysis in an Interactive Environment
 
Gerard Medioni – USC
 
Facial gesture analysis is an important problem in computer vision. Facial gestures carry critical information in nonverbal communication. The difficulty of automatic facial gesture recognition lies in the complexity of face motions. These motions can be categorized into global, rigid head motion and local, nonrigid facial deformations; the two components are coupled in an observed facial motion. We present our recent research on this topic, which includes tracking and modeling these two motions for gesture understanding. The work divides into three parts: 3D head pose estimation, modeling and tracking nonrigid facial deformations, and expression recognition. We have developed a novel hybrid 3D head tracking algorithm to differentiate these two motions. The hybrid tracker integrates both intensity and feature correspondence for robust real-time head pose estimation. Nonrigid motions are analyzed in 3D by manifold learning techniques. We decompose nonrigid facial deformations on a basis of 1D manifolds. Each 1D manifold is learned offline from sequences of labeled basic expressions, such as smile, surprise, etc. Any expression is then a linear combination of values along these axes, with the coefficients representing the levels of activation. Manifold learning is accomplished using N-D Tensor Voting. The output of our system is a rich representation of the face, including the 3D pose, 3D shape, expression label with probability, and the activation level.
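
The "linear combination of values along these axes" readout can be illustrated with a simple least-squares decomposition. The sketch below linearizes each learned 1D manifold into a single deformation direction, which is a simplification I am assuming for illustration; the talk's manifolds are learned with N-D Tensor Voting and are not linear in general.

    import numpy as np

    def expression_activations(deformation, B):
        """Least-squares decomposition of an observed nonrigid facial
        deformation (d,) into activation levels along k expression axes.
        B: (d, k) matrix whose columns summarize the learned 1D
        expression manifolds as deformation directions (a linearization)."""
        coeffs, *_ = np.linalg.lstsq(B, deformation, rcond=None)
        return coeffs  # coeffs[i] = activation level of expression i

    def expression_label(deformation, B, names):
        a = expression_activations(deformation, B)
        probs = np.abs(a) / (np.abs(a).sum() + 1e-12)  # crude normalization
        return names[int(np.argmax(probs))], probs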
 
Homography Based Multiple Camera Detection and Tracking of People in a Dense Crowd
 
Ran Eshel, Yael Moses – IDC
 
Tracking people in a dense crowd is a challenging problem for a single camera tracker due to occlusions and extensive motion that make human segmentation difficult. In this work we suggest a method for simultaneously tracking all the people in a densely crowded scene using a set of cameras with overlapping fields of view. To overcome occlusions, the cameras are placed at a high elevation and only people's heads are tracked. Head detection is still difficult, since each foreground region may consist of multiple subjects. By combining data from several views, height information is extracted and used for head segmentation. The head tops, which are regarded as 2D patches at various heights, are detected by applying intensity correlation to aligned frames from the different cameras. The detected head tops are then tracked using common assumptions on motion direction and velocity. The method was tested on sequences in indoor and outdoor environments under challenging illumination conditions. It was successful in tracking up to 21 people walking in a small area (2.5 people per m^2), in spite of severe and persistent occlusions.
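
The alignment step can be sketched as follows: for a hypothesized head height h, frames from the other cameras are warped onto the reference view using the homography induced by the plane at height h (precomputed from calibration, and assumed given below); pixels that actually lie on head tops at that height then align across views. The sketch uses variance across views as a simple stand-in for the intensity correlation used in the work.

    import cv2
    import numpy as np

    def head_top_response(ref_frame, other_frames, homographies_at_h, patch=9):
        """Score each pixel of the reference view for being a head top at
        height h.  `homographies_at_h` (assumed precomputed from camera
        calibration) maps each other view onto the reference view for the
        plane at height h."""
        aligned = [cv2.warpPerspective(f, H, ref_frame.shape[1::-1])
                   for f, H in zip(other_frames, homographies_at_h)]
        stack = np.stack([ref_frame.astype(np.float32)] +
                         [a.astype(np.float32) for a in aligned])
        # low variance across views ~ high intensity agreement at height h
        score = -stack.var(axis=0)
        # aggregate over a small patch, since head tops are 2D patches
        return cv2.boxFilter(score, -1, (patch, patch))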
 
What is a Good Image Segment? A Unified Approach to Segment Extraction
 
Shai Bagon, Oren Boiman, Michal Irani – Weizmann
 
There is a huge diversity of definitions of "visually meaningful" image segments, ranging from simple uniformly colored segments, through textured segments and symmetric patterns, up to complex semantically meaningful objects. This diversity has led to a wide range of different approaches to image segmentation. In this paper we present a single unified framework for addressing this problem: "Segmentation by Composition". We define a good image segment as one which can be easily composed using its own pieces, but is difficult to compose using pieces from other parts of the image. This non-parametric approach captures a large diversity of segment types, yet requires no pre-definition or modeling of segment types, nor prior training. Based on this definition, we develop a segment extraction algorithm: given a single point of interest, provide the "best" image segment containing that point. This induces a figure-ground image segmentation, which applies to a range of different segmentation tasks: single image segmentation, simultaneous co-segmentation of several images, and class-based segmentation.
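
A toy version of the "easy to compose from its own pieces" score makes the definition concrete. The patch-based likelihood-ratio form below is my illustrative simplification, not the paper's actual composition model:

    import numpy as np

    def composition_score(patches_in, patches_out):
        """For each patch of the candidate segment, compare how well it is
        explained by OTHER patches of the same segment vs. by patches from
        the rest of the image (higher = better segment)."""
        score = 0.0
        for i, p in enumerate(patches_in):
            others = [q for j, q in enumerate(patches_in) if j != i]
            d_in = min(np.sum((p - q) ** 2) for q in others)
            d_out = min(np.sum((p - q) ** 2) for q in patches_out)
            # log-likelihood-ratio style: positive when the segment's own
            # pieces explain the patch better than the background does
            score += np.log((d_out + 1e-9) / (d_in + 1e-9))
        return score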
 
Loose Shape Model for Discriminative Learning of Object Categories
 
Margarita Osadchy – Haifa
 
We consider the problem of visual categorization with minimal supervision during training. We propose a part-based model that loosely captures structural information. We represent images as a collection of parts, each characterized by an appearance codeword from a visual vocabulary and by a neighborhood context organized as an ordered set of bag-of-features representations. These bags are computed in local overlapping areas around the part. A semantic distance between images is obtained by matching parts associated with the same codeword using their context distributions. Classification is done using an SVM with a kernel obtained from the proposed dissimilarity measure. The experiments show that our method outperforms all the classification methods from the PASCAL challenge on half of the VOC2006 categories and has the best average EER.
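
The part representation can be sketched directly from the description above: an ordered set of bag-of-features histograms over growing, overlapping areas around each part. The radii below are illustrative values, not the paper's parameters.

    import numpy as np

    def part_context(part_xy, all_parts_xy, all_codewords, K, radii=(20, 40, 60)):
        """Neighborhood context of one part (sketch): an ordered set of
        bag-of-features histograms over overlapping discs centered on the
        part.  all_parts_xy: (n, 2) part locations; all_codewords: (n,)
        vocabulary indices; K: vocabulary size."""
        x, y = part_xy
        d = np.hypot(all_parts_xy[:, 0] - x, all_parts_xy[:, 1] - y)
        bags = []
        for r in radii:                    # overlapping: each disc
            idx = np.where(d <= r)[0]      # contains the smaller ones
            hist = np.bincount(all_codewords[idx], minlength=K).astype(float)
            bags.append(hist / max(hist.sum(), 1.0))
        return np.stack(bags)              # ordered set of bags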
 
Industry Session
 
Organized by Chen Sagiv
 
1. Igal Dvir, CTO, Nice Vision – TBA

2. Challenges and Solutions for Bundling Multiple DAS Applications on a Single Hardware Platform – Gideon Stein, Chief Scientist, Mobileye (joint work with Itay Gat and Gaby Hayon)

This talk addresses the key challenges in bundling multiple camera-based Driver Assistance Systems onto the same hardware platform. In particular, we discuss combinations of lane departure warning (LDW), automatic high-beam control (AHC), traffic sign recognition (TSR), and forward collision warning (FCW). The advantages of bundling are cost reduction and the ability to add more functions to the car without increasing the footprint on the windshield. The challenge is that the different applications traditionally have different requirements from the image sensor and optics. We show how the algorithms can be modified to work together by relying less on the particular physics of the camera and making more use of advanced pattern recognition techniques. This shift in algorithm paradigm means an increase in computational requirements. The introduction of new automotive-qualified, high-performance vision processors makes these new algorithms both viable and affordable, paving the way to bundles of applications running on the same platform.

3. Automatic Image Enhancement – Renato Keshet, Project Leader, HP Labs

This talk presents HIPIE (HP Indigo Photo Image Enhancement), a robust system that analyzes and automatically enhances images as part of a commercial photobook pipeline. The system can sharpen, denoise, enhance global and local contrast, brighten, enhance face contrast, boost color, and improve resolution, as needed on a per-image basis (as determined by a series of image analysis modules), all in a couple of seconds per photo. The system, developed mostly at HP Labs in Haifa, is deployed in many print shops in Israel and around the world, and runs 24/7. In this presentation, we briefly describe the technology behind the system, with emphasis on the analysis modules and the unified enhancement scheme. We also present our view of some of the research challenges in this and other image-related areas.

4. Shai Dekel, Chief Scientist, Imaging Solutions, GE Healthcare

Pathology is the study of diseases by examining body tissues, typically under magnification. Today, pathologists use a microscope to look at slides of tissue samples that have been prepared with stains by a specialist called a histotechnician. This process has not changed much in over 100 years. However, there is an emerging movement toward digital pathology called "whole slide imaging", in which entire slides are digitally scanned so that they can be viewed on a computer. Note that one uncompressed digital slide scanned at high resolution can reach a size of 30GB, and a typical patient case can contain 5-30 such slides. The digital representation of the slides motivates the development of new image analysis algorithms that can potentially assist pathologists in their review process.
 
Single Image Dehazing
 
Raanan Fattal – HUJI
 
In this talk we present a new method for estimating the optical transmission in hazy scenes given a single input image. Based on this estimation, the scattered light is eliminated to increase scene visibility and recover haze-free scene contrasts. In this new approach we formulate a refined image formation model that accounts for surface shading in addition to the transmission function. This allows us to resolve ambiguities in the data by searching for a solution in which the resulting shading and transmission functions are locally statistically uncorrelated. A similar principle is used to estimate the color of the haze. Results demonstrate the new method's ability to remove the haze layer as well as to provide a reliable transmission estimate, which can be used for additional applications such as image refocusing and novel view synthesis.
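
For reference, the standard single-scattering haze model that transmission-estimation methods of this kind start from, written in LaTeX with the common notation (observed image I, scene radiance J, transmission t, airlight A); the shading refinement follows the abstract's description, with R the surface albedo and l the shading:

    % observed radiance = attenuated scene light + airlight
    I(x) = t(x)\,J(x) + \bigl(1 - t(x)\bigr)\,A
    % refined formation model: scene radiance = albedo times shading
    J(x) = R(x)\,\ell(x)
    % t is chosen so that shading and transmission are locally
    % statistically uncorrelated:
    \operatorname{Cov}_{\mathrm{local}}(\ell, t) \approx 0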
 
Fenchel Duality with Applications to Inference in Graphical Models
 
Amnon Shashua, Tamir Hazan – HUJI
 
Quite a number of problems involving inference from data, whether visual data or otherwise, fall into the category of optimization. I will describe a general scheme for message-passing update rules based on the framework of Fenchel duality. Using this framework we derive past inference algorithms, such as sum-product and max-product Belief Propagation, as well as new convergent algorithms for maximum-a-posteriori (MAP) and marginal estimation using "convex free energies".
 
Unsupervised Estimation of Segmentation Quality Using Nonnegative Factorization
 
Roman Sandler, Michael Lindenbaum – Technion
 
We propose an unsupervised method for evaluating image segmentation. Common methods are typically based on evaluating smoothness within segments and contrast between them, and the measure they provide is not explicitly related to segmentation errors. The proposed approach differs from these methods on several important points and has several advantages over them. First, it provides a meaningful, quantitative assessment of segmentation quality, in precision/recall terms, which until now were applicable only to supervised evaluation. Second, it builds on a new image model, which characterizes the segments as mixtures of basic feature distributions. The precision/recall estimates are then obtained by a nonnegative matrix factorization (NMF) process. A third important advantage is that the estimates, being based on intrinsic properties of the specific image being evaluated and not on a comparison to typical images (learning), are relatively robust to context factors such as image quality or the presence of texture. Experimental results demonstrate the accuracy of the precision/recall estimates in comparison to ground truth based on human judgment. Moreover, we show that tuning a segmentation algorithm using the unsupervised measure improves the algorithm’s quality (as measured by a supervised method).
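
The factorization step can be sketched with a generic NMF; how the precision/recall estimates are then derived from the factors is specific to the paper and not reproduced here.

    import numpy as np
    from sklearn.decomposition import NMF

    def factor_segments(H, n_basic=2):
        """Model each segment's feature histogram as a nonnegative mixture
        of a few basic feature distributions, H ~ W @ C, via NMF.
        H: (n_bins, n_segments) histogram matrix, one column per segment
        of the segmentation being evaluated.  The mixing coefficients C
        indicate how much each segment mixes the basic distributions."""
        nmf = NMF(n_components=n_basic, init='nndsvda', max_iter=500)
        W = nmf.fit_transform(H)   # basic feature distributions (columns)
        C = nmf.components_        # per-segment mixing coefficients
        return W, C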
 
Removal of Turbulence Disturbance in a Movie Scene for Static and Moving Camera, Enabling Vision Applications
 
Tomer Avidor, Moty Golan – RAFAEL
 
The common method of reconstructing a scene degraded by turbulence is through the creation of an artificial reference image, usually obtained by averaging the video through time. Optical flow computed from that reference image to the input images then gives rise to applications such as super-resolution and tracking. However, this technique suffers from several drawbacks: the resulting artificial reference frame is blurred, so the calculated optical-flow fields are not precise and degrade the results of applications based on these fields, and there is no accounting for camera motion or for motion within the scene. We show a mathematical framework to reconstruct the movie scene as it would have been seen without turbulence interference, yielding an observable live video output. We then use both the frames and the optical flow fields to obtain the aforementioned applications (tracking, super-resolution, mosaics) while dealing with camera motion, and draw guidelines for dealing with in-scene motion inherently.
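
The baseline that the talk improves on is simple enough to sketch directly with OpenCV (grayscale uint8 frames and a static camera assumed; this is the common method described above, not the authors' framework):

    import cv2
    import numpy as np

    def stabilize_against_reference(frames):
        """Build an artificial reference by temporal averaging, estimate
        dense optical flow from the reference to each frame, and warp the
        frame back onto the reference grid.  As the abstract notes, the
        averaged reference is blurred, so the flow fields are imprecise."""
        ref = np.mean([f.astype(np.float32) for f in frames], axis=0)
        ref8 = ref.astype(np.uint8)
        h, w = ref8.shape
        gx, gy = np.meshgrid(np.arange(w, dtype=np.float32),
                             np.arange(h, dtype=np.float32))
        out = []
        for f in frames:
            flow = cv2.calcOpticalFlowFarneback(ref8, f, None,
                                                0.5, 3, 15, 3, 5, 1.2, 0)
            # pull each frame back onto the reference grid
            out.append(cv2.remap(f, gx + flow[..., 0], gy + flow[..., 1],
                                 cv2.INTER_LINEAR))
        return out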
 
The Patch Transform and its Applications to Image Editing
 
Shai Avidan, Taeg Sang Cho, Moshe Butman, Bill Freeman – Adobe
 
We introduce the patch transform, where an image is broken into non-overlapping patches, and modifications or constraints are applied in the "patch domain". A modified image is then reconstructed from the patches, subject to those constraints. When no constraints are given, the reconstruction problem reduces to solving a jigsaw puzzle. Constraints the user may specify include the spatial locations of patches, the size of the output image, or the pool of patches from which an image is reconstructed. We define terms in a Markov network to specify a good image reconstruction from patches: neighboring patches must fit to form a plausible image, and each patch should be used only once. We find an approximate solution to the Markov network using loopy belief propagation, introducing an approximation to handle the combinatorially difficult patch exclusion constraint. The resulting image reconstructions show the original image, modified to respect the user's changes. We apply the patch transform to various image editing tasks and show that the algorithm performs well on real-world images.
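
The "neighboring patches must fit" term can be illustrated with a simple pairwise potential; SSD between abutting patch edges is one natural choice and is my assumption here, not necessarily the paper's exact potential.

    import numpy as np

    def seam_compatibility(left_patch, right_patch):
        """Pairwise term for the patch-transform Markov network (sketch):
        two patches placed side by side are compatible when the seam
        between them is smooth, scored here by the SSD between the
        abutting boundary columns."""
        seam_l = left_patch[:, -1].astype(np.float64)   # right edge of left patch
        seam_r = right_patch[:, 0].astype(np.float64)   # left edge of right patch
        ssd = np.sum((seam_l - seam_r) ** 2)
        return np.exp(-ssd / (2.0 * 1000.0))            # sigma^2 is illustrative

Loopy belief propagation then maximizes the product of these pairwise compatibilities, subject to the exclusion constraint that each patch is placed at most once.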
 
In Defense of Nearest-Neighbor Based Image Classification
 
Oren Boiman, Eli Shechtman, Michal Irani – Weizmann
 
State-of-the-art image classification methods require an intensive learning/training stage (using SVM, Boosting, etc.). In contrast, non-parametric Nearest-Neighbor (NN) based image classifiers require no training time and have other favorable properties. However, the large performance gap between these two families of approaches rendered NN-based image classifiers useless. We claim that the effectiveness of non-parametric NN-based image classification has been considerably under-valued. We argue that two practices commonly used in image classification methods have led to the inferior performance of NN-based image classifiers: (i) quantization of local image descriptors (used to generate "bags-of-words", codebooks); (ii) computation of 'Image-to-Image' distance, instead of 'Image-to-Class' distance. We propose a trivial NN-based classifier, NBNN (Naive-Bayes Nearest-Neighbor), which employs NN distances in the space of the local image descriptors (and not in the space of images). NBNN computes direct 'Image-to-Class' distances without descriptor quantization. We further show that under the Naive-Bayes assumption, the theoretically optimal image classifier can be accurately approximated by NBNN. Although NBNN is extremely simple, efficient, and requires no learning/training phase, its performance ranks among the top leading learning-based image classifiers. Empirical comparisons are shown on several challenging databases (Caltech-101, Caltech-256, and Graz-01).
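
NBNN as described is short enough to sketch in full. A minimal version with KD-trees follows, assuming local descriptors (e.g., SIFT) are extracted elsewhere; the "no training phase" property is visible in the fact that fit() merely indexes each class's descriptor pool.

    import numpy as np
    from scipy.spatial import cKDTree

    class NBNN:
        """Naive-Bayes Nearest-Neighbor classifier (sketch)."""

        def fit(self, descriptors_per_class):
            # descriptors_per_class: {class: (n_c, d) array of all local
            # descriptors extracted from that class's training images}
            self.trees = {c: cKDTree(D)
                          for c, D in descriptors_per_class.items()}
            return self

        def predict(self, query_descriptors):
            """Image-to-Class distance: sum, over the query image's local
            descriptors, of squared distances to their nearest neighbor in
            each class's pool; pick the class with the smallest total."""
            best_c, best_d = None, np.inf
            for c, tree in self.trees.items():
                nn_dists, _ = tree.query(query_descriptors)
                total = np.sum(nn_dists ** 2)
                if total < best_d:
                    best_c, best_d = c, total
            return best_c

Note that no descriptor quantization occurs anywhere: distances are measured in the raw descriptor space, which is exactly the point the abstract argues.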