2006 Israel Computer Vision Day
Sunday, December 17, 2006

The Efi Arazi School of Computer Science

I.D.C.  Herzliya


Computer Science Dept., University of Haifa

Supported by the Israeli Ministry of Science and Technology







Previous Vision Days Web Page: 2003, 2004, 2005.





Speaker and Collaborators








Alex Rav-Acha,

Yael Pritch,

Shmuel Peleg 


Making a Long Video Short: Dynamic Video Synopsis


Ido Leichter,

Michael Lindenbaum,

Ehud Rivlin


Bittracker - A Bitmap Tracker for Visual Tracking under Very General Conditions


Tomer Amiaz,

Nahum Kiryati


Advances in Accurate Optical Flow Estimation


Coffee Break


Ohad Ben-Shachar


Toward Curvature-based Segmentation: Texture Saliency and Segregation Without Feature Gradient


Gerard Medioni


Identification of Non-Cooperative Subjects at a Distance


Michael Elad,

Michal Aharon,

Julien Mairal,
Guillermo Sapiro


U of Minnesota

Sparse and Redundant Representations and Learned Dictionaries for Image Processing Applications


Ishay Kamon


Motion detection in Rafael - current status and future needs


Lunch break


Tamir Hazan,

Amnon Shashua


Latent Class Model Reconstruction


Guy Gilboa,

Stanley Osher


Nonlocal evolutions for image regularization and supervised segmentation


Yair Moshe,

Hagit Hel-Or


Fast Block Motion Estimation Using Gray Code Kernels


Tammy Riklin-Raviv, Nahum Kiryati,

Nir Sochen


Segmentation by Level sets and Symmetry


Coffee Break


Yoav Schechner,

Yael Erez,

Dan Adam


AcoustiClean Images


Michael Kolomenkin,

Ilan Shimshoni


Image Matching Using Photometric Information


Boris Epstein,
Shimon Ullman


Learning to distinguish between closely similar object classes


Michael Bronstein, Alexander Bronstein, Alfred Bruckstein and Ron Kimmel


Partial Similarity of Objects, or How to Compare a Centaur to a Horse




General:  This is the fourth Israel Computer Vision Day. It will be hosted at IDC.

For more details, requests to be added to the mailing list etc, please contact:

hagit@cs.haifa.ac.il   toky@idc.ac.il



Location and Directions: The Vision Day will take place at the Interdisciplinary Center (IDC), Herzliya,  in the Ivtzer Auditorium.  For driving instructions see map. 

A convenient way to arrive is by train, see time schedule here. Get off at the Herzliya Station, and order a taxi ride by phone. There are two taxi stations that provide this service: Moniyot Av-Yam (09 9501263 or 09 9563111), and Moniyot Pituach (09 9582288 or 09 9588001). The fare for a taxi ride from the railway station to IDC is around 20 NIS.






Making a Long Video Short: Dynamic Video Synopsis

Alex Rav-Acha, Yael Pritch and Shmuel Peleg – HUJI


The power of video over still images is the ability to represent dynamic activities. But video browsing and retrieval are inconvenient due to inherent spatio-temporal redundancies, where some time intervals may have no activity, or have activities that occur in a small image region. Video synopsis aims to provide a compact video representation, while preserving the essential activities of the original video.


We present dynamic video synopsis, where most of the activity in the video is condensed by simultaneously showing several actions, even when they originally occurred at different times. For example, we can create a "stroboscopic movie", where multiple dynamic instances of a moving object are played simultaneously. This is an extension of the still stroboscopic picture.


Previous approaches for video abstraction addressed mostly the temporal redundancy by selecting representative key-frames or time intervals. In dynamic video synopsis the activity is shifted into a significantly shorter period, in which the activity is much denser. Video examples can be found online at http://www.vision.huji.ac.il/synopsis


Bittracker - A Bitmap Tracker for Visual Tracking under Very General Conditions

Ido Leichter, Michael Lindenbaum and Ehud Rivlin - Technion

This paper addresses the problem of visual tracking under very general conditions: a possibly non-rigid target whose appearance may drastically change over time; general camera motion; a 3D scene; and no a priori information except initialization. This is in contrast to the vast majority of trackers, which rely on some limited model in which, for example, the target's appearance is known a priori or restricted, the scene is planar, or a pan-tilt-zoom camera is used. Their goal is to achieve speed and robustness, but their limited context may cause them to fail in the more general case.


The proposed tracker works by approximating, in each frame, a PDF (probability distribution function) of the target's bitmap and then estimating the maximum a posteriori bitmap. The PDF is marginalized over all possible motions per pixel, thus avoiding the stage in which optical flow is determined. This is an advantage over other general-context trackers that do not use the motion cue at all or rely on the error-prone calculation of optical flow. Using a Gibbs distribution with a first-order neighborhood system yields a bitmap PDF whose maximization may be transformed into that of a quadratic pseudo-Boolean function, the maximum of which is approximated via a reduction to a maximum-flow problem. Many experiments were conducted to demonstrate that the tracker is able to track under the aforementioned general context.


Advances in Accurate Optical Flow Estimation

Tomer Amiaz and Nahum Kiryati – Tel-Aviv University


This talk presents two methods of improving optical flow estimation accuracy: using level sets to accommodate flow discontinuities and extending the coarse to fine approach to over-fine levels.


Dense optical flow schemes are challenged by the presence of motion discontinuities. In state of the art optical flow methods, over-smoothing accounts for most of the error. Embedding state of the art optical flow estimation within an active contour segmentation framework results in piecewise smooth flow fields. Experimental results show the superiority of our algorithm with respect to previous techniques.


Modern optical flow algorithms employ the coarse to fine approach. We suggest upgrading this class of algorithms by adding over-fine interpolated levels to the pyramid. Theoretical analysis of the coarse to over-fine approach explains its advantages in handling flow-field discontinuities and simulations show its benefit for sub-pixel motion. By applying the suggested technique to various multi-scale optical flow algorithms, we reduced the estimation error by 10-30% on common test sequences.


Toward Curvature-based Segmentation: Texture Saliency and Segregation Without Feature Gradient

Ohad Ben-Shachar – Ben-Gurion

The analysis of texture patterns, and texture segregation in particular, are at the heart of visual processing. In this work we question the widely accepted view that the detection (both perceptual and computational) of salient perceptual singularities (i.e., borders) between perceptually coherent texture regions is tightly dependent upon feature gradients.  Specifically, we study smooth orientation-defined textures (ODTs) and show psychophysically that they exhibit striking perceptual singularities even without any outstanding gradients in their defining feature, namely orientation. We further show how these generic singularities are not only unpredictable from the orientation gradient, but that they also defy popular segmentation algorithms and neural models. We then examine smooth ODTs from a (differential) geometric point of view and develop a theory that fully predicts their perceptual singularities from two ODT curvatures. The theoretical results exhibit striking correspondence to segregation performed by human subjects and the analytical framework is also extended into a biologically plausible computational scheme that works directly on raw ODT images and exhibits the same performance. Extensions and implications of our results are discussed for multi-oriented and general textures, and for the role of curvature in various aspects of visual processing.


Identification of non-cooperative subjects at a distance

Gerard Medioni – USC

We present a system to locate, track and identify people going through a predefined zone, indoors or outdoors. This is accomplished by processing images taken from an ultra-high resolution video camera, inferring the location of the subjects’ head, using this information to crop the region of interest, building a 3D face model from this face image sequence, and using this 3D model to perform biometric identification. This approach is applicable with subjects approximately 50 feet away, indoors or outdoors, using an off-the-shelf ultra-high resolution video camera.
We choose faces as our biometric basis, because most of the distinctive and permanent biometric features (such as fingerprints, hand shape, iris or retinal scans) require cooperative subjects in close proximity to the biometric system. Unfortunately, even top 2D face recognition systems today are neither reliable nor accurate enough. We thus perform face recognition in 3D. This allows the use of true shape invariants for recognition, and circumvents difficulties associated with pose and lighting.
To generate these 3D descriptions, we use an image sequence, as natural head and body motion provides enough viewpoint variation to perform stereo-motion for 3D face reconstruction. We present encouraging initial results.

Sparse and Redundant Representations and Learned Dictionaries for Image Processing Applications

Michael Elad, Michal Aharon, Julien Mairal, and Guillermo Sapiro – Technion, University of Minnesota


In this talk we consider several inverse problems in image processing, using sparse and redundant representations over trained dictionaries. Using the K-SVD algorithm, we obtain a dictionary that describes the image content effectively. Two training options are considered: using the corrupted image itself, or training on a database of high-quality images. Since the K-SVD is limited to handling small image patches, we extend its deployment to arbitrary image sizes by defining a global image prior that forces sparsity over patches in every location in the image. We show how such a Bayesian treatment leads to a simple and effective denoising algorithm for gray-level images with state-of-the-art denoising performance. We then extend these results to color images, handling their denoising, inpainting, and demosaicing. Following the above ideas, with necessary modifications to avoid color artifacts and over-fitting, we present state-of-the-art results in each of these applications.
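The sparse-representation idea above can be illustrated with a small sketch. It is not the K-SVD algorithm and not the authors' code: it only shows the sparse coding step, where a signal (for example a flattened image patch) is approximated by a few atoms of a fixed dictionary. Plain matching pursuit is used here as a simple stand-in for whatever pursuit method is actually employed; the function name `matching_pursuit`, its parameters and the toy dictionary are all assumptions made for illustration.

```python
def matching_pursuit(signal, atoms, n_nonzero):
    """Greedy sparse coding sketch (hypothetical helper).

    signal    -- list of floats (e.g. a flattened image patch)
    atoms     -- list of dictionary atoms, each assumed to have unit norm
    n_nonzero -- number of greedy iterations (upper bound on atoms used)
    Returns (coefficients, residual).
    """
    residual = list(signal)
    coeffs = [0.0] * len(atoms)
    for _ in range(n_nonzero):
        # Correlate every atom with the current residual.
        scores = [sum(a * r for a, r in zip(atom, residual)) for atom in atoms]
        # Select the atom with the largest absolute correlation.
        k = max(range(len(atoms)), key=lambda i: abs(scores[i]))
        coeffs[k] += scores[k]
        # Remove that atom's contribution from the residual.
        residual = [r - scores[k] * a for r, a in zip(residual, atoms[k])]
    return coeffs, residual


# Toy dictionary: three axis-aligned atoms plus one diagonal atom.
s = 2 ** -0.5
atoms = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [s, s, 0.0]]
coeffs, residual = matching_pursuit([3 * s, 3 * s, 0.0], atoms, n_nonzero=1)
```

In this toy case the signal is a multiple of the diagonal atom, so a single iteration selects that atom and leaves a residual that is zero up to rounding. A dictionary-learning method such as K-SVD would additionally update the atoms themselves, which this sketch does not attempt.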


Motion detection in Rafael - current status and future needs

Ishay Kamon – Rafael


In this talk we will present up-to-date results in motion detection and tracking in Rafael, as well as future needs. Motion detection algorithms have been developed in the image processing group of Rafael for over 10 years, mainly for air-to-air missile applications. We specialize in detecting small and weak targets in the presence of complex backgrounds. Recently these capabilities have been adapted to many other working scenarios, such as GMTI (ground moving target indicator), surveillance, etc. We will present results of motion detection and tracking in video sequences, as well as motion detection from a pair of images. We will also discuss operational needs, based on the experience of the second Lebanon war, and point to knowledge gaps to which research effort may be directed.


Latent Class Model Reconstruction: On the Suitability of Maximum Likelihood versus Least-Squares for Latent Class Model Reconstruction for Vision and Learning Applications

Tamir Hazan and Amnon Shashua - HUJI


The latent class model is a popular approach for signal reconstruction. Normally, achieving a maximum likelihood approximation for the decomposition of a signal into a superposition of factors is considered the statistically optimal approach. A maximum likelihood solution is equivalent to a low-rank decomposition under the relative-entropy error measure. We will show that under an additive noise model with bounded L-infinity noise, a statistically valid least-squares decomposition yields an approximation with a bounded L-infinity error, whereas the ML approach can produce arbitrarily bad (in L-infinity) solutions. We will also derive a simple statistically valid least-squares solution and demonstrate its superiority over the ML solution.


Nonlocal evolutions for image regularization and supervised segmentation

Guy Gilboa and Stanley Osher – UCLA


A general class of nonlocal weighted convex functionals is examined. The weights are based on image features and represent the affinity between different pixels in the image. By prescribing different formulas for the weights, one can generalize many non-iterative local and nonlocal denoising algorithms, including nonlocal means and bilateral filters. The steepest descent for minimizing the quadratic functional can be interpreted as a nonlocal diffusion process. It is shown how supervised segmentation can be performed by this type of diffusion. State-of-the-art denoising results based on patch similarities are presented.
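As a rough illustration of affinity weights based on patch similarity, the sketch below applies a nonlocal-means style filter to a 1D signal. It is only one possible choice of weights, written under assumptions: the Gaussian of the squared patch distance, the bandwidth `h`, the border clamping and the function name `nonlocal_means_1d` are illustrative and are not taken from the talk.

```python
import math


def nonlocal_means_1d(signal, patch_radius=1, h=1.0):
    """Nonlocal-means style smoothing of a 1D signal (illustrative sketch).

    Each output sample is a weighted average of ALL input samples, where the
    weight between samples i and j depends on how similar the patches around
    them are, not on how close i and j are.
    """
    n = len(signal)

    def patch(i):
        # Clamp indices at the borders so every sample has a full patch.
        return [signal[min(max(i + k, 0), n - 1)]
                for k in range(-patch_radius, patch_radius + 1)]

    patches = [patch(i) for i in range(n)]
    out = []
    for i in range(n):
        weights = []
        for j in range(n):
            # Squared Euclidean distance between the two patches.
            d2 = sum((a - b) ** 2 for a, b in zip(patches[i], patches[j]))
            # Affinity: close to 1 for similar patches, near 0 otherwise.
            weights.append(math.exp(-d2 / (h * h)))
        total = sum(weights)
        out.append(sum(w * s for w, s in zip(weights, signal)) / total)
    return out


noisy_step = [0.0, 0.1, -0.1, 0.0, 1.0, 1.1, 0.9, 1.0]
smoothed = nonlocal_means_1d(noisy_step, patch_radius=1, h=0.5)
```

Because samples on the two sides of the step have dissimilar patches, they receive small mutual weights, so the averaging mostly stays within each side of the step. Replacing the patch distance by a combination of spatial and intensity distances would give a bilateral-filter style weight instead.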


Fast Block Motion Estimation Using Gray Code Kernels

Yair Moshe and Hagit Hel-Or – University of Haifa


Motion estimation plays an important role in modern video coders. In such coders, motion is estimated using a block matching algorithm that estimates the amount of motion on a block-by-block basis. A full search technique for finding the best matching blocks delivers good accuracy but is usually not practical due to its high computational complexity. In this talk we present a novel approach to block-based motion estimation which is based on a recently introduced family of filters called the Gray-Code Kernels (GCK). Filtering an image with a sequence of Gray-Code Kernels is highly efficient and requires only 2 operations per pixel for each filter kernel, independent of the size or dimension of the kernel. We exploit the advantages of the GCK to produce an efficient block matching scheme which is incorporated in an H.264/MPEG-4 AVC video coder. The new scheme is shown to significantly outperform popular fast motion estimation algorithms, such as three-step search and diamond search. In addition, the tradeoff between computational complexity and quality of results can be easily controlled in the proposed algorithm, thus it enables adaptivity to image content.
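For readers unfamiliar with block matching, the following sketch shows the exhaustive full-search baseline mentioned above, not the Gray-Code Kernel scheme of the talk. The matching cost (sum of absolute differences), the block size, the search radius and the function names are assumptions chosen for illustration; frames are plain nested lists of gray levels.

```python
def block_sad(ref, cur, ref_y, ref_x, cur_y, cur_x, block):
    """Sum of absolute differences between a block of cur and a block of ref."""
    return sum(abs(cur[cur_y + i][cur_x + j] - ref[ref_y + i][ref_x + j])
               for i in range(block) for j in range(block))


def full_search(ref, cur, top, left, block=4, radius=2):
    """Exhaustive block matching (illustrative baseline).

    For the block of `cur` whose top-left corner is (top, left), test every
    displacement (dy, dx) within +/- radius and return the displacement with
    the lowest SAD against `ref`, together with that SAD.
    """
    height, width = len(ref), len(ref[0])
    best, best_cost = None, None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = top + dy, left + dx
            # Skip candidates that fall outside the reference frame.
            if y < 0 or x < 0 or y + block > height or x + block > width:
                continue
            cost = block_sad(ref, cur, y, x, top, left, block)
            if best_cost is None or cost < best_cost:
                best, best_cost = (dy, dx), cost
    return best, best_cost


# Synthetic frames: `cur` is `ref` moved one row down and two columns left.
ref = [[16 * y + x for x in range(16)] for y in range(16)]
cur = [[ref[y - 1][x + 2] if y >= 1 and x + 2 < 16 else 0
        for x in range(16)] for y in range(16)]
motion, cost = full_search(ref, cur, top=6, left=6)
```

The search evaluates (2 * radius + 1) squared candidates per block, each costing block squared operations, which is the computational burden that fast methods such as three-step search, diamond search or the GCK-based scheme aim to reduce.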


Segmentation by Level sets and Symmetry

Tammy Riklin-Raviv, Nahum Kiryati, and Nir Sochen – TAU

Shape symmetry is an important cue for image understanding. In the absence of more detailed prior shape information, segmentation can be significantly facilitated by symmetry. However, when symmetry is distorted by perspectivity, the detection of symmetry becomes non-trivial, thus complicating symmetry-aided segmentation.


We present an original approach for segmentation of symmetrical objects accommodating perspective distortion. The key idea is the use of the replicative form induced by the symmetry for challenging segmentation tasks. This is accomplished by dynamic extraction of the object boundaries, based on the image gradients, gray levels or colors, concurrently with registration of the image symmetrical counterpart (e.g. reflection) to itself. The symmetrical counterpart of the evolving object contour supports the segmentation by resolving possible ambiguities due to noise, clutter, distortion, shadows, occlusions and assimilation with the background. The symmetry constraint is integrated in a comprehensive level-set functional for segmentation that determines the evolution of the delineating contour. The proposed framework is exemplified on various images of skew-symmetrical objects and its superiority over state of the art variational segmentation techniques is demonstrated.


AcoustiClean Images

Yoav Schechner, Yael Erez and Dan Adam - Technion

Ultrasound images are very noisy. Along with system noise, a significant noise source is the speckle phenomenon, caused by interference in the viewed object. Most past approaches for denoising ultrasound images essentially blur the image, and they do not handle attenuation. Our approach, on the contrary, does not blur the image and does handle attenuation. Our denoising approach is based on frequency compounding, in which images of the same object are acquired in different acoustic frequencies, and then compounded. Existing frequency compounding methods have been based on simple averaging, and have achieved only limited enhancement. The reason is that the statistical and physical characteristics of the signal and noise vary with depth, and the noise is correlated. Hence, we suggest a spatially varying frequency compounding, based on understanding of these characteristics. Our method suppresses the various noise sources and recovers attenuated objects, while maintaining high resolution.


Image Matching Using Photometric Information

Michael Kolomenkin and Ilan Shimshoni – University of Haifa


Image matching is an essential task in many computer vision applications. It is obvious that thorough utilization of all available information is critical for the success of matching algorithms. However, most popular matching methods do not effectively incorporate photometric data. Some algorithms are based on geometric, color-invariant features, thus completely neglecting available photometric information. Others assume that color does not differ significantly between the two images; that assumption may be wrong when the images are not taken at the same time, for example when a recently taken image is compared with a database. This paper introduces a method for using color information in image matching tasks. Initially the images are segmented using an off-the-shelf segmentation process (EDISON). No assumptions are made on the quality of the segmentation. Then the algorithm employs a model for natural illumination change to define the probability that two segments originate from the same surface. When additional information is supplied (for example suspected corresponding point features in both images), the probabilities are updated. We show that the probabilities can easily be utilized in any existing image matching system. We propose a technique to make use of them in a SIFT-based algorithm. The technique's capabilities are demonstrated on real images, where it yields a significant improvement over the original SIFT results in the percentage of correct matches found.


Learning to distinguish between closely similar object classes

Boris Epstein and Shimon Ullman – Weizmann


We show that the discrimination between visually similar classes often depends on the detection of so-called 'satellite features'. These are local features which are not informative by themselves, and can only be detected reliably at locations specified relative to other features. This makes satellite features difficult to extract by current classification methods. We describe a novel scheme which can extract discriminative satellite features and use them to distinguish between visually similar classes. The algorithm first searches for a set of features ("anchor features") that can be found in all the similar classes. Such features can be detected because the classes are visually similar. The anchors are used to determine the locations of satellite features, which are then used to distinguish between the similar classes. The algorithm is fully automatic, and is shown to work well for many categories of visually similar classes.


Partial Similarity of Objects, or How to Compare a Centaur to a Horse

Michael Bronstein, Alexander Bronstein, Alfred Bruckstein and Ron Kimmel - Technion


Similarity is one of the most important abstract concepts in human perception of the world. It is encountered, for example, during our interaction with other people whose faces we recognize. In computer vision, numerous applications have to deal with comparing objects observed in a scene with some patterns known a priori.


Often, it happens that while two objects are not similar, they are partially similar, i.e., have large similar parts. We present a novel approach to quantify this semantic definition of partial similarity between abstract objects using the notion of Pareto optimality.


We will exemplify our approach in several applications, including the problems of recognition of non-rigid 2D and 3D objects and analysis of text sequences.
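The notion of Pareto optimality used above can be sketched in a few lines. The sketch assumes, purely for illustration, that every candidate pair of parts is scored by two numbers to be minimized, here called partiality (how much of the objects is discarded) and dissimilarity (how different the remaining parts are); these names, the scores and the function `pareto_front` are hypothetical and not taken from the talk.

```python
def pareto_front(points):
    """Return the Pareto-optimal (partiality, dissimilarity) pairs.

    A point is Pareto optimal if no other point is at least as good in both
    criteria (both are minimized) while being different from it.
    """
    front = []
    for p in points:
        dominated = any(q != p and q[0] <= p[0] and q[1] <= p[1]
                        for q in points)
        if not dominated:
            front.append(p)
    # Remove duplicates and order by increasing partiality.
    return sorted(set(front))


# Hypothetical scores for six candidate choices of parts.
candidates = [(0.0, 0.9), (0.2, 0.5), (0.3, 0.6),
              (0.5, 0.1), (0.6, 0.1), (0.2, 0.7)]
front = pareto_front(candidates)
```

For these made-up scores the front keeps three candidates: comparing the whole objects (no cropping, high dissimilarity), a moderate crop, and a large crop with low dissimilarity. The remaining candidates are each beaten in both criteria by some point on the front.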