2007 Israel Computer Vision Day
Sunday, December 9, 2007

The Efi Arazi School of Computer Science

I.D.C.  Herzliya

 

Computer Science Dept., University of Haifa
Previous Vision Days Web Pages: 2003, 2004, 2005, 2006.

 

 

 

Time         | Speaker and Collaborators                         | Affiliation      | Title
09:00-09:30  | Gathering                                         |                  |
09:30-09:45  | Opening                                           |                  |
09:45-10:10  | Nir Sochen, Shamgar Gurevich, Ronny Hadani        | TAU              | On some deterministic dictionaries supporting sparsity
10:10-10:35  | Lior Wolf, Hueihan Jhuang, Tamir Hazan            | TAU              | Learning Appearances with Low-Rank SVM
10:35-11:00  | R. Kimmel, Dan Raviv, A. Bronstein, M. Bronstein  | Technion         | On Biometry, Isometry, and Intrinsic Symmetry
11:00-11:30  | Coffee Break                                      |                  |
11:30-11:55  | Matan Protter, Miki Elad                          | Technion         | Super-resolution with no explicit motion estimation
11:55-12:20  | Stas Rozenfeld, Ilan Shimshoni, Micha Lindenbaum  | Haifa + Technion | Dense mirroring surface recovery from 1D homographies and sparse correspondences
12:20-12:45  | Roman Sandler, Micha Lindenbaum                   | Technion         | Optimizing Gabor Filter Design for Texture Edge Detection and Classification
12:45-13:10  | Eli Shechtman, Michal Irani                       | Weizmann         | Matching Local Self-Similarities across Images and Videos
13:10-14:10  | Lunch Break                                       |                  |
14:10-14:35  | Yael Pritch, Shmuel Peleg, Alex Rav-Acha          | HUJI             | Webcam Synopsis and Indexing
14:35-15:00  | Yacov Hel-Or, Doron Shaked                        | IDC              | Discriminative approach for Wavelet denoising
15:00-15:25  | Daniel Keren                                      | Haifa            | Applying Property Testing to Image and Video Segmentation
15:25-15:50  | Ofir Pele, Michael Werman                         | HUJI             | Robust Real Time Pattern Matching using Bayesian Sequential Hypothesis Testing
15:50-16:20  | Coffee Break                                      |                  |
16:20-16:45  | Yuval Barkan, Hedva Spitzer                       | TAU              | Brightness contrast-contrast induction model predicts assimilation and inverted assimilation effects
16:45-17:10  | Maxim Shoshani                                    | Technion         | An Evolutionary Patch Pattern Approach for Texture Discrimination
17:10-17:35  | Amnon Shashua, Tamir Hazan                        | HUJI             | The use of Non-Extensive Divergence for Robust 'bag of features' Algorithms for Visual Recognition
17:35-18:00  | Michael Zibulevsky                                | Technion         | Blind Source Separation, Deconvolution and Localization using Sparse Signal Representations

General: This is the fifth Israel Computer Vision Day. It will be hosted at IDC.

For more details, or to be added to the mailing list, please contact:

hagit@cs.haifa.ac.il   toky@idc.ac.il

 

Location and Directions: The Vision Day will take place at the Interdisciplinary Center (IDC), Herzliya, in the Ivtzer Auditorium. For driving instructions, see the map.

A convenient option is to arrive by train (see the time schedule here). Get off at the Herzliya station and order a taxi by phone. Two taxi companies provide this service: Moniyot Av-Yam (09 9501263 or 09 9563111) and Moniyot Pituach (09 9582288 or 09 9588001).

 

 

 

 

Abstracts

 

 

On some deterministic dictionaries supporting sparsity

 

Nir Sochen, Shamgar Gurevich, and Ronny Hadani – TAU

 

 

We describe two deterministic constructions of dictionaries of functions on the finite line which support a certain degree of sparsity.

 

 

 

Learning Appearances with Low-Rank SVM

 

Lior Wolf, Hueihan Jhuang, Tamir Hazan - TAU

 

Several authors have noticed that the common representation of images as vectors is sub-optimal. The process of vectorization eliminates spatial relations between some of the nearby image measurements and produces a vector of a dimension which is the product of the measurements' dimensions. It seems that images may be better represented when taking into account their structure as a 2D (or multi-D) array.

 

Our framework, "Low-Rank separators", studies the use of separating hyperplanes that are constrained to have the structure of low-rank matrices. We first prove that the low-rank constraint provides preferable generalization properties. We then define two "Low-rank SVM problems" and propose algorithms to solve them. Finally, we provide supporting experimental evidence for the framework.
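
As a rough illustration of the idea, the sketch below trains a rank-1 separator W = u v^T directly on image matrices by alternating gradient steps on a regularized squared-hinge loss. The loss, the alternating scheme, and all names are illustrative assumptions, not the authors' actual algorithms.

```python
import numpy as np

def squared_hinge_grad(Z, y, w, lam):
    """Subgradient of (1/N) * sum max(0, 1 - y_i <z_i, w>)^2 + (lam/2)||w||^2."""
    margins = y * (Z @ w)
    active = margins < 1
    g = -2.0 * ((y[active] * (1.0 - margins[active])) @ Z[active])
    return g / len(y) + lam * w

def rank1_svm(X, y, lam=0.1, iters=100, lr=0.05, seed=0):
    """X: (N, m, n) stack of image matrices, y: (N,) labels in {-1, +1}.
    Learns a rank-1 separator W = u v^T by alternating over u and v."""
    rng = np.random.default_rng(seed)
    N, m, n = X.shape
    u, v = rng.standard_normal(m), rng.standard_normal(n)
    for _ in range(iters):
        Zu = X @ v                               # fix v: images collapse to (N, m)
        u -= lr * squared_hinge_grad(Zu, y, u, lam)
        Zv = np.einsum('nij,i->nj', X, u)        # fix u: images collapse to (N, n)
        v -= lr * squared_hinge_grad(Zv, y, v, lam)
    return u, v                                  # predict with sign(u @ Xi @ v)
```

Note how the 2D structure is retained: the classifier never sees a flattened vector of dimension m*n, only the m and n dimensional projections.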

 

 

On Biometry, Isometry, and Intrinsic Symmetry

 

R. Kimmel, Dan Raviv, A. Bronstein, and M. Bronstein - Technion

 

 

It can be shown that a person's identity is associated with the intrinsic geometry of the facial surface, while facial expressions are related to the extrinsic geometry. Our first attempt at face recognition was to represent the intrinsic geometry of the surface by isometrically embedding it into a low-dimensional Euclidean space. The result is an expression-invariant representation of the face, which we called a canonical form. Next, we embedded surfaces into simple non-Euclidean spaces. In particular, two- and three-dimensional spheres were found to be appealing for the representation of faces, as the resulting metric distortion is usually smaller than in a Euclidean space. Last year, we introduced Generalized Multidimensional Scaling (GMDS), which allows embedding into manifolds with an arbitrary geometric structure. Instead of embedding the facial surfaces into a common embedding space, we embed one surface into the other and use the metric distortion as a measure of their dissimilarity. GMDS allows surface recognition even when significant parts are missing. Finally, mapping a surface into its own geometry can be used to study the self (isometric) symmetries of a given object.
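
For reference, GMDS can be written as minimizing a generalized stress that measures the metric distortion of a map between the two sampled surfaces; a standard form (the weights w_ij and the squared distortion are one common choice, not necessarily the exact variant used here) is:

```latex
% Generalized stress: map samples x_i of surface X into surface Y so that
% pairwise geodesic distances are preserved as well as possible.
\min_{\varphi\colon X \to Y} \; \sigma(\varphi)
  = \sum_{i<j} w_{ij}\,\bigl( d_Y(\varphi(x_i), \varphi(x_j)) - d_X(x_i, x_j) \bigr)^2
```

Taking Y = X and looking for nontrivial minimizers of the same stress is what exposes the intrinsic (isometric) self-symmetries mentioned above.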

 

In this talk I will review the relations between these measures, the underlying theory, the resulting numerical machinery, and applications ranging from face recognition and shape analysis to morphing, warping, intrinsic symmetry, and texture mapping.

 

 

 

Super-resolution with no explicit motion estimation

 

Matan Protter, Miki Elad – Technion

 

 

Super-resolution reconstruction proposes a fusion of several low-quality images into one higher-quality result with better optical resolution. Classic super-resolution techniques strongly rely on the availability of accurate motion estimation for this fusion task. When the motion is estimated inaccurately, as often happens for non-global motion fields, this results in annoying artifacts in the super-resolved outcome. Encouraged by recent developments on the video denoising problem, where state-of-the-art algorithms are formed with no explicit motion estimation, we seek a super-resolution algorithm of a similar nature. In this talk we present our solution: a novel super-resolution algorithm based on the Non-Local-Means (NLM) algorithm. We show how this denoising method can be generalized into a relatively simple super-resolution algorithm with no explicit motion estimation. Results on various movies show that the proposed method is very successful in providing super-resolution on general sequences.
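
To make the "no explicit motion estimation" point concrete, here is a minimal NLM-style fusion sketch: each output pixel is a patch-similarity-weighted average over a search window in all frames, so no motion field is ever computed. Names and parameters are illustrative, and the sketch omits the blur and decimation operators that full super-resolution must also handle.

```python
import numpy as np

def nlm_fuse(frames, t, i, j, patch=3, search=7, h=10.0):
    """Estimate pixel (i, j) of frame t by patch-weighted averaging over a
    search window in *all* frames (interior pixels assumed, for brevity)."""
    r, s = patch // 2, search // 2
    ref = frames[t, i - r:i + r + 1, j - r:j + r + 1]   # reference patch
    num = den = 0.0
    for k in range(frames.shape[0]):                    # every frame, no motion field
        for di in range(-s, s + 1):
            for dj in range(-s, s + 1):
                ii, jj = i + di, j + dj
                cand = frames[k, ii - r:ii + r + 1, jj - r:jj + r + 1]
                w = np.exp(-np.sum((ref - cand) ** 2) / (h * h))
                num += w * frames[k, ii, jj]
                den += w
    return num / den
```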

 

 

Dense mirroring surface recovery from 1D homographies and sparse correspondences

 

Stas Rozenfeld, Ilan Shimshoni, Micha Lindenbaum – Haifa + Technion

 

 

In this work we recover the 3D shape of mirroring objects such as mirrors, sunglasses, and stainless steel objects. A computer monitor displays several images of parallel stripes, each image at a different angle, and the reflections of these stripes in the mirroring surface are captured by a camera. For every image point, the directions of the displayed stripes and of their reflections in the image are related by a 1D homography, which can be computed robustly, using the statistically accurate heteroscedastic model, without the monitor-image correspondence that is generally required by other techniques. In addition, each image of stripes is followed by its negative image, and the stripe direction is estimated from the difference of the two images; this allows the shape recovery procedure to be applied beyond purely mirroring surfaces. Focusing on a small set of image points for which monitor-image correspondence is computed, the depth and the local shape may be calculated from this homography. This is done by an optimization process related to the one proposed by Savarese, Chen, and Perona (2005), but different and more stable, since it minimizes an angular error, which is a more "geometric" measure than the matrix degeneracy constraint. The connection between our cost function and theirs is addressed. Dense surface recovery is then performed using constrained interpolation, which does not simply interpolate the surface depth values but rather solves for the depth, the correspondence, and the local surface shape simultaneously at each interpolated point, enforcing consistency with the 1D homography. Both the proposed method and that of Savarese, Chen, and Perona are inherently unstable on a small part of the surface. We propose a method to detect these instabilities and correct them. Additionally, we provide an algebraic condition sufficient for a point to be unstable, and give a simple geometric interpretation of this condition for planar and spherical surfaces. The method was implemented, and the shapes of a mirror, sunglasses, and a stainless steel ashtray were recovered at sub-millimeter accuracy.
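
For intuition, a 1D homography between stripe directions can be estimated linearly from a few correspondences. The DLT-style sketch below is a hypothetical illustration (directions parameterized by slope, least-squares via SVD); the talk's method instead uses a robust, statistically accurate heteroscedastic estimator.

```python
import numpy as np

def fit_1d_homography(x, xp):
    """x, xp: arrays of >= 3 corresponding slopes tan(theta) of displayed and
    reflected stripes. Returns 2x2 H, defined up to scale. (Near-vertical
    stripes would need the full homogeneous parameterization.)"""
    # Each pair satisfies xp*(c*x + d) - (a*x + b) = 0, linear in (a, b, c, d).
    A = np.stack([-x, -np.ones_like(x), xp * x, xp], axis=1)
    _, _, Vt = np.linalg.svd(A)
    a, b, c, d = Vt[-1]                  # null-space direction of A
    return np.array([[a, b], [c, d]])

def apply_1d_homography(H, x):
    (a, b), (c, d) = H
    return (a * x + b) / (c * x + d)
```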

 

 

Optimizing Gabor Filter Design for Texture Edge Detection and Classification

 

Roman Sandler, Micha Lindenbaum - Technion

 

 

An effective and efficient texture analysis method, based on a new
criterion for designing Gabor filter sets, is proposed.


The commonly used filter sets are usually designed for optimal
signal representation. We propose here an alternative criterion for
designing the filter set. We consider a set of filters and its
response to pairs of harmonic signals. Two signals are considered
separable if the corresponding two sets of vector responses are
disjoint in at least one of the components. We propose an algorithm
for deriving the set of Gabor filters that maximizes the fraction of
separable harmonic signal pairs in a given frequency range. The
resulting filters differ significantly from the traditional ones.

We test these maximal harmonic discrimination (MHD) filters in
several texture analysis tasks: clustering, recognition, and edge
detection. It turns out that the proposed filters perform much
better than the traditional ones in these tasks. They can achieve
performance similar to that of state-of-the-art, distribution based
(texton) methods, while being simpler and more computationally
efficient.
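
For context, the "traditional" filters the abstract refers to are typically built on a fixed grid of frequencies and orientations, as in the sketch below (an isotropic Gaussian envelope and the specific parameter grid are simplifying assumptions); the MHD criterion replaces this grid with filters chosen to maximize the fraction of separable harmonic pairs.

```python
import numpy as np

def gabor_kernel(freq, theta, sigma, size=31):
    """Complex Gabor: Gaussian envelope times a plane wave at angle theta."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)       # coordinate along the wave
    envelope = np.exp(-(x ** 2 + y ** 2) / (2 * sigma ** 2))
    return envelope * np.exp(2j * np.pi * freq * xr)

# A traditional bank: log-spaced frequencies crossed with uniform orientations.
bank = [gabor_kernel(f, t, sigma=4.0)
        for f in (0.05, 0.1, 0.2, 0.4)
        for t in np.linspace(0, np.pi, 6, endpoint=False)]
```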

 

 

Matching Local Self-Similarities across Images and Videos

 

Eli Shechtman, Michal Irani - Weizmann

 

 

We present an approach for measuring similarity between visual entities (images or videos) based on matching internal self-similarities. What is correlated across images (or across video sequences) is the internal layout of local self-similarities (up to some distortions), even though the patterns generating those local self-similarities are quite different in each of the images/videos. These internal self-similarities are efficiently captured by a compact local "self-similarity descriptor", measured densely throughout the image/video, at multiple scales, while accounting for local and global geometric distortions. This gives rise to matching capabilities for complex visual data, including detection of objects in real cluttered images using only rough hand-sketches, handling textured objects with no clear boundaries, and detecting complex actions in cluttered video data with no prior learning. We compare our measure to commonly used image-based and video-based similarity measures, and demonstrate its applicability to object detection, retrieval, and action detection.
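
A hedged sketch of such a descriptor at one pixel and one scale: correlate the central patch with its surrounding region to obtain a similarity surface, then bin that surface log-polarly, keeping the maximum per bin. Bin counts, region size, and the noise normalization below are illustrative assumptions, not the published parameters.

```python
import numpy as np

def self_similarity_descriptor(img, ci, cj, patch=5, region=41,
                               n_angle=8, n_radius=4, var_noise=25.0):
    """Local self-similarity descriptor at (ci, cj); interior pixels assumed."""
    r, s = patch // 2, region // 2
    center = img[ci - r:ci + r + 1, cj - r:cj + r + 1]
    surface = np.empty((region, region))           # SSD -> similarity surface
    for di in range(-s, s + 1):
        for dj in range(-s, s + 1):
            q = img[ci + di - r:ci + di + r + 1, cj + dj - r:cj + dj + r + 1]
            surface[di + s, dj + s] = np.exp(-np.sum((center - q) ** 2) / var_noise)
    # Log-polar binning: descriptor entry = max similarity within each bin.
    di, dj = np.mgrid[-s:s + 1, -s:s + 1]
    radius = np.log1p(np.hypot(di, dj))
    angle = np.arctan2(di, dj) % (2 * np.pi)
    a_bin = np.minimum((angle / (2 * np.pi) * n_angle).astype(int), n_angle - 1)
    r_bin = np.minimum((radius / radius.max() * n_radius).astype(int), n_radius - 1)
    desc = np.zeros((n_angle, n_radius))
    for a in range(n_angle):
        for rr in range(n_radius):
            mask = (a_bin == a) & (r_bin == rr)
            if mask.any():
                desc[a, rr] = surface[mask].max()
    return desc.ravel()
```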

 

 

 

Webcam Synopsis and Indexing

 

Yael Pritch, Alex Rav-Acha, Shmuel Peleg - HUJI

 

 

The world is covered with millions of webcams and security cameras. However, the video recorded by such cameras is rarely watched: there are more cameras than people to watch them, and their content is uninteresting most of the time.

 

Video Synopsis can be used to create a synopsis of endless video streams, as generated by webcams and security cameras. It can address queries like "Show in one minute the highlights of this camera broadcast during the past day". This process includes two major phases: (i) an online conversion of the endless video stream into a database of objects and activities (rather than frames); (ii) a response phase, generating the video synopsis as a response to the user's query.

 

The synopsis video can also be used as an index into the original video. Several methodologies for video indexing based on video synopsis will be described.

 

 

 

 

 

Discriminative approach for Wavelet denoising

 

Yacov Hel-Or, Doron Shaked – IDC + HP

 

 

This work suggests a discriminative approach for wavelet denoising, where a set of shrinkage functions (SFs) is designed to perform optimally (in the MSE sense) with respect to a given set of images. Using the suggested scheme, a new set of SFs is generated, which are shown to differ from traditional soft/hard thresholding in the overcomplete case. These SFs are demonstrated to obtain state-of-the-art denoising performance. As opposed to descriptive approaches, no modeling of image or noise priors is required here, and the SFs are learned directly from the example set. Thus, the framework enables the shrinkage operation to be customized seamlessly to new restoration problems, such as image de-blurring, JPEG artifact removal, and different types of additive noise.
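
As a toy version of the discriminative idea, one can parameterize a shrinkage function as a per-magnitude-bin linear gain and fit the gains by least squares on (noisy, clean) coefficient pairs from the example set. The binning and linear parameterization are illustrative stand-ins for the paper's learned SFs.

```python
import numpy as np

def learn_shrinkage(noisy, clean, n_bins=33):
    """Fit a gain g_b per magnitude bin so g_b * x approximates y; the
    resulting shrinkage function is odd by construction."""
    x, y = noisy.ravel(), clean.ravel()
    edges = np.linspace(0.0, np.abs(x).max(), n_bins + 1)
    idx = np.clip(np.digitize(np.abs(x), edges) - 1, 0, n_bins - 1)
    gains = np.zeros(n_bins)
    for b in range(n_bins):
        m = idx == b
        if m.any():
            gains[b] = (x[m] @ y[m]) / (x[m] @ x[m] + 1e-12)  # per-bin LS gain
    return edges, gains

def apply_shrinkage(x, edges, gains):
    idx = np.clip(np.digitize(np.abs(x), edges) - 1, 0, len(gains) - 1)
    return gains[idx] * x
```

Here the training pairs would be wavelet coefficients of noisy example images and of their clean originals, one learned SF per subband.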

 

 

 

Applying Property Testing to Image and Video Segmentation

 

Daniel Keren – University of Haifa

 

 

Property testing is a rapidly growing field of research. Typically,
a property testing algorithm proceeds by quickly determining whether
an input can satisfy some condition, under the assumption that most
inputs do not satisfy it. If the input is "far" from satisfying
the condition, the algorithm is guaranteed to reject it with high
probability.

We suggest that property testing is especially suitable for pattern detection in images, since typically most inputs are far from the sought pattern. We analyze the problem of deciding whether a binary image can be segmented according to a given rectangular grid, and reduce it to a problem whose size is a constant independent of the size of the original image, and which is -- with high probability -- equivalent to the original problem.
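
A schematic illustration of the flavor of such a tester, under the (assumed) reading that "segmentable" means each grid cell is constant: sample a constant number of pixels and accept only if the constant-size subproblem they induce is still satisfiable. This is not the paper's specific construction.

```python
import numpy as np

def test_grid_segmentation(img, grid_rows, grid_cols, n_samples=200, seed=0):
    """One-sided tester sketch: always accepts an exactly segmentable binary
    image; rejects images far from segmentable with high probability."""
    rng = np.random.default_rng(seed)
    h, w = img.shape
    seen = {}                                     # grid cell -> values observed
    for _ in range(n_samples):
        i, j = rng.integers(h), rng.integers(w)
        cell = (i * grid_rows // h, j * grid_cols // w)
        seen.setdefault(cell, set()).add(int(img[i, j]))
    # Constant-size subproblem: every sampled cell must be single-valued.
    return all(len(vals) == 1 for vals in seen.values())
```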

 

 

 

Robust Real Time Pattern Matching using Bayesian Sequential Hypothesis Testing

 

Ofir Pele, Michael Werman – HUJI

 

The talk describes a method for robust real-time pattern matching. We first introduce a family of image distance measures, the "Image Hamming Distance Family". Members of this family are robust to occlusion, small geometric transforms, light changes, and non-rigid deformations. We then present a novel Bayesian framework for sequential hypothesis testing on finite populations. Based on this framework, we design an optimal rejection/acceptance sampling algorithm. This algorithm quickly determines whether two images are similar with respect to a member of the Image Hamming Distance Family. We also present a fast framework that designs a near-optimal sampling algorithm. Finally, we describe a method that accelerates pattern matching by exploiting image smoothness to adaptively slide the window, often by more than one pixel. The decision of how far to slide is based on a novel rank we define for each feature in the pattern. Extensive experimental results show that the method's performance is excellent. Implemented on a 3GHz Pentium 4 processor, detection of a pattern with 2197 pixels in 640x480 frames, where in each frame the pattern is rotated and highly occluded, takes only 7.2ms per frame.
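
A minimal sketch of the sequential idea: sample pattern pixels one at a time, count thresholded-absolute-difference mismatches (one member of the Hamming-distance family), and reject as soon as the count exceeds what a true match would plausibly produce. The fixed stopping bound below is an illustrative stand-in for the optimal Bayesian rejection/acceptance rule.

```python
import numpy as np

def sequential_match(pattern, window, dist_thresh=0.1, pix_thresh=20,
                     max_samples=300, margin=2.0, seed=0):
    """Decide quickly whether the Hamming-distance fraction <= dist_thresh."""
    rng = np.random.default_rng(seed)
    coords = rng.permutation(pattern.size)[:max_samples]
    mism = 0
    for n, flat in enumerate(coords, start=1):
        i, j = np.unravel_index(flat, pattern.shape)
        mism += abs(int(pattern[i, j]) - int(window[i, j])) > pix_thresh
        # Early rejection: mismatches already exceed a plausible match level.
        if mism > dist_thresh * n + margin * np.sqrt(n):
            return False
    return mism <= dist_thresh * len(coords)
```

In a full matcher this test runs per window, with early rejection doing almost all the work since most windows are far from the pattern.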

 

 

Brightness contrast-contrast induction model predicts assimilation and inverted assimilation effects

 

Yuval Barkan, Hedva Spitzer - TAU

 

 

In classical assimilation effects, intermediate-luminance patches appear lighter when their immediate surround is comprised of white patches, and darker when it is comprised of dark patches. With patches either darker or lighter than both inducing patches, the direction of the brightness effect is reversed, and it is termed the "inverted assimilation effect". Several explanations and models have been suggested, some relevant only to a specific stimulus geometry, alongside anchoring theory and models that involve high-level cortical processing (such as scission). None of these studies predicted the variety of assimilation effects and their inverted effects. We suggest here a compound brightness model, based on contrast-contrast induction (a second-order adaptation mechanism), which predicts the various types of brightness assimilation effects and their inverted effects, in addition to predicting the dual effects (contrast enhancement and suppression) of contrast-contrast induction. The model is composed of three main stages: composing the receptive fields of on-center retinal opponent cells; performing second-order adaptation (gain control, a "curve shifting" in the contrast domain); and transforming the "perceived" adapted response back to a luminance image using a variant of the Jacobi iterative process, which enables elegant edge integration.

 

 

 

An Evolutionary Patch Pattern Approach for Texture Discrimination

 

Maxim Shoshani - Technion

 

 

A new evolutionary approach is presented, based on implicit pattern-process relationships. To implement this approach, a gray-level texture image is decomposed into a progressive sequence of binary patch patterns that describe a process of change from background to foreground domination. Each of the binary patterns throughout these sequences is parameterized using several metrics that describe, for example, its fragmentation level, both for the background (e.g., white) and foreground (e.g., black) patch patterns. Any texture type is then assumed to have a unique evolutionary path, represented by a distinctive region in the feature space of metrics characterizing these patterns and their change. Applying hierarchical clustering based on a few (3 or 4) metrics representing characteristic stages in the patterns' change process allowed us to accurately discriminate between 50 samples of 10 Brodatz texture types.
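
A minimal sketch of the decomposition step, with illustrative metrics: threshold the image at an increasing sequence of gray levels to obtain the binary patch-pattern sequence, and summarize each pattern by its foreground fraction plus a simple fragmentation proxy. The actual metrics used in the talk are richer.

```python
import numpy as np

def evolutionary_signature(img, n_levels=16):
    """Threshold img at increasing levels; summarize each binary pattern."""
    levels = np.linspace(img.min(), img.max(), n_levels + 2)[1:-1]
    feats = []
    for t in levels:
        fg = (img >= t).astype(np.int8)      # binary patch pattern at level t
        frac = fg.mean()                     # foreground domination
        # Fragmentation proxy: density of fg/bg transitions along rows + columns.
        trans = np.abs(np.diff(fg, axis=0)).sum() + np.abs(np.diff(fg, axis=1)).sum()
        feats.append((frac, trans / fg.size))
    return np.array(feats)                   # one (fraction, fragmentation) per level
```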

 

 

 

The use of Non-Extensive Divergence for Robust 'bag of features' Algorithms for Visual Recognition

 

Amnon Shashua, Tamir Hazan – HUJI

 

 

Images are sometimes represented by a set of informative features (a "bag of visual terms"). pLSA is a method for analyzing the feature-image co-occurrence matrix and recovering hidden categories of images using a KL-divergence low-rank fit. We introduce the Tsallis divergence as the error measure, showing much improved performance in the presence of noise. We provide an optimization framework which extends the Maximum Likelihood framework and is theoretically guaranteed to provide robustness under clutter, noise, and outliers. Specifically, our approach excels when the co-occurrence array is sparse, which is precisely the situation in the "bag of visual words" application domain.
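
For reference, the standard Tsallis (non-extensive) relative entropy of order q is shown below; it recovers the KL divergence as q approaches 1. The exact variant and parameterization used in the talk may differ.

```latex
% Tsallis relative entropy of order q; D_q(p||r) -> KL(p||r) as q -> 1.
D_q(p \,\|\, r) = \frac{1}{q - 1} \Bigl( \sum_i p_i^{\,q} \, r_i^{\,1-q} - 1 \Bigr)
```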

 

 

 

 

Blind Source Separation, Deconvolution and Localization using Sparse Signal Representations

 

Michael Zibulevsky – Technion

 

The blind source separation problem is concerned with the extraction of underlying unknown source signals from a set of their linear mixtures, where the mixing matrix is unknown. The blind deconvolution problem consists of recovering a signal or image blurred by an unknown convolution kernel. These two problems are closely related and are encountered in acoustics, radio, radar, medical signal and image processing, hyperspectral imaging, and more; hence the tremendous interest in them.

We show that exploiting the sparsity of wavelet-type and other representations of the sources dramatically improves the quality of separation/deconvolution. This leads to an L1-norm minimization problem, which can be solved efficiently by the proposed relative Newton method combined with the Smoothing Method of Multipliers.
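
To illustrate the kind of L1 problem that arises, the sketch below solves a non-blind, 1D, circular-convolution sparse deconvolution by plain ISTA. This is a hedged toy: the talk addresses the blind case and proposes faster solvers (relative Newton with the Smoothing Method of Multipliers).

```python
import numpy as np

def conv(a, k):
    """Circular convolution via FFT (kernel zero-padded to len(a))."""
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(k, len(a))))

def conv_adj(a, k):
    """Adjoint of the circular convolution operator."""
    return np.real(np.fft.ifft(np.fft.fft(a) * np.conj(np.fft.fft(k, len(a)))))

def ista_deconv(x, k, lam=0.05, iters=200):
    """Minimize 0.5*||k*s - x||^2 + lam*||s||_1 by ISTA; k assumed known and nonzero."""
    L = np.max(np.abs(np.fft.fft(k, len(x)))) ** 2   # Lipschitz constant of the gradient
    s = np.zeros_like(x)
    for _ in range(iters):
        grad = conv_adj(conv(s, k) - x, k)           # gradient of the quadratic term
        z = s - grad / L
        s = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
    return s
```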


We also show how to use the sparsity priors for super-resolution source
localization using multisensor observations.