2007 Israel Computer Vision Day
Sunday, December 9, 2007

The Efi Arazi School of Computer Science

I.D.C.  Herzliya

 

Computer Science Dept., University of Haifa
Previous Vision Days Web Pages: 2003, 2004, 2005, 2006.

 

 

 

Time         | Speaker and Collaborators                         | Affiliation      | Title
09:00-09:30  | Gathering                                         |                  |
09:30-09:45  | Opening                                           |                  |
09:45-10:10  | Nir Sochen, Shamgar Gurevich, Ronny Hadani        | TAU              | On some deterministic dictionaries supporting sparsity
10:10-10:35  | Lior Wolf, Hueihan Jhuang, Tamir Hazan            | TAU              | Learning Appearances with Low-Rank SVM
10:35-11:00  | R. Kimmel, Dan Raviv, A. Bronstein, M. Bronstein  | Technion         | On Biometry, Isometry, and Intrinsic Symmetry
11:00-11:30  | Coffee Break                                      |                  |
11:30-11:55  | Matan Protter, Miki Elad                          | Technion         | Super-resolution with no explicit motion estimation
11:55-12:20  | Stas Rozenfeld, Ilan Shimshoni, Micha Lindenbaum  | Haifa + Technion | Dense mirroring surface recovery from 1D homographies and sparse correspondences
12:20-12:45  | Roman Sandler, Micha Lindenbaum                   | Technion         | Optimizing Gabor Filter Design for Texture Edge Detection and Classification
12:45-13:10  | Eli Shechtman, Michal Irani                       | Weizmann         | Matching Local Self-Similarities across Images and Videos
13:10-14:10  | Lunch Break                                       |                  |
14:10-14:35  | Yael Pritch, Shmuel Peleg, Alex Rav-Acha          | HUJI             | Webcam Synopsis and Indexing
14:35-15:00  | Yacov Hel-Or, Doron Shaked                        | IDC              | Discriminative approach for Wavelet denoising
15:00-15:25  | Daniel Keren                                      | Haifa            | Applying Property Testing to Image and Video Segmentation
15:25-15:50  | Ofir Pele, Michael Werman                         | HUJI             | Robust Real Time Pattern Matching using Bayesian Sequential Hypothesis Testing
15:50-16:20  | Coffee Break                                      |                  |
16:20-16:45  | Yuval Barkan, Hedva Spitzer                       | TAU              | Brightness contrast-contrast induction model predicts assimilation and inverted assimilation effects
16:45-17:10  | Maxim Shoshani                                    | Technion         | An Evolutionary Patch Pattern Approach for Texture Discrimination
17:10-17:35  | Amnon Shashua, Tamir Hazan                        | HUJI             | The use of Non-Extensive Divergence for Robust 'bag of features' Algorithms for Visual Recognition
17:35-18:00  | Michael Zibulevsky                                | Technion         | Blind Source Separation, Deconvolution and Localization using Sparse Signal Representations

General: This is the fifth Israel Computer Vision Day. It will be hosted at IDC.

For more details, or to be added to the mailing list, please contact:

hagit@cs.haifa.ac.il   toky@idc.ac.il

 

Location and Directions: The Vision Day will take place at the Interdisciplinary Center (IDC), Herzliya, in the Ivtzer Auditorium. For driving instructions, see the map.

A convenient option is to arrive by train (see the time schedule here). Get off at the Herzliya station and order a taxi by phone. Two taxi companies provide this service: Moniyot Av-Yam (09 9501263 or 09 9563111) and Moniyot Pituach (09 9582288 or 09 9588001).

 

 

 

 

Abstracts

 

 

On some deterministic dictionaries supporting sparsity

 

Nir Sochen, Shamgar Gurevich, and Ronny Hadani – TAU

 

 

We describe two deterministic constructions of dictionaries of functions on the finite line which support a certain degree of sparsity.

 

 

 

Learning Appearances with Low-Rank SVM

 

Lior Wolf, Hueihan Jhuang, Tamir Hazan - TAU

 

Several authors have noticed that the common representation of images as vectors is sub-optimal. The process of vectorization eliminates spatial relations between some of the nearby image measurements and produces a vector of a dimension which is the product of the measurements' dimensions. It seems that images may be better represented when taking into account their structure as a 2D (or multi-D) array.

 

Our framework, "Low-Rank separators", studies the use of separating hyperplanes that are constrained to have the structure of low-rank matrices. We first prove that the low-rank constraint provides preferable generalization properties. We then define two "Low-rank SVM problems" and propose algorithms to solve them. Finally, we provide supporting experimental evidence for the framework.
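
As a rough illustration of the idea, the sketch below trains a rank-1 separator W = u v^T directly on image matrices by alternating gradient steps on a regularized squared-hinge loss. The loss, the alternating scheme, and all names are illustrative assumptions, not the authors' actual algorithms.

```python
import numpy as np

def squared_hinge_grad(Z, y, w, lam):
    """Subgradient of (1/N) * sum max(0, 1 - y_i <z_i, w>)^2 + (lam/2)||w||^2."""
    margins = y * (Z @ w)
    active = margins < 1
    g = -2.0 * ((y[active] * (1.0 - margins[active])) @ Z[active])
    return g / len(y) + lam * w

def rank1_svm(X, y, lam=0.1, iters=100, lr=0.05, seed=0):
    """X: (N, m, n) stack of image matrices, y: (N,) labels in {-1, +1}.
    Learns a rank-1 separator W = u v^T by alternating over u and v."""
    rng = np.random.default_rng(seed)
    N, m, n = X.shape
    u, v = rng.standard_normal(m), rng.standard_normal(n)
    for _ in range(iters):
        Zu = X @ v                               # fix v: images collapse to (N, m)
        u -= lr * squared_hinge_grad(Zu, y, u, lam)
        Zv = np.einsum('nij,i->nj', X, u)        # fix u: images collapse to (N, n)
        v -= lr * squared_hinge_grad(Zv, y, v, lam)
    return u, v                                  # predict with sign(u @ Xi @ v)
```

Note how the 2D structure is retained: the classifier never sees a flattened vector of dimension m*n, only the m and n dimensional projections.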

 

 

On Biometry, Isometry, and Intrinsic Symmetry

 

R. Kimmel, Dan Raviv, A. Bronstein, and M. Bronstein - Technion

 

 

It can be shown that a person's identity is associated with the intrinsic geometry of the facial surface, while facial expressions are related to the extrinsic geometry. Our first attempt at face recognition was to represent the intrinsic geometry of the surface by isometrically embedding it into a low-dimensional Euclidean space. The result is an expression-invariant representation of the face, which we called a canonical form. Next, we embedded surfaces into simple non-Euclidean spaces. In particular, two- and three-dimensional spheres were found to be appealing for the representation of faces, as the resulting metric distortion is usually smaller than in a Euclidean space. Last year, we introduced Generalized Multidimensional Scaling (GMDS), which allows embedding into manifolds with an arbitrary geometric structure. Instead of embedding the facial surfaces into a common embedding space, we embed one surface into the other and use the metric distortion as a measure of their dissimilarity. GMDS allows surface recognition even when significant parts are missing. Finally, mapping a surface into its own geometry can be used to study the self (isometric) symmetries of a given object.
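
For reference, GMDS can be written as minimizing a generalized stress that measures the metric distortion of a map between the two sampled surfaces; a standard form (the weights w_ij and the squared distortion are one common choice, not necessarily the exact variant used here) is:

```latex
% Generalized stress: map samples x_i of surface X into surface Y so that
% pairwise geodesic distances are preserved as well as possible.
\min_{\varphi\colon X \to Y} \; \sigma(\varphi)
  = \sum_{i<j} w_{ij}\,\bigl( d_Y(\varphi(x_i), \varphi(x_j)) - d_X(x_i, x_j) \bigr)^2
```

Taking Y = X and looking for nontrivial minimizers of the same stress is what exposes the intrinsic (isometric) self-symmetries mentioned above.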

 

In this talk I will review the relations between these measures, the underlying theory, the resulting numerical machinery, and applications ranging from face recognition and shape analysis to morphing, warping, intrinsic symmetry, and texture mapping.

 

 

 

Super-resolution with no explicit motion estimation

 

Matan Protter, Miki Elad – Technion

 

 

Super-resolution reconstruction proposes a fusion of several low-quality images into one higher-quality result with better optical resolution. Classic super-resolution techniques strongly rely on the availability of accurate motion estimation for this fusion task. When the motion is estimated inaccurately, as often happens for non-global motion fields, this results in annoying artifacts in the super-resolved outcome. Encouraged by recent developments on the video denoising problem, where state-of-the-art algorithms are formed with no explicit motion estimation, we seek a super-resolution algorithm of a similar nature. In this talk we present our solution: a novel super-resolution algorithm based on the Non-Local-Means (NLM) algorithm. We show how this denoising method can be generalized into a relatively simple super-resolution algorithm with no explicit motion estimation. Results on various movies show that the proposed method is very successful in providing super-resolution on general sequences.
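
To make the "no explicit motion estimation" point concrete, here is a minimal NLM-style fusion sketch: each output pixel is a patch-similarity-weighted average over a search window in all frames, so no motion field is ever computed. Names and parameters are illustrative, and the sketch omits the blur and decimation operators that full super-resolution must also handle.

```python
import numpy as np

def nlm_fuse(frames, t, i, j, patch=3, search=7, h=10.0):
    """Estimate pixel (i, j) of frame t by patch-weighted averaging over a
    search window in *all* frames (interior pixels assumed, for brevity)."""
    r, s = patch // 2, search // 2
    ref = frames[t, i - r:i + r + 1, j - r:j + r + 1]   # reference patch
    num = den = 0.0
    for k in range(frames.shape[0]):                    # every frame, no motion field
        for di in range(-s, s + 1):
            for dj in range(-s, s + 1):
                ii, jj = i + di, j + dj
                cand = frames[k, ii - r:ii + r + 1, jj - r:jj + r + 1]
                w = np.exp(-np.sum((ref - cand) ** 2) / (h * h))
                num += w * frames[k, ii, jj]
                den += w
    return num / den
```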

 

 

Dense mirroring surface recovery from 1D homographies and sparse correspondences

 

Stas Rozenfeld, Ilan Shimshoni, Micha Lindenbaum – Haifa + Technion

 

 

In this work we recover the 3D shape of mirroring objects such as mirrors, sunglasses, and stainless steel objects. A computer monitor displays several images of parallel stripes, each image at a different angle, and the reflections of these stripes in the mirroring surface are captured by a camera. For every image point, the directions of the displayed stripes and of their reflections in the image are related by a 1D homography, which can be computed robustly, using the statistically accurate heteroscedastic model, without the monitor-image correspondence that is generally required by other techniques. In addition, each image of stripes is followed by its negative image, and the stripe direction is estimated from the difference of the two images; this allows the shape recovery procedure to be applied beyond purely mirroring surfaces. Focusing on a small set of image points for which monitor-image correspondence is computed, the depth and the local shape may be calculated from this homography. This is done by an optimization process related to the one proposed by Savarese, Chen, and Perona (2005), but different and more stable, since it minimizes an angular error, which is a more "geometric" measure than the matrix degeneracy constraint. The connection between our cost function and theirs is addressed. Dense surface recovery is then performed using constrained interpolation, which does not simply interpolate the surface depth values but rather solves for the depth, the correspondence, and the local surface shape simultaneously at each interpolated point, enforcing consistency with the 1D homography. Both the proposed method and that of Savarese, Chen, and Perona are inherently unstable on a small part of the surface. We propose a method to detect these instabilities and correct them. Additionally, we provide an algebraic condition sufficient for a point to be unstable, and give a simple geometric interpretation of this condition for planar and spherical surfaces. The method was implemented, and the shapes of a mirror, sunglasses, and a stainless steel ashtray were recovered at sub-millimeter accuracy.
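
For intuition, a 1D homography between stripe directions can be estimated linearly from a few correspondences. The DLT-style sketch below is a hypothetical illustration (directions parameterized by slope, least-squares via SVD); the talk's method instead uses a robust, statistically accurate heteroscedastic estimator.

```python
import numpy as np

def fit_1d_homography(x, xp):
    """x, xp: arrays of >= 3 corresponding slopes tan(theta) of displayed and
    reflected stripes. Returns 2x2 H, defined up to scale. (Near-vertical
    stripes would need the full homogeneous parameterization.)"""
    # Each pair satisfies xp*(c*x + d) - (a*x + b) = 0, linear in (a, b, c, d).
    A = np.stack([-x, -np.ones_like(x), xp * x, xp], axis=1)
    _, _, Vt = np.linalg.svd(A)
    a, b, c, d = Vt[-1]                  # null-space direction of A
    return np.array([[a, b], [c, d]])

def apply_1d_homography(H, x):
    (a, b), (c, d) = H
    return (a * x + b) / (c * x + d)
```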

 

 

Optimizing Gabor Filter Design for Texture Edge Detection and Classification

 

Roman Sandler, Micha Lindenbaum - Technion

 

 

An effective and efficient texture analysis method, based on a new
criterion for designing Gabor filter sets, is proposed.


The commonly used filter sets are usually designed for optimal
signal representation. We propose here an alternative criterion for
designing the filter set. We consider a set of filters and its
response to pairs of harmonic signals. Two signals are considered
separable if the corresponding two sets of vector responses are
disjoint in at least one of the components. We propose an algorithm
for deriving the set of Gabor filters that maximizes the fraction of
separable harmonic signal pairs in a given frequency range. The
resulting filters differ significantly from the traditional ones.

We test these maximal harmonic discrimination (MHD) filters in
several texture analysis tasks: clustering, recognition, and edge
detection. It turns out that the proposed filters perform much
better than the traditional ones in these tasks. They can achieve
performance similar to that of state-of-the-art, distribution based
(texton) methods, while being simpler and more computationally
efficient.
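
For context, the "traditional" filters the abstract refers to are typically built on a fixed grid of frequencies and orientations, as in the sketch below (an isotropic Gaussian envelope and the specific parameter grid are simplifying assumptions); the MHD criterion replaces this grid with filters chosen to maximize the fraction of separable harmonic pairs.

```python
import numpy as np

def gabor_kernel(freq, theta, sigma, size=31):
    """Complex Gabor: Gaussian envelope times a plane wave at angle theta."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)       # coordinate along the wave
    envelope = np.exp(-(x ** 2 + y ** 2) / (2 * sigma ** 2))
    return envelope * np.exp(2j * np.pi * freq * xr)

# A traditional bank: log-spaced frequencies crossed with uniform orientations.
bank = [gabor_kernel(f, t, sigma=4.0)
        for f in (0.05, 0.1, 0.2, 0.4)
        for t in np.linspace(0, np.pi, 6, endpoint=False)]
```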

 

 

Matching Local Self-Similarities across Images and Videos

 

Eli Shechtman, Michal Irani - Weizmann

 

 

We present an approach for measuring similarity between visual entities (images or videos) based on matching internal self-similarities. What is correlated across images (or across video sequences) is the internal layout of local self-similarities (up to some distortions), even though the patterns generating those local self-similarities are quite different in each of the images/videos. These internal self-similarities are efficiently captured by a compact local "self-similarity descriptor", measured densely throughout the image/video, at multiple scales, while accounting for local and global geometric distortions. This gives rise to matching capabilities for complex visual data, including detection of objects in real cluttered images using only rough hand-sketches, handling textured objects with no clear boundaries, and detecting complex actions in cluttered video data with no prior learning. We compare our measure to commonly used image-based and video-based similarity measures, and demonstrate its applicability to object detection, retrieval, and action detection.
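
A hedged sketch of such a descriptor at one pixel and one scale: correlate the central patch with its surrounding region to obtain a similarity surface, then bin that surface log-polarly, keeping the maximum per bin. Bin counts, region size, and the noise normalization below are illustrative assumptions, not the published parameters.

```python
import numpy as np

def self_similarity_descriptor(img, ci, cj, patch=5, region=41,
                               n_angle=8, n_radius=4, var_noise=25.0):
    """Local self-similarity descriptor at (ci, cj); interior pixels assumed."""
    r, s = patch // 2, region // 2
    center = img[ci - r:ci + r + 1, cj - r:cj + r + 1]
    surface = np.empty((region, region))           # SSD -> similarity surface
    for di in range(-s, s + 1):
        for dj in range(-s, s + 1):
            q = img[ci + di - r:ci + di + r + 1, cj + dj - r:cj + dj + r + 1]
            surface[di + s, dj + s] = np.exp(-np.sum((center - q) ** 2) / var_noise)
    # Log-polar binning: descriptor entry = max similarity within each bin.
    di, dj = np.mgrid[-s:s + 1, -s:s + 1]
    radius = np.log1p(np.hypot(di, dj))
    angle = np.arctan2(di, dj) % (2 * np.pi)
    a_bin = np.minimum((angle / (2 * np.pi) * n_angle).astype(int), n_angle - 1)
    r_bin = np.minimum((radius / radius.max() * n_radius).astype(int), n_radius - 1)
    desc = np.zeros((n_angle, n_radius))
    for a in range(n_angle):
        for rr in range(n_radius):
            mask = (a_bin == a) & (r_bin == rr)
            if mask.any():
                desc[a, rr] = surface[mask].max()
    return desc.ravel()
```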

 

 

 

Webcam Synopsis and Indexing

 

Yael Pritch, Alex Rav-Acha, Shmuel Peleg - HUJI

 

 

The world is covered with millions of webcams and security cameras. However, the video recorded by such cameras is rarely watched: there are more cameras than people to watch them, and their content is uninteresting most of the time.

 

Video Synopsis can be used to create a synopsis of endless video streams, as generated by webcams and security cameras. It can address queries like "Show in one minute the highlights of this camera broadcast during the past day". This process includes two major phases: (i) an online conversion of the endless video stream into a database of objects and activities (rather than frames); (ii) a response phase, generating the video synopsis as a response to the user's query.

 

The synopsis video can also be used as an index into the original video. Several methodologies for video indexing based on video synopsis will be described.

 

 

 

 

 

Discriminative approach for Wavelet denoising

 

Yacov Hel-Or, Doron Shaked – IDC + HP

 

 

This work suggests a discriminative approach for wavelet denoising, where a set of shrinkage functions (SFs) is designed to perform optimally (in the MSE sense) with respect to a given set of images. Using the suggested scheme, a new set of SFs is generated, which are shown to differ from traditional soft/hard thresholding in the overcomplete case. These SFs are demonstrated to obtain state-of-the-art denoising performance. As opposed to descriptive approaches, no modeling of image or noise priors is required here, and the SFs are learned directly from the example set. Thus, the framework enables the shrinkage operation to be customized seamlessly to new restoration problems, such as image de-blurring, JPEG artifact removal, and different types of additive noise.
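
As a toy version of the discriminative idea, one can parameterize a shrinkage function as a per-magnitude-bin linear gain and fit the gains by least squares on (noisy, clean) coefficient pairs from the example set. The binning and linear parameterization are illustrative stand-ins for the paper's learned SFs.

```python
import numpy as np

def learn_shrinkage(noisy, clean, n_bins=33):
    """Fit a gain g_b per magnitude bin so g_b * x approximates y; the
    resulting shrinkage function is odd by construction."""
    x, y = noisy.ravel(), clean.ravel()
    edges = np.linspace(0.0, np.abs(x).max(), n_bins + 1)
    idx = np.clip(np.digitize(np.abs(x), edges) - 1, 0, n_bins - 1)
    gains = np.zeros(n_bins)
    for b in range(n_bins):
        m = idx == b
        if m.any():
            gains[b] = (x[m] @ y[m]) / (x[m] @ x[m] + 1e-12)  # per-bin LS gain
    return edges, gains

def apply_shrinkage(x, edges, gains):
    idx = np.clip(np.digitize(np.abs(x), edges) - 1, 0, len(gains) - 1)
    return gains[idx] * x
```

Here the training pairs would be wavelet coefficients of noisy example images and of their clean originals, one learned SF per subband.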

 

 

 

Applying Property Testing to Image and Video Segmentation

 

Daniel Keren – University of Haifa

 

 

Property testing is a rapidly growing field of research. Typically,
a property testing algorithm proceeds by quickly determining whether
an input can satisfy some condition, under the assumption that most
inputs do not satisfy it. If the input is "far" from satisfying
the condition, the algorithm is guaranteed to reject it with high
probability.

We suggest that property testing is especially suitable for pattern detection in images, since typically most inputs are far from the sought pattern. We analyze the problem of deciding whether a binary image can be segmented according to a given rectangular grid, and reduce it to a problem whose size is a constant independent of the size of the original image, and which is -- with high probability -- equivalent to the original problem.
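
A schematic illustration of the flavor of such a tester, under the (assumed) reading that "segmentable" means each grid cell is constant: sample a constant number of pixels and accept only if the constant-size subproblem they induce is still satisfiable. This is not the paper's specific construction.

```python
import numpy as np

def test_grid_segmentation(img, grid_rows, grid_cols, n_samples=200, seed=0):
    """One-sided tester sketch: always accepts an exactly segmentable binary
    image; rejects images far from segmentable with high probability."""
    rng = np.random.default_rng(seed)
    h, w = img.shape
    seen = {}                                     # grid cell -> values observed
    for _ in range(n_samples):
        i, j = rng.integers(h), rng.integers(w)
        cell = (i * grid_rows // h, j * grid_cols // w)
        seen.setdefault(cell, set()).add(int(img[i, j]))
    # Constant-size subproblem: every sampled cell must be single-valued.
    return all(len(vals) == 1 for vals in seen.values())
```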

 

 

 

Robust Real Time Pattern Matching using Bayesian Sequential Hypothesis Testing

 

Ofir Pele, Michael Werman – HUJI

 

The talk describes a method for robust real-time pattern matching. We first introduce a family of image distance measures, the "Image Hamming Distance Family". Members of this family are robust to occlusion, small geometric transforms, light changes, and non-rigid deformations. We then present a novel Bayesian framework for sequential hypothesis testing on finite populations. Based on this framework, we design an optimal rejection/acceptance sampling algorithm. This algorithm quickly determines whether two images are similar with respect to a member of the Image Hamming Distance Family. We also present a fast framework that designs a near-optimal sampling algorithm. Finally, we describe a method that accelerates pattern matching by exploiting image smoothness to adaptively slide the window, often by more than one pixel. The decision of how far to slide is based on a novel rank we define for each feature in the pattern. Extensive experimental results show that the method's performance is excellent. Implemented on a 3GHz Pentium 4 processor, detection of a pattern with 2197 pixels in 640x480 frames, where in each frame the pattern is rotated and highly occluded, takes only 7.2ms per frame.
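
A minimal sketch of the sequential idea: sample pattern pixels one at a time, count thresholded-absolute-difference mismatches (one member of the Hamming-distance family), and reject as soon as the count exceeds what a true match would plausibly produce. The fixed stopping bound below is an illustrative stand-in for the optimal Bayesian rejection/acceptance rule.

```python
import numpy as np

def sequential_match(pattern, window, dist_thresh=0.1, pix_thresh=20,
                     max_samples=300, margin=2.0, seed=0):
    """Decide quickly whether the Hamming-distance fraction <= dist_thresh."""
    rng = np.random.default_rng(seed)
    coords = rng.permutation(pattern.size)[:max_samples]
    mism = 0
    for n, flat in enumerate(coords, start=1):
        i, j = np.unravel_index(flat, pattern.shape)
        mism += abs(int(pattern[i, j]) - int(window[i, j])) > pix_thresh
        # Early rejection: mismatches already exceed a plausible match level.
        if mism > dist_thresh * n + margin * np.sqrt(n):
            return False
    return mism <= dist_thresh * len(coords)
```

In a full matcher this test runs per window, with early rejection doing almost all the work since most windows are far from the pattern.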

 

 

Brightness contrast-contrast induction model predicts assimilation and inverted assimilation effects

 

Yuval Barkan, Hedva Spitzer - TAU

 

 

In classical assimilation effects, intermediate-luminance patches appear lighter when their immediate surround is comprised of white patches, and darker when it is comprised of dark patches. With patches either darker or lighter than both inducing patches, the direction of the brightness effect is reversed, and it is termed the "inverted assimilation effect". Several explanations and models have been suggested, some relevant only to a specific stimulus geometry, alongside anchoring theory and models that involve high-level cortical processing (such as scission). None of these studies predicted the variety of assimilation effects and their inverted effects. We suggest here a compound brightness model, based on contrast-contrast induction (a second-order adaptation mechanism), which predicts the various types of brightness assimilation effects and their inverted effects, in addition to predicting the dual effects (contrast enhancement and suppression) of contrast-contrast induction. The model is composed of three main stages: composing the receptive fields of on-center retinal opponent cells; performing second-order adaptation (gain control, a "curve shifting" in the contrast domain); and transforming the "perceived" adapted response back to a luminance image using a variant of the Jacobi iterative process, which enables elegant edge integration.

 

 

 

An Evolutionary Patch Pattern Approach for Texture Discrimination

 

Maxim Shoshani - Technion

 

 

A new evolutionary approach is presented, based on implicit pattern-process relationships. To implement this approach, a gray-level texture image is decomposed into a progressive sequence of binary patch patterns that describe a process of change from background to foreground domination. Each of the binary patterns throughout these sequences is parameterized using several metrics that describe, for example, its fragmentation level, both for the background (e.g., white) and foreground (e.g., black) patch patterns. Any texture type is then assumed to have a unique evolutionary path, represented by a distinctive region in the feature space of metrics characterizing these patterns and their change. Applying hierarchical clustering based on a few (3 or 4) metrics representing characteristic stages in the patterns' change process allowed us to accurately discriminate between 50 samples of 10 Brodatz texture types.
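
A minimal sketch of the decomposition step, with illustrative metrics: threshold the image at an increasing sequence of gray levels to obtain the binary patch-pattern sequence, and summarize each pattern by its foreground fraction plus a simple fragmentation proxy. The actual metrics used in the talk are richer.

```python
import numpy as np

def evolutionary_signature(img, n_levels=16):
    """Threshold img at increasing levels; summarize each binary pattern."""
    levels = np.linspace(img.min(), img.max(), n_levels + 2)[1:-1]
    feats = []
    for t in levels:
        fg = (img >= t).astype(np.int8)      # binary patch pattern at level t
        frac = fg.mean()                     # foreground domination
        # Fragmentation proxy: density of fg/bg transitions along rows + columns.
        trans = np.abs(np.diff(fg, axis=0)).sum() + np.abs(np.diff(fg, axis=1)).sum()
        feats.append((frac, trans / fg.size))
    return np.array(feats)                   # one (fraction, fragmentation) per level
```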

 

 

 

The use of Non-Extensive Divergence for Robust 'bag of features' Algorithms for Visual Recognition

 

Amnon Shashua, Tamir Hazan – HUJI

 

 

Images are sometimes represented by a set of informative features (a "bag of visual terms"). pLSA is a method for analyzing the feature-image co-occurrence matrix and recovering hidden categories of images using a KL-divergence low-rank fit. We introduce the Tsallis divergence as the error measure, showing much improved performance in the presence of noise. We provide an optimization framework which extends the Maximum Likelihood framework and is theoretically guaranteed to provide robustness under clutter, noise, and outliers. Specifically, our approach excels when the co-occurrence array is sparse, which is precisely the situation in the "bag of visual words" application domain.
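
For reference, the standard Tsallis (non-extensive) relative entropy of order q is shown below; it recovers the KL divergence as q approaches 1. The exact variant and parameterization used in the talk may differ.

```latex
% Tsallis relative entropy of order q; D_q(p||r) -> KL(p||r) as q -> 1.
D_q(p \,\|\, r) = \frac{1}{q - 1} \Bigl( \sum_i p_i^{\,q} \, r_i^{\,1-q} - 1 \Bigr)
```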

 

 

 

 

Blind Source Separation, Deconvolution and Localization using Sparse Signal Representations

 

Michael Zibulevsky – Technion

 

The blind source separation problem is concerned with the extraction of underlying unknown source signals from a set of their linear mixtures, where the mixing matrix is unknown. The blind deconvolution problem consists of recovering a signal or image blurred by an unknown convolution kernel. These two problems are closely related and are encountered in acoustics, radio, radar, medical signal and image processing, hyperspectral imaging, and more; hence the tremendous interest in them.

We show that exploiting the sparsity of wavelet-type and other representations of the sources dramatically improves the quality of separation/deconvolution. This leads to an L1-norm minimization problem, which can be solved efficiently by the proposed relative Newton method combined with the Smoothing Method of Multipliers.
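
To illustrate the kind of L1 problem that arises, the sketch below solves a non-blind, 1D, circular-convolution sparse deconvolution by plain ISTA. This is a hedged toy: the talk addresses the blind case and proposes faster solvers (relative Newton with the Smoothing Method of Multipliers).

```python
import numpy as np

def conv(a, k):
    """Circular convolution via FFT (kernel zero-padded to len(a))."""
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(k, len(a))))

def conv_adj(a, k):
    """Adjoint of the circular convolution operator."""
    return np.real(np.fft.ifft(np.fft.fft(a) * np.conj(np.fft.fft(k, len(a)))))

def ista_deconv(x, k, lam=0.05, iters=200):
    """Minimize 0.5*||k*s - x||^2 + lam*||s||_1 by ISTA; k assumed known and nonzero."""
    L = np.max(np.abs(np.fft.fft(k, len(x)))) ** 2   # Lipschitz constant of the gradient
    s = np.zeros_like(x)
    for _ in range(iters):
        grad = conv_adj(conv(s, k) - x, k)           # gradient of the quadratic term
        z = s - grad / L
        s = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
    return s
```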


We also show how to use the sparsity priors for super-resolution source
localization using multisensor observations.