Recognition and Classification in Images and Video *

203.4780





Meeting Times:  Monday  9-12, Room 462

Instruction Hour: Wednesday 11:00-12:00, Room 410 (Jacobs)

Instructor: Dr. Rita Osadchy

e-mail: rita [at]cs [dot]haifa.ac.il
Office: Jacobs 410

 

Announcements:

 

§  No class 5.5;

§  The reviews for the topic: Patch-based Representations are due on 5.5;

§  The reviews for the topic: Detection as a binary decision are due on 12.5;

§  Both topics will be presented on 12.5 (if we don’t have enough time for the second topic, we will continue it on 19.5);

§  All announcements and guidelines will be distributed by email.

§  Those who do not send their contact address on time will not be added to the contact list!!!

§  You must send an email to rita[at]cs[dot]haifa.ac.il by March 1 from your active address with the subject "course 4780".

 

Course overview:

 

General: This is a graduate course in computer vision.   We will survey and discuss vision papers relating to object and activity recognition and scene understanding.  The goal of the course is to understand classical and modern approaches to some important problems, analyzing their strengths and weaknesses, and identifying interesting open questions.

Requirements: Students will be responsible for writing a paper review each week, participating in discussions, completing a programming project, and presenting one topic in class.

Note that presentations are due one week before the slot in which your presentation is scheduled.  This means you will need to read the papers, create slides, etc. one week before the date you are signed up for, to leave time for improvement. Note also that you need my approval for your presentation.

More details on the requirements and grading breakdown are here


Syllabus:

A.    Recognizing specific objects

           Global features:

1.     Linear Subspaces

2.     Detection as a binary decision

           Local features:

3.     Local features, matching for object instances

4.     Visual Vocabularies and Bag of Words

 

          Region-based methods:

5.     Mid-Level Representations

B.    Beyond single objects (using additional information)

1.     Saliency

2.     Attributes

3.     Context

 

C.    Scalability problems

1.     Scaling with a large number of categories

2.     Large-scale search

 

D.    Action recognition in video and images



Schedule and papers:

Note:  * = required reading. 
Additional papers are provided for reference, and as a starting point for background reading for projects.
Paper presentations: Cover the starred papers.

Date

Topics

Papers and links

Presenters

3.3

Course intro 

 [slides]

Instructor

10.3

Introduction to Object and Event Recognition

 [slides]

Instructor

17.3

Introduction to Object and Event Recognition

 

Instructor

24.3

No class

 

 

31.3

Linear Subspaces

Global appearance models for object recognition, dimensionality reduction.

o    *Eigenfaces for Recognition, Turk and Pentland, 1991.  [pdf]

o    *P.N. Belhumeur, J.P. Hespanha, D.J. Kriegman, Eigenfaces vs. Fisherfaces: Recognition using Class Specific Linear Projection, 1996 [pdf]

o    Face Database [here]

Additional Material

Shimon Ullman and Ronen Basri, Recognition by Linear Combinations of Models, IEEE Transactions on Pattern Analysis and Machine Intelligence, 1991 [pdf]

T.F. Cootes and C.J. Taylor, "Statistical models of appearance for medical image analysis and computer vision", Proc. SPIE Medical Imaging 2001. [pdf]

 

 

Mor [pdf]

7.4

Cyber Day


 

28.4

Local features and matching for object instances:

Invariant local features, instance recognition


o    *Object Recognition from Local Scale-Invariant Features, Lowe, ICCV 1999.  [pdf]  [code] [other implementations of SIFT] [IJCV]

o    *Selected pages from: Local Invariant Feature Detectors: A Survey, Tuytelaars and Mikolajczyk.  Foundations and Trends in Computer Graphics and Vision, 2008. [pdf]  [Oxford code] [Read pp. 178-188, 216-220, 254-255]


o    Oxford group interest point software

o    Andrea Vedaldi's VLFeat code, including SIFT, MSER, hierarchical k-means.

o    INRIA LEAR team's software, including interest points, shape features

o    FLANN - Fast Library for Approximate Nearest Neighbors.  Marius Muja et al. 

o    Google Goggles

o    Kooaba

Additional Material

For more background on feature extraction: Szeliski book: Sec 3.2 Linear filtering, 4.1 Points and patches, 4.2 Edges

Scalable Recognition with a Vocabulary Tree, D. Nister and H. Stewenius, CVPR 2006. [pdf]

SURF: Speeded Up Robust Features, Bay, Ess, Tuytelaars, and Van Gool, CVIU 2008.  [pdf] [code]

Robust Wide Baseline Stereo from Maximally Stable Extremal Regions, J. Matas, O. Chum, M. Urban, and T. Pajdla, BMVC 2002.  [pdf]

A Performance Evaluation of Local Descriptors. K. Mikolajczyk and C. Schmid.  CVPR 2003 [pdf]

Guy [pdf]

12.5

Patch-based Representations

Visual vocabularies, bag-of-words, and spatial pyramid kernels (SPK) for scene classification



o    *Visual Categorization with Bags of Keypoints, C. Dance, J. Willamowski, L. Fan, C. Bray, and G. Csurka, ECCV International Workshop on Statistical Learning in Computer Vision, 2004.  [pdf]

o    *Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories, Lazebnik, Schmid, and Ponce, CVPR 2006.  [pdf] [code] [data]

Additional Material

Video Google: A Text Retrieval Approach to Object Matching in Videos, Sivic and Zisserman, ICCV 2003.  [pdf]  [demo]

Pedestrian Detection in Crowded Scenes, Leibe, Seemann, and Schiele, CVPR 2005.  [pdf]

Sampling Strategies for Bag-of-Features Image Classification.  E. Nowak, F. Jurie, and B. Triggs.  ECCV 2006. [pdf]

Scalable Recognition with a Vocabulary Tree, D. Nister and H. Stewenius, CVPR 2006. [pdf]

 

 

Assaf [pdf]

12.5

Detection as a binary decision

Sliding window detection, detection as a binary decision problem.

 

 

o    *Histograms of Oriented Gradients for Human Detection, Dalal and Triggs, CVPR 2005.  [pdf]  [code] [PASCAL datasets]

o    *Rapid Object Detection Using a Boosted Cascade of Simple Features, Viola and Jones, CVPR 2001.  [pdf]  [code]

 

o    LIBSVM library for support vector machines

o    PASCAL VOC Visual Object Classes Challenge

o    Face data

Additional Material

Beyond Sliding Windows: Object Localization by Efficient Subwindow Search.  C. Lampert, M. Blaschko, and T. Hofmann.  CVPR 2008.

A Discriminatively Trained, Multiscale, Deformable Part Model, by P. Felzenszwalb, D. McAllester and D. Ramanan.  CVPR 2008.  [pdf]  [code]

A Trainable System for Object Detection, C. Papageorgiou and T. Poggio, IJCV 2000.  [pdf]

Class-specific Hough Forests for Object Detection.  J. Gall and V. Lempitsky.  CVPR 2009.  [pdf] [slides] [code]

Majd [pdf]

19.5

Context and scenes

Multi-object scenes, inter-object relationships, understanding scenes' spatial layout, 3d context

o    *Using the Forest to See the Trees: Exploiting Context for Visual Object Detection and Localization.  Torralba, Murphy, and Freeman.  CACM 2009.  [pdf] [related code]

o    *Object Bank: A High-Level Image Representation for Scene Classification & Semantic Feature Sparsification.  L-J. Li, H. Su, E. Xing, L. Fei-Fei.  NIPS 2010.  [pdf]  [code]

 

Multi-Class Segmentation with Relative Location Prior.  S. Gould, J. Rodgers, D. Cohen, G. Elidan and D.  Koller.  IJCV 2008. [pdf] [code]

Estimating Spatial Layout of Rooms using Volumetric Reasoning about Objects and Surfaces.  D. Lee, A. Gupta, M. Hebert, and T. Kanade.  NIPS 2010.  [pdf] [code]

Contextual Priming for Object Detection, A. Torralba.  IJCV 2003.  [pdf] [web] [code]

Recognition Using Visual Phrases.  M. Sadeghi and A. Farhadi.  CVPR 2011.  [pdf]

Thinking Inside the Box: Using Appearance Models and Context Based on Room Geometry.  V. Hedau, D. Hoiem, and D. Forsyth.  ECCV 2010 [pdf] [code and data]

Putting Objects in Perspective, by D. Hoiem, A. Efros, and M. Hebert, CVPR 2006.  [pdf] [web]

Learning Spatial Context: Using Stuff to Find Things, by G. Heitz and D. Koller, ECCV 2008.  [pdf] [code]

Context Based Object Categorization: A Critical Survey.  C. Galleguillos and S. Belongie.  [pdf]

 

 

[pdf]

26.5

Describing objects with attributes

Visual properties, learning from natural language descriptions, intermediate representations

o    *Describing Objects by Their Attributes, A. Farhadi, I. Endres, D. Hoiem, and D. Forsyth, CVPR 2009.  [pdf]  [web] [data]

o    *Attribute and Simile Classifiers for Face Verification, N. Kumar, A. Berg, P. Belhumeur, S. Nayar.  ICCV 2009.  [pdf] [web] [lfw data] [pubfig data]

Relative Attributes.  D. Parikh and K. Grauman.  ICCV 2011.  [pdf]  [code/data]

FaceTracer: A Search Engine for Large Collections of Images with Faces.  N. Kumar, P. Belhumeur, and S. Nayar.  ECCV 2008.  [pdf] [code, data, demo]

Learning To Detect Unseen Object Classes by Between-Class Attribute Transfer, C. Lampert, H. Nickisch, and S. Harmeling, CVPR 2009  [pdf] [web] [data]

A Joint Learning Framework for Attribute Models and Object Descriptions.  D. Mahajan, S. Sellamanickam, V. Nair.  ICCV 2011.  [pdf]

SUN Attribute Database: Discovering, Annotating, and Recognizing Scene Attributes.  G. Patterson and J. Hays.  CVPR 2012.  [pdf] [data]

Multi-Attribute Spaces: Calibration for Attribute Fusion and Similarity Search.  W. Scheirer, N. Kumar, P. Belhumeur, T. Boult.  CVPR 2012  [pdf]

Attribute-Centric Recognition for Cross-Category Generalization.  A. Farhadi, I. Endres, D. Hoiem.  CVPR 2010.  [pdf]

Bahjat [pdf]

2.6

Dealing with many categories

Sharing features between classes, transfer, taxonomy, learning from few examples, exploiting class relationships

o    *Sharing Visual Features for Multiclass and Multiview Object Detection, A. Torralba, K. Murphy, W. Freeman, PAMI 2007.  [pdf]  [code]

o    *Tabula Rasa: Model Transfer for Object Category Detection. Y. Aytar and A. Zisserman.  ICCV 2011. [pdf] [HoG code]

Discriminative Learning of Relaxed Hierarchy for Large-scale Visual Recognition.  T. Gao and Daphne Koller ICCV 2011.  [pdf] [code]

Comparative Object Similarity for Improved Recognition with Few or Zero Examples. G. Wang, D. Forsyth, and D. Hoiem. CVPR 2010. [pdf]

Constructing Category Hierarchies for Visual Recognition, M. Marszalek and C. Schmid.  ECCV 2008.  [pdf]  [web] [Caltech256]

Incremental Learning of Object Detectors Using a Visual Shape Alphabet.  Opelt, Pinz, and Zisserman, CVPR 2006.  [pdf]

Michael

9.6

Activity recognition

Recognizing and localizing human actions in video or static images

o    *Learning Realistic Human Actions from Movies.  I. Laptev, M. Marszałek, C. Schmid and B. Rozenfeld.  CVPR 2008.  [pdf]  [data] [code]

o    *Action Recognition from a Distributed Representation of Pose and Appearance, S. Maji, L. Bourdev, J.  Malik, CVPR 2011.  [pdf]  [code]

Detecting Actions, Poses, and Objects with Relational Phraselets.  C. Desai and D. Ramanan.  ECCV 2012.  [pdf] [data] [code]

Beyond Actions: Discriminative Models for Contextual Group Activities.  T. Lan, Y. Wang, W. Yang, and G. Mori.  NIPS 2010.  [pdf] [data]

Efficient Activity Detection with Max-Subgraph Search.  C.-Y. Chen and K. Grauman. CVPR 2012.  [pdf] [project page]  [code]

Action Bank: a High-Level Representation of Activity in Video.  S. Sadanand and J. Corso.  CVPR 2012 [pdf]  [code/data]

A Hough Transform-Based Voting Framework for Action Recognition.  A. Yao, J. Gall, L. Van Gool.  CVPR 2010.  [pdf] [code/data]

Actions in Context, M. Marszalek, I. Laptev, C. Schmid.  CVPR 2009.  [pdf] [web] [data]

Objects in Action: An Approach for Combining Action Understanding and Object Perception.   A. Gupta and L. Davis.  CVPR, 2007.  [pdf]  [data]

A Scalable Approach to Activity Recognition Based on Object Use. J. Wu, A. Osuntogun, T. Choudhury, M. Philipose, and J. Rehg.  ICCV 2007.  [pdf]

Recognizing Actions at a Distance.  A. Efros, G. Mori, J. Malik.  ICCV 2003.  [pdf] [web]

Learning a Hierarchy of Discriminative Space-Time Neighborhood Features for Human Action Recognition.  A. Kovashka and K. Grauman.  CVPR 2010.  [pdf]

Temporal Causality for the Analysis of Visual Events.  K. Prabhakar, S. Oh, P. Wang, G. Abowd, and J. Rehg.  CVPR 2010.  [pdf] [Georgia Tech Computational Behavior Science project]

What's Going on?: Discovering Spatio-Temporal Dependencies in Dynamic Scenes.  D. Kuettel et al.  CVPR 2010.  [pdf]

Learning Actions From the Web.  N. Ikizler-Cinbis, R. Gokberk Cinbis, S. Sclaroff.  ICCV 2009.  [pdf]

Maor

???

Importance and saliency

Among all items in the scene, which deserve attention (first)?  What makes images interesting or memorable?

 


Learning to Detect a Salient Object.  T. Liu et al. CVPR 2007.  [pdf]  [results]  [data]  [code]

What Makes an Image Memorable?  P. Isola, J. Xiao, A. Torralba, A. Oliva. CVPR 2011. [pdf] [web] [code/data]

What Do We Perceive in a Glance of a Real-World Scene?  L. Fei-Fei, A. Iyer, C. Koch, and P. Perona.  Journal of Vision, 2007.  [pdf]

What Makes a Patch Distinct?  Ran Margolin, Ayellet Tal, Lihi Zelnik-Manor. [pdf], [code].

Surface Regions of Interest for Viewpoint Selection, George Leifman, Elizabeth Shtrom and Ayellet Tal. [pdf]

A Model of Saliency-based Visual Attention for Rapid Scene Analysis.  L. Itti, C. Koch, and E. Niebur.  PAMI 1998  [pdf]

Interesting Objects are Visually Salient.  L. Elazary and L. Itti.  Journal of Vision, 8(3):1–15, 2008.  [pdf]

What is an Object?  B. Alexe, T. Deselaers, and V. Ferrari.  CVPR 2010.  [pdf] [code]

A Principled Approach to Detecting Surprising Events in Video.  L. Itti and P. Baldi.  CVPR 2005  [pdf]

Key-Segments for Video Object Segmentation.  Y. J. Lee, J. Kim, and K. Grauman.  ICCV 2011  [pdf]

 

???

Large-scale image/object search and mining:

Scalable retrieval algorithms, mining for visual themes, particularly for object instances

o    *Semi-Supervised Hashing for Large-Scale Search.  J. Wang, S. Kumar, and S.-F. Chang.  [pdf]

o    *Supervised Hashing with Kernels.  W. Liu, J. Wang, R. Ji, Y. Jiang, S.-F. Chang.  CVPR 2012 [pdf]

Kernelized Locality Sensitive Hashing for Scalable Image Search, by B. Kulis and K. Grauman, ICCV 2009 [pdf]  [code] [80M Tiny Images data]

Image Webs: Computing and Exploiting Connectivity in Image Collections.  K. Heath, N. Gelfand, M. Ovsjanikov, M. Aanjaneya, and L. Guibas.  CVPR 2010.  [pdf]

Discovering Favorite Views of Popular Places with Iconoid Shift.  T. Weyand and B. Leibe.  ICCV 2011.  [pdf] [Paris 500K dataset]

Total Recall II: Query Expansion Revisited.  O. Chum, A. Mikulik, M. Perdoch, and J. Matas.  CVPR 2011.  [pdf]

Geometric Min-Hashing: Finding a (Thick) Needle in a Haystack, O. Chum, M. Perdoch, and J. Matas.  CVPR 2009.  [pdf]

Three Things Everyone Should Know to Improve Object Retrieval.  R. Arandjelovic and A. Zisserman.  CVPR 2012.  [pdf]

Learning Query-dependent Prefilters for Scalable Image Retrieval.  L. Torresani, M. Szummer, and A. Fitzgibbon.  CVPR 2009.  [pdf]

Detecting Objects in Large Image Collections and Videos by Efficient Subimage Retrieval, C. Lampert, ICCV 2009.  [pdf]  

Object Retrieval with Large Vocabularies and Fast Spatial Matching.  J. Philbin, O. Chum, M. Isard, J. Sivic, and A. Zisserman, CVPR 2007.  [pdf] [approx k-means code]

 

Other useful links:

 


*    This course is based on the UT-Austin course Special Topics in Computer Vision, by Kristen Grauman.