CS331B: Representation Learning in Computer Vision

Autumn Quarter 2016, Stanford University
Mon, Wed 3:00 PM - 4:20 PM at Thornton 110

Course Description

A representation performs the task of converting an observation in the real world (e.g. an image, a recorded speech signal, a word in a sentence) into a mathematical form (e.g. a vector). This mathematical form is then used by subsequent steps (e.g. a classifier) to produce the outcome, such as classifying an image or recognizing a spoken word. Forming the proper representation for a task is an essential problem in modern AI. In this course, we focus on 1) establishing why representations matter, 2) classical and modern methods of forming representations in Computer Vision, 3) methods of analyzing and probing representations, 4) portraying the future landscape of representations, with generic and comprehensive AI/vision systems on the horizon, and finally 5) going beyond computer vision by discussing non-visual representations, such as the ones used in NLP or neuroscience. The course will heavily feature systems based on deep learning and convolutional neural networks. We will have several teaching lectures, a number of prominent external guest speakers, as well as presentations by the students on recent papers and their projects.
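As a rough illustration of the two stages described above (observation to vector, then vector to outcome), here is a minimal Python sketch. It is not course material: the intensity-histogram representation, the nearest-prototype classifier, and the names `represent`, `classify`, and `prototypes` are all assumptions chosen to keep the example tiny.

```python
from typing import Dict, List, Sequence


def represent(image: Sequence[Sequence[int]], bins: int = 4) -> List[float]:
    """Convert a grayscale image (rows of 0-255 pixels) into a vector.

    The vector is a normalized intensity histogram, a toy handcrafted
    representation assumed here purely for illustration.
    """
    counts = [0] * bins
    total = 0
    for row in image:
        for pixel in row:
            counts[min(pixel * bins // 256, bins - 1)] += 1
            total += 1
    return [c / total for c in counts]


def classify(vector: List[float], prototypes: Dict[str, List[float]]) -> str:
    """Subsequent step: pick the label whose prototype vector is closest."""
    def sq_dist(a: List[float], b: List[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(prototypes, key=lambda label: sq_dist(vector, prototypes[label]))


# Hypothetical 2x2 "images" standing in for real observations.
dark_image = [[10, 20], [30, 40]]
bright_image = [[220, 230], [240, 250]]
prototypes = {"dark": represent(dark_image), "bright": represent(bright_image)}

query = [[5, 15], [60, 200]]
label = classify(represent(query), prototypes)
print(label)
```

The classifier never sees pixels, only the vector, so swapping `represent` for a learned feature extractor would leave `classify` unchanged.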

“Solving a problem simply means representing it so as to make the solution transparent.”
- Herbert Simon, The Sciences of the Artificial

Required Prerequisites: CS131A, CS231A, CS231B, or CS231N. If you do not have the required prerequisites, please contact a member of the course staff before enrolling in this course.

Course Staff

Silvio Savarese, Instructor
Office Hours: Monday 1:30-2:30 PM, Gates 154

Amir Zamir, Instructor
Office Hours: Thursday 2:30-3:30 PM, Gates 133
Please email Amir before attending his office hours.

Kenji Hata, Course Assistant
Office Hours: Wednesday 2-3 PM, Gates 247

Grading

See the Grading page for more detail.

Class participation: 20%
Paper presentation (quality, clarity, depth, etc.): 30%
Course project: 50%

Progress Report: 10%
Final Report: 30%
Presentation: 10%

We will use Gradescope. Please use the access code “MG8D89”.

For questions related to the course, please use Piazza.

To see the list of projects completed by our students this year, please see this page.

Schedule

Lecture	Date	Title	Details	Presenter
	09/26/2016	No class
1	09/28/2016	Introduction [Silvio's slides] [Amir's slides]	Motivation; Overview of the topics and lectures; What is a representation (X2VEC)? Why does it matter?; Logistics and policies	Silvio Savarese and Amir Zamir
2	10/03/2016	Basics of Representations and Traditional (handcrafted) 2D Representations [Google slides] [pdf]	Basic properties of representations: ill-posedness, nonlinearity, complexity, dimensionality, vertical vs horizontal domain, etc.; Classical visual features I: 2D matching features (e.g. SIFT, HOG, DAISY, Kernels), video understanding features (e.g., 3DSIFT, Dense Trajectory Features - DTF, STIP, ICCV15 Storyline)	Amir Zamir
3	10/05/2016	Learning Representations I (2D) [Google slides] [pdf]	What is feature learning?; Deep Neural Networks I; Unsupervised feature learning: sparse coding, autoencoders, etc.; Supervised (2D) feature learning: 2D matching features (e.g. MatchNet), 2D object detection features (e.g. ImageNet)	Amir Zamir
4	10/10/2016	2D & 3D Object Representations [slides]	Basic principles for designing a good representation for object recognition; Why are 3D representations useful for object understanding?; Overview of classic and more recent methods for 2D and 3D object detection and classification	Silvio Savarese
5	10/12/2016	Student Paper Presentation [Presentation 1] [Presentation 2] [Presentation 3]	E Simo-Serra, E Trulls, L Ferraz, I Kokkinos, P Fua, F Moreno-Noguer, Discriminative learning of deep convolutional feature point descriptors, ICCV 2015; Y Xiang, W Choi, Y Lin, S Savarese, Data-Driven 3D Voxel Patterns for Object Category Recognition, CVPR 2015; R Socher, B Huval, B Bhat, CD Manning, AY Ng, Convolutional-recursive deep learning for 3D object classification, NIPS 2012	Shikhar Shrestha and Vishakh Hegde; Tanya Glozman and Orly Liba; Iro Armeni and Manik Dhar
	10/14/2016	Project Proposal Due (11:59 PM)
6	10/17/2016	2D & 3D Scene Representations [slides]	2D & 3D scene understanding; Why is a 3D representation useful for scene understanding?; Relating objects and space in the 3D physical world; From objects to activity understanding in the 3D physical world; Datasets	Silvio Savarese
7	10/19/2016	Student Paper Presentation [Presentation 1] [Presentation 2] [Presentation 3]	D Eigen, R Fergus, Predicting Depth, Surface Normals and Semantic Labels with a Common Multi-Scale Convolutional Architecture, ICCV 2015; Z Ren, E Sudderth, Three-Dimensional Object Detection and Layout Prediction using Clouds of Oriented Gradients, CVPR 2016; A Dosovitskiy, J Springenberg, M Tatarchenko, T Brox, Learning to Generate Chairs, Tables and Cars with Convolutional Networks, TPAMI 2016	Rex Ying and Charles Qi; Shannon Kao and Max Wang; Jee Ian Tam and Liu Jiang
	10/24/2016	No class
8	10/26/2016	Learning Representations II (Objects, scenes, videos, recurrent models) [Google slides] [pdf]	Deep Neural Networks II; Object representations (e.g., ImageNet) and what is under the hood (parts, attributes, etc.); Scene representations (e.g. MIT Places) and what is under the hood (RFs, minimal image, objects, etc.); Static video representation (two-stream networks, fusion networks, etc.); Recurrent modeling (Structural-RNN, action recognition, SocialLSTM)	Amir Zamir
9	10/31/2016	Learning Representations III; Understanding and Probing Representations [Google slides] [pdf]	Learning Representations III: methods of mixing representations (multitask learning, fine tuning, locked-feature extraction, LwF); curriculum learning; Generative Adversarial Networks (GANs): basic GANs and issues, energy-based GANs, use case: realistic image edit manifold. Understanding Representations: nearest neighbors and embeddings (PCA, t-SNE); read-out functions; representation inverters: Simonyan 2014; supervised (e.g. Dosovitskiy 2016)	Amir Zamir
10	11/02/2016	Student Paper Presentation [Presentation 1] [Presentation 2] [Presentation 3]	M Huh, P Agrawal, AA Efros, What makes ImageNet good for transfer learning?; P Agrawal, R Girshick, J Malik, Analyzing the performance of multilayer neural networks for object recognition, ECCV 2014; C Vondrick, H Pirsiavash, A Torralba, Generating Videos with Scene Dynamics, NIPS 2016	Trevor Standley; Varun Kumar Vijay and Shayne Longpre; Bryan Anenberg and Aojia Zhao
11	11/03/2016 Hewlett 201 5:30-7:00 (Makeup)	From Representation to Actuation [video 1] [video 2]	How to go from features to performing actions, motor control, and planning.	Guest Lecturer: Animesh Garg
12	11/07/2016	Generic Representations (representations that generalize beyond what they’re trained for) [Google slides]	Open-world systems; Self-supervision	Amir Zamir
13	11/09/2016	Student Paper Presentation [Presentation 1] [Presentation 2]	D Jayaraman, K Grauman, Slow and steady feature analysis: higher order temporal coherence in video, CVPR 2016; I Misra, CL Zitnick, M Hebert, Shuffle and Learn: Unsupervised Learning Using Temporal Order Verification, ECCV 2016	Ajay Sohmshetty and Lisa Wang; Donsuk Lee and Rui Shu
	11/14/2016	No Class - CVPR Deadline
14	11/16/2016	Student Paper Presentation [Presentation 1] [Presentation 2] [Presentation 3]	R Girdhar, DF Fouhey, M Rodriguez, A Gupta, Learning a Predictable and Generative Vector Representation for Objects, ECCV 2016; D Pathak, P Krahenbuhl, J Donahue, T Darrell, AA Efros, Context Encoders: Feature Learning by Inpainting, CVPR 2016; L Pinto, D Gandhi, Y Han, YL Park, A Gupta, The Curious Robot: Learning Visual Representations via Physical Interactions, ECCV 2016	Boris Ivanovic and Yolanda Wang; Joey Greer and Sasha Sax; Russell Kaplan and Raphael Palefsky-Smith
	11/18/2016	Project Progress Report Due (11:59 PM)
	11/21/2016 to 11/25/2016	Thanksgiving Break
15	11/28/2016	Generic Representations II [Google slides]	Biologically inspired generic representations; Zamir ECCV16 3D representation; The future of representation learning	Amir Zamir
16	11/29/2016 Building 260-113 5:30-7:00 (Makeup)	Generative Visualization of Representations [video] [Google slides]	Style Transfer; Deep Dream	Guest Lecturer: Justin Johnson
17	11/30/2016	Student Paper Presentation [Presentation 1] [Presentation 2] [Presentation 3]	CF Cadieu, H Hong, DLK Yamins, N Pinto, D Ardila, EA Solomon, NJ Majaj, JJ DiCarlo, Deep neural networks rival the representation of primate IT cortex for core visual object recognition, PLoS Computational Biology 2014; P Agrawal, D Stansbury, J Malik, JL Gallant, Pixels to voxels: modeling visual representation in the human brain; RA Brooks, Intelligence without representation, Elsevier 1991	William Shen and Te-lin Wu; JunYoung Gwak and Kuan Fang; Rachel Luo and Alex Kuefler
18	12/07/2016	Representation in the Brain	Representations in the brain; fMRI-based representation; Experimental protocols & findings	Guest Lecturer: Dan Yamins
19	12/08/2016 Building 420-040 5:00-6:30 (Makeup)	Natural Language Representation [video 1] [video 2] [slides]	word2vec; n-gram; Recent language representations	Guest Lecturer: William L. Hamilton
20	12/12/2016 Hewlett 200 3:30-6:30	Student Project Presentation
	12/16/2016	Final Project Report Due (11:59 PM)