CS331B: Representation Learning in Computer Vision

Autumn Quarter 2016, Stanford University
Mon, Wed 3:00 PM - 4:20 PM at Thornton 110

Course Description

A representation converts an observation in the real world (e.g., an image, a recorded speech signal, a word in a sentence) into a mathematical form (e.g., a vector). This mathematical form is then used by subsequent steps (e.g., a classifier) to produce the outcome, such as classifying an image or recognizing a spoken word. Forming the proper representation for a task is an essential problem in modern AI. In this course, we focus on 1) establishing why representations matter, 2) classical and modern methods of forming representations in Computer Vision, 3) methods of analyzing and probing representations, 4) portraying the future landscape of representations, with generic and comprehensive AI/vision systems on the horizon, and finally 5) going beyond computer vision by talking about non-visual representations, such as the ones used in NLP or neuroscience. The course will heavily feature systems based on deep learning and convolutional neural networks. We will have several teaching lectures, a number of prominent external guest speakers, and presentations by the students on recent papers and their projects.

“Solving a problem simply means representing it so as to make the solution transparent.”
- Herbert Simon, The Sciences of the Artificial

Required Prerequisites: CS131A, CS231A, CS231B, or CS231N. If you do not have the required prerequisites, please contact a member of the course staff before enrolling in this course.

Course Staff

Silvio Savarese, Instructor
Office Hours: Monday 1:30-2:30 PM, Gates 154

Amir Zamir, Instructor
Office Hours: Thursday 2:30-3:30 PM, Gates 133
Please email Amir before attending his OH

Kenji Hata, Course Assistant
Office Hours: Wednesday 2-3 PM, Gates 247

Grading

See the Grading page for more detail.

Class participation: 20%
Paper presentation (quality, clarity, depth, etc.): 30%
Course project: 50%

We will use Gradescope. Please use the access code “MG8D89”.

For questions related to the course, please use Piazza.

To see the list of projects completed by our students this year, please see this page.

Schedule

Lecture Date Title Details Presenter
09/26/2016 No class
1 09/28/2016 Introduction
[Silvio's slides] [Amir's slides]
  • Motivation
  • Overview of the topics and lectures
  • What is a representation (X2VEC)? Why does it matter?
  • Logistics and policies
Silvio Savarese and Amir Zamir
2 10/03/2016 Basics of Representations and Traditional (handcrafted) 2D Representations
[Google slides] [pdf]
  • Basic properties of representations
    • Ill-posedness, nonlinearity, complexity, dimensionality, vertical vs horizontal domain, etc.
  • Classical visual features I
    • 2D Matching features (e.g. SIFT, HOG, DAISY, Kernels)
    • Video Understanding Features (e.g., 3DSIFT, Dense Trajectory Features - DTF, STIP, ICCV15 Storyline)
Amir Zamir
3 10/05/2016 Learning Representations I
(2D)
[Google slides] [pdf]
  • What is feature learning?
  • Deep Neural Networks I
  • Unsupervised feature learning
    • Sparse Coding, Autoencoders, etc.
  • Supervised (2D) feature learning
    • 2D matching features (e.g., MatchNet)
    • 2D object detection features (e.g., ImageNet)
Amir Zamir
4 10/10/2016 2D & 3D Object Representations
[slides]
  • Basic principles for designing a good representation for object recognition
  • Why are 3D representations useful for object understanding?
  • Overview of classic and more recent methods for 2D and 3D object detection and classification
Silvio Savarese
5 10/12/2016 Student Paper Presentation
[Presentation 1]
[Presentation 2]
[Presentation 3]
Shikhar Shrestha and Vishakh Hegde
Tanya Glozman and Orly Liba
Iro Armeni and Manik Dhar
10/14/2016 Project Proposal Due
(11:59 PM)
6 10/17/2016 2D & 3D Scene Representations
[slides]
  • 2D & 3D Scene Understanding
  • Why is a 3D representation useful for scene understanding?
  • Relating objects and space in the 3D physical world
  • From objects to activity understanding in the 3D physical world
  • Datasets
Silvio Savarese
7 10/19/2016 Student Paper Presentation
[Presentation 1]
[Presentation 2]
[Presentation 3]
Rex Ying and Charles Qi
Shannon Kao and Max Wang
Jee Ian Tam and Liu Jiang
10/24/2016 No class
8 10/26/2016 Learning Representations II
(Objects, scenes, videos, recurrent models)
[Google slides] [pdf]
  • Deep Neural Networks II
    • Object representations (e.g., ImageNet) and what is under the hood (parts, attributes, etc.)
    • Scene representations (e.g., MIT Places) and what is under the hood (RFs, minimal image, objects, etc.)
    • Static video representations (two-stream networks, fusion networks, etc.)
    • Recurrent modeling (Structural-RNN, action recognition, SocialLSTM)
Amir Zamir
9 10/31/2016 Learning Representations III

Understanding and Probing Representations
[Google slides] [pdf]
  • Learning Representations III
    • Methods of mixing representations (multitask learning, fine tuning, locked-feature extraction, LwF)
    • Curriculum learning
    • Generative Adversarial Networks (GANs)
      • Basic GANs and issues
      • Energy-based GANs
      • Use case: realistic image edit manifold
  • Understanding Representations
    • Nearest neighbors and embeddings: PCA, t-SNE
    • Read-out functions
    • Representation Inverters
      • Simonyan 2014
      • Supervised (e.g., Dosovitskiy 2016)
Amir Zamir
10 11/02/2016 Student Paper Presentation
[Presentation 1]
[Presentation 2]
[Presentation 3]
Trevor Standley
Varun Kumar Vijay and Shayne Longpre
Bryan Anenberg and Aojia Zhao
11 11/03/2016
Hewlett 201
5:30-7:00
(Makeup)
From Representation to Actuation
[video 1] [video 2]
  • How to go from features to performing actions, motor control, and planning.
Guest Lecturer:
Animesh Garg
12 11/07/2016 Generic Representations (representations that generalize beyond what they’re trained for)
[Google slides]
  • Open-World systems
  • Self-supervision
Amir Zamir
13 11/09/2016 Student Paper Presentation
[Presentation 1]
[Presentation 2]
Ajay Sohmshetty and Lisa Wang
Donsuk Lee and Rui Shu
11/14/2016 No Class - CVPR Deadline
14 11/16/2016 Student Paper Presentation
[Presentation 1]
[Presentation 2]
[Presentation 3]
Boris Ivanovic and Yolanda Wang
Joey Greer and Sasha Sax
Russell Kaplan and Raphael Palefsky-Smith
11/18/2016 Project Progress Report Due
(11:59 PM)
11/21/2016
to
11/25/2016
Thanksgiving Break
15 11/28/2016 Generic Representations II
[Google slides]
  • Biologically inspired generic representations
  • Zamir ECCV16 3D representation.
  • The future of representation learning
Amir Zamir
16 11/29/2016
Building 260-113
5:30-7:00
(Makeup)
Generative Visualization of Representations
[video] [Google slides]
  • Style Transfer
  • Deep Dream
Guest Lecturer:
Justin Johnson
17 11/30/2016 Student Paper Presentation
[Presentation 1]
[Presentation 2]
[Presentation 3]
William Shen and Te-lin Wu
JunYoung Gwak and Kuan Fang
Rachel Luo and Alex Kuefler
17 12/07/2016 Representation in the Brain
  • Representations in the brain
  • fMRI-based representations
  • Experimental protocols & findings
Guest Lecturer:
Dan Yamins
18 12/08/2016
Building 420-040
5:00-6:30
(Makeup)
Natural Language Representation
[video 1] [video 2] [slides]
  • word2vec
  • n-gram
  • Recent language representations
Guest Lecturer:
William L. Hamilton
19 12/12/2016
Hewlett 200
3:30-6:30
Student Project Presentation
12/16/2016 Final Project Report Due
(11:59 PM)