16824 Spring 2022

Carnegie Mellon University 16-824: Visual Learning and Recognition
VIS LRN & RECOG
Spring 2022

Course Overview

Key Topics: Visual Recognition, Deep Learning, Image Classification, Object Detection, Video Understanding, 3D Scene Understanding, Generative Models for Images and Videos.

Description: This graduate-level computer vision course focuses on representation and reasoning for large amounts of data (images, videos, and associated tags, text, GPS locations, etc.) toward the ultimate goal of understanding the visual world surrounding us. We will read an eclectic mix of classic and recent papers on topics such as Theories of Perception, Mid-level Vision (Grouping, Segmentation, Poses), Object and Scene Recognition, 3D Scene Understanding, Action Recognition, Contextual Reasoning, Joint Language and Vision Models, and Deep Generative Models. For each of these topics, we will cover a wide range of supervised, semi-supervised, and unsupervised approaches.

Course Relevance: The course is relevant to students who want to understand and implement state-of-the-art deep learning and computer vision algorithms.

Course Goals: There are three primary course goals. First, the course aims to familiarize students with the fundamental concepts of deep learning models. Second, the course helps students understand state-of-the-art methods in visual recognition and understanding. Third, through the programming assignments and final project, students have an opportunity to learn how to build practical computer vision systems.

Course Information

Instructor

Deepak Pathak

TAs

Logistics

Class meetings: Monday, Wednesday 11:50 am - 1:10 pm (EST). Location: POS A35 (Posner Hall)
Deepak's office hours: Wednesday 1:10 - 2:00 pm (EST), Smith 218 (zoom link on piazza)
TA office hours: 4-5 pm (EST), NSH 1505 (zoom link on piazza).
Mon - Murtaza, Tue - Wenxuan, Wed - Qichen, Thu - Mohit, Fri - Russell
Contact the instructors via a private post on Piazza.
All lectures and office hours will be fully remote for January.

Assignment Structure

Students will be expected to:

Class participation (10%): Participate in class and online discussion.
Homework assignments (45%): Submit all homework assignments on time. Collaboration is allowed as long as the final work is done independently and all collaborators are acknowledged.
Final (group) project (45%): Complete a final group project:

Students will complete an independent research project in groups of 2-4.
Write up the findings in a blog post.
Present the project in class, and publish a 5-minute presentation on YouTube.

Learning Resources

Resources will include lecture slides, textbooks, webpages, Colab notebooks, videos, and a paper reading list. Additional coding tutorials on PyTorch will be provided.

Extra Time Commitments

This course requires ~9 hours per week for assignments, projects, and study.

Collaboration Policy

For each assignment, TAs will not look over any of your code before the assignment deadline. You may discuss code with classmates. Please list the collaborators you discussed the assignment with in your write-up.

Late Policy

For the programming assignments, students will be allowed a total of five late days per semester; each additional late day will incur a 10% penalty. Your code should be easy for the TAs to run.

Prerequisites

While there are no formal prerequisites, this course assumes familiarity with computer vision (16-720 or similar) and machine learning (10-601 or similar). If you have not taken courses covering this material, consult with the instructor.

