
Carnegie Mellon University 16-824: Visual Learning and Recognition
VIS LRN & RECOG
Spring 2024

Course Overview

Key Topics: Visual Recognition, Deep Learning, Image Classification, Object Detection, Video Understanding, 3D Scene Understanding, and Generative Models for Images and Videos.

Description: This graduate-level computer vision course focuses on representation and reasoning for large amounts of data (images, videos, and associated tags, text, GPS locations, etc.) toward the ultimate goal of understanding the visual world surrounding us. We will be reading an eclectic mix of classic and recent papers on topics including Theories of Perception, Mid-level Vision (Grouping, Segmentation, Poses), Object and Scene Recognition, 3D Scene Understanding, Action Recognition, Contextual Reasoning, Joint Language and Vision Models, Deep Generative Models, etc. We will be covering a wide range of supervised, semi-supervised and unsupervised approaches for each of the topics above.

Course Relevance: The course is relevant to students who want to understand and implement state-of-the-art deep learning and computer vision algorithms.

Course Goals: There are three primary course goals. First, the course aims to familiarize students with the fundamental concepts of deep learning models. Second, the course helps students understand state-of-the-art methods in visual recognition and understanding. Third, through the programming assignments and final project, students have an opportunity to learn how to build practical computer vision systems.

Course Information

Instructor

Deepak Pathak

TAs

Logistics

Class meetings: Mondays and Wednesdays, 7:00-8:20 pm (EST). Location: GHC 4401 (Gates Hillman Center)
Deepak's office hours: after class; email for a separate appointment.
TA office hours:
- Kenny: Mon 4:15-5:00 pm (EST), TCS 458
- Sayan: Tue 4:15-5:00 pm (EST), Smith 200
- Mihir: Wed 4:30-5:15 pm (EST), Smith 220
- Shagun: Thu 6:30-7:30 pm (EST), NSH 3002
- Himangi: Fri 6:00-7:00 pm (EST), NSH 1109
If you have specific questions, contact the instructors via a private post on Piazza.

Assignment Structure

Class Participation (10%): Participate in class and in the online discussion on Piazza. Students must read one paper per class and post a few lines (a question, thoughts, a summary, an insight, etc.) by the end of the corresponding class day.
Homework Assignments (45%): Submit all homework assignments on time. There are three assignments, each worth 15% of the overall grade.
Final (group) Project (45%):

Students will complete an independent research project in groups of 3-4.
Grade breakdown:
- 0% for the project proposal (mandatory, but ungraded)
- 5% for the midterm report
- 15% for the presentation
- 25% for the final report
Each group presents its project in class and submits a final report as a PDF.
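As a quick arithmetic check of the weights listed above, the following hypothetical Python sketch (not an official course tool) combines component scores into an overall percentage. It assumes each score is a fraction in [0, 1]; the component names, including `hw1` to `hw3` for the three 15% homework assignments, are illustrative labels only.

```python
# Hypothetical grade calculator: weights (in percent) mirror the grading
# scheme on this page; component names are illustrative labels only.
WEIGHTS = {
    "participation": 10,   # class participation
    "hw1": 15,             # three homework assignments, 15% each
    "hw2": 15,
    "hw3": 15,
    "proposal": 0,         # mandatory but ungraded
    "midterm_report": 5,
    "presentation": 15,
    "final_report": 25,
}

# The weights should add up to the full 100%.
assert sum(WEIGHTS.values()) == 100


def overall_grade(scores: dict) -> float:
    """Return the weighted overall grade in percent, given scores in [0, 1]."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores: {sorted(missing)}")
    return sum(weight * scores[name] for name, weight in WEIGHTS.items())
```

For example, full marks on every component except a zero on the final report gives `overall_grade(...) == 75.0`, which shows how heavily the 25% final report weighs within the 45% project.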

Learning Resources

Resources will include lecture slides, textbooks, webpages, Colab notebooks, videos, and a paper reading list. Additional coding tutorials on PyTorch will be provided.

Extra Time Commitments

This course requires roughly 9 hours per week outside of class for assignments, projects, and study.

Collaboration Policy

Collaboration is encouraged, but the work you submit for assignments is expected to be entirely your own. That is, the writing and code must be yours, and you must fully understand everything that you submit. Discussing a paper or the details of how to solve a problem is fine, but you must write your submission yourself. Please list the collaborators you discussed with in the assignment write-up. If we find substantially identical work without proper acknowledgment of collaborators, we will take action according to university policies. For more, see the CMU academic integrity guidelines.

Late Policy

For the homework assignments only, students are allowed a total of seven late days per semester. Any work submitted late after all seven late days have been used will receive an automatic zero. Make sure to start early and complete your assignments on time! Please note that late days do not apply to any part of the final project, including the project proposal, the midterm report, and the final project report. This policy will be enforced strictly.

Prerequisites

While there are no formal prerequisites, this course assumes familiarity with computer vision (16-720 or similar) and machine learning (10-601 or similar). If you have not taken courses covering this material, consult the instructor. Additionally, you must be familiar with PyTorch and have some prior experience using it.
