16824 Fall 2025

Carnegie Mellon University 16-824: Visual Learning and Recognition
VIS LRN & RECOG
Fall 2025
Mondays and Wednesdays, 2:00-3:20pm, TEP 1403

Course Overview

Key Topics: Visual Recognition, Deep Learning, Image Classification, Object Detection, Video Understanding, 3D Scene Understanding, and Generative Models for Images and Videos.

Description: This graduate-level computer vision course focuses on representation and reasoning for large amounts of data (images, videos, and associated tags, text, GPS locations, etc.) toward the ultimate goal of understanding the visual world around us. We will read an eclectic mix of classic and recent papers on topics including Theories of Perception, Mid-level Vision (Grouping, Segmentation, Poses), Object and Scene Recognition, 3D Scene Understanding, Action Recognition, Contextual Reasoning, Joint Language and Vision Models, and Deep Generative Models. For each of these topics, we will cover a wide range of supervised, semi-supervised, and unsupervised approaches.

Course Relevance: The course is relevant to students who want to understand and implement state-of-the-art deep learning and computer vision algorithms.

Course Goals: There are three primary course goals. First, the course aims to familiarize students with the fundamental concepts of deep learning models. Second, the course helps students understand state-of-the-art methods in visual recognition and understanding. Third, through the programming assignments and final project, students have an opportunity to learn how to build practical computer vision systems.

Course Staff

Please use the course Piazza page for all communication with course staff.

Instructor

Jun-Yan Zhu

TAs

Assignment Structure

Class Participation (10%): Participate in class and in online Piazza discussions. Students must read one paper per class and post a few lines (a question, answer, thought, insight, etc.) within one week of the corresponding class day.
Homework Assignments (45%): Submit all homework assignments on time. Each assignment is worth 15% of the overall grade.
Final (group) Project (45%):

Students will complete an independent research project in groups of 3-4.
Submit a project proposal.
Present project in class.
Write up your findings in a short paper (4-8 pages, using the standard CVPR template).

Learning Resources

Resources will include lecture slides, textbooks, webpages, Colab notebooks, videos, and a paper reading list. Additional coding tutorials on PyTorch will be provided.

There is no official textbook for this course, but you will find the following textbooks useful:

“Computer Vision: Algorithms and Applications”, Richard Szeliski, 2010
“Deep Learning”, Ian Goodfellow, Yoshua Bengio, and Aaron Courville, 2016
“Foundations of Computer Vision”, Antonio Torralba, Phillip Isola and William T. Freeman, 2024

Extra Time Commitments

This course requires approximately 9 hours per week for assignments, projects, and study.

Collaboration Policy

Collaboration is encouraged, but the work you submit for assignments is expected to be entirely your own. That is, the writing and code must be yours, and you must fully understand everything that you submit. Discussing a paper or the details of how to solve a problem is fine, but you must write your submission yourself. Please list the collaborators you discussed with in your assignment write-up. If we find nearly identical work without proper acknowledgment of collaborators, we will take action according to university policies. For more, see the CMU academic integrity guidelines.

Use of Large Language Models

Using a large language model (e.g., ChatGPT, Copilot, Cursor, etc.) to generate any part of your programming assignments or Piazza posts is strictly prohibited and is a violation of academic integrity. For the final project, you are permitted to use LLMs. If you do, you must acknowledge this in your final report and submit a log (or chat history) of all prompts used to generate project content. Failure to properly document your use of AI on the project is also considered an academic integrity violation.

Late Policy

For the programming assignments, students are allowed a total of five late days per semester; each additional late day incurs a 10% continuously prorated penalty. Your code should be easy for the TAs to run. Start early and aim to complete your assignments on time! Note that late days do not apply to any part of the final project, including the project proposal and the final project report. This policy will be enforced strictly.
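The phrase "continuously prorated" can be read several ways. The sketch below shows one possible reading of the arithmetic, under assumptions that the policy does not state: the five free late days form a pool measured in hours, the 10% per-day penalty is prorated by the hour once that pool is used up, and the penalty is capped at 100%. The function names are hypothetical and are for illustration only; this is not how the course staff actually compute grades.

```python
# Hypothetical illustration of the late-day arithmetic in the Late Policy.
# Assumptions (NOT stated in the policy): free late time is pooled and measured
# in hours, the 10%-per-day penalty is prorated by the hour after the free time
# runs out, and the total penalty is capped at 100%.

HOURS_PER_DAY = 24
FREE_LATE_DAYS = 5        # from the policy: five late days per semester
PENALTY_PER_DAY = 0.10    # from the policy: 10% per additional late day


def late_penalty_fraction(hours_late: float, free_hours_remaining: float) -> float:
    """Fraction of the grade deducted for one assignment (assumed model).

    hours_late: hours between the deadline and the submission.
    free_hours_remaining: unused free late time (in hours) left this semester.
    """
    # Only time beyond the remaining free late time is penalized.
    penalized_hours = max(0.0, hours_late - free_hours_remaining)
    # Prorate the per-day penalty by the fraction of a day that is penalized.
    penalty = PENALTY_PER_DAY * penalized_hours / HOURS_PER_DAY
    # Assumed cap: a submission can never lose more than its full score.
    return min(penalty, 1.0)


def adjusted_score(raw_score: float, hours_late: float, free_hours_remaining: float) -> float:
    """Raw score scaled down by the prorated late penalty (assumed model)."""
    return raw_score * (1.0 - late_penalty_fraction(hours_late, free_hours_remaining))


if __name__ == "__main__":
    # All free late days already used, submitted 12 hours (half a day) late:
    # under this model the deduction is 5% of the grade.
    print(adjusted_score(90.0, hours_late=12, free_hours_remaining=0))
    # Two days late with all five free late days still available: no deduction.
    print(adjusted_score(90.0, hours_late=48, free_hours_remaining=FREE_LATE_DAYS * HOURS_PER_DAY))
```

Under this reading, half a day past the free allowance costs 5% of the assignment grade, and a full extra day costs 10%. If late days were instead consumed in whole-day units, the first step would round up to a multiple of 24 hours before comparing against the remaining free time.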

Prerequisites

While there are no formal prerequisites, this course assumes familiarity with computer vision (16-720 or similar) and machine learning (10-601 or similar). If you have not taken courses covering this material, consult the instructor. Additionally, you should be familiar with PyTorch and have some prior experience using it.
