Carnegie Mellon University
16-824: Visual Learning and Recognition (VIS LRN & RECOG)
Fall 2024
Key Topics: Visual Recognition, Deep Learning, Image Classification, Object Detection, Video Understanding, 3D Scene Understanding, Generative Models for Images and Videos.
Description: This graduate-level computer vision course focuses on representation and reasoning for large amounts of data (images, videos, and associated tags, text, GPS locations, etc.) toward the ultimate goal of understanding the visual world surrounding us. We will be reading an eclectic mix of classic and recent papers on topics including Theories of Perception, Mid-level Vision (Grouping, Segmentation, Poses), Object and Scene Recognition, 3D Scene Understanding, Action Recognition, Contextual Reasoning, Joint Language and Vision Models, Deep Generative Models, etc. We will be covering a wide range of supervised, semi-supervised and unsupervised approaches for each of the topics above.
Course Relevance: The course is relevant to students who want to understand and implement state-of-the-art deep learning and computer vision algorithms.
Course Goals: There are three primary course goals. First, the course aims to familiarize students with the fundamental concepts of deep learning models. Second, the course helps students understand state-of-the-art methods in visual recognition and understanding. Third, through the programming assignments and final project, students have an opportunity to learn how to build practical computer vision systems.
Resources will include lecture slides, textbooks, webpages, Colab notebooks, videos, and a paper reading list. Additional coding tutorials on PyTorch will be provided.
There is no official textbook for this course, but you will find the following textbooks useful. Both have free online versions:
This course requires about 9 hours per week for assignments, projects, and study.
Collaboration is encouraged, but the work you submit for assignments is expected to be entirely your own. That is, the writing and code must be yours, and you must fully understand everything that you submit. Discussing a paper or the details of how to solve a problem is fine, but you must write your submission yourself. Please list the collaborators you discussed the assignment with in your write-up. If we find nearly identical work without proper acknowledgment of collaborators, we will take action according to university policies. For more, see the CMU academic integrity guidelines.
For the programming assignments, students will be allowed a total of five late days per semester; each additional late day will incur a 10% penalty. Your code should be easy for the TAs to run. Make sure to start early and complete your assignments on time! Please note that late days do not apply to any part of the final project, including the project proposal, mid-term progress update, and final project report. This policy will be enforced strictly.
While there are no formal prerequisites, this course assumes familiarity with computer vision (16-720 or similar) and machine learning (10-601 or similar). If you have not taken courses covering this material, consult the instructor. Additionally, you should know how to use PyTorch and have some prior experience with it.