Carnegie Mellon University 16-824: Visual Learning and Recognition |
VIS LRN & RECOG |
Spring 2022 |
Key Topics: Visual Recognition, Deep Learning, Image Classification, Object Detection, Video Understanding, 3D Scene Understanding, Generative Models for Images and Videos.
Description: This graduate-level computer vision course focuses on representation and reasoning for large amounts of data (images, videos, and associated tags, text, GPS locations, etc.) toward the ultimate goal of understanding the visual world surrounding us. We will be reading an eclectic mix of classic and recent papers on topics including Theories of Perception, Mid-level Vision (Grouping, Segmentation, Poses), Object and Scene Recognition, 3D Scene Understanding, Action Recognition, Contextual Reasoning, Joint Language and Vision Models, Deep Generative Models, etc. We will be covering a wide range of supervised, semi-supervised and unsupervised approaches for each of the topics above.
Course Relevance: The course is relevant to students who want to understand and implement state-of-the-art deep learning and computer vision algorithms.
Course Goals: There are three primary course goals. First, the course aims to familiarize students with the fundamental concepts of deep learning models. Second, the course helps students understand state-of-the-art methods in visual recognition and understanding. Third, through the programming assignments and final project, students have an opportunity to learn how to build practical computer vision systems.
Resources will include lecture slides, textbooks, webpages, Colab, videos, and a paper reading list. Additional coding tutorials on PyTorch will be provided.
This course requires about 9 hours per week for assignments, projects, and study.
For each assignment, TAs will not look over any of your code before the assignment deadline. You may discuss code with classmates. Please list the collaborators you discussed it with in the assignment write-up.
For the programming assignments, students will be allowed a total of five late days per semester; each additional late day will incur a 10% penalty. The code should be easy for the TAs to run.
While there are no formal prerequisites, this course assumes familiarity with computer vision (16-720 or similar) and machine learning (10-601 or similar). If you have not taken courses covering this material, consult with the instructor.