Instructors	Mengye Ren
Lecture	Tuesday 4:55pm-6:55pm (31 Washington Pl, Room 411)

About This Course

This course covers a wide variety of introductory topics in machine learning and statistical modeling, including statistical learning theory, convex optimization, generative and discriminative models, kernel methods, boosting, and latent variable models. The primary goal is to provide students with the tools and principles needed to solve the machine learning problems found in practice. The course syllabus can be found here.

For registration information, please contact the CS Graduate Office.

Prerequisites

If you'd like to waive the prerequisites, please send an email to the instructor. For each prerequisite, please clearly list the equivalent courses you have taken, and highlight them in your transcript.

Solid mathematical background: equivalent to a 1-semester undergraduate course in each of the following: linear algebra, multivariate calculus, probability theory, and statistics.
Computer science background: equivalent to a 1-semester undergraduate course in data structures, algorithms, and Python programming. Python programming is required for most homework assignments.
Recommended: At least one advanced, proof-based mathematics course.
Some prerequisites may be waived with permission of the instructor.

Logistics

Lecture format: In-person
Lecture location: 100 Washington Square East, Room 411, New York, NY 10003
Office hours:

Mengye Ren: Thursday 1:00PM - 2:00PM; Room 508 (60 5th Ave)

Discussions: We will use Campuswire for class discussion. Rather than emailing questions to the teaching staff, please post your questions on Campuswire, where they will be answered by the instructors, TAs, graders, and other students. For questions that are not specific to the class, you are also encouraged to post to Stack Overflow for programming questions and Cross Validated for statistics and machine learning questions. Please also post a link to these postings in Campuswire, so others in the class can answer the questions and benefit from the answers.

Grading

Homework (40%). There are 4 homeworks. See Assignments section for details. Some homeworks may have optional problems. You should view the optional problems primarily as a way to engage with more material, if you have the time. They will be counted towards extra credit.
Midterm (30%). A midterm will be held in class during week 8's lecture.
Final project (30%). Students can form teams of at most 3 members. A 2-page project proposal will be due by the end of week 8 (after the midterm). A full project report will be due in the last week of the course.
Extra credit (2%). You will be awarded up to 2% extra credit if you answer other students' questions in a substantial and helpful way on Campuswire. Extra credit may raise your grade by up to half a grade (e.g. B to B+).

Resources

Related courses

Acknowledgement: This course was originally developed by David S. Rosenberg, and later adapted by He He, Tal Linzen, and others.
Spring 2022 offering of DS-GA-1003
Foundations of Machine Learning from Bloomberg ML EDU by David S. Rosenberg (with videos)

Past exams

2023 Fall Midterm & Solution

2023 Spring Midterm & Solution

Textbooks

The Elements of Statistical Learning (Hastie, Tibshirani, and Friedman): This will be our main textbook for L1 and L2 regularization, trees, bagging, random forests, and boosting. It's written by three statisticians who invented many of the techniques discussed. There's an easier version of this book that covers many of the same topics, described below. (Available for free as a PDF.)
An Introduction to Statistical Learning (James, Witten, Hastie, and Tibshirani): This book is written by two of the same authors as The Elements of Statistical Learning. It's much less intense mathematically, and it's good for a lighter introduction to the topics. (Available for free as a PDF.)
Understanding Machine Learning: From Theory to Algorithms (Shalev-Shwartz and Ben-David): Covers a lot of theory that we don't go into, but it would be a good supplemental resource for a more theoretical course, such as Mohri's Foundations of Machine Learning course. (Available for free as a PDF.)
Pattern Recognition and Machine Learning (Christopher Bishop): Our primary reference for probabilistic methods, including Bayesian regression, latent variable models, and the EM algorithm. It's highly recommended, but unfortunately not free online.
Bayesian Reasoning and Machine Learning (David Barber): A very nice resource for our topics in probabilistic modeling, and a possible substitute for the Bishop book. Would serve as a good supplemental reference for a more advanced course in probabilistic modeling, such as DS-GA 1005: Inference and Representation. (Available for free as a PDF.)
Deep Learning (Goodfellow, Bengio, and Courville): A useful textbook on deep learning foundations and advanced techniques. (Available for free in HTML.)

Other tutorials and references

Hands-On Machine Learning with Scikit-Learn and TensorFlow (Aurélien Géron)

This is a practical guide to machine learning that corresponds fairly well with the content and level of our course. While most of our homework is about coding ML from scratch with numpy, this book makes heavy use of scikit-learn and TensorFlow. Comfort with the first two chapters of this book would be part of the ideal preparation for this course, and it will also be a handy reference for practical projects and work beyond this course, when you'll want to make use of existing ML packages, rather than rolling your own.

Data Science for Business (Provost and Fawcett)

Ideally, this would be everybody's first book on machine learning. The intended audience is both the ML practitioner and the ML product manager. It's full of important core concepts and practical wisdom. The math is so minimal that it's perfect for reading on your phone, and I encourage you to read it in parallel to doing this class, especially if you haven't taken DS-GA 1001.

Software

NumPy is "the fundamental package for scientific computing with Python." Our homework assignments will use NumPy arrays extensively.
scikit-learn is a comprehensive machine learning toolkit for Python. We won't use this for most of the homework assignments, since we'll be coding things from scratch. However, you may want to run the scikit-learn version of the algorithms to check that your own outputs are correct. Also, studying the source code can be a good learning experience.
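
The kind of sanity check mentioned above might look like the following sketch. It assumes a plain least-squares problem on synthetic data; the function name `fit_least_squares_gd`, the data, and the step-size choice are illustrative and not taken from any homework. NumPy's `np.linalg.lstsq` stands in as the reference solver so the snippet has no extra dependency; with scikit-learn installed, `LinearRegression(fit_intercept=False)` could play the same role.

```python
import numpy as np


def fit_least_squares_gd(X, y, num_steps=2000):
    """Minimize the mean squared error (1/n) * ||Xw - y||^2 by gradient descent."""
    n, d = X.shape
    # Step size 1/L, where L is the largest eigenvalue of the Hessian (2/n) X^T X.
    L = np.linalg.eigvalsh(2.0 / n * X.T @ X).max()
    w = np.zeros(d)
    for _ in range(num_steps):
        grad = 2.0 / n * X.T @ (X @ w - y)
        w -= grad / L
    return w


# Synthetic data (hypothetical, for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=50)

w_scratch = fit_least_squares_gd(X, y)
w_reference, *_ = np.linalg.lstsq(X, y, rcond=None)

# The two solutions should agree up to numerical tolerance.
print(np.allclose(w_scratch, w_reference, atol=1e-6))
```

If the two weight vectors disagree, the usual suspects are the gradient formula, the step size, or a mismatch in how the intercept is handled between the two implementations.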

Lectures

(HTF) refers to Hastie, Tibshirani, and Friedman's book The Elements of Statistical Learning
(SSBD) refers to Shalev-Shwartz and Ben-David's book Understanding Machine Learning: From Theory to Algorithms
(JWHT) refers to James, Witten, Hastie, and Tibshirani's book An Introduction to Statistical Learning
(GBC) refers to Goodfellow, Bengio, and Courville's book Deep Learning

Future schedule is subject to change.

Week 1

Lecture Sep 3

Topics

Introduction
Slides
Scribble

Materials

(None)

References

(None)

Week 2

Lecture Sep 10

Topics

Gradient descent
Stochastic gradient descent
Loss functions
Slides
Scribble

Materials

Bottou's SGD Tricks
Gradient Descent Convergence
SGD Convergence

References

(None)

Week 3

Lecture Sep 17

Topics

Feature selection
Regularization
Lasso Optimization
Max margin classifiers
Slides
Scribble

Materials

(None)

References

(None)

Week 4

Lecture Sep 24

Topics

Support Vector Machines
Subgradient Descent
SVM Dual
Slides
Scribble

Materials

Geometric Derivation of SVMs
SVM Insights from Duality
Rosenberg's notes on convex optimization
Convergence proof of subgradient descent
Convex optimization and Lagrangian duality tutorial slides

References

(None)

Week 5

Lecture Oct 1

Topics

Kernels
Probabilistic Methods
Slides
Scribble

Materials

(None)

References

SSBD Ch. 16
A Survey of Kernels for Structured Data

Week 6

Lecture Oct 8

Topics

Maximum Likelihood Estimation
Generative and Discriminative Models
Bayes Rule
Slides
Scribble

Materials

Maximum Likelihood Conditional Probability Models

References

(None)

Week 8

Midterm Oct 22

Topics

Midterm

Materials

Exam Solution

References

(None)

Week 9

Lecture Oct 29

Topics

Bayesian Methods
Bayesian Regression
Multi-class
Slides
Scribble

Materials

Multiclass Questions
Multiclass Solutions

References

SSBD Chapter 17
In Defense of One-Vs-All Classification
Reducing Multiclass to Binary

Week 10

Lecture Nov 5

Topics

Structured Prediction
Decision Trees
Slides
Scribble

Materials

CRF Tutorial Structured Prediction

References

JWHT 8.1 (Trees)
HTF 9.2 (Trees)

Week 11

Lecture Nov 12

Topics

Bagging, Random Forest
Boosting
Slides
Scribble

Materials

Intro to the Bootstrap
Viola-Jones Face Detector

References

(None)

Week 12

Lecture Nov 19

Topics

Feature Learning
Neural Networks I
Slides
Scribble

Materials

Yes you should understand backprop (Karpathy)
Challenges with backprop (Karpathy Lecture)
Automatic Differentiation
How Does Learning Rate Decay Help Modern Neural Networks?

References

(None)

Week 13

Lecture Nov 26

Topics

Neural Networks II
Deep Learning
Slides
Scribble

Materials

(None)

References

GBC Ch. 8 (Optimization)
GBC Ch. 9 (CNNs)
GBC Ch. 10 (RNNs)

Week 14

Lecture Dec 3

Topics

k-Means, Gaussian Mixture Models
EM, ELBO
Variational Autoencoders
Slides
Scribble

Materials

(None)

References

HTF, 13.2.1 (k-Means)
Bishop 9.2,9.3 (GMM/EM)
An Alternative to EM for GMM [Optional]

Assignments

Late Policy: Homeworks are due at 12:00 PM Eastern time (Noon) on the date specified. You have 4 late days in total, which can be used throughout the semester without penalty. Once you run out of late days, each additional late day will incur a 20% penalty. For example, if you submit an assignment 1 day late after using all your late days, a score of 90 will only be counted as 72. Note that the maximum number of late days per homework is two, meaning that Gradescope will not accept submissions more than 48 hours after the due date.
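
The arithmetic in the late policy can be sketched as a small helper. This is only an illustration, not an official grading script: the function name `late_adjusted_score` is hypothetical, and two things are assumptions not stated in the policy, namely that free late days are applied before any penalty and that penalties for several penalized days add up linearly.

```python
MAX_LATE_DAYS_PER_HW = 2   # from the policy: no submissions after 48 hours
PENALTY_PCT_PER_DAY = 20   # from the policy: 20% per penalized late day


def late_adjusted_score(raw_score, days_late, free_days_left):
    """Return (adjusted_score, free_days_left_after) for one homework.

    Assumptions (not stated in the policy): free late days are used up
    first, and penalties add linearly, so 2 penalized days cost 40%.
    """
    if days_late < 0 or free_days_left < 0:
        raise ValueError("days must be non-negative")
    if days_late > MAX_LATE_DAYS_PER_HW:
        raise ValueError("submission window closed (more than 2 days late)")
    free_used = min(days_late, free_days_left)
    penalized_days = days_late - free_used
    penalty_pct = PENALTY_PCT_PER_DAY * penalized_days
    adjusted = raw_score * (100 - penalty_pct) / 100
    return adjusted, free_days_left - free_used


# The example from the policy: 1 day late with no free late days left.
print(late_adjusted_score(90, 1, 0))  # (72.0, 0)
```

With free late days remaining, the same call leaves the score unchanged and only decrements the balance, e.g. `late_adjusted_score(90, 2, 4)` returns `(90.0, 2)`.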

Collaboration Policy: You may form study groups and discuss problems with your classmates. However, you must write up the homework solutions and the code from scratch, without referring to notes from your joint session. In your solution to each problem, you must write down the name of every person with whom you discussed the problem; this will not affect your grade.

Submission: Homework should be submitted through Gradescope. If you have not used Gradescope before, please watch this short video: "For students: submitting homework." At the beginning of the semester, you will be added to the Gradescope class roster. This will give you access to the course page, and the assignment submission form. To submit assignments, you will need to:

Upload a single PDF document containing all the math, code, plots, and exposition required for each problem.
Where homework assignments are divided into sections, please begin each section on a new page.
You will then select the appropriate page ranges for each homework problem, as described in the "submitting homework" video.

Feedback: Check Gradescope to get your scores on each individual problem, as well as comments on your answers. Regrading requests should be submitted on Gradescope.

Homework 0

Typesetting your homework

Due: January 1st, 12:00 PM Eastern time (Noon)

hw0.pdf hw0.zip

Homework 1

Linear Regression & Gradient Descent

Due: October 1st, 12:00 PM Eastern time (Noon)

hw1.pdf hw1.zip

Homework 2

SVMs, Kernels & Logistic Regression

Due: October 15th, 12:00 PM Eastern time (Noon)

hw2.pdf hw2.zip

Homework 3

Bayesian ML & Multiclass

Due: November 12th, 12:00 PM Eastern time (Noon)

hw3.pdf hw3.zip

Homework 4

Decision Trees, Boosting, and Neural Networks

Due: December 3rd, 12:00 PM Eastern time (Noon)

hw4.pdf hw4.zip

Course Project

The final course project constitutes 30% of your overall grade. The objective is to apply the machine learning concepts acquired during this course to a real-world problem. Choose a pertinent and applicable issue, identify an appropriate data source for your machine learning solution, and if no suitable data source exists, propose methods to gather the required data efficiently. More project instruction

The template and detailed project instructions package can be found here: pdf zip

Key dates

Oct 15, 2024: Form groups of three students. Groups will be assigned a number. Students not part of a group will be assigned a group by the instructor.
Oct 31, 2024: Submit project proposals on Gradescope by 12PM. While this proposal is mandatory, it will not be graded. Its intent is to establish a checkpoint. With the submission, book a mandatory consultation with the instructor to discuss/approve your selected topic and proposed methodologies. This consultation can be outside of regular office hours.
Nov 15, 2024: You should have done at least one office hour consultation by this date.
Dec 9, 2024: Submit your presentation slide deck by 11:59PM.
Dec 10, 2024: Course project presentation 4:55 - 6:55PM.
Dec 13, 2024: Final report due on Gradescope by 12PM (Noon).
Dec 14, 2024: Complete the self and peer evaluation via Google Form.

People

Instructor

Mengye Ren

mengye@cs.nyu.edu

Profile Page

Mengye Ren is an Assistant Professor of Computer Science and Data Science at NYU. His research focuses on deep learning and computer vision.

Graders

Pavan Ravishankar

pr2248@nyu.edu

Profile Page

Pavan is a PhD student in Computer Science at NYU Courant. His research interests center around end-to-end fairness in ML pipelines.

Yilun Kuang

yilun.kuang@nyu.edu

Profile Page

Yilun is a PhD student in Data Science at NYU. His research interests include large language models, diffusion models, and self-supervised learning.

Yash Amin

yva2006@nyu.edu

Yash is a second-year Master's student in Computer Science at NYU Tandon Engineering.

Machine Learning CSCI-GA-2565 · Fall 2024 · NYU Courant Computer Science
