ОУФ Machine Learning in Python: differences between revisions

From Wiki - Faculty of Computer Science

Revision as of 19:12, 3 September 2020

About the course

The course is taught in modules 1-2.

Instructor: Oleg Melnikov

Teaching assistants: see Canvas LMS

This course introduces students to the elements of machine learning, including supervised and unsupervised methods such as linear and logistic regression, splines, decision trees, support vector machines, bootstrapping, random forests, boosting, regularized methods and much more. The two modules (Sept-Dec, 2020) use the Python programming language and popular packages to investigate and visualize datasets and develop machine learning models.

Course prerequisites: at least one semester each of calculus on the real line, vector calculus, linear algebra, probability and statistics, and computer programming in a high-level language such as Python or R.

Technical requirements: Laptop, Internet connection, Chrome web browser, Google Drive, Google Colab.

Course plan

Lectures

1. Math Essentials. Intro to Python in Google Colab

2. Ch2. Intro to Statistical learning

3. Ch3. Linear Regression

4. Ch3. k-Nearest Neighbors

5. Ch4. Classification: logistic regression

6. Ch4. Classification: LDA, QDA, KNN

7. Ch5. Resampling methods. CV, Bootstrap

8. Ch6. Linear model selection & regularization

9. Ch7. Non-linear regression

10. Ch7. Non-linear regression-2

11. Ch8. Decision Trees

12. Ch8. Bagging, Random Forest, Boosting

13. Ch9. Support Vector Machines/Classifiers

14. Ch10. Clustering methods. PCA, k-Means, HC

15. Special Topics: t-SNE, UMAP, Neural Networks

Final course grade

ОУФ final grade = 0.35*HW + 0.1*Q + 0.05*P + 0.5*(E1 + E2)

HW, exam, and project grades are on a scale of 0-100. The course grade is scaled to 0-10, which is the range used at HSE. There are no blocking grading components.
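The weighting above can be sketched in Python. This is only an illustration under stated assumptions, not the official calculation: it assumes quizzes (Q) and participation (P) are also scored on 0-100, it reads the exam term as 0.5 times the average of E1 and E2 so that the weights sum to 1 (taken literally, 0.5*(E1 + E2) could exceed 100), and it assumes the 0-10 grade is obtained by dividing by 10. The function name `course_grade` is hypothetical.

```python
def course_grade(hw, q, p, e1, e2):
    """Return (score on 0-100, grade on 0-10) for one student.

    Assumptions (not confirmed by the syllabus):
    - every component is on a 0-100 scale;
    - the exam term is 0.5 * average(E1, E2);
    - the 0-10 grade is the 0-100 score divided by 10.
    """
    components = {"HW": hw, "Q": q, "P": p, "E1": e1, "E2": e2}
    for name, value in components.items():
        if not 0 <= value <= 100:
            raise ValueError(f"{name} must be between 0 and 100, got {value}")

    exam_average = (e1 + e2) / 2  # assumed reading of (E1 + E2)
    score_100 = 0.35 * hw + 0.1 * q + 0.05 * p + 0.5 * exam_average
    return score_100, score_100 / 10  # assumed linear rescaling to 0-10


if __name__ == "__main__":
    print(course_grade(hw=80, q=90, p=100, e1=70, e2=60))
```

How the final 0-10 value is rounded is not stated in the syllabus, so the sketch leaves it unrounded.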

Assignment submission: All submissions are made as PDF and IPYNB files via Canvas LMS. Graders will leave feedback in your PDF and execute your IPYNB to reproduce the results.

HW: weekly graded homework (HW) assignments, which will include analysis of datasets, analytical and conceptual problems, and programming assignments. These are to be completed individually.

Exams (E): There will be exams at the end of each of the 4 modules. The examination locations are TBD. An in-class exam is closed-book: no notes, calculators, or phones. A take-home exam is open-book and open-internet, but no collaboration is allowed. Exam questions are different from homework questions: HW deepens your understanding, but the exams measure it. Each exam is cumulative. Do not book travel that conflicts with the exam dates.

Automatic grading policy for Exam 2: If your course grade as of the date of exam 2 (G2E1) is ≥ 95% and your exam 1 grade is ≥ 95%, then G2E1 is used as the grade for exam 2.
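The rule above can be written as a short Python check. This is a minimal sketch, assuming grades are percentages on 0-100. The function name `exam2_grade` and its parameters are hypothetical. The syllabus does not say what happens if a qualifying student also sits exam 2, so the sketch simply uses G2E1 whenever both thresholds are met.

```python
def exam2_grade(grade_to_date, exam1, exam2=None, threshold=95.0):
    """Return the grade recorded for exam 2 under the automatic policy.

    grade_to_date -- course grade as of the date of exam 2 (G2E1), in percent
    exam1         -- exam 1 grade, in percent
    exam2         -- exam 2 grade if the exam was taken, else None
    """
    if grade_to_date >= threshold and exam1 >= threshold:
        # Both conditions hold: G2E1 stands in for exam 2 (assumed to take
        # precedence even if an exam 2 score exists).
        return grade_to_date
    if exam2 is None:
        raise ValueError("student does not qualify and has no exam 2 score")
    return exam2


if __name__ == "__main__":
    print(exam2_grade(grade_to_date=96.0, exam1=97.0))              # qualifies
    print(exam2_grade(grade_to_date=96.0, exam1=90.0, exam2=82.0))  # does not
```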

Coursework Project (CP) in the R programming language is for DSBA/ICEF students only and is administered by LSE/UoC. It is released about 1 November and due about 1 April. Although students are given a 4-5 month window, this exercise is meant to be completed in a few days. Typically, students work on it in Feb/Mar. Details TBD.

In-Canvas Quizzes (Q) are based on lectures, slides, and textbooks. Answers can only be submitted once and cannot be seen thereafter, so please check them carefully before submitting. Questions are shuffled and sampled for each student, so students will likely see different questions.

Participation (P): this includes your active participation in the course, answering your peers' questions in the Piazza forum, and your attendance at seminars and lectures. Redundant and uninformative posts (for the sake of traffic) may lower your participation grade. Please leave meaningful questions and comments. All participation is tracked by Zoom software and Piazza. Attendance at the seminar sessions is required and is graded as participation.

Re-grading: We aim to grade fairly, accurately, and promptly. If you believe we made a gross grading error, please notify your TA/GA privately via the forum ASAP (within 1 week of the grade’s release). To discourage frivolous appeals, we reserve the right to deduct 2-5% of the grade if your appeal lacks a strong justification or the benefit fails to exceed 2-5%. Be sure it is worth the mutual effort.

Make-up policy: If you miss an exam with a valid, verifiable excuse (be prepared to demonstrate it), contact the instructors ASAP to reschedule the exam. Please mind that making exceptions is difficult and time-consuming and can only be done before exams/solutions are distributed. Typically, a verifiable medical emergency is a valid reason, but travel and conferences are not. Other assignments cannot be made up. It is the student's responsibility to start their work early, so as to hedge against any unforeseeable life event.

Literature

Required Textbooks

Modules 1-2: An Introduction to Statistical Learning, with Applications in R, G. James et al. (http://faculty.marshall.usc.edu/gareth-james/ISL/ISLR%20Seventh%20Printing.pdf).

ISBN: 9781461471387. Book's site, errata.

Cleaned up data is saved here

Important: Non-US editions may have different exercises, which may impact your grade.

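Highly recommended: supplemental videos (https://www.r-bloggers.com/in-depth-introduction-to-machine-learning-in-15-hours-of-expert-videos/)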

Optional material

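Pattern Recognition and Machine Learning, C.M. Bishop. ISBN: 978-0387-31073-2 (https://www.microsoft.com/en-us/research/uploads/prod/2006/01/Bishop-Pattern-Recognition-and-Machine-Learning-2006.pdf)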

Foundations of Machine Learning, M. Mohri et al.

Additional information

Methods of Instruction:

- Lectures via Zoom video

- Seminars in classroom (unless stated otherwise)

- Piazza discussion forum is the platform for all written communication with the teaching team

- Canvas LMS is the platform for course management, including quizzes, assignments, and grading

- Zoom sessions are recorded for later viewing, but attendance is mandatory and graded as participation

- Instructors will add students to Piazza and Canvas.

Special Equipment and Software Support

Laptop, Internet connection

Chrome web browser, Google Drive, Google Colab

Seminar sessions focus on hands-on experience, where students will review and/or participate in applications of the learnt models to real-world datasets.


Academic Honor Policies

Academic Integrity: Students are expected to maintain exemplary integrity in their class efforts. Make sure you understand the Disciplinary Measures for the Violation of Academic Standards from the HSE Academic Handbook. It’s a must! Examples of honor code violations:

- Looking at the solutions from previous years’ HW or exams, either official or written up by another student.

- Sharing the write-up or code with another student (showing to or looking at).

- Uploading your write-up or code to a public repository, so that it can be accessed by other students.

- Discussing homework problems in such detail that your solution (write-up or code) is almost identical to another student's answer.

Unless explicitly mentioned otherwise, we will assume that any submitted work is your own and was:

- created without assistance from anyone else (except possibly course staff);

- created without consulting any resources other than the course materials.

Note that StackOverflow, StackExchange, GitHub repos, etc. are also accessible to our staff and plagiarism detection software.

Collaboration: You are encouraged to discuss the homework problems and the material with your classmates, but you must submit your own individually developed HW solutions. Please indicate at the top of your write-up the names of the students with whom you worked.

Communication & Course Forums: All electronic course communication will be done via Piazza. Unless you have a strictly private matter, all questions should be placed in a discussion forum, where we can scale our assistance to the whole class. For private matters, please contact the teaching team via a private Piazza post. We may not reply to emails or Canvas messages. Do NOT share or post code and solutions on the forum or the Internet. Keep the level of solution detail similar to that in your textbook, which we will consider the key source of truth (though it can still have typos and can require clarifications).
