ОУФ Machine Learning in Python
Contents
About the Course
The course is taught in modules 1-2.
This course introduces students to the elements of machine learning, including supervised and unsupervised methods such as linear and logistic regression, splines, decision trees, support vector machines, bootstrapping, random forests, boosting, regularized methods and much more. The two modules (Sept-Dec, 2020) use the Python programming language and popular packages to investigate and visualize datasets and to develop machine learning models.
Instructor: Oleg Melnikov
Teaching assistants: see Canvas LMS
Prerequisites: at least one semester of calculus on a real line, vector calculus, linear algebra, probability and statistics, and computer programming in a high-level language such as Python or R.
Technical requirements: Laptop, Internet connection, Chrome web browser, Google Drive, Google Colab.
Course Plan
Lectures
1. Math Essentials. Intro to Python in Google Colab
2. Ch2. Intro to Statistical learning
3. Ch3. Linear Regression
4. Ch3. k-Nearest Neighbors
5. Ch4. Classification: logistic regression
6. Ch4. Classification: LDA, QDA, KNN
7. Ch5. Resampling methods. CV, Bootstrap
8. Ch6. Linear model selection & regularization
9. Ch7. Non-linear regression
10. Ch7. Non-linear regression-2
11. Ch8. Decision Trees
12. Ch8. Bagging, Random Forest, Boosting
13. Ch9. Support Vector Machines/Classifiers
14. Ch10. Unsupervised learning: PCA, k-means and hierarchical clustering (HC)
15. Special Topics: t-SNE, UMAP, Neural Networks
Final Course Grade
ОУФ Final grade = 0.35*HW + 0.1*Q + 0.05*P + 0.5*(E1 + E2)
HW, exam, and project grades are on a 0-100 scale. The course grade is scaled to 0-10, which is the range used at HSE. There are no blocking grading components.
Assignment submission: All work is submitted as a PDF and an IPYNB file via Canvas LMS. Graders will leave feedback in your PDF and execute your IPYNB to reproduce the results.
HW: weekly graded homework (HW) assignments, which include dataset analysis, analytical and conceptual problems, and programming assignments. These are to be completed individually.
Exams (E): There will be an exam at the end of each of the two modules. Exam locations are TBD. In-class exams are closed-book: no notes, calculators, or phones. Take-home exams are open-book and open-internet, but no collaboration is allowed. Exam questions differ from homework questions: HW deepens your understanding, while the exams measure it. Each exam is cumulative. Do not book travel that conflicts with the exam dates.
Automatic grading policy for Exam 2: If your overall grade up to the date of Exam 2 (G2E1) is ≥ 95% and your Exam 1 grade is ≥ 95%, then G2E1 is used as your Exam 2 grade.
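The automatic grading rule amounts to a simple conditional. Below is a minimal Python sketch, assuming all grades are percentages on a 0-100 scale; the function name `exam2_grade` and its parameters are hypothetical, for illustration only.

```python
def exam2_grade(g2e1: float, exam1: float, exam2: float, threshold: float = 95.0) -> float:
    """Hypothetical helper: Exam 2 grade under the automatic grading policy.

    g2e1:  overall grade up to the date of Exam 2 (G2E1), in percent
    exam1: Exam 1 grade, in percent
    exam2: grade earned on Exam 2, used when the rule does not apply
    """
    # Both conditions must hold for G2E1 to replace the Exam 2 grade.
    if g2e1 >= threshold and exam1 >= threshold:
        return g2e1
    return exam2


print(exam2_grade(96.5, 97.0, 88.0))  # rule applies: 96.5
print(exam2_grade(96.5, 94.0, 88.0))  # Exam 1 below 95%: 88.0
```

Note that a high running grade alone is not enough: a 94% on Exam 1 means the Exam 2 grade is whatever was earned on Exam 2.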
Coursework Project (CP) in the R programming language is for DSBA/ICEF students only and is administered by LSE/UoC. It is released around 1 November and is due around 1 April. Although students are given a 4-5 month window, the exercise is meant to be completed in a few days. Typically, students work on it in Feb/Mar. Details TBD.
In-Canvas Quizzes (Q) are based on lectures, slides, and textbooks. Answers can be submitted only once and cannot be viewed afterwards, so please check them carefully before submitting. Questions are shuffled and sampled for each student, so different students will likely see different questions.
Participation (P): this includes your active participation in the course, answering your peers' questions in the Piazza forum, and attending seminars and lectures. Redundant and uninformative posts (posting for the sake of traffic) may lower your participation grade, so please leave meaningful questions and comments. All participation is tracked via Zoom and Piazza. Seminar attendance is required and is graded as participation.
Re-grading: We aim to grade fairly, accurately, and promptly. If you believe we made a clear grading error, please notify your TA/GA as soon as possible via a private forum post (within 1 week of the grade's release). To discourage frivolous appeals, we reserve the right to deduct 2-5% of the grade if your appeal lacks a strong justification or the potential gain does not exceed 2-5%. Be sure it is worth the mutual effort.
Make-up policy: If you miss an exam with a valid, verifiable excuse (be prepared to document it), contact the instructors as soon as possible to reschedule the exam. Please note that exceptions are difficult and time-consuming to arrange and can only be made before exams or solutions are distributed. Typically, a verifiable medical emergency is a valid reason, but travel and conferences are not. Other assignments cannot be made up. It is the student's responsibility to start work early to hedge against unforeseeable life events.
Literature
Required Textbooks
Modules 1-2: An Introduction to Statistical Learning, with Applications in R, G. James, et al.,
ISBN: 9781461471387. Book's site, errata.
Important: Non-US editions may have different exercises, which may impact your grade.
Highly recommended: supplemental videos
Optional material
Pattern Recognition and Machine Learning, C.M. Bishop. ISBN: 978-0387-31073-2
Foundations of Machine Learning, M. Mohri, et al.
Additional Information
Methods of Instruction:
- Lectures via Zoom video
- Seminars in classroom (unless stated otherwise)
- Piazza discussion forum is the platform for all written communication with the teaching team
- Canvas LMS is the platform for course management, including quizzes, assignments, and grading
- Zoom sessions are recorded for later viewing, but attendance is mandatory and graded as participation
- Instructors will add students to Piazza and Canvas.
Special Equipment and Software Support
Laptop, Internet connection
Chrome web browser, Google Drive, Google Colab
Seminar sessions focus on hands-on experience: students review and/or take part in applying the models learned in lectures to real-world datasets.
Academic Honor Policies
Academic Integrity: Students are expected to maintain exemplary integrity in their class efforts. Make sure you understand the Disciplinary Measures for the Violation of Academic Standards from the HSE Academic Handbook. It's a must!
Examples of honor code violations:
1. Looking at solutions to previous years' HW or exams, whether official or written up by another student.
2. Sharing your write-up or code with another student (either showing yours or looking at theirs).
3. Uploading your write-up or code to a public repository where other students can access it.
4. Discussing homework problems in such detail that your solution (write-up or code) is almost identical to another student's answer.
5. Unless explicitly mentioned otherwise, we will assume that any submitted work is your own, and
- Created without assistance from anyone else (except possibly course staff)
- Created without consulting any resources other than the course materials.
6. Note that StackOverflow, StackExchange, GitHub repos, etc. are also accessible to our staff and plagiarism detection software.
Collaboration: You are encouraged to discuss the homework problems and the material with your classmates, but you must submit your own individually developed HW solutions. Please indicate at the top of your write-up the names of the students with whom you worked.
Communication & Course Forums: All electronic course communication will be done via Piazza. Unless you have a strictly private matter, all questions should be posted in a discussion forum, where we can scale our assistance to the whole class. For private matters, please contact the teaching team via a private Piazza post. We may not reply to emails or Canvas messages. Do NOT share or post code or solutions on the forum or the Internet. Keep the level of solution detail similar to that in the textbook, which we consider the key source of truth (although it may still contain typos or need clarification).
Posting Guidelines
- Update your Canvas profile with a short bio and a photo.
- Phrase your questions precisely and concisely. Include the URL of the referenced Canvas page, the textbook page number, or the video URL and timestamp.
- Format your posts, i.e., use LaTeX and code formatting, to make your messages easier to read.
- Do NOT post HW code, solutions, or answers (doing so may affect your participation grade). Give your peers a chance to discover and learn. We do encourage higher-level discussion of problems and solutions. For example, you can describe the steps that reproduce a problem, or post parts of the error message you observe.
- The instructor team will try to check the forum daily, so please be patient.
- Peer assistance is encouraged; just avoid posting threads for the sake of posting (quality matters).
- Before asking a question, quickly search prior posts for similar questions.
- Anonymous posts are OK, but the staff can still see your identity.
If you have a complaint about the class or about inappropriate activity, please notify the teaching team via a private Piazza post. If it remains unresolved, contact the course lecturer. Keep in mind that forum answers may not always be correct (typos, misunderstood questions, etc.). Naturally, keep communication professional, respectful, and cordial.
Additional information
Copyright: Each document distributed in this course is copyrighted with all rights reserved and may not be reproduced or distributed without explicit permission of the author(s). HSE also has a policy on audio/video recording that you should know before making any recordings.
Video recordings: Zoom sessions are recorded. We will try to make these videos available in Canvas. These recordings might be reused in other HSE courses, viewed by other HSE students, faculty, or staff, or used for other education and research purposes. Note that your image or voice might be incidentally captured. If you have questions, please contact a member of the teaching team.
Methods of Instruction: Lectures via Zoom video, seminars via Zoom and/or in the classroom, and online assistance. Zoom sessions are recorded for later viewing, but attendance is mandatory and graded as participation. Canvas Learning Management System (LMS) is the likely platform for the course and is being prepared.
Disclaimer: This syllabus and course details are subject to change, but we will keep changes to a minimum and give advance warning whenever possible.
Course assistance: If you feel the TAs did not resolve your issue, please escalate to the lecturer.