Statistical learning theory 2025
General Information
Lectures: Tuesdays, 13h00 -- 14h20, in room S834 and in Zoom, by Bruno Bauwens
Seminars: Tuesdays, 14h20 -- 16h00, online in Zoom, by Nikita Lukianenko.
Please join the Telegram group. The course is similar to last year's.
Homeworks
Deadlines are every 2 weeks, before the lecture. The tasks are at the end of each problem list. (Problem lists will be updated; check the year.)
Before the 3rd lecture, submit homework from problem lists 1 and 2; before the 5th lecture, from lists 3 and 4; and so on.
Use --this link-- to submit homeworks. You may submit in English or Russian, as LaTeX or as pictures. Results are here.
Late policy: one homework can be submitted at most 24 hours late without explanation.
Course materials
| Video | Summary | Slides | Lecture notes | Problem list | Solutions |
|---|---|---|---|---|---|
| | Part 1. Online learning | | | | |
| 16 Sep | Philosophy. The online mistake bound model. The halving and weighted majority algorithms. | sl01 | ch00 ch01 | prob01 | sol01 |
| 23 Sep | The standard optimal algorithm. The perceptron algorithm. | sl02 | ch02 ch03 | prob02 | |
| 30 Sep | Kernel perceptron algorithm. Prediction with expert advice. Recap of probability theory (seminar). | sl03 | ch04 ch05 | prob03 | |
| | Part 2. Distribution independent risk bounds | | | | |
| 07 Oct | Necessity of a hypothesis class. Sample complexity in the realizable setting, examples: threshold functions and finite classes. | sl04 | ch06 | prob04 | |
| 14 Oct | Growth functions, VC-dimension, and the characterization of sample complexity via the VC-dimension. | sl05 | ch07 ch08 | prob05 | |
| 21 Oct | Risk decomposition and the fundamental theorem of statistical learning theory (a previous recording covers more). | sl06 | ch09 | prob06 | |
| 04 Nov | Bounded differences inequality, Rademacher complexity, symmetrization, contraction lemma. | sl07 | ch10 ch11 | prob07 | |
| | Part 3. Margin risk bounds with applications | | | | |
| 11 Nov | Simple regression, support vector machines, margin risk bounds, and neural nets with dropout regularization. | sl08 | ch12 ch13 | prob08 | |
| 18 Nov | Kernels: RKHS, representer theorem, risk bounds. | sl09 | ch14 | prob09 | |
| 25 Nov | AdaBoost and the margin hypothesis. | sl10 | ch15 | prob10 | |
| 02 Dec | Losses of neural nets are not locally convex. Gradient descent with stable gradients. (Old recording about Hessians.) | | ch16 | prob11 | |
| 09 Dec | Lazy training and the neural tangent kernel. | | ch17 | | |
| 16 Dec | Colloquium. Rules and questions from a previous year. | | | | |
The lectures in October and November are based on the book:
Foundations of Machine Learning, 2nd ed., Mehryar Mohri, Afshin Rostamizadeh, and Ameet Talwalkar, 2018.
A gentle introduction to the material of the first 3 lectures, together with an overview of probability theory, can be found in chapters 1-6 and 11-12 of the following book: Sanjeev Kulkarni and Gilbert Harman, An Elementary Introduction to Statistical Learning Theory, 2012.
Grading formula
Final grade = 0.35 * [score of homeworks] + 0.35 * [score of colloquium] + 0.3 * [score on the exam] + bonus from quizzes.
All homework questions have the same weight. Each solved extra homework task increases the score of the final exam by 1 point. At the end of each lecture there is a short quiz in which you may earn 0.1 bonus points on the final (non-rounded) grade.
There is no rounding except for transforming the final grade to the official grade. Arithmetic rounding is used.
Autogrades: if you need only 6/10 on the exam to obtain the maximal 10/10 for the course, this grade is given automatically. This may happen thanks to extra homework questions and quiz bonuses.
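To make the formula concrete, here is a minimal Python sketch of the computation described above. The function and argument names (final_grade, homeworks, colloquium, exam, extra_hw_tasks, quiz_bonus) are invented for illustration, and the 0-10 scale is an assumption; the weights 0.35/0.35/0.3, the one extra exam point per solved extra task, the quiz bonus, and the round-half-up rule on the final grade follow the rules above.

```python
import math

# Minimal sketch of the grading rules above (names and the 0-10 scale are assumptions).
def final_grade(homeworks, colloquium, exam, extra_hw_tasks=0, quiz_bonus=0.0):
    # Each solved extra homework task adds 1 point to the exam score.
    exam_score = exam + extra_hw_tasks
    raw = 0.35 * homeworks + 0.35 * colloquium + 0.3 * exam_score + quiz_bonus
    # No intermediate rounding; arithmetic (round-half-up) rounding is applied only
    # when converting to the official grade, which is capped at 10.
    official = min(10, math.floor(raw + 0.5))
    return raw, official

# Example: with 10/10 homeworks and colloquium, quiz bonuses, and two extra tasks,
# a 6/10 exam already yields the maximal official grade (the "autograde" situation).
print(final_grade(homeworks=10, colloquium=10, exam=6, extra_hw_tasks=2, quiz_bonus=0.3))
```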
Colloquium
Rules and questions from last year.
Date: TBA
Problems exam
Date: TBA
-- You may use handwritten notes, lecture materials from this wiki (either printed or on your PC), and Mohri's book.
-- You may not search the internet or interact with other humans (e.g., by phone, forums, etc.).
About questions
-- 4 questions at the difficulty level of the homework. (Many homework questions come from former exams.)
-- I always ask you to compute a VC-dimension and to give/prove some risk bound using Rademacher complexity.
Office hours
Bruno Bauwens: TBA. Please send me an email in advance.
Nikita Lukianenko: write in Telegram; the time is flexible.