Reinforcement learning 2021 2022: differences between revisions
From Wiki - Faculty of Computer Science
(New page: «== Lecturers and Seminarists == {| class="wikitable" style="text-align:center" |- || Lecturer || [https://www.hse.ru/staff/anaumov Naumov Alexey ] || [anaumov@hs…»)
(13 intermediate revisions by the same user not shown)
Line 3: | Line 3: | ||
{| class="wikitable" style="text-align:center" | {| class="wikitable" style="text-align:center" | ||
|- | |- | ||
− | || Lecturer || [https://www.hse.ru/staff/anaumov | + | || Lecturer || [https://www.hse.ru/staff/anaumov Alexey Naumov] || [anaumov@hse.ru] || T924 |
|- | |- | ||
|| Lecturer || [https://www.hse.ru/org/persons/93130881 Denis Belomestny ] || [dbelomestny@hse.ru] || T924 | || Lecturer || [https://www.hse.ru/org/persons/93130881 Denis Belomestny ] || [dbelomestny@hse.ru] || T924 | ||
|- | |- | ||
− | || Seminarist || [https://www.hse.ru/org/persons/219484540 | + | || Seminarist || [https://www.hse.ru/org/persons/219484540 Sergey Samsonov] || [svsamsonov@hse.ru] || T926 |
|- | |- | ||
|| Seminarist || [https://www.hse.ru/staff/mkaledin Maxim Kaledin ] || [mkaledin@hse.ru] || T926 | || Seminarist || [https://www.hse.ru/staff/mkaledin Maxim Kaledin ] || [mkaledin@hse.ru] || T926 | ||
Line 27: | Line 27: | ||
== Lectures == | == Lectures == | ||
− | *[https://www.dropbox.com/s/ | + | *[https://www.dropbox.com/s/a69ql9duo5jf5gt/Math%20of%20RL%20Lecture%201.pdf?dl=0 ''' Lecture 09.11'''] |
+ | *[https://www.dropbox.com/s/7zkirk1xykua890/Math_of_RL_Le%20cture_2.pdf?dl=0 ''' Lecture 16.11'''] | ||
== Seminars == | == Seminars == | ||
− | *[https://www.dropbox.com/s/ | + | *[https://www.dropbox.com/s/wc951vseud1q1p2/Seminar_09_11_RL.pdf?dl=0 '''Seminar 09.11'''], [https://www.dropbox.com/s/2h83vbjgew1inen/Seminar_1_RL.mp4?dl=0 '''Seminar 09.11, Video'''], [https://www.dropbox.com/s/bxa8h9vjrnegsql/Bandit_intro_strategies_09_11_2021.ipynb?dl=0 '''Seminar 09.11, Notebook'''] |
+ | *[https://www.dropbox.com/s/cq0t2o6n4yn6oag/Seminar_16_11_RL.mp4?dl=0 '''Seminar 16.11, Video'''], | ||
+ | *[https://www.dropbox.com/s/ex8v9w3smar70m7/Seminar_23_11_RL.mp4?dl=0 '''Seminar 23.11, Video'''], | ||
+ | *[https://www.dropbox.com/s/v1ywnk8eyhourjq/Seminar_07_12_RL.mp4?dl=0 '''Seminar 07.12, Video'''], | ||
+ | |||
+ | == Recommended literature == | ||
+ | |||
+ | '''Lecture and seminar 09.11''' | ||
+ | |||
+ | * Sébastien Bubeck, Nicolò Cesa-Bianchi. Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems. Chapter 2. http://sbubeck.com/SurveyBCB12.pdf | ||
+ | * Richard S. Sutton, Andrew G. Barto. Reinforcement Learning: An Introduction. Chapter 2. http://incompleteideas.net/book/the-book-2nd.html; | ||
+ | * Botao Hao et al. Bootstrapping Upper Confidence Bound. https://arxiv.org/abs/1906.05247 | ||
+ | |||
+ | '''Lecture and seminar 16.11''' | ||
+ | *[https://www.dropbox.com/s/wc951vseud1q1p2/Seminar_09_11_RL.pdf?dl=0 '''Seminar 09.11'''], [https://www.dropbox.com/s/2h83vbjgew1inen/Seminar_1_RL.mp4?dl=0 '''Seminar 09.11, Video'''], | ||
==Homeworks == | ==Homeworks == | ||
+ | *[https://www.dropbox.com/s/k2at9lixvshpcbw/HW_1_RL_2021.pdf?dl=0 '''Homework №1, deadline 19.12.2021, 23:59'''], [https://www.dropbox.com/s/l7pma6kwnopl856/HW_1_task_2.ipynb?dl=0 '''Environment for task №2'''], | ||
+ | *[https://www.dropbox.com/s/jynwji3dw3xxjww/HW_2_RL_2021.pdf?dl=0 '''Homework №2, deadline 19.12.2021, 23:59''']. | ||
== Projects == | == Projects == | ||
Current revision as of 22:49, 14 December 2021
Lecturers and Seminarists
Lecturer | Alexey Naumov | [anaumov@hse.ru] | T924 |
Lecturer | Denis Belomestny | [dbelomestny@hse.ru] | T924 |
Seminarist | Sergey Samsonov | [svsamsonov@hse.ru] | T926 |
Seminarist | Maxim Kaledin | [mkaledin@hse.ru] | T926 |
About the course
This page contains materials for the Mathematical Foundations of Reinforcement Learning course in the 2021/2022 academic year. The course is an elective for second-year Master's students of the Math of Machine Learning program (HSE and Skoltech).
Grading
The final grade consists of two components, each a real number from 0 to 10 with no intermediate rounding:
- O_HW for the homework
- O_Project for the course project
The formula for the final grade is
- O_Final = 0.5*O_HW + 0.5*O_Project
with the usual (arithmetic) rounding rule.
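The grading formula above can be sketched in a few lines of Python. This is only an illustration, not an official calculator: the function name `final_grade` and its parameters are hypothetical, and the sketch assumes that "arithmetic rounding" means halves round up (so 7.5 becomes 8), which differs from Python's built-in `round`.

```python
from decimal import Decimal, ROUND_HALF_UP


def final_grade(o_hw: float, o_project: float) -> int:
    """Average the homework and project grades with equal weights, then round.

    Hypothetical helper: both inputs are expected to lie in [0, 10].
    """
    for name, value in (("o_hw", o_hw), ("o_project", o_project)):
        if not 0 <= value <= 10:
            raise ValueError(f"{name} must be in [0, 10], got {value}")

    # Build Decimals from the string form so that binary floating point
    # cannot nudge a value such as 7.5 slightly below the half.
    raw = (Decimal(str(o_hw)) + Decimal(str(o_project))) / 2

    # Assumption: the "usual (arithmetic) rounding rule" rounds halves up.
    return int(raw.quantize(Decimal("1"), rounding=ROUND_HALF_UP))


if __name__ == "__main__":
    print(final_grade(7, 8))      # raw grade 7.5
    print(final_grade(6.4, 6.5))  # raw grade 6.45
```

Rounding is applied only once, to the combined grade, which matches the requirement that the components carry no intermediate rounding.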
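Lectures
- Lecture 09.11 (https://www.dropbox.com/s/a69ql9duo5jf5gt/Math%20of%20RL%20Lecture%201.pdf?dl=0)
- Lecture 16.11 (https://www.dropbox.com/s/7zkirk1xykua890/Math_of_RL_Le%20cture_2.pdf?dl=0)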
Seminars
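- Seminar 09.11 (https://www.dropbox.com/s/wc951vseud1q1p2/Seminar_09_11_RL.pdf?dl=0); Seminar 09.11, Video (https://www.dropbox.com/s/2h83vbjgew1inen/Seminar_1_RL.mp4?dl=0); Seminar 09.11, Notebook (https://www.dropbox.com/s/bxa8h9vjrnegsql/Bandit_intro_strategies_09_11_2021.ipynb?dl=0)
- Seminar 16.11, Video (https://www.dropbox.com/s/cq0t2o6n4yn6oag/Seminar_16_11_RL.mp4?dl=0)
- Seminar 23.11, Video (https://www.dropbox.com/s/ex8v9w3smar70m7/Seminar_23_11_RL.mp4?dl=0)
- Seminar 07.12, Video (https://www.dropbox.com/s/v1ywnk8eyhourjq/Seminar_07_12_RL.mp4?dl=0)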
Recommended literature
Lecture and seminar 09.11
- Sébastien Bubeck, Nicolò Cesa-Bianchi. Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems. Chapter 2. http://sbubeck.com/SurveyBCB12.pdf
- Richard S. Sutton, Andrew G. Barto. Reinforcement Learning: An Introduction. Chapter 2. http://incompleteideas.net/book/the-book-2nd.html
- Botao Hao et al. Bootstrapping Upper Confidence Bound. https://arxiv.org/abs/1906.05247
Lecture and seminar 16.11
Homeworks
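- Homework №1, deadline 19.12.2021, 23:59 (https://www.dropbox.com/s/k2at9lixvshpcbw/HW_1_RL_2021.pdf?dl=0); Environment for task №2 (https://www.dropbox.com/s/l7pma6kwnopl856/HW_1_task_2.ipynb?dl=0)
- Homework №2, deadline 19.12.2021, 23:59 (https://www.dropbox.com/s/jynwji3dw3xxjww/HW_2_RL_2021.pdf?dl=0)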