RL 2023: differences between revisions
From Wiki - Faculty of Computer Science
Ivlevin (talk | contribs)
Line 23:
*[https://www.overleaf.com/read/kbzmvxdzbrxq '''Lectures and seminars notes''']
*[https://colab.research.google.com/drive/10qBq7Ot_1ZpnTeD11P5AnE8jFVj0OLXl?usp=sharing '''Notebook for the first seminar''']
+
+ == Homeworks ==
+ *[https://disk.yandex.ru/i/C8hwvvS5us09sA '''HW 1''']
+
== Recommended literature ==
Revision as of 15:10, 24 November 2023
Lecturers and Seminar Instructors
Lecturer | Alexey Naumov | [anaumov@hse.ru] | T924 |
Seminar instructor | Ilya Levin | [tg: @levensons] | T926 |
About the course
This page contains materials for the 2023 Mathematical Foundations of Reinforcement Learning course, an elective for second-year Master's students of the Math of Machine Learning program (HSE and Skoltech).
Grading
The final grade consists of two components, each a real number from 0 to 10 with no intermediate rounding:
- O_HW for the homework assignments
- O_Project for the course project
The formula for the final grade is
- O_Final = 0.6 * O_HW + 0.4 * O_Project
with the usual arithmetic rounding.
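As a rough illustration of the grading formula, the sketch below computes the final grade in Python. It assumes that "arithmetic rounding" means rounding half up to the nearest integer, which the page does not spell out. The function name `final_grade` and its parameters are hypothetical and not part of the course materials. Python's built-in `round` rounds halves to the nearest even number, so the sketch uses `decimal` instead.

```python
from decimal import Decimal, ROUND_HALF_UP


def final_grade(o_hw: float, o_project: float) -> int:
    """Combine homework and project grades as 0.6 * O_HW + 0.4 * O_Project.

    Assumption: the result is rounded half up to an integer
    (one reading of "usual arithmetic rounding").
    """
    for name, value in (("o_hw", o_hw), ("o_project", o_project)):
        if not 0 <= value <= 10:
            raise ValueError(f"{name} must be between 0 and 10, got {value}")

    # Convert through str so that inputs like 7.5 keep their decimal form.
    raw = (Decimal("0.6") * Decimal(str(o_hw))
           + Decimal("0.4") * Decimal(str(o_project)))

    # No intermediate rounding: only the weighted sum is rounded.
    return int(raw.quantize(Decimal("1"), rounding=ROUND_HALF_UP))


if __name__ == "__main__":
    print(final_grade(7.5, 7.5))  # weighted sum is exactly 7.5
    print(final_grade(7, 8))      # weighted sum is 7.4
```

Under this reading, a weighted sum of exactly 7.5 rounds up to 8, while 7.4 rounds down to 7.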
Course materials
Homeworks
Recommended literature
- Sébastien Bubeck, Nicolò Cesa-Bianchi. Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems. Chapter 2. http://sbubeck.com/SurveyBCB12.pdf
- Richard S. Sutton, Andrew G. Barto. Reinforcement Learning: An Introduction. Chapter 2. http://incompleteideas.net/book/the-book-2nd.html
- Botao Hao et al. Bootstrapping Upper Confidence Bound. https://arxiv.org/abs/1906.05247
- Aleksandrs Slivkins. Introduction to Multi-Armed Bandits. Chapter 1. https://arxiv.org/abs/1904.07272