Reinforcement learning 2022 2023: differences between revisions
From Wiki - Faculty of Computer Science
(9 intermediate revisions by the same user are not shown)
Line 22:
  [https://docs.google.com/spreadsheets/d/1MPWVIkgxyotHU-P5cE7Gik4C6RTWxTnAVK8Btl7Fw3Y/edit?usp=sharing '''Table with grades''']
+ == Course materials ==
+ *[https://www.overleaf.com/read/kbzmvxdzbrxq '''Lecture and seminar notes''']
+ *[https://colab.research.google.com/drive/10qBq7Ot_1ZpnTeD11P5AnE8jFVj0OLXl?usp=sharing '''Notebook for the first seminar''']
  == Recommended literature ==
  * Sébastien Bubeck, Nicolò Cesa-Bianchi. Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems. Chapter 2. http://sbubeck.com/SurveyBCB12.pdf
  * Richard S. Sutton, Andrew G. Barto. Reinforcement Learning: An Introduction. Chapter 2. http://incompleteideas.net/book/the-book-2nd.html
  * Botao Hao et al. Bootstrapping Upper Confidence Bound. https://arxiv.org/abs/1906.05247
+ * Aleksandrs Slivkins. Introduction to Multi-Armed Bandits. Chapter 1. https://arxiv.org/abs/1904.07272
  == Homeworks ==
+ *[https://github.com/svsamsonov/Math_RL_2022_2023 '''HW #1, deadline: 04.12.22, 23:59''']
  == Projects ==
Current revision as of 14:39, 21 November 2022
Lecturers and Seminarists
Lecturer | Alexey Naumov | [anaumov@hse.ru] | T924 |
Seminarist | Sergey Samsonov | [svsamsonov@hse.ru] | T926 |
About the course
This page contains materials for the Mathematical Foundations of Reinforcement Learning course in the 2022/2023 academic year. The course is an elective for 2nd-year Master's students of the Math of Machine Learning program (HSE and Skoltech).
Grading
The final grade consists of two components, each a real number from 0 to 10 with no intermediate rounding:
- O_HW for the homework
- O_Project for the course project
The formula for the final grade is
- O_Final = 0.6 * O_HW + 0.4 * O_Project
with the usual (arithmetic) rounding rule.
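As an illustration, the weighted formula above can be sketched in Python. The function name `final_grade` and its parameters are hypothetical, and the sketch assumes that the "usual" rounding rule means rounding half up to the nearest integer; the page does not state this explicitly. `Decimal` is used because Python's built-in `round` rounds halves to the nearest even number, which would turn 6.5 into 6.

```python
from decimal import Decimal, ROUND_HALF_UP

# Weights taken from the grading formula on this page.
HW_WEIGHT = Decimal("0.6")
PROJECT_WEIGHT = Decimal("0.4")


def final_grade(o_hw, o_project):
    """Return the rounded final grade for two component scores in [0, 10]."""
    hw = Decimal(str(o_hw))
    project = Decimal(str(o_project))
    for score in (hw, project):
        if not Decimal(0) <= score <= Decimal(10):
            raise ValueError("each component must lie in [0, 10]")
    # No intermediate rounding: the components are combined exactly.
    raw = HW_WEIGHT * hw + PROJECT_WEIGHT * project
    # Assumption: "usual (arithmetic) rounding" = round half up to an integer.
    return int(raw.quantize(Decimal("1"), rounding=ROUND_HALF_UP))
```

For example, under this assumption `final_grade(8, 7)` combines to 7.6 and rounds to 8, while `final_grade(7, 8)` combines to 7.4 and rounds to 7.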
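Course materials
- Lecture and seminar notes: https://www.overleaf.com/read/kbzmvxdzbrxq
- Notebook for the first seminar: https://colab.research.google.com/drive/10qBq7Ot_1ZpnTeD11P5AnE8jFVj0OLXl?usp=sharing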
Recommended literature
- Sébastien Bubeck, Nicolò Cesa-Bianchi. Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems. Chapter 2. http://sbubeck.com/SurveyBCB12.pdf
- Richard S. Sutton, Andrew G. Barto. Reinforcement Learning: An Introduction. Chapter 2. http://incompleteideas.net/book/the-book-2nd.html
- Botao Hao et al. Bootstrapping Upper Confidence Bound. https://arxiv.org/abs/1906.05247
- Aleksandrs Slivkins. Introduction to Multi-Armed Bandits. Chapter 1. https://arxiv.org/abs/1904.07272