Reinforcement learning 2021 2022
From Wiki - Faculty of Computer Science
- Seminar 09.11: PDF (https://www.dropbox.com/s/wc951vseud1q1p2/Seminar_09_11_RL.pdf?dl=0), video (https://www.dropbox.com/s/2h83vbjgew1inen/Seminar_1_RL.mp4?dl=0), notebook (https://www.dropbox.com/s/bxa8h9vjrnegsql/Bandit_intro_strategies_09_11_2021.ipynb?dl=0)
Revision as of 23:20, 9 November 2021
Lecturers and Seminarists
Role | Name | Email | Room |
Lecturer | Naumov Alexey | anaumov@hse.ru | T924 |
Lecturer | Denis Belomestny | dbelomestny@hse.ru | T924 |
Seminarist | Samsonov Sergey | svsamsonov@hse.ru | T926 |
Seminarist | Maxim Kaledin | mkaledin@hse.ru | T926 |
About the course
This page contains materials for the Mathematical Foundations of Reinforcement Learning course in the 2021/2022 academic year, an elective for 2nd-year Master's students of the Math of Machine Learning program (HSE and Skoltech).
Grading
The final grade consists of 2 components, each a real number from 0 to 10 with no intermediate rounding:
- O_HW: the grade for the homework assignments
- O_Project: the grade for the course project
The final grade is computed as
- O_Final = 0.5 * O_HW + 0.5 * O_Project
and rounded to the nearest integer using the usual arithmetic rounding rule.
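A minimal sketch of the grading formula in Python, the language of the course notebooks. It assumes that "the usual arithmetic rounding rule" means rounding halves up (7.5 becomes 8); the function name `final_grade` is hypothetical. Python's built-in `round()` rounds halves to the nearest even integer, so the sketch uses `decimal.ROUND_HALF_UP` instead.

```python
from decimal import Decimal, ROUND_HALF_UP

HALF = Decimal("0.5")


def final_grade(o_hw: float, o_project: float) -> int:
    """Hypothetical helper: O_Final = 0.5 * O_HW + 0.5 * O_Project, rounded half up."""
    for name, value in (("o_hw", o_hw), ("o_project", o_project)):
        # Each component is a real number from 0 to 10.
        if not 0 <= value <= 10:
            raise ValueError(f"{name} must lie in [0, 10], got {value}")
    # No intermediate rounding: combine the unrounded components first.
    # Converting via str() avoids binary float artifacts such as 7.4999999...
    raw = HALF * Decimal(str(o_hw)) + HALF * Decimal(str(o_project))
    # Assumed arithmetic rounding: halves go up. The built-in round() would
    # instead give round(6.5) == 6 (round half to even).
    return int(raw.quantize(Decimal("1"), rounding=ROUND_HALF_UP))


print(final_grade(7, 8))      # 8  (raw 7.5)
print(final_grade(6, 7))      # 7  (raw 6.5)
print(final_grade(8.1, 6.9))  # 8  (raw 7.50)
```

Exact decimal arithmetic is used so that scores like 8.1 and 6.9 combine to exactly 7.5 before rounding, which matches the "no intermediate rounding" requirement.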
Lectures
Seminars
Recommended literature
Lecture and seminar 09.11
- Sébastien Bubeck, Nicolò Cesa-Bianchi. Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems. Chapter 2. http://sbubeck.com/SurveyBCB12.pdf
- Richard S. Sutton, Andrew G. Barto. Reinforcement Learning: An Introduction. Chapter 2. http://incompleteideas.net/book/the-book-2nd.html