Reinforcement learning 2022 2023: difference between revisions

Current revision as of 14:39, 21 November 2022

Contents

1 Lecturers and Seminarists
2 About the course
3 Grading
4 Course materials
5 Recommended literature
6 Homeworks
7 Projects

Lecturers and Seminarists

Lecturer	Alexey Naumov	[anaumov@hse.ru]	T924
Seminarist	Sergey Samsonov	[svsamsonov@hse.ru]	T926

About the course

This page contains materials for the Mathematical Foundations of Reinforcement Learning course in the 2022/2023 academic year, an elective for 2nd-year Master's students of the Math of Machine Learning program (HSE and Skoltech).

Grading

The final grade consists of two components, each a real number from 0 to 10 with no intermediate rounding:

O_HW for the homework assignments
O_Project for the course project

The formula for the final grade is

O_Final = 0.6*O_HW + 0.4*O_Project

with the usual (arithmetical) rounding rule.
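
The formula above can be sketched in Python as follows. This is only an illustration, not an official grading script: the function name final_grade is hypothetical, and the sketch assumes that "usual (arithmetical) rounding" means rounding halves up to the nearest integer (so 6.5 becomes 7). Python's built-in round() rounds halves to the nearest even number, so the decimal module is used instead.

```python
from decimal import Decimal, ROUND_HALF_UP

# Weights taken from the formula O_Final = 0.6*O_HW + 0.4*O_Project.
HW_WEIGHT = Decimal("0.6")
PROJECT_WEIGHT = Decimal("0.4")


def final_grade(o_hw, o_project):
    """Return the rounded final grade (hypothetical helper).

    Assumption: halves are rounded up, e.g. 6.5 -> 7.
    """
    # Convert through str() so that 7.5 becomes Decimal("7.5") exactly.
    hw = Decimal(str(o_hw))
    project = Decimal(str(o_project))

    # Each component must lie in the range from 0 to 10.
    for name, value in (("O_HW", hw), ("O_Project", project)):
        if not Decimal(0) <= value <= Decimal(10):
            raise ValueError(f"{name} must be between 0 and 10, got {value}")

    # No intermediate rounding: only the weighted sum is rounded.
    raw = HW_WEIGHT * hw + PROJECT_WEIGHT * project
    return int(raw.quantize(Decimal("1"), rounding=ROUND_HALF_UP))


if __name__ == "__main__":
    print(final_grade(7.5, 8))    # weighted sum 7.7
    print(final_grade(6.5, 6.5))  # weighted sum 6.5, a boundary case
```

Under the half-up assumption, a weighted sum of exactly 6.5 yields 7, whereas round(6.5) in Python would give 6.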

Table with grades

Course materials

Lectures and seminars notes: https://www.overleaf.com/read/kbzmvxdzbrxq
Notebook for the first seminar: https://colab.research.google.com/drive/10qBq7Ot_1ZpnTeD11P5AnE8jFVj0OLXl?usp=sharing

Recommended literature

Sébastien Bubeck, Nicolò Cesa-Bianchi. Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems. Chapter 2. http://sbubeck.com/SurveyBCB12.pdf
Richard S. Sutton, Andrew G. Barto. Reinforcement Learning: An Introduction. Chapter 2. http://incompleteideas.net/book/the-book-2nd.html
Botao Hao et al. Bootstrapping Upper Confidence Bound. https://arxiv.org/abs/1906.05247
Aleksandrs Slivkins. Introduction to Multi-Armed Bandits. Chapter 1. https://arxiv.org/abs/1904.07272

Homeworks

HW #1, deadline: 04.12.22, 23:59

@@ Line 4: / Line 4: @@
 |-
 || Lecturer || [https://www.hse.ru/staff/anaumov Alexey Naumov] || [anaumov@hse.ru] || T924
-|-
-|| Lecturer || [https://www.hse.ru/org/persons/93130881 Denis Belomestny ] || [dbelomestny@hse.ru] || T924
 |-
 || Seminarist || [https://www.hse.ru/org/persons/219484540 Sergey Samsonov] || [svsamsonov@hse.ru] || T926
-|-
-|| Seminarist || [https://www.hse.ru/staff/mkaledin Maxim Kaledin ] || [mkaledin@hse.ru] || T926
 |-
 |}
 == About the course ==
-This page contains materials for Mathematical Foundations of Reinforcement learning course in 2021/2022 year, optional one for 2nd year Master students of the Math of Machine Learning program (HSE and Skoltech).
+This page contains materials for Mathematical Foundations of Reinforcement learning course in 2022/2023 year, optional one for 2nd year Master students of the Math of Machine Learning program (HSE and Skoltech).
 == Grading ==
@@ Line 21: / Line 17: @@
 * O<sub>Project</sub> for the course project
 The formula for the final grade is
-* O<sub>Final</sub> = 0.5*O<sub>HW</sub> + 0.5*O<sub>Project</sub>
+* O<sub>Final</sub> = 0.6*O<sub>HW</sub> + 0.4*O<sub>Project</sub>
 with the usual (arithmetical) rounding rule.
 [https://docs.google.com/spreadsheets/d/1MPWVIkgxyotHU-P5cE7Gik4C6RTWxTnAVK8Btl7Fw3Y/edit?usp=sharing '''Table with grades''']
-== Lectures ==
+== Course materials ==
-*[https://www.dropbox.com/s/a69ql9duo5jf5gt/Math%20of%20RL%20Lecture%201.pdf?dl=0 ''' Lecture 09.11''']
+*[https://www.overleaf.com/read/kbzmvxdzbrxq '''Lectures and seminars notes''']
-*[https://www.dropbox.com/s/7zkirk1xykua890/Math_of_RL_Le%20cture_2.pdf?dl=0 ''' Lecture 16.11''']
+*[https://colab.research.google.com/drive/10qBq7Ot_1ZpnTeD11P5AnE8jFVj0OLXl?usp=sharing '''Notebook for the first seminar''']
-== Seminars ==
-*[https://www.dropbox.com/s/wc951vseud1q1p2/Seminar_09_11_RL.pdf?dl=0 '''Seminar 09.11'''], [https://www.dropbox.com/s/2h83vbjgew1inen/Seminar_1_RL.mp4?dl=0 '''Seminar 09.11, Video'''], [https://www.dropbox.com/s/bxa8h9vjrnegsql/Bandit_intro_strategies_09_11_2021.ipynb?dl=0 '''Seminar 09.11, Notebook''']
-*[https://www.dropbox.com/s/cq0t2o6n4yn6oag/Seminar_16_11_RL.mp4?dl=0 '''Seminar 16.11, Video'''],
-*[https://www.dropbox.com/s/ex8v9w3smar70m7/Seminar_23_11_RL.mp4?dl=0 '''Seminar 23.11, Video'''],
-*[https://www.dropbox.com/s/v1ywnk8eyhourjq/Seminar_07_12_RL.mp4?dl=0 '''Seminar 07.12, Video'''],
 == Recommended literature ==
-'''Lecture and seminar 09.11'''
 * Sebastien Bubek, Nicolo Cesa-Bianchi. Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems. Chapter 2. http://sbubeck.com/SurveyBCB12.pdf
 * Richard S. Sutton, Andrew G. Barto. Reinforcement Learning: An Introduction. Chapter 2. http://incompleteideas.net/book/the-book-2nd.html;
 * Botao Hao et al. Bootstrapping Upper Confidence Bound. https://arxiv.org/abs/1906.05247
+* Aleksandrs Slivkins. Introduction to Multi-Armed Bandits. https://arxiv.org/abs/1904.07272 [Chapter 1]
-'''Lecture and seminar 16.11'''
-*[https://www.dropbox.com/s/wc951vseud1q1p2/Seminar_09_11_RL.pdf?dl=0 '''Seminar 09.11'''], [https://www.dropbox.com/s/2h83vbjgew1inen/Seminar_1_RL.mp4?dl=0 '''Seminar 09.11, Video'''],
 ==Homeworks ==
-*[https://www.dropbox.com/s/k2at9lixvshpcbw/HW_1_RL_2021.pdf?dl=0 '''Homework №1, deadline 19.12.2021, 23:59'''], [https://www.dropbox.com/s/l7pma6kwnopl856/HW_1_task_2.ipynb?dl=0 '''Environment for task №2'''],
+*[https://github.com/svsamsonov/Math_RL_2022_2023 '''HW #1, deadline: 04.12.22, 23:59''']
-*[https://www.dropbox.com/s/jynwji3dw3xxjww/HW_2_RL_2021.pdf?dl=0 '''Homework №2, deadline 19.12.2021, 23:59'''].
 == Projects ==
