Reinforcement learning 2022 2023: differences between revisions

From Wiki - Faculty of Computer Science
 
(5 intermediate revisions by the same user not shown)
Line 24:
 
== Course materials ==
 
*[https://www.overleaf.com/read/kbzmvxdzbrxq '''Lectures and seminars notes''']
 
 +
*[https://colab.research.google.com/drive/10qBq7Ot_1ZpnTeD11P5AnE8jFVj0OLXl?usp=sharing '''Notebook for the first seminar''']
  
 
== Recommended literature ==
 
'''Lecture and seminar 09.11'''
 
  
 
* Sebastien Bubeck, Nicolo Cesa-Bianchi. Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems. Chapter 2. http://sbubeck.com/SurveyBCB12.pdf
 
 
* Richard S. Sutton, Andrew G. Barto. Reinforcement Learning: An Introduction. Chapter 2. http://incompleteideas.net/book/the-book-2nd.html
 
 
* Botao Hao et al. Bootstrapping Upper Confidence Bound. https://arxiv.org/abs/1906.05247
 
 
+
* Aleksandrs Slivkins. Introduction to Multi-Armed Bandits. https://arxiv.org/abs/1904.07272 [Chapter 1]
'''Lecture and seminar 16.11'''
+
*[https://www.dropbox.com/s/wc951vseud1q1p2/Seminar_09_11_RL.pdf?dl=0 '''Seminar 09.11'''], [https://www.dropbox.com/s/2h83vbjgew1inen/Seminar_1_RL.mp4?dl=0 '''Seminar 09.11, Video''']
+
  
 
== Homeworks ==
 
*[https://www.dropbox.com/s/k2at9lixvshpcbw/HW_1_RL_2021.pdf?dl=0 '''Homework №1, deadline 19.12.2021, 23:59'''], [https://www.dropbox.com/s/l7pma6kwnopl856/HW_1_task_2.ipynb?dl=0 '''Environment for task №2'''],
+
*[https://github.com/svsamsonov/Math_RL_2022_2023 '''HW #1, deadline: 04.12.22, 23:59''']
*[https://www.dropbox.com/s/jynwji3dw3xxjww/HW_2_RL_2021.pdf?dl=0 '''Homework №2, deadline 19.12.2021, 23:59'''].
+
  
 
== Projects ==
 

Current revision as of 14:39, 21 November 2022

Lecturers and Seminarists

Lecturer Alexey Naumov [anaumov@hse.ru] T924
Seminarist Sergey Samsonov [svsamsonov@hse.ru] T926

About the course

This page contains materials for the Mathematical Foundations of Reinforcement Learning course in the 2022/2023 academic year, an optional course for 2nd-year Master's students of the Math of Machine Learning program (HSE and Skoltech).

Grading

The final grade consists of 2 components (each is a non-negative real number from 0 to 10, without any intermediate rounding):

  • OHW for the homework assignments
  • OProject for the course project

The formula for the final grade is

  • OFinal = 0.6*OHW + 0.4*OProject

with the usual (arithmetic) rounding rule.
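
As an illustration, the grading formula can be sketched in Python. This is a minimal sketch, not an official grade calculator: the function name `final_grade` is hypothetical, and it assumes that the "usual" rounding rule means rounding halves up to the nearest integer. Python's built-in `round` rounds halves to the nearest even integer, so the sketch uses the `decimal` module instead.

```python
from decimal import Decimal, ROUND_HALF_UP


def final_grade(o_hw: float, o_project: float) -> int:
    """Compute OFinal = 0.6*OHW + 0.4*OProject, rounded once at the end.

    Assumption: "usual rounding" means halves round up (7.5 -> 8).
    """
    # Each component must be a real number from 0 to 10.
    for name, value in (("o_hw", o_hw), ("o_project", o_project)):
        if not 0 <= value <= 10:
            raise ValueError(f"{name} must be in [0, 10], got {value}")

    # Work in Decimal so that no intermediate rounding or binary
    # floating-point error affects a borderline value such as 6.5.
    raw = (Decimal("0.6") * Decimal(str(o_hw))
           + Decimal("0.4") * Decimal(str(o_project)))
    return int(raw.quantize(Decimal("1"), rounding=ROUND_HALF_UP))


# Example: OHW = 8, OProject = 7 gives 0.6*8 + 0.4*7 = 7.6
print(final_grade(8, 7))
```

With these assumptions, a borderline raw score such as 0.6*6 + 0.4*7.25 = 6.5 is rounded up to 7, whereas the built-in `round(6.5)` would return 6.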

Table with grades

Course materials

Recommended literature

Homeworks

Projects