Reinforcement learning 2021 2022: differences between versions

From Wiki - Faculty of Computer Science
 
 
(13 intermediate revisions by the same user are not shown)
 
{| class="wikitable" style="text-align:center"
 
|-
 
|| Lecturer || [https://www.hse.ru/staff/anaumov Alexey Naumov] || [anaumov@hse.ru] || T924
 
|-  
 
|| Lecturer || [https://www.hse.ru/org/persons/93130881 Denis Belomestny] || [dbelomestny@hse.ru] || T924
 
|-  
 
|| Seminarist || [https://www.hse.ru/org/persons/219484540 Sergey Samsonov] || [svsamsonov@hse.ru] || T926
 
|-
 
|| Seminarist || [https://www.hse.ru/staff/mkaledin Maxim Kaledin] || [mkaledin@hse.ru] || T926
 
  
 
== Lectures ==
 
*[https://www.dropbox.com/s/a69ql9duo5jf5gt/Math%20of%20RL%20Lecture%201.pdf?dl=0 '''Lecture 09.11''']
*[https://www.dropbox.com/s/7zkirk1xykua890/Math_of_RL_Le%20cture_2.pdf?dl=0 '''Lecture 16.11''']
  
 
== Seminars ==
 
*[https://www.dropbox.com/s/wc951vseud1q1p2/Seminar_09_11_RL.pdf?dl=0 '''Seminar 09.11'''], [https://www.dropbox.com/s/2h83vbjgew1inen/Seminar_1_RL.mp4?dl=0 '''Seminar 09.11, Video'''], [https://www.dropbox.com/s/bxa8h9vjrnegsql/Bandit_intro_strategies_09_11_2021.ipynb?dl=0 '''Seminar 09.11, Notebook''']
*[https://www.dropbox.com/s/cq0t2o6n4yn6oag/Seminar_16_11_RL.mp4?dl=0 '''Seminar 16.11, Video''']
*[https://www.dropbox.com/s/ex8v9w3smar70m7/Seminar_23_11_RL.mp4?dl=0 '''Seminar 23.11, Video''']
*[https://www.dropbox.com/s/v1ywnk8eyhourjq/Seminar_07_12_RL.mp4?dl=0 '''Seminar 07.12, Video''']
 
== Recommended literature ==
 
'''Lecture and seminar 09.11'''
* Sébastien Bubeck, Nicolò Cesa-Bianchi. Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems. Chapter 2. http://sbubeck.com/SurveyBCB12.pdf
* Richard S. Sutton, Andrew G. Barto. Reinforcement Learning: An Introduction. Chapter 2. http://incompleteideas.net/book/the-book-2nd.html
* Botao Hao et al. Bootstrapping Upper Confidence Bound. https://arxiv.org/abs/1906.05247
 
'''Lecture and seminar 16.11'''
*[https://www.dropbox.com/s/wc951vseud1q1p2/Seminar_09_11_RL.pdf?dl=0 '''Seminar 09.11'''], [https://www.dropbox.com/s/2h83vbjgew1inen/Seminar_1_RL.mp4?dl=0 '''Seminar 09.11, Video''']
  
 
== Homeworks ==
 
*[https://www.dropbox.com/s/k2at9lixvshpcbw/HW_1_RL_2021.pdf?dl=0 '''Homework №1, deadline 19.12.2021, 23:59'''], [https://www.dropbox.com/s/l7pma6kwnopl856/HW_1_task_2.ipynb?dl=0 '''Environment for task №2''']
*[https://www.dropbox.com/s/jynwji3dw3xxjww/HW_2_RL_2021.pdf?dl=0 '''Homework №2, deadline 19.12.2021, 23:59'''].
  
 
== Projects ==
 
 
== Recommended literature (1st term) ==
 
*http://www.statslab.cam.ac.uk/~james/Markov/ - Cambridge lecture notes on discrete-time Markov chains

*https://link.springer.com/book/10.1007%2F978-3-319-97704-1 - book by E. Moulines et al.; the most relevant chapters are 1, 2, 7, and 9 (the book can be downloaded through the HSE network)

*https://link.springer.com/book/10.1007%2F978-3-319-62226-2 - Stochastic Calculus by P. Baldi, a good overview of conditional probabilities and expectations (part 4; also accessible through the HSE network)

*https://link.springer.com/book/10.1007%2F978-1-4419-9634-3 - Probability for Statistics and Machine Learning by A. Dasgupta, chapter 19 (MCMC); also accessible through the HSE network
 

Current version as of 22:49, 14 December 2021

Lecturers and Seminarists

Lecturer Alexey Naumov [anaumov@hse.ru] T924
Lecturer Denis Belomestny [dbelomestny@hse.ru] T924
Seminarist Sergey Samsonov [svsamsonov@hse.ru] T926
Seminarist Maxim Kaledin [mkaledin@hse.ru] T926

About the course

This page contains materials for the Mathematical Foundations of Reinforcement Learning course in the 2021/2022 academic year, an optional course for second-year Master's students of the Math of Machine Learning program (HSE and Skoltech).

Grading

The final grade consists of two components, each a non-negative real number from 0 to 10 (no intermediate rounding is applied):

  • O_HW for the homework assignments
  • O_Project for the course project

The formula for the final grade is

  • O_Final = 0.5 * O_HW + 0.5 * O_Project

with the usual (arithmetical) rounding rule.
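To make the grading rule concrete, here is a minimal Python sketch that combines the homework and project grades with equal weights and rounds only once, at the end. It assumes that "the usual (arithmetical) rounding rule" means rounding to the nearest integer with exact halves rounded up; the function name `final_grade` is hypothetical and not part of any course materials.

```python
from decimal import Decimal, ROUND_HALF_UP


def final_grade(o_hw: float, o_project: float) -> int:
    """Hypothetical helper: O_Final = 0.5 * O_HW + 0.5 * O_Project, rounded half up.

    Assumption: "usual (arithmetical) rounding" means rounding to the nearest
    integer with exact halves rounded up (e.g. 7.5 -> 8). Python's built-in
    round() uses banker's rounding (round(6.5) == 6), so Decimal is used instead.
    """
    for name, value in (("O_HW", o_hw), ("O_Project", o_project)):
        if not 0 <= value <= 10:
            raise ValueError(f"{name} must lie in [0, 10], got {value}")
    # Weighted sum of the unrounded components (no intermediate rounding).
    # Converting via str() avoids binary floating-point artifacts such as 6.2 -> 6.2000000000000001776...
    raw = Decimal(str(o_hw)) * Decimal("0.5") + Decimal(str(o_project)) * Decimal("0.5")
    # Round exactly once, at the very end.
    return int(raw.quantize(Decimal("1"), rounding=ROUND_HALF_UP))
```

For example, O_HW = 5 and O_Project = 8 give a raw score of 6.5, which this sketch rounds to 7, whereas Python's `round(6.5)` would return 6.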

Table with grades

Lectures

Seminars

Recommended literature

Lecture and seminar 09.11

Lecture and seminar 16.11

Homeworks

Projects