Data analysis (Software Engineering) 2020: differences between revisions

From Wiki - Faculty of Computer Science
(15 intermediate revisions by 3 users not shown)
Line 22: Line 22:
  
 
== Course Schedule (3rd module)==
 
== Course Schedule (3rd module)==
===Seminars===
 
'''Dates: Tuesdays (..)'''
 
* Group BPI-xxx, xx:xx-xx:xx, Room xxx
 
 
 
===Lectures===
 
===Lectures===
'''Dates: Mondays (..)'''
+
'''Mondays'''
* xx:xx-xx:xx, Room xxx
+
* 10:30-11:50, Room R205
 
+
[Complete Schedule of Software Engineering]
+
  
 
== Lecture materials ==
 
== Lecture materials ==
  
 
'''Lecture 1. Introduction to data science and machine learning ''' <br/>
 
'''Lecture 1. Introduction to data science and machine learning ''' <br/>
[https://shestakoff.github.io/hse_se_ml/2019/l1-intro/lecture-intro.slides#/ Slides] <br/>
+
[https://shestakoff.github.io/hse_se_ml/2020/l01-intro/lecture-intro.slides#/ Slides] <br/>
 +
 
 +
'''Lecture 2. Metric-based methods. K-NN ''' <br/>
 +
[https://shestakoff.github.io/hse_se_ml/2020/l02-knn/lecture-knn.slides#/ Slides] <br/>
 +
 
 +
'''Lecture 3. Decision Trees''' <br/>
 +
[https://shestakoff.github.io/hse_se_ml/2020/l03-trees/lecture-trees.slides#/ Slides] <br/>
 +
 
 +
'''Lecture 4. Linear Regression''' <br/>
 +
[https://shestakoff.github.io/hse_se_ml/2020/l04-linreg/lecture-linreg.slides#/ Slides] <br/>
  
 
== Seminars ==
 
== Seminars ==
  
 
'''Seminar 1. Introduction to Data Analysis in Python '''<br/>
 
'''Seminar 1. Introduction to Data Analysis in Python '''<br/>
[https://github.com/shestakoff/hse_se_ml/tree/master/2019/s1-intro Practice in class] <br/>
+
[https://github.com/shestakoff/hse_se_ml/tree/master/2020/s01-intro-to-python Practice in class] <br/>
 +
[https://github.com/shestakoff/hse_se_ml/blob/master/2020/s01-intro-to-python/seminar1-homework.ipynb Homework 1] '''Due Date: 28.01.2020 23:59'''<br/>
 +
 
 +
'''Seminar 2. Metric-based methods. K-NN'''<br/>
 +
[https://github.com/shestakoff/hse_se_ml/tree/master/2020/s02-metric-based-methods%20 Practice in class] <br/>
 +
[https://github.com/shestakoff/hse_se_ml/blob/master/2020/s02-metric-based-methods%20/seminar2-homework.ipynb Homework 2] '''Due Date: 04.02.2020 23:59'''<br/>
 +
 
 +
'''Seminar 3. Decision Trees'''<br/>
 +
[https://github.com/shestakoff/hse_se_ml/tree/master/2020/s03-decision-trees Practice in class] <br/>
 +
[https://github.com/shestakoff/hse_se_ml/blob/master/2020/s03-decision-trees//seminar3-homework.ipynb Homework 3] '''Due Date: 01.03.2020 23:59'''<br/>
 +
 
 +
'''Seminar 4. Linear Regression'''<br/>
 +
[https://github.com/shestakoff/hse_se_ml/tree/master/2020/s04-linear-regression Practice in class] <br/>
 +
 
 +
== Theoretical questions for the colloquium ==
 +
 
 +
[https://github.com/shestakoff/hse_se_ml/blob/master/2020/s02-metric-based-methods%20/knn_theory.pdf Metric-based methods. K-NN] <br/>
 +
[https://github.com/shestakoff/hse_se_ml/blob/master/2020/s03-decision-trees/trees_theory.pdf Decision Trees] <br/>
 +
[https://github.com/shestakoff/hse_se_ml/blob/master/2020/s04-linear-regression/linreg_theory.pdf Linear Regression] <br/>
 +
 
 +
 
  
 
== Evaluation criteria ==
 
== Evaluation criteria ==

Revision as of 23:30, 22 February 2020

Slack Invite Link
Anonymous feedback form: here
Previous Course Page
Course repo



Course description

In this class we consider the main problems of data mining and machine learning: classification, clustering, regression, dimensionality reduction, ranking and collaborative filtering. We will also study the mathematical methods and concepts on which data analysis is based, as well as the formal assumptions behind them and various aspects of their implementation.

Significant attention is given to practical data analysis skills, which will be developed in seminars by studying the Python programming language and relevant libraries for scientific computing.

Knowledge of linear algebra, real analysis and probability theory is required.

The class consists of:

  1. Lectures and seminars
  2. Practical and theoretical homework assignments
  3. A machine learning competition (more information will be available later)
  4. Midterm theoretical colloquium
  5. Final exam

Course Schedule (3rd module)

Lectures

Mondays

  • 10:30-11:50, Room R205

Lecture materials

Lecture 1. Introduction to data science and machine learning
Slides

Lecture 2. Metric-based methods. K-NN
Slides

Lecture 3. Decision Trees
Slides

Lecture 4. Linear Regression
Slides

Seminars

Seminar 1. Introduction to Data Analysis in Python
Practice in class
Homework 1 Due Date: 28.01.2020 23:59

Seminar 2. Metric-based methods. K-NN
Practice in class
Homework 2 Due Date: 04.02.2020 23:59

Seminar 3. Decision Trees
Practice in class
Homework 3 Due Date: 01.03.2020 23:59

Seminar 4. Linear Regression
Practice in class

Theoretical questions for the colloquium

Metric-based methods. K-NN
Decision Trees
Linear Regression


Evaluation criteria

The course runs through the 3rd and 4th modules. Students' knowledge is assessed through their home assignments and exams. There are two exams during the course: one after the 3rd module and one after the 4th module. Each exam evaluates theoretical knowledge and understanding of the material studied during the respective module.

Passing grades take the values 4, 5, …, 10. Grades of 1, 2 and 3 are considered unsatisfactory. Exact grades are calculated using the following rule:

  • score ≥ 35% => 4,
  • score ≥ 45% => 5,
  • ...
  • score ≥ 95% => 10,

where score is calculated using the following rule:

score = 0.7 * S_cumulative + 0.3 * S_exam2
S_cumulative = 0.8 * S_homework + 0.2 * S_exam1 + 0.2 * S_competition

  • S_homework – proportion of correctly solved homework,
  • S_exam1 – proportion of successfully answered theoretical questions during the exam after module 3,
  • S_exam2 – proportion of successfully answered theoretical questions during the exam after module 4,
  • S_competition – score for the machine learning competition (also from 0 to 1).
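
As an illustration only, the scoring rule above can be sketched in Python. The function names are hypothetical and not part of the course materials. The page shows only the thresholds for grades 4, 5 and 10; the sketch assumes the elided thresholds continue in 10-point steps (55%, 65%, 75%, 85%), which is a guess consistent with the endpoints. The page also does not say how scores below 35% map to grades 1 to 3, so the sketch returns None in that case.

```python
from typing import Optional


def cumulative_score(homework: float, exam1: float, competition: float = 0.0) -> float:
    """Cumulative score from the page's formula; all inputs are in [0, 1].

    The competition is optional, so it defaults to 0. Its 0.2 weight comes
    on top of the 0.8 + 0.2 for homework and exam 1 (extra points).
    """
    return 0.8 * homework + 0.2 * exam1 + 0.2 * competition


def final_score(cumulative: float, exam2: float) -> float:
    """Final score: 0.7 * cumulative score + 0.3 * second exam."""
    return 0.7 * cumulative + 0.3 * exam2


# (threshold in percent, grade), highest first.
# ASSUMPTION: only 35 -> 4, 45 -> 5 and 95 -> 10 appear on the page;
# the intermediate 10-point steps are inferred, not confirmed.
GRADE_THRESHOLDS = [(95, 10), (85, 9), (75, 8), (65, 7), (55, 6), (45, 5), (35, 4)]


def grade(score: float) -> Optional[int]:
    """Map a score in [0, 1] (or above, with extra points) to a grade.

    Returns None below 35%, because the page does not specify how
    unsatisfactory grades (1 to 3) are assigned.
    """
    percent = score * 100
    for threshold, value in GRADE_THRESHOLDS:
        if percent >= threshold:
            return value
    return None


if __name__ == "__main__":
    cum = cumulative_score(homework=0.9, exam1=0.8)
    total = final_score(cum, exam2=0.7)
    print(f"cumulative={cum:.3f} score={total:.3f} grade={grade(total)}")
```

Because the competition term is added on top of the other two weights, the cumulative score can exceed 1 for a student who does well everywhere; the sketch simply lets such scores map to grade 10.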

Participation in the machine learning competition is optional and can earn students extra points.
An "automatic" pass of the course based on the cumulative score may be granted.

Plagiarism

If plagiarism is discovered, zero points will be given for the home assignments, for both works found to be identical. In case of repeated plagiarism by the same person, a report will be made to the dean.

Deadlines

Assignments sent after the late deadline will not be scored (they receive a zero score) unless there is a legitimate reason for the late submission. A high workload in other classes is not a legitimate reason.

Structure of emails and homework submissions

Practical assignments must be submitted in Jupyter Notebook format, theoretical ones in PDF. Practical assignments must use Python 3 (or be Python 3 compatible). Use your surname as the filename for assignments (e.g. Ivanov.ipynb). Do not archive your assignments.

Assignments can be completed in either Russian or English.

Assignments can be submitted only once!

Useful links

Machine learning, Stats, Maths

Python

Python installation and configuration

anaconda