Data Science Case Studies (JD SAS) 21/22


Timetable of classes

Classes are held on Saturdays.

Link to all lectures and workshops: https://zoom.us/j/99220349786?pwd=WjFZTEFiQzA4b1lzVmVDbXdmNVMwUT09

Conference ID: 992 2034 9786
Code: 476860

9:30 - Seminar in Russian - Economics 3 year (with "Открытие")
11:10 - Seminar in Russian - Economics 4 year (with "Открытие")
13:00 - English lecture (recording) - DSBA and ICEF
14:40 - Russian lecture (recording) - AMI, Economics
16:20 - Russian seminar (recording) - AMI, Economics
18:10 - Russian seminar (recording) - DSBA1 and ICEF
19:40 - English seminar - DSBA2

When attending lectures and seminars on Zoom, please rename yourself as «Prefix_Last name First name», choosing the prefix from ICEF, DSBA, AMI, Economics. For example, "DSBA_Oxlong Mike".

About course

This page contains links to materials for the 2021/2022 course for the following educational programs:

Educational program | Year | Faculty | Link to the DSCS page
Applied Mathematics and Computer Science | 3rd year | Faculty of Computer Science, NRU HSE | Business Data Analytics
Applied Data Analysis | 3rd year | Faculty of Computer Science, NRU HSE | Business Data Analytics
Economics | 3rd year | Faculty of Economic Sciences, NRU HSE | Business Data Analytics
Economics | 4th year | Faculty of Economic Sciences, NRU HSE | Business Data Analytics
Economics and Statistics | 3rd year | Faculty of Economic Sciences, NRU HSE | Business Data Analytics
Economics and Statistics | 4th year | Faculty of Economic Sciences, NRU HSE | Business Data Analytics
Double Degree Program in Economics at the Higher School of Economics and the University of London | 3rd year | International Institute of Economics and Finance, NRU HSE | No


Additional links:

Course program

The first module is taught to 3rd- and 4th-year students and gives an overview of the following sections:

  • Client analytics;
  • Text analytics;
  • Data analysis tasks in retail chains;
  • Fundamentals of risk assessment;
  • ModelOps.

The first module immerses students in current business issues and in the specifics of data analysis and analytical model building for each section. In this module, students are also introduced to SAS software.

The second module is a team project for 3rd-year students only.

Students will be divided into groups of 2-7 people, and each group will be given a practical task. This module gives students practical experience in analyzing data and in developing and building analytical models on real data.

Studying and using SAS software in «Data Analytics in Business» course

For the practical tasks, students are free to choose any of the following software tools: SAS, R, Python.

Students who plan to do the practical tasks on the SAS platform may take advanced online courses for free.

To access the course, you must contact the course instructor - Natalia Titova via Telegram.

Links to access SAS software:
https://sas-viya.cs.hse.ru/SASStudioV/main?locale=en_US
https://sas-viya.cs.hse.ru/SASStudioV/main?locale=ru_RU

A student who completes all the practical tasks in SAS and passes the course with an excellent grade will receive:

  • an academic SAS program completion certificate;
  • an Acclaim electronic badge confirming completion of the course and listing the SAS technologies used.

All interested students can take basic SAS online courses for free:

Students who are willing to spend extra time learning to program in SAS can try to obtain a professional certification for free within the SCYP program (SAS® Software Certified Young Professionals): link to the course.

Lectures

Saturday
DSBA and ICEF - 13:00 - 14:30;
AMI, Economics, Economics and Statistics, МК - 14:40 - 16:10

Link to join the lecture - https://zoom.us/j/99220349786?pwd=WjFZTEFiQzA4b1lzVmVDbXdmNVMwUT09
Conference ID: 992 2034 9786
Code: 476860

Section | Topic | Date for 3rd and 4th year | Presentation | Recording
Client analytics | Introduction to client and online analytics | 15.01.2022 | Lecture #1 (Russian)
Client analytics | Building predictive models and data visualization | 22.01.2022
Text analytics | Introduction to text data analysis tasks | 29.01.2022
Text analytics | Tools and methods of text analytics | 05.02.2022
Data analysis tasks in retail chains | Introduction to data analysis tasks in retail. Demand forecasting | 12.02.2022
Data analysis tasks in retail chains | Descriptive analytics in retail: store clustering, product segmentation, demand recovery | 19.02.2022
Data analysis tasks in retail chains | Inventory optimization in a retail chain, price optimization, assortment optimization | 26.02.2022
Fundamentals of risk assessment | Introduction to credit risk | 5.03.2022
Fundamentals of risk assessment | Introduction to market risk | 12.03.2022
Fundamentals of risk assessment | Model validation | 19.03.2022
ModelOps | Operationalization of machine learning models | 26.03.2022

Seminars

Saturday

FES 3rd year group - 9:30,
FES 4th year groups - 11:10,
AMI + МК group - 16:20,
DSBA 1 and ICEF group - 18:10,
DSBA 2 group - 19:40

Link to join the seminar - https://zoom.us/j/99220349786?pwd=WjFZTEFiQzA4b1lzVmVDbXdmNVMwUT09
Conference ID: 992 2034 9786
Code: 476860

Link to additional materials for the seminars

Section | Topic | Date for AMI, DSBA, ICEF and МК | Date for FES | Presentation | Recording
Client analytics | Introduction to client and online analytics | 15.01.2022 | 22.01.2022 | Seminar #1 (Russian)
Client analytics | Building predictive models and data visualization | 22.01.2022 | 29.01.2022
Text analytics | Introduction to text data analysis tasks | 29.01.2022 | 05.02.2022
Text analytics | Tools and methods of text analytics | 05.02.2022 | 12.02.2022
Data analysis tasks in retail chains | Introduction to data analysis tasks in retail. Demand forecasting | 12.02.2022 | 19.02.2022
Data analysis tasks in retail chains | Descriptive analytics in retail: store clustering, product segmentation, demand recovery | 19.02.2022 | 26.02.2022
Data analysis tasks in retail chains | Inventory optimization in a retail chain, price optimization, assortment optimization | 26.02.2022 | 5.03.2022
Fundamentals of risk assessment | Introduction to credit risk | 5.03.2022 | 12.03.2022
Fundamentals of risk assessment | Introduction to market risk | 12.03.2022 | 19.03.2022
Fundamentals of risk assessment | Model validation | 19.03.2022 | 26.03.2022
ModelOps | Operationalization of machine learning models | 26.03.2022 | 2.04.2022

Course report and grade evaluation

The course includes several forms of assessment:

  • 3 practical homework assignments
  • A written exam in the form of a multiple-choice test
  • Team project (only for 3rd-year students)

Criteria for assessing knowledge and skills

  • All homework assignments are graded on a 2-point scale, where «2» means the task is fully completed, «1» means the task is not completely solved or has slight mistakes, and «0» means the task is not solved or is solved incorrectly.

If the homework is divided into several parts, each part is graded on the 2-point scale described above, and the marks are then averaged with equal weights, without rounding.

Homework grades are converted from the 2-point scale to the 10-point scale by multiplying the grade by 5, without rounding.
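The conversion rule above can be sketched in Python as follows. This is only an illustration of the arithmetic: the function name homework_grade_10 is hypothetical and is not part of any course tooling, and exact fractions are used so that no rounding happens along the way.

```python
from fractions import Fraction


def homework_grade_10(part_marks):
    """Average 2-point part marks with equal weights, then scale to 10 points.

    part_marks: marks for each part of one homework, each between 0 and 2.
    """
    # Exact fractions avoid floating-point drift for marks such as 1.6.
    marks = [Fraction(str(mark)) for mark in part_marks]
    if not marks:
        raise ValueError("at least one part mark is required")
    if any(mark < 0 or mark > 2 for mark in marks):
        raise ValueError("each part mark must lie between 0 and 2")
    average = sum(marks) / len(marks)  # equal weights, no rounding
    return float(average * 5)          # 2-point scale -> 10-point scale
```

For example, a homework with two parts marked 2 and 1 averages to 1.5 on the 2-point scale, which corresponds to 7.5 on the 10-point scale.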

  • The grade for the exam is set on a 10-point scale.
  • The grade for the team project is also set on a 10-point scale.

How the final grade for the course is formed

Let's denote the grades for 3 homework assignments on a 10-point scale — O_1,O_2,O_3, and the grade for the exam at the end of the 1st module on a 10-point scale — O_ex.


The final grade for 4th-year students, O_final, is calculated by the following formula:

O_final = 0.225 * O_1 + 0.225 * O_2 + 0.225 * O_3 + 0.325 * O_ex


For 3rd-year students, the grade for the 1st module, O_mod, is calculated by the following formula:

O_mod = 0.1 * O_1 + 0.1 * O_2 + 0.1 * O_3 + 0.2 * O_ex

The grade for the project in the 2nd module, O_prj, is set on a 10-point scale based on the project defense.

The final grade O_final is defined by the formula O_final = O_mod + 0.5 * O_prj


Rounding is applied only once, at the very end, to the final grade, using arithmetic rounding.
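As a rough illustration, the formulas above can be written in Python. The function names are hypothetical, and "arithmetic rounding" is assumed here to mean rounding halves up (6.5 becomes 7). Python's built-in round() rounds halves to the nearest even number, so the sketch uses the decimal module instead.

```python
from decimal import Decimal, ROUND_HALF_UP


def _d(value):
    """Convert a grade to Decimal without binary floating-point noise."""
    return Decimal(str(value))


def round_arithmetic(value):
    """Round half up to an integer (assumed meaning of 'arithmetic rounding')."""
    return int(value.quantize(Decimal("1"), rounding=ROUND_HALF_UP))


def final_grade_4th_year(o_1, o_2, o_3, o_ex):
    """O_final = 0.225 * (O_1 + O_2 + O_3) + 0.325 * O_ex, rounded at the end."""
    total = Decimal("0.225") * (_d(o_1) + _d(o_2) + _d(o_3))
    total += Decimal("0.325") * _d(o_ex)
    return round_arithmetic(total)


def module_grade_3rd_year(o_1, o_2, o_3, o_ex):
    """O_mod = 0.1 * (O_1 + O_2 + O_3) + 0.2 * O_ex, left unrounded."""
    return Decimal("0.1") * (_d(o_1) + _d(o_2) + _d(o_3)) + Decimal("0.2") * _d(o_ex)


def final_grade_3rd_year(o_1, o_2, o_3, o_ex, o_prj):
    """O_final = O_mod + 0.5 * O_prj, rounded only once at the end."""
    total = module_grade_3rd_year(o_1, o_2, o_3, o_ex) + Decimal("0.5") * _d(o_prj)
    return round_arithmetic(total)
```

For instance, a 3rd-year student with homework grades 10, 10, 10, an exam grade of 5 and a project grade of 5 gets 3 + 1 + 2.5 = 6.5, which this sketch rounds up to 7.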

Each task and exam is evaluated on a 10-point scale (fractional marks are allowed for tasks). For some tasks, it will be possible to receive bonus points, which will be announced when the task is issued.

Home assignments

Home assignment #1

Home assignment #1 consists of 2 parts:

1. Data exploration and data preprocessing for subsequent segmentation;

2. Building customer profiles based on the segmentation (use at least 2 segmentation methods).

For a detailed description of Home assignment #1 from 2020-2021, with examples and results, see the attached file

Each student takes the version indicated next to their name in the list link

Versions and data descriptions are available in the folder at the link

To receive a grade, you need to:

1. Send an archive with the files in which all calculations were made, and a cover letter with conclusions and comments on each part:

  • Calculations can be done using code (Python/SAS/SQL), pivot tables and formulas in Excel, or a SAS Viya project;
  • All conclusions must be supported by visually interpretable graphs and data.

2. The archive (.zip) with the files must be sent to ntitova@hse.ru with the email subject “FCS HSE”

3. The archive must be named according to the template <First Name>_<Last name>_<group number>_hw1.zip.

For example, Alexander_Sharipov_156_hw1.zip
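For students working in Python, packaging the submission could look roughly like the sketch below. The function names and the out_dir parameter are hypothetical. Only the file name template comes from the instructions above.

```python
import zipfile
from pathlib import Path


def hw1_archive_name(first_name, last_name, group_number):
    """Build the name <First Name>_<Last name>_<group number>_hw1.zip."""
    return f"{first_name}_{last_name}_{group_number}_hw1.zip"


def build_hw1_archive(files, first_name, last_name, group_number, out_dir="."):
    """Pack the given files into a correctly named .zip and return its path."""
    archive_path = Path(out_dir) / hw1_archive_name(first_name, last_name, group_number)
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for file in files:
            file = Path(file)
            # Store each file under its bare name, without local folder paths.
            archive.write(file, arcname=file.name)
    return archive_path
```

Calling hw1_archive_name("Alexander", "Sharipov", 156) gives the name used in the example above.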

The grade for Home assignment #1 is given on a 10-point scale, where:

"8-10" - the task is completely solved, both parts of the homework are completed:

  • data analysis was carried out, working code and tables for data exploration were provided;
  • segmentation was built using 2 methods;
  • clear conclusions were provided, supported by data (tables, graphs);

"6-7" - the task is solved incompletely or with shortcomings:

  • data analysis was carried out, working code and tables for data exploration were provided;
  • segmentation was built using at least one method;
  • clear conclusions were provided, supported by data (tables, graphs);

"4-5" - the task is solved with significant shortcomings:

  • data analysis was carried out, working code and tables for data exploration were provided;
  • top-level dependencies and patterns for customers were identified without building a segmentation model;

"0-3" - the task is not solved or is solved incorrectly.

Deadline – 2 weeks (February 19, 2022 23:59).

Home assignment #2

The description of Home assignment #2 is in the file link

Deadline - March 12, 2022 23:59. Solutions should be sent to aromanenko@hse.ru

The email subject must contain the following: HSE + Course number + version_number + full name.

The file name must include:

  • Course number
  • Version number
  • Full name

Example: "AMIS_3course_Version_8_IvanovIvanIvanovich"

Home assignment #3

The task is to build a scoring model that estimates the probability of client default at the loan application stage. To do this, you need to:

0. Download data from the link https://drive.google.com/drive/u/0/folders/16CMyPnLu7Fv7IgsYOZimQK-7MaFZEWEZ

Each student selects 2 data samples, "accept" and "reject", whose names start with the student's HW version. The version number for HW #3 is the same as for HW #1.

The completed task must be sent in the following form:

1) Files/scripts with the built models (comments are required; without comments the task is considered unsolved)

2) Excel file with answers to the following questions:

1. What is the proportion of 1 in the "accept" sample?

2. For all interval variables, calculate the following:

- Proportion of missing values
- Median
- Mean
- Standard deviation
- Are there any abnormal values, outliers?
- Information Value

3. For all categorical variables, calculate the following:

- Mode
- Proportion of missing values
- Information Value
- Are there outliers, abnormal values?

4. Build a logistic regression only on approved applications, using WoE-transformed variables. What is the value of Gini? Of the F1 measure?

5. Conduct a Reject Inference analysis. What is the percentage of rejected applications?

6. Build a logistic regression on all applications, using WoE-transformed variables. What are the values of Gini and F1? Has the model changed?

7. Which model would you recommend for deployment in a production environment? Give a detailed explanation.
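The quantities mentioned in the questions above (WoE, Information Value, Gini, F1) can be computed in many tools. Below is a minimal pure-Python sketch, not the course's reference solution. It assumes that the target value 1 marks a default, uses the common convention WoE = ln(share of goods / share of bads), computes Gini as 2 * AUC - 1, and adds a small smoothing term (an assumption) so that empty cells do not break the logarithm. All function names are hypothetical.

```python
import math
from collections import Counter


def woe_iv(values, targets, smoothing=0.5):
    """Return ({category: WoE}, IV) for one categorical or binned variable.

    targets: 1 = default ("bad"), 0 = no default ("good"); this is an assumption.
    """
    goods, bads = Counter(), Counter()
    for value, target in zip(values, targets):
        (bads if target == 1 else goods)[value] += 1
    categories = set(goods) | set(bads)
    total_good = sum(goods.values()) + smoothing * len(categories)
    total_bad = sum(bads.values()) + smoothing * len(categories)
    woe, iv = {}, 0.0
    for category in categories:
        # Additive smoothing (an assumption) keeps the log finite for empty cells.
        share_good = (goods[category] + smoothing) / total_good
        share_bad = (bads[category] + smoothing) / total_bad
        woe[category] = math.log(share_good / share_bad)
        iv += (share_good - share_bad) * woe[category]
    return woe, iv


def gini(scores, targets):
    """Gini = 2 * AUC - 1, where scores are predicted default probabilities."""
    bad_scores = [s for s, t in zip(scores, targets) if t == 1]
    good_scores = [s for s, t in zip(scores, targets) if t == 0]
    if not bad_scores or not good_scores:
        raise ValueError("both classes must be present")
    # AUC as the share of (bad, good) pairs ranked correctly; ties count as 0.5.
    # This pairwise loop is quadratic, which is fine for a small illustration.
    wins = sum((b > g) + 0.5 * (b == g) for b in bad_scores for g in good_scores)
    return 2 * wins / (len(bad_scores) * len(good_scores)) - 1


def f1_score(predicted, targets):
    """F1 = 2TP / (2TP + FP + FN) for binary labels, with 1 as the positive class."""
    tp = sum(1 for p, t in zip(predicted, targets) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(predicted, targets) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(predicted, targets) if p == 0 and t == 1)
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0
```

Interval variables would first have to be split into bins before calling woe_iv. Missing values can be passed as a separate category (for example None) so that they receive their own WoE.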

HW #3 is graded on a 2-point scale, where "2" means the task is solved completely, "1" means the task is not completely solved or has shortcomings, and "0" means the task is not solved or is solved incorrectly. Grades are converted from the 2-point scale to the 10-point scale by multiplying the grade by 5, without rounding.

For HW #3, the following marks will be given:

  • "2" - the model is correctly built on both the accept and reject samples; correct answers are given.
  • "1.6" - the model is correctly built on both the accept and reject samples; 50% of the answers are correct.
  • "1.4" - the model is correctly built on both the accept and reject samples; wrong answers are given.
  • "1" - the model is built only on the accept sample.
  • "0.8" - the task is not completely solved; 50% of the answers are correct.
  • "0" - the task is not solved or is solved incorrectly.

Deadline - 26.03.2022 (inclusive)

Solutions should be sent to msvorobeva@hse.ru. The email subject must contain the following: HSE + Course number + version_number + full name.

The file name must include:

  • Course number
  • Version number
  • Full name

Example: "AMIS_3course_Version_8_IvanovIvanIvanovich"

If two submissions duplicate each other, both will be considered unsolved.

Team project for 3rd-year students

Choosing a topic for a team project on the course "Data Science Case Studies (JD SAS)"

Students are divided into groups of 3 people. Later, 2-3 groups may be merged into one. Each group must choose a team captain, who is the person responsible for the group.

The captain is responsible for the following main functions:

  • submitting the application for a project topic and agreeing on the topic with the professors on behalf of the whole group;
  • keeping the curator informed about the current status of the project: tracking, distributing and monitoring the project tasks;
  • sending the group's reports, presentations and technical documentation in electronic form.

Project topic descriptions: link to file

Link to the application form on the topic: link to file


Topic selection deadlines:

By April 17, each group must select two topics from the list, indicating which is its first and which is its second priority.

April 18 - each group that has applied through the form will be assigned a project topic and a curator. When assigning topics, the priorities indicated by the students will be taken into account, as well as the average score of the students in the group for the first module of the course. No more than 3 groups are assigned to one topic.
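The exact assignment procedure is not described on this page. One possible way to combine the stated ingredients (two priorities, average score, at most 3 groups per topic) is the greedy sketch below. It is purely an illustration under the assumption that groups with a higher average score are served first. The function name and the tuple layout are hypothetical.

```python
from collections import Counter

MAX_GROUPS_PER_TOPIC = 3  # limit stated in the course rules


def assign_topics(applications, capacity=MAX_GROUPS_PER_TOPIC):
    """Assign a topic to each group.

    applications: list of (group_name, average_score, first_choice, second_choice).
    Returns {group_name: topic}, with None if both choices are already full.
    """
    taken = Counter()
    result = {}
    # Assumption: groups with a higher average score are served first.
    ranked = sorted(applications, key=lambda app: app[1], reverse=True)
    for name, _score, first, second in ranked:
        for topic in (first, second):
            if taken[topic] < capacity:
                taken[topic] += 1
                result[name] = topic
                break
        else:
            result[name] = None  # left for manual assignment
    return result
```

With a capacity of 1 and three groups all asking for topic "A" first and "B" second, the group with the highest score gets "A", the next one gets "B", and the last one is left for manual assignment.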

If you have any questions, write to Natalia Titova on Telegram.


Each project should conclude with a 10-15 minute presentation of its results.

The project defenses for DSBA and ICEF, and for anyone else who is ready to defend early, will take place in mid-May (May 17-22).

The final defense for AMIS, Economics, Economics and Statistics will take place in mid-June (June 15-19), before the start of the exam session.

The grade for the project in the second module is set on a 10-point scale based on the results of the project defense.

Students who have not found a group should also apply for a topic individually. We will then place them in a group working on a similar topic.


Dates for the defense of educational projects in module 4:

Project defenses will be held from 17 to 19 June 2021.

In the file you will find a link to the list of project groups with reference to the date and time of project defense, as well as links to defenses.

Team projects - dates for defenses by groups and links to join.

How to ask a question about the course

Questions about the course can be asked in the course Telegram chat, or sent to the course professor Natalia Titova (@Natalitics) or to the manager of the SAS department Tatiana Lobok (@tatianalobok, tlobok@hse.ru).

Telegram channel for announcements: https://t.me/+Lj-yHhfNJTQxYWEy

Telegram chat for discussions: https://t.me/+a1VMTe2xwNA5Mzcy


All announcements and course materials will be posted in the Telegram chat and in the Telegram channel! Professors are in the chat, but they are not always available. For all important questions, write to Natalia Titova on Telegram (@Natalitics) or by email at Natalia.Titova@sas.com. Be sure to add the tag [AMIS FCS HSE/DSBA FCS HSE/ICEF FCS HSE/Ec FES HSE/EcSt FES HSE] to the email subject, and also indicate your last name and first name.

All files provided are intended for use by students during their studies and are updated throughout the year. If you find any typos, inaccuracies or problems with the page, please email tlobok@hse.ru.

Course materials

Documents and program of the course Attention: files are being updated!

  • The course syllabus for 3rd- and 4th-year students is available at the following link.

Recommended literature and useful additional materials

Useful materials


Useful literature

Section 1:


Section 2:

  • Шапиро Дж (2006). Моделирование цепи поставок. Питер. Серия «Теория менеджмента».
  • Tijms H.C., Groenevelt H. (1984). Simple approximations for the reorder point in periodic and continuous review (s, S) inventory systems with service level constraints. European Journal of Operational Research, Vol. 17, Issue 2, August 1984, Pages 175-190.


Section 3:

  • Christoffersen P. (2012) Elements of Financial Risk Management. 2nd ed. Elsevier Academic Press.

Section 4:

  • Мортон С. (2016) Лаборатория презентаций. Формула идеального выступления. Альпина Паблишер.

Contacts

Titova Natalia Nikolaevna - Senior tutor

Natalia.Titova@sas.com



Lobok Tatiana Sergeevna - Manager of the basic SAS Department

tlobok@hse.ru