Data Science for Business 2020

From the Wiki of the Faculty of Computer Science

Revision as of 20:22, 16 May 2020


About the Course

Data Science for Business. MAGoLEGO course.

Spring 2020. Module 4.

Department of Data Analysis and Artificial Intelligence, School of Computer Science.
Join our Telegram channel, Data Science for Business.

Instructors

Prof. Leonid Zhukov

Ilya Makarov

Anvar Kurmukov

Links

Course outline

  • Introduction to data science
  • Data mining, statistics, machine learning, optimization
  • Case studies
  • Increasing business impact

Content

| No. | Date | Title | Abstract |
|-----|------|-------|----------|
| 1 | 10.04.2020 | Introduction to data science | Introduction to data science and its role in industry. Examples of real-world use cases. |
| 2 | 17.04.2020 | Working with data | Data cleaning and preparation. ETL process. Basic data analysis and visualization. |
| 3 | 24.04.2020 | Data mining, machine learning, statistics | Types of ML algorithms, applicability, training and testing, solution quality. |
| 4 | 15.05.2020 | Case study 1: Pricing | The goal of the case is to compute price elasticity. Algorithms: supervised learning (linear and non-linear regression, predicting a continuous variable); dimensionality reduction (PCA). |
| 5 | 22.05.2020 | Case study 2: Churn modeling | The goal of the case is to predict which customers are going to leave the service within a given time. Algorithms: supervised learning, classification (logistic regression, decision trees, random forest). |
| 6 | 29.05.2020 | Case study 3: Customer segmentation | The goal of the case is to group customers into clusters based on customer similarity metrics. Algorithms: clustering (k-means, agglomerative); dimensionality reduction (PCA). |
| 7 | 05.06.2020 | Case study 4: Personalization | The goal of the case is to build a recommender system. Algorithms: association rules and collaborative filtering. |
| 8 | 12.06.2020 | Case study 5: Demand forecasting | The goal of the case is to develop a demand forecasting model. Algorithms: ARIMA, sliding window regression. |
| 9 | 19.06.2020 | Case study 6: Fraud detection | The goal of the case is to find abnormal customer transactions. Algorithms: anomaly detection. |
| 10 | 26.06.2020 | Impacting the business | How to create a visible impact on business with analytics. |


Seminar materials

Seminar 1. video.

Seminar 2. video. RM processes COVID regression, COVID, Fisher's Iris, COVID days since 50 confirmed cases, Iris depivot example

Seminar 3. video. Handling categorical values. Handling missing values. Titanic prediction on train-test setting.

Seminar 4. video. RM processes Walmart preprocessing Walmart regression GridSearch Seminar plan

Home assignments


Google doc with Q&A about Home Assignment tasks (contributed by students).

1. Analyze COVID dataset
Due Monday, April 27, 23:59 Moscow time (originally 8 am).


2. Analyze Titanic dataset
Due Friday, May 8, 8 am Moscow time (there will be no extensions).
Revised deadline: May 11, 8 am Moscow time.


3. Analyze Walmart Sales dataset
Due Friday, May 25, 8 am Moscow time (there will be no extensions).

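  • Home assignment description (https://docs.google.com/document/d/1WMIyh9opbYrK7cWfwHiTcnhOud3l--Sh2VPeVF7QUbc/edit?usp=sharing), starting with page 6.
  • Walmart data (https://www.kaggle.com/c/walmart-recruiting-store-sales-forecasting/data)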
  • Google form to submit your solution (will be published later)

Textbooks

  • Provost, F., Fawcett, T. Data Science for Business: What you need to know about data mining and data-analytic thinking. O'Reilly Media, 2013.
  • James, G. et al. An introduction to statistical learning. Springer, 2013.
  • Siegel, E. Predictive analytics: The power to predict who will click, buy, lie, or die. John Wiley & Sons, 2016.

Software

  • For online lectures and seminars: Zoom
  • Modelling package: RapidMiner


Apply for the educational version at https://rapidminer.com/get-started-educational/ with the following details:

  • Email: enter your university email (ending with @edu.hse.ru)
  • Job Function: Student
  • University: Higher School of Economics
  • Course Name: Data Science for Business
  • Course Number: https://www.hse.ru/edu/courses/341840822
  • Course Term: Summer Term
  • Professor: Leonid Zhukov