Data Science for Business 2020: differences between revisions

From Wiki - Faculty of Computer Science
(First draft)
 
(Seminar's materials)
 
(77 intermediate revisions by 2 users not shown)
Line 1: Line 1:
 +
 +
 
== About the Course ==
 
  
 
Data Science for Business. MAGoLEGO course.
 
  
Spring 2020. Module 4
+
Spring 2020. Module 4.
  
 
Department of Data Analysis and Artificial Intelligence, School of Computer Science.
 
 +
<br><span style="color:#DC143C">Join our Telegram channel </span> [https://t.me/joinchat/ENzQEhr-hra2WhEjxvgayw Data science for business.]
  
 
===Instructors===
 
  
[https://www.hse.ru/staff/lzhukov Prof. Leonid Zhukov]
+
[https://www.hse.ru/staff/lzhukov Prof. Leonid Zhukov]  
  
 
[https://www.hse.ru/staff/iamakarov Ilya Makarov]
 
  
 
[https://www.hse.ru/staff/intergalactic_admiral/ Anvar Kurmukov]
 
 +
 +
===Links===
 +
* Alternative Course website [http://www.leonidzhukov.net/hse/2020/datascience/]
 +
* Lectures link https://zoom.us/j/7723819319 Fridays, 6.10pm - 7.30pm
 +
* Seminars link https://zoom.us/j/636910206 Fridays, 7.40pm - 9.00pm
  
 
===Course outline===
 
Line 34: Line 42:
 
| 3 ||  24.04.2020          ||  Data mining, machine learning, statistics || Types of ML algorithms, applicability, training and testing, solution quality.   
 
 
|-
 
| 4 ||  15.05.2020        || Case study 1: Customer segmentation||  The goal of the case is to group customers into clusters based on some customer similarity metrics.  
+
| 4 ||  15.05.2020        || Case study 1: Pricing||  The goal of the case is to compute price elasticity.  
'''Algorithms''': Unsupervised learning. Clustering: k-means, agglomerative;  Dimensionality reduction: PCA.         
+
'''Algorithms''': Supervised learning: linear and non-linear regression, predicting continuous variable. Dimensionality reduction: PCA.         
 
|-
 
 
| 5 ||    22.05.2020        ||Case study 2: Churn modeling||The goal of the case is to predict which customers are going to leave the service within a given time.  
 
 
'''Algorithms''': Supervised learning. Classification: Logistic regression, Decision trees, Random forest.
 
 
|-
 
| 6 || 29.05.2020  ||Case study 3: Pricing||The goal of the case is to determine the optimal pricing for goods and services.  
+
| 6 || 29.05.2020  ||Case study 3: Customer segmentation ||The goal of the case is to group customers into clusters based on some customer similarity metrics.
'''Algorithms''': Supervised learning. Regression: linear and non-linear models.
+
'''Algorithms''': Clustering: k-means, agglomerative. Dimensionality reduction: PCA.
 
|-
 
| 7 || 05.06.2020||Case study 4: Industrial analytics||The goal of the case is to predict an output of the production line and find optimal parameter setting.  
+
| 7 || 05.06.2020||Case study 4: Personalization ||The goal of the case is to build a recommender system.
'''Algorithms''': Supervised learning. Regression: non-linear optimization.
+
'''Algorithms''': association rules and collaborative filtering.
 
|-
 
| 8 || 12.06.2020 || Case study 5. Sales territory design ||The goal of the case is to select locations of the sales offices to maximize the coverage under constrained resources.
+
| 8 || 12.06.2020 ||Impacting the business ||How to create a visible impact on business with analytics
'''Algorithms''': clustering and geo-analytics approaches.
+
|-
+
| 9 || 19.06.2020 ||Impacting the business ||How to create a visible impact on business with analytics
+
 
|-
 
 
|}
 
|}
 +
 +
===Seminar materials===
 +
 +
[https://yadi.sk/i/eJm4z1dwLOAXhw Seminar 1. video. ]
 +
 +
[https://yadi.sk/d/aAYt0omgokKuRg Seminar 2. video.], RM processes [https://yadi.sk/d/ij1bgNTd7VFJQQ COVID regression], [https://yadi.sk/d/XvLpO7frlD_l-A COVID], [https://yadi.sk/d/NhmsEpnWYQgshg Fisher's Iris], [https://yadi.sk/d/RO3RX1PPP2XoSw COVID days since 50 confirmed cases], [https://yadi.sk/d/TdmY-qZt_Uu-pQ Iris depivot example]
 +
 +
[https://yadi.sk/i/l2LhZSooi8RkEA Seminar 3. video.] [https://yadi.sk/d/BJw8HSYJMoC5qQ Handling categorical values]. [https://yadi.sk/d/AmL3HXa-fuxS9g Handling missing values]. [https://yadi.sk/d/STx2V0-rgYlybA Titanic prediction on train-test setting.]
 +
 +
[https://youtu.be/lsK5rK7-WvI Seminar 4. video] RM processes [https://yadi.sk/d/BVRGT0fqpwdUTQ Walmart preprocessing] [https://yadi.sk/d/nw76wzn1x7e1VQ Walmart regression] [https://yadi.sk/d/ig5j_tHesiRg0A GridSearch] [https://docs.google.com/document/d/1Ptj7J1ikOVsuGmY5rPNFCo0p3hrDcLyQAwotDxxbBiQ/edit?usp=sharing Seminar plan]
 +
 +
[https://youtu.be/mcaic7sgz3M Seminar 5. video.] [https://docs.google.com/presentation/d/1uCC1xNon8OpWg3jzRSXqaM8xpV11DFdGl8bWsUVJ-ho/edit?usp=sharing Seminar presentation.][https://yadi.sk/d/E4ToVUBnC4dong Telecom Churn data] [https://yadi.sk/d/jdv5incZra30Fw RM process]
 +
 +
[https://youtu.be/vZYSp36_0o4 Seminar 6. video.] [https://docs.google.com/document/d/1Ptj7J1ikOVsuGmY5rPNFCo0p3hrDcLyQAwotDxxbBiQ/edit?usp=sharing Plan for the Seminar 6]  [https://yadi.sk/d/rl-7BVlMrhKVkA small_mnist] [https://yadi.sk/d/JPPZfxE1A0SWkg transactions]; RM processes: [https://yadi.sk/d/H1uzIN09w3FDfg digits_clustering_dr] [https://yadi.sk/d/EXrPElNnv4SMWA stores_clustering]
 +
 +
[https://youtu.be/YOVGgGQq0rM Seminar 7. video.] [https://yadi.sk/d/IbzLqpgD0wLccw RM process]
 +
 +
===Lecture materials===
 +
 +
[https://www.youtube.com/playlist?list=PLriUvS7IljvlcLnrvYUyNc9nXhiM9kWjq Lecture YouTube playlist]
 +
 +
===Home assignments===
 +
<br><span style="color:#228B22"> Google doc with [https://docs.google.com/document/d/1Sk-nr5owlKf8MYgdmItit47WLBvWVzMCyc3G_sRLxfs/edit?usp=sharing Q&A] about Home Assignment tasks (contributed by students).</span>
 +
 +
[https://docs.google.com/spreadsheets/d/1jPHcaTL0CeeaF79VvNqFF3-RvS2XGERNY2dlWmV5aUs/edit#gid=1218360773 Google doc] with grades.
 +
 +
1. Analyze COVID dataset
 +
<br><span style="color:#DC143C"> due Monday, April 27, <s>8 am</s> 23:59 Moscow time. </span>
 +
* Home assignment [https://docs.google.com/document/d/1WMIyh9opbYrK7cWfwHiTcnhOud3l--Sh2VPeVF7QUbc/edit?usp=sharing description].
 +
* Starter [https://yadi.sk/d/jWZDCPhOq_CbuA process.]
 +
* [https://yadi.sk/d/SST3aJE6e4nSzQ Total Cases], [https://yadi.sk/d/pFx64HkmX93UWw Deaths], [https://yadi.sk/d/2cixg4bpOfKbMQ Recovered] on April 19, 2020.
 +
* [https://docs.google.com/forms/d/e/1FAIpQLScCQkT4XB9z74r2mZMttY9IFqlQe0RTVgIU8pjj-u3bUBwGig/viewform?usp=sf_link Google form] to submit your solution.
 +
* [https://yadi.sk/d/VwfTrIUMltfabA HA 1. Solution]
 +
 +
 +
2. Analyze Titanic dataset
 +
<s><br><span style="color:#DC143C"> due Friday, May 8, 8 am Moscow time. </span> (there will be no extensions)</s> New deadline: May 11, 8 am Moscow time.
 +
* Home assignment [https://docs.google.com/document/d/1WMIyh9opbYrK7cWfwHiTcnhOud3l--Sh2VPeVF7QUbc/edit?usp=sharing description], starting with page 3.
 +
* [https://yadi.sk/d/1d6rarln1ybqNQ Titanic dataset]
 +
* [https://docs.google.com/forms/d/e/1FAIpQLSecCpGrn6e_j5KS8rRoKCMKy0Zy0f2wIaiMFxwHRWecxHH1Nw/viewform?usp=sf_link Google form] to submit your solution.
 +
* [https://yadi.sk/d/5aCD_7VGo8GUdw HA 2. Solution]
 +
 +
 +
3. Analyze Walmart Sales dataset
 +
<s><br><span style="color:#DC143C"> due Friday, May 25, 8 am Moscow time. </span> (there will be no extensions)</s> New deadline: May 27, 23:59 Moscow time.
 +
* Home assignment [https://docs.google.com/document/d/1WMIyh9opbYrK7cWfwHiTcnhOud3l--Sh2VPeVF7QUbc/edit?usp=sharing description], starting with page 6.
 +
* [https://www.kaggle.com/c/walmart-recruiting-store-sales-forecasting/data Walmart data]
 +
* [https://docs.google.com/forms/d/e/1FAIpQLSfgZl2UCwr8oDxfaeinV0xjlYCGjQ3OQWdbOeWxCHLQwP9_sA/viewform?usp=sf_link Google form] to submit your solution.
 +
 +
 +
4. Predict customer churn
 +
 +
<s><span style="color:#DC143C"> due Friday, June 5, 23:59 Moscow time. </span> (there will be no extensions)</s> New deadline: Monday, June 8, 23:59 Moscow time.
 +
* Home assignment [https://docs.google.com/document/d/1WMIyh9opbYrK7cWfwHiTcnhOud3l--Sh2VPeVF7QUbc/edit?usp=sharing description], starting with page 14.
 +
* [https://yadi.sk/d/Fhs7pElrkdhW-w data] for the assignment.
 +
* [https://docs.google.com/forms/d/e/1FAIpQLSfwilwkleJSqAB1F2OMPDgyGBQmC3Z4su9IdmJbI9NGNzyxGA/viewform?usp=sf_link Google form] to submit your solution.
 +
 +
 +
5. Cluster items and build a recommender system.
 +
 +
<span style="color:#DC143C"> due Wednesday, June 17, 23:59 Moscow time. </span>
 +
* Home assignment [https://docs.google.com/document/d/1WMIyh9opbYrK7cWfwHiTcnhOud3l--Sh2VPeVF7QUbc/edit?usp=sharing description], starting with page 19.
 +
* [https://docs.google.com/forms/d/e/1FAIpQLSfnHymipXfjTTnrvat9vDOs5L6tQtA8WJY4JQ-7DTCY7pexyQ/viewform?usp=sf_link Google form] to submit your solution will be posted later.
  
 
===Textbooks===
 
Line 60: Line 128:
  
 
===Software===
 
[https://rapidminer.com/ RapidMiner]
+
 
 +
*For online lectures and seminars: [https://zoom.us/ Zoom]
 +
*Modelling package. [https://rapidminer.com/ RapidMiner]
 +
<br><span style="color:#DC143C"> Apply for the educational version: https://rapidminer.com/get-started-educational/ </span>
 +
 
 +
*Email: Enter your university email (ends with @edu.hse.ru)
 +
*Job Function: Student
 +
*University: Higher School of Economics
 +
*Course Name: Data Science for Business
 +
*Course Number: https://www.hse.ru/edu/courses/341840822
 +
*Course Term: Summer Term
 +
*Professor: Leonid Zhukov

Current revision as of 17:14, 9 December 2020


About the Course

Data Science for Business. MAGoLEGO course.

Spring 2020. Module 4.

Department of Data Analysis and Artificial Intelligence, School of Computer Science.
Join our Telegram channel: Data science for business.

Instructors

Prof. Leonid Zhukov

Ilya Makarov

Anvar Kurmukov

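Links

  • Alternative course website: http://www.leonidzhukov.net/hse/2020/datascience/
  • Lectures link: https://zoom.us/j/7723819319 (Fridays, 6.10pm - 7.30pm)
  • Seminars link: https://zoom.us/j/636910206 (Fridays, 7.40pm - 9.00pm)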

Course outline

  • Introduction to data science
  • Data mining, statistics, machine learning, optimization
  • Case studies
  • Increasing business impact

Content

{|
! No. !! Date !! Title !! Abstract
|-
| 1 || 10.04.2020 || Introduction to data science || Introduction to data science and its role in industry. Examples of real-world use cases.
|-
| 2 || 17.04.2020 || Working with data || Data cleaning and preparation. ETL process. Basic data analysis and visualization.
|-
| 3 || 24.04.2020 || Data mining, machine learning, statistics || Types of ML algorithms, applicability, training and testing, solution quality.
|-
| 4 || 15.05.2020 || Case study 1: Pricing || The goal of the case is to compute price elasticity.
'''Algorithms''': Supervised learning: linear and non-linear regression, predicting a continuous variable. Dimensionality reduction: PCA.
|-
| 5 || 22.05.2020 || Case study 2: Churn modeling || The goal of the case is to predict which customers are going to leave the service within a given time.
'''Algorithms''': Supervised learning. Classification: Logistic regression, Decision trees, Random forest.
|-
| 6 || 29.05.2020 || Case study 3: Customer segmentation || The goal of the case is to group customers into clusters based on some customer similarity metrics.
'''Algorithms''': Clustering: k-means, agglomerative. Dimensionality reduction: PCA.
|-
| 7 || 05.06.2020 || Case study 4: Personalization || The goal of the case is to build a recommender system.
'''Algorithms''': association rules and collaborative filtering.
|-
| 8 || 12.06.2020 || Impacting the business || How to create a visible impact on business with analytics.
|}
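
As an illustration of the pricing case study (computing price elasticity with regression), here is a minimal Python sketch. It is not the course's own material: the seminars use RapidMiner, and the function name `estimate_elasticity`, the log-log model, and the synthetic data are all assumptions made for this example. Under a constant-elasticity demand curve Q = A * P^e, the slope of a linear regression of ln(Q) on ln(P) is the elasticity e.

```python
import numpy as np


def estimate_elasticity(prices, quantities):
    """Hypothetical helper: fit ln(Q) = a + e * ln(P) by least squares, return e."""
    log_p = np.log(np.asarray(prices, dtype=float))
    log_q = np.log(np.asarray(quantities, dtype=float))
    # np.polyfit returns coefficients from the highest degree down: [slope, intercept].
    slope, _intercept = np.polyfit(log_p, log_q, deg=1)
    return float(slope)


# Synthetic, noise-free demand curve Q = 1000 * P^(-1.5); illustrative data only.
prices = np.array([1.0, 2.0, 4.0, 8.0, 16.0])
quantities = 1000.0 * prices ** -1.5

elasticity = estimate_elasticity(prices, quantities)
print(f"Estimated price elasticity: {elasticity:.3f}")
```

On this noise-free data the fit should recover an elasticity of about -1.5, meaning a 1% price increase goes with roughly a 1.5% drop in quantity sold. Real sales data would be noisy and would usually need additional regressors such as seasonality or promotions.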

Seminar materials

Seminar 1: video.

Seminar 2: video. RM processes: COVID regression, COVID, Fisher's Iris, COVID days since 50 confirmed cases, Iris depivot example.

Seminar 3: video, Handling categorical values, Handling missing values, Titanic prediction on train-test setting.

Seminar 4: video. RM processes: Walmart preprocessing, Walmart regression, GridSearch. Seminar plan.

Seminar 5: video, Seminar presentation, Telecom Churn data, RM process.

Seminar 6: video, Plan for the Seminar 6, small_mnist, transactions. RM processes: digits_clustering_dr, stores_clustering.

Seminar 7: video, RM process.

Lecture materials

Lecture YouTube playlist

Home assignments


Google doc with Q&A about Home Assignment tasks (contributed by students).

Google doc with grades.

1. Analyze COVID dataset
Due Monday, April 27, 23:59 Moscow time (originally 8 am).


2. Analyze Titanic dataset
Due May 11, 8 am Moscow time (extended from Friday, May 8, 8 am).


3. Analyze Walmart Sales dataset
Due May 27, 23:59 Moscow time (extended from May 25, 8 am).


4. Predict customer churn

Due Monday, June 8, 23:59 Moscow time (extended from Friday, June 5, 23:59).


5. Cluster items and build a recommender system.

Due Wednesday, June 17, 23:59 Moscow time.

  • Home assignment description, starting with page 19.
  • Google form to submit your solution will be posted later.

Textbooks

  • Provost, F., Fawcett, T. Data Science for Business: What you need to know about data mining and data-analytic thinking. O'Reilly Media, Inc., 2013.
  • James, G. et al. An introduction to statistical learning. Springer, 2013.
  • Siegel, E. Predictive analytics: The power to predict who will click, buy, lie, or die. John Wiley & Sons, 2016.

Software

  • For online lectures and seminars: Zoom
  • Modelling package: RapidMiner


Apply for the educational version: https://rapidminer.com/get-started-educational/

  • Email: Enter your university email (ends with @edu.hse.ru)
  • Job Function: Student
  • University: Higher School of Economics
  • Course Name: Data Science for Business
  • Course Number: https://www.hse.ru/edu/courses/341840822
  • Course Term: Summer Term
  • Professor: Leonid Zhukov