Data Science for Business 2020
About the Course
Data Science for Business. MAGoLEGO course.
Spring 2020. Module 4.
Department of Data Analysis and Artificial Intelligence, School of Computer Science.
Join our Telegram channel, Data science for business.
Instructors
Links
- Alternative Course website [1]
- Lectures link https://zoom.us/j/7723819319 Fridays, 6.10pm - 7.30pm
- Seminars link https://zoom.us/j/636910206 Fridays, 7.40pm - 9.00pm
Course outline
- Introduction to data science
- Data mining, statistics, machine learning, optimization
- Case studies
- Increasing business impact
Content
№ | Date | Title | Abstract |
---|---|---|---|
1 | 10.04.2020 | Introduction to data science. | Introduction to data science and its role in industry. Examples of real world use cases. |
2 | 17.04.2020 | Working with data. | Data cleaning and preparation. ETL process. Basic data analysis and visualization. |
3 | 24.04.2020 | Data mining, machine learning, statistics | Types of ML algorithms, applicability, training and testing, solution quality. |
4 | 15.05.2020 | Case study 1: Pricing | The goal of the case is to compute price elasticity. Algorithms: supervised learning (linear and non-linear regression, predicting a continuous variable); dimensionality reduction (PCA). |
5 | 22.05.2020 | Case study 2: Churn modeling | The goal of the case is to predict which customers are going to leave the service within a given time. Algorithms: Supervised learning. Classification: Logistic regression, Decision trees, Random forest. |
6 | 29.05.2020 | Case study 3: Customer segmentation | The goal of the case is to group customers into clusters based on customer similarity metrics. Algorithms: clustering (k-means, agglomerative); dimensionality reduction (PCA). |
7 | 05.06.2020 | Case study 4: Personalization | The goal of the case is to build a recommender system. Algorithms: association rules and collaborative filtering. |
8 | 12.06.2020 | Impacting the business | How to create a visible impact on business with analytics |
Seminar materials
Seminar 2. Video. RM processes: COVID regression, COVID, Fisher's Iris, COVID days since 50 confirmed cases, Iris depivot example.
Seminar 3. Video. Handling categorical values. Handling missing values. Titanic prediction in a train-test setting.
Seminar 4. Video. RM processes: Walmart preprocessing, Walmart regression, GridSearch. Seminar plan.
Seminar 5. Video. Seminar presentation. Telecom Churn data. RM process.
Seminar 6. Video. Plan for Seminar 6, small_mnist, transactions. RM processes: digits_clustering_dr, stores_clustering.
Lecture materials
Home assignments
Google doc with Q&A about Home Assignment tasks (contributed by students).
Google doc with grades.
1. Analyze COVID dataset
Due Monday, April 27, 23:59 Moscow time.
- Home assignment description.
- Starter process.
- Total Cases, Deaths, Recovered on April 19, 2020.
- Google form to submit your solution.
- HA 1. Solution
2. Analyze Titanic dataset
Due May 11, 8 am Moscow time (originally due Friday, May 8, 8 am Moscow time).
- Home assignment description, starting with page 3.
- Titanic dataset
- Google form to submit your solution.
- HA 2. Solution
3. Analyze Walmart Sales dataset
Due May 27, 23:59 Moscow time (originally due Friday, May 25, 8 am Moscow time).
- Home assignment description, starting with page 6.
- Walmart data
- Google form to submit your solution.
4. Predict customer churn
Due Monday, June 8, 23:59 Moscow time (originally due Friday, June 5, 23:59 Moscow time).
- Home assignment description, starting with page 14.
- data for the assignment.
- Google form to submit your solution.
5. Cluster items and build a recommender system.
Due Wednesday, June 17, 23:59 Moscow time.
- Home assignment description, starting with page 19.
- Google form to submit your solution will be posted later.
Textbooks
- Provost, F., Fawcett, T. Data Science for Business: What you need to know about data mining and data-analytic thinking. O'Reilly Media, Inc., 2013.
- James, G. et al. An introduction to statistical learning. Springer, 2013.
- Siegel, E. Predictive analytics: The power to predict who will click, buy, lie, or die. John Wiley & Sons, 2016.
Software
- For online lectures and seminars: Zoom
- Modelling package: RapidMiner
Apply for the educational version at https://rapidminer.com/get-started-educational/
- Email: Enter your university email (ending with @edu.hse.ru)
- Job Function: Student
- University: Higher School of Economics
- Course Name: Data Science for Business
- Course Number: https://www.hse.ru/edu/courses/341840822
- Course Term: Summer Term
- Professor: Leonid Zhukov