Intro to Data Mining and Machine Learning 2020/2021
Version as of 20:13, 11 June 2021
Lecturer: Dmitry Ignatov
TA: Stefan Nikolić
Final mark formula: FM = 0.8 Homeworks + 0.2 Exam.
Contents
- 1 Homeworks
- 2 Lecture on 12 January 2021
- 3 Lecture on 19 January 2021
- 4 Lecture on 26 January 2021
- 5 Lecture on 2 February 2021
- 6 Lecture on 09 February 2021
- 7 Practice on 16 February 2021
- 8 Lecture on 2 March 2021
- 9 Lecture on 9 March 2021
- 10 Lecture + Practice on 16 March 2021
- 11 Practice on 6 April 2021
- 12 Lecture on 13 April 2021
- 13 Lecture + Practice on 25 April 2021
- 14 Lecture on 11 May 2021
- 15 Practice plus Lecture on 18 May 2021
- 16 Exam
Homeworks
- Homework 1: Spectral Clustering
- Homework 2: to be announced soon.
- Homework 3: Recommender Systems
Lecture on 12 January 2021
Intro slides. Course plan. Assessment criteria. ML&DM libraries. What to read and watch?
Practice: demonstration with Orange.
Lecture on 19 January 2021
Classification. One-rule. Naïve Bayes. kNN. Logistic Regression. Train-test split and cross-validation. Quality Metrics (TP, FP, TN, FN, Precision, Recall, F-measure, Accuracy).
Practice: demonstration with Orange.
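The workflow from this lecture can be sketched in a few lines of scikit-learn: a train-test split, a logistic regression classifier, and the quality metrics listed above. The dataset and hyper-parameters here are illustrative, not taken from the course materials.

```python
# Train-test split, logistic regression, and classification quality metrics.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

clf = LogisticRegression(max_iter=5000).fit(X_train, y_train)
y_pred = clf.predict(X_test)

print("Accuracy :", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred))
print("Recall   :", recall_score(y_test, y_pred))
print("F1       :", f1_score(y_test, y_pred))
```

Precision, recall, and F-measure are computed from the same TP/FP/TN/FN counts discussed in the lecture; `accuracy_score` uses all four.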
Lecture on 26 January 2021
Classification (continued). Quality metrics. ROC curves.
Practice: demonstration with Orange.
Lecture on 2 February 2021
Introduction to Clustering. Taxonomy of clustering methods. K-means. K-medoids. Fuzzy C-means. Types of distance metrics. Hierarchical clustering. DBScan.
Practice: DBScan Demo.
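A minimal sketch of two of the methods above, k-means and DBScan, on synthetic blobs; the dataset and the `eps`/`min_samples` values are illustrative choices, not from the course demo.

```python
# k-means vs. DBSCAN on three well-separated Gaussian blobs.
import numpy as np
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.6, random_state=42)

km = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)
db = DBSCAN(eps=0.5, min_samples=5).fit(X)

print("k-means clusters:", len(np.unique(km.labels_)))
print("DBSCAN clusters :", len(set(db.labels_) - {-1}))  # label -1 marks noise
```

Note the difference in interface: k-means needs the number of clusters up front, while DBScan infers it from density and may additionally label points as noise.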
Lecture on 09 February 2021
- Introduction to Clustering (continued). Density-based techniques. DBScan and Mean-shift.
- Graph and spectral clustering. Min-cuts and normalized cuts. Laplacian matrix. Fiedler vector. Applications.
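The Laplacian-matrix machinery above fits in a few lines of NumPy. The toy graph below (two triangles joined by a single bridge edge) is an illustrative example: the sign pattern of the Fiedler vector recovers the minimum cut.

```python
# Unnormalised Laplacian and Fiedler vector of a tiny two-community graph.
import numpy as np

# Adjacency matrix: nodes 0-2 form one triangle, 3-5 another; edge (2,3) bridges them.
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1

L = np.diag(A.sum(axis=1)) - A        # L = D - A
eigvals, eigvecs = np.linalg.eigh(L)  # eigenvalues in ascending order
fiedler = eigvecs[:, 1]               # eigenvector of the 2nd-smallest eigenvalue

labels = (fiedler > 0).astype(int)    # sign of the Fiedler vector = partition
print("Fiedler-vector partition:", labels)  # separates {0,1,2} from {3,4,5}
```

Thresholding the Fiedler vector at zero is the simplest spectral bipartition; for k > 2 clusters one runs k-means on several of the smallest eigenvectors instead.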
Practice on 16 February 2021
Clustering with scikit-learn (k-means, hierarchical clustering, DBScan, MeanShift, Spectral Clustering).
Lecture on 2 March 2021
Practice: Spectral clustering.
Lecture: Decision tree learning. ID3. Information Entropy. Information gain. Gini coefficient and index. Overfitting and pruning. Decision trees for numeric data. Oblivious decision trees. Regression trees.
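The information-gain computation at the heart of ID3 can be done by hand. The split below uses the class counts of the classic "play tennis" Outlook attribute as an illustrative example.

```python
# Entropy and information gain for one candidate attribute split (ID3 style).
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

# Parent node: 9 positive, 5 negative examples.
parent = ["+"] * 9 + ["-"] * 5
# A candidate attribute splits them into three branches.
branches = [["+", "+", "-", "-", "-"],   # value 1: 2+, 3-
            ["+", "+", "+", "+"],        # value 2: 4+, 0- (pure, entropy 0)
            ["+", "+", "+", "-", "-"]]   # value 3: 3+, 2-

gain = entropy(parent) - sum(len(b) / len(parent) * entropy(b)
                             for b in branches)
print(f"information gain = {gain:.3f}")  # classic Outlook split: ~0.247
```

ID3 evaluates this quantity for every attribute and splits on the one with the largest gain; the Gini index is used the same way, just with a different impurity function.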
Lecture on 9 March 2021
Frequent Itemsets. Association Rules. Algorithms: Apriori, FP-growth. Interestingness measures. Closed and maximal itemsets.
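At the toy scale used in class, frequent itemsets can be enumerated directly; the sketch below skips Apriori's candidate pruning and only keeps its level-wise stopping rule (if no itemset of size k is frequent, none of size k+1 can be). The transactions are made up for illustration.

```python
# Brute-force frequent itemset mining with an absolute support threshold.
from itertools import combinations

transactions = [{"bread", "milk"},
                {"bread", "diapers", "beer", "eggs"},
                {"milk", "diapers", "beer", "cola"},
                {"bread", "milk", "diapers", "beer"},
                {"bread", "milk", "diapers", "cola"}]
min_support = 3  # absolute support threshold

items = sorted(set().union(*transactions))
frequent = {}
for k in range(1, len(items) + 1):
    found = False
    for cand in combinations(items, k):
        support = sum(set(cand) <= t for t in transactions)  # count containing transactions
        if support >= min_support:
            frequent[cand] = support
            found = True
    if not found:  # anti-monotonicity: no larger itemset can be frequent
        break

print(frequent)
```

Association rules are then read off each frequent itemset by splitting it into an antecedent and a consequent and checking the confidence ratio of their supports.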
Lecture + Practice on 16 March 2021
Frequent Itemset Mining (continued). Applications: 1) Taxonomies of Website Visitors and 2) Web advertising.
Exercises. Frequent Itemsets. FP-growth. Closed itemsets.
Practice. Orange, SPMF, Concept Explorer.
Practice on 6 April 2021
Practice. Scikit-learn tutorial on kNN, Decision Trees, Naïve Bayes, Logistic Regression, SVM, etc.
Lecture on 13 April 2021
Introduction to Recommender Systems. Taxonomy of Recommender Systems (non-personalised, content-based, collaborative filtering, hybrid, etc.). Real examples. User-based and item-based collaborative filtering. Bimodal cross-validation.
Lecture + Practice on 25 April 2021
Practice: User-based and item-based collaborative filtering with Python and MovieLens.
Case-study: Non-negative Matrix Factorisation, Boolean Matrix Factorisation vs. SVD in Collaborative Filtering.
Lecture: Advanced factorisation models: PureSVD, SVD++, timeSVD, ALS.
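User-based collaborative filtering, as practised on MovieLens above, reduces to a similarity-weighted average of neighbours' ratings. The tiny rating matrix below is made up for illustration.

```python
# User-based collaborative filtering with cosine similarity.
import numpy as np

# Rows = users, columns = items; 0 means "not rated".
R = np.array([[5, 4, 0, 1],
              [4, 5, 1, 0],
              [1, 0, 5, 4],
              [0, 1, 4, 5]], dtype=float)

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

target = 0  # predict user 0's rating for item 2
sims = np.array([cosine(R[target], R[u]) if u != target else 0.0
                 for u in range(R.shape[0])])
rated = R[:, 2] > 0  # neighbours who actually rated item 2
pred = sims[rated] @ R[rated, 2] / sims[rated].sum()
print(f"predicted rating of user 0 for item 2: {pred:.2f}")
```

Item-based filtering is the transpose of the same idea: similarities are computed between item columns, and a user's own ratings of similar items are averaged instead.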
Lecture on 11 May 2021
- Advanced factorisation models: Factorisation Machines (continued).
- Supervised Ensemble Learning. Bias-Variance decomposition. Bagging. Random Forest. Boosting for classification (AdaBoost) and regression. Stacking and Blending. Recommendation of Classifiers.
Practice plus Lecture on 18 May 2021
Practice: Bagging, Pasting, Random Projections, and Patching. Random Forest and Extra Trees. Gradient Boosting. Voting.
Lecture on Gradient Boosting.
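The ensembles from this session can be compared side by side in scikit-learn; the synthetic dataset and hyper-parameters below are illustrative, not the ones used in class.

```python
# Cross-validated comparison of bagging, random forest, and gradient boosting.
from sklearn.datasets import make_classification
from sklearn.ensemble import (BaggingClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

for model in (BaggingClassifier(n_estimators=50, random_state=0),
              RandomForestClassifier(n_estimators=50, random_state=0),
              GradientBoostingClassifier(n_estimators=50, random_state=0)):
    score = cross_val_score(model, X, y, cv=5).mean()
    print(f"{type(model).__name__:28s} {score:.3f}")
```

Bagging and random forests train their base trees independently (variance reduction), while gradient boosting fits each tree to the residual errors of the previous ones (bias reduction) — the bias-variance trade-off from the lecture in code form.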
Exam
- Date: 29.06.2021. Starting time: 11:00. Location: remote exam (see the channel announcements).
- Questions.
"What is it and how does it work?" questions based on the studied topics.
- Taxonomy of DM and ML methods.
- Classification. One-rule and Decision Stumps. Decision Trees. ID3 algorithm.
- Classification. Naïve Bayes. Smoothing.
- Classification. kNN.
- Classification. Logistic regression.
- Classification quality metrics. ROC and AUC.
- Clustering. k-means and k-medoids. Fuzzy c-means.
- Clustering. Hierarchical clustering.
- Clustering. DBScan and Mean-Shift.
- Clustering quality metrics. Silhouette. Elbow method. Cophenetic distance. Calinski and Harabasz score.
- Spectral Clustering. Laplacian graph transformation and min-cuts.
- Decision Trees. ID3. Information gain and Gini index.
- Ensemble Learning. Bias and variance decomposition. Overfitting.
- Ensemble Learning. Bagging.
- Ensemble Learning. Boosting. AdaBoost.
- Ensemble Learning. Random Forest.
- Ensemble Learning. Gradient Boosting.
- Data Mining. Frequent Itemset Mining and Association Rules. Interestingness Measures. Closed and Maximal Itemsets.
- Data Mining. Frequent Itemset Mining and Association Rules. Apriori vs. FP-growth.
- Recommender Systems. Collaborative Filtering. Item-based and user-based techniques. Quality metrics and bimodal cross-validation.
- Recommender Systems. NMF, Boolean Matrix Factorisation and SVD for Collaborative Filtering.
- Recommender Systems. Advances in matrix factorisation: PureSVD, SVD++, timeSVD, ALS, Factorisation Machines.
- Small tasks.
Examples of exercises with pen and pencil.
- Given a small 5 × 4 dataset, find its most informative attributes using Information Gain and the Gini Index.
- Given a toy set of transactions, find at least three association rules with a given support and confidence.
- Given a tiny user–item table, find the top three recommendations for a given user with the user-based and item-based approaches.
- Given a small matrix of user–item interactions, decompose it into a Boolean product of two Boolean matrices with a (preferably) smaller inner dimension.
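A solution to the last exercise can be checked mechanically: the Boolean matrix product replaces sums with logical ORs of ANDs. The 4 × 4 matrix and its rank-2 factors below are an illustrative example, not an exam instance.

```python
# Verifying a Boolean matrix factorisation: R should equal the Boolean product of U and V.
import numpy as np

R = np.array([[1, 1, 0, 0],
              [1, 1, 0, 0],
              [0, 0, 1, 1],
              [0, 0, 1, 1]], dtype=bool)

U = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=bool)  # 4 x 2 factor
V = np.array([[1, 1, 0, 0], [0, 0, 1, 1]], dtype=bool)      # 2 x 4 factor

# (U ∘ V)[i, j] = OR over k of (U[i, k] AND V[k, j])
boolean_product = (U[:, :, None] & V[None, :, :]).any(axis=1)
print(np.array_equal(boolean_product, R))  # True
```

Here the inner dimension 2 is smaller than both sides of R, which is exactly the "preferably smaller inner dimension" the exercise asks for.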