Intro to Data Mining and Machine Learning 2020/2021

Revision as of 20:13, 11 June 2021

Lecturer: Dmitry Ignatov

TA: Stefan Nikolić

Final mark formula: FM = 0.8 Homeworks + 0.2 Exam.

Contents


1 Homeworks
2 Lecture on 12 January 2021
3 Lecture on 19 January 2021
4 Lecture on 26 January 2021
5 Lecture on 2 February 2021
6 Lecture on 09 February 2021
7 Practice on 16 Feb 2021
8 Lecture on 2 March 2021
9 Lecture on 9 March 2021
10 Lecture + Practice on 16 March 2021
11 Practice on 6 April 2021
12 Lecture on 13 April 2021
13 Lecture + Practice on 25 April 2021
14 Lecture on 11 May 2021
15 Practice plus Lecture on 18 May 2021
16 Exam

Homeworks

Homework 1: Spectral Clustering
Homework 2: to be announced soon.
Homework 3: Recommender Systems

Lecture on 12 January 2021

Intro slides. Course plan. Assessment criteria. ML&DM libraries. What to read and watch?

Practice: demonstration with Orange.

Lecture on 19 January 2021

Classification. One-rule. Naïve Bayes. kNN. Logistic Regression. Train-test split and cross-validation. Quality Metrics (TP, FP, TN, FN, Precision, Recall, F-measure, Accuracy).
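A minimal plain-Python sketch of the quality metrics listed above for a binary task. The function name `binary_metrics` and the convention of reporting 0.0 when a denominator is zero are illustrative assumptions, not the course's reference code.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Confusion-matrix counts and derived metrics for a binary classifier."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    # assumption: a metric whose denominator is zero is reported as 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(pairs)
    return {"TP": tp, "FP": fp, "TN": tn, "FN": fn, "precision": precision,
            "recall": recall, "F1": f1, "accuracy": accuracy}


# toy labels with 3 TP, 1 FN, 1 FP and 3 TN
print(binary_metrics([1, 1, 1, 0, 0, 0, 1, 0], [1, 0, 1, 0, 1, 0, 1, 0]))
```

Here precision, recall, F-measure and accuracy all come out as 0.75, which is a handy sanity check when computing them by hand.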

Practice: demonstration with Orange.

Lecture on 26 January 2021

Classification (continued). Quality metrics. ROC curves.
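A sketch of how an ROC curve is traced by sweeping the decision threshold over the classifier's scores, with AUC taken by the trapezoidal rule. The helper names `roc_points` and `auc` are hypothetical; labels are assumed to be 1 (positive) and 0 (negative), with both classes present.

```python
def roc_points(y_true, scores):
    """(FPR, TPR) points obtained by lowering the threshold from the highest score."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    points, tp, fp, i = [(0.0, 0.0)], 0, 0, 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # all examples tied at this score cross the threshold together
        while i < len(ranked) and ranked[i][0] == threshold:
            tp += ranked[i][1]
            fp += 1 - ranked[i][1]
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the piecewise-linear ROC curve (trapezoidal rule)."""
    return sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(points, points[1:]))


pts = roc_points([1, 1, 0, 1, 0, 0], [0.9, 0.8, 0.7, 0.6, 0.4, 0.2])
print(pts, auc(pts))
```

The area equals the fraction of positive-negative pairs in which the positive gets the higher score: 8 of the 9 pairs here, so AUC = 8/9.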

Practice: demonstration with Orange.

Lecture on 2 February 2021

Introduction to Clustering. Taxonomy of clustering methods. K-means. K-medoids. Fuzzy C-means. Types of distance metrics. Hierarchical clustering. DBScan.
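A minimal sketch of k-means as Lloyd's iterations (assign each point to its nearest centroid, then move each centroid to the mean of its points). To keep it deterministic, the initial centroids are passed in explicitly; random or k-means++ initialisation is not shown. The names `assign` and `kmeans` are hypothetical.

```python
import math


def assign(points, centroids):
    """Index of the nearest centroid (Euclidean distance) for every point."""
    return [min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j])) for p in points]


def kmeans(points, centroids, max_iter=100):
    """Lloyd's iterations starting from the given initial centroids."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        labels = assign(points, centroids)
        updated = []
        for j, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == j]
            # move the centroid to the mean of its members; an empty cluster keeps its old centroid
            updated.append(tuple(sum(xs) / len(members) for xs in zip(*members)) if members else old)
        if updated == centroids:  # converged: the assignments can no longer change
            break
        centroids = updated
    return assign(points, centroids), centroids


labels, centers = kmeans([(1, 1), (1.5, 2), (1, 0), (8, 8), (9, 8), (8, 9)], [(1, 1), (8, 8)])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

K-medoids follows the same loop but restricts each centre to be one of the data points.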

Practice: DBScan Demo.

Lecture on 09 February 2021

Introduction to Clustering (continued). Density-based techniques. DBScan and Mean-shift.
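A compact sketch of DBSCAN: a point with at least `min_pts` points in its `eps`-neighbourhood (itself included) is a core point, and clusters grow from core points. Points reached only as neighbours become border points, and the rest are noise (label -1). The function name and the brute-force neighbourhood search are illustrative assumptions.

```python
import math


def dbscan(points, eps, min_pts):
    """Cluster ids 0, 1, ... for each point; -1 marks noise."""
    n = len(points)

    def region(i):
        # eps-neighbourhood of point i, including i itself
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = region(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # noise unless a later core point reaches it
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins the cluster but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            neighbours = region(j)
            if len(neighbours) >= min_pts:  # j is a core point, so keep growing the cluster
                queue.extend(neighbours)
    return labels


pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (50, 50)]
print(dbscan(pts, eps=1.5, min_pts=3))  # [0, 0, 0, 0, 1, 1, 1, -1]
```

Unlike k-means, the number of clusters is not given in advance; it follows from `eps` and `min_pts`.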

Graph and spectral clustering. Min-cuts and normalized cuts. Laplacian matrix. Fiedler vector. Applications.
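A small sketch of the Laplacian and the Fiedler vector: L = D - A, and the signs of the eigenvector for the second-smallest eigenvalue of L give an approximate min-cut bipartition. To stay dependency-free, it uses power iteration on cI - L with the constant eigenvector projected out. This is a stand-in for a proper eigensolver (e.g. NumPy) and assumes a connected graph with distinct second and third eigenvalues. The helper names are hypothetical.

```python
def laplacian(adj):
    """Unnormalised graph Laplacian L = D - A of a symmetric adjacency matrix without self-loops."""
    n = len(adj)
    return [[(sum(adj[i]) if i == j else 0) - adj[i][j] for j in range(n)] for i in range(n)]


def fiedler_vector(adj, iters=500):
    """Approximate eigenvector of the second-smallest eigenvalue of L by power iteration."""
    L = laplacian(adj)
    n = len(L)
    c = 2 * max(L[i][i] for i in range(n))  # every eigenvalue of L is <= 2 * max degree
    x = [float(i + 1) for i in range(n)]    # arbitrary non-constant start vector
    for _ in range(iters):
        mean = sum(x) / n
        x = [v - mean for v in x]           # project out the constant eigenvector (eigenvalue 0)
        y = [c * x[i] - sum(L[i][j] * x[j] for j in range(n)) for i in range(n)]  # y = (cI - L) x
        norm = sum(v * v for v in y) ** 0.5
        x = [v / norm for v in y]
    return x


# two triangles {0, 1, 2} and {3, 4, 5} joined by the single edge 2-3
adj = [[0] * 6 for _ in range(6)]
for a, b in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    adj[a][b] = adj[b][a] = 1
f = fiedler_vector(adj)
print({i for i in range(6) if f[i] > 0})
```

The sign split recovers the two triangles, i.e. the cut through the single bridging edge.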

Practice on 16 Feb 2021

Clustering with scikit-learn (k-means, hierarchical clustering, DBScan, MeanShift, Spectral Clustering).

Lecture on 2 March 2021

Practice: Spectral clustering.

Lecture: Decision tree learning. ID3. Information Entropy. Information gain. Gini coefficient and index. Overfitting and pruning. Decision trees for numeric data. Oblivious decision trees. Regression trees.
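A sketch of the split criteria used when growing a decision tree: entropy and information gain (as in ID3) and the Gini index. Rows are assumed to be dicts with categorical attributes, and the helper names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini index (impurity) of a list of class labels."""
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())


def impurity_gain(rows, attribute, target, impurity=entropy):
    """Parent impurity minus the size-weighted impurity of the children after splitting on attribute.

    With impurity=entropy this is ID3's information gain."""
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    children = sum(len(g) / len(rows) * impurity(g) for g in groups.values())
    return impurity([row[target] for row in rows]) - children


rows = [{"outlook": "sunny", "windy": False, "play": "no"},
        {"outlook": "sunny", "windy": True, "play": "no"},
        {"outlook": "rainy", "windy": False, "play": "yes"},
        {"outlook": "rainy", "windy": True, "play": "yes"}]
print(impurity_gain(rows, "outlook", "play"), impurity_gain(rows, "windy", "play"))
```

On this toy table `outlook` separates the classes perfectly (gain 1 bit, Gini decrease 0.5), while `windy` gives no gain, so ID3 would split on `outlook` first.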

Lecture on 9 March 2021

Frequent Itemsets. Association Rules. Algorithms: Apriori, FP-growth. Interestingness measures. Closed and maximal itemsets.
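A level-wise sketch of Apriori with its join and prune steps, plus rule confidence. Support is taken as an absolute transaction count here (an assumption; relative support works the same way), and the helper names are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """All itemsets contained in at least min_count transactions, as {frozenset: count}."""
    transactions = [frozenset(t) for t in transactions]
    level = {frozenset([item]) for t in transactions for item in t}  # candidate 1-itemsets
    frequent, k = {}, 1
    while level:
        counts = {c: sum(1 for t in transactions if c <= t) for c in level}
        survivors = {c: s for c, s in counts.items() if s >= min_count}
        frequent.update(survivors)
        k += 1
        # join: unions of two frequent (k-1)-itemsets that have exactly k items
        joined = {a | b for a in survivors for b in survivors if len(a | b) == k}
        # prune (Apriori property): every (k-1)-subset of a candidate must itself be frequent
        level = {c for c in joined if all(frozenset(s) in survivors for s in combinations(c, k - 1))}
    return frequent


def confidence(frequent, antecedent, consequent):
    """conf(X -> Y) = count(X with Y) / count(X); X with Y must be frequent."""
    x = frozenset(antecedent)
    return frequent[x | frozenset(consequent)] / frequent[x]


baskets = [{"bread", "milk"}, {"bread", "diapers", "beer", "eggs"},
           {"milk", "diapers", "beer", "cola"}, {"bread", "milk", "diapers", "beer"},
           {"bread", "milk", "diapers", "cola"}]
freq = apriori(baskets, min_count=3)
print(confidence(freq, {"beer"}, {"diapers"}), confidence(freq, {"diapers"}, {"beer"}))
```

With a minimum count of 3 this yields four frequent single items and four frequent pairs; {beer} -> {diapers} has confidence 1.0, while the reverse rule has 0.75.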

Lecture + Practice on 16 March 2021

Frequent Itemset Mining (continued). Applications: 1) Taxonomies of Website Visitors and 2) Web advertising.

Exercises. Frequent Itemsets. FP-growth. Closed itemsets.
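For the closed-itemset exercises, a sketch that separates closed and maximal itemsets, given frequent itemsets with their support counts (for example, Apriori or FP-growth output). The function name and input format are assumptions for illustration.

```python
def closed_and_maximal(frequent):
    """Split {frozenset: count} frequent itemsets into closed and maximal ones."""
    closed, maximal = set(), set()
    for itemset, count in frequent.items():
        supersets = [s for s in frequent if itemset < s]  # proper frequent supersets
        # closed: no proper superset has the same support (such a superset would also be frequent)
        if all(frequent[s] != count for s in supersets):
            closed.add(itemset)
        # maximal: no proper superset is frequent at all
        if not supersets:
            maximal.add(itemset)
    return closed, maximal


fs = frozenset
frequent = {fs({"bread"}): 4, fs({"milk"}): 4, fs({"diapers"}): 4, fs({"beer"}): 3,
            fs({"bread", "milk"}): 3, fs({"bread", "diapers"}): 3,
            fs({"milk", "diapers"}): 3, fs({"diapers", "beer"}): 3}
closed, maximal = closed_and_maximal(frequent)
print(sorted(map(sorted, maximal)))
```

Here {beer} is not closed because {diapers, beer} has the same support (3), and only the four pairs are maximal. Every maximal itemset is also closed.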

Practice. Orange, SPMF, Concept Explorer.

Practice on 6 April 2021

Practice. scikit-learn tutorial on kNN, Decision Trees, Naïve Bayes, Logistic Regression, SVM, etc.
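Outside scikit-learn, a from-scratch sketch of categorical Naïve Bayes with add-alpha (Laplace) smoothing helps show what the library computes. It is not scikit-learn's API. The names `train_nb` and `predict_nb` and the toy weather data are illustrative.

```python
import math
from collections import Counter, defaultdict


def train_nb(X, y):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(y)
    value_counts = defaultdict(Counter)  # (attribute index, class) -> Counter of values
    domains = [set() for _ in X[0]]      # distinct values seen for each attribute
    for row, label in zip(X, y):
        for j, value in enumerate(row):
            value_counts[(j, label)][value] += 1
            domains[j].add(value)
    return class_counts, value_counts, domains


def predict_nb(model, row, alpha=1.0):
    """Most probable class under the naive independence assumption, with add-alpha smoothing."""
    class_counts, value_counts, domains = model
    n = sum(class_counts.values())

    def log_posterior(label):
        score = math.log(class_counts[label] / n)
        for j, value in enumerate(row):
            # smoothing stops an unseen (value, class) pair from zeroing the whole product
            numerator = value_counts[(j, label)][value] + alpha
            score += math.log(numerator / (class_counts[label] + alpha * len(domains[j])))
        return score

    return max(class_counts, key=log_posterior)


X = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"), ("rainy", "cool"), ("overcast", "hot")]
y = ["no", "no", "yes", "yes", "yes"]
model = train_nb(X, y)
print(predict_nb(model, ("sunny", "cool")), predict_nb(model, ("rainy", "mild")))
```

Without smoothing, both classes would score zero for ("sunny", "cool") because neither class has seen both values. With add-one smoothing, "no" wins (0.048 vs. about 0.033).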

Lecture on 13 April 2021

Introduction to Recommender Systems. Taxonomy of Recommender Systems (non-personalised, content-based, collaborative filtering, hybrid, etc.). Real-world examples. User-based and item-based collaborative filtering. Bimodal cross-validation.
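A sketch of one common variant of user-based collaborative filtering: cosine similarity between users (unrated items treated as 0), then a similarity-weighted average of the k nearest neighbours' ratings for items the target user has not rated. The helper names and the toy ratings are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two sparse rating dicts; unrated items count as 0."""
    dot = sum(u[i] * v[i] for i in set(u) & set(v))
    norm_u = math.sqrt(sum(r * r for r in u.values()))
    norm_v = math.sqrt(sum(r * r for r in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recommend_user_based(ratings, user, k=2):
    """Rank items the user has not rated by a similarity-weighted average over the k nearest users."""
    neighbours = sorted(((cosine(ratings[user], ratings[o]), o) for o in ratings if o != user),
                        reverse=True)[:k]
    candidates = {i for _, o in neighbours for i in ratings[o]} - set(ratings[user])
    scores = {}
    for item in candidates:
        raters = [(s, ratings[o][item]) for s, o in neighbours if item in ratings[o]]
        weight = sum(abs(s) for s, _ in raters)
        if weight:
            scores[item] = sum(s * r for s, r in raters) / weight
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


ratings = {"alice": {"A": 5, "B": 3, "C": 4},
           "bob": {"A": 5, "B": 3, "C": 4, "D": 5},
           "carol": {"A": 1, "B": 5, "D": 1, "E": 4},
           "dave": {"B": 4, "C": 1, "D": 1, "E": 5}}
print(recommend_user_based(ratings, "alice"))
```

The item-based approach swaps the roles: similarities are computed between item columns, and a user's own ratings of similar items are averaged.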

Lecture + Practice on 25 April 2021

Practice: User-based and item-based collaborative filtering with Python and MovieLens.

Case-study: Non-negative Matrix Factorisation, Boolean Matrix Factorisation vs. SVD in Collaborative Filtering.

Lecture: Advanced factorisation models: PureSVD, SVD++, timeSVD, ALS.

Lecture on 11 May 2021

Advanced factorisation models: Factorisation Machines (continued).
Supervised Ensemble Learning. Bias-Variance decomposition. Bagging. Random Forest. Boosting for classification (AdaBoost) and regression. Stacking and Blending. Recommendation of Classifiers.
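A sketch of a single AdaBoost round for labels in {-1, +1}: the weighted error gives the learner's vote alpha = ½ ln((1 - err) / err), and the sample weights of misclassified examples grow. The function name `adaboost_round` is hypothetical, and the weighted error is assumed to lie strictly between 0 and 1.

```python
import math


def adaboost_round(weights, y_true, y_pred):
    """Vote alpha of one weak learner and the renormalised sample weights."""
    err = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p) / sum(weights)
    alpha = 0.5 * math.log((1 - err) / err)  # requires 0 < err < 1
    # t * p is +1 for a correct prediction and -1 for a mistake
    updated = [w * math.exp(-alpha * t * p) for w, t, p in zip(weights, y_true, y_pred)]
    z = sum(updated)
    return alpha, [w / z for w in updated]


alpha, w = adaboost_round([0.25] * 4, [1, 1, -1, -1], [1, -1, -1, -1])
print(alpha, w)
```

After the update, the misclassified examples carry exactly half of the total weight (0.5 here), so the next weak learner is pushed to focus on them.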

Practice plus Lecture on 18 May 2021

Practice: Bagging, Pasting, Random Projections, and Patching. Random Forest and Extra Trees. Gradient Boosting. Voting.

Lecture on Gradient Boosting.
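A sketch of gradient boosting for regression with squared loss on one-dimensional inputs. Each depth-1 tree (stump) is fitted to the current residuals, which are the negative gradient of ½(y - F(x))², and added with a learning rate. The stump learner, the names and the defaults are illustrative assumptions. At least two distinct x values are required.

```python
def fit_stump(x, r):
    """Best single-threshold regression stump for targets r under squared error."""
    best = None
    xs = sorted(set(x))
    for lo, hi in zip(xs, xs[1:]):
        t = (lo + hi) / 2
        left = [ri for xi, ri in zip(x, r) if xi <= t]
        right = [ri for xi, ri in zip(x, r) if xi > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    _, t, lm, rm = best
    return lambda v: lm if v <= t else rm


def gradient_boost(x, y, n_rounds=50, lr=0.1):
    """F(x) = mean(y) + lr * (sum of stumps), each stump fitted to the residuals y - F(x)."""
    f0 = sum(y) / len(y)
    stumps, pred = [], [f0] * len(y)
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]  # negative gradient of squared loss
        h = fit_stump(x, residuals)
        stumps.append(h)
        pred = [pi + lr * h(xi) for pi, xi in zip(pred, x)]
    return lambda v: f0 + lr * sum(h(v) for h in stumps)


model = gradient_boost([1, 2, 3, 4, 5, 6], [1, 1, 1, 5, 5, 5])
print(model(1), model(6))
```

With lr = 0.1 each round removes 10% of the remaining residual here, so after 50 rounds the predictions are within about 0.01 of the targets. A smaller learning rate needs more rounds but usually generalises better.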

Exam

Date: 29.06.2021. Starting time: 11:00. Location: remote exam (see the channel announcements).

Questions.

"What is it and how does it work?" questions based on the topics studied.

Taxonomy of DM and ML methods.
Classification. One-rule and Decision Stumps. Decision Trees. ID3 algorithm.
Classification. Naïve Bayes. Smoothing.
Classification. kNN.
Classification. Logistic regression.
Classification quality metrics. ROC and AUC.
Clustering. k-means and k-medoids. Fuzzy c-means.
Clustering. Hierarchical clustering.
Clustering. DBScan and Mean-Shift.
Clustering quality metrics. Silhouette. Elbow method. Cophenetic distance. Calinski-Harabasz score.
Spectral Clustering. Laplacian graph transformation and min-cuts.
Decision Trees. ID3. Information gain and Gini index.
Ensemble Learning. Bias and variance decomposition. Overfitting.
Ensemble Learning. Bagging.
Ensemble Learning. Boosting. AdaBoost.
Ensemble Learning. Random Forest.
Ensemble Learning. Gradient Boosting.
Data Mining. Frequent Itemset Mining and Association Rules. Interestingness Measures. Closed and Maximal Itemsets.
Data Mining. Frequent Itemset Mining and Association Rules. Apriori vs. FP-growth.
Recommender Systems. Collaborative Filtering. Item-based and user-based techniques. Quality metrics and bimodal cross-validation.
Recommender Systems. NMF, Boolean Matrix Factorisation and SVD for Collaborative Filtering.
Recommender Systems. Advances in matrix factorisation: PureSVD, SVD++, timeSVD, ALS, Factorisation Machines.

Small tasks.

Examples of pen-and-paper exercises:

Given a small 5 × 4 dataset, find its most informative attributes based on Information Gain and the Gini Index.
Given a toy set of transactions, find at least three association rules that satisfy given minimum support and confidence thresholds.
Given a tiny user-item table, find the top three recommendations for a given user using the user-based and item-based approaches.
Given a small user-item interaction matrix, decompose it into a Boolean product of two matrices, preferably with an inner dimension smaller than that of the original matrix.
