Intro to Data Mining and Machine Learning 2020/2021


Lecturer: Dmitry Ignatov

TA: Stefan Nikolić


Homeworks

  • Homework 1: Spectral Clustering
  • Homework 2:
  • Homework 3: Recommender Systems

Lecture on 12 January 2021

Intro slides. Course plan. Assessment criteria. ML&DM libraries. What to read and watch?

Practice: demonstration with Orange.


Lecture on 19 January 2021

Classification. One-rule. Naïve Bayes. kNN. Logistic Regression. Train-test split and cross-validation. Quality Metrics (TP, FP, TN, FN, Precision, Recall, F-measure, Accuracy).

Practice: demonstration with Orange.
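
Below is a minimal scikit-learn sketch (not the course notebook) of the topics in this lecture: a train-test split, kNN and Naïve Bayes classifiers, and the quality metrics TP/FP/TN/FN, precision, recall, F-measure and accuracy. The breast-cancer dataset bundled with scikit-learn stands in for the demo data shown in Orange.

```python
# Minimal sketch: kNN and Naive Bayes with a train-test split and the
# quality metrics named in the lecture. Dataset is a stand-in.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import confusion_matrix, precision_score, recall_score, f1_score, accuracy_score

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

for clf in (KNeighborsClassifier(n_neighbors=5), GaussianNB()):
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()  # TN, FP, FN, TP
    print(clf.__class__.__name__,
          "TP=%d FP=%d TN=%d FN=%d" % (tp, fp, tn, fn),
          "precision=%.3f" % precision_score(y_test, y_pred),
          "recall=%.3f" % recall_score(y_test, y_pred),
          "F1=%.3f" % f1_score(y_test, y_pred),
          "accuracy=%.3f" % accuracy_score(y_test, y_pred))

# 5-fold cross-validation instead of a single split
print("CV accuracy:", cross_val_score(GaussianNB(), X, y, cv=5).mean())
```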

Lecture on 26 January 2021

Classification (continued). Quality metrics. ROC curves.

Practice: demonstration with Orange.
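
A hedged sketch of the ROC-curve material: a logistic-regression scorer on a stand-in dataset, the curve obtained by sweeping the decision threshold, and the area under it. The dataset and model choices are illustrative, not those from the Orange demonstration.

```python
# Minimal sketch: ROC curve and AUC for a probabilistic classifier.
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, roc_auc_score

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=5000).fit(X_train, y_train)
scores = clf.predict_proba(X_test)[:, 1]          # P(class = 1)

fpr, tpr, thresholds = roc_curve(y_test, scores)  # sweep the decision threshold
print("AUC =", roc_auc_score(y_test, scores))

plt.plot(fpr, tpr, label="logistic regression")
plt.plot([0, 1], [0, 1], "--", label="random guessing")
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.legend()
plt.show()
```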

Lecture on 2 February 2021

Introduction to Clustering. Taxonomy of clustering methods. K-means. K-medoids. Fuzzy C-means. Types of distance metrics. Hierarchical clustering. DBScan.

Practice: DBScan Demo.
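
A minimal sketch contrasting k-means with DBScan on synthetic two-moon data, which shows why a density-based method recovers non-convex clusters; eps and min_samples are illustrative values, not the ones from the demo.

```python
# Minimal sketch: k-means vs. DBSCAN on synthetic non-convex clusters.
from sklearn.datasets import make_moons
from sklearn.cluster import KMeans, DBSCAN

X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
dbscan_labels = DBSCAN(eps=0.3, min_samples=5).fit_predict(X)  # -1 marks noise points

print("k-means clusters:", set(kmeans_labels))
print("DBSCAN clusters:", set(dbscan_labels))
```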

Lecture on 9 February 2021

  • Introduction to Clustering (continued). Density-based techniques. DBScan and Mean-shift.
  • Graph and spectral clustering. Min-cuts and normalized cuts. Laplacian matrix. Fiedler vector. Applications.
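
As a small worked example of the spectral part (an illustrative sketch, not lecture code): the unnormalised Laplacian L = D − W of a tiny made-up graph and its Fiedler vector, whose sign pattern gives a two-way cut.

```python
# Minimal sketch: graph Laplacian and Fiedler vector for a 2-way spectral cut.
# The adjacency matrix is invented: two triangles joined by a single edge.
import numpy as np

W = np.array([[0, 1, 1, 0, 0, 0],
              [1, 0, 1, 0, 0, 0],
              [1, 1, 0, 1, 0, 0],
              [0, 0, 1, 0, 1, 1],
              [0, 0, 0, 1, 0, 1],
              [0, 0, 0, 1, 1, 0]], dtype=float)

D = np.diag(W.sum(axis=1))       # degree matrix
L = D - W                        # unnormalised Laplacian
eigvals, eigvecs = np.linalg.eigh(L)

fiedler = eigvecs[:, 1]          # eigenvector of the second-smallest eigenvalue
print("Fiedler vector:", np.round(fiedler, 3))
print("Partition:", (fiedler > 0).astype(int))   # sign gives the two clusters
```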

Practice on 16 Feb 2021

Clustering with scikit-learn (k-means, hierarchical clustering, DBScan, MeanShift, Spectral Clustering).
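
A minimal sketch of the scikit-learn estimators listed for this practice, run on synthetic blobs; all parameters are illustrative.

```python
# Minimal sketch of the scikit-learn clustering APIs used in this practice.
from sklearn.datasets import make_blobs
from sklearn.cluster import (KMeans, AgglomerativeClustering, DBSCAN,
                             MeanShift, SpectralClustering)

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

models = {
    "k-means": KMeans(n_clusters=3, n_init=10, random_state=0),
    "hierarchical": AgglomerativeClustering(n_clusters=3),
    "DBSCAN": DBSCAN(eps=1.0, min_samples=5),
    "MeanShift": MeanShift(),
    "Spectral": SpectralClustering(n_clusters=3, random_state=0),
}
for name, model in models.items():
    labels = model.fit_predict(X)
    print(name, "found", len(set(labels) - {-1}), "clusters")  # -1 = noise (DBSCAN)
```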

Lecture on 2 March 2021

Practice: Spectral clustering.

Lecture: Decision tree learning. ID3. Information Entropy. Information gain. Gini coefficient and index. Overfitting and pruning. Decision trees for numeric data. Oblivious decision trees. Regression trees.
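
A small sketch of the quantities ID3 relies on: information entropy and the information gain of one candidate binary split, followed by a scikit-learn tree grown with the entropy criterion. The toy labels and the Iris dataset are stand-ins, not lecture material.

```python
# Minimal sketch: entropy and information gain for one binary split,
# plus a scikit-learn decision tree using the entropy criterion.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris

def entropy(labels):
    """Shannon entropy of a label array, in bits."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

y = np.array([1, 1, 1, 0, 0, 1, 0, 0, 1, 1])      # toy class labels
split = np.array([1, 1, 1, 1, 0, 0, 0, 0, 1, 1])  # candidate binary attribute

h_parent = entropy(y)
h_children = sum((split == v).mean() * entropy(y[split == v]) for v in (0, 1))
print("information gain =", h_parent - h_children)

X, y_iris = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(criterion="entropy", max_depth=3).fit(X, y_iris)
print("training accuracy:", tree.score(X, y_iris))
```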

Lecture on 9 March 2021

Frequent Itemsets. Association Rules. Algorithms: Apriori, FP-growth. Interestingness measures. Closed and maximal itemsets.
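
A minimal pure-Python sketch of the Apriori idea (level-wise candidate generation with a support threshold) on an invented five-transaction database; a real run would use a dedicated tool such as SPMF or Orange.

```python
# Minimal sketch of Apriori: level-wise generation of frequent itemsets.
transactions = [{"bread", "milk"},
                {"bread", "beer", "eggs"},
                {"milk", "beer", "cola"},
                {"bread", "milk", "beer"},
                {"bread", "milk", "cola"}]
min_support = 3  # absolute support threshold

def support(itemset):
    return sum(itemset <= t for t in transactions)

# L1: frequent single items
items = {i for t in transactions for i in t}
frequent = [{frozenset([i]) for i in items if support(frozenset([i])) >= min_support}]

# Lk: join frequent (k-1)-itemsets, keep candidates that meet the threshold
k = 1
while frequent[-1]:
    candidates = {a | b for a in frequent[-1] for b in frequent[-1] if len(a | b) == k + 1}
    frequent.append({c for c in candidates if support(c) >= min_support})
    k += 1

for level in frequent:
    for itemset in sorted(level, key=sorted):
        print(set(itemset), "support =", support(itemset))
```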

Lecture + Practice on 16 March 2021

Frequent Itemset Mining (continued). Applications: 1) Taxonomies of Website Visitors and 2) Web advertising.

Exercises. Frequent Itemsets. FP-growth. Closed itemsets.

Practice. Orange, SPMF, Concept Explorer.

Practice on 6 April 2021

Practice. Scikit-learn tutorial on kNN, Decision Trees, Naïve Bayes, Logistic Regression, SVM, etc.
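
A sketch in the spirit of this tutorial: the five classifier families named above compared by 5-fold cross-validated accuracy, with the Iris dataset as a stand-in and hyper-parameters left at illustrative defaults.

```python
# Minimal sketch: compare the classifiers from this practice by CV accuracy.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
classifiers = {
    "kNN": KNeighborsClassifier(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Naive Bayes": GaussianNB(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "SVM": SVC(),
}
for name, clf in classifiers.items():
    model = make_pipeline(StandardScaler(), clf)   # scale features, then classify
    scores = cross_val_score(model, X, y, cv=5)
    print("%-20s mean accuracy = %.3f" % (name, scores.mean()))
```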

Lecture on 13 April 2021

Introduction to Recommender systems. Taxonomy of Recommender Systems (non-personalised, content-based, collaborative filtering, hybrid, etc.). Real Examples. User-based and item-based collaborative filtering. Bimodal cross-validation.
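
A minimal sketch of user-based collaborative filtering: cosine similarity between users of a tiny invented rating matrix (0 = not rated) and a similarity-weighted prediction for one missing rating. This is an illustration, not the course implementation.

```python
# Minimal sketch: user-based collaborative filtering on a toy rating matrix.
import numpy as np

R = np.array([[5, 4, 0, 1],
              [4, 5, 1, 0],
              [1, 0, 5, 4],
              [0, 1, 4, 5]], dtype=float)   # users x items, 0 = not rated

def cosine(u, v):
    mask = (u > 0) & (v > 0)                # compare only co-rated items
    if not mask.any():
        return 0.0
    return float(u[mask] @ v[mask] / (np.linalg.norm(u[mask]) * np.linalg.norm(v[mask])))

target_user, target_item = 0, 2             # predict R[0, 2]
sims = np.array([cosine(R[target_user], R[u]) for u in range(len(R))])
neighbours = [u for u in range(len(R)) if u != target_user and R[u, target_item] > 0]

pred = sum(sims[u] * R[u, target_item] for u in neighbours) / sum(sims[u] for u in neighbours)
print("predicted rating:", round(pred, 2))
```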

Lecture + Practice on 25 April 2021

Practice: User-based and item-based collaborative filtering with Python and MovieLens.

Case-study: Non-negative Matrix Factorisation, Boolean Matrix Factorisation vs. SVD in Collaborative Filtering.

Lecture: Advanced factorisation models: PureSVD, SVD++, timeSVD, ALS.
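
A small sketch related to the case study and the factorisation lecture: rank-2 truncated SVD and non-negative matrix factorisation of an invented rating matrix. A real experiment would use MovieLens, and the advanced models (PureSVD, SVD++, timeSVD, ALS, factorisation machines) need dedicated libraries not shown here.

```python
# Minimal sketch: truncated SVD vs. NMF on a small toy rating matrix.
import numpy as np
from sklearn.decomposition import TruncatedSVD, NMF

R = np.array([[5, 4, 0, 1],
              [4, 5, 1, 0],
              [1, 0, 5, 4],
              [0, 1, 4, 5]], dtype=float)

# Rank-2 truncated SVD: R ~= (U * Sigma) * V^T
svd = TruncatedSVD(n_components=2, random_state=0)
U = svd.fit_transform(R)                      # U * Sigma
print("SVD reconstruction:\n", np.round(U @ svd.components_, 2))

# Rank-2 NMF: R ~= W * H with non-negative factors
nmf = NMF(n_components=2, init="nndsvda", random_state=0, max_iter=500)
W = nmf.fit_transform(R)
print("NMF reconstruction:\n", np.round(W @ nmf.components_, 2))
```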

Lecture on 11 May 2021

  • Advanced factorisation models: Factorisation Machines (continued).
  • Supervised Ensemble Learning. Bias-Variance decomposition. Bagging. Random Forest. Boosting for classification (AdaBoost) and regression. Stacking and Blending. Recommendation of Classifiers.
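
A minimal sketch of the ensemble methods listed above: a single tree versus bagging, random forest and AdaBoost, compared by cross-validated accuracy on a stand-in dataset; hyper-parameters are illustrative.

```python
# Minimal sketch: bagging, random forest and AdaBoost vs. a single tree.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier, AdaBoostClassifier

X, y = load_breast_cancer(return_X_y=True)
models = {
    "single tree": DecisionTreeClassifier(random_state=0),
    "bagging": BaggingClassifier(DecisionTreeClassifier(), n_estimators=100, random_state=0),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "AdaBoost": AdaBoostClassifier(n_estimators=100, random_state=0),
}
for name, model in models.items():
    print("%-14s accuracy = %.3f" % (name, cross_val_score(model, X, y, cv=5).mean()))
```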

Practice + Lecture on 18 May 2021

Practice: Bagging, Pasting, Random Projections, and Patching. Random Forest and Extra Trees. Gradient Boosting. Voting.

Lecture on Gradient Boosting.
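
A hedged sketch of the gradient-boosting idea for regression: each new shallow tree is fitted to the current residuals (the negative gradient of squared loss) and added with a shrinkage factor. The data is synthetic, and a scikit-learn GradientBoostingRegressor is shown for comparison.

```python
# Minimal sketch: gradient boosting for regression, written out by hand.
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

learning_rate, n_trees = 0.1, 100
pred = np.full_like(y, y.mean())              # initial constant model
for _ in range(n_trees):
    residuals = y - pred                      # negative gradient of 1/2 * (y - pred)^2
    tree = DecisionTreeRegressor(max_depth=2).fit(X, residuals)
    pred += learning_rate * tree.predict(X)   # shrink each tree's contribution

print("hand-rolled training MSE:", round(float(np.mean((y - pred) ** 2)), 4))

# Library counterpart with scikit-learn, for comparison
gbr = GradientBoostingRegressor(n_estimators=100, learning_rate=0.1, max_depth=2).fit(X, y)
print("sklearn training MSE:", round(float(np.mean((y - gbr.predict(X)) ** 2)), 4))
```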