Data Analysis in Python 2020-2021

From Wiki - Faculty of Computer Science
Revision as of 17:17, 4 May 2021; Marina Ananyeva

About the course

The course is conducted for students of Bachelor’s Programme 'HSE and University of London Parallel Degree Programme in International Relations'.

Abstract: This course introduces students to the rapidly growing field of data analytics, with a specific focus on the Python programming language. Students will learn the concepts, techniques and tools they need to make meaningful inferences from data. They will work with real-world data sets to gain practical skills in data manipulation. Each week will involve seminars and coding simulations. In the final project, students will build working code that can be readily applied to exploratory data analysis in their own (future) research domain.

Syllabus: open

Required Software

  • Anaconda (Python version >= 3.8)
  • Jupyter Notebook

How to install Anaconda on Mac OS
How to install Anaconda on Windows


Presentations and all materials will be available immediately after each practical class. Additional materials will be used in quizzes at the following seminar.

GitHub with the materials from our practical classes:

Week | Topic | Slides | Tutorial | Additional Materials | Assignment | Deadline
1 | Introduction | Intro Slides | How to install Anaconda on Mac OS; How to install Anaconda on Windows | 1. Official Python tutorial & documentation; 2. Coursera. Python for Everybody Specialization; 3. Coursera. Crash Course on Python; 4. Snakify. A lot of online exercises in Python | No assignment this time. Yay! | -
2 | Input, output. Numbers, strings. Arithmetical operations | - | Tutorial | 1. PEP8 Style Guide; 2. Python Numbers Exercises; 3. Input and Output in Python | HA1 | 23:59, February 7, 2021
3 | Lists and tuples. For & while loops | - | Tutorial | - | - | -
4 | Dictionaries, sets, strings | - | Tutorial | - | HA2 | 23:59, February 18, 2021
5 | Functions | - | Tutorial | - | HA3 | 23:59, March 3, 2021
6 | In-class Assignment 1 | - | - | - | Assignments of all groups | -
7 | Introduction to data analysis, files processing | - | Tutorial | How to read the most commonly used files | - | -
8 | Pandas. Part 1 | - | Tutorial | Pandas Community Tutorials | HA4 | 23:59, March 25, 2021
9 | Pandas. Part 2 | - | Tutorial | Pandas Exercises on different topics | - | -
10 | Web scraping & parsing | - | Tutorial | - | - | -
11 | In-class Assignment 2 | - | - | - | Assignment for all groups | April 8, 23:59 / April 10, 23:59
12 | Statistical hypotheses | - | Tutorial | - | - | -
13 | Intro to logistic regression | - | Tutorial | - | - | -
14 | Group projects (presentations) | - | - | - | Submission link, Instructions | May 18, 9:00 a.m.


The course includes 8 home assignments (10 pts each), each completed individually. Short home assignments will be published almost every week starting from Week 2 (weeks 2, 3, 4, 8, 9, 10, 13, 14), based on the materials of the previous practical classes.

Two in-class assignments (10 pts each) will be held in the format of problem-solving tasks and coding in Python using an online platform (e.g. Yandex Contest or GitHub Classroom). Problem set 1 deals with the basics of working in Python with data types and data structures; problem set 2 involves exploratory data analysis and visualization tasks.

Each task is checked for plagiarism. A code match of more than 25% is considered plagiarism and results in 1 point out of 10, with the right to appeal. If more than 40% of the code matches, the assignment is voided (0 points) without the right to appeal.

During the week after each assignment deadline, each student will be offered a convenient time for a Zoom call with the lecturer and TA to answer questions about their code and explain their solutions.

File naming standard: Please name your solution files in this format: Assignment # _ # Number # _ # Group number # _ # Name # _ # Surname #. Example: HA1_Morty_Smith_195.ipynb

GitHub with tutorials and assignments:

Links for submitting your assignments: coming soon!


All course materials, assignments, and deadlines will be published on this page.

Important announcements from the teaching team will be sent to the Telegram channel:

The group with 24/7 online support in Telegram for Q&A, discussions, technical issues, and moral support:

Tutor: Marina Ananyeva Email Telegram

Module 3

Group Schedule
195 Tuesday 9.30-10.50
191 Tuesday 11.10-12.30
196 Tuesday 13.00-14.20
194 Thursday 9.30-10.50
192 Thursday 11.10-12.30
193 Thursday 13.00-14.20

Module 4

Group Schedule
194 Thursday 9.30-10.50
192 Thursday 11.10-12.30
193 Thursday 13.00-14.20
191 Saturday 9.30-10.50
196 Saturday 11.10-12.30
195 Saturday 13.00-14.20


We would greatly appreciate your help in making this course better. Feel free to share your ideas and feedback!

Anonymous feedback form: click_here


Final Grade = 0.4*home assignments + 0.3*group project + 0.2*in-class assignments + 0.1*in-class participation
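
The weighting above can be sketched in Python. This is only an illustration under stated assumptions: the page does not say how individual scores are aggregated, so the sketch assumes that home and in-class assignments are averaged to a 10-point scale, and that participation points are capped at 10 as the participation rule on this page describes. The function name `final_grade` and the sample scores are hypothetical.

```python
def final_grade(home_scores, project_score, in_class_scores, participation_points):
    """Weighted final grade on a 10-point scale (no rounding applied)."""
    # Assumption: each component is the plain average of its 10-point scores.
    home_avg = sum(home_scores) / len(home_scores)
    in_class_avg = sum(in_class_scores) / len(in_class_scores)
    # Participation cannot exceed 10 points in total.
    participation = min(participation_points, 10)
    return (0.4 * home_avg
            + 0.3 * project_score
            + 0.2 * in_class_avg
            + 0.1 * participation)


# Hypothetical student: 8 home assignments, 2 in-class assignments.
example = final_grade(
    home_scores=[10, 8, 9, 7, 10, 6, 8, 10],
    project_score=9,
    in_class_scores=[8, 7],
    participation_points=12,
)
print(round(example, 2))
```

For these made-up scores the components are 8.5, 9, 7.5 and 10, so the weighted sum is approximately 8.6.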

Table with grades

In-class participation (10 pts)

Activity during class is graded at one point per seminar. Points are earned by answering questions and solving tasks during the seminar. If a student earns more than 10 points in total, the score is capped at 10.

Group project (10 pts)

Maximum group size: 4 students. Group project evaluation criteria:

  • the purpose of the study is clearly stated (1 point);
  • all steps of the research process are described in a clear and concise way (2 points);
  • research outcomes are clearly defined (2 points);
  • includes intuitive visualizations of research outcomes (2 points);
  • all members of the project team are able to explain the code used for computations (1 point);
  • code is properly structured (1 point);
  • meets submission timeline (1 point).

Home assignments (10 pts each) – weeks 2, 3, 4, 8, 9, 10, 13, 14

A home assignment will be given 8 times during the course. These assignments are problem sets to be solved in Python. Sample problem:

  • Open the file data.csv using pandas and find out whether it contains missing values. If it does, remove them. Create a new column with Boolean values (True or False) based on the Age column: if age < 18, return False; otherwise return True.
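
One possible approach to the sample problem above is sketched here. The contents of data.csv are not given on this page, so the sketch reads a small made-up table from a string instead; the `Name` column, the sample rows, and the new column name `is_adult` are assumptions made for illustration only.

```python
import io

import pandas as pd

# Hypothetical stand-in for data.csv; the real file's columns are not specified.
csv_text = """Name,Age
Morty,14
Rick,70
Summer,
Beth,34
"""

# With a real file this would be: df = pd.read_csv("data.csv")
df = pd.read_csv(io.StringIO(csv_text))

# Does the table contain any missing values at all?
has_missing = bool(df.isna().any().any())
print("Missing values present:", has_missing)

# Remove every row that has at least one missing value.
clean = df.dropna().copy()

# New Boolean column: False if Age < 18, True otherwise.
clean["is_adult"] = clean["Age"] >= 18
print(clean)
```

Dropping rows is only one reading of "remove them"; depending on the data, dropping columns or filling in the gaps may be more appropriate.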

In-class assignments (10 pts each) – weeks 6, 11

An in-class assignment will be given twice during the course. In-class assignments are problem sets to be solved in Python. Each problem set concerns a particular topic. Problem set 1 deals with the basics of working in Python with data types and data structures; problem set 2 involves exploratory data analysis and visualization tasks. Sample problems:

  • Generate a list of even numbers in the range from 0 to 100. Iterate over these numbers in a for-loop and print each of them.
  • Consider the daily oil prices and the USDRUB daily exchange rate. Compute the sample average and standard deviation of daily returns over the entire sample period. Test whether the mean values are significantly different from zero. Test whether the mean values differ significantly from each other. State your null and alternative hypotheses explicitly in each case. Plot histograms of the null distributions.
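
The sample problems above could be started along the following lines. This is a partial sketch using only the standard library: the price series is made up for illustration (it is not real oil data), the range is assumed to include 100, and only the first steps of the second problem are shown (returns, sample statistics, and the t-statistic for one series). P-values, the two-sample comparison, and the histograms are left out.

```python
import math
import statistics

# Sample problem 1: even numbers from 0 to 100 (100 assumed to be included).
evens = list(range(0, 101, 2))
for number in evens:
    print(number)

# Sample problem 2, first steps only.
# Made-up prices for illustration; replace with the real daily series.
prices = [60.0, 61.2, 60.6, 62.1, 61.5, 63.0]

# Simple daily return: (p_t - p_{t-1}) / p_{t-1}
returns = [(curr - prev) / prev for prev, curr in zip(prices, prices[1:])]

mean_return = statistics.mean(returns)
std_return = statistics.stdev(returns)  # sample standard deviation (n - 1)

# H0: the mean daily return equals zero; H1: it differs from zero.
# One-sample t-statistic: mean / (std / sqrt(n))
t_stat = mean_return / (std_return / math.sqrt(len(returns)))
print(mean_return, std_return, t_stat)
```

To decide whether to reject H0, the t-statistic still has to be compared with a t-distribution with n - 1 degrees of freedom, for example using a statistics library.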

Cheating and honor

You must abide by the Honor Code.

Please don’t cheat: rumor has it that HSE has quite severe penalties.

To avoid being accused of plagiarism in “grey cases”, please disclose with whom and how you have collaborated on each assignment, except for the final group project. If you warn us, the worst that can happen after a good-faith mistake is that you will be asked to complete another version of the task, without disciplinary action and without notifying the HSE administration.