Data Analysis in Python 2020-2021

From the Wiki of the Faculty of Computer Science
Current revision as of 17:17, 4 May 2021

About the course

The course is offered to students of the Bachelor’s Programme 'HSE and University of London Parallel Degree Programme in International Relations'.

Abstract: This course introduces students to the rapidly growing field of data analytics, with a specific focus on the Python programming language. Students will learn the concepts, techniques and tools they need to make meaningful inferences from data. They will work with real-world data sets to gain practical skills in data manipulation. Each week will involve seminars and coding simulations. In the final project, students will build working code that can be readily applied to exploratory data analysis in their own (future) research domain.

Syllabus: open

Required Software

  • Anaconda (Python version >= 3.8)
  • Jupyter Notebook

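[https://drive.google.com/file/d/1Il0gPyzMahfdiISH0qw3d9yZl1rA5nv8/view?usp=sharing How to install Anaconda on Mac OS]
[https://drive.google.com/file/d/12Dk9bmYqpI09xC1Fl5ITN8khLrYmaOy1/view?usp=sharing How to install Anaconda on Windows]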
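
After installing Anaconda, it can help to confirm that the interpreter meets the version requirement listed above. The following is a minimal sketch; the function name `python_is_recent_enough` is hypothetical and only illustrates the check.

```python
import sys

# Minimum version taken from the Required Software list (Python >= 3.8).
MIN_PYTHON = (3, 8)


def python_is_recent_enough(version_info=None, minimum=MIN_PYTHON):
    """Return True if the given (or currently running) interpreter meets the minimum."""
    info = sys.version_info if version_info is None else version_info
    # Compare only the (major, minor) part of the version.
    return tuple(info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    print("Running Python", sys.version.split()[0])
    print("Meets the course requirement:", python_is_recent_enough())
```

Running this in a Jupyter Notebook cell or from the Anaconda prompt shows which Python version is actually in use.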

Materials

Presentations and all materials will be available immediately after each practical class. Additional materials will be used in quizzes at the following seminar.

GitHub with the materials from our practical classes: https://github.com/anamarina/Data_Analysis_in_Python

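{| class="wikitable"
|-
! Week !! Topic !! Slides !! Tutorial !! Additional Materials !! Assignment !! Deadline
|-
| style="background:#eaecf0;" | '''1''' || Introduction || Intro Slides || How to install Anaconda on Mac OS, <br> How to install Anaconda on Windows || 1. [https://docs.python.org/3.8/tutorial/ Official Python tutorial & documentation] <br> 2. [https://www.coursera.org/specializations/python Coursera. Python for Everybody Specialization] <br> 3. [https://www.coursera.org/learn/python-crash-course?specialization=google-it-automation Coursera. Crash Course on Python] <br> 4. [https://snakify.org/en/lessons/print_input_numbers/ Snakify. A lot of online exercises in Python] || No assignment this time. Yay! ||
|-
| style="background:#eaecf0;" | '''2''' || Input, output. Numbers, strings. Arithmetical operations || - || [https://github.com/anamarina/Data_Analysis_in_Python/blob/main/week2/week2.ipynb Tutorial] || 1. [https://www.python.org/dev/peps/pep-0008/ PEP8 Style Guide] <br> 2. [https://www.w3schools.com/python/python_numbers.asp Python Numbers Exercises] <br> 3. [https://realpython.com/python-input-output/ Input and Output in Python] || [https://github.com/anamarina/Data_Analysis_in_Python/blob/main/week2/HA1.ipynb HA1] || 23:59, February 7, 2021
|-
| style="background:#eaecf0;" | '''3''' || Lists and tuples. For & while loops || - || [https://github.com/anamarina/Data_Analysis_in_Python/blob/main/week3/week3.ipynb Tutorial] || || ||
|-
| style="background:#eaecf0;" | '''4''' || Dictionaries, sets, strings || - || [https://github.com/anamarina/Data_Analysis_in_Python/blob/main/week4/week4.ipynb Tutorial] || || [https://github.com/anamarina/Data_Analysis_in_Python/blob/main/week4/HA2.ipynb HA2] || 23:59, February 18, 2021
|-
| style="background:#eaecf0;" | '''5''' || Functions || - || [https://github.com/anamarina/Data_Analysis_in_Python/blob/main/week5/week5_functions_initial.ipynb Tutorial] || || [https://classroom.github.com/a/vxyb_nMY HA3] || 23:59, March 3, 2021
|-
| style="background:#eaecf0;" | '''6''' || In-class Assignment 1 || - || - || || [https://github.com/anamarina/Data_Analysis_in_Python/tree/main/week6 Assignments of all groups] ||
|-
| style="background:#eaecf0;" | '''7''' || Introduction to data analysis, files processing || - || [https://github.com/anamarina/Data_Analysis_in_Python/blob/main/week7/week7.ipynb Tutorial] || [https://www.analyticsvidhya.com/blog/2017/03/read-commonly-used-formats-using-python/ How to read the most commonly used files] || - ||
|-
| style="background:#eaecf0;" | '''8''' || Pandas. Part 1 || - || [https://github.com/anamarina/Data_Analysis_in_Python/blob/main/week8/week8.ipynb Tutorial] || [https://pandas.pydata.org/pandas-docs/stable/getting_started/tutorials.html Pandas Community Tutorials] || [https://classroom.github.com/a/Wz84c84k HA4] || 23:59, March 25, 2021
|-
| style="background:#eaecf0;" | '''9''' || Pandas. Part 2 || - || [https://github.com/anamarina/Data_Analysis_in_Python/tree/main/week9 Tutorial] || [https://github.com/guipsamora/pandas_exercises Pandas Exercises on different topics] || ||
|-
| style="background:#eaecf0;" | '''10''' || Web scraping & parsing || - || [https://github.com/anamarina/Data_Analysis_in_Python/blob/main/week_10/sem10_parsing.ipynb Tutorial] || [https://realpython.com/python-web-scraping-practical-introduction/ A Practical Introduction to Web Scraping in Python] <br> [https://github.com/FUlyankin/Parsers/blob/master/Ryan_Mitchell_Web_Scraping_with_Python-_Collecting_Data_from_the_Modern_Web_2015.pdf Web_Scraping_with_Python (book)] <br> [https://2.python-requests.org/en/master/user/advanced/ requests library for PRO] <br> [https://habr.com/ru/company/ods/blog/346632/ Parse memes in Python] <br> [https://github.com/anamarina/eds_spring_2020/blob/master/sem05_parsing/sem05_parsing_full.ipynb Initial reference for this notebook] || ||
|-
| style="background:#eaecf0;" | '''11''' || In-class Assignment 2 || - || - || - || [https://github.com/anamarina/Data_Analysis_in_Python/blob/main/week_11/In_class_assignment_2.ipynb Assignment for all groups] || April 8, 23:59 / April 10, 23:59
|-
| style="background:#eaecf0;" | '''12''' || Statistical hypotheses || - || [https://github.com/anamarina/Data_Analysis_in_Python/tree/main/week_12 Tutorial] || || ||
|-
| style="background:#eaecf0;" | '''13''' || Intro to logistic regression || - || [https://github.com/anamarina/Data_Analysis_in_Python/tree/main/week_13 Tutorial] || || ||
|-
| style="background:#eaecf0;" | '''14''' || Group projects (presentations) || || || || [https://classroom.github.com/a/Rb8oruAd Submission link], [https://drive.google.com/file/d/1n3POr07NZSWk3BSATQs8dF1AfEbXFspl/view?usp=sharing Instructions] || May 18, 9:00 a.m.
|}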

Assignments

The course includes 8 home assignments (10 pts each), each completed individually. Short home assignments will be published almost every week starting from Week 2 (weeks 2, 3, 4, 8, 9, 10, 13, 14), based on the materials of the previous practical classes.

There will be 2 in-class assignments (10 pts each) in the format of problem-solving tasks and coding in Python on an online platform (e.g. Yandex Contest or GitHub Classroom). Problem set 1 deals with the basics of working in Python with data types and data structures; problem set 2 involves tasks on exploratory data analysis and visualization.

Each assignment is checked for plagiarism. A code match of more than 25% is treated as plagiarism and results in 1 point out of 10, with the right to appeal. If more than 40% of the code matches, the assignment is voided (0 points) without the right to appeal. During the week after each deadline, every student will be offered a convenient time slot for a Zoom call with the lecturer and a TA to answer questions about their code and explain their solutions.
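
The two thresholds can be read as a simple decision rule. The sketch below only encodes the 25% and 40% cut-offs stated above; how the match percentage itself is measured is not specified on this page, and the function name `plagiarism_penalty` is hypothetical.

```python
def plagiarism_penalty(match_percent):
    """Return None if no penalty applies, else (points_awarded, appeal_allowed).

    match_percent is assumed to be the share of matching code, from 0 to 100.
    """
    if not 0 <= match_percent <= 100:
        raise ValueError("match_percent must be between 0 and 100")
    if match_percent > 40:
        return 0, False  # assignment voided, no right to appeal
    if match_percent > 25:
        return 1, True   # 1 point out of 10, appeal allowed
    return None          # at or below 25%: no plagiarism penalty


# Hypothetical match percentages, for illustration only.
for percent in (10, 30, 55):
    print(percent, plagiarism_penalty(percent))
```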

Assignment title standard: Please name your solution files in this format: Assignment # _ # Number # _ # Group number # _ # Name # _ # Surname #. Example: HA1_Morty_Smith_195.ipynb
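
A file name of this kind can be assembled programmatically. The format description and the example list the fields in a different order, so the sketch below simply follows the order of the example (assignment, name, surname, group); that choice, the `HA` prefix, and the function name `solution_filename` are assumptions made for illustration.

```python
def solution_filename(assignment_number, name, surname, group, extension=".ipynb"):
    """Build a file name following the example HA1_Morty_Smith_195.ipynb."""
    parts = [f"HA{assignment_number}", name.strip(), surname.strip(), str(group)]
    return "_".join(parts) + extension


print(solution_filename(1, "Morty", "Smith", 195))
```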

GitHub with tutorials and assignments: https://github.com/anamarina/Data_Analysis_in_Python

Links for submitting your assignments: coming soon!

Communication

All course materials, assignments, and deadlines will be published on this page.

Important announcements from the teaching team will be posted in the Telegram channel: https://t.me/joinchat/UctGNtxs7zd4StM0

Telegram group with 24/7 online support for Q&A, discussions, technical issues, and moral support: https://t.me/joinchat/F_uIPvGE_zA8fftG

Tutor: Marina Ananyeva [mailto:ananyeva.me@gmail.com Email] [https://t.me/ananyevame Telegram]

Module 3

Group Schedule
195 Tuesday 9.30-10.50
191 Tuesday 11.10-12.30
196 Tuesday 13.00-14.20
194 Thursday 9.30-10.50
192 Thursday 11.10-12.30
193 Thursday 13.00-14.20

Module 4

Group Schedule
194 Thursday 9.30-10.50
192 Thursday 11.10-12.30
193 Thursday 13.00-14.20
191 Saturday 9.30-10.50
196 Saturday 11.10-12.30
195 Saturday 13.00-14.20

Feedback

We would greatly appreciate your help in making this course better. Feel free to share your ideas and feedback!

Anonymous feedback form: click_here

Grading

Final Grade = 0.4*home assignments + 0.3*group project + 0.2*in-class assignments + 0.1*in-class participation
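
The formula can be checked with a short calculation. This is a minimal sketch that assumes each component has already been brought to a 0-10 scale (for example, by averaging the individual assignment scores); that aggregation step is an assumption, not something stated on this page, and the names `WEIGHTS` and `final_grade` are hypothetical.

```python
# Weights taken from the grading formula above.
WEIGHTS = {
    "home_assignments": 0.4,
    "group_project": 0.3,
    "in_class_assignments": 0.2,
    "participation": 0.1,
}


def final_grade(home_assignments, group_project, in_class_assignments, participation):
    """Weighted final grade; every argument is assumed to be on a 0-10 scale."""
    scores = {
        "home_assignments": home_assignments,
        "group_project": group_project,
        "in_class_assignments": in_class_assignments,
        "participation": participation,
    }
    for name, value in scores.items():
        if not 0 <= value <= 10:
            raise ValueError(f"{name} must be between 0 and 10, got {value}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical component scores, for illustration only.
print(round(final_grade(8, 9, 7, 10), 2))
```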

[https://drive.google.com/file/d/1zOf4z7kPGLlTNgcrY3_I10xbK6AgF4FA/view?usp=sharing Table with grades]

In-class participation (10 pts) Activity during class is graded at one point per seminar and covers answering questions and solving tasks during the seminar. If a student earns more than 10 points in total, the score is capped at 10.

Group project (10 pts) Maximum group size: 4 students. Group project evaluation criteria:
  • the purpose of the study is clearly stated (1 point);
  • all steps of the research process are described in a clear and concise way (2 points);
  • research outcomes are clearly defined (2 points);
  • intuitive visualizations of research outcomes are included (2 points);
  • all members of the project team are able to explain the code used for computations (1 point);
  • the code is properly structured (1 point);
  • the submission timeline is met (1 point).

Home assignments (10 pts each): weeks 2, 3, 4, 8, 9, 10, 13, 14. A home assignment will be given 8 times during the course. These assignments are problem sets to be solved in Python. Sample problem:
  • Open the file data.csv using pandas and find out whether it contains missing values. If it does, remove them. Create a new column with Boolean values (True or False) based on the Age column: False if Age < 18, True otherwise.
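
One possible way to approach the sample problem is sketched below. It is not the reference solution: the inline CSV text is a hypothetical stand-in for data.csv so that the sketch runs on its own, "removing" missing values is interpreted as dropping the rows that contain them, and the column name `IsAdult` is made up for illustration.

```python
import io

import pandas as pd

# Hypothetical stand-in for data.csv (made-up rows, one with a missing Age).
CSV_TEXT = """Name,Age
Ann,17
Bob,
Cid,34
Dee,18
"""


def load_and_flag_adults(source):
    """Read a CSV, drop rows with missing values, and add a Boolean age flag."""
    df = pd.read_csv(source)
    if df.isna().any().any():          # does any cell hold a missing value?
        df = df.dropna()               # drop every row with at least one NaN
    df = df.reset_index(drop=True)
    df["IsAdult"] = df["Age"] >= 18    # False if Age < 18, True otherwise
    return df


result = load_and_flag_adults(io.StringIO(CSV_TEXT))
print(result)
```

With a real file, the same function can be called as `load_and_flag_adults("data.csv")`.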

In-class assignments (10 pts each): weeks 6, 11. An in-class assignment will be given twice during the course. In-class assignments are problem sets to be solved in Python, and each problem set concerns a particular topic. Problem set 1 deals with the basics of working in Python with data types and data structures; problem set 2 involves tasks on exploratory data analysis and visualization. Sample problems:
  • Generate a list of even numbers in the range from 0 to 100. Iterate over these numbers in a for-loop and print each of them.
  • Consider the daily oil prices and the USDRUB daily exchange rate. Compute the sample average and standard deviation of daily returns over the entire sample period. Test whether the mean values are significantly different from zero. Test whether the mean values differ significantly from each other. State your null and alternative hypotheses explicitly in each case. Plot histograms of the null distributions.
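
The two sample problems can be sketched with the standard library alone. The code below assumes that the range 0 to 100 is inclusive, uses simple (not log) returns, and computes only the t statistics: p-values and the histograms are left out. The price lists are made-up numbers, not real oil or USDRUB data, and all function names are hypothetical. In practice, `scipy.stats.ttest_1samp` and `scipy.stats.ttest_ind` would also return p-values.

```python
import math
import statistics

# Problem 1: even numbers from 0 to 100 (assumed inclusive), printed in a loop.
evens = list(range(0, 101, 2))
for number in evens:
    print(number)


# Problem 2: descriptive statistics and t statistics for daily returns.
def simple_returns(prices):
    """Daily simple returns: (p_t - p_{t-1}) / p_{t-1}."""
    return [(b - a) / a for a, b in zip(prices, prices[1:])]


def one_sample_t(sample, mu0=0.0):
    """t statistic for H0: mean == mu0 against H1: mean != mu0."""
    n = len(sample)
    sd = statistics.stdev(sample)  # sample standard deviation (n - 1)
    return (statistics.mean(sample) - mu0) / (sd / math.sqrt(n))


def welch_t(sample_a, sample_b):
    """Welch t statistic for H0: mean_a == mean_b (unequal variances allowed)."""
    var_a = statistics.variance(sample_a) / len(sample_a)
    var_b = statistics.variance(sample_b) / len(sample_b)
    return (statistics.mean(sample_a) - statistics.mean(sample_b)) / math.sqrt(var_a + var_b)


# Made-up price series, for illustration only.
oil_prices = [60.0, 61.2, 60.5, 62.0, 61.4, 63.1]
usdrub_rates = [74.0, 74.3, 73.9, 74.8, 75.1, 74.6]

oil_returns = simple_returns(oil_prices)
fx_returns = simple_returns(usdrub_rates)

print("oil mean / sd:", statistics.mean(oil_returns), statistics.stdev(oil_returns))
print("fx mean / sd:", statistics.mean(fx_returns), statistics.stdev(fx_returns))
print("t (oil mean vs 0):", one_sample_t(oil_returns))
print("t (fx mean vs 0):", one_sample_t(fx_returns))
print("Welch t (oil vs fx):", welch_t(oil_returns, fx_returns))
```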

Cheating and honor

You must abide by the Honor Code.

Please don’t cheat: rumor has it that HSE has quite severe penalties.

To avoid being accused of plagiarism in “grey cases”, please disclose with whom and how you collaborated on each assignment, except for the final group project. If you warn us, the worst thing that can happen to you after a good-faith mistake is that you will be asked to complete another version of the task, without disciplinary action and without notifying the HSE administration.