Data Analysis in Python 2020-2021
Current version as of 17:17, 4 May 2021
About the course
The course is offered to students of the Bachelor’s Programme 'HSE and University of London Parallel Degree Programme in International Relations'.
Abstract: In this course students are introduced to the rapidly growing field of data analytics, with a specific focus on the Python programming language. Students will learn the concepts, techniques and tools they need to make meaningful inferences from data. Students will work with real-world data sets to gain practical skills in data manipulation. Each week will involve seminars and coding simulations. In the final project students will build working code that can be readily applied to exploratory data analysis in their own (future) research domain.
Syllabus: open
Required Software
- Anaconda (Python version >= 3.8)
- Jupyter Notebook
How to install Anaconda on Mac OS
How to install Anaconda on Windows
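After installation, a quick check can confirm that the interpreter meets the version requirement listed above. The following is a minimal sketch; the helper name `python_is_recent_enough` is hypothetical and not part of the course materials.

```python
import sys

# Minimum version taken from the requirement above (Python >= 3.8).
REQUIRED = (3, 8)


def python_is_recent_enough(version_info=sys.version_info, required=REQUIRED):
    """Return True if the interpreter's (major, minor) meets the requirement."""
    return tuple(version_info[:2]) >= required


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    print("OK" if python_is_recent_enough() else "Please install Python 3.8 or newer")
```

Running this in a Jupyter Notebook cell or from the Anaconda prompt should report the version that the notebook kernel actually uses.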
Materials
Presentations and all materials will be available immediately after each practice class. Additional materials will be used in quizzes at the following seminar.
GitHub with the materials from our practical classes: https://github.com/anamarina/Data_Analysis_in_Python
Week | Topic | Slides | Tutorial | Additional Materials | Assignment | Deadline |
---|---|---|---|---|---|---|
1 | Introduction | Intro Slides | How to install Anaconda on Mac OS, How to install Anaconda on Windows | 1. Official Python tutorial & documentation 2. Coursera. Python for Everybody Specialization 3. Coursera. Crash Course on Python 4. Snakify. A lot of online exercises in Python | No assignment this time. Yay! | |
2 | Input, output. Numbers, strings. Arithmetical operations | - | Tutorial | 1. PEP8 Style Guide 2. Python Numbers Exercises 3. Input and Output in Python | HA1 | 23:59, February 7, 2021 |
3 | Lists and tuples. For & while loops | - | Tutorial | | | |
4 | Dictionaries, sets, strings. | - | Tutorial | | HA2 | 23:59, February 18, 2021 |
5 | Functions | - | Tutorial | | HA3 | 23:59, March 3, 2021 |
6 | In-class Assignment 1 | - | - | | Assignments of all groups | |
7 | Introduction to data analysis, files processing | - | Tutorial | How to read the most commonly used files | - | |
8 | Pandas. Part 1 | - | Tutorial | Pandas Community Tutorials | HA4 | 23:59, 25 March 2021 |
9 | Pandas. Part 2 | - | Tutorial | Pandas Exercises on different topics | | |
10 | Web scraping & parsing | - | Tutorial | A Practical Introduction to Web Scraping in Python; Web_Scraping_with_Python (book); requests library for PRO; Parse memes in Python; Initial reference for this notebook | | |
11 | In-class Assignment 2 | - | - | - | Assignment for all groups | April 8, 23:59/ April 10, 23:59 |
12 | Statistical hypotheses | - | Tutorial | | | |
13 | Intro to logistic regression | - | Tutorial | | | |
14 | Group projects (presentations) | | | | Submission link, Instructions | May 18, 9:00 a.m. |
Assignments
The course includes 8 home assignments (10 pts each), each completed individually. Short home assignments will be published almost every week after Week 2 (weeks 2, 3, 4, 8, 9, 10, 13, 14) based on the materials of the previous practical classes.
Two in-class assignments (10 pts each) will be held in the format of problem-solving tasks and coding in Python on an online platform (e.g. Yandex Contest or GitHub Classroom). Problem set 1 deals with the basics of working in Python with data types and data structures; problem set 2 involves exploratory data analysis and visualization tasks.
Each task is checked for plagiarism. A code match of more than 25% will be considered plagiarism and will result in 1 point out of 10, with the right to appeal. If more than 40% of the code matches, the assignment will be canceled (0 points) without the right to appeal. During the week after each assignment deadline, each student will be offered a convenient time slot for a Zoom call with the lecturer and the TA to answer questions about their code and explain their solutions.
Assignment title standard: Please name your solution files in this format: Assignment # _ # Number # _ # Group number # _ # Name # _ # Surname #. Example: HA1_Morty_Smith_195.ipynb
GitHub with tutorials and assignments: https://github.com/anamarina/Data_Analysis_in_Python
Links for submitting your assignments: coming soon!
Communication
All course materials, assignments, and deadlines will be published on this page.
Important announcements from the teaching team will be posted in the Telegram channel: https://t.me/joinchat/UctGNtxs7zd4StM0
Telegram group with 24/7 online support for Q&A, discussions, technical issues, and moral support: https://t.me/joinchat/F_uIPvGE_zA8fftG
Tutor: Marina Ananyeva Email Telegram
Module 3
Group | Schedule |
---|---|
195 | Tuesday 9.30-10.50 |
191 | Tuesday 11.10-12.30 |
196 | Tuesday 13.00-14.20 |
194 | Thursday 9.30-10.50 |
192 | Thursday 11.10-12.30 |
193 | Thursday 13.00-14.20 |
Module 4
Group | Schedule |
---|---|
194 | Thursday 9.30-10.50 |
192 | Thursday 11.10-12.30 |
193 | Thursday 13.00-14.20 |
191 | Saturday 9.30-10.50 |
196 | Saturday 11.10-12.30 |
195 | Saturday 13.00-14.20 |
Feedback
We would greatly appreciate your help in making this course better. Feel free to share your ideas and feedback!
Anonymous feedback form: click_here
Grading
Final Grade = 0.4*home assignments + 0.3*group project + 0.2*in-class assignments + 0.1*in-class participation
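The formula above can be sketched as a small weighted sum. This is only an illustration: it assumes that each component has already been reduced to a single score on a 0-10 scale (for example by averaging), which the syllabus does not state explicitly, and the names `WEIGHTS`, `final_grade`, and the example scores are hypothetical.

```python
# Weights copied from the Final Grade formula above.
WEIGHTS = {
    "home_assignments": 0.4,
    "group_project": 0.3,
    "in_class_assignments": 0.2,
    "in_class_participation": 0.1,
}


def final_grade(scores):
    """Return the weighted sum of the four component scores.

    Assumption (not stated in the syllabus): every component is already
    a single number on a 0-10 scale.
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Made-up scores, for illustration only.
example = {
    "home_assignments": 8.0,
    "group_project": 9.0,
    "in_class_assignments": 7.0,
    "in_class_participation": 10.0,
}
print(round(final_grade(example), 2))  # 0.4*8 + 0.3*9 + 0.2*7 + 0.1*10 = 8.3
```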
In-class participation (10 pts)
Activity during class earns one point per seminar. This means answering questions and solving tasks during the seminar. If a student earns more than 10 points in total, the score is capped at 10.
Group project (10 pts)
Maximum group size: 4 students.
Group project evaluation criteria:
• the purpose of the study is clearly stated (1 point);
• all steps of the research process are described in a clear and concise way (2 points);
• research outcomes are clearly defined (2 points);
• includes intuitive visualizations of research outcomes (2 points);
• all members of the project team are able to explain the code used for computations (1 point);
• code is properly structured (1 point);
• meets submission timeline (1 point).
Home assignments (10 pts/each) – weeks 2, 3, 4, 8, 9, 10, 13, 14
A home assignment will be given 8 times during the course. These assignments are problem sets to be solved in Python.
Sample problems:
• Open the file data.csv using pandas and check whether it contains missing values. If it does, remove them. Create a new column of boolean values (True or False) based on the Age column: if age < 18, the value is False; otherwise it is True.
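One possible approach to the sample problem above is sketched here. It is not an official solution: the CSV contents and the column name `Adult` are hypothetical, the data is read from an in-memory string so the example runs without a file on disk, and "remove them" is interpreted as dropping rows that contain missing values.

```python
import io

import pandas as pd

# Hypothetical stand-in for data.csv so the sketch runs without a real file.
csv_text = """Name,Age
Ann,17
Bob,
Cid,18
Dee,42
"""

# With a real file this would be: pd.read_csv("data.csv")
df = pd.read_csv(io.StringIO(csv_text))

# Does the table contain any missing values at all?
has_missing = bool(df.isna().any().any())

# Drop every row that has at least one missing value.
clean = df.dropna().copy()

# False when Age < 18, True otherwise.
clean["Adult"] = clean["Age"] >= 18

print(has_missing)
print(clean)
```

The comparison `clean["Age"] >= 18` produces the boolean column directly, so no explicit loop or `apply` call is needed.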
In-class assignments (10 pts/each) – weeks 6, 11
An in-class assignment will be given two times during the course. In-class assignments are problem sets to be solved in Python. Each problem set concerns a particular topic. Problem set 1 deals with the basics of working in Python with data types and data structures; problem set 2 involves exploratory data analysis and visualization tasks.
Sample problems:
• Generate a list of even numbers in a range from 0 to 100. Iterate over these numbers in a for-loop and print each of them.
• Consider the daily oil prices and the USDRUB daily exchange rate. Compute the sample average and standard deviation of daily returns over the entire sample period. Test if the mean values are significantly different from zero. Test if the mean values differ significantly from each other. State your null and alternative hypotheses explicitly in each case. Plot histograms of the null distributions.
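The first sample problem can be sketched in a few lines. This is an illustration, not a reference solution; it assumes that 100 itself is meant to be included, which the task does not say explicitly.

```python
# Even numbers from 0 to 100. The stop value 101 makes 100 inclusive
# (an assumption, since the task does not say whether 100 is included).
evens = list(range(0, 101, 2))

# Iterate over the list and print each number on its own line.
for number in evens:
    print(number)
```

Using a step of 2 in `range` avoids testing each number with `% 2`; a list comprehension with a condition would work equally well.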
Cheating and honor
You must abide by the Honor Code.
Please don’t cheat: rumor has it that HSE has quite severe penalties.
To avoid being accused of plagiarism in “grey cases”, please disclose with whom and how you have collaborated on each assignment, except for the final group project. If you warn us, the worst thing that can happen to you after a good-faith mistake is being asked to complete another version of the task, without disciplinary action and without notifying the HSE administration.