Hands-on Data Analysis is Datawhale's open-source project on data analysis. It grew out of an earlier Datawhale data analysis course. I took that course as a learner, and it used the book Python for Data Analysis as its textbook. The book explains pandas and NumPy operations clearly and in detail, but it says much less about the logic of data analysis. As a result, many other learners and I found that, after finishing it, we did not know what to do when we faced a real data analysis problem. This feeling of "I learned it but don't know how to use it" is understandable: after studying mostly theoretical material, there is a gap between what we learned and how it applies in practice. Bridging that gap may require your own experimentation and work with real-world material.
What we wanted, then, was a project-based course with the knowledge points embedded in the project, so that learners learn while doing and are guided along the way. After finishing it, learners should have mastered pandas and gained general experience of the data analysis process. Our research suggested that no existing data analysis project fully met these criteria. So a group of Datawhale members joined together to build an open-source course with this modest goal, so that everyone who uses it can get a better start on their data analysis journey.
The course has now been updated to version 1.3, with an improved learning process and better explanations in the answers. Supporting materials will be released gradually. We still want to start from basic data operations and the data analysis process, and to introduce real-world examples in each module. After that, we will continue to add new content (such as data mining algorithms). This is an open-source project that we will keep iterating on, and everyone is welcome to participate and work on it together.
About the project's name, Hands-on Data Analysis: data analysis is the process of finding the truth in a pile of numbers. Learning to manipulate data is only half the skill; the other half is the experience you carry in your head. So while learning, we need to think more, summarize more, and write more real code by hand. I hope that as you work through this course you will reason more and keep asking why, and practice more so that theory and practice come together. By the end of the course, you will have learned a great deal.
Since the course was born out of Datawhale, it is best studied together with the other resources Datawhale provides. The code is provided as Jupyter notebooks that contain the tasks you need to complete, along with our hints and guidance. Combine this format with Datawhale's group learning, where you can discuss with others and share material, and you will learn far more effectively. Datawhale has also open-sourced a pandas tutorial, Joyful-Pandas, which explains the logic of pandas with code demonstrations. For the pandas operations in this course, you can refer to Joyful-Pandas to get even more out of your data analysis learning.
The course is now divided into three units: Basic Data Operations, Data Cleaning and Reconstruction, and Modeling and Evaluation.
- Part 1: Given a dataset to analyze, we first learn how to load and view the data, then learn some basic pandas operations, and finally start trying exploratory data analysis.
- Part 2: Once we can manipulate and understand the data more proficiently, we clean and reconstruct it, turning the raw data into usable data in preparation for feeding it into a model.
- Part 3: We decide what model to build based on the task requirements, using the popular scikit-learn (sklearn) library. To judge whether a model is good or bad, we need to evaluate it; after evaluating the model, we optimize it.
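The three parts above follow a typical load, clean, model, evaluate workflow. The sketch below is a minimal illustration of that flow, not the course's own code. It assumes pandas and scikit-learn are installed and uses a small hypothetical inline CSV; the columns `age`, `fare`, `sex` and `label` are invented for illustration. Median imputation, one-hot encoding with `pd.get_dummies`, and `LogisticRegression` are each just one reasonable choice among many.

```python
import io

import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Hypothetical toy data (assumption): stands in for a real course dataset.
RAW_CSV = """age,fare,sex,label
22,7.25,male,0
38,71.28,female,1
26,7.93,female,1
35,53.10,female,1
35,8.05,male,0
,8.46,male,0
54,51.86,male,0
2,21.08,male,0
27,11.13,female,1
14,30.07,female,1
4,16.70,female,1
58,26.55,female,1
"""

# Part 1: load the data and take a first look at it.
df = pd.read_csv(io.StringIO(RAW_CSV))
print(df.shape)
print(df.head())
print(df.describe())

# Part 2: clean and reshape. Fill the missing age with the median,
# then one-hot encode the categorical "sex" column as float columns.
df["age"] = df["age"].fillna(df["age"].median())
features = pd.get_dummies(
    df[["age", "fare", "sex"]], columns=["sex"], drop_first=True, dtype=float
)
target = df["label"]

# Part 3: split, fit a simple classifier, and evaluate it on held-out rows.
X_train, X_test, y_train, y_test = train_test_split(
    features, target, test_size=0.25, random_state=0, stratify=target
)
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"test accuracy: {accuracy:.2f}")
```

With only a dozen rows the accuracy figure means little; the point is the shape of the pipeline, which carries over to real data in a Jupyter notebook.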
| Chapter | Section |
|---|---|
| Chapter 1 | Data loading and preliminary observation |
| | Pandas basics |
| | Exploratory data analysis |
| Chapter 2 | Data cleaning and feature processing |
| | Data reconstruction 1 |
| | Data reconstruction 2 |
| | Data visualization |
| Chapter 3 | Data modeling |
| | Model evaluation |
Our code is in Jupyter form, and each part of the course is split into two parts: Course and Answers. While studying, work through the Course notebooks: complete all the material, look up information on your own, write the code yourself, and think through the questions and your own insights. Afterwards, discuss with your study partners and share information and insights. The Answers part is for reference only: because data analysis is open-ended, the answers are open as well, and we hope you will form your own understanding and answers. If you do need a reference, the answers we wrote are in the Answers section.
(Course part: write the code yourself according to the instructions)
Feedback from learners of previous versions
As a learner with no prior background, I found this round of data analysis learning very comfortable. The tutorials are simple and clear, and overall the learning went very smoothly. I read each task's tutorial twice. The first time, I only read the tutorial and then worked through the book Python for Data Analysis alongside it. The assignments were great for extending what we learned, which I really liked. The second time, I read the tutorial to finish the homework and reflections without looking at the answers at all. Overall, I felt a great sense of accomplishment and really learned a lot. As an introductory data analysis course, it is really great!
--------Danfei Wu, North China Electric Power University
First of all, this learning document is very well made and gives a lot of guidance. I like the project-based way of learning: learn actively, and search for anything you don't understand.
-------- Li Qingqing
It helped a lot. Even after finishing the course, I still use its skills in my actual job. I hope a later version of the course will include a section on the logic of data analysis.
--------Version V1.0 Group Study Participants
Outstanding learner Liu Chuchu's assignments: https://space.bilibili.com/621981283/channel/detail?cid=191222
(You are welcome to watch the videos explaining all of the assignments.)
If you don't find what you want in Hands-on Data Analysis, or if you find an error in the project, please don't hesitate to give feedback in our GitHub Issues. We will reply within 24 hours; if you have not received a reply after 24 hours, you can contact me by email ([email protected]).
Project leader
Andong Chen: Datawhale member, Hunan University | Queen Mary University of London
Core contributors
Juanjuan Jin: Datawhale member, master's degree, Zhejiang University
Yang Jada: Datawhale member, data mining engineer
Lao Cousin: Datawhale member, author of Jane Said Python
Contributors
Hongxing: Datawhale member, data analyst
Li Ling: Datawhale member, algorithm engineer
Gao Liye: Datawhale member, graduate student at Taiyuan University of Technology
Zhang Wentao: Datawhale member, PhD student at Sun Yat-sen University
Copyright license: CC BY-NC-ND