Data-Analysis-Tools

Overview

This repo is named Data-Analysis-Tools, but it actually contains a whole data analysis pipeline, organized as follows:

| Folder | Meaning |
| --- | --- |
| data_wrangling | Turns very raw data into usable, easy-to-understand raw data. |
| data_analysis | Turns raw data into easy-to-visualize or machine-learning-ready data. |
| data_visualization | Visualizes the data. |
| datasets | Datasets to experiment with in the folders above. |
| projects | Very domain-specific data analysis projects. |





Theory

0. How to do data analysis

Preparation -> Preprocessing -> Analysis -> Postprocessing

As I understand it, the steps are:

1) Be clear about the needs.

This is the most valuable step, and it is always underestimated.

2) Get data.

This means more than querying a database: it also covers the work of actually collecting data from the real world and generating features.

We need to know how to work with databases and how to use pandas to generate features (after step 3).
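As a minimal sketch of this step, the snippet below pulls rows from a database into pandas and derives one feature. The in-memory SQLite database, the `orders` table, and its columns are all hypothetical stand-ins for a real data source.

```python
import sqlite3

import pandas as pd

# Hypothetical in-memory database standing in for a real data source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, price REAL, quantity INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 2.5, 4), (2, 10.0, 1), (3, 3.0, 3)],
)
conn.commit()

# Load the query result straight into a DataFrame.
orders = pd.read_sql_query("SELECT * FROM orders", conn)
conn.close()

# Generate a new feature from existing columns.
orders["total"] = orders["price"] * orders["quantity"]
print(orders)
```

In a real project the connection would point at the actual database, and feature generation would normally run on the cleaned data.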

3) Clean data.

This is also called data wrangling.
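A minimal cleaning sketch with pandas might look like the following. The `city` and `temp_c` columns and their values are made up for illustration, and which fixes are appropriate depends entirely on the real data.

```python
import pandas as pd

# Hypothetical messy input: stray whitespace, inconsistent case,
# a missing value, and a non-numeric placeholder.
raw = pd.DataFrame(
    {
        "city": [" Berlin", "berlin", "Paris ", None, "Rome"],
        "temp_c": ["21.5", "21.5", "n/a", "19.0", "25"],
    }
)

clean = raw.copy()
# Normalize text: trim whitespace and unify capitalization.
clean["city"] = clean["city"].str.strip().str.title()
# Convert to numbers; unparseable entries such as "n/a" become NaN.
clean["temp_c"] = pd.to_numeric(clean["temp_c"], errors="coerce")
# Drop incomplete rows, then exact duplicates.
clean = clean.dropna().drop_duplicates().reset_index(drop=True)
print(clean)
```

Dropping incomplete rows is only one option; depending on the needs from step 1, imputing missing values may be the better choice.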

4) Analysis.

This is where pandas and machine-learning techniques come into play.
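On the pandas side, a typical first analysis is a grouped summary. The sketch below assumes a hypothetical `sales` table with `region` and `revenue` columns; it shows only one of many possible analyses.

```python
import pandas as pd

# Hypothetical cleaned data.
sales = pd.DataFrame(
    {
        "region": ["north", "north", "south", "south", "south"],
        "revenue": [100.0, 150.0, 80.0, 120.0, 100.0],
    }
)

# Summarize revenue per region: row count, mean, and total.
summary = sales.groupby("region")["revenue"].agg(["count", "mean", "sum"])
print(summary)
```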

5) Report.

This requires a variety of data visualization skills.






Some experience

  • Research the topic, know the data by heart, and use as much human knowledge as possible.
  • Watch for outliers.
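One common way to spot outliers is the 1.5 × IQR rule, sketched below on made-up numbers. This is a generic heuristic, not a method prescribed by this repo, and the threshold of 1.5 is a convention rather than a requirement.

```python
import pandas as pd

# Hypothetical measurements with one suspicious value.
values = pd.Series([10, 12, 11, 13, 12, 11, 95])

# Interquartile range: the spread of the middle 50% of the data.
q1 = values.quantile(0.25)
q3 = values.quantile(0.75)
iqr = q3 - q1

# Flag anything more than 1.5 * IQR outside the quartiles.
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = values[(values < lower) | (values > upper)]
print(outliers)
```

A flagged value is a prompt to investigate, not an automatic deletion; domain knowledge decides whether it is an error or a real signal.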





Related Project

Project Athena: an AI-based automated data analysis tool that acts like an experienced data scientist. It tells you important facts about the dataframe and interacts with you to draw conclusions and make predictions.