- Missing Value Detection & Handling
- Data Visualization
- Outliers Detection & Handling
- Multicollinearity Detection & Handling
- Data Cleaning
- Data Transformation
- Parsing Dates
- Character Encoding
- Categorical Encoding
- Feature Selection
- Handling Imbalanced Data
- Data Reduction
- Datasets usually contain large volumes of data that may be stored in formats that are not easy to use.
- That’s why data scientists must first make sure that the data is correctly formatted and conforms to a defined set of rules.
- Data sparseness and formatting inconsistencies are the biggest challenges, and addressing them is what data cleansing is all about.
- Data cleaning identifies incorrect, incomplete, inaccurate, or irrelevant data, fixes the problems, and puts checks in place so that such issues are caught and fixed automatically in the future.
- According to Appen, data scientists spend about 60% of their time organizing and cleansing data!
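
The cleaning steps described above can be sketched roughly as follows. This is a minimal illustration, not a definitive recipe: it assumes pandas is the tool in use, and the toy dataset, the column names (`name`, `age`, `signup`), and the rule that ages must fall between 0 and 120 are all made up for the example.

```python
import pandas as pd

# Hypothetical toy dataset; columns and values are invented for illustration.
raw = pd.DataFrame({
    "name": [" Alice", "bob ", "Alice", None, "Carol"],
    "age": ["34", "n/a", "34", "29", "-5"],
    "signup": ["2021-01-05", "2021-02-05", "2021-01-05", "2021-03-10", "not a date"],
})


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the same cleaning rules to any frame with these columns."""
    out = df.copy()

    # Formatting inconsistencies: trim whitespace, normalize capitalization.
    out["name"] = out["name"].str.strip().str.title()

    # Coerce types; values that cannot be parsed become NaN / NaT
    # instead of raising an error.
    out["age"] = pd.to_numeric(out["age"], errors="coerce")
    out["signup"] = pd.to_datetime(out["signup"], format="%Y-%m-%d", errors="coerce")

    # Rule check (assumed rule): ages outside 0..120 are treated as invalid.
    out.loc[~out["age"].between(0, 120), "age"] = float("nan")

    # Drop exact duplicate rows, then rows missing the key field.
    out = out.drop_duplicates().dropna(subset=["name"])
    return out.reset_index(drop=True)


cleaned = clean(raw)
print(cleaned)
print(cleaned.isna().sum())  # remaining missing values per column
```

Wrapping the rules in a function such as `clean` is one way to make the fixes repeatable: the same checks can be rerun whenever new data arrives. The remaining missing values are left in place here; how to fill or drop them is a separate decision.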