In this project, one will gather, assess, and clean data, then act on it through analysis, visualization, and/or modeling.
- Use the WeRateDogs Twitter archive file, which contains 5,000 tweets and was created 08/01/2017.
- Download the tweet image predictions file (predictions made with a neural network) from Udacity's servers using the Requests library.
- Gather additional data for the tweets in the archive from step one using Twitter's API, such as:
  - retweet count
  - favorite count
  - anything else that's interesting
- Import the data into dataframes, review it for abnormalities, errors, etc., and document what needs cleaning.
- Extract the needed fields from the tweet data and combine them with the archived data and image predictions.
- Identify 8 quality issues and 2 tidiness issues.
- Merge datasets where appropriate.
- Clean the quality and tidiness issues identified in the assessment stage.
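For the image predictions download step in the list above, a minimal sketch with the Requests library could look like this. The `download_file` helper is hypothetical, and the URL in the usage note is a placeholder, not the actual Udacity link. The optional `session` argument lets any object with a Requests-style `get` method be passed in.

```python
import requests


def download_file(url, path, session=None, timeout=30):
    """Download `url` and write the raw response bytes to `path`.

    Hypothetical helper: `session` defaults to a new requests.Session().
    """
    session = session or requests.Session()
    response = session.get(url, timeout=timeout)
    # Raise on 4xx/5xx responses instead of saving an error page to disk.
    response.raise_for_status()
    with open(path, "wb") as f:
        f.write(response.content)
    return path
```

Usage, with a placeholder URL: `download_file("https://example.com/image-predictions.tsv", "image-predictions.tsv")`, after which the tab-separated file can be loaded with `pd.read_csv("image-predictions.tsv", sep="\t")`.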
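The Twitter API gathering step could be sketched as below. This assumes an `api` object that behaves like Tweepy's `tweepy.API` (built from your own credentials), whose `get_status` returns a status with a `_json` attribute holding the raw tweet. The `fetch_tweets` helper and the one-JSON-object-per-line output file are illustrative assumptions, not a required format.

```python
import json


def fetch_tweets(api, tweet_ids, out_path):
    """Write one JSON object per line for each tweet that can be fetched.

    Assumes `api.get_status(id, tweet_mode="extended")` works like
    tweepy.API.get_status and returns an object with a `_json` dict.
    Returns a dict mapping each failed tweet ID to its error message.
    """
    failures = {}
    with open(out_path, "w", encoding="utf-8") as f:
        for tweet_id in tweet_ids:
            try:
                # "extended" mode asks for the untruncated tweet text.
                status = api.get_status(tweet_id, tweet_mode="extended")
            except Exception as exc:  # e.g. deleted or protected tweets
                failures[tweet_id] = str(exc)
                continue
            f.write(json.dumps(status._json) + "\n")
    return failures
```

Recording failures rather than stopping means one deleted tweet does not abort a long run; the returned IDs can be checked against the archive later.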
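For the assessment step, a few programmatic checks can complement visual review of each dataframe. The `assess` helper below is hypothetical and assumes the table has a `tweet_id` column; it reports row count, missing values per column, duplicated IDs, and column types.

```python
import pandas as pd


def assess(df: pd.DataFrame, id_col: str = "tweet_id") -> dict:
    """Return quick quality checks for a dataframe (hypothetical helper)."""
    return {
        "rows": len(df),
        # Count of missing values in each column.
        "missing_per_column": df.isna().sum().to_dict(),
        # Rows whose ID already appeared earlier in the table.
        "duplicate_ids": int(df[id_col].duplicated().sum()),
        # Column types as strings, e.g. to spot timestamps stored as text.
        "dtypes": df.dtypes.astype(str).to_dict(),
    }
```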
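The extraction step can be sketched as reading line-delimited tweet JSON into a dataframe. This assumes each line holds one tweet object with Twitter's `id`, `retweet_count`, and `favorite_count` fields; `load_tweet_json` is a hypothetical helper, and other fields can be added to the record in the same way.

```python
import json

import pandas as pd

COLUMNS = ["tweet_id", "retweet_count", "favorite_count"]


def load_tweet_json(lines) -> pd.DataFrame:
    """Build a dataframe of selected fields from line-delimited tweet JSON."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        tweet = json.loads(line)
        records.append({
            "tweet_id": tweet["id"],
            # .get() leaves None if a field is absent from a tweet.
            "retweet_count": tweet.get("retweet_count"),
            "favorite_count": tweet.get("favorite_count"),
        })
    return pd.DataFrame(records, columns=COLUMNS)
```

Because it accepts any iterable of lines, an open file works directly, e.g. `with open("tweet_json.txt") as f: counts = load_tweet_json(f)` (the file name is an assumption).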
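Merging the three sources could be done with pandas joins on the shared tweet ID, assuming each table has a `tweet_id` column. The `merge_sources` name is hypothetical. An inner join keeps only tweets present in all three tables, and `validate="one_to_one"` makes pandas raise an error if any table still contains duplicate IDs.

```python
import pandas as pd


def merge_sources(archive: pd.DataFrame, counts: pd.DataFrame,
                  predictions: pd.DataFrame) -> pd.DataFrame:
    """Inner-join the three tables on tweet_id so each row is one tweet."""
    merged = archive.merge(counts, on="tweet_id", how="inner",
                           validate="one_to_one")
    return merged.merge(predictions, on="tweet_id", how="inner",
                        validate="one_to_one")
```

A left join on the archive would instead keep every archived tweet and leave missing counts or predictions as NaN; which is preferable depends on the analysis.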
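As an illustration of the cleaning step, the sketch below fixes one example quality issue and one example tidiness issue. It assumes the archive has a string `timestamp` column and four dog-stage columns (`doggo`, `floofer`, `pupper`, `puppo`) holding either the stage name or the string `"None"`; these column names are assumptions, and the `clean` helper is hypothetical. The issues actually cleaned should be the ones documented during assessment.

```python
import pandas as pd

# Assumed dog-stage column names in the archive.
STAGES = ["doggo", "floofer", "pupper", "puppo"]


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy; the input dataframe is left unchanged."""
    out = df.copy()
    # Quality: store timestamps as datetimes rather than strings.
    out["timestamp"] = pd.to_datetime(out["timestamp"])
    # Tidiness: one variable (dog stage) is spread across four columns.
    stages = out[STAGES].fillna("None").replace("None", "")
    joined = stages.apply(lambda row: ",".join(v for v in row if v), axis=1)
    # Rows with no stage become NaN instead of an empty string.
    out["dog_stage"] = joined.where(joined != "")
    return out.drop(columns=STAGES)
```

Tweets tagged with more than one stage keep all of them, comma-separated, so no information is lost when the four columns are dropped.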
...more to come as the project progresses