Identify-Customer-Segments demonstrates how unsupervised machine learning can be used to identify a company's target customer base. It combines data cleaning and feature engineering with principal component analysis (PCA) and K-means clustering. The datasets are real demographics of the general population of Germany and of the company's current customers. The result is a set of population segments that are over-represented among the company's customers (its target customer base) and a set of segments that fall outside that base. Knowing the age range, gender, average income, general location, personality types and other attributes of the target customer base, the company can direct marketing campaigns towards the audiences with the highest expected rate of return.
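The comparison at the heart of this result can be sketched as follows. This is only an illustration, not the notebook's code: the label arrays and the `cluster_shares` helper are hypothetical stand-ins for the cluster assignments that K-means would produce for each dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical cluster labels; in the project these would come from K-means
# applied to the general population and to the customer data.
general_labels = np.array([0, 0, 1, 1, 2, 2, 2, 3, 3, 3])
customer_labels = np.array([0, 0, 0, 0, 1, 2, 3, 0, 0, 1])


def cluster_shares(labels, n_clusters):
    """Return the fraction of rows assigned to each cluster."""
    counts = np.bincount(labels, minlength=n_clusters)
    return counts / counts.sum()


comparison = pd.DataFrame({
    "general": cluster_shares(general_labels, 4),
    "customers": cluster_shares(customer_labels, 4),
})

# A ratio above 1 means the cluster is more common among customers than in
# the general population, which marks it as a candidate target segment.
comparison["ratio"] = comparison["customers"] / comparison["general"]
overrepresented = comparison.index[comparison["ratio"] > 1].tolist()
print(comparison)
print("Over-represented clusters:", overrepresented)
```

Clusters with a ratio well below 1 would correspond to the segments outside the customer base.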
- Data exploration and manipulation using Pandas
- Data parsing and missing data conversion
- Data visualization using Matplotlib and Seaborn
- Histogram and Barplot analysis
- Encoding/re-encoding categorical features
- One-hot encoding
- Engineering mixed-type features
- Feature scaling and data imputation
- Dimensionality reduction using PCA (Principal Component Analysis)
- K-means clustering unsupervised machine learning
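As a minimal sketch of how several of the steps above fit together, the following chains one-hot encoding, imputation, scaling, PCA and K-means with scikit-learn. The toy data, column names, imputation strategy, number of components and number of clusters are all assumptions chosen for illustration; they are not the values used in the project.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data standing in for the demographics tables.
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "age_group": rng.integers(1, 5, size=200).astype(float),
    "income_level": rng.integers(1, 7, size=200).astype(float),
    "household_type": rng.choice(["single", "couple", "family"], size=200),
})
# Simulate missing values in one numeric column.
toy.loc[toy.index[::25], "income_level"] = np.nan

# One-hot encode the categorical column (as floats, so every column is numeric).
encoded = pd.get_dummies(toy, columns=["household_type"], dtype=float)

# Impute -> scale -> reduce dimensionality -> cluster.
pipeline = make_pipeline(
    SimpleImputer(strategy="most_frequent"),  # assumed strategy
    StandardScaler(),
    PCA(n_components=3, random_state=0),      # assumed component count
    KMeans(n_clusters=4, n_init=10, random_state=0),  # assumed cluster count
)
labels = pipeline.fit_predict(encoded)
print(pd.Series(labels).value_counts().sort_index())
```

In practice the number of principal components is usually chosen from the explained variance, and the number of clusters from a plot of the K-means score against the cluster count.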
Python 3
pandas==1.4.*
numpy==1.21.*
matplotlib==3.5.*
seaborn==0.11.*
scikit_learn==1.1.*
jupyter notebook==6.4.*
This project was submitted by Daniel P Florian as part of the Nanodegree at Udacity.
Under the Udacity Honor Code, your submissions must be your own work. Submitting this project as yours breaks the Udacity Honor Code and can lead to the suspension of your account.
I, the author of the project, allow you to use the code as a reference, but if you submit it as your own, you are responsible for the consequences, including expulsion.
Copyright (c) 2022 Daniel P Florian
Besides the above notice, the content of this repository is licensed under the MIT License.