This repository contains a Jupyter Notebook that analyzes customer churn in the e-commerce sector. The project aims to predict which customers will churn using machine learning models, supported by data preprocessing, feature selection, and model optimization.
The primary objective of this project is educational: I undertook this analysis to practice my data science and machine learning skills. The project is also part of my portfolio, demonstrating my ability to handle an end-to-end data science project.
- Python
- Jupyter Notebook
- Pandas
- scikit-learn
- Matplotlib
- Seaborn
- XGBoost
- Data Loading and Exploration: Loaded the dataset and performed initial exploration to understand the data.
- Data Cleaning: Removed missing values and information irrelevant to the analysis.
- Data Preprocessing: Converted categorical variables into dummy (one-hot encoded) variables.
- Feature Selection: Used Recursive Feature Elimination for feature selection.
- Model Training: Utilized Logistic Regression for the initial model training.
- Model Optimization: Tuned hyperparameters with GridSearchCV and trained an XGBoost model, a gradient-boosted tree ensemble.
- Evaluation and Interpretation: Evaluated the model using various metrics and interpreted the results.
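The workflow above can be sketched end to end with pandas and scikit-learn. This is a minimal sketch under stated assumptions, not the notebook's actual code: it builds a small synthetic DataFrame with hypothetical column names (`Tenure`, `OrderCount`, `PreferredPaymentMode`, `Complain`, `Churn`) in place of the Kaggle data, and the RFE feature counts and `C` grid are illustrative choices. Scaling, RFE, and the logistic regression are placed in one `Pipeline` so that GridSearchCV refits feature selection inside each cross-validation fold, which keeps validation data out of the selection step.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the Kaggle dataset (hypothetical columns and values)
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "Tenure": rng.integers(0, 60, n),
    "OrderCount": rng.integers(1, 20, n),
    "PreferredPaymentMode": rng.choice(["Card", "UPI", "COD"], n),
    "Complain": rng.integers(0, 2, n),
})
# Hypothetical churn rule: short tenure and complaints raise churn risk, plus noise
score = -0.08 * df["Tenure"] + 1.5 * df["Complain"] + rng.normal(0, 1, n)
df["Churn"] = (score > -1.5).astype(int)

# Data cleaning: drop rows with missing values (none in this synthetic frame)
df = df.dropna()

# Preprocessing: turn categorical columns into dummy variables
X = pd.get_dummies(df.drop(columns="Churn"), drop_first=True, dtype=int)
y = df["Churn"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# Feature selection (RFE) and logistic regression in a single pipeline
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("rfe", RFE(LogisticRegression(max_iter=1000), n_features_to_select=3)),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Hyperparameter tuning: search over number of kept features and regularization strength
param_grid = {
    "rfe__n_features_to_select": [2, 3, 4],
    "clf__C": [0.1, 1.0, 10.0],
}
search = GridSearchCV(pipe, param_grid, cv=5, scoring="roc_auc")
search.fit(X_train, y_train)

# Evaluation on the held-out split
proba = search.predict_proba(X_test)[:, 1]
auc = roc_auc_score(y_test, proba)
selected = list(X.columns[search.best_estimator_.named_steps["rfe"].support_])

print("Best params:", search.best_params_)
print("Selected features:", selected)
print(f"Test ROC AUC: {auc:.3f}")
print(classification_report(y_test, search.predict(X_test)))
```

XGBoost's `XGBClassifier` follows the scikit-learn estimator interface, so it could replace the `clf` step (with its own parameter grid) in the same search; it is left out here to keep the sketch's dependencies minimal.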
- Clone this repository.
- Open the Jupyter Notebook to view the code and explanations.
Feel free to explore the notebook and provide any feedback or contributions. Thank you!
The dataset used for this project is sourced from Kaggle and can be found here. It provides features relevant to predicting customer churn in the e-commerce sector.