AIM
Develop a model for predicting fraudulent transactions for a financial company and use insights from the model to develop an actionable plan.
DATASET
https://www.kaggle.com/datasets/miznaaroob/fraudulent-transactions-data
CONTENT
The data is provided as a CSV file with 6,362,620 rows and 10 columns.
WHAT I DID
First, I imported the required libraries and the dataset. Next, I did some exploratory data analysis (EDA) to find which mode of transaction accounts for the most fraudulent transactions, and then treated the inconsistencies in the data. After that, I built two models and compared their results to select the more appropriate one for this project. I started with a Logistic Regression model to classify transactions as fraudulent or non-fraudulent, then trained a Random Forest classifier to improve accuracy, which gave some improvement over the Logistic Regression model. The final accuracy I observed was 99.97%.
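As a rough illustration of the EDA step, the sketch below counts fraudulent transactions per transaction mode with pandas. The column names `type` and `isFraud`, the helper name `fraud_by_type`, and the six sample rows are all assumptions made for this example; they are not taken from the notebook or the real data.

```python
import pandas as pd

# Made-up sample rows. The column names `type` and `isFraud` are assumptions;
# adjust them to match the columns in the actual CSV file.
sample = pd.DataFrame({
    "type": ["TRANSFER", "CASH_OUT", "PAYMENT", "TRANSFER", "CASH_OUT", "PAYMENT"],
    "isFraud": [1, 1, 0, 0, 1, 0],
})


def fraud_by_type(df, type_col="type", label_col="isFraud"):
    """Return fraud count, row count and fraud rate for each transaction mode."""
    summary = df.groupby(type_col)[label_col].agg(fraud_count="sum", total="count")
    summary["fraud_rate"] = summary["fraud_count"] / summary["total"]
    # Put the mode with the most fraudulent transactions first.
    return summary.sort_values("fraud_count", ascending=False)


summary = fraud_by_type(sample)
print(summary)
```

On the full dataset, the same helper could be called on the DataFrame loaded from the CSV, provided the column names match.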
MODELS USED
The models are:
- Logistic Regression
- Random Forest Classifier
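A minimal sketch of how the two models could be compared side by side, assuming scikit-learn is used for both. It runs on synthetic, imbalanced data from `make_classification` rather than the real transactions, and the hyperparameters are illustrative, so the numbers it prints say nothing about the results reported in this README.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real data: about 5% of rows are the positive class.
X, y = make_classification(
    n_samples=2000, n_features=8, weights=[0.95, 0.05], random_state=0
)

# Stratify so that train and test keep the same class ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Hyperparameters here are illustrative assumptions, not the notebook's settings.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    results[name] = {
        "accuracy": accuracy_score(y_test, predictions),
        "recall": recall_score(y_test, predictions),
    }
    print(name, results[name])
```

Recall is printed next to accuracy because, when fraud is rare, a model can score a high accuracy while still missing many fraudulent transactions.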
HOW TO RUN
Upload your Kaggle API key file and the fraud_transaction_detection.ipynb notebook to Google Colab, then run all the cells.
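For reference, the download step might look roughly like the sketch below. It assumes the `opendatasets` package is installed and that the Kaggle key file is named `kaggle.json` and sits in the working directory. The helper names `download_dataset` and `find_csv_files` are hypothetical, and the target folder name is an assumption derived from the last segment of the dataset URL.

```python
from pathlib import Path

DATASET_URL = "https://www.kaggle.com/datasets/miznaaroob/fraudulent-transactions-data"


def download_dataset(url=DATASET_URL, data_dir="."):
    """Download the Kaggle dataset and return the folder it is expected to land in."""
    # Imported here so that find_csv_files works even without opendatasets installed.
    import opendatasets as od

    od.download(url, data_dir=data_dir)
    # Assumption: the files end up in a folder named after the last URL segment.
    return Path(data_dir) / url.rstrip("/").split("/")[-1]


def find_csv_files(folder):
    """Return every CSV file under `folder`, sorted by path."""
    return sorted(Path(folder).rglob("*.csv"))


# Example usage in a notebook cell (needs network access and Kaggle credentials):
# folder = download_dataset()
# csv_files = find_csv_files(folder)
```

The CSV file is looked up by extension rather than by name because the file name inside the dataset is not stated here.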
LIBRARIES NEEDED
- opendatasets - for downloading the dataset
- pandas - for data analysis
- numpy - for data analysis
- matplotlib - for data visualization
- seaborn - for data visualization
- itertools - for data analysis
CONCLUSION
Of the two models I compared, the Random Forest classifier was the more accurate at detecting fraudulent transactions.
Connect with me on LinkedIn: https://www.linkedin.com/in/tknishh/
Check out my GitHub profile: https://github.com/tknishh