I'm a physics teacher and a data scientist with a passion for technology.
I currently develop projects that focus on solving business problems end to end: understanding the problem, analyzing the data, extracting insights and implementing the solution. I also keep improving my skills by building a portfolio of data science projects and by writing about the same topic on a Medium blog.
Each project and its solution are described in more detail in the data science projects section.
-
Data Collection and Storage: MySQL and PostgreSQL.
-
Data Processing and Analytics: Jupyter Notebook, Pandas, NumPy.
-
Development: Python, Git and Clean Code.
-
Data Visualization: Seaborn and Matplotlib.
-
Machine Learning Modeling: Classification, Regression, Clustering, Time Series and Neural Networks.
-
Machine Learning Deployment: Flask and Docker.
-
Olist is the largest department store in Brazilian marketplaces. This project aims to develop and deploy a model that predicts the number of days until a given product is delivered. Status: in progress.
-
To help Airbnb with its bookings, this data science project aims to create a machine learning model that predicts the first booking of a new user. Unfortunately, the dataset is highly imbalanced, which makes prediction difficult; the best result was an accuracy of 17.48% +/- 0.4%. New approaches guided by the business will therefore be necessary to improve the results.
-
To help the sales team, this data science project ranks a customer list to improve cross-selling. The model orders the list so that almost all interested customers (98.31% +/- 0.16%) fall within the top 50% of the list, which saves half of the money spent on calls. For example, if each call costs R$ 15.00, calling all 20,000 customers costs R$ 300,000.00. With the model, calling only the top half of the list costs R$ 150,000.00.
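The call-cost arithmetic above can be reproduced with a short sketch. The cost per call, the list size and the 50% cut-off come from the text; the function and variable names are illustrative only and are not taken from the project code.

```python
# Figures quoted in the project description; names are hypothetical.
COST_PER_CALL = 15.00    # R$ per call
LIST_SIZE = 20_000       # customers on the call list
FRACTION_CALLED = 0.50   # share of the ranked list that is actually called


def campaign_cost(n_calls: int, cost_per_call: float) -> float:
    """Total cost of calling n_calls customers."""
    return n_calls * cost_per_call


# Cost of calling everyone versus only the top half of the ranked list.
baseline_cost = campaign_cost(LIST_SIZE, COST_PER_CALL)
model_cost = campaign_cost(int(LIST_SIZE * FRACTION_CALLED), COST_PER_CALL)
savings = baseline_cost - model_cost

print(f"Calling everyone: R$ {baseline_cost:,.2f}")
print(f"Calling top half: R$ {model_cost:,.2f}")
print(f"Savings:          R$ {savings:,.2f}")
```

The saving assumes the cost scales linearly with the number of calls, which is the assumption implied by the figures in the text.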
-
Fraud in financial transactions is one of the biggest problems faced by financial institutions. This project uses data science and machine learning to detect and prevent fraudulent transactions. The model achieved a precision of 96.3% +/- 0.7% and a recall of 76.3% +/- 3.5%. The profit expected by the company is R$ 57,251,574.44.
-
Client churn is a problem because it means lost revenue for the company. In this project, I created a data-driven solution to predict churn so that it can be avoided. On unseen data, the machine learning model detected 76.5% of the clients who would churn, which represents a recovery of R$ 2,878,197.97 for the company.
-
Cardio Catch Disease is a company that specializes in detecting heart disease at an early stage. For every 5% of prediction accuracy above 50%, the value charged per client increases by 50%. In this data science project, I created a model with a recognition rate of 71.8% +/- 0.5%; the estimated profit generated by using this model is about R$ 11,285,500.00.
-
Devising a new investment strategy for each store can be difficult. To help the stakeholders make decisions about individual investments for every store in the chain, this data science project created a machine learning model that predicts sales up to six weeks in advance. This enables them to calculate the profit per store and the amount of money available to invest.
-
The Bookclub does not collect the data from its website, even though the data is updated with each purchase, sale or exchange that takes place there. To address this, the project extracts, transforms and loads (ETL) data from the website books.toscrape into a MySQL database. The ETL is scheduled using Airflow. Both MySQL and the Airflow platform run in Docker.
-
In the post "Metrics for Regression: Understanding R², MAE, MAPE, MSE, and RMSE Metrics" (title translated into English), I write about metrics for regression models. The selected metrics are R², Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), Mean Squared Error (MSE) and Root Mean Squared Error (RMSE). In the post I explain how each metric is calculated and how to interpret it.
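As a rough illustration of these five metrics, here is a minimal sketch in plain Python using the standard textbook definitions. It is not code from the post; the function name and the sample values are made up for the example.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute R², MAE, MAPE, MSE and RMSE for two equal-length sequences.

    Assumes y_true contains no zeros (MAPE divides by the true value)
    and is not constant (R² divides by the total sum of squares).
    """
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mape = sum(abs(e / t) for e, t in zip(errors, y_true)) / n

    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                 # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1 - ss_res / ss_tot

    return {"r2": r2, "mae": mae, "mape": mape, "mse": mse, "rmse": rmse}


# Hypothetical true values and predictions, for illustration only.
y_true = [100, 200, 300, 400]
y_pred = [110, 190, 310, 380]

metrics = regression_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

MAE and RMSE are in the same unit as the target, MSE is in squared units, and MAPE is a fraction (multiply by 100 for a percentage), which is why the metrics are usually read together rather than in isolation.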
-
In the post "Going Beyond Accuracy: Understanding Balanced Accuracy, Precision, Recall and F1 score" (title translated into English), I write about metrics for evaluating classification models. The metrics covered are Balanced Accuracy, Precision, Recall and F1 score. In the post I explain how to calculate these metrics and how to interpret them.
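The four metrics can be sketched for a binary problem from the confusion-matrix counts. This is a minimal plain-Python illustration using the standard definitions, not code from the post; the function name and the labels are hypothetical.

```python
def classification_metrics(y_true, y_pred):
    """Binary metrics, treating label 1 as the positive class.

    Assumes both classes occur in y_true and that at least one
    positive prediction exists, so no denominator is zero.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)            # also called sensitivity
    specificity = tn / (tn + fp)
    balanced_accuracy = (recall + specificity) / 2
    f1 = 2 * precision * recall / (precision + recall)

    return {
        "accuracy": accuracy,
        "balanced_accuracy": balanced_accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical labels: 4 positives and 6 negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

scores = classification_metrics(y_true, y_pred)
for name, value in scores.items():
    print(f"{name}: {value:.4f}")
```

Balanced accuracy averages the recall of each class, so it is less flattering than plain accuracy when one class dominates the data.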
-
In the post "Choose Your Words: An introduction to regular expressions and understanding them" (title translated into English), I write an introduction to regular expressions. Each expression is presented with a description of what it does within a sentence, along with an example.
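In the same spirit, here is a small sketch that applies a few common patterns to a sentence with Python's standard `re` module. The sentence and the patterns are made up for this example and are not taken from the post.

```python
import re

# Hypothetical sentence used only for illustration.
sentence = "Order 66 shipped on 2023-05-04 to ana@example.com"

# \d{4}-\d{2}-\d{2}: four digits, a dash, two digits, a dash, two digits.
dates = re.findall(r"\d{4}-\d{2}-\d{2}", sentence)

# \b\d+\b: one or more digits delimited by word boundaries.
numbers = re.findall(r"\b\d+\b", sentence)

# [\w.]+@[\w.]+: a deliberately simplified e-mail pattern, not a full validator.
email_match = re.search(r"[\w.]+@[\w.]+", sentence)
email = email_match.group() if email_match else None

# ^Order: the anchor ^ only matches at the start of the string.
starts_with_order = re.search(r"^Order", sentence) is not None

print(dates)
print(numbers)
print(email)
print(starts_with_order)
```

Note that the date is also picked up piece by piece by the `\b\d+\b` pattern, because the dashes count as word boundaries.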
-
In the post "Feature Engineering: Techniques for dealing with missing data in a data science project" (title translated into English), I write about techniques for dealing with missing data. Each technique is explained along with an illustrated example. The techniques covered include "removal of missing data", "mean or median imputation", "categorical imputation" and "missing value indicator".
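The four techniques named above can be sketched with the Python standard library alone. This is an illustration under simple assumptions, not the code from the post: missing values are represented as `None`, categorical imputation is taken to mean filling with the most frequent category, and the helper names and sample data are hypothetical. In a Pandas workflow, `dropna` and `fillna` would play the same roles.

```python
from statistics import mean, median, mode

# Hypothetical columns; None marks a missing value.
ages = [25, None, 31, 40, None]
cities = ["Recife", None, "Recife", "Natal", None]


def drop_missing(values):
    """Removal of missing data: keep only the observed values."""
    return [v for v in values if v is not None]


def impute(values, fill_value):
    """Replace every missing value with fill_value."""
    return [fill_value if v is None else v for v in values]


def missing_indicator(values):
    """Missing value indicator: 1 where the value was missing, else 0."""
    return [1 if v is None else 0 for v in values]


observed_ages = drop_missing(ages)

# Mean or median imputation for a numeric column.
ages_mean = impute(ages, mean(observed_ages))
ages_median = impute(ages, median(observed_ages))

# Categorical imputation, here with the most frequent category.
cities_filled = impute(cities, mode(drop_missing(cities)))

# The indicator keeps track of which rows were originally missing.
ages_was_missing = missing_indicator(ages)

print(observed_ages)
print(ages_mean)
print(ages_median)
print(cities_filled)
print(ages_was_missing)
```

Pairing an imputed column with its indicator lets a model distinguish filled-in values from observed ones.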