Clébio de Oliveira Júnior juniorcl

Hi there 👋! I'm Clébio de Oliveira Júnior

Physics Teacher and Data Scientist

I'm a physics teacher and a data scientist with a passion for technology.

I currently develop projects focused on solving business problems, covering every step from understanding the problem and analyzing the data to extracting insights and implementing the solution. I also keep developing my skills through study and practice, such as building a portfolio of data science projects and writing about data science on a blog on Medium.

More details about each project and its solution are given in the Data Science Projects section.

Analytics Tools

Data Collection and Storage: MySQL and PostgreSQL.
Data Processing and Analytics: Jupyter Notebook, pandas and NumPy.
Development: Python, Git and Clean Code.
Data Visualization: Seaborn and Matplotlib.
Machine Learning Modeling: Classification, Regression, Clustering, Time Series and Neural Networks.
Machine Learning Deployment: Flask and Docker.

Data Science Projects

Olist Delivery Forecast

Olist is the largest department store in Brazilian marketplaces. This project aims to develop and implement a model that predicts the number of days until a given product is delivered. (In progress.)

Airbnb Scheduling Forecast

To support bookings on Airbnb, this data science project aims to create a machine learning model that predicts the first booking of a new user. Unfortunately, the dataset is highly imbalanced, which makes prediction difficult: the best result was an accuracy of 17.48% +/- 0.4%. New approaches guided by the business will therefore be necessary to improve the results.

Health Insurance Cross Sell

To help the sales team, this data science project ranks a list of customers to improve cross-selling. With the model's ordering, almost all interested customers (98.31% +/- 0.16%) fall within the first 50% of the list, so only half of the calls are needed and half of the call expenses are saved. For example, if each call costs R$ 15.00, calling 20,000 customers costs R$ 300,000.00; using the model, it is possible to spend only R$ 150,000.00.

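The saving above comes from ranking customers by a predicted score and calling only the top of the list. A minimal sketch in Python of both calculations, assuming the model outputs one score per customer; the function names and the toy scores and labels are hypothetical, not taken from the project. `share_captured` measures what fraction of the interested customers falls in the top part of the ranked list, and `call_cost` reproduces the R$ 15.00 per call for 20,000 customers arithmetic.

```python
def share_captured(scores, labels, fraction):
    """Fraction of all interested customers (label 1) that appear in the
    top `fraction` of the list when it is sorted by score, highest first."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    cutoff = int(len(ranked) * fraction)
    total_interested = sum(labels)
    captured = sum(label for _, label in ranked[:cutoff])
    return captured / total_interested if total_interested else 0.0


def call_cost(n_customers, cost_per_call, fraction=1.0):
    """Cost of calling only the first `fraction` of a list of n_customers."""
    return int(n_customers * fraction) * cost_per_call


# Figures from the project description: R$ 15.00 per call, 20,000 customers.
full_cost = call_cost(20_000, 15.00)
half_cost = call_cost(20_000, 15.00, 0.5)
print(f"Calling everyone: R$ {full_cost:,.2f}")     # R$ 300,000.00
print(f"Calling the top 50%: R$ {half_cost:,.2f}")  # R$ 150,000.00

# Hypothetical scores and labels (1 = interested) for eight customers.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 0, 0, 0]
print(share_captured(scores, labels, 0.5))  # 1.0
```

In the project the reported capture rate at 50% is 98.31%; the toy data here simply puts every interested customer in the top half.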
Transaction Fraud Detection

Financial transaction fraud is one of the biggest problems faced by financial institutions. This project therefore uses data science and machine learning to detect and prevent fraudulent transactions. The model achieved a precision of 96.3% +/- 0.7% and a recall of 76.3% +/- 3.5%, and the expected profit for the company is R$ 57,251,574.44.

Churn Prediction

Client churn is a problem because it results in lost revenue for the company. In this project, I created a data-driven solution to predict this behavior and help prevent it. On unseen data, the machine learning model detected 76.5% of the clients who churned, which represents a recovery of R$ 2,878,197.97 for the company.

Cardiovascular Disease Prediction

Cardio Catch Disease is a company specialized in detecting heart diseases at early stages. For every 5 percentage points of prediction accuracy above 50%, there is a 50% increase in the price charged per client. In this data science project, I created a model with a recognition rate of 71.8% +/- 0.5%, and the estimated profit generated by using this model is about R$ 11,285,500.00.

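The pricing rule can be read in more than one way. The sketch below assumes that only whole 5-point steps above 50% count, that each step adds 50% of a base price (additive, not compounding), and that the 71.8% recognition rate is the accuracy the rule refers to. `price_multiplier` is a hypothetical helper, not part of the original project.

```python
import math


def price_multiplier(accuracy_pct, base_accuracy=50.0, step=5.0, increase_per_step=0.5):
    """Multiplier on the base price per client, ASSUMING each full `step`
    points of accuracy above `base_accuracy` adds `increase_per_step`
    of the base price (additive increases, partial steps ignored)."""
    steps = max(0, math.floor((accuracy_pct - base_accuracy) / step))
    return 1.0 + increase_per_step * steps


print(price_multiplier(71.8))  # 3.0
```

Under these assumptions, the whole 71.3% to 72.3% interval given by the +/- 0.5% spread falls in the same four-step band, so the price would be three times the base.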
Rossmann Store Sales

Designing an investment strategy for each store individually can be difficult. To help the stakeholders decide on investments for each store in the chain, this data science project created a machine learning model that predicts sales up to six weeks in advance. This enables them to calculate the profit per store and the amount of money available to invest.

Data Engineering Projects

Bookclub Data Storing

Bookclub does not collect data from its website, even though the data changes with each purchase, sale or exchange that takes place there. This project therefore extracts, transforms and loads (ETL) data from the books.toscrape website into a MySQL database. The ETL is scheduled with Airflow, and both MySQL and Airflow run in Docker.
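A minimal sketch of the extract, transform and load steps in Python. To stay self-contained it swaps MySQL for the standard-library sqlite3 module, parses an inline HTML sample instead of downloading pages, and leaves out the Airflow scheduling (each function could become one task in a DAG). The HTML structure it expects, a `title` attribute on the link inside `<h3>` and a `<p class="price_color">` element, is an assumption about the books.toscrape listing pages, and `BookParser`, `extract`, `transform` and `load` are hypothetical names.

```python
import sqlite3
from html.parser import HTMLParser


class BookParser(HTMLParser):
    """Collects (title, price) pairs from a listing page, ASSUMING the title is
    the `title` attribute of an <a> inside <h3> and the price is the text of
    <p class="price_color">."""

    def __init__(self):
        super().__init__()
        self.books = []
        self._in_h3 = False
        self._in_price = False
        self._title = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "h3":
            self._in_h3 = True
        elif tag == "a" and self._in_h3:
            self._title = attrs.get("title")
        elif tag == "p" and "price_color" in (attrs.get("class") or "").split():
            self._in_price = True

    def handle_endtag(self, tag):
        if tag == "h3":
            self._in_h3 = False
        elif tag == "p":
            self._in_price = False

    def handle_data(self, data):
        # Pair the price text with the most recently seen title.
        if self._in_price and self._title is not None:
            self.books.append((self._title, data.strip()))
            self._title = None


def extract(html):
    parser = BookParser()
    parser.feed(html)
    return parser.books


def transform(raw_books):
    # Strip the currency symbol so "£51.77" becomes the float 51.77.
    return [(title, float(price.lstrip("£"))) for title, price in raw_books]


def load(conn, books):
    conn.execute("CREATE TABLE IF NOT EXISTS books (title TEXT, price REAL)")
    conn.executemany("INSERT INTO books (title, price) VALUES (?, ?)", books)
    conn.commit()


# Hypothetical sample in the assumed page structure.
sample_html = """
<article class="product_pod">
  <h3><a href="#" title="A Light in the Attic">A Light in the ...</a></h3>
  <p class="price_color">£51.77</p>
</article>
"""
conn = sqlite3.connect(":memory:")
load(conn, transform(extract(sample_html)))
print(conn.execute("SELECT title, price FROM books").fetchall())
# [('A Light in the Attic', 51.77)]
```

Keeping the three steps as separate functions mirrors how an orchestrator such as Airflow would run them as separate, retryable tasks.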

Blog Posts

Métricas para Regressão: Entendendo as métricas R², MAE, MAPE, MSE e RMSE

In the post "Metrics for Regression: Understanding the R², MAE, MAPE, MSE and RMSE Metrics" (English translation of the title), I wrote about metrics for regression models: R², Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), Mean Squared Error (MSE) and Root Mean Squared Error (RMSE). In it, I explain how each metric is calculated and how to interpret it.

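For reference, the five regression metrics can be computed from their definitions in a few lines of Python. This is a minimal sketch with made-up values, not the code from the post; it reports MAPE as a fraction and assumes no true value is zero.

```python
import math


def regression_metrics(y_true, y_pred):
    """R², MAE, MAPE, MSE and RMSE for paired lists of true and predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e ** 2 for e in errors) / n
    rmse = math.sqrt(mse)
    # MAPE as a fraction (multiply by 100 for %); undefined if any true value is 0.
    mape = sum(abs(e / t) for e, t in zip(errors, y_true)) / n
    # R² = 1 - (residual sum of squares) / (total sum of squares around the mean).
    mean_true = sum(y_true) / n
    ss_res = sum(e ** 2 for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1 - ss_res / ss_tot
    return {"R2": r2, "MAE": mae, "MAPE": mape, "MSE": mse, "RMSE": rmse}


y_true = [3, 5, 2, 7]
y_pred = [2.5, 5, 4, 8]
print(regression_metrics(y_true, y_pred))
```

For these values MAE is 0.875 and MSE is 1.3125. RMSE brings the error back to the units of the target, while R² compares the model against always predicting the mean.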
Indo Além da Acurácia: Entendendo a Acurácia Balanceada, Precisão, Recall e F1 score

In the post "Going Beyond Accuracy: Understanding Balanced Accuracy, Precision, Recall and F1 Score" (English translation of the title), I wrote about metrics for evaluating classification models: balanced accuracy, precision, recall and F1 score. In it, I explain how to calculate and interpret these metrics.

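A minimal sketch in Python of these metrics for a binary problem, computed from confusion-matrix counts. The toy labels are made up to show how accuracy can look good on imbalanced data while recall and balanced accuracy tell a different story; returning 0.0 when a denominator is zero is a convention chosen here, not something taken from the post.

```python
def classification_metrics(y_true, y_pred):
    """Binary classification metrics; class 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def safe_div(num, den):
        # Convention used here: report 0.0 when the denominator is zero.
        return num / den if den else 0.0

    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)       # true positive rate
    specificity = safe_div(tn, tn + fp)  # true negative rate
    return {
        "accuracy": safe_div(tp + tn, len(pairs)),
        "balanced_accuracy": (recall + specificity) / 2,
        "precision": precision,
        "recall": recall,
        "f1": safe_div(2 * precision * recall, precision + recall),
    }


# Imbalanced toy data: 8 negatives, 2 positives; the model finds one positive.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]
print(classification_metrics(y_true, y_pred))
```

Here accuracy is 0.9, but recall is only 0.5 and balanced accuracy is 0.75, which is exactly the gap that looking beyond accuracy is meant to expose.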
Escolha as suas palavras: uma introdução às expressões regulares e ao seu entendimento.

In the post "Choose Your Words: An Introduction to Regular Expressions and Understanding Them" (English translation of the title), I wrote an introduction to regular expressions. Each expression is presented with its function within a sentence, along with an example.

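A few regular expressions in the same spirit, using Python's re module. The sample text and patterns are illustrative, not taken from the post, and the e-mail pattern is deliberately simplified.

```python
import re

text = "Contact: ana@example.com, joao.silva@example.org. Order #1234 shipped on 2023-05-17."

# \d matches one digit and + means "one or more"; the parentheses capture the number.
order_ids = re.findall(r"#(\d+)", text)
print(order_ids)  # ['1234']

# {4} and {2} fix how many digits each captured group must have.
date_match = re.search(r"(\d{4})-(\d{2})-(\d{2})", text)
print(date_match.groups())  # ('2023', '05', '17')

# [\w.]+ accepts letters, digits, underscores and dots before the @.
# Simplified pattern for illustration; it does not cover every valid e-mail address.
emails = re.findall(r"[\w.]+@\w+\.\w+", text)
print(emails)  # ['ana@example.com', 'joao.silva@example.org']

# \s+ matches any run of whitespace, which re.sub collapses into a single space.
print(re.sub(r"\s+", " ", "too    many \t spaces"))  # too many spaces
```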
Feature Engineering: Técnicas para lidar com dados faltantes em um projeto de ciência de dados.

In the post "Feature Engineering: Techniques for Dealing with Missing Data in a Data Science Project" (English translation of the title), I wrote about techniques for handling missing data. Each technique is explained with an illustrated example. The techniques covered include removal of missing data, mean or median imputation, categorical imputation and a missing value indicator.
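The four techniques can be sketched with plain Python on a toy dataset. The values are made up, the post's own examples may use pandas instead, and `None` stands for a missing value.

```python
from statistics import mean, median, mode

# Toy data; None marks a missing value.
ages = [25, None, 31, 40, None, 28]
cities = ["Recife", None, "Natal", "Recife", "Natal", "Recife"]

# 1. Removal: keep only rows with no missing value (rows 2 and 5 are dropped).
rows = list(zip(ages, cities))
complete_rows = [row for row in rows if None not in row]

# 2. Mean or median imputation for a numeric column.
observed_ages = [a for a in ages if a is not None]
age_mean = mean(observed_ages)      # 31
age_median = median(observed_ages)  # 29.5, less sensitive to outliers than the mean
ages_imputed = [a if a is not None else age_median for a in ages]

# 3. Categorical imputation: fill with the most frequent category (the mode).
city_mode = mode([c for c in cities if c is not None])  # "Recife"
cities_imputed = [c if c is not None else city_mode for c in cities]

# 4. Missing value indicator: a 0/1 column recording which ages were missing,
#    so a model can still use the fact that the value was absent.
age_missing = [int(a is None) for a in ages]

print(complete_rows)
print(ages_imputed)
print(cities_imputed)
print(age_missing)
```

Imputation and the indicator are often combined: the imputed column keeps every row usable, while the indicator preserves the information that a value was originally missing.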
