casanave/casanave

Hi there 👋 I'm Louis

INTRO VIDEO


I am a storyteller who is passionate about data, writing, public speaking, and building community. I hold a special place in my heart for Natural Language Processing (spaCy, NLTK) and inferential modeling (statsmodels, scikit-learn, and SHAP or LIME explainers). I also have experience with data wrangling, scraping, and engineering (pandas, selenium) and with machine learning algorithms (decision tree/Random Forest, linear and logistic regression). I adore making visualizations (seaborn, matplotlib) and maps (folium). I have worked with APIs (Google, Yelp, Twitter) and with public data sets (NYC Open Portal). I am a firm believer in well-written code comments, accessible documentation, and better coding with functions.

Projects

Pumpkin_Spice (Time series analysis and Modeling)

ReadMe

Main Code Notebook

Time series analysis with naive, SARIMA and ETS modeling.
  • Used the last five years of Google Trends data for the "pumpkin spice" search term
  • Performed both additive and multiplicative time series decomposition of the data, with explanations
  • Tested naive models that use last year's and last week's data as today's direct prediction
  • Tested iterations of SARIMA and ETS models with terms based on both ACF and PACF plots, as well as an iterative approach with PMD Auto ARIMA
  • Best model was an ETS model with a MAPE of 12.12 and an RMSE of 4.43, which is roughly two to three times better than the naive model that uses last year's data
  • Saved the final model so that its forecasts can be checked against a year of new data in one year's time, as a blind verification
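
As a rough illustration of the naive baseline and the error metrics quoted above, here is a minimal sketch in plain Python. The toy series, the season length of 4, and the function names are assumptions made up for this example; they are not taken from the project notebook.

```python
import math


def seasonal_naive(history, season_length, horizon):
    """Forecast each future step with the value observed one season earlier."""
    if len(history) < season_length:
        raise ValueError("need at least one full season of history")
    last_season = history[-season_length:]
    return [last_season[i % season_length] for i in range(horizon)]


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100 * total / len(actual)


def rmse(actual, predicted):
    """Root mean squared error, in the units of the series."""
    total = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(total / len(actual))


# Hypothetical toy series with a season length of 4 (not real Google Trends data).
history = [10, 20, 40, 20, 12, 22, 44, 22]
actual_next = [11, 25, 40, 20]

forecast = seasonal_naive(history, season_length=4, horizon=4)
print(forecast, mape(actual_next, forecast), rmse(actual_next, forecast))
```

Any candidate model, such as SARIMA or ETS, can be scored with the same two functions, which makes a "times better than naive" comparison straightforward.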

HurriHelp (Natural Language Processing Task and Modeling)

ReadMe

Main Code Notebook

Natural Language Processing project using sentiment analysis to help find Hurricane Ian survivors in distress and provide them with links to the National Disaster Distress Helpline and FEMA information.
  • Scraped Twitter for over seven thousand original tweets
  • Used three different sentiment analyzers (TextBlob, VADER, and a DistilBERT model) to find the sentiment of tweets
  • Used the analysis to explore the most common words in negative and positive tweets
  • Used Random Forest, Naive Bayes, XGBoost, and CatBoost on vectorized tweets, reaching 80% precision with the best-tuned model
  • Used the LIME Text Explainer for an inferential understanding of the most common words in the data, and found the words most likely to appear in negative tweets
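
The word-frequency step can be sketched roughly as follows, using only the standard library. The example tweets, the labels, the stop-word list, and the helper name `top_words` are all hypothetical; the project itself may tokenize and filter differently.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real project would likely use a library-provided one.
STOP_WORDS = {"the", "is", "a", "we", "are", "and", "to", "our", "i", "so", "for"}


def top_words(texts, n=3):
    """Return the n most frequent non-stop-word tokens across the given texts."""
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z']+", text.lower())
        counts.update(t for t in tokens if t not in STOP_WORDS)
    return counts.most_common(n)


# Made-up tweets with made-up sentiment labels, for illustration only.
labeled = [
    ("we are stranded and the water is rising", "negative"),
    ("no power and the water is everywhere", "negative"),
    ("so grateful for the volunteers", "positive"),
    ("volunteers restored our power", "positive"),
]

# Group tweets by sentiment label, then count words within each group.
by_label = {}
for text, label in labeled:
    by_label.setdefault(label, []).append(text)

for label, texts in by_label.items():
    print(label, top_words(texts))
```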

Fetal Health Project (Categorization Task and Modeling)

ReadMe

Main Code Notebook

Health information project for early detection and faster diagnosis of pathological fetal heartbeats.
  • Built data pipelines and created two early detection algorithms ready for A/B testing
  • Tested Logistic Regression, K-Nearest Neighbors, Support Vector Machine, and Random Forest, reaching a highest recall of 92% on the pathological class
  • Used a OneVsRest wrapper and GridSearchCV to specialize the model to the pathological class and find the best-tuned version
  • Used SHAP values for an inferential understanding of the model, and removed five input features while retaining 90% recall
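
Recall on a single class treats that class as positive and every other class as negative, which is the one-vs-rest view. A minimal sketch in plain Python follows; the label names "normal" and "suspect" and the toy predictions are assumptions for illustration, not data from the project.

```python
def recall_for_class(y_true, y_pred, positive_class):
    """Recall = true positives / (true positives + false negatives) for one class."""
    pairs = list(zip(y_true, y_pred))
    true_positives = sum(
        1 for t, p in pairs if t == positive_class and p == positive_class
    )
    false_negatives = sum(
        1 for t, p in pairs if t == positive_class and p != positive_class
    )
    total = true_positives + false_negatives
    if total == 0:
        raise ValueError(f"no true samples of class {positive_class!r}")
    return true_positives / total


# Hypothetical labels and predictions (not results from the project).
y_true = ["normal", "suspect", "pathological", "pathological",
          "normal", "pathological", "pathological"]
y_pred = ["normal", "normal", "pathological", "suspect",
          "normal", "pathological", "pathological"]

print(recall_for_class(y_true, y_pred, "pathological"))
```

Optimizing for recall on the pathological class means preferring a model that misses as few pathological cases as possible, even at the cost of more false alarms.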

Seattle Section 8 Expansion Project (Regression Task and Modeling)

ReadMe

Code Notebook

Built inferential linear regression model to inform the city of Seattle on where to build new public housing.
  • Performed in-depth EDA, determining the effect of renovations over time on property values for efficient budgeting.
  • Used Google’s geocoding API and selenium to engineer features: how close each property is to the closest public school, hospital, and police station for infrastructure insights related to property values
  • Visualized multicollinearity in seaborn with correlation heat maps, for feature selection
  • Iteratively constructed eight versions of the linear regression model, removing features with high p-values, for an R-squared score of 83%
  • Produced an automated findings report for human analysis of coefficients, including inferential analysis of inequity by zip code
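
A "distance to the closest school, hospital, or police station" feature can be sketched as below once coordinates are available. This example assumes the haversine great-circle formula and uses made-up coordinates; the notebook may compute distances differently, and the function names here are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def distance_to_nearest(location, amenities):
    """Smallest haversine distance from one location to any amenity."""
    return min(haversine_km(*location, *amenity) for amenity in amenities)


# Hypothetical coordinates, for illustration only.
property_location = (47.6062, -122.3321)
schools = [(47.6205, -122.3493), (47.6097, -122.3331)]

print(distance_to_nearest(property_location, schools))
```

Running the same function once per amenity type yields one numeric column per type, which can then enter the regression as ordinary features.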

Film Analysis Project (Exploratory Data Analysis)

ReadMe

Code Notebook

Advised hypothetical Microsoft Studios on what kinds of feature films to produce by analyzing box office financial data.
  • Aggregated data using Pandas and visualized in Seaborn to discover the most popular genres by net profit: Animation, Adventure, Fantasy and Family films
  • Built a pivot table aggregating films in the most popular genres by release date to infer seasonal trends: the best months for releasing new films in those genres are June, July, May, March, November, and April
  • Inferred that the best run time for new films is between 120 and 150 minutes, based on historical popularity
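
The genre-by-month aggregation behind these findings can be sketched in plain Python as follows. The film records, the field names, and the choice of the mean as the aggregate are assumptions for illustration; the project used pandas on real box office data.

```python
from collections import defaultdict
from statistics import mean

# Made-up records; net_profit is in arbitrary units.
films = [
    {"genre": "Animation", "release_month": 6, "net_profit": 300},
    {"genre": "Animation", "release_month": 6, "net_profit": 100},
    {"genre": "Animation", "release_month": 11, "net_profit": 150},
    {"genre": "Fantasy", "release_month": 7, "net_profit": 250},
    {"genre": "Fantasy", "release_month": 1, "net_profit": 50},
]


def mean_profit_by_month(records):
    """Map each (genre, release_month) pair to the mean net profit of its films."""
    grouped = defaultdict(list)
    for record in records:
        key = (record["genre"], record["release_month"])
        grouped[key].append(record["net_profit"])
    return {key: mean(values) for key, values in grouped.items()}


def best_month(records, genre):
    """Release month with the highest mean net profit for one genre."""
    table = mean_profit_by_month(records)
    candidates = {month: profit for (g, month), profit in table.items() if g == genre}
    return max(candidates, key=candidates.get)


print(best_month(films, "Animation"), best_month(films, "Fantasy"))
```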

Stop And Frisk Project

Tableau Dashboard

ReadMe

Code Notebook

Tableau Public dashboard using demographic information from 2021 for public transparency.

Geospatial Visualization

ReadMe

First Code Notebook

In-depth analysis and static geocoded visualization of consent-asked rates in 2020 for public transparency.
  • Visualized frequencies of stops and consent-asked in 2020 NYPD stop and frisk data to better understand inequities in policing
  • Investigated anomalies, outliers, patterns, and relationships for insights into different zip codes, and communicated results with choropleth maps and graphs made in Folium and Seaborn
  • Resulting article, Using Folium On Police Data, published by Towards Data Science

LINKS
