Project completed on May 7, 2024.
In the world of beer, certain varieties stand out due to their versatile flavors, making them popular choices among consumers. A business owner aiming to meet popular demand needs to curate a simple yet appealing range of beers. However, given the overwhelming number of beer styles available, it is impractical to include every type in the inventory.
This project uses clustering analysis to help the business owner identify a representative sample of beers. By examining features of each beer (e.g., Astringency, Bitter, Alcohol), the analysis groups the beers into distinct clusters, so the owner can select a diverse yet manageable assortment for their inventory.
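To make the idea concrete, here is a minimal sketch of k-medoids, written with the standard library only so it runs without extra packages. It is not the notebook's implementation: the feature values are made-up, pre-scaled numbers for two hypothetical columns (Bitter, Alcohol), and the farthest-first initialization and alternating assign/update loop are simplifying assumptions rather than the full PAM procedure. The sketch also assumes there are at least `k` distinct points.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_medoids(points, k, max_iter=100):
    """Simplified alternating k-medoids; returns (medoid indices, labels)."""
    n = len(points)
    dist = [[euclidean(p, q) for q in points] for p in points]

    # Deterministic initialization (an assumption of this sketch):
    # start from the most central point, then repeatedly add the point
    # farthest from its nearest current medoid.
    medoids = [min(range(n), key=lambda i: sum(dist[i]))]
    while len(medoids) < k:
        medoids.append(
            max(range(n), key=lambda i: min(dist[i][m] for m in medoids))
        )

    for _ in range(max_iter):
        # Assignment step: attach every point to its nearest medoid.
        clusters = {m: [] for m in medoids}
        for i in range(n):
            clusters[min(medoids, key=lambda m: dist[i][m])].append(i)

        # Update step: within each cluster, pick the member with the
        # smallest total distance to the other members.
        new_medoids = [
            min(members, key=lambda c: sum(dist[c][j] for j in members))
            if members else m
            for m, members in clusters.items()
        ]
        if new_medoids == medoids:
            break
        medoids = new_medoids

    labels = [min(range(k), key=lambda c: dist[i][medoids[c]]) for i in range(n)]
    return medoids, labels


# Hypothetical, already-scaled (Bitter, Alcohol) values for seven beers.
beers = [
    (0.10, 0.20), (0.15, 0.25), (0.20, 0.10),
    (0.90, 0.80), (0.85, 0.90), (0.80, 0.85), (0.95, 0.95),
]
medoids, labels = k_medoids(beers, k=2)
print("medoid rows:", medoids)
print("labels:", labels)
```

Because each medoid is an actual row of the data, the medoid indices point directly at real beers that could serve as the representatives of their clusters.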
analysis_and_report.ipynb
- Introduction
- Dataset Discussion
- Dataset Cleaning and Exploration
- Basic Descriptive Analytics
- Scaling Decisions
- Clusterability and Clustering Structure
- Clustering Algorithm Selection Motivation
- Clustering Algorithm #1: K-Medoids
- Clustering Algorithm #2: HAC with Ward's Linkage
- Discussion
- Conclusion
- Hopkins Statistic
- t-SNE plot
- Elbow plot
- Average Silhouette score
- Silhouette plot
- Cluster Sorted Similarity Matrix
- K-Medoids Clustering
- Hierarchical Agglomerative Clustering (HAC) with Single, Complete, Average, Ward's linkages
- Dendrogram
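As an illustration of one item in the list above, the average silhouette score can be sketched in a few lines of standard-library Python. This is a hedged example, not the notebook's code: the points and both labelings are invented for the demonstration, and the score for a point alone in its cluster is set to 0 by convention. For each point, `a` is the mean distance to the other members of its own cluster and `b` is the lowest mean distance to any other cluster; the silhouette is `(b - a) / max(a, b)`.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_silhouette(points, labels):
    """Mean silhouette over all points; needs at least two clusters."""
    if len(set(labels)) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for i, p in enumerate(points):
        # Collect distances from point i to every other point, by cluster.
        by_cluster = {}
        for j, q in enumerate(points):
            if j != i:
                by_cluster.setdefault(labels[j], []).append(euclidean(p, q))

        own = by_cluster.pop(labels[i], None)
        if not own:
            # Point is alone in its cluster: score 0 by convention.
            scores.append(0.0)
            continue

        a = sum(own) / len(own)
        b = min(sum(d) / len(d) for d in by_cluster.values())
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)

    return sum(scores) / len(scores)


# Hypothetical, already-scaled feature values forming two visible groups.
points = [
    (0.10, 0.20), (0.15, 0.25), (0.20, 0.10),
    (0.90, 0.80), (0.85, 0.90), (0.80, 0.85), (0.95, 0.95),
]
good_labels = [0, 0, 0, 1, 1, 1, 1]  # matches the two groups
bad_labels = [0, 1, 0, 1, 0, 1, 0]   # mixes the two groups

good_score = average_silhouette(points, good_labels)
bad_score = average_silhouette(points, bad_labels)
print("good labeling:", round(good_score, 3))
print("bad labeling:", round(bad_score, 3))
```

A labeling that follows the natural groups scores close to 1, while one that mixes them scores near or below 0, which is why the average silhouette is commonly compared across candidate numbers of clusters.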
beer_profile_and_ratings.csv
-- raw dataset (retrieved from Kaggle)
presentation.pdf
-- a short presentation with the project overview