
A machine learning workflow that predicts the hit potential of a song from its audio features, ranging from acousticness to danceability.


Srinath-13/Hit-Song-Prediction


Hit-Song-Prediction

Hit Song Science uses machine learning tools to predict a song's success before it is released. It may help artists and producers understand their audience and write songs that appeal to it: artists can choose lyrics and tailor songs to their listeners, and audio engineers can help them refine basic musical components to make songs more appealing, accessible, and enjoyable. This research examines whether song lyrics and audio features can predict popularity. Beyond statistics and analytics, music lovers worldwide have watched musical trends evolve over time. Popular songs are easy to spot even when listeners have different musical preferences, yet what makes a tune popular keeps changing. The project seeks to identify hits using intrinsic musical data.

Dataset

You may want to take a look at the dataset. The Spotify Hit Predictor Dataset on Kaggle contains track features retrieved through the Spotify Web API. Each track is labelled '1' ('Hit') or '0' ('Flop') according to the dataset author's criteria, so the data can be used to train a classification model that predicts whether a song will be a hit. The dataset contains 41,106 instances described by 20 attributes, namely 'track', 'artist', 'uri', 'danceability', 'energy', 'key', 'loudness', 'mode', 'speechiness', 'acousticness', 'instrumentalness', 'liveness', 'valence', 'tempo', 'duration_ms', 'time_signature', 'chorus_hit', 'sections', 'target' and 'decade'.
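Before any model is trained, the identifier columns ('track', 'artist', 'uri') have to be separated from the numeric audio features and the 'target' label. The sketch below shows one way to do that with pandas. It is an illustration under assumptions, not the project's own code: the function name `split_features_target` is hypothetical, and non-numeric columns (for example a text 'decade' label, if stored that way) are simply dropped rather than encoded. A small hand-made frame stands in for the real CSV, which you would load with `pd.read_csv` from your own download path.

```python
import pandas as pd

# Columns assumed to be identifiers rather than predictive audio features.
ID_COLUMNS = ["track", "artist", "uri"]
TARGET = "target"


def split_features_target(df: pd.DataFrame) -> tuple[pd.DataFrame, pd.Series]:
    """Return (numeric feature matrix, 0/1 hit label) from a hit-predictor frame."""
    features = df.drop(columns=ID_COLUMNS + [TARGET], errors="ignore")
    # Keep only numeric columns; a non-numeric column such as a text 'decade'
    # label would need explicit encoding instead (assumption: dropped here).
    X = features.select_dtypes(include="number")
    y = df[TARGET].astype(int)
    return X, y


# Tiny hand-made frame standing in for the real CSV (values are illustrative only).
sample = pd.DataFrame({
    "track": ["a", "b"],
    "artist": ["x", "y"],
    "uri": ["spotify:track:1", "spotify:track:2"],
    "danceability": [0.8, 0.3],
    "energy": [0.7, 0.2],
    "decade": ["10s", "60s"],
    "target": [1, 0],
})
X, y = split_features_target(sample)
print(list(X.columns), y.tolist())  # ['danceability', 'energy'] [1, 0]
```

Dropping the identifiers matters because a track name or URI is unique per row and would let a model memorise rows instead of learning from the audio features.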

Methodology

[Figure: project methodology]

Results

Best Standalone Model

[Figure: ROC-AUC comparison of the standalone models]

The ROC-AUC plot above shows that Random Forest, SVM and K-NN are the most reliable and best-performing standalone models. Only these three models are carried forward to the ensemble-learning stage.
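A comparison like the one in the plot can be sketched with scikit-learn's cross-validated ROC-AUC scoring. This is a minimal sketch under stated assumptions: the README does not name its library, so scikit-learn is assumed; `make_classification` generates synthetic data in place of the real audio features; and the hyperparameters shown are illustrative, not the project's. Other candidate models would be added to the same `models` dictionary.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the audio features (assumption: not the real dataset).
X, y = make_classification(n_samples=600, n_features=15, n_informative=8, random_state=0)

# Distance- and margin-based models get feature scaling; the tree ensemble does not need it.
models = {
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "SVM": make_pipeline(StandardScaler(), SVC(random_state=0)),
    "K-NN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=15)),
}

auc = {}
for name, model in models.items():
    # Mean area under the ROC curve over 5 cross-validation folds.
    scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
    auc[name] = scores.mean()
    print(f"{name}: mean ROC-AUC = {auc[name]:.3f}")
```

ROC-AUC is threshold-independent, so it compares how well each model ranks hits above flops without committing to a single decision cut-off.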

Best Ensemble Model

[Figure: comparison of the ensemble models]

Of the ensemble models compared, a soft-voting classifier combining SVM, K-NN and Random Forest (each with its best hyperparameters) is chosen as the best classifier, because it achieves the highest score at a reasonable computational cost.
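The chosen ensemble can be sketched with scikit-learn's `VotingClassifier` in soft-voting mode, assuming that is the implementation behind "Voting Classifier with Soft Voting". Each base model is first tuned with `GridSearchCV` and its best configuration is passed to the ensemble. The parameter grids below are hypothetical, since the project's search space is not documented, and synthetic data again replaces the real features.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the audio features (assumption: not the real dataset).
X, y = make_classification(n_samples=600, n_features=15, n_informative=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Small illustrative grids (hypothetical; not the project's actual search space).
searches = {
    "svm": GridSearchCV(
        make_pipeline(StandardScaler(), SVC(probability=True, random_state=0)),
        {"svc__C": [0.1, 1, 10]}, cv=3, scoring="roc_auc"),
    "knn": GridSearchCV(
        make_pipeline(StandardScaler(), KNeighborsClassifier()),
        {"kneighborsclassifier__n_neighbors": [5, 15, 25]}, cv=3, scoring="roc_auc"),
    "rf": GridSearchCV(
        RandomForestClassifier(random_state=0),
        {"n_estimators": [100, 200], "max_depth": [None, 10]}, cv=3, scoring="roc_auc"),
}

# Tune each base model on the training split and keep its best configuration.
best = [(name, search.fit(X_train, y_train).best_estimator_)
        for name, search in searches.items()]

# Soft voting averages the base models' predicted class probabilities,
# so every estimator must implement predict_proba (hence SVC(probability=True)).
ensemble = VotingClassifier(estimators=best, voting="soft")
ensemble.fit(X_train, y_train)

proba = ensemble.predict_proba(X_test)[:, 1]
test_auc = roc_auc_score(y_test, proba)
print(f"Soft-voting ensemble test ROC-AUC: {test_auc:.3f}")
```

Soft voting lets a model that is confident about a track outweigh one that is undecided, which hard (majority) voting cannot do. The cost is that SVC must be fitted with `probability=True`, which adds an internal cross-validation step and makes training slower.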

References

You might want to take a look at this presentation.