Lyric analysis of songs from five genres to compare word frequency, substance use references and sentiment.

okekejus/lyric-analysis

Table of Contents

  • Background
  • Objective
  • Tools and Packages
  • Data Collection & Processing
  • Data Visualization
  • Results
  • Future Work

Background

I read an article claiming that Country songs reference substances more than any other genre of music. I've personally always thought Hip-Hop would win this by a landslide, so I decided to use my resources to conduct my own research!

Objective

Analyze song lyrics from five genres (Pop, Hip-Hop, Country, Rock, R&B) to see which genre references substances the most.

Tools and Packages

I used a combination of Python and R for this project. I find Python more convenient for API calls and web scraping, so I used it to download the lyrics. I prefer R for plotting graphs and data exploration, so I switched languages once the lyrics were downloaded.

R packages:

  • tidyverse: Data manipulation & analysis
  • tidyjson: Structuring .json data into tidy data frames
  • rjson: Conversion of .json objects into R objects.
  • tidytext: Editing text data using tidy data principles
  • furrr: Future mapping (parallel processing similar to purrr)
  • gsubfn: String manipulation
  • plyr: Split/Apply/Combine strategies for data

Python modules:

  • lyricsgenius: Client for Genius API
  • pandas: Data manipulation and analysis
  • dask: Parallel processing
  • os: Operating system interfaces
  • json: Working with json files

Data Collection & Processing

| Method | Notes |
| --- | --- |
| search_artist | 150 * 50 songs downloaded in ~20 minutes |
| save_lyrics | Downloaded .json files to drive |
| dask.compute | Multiprocessing for search_artist |
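
A minimal sketch of how the download step could look, not the code from this repo. It assumes a Genius API token and that `lyricsgenius` and `dask` are installed. The helper names (`fetch_artist_songs`, `save_songs`, `download_all`), the JSON layout, and the threaded scheduler are assumptions made for illustration. It also writes the JSON by hand instead of using `save_lyrics`, to keep the file layout explicit.

```python
import json
from pathlib import Path


def fetch_artist_songs(name, token, max_songs=50):
    """Fetch up to max_songs songs for one artist from the Genius API."""
    # Third-party import kept local so save_songs works without it installed.
    import lyricsgenius

    genius = lyricsgenius.Genius(token)
    artist = genius.search_artist(name, max_songs=max_songs)
    if artist is None:  # no matching artist found
        return []
    return [
        {"artist": name, "title": song.title, "lyrics": song.lyrics}
        for song in artist.songs
    ]


def save_songs(songs, out_dir, artist_name):
    """Write one artist's songs to a .json file and return its path."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    safe_name = artist_name.replace("/", "_")  # e.g. "AC/DC" -> "AC_DC"
    path = out_dir / f"{safe_name}.json"
    path.write_text(json.dumps(songs, ensure_ascii=False, indent=2), encoding="utf-8")
    return path


def download_all(artists, token, out_dir):
    """Fetch every artist in parallel with dask, then save one file per artist."""
    import dask

    tasks = [dask.delayed(fetch_artist_songs)(name, token) for name in artists]
    # Assumption: threads are enough here because the work is network-bound.
    results = dask.compute(*tasks, scheduler="threads")
    return [save_songs(songs, out_dir, name) for name, songs in zip(artists, results)]
```

Calling `download_all(["Eminem", "Whitney Houston"], token, "lyrics/")` would then leave one .json file per artist in `lyrics/`.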

Data Cleaning

  • Change all lyrics to lower case
  • Tokenization of words
  • Changing plural mentions to singular - ex. "girls" to "girl"
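
The three cleaning steps above can be sketched in Python as follows. The actual cleaning in this project was done in R, so this is only an illustration: the function names are hypothetical, and the plural rule is deliberately naive (a real pipeline would likely use a stemmer or lemmatizer).

```python
import re

# Words made of letters, optionally with one inner apostrophe (e.g. "don't").
TOKEN_PATTERN = re.compile(r"[a-z]+(?:'[a-z]+)?")


def singularize(word):
    """Naive plural-to-singular rule: 'girls' -> 'girl', 'babies' -> 'baby'."""
    if "'" in word:
        return word  # leave contractions and possessives alone
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"
    if len(word) > 3 and word.endswith("s") and not word.endswith(("ss", "us", "is")):
        return word[:-1]
    return word


def clean_lyrics(text):
    """Lower-case the text, split it into word tokens, and singularize each token."""
    tokens = TOKEN_PATTERN.findall(text.lower())
    return [singularize(token) for token in tokens]
```

For example, `clean_lyrics("Girls just wanna have FUN!")` gives `["girl", "just", "wanna", "have", "fun"]`. The rule still gets some words wrong (it turns "always" into "alway"), which is the price of keeping it this short.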

Results

Total words per genre

When all the lyrics were downloaded and filtered, Hip Hop was the genre with the most words, with over 20,000 in comparison to other genres:

image

image

Top words per genre

"Love" and "Yeah" were top words in all genres.

image

I thought it would be cool to see which genre references love the most, so I did just that. By dividing the number of mentions of the word "love" by the total words in each song, I was able to get percentage values.

image

image

R&B makes the most references to love (duh), with Pop in second place. Hip Hop mentions it the least of all genres.
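
The percentage calculation for "love" can be sketched like this. It assumes each song is already a list of cleaned tokens tagged with its genre; averaging the per-song percentages within a genre is an assumption about how the per-genre figure was obtained, and the function names are hypothetical.

```python
from collections import defaultdict


def word_share(tokens, word):
    """Percentage of tokens in one song that equal `word` (0.0 for an empty song)."""
    if not tokens:
        return 0.0
    return 100.0 * tokens.count(word) / len(tokens)


def mean_share_by_genre(songs, word="love"):
    """songs: iterable of (genre, tokens). Returns {genre: mean per-song percentage}."""
    shares = defaultdict(list)
    for genre, tokens in songs:
        shares[genre].append(word_share(tokens, word))
    return {genre: sum(values) / len(values) for genre, values in shares.items()}


# Toy input, not data from this project.
songs = [
    ("R&B", ["love", "me", "love", "you"]),
    ("R&B", ["i", "love", "you", "so"]),
    ("Hip-Hop", ["no", "time"]),
]
print(mean_share_by_genre(songs))  # {'R&B': 37.5, 'Hip-Hop': 0.0}
```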

Swear words

"Swear words" were any words from this list found within the lyrics: "fuck", "shit", "bitch", "damn", "cunt", "slut", "whore", "ho", "piss", "bollocks" (for the British artists!), "dick", "cock".
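
As a rough Python illustration of this filter (the analysis itself was done in R), the sketch below counts tokens that exactly match the list above and reports their share of all words. The function name is hypothetical, and exact matching is an assumption: it would miss inflected forms such as "-ing" endings unless those were normalized first.

```python
from collections import Counter

# The swear word list given above.
SWEAR_WORDS = {
    "fuck", "shit", "bitch", "damn", "cunt", "slut",
    "whore", "ho", "piss", "bollocks", "dick", "cock",
}


def swear_summary(tokens):
    """Return (count per swear word, swear words as a percentage of all tokens)."""
    counts = Counter(token for token in tokens if token in SWEAR_WORDS)
    total = sum(counts.values())
    share = 100.0 * total / len(tokens) if tokens else 0.0
    return counts, share
```

`counts.most_common(n)` then gives the top swear words for whatever set of tokens was passed in, for example all tokens of one genre.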

First, I found the most common swear words per genre:

image

For this category, I expected Hip Hop to be at the top, by a lot (it was). I wasn't too sure what the rest of the rankings would look like for the other genres, and I was a little surprised by the results:

image

43% of the words in Hip-Hop lyrics are swear words! R&B is in second place, with a shocking 36% difference.

image

I didn't bother breaking down the most common swear words to figure out which genre references each one the most. It's safe to say Hip-Hop wins this round.

Substance References

I grouped "Substances" into 7 categories: Marijuana (weed), Alcohol, Heroin, Meth, Pills, Cocaine, Ecstasy (including LSD, shrooms, molly). Hip Hop was in first place in terms of substance mentions, but I was shocked to see what was in second place:

image

Country music! I expected Pop/R&B to be in second, but apparently that isn't the case. Of all substances, Alcohol was the most commonly referenced, with Marijuana in 2nd place.
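
The category grouping can be sketched as a keyword lookup. The seven category names come from the list above, but the keywords under each one are hypothetical examples picked for illustration, not the terms used in this project's code. Ambiguous terms such as "coke" would need manual checking in practice.

```python
from collections import Counter

# Category names from the README; keyword sets are illustrative assumptions.
SUBSTANCE_TERMS = {
    "Marijuana": {"weed", "marijuana", "blunt"},
    "Alcohol": {"whiskey", "beer", "wine", "liquor"},
    "Heroin": {"heroin"},
    "Meth": {"meth"},
    "Pills": {"pill", "xanax"},
    "Cocaine": {"cocaine", "coke"},
    "Ecstasy": {"ecstasy", "lsd", "shroom", "molly"},
}

# Invert the mapping once so each token needs a single dictionary lookup.
TERM_TO_CATEGORY = {
    term: category
    for category, terms in SUBSTANCE_TERMS.items()
    for term in terms
}


def substance_counts(tokens):
    """Count substance mentions per category in a list of cleaned tokens."""
    return Counter(TERM_TO_CATEGORY[t] for t in tokens if t in TERM_TO_CATEGORY)
```

Running this on all tokens of one genre gives the per-category totals that a chart like the one above would be built from.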

I decided to compare references to these two substances between groups:

Alcohol

image

Country music references alcohol the most, and by quite a lot compared to the other genres! Hip-Hop is in second place for this one.

Marijuana

image

Hip Hop references marijuana the most, far more than other genres!

Violence

To capture mentions of violence, I gathered words related to aggression (as many as I could think of; the full list is in the code) and filtered each genre for mentions. Hip-Hop was once again first in this category, with Pop narrowly beating out Rock for second place.

image

image

Sentiment Analysis

Lastly, I thought it would be cool to add a sentiment score to see how the genres stacked up against each other. I expected Hip-Hop to be far in the negatives due to the quantity of violence/substance/swear words present.

image

image

As you can see, that is the case! In fact, all genres are in the negatives, with the exception of R&B music, which is in the low positives (makes sense because they're talking about love so much).
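
For readers unfamiliar with lexicon-based sentiment scoring, here is a minimal sketch of the general idea. The README does not say which lexicon or scoring rule was used, so everything here is an assumption: the lexicon is a made-up toy, and the score is simply the sum of word scores averaged per genre.

```python
from collections import defaultdict

# Toy lexicon with made-up scores; a real analysis would use a published lexicon.
TOY_LEXICON = {"love": 3, "happy": 3, "good": 2, "bad": -2, "hate": -3, "kill": -3}


def sentiment_score(tokens, lexicon=TOY_LEXICON):
    """Sum the lexicon scores of all tokens; words not in the lexicon count as 0."""
    return sum(lexicon.get(token, 0) for token in tokens)


def mean_sentiment_by_group(songs, lexicon=TOY_LEXICON):
    """songs: iterable of (group, tokens), where group is a genre or an artist."""
    scores = defaultdict(list)
    for group, tokens in songs:
        scores[group].append(sentiment_score(tokens, lexicon))
    return {group: sum(values) / len(values) for group, values in scores.items()}
```

Because the grouping key is arbitrary, the same function works for a per-genre or a per-artist comparison.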

I decided to do the same thing by artist, to see if any were far more negative than others.

image

Most negative artists

  1. Eminem
  2. 2Pac
  3. Lil Wayne
  4. DMX
  5. JAY-Z

Most positive artists

  1. Whitney Houston
  2. Mary J. Blige
  3. Celine Dion
  4. Stevie Wonder
  5. Janet Jackson

Future Work

A larger collection of lyrics would be a major benefit, as it would allow analysis of the same metrics across more genres and languages and lead to more accurate findings.