Ahead of the Oscars, DW wanted to look at what film clichés say about Hollywood. So we analyzed the TVTropes.org entries for 6637 Oscar-eligible movies from 1929 to 2019. In this repository, you will find the methodology, data and code behind the two stories that came out of this analysis.
Read the full article on DW.com: English | German
How does the Academy pick an Oscar nominee? Our data analysis shows which tropes help a movie get nominated as "Best Picture". Hint: Drama helps. So does alcohol.
Idea, research, data analysis, visualization: Kira Schacht
Research and writing: Laura Döing
Read the full article on DW.com: English | German
Movies reflect stereotypes, but they also shape them. If Asians are portrayed as nerdy, black men as dangerous and Latinas as fiery, it influences perceptions. So what exactly does Hollywood have to say about various ethnic groups?
Story: Kira Schacht
On this page, you'll find an interactive table of the tropes we analyzed. Filter and search for your own stories!
tvtropes_analysis_oscars.R
Script file for Oscars analysis
tvtropes_analysis_stereotypes.R
Script file for stereotypes analysis
/cleaning
Folder for scripts and datasets for gathering and cleaning the data
/cleaning/tvtropes.R
R script for scraping the data
/datasets/list_tropes.csv
Comma-separated table with data for 6637 movies. For its contents, see the movies data frame below.
/datasets/list_tropes.csv
Semicolon-separated table with data for 21789 tropes. For its contents, see the tropelist data frame below.
Core datasets of the _analysis_ scripts:
movies
6637x16 data frame with metadata on each movie in the sample, including title, year, tvtropes link, genre and winner status
tropelist
21789x4 data frame with list of unique encountered tropes with ID, title, frequency, frequency since 2000 and short link. The short link can be used to look up tropes by adding https://tvtropes.org/pmwiki/pmwiki.php/Main/ in front to form the URL.
tropes
List with 6637 items, names are movie IDs as listed in movies and in the same order. List of trope names occurring in each movie. Names used as in tropelist
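As a small illustration of the short-link lookup described above, here is a minimal sketch in Python (the repository's own scripts are written in R). The function name `trope_url` and the example short link are assumptions made for this example, not names taken from the repository.

```python
# Prefix quoted in the tropelist description above.
TROPE_URL_PREFIX = "https://tvtropes.org/pmwiki/pmwiki.php/Main/"


def trope_url(short_link: str) -> str:
    """Build the full TVTropes URL for a trope from its short link.

    `short_link` is assumed to be the last path segment of the trope page,
    as stored in the short link column of tropelist.
    """
    # Drop stray whitespace or a leading slash before prepending the prefix.
    return TROPE_URL_PREFIX + short_link.strip().lstrip("/")


if __name__ == "__main__":
    # "ChekhovsGun" is used purely as an illustrative short link.
    print(trope_url("ChekhovsGun"))
```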
The following text will explain the process behind this story: Which data sources were used, how the data was gathered and how the analyses were conducted.
The user-generated wiki TVTropes documents the tropes that occur in any piece of media: Which TV shows claim Elvis is still alive? Which video games feature a creepy child character? Does a movie feature a white actor dressed up to look Asian?
Users collect and maintain the data on the website, so there is bound to be some margin of error: Some tropes may be falsely interpreted to be in a movie, or, even more likely, some tropes might not be documented even though they are featured. Movies might also be missing entirely. Newer movies, for example, are always more likely to be featured than older ones, and newer movies tend to have more documented tropes per movie, since, on average, more users edit those entries.
The TVTropes wiki was nevertheless used for this analysis because it is still one of the best options for getting detailed and large-scale data on something as complex as recurring cinematic motifs. As a precaution, entries that were only edited by one user were excluded. To counter the effect of having more data on more recent movies, the analysis uses the frequency of a trope compared to all tropes, instead of the share of movies that have a trope, when tracking prevalence over time.
DW used the official Academy Awards eligibility rules to identify the movies to include in the sample. Oscar-eligible movies can be expected to be relatively successful in Hollywood, so they make for a good basis for analyzing the tastes of the Academy, as well as the stereotypes present in Hollywood movies. The eligibility rules can be read in full here, but this is the short version: All eligible motion pictures must be:
- feature length (defined as over 40 minutes),
- played for paid admission in a commercial motion picture theater in Los Angeles County
- for a qualifying run of at least seven consecutive days
- between January 1 and midnight of December 31 of the year preceding the awards ceremony.
It would be next to impossible to identify all the movies that satisfied these criteria in the 91 years since the Oscars were first hosted. But thankfully, any movie that passes the application process gets recorded in a "Reminder List" that the Academy sends out for the jury's consideration prior to the announcement of the nominees.
Link: https://www.atogt.com/askoscar/display-reminder-list-text.php
Sources for Fact-checking: official 91st reminder list | 90th | 89th | 88th
A list of all the movies eligible for an Oscar nomination, sent out yearly by the Academy. The reminder list has only been made publicly available online for the past four years, so the work of Richard "Flix" Brunner, who runs the website atogt.com, was used to provide a complete picture. Brunner has compiled the previous lists from online publications, historical archives and print copies he has collected over the past years. The publicly available reminder lists, as well as the database of nominees and winners, were used to check the data for accuracy.
Link: http://awardsdatabase.oscars.org/
The official Academy database provides an overview of all previous Oscar winners and nominees. It was used to check and complement the reminder list data and provide information on the movies' "Best Picture" winner/nominee status.
Link: https://www.imdb.com/interfaces/
The IMDB dataset title.basics.tsv.gz contains metadata on all movies.
The script in cleaning/tvtropes.R contains the code used to scrape and combine the necessary data for this analysis. The steps are as follows. The R library rvest was used for scraping and the data cleaning software OpenRefine was used for pattern matching between metadata and TVTropes entries.
- Scrape reminder lists from atogt.com for all years.
- Scrape nominees and winners from awards database
- make sure all are included in eligibility lists and spellings match the awards database
- Merge with IMDB data for further info where possible
Result: approx. 28200 eligible movies
- Search for entries by scraping Startpage results for: "[movie title]" [movie year] host:tvtropes.org
- Filter out wrong links with pattern recognition / manual checks in OpenRefine
- for more info on the link matching decisions, see cleaning/tvtropes.R, lines 192 to 229
Result: 7001 matched eligible movies, including all winners and all but 25 (5 %) of the nominees.
- Scrape list of all trope links (contain pmwiki.php/Main/) from entry, scrape number of contributing users
- Filter out movies with fewer than 2 contributing users
Result: 6637 matched eligible movies, 21789 unique encountered tropes
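Two of the steps above lend themselves to a short sketch: building the search query for each movie and dropping entries edited by a single user. The example below is written in Python for illustration only; the actual pipeline lives in cleaning/tvtropes.R and OpenRefine. The function names and the `contributors` field are hypothetical, and the sample entries are invented.

```python
def startpage_query(title: str, year: int) -> str:
    """Format the search string following the pattern given in the list above."""
    return f'"{title}" {year} host:tvtropes.org'


def drop_single_editor_entries(entries, min_users=2):
    """Keep only entries with at least `min_users` contributing users.

    `entries` is assumed to be a list of dicts with a hypothetical
    "contributors" field holding the scraped number of contributing users.
    """
    return [entry for entry in entries if entry["contributors"] >= min_users]


if __name__ == "__main__":
    print(startpage_query("Some Movie", 1999))

    # Invented sample data, not taken from the dataset.
    sample = [
        {"id": "movie_a", "contributors": 1},
        {"id": "movie_b", "contributors": 7},
    ]
    print([entry["id"] for entry in drop_single_editor_entries(sample)])
```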
This is where the two stories differ. You'll find the analysis for the Oscars story in tvtropes_analysis_oscars.R and the one for the stereotypes story in tvtropes_analysis_stereotypes.R.
Script file: tvtropes_analysis_oscars.R
- Descriptive analysis
- Most common tropes in 2019 nominees
- Check number of tropes and movies per year
- Which genres appear most often?
- By winner status: Share of genres in eligible vs. nominees vs. winners
- Which film clichés will earn you an Oscars nomination?
- Criteria: oscarsYear >= 2000 & movies.withtrope > 50 (tropes occur at least 50 times in movies since the year 2000)
- Question: Which tropes have a statistically significant impact on a movie’s chances of getting a nomination, be it positive or negative?
- Test hypothesis: For each trope: Compare the likelihood of a nomination for movies with the trope to the likelihood of a nomination for movies without the trope
- Conduct Fisher’s exact test for each trope with alpha = 0.05 (Bonferroni-adjusted)
- Result:
- 32 tropes have a statistically significant impact on the nomination chances of a movie
- No tropes were found to have an impact on the chances of a nominee becoming a winner
- 2 tropes were found to have a significant impact on getting from eligible to winner
- For the article, we focused on the tropes that increase the chance of a nomination
- Compare the frequency of tropes in all movies to the frequency in just the nominees
- Show the difference in Academy taste vs. popular taste
- Highlight the tropes found to have a significant impact
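The per-trope test in the list above can be sketched as follows. This is a Python illustration using only the standard library, not the code from tvtropes_analysis_oscars.R. It assumes a two-sided test (the list asks about positive and negative effects) and applies the Bonferroni adjustment by dividing alpha by the number of tropes tested. The function names and the counts in the usage example are invented.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table

                      nominated   not nominated
        with trope        a             b
        without trope     c             d

    The p-value sums the probabilities of all tables with the same margins
    that are at most as likely as the observed table.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x nominated movies with the trope.
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    p_obs = prob(a)
    # Small relative tolerance guards against floating-point ties.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def significant_tropes(tables: dict, alpha: float = 0.05) -> list:
    """Return tropes whose p-value is below the Bonferroni-adjusted threshold."""
    threshold = alpha / len(tables)  # one test per trope
    return [name for name, t in tables.items()
            if fisher_exact_two_sided(*t) < threshold]


if __name__ == "__main__":
    # Invented counts: (a, b, c, d) per trope, for illustration only.
    example = {"TropeA": (30, 20, 10, 940), "TropeB": (3, 1, 1, 3)}
    print(significant_tropes(example))
```

Dividing alpha by the number of tests is equivalent to multiplying each p-value by that number and comparing it to the unadjusted alpha; which form the original script uses is not stated here.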
Script file: tvtropes_analysis_stereotypes.R
- Pick ethnic groups to focus on and compile a list of tropes about them via keyword search and background research
- Main focus: Asians/Asian-Americans, black people
- Additional analysis: Latin people, Germans, British people, Russians
- Track tropes about Asians/Asian-Americans, black people over time
- Count occurrence of relevant tropes, as well as number of all tropes, by decade
- Calculate frequency: share of trope = number of occurrences / total number of tropes
- Get most common tropes about Latin people, Germans, British people, Russians
- Use total number of occurrences
- Use only movies since 2000
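The by-decade frequency calculation above can be sketched like this. It is a Python illustration rather than the R code in tvtropes_analysis_stereotypes.R. The data layout is an assumption (one dict mapping movie IDs to years, one mapping movie IDs to trope names, loosely mirroring the movies and tropes objects), and the sample data is invented.

```python
from collections import Counter


def trope_share_by_decade(years: dict, tropes: dict, relevant: set) -> dict:
    """Share of relevant tropes among all documented tropes, per decade.

    years:    movie ID -> year (hypothetical stand-in for the movies table)
    tropes:   movie ID -> list of trope names occurring in that movie
    relevant: set of trope names being tracked
    """
    relevant_counts = Counter()
    total_counts = Counter()
    for movie_id, names in tropes.items():
        decade = years[movie_id] // 10 * 10
        total_counts[decade] += len(names)
        relevant_counts[decade] += sum(1 for name in names if name in relevant)
    # share of trope = number of occurrences / total number of tropes
    return {decade: relevant_counts[decade] / total_counts[decade]
            for decade in sorted(total_counts) if total_counts[decade]}


if __name__ == "__main__":
    # Invented sample data, for illustration only.
    years = {"m1": 1994, "m2": 1999, "m3": 2005}
    tropes = {
        "m1": ["TropeA", "TropeB"],
        "m2": ["TropeA", "TropeC"],
        "m3": ["TropeB", "TropeC", "TropeD", "TropeA"],
    }
    print(trope_share_by_decade(years, tropes, {"TropeA"}))
```

Dividing by the total number of documented tropes, rather than by the number of movies, is what keeps the better-documented recent decades from inflating the trend.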