data story | repository structure | results.ipynb
Note: if you already viewed our data story while we were still working, make sure to reset browser cache
The project goal is to explore the relationship between actors' traits — such as age, gender, ethnicity — and the character archetypes they portray in films. By analyzing casting patterns, this project aims to find out how specific actor profiles consistently coincide with archetypal roles like heroes, villains, mentors, or lovers. Our goal is to uncover whether certain traits predispose actors to particular roles and identify any underlying biases in casting decisions. This research also explores how these patterns vary across different film industries and across time. Ultimately, we aim to tell the story of how an actor's characteristics shape their cinematic destiny, influencing not only their career trajectory but also how audiences perceive iconic characters on screen.
Note: all questions are analyzed and answered in our data story!
-
What Are the Most Common Character Archetypes in Movies?
What are the most prevalent character archetypes found in films, and how can we define and clusterize them based on existing papers and our own research? Identifying these archetypes involves exploring recurring character types and their roles within various plots. -
Which Actor Traits Correspond to Specific Archetypes?
Which actor traits — such as age, gender, ethnicity, and other physical attributes — are typically associated with specific archetypes? For instance, are certain traits more frequently linked to roles like heroes, villains, or mentors? Investigating these correlations can reveal patterns in casting decisions. -
Do Casting Patterns Exhibit Biases Based on Actor Traits?
Do casting patterns exhibit biases based on actor traits like age, gender, or ethnicity? Are there noticeable trends in how certain demographics are cast in specific roles? Examining these patterns can shed light on potential biases within the casting industry. -
How Do Casting Trends Vary Across Genres and Film Industries?
How do these casting trends vary across different genres and film industries, such as Hollywood compared to Bollywood? Are there differences in how actors are cast for similar archetypes in different cultural or geographic contexts? Comparing casting practices can highlight cultural influences on the film industry. -
How Did Casting Trends For Different Archetypes Vary Across Time?
Can we notice shifts in casting process based on the change of actor traits correspondence to archetypes? For example, how did the image of hero or villian change over time? Can we match these changes, if present, to some events in the world? -
Are Certain Actors More Likely to Be Typecast into Specific Roles?
Are certain actor profiles more likely to be typecast into specific roles? Do actors with particular traits find themselves repeatedly cast in similar roles throughout their careers? Analyzing actors' careers might reveal tendencies toward typecasting. -
What Is the Composite Image of the "Ideal" Actor for a Given Character Archetype?
What traits define the "ideal" actor for a character archetype? Can we develop actors profiles based on common traits? Understanding these ideals sheds light on industry standards and expectations.
-
Wikipedia
We enhance our initial data with the Wikipedia API, resolving missing information. We collect actors' descriptions, including gender, ethnicity, and height, and movie descriptions. Movie descriptions are crucial for identifying characters' archetypes, as the main actors are typically highlighted. We later use these descriptions for archetype inference. -
Freebase
We used the dataset to extract structured information about actors that is not present in the original dataset. We do so by processing the full freebase dump (3 billion entries). -
IMDB
IMDB API is used to extract more unstructured information about movie plot and characters descriptions. -
Paper: Learning Latent Personas of Film Characters
The paper's authors explore the categorization of movie characters based on their personas. Drawing from their findings, they propose a dataset of character archetypes. We leverage this data to refine our classification solution and evaluate its performance.
-
Additional Data Gathering
The initial dataset is significantly affected by missing data. To enhance our archetype descriptions, we need more actors' characteristics. This step addresses missing information and adds actors' traits and movie descriptions to facilitate further analysis. -
Enriching Data Using Generative AI
As movie persona archetypes are derived from data, we develop a method that extracts archetypes for each main actor in a movie using generative AI and a common question inference method, such as fine-tuning, few-shot answering, or zero-shot answering. We use different models for that: Gemini-flash, ChatGPT-4o, ChatGPT-3.5, achieving high quality. -
Exploratory Data Analysis
This step involves exploring the data to address the proposed questions. It includes identifying relationships between features, tracking changes over time, and determining crucial archetypes for the project. It may also exclude or merge some archetypes. This is an essential part for our data story.
- Enriching Data. We will continue to gather more data about actors to develop more comprehensive archetypes. Furthermore, we will also collect data about movies to ensure that the AI has access to the most relevant context.
- Archetypes Inference. We use LM to predict and evaluate archetypes. This process is iterative as it is dependent with the data enrichment step.
- Archetypes EDA. We distribute the project questions among the team members and answer them based on the data we acquired.
Internal milestones are outlined in the "Proposed Timeline" section. Organisation of the team:
-
until P2
- Kirill Z — exploring LM solutions for predicting characters' archetypes and drafting a working example, showing that we can infer archetypes for new movies and actors.
- Andrew — working on data collection pipeline for actors features.
- Kirill A — working on data collection pipeline for movies features.
- Seva — explores the initial dataset, guides data gathering for Kirill A and Andrew.
- Alex — responsible for team coordination, research planning, this document drafting, and milestone result presentation.
-
until P3
- each member: EDA & answering the proposed questions.
- Kirill Z — improving archetypes inference pipeline.
- Kirill A, Andrew, Seva, Alex — improving data gathering pipelines and finalising data for EDA.
-
final contributions P3
- EDA: data story contribution
- Introduction — Seva
- Question 1 — Kirill Z
- Question 2 — Seva
- Question 3 — Seva
- Question 4 — Alex
- Question 5 — Andrew
- Question 6 — Kirill A
- Question 7 — Kirill Z
- Conclusion — Alex
- Kirill Z & Alex — improving archetypes inference pipeline, making final archetypes predictions for EDA.
- Alex — processing freebase and extracting information about actors and movies, extracting movie summaries from wikipedia
- Kirill Z & Alex — script for rendering data story to Jekyll website
- Andrew — IMDB data extraction: movies and actors
- EDA: data story contribution