This software will probably not tell you anything you don't already know.
It might help quantify things you already knew.
The style guide produced by Trans Media Watch, like the Media Reference Guide released by GLAAD and the Guide For Journalists released by TGEU, is designed to help journalists avoid transphobic language rather than to measure it. However, most of the rules in these guides can be implemented with simple regular expressions. This software crawls UK media sites, applies the rules to articles, and examines how the use of transphobic terms, and the people and subjects they are applied to, has changed over time. It was first conceived at the 2015 TransCode event held at PyCon UK.
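To illustrate how a style-guide rule can become a regular expression, the sketch below flags a few of the terms discussed later in this README. The rule labels, the patterns and the `annotate` function are hypothetical examples, not TransMediaLint's actual rule set.

```python
import re

# Hypothetical rules for illustration only, not TransMediaLint's actual rule set.
# Each label maps to a case-insensitive pattern anchored on word boundaries.
RULES = {
    "sex-change": re.compile(r"\bsex[\s-]?change\b", re.IGNORECASE),
    "transgenderism": re.compile(r"\btransgenderism\b", re.IGNORECASE),
    "biologically male": re.compile(r"\bbiologically\s+male\b", re.IGNORECASE),
}


def annotate(text):
    """Return (label, start, end) for every rule match, ordered by position."""
    hits = []
    for label, pattern in RULES.items():
        for match in pattern.finditer(text):
            hits.append((label, match.start(), match.end()))
    return sorted(hits, key=lambda hit: hit[1])


if __name__ == "__main__":
    sample = "The report on transgenderism described a sex change operation."
    for label, start, end in annotate(sample):
        print(label, "->", sample[start:end])
```

Word boundaries keep "transgender" from matching the "transgenderism" rule, and the optional `[\s-]?` accepts "sex change", "sex-change" and "sexchange" alike.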
- There is a noticeable variation in the frequency of transphobic terms across UK media websites.
- There is a noticeable variation in the kinds of transphobic terms across UK media websites.
- There is a noticeable variation in the people and organisations mentioned in articles, depending on whether potentially transphobic terms are used.
- This software was not solicited by the three organisations mentioned.
- The style-guides take pains to mention that some terms sometimes considered transphobic may be used by trans people about themselves.
- Some terms may be used in direct quotations, or to discuss transphobia rather than engage in it.
- The presence or absence of a term is not an absolute indication of transphobia or otherwise.
- Website search features are not exhaustive, so article counts may be less reliable further back in time.
- You are STRONGLY DISCOURAGED from deploying this software in its current state on a public-facing webserver.
- The warranty of this software according to its License is NONE AT ALL.
All this software does is to Count Things. Interpretation should be left to the reader.
For more thorough research from the trans community itself, beyond this automated approach, you are strongly recommended to consult the Trans Safety Network. Feature requests and pull requests, or better still, forks from the LGBT community, are welcome.
By default, articles containing the terms transgender, transexual and intersex are searched for. Articles are not counted unless they contain one of these terms, regardless of their appearance in search results. They are rated red if potentially offensive, inaccurate or inappropriate terms are used, yellow if outdated or inappropriate medical terms are used, or green if no annotations are found.
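The counting and rating rules above can be sketched as follows. This is a minimal illustration rather than the project's implementation: `is_counted`, `rate` and the category names `"offensive"` and `"medical"` are hypothetical, and the assumption that red outranks yellow when both kinds of term appear is not stated in the original rules.

```python
# Hypothetical names: TransMediaLint's real search and rating code may differ.
SEARCH_TERMS = ("transgender", "transexual", "intersex")

# Assumed annotation categories: "offensive" for potentially offensive,
# inaccurate or inappropriate terms; "medical" for outdated or inappropriate
# medical terms.
CATEGORIES = {"offensive", "medical"}


def is_counted(text):
    """An article counts only if it contains at least one search term."""
    lowered = text.lower()
    # Plain substring test, so "transgenderism" also counts as "transgender".
    return any(term in lowered for term in SEARCH_TERMS)


def rate(categories):
    """Rate an article from the categories of its annotations.

    Assumes red outranks yellow when both kinds of term are present.
    """
    found = set(categories)
    unknown = found - CATEGORIES
    if unknown:
        raise ValueError(f"unknown annotation categories: {sorted(unknown)}")
    if "offensive" in found:
        return "red"
    if "medical" in found:
        return "yellow"
    return "green"  # no annotations at all


if __name__ == "__main__":
    print(is_counted("Intersex athletes at the Olympics"))
    print(rate(["medical", "offensive"]))
```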
While website search functions might be biased towards more recent articles, both "The Times" and "The Telegraph" exhibit a large increase in article frequency.
The number of articles from "The Daily Mail" that match the search criteria is consistently high.
Articles from magazines like "The Spectator", and online analogues such as "Spiked" or "The Critic", exhibit somewhat more frequent annotations from the style guides.
Although The Guardian receives fewer annotations, this approach cannot detect incidents such as its selective editing of an interview with Judith Butler, or the focus of its interview with "The Handmaid's Tale" author Margaret Atwood.
It is not the intention of this software to circumvent paywalls. Analysis of "The Times" and "The Telegraph" is performed using legitimate credentials.
It is a simple matter to plot the monthly count of each type of annotation. The keys on the right of the charts list terms in order of decreasing frequency.
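A minimal sketch of the monthly tally behind such charts, using only the standard library. It assumes each annotation has already been reduced to a hypothetical `(publication date, term)` pair; the project itself may aggregate in the database or with another library.

```python
from collections import Counter, defaultdict
from datetime import date


def monthly_counts(annotations):
    """Tally annotations per term and per calendar month.

    annotations: iterable of (published: date, term: str) pairs, one per
    annotation found (a hypothetical input shape).
    Returns (counts, order): counts maps term -> {"YYYY-MM": n}, and order
    lists terms by decreasing total frequency, like the chart keys.
    """
    per_term = defaultdict(Counter)
    totals = Counter()
    for published, term in annotations:
        per_term[term][published.strftime("%Y-%m")] += 1
        totals[term] += 1
    order = [term for term, _ in totals.most_common()]
    counts = {term: dict(per_term[term]) for term in order}
    return counts, order


if __name__ == "__main__":
    # Made-up sample input, not real data.
    sample = [
        (date(2022, 7, 1), "sex-change"),
        (date(2022, 7, 15), "transgenderism"),
        (date(2022, 8, 2), "transgenderism"),
    ]
    counts, order = monthly_counts(sample)
    print(order)
    print(counts)
```

Each entry of `counts` is one line on a chart, and `order` gives the key order from most to least frequent.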
"The Times" and "The Telegraph" both show a recent fondness for the loaded but meaningless terms "biologically male or female".
Tabloids like "The Sun" and "The Daily Mail" have broadly similar results. The most common annotation, "sex-change", is often considered sensationalist. "The Daily Mail" site features large numbers of syndicated articles from the US, which may explain the prominence of the term "bathroom bill". However, simply counting annotations cannot take into account the prominence given to articles, such as their appearance on the front page.
Magazines such as "The Spectator" and online equivalents have a substantially different focus. In all cases, the term "transgenderism", widely considered pejorative, is the most frequent.
"Transgenderism" is also more common than "sex-change" in The Guardian, though by far the most common term is "passing", which of course has a common everyday meaning as well as its more controversial one.
The Python NLP library spaCy can perform named entity recognition (NER) to extract people, places, organisations, etc. from texts. The most commonly occurring entities in articles can then be plotted according to the article ratings.
For "The Times", athlete Caster Semenya has higher prominence in annotated articles than in un-annotated ones. Penny Mordaunt, whose fleeting support for self-ID featured in the 2022 Conservative Party leadership election, also appears more frequently in annotated articles.
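The sketch below splits this into two steps with hypothetical helper names: `entities` pulls `(text, label)` pairs from a spaCy `Doc` through its `ents` attribute, and `top_entities_by_rating` tallies them separately for each rating. The label filter and the choice to count each entity at most once per article are assumptions, not necessarily what TransMediaLint does.

```python
from collections import Counter, defaultdict

# Assumed filter: spaCy's English models emit labels such as PERSON, ORG,
# GPE (countries, cities) and NORP (nationalities, political groups).
KEEP_LABELS = {"PERSON", "ORG", "GPE", "NORP", "EVENT"}


def entities(doc):
    """Return (text, label) pairs from a spaCy Doc, keeping selected labels."""
    return [(ent.text, ent.label_) for ent in doc.ents if ent.label_ in KEEP_LABELS]


def top_entities_by_rating(articles, n=10):
    """Rank entities separately for each article rating.

    articles: iterable of (rating, [(text, label), ...]) pairs, one per article.
    Each entity is counted at most once per article, so an article that
    repeats a name many times does not dominate the ranking.
    """
    counts = defaultdict(Counter)
    for rating, ents in articles:
        counts[rating].update({text for text, _ in ents})
    return {rating: counter.most_common(n) for rating, counter in counts.items()}
```

With spaCy and an English model installed, `nlp = spacy.load("en_core_web_sm")` followed by `entities(nlp(article_text))` would produce the entity list for one article.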
The pattern is repeated in "The Telegraph". Named transgender individuals like Emily Bridges feature more heavily in annotated articles.
For "The Spectator", in articles rated green, the most common entities are political parties, countries, All-Party Parliamentary Groups (APPG) and Jordan Peterson. The red-rated entities are headed by the Harry Miller case; since this refers to a court case concerning transphobic tweets, it is not surprising that it appears in annotated articles, as does the similar Margaret Nelson case. Among the other entities, "Lucas" may refer to Matt Lucas or Lucia Lucas, both LGBT individuals, and former Green Party LGBT spokesperson Aimee Challenor also appears.
For "Spiked", the most common entities are contributors, but in annotated articles Stonewall rises in prominence.
The same happens for "The Critic".
This software is not yet ready for non-technical researchers. To get started, first install Docker and Docker Compose. Then clone the repo, build the container and initialize the database:
git clone https://github.com/augeas/TransMediaLint.git
cd TransMediaLint
docker-compose build
./init_db
You can then start the servers with:
docker-compose up
It will take a little while for the various crawlers to be deployed to Scrapyd. For now, the easiest way to run a crawler is with curl:
curl http://localhost:6800/schedule.json -d project=the_sun -d spider=search
You can view the Scrapyd logs at http://localhost:6800. Current crawler projects include:
spiked
the_critic
the_daily_mail
the_guardian
the_spectator
the_sun
the_times
unherd
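If you would rather script this than repeat the curl command for each project, the sketch below builds the same form-encoded POST to Scrapyd's `schedule.json` endpoint with Python's standard library. It assumes every project exposes a spider called `search`, as in the curl example; `schedule_request` and `schedule_all` are hypothetical helper names.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

SCRAPYD = "http://localhost:6800"
PROJECTS = ["spiked", "the_critic", "the_daily_mail", "the_guardian",
            "the_spectator", "the_sun", "the_times", "unherd"]


def schedule_request(project, spider="search", base=SCRAPYD):
    """Build the same POST that `curl ... -d project=... -d spider=...` sends."""
    body = urlencode({"project": project, "spider": spider}).encode("ascii")
    return Request(f"{base}/schedule.json", data=body, method="POST")


def schedule_all(projects=PROJECTS):
    """POST a schedule request for each project and return the raw replies."""
    replies = {}
    for project in projects:
        with urlopen(schedule_request(project)) as response:
            replies[project] = response.read().decode("utf-8")
    return replies
```

Passing a shorter list, for example `schedule_all(["spiked", "the_sun"])`, schedules only those crawlers.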
It will take a long time to crawl and annotate "The Daily Mail". Crawling "The Guardian" requires an API key, and "The Times" and "The Telegraph" require login credentials. Supply them as environment variables when starting the servers:
GUARDIAN_KEY=YOUR_API_KEY \
TIMES_USERNAME=YOUR_EMAIL \
TIMES_PASSWORD=YOUR_PASSWORD \
TELEGRAPH_USERNAME=YOUR_EMAIL \
TELEGRAPH_PASSWORD=YOUR_PASSWORD \
docker-compose up
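Before starting the servers, a quick pre-flight check can report which credentialed crawlers would be unable to log in. The variable names come from the command above; the mapping from project to variables, including the `the_telegraph` project name and the `missing_credentials` helper, is a hypothetical assumption.

```python
import os

# Hypothetical mapping from crawler project to the environment variables it
# needs; the variable names match the command above, the mapping is assumed.
REQUIRED_ENV = {
    "the_guardian": ("GUARDIAN_KEY",),
    "the_times": ("TIMES_USERNAME", "TIMES_PASSWORD"),
    "the_telegraph": ("TELEGRAPH_USERNAME", "TELEGRAPH_PASSWORD"),
}


def missing_credentials(environ=os.environ):
    """Return {project: [missing variable names]} for projects that cannot run."""
    missing = {}
    for project, names in REQUIRED_ENV.items():
        # Treat unset and empty variables alike.
        absent = [name for name in names if not environ.get(name)]
        if absent:
            missing[project] = absent
    return missing


if __name__ == "__main__":
    for project, names in missing_credentials().items():
        print(f"{project}: missing {', '.join(names)}")
```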
The current state of the API, built with Django REST framework, can be explored at: http://localhost:8000/api/
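To explore the API from a script instead of a browser, the sketch below asks Django REST framework for JSON through the `Accept` header. It assumes the root at `/api/` is a router listing that maps endpoint names to URLs; `fetch_json` and `endpoint_names` are hypothetical helpers, and any endpoint names used with them are illustrative only.

```python
import json
from urllib.request import Request, urlopen

API_ROOT = "http://localhost:8000/api/"


def fetch_json(url=API_ROOT):
    """GET a URL, asking Django REST framework for JSON instead of HTML."""
    request = Request(url, headers={"Accept": "application/json"})
    with urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def endpoint_names(root):
    """Assuming the root is a router listing of name -> URL, return the names."""
    return sorted(name for name, link in root.items() if isinstance(link, str))
```

With the servers running, `endpoint_names(fetch_json())` would list the available endpoints if that assumption holds.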
The various charts can be viewed at:
http://localhost:8000/charts/rated_articles?source=SLUG
http://localhost:8000/charts/annotations?source=SLUG
http://localhost:8000/charts/source_entities?source=SLUG
http://localhost:8000/charts/rated_entities?source=SLUG&rating=red
where SLUG is one of:
spiked
the-critic
the-daily-mail
the-guardian
the-spectator
the-sun
the-times
the-telegraph
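Because the chart slugs use hyphens while the Scrapyd project names use underscores, a chart URL is easy to mistype. A small hypothetical helper, `chart_url`, can build and validate them; it assumes `rated_entities` also accepts `yellow` and `green` ratings, although only `rating=red` appears in the example URL above.

```python
from urllib.parse import urlencode

BASE = "http://localhost:8000/charts"
SLUGS = {"spiked", "the-critic", "the-daily-mail", "the-guardian",
         "the-spectator", "the-sun", "the-times", "the-telegraph"}
CHARTS = {"rated_articles", "annotations", "source_entities", "rated_entities"}
RATINGS = {"red", "yellow", "green"}  # yellow and green are assumed to work


def chart_url(chart, slug, rating=None):
    """Build a chart URL, rejecting unknown charts, slugs and ratings."""
    if chart not in CHARTS:
        raise ValueError(f"unknown chart: {chart}")
    if slug not in SLUGS:
        raise ValueError(f"unknown source slug: {slug}")
    params = {"source": slug}
    if chart == "rated_entities":
        if rating not in RATINGS:
            raise ValueError("rated_entities needs rating='red', 'yellow' or 'green'")
        params["rating"] = rating
    return f"{BASE}/{chart}?{urlencode(params)}"


if __name__ == "__main__":
    print(chart_url("annotations", "the-sun"))
    print(chart_url("rated_entities", "the-times", rating="red"))
```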
Future work includes:
- a (probably) Angular/React JS front-end to consume the API
- more web-crawlers for more sources
- more charts
- a search API to put Solr to good use
- preparing a VirtualBox appliance for non-technical users
- CSV downloads