TransMediaLint

This software will probably not tell you anything you don't already know.

It might help quantify things you already knew.

The style-guide produced by Trans Media Watch, like the Media Reference Guide released by GLAAD and the Guide For Journalists released by TGEU, is designed to help journalists avoid transphobic language rather than measure it. However, most of the rules from these guides can be implemented with simple regular expressions. This software crawls (UK) media sites, applies the rules to articles, and examines how the usage of transphobic terms, and who and what they are applied to, has changed. It was first conceived at the 2015 TransCode event held at PyCon UK.

  • There is a noticeable variation in the frequency of transphobic terms across UK media websites.
  • There is a noticeable variation in the kinds of transphobic terms across media websites.
  • There is a noticeable variation in the people and organisations mentioned in articles, depending on whether potentially transphobic terms are used.

Some Caveats

  • This software was not solicited by the three organizations mentioned.
  • The style-guides take pains to mention that some terms sometimes considered transphobic may be used by trans people about themselves.
  • Some terms may be used in direct quotations, or to discuss transphobia rather than engage in it.
  • The presence or absence of a term is not an absolute indication of transphobia or otherwise.
  • Website search features are not exhaustive, so the frequency of articles may be less reliable further back into the past.
  • You are STRONGLY DISCOURAGED from deploying this software in its current state on a public-facing webserver.
  • The warranty of this software according to its License is NONE AT ALL.

All this software does is to Count Things. Interpretation should be left to the reader.

For more thorough research from the trans community itself, beyond this automated approach, the Trans Safety Network is strongly recommended. Feature requests, pull requests or, better still, forks from the LGBT community are welcome.

Article Frequency

By default, articles containing the terms transgender, transexual and intersex are searched for. Articles are not counted if they do not contain one of the terms, regardless of their appearance in search results. They are rated red if potentially offensive, inaccurate or inappropriate terms are used, yellow if outdated or inappropriate medical terms are used, or green if no annotations are found.
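As a rough illustration of the rating scheme described above, the following sketch applies a few regular-expression rules to an article's text and derives a rating from the matches. The rule patterns, the severity assigned to each label, and the `annotate` and `rate` helpers are assumptions made for this example; they are not the project's actual rule set or code.

```python
import re

# Hypothetical rules: (label, severity, compiled pattern). The real rules are
# derived from the style-guides and will differ from these examples.
RULES = [
    ("sex-change", "red", re.compile(r"\bsex[- ]change\b", re.IGNORECASE)),
    ("transgenderism", "red", re.compile(r"\btransgenderism\b", re.IGNORECASE)),
    ("gender identity disorder", "yellow",
     re.compile(r"\bgender identity disorder\b", re.IGNORECASE)),
]

# The default search terms named in the text, spelled as they appear there.
SEARCH_TERMS = re.compile(r"\b(transgender|transexual|intersex)\b", re.IGNORECASE)


def annotate(text):
    """Return (label, severity, start, end) for every rule match in the text."""
    return [
        (label, severity, match.start(), match.end())
        for label, severity, pattern in RULES
        for match in pattern.finditer(text)
    ]


def rate(text):
    """Return 'red', 'yellow' or 'green', or None if no search term is present."""
    if not SEARCH_TERMS.search(text):
        return None  # the article is not counted at all
    severities = {severity for _, severity, _, _ in annotate(text)}
    if "red" in severities:
        return "red"
    if "yellow" in severities:
        return "yellow"
    return "green"
```

A red match takes precedence over a yellow one here, so an article is rated by its most severe annotation; that ordering is also an assumption of the sketch.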

While website search functions might be biased to more recent articles, both "The Times" and "The Telegraph" exhibit a large increase in article frequency:

[Charts: rated articles from The Times; rated articles from The Telegraph]

The number of articles from "The Daily Mail" that match the search criteria is consistently high.

[Charts: rated articles from The Sun; rated articles from The Daily Mail]

Articles from magazines like "The Spectator", and online analogues such as "Spiked" or "The Critic" exhibit somewhat more frequent annotations from the style-guides:

[Charts: rated articles from The Spectator; rated articles from Spiked; rated articles from The Critic]

Although The Guardian receives fewer annotations, it should be noted that this approach will not detect incidents such as its selective editing of an interview with Judith Butler, or its focus during its interview with "Handmaid's Tale" author Margaret Atwood.

[Chart: rated articles from The Guardian]

It is not the intention of this software to circumvent pay-walls. Analysis of "The Times" and "The Telegraph" is performed via legitimate credentials.

Annotation Label Frequency

It is a simple matter to plot the monthly count of each type of annotation. The keys on the right of the charts list terms in order of decreasing frequency.
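The counting behind such a chart can be sketched in a few lines. This is only an illustration: the input format, a list of `(published_date, label)` pairs, and both function names are assumptions rather than the project's data model.

```python
from collections import Counter, defaultdict
from datetime import date


def monthly_label_counts(annotations):
    """Count annotations per (year, month) and label.

    `annotations` is a list of (published_date, label) pairs.
    Returns {(year, month): Counter({label: count})}.
    """
    counts = defaultdict(Counter)
    for published, label in annotations:
        counts[(published.year, published.month)][label] += 1
    return dict(counts)


def labels_by_frequency(annotations):
    """Return labels in order of decreasing total frequency, like the chart keys."""
    totals = Counter(label for _, label in annotations)
    return [label for label, _ in totals.most_common()]


sample = [
    (date(2022, 1, 5), "sex-change"),
    (date(2022, 1, 20), "sex-change"),
    (date(2022, 1, 21), "transgenderism"),
    (date(2022, 2, 1), "sex-change"),
]
print(monthly_label_counts(sample))
print(labels_by_frequency(sample))
```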

"The Times" and "The Telegraph" both show a recent fondness for the loaded but meaningless terms "biologically male" and "biologically female".

[Charts: annotation labels from The Times; annotation labels from The Telegraph]

Tabloids like "The Sun" and "The Daily Mail" have broadly similar results. The most common annotation, "sex-change", is often considered sensationalist. "The Daily Mail" site features large numbers of syndicated articles from the US, which may explain the prominence of the term "bathroom bill". However, the simple approach of counting annotations cannot take into account the prominence given to articles, or their appearance on the front page.

[Charts: annotation labels from The Sun; annotation labels from The Daily Mail]

Magazines such as "The Spectator" and online equivalents have a substantially different focus. In all cases, the term "transgenderism", widely considered pejorative, is the most frequent.

[Charts: annotation labels from The Spectator; annotation labels from Spiked; annotation labels from The Critic]

"Transgenderism" is also more common than "sex-change" in "The Guardian", though by far the most common term is "passing", which of course has a common everyday usage as well as its more controversial one.

[Chart: annotation labels from The Guardian]

Named Entity Recognition

The Python NLP library spaCy can perform named entity recognition (NER) to extract people, places, organisations, etc. from texts. The most commonly occurring entities in articles can be plotted according to the article ratings.
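A minimal sketch of this step follows. It assumes spaCy and its small English model `en_core_web_sm` are installed. The choice of entity types, the function names, and the decision to count each entity once per article are all assumptions of the example, not a description of the project's code.

```python
from collections import Counter

# Entity types kept for plotting; this selection is an assumption.
ENTITY_LABELS = {"PERSON", "ORG", "GPE"}


def extract_entities(texts, model="en_core_web_sm"):
    """Yield a list of (text, label) entity pairs for each input text.

    Requires spaCy and the named model to be installed.
    """
    import spacy  # imported lazily so the counting helper works without it

    nlp = spacy.load(model)
    for doc in nlp.pipe(texts):
        yield [(ent.text, ent.label_) for ent in doc.ents
               if ent.label_ in ENTITY_LABELS]


def top_entities_by_rating(rated_entities, n=10):
    """Map each rating to its n most common entity names.

    `rated_entities` is an iterable of (rating, [(text, label), ...]) pairs.
    """
    counters = {}
    for rating, entities in rated_entities:
        counter = counters.setdefault(rating, Counter())
        # Count each entity once per article so long articles do not dominate.
        counter.update({text for text, _ in entities})
    return {rating: c.most_common(n) for rating, c in counters.items()}
```

The output of `extract_entities` can be paired with each article's rating and passed to `top_entities_by_rating` to obtain the per-rating rankings that the charts display.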

For "The Times", athlete Caster Semenya has higher prominence in annotated articles compared to un-annotated ones. Penny Mordaunt's fleeting support of self-id during the 2022 Conservative Party leadership election also makes more frequent appearances in annotated articles.

[Charts: named entities from Times articles rated green; named entities from Times articles rated red; named entities from Times articles rated red (detail)]

The pattern is repeated in "The Telegraph". Named transgender individuals like Emily Bridges feature more heavily in annotated articles.

[Charts: named entities from Telegraph articles rated green; named entities from Telegraph articles rated red]

For "The Spectator", in articles rated green, the most common entities are political parties, countries, All-Party Parliamentary Groups (APPG) and Jordan Peterson. The red-rated entities start with the Harry Miller case. Given that this refers to a court case concerning transphobic tweets, it is not surprising that it appears in annotated articles, as does the similar Margaret Nelson case. As for the other entities, "Lucas" may refer to Matt or Lucia Lucas, both LGBT individuals, and former Green Party LGBT spokesperson Aimee Challenor is included.

[Charts: named entities from Spectator articles rated green; named entities from Spectator articles rated red]

For "Spiked", the most common entities are contributors, but for entities in annotated articles, Stonewall rises in prominence.

[Charts: named entities from Spiked articles rated green; named entities from Spiked articles rated red]

The same happens for "The Critic".

[Charts: named entities from The Critic articles rated green; named entities from The Critic articles rated red]

Getting Started

This software is not yet ready for non-technical researchers. To get started, first install Docker and Docker Compose. Then clone the repo, build the container and initialize the database:

git clone https://github.com/augeas/TransMediaLint.git
cd TransMediaLint
docker-compose build
./init_db

You can then start the servers with:

docker-compose up

It will take a little while for the various crawlers to be deployed to Scrapyd. At the moment, it is easiest to run crawlers via curl:

curl http://localhost:6800/schedule.json -d project=the_sun -d spider=search

You can view the Scrapyd logs at http://localhost:6800. Current crawler projects include:

  • spiked
  • the_critic
  • the_daily_mail
  • the_guardian
  • the_spectator
  • the_sun
  • the_times
  • unherd
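For scheduling several crawls in one go, the curl call can be reproduced with the Python standard library. The sketch below assumes that every project exposes a spider named `search`, as in the `the_sun` example; the function names are made up for illustration.

```python
import json
import urllib.parse
import urllib.request

SCRAPYD_URL = "http://localhost:6800"  # as used in the curl example


def build_schedule_request(project, spider="search", base_url=SCRAPYD_URL):
    """Build the POST request equivalent to the curl command."""
    data = urllib.parse.urlencode({"project": project, "spider": spider})
    return urllib.request.Request(
        base_url + "/schedule.json", data=data.encode("ascii"), method="POST"
    )


def schedule(project, spider="search", base_url=SCRAPYD_URL):
    """Schedule a crawl and return Scrapyd's decoded JSON response."""
    request = build_schedule_request(project, spider, base_url)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With the servers running, calling `schedule("spiked")` should queue the same kind of job as the curl command does for `the_sun`.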

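It will take a long time to crawl and annotate "The Daily Mail". Crawling "The Guardian" requires an API key, and crawling "The Times" and "The Telegraph" requires subscriber credentials.

Supply these as environment variables when starting the servers: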

GUARDIAN_KEY=YOUR_API_KEY \
TIMES_USERNAME=YOUR_EMAIL \
TIMES_PASSWORD=YOUR_PASSWORD \
TELEGRAPH_USERNAME=YOUR_EMAIL \
TELEGRAPH_PASSWORD=YOUR_PASSWORD \
docker-compose up

The current state of the API created with the Django Rest Framework can be explored at: http://localhost:8000/api/

The various charts can be viewed at:

  • http://localhost:8000/charts/rated_articles?source=SLUG
  • http://localhost:8000/charts/annotations?source=SLUG
  • http://localhost:8000/charts/source_entities?source=SLUG
  • http://localhost:8000/charts/rated_entities?source=SLUG&rating=red

where SLUG is one of:

  • spiked
  • the-critic
  • the-daily-mail
  • the-guardian
  • the-spectator
  • the-sun
  • the-times
  • the-telegraph
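The chart URLs follow a regular pattern, so they can be generated rather than typed. The helper below is a small sketch; `chart_url` is a hypothetical name, and it assumes `rating` is only meaningful for the `rated_entities` chart, as the list above suggests.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:8000"
CHARTS = ("rated_articles", "annotations", "source_entities", "rated_entities")


def chart_url(chart, source, rating=None, base_url=BASE_URL):
    """Return the URL of a chart for the given source slug."""
    if chart not in CHARTS:
        raise ValueError(f"unknown chart: {chart!r}")
    params = {"source": source}
    if rating is not None:
        params["rating"] = rating
    return f"{base_url}/charts/{chart}?{urlencode(params)}"


for slug in ("the-sun", "the-times"):
    print(chart_url("rated_entities", slug, rating="red"))
```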

TO DO

  • a front-end (probably Angular or React) to consume the API
  • more web-crawlers for more sources
  • more charts
  • a search API to put Solr to good use
  • a VirtualBox appliance for non-technical users
  • .csv downloads
