Merge remote-tracking branch 'upstream/master'
GitHub Actions Bot committed Mar 15, 2024
2 parents 700f70c + 737398e commit 53bc344
Showing 13 changed files with 90 additions and 18 deletions.
8 changes: 8 additions & 0 deletions bbc.co.uk.txt
100644 → 100755
@@ -90,3 +90,11 @@ test_url: http://feeds.bbci.co.uk/news/rss.xml
test_url: http://feeds.bbci.co.uk/sport/0/football/rss.xml?edition=int
# video entry
test_url: http://www.bbc.co.uk/news/world-asia-22056933

# bbcthree long articles / gel (?) articles?
title: //h1[contains(concat(' ',normalize-space(@class),' '),' LongArticle-headline ')]
body: //div[contains(concat(' ',normalize-space(@class),' '),' LongArticle-body ')]
author: //span[contains(concat(' ',normalize-space(@class),' '),' Info-authorName ')]
date: //time[contains(concat(' ',normalize-space(@class),' '),' Info-timestamp ')]
test_url: https://www.bbc.co.uk/bbcthree/article/991ea351-1e67-46dc-824d-a13033526ca6
test_contains: In essence, Professor Rippon believes that our 'male' and 'female' behaviour is not innate but learned as we develop.
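The bbc.co.uk.txt rules above select elements with `contains(concat(' ',normalize-space(@class),' '),' LongArticle-headline ')` rather than `contains(@class, 'LongArticle-headline')`. Padding the normalized class attribute with spaces and searching for the space-padded name matches a whole class token only, so a class such as `LongArticle-headline-wrapper` is not picked up. The Python sketch below reproduces that string logic on a single attribute value without an XPath engine; the function names are illustrative and not part of any site-config tooling.

```python
def normalize_space(value: str) -> str:
    """Approximate XPath normalize-space(): trim, then collapse whitespace runs to one space.

    Python's str.split() also splits on other Unicode whitespace, a slight superset
    of the four characters XPath 1.0 treats as whitespace.
    """
    return " ".join(value.split())


def has_class_token(class_attr: str, name: str) -> bool:
    """Mirror contains(concat(' ', normalize-space(@class), ' '), ' name ')."""
    padded = " " + normalize_space(class_attr) + " "
    return (" " + name + " ") in padded


def naive_contains(class_attr: str, name: str) -> bool:
    """Mirror contains(@class, 'name'): a plain substring test."""
    return name in class_attr


if __name__ == "__main__":
    attr = "LongArticle-headline-wrapper gel-trafalgar"
    print(has_class_token(attr, "LongArticle-headline"))  # False: no whole-token match
    print(naive_contains(attr, "LongArticle-headline"))   # True: substring match
```

The padded form costs a little readability in the config file but avoids false matches on longer class names that share a prefix.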
2 changes: 1 addition & 1 deletion bbc.com.txt
100644 → 100755
@@ -1,5 +1,5 @@
body: //article[contains(@class, "-ArticleWrapper")]
body: (//div[contains(@class, 'article__body-content')])[1]
body: //article[@class='article__body'] | (//div[contains(@class, 'article__body-content')])[1]
body: (//div[@id="story-body"])[1]

# for video entries
13 changes: 0 additions & 13 deletions blog.trendmicro.com.txt

This file was deleted.

6 changes: 6 additions & 0 deletions cleafy.com.txt
@@ -0,0 +1,6 @@
body: //div[contains(@class, 'full blog')]
title: //h1[@class='clabtitle']

strip_id_or_class: author-block

test_url: https://www.cleafy.com/cleafy-labs/on-device-fraud-on-the-rise-exposing-a-recent-copybara-fraud-campaign
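cleafy.com.txt removes the author box with `strip_id_or_class: author-block`. Assuming the directive removes any element whose `id` or `class` attribute contains the given string as a substring (not as a whole class token), its effect on a parsed tree can be sketched with Python's standard `xml.etree.ElementTree`. The helper name and the sample markup are hypothetical; note that `author-block-2` and `author-blocks` are removed too, which is the trade-off of a substring match.

```python
import xml.etree.ElementTree as ET


def strip_id_or_class(parent: ET.Element, needle: str) -> int:
    """Remove descendants whose id or class contains `needle`; return how many were removed.

    Matched subtrees are dropped whole, so their own descendants are not counted.
    """
    removed = 0
    for child in list(parent):  # copy: parent is mutated while iterating
        if needle in child.get("class", "") or needle in child.get("id", ""):
            parent.remove(child)
            removed += 1
        else:
            removed += strip_id_or_class(child, needle)
    return removed


if __name__ == "__main__":
    # Hypothetical markup for illustration only.
    doc = ET.fromstring(
        "<div>"
        "<p>Report text</p>"
        "<div class='author-block'>About the author</div>"
        "<div id='author-block-2'>Also removed</div>"
        "<section><span class='x author-blocks'>Nested match</span></section>"
        "</div>"
    )
    print(strip_id_or_class(doc, "author-block"))  # 3
    print(ET.tostring(doc, encoding="unicode"))
```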
2 changes: 2 additions & 0 deletions digital-photography-school.com.txt
@@ -5,4 +5,6 @@ body: //section[@class='content']
strip: //div[@class='post-meta']
strip: //div[contains(@class, 'topics-list')]

replace_string(wp-block-heading):

test_url: http://www.digital-photography-school.com/10-ways-to-develop-yourself-photographically
11 changes: 11 additions & 0 deletions fmhy.net.txt
@@ -0,0 +1,11 @@
body: //main[@class='main']

strip: //h1[1]

prune: no

# for wallabag
wrap_in(blockquote): //div[@class='tip custom-block']
wrap_in(blockquote): //div[@class='warning custom-block']

test_url: https://fmhy.net/unsafesites
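fmhy.net.txt uses `wrap_in(blockquote):` so that the tip and warning boxes survive as quotations in wallabag. A minimal sketch of that transformation with Python's `xml.etree.ElementTree`, assuming the directive wraps each matched element in a new element of the named tag at the same position. `wrap_in` here is a hypothetical helper, and the path uses ElementTree's limited XPath subset rather than full XPath 1.0.

```python
import xml.etree.ElementTree as ET


def wrap_in(root: ET.Element, path: str, tag: str) -> int:
    """Wrap every element matched by `path` in a new <tag> element; return the count."""
    # ElementTree has no parent pointers, so build a child -> parent map first.
    parents = {child: parent for parent in root.iter() for child in parent}
    wrapped = 0
    for element in root.findall(path):
        parent = parents.get(element)
        if parent is None:
            continue  # defensive: an element without a parent cannot be wrapped in place
        index = list(parent).index(element)
        wrapper = ET.Element(tag)
        # Keep text that followed the element in place by moving its tail to the wrapper.
        wrapper.tail, element.tail = element.tail, None
        parent.remove(element)
        wrapper.append(element)
        parent.insert(index, wrapper)
        wrapped += 1
    return wrapped


if __name__ == "__main__":
    page = ET.fromstring(
        "<main>"
        "<div class='tip custom-block'>Tip text</div>"
        "<p>Intro</p>"
        "<div class='warning custom-block'>Warning text</div>"
        "</main>"
    )
    for cls in ("tip custom-block", "warning custom-block"):
        wrap_in(page, f".//div[@class='{cls}']", "blockquote")
    print(ET.tostring(page, encoding="unicode"))
```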
4 changes: 4 additions & 0 deletions labs.ripe.net.txt
@@ -0,0 +1,4 @@
body: //article[@class="prose"]

test_url: https://labs.ripe.net/author/kathleen_moriarty/the-llm-misinformation-problem-i-was-not-expecting/
test_contains: The problems around vetting AI results won't be going away anytime soon.
3 changes: 3 additions & 0 deletions lemonde.fr.txt
@@ -43,6 +43,7 @@ strip: //a[contains(@class, 'Header__offer')]

# Remove gift modal
strip_id_or_class: article__gift-modal
strip_id_or_class: reuse__content

# Remove the insane "conjugaison.lemonde.fr" links:
find_string: <a target='_blank' onclick='return false;' class='lien_interne conjug'
@@ -77,3 +78,5 @@ test_url: http://www.lemonde.fr/big-browser/article/2017/10/27/assassinat-de-ken
test_url: https://www.lemonde.fr/pixels/article/2018/07/14/douze-jeux-video-pour-s-amuser-a-plusieurs_5331269_4408996.html
test_url: https://www.lemonde.fr/sante/video/2016/04/07/diabete-pourquoi-une-telle-progression-de-l-epidemie_4898147_1651302.html
test_url: https://www.lemonde.fr/pixels/article/2023/12/20/manettes-de-playstation-sony-condamne-a-13-5-millions-d-euros-d-amende-pour-des-pratiques-anticoncurrentielles_6206941_4408996.html
test_url: https://www.lemonde.fr/planete/article/2024/02/29/contre-le-frelon-asiatique-les-apiculteurs-appellent-a-la-mobilisation-generale_6219215_3244.html

10 changes: 10 additions & 0 deletions medscape.com.txt
@@ -0,0 +1,10 @@
body: //div[contains(concat(' ',normalize-space(@class),' '),' article__main-content ')]
title: //h1[@class='article__title']
author: //p[@class='meta__author-name']
date: //p[@class='meta__date']

prune: false
tidy: true

test_url: https://www.medscape.com/viewarticle/new-evidence-suggests-long-covid-could-be-brain-injury-2024a10002v0?form=fpf
test_contains: Brain fog is one of the most common, persistent complaints in patients with long COVID.
11 changes: 7 additions & 4 deletions msn.com.txt
@@ -1,7 +1,10 @@
http_header(user-agent): curl/7.54.0
http_header(user-agent): Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)

body: //div[contains(concat(' ',normalize-space(@class),' '),' articlecontent ')]/div[contains(concat(' ',normalize-space(@class),' '),' richtext ')]
body: //*[@class='article-body']

strip: //div[contains(concat(' ',normalize-space(@class),' '),' articlecontent ')]/div[contains(concat(' ',normalize-space(@class),' '),' richtext ')]/div[not(contains(concat(' ',normalize-space(@class),' '),' e-image '))]
strip_id_or_class: continue-reading-slot

test_url: https://www.msn.com/en-us/news/technology/followers-are-a-valuable-currency-who-should-own-them/ar-AAXfrUU
prune: no
tidy: no

test_url: https://www.msn.com/en-us/news/us/mass-casualty-incident-at-us-mexico-border/ar-BB1jice0
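msn.com.txt lists two `body:` expressions. Assuming repeated `body` lines are tried in file order and the first one that selects anything wins, the fallback can be sketched in Python with `xml.etree.ElementTree`. The paths are simplified stand-ins for the XPath in the file (ElementTree cannot evaluate `concat()` or `normalize-space()`, so they compare the whole `class` value), and `select_body` is a hypothetical name.

```python
import xml.etree.ElementTree as ET
from typing import List, Sequence


def select_body(root: ET.Element, paths: Sequence[str]) -> List[ET.Element]:
    """Return the matches of the first path that selects at least one element."""
    for path in paths:
        matches = root.findall(path)
        if matches:
            return matches
    return []  # no rule matched; a real extractor might fall back to heuristics


# Simplified stand-ins for the two body rules, in file order.
BODY_PATHS = [
    ".//div[@class='articlecontent']/div[@class='richtext']",
    ".//*[@class='article-body']",
]

if __name__ == "__main__":
    page = ET.fromstring(
        "<html><body><div class='article-body'><p>Story</p></div></body></html>"
    )
    body = select_body(page, BODY_PATHS)
    print([el.get("class") for el in body])  # ['article-body']
```

Ordering matters under this assumption: the more specific rule goes first, and the broader `article-body` rule only applies when it finds nothing.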
10 changes: 10 additions & 0 deletions publications.aap.org.txt
@@ -0,0 +1,10 @@
title: //h1
body: //div[contains(concat(' ', normalize-space(@class), ' '), ' article-body ')]
author: //div[contains(concat(' ', normalize-space(@class), ' '), ' al-authors-list ')]
date: //span[contains(concat(' ', normalize-space(@class), ' '), ' article-date ')]

prune: false
tidy: false

test_url: https://publications.aap.org/pediatrics/article/doi/10.1542/peds.2023-062570/196606/Postacute-Sequelae-of-SARS-CoV-2-in-Children?autologincheck=redirected
test_contains: The coronavirus disease 2019 (COVID-19) pandemic has caused unprecedented devastating medical, social, and economic impacts
12 changes: 12 additions & 0 deletions ritimo.org.txt
@@ -0,0 +1,12 @@
strip: //aside
strip_id_or_class: liste-articles
author: //span[@class='authors']
title: //h1
body: //div[@class='main']
date: //p[@class='publication']/time

tidy: false
prune: false

test_url: https://www.ritimo.org/Guide-de-survie-des-aventures-sur-internet-10313
test_contains: Une surveillance abusive est à craindre, qu’elle soit ciblée ou massive
16 changes: 16 additions & 0 deletions trendmicro.com.txt
@@ -0,0 +1,16 @@
body: //p[@class='article-details__description'] | //article/main
author: substring-after( //p[@class='article-details__author-by']/text() , 'By: ')

strip_id_or_class: article-details__bar
strip_id_or_class: article-details__display-tag
strip_id_or_class: article-details__title
strip_id_or_class: article-details__author-by
strip_id_or_class: article-details__icons
strip_id_or_class: research-layout-divider

strip: //sidebar

prune: no

test_url: https://www.trendmicro.com/en_us/research/24/c/unveiling-earth-kapre-aka-redcurls-cyberespionage-tactics-with-t.html
test_url: https://www.trendmicro.com/en_us/research/17/c/winnti-abuses-github.html
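The trendmicro.com.txt `author:` rule uses XPath 1.0 `substring-after()` to drop the `By: ` prefix from the byline text. The function returns the text that follows the first occurrence of its second argument, or an empty string when that argument does not occur at all. A Python equivalent for a single string, as a sketch (when given a node-set, XPath first converts it to the string value of the first node in document order; the byline below is made up):

```python
def substring_after(text: str, prefix: str) -> str:
    """Mirror XPath 1.0 substring-after(text, prefix)."""
    index = text.find(prefix)
    if index < 0:
        return ""  # XPath returns the empty string when prefix is absent
    return text[index + len(prefix):]


if __name__ == "__main__":
    print(repr(substring_after("By: Jane Doe", "By: ")))  # 'Jane Doe'
    print(repr(substring_after("Jane Doe", "By: ")))      # ''
```

One consequence of these semantics: if the page ever stops printing the `By: ` prefix, the rule yields an empty author rather than the raw byline.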
