JournalList

This is a repository for code developed for JournalList.net.

It contains the following files:

cron.sh - a bash shell script that runs the python webcrawler, processes the results through sqlite, and generates graphml files of the results.
webcrawler.py - a python script that recursively crawls trust.txt files to capture the state of the trust.txt ecosystem. It captures a copy of all of the trust.txt files it finds and generates a .csv file of the contents of all of them.
init.sql - the initialization sqlite script that creates the intermediate tables used in the following sql script.
symmetric - a sql script that generates .csv files containing the symmetric links in the trust.txt ecosystem and list of associations, publishers, and vendors discovered.
graphml.py - a python script that generates three graphml files containing the symmetric links, the assymetric links, and the full ecosystem including both the symmetric and asymmetric links.
qa_trust_txt.py - a python script that parses a trust.txt file and lists any errors it contains.
genjson.sh - a shell script that generates two JSON files suitable for import into the ArangoDB graph database for social network analysis
genlink.awk - an awk script that generates the link.json file for import into ArangoDB
genurl.awk - an awk script that generates the url.json file for import into ArangoDB
scrapesite.py - a python script that scrapes websites to scan one or more sites and find all social, contact, and vendor links, as well as control links and copyright.
tpa.awk - an example awk script that process an output.csv file from scrapesite to generate multiple trust.txt files.

License: Creative Commons Attribution-NonCommercial-ShareAlike license. See: https://creativecommons.org/

Name		Name	Last commit message	Last commit date
Latest commit History 65 Commits
JournalList Ecosystem Webcrawler.pdf		JournalList Ecosystem Webcrawler.pdf
README.md		README.md
combine.sh		combine.sh
control.sh		control.sh
cron.sh		cron.sh
diff.sh		diff.sh
genjson.sh		genjson.sh
genlink.awk		genlink.awk
genurl.awk		genurl.awk
graphml.py		graphml.py
init.sql		init.sql
qa_trust_txt.py		qa_trust_txt.py
queries-Webcrawl.json		queries-Webcrawl.json
sitescrape.py		sitescrape.py
symmetric.sql		symmetric.sql
tpa.awk		tpa.awk
trust2fps.py		trust2fps.py
webcrawler.py		webcrawler.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

JournalList

About

Releases

Packages

Languages

brownwolf1355/JournalList

Folders and files

Latest commit

History

Repository files navigation

JournalList

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages