This repository is an experiment in creating a distributed listing of web archives.
To accomplish this, a new format, the Web Archive Manifest (WAM), is introduced to describe web archives and the properties and APIs they support. The format is designed to be readable by humans and to be processed by both new and existing software tools.
The goal is to highlight and help promote the sizable (and growing!) list of publicly accessible web archives all over the world, in a distributed and democratic way.
Many people may be familiar with the "Wayback Machine", but there are actually many wayback machines all over the world. Let's make them more widely known and accessible!
Each web archive has its own YAML file, following the WAM spec, in the webarchives directory.
YAML was chosen because it strikes a good balance between human readability and ease of processing by a wide variety of tools in many languages.
The intent is for the format to be a 'living standard' that may adapt as needed as web archives evolve.
There is also an index, which specifies that all files in the directory should be included.
To add a new web archive, simply add a new .yaml file to this directory.
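As a rough illustration of how a tool might consume the directory, here is a minimal Python sketch that reads every `.yaml` file in a directory such as `webarchives`, one manifest per file. It skips a hypothetical `index.yaml`, since the actual name and location of the index file are not specified here. The parser is passed in so the sketch does not depend on a particular YAML library.

```python
from pathlib import Path
from typing import Callable, Dict, Iterable


def collect_manifests(
    directory: str,
    parse: Callable[[str], dict],
    exclude: Iterable[str] = ("index.yaml",),  # hypothetical index filename
) -> Dict[str, dict]:
    """Parse every *.yaml file in `directory`, keyed by file name without extension."""
    skipped = set(exclude)
    manifests: Dict[str, dict] = {}
    # sorted() gives a stable, reproducible order across platforms
    for path in sorted(Path(directory).glob("*.yaml")):
        if path.name in skipped:
            continue
        manifests[path.stem] = parse(path.read_text(encoding="utf-8"))
    return manifests
```

With PyYAML installed, a call such as `collect_manifests("webarchives", yaml.safe_load)` would return a dictionary mapping each archive's file name to its parsed manifest.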
This listing is specifically for web archives which preserve and provide web content and make it publicly accessible.
While there are many great archives out there, this format and directory are specifically limited to web archives.
Any web archive can be included in the listing, even if it does not support any of the established APIs.
For a list of currently supported APIs, see the WAM spec.
This directory should also not be seen as an exhaustive list of all web archive APIs, as many archives may support custom or archive-specific APIs.
If there is another API spec that should be included in this shared listing, feel free to submit it as a request and/or suggest how it might be included!
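To make the point about optional API support concrete, the following Python sketch filters loaded manifests by a declared API. The `apis` key and API names such as `"memento"` are hypothetical placeholders, not taken from the WAM spec; consult the spec for the actual field names. An archive with no `apis` entry is still a valid listing and simply never matches.

```python
from typing import Dict, List, Optional


def archives_supporting(manifests: Dict[str, Optional[dict]], api: str) -> List[str]:
    """Return the ids of archives whose manifest declares `api`.

    Assumes a hypothetical top-level `apis` key holding either a list of API
    names or a mapping keyed by API name; the real WAM field may differ.
    """
    matches = []
    for archive_id, manifest in manifests.items():
        # An empty YAML file parses to None, so treat it as an empty manifest
        declared = (manifest or {}).get("apis") or {}
        # `in` works for both a list of names and a dict keyed by name
        if api in declared:
            matches.append(archive_id)
    return matches
```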
The intent of this directory is to be:
- open-source and distributed (git is the perfect place for it!)
- independent of any specific product, service or protocol.
- presented in a human- and machine-readable format
This directory and WAM format are intended to encourage interoperability and interconnectedness between different web archives.
It is important to recognize that there are a few existing lists out there, mostly originating from the Memento project.
- The Memento Project at LANL deserves much credit for starting and maintaining archivelist.xml, a list of archives that support the Memento Protocol. This list is a key part of the Time Travel search engine and the Memento Aggregator API service.
- The ODU Memento Aggregator project also contains such a list: archives.json
- The oldweb.today project uses an earlier version of such a list, archives.yaml, to provide the archives accessible via the service.
- Wikipedia also maintains a listing of Web archiving initiatives.
If there are other such lists, feel free to let us know or submit a pull request to include them here.
Anyone can contribute! We definitely encourage contributions to this repo to make it a truly distributed project:
- If you have a web archive that is not in the directory and you would like it to be included, feel free to open a PR adding the archive as a new YAML file.
- If you have a web archive that is included and you would like to remove it, feel free to open a PR with a brief note requesting the removal.
- If you have a question about how to include a new type of web archive, please open an issue to discuss.
- If you would like to make your own fork, public or private, feel free to do so, as the list is released into the public domain under CC0.
Currently, all web archives are specified explicitly in this repository.
However, it would be really great if web archives started to 'advertise' what APIs they support and other information included in the WAM file.
For example, an archive could provide http://myarchive.example.com/wam.yaml, and then the file would not need to be stored in this repository; we would only need to add this URL to the index.
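If archives begin publishing their own manifests, the index could mix local file names with remote URLs such as http://myarchive.example.com/wam.yaml. The Python sketch below shows one way a tool might resolve such an index. The flat list-of-strings index shape and the `load_local`/`fetch_remote` callables are assumptions for illustration, not part of the WAM format.

```python
from typing import Callable, List
from urllib.parse import urlparse


def resolve_index(
    entries: List[str],
    load_local: Callable[[str], dict],
    fetch_remote: Callable[[str], dict],
) -> List[dict]:
    """Load each index entry: fetch http(s) URLs, read everything else locally.

    The index is assumed (hypothetically) to be a flat list of file names and URLs.
    """
    manifests = []
    for entry in entries:
        if urlparse(entry).scheme in ("http", "https"):
            manifests.append(fetch_remote(entry))
        else:
            manifests.append(load_local(entry))
    return manifests
```

In practice, `fetch_remote` might download the URL with `urllib.request.urlopen` and parse the body with a YAML parser. Keeping the loaders injectable makes the resolution logic easy to test without network access.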
If adding support for WAM to a web archive, please let us know or submit a PR to include this information.
No tools use this format yet!
But we hope that this will change, and would be happy to add any tools that make use of this format or listing, directly or indirectly.
A future release of pywb will likely add support for reading WAM format files.
Webrecorder may also use this directory to provide users the ability to work with existing web archives.
This web archive listing and the WAM format originate with the Webrecorder project, which aims to promote distributed web archiving, encouraging anyone to create and run their own web archives. Having a formal Web Archive Manifest, as well as a public, distributed web archive directory, aligns perfectly with this mission.
This document, the WAM format and the accompanying web archive directory are released into the public domain under CC0.