It would be very useful if warc.gz files were also made for the URL shorteners we are archiving.
People are probably more likely to look up a (shortened) URL in the Wayback Machine than to search through the .xz files for the shortener they are interested in.
If you want to record to WARC files easily, you'll need an agent that can accurately record HTTP traffic to WARC. Examples include Heritrix, Wget, and Wpull, but these are all web crawlers.
If you can get the raw HTTP requests and responses from Python Requests, then you could try to build the WARC file yourself. I wrote a WARC library called Warcat, which supports Python 3. I also wrote Wpull, which runs under Python 3, so you may be able to borrow code from it.
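To give an idea of what "building a WARC file yourself" could look like, here is a minimal sketch that uses only the Python standard library. It does not use the Warcat or Wpull APIs. The function names (`build_warc_record`, `append_record`) and the `short.example` URL are made up for illustration, and it assumes you already have the raw HTTP bytes in hand. A real archive would normally also carry a `warcinfo` record, the matching request records, and payload digests, all of which are left out here.

```python
import gzip
import io
import uuid
from datetime import datetime, timezone


def build_warc_record(record_type, target_uri, http_bytes, when=None):
    """Wrap raw HTTP bytes in one uncompressed WARC/1.0 record.

    Hypothetical helper for illustration; not part of Warcat or Wpull.
    """
    if record_type not in ("request", "response"):
        raise ValueError("record_type must be 'request' or 'response'")
    # WARC-Date is written in UTC; default to the current time.
    when = (when or datetime.now(timezone.utc)).astimezone(timezone.utc)
    headers = [
        "WARC/1.0",
        f"WARC-Type: {record_type}",
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",
        f"WARC-Date: {when.strftime('%Y-%m-%dT%H:%M:%SZ')}",
        f"WARC-Target-URI: {target_uri}",
        f"Content-Type: application/http; msgtype={record_type}",
        f"Content-Length: {len(http_bytes)}",
    ]
    head = "\r\n".join(headers).encode("utf-8") + b"\r\n\r\n"
    # Each record ends with two CRLFs after the content block.
    return head + http_bytes + b"\r\n\r\n"


def append_record(fileobj, record):
    """Append a record as its own gzip member, the usual .warc.gz layout."""
    fileobj.write(gzip.compress(record))


# Example: a redirect as a shortener might return it (made-up data).
raw_response = (
    b"HTTP/1.1 301 Moved Permanently\r\n"
    b"Location: http://example.com/long\r\n"
    b"Content-Length: 0\r\n"
    b"\r\n"
)
out = io.BytesIO()  # stand-in for open("shortener.warc.gz", "ab")
append_record(out, build_warc_record(
    "response", "http://short.example/abc", raw_response))
```

Compressing each record as a separate gzip member keeps the file readable by a plain gzip decompressor while still letting tools seek to a single record by byte offset.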