# scraper_dominios

Exploring GitHub Actions to scrape data from Argentine internet domain registrations.

The data is found here; the files are named in the form Year Month Day .csv and are updated every night when there is new data in the official bulletin.

The fun part is that the data is updated via a GitHub Action (found here) that runs every night on a cron schedule and commits the new data to the repo if there is any.
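
A minimal sketch of what such a workflow could look like. The schedule, the `scraper.py` entry point, the `requirements.txt` file, and the `data/` directory are illustrative assumptions, not this repo's actual layout:

```yaml
name: scrape-dominios
on:
  schedule:
    - cron: "0 3 * * *"         # every night at 03:00 UTC (assumed time)
  workflow_dispatch:            # also allow manual runs

permissions:
  contents: write               # so the job can push the new data

jobs:
  scrape:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-python@v4
        with:
          python-version: "3.10"
      - run: pip install -r requirements.txt
      - run: python scraper.py  # hypothetical scraper entry point
      - name: Commit new data, if any
        run: |
          git config user.name "github-actions[bot]"
          git config user.email "github-actions[bot]@users.noreply.github.com"
          git add data/
          git diff --cached --quiet || (git commit -m "Add scraped data $(date -u +%F)" && git push)
```

The last step only commits and pushes when `git diff --cached --quiet` reports staged changes, which is what lets the scheduled run be a no-op on nights with no new bulletin data.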

## Limitations of using GitHub Actions as scrapers

The biggest limitation is the cap of 2,000 Actions minutes per account per month on the free plan. So it is not useful for scrapers that run for a long time or very often, but it is a good option for small scrapers (a nightly job that takes 5 minutes uses roughly 150 minutes a month).

The minimum cron interval is 5 minutes (you might be able to work around this by defining multiple cron schedules).
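
If that workaround is worth testing, `on.schedule` does accept a list of cron expressions, so offset entries could in principle trigger the workflow more often than any single one does. This is an unverified workaround, shown only as a sketch:

```yaml
on:
  schedule:
    - cron: "0-59/5 * * * *"    # fires at :00, :05, :10, ...
    - cron: "2-59/5 * * * *"    # fires at :02, :07, :12, ...
```

Each expression on its own respects the 5-minute minimum, but together they would run the workflow every 2 to 3 minutes.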

You have to install the dependencies on every run of the action (which can be time-consuming for large projects), but you can cache the dependencies to save time.
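
For a Python scraper with a `requirements.txt` (again an assumption about this repo's setup), `actions/setup-python` can handle the pip cache by itself; this fragment replaces the setup step in the job's `steps` list:

```yaml
      - uses: actions/setup-python@v4
        with:
          python-version: "3.10"
          cache: "pip"          # restores/saves the pip cache, keyed on requirements.txt
      - run: pip install -r requirements.txt
```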

The maximum repository size (including the full history) is 100 GB, and the maximum file size is 100 MB.

## TODO

- [ ] Scrape old domains.