scraper_dominios

Exploring GitHub Actions to scrape data on Argentine internet domain registrations.

The data is found in the data folder. Each file is named in the form Year Month Day .csv, and a new file is added every night that there is new data in the official bulletin.

The fun part is that the data is updated via a GitHub Action (found here) that runs every night on a cron schedule and commits the new data to the repo, if there is any.
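The "commit only if there is new data" step can be sketched in Python as below. This is a minimal sketch, not the code in scraper.py: the function name save_if_new, the ISO date file name (YYYY-MM-DD.csv) and the row format (a list of dicts) are assumptions made for illustration.

```python
import csv
from datetime import date
from pathlib import Path


def save_if_new(rows, day, data_dir):
    """Write rows to <data_dir>/<day>.csv.

    Returns the path written, or None when there is nothing to write
    (no rows scraped, or a file for that day already exists).
    NOTE: the file name format is an assumption, not the repo's actual one.
    """
    if not rows:
        return None
    path = Path(data_dir) / f"{day.isoformat()}.csv"
    if path.exists():
        return None
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
    return path
```

When the function returns None the working tree is unchanged, so the workflow has nothing to commit that night.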

Limitations of using GitHub Actions as scrapers

The biggest limitation is that each account gets a maximum of 2000 minutes per month (on the free plan) to run jobs in Actions. So it is not useful for scrapers that run for a long time or very often, but it is a good option for small scrapers.
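To get a feel for that budget, here is a rough back-of-the-envelope calculation in Python. The run durations are made-up examples, not measurements from this repo, and rounding each run up to a whole minute is an assumption about how usage is billed.

```python
import math

MONTHLY_BUDGET_MINUTES = 2000  # the limit quoted above


def monthly_minutes(runs_per_day, minutes_per_run, days=30):
    """Estimate the minutes used in a month.

    Each run is rounded up to a whole minute (assumed billing behaviour).
    """
    return runs_per_day * math.ceil(minutes_per_run) * days


def fits_budget(runs_per_day, minutes_per_run, budget=MONTHLY_BUDGET_MINUTES):
    """True if the estimated monthly usage stays within the budget."""
    return monthly_minutes(runs_per_day, minutes_per_run) <= budget


# A nightly scraper that takes 2 minutes per run: 1 * 2 * 30 = 60 minutes.
nightly = monthly_minutes(runs_per_day=1, minutes_per_run=2)

# A scraper run every 5 minutes at 1 minute per run: 288 * 1 * 30 = 8640 minutes.
frequent = monthly_minutes(runs_per_day=24 * 60 // 5, minutes_per_run=1)
```

Under these assumptions the nightly job uses a small fraction of the budget, while a job on the shortest cron interval exceeds it several times over.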

The minimum cron interval is 5 minutes (you may be able to work around this by defining several cron schedules).

You have to install the dependencies on every run of the action, which can be time-consuming for large projects, but you can cache them to save time.
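Dependency caches are usually keyed on a hash of the requirements file, so the cache is reused until the dependencies change. The Python sketch below only illustrates that idea. The function name cache_key and the "pip" prefix are hypothetical, and this is not how GitHub computes its keys internally.

```python
import hashlib
from pathlib import Path


def cache_key(requirements_path, prefix="pip"):
    """Build a cache key that changes only when the requirements file changes.

    NOTE: illustrative only; the prefix and key layout are assumptions.
    """
    digest = hashlib.sha256(Path(requirements_path).read_bytes()).hexdigest()
    return f"{prefix}-{digest}"
```

Two runs with an identical requirements.txt produce the same key and can share a cache. Editing the file produces a new key, which forces a fresh install.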

The maximum size of the repo (including its whole history) is 100 GB, and the maximum size of a single file is 100 MB.
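A scraper that commits its own output could check file sizes before committing. The following is a small sketch, assuming the 100 MB limit means 100 * 1024 * 1024 bytes. The function name oversized_files is hypothetical and not part of this repo.

```python
from pathlib import Path

MAX_FILE_BYTES = 100 * 1024 * 1024  # assumes "100 MB" means 100 MiB


def oversized_files(root, limit=MAX_FILE_BYTES):
    """Return the files under root whose size exceeds limit, sorted by path."""
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.stat().st_size > limit
    )
```

For example, oversized_files("data") returns an empty list while every CSV in a folder named data is under the limit.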

TODO

[ ]: Scrape old domains.

lbellomo/scraper_dominios

Folders and files

Latest commit

History

Repository files navigation

scraper_dominios

Limitations of using Github Actions as scrapers

TODO

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages