Scraping a MediaWiki site with zimit takes 5-6 hours. Is there a recommended configuration for scraping a MediaWiki site with zimit?
To reduce scraping time, how can we do delta scraping with zimit, so that only the changed web pages are scraped and added to the original ZIM?
nish2482 changed the title from "Question : How to Perform incremental scan sicne scrapping take quite a lot of time and some pages on website do not change" to "How to Perform incremental scrapping of Mediawiki sites ?" on Jul 19, 2024
nish2482 changed the title from "How to Perform incremental scrapping of Mediawiki sites ?" to "How to perform incremental scrapping of Mediawiki sites ?" on Jul 19, 2024
The problem with MediaWiki sites is that every revision page is crawled one by one. This is probably not something you are interested in; you can probably set an exclude parameter to skip revision history URLs (never tried, but it should work). Note also that to ZIM a MediaWiki site it is preferable to use the mwoffliner scraper, which is specifically tailored to MediaWiki.
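As a rough illustration of the exclusion idea, here is a minimal Python sketch for drafting and checking such a regular expression before handing it to the crawler. The pattern, the `is_excluded` helper and the `wiki.example.org` URLs are all assumptions made for illustration; real wikis may use different URL layouts, so the pattern should be checked against the target site.

```python
import re

# Assumed pattern: query parameters MediaWiki commonly uses for history,
# edit, old-revision and diff views. Adjust to the target wiki.
REVISION_URL_PATTERN = r"[?&](?:action=(?:history|edit|info)|oldid=\d+|diff=)"


def is_excluded(url: str, pattern: str = REVISION_URL_PATTERN) -> bool:
    """Return True if the URL matches the exclusion pattern."""
    return re.search(pattern, url) is not None


# Hypothetical sample URLs used only to sanity-check the pattern.
sample_urls = [
    "https://wiki.example.org/wiki/Main_Page",
    "https://wiki.example.org/index.php?title=Main_Page&action=history",
    "https://wiki.example.org/index.php?title=Main_Page&oldid=12345",
    "https://wiki.example.org/index.php?title=Main_Page&diff=12346&oldid=12345",
    "https://wiki.example.org/wiki/Help:Contents",
]

# URLs that would still be crawled with this pattern excluded.
kept = [u for u in sample_urls if not is_excluded(u)]
for url in kept:
    print(url)
```

Once the pattern behaves as expected on a sample of the site's URLs, it could be supplied as the value of the exclude parameter mentioned above.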
That being said, the problem of incrementally scraping a site is still relevant for many other cases, and for now there is no real solution in place. It is probably not going to be straightforward to implement.
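To make the difficulty concrete, here is a minimal Python sketch of one piece of the puzzle: comparing two crawls by content hash to find which pages changed. This is not something zimit does; the `content_digest` and `diff_manifests` helpers and the sample data are hypothetical. Note that this approach still has to fetch every page to hash it, so it saves no crawl time on its own, and merging the changed pages into an existing ZIM remains a separate, unsolved step.

```python
import hashlib


def content_digest(body: bytes) -> str:
    """Return a SHA-256 hex digest of a page body."""
    return hashlib.sha256(body).hexdigest()


def diff_manifests(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Compare two URL -> digest manifests and classify the URLs."""
    added = sorted(u for u in new if u not in old)
    removed = sorted(u for u in old if u not in new)
    changed = sorted(u for u in new if u in old and new[u] != old[u])
    return {"added": added, "removed": removed, "changed": changed}


# Hypothetical manifests from a previous and a current crawl.
previous = {
    "https://example.org/a": content_digest(b"page A, version 1"),
    "https://example.org/b": content_digest(b"page B"),
    "https://example.org/c": content_digest(b"page C"),
}
current = {
    "https://example.org/a": content_digest(b"page A, version 2"),
    "https://example.org/b": content_digest(b"page B"),
    "https://example.org/d": content_digest(b"page D"),
}

delta = diff_manifests(previous, current)
print(delta)
```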
benoit74 changed the title from "How to perform incremental scrapping of Mediawiki sites ?" to "How to perform incremental scrapping of websites ?" on Jul 19, 2024