Releases: openzim/zimit
1.5.0
1.4.1
1.4.0
Added
- --title to set the ZIM title
- --description to set the ZIM description
- New crawler options: --maxPageLimit, --delay, --diskUtilization
- --zim-lang parameter to set warc2zim's --lang (ISO-639-3)
Changed
- Using browsertrix-crawler 0.10.2
- Default and accepted values for --waitUntil taken from the crawler's update
- Using warc2zim 1.5.2
- Disabled Chrome updates to prevent incidental inclusion of update data in WARC/ZIM (#172)
- --failOnFailedSeed used unconditionally
- --lang now passed to the crawler (ISO-639-1)
Removed
- --newContext, following the crawler's update
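Since 1.4.0 the two language settings use different code systems. --lang goes to the crawler as an ISO-639-1 code, and --zim-lang goes to warc2zim as an ISO-639-3 code. The Python sketch below derives both flags from one two-letter code.

The lookup table and the `language_args` helper are hypothetical illustrations, not part of zimit. The table covers only a few languages.

```python
# Hypothetical lookup table: a tiny subset of ISO-639-1 -> ISO-639-3 codes.
# A real tool would use a complete language-code database instead.
ISO_639_1_TO_3 = {
    "en": "eng",
    "fr": "fra",
    "de": "deu",
    "es": "spa",
}


def language_args(iso1: str) -> list[str]:
    """Build both language flags (hypothetical helper, not zimit code).

    --lang is passed to the crawler as ISO-639-1;
    --zim-lang is passed to warc2zim as ISO-639-3.
    """
    code = iso1.lower()
    try:
        iso3 = ISO_639_1_TO_3[code]
    except KeyError:
        raise ValueError(f"no ISO-639-3 mapping for {iso1!r}") from None
    return ["--lang", code, "--zim-lang", iso3]


print(language_args("fr"))  # ['--lang', 'fr', '--zim-lang', 'fra']
```

Deriving both codes from one input keeps the crawler and the ZIM metadata in agreement about the content language.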
1.3.1
1.3.0
Added
- Initial URL check normalizes homepage redirects to standard ports (80/443) (#137)
Changed
- Using warc2zim 1.5.0, which includes the scope conflict fix and the video fix
- Using browsertrix-crawler 0.8.0-beta.1
- Fixed --allowHashUrls being a boolean parameter
- Increased the check_url timeout from 10s to 12s for connecting and 27s for reading
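The 1.3.0 notes describe two changes to the initial URL check:
- redirects to a homepage on an explicit standard port are normalized;
- the check now uses separate connect and read timeouts.

The Python sketch below illustrates both under stated assumptions. It reads "normalizes to standard ports" as dropping an explicit :80 (http) or :443 (https) port. The `check_url` shown here is a hypothetical stand-in for zimit's function, not its actual implementation. It relies on the `requests` library treating a two-element `timeout` as `(connect, read)`.

```python
from urllib.parse import urlsplit, urlunsplit

# Scheme -> default port; an explicit port equal to this is redundant.
DEFAULT_PORTS = {"http": 80, "https": 443}

# (connect, read) timeouts in seconds, as listed in the 1.3.0 notes.
CHECK_URL_TIMEOUT = (12, 27)


def normalize_default_port(url: str) -> str:
    """Drop an explicit :80 (http) or :443 (https) port from a URL."""
    parts = urlsplit(url)
    if parts.port is not None and parts.port == DEFAULT_PORTS.get(parts.scheme):
        # netloc ends with ":<port>" here, so strip the last colon-separated part;
        # this keeps user info and bracketed IPv6 hosts intact.
        parts = parts._replace(netloc=parts.netloc.rsplit(":", 1)[0])
    return urlunsplit(parts)


def check_url(url: str) -> str:
    """Hypothetical stand-in: follow redirects and return the normalized final URL."""
    # Third-party dependency, imported here so normalize_default_port
    # can be used with the standard library alone.
    import requests

    resp = requests.head(url, allow_redirects=True, timeout=CHECK_URL_TIMEOUT)
    resp.raise_for_status()
    return normalize_default_port(resp.url)


print(normalize_default_port("https://example.org:443/"))  # https://example.org/
```

Non-default ports such as :8443 are left untouched. They identify a different endpoint, not a redundant spelling of the same one.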
1.2.0
Added
- --urlFile browsertrix crawler parameter
- --depth browsertrix crawler parameter
- --extraHops parameter
- --collection browsertrix crawler parameter
- --allowHashUrls browsertrix crawler parameter
- --userAgentSuffix browsertrix crawler parameter
- --behaviors parameter
- --behaviorTimeout browsertrix crawler parameter
- --profile browsertrix crawler parameter
- --sizeLimit browsertrix crawler parameter
- --timeLimit browsertrix crawler parameter
- --healthCheckPort parameter
- --overwrite parameter
Changed
- Using browsertrix-crawler 0.6.0 and warc2zim 1.4.2
- Default WARC location after crawl changed from collections/capture-*/archive/ to collections/crawl-*/archive/
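Scripts that collect the crawl's WARC files need to follow the 1.2.0 location change. The Python sketch below looks in the new collections/crawl-*/archive/ layout first and falls back to the older collections/capture-*/archive/ layout.

Several details are assumptions:
- `find_warcs` is a hypothetical helper, not part of zimit.
- The output directory argument is hypothetical.
- WARC files are assumed to end in .warc or .warc.gz.

```python
from pathlib import Path

# Crawl output layouts: current (zimit >= 1.2.0) first, then the pre-1.2.0 one.
# "*.warc*" matches both plain .warc and gzipped .warc.gz files (assumed naming).
WARC_GLOBS = (
    "collections/crawl-*/archive/*.warc*",
    "collections/capture-*/archive/*.warc*",
)


def find_warcs(output_dir: Path) -> list[Path]:
    """Return WARC files from the first layout that has any (hypothetical helper)."""
    for pattern in WARC_GLOBS:
        found = sorted(p for p in output_dir.glob(pattern) if p.is_file())
        if found:
            return found
    return []
```

Checking the newer layout first means a directory containing output from both zimit generations resolves to the newer crawl.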
Removed
- --scroll browsertrix crawler parameter (see --behaviors)
- --scope browsertrix crawler parameter (see --scopeType, --include and --exclude)