Commit
Improve the general usage docs and move the advanced usage to a new section
Wesley van Lee committed Oct 28, 2024
1 parent 9ea5f71, commit 24873a3
Showing 3 changed files with 36 additions and 29 deletions.
@@ -0,0 +1,26 @@
# Advanced usage

## Crawling

### Iterating a WACZ archive index

Bypassing the spider's default behaviour, the `WaczCrawlMiddleware` spider middleware, when enabled, replaces the crawl with an iteration over all the entries in the WACZ archive index.

To use this strategy, enable both middlewares in the spider settings like so:

```python
DOWNLOADER_MIDDLEWARES = {
    "scrapy_webarchive.downloadermiddlewares.WaczMiddleware": 543,
}

SPIDER_MIDDLEWARES = {
    "scrapy_webarchive.spidermiddlewares.WaczCrawlMiddleware": 543,
}
```

Then define the location of the WACZ archive with the `SW_WACZ_SOURCE_URI` setting and set `SW_WACZ_CRAWL` to `True`:

```python
SW_WACZ_SOURCE_URI = "s3://scrapy-webarchive/archive.wacz"
SW_WACZ_CRAWL = True
```
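
As a rough illustration of how the four settings fit together, the sketch below collects them into a single dictionary. The helper name `wacz_crawl_settings` and its `priority` parameter are hypothetical and not part of `scrapy_webarchive`; only the setting names, middleware paths, and the value `543` come from the snippets in this section.

```python
from typing import Any


def wacz_crawl_settings(source_uri: str, priority: int = 543) -> dict[str, Any]:
    """Assemble the settings for iterating a WACZ archive index.

    Hypothetical helper: it only groups the settings documented in this
    section and does not talk to Scrapy or open the archive.
    """
    if not source_uri:
        raise ValueError("source_uri must not be empty")
    return {
        # Downloader middleware listed in this section.
        "DOWNLOADER_MIDDLEWARES": {
            "scrapy_webarchive.downloadermiddlewares.WaczMiddleware": priority,
        },
        # Spider middleware that replaces the crawl with the index iteration.
        "SPIDER_MIDDLEWARES": {
            "scrapy_webarchive.spidermiddlewares.WaczCrawlMiddleware": priority,
        },
        # Location of the archive and the crawl flag, as in the second snippet.
        "SW_WACZ_SOURCE_URI": source_uri,
        "SW_WACZ_CRAWL": True,
    }


settings = wacz_crawl_settings("s3://scrapy-webarchive/archive.wacz")
```

A dictionary like this could, for example, be assigned to a spider's `custom_settings` attribute instead of editing the project-wide settings module, assuming per-spider configuration suits the project.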
@@ -9,4 +9,5 @@ nav:
- Introduction: index.md
- installation.md
- usage.md
- Advanced Usage: advanced_usage.md
- settings.md