
Document middleware design and behavior #4

Open
redapple opened this issue Jun 28, 2016 · 2 comments
Comments

@redapple
Contributor

See scrapinghub/scrapylib#45 (comment) for motivation.

It can be counter-intuitive for newcomers that the middleware lets the spider revisit pages that did not produce any items.
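The behavior being discussed can be sketched as follows. This is a simplified stand-in for the middleware, not its actual implementation (a plain dict replaces its on-disk key/value store, and URLs replace request fingerprints): a page's key is recorded only when its response yields items, so item-less pages are fetched again on the next run.

```python
# Simplified model of the deltafetch skip behavior (NOT the real middleware):
# a page is remembered only if it produced items, so pages that yielded
# nothing are revisited on subsequent runs.

def crawl(urls, extract, seen_db):
    """Fetch each URL unless it is already recorded in seen_db.

    extract(url) -> list of items scraped from that page (hypothetical helper).
    seen_db is a dict standing in for deltafetch's database.
    Returns the list of URLs actually fetched this run.
    """
    fetched = []
    for url in urls:
        if url in seen_db:          # produced items on an earlier run: skip
            continue
        fetched.append(url)
        items = extract(url)
        if items:                   # record only item-producing pages
            seen_db[url] = True
    return fetched

# First run: page "B" yields no items, so it is not recorded.
db = {}
first = crawl(["A", "B"], lambda u: ["item"] if u == "A" else [], db)
# Second run: "A" is skipped, but "B" is revisited.
second = crawl(["A", "B"], lambda u: ["item"] if u == "A" else [], db)
```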

@kmike
Member

kmike commented Mar 3, 2017

FTR: I've recently created a middleware similar to deltafetch, but one that is more explicit: https://github.com/TeamHG-Memex/scrapy-crawl-once. It does a similar thing, but in a less automatic way: the user needs to set request.meta['crawl_once'] = True. I considered contributing to scrapy-deltafetch instead, but the implementations have almost nothing in common (sqlite vs bsddb, items vs meta keys, different options).
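The difference in behavior can be sketched as follows. The `crawl_once` meta key is the one named above; the surrounding code is a simplified stand-in for the middleware, not its real implementation (a plain dict replaces its sqlite database): a request is only ever skipped when the user has explicitly opted it in.

```python
# Simplified model of the scrapy-crawl-once behavior (NOT the real middleware):
# nothing is skipped automatically; a request is skipped on later runs only
# if the user set request.meta['crawl_once'] = True on it.

def crawl_once_run(requests, seen_db):
    """requests: list of (url, meta) pairs; seen_db is a dict standing in
    for the middleware's database. Returns URLs fetched this run."""
    fetched = []
    for url, meta in requests:
        opted_in = meta.get("crawl_once", False)
        if opted_in and url in seen_db:
            continue                # already crawled and opted in: skip
        fetched.append(url)
        if opted_in:
            seen_db[url] = True     # remember only opted-in requests
    return fetched

db = {}
reqs = [("A", {"crawl_once": True}), ("B", {})]
first = crawl_once_run(reqs, db)    # both fetched
second = crawl_once_run(reqs, db)   # "A" skipped; "B" always refetched
```

Unlike the deltafetch behavior, whether a page is revisited here depends only on the flag the user set, not on whether the page produced items.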

@arunsayone

arunsayone commented Oct 9, 2017

@redapple, hi, I am new here. I have a project in which I used deltafetch.
Is there a way to specify a main URL and some sub-URLs that the spider should visit again?
I am using my spider to scrape data periodically. Can you please help me?
