Scrapy
Adilson Carvalho edited this page Nov 17, 2016 · 4 revisions
See our Dockerfile to learn how to install it. Another good resource on installation is the Scrapy Installation Guide.
docker run --rm -v "$(pwd):/scrapy" barateza-nfcrawler sh
scrapy startproject nfcrawler
cd nfcrawler
scrapy genspider pr-nfce dfeportal.fazenda.pr.gov.br
scrapy crawl pr-nfce
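As described in Scrapy's documentation, you can invoke the shell from inside a spider to inspect a specific response, as the spider further down does.
After you have finished using it, just press CTRL+D to exit.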
import scrapy


class MySpider(scrapy.Spider):
    name = "myspider"
    start_urls = [
        "http://example.com",
        "http://example.org",
        "http://example.net",
    ]

    def parse(self, response):
        # We want to inspect one specific response.
        if ".org" in response.url:
            from scrapy.shell import inspect_response
            inspect_response(response, self)

        # Rest of parsing code.
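The `".org" in response.url` test matches the substring anywhere in the URL, including the path or query string. If that is too broad, one possible refinement is to check only the host name. The helper `should_inspect` below is hypothetical and uses only the standard library; it is a sketch, not part of Scrapy.

```python
from urllib.parse import urlparse


def should_inspect(url, suffix=".org"):
    """Return True when the URL's host name ends with `suffix`."""
    # hostname is None for URLs without a network location.
    host = urlparse(url).hostname or ""
    return host.endswith(suffix)
```

Inside `parse`, the condition would then read `if should_inspect(response.url):`.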