Installation

See our Dockerfile to learn how to install it. Another good resource is the Scrapy Installation Guide.
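
The project's Dockerfile is the authoritative reference; as a rough sketch, a minimal image for running Scrapy could look like the following (the base image and install method are assumptions, not the project's actual setup):

FROM python:3
# Install Scrapy in the image (assumed install method; see the Scrapy Installation Guide)
RUN pip install scrapy
# The docker run command in the next section mounts the host directory here
WORKDIR /scrapy

Building it with docker build -t barateza-nfcrawler . gives the image name used by the docker run command below.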

Creating a spider

docker run --rm -v "$(pwd):/scrapy" barateza-nfcrawler sh
scrapy startproject nfcrawler
cd nfcrawler
scrapy genspider pr-nfce dfeportal.fazenda.pr.gov.br
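
The genspider command generates a spider skeleton under nfcrawler/spiders/. It should look roughly like this (the exact template depends on your Scrapy version):

import scrapy

class PrNfceSpider(scrapy.Spider):
    name = "pr-nfce"
    allowed_domains = ["dfeportal.fazenda.pr.gov.br"]
    start_urls = ["http://dfeportal.fazenda.pr.gov.br/"]

    def parse(self, response):
        # Parsing logic for the portal goes here.
        pass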

Running the spider

scrapy crawl pr-nfce
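
To keep the scraped items, you can pass an output file with the -o feed-export option (the filename here is only an example):

scrapy crawl pr-nfce -o items.json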

Running an interactive shell on a spider

As described in Scrapy's documentation, calling inspect_response from inside a spider opens an interactive shell for that response.

After you finish inspecting it, press CTRL+D to exit the shell and resume the crawl.

import scrapy

class MySpider(scrapy.Spider):
    name = "myspider"
    start_urls = [
        "http://example.com",
        "http://example.org",
        "http://example.net",
    ]

    def parse(self, response):
        # We want to inspect one specific response.
        if ".org" in response.url:
            from scrapy.shell import inspect_response
            inspect_response(response, self)

        # Rest of parsing code.
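
Inside that shell the response is available for inspection; for example (the CSS selector below is only an illustration, not something specific to the portal):

response.url
response.css("title::text").extract()
view(response)  # open the response in your browser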