Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

crawler_valor #85

Open
wants to merge 3 commits into
base: master
Choose a base branch
from
Open

crawler_valor #85

wants to merge 3 commits into from

Conversation

veniciusgrjr
Copy link

I've created this crawler for Valor. Please check if it's ok. I try to take in count the coments on @lucasmachadorj pull request.

index = requests.get(INDEX_URL).content
soup = BeautifulSoup(index, "lxml")
news_index = soup.find(id="block-valor_capa_automatica-central_automatico").find_all('h2')
news_urls = news_urls + ['http://www.valor.com.br' + BeautifulSoup( art.encode('utf8') , "lxml" ).find('a').attrs['href'] for art in news_index]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be a good idea to break this line to make it more readable. PEP8 suggests 72 with a maximum of 79. I like to follow that whenever possible.

@flavioamieiro
Copy link
Member

I think that, apart from the really small issues I pointed out in line comments, the code looks good and we should merge it.

@flavioamieiro
Copy link
Member

Also, another really important thing: please use 4 spaces instead of tab. I have no real problem with tabs, but mixing spaces and tabs are a bad idea, and our entire code base is using spaces. If you use vim you can use spaces by adding

    set tabstop=4
    set shiftwidth=4
    set expandtab

To your .vimrc.

from bs4 import BeautifulSoup
import requests
import re
import pandas as pd
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need to import re nor pandas. It's a good idea to remove these imports.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants