IndexError: list index out of range #3

Open
celestinoxp opened this issue Aug 16, 2022 · 5 comments

celestinoxp commented Aug 16, 2022

Hi,
The scraper is not working. I use Python 3.9 on Windows 11. Can you help?

IndexError Traceback (most recent call last)
Input In [19], in <cell line: 2>()
3 data = requests.get(standings_url)
4 soup = BeautifulSoup(data.text)
----> 5 standings_table = soup.select('table.stats_table')[0]
7 links = [l.get("href") for l in standings_table.find_all('a')]
8 links = [l for l in links if '/squads/' in l]

IndexError: list index out of range

@SangeethsivanSivakumar

I have the same issue when I try to run it:

import time

import pandas as pd
import requests
from bs4 import BeautifulSoup

years = list(range(2023, 2020, -1))
all_matches = []

standings_url = "https://fbref.com/en/comps/9/Premier-League-Stats"

for year in years:
    data = requests.get(standings_url)
    soup = BeautifulSoup(data.text)
    # The IndexError is raised here when no table matches the selector.
    standings_table = soup.select('table.stats_table')[0]

    # Collect the squad page URL of every team in the standings table.
    links = [l.get("href") for l in standings_table.find_all('a')]
    links = [l for l in links if l and '/squads/' in l]
    team_urls = [f"https://fbref.com{l}" for l in links]

    # Point the next iteration at the previous season's standings page.
    previous_season = soup.select("a.prev")[0].get("href")
    standings_url = f"https://fbref.com{previous_season}"

    for team_url in team_urls:
        team_name = team_url.split("/")[-1].replace("-Stats", "").replace("-", " ")
        data = requests.get(team_url)
        matches = pd.read_html(data.text, match="Scores & Fixtures")[0]
        soup = BeautifulSoup(data.text)
        # Find the link to the team's shooting stats and load that table.
        links = [l.get("href") for l in soup.find_all('a')]
        links = [l for l in links if l and 'all_comps/shooting/' in l]
        data = requests.get(f"https://fbref.com{links[0]}")
        shooting = pd.read_html(data.text, match="Shooting")[0]
        shooting.columns = shooting.columns.droplevel()
        time.sleep(60)
        try:
            team_data = matches.merge(shooting[["Date", "Sh", "SoT", "Dist", "FK", "PK", "PKatt"]], on="Date")
        except ValueError:
            continue
        team_data = team_data[team_data["Comp"] == "Premier League"]

        team_data["Season"] = year
        team_data["Team"] = team_name
        all_matches.append(team_data)
        time.sleep(60)

IndexError Traceback (most recent call last)
Input In [30], in <cell line: 2>()
3 data = requests.get(standings_url)
4 soup = BeautifulSoup(data.text)
----> 5 standings_table = soup.select('table.stats_table')[0]
7 links = [l.get("href") for l in standings_table.find_all('a')]
8 links = [l for l in links if '/squads/' in l]

IndexError: list index out of range
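
For anyone trying to narrow this down: the traceback shows that `soup.select('table.stats_table')` returned an empty list, so indexing it with `[0]` fails. The thread does not confirm why the table is missing. The sketch below only makes the failure easier to diagnose by reporting the selector and the HTTP status instead of a bare IndexError. The names `first_match` and `PageNotAsExpected` are hypothetical, and the 429 status is an illustrative assumption, not something observed in this issue.

```python
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")


class PageNotAsExpected(RuntimeError):
    """Raised when a page lacks an element the scraper needs (hypothetical name)."""


def first_match(matches: Sequence[T], selector: str,
                status_code: Optional[int] = None) -> T:
    """Return the first element of matches, or raise a descriptive error."""
    if not matches:
        raise PageNotAsExpected(
            f"no element matched {selector!r} (HTTP status: {status_code})"
        )
    return matches[0]


# Indexing an empty result list is what produces the reported error.
try:
    [][0]
except IndexError as exc:
    print(f"IndexError: {exc}")

# The guarded version says what was missing (429 is an assumed example status).
try:
    first_match([], "table.stats_table", status_code=429)
except PageNotAsExpected as exc:
    print(exc)
```

In the loop above this could be used as `first_match(soup.select('table.stats_table'), 'table.stats_table', data.status_code)`, so that a non-200 status shows up directly in the error message.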

saad-24 commented Aug 28, 2022

@SangeethsivanSivakumar @celestinoxp Please share the whole code.

@TrudeauOkech

Hey @celestinoxp @VikParuchuri @SangeethsivanSivakumar, did you manage to fix this problem? I'm having trouble with it right now.

scarecrow165 commented Jan 23, 2023

I think I have fixed it! In the for loop, the author put in a time.sleep(5) as a delay (to stop the web server from blocking you for web scraping). 5 seconds is not enough to avoid being blocked; change it to 15 seconds. This makes the code very slow (it took over 10 minutes to complete), but it will work. I might try using an IP randomiser later to speed it up a little. But changing the 5 seconds to 15 seconds will fix it!
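
A minimal sketch of the delay idea described in this comment, under stated assumptions: the 15-second default comes from the comment, while treating HTTP 429 as the "blocked" signal and doubling the wait after each such response are assumptions added here. The thread does not state the site's actual limits or confirm that this delay is sufficient. `fetch_politely` and its parameters are hypothetical names, and the network call is passed in as `get` so the example runs offline.

```python
import time
from typing import Callable, Tuple


def fetch_politely(
    get: Callable[[str], Tuple[int, str]],
    url: str,
    delay: float = 15.0,
    max_attempts: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Fetch url, pausing before every attempt.

    `get` returns (status_code, body). On HTTP 429 the wait is doubled and
    the request is retried, up to max_attempts attempts in total.
    """
    wait = delay
    for _ in range(max_attempts):
        sleep(wait)
        status, body = get(url)
        if status == 200:
            return body
        if status != 429:
            raise RuntimeError(f"{url} returned HTTP {status}")
        wait *= 2  # back off further after a rate-limit response
    raise RuntimeError(f"{url} still rate-limited after {max_attempts} attempts")


# Offline demonstration: a stand-in for the network call that is
# rate-limited once and then succeeds. Waits are recorded, not slept.
responses = iter([(429, ""), (200, "<html>ok</html>")])
waits = []
body = fetch_politely(
    lambda url: next(responses),
    "https://fbref.com/en/comps/9/Premier-League-Stats",
    sleep=waits.append,
)
print(body, waits)
```

With requests, `get` could be a small wrapper that calls `requests.get(url)` and returns `(response.status_code, response.text)`.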

@angshumanraj

@scarecrow165 I am still facing the same problem.
