This project scrapes a Google Summer of Code (GSoC) project archive URL and converts the scraped data into a CSV file. It also filters the students of IIT Kanpur out of the scraped data.
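The CSV conversion step can be pictured with the standard library's csv module. The sketch below is illustrative only: the column names in FIELDS and the function records_to_csv are hypothetical and are not taken from scrape_data.py. It assumes each scraped project has already been collected as a dict.

```python
import csv
import io

# Hypothetical column names; the real scraper's columns may differ.
FIELDS = ["student", "organization", "project"]


def records_to_csv(records, fileobj):
    """Write a list of dicts (one per scraped project) to fileobj as CSV."""
    writer = csv.DictWriter(fileobj, fieldnames=FIELDS)
    writer.writeheader()     # first row holds the column names
    writer.writerows(records)


if __name__ == "__main__":
    rows = [
        {"student": "Jane Doe", "organization": "Example Org", "project": "Demo"},
    ]
    buf = io.StringIO()      # in-memory stand-in for an open .csv file
    records_to_csv(rows, buf)
    print(buf.getvalue())
```

For a real file, open it with `open("projects.csv", "w", newline="")` (hypothetical filename) so the csv module controls line endings.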
The default webpage to be scraped is the GSoC 2019 archive. To scrape a different Google Summer of Code archive, enter its URL when prompted.
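A "default unless the user enters something" prompt usually reduces to a small helper like the one below. It is a sketch, not the project's code: choose_url is a hypothetical name, and the DEFAULT_URL value is an assumed address for the 2019 archive. The authoritative default lives in scrape_data.py.

```python
# Assumed address of the GSoC 2019 archive; check scrape_data.py for the real default.
DEFAULT_URL = "https://summerofcode.withgoogle.com/archive/2019/projects/"


def choose_url(user_input):
    """Return the URL to scrape: the user's entry, or DEFAULT_URL if left blank."""
    entered = user_input.strip()
    return entered if entered else DEFAULT_URL


# Usage in a script (interactive):
#   url = choose_url(input("Archive URL (leave blank for GSoC 2019): "))
```

Keeping the decision in a pure function, separate from input(), makes the default behaviour easy to check without typing anything.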
The provided 'student.json' file is used as-is.
Use the package manager pip to install the requests and bs4 packages:
pip install requests
pip install bs4
First, change into the 'scrape' directory:
cd scrape
Then run the following scripts, in this order:
python scrape_data.py
python sanitize_and_combine_data.py
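One plausible shape for the sanitize-and-combine step is shown below: clean up names, then keep only scraped projects whose student is listed as being from IIT Kanpur. This is a hypothetical sketch, not the contents of sanitize_and_combine_data.py. It assumes 'student.json' is a list of objects with "name" and "college" keys, and that each scraped row has a "student" key. Both layouts are guesses.

```python
import json


def normalize(name):
    """Collapse internal whitespace and case-fold so names compare reliably."""
    return " ".join(name.split()).casefold()


def iitk_projects(scraped_rows, students, college="IIT Kanpur"):
    """Return scraped rows whose student appears in `students` with the given college.

    Assumes (hypothetically) that each student record has "name" and "college" keys.
    """
    iitk_names = {
        normalize(s["name"]) for s in students if s.get("college") == college
    }
    return [row for row in scraped_rows if normalize(row["student"]) in iitk_names]


if __name__ == "__main__":
    # In practice: with open("student.json") as f: students = json.load(f)
    students = json.loads(
        '[{"name": "Jane  Doe", "college": "IIT Kanpur"},'
        ' {"name": "John Roe", "college": "Other"}]'
    )
    rows = [
        {"student": "jane doe", "project": "Demo"},
        {"student": "John Roe", "project": "X"},
    ]
    print(iitk_projects(rows, students))
```

Building a set of normalized names first keeps each lookup constant-time, so the filter stays linear in the number of scraped rows.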