This project scrapes "Data Analyst" job listings from LinkedIn, AngelList, Built In NYC and Entertainment Careers. I've narrowed the search to jobs that are located in New York City or are remote, and filtered for "entry-level" or "associate" positions. For LinkedIn, the search is limited to listings posted within the month before the scrape date.
The main purpose of this project is to demonstrate foundational web-scraping skills using BeautifulSoup and Selenium. As a graduating senior in the middle of a job hunt, I thought it was natural to compile a list of relevant job openings that could help my search. The list is refreshed daily by a scheduled task on PythonAnywhere.
Entertainment Careers and Built In NYC are scraped with BeautifulSoup, since they're static web pages that can be paginated by adjusting URL parameters. LinkedIn and AngelList are both dynamic, JavaScript-driven websites that use some form of infinite scrolling. For these, Selenium automates the web interactions: scrolling down the page and extracting job features along the way. After scraping all the websites, I deduplicate any recurring job listings and export the results to Google Sheets via Google Drive's API.
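As a rough illustration of two of the steps above, here is a minimal sketch of building paginated URLs for a static site and deduplicating the combined listings. It is not taken from job-scraper.py: the function names, the `page` query parameter, the example URL, and the choice to treat two listings as duplicates when their title and company match (ignoring case and extra spaces) are all assumptions made for this example.

```python
from urllib.parse import urlencode


def page_urls(base_url, pages, page_param="page"):
    """Build one URL per results page by varying a query parameter.

    The parameter name "page" is an assumption; real sites may use
    a different name or an offset instead of a page number.
    """
    return [f"{base_url}?{urlencode({page_param: n})}" for n in range(1, pages + 1)]


def normalize(text):
    """Lowercase and collapse whitespace so near-identical strings compare equal."""
    return " ".join(text.lower().split())


def dedupe(listings):
    """Keep the first listing seen for each (title, company) pair.

    Matching on title and company is an assumed rule, not necessarily
    the one used in the actual scraper.
    """
    seen = set()
    unique = []
    for job in listings:
        key = (normalize(job["title"]), normalize(job["company"]))
        if key not in seen:
            seen.add(key)
            unique.append(job)
    return unique


if __name__ == "__main__":
    # Hypothetical URL and listings, for illustration only.
    for url in page_urls("https://example.com/jobs", 3):
        print(url)

    scraped = [
        {"title": "Data Analyst", "company": "Acme", "source": "LinkedIn"},
        {"title": "data  analyst", "company": "ACME", "source": "Built In NYC"},
        {"title": "Associate Data Analyst", "company": "Acme", "source": "AngelList"},
    ]
    for job in dedupe(scraped):
        print(job["title"], "|", job["company"], "|", job["source"])
```

Keeping the first occurrence means the order in which sites are scraped decides which source a duplicated listing is attributed to.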
- job-scraper.py: Full code to scrape all four websites and export the results to Google Sheets.
- Search Results - Google Sheets: The scrape results are exported to this Google Sheets spreadsheet.
- Taku's LinkedIn