This project is a job data scraper that extracts job applicant details from the German Federal Employment Agency's website. The data is fetched, parsed, and saved in both JSON and Excel formats.
- Prerequisites
- Installation
- Configuration
- Usage
- Logging
- Contributing
- License

Prerequisites
- Python 3.8 or higher
- pip (Python package installer)
- Google Chrome browser
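Before installing, you can confirm that your interpreter meets the minimum version. The following is a convenience sketch, not a script shipped with the project; the function name `python_is_supported` is hypothetical.

```python
import sys

# Minimum interpreter version from the prerequisites list.
MIN_PYTHON = (3, 8)


def python_is_supported(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the given interpreter version meets the minimum."""
    # Compare only (major, minor); the micro version does not matter here.
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    current = ".".join(map(str, sys.version_info[:3]))
    if python_is_supported():
        print(f"Python {current} meets the 3.8+ requirement.")
    else:
        sys.exit(f"Python {current} is too old; 3.8 or higher is required.")
```

Run it with the same `python` command you will use to create the virtual environment, since that interpreter is the one the project will use.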

Installation
- Clone the repository:
  git clone https://github.com/faisal-fida/job-data-scraper.git
  cd job-data-scraper
- Create a virtual environment:
  python -m venv venv
- Activate the virtual environment:
  - On Windows:
    venv\Scripts\activate
  - On macOS/Linux:
    source venv/bin/activate
- Install the required packages:
  pip install -r requirements.txt
- Install the Chrome browser for Playwright:
  python -m playwright install chrome
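To check that the installation steps took effect, a small sanity check can confirm that a virtual environment is active and that Playwright is importable. This is a sketch, not part of the project; it assumes `playwright` is listed in requirements.txt (suggested by the `python -m playwright` step), and the helper names are hypothetical.

```python
import importlib.util
import sys


def in_virtualenv():
    """Return True when running inside a venv created with `python -m venv`."""
    # Inside a venv, sys.prefix points at the venv directory while
    # sys.base_prefix still points at the base interpreter installation.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def missing_packages(names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    # "playwright" is assumed to come from requirements.txt.
    missing = missing_packages(["playwright"])
    print("missing packages:", ", ".join(missing) or "none")
```

If the first line prints `False`, activate the virtual environment again before running the project.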

Configuration
Copy config.py into the same directory as app.py and url_fetcher.py.
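Because the scripts expect config.py next to app.py and url_fetcher.py, a quick check from the project directory can catch a misplaced file. This sketch only tests for the three file names given in this step; it does not validate the contents of config.py, which are not documented here, and `missing_project_files` is a hypothetical helper.

```python
from pathlib import Path

# File names taken from the configuration step.
REQUIRED_FILES = ("config.py", "app.py", "url_fetcher.py")


def missing_project_files(project_dir="."):
    """Return the required files that are absent from project_dir."""
    root = Path(project_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_project_files()
    if missing:
        print("Missing from this directory:", ", ".join(missing))
    else:
        print("config.py sits alongside app.py and url_fetcher.py.")
```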

Usage
- Fetch the URLs and extract the data:
  python app.py
- The extracted data is saved in the output directory as data.json and parsed_data.xlsx.
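To work with the results programmatically, the JSON output can be loaded with the standard library. The record layout inside data.json is not documented here, so this sketch assumes the file holds a JSON array of objects and fails loudly otherwise; `load_records` and `field_names` are hypothetical helpers.

```python
import json
from pathlib import Path

# Output location named in the usage step. The layout (a JSON array of
# objects) is an assumption, not something this README specifies.
DATA_FILE = Path("output") / "data.json"


def load_records(path=DATA_FILE):
    """Load scraped records from the JSON output file."""
    with open(path, encoding="utf-8") as fh:
        data = json.load(fh)
    if not isinstance(data, list):
        raise ValueError(f"expected a JSON array in {path}, got {type(data).__name__}")
    return data


def field_names(records):
    """Return the sorted set of keys used across all dict records."""
    return sorted({key for rec in records if isinstance(rec, dict) for key in rec})


if __name__ == "__main__":
    if DATA_FILE.is_file():
        records = load_records()
        print(f"{len(records)} records; fields: {', '.join(field_names(records))}")
    else:
        print(f"{DATA_FILE} not found; run `python app.py` first.")
```

The Excel file can be opened in any spreadsheet application or, if pandas and openpyxl are installed, read with `pandas.read_excel("output/parsed_data.xlsx")`.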

Logging
The application logs its activity to the console, which helps with debugging and with monitoring a run in progress.
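This README does not say how logging is set up. If the application uses Python's standard `logging` module (an assumption), console verbosity and format can be controlled with `logging.basicConfig`, as in the sketch below. The helper `configure_console_logging` and the logger name `scraper` are hypothetical; `force=True` requires Python 3.8, which matches the prerequisites.

```python
import logging


def configure_console_logging(level=logging.INFO, stream=None):
    """Route log records to the console (stderr when stream is None)."""
    logging.basicConfig(
        level=level,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
        stream=stream,
        force=True,  # replace handlers from any earlier configuration
    )
    return logging.getLogger()


if __name__ == "__main__":
    configure_console_logging(logging.DEBUG)
    log = logging.getLogger("scraper")  # hypothetical logger name
    log.debug("debug messages are visible at DEBUG level")
    log.info("info messages are visible at INFO level and below")
```

Lowering the level to `logging.DEBUG` shows more detail while troubleshooting; `logging.WARNING` keeps the console quiet during normal runs.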