Job Data Scraper

Job Data Scraper extracts job applicant details from the German Federal Employment Agency's website. It fetches and parses the data, then saves it in both JSON and Excel formats.

Table of Contents

  • Prerequisites
  • Installation
  • Configuration
  • Usage
  • Logging
  • Contributing
  • License

Prerequisites

  • Python 3.8 or higher
  • pip (Python package installer)
  • Google Chrome browser

Installation

  1. Clone the repository:

    git clone https://github.com/faisal-fida/job-data-scraper.git
    cd job-data-scraper
  2. Create a virtual environment:

    python -m venv venv
  3. Activate the virtual environment:

    • On Windows:

      venv\Scripts\activate
    • On macOS/Linux:

      source venv/bin/activate
  4. Install the required packages:

    pip install -r requirements.txt
  5. Install Chrome for Playwright:

    python -m playwright install chrome
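
To confirm the installation, it can help to check the interpreter version and whether the Playwright package is importable. The sketch below is not part of the repository: the function name check_environment is hypothetical, and it only checks that the package can be found, not that the browser itself launches.

```python
import importlib.util
import sys


def check_environment(min_version=(3, 8)):
    """Return a dict describing whether the basic prerequisites are met."""
    return {
        # The Prerequisites section asks for Python 3.8 or higher.
        "python_ok": sys.version_info[:2] >= tuple(min_version),
        # True if the playwright package is installed in this environment.
        "playwright_installed": importlib.util.find_spec("playwright") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'yes' if ok else 'no'}")
```

Run it inside the activated virtual environment so that it inspects the same packages app.py will use.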

Configuration

Place config.py in the same directory as app.py and url_fetcher.py.
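
The layout described above can be verified before running the scraper. This is a minimal sketch, assuming only the three file names mentioned in this section; the helper missing_files is hypothetical and does not inspect the contents of config.py.

```python
from pathlib import Path

# File names taken from the Configuration section of this README.
REQUIRED_FILES = ("app.py", "url_fetcher.py", "config.py")


def missing_files(project_dir="."):
    """Return the required files that are not present in project_dir."""
    root = Path(project_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_files()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All required files are in place.")
```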

Usage

  1. Fetch URLs and extract data:

    python app.py
  2. The extracted data will be saved in the output directory as data.json and parsed_data.xlsx.
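
After a run, the JSON output can be inspected with a few lines of Python. The sketch below assumes only that output/data.json is valid JSON; the structure of the records is not documented here, so it reports just the top-level type and size. The function name summarize_json is hypothetical.

```python
import json
from pathlib import Path


def summarize_json(path):
    """Load a JSON file and return (top-level type name, number of items)."""
    with Path(path).open(encoding="utf-8") as handle:
        data = json.load(handle)
    # len() only makes sense for containers such as lists and dicts.
    size = len(data) if isinstance(data, (list, dict)) else None
    return type(data).__name__, size


if __name__ == "__main__":
    target = Path("output") / "data.json"  # location named in this README
    if target.is_file():
        kind, size = summarize_json(target)
        print(f"{target}: top-level {kind} with {size} entries")
    else:
        print(f"{target} not found; run python app.py first.")
```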

Logging

The application logs its activity to the console, which helps with debugging and monitoring.
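
Console logging of this kind is commonly set up with Python's standard logging module. The sketch below shows one such setup; whether app.py configures logging this way is an assumption, and the logger name and format string are illustrative only.

```python
import logging
import sys


def build_console_logger(name="job_scraper", level=logging.INFO):
    """Return a logger that writes timestamped messages to the console."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        handler = logging.StreamHandler(sys.stdout)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    log = build_console_logger()
    log.info("Fetching URLs")
    log.warning("Example warning message")
```

Passing level=logging.DEBUG produces more detailed output when troubleshooting.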