
This project is a job data scraper that extracts job applicant details from the German Federal Employment Agency's website. The data is fetched, parsed, and saved in both JSON and Excel formats.


faisal-fida/Job-Data-Scraper


Job Data Scraper


Table of Contents

  • Prerequisites
  • Installation
  • Configuration
  • Usage
  • Logging
  • Contributing
  • License

Prerequisites

  • Python 3.8 or higher
  • pip (Python package installer)
  • Google Chrome browser
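To confirm the Python-side prerequisites before installing, a short standard-library script can check the interpreter version and whether pip is importable. This is a sketch that is not part of the repository. The function name `check_prerequisites` is hypothetical. Chrome is not checked because its executable name and install location vary by platform.

```python
import importlib.util
import sys

# Minimum Python version listed under Prerequisites.
MIN_PYTHON = (3, 8)


def check_prerequisites() -> list:
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    if sys.version_info < MIN_PYTHON:
        # MIN_PYTHON + (major, minor) gives the four numbers for the message.
        problems.append(
            "Python %d.%d+ required, found %d.%d"
            % (MIN_PYTHON + sys.version_info[:2])
        )
    # pip is importable as a module when it is installed for this interpreter.
    if importlib.util.find_spec("pip") is None:
        problems.append("pip is not available for this interpreter")
    return problems


if __name__ == "__main__":
    issues = check_prerequisites()
    for issue in issues:
        print("Missing prerequisite:", issue)
    if not issues:
        print("Python and pip look fine; check Google Chrome separately.")
```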

Installation

  1. Clone the repository:

    git clone https://github.com/faisal-fida/job-data-scraper.git
    cd job-data-scraper
  2. Create a virtual environment:

    python -m venv venv
  3. Activate the virtual environment:

    • On Windows:

      venv\Scripts\activate
    • On macOS/Linux:

      source venv/bin/activate
  4. Install the required packages:

    pip install -r requirements.txt
  5. Install the Chrome browser channel used by Playwright:

    python -m playwright install chrome

Configuration

Copy the config.py file to the same directory as app.py and url_fetcher.py.
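A quick way to confirm the file layout described above is to check that all three files sit in the directory you run the scraper from. The following is a minimal sketch. It assumes that directory is the current working directory, and the helper name `missing_project_files` is hypothetical.

```python
from pathlib import Path

# File names taken from the Configuration section above.
REQUIRED_FILES = ("config.py", "app.py", "url_fetcher.py")


def missing_project_files(project_dir=".") -> list:
    """Return the required files that are not present in project_dir."""
    base = Path(project_dir)
    return [name for name in REQUIRED_FILES if not (base / name).is_file()]


if __name__ == "__main__":
    missing = missing_project_files()
    if missing:
        print("Missing from the project directory:", ", ".join(missing))
    else:
        print("config.py sits alongside app.py and url_fetcher.py.")
```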

Usage

  1. Fetch URLs and extract data:

    python app.py
  2. The extracted data will be saved in the output directory as data.json and parsed_data.xlsx.
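After a run, you can load the JSON output back into Python for further processing. This sketch uses only the standard library. It assumes data.json is UTF-8 encoded (plain `json.dump` output is ASCII, which is compatible). It makes no assumption about the structure of each record, since that depends on what app.py extracts. The name `load_scraped_data` is hypothetical.

```python
import json
from pathlib import Path

# Output location described under Usage.
OUTPUT_DIR = Path("output")


def load_scraped_data(output_dir=OUTPUT_DIR):
    """Load data.json written by app.py and return the parsed JSON value."""
    json_path = Path(output_dir) / "data.json"
    with json_path.open(encoding="utf-8") as fh:
        return json.load(fh)


if __name__ == "__main__":
    data = load_scraped_data()
    # Count top-level entries without assuming a particular record layout.
    count = len(data) if isinstance(data, (list, dict)) else 1
    print(f"Loaded {count} top-level entries from output/data.json")
```

To inspect the spreadsheet instead, `pandas.read_excel("output/parsed_data.xlsx")` can load it, provided pandas and openpyxl (which pandas uses for .xlsx files) are installed.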

Logging

The application logs its activities, which can be helpful for debugging and monitoring. The logs are printed to the console.
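This README does not say how app.py configures its logger. A common standard-library setup for console logging looks like the following sketch. The logger name `job_data_scraper`, the message format and the helper `setup_console_logging` are hypothetical choices for illustration.

```python
import logging
import sys


def setup_console_logging(level=logging.INFO):
    """Send log records to stdout with a timestamped format (hypothetical helper)."""
    logger = logging.getLogger("job_data_scraper")  # hypothetical logger name
    logger.setLevel(level)
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        handler = logging.StreamHandler(sys.stdout)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        )
        logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    log = setup_console_logging(logging.DEBUG)
    log.info("Fetching URLs")
    log.debug("Parsed %d records", 0)
```

Passing `logging.DEBUG` shows more detail while debugging. The `if not logger.handlers` guard stops repeated calls from printing each message twice.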
