
Scrape 3.5 GB of tabular data from Kazhydromet website. Tech: Python (selenium, os, pandas, shutil, nominatim), HTML/CSS


SaniyaAbushakimova/Kazhydromet-Web-Scraping


Project completed on August 24, 2023.

Project description

Kazhydromet serves as the national hydrometeorological agency of the Republic of Kazakhstan, offering comprehensive hydrological data sourced from 227 stations distributed across the country.

The Kazhydromet-Web-Scraping project automates the retrieval of data from Kazhydromet's Meteorological Database for the period from 01/01/2000 to 30/04/2023. Retrieving this data manually would be extremely time-consuming because the database holds 3.5 GB of tabular data spread across a very large number of tables. The process was automated using Python and the Selenium framework.
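One part of this kind of automation is filing each table once the browser has finished downloading it. The sketch below is only an illustration using the standard library (`shutil`, `pathlib`); it is not taken from `download_data.py`. The function names `wait_for_download` and `file_table`, and the `root/index/region/station` layout, are hypothetical.

```python
import shutil
import time
from pathlib import Path

# Suffixes browsers give to in-progress downloads (Chrome: .crdownload, Firefox: .part).
PARTIAL_SUFFIXES = {".crdownload", ".part"}


def wait_for_download(download_dir: Path, timeout: float = 60.0, poll: float = 0.5) -> Path:
    """Return the newest finished file in download_dir, or raise TimeoutError."""
    deadline = time.monotonic() + timeout
    while True:
        finished = [
            p for p in download_dir.iterdir()
            if p.is_file() and p.suffix not in PARTIAL_SUFFIXES
        ]
        if finished:
            # Pick the most recently modified finished file.
            return max(finished, key=lambda p: p.stat().st_mtime)
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no finished download in {download_dir}")
        time.sleep(poll)


def file_table(downloaded: Path, root: Path, index: str, region: str, station: str) -> Path:
    """Move a downloaded table to root/index/region/station<ext> (hypothetical layout)."""
    target_dir = root / index / region
    target_dir.mkdir(parents=True, exist_ok=True)
    # Keep whatever file extension the site served.
    target = target_dir / f"{station}{downloaded.suffix}"
    shutil.move(str(downloaded), str(target))
    return target
```

In a Selenium-driven script, these helpers would typically be called right after the step that triggers a download, so that each file is moved out of the download folder before the next table is requested.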

Overview of the Kazhydromet Database

About the website:

  • The website URL is: https://www.kazhydromet.kz/ru/. By default, the language is set to Russian, but it can be switched to English.
  • Following the guide below, we can access the target database for scraping:

[Image: Guide 1]

About the database:

  • The database includes 11 meteorological indices: Temperature, Partial Pressure, Relative Humidity, and 8 others.
  • Data is categorized by regions (17 in total across Kazakhstan) and respective stations within each region (totaling 227 stations).
  • Our goal is to scrape data from 01/01/2000 to 30/04/2023.
  • Each table contains approximately 8,521 entries and is around 800 KB in size. Below is a screenshot of the database with comments.

[Image: Guide 2]
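The figure of roughly 8,521 entries per table can be checked against the scraping window. Counted inclusively, 01/01/2000 to 30/04/2023 spans exactly 8,521 calendar days, which suggests one entry per day (the source does not state this explicitly). A minimal check, with `days_inclusive` as an illustrative helper name:

```python
from datetime import date

START = date(2000, 1, 1)   # 01/01/2000
END = date(2023, 4, 30)    # 30/04/2023


def days_inclusive(start: date, end: date) -> int:
    """Number of calendar days from start to end, counting both endpoints."""
    return (end - start).days + 1


n_days = days_inclusive(START, END)
print(n_days)  # 8521
```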

Other details

  • To launch the script:
python .\download_data.py
  • kazgydromet_data/темп/возд -- a sample of the data that was successfully scraped from the Kazhydromet Database.
  • station_geocode/nominatim.ipynb -- provides geographic coordinates for each hydrological station run by Kazhydromet.
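As an illustration of how station coordinates can be looked up with Nominatim, the sketch below queries the public Nominatim search endpoint using only the standard library. This is an assumption about the general approach, not the contents of `nominatim.ipynb`, which may use a client library instead. The helper names and the `user_agent` value are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

NOMINATIM_SEARCH = "https://nominatim.openstreetmap.org/search"


def build_search_url(station_name: str, country: str = "Kazakhstan") -> str:
    """Build a search URL asking for the single best match as JSON."""
    params = {"q": f"{station_name}, {country}", "format": "json", "limit": 1}
    return f"{NOMINATIM_SEARCH}?{urlencode(params)}"


def parse_first_hit(payload: str):
    """Return (lat, lon) of the first result, or None if there were no results."""
    hits = json.loads(payload)
    if not hits:
        return None
    # Nominatim returns coordinates as strings, so convert them to floats.
    return float(hits[0]["lat"]), float(hits[0]["lon"])


def geocode_station(station_name: str, user_agent: str = "kazhydromet-example"):
    """Look up one station over the network; the User-Agent value is a placeholder."""
    request = Request(build_search_url(station_name), headers={"User-Agent": user_agent})
    with urlopen(request, timeout=10) as response:
        return parse_first_hit(response.read().decode("utf-8"))
```

The public Nominatim service expects an identifying User-Agent and a low request rate (at most one request per second), so a loop over all 227 stations should pause between calls.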
