Pyhton app for downloading Eurostat tables and prepare them for loading in local MySQL using csv.
Eurostat publishes statistical data in a tsv.gz format, for the bulk download facility.
Eurostat tsv files contain data using distict columns for each time (date). This format is not a "nice" format for using the data in a database. To use the data in a database the data (should/might/could) be reshaped i an row format (transform them from a wide to a long format) .MySQL does not have a "melt" function, and the unpivoting is not completely straightforward. A easier apporach is to unpivot the data before entering them in the database. A simple pandas operation could have been used (i.e. melt) but the desigh choice was to use minimal requirements, as the goal is to load the data in an (My)SQL db and use them from there
While Eurostat has published a number of veryu helpfull tools for manipulating data , a db-> db facility is yet non existest. The rationale behind this script is to load the eurostat tables in a local SQL db, and to be able to perform all the data manipulation locally. Moreover, more often than not, It is necessary to use subset of data from multiple tables. A SQL approach seems a better fit.
The pyhton package has to be built & installed.
- Building the package : python3 -m build
- Installing the package : python3 -m pip ./dist/eurostat2csv-0.5.0-py3-none-any.whl
The package is bundled as a small python app
eurostat2csv table_name
(the script will be installed out of path, so care has to by taken)
- -v Do not update local inventory, if inventory.txt exists, it will not be download again
- -f Force update of TSV file. File will download even if local tsv file has been modified in a later day than last change of eurostat
- -p=path, use custom path
The output is
- tablename.csv
- tablename.sql (sql commands to create table and load csv onto the database)
The data can be fed into a MySQL database with mysql -e "source ./tablename.sql"
- Run test cases for verifying correctness of script
- More internall checks for verifying data is interpreted correctly
- Clean the code
- support for comporessed files