deal with CDS API and format change #13

Open
mbjones opened this issue Sep 18, 2024 · 1 comment
Labels
data management Data management

Comments

mbjones commented Sep 18, 2024

CDS seems to have made a major change to their service and is discontinuing the old API on September 26, 2024. They have a page describing the changes here:

https://confluence.ecmwf.int/display/CKB/Please+read%3A+CDS+and+ADS+migrating+to+new+infrastructure%3A+Common+Data+Store+%28CDS%29+Engine

I was able to create a new account, and the API looks to be more or less the same, so I suspect the download code in cdstool.py will migrate fine once we update the Python client library. However, it looks like they changed the format of the NetCDF files significantly for the ERA5 dataset (ugh). Here's the page describing the format differences:

https://confluence.ecmwf.int/display/CKB/GRIB+to+netCDF+conversion+on+CDS-Beta+and+ADS-Beta

That page describes the following as the main NetCDF differences for the ERA5 dataset:

  • NetCDF3 → NetCDF4 (including compression options)
  • Changes to metadata attributes in files
  • Ordering of dimensions
  • Changes to time dimension names
  • Splitting of files when incompatibilities are detected
  • Format and metadata will be consistent with the post-processed data (e.g. daily statistics)

I suspect these changes could have an impact on how tiletool.py works and how we build our integrated dataset.
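
A minimal sketch of how the renamed time dimension and the extra coordinates could be mapped back to an older-style layout before tiling. Everything here is an assumption for illustration: the helper names (`rename_map`, `is_new_format`, `to_legacy_layout`) are hypothetical and not taken from tiletool.py, the legacy dimension name `time` and the order `(time, latitude, longitude)` are assumed rather than checked against our old files, and `to_legacy_layout` expects an `xarray.Dataset`.

```python
# Assumed mapping from new-format names to the names the older files used.
LEGACY_NAMES = {"valid_time": "time"}
# Auxiliary coordinates that the new-format files may carry.
EXTRA_COORDS = ("number", "expver")
# Assumed dimension order expected by downstream code.
LEGACY_DIM_ORDER = ("time", "latitude", "longitude")


def rename_map(names):
    """Return the part of LEGACY_NAMES that applies to the given names."""
    return {old: new for old, new in LEGACY_NAMES.items() if old in names}


def is_new_format(names):
    """Guess the file generation from its dimension/variable names."""
    return "valid_time" in names


def to_legacy_layout(ds):
    """Return a dataset with legacy-style names, coordinates, and dim order.

    `ds` is expected to be an xarray.Dataset; only Dataset.rename,
    Dataset.drop_vars, and Dataset.transpose are used.
    """
    names = set(ds.dims) | set(ds.variables)
    ds = ds.rename(rename_map(names))
    ds = ds.drop_vars(list(EXTRA_COORDS), errors="ignore")
    return ds.transpose(*LEGACY_DIM_ORDER)
```

With xarray installed, usage would be along the lines of `to_legacy_layout(xarray.open_dataset(path))`. Old-format files pass through the rename step unchanged because `rename_map` returns an empty mapping for them.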

mbjones added the data management label Sep 18, 2024

mbjones commented Sep 18, 2024

I downloaded the new client tool (cdsapi 0.7.3) and updated my API keys, and verified that cdstool still handles the downloads properly. I uploaded several example files of the new format to our shared drive folder here:

datateam:/data/HWITW/input/cds_era5/2024/10m_u_component_of_wind/global-2024-001-10m_u_component_of_wind.nc
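
For reference, a rough sketch of what a one-day request for a file like this might look like with the updated client. This is not the code in cdstool.py: the helpers `build_request`, `target_name`, and `download` are hypothetical, the dataset name `reanalysis-era5-single-levels` and the request keys (including `data_format` and `download_format`) are assumptions based on the new CDS request form, and reading the filename as `global-<year>-<day of year>-<variable>.nc` is a guess from the example path.

```python
from datetime import date, timedelta


def build_request(variable, year, day_of_year):
    """Build a one-day, hourly request dict (hypothetical helper)."""
    day = date(year, 1, 1) + timedelta(days=day_of_year - 1)
    return {
        "product_type": ["reanalysis"],
        "variable": [variable],
        "year": [f"{day.year}"],
        "month": [f"{day.month:02d}"],
        "day": [f"{day.day:02d}"],
        "time": [f"{hour:02d}:00" for hour in range(24)],
        # Assumed keys for the new service; check against the request form.
        "data_format": "netcdf",
        "download_format": "unarchived",
    }


def target_name(variable, year, day_of_year):
    """File name following the pattern guessed from the example path."""
    return f"global-{year}-{day_of_year:03d}-{variable}.nc"


def download(variable, year, day_of_year):
    """Fetch one day of data; needs cdsapi and configured API credentials."""
    import cdsapi  # imported lazily so the helpers above work without it

    target = target_name(variable, year, day_of_year)
    client = cdsapi.Client()
    client.retrieve(
        "reanalysis-era5-single-levels",  # assumed dataset name
        build_request(variable, year, day_of_year),
        target,
    )
    return target
```

Calling `download("10m_u_component_of_wind", 2024, 1)` would, under these assumptions, write a file named like the example above into the current directory.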

The metadata for one of these new-format NetCDF files is:

netcdf global-2024-001-10m_u_component_of_wind {
dimensions:
	valid_time = 24 ;
	latitude = 721 ;
	longitude = 1440 ;
variables:
	int64 number ;
		string number:long_name = "ensemble member numerical id" ;
		string number:units = "1" ;
		string number:standard_name = "realization" ;
	int64 valid_time(valid_time) ;
		string valid_time:long_name = "time" ;
		string valid_time:standard_name = "time" ;
		string valid_time:units = "seconds since 1970-01-01" ;
		string valid_time:calendar = "proleptic_gregorian" ;
	double latitude(latitude) ;
		latitude:_FillValue = NaN ;
		string latitude:units = "degrees_north" ;
		string latitude:standard_name = "latitude" ;
		string latitude:long_name = "latitude" ;
		string latitude:stored_direction = "decreasing" ;
	double longitude(longitude) ;
		longitude:_FillValue = NaN ;
		string longitude:units = "degrees_east" ;
		string longitude:standard_name = "longitude" ;
		string longitude:long_name = "longitude" ;
	string expver(valid_time) ;
	float u10(valid_time, latitude, longitude) ;
		u10:_FillValue = NaNf ;
		u10:GRIB_paramId = 165LL ;
		string u10:GRIB_dataType = "an" ;
		u10:GRIB_numberOfPoints = 1038240LL ;
		string u10:GRIB_typeOfLevel = "surface" ;
		u10:GRIB_stepUnits = 1LL ;
		string u10:GRIB_stepType = "instant" ;
		string u10:GRIB_gridType = "regular_ll" ;
		u10:GRIB_uvRelativeToGrid = 0LL ;
		u10:GRIB_NV = 0LL ;
		u10:GRIB_Nx = 1440LL ;
		u10:GRIB_Ny = 721LL ;
		string u10:GRIB_cfName = "unknown" ;
		string u10:GRIB_cfVarName = "u10" ;
		string u10:GRIB_gridDefinitionDescription = "Latitude/Longitude Grid" ;
		u10:GRIB_iDirectionIncrementInDegrees = 0.25 ;
		u10:GRIB_iScansNegatively = 0LL ;
		u10:GRIB_jDirectionIncrementInDegrees = 0.25 ;
		u10:GRIB_jPointsAreConsecutive = 0LL ;
		u10:GRIB_jScansPositively = 0LL ;
		u10:GRIB_latitudeOfFirstGridPointInDegrees = 90. ;
		u10:GRIB_latitudeOfLastGridPointInDegrees = -90. ;
		u10:GRIB_longitudeOfFirstGridPointInDegrees = 0. ;
		u10:GRIB_longitudeOfLastGridPointInDegrees = 359.75 ;
		u10:GRIB_missingValue = 3.40282346638529e+38 ;
		string u10:GRIB_name = "10 metre U wind component" ;
		string u10:GRIB_shortName = "10u" ;
		u10:GRIB_totalNumber = 0LL ;
		string u10:GRIB_units = "m s**-1" ;
		string u10:long_name = "10 metre U wind component" ;
		string u10:units = "m s**-1" ;
		string u10:standard_name = "unknown" ;
		u10:GRIB_surface = 0. ;
		string u10:coordinates = "number valid_time latitude longitude expver" ;

// global attributes:
		string :GRIB_centre = "ecmf" ;
		string :GRIB_centreDescription = "European Centre for Medium-Range Weather Forecasts" ;
		:GRIB_subCentre = 0LL ;
		string :Conventions = "CF-1.7" ;
		string :institution = "European Centre for Medium-Range Weather Forecasts" ;
		string :history = "2024-09-18T01:58 GRIB to CDM+CF via cfgrib-0.9.14.0/ecCodes-2.36.0 with {\"source\": \"data.grib\", \"filter_by_keys\": {\"stream\": [\"oper\"]}, \"encode_cf\": [\"parameter\", \"time\", \"geography\", \"vertical\"]}" ;
}
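
As a quick consistency check on the header above, the grid attributes can be reconciled with the dimension sizes, and the `valid_time` units can be decoded with the standard library. This is only a sketch: `axis_length` and `decode_valid_time` are hypothetical helpers, the timestamps are assumed to be UTC, and the sample timestamp below is illustrative rather than read from the file.

```python
from datetime import datetime, timezone


def axis_length(first, last, step):
    """Number of points on a regular axis from first to last, inclusive."""
    return round(abs(last - first) / step) + 1


def decode_valid_time(seconds):
    """Decode 'seconds since 1970-01-01' into a datetime, assuming UTC."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


# Values copied from the GRIB_* attributes in the header.
n_lat = axis_length(90.0, -90.0, 0.25)   # latitude: 90 down to -90
n_lon = axis_length(0.0, 359.75, 0.25)   # longitude: 0 up to 359.75

print(n_lat, n_lon, n_lat * n_lon)

# Illustrative timestamp (not taken from the file).
print(decode_valid_time(1704067200).isoformat())
```

The axis lengths computed this way should agree with the `latitude` and `longitude` dimensions, and their product with `GRIB_numberOfPoints`.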
