Extrapolation of CMEMS initial field #1021
Comments
Some ideas from @Huite:
Related xarray issue: pydata/xarray#6360, which has 3 useful comments by @Huite with examples.
Related imod implementation (but use
Or preferably Laplace interpolation as in xugrid, although it requires tweaking the settings there in some cases: https://deltares.github.io/xugrid/api/xugrid.UgridDataArrayAccessor.laplace_interpolate.html
Another option is to use:

    import numpy as np
    import xarray as xr
    import rioxarray  # registers the .rio accessor on xarray objects

    data_xr.rio.write_crs('EPSG:4326', inplace=True)  # Set coordinate system (assume WGS84)
    for var in data_xr.data_vars:
        data_xr[var].rio.write_nodata(np.nan, inplace=True)  # Set nodata for all variables
    interpolated = []
    for t in range(data_xr.sizes['time']):  # fill each timestep separately
        interpolated.append(data_xr.isel(time=t).rio.interpolate_na(method='nearest'))
    data_xr = xr.concat(interpolated, dim='time')

Or in fewer lines of code, but with the same for-loops:

    data_xr.rio.write_crs('EPSG:4326', inplace=True)  # Set coordinate system (assume WGS84)
    [data_xr[var].rio.write_nodata(np.nan, inplace=True) for var in data_xr.data_vars]  # Set nodata for all variables
    data_xr = xr.concat([data_xr.isel(time=t).rio.interpolate_na(method='nearest') for t in range(data_xr.sizes['time'])], dim='time')
Nice, the result looks good! But since it is xarray, I would expect one could do this without loops, or not? It might not matter too much since we are looking at initial netcdf files here, containing two (and in the future hopefully just one) timestep. But for-loops should in general be avoided, also since we probably want to apply this in a modular function, for instance to interpolate/extrapolate datasets for bc-files. Performance will be an issue there in case of for-loops. Furthermore, I wonder how to approach the depth dimension. @Huite also created an example for rioxarray with apply_ufunc, which might be good to check out: pydata/xarray#6360 (comment). Before, I only pointed to the issue itself, not to the individual comments.
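To make the "no explicit loop" idea concrete, here is a minimal sketch with xr.apply_ufunc. It is not the example from pydata/xarray#6360 and it does not use rioxarray: the helper fill_nearest_2d, the toy array and the dimension names are assumptions for illustration, and "nearest" is measured in grid cells via scipy.ndimage.distance_transform_edt rather than in geographic distance. Each (latitude, longitude) slice is filled independently, so the time and depth dimensions are handled in the same call.

```python
import numpy as np
import xarray as xr
from scipy import ndimage


def fill_nearest_2d(arr):
    """Fill NaNs in one 2D slice with the value of the nearest valid cell."""
    mask = np.isnan(arr)
    if mask.all() or not mask.any():
        return arr  # nothing to fill, or nothing to fill from
    # For every NaN cell, look up the indices of the nearest non-NaN cell
    # (distance measured in grid cells, not in degrees or metres).
    idx = ndimage.distance_transform_edt(
        mask, return_distances=False, return_indices=True
    )
    return arr[tuple(idx)]


# Hypothetical toy field with dims (time, depth, latitude, longitude)
values = np.arange(2 * 3 * 4 * 5, dtype=float).reshape(2, 3, 4, 5)
values[..., 0, 0] = np.nan  # a "land" corner
values[..., 2, 3] = np.nan  # a "land" cell in the interior
da = xr.DataArray(values, dims=("time", "depth", "latitude", "longitude"))

filled = xr.apply_ufunc(
    fill_nearest_2d,
    da,
    input_core_dims=[["latitude", "longitude"]],
    output_core_dims=[["latitude", "longitude"]],
    vectorize=True,  # loops over time and depth internally
)
```

Note that vectorize=True relies on numpy.vectorize, which is documented as a convenience and is essentially a for loop, so this mainly tidies the code rather than guaranteeing a speed-up.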
The primary benefit of Note that
In the numpy vectorize documentation:
I'm guessing the apply_ufunc is still more efficient, primarily because it can skip a lot of alignment checks on the coordinates, which
The inter- and extrapolation of data in dfm_tools.modelbuilder.cmems_nc_to_ini can lead to stripes in the resulting initial field. Large sudden jumps in salinity and temperature can result in unstable simulations that are not able to run through their spin-up.

MWE (using cmems_nc_to_ini)

Current behavior
Currently, the following steps are taken in cmems_nc_to_ini:

    data_xr = data_xr.interpolate_na(dim='latitude').interpolate_na(dim='longitude')
    data_xr = data_xr.ffill(dim='latitude').bfill(dim='latitude')
    data_xr = data_xr.ffill(dim='longitude').bfill(dim='longitude')
    data_xr = data_xr.ffill(dim='depth').bfill(dim='depth')
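One plausible way such stripes arise can be shown with a toy example: a fill along a single dimension treats every row independently, so neighbouring rows can end up with very different fill values. The sketch below is numpy-only; ffill_rows is a hypothetical stand-in for a forward fill along the last axis, and the salinity values and coastline are made up.

```python
import numpy as np


def ffill_rows(a):
    """Forward-fill NaNs along the last axis (numpy-only stand-in for ffill)."""
    mask = np.isnan(a)
    # Index of the last valid cell at or before each position (0 if none yet)
    idx = np.where(~mask, np.arange(a.shape[-1]), 0)
    np.maximum.accumulate(idx, axis=-1, out=idx)
    return np.take_along_axis(a, idx, axis=-1)


# Made-up salinity: increases eastwards, identical in every row
n_lat, n_lon = 4, 6
sal = np.tile(30.0 + np.arange(n_lon), (n_lat, 1))

# Staggered coastline: cells east of coast[i] are land (NaN)
coast = [2, 5, 2, 5]
for i, c in enumerate(coast):
    sal[i, c:] = np.nan

filled = ffill_rows(sal)
east_column = filled[:, -1]  # values alternate between rows
```

In the easternmost column the filled values alternate between 31 and 34 from row to row, although the original field has no gradient in the latitude direction at all.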
Desired behavior
Reversing the order of these steps results in a slightly better initial field:

    data_xr = data_xr.ffill(dim='depth').bfill(dim='depth')
    data_xr = data_xr.interpolate_na(dim='latitude', method='nearest').interpolate_na(dim='longitude', method='nearest')
    data_xr = data_xr.ffill(dim='latitude').bfill(dim='latitude')
    data_xr = data_xr.ffill(dim='longitude').bfill(dim='longitude')

TO DO
The next (missing) step is to use a triangulation instead of data_xr.interpolate_na per dimension.
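A triangulation-based fill could look roughly like the sketch below. This is not what cmems_nc_to_ini does. It assumes scipy.interpolate.griddata, whose linear method interpolates on a Delaunay triangulation of the valid cells, with a nearest-neighbour fallback for cells outside the convex hull. The function name fill_2d_triangulated and the toy grid are hypothetical, and distances are taken in degrees for simplicity.

```python
import numpy as np
from scipy.interpolate import griddata


def fill_2d_triangulated(values, lon, lat):
    """Fill NaNs in a 2D (lat, lon) array in both dimensions at once."""
    lon2d, lat2d = np.meshgrid(lon, lat)
    valid = np.isfinite(values)
    if valid.all() or not valid.any():
        return values.copy()
    points = np.column_stack([lon2d[valid], lat2d[valid]])
    targets = np.column_stack([lon2d[~valid], lat2d[~valid]])
    # Linear interpolation on a Delaunay triangulation of the valid cells
    linear = griddata(points, values[valid], targets, method="linear")
    # Nearest neighbour as fallback where the triangulation cannot reach
    nearest = griddata(points, values[valid], targets, method="nearest")
    filled = values.copy()
    filled[~valid] = np.where(np.isnan(linear), nearest, linear)
    return filled


# Hypothetical toy grid with a field that is linear in lon and lat
lon = np.linspace(0.0, 4.0, 5)
lat = np.linspace(50.0, 53.0, 4)
lon2d, lat2d = np.meshgrid(lon, lat)
truth = lon2d + 10.0 * lat2d

field = truth.copy()
field[1:3, 1:3] = np.nan  # "land" block in the interior
field[0, 0] = np.nan      # "land" corner outside the convex hull

filled = fill_2d_triangulated(field, lon, lat)
```

Because the fill uses both horizontal dimensions at once, a cell is no longer tied to a single row or column. The function works on one 2D slice, so it would still have to be applied per timestep and depth level.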