Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extrapolation of CMEMS initial field #1021

Open
SCLaan opened this issue Oct 3, 2024 · 4 comments
Open

Extrapolation of CMEMS initial field #1021

SCLaan opened this issue Oct 3, 2024 · 4 comments

Comments

@SCLaan
Copy link

SCLaan commented Oct 3, 2024

The inter- and extrapolation of data in dfm_tools.modelbuilder.cmems_nc_to_ini can lead to stripes in resulting initial field. Large sudden jumps in salinity and temperature can result in unstable simulations that are not able to run through their spin-up.

MWE (using cmems_nc_to_ini)

import os
import xarray as xr

import hydrolib.core.dflowfm as hcdfm
import dfm_tools as dfmt

#%% input
dir_output = r'p:\11209715-rnbs\hydro\forcings\initial_field'
t_start    = '20140101' 
dir_pattern     = r'p:\11209715-rnbs\hydro\forcings\NBS-FM_forcing\data\cmems_reanalysis\cmems_{ncvarname}_*.nc'
list_quantities = ['salinitybnd','temperaturebnd']

#%% make initial field
ext_file_old = os.path.join(dir_output, 'CMEMSini_old.ext')
ext_old = hcdfm.ExtOldModel()

ext_old = dfmt.cmems_nc_to_ini(ext_old=ext_old,
                               dir_output=dir_output,
                               list_quantities=list_quantities,
                               tstart=t_start,
                               dir_pattern=dir_pattern)

#%% plot result
file_nc  = str(ext_old.forcing[0].filename.filepath)
data_xr = xr.open_dataset(file_nc)

data_xr['so'].isel(time=-1,depth=1).plot()

image

Current behavior

Currently, the following steps are taken in cmems_nc_to_ini:

  1. Interpolation of NaNs in horizontal plane (first y, then x):
    data_xr = data_xr.interpolate_na(dim='latitude').interpolate_na(dim='longitude')
  2. Extrapolation of data in horizontal plane (first y, then x):
    data_xr = data_xr.ffill(dim='latitude').bfill(dim='latitude')
    data_xr = data_xr.ffill(dim='longitude').bfill(dim='longitude')
  3. Extrapolation of data in vertical:
    data_xr = data_xr.ffill(dim='depth').bfill(dim='depth')

Desired behavior

  1. Extrapolation of data in vertical.
    data_xr = data_xr.ffill(dim='depth').bfill(dim='depth')
  2. Interpolation of data in horizontal:
    • Using nearest neighbor, not linear interpolation.
      data_xr = data_xr.interpolate_na(dim='latitude', method='nearest').interpolate_na(dim='longitude', method='nearest')
    • Ideally, using a triangulation, instead of an interpolation per dimension.
      TO DO
  3. Extrapolation of data in horizontal.
    data_xr = data_xr.ffill(dim='latitude').bfill(dim='latitude')
    data_xr = data_xr.ffill(dim='longitude').bfill(dim='longitude')

Reversing the order of these steps, results in a slightly better initial field:
image
The next (missing) step is to use a triangulation instead of data_xr.interpolate_na per dimension.

@veenstrajelmer
Copy link
Collaborator

veenstrajelmer commented Oct 3, 2024

Some ideas of @Huite.

Related xarray issue: pydata/xarray#6360 >> 3 useful comments by @Huite with examples.

Related imod implementation (but use apply_ufunc instead of loading the data arrays): https://deltares.github.io/imod-python/api/generated/prepare/imod.prepare.fill.html. This uses https://docs.scipy.org/doc/scipy/reference/generated/scipy.ndimage.distance_transform_edt.html

Or preferably Laplace interpolation like in xugrid but it requires tweaking settings there in some cases: https://deltares.github.io/xugrid/api/xugrid.UgridDataArrayAccessor.laplace_interpolate.html

@SCLaan
Copy link
Author

SCLaan commented Oct 14, 2024

Another option is to use rioxarray. This code is relatively quick:

data_xr.rio.write_crs('EPSG:4326', inplace=True) # Set coordinate system (assume WGS84)
for var in data_xr.data_vars:
    data_xr[var].rio.write_nodata(np.nan, inplace=True) # Set nodata for all variables
interpolated = []
for t in range(data_xr[var].shape[0]):
    interpolated.append(data_xr.isel(time=t).rio.interpolate_na(method='nearest'))
data_xr = xr.concat(interpolated, dim='time')

Or in less lines of code, but with the same for-loops:

data_xr.rio.write_crs('EPSG:4326', inplace=True) # Set coordinate system (assume WGS84)
[data_xr[var].rio.write_nodata(np.nan, inplace=True) for var in data_xr.data_vars] # Set nodata for all variables
data_xr = xr.concat([data_xr.isel(time=t).rio.interpolate_na(method='nearest') for t in range(data_xr[next(iter(data_xr.data_vars))].shape[0])], dim='time')

Image

@veenstrajelmer
Copy link
Collaborator

veenstrajelmer commented Oct 14, 2024

Nice, result looks good! But since it is xarray, I would expect one could do this without loops, or not? If might not matter too much since we are looking initial netcdf files here, containing two (and in the future hopefully just one) timestep. But for-loops should in general be avoided, also since we probably want to apply this in a modular function, for instance to interpolate/extrapolate datasets for bc-files. Performance will be an issue there in case of for-loops. Furthermore, I wonder how to approach the depth dimension.

@Huite also created an example for rioxarray with apply_ufunc, might be good to check this out: pydata/xarray#6360 (comment). Before I only pointed to the issue itself, not to the individual comments.

@Huite
Copy link

Huite commented Oct 14, 2024

The primary benefit of apply_ufunc is that it'll work for as many additional dimensions as you provide.

Note that vectorized=True though, so from the apply_ufunc docs:

vectorize (bool, optional) – If True, then assume func only takes arrays defined over core dimensions as input and vectorize it automatically with numpy.vectorize().

In the numpy vectorize documentation:

The vectorize function is provided primarily for convenience, not for performance. The implementation is essentially a for loop.

I'm guessing the apply_ufunc is still more efficient, primarily because it can skip a lot of alignment checks on the coordinates, which xr.concat does incur.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants