Extrapolation of CMEMS initial field #1021

SCLaan · 2024-10-03T15:01:16Z

The inter- and extrapolation of data in dfm_tools.modelbuilder.cmems_nc_to_ini can lead to stripes in resulting initial field. Large sudden jumps in salinity and temperature can result in unstable simulations that are not able to run through their spin-up.

MWE (using `cmems_nc_to_ini`)

import os
import xarray as xr

import hydrolib.core.dflowfm as hcdfm
import dfm_tools as dfmt

#%% input
dir_output = r'p:\11209715-rnbs\hydro\forcings\initial_field'
t_start    = '20140101' 
dir_pattern     = r'p:\11209715-rnbs\hydro\forcings\NBS-FM_forcing\data\cmems_reanalysis\cmems_{ncvarname}_*.nc'
list_quantities = ['salinitybnd','temperaturebnd']

#%% make initial field
ext_file_old = os.path.join(dir_output, 'CMEMSini_old.ext')
ext_old = hcdfm.ExtOldModel()

ext_old = dfmt.cmems_nc_to_ini(ext_old=ext_old,
                               dir_output=dir_output,
                               list_quantities=list_quantities,
                               tstart=t_start,
                               dir_pattern=dir_pattern)

#%% plot result
file_nc  = str(ext_old.forcing[0].filename.filepath)
data_xr = xr.open_dataset(file_nc)

data_xr['so'].isel(time=-1,depth=1).plot()

Current behavior

Currently, the following steps are taken in cmems_nc_to_ini:

Interpolation of NaNs in horizontal plane (first y, then x):
data_xr = data_xr.interpolate_na(dim='latitude').interpolate_na(dim='longitude')
Extrapolation of data in horizontal plane (first y, then x):
data_xr = data_xr.ffill(dim='latitude').bfill(dim='latitude')
data_xr = data_xr.ffill(dim='longitude').bfill(dim='longitude')
Extrapolation of data in vertical:
data_xr = data_xr.ffill(dim='depth').bfill(dim='depth')

Desired behavior

Extrapolation of data in vertical.
data_xr = data_xr.ffill(dim='depth').bfill(dim='depth')
Interpolation of data in horizontal:
- Using nearest neighbor, not linear interpolation.
  data_xr = data_xr.interpolate_na(dim='latitude', method='nearest').interpolate_na(dim='longitude', method='nearest')
- Ideally, using a triangulation, instead of an interpolation per dimension.
  TO DO
Extrapolation of data in horizontal.
data_xr = data_xr.ffill(dim='latitude').bfill(dim='latitude')
data_xr = data_xr.ffill(dim='longitude').bfill(dim='longitude')

Reversing the order of these steps, results in a slightly better initial field:

The next (missing) step is to use a triangulation instead of data_xr.interpolate_na per dimension.

The text was updated successfully, but these errors were encountered:

veenstrajelmer · 2024-10-03T15:54:10Z

Some ideas of @Huite.

Related xarray issue: pydata/xarray#6360 >> 3 useful comments by @Huite with examples.

Related imod implementation (but use apply_ufunc instead of loading the data arrays): https://deltares.github.io/imod-python/api/generated/prepare/imod.prepare.fill.html. This uses https://docs.scipy.org/doc/scipy/reference/generated/scipy.ndimage.distance_transform_edt.html

Or preferably Laplace interpolation like in xugrid but it requires tweaking settings there in some cases: https://deltares.github.io/xugrid/api/xugrid.UgridDataArrayAccessor.laplace_interpolate.html

SCLaan · 2024-10-14T15:10:24Z

Another option is to use rioxarray. This code is relatively quick:

data_xr.rio.write_crs('EPSG:4326', inplace=True) # Set coordinate system (assume WGS84)
for var in data_xr.data_vars:
    data_xr[var].rio.write_nodata(np.nan, inplace=True) # Set nodata for all variables
interpolated = []
for t in range(data_xr[var].shape[0]):
    interpolated.append(data_xr.isel(time=t).rio.interpolate_na(method='nearest'))
data_xr = xr.concat(interpolated, dim='time')

Or in less lines of code, but with the same for-loops:

data_xr.rio.write_crs('EPSG:4326', inplace=True) # Set coordinate system (assume WGS84)
[data_xr[var].rio.write_nodata(np.nan, inplace=True) for var in data_xr.data_vars] # Set nodata for all variables
data_xr = xr.concat([data_xr.isel(time=t).rio.interpolate_na(method='nearest') for t in range(data_xr[next(iter(data_xr.data_vars))].shape[0])], dim='time')

veenstrajelmer · 2024-10-14T15:37:20Z

Nice, result looks good! But since it is xarray, I would expect one could do this without loops, or not? If might not matter too much since we are looking initial netcdf files here, containing two (and in the future hopefully just one) timestep. But for-loops should in general be avoided, also since we probably want to apply this in a modular function, for instance to interpolate/extrapolate datasets for bc-files. Performance will be an issue there in case of for-loops. Furthermore, I wonder how to approach the depth dimension.

@Huite also created an example for rioxarray with apply_ufunc, might be good to check this out: pydata/xarray#6360 (comment). Before I only pointed to the issue itself, not to the individual comments.

Huite · 2024-10-14T16:34:04Z

The primary benefit of apply_ufunc is that it'll work for as many additional dimensions as you provide.

Note that vectorized=True though, so from the apply_ufunc docs:

vectorize (bool, optional) – If True, then assume func only takes arrays defined over core dimensions as input and vectorize it automatically with numpy.vectorize().

In the numpy vectorize documentation:

The vectorize function is provided primarily for convenience, not for performance. The implementation is essentially a for loop.

I'm guessing the apply_ufunc is still more efficient, primarily because it can skip a lot of alignment checks on the coordinates, which xr.concat does incur.

veenstrajelmer mentioned this issue Sep 27, 2024

Prepare 0.30.0 release #1016

Closed

6 tasks

veenstrajelmer mentioned this issue Oct 20, 2024

Prepare 0.31.0 release #1031

Closed

2 tasks

veenstrajelmer mentioned this issue Oct 28, 2024

Prepare 0.32.0 release #1036

Open

24 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Extrapolation of CMEMS initial field #1021

Extrapolation of CMEMS initial field #1021

SCLaan commented Oct 3, 2024 •

edited

Loading

veenstrajelmer commented Oct 3, 2024 •

edited

Loading

SCLaan commented Oct 14, 2024 •

edited by veenstrajelmer

Loading

veenstrajelmer commented Oct 14, 2024 •

edited

Loading

Huite commented Oct 14, 2024 •

edited

Loading

Extrapolation of CMEMS initial field #1021

Extrapolation of CMEMS initial field #1021

Comments

SCLaan commented Oct 3, 2024 • edited Loading

MWE (using cmems_nc_to_ini)

Current behavior

Desired behavior

veenstrajelmer commented Oct 3, 2024 • edited Loading

SCLaan commented Oct 14, 2024 • edited by veenstrajelmer Loading

veenstrajelmer commented Oct 14, 2024 • edited Loading

Huite commented Oct 14, 2024 • edited Loading

SCLaan commented Oct 3, 2024 •

edited

Loading

MWE (using `cmems_nc_to_ini`)

veenstrajelmer commented Oct 3, 2024 •

edited

Loading

SCLaan commented Oct 14, 2024 •

edited by veenstrajelmer

Loading

veenstrajelmer commented Oct 14, 2024 •

edited

Loading

Huite commented Oct 14, 2024 •

edited

Loading