-
Notifications
You must be signed in to change notification settings - Fork 53
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
dist_func in skf.Variogram can support callables #130
Comments
Using your own function can cause issues with the OrdianryKrigging in the init method. Overriding this solved that issue, but presented another in the next line. Setting the self.dict_metric the "dist_func" property produced by the Variogram.describe() throws an error as the class is not aware of the custom distance function I used in Variogram. To fix this, I overrode the init method, simply changed "self.dist_metric = variogram["dist_func"]" on line 127 to "self.dist_metric = self.distance_btwn_coords2" (which is the name of my custom distance function) and then added the distance funciton to my child class. from skgstat.Variogram import Variogram
from skgstat.MetricSpace import MetricSpace, MetricSpacePair
class ZestyOrdinaryKriging(OrdinaryKriging):
def __init__(
self,
variogram,
min_points=5,
max_points=15,
mode='exact',
precision=100,
solver='inv',
n_jobs=1,
perf=False,
sparse=False,
coordinates=None,
values=None
):
"""
Overrriding init method due to issue with "self.dist_metric = variogram["dist_func"]"
This was caused by passing a function in the variogram class instead of using a pre-defined
distance metric (typically refenced by a string)
"""
# store arguments to the instance
if isinstance(variogram, Variogram):
if coordinates is None:
coordinates = variogram.coordinates
if values is None:
values = variogram.values
variogram_descr = variogram.describe()
if variogram_descr["model"] == "harmonize":
variogram_descr["model"] = variogram._build_harmonized_model()
variogram = variogram_descr
self.sparse = sparse
# general attributes
self._minp = min_points
self._maxp = max_points
self.min_points = min_points
self.max_points = max_points
# general settings
self.n_jobs = n_jobs
self.perf = perf
self.range = variogram['effective_range']
self.nugget = variogram['nugget']
self.sill = variogram['sill']
self.dist_metric = self.distance_btwn_coords2
# coordinates and semivariance function
if not isinstance(coordinates, MetricSpace):
coordinates, values = self._remove_duplicated_coordinates(coordinates, values)
coordinates = MetricSpace(coordinates.copy(), self.dist_metric, self.range if self.sparse else None)
else:
assert self.dist_metric == coordinates.dist_metric, "Distance metric of variogram differs from distance metric of coordinates"
assert coordinates.max_dist is None or coordinates.max_dist == self.range, "Sparse coordinates must have max_dist == variogram.effective_range"
self.values = values.copy()
self.coords = coordinates
self.gamma_model = Variogram.fitted_model_function(**variogram)
self.z = None
# calculation mode; self.range has to be initialized
self._mode = mode
self._precision = precision
self._prec_dist = None
self._prec_g = None
self.mode = mode
self.precision = precision
# solver settings
self._solver = solver
self._solve = None
self.solver = solver
# initialize error counter
self.singular_error = 0
self.no_points_error = 0
self.ill_matrix = 0
# performance counter
if self.perf:
self.perf_dist = list()
self.perf_mat = list()
self.perf_solv = list()
@classmethod
def _remove_duplicated_coordinates(cls, coords, values):
"""
Override this method to handle Shapely Point Coordinates.
Extract the coordinates and values
The coordinates array is checked for duplicates and only the
first instance of a duplicate is used. Duplicated coordinates
would result in duplicated rows in the variogram matrix and
make it singular.
"""
c = coords
v = values
unique_idx = cls.unique_indices(coords)
return c[unique_idx], v[unique_idx]
def unique_indices(arr):
unique_vals = []
unique_indices = []
for idx, val in enumerate(arr):
if val[0] not in unique_vals:
unique_vals.append(val)
unique_indices.append(idx)
del unique_vals
return unique_indices
def distance_btwn_coords2(p_i, p_j):
p_i=p_i[0]
p_j=p_j[0]
return geodesic((p_i.y, p_i.x), (p_j.y, p_j.x)).km |
Hey, thanks for sharing all this. Mirko |
What is the advantage of a custom distance function over projecting to a cartesian plane @Megil-Zesty ? Specifically in GIS applications, reprojecting to some UTM zone should be preferable, unless you're working with extremely large areas (a whole continent or so)? If for nothing else: function calls and custom math for pairs of points is going to be much slower than the builtin l2 norm + a call to pyproj with the vector of points at the beginning. |
@redhog I didn't consider the performance drop from using a custom distance function - I originally went that route because I expected it to be simpler than reprojecting from WGS84 (though it obviously ended up causing issues downstream within the Krigging module). I've since tried projecting to UTM and it is much faster and gives the proper units that I was looking for in the Variogram. Thanks for the suggestion! I'm coming from a background outside of geospatial analysis so this was new to me. |
The documentation in init (and online) suggests that we have to use one of the predefined distance functions; however, pdist supports passing functions (instead of just names of common distance functions).
I was able to pass a function in the initialization of the Variogram which enabled me to handle calculating distance between Shapely Point objects containing lat/lon coordinates.
Overall, super happy about this! Fixing the documentation may help others in the future.
The text was updated successfully, but these errors were encountered: