Haversine Distance for Geographic Coordinates #108
Comments
Hey there,

    def haversine(p1, p2) -> float:
        pass

Or, with Rosetta Code's solution:

    from math import radians, sin, cos, sqrt, asin

    def haversine(p1, p2):
        # my adaptation to fit the function signature
        lat1, lon1 = p1
        lat2, lon2 = p2
        # original
        R = 6372.8  # Earth radius in kilometers
        dLat = radians(lat2 - lat1)
        dLon = radians(lon2 - lon1)
        lat1 = radians(lat1)
        lat2 = radians(lat2)
        a = sin(dLat / 2)**2 + cos(lat1) * cos(lat2) * sin(dLon / 2)**2
        c = 2 * asin(sqrt(a))
        return R * c

For performance reasons, you might want to implement a numpy solution. Maybe as a bit of background, and for others who might come across this issue: the haversine formula is pretty straightforward but tends to become quite imprecise, especially on small scales, which is what I personally usually work on. An alternative to haversine is Vincenty's solution, which is more accurate, as far as I know. Finally, I am happy to take a pull request if you want to contribute your solution as a default option to scikit-gstat.

Mirko
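The numpy solution suggested above could look roughly like the following. This is only a sketch, not the code that ended up in scikit-gstat: the name `haversine_np` and the `radius` parameter are made up for illustration, and points are assumed to be (lat, lon) pairs in degrees.

```python
import numpy as np

EARTH_RADIUS_KM = 6372.8  # same radius as the Rosetta Code version


def haversine_np(p1, p2, radius=EARTH_RADIUS_KM):
    """Great-circle distance between (lat, lon) points given in degrees.

    p1 and p2 may be single pairs or arrays whose last axis has length 2;
    the usual numpy broadcasting rules apply to the leading axes.
    """
    p1 = np.radians(np.asarray(p1, dtype=float))
    p2 = np.radians(np.asarray(p2, dtype=float))
    lat1, lon1 = p1[..., 0], p1[..., 1]
    lat2, lon2 = p2[..., 0], p2[..., 1]
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    # clip guards against rounding pushing `a` slightly outside [0, 1]
    return 2 * radius * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))
```

For an array `coords` of shape (n, 2), `haversine_np(coords[:, None, :], coords[None, :, :])` returns the full (n, n) distance matrix through broadcasting. For large n this needs a lot of memory, so it is not a replacement for a tree-based search.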
Many thanks for your reply, Mirko. I scraped together the following function, which uses numpy to run the great-circle distance calculation and can be passed as a callable to dist_func: `def haversine(p1, p2):
However, it appears to run quite slowly compared to the standard Euclidean distance (running on ~30k coords). Looking into it, I am wondering if this is because Euclidean distance can take advantage of a cKDTree, whereas my custom function does a brute-force search? If so, would there be any way to use something like sklearn.neighbors.BallTree (which supports haversine distance) to speed up the process?
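As a rough illustration of the BallTree idea asked about here: scikit-learn's `BallTree` accepts `metric="haversine"` and expects (lat, lon) in radians, returning distances as angles on the unit sphere. The helper name `neighbors_within` and the Earth radius constant are assumptions for this sketch; it is not how scikit-gstat does it.

```python
import numpy as np
from sklearn.neighbors import BallTree

EARTH_RADIUS_KM = 6372.8  # assumed spherical Earth radius


def neighbors_within(coords_deg, max_dist_km):
    """For every point, find all points within max_dist_km (great-circle).

    coords_deg: array of shape (n, 2) with columns (lat, lon) in degrees.
    Returns (indices, distances_km), each a sequence with one array per point.
    """
    coords_rad = np.radians(np.asarray(coords_deg, dtype=float))
    tree = BallTree(coords_rad, metric="haversine")
    # the haversine metric works in radians on the unit sphere,
    # so the search radius has to be converted from km to an angle
    ind, dist = tree.query_radius(
        coords_rad, r=max_dist_km / EARTH_RADIUS_KM, return_distance=True
    )
    return ind, [d * EARTH_RADIUS_KM for d in dist]
```

Like a cKDTree radius query, this only visits pairs closer than the maximum distance instead of computing all n * n distances.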
Hey there,
Yes, the Variogram class is using an instance of
I think, generally, it should be possible to use a
scikit-gstat/skgstat/MetricSpace.py, line 124 in 2819ab4
while the class is explicitly checking for Euclidean distance here:
scikit-gstat/skgstat/MetricSpace.py, line 116 in 2819ab4
to raise an error if a
scikit-gstat/skgstat/MetricSpace.py, line 138 in 2819ab4
to check whether a maximum distance is set. Otherwise, pdist would be used. My first thoughts here:
The last check (line 138), however, would need some adaptations to the logic. Here, the existing check could be kept, and a new elif is needed to check for haversine or anything else BallTree supports. The else could then be kept as is. If you would like to contribute this, I am happy to take any pull request; otherwise, I can try to implement this soon, and a review and test by you would then be highly appreciated.

Best
My two cents on this topic: To overcome this problem, one could use the associated Yadrenko model for a given spatial 3D variogram model. This uses the chordal distance derived from the geographical coordinates. We had several discussions about that in PyKrige and GSTools:
When haversine distances are used in standard models that are later used to build a kriging matrix, this could result in undefined behavior, since the "conditional negative semidefinite" property is crucial there.

Cheers,
@mmaelicke you could think about implementing a switch to use the associated Yadrenko model, which is able to use the haversine distance. Starting with a valid isotropic model in 3D, where the covariance between two points is given as a function of their distance:

    cov(x1, x2) = func(dist(x1, x2))

we can construct the related Yadrenko model on the sphere, which takes the great-circle distance zeta:

    cov_sph(x1, x2) = func(2 * sin(zeta(x1, x2) / 2))

So instead of passing the distance

PS: See the GSTools tutorials for more information: https://geostat-framework.readthedocs.io/projects/gstools/en/stable/examples/08_geo_coordinates/index.html
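The construction above can be sketched in a few lines. This is a toy example under stated assumptions: `exponential_cov` stands in for any isotropic model that is valid in 3D, `yadrenko` is a hypothetical wrapper name, and `zeta` is the great-circle distance as an angle in radians on the unit sphere. It does not reproduce the GSTools or scikit-gstat API.

```python
import math


def exponential_cov(h, sill=1.0, length=1.0):
    """Exponential covariance model, used here as an example of a 3D-valid model."""
    return sill * math.exp(-h / length)


def yadrenko(cov_3d):
    """Wrap a 3D-valid isotropic model so that it takes a great-circle angle."""
    def cov_sph(zeta, **kwargs):
        # chordal distance on the unit sphere for the central angle zeta
        chord = 2.0 * math.sin(zeta / 2.0)
        return cov_3d(chord, **kwargs)
    return cov_sph


cov_sph = yadrenko(exponential_cov)
```

Because everything lives on the unit sphere here, `length` is expressed in units of the sphere radius. A haversine distance in km would first have to be divided by the Earth radius to obtain `zeta`.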
Hey @MuellerSeb, thanks for all the insights!
It applies to semi-variances as well. Yadrenko derived a way to generate a huge family of valid models on the sphere by making use of valid models in 3D. Since all models in scikit-gstat seem to be valid in 3D, you can derive the associated Yadrenko models for all of them by adapting the distance passed to the variogram function. You can also think of it this way: when you have lat-lon coordinates, you can derive the spatial point in 3D on a unit sphere and then calculate the "tunnel distance": This is ultimately the same as deriving the chordal distance for the great-circle distance.
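The equivalence described here can be checked numerically. The following is a small sketch with made-up helper names, assuming (lat, lon) in degrees and a unit sphere: it embeds two points in 3D, takes the straight-line ("tunnel") distance, and compares it with `2 * sin(zeta / 2)` computed from the haversine central angle `zeta`.

```python
import math


def to_unit_sphere(lat_deg, lon_deg):
    """Map (lat, lon) in degrees to a 3D point on the unit sphere."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (math.cos(lat) * math.cos(lon),
            math.cos(lat) * math.sin(lon),
            math.sin(lat))


def tunnel_distance(p1, p2):
    """Straight-line distance through the sphere between two (lat, lon) points."""
    return math.dist(to_unit_sphere(*p1), to_unit_sphere(*p2))


def central_angle(p1, p2):
    """Great-circle distance on the unit sphere (radians) via haversine."""
    lat1, lon1 = map(math.radians, p1)
    lat2, lon2 = map(math.radians, p2)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * math.asin(math.sqrt(a))


p1, p2 = (36.12, -86.67), (33.94, -118.40)
zeta = central_angle(p1, p2)
chord_from_angle = 2 * math.sin(zeta / 2)
chord_from_3d = tunnel_distance(p1, p2)
```

Both values agree up to floating-point rounding, since the squared chord length on the unit sphere is `2 - 2 * cos(zeta)`, which equals `4 * sin(zeta / 2) ** 2`.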
I might be a tiny bit out of my depth here, but if you want to make a variogram of, and later krige, positions in a non-Euclidean 2D space that does have a Euclidean embedding in 3D, why would you not convert the coordinates to that 3D embedding first, then do all calculations in Euclidean 3D, then convert the results back at the end? The earth isn't a sphere, so the simple geometrical implementations of great circles above aren't gonna give the correct results, and the user might want to be able to choose the chart datum depending on their application/locality...
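For a non-spherical Earth, the 3D embedding mentioned here is usually the conversion from geodetic coordinates to Earth-centered, Earth-fixed (ECEF) Cartesian coordinates on a reference ellipsoid. The sketch below uses the standard closed-form conversion with the WGS84 ellipsoid parameters as defaults; the function name is hypothetical, and a full datum choice involves more than swapping the two ellipsoid parameters.

```python
import math

WGS84_A = 6378137.0            # semi-major axis in metres
WGS84_F = 1 / 298.257223563    # flattening


def geodetic_to_ecef(lat_deg, lon_deg, height_m=0.0, a=WGS84_A, f=WGS84_F):
    """Convert geodetic (lat, lon, height) to ECEF (x, y, z) in metres."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    e2 = f * (2.0 - f)  # first eccentricity squared
    # prime vertical radius of curvature at this latitude
    n = a / math.sqrt(1.0 - e2 * math.sin(lat) ** 2)
    x = (n + height_m) * math.cos(lat) * math.cos(lon)
    y = (n + height_m) * math.cos(lat) * math.sin(lon)
    z = (n * (1.0 - e2) + height_m) * math.sin(lat)
    return x, y, z
```

Euclidean distances between the resulting points are chord ("tunnel") distances through the ellipsoid, so they can be handled by ordinary 3D tools such as a k-d tree.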
That is exactly the reasoning behind the Yadrenko models, and it is also exactly how GSTools treats lat-lon input. We provide two different dimension attributes:
I am wondering if it's possible to generate a variogram using haversine/great-circle as the distance metric? This doesn't seem to be supported by scipy.spatial.distance.pdist but would be helpful for working directly with geographic coordinates. Is it possible to supply a user-defined function to calculate haversine distance?