
"std::bad_alloc" error workaround? #183

Open
IAGO1215 opened this issue May 22, 2024 · 2 comments

@IAGO1215

When there are more than 90,000 coordinates (for example, an area slightly larger than 300 × 300 pixels), the variogram function produces a "std::bad_alloc" error. I would like to know whether there is any workaround for this error, or whether it depends on the hardware.

In this case the coordinates are all integers, and so are the data (values). Changing n_lags or bin_func does not resolve the error, and setting max_lag to half of the maximum lag does not help either.

Previously I only computed variograms on areas of at most 150 × 150 pixels, with float64 coordinates and data (values), and there was no error. If this is not hardware-dependent, I am guessing there is a threshold on the maximum number of coordinates.

I can provide the data if needed.

@mmaelicke (Owner)

This is a C++ error raised when something goes wrong with memory allocation (i.e. there is not enough memory to allocate). As far as I know, these kinds of errors are usually caught by numpy, and you then receive a MemoryError telling you how much memory it tried to allocate.

300 × 300 does not sound too crazy to me. Have you tried passing only a sample, to verify that everything works with smaller samples?

You are right: the Variogram class always calculates the full distance arrays, so using max_lag does not change anything; it simply excludes a number of values from the sample. The Variogram is meant as a tool to analyze the sample, and if the sample is too large for memory, the Variogram cannot help you with that in the current implementation.

Something you can try: the Variogram organizes all inputs in 1D arrays that represent the upper triangle of the distance matrix in row-wise fashion. So if you can allocate an array of size 90,000**2 / 2 - 90,000, the calculation is in principle possible. The Variogram saves the distances and their grouping indices into lag classes in two separate arrays, so you need that amount of memory twice.
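To put rough numbers on that (a back-of-the-envelope check; the exact upper-triangle pair count is n·(n−1)/2, and the 8-byte-per-entry dtypes are an assumption):

```python
n = 90_000
pairs = n * (n - 1) // 2            # entries in the upper triangle of the distance matrix
gb_per_array = pairs * 8 / 1e9      # assuming 8 bytes per entry (e.g. float64 distances)
total_gb = 2 * gb_per_array         # distances plus lag-class grouping indices
print(pairs, round(total_gb, 1))    # about 4.05e9 pairs, roughly 65 GB in total
```

That total is consistent with the ~60 GB MemoryError reported in this thread.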
If that does not work on your hardware, you can in principle also calculate the variogram yourself, e.g. using a KDTree (the Variogram uses one for variograms with euclidean distance and a max_lag; check out the MetricSpace class for implementation details), and iterate through the lag classes yourself. Then calculate the semi-variances for each lag class using skgstat.estimators. This is substantially slower, but has a far smaller memory footprint.
I am currently on vacation, so I can't post a mock-up example here (I am writing on my phone), but I can come back to that if you identify this as the problem and need an example.
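[Editor's note] As an illustration of the approach described above, here is a minimal, hypothetical sketch using scipy's cKDTree. It computes the classic Matheron estimator inline instead of importing skgstat.estimators, and the function name, binning scheme, and defaults are illustrative assumptions, not the library's API:

```python
import numpy as np
from scipy.spatial import cKDTree

def variogram_by_pairs(coords, values, max_lag, n_lags=10):
    """Empirical variogram that only materializes point pairs within max_lag,
    instead of the full n*(n-1)/2 distance array."""
    tree = cKDTree(coords)
    # (m, 2) integer array of index pairs whose pairwise distance is <= max_lag
    pairs = tree.query_pairs(r=max_lag, output_type="ndarray")
    dists = np.linalg.norm(coords[pairs[:, 0]] - coords[pairs[:, 1]], axis=1)
    sq_diff = (values[pairs[:, 0]] - values[pairs[:, 1]]) ** 2
    # equal-width lag classes on [0, max_lag]
    edges = np.linspace(0.0, max_lag, n_lags + 1)
    lag_idx = np.digitize(dists, edges[1:-1])  # lag-class index for every pair
    # Matheron estimator per lag class: 0.5 * mean of squared value differences
    gamma = np.array([
        0.5 * sq_diff[lag_idx == i].mean() if np.any(lag_idx == i) else np.nan
        for i in range(n_lags)
    ])
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, gamma
```

Iterating over lag classes in Python is slower than the Variogram class, but peak memory now scales with the number of pairs within max_lag rather than with all n²/2 pairs.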

@IAGO1215 (Author)

Thanks for the detailed explanation. And yes, sometimes it simply shows "std::bad_alloc", but a few times it reported that it needed around 60 GB of memory to process the data.

Currently I am working around the issue by resampling (downscaling 300 × 300 pixels to 150 × 150 pixels).

I would like a mock-up example if that's possible. Thank you.
