BANE in a container sometimes hangs #187
I attempted to fix this with #200, and in testing it worked as expected. However, it seems that dask workers execute task code in a thread other than the main thread, so the signal handling does not work as expected.
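Some context on why this breaks under dask: CPython only allows `signal.signal()` to be called from the main thread, and `dask.distributed` workers run tasks in a thread pool by default. The sketch below assumes #200 installs its handler with `signal.signal()` from inside the task code. It uses a hypothetical helper, `try_install_handler`, and a plain `threading.Thread` as a stand-in for a dask worker thread. SIGTERM is an arbitrary choice; the same restriction applies to any signal number.

```python
import signal
import threading


def try_install_handler():
    """Try to install a no-op SIGTERM handler; return the error, or None on success."""
    try:
        previous = signal.signal(signal.SIGTERM, lambda signum, frame: None)
    except ValueError as exc:
        # Raised when signal.signal() is called outside the main thread.
        return exc
    # Restore whatever was installed before so this check has no side effects.
    if previous is not None:
        signal.signal(signal.SIGTERM, previous)
    return None


results = {}


def worker():
    # Stand-in for task code executing in a dask worker's executor thread.
    results["worker"] = try_install_handler()


t = threading.Thread(target=worker)
t.start()
t.join()

print("main thread:", try_install_handler())               # main thread: None
print("worker thread:", type(results["worker"]).__name__)  # worker thread: ValueError
```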
I suppose the right thing to do now is to either:
I have no idea why simply rerunning BANE after such an error works. Is it a concurrency issue with other BANE processes running at the same time? The problem only seems to happen under load. Fixing it there (in a fork or otherwise) might be possible. Sigh. Sad.
I have largely fixed the hanging behavior by raising an appropriate error in the callback log handler, which retriggers the Singularity call. The timeout seems to be a known issue in Prefect. The real fix will have to be made in BANE.
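The retry-on-log-message approach can be sketched roughly as follows. This is not the pipeline's actual code. `HangDetected`, `make_log_callback` and `run_with_retries` are hypothetical names, plain `subprocess` stands in for however the Singularity call is actually launched, and `"MARKER"` is a placeholder for the real log message.

```python
import subprocess
import sys


class HangDetected(RuntimeError):
    """Hypothetical error raised when a log line matches a known-bad message."""


def make_log_callback(marker):
    """Return a callback that raises HangDetected when `marker` is logged."""
    def callback(line):
        if marker in line:
            raise HangDetected(line.strip())
    return callback


def run_with_retries(cmd, callback, max_attempts=3):
    """Run `cmd`, pass each output line to `callback`, and rerun on HangDetected.

    Returns the 1-based number of the attempt that ran to completion.
    """
    for attempt in range(1, max_attempts + 1):
        proc = subprocess.Popen(
            cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
        )
        try:
            for line in proc.stdout:
                callback(line)
            proc.wait()
            return attempt
        except HangDetected:
            # Kill the (presumably stuck) process before trying again.
            proc.kill()
            proc.wait()
        finally:
            proc.stdout.close()
    raise RuntimeError(f"command still failing after {max_attempts} attempts")


# In the pipeline, cmd would be something like
# ["singularity", "exec", "<image.sif>", "BANE", "<image.fits>"].
print(run_with_retries([sys.executable, "-c", "print('done')"],
                       make_log_callback("MARKER")))  # 1
```

This only helps when the hang is preceded by a recognisable log line. A hang that produces no output at all would also need a wall-clock timeout, for example `subprocess.run(..., timeout=...)`.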
I have noticed that occasionally a BANE process inside a Singularity container will hang for an unreasonably long time. What's strange to me is that whenever it does, there is this message:
My hunch is that it has something to do with the use of shared memory to minimise the memory footprint. I think it would be pretty straightforward to switch to a fork of AegeanTools that incorporates either:
From memory, the FFT approach has a lingering bug where the output map is blanked and shifted by the kernel shape.
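A common way for an FFT-based filter to come out shifted by half the kernel size is to zero-pad the kernel with its corner at the array origin instead of centring it. It is only an assumption that this explains the bug in that branch. The sketch below uses NumPy and two hypothetical helpers, `direct_same` and `fft_convolve`, to show the symptom and the usual fix: rolling the kernel so its centre sits at index (0, 0). It uses an odd-sized box kernel and places the source well away from the edges, so the FFT's circular wrap-around does not matter.

```python
import numpy as np


def direct_same(image, kernel):
    """Reference 'same'-size convolution with zero padding (odd kernel sizes)."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(image, ((ph, ph), (pw, pw)))
    flipped = kernel[::-1, ::-1]
    out = np.zeros(image.shape)
    for y in range(image.shape[0]):
        for x in range(image.shape[1]):
            out[y, x] = np.sum(padded[y:y + kh, x:x + kw] * flipped)
    return out


def fft_convolve(image, kernel, centre_kernel=True):
    """Circular convolution via FFT; optionally centre the kernel on (0, 0)."""
    kh, kw = kernel.shape
    kpad = np.zeros(image.shape)
    kpad[:kh, :kw] = kernel  # kernel's top-left corner sits at the origin
    if centre_kernel:
        # Move the kernel centre to (0, 0); skipping this shifts the output
        # by (kh // 2, kw // 2) pixels.
        kpad = np.roll(kpad, (-(kh // 2), -(kw // 2)), axis=(0, 1))
    return np.real(np.fft.ifft2(np.fft.fft2(image) * np.fft.fft2(kpad)))


image = np.zeros((32, 32))
image[12:18, 10:20] = 1.0          # a source well away from the edges
kernel = np.ones((5, 7)) / 35.0    # odd-sized box kernel
reference = direct_same(image, kernel)

print(np.allclose(fft_convolve(image, kernel), reference))                       # True
print(np.allclose(fft_convolve(image, kernel, centre_kernel=False), reference))  # False
```

The FFT version treats the map as periodic. Near the edges it will still differ from the zero-padded direct version unless the map is padded before transforming.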
I believe I mocked up some other modes for BANE in a separate branch somewhere, but I can't remember the specifics of those.
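On the shared-memory hunch: the sketch below is not BANE's code, and how BANE actually shares its arrays is not checked here. It uses Python's `multiprocessing.shared_memory` (Python 3.8+) with NumPy to show the general pattern. One process creates a named block, workers attach to it by name, and cleanup has to happen in a specific order. The worker is simulated in the same process to keep the example self-contained; in practice it would be a separate process. If the hang does turn out to be shared-memory related, two things are worth checking: whether every handle is closed, and whether the creator always unlinks the segment.

```python
import numpy as np
from multiprocessing import shared_memory

shape, dtype = (4, 5), np.float32
nbytes = int(np.prod(shape)) * np.dtype(dtype).itemsize

# Parent: allocate one block that every worker can map instead of copying.
shm = shared_memory.SharedMemory(create=True, size=nbytes)
parent_view = np.ndarray(shape, dtype=dtype, buffer=shm.buf)
parent_view[:] = 0.0

# Worker side (simulated in-process here): attach by name and fill one row.
worker_shm = shared_memory.SharedMemory(name=shm.name)
worker_view = np.ndarray(shape, dtype=dtype, buffer=worker_shm.buf)
worker_view[1, :] = 7.0
del worker_view       # release the view before closing the mapping
worker_shm.close()    # workers close their handle but never unlink

result = parent_view.copy()
del parent_view
shm.close()
shm.unlink()          # only the creator unlinks; a skipped unlink leaks the segment

print(result[1])      # [7. 7. 7. 7. 7.]
```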