Troubleshooting with ptq #167
Hi m! What kind of task are you running? Can you let me know what its parameters are? Is your CPU cooking, or is your network or disk working hard?
Hi Will, notable parameters used in the xfer task:
I'm utilizing 36 CPU cores. The disk has indeed been unstable at times, although both CPU and network seem fine to me. Any suggestions on how to identify the cause?
How's your memory usage? Sharded transfers can potentially use a lot of memory. If the machine starts swapping, you would see low network and CPU utilization but odd access patterns on the disk. Try setting the memory parameter lower (this creates more shards but makes each task smaller).
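One way to check the swapping hypothesis on a Linux node is to compare SwapTotal and SwapFree in /proc/meminfo (htop's swap bar reports the same figure). This is a minimal sketch assuming Linux. parse_meminfo and swap_in_use are hypothetical helper names, not part of igneous or ptq.

```python
import os


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of values in bytes (or raw counts)."""
    info = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, rest = line.split(":", 1)
        parts = rest.split()
        if not parts:
            continue
        value = int(parts[0])
        # Most fields are reported in "kB", which the kernel means as KiB.
        if len(parts) > 1 and parts[1] == "kB":
            value *= 1024
        info[key.strip()] = value
    return info


def swap_in_use(info):
    """Bytes of swap currently in use."""
    return info.get("SwapTotal", 0) - info.get("SwapFree", 0)


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as f:
        info = parse_meminfo(f.read())
    gib = 1024 ** 3
    print(f"MemAvailable: {info.get('MemAvailable', 0) / gib:.1f} GiB")
    print(f"Swap in use:  {swap_in_use(info) / gib:.1f} GiB")
```

Run this periodically while the tasks execute. If swap in use keeps growing, the node is paging.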
I'm using a compute node with 36 CPU cores and 1 TB of memory. I understand the default is 3.5 GB. Should I set it much lower than the default, maybe 1 GB (--memory 1000000000)?
Hmm... 1 TB should be more than enough. Can you check how much RAM is being used?
I reset the queue and am running it again. According to the htop summary, total memory usage is currently fluctuating between 130 GB and 150 GB and gradually increasing.
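Here is a rough back-of-the-envelope check. It assumes the default --memory budget of 3.5 GB was kept and that each of the 36 parallel workers (one per core) holds about that much at its peak. This is only an approximation, because the parameter is used to size tasks and actual usage can differ:

$$
36 \times 3.5\ \text{GB} = 126\ \text{GB}
$$

That falls in the same range as the observed 130 to 150 GB. It suggests memory use was roughly as expected and nowhere near the 1 TB available.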
If it gets stuck again and RAM isn't the problem, one thing you can try is turning off parallel execution and seeing whether the tasks run.
Hi @william-silversmith, I tried running without
The htop command shows two processes: one stays at 0.0% CPU and 0.1% MEM, and the other doesn't seem to be using any resources. Both show "S", which seems to mean sleeping? -m
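For reference, "S" in htop's state column means interruptible sleep. The process is blocked waiting on something (a lock, a socket, a timer, or I/O) rather than using CPU. A process stuck on local disk I/O often shows "D" (uninterruptible sleep) instead. A process waiting on a network request, for example to a remote storage service, usually shows "S". The sketch below assumes a Linux node and reads the same state letter htop displays from /proc/<pid>/stat. parse_stat_state, process_state and STATE_NAMES are hypothetical helpers, not part of igneous or ptq.

```python
import os

# Common single-letter states Linux reports in /proc/<pid>/stat (see proc(5)).
STATE_NAMES = {
    "R": "running",
    "S": "sleeping (interruptible wait)",
    "D": "uninterruptible sleep (usually waiting on I/O)",
    "Z": "zombie",
    "T": "stopped",
}


def parse_stat_state(stat_text):
    """Return the state letter from the contents of /proc/<pid>/stat."""
    # Field 2 is the command name in parentheses and may itself contain
    # spaces or ')', so split after the LAST ')' before reading field 3.
    after_comm = stat_text[stat_text.rindex(")") + 1:]
    return after_comm.split()[0]


def process_state(pid):
    """Read the current state letter of process `pid`."""
    with open(f"/proc/{pid}/stat") as f:
        return parse_stat_state(f.read())


if __name__ == "__main__" and os.path.exists("/proc/self/stat"):
    # Replace os.getpid() with the PID of a stuck worker shown in htop.
    state = process_state(os.getpid())
    print(state, STATE_NAMES.get(state, "other"))
```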
I'm testing whether the storage is the problem by switching to a different storage system.
This is a good strategy! Let me know how it goes.
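Before switching storage entirely, a quick probe can show whether writes to a given mounted path are slow or hang outright. This is a minimal sketch, not an igneous or ptq feature, and it assumes the storage is reachable as a filesystem directory. probe_storage is a hypothetical helper. It writes a temporary file into the target directory, fsyncs it, reads it back, and stops waiting after a timeout. The I/O runs in a daemon thread because a thread blocked in a system call cannot be cancelled. The probe simply gives up on it.

```python
import os
import tempfile
import threading
import time


def _write_and_read(directory, size):
    """Write `size` random bytes into `directory`, fsync, read back; return seconds taken."""
    payload = os.urandom(size)
    start = time.monotonic()
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(payload)
            f.flush()
            os.fsync(f.fileno())  # force the data out to the storage device
        with open(path, "rb") as f:
            if f.read() != payload:
                raise IOError(f"read-back mismatch in {path}")
    finally:
        os.remove(path)
    return time.monotonic() - start


def probe_storage(directory, size=16 * 1024 * 1024, timeout=60.0):
    """Return seconds taken, or None if the probe is still blocked after `timeout`."""
    result = {}

    def worker():
        try:
            result["seconds"] = _write_and_read(directory, size)
        except Exception as exc:  # hand I/O errors back to the caller
            result["error"] = exc

    t = threading.Thread(target=worker, daemon=True)
    t.start()
    t.join(timeout)
    if t.is_alive():
        return None  # still blocked: the storage looks hung
    if "error" in result:
        raise result["error"]
    return result["seconds"]


if __name__ == "__main__":
    # Hypothetical target: replace with the storage mount under suspicion.
    target = tempfile.gettempdir()
    seconds = probe_storage(target)
    if seconds is None:
        print(f"{target}: no response within the timeout")
    else:
        print(f"{target}: 16 MiB write+fsync+read in {seconds:.2f} s")
```

The read-back is usually served from the page cache, so this mainly measures write plus fsync latency. Running it against both storage systems and comparing timings (or a None result) gives a quick signal.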
Okay, so this definitely had something to do with the storage. After switching to a different storage system, I no longer see igneous hang.
Hi @william-silversmith,
My igneous execution gets stuck at one point and does not seem to progress, and I don't see any notable logs or output. Could you guide me on how to troubleshoot what is causing the issue?
Thanks,
-m