
Node becomes 'Disconnected from Node' #12

Open
chilio opened this issue Jan 26, 2023 · 21 comments

Comments

@chilio

chilio commented Jan 26, 2023

This happens while moving plots over a 1 Gbit LAN.
Furthermore, after the file transfer completes, the node does not recover from this state and hangs here:
[screenshot]

@asdasdasdzxcgreter

Like you, I can only use v0.9.5 for now.

@chilio
Author

chilio commented Jan 26, 2023

@madMAx43v3r Actually, I just noticed that this is a real problem even when no files are being copied over the network.
In my case it happens on Windows 10 with MMX Node v0.9.7.

@chilio
Author

chilio commented Jan 26, 2023

@asdasdasdzxcgreter I have the same problem on v0.9.5.
When I disable GPU compressed plots, both versions work.

@madMAx43v3r
Owner

Can you see any output in the terminal?

@chilio
Author

chilio commented Jan 28, 2023

@madMAx43v3r I'm running this on Windows, so there's no terminal output. But when the node goes disconnected (with the red emoji), the logs stop as well: no error is reported and no further log entries are generated...

@madMAx43v3r
Owner

Yeah, it sounds like the node crashed...

If you go to Settings on the left side, there is an option to enable the terminal; maybe we can see something there.

@asdasdasdzxcgreter

@madMAx43v3r Maybe the cause has been found: if a disk contains both original (uncompressed) and compressed plots, an error is reported~

@chilio
Author

chilio commented Jan 30, 2023

@madMAx43v3r Unfortunately, it is not available in my case...
[screenshot]

@madMAx43v3r
Owner

You have to triple-click on "Debug" to activate it.

@chilio
Author

chilio commented Jan 30, 2023

@madMAx43v3r Here's the result after running for 10 minutes with a directory of 56 compressed plots attached (3 × c1, 3 × c8, 50 × c9):
[screenshot]
Without compressed plots, the same installation had been working without issues for the last few days.
System specs: Windows 10 x64, Ryzen 5 3500, 32 GB RAM, RTX 3080 Ti.

@madMAx43v3r
Owner

Are you plotting at the same time? Sounds like not enough VRAM...

@madMAx43v3r
Owner

Also, C9 of what k size? I think k32 C9 is very close to 8 GB of VRAM.

@chilio
Author

chilio commented Jan 31, 2023

I'm NOT plotting on this machine, and this RTX 3080 Ti has 12 GB of VRAM.
All compressed plots are k32.

@madMAx43v3r
Owner

Can you monitor RAM and VRAM usage in Task Manager and see whether either of them is filling up?

@chilio
Author

chilio commented Jan 31, 2023

Yes, this was the moment the node got disconnected:
[screenshot]

@chilio
Author

chilio commented Feb 2, 2023

When I attach c9 k32 plots, the error occurs pretty fast, within 1 to 3 minutes.
When I add these compressed plots instead (77 × c7 k32, 3 × c8 k32 and 3 × c1 k32), the node works and no longer crashes.
Tested with v0.9.8.

@chilio
Author

chilio commented Feb 2, 2023

Update: it actually crashed 2 hours later.
@madMAx43v3r Any ideas? What else might be wrong?

@madMAx43v3r
Owner

There's a known issue on Windows where the node can crash after some time due to a memory leak. Is that what you are observing?

@chilio
Author

chilio commented Feb 2, 2023

It might be, but none of the system components are hitting or even approaching their memory limits...

@chilio
Author

chilio commented Feb 2, 2023

For example, if I include one drive with c9 k32 plots, the node gets disconnected quickly, within about 30 seconds...
And Task Manager does not show anything new beyond what I've already posted in this thread.

@chilio
Author

chilio commented Feb 9, 2023

@madMAx43v3r It seems like the issue is resolved in v0.9.9.
From the logs, I assume there were 2 problems:

  1. An initial VRAM spike on node start
  2. A system RAM configuration that was insufficient to handle c9 plots


3 participants