
Potential enhancements to cuda-plot and plot-sink combo #22

Open
Jacek-ghub opened this issue Feb 1, 2023 · 5 comments

Comments

@Jacek-ghub

I started plotting last night to 4x 10TB drives. Around 5TB in, plot-sink crashed. The plotter, however, was fine: it was waiting for free space on the NVMe and complaining that it could not talk to any plot-sink.

The box I am using is a Dell T7610 running Ubuntu 22.10, with dual Xeon E5-2600 v2 sockets, currently depopulated to just one CPU and 256 GB RAM (to save on power draw, as these are E5-2695 v2s). (I will be getting a couple of low-power v2 CPUs later this week.) On the GPU side I have an RTX 3060 Ti.

The initial disk transfer speed (yesterday) was ~200 MBps, so at most three HDDs were in use at a time. Today, after I got a new plot-sink running (without stopping the plotter or rebooting the box), the drives were writing at around 100 MBps; now that the NVMe has been cleared they are closer to 120-130 MBps (still rather low).

Unfortunately, I don't know what triggered the plot-sink crash, and I am not sure (yet?) why the write speed dropped that much (those are WD Red Pro drives). One culprit could be the NVMe, but I cannot say for sure.

I would like to ask you to consider two enhancements for cuda-plotter:

  1. Be able to specify two temp folders (e.g., -t /mnt/nvme1/ -t /mnt/nvme2/). That could potentially take the NVMes completely out of the critical path of writing/reading plots.
  2. Use extra RAM (if available) as a first-level buffer before writing finished plots to the NVMe(s) or passing them directly to plot-sink (taking the NVMe out of the loop). On a box with 512 GB RAM, that could eliminate at least 75% of NVMe reads/writes, both saving the NVMes and removing extra I/O cycles.
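A minimal Python sketch of what enhancement 1 could look like: finished plots are rotated across several temp directories, and any directory without room for another plot is skipped. The class `RoundRobinTmp` and its `free_bytes` hook are hypothetical illustrations, not part of cuda-plot; free space defaults to `shutil.disk_usage()`.

```python
import shutil
from typing import Callable, Optional, Sequence


class RoundRobinTmp:
    """Hypothetical chooser that rotates finished plots across several -t dirs."""

    def __init__(self, dirs: Sequence[str],
                 free_bytes: Callable[[str], int] = lambda d: shutil.disk_usage(d).free):
        self.dirs = list(dirs)
        self.free_bytes = free_bytes  # injectable so the logic can be tested without disks
        self._next = 0                # index of the dir to try first next time

    def pick(self, plot_size: int) -> Optional[str]:
        # Try each directory once, starting after the one used last time.
        for i in range(len(self.dirs)):
            idx = (self._next + i) % len(self.dirs)
            if self.free_bytes(self.dirs[idx]) >= plot_size:
                self._next = (idx + 1) % len(self.dirs)
                return self.dirs[idx]
        return None  # every dir is full: the plotter would have to wait
```

Alternating between the two NVMes this way spreads writes evenly, which is roughly what a RAID 0 of the same drives would achieve at the block level.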
@madMAx43v3r
Owner

  1. The proper solution would be to RAID 0 those NVMes and/or partition the drive to leave 25% of free space
  2. You can already do this by using a ramdisk for -t

@madMAx43v3r
Owner

Also keep in mind HDDs get slower as they fill up

@Jacek-ghub
Author

Agreed on RAID0, if the sole intention is to use 2 NVMes. However, my thinking was that I would like to do something like this:

plot ... -t /mnt/ram -t /mnt/nvme ...

where /mnt/ram would be prioritized, and /mnt/nvme would be used more or less as an overflow buffer when there is not enough space in /mnt/ram. When the drives are mostly empty, there is a chance that the NVMe would not be used at all; it would start kicking in as the drives slow down while filling up. The disadvantage of this method is that the RAM has to be dedicated a priori and sized as a whole multiple of the plot size (n × plot size).
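A minimal sketch of the prioritized variant, assuming the `-t` directories are tried strictly in the order given (first fit), so the NVMe only receives a plot once the ramdisk is full. `pick_tmp_dir` and `plots_that_fit` are hypothetical helpers, not cuda-plot code; the second one shows the a-priori sizing drawback, since a buffer can only hold whole plots.

```python
import shutil
from typing import Callable, Optional, Sequence


def plots_that_fit(capacity_bytes: int, plot_bytes: int) -> int:
    """How many whole plots a dedicated buffer (e.g. a ramdisk) can hold."""
    return capacity_bytes // plot_bytes


def pick_tmp_dir(dirs: Sequence[str], plot_bytes: int,
                 free_bytes: Callable[[str], int] = lambda d: shutil.disk_usage(d).free
                 ) -> Optional[str]:
    """Return the first dir, in priority order, with room for one more plot.

    With dirs = ["/mnt/ram", "/mnt/nvme"], the NVMe is used only when the
    ramdisk cannot take another plot, i.e. it acts as an overflow buffer.
    """
    for d in dirs:
        if free_bytes(d) >= plot_bytes:
            return d
    return None  # no space anywhere: wait for the sink to drain a plot
```

For example, with a hypothetical 100 GiB plot, a 256 GiB ramdisk holds only 2 plots and leaves 56 GiB idle, which is the "n × plot size" cost mentioned above.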

That said, I am not sure how you set up the connection between the plotter and plot-sink (whether there is a distinction between a local and a remote plot-sink). I would imagine that for a local plot-sink, the plotter could pass just the file name. For a remote plot-sink, however, the plotter needs to feed it the plot data. In that case, the plotter could assemble the plot in RAM (if available) and feed the remote plot-sink with chunks that are removed from RAM as soon as they are sent. If there is not enough RAM to hold those chunked plots, the NVMe (the second -t) would back up the RAM as an overflow buffer. This would also work for a local plot-sink.
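A minimal sketch of the RAM-first, NVMe-overflow chunk buffer described above, assuming a plot is first assembled chunk by chunk and then drained in order to the sink. `SpillingChunkBuffer`, `ram_budget` and `spill_dir` are hypothetical names; for simplicity it is single-use per plot and does not support adding while draining.

```python
import os
import tempfile
from collections import deque
from typing import Deque, Iterator


class SpillingChunkBuffer:
    """Hypothetical FIFO of plot chunks: RAM first, a spill file on NVMe after.

    Chunks stay in memory until ram_budget bytes are used; further chunks are
    appended to a temporary spill file in spill_dir. drain() yields chunks in
    the order they were added and drops each one from RAM once yielded.
    """

    def __init__(self, ram_budget: int, spill_dir: str):
        self.ram_budget = ram_budget
        self.ram_used = 0
        self.ram: Deque[bytes] = deque()
        self.spill = tempfile.TemporaryFile(dir=spill_dir)
        self.spill_sizes: Deque[int] = deque()  # sizes of chunks in the spill file

    def add(self, chunk: bytes) -> None:
        # Once anything has spilled, keep spilling so chunk order is preserved.
        if not self.spill_sizes and self.ram_used + len(chunk) <= self.ram_budget:
            self.ram.append(chunk)
            self.ram_used += len(chunk)
        else:
            self.spill.seek(0, os.SEEK_END)
            self.spill.write(chunk)
            self.spill_sizes.append(len(chunk))

    def drain(self) -> Iterator[bytes]:
        # RAM chunks come first; each is released as soon as it is yielded.
        while self.ram:
            chunk = self.ram.popleft()
            self.ram_used -= len(chunk)
            yield chunk
        # Then replay the overflow from the spill file, in insertion order.
        self.spill.seek(0)
        while self.spill_sizes:
            yield self.spill.read(self.spill_sizes.popleft())
        self.spill.close()
```

In a real sender each yielded chunk would be written to the connection, e.g. with `socket.sendall()`; the actual plotter/plot-sink wire format is not described in this thread and is not modeled here.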

@madMAx43v3r
Copy link
Owner

Local plot sink is the same as remote, plot data is sent via TCP.

Having multiple -t is possible but quite a bit of work...

@Jacek-ghub
Author

Jacek-ghub commented Feb 2, 2023

Hey, you have the best plotter out there from a technical point of view. However, I have noticed that for a lot of people, their knowledge of it stops at the MMX UI not having a dark mode (or you not using it at home). Someone else was asking how to use the -w flag to slow the plotter on his Epyc down to a single disk's speed.

This is a potentially useful feature only for those who have extra RAM and want to save on NVMe wear (so maybe a small percentage of people, but potentially a large percentage of generated plots). However, it could also be seen as one more checkbox feature by those Epyc / dark-mode focused people when comparing it to what else is out there. :)

> Local plot sink is the same as remote, plot data is sent via TCP.

OK, so this is the reason to have plot-sink and plot-copy.

Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants