Potential enhancements to cuda-plot and plot-sink combo #22

Jacek-ghub · 2023-02-01T02:37:18Z

I started plotting last night to 4x 10TB drives. Around 5TB in, plot-sink crashed. However, the plotter was fine: it was waiting for free space on the NVMe and complaining that it could not connect to any plot-sink.

The box I am using is a Dell T7610 (dual-socket Xeon E5-2600 v2) running Ubuntu 22.10, currently depopulated to just one CPU and 256 GB RAM to save on power draw, as I have E5-2695 v2 CPUs. (I will be getting a couple of low-power v2 CPUs later this week.) On the GPU side I have an RTX 3060 Ti.

The initial disk transfer speed (yesterday) was ~200 MBps, so at most 3 HDDs were in use at a time. Today, after I got a new plot-sink running (without stopping the plotter or rebooting the box), the speeds were around 100 MBps; now that the NVMe has been cleared, they are closer to 120-130 MBps (still rather low).
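One way to read the "at most 3 HDDs" estimate is as a steady-state approximation (an assumption, not something measured in this thread): if the plotter emits finished plot data at an average rate $R_{\text{plot}}$ and each HDD sustains a write speed $r_{\text{disk}}$, the number of drives busy at the same time is roughly

$$
N_{\text{busy}} \approx \left\lceil \frac{R_{\text{plot}}}{r_{\text{disk}}} \right\rceil
$$

For the same plotter output, a drop in per-drive speed from ~200 MBps toward 100-130 MBps raises $N_{\text{busy}}$ proportionally; once it exceeds the number of drives, finished plots start piling up in the temp directory.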

Unfortunately, I don't know what triggered the plot-sink crash, and I am not sure (yet?) why the write speed dropped that much (these are WD Red Pro drives). One culprit could be the NVMe, but I cannot say for sure.

I would like to ask you to consider two enhancements for cuda-plotter:

1. Allow specifying two temp folders (e.g., -t /mnt/nvme1/ -t /mnt/nvme2/). This way, the NVMe drives could potentially be taken out of the critical path of writing/reading plots entirely.
2. Use extra RAM (if available) as a first-level buffer before writing finished plots to the NVMe(s) or passing them directly to plot-sink (so the NVMe is taken out of the loop). Assuming a box with 512 GB RAM, this could potentially eliminate at least 75% of NVMe reads/writes, both reducing NVMe wear and removing extra I/O cycles.

madMAx43v3r · 2023-02-02T18:20:02Z

The proper solution would be to RAID 0 those NVMe drives and/or partition the drive to leave 25% free space.
You can already do this by using a ramdisk for -t.

madMAx43v3r · 2023-02-02T18:21:20Z

Also keep in mind that HDDs get slower as they fill up.

Jacek-ghub · 2023-02-02T18:54:37Z

Agreed on RAID 0 if the sole intention is to use 2 NVMes. However, what I had in mind was something like this:

plot ... -t /mnt/ram -t /mnt/nvme ...

where /mnt/ram would be prioritized and /mnt/nvme would act more or less as an overflow buffer when there is not enough space in /mnt/ram. When the drives are mostly empty, there is a chance that the NVMe would not be used at all; it would start kicking in once the drives slow down as they fill up. The disadvantage of this method is that the RAM has to be dedicated a priori and sized to hold a whole number of plots (n × plot size).
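A minimal sketch of the prioritized -t idea, assuming the plotter checks each temp directory in the order given and writes a finished plot to the first one with room for it. `TempTier`, `pick_tier`, `reserve`, `release` and `tiers_from_paths` are hypothetical names for illustration, not part of the plotter; `plot_size` stands for whatever a finished plot occupies at the chosen settings.

```python
import shutil
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TempTier:
    path: str        # e.g. "/mnt/ram" or "/mnt/nvme" (hypothetical mount points)
    free_bytes: int  # space currently available on this tier


def tiers_from_paths(paths: List[str]) -> List[TempTier]:
    """Build tiers in the order given (first = highest priority) from live free space."""
    return [TempTier(p, shutil.disk_usage(p).free) for p in paths]


def pick_tier(tiers: List[TempTier], plot_size: int) -> Optional[TempTier]:
    """Return the highest-priority tier with room for one more plot, else None.

    None means every tier is full and the plotter would have to wait
    for the sink to drain a finished plot.
    """
    for tier in tiers:
        if tier.free_bytes >= plot_size:
            return tier
    return None


def reserve(tier: TempTier, plot_size: int) -> None:
    """Account for a finished plot being written to this tier."""
    tier.free_bytes -= plot_size


def release(tier: TempTier, plot_size: int) -> None:
    """Account for a plot that the sink has finished copying off this tier."""
    tier.free_bytes += plot_size
```

With room for two plots in /mnt/ram, the first two plots land there and later ones overflow to /mnt/nvme until the sink drains the RAM tier; a `None` result corresponds to the plotter waiting for free space, as it did when plot-sink crashed.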

That said, I am not sure how you set up the connection between the plotter and plot-sink (whether there is a distinction between a local and a remote plot-sink). I would imagine that for a local plot-sink, the plotter could pass just the file name to plot-sink, whereas for a remote plot-sink, the plotter needs to feed it the plot data. In that case, the plotter could assemble the plot in RAM (if available) and feed the remote plot-sink in chunks, each removed from the RAM-assembled plot as soon as it is sent. If there is not enough RAM to hold those chunked plots, the NVMe (the second -t) would back up the RAM as an overflow buffer. This would also work for a local plot-sink.
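A rough sketch of the RAM-first buffer with NVMe overflow, using Python's `tempfile.SpooledTemporaryFile`, which keeps data in memory until it exceeds `max_size` and then rolls over to a real file in `dir`. `buffer_plot`, `stream_to_sink` and the 1 MiB chunk size are hypothetical. One simplification: the spooled buffer releases its memory when it is closed after the last chunk, not chunk by chunk as described above.

```python
import tempfile
from typing import Callable, Iterable

CHUNK = 1 << 20  # 1 MiB per send; hypothetical chunk size


def buffer_plot(data_chunks: Iterable[bytes], ram_limit: int, overflow_dir: str):
    """Assemble a plot in RAM; spill to a file in overflow_dir once it exceeds ram_limit bytes."""
    buf = tempfile.SpooledTemporaryFile(max_size=ram_limit, dir=overflow_dir)
    for chunk in data_chunks:
        buf.write(chunk)
    buf.seek(0)  # rewind so the sender reads from the start
    return buf


def stream_to_sink(buf, send: Callable[[bytes], None], chunk_size: int = CHUNK) -> int:
    """Send the buffered plot in fixed-size chunks, then close (and free) the buffer."""
    total = 0
    with buf:  # closing drops the in-memory copy or deletes the overflow file
        while True:
            chunk = buf.read(chunk_size)
            if not chunk:
                break
            send(chunk)
            total += len(chunk)
    return total
```

For a sink reached over TCP, `send` could be a connected socket's `sendall` method; the actual framing the plotter uses is not described in this thread.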

madMAx43v3r · 2023-02-02T23:01:38Z

Local plot sink is the same as remote, plot data is sent via TCP.

Having multiple -t is possible but quite a bit of work...

Jacek-ghub · 2023-02-02T23:12:47Z

Hey, you have the best plotter out there from a technical point of view. However, I have noticed that for a lot of people, their knowledge of it stops at the MMX UI not having a dark mode (or you not using it at home). Someone else was asking how to use the -w flag to slow the plotter on his Epyc down to a single disk's speed.

This is a potentially useful feature only for those who have extra RAM and want to reduce NVMe wear (so maybe a small percentage of people, but potentially a large percentage of generated plots). However, it could also be seen as one more checkbox feature by those Epyc/dark-mode-focused people when comparing against what is out there. :)

> Local plot sink is the same as remote, plot data is sent via TCP.

OK, so this is the reason to have plot-sink and plot-copy.

Jacek-ghub mentioned this issue Feb 7, 2023

[Fixed] mmx-cuda-plotter doesn't back off when new plot is finished, but rather starts a second plot xfr per HD #31

Closed
