Optimization Roundup - Discuss <=12GB VRAM Optimization/settings here! #77
-
So, I have a PR here: AUTOMATIC1111/stable-diffusion-webui#4527, which would let us use "accelerate launch" to run the app. I think that might be the secret sauce to getting this working on 8GB of VRAM. You also want to disable training the text encoder. I still need to test this on my 8GB GPU to see if it's possible, but if Shivam's repo can run on 8GB, then this should work too. Option B is "Use CPU", which will work, but very slowly.
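For anyone who wants to see what "disable training the text encoder" amounts to, here is a minimal sketch of a Dreambooth-style setup under Accelerate, assuming a diffusers-format checkpoint; the model id and hyperparameters are illustrative, not the extension's actual code:

```python
# Minimal sketch (illustrative, not the extension's code): fp16 training via
# Accelerate with the text encoder frozen so only the UNet gets gradients.
import torch
from accelerate import Accelerator
from diffusers import UNet2DConditionModel
from transformers import CLIPTextModel

accelerator = Accelerator(mixed_precision="fp16")

model_id = "runwayml/stable-diffusion-v1-5"  # assumed diffusers-format base model
unet = UNet2DConditionModel.from_pretrained(model_id, subfolder="unet")
text_encoder = CLIPTextModel.from_pretrained(model_id, subfolder="text_encoder")

text_encoder.requires_grad_(False)  # "disable training the text encoder"
optimizer = torch.optim.AdamW(unet.parameters(), lr=2e-6)

# Only the UNet and its optimizer are handed to Accelerate for training;
# the frozen text encoder is just moved to the device for inference.
unet, optimizer = accelerator.prepare(unet, optimizer)
text_encoder.to(accelerator.device)
```

Run the training script with `accelerate launch` instead of plain `python`, and Accelerate picks up device placement and mixed precision from its config.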
-
Related to:
-
I have a 3060 12GB card (and only 12GB of system RAM, with 25GB set aside for my paging file), and I successfully ran Dreambooth training this morning for the first time. Here were my settings under Advanced: I am running xformers. I was even able to use a classification dataset and txt files for labels on my dataset. I did not save any checkpoints or generate any previews during training (I'm not sure whether that would cause an OOM, so more testing is needed to see if I could enable them). Let me know if you have any questions about my setup, and I'll try to help.
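For reference, the xformers setting mentioned above corresponds roughly to memory-efficient attention in diffusers; a hedged sketch, assuming a diffusers version that exposes the method and the xformers package installed (the model id is only an example):

```python
# Hedged sketch: enabling xformers memory-efficient attention on the UNet.
from diffusers import UNet2DConditionModel

unet = UNet2DConditionModel.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="unet"  # illustrative model id
)
unet.enable_xformers_memory_efficient_attention()  # needs xformers installed
```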
-
I have a GPU with 4GB of VRAM, and I would like to know if it's possible to train the model on CPU only, because despite checking "use CPU only", I get a CUDA out of memory error when it's time to make the ckpt file. Is this intended, or is it an issue? I just want to know so that I don't keep trying if there is no way :)
-
I was able to train on a 10GB 3080. Documentation of the steps and settings I used can be found here: #84 (comment). It seems that generating a checkpoint every N steps and a preview image every N steps does not clean up memory as thoroughly as the "final" preview image after training ends. So training 100 steps, then another 100 steps, then another 100 works, but getting a preview every 100 steps does not (OOM crash after the first checkpoint).
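If the difference really is how thoroughly the mid-training previews clean up after themselves, the cleanup the "final" preview presumably gets would look something like this between chunks; a hedged sketch of the general pattern, not the extension's code (`preview_pipe` is a stand-in for whatever temporary sampling pipeline made the image):

```python
import gc
import torch

# ...after generating a mid-training preview with a temporary pipeline...
preview_pipe = None       # drop the only reference to the sampling pipeline
gc.collect()              # let Python actually free the model tensors
torch.cuda.empty_cache()  # hand cached-but-unused blocks back to the driver
```

Note that `torch.cuda.empty_cache()` only releases memory nothing references anymore, so a preview pipeline kept alive between steps would still hold its VRAM.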
-
No, I disabled the bit that unloaded optimizations before training.
On Sat, Nov 12, 2022, 11:53 AM Jonseed wrote:
> @drnagel glad you got it working! I wasn't sure if xformers would make a difference. I thought such optimizations were unloaded during training, but maybe not.
-
Something seems off. Before the "Generate ckpt" button was added, I managed to run Dreambooth on a 3080 10GB on Linux Mint (Nvidia drivers 520.56.06, CUDA version 11.8) with the recommended VRAM optimizations listed above. Training completed with 1000 steps but crashed with an OOM at that point, and no ckpt file was generated. Since I saw the button added, I can no longer even begin training; it goes straight out of VRAM. I've tried all the tips I've seen here, to no avail. Edit to add settings:
Version info from starting Stable Diffusion: Launching Web UI with arguments: --autolaunch --xformers --ckpt-dir /evo/sd-resources/checkpoints
-
By design, the extension should clear all prior VRAM usage before training, and then restore SD back to "normal" when training is complete. Similarly, someone somewhere was talking about killing their web browser to save VRAM, but I think the VRAM used for things like browser and desktop windows comes from "shared" VRAM space, while things like Torch computations live in "reserved" VRAM. In a nutshell, unless you're running something like video editing software, CAD, or a 3D video game, I don't think other running software will have an effect on OOM. Of course, I'm also not a hardware engineer, so I could be completely wrong.
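One way to check that intuition is simply to ask Torch what it is actually holding; a quick sketch:

```python
import torch

# How much this process has allocated vs. what the caching allocator has
# reserved from the driver, plus the card's total capacity (all in GiB).
allocated = torch.cuda.memory_allocated() / 2**30
reserved = torch.cuda.memory_reserved() / 2**30
total = torch.cuda.get_device_properties(0).total_memory / 2**30
print(f"allocated {allocated:.2f} / reserved {reserved:.2f} / total {total:.2f} GiB")
```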
-
Does anyone have experience with Colossal AI? https://github.com/hpcaitech/ColossalAI/tree/main/examples/images/diffusion
-
Related to #124. On that note, I've added a "wizard" button that attempts to set the "optimal" settings based on the total amount of available VRAM. It's not perfect, but if anybody wants to contribute their "VRAM Total" and the settings they used to train without OOM, it would help me help others set params to avoid the most common issue with this bit of software. :D
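The idea behind such a wizard is roughly the following; the thresholds and setting names here are invented for illustration and are not the extension's actual defaults:

```python
import torch

def suggest_settings():
    """Pick conservative training settings from total VRAM (illustrative thresholds only)."""
    total_gib = torch.cuda.get_device_properties(0).total_memory / 2**30
    if total_gib >= 24:
        return {"use_8bit_adam": False, "mixed_precision": "no", "cache_latents": True}
    if total_gib >= 12:
        return {"use_8bit_adam": True, "mixed_precision": "fp16", "cache_latents": True}
    # 8-10GB cards: every memory optimization on, nothing cached in VRAM.
    return {"use_8bit_adam": True, "mixed_precision": "fp16", "cache_latents": False}
```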
-
I'd be interested in the EMA training option if it makes the model better. Is there any chance it will fit on <=12GB VRAM cards? I'm going over the limit by about 512 MB, so I'm waiting to see if it gets optimized.
-
Hot off the presses!
-
Something that is not in the hundreds. When I had the simpler version of the training, I could start and continue any number of training sessions without xformers. This is what I saw at the end of the training:
Training complete?? Cleanup Complete.
Steps: 100%|███████| 1000/1000 [23:12<00:00, 1.39s/it, loss=0.245, lr=1.22e-7]
-
My five cents on how I've managed to run it on a 3060 Ti with 8GB VRAM.
To sum up, here is a step-by-step guide that worked in my case:
-
First, big thanks to all contributors! You have made something impressive!
I hope you can optimize it a tiny bit more. I have a 2080 Super with 8GB VRAM and almost made it on Win11:
"Tried to allocate 58.00 MiB (GPU 0; 8.00 GiB total capacity;" :(
--xformers
[v] Don't cache latents
[v] Use 8bit Adam
Mixed Precision: tested both fp16 and "none"
Any chance?
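For context, the "8bit Adam" and fp16 options above map roughly onto bitsandbytes and Torch autocast; a hedged sketch where `unet`, `compute_loss`, and `batch` stand in for whatever the training loop provides:

```python
import bitsandbytes as bnb
import torch

# Store the Adam optimizer states in 8 bits instead of 32, shrinking their VRAM footprint.
optimizer = bnb.optim.AdamW8bit(unet.parameters(), lr=2e-6)

scaler = torch.cuda.amp.GradScaler()
with torch.autocast("cuda", dtype=torch.float16):  # fp16 mixed precision
    loss = compute_loss(batch)                      # placeholder loss computation
scaler.scale(loss).backward()
scaler.step(optimizer)
scaler.update()
optimizer.zero_grad()
```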