-
Awesome stuff, as always!! By the way, there seems to be a paper on pivotal tuning, but with full rank, from Adobe, if you are interested.
-
Just throwing out a bunch of ideas regarding inversion. That paper has some very good things to learn from.
-
Updated figure to include Custom Diffusion.
-
I just discovered a method called hypernetworks. Anyone have a reference for this? @cloneofsimo?
-
@brian6091 Your detailed exploration and documentation of these things always makes me smile (even if I only understand a rough third of it currently). In addition to hypernetworks, would Aesthetic Gradients training also be in this family?
-
@cloneofsimo's LoRA method has massively accelerated the already fast-moving world of diffusion model fine-tuning. In an attempt to maintain my sanity while keeping up with the different methods and their combinations, I ended up "barcoding" the methods and their variants.
I thought I'd post a graphic here to show how Textual Inversion, Dreambooth, LoRA, pivotal tuning, etc. differ from each other. Nothing new here for the experts, but I think it clarifies the relationships between the fine-tuning methods that have been proposed. And perhaps more interestingly, it reveals some that haven't yet been explored (and maybe they should be!).
Most fine-tuning of Stable Diffusion models comes down to whether we modify 1) the tokenizer, 2) the text encoder, or 3) the Unet. Prior-class regularization is a fourth element, but it adds different data rather than modifying existing model components. For the text encoder and the Unet, we can further distinguish between training the transformer attention layers or all the rest (everything that is not transformer attention). Finally, we can add LoRA on top.
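To make that grouping concrete, here is a minimal sketch (assuming diffusers and transformers are installed, with an SD 1.x checkpoint as an example) of how the components above map onto parameter groups you could freeze or train. The module-name filters ("attentions", "self_attn") are heuristics I picked for illustration, not an official API, and the model ID is just an example:

```python
import torch
from diffusers import UNet2DConditionModel
from transformers import CLIPTextModel

model_id = "runwayml/stable-diffusion-v1-5"  # example SD 1.x checkpoint
unet = UNet2DConditionModel.from_pretrained(model_id, subfolder="unet")
text_encoder = CLIPTextModel.from_pretrained(model_id, subfolder="text_encoder")

def split_params(model, attn_keyword):
    """Split parameters into 'transformer attention' vs. 'everything else' by module name."""
    attn, rest = [], []
    for name, param in model.named_parameters():
        (attn if attn_keyword in name else rest).append(param)
    return attn, rest

# Name-based heuristics: Unet attention lives in "attentions" blocks,
# CLIP text-encoder attention in "self_attn" modules.
unet_attn, unet_rest = split_params(unet, "attentions")
te_attn, te_rest = split_params(text_encoder, "self_attn")

# Example variant: train only the Unet attention layers. Freeze everything,
# then re-enable just that group and hand it to the optimizer.
for p in list(unet.parameters()) + list(text_encoder.parameters()):
    p.requires_grad_(False)
for p in unet_attn:
    p.requires_grad_(True)

optimizer = torch.optim.AdamW(unet_attn, lr=1e-5)
```

Each row in the figure is essentially a different choice of which of these groups gets trained directly, which gets LoRA adapters, and whether new tokens and prior-class data are added.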
Below I represent each method with a row of symbols (Fine-tuning code layout in figure), to which we can apply various modifications of the different model components (Legend in figure). What follows are strategies based on Dreambooth and Textual Inversion, as well as several that @cloneofsimo has highlighted in this repo (e.g., LoRA X Textual Inversion w/ pivotal tuning).
I included some new variants in the bottom box (new to me at least). Personally, I find the last two the most elegant! I'm testing some of these out now, and I'll update the post when the models finish training...
As always, looking forward to discussion, corrections, and different points of view.
full-size image here
References