-
Awesome stuff, as always!! By the way, there seems to be a paper on pivotal tuning, but with full rank, from Adobe, if you are interested.
-
Just throwing out a bunch of ideas regarding inversion. That paper has some very good things to learn from.
-
Updated figure to include Custom Diffusion.
-
I just discovered a method called hypernetworks. Anyone have a reference for this? @cloneofsimo?
-
@brian6091 Your detailed exploration and documentation of these things always makes me smile (even if I only understand a rough third of it currently). In addition to hypernetworks, would Aesthetic Gradients training also be in this family?
-
@cloneofsimo's LoRA method has massively accelerated the already fast-moving world of diffusion model fine-tuning. In an attempt to maintain my sanity while keeping up with the different methods and their combinations, I ended up "barcoding" the methods and their variants.
I thought I'd post a graphic here to show how Textual Inversion, Dreambooth, LoRA, pivotal tuning, etc. differ from each other. Nothing new here for the experts, but I think it clarifies the relationships between the fine-tuning methods that have been proposed. And perhaps more interestingly, it reveals some that haven't yet been explored (and maybe they should be!).
Most fine-tuning of Stable Diffusion models comes down to whether we modify 1) the tokenizer, 2) the text encoder, or 3) the Unet. Prior-class regularization is a fourth element, but it adds different data rather than modifying existing model components. For the text encoder and the Unet, we can further distinguish between training the transformer attention layers or all the rest (everything that is not transformer attention). Finally, we can add LoRA on top.
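To make that grouping concrete, here is a minimal sketch (assuming diffusers and transformers are installed, with an SD 1.x checkpoint as an example) of how the components above map onto parameter groups you could freeze or train. The module-name filters ("attentions", "self_attn") are heuristics I picked for illustration, not an official API, and the model ID is just an example:

```python
import torch
from diffusers import UNet2DConditionModel
from transformers import CLIPTextModel

model_id = "runwayml/stable-diffusion-v1-5"  # example SD 1.x checkpoint
unet = UNet2DConditionModel.from_pretrained(model_id, subfolder="unet")
text_encoder = CLIPTextModel.from_pretrained(model_id, subfolder="text_encoder")

def split_params(model, attn_keyword):
    """Split parameters into 'transformer attention' vs. 'everything else' by module name."""
    attn, rest = [], []
    for name, param in model.named_parameters():
        (attn if attn_keyword in name else rest).append(param)
    return attn, rest

# Name-based heuristics: Unet attention lives in "attentions" blocks,
# CLIP text-encoder attention in "self_attn" modules.
unet_attn, unet_rest = split_params(unet, "attentions")
te_attn, te_rest = split_params(text_encoder, "self_attn")

# Example variant: train only the Unet attention layers. Freeze everything,
# then re-enable just that group and hand it to the optimizer.
for p in list(unet.parameters()) + list(text_encoder.parameters()):
    p.requires_grad_(False)
for p in unet_attn:
    p.requires_grad_(True)

optimizer = torch.optim.AdamW(unet_attn, lr=1e-5)
```

Each row in the figure is essentially a different choice of which of these groups gets trained directly, which gets LoRA adapters, and whether new tokens and prior-class data are added.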
Below I represent each method with a row of symbols (Fine-tuning code layout in figure), to which we can apply various modifications of the different model components (Legend in figure). What follows are strategies based on Dreambooth and Textual Inversion, as well as several that @cloneofsimo has highlighted in this repo (e.g., LoRA X Textual Inversion w/ pivotal tuning).
I included some new variants in the bottom box (new to me at least). Personally, I find the last two the most elegant! I'm testing some of these out now, and I'll update the post when the models finish training...
As always, looking forward to discussion, corrections, and different points of view.
full-size image here
References