
Refactor: wtype per tensor from file instead of global #455

Closed
wants to merge 9 commits

Conversation

stduhpf
Contributor

@stduhpf stduhpf commented Nov 1, 2024

I'm not sure if it makes a significant difference yet.

@stduhpf
Contributor Author

stduhpf commented Nov 23, 2024

Trying master again after having had these changes in place for a while, it feels like this PR reduces loading times, but maybe there's something else affecting it.

@Green-Sky
Contributor

Well, it removes conversions. I ran into this today: I loaded Flux with a q8_0 t5 and an f16 clip, and was wondering why t5 was using f16 (including RAM usage). Turns out sd.cpp can only have one conditioner wtype right now...
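To illustrate the idea (a minimal sketch with hypothetical names, not the actual sd.cpp code): if the loader keeps the type each tensor has in the file, a q8_0 t5 and an f16 clip can coexist without converting either one to a single conditioner wtype.

```cpp
#include <map>
#include <string>
#include "ggml.h"

// Hypothetical container: tensor name -> ggml type as read from the file header.
std::map<std::string, ggml_type> tensor_types;

// Look up the on-disk type of a tensor, falling back to a default
// (e.g. the global wtype) when the name isn't in the map.
ggml_type type_for(const std::string& name, ggml_type fallback) {
    auto it = tensor_types.find(name);
    return it != tensor_types.end() ? it->second : fallback;
}
```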

@fszontagh
Contributor

fszontagh commented Nov 23, 2024

Maybe this is the wrong place to write this, but I started a Flux diffusion which loaded the model into VRAM but then started running the diffusion on the CPU. Please see the screenshot below:

(screenshot)

UPDATE: Sorry, my mistake. The computation was run on the GPU; maybe it was the conversion that I saw.

@Green-Sky
Contributor

You probably saw the embedding models.

@stduhpf
Contributor Author

stduhpf commented Nov 25, 2024

I can confirm now that this PR makes loading weights much faster for larger models.

(Results are from warm runs only, so the models are always in the disk cache.)

| model (types) | master (model loading time) | PR (model loading time) |
| --- | --- | --- |
| Flux.1 [Schnell] (q3_k, ae: q8_0, clip_l: q8_0, t5xxl: q4_k) | 92.37s | 6.08s |
| SD3.5 Medium (q8_0, clip_l: f16, clip_g: f16, t5xxl: q4_k) | 18.19s | 5.37s |
| SD3.5 Large (q4_0, clip_l: q8_0, clip_g: f16, t5xxl: q4_k) | 104.6s ! | 5.95s |
| SDXL (q8_0, vae: f16) | 2.70s | 2.72s |
| SD 1.5 (f16) | 1.43s | 1.42s |

I think this makes the PR worth merging.

@stduhpf stduhpf marked this pull request as ready for review November 25, 2024 18:36
@stduhpf stduhpf changed the title Refactor Idea: wtype per tensor from file instead of global Refactor: wtype per tensor from file instead of global Nov 25, 2024
@Green-Sky
Contributor

But how do the other performance metrics change? Also, does it all work?

@stduhpf
Contributor Author

stduhpf commented Nov 25, 2024

> Also, does it all work?

I think so. Surprisingly, results aren't always exactly the same, but they're very close nonetheless; I'm not sure what's up with that (maybe it has something to do with the text encoders, or maybe it's just #439, since I'm using Vulkan). In fact, results are exactly the same except when using different quants for the text encoders. I've tried various models so far and they all worked for both txt2img and img2img. I haven't tried LoRAs, Photomaker, or ControlNets (or anything that isn't supported by Vulkan) yet, though.

> How do the other performance metrics change?

Diffusion/sampling and VAE performance are within the margin of error. Prompt encoding is significantly faster when mixing quantizations.

Edit: Photomaker (V1 and V2) works. LoRAs work too (on CPU and without quantization on Vulkan).

@stduhpf
Contributor Author

stduhpf commented Nov 26, 2024

Ah, ControlNets are not working; I'll see if I can fix that.
The upscaler is also completely broken, I think.

@stduhpf
Contributor Author

stduhpf commented Nov 27, 2024

I think I now have pretty much everything working at least as well as it used to with this refactoring. If anyone notices something I might have missed, let me know.

@thxCode

thxCode commented Nov 29, 2024

This introduces a tensor-type map to guide the ggml type of each tensor, but I see that the conversion already filters out these fixed tensors, so isn't this implementation overkill? https://github.com/leejet/stable-diffusion.cpp/blob/master/model.cpp#L1873-L1902

The root cause is that one Conditioner weight type is used in place of the individual weight types of clip_l/clip_g/t5xxl. Why do we keep reporting this incorrect information, e.g. Conditioner weight type: fp16 (clip_l: fp16, clip_g: fp16, t5xxl: q8_0)?

Can we instead just refactor the model loader's get_conditioner_wtype so it extracts the ggml type of each conditioner according to the sd_version, and then use the correct ggml type to initialize the different conditioners? That looks like a smaller change that achieves the same goal.
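A rough sketch of that narrower change, assuming illustrative tensor-name prefixes and a made-up return struct (the real model loader API differs):

```cpp
#include <map>
#include <string>
#include "ggml.h"

// One weight type per conditioner, derived from the tensor names in the file
// instead of a single global conditioner wtype.
struct ConditionerTypes {
    ggml_type clip_l = GGML_TYPE_F16;
    ggml_type clip_g = GGML_TYPE_F16;
    ggml_type t5xxl  = GGML_TYPE_F16;
};

ConditionerTypes get_conditioner_wtypes(const std::map<std::string, ggml_type>& tensor_types) {
    ConditionerTypes out;
    for (const auto& [name, type] : tensor_types) {
        // In this sketch the type of the last matching tensor wins.
        if (name.rfind("text_encoders.clip_l.", 0) == 0) out.clip_l = type;
        else if (name.rfind("text_encoders.clip_g.", 0) == 0) out.clip_g = type;
        else if (name.rfind("text_encoders.t5xxl.", 0) == 0) out.t5xxl = type;
    }
    return out;
}
```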

@stduhpf
Contributor Author

stduhpf commented Nov 29, 2024

> Can we instead just refactor the model loader's get_conditioner_wtype so it extracts the ggml type of each conditioner according to the sd_version, and then use the correct ggml type to initialize the different conditioners? That looks like a smaller change that achieves the same goal.

That's a fair point. It would indeed probably improve loading times the same way without refactoring the whole thing. But the point of this PR was originally to refactor the model loading logic; the improvement in loading time for the conditioning models is just a nice side effect.

My original motivation for this refactor was to better support models with mixed quantization types (like those made with https://github.com/city96/ComfyUI-GGUF/blob/main/tools/convert.py or with #447). It also makes it possible to implement #490 using the keys of the same tensor-type map.
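For example (hypothetical helper, illustrative prefix), the map's keys alone are enough to tell which components a file contains, without touching the tensor data:

```cpp
#include <map>
#include <string>
#include "ggml.h"

// Check whether any tensor name starts with the given prefix.
// std::map keeps keys sorted, so lower_bound plus one prefix compare is enough.
bool has_component(const std::map<std::string, ggml_type>& tensor_types,
                   const std::string& prefix) {
    auto it = tensor_types.lower_bound(prefix);
    return it != tensor_types.end() &&
           it->first.compare(0, prefix.size(), prefix) == 0;
}

// e.g. has_component(tensor_types, "first_stage_model.") to see if a VAE is bundled.
```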

@leejet
Owner

leejet commented Nov 30, 2024

I think the loading-time optimization is just because you're using a quantized t5 model and get_conditioner_wtype doesn't recognize the quantized type correctly.

@stduhpf stduhpf closed this Nov 30, 2024