
[Issue]: Flux Model load hanging forever out of nowhere #3484

Open
Olivier-aka-Raiden opened this issue Oct 12, 2024 · 14 comments
Labels
platform (Platform specific problem) · question (Further information is requested)

Comments

@Olivier-aka-Raiden

Issue Description

Hi, it's been a month now that I've been stuck trying to make FLUX.1-dev work again on my setup. For the record, I tried FLUX on my PC in early September with the model "Disty0/FLUX.1-dev-qint4_tf-qint8_te" and it worked, which was a big but pleasant surprise.
After being away for a few days, I came back to many pending updates (Windows, Nvidia and SDNext), but after applying them all nothing worked anymore.
There were multiple errors when reinstalling SDNext, so I decided to go with a fresh install and upgraded Python to 3.11 (which I read was recommended).
I saw that it was installing Torch with CUDA 12.4, realised I didn't have that CUDA version installed, and installed it.
And now comes my issue: after starting SDNext, downloading the Flux model I was using before, and putting the settings back as they were, the model "loading" hangs forever, using a lot of CPU and memory, but nothing happens in the UI and no logs are produced to debug with.
I thought it could be the system memory offload from my GPU, so I made sure it was not activated, but that didn't change anything.
I tried going back to the previous dev version I was using when it worked, but that didn't change anything either.
So I thought it might be the Nvidia driver and installed the previous version: that didn't work either.
Then I started tweaking SDNext settings: model, balanced, and sequential offload modes.
For sequential offload, I sometimes got an error instead of a hang:
11:20:49-742597 INFO Autodetect model: detect="FLUX" class=FluxPipeline
file="models\Diffusers\models--Disty0--FLUX.1-dev-qint4_tf-qint8_te\snapshots\e40bd0d879eff11b5
9d5b6fca9233accfaed08e0" size=0MB
Downloading shards: 100%|██████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 2002.05it/s]
Diffusers 3.61s/it █████████████ 100% 2/2 00:07 00:00 Loading checkpoint shards
Diffusers 15.58it/s ████████ 100% 7/7 00:00 00:00 Loading pipeline components...
11:21:15-487263 INFO Load network: type=embeddings loaded=0 skipped=0 time=0.00
11:21:15-527261 ERROR Setting model: offload=sequential WeightQBytesTensor.new() missing 6 required positional
[screenshot]
The only thing that bothers me is that while it hangs, CPU and RAM are at max usage but the GPU is not used at all... and this happens before inference even starts.

I didn't see anyone else with the same issue, so I guess this is a tricky one, but I hope someone will have fresh ideas on things I could try to make it work again.

Version Platform Description

Setup:

  • SDnext branch: dev
  • Python Version: 3.11.9
  • Operating System: Windows 10, version 10.0.22631
  • CPU: 12th Gen Intel(R) Core(TM) i7-12700KF
  • Architecture: AMD64
  • GPU: NVIDIA GeForce RTX 3070 Ti
  • RAM: 32GB
  • CUDA Version: 12.4
  • CUDNN Version: 90100
  • GPU Driver: 565.90
  • Memory Optimization: medvram
  • Installed Torch Version: 2.4.1+cu124
  • Installed Diffusers Version: 0.31.0.dev0
  • Installed Gradio Version: 3.43.2
  • Installed Transformers Version: 4.45.2
  • Installed Accelerate Version: 1.0.0
  • Backend: Diffusers
  • Torch Parameters:
    • Backend: CUDA
    • Device: CUDA
    • Data type: torch.bfloat16
    • Attention Optimization: Scaled-Dot-Product
  • Model Loaded: Diffusers - FLUX.1-dev-qint4_tf-qint8_te

Relevant log output

2024-10-12 10:42:32,564 | sd | INFO | launch | Starting SD.Next
2024-10-12 10:42:32,567 | sd | INFO | installer | Logger: file="C:\Users\kille\Documents\Workspace\automatic\sdnext.log" level=INFO size=96903 mode=append
2024-10-12 10:42:32,568 | sd | INFO | installer | Python: version=3.11.9 platform=Windows bin="C:\Users\kille\Documents\Workspace\automatic\venv\Scripts\python.exe" venv="C:\Users\kille\Documents\Workspace\automatic\venv"
2024-10-12 10:42:32,719 | sd | INFO | installer | Version: app=sd.next updated=2024-10-11 hash=f5253dad branch=dev url=https://github.com/vladmandic/automatic.git/tree/dev ui=dev
2024-10-12 10:42:33,269 | sd | INFO | installer | Repository latest available e7ec07f9783701629ca1411ad82aec87232501b9 2024-09-13T16:51:56Z
2024-10-12 10:42:33,284 | sd | INFO | launch | Platform: arch=AMD64 cpu=Intel64 Family 6 Model 151 Stepping 2, GenuineIntel system=Windows release=Windows-10-10.0.22631-SP0 python=3.11.9
2024-10-12 10:42:33,285 | sd | DEBUG | installer | Setting environment tuning
2024-10-12 10:42:33,286 | sd | DEBUG | installer | Torch allocator: "garbage_collection_threshold:0.65,max_split_size_mb:512"
2024-10-12 10:42:33,294 | sd | DEBUG | installer | Torch overrides: cuda=False rocm=False ipex=False diml=False openvino=False zluda=False
2024-10-12 10:42:33,302 | sd | INFO | installer | CUDA: nVidia toolkit detected
2024-10-12 10:42:33,431 | sd | INFO | installer | Verifying requirements
2024-10-12 10:42:33,438 | sd | INFO | installer | Verifying packages
2024-10-12 10:42:33,473 | sd | DEBUG | installer | Timestamp repository update time: Fri Oct 11 15:53:46 2024
2024-10-12 10:42:33,473 | sd | DEBUG | installer | Timestamp previous setup time: Fri Oct 11 23:29:15 2024
2024-10-12 10:42:33,473 | sd | INFO | installer | Extensions: disabled=[]
2024-10-12 10:42:33,474 | sd | INFO | installer | Extensions: enabled=['Lora', 'sd-extension-chainner', 'sd-extension-system-info', 'sd-webui-agent-scheduler', 'sdnext-modernui', 'stable-diffusion-webui-rembg'] extensions-builtin
2024-10-12 10:42:33,479 | sd | DEBUG | installer | Timestamp latest extensions time: Fri Oct 11 22:48:19 2024
2024-10-12 10:42:33,479 | sd | DEBUG | installer | Timestamp: version:1728654826 setup:1728682155 extension:1728679699
2024-10-12 10:42:33,479 | sd | INFO | launch | Startup: quick launch
2024-10-12 10:42:33,480 | sd | DEBUG | paths | Register paths
2024-10-12 10:42:33,481 | sd | INFO | installer | Extensions: disabled=[]
2024-10-12 10:42:33,481 | sd | INFO | installer | Extensions: enabled=['Lora', 'sd-extension-chainner', 'sd-extension-system-info', 'sd-webui-agent-scheduler', 'sdnext-modernui', 'stable-diffusion-webui-rembg'] extensions-builtin
2024-10-12 10:42:33,483 | sd | INFO | installer | Running in safe mode without user extensions
2024-10-12 10:42:33,487 | sd | DEBUG | installer | Extension preload: {'extensions-builtin': 0.0}
2024-10-12 10:42:33,487 | sd | DEBUG | launch | Starting module: <module 'webui' from 'C:\\Users\\kille\\Documents\\Workspace\\automatic\\webui.py'>
2024-10-12 10:42:33,487 | sd | INFO | launch | Command line args: ['--safe'] safe=True
2024-10-12 10:42:33,488 | sd | DEBUG | launch | Env flags: []
2024-10-12 10:42:43,403 | sd | INFO | loader | System packages: {'torch': '2.4.1+cu124', 'diffusers': '0.31.0.dev0', 'gradio': '3.43.2', 'transformers': '4.45.2', 'accelerate': '1.0.0'}
2024-10-12 10:42:44,254 | sd | DEBUG | shared | Huggingface cache: folder="C:\Users\kille\.cache\huggingface\hub"
2024-10-12 10:42:44,367 | sd | INFO | shared | Device detect: memory=8.0 optimization=medvram
2024-10-12 10:42:44,369 | sd | DEBUG | shared | Read: file="config.json" json=42 bytes=1948 time=0.000
2024-10-12 10:42:44,369 | sd | INFO | shared | Engine: backend=Backend.DIFFUSERS compute=None device=cuda attention="Scaled-Dot-Product" mode=no_grad
2024-10-12 10:42:44,377 | sd | DEBUG | shared | Read: file="html\reference.json" json=52 bytes=29118 time=0.007
2024-10-12 10:42:44,411 | sd | INFO | devices | Torch parameters: backend=cuda device=cuda config=BF16 dtype=torch.bfloat16 vae=torch.bfloat16 unet=torch.bfloat16 context=no_grad nohalf=False nohalfvae=False upscast=False deterministic=False test-fp16=True test-bf16=True optimization="Scaled-Dot-Product"
2024-10-12 10:42:44,944 | sd | DEBUG | __init__ | ONNX: version=1.19.2 provider=CPUExecutionProvider, available=['AzureExecutionProvider', 'CPUExecutionProvider']
2024-10-12 10:42:45,044 | sd | INFO | shared | Device: device=NVIDIA GeForce RTX 3070 Ti n=1 arch=sm_90 capability=(8, 6) cuda=12.4 cudnn=90100 driver=565.90
2024-10-12 10:42:45,121 | sd | DEBUG | sd_hijack | Importing LDM
2024-10-12 10:42:45,134 | sd | DEBUG | webui | Entering start sequence
2024-10-12 10:42:45,136 | sd | DEBUG | webui | Initializing
2024-10-12 10:42:45,167 | sd | INFO | sd_vae | Available VAEs: path="models\VAE" items=0
2024-10-12 10:42:45,169 | sd | INFO | sd_unet | Available UNets: path="models\UNET" items=0
2024-10-12 10:42:45,170 | sd | INFO | model_te | Available TEs: path="models\Text-encoder" items=0
2024-10-12 10:42:45,171 | sd | INFO | extensions | Disabled extensions: ['sd-extension-chainner', 'sd-webui-agent-scheduler', 'sdnext-modernui', 'stable-diffusion-webui-rembg']
2024-10-12 10:42:45,173 | sd | DEBUG | modelloader | Scanning diffusers cache: folder="models\Diffusers" items=1 time=0.00
2024-10-12 10:42:45,173 | sd | INFO | sd_models | Available Models: path="models\Stable-diffusion" items=1 time=0.00
2024-10-12 10:42:45,243 | sd | INFO | yolo | Available Yolo: path="models\yolo" items=5 downloaded=0
2024-10-12 10:42:45,244 | sd | DEBUG | webui | Load extensions
2024-10-12 10:42:45,301 | sd | INFO | networks | Available LoRAs: items=0 folders=2
2024-10-12 10:42:45,304 | sd | INFO | script_loading | Extension: script='extensions-builtin\Lora\scripts\lora_script.py' 10:42:45-301751 INFO Available LoRAs: items=0 folders=2
2024-10-12 10:42:45,309 | sd | DEBUG | webui | Extensions init time: 0.06 
2024-10-12 10:42:45,330 | sd | DEBUG | shared | Read: file="html/upscalers.json" json=4 bytes=2672 time=0.006
2024-10-12 10:42:45,331 | sd | INFO | modelloader | Available Upscalers: items=29 downloaded=0 user=0 time=0.02 types=['None', 'Lanczos', 'Nearest', 'AuraSR', 'ESRGAN', 'LDSR', 'RealESRGAN', 'SCUNet', 'SD', 'SwinIR']
2024-10-12 10:42:45,768 | sd | INFO | styles | Available Styles: folder="models\styles" items=288 time=0.44
2024-10-12 10:42:45,773 | sd | DEBUG | webui | Creating UI
2024-10-12 10:42:45,773 | sd | DEBUG | theme | UI themes available: type=Standard themes=12
2024-10-12 10:42:45,773 | sd | INFO | theme | UI theme: type=Standard name="black-teal"
2024-10-12 10:42:45,777 | sd | DEBUG | ui_javascript | UI theme: css="C:\Users\kille\Documents\Workspace\automatic\javascript\black-teal.css" base="sdnext.css" user="None"
2024-10-12 10:42:45,779 | sd | DEBUG | ui_txt2img | UI initialize: txt2img
2024-10-12 10:42:45,800 | sd | DEBUG | ui_extra_networks | Networks: page='model' items=52 subfolders=2 tab=txt2img folders=['models\\Stable-diffusion', 'models\\Diffusers', 'models\\Reference'] list=0.01 thumb=0.00 desc=0.00 info=0.00 workers=8 sort=Default
2024-10-12 10:42:45,800 | sd | DEBUG | ui_extra_networks | Networks: page='lora' items=0 subfolders=0 tab=txt2img folders=['models\\Lora', 'models\\LyCORIS'] list=0.00 thumb=0.00 desc=0.00 info=0.00 workers=8 sort=Default
2024-10-12 10:42:45,805 | sd | DEBUG | ui_extra_networks | Networks: page='style' items=288 subfolders=1 tab=txt2img folders=['models\\styles', 'html'] list=0.01 thumb=0.00 desc=0.00 info=0.00 workers=8 sort=Default
2024-10-12 10:42:45,808 | sd | DEBUG | ui_extra_networks | Networks: page='embedding' items=0 subfolders=0 tab=txt2img folders=['models\\embeddings'] list=0.00 thumb=0.00 desc=0.00 info=0.00 workers=8 sort=Default
2024-10-12 10:42:45,808 | sd | DEBUG | ui_extra_networks | Networks: page='vae' items=0 subfolders=0 tab=txt2img folders=['models\\VAE'] list=0.00 thumb=0.00 desc=0.00 info=0.00 workers=8 sort=Default
2024-10-12 10:42:45,809 | sd | DEBUG | ui_extra_networks | Networks: page='history' items=0 subfolders=0 tab=txt2img folders=[] list=0.00 thumb=0.00 desc=0.00 info=0.00 workers=8 sort=Default
2024-10-12 10:42:45,922 | sd | DEBUG | ui_img2img | UI initialize: img2img
2024-10-12 10:42:46,102 | sd | DEBUG | ui_control_helpers | UI initialize: control models=models\control
2024-10-12 10:42:46,578 | sd | DEBUG | shared | Read: file="ui-config.json" json=6 bytes=248 time=0.003
2024-10-12 10:42:46,664 | sd | DEBUG | theme | UI themes available: type=Standard themes=12
2024-10-12 10:42:46,763 | sd | DEBUG | shared | Reading failed: C:\Users\kille\Documents\Workspace\automatic\html\extensions.json [Errno 2] No such file or directory: 'C:\\Users\\kille\\Documents\\Workspace\\automatic\\html\\extensions.json'
2024-10-12 10:42:46,763 | sd | INFO | ui_extensions | Extension list is empty: refresh required
2024-10-12 10:42:47,173 | sd | DEBUG | ui_extensions | Extension list: processed=6 installed=6 enabled=2 disabled=4 visible=6 hidden=0
2024-10-12 10:42:47,251 | sd | DEBUG | webui | Root paths: ['C:\\Users\\kille\\Documents\\Workspace\\automatic']
2024-10-12 10:42:47,307 | sd | INFO | webui | Local URL: http://127.0.0.1:7860/
2024-10-12 10:42:47,309 | sd | DEBUG | webui | Gradio functions: registered=1830
2024-10-12 10:42:47,310 | sd | DEBUG | middleware | FastAPI middleware: ['Middleware', 'Middleware']
2024-10-12 10:42:47,312 | sd | DEBUG | webui | Creating API
2024-10-12 10:42:47,564 | sd | DEBUG | webui | Scripts setup: ['IP Adapters:0.017', 'XYZ Grid:0.018', 'Face:0.01', 'AnimateDiff:0.005', 'CogVideoX:0.005']
2024-10-12 10:42:47,564 | sd | DEBUG | sd_models | Model metadata: file="metadata.json" no changes
2024-10-12 10:42:47,565 | sd | DEBUG | modeldata | Model requested: fn=C:\Users\kille\Documents\Workspace\automatic\webui.py:<lambda>/C:\Program Files\Python311\Lib\threading.py:run
2024-10-12 10:42:47,565 | sd | INFO | sd_models | Load model: select="Diffusers\Disty0/FLUX.1-dev-qint4_tf-qint8_te [e40bd0d879]"
2024-10-12 10:42:47,567 | sd | DEBUG | sd_models | Load model: target="models\Diffusers\models--Disty0--FLUX.1-dev-qint4_tf-qint8_te\snapshots\e40bd0d879eff11b59d5b6fca9233accfaed08e0" existing=False info=None
2024-10-12 10:42:47,567 | sd | DEBUG | sd_models | Load model: path="models\Diffusers\models--Disty0--FLUX.1-dev-qint4_tf-qint8_te\snapshots\e40bd0d879eff11b59d5b6fca9233accfaed08e0"
2024-10-12 10:42:47,567 | sd | INFO | sd_models | Autodetect model: detect="FLUX" class=FluxPipeline file="models\Diffusers\models--Disty0--FLUX.1-dev-qint4_tf-qint8_te\snapshots\e40bd0d879eff11b59d5b6fca9233accfaed08e0" size=0MB
2024-10-12 10:42:47,572 | sd | DEBUG | model_flux | Load model: type=FLUX model="Diffusers\Disty0/FLUX.1-dev-qint4_tf-qint8_te" repo="Disty0/FLUX.1-dev-qint4_tf-qint8_te" unet="None" t5="T5 QINT8" vae="Automatic" quant=qint8 offload=model dtype=torch.bfloat16
2024-10-12 10:42:48,049 | sd | INFO | modelloader | HF login: token="C:\Users\kille\.cache\huggingface\token" Token is valid (permission: fineGrained).
2024-10-12 10:43:04,389 | sd | DEBUG | devices | GC: utilization={'gpu': 14, 'ram': 20, 'threshold': 65} gc={'collected': 138, 'saved': 0.0} before={'gpu': 1.1, 'ram': 6.5} after={'gpu': 1.1, 'ram': 6.5, 'retries': 0, 'oom': 0} device=cuda fn=optimum_quanto_model time=0.17
2024-10-12 10:43:04,621 | sd | INFO | server | MOTD: N/A
2024-10-12 10:43:06,825 | sd | DEBUG | theme | UI themes available: type=Standard themes=12
2024-10-12 10:43:06,983 | sd | INFO | api | Browser session: user=None client=127.0.0.1 agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/129.0.0.0 Safari/537.36
2024-10-12 10:43:14,930 | sd | DEBUG | model_flux | Load model: type=FLUX preloaded=['transformer', 'text_encoder_2']
2024-10-12 10:43:15,736 | sd | DEBUG | sd_models | Load module: type=t5 path="T5 QINT8" module="text_encoder_2"
2024-10-12 10:43:15,737 | sd | INFO | textual_inversion | Load network: type=embeddings loaded=0 skipped=0 time=0.00
2024-10-12 10:43:15,738 | sd | DEBUG | sd_models | Setting model: component=VAE upcast=False
2024-10-12 10:43:15,738 | sd | DEBUG | sd_models | Setting model: component=VAE slicing=True
2024-10-12 10:43:15,738 | sd | DEBUG | sd_models | Setting model: component=VAE tiling=True
2024-10-12 10:43:15,738 | sd | DEBUG | sd_models | Setting model: attention="Scaled-Dot-Product"
2024-10-12 10:43:15,749 | sd | DEBUG | sd_models | Setting model: offload=model
2024-10-12 10:43:16,051 | sd | DEBUG | devices | GC: utilization={'gpu': 14, 'ram': 62, 'threshold': 65} gc={'collected': 611, 'saved': 0.0} before={'gpu': 1.1, 'ram': 19.74} after={'gpu': 1.1, 'ram': 19.74, 'retries': 0, 'oom': 0} device=cuda fn=load_diffuser time=0.16
2024-10-12 10:43:16,054 | sd | INFO | sd_models | Load model: time=28.32 load=28.17 move=0.14 native=1024 memory={'ram': {'used': 19.74, 'total': 31.85}, 'gpu': {'used': 1.1, 'total': 8.0}, 'retries': 0, 'oom': 0}
2024-10-12 10:43:16,058 | sd | DEBUG | script_callbacks | Script callback init time: system-info.py:app_started=0.18
2024-10-12 10:43:16,058 | sd | INFO | webui | Startup time: 42.57 torch=7.63 gradio=2.01 diffusers=0.11 libraries=1.88 extensions=0.06 detailer=0.07 networks=0.44 ui-networks=0.21 ui-txt2img=0.10 ui-img2img=0.06 ui-control=0.09 ui-models=0.29 ui-settings=0.19 ui-extensions=0.43 launch=0.09 api=0.07 app-started=0.18 checkpoint=28.49
2024-10-12 10:43:16,060 | sd | DEBUG | shared | Save: file="config.json" json=42 bytes=1878 time=0.003
2024-10-12 11:04:25,598 | sd | INFO | sd_models | Load model: select="Diffusers\Disty0/FLUX.1-dev-qint4_tf-qint8_te [e40bd0d879]"
2024-10-12 11:04:25,599 | sd | DEBUG | sd_models | Load model: target="models\Diffusers\models--Disty0--FLUX.1-dev-qint4_tf-qint8_te\snapshots\e40bd0d879eff11b59d5b6fca9233accfaed08e0" existing=False info=None

Backend

Diffusers

UI

Standard

Branch

Dev

Model

Other

Acknowledgements

  • I have read the above and searched for existing issues
  • I confirm that this is classified correctly and it's not an extension issue
@vladmandic
Owner

first, in windows, disable nvidia's use of shared memory (google for instructions)!
when vram spills into ram, the whole thing becomes so slow that it looks like a hang.

then, let's look at memory utilization: go to windows task manager:

  1. -> settings -> realtime update speed -> low
  2. -> performance -> gpu
    start sdnext
    attempt to load flux as usual
    take a screenshot of the task manager window after 1min so i can see the gpu vram utilization growth over time (or poll it from python, as sketched below).

@vladmandic vladmandic added the question Further information is requested label Oct 12, 2024
@Olivier-aka-Raiden
Author

The flux model not loading:
[screenshot]
Another model that loads normally:
[screenshot]

@vladmandic
Owner

sorry to be a pain, but you cropped the screenshot so the numbers below the graphs are not visible - i need to see the dedicated/shared splits.

@Olivier-aka-Raiden
Author

[screenshot]
Sorry for the late reply. It's not used at all, anyway.

@vladmandic vladmandic added help wanted Extra attention is needed and removed question Further information is requested labels Oct 21, 2024
@vladmandic
Owner

pls try with the latest update

@Olivier-aka-Raiden
Author

Just after upgrading to the latest dev version:

2024-10-30 14:06:29,399 | sd | DEBUG | installer | Extensions all: ['Lora', 'sd-extension-chainner', 'sd-extension-system-info', 'sd-webui-agent-scheduler', 'sdnext-modernui', 'stable-diffusion-webui-rembg']
2024-10-30 14:06:29,495 | sd | DEBUG | installer | Extension installer: J:\sdxl\sdnext-flux\extensions-builtin\sd-webui-agent-scheduler\install.py
2024-10-30 14:06:31,304 | sd | DEBUG | installer | Extension installer: J:\sdxl\sdnext-flux\extensions-builtin\stable-diffusion-webui-rembg\install.py
2024-10-30 14:06:37,260 | sd | INFO | installer | Extensions enabled: ['Lora', 'sd-extension-chainner', 'sd-extension-system-info', 'sd-webui-agent-scheduler', 'sdnext-modernui', 'stable-diffusion-webui-rembg']
2024-10-30 14:06:37,261 | sd | INFO | installer | Verifying requirements
2024-10-30 14:06:37,262 | sd | DEBUG | launch | Setup complete without errors: 1730293597
2024-10-30 14:06:37,263 | sd | INFO | installer | Running in safe mode without user extensions
2024-10-30 14:06:37,265 | sd | DEBUG | installer | Extension preload: {'extensions-builtin': 0.0}
2024-10-30 14:06:37,266 | sd | DEBUG | launch | Starting module: <module 'webui' from 'J:\\sdxl\\sdnext-flux\\webui.py'>
2024-10-30 14:06:37,267 | sd | INFO | launch | Command line args: ['--safe', '--debug', '--backend', 'diffusers'] backend=diffusers safe=True debug=True
2024-10-30 14:06:37,268 | sd | DEBUG | launch | Env flags: []
2024-10-30 14:06:41,354 | sd | INFO | loader | System packages: {'torch': '2.5.1+cu124', 'diffusers': '0.32.0.dev0', 'gradio': '3.43.2', 'transformers': '4.46.0', 'accelerate': '1.0.1'}
2024-10-30 14:06:41,805 | sd | DEBUG | shared | Huggingface cache: folder="C:\Users\kille\.cache\huggingface\hub"
2024-10-30 14:06:41,918 | sd | INFO | shared | Device detect: memory=8.0 optimization=medvram
2024-10-30 14:06:41,920 | sd | DEBUG | shared | Read: file="config.json" json=36 bytes=1716 time=0.000
2024-10-30 14:06:41,921 | sd | INFO | shared | Engine: backend=Backend.DIFFUSERS compute=cuda device=cuda attention="Scaled-Dot-Product" mode=no_grad
2024-10-30 14:06:41,922 | sd | DEBUG | shared | Read: file="html\reference.json" json=59 bytes=31585 time=0.000
2024-10-30 14:06:41,958 | sd | INFO | devices | Torch parameters: backend=cuda device=cuda config=BF16 dtype=torch.bfloat16 vae=torch.bfloat16 unet=torch.bfloat16 context=no_grad nohalf=False nohalfvae=False upscast=False deterministic=False test-fp16=True test-bf16=True optimization="Scaled-Dot-Product"
2024-10-30 14:06:42,184 | sd | DEBUG | __init__ | ONNX: version=1.19.2 provider=CUDAExecutionProvider, available=['AzureExecutionProvider', 'CPUExecutionProvider']
2024-10-30 14:06:42,259 | sd | DEBUG | __init__ | ONNX failed to initialize XL pipelines: module 'optimum.onnxruntime.modeling_diffusion' has no attribute 'ORTPipelinePart'
2024-10-30 14:06:42,308 | sd | INFO | shared | Device: device=NVIDIA GeForce RTX 3070 Ti n=1 arch=sm_90 capability=(8, 6) cuda=12.4 cudnn=90100 driver=565.90
2024-10-30 14:06:42,369 | sd | DEBUG | sd_hijack | Importing LDM
2024-10-30 14:06:42,377 | sd | DEBUG | webui | Entering start sequence
2024-10-30 14:06:42,380 | sd | DEBUG | webui | Initializing
2024-10-30 14:06:42,403 | sd | INFO | sd_vae | Available VAEs: path="models\VAE" items=0
2024-10-30 14:06:42,404 | sd | INFO | sd_unet | Available UNets: path="models\UNET" items=0
2024-10-30 14:06:42,406 | sd | INFO | model_te | Available TEs: path="models\Text-encoder" items=0
2024-10-30 14:06:42,407 | sd | INFO | extensions | Disabled extensions: ['sd-extension-chainner', 'sd-webui-agent-scheduler', 'sdnext-modernui', 'stable-diffusion-webui-rembg']
2024-10-30 14:06:42,408 | sd | DEBUG | modelloader | Scanning diffusers cache: folder="models\Diffusers" items=2 time=0.00
2024-10-30 14:06:42,409 | sd | INFO | sd_checkpoint | Available Models: path="models\Stable-diffusion" items=2 time=0.00
2024-10-30 14:06:42,458 | sd | INFO | yolo | Available Yolo: path="models\yolo" items=6 downloaded=0
2024-10-30 14:06:42,459 | sd | DEBUG | webui | Load extensions
2024-10-30 14:06:42,579 | sd | INFO | networks | Available LoRAs: items=0 folders=2
2024-10-30 14:06:42,697 | sd | INFO | script_loading | Extension: script='extensions-builtin\Lora\scripts\lora_script.py' 14:06:42-579889 INFO Available LoRAs: items=0 folders=2
2024-10-30 14:06:42,702 | sd | DEBUG | webui | Extensions init time: 0.24 k_diff.py=0.08 Lora=0.12
2024-10-30 14:06:42,716 | sd | DEBUG | shared | Read: file="html/upscalers.json" json=4 bytes=2672 time=0.001
2024-10-30 14:06:42,717 | sd | INFO | modelloader | Available Upscalers: items=29 downloaded=0 user=0 time=0.01 types=['None', 'Lanczos', 'Nearest', 'AuraSR', 'ESRGAN', 'LDSR', 'RealESRGAN', 'SCUNet', 'SD', 'SwinIR']
2024-10-30 14:06:42,727 | sd | INFO | styles | Available Styles: folder="models\styles" items=288 time=0.01
2024-10-30 14:06:42,731 | sd | DEBUG | webui | Creating UI
2024-10-30 14:06:42,732 | sd | DEBUG | theme | UI themes available: type=Standard themes=12
2024-10-30 14:06:42,733 | sd | INFO | theme | UI theme: type=Standard name="black-teal"
2024-10-30 14:06:42,736 | sd | DEBUG | ui_javascript | UI theme: css="J:\sdxl\sdnext-flux\javascript\black-teal.css" base="sdnext.css" user="None"
2024-10-30 14:06:42,738 | sd | DEBUG | ui_txt2img | UI initialize: txt2img
2024-10-30 14:06:42,761 | sd | DEBUG | ui_extra_networks | Networks: page='model' items=60 subfolders=2 tab=txt2img folders=['models\\Stable-diffusion', 'models\\Diffusers', 'models\\Reference'] list=0.01 thumb=0.00 desc=0.00 info=0.00 workers=8 sort=Default
2024-10-30 14:06:42,762 | sd | DEBUG | ui_extra_networks | Networks: page='lora' items=0 subfolders=0 tab=txt2img folders=['models\\Lora', 'models\\LyCORIS'] list=0.00 thumb=0.00 desc=0.00 info=0.00 workers=8 sort=Default
2024-10-30 14:06:42,768 | sd | DEBUG | ui_extra_networks | Networks: page='style' items=288 subfolders=1 tab=txt2img folders=['models\\styles', 'html'] list=0.01 thumb=0.00 desc=0.00 info=0.00 workers=8 sort=Default
2024-10-30 14:06:42,771 | sd | DEBUG | ui_extra_networks | Networks: page='embedding' items=0 subfolders=0 tab=txt2img folders=['models\\embeddings'] list=0.00 thumb=0.00 desc=0.00 info=0.00 workers=8 sort=Default
2024-10-30 14:06:42,773 | sd | DEBUG | ui_extra_networks | Networks: page='vae' items=0 subfolders=0 tab=txt2img folders=['models\\VAE'] list=0.00 thumb=0.00 desc=0.00 info=0.00 workers=8 sort=Default
2024-10-30 14:06:42,774 | sd | DEBUG | ui_extra_networks | Networks: page='history' items=0 subfolders=0 tab=txt2img folders=[] list=0.00 thumb=0.00 desc=0.00 info=0.00 workers=8 sort=Default
2024-10-30 14:06:42,890 | sd | DEBUG | ui_img2img | UI initialize: img2img
2024-10-30 14:06:43,021 | sd | DEBUG | ui_control_helpers | UI initialize: control models=models\control
2024-10-30 14:06:43,361 | sd | DEBUG | shared | Read: file="ui-config.json" json=4 bytes=130 time=0.000
2024-10-30 14:06:43,446 | sd | DEBUG | theme | UI themes available: type=Standard themes=12
2024-10-30 14:06:43,526 | sd | DEBUG | shared | Reading failed: J:\sdxl\sdnext-flux\html\extensions.json [Errno 2] No such file or directory: 'J:\\sdxl\\sdnext-flux\\html\\extensions.json'
2024-10-30 14:06:43,527 | sd | INFO | ui_extensions | Extension list is empty: refresh required
2024-10-30 14:06:43,926 | sd | DEBUG | ui_extensions | Extension list: processed=6 installed=6 enabled=2 disabled=4 visible=6 hidden=0
2024-10-30 14:06:44,010 | sd | DEBUG | webui | Root paths: ['J:\\sdxl\\sdnext-flux']
2024-10-30 14:06:44,063 | sd | INFO | webui | Local URL: http://127.0.0.1:7860/
2024-10-30 14:06:44,063 | sd | DEBUG | webui | Gradio functions: registered=1871
2024-10-30 14:06:44,066 | sd | DEBUG | middleware | FastAPI middleware: ['Middleware', 'Middleware']
2024-10-30 14:06:44,068 | sd | DEBUG | webui | Creating API
2024-10-30 14:06:44,325 | sd | DEBUG | webui | Scripts setup: ['IP Adapters:0.019', 'XYZ Grid:0.017', 'Face:0.009', 'LUT Color grading:0.006']
2024-10-30 14:06:44,326 | sd | DEBUG | sd_checkpoint | Model metadata: file="metadata.json" no changes
2024-10-30 14:06:44,328 | sd | DEBUG | modeldata | Model requested: fn=run:<lambda>
2024-10-30 14:06:44,329 | sd | INFO | sd_checkpoint | Load model: select="Diffusers\Disty0/FLUX.1-dev-qint4_tf-qint8_te [e40bd0d879]"
2024-10-30 14:06:44,330 | sd | INFO | sd_detect | Autodetect model: detect="FLUX" class=FluxPipeline file="models\Diffusers\models--Disty0--FLUX.1-dev-qint4_tf-qint8_te\snapshots\e40bd0d879eff11b59d5b6fca9233accfaed08e0" size=0MB
2024-10-30 14:06:44,332 | sd | DEBUG | model_flux | Load model: type=FLUX model="Diffusers\Disty0/FLUX.1-dev-qint4_tf-qint8_te" repo="Disty0/FLUX.1-dev-qint4_tf-qint8_te" unet="None" te="None" vae="Automatic" quant=qint8 offload=model dtype=torch.bfloat16
2024-10-30 14:06:44,725 | sd | INFO | modelloader | HF login: token="C:\Users\kille\.cache\huggingface\token" 
2024-10-30 14:06:44,727 | sd | DEBUG | model_quant | Quantization: type=quanto fn=load_flux:load_flux_quanto
2024-10-30 14:06:45,429 | sd | ERROR | model_flux | Load model: type=FLUX failed to load Quanto transformer: Ninja is required to load C++ extensions
2024-10-30 14:06:45,430 | sd | ERROR | errors | FLUX Quanto:: RuntimeError
2024-10-30 14:06:46,993 | sd | DEBUG | model_flux | Load model: type=FLUX preloaded=['transformer', 'text_encoder_2']
2024-10-30 14:06:47,863 | sd | INFO | textual_inversion | Load network: type=embeddings loaded=0 skipped=0 time=0.00
2024-10-30 14:06:47,864 | sd | DEBUG | sd_models | Setting model: component=VAE upcast=False
2024-10-30 14:06:47,865 | sd | DEBUG | sd_models | Setting model: component=VAE slicing=True
2024-10-30 14:06:47,866 | sd | DEBUG | sd_models | Setting model: component=VAE tiling=True
2024-10-30 14:06:47,866 | sd | DEBUG | sd_models | Setting model: attention="Scaled-Dot-Product"
2024-10-30 14:06:47,878 | sd | DEBUG | sd_models | Setting model: offload=model
2024-10-30 14:06:48,081 | sd | DEBUG | devices | GC: utilization={'gpu': 14, 'ram': 5, 'threshold': 80} gc={'collected': 611, 'saved': 0.0} before={'gpu': 1.1, 'ram': 1.58} after={'gpu': 1.1, 'ram': 1.58, 'retries': 0, 'oom': 0} device=cuda fn=reload_model_weights:load_diffuser time=0.16
2024-10-30 14:06:48,086 | sd | INFO | sd_models | Load model: time=3.59 load=3.53 native=1024 memory={'ram': {'used': 1.58, 'total': 31.85}, 'gpu': {'used': 1.1, 'total': 8.0}, 'retries': 0, 'oom': 0}
2024-10-30 14:06:48,088 | sd | INFO | webui | Startup time: 10.82 torch=3.10 gradio=0.79 diffusers=0.08 libraries=1.11 extensions=0.24 ui-networks=0.17 ui-txt2img=0.10 ui-img2img=0.06 ui-control=0.22 ui-settings=0.17 ui-extensions=0.42 launch=0.09 api=0.22 checkpoint=3.76

@vladmandic
Owner

well, it's not hanging anymore - but you do have issues - and they are specific to the qint-quantized model, not general:

2024-10-30 14:06:45,429 | sd | ERROR | model_flux | Load model: type=FLUX failed to load Quanto transformer: Ninja is required to load C++ extensions
2024-10-30 14:06:45,430 | sd | ERROR | errors | FLUX Quanto:: RuntimeError

can you check this?

venv\scripts\activate
pip show ninja
pip install ninja

and rerun model load?

@vladmandic vladmandic added question Further information is requested platform Platform specific problem and removed help wanted Extra attention is needed labels Oct 30, 2024
@Olivier-aka-Raiden
Author

Erf... new error:

2024-10-30 16:13:43,141 | sd | DEBUG | model_flux | Load model: type=FLUX model="Diffusers\Disty0/FLUX.1-dev-qint4_tf-qint8_te" repo="Disty0/FLUX.1-dev-qint4_tf-qint8_te" unet="None" te="None" vae="None" quant=qint8 offload=model dtype=torch.bfloat16
2024-10-30 16:13:43,527 | sd | INFO | modelloader | HF login: token="C:\Users\kille\.cache\huggingface\token" 
2024-10-30 16:13:43,528 | sd | DEBUG | model_quant | Quantization: type=quanto fn=load_flux:load_flux_quanto
2024-10-30 16:13:45,015 | sd | ERROR | model_flux | Load model: type=FLUX failed to load Quanto transformer: Command '['where', 'cl']' returned non-zero exit status 1.
2024-10-30 16:13:45,017 | sd | ERROR | errors | FLUX Quanto:: CalledProcessError
2024-10-30 16:13:48,173 | sd | DEBUG | model_flux | Load model: type=FLUX preloaded=['transformer', 'text_encoder_2']
2024-10-30 16:13:49,233 | sd | INFO | textual_inversion | Load network: type=embeddings loaded=0 skipped=0 time=0.00
2024-10-30 16:13:49,234 | sd | DEBUG | sd_models | Setting model: component=VAE upcast=False
2024-10-30 16:13:49,235 | sd | DEBUG | sd_models | Setting model: component=VAE slicing=True
2024-10-30 16:13:49,236 | sd | DEBUG | sd_models | Setting model: component=VAE tiling=True
2024-10-30 16:13:49,237 | sd | DEBUG | sd_models | Setting model: attention="Scaled-Dot-Product"
2024-10-30 16:13:49,249 | sd | DEBUG | sd_models | Setting model: offload=model
2024-10-30 16:13:49,461 | sd | DEBUG | devices | GC: utilization={'gpu': 14, 'ram': 5, 'threshold': 80} gc={'collected': 666, 'saved': 0.0} before={'gpu': 1.1, 'ram': 1.57} after={'gpu': 1.1, 'ram': 1.57, 'retries': 0, 'oom': 0} device=cuda fn=reload_model_weights:load_diffuser time=0.17
2024-10-30 16:13:49,466 | sd | INFO | sd_models | Load model: time=6.15 load=6.09 native=1024 memory={'ram': {'used': 1.57, 'total': 31.85}, 'gpu': {'used': 1.1, 'total': 8.0}, 'retries': 0, 'oom': 0}
2024-10-30 16:13:49,468 | sd | INFO | webui | Startup time: 14.67 torch=4.06 gradio=0.85 diffusers=0.09 libraries=1.21 extensions=0.27 detailer=0.05 ui-networks=0.18 ui-txt2img=0.11 ui-img2img=0.06 ui-control=0.24 ui-settings=0.18 ui-extensions=0.44 launch=0.09 api=0.23 checkpoint=6.34
2024-10-30 16:13:49,470 | sd | DEBUG | shared | Save: file="config.json" json=37 bytes=1675 time=0.002
2024-10-30 16:14:00,477 | sd | DEBUG | launch | Server: alive=True jobs=1 requests=1 uptime=20 memory=1.57/31.85 backend=Backend.DIFFUSERS state=idle
2024-10-30 16:14:45,892 | sd | INFO | server | MOTD: N/A
2024-10-30 16:14:47,823 | sd | DEBUG | theme | UI themes available: type=Standard themes=12
2024-10-30 16:14:48,040 | sd | INFO | api | Browser session: user=None client=127.0.0.1 agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/130.0.0.0 Safari/537.36
2024-10-30 16:14:53,007 | sd | INFO | sd_checkpoint | Load model: select="Diffusers\Disty0/FLUX.1-dev-qint4 [82811df42b]"
2024-10-30 16:14:53,133 | sd | DEBUG | sd_models | Model move: device=meta class=FluxPipeline accelerate=False fn=reload_model_weights:unload_model_weights time=0.12
2024-10-30 16:14:53,310 | sd | DEBUG | devices | GC: utilization={'gpu': 14, 'ram': 4, 'threshold': 80} gc={'collected': 591, 'saved': 0.0} before={'gpu': 1.1, 'ram': 1.33} after={'gpu': 1.1, 'ram': 1.33, 'retries': 0, 'oom': 0} device=cuda fn=reload_model_weights:unload_model_weights time=0.17
2024-10-30 16:14:53,313 | sd | DEBUG | sd_models | Unload weights model: {'ram': {'used': 1.33, 'total': 31.85}, 'gpu': {'used': 1.1, 'total': 8.0}, 'retries': 0, 'oom': 0}
2024-10-30 16:14:53,329 | sd | INFO | sd_detect | Autodetect model: detect="FLUX" class=FluxPipeline file="models\Diffusers\models--Disty0--FLUX.1-dev-qint4\snapshots\82811df42b556a1153b971d8375d5170c306a6eb" size=0MB
2024-10-30 16:14:53,331 | sd | DEBUG | model_flux | Load model: type=FLUX model="Diffusers\Disty0/FLUX.1-dev-qint4" repo="Disty0/FLUX.1-dev-qint4" unet="None" te="None" vae="None" quant=qint4 offload=model dtype=torch.bfloat16
2024-10-30 16:14:53,690 | sd | ERROR | model_flux | Load model: type=FLUX failed to load Quanto transformer: DLL load failed while importing quanto_cpp: Le module spécifié est introuvable.
2024-10-30 16:14:53,691 | sd | ERROR | errors | FLUX Quanto:: ImportError
2024-10-30 16:14:54,464 | sd | ERROR | model_flux | Load model: type=FLUX failed to load Quanto text encoder: DLL load failed while importing quanto_cpp: Le module spécifié est introuvable.
2024-10-30 16:14:54,466 | sd | ERROR | errors | FLUX Quanto:: ImportError
2024-10-30 16:14:55,312 | sd | DEBUG | model_flux | Load model: type=FLUX preloaded=['transformer', 'text_encoder_2']
2024-10-30 16:14:56,042 | sd | INFO | textual_inversion | Load network: type=embeddings loaded=0 skipped=0 time=0.00
2024-10-30 16:14:56,044 | sd | DEBUG | sd_models | Setting model: component=VAE upcast=False
2024-10-30 16:14:56,044 | sd | DEBUG | sd_models | Setting model: component=VAE slicing=True
2024-10-30 16:14:56,045 | sd | DEBUG | sd_models | Setting model: component=VAE tiling=True
2024-10-30 16:14:56,046 | sd | DEBUG | sd_models | Setting model: attention="Scaled-Dot-Product"
2024-10-30 16:14:56,058 | sd | DEBUG | sd_models | Setting model: offload=model
2024-10-30 16:14:56,252 | sd | DEBUG | devices | GC: utilization={'gpu': 14, 'ram': 4, 'threshold': 80} gc={'collected': 148, 'saved': 0.0} before={'gpu': 1.1, 'ram': 1.34} after={'gpu': 1.1, 'ram': 1.34, 'retries': 0, 'oom': 0} device=cuda fn=reload_model_weights:load_diffuser time=0.17
2024-10-30 16:14:56,257 | sd | INFO | sd_models | Load model: time=2.75 load=2.71 native=1024 memory={'ram': {'used': 1.34, 'total': 31.85}, 'gpu': {'used': 1.1, 'total': 8.0}, 'retries': 0, 'oom': 0}
2024-10-30 16:14:56,260 | sd | DEBUG | ui | Setting changed: sd_model_checkpoint=Diffusers\Disty0/FLUX.1-dev-qint4 [82811df42b] progress=True
2024-10-30 16:14:56,261 | sd | DEBUG | shared | Save: file="config.json" json=37 bytes=1663 time=0.002
2024-10-30 16:14:57,362 | sd | INFO | sd_checkpoint | Load model: select="Diffusers\Disty0/FLUX.1-dev-qint4_tf-qint8_te [e40bd0d879]"
2024-10-30 16:14:57,559 | sd | DEBUG | devices | GC: utilization={'gpu': 14, 'ram': 4, 'threshold': 80} gc={'collected': 467, 'saved': 0.0} before={'gpu': 1.1, 'ram': 1.34} after={'gpu': 1.1, 'ram': 1.34, 'retries': 0, 'oom': 0} device=cuda fn=reload_model_weights:unload_model_weights time=0.17
2024-10-30 16:14:57,561 | sd | DEBUG | sd_models | Unload weights model: {'ram': {'used': 1.34, 'total': 31.85}, 'gpu': {'used': 1.1, 'total': 8.0}, 'retries': 0, 'oom': 0}
2024-10-30 16:14:57,576 | sd | INFO | sd_detect | Autodetect model: detect="FLUX" class=FluxPipeline file="models\Diffusers\models--Disty0--FLUX.1-dev-qint4_tf-qint8_te\snapshots\e40bd0d879eff11b59d5b6fca9233accfaed08e0" size=0MB
2024-10-30 16:14:57,577 | sd | DEBUG | model_flux | Load model: type=FLUX model="Diffusers\Disty0/FLUX.1-dev-qint4_tf-qint8_te" repo="Disty0/FLUX.1-dev-qint4_tf-qint8_te" unet="None" te="None" vae="None" quant=qint8 offload=model dtype=torch.bfloat16
2024-10-30 16:14:57,917 | sd | ERROR | model_flux | Load model: type=FLUX failed to load Quanto transformer: DLL load failed while importing quanto_cpp: Le module spécifié est introuvable.
2024-10-30 16:14:57,918 | sd | ERROR | errors | FLUX Quanto:: ImportError
2024-10-30 16:14:59,227 | sd | DEBUG | model_flux | Load model: type=FLUX preloaded=['transformer', 'text_encoder_2']
2024-10-30 16:14:59,904 | sd | INFO | textual_inversion | Load network: type=embeddings loaded=0 skipped=0 time=0.00
2024-10-30 16:14:59,905 | sd | DEBUG | sd_models | Setting model: component=VAE upcast=False
2024-10-30 16:14:59,906 | sd | DEBUG | sd_models | Setting model: component=VAE slicing=True
2024-10-30 16:14:59,907 | sd | DEBUG | sd_models | Setting model: component=VAE tiling=True
2024-10-30 16:14:59,908 | sd | DEBUG | sd_models | Setting model: attention="Scaled-Dot-Product"
2024-10-30 16:14:59,920 | sd | DEBUG | sd_models | Setting model: offload=model
2024-10-30 16:15:00,136 | sd | DEBUG | devices | GC: utilization={'gpu': 14, 'ram': 5, 'threshold': 80} gc={'collected': 666, 'saved': 0.0} before={'gpu': 1.1, 'ram': 1.59} after={'gpu': 1.1, 'ram': 1.59, 'retries': 0, 'oom': 0} device=cuda fn=reload_model_weights:load_diffuser time=0.17
2024-10-30 16:15:00,142 | sd | INFO | sd_models | Load model: time=2.39 load=2.33 native=1024 memory={'ram': {'used': 1.59, 'total': 31.85}, 'gpu': {'used': 1.1, 'total': 8.0}, 'retries': 0, 'oom': 0}
2024-10-30 16:15:00,144 | sd | DEBUG | ui | Setting changed: sd_model_checkpoint=Diffusers\Disty0/FLUX.1-dev-qint4_tf-qint8_te [e40bd0d879] progress=True
2024-10-30 16:15:00,146 | sd | DEBUG | shared | Save: file="config.json" json=37 bytes=1675 time=0.002

@vladmandic
Owner

optimum.quanto, which is required to process qint quants, is broken on your system, don't know why.
try to reinstall it:

venv\scripts\activate
pip uninstall optimum-quanto
pip install optimum-quanto ninja

@Olivier-aka-Raiden
Author

Actually, I tried with an nf4 Flux model and it loads, so it might be an issue with Disty0/FLUX.1-dev-qint4_tf-qint8_te specifically. It was working before Flux was fully supported in SDNext; maybe now I should go with the models you suggest in the embedded model list.

@vladmandic
Owner

vladmandic commented Oct 30, 2024

float-quantized models use bitsandbytes
int-quantized models use optimum.quanto

so if nf4 is working, great. but fyi, that's using a different quantization engine - nothing wrong with that, just noting.
i don't think this is a model issue - this really looks like the optimum.quanto package on your system being corrupt/non-working somehow.

@duchenean

Hi, I’m experiencing the same issue as OP on Windows 11 on the master branch.

The Flux models are stuck loading indefinitely. I tested the same model, "Disty0/FLUX.1-dev-qint4_tf-qint8_te," along with others from the recommended models (as the Wiki suggests), but without success.

I’ve also tried various suggestions mentioned here, like disabling shared memory, but the problem persists.
Even a fresh install using pip install --no-cache-dir --force-reinstall ... didn’t resolve it.

I’ll gather logs and additional information to share later this week and try the dev branch.

@Olivier-aka-Raiden
Author

@vladmandic I agree that something is wrong with a dependency, but I don't think it is optimum.quanto directly. I think some combination of diffusers/torch/quanto on Windows currently breaks the workflow somewhere, though I can't figure out what it is, sadly. But it definitely worked before, and something has happened with GPU memory allocation: I don't see the GPU being used at all when this issue arises, despite CPU and system memory being heavily used while the model loads indefinitely.

@vladmandic
Owner

2024-10-30 16:14:57,917 | sd | ERROR | model_flux | Load model: type=FLUX failed to load Quanto transformer: DLL load failed while importing quanto_cpp: Le module spécifié est introuvable.

this is an optimum.quanto error ("Le module spécifié est introuvable" is French for "The specified module could not be found"). yes, it may be linked to the fact that you're using a newer torch than before, but it's optimum.quanto nonetheless.
