
[Issue]: DeepCache with cpu offload generates distorted images after first batch #2888

Closed
2 tasks done
ledrose opened this issue Feb 19, 2024 · 1 comment

Comments


ledrose commented Feb 19, 2024

Issue Description

Found a problem while testing DeepCache on the dev branch. With model CPU offload enabled, the first image is generated as expected (3x speed-up, expected quality), but all subsequent batches are distorted and show a 9x speed-up. This does not happen with sequential CPU offload or with no offload at all. The problem occurs with every pipeline I tested. I was able to trace it to DeepCache itself and opened an issue on its GitHub (horseee/DeepCache#23) with a reproduction on Kaggle.
There is also a possible temporary fix: move enabling and disabling of deepcache_worker from model compilation to pipeline execution (enable right before, disable right after).
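The proposed workaround can be sketched as a small scoping helper that enables DeepCache immediately before each pipeline call and disables it right after, instead of enabling it once at model-load time. This is a minimal illustration, assuming a `helper` object exposing `enable()`/`disable()` (e.g. DeepCache's `DeepCacheSDHelper`); it is not the actual sd.next implementation.

```python
from contextlib import contextmanager

@contextmanager
def deepcache_scope(helper):
    """Enable DeepCache only for the duration of one pipeline execution."""
    helper.enable()       # turn caching on just before the batch runs
    try:
        yield
    finally:
        helper.disable()  # always turn it off afterwards, even on error

# Hypothetical usage with a diffusers pipeline `pipe` and a DeepCacheSDHelper:
#   with deepcache_scope(helper):
#       images = pipe(prompt).images
```

Wrapping each execution this way keeps the cached features from one batch from leaking into the next when model CPU offload moves the UNet between devices.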

Example of the first 2 generated batches:
(attached grid image: 00946-stable-diffusion-xl-base-1.0-beautiful forest-grid)

Version Platform Description

Python 3.10.13 on Linux
Version: app=sd.next updated=2024-02-18 hash=be81d486 url=https://github.com/vladmandic/automatic.git/tree/dev
Latest published version: 9c12b74 2024-02-18T22:40:13Z
Platform: arch=x86_64 cpu= system=Linux release=6.6.17-1-lts python=3.10.13
AMD ROCm toolkit detected
ROCm agents detected: ['gfx1032', 'gfx90c']
ROCm agent used by default: idx=0 gpu=gfx1032 arch=navi2x
ROCm version detected: 6.0
Load packages: {'torch': '2.3.0.dev20240218+rocm6.0', 'diffusers': '0.26.3', 'gradio': '3.43.2'}
Backend.DIFFUSERS compute=rocm device=cuda attention="Scaled-Dot-Product" mode=no_grad
Device: device=AMD Radeon RX 6600M n=1 hip=6.0.32830-d62f6a171

Relevant log output

22:42:23-699737 DEBUG    Load model: existing=False target=models/Diffusers/models--stabilityai--stable-diffusion-xl-base-1.0/snapshots/462165984030d82259a11f4367a4eed129e94a7b info=None                                                  
22:42:24-057115 DEBUG    Desired Torch parameters: dtype=FP16 no-half=False no-half-vae=False upscast=False                                                                                                                                 
22:42:24-058276 INFO     Setting Torch parameters: device=cuda dtype=torch.float16 vae=torch.float16 unet=torch.float16 context=no_grad fp16=True bf16=None                                                                                 
22:42:24-059113 INFO     Loading VAE: model=models/VAE/sdxl_vae.safetensors source=settings                                                                                                                                                 
22:42:24-059804 DEBUG    Diffusers VAE load config: {'low_cpu_mem_usage': False, 'torch_dtype': torch.float16, 'use_safetensors': True, 'variant': 'fp16'}                                                                                  
22:42:24-060589 INFO     Autodetect: vae="Stable Diffusion XL" class=StableDiffusionXLPipeline file="models/Diffusers/models--stabilityai--stable-diffusion-xl-base-1.0/snapshots/462165984030d82259a11f4367a4eed129e94a7b" size=0MB        
22:42:24-466941 DEBUG    Diffusers loading: path="models/Diffusers/models--stabilityai--stable-diffusion-xl-base-1.0/snapshots/462165984030d82259a11f4367a4eed129e94a7b"                                                                    
22:42:24-468026 INFO     Autodetect: model="Stable Diffusion XL" class=StableDiffusionXLPipeline file="models/Diffusers/models--stabilityai--stable-diffusion-xl-base-1.0/snapshots/462165984030d82259a11f4367a4eed129e94a7b" size=0MB      
Loading pipeline components...: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:01<00:00,  4.23it/s]
22:42:26-127056 DEBUG    Setting model VAE: name=sdxl_vae.safetensors                                                                                                                                                                       
22:42:26-127994 DEBUG    Setting model: enable model CPU offload                                                                                                                                                                            
22:42:26-152328 INFO     Model compile: pipeline=StableDiffusionXLPipeline mode=default backend=deep-cache fullgraph=True compile=['Model']                                                                                                 
22:42:26-156748 INFO     Model compile: task=DeepCache config={'cache_interval': 3, 'cache_layer_id': 0, 'cache_block_id': 0, 'skip_mode': 'uniform'} time=0.00                                                                             
22:42:26-182472 INFO     Load embeddings: loaded=0 skipped=11 time=0.02                                                                                                                                                                     
22:42:26-474572 DEBUG    GC: collected=204 device=cuda {'ram': {'used': 1.45, 'total': 15.01}, 'gpu': {'used': 0.17, 'total': 7.98}, 'retries': 0, 'oom': 0} time=0.29                                                                      
22:42:26-483255 INFO     Load model: time=2.48 load=2.48 native=1024 {'ram': {'used': 1.45, 'total': 15.01}, 'gpu': {'used': 0.17, 'total': 7.98}, 'retries': 0, 'oom': 0}                                                                  
22:42:26-484947 DEBUG    Save: file="config.json" json=28 bytes=1090 time=0.000                                                                                                                                                             
22:42:26-485766 DEBUG    Unused settings: ['onnx_show_menu']                                                                                                                                                                                
22:42:26-486554 DEBUG    Script callback init time: image_browser.py:ui_tabs=1.44 system-info.py:app_started=0.07 task_scheduler.py:app_started=0.15                                                                                        
22:42:26-487325 INFO     Startup time: 16.24 torch=6.55 olive=0.41 gradio=1.16 libraries=1.16 extensions=0.30 ui-en=0.24 ui-txt2img=0.07 ui-img2img=0.09 ui-control=0.13 ui-settings=0.72 ui-extensions=1.60 ui-defaults=0.07 launch=0.43   
                         api=0.08 app-started=0.23 checkpoint=2.79                                                                                                                                                                          
22:42:37-380619 DEBUG    Diffuser pipeline: StableDiffusionXLPipeline task=DiffusersTaskType.TEXT_2_IMAGE set={'prompt_embeds': torch.Size([1, 77, 2048]), 'pooled_prompt_embeds': torch.Size([1, 1280]), 'negative_prompt_embeds':         
                         torch.Size([1, 77, 2048]), 'negative_pooled_prompt_embeds': torch.Size([1, 1280]), 'guidance_scale': 8, 'generator': device(type='cuda'), 'output_type': 'latent', 'num_inference_steps': 30, 'eta': 1.0,          
                         'guidance_rescale': 0.7, 'denoising_end': None, 'width': 1024, 'height': 1024, 'parser': 'Full parser'}                                                                                                            
22:42:37-457995 DEBUG    Sampler: sampler="Euler a" config={'num_train_timesteps': 1000, 'beta_start': 0.00085, 'beta_end': 0.012, 'beta_schedule': 'scaled_linear', 'prediction_type': 'epsilon', 'rescale_betas_zero_snr': False}         
Progress  4.43s/it ████▌                               13% 4/30 00:22 01:55 Base
22:43:00-105193 DEBUG    VAE load: type=approximate model=models/VAE-approx/model.pt
Progress  1.27s/it █████████████████████████████████ 100% 30/30 00:38 00:00 Base
22:43:20-765553 INFO     Saving: image="outputs/text/10469-stable-diffusion-xl-base-1.0-beautiful forest.jpg" type=JPEG resolution=1024x1024 size=0                                                                                         
22:43:21-522970 DEBUG    Diffuser pipeline: StableDiffusionXLPipeline task=DiffusersTaskType.TEXT_2_IMAGE set={'prompt_embeds': torch.Size([1, 77, 2048]), 'pooled_prompt_embeds': torch.Size([1, 1280]), 'negative_prompt_embeds':         
                         torch.Size([1, 77, 2048]), 'negative_pooled_prompt_embeds': torch.Size([1, 1280]), 'guidance_scale': 8, 'generator': device(type='cuda'), 'output_type': 'latent', 'num_inference_steps': 30, 'eta': 1.0,          
                         'guidance_rescale': 0.7, 'denoising_end': None, 'width': 1024, 'height': 1024, 'parser': 'Full parser'}                                                                                                            
22:43:21-568679 DEBUG    Sampler: sampler="Euler a" config={'num_train_timesteps': 1000, 'beta_start': 0.00085, 'beta_end': 0.012, 'beta_schedule': 'scaled_linear', 'prediction_type': 'epsilon', 'rescale_betas_zero_snr': False}         
Progress 14.21it/s █████████████████████████████████ 100% 30/30 00:02 00:00 Base
22:43:24-698841 INFO     High memory utilization: GPU=86% RAM=50% {'ram': {'used': 7.43, 'total': 15.01}, 'gpu': {'used': 6.83, 'total': 7.98}, 'retries': 0, 'oom': 0}                                                                     
22:43:25-360550 DEBUG    GC: collected=163 device=cuda {'ram': {'used': 7.46, 'total': 15.01}, 'gpu': {'used': 1.18, 'total': 7.98}, 'retries': 0, 'oom': 0} time=0.66                                                                      
22:43:27-347425 INFO     Saving: image="outputs/text/10470-stable-diffusion-xl-base-1.0-beautiful forest.jpg" type=JPEG resolution=1024x1024 size=0                                                                                         
22:43:27-356450 INFO     High memory utilization: GPU=100% RAM=50% {'ram': {'used': 7.45, 'total': 15.01}, 'gpu': {'used': 7.95, 'total': 7.98}, 'retries': 0, 'oom': 0}                                                                    
22:43:27-670999 DEBUG    GC: collected=124 device=cuda {'ram': {'used': 7.45, 'total': 15.01}, 'gpu': {'used': 1.18, 'total': 7.98}, 'retries': 0, 'oom': 0} time=0.31                                                                      
22:43:27-672861 INFO     Processed: images=2 time=55.28 its=1.09 memory={'ram': {'used': 7.45, 'total': 15.01}, 'gpu': {'used': 1.18, 'total': 7.98}, 'retries': 0, 'oom': 0}                                                               
22:43:27-691607 INFO     Saving: image="outputs/grids/00946-stable-diffusion-xl-base-1.0-beautiful forest-grid.jpg" type=JPEG resolution=2048x1024 size=0                                                                                   
22:43:59-708367 DEBUG    Server: alive=True jobs=1 requests=87 uptime=101 memory=7.43/15.01 backend=Backend.DIFFUSERS state=idle

Backend

Diffusers

Branch

Dev

Model

SD 1.5

Acknowledgements

  • I have read the above and searched for existing issues
  • I confirm that this is classified correctly and it's not an extension issue
@vladmandic
Owner

I've implemented the suggested workaround for now.

frostyplanet pushed a commit to frostyplanet/inference that referenced this issue Sep 14, 2024
test result shows deepcache should be loaded before cpu_offloading
if not, may cause issues like
horseee/DeepCache#23
vladmandic/automatic#2888

Signed-off-by: wxiwnd <[email protected]>